ShowerData Documentation

ShowerData is a Python library for efficient storage and retrieval of calorimeter shower data in HDF5 format. It is designed for training fast machine-learning-based surrogate models of particle shower simulations. One of its key advantages is the consistent use of type hints throughout the codebase.

Features

  • Efficient HDF5 Storage: Storage of variable-length point clouds without padding

  • Easy Data Access: Simple API for loading and saving shower data

  • Observable Calculations: Functions to compute basic observables

  • Command-Line Interface: Tools for shuffling, adding observables, shifting, clustering, and filtering showers
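
The idea behind padding-free storage can be illustrated with a common technique for ragged data: concatenate all point clouds into one flat array and keep an offsets array that marks where each shower starts and ends. The helpers `pack_showers` and `unpack_shower` below are hypothetical and only sketch this technique; they are not necessarily the exact on-disk layout ShowerData uses.

```python
import numpy as np


def pack_showers(showers: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Concatenate ragged (n_i, n_features) arrays into one flat array.

    Hypothetical helper: returns the flat array and an offsets array of
    length len(showers) + 1, so shower i is flat[offsets[i]:offsets[i + 1]].
    """
    lengths = np.array([len(s) for s in showers], dtype=np.int64)
    offsets = np.zeros(len(showers) + 1, dtype=np.int64)
    # offsets[0] = 0, offsets[i + 1] = total number of points in showers 0..i
    np.cumsum(lengths, out=offsets[1:])
    flat = np.concatenate(showers, axis=0)
    return flat, offsets


def unpack_shower(flat: np.ndarray, offsets: np.ndarray, i: int) -> np.ndarray:
    """Recover shower i by slicing; no padding has to be stripped."""
    return flat[offsets[i]:offsets[i + 1]]


# Three showers with 2, 0 and 3 points, each point having 4 features.
showers = [np.ones((2, 4)), np.empty((0, 4)), np.full((3, 4), 2.0)]
flat, offsets = pack_showers(showers)
print(flat.shape)        # (5, 4)
print(offsets.tolist())  # [0, 2, 2, 5]
```

A padded layout would need a (3, 3, 4) array here, sized by the largest shower; the flat layout stores only the 5 real points, and the saving grows with the spread in shower sizes.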

Quick Start

Install ShowerData using pip:

pip install showerdata

Load and work with shower data:

import showerdata
import numpy as np

# Load first 1000 showers from an HDF5 file
showers = showerdata.load("path/to/your/file.h5", stop=1000)

# Access the NumPy arrays of the showers
hits = showers.points
inc_pdg = showers.pdg
inc_energies = showers.energies
inc_directions = showers.directions

# Calculate observables for first 10 showers
num_points = showerdata.observables.calc_num_points_per_layer(showers[:10])
layer_energies = showerdata.observables.calc_energy_per_layer(showers[:10])

# Load precomputed observables if available
# Note: running "showerdata add-observables" first is recommended
observables = showerdata.observables.read_observables_from_file(
   path="path/to/your/file.h5",
   stop=10,
   observables=["num_points_per_layer", "energy_per_layer"]
)

assert np.all(num_points == observables["num_points_per_layer"])
assert np.allclose(layer_energies, observables["energy_per_layer"])  # floats: compare with a tolerance

showers.save("new_file1.h5")  # Save showers to a new file
showerdata.save(showers, "new_file2.h5")  # Equivalent way to save

# Create a Showers object from numpy arrays
showers = showerdata.Showers(
   points=hits,
   pdg=inc_pdg,
   energies=inc_energies,
   directions=inc_directions
)
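
The per-layer observables computed above reduce each shower to one value per calorimeter layer. The following is a minimal sketch of that reduction for a single shower, assuming a hypothetical point layout with columns (x, y, layer, energy) and a known number of layers; the real column order and the internals of `calc_num_points_per_layer` and `calc_energy_per_layer` may differ.

```python
import numpy as np


def per_layer_observables(
    points: np.ndarray, num_layers: int
) -> tuple[np.ndarray, np.ndarray]:
    """Count hits and sum hit energy per layer for one shower.

    Assumes (hypothetically) that points has shape (n_points, 4) with
    columns (x, y, layer, energy) and integer-valued layer indices.
    """
    layers = points[:, 2].astype(np.int64)
    energies = points[:, 3]
    # minlength guarantees one entry per layer, including empty layers
    num_points = np.bincount(layers, minlength=num_layers)
    energy_sum = np.bincount(layers, weights=energies, minlength=num_layers)
    return num_points, energy_sum


shower = np.array([
    [0.0, 0.0, 0, 1.5],
    [1.0, 0.0, 0, 0.5],
    [0.0, 2.0, 2, 3.0],
])
n, e = per_layer_observables(shower, num_layers=4)
print(n.tolist())  # [2, 0, 1, 0]
print(e.tolist())  # [2.0, 0.0, 3.0, 0.0]
```

`np.bincount` performs the group-by-layer in a single vectorized pass, which matters when the same reduction runs over many showers.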

Use the command-line interface:

# Shuffle multiple HDF5 files into one
showerdata shuffle file1.h5 file2.h5 shuffled_file.h5 --overwrite

# Add observables to an existing HDF5 file
showerdata add-observables file.h5 --batch-size 1000

# Shift showers in an existing HDF5 file to counteract the incident angle
showerdata shift input_file.h5 output_file.h5

# Shift back to original positions
showerdata shift output_file.h5 restored_file.h5 --inverse

# Cluster showers
showerdata cluster input_file.h5 output_file.h5 --processes 8

# Filter hits in showers by an energy threshold and a spatial region (see "showerdata filter --help" for the -r and -d options)
showerdata filter input_file.h5 output_file.h5 -r 200 -d

# List available CLI commands
showerdata

# Get help on a specific command
showerdata <subcommand> --help
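
Two of these commands rest on simple array manipulations, sketched below under stated assumptions. `shuffle_aligned` applies one random permutation to every per-shower array (for example after concatenating the arrays from several input files), so that points, energies, PDG codes, and directions stay matched. `shift_points` shears hit positions along the incident direction so the shower axis runs parallel to z, and applying it with `inverse=True` restores the original positions. Both helpers, the (x, y, z, ...) column order, the assumption that the shower axis passes through x = y = 0 at z = 0, and the choice of a linear shear are hypothetical; they are not the implementations behind `showerdata shuffle` or `showerdata shift`.

```python
import numpy as np


def shuffle_aligned(
    fields: dict[str, np.ndarray], seed: int = 0
) -> dict[str, np.ndarray]:
    """Apply one random permutation to every per-shower array.

    Hypothetical helper: all arrays must share the same first dimension
    (number of showers), so a single permutation keeps them aligned.
    """
    n = len(next(iter(fields.values())))
    perm = np.random.default_rng(seed).permutation(n)
    return {name: arr[perm] for name, arr in fields.items()}


def shift_points(
    points: np.ndarray, direction: np.ndarray, inverse: bool = False
) -> np.ndarray:
    """Shear x and y along the incident direction (dx, dy, dz).

    Hypothetical helper: assumes columns (x, y, z, ...), dz != 0, and a
    shower axis through x = y = 0 at z = 0. z is left unchanged, so the
    inverse shift uses the same z values and restores the input exactly
    up to floating-point rounding.
    """
    dx, dy, dz = direction
    sign = 1.0 if inverse else -1.0
    shifted = points.copy()
    shifted[:, 0] += sign * points[:, 2] * dx / dz
    shifted[:, 1] += sign * points[:, 2] * dy / dz
    return shifted


# Shuffling keeps each energy paired with its PDG code.
energies = np.array([10.0, 20.0, 30.0])
pdg = np.array([22, 11, 211])
shuffled = shuffle_aligned({"energies": energies, "pdg": pdg}, seed=42)
pairs = dict(zip(energies.tolist(), pdg.tolist()))

# Shifting and shifting back is a round trip.
points = np.array([[1.0, 2.0, 10.0, 0.5], [0.0, 0.0, 0.0, 1.0]])
direction = np.array([0.1, 0.0, 1.0])
restored = shift_points(shift_points(points, direction), direction, inverse=True)
print(np.allclose(restored, points))  # True
```

Using a single permutation, rather than shuffling each array independently, is what keeps the per-shower records consistent.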
