Core Module

ShowerData - A Python library for shower data storage.

showerdata.add_target_dataset(path, shape, key='target', exists_ok=False)[source]

Add an empty target dataset to an existing HDF5 file.

Parameters:

path (str | os.PathLike[str]) – Path to the HDF5 file.
shape (tuple[int, int, int]) – Shape of the target dataset.
key (str) – Name of the target dataset in the HDF5 file. Defaults to “target”.
exists_ok (bool) – If True, do not raise an error if the target dataset already exists. Defaults to False.

Return type:

None

showerdata.load_target(path, key='target', start=0, stop=None, max_points=-1)[source]

Load latent space target data for a specific generative model from an HDF5 file. The target usually has the same shape as the shower points.

Parameters:

path (str | os.PathLike[str]) – Path to the HDF5 file.
key (str) – Name of the target dataset in the HDF5 file. Defaults to “target”.
start (int) – Start index for loading target data. Defaults to 0.
stop (Optional[int]) – Stop index for loading target data. If None, load until end of file. Defaults to None.
max_points (Optional[int]) – Maximum number of points to load per shower. If -1, load all points. Defaults to -1.

Returns:

Loaded target data and corresponding number of points.

Return type:

tuple[NDArray[np.float32], NDArray[np.int32]]
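
The start, stop and max_points arguments can be pictured with a small standalone sketch. It uses plain Python lists instead of HDF5 datasets, and slice_padded is a hypothetical helper, not part of showerdata. It assumes that max_points truncates the point axis and that the returned point counts are clipped to match; the library may behave differently in detail. For brevity each point is a single float rather than a feature vector.

```python
def slice_padded(data, num_points, start=0, stop=None, max_points=-1):
    """Hypothetical helper mirroring the start/stop/max_points arguments."""
    # stop=None means "until the end", exactly like a Python slice.
    rows = data[start:stop]
    counts = num_points[start:stop]
    if max_points >= 0:
        # Assumption: truncate the point axis and clip the counts to match.
        rows = [row[:max_points] for row in rows]
        counts = [min(n, max_points) for n in counts]
    return rows, counts


# Three showers padded to four points each; 0.0 marks padding.
data = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 5.0, 0.0],
    [6.0, 7.0, 8.0, 9.0],
]
num_points = [2, 3, 4]

rows, counts = slice_padded(data, num_points, start=1, max_points=2)
print(rows)    # [[3.0, 4.0], [6.0, 7.0]]
print(counts)  # [2, 2]
```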

showerdata.save_target(data, path, num_points=None, name='target', overwrite=False)[source]

Save latent space target data for a specific generative model to an HDF5 file. The target usually has the same shape as the shower points.

Parameters:

data (NDArray[np.float32]) – Target data to save.
path (str | os.PathLike[str]) – Path to the HDF5 file.
num_points (NDArray[np.int32] | None) – Number of points for each shower in the target data.
name (str) – Name of the target dataset in the HDF5 file. Defaults to “target”.
overwrite (bool) – If True, overwrite existing dataset in file. Defaults to False.

Return type:

None

showerdata.save_target_batch(data, path, num_points=None, start=0, key='target')[source]

Save a batch of latent space target data for a specific generative model to an HDF5 file. The target usually has the same shape as the shower points. The file must already exist and have the correct shape. Use add_target_dataset to create the target dataset first.

Example

>>> import numpy as np
>>> import showerdata
>>> showerdata.create_empty_file("showers.h5", shape=(1000, 500, 5))
>>> showerdata.add_target_dataset("showers.h5", shape=(1000, 500, 3), key="target")
>>> # Now you can use save_target_batch
>>> target_data = np.random.rand(100, 500, 3).astype(np.float32)  # Example target data
>>> num_points = np.random.randint(1, 501, size=(100,), dtype=np.int32)  # Example num_points
>>> showerdata.save_target_batch(target_data, "showers.h5", num_points, start=0, key="target")

Parameters:

data (NDArray[np.float32]) – Target data to save.
path (str | os.PathLike[str]) – Path to the HDF5 file.
num_points (NDArray[np.int32] | None) – Number of points for each shower in the target data.
start (int) – Start index in the file. Defaults to 0.
key (str) – Name of the target dataset in the HDF5 file. Defaults to “target”.

Return type:

None
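
One way to read "the correct shape" is as a compatibility check between the batch and the preallocated dataset. The sketch below is only an illustration of that idea: check_batch_fits is a hypothetical helper, and the rule that the point and feature axes must match exactly is an assumption, not documented behaviour of showerdata.

```python
def check_batch_fits(dataset_shape, batch_shape, start=0):
    """Hypothetical check: does a batch fit into a dataset at offset start?"""
    num_showers, max_points, num_features = dataset_shape
    batch_size, batch_points, batch_features = batch_shape
    if start < 0 or start + batch_size > num_showers:
        return False  # the batch would run past the shower axis
    # Assumption: the point and feature axes must match exactly.
    return (batch_points, batch_features) == (max_points, num_features)


print(check_batch_fits((1000, 500, 3), (100, 500, 3), start=0))    # True
print(check_batch_fits((1000, 500, 3), (100, 500, 3), start=950))  # False
```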

showerdata.cluster(showers, random_shift=True, detector_config=Geometry(ILD), processes=1)[source]

Cluster hits into readout cells using a regular grid.

Parameters:

showers (Showers) – Showers to cluster.
random_shift (bool) – Whether to apply a random shift to the grid (default: True).
detector_config (DetectorGeometry) – Simplified detector description (default: ILD).
processes (int) – Number of parallel processes to use (default: 1, i.e. no parallelism).

Return type:

Showers

Returns:

Clustered showers.
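
The idea of clustering hits on a regular grid can be sketched without the library. The code below is not the implementation used by showerdata: cluster_hits, the (x, y, layer, energy) hit layout, the cell_size parameter and the rule of summing energies per cell are all assumptions made for illustration. The random shift moves the grid relative to the hits before binning.

```python
import math
import random
from collections import defaultdict


def cluster_hits(hits, cell_size=5.0, random_shift=True, rng=None):
    """Sum hit energies per grid cell; hits are (x, y, layer, energy) tuples."""
    rng = rng or random.Random()
    # A random offset moves the grid relative to the hits.
    shift_x = rng.uniform(0.0, cell_size) if random_shift else 0.0
    shift_y = rng.uniform(0.0, cell_size) if random_shift else 0.0
    cells = defaultdict(float)
    for x, y, layer, energy in hits:
        ix = math.floor((x + shift_x) / cell_size)
        iy = math.floor((y + shift_y) / cell_size)
        cells[(ix, iy, layer)] += energy
    return dict(cells)


hits = [(0.5, 0.5, 0, 1.0), (2.0, 1.0, 0, 2.0), (7.5, 0.5, 0, 0.5)]
print(cluster_hits(hits, cell_size=5.0, random_shift=False))
# {(0, 0, 0): 3.0, (1, 0, 0): 0.5}
```

Whatever the shift, the total energy is unchanged in this sketch; only the assignment of hits to cells varies.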

showerdata.concatenate(data)[source]

Concatenate an iterable of Showers instances into a single instance.

Parameters:: data (Iterable[Showers]) – Iterable of Showers instances to concatenate.
Returns:: Concatenated Showers instance.
Return type:: Showers

showerdata.create_empty_file(path, shape, overwrite=True)[source]

Create an empty HDF5 file with the specified dataset shape. To be used before calling save_batch.

Parameters:

path (str | os.PathLike[str]) – Path to the HDF5 file.
shape (tuple[int, int, int]) – Shape of the showers dataset.
overwrite (bool) – If True, overwrite existing file. Defaults to True.

Return type:

None

showerdata.filter_showers(shower_data, radius=inf, ecal_threshold=0.0, hcal_threshold=0.0, num_layers_ecal=30)[source]

Filter hits in the shower data based on the specified criteria.

Parameters:

shower_data (Showers) – The shower data to filter.
radius (float) – Radius (in millimeters) for the cylindrical cut filter.
ecal_threshold (float) – Energy threshold (in GeV) for the ECAL hit filter.
hcal_threshold (float) – Energy threshold (in GeV) for the HCAL hit filter.
num_layers_ecal (int) – Number of layers in the ECAL detector.

Return type:

Showers

Returns:

The filtered shower data.
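
The three cuts can be illustrated with a standalone sketch. It is not the library's implementation: filter_hits is a hypothetical helper, hits are assumed to be (x, y, layer, energy) tuples, the cylinder is assumed to be centred on the z axis, layers below num_layers_ecal are assumed to belong to the ECAL, and a hit exactly at a threshold is assumed to be kept.

```python
import math


def filter_hits(hits, radius=math.inf, ecal_threshold=0.0,
                hcal_threshold=0.0, num_layers_ecal=30):
    """Keep (x, y, layer, energy) hits passing the radius and energy cuts."""
    kept = []
    for x, y, layer, energy in hits:
        if math.hypot(x, y) > radius:
            continue  # outside the cylinder around the z axis
        # Assumption: layers below num_layers_ecal belong to the ECAL.
        threshold = ecal_threshold if layer < num_layers_ecal else hcal_threshold
        if energy >= threshold:
            kept.append((x, y, layer, energy))
    return kept


hits = [
    (1.0, 0.0, 0, 0.002),     # ECAL hit, passes every cut
    (300.0, 0.0, 5, 0.010),   # outside the radius
    (2.0, 2.0, 10, 0.00005),  # ECAL hit below threshold
    (0.0, 3.0, 40, 0.001),    # HCAL hit above threshold
]
print(filter_hits(hits, radius=200.0, ecal_threshold=1e-4, hcal_threshold=5e-4))
# [(1.0, 0.0, 0, 0.002), (0.0, 3.0, 40, 0.001)]
```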

showerdata.get_file_length(path)[source]

Get the number of samples in an HDF5 shower data file. Unlike get_file_shape, this function also works for files that contain only incident particle data.

Parameters:: path (str | os.PathLike[str]) – Path to the HDF5 file.
Returns:: Number of samples in the file.
Return type:: int

showerdata.get_file_shape(path)[source]

Get the shape of the showers dataset in an HDF5 file. Only works for files containing shower data.

Parameters:: path (str | os.PathLike[str]) – Path to the HDF5 file.
Returns:: Shape of the showers dataset (num_showers, max_points, num_features), where num_features is 4 or 5 as in the Showers.points format.
Return type:: tuple[int, int, int]

class showerdata.IncidentParticles(energies, pdg, directions=None)[source]

Bases: object

Data structure for incident particle data.

Parameters:

energies (ArrayLike) – Energies of the incident particles.
pdg (ArrayLike or int) – Particle Data Group identifier(s).
directions (Optional[ArrayLike]) – Directions of the incident particles as a unit vector. Defaults to (0, 0, 1).
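
A small sketch of what "unit vector" and the (0, 0, 1) default mean in practice. unit_directions is a hypothetical helper, not part of showerdata, and the documentation does not say whether the library normalizes its input, so treating normalization as the caller's job is an assumption here.

```python
import math


def unit_directions(directions, count):
    """Return `count` unit vectors; default to (0, 0, 1) when none are given."""
    if directions is None:
        return [(0.0, 0.0, 1.0)] * count
    result = []
    for dx, dy, dz in directions:
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        result.append((dx / norm, dy / norm, dz / norm))
    return result


print(unit_directions(None, 2))         # [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
print(unit_directions([(0, 3, 4)], 1))  # [(0.0, 0.6, 0.8)]
```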

energies

Energies of the incident particles.

Type:: NDArray

directions

Directions of the incident particles given as a unit vector.

Type:: NDArray

pdg

Particle Data Group identifiers for the incident particles.

Type:: NDArray

__getitem__(index)[source]

Get a subset of the IncidentParticles.

Return type:: IncidentParticles

save(path, overwrite=False)[source]

Save incident particles data to an HDF5 file.

Parameters:

path (str | os.PathLike[str]) – Path to the HDF5 file.
overwrite (bool) – If True, overwrite existing file. Defaults to False.

Return type:

None

showerdata.load(path, start=0, stop=None, max_points=-1)[source]

Load shower data from an HDF5 file.

Parameters:

path (str | os.PathLike[str]) – Path to the HDF5 file.
start (int) – Start index for loading showers. Defaults to 0.
stop (Optional[int]) – Stop index for loading showers. If None, load until end of file. Defaults to None.
max_points (int) – Maximum number of points to load per shower. If -1, load all points. Defaults to -1.

Returns:

Loaded shower data.

Return type:

Showers

showerdata.load_inc_particles(path, start=0, stop=None)[source]

Load incident particle data from an HDF5 file.

Parameters:

path (str | os.PathLike[str]) – Path to the HDF5 file.
start (int) – Start index for loading incident particles. Defaults to 0.
stop (Optional[int]) – Stop index for loading incident particles. If None, load until end of file. Defaults to None.

Returns:

Loaded incident particle data.

Return type:

IncidentParticles

showerdata.save(data, path, overwrite=False)[source]

Save data to an HDF5 file.

Parameters:

data (Showers | IncidentParticles) – Data to save.
path (str | os.PathLike[str]) – Path to the HDF5 file.
overwrite (bool) – If True, overwrite existing file. Defaults to False.

Return type:

None

showerdata.save_batch(data, path, start=0)[source]

Save a batch of shower data to an HDF5 file. The file must already exist and have the correct shape. Use create_empty_file to create the file first.

Example

>>> showerdata.create_empty_file("showers.h5", shape=(1000, 500, 5))
>>> # Now you can use save_batch to fill the file with data.
>>> showers = showerdata.Showers(...)  # Create or load some showers
>>> showerdata.save_batch(showers, "showers.h5", start=0)

Parameters:

data (Showers) – Shower data to save.
path (str | os.PathLike[str]) – Path to the HDF5 file.
start (int) – Start index in the file. Defaults to 0.

Return type:

None
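
When a file is filled in several batches, each call needs the right start index. The sketch below only computes those indices; batch_ranges is a hypothetical helper, and the call to save_batch is left as a comment so that the block runs without the library or an HDF5 file.

```python
def batch_ranges(total, batch_size):
    """Yield (start, stop) index pairs that cover range(total) in order."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


for start, stop in batch_ranges(1000, 300):
    # A real loop would build the showers for [start, stop) here and then
    # call showerdata.save_batch(batch, "showers.h5", start=start).
    print(start, stop)
```

The last batch is shorter than the others whenever the total is not a multiple of the batch size.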

class showerdata.ShowerDataFile(path, mode='r', shape=None)[source]

Bases: object

Context manager for handling shower data in HDF5 files.

Example

>>> # read showers from a file
>>> with showerdata.ShowerDataFile("showers.h5") as file:
...     print(file.shape)
...     print(file.num_showers)
...     shower = file[0]  # Get the first shower
>>> # create a new file and write showers
>>> with showerdata.ShowerDataFile(
...     path="new_showers.h5",
...     mode="w",
...     shape=(1000, 500, 5),
... ) as file:
...     new_showers = showerdata.Showers(...)  # Create or load some showers
...     file[0:100] = new_showers  # Write first 100 showers
...     file[100:200] = new_showers  # Write next 100 showers

Parameters:

path (str | os.PathLike[str]) – Path to the HDF5 file.
mode (str) – File mode: ‘r’ (read), ‘w’ (write), or ‘a’ (append). Defaults to ‘r’.
shape (Optional[tuple[int, int, int]]) – Shape of the showers dataset when creating a new file. Required if mode is ‘w’.

close()[source]: Close the HDF5 file.

property shape: tuple[int, int, int]: Shape of the showers dataset in the file.

property num_showers: int: Number of showers in the dataset.

class showerdata.Showers(points=(), energies=(), pdg=(), directions=None, shower_ids=None, num_points=None, copy=None)[source]

Bases: object

Data structure for shower data.

Parameters:

points (ArrayLike) – Shower point cloud.
energies (ArrayLike) – Energies of the incident particles.
pdg (ArrayLike or int) – Particle Data Group identifier(s).
directions (Optional[ArrayLike]) – Directions of the incident particles as a unit vector. Defaults to (0, 0, 1).
shower_ids (Optional[ArrayLike]) – Unique identifiers for each shower. Defaults to sequential IDs.
copy (bool | None) – If True, data will be copied to ensure immutability. Defaults to None.

points

Array of shower points. Format: (num_showers, max_points, 4 or 5).

Type:: NDArray
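
The (num_showers, max_points, 4 or 5) layout implies that showers with fewer points are padded up to max_points. The sketch below shows one way to build such a layout from variable-length showers using nested lists. pad_showers is a hypothetical helper, and both the zero padding and the example feature values are assumptions, since the padding value and the meaning of each feature column are not stated here.

```python
def pad_showers(showers, num_features=4):
    """Pad variable-length showers to one rectangular nested list."""
    num_points = [len(shower) for shower in showers]
    max_points = max(num_points, default=0)
    padded = []
    for shower in showers:
        rows = [list(point) for point in shower]
        # Assumption: padded rows are filled with zeros.
        rows += [[0.0] * num_features for _ in range(max_points - len(shower))]
        padded.append(rows)
    return padded, num_points


showers = [
    [(1.0, 2.0, 3.0, 0.5)],
    [(0.0, 1.0, 2.0, 0.1), (4.0, 5.0, 6.0, 0.2)],
]
points, num_points = pad_showers(showers)
print(num_points)                                       # [1, 2]
print(len(points), len(points[0]), len(points[0][0]))  # 2 2 4
```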

energies

Energies of the incident particles.

Type:: NDArray

directions

Directions of the incident particles given as a unit vector.

Type:: NDArray

pdg

Particle Data Group identifiers for the incident particles.

Type:: NDArray

shower_ids

Unique identifiers for each shower.

Type:: NDArray

property is_empty: bool: Check if the Showers instance is empty.

copy()[source]: Create a copy of the Showers instance. :returns: A new Showers instance with copied data. :rtype: Showers

save(path, overwrite=False)[source]

Save data to an HDF5 file.

Parameters:

path (str | os.PathLike[str]) – Path to the HDF5 file.
overwrite (bool) – If True, overwrite existing file. Defaults to False.

Return type:

None

property inc_particles: IncidentParticles: Get the incident particles associated with the showers.