Core Module

ShowerData - A Python library for shower data storage.

showerdata.add_target_dataset(path, shape, key='target', exists_ok=False)

Add an empty target dataset to an existing HDF5 file.

Parameters:
  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • shape (tuple[int, int, int]) – Shape of the target dataset.

  • key (str) – Name of the target dataset in the HDF5 file. Defaults to “target”.

  • exists_ok (bool) – If True, do not raise an error if the target dataset already exists. Defaults to False.

Return type:

None

showerdata.load_target(path, key='target', start=0, stop=None, max_points=-1)

Load latent space target data for a specific generative model from an HDF5 file. The target usually has the same shape as the shower points.

Parameters:
  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • key (str) – Name of the target dataset in the HDF5 file. Defaults to “target”.

  • start (int) – Start index for loading target data. Defaults to 0.

  • stop (Optional[int]) – Stop index for loading target data. If None, load until end of file. Defaults to None.

  • max_points (Optional[int]) – Maximum number of points to load per shower. If -1, load all points. Defaults to -1.

Returns:

Loaded target data and corresponding number of points.

Return type:

tuple[NDArray[np.float32], NDArray[np.int32]]
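load_target returns the target together with num_points. The minimal sketch below shows how those counts can separate real points from padding. It assumes, although this is not stated above, that each shower stores its valid points first and pads the remaining rows up to a common length. Plain Python lists stand in for the NumPy arrays, and `valid_point_mask` and `strip_padding` are hypothetical helpers, not part of showerdata.

```python
from typing import List, Sequence


def valid_point_mask(num_points: Sequence[int], max_points: int) -> List[List[bool]]:
    """Mark the first num_points[i] rows of each shower as valid.

    Assumes (hypothetically) that real points come first and the remaining
    rows up to max_points are padding.
    """
    mask = []
    for n in num_points:
        if not 0 <= n <= max_points:
            raise ValueError(f"num_points entry {n} is outside [0, {max_points}]")
        mask.append([j < n for j in range(max_points)])
    return mask


def strip_padding(
    target: Sequence[Sequence[Sequence[float]]], num_points: Sequence[int]
) -> List[List[Sequence[float]]]:
    """Keep only the first num_points[i] rows of each shower."""
    return [list(shower[:n]) for shower, n in zip(target, num_points)]


# Two showers padded to max_points = 3, with 3 latent features per point.
target = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.0, 0.0, 0.0]],
    [[0.7, 0.8, 0.9], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
num_points = [2, 1]
print(valid_point_mask(num_points, max_points=3))
# [[True, True, False], [True, False, False]]
print(strip_padding(target, num_points))
```

With the NumPy arrays that load_target actually returns, the same mask can be built by broadcasting: `np.arange(target.shape[1]) < num_points[:, None]`.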

showerdata.save_target(data, path, num_points=None, name='target', overwrite=False)

Save latent space target data for a specific generative model to an HDF5 file. The target usually has the same shape as the shower points.

Parameters:
  • data (NDArray[np.float32]) – Target data to save.

  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • num_points (NDArray[np.int32] | None) – Number of points for each shower in the target data. Defaults to None.

  • name (str) – Name of the target dataset in the HDF5 file. Defaults to “target”.

  • overwrite (bool) – If True, overwrite existing dataset in file. Defaults to False.

Return type:

None

showerdata.save_target_batch(data, path, num_points=None, start=0, key='target')

Save a batch of latent space target data for a specific generative model to an HDF5 file. The target usually has the same shape as the shower points. The file must already exist and contain a target dataset of the correct shape; use add_target_dataset to create it first.

Example

>>> import numpy as np
>>> import showerdata
>>> showerdata.create_empty_file("showers.h5", shape=(1000, 500, 5))
>>> showerdata.add_target_dataset("showers.h5", shape=(1000, 500, 3), key="target")
>>> # Now you can use save_target_batch
>>> target_data = np.random.rand(100, 500, 3).astype(np.float32)  # Example target data
>>> num_points = np.random.randint(1, 501, size=(100,), dtype=np.int32)  # Example num_points
>>> showerdata.save_target_batch(target_data, "showers.h5", num_points=num_points, start=0, key="target")

Parameters:
  • data (NDArray[np.float32]) – Target data to save.

  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • num_points (NDArray[np.int32] | None) – Number of points for each shower in the target data.

  • start (int) – Start index in the file. Defaults to 0.

  • key (str) – Name of the target dataset in the HDF5 file. Defaults to “target”.

Return type:

None

showerdata.cluster(showers, random_shift=True, detector_config=Geometry(ILD), processes=1)

Cluster hits into readout cells using a regular grid.

Parameters:
  • showers (Showers) – Showers to cluster.

  • random_shift (bool) – Whether to apply a random shift to the grid (default: True).

  • detector_config (DetectorGeometry) – Simplified detector description (default: ILD).

  • processes (int) – Number of parallel processes to use (default: 1, i.e. no parallelism).

Return type:

Showers

Returns:

Clustered showers.
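The idea behind grid clustering can be sketched in a few lines. This is a generic illustration, not showerdata's implementation: hits are assumed to be (x, y, layer, energy) tuples, a single hypothetical `cell_size` stands in for the DetectorGeometry description, and each merged hit is placed at its cell centre. `cluster_hits` is a hypothetical name.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Hit = Tuple[float, float, int, float]  # hypothetical layout: (x, y, layer, energy)


def cluster_hits(
    hits: Iterable[Hit],
    cell_size: float,
    random_shift: bool = True,
    rng: Optional[random.Random] = None,
) -> List[Hit]:
    """Merge hits that fall into the same (x, y) grid cell within a layer.

    Each output hit sits at its cell centre and carries the summed energy.
    """
    rng = rng or random.Random()
    # Shift the grid origin by a random fraction of a cell so the cell
    # boundaries are not fixed relative to the detector coordinates.
    dx = rng.uniform(0.0, cell_size) if random_shift else 0.0
    dy = rng.uniform(0.0, cell_size) if random_shift else 0.0

    cells: Dict[Tuple[int, int, int], float] = defaultdict(float)
    for x, y, layer, energy in hits:
        ix = math.floor((x - dx) / cell_size)
        iy = math.floor((y - dy) / cell_size)
        cells[(ix, iy, layer)] += energy

    return [
        ((ix + 0.5) * cell_size + dx, (iy + 0.5) * cell_size + dy, layer, e)
        for (ix, iy, layer), e in sorted(cells.items())
    ]


hits = [(0.1, 0.2, 0, 1.0), (0.4, 0.3, 0, 2.0), (1.6, 0.2, 0, 0.5), (0.1, 0.2, 1, 1.0)]
print(cluster_hits(hits, cell_size=1.0, random_shift=False))
# [(0.5, 0.5, 0, 3.0), (0.5, 0.5, 1, 1.0), (1.5, 0.5, 0, 0.5)]
```

Placing each merged hit at the cell centre mimics a readout cell; an energy-weighted centroid would be an alternative. Total energy is conserved either way, because clustering only regroups hits.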

showerdata.concatenate(data)

Concatenate an iterable of Showers instances into a single instance.

Parameters:

data (Iterable[Showers]) – Iterable of Showers instances to concatenate.

Returns:

Concatenated Showers instance.

Return type:

Showers

showerdata.create_empty_file(path, shape, overwrite=True)

Create an empty HDF5 file with the specified dataset shape. Call this before save_batch.

Parameters:
  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • shape (tuple[int, int, int]) – Shape of the showers dataset.

  • overwrite (bool) – If True, overwrite existing file. Defaults to True.

Return type:

None
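Because create_empty_file preallocates the full dataset, it can help to estimate its size and pick a batch size for save_batch. The sketch below assumes 4-byte (float32) values and no compression. Neither is documented above, so treat the numbers as a rough estimate of the raw array size. `dataset_nbytes` and `showers_per_batch` are hypothetical helpers.

```python
from math import prod
from typing import Tuple


def dataset_nbytes(shape: Tuple[int, int, int], itemsize: int = 4) -> int:
    """Raw size in bytes of an uncompressed array with this shape (float32 assumed)."""
    return prod(shape) * itemsize


def showers_per_batch(
    shape: Tuple[int, int, int], budget_bytes: int, itemsize: int = 4
) -> int:
    """Number of showers whose raw points fit into budget_bytes.

    The result is clamped to [1, num_showers].
    """
    num_showers, max_points, num_features = shape
    per_shower = max_points * num_features * itemsize
    return min(num_showers, max(1, budget_bytes // per_shower))


shape = (1000, 500, 5)  # the shape used in the examples in this module
print(dataset_nbytes(shape))  # 10000000 (about 10 MB)
print(showers_per_batch(shape, budget_bytes=2_000_000))  # 200
```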

showerdata.filter_showers(shower_data, radius=inf, ecal_threshold=0.0, hcal_threshold=0.0, num_layers_ecal=30)

Filter hits in the shower data based on the specified criteria.

Parameters:
  • shower_data (Showers) – The shower data to filter.

  • radius (float) – Radius (in millimeters) for the cylindrical cut filter.

  • ecal_threshold (float) – Energy threshold (in GeV) for the ECAL hit filter.

  • hcal_threshold (float) – Energy threshold (in GeV) for the HCAL hit filter.

  • num_layers_ecal (int) – Number of layers in the ECAL detector.

Return type:

Showers

Returns:

The filtered shower data.
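The three criteria can be illustrated on a flat list of hits. This sketch is not the library's code and rests on several assumptions: hits are (x, y, layer, energy) tuples in millimetres and GeV, the cylinder is centred on the z axis, layers with an index below num_layers_ecal belong to the ECAL, and a hit is kept when its energy is at least the threshold. `filter_hits` is a hypothetical name.

```python
import math
from typing import List, Tuple

Hit = Tuple[float, float, int, float]  # hypothetical: (x [mm], y [mm], layer, energy [GeV])


def filter_hits(
    hits: List[Hit],
    radius: float = math.inf,
    ecal_threshold: float = 0.0,
    hcal_threshold: float = 0.0,
    num_layers_ecal: int = 30,
) -> List[Hit]:
    kept = []
    for x, y, layer, energy in hits:
        # Cylindrical cut: drop hits farther than `radius` from the z axis.
        if math.hypot(x, y) > radius:
            continue
        # Assumed convention: the first num_layers_ecal layers are ECAL, the rest HCAL.
        threshold = ecal_threshold if layer < num_layers_ecal else hcal_threshold
        if energy < threshold:
            continue
        kept.append((x, y, layer, energy))
    return kept


hits = [
    (0.0, 0.0, 0, 0.001),    # ECAL, inside the cylinder, above threshold: kept
    (100.0, 0.0, 0, 0.5),    # outside the 50 mm cylinder: dropped
    (3.0, 4.0, 5, 0.0002),   # ECAL, below threshold: dropped
    (0.0, 0.0, 40, 0.01),    # HCAL, below threshold: dropped
    (0.0, 0.0, 40, 0.1),     # HCAL, above threshold: kept
]
print(filter_hits(hits, radius=50.0, ecal_threshold=0.0005, hcal_threshold=0.05))
# [(0.0, 0.0, 0, 0.001), (0.0, 0.0, 40, 0.1)]
```

With the default arguments (infinite radius, zero thresholds) every hit with non-negative energy passes.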

showerdata.get_file_length(path)

Get the number of samples in an HDF5 shower data file. Unlike get_file_shape, this function also works for files that contain only incident particle data.

Parameters:

path (str | os.PathLike[str]) – Path to the HDF5 file.

Returns:

Number of samples in the file.

Return type:

int

showerdata.get_file_shape(path)

Get the shape of the showers dataset in an HDF5 file. Only works for files containing shower data.

Parameters:

path (str | os.PathLike[str]) – Path to the HDF5 file.

Returns:

Shape of the showers dataset (num_showers, max_points, num_features), where num_features is 4 or 5.

Return type:

tuple[int, int, int]

class showerdata.IncidentParticles(energies, pdg, directions=None)

Bases: object

Data structure for incident particle data.

Parameters:
  • energies (ArrayLike) – Energies of the incident particles.

  • pdg (ArrayLike or int) – Particle Data Group identifier(s).

  • directions (Optional[ArrayLike]) – Directions of the incident particles as a unit vector. Defaults to (0, 0, 1).

energies

Energies of the incident particles.

Type:

NDArray

directions

Directions of the incident particles given as a unit vector.

Type:

NDArray

pdg

Particle Data Group identifiers for the incident particles.

Type:

NDArray
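The constructor accepts a single int for pdg and falls back to (0, 0, 1) when no directions are given. Below is a minimal plain-Python sketch of that input handling, assuming a scalar pdg is repeated for every particle. It also rescales directions to unit length, which the documentation above expects but does not say IncidentParticles enforces. `prepare_incident` is a hypothetical helper.

```python
import math
from typing import List, Optional, Sequence, Tuple, Union

Vec3 = Tuple[float, float, float]


def prepare_incident(
    energies: Sequence[float],
    pdg: Union[int, Sequence[int]],
    directions: Optional[Sequence[Vec3]] = None,
) -> Tuple[List[float], List[int], List[Vec3]]:
    n = len(energies)
    # A single PDG code (e.g. 22 for photons) is repeated for every particle.
    pdgs = [pdg] * n if isinstance(pdg, int) else list(pdg)
    if directions is None:
        # Default direction: along the positive z axis.
        dirs = [(0.0, 0.0, 1.0)] * n
    else:
        dirs = []
        for dx, dy, dz in directions:
            norm = math.sqrt(dx * dx + dy * dy + dz * dz)
            if norm == 0.0:
                raise ValueError("direction vectors must be non-zero")
            dirs.append((dx / norm, dy / norm, dz / norm))
    if len(pdgs) != n or len(dirs) != n:
        raise ValueError("energies, pdg and directions must have the same length")
    return list(energies), pdgs, dirs


print(prepare_incident([10.0, 50.0], 22))
# ([10.0, 50.0], [22, 22], [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)])
print(prepare_incident([1.0], [11], [(0.0, 3.0, 4.0)]))
# ([1.0], [11], [(0.0, 0.6, 0.8)])
```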

__getitem__(index)

Get a subset of the IncidentParticles.

Return type:

IncidentParticles

save(path, overwrite=False)

Save incident particles data to an HDF5 file.

Parameters:
  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • overwrite (bool) – If True, overwrite existing file. Defaults to False.

Return type:

None

showerdata.load(path, start=0, stop=None, max_points=-1)

Load shower data from an HDF5 file.

Parameters:
  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • start (int) – Start index for loading showers. Defaults to 0.

  • stop (Optional[int]) – Stop index for loading showers. If None, load until end of file. Defaults to None.

  • max_points (int) – Maximum number of points to load per shower. If -1, load all points. Defaults to -1.

Returns:

Loaded shower data.

Return type:

Showers

showerdata.load_inc_particles(path, start=0, stop=None)

Load incident particle data from an HDF5 file.

Parameters:
  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • start (int) – Start index for loading incident particles. Defaults to 0.

  • stop (Optional[int]) – Stop index for loading incident particles. If None, load until end of file. Defaults to None.

Returns:

Loaded incident particle data.

Return type:

IncidentParticles

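showerdata.save(data, path, overwrite=False)

Save data to an HDF5 file.

Parameters:
  • data – Data to save, e.g. a Showers instance.

  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • overwrite (bool) – If True, overwrite existing file. Defaults to False.

Return type: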

None

showerdata.save_batch(data, path, start=0)

Save a batch of shower data to an HDF5 file. The file must already exist and have the correct shape. Use create_empty_file to create the file first.

Example

>>> import showerdata
>>> showerdata.create_empty_file("showers.h5", shape=(1000, 500, 5))
>>> # Now you can use save_batch to fill the file with data.
>>> showers = showerdata.Showers(...)  # Create or load some showers
>>> showerdata.save_batch(showers, "showers.h5", start=0)

Parameters:
  • data (Showers) – Shower data to save.

  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • start (int) – Start index in the file. Defaults to 0.

Return type:

None
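save_batch writes into a preallocated file beginning at index start, so a large dataset is typically written in consecutive, non-overlapping slices. A small helper for computing those slices follows. `batch_ranges` and the `make_showers` placeholder in the comment are hypothetical, and the commented loop only indicates where the documented calls would go.

```python
from typing import Iterator, Tuple


def batch_ranges(num_showers: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) pairs that tile range(num_showers) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, num_showers, batch_size):
        yield start, min(start + batch_size, num_showers)


# Hypothetical usage (make_showers is a placeholder for your own code):
# showerdata.create_empty_file("showers.h5", shape=(1000, 500, 5))
# for start, stop in batch_ranges(1000, 256):
#     showerdata.save_batch(make_showers(stop - start), "showers.h5", start=start)
print(list(batch_ranges(1000, 256)))
# [(0, 256), (256, 512), (512, 768), (768, 1000)]
```

The same (start, stop) pairs can be passed to load or load_target to read a file back in chunks.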

class showerdata.ShowerDataFile(path, mode='r', shape=None)

Bases: object

Context manager for handling shower data in HDF5 files.

Example

>>> import showerdata
>>> # read showers from a file
>>> with showerdata.ShowerDataFile("showers.h5") as file:
...     print(file.shape)
...     print(file.num_showers)
...     shower = file[0]  # Get the first shower
>>> # create a new file and write showers
>>> with showerdata.ShowerDataFile(
...     path="new_showers.h5",
...     mode="w",
...     shape=(1000, 500, 5),
... ) as file:
...     new_showers = showerdata.Showers(...)  # Create or load 100 showers
...     file[0:100] = new_showers  # Write first 100 showers
...     file[100:200] = new_showers  # Write next 100 showers

Parameters:
  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • mode (str) – File mode, either ‘r’ (read), ‘w’ (write), or ‘a’ (append). Defaults to ‘r’.

  • shape (Optional[tuple[int, int, int]]) – Shape of the showers dataset when creating a new file. Required if mode is ‘w’.

close()

Close the HDF5 file.

property shape: tuple[int, int, int]

Shape of the showers dataset in the file.

property num_showers: int

Number of showers in the dataset.

class showerdata.Showers(points=(), energies=(), pdg=(), directions=None, shower_ids=None, num_points=None, copy=None)

Bases: object

Data structure for shower data.

Parameters:
  • points (ArrayLike) – Shower point cloud.

  • energies (ArrayLike) – Energies of the incident particles.

  • pdg (ArrayLike or int) – Particle Data Group identifier(s).

  • directions (Optional[ArrayLike]) – Directions of the incident particles as a unit vector. Defaults to (0, 0, 1).

  • shower_ids (Optional[ArrayLike]) – Unique identifiers for each shower. Defaults to sequential IDs.

  • num_points (Optional[ArrayLike]) – Number of points in each shower.

  • copy (bool | None) – If True, data will be copied to ensure immutability. Defaults to None.

points

Array of shower points. Format: (num_showers, max_points, 4 or 5).

Type:

NDArray

energies

Energies of the incident particles.

Type:

NDArray

directions

Directions of the incident particles given as a unit vector.

Type:

NDArray

pdg

Particle Data Group identifiers for the incident particles.

Type:

NDArray

shower_ids

Unique identifiers for each shower.

Type:

NDArray
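points is a dense array of shape (num_showers, max_points, 4 or 5), so showers with fewer hits than max_points must be padded. A minimal sketch of building such an array from variable-length point clouds follows, assuming zero padding at the end of each shower and recording the true lengths in a num_points list. The padding convention and `pad_point_clouds` are assumptions, not documented behavior of Showers.

```python
from typing import List, Sequence, Tuple


def pad_point_clouds(
    clouds: Sequence[Sequence[Sequence[float]]], num_features: int = 4
) -> Tuple[List[List[List[float]]], List[int]]:
    """Zero-pad each cloud to the longest one; return (padded, num_points)."""
    num_points = [len(cloud) for cloud in clouds]
    max_points = max(num_points, default=0)
    padded = []
    for cloud in clouds:
        rows = [[float(v) for v in point] for point in cloud]
        if any(len(row) != num_features for row in rows):
            raise ValueError(f"every point must have {num_features} features")
        # Append all-zero rows until the shower reaches max_points.
        rows.extend([0.0] * num_features for _ in range(max_points - len(rows)))
        padded.append(rows)
    return padded, num_points


clouds = [[(1, 2, 3, 4)], [(5, 6, 7, 8), (9, 10, 11, 12)]]
padded, num_points = pad_point_clouds(clouds)
print(num_points)  # [1, 2]
print(padded[0])   # [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
```

Because every shower ends up with the same length, the nested list converts to the dense NumPy layout with `np.asarray(padded, dtype=np.float32)`.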

property is_empty: bool

Check if the Showers instance is empty.

copy()

Create a copy of the Showers instance.

Returns:

A new Showers instance with copied data.

Return type:

Showers

save(path, overwrite=False)

Save data to an HDF5 file.

Parameters:
  • path (str | os.PathLike[str]) – Path to the HDF5 file.

  • overwrite (bool) – If True, overwrite existing file. Defaults to False.

Return type:

None

property inc_particles: IncidentParticles

Get the incident particles associated with the showers.