Deep Learn Physics
Open Data

Get data - Train net - Data challenge!

Introduction



This webpage summarizes our open data samples, which anyone can use for research purposes under the license. This effort is inspired by ILSVRC and many similar efforts to enable and accelerate R&D on machine learning techniques. Our research is the development of machine learning algorithms that extract as much physics as possible from data taken by 2D/3D particle imaging detectors. The ultimate goal is to describe the full details of the physics in each data entry. What particles are in there (and where)? What is the hierarchy among them (which particle produced which)? What is the energy/momentum of each particle? Is there new physics (unseen interactions) in the picture? By making our effort available in the public domain, we wish to contribute toward one big goal: the development of A.I. to perform scientific research.

If you wish to contribute, please contact us!

Public Data



Our sample consists of voxelized 3D data and projected 2D images of charged particles' trajectories (detected energy deposition patterns) in particle detectors, stored in the larcv file format. You can find a set of tutorials on how to install larcv, how to browse file contents, and an example of how to interface with tensorflow for training and inference. larcv provides data interfaces in both C++ and Python (numpy array). Any machine learning software with either of these interfaces can be used with the data with little effort.

At the moment, only a simulated liquid argon time projection chamber (LArTPC) data set is available. The publication that details the data set is in preparation. Below we organize the available data sets by detector type and by three popular machine learning applications in computer vision: 0) image classification, 1) object detection, and 2) pixel segmentation. In fact, our samples contain highly detailed simulation information to allow development of a wide variety of analysis applications beyond these examples. If you have difficulty accessing, understanding, or using any part of the data, please contact us so that we can improve it for everyone.


Practice/Tutorial Samples

Tag: TutorialClassification

File URL: practice_train_5k.root, practice_test_5k.root
Detector: LArTPC
Task: 2D image classification
Brief description: The image classification data set consists of five particle types (electron, gamma ray, muon, charged pion, and proton) and is prepared for tutorial purposes. The train and test sets each contain 5,000 images, with one particle per image. Each particle type makes up an equal fraction of a set (1,000 images per particle type). Images are made by projecting simulated 3D particle energy deposition profiles onto 2D planes (xy, yz, and zx projections in Cartesian coordinates). More details can be found here.
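The projection step described above can be illustrated with a minimal sketch. It assumes a sparse voxel list of hypothetical `(x, y, z, energy)` tuples, which is not the larcv data layout or API, and it simply sums the deposited energy along the axis that is dropped for each plane.

```python
from collections import defaultdict


def project_voxels(voxels):
    """Sum sparse 3D energy deposits onto the xy, yz, and zx planes.

    `voxels` is an iterable of (x, y, z, energy) tuples. This layout is an
    assumption made for illustration, not the larcv file format.
    """
    planes = {
        "xy": defaultdict(float),
        "yz": defaultdict(float),
        "zx": defaultdict(float),
    }
    for x, y, z, energy in voxels:
        planes["xy"][(x, y)] += energy  # collapse along z
        planes["yz"][(y, z)] += energy  # collapse along x
        planes["zx"][(z, x)] += energy  # collapse along y
    return {name: dict(pixels) for name, pixels in planes.items()}


# Three toy deposits; the first two share the same (x, y) position.
voxels = [(0, 0, 0, 1.0), (0, 0, 1, 2.0), (1, 2, 1, 0.5)]
projections = project_voxels(voxels)
```

Each returned plane maps a 2D pixel coordinate to the summed energy, so voxels that overlap in a projection add up in a single pixel.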

Tag: TutorialSegmentation

File URL: practice_train_2k.root, practice_test_2k.root
Detector: LArTPC
Task: 2D image segmentation / object detection
Brief description: The semantic segmentation data set includes 2D images, each containing a varying multiplicity of particles from five possible types (electron, gamma ray, muon, charged pion, and proton). The "segment" image contains 3 semantic classes: background, EM-shower particles, and track particles (i.e. not EM-shower). In each image all primary particles are generated from one point, called the event vertex, whose position is randomized across events. There may be secondary particles produced by the primaries. These files are prepared for tutorial purposes and contain 2,000 images each for the train and test sets. Images are made by projecting simulated 3D particle energy deposition profiles onto 2D planes (xy, yz, and zx projections in Cartesian coordinates). More details can be found here.
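As a small illustration of the three semantic classes, the sketch below counts what fraction of pixels in a label image belongs to each class. The integer indices (0, 1, 2) and the nested-list image are assumptions made for this example; the actual encoding in the files should be checked against the tutorials.

```python
from collections import Counter

# Assumed index-to-class mapping, for illustration only.
SEMANTIC_CLASSES = {0: "background", 1: "shower", 2: "track"}


def class_fractions(label_image):
    """Return the fraction of pixels assigned to each semantic class."""
    counts = Counter(pixel for row in label_image for pixel in row)
    total = sum(counts.values())
    return {
        name: counts.get(index, 0) / total
        for index, name in SEMANTIC_CLASSES.items()
    }


# Toy 4x4 "segment" image: a short shower and a short track on background.
label_image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 2, 0, 0],
    [0, 2, 0, 0],
]
fractions = class_fractions(label_image)
```

A check like this is a quick way to see how strongly background pixels dominate an image before choosing a loss weighting.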

Single Particle Sample

Tag: ChallengeClassification

File URL: train_50k.root, test_40k.root, challenge_40k.root
Detector: LArTPC
Task: 2D image classification
Brief description: The image classification data set consists of five particle types (electron, gamma ray, muon, charged pion, and proton). The train and test sets contain 50,000 and 40,000 images, respectively, with one particle per image. Each particle type makes up an equal fraction of a set (10,000 images per particle type in the train set, 8,000 per particle type in the test set). Images are made by projecting simulated 3D particle energy deposition profiles onto 2D planes (xy, yz, and zx projections in Cartesian coordinates). The challenge set contains 40,000 images without label information. All files contain statistically independent samples. More details can be found here.
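The stated class balance (for example, 5 x 10,000 = 50,000 train images) is easy to verify once labels have been read out. The following is a minimal sketch that assumes the labels are already available as a plain list of integer class indices from 0 to 4; how they are read from the larcv file is not shown here.

```python
from collections import Counter


def is_balanced(labels, num_classes=5):
    """Return True if every class index occurs equally often in `labels`."""
    counts = Counter(labels)
    expected = len(labels) // num_classes
    return all(counts.get(index, 0) == expected for index in range(num_classes))


# Toy label lists standing in for labels read from a train or test file.
balanced_labels = [0, 1, 2, 3, 4] * 3
skewed_labels = [0, 0, 1, 2, 3]
balanced_ok = is_balanced(balanced_labels)
skewed_ok = is_balanced(skewed_labels)
```

This check applies only to the train and test files, since the challenge file carries no labels.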

Multiple Particle Sample

Tag: ChallengeSegmentation

File URL: train_15k.root, test_10k.root, challenge_10k.root
Detector: LArTPC
Task: 2D image segmentation / object detection
Brief description: The semantic segmentation data set includes 2D images, each containing a varying multiplicity of particles from five possible types (electron, gamma ray, muon, charged pion, and proton). The "segment" image contains 3 semantic classes: background, EM-shower particles, and track particles (i.e. not EM-shower). In each image all primary particles are generated from one point, called the event vertex, whose position is randomized across events. There may be secondary particles produced by the primaries. The train and test sets contain 15,000 and 10,000 images, respectively. The challenge set contains 10,000 images without label information. All files contain statistically independent samples. Images are made by projecting simulated 3D particle energy deposition profiles onto 2D planes (xy, yz, and zx projections in Cartesian coordinates). More details can be found here.

Challenge



Would you share your awesome network's performance on the challenge data set? We would like to organize an informal competition for those who wish to try their networks on the subset of open data that includes challenge samples. Please contact us to share your results. We expect the following formats for the different types of challenge.

Particle image classification

A CSV file with "entry" and "prediction" columns. The "entry" should be the integer index of an image in the file, and the "prediction" should be the classification category index. Use one image entry per row. Here is a tutorial example of how to create such a CSV file.
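For reference, a minimal sketch of writing this format with Python's standard csv module is given below. It assumes the predictions are already stored in a list ordered by image entry; the function name `write_predictions` is hypothetical and is not taken from the linked tutorial.

```python
import csv
import io


def write_predictions(stream, predictions):
    """Write one "entry,prediction" row per image to an open text stream.

    `predictions` is assumed to be ordered by image entry, so the list
    position doubles as the integer entry index.
    """
    writer = csv.writer(stream)
    writer.writerow(["entry", "prediction"])
    for entry, prediction in enumerate(predictions):
        writer.writerow([entry, int(prediction)])


# Write to an in-memory buffer here; a real submission would use a file.
buffer = io.StringIO()
write_predictions(buffer, [3, 0, 4])
csv_text = buffer.getvalue()
```

To produce a file instead, pass a handle opened with `open(path, "w", newline="")`, which is the mode the csv module documentation recommends for writers.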

Particle detection

A larcv file with the larcv::EventParticle data product. Please record the particle type using its PDG code and store a 2D and/or 3D bounding box, whichever is appropriate for the challenge data set. A tutorial example is to be posted.
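Until that tutorial is posted, the sketch below may help with the PDG code part only. The codes are the standard PDG Monte Carlo numbers for the five particle types in these samples; the class index order is an assumption (it follows the order in which the types are listed on this page), and writing the larcv::EventParticle product itself is not shown.

```python
# Standard PDG Monte Carlo codes. Antiparticles and opposite charges use the
# negated code (for example -11 for a positron, -211 for a negative pion).
PDG_CODES = {
    "electron": 11,
    "gamma": 22,
    "muon": 13,
    "charged pion": 211,
    "proton": 2212,
}

# Assumed class index order, for illustration only; check it against the
# labels stored in the data files before use.
CLASS_NAMES = ["electron", "gamma", "muon", "charged pion", "proton"]


def class_index_to_pdg(index):
    """Translate a classification category index into a PDG code."""
    return PDG_CODES[CLASS_NAMES[index]]


pdg_of_first_class = class_index_to_pdg(0)
pdg_of_last_class = class_index_to_pdg(4)
```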

Particle segmentation

A larcv file with larcv::EventClusterVoxel2D (for 2D) or larcv::EventClusterVoxel3D (for 3D) that saves your pixel-wise segmentation results. To associate your segmentation with the annotated particle information, please also store the larcv::EventParticle data product. A tutorial example is to be posted.

About Us


We are a group of experimental particle physicists interested in applying machine learning techniques to physics data reconstruction and analysis. Our main page gives more detail about our group effort. This particular effort of sharing public data sets originated in the LArTPC experiment community, where many physicists are interested in the techniques but lack the technical expertise to get started. We hope this effort helps everyone in the LArTPC community take the first step of learning and applying machine learning techniques to their own analysis. "Accelerating your research will benefit us all (the community)" is the idea behind this open-data effort.