Searching for dark matter in CERN's Large Hadron Collider dataset
1. Searching for dark matter in CERN's
Large Hadron Collider dataset
Lavanya Shukla
2. What's
on the
menu
today?
The Standard Model
Sneak Peek into the Large
Hadron Collider
Machine Learning Challenges
Searching for Dark Matter
Particle Identification
How Can You Get Involved
3. You can play
with datasets
from CERN's
LHC experiments
http://opendata.cern.ch
5. All matter is made up of particles.
These particles interact with each other
by exchanging other particles
associated with the fundamental forces.
The Standard Model describes what the
universe is made up of and how it holds
together.
The Standard Model
6. Pretty much everything we see in the
universe is made of up quarks, down
quarks and electrons.
All the other things exist only for a short
amount of time before decaying into other
particles.
The Standard Model
7. The 12 fermions are the building blocks of
matter.
Quarks: a group of 6 particles that combine
to make protons (2 up, 1 down)
and neutrons (1 up, 2 down).
Dependent particles: they carry fractional
electric charge and are never seen alone in the
wild.
They need to combine with other quarks
through the strong nuclear force to form
particles, most commonly in the nucleus.
Fermions
8. The 12 fermions are the building blocks of
matter.
Leptons: a group of 6 particles, with
electrons being the best known.
Independent particles that carry whole units of
charge.
Not seen in the nucleus.
Fermions
Err..
9. Fermions interact with each other through
fundamental forces. The 5 bosons are the
building blocks of fundamental forces.
Strong force (very short range, typically
smaller than an atomic nucleus)
Carried by gluons, which bind the quarks in protons
and neutrons
Role: holds atomic nuclei and quarks
together.
Electromagnetic force
Carried by photons, the particles of light
Bosons
10. Fermions interact with each other through
fundamental forces. The 5 bosons are the
building blocks of fundamental forces.
Bosons
Weak force (very short range, and about a million
times weaker than the strong force)
Carried by Z and W bosons
Higgs boson
Role: responsible for radioactive decay
12. 16 mile long tunnel
accelerates protons to almost the
speed of light (0.9999999c) and
smashes them into each other
4 big detectors
ALICE, ATLAS, CMS and LHCb
detect particles resulting from
proton-proton bunch collisions
40 million
bunch collisions per second
10 petabytes
of data per year
The detector measures the energy and momentum of every particle
flying out of the collision event and identifies the types of those
particles by calculating their mass.
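A minimal sketch of the "identify by mass" idea, assuming natural units (c = 1, energies and momenta in GeV) and the relation E^2 = p^2 + m^2. The function names and the nearest-mass lookup are illustrative assumptions; real detectors combine many more signals than this.

```python
import math

# Approximate rest masses in GeV for a few common particles.
KNOWN_MASSES = {
    "electron": 0.000511,
    "muon": 0.1057,
    "pion": 0.1396,
    "kaon": 0.4937,
    "proton": 0.9383,
}

def mass_from_energy_momentum(energy, px, py, pz):
    """Solve E^2 = p^2 + m^2 for m (natural units, c = 1)."""
    p_squared = px * px + py * py + pz * pz
    m_squared = energy * energy - p_squared
    # Measurement noise can push m^2 slightly below zero, so clamp it.
    return math.sqrt(max(m_squared, 0.0))

def closest_particle(mass):
    """Hypothetical helper: pick the known particle with the nearest mass."""
    return min(KNOWN_MASSES, key=lambda name: abs(KNOWN_MASSES[name] - mass))

# Toy measurement: E = 5 GeV, momentum 3 GeV along x.
toy_mass = mass_from_energy_momentum(5.0, 3.0, 0.0, 0.0)
```

With these toy numbers the mass comes out as 4 GeV, which matches none of the listed particles closely; the lookup only makes sense for measurements near a known mass.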
14. The Standard Model describes what the
universe is made up of and how it holds
together.
The Standard Model
Every particle leaves different traces in
different sub-detectors of the experiment.
17. Anomaly detection (data quality and
infrastructure monitoring)
Detector design optimization (using
Bayesian optimization, surrogate
modeling etc.)
Precise and fast particle tracking
(single tracks, showers, jets etc.)
Fast and accurate data processing, and
design of triggers
Particle Identification
Some Machine Learning
Challenges in Particle Physics
19. If we add up all the stars, planets,
galaxies, comets, black holes, dark
clouds and everything else out there,
the gravity doesn't add up.
There is something holding the universe
and our galaxy together, without which
our galaxy would fall apart.
Dark 'matter' is a misnomer
85% of the gravity in the
universe has an unknown
source
Hi! I'm Fred!
20. There is something.
It interacts with gravity.
Mass and gravity go together
so it has mass, maybe?
Probably is invisible
has no interaction with light or
electromagnetic force, doesn't emit
or reflect light
There is a lot of it.
So.. Not much really.
What do we know for sure?
I like playing
hard to get
21. WIMP = weakly interacting massive
particle
heavy: 100 - 1000
times the mass of a proton
a particle that interacts with other matter
only weakly, like the neutrino
weak interaction = force between
subatomic particles responsible for
radioactive decay
WIMP theory of Dark Matter
Neither absorb nor emit light
Don't interact strongly with other particles.
But when they encounter each other, they
annihilate and make gamma rays, which we can
detect.
22. THE CHALLENGE
WE DON'T KNOW WHAT WE'RE
LOOKING FOR
There are plenty of theories that try to
predict the presence of dark matter, but we
have no inkling of the nature of the effect we
should be looking for.
LACK OF CLEAN, OBSERVABLE
DATA
The direct and indirect targets that might
predict dark matter’s presence are laced
with noise and background phenomena
that might lead to misleading results.
23. We'll use ML to tackle
the problem of signal
extraction from
background noise
24. 1. Eliminate background noise
from base tracks
2. Cluster tracks into neutrino
and dark matter interactions.
The scale of the problem is massive.
10,000 out of 10 million base tracks result from electromagnetic showers; the rest are noise.
25. Feynman diagram of a dark matter particle X
scattering an electron of lead nuclei.
Neutrinos produce similar
showers
The problem is that when a neutrino
interacts with a nucleus, it also produces
an electron that gets boosted, and similar
showers are produced.
One of the key distinguishing qualities
between them is the energy-angle
correlation.
We apply clustering to distinguish the two
electromagnetic showers.
27. The OPERA dataset has the
following features
15 million base tracks
Each base track contains:
X, Y, Z co-ordinates
angles from origin (TX, TY)
Signal
Background consists of base tracks scattered
randomly in space, and has a label=0
Signals consist of base tracks forming a cone shape,
has a label=1
28. In addition we compute the
following features
distance from the origin (dX, dY, dZ, dTX, dTY)
alpha, the angle between directions
d, the projection of base track1
Signal has a defined geometric structure
Noise is largely random vectors
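A rough sketch of how the pairwise features could be computed for a base track relative to a shower origin. It assumes TX and TY are slopes (dx/dz and dy/dz), so that (TX, TY, 1) points along the track; the BaseTrack type and pair_features function are hypothetical names, and the projection feature d is left out because the slides do not define it.

```python
import math
from collections import namedtuple

# Hypothetical container for one base track.
BaseTrack = namedtuple("BaseTrack", "X Y Z TX TY")

def direction(track):
    # Assumption: TX, TY are slopes dx/dz and dy/dz.
    return (track.TX, track.TY, 1.0)

def pair_features(origin, track):
    """Differences in position and slope, plus the angle alpha between directions."""
    features = {
        "dX": track.X - origin.X,
        "dY": track.Y - origin.Y,
        "dZ": track.Z - origin.Z,
        "dTX": track.TX - origin.TX,
        "dTY": track.TY - origin.TY,
    }
    u, v = direction(origin), direction(track)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding before acos.
    features["alpha"] = math.acos(max(-1.0, min(1.0, dot / norm)))
    return features

origin = BaseTrack(0.0, 0.0, 0.0, 0.0, 0.0)
tilted = BaseTrack(10.0, 0.0, 100.0, 1.0, 0.0)
example = pair_features(origin, tilted)
```

Signal tracks in a cone should give small alpha values relative to the shower origin, while random background tracks should spread over a wide range.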
30. Feature Engineering: Add
particle track info to dataset
Detect particle track
Signal base tracks are part of a larger vector of movement of
particles.
This particle moves across many layers in a straight line path.
Detect parent in case of decay
Look for other children particles in the same layer.
31. Train a model: to classify
signal from noise
Define Neural Network
Train Neural Network
Make Predictions
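The define / train / predict steps can be sketched with a single-neuron network (logistic regression) in plain Python. This is a stand-in for illustration, not the network used in the talk; the two toy features and all data values are made up, and signal is simply drawn with smaller feature values than noise.

```python
import math
import random

def sigmoid(z):
    # Split by sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(weights, bias, x):
    """Probability that feature vector x is signal."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def train(samples, epochs=200, lr=0.5):
    """Stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = score(weights, bias, x) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Hypothetical toy data: (feature_1, feature_2), label 1 = signal, 0 = noise.
rng = random.Random(0)
signal = [((rng.uniform(0.0, 0.2), rng.uniform(0.0, 0.3)), 1) for _ in range(50)]
noise = [((rng.uniform(0.5, 1.0), rng.uniform(0.6, 1.0)), 0) for _ in range(50)]
samples = signal + noise
rng.shuffle(samples)

weights, bias = train(samples)
p_signal_like = score(weights, bias, (0.05, 0.1))
p_noise_like = score(weights, bias, (0.9, 0.9))
```

The output is a probability per track, so a threshold (0.5 here, though any value can be chosen) turns it into a signal or noise decision.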
32. Train classifier: Neural
Network to classify signal
from noise
Output of classifier:
Probability of each id being
a signal (higher is better)
Train a model: to classify
signal from noise
Define and Train XGBoost
35. 2. Cluster tracks into
electromagnetic showers
that are good candidates for dark matter interactions.
36. Clustering: 2 Models
DBSCAN K-Means Our dataset
DBSCAN relies on Euclidean distance, which won’t work for our case because we
care about relative angles between base tracks, in addition to relative positions.
37. Clustering: K-Means
Find the optimal number of clusters
The inertia's rate of decline flattens around k=6
clusters. So we'll train K-Means with 6 clusters.
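The elbow search can be sketched with a small pure-Python K-Means. This is an illustration under stated assumptions: the data are three made-up 2D blobs (so the elbow lands at k=3 here, not k=6), and the deterministic farthest-point initialization is a simplification of what clustering libraries typically do.

```python
import random

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def init_centers(points, k):
    # Farthest-point start: an assumption made to keep the run deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm; returns (centers, inertia)."""
    centers = init_centers(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia

# Hypothetical toy data: three well-separated blobs.
rng = random.Random(1)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
points = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
          for cx, cy in blob_centers for _ in range(30)]

# Inertia for k = 1..6; the elbow is where the drop levels off.
inertias = {k: kmeans(points, k)[1] for k in range(1, 7)}
```

Plotting inertias against k and looking for the point where the curve flattens gives the number of clusters to train with.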
38. Clustering: K-Means
Train K-Means with 6 clusters
Visualize clusters
Excellent! From all that background noise,
we’ve extracted the signal base tracks
that are the best candidates for dark
matter particle interactions! w00t!
42. Search for Hidden Particles
(SHiP)
Expected to launch in 2025
Designed more specifically to search for dark matter
Would employ the same techniques we used above to
reconstruct dark matter particle trajectories from data
Hopefully, the new data will allow us to determine with
greater confidence and accuracy which of these signal
clusters are a result of dark matter interactions!
When SHiP starts up, I hope you'll use some of the
techniques we've used today and join me in exploring
the dataset!
44. The Goal
Identify the type of a particle associated
with a track using responses from different
detector systems.
There are five particle types: Electron,
Proton, Kaon, Pion and Muon.
Therefore Particle Identification is a multi-class
classification problem.
45. The Problem
The inputs to the classifier are the particle
track responses from the 5 sub-detector
systems:
Tracking system
Ring Imaging Cherenkov detector
Electromagnetic calorimeter
Hadron calorimeter
Muon Chambers
The outputs of the classifier are six labels,
five of them correspond to five different
particle types and Ghost is a catch-all for
noisy tracks and other particle types.
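As a sketch of what the six-label output could look like, assume the classifier produces one raw score per label and that a softmax turns those scores into probabilities. The score values below are made up, and predict_label is a hypothetical helper.

```python
import math

LABELS = ["Electron", "Proton", "Kaon", "Pion", "Muon", "Ghost"]

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(scores):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical raw scores for one track, in the order of LABELS.
label, confidence = predict_label([0.1, 0.2, 3.0, 0.4, 0.1, 0.0])
```

A track whose highest probability falls on Ghost would be treated as noise or as a particle type outside the five of interest.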
46. Getting a little in the weeds
Each particle has a certain energy,
momentum and mass.
Energy and momentum are determined by
the speed of the particle, and mass by its
type.
From the laws of conservation of energy
and momentum, we know that when a
particle decays:
the energy and momentum of the mother
particle = the sum of the energies and
momenta of the daughter particles.
47. Getting a little in the weeds
The sub-detectors estimate the daughter
particle’s trajectory, momentum, energy
and type.
From this we can reconstruct the mother
particle’s parameters (e.g. the mass) and
therefore detect the particle.
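A minimal sketch of reconstructing the mother's mass from its daughters, assuming natural units (c = 1) and that each daughter is given as a four-momentum (E, px, py, pz). The function names and the toy daughters are assumptions for illustration.

```python
import math

def four_momentum(mass, px, py, pz):
    """Build (E, px, py, pz) for a particle of known mass and momentum."""
    energy = math.sqrt(mass * mass + px * px + py * py + pz * pz)
    return (energy, px, py, pz)

def mother_mass(daughters):
    """Sum the daughters' four-momenta, then use m^2 = E^2 - |p|^2."""
    energy = sum(d[0] for d in daughters)
    px = sum(d[1] for d in daughters)
    py = sum(d[2] for d in daughters)
    pz = sum(d[3] for d in daughters)
    m_squared = energy * energy - (px * px + py * py + pz * pz)
    # Clamp small negative values caused by rounding or measurement noise.
    return math.sqrt(max(m_squared, 0.0))

# Toy decay: two daughters flying back to back, each with E = 5 and |p| = 3.
daughters = [(5.0, 3.0, 0.0, 0.0), (5.0, -3.0, 0.0, 0.0)]
reconstructed = mother_mass(daughters)
```

Because the toy daughters' momenta cancel, the mother is at rest and its mass is just the summed energy; for a moving mother the momentum term matters.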
49. The Dataset
1.2 million data-points, each
representing a particle track.
50 features
measurements from the sub-detectors
derived features from these
measurements
52. Neural Net Classifier
Here’s a plot of the ROC curves for all
particle classifiers.
The AdaBoost model performs slightly better than the neural network in this case.
AdaBoost Neural Network
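For readers who want to reproduce this kind of comparison, here is a rough sketch of a one-vs-rest ROC curve and its area (AUC) in plain Python. It assumes the scores are distinct (ties are not grouped), and the scores and labels below are made up rather than taken from the talk.

```python
def one_vs_rest(true_types, target):
    """Binary labels: 1 where the true type equals target, else 0."""
    return [1 if t == target else 0 for t in true_types]

def roc_points(scores, labels):
    """Sweep the threshold from high to low and collect (FPR, TPR) points."""
    positives = sum(labels)
    negatives = len(labels) - positives
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical example: scores for the "Kaon" class on four tracks.
true_types = ["Kaon", "Pion", "Kaon", "Muon"]
kaon_scores = [0.9, 0.8, 0.7, 0.6]
kaon_auc = auc(roc_points(kaon_scores, one_vs_rest(true_types, "Kaon")))
```

Computing one such curve per particle type for each model, and comparing the areas, is one way to check which classifier separates the classes better.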