In this deck from the Stanford HPC Conference, Katie Lewis from Lawrence Livermore National Laboratory presents: The Incorporation of Machine Learning into Scientific Simulations at Lawrence Livermore National Laboratory.
"Scientific simulations have driven computing at Lawrence Livermore National Laboratory (LLNL) for decades. During that time, we have seen significant changes in hardware, tools, and algorithms. Today, data science, including machine learning, is one of the fastest growing areas of computing, and LLNL is investing in hardware, applications, and algorithms in this space. While the use of simulations to focus and understand experiments is well accepted in our community, machine learning brings new challenges that need to be addressed. I will explore applications for machine learning in scientific simulations that are showing promising results and further investigation that is needed to better understand its usefulness."
Watch the video: https://youtu.be/NVwmvCWpZ6Y
Learn more: https://computing.llnl.gov/research-area/machine-learning and http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
The Incorporation of Machine Learning into Scientific Simulations at Lawrence Livermore National Laboratory
1. LLNL-PRES-808845
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
The Incorporation of Machine Learning into Scientific
Simulations at Lawrence Livermore National Laboratory
The Stanford HPCC and HPC-AI Advisory Council
Annual Stanford Conference
Katie Lewis, Lawrence Livermore National Laboratory
Advanced Machine Learning Project Leader
April 22, 2020
2.
Supercomputing and Computational Physics at
Lawrence Livermore National Laboratory
www.llnl.gov/about/history
www.llnl.gov/news/berni-alder-pioneer-times
www.top500.org
§ Lawrence Livermore National Laboratory (LLNL) was founded in 1952
— Scientific Computing was part of our initial portfolio
§ “It is now accepted that in addition to the experimental and theoretical branches of physics, there is a third: computer simulation.” - David Young, postdoc who worked with Berni Alder in the late 1960s
§ Today, LLNL and the DOE Complex continue to dominate supercomputing for scientific
simulations in support of national security.
§ Data Science is increasingly a part of this landscape.
Sierra – 125 petaflops
4.
Many research topics are already being investigated
[Figure: image panels, one per research topic]
Material Discovery
Augmented Turbulence Modeling
Improved Design Workflows
Improved Material Interface Reconstruction
Advanced Surrogate Models
Multi-scale Coupling
Journal for Reaction Chemistry & Engineering
5.
Terminology
§ Geometry is discretized using a “mesh” or “grid”. Time is
discretized into timesteps.
§ Fields (like density, velocity, or temperature) are
calculated at the mesh points or “zones”.
§ In Eulerian simulations, the mesh is static and the
materials move through it.
§ In Lagrangian simulations, the mesh moves with the
material.
§ In Arbitrary Lagrangian-Eulerian (ALE) simulations, the
mesh moves, but not necessarily with the material.
3D Lagrangian simulation
faculty.washington.edu
2D ALE Simulation
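The three mesh treatments above can be contrasted with a minimal 1D sketch. The function name `advance_mesh`, the `relax` blending weight, and the even-spacing relaxation target are illustrative assumptions and are not taken from any LLNL code.

```python
def advance_mesh(nodes, velocities, dt, mode, relax=0.5):
    """Return new node positions for one timestep of a 1D mesh.

    mode is "eulerian" (static mesh), "lagrangian" (mesh follows the
    material), or "ale" (Lagrangian step followed by partial relaxation).
    """
    if mode == "eulerian":
        return list(nodes)
    moved = [x + v * dt for x, v in zip(nodes, velocities)]
    if mode == "lagrangian":
        return moved
    if mode == "ale":
        # Hypothetical relaxation target: evenly spaced nodes between the
        # moved end points. relax=0 is purely Lagrangian, relax=1 fully even.
        n = len(moved)
        lo, hi = moved[0], moved[-1]
        even = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
        return [(1 - relax) * m + relax * e for m, e in zip(moved, even)]
    raise ValueError(f"unknown mode: {mode}")


nodes = [0.0, 1.0, 2.0, 3.0]
velocities = [0.0, 2.0, 0.0, 0.0]  # only the material at the second node moves
eulerian = advance_mesh(nodes, velocities, 0.25, "eulerian")
lagrangian = advance_mesh(nodes, velocities, 0.25, "lagrangian")
ale = advance_mesh(nodes, velocities, 0.25, "ale", relax=0.5)
```

In the ALE case the second node ends up between its Lagrangian position and its evenly spaced position, which is the sense in which the mesh moves but not necessarily with the material.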
6.
Application: ML to control Arbitrary Lagrangian-Eulerian (ALE)
[Figure: panels labeled A, B, C, D along a time axis; one region is marked High Vorticity]
This problem is typically solved with hand-tuned relaxation strategies.
Ming Jiang, Brian Gallagher, Keith Henderson, Alister Maguire
7.
Trained relaxation strategies can significantly reduce user burden
Ming Jiang, Brian Gallagher, Keith Henderson, Alister Maguire
Training data:
Zone | Angle | Skew | Temp | Energy | Label
1    | 85    | 1.0  | 243  | 0.9    | 0.1
2    | 26    | 1.2  | 752  | 3.5    | 0.9
…    | …     | …    | …    | …      | …
[Diagram labels: Simulation Run, Simulation State, Training Data, Learning Algorithm, Statistical Models, Inference Algorithm]
Simulation state: mesh + physics
Class label: failure event
f(x1, x2, ..., xn) ∈ [0, 1]
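The mapping from per-zone simulation state to a failure score in [0, 1] can be sketched as a small supervised learner. The talk reports random forests and later CNNs for this task. The logistic model below is a swapped-in stand-in chosen only to keep the example dependency-free, and the two scaled features and the labeling rule are made up for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, bias, features):
    """Return a failure score in [0, 1] for one zone's feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(rows, labels, lr=0.5, epochs=200):
    """Fit a logistic model with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y  # log-loss gradient w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Synthetic, made-up training data (not from the talk): zones with a small
# scaled angle and a large excess skew are labeled as failure events.
rng = random.Random(0)
rows, labels = [], []
for _ in range(200):
    angle = rng.uniform(0.1, 1.0)  # hypothetical feature: angle scaled to (0, 1]
    skew = rng.uniform(0.0, 0.5)   # hypothetical feature: skew in excess of 1
    rows.append([angle, skew])
    labels.append(1.0 if skew - angle > -0.3 else 0.0)

weights, bias = train(rows, labels)
bad_zone = predict(weights, bias, [0.15, 0.45])   # distorted zone
good_zone = predict(weights, bias, [0.95, 0.05])  # well-shaped zone
```

At inference time a score like `bad_zone` would be evaluated per zone from the current simulation state, and a relaxation strategy could act on zones whose score is high.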
8.
Research project is moving into user community
Building on top of M. Jiang, B. Gallagher, J. Kallman, and D. Laney, “A
Supervised Learning Framework for Arbitrary Lagrangian-Eulerian
Simulations,” IEEE International Conference on Machine Learning and
Applications (ICMLA), pp. 977–982, 2016.
§ Initial results showed high accuracy using random forests
§ Recent work addresses the imbalance in training data and improves generalization to noisy data using Convolutional Neural Networks (CNNs)
§ Working with user community to provide quantitative analysis of
results using the CNN for inference inline
— Evaluate quantities of interest against experimental results
— Develop a reward function for Reinforcement Learning
§ Test case for proxy application on new hardware (more later)
Bubble Shock
Shock Tube
Ming Jiang, Brian Gallagher, Keith Henderson, Alister Maguire
9.
Application: ML for Material Interface Reconstruction (MIR)
Current, High Res
Reconstruction
Actual Material
Boundaries
Current, Low Res
Reconstruction
Dan Fenn, Walt Nissen, Kenny Weiss
10.
Training with actual geometry may avoid the common errors
seen in heuristic solutions
High-Res
Background
Low-Res
Background
Current Interface
Reconstruction
Dan Fenn, Walt Nissen, Kenny Weiss
11.
We can use this methodology to train on many shapes
Varying:
• Position
• Size
• Rotation
• Background vs.
Foreground
Dan Fenn, Walt Nissen, Kenny Weiss
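Generating training shapes with varying position and size can be sketched as follows. This is a minimal illustration, assuming discs on a unit square and a point-sampling estimate of each zone's volume fraction. The function names, grid sizes, and parameter ranges are hypothetical, and rotation and background/foreground swaps are left out.

```python
import random


def circle_volume_fractions(cx, cy, r, zones=8, samples=8):
    """Volume fraction of a disc in each zone of a zones x zones unit grid.

    Each zone is sub-sampled on a samples x samples lattice of points; the
    share of points inside the disc approximates the zone's volume fraction.
    """
    h = 1.0 / zones
    fractions = []
    for j in range(zones):
        row = []
        for i in range(zones):
            inside = 0
            for sj in range(samples):
                for si in range(samples):
                    x = (i + (si + 0.5) / samples) * h
                    y = (j + (sj + 0.5) / samples) * h
                    if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                        inside += 1
            row.append(inside / samples**2)
        fractions.append(row)
    return fractions


def make_training_set(count, seed=0):
    """Generate (shape parameters, volume-fraction grid) pairs for random discs."""
    rng = random.Random(seed)
    examples = []
    for _ in range(count):
        r = rng.uniform(0.1, 0.3)       # vary size
        cx = rng.uniform(r, 1.0 - r)    # vary position, keeping the disc inside
        cy = rng.uniform(r, 1.0 - r)
        examples.append(((cx, cy, r), circle_volume_fractions(cx, cy, r)))
    return examples


centered = circle_volume_fractions(0.5, 0.5, 0.25)
training_set = make_training_set(5)
```

Because the true geometry is known for every generated shape, each low-resolution volume-fraction grid comes with an exact interface to train against.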
12.
Initial results are very promising!
Original Geometry
(before overlink)
Plot of volume
fractions
Volume fraction
preserving
reconstruction
NN reconstruction
0.6% overall error
NN reconstruction
0.5% overall error
Dan Fenn, Walt Nissen, Kenny Weiss
13.
Overfitting is evident in results
Original Geometry
(before overlink)
NN reconstruction
0.6% error
NN reconstruction
0.5% error
Dan Fenn, Walt Nissen, Kenny Weiss
Plot of volume
fractions
Volume fraction
preserving
reconstruction
14.
The algorithm does not currently account for the material
volume fractions in each zone
Original Geometry
(before overlink)
NN reconstruction
1.5% overall error
Dan Fenn, Walt Nissen, Kenny Weiss
Plot of volume
fractions
15.
Material Interface Reconstruction – Next Steps
§ Incorporate volume fraction information into training as a loss/reward function
— Modifying threshold to meet volume fractions was unsuccessful (i.e., too noisy)
§ Incorporate active learning techniques to handle new types of geometries
§ Evaluate how the algorithm will work in-situ, accounting for conserved physical
quantities
§ Investigate reconstruction for multiple materials
Dan Fenn, Walt Nissen, Kenny Weiss
16.
Tom Stitt
Application: ML for Fast Surrogate Modeling
100s of hydro simulations were
used to train a CNN.
Inference is ~4000x the speed of
the full simulation.
Maximum mean squared error across cycles is ~1.6%, although maximum error can be much higher.
This methodology can be used to
optimize parameters for full
simulation.
[Figure panels: 2D Hydro Simulation; Neural Network Surrogate; Difference]
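The distinction between the maximum per-cycle mean squared error and the maximum pointwise error can be made concrete with a short sketch. The slide does not say how its percentage is normalized, so the code below works on raw field values. The function names and the tiny arrays are made up for illustration.

```python
def mse(a, b):
    """Mean squared error between two equally sized flat field snapshots."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def surrogate_error_summary(sim_cycles, surrogate_cycles):
    """Compare a full simulation and a surrogate, cycle by cycle."""
    per_cycle = [mse(s, p) for s, p in zip(sim_cycles, surrogate_cycles)]
    max_abs = max(
        abs(x - y)
        for s, p in zip(sim_cycles, surrogate_cycles)
        for x, y in zip(s, p)
    )
    return {"per_cycle_mse": per_cycle, "max_mse": max(per_cycle), "max_abs_error": max_abs}


# Made-up two-cycle, two-zone example (not data from the talk).
simulation = [[1.0, 2.0], [2.0, 4.0]]
surrogate = [[1.0, 2.5], [2.0, 3.0]]
summary = surrogate_error_summary(simulation, surrogate)
```

A single badly predicted zone raises `max_abs_error` sharply while barely moving the per-cycle mean, which is why both numbers are worth tracking.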
17.
Can ML replace interpolation schemes used within continuum models
when querying opacity or equation of state models, reducing memory
footprints while maintaining accuracy?
Application: ML to improve multi-scale coupling
[Diagram: modeling scales and what each one provides]
Ab-initio: Inter-atomic forces, EOS, excited states
Atoms: Defects and interfaces, nucleation
Long-time: Defects and defect structures
Microstructure: Meso-scale multi-phase, multi-grain evolution
Dislocation: Meso-scale strength
Crystal: Meso-scale material response
Continuum: Macro-scale material response
Rob Blake, Ben Yee, and Mike Hohensee
18.
Evaluation of ML for Opacity Interpolation
Current method
Before running continuum code:
• Perform atomic physics calculations to obtain detailed data
• Store data in a 3-D table
During continuum code:
• Table lookup & linear interpolation
Fast, but inaccurate and memory intensive
Proposed method with machine learning
Before running continuum code:
• Perform atomic physics calculations to obtain detailed data
• Regression problem: Use data to train a neural net
During continuum code:
• Apply inference on neural net
FLOPs ⬆, accuracy ⬆, memory ⬇
[Diagram: tabulated values on a grid of η (rows) and T (columns), compared with a trained neural net that maps an input point (ν, η, T) to σ(ν, η, T)]
Rob Blake, Ben Yee, and Mike Hohensee
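The current table-lookup approach can be sketched in two dimensions to show where its error comes from. This is a 2D analogue of the 3-D table described above, assuming a regular grid. The function names and the two test functions are illustrative and not part of any LLNL code.

```python
from bisect import bisect_right


def bilinear_lookup(xs, ys, table, x, y):
    """Interpolate table[i][j], sampled at (xs[i], ys[j]), at the point (x, y)."""
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect_right(ys, y) - 1, 0), len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    low = (1 - tx) * table[i][j] + tx * table[i + 1][j]
    high = (1 - tx) * table[i][j + 1] + tx * table[i + 1][j + 1]
    return (1 - ty) * low + ty * high


def smooth(x, y):
    """A linear function, which bilinear interpolation reproduces exactly."""
    return 2.0 * x + 3.0 * y


def curved(x, y):
    """A curved function, which a coarse table cannot capture between points."""
    return x * x * y


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
smooth_table = [[smooth(x, y) for y in ys] for x in xs]
curved_table = [[curved(x, y) for y in ys] for x in xs]

smooth_value = bilinear_lookup(xs, ys, smooth_table, 0.5, 1.5)
curved_value = bilinear_lookup(xs, ys, curved_table, 0.5, 1.0)
curved_error = abs(curved_value - curved(0.5, 1.0))
table_entries = len(xs) * len(ys)  # storage grows with every added grid point
```

Reducing `curved_error` with a table means adding grid points and therefore memory, which is the trade-off the proposed regression approach aims to shift toward extra FLOPs instead.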
19.
Networks trained on a subset of the domain
§ Initial Evaluation:
— Network trained on 2D slice of iron, varying density and frequency
— Specific density slices omitted for network validation
§ Results:
— Current network has accuracy comparable to existing tables
— Current network has improved accuracy between data slices
— Network consumes less memory, ~100x savings.
§ Unfortunately:
— The highest error is where it matters most (spiky data is problematic)
— Table data is highly curated
§ Investigating ML to predict which tables will be needed for
improved accuracy at runtime
Rob Blake, Ben Yee, and Mike Hohensee
21.
Medical imaging using Machine Learning may be
transferable to our needs
https://algorithmia.com/blog/vertical-spotlight-machine-learning-for-healthcare-diagostics
https://github.com/mateuszbuda/brain-segmentation-pytorch
Morry Aufderheide, Kevin Chen, and Hardeep Sullan
22.
Creating a training set of labeled images using simulations
§ Approach: use simulation to generate synthetic radiographs and image mask labels
— Start with clean radiographs and then introduce distortions normally found in experiments
— End goal is to detect features in experimental radiographs, while limiting manual labeling
[Figure panels: Brain Tumor Dataset; Radiograph Dataset]
Cross-validation Dice score (2*overlap/total pixels) for 100 clean radiographic test images looks promising.
Morry Aufderheide, Kevin Chen, and Hardeep Sullan
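The Dice score quoted above, 2*overlap/total pixels, can be computed for binary masks as in the sketch below. It reads "total pixels" as the number of labeled pixels in the predicted mask plus the number in the true mask, which is the standard definition. Returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_score(pred, truth):
    """Dice score for two binary masks given as equally sized lists of rows."""
    overlap = sum(
        1
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
        if p and t
    )
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total


# Made-up 2 x 3 masks for illustration.
predicted = [[1, 1, 0], [0, 1, 0]]
actual = [[1, 0, 0], [0, 1, 1]]
score = dice_score(predicted, actual)
```

Here two pixels overlap and each mask labels three pixels, so the score is 2*2/6.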
24.
§ ML (esp. DL) needs a lot of data, with verified labels and provenance
§ Traditional databases are not common in the HPC environment
Kosh (Sanskrit for Treasury) is being developed to solve these problems
§ Multi-modal data sources seamlessly searchable and accessible by authenticated end-users
— Plan to incorporate sampling algorithms (spatial and temporal)
— Datasets can have multiple files associated with them and multiple file formats
§ Data can be distributed across organization/lab/compute centers.
§ Users can query data to get only what they want for training. Adding augmentation
Data Infrastructure is a fundamental need for ML
Charles Doutriaux and Becky Haluska
28.
§ High-precision scientific simulation
§ Frequent ML training
§ Potentially very high-frequency inference
LLNL is strategically looking at AI test applications across
Scientific Computing programs
Active learning or intelligent sampling
Smart ALE, RANS
ML inference every time step: in the loop
ML training or inference every 1k time steps: on the loop
ML training or inference every simulation: around the loop
Physics simulation
Experimental data
Transfer learning every 10k simulations: outside the loop
Elevated predictive model
Courtesy of Brian Spears
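The four cadences on this slide (in, on, around, and outside the loop) can be sketched as nested loops that only count how often each ML step would run. The function names and counters are hypothetical. The default intervals of 1,000 steps and 10,000 simulations follow the slide.

```python
def run_simulation(n_steps, train_every=1000):
    """Toy time loop showing where ML calls could sit relative to the physics."""
    counts = {"physics": 0, "in_the_loop": 0, "on_the_loop": 0}
    for step in range(1, n_steps + 1):
        counts["physics"] += 1       # advance the physics by one timestep
        counts["in_the_loop"] += 1   # ML inference every timestep
        if step % train_every == 0:
            counts["on_the_loop"] += 1  # ML training or inference every 1k steps
    return counts


def run_campaign(n_simulations, n_steps, transfer_every=10_000):
    """Toy campaign loop for the per-simulation and cross-simulation cadences."""
    totals = {"around_the_loop": 0, "outside_the_loop": 0}
    for sim in range(1, n_simulations + 1):
        run_simulation(n_steps)
        totals["around_the_loop"] += 1  # ML training or inference per simulation
        if sim % transfer_every == 0:
            totals["outside_the_loop"] += 1  # transfer learning with experimental data
    return totals


single = run_simulation(3000)
campaign = run_campaign(4, 10, transfer_every=2)
```

The call counts make the hardware implication visible: in-the-loop inference runs as often as the physics itself, so its latency matters far more than that of the outer cadences.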
29.
§ High-precision scientific simulation
§ Frequent ML training
§ Potentially very high-frequency inference
We are developing a proxy application to understand
memory and bandwidth issues with accelerators for ALE
Active learning or intelligent sampling
Smart ALE, RANS
ML inference every time step: in the loop
ML training or inference every 1k time steps: on the loop
ML training or inference every simulation: around the loop
Physics simulation
Kris Zieb and Ian Karlin
30.
§ A wide array of low-risk applications can be used to explore ML
— ALE, Material Interface Reconstruction, etc. already employ heuristics
— Neural Networks as surrogate models can be tested using full simulations
§ Many research projects at LLNL include:
— Physics informed ML
— Interpretability / Model interrogation
— Sparse data and transfer learning
§ As research matures, our ML applications will become higher risk
§ All applications need reproducibility and some amount of uncertainty analysis
— Verification and validation of training data
— Model (e.g., neural network) correctness for application
— Recognition of predictions outside of model scope
A note on Verification and Validation (V&V)
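One simple way to recognize predictions outside of model scope is to check whether an input falls within the range covered by the training data. The talk does not describe a specific method, so the per-feature range check below is only an illustrative assumption. It is a necessary condition for being in scope rather than a sufficient one.

```python
def fit_bounds(training_inputs):
    """Record the per-feature (min, max) range covered by the training set."""
    return [(min(col), max(col)) for col in zip(*training_inputs)]


def in_scope(bounds, x):
    """True if every feature of x lies inside the range seen in training."""
    return all(lo <= xi <= hi for (lo, hi), xi in zip(bounds, x))


# Made-up training inputs with two features each.
training_inputs = [(0.1, 300.0), (0.5, 450.0), (0.9, 600.0)]
bounds = fit_bounds(training_inputs)
inside = in_scope(bounds, (0.4, 500.0))
outside = in_scope(bounds, (0.4, 900.0))
```

A simulation could fall back to the existing heuristic or the full physics whenever `in_scope` returns False.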
32.
Many Thanks!
Morry Aufderheide
Rob Blake
Kevin Chen
Sean Copeland
Charles Doutriaux
Dan Fenn
Brian Gallagher
Becky Haluska
Keith Henderson
Ming Jiang
Josh Kallman
Ian Karlin
Alister Maguire
Walt Nissen
Brian Spears
Tom Stitt
Hardeep Sullan
Brian Van Essen
Ping Wang
Kenny Weiss
Kris Zieb
33. Disclaimer
This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United
States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or
implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus,
product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific
commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or
imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC.
The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or
Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.