This document discusses how high-throughput experimentation (HTE) and machine learning (ML) can accelerate materials discovery for functional metallic glasses (MGs). It describes a round robin experiment between NIST and NREL that synthesizes and characterizes composition spread samples to test data sharing standards. In that round robin, general trends correlate well for samples made within one institution and across institutions, but systematic differences can occur between measurement types. While ML is not a replacement for physics, the combination of HTE and ML can identify promising new materials faster than traditional experimentation alone. Autonomous research platforms may enable an even greater acceleration of the materials discovery process.
How HTE and ML Enable Fast Materials Discovery for Functional Metallic Glasses
1. Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional Metallic Glasses
Jason R. Hattrick-Simpers
NIST Gaithersburg
Jason.Hattrick-Simpers@nist.gov
@jae3goals
2. Acknowledgements
USC
Travis Williams
SLAC
Dr. Apurva Mehta
Dr. Fang Ren
Dr. Suchismita
Brenna Gibbons
Northwestern
Prof. Wolverton
Dr. Logan Ward
UNSW
Prof. Kevin Laws
NIST
Dr. Martin Green
Dr. Zachary Trautt
Dr. Gilad Kusne
Dr. Brian DeCost
Dr. Kil-won Moon
Mr. Ryan Smith
NREL
Dr. Andriy Zakutayev
CSM
Prof. Packard
Dr. Schoeppner
3. Beware of the AI/ML Hyperbole!
• Did Google guess my last name with just my picture????
• Clearly no more than it could have guessed my trade
• ML models are often interpolative and correlative, but our INTERPRETATIONS can build in causation that doesn’t exist.
4. Once You Prove It Works, Now What?
• Drowning in possibilities
• How do I select the most interesting systems to study?
• MGs for coating vs. BMGs
• Can I make a model with different target end-applications?
• Heuristics will be challenged
• Can I learn something new from a physics-agnostic model?
• Can I use similar methods to design an alloy for performance?
• What happens when I have to start over?
• How do I FAIR-ly share my data?
10. Case Example Ni-Ti-Al: Breaking from Convention AND Property Prediction
No “deep” eutectics necessary!
11. But How to Create Property Models?
• There is no L-B-type data set for properties of MG
• NLP/data extraction from figures is in its infancy
• Manually scrape the literature
• 2000+ entries
• Errant measurements
• Many different groups
• Inconsistent definition of “amorphous”
• Train on <500 entries
Algorithm | R2 | MAE
Random Forest | 0.873 | 10.315
Linear Support Vector Machine | 0.614 | 25.024
Gaussian Processes | -3.951 | 99.455
LASSO | 0.805 | 15.818
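The negative R2 in the Gaussian Processes row is not a typo: R2 drops below zero whenever a model does worse than simply predicting the mean of the measured values. A minimal sketch of the two metrics in the table, written in plain Python, makes this concrete. The property values and the three "models" below are hypothetical placeholders, not data from the talk.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measurement and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical property values in arbitrary units (NOT from the talk).
measured = [100.0, 120.0, 140.0, 160.0]

# Three hypothetical sets of predictions.
close_model = [105.0, 118.0, 138.0, 163.0]    # small scatter around the truth
mean_model = [130.0] * 4                      # always predicts the mean
biased_model = [180.0, 200.0, 220.0, 240.0]   # right trend, large constant offset

for name, predicted in [("close", close_model),
                        ("mean", mean_model),
                        ("biased", biased_model)]:
    print(name,
          round(r2_score(measured, predicted), 3),
          round(mean_absolute_error(measured, predicted), 3))
```

The mean-only predictor scores exactly R2 = 0, and the biased predictor scores well below zero even though it follows the trend perfectly, which is one way a model can end up with a row like the Gaussian Processes one.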
12. How Much Science is Encoded in our Descriptors?
Bad Elements | Holdout size | R2 | Mean % error
Boron | 132 | -1.7 | 30%
Zinc | 12 | -0.8 | 58%
Ruthenium | 2 | -44 | 45%
Good Elements | Holdout size | R2 | Mean % error
Vanadium | 5 | 0.97 | 6.2%
Erbium | 23 | 0.95 | 4.1%
Sulfur | 5 | 0.95 | 7.8%
• Can the chemical descriptors used predict properties for materials outside of the training set?
• Remove only ternaries
• Remove all entries of an element
Ti Alloys
Training: Zr45Cu55, Zr60Al15Cu25
Validation: Ti40Cu60, Hf25Ti15Cu60
Ti-Cu-Hf
Included: Hf25Ti15Cu60, Hf27.5Ti15Cu57.5
Not Included: Ti40Cu60, Hf13.9Ti41.3Si1.1Cu43.7
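The two holdout strategies on this slide (remove all entries of an element, or remove only one ternary system) can be sketched as simple filters over composition strings. This is an illustrative sketch, not the code used for the results above: the function names are made up here, and reading "Included" as "included in the holdout" is an assumption.

```python
import re

# Element symbol followed by an amount, e.g. "Hf27.5" -> ("Hf", "27.5").
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)")


def parse_composition(formula):
    """Turn 'Hf25Ti15Cu60' into {'Hf': 25.0, 'Ti': 15.0, 'Cu': 60.0}."""
    return {element: float(amount) for element, amount in _TOKEN.findall(formula)}


def split_by_element(formulas, element):
    """Hold out every alloy that contains `element`."""
    train = [f for f in formulas if element not in parse_composition(f)]
    held_out = [f for f in formulas if element in parse_composition(f)]
    return train, held_out


def split_by_system(formulas, system):
    """Hold out only alloys whose set of elements is exactly `system`."""
    target = set(system)
    train = [f for f in formulas if set(parse_composition(f)) != target]
    held_out = [f for f in formulas if set(parse_composition(f)) == target]
    return train, held_out


# Compositions taken from the slide.
ti_example = ["Zr45Cu55", "Zr60Al15Cu25", "Ti40Cu60", "Hf25Ti15Cu60"]
system_example = ["Hf25Ti15Cu60", "Hf27.5Ti15Cu57.5",
                  "Ti40Cu60", "Hf13.9Ti41.3Si1.1Cu43.7"]

ti_train, ti_validation = split_by_element(ti_example, "Ti")
print(ti_train)        # ['Zr45Cu55', 'Zr60Al15Cu25']
print(ti_validation)   # ['Ti40Cu60', 'Hf25Ti15Cu60']

sys_train, sys_held_out = split_by_system(system_example, {"Ti", "Cu", "Hf"})
print(sys_held_out)    # ['Hf25Ti15Cu60', 'Hf27.5Ti15Cu57.5']
print(sys_train)       # ['Ti40Cu60', 'Hf13.9Ti41.3Si1.1Cu43.7']
```

The element split is the harsher test: the model never sees Ti in any alloy, so any skill on the validation set has to come from the descriptors rather than from neighboring compositions.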
15. High-Throughput vs Rapid Knowledge Generation
• In ~7 weeks of beamtime we have characterized >30,000 unique alloys
• Knowledge acquisition time is <1 minute/composition
• In the case of nanoindentation, measurement + analysis time is >1 hour/composition
• This is high quality data
• Can we fuse high-quality, high-density experimental data with lower-quality ML predictions to converge on the answer faster?
• Brian DeCost says yes(?)
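A back-of-the-envelope calculation shows why the slower property measurements, not the beamline, set the pace. The sketch below uses only the numbers on this slide and assumes uninterrupted, around-the-clock operation, which is an idealization; the per-composition times are the slide's bounds, so the beamline total is an upper bound and the nanoindentation total a lower bound.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

n_alloys = 30_000                    # ">30,000 unique alloys"
beamline_minutes_per_alloy = 1       # "<1 minute/composition" (upper bound)
indentation_hours_per_alloy = 1      # ">1 hour/composition" (lower bound)

# Total instrument time if every alloy were measured back to back.
beamline_days = n_alloys * beamline_minutes_per_alloy / 60 / HOURS_PER_DAY
indentation_days = n_alloys * indentation_hours_per_alloy / HOURS_PER_DAY
indentation_years = indentation_days / DAYS_PER_YEAR

print(f"beamline:        at most {beamline_days:.1f} days")
print(f"nanoindentation: at least {indentation_days:.0f} days "
      f"(about {indentation_years:.1f} years)")
print(f"slowdown factor: at least {indentation_days / beamline_days:.0f}x")
```

Under these assumptions, indenting every composition that the beamline has already covered would take years of continuous measurement, which is the motivation for asking whether ML predictions can decide which compositions deserve the slow measurement.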
18. HTE-MC 1st Steps: NIST – NREL Round Robin
Sample synthesis and measurements:
• Synthesize: Zn-Sn-Ti-O composition spread sample libraries using combinatorial PLD (@NIST) or sputtering (@NREL)
• Measure: Chemical composition, Crystal structure, Electrical conductivity, Optical transmittance, Band gap
• Exchange: Sample libraries and associated data, repeat measurements
Zn-Sn-Ti-O:
• Chemical composition
• Crystal structure
• Electrical conductivity
• Optical transmittance
• Work function
Goal: test and improve the standards for exchange of data and samples among participating labs
Figure: NREL samples and NIST sample
19. Playing FAIR with Data
To be Findable:
• (meta)data are assigned a globally unique and persistent identifier
• data are described with rich metadata
• metadata clearly and explicitly include the identifier of the data it describes
• (meta)data are registered or indexed in a searchable resource
To be Accessible:
• (meta)data are retrievable by their identifier using a standardized communications protocol
– the protocol is open, free, and universally implementable
– the protocol allows for an authentication and authorization procedure, where necessary
• metadata are accessible, even when the data are no longer available
To be Interoperable:
• (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
• (meta)data use vocabularies that follow FAIR principles
• (meta)data include qualified references to other (meta)data
To be Reusable:
• (meta)data are richly described with a plurality of accurate and relevant attributes
– (meta)data are released with a clear and accessible data usage license
– (meta)data are associated with detailed provenance
– (meta)data meet domain-relevant community standards
Wilkinson, Mark D., et al. "The FAIR Guiding Principles for scientific data management and stewardship." Scientific Data 3 (2016). DOI: 10.1038/sdata.2016.18
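As a rough illustration of what the checklist above asks of a single sample library, the sketch below defines a metadata record and reports which FAIR items it still misses. Everything here is hypothetical: the class, the field names, and the checks are invented for illustration and do not describe the schema used in the NIST and NREL exchange. Note also that a random UUID is globally unique but only becomes a persistent identifier once it is registered with a resolver such as a DOI or handle service.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict


@dataclass
class SampleRecord:
    """Hypothetical metadata record for one composition-spread library."""
    title: str
    institution: str
    synthesis_method: str
    measurements: list
    license: str = ""
    provenance: list = field(default_factory=list)    # processing history
    related_ids: list = field(default_factory=list)   # links to other records
    # Unique, but not yet persistent: it still needs to be registered.
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))


def fair_gaps(record):
    """Return the checklist items this record does not yet satisfy."""
    gaps = []
    if not record.identifier:
        gaps.append("Findable: no identifier")
    if not record.measurements:
        gaps.append("Findable: no descriptive metadata")
    if not record.related_ids:
        gaps.append("Interoperable: no references to other (meta)data")
    if not record.license:
        gaps.append("Reusable: no data usage license")
    if not record.provenance:
        gaps.append("Reusable: no provenance")
    return gaps


record = SampleRecord(
    title="Zn-Sn-Ti-O composition spread",
    institution="NIST",
    synthesis_method="combinatorial PLD",
    measurements=["chemical composition", "crystal structure"],
)
initial_gaps = fair_gaps(record)
print(initial_gaps)

# Fill in the missing pieces (placeholder values) and check again.
record.license = "placeholder-license"
record.provenance = ["deposited", "composition mapped"]
record.related_ids = ["placeholder-id-of-partner-lab-record"]
final_gaps = fair_gaps(record)
print(final_gaps)

# A plain-text serialization that another lab can parse without special tools.
print(json.dumps(asdict(record), indent=2))
```

A check like this only covers the parts of FAIR that live inside the record; registration in a searchable resource and retrieval over an open protocol depend on the repository that hosts it.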
20. General Observations – Trends Great
General Trends Well Correlated Within Samples Made at an Institution
General Trends Correlate For Samples Made at Different Institutions
Systematic Differences Can Occur for Different Measurement Types
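One simple way to separate "trends correlate" from "systematic differences occur" is to compute both a correlation coefficient and a mean offset for the same quantity measured at the same library positions by two labs. The sketch below is an assumption about how such a comparison could be made, not the analysis behind these slides, and the numbers are made-up placeholders rather than round robin data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: do the two series rise and fall together?"""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_offset(x, y):
    """Average of (y - x): a nonzero value indicates a systematic shift."""
    return sum(b - a for a, b in zip(x, y)) / len(x)


# Placeholder values for one property at five library positions (NOT real data).
lab_a = [1.0, 2.0, 3.0, 4.0, 5.0]
lab_b = [1.6, 2.4, 3.5, 4.6, 5.4]

r = pearson_r(lab_a, lab_b)
offset = mean_offset(lab_a, lab_b)
print(f"correlation = {r:.3f}, mean offset = {offset:.2f}")
```

In this toy case the correlation is close to 1 while the offset is clearly nonzero, so the two labs agree on the trend but not on the absolute values. Reporting both numbers keeps a high correlation from hiding a calibration difference.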
21. Conclusions
• ML is dumb and doesn’t understand physics
• This is great news!
• We can elect to pursue moonshot materials (especially with HTE)
• It is possible to build models that are processing-sensitive
• We can design the alloy with an eye towards transfer
• When building new datasets, thorough testing is required before the models are to be believed
• Know (and REPORT in your papers) where the model breaks
• We are already HTE-ML-ing MG phase stability faster than the rate of wafer generation
• Physical property measurements are slower
• Can autonomous platforms realize the promised 10x – 100x acceleration?