Data Mining to Discovery for Inorganic Solids:
Software Tools and Applications
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
Artificial Intelligence for Materials Science
August 7, 2018
Slides (already) posted to hackingmaterials.lbl.gov
•  Three projects available now
–  Interpretable descriptors of crystal structure
–  matminer
–  atomate / Rocketsled
•  One project in progress
–  A text mining materials database
2
Overview of talk
3
I. Interpretable descriptors of
crystal structure
Machine learning: the big problem in my view is connecting
data to ML algorithms through features
4
Lots of data on
complex objects that
you want to interrelate
Clustering, Regression, Feature extraction, Model-building, etc.
Well developed
data-mining routines that work
only on numbers (ideally ones
with high relevance to your
problem)
Need to transform materials science objects into a set of
physically relevant numerical data (“features” or “descriptors”)
5
The crystal structure is a core entity that
machine learning algorithms should know about
Step 1: Describe each site as a
fingerprint telling you how close it is
to each of 22 known local
environments (e.g., tetrahedral,
octahedral, etc.)
Step 2: Describe each structure
as the average of its site
fingerprints*
tetrahedron
octahedron
distorted 8-coordinated cube
*(plus additional statistics like standard deviation, min, max,
etc. if desired – or split into separate cation/anion vectors)
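The two steps above can be sketched in a few lines. This is a minimal illustration, not the pymatgen or matminer implementation: the function name `structure_fingerprint`, the three-environment toy vectors, and the numerical values are all assumptions made for the example.

```python
import numpy as np

def structure_fingerprint(site_fingerprints, stats=("mean",)):
    """Combine per-site fingerprints (n_sites x n_environments) into one vector.

    Hypothetical helper: each row says how close one site is to each
    reference local environment; the structure vector is built from
    statistics taken over all sites.
    """
    sites = np.asarray(site_fingerprints, dtype=float)
    reducers = {"mean": np.mean, "std": np.std, "min": np.min, "max": np.max}
    # Apply each requested statistic over the site axis and concatenate.
    return np.concatenate([reducers[name](sites, axis=0) for name in stats])

# Toy data (made up): 3 sites, 3 environments
# (tetrahedron, octahedron, distorted 8-coordinated cube).
sites = [
    [0.9, 0.1, 0.0],
    [0.7, 0.3, 0.0],
    [0.8, 0.2, 0.0],
]
mean_fp = structure_fingerprint(sites)
full_fp = structure_fingerprint(sites, stats=("mean", "std", "min", "max"))
```

Here `mean_fp` is the plain average of the site fingerprints, while `full_fp` appends the additional statistics mentioned in the footnote.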
Defining local order parameters for various environments
6
Use a given local order parameter with a threshold for motif recognition:
If qtet > qthresh,
    then motif is tetrahedron.
Else
    not (too much) a tetrahedron.
Tetrahedral order parameter, qtet, [1]:
[1] Zimmermann et al., J. Am. Chem. Soc., 2017, 10.1021/jacs.5b08098
We have now developed mathematical order parameters for
22 different local environments
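The threshold rule for motif recognition can be written as a short sketch. The function name, the dictionary layout, and the numerical values below are hypothetical; the slides do not give the actual thresholds.

```python
def recognize_motifs(order_parameters, thresholds):
    """Return the motifs whose order parameter exceeds its threshold.

    Both arguments map a motif name to a number; the names and values
    used in the example are illustrative assumptions.
    """
    return [
        name
        for name, q in order_parameters.items()
        if q > thresholds[name]
    ]

# Made-up order parameters for one site and made-up thresholds.
q_site = {"tetrahedron": 0.92, "octahedron": 0.15}
q_thresh = {"tetrahedron": 0.5, "octahedron": 0.5}
motifs = recognize_motifs(q_site, q_thresh)
```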
7
How well do these work?
8
1. Order parameters clearly
distinguish different environments
even after thermal distortion
2. Work well in applications (defect site
finding, diffusion characterization)
[1] Zimmermann et al., Frontiers in Materials, 2017, doi: 10.3389/fmats.2017.00034
9
Structure fingerprints: can they distinguish crystal
structures?
BaAl2O4, BaZnF4, CaFe2O4, CrVO4, K2NiF4
CaB2O4-I, MgUO4, Pb3O4, SbNbO4, Sr2PbO4
Tetragonal BaTiO3, Th3P4, TlAlF4, ZnSO4, α-MnMoO4
BCC, Aragonite, Barite, β-K2SO4, Calcite
Half-Heusler, FCC, Garnet, HCP, Rocksalt, Diamond
High-cristobalite, Ilmenite, Low-cristobalite, Low-quartz
Monazite, Olivine, Perovskites, Rutile, Phenacite
Scheelite, Spinel, Thenardite, Wolframite, Zircon
•  40 diverse crystal structure prototypes
•  Many complex examples (e.g., multi-cation, multi-anion) from each class
•  Thousands of crystal structures in the test set
•  Create structure fingerprints based on averages of local environments
•  The Euclidean distance between structure fingerprints
is small for structures of the same prototype
and larger for structures of different prototypes
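As a small illustration of the distance comparison, the sketch below computes the Euclidean distance between toy fingerprint vectors. The vectors and the function name `fingerprint_distance` are made up for the example; real fingerprints have one entry per local environment (and per statistic).

```python
import numpy as np

def fingerprint_distance(fp_a, fp_b):
    """Euclidean distance between two structure fingerprint vectors."""
    diff = np.asarray(fp_a, dtype=float) - np.asarray(fp_b, dtype=float)
    return float(np.linalg.norm(diff))

# Made-up fingerprints: two structures dominated by the same environment,
# and one dominated by a different environment.
cube_like_1 = [0.0, 0.1, 0.9]
cube_like_2 = [0.0, 0.2, 0.8]
tet_like = [0.9, 0.1, 0.0]

d_same = fingerprint_distance(cube_like_1, cube_like_2)
d_diff = fingerprint_distance(cube_like_1, tet_like)
```

With these toy numbers `d_same` is much smaller than `d_diff`, which is the behavior the test set is meant to check at scale.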
10
Local environments fingerprints do distinguish prototypes!
(Figure: distributions of the distance between structure fingerprint vectors for same-prototype and different-prototype pairs; overlapping coefficient OVC = 1.7%)
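One common way to estimate an overlapping coefficient between two distance distributions is to histogram both samples on shared bins and sum the bin-wise minimum of the normalized counts. The slides do not say how the 1.7% value was computed, so the estimator, the function name, and the bin count below are assumptions.

```python
import numpy as np

def overlapping_coefficient(sample_a, sample_b, bins=50):
    """Histogram-based estimate of the overlap between two distributions.

    Returns a value between 0 (no overlap) and 1 (identical histograms).
    Assumes the pooled samples span a non-zero range.
    """
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    # Shared bin edges spanning both samples.
    edges = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), bins + 1)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    # Normalize counts to probabilities, then sum the bin-wise minimum.
    p = p / p.sum()
    q = q / q.sum()
    return float(np.minimum(p, q).sum())

# Made-up distances: same-prototype pairs near 0, different-prototype pairs far away.
same = [0.0, 0.1, 0.2]
different = [5.0, 5.1, 5.2]
ovc_disjoint = overlapping_coefficient(same, different, bins=10)
ovc_identical = overlapping_coefficient(same, same, bins=10)
```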
11
Can cluster
crystal structures
by “local
environment
similarity”
Results on MP web site, e.g. for BCC-like structures
12
https://www.materialsproject.org/materials/mp-91/
Target: W
similar structures
(distance near 0)
Cs3Sb
TiGaFeCo
CeMg2Cu
•  Incorporate into machine learning models
•  Compare performance against other site /
structure descriptors
•  Beyond local environments
13
Structure descriptors – next steps
Implemented in:
•  pymatgen - www.pymatgen.org
•  matminer – https://hackingmaterials.github.io/matminer
More info: talk to Nils
Zimmermann at the poster
session!
14
II. matminer
15
Currently, it can be hard to get started with ML in materials
How can we make
this transformation?
Test different ideas?
Where do we get
the data?
Goal of matminer: connect materials data with data mining
algorithms and data visualization libraries
16
Ward, L. et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60–69 (2018).
>40 featurizer classes can
generate thousands of
potential descriptors
17
Matminer contains a library of descriptors for various
materials science entities
feat = EwaldEnergy([options])
y = feat.featurize([input_data])
•  compatible with
scikit-learn
pipelining
•  automatically deploy
multiprocessing to
parallelize over data
•  include citations to
methodology papers
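The featurizer pattern described here (one object that turns an input into numbers, labels those numbers, cites its method, and can be mapped over many inputs in parallel) can be sketched without matminer itself. `ToyFeaturizer` and its two features are hypothetical, and a thread pool stands in for the multiprocessing that matminer deploys; this is an illustration of the interface idea, not matminer code.

```python
from concurrent.futures import ThreadPoolExecutor

class ToyFeaturizer:
    """Hypothetical featurizer following a featurize/labels/citations pattern."""

    def featurize(self, element_counts):
        # Input: dict mapping element symbol -> number of atoms in the formula.
        total = sum(element_counts.values())
        return [len(element_counts), max(element_counts.values()) / total]

    def feature_labels(self):
        # One label per value returned by featurize().
        return ["n_elements", "max_atomic_fraction"]

    def citations(self):
        # Placeholder: a real featurizer would cite its methodology paper.
        return ["(placeholder citation)"]

    def featurize_many(self, entries, n_jobs=2):
        # Map featurize() over many inputs; threads keep the sketch simple.
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            return list(pool.map(self.featurize, entries))

feat = ToyFeaturizer()
single = feat.featurize({"Fe": 2, "O": 3})
batch = feat.featurize_many([{"Fe": 2, "O": 3}, {"Si": 1}])
```

Keeping the labels and citations on the same object as the computation is what lets downstream tools build a labeled feature table and a reference list automatically.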
18
Interactive Jupyter notebooks demonstrate use cases
https://github.com/hackingmaterials/matminer_examples
Many examples available:
•  Retrieving data from various databases
•  Predicting bulk / shear modulus with ML
•  Predicting formation energies:
–  from composition alone
–  with Voronoi-based structure features included
–  with Coulomb matrix and Orbital Field matrix descriptors (reproducing previous studies in the literature)
•  Making interactive visualizations
•  Creating an ML pipeline
•  Further increase coverage and scope of feature
extraction methods available in the literature
•  Increase the number of “standard” data sets that
can be used to benchmark different ML
approaches
•  Apply to materials problems (in progress)
19
matminer – next steps
Implemented in:
•  matminer – https://hackingmaterials.github.io/matminer
20
III. atomate / Rocketsled
Generalizable
forward solver
Supercomputing
Power
Statistical
optimization
FireWorks NERSC Various optimization libraries
(Figure: J. Mueller)
With high-throughput DFT, we can generate data rapidly –
what to do next?
21
M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053.
M. de Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
>4500 elastic tensors
>900 piezoelectric tensors
>48000 Seebeck coefficients + cRTA transport
Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission)
Atomate is our software to easily run millions of such
calculations at supercomputing centers
22
Results!
researcher
Start with all binary oxides, replace O->S, run several different properties
Workflows to run
✓  band structure
✓  surface energies
✓  elastic tensor
□  Raman spectrum
□  QH thermal expansion
□  spin-orbit coupling
Can we build a general computational optimizer?
23
Generalizable
forward solver
Supercomputing
Power
Statistical
optimization
FireWorks
/ atomate
NERSC Various optimization libraries
(Figure: J. Mueller)
Rocketsled: Automatic materials screening that selects
materials to compute AND submits them to supercomputer
24
Screening space of ~20,000 potential ABX3 perovskite
combinations as water-splitting materials, precomputed in DFT
by a different group.
If a machine learning algorithm were in charge of picking
the next compound based on past data, how efficient
would it be?
•  Built off the scikit-optimization package, with 10
different regressors (ML algorithms) available
•  Bootstrapped uncertainty estimates for balancing
exploration and exploitation
•  Next step: deployment for thermoelectrics search
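The idea of bootstrapped uncertainty estimates for balancing exploration and exploitation can be sketched as follows. This is not Rocketsled's code: the linear least-squares model, the "mean plus kappa times standard deviation" selection rule, and all function names and data are assumptions chosen to keep the example short.

```python
import numpy as np

def bootstrap_predictions(X_train, y_train, X_cand, n_boot=200, seed=0):
    """Fit a linear model on bootstrap resamples; return prediction mean and std."""
    rng = np.random.default_rng(seed)
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_cand = np.asarray(X_cand, dtype=float)
    # Append a constant column so the model has an intercept.
    A = np.column_stack([X_train, np.ones(len(X_train))])
    C = np.column_stack([X_cand, np.ones(len(X_cand))])
    preds = np.empty((n_boot, len(C)))
    for i in range(n_boot):
        # Resample the training set with replacement and refit.
        idx = rng.integers(0, len(A), size=len(A))
        coef, *_ = np.linalg.lstsq(A[idx], y_train[idx], rcond=None)
        preds[i] = C @ coef
    # The spread across resampled models serves as the uncertainty estimate.
    return preds.mean(axis=0), preds.std(axis=0)

def pick_next(mean, std, kappa=1.0):
    """Index of the candidate maximizing predicted value plus an uncertainty bonus."""
    return int(np.argmax(mean + kappa * std))

# Made-up past results (y = 2x) and two untested candidates.
X_seen = [[0.0], [1.0], [2.0], [3.0]]
y_seen = [0.0, 2.0, 4.0, 6.0]
X_new = [[4.0], [10.0]]
mean, std = bootstrap_predictions(X_seen, y_seen, X_new)
next_index = pick_next(mean, std)
```

A larger `kappa` favors candidates the model is unsure about (exploration); `kappa=0` simply picks the best predicted candidate (exploitation).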
25
Further details and next steps
Implemented in:
•  rocketsled – https://github.com/hackingmaterials/rocketsled
26
IV. A text mining materials database
Some questions that current search tools don’t answer:
these questions require materials-specific search tools
“I’d like a list of all the chemical compositions that have been studied as
thermoelectrics, ideally weighted by research interest in them. Ok, now
filter to thermoelectric materials known to have layered structures. Now
show me some materials that aren’t in that list but are similar in terms
of structure and electronic properties in the Materials Project database.”
“What are all the known applications and unique properties of
NaCoO2? What techniques (computational, experimental) have
been used to study this compound in the past?”
“I just predicted a new composition as a battery cathode. A lit search
shows no hits at all for that composition. Has anyone ever made
anything similar to that composition? I’d like to know for synthesis
ideas and also want to check against similarity to known battery
materials.”
28
An engine to label the content of scientific abstracts
(Figure: pipeline diagram. Labels: Matstract corpus; unlabeled data; data labels; text processing; text cleaning; tokenization; feature engineering; POS tag labels; word embeddings (word2vec); hand-crafted features; supervised learning; neural network (LSTM); logistic regression; train/test sets; named entities)
“Learning” what a scientific study is about from >2 million
materials science abstracts
29
Learn relationships over many abstracts
30
Application: a revised materials search engine
Auto-generated summaries of materials based on text mining
31
Application: materials compositions of interest …
A search for thermoelectrics that do not have Pb or Bi
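A composition-based exclusion filter of this kind can be sketched with a simple formula parser. The formula list below is an illustrative set of compositions, not output from the text-mining database, and the helper names are hypothetical; the regular expression only handles plain formulas without parentheses.

```python
import re

def elements_in(formula):
    """Return the set of element symbols in a simple chemical formula."""
    # An element symbol is one uppercase letter plus an optional lowercase letter.
    return set(re.findall(r"[A-Z][a-z]?", formula))

def exclude_elements(formulas, banned):
    """Keep only formulas that contain none of the banned elements."""
    banned = set(banned)
    return [f for f in formulas if not (elements_in(f) & banned)]

# Illustrative compositions only.
candidates = ["PbTe", "Bi2Te3", "SnSe", "Mg2Si"]
filtered = exclude_elements(candidates, banned=["Pb", "Bi"])
```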
•  Further testing
•  Similarity metrics, e.g. if a target compound
doesn’t exist, retrieve information for “similar”
compounds instead
•  Integration with Materials Project
32
Materials abstracts – next steps
Interested in being a beta tester?
Contact me
•  Our group has been working on methods and
software for various applications
–  Interpretable descriptors of crystal structure
–  matminer
–  atomate / Rocketsled
–  A text mining materials database
•  We encourage you to try the software and let us
know what you think!
–  Help lists are available for all software
33
Conclusions
•  Structure descriptors
–  N. Zimmermann (project lead)
•  Atomate / Rocketsled
–  K. Mathew (project lead, atomate)
–  A. Dunn (project lead, rocketsled)
•  Matminer
–  L. Ward (project lead, U. Chicago)
•  Text mining
–  V. Tshitoyan, J. Dagdelen, L. Weston
•  All who provided feedback & contributed code to open-source software efforts!
•  Funding:
–  DOE-BES
–  Toyota Research Institute
34
Thank you!
Slides (already) posted to hackingmaterials.lbl.gov

More Related Content

What's hot

Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...KAMAL CHOUDHARY
 
A Machine Learning Framework for Materials Knowledge Systems
A Machine Learning Framework for Materials Knowledge SystemsA Machine Learning Framework for Materials Knowledge Systems
A Machine Learning Framework for Materials Knowledge Systemsaimsnist
 
Graphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials ScienceGraphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials Scienceaimsnist
 
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...aimsnist
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningKAMAL CHOUDHARY
 
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...aimsnist
 
Smart Metrics for High Performance Material Design
Smart Metrics for High Performance Material DesignSmart Metrics for High Performance Material Design
Smart Metrics for High Performance Material Designaimsnist
 
Database of Topological Materials and Spin-orbit Spillage
Database of Topological Materials and Spin-orbit SpillageDatabase of Topological Materials and Spin-orbit Spillage
Database of Topological Materials and Spin-orbit SpillageKAMAL CHOUDHARY
 
High-throughput discovery of low-dimensional and topologically non-trivial ma...
High-throughput discovery of low-dimensional and topologically non-trivial ma...High-throughput discovery of low-dimensional and topologically non-trivial ma...
High-throughput discovery of low-dimensional and topologically non-trivial ma...KAMAL CHOUDHARY
 
Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Anubhav Jain
 
Computational Database for 3D and 2D materials to accelerate discovery
Computational Database for 3D and 2D materials to accelerate discoveryComputational Database for 3D and 2D materials to accelerate discovery
Computational Database for 3D and 2D materials to accelerate discoveryKAMAL CHOUDHARY
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliersaimsnist
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Anubhav Jain
 
Morgan uw maGIV v1.3 dist
Morgan uw maGIV v1.3 distMorgan uw maGIV v1.3 dist
Morgan uw maGIV v1.3 distddm314
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAnubhav Jain
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Anubhav Jain
 
Materials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum ComputationMaterials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum ComputationKAMAL CHOUDHARY
 
How to Leverage Artificial Intelligence to Accelerate Data Collection and Ana...
How to Leverage Artificial Intelligence to Accelerate Data Collection and Ana...How to Leverage Artificial Intelligence to Accelerate Data Collection and Ana...
How to Leverage Artificial Intelligence to Accelerate Data Collection and Ana...aimsnist
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsAnubhav Jain
 
A Framework and Infrastructure for Uncertainty Quantification and Management ...
A Framework and Infrastructure for Uncertainty Quantification and Management ...A Framework and Infrastructure for Uncertainty Quantification and Management ...
A Framework and Infrastructure for Uncertainty Quantification and Management ...aimsnist
 

What's hot (20)

Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
 
A Machine Learning Framework for Materials Knowledge Systems
A Machine Learning Framework for Materials Knowledge SystemsA Machine Learning Framework for Materials Knowledge Systems
A Machine Learning Framework for Materials Knowledge Systems
 
Graphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials ScienceGraphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials Science
 
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learning
 
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...
 
Smart Metrics for High Performance Material Design
Smart Metrics for High Performance Material DesignSmart Metrics for High Performance Material Design
Smart Metrics for High Performance Material Design
 
Database of Topological Materials and Spin-orbit Spillage
Database of Topological Materials and Spin-orbit SpillageDatabase of Topological Materials and Spin-orbit Spillage
Database of Topological Materials and Spin-orbit Spillage
 
High-throughput discovery of low-dimensional and topologically non-trivial ma...
High-throughput discovery of low-dimensional and topologically non-trivial ma...High-throughput discovery of low-dimensional and topologically non-trivial ma...
High-throughput discovery of low-dimensional and topologically non-trivial ma...
 
Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...
 
Computational Database for 3D and 2D materials to accelerate discovery
Computational Database for 3D and 2D materials to accelerate discoveryComputational Database for 3D and 2D materials to accelerate discovery
Computational Database for 3D and 2D materials to accelerate discovery
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliers
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
Morgan uw maGIV v1.3 dist
Morgan uw maGIV v1.3 distMorgan uw maGIV v1.3 dist
Morgan uw maGIV v1.3 dist
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design Problems
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...
 
Materials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum ComputationMaterials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum Computation
 
How to Leverage Artificial Intelligence to Accelerate Data Collection and Ana...
How to Leverage Artificial Intelligence to Accelerate Data Collection and Ana...How to Leverage Artificial Intelligence to Accelerate Data Collection and Ana...
How to Leverage Artificial Intelligence to Accelerate Data Collection and Ana...
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
A Framework and Infrastructure for Uncertainty Quantification and Management ...
A Framework and Infrastructure for Uncertainty Quantification and Management ...A Framework and Infrastructure for Uncertainty Quantification and Management ...
A Framework and Infrastructure for Uncertainty Quantification and Management ...
 

Similar to Data Mining to Discovery for Inorganic Solids: Software Tools and Applications

Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Anubhav Jain
 
Using MongoDB for Materials Discovery
Using MongoDB for Materials DiscoveryUsing MongoDB for Materials Discovery
Using MongoDB for Materials DiscoveryDan Gunter
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningAnubhav Jain
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Anubhav Jain
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 
Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...Anubhav Jain
 
NANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials designNANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials designUniversity of California, San Diego
 
Smart Metrics for High Performance Material Design
Smart Metrics for High Performance Material DesignSmart Metrics for High Performance Material Design
Smart Metrics for High Performance Material DesignKAMAL CHOUDHARY
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Anubhav Jain
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...Anubhav Jain
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructureAnubhav Jain
 
Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Anubhav Jain
 
Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...Anubhav Jain
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignAnubhav Jain
 
Morgan osg user school 2016 07-29 dist
Morgan osg user school 2016 07-29 distMorgan osg user school 2016 07-29 dist
Morgan osg user school 2016 07-29 distddm314
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsAnubhav Jain
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...BrianDeCost
 
Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...National Institute of Informatics
 

Similar to Data Mining to Discovery for Inorganic Solids: Software Tools and Applications (20)

Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
 
Using MongoDB for Materials Discovery
Using MongoDB for Materials DiscoveryUsing MongoDB for Materials Discovery
Using MongoDB for Materials Discovery
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learning
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...
 
NANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials designNANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials design
 
Smart Metrics for High Performance Material Design
Smart Metrics for High Performance Material DesignSmart Metrics for High Performance Material Design
Smart Metrics for High Performance Material Design
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications

  • 1. Data Mining to Discovery for Inorganic Solids: Software Tools and Applications. Anubhav Jain, Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA. Artificial Intelligence for Materials Science, August 7, 2018. Slides (already) posted to hackingmaterials.lbl.gov
  • 2. Overview of talk. Three projects available now: interpretable descriptors of crystal structure; matminer; atomate / Rocketsled. One project in progress: a text mining materials database.
  • 3. I. Interpretable descriptors of crystal structure
  • 4. Machine learning: the big problem in my view is connecting data to ML algorithms through features. On one side: lots of data on complex objects that you want to interrelate. On the other side: well-developed data-mining routines (clustering, regression, feature extraction, model-building, etc.) that work only on numbers (ideally ones with high relevance to your problem). Need to transform materials science objects into a set of physically relevant numerical data (“features” or “descriptors”).
  • 5. The crystal structure is a core entity that machine learning algorithms should know about. Step 1: Describe each site as a fingerprint telling you how close it is to each of 22 known local environments (e.g., tetrahedral, octahedral, etc.). Step 2: Describe each structure as the average of its site fingerprints (plus additional statistics like standard deviation, min, max, etc. if desired, or split into separate cation/anion vectors). [Figure: example local environments: tetrahedron, octahedron, distorted 8-coordinated cube]
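The two steps on slide 5 can be sketched in a few lines of plain Python. This is a minimal illustration, not the pymatgen or matminer implementation: the function name `structure_fingerprint`, the three-environment list, and the numeric site fingerprints are all made up for the example (the talk uses 22 environments).

```python
from statistics import mean, pstdev

# Hypothetical subset of the 22 local environments mentioned on the slide.
ENVIRONMENTS = ["tetrahedron", "octahedron", "distorted cube"]

def structure_fingerprint(site_fingerprints, stats=("mean",)):
    """Combine per-site fingerprints into one structure-level vector.

    site_fingerprints: one list per site, one value per environment.
    stats: which statistics to concatenate, in order.
    """
    reducers = {"mean": mean, "std": pstdev, "min": min, "max": max}
    columns = list(zip(*site_fingerprints))  # one tuple per environment
    return [reducers[s](col) for s in stats for col in columns]

# Made-up fingerprints for a structure with two sites.
sites = [
    [0.9, 0.1, 0.0],
    [0.7, 0.3, 0.0],
]
fp_mean = structure_fingerprint(sites)                               # Step 2 as stated
fp_full = structure_fingerprint(sites, stats=("mean", "min", "max"))  # with extra statistics
```

Concatenating the statistics keeps the structure vector at a fixed length regardless of how many sites the structure has, which is what lets structures of different sizes be compared.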
  • 6. Defining local order parameters for various environments. Use a given local order parameter with a threshold for motif recognition: if qtet > qthresh, then the motif is a tetrahedron; else it is not (too much) a tetrahedron. The tetrahedral order parameter, qtet, is defined in [1]. [1] Zimmermann et al., J. Am. Chem. Soc., 2017, 10.1021/jacs.5b08098
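The threshold rule on slide 6 amounts to a simple comparison per environment. The sketch below assumes the order parameters have already been computed elsewhere; the function name `recognize_motifs`, the example values, and the threshold of 0.5 are hypothetical and not taken from the talk.

```python
def recognize_motifs(order_parameters, threshold):
    """Return the environments whose order parameter exceeds the threshold."""
    return [name for name, q in order_parameters.items() if q > threshold]

# Made-up order parameters for a single site.
site = {"tetrahedron": 0.92, "octahedron": 0.15}
motifs = recognize_motifs(site, threshold=0.5)  # threshold value is an assumption
```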
  • 7. We have now developed mathematical order parameters for 22 different local environments
  • 8. How well do these work? 1. Order parameters clearly distinguish different environments even after thermal distortion. 2. They work well in applications (defect site finding, diffusion characterization). [1] Zimmermann et al., Frontiers in Materials, 2017, doi: 10.3389/fmats.2017.00034
  • 9. Structure fingerprints: can they distinguish crystal structures? 40 diverse crystal structure prototypes: BaAl2O4, BaZnF4, CaFe2O4, CrVO4, K2NiF4, CaB2O4-I, MgUO4, Pb3O4, SbNbO4, Sr2PbO4, tetragonal BaTiO3, Th3P4, TlAlF4, ZnSO4, α-MnMoO4, BCC, aragonite, barite, β-K2SO4, calcite, half-Heusler, FCC, garnet, HCP, rocksalt, diamond, high-cristobalite, ilmenite, low-cristobalite, low-quartz, monazite, olivine, perovskites, rutile, phenacite, scheelite, spinel, thenardite, wolframite, zircon. Many complex examples (e.g., multi-cation, multi-anion) from each class. Thousands of crystal structures in the test set. Create structure fingerprints based on averages of local environments.
  • 10. Local environment fingerprints do distinguish prototypes! The Euclidean distance between structure fingerprints is small for structures of the same prototype and larger for structures of different prototypes. Overlapping coefficient: OVC = 1.7%. [Figure: distributions of the distance between structure fingerprint vectors, same prototype vs. different prototype]
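A rough sketch of the comparison on slide 10 follows. The slide does not define how the overlapping coefficient is computed, so the histogram-based estimate here (shared area of two normalized histograms) is an assumption, as are the function names and the sample distances.

```python
import math

def fingerprint_distance(fp_a, fp_b):
    """Euclidean distance between two structure fingerprint vectors."""
    return math.dist(fp_a, fp_b)

def overlap_coefficient(sample_a, sample_b, bins=10):
    """Estimate the shared area of two distance distributions via histograms."""
    lo = min(min(sample_a), min(sample_b))
    hi = max(max(sample_a), max(sample_b))
    width = (hi - lo) / bins or 1.0  # avoid a zero bin width

    def histogram(sample):
        counts = [0] * bins
        for value in sample:
            index = min(int((value - lo) / width), bins - 1)
            counts[index] += 1
        return [c / len(sample) for c in counts]

    return sum(min(p, q) for p, q in zip(histogram(sample_a), histogram(sample_b)))

# Made-up distances: pairs within one prototype vs. pairs across prototypes.
same_prototype = [0.05, 0.10, 0.08]
different_prototype = [0.90, 1.20, 1.10]
ovc = overlap_coefficient(same_prototype, different_prototype)
```

A value near zero means the two distributions barely overlap, which is the situation the slide reports for its test set.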
  • 11. Can cluster crystal structures by “local environment similarity”
  • 12. Results on MP web site, e.g. for BCC-like structures (https://www.materialsproject.org/materials/mp-91/). Target: W. Similar structures (distance near 0): Cs3Sb, TiGaFeCo, CeMg2Cu.
  • 13. Structure descriptors: next steps. Incorporate into machine learning models. Compare performance against other site / structure descriptors. Go beyond local environments. Implemented in: pymatgen (www.pymatgen.org) and matminer (https://hackingmaterials.github.io/matminer). More info: talk to Nils Zimmermann at the poster session!
  • 15. Currently, it can be hard to get started with ML in materials. How can we make this transformation? Test different ideas? Where do we get the data?
  • 16. Goal of matminer: connect materials data with data mining algorithms and data visualization libraries. Ward, L. et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60–69 (2018).
  • 17. Matminer contains a library of descriptors for various materials science entities. >40 featurizer classes can generate thousands of potential descriptors. Usage pattern: feat = EwaldEnergy([options]); y = feat.featurize([input_data]). Featurizers are compatible with scikit-learn pipelining, automatically deploy multiprocessing to parallelize over data, and include citations to methodology papers.
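The construct-then-featurize pattern on slide 17 can be illustrated with a toy class. This is not matminer code and does not import matminer: `ElementFractionFeaturizer`, its methods, and the naive formula parser (no parentheses or charges) are hypothetical stand-ins that only mimic the shape of the interface.

```python
import re

class ElementFractionFeaturizer:
    """Toy featurizer: atomic fractions of chosen elements in a formula."""

    def __init__(self, elements):
        self.elements = list(elements)

    def feature_labels(self):
        """Names of the columns returned by featurize()."""
        return [f"frac_{el}" for el in self.elements]

    def featurize(self, formula):
        """Turn one formula string into a list of numbers."""
        counts = {}
        for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
            counts[symbol] = counts.get(symbol, 0.0) + float(amount or 1)
        total = sum(counts.values())
        return [counts.get(el, 0.0) / total for el in self.elements]

    def featurize_many(self, formulas):
        """Apply featurize() to every entry; a real tool could parallelize this."""
        return [self.featurize(f) for f in formulas]

feat = ElementFractionFeaturizer(["Na", "O"])
y = feat.featurize("NaCoO2")
rows = feat.featurize_many(["NaCoO2", "Na2O"])
```

Keeping the options in the constructor and the data in `featurize` is what makes this kind of object easy to drop into a pipeline: the configured object can be reused across many inputs.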
  • 18. Interactive Jupyter notebooks demonstrate use cases (https://github.com/hackingmaterials/matminer_examples). Many examples available: retrieving data from various databases; predicting bulk / shear modulus with ML; predicting formation energies (from composition alone; with Voronoi-based structure features included; with Coulomb matrix and Orbital Field matrix descriptors, reproducing previous studies in the literature); making interactive visualizations; creating an ML pipeline.
  • 19. matminer: next steps. Further increase coverage and scope of feature extraction methods available in the literature. Increase the number of “standard” data sets that can be used to benchmark different ML approaches. Apply to materials problems (in progress). Implemented in: matminer (https://hackingmaterials.github.io/matminer).
  • 20. III. atomate / Rocketsled. [Figure (J. Mueller): generalizable forward solver (FireWorks), supercomputing power (NERSC), statistical optimization (various optimization libraries)]
  • 21. With high-throughput DFT, we can generate data rapidly. What to do next? >4500 elastic tensors; >900 piezoelectric tensors; >48000 Seebeck coefficients + cRTA transport. References: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053. M. de Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009. Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission).
  • 22. Atomate is our software to easily run millions of such calculations at supercomputing centers. Example: start with all binary oxides, replace O->S, run several different properties. Workflows to run: ✓ band structure, ✓ surface energies, ✓ elastic tensor, ☐ Raman spectrum, ☐ QH thermal expansion, ☐ spin-orbit coupling. [Figure labels: researcher, results]
  • 23. Can we build a general computational optimizer? [Figure (J. Mueller): generalizable forward solver (FireWorks / atomate), supercomputing power (NERSC), statistical optimization (various optimization libraries)]
  • 24. Rocketsled: automatic materials screening that selects materials to compute AND submits them to a supercomputer. Test case: a screening space of ~20,000 potential ABX3 perovskite combinations as water splitting materials, precomputed in DFT by a different group. If a machine learning algorithm were in charge of picking the next compound based on past data, how efficient would it be?
  • 25. Further details and next steps. Built off the scikit-optimize package, with 10 different regressors (ML algorithms) available. Bootstrapped uncertainty estimates for balancing exploration and exploitation. Next step: deployment for thermoelectrics search. Implemented in: rocketsled (https://github.com/hackingmaterials/rocketsled).
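The idea of letting an algorithm pick the next calculation, with bootstrapped uncertainty balancing exploration and exploitation, can be sketched with the standard library alone. This is not Rocketsled's code or API: the toy objective, the nearest-neighbor predictor, the `kappa` weight, and every function name are assumptions chosen to keep the example self-contained.

```python
import random
import statistics

def expensive_calculation(x):
    """Stand-in for a DFT calculation (hypothetical toy objective)."""
    return -(x - 0.7) ** 2

def nearest_neighbor_predict(train, x):
    """Predict with the value of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def pick_next(observed, candidates, rng, n_boot=20, kappa=1.0):
    """Choose the candidate with the best mean + kappa * std over bootstraps."""
    def score(x):
        preds = []
        for _ in range(n_boot):
            # Resample the observed data with replacement (bootstrap).
            resample = [rng.choice(observed) for _ in observed]
            preds.append(nearest_neighbor_predict(resample, x))
        return statistics.mean(preds) + kappa * statistics.pstdev(preds)
    return max(candidates, key=score)

def optimize(candidates, n_initial=3, n_steps=10, seed=0):
    """Evaluate a few random points, then let the model choose the rest."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    observed = [(x, expensive_calculation(x)) for x in pool[:n_initial]]
    pool = pool[n_initial:]
    for _ in range(n_steps):
        if not pool:
            break
        x = pick_next(observed, pool, rng)
        pool.remove(x)
        observed.append((x, expensive_calculation(x)))
    return max(observed, key=lambda pair: pair[1])

candidates = [i / 20 for i in range(21)]
best_x, best_y = optimize(candidates)
```

The spread of the bootstrap predictions serves as the uncertainty estimate: a larger `kappa` favors candidates the model is unsure about, a smaller one favors candidates predicted to be good.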
  • 26. IV. A text mining materials database
  • 27. Some questions that current search tools don’t answer: these questions require materials-specific search tools! “I’d like a list of all the chemical compositions that have been studied as thermoelectrics, ideally weighted by research interest in them. Ok, now filter to thermoelectric materials known to have layered structures. Now show me some materials that aren’t in that list but are similar in terms of structure and electronic properties in the Materials Project database.” “What are all the known applications and unique properties of NaCoO2? What techniques (computational, experimental) have been used to study this compound in the past?” “I just predicted a new composition as a battery cathode. A lit search shows no hits at all for that composition. Has anyone ever made anything similar to that composition? I’d like to know for synthesis ideas and also want to check against similarity to known battery materials.”
  • 28. An engine to label the content of scientific abstracts. “Learning” what a scientific study is about from >2 million materials science abstracts. [Pipeline figure: Matstract corpus (unlabeled data, data labels); text processing (text cleaning, tokenization); feature engineering (POS tag labels, word embeddings (word2vec), hand-crafted features); supervised learning with train/test sets (neural network (LSTM), logistic regression); output: named entities]
  • 29. Learn relationships over many abstracts
  • 30. Application: a revised materials search engine. Auto-generated summaries of materials based on text mining.
  • 31. Application: materials compositions of interest. Example: a search for thermoelectrics that do not have Pb or Bi.
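The final filtering step of such a search (dropping compositions that contain Pb or Bi) could look roughly like the sketch below. The candidate list is an illustrative hand-written input, not output of the text-mining engine, and the function names and the simple regular-expression parser are assumptions.

```python
import re

def elements_in(formula):
    """Return the set of element symbols in a simple formula string."""
    return set(re.findall(r"[A-Z][a-z]?", formula))

def exclude_elements(formulas, banned):
    """Keep only formulas that contain none of the banned elements."""
    banned = set(banned)
    return [f for f in formulas if not (elements_in(f) & banned)]

# Illustrative compositions only.
candidates = ["Bi2Te3", "PbTe", "SnSe", "CoSb3", "Mg2Si"]
filtered = exclude_elements(candidates, banned={"Pb", "Bi"})
```

Parsing into element symbols first, rather than searching for the substring "Bi" or "Pb", avoids false matches between symbols that share letters.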
  • 32. Materials abstracts: next steps. Further testing. Similarity metrics, e.g. if a target compound doesn’t exist, retrieve information for “similar” compounds instead. Integration with Materials Project. Interested in being a beta tester? Contact me.
  • 33. Conclusions. Our group has been working on methods and software for various applications: interpretable descriptors of crystal structure; matminer; atomate / Rocketsled; a text mining materials database. We encourage you to try the software and let us know what you think! Help lists are available for all software.
  • 34. Thank you! Structure descriptors: N. Zimmermann (project lead). Atomate / Rocketsled: K Matthew (project lead, atomate), A. Dunn (project lead, rocketsled). Matminer: L. Ward (project lead, U. Chicago). Text mining: V. Tshitoyan, J. Dagdelen, L. Weston. All who provided feedback and contributed code to open-source software efforts! Funding: DOE-BES, Toyota Research Institute. Slides (already) posted to hackingmaterials.lbl.gov