Materials Data in the 21st Century: From Mishmash to Moneyball

Bryce Meredig, Citrine Informatics

OMI Workshop, Madison, Wisconsin

9 February 2015

What Is Citrine?

venture- and grant-funded Si Valley startup w/revenue

our platform extracts insights from materials data

Who Is Citrine?

three founders, five full-time employees

backgrounds: DFT, GaN growth, Red Hat, Snapchat

Citrine’s Vision

provide Moneyball analytics for your research

unite all materials data & algorithms in one platform

Things Citrine Does Not Do

DFT or atomistic modeling consulting

charge academics $

“Software is eating the world” -Marc Andreessen, cofounder of Netscape & Si Valley VC

…so why not materials?

lack of structured data

Where We Are

1. The materials data landscape is highly fragmented

Materials Data Pipeline

Data generation: laboratory equipment

Transport/store: USB stick, email, Dropbox, PC

Wrangling: Excel, PS, MATLAB, Python, R

Report Authoring: Word / LaTeX

Search/Aggreg.: Google, WoS, ICSD

Distribution: Elsevier, arXiv, ResearchGate

Bibliography Tools: Mendeley, EndNote, Zotero, Papers

Materials Data Landscape

Stages •  Inception •  Creation •  Distribution

General Databases •  SpringerMaterials ($) •  NIMS •  MatWeb

Domain Databases •  ICSD ($) •  Powder Diffraction File ($) •  Nanohub •  ASM Phase Diagrams ($) •  Granta ($)

Computational Databases •  Materials Project •  AFLOWLIB •  Harvard Clean Energy Project •  NoMaD •  CatApp

Publishers •  Elsevier •  APS •  ACS •  Wiley •  Springer

Societies •  MRS •  TMS •  ASM •  APS •  ACS

Characterization •  Bruker •  PANalytical •  Sigma •  Agilent •  KLA-Tencor

Synthesis/mfg •  Applied Materials •  Oerlikon •  Veeco •  Homemade tools

Testing •  Newport •  Weibull •  Intertek

Institutions •  Universities •  National Labs •  Industry

Software •  ICME •  Accelrys •  AFLOW •  ASE

Citrination

2. Not only that, materials data are incredibly diverse

materials have a wide variety of data formats and sources

http://www2.warwick.ac.uk/fac/sci/physics/research/condensedmatt/microscopy/research/staff/reza/

http://newscenter.lbl.gov/2012/10/30/folding-funnels-biomimicry/

http://www.nature.com/srep/2013/130507/srep01787/full/srep01787.html

http://epsc.wustl.edu/haskin-group/Raman/faqs.htm

http://www.beilstein-journals.org/bjnano/single/articleFullText.htm?publicId=2190-4286-1-15

http://www.esrf.eu/UsersAndScience/Experiments/CRG/BM32/Featuredexp/eymery

http://universe-review.ca/F13-atom04.htm

http://www.cameca.com/instruments-for-research/atom-probe.aspx

http://www.canmin.org/content/41/3/759.figures-only

materials have many degrees of freedom besides composition

some metadata examples (there are many, many more):

…crystalline  

http://www.materials.unsw.edu.au/tutorials/online-tutorials/1-atomic-structure

amorphous  

http://www.metallographic.com/Technical/Metallography-Intro.html

grain structure  

http://pubs.rsc.org/en/content/articlelanding/2011/jm/c1jm10125k#!divAbstract

nanoparticles  

http://ipn2.epfl.ch/CHBU/images/Nanotubemodels.gif

nanotubes  

http://media.treehugger.com/assets/images/2011/10/nanowires-solar-panel-001.jpg

nanowires  

http://upload.wikimedia.org/wikipedia/commons/1/13/Composite_3d.png

composites…  

all of these layers tell us about how materials behave, so…

any solution to the mishmash must include all of them

Times Are Changing

Materials Genome Initiative

Materials Project

OSTP Open Access Mandate

Publisher Open Data Efforts

(rise of numerous smaller databases creates need for meta-aggregation)

Citrine’s Work

Machine Learning for Thermo

B. Meredig & C. Wolverton, PRB 89, 094104 (2014)  

Auto-Discovery of Correlations

B. Meredig & C. Wolverton, Chem Mater 26, 1985 (2014)  

Citrination Platform

“all materials data”

single data standard

data-driven apps & models

data extraction from PDFs

semantic search, APIs

Citrination Platform

citrination.com  

Materials Data Standard

JSON-based definition of arbitrary materials objects & processes

Able to accommodate wide variety of materials data
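
To make the idea of a JSON-based materials record concrete, here is a minimal sketch in Python. The field names (material, process, measurements and so on) and the numeric value are illustrative assumptions and placeholders; the slides do not spell out the actual Citrine data standard, so this should not be read as its schema.

```python
import json

# Hypothetical record layout: every field name below is an assumption made
# for illustration, not the actual Citrination schema.
record = {
    "material": {"formula": "Mg2Si", "form": "polycrystalline"},
    "process": [
        {"step": "synthesis", "method": "solid-state reaction"},
    ],
    "measurements": [
        {
            "property": "Seebeck coefficient",
            "value": 1.0,  # placeholder number, not a real measurement
            "units": "uV/K",
            "conditions": {"temperature_K": 300},
        },
    ],
}


def property_names(rec):
    """Return the names of all measured properties in a record."""
    return [m["property"] for m in rec.get("measurements", [])]


# Round-trip through JSON text to show the record is plain, portable data.
text = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(text)

print(property_names(restored))
```

Nested objects and lists are what would let one record carry composition, processing history and measurements together, which is one plausible reading of "arbitrary materials objects & processes".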

Thermoelectric Discovery

Model Input Data •  Canonical Thermoelectrics •  Citrine Discovery

Compound positions determined by weighted composition (e.g., SiGe would be halfway between Si and Ge; Mg2Si is 1/3 of the way from Mg to Si.)
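
The weighted-composition placement described above can be sketched as a composition-weighted average of element coordinates. The element coordinates below are made up for the example (the slides do not give the actual map layout), and the function name is hypothetical.

```python
def weighted_position(composition, element_xy):
    """Place a compound at the composition-weighted mean of its elements.

    composition: dict mapping element symbol -> number of atoms per formula unit
    element_xy:  dict mapping element symbol -> (x, y) position on the map
    """
    total = sum(composition.values())
    x = sum(n * element_xy[el][0] for el, n in composition.items()) / total
    y = sum(n * element_xy[el][1] for el, n in composition.items()) / total
    return (x, y)


# Assumed, arbitrary element coordinates chosen only to make the arithmetic easy.
element_xy = {"Si": (0.0, 0.0), "Ge": (1.0, 0.0), "Mg": (0.0, 3.0)}

# SiGe: equal parts Si and Ge, so it lands halfway between them.
sige = weighted_position({"Si": 1, "Ge": 1}, element_xy)

# Mg2Si: 2/3 Mg and 1/3 Si, so it lands 1/3 of the way from Mg to Si.
mg2si = weighted_position({"Mg": 2, "Si": 1}, element_xy)

print(sige, mg2si)
```

With Mg at y = 3 and Si at y = 0, Mg2Si comes out at y = 2, one unit away from Mg out of the three separating the two elements, which matches the 1/3 figure on the slide.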

Distant, novel class of thermoelectrics

Universe of known TE compounds

MW Gaultois, AO Oliynyk, A Mar, TD Sparks, GJ Mulholland, & B Meredig, “A Recommendation Engine for Suggesting Unexpected Thermoelectric Chemistries: Initial Experimental Validation.”  

TE Search: All Ternary Systems

General Materials Properties

Where We’re Going

Trapped Non-Digital Data

User Data Upload

http://www.bruker.com/products/x-ray-diffraction-and-elemental-analysis/x-ray-diffraction/xrd-software/overview/eva/eva-phase-analysis.html

User Model Development

“Github for materials models”

Challenges

incentives

existing workflows

need for buy-in

Stakeholders

universities

government labs (DOE labs, NIST...) in the US, EU, Japan, China...

funding agencies

journal publishers

scholarship search engines

professional societies

database providers

equipment makers

materials industry (Dow, DuPont, Alcoa, Corning…)

industries that rely on matls (aerospace, electronics, energy...)

and YOU.

Ways to Get Involved

email [email protected] to join mailing list

try citrination.com and give us feedback

contribute data

contribute models to platform (alpha)

grant proposals – drive our dev