ChemConnect: Characterizing Combustion Kinetic Data with ontologies and meta-data

ChemConnect A use case example using cloud services

Transcript of ChemConnect: Characterizing Combustion Kinetic Data with ontologies and meta-data

Page 1: ChemConnect: Characterizing Combustion Kinetic Data with ontologies and meta-data

ChemConnect: A use case example using cloud services

Page 2

Goals of talk

• Brief introduction to infrastructures and clouds
• My experience/use of Google Cloud Platform
• ChemConnect

Page 3

Cloud Computing
What is a ‘cloud’ and why is it useful?

Page 4

Cloud Computing

[Diagram labels: computer network, storage (database), servers, services, applications]

Adapted from: Effectively and Securely Using the Cloud Computing Paradigm by Peter Mell, Tim Grance

• Shared pool of configurable computing resources
• On-demand network access
• Provisioned by the Service Provider

Page 5

What is Cloud Computing?

• Cloud Computing is a general term used to describe a new class of network-based computing that takes place over the Internet:
  • basically a step up from Utility Computing
  • a collection of integrated and networked hardware, software and Internet infrastructure (called a platform)
  • using the Internet for communication and transport, it provides hardware, software and networking services to clients
• These platforms hide the complexity and details of the underlying infrastructure from users and applications by providing a very simple graphical interface or API (Application Programming Interface).

Page 6

Cloud Summary

• Cloud computing is an umbrella term used to refer to Internet-based development and services
• A number of characteristics define cloud data, applications, services and infrastructure:
  • Remotely hosted: Services or data are hosted on remote infrastructure.
  • Ubiquitous: Services or data are available from anywhere.
  • Commercialized: The result is a utility computing model similar to that of traditional utilities, like gas and electricity: you pay for what you want!

Page 7

Do you Use the Cloud?

Page 8

Cloud Flavors?

• SaaS – Software as a Service
• IaaS – Infrastructure as a Service
• PaaS – Platform as a Service
• DaaS – Desktop as a Service

Page 9

Cloud Service Models

• Software as a Service (SaaS)
• Platform as a Service (PaaS)
• Infrastructure as a Service (IaaS)

Examples shown: Google App Engine, SalesForce CRM, LotusLive

Adapted from: Effectively and Securely Using the Cloud Computing Paradigm by Peter Mell, Tim Grance

Page 10

Cloud Architecture

Page 11

Cloud Platform for ChemConnect

https://cloud.google.com/

Others exist (another popular choice)

Why this one?
ChemConnect is based on several Google services (and philosophies)

Project connected to Google Account

Page 12

Services Provided

These are types of services provided by Google as a cloud service provider.

For ChemConnect the services of interest are:
• To run the Java-based website (the ‘App’)
• The ‘NoSQL’ database (for large amounts of information)
• Storage (data files)

Page 13

App Engine

Page 14

Programming Support
API: Application Programming Interfaces

Page 15

Monitoring Services

Page 16

Raw Database Information

Page 17

ChemConnect: client-server Structure

CLIENT
User interface on browser, tablet or phone (adjustable for each)

SERVER
ChemConnect: generates interface, computing and responses

Page 18

Application Environments

Example: ChemConnect is written in Java

Eclipse: a ‘standard’ (open-source) environment for writing code

Local debug and then deploy to Google Cloud

Page 19

Development Cycle: Google Cloud

[Diagram labels: Local Environment, Local Deploy, Local client interface, Deploy to Cloud, Web client interface, The community, Testing feedback]

Page 20

Can’t get something for nothing:

Page 21

Quotas (for ’small’ applications)

Page 22

ChemConnect

Make the immense amount of data in the combustion community not only available but searchable

• Not restricted to ‘accepted’ published data
• Recognize interdependencies between data
• Database as an analytical tool
• Fine-grained

Page 23

Motivation

• Data is the backbone of modern scientific research
• Exchange of data is paramount to successful interaction between research groups

[Diagram labels: Publications and conferences (paper); Data exchanged between researchers (email, etc.) (data files); Virtual Research Environment; Clouds (infrastructures)]

Page 24

Key Concept: Meta-Data

Keywords specifying:
• Data Type
• Data Source (origin, time, place, etc.)
• Data Qualifications (sharing, quality, etc.)
• Data relationships to other data (ontologies)

Page 25

Ontologies

Purpose: Defining interrelationships between data objects

Source: Semantic Web concepts

Motivation: Large body of research in discovering relationships

Page 26

RDF: Resource Description Framework

Subject: The subject of the description
Predicate: The description of the relationship between subject and object
Object: The object of the description

Subject → Predicate → Object
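A subject, predicate, object statement maps naturally onto a three-field record. The following is a minimal sketch in Python, not ChemConnect's own (Java) code; the class name Triple and the field name obj are assumptions made for illustration, while the example values come from the CHEMKIN relationships in this talk.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement (hypothetical helper, not a ChemConnect class)."""
    subject: str    # the thing being described
    predicate: str  # how subject and object are related
    obj: str        # the thing or value the subject is related to


t = Triple("Mech-butane-2011", "hasSpecies", "c2h5")
print(f"{t.subject} --{t.predicate}--> {t.obj}")
```

Because a NamedTuple is hashable, such triples can be collected in a set, which makes merging two sources a plain set union.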

Page 27

Relationships (example from CHEMKIN mechanism)

Object               Relationship         Object
Mech-butane-2011     hasReaction          c2h5+o2 = c2h5o2
Mech-butane-2011     hasSpecies           c2h5
c2h5o2 = c2h4o2h     hasReactant          c2h5o2
c2h5o2 = c2h4o2h     hasProduct           c2h4o2h
c2h4o2h              isIsomer             c2h5o2
c2h4o2h              hasStandardEnthalpy  -276.51 kJ/mol
c2h5                 hasProduct           c2h5o2
c2h5                 hasProduct           c2h4o2h
c2h5o2 = c2h4o2h     subMechanism         C2
c2h5o2 = c2h4o2h     subMechanism         C2H5O2
c2h5+o2 = c2h5o2     followedBy           c2h5o2 = c2h4o2h
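Relationships like these can be held as plain (object, relationship, object) tuples and queried by pattern. The sketch below is a Python illustration under that assumption; the function name match and its wildcard convention are hypothetical and say nothing about how ChemConnect stores or queries its data.

```python
# A few of the relationships from the slide, as (subject, predicate, object).
triples = [
    ("Mech-butane-2011", "hasReaction", "c2h5+o2 = c2h5o2"),
    ("Mech-butane-2011", "hasSpecies", "c2h5"),
    ("c2h5o2 = c2h4o2h", "hasProduct", "c2h4o2h"),
    ("c2h4o2h", "isIsomer", "c2h5o2"),
    ("c2h5", "hasProduct", "c2h5o2"),
    ("c2h5", "hasProduct", "c2h4o2h"),
]


def match(store, s=None, p=None, o=None):
    """Return the triples that agree with every given field; None is a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Everything that c2h5 leads to, directly or further down the chain.
products_of_c2h5 = [o for (_, _, o) in match(triples, s="c2h5", p="hasProduct")]
```

Calling match(triples, o="c2h4o2h") answers the reverse question: which objects point at c2h4o2h, and through which relationship.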

Page 28

Connecting ‘unrelated’ data

Passive Connection:
You don’t need to know which structures you want to connect to.
If they share an RDF subject or an RDF object, then they are connected!

Keyword: Passive
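A small Python sketch of what ‘passive’ means here: neither source refers to the other, yet once both are loaded the shared name links them. The helper nodes and the two one-line sources are assumptions for illustration only.

```python
def nodes(triples):
    """All names that appear as a subject or an object in a set of triples."""
    return {t[0] for t in triples} | {t[2] for t in triples}


# Two sources entered independently of each other.
source_a = {("Mech-butane-2011", "hasSpecies", "c2h5o2")}
source_b = {("c2h4o2h", "isIsomer", "c2h5o2")}

merged = source_a | source_b               # loading both is just a set union
shared = nodes(source_a) & nodes(source_b)  # the names that connect them
```

No linking step is run: the connection exists only because both sources use the same name, c2h5o2, in a subject or object position.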

Page 29

Role of data standards

In one sense, standards are only important for the initial parsing of the data, and maybe for outputting the data, but not within the database itself.

If new standards come up, they can supplement the data (thinking of the keys, identifiers, meta-data keys, DOIs, etc.)

Page 30

Large network of interconnections

Each ‘bond’ is an RDF

Page 31

Restructuring of data

[Diagram: blocks of data (“Data”); individual pieces of data (“Data Element”, with tags/descriptions); network of interconnected data]

Page 32

Consequence (example)

[Diagram: network of “Data” nodes]

Relationships are established between previously independent data sets/elements

Page 33

Semantic Web: Relationships within the net

[Diagram: RDF graph for the book http://…isbn/000651409X]
a:title      The Glass Palace
a:year       2000
a:publisher  (a:p_name Harper Collins; a:city London)
a:author     (a:name Ghosh, Amitav; a:homepage http://www.amitavghosh.com)

Author URL

Origin (and development of idea)

Page 34

Semantic Web: Relationships within the net

Adds ‘meaning’ to the independent sources of information
Gives ‘relationships’ between the pieces of information

Page 35

Semantic Web: Merge relationships

[Diagram: two RDF graphs joined at a common URL]

French language graph, book http://…isbn/2020386682:
f:titre       Le palais des miroirs
f:original    http://…isbn/000651409X
f:auteur      (f:nom Ghosh, Amitav)
f:traducteur  (f:nom Besse, Christianne)

English language graph, book http://…isbn/000651409X:
a:title      The Glass Palace
a:year       2000
a:publisher  (a:p_name Harper Collins; a:city London)
a:author     (a:name Ghosh, Amitav; a:homepage http://www.amitavghosh.com)

Common URL!

Connecting sets of Concepts

Page 36

Semantic Web: Creating new relationships

[Diagram: the French graph (book http://…isbn/2020386682, with f:titre, f:original, f:auteur, f:traducteur, f:nom) and the English graph (book http://…isbn/000651409X, with a:year, a:city, a:p_name, a:name, a:homepage, a:author, a:publisher) merged into one graph]

Two independent data sources (who did not know about each other) become connected

Passive
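The book example can be replayed with a few tuples. This Python sketch assumes that f:original points from the French edition to the English edition's URL and keeps only the direct title and year edges; the URLs are left elided exactly as on the slide, and the helper objects is hypothetical.

```python
EN = "http://…isbn/000651409X"   # elided on the slide, kept as-is
FR = "http://…isbn/2020386682"

english = {(EN, "a:title", "The Glass Palace"), (EN, "a:year", "2000")}
french = {(FR, "f:titre", "Le palais des miroirs"), (FR, "f:original", EN)}

merged = english | french  # the two sources share the node EN


def objects(store, s, p):
    """All objects reachable from subject s through predicate p."""
    return sorted(o for (s2, p2, o) in store if s2 == s and p2 == p)


# Start at the French book, hop to its original, read the English title.
original = objects(merged, FR, "f:original")[0]
english_title = objects(merged, original, "a:title")
```

The question "what is the English title of Le palais des miroirs?" cannot be answered from either source alone; it becomes answerable only in the merged set.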

Page 37

Fine-Grained Information

Extraction of all the bits of information within the data object

CHEMKIN model:
• Extract set of molecules (with isomer, thermodynamic data)
• Extract set of reactions (with ‘isomer’, kinetic data)
• Extract relationships between
  • molecules and molecules (related through reactions)
  • molecules and reactions (reactants, products, etc.)
  • reactions and reactions (reaction network information)

Other Sources:
• Automatic Generation: mechanism with the information as above, plus 2D-structure, reaction class information, substructure information
• Thermodynamic Calculators: more thermodynamic information (plus 2D-structures)

Have to have database capacity to store this immense amount of info

To be demonstrated today
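As a rough illustration of fine-grained extraction, the Python sketch below breaks one reaction label into individual relationships. It assumes the simple "a+b = c" form used on the earlier relationships slide; real CHEMKIN input (rate parameters, "=>" or "<=>", third bodies, stoichiometric coefficients) is not handled, and the function name is hypothetical.

```python
def reaction_triples(mechanism, reaction):
    """Split a reaction label 'a+b = c' into (subject, predicate, object) tuples."""
    lhs, _, rhs = reaction.partition("=")
    reactants = [s.strip() for s in lhs.split("+")]
    products = [s.strip() for s in rhs.split("+")]

    triples = [(mechanism, "hasReaction", reaction)]
    triples += [(reaction, "hasReactant", sp) for sp in reactants]
    triples += [(reaction, "hasProduct", sp) for sp in products]
    # dict.fromkeys drops duplicate species while keeping their order
    triples += [(mechanism, "hasSpecies", sp)
                for sp in dict.fromkeys(reactants + products)]
    return triples


extracted = reaction_triples("Mech-butane-2011", "c2h5+o2 = c2h5o2")
```

One reaction line already yields seven relationships here, which hints at why even a small mechanism grows into a large number of stored items.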

Page 38

Linking data/models

[Diagram: one species linked across several sources]
Sources: Chemkin Model I; Chemkin Model II; Automatically Generated CHEMKIN Model; 2-D Structure; Computational Chemistry Calculations
Species labels: 1-Butyl-3-hydroperoxide; C4H11O2; ch2ch2ch(ooh)ch3; 1-c4hh8-3-ooh
Relationships: hasSpecies; isIsomer; hasThermo (to Thermo)

Page 39

RDF: Resource Description Framework

Snapshot from query interface

UCSanDiego#NaturalGas   IsA   Mechanism

UCSanDiego#NaturalGas#n-c3h7=c2h4+ch3   MechanismReaction   UCSanDiego#NaturalGas

Page 40

UCSanDiego#NaturalGas#N-C3H7   IsAsReactant   UCSanDiego#NaturalGas#n-c3h7=c2h4+ch3

Names specific to the mechanism

Predicate relating items
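The names in the snapshot appear to be built by joining source, mechanism and item with ‘#’. A minimal Python sketch of that convention follows; the helper names qualified and mechanism_of are assumptions, and the real ChemConnect naming rules may differ.

```python
def qualified(*parts):
    """Join name parts with '#', e.g. source, mechanism, item."""
    return "#".join(parts)


def mechanism_of(name):
    """Strip the last part of a qualified name to recover its mechanism."""
    return name.rsplit("#", 1)[0]


mechanism = qualified("UCSanDiego", "NaturalGas")
reaction = qualified(mechanism, "n-c3h7=c2h4+ch3")
species = qualified(mechanism, "N-C3H7")
```

Qualifying every name with its mechanism keeps N-C3H7 in one mechanism distinct from a species with the same label elsewhere, until an explicit relationship says they are the same.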

Page 41

Connection to other mechanisms

Mechanism
Reaction in mechanism
Molecule in reaction
Simple Species Name
Isomer
Species in another mechanism: GRI#GRI-3.0#C3H7

Page 42

Database of RDF connections

Page 43

Datastore

Extremely large amount of information needs another technology
(even a small CHEMKIN mechanism translates to megabytes of information)

Page 44

Other options

Page 45

Database as Analytic Device

Traversing through the network of information is a tool to ‘analyze’ and extract more/new information

Page 46

How does a species react?

[Diagram: Species (Isomer), asReactant, Set of Reactions; Species (Isomer), asProduct, Set of Reactions]

Not just from one mechanism, but from all cataloged mechanisms

Database as analytic device
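A Python sketch of this query under stated assumptions: species from different mechanisms are tied to a common isomer label by isIsomer, and reactions hang off each species through asReactant and asProduct. The GRI reaction name below is invented for illustration, as is the function name.

```python
from collections import defaultdict

triples = [
    ("UCSanDiego#NaturalGas#N-C3H7", "isIsomer", "C3H7"),
    ("GRI#GRI-3.0#C3H7", "isIsomer", "C3H7"),
    ("UCSanDiego#NaturalGas#N-C3H7", "asReactant",
     "UCSanDiego#NaturalGas#n-c3h7=c2h4+ch3"),
    # Hypothetical reaction name, not taken from the GRI mechanism file.
    ("GRI#GRI-3.0#C3H7", "asProduct", "GRI#GRI-3.0#ch3+c2h4=c3h7"),
]


def how_it_reacts(store, isomer):
    """Collect reactions of every species labeled with the given isomer."""
    species = {s for (s, p, o) in store if p == "isIsomer" and o == isomer}
    result = defaultdict(set)
    for s, p, o in store:
        if s in species and p in ("asReactant", "asProduct"):
            result[p].add(o)
    return dict(result)


c3h7 = how_it_reacts(triples, "C3H7")
```

The result draws on both mechanisms at once, which is the point of the slide: the question is asked of the isomer, not of one mechanism's label.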

Page 47

Search Path (in interface)

Page 48

Reaction information

Collecting information to ‘cart’ (building a mechanism)

Page 49

Reaction pathways

Database as analytic device

[Diagram: a chain of Species, isAReactant, Reaction, isAProduct, Species, repeated over several reactions]

Establishes a further relationship between two species

Could even supplement the database:
Species1 PathTo Species2
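One way to derive a PathTo relationship is a breadth-first search that alternates species and reactions. The Python sketch below assumes (species, isAReactant, reaction) and (species, isAProduct, reaction) tuples and follows reactions in the forward direction only; the two reactions are the ones from the earlier relationships slide, and the function name is hypothetical.

```python
from collections import deque

reactions = {
    "c2h5+o2 = c2h5o2": (["c2h5", "o2"], ["c2h5o2"]),
    "c2h5o2 = c2h4o2h": (["c2h5o2"], ["c2h4o2h"]),
}
triples = []
for rxn, (reactants, products) in reactions.items():
    triples += [(s, "isAReactant", rxn) for s in reactants]
    triples += [(s, "isAProduct", rxn) for s in products]


def path_to(store, start, goal):
    """Shortest species -> reaction -> species chain, or None if there is none."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for rxn in [o for (s, p, o) in store
                    if s == path[-1] and p == "isAReactant"]:
            for s, p, o in store:
                if o == rxn and p == "isAProduct" and s not in seen:
                    seen.add(s)
                    queue.append(path + [rxn, s])
    return None


route = path_to(triples, "c2h5", "c2h4o2h")
```

If a route is found, a new tuple such as ("c2h5", "PathTo", "c2h4o2h") could be written back, which is how the search result would supplement the database.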

Page 50

Species between Mechanisms

Database as analytic device

Species are labels in a CHEMKIN mechanism: only the atomic composition is known (NASA polynomial), not the structure

[Diagram: C3H7 in one CHEMKIN mechanism compared with N-C3H7 and i-C3H7 in another; each species has its Reactions (asProduct) and Reactions (asReactant)]

Compare reactions (species as isomers)

The set with the most similarities wins

Page 51

Species between Mechanisms

Database as analytic device

[Diagram: C3H7 and N-C3H7, each with Reactions (asProduct) and Reactions (asReactant)]

The set with the most similarities wins

A new relationship can be established

For the cautious: the relationship can be qualified with a probability (related to degree of matching)

For more certainty: one can extend the comparison through a larger network (path through two or more reactions)
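The ‘most similarities wins’ rule can be sketched with a set-overlap score. The Python example below uses the Jaccard index as a stand-in for the degree of matching; that choice, and the reaction sets themselves (written with isomer labels so they are comparable across mechanisms), are assumptions for illustration and not data from any mechanism. The score is a similarity measure, not a calibrated probability.

```python
def similarity(a, b):
    """Jaccard index of two sets: shared items over all items (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Reactions of the unlabeled C3H7 in one mechanism (invented for illustration).
unknown = {"C3H7 = C2H4 + CH3", "C3H7 + O2 = C3H6 + HO2"}

# Reactions of the two candidate species in the other mechanism (also invented).
candidates = {
    "N-C3H7": {"C3H7 = C2H4 + CH3", "C3H7 + O2 = C3H6 + HO2", "C3H7 + H = C3H8"},
    "i-C3H7": {"C3H7 + O2 = C3H6 + HO2", "C3H7 = C3H6 + H"},
}

scores = {name: similarity(unknown, rxns) for name, rxns in candidates.items()}
best = max(scores, key=scores.get)
```

Storing the score next to the new relationship is one way to ‘qualify’ it for the cautious; extending the compared sets to two-reaction paths would follow the same pattern with larger sets.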

Page 52

Species between Mechanisms: One step further

If one of the mechanisms is automatically generated, then we have the 2D structure

The species goes from a ‘label’ to a species with a structure (can be further classified with substructures)

Database as analytic device

Page 53

Data input

Page 54

Look and feel

Page 55

In the background

Account sign-in:
• Query: which data do you have access to?
• Data input: how is your data shared?

Security: inhibit hacking

Social media concepts: groups

Each data point has sharing and ownership parameters
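A sketch of how per-item sharing and ownership parameters might drive a query filter, written in Python. The class DataPoint, its fields, the group names and the catch-all group "public" are all hypothetical; the talk does not describe ChemConnect's actual access model.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    value: str
    owner: str
    # Names of groups allowed to read this item (hypothetical field).
    shared_with: set = field(default_factory=set)


def visible(points, user, groups):
    """Items the user owns, plus items shared with one of the user's groups."""
    allowed = set(groups) | {"public"}  # "public" is an assumed catch-all group
    return [p for p in points
            if p.owner == user or bool(p.shared_with & allowed)]


points = [
    DataPoint("a", "alice"),
    DataPoint("b", "bob", {"combustion-group"}),
    DataPoint("c", "bob", {"public"}),
    DataPoint("d", "bob"),
]
seen_by_alice = [p.value for p in visible(points, "alice", {"combustion-group"})]
```

Applying such a filter to every query result is what would make "which data do you have access to" part of the query itself rather than a separate check.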

Page 56

In the background

Transactions:
• How, by whom and when was the data entered (or analysed)?
• How was the database used: which queries?

Why?
Have to filter which query results are shown and order them, both personally and in general

General field (computer science): Recommendation Systems
• Each Google search (from different people) gives different results
• eCommerce sites use this to

Page 57

Future directions

Some basic functionality is present:
• Reading in CHEMKIN mechanisms from many sources
• Management of RDFs
• Simple query (single keyword search)

Data sources:
• Automatically generated mechanisms (mechanism)
• Data behind automatic generation (reaction classes, 2-D (sub)structures)
• Independent thermodynamic data
• Computational chemistry results

Query: more complex searches
• Multiple keywords
• Interpretation/preprocessing of keyword expression before search
• Ordering and filtering results (passive and with check boxes)

Page 58

To be continued: Demonstration

See you there!

If the gods of the internet (and the demon, the ‘demo effect’) allow, you can try it out