Pat Langley Computational Learning Laboratory Center for the Study of Language and Information...

The Computational Discovery of Communicable Knowledge

Pat Langley

Computational Learning Laboratory
Center for the Study of Language and Information
Stanford University, Stanford, CA 94304
http://hypatia.stanford.edu/cll/
[email protected]

Also affiliated with the DaimlerChrysler Research & Technology Center and the Institute for the Study of Learning and Expertise.

The Problem and the Potential

Our society is collecting increasing amounts of data in commercial and scientific domains. These include complex spatial/temporal data sets like:

• traces of traffic behavior from GPS and cell phones
• prices of stocks and currencies from exchanges
• measurements of climate and ecosystem variables

Computational techniques should let us find relations in these data that are useful for business and society.

Drawbacks of Current Approaches

The fields of machine learning and data mining have developed methods to find regularities in data. Despite many successful applications, most techniques:

• assume attribute-value representations that cannot handle time or space
• cannot tell interesting discoveries from mundane ones
• state the discovered knowledge in some opaque form

This indicates the need for alternative methods that can address these issues.

Paradigms for Machine Learning

• decision-tree induction
• case-based learning
• induction of logical rules
• probabilistic induction
• neural networks

Paradigms for Scientific Discovery

• taxonomy formation
• equation discovery
• qualitative law discovery
• process model formation
• structural model construction

Discovering Numeric Laws

Statement of the task:

• Given: Quantitative measurements about objects or events in the world.
• Find: Numeric relations that hold among variables that describe these items and that predict future behavior.

Historical examples:

• Kepler’s three laws of planetary motion
• Archimedes’ principle of displacement in water
• Black’s law relating specific heat, mass, and temperature
• Proust’s and Gay-Lussac’s laws of definite proportions

BACON on Kepler’s Third Law

moon      d        p      d/p    d^2/p   d^3/p^2
A        5.67     1.77    3.20   18.15    58.15
B        8.67     3.57    2.43   21.04    51.06
C       14.00     7.16    1.96   27.40    53.61
D       24.67    16.69    1.48   36.46    53.89

BACON carries out heuristic search through a space of numeric terms, looking for constant values and linear relations.

This example shows the system’s progression from primitive variables (distance and period of Jupiter’s moons) to a complex term that has a nearly constant value.
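The progression in the table can be reproduced with a small sketch of BACON-style heuristics. This is a minimal illustration under stated assumptions, not BACON itself: terms are assumed to be products of powers of d and p, the newest term is related to earlier terms (most recent first), a ratio is formed when two terms rise together and a product when one rises as the other falls, and a term counts as constant when its coefficient of variation falls below 10%, an arbitrary tolerance. The search for linear relations is omitted. The function names are illustrative; the data are the distances and periods from the table.

```python
from statistics import mean, pstdev

# Distances (d) and periods (p) of Jupiter's moons A-D, from the table above.
D = [5.67, 8.67, 14.00, 24.67]
P = [1.77, 3.57, 7.16, 16.69]


def values_of(exps, d, p):
    """Values of the term d^a * p^b for the exponent pair (a, b)."""
    a, b = exps
    return [x ** a * y ** b for x, y in zip(d, p)]


def term_name(exps, names=("d", "p")):
    """Render an exponent pair such as (3, -2) as 'd^3/p^2'."""
    def power(n, k):
        return n if k == 1 else f"{n}^{k}"
    num = [power(n, k) for n, k in zip(names, exps) if k > 0]
    den = [power(n, -k) for n, k in zip(names, exps) if k < 0]
    text = "*".join(num) or "1"
    return text + ("/" + "*".join(den) if den else "")


def trend(xs, ys):
    """+1 if ys rise monotonically with xs, -1 if they fall, 0 otherwise."""
    ordered = [y for _, y in sorted(zip(xs, ys))]
    steps = [b - a for a, b in zip(ordered, ordered[1:])]
    if all(s > 0 for s in steps):
        return 1
    if all(s < 0 for s in steps):
        return -1
    return 0


def nearly_constant(vals, tol):
    """True if the coefficient of variation is below tol (assumed threshold)."""
    return pstdev(vals) / abs(mean(vals)) < tol


def bacon(d, p, tol=0.1, max_terms=10):
    terms = [(1, 0), (0, 1)]          # primitive terms d and p as exponent pairs
    defined = []                      # names of newly defined terms, in order
    i = 1
    while i < len(terms) and len(terms) < max_terms:
        newest = terms[i]
        for older in reversed(terms[:i]):
            direction = trend(values_of(older, d, p), values_of(newest, d, p))
            if direction == 0:
                continue
            # Rising together -> ratio older/newest; opposite trends -> product.
            sign = -1 if direction > 0 else 1
            cand = (older[0] + sign * newest[0], older[1] + sign * newest[1])
            # Skip trivial terms, terms already defined, and their inverses.
            if cand == (0, 0) or cand in terms or (-cand[0], -cand[1]) in terms:
                continue
            terms.append(cand)
            defined.append(term_name(cand))
            vals = values_of(cand, d, p)
            if nearly_constant(vals, tol):
                return term_name(cand), mean(vals), defined
            break
        i += 1
    return None


print(bacon(D, P))
```

On these values the sketch defines d/p, then d^2/p, then d^3/p^2, and stops at the last term, whose coefficient of variation is under 5% around a mean of roughly 54.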

Some Laws Discovered by BACON

Basic numeric relations:

• Ideal gas law: $PV = aNT + bN$

• Kepler’s third law: $D^3 \, [(A - k)/t]^2 = j$

• Coulomb’s law: $F D^2 / (Q_1 Q_2) = c$
• Ohm’s law: $T D^2 / (LI - rI) = r$

Relations with intrinsic properties:

• Snell’s law of refraction: $\sin I / \sin R = n_1 / n_2$
• Archimedes’ law: $C = V + i$
• Momentum conservation: $m_1 V_1 = m_2 V_2$
• Black’s specific heat law: $c_1 m_1 T_1 + c_2 m_2 T_2 = (c_1 m_1 + c_2 m_2) \, T_f$
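Solving Black’s specific heat law for the final temperature makes the role of the intrinsic property $c$ explicit: $T_f$ is the mean of the initial temperatures weighted by $c_i m_i$.

$$
c_1 m_1 T_1 + c_2 m_2 T_2 = (c_1 m_1 + c_2 m_2)\,T_f
\;\Longrightarrow\;
T_f = \frac{c_1 m_1 T_1 + c_2 m_2 T_2}{c_1 m_1 + c_2 m_2}
$$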

Temporal Laws of Ecological Behavior (Todorovski & Dzeroski, 1997)

Input:

time      phyt      zoo      phosp      temp
time_1    phyt_1    zoo_1    phosp_1    temp_1
time_2    phyt_2    zoo_2    phosp_2    temp_2
 . . .     . . .     . . .    . . .      . . .
time_m    phyt_m    zoo_m    phosp_m    temp_m

Input: a context-free grammar of domain constraints

Output: $phyt = c_1 \cdot phyt \cdot \frac{phosp}{c_2 + phosp} - c_3 \cdot phyt \cdot \ldots$
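To illustrate how a context-free grammar constrains the space of candidate equations, the sketch below enumerates every right-hand side a small grammar can derive within a depth bound. The grammar is a hypothetical toy built from the variables in the input table, not the grammar Todorovski and Dzeroski used, and each `c` stands for a constant to be estimated. A system of this kind would then fit those constants to the measured time series and rank the candidates; that step is omitted here.

```python
from itertools import product

# Hypothetical toy grammar: each nonterminal maps to a list of alternatives;
# any symbol that is not a key is a terminal. Each "c" is a constant to fit later.
GRAMMAR = {
    "Expr": [["Term"], ["Term", "+", "Term"], ["Term", "-", "Term"]],
    "Term": [["c", "*", "phyt"], ["c", "*", "phyt", "*", "Factor"]],
    "Factor": [["phosp", "/", "(", "c", "+", "phosp", ")"], ["zoo"], ["temp"]],
}


def expand(symbol, depth):
    """All token sequences derivable from symbol with derivation trees at most depth deep."""
    if symbol not in GRAMMAR:
        return [[symbol]]          # terminal symbol
    if depth == 0:
        return []                  # bound reached before the derivation could finish
    derivations = []
    for alternative in GRAMMAR[symbol]:
        choices = [expand(s, depth - 1) for s in alternative]
        # One derivation per combination of sub-derivations.
        for combo in product(*choices):
            derivations.append([tok for part in combo for tok in part])
    return derivations


candidates = expand("Expr", 3)
print(len(candidates), "candidate structures")
for tokens in candidates[:3]:
    print(" ".join(tokens))
```

Raising the depth bound or adding productions grows the candidate set quickly, which is why domain constraints expressed in the grammar matter.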

Formulating Structural Models

Statement of the task:

• Given: Qualitative or numeric empirical laws that describe observed phenomena.
• Find: Explanatory models of these phenomena in terms of component objects and their relations.

Historical examples:

• Dalton’s and Avogadro’s molecular models of chemicals
• Mendel’s genetic model of inherited traits
• Quark models of elementary particles
• Structural models of planets, comets, and stars

DALTON on Chemical Reactions

Initial state:
(reacts in {hydrogen oxygen} out {water})
(reacts in {hydrogen nitrogen} out {ammonia})
(reacts in {oxygen nitrogen} out {nitrous oxide}) . . .

Final state:
2 hydrogen + 1 oxygen → 2 water
3 hydrogen + 1 nitrogen → 2 ammonia
2 oxygen + 1 nitrogen → 2 nitrous oxide
hydrogen {h h}      water {h h o}
oxygen {o o}        ammonia {h h h n}
nitrogen {n n}      nitrous oxide {n o o} . . .

DALTON finds these structural models through a depth-first search process constrained by conservation assumptions.
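A DALTON-style search can be sketched as a depth-first enumeration of molecule sizes for the elements, pruned as soon as conservation would require a fractional number of atoms in some product. This is a minimal sketch under assumptions not stated on the slide: the combining coefficients of the final state are taken as given rather than inferred, each element is built from a single atom type, molecules hold at most three atoms, and smaller molecules are tried first. Names such as `dalton` and `REACTIONS` are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Combining coefficients from the final state, taken as given (an assumption):
# (inputs {substance: coefficient}, output substance, output coefficient)
REACTIONS = [
    ({"hydrogen": 2, "oxygen": 1}, "water", 2),
    ({"hydrogen": 3, "nitrogen": 1}, "ammonia", 2),
    ({"oxygen": 2, "nitrogen": 1}, "nitrous oxide", 2),
]
# Assumed: each element consists of a single atom type.
ELEMENTS = {"hydrogen": "h", "oxygen": "o", "nitrogen": "n"}


def product_model(reaction, models):
    """Atoms per product molecule implied by conservation, or None if fractional."""
    inputs, _, out_coeff = reaction
    total = Counter()
    for substance, coeff in inputs.items():
        for atom, count in models[substance].items():
            total[atom] += coeff * count
    atoms = {}
    for atom, count in total.items():
        share = Fraction(count, out_coeff)
        if share.denominator != 1:
            return None
        atoms[atom] = share.numerator
    return atoms


def dalton(max_atoms=3):
    names = list(ELEMENTS)

    def search(i, models):
        derived = dict(models)
        # Prune as soon as any fully specified reaction breaks conservation.
        for reaction in REACTIONS:
            inputs, output, _ = reaction
            if all(s in models for s in inputs):
                atoms = product_model(reaction, models)
                if atoms is None:
                    return None
                derived[output] = atoms
        if i == len(names):
            return derived
        element = names[i]
        for k in range(1, max_atoms + 1):  # simpler molecules first
            found = search(i + 1, {**models, element: {ELEMENTS[element]: k}})
            if found is not None:
                return found
        return None

    return search(0, {})


def show(atoms):
    """Render {'h': 2, 'o': 1} as '{h h o}', atoms in alphabetical order."""
    return "{" + " ".join(a for a in sorted(atoms) for _ in range(atoms[a])) + "}"


models = dalton()
for substance, atoms in models.items():
    print(substance, show(atoms))
```

Single-atom hydrogen fails here because ammonia would need one and a half hydrogen atoms, so the search backtracks to two-atom molecules for each element.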

Constructing Process Models

Statement of the task:

• Given: Qualitative or numeric empirical laws that describe temporal phenomena.
• Find: Explanatory models of these phenomena in terms of processes among component objects.

Historical examples:

• Caloric and kinetic theories of heat phenomena
• Reaction pathways in chemistry and nucleosynthesis
• Models of continental drift and plate tectonics
• Process models of stellar evolution and destruction

ASTRA on Nucleosynthesis

Inputs:
- quantum properties for elements and isotopes
- conservation relations among these properties
- an element to be explained (e.g., O or C)
- elements to be assumed (e.g., H or He)

Outputs:
- elementary reactions that obey conservation laws
- reaction pathways that explain the element’s evolution

ASTRA uses depth-first search to find reaction pathways for:
- proton and neutron captures
- neutron and deuteron production
- generation of helium (He) from hydrogen (H)
- generation of carbon (C) and oxygen (O)
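A toy version of this depth-first pathway search is sketched below. Starting from the assumed nuclei, it tries two-body fusions a + b → c that conserve mass number A and charge Z, keeping only products that appear in a small table. The nucleus table, the restriction to two-body reactions with a single product (no ejected particles), and the depth limit are simplifying assumptions; the slide lists more general quantum properties and conservation relations as ASTRA's inputs, and `find_pathway` is a hypothetical name.

```python
# Hypothetical table of nuclei: name -> (mass number A, charge Z).
NUCLEI = {"4He": (4, 2), "8Be": (8, 4), "12C": (12, 6), "16O": (16, 8)}
BY_AZ = {az: name for name, az in NUCLEI.items()}


def fusions(available):
    """Two-body fusions a + b -> c that conserve A and Z and yield a new nucleus."""
    names = sorted(available, key=lambda n: NUCLEI[n])
    for i, a in enumerate(names):
        for b in names[i:]:
            az = (NUCLEI[a][0] + NUCLEI[b][0], NUCLEI[a][1] + NUCLEI[b][1])
            c = BY_AZ.get(az)
            if c is not None and c not in available:
                yield a, b, c


def find_pathway(target, available, depth=4):
    """Depth-first search for a reaction sequence that produces target, or None."""
    if target in available:
        return []
    if depth == 0:
        return None
    for a, b, c in fusions(available):
        rest = find_pathway(target, available | {c}, depth - 1)
        if rest is not None:
            return [f"{a} + {b} -> {c}"] + rest
    return None


print(find_pathway("12C", {"4He"}))
```

Starting from helium alone, the sketch recovers the two-step route through beryllium-8 to carbon-12; asking for 16O extends that route by one more helium capture.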

Three Pathways for Carbon Synthesis

Standard pathway:
${}^{4}\mathrm{He} + {}^{4}\mathrm{He} \rightarrow {}^{8}\mathrm{Be}$
${}^{4}\mathrm{He} + {}^{8}\mathrm{Be} \rightarrow {}^{12}\mathrm{C}$

Alternative pathways:
${}^{4}\mathrm{He} + \mathrm{D} \rightarrow {}^{6}\mathrm{Li}$
${}^{3}\mathrm{He} + {}^{6}\mathrm{Li} \rightarrow {}^{9}\mathrm{Be}$
${}^{4}\mathrm{He} + {}^{9}\mathrm{Be} \rightarrow {}^{12}\mathrm{C} + \mathrm{n}$

${}^{4}\mathrm{He} + \mathrm{D} \rightarrow {}^{6}\mathrm{Li}$
${}^{4}\mathrm{He} + {}^{6}\mathrm{Li} \rightarrow {}^{10}\mathrm{Be}$
${}^{4}\mathrm{He} + {}^{10}\mathrm{Be} \rightarrow {}^{12}\mathrm{C} + \mathrm{D}$

ASTRA generates many pathways novel to astrophysics, some of which have viable reaction rates.

Proposed Research

We plan to develop and evaluate discovery methods that:

• are designed to process temporal and structured data
• use techniques from computational scientific discovery
• describe new knowledge in a communicable form

We will apply our methods to domains that benefit from such communicable representations.

Likely notations for the discovered knowledge include:

• structural models of relations among entities
• process models of change over time
• sets of simultaneous differential equations

Benefits of the Approach

Unlike most previous work on data mining and knowledge discovery, our methods will:

• support discoveries in domains that involve complex spatial, temporal, or relational data
• use domain knowledge to retain only those discoveries that are interesting and novel to the domain user
• present the new knowledge in an understandable notation that can be communicated among humans

Such techniques will improve the way we manipulate and understand complex data.