The Computational Discovery of Communicable Knowledge
Pat Langley
Computational Learning Laboratory
Center for the Study of Language and Information
Stanford University, Stanford, CA 94304
http://hypatia.stanford.edu/cll/
[email protected]@csli.stanford.edu
Also affiliated with the DaimlerChrysler Research & Technology Center and the Institute for the Study of Learning and Expertise.
The Problem and the Potential
Our society is collecting increasing amounts of data in commercial and scientific domains.
These include complex spatial/temporal data sets like:
• traces of traffic behavior from GPS and cell phones
• prices of stocks and currencies from exchanges
• measurements of climate and ecosystem variables
Computational techniques should let us find relations in these data that are useful for business and society.
Drawbacks of Current Approaches
The fields of machine learning and data mining have developed methods to find regularities in data.
Despite many successful applications, most techniques:
• assume attribute-value representations that cannot handle time or space
• cannot tell interesting discoveries from mundane ones
• state the discovered knowledge in some opaque form
This indicates the need for alternative methods that can address these issues.
Paradigms for Machine Learning
• decision-tree induction
• case-based learning
• induction of logical rules
• probabilistic induction
• neural networks
Paradigms for Scientific Discovery
• taxonomy formation
• equation discovery
• qualitative law discovery
• process model formation
• structural model construction
Discovering Numeric Laws
Statement of the task:
• Given: Quantitative measurements about objects or events in the world.
• Find: Numeric relations that hold among variables that describe these items and that predict future behavior.
Historical examples:
• Kepler’s three laws of planetary motion
• Archimedes’ principle of displacement in water
• Black’s law relating specific heat, mass, and temperature
• Proust’s and Gay-Lussac’s laws of definite proportions
BACON on Kepler’s Third Law
moon   d       p       d/p    $d^2/p$   $d^3/p^2$
A      5.67    1.77    3.20   18.15     58.15
B      8.67    3.57    2.43   21.04     51.06
C      14.00   7.16    1.96   27.40     53.61
D      24.67   16.69   1.48   36.46     53.89
BACON carries out heuristic search through a space of numeric terms, looking for constant values and linear relations.
This example shows the system’s progression from primitive variables (distance and period of Jupiter’s moons) to a complex term that has a nearly constant value.
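The progression from d and p to a nearly constant term can be sketched in a few lines of Python. This is a simplified illustration, not BACON itself: it assumes two heuristics (terms that rise together suggest a ratio, terms that move in opposite directions suggest a product) and a hypothetical 10% tolerance for "nearly constant". The function names and the tolerance are assumptions made for this sketch.

```python
from itertools import combinations

# Distances d and periods p for Jupiter's moons A-D, taken from this slide.
D = [5.67, 8.67, 14.00, 24.67]
P = [1.77, 3.57, 7.16, 16.69]


def nearly_constant(values, tolerance=0.10):
    """True if every value lies within `tolerance` (relative) of the mean."""
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= tolerance * abs(mean) for v in values)


def trend(xs, ys):
    """Return +1 if ys rises as xs rises, -1 if it falls, 0 otherwise."""
    ordered = [y for _, y in sorted(zip(xs, ys))]
    steps = [b - a for a, b in zip(ordered, ordered[1:])]
    if all(s > 0 for s in steps):
        return 1
    if all(s < 0 for s in steps):
        return -1
    return 0


def find_constant_term(d, p, max_rounds=5):
    """Search for exponents (a, b) such that d**a * p**b is nearly constant."""
    terms = {(1, 0): d, (0, 1): p}  # exponents of (d, p) -> observed values
    for _ in range(max_rounds):
        added = False
        for e1, e2 in combinations(list(terms), 2):
            direction = trend(terms[e1], terms[e2])
            if direction == 0:
                continue
            # Rising together suggests a ratio; opposite trends suggest a product.
            sign = -1 if direction > 0 else 1
            new = (e1[0] + sign * e2[0], e1[1] + sign * e2[1])
            inverse = (-new[0], -new[1])
            if new == (0, 0) or new in terms or inverse in terms:
                continue
            values = [x ** new[0] * y ** new[1] for x, y in zip(d, p)]
            if nearly_constant(values):
                return new, values
            terms[new] = values
            added = True
        if not added:
            break
    return None


result = find_constant_term(D, P)
```

On these data the sketch defines d/p, then $d^2/p$, and stops at the exponents (3, -2), that is $d^3/p^2$, which matches the last column of the table.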
Some Laws Discovered by BACON
Basic numeric relations:
• Ideal gas law: $PV = aNT + bN$
• Kepler’s third law: $D^3 = [(A - k) / t]^2 = j$
• Coulomb’s law: $FD^2 / Q_1 Q_2 = c$
• Ohm’s law: $TD^2 / (LI - rI) = r$
Relations with intrinsic properties:
• Snell’s law of refraction: $\sin I / \sin R = n_1 / n_2$
• Archimedes’ law: $C = V + i$
• Momentum conservation: $m_1 V_1 = m_2 V_2$
• Black’s specific heat law: $c_1 m_1 T_1 + c_2 m_2 T_2 = (c_1 m_1 + c_2 m_2) T_f$
Temporal Laws of Ecological Behavior (Todorovski & Dzeroski, 1997)
Input:
time      phyt      zoo      phosp      temp
time_1    phyt_1    zoo_1    phosp_1    temp_1
time_2    phyt_2    zoo_2    phosp_2    temp_2
. . .     . . .     . . .    . . .      . . .
time_m    phyt_m    zoo_m    phosp_m    temp_m
Input: a context-free grammar of domain constraints
Output: $\dot{\mathit{phyt}} = c_1 \cdot \mathit{phyt} \cdot \frac{\mathit{phosp}}{c_2 + \mathit{phosp}} - c_3 \cdot \mathit{phyt}$
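One way to picture how a context-free grammar constrains equation discovery is to enumerate the equation structures it can generate. The sketch below uses a toy grammar invented for illustration; it is not the grammar used by Todorovski and Dzeroski, and the nonterminal names (`Rate`, `Growth`, `Limit`, `Loss`) are assumptions. Each `const` stands for a parameter that a fitting step would estimate from the time series.

```python
from itertools import product

# Hypothetical toy grammar: keys are nonterminals, everything else is terminal.
GRAMMAR = {
    "Rate": [["Growth", "-", "Loss"]],
    "Growth": [["const", "*", "phyt", "*", "Limit"]],
    "Limit": [["phosp / (const + phosp)"], ["temp / (const + temp)"]],
    "Loss": [["const", "*", "phyt"], ["const", "*", "phyt", "*", "zoo"]],
}


def expand(symbol, grammar):
    """Return every terminal string derivable from `symbol`.

    The toy grammar is non-recursive, so the enumeration always terminates.
    """
    if symbol not in grammar:
        return [symbol]
    sentences = []
    for production in grammar[symbol]:
        choices = [expand(part, grammar) for part in production]
        sentences.extend(" ".join(combo) for combo in product(*choices))
    return sentences


# Candidate right-hand sides for the rate of change of phyt.
candidates = expand("Rate", GRAMMAR)
for rhs in candidates:
    print("d(phyt)/dt =", rhs)
```

The first candidate has the same structure as the output equation on this slide. A discovery system would fit the constants of each candidate to the data and keep the structure with the lowest error.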
Formulating Structural Models
Statement of the task:
• Given: Qualitative or numeric empirical laws that describe observed phenomena.
• Find: Explanatory models of these phenomena in terms of component objects and their relations.
Historical examples:
• Dalton’s and Avogadro’s molecular models of chemicals
• Mendel’s genetic model of inherited traits
• Quark models of elementary particles
• Structural models of planets, comets, and stars
DALTON on Chemical Reactions
Initial state:
(reacts in {hydrogen oxygen} out {water})
(reacts in {hydrogen nitrogen} out {ammonia})
(reacts in {oxygen nitrogen} out {nitrous oxide}) . . .
Final state:
2 hydrogen + 1 oxygen → 2 water
3 hydrogen + 1 nitrogen → 2 ammonia
2 oxygen + 1 nitrogen → 2 nitrous oxide
hydrogen → {h h}    water → {h h o}
oxygen → {o o}      ammonia → {h h h n}
nitrogen → {n n}    nitrous oxide → {n o o} . . .
DALTON finds these structural models through a depth-first search process constrained by conservation assumptions.
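A depth-first search constrained by conservation can be sketched as follows. This is an illustrative simplification, not DALTON's actual procedure: it assumes the reaction coefficients from the final state are already known, assumes each element's molecule contains only its own atoms, and searches only over the number of atoms per elemental molecule. The names `REACTIONS`, `compound_models`, and `dalton_search` and the bound `max_atoms` are hypothetical.

```python
# Assumed input: ({element: units consumed}, (compound, units produced)).
REACTIONS = [
    ({"hydrogen": 2, "oxygen": 1}, ("water", 2)),
    ({"hydrogen": 3, "nitrogen": 1}, ("ammonia", 2)),
    ({"oxygen": 2, "nitrogen": 1}, ("nitrous oxide", 2)),
]
ELEMENTS = ["hydrogen", "oxygen", "nitrogen"]


def compound_models(atoms_per_molecule, reactions):
    """Derive compound compositions, or None if atoms cannot be conserved.

    Only reactions whose elements all have an assigned size are checked,
    which lets the search prune partial assignments early.
    """
    models = {}
    for inputs, (compound, units_out) in reactions:
        if not all(element in atoms_per_molecule for element in inputs):
            continue
        composition = {}
        for element, units_in in inputs.items():
            total_atoms = units_in * atoms_per_molecule[element]
            if total_atoms % units_out:
                return None  # atoms would have to be split: reject
            composition[element[0]] = total_atoms // units_out
        models[compound] = composition
    return models


def dalton_search(elements, reactions, max_atoms=3):
    """Depth-first search over atoms per elemental molecule."""

    def dfs(index, assignment):
        models = compound_models(assignment, reactions)
        if models is None:
            return None  # conservation violated: backtrack
        if index == len(elements):
            return assignment, models
        for count in range(1, max_atoms + 1):
            found = dfs(index + 1, {**assignment, elements[index]: count})
            if found is not None:
                return found
        return None

    return dfs(0, {})


sizes, models = dalton_search(ELEMENTS, REACTIONS)
```

With these inputs the search rejects monatomic hydrogen, because three hydrogen atoms cannot be divided between two ammonia molecules, and settles on two atoms per molecule for each element, giving water as two h and one o.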
Constructing Process Models
Statement of the task:
• Given: Qualitative or numeric empirical laws that describe temporal phenomena.
• Find: Explanatory models of these phenomena in terms of processes among component objects.
Historical examples:
• Caloric and kinetic theories of heat phenomena
• Reaction pathways in chemistry and nucleosynthesis
• Models of continental drift and plate tectonics
• Process models of stellar evolution and destruction
ASTRA on Nucleosynthesis
Inputs:
- quantum properties for elements and isotopes
- conservation relations among these properties
- an element to be explained (e.g., O or C)
- elements to be assumed (e.g., H or He)
Outputs:
- elementary reactions that obey conservation laws
- reaction pathways that explain the element’s evolution
ASTRA uses depth-first search to find reaction pathways for:
- proton and neutron captures
- neutron and deuteron production
- generation of helium (He) from hydrogen (H)
- generation of carbon (C) and oxygen (O)
Three Pathways for Carbon Synthesis
Standard pathway:
${}^{4}\mathrm{He} + {}^{4}\mathrm{He} \rightarrow {}^{8}\mathrm{Be}$
${}^{4}\mathrm{He} + {}^{8}\mathrm{Be} \rightarrow {}^{12}\mathrm{C}$
Alternative pathways:
${}^{4}\mathrm{He} + \mathrm{D} \rightarrow {}^{6}\mathrm{Li}$
${}^{3}\mathrm{He} + {}^{6}\mathrm{Li} \rightarrow {}^{9}\mathrm{Be}$
${}^{4}\mathrm{He} + {}^{9}\mathrm{Be} \rightarrow {}^{12}\mathrm{C} + \mathrm{n}$

${}^{4}\mathrm{He} + \mathrm{D} \rightarrow {}^{6}\mathrm{Li}$
${}^{4}\mathrm{He} + {}^{6}\mathrm{Li} \rightarrow {}^{10}\mathrm{Be}$
${}^{4}\mathrm{He} + {}^{10}\mathrm{Be} \rightarrow {}^{12}\mathrm{C} + \mathrm{D}$
ASTRA generates many pathways novel to astrophysics, some of which have viable reaction rates.
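The pathway search can be illustrated with a small backward-chaining, depth-first sketch. It is a simplification under stated assumptions: the elementary reactions are copied from this slide rather than generated, the only conservation check is on mass number (a stand-in for the conservation relations ASTRA takes as input), and ${}^{4}\mathrm{He}$, ${}^{3}\mathrm{He}$, and D are assumed to be available. The names `REACTIONS`, `MASS`, `conserves_mass`, and `find_pathways` are hypothetical.

```python
# Reactions transcribed from this slide: (reactants, products).
REACTIONS = [
    (("4He", "4He"), ("8Be",)),
    (("4He", "8Be"), ("12C",)),
    (("4He", "D"), ("6Li",)),
    (("3He", "6Li"), ("9Be",)),
    (("4He", "9Be"), ("12C", "n")),
    (("4He", "6Li"), ("10Be",)),
    (("4He", "10Be"), ("12C", "D")),
]

# Mass numbers of the species that appear above.
MASS = {"n": 1, "D": 2, "3He": 3, "4He": 4, "6Li": 6,
        "8Be": 8, "9Be": 9, "10Be": 10, "12C": 12}


def conserves_mass(reaction):
    """Check that total mass number is equal on both sides."""
    reactants, products = reaction
    return sum(MASS[s] for s in reactants) == sum(MASS[s] for s in products)


def find_pathways(target, assumed, reactions, visiting=frozenset()):
    """Depth-first, backward-chaining search for ways to produce `target`."""
    if target in assumed:
        return [[]]  # nothing to explain
    if target in visiting:
        return []  # avoid circular explanations
    pathways = []
    for reaction in reactions:
        reactants, products = reaction
        if target not in products or not conserves_mass(reaction):
            continue
        partial = [[]]
        for species in reactants:
            sub = find_pathways(species, assumed, reactions, visiting | {target})
            # Merge sub-pathways, skipping steps that are already present.
            partial = [p + [r for r in s if r not in p]
                       for p in partial for s in sub]
        pathways.extend(p + [reaction] for p in partial)
    return pathways


carbon_paths = find_pathways("12C", {"4He", "3He", "D"}, REACTIONS)
```

Run on these seven reactions, the sketch recovers the standard pathway and the two alternatives listed on this slide, each as an ordered list of reaction steps.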
Proposed Research
We plan to develop and evaluate discovery methods that:
• are designed to process temporal and structured data
• use techniques from computational scientific discovery
• describe new knowledge in a communicable form
We will apply our methods to domains that benefit from such communicable representations.
Likely notations for the discovered knowledge include:
• structural models of relations among entities
• process models of change over time
• sets of simultaneous differential equations
Benefits of the Approach
Unlike most previous work on data mining and knowledge discovery, our methods will:
• support discoveries in domains that involve complex spatial, temporal, or relational data
• use domain knowledge to filter only discoveries that are interesting and novel to the domain user
• present the new knowledge in some understandable notation that can be communicated among humans
Such techniques will improve the way we manipulate and understand complex data.