Machine Learning for Understanding and Managing Ecosystems


  • Machine Learning for Understanding and Managing Ecosystems

    Tom Dietterich, Oregon State University

    In collaboration with:
    Postdocs: Dan Sheldon (now at UMass Amherst), Mark Crowley (now at U. Waterloo)
    Graduate Students: Majid Taleghan, Kim Hall, Liping Liu, Akshat Kumar, Tao Sun, Rachel Houtman, Sean McGregor, Hailey Buckingham
    Economists: H. Jo Albers, Claire Montgomery
    Cornell Lab of Ornithology: Steve Kelling, Daniel Fink, Andrew Farnsworth, Wes Hochachka, Benjamin Van Doren, Kevin Webb

    1 IBM Cognitive Computing

  • The World Faces Many Sustainability Challenges

    Species extinctions
    Invasive species
    Effects of climate change on these

  • Computational Sustainability

    The study of computational methods that can contribute to the sustainable management of the Earth's ecosystems

    [Figure: a cycle from Data to Models to Policies, with the stages Data Acquisition, Data Integration, Data Interpretation, Model Fitting, Policy Optimization, and Policy Execution]

  • Outline: Three Projects at Oregon State

    Models of Bird Migration: Collective Graphical Models

    Policy Optimization: Controlling Invasive Species; Managing Wildland Fire

    [Figure: the Data, Models, Policies cycle (Data Acquisition, Data Integration, Data Interpretation, Model Fitting, Policy Optimization, Policy Execution)]

  • BirdCast Project: Understanding Bird Migration

    Goals:
    Develop a scientific model of bird migration
    Produce 24- and 48-hour bird migration forecasts

    Understanding bird decision making, in terms of:
    Absolute timing (e.g., based on day length)
    Temperature
    Wind speed and direction
    Relative humidity
    Food availability

  • Data (1): www.ebird.org

    Volunteer bird watchers submit stationary counts and travelling counts
    Each checklist records time, place, duration, and distance travelled, plus the species seen
    8,000-12,000 checklists uploaded per day

  • Data (2): Doppler Weather Radar

    Radar detects:
    Weather (removed)
    Smoke, dust, and insects (removed)
    Birds and bats

  • Data (3): Acoustic Monitoring

    Night flight calls: people can identify species or species groups from these calls

  • Modeling Goal: Spatial Hidden Markov Model

    Define a grid over the US
    Consider a single bird. We say the bird is in state $X_t = i$ on day $t$ if it is located inside cell $i$ on that day
    Let $P_\theta(X_{t+1} = j \mid X_t = i)$ be the probability that the bird will fly from cell $i$ to cell $j$ on the night from day $t$ to day $t + 1$
    We will represent this probability in terms of variables such as wind speed and direction, distance from $i$ to $j$, air temperature, relative humidity, day of the year, etc.
    Let $\theta$ be the coefficients of the probability model
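    The slides do not state the functional form of $P_\theta$. A minimal sketch, assuming a multinomial-logit (softmax) model in which each destination cell $j$ gets a linear score $\theta \cdot \phi(i, j)$ from a feature vector $\phi$; the two features used here (a hypothetical "stay put" indicator and negative distance) and all numbers are invented for illustration.

```python
import numpy as np

def transition_probs(theta, features):
    """Softmax transition model (an assumed form, not taken from the slides).

    theta:    coefficient vector, shape (K,)
    features: shape (C, C, K); features[i, j] holds covariates for a flight i -> j
    Returns a (C, C) matrix whose row i is P(X_{t+1} = j | X_t = i).
    """
    scores = features @ theta                              # (C, C) linear scores
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical example: 4 cells at random locations, 2 features per flight.
rng = np.random.default_rng(0)
C = 4
coords = rng.uniform(0, 10, size=(C, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
features = np.stack([np.eye(C), -dist], axis=2)   # [stay-put indicator, -distance]
theta = np.array([1.5, 0.3])                      # hypothetical coefficients
P = transition_probs(theta, features)
print(P.round(3))
```

    Fitting the model then means choosing $\theta$ so that simulated movements match the observations.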

  • Simulating the Migration of a Single Bird

    Assume we know the value of $\theta$
    The bird starts in cell 4 at time $t = 1$: $P(X_1 = 4) = 1$
    Simulate the first night by drawing a cell according to $P_\theta(X_2 \mid X_1 = 4)$ (rolling a die)
    Repeat this for $T$ time steps
    If we had enough bird watchers, we could map out the trajectory of the bird
    Then we could match that against our simulated trajectory and adjust $\theta$ until the simulations matched the observed behavior
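    The simulation loop above can be sketched as follows, assuming the transition probabilities are already available as a matrix `P` whose row `i` is $P_\theta(X_{t+1} \mid X_t = i)$. The 6-cell chain is hypothetical, and cells are indexed from 0.

```python
import numpy as np

def simulate_bird(P, start_cell, T, rng):
    """Sample a trajectory x_1, ..., x_T from a Markov chain with transition matrix P."""
    traj = [start_cell]                      # P(x_1 = start_cell) = 1
    for _ in range(T - 1):
        current = traj[-1]
        # "Roll a die" weighted by row `current` of P to pick the next cell.
        traj.append(int(rng.choice(len(P), p=P[current])))
    return traj

# Hypothetical 6-cell chain: each night the bird stays put with probability 0.5
# or moves one cell north (index + 1) with probability 0.5; the last cell absorbs.
C = 6
P = np.zeros((C, C))
for i in range(C - 1):
    P[i, i] = 0.5
    P[i, i + 1] = 0.5
P[C - 1, C - 1] = 1.0

rng = np.random.default_rng(42)
trajectory = simulate_bird(P, start_cell=4, T=10, rng=rng)
print(trajectory)
```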


  • Population of Birds

    Consider a population of birds. The state of this population is a vector $n_t$ such that $n_t(i)$ is the number of birds in cell $i$ on day $t$
    We can simulate all of these birds moving simultaneously: each bird rolls a die every night to decide where to go
    If we have enough bird watchers, we can get a good estimate of $n_t$ every day
    We can compare our simulations against the observations and adjust $\theta$ until they match
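    If the birds move independently, the birds leaving cell $i$ split among destinations according to a multinomial distribution with $n_t(i)$ trials and probabilities given by row $i$ of the transition matrix. A minimal sketch of simulating the whole population this way, which avoids looping over individual birds; the 3-cell matrix and the 1,000 starting birds are hypothetical.

```python
import numpy as np

def step_population(n_t, P, rng):
    """Advance the cell counts n_t by one night.

    Returns the edge counts n_{t,t+1}(i, j) (birds flying from i to j)
    and the new cell counts n_{t+1}.
    """
    C = len(n_t)
    edge_counts = np.zeros((C, C), dtype=int)
    for i in range(C):
        # The n_t[i] birds in cell i choose destinations independently,
        # so their destination counts are multinomial.
        edge_counts[i] = rng.multinomial(n_t[i], P[i])
    n_next = edge_counts.sum(axis=0)   # birds arriving in each cell
    return edge_counts, n_next

# Hypothetical 3-cell chain, 1,000 birds starting in cell 0.
P = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
n = np.array([1000, 0, 0])
history = [n]
for night in range(5):
    edges, n = step_population(n, P, rng)
    history.append(n)
print(history[-1])
```

    The total number of birds is conserved at every step, and each row of `edges` sums to the previous day's count in that cell.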

  • This is very slow

    Computer Science to the rescue:
    Formulate the problem mathematically; the formalism is called the Collective Graphical Model (CGM)
    Develop algorithms for probabilistic inference
    Use these algorithms to fit the model to the observations
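    For reference, a statement of the collective model, assuming birds that move independently with transition probabilities $p_{ij} = P_\theta(X_{t+1} = j \mid X_t = i)$. The sufficient statistics are the cell counts $n_t(i)$ and the flow counts $n_{t,t+1}(i, j)$, the number of birds that fly from $i$ to $j$ on night $t$. Each row of flow counts is multinomial, and the flows must agree with the cell counts on both days:

```latex
P\left(n_{t,t+1} \mid n_t\right)
  = \prod_{i} \frac{n_t(i)!}{\prod_{j} n_{t,t+1}(i,j)!} \prod_{j} p_{ij}^{\,n_{t,t+1}(i,j)},
\qquad
\sum_{j} n_{t,t+1}(i,j) = n_t(i),
\qquad
n_{t+1}(j) = \sum_{i} n_{t,t+1}(i,j).
```

    Inference over these counts, rather than over individual trajectories, is what the algorithms on the following slides address.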


  • Probabilistic Inference for CGMs

    [Figure: results on grids of 16 cells and 49 cells]

    Gibbs sampler + Markov basis [Sheldon, Dietterich, NIPS 2011]
    Convex optimization [Sheldon, Sun, Kumar, ICML 2013]
    Asymptotic Gaussian approximation [Liu, Sheldon, Dietterich, ICML 2014]
    Non-linear belief propagation [Sun, Sheldon, Kumar, ICML 2015]
    Proximal algorithm [Vilnis, Belanger, Sheldon, McCallum, UAI 2015]

  • Initial Results: Ruby-throated Hummingbird

  • Need to Constrain the Model

    Problem: the migration model tends to "store" birds in Canada. There are no observations there, so the model is not constrained by the data

    Solution: constrain the model. Specify the times and places where the CGM is allowed to have birds
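    One simple way to impose such a constraint, assuming it is expressed as a per-day set of cells that may hold birds, is to mask the transition matrix and renormalize each row. The slides do not say how the constraint was implemented in the fitted CGM; the function name and the 3-cell example are hypothetical.

```python
import numpy as np

def constrain_transitions(P, allowed_next):
    """Forbid moves into disallowed cells and renormalize each row.

    P:            (C, C) transition matrix for one night
    allowed_next: boolean vector; allowed_next[j] is False if the model may not
                  place birds in cell j on the next day
    """
    masked = P * allowed_next[None, :]
    row_sums = masked.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("some cell has no allowed destination")
    return masked / row_sums

# Hypothetical: cell 2 (say, a northern cell) is closed on this date.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
allowed = np.array([True, True, False])
P_constrained = constrain_transitions(P, allowed)
print(P_constrained[0])
```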

  • Constrained Results: Ruby-throated Hummingbird

  • Fitted Transition Parameters

    Distance and direction traveled: northness 0.4808; distance 0.1895; stayput 3.5058
    time 0.5217; temperature 0.1556; wind profit 0.2754

  • Next Steps: Integrating Multiple Data Sources

    [Figure: graphical model linking the bird transitions $n_{t,t+1}$ to eBird, acoustic, and radar observations]

  • Outline: Three Projects at Oregon State

    Models of Bird Migration: Collective Graphical Models

    Policy Optimization: Controlling Invasive Species; Managing Wildland Fire

  • Invasive Species Management in River Networks

    Tamarisk: an invasive tree from the Middle East
    Out-competes native vegetation for water
    Reduces biodiversity

    What is the best way to manage a spatially-spreading organism?

  • Mathematical Model

    Tree-structured river network
    Each segment has sites where a tree can grow
    Each site can be {empty, occupied by native, occupied by invasive}

    Management actions for each segment: {do nothing, eradicate, restore, eradicate+restore}

    [Figure: river network with segments numbered 1 to n]
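    A quick count shows why exact methods struggle. Assuming every site is tracked individually and each segment's action is chosen independently (before the budget constraint removes unaffordable combinations), $E$ segments with $H$ sites each give $3^{EH}$ joint states and $4^{E}$ joint actions. The names `num_segments` and `sites_per_segment` and the tiny example network are hypothetical.

```python
def tamarisk_sizes(num_segments, sites_per_segment):
    """Count joint states and joint actions of the river-network MDP."""
    site_states = 3      # {empty, native, invasive}
    segment_actions = 4  # {do nothing, eradicate, restore, eradicate+restore}
    num_states = site_states ** (num_segments * sites_per_segment)
    num_actions = segment_actions ** num_segments
    return num_states, num_actions

# Hypothetical network: 3 segments with 2 sites each.
states, actions = tamarisk_sizes(3, 2)
print(states, actions)   # 729 64
```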


  • Dynamics and Objective

    Dynamics: in each time period
    Natural death
    Seed production
    Seed dispersal (preferentially downstream)
    Seed competition to become established

    Spatial spread couples all edges, so exact inference is intractable

    Objective: minimize expected discounted costs (the sum of the cost of invasion and the cost of management), subject to an annual budget constraint

    [Figure: river network with segments numbered 1 to n]
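    The four steps of each time period can be sketched as a one-step simulator. This is a hypothetical illustration, not the simulator used in the project: all rates are invented defaults, dispersal uses expected seed counts rather than random seeds, and "upstream" means the segments whose `parent` is the current segment.

```python
import numpy as np

EMPTY, NATIVE, INVASIVE = 0, 1, 2

def step(sites, parent, rng, death=(0.0, 0.2, 0.2), seeds_per_tree=(0.0, 5.0, 10.0),
         p_down=0.5, p_up=0.1, p_establish=0.6):
    """One time period of a hypothetical river-network invasion model.

    sites:  int array (E, H); sites[e, h] is EMPTY, NATIVE, or INVASIVE
    parent: parent[e] is the segment immediately downstream of e (-1 at the outlet)
    All rates are illustrative defaults, not fitted values.
    """
    E, H = sites.shape
    sites = sites.copy()
    n_children = np.bincount(parent[parent >= 0], minlength=E)

    # 1. Natural death: each occupied site empties with a type-specific probability.
    dies = rng.random((E, H)) < np.take(death, sites)
    sites[dies] = EMPTY

    # 2. Seed production: seeds per segment for (native, invasive).
    produced = np.stack([(sites == NATIVE).sum(axis=1) * seeds_per_tree[NATIVE],
                         (sites == INVASIVE).sum(axis=1) * seeds_per_tree[INVASIVE]], axis=1)

    # 3. Dispersal (expected seed counts): by default half moves one segment
    #    downstream, 10% is split among upstream neighbours, and the rest stays.
    #    Seeds that would leave the network are lost.
    arriving = produced * (1.0 - p_down - p_up)
    for e in range(E):
        p = parent[e]
        if p >= 0:
            arriving[p] += p_down * produced[e]
            arriving[e] += p_up * produced[p] / n_children[p]

    # 4. Competition: an empty site becomes established with probability
    #    p_establish, by native or invasive in proportion to arriving seeds.
    for e in range(E):
        total = arriving[e].sum()
        if total == 0:
            continue
        for h in range(H):
            if sites[e, h] == EMPTY and rng.random() < p_establish:
                native_share = arriving[e, 0] / total
                sites[e, h] = NATIVE if rng.random() < native_share else INVASIVE
    return sites

parent = np.array([-1, 0, 0, 1])   # hypothetical tree: segment 0 is the outlet
sites = np.array([[NATIVE, EMPTY],
                  [NATIVE, NATIVE],
                  [INVASIVE, EMPTY],
                  [EMPTY, EMPTY]])
rng = np.random.default_rng(1)
for year in range(5):
    sites = step(sites, parent, rng)
print(sites)
```

    Because dispersal links neighbouring segments, the next state of any segment depends on the whole network, which is the coupling that makes exact inference intractable.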

  • Finding the Optimal Management Policy

    Formalize as a Markov Decision Process
    Solve by Stochastic Dynamic Programming (SDP)
    SDP requires the transition matrix $P(s, a, s') = P(s' \mid s, a)$
    We don't know $P$
    Solution: write a simulator, and draw Monte Carlo samples from it to estimate $P(s, a, s')$
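    The estimate-then-solve step can be written out as follows. This is a sketch assuming a finite state space, a discount factor $\gamma$, and costs $R(s, a, s')$ to be minimized; $N(s, a)$ counts the simulator calls made on $(s, a)$ and $N(s, a, s')$ counts those that returned $s'$. The annual budget constraint would restrict the minimum to affordable actions.

```latex
\hat{P}(s' \mid s, a) = \frac{N(s, a, s')}{N(s, a)},
\qquad
V(s) = \min_{a} \sum_{s'} \hat{P}(s' \mid s, a) \bigl[ R(s, a, s') + \gamma \, V(s') \bigr]
```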

  • Solving the Tamarisk MDP using Monte Carlo Samples

    Repeat:
    Choose a state $s$ and management action $a$ (guided by the current policy)
    Invoke the simulator on $(s, a)$: $s'$ is the resulting state, and $r$ is the cost of the action and the resulting state
    Update our model of $P(s' \mid s, a)$
    Apply stochastic dynamic programming to compute an improved policy
    Until the policy has converged

    Key question: which $(s, a)$ should we choose? Our answer: the DDV heuristic
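    The loop can be sketched on a toy problem. Everything here is hypothetical: a one-site simulator (state 0 = uninvaded, 1 = invaded; action 0 = do nothing, 1 = treat), invented costs, and a placeholder rule for choosing $(s, a)$ that simply samples the least-visited pair. The slides name the DDV heuristic for that choice but do not describe it, so it is not reproduced here.

```python
import numpy as np

def value_iteration(P_hat, cost, gamma=0.9, iters=500):
    """Stochastic dynamic programming (cost minimization) on an estimated model."""
    S = P_hat.shape[0]
    V = np.zeros(S)
    for _ in range(iters):
        # Q[s, a] = sum_s' P_hat(s'|s,a) * (cost(s,a,s') + gamma * V(s'))
        Q = (P_hat * (cost + gamma * V[None, None, :])).sum(axis=2)
        V = Q.min(axis=1)
    return Q.argmin(axis=1), V

def plan_with_simulator(simulate, S, A, cost, batch=5, min_samples=50,
                        patience=10, max_rounds=500, seed=0):
    rng = np.random.default_rng(seed)
    counts = np.zeros((S, A, S))              # N(s, a, s')
    policy, V, stable = None, None, 0
    for _ in range(max_rounds):
        n_sa = counts.sum(axis=2)
        # Placeholder rule for choosing (s, a): the least-visited pair (NOT DDV).
        s, a = np.unravel_index(np.argmin(n_sa), n_sa.shape)
        for _ in range(batch):
            counts[s, a, simulate(s, a, rng)] += 1
        n_sa = counts.sum(axis=2, keepdims=True)
        # Update the model of P; unvisited pairs get a uniform guess.
        P_hat = np.where(n_sa > 0, counts / np.maximum(n_sa, 1), 1.0 / S)
        new_policy, V = value_iteration(P_hat, cost)
        stable = stable + 1 if policy is not None and np.array_equal(new_policy, policy) else 0
        policy = new_policy
        # "Converged": policy unchanged for `patience` rounds, every pair well sampled.
        if stable >= patience and n_sa.min() >= min_samples:
            break
    return policy, V, counts

def toy_simulator(s, a, rng):
    p_invaded_next = [[0.3, 0.1], [0.9, 0.3]][s][a]
    return int(rng.random() < p_invaded_next)

S, A = 2, 2
cost = np.zeros((S, A, S))
for s in range(S):
    for a in range(A):
        for s_next in range(S):
            cost[s, a, s_next] = 10.0 * s_next + 2.0 * a   # invasion damage + treatment
policy, V, counts = plan_with_simulator(toy_simulator, S, A, cost)
print(policy, V.round(1))
```

    The choice of $(s, a)$ is the only place where a smarter heuristic would enter: every other line is the generic estimate, solve, and repeat cycle described above.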

  • Comparison against best previous Monte Carlo MDP planning method

    [Figure: number of samples (log scale, $10^5$ to $10^7$) for DDV and Fiechter; x-axis label "MDP"]

  • Published Rule of Thumb Policies for Invasive Species Management

    Triage policy: treat the most-invaded edge first; break ties by treating upstream edges first
    Leading edge: eradicate along the leading edge of the invasion
    Chadès et al.: treat the most-upstream invaded edge first; break ties by amount of invasion
    DDV: our PAC solution
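    The triage and Chadès rules are priority orderings over invaded edges, so each can be sketched as a sort key. This assumes a hypothetical `upstream` rank per edge (larger means further from the outlet) and, for the Chadès tie-break, that more-invaded edges come first; the leading-edge rule needs a spatial notion of the invasion front and is not sketched. The `Edge` class and the four example edges are invented.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    name: str
    invaded: float      # amount of invasion, e.g. fraction of sites with tamarisk
    upstream: int       # hypothetical upstream rank: larger = further from the outlet

def invaded_edges(edges):
    return [e for e in edges if e.invaded > 0]

def triage_order(edges):
    """Most-invaded edge first; ties broken by treating upstream edges first."""
    return sorted(invaded_edges(edges), key=lambda e: (-e.invaded, -e.upstream))

def chades_order(edges):
    """Most-upstream invaded edge first; ties broken by amount of invasion (more first)."""
    return sorted(invaded_edges(edges), key=lambda e: (-e.upstream, -e.invaded))

edges = [Edge("A", 0.5, 1), Edge("B", 0.8, 0), Edge("C", 0.5, 2), Edge("D", 0.0, 3)]
print([e.name for e in triage_order(edges)])   # ['B', 'C', 'A']
print([e.name for e in chades_order(edges)])   # ['C', 'A', 'B']
```

    Under a budget, such a rule would treat edges from the head of the list until the annual budget runs out; DDV instead computes the policy from the MDP.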

  • Cost Comparisons: Rule of Thumb Policies vs. DDV

    [Figure: total costs (axis from 0 to 450) of the Triage, Chadès, Leading Edge, and DDV (optimal) policies in the scenario labelled "Large pop, upto down"]

  • Outline: Three Projects at Oregon State

    Models of Bird Migration: Collective Graphical Models

    Policy Optimization: Controlling Invasive Species; Managing Wildland Fire

  • Managing Wildfire in Eastern Oregon

    Natural state:
    Large Ponderosa Pine trees with open understory
    Frequent ground fires that remove understory plants (grasses, shrubs) but do not damage trees

    Fires have been suppressed since the 1920s:
    Heavy accumulation of fuels in the understory
    Large catastrophic fires that kill all trees and damage soils
    Huge firefighting costs and lives lost

  • Study Area: Deschutes National Forest

    Goal: return the landscape to its natural fire regime
    Management question (LET-BURN): when lightning ignites a fire, should we let it burn?

  • Formulating LETBURN as a Markov Decision Process

    State space: 4000 management units, each in one of 25 local states; weather; ignition site
    Action space: at fire ignition time $t$, choose an action $a_t$ (let the fire burn, or suppress it)
    Reward function: $R(s_t, a_t, s_{t+1})$, made up of the cost of lost timber value, the cost of lost species habitat, and the cost of fire suppression

    [Figure: from state $s_t$, an ignition occurs and action $a_t$ is taken; the fire simulator produces the fire outcome, the lightning simulator produces a new ignition, and the process moves to state $s_{t+1}$]
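    The landscape part of the state alone has $25^{4000}$ configurations (before weather and ignition site are added), far too many to enumerate. A short check of its size, using only the two numbers on the slide:

```python
import math

units, local_states = 4000, 25   # from the slide

# Number of decimal digits in 25**4000, the count of landscape configurations.
digits = math.floor(units * math.log10(local_states)) + 1
print(digits)   # 5592

# Exact check with Python's arbitrary-precision integers (this avoids converting
# the huge integer to a string, which recent Python versions limit by default).
assert 10 ** (digits - 1) <= local_states ** units < 10 ** digits
```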

  • The Simulator is Very Expensive

    Simulating one fire can take from 5 to 60 minutes (depending on the size of the fire)
    Components: FARSITE, Forest Vegetation Simulator (FVS), lightning strike model, weather simulator
    Monte Carlo methods require at least $10^6$ simulator calls
    What can we do?
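    The arithmetic behind "very expensive", assuming the calls run one after another and a 365-day year (parallel hardware would divide these figures by the number of simulator instances):

```python
MINUTES_PER_YEAR = 60 * 24 * 365

def serial_years(calls, minutes_per_call):
    """Wall-clock years if the simulator calls run one after another."""
    return calls * minutes_per_call / MINUTES_PER_YEAR

for minutes in (5, 60):   # range quoted on the slide
    print(f"{minutes} min/call: {serial_years(10**6, minutes):.1f} years")
# 5 min/call: 9.5 years
# 60 min/call: 114.2 years
```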

  • Current Strategy: Policy Search using a Surrogate Model

    Define a parameterized space of policies $\pi_\theta$
    Simulate an initial set of 100-year trajectories under a variety of policies
    Apply Bayesian Optimization (SMAC; Hutter et al., 2011) to find the optimal value of $\theta$
    To simulate $\pi_\theta$ for some new $\theta$, apply the Model-Free Monte Carlo algorithm (Fonteneau et al., 2013)
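    The outer loop of policy search can be sketched as follows. This is not SMAC and not Model-Free Monte Carlo: plain random search is swapped in as a simple stand-in optimizer, and `toy_cost` is a hypothetical noisy objective with one policy parameter (imagine a threshold on some fire-danger index below which fires are allowed to burn). The sketch does show why each candidate $\theta$ is costly: evaluating it takes `n_runs` simulator calls.

```python
import numpy as np

def estimate_cost(theta, simulate_cost, n_runs, rng):
    """Monte Carlo estimate of the expected cost of policy pi_theta."""
    return float(np.mean([simulate_cost(theta, rng) for _ in range(n_runs)]))

def random_policy_search(simulate_cost, bounds, n_candidates=50, n_runs=20, seed=0):
    """Stand-in for Bayesian optimization: evaluate random theta, keep the best."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    best_theta, best_cost = None, np.inf
    for _ in range(n_candidates):
        theta = rng.uniform(low, high)
        c = estimate_cost(theta, simulate_cost, n_runs, rng)
        if c < best_cost:
            best_theta, best_cost = theta, c
    return best_theta, best_cost

# Hypothetical noisy objective, minimized near theta = 0.6.
def toy_cost(theta, rng):
    return (theta[0] - 0.6) ** 2 + 0.01 * rng.standard_normal()

theta, cost = random_policy_search(toy_cost, bounds=[(0.0, 1.0)])
print(theta, cost)
```

    A Bayesian optimizer replaces the random proposals with candidates chosen from a surrogate model of the cost surface, which is what reduces the number of expensive evaluations.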

  • A Simpler Problem: LETBURN one year

    Is there any benefit to allowing fires to burn for just one year?
    Year 1: LETBURN; years 2-100: SUPPRESS ALL
    Evaluate via Monte Carlo trials
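    A minimal sketch of such a Monte Carlo comparison. The paired design (giving both policies identically seeded random generators, i.e. common random numbers) is an assumption, since the slide says only "Monte Carlo trials", and `toy_total_cost` is a hypothetical stand-in unrelated to FARSITE or FVS. Benefit is defined as the cost of suppressing every fire minus the cost of letting year-1 fires burn.

```python
import numpy as np

def compare_policies(simulate_total_cost, n_trials=100, seed=0):
    """Paired Monte Carlo estimate of the benefit of LETBURN in year 1.

    benefit = cost(SUPPRESS ALL in every year) - cost(LETBURN in year 1, then SUPPRESS ALL)
    """
    benefits = np.empty(n_trials)
    for trial in range(n_trials):
        # Common random numbers: both policies get identically seeded generators,
        # so (in this toy) they face the same ignitions.
        suppress = simulate_total_cost(False, np.random.default_rng([seed, trial]))
        letburn = simulate_total_cost(True, np.random.default_rng([seed, trial]))
        benefits[trial] = suppress - letburn
    return benefits.mean(), np.median(benefits), benefits

def toy_total_cost(first_year_letburn, rng, years=100):
    """Hypothetical stand-in for the fire and vegetation simulators."""
    fuel, total = 1.0, 0.0
    for year in range(years):
        ignition = rng.random() < 0.3          # one draw per year under either policy
        if ignition and year == 0 and first_year_letburn:
            fuel *= 0.5                        # the burn consumes understory fuel
            total += 1.0                       # but costs some timber value
        elif ignition:
            total += 2.0 * fuel                # suppression cost grows with fuel
        fuel += 0.02                           # fuel accumulates under suppression
    return total

mean, median, samples = compare_policies(toy_total_cost)
print(f"mean benefit {mean:.2f}, median {median:.2f} (toy units)")
```

    Pairing the runs means the per-trial differences isolate the effect of the year-1 decision, so fewer trials are needed for a stable estimate than with independent runs.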

  • Expected Benefit of LETBURN (Suppress all fires after year 1)

    [Figure: histogram of expected benefit (x $100,000; bins from -2 to 60) against frequency (0 to 35)]

    mean = $2.47 million
    median = $2.74 million

    [Houtman, Montgomery, Gagnon, Calkin, Dietterich, McGregor, Crowley 2013]

  • Summary

    Models of Bird Migration: Collective Graphical Models

    Policy Optimization: Controlling Invasive Species; Managing Wildland Fire

  • Common Threads

    Spatially-spreading processes: bird migration; invasive species; fire spread
    Dynamical models: CGM (spatial HMM with clever inference); simulator of seed spread; simulator of fire spread
    Computational challenges: efficient probabilistic inference; minimizing calls to expensive simulators (value-of-information heuristics + PAC guarantees; Bayesian optimization)

  • Thank You

    Dan Sheldon, Akshat Kumar, Tao Sun: Collective Graphical Models
    Steve Kelling, Andrew Farnsworth, Wes Hochachka, Daniel Fink: BirdCast
    H. Jo Albers, Kim Hall, Majid Taleghan, Mark Crowley: Tamarisk
    Claire Montgomery, Sean McGregor, Mark Crowley, Rachel Houtman
    Carla Gomes, for spearheading the Institute for Computational Sustainability

    National Science Foundation Grants 0832804 (CompSust), 1331932 (CyberSEES), 1125228 (BirdCast), 1521687 (CompSustNet)
