Lepidoptera: Where They Are and When They Fly Roy Adams, UC Davis Ryan Smith, Union College Camila...

  • Slide 1
  • Lepidoptera: Where They Are and When They Fly. Roy Adams, UC Davis; Ryan Smith, Union College; Camila Matamala-Ost, Oregon State; Chris Mattioli, Providence College in Providence, Rhode Island; Elizabeth Cowdery, Cornell; Grace Zalenski, Lewis & Clark
  • Slide 2
  • Lepidoptera. Lepidoptera is the second-largest order in Insecta. Approximately 600 species of moths occur in the H.J. Andrews Experimental Forest. Relatively little is known about them.
  • Slide 3
  • Diverse Ecological Roles. Ecosystem functions: prey, defoliators, pollinators, decomposers. Stages of lifecycle: egg, caterpillar, pupa, moth.
  • Slide 4
  • Ecosystem connections. Pollination. Prey: rodents, reptiles, bats, birds, spiders, beetles, true bugs, nematodes.
  • Slide 5
  • Assessing Environmental Impacts. Moths are a source of food, are affected by the abundance of their own food sources, and are sensitive to changes in temperature and in the nutritional quality of plants. Temperature: caterpillar growth rate is affected by temperature; a caterpillar must reach a certain critical size to enter the pupal stage; the majority of moths in the Pacific Northwest overwinter as an egg or in a cocoon; many species won't emerge unless they undergo a period of cold (diapause). Plant nutrition: caterpillars are sensitive to nitrogen and water content; high water content enhances growth; theoretically, moth abundance and/or emergence could be linked to changes in the nitrogen and water content of plants.
  • Slide 6
  • Slide 7
  • Moth Sampling. Universal black light traps with 22 W circular bulbs and 12 V batteries. Set 1-2 hours before sunset. Moths are attracted to the light and stunned by insecticide and acrylic vanes. Sampled at intervals of 1+ weeks. Biased towards phototactic night-flying moths (the majority). Data not used this summer; will be used in an island biogeography study.
  • Slide 8
  • Moth Identification
  • Slide 9
  • Moth Data Used in Modeling. Sampled with the same method by Jeffrey Miller, 2004-2008. Emergence modeling uses data from 20 sites trapped 30+ times. Moth distribution modeling includes data from a biological inventory survey: almost 40% of sites were trapped only once, and more than half were trapped either once or twice. [Image: Feralia deceptiva]
  • Slide 10
  • Vegetation Sampling. Purpose: test the hypothesis that moths are distributed near their host plants, by contributing to a database of vegetation data at moth sampling sites, and learn more about host plants. 32 sites, 100 meters in 4 directions. All vascular plant species except fern allies (other than horsetails, which were recorded). [Image: Polystichum munitum]
  • Slide 11
  • Phenology and Climate Change. As difficult as it is to predict precisely how the planet will warm over the next century or so, it is even harder to refine predictions of how those changes will affect specific species. [1] What are the drivers of moth emergence? Moths are poikilothermic. How will climate change influence moth emergence? Due to human-induced climate change over the last decade, phenology has become one of the leading indicators of species response to environmental change. [2] Will this have an effect on other animals? References: [1] Barringer, Felicity. Trout Fishing in a Climate Changed America. New York Times Green Blog, 16/7/2011. [2] Roy, D.B. & Sparks, T.H. Phenology of British butterflies and climate change. Global Change Biology (2000) 6, 407-416.
  • Slide 12
  • Emergence Objectives. Improve on the previous model. Create a model that can predict on which day moths will emerge. Use degree days instead of Julian days, as with growing degree days (GDD) for plants.
  • Slide 13
  • [Figures: model counts with Julian days as the interval; degree-day curve; model counts with degree days as the interval]
  • Slide 14
  • Degree Days. Took maximum temperature data from the H.J. Andrews. Assigned trap sites to meteorological stations (Met.) and reference stands (Ref. Stands). Interpolated missing data. Discuss procedure for calculations.
  • Slide 15
  • Thermal Climate of the H.J. Andrews Experimental Forest. PRISM-estimated mean monthly maximum and minimum temperature maps, showing topographic effects of radiation and sky view factors. Provided by Jonathan W. Smith and EISI 2010.
  • Slide 16
  • Formula:=IF(VANMET!B4>0,VANMET!B4,0)+'VANMET DEGREE DAYS'!B2
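The spreadsheet formula above keeps a running total: each day's temperature is added to the previous total only when it is above 0. A minimal Python sketch of the same accumulation, assuming a plain list of daily maximum temperatures (the function name and sample values are hypothetical, not taken from the VANMET sheet):

```python
def cumulative_degree_days(daily_max_temps):
    """Running degree-day total, mirroring IF(temp > 0, temp, 0) + previous total."""
    totals = []
    running = 0.0
    for temp in daily_max_temps:
        # Days at or below zero contribute nothing to the total.
        running += temp if temp > 0 else 0.0
        totals.append(running)
    return totals


# Hypothetical daily maximum temperatures (degrees Celsius assumed).
example_temps = [-2.0, 3.5, 0.0, 10.0]
example_totals = cumulative_degree_days(example_temps)
```

Each entry of `example_totals` corresponds to one row of the degree-day column in the spreadsheet.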
  • Slide 17
  • The Model. Uses abundance data from trapping. Estimates the parameters of emergence and abundance curves from trap counts. Optimizes the parameter estimates to create emergence and abundance curves.
  • Slide 18
  • P(j,k). We assume we catch all moths flying at trap time. P(j,k) is the probability that a moth emerges in interval j and has a natural death time in interval k. Measures abundance.
  • Slide 19
  • Variables. In the original model, P(j,k) is found by numerically integrating the joint density. Q(j,k) and q_j are successively computed. The likelihood function uses q_j to optimize the parameter estimates. Emergence time: [formula missing]. Lifespan: [formula missing].
  • Slide 20
  • Obtaining our parameters. α = Pr(moth caught by trap). m = # moths flying. q_j = Pr(moth trapped at t_j). Assume [expression missing] (a constant).
  • Slide 21
  • Multinomial distribution
  • Slide 22
  • Convergence in distribution: [formula missing], where the F_i are Poisson random variables. As m → ∞ and α → 0, we assume mα (expected value of moths caught) approaches some constant.
  • Slide 23
  • Distribution, cont. m and α are unknown. If m is large and α is small enough, the likelihood will be very close to Poisson. The model uses the multinomial distribution: [formula missing]
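The claim that a large m and a small alpha give a likelihood close to Poisson can be checked numerically. The sketch below is a simplified illustration, not the slides' model: it looks at a single trapping interval, so the multinomial reduces to a binomial with m moths and catch probability alpha, and compares it with a Poisson distribution whose mean is m times alpha. All function names and the sample values are hypothetical.

```python
import math


def binomial_pmf(k, m, alpha):
    """Probability of catching exactly k of m moths, each caught with probability alpha."""
    return math.comb(m, k) * alpha**k * (1 - alpha) ** (m - k)


def poisson_pmf(k, lam):
    """Poisson probability of k catches when the expected catch is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_pmf_gap(m, alpha, max_count=20):
    """Largest pointwise gap between the two distributions for counts 0..max_count."""
    lam = m * alpha
    return max(
        abs(binomial_pmf(k, m, alpha) - poisson_pmf(k, lam))
        for k in range(max_count + 1)
    )


# Same expected catch (5 moths) in both cases; only m and alpha differ.
gap_small_m = max_pmf_gap(m=20, alpha=0.25)
gap_large_m = max_pmf_gap(m=10000, alpha=0.0005)
```

With the expected catch held fixed, `gap_large_m` comes out much smaller than `gap_small_m`, which is the behaviour the slide relies on.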
  • Slide 24
  • Incorporating degree days. Degree day values: [formula missing]. Each moth has an emergence threshold, D. Now define [formula missing].
  • Slide 25
  • Changes. Compute P(j,k) differently because T_e is discrete. Use a single set of parameters for each species, rather than separate sets for each trap and year. AIC: measure of fit.
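For reference, the Akaike information criterion is AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood; lower values indicate a better trade-off between fit and complexity. A small sketch, with made-up log-likelihoods and parameter counts used only for illustration:

```python
def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical comparison: a single parameter set per species versus
# a larger per-trap, per-year parameterization with a slightly better fit.
aic_shared = aic(log_likelihood=-120.0, num_params=4)
aic_separate = aic(log_likelihood=-115.0, num_params=20)
preferred = "shared" if aic_shared < aic_separate else "separate"
```

In this invented case the better raw fit of the larger model does not offset its extra parameters, so the shared parameterization is preferred.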
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • [Figures: 3G, days since May 1st; 3G, degree days]
  • Slide 30
  • Slide 31
  • [Figures: 5O, days since May 1st; 5O, degree days]
  • Slide 32
  • Slide 33
  • Future Work. Degree days: revisit interpolation methods; experiment with different degree thresholds and starting dates. Model: multinomial vs. Poisson; multiple traps for one year. Take new data into account: vegetation surveys; elevation, aspect, watershed, habitat.
  • Slide 34
  • Species Distribution Model Applications. Combine numerical observations and relevant variables (often environmental and spatial) to predict species distribution in space and/or time. Why do this? Ecological insight and further research topics; land use management and conservation planning.
  • Slide 35
  • SDMs and Machine Learning. Supervised machine learning: use training data {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} to arrive at a function f with f(x_i) ≈ y_i. Split the data into training, test, and validation sets. Assume that if a moth species exists at a site, we've trapped it at least once at that site.
  • Slide 36
  • SDMs and Machine Learning. Training set: half of the original data set, used in initially learning and fitting the function. Certain algorithms require their parameters to be tuned for optimum performance. This is accomplished by testing the model against a validation set, which is a subset of the training set. Test set: the other half of the original data set, separate from the training set. After parameter tuning, the function's accuracy can be evaluated by running it on the test set.
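The split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the records are shuffled before splitting, and the validation subset is a quarter of the training half. The slides give neither detail, and the function name is hypothetical.

```python
import random


def split_dataset(records, validation_fraction=0.25, seed=0):
    """Split records into a training half, a validation subset of it, and a test half."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)

    half = len(shuffled) // 2
    training = shuffled[:half]
    test = shuffled[half:]

    # The validation set is carved out of the training half, as on the slide.
    n_validation = int(len(training) * validation_fraction)
    validation = training[:n_validation]
    return training, validation, test


# Hypothetical site identifiers standing in for trap records.
training, validation, test = split_dataset(range(100))
```

In practice a model would be fitted on the part of `training` outside `validation` while tuning, then scored once on `test`.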
  • Slide 37
  • Quantifying Accuracy. The area under the receiver operating characteristic curve (AUC) is used as our measure of accuracy for the distribution maps. It is the probability that a randomly selected positive instance (moth presence) is ranked higher than a randomly selected negative instance. AUC = 0.5 indicates a random guess.
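The ranking interpretation of AUC can be computed directly by comparing every positive instance with every negative one. The sketch below does exactly that, counting ties as half a win (a common convention, assumed here); the function name and the sample scores are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative instance")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # presence ranked above absence
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted presence scores for four sites (1 = moth present).
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
chance_level = auc([0.9, 0.2, 0.6, 0.4], [1, 1, 0, 0])
```

The pairwise loop is quadratic in the number of sites, which is fine for illustration; rank-based formulas give the same value faster.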
  • Slide 38
  • Learning Algorithms. Algorithm and corresponding R package: Random Forest (randomForest); Logistic Regression (glmnet); Support Vector Machines (e1071); Generalized Boosted Regression Models (gbm).
  • Slide 39
  • Random Forest. Ensemble method. Grows decision trees by combining bagging with the random selection of features. A decision tree is a model of decisions and their outcomes. Internal nodes represent points where a decision is made, and the leaves represent the outcomes. Bagging is the process of randomly sampling with replacement from the set of training examples, and constructing a decision tree from the bag. Random forest also randomly selects a subset of features at each split rather than using the whole set of features.
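The two sources of randomness described above, bagging and random feature selection, can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, the feature names are borrowed from the example slide that follows, and the square-root subset size is a common default for classification that is assumed here rather than taken from the slides.

```python
import math
import random


def bootstrap_sample(examples, rng):
    """Bagging: draw len(examples) training examples with replacement."""
    return [rng.choice(examples) for _ in examples]


def random_feature_subset(feature_names, rng):
    """Pick a random subset of features to consider at one split."""
    k = max(1, int(math.sqrt(len(feature_names))))  # assumed sqrt rule
    return rng.sample(feature_names, k)


rng = random.Random(42)
examples = [1, 2, 3, 4, 5]  # hypothetical training example ids
features = ["Temperature", "Plant #1", "Plant #2", "Predator #1"]

bag = bootstrap_sample(examples, rng)
candidates = random_feature_subset(features, rng)
```

One tree is grown from each bag, with a fresh feature subset drawn whenever a node is split.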
  • Slide 40
  • Random Forest. [Figure: example decision tree with splits on Plant #1, Predator #1, Plant #2, and Temperature (High/Low) and leaves Present/Absent, shown beside a five-row example training table with columns Temperature, Plant #1, Plant #2, Predator #1, and Moth]
  • Slide 41
  • Tuning Random Forest. After creating n bags and growing n trees, new data can be classified by taking a vote of all the trees' predictions. The number of trees grown can be altered and tuned, as can the number of nodes in each tree.
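The voting step can be sketched with a simple majority count. The function name and the example predictions are hypothetical; ties fall to whichever label was seen first, which is an arbitrary choice made here for brevity.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for one new site.
votes = ["Present", "Absent", "Present", "Present", "Absent"]
forest_prediction = majority_vote(votes)
```

The fraction of trees voting "Present" can also serve as a score for ranking sites, which is what an AUC calculation needs.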
  • Slide 42
  • Logistic Regression. P(y = 1 | x) = 1 / (1 + e^(-t)), where t = β_0 + β_1 x_1 + ... + β_n x_n. Attempt to find appropriate values of β to weigh the covariates.
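The logistic formula above, P(y = 1 | x) = 1 / (1 + e^(-t)) with t a weighted sum of the covariates plus an intercept, translates directly into code. In this sketch the function name, weights, and covariate values are hypothetical; the two branches are only there to avoid floating-point overflow for very negative t.

```python
import math


def presence_probability(intercept, weights, covariates):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-t)), t = intercept + sum(w_i * x_i)."""
    t = intercept + sum(w * x for w, x in zip(weights, covariates))
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    # Algebraically identical form that stays finite when t is very negative.
    e = math.exp(t)
    return e / (1.0 + e)


# Hypothetical site with two covariates (for example elevation and host-plant cover).
p_neutral = presence_probability(0.0, [0.0, 0.0], [1.0, 2.0])
p_likely = presence_probability(-1.0, [2.0, 0.5], [1.5, 1.0])
```

A weight of zero means the covariate has no influence, which is why `p_neutral` sits exactly at one half.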
  • Slide 43
  • Tuning Logistic Regression. It is often optimal to restrict the number and size of these values in regression. A combination of penalty terms called the elastic net achieves these restrictions. The penalty term takes the form λ [ ((1 - α)/2) ||β||_2^2 + α ||β||_1 ]. The parameters α and λ are tuned. α controls which term is more important. λ controls the weight of the entire expression.
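The elastic net penalty used by glmnet is lambda * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1): alpha = 1 gives the pure L1 (lasso) penalty, and alpha = 0 gives the pure squared L2 (ridge) penalty. A small sketch of the calculation, with a hypothetical function name and invented coefficient values:

```python
def elastic_net_penalty(coefficients, alpha, lam):
    """Elastic net: lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|))."""
    l1 = sum(abs(b) for b in coefficients)
    l2_squared = sum(b * b for b in coefficients)
    return lam * ((1 - alpha) / 2 * l2_squared + alpha * l1)


betas = [1.0, -2.0]  # hypothetical fitted coefficients, intercept excluded

lasso_only = elastic_net_penalty(betas, alpha=1.0, lam=0.5)
ridge_only = elastic_net_penalty(betas, alpha=0.0, lam=2.0)
mixed = elastic_net_penalty(betas, alpha=0.5, lam=1.0)
```

Tuning then amounts to trying a grid of (alpha, lam) pairs and keeping the pair with the best validation score.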
  • Slide 44
  • Support Vector Machines. Non-probabilistic classifier. Attempts to construct a hyperplane in n-dimensional feature space to separate two possible classes of data. The most desirable hyperplane is the one with the largest margin between the two classes.
  • Slide 45
  • Tuning Support Vector Machines. Data is often not linearly separable. Kernel functions map the data into a space where a separating hyperplane can be constructed more easily. Kernels: linear, radial, sigmoid, polynomial.
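The four kernels listed above have standard textbook forms, sketched below. The parameter names gamma, coef0, and degree follow common usage, and the default values are illustrative assumptions rather than the settings used in this study.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def radial_kernel(u, v, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


# Two hypothetical sites described by two covariates each.
site_a, site_b = [1.0, 2.0], [3.0, 4.0]
similarities = {
    "linear": linear_kernel(site_a, site_b),
    "polynomial": polynomial_kernel(site_a, site_b, degree=2),
    "radial": radial_kernel(site_a, site_b),
    "sigmoid": sigmoid_kernel(site_a, site_b),
}
```

Which kernel to use, and its parameters, are among the settings tuned against the validation set.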
  • Slide 46
  • Generalized Boosted Regression Models. Ensemble method. Loss function: a measure that represents the loss in predictive performance of a model. GBMs construct an initial regression tree that maximally reduces the loss function. A regression tree is a decision tree whose outputs are real-valued.
  • Slide 47
  • Generalized Boosted Regression Models. To further reduce the loss function, new trees are added. At the second step, a regression tree is fitted to the residuals (variations in response) of the first tree. The model now updates to contain two terms, and residuals are taken from the two-term model. The process continues in this stage-wise fashion until the specified number of trees, n.trees, has been added. Fitted values update with each new tree addition.
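The stage-wise procedure above can be sketched with one-split regression trees (stumps) on a single covariate and squared-error loss. This is a simplified illustration, not the gbm package's implementation: it starts from the mean response rather than an initial tree, and the shrinkage factor, function names, and data are assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Best one-split regression tree on a single feature under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees, shrinkage=0.5):
    """Add n_trees stumps, each fitted to the residuals of the current model."""
    baseline = sum(ys) / len(ys)
    fitted = [baseline] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - f for y, f in zip(ys, fitted)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Fitted values update with each new tree.
        fitted = [f + shrinkage * tree(x) for f, x in zip(fitted, xs)]
    return lambda x: baseline + shrinkage * sum(tree(x) for tree in trees)


def mean_squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical one-covariate data with a step in the response.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
mse_one_tree = mean_squared_error(boost(xs, ys, n_trees=1), xs, ys)
mse_many_trees = mean_squared_error(boost(xs, ys, n_trees=20), xs, ys)
```

Training error falls as trees are added, which is why the number of trees has to be tuned against held-out data rather than the training set.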
  • Slide 48
  • Tuning GBM. Like the other learning algorithms, GBM also has parameters to be tuned: the number of trees to be constructed and added, and the number of nodes in each tree (interaction depth).
  • Slide 49
  • Algorithm Performance (average AUC). Logistic Regression: 0.605; Random Forest: 0.505; gbm: 0.607; SVM: 0.606.
  • Slide 50
  • Acknowledgements. NSF, OSU, OSU Arthropod Museum, Matt Cox, Steve Highland, Tom Dietterich, Dan Sheldon, Olivia Poblacion, Julia Jones, Desiree Tullos, Jorge Ramirez, John & Emily Vera, Jeff Miller/Paul C. Hammond