Step three: statistical analyses to test biological hypotheses

  • Step three: statistical analyses to test biological hypotheses

    General protocol continued

  • Biological hypotheses and statistical tests

    Hypotheses are driven by biology
    Statistics depend on data and hypotheses
    NO NEW STATISTICAL TOOLS ARE NEEDED FOR MORPHOMETRICS!!

    Exploratory hypotheses: relative position of specimens in data space; relationships among specimens in data space
    Confirmatory hypotheses: compare groups, associate shape with other variables, etc.

  • Some hypotheses (shape related)

    How do populations and species differ?
    Does the observed variation generate a predictable pattern?
    Are there additional factors (ecological, evolutionary) correlated with variation?
    How does shared evolutionary history affect the observed patterns?

  • Hypotheses as statistical tests

    Do populations differ?           MANOVA, CVA
    Is there a predictable pattern?  PCA, UPGMA
    Correlated factors?              Regression, 2B-PLS
    Effect of phylogeny?             Comparative Method

  • Exploratory data analysis

    Investigate data using only the Y-matrix of shape variables (PWScores + U1, U2)
    Specimens are points in a high-dimensional data space
    Look for patterns and distributions of points
    Generate a summary plot of the data space (ordination)
    Look for relationships among points (clustering)

  • Ordination and dimension reduction

    Visualize high-dimensional data space as succinctly as possible
    Describe variation in the original data with a new set of variables (typically orthogonal vectors)
    Order new variables by variation explained (most → least)
    Plot the first few dimensions to summarize the data
    Principal Components Analysis (PCA) is one approach (others include PCoA, MDS, CA, etc.)

  • PCA: what does it do?

    Rotates data so that the main axis of variation (PC1) is horizontal
    Subsequent PC axes are orthogonal to PC1 and are ordered to explain sequentially less variation
    The goal is to explain more variation in fewer dimensions

  • PCA: interpretations

    Eigenvectors are linear combinations of the original variables (interpreted by the PC loadings of each variable)
    PCA PRESERVES EUCLIDEAN DISTANCES among objects
    PCA does NOTHING to the data except rotate it to axes expressing the most variation; it loses NO INFORMATION (if all PC vectors are retained)
    If the original variables are uncorrelated, PCA is not helpful in reducing the dimensionality of the data

    PCA does not find a particular factor (e.g., group differences, allometry): it identifies the direction of most variation, which may be interpretable as a factor (but may not)
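The rotation described in the PCA slides can be sketched in a few lines of NumPy. This is a minimal illustration, not the software used in the presentation: the function names and the randomly generated `Y` matrix are made up for the example, and the SVD route is one of several equivalent ways to obtain the PC axes. The last lines check the claim that Euclidean distances among specimens are preserved when all PC vectors are retained.

```python
import numpy as np

def pca(Y):
    """Rotate Y (n specimens x p variables) onto its principal axes."""
    Yc = Y - Y.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    loadings = Vt.T                              # columns are the PC eigenvectors
    scores = Yc @ loadings                       # specimen coordinates on the PC axes
    explained = s**2 / np.sum(s**2)              # proportion of variation per PC
    return scores, loadings, explained

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

# Hypothetical data: 20 specimens, 4 correlated variables
rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 4)) @ rng.normal(size=(4, 4))

scores, loadings, explained = pca(Y)

# With all PCs retained, the rotation leaves inter-specimen distances unchanged
same = np.allclose(pairwise_distances(Y), pairwise_distances(scores))
print(explained, same)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the usual ordination plot of the first two PCs.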

  • Example: leatherside chub

  • Clustering

    Data are dots in a high-dimensional space (Y-matrix)
    Can we connect the dots into groupings, where clusters represent groups of similar specimens?
    Cluster methods generate a 1-dimensional view of relationships, based on some criterion
    Clustering requires a distance (or similarity) between points: MANY different criteria exist

    Clustering is algorithmic, not algebraic (i.e., it is a procedure, or set of rules, for connecting data)

  • Clustering: UPGMA
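Because clustering is a procedure rather than a formula, it is easiest to see as code. The sketch below is a deliberately simple (and slow) UPGMA, assuming the usual definition in which the distance between two clusters is the unweighted mean of all pairwise distances between their members. The function name and the four-point example are hypothetical.

```python
import numpy as np

def upgma(D):
    """Cluster a symmetric distance matrix D by UPGMA.

    Returns the merge history as (members_a, members_b, height) tuples.
    """
    D = np.asarray(D, dtype=float)
    clusters = {i: [i] for i in range(len(D))}   # cluster id -> specimen indices

    def dist(a, b):
        # mean of all between-cluster pairwise distances
        return float(np.mean([D[i, j] for i in clusters[a] for j in clusters[b]]))

    merges = []
    next_id = len(D)
    while len(clusters) > 1:
        keys = sorted(clusters)
        pairs = [(x, y) for k, x in enumerate(keys) for y in keys[k + 1:]]
        a, b = min(pairs, key=lambda pair: dist(*pair))   # closest pair of clusters
        merges.append((clusters[a], clusters[b], dist(a, b)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges

# Hypothetical example: four specimens placed on a line at 0, 1, 5, and 7
x = np.array([0.0, 1.0, 5.0, 7.0])
D = np.abs(x[:, None] - x[None, :])
merges = upgma(D)
for members_a, members_b, height in merges:
    print(members_a, members_b, height)
```

The merge heights are what a dendrogram draws on its distance axis.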

  • Conclusions: exploratory methods

    Useful tools for summarizing shape variation
    Help you understand your data through visualizing variation (both ordination plots and cluster diagrams)
    Help describe relationships among specimens in terms of overall similarity

  • Confirmatory data analysis

    Investigate data using shape variables (Y-matrix) and other (independent) variables (X-matrix)
    Test for patterns of shape variation
    Independent variables determine the type of statistical test

  • Types of independent variables

    Categorical: variables delineating groups of specimens (e.g., male/female, species, etc.)
    Continuous: variables on a continuous scale (e.g., size, moisture, age, etc.)
    Different statistical methods for each

  • Some statistical tests

    Categorical: shape differences among groups         MANOVA
    Continuous: relationship of variables and shape     Mult. Regression
    Continuous: association of variables and shape      2B-PLS (2-Block Partial Least Squares)

    MANOVA and multivariate regression are both GLM statistics (General Linear Models)

  • Group differences: MANOVA

    Is there a difference in shape between groups?
    Multivariate generalization of ANOVA
    Compares variation within groups to variation between groups
    Significant MANOVA: group means are different in shape
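The within-versus-between comparison can be made concrete with Wilks' Lambda, the statistic reported in the examples in this deck. The sketch below assumes the standard definition, Lambda = det(W) / det(W + B), where W and B are the within-group and between-group sums-of-squares-and-cross-products matrices. The function name and the simulated data are illustrative only, and the conversion of Lambda to an F value is left out.

```python
import numpy as np

def wilks_lambda(Y, groups):
    """Wilks' Lambda for a one-way MANOVA of Y (n x p) on group labels."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    grand_mean = Y.mean(axis=0)
    p = Y.shape[1]
    W = np.zeros((p, p))                          # within-group SSCP
    B = np.zeros((p, p))                          # between-group SSCP
    for g in np.unique(groups):
        Yg = Y[groups == g]
        group_mean = Yg.mean(axis=0)
        W += (Yg - group_mean).T @ (Yg - group_mean)
        d = (group_mean - grand_mean)[:, None]
        B += len(Yg) * (d @ d.T)
    return np.linalg.det(W) / np.linalg.det(W + B)

# Hypothetical data: two groups of 30 specimens, 3 shape variables
rng = np.random.default_rng(0)
noise = rng.normal(size=(60, 3))
labels = np.repeat(["A", "B"], 30)

lam_null = wilks_lambda(noise, labels)                 # no true group difference
shifted = noise.copy()
shifted[labels == "B"] += 2.0                          # move group B's mean
lam_shifted = wilks_lambda(shifted, labels)
print(lam_null, lam_shifted)
```

Values near 1 mean the groups explain little of the variation; values near 0 mean most of the variation lies between group means.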

  • RW1-RW30 Utah chub

    MANOVA sources: Sex, Loc, Sex X Loc, IL/SL, Size

    Statistic       Value        F      df1   df2      Prob
    Wilks' Lambda   0.61907356   1.83    30    89      0.0159
    Wilks' Lambda   0.75516916   0.96    30    89      0.5318
    Wilks' Lambda   0.10138762   1.40   180   533.33   0.0020
    Wilks' Lambda   0.00308619   3.26   240   706.35

  • MANOVA: post hoc tests

    Pairwise comparisons using Generalized Mahalanobis Distance (D² or D)
    Convert D² → T² → F to test
    For experiment-wise error rate, adjust using Bonferroni: α_exp = α / # comparisons
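A sketch of the D² → T² → F chain for a single pair of groups follows. It assumes the textbook two-sample Hotelling relations, T² = [n1·n2 / (n1 + n2)]·D² and F = [(n1 + n2 − p − 1) / (p·(n1 + n2 − 2))]·T² with p and n1 + n2 − p − 1 degrees of freedom, and a covariance matrix pooled over just the two groups being compared. In a multi-group design the pooled within-group covariance from all groups is often used instead, which changes the degrees of freedom. All names and data are hypothetical.

```python
import numpy as np

def mahalanobis_d2(Y1, Y2):
    """Squared Mahalanobis distance between two group means (pooled covariance)."""
    n1, n2 = len(Y1), len(Y2)
    pooled = ((n1 - 1) * np.cov(Y1, rowvar=False)
              + (n2 - 1) * np.cov(Y2, rowvar=False)) / (n1 + n2 - 2)
    d = Y1.mean(axis=0) - Y2.mean(axis=0)
    return float(d @ np.linalg.solve(pooled, d))

def d2_to_f(d2, n1, n2, p):
    """Convert D^2 to Hotelling's T^2 and then to F with its degrees of freedom."""
    t2 = n1 * n2 / (n1 + n2) * d2
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f, (p, n1 + n2 - p - 1)

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha that keeps the experiment-wise rate at alpha."""
    return alpha / n_comparisons

# Hypothetical pair of groups with 3 shape variables
rng = np.random.default_rng(1)
Y1 = rng.normal(size=(12, 3))
Y2 = rng.normal(size=(15, 3)) + 1.0

d2 = mahalanobis_d2(Y1, Y2)
t2, f, df = d2_to_f(d2, len(Y1), len(Y2), p=3)
print(d2, t2, f, df)
print(bonferroni_alpha(0.05, 10))   # e.g., 10 pairwise comparisons among 5 groups
```

Each pairwise F would then be compared against the Bonferroni-adjusted alpha rather than the nominal one.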

  • Discriminant analysis: CVA & DFA

    Combination of MANOVA and PCA
    Tests for group differences (MANOVA)
    PCA of among-group variation relative to within-group variation
    Suggests which groups differ on which variables
    Can classify specimens to groups

    Special case: 2 groups = discriminant function analysis (DFA)

  • DFA/CVA: post-hoc tests

    For DFA/CVA, compare differences among groups using Generalized Mahalanobis Distance (D²)
    Mahalanobis D² is the logical choice because CVA/DFA is MANOVA, and the PCA is relative to within-group variability (i.e., VCV standardized)
    Convert D² → T² → F to perform the statistical test
    Experiment-wise error rate adjusted as before (i.e., adjusted α)

  • Continuous variation: regression

    Is there a relationship between shape and some other variable?
    Multivariate regression of shape on a continuous variable
    Significant regression implies shape changes as a function of the other variable (e.g., size)
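A multivariate regression of shape on one continuous variable is an ordinary least-squares fit with a matrix of responses. The sketch below is an assumption-laden illustration: the function name, the simulated "size" variable, and the use of the proportion of total shape variation explained as a summary are all choices made for the example, and no significance test is computed.

```python
import numpy as np

def regress_shape_on_x(Y, x):
    """Least-squares regression of shape variables Y (n x p) on one predictor x."""
    X = np.column_stack([np.ones(len(x)), x])      # intercept + predictor
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)      # B has shape (2, p)
    fitted = X @ B
    ss_total = ((Y - Y.mean(axis=0)) ** 2).sum()
    ss_resid = ((Y - fitted) ** 2).sum()
    return B, 1.0 - ss_resid / ss_total            # coefficients, variation explained

# Hypothetical data: 50 specimens, 3 shape variables that change with size
rng = np.random.default_rng(2)
size = rng.uniform(10, 50, size=50)
true_slopes = np.array([0.5, -0.2, 0.1])
Y = 1.0 + np.outer(size, true_slopes) + rng.normal(scale=0.5, size=(50, 3))

B, r2 = regress_shape_on_x(Y, size)
print(B[1], r2)    # estimated slopes per shape variable, proportion explained
```

The row `B[1]` is the vector of shape change per unit of the predictor, which is what a deformation grid for the regression would visualize.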

  • Example of shape on size in mountain sucker

    Multivariate tests of significance:

    Statistic                 Value        Fs       df1   df2     Prob
    Wilks' Lambda             0.34356565   22.822   36    430.0   3.580E-078
    Pillai's trace            0.65643435   22.822   36    430.0   3.580E-078
    Hotelling-Lawley trace    1.91065190   22.822   36    430.0   3.580E-078
    Roy's maximum root        1.91065190   22.822   36    430.0   3.580E-078

    Test that kth root and those that follow are zero:

    k   U            Fs       df1   df2     Prob
    1   0.34356565   22.822   36    430.0   3.580E-078

  • Continuous variation: association (2B-PLS)

    Is there an association between shape and some other set of variables (not causal)?
    Find pairs of linear combinations for X & Y that maximize the covariation between data sets
    Linear combinations are constrained to be orthogonal within each set (like PC axes) but NOT between data sets
    Calculations are less complicated for 2B-PLS (because of fewer mathematical constraints)
    Analogous to multivariate correlation

    2B-PLS is called SINGULAR WARPS when shape is one or more of the data sets (Bookstein et al., 2003: J. of Hum. Evol.)
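One common way to obtain these paired linear combinations is a singular value decomposition of the cross-covariance matrix between the two blocks. The sketch below assumes that formulation; the function name and the simulated blocks are hypothetical. The singular vectors are orthonormal within each block, and each singular value equals the covariance between the corresponding pair of score vectors.

```python
import numpy as np

def two_block_pls(X, Y):
    """2B-PLS via SVD of the cross-covariance matrix between blocks X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    R = Xc.T @ Yc / (len(X) - 1)                  # cross-covariance (p_x x p_y)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    V = Vt.T
    return U, s, V, Xc @ U, Yc @ V                # weights, covariances, scores

# Hypothetical data: 40 specimens, 2 environmental variables, 4 shape variables
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))
Y = X @ rng.normal(size=(2, 4)) + rng.normal(scale=0.3, size=(40, 4))

U, s, V, x_scores, y_scores = two_block_pls(X, Y)

# Covariance of the first pair of scores equals the first singular value
cov_first = (x_scores[:, 0] @ y_scores[:, 0]) / (len(X) - 1)
print(s, cov_first)
```

Plotting `x_scores[:, 0]` against `y_scores[:, 0]` shows the strongest axis of covariation between the two data sets.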

  • Resampling methods

    Methods that take many samples from the original data set in some specified way and evaluate the significance of the original result based on these samples
    Resampling approaches are nonparametric, because they do not depend on theoretical distributions for significance testing (they generate a distribution from the data)
    They are very flexible and can allow for complicated designs

    Very useful in morphometrics, and can be used for:
    Testing standard designs
    Testing non-standard designs
    Testing when sample sizes are small relative to the # of variables

  • Randomization (permutation)

    Proposed by Fisher (1935) for assessing significance of a 2-sample comparison (Fisher's exact test)
    Fisher's exact test: a total enumeration of possible pairings of data
    Randomization can be used to assess the significance of almost any test statistic

    Protocol:
    Calculate observed statistic (e.g., T-statistic): Eobs
    Reorder data set (i.e., randomly shuffle data) and recalculate statistic: Erand
    Repeat many times to generate a distribution of the statistic
    Percentage of Erand more extreme than Eobs is the significance level
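The protocol above can be sketched directly. This example substitutes the absolute difference in group means for the T-statistic mentioned in the slide, so it is a two-tailed test, and it counts the observed value as one member of the random distribution, a common convention that is assumed here rather than taken from the slides. The function name, the default of 999 shuffles, and the data are hypothetical.

```python
import numpy as np

def permutation_test(a, b, n_perm=999, seed=0):
    """Two-sample randomization test on the absolute difference in means."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([a, b])
    e_obs = abs(a.mean() - b.mean())               # observed statistic (Eobs)
    count = 1                                      # Eobs counts as one permutation
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)         # randomly reassign specimens
        e_rand = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        if e_rand >= e_obs:                        # as extreme or more extreme
            count += 1
    return e_obs, count / (n_perm + 1)

# Hypothetical measurements from two well-separated groups
e_obs, p_value = permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(e_obs, p_value)
```

For regression or correlation designs the same loop applies, but the shuffle reassigns Y values to X values instead of reassigning specimens to groups.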

  • Randomization: comments

    Randomization is an EXTREMELY useful and flexible technique
    How and what to resample depends upon data and hypothesis
    Regression and correlation: shuffle Y vs. X
    Group comparison (e.g., ANOVA): shuffle Y on groups
    Some tests (e.g., t-test) may depend on direction (1-tailed vs. 2-tailed)

    Also useful when no theoretical distribution exists for the statistic, or when the design is non-standard
    This is frequently the case in E&E studies

  • Step four: Graphical depiction of results

    Strength of the landmark-based TPS approach
    Can view deformation of the TPS grid among groups or with a continuous variable

  • Superimposition

  • Effect of relative intestinal length: measure of trophic level

    Long IL/SL: 3.0

    Short IL/SL: 0.72

  • Effect of gradient on shape in mountain sucker

    Low / High