
John R. Stevens
Utah State University

Notes 1. Case Study Data Sets

Mathematics Educators Workshop 28 March 2009

Advanced Statistical Methods: Beyond Linear Regression

http://www.stat.usu.edu/~jrstevens/pcmi

Why this workshop?

Me …
  Outreach mission of USU
  Recruitment – undergraduate & graduate
  Too much fun

You …

Outline

Notes 1: Case Study Data Sets
  1. Challenger Explosion
  2. Beetle Fumigation
  3. T-cell Cancer

Notes 2: Statistical Methods I
  Logistic Regression – incl. Separation of Points
  EM Algorithm

Notes 3: Statistical Methods II
  Tests for Differential Expression
  Multiple hypothesis testing
  Visualization
  Machine Learning

Notes 4: Computer Implementation

(Notes 5): Bonus Material

Case Study 1: Challenger

January 28, 1986 explosion prompted the Presidential Commission on the Space Shuttle Challenger Accident

Commission's 1986 report attributed the explosion to a burn-through of an O-ring seal at a field joint in one of the solid-fuel rocket boosters

After each of the previous 24 launches, the solid rocket boosters were inspected, and the presence or absence of damage to the field joint was noted

Challenger Data

Motivating question:

What was so different on the 25th launch?

Obs  Flight   Temp (°F)  Damage
  1  STS1            66  NO
  2  STS9            70  NO
  3  STS51B          75  NO
  4  STS2            70  YES
  5  STS41B          57  YES
  6  STS51G          70  NO
  7  STS3            69  NO
  8  STS41C          63  YES
  9  STS51F          81  NO
 10  STS4            80
 11  STS41D          70  YES
 12  STS51I          76  NO
 13  STS5            68  NO
 14  STS41G          78  NO
 15  STS51J          79  NO
 16  STS6            67  NO
 17  STS51A          67  NO
 18  STS61A          75  YES
 19  STS7            72  NO
 20  STS51C          53  YES
 21  STS61B          76  NO
 22  STS8            73  NO
 23  STS51D          67  NO
 24  STS61C          58  YES
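A first look at the motivating question can be sketched in a few lines of Python. This is only an exploratory summary of the 24 launches listed above, not the analysis used in the workshop. The 65-degree cutoff (`CUTOFF`) and the helper names `damage_rate` and `mean_temp` are hypothetical choices made for illustration, and temperatures are assumed to be in degrees Fahrenheit. STS4 has no damage entry in the table, so it is stored as `None` and left out of the rates.

```python
# Challenger field-joint data, transcribed from the table above.
# Each record: (flight, launch temperature, damage observed?).
# Damage is None for STS4 because the source table leaves that cell blank.
LAUNCHES = [
    ("STS1", 66, False), ("STS9", 70, False), ("STS51B", 75, False),
    ("STS2", 70, True), ("STS41B", 57, True), ("STS51G", 70, False),
    ("STS3", 69, False), ("STS41C", 63, True), ("STS51F", 81, False),
    ("STS4", 80, None), ("STS41D", 70, True), ("STS51I", 76, False),
    ("STS5", 68, False), ("STS41G", 78, False), ("STS51J", 79, False),
    ("STS6", 67, False), ("STS51A", 67, False), ("STS61A", 75, True),
    ("STS7", 72, False), ("STS51C", 53, True), ("STS61B", 76, False),
    ("STS8", 73, False), ("STS51D", 67, False), ("STS61C", 58, True),
]


def damage_rate(records):
    """Fraction of launches with damage, ignoring unknown outcomes."""
    known = [d for _, _, d in records if d is not None]
    return sum(known) / len(known) if known else float("nan")


def mean_temp(records, damaged):
    """Mean launch temperature for launches with the given damage status."""
    temps = [t for _, t, d in records if d is damaged]
    return sum(temps) / len(temps)


CUTOFF = 65  # hypothetical split point, chosen only for illustration
cold = [r for r in LAUNCHES if r[1] < CUTOFF]
warm = [r for r in LAUNCHES if r[1] >= CUTOFF]

print("damage rate below cutoff:", damage_rate(cold))
print("damage rate at/above cutoff:", round(damage_rate(warm), 3))
print("mean temp, damaged:", round(mean_temp(LAUNCHES, True), 1))
print("mean temp, undamaged:", round(mean_temp(LAUNCHES, False), 1))
```

A binary outcome summarized against a continuous predictor like this is the setting that logistic regression (Notes 2) is designed for.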

Case Study 2: Beetle Fumigation – Rhyzopertha dominica

(Image courtesy Clemson University – USDA Cooperative Extension Slide Series, www.insectimages.org)

Motivation

Beetle: lesser grain borer
  A primary pest of stored grain
  A year-round problem in moderate climates

Australian grain industry: $6–8 billion
  Zero tolerance for insect-infested grain
  Phosphine fumigant for control
  Some beetles have developed resistance
    levels more than 235 times greater than normal (UQ News Online, 18 Oct. 1999)

Experimental Background

Two DNA markers linked to resistance
  rp6.79: two genotypes: –, +
  rp5.11: three genotypes: B, H, A

Motivating question:

What contributes to the degree of resistance?

Mixture of six beetle genotypes exposed to various concentrations of fumigant (48 hours)

Experimental Data

Phosphine       Total                       Survivors Observed at Genotype
Dosage      Receiving    Total      Total
(mg/L)         Dosage   Deaths  Survivors   -/B   -/H   -/A   +/B   +/H   +/A
0                  98        0         98    31    27    10     6    20     4
0.003             100       16         84    18    26    10     6    20     4
0.004             100       68         32    10     4     3     5     7     4
0.005             100       78         22     1     4     7     2     6     2
0.01              100       77         23     0     1     9     8     5     0
0.05              300      270         30     0     0     0     5    20     5
0.1               400      383         17     0     0     0     0    10     7
0.2               750      740         10     0     0     0     0     0    10
0.3               500      490         10     0     0     0     0     0    10
0.4               500      492          8     0     0     0     0     0     8
1.0             7,850    7,806         44     0     0     0     0     0    44
Total          10,798   10,420        378
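The table can be explored with a short Python sketch before any modeling. It computes the observed mortality at each dosage and, for each genotype, the highest dosage at which any survivor was observed. The names `ROWS`, `mortality_by_dose`, and `highest_dose_survived` are hypothetical, introduced only for this illustration, and the counts are transcribed as printed in the table.

```python
# The six genotypes are the 2 x 3 combinations of the two markers
# (rp6.79: - or +; rp5.11: B, H, or A).
GENOTYPES = [f"{a}/{b}" for a in "-+" for b in "BHA"]

# (dose in mg/L, total receiving dose, deaths, survivors, survivors by genotype)
# Counts are copied as printed; at 0.004 mg/L the genotype counts sum to 33
# while the listed total is 32, so no row-sum check is enforced here.
ROWS = [
    (0.0,     98,    0, 98, [31, 27, 10, 6, 20, 4]),
    (0.003,  100,   16, 84, [18, 26, 10, 6, 20, 4]),
    (0.004,  100,   68, 32, [10, 4, 3, 5, 7, 4]),
    (0.005,  100,   78, 22, [1, 4, 7, 2, 6, 2]),
    (0.01,   100,   77, 23, [0, 1, 9, 8, 5, 0]),
    (0.05,   300,  270, 30, [0, 0, 0, 5, 20, 5]),
    (0.1,    400,  383, 17, [0, 0, 0, 0, 10, 7]),
    (0.2,    750,  740, 10, [0, 0, 0, 0, 0, 10]),
    (0.3,    500,  490, 10, [0, 0, 0, 0, 0, 10]),
    (0.4,    500,  492,  8, [0, 0, 0, 0, 0, 8]),
    (1.0,   7850, 7806, 44, [0, 0, 0, 0, 0, 44]),
]


def mortality_by_dose(rows):
    """Observed proportion of beetles killed at each dosage."""
    return {dose: deaths / total for dose, total, deaths, _, _ in rows}


def highest_dose_survived(rows, genotypes):
    """Largest dosage at which at least one survivor of each genotype was seen."""
    best = {}
    for dose, _, _, _, counts in rows:
        for genotype, n in zip(genotypes, counts):
            if n > 0:
                best[genotype] = max(dose, best.get(genotype, dose))
    return best


for dose, p in mortality_by_dose(ROWS).items():
    print(f"{dose:>6} mg/L: mortality {p:.3f}")
print(highest_dose_survived(ROWS, GENOTYPES))
```

Only survivors are genotyped in this table, so the genotype mix among the beetles that died is unobserved.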

Practical Considerations in Choosing Dosage

Clearly a high dosage would kill all beetles, regardless of genotype

Time more important than concentration

Expense
  more time with lower dose

Technical limitations
  maintain concentration in silos

Safety
  spontaneous combustion at high conc.

Case Study 3: T-cell Cancer

Acute lymphoblastic leukemia (ALL)
  leukemia – cancer of white blood cells
  ALL – excess of lymphoblasts (immature cells that become white blood cells)

Two types of interest here:
  T-cell – manage cell-mediated immune response (activation of cells, release of cytokines)
  B-cell – manage humoral immune response (secretion of antibodies)

Researchers used gene expression technology

Central Dogma of Molecular Biology

General assumption of microarray technology

Use mRNA transcript abundance level as a measure of the level of “expression” for the corresponding gene

Proportional to degree of gene expression

How to measure mRNA abundance?

Several different approaches with similar themes:
  Affymetrix GeneChip
  Nimblegen array
  Two-color cDNA array
  more

Representation of genes on slide
  Small portion of gene
  Larger sequence of gene

oligonucleotide arrays

Affymetrix Probes

(Images courtesy Affymetrix, www.affymetrix.com)

25 bp

Affymetrix Technology – GeneChip

Each spot on array represents a single probe sequence (with millions of copies)
  Perfect match
  Mismatch

Each gene is represented by a unique set of probe pairs (usually 12-20 probe pairs per probe set)

These probes are fixed to the array

(Image courtesy Affymetrix, www.affymetrix.com)

Affymetrix Technology – Expression

(Images courtesy Affymetrix, www.affymetrix.com)

A tissue sample is prepared so that its mRNA has fluorescent tags; wait for hybridization

Affymetrix GeneChip

Image courtesy Affymetrix, www.affymetrix.com

Cartoon Representations

Animation 1: GeneChip structure (1 min.)

Animation 2: Measuring gene expression (2.5 min)

Data: Spot Intensities

Images courtesy Affymetrix, www.affymetrix.com

Full Array Image; Close-up of Array Image

Basic goal of microarray technology
  “Observe” gene expression in different conditions – healthy vs. diseased, e.g.
  Decide which genes’ expression levels are changing significantly between conditions
  Target those genes – to halt disease, e.g.
  Study those genes – to better understand differences at the genetic level

ALL Data

“Preprocessed” gene expression data
  12625 genes (hgu95av2 Affymetrix GeneChip)
  128 samples (arrays)
  a matrix of “expression values” – 128 cols, 12625 rows
  phenotypic data on all 128 patients, including:
    95 B-cell cancer
    33 T-cell cancer

Motivating question: Which genes are changing expression values systematically between B-cell and T-cell groups?
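To make the data layout concrete, here is a minimal Python sketch of a genes-by-samples matrix with 95 B-cell and 33 T-cell columns, scored gene by gene with a Welch two-sample t-statistic. Everything numeric in it is simulated placeholder data, not the ALL data: the demo uses 200 rows rather than 12625, and a shift is planted in one gene so that there is something to find. The names `expr`, `labels`, and `welch_t`, and the choice of Welch's statistic, are assumptions made for illustration rather than the method used in the workshop.

```python
import random
import statistics

N_B, N_T = 95, 33        # group sizes stated on the slide
N_GENES_DEMO = 200       # the real matrix has 12625 rows; kept small here

rng = random.Random(0)   # fixed seed so the sketch is reproducible
labels = ["B"] * N_B + ["T"] * N_T

# Simulated placeholder expression values (rows = genes, columns = samples).
expr = [[rng.gauss(8.0, 1.0) for _ in labels] for _ in range(N_GENES_DEMO)]

# Plant an artificial shift in gene 0 for the T-cell samples.
for j, lab in enumerate(labels):
    if lab == "T":
        expr[0][j] += 3.0


def welch_t(row, labels):
    """Welch two-sample t-statistic (T minus B) for one gene's values."""
    b = [x for x, lab in zip(row, labels) if lab == "B"]
    t = [x for x, lab in zip(row, labels) if lab == "T"]
    se = (statistics.variance(b) / len(b) + statistics.variance(t) / len(t)) ** 0.5
    return (statistics.mean(t) - statistics.mean(b)) / se


t_stats = [welch_t(row, labels) for row in expr]
top = max(range(N_GENES_DEMO), key=lambda g: abs(t_stats[g]))
print("gene with largest |t|:", top)
```

With thousands of genes tested at once, some large statistics will appear by chance alone, which is why multiple hypothesis testing appears in the outline for Notes 3.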

Next …

Analysis for these case studies

Build on known statistical methods

Notice huge potential for additional methods