Data Analysis E3: Lecture 8. Data Analysis Lecture Outline Processing and Visualizing Data -Why do...


Page 1

Data Analysis

E3: Lecture 8

Page 2

Data Analysis

Lecture Outline

• Processing and Visualizing Data
- Why do we do this?
- Processing the Luria-Delbruck Data
- Processing the Public Goods Data

• Analyzing Data (using Excel)
- Difference in means (t-test)
- Difference in distributions (χ² test)


Page 4

Handling Data

• After a laboratory experiment or time out in the field, you will have several data points.

• How should one process this (potentially voluminous) data?
1) Organize it (spreadsheet programs, like Excel, can help)
2) Process it
   I) Investigate portions of the data set
   II) Look at relevant descriptive statistics
   III) Transform data points in a well-defined way
   IV) Combine data points in a well-defined way
3) Visualize it
4) Subject it to an appropriate statistical test

[Slide callouts on "Process it": Massaging? Dressing-up? *Focusing]

[Figure label: 3 P colonies, 4 R colonies]

Page 5

Picture = 1000 Words

[Figure: pie chart "Grade Distribution" with categories A, B, C, D, E]

[Figure: X-Y plot "Understanding the Black Box"; x-axis: Number of Trials (0 to 20); y-axis: Accuracy of Prediction (0 to 1)]

• We are visual animals and often can see patterns when data is presented visually

• Examples:

- Pie-chart illustrates the distribution of values of a single variable

- X-Y plot illustrates the form of the relationship between two variables

- Paired histograms illustrate the relationship between the distributions of two variables.

• The most appropriate picture will often depend on the data:
- Categorical or quantitative?
- Frequencies, counts or measurements?
- Relationship between data points?


Page 7

The Data

• Go to our class website:

http://depts.washington.edu/kerrpost/Bio481/HomePage

• On the DATA link, download the following Excel (xls) files:

- “E3_LD_Processed_Data”
- “E3_PG_Processed_Data”

• Take care as you process and visualize the class data: the product of your efforts can be used directly in your first two lab reports.

Page 8

Processing the Luria-Delbruck Data

[Figure: protocol timeline. DAY 1 (Tuesday): plating, with labels ×24 and ×3; 48 hours at 37°C. DAY 3 (Thursday): COUNT.]

• We’ll start by computing some useful statistics:
- Mean number of colonies on a rifampicin plate.
- Variance in number of colonies on a rifampicin plate.
- Total number of rifampicin plates (number of replicates in the class).

• Next we will compile the full distribution of rifampicin plate counts:
- Actual distribution (COUNTIF function will be useful)
- Expected distribution (get ready to write a complicated function!)
- Let’s plot these distributions.

• Finally, let’s compute the density of cells in the original wells.
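The first two steps above can be sketched outside Excel as follows. This is a minimal illustration, not the class spreadsheet: the plate counts are made up, and the expected distribution is assumed to be Poisson with the observed mean, which is what the directed-mutation hypothesis predicts. The name `poisson_expected` is a hypothetical helper.

```python
import math
from collections import Counter
from statistics import mean, variance

# Hypothetical colony counts, one per rifampicin plate (NOT the class data).
counts = [0, 1, 0, 2, 0, 0, 5, 1, 0, 0, 1, 12]

n_plates = len(counts)   # number of replicates
m = mean(counts)         # mean colonies per plate
v = variance(counts)     # sample variance (n - 1 denominator)

# Observed distribution: number of plates with exactly k colonies.
# This plays the role of COUNTIF in the spreadsheet.
observed = Counter(counts)

def poisson_expected(k, lam, n):
    """Expected number of plates (out of n) with exactly k colonies,
    assuming counts follow a Poisson distribution with mean lam."""
    return n * math.exp(-lam) * lam ** k / math.factorial(k)

print(f"plates={n_plates} mean={m:.3f} variance={v:.3f}")
for k in range(max(counts) + 1):
    expected = poisson_expected(k, m, n_plates)
    print(f"k={k:2d} observed={observed.get(k, 0):2d} expected={expected:6.3f}")
```

A Poisson distribution has variance equal to its mean, so comparing the two printed statistics is already a quick check before plotting the observed and expected columns side by side.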


Page 10

Processing the Public Goods Data

• We’ll start by computing the densities when alone.
• Next, let’s compute the relative fitnesses (BK26+pBR relative to BK27) & plot these.

[Figure: protocol timeline over DAY 1 (Tuesday), DAY 2 (Wednesday) and DAY 3 (Thursday). Labels: monocultures ↑ (BK27 alone, BK26+pBR alone), competitions ↓ (Init. Mix, amp), Agar comp., Liquid comp., 24 hours at 37°C between steps, COUNT.]
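The two computations named on this slide can be sketched as below. The lecture does not state the formulas, so both are assumptions: density is taken as colonies scaled by the fold-dilution and the volume plated, and relative fitness is taken as the commonly used ratio of realized growth rates (log of final over initial density for each strain). The function names, parameters, and numbers are all hypothetical.

```python
import math

def density(colonies, dilution_factor, volume_plated_ml):
    """Cells per ml in the undiluted culture, estimated from one plate.
    dilution_factor is the fold-dilution (1e5 means diluted 100,000-fold)."""
    return colonies * dilution_factor / volume_plated_ml

def relative_fitness(a_initial, a_final, b_initial, b_final):
    """Fitness of strain A relative to strain B, assumed here to be the
    ratio of their realized growth rates over the competition."""
    return math.log(a_final / a_initial) / math.log(b_final / b_initial)

# Hypothetical numbers, for illustration only.
d = density(colonies=150, dilution_factor=1e5, volume_plated_ml=0.1)
w = relative_fitness(a_initial=1e6, a_final=1e8, b_initial=1e6, b_final=1e9)
print(f"density = {d:.3g} cells/ml, relative fitness = {w:.3f}")
```

With this definition a value below 1 means strain A grew more slowly than strain B in the mix, and a value above 1 means it grew faster.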

Page 11

Save Your Work

• Save your work by renaming the data files:

- “E3_LD_Processed_Data_YOUR_INITIALS”
- “E3_PG_Processed_Data_YOUR_INITIALS”

• Save these files on a thumb drive or email them to yourself.

• We will continue to work on these during class today.


Page 13

How do we statistically evaluate data?

• When you were a child, your father told you he would let you stay up late if a coin he flipped came up heads.
- Suppose the coin comes up heads 25% of the time.
- Is your Dad using a fair coin? How would you evaluate this?

Number of flips    Number of “heads”
4                  1
12                 3
100                25

• Suppose you hear that a high-protein diet during puberty leads to an increased height as an adult.

- The mean height in a high-protein treatment was 5’11” and the mean height in a control treatment was 5’5”.

- What would you feed your kids? How do you gauge this?

• The New York Times has just done an exposé about sexism in graduate admissions in a famous department of mathematics.

- While the number of male and female applicants was equal, the number of males admitted was greater.

- Should a formal inquiry take place? How do you evaluate the data?

[Figures: panels comparing Control and High-protein treatments; frequency panels labeled ♀a, ♂a and ♀e, ♂e]
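One way to gauge the coin example is to ask how often a fair coin would give that few heads. The sketch below computes the exact binomial probability for the three flip counts in the table. This is an illustration of the idea, not necessarily the method used in lecture, and `prob_at_most` is a hypothetical helper name.

```python
from math import comb

def prob_at_most(heads, flips, p=0.5):
    """Probability of seeing `heads` or fewer heads in `flips` tosses
    of a coin whose chance of heads is p (binomial tail)."""
    return sum(
        comb(flips, k) * p ** k * (1 - p) ** (flips - k)
        for k in range(heads + 1)
    )

# Same 25% heads in every row, but very different amounts of evidence.
for flips, heads in [(4, 1), (12, 3), (100, 25)]:
    print(f"{heads} heads in {flips} flips: P = {prob_at_most(heads, flips):.3g}")
```

With 4 flips a fair coin gives one head or fewer about 31% of the time, and with 12 flips three or fewer about 7% of the time; with 100 flips the probability is far below 0.05. The observed proportion is the same in each case; the sample size is what changes the conclusion.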

Page 14

The Data

• Go to your email or thumb drive and download your processed data files:

- “E3_LD_Processed_Data_YOUR_INITIALS”
- “E3_PG_Processed_Data_YOUR_INITIALS”

• Take care as you analyze the class data: the product of your efforts can be used directly in your first two lab reports.

Page 15

Student’s t-test

William Sealy Gosset

• DEMO: Performing a t-test
- Computing a p-value from a t-test
- Distinguish the different types of t-tests:
  Paired versus Unpaired data
  Equal versus Unequal variance
  One-tailed versus Two-tailed tests

• Using a pseudonym, “Student,” Gosset described a test for distinguishing the difference between means of 2 data sets.

• The t-test uses the statistics from two groups of data (means and s.d.) to generate a third statistic (the t statistic).

• If the two groups of data come from populations with the same mean, the t statistic has a characteristic distribution itself (note the shape will depend on the sample sizes).

• If the computed t is extreme, then two groups drawn from populations with equal means would rarely produce it (the p-value of the test quantifies how rarely). The means are considered significantly different if p<0.05.

• Assumptions
- Each datum is independent
- Data is normally distributed

• How can we use t-tests for the Public Goods Data? What type of t-test is appropriate? How do you report it?
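To make the steps above concrete, here is a minimal sketch of one of the variants listed: an unpaired, unequal-variance (Welch), two-tailed t-test. The two samples are hypothetical, not the Public Goods data, and the p-value is obtained by numerically integrating the Student t density with Simpson's rule rather than by a library call. Which variant suits the class data is the question the slide poses; this sketch does not answer it.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for an unpaired, unequal-variance t-test."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) under equal means; steps must be even (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

# Hypothetical measurements for two groups.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(group_a, group_b)
p = two_tailed_p(t, df)
print(f"t = {t:.3f}, df = {df:.2f}, two-tailed p = {p:.3f}")
```

When reporting, the t statistic, the degrees of freedom, and the p-value usually go together, along with which variant of the test was used.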


Page 17

χ² Test

• Karl Pearson introduced a test to distinguish whether an observed set of frequencies differs from a specified frequency distribution.

• The χ²-test uses frequency data to generate a statistic (χ²).

• If the observed frequencies come from a population with the specified frequency distribution, the χ² statistic has a characteristic distribution (the shape will depend on the number of classes).

• If the computed χ² is extreme, then sampling from the specified distribution would rarely produce frequencies like those observed (the p-value from the test quantifies how rarely). The observed frequencies are considered significantly different if p<0.05.

• Assumptions
- Frequencies are derived from independent sampling
- There are not several frequencies that are very small

Karl Pearson

• DEMO: Performing a chi-square test
- Computing a p-value from a chi-square test

• Perform a χ² test to see if the Luria-Delbruck Data from the class differs from the frequencies expected under directed mutation. What can you conclude?
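The calculation behind a χ² test can be sketched as follows, using the coin example from earlier in the lecture (25 heads in 100 flips against a fair 50/50 expectation) rather than the class data. The helper names `chi_square_stat` and `chi_square_p` are hypothetical, and the p-value is computed from a series expansion of the incomplete gamma function, which is adequate for the moderate values seen here.

```python
import math

def chi_square_stat(observed, expected):
    """Pearson's statistic: sum over classes of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_p(x, df):
    """P(chi-square with df degrees of freedom >= x), via the series for
    the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# Coin example: 100 flips, 25 heads, tested against a fair coin.
observed = [25, 75]        # heads, tails
expected = [50, 50]
chi2 = chi_square_stat(observed, expected)
df = len(observed) - 1     # number of classes minus one
p = chi_square_p(chi2, df)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.3g}")
```

The same two functions apply to the plate counts once observed and expected class frequencies are tabulated. Two points to keep in mind there: classes with very small expected frequencies are often pooled to respect the assumption listed above, and if the expected frequencies use a parameter estimated from the data (such as the mean), the degrees of freedom are usually reduced by one more.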