Question 4 What are data and what do they mean to a scientist?

Post on 20-Jan-2016



Question

What are data and what do they mean to a scientist?

Dinner at the Urquhart House

Brought to you by the Briggs Multiracial Alliance

• Sunday night
• All food provided (probably Chinese)
• Contact Mimi Reddy, reddydee@msu.edu, for details

Data, Statistics, and Spreadsheets

• What are data?
• What are statistics?
• What are spreadsheets?
• How can you analyze data with spreadsheets?

Data

• Data are pieces of information
• Data can be numbers, words, descriptions
• Data have UNITS
• The word data is PLURAL, datum is singular
• Data about Willoughby:
– Age: 5 (years)
– Height: 47 (inches)
– Weight: 66 (pounds)
– Eyes: Blue
– Favorite word: Wrestle
– Favorite letter: W

Types of Data

• Numbers – two types
– Real numbers – can have a fractional part – 28.75 lbs
– Integers – whole numbers – 18 months
• Letters – called characters in programming
– W is a character
• Words – called strings in programming
– “No thanks” is a string; strings can be individual words or phrases
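As a rough illustration of these types, here is a minimal Python sketch. The variable names are made up for this example; the values come from the slides above. Note that Python has no separate character type, so a single letter is stored as a one-letter string.

```python
# Data types from the slide, stored as Python values.
# Variable names are hypothetical; the values come from the slides.
age_years = 5            # integer: a whole number
weight_lbs = 28.75       # real number, stored as a float
favorite_letter = "W"    # a character (in Python, a string of length 1)
reply = "No thanks"      # a string: a word or a phrase

for name, value in [
    ("age_years", age_years),
    ("weight_lbs", weight_lbs),
    ("favorite_letter", favorite_letter),
    ("reply", reply),
]:
    # type(value).__name__ gives the name of the type: int, float, or str
    print(f"{name} = {value!r} is a {type(value).__name__}")
```

The units (years, pounds) live only in the variable names here; the numbers themselves carry no units, which is why the slide stresses writing them down.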

Statistics and Data

Test Scores:

– Jeff: 88

– Mollie: 92

– Marcie: 88

– Dave: 47

– Karim: 99

– Willoughby: 42

– Benjamin: 0

What statistics can you calculate to describe these data?

– Try to think of four things to describe the data

stop

Statistics

• Statistics are derived from the data
• Statistics are descriptions of data
• Statistics are meant to simplify the data
• Statistics can be misleading
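One way to see how a statistic can mislead is to compare the mean and the median of the test scores listed earlier. The sketch below uses Python's standard `statistics` module; the variable names are made up for this example.

```python
from statistics import mean, median

# Test scores from the "Statistics and Data" slide
scores = {
    "Jeff": 88, "Mollie": 92, "Marcie": 88, "Dave": 47,
    "Karim": 99, "Willoughby": 42, "Benjamin": 0,
}
values = list(scores.values())

class_mean = mean(values)      # pulled down by the low scores
class_median = median(values)  # middle value once the scores are sorted

print(f"mean   = {class_mean:.1f}")
print(f"median = {class_median}")
```

The mean comes out near 65 even though four of the seven students scored 88 or higher, because a few low scores (including the 0) pull it down. Reporting only the mean would hide that.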

Typical Statistics

• Sample Size – number of individuals measured = n
• Sum = Σx
• Average or Mean = Σx/n
• Median
– Value of 50th percentile, half of values fall above, half below
• Maximum, Minimum, Range (Max-Min)
• Mode – most common value
• Standard deviation (SD)
• Variance (SD²)
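Standard deviation and variance are the least obvious items on this list, so here is a minimal sketch that computes them step by step for the test scores from the earlier slide. It assumes the sample formula (dividing by n - 1); the slides do not say whether the sample or the population version is intended.

```python
from math import sqrt

# Test scores from the "Statistics and Data" slide
scores = [88, 92, 88, 47, 99, 42, 0]

n = len(scores)
mean = sum(scores) / n

# Variance: sum of squared distances from the mean, divided by n - 1
# (sample formula; this choice is an assumption, see the note above)
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)

# Standard deviation is the square root of the variance, so variance = SD squared
sd = sqrt(variance)

print(f"n = {n}, mean = {mean:.2f}")
print(f"variance = {variance:.2f}, SD = {sd:.2f}")
```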

Analyze these data...

Mean, max, min, range, median, mode

• 18

• 33

• 4

• 47

• 49

• 38

• 29

• 4

• 55

sample size (n)

Sum

mean = average = Σx/n

• denoted x̄ (x-bar)

median = halfway

mode = most common
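After working these out by hand, the results can be checked with a short script. This sketch uses Python's standard `statistics` module on the nine values above; the dictionary keys are just labels chosen for this example.

```python
from statistics import mean, median, mode

# The nine values from the "Analyze these data..." slide
data = [18, 33, 4, 47, 49, 38, 29, 4, 55]

summary = {
    "sample size (n)": len(data),
    "sum": sum(data),
    "mean": mean(data),
    "median": median(data),   # middle value of the sorted data
    "mode": mode(data),       # most common value
    "max": max(data),
    "min": min(data),
    "range": max(data) - min(data),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```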

Spreadsheets

Spreadsheets are tables

Spreadsheets allow calculations and manipulations of data

• Calculations: mean, standard deviation

• Manipulations: sort,

             Costa Rica   Nicaragua
Rainforest      625,000   3,712,000
Dry Forest       50,000     300,000
Total           675,000   4,012,000
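The Total row is the kind of calculation a spreadsheet does for you. As a sketch of the same idea outside a spreadsheet, the Python below recomputes the column totals; the name `forest_table` is made up for this example, and the slide does not state the units of the numbers.

```python
# Values from the table above (units are not stated on the slide)
forest_table = {
    "Costa Rica": {"Rainforest": 625_000, "Dry Forest": 50_000},
    "Nicaragua": {"Rainforest": 3_712_000, "Dry Forest": 300_000},
}

# Add up each country's column, like a SUM formula in the Total row
totals = {country: sum(rows.values()) for country, rows in forest_table.items()}

for country, total in totals.items():
    print(f"{country}: {total:,}")
```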

Make a data table:

• Fly 1, length 13.4 mm, velocity 27 Kph, age 21 days
• Fly 2, length 9.4 mm, velocity 0 Kph, age 220 days
• Fly 3, length 9.3 mm, velocity 44 Kph, age 1 days
• Fly 4, length 13.4 mm, velocity 17 Kph, age 32 days
• Fly 5, length 17.4 mm, velocity 33 Kph, age 11 days

How many columns? How many rows? #s go down or across?

Data Table

Fly # Length Velocity Age

1

2

3

4

5
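The same table can be sketched in code: one row per fly, one column per measurement. The Python below is only an illustration; the column names (with units folded into them) are made up for this example, and the values come from the fly list above. It also shows a sort, one of the manipulations mentioned earlier.

```python
# One dictionary per row (fly); keys are the column names (hypothetical names)
flies = [
    {"fly": 1, "length_mm": 13.4, "velocity_kph": 27, "age_days": 21},
    {"fly": 2, "length_mm": 9.4, "velocity_kph": 0, "age_days": 220},
    {"fly": 3, "length_mm": 9.3, "velocity_kph": 44, "age_days": 1},
    {"fly": 4, "length_mm": 13.4, "velocity_kph": 17, "age_days": 32},
    {"fly": 5, "length_mm": 17.4, "velocity_kph": 33, "age_days": 11},
]

columns = list(flies[0].keys())
print(f"{len(flies)} rows x {len(columns)} columns")

# Print the table: header first, then one line per fly
print("\t".join(columns))
for row in flies:
    print("\t".join(str(row[c]) for c in columns))

# A manipulation: sort the rows from youngest to oldest fly
by_age = sorted(flies, key=lambda row: row["age_days"])
print("Youngest to oldest:", [row["fly"] for row in by_age])
```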

Microsoft Excel

• Typical spreadsheet program
– Lotus 1-2-3 was an early commercial spreadsheet (VisiCalc came first)
• Has similar controls to MS Word
• Now allows graphing (charts)
– very restricted formats, hard to get exactly what you want

Excel tables and graphs can be copied into MS Word

Friday’s Assignment

We will work with Microsoft Excel to analyze some data

Groups of two will submit one finished spreadsheet for the assignment

Graphs

• Many different types of graphs
– Points
– Lines
– Bars
– Pies

Point Graphs

• Called X-Y Scatter in MS Excel
• Plot points based on X and Y value
• Can fit a “REGRESSION LINE” to the data
– Line that best fits the data
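"Best fits" usually means the least-squares line: the line that makes the squared vertical distances from the points as small as possible. The sketch below assumes that meaning and computes the slope and intercept directly. The function name `fit_line` and the five points are made up for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

# Hypothetical X and Y values, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```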

X-Y Scatter

Bar Graphs

• Categorize data into counts or percents
• Categories can be descriptive categories (Windows 98, Windows 2000, …)
• Can also be numeric categories
– Height: 60-63, 63-66, etc. or just 61, 62, 63…
– Count up number of people in each group

Histograms are a particular type of bar graph
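Counting people into numeric categories can be sketched in a few lines of Python. The heights below are made up for this example, and the sketch assumes each group includes its lower edge but not its upper edge (so a height of 63 goes in 63-66); the slide does not say how to handle values on a boundary.

```python
from collections import Counter

# Hypothetical heights in inches, not taken from the slides
heights = [60, 61, 62, 63, 64, 64, 65, 66, 67, 68, 63, 62]

BIN_START = 60  # lower edge of the first group
BIN_WIDTH = 3   # each group spans 3 inches

def bin_label(height):
    """Return the group a height falls in, e.g. 64 -> '63-66'."""
    low = BIN_START + ((height - BIN_START) // BIN_WIDTH) * BIN_WIDTH
    return f"{low}-{low + BIN_WIDTH}"

# Count up the number of people in each group
counts = Counter(bin_label(h) for h in heights)

# A text version of the bar graph: one '#' per person
for label in sorted(counts):
    print(f"{label}: {'#' * counts[label]} ({counts[label]})")
```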

Bar Graph

[Figure: bar graph of Starting Salary by year, 1988 to 1994; y axis from $0 to $50,000]

Histogram

• X axis is categories
• Y axis is a number or proportion of observations in that category

Histogram Bar Graph

[Figure: histogram; y axis: Number of Crashes]

Regular Bar Graph vs. Histogram Bar Graph

[Figure: bar graph of Starting Salary by year, 1988 to 1994; y axis from $0 to $50,000]

Distributions

Special type of histogram with continuous numeric scale at bottom

Normal distribution is a key concept in statistics

Skewed distribution is one that is unbalanced

Sample distribution histograms

Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt

Robert D. Duval, PS 400 Lecture, www.polsci.wvu.edu/duval/ps400/Notes/400Notes.ppt

The NORMAL Distribution

A NORMAL DISTRIBUTION is the theoretical distribution of values given natural variation around a MEAN

It is a balanced, humped (bell-shaped) distribution
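To get a feel for this shape, the sketch below draws random values that vary around a mean and checks how many land within one standard deviation of it. For a normal distribution that share is about 68%. The mean of 100 and SD of 15 are arbitrary values picked for this example, and `random.gauss` is Python's standard normal random number generator.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 100, 15  # hypothetical values for this example
values = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(10_000)]

m = mean(values)
s = stdev(values)

# Share of values within one standard deviation of the mean
within_one_sd = sum(1 for v in values if abs(v - m) <= s) / len(values)

print(f"sample mean = {m:.1f}, sample SD = {s:.1f}")
print(f"share within 1 SD of the mean = {within_one_sd:.1%}")
```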

Distributions

Skew is an imbalance in the distribution

Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt

Hypothesis Testing

• Statistical Tests are how scientists decide if data support their hypothesis (NOT PROVE their hypothesis)
• Four major statistical tests: T-test, χ² (chi-square) Test, Regression, ANOVA

Hypothesis

Processor speed has an effect on the performance of the computer.

Null Hypothesis

– H0: Processor speed has NO EFFECT on the performance of a computer.

Statistical Tests and Probability

• Statistical tests give a value
• That value can be related to a probability (P)
• P is the probability of getting data at least as extreme as yours if the NULL hypothesis were true
• If P < 0.05 (1/20), then you reject the NULL hypothesis
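A small worked case can make the P value concrete. The example below is not from the slides: it supposes a coin is flipped 10 times and lands heads 9 times, with the null hypothesis that the coin is fair. P is then the chance of getting 9 or more heads from a fair coin, which can be counted exactly.

```python
from math import comb

# Hypothetical experiment: 10 flips, 9 heads observed
flips = 10
heads_seen = 9

# Under the null hypothesis (fair coin) all 2**flips outcomes are equally likely.
# Count the outcomes with at least as many heads as we saw (a one-sided test).
extreme_outcomes = sum(comb(flips, k) for k in range(heads_seen, flips + 1))
p_value = extreme_outcomes / 2 ** flips

print(f"P = {p_value:.4f}")
if p_value < 0.05:
    print("P < 0.05: reject the null hypothesis that the coin is fair")
else:
    print("P >= 0.05: the data are consistent with a fair coin")
```

Here P is 11/1024, about 0.011, so a result this lopsided would be unusual for a fair coin.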

T-Test

Compares differences between two means

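Formula: T = (x̄1 - x̄2)/SEM

– SEM is the Standard Error of the Mean [SD/√N]; when comparing two groups, use the standard error of the difference between the two means

T Values: the difference between the means compared with the amount of spread in your data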

T-Values

If T > 2.5 or 3.0, difference is usually significant (this depends on your sample sizes)
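As a sketch of the calculation, the Python below computes a T value for two small groups. It assumes the denominator is the usual standard error of the difference between two means, √(SD1²/n1 + SD2²/n2), which is the unequal-variance (Welch) form. The function name `t_statistic` and the benchmark scores for slow and fast processors are made up for this example.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """T value for the difference between the means of groups a and b."""
    # Standard error of the difference: combines the spread of both groups
    # (statistics.variance uses the sample formula, dividing by n - 1)
    se_diff = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se_diff

# Hypothetical benchmark scores, not taken from the slides
slow_processor = [40, 42, 44, 46, 48]
fast_processor = [50, 52, 54, 56, 58]

t = t_statistic(fast_processor, slow_processor)
print(f"T = {t:.2f}")
```

With these made-up numbers the means differ by 10 and the standard error of the difference is 2, so T is 5, well above the rough 2.5 to 3.0 threshold on the slide.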