Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture...

37
Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor: Chendi Jiang Department of Statistics University of South Carolina Elementary Statistics for the Biological and Life Sciences (Stat 205) 1 / 37

Transcript of Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture...

Page 1: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Introduction, Sections 1.1 and 1.3,The R Statistical Package

Instructor: Chendi Jiang

Department of StatisticsUniversity of South Carolina

Elementary Statistics for the Biological and Life Sciences

(Stat 205)

1 / 37

Page 2: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Grading, homework, exams

Stat 205:Elementary Statistics for the Biological and Life SciencesCourse website: https://blackboard.sc.edu.Grade is: 35% for homework and quizzes, 20% each forexams I, II, and 25% for final.Homework: there will be weekly homework graded over thecourse except the exam weeks.Exams are fixed time and location (see course website).Do not ask to turn in homework late; late or missedhomework will receive zero score. Lowest homework scorewill be dropped from your grade.

2 / 37

Page 3: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Homework, exams and course webpage, office hours

About 7-10 homework and random quizzes you will turn infor credit (35% of your grade).Two midterm exams (20% each) and one Final exam(25%) will account for 65% of the grade percentage. Allexams are comprehensive.Strong, positive correlation between attendance and yourfinal grade.All resources for the course will be posted in coursewebsite: https://blackboard.sc.edu. Course website will beupdated frequently.

Both the instructor and teaching assistant will hold officehours, see syllabus for details.

3 / 37

Page 4: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Topics we’ll cover

Graphical displays and summary statisticsProbability, random variables (normal and binomial)Confidence intervals for µ and pTwo-sample testing and CI2× 2 tables: relative risk & odds ratiosAnalysis of varianceLinear regressionLogistic regression, survival analysis, ROC curvesUse of the statistical package R to analyze real data

4 / 37

Page 5: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

R computing & graphics package

R is a powerful, free statistical computing and graphicspackage.Popular with many researchers due to contributedpackages: R functions to do specialized, advanced, &often complex statistical analysis.R can also do many important, routine calculations,analysis, and provide common graphical displays used inthis course.Installed in several of the computing labs across campus,e.g. Sloan 108 & 109, Gambrell 003.You can download it and install it from CRAN:http://cran.r-project.org/

5 / 37

Page 6: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

6 / 37

Page 7: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Get Started

• Installationgoogle R→ The R project for Statistical ComputingR64 bits (large datasets) vs. R32 bits

Getting help with RAt the command prompt, type, for example ?read.table orhelp(read.table)At the command prompt, type, for example,help.search(”read”) or apropos(”read”).Within R, use the menu bar: Help: R help.Quititng R. q()How to save work space

7 / 37

Page 8: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

R Resources

John Verzani’s SimpleR notesR Reference CardCRAN (Document/Manuals)

Note: To run some of the example in John Verzani’s notes, run:> install.packages(”UsingR”)> library(UsingR)

8 / 37

Page 9: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

More on R

R allows you to do all analysis covered in this course, andbeyond.There are some tutorials, both installed in R and on theweb. Under Help choose Manuals (in PDF) andchoose An introduction to R. This is a longerintroduction to R.For homework, I will give you a skeleton set of commandsto get the basic job done with no frills.R’s error messages can be cryptic and therefore R is notas “user friendly” as some other packages such as Minitab.However it is free and is being used by hundreds ofthousands of people.

9 / 37

Page 10: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

The Comprehensive R Archive Network

Here is where you download R.

10 / 37

Page 11: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Installing R

From http://cran.r-project.org/, under Download and Install Rclick on your platform (Linux, MacOS X, or Windows).

for Windows click on Download R for Windows.

Click Save File and when it’s done downloading run the executable by clicking

on it – alternatively you can choose to Run Program directly after downloadingfrom the web.

The installation program will ask you a series of questions; choose the defaults.(e.g. English language, the suggested installation folder, the checked selectedcomponents to install, not to customize startup options, shortcut in the StartMenu, and additional tasks).

When it’s done, click on the new R desktop icon. Click on the console. This iswhere you will type commands to R.

11 / 37

Page 12: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

The R interface

Initially, there is only the console window open. If you makeplots, other windows will open too.

12 / 37

Page 13: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Some code to try

Note that the # sign is a “comment” – R ignores anything after#.

# generate some random normal datadata=rnorm(100)# look at a histogram and a boxplothist(data)boxplot(data)# compute the sample mean, median, variance, standard deviationmean(data)median(data)var(data)sd(data)# if you have a question about a command, preface it with ??hist

13 / 37

Page 14: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Motivation: why analyze data?

Clinical trials/drug development compare existingtreatments with new methods to cure disease.Agriculture enhance crop yields, improve pest resistance.

Ecology study how ecosystems develop/respond toenvironmental impacts.Lab studies learn more about biological tissue/cellular

activity.

14 / 37

Page 15: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

1.1 Statistics and the life sciences

Statistics is the science ofcollecting,summarizing,analyzing,interpreting

data.Goal: to understand the underlying biological phenomenathat generate the data.Statistics separates signal from noise.Are there associations or relationships among variables inthe data?

15 / 37

Page 16: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Numerical Reasoning: Example1

The following are profits (in millions $) per month from Jan. toDec. 20, 20.4, 20.8, 20.9, 21, 21.2, 21.2, 21.2, 21.4, 21.6, 21.8,22.Suppose you want to show your boss how much the profits aregrowing over the year.

How might you graph the data?

16 / 37

Page 17: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

17 / 37

Page 18: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Numerical Reasoning: Example1

Suppose the data are exactly the same as before, but the datarepresent expenses.Now you want to convince your boss that the expenses are notgrowing over the year.

How might you graph the data?

18 / 37

Page 19: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

19 / 37

Page 20: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Example 1.1.2: liver tumors in mice

Is there a risk difference between germ environment(germ-free vs. E. coli) and whether liver tumors develop?Is the association perfect?Statistics can help answer whether there’s a difference andfurther quantify the effect of germ exposure (Chapter 10).

20 / 37

Page 21: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Example 1.1.4: MAO and schizophrenia

Monoamine oxidase (MAO) enzyme thought to regulatebehavior.Blood from n = 42 schizophrenia patients collected, stratified bydiagnosis (I, II, III).Are there differences in MAO enzyme and disease diagnosis?

21 / 37

Page 22: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Example 1.1.4: MAO and schizophrenia

What happens to MAO as severity of diagnosis increases? Is therelationship perfect?These are side-by-side dotplots, described in Sec. 2.2. Formalapproach in Chapter 11.

22 / 37

Page 23: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

1.2 Types of Evidence: Two main types of datacollection

Observational study : In an observational study theresearcher systematically collects data from subjects, butonly as an observer and not as someone who ismanipulating conditions.Experiment : A research method where the researcherimpose the conditions and manipulates one variable at atime in order to prove causality.

23 / 37

Page 24: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Observational studiesExample 1.2.2 Sexual orientation and AC

Study enrolled 30 homosexual men, 30 heterosexual men,and 30 heterosexual womenThen observe endpoints: midsagittal area of the anteriorcommissure (AC)

24 / 37

Page 25: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Observational studies

25 / 37

Page 26: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

ExperimentExample 1.2.4 Toxicity in Dogs

Before being tested on humans, toxicity of a certain drug wastested on 8 female and 8 male dogs. Females were randomlyallocated into two groups of 4 dogs - one taking 8mg/kg andone taking 25 mg/kg of the drug; similarly for the male dogs.The alkaline phosphate level was measured after receiving thedrug.

26 / 37

Page 27: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Example 1.2.4 Toxicity in Dogs

27 / 37

Page 28: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Other topics and terms

Blinding: An experiment is said to be double-blind if boththe subject and the response measurer do not know whichtreatment the subject is getting.Control Groups: Usually in the life sciences we comparetwo or more groups to one another. Often, one of thegroups is a control group.

28 / 37

Page 29: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

1.3 Random sampling

The population is all the subjects/animals/specimens/etc.of interest.Since we can’t measure the entire population (usually) wetake a small sample of size n and use the data collected toinfer about the population.

29 / 37

Page 30: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Simple Random Sampling

A simple random sample of n items is a data set where

every population element has an equal chance of selectionevery population element is chosen independently of everyother element

30 / 37

Page 31: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Cluster Sampling

Divide the population (sampling frame) into clusters, samplewhole clusters.

31 / 37

Page 32: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Stratified Random Sampling

First, divide the population (sampling frame) into strata, thensample a specified number of individuals from each strata.

32 / 37

Page 33: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

MAO data: R code

# read data from web, take 1st & 2nd columns as mao and group indicators, plotstuff=read.table("http://people.stat.sc.edu/Chakrabp/mao.txt",header=FALSE)stuffmao=stuff[,1]maogroup=stuff[,2]groupplot(group,mao)# you can also read data from a file on your computer (text, Excel, etc.)

33 / 37

Page 34: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

MAO data: output

>stuff=read.table("http://people.stat.sc.edu/Chakrabp/mao.txt",header=FALSE)> stuff

V1 V21 6.8 12 4.1 13 7.3 14 14.2 15 18.8 16 9.9 17 7.4 18 11.9 19 5.2 110 7.8 111 7.8 112 8.7 113 12.7 114 14.5 115 10.7 116 8.4 117 9.7 118 10.6 119 7.8 220 4.4 221 11.4 222 3.1 223 4.3 224 10.1 225 1.5 2

34 / 37

Page 35: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

MAO data: output continued

26 7.4 227 5.2 228 10.0 229 3.7 230 5.5 231 8.5 232 7.7 233 6.8 234 3.1 235 6.4 336 10.8 337 1.1 338 2.9 339 4.5 340 5.8 341 9.4 342 6.8 3> mao=stuff[,1]> mao[1] 6.8 4.1 7.3 14.2 18.8 9.9 7.4 11.9 5.2 7.8 7.8 8.7 12.7 14.5 10.7 8.4 9.7

[18] 10.6 7.8 4.4 11.4 3.1 4.3 10.1 1.5 7.4 5.2 10.0 3.7 5.5 8.5 7.7 6.8 3.1[35] 6.4 10.8 1.1 2.9 4.5 5.8 9.4 6.8> group=stuff[,2]> group[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3

> plot(group,mao)

35 / 37

Page 36: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Plot of MAO data from R

●●

1.0 1.5 2.0 2.5 3.0

510

15

group

moa

You can right click on an R plot to save it to the clipboard as a metafile or bitmap.

These can be saved into Microsoft applications such as Word. You can also left click on

the plot then under Save choose Save as and save the plot, e.g. PDF.

36 / 37

Page 37: Introduction, Sections 1.1 and 1.3, The R Statistical …people.stat.sc.edu/jiana/STAT 205/Lecture notes/L1...Introduction, Sections 1.1 and 1.3, The R Statistical Package Instructor:

Plot of MAO data from R

R window after cutting and pasting the commands a few slidesago.

37 / 37