Statistics

Post on 06-Jan-2016



Nemours Biomedical Research

Statistics

March 2009
Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D.

Nemours Bioinformatics Core Facility

Overview

• Class goals
– Master basic statistical concepts
– Learn analytic techniques & when to apply them
– Learn how to interpret analysis results
– Develop familiarity with R and related tools
– Gain understanding that will transfer to a broad range of other statistics tools

Overview

• Class structure
– 8 sessions
– 1.5 hours per session
– Several homework assignments
• Class website
– http://www.medsci.udel.edu/open/StatClass/March2009

R

• Installing
– Download from class website
  • Pick the right (Mac versus Windows) version
– Run installer program
  • Go with all the defaults
• Running R
– Live demonstration

R Concepts

• Command line similar to
– Windows Command Shell
– Mac Terminal
– Unix/Linux Shell
• Uses ‘>’ as prompt
• Accepts
– Constants (e.g., 2, 3, 156, -99.0, etc.)
– Variables (e.g., Height, Weight, SubjID, …)
– Operations (e.g., ‘+’ ‘-’ ‘*’ ‘/’ ‘^’ ‘>’ ‘<’ ‘==’ ‘<-’)
– Functions (e.g., sum(c(1,2,3)), mean(c(1,2,3)), …)
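As a brief illustration of the four kinds of input listed above, the following lines could be typed at the R prompt. The particular values given to Height and Weight, and the names bmi, total, and average, are made up for this sketch and are not class data.

```r
# Constants: typing a number simply echoes it back
156
-99.0

# Variables: '<-' assigns a value to a name (values here are invented)
Height <- 1.75   # metres
Weight <- 70     # kilograms

# Operations: arithmetic and comparison
bmi <- Weight / Height^2   # uses '/' and '^'
bmi > 25                   # a comparison yields TRUE or FALSE
Weight == 70               # '==' tests equality

# Functions: called by name with arguments in parentheses
total <- sum(c(1, 2, 3))
average <- mean(c(1, 2, 3))
```

Note that ‘==’ compares two values, while ‘<-’ stores a value in a variable.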

R Concepts

• Variable types
– Scalar (a = 128)
– Vector (a = c(3, 2, 9, 5))
– Matrix (dim(a) = c(2,2))
– Data Frame - collection of vectors.
  • Similar to a spreadsheet
  • ‘rows’ are indexed
  • ‘cols’ are named and indexed
  • E.g., df$a is the column of data frame df named ‘a’ and df$a[5] is the 5th element of ‘a’.
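The variable types above can be tried out with a few lines of R. This is only a sketch: the names s, v, m, and df and all the values in the data frame are invented for illustration.

```r
# Scalar: a single value (in R, a vector of length 1)
s <- 128

# Vector: an ordered set of values of the same type
v <- c(3, 2, 9, 5)

# Matrix: giving a vector dimensions reshapes it, filling column by column
m <- v
dim(m) <- c(2, 2)
m[1, 2]        # element in row 1, column 2

# Data frame: named columns of equal length (made-up data)
df <- data.frame(
  a = c(10, 20, 30, 40, 50),
  b = c("p", "q", "r", "s", "t")
)
df$a           # the column named 'a'
df$a[5]        # the 5th element of that column
df[2, ]        # rows are selected by index
```

Unlike a matrix, a data frame may hold columns of different types, here one numeric and one character column.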

Statistics

• Science of data collection, summarization, analysis and interpretation.

• Descriptive versus Inferential Statistics:

– Descriptive statistics: describing (summarizing) data, using center, variability, and shape for quantitative variables (e.g., age) and the number (frequency) and percentage in each category for categorical variables (e.g., gender, race).

– Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction.

Statistical Description of Data

• Statistics describes a numeric set of data by its
– Center (mean, median, mode, etc.)
– Variability (standard deviation, range, etc.)
– Shape (skewness, kurtosis, etc.)
• Statistics describes a categorical set of data by
– Frequency, percentage, or proportion of each category
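A minimal sketch of these summaries in R follows. The age and gender values are made up for illustration. Base R has no built-in function for the statistical mode or for skewness, so both are computed by hand here; the skewness line uses the moment-based definition, which is one of several in common use.

```r
# Made-up example data
age <- c(23, 35, 41, 29, 52, 35, 60)
gender <- c("F", "M", "F", "F", "M", "F", "M")

# Center
age_mean <- mean(age)
age_median <- median(age)
# No built-in statistical mode in base R; take the most frequent value
age_mode <- as.numeric(names(which.max(table(age))))

# Variability
age_sd <- sd(age)          # sample standard deviation
age_range <- range(age)    # smallest and largest value

# Shape: moment-based skewness (one of several common definitions)
age_skew <- mean((age - age_mean)^3) / mean((age - age_mean)^2)^1.5

# Categorical data: frequency and proportion of each category
counts <- table(gender)
props <- prop.table(counts)
```

Multiplying props by 100 gives percentages.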

Statistical Inference

• Statistical inference is the process by which we acquire information about populations from samples.
• Two types of estimates for making inferences:
– Point estimate
– Interval estimate

[Figure: sample and population]
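The two types of estimate can be illustrated with a short R sketch. The measurements in x are invented, and the interval shown is a 95% confidence interval for the mean based on the t distribution, which is one common choice of interval estimate and assumes the data are roughly normally distributed.

```r
# Made-up sample of 10 measurements
x <- c(5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7, 5.4, 5.0)

# Point estimate of the population mean: a single number
point_est <- mean(x)

# Interval estimate: a 95% confidence interval for the mean,
# based on the t distribution
ci <- t.test(x, conf.level = 0.95)$conf.int

point_est
ci
```

The point estimate always lies inside this interval, which is centered on the sample mean.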

Population and Sample

• Population: The entire collection of individuals or measurements about which information is desired.

• Sample: A subset of the population selected for study.

– The primary objective is to create a subset of the population whose center, spread, and shape are as close as possible to those of the population.
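To see how closely a sample can resemble its population, the idea can be simulated in R. This is a sketch under assumptions: the "population" is 10,000 simulated values from a normal distribution, and the mean, standard deviation, and sample size are chosen arbitrarily.

```r
set.seed(1)  # make the random draws repeatable

# Simulated "population" of 10,000 values (normal distribution assumed)
population <- rnorm(10000, mean = 100, sd = 15)

# Simple random sample of 100 individuals, without replacement
samp <- sample(population, size = 100)

# Compare center and spread
c(pop_mean = mean(population), samp_mean = mean(samp))
c(pop_sd = sd(population), samp_sd = sd(samp))

# Compare shape through the quartiles of each
summary(population)
summary(samp)
```

Random selection is what makes the sample's center, spread, and shape tend toward those of the population.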

Parameter vs. Statistic

• Parameter:

– Any statistical characteristic of a population.

– Population mean, population median, and population standard deviation are examples of parameters.

– A parameter describes the distribution of a population.

– Parameters are fixed and usually unknown.

Parameter vs. Statistic

• Statistic:

– Any statistical characteristic of a sample.

– Sample mean, sample median, and sample standard deviation are some examples of statistics.

– A statistic describes the distribution of a sample.

– The value of a statistic is known, and it varies from sample to sample.

– Statistics are used for making inferences about parameters.
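The contrast between a fixed parameter and a varying statistic can be sketched in R. The population here is simulated, so its mean is known only because we generated it; in a real study it would be unknown. The distribution, sample size, and number of samples are arbitrary choices for illustration.

```r
set.seed(2)  # make the random draws repeatable

# Simulated population; its mean plays the role of the parameter
population <- rnorm(10000, mean = 100, sd = 15)
mu <- mean(population)   # fixed: the same no matter which sample is drawn

# Draw 5 separate samples of size 50 and compute the mean of each
sample_means <- replicate(5, mean(sample(population, size = 50)))

mu
sample_means             # a different value for each sample
```

Each sample mean is a known number that serves as an estimate of mu.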

Parameter vs. Statistic

• Statistical Issue
– Estimate a population parameter using a sample statistic.
– E.g., the sample mean is an estimate of the population mean.