Introduction to Statistical Computing in Clinical Research Biostatistics 212 Lecture 1.

Page 1

Introduction to Statistical Computing in Clinical Research

Biostatistics 212

Lecture 1

Page 2

Today...

• Course overview

– Course objectives

– Course details: grading, homework, etc.

– Schedule, lecture overview

• Where does Stata fit in?

• Basic data analysis with Stata

• Stata demos

• Lab

Page 3

Course Objectives

• Introduce you to using STATA and Excel for

– Data management

– Basic statistical and epidemiologic analysis

– Turning raw data into presentable tables, figures and other research products

• Prepare you for Fall courses

• Start analyzing your own data

Page 4

Course details

• Biostats 212

– 1 Unit Course

– Satisfactory/Unsatisfactory vs. Grades

– 7 Sessions – Lecture + Lab, starting August 2

Page 5

Course details

• New this year:

– In-Person + Online versions of the course

– Recorded lectures

– Forum

Page 6

Course details

• Two “In-Person” Sections:

– Lectures – in person (6702), Tuesday 1:15-2:45

– Labs – in person (6702 + 6704), Tuesday 3:00-4:00

Page 7

Course details

• One “Online” Section:

– Lectures – Recorded, posted late Tuesday afternoon

– Labs – Online Wed 1:30-3:00

• New this year, online students only

• Led by Jen Cocohoba – comments?

Page 8

Course details

• Recorded Lectures

– Audio + video of lecturer + video of screen

– Available same day for viewing

– See http://xxxxxxxxxxxxxxx

Page 9

Course details

• Forum

– Demo

– Post all questions here!

• TA turnaround time

– Before you post, see if it’s already there and answered

– Consider turning ON your alerts around lab time…

Page 10

Course details

• Course Requirements

– Hand in all six Labs (even if late)

– Satisfactory Final Project

• Not required

– Reading

– Attendance

Page 11

Course details

• Grading (only relevant for Master’s and ATCR Credit-Bearing students?)

– Letter grades: Standard cutoffs

• 90-100% A

• 80-89% B

• 70-79% C

• 60-69% D

• <60% or Course Requirements not met: F

– Satisfactory/Unsatisfactory

• >80%: Satisfactory

Page 12

Course details, cont

Course Director

Mark Pletcher

TAs

Naomi Bardach

Raymond Hsu

Sharon Poisson

Monika Sarkar

Assistant Course Director

Jennifer Cocohoba

Lab Instructors

Jing Cheng

Barbara Grimes

Nancy Hills

Ann Lazar

Page 13

Overview of lecture topics

• 1- Introduction to STATA

• 2- Do files, log files, and workflow in STATA

• 3- Generating variables and manipulating data with STATA

• 4- Using Excel

• 5- Basic epidemiologic analysis with STATA

• 6- Making tables and figures with STATA

• 7- Advanced Programming Topics

Page 14

Overview of labs

• Lab 1 – Load a dataset and analyze it

• Lab 2 – Learn how to use do and log files

• Lab 3* – Import data from Excel, generate new variables and manipulate data, document everything with do and log files

• Lab 4 – Using and creating Excel spreadsheets

• Lab 5* – Epidemiologic analysis using Stata

• Lab 6 – Making a figure with Stata

Last lab session will be dedicated to working on the Final Project

* - Labs 3 and 5 are significantly longer and harder than the others

Page 15

Overview of labs, cont

• Official In-Person Lab time is 3:00-4:00 on Tuesday, but we will start right after lecture, and you can leave when you are done.

Page 16

Overview of labs, cont

• Labs are due the following week prior to lecture. Labs turned in late (less than 1 week) will receive only half credit; after that, no points will be awarded. However, ALL labs must be turned in to pass the class (even if no points are awarded).

• Lab 1 is paper

• Labs 2-6 are electronic files, and should be emailed to your section leader’s course email address: [email protected] (Elizabeth/Raman) or [email protected] (David/Yvette)

Page 17

Final Project

• Create a Table and a Figure using your own data, and document the analysis using Stata.

• Due 1 week after the last lab session; 20 points docked for each day late.

Page 18

Course Materials

• Online Syllabus (http://www.epibiostat.ucsf.edu/courses/schedule/biostat212.html)

– Lectures and Labs/Datasets (“just in time”)

– Miscellaneous handouts

– Final Project

Page 19

Getting started with STATA

Session 1

Page 20

Types of software packages used in clinical research

• Statistical analysis packages

• Spreadsheets

• Database programs

• Custom applications

– Cost-effectiveness analysis (TreeAge, etc.)

– Survey analysis (SUDAAN, etc.)

Page 21

Software packages for analyzing data

• STATA

• SAS

• S-plus, and R

• SPSS

• SUDAAN

• Epi-Info

• JMP

• MATLAB

• StatXact

Page 22

Why use STATA?

• Quick start, user friendly

• Immediate results, response

• You can look at the data

• Menu-driven option

• Good graphics

• Log and do files

• Good manuals, help menu

Page 23

Why NOT use STATA?

• SAS is used more often?

• SAS does some things STATA does not

• Programming easier with S-plus and R?

• R is free

• Complicated data structure and manipulation easier with SAS?

• Epi-info is free and even easier than STATA?

Page 24

STATA – Basic functionality

• Holds data for you

– Stata holds 1 “flat” file dataset only (.dta file)

• Listens to what you want

– Type a command, press enter

• Does stuff

– Statistics, data manipulation, etc.

• Shows you the results

– Results window
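
The four steps on this slide can be sketched as a very short session. This is only an illustration: it assumes the `auto` example dataset that ships with Stata as a stand-in, since the course dataset is not named here.

```stata
* Holds data: load one dataset into memory
* (auto is Stata's bundled example data, used here as a stand-in)
sysuse auto, clear

* Listens and does stuff: type a command, press Enter
summarize mpg

* Shows results: output goes to the Results window, and the
* command also leaves its results behind in r()
display "Mean mpg = " r(mean)
```

Loading a second dataset replaces the first, which is why `clear` is included.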

Page 25

Demo #1

• Open the program

• Entering vs. loading data

• Look at data

• Run a command

• Orient to windows and buttons

Page 26

STATA - Windows

• Two basic windows

– Command

– Results

• Optional windows

– Variable list

– Properties

– History of commands

• Other functions

– Data browser/editor

– Variables Manager

– Do file editor

– Viewer (for log, help files, etc)

Page 27

STATA - Buttons

• The usual – open, save, print

• Log-file open/suspend/close

• Do-file editor

• Browse and Edit

• Break

Page 28

STATA - Menus

• Almost every command can be accessed via menu

Page 29

Menu vs. Command line

• Menu advantages

– Look for commands you don’t know about

– See the options for each command

– Complex commands easier – learn syntax

• Command line advantages

– Faster (if you know the command!)

– “Closer” to the program

– Only way to write “do” files

• Document and repeat analyses

Page 30

Demo #2

• Load a STATA dataset

• Explore the data

• Describe the data

• Answer some simple research questions

– Gender, BMI, blood pressure

Page 31

STATA commands

Describing your data

• describe [varlist]

– Displays variable names, types, labels

• list [varlist]

– Displays the values of all observations

• codebook [varlist]

– Displays labels and codes for all variables
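
As a rough illustration of these three commands, the lines below use Stata's bundled `auto` dataset. The dataset and its variable names (`make`, `price`, `mpg`, `foreign`, `rep78`) are stand-ins, not the course data.

```stata
* Stand-in data: Stata's bundled example dataset
sysuse auto, clear

* Variable names, storage types, and labels for every variable
describe

* The same, restricted to a varlist
describe price mpg

* Values of selected variables, first five observations only
list make price mpg in 1/5

* Codes, value labels, and missing-value counts
codebook foreign rep78
```

Restricting `list` with `in 1/5` keeps the output short; without it, every observation is printed.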

Page 32

STATA commands

Descriptive statistics – continuous data

• summarize [varlist] [, detail]

– # obs, mean, SD, range

– “, detail” gets you more detail (median, etc.)

• ci [varlist]

– Mean, standard error of mean, and confidence intervals

– Actually works for dichotomous variables, too.
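
A short sketch of both commands, again using the bundled `auto` dataset as a stand-in. The `version 12` line is an assumption made to match the Stata 12 server listed at the end of these slides: later releases changed `ci` so that the same requests are written `ci means mpg` and `ci proportions foreign`.

```stata
* Interpret the commands below as Stata 12 would (assumption: matches
* the course's Stata 12 server; newer releases use "ci means" instead)
version 12

* Stand-in data: Stata's bundled example dataset
sysuse auto, clear

* Number of observations, mean, SD, min, and max
summarize price mpg

* Adds percentiles (including the median), skewness, and kurtosis
summarize price, detail

* Mean, standard error, and 95% confidence interval
ci mpg

* Dichotomous (0/1) variable: exact binomial interval for the proportion
ci foreign, binomial
```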

Page 33

STATA commands

Graphical exploration – continuous data

• histogram varname

– Simple histogram of your variable

• graph box varlist

– Box plot of your variable

• qnorm varname

– Quantile plot of your variable to check normality
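
The three plots might be produced as follows. The `auto` dataset and the variables `mpg` and `foreign` are stand-ins chosen for illustration.

```stata
* Stand-in data: Stata's bundled example dataset
sysuse auto, clear

* Histogram of one continuous variable
histogram mpg

* Box plot of the same variable
graph box mpg

* Side-by-side box plots, one per category of foreign
graph box mpg, over(foreign)

* Quantiles of mpg plotted against quantiles of a normal distribution
qnorm mpg
```

Each new graph replaces the previous one in the Graph window unless it is given its own name.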

Page 34

STATA commands

Descriptive statistics – categorical data

• tabulate [varname]

– Counts and percentages

– (See also table; this is very different!)
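
A minimal sketch of a one-way tabulation, assuming the bundled `auto` dataset as a stand-in for the course data.

```stata
* Stand-in data: Stata's bundled example dataset
sysuse auto, clear

* Counts, percentages, and cumulative percentages for each category
tabulate foreign

* Same, but missing values get their own row instead of being dropped
tabulate rep78, missing
```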

Page 36

STATA commands

Analytic statistics – 2 categorical variables

• tabulate [var1] [var2]

– “Cross-tab”

– Descriptive options: , row (row percentages) and , col (column percentages)

– Statistics options: , chi2 (chi-squared test) and , exact (Fisher’s exact test)
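
The options on this slide can be combined after a single comma. A sketch, using `rep78` and `foreign` from the bundled `auto` dataset as stand-in variables:

```stata
* Stand-in data: Stata's bundled example dataset
sysuse auto, clear

* Plain cross-tab: counts only
tabulate rep78 foreign

* Add row and column percentages to each cell
tabulate rep78 foreign, row col

* Add the chi-squared test and Fisher's exact test
tabulate rep78 foreign, chi2 exact
```

Observations with a missing value in either variable are left out of the table by default.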

Page 37

Getting help

• Try to find the command on the pull-down menus

• Help menu

– If you don’t know the command - Search...

– If you know the command - Stata command...

• Try the manuals

– More detail, theoretical underpinnings, etc.

Page 39

STATA commands

Analytic statistics – 1 categorical, 1 continuous

• bysort catvar: summarize [contvar]

– Mean, SD, range of contvar within each subgroup of catvar

• ttest [contvar], by(catvar)

– t-test

• oneway [contvar] [catvar]

– ANOVA

• table [catvar] [, contents(mean [contvar] …)]

– Table of statistics
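
A sketch of the four commands, with `mpg` as the continuous variable and `foreign` or `rep78` as the categorical one (all stand-ins from the bundled `auto` dataset). The `version 12` line is an assumption made to match the Stata 12 server listed at the end of these slides; the `table` command was rewritten in a later release, and the `contents()` form shown on the slide is the older syntax.

```stata
* Interpret the commands below as Stata 12 would (assumption: matches
* the course's Stata 12 server; table was rewritten in later releases)
version 12

* Stand-in data: Stata's bundled example dataset
sysuse auto, clear

* Summary statistics of mpg within each category of foreign
bysort foreign: summarize mpg

* Two-sample t-test comparing mean mpg between the two groups
ttest mpg, by(foreign)

* One-way ANOVA of mpg across the categories of rep78
oneway mpg rep78

* Table of chosen statistics, one row per category
table foreign, contents(mean mpg sd mpg n mpg)
```

Note the argument order for `oneway`: the continuous outcome comes first, then the grouping variable.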

Page 41

STATA commands

Analytic statistics – 2 continuous

• scatter [var1] [var2]

– Scatterplot of the two variables

• pwcorr [varlist] [, sig]

– Pairwise correlations between variables

– “sig” option gives p-values

• spearman [varlist] [, stats(rho p)]

– Spearman rank correlation; “stats(rho p)” shows the coefficient and its p-value
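
These commands might be used as follows. The variables `mpg`, `weight`, and `price` come from the bundled `auto` dataset and are stand-ins for the course data.

```stata
* Stand-in data: Stata's bundled example dataset
sysuse auto, clear

* Scatterplot: first variable on the y axis, second on the x axis
scatter mpg weight

* Pairwise Pearson correlations, with a p-value under each coefficient
pwcorr mpg weight price, sig

* Spearman rank correlation and its p-value
spearman mpg weight, stats(rho p)
```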

Page 42

In Lab Today…

• Expect some chaos!

– IT will be here to help with wireless, logins, etc.

• Familiarize yourself with Stata

• Load a dataset

• Use Stata commands to analyze data and fill in the blanks

Page 43

Next week

• Do files, log files, and workflow in Stata

• Find a dataset!

Page 44

Website addresses

• Course website

– http://www.epibiostat.ucsf.edu/courses/schedule/biostat212.html

• Computing information

– http://www.epibiostat.ucsf.edu/courses/ChinaBasinLocation.html#computing

• Download RDP for Macs (for Stata Server)

– http://www.microsoft.com/mac/remote-desktop-client

• Citrix Web Server

– http://apps.epi-ucsf.org/

• Stata 12 Server

– 65.175.48.75