Arthritis Research UK Epidemiology Unit University of...

35
Introduction to the Stata Language, Part 2 Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 19/12/2017

Transcript of Arthritis Research UK Epidemiology Unit University of...

Page 1: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Introduction to the Stata Language, Part 2

Mark Lunt

Arthritis Research UK Epidemiology UnitUniversity of Manchester

19/12/2017

Page 2: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Summary

GraphicsSummarizing DataMore Stata SyntaxLoopingReshaping DataOther Useful Commands

Page 3: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Graphics

Scatter plotsLabellingOverlaying plotsSchemesSaving & Exporting

Page 4: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Scatter Plots

twoway scatter mpg weight

1020

3040

Mile

age

(mpg

)

2,000 3,000 4,000 5,000Weight (lbs.)

Page 5: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Labelling

Titles title(), subtitle(), note(),caption()

Axis names xtitle, ytitleTick marks xlabel, ylabel

Page 6: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Overlaying Graphs

twoway (scatter mpg weight) (scatter lengthweight, yaxis(2))

140

160

180

200

220

240

Leng

th (

in.)

1020

3040

Mile

age

(mpg

)

2,000 3,000 4,000 5,000Weight (lbs.)

Mileage (mpg) Length (in.)

Page 7: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

twoway lfitci mpg weight || scatter mpg weight

1020

3040

95%

CI/F

itted

val

ues/

Mile

age

(mpg

)

2000 3000 4000 5000Weight (lbs.)

95% CI Fitted valuesMileage (mpg)

Page 8: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Schemes

Can change appearance of graph:Line thicknessColour or B/WText size

Ideal for journal is not ideal for slides9 Schemes provided with stataCan write your own by modifying existing onesset scheme scheme_name, [permanently]Option scheme(scheme_name)

Page 9: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

10

20

30

40

95%

CI/F

itted

val

ues/

Mile

age

(mpg

)

2000 3000 4000 5000Weight (lbs.)

95% CI Fitted values Mileage (mpg)

Page 10: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Saving Graphs

Save graphs in stata format with graph save

Save graphs in other formats with graph export

Format used defined byFilename suffixOption as() to graph export

Use help graph export to find out formats available toyou (depends on version and OS).

Page 11: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Naming graphs

By default, every graph called “Graph”Can store files in memory by renaming:

Option name() to graph commandsgraph rename Graph newname

Recall with graph display name

Can display multiple graphs as the same time if they havedifferent names

Page 12: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Other Graph Types

graph bar Bar chartsgraph box Box and whisker plotsgraph matrix Given n variables, creates an n by n matrix of

scatterplots, plotting every variable against everyother variable.

twoway histogram Histogramstwoway rcap Given two y -values for each x-value, plots a

line between the two y -values, with “caps” at eachend. Useful for showing confidence intervals ifoverlaid.

Page 13: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Other Graph Types

twoway lfit[ci] Linear regression fit to a scatter plottwoway qfit[ci] Quadratic regression fit to a scatter plottwoway fpfit[ci] Fractional polynomial fit to a scatter plottwoway lowess Nonparametric smoothed fit to a scatter plot

1020

3040

2000 3000 4000 5000Weight (lbs.)

Fitted values Mileage (mpg)

lfit

1020

3040

2000 3000 4000 5000Weight (lbs.)

Fitted values Mileage (mpg)

qfit

1020

3040

2,000 3,000 4,000 5,000Weight (lbs.)

predicted mpg Mileage (mpg)

fpfit

1020

3040

2,000 3,000 4,000 5,000Weight (lbs.)

lowess mpg weight Mileage (mpg)

lowess

Page 14: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Kernel Density

0.0

0.1

0.2

0.3

0.4

Pro

port

ion

of S

ubje

cts

−10 −5 0 5 10Linear predictor of Propensity Score

Page 15: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Summarizing Data

describecodebooksummarizetabulate

Page 16: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

describe

describe [varlist]

Number of observations and variablesFor each variable

NameTypeFormatLabels

Page 17: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

codebook

More detail on each variable:All variables: type, range, unique values, missing values,unitsContinuous vars: mean, SD, percentilesCategorical vars: frequency table / sample values

Page 18: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

summarize

summarize [varlist]

Gives mean, SD, min, max, non-missing valuesOption detail gives fuller summary

summarize price mpg headroom trunk

Variable | Obs Mean Std. Dev. Min Max-------------+--------------------------------------------------------

price | 74 6165.257 2949.496 3291 15906mpg | 74 21.2973 5.785503 12 41

headroom | 74 2.993243 .8459948 1.5 5trunk | 74 13.75676 4.277404 5 23

Page 19: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

tabulate

tabulate variable gives a frequency tabletabulate var1 var2 give a cross-tabulationOption ro and co give row and column percentagesrespectivelyOption chi2 gives χ2-test.

Page 20: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

More Stata Syntax

[by varlist]: command varlist [ifexpression][, options]

by repeats an analysis for each subgroupif selects a single subgroup to analyse.

Page 21: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Logical Operators

Operator Meaning& and| or== equal∼=, != not equal< less than<= less than or equal> greater than>= greater than or equal

Page 22: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Missing Values

Missing values are bigger than any “real” valueUsing variables in logical expressions is dangerous ifmissing values existE.g. (price > 15000) is true if price is missing.gen hi_price = price > 15000 if price < .

Be very careful when categorising continuous variables.

Page 23: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

The by varlist clause

Produces results for each subgroup defined by varlistseparatelyData needs to be sorted for by to workCommand bysort will do it for youCan replace a lot of if clausesComplex expression can only be used with if

Does not work with every command

Page 24: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Subscripting

Square brackets ([]) after a variable name used pick out anobservation by its numberweight[7] means the weight of the seventh observation_n means the number of the current observation_N means the number of observations in the data (or bygroup)

Page 25: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Lagged Variables

varname[_n - 1] means the value of the variablevarname in the previous observationbysort idno (fupno): replace haq = haq[_n- 1] if haq == .

bysort idno (fupno): gen diff = haq -haq[_n-1]

Page 26: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Looping

foreach macname in list {list of stata commands

}

Opening { must be on first lineCommand(s) must start on next lineFinal } must have its own line

Page 27: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Other forms of foreach

foreach var of varlist . . .

foreach var of newlist . . .

foreach num of numlist . . .

Page 28: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Examples of foreach

foreach visit in 1 2 {summarize bp if when == ‘visit’

} label define yesno 0 "No" 1 "Yes"

foreach x of varlist *_pain {label values ‘x’ yesno

}

Page 29: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Reshaping Data

Long to wide: very easyWide to long: slightly trickierLong form more efficient for storage: only need space forfollowups that existLong form also normally best for analysis

Page 30: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Long Form

ID Gender Anniversary Score900108 1 1 7900108 1 2 15900108 1 5 19900113 2 1 0900113 2 2 18900114 1 1 0900114 1 2 0

Page 31: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Long to wide

Need to specify:Unique identifier which shows which observations belongtogether: idWhich “repeat” a given observation corresponds to:anniversary

Which variables change between visits: scorereshape wide score, i(id) j(anniversary)

Page 32: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Wide Form

ID Gender Score1 Score2 Score5900108 1 7 15 19900113 2 0 18 .900114 1 0 0 .

Page 33: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Wide to long

Need to specify:Unique identifier which shows which observations belongtogether: idThe name of a new variable to contain “repeat” info:anniversary

Which variables are in wide form: scoreIf suffixes are strings, need to use the string option.

reshape long score, i(id) j(anniversary)

Page 34: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Other Useful Commands

display Make things appear in the results window.Can be used as a calculator

expand Produce multiple copies of each observationcmdlog Make a do-file of all the commands you are entering.

Page 35: Arthritis Research UK Epidemiology Unit University of ...personalpages.manchester.ac.uk/staff/mark.lunt/stats/11_Stata_2/...Other Graph Types twoway lfit[ci] Linear regression fit

Expand

Exposed Cases ControlsNo 20 40Yes 30 10

exposed case frequency0 0 400 1 201 0 101 1 30