Advanced Stata Workshop
FHSS Research Support Center
Presentation Layout
• Visualization and Graphing
• Macros and Looping
• Panel and Survey Data
• Postestimation
Visualization and Graphing in Stata
[Figure: Life expectancy at birth vs. GNP per capita. Source: 1998 data from The World Bank Group]
Intro To Graphing In Stata
[Figure: scatter plot of Mileage (mpg) vs. Price]
“graph” is often optional. So is “twoway” in this case.
. sysuse auto, clear
. graph twoway scatter mpg weight //Note that you don't need to type graph or twoway
. scatter mpg weight
Note: Nearly all graphing commands start with “graph”, and “twoway” is a large family of graphs.
Creating Multiple Graphs with “by():”
. twoway scatter mpg weight, by(foreign)
[Figure: Mileage (mpg) vs. Weight (lbs.) in two panels, Domestic and Foreign, with the note "Graphs by Car type"]
Note that the value labels of foreign (Domestic, Foreign) are displayed above each panel, and the variable label (Car type) is displayed in the note at the bottom of the graph.
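The by() option takes its own suboptions. Here is a minimal sketch with the auto data: total adds a panel that pools all observations, and note("") suppresses the "Graphs by Car type" note. The graph name by_total is just an illustrative choice.

```stata
* Sketch: by() suboptions, using the auto data that ships with Stata
sysuse auto, clear

* total adds a third panel that pools Domestic and Foreign cars
* note("") suppresses the default "Graphs by Car type" note
twoway scatter mpg weight, by(foreign, total note("")) name(by_total, replace)
```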
Overlaying “twoway” graphs
The || tells Stata to put the second graph on top of the first one – order matters! You don’t need to type “twoway” twice; it applies to both.
. twoway scatter mpg weight || lfit mpg weight
[Figure: scatter of Mileage (mpg) vs. Weight (lbs.) with a linear fit line; legend: Mileage (mpg), Fitted values]
. twoway (scatter mpg weight) (lfit mpg weight)
[Figure: the same graph as the previous command]
This is another way of writing the command – it doesn’t matter which one you use.
"by()" statements with overlaid graphs
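“qfitci” is a type of graph that plots the prediction line from a quadratic regression and adds a confidence interval. The “stdf” option specifies that the confidence interval be created on the basis of the standard error of the forecast rather than the standard error of the mean. The resulting band therefore shows where individual observations are expected to fall, not only where the fitted mean lies.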
. twoway (qfitci mpg weight, stdf) (scatter mpg weight), by(foreign)
[Figure: quadratic fit with 95% CI overlaid on a scatter of Mileage (mpg) vs. Weight (lbs.), in Domestic and Foreign panels ("Graphs by Car type")]
stdf is an option of qfitci. by(foreign) is an option of twoway.
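To see what stdf changes, you can draw the band both ways and compare them side by side. This is a sketch using the auto data. It uses graph combine to place the two graphs next to each other. The graph names (ci_mean, ci_stdf, ci_compare) are arbitrary.

```stata
* Sketch: default band vs. stdf band for a quadratic fit
sysuse auto, clear

* Default: confidence interval for the fitted (mean) mpg
twoway (qfitci mpg weight) (scatter mpg weight), ///
    title("Default") legend(off) name(ci_mean, replace)

* stdf: interval based on the standard error of the forecast,
* so it is wider and covers where individual cars may fall
twoway (qfitci mpg weight, stdf) (scatter mpg weight), ///
    title("stdf") legend(off) name(ci_stdf, replace)

* Side by side on a common y scale
graph combine ci_mean ci_stdf, ycommon name(ci_compare, replace)
```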
"by()" statements with overlaid graphs
Another way of writing the previous command is:
. twoway qfitci mpg weight, stdf || scatter mpg weight ||, by(foreign)
So: the parenthesized version is easier to read, and the || version is easier to type.
Graphs with Many Options and Overlays
You can make pretty impressive graphs just from code if you overlay graphs and specify options such as multiple axes, notes, titles and subtitles, axis titles and labels, and legends.
Code for Previous Graph
. use http://www.stata-press.com/data/r12/uslifeexp, clear
. generate diff = le_wm - le_bm
. label var diff "Difference"
. #delimit ;
. twoway line le_wm year, yaxis(1 2) xaxis(1 2)
>   || line le_bm year
>   || line diff year
>   || lfit diff year
>   ||,
>   ytitle( "", axis(2) )
>   xtitle( "", axis(2) )
>   xlabel( 1918, axis(2) )
>   ylabel( 0(5)20, axis(2) grid gmin angle(horizontal) )
>   ylabel( 0 20(10)80, gmax angle(horizontal) )
>   ytitle( "Life expectancy at birth (years)" )
>   title( "White and black life expectancy" )
>   subtitle( "USA, 1900-1999" )
>   note( "Source: National Vital Statistics, Vol 50, No. 6"
>         "(1918 dip caused by 1918 Influenza Pandemic)" )
>   legend( label(1 "White males") label(2 "Black males") );
. #delimit cr
This may look scary, but it is actually fairly straightforward. See the accompanying do-file for an explanation of each component.
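If you would rather not switch delimiters, you can write the same command with /// line continuations instead. These work in do-files but not in the Command window. The sketch below should draw the same graph. It loads the dataset from the Stata Press website, so it needs internet access. name(lifeexp, replace) is an addition for convenience.

```stata
* Sketch: the same graph using /// continuation instead of #delimit ;
use http://www.stata-press.com/data/r12/uslifeexp, clear
generate diff = le_wm - le_bm
label var diff "Difference"

twoway line le_wm year, yaxis(1 2) xaxis(1 2)               ///
    || line le_bm year                                      ///
    || line diff year                                       ///
    || lfit diff year                                       ///
    ||,                                                     ///
    ytitle("", axis(2))                                     ///
    xtitle("", axis(2))                                     ///
    xlabel(1918, axis(2))                                   ///
    ylabel(0(5)20, axis(2) grid gmin angle(horizontal))     ///
    ylabel(0 20(10)80, gmax angle(horizontal))              ///
    ytitle("Life expectancy at birth (years)")              ///
    title("White and black life expectancy")                ///
    subtitle("USA, 1900-1999")                              ///
    note("Source: National Vital Statistics, Vol 50, No. 6" ///
         "(1918 dip caused by 1918 Influenza Pandemic)")    ///
    legend(label(1 "White males") label(2 "Black males"))   ///
    name(lifeexp, replace)
```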
[Graph 1: tsline of NASDAQ Composite Index and ABC.com, Inc. share price vs. date, 01oct2009 to 01jul2010, default formatting]
Using the Graph Editor
. tsline nci abc
It is often easier to make changes in the graph editor than to specify all the options in code.
Let’s make graph 1 into graph 2 by using the graph editor tools.
[Graph 2: "ABC.com Inc. Closing Share Price vs. Nasdaq Composite Index", subtitle "Sep 24, 2009 - June 7, 2010"; y-axis "Share Price (USD)" from 0 to 16; x-axis labeled Oct 1, 2009 through Jun 1, 2010; legend: NASDAQ Composite Index, ABC.com, Inc. share price; caption "Source: CRSP, Bloomberg"]
Recording Edits in the Graph Editor
Graph Element: Change
Graph Title: enter title, using quotes to separate lines; color = black
Graph Subtitle: enter subtitle
Graph Region: color = bluish gray
Y-Axis: range = 0 to 16 by 2; axis line = medium thick; add title; label angle = horizontal; grid lines = off
X-Axis: title = off; minor ticks = off; suggest # of ticks = 8; alternate spacing of adjacent labels = on; change label format; label size = small; axis line = medium thick
Plot 1: line color = green; width = thick
Plot 2: line color = blue; width = thick
Caption: add caption
Before you start making changes, click the record button. After you are done, click it again, and save your changes as a recording so you can “play” them back later. We will save this recording as advanced_workshop_1.
Play Your Graph Recording
You can create a graph, open the graph editor, click the green play button, and then play back your recorded edits.
Or, you can play your edits right from the code:
. tsline nci abc, play(advanced_workshop_1)
You can also run all of your recorded edits on a different graph, and then just change the title:
[Figure: Computer World and Computer Planet share prices drawn with the same recorded edits; the title still reads "ABC.com Inc. Closing Share Price vs. Nasdaq Composite Index"]
. tsline comp_world comp_planet , play(advanced_workshop_1)
You can run your recorded edits on a graph of a different type, though in this case not all of your edits will make sense:
[Figure: the same recorded edits applied to a twoway scatter of NASDAQ Composite Index and ABC.com, Inc. share price vs. date]
. twoway (scatter nci date) (scatter abc date) ///
> , play(advanced_workshop_1)
Storing and Moving Your Recordings
Graph recordings are stored as .grec files in your “personal” folder, under the “grec” folder. Type “personal” to see where this is; normally it is C:\ado\personal. So by default Stata should store your .grec files in C:\ado\personal\grec.
. personal
your personal ado-directory is c:\ado\personal\
. dir c:\ado\personal\grec\
   1.3k  11/21/12 10:12  x grid.grec
   0.9k   5/17/12 15:47  line..grec
   0.7k   3/01/12  9:48  jeff_test_recording_graph_edits.grec
   0.4k   2/21/13  9:12  advanced_workshop_1.grec
Unfortunately, if you are not faculty, you are probably running Stata on lab computers. When those computers are re-imaged, you will lose the files in your grec folder. To avoid this, you can store your recordings on a flash drive by clicking the Browse button when you save a recording. The catch is that your recording will not appear in the list when you click the play button in the graph editor, because it is not stored where Stata looks for recordings. Never fear: just click Browse and navigate to your .grec file. If you want your recording to be available right from code, as in play(advanced_workshop_1), you have two options. You can move the recording (at least temporarily) to the “grec” folder, or you can write the directory location in the code: play(E:\flashdrive\Graph Recordings\advanced_workshop_1)
Using Schemes in Graphing
Recordings are great if you are going to be making the same kind of graph a lot. But a recording for a scatter plot will hardly affect a histogram at all, and it might even make the histogram look terrible. If you want to change the look of all the graphs you make, you may want to make a scheme. Schemes are text files that tell Stata how to draw graphs.
[Figure: scatter of life expectancy vs. Year, 1900-1940, drawn first with the default scheme and then with scheme(economist)]
. sysuse uslifeexp2, clear
. scatter le year
. scatter le year, scheme(economist)
More on Schemes
. graph query, schemes
Available schemes are
    s2color    see help scheme_s2color
    s2mono     see help scheme_s2mono
    s2manual   see help scheme_s2manual
    s2gmanual
    s2gcolor
    s1color    see help scheme_s1color
    s1mono     see help scheme_s1mono
    s1rcolor   see help scheme_s1rcolor
    s1manual   see help scheme_s1manual
    sj         see help scheme_sj
    economist  see help scheme_economist
Schemes are very powerful because they let you implement a certain look without specifying a long series of options in every graph or running every graph through the graph editor. However, creating schemes is fairly time-consuming.
For more on creating your own schemes, see:
http://www3.eeg.uminho.pt/economia/nipe/2010_Stata_UGM/papers/Rising.pdf
and http://www.ats.ucla.edu/stat/stata/seminars/stata_graph/graphsem.txt
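The scheme() option affects one graph at a time. To use an existing scheme for every graph in a session, use set scheme to change the default. The following is a minimal sketch. The local macro oldscheme is an illustrative name, used here to restore whatever scheme you started with.

```stata
* Sketch: make a scheme the default for every graph in this session
sysuse uslifeexp2, clear

* Remember the current default so it can be restored afterwards
local oldscheme "`c(scheme)'"

set scheme economist
scatter le year, name(econ, replace)           // drawn with economist

* scheme() on an individual graph still overrides the default
scatter le year, scheme(s1mono) name(mono, replace)

* Put the original default back. (Adding ", permanently" to
* set scheme would make a choice persist across sessions.)
set scheme `oldscheme'
```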
Manipulating Graphs: Memory vs. Disk
When you draw a graph, it is stored in memory under the name Graph.
If you draw another graph, it replaces the previous one in memory, and is now called Graph.
If you want to have multiple graphs up at the same time, you can use the name option.
graph save writes a copy of your graph from memory to disk as a .gph file; the graph also stays in memory.
graph dir lists all graphs in memory and on disk (in the current directory)
graph drop drops a graph from memory. Graphs contain the data they plot, so if the dataset is large, they can take up quite a bit of memory.
. sysuse auto, clear
. scatter price mpg
. scatter price length
. scatter price mpg, name(scatter1)
. cd C:\Users\nickj22\Downloads\
. graph save scatter1 mygraph1.gph
. graph dir
    Graph  scatter1  mygraph1.gph
. graph drop scatter1
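The following do-file sketch walks through the same memory-versus-disk workflow with the auto data. Names like scatter2 and scatter1_copy are illustrative, and mygraph1.gph is written to the current working directory.

```stata
* Sketch: keep several graphs in memory, save one, reload it
sysuse auto, clear

* name(..., replace) lets you rerun the do-file without an error
* about the graph name already existing
scatter price mpg,    name(scatter1, replace)
scatter price length, name(scatter2, replace)

* Save a copy of scatter1 to disk (it also stays in memory)
graph save scatter1 mygraph1.gph, replace

graph dir                  // graphs in memory and .gph files on disk

* Clear every graph from memory, then load the saved copy back
graph drop _all
graph use mygraph1.gph, name(scatter1_copy)
```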
Manipulating Graphs Demo
See the do-file for the demo.
More Example Graphs
Note: Annotated code is in the do-file for all of these.
Histogram, with overlaid normal distribution
[Figure: histograms of Avg. education level (Percent) with an overlaid normal density, one panel per Census region (NE, N Cntrl, South, West; "Graphs by Census region"), and a single histogram of Avg. education level with labeled bars. Source: US Census, 1980 and 1990]
More Example Graphs
[Figure: "Average July and January temperatures by regions of the United States", bars in Degrees Fahrenheit for N.E., N. Central, South, and West, with July and January bars labeled with their values. Source: U.S. Census Bureau, U.S. Dept. of Commerce]
Use graph bar to make bar graphs
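The workshop's own code is in the do-file. As a sketch of a similar bar graph, the example below assumes the citytemp example dataset that ships with Stata, which has the variables tempjuly, tempjan, and region.

```stata
* Sketch: grouped bar chart of two means by region
sysuse citytemp, clear

graph bar (mean) tempjuly tempjan, over(region)                 ///
    blabel(bar, position(inside) format(%9.1f))                ///
    legend(label(1 "July") label(2 "January"))                 ///
    ytitle("Degrees Fahrenheit")                               ///
    title("Average July and January temperatures")             ///
    subtitle("by regions of the United States")                ///
    note("Source: U.S. Census Bureau, U.S. Dept. of Commerce") ///
    name(temps, replace)
```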
More Example Graphs
[Figure: "Life expectancy at birth vs. GNP per capita", a scatter of life expectancy against loggnp with histograms (Fraction) of each variable along the margins. Source: 1998 data from The World Bank Group]
Use graph combine to combine 3 graphs into one:
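The figure's own code is in the do-file. The sketch below builds the same layout with the auto data. Three graphs are kept in memory by name, and graph combine arranges them in a 2x2 grid with one empty cell. The graph names are illustrative.

```stata
* Sketch: scatter plot with marginal histograms via graph combine
sysuse auto, clear

* Three component graphs, kept in memory by name
scatter mpg weight, name(yx, replace)
twoway histogram mpg, fraction horizontal xscale(reverse) name(hy, replace)
twoway histogram weight, fraction yscale(reverse) name(hx, replace)

* 2x2 grid: hole(3) leaves the bottom-left cell empty, so the graphs
* fill cells 1, 2, and 4 in the order listed
graph combine hy yx hx, hole(3) name(yx_margins, replace)
```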
More Example Graphs
Graph matrix is a great alternative to a correlation matrix for investigating relationships between variables.
[Figure: "Correlations among 1998 life-expectancy data", a scatterplot matrix of Avg. annual % growth, Life expectancy at birth, Log GNP per capita, and safe water. Source: The World Bank Group]
More Example Graphs
Get data labels (called marker labels in Stata) from the values of another variable.
[Figure: "Life expectancy vs. GNP per capita", North, Central, and South America; scatter of Life expectancy at birth (years) vs. GNP per capita (thousands of dollars), each point labeled with its country name. Data source: World Bank, 1998]
More Example Graphs
With a panel dataset, xtline can overlay a line for each value of the panel variable. As shown, though, the x-axis labels often need some adjustment at first.
[Figure: "Calories Consumed by Subject", Jan 1 2002 - Jan 1 2003; overlaid lines of Calories consumed vs. Date for Tess, Sam, and Arnold, with x-axis labels 01jan2002 through 01jan2003]