Introduction to the Stata Language
Mark Lunt
Arthritis Research UK Epidemiology Unit
University of Manchester
02/10/2018
Topics Covered Today
- Getting help
- Stata windows
- Basic concepts
- Manipulation of variables
- Manipulation of datasets
Command-line vs. Point-and-Click
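- Command-line requires more initial learning than point-and-click
- Commands must be entered exactly correctly
- Only option for any serious work:
  1. Reproducible
  2. Editable
  3. More efficient
- Some commands can be written more efficiently via point-and-click: the equivalent command still appears in the Results and Review windows, so it can be copied into a do-file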
Getting Help
- Help
- Manuals
- Search
- Stata website
- Statalist
- Stata Journal
- Me
Stata Windows
- 2 must exist:
  - Results
  - Command
- 2 others usually exist:
  - Review
  - Variables
- Others can exist (data editor, graph, do-file editor, help/log viewer)
Command Window: Syntax
command [varlist] [, options]
- Roman letters: entered exactly
- Italic letters: replaced by some text you enter
- Square brackets: that item is optional
- The example above means:
  - The command is called "command"
  - The command name may be followed by a list of variables
  - Options may follow a comma
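Here is a minimal sketch of the three forms of the syntax. It uses the auto dataset that ships with Stata and is loaded with sysuse; the dataset and the variables price and mpg are illustrative choices, not part of the original slides.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* command alone: summarize every variable
summarize

* command varlist: summarize only price and mpg
summarize price mpg

* command varlist, options: detail adds percentiles, skewness, etc.
summarize price mpg, detail
```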
Command Window
- Can navigate through previous commands with PageUp and PageDown
- Pressing the Tab key completes a variable name as far as possible
- Case-sensitive: height and HEIGHT are different variables
- Syntax must be exact (although abbreviations are possible)
- Only one comma, before all options
- A space before an opening parenthesis (e.g. level (90) instead of level(90)) used to be the most common error; since Stata 12 it is accepted
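This illustrative sketch shows abbreviations and the single comma, again using the auto data. Here su and d are abbreviations of summarize and its detail option. level() and noheader are two options of regress, and both go after the one comma.

```stata
sysuse auto, clear

* Full command and option names...
summarize price, detail
* ...and the same command abbreviated
su price, d

* Several options, all after a single comma
regress price mpg weight, level(90) noheader
```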
Variables window
- List of all variables in the current dataset
- Clicking adds the variable name to the command window
- May contain the variable's label, if one has been defined
Review Window
- List of commands entered this session
- Clicking on a command puts it in the command window
- Double-clicking runs the command
- Can be saved as a do-file
Results Window
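- Limited size: use a log file to preserve results
- Blue text = clickable link
- When the window fills, output pauses at --more--; scrolling is then controlled by the Return (one more line), Space (one more page) and q (stop the output) keys
- set more [on | off] turns this pausing on or off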
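For example, this sketch stops long output from pausing at --more--, which is useful at the top of a do-file that you want to run unattended.

```stata
* Turn off the --more-- pause for the rest of the session
set more off

* Optionally, make the setting stick across sessions:
* set more off, permanently
```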
Basic Concepts
- Do-files
- Log files
- Interaction with Operating System
- Macros
- Variable and number lists
Do-Files
- A list of commands
- Can be run from Stata with the command do "do-file.do"
- All data manipulation and analysis should be done using a do-file:
  - Perfectly reproducible
  - Can see exactly what was done
  - Easy to modify
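A minimal do-file might look like the sketch below. The file name analysis.do, the version line and the use of the auto data are assumptions for illustration. Saved as analysis.do, the file would be run with do "analysis.do".

```stata
* analysis.do: every step of the analysis is recorded here
version 12                      // interpret commands as Stata 12 would
sysuse auto, clear              // load the data
generate lprice = log(price)    // create a new variable
summarize lprice                // analyse it
```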
Profile.do
- Stata looks for a file called profile.do every time it starts
- If it finds it, it runs it
- Useful for:
  - Setting memory (only needed before Stata 12, which manages memory automatically)
  - User-defined menus
  - Logging commands
- See help profilew for details
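This sketch shows what a profile.do might contain. The directory C:/Project and the file cmdlog.txt are hypothetical names. cmdlog is one way of logging commands: it records every command typed, but not the output.

```stata
* profile.do: run automatically every time Stata starts
* (C:/Project and cmdlog.txt are hypothetical names)
cd "C:/Project"
set more off
* Append this session's commands to a running command log
cmdlog using "C:/Project/cmdlog.txt", append
```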
Log Files
- Results window is of limited size: must log results
- Can use plain text or SMCL (Stata Markup and Control Language)
- Top of do-file should be:
  capture log close
  log using myfile.log, [append | replace] [text | smcl]
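This sketch shows the top and bottom of a do-file that logs its output; myanalysis.log is a hypothetical name. The capture prefix stops the do-file from failing when no log is open for log close to close.

```stata
* Close any log left open by an earlier run (capture ignores the error if none is open)
capture log close
* Start a plain-text log, overwriting any earlier version
log using "myanalysis.log", replace text

sysuse auto, clear
summarize price

* Stop logging at the end of the do-file
log close
```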
Interaction with Operating System
- cd: change directory
- pwd: display current directory
- mkdir: create directory
- dir: list files in current directory
- shell: run another program
- Can use either "/" or "\" in directory names; safer to use "/" (a "\" immediately before a local macro reference stops the macro from being expanded)
- Path names containing spaces must be surrounded by inverted commas
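Here is a sketch of the directory commands. It works inside Stata's temporary directory, c(tmpdir), so that it can be run anywhere. The folder name stata_demo is hypothetical.

```stata
* Show where we are now
pwd

* Move to Stata's temporary directory (quoted, in case the path contains spaces)
cd "`c(tmpdir)'"

* Create a folder (capture: no error if it already exists) and move into it
capture mkdir "stata_demo"
cd "stata_demo"

* Confirm the new location and list its contents
pwd
dir
```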
Macros
- Macro name is replaced by its definition text when the command is run
- Very useful for making do-files portable:
  - Directories used are defined first using macros
  - A change in location of data or do-files only means changing the macro definitions
Macro Example
Definition: global mymac C:/Project/Data
Use: use "$mymac/data"
Loads the file C:/Project/Data/data.dta (use adds the .dta extension when none is given)
Local vs. Global
- A global macro retains its definition until the end of the Stata session
- A local macro loses its definition at the end of the do-file in which it was defined

         Definition           Use
Global   global mymac defn    $mymac
Local    local mymac defn     `mymac'

Local vs Global macros
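This sketch contrasts the two kinds of macro. The directory C:/Project/Data and the macro names datadir and outcome are hypothetical. A local reference opens with a backtick (`) and closes with an ordinary apostrophe (').

```stata
* Global: defined once, e.g. at the top of a master do-file
global datadir "C:/Project/Data"

* Local: exists only within this do-file
local outcome "price"

sysuse auto, clear
summarize `outcome'                 // runs: summarize price
display "Data are in $datadir"      // prints: Data are in C:/Project/Data
* Changing datadir above would redirect every use "$datadir/..." in the file
```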
Variable Lists
- Shorthand for referring to a lot of variables
- prefix* means all variables beginning with prefix
- firstvar-lastvar means all variables in the dataset from firstvar to lastvar inclusive
- Type help varlist for more details
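This sketch uses the auto data, whose first variables are make, price, mpg and rep78, in that order. The unab command (which expands a varlist into full names) is used only to show what each shorthand stands for.

```stata
sysuse auto, clear

* prefix*: every variable whose name starts with m
unab mvars : m*
display "`mvars'"          // make mpg

* firstvar-lastvar: every variable from price to rep78, in dataset order
unab range : price-rep78
display "`range'"          // price mpg rep78

* Either form can be used wherever a varlist is allowed
summarize price-rep78
```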
Number Lists
Symbol     Meaning                                    Example      Expansion
           list of numbers                            1 2 3        1 2 3
x/y        whole numbers from x to y inclusive        1/5          1 2 3 4 5
x y to z   numbers from x to z, increasing by y − x   5 10 to 20   5 10 15 20
x y : z    same as x y to z                           5 10:20      5 10 15 20
x(y)z      numbers from x to z, increasing by y       10(10)50     10 20 30 40 50
x[y]z      same as x(y)z                              10[10]50     10 20 30 40 50

Number Lists
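The numlist command expands a number list and leaves the result in r(numlist), so it is a convenient way to check what a list means. Number lists are most often met in forvalues loops. A short sketch:

```stata
* Expand some of the number lists from the table
numlist "1/5"
display "`r(numlist)'"        // 1 2 3 4 5
numlist "5 10 to 20"
display "`r(numlist)'"        // 5 10 15 20

* A loop over 10 20 30 40 50
forvalues i = 10(10)50 {
    display `i'
}
```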
Manipulating Variables
- generate & replace
- egen
- Labelling
- Selecting variables
generate
- Used to create a new variable
- Syntax: generate [type] newvar = expression
  - newvar must not already exist
  - type, if present, defines the type of the data
  - expression defines the values, e.g.
    generate ltitre = log(titre)
    generate str6 head = substr(name, 1, 6)
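Here are the same two patterns applied to the auto data. The variables titre and name come from the slide's example; price and make below are substitutes taken from auto.

```stata
sysuse auto, clear

* Numeric: the log of price, stored as the default type (float)
generate lprice = log(price)

* String: first 3 characters of make, stored as str3
generate str3 make3 = substr(make, 1, 3)

* generate refuses to overwrite: this fails because lprice exists
capture generate lprice = 0
display _rc                 // 110: variable already defined
```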
Variable Types
Type     Size (bytes)  Min               Max              Precision   Missing
byte     1             -127              100              integers    .
int      2             -32,767           32,740           integers    .
long     4             -2,147,483,647    2,147,483,620    integers    .
∗float   4             ≈ -1.7 × 10^38    ≈ 1.7 × 10^38    7 digits    .
double   8             ≈ -10^308         ≈ 10^308         15 digits   .
strn     n                                                            ""
strL     varies                                                       ""
Available data types
∗float is the default type.
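The difference between float and double precision lies behind a classic surprise. A float variable holding 0.1 never equals the constant 0.1, which Stata evaluates in double precision. This sketch shows the problem and the usual fix of comparing with float().

```stata
clear
set obs 1
generate x = 0.1            // float (the default): 0.1 rounded to about 7 digits
generate double y = 0.1     // double: about 15 digits

count if x == 0.1           // 0: the constant 0.1 is held as a double
count if x == float(0.1)    // 1: compare at float precision instead
count if y == 0.1           // 1
```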
Missing Values
- Numerical variables can have several different missing values:
  - ., .a, .b, etc.
  - May be useful if you know why a variable is missing
  - if variable != . may not catch all missing values
- All missing values are greater than any number representable by that data type
- Can exclude all missing values with if variable < ., e.g.
  gen old = age > 65 if age < .
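This sketch uses a small made-up dataset to show the pitfall. The missing() function is shown as an equivalent alternative to "< .".

```stata
clear
input age
70
40
.
.a
end

* != . misses the extended missing value .a
count if age != .                  // 3
* < . (or the missing() function) catches every kind of missing value
count if age < .                   // 2
count if missing(age)              // 2

* Without the if, missing ages count as "over 65"
generate wrong = age > 65          // 1 for observations 1, 3 and 4
generate old = age > 65 if age < . // missing for observations 3 and 4
```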
replace
- Similar to generate
- Cannot specify a type (Stata promotes the storage type automatically if the new values need it)
- The variable must already exist
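A common pattern is to generate a default value and then replace it for a subset. This is a sketch using the auto data; the variable names expensive and small are hypothetical. It also shows that although replace cannot be given a type, Stata still promotes a variable's storage type when the new values do not fit.

```stata
sysuse auto, clear

* Start with 0 for every car...
generate expensive = 0
* ...then overwrite for the cars that qualify; "< ." keeps missing prices out
replace expensive = 1 if price > 10000 & price < .

* A byte cannot hold 1000, so replace promotes small from byte to int
generate byte small = 1
replace small = 1000 in 1
```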
egen
- Extended GENerate
- Has more functions available
- Users can write their own egen functions
- No ereplace: must drop the existing variable and create a new one
- Examples of its use in the practical
- See help egen for details
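This sketch shows two egen functions that generate lacks, plus the drop-and-recreate step that stands in for an ereplace. It uses the auto data; the new variable names are hypothetical.

```stata
sysuse auto, clear

* Mean price within each level of foreign
egen meanprice = mean(price), by(foreign)

* Number of missing values in each row, across rep78 and mpg
egen nmiss = rowmiss(rep78 mpg)

* No ereplace: to redefine meanprice (here in $1000s), drop it and recreate it
drop meanprice
egen meanprice = mean(price / 1000), by(foreign)
```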
Labelling
- Need to label the variables themselves
  - Shows exactly what the variable measures
- Need to label the values of a variable
  - Only for categorical variables
  - First define a label
  - Then assign it to a variable
  - Easier to assign the same label to a number of variables
Labelling a variable
Syntax: label variable varname "Description"
Example: label variable height "Height in m."
Labelling values
Syntax:
label define labelname 1 "string1" ...
label values varname labelname
Example:
label define yesno 0 "No" 1 "Yes"
label values question1 yesno
label values question2 yesno
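Here are both kinds of label applied to a small made-up dataset. The question wording is hypothetical. One value label, yesno, is defined once and attached to both variables.

```stata
clear
input question1 question2
0 1
1 1
1 0
end

* Variable labels say what each variable measures
label variable question1 "Ever smoked?"
label variable question2 "Ever drunk alcohol?"

* Define the value label once, then attach it to both variables
label define yesno 0 "No" 1 "Yes"
label values question1 yesno
label values question2 yesno

* Output now shows No/Yes instead of 0/1
tabulate question1
```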
Selecting variables
drop varlist
keep varlist
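This is a sketch with the auto data. keep names what survives, and drop names what goes. With if or in instead of a varlist, the same commands drop or keep observations instead.

```stata
sysuse auto, clear

* keep: every variable not listed is removed
keep make price mpg weight

* drop: only the listed variables are removed
drop weight

describe, short
```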
Manipulating Datasets
- use & save
- append
- merge
- browse and edit
- preserve and restore
use
- use "filename" reads a file into Stata
- If the data already in memory have been changed since they were last saved, need use "filename", clear
- Always put the filename in inverted commas
- Easier to use the menu or button bar
save
- save "filename" saves the current dataset as "filename"
- If "filename" already exists, need save "filename", replace
- The command saveold allows saving in the format of a previous version of Stata
- If you do not include a directory in filename, Stata will try to save it in the current directory
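This sketch shows the save, change, reload cycle with the auto data. The file name autocopy.dta is hypothetical and is written to the current directory.

```stata
sysuse auto, clear

* Save a copy in the current directory; replace allows overwriting
save "autocopy.dta", replace

* Make an unsaved change...
drop price
* ...so use needs clear to discard it and reload the saved copy
use "autocopy.dta", clear
```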
Combining Datasets
- append: more subjects, same variables
  append using filename
- merge: same subjects, more variables
  merge 1:1 identifier using filename
Appending Data: Example
ID  common_1  common_2  file1_1  file1_2
1   a1        b1        c1       d1
2   a2        b2        c2       d2
3   a3        b3        c3       d3
Appending Data: File 1
ID  common_1  common_2  file2_1  file2_2
4   a4        b4        e4       f4
5   a5        b5        e5       f5
6   a6        b6        e6       f6
Appending Data: File 2
ID  common_1  common_2  file1_1  file1_2  file2_1  file2_2
1   a1        b1        c1       d1       .        .
2   a2        b2        c2       d2       .        .
3   a3        b3        c3       d3       .        .
4   a4        b4        .        .        e4       f4
5   a5        b5        .        .        e5       f5
6   a6        b6        .        .        e6       f6
Appending Data: Combined Files
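The append example above can be reproduced with the sketch below, which types the two files in with input and holds file 2 in a temporary file. The string variables are stored as str2, and in Stata a missing string shows as blank rather than ".".

```stata
* File 2 from the slides, saved to a temporary file
clear
input ID str2 common_1 str2 common_2 str2 file2_1 str2 file2_2
4 a4 b4 e4 f4
5 a5 b5 e5 f5
6 a6 b6 e6 f6
end
tempfile file2
save "`file2'"

* File 1 from the slides, left in memory
clear
input ID str2 common_1 str2 common_2 str2 file1_1 str2 file1_2
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3
end

* Add file 2's observations underneath file 1's
append using "`file2'"
list
```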
Merging Data
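- Need an identifier (one or more variables on which to match observations)
- With the merge 1:1 syntax, Stata sorts both files by the identifier itself; the older merge syntax required both files to be sorted first
- All observations from both files are used (by default)
- Variable _merge says whether each observation was in the first file (1), the second file (2) or both (3)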
Merging Files: example
idno  var1  var2
1     a1    b1
2     a2    b2
3     a3    b3
Merging Data: File 1
idno  var3  var4
1     c1    d1
3     c3    d3
4     c4    d4
Merging Data: File 2
idno  var1  var2  var3  var4  _merge
1     a1    b1    c1    d1    3
2     a2    b2    .     .     1
3     a3    b3    c3    d3    3
4     .     .     c4    d4    2
Merging Data: Combined Files
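The merge example above can be reproduced with this sketch, which holds file 2 in a temporary file. No sort command is needed, because merge 1:1 sorts the data itself.

```stata
* File 2 from the slides, saved to a temporary file
clear
input idno str2 var3 str2 var4
1 c1 d1
3 c3 d3
4 c4 d4
end
tempfile file2
save "`file2'"

* File 1 from the slides is the dataset in memory
clear
input idno str2 var1 str2 var2
1 a1 b1
2 a2 b2
3 a3 b3
end

* One record per idno in both files, so 1:1
merge 1:1 idno using "`file2'"
list
```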
Ensuring Uniqueness
- Usually, there should only be one observation per unique identifier
- May not be the case (e.g. adding family-level data to individual-level data)
- If there should be one observation per identifier in both datasets, use the command merge 1:1; Stata stops with an error if the identifier is not unique
- If each record in the current dataset corresponds to several in the merged dataset, use merge 1:m
- Conversely, if several records in the current dataset correspond to one in the merged dataset, use merge m:1
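This sketch shows the family-level case, adding family data to individual data with merge m:1. The variable names famid, personid and faminc, and all values, are made up for illustration. isid checks that a set of variables uniquely identifies the observations and stops with an error if it does not.

```stata
* Family-level data: one record per family
clear
input famid faminc
1 20
2 35
end
isid famid                   // error if famid is not unique
tempfile families
save "`families'"

* Individual-level data: several people per family
clear
input famid personid age
1 1 40
1 2 38
2 1 55
2 2 20
2 3 18
end
isid famid personid

* Many individuals (in memory) match one family (using file): m:1
merge m:1 famid using "`families'"
```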
browse & edit
- Can open a data editor window with browse
- Can choose variables to browse with browse varlist
- Cannot modify data while browsing
- edit allows data to be changed: don't use it (changes made by hand are not in your do-file, so are not reproducible)
preserve & restore
- You may wish to change your data temporarily
  - E.g. collapse to means by group
- Type preserve before changing the data, restore afterwards
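This sketch uses the auto data to collapse to group means and then recover the original dataset. The choice of price and foreign is only illustrative.

```stata
sysuse auto, clear

preserve
* Temporarily replace the data by the mean price for each value of foreign
collapse (mean) price, by(foreign)
list
restore

* Back to the original 74 cars
describe, short
```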