On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models. Doug White. Argonne Lab contributors: Tom Uram, Lukasz Lacinski, and Rachana Ananthakrishnan. Implementing R code by Anthon Eff and mathematical modeling by Malcolm Dow. Models exemplified for Binford foragers (LRB), the comparison of variables from two separate Moral Gods models (SCCS), and Reincarnation beliefs (SCCS).


Page 1: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

Doug White
Argonne Lab contributors: Tom Uram, Lukasz Lacinski, and Rachana Ananthakrishnan
Implementing R code by Anthon Eff and mathematical modeling by Malcolm Dow

Models exemplified for Binford foragers (LRB), the comparison of variables from two separate Moral Gods models (SCCS), and Reincarnation beliefs (SCCS)

Page 2: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

Early version, menus: http://SocSciCompute.ss.uci.edu/

Page 3: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

Later version, with a new project YouTube video bearing instructions

Clicking the bolded option “DEf01D Dow Eff” brings up the menu for modeling

Page 4: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

Access to CoSSci
• We urge conferees to hear about sharing results from the YouTube talk at http://SocSciCompute.ss.uci.edu/ by
• LACINSKI, Łukasz (ARGONNE), How to share CoSSci histories
• and http://socscicompute.ss.uci.edu/history/list_published by the fall 2014 class taught by Ren Feng, Xiamen University
• … and the YouTube talks for the SASci session at http://SocSciCompute.ss.uci.edu/ by
– URAM Thomas, LACINSKI Łukasz, ANANTHAKRISHNAN Rachana (ARGONNE) and WILKINS-DIEHR Nancy (SDSC), Present and Future of High Performance Computing in Anthropology and the Social Sciences (this talk will be available by March 21)
– ROBERTS Wesley, Evolution of Religion

Page 5: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

Access to Data and Software
• The Dow-Eff functions are written in the R language and are available in an R workspace, which can be loaded in an R GUI such as RStudio. Based on ideas developed by Malcolm M. Dow and E. Anthon Eff, the functions estimate OLS, logit, and multinomial logit models, using multiple imputation to handle the problem of missing data and network lag terms to handle Galton’s Problem.
• Using R scripts with the SCCS
• Using R scripts with the LRB
• Using R scripts with the EA
• Using R scripts with the WNAI
• Using R scripts with the XC (merged 371 society dataset)

• Dow’s initial work on network lag models:
• Dow, M. M., Burton, M. L., White, D. R., & Reitz, K. (1984). Galton’s Problem as network autocorrelation. American Ethnologist, 11, 754-770.
• Dow, M. M. (2007). Galton’s Problem as multiple network autocorrelation effects. Cross-Cultural Research, 41, 336-363.

• Dow and Eff on the prevalence of autocorrelation in cross-cultural data:
• Eff, E. Anthon. 2004. “Does Mr. Galton Still Have a Problem?: Autocorrelation in the Standard Cross-Cultural Sample.” World Cultures. 15(2):153-170.
• Eff, E. Anthon. Spatial and Cultural Autocorrelation in International Datasets. MTSU Department of Economics and Finance Working Papers. February 2004.
• Eff, E. Anthon. Spatial, Cultural, and Ecological Autocorrelation in U.S. Regional Data. MTSU Department of Economics and Finance Working Papers. September 2004.
• Dow, Malcolm M., and E. Anthon Eff. 2008. “Global, Regional, and Local Network Autocorrelation in the Standard Cross-Cultural Sample.” Cross-Cultural Research. 42(2):148-171.
• Etc.

• The CoSSci Gateway also accesses the R code, data, variable labels, etc. for classroom use
• The intersciwiki is also a repository for all these materials and for sharing results.

abbreviation   dataset                              codebook
WNAI           Western North American Indians       codebook
SCCS           Standard Cross-Cultural Sample       codebook
EA             Ethnographic Atlas                   codebook
LRB            Lewis R. Binford's forager data      codebook
XC             Merged 371 society data              codebook
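The "network lag terms" mentioned on this slide can be pictured with a small sketch. This is not the Dow-Eff code: the proximity matrix and outcome values below are invented, and the names `prox`, `W`, `y`, and `Wy` are assumptions chosen for illustration. It only shows the general idea that each society's lag value is a weighted average of the other societies' values, with weights taken from a row-normalized proximity matrix (distance, language, or ecology in the slides' terms).

```r
# Hypothetical proximity among four societies (illustrative values only;
# zeros on the diagonal so a society is not its own neighbor).
prox <- matrix(c(0, 2, 1, 0,
                 2, 0, 1, 1,
                 1, 1, 0, 3,
                 0, 1, 3, 0), nrow = 4, byrow = TRUE)

# Row-normalize so each row of weights sums to 1.
W <- prox / rowSums(prox)

# Hypothetical dependent variable, one value per society.
y <- c(1.0, 2.0, 4.0, 3.0)

# Network lag term: each society's weighted average of its neighbors' y.
Wy <- as.vector(W %*% y)
print(round(Wy, 3))
```

A term like `Wy` would then enter the regression as an extra regressor; how the Dow-Eff functions estimate it is described in the papers listed above, not in this sketch.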

Page 6: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

Access to Variables and Codebook
Example: Lewis R. Binford (LRB) Foragers sample with DEf software at CoSSci (or home R GUI)

load(url("http://dl.dropbox.com/u/9256203/DEf01d.Rdata"), .GlobalEnv)
setDS("LRB")
names(dx)

… e.g., portion of list of variables for Lewis R. Binford data
[429] "s_bulk_density" "s_oc" "s_ph_h2o" "s_cec_clay"
[433] "s_cec_soil" "s_bs" "s_teb" "s_caco3"
[437] "s_caso4" "s_esp" "s_ece" "su_symbol"
[441] "su_value" "sq1" "sq2" "sq3"
[445] "sq4" "sq5" "sq6" "sq7"
[449] "dicgsh1a" "dicgsh1a.flag" "etmnts2a" "etmnts2a.flag"
[453] "g12igb3a" "g12igb3a.flag" "twisre3a" "twisre3a.flag"
[457] "l3pobi3b" "l3pobi3b.flag" "l3pobi3b.navn" "opisre2a"
[461] "opisre2a.flag" "geaisg3a" "geaisg3a.flag" "geaisg3a.navn"
[465] "glcjrc3a" "glcjrc3a.flag" "glcjrc3a.navn" "inmsre3a"
[469] "inmsre3a.flag" "inssre2a" "inssre2a.flag" "evmmod2a"
[473] "evmmod2a.flag" "lammod3a" "lammod3a.flag" "anntotprecip"
[477] "anntotprecip.flag" "avgannrh" "avgannrh.flag" "avgannrunoff"
[481] "avgannrunoff.flag" "evapotrans" "evapotrans.flag" "gdd"
[485] "gdd.flag" "npp" "npp.flag" "pevapotrans"
[489] "pevapotrans.flag" "potentialveg" "potentialveg.flag" "potentialveg"
[493] "snowdepth" "snowdepth.flag" "soilmoisture" "soilmoisture"
[497] "suit" "suit.flag" "eaid" "lrbid"
[501] "sccsid" "wnaiid" "xcid" "awc"
[505] "society" "dxid"

http://capone.mtsu.edu/eaeff/downloads/LRBcodebook.html
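With several hundred variable names, it can help to search the list rather than read it. The sketch below is an assumption-laden illustration: it builds a tiny stand-in data frame called `dx` with a few names borrowed from the listing (all values invented) instead of loading the real workspace, and it uses base R's `grep()` and `sub()` to pair each `.flag` column with its base variable. What the `.flag` columns mean is defined in the codebook, not here.

```r
# Toy stand-in for dx; column names taken from the listing, values invented.
dx <- data.frame(
  npp      = c(1.2, 0.8),
  npp.flag = c(0, 1),
  gdd      = c(300, 150),
  gdd.flag = c(0, 0),
  society  = c("A", "B")
)

# Names ending in ".flag".
flag_cols <- grep("\\.flag$", names(dx), value = TRUE)

# The base variable each flag column belongs to.
base_cols <- sub("\\.flag$", "", flag_cols)

print(data.frame(variable = base_cols, flag = flag_cols))
```

The same two calls would work on the real `dx` once `setDS("LRB")` has been run.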

Page 7: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

CoSSci Access Window for LRB at http://socscicompute.ss.uci.edu

Form entries:
• Choose DEf01d
• Choice of LRB (Binford Hunters and Gatherers)
• Dummy Variables (such as dx$v279.d1)
• Dep Variable dens1 = 142 forager subsample
• Indep Variables fishing,gathering,rlow,temp,lbio5
• Unrestricted Vars dspmov,numfam,numg3 (helped to start with covariates)
• New variables and their Definitions: None below

History panel annotations:
• 75: Maps
• 74: Cases
• 73: h[ ]: in CSV
• diskette: CSV
• (i): errors/R code
• o green circle = Repeat with changes
• numbering, e.g., 70-73-76 are successive runs

Page 8: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

CoSSci Access Window for LRB at http://socscicompute.ss.uci.edu

This screen appears after pressing the green circle (o = Repeat with changes) in the preceding window

Page 9: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

Defining variables in CoSSci windows

• Making variables into dichotomies (at top of previous screen)
Enter DUMMY VARIABLES v279.d1,v213.d3,v279.d5,v1127.d2

• New variables (at bottom of previous screen)
VARIABLE dx$v2013pos (drops -1 value in Wallace/Roberts/EvoReligion)
COMPUTE df <- dx$v2013 ; df[df == -1] <- NA ; dx$v2013pos <- df
VARIABLE dx$plow
COMPUTE dx$plow <- (dx$v243>1)*1 (2=at period of observation 3=aboriginal)

• Defining variables by interactions (at bottom of previous screen)
VARIABLE dx$AnimXbwealth
COMPUTE dx$v206*(dx$v208==1)*1 (product of two variables)

• Squared variables
VARIABLE dx$Sqv206
COMPUTE dx$v206^2 (square of one variable)

• Subsetting a Sample (e.g., LRB)
dx$fish1<-dx$fishing ; z<-which(dx$hg142!=1) ; is.na(dx$fish1[z])<-TRUE
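The variable definitions on this slide can be tried outside the gateway on a toy data frame. The sketch below assumes a stand-in `dx` whose values are invented; only the column names and the recoding expressions follow the slide. It is meant to show what each expression produces, not to reproduce the gateway's form handling.

```r
# Toy stand-in for the gateway's dx data frame (all values invented).
dx <- data.frame(
  v2013   = c(-1, 0, 2, 1),
  v243    = c(1, 2, 3, 1),
  v206    = c(0, 3, 5, 2),
  v208    = c(1, 2, 1, 1),
  fishing = c(10, 40, 25, 60),
  hg142   = c(1, 0, 1, 1)
)

# New variable: recode the -1 value to missing.
df <- dx$v2013
df[df == -1] <- NA
dx$v2013pos <- df

# Dichotomy: 1 where v243 is greater than 1, otherwise 0.
dx$plow <- (dx$v243 > 1) * 1

# Interaction: v206 where v208 equals 1, zero elsewhere.
dx$AnimXbwealth <- dx$v206 * (dx$v208 == 1) * 1

# Squared term.
dx$Sqv206 <- dx$v206^2

# Subsample: copy fishing, then set rows outside hg142 == 1 to missing.
dx$fish1 <- dx$fishing
z <- which(dx$hg142 != 1)
is.na(dx$fish1[z]) <- TRUE

print(dx)
```

Setting out-of-sample rows to `NA`, rather than deleting them, keeps the data frame the same length, so every row still refers to the same society.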

Page 10: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

UCI VM: quick turnaround & myOutput (constructing the R GUI code takes much longer than using CoSSci)
Names for Access to Specific Sets of Output at CoSSci and the DEf Laptop R GUI

• h[1] depvar
• h[2] UR: Unrestricted model variables (may be needed for covariates in regression)
• h[3] UR model.varbs
• h[4] R: Restricted model
• h[5] Endogeneity tests, if endogenous variables are specified
• h[6] Diagnostics: Heteroskedasticity, Hausman, and other tests
• h[7] autocorrelation percentages (distance, language, ecology) and R-squareds
• h[8] descriptive statistics
• h[9] totry (possibly add1 to model): important to improving model
• h[10] didwell
• h[11] Used these
• h[12] dfbetas for analysis of outliers
• h[13] imputed data used in second-level analysis (e.g., moral gods model)
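The `h[n]` notation above is ordinary R list indexing. As a minimal sketch, the list below is a hypothetical stand-in with only three invented elements (so its positions do not match the thirteen slots on the slide); it only illustrates the difference between single brackets, double brackets, and access by name.

```r
# Hypothetical stand-in for a results list; names and contents are invented.
h <- list(
  depvar     = "dens1",
  restricted = data.frame(term = c("fishing", "temp"), coef = c(0.42, -0.17)),
  totry      = c("rlow", "lbio5")
)

h[1]          # single brackets: a one-element list, printed with its name
h[[1]]        # double brackets: the element itself
h[["totry"]]  # access by name instead of position
```

Single brackets are convenient for printing a labeled block of output; double brackets are what further computation on an element usually needs.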

Page 11: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

The DEf window on the VM at http://SocSciGate.oit.uci.edu runs in 2 minutes; SDSC is slower but handles complex operations. Both return myOutput.csv.

For large or small online or in-class courses, instructors receive a history of each student’s runs on CoSSci. (Yellow shows that the student has clicked the blue EXECUTE button, lettered in white, for this model and that the computer is now starting to run it. If it turns red, there is an error.)

Page 12: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

Using “To Try” h[9] iteratively

• http://socscicompute.ss.uci.edu/history/list_published shows the use of “To Try” by the fall 2014 Xiamen University students taught by Ren Feng. Each iteration is a choice of which variables in the output “To Try” list are likely to be good choices to test towards a finished model.
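One "To Try" iteration can be imitated with base R's `add1()`, which the output list on an earlier slide also mentions. The sketch below is an analogy under stated assumptions, not the Dow-Eff procedure: it uses the built-in `mtcars` data and an ordinary `lm()` fit, with no imputation or network lag terms, and the candidate variables are arbitrary.

```r
# Starting (restricted) model on a built-in dataset.
base_model <- lm(mpg ~ wt, data = mtcars)

# Score each candidate variable as a single-term addition.
candidates <- add1(base_model, scope = ~ wt + hp + qsec + drat, test = "F")
print(candidates)

# Pick the candidate with the smallest p-value (the "<none>" row is NA).
p_values <- candidates[["Pr(>F)"]]
names(p_values) <- rownames(candidates)
best <- names(which.min(p_values))

# Refit with that variable added; repeat the cycle from here.
updated_model <- update(base_model, as.formula(paste(". ~ . +", best)))
print(formula(updated_model))
```

In practice the choice at each step would also weigh theory and the codebook, as the student histories illustrate, rather than the p-value alone.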

Page 13: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

Rows: HiGod; columns: AnimXbwealth

HiGod    0    1    2    3    4    5    7    8    9
1       54    7    6    1    0    0    0    1    0
2       40    6    5    0    0    0    0    0    0
3       13    1    4    3    1    0    1    0    0
4       21    2    0    9    3    1    0    3    4

Rows: FxCmtyWages; columns: HiGod

FxCmtyWages    1    2    3    4
0             69   51   10   43
1              0    0   13*   0

HiGod codes: 3 = *unconcerned with humans (neither Islam nor Christianity); 4 = supportive of morality

Rows: HiGod; columns: Writing & Records

HiGod    1    2    3    4    5
1       35   16   10    0    8
2       25   17    6    0    3
3        7    9    3    2    2
4        6    7    2   10   18

White, Oztan & Snarey (2014)

Brown & Eff (2010)

Bayesian Network Learning Results using library(bnlearn) in comparing two Moral Gods models
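The slides do not show the bnlearn calls behind these results, so the following is only a sketch of how bootstrap arc probabilities are commonly obtained with that package. It assumes bnlearn is installed and uses its bundled `learning.test` demo data rather than the SCCS variables; the 50 replicates, the hill-climbing algorithm, and the 0.8 cutoff are arbitrary choices for illustration.

```r
# Runs only when bnlearn is available; otherwise the block is skipped.
if (requireNamespace("bnlearn", quietly = TRUE)) {
  data(learning.test, package = "bnlearn")  # demo data shipped with bnlearn
  set.seed(1)

  # Relearn the network on bootstrap resamples and record how often each
  # arc appears (strength) and in which direction.
  arc_strength <- bnlearn::boot.strength(learning.test, R = 50,
                                         algorithm = "hc")

  # Keep arcs that appear in at least 80% of the replicates.
  strong_arcs <- arc_strength[arc_strength$strength >= 0.8, ]
  print(strong_arcs)
}
```

Comparing two models would then amount to comparing which arcs survive the cutoff in each variable set.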

Page 14: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

library(bootstrap)
biocLite("Rgraphviz")

Page 15: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

Paul Rodriguez, SDSC

Bioconductor biocLite.R
library(bootstrap)
biocLite("Rgraphviz")
library(Rgraphviz)  # also attaches the graph package, which provides randomGraph()
V=letters[1:10]
M=1:4
g1=randomGraph(V,M,0.2)
plot(g1)

Probabilities generated by bootstrap run on SDSC Trestles supercomputer

1695=No Scarification, 270=Class stratification

Page 16: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models


CoSSci CSV output = R gui output h[4], h[6], h[7] for reincarnation model


Page 17: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models


CoSSci Maps, e.g., Reincarnation beliefs

Make with new CoSSci so name will show

Page 18: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

CoSSci Maps, e.g., Reincarnation beliefs
Classical Karmic reincarnation in yellow; red: 5,6

Page 19: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models


geographic maps of background variables with White-Murdock alignments (intersciwiki)

Female Equality v626 Plow v243

Moral Gods HiGod4 SCCS numbering

Page 20: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models

Other features in CoSSci, e.g., library(psych)
Matrix Algebra in R, pp. 14-15: https://personality-project.org/r/sem.appendix.1.pdf
pairs.panels(vars)

These variables are ones used by Amber Johnson in analysis of the LRB data on 399 hunters and gatherers (Lew Binford 2001)

Page 21: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models


Page 22: On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models


Networks of variables