nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?pagewanted=all


Transcript of nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

Page 1: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?pagewanted=all

Page 2: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all
Page 3: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

Source Code
- Tons of Lines of Code Simplified

Package
- Code
- Documentation
- Datasets

Workspace
- Fewer Lines of Code
- Efficiency
- Capability

Page 4: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

The next data visual was produced with about 150 lines of R code

Page 5: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

Workflow

Statistics & Analysis

Data Analysis Goals

Data Input

Visualization & Reporting

Data Management

Enter Manually

Combine Variables Add Variable Select a Subset

Input a Comma-Separated Values (CSV) File

R Installation Already Includes Several Libraries

Page 6: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

 Integrated Development Environment (IDE) 

Write Code / Program
- Input Data
- Analyze
- Graphics

Datasets, etc.

Enter Commands

View Results

Page 7: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

The R Graphics Package

Graphing Parameters

- Titles
- X-Axis Title
- Y-Axis Title
- Legend
- Scales
- Color
- Gridlines

library(help="graphics")

Basic Chart Types
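
As a rough sketch of how the graphing parameters listed on this slide map onto arguments in the base graphics package, the example below draws a scatterplot. The built-in mtcars dataset, the colors, and the axis limits are assumptions chosen for illustration; the slides do not say which data were plotted.

```r
# Sketch only: mtcars is a stand-in dataset, not the one used in the slides.
data(mtcars)

cyl_groups   <- factor(mtcars$cyl)                 # 4, 6 or 8 cylinders
point_colors <- c("steelblue", "darkorange", "firebrick")

plot(mtcars$wt, mtcars$mpg,
     main = "Fuel Economy vs. Weight",             # title
     xlab = "Weight (1000 lbs)",                   # x-axis title
     ylab = "Miles per gallon",                    # y-axis title
     xlim = c(1, 6), ylim = c(10, 35),             # scales
     col  = point_colors[cyl_groups],              # color by group
     pch  = 19)
grid()                                             # gridlines
legend("topright", legend = levels(cyl_groups),    # legend
       col = point_colors, pch = 19, title = "Cylinders")
```

Other basic chart types in the same package, such as hist(), barplot() and boxplot(), accept most of the same title and color arguments.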

Page 8: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

Currently, how many R Packages?

At the command line enter:

dim(available.packages())

available.packages()

Page 9: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

Correlation Matrix

library(car)

scatterplotMatrix(h)
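
The slide does not show how h was built. The sketch below assumes h is a data frame of numeric columns and uses a few columns of the built-in mtcars dataset as a stand-in. It also swaps in base R's cor() and pairs() so that it runs without the car package; pairs() is a plainer analogue of scatterplotMatrix(), not the same function.

```r
# Sketch only: h is assumed to be a data frame of numeric columns.
# A few columns of the built-in mtcars dataset stand in for it here.
h <- mtcars[, c("mpg", "wt", "hp", "disp")]

cor_matrix <- cor(h)            # pairwise Pearson correlations
print(round(cor_matrix, 2))

# Base R scatterplot matrix: one panel per pair of variables.
pairs(h, main = "Scatterplot matrix of h")
```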

Page 10: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

In ggplot2 a plot is made up of layers.

ggplot2

Plot

Grammar of Graphics

Layer

- Data

- Mapping

- Geom

- Stat

- Position

Scale

Coord

Facet
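
To make the layer idea concrete, here is a minimal sketch that touches each component named on this slide. It assumes the ggplot2 package is installed and again uses the built-in mtcars dataset as a stand-in; the particular geoms, palette and facet variable are illustrative choices, not taken from the slides.

```r
library(ggplot2)

# Sketch only: mtcars is a stand-in dataset.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +         # data and mapping
  geom_point(aes(colour = factor(cyl))) +           # layer 1: point geom, identity stat
  geom_smooth(method = "lm", se = FALSE) +          # layer 2: line geom, smoothing stat
  scale_colour_brewer(palette = "Set1",
                      name = "Cylinders") +         # scale
  coord_cartesian(ylim = c(10, 35)) +               # coord
  facet_wrap(~ am)                                  # facet: one panel per transmission type

print(p)
```

Each `+` adds a component to the plot object, so layers can be added or removed without rewriting the rest of the plot.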

Page 11: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

ggplot2

Page 12: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all
Page 13: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

Character Vector: b <- c("one","two","three")

Numeric Vector: a <- c(1,2,5.3,6,-2,4)

Matrix: y <- matrix(1:20, nrow=5, ncol=4)

Dataframe:
d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
f <- c(TRUE,TRUE,TRUE,FALSE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed")

List: w <- list(name="Fred", age=5.3)

Data Structures

Framework Source: Hadley Wickham

Page 14: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

Actor Heights

1) Create Vectors of Actor Names, Heights, Date of Birth, Gender

2) Combine the 4 Vectors into a DataFrame

Page 15: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

• Numeric: e.g. heights

• String: e.g. names

• Dates: e.g. "12-03-2013"

• Factor: e.g. gender

• Boolean: TRUE, FALSE

Variable Types

Page 16: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

• We use the c() function and put each value in quotation marks so that R knows it is string data.

• Create a variable called ActorNames as follows:

ActorNames <- c("John", "Meryl", "Jennifer", "Andre")

Creating a Character / String Vector

Page 17: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

Class, Length, Index

class(ActorNames)

length(ActorNames)

ActorNames[2]
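
A short sketch of what these three calls return for the ActorNames vector from the previous slide. The last two lines go slightly beyond the slide and are included only to show other common indexing forms.

```r
ActorNames <- c("John", "Meryl", "Jennifer", "Andre")

class(ActorNames)     # "character"
length(ActorNames)    # 4
ActorNames[2]         # "Meryl" (R indexes from 1, not 0)

ActorNames[c(1, 4)]   # several positions at once: "John" "Andre"
ActorNames[-1]        # a negative index drops that position
```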

Page 18: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

• Create a variable called ActorHeights (inches):

ActorHeights <- c(77, 66, 70, 90)

Creating a Numeric Vector / Variable

Page 19: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

• Use the as.Date() function:

ActorDoB <- as.Date(c("1930-10-27", "1949-06-22", "1990-08-15", "1946-05-19"))

• Each date has been entered as a text string (in quotation marks) in the default format (yyyy-mm-dd).

• Passing these strings to the as.Date() function converts them to Date objects.

Creating a Date Variable
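
The sketch below illustrates the conversion and shows one way to handle strings that are not in yyyy-mm-dd order. The variable other_date is hypothetical, and reading "12-03-2013" as month-day-year is an assumption; the slides do not say which layout was intended.

```r
# Default format is yyyy-mm-dd, so no format string is needed here.
ActorDoB <- as.Date(c("1930-10-27", "1949-06-22", "1990-08-15", "1946-05-19"))
class(ActorDoB)                        # "Date"

# Other layouts need an explicit format; this assumes month-day-year.
other_date <- as.Date("12-03-2013", format = "%m-%d-%Y")
format(other_date, "%Y-%m-%d")         # "2013-12-03"

# Date objects support arithmetic:
# number of days between the earliest and latest birth dates.
as.numeric(diff(range(ActorDoB)))
```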

Page 20: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

• Use the factor() function:

ActorGender <- c("male", "female", "female", "male")
ActorGender <- factor(ActorGender)

Creating a Categorical / Factor Variable
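
A brief sketch of what factor() adds to the plain character vector: a fixed set of levels, with each value stored as an integer code pointing at a level.

```r
ActorGender <- c("male", "female", "female", "male")
ActorGender <- factor(ActorGender)

levels(ActorGender)       # "female" "male" (sorted alphabetically by default)
as.integer(ActorGender)   # 2 1 1 2, the integer codes behind each label
table(ActorGender)        # counts per level: female 2, male 2
```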

Page 21: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

Actor.DF <- data.frame(Name=ActorNames, Height=ActorHeights, BirthDate=ActorDoB, Gender=ActorGender)

Vectors and DataFrames

dim(Actor.DF)
Actor.DF[2]
Actor.DF[2,]
Actor.DF[1,3]
Actor.DF[2,2]
Actor.DF[2:3,]
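
Putting the previous slides together, the following self-contained sketch rebuilds the data frame and annotates what each indexing call selects. The final line, using `$`, is an addition not shown on the slide.

```r
ActorNames   <- c("John", "Meryl", "Jennifer", "Andre")
ActorHeights <- c(77, 66, 70, 90)
ActorDoB     <- as.Date(c("1930-10-27", "1949-06-22", "1990-08-15", "1946-05-19"))
ActorGender  <- factor(c("male", "female", "female", "male"))

Actor.DF <- data.frame(Name = ActorNames, Height = ActorHeights,
                       BirthDate = ActorDoB, Gender = ActorGender)

dim(Actor.DF)      # 4 rows, 4 columns
Actor.DF[2]        # second column (Height), still a data frame
Actor.DF[2, ]      # second row, all columns
Actor.DF[1, 3]     # row 1, column 3: the first birth date
Actor.DF[2, 2]     # row 2, column 2: 66
Actor.DF[2:3, ]    # rows 2 and 3, all columns
Actor.DF$Height    # a column by name, returned as a plain vector
```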

Page 22: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

> getwd()
[1] "C:/Users/johnp_000/Documents"

> setwd()

getwd() setwd()
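
Note that setwd() needs a directory path as its argument. The sketch below uses tempdir() purely as a safe placeholder, since the slide does not say which folder was intended, and restores the original directory afterwards.

```r
old_dir <- getwd()    # remember where we started

setwd(tempdir())      # setwd() requires a path; tempdir() is only a placeholder
getwd()               # now reports the temporary directory

setwd(old_dir)        # restore the original working directory
```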

Page 23: nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all

• write.table(Actor.DF, "ActorData.txt", sep="\t", row.names = TRUE)

• write.csv(Actor.DF, "ActorData.csv")

Write / Create a File
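
As a hedged sketch of a write-then-read round trip, the example below saves a two-column version of the data frame and reads it back. Writing to tempdir(), the names csv_path and Actor.Check, and the use of row.names = FALSE are illustrative choices, not taken from the slides.

```r
Actor.DF <- data.frame(Name = c("John", "Meryl", "Jennifer", "Andre"),
                       Height = c(77, 66, 70, 90),
                       stringsAsFactors = FALSE)

# A temporary folder keeps the sketch from cluttering the working directory.
csv_path <- file.path(tempdir(), "ActorData.csv")
write.csv(Actor.DF, csv_path, row.names = FALSE)   # omit the row-number column

# Read the file back in and check that the shape survived the round trip.
Actor.Check <- read.csv(csv_path, stringsAsFactors = FALSE)
identical(dim(Actor.Check), dim(Actor.DF))
```

Without a folder in the file name, write.csv() and write.table() save into the current working directory reported by getwd().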