Using R in a Microsoft Office World (aka: hostile environment) Harold Baize, PhD SF-DPH

52

description

Using R in a Microsoft Office World (aka: hostile environment) Harold Baize, PhD SF-DPH. First: Advice to the R Newbie. Use the assignment arrow!

Transcript of Using R in a Microsoft Office World (aka: hostile environment) Harold Baize, PhD SF-DPH

Page 1: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH
Page 2: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Using R in a Microsoft Office

World (aka: hostile environment)

Harold Baize, PhD SF-DPH

Page 3: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

First: Advice to the R Newbie

Use the assignment arrow! <- Directional and not ambiguous Clearly shows an object created or modified

X = Y X <- Y Y -> X Use caution with attach() !! * Does not work like SPSS or SAS! * Data manipulations are not part of the data frame

Page 4: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Attach HeadacheAn Introduction to Statistical Learning: with Applications in R James, Witten, Hastie and Tibshirani (2013)“To refer to a variable, we must type the data set and the variable name joined with a $ symbol. Alternatively, we can use the attach() function in order to tell R to make the variables in this data frame available by name.”

Google R Style Guide:“AttachThe possibilities for creating errors when using attach are numerous. Avoid it.”

Page 5: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Attach HeadacheR is fundamentally different from SAS and SPSS, you’re not restricted to one dataset at a time! var1 <- gear * carb #Error: object 'gear' not foundattach(mtcars) var2 <- gear + carb

mtcars$var3 <- gear + carb

Page 6: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

There are now two mtcars, the one attached was not changed!

mtcars$var4 <- mtcars$cyl + mtcars$var3

Formula notation lets you skip $

Attach not necessary with formula notation (data = )

Page 7: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

REPRODUCIBILITY

REPRODUCIBILITY

REPRODUCIBILITY

https://www.scienceexchange.com/reproducibility

Page 8: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Ewww… It’s sorta GUI

Page 9: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

R in a Microsoft Office World IT does not trust open source software Your supervisors don’t trust open source software (but want to save $)

Everyone else only knows MS-Office (you might get hit by a bus) Management only knows Excel (and they think its great!) You have to be able to share data between R and MS-Office apps Need to publish results in MS-Office formats but want reproducibility

Page 10: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

R in a Microsoft Office World Two basic strategies:

Work in MS Office with R plug-ins to execute R from within the app a. not open source b. not easily reproducible due to GUI

Work in R and use packages to tailor output for MS Office

Page 11: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

MS Office Plug-ins

statconnDCOM Server: server that allows integration of R into other applications. rscproxy: R package required for rcom and for the statconn (D)COM Server. rcom: R package for R and other apps to communicate RExcel: embeds R in Excel SWord: embeds R in Wordmodified from:http://www.londonr.org/Presentations/May'10%20Markus%20Gesmann.pdf

Page 12: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Working in R:Data Exchange ODBC CSV files Excel

Page 13: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

RODBC Use RODBC to perform SQL queries ODBC drivers need to be set up in

your operating system. Once set up, you use the

Data Source Name (dsn) as a connection

Page 14: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Control Panel > Administrative Tools > Data Sources (ODBC)

Page 15: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Enter a name Description optional Select server from drop down menu Defaults should work for the rest…

Page 16: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

library(RODBC)AvDW <- odbcConnect( dsn = "Avatar_Reports")

AvLv <- odbcConnect( dsn = "AVCALPMLIVE", believeNRows = FALSE) ## , uid="User", pwd="password")

Page 17: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

X <- sqlQuery(AvDW, "select * from tableName")

Returns a data frame or SQL error message to the data object, no R error until you use X Dates are returned in POSIXct format Text fields are returned as factors, unless you specify stringsAsFactors = FALSE

sqlSave(OpenChannel, dfName, append = FALSE)

Page 18: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

StDate <- "2012-07-01"EnDate <- "2013-06-30"

idEpFY <- sqlQuery(AvDW, paste( "select distinct id, epi_num from billing_tx_history where date_of_service Between {d '",StDate,"'} and {d '",EnDate,"'} and rrg = 'Mental Health'"))odbcClose(AvDW) # {d '2012-07-01'}

Page 19: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Accesslibrary(RODBC)Ac <- odbcConnectAccess("fileName.mdb") ## For newer versions of Access, 2007 or later... "accdb"## dataObj <- odbcConnectAccess2007("fileName.accdb") ## In this example "Demos_FY" is a table in an Access database## If the table name contains underscore, then the ## table name needs to be in back tick quote`

DemoDat <- sqlQuery(Ac, "select * from `Demos_FY`")

Page 20: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

CSV Comma Separated Values (in plain text) Almost a universal format… ALMOST! Microsoft opens CSV files with Excel by default

## To read a CSV fileAnx <- read.csv("ANexx.csv")str(Anx)

## Returns dates as factorAnx$disDate <- as.Date(Anx$disDate, "%m/%d/%Y")## To write CSVwrite.csv(Anx, file = "Ane.csv", row.names = FALSE)

Page 21: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

## If you don’t use ## row.names = FALSE

str(Anx)'data.frame': 2515 obs. of 23 variables: $ X : int 1 2 3 4 5 6 7 8 ... $ X.3 : int 1 2 3 4 5 6 7 8 ... $ X.2 : int 1 2 3 4 5 6 7 8 ... $ X.1 : int 1 2 3 4 5 6 7 8 ... $ program_code : chr "88121" "8919F1"...

Page 22: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Excel CSV “Feature”Microsoft wants to be sure your data looks good, so…. Leading zeroes are removed! program_code 00201 becomes 201Not an option, it’s automatic.

Only work around in Excel is use GUI and open as “text”Note: This problem can occur in R too, although not as common. The solution in R is to use the colClasses option to define data types with read.csv().

Page 23: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

ExcelPackage RODBC has a function odbcConnectExcel()Package gdata read.xls()and read.xlsx() Requires separate installation of pearl executable.

Packages xlsx and XLConnect both use Java run time to parse MS Office “Open XML” format. They can both read and write.

Best package to use to read and write Excel is…

Page 24: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

XLConnect Uses Java runtime installed on mostcomputers Multiplatform Provided by Mirai Solutions GmbHCan write Excel files with formatting

Page 25: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Read Excellibrary(XLConnect)prg <- readWorksheetFromFile( "ExcelFile.xls", sheet = 1)

## Other arguments:## startRow = 1, startCol = 5, endRow = 1, endCol = 5

1 2 3 4## sr,sc,er,ec

Page 26: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Excel file as often misused

Page 27: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

## Create data frame from multiple Excel files## Read file names from the directoryfilenames <- list.files("./XLfiles", pattern = "*.xls", full.names = TRUE)

lpd <- lapply(filenames, function(x) {cbind( readWorksheetFromFile(x,1,13, 1,75, 6), # main data readWorksheetFromFile(x,1, 1, 5, 1, 5, header = FALSE), # prog code sr 1,sc 5,er 1,ec 5 readWorksheetFromFile(x,1, 3, 5, 3, 5, header = FALSE), # program name readWorksheetFromFile(x,1, 5,13, 5,13, header = FALSE), # Date Submitted readWorksheetFromFile(x,1, 2,13, 2,13, header = FALSE)) # FY } )## Change the list object into one data frameddf <- do.call(rbind,lpd)## Give names to the columnsnames(ddf) <- c("A","B","LFname","Hours","Worked", "ClinHrsWk","prog","progN","DateSub","FY")

Page 28: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Write ExcelBasically the same as reading…

To create a new Excel file from a data frame:writeWorksheetToFile("Name.xls", data = dfName, sheet = "Some Name", startRow = 3, startCol = 4)

Page 29: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Add a sheet to an existing Excel file, put your data frame (dfName) into it. wb <- loadWorkbook("xlFile.xls")createSheet(wb, name = "Sheet Name")writeWorksheet(wb, dfName, sheet = "Sheet Name")saveWorkbook(wb)

Page 30: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Reports and Publishing Export data to Crystal Reports Tableau integration Word

R2DOCX Markdown to Word (knitr) xtable sjPlot

Page 31: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Tableau 8.1 integration involves installing the package Rserve which establishes connection to Tableau. Short R scripts can be executed in one line Tableau functions return either integer, real, string or boolean data type R code is the part between red single quotes below.

SCRIPT_INT('kmeans(data.frame(.arg1,.arg2,.arg3,.arg4),3)$cluster;',SUM([Petal length]), SUM([Petal width]),SUM([Sepal length]),SUM([Sepal width]))

Page 32: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

R2DOCXR2DOCX is a package on GitHub that allows you to write Word documents directly in R by using functions!

library(R2DOCX)## Create a document new objectdoc <- new("Docx",

title = "My example", basefile = template.file)

doc <- addParagraph(doc, stylename = "Normal", value = paste("This is a rather short paragraph. ", RObj," Not a lot of room on a PowerPoint screen."))

Page 33: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Markdown to Word Use knitr and knit2html() Use Pandoc, or… A simple trick allows you to open HTML documents in Word You can compose reproducible reports in R as markdown documents with plots and tables Plots can be created as any device including Windows Metafile Tables can be formatted using packages xtable or sjPlot (but not both)

Page 34: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Markdown: A simplified Heading=======

Sub-heading----------- Paragraphs are separated by a blank line. Text attributes *italic*, **bold**, `monospace`. A [link](http://example.com). Shopping list: * apples * oranges

Numbered list:

1. apples 2. oranges

The rain---not the reign---in Spain.

# Heading

## Sub

_italic___bold__

Page 35: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

<h1>Heading</h1> <h2>Sub-heading</h2>

<p>Paragraphs are separated by a blank line.</p>

<p>Text attributes <em>italic</em>, <strong>bold</strong>, <code>monospace</code>.</p> <p>A <a href="http://example.com">link</a>.</p>

<p>Shopping list:</p> <ul> <li>apples</li> <li>oranges</li> <li>pears</li> </ul>

<p>Numbered list:</p> <ol> <li>apples</li> <li>oranges</li> <li>pears</li> </ol>

<p>The rain&mdash;not the reign&mdash;in Spain.</p>

Page 36: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

R in Markdown## Code chunks

A _paragraph_ here. A code chunk below

```{r}1+1 ## defaults: echo=TRUE, include=TRUE .4-.7+.3 ## Use trunc() or round()```## Inline code

The value of pi is **`r pi`**, and 3 times pi is **`r sprintf("%.2f",3*pi)`**.

Page 37: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

HTML to Word: Pandoc

Page 38: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

HTML to Wordlibrary(knitr)knit2html("RMD_Web.rmd")Creates RMD_Web.html and RMD_Web.md

How do you turn it into a Word file?

file.copy("fileName.html", "fileName.doc")(not fileName.docx)

Page 39: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Change Windows to show file extensions Organize > Folder and search options

Page 40: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

HTML to Wordknit2html("RMD_Word.rmd", options = c('mathjax','smartypants', 'use_xhtml','highlight_code'))

Rename RMD_Word.html as RMD_Word.doc

Double click, it opens in Word as a web page, not text

Change the view from web page… View > Print Layout

Save as “docx” it is now a Word document!

Page 41: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Plots and Images By default knitr renders plots as PNG and stores in sub-folder “figure”

Default is to embed plots in the HTML as base 64 encryption. Word does not read that. If you render plots directly to output, then you need to turn off the base_64images option.

Page 42: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

More control by rendering plots using R device (PNG, JPEG, PDF…) then reading in the file If you render image files then read them in, you don’t need to change options to turn off base_64images. To use in Word, render plots using win.metafile() as device, then read in the file. ![plotName](plotName.wmf)

Page 43: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Generate image file in R chunk, read it into document in markdown

```{r, echo=FALSE, include=FALSE}win.metafile("wmDense.wmf", width = 6, height = 4)

densityplot(~mpg|as.factor(cyl), main = "Density Plot by Cylinders", xlab = "Miles per Gallon ", data = mtcars)

dev.off()```![wmhist](wmDense.wmf)

Page 44: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

xtable Nice plots are not enough. You need to be able to print well formatted results and tables. The crude text of R output won’t do in a publication. knit2html() produces HTML Tags can be used in your text! (i.e. Greek: &beta; prints out β) xtable prints tables and data frames as HTML directly in the text

Page 45: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

**Regression on Income**```{r, echo = FALSE, results = 'asis'}reg <- lm(income ~ age + IQ, data = donner)

rxl <- summary(reg)print(xtable(rxl), type = "HTML")

```Residual standard error `r sprintf("%.1f", rxl$sigma)`

on `r rxl$df[2]` degrees of freedom.

xtable

Page 46: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Another Example of In Line Statistics

**Chi Square** Statistical test from a cross tabulation.

**&chi;^2** = `r xt$chisq$statistic`,df = `r xt$chisq$parameter`,*p* = `r xt$chisq$p.value`

Page 47: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

sjPlot sj is for the blog “Strenge Jacke!” German title: “Strict Jacket, the majority do not want to hear the Daniel.” (Daniel Lüdecke) A Package for plots and to generate well formatted output for linear models, frequency tables, and cross tabs similar to SPSS output. Originally designed to produce stand alone web pages or output to R Studio viewer window. Should work with knitr…

Page 48: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

sjPlot Use sjPlot functions in markdown in-line R statements Internal objects of function page.style and page.content provide the HTML Double in-line calls to sjPlot functions Caused conflicts with CSS of the web page Automatically opened web browser Not well suited for knitr use…

Page 49: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

sjPlot So… Comments on Daniel’s blog page New version of sjPlot to be more knitr friendly Still using in line R in markdown but… Embedded CSS Option for no output to browser knitr object added, only need one function

Page 50: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

sjPlot table functions: sjt.corr, sjt.df, sjt.frq, sjt.glm, sjt.xtab, sjt.pca, sjt.stackfrq

```{r echo = FALSE}library(sjPlot) ## One line```

`r sjt.frq(efc$c160age, no.output = TRUE, variableLabels = list("Carer's Age"), autoGroupAt = 10)$knitr`

Page 51: Using  R  in a Microsoft Office World  (aka: hostile environment)  Harold Baize, PhD   SF-DPH

Do it…