wrangling-webinar
![Page 1: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/1.jpg)
RStudio
Data Wrangling with R
How to work with the structures of your data

Garrett Grolemund
Data Scientist and Master Instructor
January 2015
Email: [email protected]

Slides at: bit.ly/wrangling-webinar

Country  2011   2012   2013
FR       7000   6900   7000
DE       5800   6000   6200
US       15000  14000  13000

250 Northern Ave, Boston, MA 02210
Phone: 844-448-1212
Email: [email protected]
Web: http://www.rstudio.com
Follow @rstudio
Copyright 2014 RStudio | All Rights Reserved
![Page 2: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/2.jpg)
© 2014 RStudio, Inc. All rights reserved.
HELLO
my name is
Garrett

[email protected]   @StatGarrett
![Page 3: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/3.jpg)
Two packages to help you work with the structure of data.
tidyr
dplyr
![Page 4: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/4.jpg)
http://www.rstudio.com/resources/cheatsheets/
![Page 5: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/5.jpg)
Ground rules
![Page 6: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/6.jpg)
tbl's

Just like data frames, but play better with the console window.

data.frame

   974  55.0  2893  5.89  5.92  3.69
   975  62.0  2893  6.02  6.04  3.61
   976  55.0  2893  6.00  5.93  3.78
   977  59.0  2893  6.09  6.06  3.64
   978  57.0  2894  5.91  5.99  3.71
   979  57.0  2894  5.96  6.00  3.72
   980  56.0  2894  5.88  5.92  3.62
   981  56.0  2895  5.75  5.78  3.51
   982  59.0  2895  5.66  5.76  3.53
   983  53.0  2895  5.71  5.75  3.56
   984  58.0  2896  5.85  5.89  3.51
   985  60.0  2896  5.81  5.91  3.59
   986  63.0  2896  6.00  6.05  3.51
   987  56.0  2896  5.18  5.24  3.21
   988  56.0  2896  5.91  5.96  3.65
   989  55.0  2896  5.82  5.86  3.59
   990  56.0  2896  5.83  5.89  3.64
   991  58.0  2896  5.94  5.88  3.60
   992  57.0  2896  6.39  6.35  4.02
   993  57.0  2896  6.46  6.45  3.97
   994  57.0  2897  5.48  5.51  3.33
   995  58.0  2897  5.91  5.85  3.59
   996  52.0  2897  5.30  5.34  3.26
   997  55.0  2897  5.69  5.74  3.57
   998  61.0  2897  5.82  5.89  3.48
   999  58.0  2897  5.81  5.77  3.58
  1000  59.0  2898  6.68  6.61  4.03
  [ reached getOption("max.print") -- omitted 52940 rows ]

tbl

  Source: local data frame [53,940 x 10]

     carat  cut        color  clarity  depth  table
  1  0.23   Ideal      E      SI2      61.5   55
  2  0.21   Premium    E      SI1      59.8   61
  3  0.23   Good       E      VS1      56.9   65
  4  0.29   Premium    I      VS2      62.4   58
  5  0.31   Good       J      SI2      63.3   58
  6  0.24   Very Good  J      VVS2     62.8   57
  7  0.24   Very Good  I      VVS1     62.3   57
  8  0.26   Very Good  H      SI1      61.9   55
  9  0.22   Fair       E      VS2      65.1   61
  10 0.23   Very Good  H      VS1      59.4   61
  .. ...    ...        ...    ...      ...    ...
  Variables not shown: price (int), x (dbl), y (dbl), z (dbl)
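A minimal sketch of the contrast above. It assumes a recent dplyr, where as_tibble() wraps a data frame as a tbl (the 2015 slides would have used tbl_df()). The built-in iris data stands in for the larger diamonds data.

```r
library(dplyr)

# iris is a built-in data frame; it stands in here for the diamonds data
iris_tbl <- as_tibble(iris)

# A plain data frame prints every row (up to max.print);
# a tbl prints only the first rows and the columns that fit the console
print(iris_tbl)

# The tbl is still a data frame, so code written for data frames keeps working
is.data.frame(iris_tbl)
```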
![Page 7: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/7.jpg)
View()

Examine any data set with the View() command (capital V).

  View(iris)
  View(mtcars)
  View(pressure)

Data viewer opens here
![Page 8: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/8.jpg)
%>%
The pipe operator

  library(dplyr)
  select(tb, child:elderly)
  tb %>% select(child:elderly)

  tb  %>%  select( , child:elderly)

These do the same thing. Try it!
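The equivalence can be checked directly. In this sketch the small tb data frame is a hypothetical stand-in for EDAWR's tb data set; its values are made up and only the column names matter.

```r
library(dplyr)

# Hypothetical stand-in for EDAWR's tb data set (values are made up)
tb <- data.frame(
  country = c("A", "A", "B"),
  year    = c(2011, 2012, 2011),
  child   = c(10, 12, 3),
  adult   = c(100, 110, 40),
  elderly = c(20, 25, 8)
)

# Ordinary call: the data frame is the first argument
a <- select(tb, child:elderly)

# Piped call: %>% passes tb in as the first argument of select()
b <- tb %>% select(child:elderly)

# Both calls return the same three columns
identical(a, b)
```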
![Page 9: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/9.jpg)
Data Wrangling
![Page 10: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/10.jpg)
Wrangling Munging Janitor Work Manipulation Transformation
50-80% of your time?
![Page 11: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/11.jpg)
Two goals
1. Make data suitable to use with a particular piece of software
2. Reveal information
![Page 12: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/12.jpg)
Data sets come in many formats
…but R prefers just one.
![Page 13: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/13.jpg)
EDAWR
An R package with all of the data sets that we will use today.

  # install.packages("devtools")
  # devtools::install_github("rstudio/EDAWR")
  library(EDAWR)
  ?storms
  ?cases
  ?pollution
  ?tb
![Page 14: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/14.jpg)
  # devtools::install_github("rstudio/EDAWR")
  library(EDAWR)

storms

  storm    wind  pressure  date
  Alberto  110   1007      2000-08-12
  Alex     45    1009      1998-07-30
  Allison  65    1005      1995-06-04
  Ana      40    1013      1997-07-01
  Arlene   50    1010      1999-06-13
  Arthur   45    1010      1996-06-21

cases

  Country  2011   2012   2013
  FR       7000   6900   7000
  DE       5800   6000   6200
  US       15000  14000  13000

pollution

  city      particle size  amount (µg/m3)
  New York  large          23
  New York  small          14
  London    large          22
  London    small          16
  Beijing   large          121
  Beijing   small          56
![Page 24: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/24.jpg)
storms
• Storm name
• Wind Speed (mph)
• Air Pressure
• Date

cases
• Country
• Year
• Count

pollution
• City
• Amount of large particles
• Amount of small particles
![Page 25: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/25.jpg)
storms
  storms$storm
  storms$wind
  storms$pressure
  storms$date

cases
  cases$country
  names(cases)[-1]
  unlist(cases[1:3, 2:4])

pollution
  pollution$city[c(1, 3, 5)]
  pollution$amount[c(1, 3, 5)]
  pollution$amount[c(2, 4, 6)]
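The sketch below runs the untidy extractions on hand-built copies of cases and pollution, so it works without installing EDAWR. The column names country, size and amount are assumed from the code and tables on these slides. Note how the years have to be recovered from column names, and the particle amounts from row positions.

```r
# Hand-built copies of the slide data (assumed to match EDAWR's cases and pollution)
cases <- data.frame(
  country = c("FR", "DE", "US"),
  `2011`  = c(7000, 5800, 15000),
  `2012`  = c(6900, 6000, 14000),
  `2013`  = c(7000, 6200, 13000),
  check.names = FALSE  # keep "2011" etc. as column names
)

pollution <- data.frame(
  city   = rep(c("New York", "London", "Beijing"), each = 2),
  size   = rep(c("large", "small"), times = 3),
  amount = c(23, 14, 22, 16, 121, 56)
)

# cases: the year variable hides in the column names, the counts in the cells
years  <- names(cases)[-1]
counts <- unlist(cases[1:3, 2:4])

# pollution: large and small amounts alternate down a single column
cities <- pollution$city[c(1, 3, 5)]
large  <- pollution$amount[c(1, 3, 5)]
small  <- pollution$amount[c(2, 4, 6)]
```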
![Page 26: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/26.jpg)
ratio = pressure / wind

  storms$pressure / storms$wind

storms

  storm    wind  pressure  date
  Alberto  110   1007      2000-08-12
  Alex     45    1009      1998-07-30
  Allison  65    1005      1995-06-04
  Ana      40    1013      1997-07-01
  Arlene   50    1010      1999-06-13
  Arthur   45    1010      1996-06-21

  pressure     wind
  1007      /  110
  1009      /  45
  1005      /  65
  1013      /  40
  1010      /  50
  1010      /  45

  950   1003  987   1004  1006  1000
  8.6   22.3  15.2  25.1  20.1  22.2
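A short sketch of the element-wise division, using a hand-built copy of the storms table shown on the slides (assumed to match EDAWR's storms). Because each variable sits in its own column, R pairs the values row by row.

```r
# Hand-built copy of the storms table from the slides
storms <- data.frame(
  storm    = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind     = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  date     = as.Date(c("2000-08-12", "1998-07-30", "1995-06-04",
                       "1997-07-01", "1999-06-13", "1996-06-21"))
)

# Vectorized division: one ratio per storm, in row order
ratio <- storms$pressure / storms$wind

# Storing the result as a new column keeps each ratio with its storm
storms$ratio <- ratio
```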
![Page 27: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/27.jpg)
Tidy data

storms

  storm    wind  pressure  date
  Alberto  110   1007      2000-08-12
  Alex     45    1009      1998-07-30
  Allison  65    1005      1995-06-04
  Ana      40    1013      1997-07-01
  Arlene   50    1010      1999-06-13
  Arthur   45    1010      1996-06-21

1. Each variable is saved in its own column.
2. Each observation is saved in its own row.
3. Each "type" of observation is stored in a single table (here, storms).
![Page 28: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/28.jpg)
Recap: Tidy data
Variables in columns (1), observations in rows (2), each type in a table (3)
• Easy to access variables
• Automatically preserves observations
![Page 29: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/29.jpg)
tidyr
![Page 30: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/30.jpg)
tidyr
A package that reshapes the layout of tables.
Two main functions: gather() and spread()

  # install.packages("tidyr")
  library(tidyr)
  ?gather
  ?spread
![Page 31: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/31.jpg)
Your Turn
Imagine how this data would look if it were tidy with three variables: country, year, n

cases

  Country  2011   2012   2013
  FR       7000   6900   7000
  DE       5800   6000   6200
  US       15000  14000  13000
![Page 44: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/44.jpg)
  Country  2011   2012   2013
  FR       7000   6900   7000
  DE       5800   6000   6200
  US       15000  14000  13000

gather()

  Country  Year  n
  FR       2011  7000
  DE       2011  5800
  US       2011  15000
  FR       2012  6900
  DE       2012  6000
  US       2012  14000
  FR       2013  7000
  DE       2013  6200
  US       2013  13000
![Page 47: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/47.jpg)
key: Year (former column names)
value: n (former cells)
![Page 52: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/52.jpg)
gather()
Collapses multiple columns into two columns:
1. a key column that contains the former column names
2. a value column that contains the former column cells

  gather(cases, "year", "n", 2:4)

  cases   data frame to reshape
  "year"  name of the new key column (a character string)
  "n"     name of the new value column (a character string)
  2:4     names or numeric indexes of columns to collapse
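A runnable sketch of the call above. The cases data frame is rebuilt by hand so the example does not depend on EDAWR. gather() is still available in current tidyr, though since tidyr 1.0.0 it is superseded by pivot_longer().

```r
library(tidyr)

# Hand-built copy of the cases table from the slides
cases <- data.frame(
  country = c("FR", "DE", "US"),
  `2011`  = c(7000, 5800, 15000),
  `2012`  = c(6900, 6000, 14000),
  `2013`  = c(7000, 6200, 13000),
  check.names = FALSE  # keep "2011" etc. as column names
)

# Collapse columns 2 to 4 into a key column (year) and a value column (n)
tidy_cases <- gather(cases, "year", "n", 2:4)

tidy_cases
```

The result has one row per country and year, so three rows by three year columns become nine rows.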
![Page 53: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/53.jpg)
  ##   country  2011  2012  2013
  ## 1      FR  7000  6900  7000
  ## 2      DE  5800  6000  6200
  ## 3      US 15000 14000 13000

  gather(cases, "year", "n", 2:4)

  ##   country year     n
  ## 1      FR 2011  7000
  ## 2      DE 2011  5800
  ## 3      US 2011 15000
  ## 4      FR 2012  6900
  ## 5      DE 2012  6000
  ## 6      US 2012 14000
  ## 7      FR 2013  7000
  ## 8      DE 2013  6200
  ## 9      US 2013 13000
![Page 54: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/54.jpg)
Your Turn
Imagine how the pollution data set would look if it were tidy with three variables: city, large, small

pollution

  city      particle size  amount (µg/m3)
  New York  large          23
  New York  small          14
  London    large          22
  London    small          16
  Beijing   large          121
  Beijing   small          56
![Page 64: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/64.jpg)
  city      size   amount
  New York  large  23
  New York  small  14
  London    large  22
  London    small  16
  Beijing   large  121
  Beijing   small  56

spread()

  city      large  small
  New York  23     14
  London    22     16
  Beijing   121    56
![Page 66: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/66.jpg)
key: size (new column names)
value: amount (new cells)
![Page 70: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/70.jpg)
spread()
spread(pollution, size, amount)
Generates multiple columns from two columns:
1. each unique value in the key column becomes a column name
2. each value in the value column becomes a cell in the new columns
Arguments, in order:
pollution: data frame to reshape
size: column to use for keys (new column names)
amount: column to use for values (new column cells)
![Page 71: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/71.jpg)
##       city  size amount
## 1 New York large     23
## 2 New York small     14
## 3   London large     22
## 4   London small     16
## 5  Beijing large    121
## 6  Beijing small     56
spread(pollution, size, amount)
##       city large small
## 1  Beijing   121    56
## 2   London    22    16
## 3 New York    23    14
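The call above can be tried with a minimal sketch like the following. The `pollution` data frame is rebuilt by hand from the slide's table (an assumption about how it was stored), and in newer tidyr releases `spread()` is superseded by `pivot_wider()` but is still available.

```r
library(tidyr)

# pollution, rebuilt by hand from the table on the slide
pollution <- data.frame(
  city   = c("New York", "New York", "London", "London", "Beijing", "Beijing"),
  size   = c("large", "small", "large", "small", "large", "small"),
  amount = c(23, 14, 22, 16, 121, 56),
  stringsAsFactors = FALSE
)

# key = size (its values become column names)
# value = amount (its values fill the new cells)
wide <- spread(pollution, size, amount)
print(wide)
```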
![Page 72: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/72.jpg)
![Page 73: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/73.jpg)
city      size   amount
New York  large  23
New York  small  14
London    large  22
London    small  16
Beijing   large  121
Beijing   small  56
city      large  small
New York  23     14
London    22     16
Beijing   121    56
spread()
gather()
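A short sketch of the round trip between the two layouts, assuming the wide table is typed in by hand from the slide. `gather()` collapses the `large` and `small` columns into a key column and a value column, and `spread()` reverses it.

```r
library(tidyr)

# Wide table from the slide: one column per particle size
wide <- data.frame(
  city  = c("New York", "London", "Beijing"),
  large = c(23, 22, 121),
  small = c(14, 16, 56),
  stringsAsFactors = FALSE
)

# Collapse large and small into a key column (size) and a value column (amount)
long <- gather(wide, "size", "amount", large, small)

# spread() on the result restores the wide layout (rows come back sorted by city)
round_trip <- spread(long, size, amount)
print(long)
print(round_trip)
```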
![Page 74: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/74.jpg)
unite() and separate()
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
There are three more variables hidden in storms:
• Year
• Month
• Day
![Page 75: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/75.jpg)
separate()
separate() splits a column by a character string separator.
separate(storms, date, c("year", "month", "day"), sep = "-")
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
storms2
storm    wind  pressure  year  month  day
Alberto  110   1007      2000  08     12
Alex     45    1009      1998  07     30
Allison  65    1005      1995  06     04
Ana      40    1013      1997  07     01
Arlene   50    1010      1999  06     13
Arthur   45    1010      1996  06     21
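A minimal sketch of the separate() call, assuming `storms` is typed in by hand from the slide with `date` stored as character strings (the original column may have been a Date). The new columns come back as character, which is why the day of Ana reads "01".

```r
library(tidyr)

# storms, rebuilt by hand from the slide; date kept as text for this sketch
storms <- data.frame(
  storm    = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind     = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  date     = c("2000-08-12", "1998-07-30", "1995-06-04",
               "1997-07-01", "1999-06-13", "1996-06-21"),
  stringsAsFactors = FALSE
)

# Split date at each "-" into three columns; the original date column is dropped
storms2 <- separate(storms, date, c("year", "month", "day"), sep = "-")
print(storms2)
```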
![Page 76: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/76.jpg)
unite()
unite() combines several columns into a single column.
unite(storms2, "date", year, month, day, sep = "-")
storms2
storm    wind  pressure  year  month  day
Alberto  110   1007      2000  08     12
Alex     45    1009      1998  07     30
Allison  65    1005      1995  06     04
Ana      40    1013      1997  07     01
Arlene   50    1010      1999  06     13
Arthur   45    1010      1996  06     21
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
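The unite() call can be sketched the same way. Here `storms2` is a hand-typed, three-row subset of the slide's table (an assumption made to keep the example short); the pasted result is a character column, not a Date.

```r
library(tidyr)

# Three rows of storms2, typed in by hand from the slide
storms2 <- data.frame(
  storm = c("Alberto", "Alex", "Allison"),
  year  = c("2000", "1998", "1995"),
  month = c("08", "07", "06"),
  day   = c("12", "30", "04"),
  stringsAsFactors = FALSE
)

# Paste year, month and day together with "-" into one new column called date
storms <- unite(storms2, "date", year, month, day, sep = "-")
print(storms)
```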
![Page 77: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/77.jpg)
Recap: tidyr
A package that reshapes the layout of data sets.
Make observations from variables with gather()
Make variables from observations with spread()
Split and merge columns with unite() and separate()
![Page 78: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/78.jpg)
Data sets contain more information than they display
![Page 79: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/79.jpg)
dplyr
A package that helps transform tabular data.
# install.packages("dplyr")
library(dplyr)
?select
?filter
?arrange
?mutate
?summarise
?group_by
![Page 80: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/80.jpg)
nycflights13
Data sets related to flights that departed from NYC in 2013
# install.packages("nycflights13")
library(nycflights13)
?airlines
?airports
?flights
?planes
?weather
![Page 81: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/81.jpg)
Ways to access information
1. Extract existing variables: select()
2. Extract existing observations: filter()
3. Derive new variables (from existing variables): mutate()
4. Change the unit of analysis: summarise()
![Page 82: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/82.jpg)
select()
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
select(storms, storm, pressure)
storm    pressure
Alberto  1007
Alex     1009
Allison  1005
Ana      1013
Arlene   1010
Arthur   1010
![Page 83: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/83.jpg)
select()
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
select(storms, -storm)
# see ?select for more
wind  pressure  date
110   1007      2000-08-12
45    1009      1998-07-30
65    1005      1995-06-04
40    1013      1997-07-01
50    1010      1999-06-13
45    1010      1996-06-21
![Page 84: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/84.jpg)
select()
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
select(storms, wind:date)
# see ?select for more
wind  pressure  date
110   1007      2000-08-12
45    1009      1998-07-30
65    1005      1995-06-04
40    1013      1997-07-01
50    1010      1999-06-13
45    1010      1996-06-21
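The three select() forms shown so far can be compared side by side in a sketch like this one. `storms` is typed in by hand from the slide; defining it locally also hides the unrelated `storms` data set that ships with dplyr.

```r
library(dplyr)

# storms, rebuilt by hand from the slide
storms <- data.frame(
  storm    = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind     = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  date     = as.Date(c("2000-08-12", "1998-07-30", "1995-06-04",
                       "1997-07-01", "1999-06-13", "1996-06-21")),
  stringsAsFactors = FALSE
)

by_name     <- select(storms, storm, pressure)  # keep the named columns
minus_storm <- select(storms, -storm)           # keep everything but storm
by_range    <- select(storms, wind:date)        # keep a range of adjacent columns

print(names(by_name))
print(names(minus_storm))
print(names(by_range))
```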
![Page 85: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/85.jpg)
Useful select functions
-              Select everything but
:              Select range
contains()     Select columns whose name contains a character string
ends_with()    Select columns whose name ends with a string
everything()   Select every column
matches()      Select columns whose name matches a regular expression
num_range()    Select columns named x1, x2, x3, x4, x5
one_of()       Select columns whose names are in a group of names
starts_with()  Select columns whose name starts with a character string
* Blue functions come in dplyr
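A few of these helpers applied to the storms columns, as a hedged sketch. The data frame is a hand-typed two-row stand-in with the same column names as the slide's `storms`; only the column names matter here.

```r
library(dplyr)

# Two-row stand-in with the same column names as storms on the slides
storms <- data.frame(
  storm    = c("Alberto", "Alex"),
  wind     = c(110, 45),
  pressure = c(1007, 1009),
  date     = c("2000-08-12", "1998-07-30"),
  stringsAsFactors = FALSE
)

starts_w <- select(storms, starts_with("w"))           # wind
ends_e   <- select(storms, ends_with("e"))             # pressure, date
has_ess  <- select(storms, contains("ess"))            # pressure
regex    <- select(storms, matches("^(wind|date)$"))   # wind, date
listed   <- select(storms, one_of(c("storm", "date"))) # storm, date
reorder  <- select(storms, date, everything())         # date first, then the rest

print(names(reorder))
```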
![Page 86: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/86.jpg)
filter()
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
filter(storms, wind >= 50)
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Allison  65    1005      1995-06-04
Arlene   50    1010      1999-06-13
![Page 87: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/87.jpg)
filter()
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
filter(storms, wind >= 50, storm %in% c("Alberto", "Alex", "Allison"))
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Allison  65    1005      1995-06-04
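Both filter() calls can be reproduced with a sketch like the following, assuming `storms` is typed in by hand from the slide. Conditions separated by commas are combined with "and", so a row must pass every test to be kept.

```r
library(dplyr)

# storms, rebuilt by hand from the slide
storms <- data.frame(
  storm    = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind     = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  date     = as.Date(c("2000-08-12", "1998-07-30", "1995-06-04",
                       "1997-07-01", "1999-06-13", "1996-06-21")),
  stringsAsFactors = FALSE
)

# One condition: keep rows with wind of at least 50
strong <- filter(storms, wind >= 50)

# Two conditions: both must be TRUE for a row to be kept
strong_named <- filter(storms, wind >= 50,
                       storm %in% c("Alberto", "Alex", "Allison"))

print(strong)
print(strong_named)
```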
![Page 88: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/88.jpg)
logical tests in R
?Comparison
<       Less than
>       Greater than
==      Equal to
<=      Less than or equal to
>=      Greater than or equal to
!=      Not equal to
%in%    Group membership
is.na   Is NA
!is.na  Is not NA
?base::Logic
&       boolean and
|       boolean or
xor     exclusive or
!       not
any     any true
all     all true
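The operators in this table are plain base R and work on whole vectors, which is what lets filter() use them. A small sketch follows; the `wind` vector is hypothetical, with one NA added on purpose to show `is.na()`.

```r
# Hypothetical wind speeds; the NA is added only for illustration
wind <- c(110, 45, 65, NA, 50, 45)

at_least_50 <- wind >= 50                  # element-wise comparison, NA stays NA
missing     <- is.na(wind)                 # TRUE only where the value is NA
known_fast  <- !is.na(wind) & wind >= 50   # "and" of two tests
either_one  <- xor(TRUE, FALSE)            # TRUE when exactly one side is TRUE
any_extreme <- any(wind > 100, na.rm = TRUE)
all_over_40 <- all(wind > 40, na.rm = TRUE)
has_45      <- 45 %in% wind                # group membership

print(known_fast)
```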
![Page 89: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/89.jpg)
mutate()
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
mutate(storms, ratio = pressure / wind)
storm    wind  pressure  date        ratio
Alberto  110   1007      2000-08-12  9.15
Alex     45    1009      1998-07-30  22.42
Allison  65    1005      1995-06-04  15.46
Ana      40    1013      1997-07-01  25.32
Arlene   50    1010      1999-06-13  20.20
Arthur   45    1010      1996-06-21  22.44
![Page 90: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/90.jpg)
mutate()
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
mutate(storms, ratio = pressure / wind, inverse = ratio^-1)
storm    wind  pressure  date        ratio  inverse
Alberto  110   1007      2000-08-12  9.15   0.11
Alex     45    1009      1998-07-30  22.42  0.04
Allison  65    1005      1995-06-04  15.46  0.06
Ana      40    1013      1997-07-01  25.32  0.04
Arlene   50    1010      1999-06-13  20.20  0.05
Arthur   45    1010      1996-06-21  22.44  0.04
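A sketch of the two-column mutate() call, assuming `storms` is typed in by hand from the slide. It shows that a column created earlier in the same call (`ratio`) can be used to build a later one (`inverse`).

```r
library(dplyr)

# storms, rebuilt by hand from the slide (date omitted, it is not used here)
storms <- data.frame(
  storm    = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind     = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  stringsAsFactors = FALSE
)

# inverse refers to ratio, which is defined one argument earlier
with_ratio <- mutate(storms, ratio = pressure / wind, inverse = ratio^-1)
print(with_ratio)
```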
![Page 91: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/91.jpg)
Useful mutate functions
pmin(), pmax()         Element-wise min and max
cummin(), cummax()     Cumulative min and max
cumsum(), cumprod()    Cumulative sum and product
between()              Are values between a and b?
cume_dist()            Cumulative distribution of values
cumall(), cumany()     Cumulative all and any
cummean()              Cumulative mean
lead(), lag()          Copy with values shifted by one position
ntile()                Bin vector into n buckets
dense_rank(), min_rank(), percent_rank(), row_number()    Various ranking methods
* All take a vector of values and return a vector of values
** Blue functions come in dplyr
![Page 92: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/92.jpg)
"Window" functions
* All take a vector of values and return a vector of values
cumsum()
1 2 3 4 5 6  ->  1 3 6 10 15 21
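The cumsum() illustration can be checked directly, along with a couple of the other window functions. This is a sketch on made-up vectors; each call returns a vector the same length as its input.

```r
library(dplyr)

x <- 1:6

running_total <- cumsum(x)         # 1 3 6 10 15 21, as on the slide
lagged        <- dplyr::lag(x)     # values shifted one position later, NA first
led           <- dplyr::lead(x)    # values shifted one position earlier, NA last
ranks         <- min_rank(c(10, 30, 20, 30))  # ties share the lowest rank
in_range      <- between(x, 2, 4)  # TRUE where 2 <= x <= 4

print(running_total)
print(ranks)
```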
![Page 93: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/93.jpg)
summarise()
city      particle size  amount (µg/m3)
New York  large          23
New York  small          14
London    large          22
London    small          16
Beijing   large          121
Beijing   small          56
pollution %>% summarise(median = median(amount), variance = var(amount))
median  variance
22.5    1731.6
![Page 94: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/94.jpg)
summarise()
city      particle size  amount (µg/m3)
New York  large          23
New York  small          14
London    large          22
London    small          16
Beijing   large          121
Beijing   small          56
pollution %>% summarise(mean = mean(amount), sum = sum(amount), n = n())
mean  sum  n
42    252  6
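Both summarise() calls can be reproduced in one sketch, assuming `pollution` is typed in by hand from the slide. The whole table collapses to a single row because no grouping has been set.

```r
library(dplyr)

# pollution, rebuilt by hand from the slide
pollution <- data.frame(
  city   = c("New York", "New York", "London", "London", "Beijing", "Beijing"),
  size   = c("large", "small", "large", "small", "large", "small"),
  amount = c(23, 14, 22, 16, 121, 56),
  stringsAsFactors = FALSE
)

# Each argument reduces the amount column to one value
stats <- pollution %>%
  summarise(median = median(amount), variance = var(amount),
            mean = mean(amount), sum = sum(amount), n = n())
print(stats)
```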
![Page 95: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/95.jpg)
Useful summary functions
min(), max()   Minimum and maximum values
mean()         Mean value
median()       Median value
sum()          Sum of values
var(), sd()    Variance and standard deviation of a vector
first()        First value in a vector
last()         Last value in a vector
nth()          Nth value in a vector
n()            The number of values in a vector
n_distinct()   The number of distinct values in a vector
* All take a vector of values and return a single value
** Blue functions come in dplyr
![Page 96: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/96.jpg)
"Summary" functions
* All take a vector of values and return a single value
sum()
1 2 3 4 5 6  ->  21
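The contrast with window functions can be seen on the same input: a summary function returns one value however long the vector is. A brief sketch, using the pollution amounts and city names from the earlier slides as plain vectors:

```r
library(dplyr)

amount <- c(23, 14, 22, 16, 121, 56)
city   <- c("New York", "New York", "London", "London", "Beijing", "Beijing")

total      <- sum(1:6)            # 21, as on the slide
first_val  <- first(amount)       # first value in the vector
last_val   <- last(amount)        # last value in the vector
second_val <- nth(amount, 2)      # value at position 2
n_cities   <- n_distinct(city)    # number of distinct values

print(c(total, first_val, last_val, second_val, n_cities))
```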
![Page 97: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/97.jpg)
arrange()
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
arrange(storms, wind)
storm    wind  pressure  date
Ana      40    1013      1997-07-01
Alex     45    1009      1998-07-30
Arthur   45    1010      1996-06-21
Arlene   50    1010      1999-06-13
Allison  65    1005      1995-06-04
Alberto  110   1007      2000-08-12
![Page 98: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/98.jpg)
![Page 99: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/99.jpg)
arrange()
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
arrange(storms, desc(wind))
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Allison  65    1005      1995-06-04
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
Alex     45    1009      1998-07-30
Ana      40    1013      1997-07-01
![Page 100: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/100.jpg)
![Page 101: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/101.jpg)
arrange()
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
arrange(storms, wind, date)
storm    wind  pressure  date
Ana      40    1013      1997-07-01
Arthur   45    1010      1996-06-21
Alex     45    1009      1998-07-30
Arlene   50    1010      1999-06-13
Allison  65    1005      1995-06-04
Alberto  110   1007      2000-08-12
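The three arrange() variants can be run with a sketch like this, assuming `storms` is typed in by hand from the slide. The second sort column only matters where the first one ties, here the two storms with wind 45.

```r
library(dplyr)

# storms, rebuilt by hand from the slide
storms <- data.frame(
  storm    = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind     = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  date     = as.Date(c("2000-08-12", "1998-07-30", "1995-06-04",
                       "1997-07-01", "1999-06-13", "1996-06-21")),
  stringsAsFactors = FALSE
)

ascending  <- arrange(storms, wind)         # lowest wind first
descending <- arrange(storms, desc(wind))   # highest wind first
tie_broken <- arrange(storms, wind, date)   # ties in wind ordered by date

print(tie_broken)
```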
![Page 102: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/102.jpg)
%>%
The pipe operator
library(dplyr)
select(tb, child:elderly)
tb %>% select(child:elderly)
These do the same thing: the pipe passes the object on its left as the first argument of the function on its right. Try it!
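The equivalence can be checked with a small sketch. The `tb` below is a hypothetical stand-in: only the `child:elderly` column range comes from the slide, while the `country` and `adult` columns and all values are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for tb; values and the adult column are made up
tb <- data.frame(
  country = c("A", "B"),
  child   = c(1, 2),
  adult   = c(3, 4),
  elderly = c(5, 6),
  stringsAsFactors = FALSE
)

nested <- select(tb, child:elderly)      # data frame passed as the first argument
piped  <- tb %>% select(child:elderly)   # data frame piped in from the left

print(identical(nested, piped))
```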
![Page 103: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/103.jpg)
![Page 104: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/104.jpg)
select()
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
storms %>% select(storm, pressure)
storm    pressure
Alberto  1007
Alex     1009
Allison  1005
Ana      1013
Arlene   1010
Arthur   1010
![Page 105: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/105.jpg)
![Page 106: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/106.jpg)
filter()
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
storms %>% filter(wind >= 50)
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Allison  65    1005      1995-06-04
Arlene   50    1010      1999-06-13
![Page 107: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/107.jpg)
storms
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
storms %>% filter(wind >= 50) %>% select(storm, pressure)
storm    pressure
Alberto  1007
Allison  1005
Arlene   1010
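A sketch of the chained call next to its nested equivalent, assuming `storms` is typed in by hand from the slide. The pipe version reads left to right in the order the steps run, while the nested version has to be read inside out.

```r
library(dplyr)

# storms, rebuilt by hand from the slide (date omitted, it is not used here)
storms <- data.frame(
  storm    = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind     = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  stringsAsFactors = FALSE
)

# Step 1: keep strong storms. Step 2: keep two columns.
piped <- storms %>%
  filter(wind >= 50) %>%
  select(storm, pressure)

# The same computation written as nested calls
nested <- select(filter(storms, wind >= 50), storm, pressure)

print(piped)
```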
![Page 108: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/108.jpg)
![Page 109: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/109.jpg)
mutate()
storm    wind  pressure  date
Alberto  110   1007      2000-08-12
Alex     45    1009      1998-07-30
Allison  65    1005      1995-06-04
Ana      40    1013      1997-07-01
Arlene   50    1010      1999-06-13
Arthur   45    1010      1996-06-21
storms %>% mutate(ratio = pressure / wind) %>% select(storm, ratio)
storm    ratio
Alberto  9.15
Alex     22.42
Allison  15.46
Ana      25.32
Arlene   20.20
Arthur   22.44
![Page 110: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/110.jpg)
Shortcut to type %>%
Cmd + Shift + M (Mac)
Ctrl + Shift + M (Windows)
![Page 111: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/111.jpg)
Unit of analysis
![Page 112: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/112.jpg)
city      particle size  amount (µg/m3)
New York  large          23
New York  small          14
London    large          22
London    small          16
Beijing   large          121
Beijing   small          56
mean  sum  n
42    252  6
![Page 113: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/113.jpg)
![Page 114: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/114.jpg)
group_by() + summarise()
city      particle size  amount (µg/m3)
New York  large          23
New York  small          14
London    large          22
London    small          16
Beijing   large          121
Beijing   small          56
mean  sum  n
18.5  37   2
19.0  38   2
88.5  177  2
![Page 115: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/115.jpg)
group_by()
pollution %>% group_by(city)
city      particle size  amount (µg/m3)
New York  large          23
New York  small          14
London    large          22
London    small          16
Beijing   large          121
Beijing   small          56
![Page 116: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/116.jpg)
pollution %>% group_by(city)
## Source: local data frame [6 x 3]
## Groups: city
##
##       city  size amount
## 1 New York large     23
## 2 New York small     14
## 3   London large     22
## 4   London small     16
## 5  Beijing large    121
## 6  Beijing small     56
![Page 117: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/117.jpg)
group_by() + summarise()
![Page 118: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/118.jpg)
pollution %>% group_by(city) %>% summarise(mean = mean(amount), sum = sum(amount), n = n())
city      particle size  amount (µg/m3)
New York  large          23
New York  small          14
London    large          22
London    small          16
Beijing   large          121
Beijing   small          56
city      mean  sum  n
New York  18.5  37   2
London    19.0  38   2
Beijing   88.5  177  2
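A runnable sketch of the grouped summary, assuming `pollution` is typed in by hand from the slide. In the dplyr versions I am aware of, the result rows come back sorted by the grouping column (Beijing, London, New York) rather than in the slide's order.

```r
library(dplyr)

# pollution, rebuilt by hand from the slide
pollution <- data.frame(
  city   = c("New York", "New York", "London", "London", "Beijing", "Beijing"),
  size   = c("large", "small", "large", "small", "large", "small"),
  amount = c(23, 14, 22, 16, 121, 56),
  stringsAsFactors = FALSE
)

# group_by() marks the groups; summarise() then returns one row per group
by_city <- pollution %>%
  group_by(city) %>%
  summarise(mean = mean(amount), sum = sum(amount), n = n())
print(by_city)
```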
![Page 119: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/119.jpg)
![Page 120: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/120.jpg)
![Page 121: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/121.jpg)
![Page 122: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/122.jpg)
city      size   amount
New York  large  23
New York  small  14
London    large  22
London    small  16
Beijing   large  121
Beijing   small  56
pollution %>% group_by(city) %>% summarise(mean = mean(amount))
city      mean
New York  18.5
London    19.0
Beijing   88.5
![Page 123: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/123.jpg)
city      size   amount
New York  large  23
New York  small  14
London    large  22
London    small  16
Beijing   large  121
Beijing   small  56
pollution %>% group_by(size) %>% summarise(mean = mean(amount))
size   mean
large  55.3
small  28.7
![Page 124: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/124.jpg)
ungroup()
pollution %>% ungroup()
city      particle size  amount (µg/m3)
New York  large          23
New York  small          14
London    large          22
London    small          16
Beijing   large          121
Beijing   small          56
![Page 125: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/125.jpg)
country      year  sex     cases
Afghanistan  1999  female  1
Afghanistan  1999  male    1
Afghanistan  2000  female  1
Afghanistan  2000  male    1
Brazil       1999  female  2
Brazil       1999  male    2
Brazil       2000  female  2
Brazil       2000  male    2
China        1999  female  3
China        1999  male    3
China        2000  female  3
China        2000  male    3
tb %>% group_by(country, year) %>% summarise(cases = sum(cases)) %>% ungroup()
![Page 126: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/126.jpg)
![Page 127: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/127.jpg)
tb %>% group_by(country, year) %>% summarise(cases = sum(cases)) %>% ungroup()
country      year  cases
Afghanistan  1999  2
Afghanistan  2000  2
Brazil       1999  4
Brazil       2000  4
China        1999  6
China        2000  6
![Page 128: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/128.jpg)
tb %>% group_by(country, year) %>% summarise(cases = sum(cases)) %>% summarise(cases = sum(cases))
After the first summarise():
country      year  cases
Afghanistan  1999  2
Afghanistan  2000  2
Brazil       1999  4
Brazil       2000  4
China        1999  6
China        2000  6
After the second summarise():
country      cases
Afghanistan  4
Brazil       8
China        12
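The repeated summarise() works because each call removes the last grouping variable. A sketch, assuming `tb` is the twelve-row table typed in by hand from the slide (the real `tb` in the webinar may have more columns). Recent dplyr versions print a message about the remaining grouping, which is harmless here.

```r
library(dplyr)

# tb, rebuilt by hand from the twelve rows on the slide
tb <- data.frame(
  country = rep(c("Afghanistan", "Brazil", "China"), each = 4),
  year    = rep(c(1999, 1999, 2000, 2000), times = 3),
  sex     = rep(c("female", "male"), times = 6),
  cases   = rep(c(1, 2, 3), each = 4),
  stringsAsFactors = FALSE
)

# Grouped by country and year: one row per country-year, still grouped by country
by_country_year <- tb %>%
  group_by(country, year) %>%
  summarise(cases = sum(cases))

# Second summarise(): one row per country, no grouping left
by_country <- by_country_year %>% summarise(cases = sum(cases))

# Third summarise(): a single grand total
total <- by_country %>% summarise(cases = sum(cases))

print(by_country)
print(total)
```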
![Page 129: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/129.jpg)
Hierarchy of information

country      year  sex     cases
Afghanistan  1999  female  1
Afghanistan  1999  male    1
Afghanistan  2000  female  1
Afghanistan  2000  male    1
Brazil       1999  female  2
Brazil       1999  male    2
Brazil       2000  female  2
Brazil       2000  male    2
China        1999  female  3
China        1999  male    3
China        2000  female  3
China        2000  male    3

country      year  cases
Afghanistan  1999  2
Afghanistan  2000  2
Brazil       1999  4
Brazil       2000  4
China        1999  6
China        2000  6

country      cases
Afghanistan  4
Brazil       8
China        12

cases
24

Larger units of analysis
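A minimal sketch of the hierarchy, assuming the same toy table typed in from the slide. Each step aggregates the previous level into a larger unit of analysis, and the grand total stays the same at every level. The intermediate object names are illustrative, not from the webinar.

```r
library(dplyr)

# Toy data typed in from the slide.
tb <- data.frame(
  country = rep(c("Afghanistan", "Brazil", "China"), each = 4),
  year    = rep(rep(c(1999, 2000), each = 2), times = 3),
  sex     = rep(c("female", "male"), times = 6),
  cases   = rep(c(1, 2, 3), each = 4),
  stringsAsFactors = FALSE
)

# Unit of analysis: country and year.
by_country_year <- tb %>%
  group_by(country, year) %>%
  summarise(cases = sum(cases)) %>%
  ungroup()

# Unit of analysis: country.
by_country <- by_country_year %>%
  group_by(country) %>%
  summarise(cases = sum(cases))

# Unit of analysis: the whole data set.
overall <- by_country %>%
  summarise(cases = sum(cases))

# The total number of cases at each level of the hierarchy.
c(sum(tb$cases), sum(by_country_year$cases), sum(by_country$cases), overall$cases)
```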
![Page 130: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/130.jpg)
Recap: Information
Extract variables and observations with select() and filter().
Arrange observations with arrange().
Make new variables with mutate().
Make groupwise observations with group_by() and summarise().
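The single-table verbs in this recap can be sketched on a small storms table. The wind and pressure values below are read off the slide thumbnails; the storm names and the object name `storms_toy` are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical toy table: values from the slide thumbnails, names assumed.
storms_toy <- data.frame(
  storm    = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind     = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  stringsAsFactors = FALSE
)

# select(): keep a subset of the variables (columns).
picked <- storms_toy %>% select(storm, pressure)

# filter(): keep a subset of the observations (rows).
strong <- storms_toy %>% filter(wind >= 50)

# mutate(): derive a new variable from existing ones.
ratioed <- storms_toy %>% mutate(ratio = pressure / wind)

# arrange(): reorder the observations, here by increasing wind speed.
sorted <- storms_toy %>% arrange(wind)
```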
![Page 131: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/131.jpg)
Joining data
![Page 132: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/132.jpg)
dplyr::bind_cols()

y
x1  x2
A   1
B   2
C   3

z
x1  x2
B   2
C   3
D   4

bind_cols(y, z)

x1  x2  x1  x2
A   1   B   2
B   2   C   3
C   3   D   4
![Page 133: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/133.jpg)
dplyr::bind_rows()

y
x1  x2
A   1
B   2
C   3

z
x1  x2
B   2
C   3
D   4

bind_rows(y, z)

x1  x2
A   1
B   2
C   3
B   2
C   3
D   4
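A short sketch of both binding functions, assuming `y` and `z` are built as plain data frames with the values shown on the slides. Note that recent versions of dplyr rename duplicated column names in the bind_cols() result to keep them unique, so the printed headers may differ from the slide.

```r
library(dplyr)

# Toy tables typed in from the slides.
y <- data.frame(x1 = c("A", "B", "C"), x2 = 1:3, stringsAsFactors = FALSE)
z <- data.frame(x1 = c("B", "C", "D"), x2 = 2:4, stringsAsFactors = FALSE)

# bind_cols(): paste the tables side by side, matching rows by position.
side <- bind_cols(y, z)

# bind_rows(): stack the tables, matching columns by name.
stacked <- bind_rows(y, z)

dim(side)     # 3 rows, 4 columns
dim(stacked)  # 6 rows, 2 columns
```

Neither function checks that the rows or columns actually correspond; when the tables share a key, a join is usually the safer choice.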
![Page 134: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/134.jpg)
dplyr::union()

y
x1  x2
A   1
B   2
C   3

z
x1  x2
B   2
C   3
D   4

union(y, z)

x1  x2
A   1
B   2
C   3
D   4
![Page 135: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/135.jpg)
dplyr::intersect()

y
x1  x2
A   1
B   2
C   3

z
x1  x2
B   2
C   3
D   4

intersect(y, z)

x1  x2
B   2
C   3
![Page 136: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/136.jpg)
dplyr::setdiff()

y
x1  x2
A   1
B   2
C   3

z
x1  x2
B   2
C   3
D   4

setdiff(y, z)

x1  x2
A   1
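The three set operations can be sketched on the same toy tables (the construction of `y` and `z` is an assumption, typed in from the slides). They treat each row as one element of a set. setdiff() is not symmetric: it returns the rows of its first argument that are missing from its second.

```r
library(dplyr)

# Toy tables typed in from the slides.
y <- data.frame(x1 = c("A", "B", "C"), x2 = 1:3, stringsAsFactors = FALSE)
z <- data.frame(x1 = c("B", "C", "D"), x2 = 2:4, stringsAsFactors = FALSE)

in_either <- union(y, z)      # rows that appear in y or z, duplicates removed
in_both   <- intersect(y, z)  # rows that appear in both y and z
only_y    <- setdiff(y, z)    # rows of y that do not appear in z
only_z    <- setdiff(z, y)    # rows of z that do not appear in y
```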
![Page 137: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/137.jpg)
dplyr::left_join()

songs
song                 name
Across the Universe  John
Come Together        John
Hello, Goodbye       Paul
Peggy Sue            Buddy

artists
name    plays
George  sitar
John    guitar
Paul    bass
Ringo   drums

left_join(songs, artists, by = "name")

song                 name   plays
Across the Universe  John   guitar
Come Together        John   guitar
Hello, Goodbye       Paul   bass
Peggy Sue            Buddy  <NA>
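A runnable sketch of this join, assuming `songs` and `artists` are built as plain data frames with the values shown on the slide. left_join() keeps every row of the first table and fills in NA where the second table has no match.

```r
library(dplyr)

# Toy tables typed in from the slide.
songs <- data.frame(
  song = c("Across the Universe", "Come Together", "Hello, Goodbye", "Peggy Sue"),
  name = c("John", "John", "Paul", "Buddy"),
  stringsAsFactors = FALSE
)
artists <- data.frame(
  name  = c("George", "John", "Paul", "Ringo"),
  plays = c("sitar", "guitar", "bass", "drums"),
  stringsAsFactors = FALSE
)

# Every song is kept; Buddy has no row in `artists`, so `plays` is NA for him.
joined <- left_join(songs, artists, by = "name")
joined
```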
![Page 138: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/138.jpg)
dplyr::left_join()

left_join(songs, artists, by = "name")

songs
song                 name
Across the Universe  John
Come Together        John
Hello, Goodbye       Paul
Peggy Sue            Buddy

artists
name    plays
George  sitar
John    guitar
Paul    bass
Ringo   drums

song                 name   plays
Across the Universe  John   guitar
Come Together        John   guitar
Hello, Goodbye       Paul   bass
Peggy Sue            Buddy  <NA>
![Page 139: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/139.jpg)
dplyr::left_join()

songs2
song                 first  last
Across the Universe  John   Lennon
Come Together        John   Lennon
Hello, Goodbye       Paul   McCartney
Peggy Sue            Buddy  Holly

artists2
first   last       plays
George  Harrison   sitar
John    Lennon     guitar
Paul    McCartney  bass
Ringo   Starr      drums
Paul    Simon      guitar
John    Coltrane   sax

left_join(songs2, artists2, by = c("first", "last"))

song                 first  last       plays
Across the Universe  John   Lennon     guitar
Come Together        John   Lennon     guitar
Hello, Goodbye       Paul   McCartney  bass
Peggy Sue            Buddy  Holly      <NA>
![Page 140: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/140.jpg)
dplyr::left_join()

songs2
song                 first  last
Across the Universe  John   Lennon
Come Together        John   Lennon
Hello, Goodbye       Paul   McCartney
Peggy Sue            Buddy  Holly

artists2
first   last       plays
George  Harrison   sitar
John    Lennon     guitar
Paul    McCartney  bass
Ringo   Starr      drums
Paul    Simon      guitar
John    Coltrane   sax

left_join(songs2, artists2, by = c("first", "last"))

song                 first  last       plays
Across the Universe  John   Lennon     guitar
Come Together        John   Lennon     guitar
Hello, Goodbye       Paul   McCartney  bass
Peggy Sue            Buddy  Holly      <NA>
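The sketch below shows why both key columns are needed here, assuming `songs2` and `artists2` are built as plain data frames from the slide. Joining on the first name alone matches each John to both Lennon and Coltrane, and each Paul to both McCartney and Simon. The `by_first` comparison is an illustration added here, not part of the slide, and recent dplyr versions may warn about the many-to-many match it produces, so the call is wrapped in suppressWarnings().

```r
library(dplyr)

# Toy tables typed in from the slide.
songs2 <- data.frame(
  song  = c("Across the Universe", "Come Together", "Hello, Goodbye", "Peggy Sue"),
  first = c("John", "John", "Paul", "Buddy"),
  last  = c("Lennon", "Lennon", "McCartney", "Holly"),
  stringsAsFactors = FALSE
)
artists2 <- data.frame(
  first = c("George", "John", "Paul", "Ringo", "Paul", "John"),
  last  = c("Harrison", "Lennon", "McCartney", "Starr", "Simon", "Coltrane"),
  plays = c("sitar", "guitar", "bass", "drums", "guitar", "sax"),
  stringsAsFactors = FALSE
)

# Matching on both columns identifies each artist uniquely: one row per song.
by_both <- left_join(songs2, artists2, by = c("first", "last"))

# Matching on the first name alone is ambiguous and duplicates songs.
by_first <- suppressWarnings(left_join(songs2, artists2, by = "first"))

nrow(by_both)   # 4
nrow(by_first)  # 7
```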
![Page 141: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/141.jpg)
left_join()

songs
song                 name
Across the Universe  John
Come Together        John
Hello, Goodbye       Paul
Peggy Sue            Buddy

artists
name    plays
George  sitar
John    guitar
Paul    bass
Ringo   drums

left_join(songs, artists, by = "name")

song                 name   plays
Across the Universe  John   guitar
Come Together        John   guitar
Hello, Goodbye       Paul   bass
Peggy Sue            Buddy  <NA>
![Page 142: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/142.jpg)
inner_join()

songs
song                 name
Across the Universe  John
Come Together        John
Hello, Goodbye       Paul
Peggy Sue            Buddy

artists
name    plays
George  sitar
John    guitar
Paul    bass
Ringo   drums

inner_join(songs, artists, by = "name")

song                 name  plays
Across the Universe  John  guitar
Come Together        John  guitar
Hello, Goodbye       Paul  bass
![Page 143: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/143.jpg)
semi_join()

songs
song                 name
Across the Universe  John
Come Together        John
Hello, Goodbye       Paul
Peggy Sue            Buddy

artists
name    plays
George  sitar
John    guitar
Paul    bass
Ringo   drums

semi_join(songs, artists, by = "name")

song                 name
Across the Universe  John
Come Together        John
Hello, Goodbye       Paul
![Page 144: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/144.jpg)
anti_join()

songs
song                 name
Across the Universe  John
Come Together        John
Hello, Goodbye       Paul
Peggy Sue            Buddy

artists
name    plays
George  sitar
John    guitar
Paul    bass
Ringo   drums

anti_join(songs, artists, by = "name")

song       name
Peggy Sue  Buddy
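The remaining three joins can be compared side by side on the same toy tables (the data frame construction is an assumption, typed in from the slides). inner_join() adds columns and keeps only matched rows; semi_join() and anti_join() only filter the first table and never add columns.

```r
library(dplyr)

# Toy tables typed in from the slides.
songs <- data.frame(
  song = c("Across the Universe", "Come Together", "Hello, Goodbye", "Peggy Sue"),
  name = c("John", "John", "Paul", "Buddy"),
  stringsAsFactors = FALSE
)
artists <- data.frame(
  name  = c("George", "John", "Paul", "Ringo"),
  plays = c("sitar", "guitar", "bass", "drums"),
  stringsAsFactors = FALSE
)

# inner_join(): matched rows only, with the columns of both tables.
inner <- inner_join(songs, artists, by = "name")

# semi_join(): rows of `songs` that have a match, columns of `songs` only.
semi <- semi_join(songs, artists, by = "name")

# anti_join(): rows of `songs` that have no match in `artists`.
anti <- anti_join(songs, artists, by = "name")
```

A common use of anti_join() is checking a key before a join: any rows it returns are the ones a left_join() would fill with NA.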
![Page 145: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/145.jpg)
Recap: Best format for analysis
Variables in columns
Observations in rows
Separate all variables implied by law, formula or goal
Unit of analysis matches the unit of analysis implied by law, formula or goal
Single table
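As a reminder of what "variables in columns, observations in rows" means in practice, here is a minimal sketch using the country-by-year table from the title slide. The object name `cases` and the column names `year` and `n` are assumptions. gather() is the tidyr function from the period of this webinar; current tidyr also offers pivot_longer() for the same job.

```r
library(tidyr)

# Wide table typed in from the title slide: the years are spread across columns.
cases <- data.frame(
  country = c("FR", "DE", "US"),
  `2011`  = c(7000, 5800, 15000),
  `2012`  = c(6900, 6000, 14000),
  `2013`  = c(7000, 6200, 13000),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Collapse the year columns into key/value pairs so that each variable
# (country, year, n) has its own column and each observation its own row.
tidy_cases <- gather(cases, key = "year", value = "n", -country)

dim(tidy_cases)  # 9 rows, 3 columns
```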
![Page 146: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/146.jpg)
How to learn more
![Page 147: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/147.jpg)
http://www.rstudio.com/resources/cheatsheets/
![Page 148: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/148.jpg)
DataCamp
dplyr and more
Four courses that teach dplyr, ggvis, rmarkdown, and the RStudio IDE.
Video lessons
Live coding environment
Interactive practice
(~4 hrs worth of content for dplyr)
www.datacamp.com/tracks/rstudio-track
![Page 149: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/149.jpg)
Data Science with R
R’s tools for data science. The reshape2, dplyr, and ggplot2 packages.
• Tidy data
• Data visualization and customizing graphics
• Statistical modeling with R
bit.ly/intro-to-data-science-with-R
![Page 150: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/150.jpg)
Expert Data Science
Coming Spring 2015
• Foundations of Data Science
• tidyr
• dplyr
• ggvis
![Page 151: wrangling-webinar](https://reader033.fdocuments.in/reader033/viewer/2022051316/55cf8a8b55034654898b98a9/html5/thumbnails/151.jpg)
Data Wrangling with R
Thank you
Slides at: bit.ly/wrangling-webinar