R training at Aimia
Transcript of R training at Aimia
R INTRODUCTION COURSE
Basics of Data Analysis and Visualisation in R
Ali Arsalan Kazmi
STRUCTURE FOR THE SESSION
For Discussion For Practical work
1. Introduction
2. Fundamentals
3. Data Import and Export in R
4. Data Analysis and Manipulation
5. Data Visualisation
ROADMAP
Each section contains:
1. Subsections
2. Some Theory
3. Practical work
INTRODUCTION
• Initial Thoughts on R
• GNU Free
• Data Analysis and Superior Visualisation
• Burgeoning community of useRs
• #6 in IEEE 2015 Top Programming Languages
• Integration of R in SQL Server 2016
A BIT ABOUT R
• Your first impression about R?
• What do you already know about R?
A BIT ABOUT R
• Four (essential) freedoms granted
• Share the spirit
A BIT ABOUT R
• Clustering – Sophisticated and others
• Supervised Learning
• Deep Learning
• Integration with Hadoop, Spark, Storm
• Many more
A BIT ABOUT R
• Currently: 7,284 packages
• Strong presence on the web
• R Consortium
• Google, eBay, Facebook, NYT, etc.
A BIT ABOUT R
• Link: http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages
• Ranked along with the general purpose languages
A BIT ABOUT R
• Link: http://blog.revolutionanalytics.com/2015/05/r-in-sql-server.html
FUNDAMENTALS
• Data Types
• Data Structures
• Control Structures
• Functions
FUNDAMENTALS
• Think of the data types commonly used in statistics
• In R: Numeric/Double; Integer; Logical; Character; Factor
• Many more
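The data types named on this slide can be inspected at the console. The following is a minimal sketch; the variable names and values are illustrative assumptions, not part of the course material.

```r
# Illustrative values for each of the types named on the slide
x_num <- 3.14        # numeric, stored as a double
x_int <- 42L         # integer (the L suffix requests integer storage)
x_lgl <- TRUE        # logical
x_chr <- "Aimia"     # character
x_fct <- factor(c("low", "high", "low"), levels = c("low", "high"))

class(x_num); typeof(x_num)   # class and underlying storage type
class(x_int)
class(x_lgl)
class(x_chr)
class(x_fct); levels(x_fct)

# A factor stores integer codes plus a set of labels
as.integer(x_fct)
```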
FUNDAMENTALS
• How to store data? Logico-Computational considerations…
• In R: Atomic vectors; Lists; Matrices and Arrays; Dataframes
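As a rough illustration of the four data structures, the sketch below builds one of each. The object names and contents are made up for the example.

```r
v  <- c(1.5, 2.5, 3.5)                      # atomic vector: a single type only
l  <- list(id = 1L, name = "a", ok = TRUE)  # list: elements may differ in type
m  <- matrix(1:6, nrow = 2, ncol = 3)       # matrix: two dimensions, single type
a  <- array(1:24, dim = c(2, 3, 4))         # array: n dimensions, single type
df <- data.frame(id = 1:3, score = v)       # dataframe: list of equal-length columns

str(l)    # compact view of the list's elements
dim(m)    # rows and columns of the matrix
dim(a)    # extent of each array dimension
str(df)   # column names and types of the dataframe
```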
FUNDAMENTALS
• Control the flow of a programme’s/function’s logic
• If; IfElse; For; While; Repeat
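A short sketch of each control structure follows. The slide's "IfElse" could mean either the if/else statement or the vectorised ifelse() function, so both are shown; all values are illustrative.

```r
x <- 7

# if / else: branch on a single condition
if (x %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}

# ifelse(): vectorised form, one result per element
labels <- ifelse(c(1, 2, 3, 4) %% 2 == 0, "even", "odd")

# for: iterate over a known sequence
total <- 0
for (i in 1:5) {
  total <- total + i
}

# while: loop for as long as a condition holds
n <- 1
while (n < 100) {
  n <- n * 2
}

# repeat: loop until an explicit break
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break
}
```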
FUNDAMENTALS
• “Every process in R is the result of a Function call” – John Chambers
• “Everything in R is an R object” – John Chambers
• Modularise; Customise; Optimise; Automate
• Transition from a useR to a programmeR (and on to a developeR)
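To make the two quotes concrete, here is a small sketch. The helper rescale01 is a hypothetical function written for this example, not something from the course.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescaled <- rescale01(c(10, 20, 30))

# Even an operator is a function call
three <- `+`(1, 2)

# Functions are objects too, so they can be passed to other functions
group_means <- sapply(list(a = 1:3, b = 4:6), mean)
```

Wrapping a repeated calculation in a function like this is the first step towards the "Modularise; Customise; Optimise; Automate" progression on the slide.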
PRACTICAL SESSION
DATA I/O
• Sources for Data
• Types of Data
• Base R for I/O
• Packages for Data Import
DATA I/O
• Online Sources: Web; APIs; Dropbox; GitHub
• Offline Sources: Databases; flat files; zipped files
DATA I/O
• .txt; .csv; .xlsx; .Rdata
• .html; .json; .xml
• .xpt (SAS); .sav (SPSS); .dta (Stata)
DATA I/O
• You can use base R to read a variety of data
• Can be slow with large data
• For exotic file types, use dedicated packages
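A minimal round trip with base R might look like the sketch below. The dataframe and the temporary file paths are assumptions made for the example.

```r
# Write a small dataframe to a temporary .csv file and read it back
scores   <- data.frame(id = 1:3, score = c(71.5, 88, 64.25))
csv_path <- tempfile(fileext = ".csv")      # hypothetical file location
write.csv(scores, csv_path, row.names = FALSE)
scores_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Save and restore R objects in R's own binary format
rdata_path <- tempfile(fileext = ".RData")  # hypothetical file location
save(scores, file = rdata_path)
rm(scores)
load(rdata_path)                            # restores the object named "scores"
```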
DATA I/O
• readr – fast import for .txt, .csv
• readxl – fast import for .xlsx
• R-commander for GUI-based import
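The two import packages can be tried as follows. This is only a sketch: it assumes readr and readxl are installed (each part is skipped otherwise), the .csv path is a temporary file created for the example, and the .xlsx file is the example workbook that ships with readxl.

```r
# readr: write and read a small .csv file
if (requireNamespace("readr", quietly = TRUE)) {
  csv_path <- tempfile(fileext = ".csv")   # hypothetical file location
  readr::write_csv(data.frame(id = 1:3, score = c(71.5, 88, 64.25)), csv_path)
  # col_types = cols() lets readr guess the column types without printing a message
  scores <- readr::read_csv(csv_path, col_types = readr::cols())
  print(scores)
}

# readxl: read the first sheet of the example workbook bundled with the package
if (requireNamespace("readxl", quietly = TRUE)) {
  xlsx_path <- readxl::readxl_example("datasets.xlsx")
  print(readxl::excel_sheets(xlsx_path))   # sheet names in the workbook
  first_sheet <- readxl::read_excel(xlsx_path, sheet = 1)
  print(head(first_sheet))
}
```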
PRACTICAL SESSION
DATA MANIPULATION & ANALYSIS
• Subsetting Data
• Split-Apply-Combine
• Merging Data
• sqldf
• dplyr
• R-commander
DATA MANIPULATION & ANALYSIS
• Subsetting ≡ SELECT & WHERE in SQL
• Subset operators: [, [[, $
• Numeric or logical indexes are used to subset data
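The three subset operators and the SQL analogy can be sketched on a small dataframe. The data and column names are invented for the example.

```r
df <- data.frame(
  id    = 1:5,
  group = c("a", "b", "a", "b", "a"),
  spend = c(120, 80, 45, 200, 95),
  stringsAsFactors = FALSE
)

spend_vec  <- df$spend                  # $  : one column, returned as a vector
spend_vec2 <- df[["spend"]]             # [[ : the same column, selected by name
cols       <- df[, c("id", "spend")]    # [  : choose columns (like SELECT)
high       <- df[df$spend > 90, ]       # [  : logical index on rows (like WHERE)
picked     <- df[c(1, 3), ]             # [  : numeric index on rows

# SELECT id, spend FROM df WHERE "group" = 'a' AND spend > 90
both <- df[df$group == "a" & df$spend > 90, c("id", "spend")]
```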
DATA MANIPULATION & ANALYSIS
• Split a collection of data, Apply a function to each partition, Combine the result and present
• Collection ≡ data structure
• Splitting is different for data structures and data types
• Combination is different for data structures
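The three steps can be written out one at a time in base R. This is a sketch on invented data; split(), lapply() and unlist() are one possible choice among several base functions.

```r
df <- data.frame(
  group = c("a", "b", "a", "b", "a"),
  spend = c(120, 80, 45, 200, 95),
  stringsAsFactors = FALSE
)

pieces   <- split(df$spend, df$group)   # Split: a list with one vector per group
means    <- lapply(pieces, mean)        # Apply: the mean of each piece
combined <- unlist(means)               # Combine: back into a named vector

# One-step equivalents that return different structures
by_tapply    <- tapply(df$spend, df$group, mean)               # named array
by_aggregate <- aggregate(spend ~ group, data = df, FUN = mean) # dataframe
```

The last two lines illustrate the slide's point that the combine step depends on the data structure: tapply() returns an array, aggregate() returns a dataframe.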
DATA MANIPULATION & ANALYSIS
• Merge ≡ JOINs in SQL
• Specific to dataframes
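The correspondence between merge() and SQL joins can be sketched as follows. The two dataframes are made up for the example.

```r
customers <- data.frame(id = c(1, 2, 3),
                        name = c("Ann", "Bob", "Cy"),
                        stringsAsFactors = FALSE)
orders    <- data.frame(id = c(1, 1, 3, 4),
                        amount = c(10, 25, 40, 5))

inner <- merge(customers, orders, by = "id")                # INNER JOIN
left  <- merge(customers, orders, by = "id", all.x = TRUE)  # LEFT JOIN
full  <- merge(customers, orders, by = "id", all = TRUE)    # FULL OUTER JOIN
```

Rows without a match on the other side are filled with NA, as with NULL in SQL outer joins.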
DATA MANIPULATION & ANALYSIS
• Link: https://cran.r-project.org/web/packages/sqldf/sqldf.pdf
• Write SQL in R
• Specific to dataframes
• Limited to Data analysis and manipulation operations
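A minimal sketch of writing SQL against a dataframe is shown below. It assumes the sqldf package is installed (the block is skipped otherwise), and the data are invented; the column is called grp because GROUP is a reserved word in SQL.

```r
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)

  df <- data.frame(grp   = c("a", "b", "a", "b", "a"),
                   spend = c(120, 80, 45, 200, 95))

  # The dataframe is referred to by its R name inside the query
  by_group <- sqldf("SELECT grp, AVG(spend) AS mean_spend
                     FROM df
                     GROUP BY grp
                     ORDER BY grp")
  print(by_group)
}
```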
DATA MANIPULATION & ANALYSIS
• Link: https://cran.r-project.org/web/packages/dplyr/dplyr.pdf
• Expressive for most data manipulation
• Very efficient
• Consistent coding
• Directly connect with some RDBMS
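The consistent, verb-based style of dplyr can be sketched with a short pipeline. The example assumes dplyr is installed (it is skipped otherwise) and uses invented data.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  df <- data.frame(group = c("a", "b", "a", "b", "a"),
                   spend = c(120, 80, 45, 200, 95),
                   stringsAsFactors = FALSE)

  summary_tbl <- df %>%
    filter(spend > 50) %>%                           # keep rows (WHERE)
    group_by(group) %>%                              # split by group
    summarise(n = n(), mean_spend = mean(spend)) %>% # apply and combine
    arrange(desc(mean_spend))                        # sort the result

  print(summary_tbl)
}
```

Each verb takes a dataframe and returns a dataframe, which is what allows the steps to be chained with the pipe.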
DATA MANIPULATION & ANALYSIS
• GUI
• Can assist in learning R
PRACTICAL SESSION
VISUALISATION
• Grammar of Graphics
• ggplot2
• Bonus: Interactive Visualisation
VISUALISATION
• Graph is formed of well-defined constituents
• Grammar enables succinct definition of constituents
• Layer(s)
• Scale(s)
• Coordinate System
• Facetting/Trellis Graphics
VISUALISATION
• Layer(s)
  • Data
  • Aesthetics (positions on x/y axes; colours, size, etc.)
  • Statistical Transformation (none; Log; Squared; etc.)
  • Geometric Object(s)
  • Position Adjustment
• Scale(s) – control how data are mapped to each aesthetic
• Coordinate System
• Facetting/Trellis Graphics
VISUALISATION
• Graph is formed of well-defined constituents
• Grammar enables succinct definition of constituents
• Insights into graphs’ structure
• Encourages Creativity
VISUALISATION
• Link: http://docs.ggplot2.org/current/
• An implementation of (layered) Grammar of Graphics
• Elegant graphics
• Typical Stat graphs + more exotic graphs
• Works with dataframes
• Static graphics
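The grammar's constituents map fairly directly onto ggplot2 calls. The sketch below assumes ggplot2 is installed (it is skipped otherwise) and uses the built-in mtcars dataset; the choice of variables is only for illustration.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  p <- ggplot(data = mtcars,                                # data
              aes(x = wt, y = mpg, colour = factor(cyl))) + # aesthetics
    geom_point(stat = "identity",                           # geometric object, no transformation
               position = "identity", size = 2) +           # position adjustment
    scale_x_log10() +                                       # scale: how wt maps to the x axis
    coord_cartesian() +                                     # coordinate system
    facet_wrap(~ gear) +                                    # facetting: one panel per gear value
    labs(x = "Weight (1000 lbs, log scale)",
         y = "Miles per gallon",
         colour = "Cylinders")

  # print(p) draws the plot on the active graphics device
}
```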
VISUALISATION
• Intended for the Web – HTML files
• Mostly based on D3 – Data Driven Documents
• Based on contributed packages
• Some under active development
• Not limited to dataframe datasets
PRACTICAL SESSION