Web Analytics with R

Web Analytics with R Johann de Boer

description

The aim of this presentation is to:

1. Encourage web data analysts to use R instead of spreadsheets.
2. Help those wanting to learn R to get started.
3. Build interest amongst the R community in developing packages for web analytics.

The presentation briefly discusses what web analytics is, why R should be used instead of spreadsheets for web data analysis, ways to learn R, and how to put R into practice in web analytics.

In this presentation Johann shares his experience in creating his first open-source R package, ganalytics, used for accessing Google Analytics data. Reflecting on his journey to date in learning R, Johann gives newcomers tips to help them succeed in using R for their day-to-day work and in creating their own packages.

A demonstration of ganalytics is included, along with an invitation to the community to get involved in its future development. The R script used in the demonstration can be found in the following gist: https://gist.github.com/jdeboer/6569077

Transcript of Web Analytics with R

Page 1: Web Analytics with R

Web Analytics with R

Johann de Boer

Page 2: Web Analytics with R

Purpose of this presentation

1. Encourage web data analysts to move to R and away from Excel!
2. Help those wanting to learn R to get started.
3. Build interest amongst the R community in developing packages for web analytics.

Page 3: Web Analytics with R

What is web analytics?

Measurement and analysis of (aggregated) internet data for purposes of optimising website usage.

Page 4: Web Analytics with R

Web analytics data

Page 5: Web Analytics with R

Dimensions and metrics

Page 6: Web Analytics with R

Web analytics data

Scope   | Dimensions                                            | Metrics
User    | Custom dimensions, e.g. existing customer flag        | Unique users, avg revenue per user
Device  | Browser and OS, isTablet, isMobile                    | Unique devices, total visits
Session | Traffic source, date and time of visit, landing page  | Time on site, pageviews per visit, goal completions
Hit     | Page URL, page title, event type, product name        | Time on page, page loading time, transaction amount

Page 7: Web Analytics with R

Metrics: Unique Visitors, New Visits, % New Visits, Visits, Bounces, Bounce Rate, Bounce Rate, Visit Duration, Avg. Visit Duration, Organic Searches, Impressions, Clicks, Cost, CPM, CPC, CTR, Cost per Transaction, Cost per Goal Conversion, Cost per Conversion, RPC, ROI, Margin, Goal 1 Starts, Goal Starts, Goal 1 Completions, Goal Completions, Goal 1 Value, Goal Value, Per Visit Goal Value, Goal 1 Conversion Rate, Goal Conversion Rate, Goal 1 Abandoned Funnels, Abandoned Funnels, Goal 1 Abandonment Rate, Total Abandonment Rate, Data Hub Activities, Page Value, Entrances, Entrances / Pageviews, Pageviews, Pages / Visit, Unique Pageviews, Time on Page, Avg. Time on Page, Exits, % Exit, Results Pageviews, Total Unique Searches, Results Pageviews / Search, Visits with Search, % Visits with Search, Search Depth, Search Depth, Search Refinements, % Search Refinements, Time after Search, Time after Search, Search Exits, % Search Exits, Goal 1 Conversion Rate, Goal Conversion Rate, Per Search Goal Value, Page Load Time (ms), Page Load Sample, Avg. Page Load Time (sec), Domain Lookup Time (ms), Avg. Domain Lookup Time (sec), Page Download Time (ms), Avg. Page Download Time (sec), Redirection Time (ms), Avg. Redirection Time (sec), Server Connection Time (ms), Avg. Server Connection Time (sec), Server Response Time (ms), Avg. Server Response Time (sec), Speed Metrics Sample, Document Interactive Time (ms), Avg. Document Interactive Time (sec), Document Content Loaded Time (ms), Avg. Document Content Loaded Time (sec), DOM Latency Metrics Sample, Screen Views, Screen Views, Unique Screen Views, Unique Screen Views, Screens / Session, Screens / Session, Time on Screen, Avg. Time on Screen, Total Events, Unique Events, Event Value, Avg. Value, Visits with Event, Events / Visit, Transactions, Ecommerce Conversion Rate, Revenue, Average Value, Per Visit Value, Shipping, Tax, Total Value, Quantity, Unique Purchases, Average Price, Product Revenue, Average QTY, Local Revenue, Local Shipping, Local Tax, Local Product Revenue, Social Actions, Unique Social Actions, Actions Per Social Visit, User Timing (ms), User Timing Sample, Avg. User Timing (sec), Exceptions, Exceptions / Screen, Crashes, Crashes / Screen, Custom Metric Value

Dimensions: Visitor Type, Count of Visits, Days Since Last Visit, User Defined Value, Visit Duration, Referral Path, Full Referrer, Campaign, Source, Medium, Source / Medium, Keyword, Ad Content, Social Network, Social Source Referral, Ad Group, Ad Slot, Ad Slot Position, Ad Distribution Network, Query Match Type, Matched Search Query, Placement Domain, Placement URL, Ad Format, Targeting Type, Placement Type, Display URL, Destination URL, AdWords Customer ID, AdWords Campaign ID, AdWords Ad Group ID, AdWords Creative ID, AdWords Criteria ID, Goal Completion Location, Goal Previous Step - 1, Goal Previous Step - 2, Goal Previous Step - 3, Browser, Browser Version, Operating System, Operating System Version, Mobile (Including Tablet), Tablet, Mobile Device Branding, Mobile Device Model, Mobile Input Selector, Mobile Device Info, Mobile Device Marketing Name, Device Category, Continent, Sub Continent Region, Country / Territory, Region, Metro, City, Latitude, Longitude, Network Domain, Service Provider, Flash Version, Java Support, Language, Screen Colors, Screen Resolution, Endorsing URL, Display Name, Social Activity Post, Social Activity Timestamp, Social User Handle, User Photo URL, User Profile URL, Shared URL, Social Tags Summary, Originating Social Action, Social Network and Action, Hostname, Page, Page path level 1, Page path level 2, Page path level 3, Page path level 4, Page Title, Landing Page, Second Page, Exit Page, Previous Page Path, Next Page Path, Page Depth, Site Search Status, Search Term, Refined Keyword, Site Search Category, Start Page, Destination Page, App Installer ID, App Version, App Name, App ID, Screen Name, Screen Depth, Landing Screen, Exit Screen, Event Category, Event Action, Event Label, Transaction, Affiliation, Visits to Transaction, Days to Transaction, Product SKU, Product, Product Category, Currency Code, Social Source, Social Action, Social Source and Action, Social Entity, Social Type, Timing Category, Timing Label, Timing Variable, Exception Description, Experiment ID, Variation, Custom Dimension, Custom Variable (Key 1), Custom Variable (Value 01), Date, Year, Month of the year, Week of the year, Day of the month, Hour, Month, Week, Day, Day of week, Day of week name, Hour of Day, Month of Year, Week of Year, ISO week of the year

265 dimensions and metrics in Google Analytics and growing!

Page 8: Web Analytics with R

Google Analytics

Page 9: Web Analytics with R

Source: Charles Farina, e-nor.com blog, Published 9 July 2012

The Web Analytics market

Page 10: Web Analytics with R

Google Analytics - now Universal

Page 11: Web Analytics with R

Google Analytics APIs

● Data collection
  ○ Collection APIs and SDKs
● Data extraction
  ○ Core Reporting API
    ■ Metadata API
  ○ Real-time Reporting API
  ○ Multi-Channel Funnels Reporting API

Page 12: Web Analytics with R

Why use R for web analytics?

Page 13: Web Analytics with R

R is free, open source and popular!

Page 14: Web Analytics with R

Spreadsheets are rigid and fragile

Page 15: Web Analytics with R

R is agile and robust

Page 16: Web Analytics with R

7 reasons to use R instead of Excel

1. Captures each step in your analysis
2. Makes it easier to review and fix your mistakes
3. Easy to reproduce and reuse analyses
4. Join datasets from multiple sources
5. Powerful ways to analyse and visualise your data
6. Automate retrieval of your data via the Core Reporting API
7. Saves time!

Page 17: Web Analytics with R

Learning and using R

Page 18: Web Analytics with R

In the beginning...

Page 19: Web Analytics with R
Page 20: Web Analytics with R

Recommended tools and packages

● plyr
● ggplot2
● lubridate
● reshape2
● devtools
● httr
● roxygen2
● markdown
● git (version control)

Page 21: Web Analytics with R

Google Analytics packages for R

● r-google-analytics
  ○ By Google, but it stopped working for a long time
● rga
  ○ By Bror Skardhamar, popular and active
● ganalytics
  ○ Written by me to create an abstraction from the Core Reporting API protocol

Page 22: Web Analytics with R

ganalytics

Automate extraction of Google Analytics data

Page 23: Web Analytics with R

Make querying GA data from R an easy and interactive experience

● Queries are manipulated on the fly
● Defining filter and segmentation expressions is easy
● Checks queries for errors before sending
  ○ corrects them automatically in some cases too!
● Creates a level of abstraction from the Core Reporting API - easier to extend functionality

Page 24: Web Analytics with R

Query expressions

ga:keyword=@buy
(search traffic keywords containing “buy”)

A single expression consists of:
● a variable - a dimension or metric
● an operator - e.g. equals, contains, regular expression, greater than, does not equal, ...
● an operand - a number (metric) or a character string (dimension)
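The three parts of an expression can be illustrated with a few lines of base R. This is a minimal sketch, not ganalytics code: the function name `ga_expression` and the readable operator labels are made up for this example, while the operator symbols are the ones the Core Reporting API uses in its filters parameter.

```r
# Hypothetical helper (not part of ganalytics): paste a variable, an
# operator and an operand into one Core Reporting API filter expression.
ga_expression <- function(variable, operator, operand) {
  # Readable labels (invented for this sketch) mapped to API operators.
  operators <- c(
    equals       = "==",
    not_equals   = "!=",
    contains     = "=@",
    not_contains = "!@",
    matches      = "=~",  # regular expression match
    not_matches  = "!~",
    greater_than = ">",
    less_than    = "<"
  )
  operator <- match.arg(operator, names(operators))
  paste0("ga:", variable, operators[[operator]], operand)
}

keyword_expr <- ga_expression("keyword", "contains", "buy")
visits_expr  <- ga_expression("visits", "greater_than", 10)
print(keyword_expr)
print(visits_expr)
```

The first call describes a dimension with a character operand, the second a metric with a numeric operand.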

Page 25: Web Analytics with R

Combining expressions

● Expressions can be joined using OR (a comma) and AND (a semicolon).
● OR always takes precedence over AND, and expressions cannot be grouped.

ga:keyword=@buy;ga:city=~^(Sydney|Melbourne)$,ga:isTablet==Yes
(search traffic keywords containing “buy” AND [city is [Sydney OR Melbourne] OR via a tablet])
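The precedence rule can be checked by hand in base R. The sketch below is only an illustration: `ga_or` and `ga_and` are hypothetical helpers, not ganalytics functions, and the expression strings use the API's `=@` (contains) operator. It builds the combined filter, then splits it again on semicolons first and commas second, which mirrors how OR binds more tightly than AND.

```r
# Hypothetical helpers (not part of ganalytics).
# In a Core Reporting API filter a comma means OR and a semicolon means AND.
ga_or  <- function(...) paste(c(...), collapse = ",")
ga_and <- function(...) paste(c(...), collapse = ";")

filter <- ga_and(
  "ga:keyword=@buy",
  ga_or("ga:city=~^(Sydney|Melbourne)$", "ga:isTablet==Yes")
)
print(filter)

# Reading the filter back: every semicolon-separated term must hold (AND),
# and within a term any comma-separated alternative may hold (OR).
# Assumption: no operand contains a literal comma or semicolon, which
# would need escaping and is not handled here.
and_terms <- strsplit(filter, ";", fixed = TRUE)[[1]]
or_groups <- lapply(and_terms, function(term) strsplit(term, ",", fixed = TRUE)[[1]])
str(or_groups)
```

Because grouping is not available, an AND nested inside an OR cannot be written directly in a single filter string.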

Page 26: Web Analytics with R

Writing expressions with ganalytics

● Filter to pass to the Core Reporting API:

    ga:keyword=@buy;ga:city=~^(Sydney|Melbourne)$,ga:isTablet==Yes

● Using ganalytics to write this:

    GaAnd(
      GaExpr('keyword', '@', 'buy'),
      GaOr(
        GaExpr('city', '~', '^(Sydney|Melbourne)$'),
        GaExpr('isTablet', '=', 'Yes')
      )
    )

Page 27: Web Analytics with R

ganalytics Demo

gist.github.com/jdeboer/6569077

Page 28: Web Analytics with R

How does traffic from desktop, mobile and tablet users change throughout the day and over the week?

Average number of visits per hour and day - split by desktop, mobile and tablet
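The aggregation behind a chart like this can be sketched in base R. The data below is randomly generated, not real Google Analytics output, and the column names (`date`, `hour`, `deviceCategory`, `visits`, `dayOfWeek`) are assumptions chosen for the illustration. The demonstration script in the gist is the authoritative version; this sketch only shows the averaging step.

```r
# Made-up data: two weeks of hourly visit counts per device category.
set.seed(42)
hourly <- expand.grid(
  date = seq(as.Date("2013-09-02"), by = "day", length.out = 14),
  hour = 0:23,
  deviceCategory = c("desktop", "mobile", "tablet"),
  stringsAsFactors = FALSE
)
hourly$visits <- rpois(nrow(hourly), lambda = 20)

# Day of week as a number (0 = Sunday ... 6 = Saturday), locale independent.
hourly$dayOfWeek <- as.POSIXlt(hourly$date)$wday

# Average visits for each combination of weekday, hour and device category.
avg_visits <- aggregate(
  visits ~ dayOfWeek + hour + deviceCategory,
  data = hourly,
  FUN = mean
)
print(head(avg_visits))
```

Each row of `avg_visits` is one point on the chart: an hour of a given weekday for one device category, averaged over the weeks in the sample.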

Page 29: Web Analytics with R

R + ggplot2 + plyr + ganalytics =

Page 30: Web Analytics with R

Get involved!

Open source R package development is fun!

Page 31: Web Analytics with R

Package development

● Use RStudio with Git version control
  ○ Open a free GitHub account
  ○ Use Roxygen2 for generating your documentation and NAMESPACE file
  ○ RStudio integrates with Git, Roxygen2 and RTools to make the package build process easy
● Hadley Wickham is a great help
  ○ devtools package - great for installing straight from a GitHub repository
  ○ read his online book “Advanced R Programming” - easy to follow package development steps
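As an illustration of installing straight from GitHub, the following sketch installs ganalytics from the repository named in this presentation using `devtools::install_github()`. The `interactive()` guard is an addition for this example so that nothing is downloaded when the snippet runs in a non-interactive session.

```r
# Repository path as given in this presentation (github.com/jdeboer/ganalytics).
repo <- "jdeboer/ganalytics"

# Install only in an interactive session where devtools is available.
if (interactive() && requireNamespace("devtools", quietly = TRUE)) {
  devtools::install_github(repo)
}
```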

Page 32: Web Analytics with R

Learn more...

● Google Analytics: #ganalytics
  ○ Video lessons: google.com.au/analytics/iq.html
  ○ Reference: developers.google.com/analytics
● Learn R: #rstats
  ○ Presciient: presciient.com/courses
  ○ Code School: tryr.codeschool.com
  ○ Coursera: coursera.org/course/compdata
  ○ Intro to R videos by Google: t.co/FQ8DvZEdRW
● Package development: adv-r.had.co.nz
● ganalytics: github.com/jdeboer/ganalytics
● Follow me on Twitter: @johannux