Citi Bike - Texas A&M · online.stat.tamu.edu/dist/analytics/capstone/tl2.pdf · 2017-06-08 ·...

Citi Bike: Modeling the Relationship between Earned Media Activity and Service Engagement

Allyson Hugley

TAMU Analytics 2017

March 2017


Executive Summary: PR Industry Challenge

INDUSTRY SITUATION

The PR industry is under increasing scrutiny to use more sophisticated performance analytics.

Use of modeling techniques is hindered by a lack of access to business outcome data (e.g., sales data).

BUSINESS QUESTION

With access to business outcome data, can models be developed to quantify the contribution of PR activities to business outcomes?


Executive Summary: Citi Bike Project

PROJECT OVERVIEW

Test the potential for developing models to evaluate the impact of earned media.

Citi Bike was identified as suitable for model development activities because of:

• Outcome data availability

• News coverage data availability

AGENCY BUSINESS VALUE

This project was designed to advance thinking around media performance model development.


Executive Summary: Project Focus

BRAND SITUATION

Citi Bike is a privately owned public bicycle sharing system that serves parts of New York City.

It is the largest bike sharing program in the United States.

The system is sponsored by Citigroup and designed to carry the Citibank logo.

It is estimated that in the first year of operations the bank netted $4.4 million worth of earned media.

However, no relationship between earned media (i.e., news coverage) and use of Citi Bike services has been established.

BUSINESS QUESTIONS

What substantive role, if any, does earned media play in driving subscriptions to and use of Citi Bike services in New York?

Which modeling techniques are most appropriate for quantifying and forecasting earned media impact?


Executive Summary: Key Findings

Oct 2014 – Sept 2016

22,440,823 TRIPS

75,713 ANNUAL SUBSCRIPTIONS

4,399 ONLINE NEWS ARTICLES

1,458,800,934 IMPRESSIONS


Earned media output variables (impressions) were found to have a relationship to service usage when time series and regression modeling techniques were combined (regression with autocorrelated errors).


Citi Bike MECE Tree


VARIABLES

OUTCOME VARIABLES
  PRIMARY
    • DAILY TRIPS (USE)
    • ANNUAL SUBSCRIPTIONS

PREDICTOR VARIABLES
  ONLINE NEWS
    • TOTAL DAILY ARTICLES (#)
    • TOTAL DAILY IMPRESSIONS (#)
    • DAILY ARTICLES BY SENTIMENT: POSITIVE, NEUTRAL, NEGATIVE
    • DAILY IMPRESSIONS BY SENTIMENT: POSITIVE, NEUTRAL, NEGATIVE
  WEATHER CONDITIONS
    • PRECIPITATION (IN)
    • SNOWFALL (IN)
    • SNOW DEPTH
    • MAX TEMPERATURE
    • MIN TEMPERATURE
    • AVG TEMPERATURE


Data Sources and Aggregation Process


DATA SOURCES

• Citi Bike Transaction Data (business outcomes): https://www.citibikenyc.com/system-data

• Sysomos Online News Data (earned media coverage): https://sysomos.com/

• SimilarWeb News Source Site Traffic Data (impressions): https://www.similarweb.com/

• National Oceanic and Atmospheric Administration - NOAA (weather): http://www.noaa.gov/

DATA AGGREGATION

Citi Bike daily ridership and membership data are released quarterly. Files for Oct 2014 – Sept 2016 were downloaded and integrated into a single data set.

Earned media articles for Oct 2014 – Sept 2016 (automatically scored for sentiment) were obtained from the Sysomos media monitoring service.

Each article was manually appended with SimilarWeb impressions data; the articles were then aggregated by date and appended to the Citi Bike ridership/membership file.

Weather data (e.g., precipitation, temperature) from NOAA were integrated into the ridership/membership data file based on date fields.
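The date-keyed aggregation and join described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the record layouts, field names and sample values are hypothetical stand-ins for the Citi Bike, Sysomos/SimilarWeb and NOAA files, and a day with no news coverage is assumed to carry zero articles and impressions.

```python
from collections import defaultdict


def aggregate_articles(articles):
    """Sum article counts and impressions per date, overall and by sentiment.

    `articles` is an iterable of (date, sentiment, impressions) tuples, a
    hypothetical layout for the Sysomos export with SimilarWeb impressions.
    """
    daily = defaultdict(lambda: defaultdict(int))
    for date, sentiment, impressions in articles:
        day = daily[date]
        day["articles"] += 1
        day["impressions"] += impressions
        day[f"articles_{sentiment}"] += 1
        day[f"impressions_{sentiment}"] += impressions
    return {date: dict(totals) for date, totals in daily.items()}


def merge_by_date(trips, media, weather):
    """Left-join daily media and weather records onto daily trip counts.

    Assumption: a day with no news coverage has zero articles/impressions.
    """
    merged = {}
    for date, n_trips in trips.items():
        row = {"trips": n_trips, "articles": 0, "impressions": 0}
        row.update(media.get(date, {}))
        row.update(weather.get(date, {}))
        merged[date] = row
    return merged


# Hypothetical sample records for two days.
trips = {"2015-06-01": 30000, "2015-06-02": 28000}
articles = [
    ("2015-06-01", "positive", 1000),
    ("2015-06-01", "negative", 500),
]
weather = {
    "2015-06-01": {"PRCP": 0.0, "TEMP_MAX": 75},
    "2015-06-02": {"PRCP": 1.2, "TEMP_MAX": 68},
}
data = merge_by_date(trips, aggregate_articles(articles), weather)
print(data["2015-06-01"])
```

Keeping the trip file as the left side of the join means every day with ridership data survives, even when no articles were published that day.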

Page 11: Citi Bike - Texas A&Monline.stat.tamu.edu/dist/analytics/capstone/tl2.pdf · 2017-06-08 · Executive Summary: Project Focus BRAND SITUATION Citi Bike is a privately owned public

Data Review and Cleaning

EXPLORATORY DATA ANALYSIS

SAS Enterprise Miner was used to perform exploratory data analysis to check for missing values and data consistency issues.

• No missing values were identified for variables critical to the modeling work

• All values fell within acceptable ranges

• No unusual data points were identified
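SAS Enterprise Miner handled these checks in the project. As a tool-neutral illustration only, the same two checks (missing values and out-of-range values) look like this in Python; the acceptable range and the sample values are hypothetical, not the ones used in the analysis.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_variable(values, low, high):
    """Return (missing_count, out_of_range_count) for one variable."""
    missing = sum(1 for v in values if is_missing(v))
    out_of_range = sum(
        1 for v in values if not is_missing(v) and not (low <= v <= high)
    )
    return missing, out_of_range


# Hypothetical daily trip counts with two gaps and one impossible value.
daily_trips = [25000, None, 31000, -5, float("nan")]
print(check_variable(daily_trips, low=0, high=100000))  # (2, 1)
```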


Data Collection Modifications

EXPAND DATA INPUTS

Variables that could be explored in future media impact research include:

• Earned media quality/engagement – inclusion of multi-media, news source tier, page views

• Paid media/advertising impressions, spend, format (e.g., video, display ad)

• Bike availability – number of bikes available for use

• Transportation option data – buses, taxis, subways in use/available daily

• Discounts and promotions


Modeling Techniques Employed

MULTIVARIATE ANALYSIS (JMP)

Principal Component Analysis: understand underlying structures in the data set

BASIC FIT ANALYSIS (JMP)

Bivariate Fit Model: understand the relationship between earned media impressions and Citi Bike usage

TIME SERIES MODELING (JMP/SAS)

Seasonal ARIMA: understand factors influencing service use, including media and weather

REGRESSION MODEL WITH AUTOCORRELATED ERRORS (JMP/SAS)

Regression with ARMA errors (AR(1)): understand the strength of predictor variables, including media outputs (impressions), on service engagement (use)


Multivariate Analysis - Principal Component Analysis


PCA ANALYSIS

This analytic technique was used to identify initial structures in the data:

• Weather: temperature events (Prin1 and Prin2)

• Negative media outcomes contributed to the structure (Prin3)

• Precipitation events: rain and snow (Prin4)
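JMP produced the principal components directly. To show what a component such as Prin1 represents, the sketch below computes the first principal component of a correlation matrix by power iteration in plain Python. Working from correlations rather than covariances, and the three toy weather columns, are assumptions for illustration; the data values are hypothetical.

```python
import math


def standardize(column):
    """Center and scale a column to mean 0 and sample SD 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    z = [standardize(c) for c in columns]
    n, k = len(z[0]), len(z)
    return [
        [sum(z[i][t] * z[j][t] for t in range(n)) / (n - 1) for j in range(k)]
        for i in range(k)
    ]


def first_component(corr, iterations=500):
    """Leading eigenvector (loadings) and eigenvalue via power iteration."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)
    )
    return v, eigenvalue


# Hypothetical columns: max temperature, min temperature, precipitation.
tmax = [40, 50, 60, 70, 80]
tmin = [30, 41, 49, 62, 70]
prcp = [0.1, 0.0, 0.3, 0.0, 0.2]
loadings, eigenvalue = first_component(correlation_matrix([tmax, tmin, prcp]))
share = eigenvalue / 3  # proportion of total variance explained by Prin1
print([round(x, 3) for x in loadings], round(share, 3))
```

The two temperature columns load heavily on the first component, which is the same kind of pattern described for Prin1 and Prin2 above.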


Data Transformation and Basic Fit Model


FIT LOG USAGE BY LOG IMPRESSIONS

Daily trip volume and daily impressions were log transformed, and a simple bivariate fit analysis was executed to determine the potential relationship between these variables.

Outcomes suggest that every 10% increase in impressions is associated with a 1.3% increase in Citi Bike service usage.

Fit statistics also suggested that earned media impressions alone accounted for only a small share of the variation in service use.
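In a log-log fit, log(usage) = a + b·log(impressions), the slope b is an elasticity: a p% rise in impressions multiplies usage by (1 + p/100)^b. The Python sketch below fits the slope by ordinary least squares and backs out the slope implied by the reported figure (10% more impressions, 1.3% more usage), roughly b ≈ 0.136. It illustrates the interpretation only; the sample data are hypothetical, and the fitted JMP coefficient is not reproduced here.

```python
import math


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pct_change_in_y(slope, pct_increase_in_x):
    """Percent change in y for a given percent rise in x in a log-log model."""
    return ((1 + pct_increase_in_x / 100) ** slope - 1) * 100


# Slope implied by "10% more impressions -> 1.3% more usage".
implied_slope = math.log(1.013) / math.log(1.10)
print(round(implied_slope, 3))  # 0.136

# Logs of hypothetical data that follow y = 100 * x**0.5 exactly.
impressions = [1e6, 2e6, 4e6, 8e6]
trips = [100 * x ** 0.5 for x in impressions]
slope = ols_slope([math.log(x) for x in impressions], [math.log(y) for y in trips])
print(round(slope, 3))  # 0.5
```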


Time Series Modeling


OUTCOME SELECTION - DAILY USAGE

The time series modeling was limited to analysis and forecast modeling for daily service use (trips in the past 24 hours). A valid time series model based on subscription data was not achieved because the subscription data were not stationary.

As a first step, the daily trip data required differencing to account for trend and seasonality.
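The differencing step can be illustrated in Python. This sketch assumes the differencing used by the seasonal ARIMA models in this section (d = 1, D = 1, period 7), meaning one regular and one weekly seasonal difference; the sample series is hypothetical.

```python
def difference(series, lag=1):
    """Return y[t] - y[t - lag] for every t where the lag exists."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def weekly_seasonal_difference(series, period=7):
    """Apply (1 - B)(1 - B^7): a seasonal (weekly) then a regular difference."""
    return difference(difference(series, period), 1)


# Hypothetical daily trips: linear trend plus a fixed day-of-week pattern.
pattern = [500, 800, 900, 950, 900, 300, 100]
trips = [20000 + 50 * t + pattern[t % 7] for t in range(28)]
print(weekly_seasonal_difference(trips))  # 20 zeros
```

The seasonal difference removes the day-of-week pattern and leaves a constant (7 × 50); the regular difference then removes that constant, so a series that is only trend plus weekly pattern reduces to zeros.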


TIME SERIES MODELING


MODEL SELECTION

• Three valid models (stationary, invertible, parsimonious) were analyzed further in SAS, accounting for outliers and level shifts:

    • Seasonal ARIMA (1,1,1)(0,1,1)7

    • Seasonal ARIMA (1,1,2)(0,1,1)7

    • Seasonal ARIMA (1,1,2)(0,1,2)7 (best model)
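In backshift notation (B y_t = y_{t-1}), the selected Seasonal ARIMA (1,1,2)(0,1,2)7 model for daily trips y_t can be written as below, with ε_t white noise. The minus-sign convention for the moving-average terms follows SAS PROC ARIMA; other software writes them with plus signs, which flips the sign of the θ estimates. Regression terms (weather, outlier and level-shift dummies) that were added later enter on top of this structure and are omitted here.

```latex
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{7})\, y_t
  = (1 - \theta_1 B - \theta_2 B^{2})\,(1 - \Theta_1 B^{7} - \Theta_2 B^{14})\, \varepsilon_t
```

Each factor maps to one part of the order: (1 - φ₁B) is the nonseasonal AR(1) term, (1 - B) and (1 - B⁷) are the regular and weekly differences, and the two MA polynomials carry the nonseasonal MA(2) and seasonal MA(2) terms.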


TIME SERIES MODELING

SEASONAL ARIMA (1,1,2)(0,1,2)7 - MODEL VALIDATION IN SAS AND JMP

JMP: initial SBC 14740.817

SAS: 59 outliers/level shifts identified (8% of observations)

SBC: 14364.05 (improved vs. SAS Model 1 (14374.67) and SAS Model 2 (14372.31), accounting for outliers and level shifts)
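SAS PROC ARIMA reports SBC as -2 ln(L) + k ln(n), where L is the likelihood, k the number of estimated parameters and n the number of residuals; lower values are preferred, and the k ln(n) term penalizes extra parameters. The Python sketch below encodes that formula and applies the lowest-SBC rule to the three SAS values reported above. The residual count used for the penalty illustration is hypothetical.

```python
import math


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion: -2 ln(L) + k ln(n); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# SBC values reported for the SAS models after outlier/level-shift adjustment.
reported_sbc = {
    "SAS Model 1": 14374.67,
    "SAS Model 2": 14372.31,
    "Seasonal ARIMA (1,1,2)(0,1,2)7": 14364.05,
}
best = min(reported_sbc, key=reported_sbc.get)
print(best)

# An extra parameter lowers SBC only if it raises ln(L) by more than ln(n) / 2.
n = 724  # hypothetical number of residuals
print(round(math.log(n) / 2, 2))
```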


TIME SERIES MODELING

SEASONAL ARIMA (1,1,2)(0,1,2)7 - FURTHER MODEL DEVELOPMENT AND VALIDATION IN SAS AND JMP

Weather variables added (3): PRCP, SNWD, TEMP_MAX

JMP: initial SBC 14740.817

SAS: 74 outliers/level shifts identified (10.2% of observations, with seven rows withheld for forecasting)

SBC: 13920.45 (improved over the initial model excluding weather data, SBC 14364.05); white noise also significantly lowered

Media variables were tested but ultimately excluded from the model because they were not statistically significant.


TIME SERIES MODELING

SEASONAL ARIMA (1,1,2)(0,1,2)7 - ANALYSIS OF OUTLIERS AND LEVEL SHIFTS (74)

Outliers and level shifts tended to be associated with weather events:

• Heavy rain (>1” per day)

• Snow and snow events (e.g., the 2016 blizzard)

• Cascading weather events: declining temperatures, rising temperatures, and rain events that span several days

Holiday events were also consistently associated with outliers and level shifts:

• Holidays and holiday periods (Christmas, New Year’s, Thanksgiving, Good Friday) were associated with low-level outliers and level shifts
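An additive outlier and a level shift enter a model as different dummy regressors: a pulse that affects a single period versus a step that persists from the shift date onward. The Python helpers below are a hypothetical illustration of those two shapes, not output from SAS.

```python
def additive_outlier(n, t0):
    """Pulse regressor: 1 at index t0 only (a one-period shock, e.g. a storm day)."""
    return [1 if t == t0 else 0 for t in range(n)]


def level_shift(n, t0):
    """Step regressor: 0 before t0, 1 from t0 onward (a lasting change in level)."""
    return [1 if t >= t0 else 0 for t in range(n)]


print(additive_outlier(6, 2))  # [0, 0, 1, 0, 0, 0]
print(level_shift(6, 2))       # [0, 0, 1, 1, 1, 1]
```

A single-day event such as a holiday fits the pulse shape, while a lasting change in the system fits the step shape.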


TIME SERIES MODELING

SEASONAL ARIMA (1,1,2)(0,1,2)7 - ANALYSIS OF ESTIMATES

For each 1-inch increase in precipitation, usage of Citi Bike fell by approximately 5,169 users.

For each 1-inch increase in snow depth, usage of Citi Bike fell by about 468 users.

Each 1-degree increase in daily maximum temperature was associated with about 218 additional users.
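Because the weather coefficients are per-unit effects on daily usage, the combined effect of a given day's conditions can be approximated by adding them up, holding everything else fixed. The Python sketch below uses the three reported estimates; the additive, linear reading and the example day are assumptions for illustration.

```python
# Reported estimates: change in daily usage per unit of each weather variable.
COEFFICIENTS = {
    "PRCP": -5169,     # per inch of precipitation
    "SNWD": -468,      # per inch of snow depth
    "TEMP_MAX": 218,   # per degree of daily maximum temperature
}


def weather_effect(prcp_in=0.0, snwd_in=0.0, temp_max_change=0.0):
    """Approximate change in daily usage, assuming additive linear effects."""
    return (
        COEFFICIENTS["PRCP"] * prcp_in
        + COEFFICIENTS["SNWD"] * snwd_in
        + COEFFICIENTS["TEMP_MAX"] * temp_max_change
    )


# Hypothetical day: half an inch of rain but 10 degrees warmer.
print(weather_effect(prcp_in=0.5, temp_max_change=10))  # -404.5
```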


TIME SERIES MODELING

SEASONAL ARIMA (1,1,2)(0,1,2)7 - FORECAST ANALYSIS

The model's forecasts were relatively close to actual service use, with a forecast usage range just under 15,000 (14,690).

Forecasts generally fell within this range; for the seven holdout cases, forecast usage was on average about 6,469 above actual levels.
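The holdout comparison comes down to two summary numbers: the mean forecast error (positive when the model over-forecasts, as with the roughly 6,469 average gap reported) and the mean absolute error. A minimal Python sketch follows; the example values are hypothetical, not the seven withheld days.

```python
def forecast_errors(forecast, actual):
    """Return (mean error, mean absolute error) for paired forecasts and actuals."""
    errors = [f - a for f, a in zip(forecast, actual)]
    mean_error = sum(errors) / len(errors)           # > 0 means over-forecasting
    mae = sum(abs(e) for e in errors) / len(errors)  # typical size of a miss
    return mean_error, mae


# Hypothetical holdout values.
forecast = [40000, 42000, 38000]
actual = [35000, 43000, 33000]
mean_error, mae = forecast_errors(forecast, actual)
print(mean_error, round(mae, 1))
```

Reporting both helps: a mean error can hide misses in opposite directions, while the mean absolute error shows how far off individual days were.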


REGRESSION MODEL WITH AUTOCORRELATED ERRORS

AGGREGATE AND LOG TRANSFORM VARIABLES TO ACHIEVE STABILITY

• Aggregated daily media and usage data into weekly intervals

• Log transformed the summed data to achieve stability

• Examined each series over time to confirm stability (no consistent increase in values over time)
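The weekly aggregation and log transform can be sketched in Python as follows. The sketch assumes consecutive seven-day blocks starting on the first day of the series and drops a trailing partial week; the project may have used calendar weeks instead. The optional offset is a hypothetical guard for weeks with zero impressions, since the log of zero is undefined.

```python
import math


def weekly_log_sums(daily, week_length=7, offset=0.0):
    """Sum consecutive 7-day blocks, then take natural logs.

    A trailing partial week is dropped. `offset` (hypothetical) can be set
    above zero for series that may contain all-zero weeks.
    """
    n_weeks = len(daily) // week_length
    sums = [
        sum(daily[w * week_length:(w + 1) * week_length]) for w in range(n_weeks)
    ]
    return [math.log(s + offset) for s in sums]


# Fifteen hypothetical days of trips: two full weeks plus one extra day.
daily_trips = [1000] * 15
print(weekly_log_sums(daily_trips))  # two values, each ln(7000)
```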

USE CROSS-CORRELATION FUNCTION TO DETERMINE SIGNIFICANT LAGS

Lag 6 (six weeks) was identified as the most significant lag for representing earned media outputs (impressions) in the model.
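A cross-correlation function compares the outcome with the predictor shifted by k periods. Significance is usually judged against approximate ±2/√n bounds, and the lag with the largest absolute correlation is a natural candidate for the model (here, impressions six weeks earlier). The Python sketch below uses the common estimator that divides by the full series length. The series it builds are synthetic, constructed so that the predictor leads the outcome by three periods.

```python
import math
import random


def cross_correlation(x, y, lag):
    """Sample correlation between x[t - lag] and y[t] (x leads y by `lag`)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    cov = sum((x[t - lag] - mx) * (y[t] - my) for t in range(lag, n)) / n
    return cov / (sx * sy)


def most_significant_lag(x, y, max_lag):
    """Lag in 0..max_lag with the largest absolute cross-correlation."""
    return max(range(max_lag + 1), key=lambda k: abs(cross_correlation(x, y, k)))


# Synthetic weekly series: y repeats x three weeks later.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.0, 0.0, 0.0] + x[:-3]
print(most_significant_lag(x, y, max_lag=8))  # 3
```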


REGRESSION MODEL WITH AUTOCORRELATED ERRORS

DEVELOP A REGRESSION MODEL AND IDENTIFY A TIME SERIES MODEL FOR THE ERRORS

A valid model was developed in JMP using variables for weather, earned media (including lags), and time of year (e.g., first week of the year).

A time series model was then fit to the regression model residuals to identify the error structure (AR(1)).
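Identifying AR(1) errors typically rests on the residual autocorrelations. An AR(1) process shows autocorrelations that decay geometrically (roughly φ, φ², φ³ and so on) and a single spike in the partial autocorrelation function. The Python sketch below computes sample autocorrelations. The residual series is simulated with a hypothetical φ = 0.7 and does not come from the project.

```python
import random


def autocorrelation(residuals, lag):
    """Sample autocorrelation of a residual series at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    num = sum(
        (residuals[t] - mean) * (residuals[t - lag] - mean) for t in range(lag, n)
    )
    return num / denom


# Simulated AR(1) residuals: e[t] = 0.7 * e[t-1] + noise.
random.seed(7)
e = [0.0]
for _ in range(2999):
    e.append(0.7 * e[-1] + random.gauss(0, 1))
acf = [autocorrelation(e, k) for k in (1, 2, 3)]
print([round(r, 2) for r in acf])  # roughly 0.7, 0.49, 0.34
```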


REGRESSION MODEL WITH AUTOCORRELATED ERRORS

REFIT THE REGRESSION MODEL IN SAS WITH ARMA ERRORS, ACCOUNTING FOR OUTLIERS AND LEVEL SHIFTS

AR(1) errors; SAS IDENTIFY statement:

IDENTIFY VAR=Residual_Log_Sum_Trips_Past_24_
    CROSSCORR=(MEAN_SNWD_ MEAN_TEMP_MAX_ XLAG6 YEAR AO14 AO13 AO68 AO15
               AO72 LS53 AO67 AO66);
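SAS estimates the regression coefficients and the AR(1) error parameter jointly. As a simpler stand-in, the Python sketch below uses the Cochrane-Orcutt iteration, which is a different two-step estimator: fit OLS, estimate ρ from the residuals, quasi-difference y_t - ρy_{t-1} and x_t - ρx_{t-1}, and refit. It handles a single predictor and synthetic data only. It is not the project's SAS model, which also included snow depth, temperature, year, and outlier/level-shift dummies.

```python
import random


def ols(x, y):
    """Intercept and slope of a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def cochrane_orcutt(x, y, iterations=20):
    """Regression y = a + b*x + e with AR(1) errors e[t] = rho*e[t-1] + u[t]."""
    a, b = ols(x, y)
    rho = 0.0
    for _ in range(iterations):
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        # AR(1) coefficient: regress e[t] on e[t-1] without an intercept.
        rho = sum(resid[t] * resid[t - 1] for t in range(1, len(resid))) / sum(
            r * r for r in resid[:-1]
        )
        # Quasi-difference both series and refit; the intercept becomes a*(1 - rho).
        xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        a_star, b = ols(xs, ys)
        a = a_star / (1 - rho)
    return a, b, rho


# Synthetic data: true a = 1, b = 2, rho = 0.6.
random.seed(42)
x = [random.uniform(0, 10) for _ in range(500)]
e = [random.gauss(0, 1)]
for _ in range(499):
    e.append(0.6 * e[-1] + random.gauss(0, 1))
y = [1 + 2 * xi + ei for xi, ei in zip(x, e)]
a, b, rho = cochrane_orcutt(x, y)
print(round(b, 2), round(rho, 2))
```

Ignoring the AR(1) structure would leave the slope roughly unbiased here, but its standard error would be understated. Modeling the error process is what makes significance claims about predictors such as lagged impressions defensible.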


REGRESSION MODEL WITH AUTOCORRELATED ERRORS

ANALYSIS OF OUTLIERS, LEVEL SHIFTS, PREDICTORS

Holiday periods (Christmas, New Year’s) and weather events (the blizzard of 2016) were associated with significant declines in service use.

Expansion to the outer boroughs (Bedford-Stuyvesant, Brooklyn) and Jersey City was associated with a significant level shift (LS53).

For every 1% increase in snow depth (inches) or maximum temperature (degrees Fahrenheit), the volume of service is predicted to decrease by 5.22% and 0.82%, respectively.

Every 1% increase in earned media impressions is predicted to increase service use by 2.4%.

OUTLIERS AND LEVEL SHIFTS

Variable  Wk/Yr    Time Period/Event             Type              %Change
AO13      51/2014  Holiday Season                Additive Outlier  -55.34%
AO14      52/2014  Holiday Season                Additive Outlier  -73.07%
AO15      1/2015   Post New Year                 Additive Outlier  -131.69%
LS53      39/2015  Bedford-Stuyvesant Expansion  Level Shift       35.31%
AO66      52/2015  Holiday Season                Additive Outlier  -46.38%
AO67      53/2015  Holiday Season                Additive Outlier  -57.58%
AO68      1/2016   Holiday Season                Additive Outlier  -163.27%
AO72      5/2016   Jan 2016 Blizzard             Additive Outlier  -45.63%

PREDICTOR VARIABLES

Variable             Description                        %Change
MEAN_SNWD            Average Snow Depth                 -5.22%
MEAN_TEMP_MAX        Average Max Temperature            -0.82%
xLAG6 (IMPRESSIONS)  Impressions Exposure; 6-week lag   2.40%
YEAR                 Calendar Year (2014, 2015, 2016)   0.02%
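When the outcome is log-transformed, a coefficient c on a dummy (or on a one-unit change in a predictor) corresponds to an exact change of 100 × (e^c - 1) percent. The simpler 100 × c is only a close approximation when c is small. The %Change column may report 100 × c, since entries below -100% cannot be exact percentage changes of a positive quantity. If so, AO15's -131.69% would correspond to an exact change of about -73%. The Python sketch below performs the conversion; treating the table values as 100 × c is an assumption.

```python
import math


def exact_pct_change(coef):
    """Exact percent change in y implied by coefficient `coef` on ln(y)."""
    return (math.exp(coef) - 1) * 100


def approx_pct_change(coef):
    """First-order approximation, 100 * coef, accurate only for small coef."""
    return coef * 100


# Assumption: the %Change column equals 100 * coefficient.
for name, pct in [("AO15", -131.69), ("AO68", -163.27), ("LS53", 35.31)]:
    coef = pct / 100
    print(name, round(approx_pct_change(coef), 2), round(exact_pct_change(coef), 1))
```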


CONCLUSIONS

PROPOSED BUSINESS SOLUTIONS

More sophisticated modeling techniques can be applied to media relations and client outcome data. However, earned media activities (news coverage impressions) are likely to have weaker associations with business outcomes than environmental factors (e.g., weather, economic conditions).

NEXT ACTIONS

Leverage this analysis to advocate for education and for the acquisition of additional data sets within Weber Shandwick to better account for environmental factors when developing models.

Identify opportunities to apply modeling techniques to advance client work.


PROJECT IMPACT

MAJOR CHALLENGES

Time required to manually code and aggregate the media data at daily and weekly levels.

KEY INSIGHTS/LEARNING

Implementing modeling techniques at scale would require significant resources to support media data coding and aggregation, or some process automation would need to be developed.

The impact of earned media is likely to be small relative to other factors, so we need to be prepared to communicate that effectively to clients.

IMPACT ON WORK/ORGANIZATION

This work establishes a foundation for furthering discussions around the types of data and skill sets required to develop valid models to evaluate the impact of earned media on business outcomes.


MS PROGRAM IMPACT

IMPACT FROM MS ANALYTICS PROGRAM

Experience with a range of modeling techniques and tools has broadened my perspective on approaches to evaluating communications performance.

PROFESSIONAL DEVELOPMENT GAINED

Exposure to the practical application of a range of tools, techniques and coding languages to solve business problems.

Foundation in modeling methods to inform and advance discussions with data vendors and platform partners.

Insight into the tools and skill sets specific to data modeling that should be incorporated into the agency’s recruiting and professional development plans.

Improved understanding of quality controls and validation processes that should be incorporated into the agency’s burgeoning modeling capabilities.
