Close Encounter with Data Science 2d

Close Encounter with Data Science April 2015 Geoff Yuen, Ph.D. VP Emerging Technology [email protected]


How did Google beat previous search engines ?

Aside from searched content, Google also uses URL data patterns (links)*

* Eric Schmidt, “How Google Works”; also see http://www.economist.com/node/3171440

An additional datatype can make a huge difference !
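The link-based signal mentioned above is usually associated with PageRank. The following is a minimal sketch of that idea, assuming a tiny hypothetical web (`toy_web`) and the commonly quoted damping factor of 0.85; it illustrates how links alone can rank pages and is not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web: three pages link to "hub".
toy_web = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": ["a"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

The page contents never enter the calculation: the ranking comes entirely from the extra datatype (who links to whom).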

Data itself is not useful, we need insights !

Don’t start by collecting everything

Jiawei Han, Abel Bliss Professor, Department of Computer Science, UIUC; “Pattern Discovery in Data Mining”, Coursera online course with 75000 students, 2/2015

The data is the second most important thing

Jeff Leek, Assistant Professor of Biostatistics, Data Science Program, Johns Hopkins University :

Putting massive data into a cluster may have unpredictable consequences

Big or Small – you need the right data

Telcos may not have certain data necessary to solve specific commercial problems e.g. personal products recommendation

What’s so special about telco data ?

• Voice usage

• Call Circles

• SMS / VAS

• Time of day

• Location (SGN)

Telco data sets with business potential

• Gender

• Home Address

• Plan & Service

• Devices used

• Searches

• Web browsing history

• Mobile Apps used

• Downloaded Content

• Calls

• Email / Text

• Care History

• Resolved Queries

• Billing

• CRM

• Loyalty

• Spending

• Churn

• Product Lines

• CLV

• Contracts

• Cross-sell/Upsell

• Contact history

• Campaigns

• Product offers

• Acquisition cost

Data categories in the diagram: Customer Profile, * Carriage Usage, * Digital Usage, Customer Support, Customer Loyalty & Retention, Marketing
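To make the carriage-usage items above (voice usage, call circles, time of day) concrete, here is a minimal sketch of how such features might be derived from call detail records. The record layout, the names in `cdrs` and the 18:00 evening cut-off are assumptions made for illustration, not any operator's actual schema.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical call detail records: (caller, callee, start time, seconds).
cdrs = [
    ("alice", "bob",   "2015-04-01 08:15", 120),
    ("alice", "carol", "2015-04-01 21:40", 300),
    ("alice", "bob",   "2015-04-02 22:05", 60),
    ("bob",   "alice", "2015-04-02 09:30", 45),
]

def usage_features(records):
    """Summarise voice usage, call circle and time of day per caller."""
    circle = defaultdict(set)   # distinct people each caller phones
    seconds = defaultdict(int)  # total outgoing call time
    evening = defaultdict(int)  # calls started at or after 18:00
    calls = defaultdict(int)    # number of outgoing calls
    for caller, callee, start, duration in records:
        hour = datetime.strptime(start, "%Y-%m-%d %H:%M").hour
        circle[caller].add(callee)
        seconds[caller] += duration
        calls[caller] += 1
        if hour >= 18:
            evening[caller] += 1
    return {
        caller: {
            "voice_seconds": seconds[caller],
            "call_circle_size": len(circle[caller]),
            "evening_share": evening[caller] / calls[caller],
        }
        for caller in calls
    }

features = usage_features(cdrs)
print(features["alice"])
```

Features of this kind are measured rather than self-reported, which is part of what distinguishes telco data from survey or app data.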

Telco versus other OTT companies – data land grab

Data type | Telco | Facebook | Google | Apple | Telco edge
Store / Offline Transaction | Yes | No | No | No | √
True Id / Demographic | Yes | Partial | No | No | √
Carriage Usage including location history | Yes | No | No | No | √
All Apps Used + Usage | Yes | Facebook Apps Only | Google Apps Only | iPhone only | √
Web browsing (unencrypted) | Yes | No | No | No | √
Multi channel web browsing (mobile + PC) | Yes | No | No | No | √
Web search (encrypted) | No | No | Yes | No | X
User Uploaded Textual and Multimedia data | No | Yes | Yes | Yes | X
Digital purchase (encrypted) | No | No | Play Store | Apple Store | X
Proximity Marketing | No | No | No | iBeacon | X

Offline, O2O data !

Telco Data Assets Value

1. Representative : sample size in the millions instead of thousands

2. True & Clean : authentic measured usage instead of user reported

3. Whole history : reflects when and where users do what with phones, not just snapshots (like apps)

4. Comes with Location : unlike mobile apps, GPS, WiFi

5. Multichannel : can be supplemented with data from other services e.g. broadband, nowTV

6. Mobile mediated store purchase events (e.g. Apple Pay) can be logged (but not content)

7. Proximity data capture still at large
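The "millions instead of thousands" point in item 1 can be quantified with the textbook margin of error for an estimated proportion. The sketch below assumes a simple random sample and a 95% confidence level (z = 1.96); a subscriber base is not a random sample of the population, so this shows only how sampling noise shrinks with size, not that bias disappears.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for an estimated proportion."""
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)

for n in (1_000, 1_000_000):
    print(f"n = {n:>9,}: +/- {100 * margin_of_error(n):.2f} percentage points")
```

Because the error falls with the square root of the sample size, a thousandfold larger sample narrows the interval by a factor of roughly 32.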

Mobile Usage and Location Insight as unique data type


Mobile Carriage and Digital Usage as unique data type


Location Insights

Digital marketing involves targeting customers through their mobile. Location Insights provides behavioral and demographic profiles of crowds to make decisions more precise and contextual


NETWORK DATA

The O2 mobile network has hundreds of cells to measure the trends in footfall across the country

[Figure labels: 2G Network; 3G Network; 900 MHz; 1800 MHz; 2100 MHz; 2013 4G Network]


200 x 200 GRID

• Easier to use

• Further protecting anonymity

• Extrapolated to represent local population

Footfall is rendered into 200 x 200 metre grid squares across the country
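A minimal sketch of how footfall might be rendered into 200 x 200 metre squares, extrapolated and anonymised. The 200 m cell size comes from the slide; `MARKET_SHARE`, `MIN_DEVICES` and the sample sightings are hypothetical values chosen for illustration and do not describe the actual Smart Steps method.

```python
from collections import defaultdict

CELL_METRES = 200     # grid size stated on the slide
MARKET_SHARE = 0.25   # hypothetical share of the local population on this network
MIN_DEVICES = 3       # hypothetical suppression threshold to protect anonymity

def footfall_grid(sightings):
    """Count distinct devices per 200 m grid square and scale to population.

    `sightings` is an iterable of (device_id, easting_m, northing_m) tuples
    in a projected coordinate system measured in metres.
    """
    devices = defaultdict(set)
    for device_id, easting, northing in sightings:
        cell = (int(easting // CELL_METRES), int(northing // CELL_METRES))
        devices[cell].add(device_id)
    grid = {}
    for cell, seen in devices.items():
        if len(seen) < MIN_DEVICES:
            continue  # drop sparse squares rather than expose a few individuals
        grid[cell] = round(len(seen) / MARKET_SHARE)
    return grid

sightings = [
    ("d1", 120.0, 950.0), ("d2", 180.0, 810.0), ("d3", 10.0, 999.0),
    ("d1", 150.0, 900.0),  # same device seen twice in the same square
    ("d4", 450.0, 300.0),  # lone device: suppressed
]
print(footfall_grid(sightings))
```

Reporting only aggregated, thresholded counts per square is one simple way to combine the two goals on the slide: extrapolation to the local population and protection of anonymity.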

Drilling into footfalls demographics

Morrison Supermarkets UK used a related service to increase sales and “level the playing field”:

• Analysed customer journey patterns to help target high-traffic neighborhoods with coupons while avoiding households in the direct vicinity of a competitor

• Improved catchment and optimized advertising

• Increased store visits 150% without introducing a customer loyalty program

Telefonica Smart Steps & Morrison

Location Insights Products Launched by Telcos

Product | Verizon Precision Insight | Telefonica Dynamic Insights | AT&T AdWorks with Mobile | SK Telecom Bigdata Hub | Singtel DataSpark
User Base | 94 mil | 200 mil | 70 mil + 15 mil U-Verse | 27 mil | 500 mil
Data used | Phone location, browsing history, apps usage | Phone location, calls, text, travels between cells, mobile signal loss & connect | CDRs, locations, TV usage, browsing, mobile apps usage, "other information" | Location, social networking, voice calls, sensors, SMS and apps | CDRs, locations, TV usage, "other information"
Privacy | Precision ID | ID Protection | Anonymous / aggregate | Exclude all personal information | Anonymous / aggregate
Reference cases | Sports Stadium Event Promotion | Football Retail Management, Supermarket Campaign | Only multichannel product in US market; Levi's mobile promotion campaign | Business districts info; facilitate highway planning | Transportation, Site planning, Sports Events, Rich Segment, tourism

Unstructured Data in Telco and what can be done with it ?

What do data look like these days ?

• Data = values of qualitative or quantitative variables, belonging to a set of items (usually population)

• Data = often unstructured (without pre-established data model), usually raw file, different formats

Examples: chat, genome (DNA base pairs), picture

How much unstructured data in a telco ?

Telecommunications = 55%

Think again !

• share photos

• mobile chat

• mobile video

• network traffic

http://sites.tcs.com/big-data-study/industries-unstructured-data/

Golden Era of Analytics 1995-

• Statistical machine learning has contributed many algorithms far more powerful than simple regression (list modified from Giovanni Seni, A9):

• 1984 CART (Tree)

• 1996 Lasso

• 1996 Bagging

• 1997 AdaBoost

• 2001 Random Forest

• 2003 Learning Ensembles

• 2004 Regularization & Boosted Lasso

• 2005-2013 Deep Belief / Deep Learning

Many ways to predict and classify from structured and unstructured data now exist !
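As one concrete instance from the list above, bagging trains the same simple learner on many bootstrap resamples and lets the models vote. The sketch below uses one-threshold decision stumps on hypothetical one-dimensional data with two deliberately mislabelled points; it is a toy illustration of the technique, not a reference implementation.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Return the threshold t minimising errors of the rule 'x >= t -> 1'."""
    best_t, best_errors = None, None
    for t in sorted({x for x, _ in sample}):
        errors = sum((x >= t) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def bagged_stumps(data, n_models=25, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def predict(thresholds, x):
    """Majority vote over all stumps."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D data: label is 1 when x >= 10, with two noisy labels.
data = [(x, int(x >= 10)) for x in range(20)]
data[4] = (4, 1)
data[14] = (14, 0)

model = bagged_stumps(data)
print(predict(model, 2), predict(model, 17))
```

Random forests extend the same idea by also randomising which features each tree may split on.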

NASA JPL: better flyby surface feature recognition by random forests

Data Science can only make better predictions and classifications; follow up action depends on telco knowhow

By 2017, 10 % of computers will be learning rather than processing (Gartner 2013)


Structured Data : Regression; Linear or Logistic; Problem specific

Unstructured Data : Learning structure in data; non-Linear (polynomial); Knowledge specific

Big Data finally found its analytic partner : deep learning

Deep Learning for unstructured data

• The previous paradigm for feature detection and prediction from data was based on modelling and optimization. “Deep learning” has now surpassed that performance on diverse problems tackled by researchers around the world (“Tech 2015: Deep Learning And Machine Intelligence Will Eat The World”, Forbes 12/2014)

• Deep learning scales well with big data, learning a “layering of knowledge” in hidden layers without requiring the handcrafted feature detectors of past machine learning methods

• Demonstrated impressive improvements in diverse areas around the world : speech recognition, object recognition in images, targeted advertising, fraud detection, personalization

• Speech recognition : Microsoft, Google & Apple competing mobile “digital assistants” (Google Now vs Siri vs Cortana, 9/2014). Digital assistants will drive mCommerce & 50% of US digital purchases in 2017 (Gartner)

• Object recognition : Facebook mining user images for intentions (NYT)

• Real-time translation : Skype

• World Cup / NBA prediction 2014 (MS)

• Others : Baidu, IBM, Yahoo, Tencent, Netflix, Adobe, NEC, Toyota

• Telco centric vendors : Wise-athena, Dataspark, Zettics
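The “layering of knowledge” in hidden layers mentioned above can be shown with the smallest possible example: a network with one hidden layer computing XOR, which no single linear threshold unit can represent. The weights below are set by hand purely to show what a hidden layer can express; in deep learning they would be learned from data rather than handcrafted.

```python
def step(value):
    """Threshold activation: fire when the weighted input is positive."""
    return 1 if value > 0 else 0

def two_layer_net(x1, x2):
    """Forward pass through one hidden layer and one output unit."""
    # Hidden layer: each unit is an intermediate feature of the raw inputs.
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are on
    # Output layer: combines the hidden features ("OR but not AND" = XOR).
    return step(h_or - h_and - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", two_layer_net(x1, x2))
```

Deeper networks stack many such layers, so later layers can build on features discovered by earlier ones.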

Progress in targeting : when data meets the right analytics


Telco data useful for targeting


“If I have first-party CRM information and browser history information, I already know so much about the consumer that [demographic] information doesn’t add any [more] value.”

Claudia Perlich, Chief Data Scientist, Dstillery

So we don’t know everything about our data yet …

How Big Data Improves Ad Targeting

1 Visitor browses website or runs mobile app

2 Advertisers see visitor and conversion probabilities (increased by telco big data / machine learning, “Star”)

3 Advertisers bid

4 RTB exchange selects best bid and sends ad to visitor

5 Visitor sees ad (~100-300 milliseconds)
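Steps 2 to 4 above can be sketched as a tiny auction. The advertiser names, conversion probabilities and conversion values are hypothetical, and the rule used here (highest bid wins) is a simplification; real exchanges apply their own pricing rules, such as first-price or second-price auctions.

```python
def expected_value_bid(conversion_probability, value_per_conversion):
    """Bid what the impression is expected to be worth to the advertiser."""
    return conversion_probability * value_per_conversion

def run_auction(bids):
    """Return (winner, bid) for the highest bid; the exchange then serves that ad."""
    winner = max(bids, key=bids.get)
    return winner, bids[winner]

# Hypothetical advertisers: (estimated conversion probability, value of a conversion).
advertisers = {
    "shoe_shop": (0.020, 40.0),
    "airline":   (0.004, 150.0),
    "game_app":  (0.050, 5.0),
}
bids = {name: expected_value_bid(p, v) for name, (p, v) in advertisers.items()}
winner, price = run_auction(bids)
print(f"{winner} wins with a bid of {price:.2f}")
```

Better conversion-probability estimates, for example from telco usage data, change the bids directly, which is where the targeting improvement enters the flow.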


Contextual Mobile Targeting

Contextual and unstructured data, combined with machine learning technology, also improve advertising accuracy by 219 % (AdTheorent)

Mobile Targeted Advertising : Telco Examples

Telco | Name | Combined Base | Basis
*Telefonica | Axonix (acquired) | 200 mil | Digital Fingerprinting of mobile usage
*Singtel, Globe, Optus, Telkomsel | Amobee (acquired) | 500 mil (combined) | Digital Fingerprinting of mobile usage
AT&T | Amobee | 119 mil | Insert tracking id in URL; digital activity
Orange | Orange Ad / Amobee | 226 mil | Partnership with OpenX
Weve UK (Telefónica O2, EE & Vodafone) | Weve Mobile Display | 22 mil (combined) | Digital Fingerprinting of mobile usage
Deutsche Telekom | AudienceScience | 36 mil | Digital Fingerprinting
Verizon | PrecisionID | 123 mil | Insert tracking id (UIDH)
Sprint | Amobee | 55 mil | Digital Fingerprinting; Ad Exchange

2014Q4 Survey of Deep Learning Achievements

Area | Previous Accuracy | Data used to train model | Latest Accuracy | Company
Speech Recognition | 75% | 680 speakers, 10 sentences each | 94% (2013) | Google, IBM, Skype, MS
Object Recognition | 70% | 1.2 mil images | 95% (2015) | Baidu, Google, Facebook
Target Advertising | <1% (Banner Ads) | 220K users | 22% | AdTheorent, AlchemyAPI, Correlor
Personalization | na | 220K users | 27% | Correlor, Optimove
Churn Prediction (Telco) | 69% (SAS) | 300 mil CDRs, 1.8 mil users | 82% | Sparked, WiseAthena, Correlor
Dealer Fraud Detection (Telco) | <40% (reactive) | 700 mil CDRs, 1.2 mil users | 80% (predictive) | WiseAthena

• Other big companies in related efforts : Baidu, IBM, Yahoo, Tibco, Tencent, Netflix, Adobe, NEC, Toyota

Facebook “Likes” for Predicting Personality

Based on annotated “Likes” data, Facebook can predict personality better than humans can, except for a spouse

http://www.pnas.org/content/112/4/1036.full.pdf

Telco mobile usage data should do even better than this

Concluding Thoughts

• Network usage data is known to improve business predictions, e.g. churn, loyalty

• Combining internal data types improves user targeting

• Unstructured mobile usage data should do well in personality prediction

• Telcos should work together, given their similar business problems and common interests

With the help of advanced prediction algorithms, telco data has the potential to create significant new business

Questions ? Email [email protected]