A Topic Model of Analytics Job Adverts (Operational Research Society Annual Conference, Sept 2013)

Analysing Analytics: Evolution or Emperor's New Clothes? OR55, September 2013 Michael Mortenson, Neil F. Doherty & Stewart Robinson Defining Analytics: A Topic Model of Analytics Job Adverts

Description

This presentation reports recent research into definitions of analytics, based on an analysis of related job adverts. The results suggest a new categorisation of analytics methodologies, and the presentation discusses the implications for the operational research community.

Transcript of A Topic Model of Analytics Job Adverts (Operational Research Society Annual Conference, Sept 2013)

Page 1: A Topic Model of Analytics Job Adverts (Operational Research Society Annual Conference, Sept 2013)

Analysing Analytics: Evolution or Emperor's New Clothes?

OR55, September 2013; Michael Mortenson, Neil F. Doherty & Stewart Robinson

Defining Analytics: A Topic Model of Analytics Job Adverts

Page 2

Agenda


Problem Summary
• Confusion about the precise definition of analytics
• Benefit of ‘practical’ definitions
• Issues with the conventional ‘practical’ model of analytics

Model Details
• Data source: ‘analytics’ job adverts
• Topic modeling & Latent Dirichlet Allocation
• Model build & data pre-processing

Implications
• Model analysis
• An alternative definition of analytics
• Implications for OR/MS

Page 3

Analytics is …


… delivering the right decision support to the right people at the right time. (Laursen & Thorlund, 2010, p. XII)

… the scientific process of transforming data into insight for making better decisions. (INFORMS)

… [the] technologies, systems, practices, & applications to analyze critical business data so as to gain new insights. (Lim et al, 2012)

… the extensive use of data, statistical & quantitative analysis, explanatory & predictive models, & fact-based management to drive decisions & actions. (Davenport & Harris, 2007, p. 7)

… an outgrowth of what is known as business intelligence […] Today’s expansive, global enterprises generate a deluge of data that is impossible for a human to make sense of. (Varshney & Mojsilovic, 2011)

Analytics with a capital "A" is an umbrella term that represents our industry at a macro level, and analytics with a small "a" refers to technology used to analyze data. (Eckerson, 2011)

… information-intensive concepts and methods to improve business decision making. (Chiang et al, 2012)

… is the process of obtaining an optimal and realistic decision based on existing data. (Hamel, 2011)

… data analysis that changes the behavior of the organization. (Hackathorn, 2010)

… the science of analysis. (Wikipedia)

… the method of logical analysis. (Merriam-Webster)

… the brains to cloud computing’s brawn. (Croll, 2011)

… the process of transforming data, from a variety of sources and of a variety of types, into insights that support, improve and/or automate business decisions, using technological, quantitative and presentation techniques. (Mortenson et al, 2013)

… a group of approaches, organizational procedures and tools used in combination with one another to gain information, analyze that information, and predict outcomes of problem solutions. (Trkman et al, 2010)

… the use of data, information technology, statistical analysis, quantitative methods, and mathematical or computer-based models to help managers gain improved insight about their business operations and make better, fact-based decisions. (Evans, 2012)

• Many contrasting and often contradictory definitions
• Particularly difficult to distinguish analytics from business intelligence or similar fields
• Does it matter?
  - Potential confusion: as analytics is multi-disciplinary, it is important to establish a common language
  - Important so that the growing job market can be met with the appropriate training

What is Analytics?

Page 4

Analytics: Practical Definition


[Figure: the conventional ‘practical’ model of analytics. Source: Blackett, 2012]


Advantages
• Focuses on application & generation of value
• Demonstrates the disciplines informing analytics

Issues
• Some methods suggest different purposes
• The implied progression towards prescriptive analytics as the most advanced stage may not always hold

Page 5

Job Adverts


• Analyse “analytics” job adverts – following the tradition of ‘ASP’ studies (e.g. Liberatore and Luo, 2012)

• Instead of studying a smaller pool of jobs, we access adverts through the LinkedIn API
  - Over 250k jobs online
  - 77% of all jobs are posted on LinkedIn (Dougherty, 2012)

• Scripted using Python & stored in MongoDB
  - OAuth, SimpleJSON & PyMongo

• Need to reduce and generalise results from >6,800 adverts with >50,000 unique words.
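The last bullet describes a dimensionality problem. The snippet below is a minimal sketch of how it can be measured: it counts the adverts and the unique words in a small corpus. It assumes each advert arrives as a JSON record with hypothetical `title` and `description` fields. The standard-library `json` module stands in for SimpleJSON, and an in-memory list stands in for the MongoDB collection used in the study.

```python
import json
import re

# Hypothetical advert records; the field names are illustrative assumptions,
# not the LinkedIn API's actual schema.
RAW = """[
  {"title": "Analytics Manager",
   "description": "Lead the analytics team and manage reporting projects."},
  {"title": "Data Scientist",
   "description": "Build predictive models and report insights to the team."}
]"""


def corpus_stats(adverts):
    """Return (number of adverts, number of unique lower-cased words)."""
    vocab = set()
    for advert in adverts:
        text = f"{advert['title']} {advert['description']}".lower()
        vocab.update(re.findall(r"[a-z]+", text))
    return len(adverts), len(vocab)


adverts = json.loads(RAW)
print(corpus_stats(adverts))
```

On the real corpus, the same kind of count exceeded 50,000 unique words across more than 6,800 adverts. That scale is what motivates the pre-processing and topic modelling that follow.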


Page 6

Topic Models


• Topic models treat each document as a mixture of latent topics. The topics determine which words are used

• Probabilistic models that determine the topics by analysis of the co-occurrence of the words used

• The most common are Probabilistic Latent Semantic Indexing (pLSI) and Latent Dirichlet Allocation (LDA)
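Topic models rely on a simple signal: words that keep appearing in the same documents probably share a topic. The toy sketch below only illustrates that raw signal by counting, for each pair of words, how many documents contain both. It is not how pLSI or LDA actually estimate topics, and the three "adverts" are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how many documents each unordered pair of distinct words appears in."""
    counts = Counter()
    for doc in docs:
        # Sorting the unique words gives each pair a single canonical order.
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts


docs = [
    "digital media social analysis",
    "social media digital marketing",
    "risk systems technical design",
]
pairs = cooccurrence(docs)
print(pairs[("digital", "media")], pairs[("digital", "risk")])
```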


Page 7

Latent Dirichlet Allocation (LDA)


• Basic conception is that a collection of documents has three layers: documents, words and topics

[Figure: LDA plate diagram, adapted from Blei et al, 2003, showing documents (plate M), words (plate N), observed words W, topic assignments Z, the per-document topic distribution θ, the Alpha parameter α and the Beta parameter β]
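The plate diagram encodes LDA's generative story. First, draw a word distribution for each topic. Then, for each document, draw topic proportions θ from a Dirichlet with parameter α. Finally, for every word position, draw a topic Z and then a word W from that topic. A minimal standard-library sketch follows. It assumes β is the Dirichlet parameter of the topic-word distributions (the "smoothed" form of LDA). For simplicity it fixes the document length, rather than drawing it from a Poisson as Blei et al do.

```python
import random


def dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) draw of length k, via normalised gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, doc_len, vocab_size, n_topics, alpha, beta, seed=0):
    """Sample word ids from the LDA generative process.

    Returns (docs, theta, phi): a list of word-id lists, per-document topic
    proportions, and per-topic word distributions.
    """
    rng = random.Random(seed)
    topics, words = range(n_topics), range(vocab_size)
    # Each topic is a distribution over the vocabulary, drawn from Dirichlet(beta).
    phi = [dirichlet(beta, vocab_size, rng) for _ in topics]
    docs, theta = [], []
    for _ in range(n_docs):
        # Topic proportions for this document, drawn from Dirichlet(alpha).
        theta_d = dirichlet(alpha, n_topics, rng)
        doc = []
        for _ in range(doc_len):
            z = rng.choices(topics, weights=theta_d)[0]  # latent topic Z
            doc.append(rng.choices(words, weights=phi[z])[0])  # observed word W
        docs.append(doc)
        theta.append(theta_d)
    return docs, theta, phi


docs, theta, phi = generate_corpus(n_docs=3, doc_len=8, vocab_size=20,
                                   n_topics=2, alpha=0.5, beta=0.1, seed=1)
print(docs)
```

Fitting a topic model amounts to running this story backwards: given only the words, recover plausible θ and topic-word distributions.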

Page 8

Latent Dirichlet Allocation - Process


• Model is built by:
1. Estimating topics from the observed words
2. Using these to estimate document topic proportions
3. Evaluating the corpus against the distributions suggested in (1) & (2)
4. Using (3) to improve the topic estimates in (1)
5. Repeating until the best fit is found
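The estimate-and-refine loop above can be sketched with collapsed Gibbs sampling, one common way of fitting LDA. This technique is swapped in for illustration only: libraries such as gensim fit LDA with variational inference instead, and the hyperparameter defaults below are assumptions. Each sweep takes one word token at a time and removes its current topic assignment. It then re-weights every topic by how common that topic is in the document and how common the word is in that topic, and resamples. Repeated sweeps play the role of the "improve and reiterate" steps.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to docs (lists of word ids) by collapsed Gibbs sampling.

    Returns (theta, phi): estimated document-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]                # topic counts per document
    nkw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    nk = [0] * n_topics                                 # total tokens per topic
    z = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Weight each topic by document-topic and topic-word evidence.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Resample the topic and feed it back into the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][w] + beta) / (nk[t] + vocab_size * beta) for w in range(vocab_size)]
           for t in range(n_topics)]
    return theta, phi


docs = [[0, 1, 2, 0, 1, 2], [3, 4, 5, 3, 4, 5], [0, 2, 1, 1, 0, 2], [5, 3, 4, 4, 5, 3]]
theta, phi = gibbs_lda(docs, n_topics=2, vocab_size=6, iterations=50)
print(theta)
```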


Page 9

Latent Dirichlet Allocation - Assumptions


• Bag-of-words / exchangeability

• The number of topics (K) is known and pre-determined
  - Cross-validation is used to identify the K with the lowest perplexity

• Topic independence
  - As topic proportions are drawn from a Dirichlet prior (parameter α), topics are assumed to be independent and uncorrelated
  - In this research, correlation between topics has to be assumed. The alternative is the correlated topic model (Blei & Lafferty, 2007), which uses a logistic normal rather than a Dirichlet distribution
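Two of the assumptions above can be made concrete:

- `perplexity` computes exp(-log-likelihood per token) for documents under given topic proportions θ and topic-word distributions φ. The sketch assumes θ for the held-out documents is already known; in real cross-validation it has to be inferred.
- `best_k` picks the K with the lowest perplexity.
- `logistic_normal` draws topic proportions the way the correlated topic model does, by passing a correlated Gaussian through a softmax.

All three are hypothetical helpers, not a library API.

```python
import math
import random


def perplexity(docs, theta, phi):
    """exp(-average log-likelihood per token) of word-id lists under theta and phi."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            # p(w | d) mixes topic-word probabilities by the document's topic proportions.
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def best_k(perplexity_by_k):
    """Number of topics K with the lowest (e.g. cross-validated) perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


def logistic_normal(mu, cov_chol, rng):
    """Topic proportions from a logistic normal, as in the correlated topic model.

    cov_chol is a lower-triangular Cholesky factor of the covariance, so topics
    can be positively or negatively correlated, unlike under a Dirichlet prior.
    """
    k = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    # eta = mu + L z is a correlated Gaussian draw.
    eta = [mu[i] + sum(cov_chol[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
    top = max(eta)
    # Softmax maps eta onto the simplex (shifted by the max for numerical stability).
    exp_eta = [math.exp(e - top) for e in eta]
    total = sum(exp_eta)
    return [e / total for e in exp_eta]


# Topics 0 and 1 share most of their Gaussian noise, so they tend to rise and fall together.
rng = random.Random(0)
print(logistic_normal([0.0, 0.0, 0.0], [[1.0, 0.0, 0.0], [0.9, 0.4, 0.0], [0.0, 0.0, 1.0]], rng))
```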


Page 10

Data Pre-Processing & Model Build


• Strip HTML / XML
• Remove stop words, numbers and punctuation
• Remove words of fewer than 3 characters
• Remove the most and least frequent words

Python: HTMLParser, gensim and string; R: tm and topicmodels
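A minimal standard-library sketch of these pre-processing steps follows. It uses Python 3's `html.parser.HTMLParser`; in Python 2 this parser lived in a top-level `HTMLParser` module. The tiny stop-word list and the document-frequency thresholds `min_docs` and `max_doc_share` are illustrative assumptions. gensim's `Dictionary.filter_extremes(no_below=..., no_above=...)` offers a similar frequency filter.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stop list (an assumption); real pipelines use a full list.
STOP_WORDS = {"the", "and", "for", "with", "you", "our", "are", "will"}


class TextExtractor(HTMLParser):
    """Collects text content from HTML/XML, discarding the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def tokenise(markup):
    """Strip markup, lower-case, and keep alphabetic non-stop words of 3+ characters."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    text = " ".join(parser.parts).lower()
    # Matching runs of letters only also drops numbers and punctuation.
    words = re.findall(r"[a-z]+", text)
    return [w for w in words if len(w) >= 3 and w not in STOP_WORDS]


def filter_extremes(token_lists, min_docs=2, max_doc_share=0.5):
    """Drop words found in fewer than min_docs documents or in more than max_doc_share of them."""
    doc_freq = Counter(w for tokens in token_lists for w in set(tokens))
    n_docs = len(token_lists)
    keep = {w for w, df in doc_freq.items()
            if df >= min_docs and df / n_docs <= max_doc_share}
    return [[w for w in tokens if w in keep] for tokens in token_lists]


adverts = [
    "<p>Lead the <b>analytics</b> team, 5 years' experience.</p>",
    "<p>Analytics role: build models &amp; reports.</p>",
    "<div>Build dashboards for the risk team.</div>",
    "<p>Risk models for 2013 and beyond.</p>",
]
tokens = [tokenise(a) for a in adverts]
print(filter_extremes(tokens))
```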

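• To stem or not to stem? Stemming merges the first pair of phrases below, which mean the same thing. It also makes the second pair look alike, although they describe different things:
  - "the job involves managing analytics projects"
  - "the job involves the management of analytical projects"
  - "has experience running projects using management science and analytics"
  - "managing a team of scientists analysing the experience of runners"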
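To make the stemming trade-off measurable, the sketch below compares word overlap (Jaccard similarity) between the two pairs of example phrases, with and without stemming. The suffix stripper is a deliberately crude, hypothetical stand-in for a real stemmer such as Porter's, and the two-word stop list is an assumption.

```python
# Crude, hypothetical suffix stripping standing in for a real stemmer such as Porter's.
SUFFIXES = ("ement", "ical", "ing", "ics", "ers", "s")
STOP_WORDS = {"the", "and"}


def toy_stem(word):
    """Strip the first matching suffix, provided at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(sentence, stem=False):
    """Set of words of 3+ characters that are not stop words, optionally stemmed."""
    words = [w for w in sentence.lower().split() if len(w) >= 3 and w not in STOP_WORDS]
    return {toy_stem(w) if stem else w for w in words}


def overlap(a, b, stem=False):
    """Jaccard overlap between the term sets of two sentences."""
    ta, tb = terms(a, stem), terms(b, stem)
    return len(ta & tb) / len(ta | tb)


s1 = "the job involves managing analytics projects"
s2 = "the job involves the management of analytical projects"
s3 = "has experience running projects using management science and analytics"
s4 = "managing a team of scientists analysing the experience of runners"
for stem in (False, True):
    print(stem, round(overlap(s1, s2, stem), 2), round(overlap(s3, s4, stem), 2))
```

With these rules the first pair, which means the same thing, goes from an overlap of 3/7 to 1. The unrelated second pair also rises, from 1/13 to 3/11, because "managing"/"management" and "running"/"runners" collapse to shared stems.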


Page 11

Topic Results
• 30 topics identified
• All topics are created equal, but some are more topical than others

[Figure: bar chart of the most likely topic per document as % of corpus, axis from 0% to 45%]
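The chart's measure can be computed from any fitted document-topic matrix. Assign each advert to its most likely topic, then report each topic's share of the corpus. The sketch assumes the matrix is a list of per-document topic proportions, and the example matrix is made up. Ties go to the lower topic index.

```python
from collections import Counter


def topic_shares(doc_topic):
    """Fraction of documents whose most likely topic is each topic index."""
    # Index of the largest proportion in each row (ties resolve to the first index).
    top = [max(range(len(row)), key=row.__getitem__) for row in doc_topic]
    counts = Counter(top)
    return {k: counts[k] / len(doc_topic) for k in sorted(counts)}


doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
print(topic_shares(doc_topic))
```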

Page 12

Most Likely Terms in Topics
• Analysis of the 3rd, 4th & 5th most likely topics

Topic 3 (4th): Digital & Web (8%)
other, media, across, working, understanding, analysis, social, projects, responsible, required, ensure, within, design, key, performance, digital, company, manager, products, their, lead, tools, role, services

Topic 13 (3rd): Consultancy (17%)
working, market, develop, project, software, process, media, reporting, key, through, requirements, solutions, manager, excellent, your, strategy, multiple, more, service, opportunity, manage, well, opportunities, clients

Topic 9 (5th): Technical (7%)
risk, systems, design, solutions, services, other, tools, technical, teams, related, provide, required, position, degree, such, operations, global, skills, project, opportunity, clients, service, excellent, products

Page 13

Most Likely Terms in Topics (cont.)
• Analysis of the top two most likely topics

Topic 20 (1st): Strategic (41%)
reporting, analysis, media, required, strategy, related, strategic, manager, company, degree, risk, online, products, across, drive, must, manage, responsible, well, financial, planning, industry, lead, software

Topic 21 (2nd): Computing (20%)
services, solutions, technology, clients, digital, consulting, your, more, implementation, management, oracle, technical, capabilities, design, provide, advisory, strategy, integration, technologies, sap, career, enterprise, solution, architecture
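The term lists rank words by each topic's word probabilities. The minimal sketch below assumes a topic-word matrix `phi`, whose rows sum to one, and a `vocab` list mapping word ids to strings. Both names and the toy values are illustrative. Ranking by raw probability favours words that are frequent across the whole corpus. That may explain why generic advert words such as "required", "excellent" and "services" appear in several of the lists above.

```python
def top_terms(phi, vocab, n=5):
    """Return the n most probable words for each topic, most probable first."""
    result = []
    for row in phi:
        ranked = sorted(range(len(row)), key=lambda w: row[w], reverse=True)
        result.append([vocab[w] for w in ranked[:n]])
    return result


vocab = ["risk", "systems", "media", "digital", "strategy"]
phi = [
    [0.40, 0.30, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.45, 0.35, 0.10],
]
print(top_terms(phi, vocab, n=2))
```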

Page 14

Model Analysis
• Main five topics: Technical, Digital/Web, Consultancy, Computing, Strategic

• ‘Digital/Web’ is a specialism within analytics (as is ‘Financial’)

• ‘Technical’ & ‘Consultancy’ are specific job types or environments
  - However, some technical (‘hard’) skills & some consulting-type (‘soft’) skills are likely to be required in all analytics jobs

• ‘Computing’ & ‘Strategic’?

Page 15

The Analytics of Computing?

[Figure: "Discovery Analytics". Techniques placed on a soft/hard axis, each shown at basic and at advanced analytics capability]

Basic analytics capability → Advanced analytics capability
• Data Warehouses → Big Data Architecture
• Stock Market Analysis → Algorithmic Trading
• Fraud Investigation → Automatic Fraud Detection
• Customer Segmentation → Propensity Modeling
• Clickstream Analysis → Behavioural Targeting
• Qualitative Text Analysis → Natural Language Processing
• Reports & Dashboards → Advanced Visualisation

Page 16

The Analytics of Strategy?

[Figure: "Decision Analytics". Techniques placed on a soft/hard axis, each shown at basic and at advanced analytics capability]

Basic analytics capability → Advanced analytics capability
• Trial & Error Experimentation → Optimisation / Simulation
• Basic Forecasting → ARIMA Time Series
• Performance Metrics → Data Envelopment Analysis
• A/B Testing → Multivariate Testing
• Business Analysis → Business Process Optimisation
• Requirements Gathering → Problem Structuring

Page 17

An Alternative Definition of Analytics

• Descriptive Analytics: statistical and data modeling techniques designed to describe past events and answer "what happened?"

• Predictive Analytics: data mining and machine learning techniques used to predict future events and answer "what will happen next?"

• Prescriptive Analytics: OR/MS, advanced statistical and mathematical models used to prescribe future actions and answer "what should we do next?"

Page 18

An Alternative Definition of Analytics

[Figure: a spectrum from technological (lower-risk decisions) to strategic (higher-risk decisions), divided into Discovery Analytics and Decision Analytics, each with an advanced tier (Advanced Discovery Analytics, Advanced Decision Analytics). Techniques shown: reporting & alerts, market research, information systems, basic historical analysis, performance metrics, stakeholder consultation, advanced visualisation, real-time insights, automated decisions, advanced modelling, problem structuring, decision analysis]

Page 19

Summary & Implications for OR/MS
• Implemented a correlated topic model on 6,873 job adverts

• An alternative practical definition of analytics has been suggested: discovery and decision analytics
  - Maintains the focus on business value, application & the disciplines that inform analytics
  - However, removes the contradictions in the previous model

• OR/MS has an obvious role in advanced decision analytics, in both hard and soft applications

• Further exploration (and/or promotion) of the role of OR/MS in advanced discovery analytics

Page 20

Contact Details and Questions


Website: www.whatisanalytics.co.uk

Mobile: 07833 XXXXXX

LinkedIn: http://www.linkedin.com/profile/view?id=114000243&trk=tab_pro (or search Michael Mortenson)
