Data science unit introduction

Data Science Startup Discussion Document

Transcript of Data science unit introduction

Page 1: Data science unit introduction

Data Science Startup Discussion Document

Page 2: Data science unit introduction

Intent

This overview is not intended to be a business case for data science. It is expected that you are already familiar with the value proposition. However, references to several case studies have been included at the end of this document as a reminder of the broad applicability of the subject at hand.

The intent of this document is to set in motion the discussion for the creation of a startup in South Africa that is focused on data science. To be clear, the objectives of this startup are to:

Capture the best talent that exists in South Africa in data science

Be the leader in data science in Southern Africa and the go-to for organisations seeking services, products and training

Be a leader in the global data science marketplace by having the best people in the business and a competitive advantage over international firms based on lower people costs

Do not be deluded into thinking that this undertaking is at all easy. However, the challenges inherent in this undertaking are also an opportunity, as they serve as barriers to entry for those seeking to compete.

copyright Gregg Barrett August 2016

Page 3: Data science unit introduction

Examples of current uses of data science

- Detection of unauthorized trading activity

- Accelerating biomedical research

- Identification of data abuse to protect sensitive information and intellectual property

- Discovery of patterns of behaviour and links between key actors

- Preparing for major political and economic transformations

- Anticipating emerging threats such as the planning of terrorist attacks

- Accurate rating for insurance underwriting

- Improving patient outcomes

- Predicting disease outbreaks

- Predicting the path of wildfires

- Detection of information security threats

- Detection and elimination of sophisticated criminal activity

- Identification of poachers from real time drone footage and audio networks

- Autonomous driving vehicles

- Managing datacentre infrastructure

- Product recommendations

- Predicting part failure

- Improving transportation efficiency

- Customer/contact centre support

- Understanding consumer sentiment

- Market making in securities

- Language translation

- Credit scoring

- New craft beer recipes!

Page 4: Data science unit introduction

Describing data science is like trying to describe a sunset.

It should be easy, but somehow capturing the words is impossible

(Booz Allen Hamilton, 2015)

Page 5: Data science unit introduction

We shall use the following definition

Data science is the utilisation of a vast set of tools for modelling and understanding complex datasets.

To simplify matters, we shall consider:

analytics

machine learning

artificial intelligence

and big data

as being part of our data science framework.

Data science is NOT:

fancy looking reports (product of SQL queries)

spiffy dashboards (sexy bar graphs and pie charts)

a wonderfully expensive Business Intelligence offering

Page 6: Data science unit introduction

The future of data science

What happened? A company that wasn’t even in your industry launched a new product and has completely flattened you. Sound familiar? It does for anyone who’s familiar with Uber. Uber first launched as a transportation service, using data and analytics to provide customers with easy, accessible and fast transportation directly from their phone. Now, Uber has since expanded to beyond just transportation, offering additional services from consumers’ phones such as meals and delivery. (IBM, 2016)

Some of the hottest, most critical domains in which data science will be applied in the coming years include:

Cybersecurity including advanced detection, modelling, prediction, and prescriptive analytics

Healthcare including genomics, precision medicine, population health, healthcare delivery, health data sharing and integration, health record mining, and wearable device analytics

IoT (Internet of Things) including sensor analytics, smart data, and emergent discovery alerting and response

Customer Engagement and Experience including 360-degree view, gamification, and just-in-time personalization

Smart X, where X = cities, highways, cars, delivery systems, supply chain, and more

Precision Y, where Y = medicine, farming, harvesting, manufacturing, pricing, and more

Personalized Z, where Z = marketing, advertising, healthcare, learning, and more

Human capital (talent) and organizational analytics

Societal good (Booz Allen Hamilton, 2015)

Page 7: Data science unit introduction

Examples of those with data science at their core

Two of the world’s most successful hedge funds:

Renaissance Technologies LLC

Bridgewater Associates

A British startup founded in 2010 and acquired by Google in 2014 for around 600 million USD:

DeepMind

One of the first data science consulting firms, founded in 1995:

Elder Research

A startup focused on autonomous driving:

comma.ai

A startup focused on cybersecurity:

SparkCognition

Page 8: Data science unit introduction

Fighting blind without data science

Float like a butterfly, sting like a bee,

for most firms in South Africa they can’t hit what they can’t see.

Page 9: Data science unit introduction

Why South Africa

The value proposition for data science in South Africa is no different from that in other countries.

Globally, skills are in short supply, and in South Africa the problem is even more acute.

For the handful of people in South Africa with the necessary competence, opportunities abroad are compelling, as compensation is around 3 times what they would receive in South Africa.

Data science in South Africa is, for the most part, in a nascent state. Leading solution providers, for example, have no presence anywhere on the African continent:

MapR, Cloudera, Hortonworks, Datameer, Trifacta, Paxata, Palantir, Elder Research, Alpine Data Labs, RapidMiner, SparkCognition, Pivotal Software

For international organisations, weakness in the South African economy and in the South African rand makes the value proposition of a South African-based provider compelling.

Page 10: Data science unit introduction

It’s more about people than about machines

At the very core of this undertaking are people - they are the key to success. Only the truly brilliant will do. They are the outliers and are not easily sourced or recruited. Fortunately, these people tend to be averse to labels such as “Fortune 500”, “multinational” and “blue chip organisation”, which invoke thoughts of stifling bureaucracy and politics. What appeals to them is a startup, where they have their say, are individuals within a team, have a stake in something that can make a difference and can be themselves.

They are a rather scarce commodity in South Africa. However, this presents an opportunity, as the scarcity of talent serves as an impediment to firms seeking to compete and build competence in this space.

Capturing the best and the brightest in the data science market in South Africa is a primary objective.

What it takes to manage such an operation.

Page 11: Data science unit introduction

Winner-takes-all

In this field, one brilliant person can deliver the work of 10 average people. It is critical that every individual who is part of this startup has skin in the game through an equity stake. The equity position serves to attract and retain the people we seek.

People cost is the single largest cost, but also a source of competitive advantage. As a guide for data science positions in the United States:

Entry level position: 100 000 USD base salary

Mid-level position: 150 000 – 250 000 USD base salary

Senior level position: 300 000 – 500 000 USD base salary

South Africa cannot compete with such levels of compensation, which is a contributing factor in why much of the talent leaves the country. However, we do not need such compensation levels in order to be successful. It is estimated that we can comfortably operate at around 65% to 75% of the cost of a comparable firm in the US. A cost saving of even 25% will be a major competitive differentiator and particularly attractive to international firms.

As a guide in South Africa we would aim for:

Entry level position: 600 000 ZAR base salary

Mid-level position: 800 000 ZAR base salary

Senior level position: 1 000 000 ZAR base salary

We believe the following strategy will be attractive:

compensation levels higher than what is currently offered by local organisations

an equity position

being part of a startup composed only of the best

an opportunity to make a major impact

Page 12: Data science unit introduction

About people

I said that the best in this business are a rather scarce commodity, but what do they look like? Here are a couple of examples:

Gabor Melis

George Hotz

What are some of the skills that these persons possess? The document “The Quest for Unicorns” by Elder Research serves as a good starting point:

The Quest for Unicorns by Elder Research

The following article from The Economist gives some insight into just how intense the arms race for talent has become:

As Silicon Valley fights for talent, universities struggle to hold on to their stars

Page 13: Data science unit introduction

Options for South African organisations pushing forward on data science

1. Build the capability internally: Such an approach will be challenging, with most firms not even knowing where to start. The shortage of talent simply compounds the problem.

2. Retain the services of an outside firm: There are several outside of South Africa. Such an approach will be costly, though, because of dollar exposure. The likely approach will therefore be to restrict the search to the local market, which supports our proposition. Conversely, the same proposition will be appealing to international firms.

3. Incubate/finance a separate entity and in so doing gain the necessary business capability as well as the added benefit of an equity position which could generate financial gain.

Page 14: Data science unit introduction

What is needed

The following options are being considered:

Startup via funding: funding the startup as a wholly separate entity for a three year stretch in exchange for an equity position

Startup via incubation: incubating the startup within an existing organisation, where the startup generates value for the organisation and where the organisation has an equity position in the startup with a view to a spin off once it has reached sufficient scale

Startup via initial clients: securing sufficient initial clients under contract to cover start-up costs

Page 15: Data science unit introduction

Budget

We are looking to put together a 5 to 10 person team. This would require a budget of 5 – 10 million ZAR a year for three years.

The budget calculation is rather straightforward:

5 million ZAR a year for a 5 person team

10 million ZAR a year for a 10 person team
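The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the flat cost of 1 million ZAR per person per year is inferred from the two figures above rather than stated as such, and the function names are hypothetical.

```python
# Assumption: a flat annual cost per person, implied by
# "5 million ZAR a year for a 5 person team" (not stated directly).
COST_PER_PERSON_ZAR = 1_000_000

# The funding period mentioned above: three years.
FUNDING_YEARS = 3


def annual_budget(team_size: int) -> int:
    """Annual budget in ZAR for a team of the given size."""
    return team_size * COST_PER_PERSON_ZAR


def total_budget(team_size: int, years: int = FUNDING_YEARS) -> int:
    """Total budget in ZAR over the funding period."""
    return annual_budget(team_size) * years


if __name__ == "__main__":
    for team_size in (5, 10):
        print(
            f"{team_size} person team: "
            f"{annual_budget(team_size):,} ZAR a year, "
            f"{total_budget(team_size):,} ZAR over {FUNDING_YEARS} years"
        )
```

Under that assumption, the three-year commitment would range from 15 million ZAR for a 5 person team to 30 million ZAR for a 10 person team.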

The nature of the business means that it does not require investment in physical assets. Electricity and an internet connection for access to cloud infrastructure are the primary requirements. The startup is thus minimally exposed to risks in the South African operating environment. Further, cloud infrastructure requirements are scaled as and when needed – pay as you go.

Risk

Possible increases in income tax and corporate tax rates in South Africa are viewed as a risk that could place upward pressure on operating costs. However, there are options to mitigate this risk.

Page 16: Data science unit introduction

Revenue sources

Consulting

Strategy

Execution

Product

Product will be created as and when the need arises. However, consulting would be the initial focus, with product being a longer-term focus.

Training

Approach

The approach is to be as agnostic as possible when it comes to platform/technology/products.

We would also seek to develop academic collaboration with the likes of UCT and WITS.

Example of the Bloomberg Labs Data Science program.

Page 17: Data science unit introduction

Areas for consulting

The Cross Industry Standard Process for Data Mining (CRISP-DM) is a process model that provides a reference methodology for conducting data mining. The tasks and outputs listed in the model give examples of areas where consulting work can be provided in executing a data science project.

Figure 1: Generic tasks (bold) and outputs (italic) of the CRISP-DM reference model

Page 18: Data science unit introduction

Optionality through data driven business models

There is a growing trend of data-driven technology companies using their own solutions to compete with incumbents in the marketplace, as opposed to licensing their offerings to those incumbents. For example, suppose that Google finds a new way to price and deliver insurance. An approach that is seemingly being considered more frequently is for such a company to set up its own insurance entity rather than license the solution to existing participants in the insurance market. With negative interest rates in many parts of the world, capital is abundant, and operating licences are not impossible to obtain.

Mondo is an example of such thinking:

Digital challenger bank Mondo just got its banking licence

Uber is another:

Uber’s First Self-Driving Fleet Arrives in Pittsburgh This Month

Page 19: Data science unit introduction

Data Charlatans

I spoke earlier of the need to recruit the best and the brightest. Why, you ask? Get things wrong and at best you look silly; at worst you blow things up:

Example of getting it wrong and looking silly:

John Gray: Steven Pinker is wrong about violence and war

Example of blowing up:

Recipe for Disaster: The Formula That Killed Wall Street

Big Data brings its own set of challenges:

Beware the Big Errors of ‘Big Data’

Beyond Big Data: Identifying Important Information for Real World Challenges

Page 20: Data science unit introduction

A note for insurance

Traditional actuarial approaches are no match for the data and computing resources now available, with the likes of gradient boosting machines, neural networks and ensembles of these providing far superior levels of accuracy.

“As more insurers use predictive analytics, those not doing so will be increasingly exposed to adverse selection because their market will be limited to a subsection for the general population that has worse-than-average loss ratios.” (Nyce, 2007)

Analytics has the potential to make a positive impact on virtually every aspect of the insurance life cycle.

Product development

Marketing and distribution

Pricing and underwriting

Risk control

Claims management

Performance management (Accenture, 2013, pg. 5)

For a more comprehensive overview of data science in insurance:

Value proposition of analytics in P&C insurance

Page 21: Data science unit introduction

Further reading of potential interest

Bridgewater Associates building an artificial intelligence competence: Bridgewater Is Said to Start Artificial-Intelligence Team

Bloomberg LP building a machine learning competence: Bloomberg and “the magic” of machine learning

Example of Google using its DeepMind unit to save on energy consumption: Google Cuts Its Giant Electricity Bill With DeepMind-Powered AI

Example of the arms race for data: Tiny Satellites: The Latest Innovation Hedge Funds Are Using to Get a Leg Up

Page 22: Data science unit introduction

Case studies on data science abound on the internet, for example:

Healthcare: When Health Care Gets a Healthy Dose of Data – Intermountain Healthcare

Industrial: The Industrial Internet – GE Digital

Automotive: The Connected Vehicle Data Platform – Ford Motor Company

Insurance: Geospatial Analytics – Progressive Insurance

Case Studies from MIT Sloan Management Review: MIT Sloan Management Review Case Studies

Case Studies from Elder Research:

Defense and intelligence: Automating Textual Data Discovery And Analysis

Nonprofit Service Organization: Determining Influential Factors for Conference Satisfaction

Pharmaceutical: Discovering the Efficacy of a New Drug

Retail, Consumer Electronics: Enhancing Customer Loyalty

Government, Healthcare: Improving Claims Approval Speed and Accuracy

Retail Banking, Financial Services: Improving Credit Card Risk Scoring

Telecommunications: Improving Customer Retention and Profitability

Healthcare Insurance: Improving Provider Performance and Patient Outcomes

Retail Banking, Financial Services: Predicting Financial Account Churn

Oil and Gas: Predicting Natural Gas Well Freezing

Government: Prioritizing Building Lease Renewals

Healthcare Insurance: Prioritizing Long-Term Care Claims

Government: Reducing Fraud, Waste, and Abuse

Retail, Computer and Electronic, Product Manufacturing: Reducing Service Provider and Warranty Fraud

IT Management: Staffing Optimization

Insurance: Understanding Customer Sentiment

Retail, Commercial Software: Using Log Analytics to Improve User Experience

There is no shortage of conferences either, for example: Bloomberg Data for Good Exchange

Organized around the following topic areas:

- Justice and fairness, including criminal justice, discrimination, algorithmic bias, workers’ rights, voting rights, etc.

- Economic development, including housing, job security, immigration, wages, challenges coming from the “gig” economy, remittance services, etc.

- Security and safety, including emergency services, cyber-attacks, dark web and illegal content, gun control, resilience, etc.

- Public service delivery, including transportation, sustainability, biodiversity and health monitoring, public health, etc.

Page 23: Data science unit introduction

Compiled by:

Gregg Barrett

Page 24: Data science unit introduction

References

Accenture. (2013). The digital insurer: achieving payback in insurance analytics. [pdf]. Retrieved from http://www.accenture.com/us-en/Pages/insight-payback-insurance-analytics.aspx

Booz Allen Hamilton. (2015). The field guide to data science. [pdf]. Retrieved from https://www.boozallen.com/content/dam/boozallen/documents/2015/12/2015-FIeld-Guide-To-Data-Science.pdf

CRISP-DM. (2000). Generic tasks (bold) and outputs (italic) of the CRISP-DM reference model. [Figure]. In CRISP-DM 1.0. [pdf]. Retrieved from https://the-modeling-agency.com/crisp-dm.pdf

IBM. (2016). Why data science should be your top priority. Retrieved from http://www.ibmbigdatahub.com/blog/why-data-science-should-be-your-top-priority

Nyce, C. (2007). Predictive analytics white paper. [pdf]. Retrieved from http://www.theinstitutes.org/doc/predictivemodelingwhitepaper.pdf
