Applied AI Tech Talk: How to Setup a Data Science Dept


Tech Talks: How to Setup a Data Science Business Function Jun 2015 www.applied.ai

How to Setup a Data Science Business Function

Applied AI Tech Talk

● We are data scientists:

○ variously quants, statisticians, actuarial & machine learning types

● We are consultants:

○ we do complex data analysis, predictive modelling etc

○ and we also help to do the soft stuff...

… enabling companies to learn from their data in a sustainable way

This is a totally biased talk

Like any collaborative business effort involving research & development, a data science function should be built carefully in order to enable the best expertise and technologies.

- Me, ~2 weeks ago

http://blog.applied.ai/how-to-build-a-data-science-business-function/

How to Setup a Data Science Business Function a.k.a.

Making in-house Data Science sustainable

Data Science is a broad discipline

● Including, for example:

○ one-off scenario-specific modelling exercises

○ on-line predictive modelling of user actions

○ regular analysis of campaigns and customer discovery

… and a significant amount of data acquisition, preparation, storage etc

● To be sustainable and minimise risk, we need to combine:

○ great people

○ advanced maths

○ scientific experimentation

○ software engineering

○ high-quality data

○ solid business practices

○ communication

The most important thing is communication

https://www.quora.com/How-could-the-Data-Science-Venn-Diagram-be-improved

Four main areas to cover:

1. Setting up and sizing the team

2. Defining and operating projects

3. Systemising the data pipeline and analyses

4. Ensuring effective communication

… to help us make in-house Data Science sustainable

● The practitioner will use a wide variety of tools to:

○ acquire, manipulate, store and access data efficiently

○ design surveys and scientific experiments to test hypotheses

○ undertake statistically valid analyses

○ implement high-quality, optimised predictive models

○ derive and communicate actionable insights

… requiring diverse skills covering database management, software engineering, statistical analysis, machine learning, graphic design, ethics, social responsibility, domain knowledge and communication.

1. Setting up and sizing the team

Data Scientists need a lot of skills!

● But the days of hiring a single, unicorn-like, 'full-stack' data scientist are pretty much gone, and probably never really existed.


Don’t believe in unicorns

The team needs to be small, agile and focused:

● 2-6 data scientists is ample

● they should be proven generalists, team-players and pragmatists

● able to cope with vague requirements, messy data and high failure rates

“The first hire(s) should help get three things ready: your data; a clear problem to be solved; and a process to evaluate the business impact of any new solution.”

- Simon Chan, Forbes, April 2015 http://www.forbes.com/sites/theyec/2015/04/30/how-to-do-your-first-data-science-hire-right/


Start with a small, focused team

Any piece of research or development likely to last more than a few days and/or involve more than one person should have:

● A primary sponsor and a project leader

● A well-defined goal (SMART), and a written spec

● Progress meetings to validate and update the plan, with full and frank communication between major stakeholders

● Knowledge sharing upon completion

● Consider maintaining a basic RACI and risks & issues register.

2. Defining and operating projects

Automate good workflows and deal with technical debt:

● Understand and map the data 'pipeline'

● Stop when the models are good enough

● Encourage a systematic, shared approach to the creation of all machine learning tools and analyses, with:

○ proper source control and documentation

○ code reviews & 'lunch and learn' seminar sessions

○ regular refactoring of algorithms, applications and data preparation scripts where appropriate.

3. Systemising the data pipeline and analyses
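One possible reading of "stop when the models are good enough" is to agree a target score with the sponsor up front, try candidate models from simplest to most complex, and stop at the first one that meets the target. The sketch below is only an illustration of that idea, not a method from the talk: the function name `first_good_enough`, the model names and the scores are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_good_enough(
    candidates: Iterable[str],
    evaluate: Callable[[str], float],
    target: float,
) -> Optional[Tuple[str, float]]:
    """Try candidates in order (simplest first) and stop at the first one
    whose score meets the agreed target. If none meets it, return the best
    candidate seen. Returns None when there are no candidates."""
    best: Optional[Tuple[str, float]] = None
    for name in candidates:
        score = evaluate(name)
        if best is None or score > best[1]:
            best = (name, score)
        if score >= target:
            break  # good enough: stop spending effort on more complex models
    return best


# Hypothetical validation scores, made up purely for illustration.
example_scores = {"baseline": 0.71, "regularised_glm": 0.83, "deep_ensemble": 0.84}
chosen = first_good_enough(example_scores, example_scores.__getitem__, target=0.80)
print(chosen)
```

With these made-up numbers the loop stops at the second candidate and never evaluates the most complex one, which is the point: the extra 0.01 is not worth the added maintenance burden if the target has already been met.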

Strong communication within & without the team is vital, helping to ensure that projects stay on-track and issues are spotted early:

● Daily stand-up meetings (<10 mins), sharing immediate activities & issues

● An up-to-date communal task schedule - e.g. the Kanban methodology

● Simplified and centralised comms tech; move written discussions away from email and towards wikis, message boards, and group chats (e.g. Slack)

● Try to allow data scientists / software engineers the time & space to get into a productive flow state without meetings and interruptions.

4. Ensuring effective communication

● Start with a small team of capable generalists and work hard to define the business problems and success criteria, set timescales, and understand & access the available data

● Allow for and embrace failure; give data scientists time and space to research and experiment

● Specialise when necessary, automate where possible and embed into an ongoing cycle of development, maintenance and support

● Require a corporate sponsor with clout and encourage strong communication within the team and with the rest of the business

http://blog.applied.ai/how-to-build-a-data-science-business-function/

In review

Applied AI is a data science consultancy

We provide data-driven insights and solutions using applied artificial intelligence

www.applied.ai

Thank You

Any questions?