13/06/2018

Making data science pay:

Mastering the challenges of analytics operations

Michel Debiche, Guest Analyst, STAC

michel.debiche@STACresearch.com

Copyright © 2018 Securities Technology Analysis Center LLC

Cognitive Reset Part 1: Window management technology

Cognitive Reset Part 2: Window management context

Investment Process

[Diagram: Gather information → Digest information → Make decisions → Execute decisions, with flows labeled Actions, Results, and Info]

Pressures

• Scale

    • Volume

    • Variety

    • Density

    • Computational complexity

    • Velocity of innovation

• Cost

• Regulation

Dimensions of scale

• Scale

    • Volume

    • Variety

        • Kinds of data: structured, unstructured, text, binary

        • Data entities: millions of time series

    • Density

        • Transactions in microseconds

        • Simultaneous transactions on multiple channels

    • Computational complexity

        • NLP, image processing, AI

    • Velocity of innovation

        • Competitive pressures: new datasets, new models, new technologies

        • Evolving opportunities

        • Feedback loops

Responses

• DevOps

• Data Lake

• Open Source

• Big Data

• Data Science

• AI

Issues

• Model Factories: Hundreds of models with nowhere to go

• Redundant engineering

• Open source interoperation and upgrade nightmares

• Murky, expensive data lakes contributing little value

• Skills mismatches

• User resistance to new technologies

• Data lineage, audit trails

Goals

• Maximize returns

• Minimize risk

    • Market risk

    • Model risk

    • Systems risk

    • Data risk

    • Operational risk (people)

• Maximize productivity

Principles

• Optimize use of resources

    • People

    • Time

    • Data

    • Technology

• End-to-end process design

• Agility

• Constant improvement

Industrial Engineering

• Similar challenges and goals

• Its practices eventually came to software engineering as DevOps

• Need to carry the paradigm over to the full data-to-decision pipeline

• Why is it so hard?

DevOps: Elegant Concept

DevOps: More complicated to implement

So let’s think about QuantOps™

Investment Process, Expanded

[Diagram: investment process components (research data, develop and test models; DevOps for data prep, analytical functions, API; production pipeline: data to curated features; model scoring engine; model testing manager) and the artifacts exchanged between them (core data, research data, ad hoc data ingestion, ideas, function library, function definition, feature definition, feature preparation code, feature updates, features, model definition, model repository, model suite updates, test backlog, scores, results)]

A Unifying Paradigm: QuantOps as a DAG

• Standardize the connections

• Carefully define the data APIs

• Then all the technology is pluggable

• Makes it possible to efficiently address:

    • Orchestration

    • Data lineage

    • Monitoring

    • Audit trails

    • Automated code generation and testing
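The DAG framing can be illustrated with a small sketch. This is a minimal, hypothetical Python example and not STAC's or the author's design.

- Each step is a node that declares its upstream inputs by name.
- The standard-library `graphlib.TopologicalSorter` (Python 3.9+) supplies the orchestration order.
- Every execution appends a lineage record holding short SHA-256 fingerprints of the step's inputs and output. This is a simple form of audit trail.

The `Node` class, the `run_dag` and `fingerprint` helpers, and the step names `raw_data`, `features`, and `scores` are illustrative assumptions. Step outputs must be JSON-serializable for the fingerprinting to work.

```python
import hashlib
import json
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable


@dataclass
class Node:
    """One pipeline step (hypothetical): a function fed by named upstream nodes."""
    name: str
    func: Callable[..., Any]
    inputs: tuple[str, ...] = ()


def fingerprint(value: Any) -> str:
    """Short SHA-256 of a JSON-serializable value, used as a lineage identifier."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_dag(nodes: list[Node]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Run nodes in dependency order; return their outputs and an audit log."""
    by_name = {n.name: n for n in nodes}
    # Reject connections to steps that were never defined.
    for n in nodes:
        missing = [i for i in n.inputs if i not in by_name]
        if missing:
            raise ValueError(f"{n.name} depends on undefined nodes: {missing}")

    # Map each node to its predecessors; static_order() raises
    # graphlib.CycleError if the graph is not acyclic.
    graph = {n.name: set(n.inputs) for n in nodes}
    outputs: dict[str, Any] = {}
    audit: list[dict[str, Any]] = []
    for name in TopologicalSorter(graph).static_order():
        node = by_name[name]
        outputs[name] = node.func(*(outputs[i] for i in node.inputs))
        # Lineage record: which inputs (by content hash) produced which output.
        audit.append({
            "node": name,
            "inputs": {i: fingerprint(outputs[i]) for i in node.inputs},
            "output": fingerprint(outputs[name]),
        })
    return outputs, audit


# Hypothetical three-step pipeline (data -> features -> scores), deliberately
# listed out of order: the DAG, not the list order, determines execution.
pipeline = [
    Node("scores", lambda r: sum(r) / len(r), ("features",)),
    Node(
        "features",
        lambda p: [round(b / a - 1, 4) for a, b in zip(p, p[1:])],  # simple returns
        ("raw_data",),
    ),
    Node("raw_data", lambda: [100.0, 101.5, 99.8, 102.2]),
]
outputs, audit = run_dag(pipeline)
for record in audit:
    print(record)
```

Each step sees only its declared inputs, so swapping one step's implementation leaves the rest of the graph untouched. That is one way to read the "pluggable" point. A production orchestrator would add persistence, scheduling, retries, and monitoring, all of which this sketch omits.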

Where does STAC fit in?

• Implementing analytics ops is a big commitment with big payoffs

• Biggest challenge: effective communication, change management

• Design needs to be process-oriented and based on user needs

• Technology needs to respond to process requirements, not vice versa

• Emerging STAC roles:

    • Facilitate dialogue & training on analytics ops challenges & best practices

    • Accelerate technology selection based on community-source standards driven by a process-oriented model of the investment process

• Let us know if you want to be involved!