Ezml Stanford 2015

47

Transcript of Ezml Stanford 2015

eZmLThe Old The New

An ecosystem for machine learning. EZML

will allow anyone of any background to

upload data and receive a solution for their

machine learning problem.

Size of Opportunity:

SAM: 10 Billion (startups 10~100 ees)

Target Market: 8 Billion (high tech)

Helping startups increase user engagement and

retention through crowdsourced

recommendations and analytics. Engineers

without any data science background can

upload data and seamlessly integrate a

machine learning solution into their product.

Size of Opportunity:

SAM: 10 Billion (startups wz 10~100 ees)

Target Market: 4 Billion (consumer internet)

# of interviews: 103

THE TEAM

Jim Cai

MSCS

Technical Lead LinkedIn

Engineering

William Song

MSCS

Self driving Car

Google[x]; Facebook

Engineering

Billy Jun

MSCS

Robotics at MDA

Engineering

Sam Yu

MSx GSB

SAIF Partners, GE

Strategy + Bizdev

Jordan Segall

MS in MS&E, BSCS

PM RelateIQ; FDE Palantir

Product + Bizdev

AT THE BEGINNINg:

ORIGINAL HYPOTHESES

WHAT: Model selection is time consuming and can be

automated

WHO: Data scientists within companies that can automate

“annoying” parts of their job

HOW: We can provide the entire model to the user that can

be embedded within their system at the end of the process

WHAT DID WE DO?

Talked to primarily data scientists at tech companies and

data science consulting

Business Model

Canvas #1:

1/6/15

Where are the

data scientists at

companies?

LESSONS LEARNED: PART 1

PREVIOUSLY

Model selection is time consuming and

can be automated

Automated Model

Recommendation

Data Scientist Model

Contribution

On-the-fly update

API Access

Saves time performing model

selection and money hiring data

scientists

Obtain models that improve user

engagement and

product quality

Purchased model continues to

improve automatically

Automated model selection system produces

the best model with little to no data science

knowledge

Save the customers time and money by

running everything in

the cloud

“Typically less than 20% of our time in data

science consulting is model selection. We’ll get

it to working ‘decently enough’ then move

on...we don’t aim for the best...we would ONLY

use EZML if the feature extraction part of the

project is trivial. Feature extraction is critically

important.”

- Jay Hack, Founder 205 Consulting, Palantir

Engineer

“Model selection is the easy problem; feature

extraction is the hard part.”

- Nick Gorski, Machine Learning Lead at

TellPart

LESSONS LEARNED: PART 1

HYPOTHESIS:

Data Scientists may be the first target

customer to go for

Photozeen - one of the first companies we

interviewed that recognized their deficiency in

machine learning, and needed help to improve

user retention and engagement.

LESSONS LEARNED: PART 1

HYPOTHESIS:

We can pitch the product

“We don’t understand what the **** you guys

are doing.”

- Steve Weinstein

GOING FORWARD

Is there a need?

Who is our customer

archetype?

Interview more companies

Develop MVP

Hone pitch

Articulate product more clearly

UPDATED HYPOTHESES

WHAT: Our tool will still provide machine learning as a

service, but our target customer are companies, not data

scientists/consulting firms.

WHO: Anyone within the firm, from engineers to marketing,

can use our tool. Our target contact is the founder.

Reached out to 200+ companies

● 51 from Y-Combinator

● 49 from Crunchbase

● 35 from Tec Club

● 34 from Tech Stars

● and more

WHat we did

LESSONS LEARNED: PART 2

There is a need for companies lacking

data science solutions to use our tool

While these companies ultimately

did not work out….

….the reasons why they didn’t

were key to our learning

RelateIQ

Too large with enough data scientists on board;

would need fully working EZML

Verge Campus

Recommendation is an ancillary feature

Photozeen

Good problem for us, but not enough data

LESSONS LEARNED: PART 2

Privacy would not be a concern for

companies that use our tool; they

would reveal their data features publicly

for better results, Kaggle style.

“You need to properly address privacy and

security policies before we share data...we

don’t want to reveal our secret sauce.”

Olga Mack, Assistant General Counsel, Zoosk

“I’m uncomfortable sending you my data since our business is based on selling data - privacy is

already a touchy subject for us.”

Andy Bromberg, CEO Sidewire

LESSONS LEARNED: PART 2

Our customer archetype are CEOs and

CTOs

We would go to CEOs, who would redirect us

to their CTOs.

CTOs tended to have a greater understanding

of the value add to their business, and how

they were unable to implement machine

learning without a tool like EZML.

CUSTOME ARCHETYPEStartup with <10M in funding

CTO or head data scientist/engineer

enough data for machine learning to have effect

no intense privacy concerns

clear problems (ie. user engagement; user churn)

BUSINESS MODEL CANVAS

Business Model

Canvas: 2/10/15

UPDATED HYPOTHESES

SELF-SERVICE: Since it is a self-service tool, the

sales process won’t be difficult

CONSULTING: We can apply a consulting model

for early customers; in exchange for manually

working on problems, we can learn from real

data and create our platform

NEW MVP

Keep Customers

AC

TIV

AT

E

AC

QU

IRE

UP

-SE

LL

NE

XT

-SE

LL

CR

OS

S-S

ELL

RE

FE

RR

ALS

Perfect User Experience:

Easiness, Accuracy

Automatic model upgrade

with continual data upload

Free access to machine learning

models for analytics

Low charge for API calls

Focused sectors (consumer internet)

and algorithms (recommendation)

Expand features / algorithms to tackle

new problems (Upsell)

Invest in sales force, data scientist

models and training data sets

Provide consulting and customization

DIAGRAM OF PAYMENT MODELS:

Amazon EC2

(model

parameters)

eZmL

Compute/storage

resources

Amazon S3

(dataset storage)

- CPU server

$14.17/month/company

- GPU server

$25.73/month/company

$0.0295/GB/mo

Data Scientist

Models

Compensation

%/selected

model

Upload data

($0.030/GB/month)

Evaluation Report

Subscribe

($ 200/model/month)

API Call

Prediction

Continual Improvement of

Model

($ 50/model/month)

Startups

Free Bandwidth

Train model (variable

cost after 10 iterations)

Hive

SO WHAT NOW?

WE LEARNED...What type of

company to target

Who in the

company to sell to

We need a highly

trained sales team

Provide consulting

for early clients to

get data

Tier pricing is

better than buying

model or charge

per API call

Companies are

most interested in

recommendation

types of problems

Crowdsourcing is

a second priority

at the moment

The costs of

storage, servers,

employees, office

space, and more

We can reduce

costs (user churn),

not just revenue

increases

Feature extraction

is key differentiator

Privacy is a

concern we must

tend to

Dumping of data

for continued

training is highly

important

And many many more

WE THINK...

Is this a viable business?

Do we want to pursue it after this class?

BUSINESS MODEL CANVAS

BUSINESS CANVAS

BUSINESS CANVAS

THANK YOU

Steve Blank, Jeff Epstein, Steve Weinstein

Nick O’Connor + Sunil Nagaraj

Classmates + TAs