Ezml Stanford 2015
-
Upload
steve-blank -
Category
Education
-
view
484 -
download
1
Transcript of Ezml Stanford 2015
eZmLThe Old The New
An ecosystem for machine learning. EZML
will allow anyone of any background to
upload data and receive a solution for their
machine learning problem.
Size of Opportunity:
SAM: 10 Billion (startups 10~100 ees)
Target Market: 8 Billion (high tech)
Helping startups increase user engagement and
retention through crowdsourced
recommendations and analytics. Engineers
without any data science background can
upload data and seamlessly integrate a
machine learning solution into their product.
Size of Opportunity:
SAM: 10 Billion (startups wz 10~100 ees)
Target Market: 4 Billion (consumer internet)
# of interviews: 103
THE TEAM
Jim Cai
MSCS
Technical Lead LinkedIn
Engineering
William Song
MSCS
Self driving Car
Google[x]; Facebook
Engineering
Billy Jun
MSCS
Robotics at MDA
Engineering
Sam Yu
MSx GSB
SAIF Partners, GE
Strategy + Bizdev
Jordan Segall
MS in MS&E, BSCS
PM RelateIQ; FDE Palantir
Product + Bizdev
AT THE BEGINNINg:
ORIGINAL HYPOTHESES
WHAT: Model selection is time consuming and can be
automated
WHO: Data scientists within companies that can automate
“annoying” parts of their job
HOW: We can provide the entire model to the user that can
be embedded within their system at the end of the process
WHAT DID WE DO?
Talked to primarily data scientists at tech companies and
data science consulting
Automated Model
Recommendation
Data Scientist Model
Contribution
On-the-fly update
API Access
Saves time performing model
selection and money hiring data
scientists
Obtain models that improve user
engagement and
product quality
Purchased model continues to
improve automatically
Automated model selection system produces
the best model with little to no data science
knowledge
Save the customers time and money by
running everything in
the cloud
“Typically less than 20% of our time in data
science consulting is model selection. We’ll get
it to working ‘decently enough’ then move
on...we don’t aim for the best...we would ONLY
use EZML if the feature extraction part of the
project is trivial. Feature extraction is critically
important.”
- Jay Hack, Founder 205 Consulting, Palantir
Engineer
“Model selection is the easy problem; feature
extraction is the hard part.”
- Nick Gorski, Machine Learning Lead at
TellPart
Photozeen - one of the first companies we
interviewed that recognized their deficiency in
machine learning, and needed help to improve
user retention and engagement.
GOING FORWARD
Is there a need?
Who is our customer
archetype?
Interview more companies
Develop MVP
Hone pitch
Articulate product more clearly
UPDATED HYPOTHESES
WHAT: Our tool will still provide machine learning as a
service, but our target customer are companies, not data
scientists/consulting firms.
WHO: Anyone within the firm, from engineers to marketing,
can use our tool. Our target contact is the founder.
Reached out to 200+ companies
● 51 from Y-Combinator
● 49 from Crunchbase
● 35 from Tec Club
● 34 from Tech Stars
● and more
WHat we did
LESSONS LEARNED: PART 2
There is a need for companies lacking
data science solutions to use our tool
While these companies ultimately
did not work out….
….the reasons why they didn’t
were key to our learning
RelateIQ
Too large with enough data scientists on board;
would need fully working EZML
Verge Campus
Recommendation is an ancillary feature
Photozeen
Good problem for us, but not enough data
LESSONS LEARNED: PART 2
Privacy would not be a concern for
companies that use our tool; they
would reveal their data features publicly
for better results, Kaggle style.
“You need to properly address privacy and
security policies before we share data...we
don’t want to reveal our secret sauce.”
Olga Mack, Assistant General Counsel, Zoosk
“I’m uncomfortable sending you my data since our business is based on selling data - privacy is
already a touchy subject for us.”
Andy Bromberg, CEO Sidewire
We would go to CEOs, who would redirect us
to their CTOs.
CTOs tended to have a greater understanding
of the value add to their business, and how
they were unable to implement machine
learning without a tool like EZML.
CUSTOME ARCHETYPEStartup with <10M in funding
CTO or head data scientist/engineer
enough data for machine learning to have effect
no intense privacy concerns
clear problems (ie. user engagement; user churn)
UPDATED HYPOTHESES
SELF-SERVICE: Since it is a self-service tool, the
sales process won’t be difficult
CONSULTING: We can apply a consulting model
for early customers; in exchange for manually
working on problems, we can learn from real
data and create our platform
Keep Customers
AC
TIV
AT
E
AC
QU
IRE
UP
-SE
LL
NE
XT
-SE
LL
CR
OS
S-S
ELL
RE
FE
RR
ALS
Perfect User Experience:
Easiness, Accuracy
Automatic model upgrade
with continual data upload
Free access to machine learning
models for analytics
Low charge for API calls
Focused sectors (consumer internet)
and algorithms (recommendation)
Expand features / algorithms to tackle
new problems (Upsell)
Invest in sales force, data scientist
models and training data sets
Provide consulting and customization
DIAGRAM OF PAYMENT MODELS:
Amazon EC2
(model
parameters)
eZmL
Compute/storage
resources
Amazon S3
(dataset storage)
- CPU server
$14.17/month/company
- GPU server
$25.73/month/company
$0.0295/GB/mo
Data Scientist
Models
Compensation
%/selected
model
Upload data
($0.030/GB/month)
Evaluation Report
Subscribe
($ 200/model/month)
API Call
Prediction
Continual Improvement of
Model
($ 50/model/month)
Startups
Free Bandwidth
Train model (variable
cost after 10 iterations)
WE LEARNED...What type of
company to target
Who in the
company to sell to
We need a highly
trained sales team
Provide consulting
for early clients to
get data
Tier pricing is
better than buying
model or charge
per API call
Companies are
most interested in
recommendation
types of problems
Crowdsourcing is
a second priority
at the moment
The costs of
storage, servers,
employees, office
space, and more
We can reduce
costs (user churn),
not just revenue
increases
Feature extraction
is key differentiator
Privacy is a
concern we must
tend to
Dumping of data
for continued
training is highly
important
And many many more