Big data meetup

Posted on 15-Jun-2015



Transcript of Big data meetup

Data Science
Data Meetup, Jan. 12

What is data science?
Besides a reason to have beer and pizza…

What does the literature say?

Hacking

“Good data scientists understand, in a deep way, that the heavy lifting of cleanup and preparation isn’t something that gets in the way of solving the problem… it is the problem.”

DJ Patil

bash/awk/sed

Statistics

What’s the probability that at least 2 people in the front 2 rows share a birthday?
1. ~10%
2. ~20%
3. ~50%
4. ~90%

What’s the probability that someone who tests positive on a 99% accurate test for a 1-in-1,000 disease actually has the disease?
1. ~10%
2. ~50%
3. ~90%
4. ~99%
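Both quiz questions can be checked numerically. The sketch below is illustrative and rests on stated assumptions: birthdays are uniform over 365 days (no leap years), the room sizes are hypothetical because the slide does not say how many people sit in the front two rows, and “99% accurate” is read as 99% sensitivity and 99% specificity. Under those assumptions, 23 people already give just over a 50% chance of a shared birthday, and a positive test implies only about a 9% chance of actually having the disease, because false positives from the healthy 99.9% swamp the true positives.

```python
from math import prod


def birthday_collision_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Assumes birthdays are uniformly distributed over `days` days.
    """
    if n > days:
        return 1.0  # pigeonhole principle: a collision is certain
    # Probability that all n birthdays are distinct: (365/365)(364/365)...
    p_distinct = prod((days - k) / days for k in range(n))
    return 1.0 - p_distinct


def posterior_positive(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical room sizes; the actual front-row head count is unknown.
    for n in (10, 20, 23, 30):
        print(f"{n} people: {birthday_collision_probability(n):.1%}")
    # Assumption: "99% accurate" = 99% sensitivity and 99% specificity.
    print(f"P(disease | positive) = {posterior_positive(0.001, 0.99, 0.99):.1%}")
```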

Domain Expertise

Intelligence Cookbook
Just follow the steps

The Recipe

First, make it valuable.
Then, make it possible.
Then, make it beautiful.
Then, make it smart.

Example

E-Commerce website

Make it valuable

Find a KPI that is correlated with bottom-line revenue

e.g. the number of products the visitor browses through
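One way to sanity-check a candidate KPI is to measure how strongly it tracks revenue per session. The sketch below uses the Pearson correlation coefficient as an assumed choice of metric (the slide does not name one), and the per-session numbers in the usage block are invented purely for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-session data, made up for illustration only.
    products_browsed = [1, 3, 2, 8, 5, 12, 4, 7]
    revenue = [0.0, 0.0, 15.0, 40.0, 0.0, 85.0, 20.0, 35.0]
    print(f"r = {pearson(products_browsed, revenue):.2f}")
```

A high r only suggests the KPI moves with revenue; it does not show that pushing the KPI up will raise revenue.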

Make it possible

Develop the simplest heuristic

e.g. show the visitor one of the 10 best-selling products
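The “simplest heuristic” can be as small as counting units sold and sampling from the top of the list. This is a minimal sketch, assuming the order history is available as a flat list of product IDs; the function names and product IDs are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, List, Optional


def top_sellers(order_items: Iterable[str], k: int = 10) -> List[str]:
    """Return the k product IDs with the most units sold."""
    return [product for product, _ in Counter(order_items).most_common(k)]


def recommend(order_items: Iterable[str], k: int = 10,
              rng: Optional[random.Random] = None) -> Optional[str]:
    """Baseline recommender: a random product from the top-k sellers."""
    candidates = top_sellers(order_items, k)
    if not candidates:
        return None  # no sales history yet
    rng = rng or random.Random()
    return rng.choice(candidates)


if __name__ == "__main__":
    # Hypothetical order log: one entry per unit sold.
    orders = ["mug", "tshirt", "mug", "poster", "mug", "tshirt", "sticker"]
    print(top_sellers(orders, k=2))
    print(recommend(orders, rng=random.Random(0)))
```

Because it has no model to train, this baseline gives the later, smarter algorithms something concrete to beat.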

Make it beautiful

Create a method to quickly test new algorithms against old ones

e.g. build a framework that split-tests two models and reports which one performs better
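A split-testing framework needs two pieces: a stable way to assign each visitor to a model, and a statistical test to decide which model wins. The sketch below assumes the success metric is a conversion rate and uses a two-sided two-proportion z-test with hash-based bucketing; both are assumed design choices, not something the slide specifies, and the counts in the usage block are hypothetical.

```python
import hashlib
from math import erf, sqrt


def assign_variant(visitor_id: str) -> str:
    """Deterministically bucket a visitor into model A or B (50/50 split)."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def split_test(conv_a: int, n_a: int, conv_b: int, n_b: int, alpha: float = 0.05) -> dict:
    """Two-sided two-proportion z-test on the conversion rates of A and B."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (rate_b - rate_a) / se if se > 0 else 0.0
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    if p_value >= alpha:
        winner = "inconclusive"  # not enough evidence to prefer either model
    else:
        winner = "B" if rate_b > rate_a else "A"
    return {"rate_a": rate_a, "rate_b": rate_b, "z": z,
            "p_value": p_value, "winner": winner}


if __name__ == "__main__":
    print(assign_variant("visitor-42"))
    # Hypothetical results: conversions out of visitors shown each model.
    print(split_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000))
```

Hashing the visitor ID keeps each returning visitor on the same model, so their behavior is not split across both arms.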

Make it smart

Figure out which field your problem belongs to and choose an off-the-shelf algorithm

e.g. recognize that the problem is product recommendation and use collaborative filtering
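To make “use collaborative filtering” concrete, here is a minimal item-based variant on binary purchase data: two products are similar when the same customers bought both, measured with cosine similarity. This is one common flavor of collaborative filtering chosen for illustration, not necessarily what the talk had in mind, and the customers and products in the usage block are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Set


def item_similarity(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity between two items, each given as its set of buyers."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend_items(purchases: Dict[str, Set[str]], user: str, top_n: int = 3) -> List[str]:
    """Item-based collaborative filtering: score unseen items by similarity to owned ones."""
    # Invert user -> items into item -> buyers.
    buyers: Dict[str, Set[str]] = defaultdict(set)
    for u, items in purchases.items():
        for item in items:
            buyers[item].add(u)

    owned = purchases.get(user, set())
    scores: Dict[str, float] = defaultdict(float)
    for candidate in buyers:
        if candidate in owned:
            continue  # don't recommend what the user already has
        for item in owned:
            scores[candidate] += item_similarity(buyers[candidate], buyers[item])

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


if __name__ == "__main__":
    # Hypothetical purchase history.
    purchases = {
        "alice": {"camera", "tripod"},
        "bob": {"camera", "tripod", "sd_card"},
        "carol": {"camera", "sd_card"},
        "dave": {"book"},
    }
    print(recommend_items(purchases, "alice"))
```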

Common ML problems

• Supervised learning
    • Classification
    • Regression
    • Anomaly detection
• Unsupervised learning
    • Clustering
    • Separation
• Recommendation
    • Feature-based recommendation
    • Collaborative filtering
• Search
    • Indexing
    • Ranking

To sum it all up
Real data science is hard

but …

Real data science is the last step in data science, not the first

and besides …

The most important thing in data science is the business, not the science

Questions?

email: vitalyp@liveperson.com

Twitter: @bigdatasc