Demystifying Data Science with an introduction to Machine Learning
julian-bright
Demystifying Data Science
with an Intro to Machine Learning
Data science is everywhere
Sexiest job of the 21st century*
A McKinsey Global Institute report estimates that by 2018, “the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions”
Source: Harvard Business Review, Oct 2012
So what is Data Science?
Source: Hilary Mason, ex-Chief Data Scientist at bit.ly
Who are these unicorns?
Bit about me
@brightsparc
I thought it was all about stats?
It’s a broader skillset
Source: http://blogs.wsj.com/cio/2014/02/14/it-takes-teams-to-solve-the-data-scientist-shortage/
Data science pipeline
Source: http://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext
Where does Kaggle fit in?
Degree breakdown in top 100 Areas of study
What’s the deal with big data?
Apache Hadoop Ecosystem
It’s like MapReduce, you know
So what about machine learning?
Pioneer in machine learning, created a checkers game that played itself
“Give machines the ability to learn without explicitly programming them.” Arthur L. Samuel (1959)
Types of algorithms
Some examples
Machine learning process
Build a model
Underfit Overfit
Linear Regression
Solve for values of θ in the hypothesis function hθ(x)
Gradient descent algorithm
Minimize the cost function, which is ½ of the average squared error of the prediction vs. the training data.
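The linear regression and gradient descent slides above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the talk: it assumes a single feature, so the hypothesis is hθ(x) = θ0 + θ1·x, and the learning rate `alpha`, the iteration count, and the toy data are all made-up values chosen so the example converges.

```python
def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x for a single feature."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Half of the average squared error of the predictions vs. the training data."""
    m = len(xs)
    squared_errors = ((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Batch gradient descent; alpha and iterations are illustrative assumptions."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        errors = [hypothesis(theta0, theta1, x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Hypothetical toy data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
final_cost = cost(theta0, theta1, xs, ys)
```

On this toy data the fitted parameters should approach θ0 = 1 and θ1 = 2, with the cost shrinking toward zero. With real data such as house prices, features usually need scaling before a fixed learning rate behaves well.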
Demo: House prices
Cross validation – split training/test
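The training/test split can be sketched as a simple hold-out split, the most basic form of cross validation. The function name `train_test_split`, the 25% test fraction, the fixed seed, and the toy rows are assumptions made for illustration only.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (feature, target) rows.
rows = [(x, 1 + 2 * x) for x in range(20)]
train, test = train_test_split(rows)
```

The model is fitted on `train` only and scored on `test`, so the error it reports reflects data the model has not seen.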
Supervised learning model
Recommender systems
Collaborative filtering – predict ratings for similar items given other users behavior
Collaborative filtering method
Source: http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
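One way to picture user-based collaborative filtering is as a similarity-weighted average of other users' ratings. The sketch below is an assumption-laden illustration, not the method from the recommenderlab vignette or the talk's demo: the users, items, ratings, and the choice of similarity as 1 / (1 + Manhattan distance) over co-rated items are all hypothetical.

```python
# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "ann": {"a": 5.0, "b": 3.0, "c": 4.0},
    "ben": {"a": 5.0, "b": 3.0, "c": 4.0, "d": 2.0},
    "cat": {"a": 1.0, "b": 5.0, "c": 2.0, "d": 5.0},
}


def similarity(u, v):
    """Turn Manhattan distance over co-rated items into a similarity in (0, 1]."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    distance = sum(abs(u[item] - v[item]) for item in shared)
    return 1.0 / (1.0 + distance)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        weight_total += weight
    # None signals that no other user has rated the item.
    return weighted_sum / weight_total if weight_total else None


predicted = predict(ratings, "ann", "d")
```

Because "ann" rates the shared items exactly like "ben" and very differently from "cat", the prediction for item "d" lands much closer to ben's rating than to cat's.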
Similar users based on distance
Manhattan distance, Euclidean distance
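Both distances can be computed directly from two users' rating vectors. This is a minimal sketch assuming each user has rated the same items in the same order; the two rating lists are made-up values.

```python
import math


def manhattan(a, b):
    """Sum of absolute differences between two equal-length rating vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two equal-length rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical ratings by two users for the same three items.
user_a = [5, 3, 4]
user_b = [2, 7, 4]
d_manhattan = manhattan(user_a, user_b)  # 3 + 4 + 0 = 7
d_euclidean = euclidean(user_a, user_b)  # sqrt(9 + 16) = 5.0
```

A smaller distance means the two users rate more alike. Euclidean distance penalizes a single large disagreement more heavily than Manhattan distance does.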
Demo: Music recommender system
Pearson Correlation Coefficient
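The Pearson correlation coefficient can be sketched as the covariance of two rating vectors divided by the product of their spreads. Unlike a raw distance, it is unaffected by one user consistently rating higher or lower than another. The handling of constant vectors (returning 0.0) and the example ratings are assumptions made for this illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length rating vectors, in [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    if spread_a == 0 or spread_b == 0:
        # Assumption: treat a user with constant ratings as uncorrelated.
        return 0.0
    return covariance / math.sqrt(spread_a * spread_b)


# Hypothetical ratings: the second user rates the same way but on a higher scale.
same_taste = pearson([1, 2, 3], [2, 4, 6])
opposite_taste = pearson([1, 2, 3], [6, 4, 2])
```

Here `same_taste` comes out at 1 and `opposite_taste` at -1, even though the raw ratings differ in scale.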
Visualization frameworks
Tableau
D3.js Processing
Raphaël.js
What about online experimentation?
What will the future look like?
• Online collaboration
• Open Data
Next gen distributed computing
100x faster in memory, and 10x faster even when running on disk.
Deep learning, a new frontier?
Geoffrey Hinton @Google
How can I get started?
• MOOCs
– Coursera Machine Learning (Andrew Ng - Stanford)
– Learning from Data (Abu-Mostafa - Caltech)
• Other references
– Collective Intelligence
– Mining of Massive Datasets
– Open-Source Data Science Masters
• Frameworks
– Python: scikit-learn
– Java: WEKA and Cascading
Questions