Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf ·...
Transcript of Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf ·...
![Page 1: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/1.jpg)
Training the Cloud with the Crowd
Twitter + CrowdFlower + Google Prediction API
Kevin Cocco SproutLoop.com
![Page 2: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/2.jpg)
goo.gl/qqWFL - http://www.sproutloop.com/prediction_demo
![Page 3: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/3.jpg)
Creating & Using a Predictive Model
Google Prediction API
• Training data crowdsourcing
• About • Upload - Train - Predict
• API 101
Results & Discoveries
![Page 4: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/4.jpg)
http://search.twitter.com/search.json?q=weather%20OR%20hot%20OR%20raining&rpp=20&lang=en&geocode=36.114,-115.172,10mi
• http://search.twitter.com/search.json? q=weather%20OR%20hot%20OR%20raining rpp=20 lang=en geocode=36.114,-115.172,10mi
• Streaming API • Twitter re-syndicators: DataSift.com & GNIP.com • More... dev.twitter.com
![Page 5: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/5.jpg)
• DialogueEarth.org • 120k+ Weather Related Tweets • 5 CrowdFlower Judgements per Tweet • Avg. $0.02 per Judgement / $0.10 per Unit(Tweet) • http://crowdflower.com/docs/api
![Page 6: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/6.jpg)
LMNOP NLP/ML Model Let's walk through some math:
Predictive Modeling
![Page 7: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/7.jpg)
• Process: Upload -> Train -> Predict • Web Service, RESTful, OAuth • Many modeling techniques, two types: o Regression: Estimating numeric value o Categorical: Choose a category of unstructured text
• Paid Service (free for first 6 months) o Base fee: $10 per/month - Includes 10k predictions o Predictions $0.50/1,000 - beyond initial 10k o Training $0.002/MB
• https://developers.google.com/prediction/
Prediction API
![Page 8: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/8.jpg)
• Upload - Training Data o API, GSUtil, or Google Cloud Storage Manager
• Formatting Training Data o .CSV (label, feature1,feature2,...) o "positive","i love this weather @cancun"
• Google Refine o "Power tool for working with messy data" o http://code.google.com/p/google-refine/
Prediction API
![Page 9: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/9.jpg)
• Train - Build Model o API Console https://code.google.com/apis/console/ o API Explorer https://code.google.com/apis/explorer/
• Predict - Query model o RESTful API & libraries: Python, Java, PHP,... o JSON Response to Prediction "I love this weather"
... "outputLabel": "positive", "outputMulti": [ { "label": "negative", "score": 0.000202 }, { "label": "neutral", "score": 0.000122 }, { "label": "positive", "score": 0.995215 }, ...
Prediction API
![Page 10: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/10.jpg)
![Page 11: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/11.jpg)
Crowd vs Cloud Comparison of labels
Num
ber o
f Tw
eets
![Page 12: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/12.jpg)
Confusion Matrix Accuracy - Precision - Recall
Predicted Labels
Cro
wdf
low
er
![Page 13: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/13.jpg)
Predictive Model Training Size • Accuracy varies based on type of data and nuance • More training data = better accuracy
![Page 14: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/14.jpg)
Discoveries • Tweet sentiment analysis is confusion for humans
o All 5 C.F. workers agreed on tweets sentiment 44% • When humans agree on tweet, model does well o 100% CF agreement = predictive accuracy 91%
• Google Prediction API Poor for: o Batch predictions o Model tuning
• Google Prediction API Good for: o Real time o Scaling - PaaS o Easy integration
![Page 15: Training the Cloud with the Crowd2012.sentimentsymposium.com/presentations/SAS12-Cocco.pdf · Creating & Using a Predictive Model Google Prediction API • Training data crowdsourcing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f605d44e5d64247e14180cb/html5/thumbnails/15.jpg)
Thank you!
Kevin Cocco @kcocco | [email protected]
SproutLoop Simplified Building and Using Predictive Models
slides: goo.gl/Ws4Gh