Data Science in the Cloud
@MargrietGr
Margriet Groenendijk
Developer Advocate for IBM Cloud Data Services
SW Cloud meetup, Bristol
24 November 2016
About me
• Developer Advocate at IBM Cloud Data Services, UK
  • Data science
  • Python, Spark, R, Cloudant, dashDB
• Research Fellow at University of Exeter, UK
  • Worked with very large observational datasets and the output of global-scale climate models
• PhD at Vrije Universiteit Amsterdam, the Netherlands
  • Explored large observational datasets of carbon uptake by forests
Data Science is a Team Effort
• Data Engineers
• Data Scientists
• Business Analysts
• App Developers
Data
Toolbox
http://nirvacana.com/thoughts/wp-content/uploads/2013/07/RoadToDataScientist1.png
Data Science Workflow
• Define Question
• Find Data
• Explore Data
• Clean Data
• Visualize and Summarize Data
• Create Predictive Models
• Present Results
Store Data
Object Store - binary files
Relational database
Document store - json
Bluemix
https://console.ng.bluemix.net/
RDDs: Resilient Distributed Datasets
• Data does not have to fit on a single machine
• Data is separated into partitions
• Creation of RDDs
  • Load an external dataset
  • Distribute a collection of objects
• Transformations construct a new RDD from a previous one (lazy!)
• Actions compute a result based on an RDD
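The difference between lazy transformations and actions can be illustrated without a Spark installation. The sketch below is a toy in plain Python, not Spark's implementation: the `ToyRDD` class is hypothetical, and only its method names (`parallelize`, `map`, `filter`, `collect`, `count`) mirror the real RDD API. It shows data split into partitions, transformations that only record what to do, and actions that trigger the work.

```python
from typing import Callable, Iterable, Iterator, List


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical class, not part of Spark)."""

    def __init__(self, partitions: List[Callable[[], Iterator]]):
        # Each partition is a zero-argument function that yields its records
        # only when called, so nothing is computed until an action runs.
        self._partitions = partitions

    @classmethod
    def parallelize(cls, data: Iterable, num_partitions: int = 2) -> "ToyRDD":
        items = list(data)
        # Deal the items round-robin into num_partitions chunks.
        chunks = [items[i::num_partitions] for i in range(num_partitions)]
        return cls([lambda c=c: iter(c) for c in chunks])

    # Transformations: return a new ToyRDD, compute nothing yet.
    def map(self, fn: Callable) -> "ToyRDD":
        return ToyRDD([lambda p=p: (fn(x) for x in p()) for p in self._partitions])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD([lambda p=p: (x for x in p() if pred(x)) for p in self._partitions])

    # Actions: walk every partition and produce a result.
    def collect(self) -> list:
        return [x for p in self._partitions for x in p()]

    def count(self) -> int:
        return sum(1 for p in self._partitions for _ in p())


calls = []  # records every value that square() is actually applied to


def square(x):
    calls.append(x)
    return x * x


rdd = ToyRDD.parallelize(range(6), num_partitions=2)
even_squares = rdd.map(square).filter(lambda v: v % 2 == 0)

calls_before_action = len(calls)        # still 0: the transformations were lazy
result = sorted(even_squares.collect()) # the action triggers the computation
```

Because nothing is cached in this toy, calling a second action such as `even_squares.count()` would run `square` over the data again.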
Run Spark locally in a Python notebook
• Install Python (Anaconda): https://www.continuum.io/downloads
• Download Spark: http://spark.apache.org/downloads.html
• Create a new kernel to use in a Jupyter notebook
Jupyter Notebooks!
• Server-client application to edit and run notebook documents via a web browser
• Cells with:
  • Code
  • Figures and tables
  • Rich text elements
• Different kernels: Python, R, Scala, Spark
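A notebook document is stored as JSON, which is what the browser client edits and the kernel runs cell by cell. As a rough sketch using only the standard library, the snippet below builds a minimal notebook with one rich-text cell and one code cell in the nbformat 4 layout. The helper names, the cell contents and the `python3` kernel name are illustrative assumptions.

```python
import json


def markdown_cell(text):
    """Rich-text cell (hypothetical helper)."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Code cell that has not been executed yet (hypothetical helper)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def make_notebook(cells, kernel_name="python3"):
    """Wrap cells in the top-level nbformat 4 structure."""
    return {
        "cells": cells,
        "metadata": {
            # The kernelspec tells the server which kernel to start.
            "kernelspec": {
                "name": kernel_name,
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 2,
    }


notebook = make_notebook([
    markdown_cell("# Weather forecast for Bristol"),
    code_cell("print('hello from a notebook cell')"),
])

# Saved to a file ending in .ipynb, this text would open as a notebook.
notebook_json = json.dumps(notebook, indent=1)
```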
In the Cloud:
Define Question
What will the weather be next weekend?
https://unsplash.com/search/autumn?photo=LSF8WGtQmn8
https://unsplash.com/search/rain?photo=19tQv51x4-A
Explore Data
Python packages
• requests and json
  • API credentials and latitude/longitude of Bristol
  • JSON data returned
• pandas, numpy and datetime
  • convert JSON to pandas DataFrame (table with multiple indices)
  • add time as index
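The conversion step can be sketched with the standard library alone. The payload below is invented for illustration and no request is made: the field names (`forecasts`, `fcst_valid`, `temp`, `pop`) and the values are assumptions, not the documented response of the weather API. The sketch parses the JSON and keys each record by its timestamp, which is the shape a time-indexed table needs.

```python
import json
from datetime import datetime, timezone

# Invented sample payload; a real one would come back from the API call.
sample_response = """
{"forecasts": [
  {"fcst_valid": 1479988800, "temp": 8, "pop": 10},
  {"fcst_valid": 1479992400, "temp": 7, "pop": 20},
  {"fcst_valid": 1479996000, "temp": 7, "pop": 40}
]}
"""


def forecasts_to_rows(payload):
    """Map each forecast's time (UTC) to a row of its values."""
    rows = {}
    for item in payload["forecasts"]:
        # Epoch seconds -> timezone-aware datetime, used as the row key.
        when = datetime.fromtimestamp(item["fcst_valid"], tz=timezone.utc)
        rows[when] = {"temp": item["temp"], "pop": item["pop"]}
    return rows


rows = forecasts_to_rows(json.loads(sample_response))
first_time = min(rows)  # earliest forecast time
```

With pandas available, `pandas.DataFrame.from_dict(rows, orient="index")` would turn this mapping into a DataFrame with time as its index.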
Weather forecast for Bristol
https://developer.ibm.com/clouddataservices/2016/10/06/your-own-weather-forecast-in-a-python-notebook/
Visualize Data
Python packages
• pandas - rolling mean
• matplotlib
• Basemap
Demo
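A rolling mean smooths a noisy series by averaging each value with the ones just before it. In pandas this is `rolling(window).mean()` on a Series or DataFrame; the plain-Python sketch below only illustrates what that computes. The function name and the temperature values are made up for the example.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing rolling mean; positions without a full window yield None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds only the last `window` values
    means = []
    for value in values:
        recent.append(value)
        if len(recent) == window:
            means.append(sum(recent) / window)
        else:
            means.append(None)  # pandas would report NaN here by default
    return means


temps = [8, 7, 7, 9, 11, 10]  # made-up hourly temperatures
smoothed = rolling_mean(temps, window=3)
```

A larger window gives a smoother curve but leaves more undefined positions at the start of the series.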
Weather map
https://developer.ibm.com/clouddataservices/2016/10/06/your-own-weather-forecast-in-a-python-notebook/
Python packages
• matplotlib
• Basemap
• itertools
• urllib
Weather, Twitter and Sentiment
• Where to find the data?
• Where to store the data?
• Where to analyse the data?
• Quick tools to explore
Watson Tone Analyser
• Emotion
• Language style
• Social propensities
Analyze how you are coming across to others
Workflow
Weather Company Data
crontab -e
0 23 * * * /path/to/file/do_something.sh
python do_something.py
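The five fields of the crontab entry are minute, hour, day of month, month and day of week, so `0 23 * * *` runs the script every day at 23:00. The sketch below exists only to make that reading concrete. The function names are hypothetical, and it handles plain numbers and `*` only: no ranges, lists or steps, and it requires every field to match, which differs from cron when both day fields are restricted.

```python
from datetime import datetime


def field_matches(field, value):
    """A field matches if it is the wildcard or equals the value."""
    return field == "*" or int(field) == value


def cron_matches(expression, when):
    """Check whether a datetime falls on a simple five-field cron schedule."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )


SCHEDULE = "0 23 * * *"  # the entry from the slide

runs_at_2300 = cron_matches(SCHEDULE, datetime(2016, 11, 24, 23, 0))
runs_at_2200 = cron_matches(SCHEDULE, datetime(2016, 11, 24, 22, 0))
```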
Tweets, Weather, Sentiment
Watson Tone Analyser
Insights for Twitter
Cloudant NoSQL
PixieDust
https://github.com/ibm-cds-labs/pixiedust
Simpler Workflow
PixieDust: an Open Source Library that simplifies and improves Jupyter Python Notebooks
• PackageManager
• Visualizations
• Cloud Integration
• Scala Bridge
• Extensibility
• Embedded Apps
https://developer.ibm.com/clouddataservices/2016/10/11/pixiedust-magic-for-python-notebook/
@DTAIEB55
Install Spark packages or plain jars in your Notebook Python kernel without needing to modify a configuration file
Uses the GraphFrame Python APIs
Install GraphFrames Spark Package
One simple API: display()
Call the Options dialog
Panning/Zooming options
Performance statistics
Easily export your data to csv, json, html, etc. locally on your laptop or into a cloud-based service like Cloudant or Object Storage
Scala Bridge
Define a Python variable
Use the Python var in Scala
Define a Scala variable
Use the Scala var in Python
Easily extend PixieDust to create your own visualizations using HTML/CSS/JavaScript
Customized Visualization for GraphFrame Graphs
Encapsulate your analytics into compelling User Interfaces better suited for Line of Business Users
IBM Watson Data Platform
• Data Science Experience
• Watson Data Platform
• Machine Learning
• Sign up for beta: http://datascience.ibm.com/features#machinelearning
https://developer.ibm.com/clouddataservices/author/mgroenen/
Thanks!
Slides will be here: http://www.slideshare.net/MargrietGroenendijk