Page 1: Building Data Apps with Python

Building Data Products with Python

District Data Labs

About the Instructor

Benjamin Bengfort

Data Science:

● MS Computer Science from North Dakota State
● PhD Candidate in CS at the University of Maryland
● Data Scientist at Cobrain Company in Bethesda, MD
● Board member of Data Community DC
● Lecturer at Georgetown University

Python Programmer:

● Python developer for 7 years
● Open source contributor
● My work on Github: https://github.com/bbengfort

I am available to collaborate and answer questions for all of my students.
Twitter: twitter.com/bbengfort
LinkedIn: linkedin.com/in/bbengfort
Github: github.com/bbengfort
Email: [email protected]

About the Teaching Assistant

Keshav Magge

● MS Computer Science from University of Houston
● Lead Data/Software Engineer at Cobrain Company in Bethesda, MD

Python Programmer:

● Python developer for 7 years
● Plone/Zope for 2 years, Django for 5 years
● My work on Github: https://github.com/keshavmagge

Reach out to me to talk about all things Python/data, or just about life.
Twitter: twitter.com/keshavmagge
LinkedIn: linkedin.com/pub/keshav-magge/12/a2a/324/
Github: github.com/keshavmagge
Email: [email protected]

Building Data Products

Hilary Mason

“A data product is a product that is based on the combination of data and algorithms.”

Mike Loukides

“A data application acquires its value from the data itself, and creates more data as a result. It’s not just an application with data; it’s a data product. Data science enables the creation of data products.”

The Data Science Pipeline

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Data Ingestion

● There is a world of data out there. How do we get it? Web crawlers, APIs, sensors? Python and other web scripting languages are well suited to this task.

● The real question is: how can we deal with such a large volume and velocity of data?

● Big Data and Data Science often require ingestion specialists!
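A minimal ingestion sketch, assuming the goal is simply to fetch a URL and keep the response exactly as received. It uses the standard library's `urllib.request` instead of `requests` so it has no third-party dependencies; the function name `ingest`, the hash-based file naming, and the `raw_dir` layout are illustrative assumptions, not the course's code.

```python
import hashlib
import urllib.request
from pathlib import Path


def ingest(url: str, raw_dir: Path, timeout: float = 10.0) -> Path:
    """Fetch a URL and store the response body, unmodified, in raw_dir.

    Hypothetical helper: the file name is the SHA-1 hash of the URL, so
    re-fetching the same URL overwrites one raw file instead of duplicating it.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()  # raw bytes: no decoding or parsing here
    path = raw_dir / hashlib.sha1(url.encode("utf-8")).hexdigest()
    path.write_bytes(body)
    return path
```

For example, `ingest("https://example.com/", Path("data/raw"))` writes the page body under `data/raw/`. Parsing is deliberately left to the wrangling stage, so the raw copy can always be reprocessed.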

Data Wrangling

● Warehousing the data means storing it in as raw a form as possible.

● Extract, transform, and load (ETL) operations move data to operational storage locations.

● Filtering, aggregation, normalization, and denormalization all ensure the data is in a form that can be computed on.

● Annotated training sets must be created for machine learning (ML) tasks.
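A minimal wrangling sketch covering filtering and normalization: raw JSON records are parsed, incomplete ones are dropped, and author names are whitespace-normalized and stored once in their own table. It uses the standard library's `sqlite3` as a stand-in for the SQLAlchemy layer named in the workflow; the `title`/`author` record fields and the table layout are assumed for illustration.

```python
import json
import sqlite3


def wrangle(raw_records: list[str], conn: sqlite3.Connection) -> int:
    """Load raw JSON strings into normalized authors/books tables.

    Hypothetical schema: records without a title are filtered out; each
    distinct author is stored once. Returns the number of books loaded.
    """
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS authors (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS books (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            author_id INTEGER REFERENCES authors(id)
        );
        """
    )
    loaded = 0
    for line in raw_records:
        record = json.loads(line)
        title = (record.get("title") or "").strip()
        if not title:
            continue  # filtering: skip incomplete records
        # normalization: collapse runs of whitespace in author names
        author = " ".join((record.get("author") or "").split()) or "Unknown"
        conn.execute("INSERT OR IGNORE INTO authors (name) VALUES (?)", (author,))
        (author_id,) = conn.execute(
            "SELECT id FROM authors WHERE name = ?", (author,)
        ).fetchone()
        conn.execute(
            "INSERT INTO books (title, author_id) VALUES (?, ?)", (title, author_id)
        )
        loaded += 1
    conn.commit()
    return loaded
```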

Computation and Analyses

● Hypothesis-driven computation includes the design and development of predictive models.

● Many models have to be trained or constrained into a computational form, such as a graph database, and this is time-consuming.

● Other data products, such as indices, relations, classifications, and clusters, may also be computed.
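As a small example of computing one of these derived data products, here is a minimal inverted index (word to document ids) with an AND-style lookup. The tokenization rule (lowercase alphanumeric runs) and the function names are illustrative assumptions.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")  # assumed tokenizer: lowercase alphanumeric runs


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each word to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in TOKEN.findall(text.lower()):
            index[token].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every word in the query."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())  # intersect posting sets
    return results
```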

Modeling and Application

This is the part we’re most familiar with: supervised classification and unsupervised clustering, with models such as Bayes, logistic regression, decision trees, and others.

This is also where the money is.
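The slide names Bayes among the familiar models. As one concrete instance, here is a minimal multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python for illustration; the class name, the toy labels, and the bag-of-words input format are assumptions, not part of the course material.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words documents (add-one smoothing)."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # word frequencies per class
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for words in docs for w in words}
        return self

    def predict(self, words: list[str]) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus summed log likelihoods; +1 avoids log(0)
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```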

Reporting and Visualization

● Often overlooked, this part is crucial, even when we have data products.

● Humans recognize patterns better than machines. Human feedback is crucial in active learning and remodeling (error detection).

● Mashups and collaborations generate more data, and therefore more value!
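A minimal reporting sketch that renders ranked recommendations as an HTML table. The course uses Jinja2 for this; to stay dependency-free, this example swaps in the standard library's `string.Template`, which only does `$name` substitution (no loops or filters), so the rows are assembled in Python. The function name and the `(title, score)` input shape are assumptions.

```python
from html import escape
from string import Template

# Hypothetical templates standing in for Jinja2 templates.
ROW = Template("<tr><td>$rank</td><td>$title</td><td>$score</td></tr>")
PAGE = Template("<html><body><h1>$heading</h1>\n<table>\n$rows\n</table></body></html>")


def render_report(heading: str, recommendations: list[tuple[str, float]]) -> str:
    """Render ranked (title, score) pairs as a minimal HTML page."""
    rows = "\n".join(
        ROW.substitute(rank=i, title=escape(title), score=f"{score:.2f}")
        for i, (title, score) in enumerate(recommendations, start=1)
    )
    # escape() keeps user-provided text from being interpreted as HTML
    return PAGE.substitute(heading=escape(heading), rows=rows)
```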

Don’t forget feedback! (Active Learning for Data Products)
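Active learning closes the feedback loop by asking humans to label the examples the model is least certain about. Below is a minimal sketch of one common strategy, uncertainty sampling for a binary classifier, assuming the model outputs a probability per item; the function name and the example titles are hypothetical.

```python
def select_for_review(probabilities: dict[str, float], k: int) -> list[str]:
    """Pick the k items whose predicted probability is closest to 0.5.

    These are the predictions the model is least sure about, so a human
    label on them is expected to be the most informative for retraining.
    """
    ranked = sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))
    return ranked[:k]
```

The selected items would be shown to people, and their answers added to the training set before the model is refit.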

What we’re going to build today

SCIENCE BOOKCLUB!!

● A book club that chooses what to read via a recommender system.

● Ingests data from Goodreads and returns feedback on books.

● The statistical model is a non-negative matrix factorization (NMF).

● Reporting uses Jinja (almost a web app).
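The slides do not show the course's factorization code, so the following is a sketch of one standard way to compute an NMF: Lee and Seung's multiplicative update rules for minimizing the Frobenius norm ||V - WH||, written with NumPy (the library the workflow lists for modeling). It assumes a readers-by-books rating matrix `V` in which unrated books are simply zeros; that is a simplification, since plain NMF treats those zeros as observed values rather than missing ones. The function name and parameters are illustrative.

```python
import numpy as np


def nmf(V: np.ndarray, rank: int, iterations: int = 500, seed: int = 0):
    """Factor a non-negative matrix V (m x n) as W (m x rank) @ H (rank x n).

    Multiplicative updates keep W and H non-negative and do not increase
    the Frobenius reconstruction error from one iteration to the next.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    eps = 1e-9  # guards against division by zero
    for _ in range(iterations):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Predicted scores are `W @ H`; for each reader, the highest-scoring books they have not yet rated are candidate picks for the club.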

Workflow

1. Setting up a Python skeleton
2. Creating and running tests
3. Wading in with a configuration
4. Ingestion with urllib and requests
5. Creating a command-line admin with argparse
6. Wrangling with BeautifulSoup and SQLAlchemy
7. Modeling with NumPy
8. Reporting with Jinja2
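A minimal sketch of a command-line admin built with `argparse` subcommands, one per task. The program name `bookclub-admin`, the `ingest`/`recommend` subcommands, and their options are hypothetical, and the handlers only return a message where a real admin would call the pipeline modules.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical admin CLI with one subcommand per task."""
    parser = argparse.ArgumentParser(
        prog="bookclub-admin",
        description="Administer the book club data pipeline.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    ingest = sub.add_parser("ingest", help="fetch raw data for a user")
    ingest.add_argument("--user", required=True, help="user id to fetch")

    recommend = sub.add_parser("recommend", help="print top recommendations")
    recommend.add_argument("-n", "--num", type=int, default=5, help="number of books")
    return parser


def main(argv: Optional[list[str]] = None) -> str:
    """Parse argv (defaults to sys.argv[1:]) and dispatch to a handler."""
    args = build_parser().parse_args(argv)
    # Placeholder handlers: a real admin would call the pipeline modules here.
    if args.command == "ingest":
        return f"ingesting data for user {args.user}"
    return f"recommending {args.num} books"
```

Accepting an explicit `argv` list keeps `main` easy to test; a console entry point or an `if __name__ == "__main__":` guard can call `print(main())`.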

Octavo Architecture (a really clear data science pipeline)

● Ingestion Module (requests.py), feeding Raw Data Storage
● Wrangling Module (BeautifulSoup, SQLAlchemy), feeding Computational Data Storage
● Recommender Module (Numpy)
● Reporting Module (Jinja2, Matplotlib)
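One way to picture the Octavo modules in code is a pipeline of ordered stages, each consuming the previous stage's output. In this minimal sketch the `Pipeline` class, the stage names, and the toy stage bodies are all hypothetical, and data is passed in memory; in the architecture above, ingestion and wrangling would instead write to raw and computational data storage.

```python
from typing import Any, Callable


class Pipeline:
    """Run named stages in order; each stage receives the previous output."""

    def __init__(self) -> None:
        self.stages: list[tuple[str, Callable[[Any], Any]]] = []

    def stage(self, name: str):
        """Decorator that appends a function to the pipeline under a name."""
        def register(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
            self.stages.append((name, func))
            return func
        return register

    def run(self, data: Any = None) -> Any:
        for _name, func in self.stages:
            data = func(data)
        return data


octavo = Pipeline()


@octavo.stage("ingestion")
def ingest(_):
    return ["  Cosmos ", "", "Contact"]  # toy stand-in for raw fetched records


@octavo.stage("wrangling")
def wrangle(raw):
    return [r.strip() for r in raw if r.strip()]  # drop blanks, trim whitespace


@octavo.stage("recommender")
def recommend(titles):
    return sorted(titles)  # toy ranking; a real module would use the model


@octavo.stage("reporting")
def report(ranked):
    return "\n".join(f"{i}. {t}" for i, t in enumerate(ranked, start=1))
```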

How to tackle this course ...

Lean into it: absorb as much as possible, and don’t worry about falling behind; it will be in your head!

Then afterwards, let’s all digest it together (keep in touch).