Street Fighting Data Science

Street Fighting Data Science Pete Skomoroch @peteskomoroch O’Reilly Strata Conference February 28, 2012

description

Practical problem solving with data involves more than just visualization or applying the latest machine learning techniques. Intuition, domain knowledge, and reasonable approximations can mean the difference between a successful model and a catastrophic failure. We’ll dive into some best practices I’ve extracted from solving real-world problems like computing trending topics, cleaning election data, and ranking experts on social networks.

New analysts or engineers are often lost when textbook approaches fail on real-world data. Drawing inspiration from problem-solving techniques in mathematics and physics, we will walk through examples that illustrate how to come up with creative solutions and solve problems with big data.

Transcript of Street Fighting Data Science

Page 1: Street Fighting Data Science

Street Fighting Data Science

Pete Skomoroch @peteskomoroch

O’Reilly Strata Conference February 28, 2012

Page 2: Street Fighting Data Science
Page 3: Street Fighting Data Science
Page 4: Street Fighting Data Science

To solve hard problems:

Page 5: Street Fighting Data Science

Think like a street fighter

Page 6: Street Fighting Data Science

Analyze

Improvise

Anticipate

Adapt

Page 7: Street Fighting Data Science

How does this apply to Data Science?

Page 8: Street Fighting Data Science
Page 9: Street Fighting Data Science
Page 10: Street Fighting Data Science

Pricing model decreases profit in test stores by 30%

Page 11: Street Fighting Data Science

What went wrong?

Page 12: Street Fighting Data Science

• Ran complex “black box” model

• Didn’t analyze the data first

• Didn’t anticipate elasticity errors
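To illustrate why an elasticity error matters, here is a minimal sketch, not the model from the talk. It assumes a constant-elasticity demand curve, and every name and number in it (`demand`, `profit`, `optimal_price`, the cost, price, and elasticity values) is hypothetical. It compares the profit from pricing with the true elasticity against the profit from pricing with a wrong estimate.

```python
def demand(price, base_price, base_qty, elasticity):
    """Units sold under an assumed constant-elasticity demand curve."""
    return base_qty * (price / base_price) ** (-elasticity)


def profit(price, unit_cost, base_price, base_qty, elasticity):
    """Profit at a given price; elasticity here is the true market value."""
    return (price - unit_cost) * demand(price, base_price, base_qty, elasticity)


def optimal_price(unit_cost, elasticity):
    """Profit-maximizing price for constant-elasticity demand (needs elasticity > 1)."""
    if elasticity <= 1:
        raise ValueError("constant-elasticity pricing needs elasticity > 1")
    return unit_cost * elasticity / (elasticity - 1)


# Hypothetical numbers, chosen only for illustration.
UNIT_COST = 10.0
BASE_PRICE = 15.0
BASE_QTY = 1000.0
TRUE_ELASTICITY = 3.0
ESTIMATED_ELASTICITY = 2.0  # what a badly fitted model might report

price_if_correct = optimal_price(UNIT_COST, TRUE_ELASTICITY)
price_if_wrong = optimal_price(UNIT_COST, ESTIMATED_ELASTICITY)

# Both profits are evaluated against the TRUE elasticity, because the
# market responds to reality, not to the model's estimate.
profit_if_correct = profit(price_if_correct, UNIT_COST, BASE_PRICE, BASE_QTY, TRUE_ELASTICITY)
profit_if_wrong = profit(price_if_wrong, UNIT_COST, BASE_PRICE, BASE_QTY, TRUE_ELASTICITY)
relative_loss = (profit_if_correct - profit_if_wrong) / profit_if_correct

print(price_if_correct, price_if_wrong, round(relative_loss, 3))
```

With these made-up inputs, underestimating elasticity pushes the price from 15 to 20 and cuts profit by roughly 16%. Running the same comparison over a range of plausible elasticities before a rollout is one way to anticipate this kind of failure.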

Page 13: Street Fighting Data Science

How could this have been avoided?

Page 14: Street Fighting Data Science
Page 15: Street Fighting Data Science
Page 16: Street Fighting Data Science

The Men Who Stare at Charts

Page 17: Street Fighting Data Science

Look at your data

Page 18: Street Fighting Data Science
Page 19: Street Fighting Data Science
Page 20: Street Fighting Data Science
Page 21: Street Fighting Data Science
Page 22: Street Fighting Data Science
Page 23: Street Fighting Data Science

Raw Data: FEC Contributions

Page 24: Street Fighting Data Science

retired 32938
self-employed 25454
information requested per best efforts 1313
homemaker 4992
the bank of new york 65
john mccain 2008 57
u.s. government 121
idt corp. 54
merrill lynch 273
blank rome l.l.p. 51
department of defense 100
u.s. army 90
us army 141
none 642
greenberg traurig 118
northrop grumman 105
at&t 141
citigroup 134
bridgewater associates 44
univision communications inc. 36

not employed 118672
self employed 92973
information requested 17627
refused 728
unemployed 1493
self-employed 5919
university of california 825
microsoft 915
university of chicago 616
harvard university 848
google 662
stanford university 716
university of washington 614
ibm 1016
columbia university 782
university of michigan 514
freelance 372
sa 150
sidley austin llp 509
na 999
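The employer strings above contain near-duplicates such as "u.s. army" and "us army", or "self employed" and "self-employed". One way such variants could be merged is sketched below. This is an assumed cleanup, not the procedure used in the talk, and the helper names `normalize_employer` and `merge_counts` are hypothetical.

```python
import re
from collections import Counter


def normalize_employer(raw):
    """Collapse trivial spelling variants of a free-text employer field."""
    text = raw.strip().lower()
    text = text.replace("-", " ")        # "self-employed" -> "self employed"
    text = re.sub(r"[.,]", "", text)     # "u.s. army" -> "us army"
    return re.sub(r"\s+", " ", text).strip()


def merge_counts(pairs):
    """Sum counts for names that normalize to the same string."""
    merged = Counter()
    for name, count in pairs:
        merged[normalize_employer(name)] += count
    return merged


# Rows taken from the listing above, grouped as they appear there.
sample = [
    ("u.s. army", 90),
    ("us army", 141),
    ("self employed", 92973),
    ("self-employed", 5919),
    ("not employed", 118672),
    ("unemployed", 1493),
]

merged = merge_counts(sample)
for name, count in merged.most_common():
    print(name, count)
```

String cleanup only catches spelling variants. Synonyms such as "not employed" and "unemployed" stay separate and would need a hand-built alias table, which is a good reason to look at the raw values before modeling.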

Page 25: Street Fighting Data Science

Page 26: Street Fighting Data Science
Page 27: Street Fighting Data Science
Page 28: Street Fighting Data Science
Page 29: Street Fighting Data Science

Katherine Alexandra

Page 30: Street Fighting Data Science
Page 31: Street Fighting Data Science
Page 32: Street Fighting Data Science
Page 33: Street Fighting Data Science

“Don't indulge in any unnecessary, sophisticated moves. You'll get clobbered if you do, and in a street fight you'll have your shirt ripped off you.”

- Bruce Lee

Page 34: Street Fighting Data Science
Page 35: Street Fighting Data Science
Page 36: Street Fighting Data Science
Page 37: Street Fighting Data Science
Page 38: Street Fighting Data Science
Page 39: Street Fighting Data Science
Page 40: Street Fighting Data Science
Page 41: Street Fighting Data Science
Page 42: Street Fighting Data Science
Page 43: Street Fighting Data Science

Look at your errors

Page 44: Street Fighting Data Science

• Sanity check row counts

• Track errors over time

• Find patterns in the error data

• Add missing features to models

• Replace models entirely
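The first two checks in the list above can be sketched in a few lines. This is an illustration under assumptions, not code from the talk: the function names, the 1% loss threshold, and the sample records are all hypothetical.

```python
from collections import defaultdict


def check_row_count(rows_in, rows_out, max_loss_fraction=0.01):
    """Return True if a pipeline step kept an acceptable share of its rows.

    A step that emits MORE rows than it received also fails, since that
    usually points to an accidental duplicate-producing join.
    """
    if rows_in == 0:
        return rows_out == 0
    lost = (rows_in - rows_out) / rows_in
    return 0 <= lost <= max_loss_fraction


def mean_abs_error_by_period(records):
    """Average |predicted - actual| per period, for tracking errors over time.

    records is an iterable of (period, predicted, actual) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # period -> [error sum, count]
    for period, predicted, actual in records:
        totals[period][0] += abs(predicted - actual)
        totals[period][1] += 1
    return {period: total / n for period, (total, n) in totals.items()}


# Hypothetical sample data.
records = [
    ("2012-01", 10, 12),
    ("2012-01", 8, 8),
    ("2012-02", 10, 20),
]
errors = mean_abs_error_by_period(records)
print(check_row_count(1000, 995), check_row_count(1000, 800))
print(errors)
```

A jump in the per-period error, like the one between the two sample months here, is the cue to look for patterns in the failing rows and then decide whether to add a missing feature or replace the model.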

Page 45: Street Fighting Data Science
Page 46: Street Fighting Data Science

Analyze

Improvise

Anticipate

Adapt

Page 47: Street Fighting Data Science

Think like a street fighter