STATISTICS THE GRAMMAR OF DATA SCIENCE
ÍCARO MEDEIROS
Big Data Week
São Paulo - SP, 23/11/2015
WHY PURSUE A SOLID BACKGROUND IN STATISTICS?
https://twitter.com/josh_wills/status/198093512149958656
STATISTICS HELPS YOU STAY OUT OF THE DANGER ZONE
http://berkeleysciencereview.com/scientific-collaborations-uc-berkeley-data-driven-cover/
INSPIRATIONS FOR THIS KEYNOTE
https://speakerdeck.com/jakevdp/statistics-for-hackers
https://www.goodreads.com/book/show/17986418-naked-statistics
METRICS
MEANING
METHODS
MISUSES
HOW TO LIE: MOVIE REVIEWS
http://fivethirtyeight.com/features/fandango-movies-ratings
BECAUSE IT LOOKS LIKE MATH, WE [THINK] IT’S SOMEHOW OBJECTIVELY TRUE, BUT IT’S ALL BASED ON SUBJECTIVE EXPERIENCE
FANDANGO LOVES MOVIES (AND SELLS MOVIE TICKETS)
http://fivethirtyeight.com/features/fandango-movies-ratings
PAY ATTENTION TO PROVENANCE
HOW TO LIE: ROUNDING
http://fivethirtyeight.com/features/fandango-movies-ratings
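The rounding slides make the point that a display rule alone can inflate ratings. A minimal sketch, using hypothetical average ratings (not Fandango's data), that compares conventional rounding to the nearest half star with always rounding up to the next half star:

```python
import math


def round_to_nearest_half(rating: float) -> float:
    # Conventional rounding to the nearest half star (ties round up).
    return math.floor(rating * 2 + 0.5) / 2


def round_up_to_half(rating: float) -> float:
    # Always round up to the next half star: 4.1 is shown as 4.5.
    return math.ceil(rating * 2) / 2


def mean(values):
    return sum(values) / len(values)


# Hypothetical average user ratings on a 0-5 scale.
ratings = [3.1, 3.6, 4.0, 4.1, 4.3, 4.6]

nearest = [round_to_nearest_half(r) for r in ratings]
rounded_up = [round_up_to_half(r) for r in ratings]

print(nearest)     # [3.0, 3.5, 4.0, 4.0, 4.5, 4.5]
print(rounded_up)  # [3.5, 4.0, 4.0, 4.5, 4.5, 5.0]
print(f"Average inflation from rounding up: {mean(rounded_up) - mean(nearest):.2f} stars")
```

Each individual movie gains at most half a star, but the bias always points the same way, so the average of the displayed ratings drifts upward.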
CORRELATION IS NOT CAUSATION
SHORT BREAKS AT WORK "CAUSE" CANCER
Example from "Naked Statistics"
SMOKING CAUSES CANCER
SMOKING LEADS TO BREAKS AT WORK
BREAKS AT WORK ARE CORRELATED WITH CANCER
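The breaks-and-cancer example is a confounder at work: smoking drives both variables. A minimal simulation sketch with made-up probabilities (not figures from "Naked Statistics"). In the model, breaks have no effect on cancer at all, yet the two are correlated across the whole workforce. The correlation essentially disappears once smokers and non-smokers are compared separately.

```python
import math
import random


def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_workers(n=100_000, seed=42):
    # Hypothetical probabilities: smoking raises BOTH the chance of taking
    # breaks and the chance of cancer; breaks never affect cancer directly.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.30
        takes_breaks = rng.random() < (0.80 if smoker else 0.20)
        cancer = rng.random() < (0.15 if smoker else 0.02)
        rows.append((int(smoker), int(takes_breaks), int(cancer)))
    return rows


rows = simulate_workers()
overall = pearson([b for _, b, _ in rows], [c for _, _, c in rows])

# Control for the confounder: look at breaks vs cancer within each group.
within = {}
for group in (0, 1):
    subset = [(b, c) for s, b, c in rows if s == group]
    within[group] = pearson([b for b, _ in subset], [c for _, c in subset])

print(f"overall correlation: {overall:.3f}")
print(f"non-smokers: {within[0]:.3f}, smokers: {within[1]:.3f}")
```

Stratifying by the suspected confounder is only one way to adjust for it; it works here because the sketch knows exactly which variable is doing the causing.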
SPURIOUS CORRELATIONS
http://www.tylervigen.com/spurious-correlations
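Spurious correlations like the ones on Tyler Vigen's site tend to turn up when a very large number of series are compared against each other. A minimal sketch of that search effect, assuming nothing but independent Gaussian noise: it generates 200 short random series (about the length of a decade of yearly data), compares every pair, and reports the strongest correlation. The series count, length, and seed are arbitrary illustration choices.

```python
import itertools
import math
import random


def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def most_correlated_pair(num_series=200, length=10, seed=0):
    # Mutually independent series of pure noise: any correlation is chance.
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(length)] for _ in range(num_series)]
    best_r, best_pair = 0.0, None
    # 200 series give 19,900 pairwise comparisons.
    for i, j in itertools.combinations(range(num_series), 2):
        r = pearson(series[i], series[j])
        if abs(r) > abs(best_r):
            best_r, best_pair = r, (i, j)
    return best_r, best_pair


r, pair = most_correlated_pair()
print(f"strongest correlation among unrelated series {pair}: r = {r:.2f}")
```

With short series and thousands of comparisons, a correlation above 0.8 between two unrelated series is practically guaranteed.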
A/B TESTING
https://vwo.com/ab-testing/
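An A/B test compares a metric between two randomly assigned variants. A minimal sketch of one common way to ask whether a difference in conversion rates is bigger than chance, a two-proportion z-test with a pooled standard error. The visitor and conversion counts are hypothetical, and the normal approximation assumes reasonably large samples.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    # Difference in conversion rates divided by its pooled standard error.
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1,000 visitors per variant, 10% vs 13% conversion.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```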
A/B TESTING CAN MISLEAD
https://www.quora.com/When-should-A-B-testing-not-be-trusted-to-make-decisions/answer/Edwin-Chen-1
▸Feedback loops
▸Novelty effect
▸Seasonality
▸Wrong metrics
http://www.evanmiller.org/how-not-to-run-an-ab-test.html
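Evan Miller's article warns against checking an A/B test repeatedly and stopping as soon as it looks significant ("peeking"). A minimal simulation sketch of why, assuming A/A tests where both variants share the same hypothetical 10% conversion rate, so every significant result is a false positive. It compares an analyst who peeks every 50 visitors per variant with one who looks once at the planned sample size. All parameter values are illustrative assumptions.

```python
import math
import random


def p_value(conv_a, n_a, conv_b, n_b):
    # Two-sided p-value of a pooled two-proportion z-test.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # all or no conversions so far: no evidence either way
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def aa_false_positive_rates(experiments=500, visitors=1000, peek_every=50,
                            rate=0.10, alpha=0.05, seed=7):
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(experiments):
        conv_a = conv_b = 0
        declared_winner = False
        for i in range(1, visitors + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # The impatient analyst stops at the first peek with p < alpha.
            if i % peek_every == 0 and not declared_winner:
                declared_winner = p_value(conv_a, i, conv_b, i) < alpha
        peeking_hits += declared_winner
        # The disciplined analyst tests once, at the planned sample size.
        fixed_hits += p_value(conv_a, visitors, conv_b, visitors) < alpha
    return peeking_hits / experiments, fixed_hits / experiments


peeking, fixed = aa_false_positive_rates()
print(f"false positives with peeking: {peeking:.1%}, fixed horizon: {fixed:.1%}")
```

The simulation keeps generating visitors after a peeking "win" only so that the same data can also feed the fixed-horizon analysis. The fixed-horizon rate stays near the nominal 5%, while peeking inflates it several times over.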
CHOOSE THE RIGHT METRICS: CLICKS VS DWELL TIME
http://yahoolabs.tumblr.com/post/99405569711/science-powering-product-and-personalization
CHOOSE THE RIGHT METRICS: SHARING IS NOT NECESSARILY CARING
http://time.com/12933/what-you-think-you-know-about-the-web-is-wrong/
https://xkcd.com/882 (Significant)
SIGNIFICANCE
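The xkcd comic tests 20 jelly bean colors at the 0.05 level and reports the one that comes out significant. A minimal sketch of why that is expected even when no color has any effect. It assumes independent tests whose p-values are uniform under the null hypothesis (true for continuous test statistics), and it also shows the Bonferroni correction as one simple remedy.

```python
import random


def family_wise_error_rate(alpha, num_tests):
    # Probability of at least one false positive among independent tests
    # when every null hypothesis is true.
    return 1 - (1 - alpha) ** num_tests


def simulate_studies(num_studies=10_000, num_tests=20, alpha=0.05, seed=3):
    # Each "study" tests 20 colors that truly do nothing.
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        p_values = [rng.random() for _ in range(num_tests)]
        hits += any(p < alpha for p in p_values)
    return hits / num_studies


print(f"20 colors, alpha = 0.05: {family_wise_error_rate(0.05, 20):.3f}")  # 0.642
print(f"simulated: {simulate_studies():.3f}")
# Bonferroni correction: test each color at alpha / 20 instead.
print(f"with Bonferroni: {family_wise_error_rate(0.05 / 20, 20):.3f}")  # 0.049
```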
THE RED CARD PROBLEM
http://fivethirtyeight.com/features/science-isnt-broken/
http://www.nature.com/news/crowdsourced-research-many-hands-make-tight-work-1.18508
61 RESEARCHERS: SAME PROBLEM, DIFFERENT METHODS
http://fivethirtyeight.com/features/science-isnt-broken/
THE BACON CONTROVERSY
MORE ABOUT P-VALUES
https://twitter.com/Ted_Underwood/status/658983555008040960
IS THIS A GOOD CLASSIFICATION?
http://www.wired.com/2015/10/who-does-bacon-cause-cancer-sort-of-but-not-really/
GROUP 1: CARCINOGENIC
GROUP 2A: PROBABLY CARCINOGENIC
GROUP 2B: POSSIBLY CARCINOGENIC
…
EFFECT SIZE: BACON VS CIGARETTES (SAME CATEGORY)
Bacon: 18%
Cigarettes: WAIT FOR IT… 2500%
http://www.wired.com/2015/10/who-does-bacon-cause-cancer-sort-of-but-not-really/
http://www.theguardian.com/society/2015/oct/26/bacon-ham-sausages-processed-meats-cancer-risk-smoking-says-who
https://catracalivre.com.br/geral/sustentavel/indicacao/muito-alem-do-bacon-agrotoxicos-tambem-podem-causar-cancer/
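Sharing a category with cigarettes says how strong the evidence is, not how big the effect is. A minimal sketch that turns the slide's percent increases into relative risks, then into an absolute risk. The 5% baseline is a hypothetical placeholder, not an epidemiological figure; a relative increase means little until it is applied to a baseline.

```python
def relative_risk(percent_increase):
    # "18% higher risk" corresponds to a relative risk of 1.18.
    return 1 + percent_increase / 100


def absolute_risk(baseline_risk, percent_increase):
    # Risk in the exposed group, given the risk in the unexposed group.
    return baseline_risk * relative_risk(percent_increase)


# Percent increases taken from the slide.
bacon_rr = relative_risk(18)
cigarette_rr = relative_risk(2500)
print(f"relative risk: bacon {bacon_rr:.2f}, cigarettes {cigarette_rr:.0f}")

baseline = 0.05  # HYPOTHETICAL baseline risk, for illustration only
print(f"bacon: {baseline:.1%} -> {absolute_risk(baseline, 18):.1%}")
```

A relative risk of 1.18 moves a hypothetical 5% baseline to 5.9%, while a relative risk of 26 multiplies the baseline twenty-six times.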
SCHRÖDINGER'S DIET
http://www.vox.com/2015/5/20/8621527/health-tips-reporter
http://nerds.airbnb.com/scaling-data-science
DATA SCIENCE IS AN ACT OF INTERPRETING THE CUSTOMER'S VOICE
GOOD DATA VISUALIZATION: TIPS FOR SCATTER PLOTS
http://content.visage.co/hs-fs/hub/424038/file-2094950163-pdf/Data_Visualization_101_How_to_Design_Charts_and_Graphs.pdf
DAVID MCCANDLESS: INFORMATION IS BEAUTIFUL
http://www.informationisbeautiful.net/visualizations/diversity-in-tech/
IT’S EASY TO LIE WITH STATISTICS, BUT IT’S HARD TO TELL THE TRUTH WITHOUT THEM
Andrejs Dunkels, as quoted in "Naked Statistics"
TAKEAWAY MESSAGE
MY NEXT TALK: WHY PYTHON IS BETTER FOR DATA SCIENCE
São Paulo Big Data Meetup
São Paulo - SP, 25/11/2015
VivaReal Portal Imobiliário, Rua Bela Cintra, 539 - Consolação
http://www.meetup.com/pt/Sao-Paulo-Big-Data-Meetup
Slides: icaromedeiros.com.br
slideshare.net/icaromedeiros
@icaromedeiros