Fairy tale from the land of data
Posted: 18-Oct-2014
Category: Data & Analytics
![Page 1: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/1.jpg)
Fairy tales in the land of data
Or: do I know what I’m doing?
By @przemur from
http://about.me/przemek.maciolek
![Page 2: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/2.jpg)
A story
![Page 3: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/3.jpg)
http://yamao.deviantart.com/art/Cleric-comm-343786321 https://www.flickr.com/photos/jsjgeology/8359854092/
![Page 4: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/4.jpg)
![Page 5: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/5.jpg)
![Page 6: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/6.jpg)
![Page 7: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/7.jpg)
Suspense
![Page 8: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/8.jpg)
“The hammers from the new provider are no good, sayr.”
![Page 9: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/9.jpg)
What would you do?
![Page 10: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/10.jpg)
New hammers since this month
![Page 11: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/11.jpg)

    install.packages('ggplot2')  # one-time install
    library(ggplot2)
    setwd("/Users/pmm/Desktop/hammer")
    all <- read.csv(file="all.csv")

    # Number of dwarfs and total production per month
    qplot(all$month_sequence, all$dwarfs) + geom_smooth()
    qplot(all$month_sequence, all$production) + geom_smooth()

    # Average production per dwarf
    all$prod_per_dwarf <- all$production / all$dwarfs
    qplot(all$month_sequence, all$prod_per_dwarf) + geom_smooth()

![Page 12: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/12.jpg)
Number of dwarfs working in the mine
The hammers from the new provider started being
distributed to the new miners.
![Page 13: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/13.jpg)
Total production of gold
![Page 14: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/14.jpg)
Per-dwarf average production
![Page 15: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/15.jpg)
Does anyone see a problem?
![Page 16: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/16.jpg)
Let’s look at the production of each dwarf relative to the time that dwarf started…
Dwarfs using the OLD hammer design
Dwarfs using the NEW hammer design
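The re-indexing described above can be sketched as follows. This is only an illustration: the slides do not show how `new_relative.csv` and `old_relative.csv` were built, so the input table, the column names `dwarf_id` and `month`, the sample values, and the choice to start counting at 0 are all assumptions.

```r
# Hypothetical input: one row per dwarf per calendar month (values are made up).
monthly <- data.frame(
  dwarf_id   = c(1, 1, 1, 2, 2, 3),
  month      = c(1, 2, 3, 2, 3, 3),
  production = c(10, 9, 9, 11, 8, 12)
)

# First calendar month in which each dwarf appears, aligned row by row.
first_month <- ave(monthly$month, monthly$dwarf_id, FUN = min)

# Months elapsed since that dwarf started (assumption: the first month counts as 0).
monthly$relative_month <- monthly$month - first_month

print(monthly)
```

With every dwarf on the same relative time axis, dwarfs who started in different calendar months can be compared directly.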
![Page 17: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/17.jpg)

    new <- read.csv(file="new_relative.csv")
    old <- read.csv(file="old_relative.csv")

    qplot(new$relative_month, new$production)
    ggplot(new, aes(x=relative_month, y=production)) +
      geom_point(shape=19, position=position_jitter(width=.5, height=0), alpha=.2)

    # This will look much better!
    old$type <- 'old'
    new$type <- 'new'
    old_and_new <- rbind(old, new)
    ggplot(old_and_new, aes(x=relative_month, y=production, color=type)) +
      geom_point(shape=19, position=position_jitter(width=.5, height=0), alpha=.2)

![Page 18: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/18.jpg)
Scatterplot of production by relative month for dwarfs using old and new hammers
![Page 19: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/19.jpg)
What now?
![Page 20: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/20.jpg)

    ggplot(old_and_new, aes(x=relative_month, y=production, color=type)) +
      geom_point(shape=19, position=position_jitter(width=.5, height=0), alpha=.1) +
      geom_smooth(method=lm)

The new hammers wear out much faster!
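One way to put a number on the difference between the two fitted lines is a single linear model with an interaction term. The sketch below is not taken from the slides: the data are synthetic, and the slopes, noise level, and the `make_group` helper are invented for illustration.

```r
set.seed(42)

# Synthetic data, invented for illustration only: both hammer types start at the
# same production level, but the new design is given a steeper monthly decline.
n <- 200
relative_month <- rep(0:9, times = n / 10)
make_group <- function(type, slope) {
  data.frame(
    relative_month = relative_month,
    production     = 100 + slope * relative_month + rnorm(n, sd = 3),
    type           = type
  )
}
old_and_new <- rbind(make_group("old", -0.5), make_group("new", -2.0))

# Make "old" the reference level so the coefficients describe "new" relative to "old".
old_and_new$type <- relevel(factor(old_and_new$type), ref = "old")

# The interaction term estimates how much the slope for new hammers differs from old.
fit <- lm(production ~ relative_month * type, data = old_and_new)
slope_difference <- coef(fit)[["relative_month:typenew"]]
print(summary(fit)$coefficients)
```

A clearly negative `relative_month:typenew` coefficient would mean production falls faster per month with the new hammers than with the old ones.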
![Page 21: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/21.jpg)
How much did the dwarfs lose?
![Page 22: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/22.jpg)

    old_m <- lm(production ~ relative_month, old)
    new$possible_production <- predict(old_m, new)
    sum(new$possible_production) - sum(new$production)
    (sum(new$possible_production) - sum(new$production)) / sum(new$production)

0.5%
Now, taking into account the price of a hammer, one can select the optimal strategy… but that’s another story…
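The loss estimate can be tried end to end without the original CSV files. The sketch below follows the same idea: fit the old-hammer trend, predict what the new-hammer dwarfs would have produced under it, and compare. The data are synthetic and invented for illustration, so the printed figures will not match the slide.

```r
set.seed(1)

# Synthetic data, invented for illustration; not the numbers behind the slides.
months <- rep(0:9, each = 20)
old <- data.frame(relative_month = months,
                  production = 100 - 0.5 * months + rnorm(length(months), sd = 2))
new <- data.frame(relative_month = months,
                  production = 100 - 1.0 * months + rnorm(length(months), sd = 2))

# Fit the old-hammer trend, then ask what the new-hammer dwarfs would have
# produced had they followed it.
old_m <- lm(production ~ relative_month, data = old)
new$possible_production <- predict(old_m, newdata = new)

absolute_loss <- sum(new$possible_production) - sum(new$production)
relative_loss <- absolute_loss / sum(new$production)

# relative_loss is a fraction; multiply by 100 to read it as a percentage.
cat(sprintf("Estimated loss: %.1f units (%.2f%% of actual production)\n",
            absolute_loss, 100 * relative_loss))
```

Note that the ratio is taken against actual production with the new hammers, matching the last expression on the slide.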
![Page 23: Fairy tale from the land of data](https://reader034.fdocuments.in/reader034/viewer/2022042713/54432a83afaf9fef098b47a6/html5/thumbnails/23.jpg)
Lessons learned …?
• Don’t trust the data blindly, ask questions
• Try to understand the underlying rules of the system
• Don’t be shy about trying various models
• If using R, go for ggplot2