Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle
![Page 1: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/1.jpg)
Creativity and Curiosity
THE TRIAL AND ERROR OF DATA SCIENCE
![Page 2: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/2.jpg)
I love data, everything comes easy to me…
![Page 3: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/3.jpg)
There are so many things to try and explore on a given problem. Where to start?
• Language (Julia, Python, R, C++, etc.)
• Visualization (ggplot, Tableau, D3, etc.)
• Pre-process (standardize, variance scaling, feature encoding, etc.)
• Classifier (GLM, SVM, SGD, KNN, Random Forest, etc.)
• Post-process (rule truncation, post-pruning, etc.)
• Ensemble (weighted average, min, max, probabilities, etc.)
![Page 4: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/4.jpg)
Where Many Individuals Come to Die…
(Model Tuning Hell)
![Page 5: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/5.jpg)
Structured Process
• Allows you to reduce uncertainty and work toward outcomes in a methodical way.
• Gives you an idea of what activities to do and when.
• Details for each project vary; however, the structure should stay the same.
• The process is almost never linear; you should revisit each step again and again.
Knowledge Discovery Process
1. Define the goal
2. Explore the data
3. Prepare the data
4. Choose and evaluate models
5. Ensemble
![Page 6: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/6.jpg)
Define the Goal
• Why do the sponsors want the project in the first place? What do they lack, and what do they need?
• What are they doing to solve the problem now, and why isn’t that good enough?
• What resources will you need: what kind of data? Do you have domain experts to collaborate with, and what are the computational resources?
• How do the project sponsors plan to deploy your results? What are the constraints that have to be met for successful deployment?
• Is the data quality good enough?
![Page 7: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/7.jpg)
Define the Goal
Modeling:
• Classification
• Scoring
• Ranking
• Clustering
• Finding relations
• Characterization
Model evaluation and critique:
• Is it accurate enough for your needs? Does it generalize well?
• Does it perform better than “the obvious guess”? Better than whatever is currently in use?
• Do the results of the model (coefficients, clusters, rules) make sense in the context of the problem domain?
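One way to make the “obvious guess” question concrete is to compare a model against a majority-class baseline. The following is a minimal sketch, not something shown in the talk; the labels and the `model_pred` values are invented purely for illustration.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_train, n):
    """The 'obvious guess': always predict the most common training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * n

# Illustrative data only (an assumption, not from the slides)
y_train = [0, 0, 0, 1, 0, 1, 0, 0]
y_test = [0, 1, 0, 0, 1, 0]
model_pred = [0, 1, 0, 0, 0, 0]  # hypothetical model output

baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
model_acc = accuracy(y_test, model_pred)
print(f"baseline={baseline_acc:.2f} model={model_acc:.2f}")
```

If the model does not clearly beat the baseline on held-out data, that is usually a sign to revisit an earlier step rather than to keep tuning.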
![Page 8: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/8.jpg)
Explore the Data
Use summary statistics to spot problems
• Missingness
• Data ranges (too wide/too narrow)
• Invalid values
• Outliers
• Units
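The checks above can be sketched for a single numeric column using only the Python standard library. This is an illustrative sketch under stated assumptions: the `ages` column, the valid range of 0 to 120, and the 1.5 × IQR outlier rule are choices made for the example, not part of the talk.

```python
import statistics

def summarize(values, valid_min=None, valid_max=None):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # common rule of thumb
    return {
        "n_missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "median": statistics.median(present),
        # Values outside the range the domain allows
        "invalid": [v for v in present
                    if (valid_min is not None and v < valid_min)
                    or (valid_max is not None and v > valid_max)],
        # Values far from the bulk of the data
        "outliers": [v for v in present if v < low or v > high],
    }

# Hypothetical column: ages with missing entries and two suspicious values
ages = [34, 29, None, 41, 38, -1, 27, 33, None, 212, 36, 30]
report = summarize(ages, valid_min=0, valid_max=120)
for name, value in report.items():
    print(f"{name}: {value}")
```

A very wide range (here -1 to 212) is often the first hint of sentinel values or mixed units, which is why the minimum and maximum are worth reading before any modeling.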
![Page 9: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/9.jpg)
Explore the Data
Use graphics and visualization to spot problems
Single-Variable First
• Peak of distribution?
• How many peaks?
• How normal (or lognormal) is the data?
• How much data variation is there? Is it concentrated in a certain interval or category?
• Use histograms, density plots, bar charts, scatter plots with smoothing curve.
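As a rough stand-in for a plotted histogram, the single-variable questions above can be approximated with a text histogram. This is a sketch only: the data, the bin width of 10, and the simple peak-counting rule (a bin strictly taller than both neighbours) are assumptions for illustration, and a real plot from a library such as ggplot would normally be used instead.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count integer values per bin; each bin is keyed by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    edges = range(min(counts), max(counts) + bin_width, bin_width)
    return [(edge, counts.get(edge, 0)) for edge in edges]

def count_peaks(bins):
    """Count bins strictly taller than both neighbours (flat plateaus are ignored)."""
    heights = [0] + [count for _, count in bins] + [0]
    return sum(1 for i in range(1, len(heights) - 1)
               if heights[i - 1] < heights[i] > heights[i + 1])

# Hypothetical variable with two clusters of values
values = [21, 22, 23, 24, 25, 25, 26, 27, 33, 41, 44, 45, 45, 46, 47, 48, 52]
bins = histogram(values, bin_width=10)
for edge, count in bins:
    print(f"{edge:>3}-{edge + 9:<3} {'#' * count}")
print("peaks:", count_peaks(bins))
```

The answer depends on the bin width, so it is worth trying a few widths before concluding that a variable has more than one peak.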
![Page 10: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/10.jpg)
Prepare the Data
Cleaning Data
• Treating missing values (NAs)
• Data transformations
Sampling for Modeling and Validation
• Test and training splits
• Creating a sample group column
• Record grouping
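The sampling ideas above can be combined: a sample group column holds one random number per grouping key, so that records belonging together always fall on the same side of the split. A minimal sketch follows; the `household_id` field, the column name `gp`, and the 30% test fraction are hypothetical choices for the example.

```python
import random

def add_sample_group(records, key, seed=42):
    """Attach a reproducible random 'gp' value in [0, 1) per grouping key,
    so all records that share a key get the same value."""
    rng = random.Random(seed)
    group_value = {}
    for rec in records:
        k = rec[key]
        if k not in group_value:
            group_value[k] = rng.random()
        rec["gp"] = group_value[k]
    return records

def split(records, test_fraction=0.2):
    """Split on the sample group column rather than on individual rows."""
    train = [r for r in records if r["gp"] >= test_fraction]
    test = [r for r in records if r["gp"] < test_fraction]
    return train, test

# Hypothetical records: several people per household
records = [{"household_id": i // 3, "person": i} for i in range(30)]
add_sample_group(records, key="household_id")
train, test = split(records, test_fraction=0.3)
print(len(train), "training rows,", len(test), "test rows")
```

Storing the group value as a column, instead of shuffling each time, keeps the split repeatable when the data is reloaded or extended.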
![Page 11: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/11.jpg)
Choosing and Evaluating Models
Mapping problems to machine learning tasks (use a problem-to-method mapping)
• Solving classification problems
  • Naïve Bayes
  • Decision Trees
  • Logistic Regression
• Solving scoring problems
  • Linear Regression
  • Logistic Regression
• Working without known targets
  • K-means clustering
  • Apriori algorithm to find association rules
  • Nearest neighbor
![Page 12: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/12.jpg)
Choosing and Evaluating Models
Evaluating models
• Evaluating classification models
  • Confusion matrix
  • Precision
  • Recall
  • Sensitivity
  • Specificity
• Evaluating scoring models
  • Root Mean Square Error
  • R-squared
  • Correlation
  • Absolute Error
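Most of the measures listed above reduce to a few lines of arithmetic. The sketch below uses made-up labels and predictions; it assumes binary labels coded as 0/1 and does not guard against empty classes (which would cause a division by zero).

```python
import math

def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return {
        "precision": tp / (tp + fp),    # of predicted positives, how many are real
        "recall": tp / (tp + fn),       # also called sensitivity
        "specificity": tn / (tn + fp),  # of real negatives, how many were caught
    }

def rmse(actual, predicted):
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Illustrative data only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
cls = classification_metrics(y_true, y_pred)

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 10.0]
print(cls)
print(rmse(actual, predicted), mean_absolute_error(actual, predicted),
      r_squared(actual, predicted))
```

Recall and sensitivity are the same quantity under two names, so the sketch computes it once.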
![Page 13: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/13.jpg)
Choosing and Evaluating Models
Evaluating models
• Evaluating probability models
  • Area Under the Curve
  • Log Likelihood
  • Deviance
  • Akaike Information Criterion (AIC)
  • Entropy
• Evaluating clustering models
  • Intra-cluster distances
  • Cross-cluster distances
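For the probability-model measures, a small sketch may help show how they relate. It assumes binary outcomes coded 0/1 and predicted probabilities strictly between 0 and 1; the data and the parameter count `k = 2` are invented for illustration. AUC is computed here by direct pair counting, which is fine for small samples but slow for large ones.

```python
import math

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_likelihood(y_true, probs):
    """Sum of log-probabilities the model assigned to what actually happened."""
    return sum(math.log(p) if t == 1 else math.log(1 - p)
               for t, p in zip(y_true, probs))

def deviance(y_true, probs):
    """For 0/1 outcomes, deviance is -2 times the log likelihood."""
    return -2 * log_likelihood(y_true, probs)

def aic(y_true, probs, k):
    """AIC = 2k - 2 * log likelihood, where k is the number of fitted parameters."""
    return 2 * k - 2 * log_likelihood(y_true, probs)

# Illustrative data only
y_true = [1, 1, 0, 1, 0, 0]
probs = [0.9, 0.7, 0.6, 0.4, 0.2, 0.1]
k = 2  # hypothetical number of model parameters

print("AUC:", auc(y_true, probs))
print("log likelihood:", log_likelihood(y_true, probs))
print("deviance:", deviance(y_true, probs))
print("AIC:", aic(y_true, probs, k))
```

Higher AUC and log likelihood are better, while lower deviance and AIC are better; AIC adds a penalty for each extra parameter.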
![Page 14: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/14.jpg)
Choosing and Evaluating Models
Validating models
• Identify common model problems
  • Bias – systematic error
  • Variance – oversensitivity of the model
  • Overfit – doesn’t generalize well
  • Nonsignificance – relation may not hold
• Ensuring model quality
  • Testing on Held-Out Data
  • K-Fold Cross Validation
  • Significance Testing
  • Confidence Intervals
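K-fold cross validation can be sketched in a few lines. In this illustration the "model" is deliberately trivial (it predicts the mean of the training targets), and the data, `k = 5`, and the seed are assumptions; the point is only the mechanics of holding out each fold once.

```python
import math
import random
import statistics

def k_fold_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each index is held out exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx

def rmse(actual, predicted):
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(statistics.fmean(squared))

# Illustrative target values only
y = [12.0, 15.0, 11.0, 14.0, 13.0, 18.0, 16.0, 10.0, 17.0, 19.0]

splits = list(k_fold_splits(len(y), k=5))
fold_rmse = []
for train_idx, test_idx in splits:
    # "Fit" on the training folds only: a toy model that predicts the mean
    prediction = statistics.fmean([y[j] for j in train_idx])
    held_out = [y[j] for j in test_idx]
    fold_rmse.append(rmse(held_out, [prediction] * len(held_out)))

print("per-fold RMSE:", [round(e, 2) for e in fold_rmse])
print("mean RMSE:", round(statistics.fmean(fold_rmse), 2))
```

The spread of the per-fold errors gives a rough sense of variance, which a single held-out split cannot show.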
![Page 15: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/15.jpg)
Ensemble
How do I bring all my work together?
• Weighted average
• Min
• Max
• Voting
• Stacking
• Neural network
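The simpler combination rules in this list can be sketched directly on predicted probabilities. The three model outputs and the weights below are invented for illustration; stacking and neural-network blending need a second training step and are not shown.

```python
def weighted_average(prob_lists, weights):
    """Blend per-row probabilities from several models using the given weights."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, row)) / total
            for row in zip(*prob_lists)]

def combine(prob_lists, fn):
    """Apply a row-wise rule such as min or max across models."""
    return [fn(row) for row in zip(*prob_lists)]

def majority_vote(prob_lists, threshold=0.5):
    """Each model votes 1 if its probability reaches the threshold;
    the ensemble predicts 1 when more than half of the models vote 1."""
    n_models = len(prob_lists)
    return [int(sum(p >= threshold for p in row) > n_models / 2)
            for row in zip(*prob_lists)]

# Hypothetical predicted probabilities from three models on three rows
model_a = [0.9, 0.4, 0.2]
model_b = [0.8, 0.6, 0.3]
model_c = [0.7, 0.3, 0.6]
models = [model_a, model_b, model_c]

blended = weighted_average(models, weights=[0.5, 0.3, 0.2])
lowest = combine(models, min)
highest = combine(models, max)
votes = majority_vote(models)
print(blended, lowest, highest, votes)
```

Weights are usually chosen from each model's held-out performance, and an odd number of models avoids ties in voting.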
![Page 16: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/16.jpg)
More Ideas
Learn about ensemble methods, regularization, and principled dimension reduction
• Hastie, Tibshirani, and Friedman’s The Elements of Statistical Learning, Second Edition
• Useful if you want to understand the consequences of a method; it has a math bent
Keep your saw sharp
Plug in
![Page 17: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/17.jpg)
Using your creativity and curiosity you can slay mighty data science problems.
![Page 18: Creativity and Curiosity: The Trial and Error of Data Science, Presented by Damian Mingle](https://reader034.fdocuments.in/reader034/viewer/2022052619/5564359ad8b42ad3308b4adf/html5/thumbnails/18.jpg)
@DamianMingle
http://www.WPC-Services.com
http://www.DamianMingle.com