Lec27 · July 8, 2019 (transcript)
TODAY’S LECTURE

Data collection → Data processing → Exploratory analysis & data viz → Analysis, hypothesis testing, & ML → Insight & policy decision
TODAY’S LECTURE

More nonlinear classification/regression methods:
• Decision trees recap
• Bootstrapping, bagging
• Random forests in scikit-learn
• Support Vector Machines (SVMs)

Thanks to: Hector Corrada Bravo (UMD), Panagiotis Tsaparas (U of I), Oliver Schulte (SFU)
DECISION TREES IN SCIKIT-LEARN
Trains a decision tree using default hyperparameters (split attribute chosen by either Gini impurity or entropy, no max depth, etc.):

```python
from sklearn.datasets import load_iris
from sklearn import tree

# Load a common dataset, fit a decision tree to it
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

# Predict the most likely class (iris samples have 4 features)
clf.predict([[2., 2., 2., 2.]])

# Predict a PDF over classes (% of training samples in the leaf)
clf.predict_proba([[2., 2., 2., 2.]])
```
VISUALIZING A DECISION TREE

```python
import pydotplus
from IPython.display import Image

# Export the fitted tree to Graphviz DOT format and render it as a PNG
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())
```
BOOTSTRAPPING

Bootstrapping is a method for making inferences about an estimate of a statistic. Given:
• Observations $x_1, \ldots, x_n$ and the estimate $y = \frac{1}{n} \sum_{i=1}^{n} x_i$
• What can we say about the standard error of $y$?
• Challenges:
  • Our estimation procedure is deterministic.
  • We should retain whatever properties of the estimate $y$ result from obtaining it from $n$ samples.
BOOTSTRAPPING

Bootstrapping is a randomization procedure that measures the variance of estimates of $y$. Procedure:
• Generate $B$ random datasets by sampling with replacement from the dataset $x_1, \ldots, x_n$. Denote randomized dataset $b$ as $x_1^b, \ldots, x_n^b$
• Construct an estimate from each dataset, $y^b = \frac{1}{n} \sum_{i=1}^{n} x_i^b$
• Compute the center (mean) and spread (variance) of the estimates $y^b$
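A minimal NumPy sketch of this procedure (the observations here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)   # hypothetical observations x_1, ..., x_n
n, B = len(x), 1000

# Generate B bootstrap datasets (n draws with replacement) and compute
# the estimate y^b = mean of each resampled dataset
boot_estimates = np.array([rng.choice(x, size=n, replace=True).mean()
                           for _ in range(B)])

# Center and spread of the bootstrap estimates
print("bootstrap mean:", boot_estimates.mean())
print("bootstrap standard error:", boot_estimates.std(ddof=1))
```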
RANDOM FORESTS

Decision trees are very interpretable, but may be brittle to changes in the training data, as well as to noise. Random forests are an ensemble method that:
• Resamples the training data;
• Builds many decision trees; and
• Aggregates the trees' predictions (averaging for regression, voting for classification).
This is done through bagging and random feature selection.
BAGGING

Bagging: bootstrap aggregation.

Resampling a training set of size $n$ via the bootstrap:
• Sample $n$ elements with replacement

General scheme for random forests:
1. Create $B$ bootstrap samples, {Z1, Z2, …, ZB}
2. Build $B$ decision trees, {T1, T2, …, TB}, from {Z1, Z2, …, ZB}

Classification/regression:
1. Each tree Tj predicts a class/value yj
2. Return the average $\frac{1}{B} \sum_{j=1}^{B} y_j$ for regression, or the majority vote for classification (see the sketch below)
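A from-scratch sketch of that scheme for classification, assuming NumPy arrays and integer class labels (in practice you would just use sklearn's BaggingClassifier or RandomForestClassifier):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, B=25, seed=0):
    """Train B trees on bootstrap samples and majority-vote their predictions.

    X_train, y_train, X_test are NumPy arrays; y_train holds integer labels.
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    per_tree_preds = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                  # bootstrap sample Z_j
        T_j = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        per_tree_preds.append(T_j.predict(X_test))        # T_j's predictions
    per_tree_preds = np.array(per_tree_preds)
    # Majority vote over the B trees for each test point
    return np.array([np.bincount(col).argmax() for col in per_tree_preds.T])
```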
Original training dataset (Z):

| obs_id | ft_1 | ft_2  |
|--------|------|-------|
| 1      | 12.2 | puppy |
| 2      | 34.5 | dog   |
| 3      | 8.1  | cat   |

B bootstrap samples Zj, each drawn from Z with replacement, e.g.:
• Z1: obs 3 (8.1, cat), obs 2 (34.5, dog), obs 3 (8.1, cat)
• Z2: obs 1 (12.2, puppy), obs 2 (34.5, dog), obs 1 (12.2, puppy)
• ZB: obs 1 (12.2, puppy), obs 1 (12.2, puppy), obs 3 (8.1, cat)

Each bootstrap sample Zj trains a tree Tj; the predictions of T1, T2, …, TB are aggregated/voted into a class estimate or predicted value.
RANDOM ATTRIBUTE SELECTION

We get some randomness via bootstrapping. We like this! Randomness increases the bias of the forest slightly at a huge decrease in variance (due to averaging).

We can further reduce correlation between trees by:
1. For each tree, at every split point …
2. … choosing a random subset of attributes …
3. … then splitting on the “best” one (by entropy or Gini) within only that subset (see the sketch below)
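In scikit-learn, the size of that random subset is exposed as the max_features hyperparameter. A small sketch; "sqrt" is a common choice for classification:

```python
from sklearn.ensemble import RandomForestClassifier

# max_features controls the random subset of attributes considered at each
# split point; "sqrt" means only sqrt(d) of the d features are candidates
# per split. Call .fit(X, y) as usual to train it.
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt")
```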
RANDOM FORESTS IN SCIKIT-LEARN

```python
from sklearn.ensemble import RandomForestClassifier

# Train a random forest of 10 default decision trees
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = RandomForestClassifier(n_estimators=10)
clf = clf.fit(X, Y)
```

Can we get even more random?! Extremely randomized trees (ExtraTreesClassifier) do bagging and random attribute selection, but also:
1. At each split point, choose random splits
2. Pick the best of those random splits

Similar bias/variance performance to RFs, but can be faster computationally.
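A minimal sketch of the extremely randomized variant, reusing the toy X and Y from above (ExtraTreesClassifier has the same interface as RandomForestClassifier):

```python
from sklearn.ensemble import ExtraTreesClassifier

# Same API as RandomForestClassifier, but split thresholds are drawn
# at random and the best of those random splits is kept
clf = ExtraTreesClassifier(n_estimators=10)
clf = clf.fit(X, Y)
```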
SUPPORT VECTOR MACHINES
SUPPORT VECTOR MACHINES (SVM)
Find a linear hyperplane (decision boundary) that will separate the data.
SUPPORT VECTOR MACHINES

One possible solution: the hyperplane B1.
SUPPORT VECTOR MACHINES

Other possible solutions, e.g. the hyperplane B2.
SUPPORT VECTOR MACHINES

Which one is better, B1 or B2? How do you define “better”?
SUPPORT VECTOR MACHINES

Find the hyperplane that maximizes the margin: B1 is better than B2. (Figure: margin boundaries b11, b12 around B1 and b21, b22 around B2; the distance between b11 and b12 is the margin.)
SUPPORT VECTOR MACHINES

The decision boundary B1 is the hyperplane $w \cdot x + b = 0$; the margin boundaries b11 and b12 are $w \cdot x + b = +1$ and $w \cdot x + b = -1$.

$$f(x) = \begin{cases} 1 & \text{if } w \cdot x + b \ge 1 \\ -1 & \text{if } w \cdot x + b \le -1 \end{cases} \qquad \text{Margin} = \frac{2}{\|w\|^2}$$
SUPPORT VECTOR MACHINES

We want to maximize:

$$\text{Margin} = \frac{2}{\|w\|^2}$$

Which is equivalent to minimizing:

$$L(w) = \frac{\|w\|^2}{2}$$

But subject to the following constraints:

$$w \cdot x_i + b \ge 1 \ \text{if } y_i = 1, \qquad w \cdot x_i + b \le -1 \ \text{if } y_i = -1$$

This is a constrained optimization problem
• Numerical approaches exist to solve it (e.g., quadratic programming); a sketch follows below
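A minimal sketch of solving the hard-margin problem with SciPy's general-purpose constrained optimizer (real SVM implementations use dedicated QP solvers; the toy data here is made up):

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (hypothetical)
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, 0.5],
              [-1.0, -1.0], [-2.0, -1.5], [0.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# Decision variables: theta = (w1, w2, b); minimize L(w) = ||w||^2 / 2
objective = lambda theta: 0.5 * theta[:2] @ theta[:2]

# Both constraints above compress to y_i (w . x_i + b) >= 1 for every point
constraints = [{"type": "ineq",
                "fun": lambda theta, xi=xi, yi=yi: yi * (xi @ theta[:2] + theta[2]) - 1}
               for xi, yi in zip(X, y)]

result = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = result.x[:2], result.x[2]
print("w =", w, "b =", b)
```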
SUPPORT VECTOR MACHINES

What if the problem is not linearly separable? Apply some sort of penalty. (Figure: a point on the wrong side of its margin boundary, at distance ξi from it along the direction of w.)
SUPPORT VECTOR MACHINES

What if the problem is not linearly separable?
• Introduce slack variables $\xi_i$
• Need to minimize (a sketch of the role of $C$ follows below):

$$L(w) = \frac{\|w\|^2}{2} + C \left( \sum_{i=1}^{N} \xi_i^k \right)$$

• Subject to:

$$w \cdot x_i + b \ge 1 - \xi_i \ \text{if } y_i = 1, \qquad w \cdot x_i + b \le -1 + \xi_i \ \text{if } y_i = -1$$
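The $C$ in this objective corresponds to the C hyperparameter of scikit-learn's SVC. A small sketch on made-up, overlapping data:

```python
from sklearn.svm import SVC

# Hypothetical 1-D data that no hyperplane separates perfectly
X = [[0.0], [1.0], [1.5], [2.0], [3.0], [3.5]]
y = [0, 0, 1, 0, 1, 1]

# Large C: heavy slack penalty, few margin violations tolerated.
# Small C: light penalty, wider margin at the cost of more slack.
hard_ish = SVC(kernel="linear", C=100.0).fit(X, y)
soft = SVC(kernel="linear", C=0.01).fit(X, y)
```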
NONLINEAR SUPPORT VECTOR MACHINES

What if the decision boundary is not linear?
NONLINEAR SUPPORT VECTOR MACHINES

Transform the data into a higher-dimensional space (see the sketch below).
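A sketch of the idea using an explicit quadratic feature map, the same map a degree-2 polynomial kernel computes implicitly (the circular toy data is made up):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical data: the class depends on distance from the origin,
# so no linear boundary works in the original 2-D space
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 0.5).astype(int)

# Map (x1, x2) -> (x1^2, sqrt(2) x1 x2, x2^2): in this 3-D space
# the circular boundary becomes a plane
Phi = np.column_stack([X[:, 0]**2,
                       np.sqrt(2) * X[:, 0] * X[:, 1],
                       X[:, 1]**2])

clf = SVC(kernel="linear").fit(Phi, y)
print("training accuracy:", clf.score(Phi, y))
```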
SVMS IN SCIKIT-LEARN
Lots of defaults are used for the hyperparameters; you can use cross-validation to search for good ones.

```python
from sklearn import svm

# Fit a default SVM classifier to fake data
X = [[0, 0], [1, 1]]
y = [0, 1]
clf = svm.SVC()
clf.fit(X, y)
```

Output:

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
```
MODEL SELECTION IN SCIKIT-LEARN
```python
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

# ... Load some raw data into X and y ...

# Split the dataset in two equal parts
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.5, random_state=0)

# Pick values of the hyperparameters you want to consider
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
```
```python
from sklearn.svm import SVC

# Perform a complete grid search + cross-validation
# for each of the hyperparameter vectors
clf = GridSearchCV(SVC(C=1), tuned_parameters, cv=5,
                   scoring='precision')
clf.fit(X_train, y_train)

# Now that you've selected good hyperparameters via CV and trained
# a model on your training data, get an estimate of the "true error"
# on your held-out test set
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))
```