Scikit Learn: Data Normalization Techniques That Work

HELP YOUR DATA BE NORMAL

DAMIAN MINGLE, CHIEF DATA SCIENTIST

@DamianMingle

GET THE FULL STORY: bit.ly/UseSciKitNow

Want faster model run times and better accuracy?

Try Normalizing Your Data

What’s Normal Anyway?

Often stated as “scaling individual samples to have unit norm” or “scaling input vectors individually to unit norm (vector length).”

Adjusting values measured on different scales to a notionally common scale
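
As a quick illustration (not part of the original slides), scaling a sample to unit norm simply means dividing it by its own L2 length; the vector below is made up:

import numpy as np

# a single sample (feature vector) on an arbitrary scale
x = np.array([3.0, 4.0])

# scale to unit norm: divide the vector by its L2 length
unit_x = x / np.linalg.norm(x)

print(unit_x)                  # [0.6 0.8]
print(np.linalg.norm(unit_x))  # 1.0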

Why Normalization Matters

In truth, not every machine learning model is sensitive to feature magnitude.

Putting features on a common scale helps the models that are (think distances in k-nearest neighbors and the coefficients in regression).
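
As a rough sketch of why scale matters (the numbers here are invented), a feature measured in large units can dominate the distances a k-nearest-neighbors model relies on:

import numpy as np

# two samples: height in metres, weight in grams (deliberately mismatched scales)
a = np.array([1.8, 75000.0])
b = np.array([1.6, 74000.0])

# the Euclidean distance is driven almost entirely by the large-magnitude feature
print(np.linalg.norm(a - b))   # ~1000.0; the 0.2 m height difference barely registers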

Power in SciKit Learn

Preprocessing, Clustering, Regression, Classification, Dimensionality Reduction, Model Selection

Let’s Look at an ML Recipe

Normalization

The Imports

from sklearn.datasets import load_iris
from sklearn import preprocessing

Separate Features from Target

iris = load_iris()
print(iris.data.shape)
X = iris.data
y = iris.target

Normalize the Features

normalized_X = preprocessing.normalize(X)
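
A quick sanity check (a sketch, not from the slides): preprocessing.normalize uses the L2 norm and works row-wise by default, so every sample in normalized_X should end up with length 1.

import numpy as np
from sklearn.datasets import load_iris
from sklearn import preprocessing

X = load_iris().data
normalized_X = preprocessing.normalize(X)  # norm='l2' by default, applied per sample (row)

# each row should now have unit L2 length
print(np.linalg.norm(normalized_X, axis=1)[:5])  # ~[1. 1. 1. 1. 1.]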

Normalization Recipe

# Normalize the data attributes for the Iris dataset.
from sklearn.datasets import load_iris
from sklearn import preprocessing

# load the iris dataset
iris = load_iris()
print(iris.data.shape)

# separate the data from the target attributes
X = iris.data
y = iris.target

# normalize the data attributes
normalized_X = preprocessing.normalize(X)
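
For completeness, a minimal sketch (not from the slides) of the same step expressed as scikit-learn's Normalizer transformer, which drops into a Pipeline ahead of a distance-based model:

# the same row-wise scaling as a reusable transformer inside a Pipeline
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer

iris = load_iris()
X, y = iris.data, iris.target

model = Pipeline([
    ("normalize", Normalizer()),      # unit-norm scaling per sample
    ("knn", KNeighborsClassifier()),  # distance-based model that benefits from scaling
])
model.fit(X, y)
print(model.score(X, y))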

HELP YOUR DATA BE NORMAL

DAMIAN MINGLE, CHIEF DATA SCIENTIST

@DamianMingle

GET THE FULL STORY: bit.ly/UseSciKitNow