Understanding feature-space

Post on 11-Apr-2017

44 views 1 download

Transcript of Understanding feature-space

1

Understanding Feature Space in Machine Learning

Alice Zheng, DatoSeptember 9, 2015

2

My journey so farApplied machine learning

(Data science)

Build ML tools

Shortage of expertsand good tools.

3

Why machine learning?

Model data.Make predictions.Build intelligent

applications.

4

The machine learning pipeline

I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …

Raw data

FeaturesModels

Predictions

Deploy inproduction

Feature = numeric representation of raw data

6

Representing natural text

It is a puppy and it is extremely cute.

What’s important? Phrases? Specific words? Ordering?

Subject, object, verb?

Classify: puppy or not?

Raw Text

{“it”:2, “is”:2, “a”:1, “puppy”:1, “and”:1, “extremely”:1, “cute”:1 }

Bag of Words

7

Representing natural text

It is a puppy and it is extremely cute.

Classify: puppy or not?

Raw Text Bag of Wordsit 2

they 0

I 1

am 0

how 0

puppy 1

and 1

cat 0

aardvark 0

cute 1

extremely 1

… …

Sparse vector representation

8

Representing images

Image source: “Recognizing and learning object categories,” Li Fei-Fei, Rob Fergus, Anthony Torralba, ICCV 2005—2009.

Raw image: millions of RGB triplets,one for each pixel

Classify: person or animal?Raw Image Bag of Visual Words

9

Representing imagesClassify: person or animal?Raw Image Deep learning features

3.29-15

-5.2448.31.3647.1

-1.9236.52.8395.4-19-89

5.0937.8

Dense vector representation

10

Feature space in machine learning• Raw data high dimensional vectors• Collection of data points point cloud in feature

space• Model = geometric summary of point cloud• Feature engineering = creating features of the

appropriate granularity for the task

Crudely speaking, mathematicians fall into two categories: the algebraists, who find it easiest to reduce all problems to sets of numbers and variables, and the geometers, who understand the world through shapes.

-- Masha Gessen, “Perfect Rigor”

12

Algebra vs. Geometry

a

bc

a2 + b2 = c2

Algebra GeometryPythagoreanTheorem

(Euclidean space)

13

Visualizing a sphere in 2D

x2 + y2 = 1

a

bc

Pythagorean theorem:a2 + b2 = c2

x

y

1

1

14

Visualizing a sphere in 3D

x2 + y2 + z2 = 1

x

y

z

1

11

15

Visualizing a sphere in 4D

x2 + y2 + z2 + t2 = 1

x

y

z

1

11

16

Why are we looking at spheres?

= =

= =

Poincaré Conjecture:All physical objects without holes

is “equivalent” to a sphere.

17

The power of higher dimensions• A sphere in 4D can model the birth and death

process of physical objects• Point clouds = approximate geometric shapes• High dimensional features can model many things

Visualizing Feature Space

19

The challenge of high dimension geometry• Feature space can have hundreds to millions of

dimensions• In high dimensions, our geometric imagination is

limited- Algebra comes to our aid

20

Visualizing bag-of-words

puppy

cute

1

1

I have a puppy andit is extremely cute

I have a puppy andit is extremely cute

it 1

they 0

I 1

am 0

how 0

puppy 1

and 1

cat 0

aardvark 0

zebra 0

cute 1

extremely 1

… …

21

Visualizing bag-of-words

puppy

cute

1

11

extremely

I have a puppy and it is extremely cute

I have an extremely cute cat

I have a cute puppy

22

Document point cloudword 1

word 2

23

What is a model?• Model = mathematical “summary” of data• What’s a summary?

- A geometric shape

24

Classification modelFeature 2

Feature 1

Decide between two classes

25

Clustering modelFeature 2

Feature 1

Group data points tightly

26

Regression modelTarget

Feature

Fit the target values

Visualizing Feature Engineering

28

When does bag-of-words fail?

puppy

cat

2

11

have

I have a puppy

I have a catI have a kitten

Task: find a surface that separates documents about dogs vs. cats

Problem: the word “have” adds fluff instead of information

I have a dogand I have a pen

1

29

Improving on bag-of-words• Idea: “normalize” word counts so that popular words

are discounted• Term frequency (tf) = Number of times a terms

appears in a document• Inverse document frequency of word (idf) =

• N = total number of documents• Tf-idf count = tf x idf

30

From BOW to tf-idf

puppy

cat

2

11

have

I have a puppy

I have a catI have a kitten

idf(puppy) = log 4idf(cat) = log 4idf(have) = log 1 = 0

I have a dogand I have a pen

1

31

From BOW to tf-idf

puppy

cat1

have

tfidf(puppy) = log 4tfidf(cat) = log 4tfidf(have) = 0

I have a dogand I have a pen,I have a kitten

1

log 4

log 4

I have a cat

I have a puppy

Decision surface

Tf-idf flattens uninformative

dimensions in the BOW point cloud

32

Entry points of feature engineering• Start from data and task

- What’s the best text representation for classification?• Start from modeling method

- What kind of features does k-means assume?- What does linear regression assume about the data?

33

That’s not all, folks!• There’s a lot more to feature engineering:

- Feature normalization- Feature transformations- “Regularizing” models- Learning the right features

• Dato is hiring! jobs@dato.comalicez@dato.com @RainyData