Transcript of Introduction to AI (continued)
Lecture #5, 2/14/2017, Spring 2017 Social Computing Course

Page 1:

Introduction to AI (continued)

Lecture #5


Page 2:

What will we cover today?

• Feature Selection
• Neural Networks
• Unsupervised Learning
  – K-means

Page 3:

Feature Selection

• Task: Given a string, predict whether it is an email address (e.g., "[email protected]")

• Feature selection: Given an input x, which properties might be useful for predicting y?

• Needs a feature extractor ϕ(x):
  – contains_@       1
  – endswith_.com    1
  – endswith_.org    0
  – contains_%       0


← This is the feature vector

← Binary features have a value of 1 if present, 0 if not
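
A minimal sketch of such a feature extractor in Python; the feature names follow the slide, and the example string is a hypothetical stand-in (the transcript's own example is obscured):

```python
# Sketch of a feature extractor phi(x) for the email-address task.
def phi(x):
    """Map a string x to a dictionary of binary features (1 if present, 0 if not)."""
    return {
        "contains_@": 1 if "@" in x else 0,
        "endswith_.com": 1 if x.endswith(".com") else 0,
        "endswith_.org": 1 if x.endswith(".org") else 0,
        "contains_%": 1 if "%" in x else 0,
    }

# Hypothetical example string:
print(phi("user@example.com"))
# {'contains_@': 1, 'endswith_.com': 1, 'endswith_.org': 0, 'contains_%': 0}
```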

Page 4:

Feature weights

• contains_@: 3
• endswith_.com: 2.4
• endswith_.org: 1.3
• contains_%: -2
• !contains_alpha: -4

• This is the weight vector w
• Weights are typically normalized to [-1, 1]

Page 5:

Linear combination

• For a new example string, sum the weights of the features that are present
  – If the sum is >= 0, output Yes
  – else output No

• String 1: "[email protected]"
• String 2: "0.5%"


contains_@: 3
endswith_.com: 2.4
endswith_.org: 1.3
contains_%: -2
!contains_alpha: -4
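
A sketch of this decision rule, using the weights listed above; since String 1 is obscured in the transcript, a hypothetical email-like string stands in for it:

```python
# Weights from the slide; a feature contributes its weight only when it fires.
WEIGHTS = {
    "contains_@": 3,
    "endswith_.com": 2.4,
    "endswith_.org": 1.3,
    "contains_%": -2,
    "!contains_alpha": -4,  # fires when the string contains no letters at all
}

def active_features(x):
    """Which of the binary features fire for string x."""
    return {
        "contains_@": "@" in x,
        "endswith_.com": x.endswith(".com"),
        "endswith_.org": x.endswith(".org"),
        "contains_%": "%" in x,
        "!contains_alpha": not any(c.isalpha() for c in x),
    }

def predict(x):
    score = sum(w for name, w in WEIGHTS.items() if active_features(x)[name])
    return "Yes" if score >= 0 else "No"

print(predict("user@example.com"))  # hypothetical String 1: 3 + 2.4 = 5.4 -> Yes
print(predict("0.5%"))              # String 2: -2 + -4 = -6 -> No
```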

Page 6:

Let’s add some math!

Page 7:

Feature Selection

• Seems arbitrary
  – But it actually captures our knowledge
  – What if we do not really know?

• How can we select good features then?

• Feature Template
  – Rather than a specific feature, decide what kind of feature may be useful
    • E.g., the presence of certain characters in a string
  – And then let the system tell us which ones are good!


contains_@: 3
endswith_.com: 2.4
endswith_.org: 1.3
contains_%: -2
!contains_alpha: -4

Page 8:

Feature Templates

• Allow us to define a set of related features
  – contains_@, contains_a, contains_b, ...

• As a template with a variable:
  – E.g., contains _ , where _ can be filled by any single character (a sketch of this expansion follows below)

• Less burden on the designer: we do not need to know which features to choose, only whether they are useful cues or not

• But… it may produce lots of "useless" features
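
A sketch of how the "contains _" template might be expanded in code; the sample string is made up:

```python
# Sketch: instantiate the "contains _" template for every character that
# actually appears in the data, then let learning weight the results.
def template_features(x):
    return {f"contains_{c}": 1 for c in set(x)}

print(template_features("a@b.com"))
# {'contains_a': 1, 'contains_@': 1, 'contains_b': 1, 'contains_.': 1,
#  'contains_c': 1, 'contains_o': 1, 'contains_m': 1}  (order may vary)
```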

Page 9:

Feature Template for “ends with”


Inefficient to represent all zeros! But it is common for feature vectors to be quite sparse

Page 10:

Feature Vector Representations

• As an array (or vector):
  [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
  – Good for dense features
  – Efficient in space and speed
  – E.g., in computer vision, pixel intensity vectors are typically dense

Page 11:

Feature Vector Representations

• As a map:
  {"contains_@": 1, "endswith_m": 1}
  – Good for sparse features
    • Also known as "one-hot" if only one feature is set
  – Features not in the map are assumed to be 0
  – Typical in NLP applications
  – Sometimes trillions of features
  – But less efficient than arrays, and slower
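
A small sketch contrasting the two representations; the sparse dot product shown is a common trick, not something from the slides:

```python
# The same kind of feature vector, dense and sparse.
dense = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0]  # every position stored
sparse = {"contains_@": 1, "endswith_m": 1}             # absent features are 0

def sparse_dot(w, phi_x):
    """Weighted score touching only the non-zero entries of a sparse vector."""
    return sum(w.get(name, 0) * value for name, value in phi_x.items())

print(sparse_dot({"contains_@": 3, "endswith_m": 0.5}, sparse))  # 3.5
```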

Page 12:

Dimensionality Reduction – selecting only the features that matter

• Feature vectors for a data set:
  x1: [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
  x2: [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
  x3: [0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
  x4: [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
  x5: [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
  ...
  xn: [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1]

Page 13:

A Couple of Dimensionality Reduction Techniques

• Missing Values Ratio
  – Columns that have too many zeros are unlikely to carry much useful information: remove columns where the # of zeros > threshold

• High Correlation Filter
  – Columns that have very similar trends tend to carry similar information. So we calculate the correlation between values of given columns and remove columns where the correlation coefficient > threshold
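
A sketch of both filters using NumPy; the threshold values are illustrative placeholders, not values from the lecture:

```python
import numpy as np

def missing_values_filter(X, max_zero_ratio=0.9):
    """Drop columns whose fraction of zeros exceeds the threshold."""
    zero_ratio = (X == 0).mean(axis=0)
    return X[:, zero_ratio <= max_zero_ratio]

def high_correlation_filter(X, max_corr=0.95):
    """Drop one column of every pair whose correlation exceeds the threshold."""
    corr = np.corrcoef(X, rowvar=False)  # feature-by-feature correlation matrix
    keep = np.ones(X.shape[1], dtype=bool)
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            if keep[i] and keep[j] and abs(corr[i, j]) > max_corr:
                keep[j] = False
    return X[:, keep]

X = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 1]])
print(missing_values_filter(X, max_zero_ratio=0.5))  # drops the all-zero column
```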

Page 14:

How do we know ϕ is good?

• Score: w · ϕ(x), the weighted sum of the extracted features

• Learning chooses the optimal w

• How does feature extraction affect the quality of prediction?
  – Assume that w is optimally selected by an oracle

Page 15:

Feature Selection affects what we can learn!

Page 16:

Feature Selection issues

• Hypothesis class: the set of possible predictors

• Feature extraction defines a hypothesis class, a subset of the set of all possible predictors

• If feature extraction defines a feature set/hypothesis class which does not contain any good predictors, then no amount of learning will help!

• We need to select features expressive enough to represent good predictors!

• But it is also necessary to keep the hypothesis class small, because of the computational cost and because we usually do not obtain the optimal w

Page 17:

Linear Predictors

Page 18:

Neural Networks


A sigmoid function converts the activation input into an output signal, like a neuron. More recently, the simpler ReLU function has been used instead (in line with more recent biology)
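
The two activations, sketched in Python:

```python
import math

def sigmoid(z):
    """Smoothly squash the activation input into (0, 1), neuron-like."""
    return 1 / (1 + math.exp(-z))

def relu(z):
    """Rectified linear unit: the simpler alternative mentioned above."""
    return max(0.0, z)

print(sigmoid(0.0), relu(-3.0), relu(2.5))  # 0.5 0.0 2.5
```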

Page 19:

Neural Networks

• Think of the intermediate hidden units as learned features of a linear predictor

• The key idea is feature learning
  – BEFORE: Manually specify features
  – AFTER: Automatically learn them from data

• Deep learning: pack a sparse, high-dimensional representation into a dense, low-dimensional space
  – Allows generalizations to be learned!

• Objective function: Minimize the loss!
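
A minimal sketch of this idea: one hidden layer whose activations act as learned features for a linear predictor. All weights below are placeholders that training would normally set:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward(x, V, w):
    """Hidden units h = sigmoid(V x) are the learned features;
    the output is a linear predictor w . h over them."""
    h = [sigmoid(sum(v * xj for v, xj in zip(row, x))) for row in V]
    return sum(wi * hi for wi, hi in zip(w, h))

# Placeholder weights (in practice, chosen by minimizing the loss):
V = [[0.5, -1.0],   # weights into hidden unit 1
     [1.5, 0.2]]    # weights into hidden unit 2
w = [2.0, -1.0]     # output weights over the learned features
print(forward([1.0, 0.0], V, w))  # ~0.43
```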

Page 20:

Deep neural networks

Page 21:

Deep learning Neural Networks

Page 22:

Deep Neural Networks

Page 23:

Supervised vs. Unsupervised

• Labeled training data is expensive!
  – For some tasks, it may not even be available

• Unsupervised learning overcomes the need for labeled training data

• Key idea: the data itself contains lots of rich latent structure, and unsupervised learning discovers this structure automatically

Page 24:

Types of unsupervised learning

Page 25:

Clustering

• Input: a training set of points
  Dtrain = {x1, x2, ..., xn}

• Output: an assignment of each point to a cluster
  [z1, z2, ..., zn] where zi ∈ {1, ..., K}

• We want similar points in the same cluster and dissimilar points in different clusters

Page 26:

K-means

Page 27:

K-means example

• Input: Dtrain = {0, 2, 10, 12}
• Output: K = 2 clusters with centroids µ1, µ2
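
Working this example by hand: assigning {0, 2} to one cluster and {10, 12} to the other gives centroids µ1 = (0 + 2)/2 = 1 and µ2 = (10 + 12)/2 = 11, with total squared loss (0 - 1)² + (2 - 1)² + (10 - 11)² + (12 - 11)² = 4, the best possible K = 2 clustering of this data.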

Page 28:

K-means

• But we know neither the centroids nor the assignments!

• Break the hard problem down into two easy problems and alternate between them: with the centroids fixed, assign each point to its nearest centroid (Step 1); with the assignments fixed, recompute each centroid as the mean of its points (Step 2)

Page 29:

K-means algorithm (Step 1)

Page 30:

K-means algorithm (Step 2)

Page 31:

K-means algorithm
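
The algorithm itself appeared as a figure; here is a minimal sketch of the two alternating steps in Python, using 1-D points as in the example above:

```python
import random

def kmeans(points, k, iters=100):
    """Alternate: (1) fix centroids, assign each point to the nearest one;
    (2) fix assignments, recompute each centroid as its cluster's mean."""
    centroids = random.sample(points, k)  # random initialization
    for _ in range(iters):
        # Step 1: best assignment given the current centroids
        assign = [min(range(k), key=lambda j: (x - centroids[j]) ** 2)
                  for x in points]
        # Step 2: best centroids given the current assignment
        for j in range(k):
            members = [x for x, z in zip(points, assign) if z == j]
            if members:
                centroids[j] = sum(members) / len(members)
    loss = sum((x - centroids[z]) ** 2 for x, z in zip(points, assign))
    return centroids, assign, loss

print(kmeans([0, 2, 10, 12], k=2))  # converges to centroids 1 and 11, loss 4
```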

Page 32:

K-means example

Page 33:

Local minima

• K-means is guaranteed to converge to a local minimum, but it is not guaranteed to find the global minimum

• Solution: run K-means multiple times with different random initializations, then choose the solution with the lowest loss (a sketch follows below)
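
A sketch of that fix, reusing the kmeans function sketched on the algorithm slide above (so it inherits the same assumptions):

```python
def kmeans_restarts(points, k, restarts=10):
    """Run K-means several times and keep the lowest-loss solution."""
    return min((kmeans(points, k) for _ in range(restarts)),
               key=lambda result: result[2])  # result = (centroids, assign, loss)
```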

Page 34:

What will we cover today?

• Feature Selection
• Neural Networks
• Unsupervised Learning
  – K-means

Page 35:

Something to think about

• What might be good features for predicting homepages?

• What might be good features for spam recognition?

Page 36:

Assigned Reading

• No paper assigned today

Page 37:

Assigned Reading

• Antisocial Behavior in Online Discussion Communities
  – Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jure Leskovec
  – Proceedings of ICWSM 2015

• Responses due on Feb 16th, 2017, 11:59 pm
