
[Divider figure: scatter plot of Eruption Time vs. Waiting Time]

Lecture 15 – Part 01: Supervised and Unsupervised Learning

Supervised Learning

▶ We tell the machine the “right answer”.

▶ There is a ground truth.

▶ Data set: $\{(\vec{x}^{(i)}, y_i)\}$.

▶ Goal: learn the relationship between features $\vec{x}^{(i)}$ and labels $y_i$.

▶ Examples: classification, regression.

Unsupervised Learning

▶ We don’t tell the machine the “right answer”.

▶ In fact, there might not be one!

▶ Data set: $\{\vec{x}^{(i)}\}$ (usually no test set).

▶ Goal: learn the structure of the data itself.

▶ To discover something, for compression, or to use as a feature later.

▶ Example: clustering

Example

▶ We gather measurements $\vec{x}^{(i)}$ of a bunch of flowers.

▶ Question: how many species are there?

▶ Goal: cluster the similar flowers into groups.

Example

[Figure: scatter plot of PetalLengthCm vs. PetalWidthCm]

Example

[Figure: scatter plot of Height vs. Weight]

Clustering and Dimensionality

▶ Groups emerge with more features.

▶ But too many features, and the groups disappear.

▶ Curse of dimensionality.

▶ Also: We can’t see in 𝑑 > 3.

Ground Truth

▶ If we don’t have labels, we can’t measure accuracy.

▶ Sometimes, labels don’t exist.

▶ Example: cluster customers into types by previous purchases.

[Divider figure: scatter plot of Eruption Time vs. Waiting Time]

Lecture 15 – Part 02: K-Means Clustering

Learning

▶ Goal: turn clustering into an optimization problem.

▶ Idea: clustering is like compression

[Figure: scatter plot of PetalLengthCm vs. PetalWidthCm]

K-Means Objective

▶ Given: data $\{\vec{x}^{(i)}\} \subset \mathbb{R}^d$ and a parameter $k$.

▶ Find: $k$ cluster centers $\vec{\mu}^{(1)}, \ldots, \vec{\mu}^{(k)}$ so that the average squared distance from a data point to the nearest cluster center is small.

▶ The k-means objective function:

$$\text{Cost}(\vec{\mu}^{(1)}, \ldots, \vec{\mu}^{(k)}) = \frac{1}{n} \sum_{i=1}^{n} \min_{j \in \{1, \ldots, k\}} \left\| \vec{x}^{(i)} - \vec{\mu}^{(j)} \right\|^2$$
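The objective is straightforward to compute directly. Below is a minimal numpy sketch; the names `kmeans_cost`, `X`, and `centers` are my own, not from the slides.

```python
import numpy as np

def kmeans_cost(X, centers):
    """Average squared distance from each point to its nearest center.

    X: (n, d) array of data points; centers: (k, d) array of cluster centers.
    """
    # Pairwise squared distances, shape (n, k): entry [i, j] is ||x^(i) - mu^(j)||^2.
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # For each point, keep only the distance to its nearest center, then average.
    return sq_dists.min(axis=1).mean()
```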

Optimization

▶ Goal: find $\vec{\mu}^{(1)}, \ldots, \vec{\mu}^{(k)}$ minimizing the $k$-means objective function.

▶ Problem: this is NP-Hard.

▶ We use a heuristic instead of solving exactly.

Lloyd’s Algorithm for K-Means

▶ Initialize the centers $\vec{\mu}^{(1)}, \ldots, \vec{\mu}^{(k)}$ somehow.

▶ Repeat until convergence (a sketch in code follows):

  ▶ Assign each point $\vec{x}^{(i)}$ to its closest center.

  ▶ Update each $\vec{\mu}^{(j)}$ to be the mean of the points assigned to it.
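A minimal numpy sketch of Lloyd's algorithm, assuming random data points as the initial centers; the names `lloyds`, `X`, `k`, and `n_iters` are my own.

```python
import numpy as np

def lloyds(X, k, n_iters=100, rng=None):
    """Lloyd's algorithm: alternate between assigning points and updating centers.

    X: (n, d) data array. Returns a (k, d) array of centers and (n,) assignments.
    """
    rng = np.random.default_rng(rng)
    # "Somehow" initialize: here, pick k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its closest center.
        sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        # (clusters that lost all their points keep their old center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels
```

In practice the random initialization above is often replaced by k-means++, described under Initialization Strategies below.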

Example

Theory

▶ Each iteration reduces cost.

▶ This guarantees convergence to a local min.

▶ Initialization is very important.

Example

Initialization Strategies

▶ Basic Approach: Pick 𝑘 data points at random.

▶ Better Approach: k-means++ (a sketch follows this list):

  ▶ Pick the first center at random from the data.

  ▶ Let $C = \{\vec{\mu}^{(1)}\}$ (the centers chosen so far).

  ▶ Repeat $k - 1$ more times:

    ▶ Pick a random data point $\vec{x}$ according to the distribution

      $$\mathbb{P}(\vec{x}) \propto \min_{\vec{\mu} \in C} \|\vec{x} - \vec{\mu}\|^2$$

    ▶ Add $\vec{x}$ to $C$.
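A minimal numpy sketch of this initialization, assuming the data is an (n, d) array; the name `kmeans_pp_init` is my own.

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """k-means++ initialization: later centers are sampled with probability
    proportional to squared distance from the centers chosen so far."""
    rng = np.random.default_rng(rng)
    # First center: uniformly at random from the data.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        sq_dists = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Sample the next center with probability proportional to that distance.
        probs = sq_dists / sq_dists.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```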

Picking k

▶ How do we know how many clusters the data contains?

Plot of K-Means Objective
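Presumably the plot shows the objective for a range of values of $k$; a common heuristic (not stated explicitly on the slide) is to pick $k$ at the "elbow", where the cost stops dropping quickly. A sketch of such a plot using scikit-learn's `KMeans`; the function name `elbow_plot` and parameter `max_k` are my own.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_plot(X, max_k=10):
    """Plot the k-means objective as a function of k and look for an 'elbow'."""
    costs = []
    for k in range(1, max_k + 1):
        km = KMeans(n_clusters=k, n_init=10).fit(X)
        # inertia_ is the total squared distance to the nearest center;
        # divide by len(X) to match the averaged objective above.
        costs.append(km.inertia_ / len(X))
    plt.plot(range(1, max_k + 1), costs, marker="o")
    plt.xlabel("k")
    plt.ylabel("k-means cost")
    plt.show()
```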

Applications of K-Means

▶ Discovery

▶ Vector Quantization

  ▶ Find a finite set of representatives of a large (possibly infinite) set.

Example #1

▶ Cluster animal descriptions.

▶ 50 animals: grizzly bear, dalmatian, rabbit, pig, …

▶ 85 attributes: long neck, tail, walks, swims, …

▶ 50 data points in $\mathbb{R}^{85}$. Run $k$-means with $k = 10$.

Results

Example #2

▶ How do we represent images of different sizes as fixed-length feature vectors for use in classification tasks?

Visual Bags-of-Words

▶ Idea: build a “dictionary” of image patches.

▶ Extract all ℓ × ℓ image patches from all training images.

▶ Cluster them with $k$-means.

  ▶ Each cluster center is now a dictionary “word”.

▶ Represent an image as a histogram over $\{1, 2, \ldots, k\}$ by associating each patch with its nearest center (see the sketch below).
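A minimal sketch of this final step, assuming the patches from one image have already been extracted and flattened into rows; `patch_histogram`, `patches`, and `centers` are my own names.

```python
import numpy as np

def patch_histogram(patches, centers):
    """Represent one image as a histogram over the k dictionary 'words'.

    patches: (m, l*l) array of flattened patches from this image.
    centers: (k, l*l) array of cluster centers learned by k-means.
    """
    # Assign each patch to its nearest dictionary word.
    sq_dists = ((patches[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = sq_dists.argmin(axis=1)
    # Count how often each word occurs; normalize so that images of
    # different sizes (different numbers of patches) are comparable.
    counts = np.bincount(words, minlength=len(centers))
    return counts / counts.sum()
```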

Online Learning

▶ What if the dataset is huge?

  ▶ It doesn’t even fit in memory.

▶ What if we’re continuously getting new data?

  ▶ We don’t want to retrain with every new point.

▶ We can update the model online.

Sequential k-Means

▶ Set the centers $\vec{\mu}^{(1)}, \ldots, \vec{\mu}^{(k)}$ to be the first $k$ points.

▶ Set the counts to be $n_1 = n_2 = \ldots = n_k = 1$.

▶ Repeat (a sketch in code follows):

  ▶ Get the next data point, $\vec{x}$.

  ▶ Let $\vec{\mu}^{(j)}$ be the closest center.

  ▶ Update $\vec{\mu}^{(j)}$ and $n_j$:

$$\vec{\mu}^{(j)} = \frac{n_j \vec{\mu}^{(j)} + \vec{x}}{n_j + 1}, \qquad n_j = n_j + 1$$
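A minimal numpy sketch of these updates, assuming the data arrives as an iterable of vectors; `sequential_kmeans` and `stream` are my own names.

```python
import numpy as np

def sequential_kmeans(stream, k):
    """Online (sequential) k-means: update centers one data point at a time."""
    stream = iter(stream)
    # Centers start as the first k points; each has count 1.
    centers = np.array([next(stream) for _ in range(k)], dtype=float)
    counts = np.ones(k)
    for x in stream:
        x = np.asarray(x, dtype=float)
        # Find the closest center...
        j = ((centers - x) ** 2).sum(axis=1).argmin()
        # ...and move it toward x with a running-mean update.
        centers[j] = (counts[j] * centers[j] + x) / (counts[j] + 1)
        counts[j] += 1
    return centers
```

The update is just a running mean: at any time, each center equals the average of the points that have been assigned to it so far.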

K-Means

▶ Perhaps the most popular clustering algorithm.

▶ Fast, easy to understand.

▶ Assumes spherical clusters.

Example