
Lecture 1
Introduction to Machine Learning & Modern Applications

Pavel Laskov¹, Blaine Nelson¹

¹Cognitive Systems Group, Wilhelm Schickard Institute for Computer Science, Universität Tübingen, Germany

Advanced Topics in Machine Learning, 2012

April 17, 2012


Part I

Course Outline


Course Overview

Course Information:

Course Time: Tues, 14:00 c.t. – 16:00 (except May 1 & May 29)
Course Location: Sand F122
Office Hours: By appointment
Website: http://www.ra.cs.uni-tuebingen.de/lehre/ss12/advanced_ml.html

Course Material:

Textbook (1/2 of course): John Shawe-Taylor and Nello Cristianini: Kernel Methods for Pattern Analysis. Cambridge University Press, 2004 [10].
Supplementary material to be supplied.

Final Exam: July 31


Instructors

Dr. Pavel Laskov
Office: Sand A304
Email: pavel DOT laskov AT uni-tuebingen DOT de
Web: www-rsec.cs.uni-tuebingen.de/laskov

Dr. Blaine Nelson
Office: Sand A316
Email: blaine DOT nelson AT wsii DOT uni-tuebingen DOT de
Web: www.ra.cs.uni-tuebingen.de/mitarb/nelson/


Übungen and Grading

In the exercise meetings, some solution techniques will be presented & model solutions will be discussed.

Time: Wed, 14:00 c.t. - 16:00, on selected dates, TBA

Location: Sand F122

Homework: 4-5 graded written assignments

Grades: Homework will comprise 30% of the grade. The remaining 70% will be from the final exam.


Part II

Applications of Machine Learning


Why is Machine Learning Relevant?

[Figure: word cloud of ML application areas: Machine Perception, Computer Vision, Natural Language Processing, Search Engines, Medical Diagnosis, Bioinformatics, Cheminformatics, Fraud Detection, Stock Market Analysis, Speech/Handwriting Recognition, Object Recognition, Robot Locomotion, Google Translator, Recommender Systems (Amazon, Facebook, LastFM, LinkedIn), Paper Assignment System (NIPS)]


Why is Machine Learning Relevant?
Google Translator

[Figure: statistical MT pipeline. Statistical analysis of French/English bilingual text yields a translation model; statistical analysis of English text alone yields a language model; a decoding algorithm combines the two as $\hat{e} = \arg\max_e P(e) \cdot P(s \mid e)$, turning French input into (poor) English and then fluent English. Example alignments: la maison ↔ the house, la maison bleue ↔ the blue house, la fleur ↔ the flower]

Statistical translators are composed of 2 elements:
1. Translation model: learns correspondences between words
2. Language model: learns word order for a proper sentence

Google trains their translation model by learning correspondences found in bilingual text

Figures used were reproduced from the talk What's New in Statistical Machine Translation by Knight & Koehn [3]
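To make the decoding rule concrete, here is a minimal, hypothetical Python sketch: the probability tables are invented stand-ins for models that would actually be learned from bilingual and monolingual text, and real decoders search far larger candidate spaces.

```python
import math

# Toy translation model P(f|e): invented word-correspondence probabilities.
translation_model = {("la", "the"): 0.9, ("maison", "house"): 0.8,
                     ("bleue", "blue"): 0.7, ("fleur", "flower"): 0.8}

# Toy language model P(e): invented bigram probabilities for English word order.
language_model = {("the", "house"): 0.5, ("the", "blue"): 0.2,
                  ("blue", "house"): 0.6, ("the", "flower"): 0.4}

def log_score(french, english):
    """log P(e) + log P(f|e) under the toy models (unseen pairs get a tiny floor)."""
    lm = sum(math.log(language_model.get(b, 1e-6))
             for b in zip(english, english[1:]))
    tm = sum(math.log(max(translation_model.get((f, e), 1e-6) for e in english))
             for f in french)
    return lm + tm

def decode(french, candidates):
    """The decoding rule: argmax over e of P(e) * P(f|e)."""
    return max(candidates, key=lambda e: log_score(french, e))

print(decode(("la", "maison", "bleue"),
             [("the", "blue", "house"),     # good words, good order
              ("the", "house", "blue"),     # good words, bad order
              ("the", "flower", "house")]))  # wrong words
# -> ('the', 'blue', 'house'): the language model penalizes bad word order,
#    the translation model penalizes wrong word correspondences.
```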


Why is Machine Learning Relevant?
Recommender System

Recommender System: given users' past ratings of items (books, movies, etc.), recommend new items to them [8]

Collaborative Filtering (User-based):
1. Find users with ratings similar to the current user's
2. Use the ratings of these like-minded users to make predictions for the current user (a minimal sketch follows below)

Collaborative Filtering (Item-based) [4]:
1. Determine item-item relationships from user data
2. Use this matrix and the user's preferences to predict new items for the user
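As a concrete illustration, here is a minimal sketch of the user-based variant with NumPy; the rating matrix is invented, and real systems add rating normalization, neighborhood selection, and sparsity handling.

```python
import numpy as np

# Invented user-item rating matrix; rows = users, columns = items, 0 = unrated.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

def cosine(u, v):
    """Cosine similarity between two users' rating vectors."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def predict(R, user, item):
    """Step 1: find users similar to `user`; step 2: average their ratings
    of `item`, weighted by similarity."""
    others = [u for u in range(len(R)) if u != user and R[u, item] > 0]
    sims = np.array([cosine(R[user], R[u]) for u in others])
    ratings = np.array([R[u, item] for u in others])
    return (sims @ ratings) / (sims.sum() + 1e-12)

# User 0 has not rated item 2; the most similar user (user 1) rated it low,
# so the prediction comes out low (about 2.1).
print(predict(R, user=0, item=2))
```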


Why is Machine Learning Relevant?
Paper Assignment System

Paper Assignment System (NIPS): given a set of reviewers (R) and a set of papers (P), find a matching between them

Matching Criteria:
1. Each paper p ∈ P must be reviewed by at least 3 reviewers
2. Each reviewer r ∈ R should be assigned papers related to his/her research
3. Each reviewer r ∈ R should not be assigned too many papers

Approach of Charlin, Zemel, & Boutilier [1]:
1. Construct a language model based on observed words in papers
2. Construct a suitability score for each reviewer-paper pair using linear regression based on (1) parameters of the language model and (2) reviewer preferences
3. Use collaborative filtering to find a reviewer-paper assignment
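The sketch below only illustrates the constraint structure of the problem: it replaces the learned suitability scores of [1] with random stand-ins and uses a naive greedy assignment rather than the paper's optimization, so read it as a toy, not their method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_reviewers, n_papers = 6, 4

# Stand-in suitability scores; in [1] these come from a language model over
# the papers' words plus reviewer preferences, via regression.
S = rng.random((n_reviewers, n_papers))

REVIEWS_PER_PAPER = 3   # criterion 1: at least 3 reviews per paper
MAX_LOAD = 3            # criterion 3: cap each reviewer's load

assignment = {p: [] for p in range(n_papers)}
load = np.zeros(n_reviewers, dtype=int)

for p in range(n_papers):
    for r in np.argsort(-S[:, p]):            # reviewers, best-suited first
        if load[r] < MAX_LOAD:                 # respect the load cap
            assignment[p].append(int(r))
            load[r] += 1
        if len(assignment[p]) == REVIEWS_PER_PAPER:
            break

print(assignment)   # each paper gets 3 distinct reviewers, loads stay <= 3
```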


Topics to be Covered

1. Kernel Methods
   Kernel functions provide data abstraction
   Kernel methods provide common algorithms for any kernel

2. Large-scale/Online Learning
   Learning may need to work with vast amounts of data
   Learning can be incrementally updated for new data

3. Learning for Structured Data
   Most real data is not numeric & converting to a numeric representation may lose important structural elements
   We will discuss methods that allow for structured data

4. Learning in Adversarial Environments
   Not all data comes from a static source; in fact, it may change adversarially
   Learning methods need to be robust against such data


Part III

Scope of Machine Learning


What is Machine Learning?
The Chinese Room Problem (see Chapter 26 of [9])

Suppose you are placed in a room with a book of symbols/instructions

When a symbol comes, the book tells you what symbols to produce

To any outside observer, the room is able to perfectly answer questions in Chinese, but…

Does the room know Chinese?
Do you know Chinese?
Does the book know Chinese?

The same dilemma occurs when we talk about machine learning… What does it mean for a machine to learn?


What is Machine Learning?
Machine Learning & Artificial Intelligence

Machine Learning (ML) & Artificial Intelligence (AI) are closely related, but there are several key differences

Artificial Intelligence: the broad study of machines' ability to solve a wide range of human-like tasks; e.g.,

Search
Solving Constraint Satisfaction Problems
Logical Inference
Planning
Computing probabilities for events

Machine Learning: the branch of AI that studies the ability of machines to learn (albeit not necessarily like humans)


What is Machine Learning?
Machine Learning & Artificial Intelligence

[Figure: diagram relating the goal of AI, "classic" AI, and learning. Particular observations are turned into a general representation (e.g., $F_{1,2} \propto \frac{m_1 m_2}{r_{1,2}^2}$) by acquisition (induction); the general representation is applied to particular cases by application (deduction).]

Classic AI addressed deductive reasoning & knowledge representation
Learning is concerned with inductive reasoning (generalizing) to construct hypotheses


What is Machine Learning?
Classic Artificial Intelligence: Search, CSPs, & Games

Classic AI includes many interesting problems (see Chapters 1–6 of [9])

[Figures: Route Planning, Constraint Satisfaction, Game Search]

These AI algorithms solve difficult problems, but do they learn?
They use pre-defined knowledge/rules to solve particular instances of each problem
Their solutions do not summarize any inherent aspects of the problems they solve
These algorithms do not extract information from their input data that can be applied to solve later problems


What is Machine Learning?
Classic Artificial Intelligence: Logical & Probabilistic Inference

Inference algorithms derive consequences from prior knowledge & evidence (see Chapters 7–9 & 13–17 of [9])

[Figures: Logical Inference, Probabilistic Inference]

Do inference algorithms learn?

They derive previously unknown knowledge from evidence
Their rules & structure are given a priori from a knowledge base; all their derivations follow as consequences


Part IV

Pattern Analysis


Machine Learning as Pattern Analysis
Redundancy

Data Redundancy: indicates that there are (simple) patterns in the data that allow missing information to be reconstructed/predicted

Compressibility: redundant data can be compressed (sometimes losslessly)

Example: any given natural-language text (or photograph) can be compressed to a significantly smaller size

Predictability: redundancy allows predictions to be made with only partial information

Example: if we know how long an object has been falling (in a vacuum on Earth), we can predict how far it has fallen: x ∝ t²
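A minimal sketch of the compressibility point, assuming only Python's standard zlib and os modules: highly redundant text shrinks dramatically, while incompressible random bytes barely shrink at all.

```python
import os
import zlib

# Redundant "natural language": the same sentence repeated many times.
text = ("the house is blue and the flower is red. " * 50).encode()
# Random bytes of the same length: essentially no redundancy to exploit.
noise = os.urandom(len(text))

print(len(text), "->", len(zlib.compress(text)))    # shrinks to a tiny fraction
print(len(noise), "->", len(zlib.compress(noise)))  # stays roughly the same size
```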


Machine Learning as Pattern Analysis
Patterns

Pattern: any relation present in data; i.e., a function f : X → Y

Exact Pattern: non-trivial pattern such that f(x) = 0 for all foreseeable x
Approximate Pattern: non-trivial pattern such that f(x) ≈ 0 for all foreseeable x
Statistical Pattern: non-trivial pattern such that $E_{x \sim P}[f(x)] \approx 0$ for some distribution P on X

The veracity of a pattern is assessed by comparing the pattern's prediction f(x) to the true value y; this is accomplished via a loss function, e.g., the squared loss $L(f(x), y) = (f(x) - y)^2$


What is Machine Learning?
Machine Learning as Pattern Analysis

Pattern Analysis: discovery of underlying relations, regularities or structures that are inherent to a set of data

Detecting an inherent pattern allows predictions to be made about future data from the same source
Example - Kepler's Law: from observation, Kepler found that the periodicity of a planet (P) & its distance (D) are related as P² ≈ D³


Planet     Periodicity P (years)   Distance D (AU)   P²        D³
Mercury    0.24                    0.39              0.058     0.059
Venus      0.62                    0.72              0.38      0.39
Earth      1.00                    1.00              1.00      1.00
Mars       1.88                    1.52              3.53      3.51
Jupiter    11.86                   5.20              140.66    140.61
Saturn     29.46                   9.58              867.89    879.22
Uranus     84.32                   19.23             7109.86   7111.11

By finding patterns, the system is able to generalize & make predictions; thus, this is a form of learning
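A quick sketch checking the pattern on the table above and then using it predictively, as one could for Uranus (values as in the table; P in Earth years, D in AU):

```python
# Verify the pattern P^2 ≈ D^3 on the observed planets.
planets = {"Mercury": (0.24, 0.39), "Venus": (0.62, 0.72), "Earth": (1.00, 1.00),
           "Mars": (1.88, 1.52), "Jupiter": (11.86, 5.20), "Saturn": (29.46, 9.58)}

for name, (P, D) in planets.items():
    print(f"{name:8s} P^2/D^3 = {P**2 / D**3:.3f}")   # all close to 1

# Use the pattern to predict: from Uranus's observed period alone,
# infer its distance as D = P^(2/3).
P_uranus = 84.32
print("Uranus: D ≈", round(P_uranus ** (2 / 3), 2), "AU")   # ≈ 19.2
```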


Machine Learning
A General Description

Machine Learning Definition (paraphrased from Tom Mitchell [6])

A computer algorithm A is said to learn from data/experience D with respect to some class of tasks T & performance measure L, if its performance at tasks in T, as measured by L, improves with experience D.

The algorithm A is a learning algorithm

The data/experience D will generally be a dataset

The performance function L will generally be a statistical loss function

Throughout this course, we will consider a number of different learning tasks; among them are classification, regression, subspace estimation, outlier detection & clustering.


Part V

Learning Framework & Tasks


Machine Learning
A Mathematical Framework & Terminology

Input Space (X): space used to describe individual data items; e.g., the D-dimensional Euclidean space $\Re^D$

Output Space (Y): space of possible predictions

Dataset (D): indexable collection of data; i.e., the data consists of N items from X; each instance is a data point $x_i$ (with its output $y_i$)

Hypothesis/Estimator (f): object or function that represents the learned entity; f : X → Y

Hypothesis Space (F): set of all learnable hypotheses

Learning Algorithm (A): algorithm that selects a hypothesis f ∈ F based on data D; i.e., $A : X^N \to F$

Loss Function L(·, ·): a non-negative function that measures disagreement between its arguments


Machine Learning
A Simple Example

Consider the task of finding the center of a distribution on the reals, ℜ, given N numbers drawn from the distribution.

The dataset is $D = \{x_i\}_{i=1}^N$ where each data point $x_i \in \Re$.

The hypothesis space is the set of all possible centroids; i.e., F = ℜ

The mean is one possible centroid-estimating algorithm: $A(D) = \frac{1}{N} \sum_{i=1}^N x_i$

The median is a second centroid-estimating algorithm: $A(D) = \mathrm{median}(x_1, \ldots, x_N)$
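Both estimators in a short sketch on synthetic data, using NumPy; the corrupted-point comparison at the end shows one reason to prefer the median:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(loc=3.0, scale=1.0, size=100)   # N = 100 draws, true center 3.0

mean_hat = D.sum() / len(D)    # A(D) = (1/N) sum_i x_i
median_hat = np.median(D)      # A(D) = median(x_1, ..., x_N)
print(mean_hat, median_hat)    # two hypotheses from F = R, both near 3.0

# The two algorithms differ in robustness: one corrupted point drags the
# mean far off but leaves the median essentially unchanged.
D[0] = 1000.0
print(D.sum() / len(D), np.median(D))
```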


Common Machine Learning Tasks
Types of Learning

Supervised Learning: pattern analysis in which training data contains paired examples of inputs $x_i$ & their corresponding outputs $y_i$

Examples: Regression, Binary/Multiclass Classification

Semi-Supervised Learning: pattern analysis in which training data contains both paired examples $(x_i, y_i)$ & unpaired examples $x_j$

Examples: Transduction, Ranking

Unsupervised Learning: pattern analysis in which training data contains only unpaired examples of inputs $x_i$

Examples: Anomaly Detection, Subspace Detection, Clustering


Common Machine Learning Tasks
Regression

Objective: find the relationship between (correlated) input variables x & output variables y

The dataset is $D = \{(x_i, y_i)\}_{i=1}^N$ where each data point is a pair of input variables $x_i \in \Re^D$ & the corresponding output $y_i$

The hypothesis space is the set of all functions from the input space X to the output space Y; i.e., F = {f | f : X → Y}

This hypothesis space is generally too large (to be discussed)

A common restriction is to consider just the set of linear mappings from X to Y, parametrized by w and b as $f(x) = w^\top x - b$ (a minimal sketch follows below)
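A minimal least-squares sketch of this restricted hypothesis space, using NumPy on synthetic data; the parametrization $f(x) = w^\top x - b$ follows the slide, and appending a constant −1 column folds b into a single linear solve:

```python
import numpy as np

rng = np.random.default_rng(2)
N, Ddim = 200, 3
X = rng.normal(size=(N, Ddim))                      # inputs x_i in R^D
w_true, b_true = np.array([1.5, -2.0, 0.5]), 0.7
y = X @ w_true - b_true + 0.1 * rng.normal(size=N)  # noisy y = w^T x - b

# Append a constant -1 column so f(x) = w^T x - b becomes one linear solve.
Xa = np.hstack([X, -np.ones((N, 1))])
params, *_ = np.linalg.lstsq(Xa, y, rcond=None)     # least-squares fit
w_hat, b_hat = params[:-1], params[-1]
print(w_hat, b_hat)   # recovers approximately w_true and b_true
```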


Common Machine Learning Tasks
Classification

Objective: find a separation between input variables $x_i$ based on the class $y_i$ of observed instances

The dataset is $D = \{(x_i, y_i)\}_{i=1}^N$ where each data point is a pair of input variables $x_i \in \Re^D$ & the corresponding output $y_i \in \{1, \ldots, K\}$

The hypothesis space is the set of all functions from the input space X to the label set; i.e., $F = \{f \mid f : X \to \{1, \ldots, K\}\}$

The case of binary classification (labels −1 & 1) can be addressed with regression, i.e., as the sign of a real-valued function (see the sketch below)
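A sketch of that reduction: fit a real-valued linear function to the ±1 labels by least squares, then classify by its sign. The data is synthetic and well separated; this is just the reduction the slide mentions, not a recommendation for how to train classifiers in general.

```python
import numpy as np

rng = np.random.default_rng(3)
# Two Gaussian classes in R^2 with labels -1 and +1.
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

# Regress the labels on the inputs (constant column for the offset) ...
Xa = np.hstack([X, np.ones((100, 1))])
params, *_ = np.linalg.lstsq(Xa, y, rcond=None)

# ... and use the sign of the real-valued output as the class prediction.
pred = np.sign(Xa @ params)
print("training accuracy:", (pred == y).mean())   # near 1.0 on this easy data
```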


Common Machine Learning Tasks
Subspace Estimation

Objective: find a projection $P_X$ onto a subspace which "captures" the data; i.e., $P_X(x_i)$ has a small residual $\|P_X(x_i) - x_i\|$

The dataset is a set of points in X: $D = \{x_i\}_{i=1}^N$

The hypothesis space is the set of all subspace projections; i.e., $F = \{P_X \mid \forall x \in X,\; P_X(x) = P_X(P_X(x))\}$

When X is a Euclidean space, the subspace (and its projection) can be parametrized by a set of k ≤ D orthonormal basis vectors
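A sketch of the Euclidean case, assuming NumPy: the top k right singular vectors of the centered data supply the k ≤ D orthonormal basis vectors (this is PCA), and the resulting projection is idempotent as required.

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic data lying near a 1-dimensional subspace of R^3.
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))

k = 1                                   # dimension of the fitted subspace
c = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - c, full_matrices=False)
B = Vt[:k]                              # k orthonormal basis vectors (rows)

def project(x):
    """P_X: project onto the fitted subspace; satisfies P_X(P_X(x)) = P_X(x)."""
    return c + (x - c) @ B.T @ B

residuals = np.linalg.norm(project(X) - X, axis=1)
print(residuals.mean())                 # small: the subspace captures the data
print(np.allclose(project(project(X)), project(X)))   # idempotence -> True
```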


Common Machine Learning Tasks
Clustering

Objective: find the underlying clusters (K) within the dataset; i.e., there is a latent label y that predicts structure

The dataset is a set of points in X: $D = \{x_i\}_{i=1}^N$

The hypothesis space is the set of all functions from the input space X to the cluster labels; i.e., $F = \{f \mid f : X \to \{1, \ldots, K\}\}$

The number of clusters (K) is often preselected
Assumptions are often made about the shape of the clusters
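A minimal k-means sketch (Lloyd's algorithm) in NumPy that illustrates both caveats: K is fixed in advance, and the synthetic clusters are spherical, matching the method's implicit shape assumption. Empty clusters simply keep their old center.

```python
import numpy as np

rng = np.random.default_rng(5)
# Three spherical clusters: the shape assumption k-means implicitly makes.
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in true_centers])

K = 3                                           # number of clusters, preselected
centers = X[rng.choice(len(X), size=K, replace=False)]
for _ in range(20):                             # Lloyd's iterations
    # Assign each point the latent label of its nearest center ...
    labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(axis=2), axis=1)
    # ... then move each center to the mean of its assigned points.
    centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])

print(np.round(centers, 1))                     # close to the three true centers
```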


Part VI

General Challenges for Learning


Challenges for Machine Learning
Inductive Bias (see also [2, 7])

Inductive learning algorithms require an inductive bias

Without a bias, the number of possible hypotheses is too large and untenable

For a finite space X, the number of possible binary hypotheses is $2^{|X|}$

Suppose you maintain the set of all hypotheses consistent with your observations… For any new unseen instance, there will always be an equal number of hypotheses that predict that instance as positive & as negative!

Inductive Bias: "the set of assumptions that the learner uses to predict outputs given inputs that it has not encountered" [7]

Occam's Razor: prefer shorter/simpler hypotheses
Maximum Margin: prefer hypotheses with the largest margin (gap)
Minimum Features: only include significant features


Challenges for Machine Learning
Spurious Patterns

Underfitting: the inability to find significant patterns when overly-restrictive assumptions are made about the data or the hypothesis space is too small

Overfitting: finding spurious patterns when too few assumptions are made about the data or the hypothesis space is too large

[Figures: "codes" found in (left) the Bible & (right) War and Peace (see [5])]


Challenges for Machine Learning
Computational Efficiency

Learning algorithms should be able to (computationally) scale…
1. To large datasets
2. For quick predictions

Training efficiency is generally measured in the dataset size N:
1. Algorithms are efficient if their computational complexity is polynomial in N; i.e., O(N^a) for some fixed a ≥ 0
2. Algorithms are considered large-scale if their computational complexity is linear in N; i.e., O(N)
3. In some applications, it may not even be computationally feasible to look at every data point; these require sublinear or logarithmic complexity

Prediction efficiency is generally measured in the dataset size N and in the number of predictions M

Prediction should be sublinear in N


Summary

1. Machine learning (ML) is a relevant, popular topic with applications spanning many data-driven tasks (e.g., translation & recommender systems)

2. ML spans tasks in inductive reasoning; unlike classic AI, ML infers general patterns from specific samples

3. ML algorithms can be viewed as pattern analyzers; the patterns they find can be used to make predictions

4. Common tasks in ML include regression, classification, subspace discovery, & clustering

5. Learning algorithms face challenges including choosing an inductive bias, under- & over-fitting, & computational efficiency

6. Next Lecture: we will discuss a general approach to learning called kernel methods & show its application to regression


Bibliography I

[1] Laurent Charlin, Richard S. Zemel, and Craig Boutilier. A framework for optimizing paper matching. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI), pages 86–95, 2011.

[2] Diana F. Gordon and Marie desJardins. Evaluation and selection of biases in machine learning. Machine Learning, 20(1-2):5–22, 1995.

[3] Kevin Knight and Philipp Koehn. What's new in statistical machine translation. In HLT-NAACL, 2003. http://people.csail.mit.edu/people/koehn/publications/tutorial2003.pdf.

[4] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.

[5] Brendan McKay, Dror Bar-Natan, Maya Bar-Hillel, and Gil Kalai. Solving the Bible code puzzle. Statistical Science, 14:150–173, 1999.


Bibliography II

[6] Tom Mitchell. Machine Learning. McGraw Hill, 1997.

[7] Tom M. Mitchell. The need for biases in learning generalizations. Technical Report CBM-TR 5-110, Rutgers University, New Brunswick, NJ, 1980.

[8] Francesco Ricci, Lior Rokach, and Bracha Shapira. Introduction to recommender systems handbook. In Recommender Systems Handbook, pages 1–35. 2011.

[9] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson Education, 3rd edition, 2010.

[10] John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
