Lecture 1: Introduction to Machine Learning & Modern Applications
Pavel Laskov¹  Blaine Nelson¹
¹Cognitive Systems Group, Wilhelm Schickard Institute for Computer Science, Universität Tübingen, Germany
Advanced Topics in Machine Learning, 2012
P. Laskov and B. Nelson (Tübingen) Lecture 1: Introduction April 17, 2012
Part I
Course Outline
Course Overview
Course Information:
Course Time: Tues, 14:00 c.t. – 16:00 (except May 1 & May 29)
Course Location: Sand F122
Office Hours: By Appointment
http://www.ra.cs.uni-tuebingen.de/lehre/ss12/advanced_ml.html
Course Material:
Textbook (1/2 of course): John Shawe-Taylor and Nello Cristianini: Kernel Methods for Pattern Analysis. Cambridge University Press, 2004 [10].
Supplementary material to be supplied.
Final Exam: July 31
Instructors
Dr. Pavel Laskov
Office: Sand A304
pavel DOT laskov AT uni-tuebingen DOT de
www-rsec.cs.uni-tuebingen.de/laskov
Dr. Blaine Nelson
Office: Sand A316
blaine DOT nelson AT wsii DOT uni-tuebingen DOT de
www.ra.cs.uni-tuebingen.de/mitarb/nelson/
Übungen and Grading
In the exercise meetings, some solution techniques will be presented & model solutions will be discussed.
Time: Wed, 14:00 c.t. - 16:00, on selected dates, TBA
Location: Sand F122
Homework: 4-5 graded written assignments
Grades: Homework will comprise 30% of the grade. The remaining 70% will be from the final exam.
Part II
Applications of Machine Learning
Why is Machine Learning Relevant?

Machine Perception · Computer Vision · Natural Language Processing · Search Engines · Medical Diagnosis · Bioinformatics · Cheminformatics · Fraud Detection · Stock Market Analysis · Speech/Handwriting Recognition · Object Recognition · Robot Locomotion

Google Translator
Recommender Systems (Amazon, Facebook, LastFM, LinkedIn)
Paper Assignment System (NIPS)
Why is Machine Learning Relevant?
Google Translator

[Figure: French/English bilingual text and English text are each statistically analyzed to produce a translation model and a language model; a decoding algorithm then selects the English sentence e maximizing P(e) ∗ P(S | e). Example alignments: "la maison" ↔ "the house", "la maison blue" ↔ "the blue house", "la fleur" ↔ "the flower"]
Statistical translators are composed of 2 elements:
1. Translation model: learns correspondences between words
2. Language model: learns word order for a proper sentence
Google trains their translation model by learning correspondences found in bilingual text.
Figures used were reproduced from the talk What's New in Statistical Machine Translation by Knight & Koehn [3].
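The decoding step above can be sketched as a toy search over candidate English sentences, scoring each by language-model probability times translation-model probability. All probability tables below are invented illustration values, not real model outputs:

```python
# Toy noisy-channel decoder: pick e maximizing P(e) * P(s | e).
# The probability tables below are made up purely for illustration.

def decode(source, candidates, p_e, p_s_given_e):
    """Return the candidate e with the highest P(e) * P(source | e)."""
    return max(candidates,
               key=lambda e: p_e.get(e, 0.0) * p_s_given_e.get((source, e), 0.0))

# Hypothetical language model P(e): how fluent each English candidate is.
p_e = {"the house": 0.5, "house the": 0.01}
# Hypothetical translation model P(s | e): both orderings cover the words.
p_s_given_e = {("la maison", "the house"): 0.9,
               ("la maison", "house the"): 0.9}

best = decode("la maison", ["the house", "house the"], p_e, p_s_given_e)
print(best)  # the fluent ordering wins: 0.5 * 0.9 > 0.01 * 0.9
```

The language model is what rules out word salad: both candidates explain the French words equally well, so fluency decides.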
Why is Machine Learning Relevant?
Recommender Systems
Recommender System: given users' past ratings of items (books, movies, etc.), recommend new items for them [8]
Collaborative Filtering (User-based):
1. Find users with ratings similar to the current user's
2. Use the ratings of these like-users to make predictions for the current user
Collaborative Filtering (Item-based) [4]:
1. Determine item–item relationships from user data
2. Use this matrix and the user's preferences to predict new items for the user
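A minimal sketch of the user-based variant, assuming cosine similarity between rating vectors and a similarity-weighted average for the prediction. Both are common choices, not necessarily those of [8]:

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors (0 means 'unrated')."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(target, others, item):
    """Predict target's rating of `item` as a similarity-weighted average."""
    num = den = 0.0
    for other in others:
        if other[item] == 0:          # this user has not rated the item
            continue
        s = cosine(target, other)
        num += s * other[item]
        den += abs(s)
    return num / den if den else 0.0

# Rows are users, columns are items; 0 = unrated.
alice = [5, 3, 0]                     # Alice has not rated item 2
others = [[5, 3, 4], [1, 1, 1]]
print(round(predict(alice, others, 2), 2))
```

The first "other" user agrees closely with Alice, so its rating of item 2 dominates the weighted average.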
Why is Machine Learning Relevant?
Paper Assignment System

Paper Assignment System (NIPS): Given a set of reviewers (R) and a set of papers (P), find a matching between them.
Matching Criteria:
1. Each paper p ∈ P must be reviewed by at least 3 reviewers
2. Each reviewer r ∈ R should be assigned to papers related to his/her research
3. Each reviewer r ∈ R should not be assigned too many papers
Approach of Charlin, Zemel, & Boutilier [1]:
1. Construct a language model based on observed words in papers
2. Construct a suitability score for each reviewer–paper pair using linear regression based on (1) parameters of the language model and (2) reviewer preferences
3. Use collaborative filtering to find a reviewer–paper assignment
Topics to be Covered

1. Kernel Methods
   Kernel functions provide data abstraction; kernel methods provide common algorithms for any kernel
2. Large-scale/Online Learning
   Learning may need to work with vast amounts of data; learning can be incrementally updated for new data
3. Learning for Structured Data
   Most real data is not numeric, & converting to a numeric representation may lose important structural elements; we will discuss methods that allow for structured data
4. Learning in Adversarial Environments
   Not all data comes from a static source; in fact, it may change adversarially. Learning methods need to be robust against such data
Part III
Scope of Machine Learning
What is Machine Learning?
The Chinese Room Problem (see Chapter 26 of [9])
Suppose you are placed in a room with a book of symbols/instructions
When a symbol comes, the book tells you what symbols to produce
To any outside observer, the room is able to perfectly answer questions in Chinese, but…
Does the room know Chinese? Do you know Chinese? Does the book know Chinese?
The same dilemma occurs when we talk about machine learning… What does it mean for a machine to learn?
What is Machine Learning?
Machine Learning & Artificial Intelligence
Machine Learning (ML) & Artificial Intelligence (AI) are closely related, but there are several key differences.
Artificial Intelligence: the broad study of machines' ability to solve a wide range of human-like tasks; e.g.,
Search
Solving Constraint Satisfaction Problems
Logical Inference
Planning
Computing probabilities for events
Machine Learning: the branch of AI that studies the ability of machinesto learn (albeit not necessarily like humans)

What is Machine Learning?
Machine Learning & Artificial Intelligence

[Diagram, shown successively for the "Goal of AI", "Classic" AI, and Learning: particular observations are turned into a general representation (e.g., Newton's law of gravitation, F₁,₂ ∝ m₁m₂/r₁,₂²) via acquisition (induction); the general representation is then applied to particular cases via application (deduction)]

Classic AI addressed deductive reasoning & knowledge representation.
Learning is concerned with inductive reasoning (generalizing) to construct hypotheses.
What is Machine Learning?
Classic Artificial Intelligence: Search, CSPs, & Games
Classic AI includes many interesting problems (see Chapters 1–6 of [9])
[Figure panels: Route Planning · Constraint Satisfaction · Game Search]
These AI algorithms solve difficult problems, but do they learn?
They use pre-defined knowledge/rules to solve particular instances of each problem.
Their solutions do not summarize any inherent aspects of the problems they solve.
These algorithms do not extract information from their input data that can be applied to solve later problems.
What is Machine Learning?
Classic Artificial Intelligence: Logical & Probabilistic Inference
Inference algorithms derive consequences from prior knowledge &evidence (see Chapters 7–9 & 13–17 of [9])
[Figure panels: Logical Inference · Probabilistic Inference]
Do inference algorithms learn?
They derive previously unknown knowledge from evidence.
However, their rules & structure are given a priori from a knowledge base; all their derivations follow as consequences.
Part IV
Pattern Analysis
Machine Learning as Pattern Analysis
Redundancy
Data Redundancy: indicates that there are (simple) patterns in thedata that allow missing information to be re-constructed/predicted
Compressibility: Redundant data can be compressed (sometimes losslessly)
Example: Any given natural-language text (or photograph) can be compressed to a significantly smaller size
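As a quick illustration of this redundancy, using Python's standard zlib module: repetitive English-like text compresses far better than random bytes of the same length:

```python
import os
import zlib

# Highly redundant "natural language" sample: repeated English text.
text = b"the quick brown fox jumps over the lazy dog. " * 100
# Incompressible baseline: random bytes of the same length.
noise = os.urandom(len(text))

print(len(text), len(zlib.compress(text)))    # the text shrinks dramatically
print(len(noise), len(zlib.compress(noise)))  # the noise barely shrinks, if at all
```

Real prose is less repetitive than this sample, but English still typically compresses to a fraction of its original size, which is exactly the redundancy that pattern analysis exploits.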
Predictability: Redundancy allows predictions to be made with only partial information
Example: If we know how long an object has been falling (in a vacuum on Earth), we can predict how far it has fallen: x ∝ t²
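For instance, with d = ½gt² (g ≈ 9.81 m/s² near Earth's surface), knowing only the elapsed time lets us fill in the missing distance:

```python
G = 9.81  # gravitational acceleration near Earth's surface, m/s^2

def fall_distance(t):
    """Distance fallen (m) after t seconds in a vacuum: d = 1/2 * g * t^2."""
    return 0.5 * G * t * t

for t in (1.0, 2.0, 3.0):
    print(t, fall_distance(t))  # distance grows with the square of time
```

Doubling the fall time quadruples the distance, which is the quadratic pattern the example describes.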
Machine Learning as Pattern Analysis
Patterns
Pattern: any relation present in data; i.e., a function f : X → Y
Exact Pattern: non-trivial pattern such that f(x) = 0 for all foreseeable x
Approximate Pattern: non-trivial pattern such that f(x) ≈ 0 for all foreseeable x
Statistical Pattern: non-trivial pattern such that E_{x∼P}[f(x)] ≈ 0 for some distribution P on X
The veracity of a pattern is assessed by comparing the pattern's prediction f(x) to the true value y; this is accomplished via a loss function.
What is Machine Learning?
Machine Learning as Pattern Analysis

Pattern Analysis: Discovery of underlying relations, regularities or structures that are inherent to a set of data
Detecting an inherent pattern allows predictions to be made about future data from the same source.
Example - Kepler's Law: From observation, Kepler found that the periodicity of a planet (P) & its distance (D) are related as P² ≈ D³

| Planet  | Periodicity (P) | Distance (D) | P²      | D³      |
|---------|-----------------|--------------|---------|---------|
| Mercury | 0.24            | 0.39         | 0.058   | 0.059   |
| Venus   | 0.62            | 0.72         | 0.38    | 0.39    |
| Earth   | 1.00            | 1.00         | 1.00    | 1.00    |
| Mars    | 1.88            | 1.52         | 3.53    | 3.51    |
| Jupiter | 11.86           | 5.20         | 140.66  | 140.61  |
| Saturn  | 29.46           | 9.58         | 867.89  | 879.22  |
| Uranus  | 84.32           | 19.23        | 7109.86 | 7111.11 |
By finding patterns, the system is able to generalize & make predictions; thus, this is a form of learning.
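The pattern in the table can be checked directly; with the stated units (years and astronomical units), each planet's P²/D³ ratio should come out near 1:

```python
# (P in years, D in AU) pairs taken from the Kepler table above.
planets = {"Mercury": (0.24, 0.39), "Venus": (0.62, 0.72),
           "Earth": (1.00, 1.00), "Mars": (1.88, 1.52),
           "Jupiter": (11.86, 5.20), "Saturn": (29.46, 9.58),
           "Uranus": (84.32, 19.23)}

for name, (p, d) in planets.items():
    # Kepler's third law predicts P^2 ≈ D^3, so this ratio should be ≈ 1.
    print(f"{name}: P^2 / D^3 = {p**2 / d**3:.3f}")
```

Every ratio falls within a few percent of 1, which is the regularity that let Kepler (and lets us) predict the missing Uranus entries.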
Machine Learning
A General Description
Machine Learning Definition (paraphrased from Tom Mitchell [6])
A computer algorithm A is said to learn from data/experience D with respect to some class of tasks T & performance measure L, if its performance at tasks in T, as measured by L, improves with experience D.
The algorithm A is a learning algorithm
The data/experience D will generally be a dataset
The performance function L will generally be a statistical loss function
Throughout this course, we will consider a number of different learningtasks; among them are classification, regression, subspace estimation,outlier detection & clustering.
Part V
Learning Framework & Tasks
Machine Learning
A Mathematical Framework & Terminology
Input Space (X ): space used to describe individual data items; e.g.,the D-dimensional Euclidean space, ℜD
Output Space (Y): space of possible predictions
Dataset (D): indexable collection of data; i.e., the data consists of N items from X; each instance is a data point xi (and output yi)
Hypothesis/Estimator (f ): object or function that represents thelearned entity; f : X → Y
Hypothesis Space (F): set of all learnable hypotheses
Learning Algorithm (A): algorithm that selects hypothesis f ∈ Fbased on data D; i.e., A : XN → F
Loss Function L (·, ·): a non-negative function that measuresdisagreement between its arguments
Machine Learning
A Simple Example
Consider the task of finding the center of a distribution on the reals, ℜ,given N numbers drawn from the distribution.
The dataset is D = {xi}Ni=1 where each data point xi ∈ ℜ.
The hypothesis space is the set of all possible centroids; i.e., F = ℜ
The mean is one possible centroid estimation algorithm:
A(D) = (1/N) Σᵢ₌₁ᴺ xᵢ
The median is a second centroid estimation algorithm:
A(D) = median(x₁, . . . , xN)
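Both estimators are one-liners with Python's standard statistics module; the comparison below also hints at why the choice of algorithm matters, since the median is far less sensitive to outliers:

```python
import statistics

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # one extreme outlier at 100

print(statistics.mean(data))    # 22.0 (dragged far right by the outlier)
print(statistics.median(data))  # 3.0  (unaffected by the outlier)
```

Both are valid learning algorithms A in the framework above: each maps a dataset D to a hypothesis f ∈ F = ℜ.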
Common Machine Learning Tasks
Types of Learning
Supervised Learning: pattern analysis in which training data containspaired examples of inputs xi & their corresponding outputs yi
Examples: Regression, Binary/Multiclass Classification
Semi-Supervised Learning: pattern analysis in which training datacontains both paired examples (xi , yi ) & unpaired examples xj
Examples: Transduction, Ranking
Unsupervised Learning: pattern analysis in which training datacontains only unpaired examples of inputs xi
Examples: Anomaly Detection, Subspace Detection, Clustering
Common Machine Learning Tasks
Regression
Objective: find relationship between (correlated) input variables x &output variables y
The dataset is D = {(xi, yi)}Ni=1 where each data point is a pair of input variables xi ∈ ℜD & the corresponding output yi
The hypothesis space is the set of all functions from the input space X to the output space Y; i.e., F = {f | f : X → Y}
This hypothesis space is generally too large (to be discussed)
A common restriction is to consider just the set of linear mappings from X to Y, parametrized by w and b as f(x) = w⊤x − b
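A linear hypothesis of this form can be fit by ordinary least squares; here is a minimal one-dimensional sketch using the closed-form formulas (one common way to choose w and b, though the lecture has not yet specified a fitting criterion):

```python
def fit_linear(xs, ys):
    """Least-squares fit of f(x) = w*x - b in one dimension."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = w * mx - my          # chosen so that f(mx) = my under f(x) = w*x - b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]    # exactly y = 2x + 1, i.e. w = 2, b = -1
w, b = fit_linear(xs, ys)
print(w, b)                  # 2.0 -1.0
```

On noisy data the fit would only approximate the underlying line, which is where the loss function from the framework enters.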
Common Machine Learning Tasks
Classification
Objective: find separation between input variables xi based on the classyi of observed instances
The dataset is D = {(xi, yi)}Ni=1 where each data point is a pair of input variables xi ∈ ℜD & the corresponding output yi ∈ {1, . . . , K}
The hypothesis space is the set of all functions from the input space X to {1, . . . , K}; i.e., F = {f | f : X → {1, . . . , K}}
Case of binary classification (labels −1 & 1) can be addressed withregression; i.e., as the sign of a real-valued function
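The reduction mentioned above is direct: train any real-valued function and classify by its sign. A hypothetical linear scorer is used here just to show the mechanics:

```python
def sign_classifier(w, b):
    """Turn a real-valued linear score f(x) = w*x - b into a +/-1 label."""
    return lambda x: 1 if w * x - b >= 0 else -1

clf = sign_classifier(w=1.0, b=2.5)            # decision threshold at x = 2.5
print([clf(x) for x in (0.0, 2.0, 3.0, 5.0)])  # [-1, -1, 1, 1]
```

Any regression method producing f can be reused this way, so binary classification inherits the machinery of the previous slide.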
Common Machine Learning Tasks
Subspace Estimation
Objective: find a projection PX onto a subspace which “captures” the data; i.e., PX(xi) has a small residual ‖PX(xi) − xi‖
The dataset is a set of points in X : D = {xi}Ni=1
The hypothesis space is the set of all subspace projections; i.e.,F = {PX | ∀ x ∈ X PX (x) = PX (PX (x)) }
When X is a Euclidean space, the subspace (and its projection) can beparametrized by a set of k ≤ D orthonormal basis vectors
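In the Euclidean case, projecting onto a single unit basis vector u is just P(x) = (u·x)u. A small sketch showing the defining idempotence property P(P(x)) = P(x) and the residual:

```python
import math

def project(u, x):
    """Orthogonal projection of x onto the line spanned by unit vector u."""
    c = sum(a * b for a, b in zip(u, x))   # the inner product u . x
    return [c * a for a in u]

def residual(u, x):
    """Norm of the part of x that the subspace fails to capture."""
    p = project(u, x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, x)))

u = [1.0, 0.0]                    # unit vector along the first axis
x = [3.0, 4.0]
print(project(u, x))              # [3.0, 0.0]
print(project(u, project(u, x)))  # same point: P(P(x)) = P(x)
print(residual(u, x))             # 4.0
```

A k-dimensional subspace works the same way with k orthonormal basis vectors, summing one such term per basis vector.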
Common Machine Learning Tasks
Clustering
Objective: find underlying clusters (K ) within the dataset; i.e., there isa latent label y that predicts structure
The dataset is a set of points in X : D = {xi}Ni=1
The hypothesis space is the set of all functions from the input space Xto the cluster label; i.e., F = {f | f : X → {1, . . . ,K} }
The number of clusters (K) is often preselected
Assumptions are often made about the shape of the clusters
Part VI
General Challenges for Learning
Challenges for Machine Learning
Inductive Bias (see also [2, 7])
Inductive learning algorithms require an inductive bias
Without a bias, the number of possible hypotheses is too large and untenable
For a finite space X, the number of possible binary hypotheses is 2^|X|
Suppose you maintain a set of all hypotheses consistent with your observations… For any new unseen instance, there will always be an equal number of hypotheses that predict that instance as positive & as negative!
Inductive Bias: “the set of assumptions that the learner uses to predictoutputs given inputs that it has not encountered” [7]
Occam’s Razor: prefer shorter/simpler hypotheses
Maximum Margin: prefer hypotheses with a large margin (gap)
Minimum Features: only include significant features
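The 2^|X| count and the equal-split argument above can be checked by brute-force enumeration on a toy input space (an illustrative sketch; the names `X` and `hypotheses` are ours, not from the lecture):

```python
from itertools import product

# A finite input space with 3 instances; a binary hypothesis is any assignment
# of a 0/1 label to each instance, so there are 2^|X| = 8 hypotheses in total.
X = ['a', 'b', 'c']
hypotheses = [dict(zip(X, bits)) for bits in product([0, 1], repeat=len(X))]

# Keep only the hypotheses consistent with one observation: h('a') == 1.
consistent = [h for h in hypotheses if h['a'] == 1]

# On any unseen instance ('b' here), the consistent hypotheses split exactly
# in half -- without an inductive bias, the data alone cannot decide.
votes_for_1 = sum(h['b'] for h in consistent)
print(len(consistent), votes_for_1)  # prints: 4 2
```

Of the 4 hypotheses consistent with the single observation, exactly 2 predict the unseen instance as positive, which is the point of the slide: only a bias (Occam, margin, feature sparsity) breaks the tie.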
Challenges for Machine Learning: Spurious Patterns
Underfitting: the inability to find significant patterns, which occurs when overly restrictive assumptions are made about the data or the hypothesis space is too small.
Overfitting: finding spurious patterns, which occurs when too few assumptions are made about the data or the hypothesis space is too large.
Image: "codes" found in (left) the Bible & (right) War and Peace (see [5])
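Polynomial regression gives a compact illustration of both failure modes (a sketch with made-up toy data, not the Bible-code example above): a degree-0 fit cannot express a quadratic trend and underfits, while a degree-9 fit through 10 points interpolates everything, including the noise.

```python
import numpy as np

# Deterministic toy data: a quadratic trend plus small alternating "noise".
x = np.linspace(0.0, 1.0, 10)
y = x**2 + 0.05 * (-1.0) ** np.arange(10)

def train_error(degree):
    """Mean squared error of a degree-`degree` polynomial fit on its own training data."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Degree 0 (a constant) underfits: large training error.
# Degree 9 interpolates all 10 points exactly -- it has "found" the noise.
errors = {d: train_error(d) for d in (0, 2, 9)}
```

Training error alone rewards the overfit model; only held-out data would reveal that the degree-9 fit has learned a spurious pattern.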
Challenges for Machine Learning: Computational Efficiency
Learning algorithms should be able to (computationally) scale...
1 To large datasets
2 For quick predictions

Training efficiency is generally measured in the dataset size, N
1 Algorithms are efficient if their computational complexity is polynomial in N; i.e., O(N^a) for some fixed a ≥ 0
2 Algorithms are considered to be large-scale if their computational complexity is linear in N; i.e., O(N)
3 In some applications it may not even be computationally feasible to look at every data point; these require sublinear or logarithmic complexity

Prediction efficiency is generally measured in the dataset size, N, and in the number of predictions, M
Prediction should be sublinear in N
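As a sketch of why prediction efficiency matters (a hypothetical example, not an algorithm from this lecture): a naive nearest-neighbor predictor memorizes all N training points and pays N distance computations per query, so M predictions cost Θ(MN). The counter below makes that cost explicit.

```python
import math

class NearestNeighbor:
    """Memorizes the training set; every prediction scans all N stored points."""
    def __init__(self, points, labels):
        self.points, self.labels = points, labels
        self.comparisons = 0  # counts distance evaluations, a proxy for prediction cost

    def predict(self, q):
        best, best_d = None, math.inf
        for p, y in zip(self.points, self.labels):
            self.comparisons += 1
            d = math.dist(p, q)
            if d < best_d:
                best, best_d = y, d
        return best

# N = 1000 stored points; each query costs exactly N distance computations,
# i.e., prediction is linear in N -- the motivation for sublinear methods
# such as space-partitioning trees or hashing.
nn = NearestNeighbor([[float(i)] for i in range(1000)],
                     [i % 2 for i in range(1000)])
nn.predict([123.2])
```

After one query the counter reads 1000; after M queries it reads 1000·M, which is exactly the Θ(MN) scaling the slide warns about.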
Summary
1 Machine learning (ML) is a relevant, popular topic with applicationsspanning many data-driven tasks (e.g., translation & recommendersystems)
2 ML tasks are exercises in inductive reasoning; unlike classic AI, ML infers general patterns from specific samples
3 ML algorithms can be viewed as pattern analyzers—the patterns theyfind can be used to make predictions
4 Common tasks in ML include regression, classification, subspacediscovery, & clustering
5 Learning algorithms face challenges including choosing an inductivebias, under- & over-fitting, & computational efficiency
6 Next Lecture: We will discuss a general approach to learning calledkernel methods & show its application to regression
Bibliography I
[1] Laurent Charlin, Richard S. Zemel, and Craig Boutilier. A frameworkfor optimizing paper matching. In Proceedings of the Twenty-SeventhConference on Uncertainty in Artificial Intelligence (UAI), pages86–95, 2011.
[2] Diana F. Gordon and Marie desJardins. Evaluation and selection ofbiases in machine learning. Machine Learning, 20(1-2):5–22, 1995.
[3] Kevin Knight and Philipp Koehn. What’s new in statistical machinetranslation. In HLT-NAACL, 2003.http://people.csail.mit.edu/people/koehn/publications/tutorial2003.pdf.
[4] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
[5] Brendan McKay, Dror Bar-Natan, Maya Bar-Hillel, and Gil Kalai. Solving the Bible code puzzle. Statistical Science, 14:150–173, 1999.
Bibliography II
[6] Tom Mitchell. Machine Learning. McGraw Hill, 1997.
[7] Tom M. Mitchell. The need for biases in learning generalizations.Technical Report CBM-TR 5-110, Rutgers University, NewBrunswick, NJ, 1980.
[8] Francesco Ricci, Lior Rokach, and Bracha Shapira. Introduction torecommender systems handbook. In Recommender SystemsHandbook, pages 1–35. 2011.
[9] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson Education, 3rd edition, 2010.
[10] John Shawe-Taylor and Nello Cristianini. Kernel Methods for PatternAnalysis. Cambridge University Press, 2004.