
Applied Machine Learning
Lecture 1: Introduction

Richard Johansson

richard.johansson@gu.se

January 21, 2020


welcome to the course!

- machine learning is increasingly popular among students
  - our courses are full!
  - many thesis projects develop or apply ML models

- ... and in industry and the public sector
  - many companies come to us looking for students
  - joint research projects

- why the fuss, and why now?

success stories: image recognition


success stories: machine translation

[image by Chris Manning]


topics covered in the course

- the usual “zoo”: a selection of machine learning models
  - what’s the idea behind them?
  - how are they implemented? (at least on a high level)
  - what are the use cases?
  - how can we apply them practically?

- but hopefully also the “real-world context”:
  - extended “messy” practical assignments requiring that you think of what you’re doing
  - invited talks from industry and the healthcare sector
  - annotation of data, evaluation
  - ethical and legal issues, interpretability

overview

practical issues about the course

basic ideas in machine learning

machine learning libraries in Python

example of a learning algorithm: decision tree learning

underfitting and overfitting


course webpage

- the official course webpage is the Canvas page: https://chalmers.instructure.com/courses/8685/

structure of teaching

- lectures Tuesdays and Fridays
  - some theory and introduction to ML software
  - interactive coding
  - solving a few exercises when we have time
  - most lectures will be given by Selpi
  - ... except some guest lectures

- lab sessions Thursdays
  - our TAs help you work on your assignments
  - choose between the 13-15 and the 15-17 session
  - please let me know if it’s too crowded

assignments

- seven compulsory assignments:
  - PA 1A: intro to the ML workflow, decision trees
  - PA 1B: random forests
  - PA 2A: text classification
  - PA 2B: linear classifiers
  - PA 3B: skin mark classification
  - WA 1: read a scientific paper in applied machine learning
  - WA 2: written essay on ethics in ML

- please refer to the course PM for details about grading
- we will use the Python programming language

programming assignment 1A

- warmup lab exercise: quick tour of the scikit-learn library
- introduction to decision trees
- for a high grade: implement decision tree regression
- lab session on Thursday
- submission deadline: January 29

noncompulsory work

- exercise sheets
- online quizzes

literature

- the main course book is A Course in Machine Learning by Hal Daumé III: http://ciml.info
- additional papers to read for some topics
- some notes to complement the lectures
- example code will be posted on the course page

exam, mid-March

- this is a take-home exam: a written assignment
- your solution must be submitted online
- we will find a date in the exam period that suits as many as possible

exam, details

- a first part about basic concepts: you need to answer most of these questions correctly to pass
- a second part that requires more insight: answer these questions for a higher grade

student representatives

- if you’re interested in being a student representative, please send me an email!
- the workload is light and there will be a small reward...

overview

practical issues about the course

basic ideas in machine learning

machine learning libraries in Python

example of a learning algorithm: decision tree learning

underfitting and overfitting


basic ideas

- given some object, make a prediction
  - is this patient diabetic?
  - is the sentiment of this movie review positive?
  - does this image contain a cat?
  - what will be tomorrow’s stock market value of this company?
  - what are the phonemes contained in this speech signal?

- the goal of machine learning is to build the prediction functions by observing data

- contrast: expert-defined or data-driven


why machine learning?

why would we want to “learn” the function from data instead ofjust implementing it?

- usually because we don’t really know how to write down the function by hand
  - speech recognition
  - image classification
  - machine translation
  - ...

- might not be necessary for limited tasks where we know how to write the function ourselves
- what is more expensive in your case: knowledge or data?

don’t forget your domain expertise!

ML makes some tasks automatic, but we still need our brains:

- defining the tasks, terminology, evaluation metrics
- annotating (hand-labeling) training and testing data
- designing features
- error analysis

example: is the patient diabetic?

- in order to predict, we make some measurements of properties we believe will be useful: these are called the features

features: different views

- many learning algorithms operate on numerical vectors:

    features = [1.5, -2, 3.8, 0, 9.12]

- more abstractly, we often represent the features as attributes with values (in Python, typically a dictionary):

    features = {"gender": "male", "age": 37, "blood_pressure": 130, ...}

- sometimes, it’s easier just to see the features as a list of e.g. words (bag of words):

    features = ["here", "are", "some", "words", "in", "a", "document"]
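The two views can be connected in code. Below is a minimal sketch (not from the slides) of turning attribute-value dictionaries into numeric vectors with scikit-learn’s DictVectorizer; the feature names and values are made up for illustration.

    from sklearn.feature_extraction import DictVectorizer

    # made-up records in the attribute-value view
    records = [
        {"gender": "male", "age": 37, "blood_pressure": 130},
        {"gender": "female", "age": 52, "blood_pressure": 145},
    ]

    # DictVectorizer maps them to the numeric-vector view:
    # numeric attributes become columns, categorical ones are one-hot encoded
    vectorizer = DictVectorizer(sparse=False)
    X = vectorizer.fit_transform(records)
    print(vectorizer.feature_names_)   # column names of the numeric matrix
    print(X)                           # one row of numbers per record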

more terminology: what is the output?

- classification: learning to output a category label
  - spam/non-spam; positive/negative; ...

- regression: learning to guess a number
  - value of a share; number of stars in a review; ...

basic terminology: supervised learning

- in supervised learning, the training set consists of input–output pairs
- our goal is to learn to produce the outputs

types of supervision: alternatives

- unsupervised learning: we are given “unorganized” data
  - our goal is to discover some structure

[figure: two scatter plots of unlabeled two-dimensional data]

- reinforcement learning: our problem is formalized as a game
  - an agent carries out actions and receives rewards

example: Fisher’s iris data

[figure: scatter plot of petal_length vs. petal_width for the versicolor and virginica classes]

approach 1: linear separator

A hand-written linear decision rule on the two petal features, written as a small Python function:

    def classify(petal_length, petal_width):
        if 0.85 * petal_length + 2.42 * petal_width >= 8.34:
            return "virginica"
        else:
            return "versicolor"

[figure: petal_length vs. petal_width scatter plot illustrating the linear separator]

approach 2: if/then/else tree

[figure: petal_length vs. petal_width scatter plot illustrating the if/then/else tree approach]

basic machine learning workflow


basic ML methodology: evaluation

- select an evaluation procedure (a “metric”), such as
  - classification accuracy: the proportion of correct classifications
  - mean squared error, often used in regression
  - or some domain-specific metric

- compare to one or more baselines
  - a trivial solution
  - a rule-based solution
  - an existing solution

- apply your model to a held-out test set and evaluate (see the sketch below)
  - the test set must be different from the training set
  - also: don’t optimize on the test set; use a development set or cross-validation!
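As a concrete illustration of held-out evaluation, here is a minimal sketch using scikit-learn; the dataset and classifier are arbitrary choices, not the course’s reference code.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)

    # keep a test set that the model never sees during training
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = DecisionTreeClassifier(max_depth=3)
    clf.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))

    # tune hyperparameters without touching the test set:
    # cross-validate on the training portion only
    print("cross-validated accuracy:", cross_val_score(clf, X_train, y_train, cv=5).mean())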

managing your data for evaluation


overview

practical issues about the course

basic ideas in machine learning

machine learning libraries in Python

example of a learning algorithm: decision tree learning

underfitting and overfitting


use cases for machine learning

- standard use cases: standard solutions are available
- special cases: we may need to tailor our own solutions

the Python machine learning ecosystem (selection)


machine learning software: a small sample

- general-purpose software, large collections of algorithms:
  - scikit-learn: http://scikit-learn.org
    - Python library – will be used in this course
  - Weka: http://www.cs.waikato.ac.nz/ml/weka
    - Java library with a nice user interface

- special-purpose software, small collections of algorithms:
  - Keras, PyTorch, TensorFlow, CNTK for neural networks
  - LibSVM/LibLinear for support vector machines
  - XGBoost, LightGBM for tree ensembles
  - ...

- large-scale learning in distributed architectures:
  - Spark MLlib
  - H2O

scikit-learn toy example

see also https://scikit-learn.org/stable/getting_started.html
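The toy example itself was shown as a code listing; below is a minimal sketch in the same spirit (my own illustration, following the getting-started pattern of fit and predict, not the original lecture code).

    from sklearn.ensemble import RandomForestClassifier

    # a tiny made-up dataset: two features per example, binary labels
    X_train = [[1, 2], [3, 4], [11, 12], [13, 14]]
    y_train = [0, 0, 1, 1]

    clf = RandomForestClassifier(random_state=0)
    clf.fit(X_train, y_train)                 # learn from the training examples
    print(clf.predict([[2, 3], [12, 13]]))    # predict labels for new examples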

overview

practical issues about the course

basic ideas in machine learning

machine learning libraries in Python

example of a learning algorithm: decision tree learning

underfitting and overfitting


classifiers as rule systems

- assume that we’re building the prediction function by hand
  - how would it look?
  - probably, you would start writing rules like this (see the sketch after this list):

    IF the blood glucose level > 150, THEN
        IF the age > 50, THEN return True
        ELSE ...
    ...

- a human would construct such a rule system by trial and error
- we’ll see how it can be learned automatically
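As a Python sketch of these hand-written rules (the thresholds are the ones from the slide; the elided branches are filled with arbitrary defaults for illustration):

    def is_diabetic(blood_glucose, age):
        if blood_glucose > 150:
            if age > 50:
                return True
            return False    # placeholder for the elided "ELSE ..." branch
        return False        # placeholder default for the remaining rules

    print(is_diabetic(blood_glucose=180, age=62))   # True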

decision tree classifiers

- a decision tree is a tree where
  - the internal nodes represent a choice based on a feature
  - the leaves represent the return value of the classifier

- like the example we had previously:

    IF the blood glucose level > 150, THEN
        IF the age > 50, THEN return True
        ELSE ...
    ...

general idea for learning a tree

- it should make few errors on the training set
- and an Occam’s razor intuition: we’d like a small tree
- however, finding a small and accurate tree is a complex computational problem
  - it is NP-hard

- instead, we’ll look at an algorithm that works top-down by selecting the “most useful feature”

- some different variants:
  - basic approach: the ID3 algorithm
  - extended approaches: CART, C4.5, ...
  - see e.g. Daumé III’s book or http://en.wikipedia.org/wiki/ID3_algorithm

greedy decision tree classifier learning (pseudocode)

    def TrainDecisionTreeClassifier(X, Y):
        if all outputs in Y are identical:
            return a leaf with the class of the examples in Y
        if we have reached the maximally allowed depth:
            return a leaf with the majority class of Y
        F ← the “most useful feature” in X
        for each possible value f_i of F:
            X_i, Y_i ← the subset where F = f_i
            tree_i ← TrainDecisionTreeClassifier(X_i, Y_i)
        return a tree node that splits on F,
            where f_i is connected to the subtree tree_i
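A compact runnable version of this pseudocode is sketched below (my own illustration, not the course’s reference implementation). Features are assumed to be categorical, and the “most useful feature” is scored with the simple majority-class score described on the next slide.

    from collections import Counter

    def majority(Y):
        return Counter(Y).most_common(1)[0][0]

    def feature_score(X, Y, f):
        # group examples by the value of feature f, then sum the majority-class counts
        groups = {}
        for x, y in zip(X, Y):
            groups.setdefault(x[f], []).append(y)
        return sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())

    def train_tree(X, Y, depth=0, max_depth=3):
        if len(set(Y)) == 1:                  # all outputs in Y are identical
            return Y[0]
        if depth == max_depth or not X[0]:    # maximum depth reached (or no features left)
            return majority(Y)
        best = max(X[0].keys(), key=lambda f: feature_score(X, Y, f))
        node = {"feature": best, "children": {}, "default": majority(Y)}
        for value in set(x[best] for x in X):
            Xi = [{k: v for k, v in x.items() if k != best} for x in X if x[best] == value]
            Yi = [y for x, y in zip(X, Y) if x[best] == value]
            node["children"][value] = train_tree(Xi, Yi, depth + 1, max_depth)
        return node

    def predict(tree, x):
        while isinstance(tree, dict):
            tree = tree["children"].get(x[tree["feature"]], tree["default"])
        return tree

    # tiny made-up example
    X = [{"glucose": "high", "age": "old"}, {"glucose": "high", "age": "young"},
         {"glucose": "low", "age": "old"}, {"glucose": "low", "age": "young"}]
    Y = [True, False, False, False]
    tree = train_tree(X, Y)
    print(predict(tree, {"glucose": "high", "age": "old"}))   # True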

how to select the “most useful feature”?

- there are many rules of thumb to select the most useful feature
  - idea: a feature is good if the subsets T_i are homogeneous

- in Daumé III’s book, he uses a simple score to rank the features:
  - for each subset T_i, compute the frequency of its majority class
  - sum the majority class frequencies

- however, the most well-known ranking measure is the information gain (sketched in code below)
  - this measures the reduction of entropy (statistical uncertainty) we get by considering the feature

- scikit-learn uses the Gini impurity by default
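A small sketch (my own illustration) of information gain for a categorical feature: the entropy of the labels minus the weighted entropy of the label subsets obtained by splitting on the feature.

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(values, labels):
        n = len(labels)
        subsets = {}
        for v, y in zip(values, labels):
            subsets.setdefault(v, []).append(y)
        remainder = sum(len(ys) / n * entropy(ys) for ys in subsets.values())
        return entropy(labels) - remainder

    # example: a binary feature that separates the labels fairly well
    feature_values = ["high", "high", "low", "low", "low"]
    labels         = [True,  True,   False, False, True]
    print(information_gain(feature_values, labels))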

example: selecting the feature for the top node


decision trees with numerical features

- when our features are numerical, we set a threshold and build subtrees for the upper and lower subset
- so we need to find the threshold that gives us the nicest split

example: finding the best threshold

[figure: petal_length values plotted on a number line, used to search for the best threshold]
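A sketch (my own illustration) of this threshold search: try the midpoints between consecutive sorted feature values and keep the one with the highest information gain. The petal_length values below are made up.

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def best_threshold(values, labels):
        pairs = sorted(zip(values, labels))
        best = (None, -1.0)
        for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
            if v1 == v2:
                continue
            t = (v1 + v2) / 2
            left = [y for v, y in pairs if v <= t]
            right = [y for v, y in pairs if v > t]
            n = len(labels)
            gain = entropy(labels) - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
            if gain > best[1]:
                best = (t, gain)
        return best

    # made-up petal lengths with two classes
    petal_length = [3.0, 3.3, 4.0, 4.7, 5.1, 5.5, 6.0]
    species = ["versicolor", "versicolor", "versicolor", "versicolor",
               "virginica", "virginica", "virginica"]
    print(best_threshold(petal_length, species))   # threshold around 4.9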

implementing decision tree classifiers

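In practice, this is usually done with a library; below is a minimal sketch (not the code from the slide) of training scikit-learn’s DecisionTreeClassifier on the iris data and printing the learned tree.

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    clf = DecisionTreeClassifier(max_depth=2, random_state=0)
    clf.fit(iris.data, iris.target)

    # print the learned if/then/else structure
    print(export_text(clf, feature_names=list(iris.feature_names)))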

overview

practical issues about the course

basic ideas in machine learning

machine learning libraries in Python

example of a learning algorithm: decision tree learning

underfitting and overfitting


what goes on when we “learn”?

- the learning algorithm observes the examples in the training set
- it tries to find common patterns that explain the data: it generalizes so that we can make predictions for new examples
- how this is done depends on what algorithm we are using

principles of induction: how do we select “good” models?

- hypothesis space: the set of all possible outputs of a learning algorithm
  - for decision tree learners: the set of possible trees
  - for linear separators: the set of all lines in the plane / hyperplanes in a vector space

- “learning” = searching the hypothesis space
- how do we know what hypothesis to look for?

a fundamental tradeoff in machine learning

- goodness of fit: the learned classifier should be able to capture the information in the training set
  - e.g. correctly classify the examples in the training data

- regularization: the classifier should be simple
  - use as few features as possible?
  - don’t rely too much on any feature?
  - small tree or neural network?

why would we prefer “simple” hypotheses?


“overfitting” and “underfitting”: the bias–variance tradeoff

[figure: Wikipedia illustration of the bias–variance tradeoff]

example: training/test accuracy as a function of tree depth

[figure: training and test accuracy (roughly 0.90-0.98) plotted against tree depth (0-20), one curve for the training set and one for the test set]
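A plot of this kind could be produced roughly as follows (a sketch with illustrative choices of dataset and split, not the exact setup behind the figure).

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    depths = range(1, 21)
    train_acc, test_acc = [], []
    for d in depths:
        clf = DecisionTreeClassifier(max_depth=d, random_state=0)
        clf.fit(X_train, y_train)
        train_acc.append(clf.score(X_train, y_train))  # accuracy on training data
        test_acc.append(clf.score(X_test, y_test))     # accuracy on held-out data

    plt.plot(depths, train_acc, label="train")
    plt.plot(depths, test_acc, label="test")
    plt.xlabel("tree depth")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()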

up next

- Thursday: lab session for programming assignment 1A
- topic of Friday’s lecture: ensembles and random forests
- please prepare for assignment 1A by reading my code and the extra reading on decision trees