
Minnesota Casualty Actuarial Symposium

Machine Learning: What we learned from our first Coursera course
Nathan Hubbell, Laura Johnson, Patrick Fillmore, Stephen Segroves

January 22nd, 2013


Agenda

1. MOOC Overview - Nathan

2. Machine Learning Concepts - Patrick

3. Machine Learning in Practice - Stephen

4. Other Learnings from Machine Learning - Laura

5. Q&A

MOOC Overview
Nathan Hubbell


MOOC Overview

• MOOC – Massive Open Online Course

• Big MOOC Names
  – Feb, 2011: Udacity (Stanford)
  – April, 2012: Coursera (Stanford)
  – March, 2012: edX (Harvard, MIT, Berkeley)
  – Sept, 2006: Khan Academy

• Features
  – Open access
  – Scalability
  – Discussion Boards

http://en.wikipedia.org/wiki/Massive_open_online_course

“Coursera doubles university count to 33, now hosts over 200 courses for over 1.3 million students”
- The Next Web Insider, September 19th, 2012

MOOCs in the News

• On Udacity’s site:
  – “… But that seems to be a willful misreading of the regulation (which seems silly in the first place). Coursera isn't a degree mill. It's not about earning the degree, it's about actually learning. Minnesota's interpretation of the law is fairly ridiculous. It basically means that anyone who wants to access online educational material in Minnesota is limited by the state determining what it considers okay."

• Slate.com: Larry Pogemiller, director of the MN Office of Higher Education:
  – “Obviously, our office encourages lifelong learning and wants Minnesotans to take advantage of educational materials available on the Internet, particularly if they’re free. No Minnesotan should hesitate to take advantage of free, online offerings from Coursera.”


MOOCsperience

• Class Structure
  – 10 Week Course
  – 2-3 hours of video content per week
  – Wiki-based Course Notes
  – Questions? Discussion Forum

• Homework
  – Review Questions: Quick 5-question / 10 minutes
  – Programming Exercises: 1 – 4 hours


The following slides’ content is drawn heavily from the Coursera Machine Learning class content: https://www.coursera.org/#course/ml

Machine Learning Concepts
Patrick Fillmore


What is Machine Learning?

“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

- Tom Mitchell, American computer scientist and E. Fredkin University Professor at Carnegie Mellon University

Task / Experience / Performance

What is Machine Learning?

             Task                         Experience                 Performance
Ratemaking   Predict Future Losses        Policy Loss History        Actual Losses / Loss Ratio
Reserving    Predict Future Development   Loss Development History   Final / Predicted Ultimates


Machine Learning Techniques

• Familiar
  – Linear Regression / Linear Models
  – Logistic Regression / GLMs

• Not Machine Learning Algorithms
  – Judgmental selection of LDFs
  – Risk/Reinsurance Models

• Unfamiliar
  – Supervised Learning
    – Regularization
    – Neural Networks
    – Support Vector Machines
  – Unsupervised Learning
    – Principal Component Analysis
    – Clustering
    – Recommender Systems
  – Many More!

Data Driven Modeling

Linear Regression

Weight = Height * Factor + Intercept

[Scatter plot: Human Height vs. Weight - Height (Inches) on the x-axis, Weight (Pounds) on the y-axis]

Hypothesis: y = h_θ(x) = θ₀ + θ₁x

Linear Regression: Cost Function

Hypothesis: h_θ(x) = θ₀ + θ₁x

[Scatter plot: Human Height vs. Weight - Height (Inches) vs. Weight (Pounds)]

How to find a good fit? Cost Function!

Cost Function:

J(θ₀, θ₁) = (1/2m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) - y⁽ⁱ⁾)² = SSE / (2m)

Fitting Goal: minimize J

Use Gradient Descent!
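The hypothesis and cost function above can be sketched in a few lines of Python. The height/weight pairs below are illustrative stand-ins, not the data behind the slide's chart; the intercept and slope are chosen so the example line fits them exactly.

```python
def h(theta0, theta1, x):
    """Hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = SSE / (2m)."""
    m = len(xs)
    sse = sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return sse / (2 * m)

# Illustrative height (inches) / weight (pounds) pairs -- not the slide's data
heights = [63.0, 66.0, 69.0, 72.0, 75.0]
weights = [110.0, 125.0, 140.0, 155.0, 170.0]

print(cost(-205.0, 5.0, heights, weights))  # this line fits exactly, so J = 0.0
```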

Linear Regression: Minimize Cost (Gradient Descent)

[Plot: Cost Function J vs. Theta]

1. Start with a θ; determine cost

Iteration    θ        J(θ)
    0       1.00   5,488,884

Linear Regression: Minimize Cost (Gradient Descent)

2. Determine how J changes with θ (dJ/dθ)

Iteration    θ        J(θ)       dJ/dθ
    0       1.00   5,488,884    -165.30

Linear Regression: Minimize Cost (Gradient Descent)

3. Calculate a new θ:  θ_new = θ_old - α · dJ/dθ,  α = learning rate = 0.01

Iteration    θ        J(θ)       dJ/dθ
    0       1.00   5,488,884    -165.30
    1       2.65

Linear Regression: Minimize Cost (Gradient Descent)

4. Iterate until Convergence

Iteration    θ        J(θ)       dJ/dθ
    0       1.00   5,488,884    -165.30
    1       2.65     581,450     -52.98
    2       3.18      77,351     -16.98
    3       3.35      25,569      -5.44
    4       3.41      20,250      -1.74
    5       3.42      19,704      -0.56
    6       3.43

[Plot: Cost Function J vs. Theta, with the iterates descending toward the minimum]

Final θ: 3.43
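The four-step loop on these slides can be written out directly. The slide's underlying data isn't reproduced here, so this is a sketch on made-up points for a one-parameter model h(x) = θ·x (matching the single-θ table above); the learning rate and iteration count are tuned to these toy numbers, not taken from the slide.

```python
def gradient_descent(xs, ys, theta=1.0, alpha=0.01, iters=500):
    """One-parameter gradient descent for the model h(x) = theta * x.
    Each pass computes dJ/dtheta and steps: theta_new = theta_old - alpha * dJ/dtheta."""
    m = len(xs)
    for _ in range(iters):
        # dJ/dtheta for J = (1/2m) * sum((theta*x - y)^2)
        grad = sum((theta * x - y) * x for x, y in zip(xs, ys)) / m
        theta -= alpha * grad
    return theta

# Toy data generated from y = 2x, so the iterates should converge to theta = 2
print(round(gradient_descent([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 4))  # 2.0
```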

Cost Function – One Parameter

[Plot: Cost Function J vs. Theta, zoomed in near the minimum (θ between 3 and 4)]

Cost Function – Two Parameters

Linear Regression: GD vs. Normal Equations

[Scatter plot: Human Height vs. Weight with fitted line y = 3.4327x - 106.03]
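As the slide title suggests, the same line can be found in closed form, without iterating. A minimal sketch of the normal equations specialized to one feature; the data points below are made up for illustration, not the slide's height/weight sample.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = slope * x + intercept
    (the normal equations specialized to a single feature)."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Points that lie exactly on y = 3x - 5
print(fit_line([1.0, 2.0, 3.0, 4.0], [-2.0, 1.0, 4.0, 7.0]))  # (3.0, -5.0)
```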


Why discuss Gradient Descent at all?

• Basic fitting algorithm for Machine Learning
• Many other Systems/Models use Gradient Descent

Andrew Ng: If you understand gradient descent and can implement it, you can use optimized software to solve problems, and you are ahead of many of the people working in this field.


Neural Networks

Layer 1 Layer 2 Layer 3 Layer 4

Selecting Model Structure (The Right Machine for the Job)

Bias/variance: How would you fit this model?

[Scatter plot: housing Price vs. Size]


Bias vs. Variance

High bias (underfit): [Price vs. Size plot]
High variance (overfit): [Price vs. Size plot]
“Just right”: [Price vs. Size plot]
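One way to see the bias side of this trade-off numerically: on data with a clear trend, an underfit model leaves a large training error that a better-structured model removes. The "price vs. size" points below are hypothetical, not taken from the slides.

```python
def mse(preds, ys):
    """Mean squared error between predictions and actuals."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Hypothetical, linearly trending price-vs-size data
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [100.0, 150.0, 200.0, 250.0, 300.0]

# High bias: a constant model (the average price) ignores size entirely
mean_price = sum(prices) / len(prices)
underfit_error = mse([mean_price] * len(sizes), prices)

# A model whose structure matches the data: a straight line through the trend
line_error = mse([50.0 * s + 50.0 for s in sizes], prices)

print(underfit_error, line_error)  # 5000.0 0.0
```

The overfit side shows up the other way around: a model flexible enough to drive training error to zero on noisy data will typically do worse on held-out points, which is what the validation set in the next slides is for.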

Cross Validation

DATA → Model

Cross Validation

Data split: Training | Validation | Testing (Holdout)

Training → Model Fit
Validation → Model Structure
Testing (Holdout) → Final Model
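A sketch of the three-way split shown above. The 60/20/20 fractions are an assumed convention for illustration, not quoted from the slide.

```python
import random

def three_way_split(rows, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle the data, then split it into training / validation / testing sets.
    Training fits each candidate model, validation picks the model structure,
    and the holdout is used once at the end to estimate final performance."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * frac_train)
    n_val = int(n * frac_val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = three_way_split(range(10))
print(len(train), len(val), len(test))  # 6 2 2
```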


Regularization

High variance (overfit): [Price vs. Size plot]
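Regularization tackles the overfit case by charging the cost function for large parameters. Below is a sketch of an L2-penalized (ridge-style) cost in the spirit of the course's regularized cost function; the polynomial hypothesis and the test values are illustrative choices, not from the slides.

```python
def regularized_cost(theta, xs, ys, lam):
    """Squared-error cost plus an L2 penalty lam * sum(theta_j^2) on the
    non-intercept parameters, which shrinks them toward zero and trades a
    little bias for lower variance."""
    m = len(xs)
    # theta[j] multiplies x**j, so theta[0] is the intercept
    preds = [sum(t * x ** j for j, t in enumerate(theta)) for x in xs]
    sse = sum((p - y) ** 2 for p, y in zip(preds, ys))
    penalty = lam * sum(t ** 2 for t in theta[1:])  # intercept not penalized
    return (sse + penalty) / (2 * m)

# With lam = 0 this is the plain cost; raising lam charges for big coefficients
print(regularized_cost([0.0, 2.0], [1.0, 2.0], [2.0, 4.0], 0.0))  # 0.0
print(regularized_cost([0.0, 2.0], [1.0, 2.0], [2.0, 4.0], 1.0))  # 1.0
```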

Machine Learning in Practice: Cluster Analysis
Stephen Segroves

Supervised Learning. Training set: labeled examples (x, y).

Unsupervised Learning. Training set: unlabeled examples (x only).

Applications of Clustering

Market Segmentation / Customer Profiling

Territory Grouping

Social Network Analysis

Clustering: K-Means Algorithm


Randomly initialize K cluster centroids μ₁, …, μ_K
Repeat {
    for i = 1 to m:
        c⁽ⁱ⁾ := index (from 1 to K) of cluster centroid closest to x⁽ⁱ⁾
    for k = 1 to K:
        μ_k := average (mean) of points assigned to cluster k
}
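The loop above translates almost line-for-line into Python. A sketch on 2-D points; the cluster count, iteration cap, seed, and blob data are illustrative choices.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """K-means: repeat (1) assign each point to its closest centroid,
    (2) move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step: c_i = index of nearest centroid
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, cl in enumerate(clusters):  # update step: centroid = cluster mean
            if cl:  # keep the old centroid if a cluster ends up empty
                centroids[j] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids

# Two well-separated blobs; the centroids should land near (1/3, 1/3) and (31/3, 31/3)
blobs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(blobs, 2)))
```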


Potential Issues: Local Optima

Potential Solution: Local Optima

For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get c⁽¹⁾, …, c⁽ᵐ⁾, μ₁, …, μ_K.
    Compute cost function (distortion) J.
}

Pick clustering that gave lowest cost
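The restart loop above can be sketched as follows. To stay self-contained it bundles a minimal 1-D K-means; the data, run count, and seed are illustrative.

```python
import random

def kmeans_1d(xs, k, rng, iters=20):
    """Minimal 1-D K-means with a random initialization drawn from the data."""
    centroids = rng.sample(xs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:  # assign each point to its nearest centroid
            clusters[min(range(k), key=lambda j: (x - centroids[j]) ** 2)].append(x)
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

def distortion(xs, centroids):
    """Cost function: average squared distance to the closest centroid."""
    return sum(min((x - c) ** 2 for c in centroids) for x in xs) / len(xs)

def best_of_restarts(xs, k, runs=100, seed=0):
    """Run K-means many times from different random initializations and
    keep the clustering with the lowest distortion."""
    rng = random.Random(seed)
    candidates = [kmeans_1d(xs, k, rng) for _ in range(runs)]
    return min(candidates, key=lambda c: distortion(xs, c))

# Three obvious 1-D clusters; the restarts should settle near centers 1, 11, 21
data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
print(sorted(best_of_restarts(data, 3)))
```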

Other Learnings from the Machine Learning Course
Laura Johnson

This course was a great way to learn – WHY?

• Structure and foundation given
  – 58,000 students across the world across multiple disciplines
• Well laid out web site
• Discussion forums, wikis, etc.
  – Basic building blocks provided
• Technical enhancements to recorded sessions
  – Notes – color!!
  – Captions / transcript
  – Speed control
  – “Interactive” feedback

Coursera Look and Feel - Structure

Coursera – Teaching using Building Blocks

Coursera Technical Enhancements – Notes, Captions, Speed


[Screenshot: the earlier Neural Networks slide (Layers 1-4), shown as an example of annotated course video]

Coursera Technical Enhancements – Notes in Color!!


Coursera Technical Enhancements - Feedback

Machine Learning MOOC Recommendations

• Time
  – Only take one MOOC at a time!
  – Do the homework on time

• Software Required
  – Google Chrome, Firefox, IE9
  – Octave (free MATLAB alternative)
  – Text Editor (UltraEdit, SublimeText, TextWrangler)

• Suggested Prerequisites
  – Linear Algebra
  – Some Programming Experience a Plus

• Team up!

• Final comments on Machine Learning:
  – Data: GIGO (garbage in, garbage out)
  – Half science / half art


Questions?