2017 Predictive Analytics Symposium

Session 13, Getting Started: Sources of Tools and Training

Moderator: Min Mercer, FSA, MAAA

Presenters:

Mary Pat Campbell, FSA, MAAA

Michael Cletus Niemerg, FSA, MAAA


www.conning.com | © 2017 Conning, Inc.

Getting Started: Sources of Tools and Training

Resources for Predictive Analytics in R

Mary Pat Campbell, FSA, MAAA

September 2017

© 2017 Conning, Inc. This research publication is copyrighted with all rights reserved. No part of this research publication may be reproduced, transcribed, transmitted, stored in an electronic retrieval system, or translated into any language in any form by any means without the prior written permission of Conning.

PREDICTIVE ANALYTICS IN R

Beginner Resources


INTRO TO STATISTICAL LEARNING

Online course, textbook, and R exercises


Online Course – Hands On, Simple Examples


Class link: https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/

Stanford Online Lagunita: Statistical Learning – Self-Paced Course


Multimedia Approach, Convenient to Try Offline


1.1 Opening Remarks


Quick Quizzes to Test Understanding


Chapter 7 Quiz: Moving Beyond Linearity


Instant Feedback and Explanation


Chapter 7 Quiz: Moving Beyond Linearity


Some Questions are Easier than Others...


Chapter 4.1: Introduction to Classification Problems


Textbook: An Introduction to Statistical Learning with Applications in R


Book page: http://www-bcf.usc.edu/~gareth/ISL/book.html

Fourth printing: http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf

Simplified mathematical/statistical underpinning
  More rigorous approach: The Elements of Statistical Learning

Covers the same material as the course

Exercises: important to try both
  Conceptual
  Applied


Conceptual Exercises: Curse of Dimensionality


ISLR, Chapter 4, Conceptual Exercises -- #4 – the problem with local approaches


Conceptual Exercises: Backward Solving for Sample


Curse of Dimensionality: The Ever-Expanding Cube


Wolfram Alpha query: https://www.wolframalpha.com/input/?i=lim(10%5E(-1%2Fd),+d+-%3E+infinity)&rawformassumption=%7B%22C%22,+%22d%22%7D+-%3E+%7B%22Variable%22%7D&rawformassumption=%22UnitClash%22+-%3E+%7B%22d%22,+%7B%22Days%22%7D%7D
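The limit in the query above is the side length of a hypercube that captures 10% of the observations as the number of dimensions d grows. A minimal base R sketch, assuming the predictors are spread uniformly over a unit hypercube as in the ISLR exercise; the function name side_length is illustrative and does not come from the slides:

```r
# Side length of a d-dimensional hypercube that contains, on average,
# the given fraction of points spread uniformly over the unit hypercube.
# The cube's volume must equal the fraction, so side = fraction^(1/d).
side_length <- function(d, fraction = 0.10) {
  fraction^(1 / d)
}

dims  <- c(1, 2, 10, 100)
sides <- side_length(dims)

# One row per dimension count; the side approaches 1 as d grows.
print(data.frame(d = dims, side = round(sides, 3)))
```

For d = 1 the side is 0.1, for d = 2 it is about 0.316, and for d = 100 it is about 0.977, so a "local" neighborhood ends up spanning nearly the full range of every predictor.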


KAGGLE KERNELS

Online playground


Kaggle Kernels: Play With Other People’s Code!


Random Forest And Nearest Neighbors on a Few Blocks

Alexandru Papiu, https://www.kaggle.com/apapiu/random-forest-on-a-few-blocks (accessed 11 Sept 2017)


Can See Code and Its Result: Graph Example


    ggplot(small_train, aes(x, y)) +
      geom_point(aes(color = place_id)) +
      theme_minimal() +
      theme(legend.position = "none") +
      ggtitle("Check-ins colored by place_id")

Alexandru Papiu, https://www.kaggle.com/apapiu/random-forest-on-a-few-blocks (accessed 11 Sept 2017)

ggplot cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf


Can See Code and Its Result: Non-Graph Example


Code:

    model_knn = FNN::knn(train = X, test = X_val, cl = small_train$place_id, k = 15)
    preds <- as.character(model_knn)
    truth <- as.character(small_val$place_id)
    mean(truth == preds)

Result:

    ## [1] 0.5151964

Code:

    set.seed(131L)
    small_train$place_id <- as.factor(small_train$place_id) # ranger needs factors for classification
    model_rf <- ranger(place_id ~ x + y + accuracy + hour + weekday + month + year,
                       small_train,
                       num.trees = 100,
                       write.forest = TRUE,
                       importance = "impurity")

Result:

    ## Growing trees.. Progress: 36%. Estimated remaining time: 55 seconds.
    ## Growing trees.. Progress: 86%. Estimated remaining time: 10 seconds.

Alexandru Papiu, https://www.kaggle.com/apapiu/random-forest-on-a-few-blocks (accessed 11 Sept 2017)


Other Kaggle Kernels to Try in R


Exploring the Titanic Dataset
  https://www.kaggle.com/mrisdal/exploring-survival-on-the-titanic
  Great data set for beginners: the Titanic passenger list, and who survived
  Yes/no classification problem
  Beginners competition: https://www.kaggle.com/c/titanic

Exploratory Analysis Zillow
  https://www.kaggle.com/philippsp/exploratory-analysis-zillow
  Shows correlation plots

Wiki Traffic Forecast Exploration
  https://www.kaggle.com/headsortails/wiki-traffic-forecast-exploration-wtf-eda
  Has hidden code: unhide it to see


Titanic Survival Data Set


SOURCE: Megan Risdal, https://www.kaggle.com/mrisdal/exploring-survival-on-the-titanic


Fork that Script!


See How It’s Done


SOURCE: Philipp Spachtholz, https://www.kaggle.com/philippsp/exploratory-analysis-zillow Accessed 12 Sept 2017


GOT YOUR OWN RECOMMENDATIONS?


Getting Started: Sources of Tools and Training

Session 13

September 2017 – Predictive Analytics Symposium

Michael Niemerg, FSA, MAAA

Healthcare Actuary, Milliman Inc.

Machine Learning Toolbox

Statistics and Algorithms

Programming

Data wrangling

Data Visualization and Communication

Domain Knowledge


So many languages…


So many algorithms…


www.machinelearningmastery.com

What’s Common in the Insurance Industry

Languages
  R
  Python
  SAS

Algorithms
  Generalized Linear Models
  Penalized Regression
  Decision Trees


What I wished I had known…

Don’t be intimidated
  Theory is deep – application is shallow
  You don’t need an advanced math/stats/CS degree

Just get started
  Start with a single programming language and a problem
  Don’t focus on only algorithms or only programming – do both

Spend your time wisely
  Focus on what matters to you
  Get the gist of what isn’t important to you

Know your goals
  Are you curious to see what the buzz is about?
  Do you need help with a specific problem?
  Do you want to move into a predictive modeling career?

Just get started already!


Books


Online Education


www.coursera.org

www.datacamp.com

www.kaggle.com

www.udacity.com

www.machinelearningmastery.com

www.udemy.com

Useful Websites


https://dataelixir.com/

www.r-bloggers.com/

www.statsblogs.com/

www.win-vector.com/blog

www.datatau.com

www.kdnuggets.com


Personal Favorites

Book: An Introduction to Statistical Learning

Online Course: Machine Learning (Coursera – Andrew Ng)

Website: www.datatau.com

Blog/Newsletter: Data Elixir

Dataset Repository: Kaggle


Machine Learning in Action

Advantages
  Gentle on the math
  Good introduction to the core concepts
  First-principles code and examples

Disadvantages
  Assumes some familiarity with Python
  Doesn’t teach pragmatic programming


Gradient Descent

A predictive model can have many possible parameter values. Some parameters are better than others. How do we find the best values?

Algorithm
  1. Determine the learning rate
  2. Initialize the variable weights
  3. Repeat R times:
       Calculate the gradient
       Update the weights:

Weights = Weights - Learning Rate * Gradient
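The loop above can be sketched in base R. The function name gradient_descent, the bowl-shaped test function, and its settings are illustrative assumptions rather than anything from the slides. The update subtracts the learning rate times the gradient, so each step moves downhill on the error surface.

```r
# Generic gradient descent: repeatedly step against the gradient.
# gradient_fn takes the current weights and returns the gradient there.
gradient_descent <- function(gradient_fn, weights, learning_rate, n_rounds) {
  for (i in seq_len(n_rounds)) {
    gradient <- gradient_fn(weights)                 # calculate the gradient
    weights  <- weights - learning_rate * gradient   # update the weights
  }
  weights
}

# Hypothetical test problem: f(w) = sum((w - target)^2), minimized at target.
target        <- c(3, -1)
bowl_gradient <- function(w) 2 * (w - target)

fitted <- gradient_descent(bowl_gradient,
                           weights = c(0, 0),
                           learning_rate = 0.1,
                           n_rounds = 100)
print(fitted)
```

With these settings each round shrinks the distance to the minimum by a factor of 0.8, so 100 rounds land very close to (3, -1). A learning rate that is too large would overshoot instead of converging.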


Gradient Descent – Example

Goal – Using a vector of input data, find a single predicted value ($\hat{y}$) that minimizes our error function when compared to the data

Data – [107, 93, 105, 107, 95, 82, 110, 104, 99, 99]

Error Function:

$$\sum \tfrac{1}{2}\,(\text{Actual} - \text{Predicted})^2$$

Gradient:

$$-\sum (\text{Actual} - \text{Predicted})$$

Parameters:
  Start with an initial guess of 165
  Perform 5 rounds of gradient descent
  Use a learning rate of 0.05
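A minimal base R sketch of this example, using the data, initial guess, learning rate, and number of rounds stated above. The variable names are illustrative, and the update subtracts the learning rate times the gradient.

```r
actual <- c(107, 93, 105, 107, 95, 82, 110, 104, 99, 99)

prediction    <- 165    # initial guess from the slide
learning_rate <- 0.05   # from the slide
n_rounds      <- 5      # from the slide

history <- numeric(n_rounds)   # prediction after each round
for (i in seq_len(n_rounds)) {
  # Gradient of sum(0.5 * (actual - prediction)^2) with respect to prediction.
  gradient   <- -sum(actual - prediction)
  prediction <- prediction - learning_rate * gradient
  history[i] <- prediction
}

print(history)
print(mean(actual))   # the value that minimizes this error function
```

Because the learning rate times the number of data points is 0.5, each round halves the gap to the sample mean of 100.1. The prediction moves from 165 to 132.55, then 116.325, then 108.2125, and ends near 102.13 after five rounds.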


Logistic Regression w/ Gradient Descent


Let’s start with a randomly generated dataset…

Sample of Data

Y    X1     X2
0    1.75   1.37
0    1.05   1.52
0    2.43   1.31
0    2.06   0.5
1    1.45   2.64
1    0.95   4.38
1    1.97   3.5
1    1.51   3.69

Scatterplot of Full Dataset

Category 0 = White
Category 1 = Black

Logistic Regression

$$\hat{Y} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}$$
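A minimal base R sketch of fitting this model by gradient descent. It uses only the eight sample rows shown on the earlier slide, not the full dataset. The choice of mean log loss as the error function, the zero starting weights, the learning rate, and the number of rounds are all assumptions made for illustration.

```r
# The eight sample rows from the slide (the full dataset is not reproduced).
sample_data <- data.frame(
  y  = c(0, 0, 0, 0, 1, 1, 1, 1),
  x1 = c(1.75, 1.05, 2.43, 2.06, 1.45, 0.95, 1.97, 1.51),
  x2 = c(1.37, 1.52, 1.31, 0.50, 2.64, 4.38, 3.50, 3.69)
)

sigmoid <- function(z) 1 / (1 + exp(-z))

# Design matrix with a leading column of 1s for the intercept beta_0.
X <- cbind(1, sample_data$x1, sample_data$x2)
y <- sample_data$y

beta          <- c(0, 0, 0)   # assumed starting weights
learning_rate <- 0.1          # assumed, not from the slides
n_rounds      <- 20000        # assumed, not from the slides

for (i in seq_len(n_rounds)) {
  p        <- sigmoid(X %*% beta)            # predicted probabilities
  gradient <- t(X) %*% (p - y) / nrow(X)     # gradient of the mean log loss
  beta     <- beta - learning_rate * gradient
}

# Classify as category 1 when the predicted probability exceeds 0.5.
predicted_class <- as.integer(sigmoid(X %*% beta) > 0.5)
print(round(drop(beta), 2))
print(predicted_class)
```

The decision boundary is the line where beta_0 + beta_1 * x1 + beta_2 * x2 = 0. These eight rows are perfectly separable, so the weights keep growing with more rounds; on real data one would normally stop early or add a penalty, or simply call glm() with family = binomial.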


Logistic Regression Decision Boundary


Thank you