Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13
Demystifying Data Science
19th September 2018
Disclaimer
The views expressed in these presentations are those of the presenter(s) and not necessarily of the Society of Actuaries in Ireland.
Welcome
• Pedro Ecija Serrano, Chair, Data Analytics Subcommittee
• First of a series of three presentations
Disclaimer:
The material, content and views in the following presentation are those of the presenter(s).
Demystifying Data Science
• What is Data Science?
• Why has it Grown So Quickly?
• Opportunities and Threats
• Open Source vs Closed Source
• Buzzwords
• Example: Machine Learning Model
• Practical Examples
“Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms”
—Wikipedia
What is Data Science?
What is Data Science?
“Data science is the study of how to
make data-driven decisions”
What is Data Science?
The more data you have, the better your decisions should be
Data Science Map
Data Science
Data Science Map: Insurance Industry
Diagram labels: Data Scientists, Actuaries, Optimal?
Data Storage Costs
Digitalization
Number of Wifi-Connected Devices
Volume of Data
Computer Speeds
Data Science Tools
Machine Learning
Is Data an Asset?
Why is it a Big Deal Now?
Q: Is data an asset?
A: Yes
Q: How can companies extract value from their data?
A: Data Science
Q: Who will actually analyse this data?
A: Data Scientists
Data Science Process
Obtain Data + Develop Plan → Clean + Reformat → Explore Data → Model → Summarise Results → Make Data-Driven Decisions
Traditional Actuarial Process
Obtain Data → Clean + Reformat → Explore / Check → Model → Summarise Results → Make Data-Driven Decisions
Diagram labels: Excel, Policyholder Database, Other Databases, Database, Excel 1, CSV, Market Data, Results Database, Excel 2, Proprietary Model, Excel, Excel Models, Out-of-Model Adjustments, Summary Spreadsheets, Proprietary Model Reformat, Database, MI / BI
Data Science Process
Obtain Data → Clean + Reformat → Explore / Check → Model → Summarise Results → Make Data-Driven Decisions
Diagram labels: Excel, Policyholder Database, Other Databases, Python, CSV, Market Data, Python, Automated Summary, Python, Audit Trail / Run log, MI / BI, Python
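As a rough illustration of the Python-based process above, the sketch below runs every step in one script. It is not taken from the presentation: the inline CSV extract, the column names, the cleaning rule and the 5% loading used as a stand-in "model" are all assumptions made for the example.

```python
import csv
import io
import statistics

# Hypothetical policyholder extract; in practice this would come from a database or file.
RAW_CSV = """policy_id,age,annual_premium
P001,34,1200
P002,,950
P003,51,1800
P004,45,not available
P005,29,1000
"""

def obtain(text):
    """Obtain data: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean + reformat: keep rows whose fields are numeric and log the rest."""
    good, run_log = [], []
    for row in rows:
        try:
            good.append({"policy_id": row["policy_id"],
                         "age": int(row["age"]),
                         "annual_premium": float(row["annual_premium"])})
        except ValueError:
            run_log.append(f"dropped {row['policy_id']}: non-numeric field")
    return good, run_log

def explore(rows):
    """Explore / check: basic summary statistics."""
    premiums = [r["annual_premium"] for r in rows]
    return {"count": len(premiums), "mean_premium": statistics.mean(premiums)}

def model(rows, loading=0.05):
    """Model: a placeholder calculation (premium plus an assumed 5% loading)."""
    return {r["policy_id"]: round(r["annual_premium"] * (1 + loading), 2) for r in rows}

rows, run_log = clean(obtain(RAW_CSV))
summary = explore(rows)   # automated summary
results = model(rows)     # model output
print(summary, results, run_log, sep="\n")
```

Because each step is a function, the run log doubles as a simple audit trail, and the whole process can be rerun without manual copy-and-paste between tools.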
Opportunities for Actuaries (1)
• Streamline your processes using open-source data science tools
• Improve efficiency and reduce time costs
• Reduce the risk of manual error
• Spend time on value-added work rather than manual labour
Opportunities for Actuaries (2)
• The ultimate wider field?
• Opportunity to drive revenue growth
• (e.g. using policyholder-level predictive modelling)
• Opportunity to work in different industries
• Powerful new tools to solve real-world problems
• Already familiar with handling data and building complex models
• CDO Roles
• Superstar salaries for top researchers
Source: Indeed.com, November 2017
Opportunities for Actuaries: Chief Data Officers
Source: VisualCapitalist.com: The Rise of the Chief Data Officer
Threats for Actuaries
• Increased competition from data scientists
• Who have strong computer skills
• Who have powerful predictive models
• Who have a strong ability to handle data and extract information from the company’s data
• Particularly for younger actuaries
Threats
Data Scientists
Actuaries
Threat Mitigation
• Improve data science skills within each actuarial team
• Mainly by improving computer skills and learning about machine learning models
• Gain access to open-source data science tools at work
• Overcome internal challenges to open-source software
• e.g. the IT department might be reluctant to use new software
Opportunities for Companies
• Extract value from their data asset
• Make better data-driven decisions
• Better understanding of risks and opportunities by doing quick, novel analyses of the data
• Streamline operations
Threats for Companies
• New companies could develop massive structural advantages over incumbents?
• E.g. Amazon have massive structural advantages over traditional retailers
Python and R
• Python is a high-level, general-purpose programming language with readable syntax
• R is a statistical programming language designed by statisticians for statisticians
• Both are widely used for data science
• Both have similar market-leading functionality
Trends
Open-Source
Open-source software:
Users have the ability to:
• Run
• Study
• Modify
• Improve
• Copy
• Distribute to anyone and for any purpose
The Python Data Science Stack
• Programming Language
• Numerical and scientific calculations
• Organising data, merging data, doing calculations
• Graphs
• Big Data
• Machine learning
• Artificial intelligence and ultra-fast calculations
Open Source vs Closed Source
| | Open Source | Closed Source |
| --- | --- | --- |
| Source Code | Open | Hidden |
| Redistributable? | Yes | No |
| Modifiable? | Yes | No |
| Licence and Subscription Fees? | No | Yes |
| Documentation, Helpdesk and Tutorials | Online (Google / Stack Overflow) | Provided by Provider (for a fee) |
| Responsiveness to bugs and market | Quick to respond | Depends on Provider |
| Version Control Systems | Available | Depends on Provider |
Open-Source Advantages
• Fast
• Scalable
• Capable of full automation
• No licensing fees
• Auditability
• Flexibility
• Sustainability
• Easy to find or train developers
• Fast Learning Curve
Open-Source Misconceptions
• Not secure
• Too hard to learn
• No documentation / bad documentation
• Not as good as proprietary software
Closed Source Advantages
• It’s the Standard / Well Known
• Easier for Unskilled Users
• Guaranteed Support (for a fee)
• Managers prefer buying Software as a Service rather than building own systems?
• Warranties and Indemnity Liability
• Unlikely to Become Obsolete?
Closed Source Risks
• Expensive
• Restrictive licences
• Lock-in / Capture
• Time-consuming / Hard to learn
• Management Incentives (Planned obsolescence / cash cow)
• Bankruptcy
• Unknown code quality
• Unknown level of security
• No incentive to provide good documentation
Data Science Process: Buzzwords
Obtain Data → Clean + Reformat → Explore / Check → Model → Summarise Results → Make Data-Driven Decisions
Buzzword: Big Data
Big Data
Big data: data sets that are too big and complex for traditional data processing software
Handling them requires newer software that can distribute the storage and calculations across different machines
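The idea of distributing a calculation can be sketched in a few lines. In this toy example the "machines" are simply chunks of a list processed one after another in a single program. Real big-data frameworks run the same split / summarise / combine pattern across separate machines. The claim amounts and function names are made up for illustration.

```python
def split_into_chunks(values, n_chunks):
    """Partition the data, as a distributed system would across machines."""
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

def partial_stats(chunk):
    """'Map' step: each machine summarises only its own chunk."""
    return sum(chunk), len(chunk)

def combine(partials):
    """'Reduce' step: merge the small per-chunk summaries into one answer."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

claims = [120.0, 80.0, 310.0, 45.0, 200.0, 95.0, 150.0]  # illustrative claim amounts
chunks = split_into_chunks(claims, n_chunks=3)
mean_claim = combine([partial_stats(c) for c in chunks])
print(chunks, mean_claim)
```

Only the small (sum, count) pairs need to travel between machines, which is why this pattern scales to data sets that do not fit on one computer.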
Data Science Process
Obtain Data → Clean + Reformat → Explore / Check → Model → Summarise Results → Make Data-Driven Decisions
Buzzword: Exploratory Data Analysis
Exploratory Data Analysis
EDA: Analysing data sets to find their main characteristics
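A first pass at EDA is often just a table of summary statistics. The sketch below profiles a small, invented data set using only the Python standard library. The field layout (age, smoker status, claim amount) is an assumption for the example.

```python
import statistics
from collections import Counter

# Illustrative records: (age, smoker status, claim amount)
records = [
    (34, "N", 0.0), (51, "Y", 1200.0), (45, "N", 300.0),
    (29, "N", 0.0), (62, "Y", 2500.0), (38, "N", 150.0),
]

ages = [age for age, _, _ in records]
claims = [claim for _, _, claim in records]

profile = {
    "rows": len(records),
    "age_min": min(ages),
    "age_max": max(ages),
    "age_median": statistics.median(ages),
    "claim_mean": statistics.mean(claims),
    "claim_stdev": statistics.stdev(claims),
    "smoker_counts": Counter(status for _, status, _ in records),
    "share_with_no_claim": sum(c == 0 for c in claims) / len(claims),
}
for name, value in profile.items():
    print(f"{name}: {value}")
```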
Data Science Process
Obtain Data → Clean + Reformat → Explore / Check → Model → Summarise Results → Make Data-Driven Decisions
Buzzwords: Exploratory Data Analysis, Data Mining
Data Mining
Data Mining is the process of finding patterns and relationships in large datasets
Goal = to extract valuable, understandable information from data
Data Science Process
Obtain Data → Clean + Reformat → Explore / Check → Model → Summarise Results → Make Data-Driven Decisions
Buzzword: Business Intelligence and Management Information
Business Intelligence and Management Information
Analysing data and presenting information to help executives make informed business decisions
Data Science Process
Obtain Data → Clean + Reformat → Explore / Check → Model → Summarise Results → Make Data-Driven Decisions
Buzzwords: Statistical Models, Predictive Analytics, Predictive Modelling, Machine Learning
Statistics vs Predictive Analytics vs Machine Learning
Statistics is about data:
• Collection
• Organisation
• Analysis
• Interpretation
• Presentation
Predictive Analytics
Predictive Analytics is a set of statistical techniques that
make predictions about future unknown events
For example:
• Data mining
• Traditional predictive models
• Machine learning models
Predictive Modelling
Predictive models are models which make predictions
about future unknown events.
• Using current and historical data
• Allowing for relationships among many factors
• Make predictions about every example in the dataset
• These predictions can be used to guide decision
making
Predictive Modelling
Two main types:
• Traditional predictive models
• Machine learning models
Traditional Predictive Models
Characteristics of traditional predictive models:
• Explainable and interpretable
• Grounded in maths and statistics
• All parameters derived manually using closed form
mathematical solutions or simple algorithms
• Lots of manual effort required to build high
accuracy models
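One familiar case of a closed-form solution is fitting a straight line by ordinary least squares, where the slope and intercept come directly from formulas rather than from an iterative search. The sketch below uses that technique as an illustration. The ages and costs are invented and constructed to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, solved in closed form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx                  # covariance of x and y over variance of x
    intercept = y_bar - slope * x_bar    # the line passes through the point of means
    return intercept, slope

# Illustrative data: policyholder age vs. annual claim cost
ages = [20, 30, 40, 50, 60]
costs = [110.0, 135.0, 160.0, 185.0, 210.0]  # lies exactly on cost = 60 + 2.5 * age

intercept, slope = fit_line(ages, costs)
print(intercept, slope)
```

Every number in the fitted model can be traced back to a formula, which is part of why such models are described as explainable.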
Machine Learning Models
Machine learning models are predictive models
which have the ability to learn from data without
being explicitly programmed
Learning = progressively improving performance on
a specific task
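"Progressively improving performance" can be made concrete with gradient descent, one common learning algorithm (the slides do not name a specific one). In this sketch the program is never told that the data follow y = 2x + 1. It starts from zero and repeatedly nudges its two parameters to reduce the prediction error. The data, learning rate and step count are assumptions chosen for the example.

```python
def mean_squared_error(w, b, xs, ys):
    """Average squared gap between predictions w * x + b and the observed ys."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, learning_rate=0.05, steps=200):
    """Fit y = w * x + b by gradient descent, recording the error after each step."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = [mean_squared_error(w, b, xs, ys)]
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append(mean_squared_error(w, b, xs, ys))
    return w, b, history

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b, history = train(xs, ys)
print(history[0], history[-1], w, b)
```

The `history` list is the learning curve: the error starts high and shrinks as training proceeds, while `w` and `b` approach 2 and 1.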
Machine Learning Models
Characteristics of machine-learning models:
• Automatic
• May be explainable or a black box
• Grounded in computer science
• Most parameters derived automatically using a
machine learning algorithm
• Little manual effort required to build high accuracy
models
ML Models
Many possible datasets → Machine Learning Model (many different models) → Many possible predictions
Datasets: Policyholder Datafiles, Claims Datafiles, Time Series Data, Text Files, Pictures, Videos, Audio
Predictions: Policy Reserves, Price, Fraud / Not Fraud, Risk of Lapsing: High/Medium/Low, Rating from 1-5
Digital Photos
Source: Openframeworks.cc
• Digital Photos are stored as arrays of numbers
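To make "arrays of numbers" concrete, here is a made-up 4x4 greyscale picture written out as nested Python lists. Real photos follow the same idea with far more pixels. The 0 to 255 brightness scale and the (red, green, blue) triple are common conventions, used here as assumptions.

```python
# A tiny 4x4 greyscale "photo": each number is one pixel's brightness (0 = black, 255 = white).
image = [
    [  0,   0, 255, 255],
    [  0,   0, 255, 255],
    [255, 255,   0,   0],
    [255, 255,   0,   0],
]

height, width = len(image), len(image[0])
mean_brightness = sum(sum(row) for row in image) / (height * width)

# Image operations become arithmetic on the array, e.g. inverting the picture:
inverted = [[255 - pixel for pixel in row] for row in image]

# A colour photo typically stores one (red, green, blue) triple per pixel instead of one number.
red_pixel = (255, 0, 0)
print(height, width, mean_brightness, inverted[0], red_pixel)
```

Once a picture is an array, it can be fed to a model in the same way as any other table of numbers.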
Digital Audio Files
Source: ch.mathworks.com
• Digital Audio files are stored as a time series of arrays
• Each array contains information on pitch and loudness
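As a rough sketch of this idea, the code below builds a synthetic one-second tone as a list of amplitude samples, cuts it into short frames, and derives a pitch and a loudness figure for each frame. The sample rate, frame size, 440 Hz tone and the zero-crossing pitch estimate are all assumptions for illustration. Real audio tools typically use more robust frequency analysis.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
FRAME_SIZE = 800     # samples per frame, i.e. 0.1 seconds

# Synthetic one-second recording: a 440 Hz tone at half of full volume.
samples = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

def describe_frame(frame):
    """Return (estimated pitch in Hz, loudness as root-mean-square amplitude)."""
    loudness = math.sqrt(sum(s * s for s in frame) / len(frame))
    # A pure tone crosses zero twice per cycle, so count sign changes.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    pitch = crossings * SAMPLE_RATE / (2 * len(frame))
    return pitch, loudness

frames = [samples[i:i + FRAME_SIZE] for i in range(0, len(samples), FRAME_SIZE)]
features = [describe_frame(f) for f in frames]
print(features[0])
```

Each entry in `features` is a small array describing one slice of time, so the recording becomes a time series that a model can consume. The pitch estimate is approximate and lands within a few hertz of 440.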
Digital Text
Source: ch.mathworks.com
• Can be converted to vectors of numbers
• GloVe
• Word2Vec
• Word Embeddings
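The simplest way to see text become numbers is a bag-of-words count, sketched below with invented sentences. This is deliberately not GloVe or Word2Vec: those methods learn dense vectors from large text collections, whereas this example only counts words. The principle is the same in both cases: once text is a vector, similarity becomes arithmetic.

```python
import math

documents = [
    "claim approved after review",
    "claim rejected after fraud review",
    "premium payment received",
]

# Build a fixed vocabulary: one vector position per distinct word.
vocabulary = sorted({word for doc in documents for word in doc.split()})

def to_vector(text):
    """Bag-of-words: count how often each vocabulary word appears in the text."""
    words = text.split()
    return [words.count(term) for term in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = nothing shared."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vectors = [to_vector(doc) for doc in documents]
print(vocabulary)
print(vectors[0])
print(cosine_similarity(vectors[0], vectors[1]), cosine_similarity(vectors[0], vectors[2]))
```

The two claim sentences share several words and score as similar, while the premium sentence shares none with the first and scores zero.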
General Examples of Predictive Models
Self-Driving Cars
Speech-to-text
Recommender Systems
Game Playing
Reducing Electricity Costs
Machine translation
Chatbots
Text-to-Speech
Fraud Detection
Credit Risk
Pricing
Customer Retention
Proxy Models
Sales Forecasting
Anti-Money Laundering
Call-Centre Routing
Sentiment Analysis
Geographic Analysis
Analysing Satellite Photos
Reading X-rays
Example: Machine Translation as Predictive Model
“Je Suis” → Predictive Model → “I am”
• The model tries to predict what words a human translator would use
Example: Captioning
Red dress with White Spots and Black Belt
Red sweater with white stripes on arms and
Gingerbread man with Christmas Hat
Train Model
• The model takes the picture and predicts what the caption should be
![Page 69: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/69.jpg)
69
Example: Self-Driving Cars
Source: https://clipartxtras.com/
Good Driving
Bad Driving
Train Model
Model predicts what a good driver would do in the current circumstances
![Page 70: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/70.jpg)
70
Example: Fraud Detection
Claim isn’t Fraudulent
Claim is Fraudulent
Train Model
The model will predict whether each incoming claim is fraudulent or non-fraudulent
![Page 72: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/72.jpg)
72
• What is Data Science?
• Why has it Grown So Quickly?
• Opportunities and Threats
• Open Source vs Closed Source
• Buzzwords
• Example: Machine Learning Model
• Practical Examples
Demystifying Data Science
![Page 73: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/73.jpg)
73
Practical Example: Traditional Modelling and Machine Learning
![Page 74: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/74.jpg)
74
How much is a 1000 square foot house?
Eyeball approach:
Around €90k
![Page 75: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/75.jpg)
75
Linear Regression Predictive Model
• Linear Regression Model:
• Price = €101,955
• Slope = 108
• Intercept = -5,700
• MSE = 258 million
• But how do you find the slope and intercept?
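As a quick check of how the figures on this slide fit together, the sketch below evaluates the fitted line at 1,000 square feet and defines the MSE. The function names are illustrative. Note that the rounded slope and intercept shown give €102,300, close to but not exactly the €101,955 quoted, presumably because the slide rounds the slope. An MSE of 258 million corresponds to a typical error of about €16,000 (its square root).

```python
def predict_price(size_sq_ft, slope=108.0, intercept=-5700.0):
    """Predicted price in euro from the fitted line: price = slope * size + intercept."""
    return slope * size_sq_ft + intercept

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted prices."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Using the rounded coefficients from the slide.
price = predict_price(1000)
```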
![Page 76: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/76.jpg)
76
Approach 1: Normal Equation
![Page 77: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/77.jpg)
77
Linear Regression Predictive Model
Linear Regression Model:
• Price = €101,955
• Slope = 108
• Intercept = -5,700
• MSE = 258 million
![Page 78: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/78.jpg)
78
Approach 1: Normal Equation
Problem with normal equation:
• Only works if $X^{T}X$ is invertible
• Doesn’t work on other models
• Doesn’t work well on large datasets
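The normal equation, $\theta = (X^{T}X)^{-1}X^{T}y$, can be sketched in plain Python for the one-feature case, where $X^{T}X$ is only a 2x2 matrix. The data below is hypothetical (it is not the presenter's house price dataset), and the invertibility check mirrors the first problem listed above.

```python
def normal_equation(sizes, prices):
    """Solve theta = (X^T X)^{-1} X^T y for the model price = intercept + slope * size."""
    n = len(sizes)
    # X has a column of ones (intercept) and a column of sizes, so X^T X is 2x2.
    s_x = sum(sizes)
    s_xx = sum(x * x for x in sizes)
    s_y = sum(prices)
    s_xy = sum(x * y for x, y in zip(sizes, prices))
    det = n * s_xx - s_x * s_x  # determinant of X^T X
    if det == 0:
        raise ValueError("X^T X is not invertible (e.g. all sizes identical)")
    # Apply the explicit inverse of [[n, s_x], [s_x, s_xx]] to X^T y = [s_y, s_xy].
    intercept = (s_xx * s_y - s_x * s_xy) / det
    slope = (n * s_xy - s_x * s_y) / det
    return intercept, slope

# Hypothetical data lying exactly on price = 100 * size - 5000.
sizes = [500.0, 800.0, 1000.0, 1500.0, 2000.0]
prices = [100.0 * x - 5000.0 for x in sizes]
intercept, slope = normal_equation(sizes, prices)
```

The answer comes out in a single calculation, but only because the model is linear in its parameters. For many features the matrix inversion becomes the expensive step.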
![Page 79: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/79.jpg)
79
Approach 2: Gridsearch
![Page 80: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/80.jpg)
80
Approach 2: Gridsearch
![Page 81: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/81.jpg)
81
Approach 2: Gridsearch
![Page 82: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/82.jpg)
82
Approach 2: Gridsearch
![Page 83: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/83.jpg)
83
Approach 2: Gridsearch
• Problem with gridsearch: Very inefficient
• Only works for models with a handful of parameters
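A minimal sketch of grid search for the same two-parameter line, assuming hypothetical data and an arbitrarily chosen grid. Every combination of intercept and slope is scored by MSE and the best one is kept.

```python
def mse(intercept, slope, sizes, prices):
    """Mean squared error of the line price = intercept + slope * size."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(sizes, prices)) / len(sizes)

def grid_search(sizes, prices, intercepts, slopes):
    """Try every (intercept, slope) pair on the grid and keep the lowest-MSE one."""
    best = None
    for b in intercepts:
        for m in slopes:
            err = mse(b, m, sizes, prices)
            if best is None or err < best[0]:
                best = (err, b, m)
    return best

# Hypothetical data lying exactly on price = 100 * size - 5000.
sizes = [500.0, 800.0, 1000.0, 1500.0, 2000.0]
prices = [100.0 * x - 5000.0 for x in sizes]
intercepts = [-10000.0 + 500.0 * i for i in range(41)]  # -10,000 to 10,000
slopes = [50.0 + 2.0 * j for j in range(51)]            # 50 to 150
best_mse, best_intercept, best_slope = grid_search(sizes, prices, intercepts, slopes)
evaluations = len(intercepts) * len(slopes)
```

Two parameters already need 41 x 51 = 2,091 evaluations here. With 41 grid points per parameter, a model with k parameters would need 41^k evaluations, which is why the approach only suits models with a handful of parameters. The search also only finds the true optimum if it happens to lie on the grid.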
![Page 84: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/84.jpg)
84
Approach 3: Stochastic Gradient Descent
1. You don’t know the slope and intercept, so randomly choose them
2. Therefore you start at a random point
3. Calculate the slope of the MSE loss surface at that point
4. Take a step downhill
5. Repeat 3 and 4 until you reach the lowest point on the loss surface
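The five steps above can be sketched in plain Python as follows. This is not the presenter's code (the slide's code is an image and is not reproduced in the transcript); the data, learning rate and number of passes are assumptions chosen for illustration. Sizes and prices are expressed in thousands so that a single learning rate works for both parameters.

```python
import random

def sgd_linear(xs, ys, lr=0.05, epochs=2000, seed=0):
    """Fit y = intercept + slope * x by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    # Steps 1-2: start from a randomly chosen slope and intercept.
    intercept, slope = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    order = list(range(len(xs)))
    # Step 5: repeat steps 3 and 4 (here for a fixed number of passes over the data).
    for _ in range(epochs):
        rng.shuffle(order)  # visit the data points in a fresh random order
        for i in order:
            error = intercept + slope * xs[i] - ys[i]
            # Step 3: gradient of the squared error for this single data point.
            grad_intercept = 2.0 * error
            grad_slope = 2.0 * error * xs[i]
            # Step 4: take a small step downhill.
            intercept -= lr * grad_intercept
            slope -= lr * grad_slope
    return intercept, slope

# Hypothetical data: size in thousands of square feet, price in thousands of euro.
xs = [0.5, 0.8, 1.0, 1.5, 2.0]
ys = [100.0 * x - 5.0 for x in xs]
intercept, slope = sgd_linear(xs, ys)
```

Because the toy data lies exactly on a line, the estimates settle on the same slope and intercept a closed-form solution would give. On noisy data, a fixed learning rate would leave them hovering near the optimum rather than exactly on it.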
![Page 85: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/85.jpg)
85
Approach 3: Stochastic Gradient Descent
SGD gives exact same answer as Normal Equation in this example
![Page 86: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/86.jpg)
86
SGD: Python Code
![Page 87: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/87.jpg)
87
Approach 3: Stochastic Gradient Descent
![Page 88: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/88.jpg)
88
SGD: Cubic Polynomial
![Page 89: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/89.jpg)
89
SGD: Cubic Polynomial
![Page 90: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/90.jpg)
90
SGD: Exponential Model
![Page 91: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/91.jpg)
91
SGD: Exponential Curve
![Page 92: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/92.jpg)
92
SGD: Exponential Plus Cubic Model
![Page 93: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/93.jpg)
93
SGD: Exponential Plus Cubic Model
![Page 94: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/94.jpg)
94
SGD: Sine Regression
![Page 95: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/95.jpg)
95
SGD: Python Code
![Page 96: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/96.jpg)
96
SGD: Mathematical Background
![Page 97: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/97.jpg)
97
SGD: Python Code
![Page 98: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/98.jpg)
98
Benefits of SGD
• It is straightforward to calibrate predictive models
• You can build models with thousands of parameters
• Can work on huge data sets
• Can achieve human-level accuracy
• You can build models for all different types of data
• Pictures
• Videos
• Audio
• Text
• Policyholder datafiles
![Page 99: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/99.jpg)
99
Benefits of SGD
• It works very well in practice
• You can choose models which are a good fit to the data
• Rather than choosing models which you are able to fit to the data
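To illustrate fitting a model that has no normal-equation shortcut, the sketch below fits an exponential curve y = a * exp(b * x) by following the gradient of the MSE. For simplicity it uses plain full-batch gradient descent (every data point in each step) rather than the stochastic variant, and the data, starting point and learning rate are assumptions made for illustration.

```python
import math

def fit_exponential(xs, ys, lr=0.005, steps=20000):
    """Fit y = a * exp(b * x) by full-batch gradient descent on the MSE."""
    a, b = 1.0, 0.0  # fixed starting point so the run is reproducible
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            error = a * e - y
            grad_a += 2.0 * error * e / n          # d(MSE)/da
            grad_b += 2.0 * error * a * x * e / n  # d(MSE)/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Hypothetical noise-free data generated from a = 2, b = 0.5.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
```

Only the two gradient lines depend on the model's shape, so swapping in a cubic or sine model means changing the prediction and its derivatives while the loop stays the same.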
![Page 100: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/100.jpg)
100
Machine Learning Models
![Page 101: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/101.jpg)
101
Neural Network Models
![Page 102: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/102.jpg)
102
Machine Learning Models in Scikit-Learn
![Page 103: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/103.jpg)
103
• What is Data Science?
• Why has it Grown So Quickly?
• Opportunities and Threats
• Open Source vs Closed Source
• Buzzwords
• Example: Machine Learning Model
• Practical Examples
Demystifying Data Science
![Page 104: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/104.jpg)
• Big Data
More Data
More Computing Power
More Analysis
• Computers in Actuarial Work
• A Word on Terminology
• Association Rule Mining
• Unsupervised Learning
Practical Examples – Getting started
![Page 105: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/105.jpg)
• Mainframe Systems
• Valuation Software
• Spreadsheets
• A precise answer…
• ...given assumptions
• Computers may be able to ‘solve’ problems
• Or at least give valuable insights
The role of Computers in Actuarial Work
![Page 106: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/106.jpg)
• Proved in 1976
• First major theorem proved by computer
Example 1 - Four Colour problem solved
![Page 107: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/107.jpg)
• $x^n + y^n = z^n$
• Verified by computer for all prime exponents up to 4,000,000
Example 2 - Fermat’s Last Theorem solved (almost)
![Page 108: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/108.jpg)
• Results always need to be interpreted!
http://tylervigen.com/spurious-correlations
Correlation and Causation!
![Page 109: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/109.jpg)
• Actuaries didn’t get here first!
• P = A / ä
Periodic Policy Amount =
Bounded Risk Benefit /
Contribution Vector
• Terminology not intuitive...
• ...concepts are
A word on Terminology
![Page 110: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/110.jpg)
This presentation
• Association Rule Mining (Amazon, Tesco)
• Unsupervised Learning
Letting the data tell its own story
Next presentation
• Supervised Learning
Where we propose a model
Final presentation
• Deep Learning (Neural Nets)
What we’re looking to cover
![Page 111: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/111.jpg)
• Purchasing datasets
Association Rule Mining 1
Bread Milk Eggs ... Yoghurt Tuna Fruit
Customer 1 x
Customer 2 x x x
Customer 3 x x
::
x
Customer n x
• Very very sparse
• Think of Amazon
![Page 112: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/112.jpg)
• Of interest: which items occur together?
• As a purchasing dataset will have very sparse data, ideas will be illustrated by a medical dataset
• 240 Patients
• 6 Symptoms
Association Rule Mining 2
![Page 113: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/113.jpg)
• Illustrative dataset
Association Rule Mining Dataset
Symptoms
1 2 3 4 5 6
Patient 1 x
Patient 2 x x
Patient 3 x x x
::
Patient 240 x x
Total 19 157 55 85 58 181
• Less sparse
![Page 114: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/114.jpg)
• Which symptoms occur together?
• Three key concepts...
For symptoms A & B
1) Support = P(A ⋂ B) = P(A,B)
2) Confidence = P(B|A) = P(A,B) / P(A)
3) Lift = P(A,B) / [P(A).P(B)]
Association Rule Mining Investigation
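The three measures can be computed directly from counts. The sketch below uses a small hypothetical set of patient records (not the 240-patient dataset from the slides) and evaluates the rule "symptom 2 implies symptom 6".

```python
def rule_metrics(records, a, b):
    """Return (support, confidence, lift) for the rule a -> b."""
    n = len(records)
    p_a = sum(1 for r in records if a in r) / n
    p_b = sum(1 for r in records if b in r) / n
    p_ab = sum(1 for r in records if a in r and b in r) / n
    support = p_ab                # P(A,B)
    confidence = p_ab / p_a       # P(B|A)
    lift = p_ab / (p_a * p_b)     # P(A,B) / [P(A).P(B)]
    return support, confidence, lift

# Hypothetical records: each set lists the symptoms one patient reported.
patients = [
    {1, 2}, {2, 6}, {2, 4, 6}, {6}, {2, 6}, {3, 5}, {2, 6}, {4},
]
support, confidence, lift = rule_metrics(patients, 2, 6)
```

Here support is 0.5, confidence is 0.8 and lift is 1.28. A lift above 1 means the two symptoms appear together more often than they would if they were independent.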
![Page 115: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/115.jpg)
Association Rule Mining Result 1
![Page 116: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/116.jpg)
Association Rule Mining Result 2
![Page 117: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/117.jpg)
• Concepts are not difficult
• Terminology and visualisation can be confusing at first
• Basic analysis can be enhanced by adding bounds and standardising results
• Very sophisticated algorithms can be developed but speed is an issue
Association Rule Summary
![Page 118: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/118.jpg)
Unsupervised Learning
No y value, Multiple x values
Supervised Learning
We do have a y value & multiple x values
What we’re looking to cover, a reminder
![Page 119: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/119.jpg)
• Old Faithful Geyser
• 272 data points on Waiting & Eruption Times
Unsupervised Learning 1
![Page 120: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/120.jpg)
• Old Faithful Geyser
• 272 data points on Waiting & Eruption Times
Unsupervised Learning 2
![Page 121: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/121.jpg)
Unsupervised Learning 3
![Page 122: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/122.jpg)
Unsupervised Learning 4
![Page 123: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/123.jpg)
Unsupervised Learning 5
‘Elbow’
![Page 124: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/124.jpg)
Unsupervised Learning 6
• Resulting Segmentation
• Can be exploratory or detective
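The slides do not name the clustering algorithm. The sketch below assumes k-means, since an 'Elbow' plot is commonly used with it to choose the number of groups. The eight (eruption, waiting) points are made up to resemble two clumps and are not the Old Faithful data. A deterministic farthest-point start is used here in place of the usual random start so the result is reproducible.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's algorithm; returns (centroids, labels, within-cluster sum of squares)."""
    # Farthest-point initialisation: each new centroid is the point furthest from the others.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    wcss = sum(dist2(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, wcss

# Hypothetical (eruption minutes, waiting minutes) observations.
points = [(1.8, 54.0), (2.0, 51.0), (1.9, 56.0), (2.2, 58.0),
          (4.3, 79.0), (4.5, 85.0), (4.1, 80.0), (4.6, 82.0)]
centroids, labels, _ = kmeans(points, 2)
# Elbow check: within-cluster sum of squares for k = 1, 2, 3.
wcss = {k: kmeans(points, k)[2] for k in (1, 2, 3)}
```

The within-cluster sum of squares drops sharply from one group to two and only slightly from two to three, which is the 'elbow' suggesting two groups. In practice the variables would usually be standardised first so that one scale does not dominate the distances.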
![Page 125: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/125.jpg)
Another Grouping (Clustering) Example 1
![Page 126: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/126.jpg)
Another Grouping (Clustering) Example 2
![Page 127: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/127.jpg)
Another Grouping (Clustering) Example 3
![Page 128: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/128.jpg)
Another Grouping (Clustering) Example 4
• Accuracy 88%
• ‘First pass’ result
• Readily implementable
• Methodology generalisable to n dimensions
• Where could this give more insight?
– Segmentation (Distribution Channel)
– Any homogeneous group selection
– Deconstructing portfolios
– Model point building
– Outlier identification (Fraud etc.)
– Trend analysis
![Page 130: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/130.jpg)
Deconstructing Trend Analysis 2
• Constructed dataset
• 6 x 100 sub-series
![Page 131: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/131.jpg)
Deconstructing Trend Analysis 3
![Page 132: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/132.jpg)
Deconstructing Trend Analysis 4
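Actual Group (rows) vs Predicted Group (columns)

| Actual \ Predicted | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 97 | 3 | 0 | 0 | 0 | 0 |
| 2 | 1 | 99 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 81 | 0 | 19 | 0 |
| 4 | 0 | 0 | 0 | 63 | 0 | 37 |
| 5 | 0 | 0 | 16 | 0 | 84 | 0 |
| 6 | 0 | 0 | 0 | 1 | 0 | 99 |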
• Accuracy 87%!
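The 87% figure can be recovered from the matrix on this slide: accuracy is the sum of the diagonal (correctly grouped sub-series) divided by the total. The sketch below reads rows as actual groups and columns as predicted groups, which is consistent with each row summing to 100.

```python
# Rows: actual group 1-6; columns: predicted group 1-6 (values from the slide).
confusion = [
    [97, 3, 0, 0, 0, 0],
    [1, 99, 0, 0, 0, 0],
    [0, 0, 81, 0, 19, 0],
    [0, 0, 0, 63, 0, 37],
    [0, 0, 16, 0, 84, 0],
    [0, 0, 0, 1, 0, 99],
]

correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total
# Share of each actual group that was assigned to the right predicted group.
per_group = [row[i] / sum(row) for i, row in enumerate(confusion)]
```

This gives 523 correct out of 600. The per-group shares show that most of the errors come from groups 3 and 5 being confused with each other and from group 4 being assigned to group 6.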
![Page 133: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/133.jpg)
Deconstructing Trend Analysis 5
• Accuracy 87%!!!
• Where could this give more insight?
– Claim rates
– Seasonal / Selection Effects
– Investment performance analysis
– Stochastic model analysis
– Trend analysis
![Page 134: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/134.jpg)
Unsupervised Learning Summary
• Can help identify patterns in data
• Can help identify homogeneous groups
• Using computer power
• Relatively unsophisticated
• Possible to get answers quickly
• Perfect insight not possible
• Improved understanding may result
![Page 135: Demystifying Data Science - Society of Actuaries in Ireland · 2018-10-13 · Demystifying Data Science. 34 Python and R •Python is a high level, general purpose programming language](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45abd18d4b923cf72a927/html5/thumbnails/135.jpg)
135
• What is Data Science?
• Why has it Grown So Quickly?
• Opportunities and Threats
• Open Source vs Closed Source
• Buzzwords
• Example: Machine Learning Model
• Practical Examples
Any Questions?