
WHAT IS DATA SCIENCE?

Timo Aho // Data Scientist, PhD // timo.aho@solita.fi // Twitter: @ahotimom

Data Science Tampere Meetup 29.9.2015

THIS IS SOLITA

› Turnover 2014: 38.6 million euros
› Over 340 professionals
› Over 18 years
› Working in 3 offices
› Over 1000 projects
› Over 97 % customer satisfaction
› Ranking 6th in Great Place to Work in Finland
› Ranking 43rd in European Best Workplaces

Strategic planning

Pre-studies

Road maps

Service concepts

Service design

Visual design

User experience design

Usability design

Architecture design

Solution implementation

Continuous services

Hosting services

Phases: Understand & concept / Pilot & implement / Maintain & develop

DIGITAL BUSINESS SOLUTIONS

› Online and ecommerce
› Information management and big data
› Utilization and visualization of information
› Software development
› Predictive analytics
› Digital strategy and transformation
› Public sector online services
› Business planning and management
› Integration services

OUR CUSTOMERS

› Retail
› Services
› Public

OPEN FINLAND CHALLENGE

› An open data contest where you can win prizes!

› Solita is offering a challenge on predictive traffic analytics

› See more: http://openfinlandchallenge.fi/

› The site is unfortunately mostly in Finnish

AGENDA

1. Data science vs Big data

2. Use case examples

3. Data science process

4. Data science methods

1. DATA SCIENCE VS BIG DATA

BIG DATA CONCEPTS

Big data

Data analysis

Data science

Knowledge discovery in databases (KDD)

Data mining

Machine learning

High Volume

Predictive analytics

High Velocity

High Variety

Cloud computation

NoSQL

Cloud storage

Hadoop MapReduce

Batch vs. Real time

Structured

Unstructured

Semi-structured

Spark

Internet of things

Sensory data

Business analytics

Business intelligence

DEFINITION FOR BIG DATA?

› Narrow:

• Infrastructure for processing exceptionally large or rapidly produced data

› Broad:

• All data storage, processing and analysis

• (Does not necessarily fit into computer memory)
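
The point that the data does not necessarily fit into computer memory can be illustrated with a minimal Python sketch: a statistic is updated one value at a time, so only a counter and the current mean are held in memory. The `sensor_readings` generator is a hypothetical stand-in for a large file or a stream; it is not from the talk.

```python
from typing import Iterable, Iterator


def running_mean(values: Iterable[float]) -> float:
    """Compute the mean in a single pass without storing the values."""
    count = 0
    mean = 0.0
    for x in values:
        count += 1
        # Incremental update: move the mean towards x by 1/count of the gap.
        mean += (x - mean) / count
    if count == 0:
        raise ValueError("running_mean() needs at least one value")
    return mean


def sensor_readings(n: int) -> Iterator[float]:
    """Hypothetical data source that yields one reading at a time."""
    for i in range(n):
        yield float(i % 10)


if __name__ == "__main__":
    # The generator never materializes the full data set in memory.
    print(running_mean(sensor_readings(1_000_000)))
```

The same pattern applies to reading a large file line by line; distributed tools such as Hadoop or Spark spread this kind of work across many machines.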

2. USE CASE EXAMPLES

CASE

SANOMA OYJ

A personalized user experience on the most popular web services by analyzing 200 million new events daily

CASE: DIGITAL SERVICE PROVIDER

› Predicting:

• Customer churn

• Cross-selling

› The information is available in all customer contacts

• When the customer contacts support

• When marketing contacts the customer

• When meeting in shops, on the phone, on the web

CASE: RETAIL / SERVICE PROVIDER

› Customers act in waves: service demand is high for a couple of weeks at a time

› Analysis

• Segment customers according to behavior

• Predict customer action timing and high demand times

• Influence customers so that the demand level stays steadier, with no peaks


3. DATA SCIENCE PROCESS

DATA ANALYSIS PROCESS

Source: CRISP-DM, Image: Wikipedia

[Figure: CRISP-DM process diagram. Percentage labels on the slide: 50-70%, 10-20%, 20-30%, 10-20%, 10-20%, 5-10%]

WHAT DOES A DATA SCIENTIST DO?

Two examples, followed through each step of the work:

› Business goals
• Ex. 1: Reduce churn
• Ex. 2: Increase manufacturing quality

› Discovering available data
• Ex. 1: Leaving customers, billing, contracts, contacts, service quality, demography, failures
• Ex. 2: Raw materials, machine parameters, manufacturing sensors, end-product quality measurements, failures

› Data preprocessing
• Database connections
• Abnormal data forms
• Bringing to matrix form
• Cleaning or highlighting outliers and exceptions
• Handling missing information

› Analytics modeling
• Ex. 1: Training: 80 variables per leaving customer, three times as many current customers
• Ex. 2: Training: tens of starting, intermediate and ending variables

› Analytics result
• Ex. 1: Churn prediction for each customer
• Ex. 2: Prediction of the optimal parameter values for quality

› Information exploitation
• Ex. 1: Getting the predictions into the databases of the source systems
• Ex. 2: Hint for good parameter values, indication if suboptimal ones are selected

› Service layer
• Ex. 1: Optimizing communication to prevent churn. Customer service sees the churn prediction for the current customer.
• Ex. 2: The process controller either uses the recommended parameter values or tunes them. Also creates real-time reports on process quality.
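
The data preprocessing step (bringing records to matrix form, handling missing information, highlighting outliers) could look roughly like the Python sketch below. The customer records, the feature names and the choice of median imputation and a median-based outlier rule are all illustrative assumptions, not the method used in the cases above.

```python
from statistics import median

# Hypothetical customer records with missing fields and one odd value.
RECORDS = [
    {"monthly_bill": 29.9, "support_contacts": 1, "contract_months": 24},
    {"monthly_bill": None, "support_contacts": 4, "contract_months": 12},
    {"monthly_bill": 31.5, "support_contacts": 0},
    {"monthly_bill": 310.0, "support_contacts": 2, "contract_months": 36},
    {"monthly_bill": 28.5, "support_contacts": 1, "contract_months": 24},
]
FEATURES = ["monthly_bill", "support_contacts", "contract_months"]


def to_matrix(records, features):
    """One row per record, one column per feature; absent fields become None."""
    return [[rec.get(name) for name in features] for rec in records]


def impute_median(matrix):
    """Replace each None with the median of the observed values in its column."""
    n_cols = len(matrix[0])
    medians = [
        median(row[j] for row in matrix if row[j] is not None)
        for j in range(n_cols)
    ]
    return [
        [medians[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in matrix
    ]


def flag_outliers(matrix, column, threshold=3.5):
    """Flag values whose modified z-score (based on the median absolute
    deviation) exceeds the threshold. The values are highlighted, not removed."""
    values = [row[column] for row in matrix]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


if __name__ == "__main__":
    matrix = impute_median(to_matrix(RECORDS, FEATURES))
    for row, is_outlier in zip(matrix, flag_outliers(matrix, column=0)):
        print(row, "<- outlier" if is_outlier else "")
```

Flagging rather than deleting matches the slide's wording "cleaning or highlighting": whether an exceptional value is an error or a signal is often a business question.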

4. DATA SCIENCE METHODS

NATURE OF THE DATA

› Structured

› Semi-structured

› Unstructured

{
  "cod": "200",
  "message": 0.0032,
  "city": {
    "id": 1851632,
    "name": "Shuzenji",
    "coord": {"lon": 138.933334, "lat": 34.966671},
    "country": "JP"
  },
  "cnt": 10,
  "list": [
    {
      "dt": 1406080800,
      "temp": {
        "day": 297.77, "min": 293.52, "max": 297.77,
        "night": 293.52, "eve": 297.77, "morn": 297.77
      },
      "pressure": 925.04,
      "humidity": 76
    }
  ]
}
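
To turn a semi-structured document like the one above into structured rows, the nested fields can be flattened into columns. The Python sketch below is one way to do it; the helper names `flatten` and `forecast_rows` and the dotted column naming are assumptions made for illustration.

```python
import json

# The sample document from the slide, as a single JSON string.
RAW = (
    '{"cod":"200","message":0.0032,"city":{"id":1851632,"name":"Shuzenji",'
    '"coord":{"lon":138.933334,"lat":34.966671},"country":"JP"},"cnt":10,'
    '"list":[{"dt":1406080800,"temp":{"day":297.77,"min":293.52,"max":297.77,'
    '"night":293.52,"eve":297.77,"morn":297.77},"pressure":925.04,"humidity":76}]}'
)


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys, e.g. 'temp.day'."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def forecast_rows(document):
    """Return one flat row per entry in 'list', repeating the city fields."""
    city = flatten(document["city"], prefix="city.")
    rows = []
    for entry in document["list"]:
        row = dict(city)
        row.update(flatten(entry))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in forecast_rows(json.loads(RAW)):
        print(row)
```

Each resulting row has the same columns, which is the structured form that most analysis tools expect.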

HISTORY OR FUTURE?

› Descriptive
• What happened?
• What is happening?
• Methods: reporting, data warehouses, master data

› Predictive
• What will probably happen?
• Methods: statistical modeling, machine learning

› Prescriptive
• What should be done for optimal outcome?
• Methods: optimizing, machine learning, simulation, real-time analytics

Most organizations are here

WHAT DO ALGORITHMS EAT?

               Feature 1   Feature 2   Feature 3   Feature 4
Data point 1
Data point 2
Data point 3
Data point 4
Data point 5

DESCRIPTIVE MODELING METHODS

› Visualizations

• High dimension?

› Statistical values, dependencies

› Clustering

Source: Wikipedia
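
As an illustration of clustering, here is a minimal k-means sketch in plain Python. The talk does not name a specific algorithm, so k-means, the sample points and the naive initialization (the first k points as starting centroids) are all assumptions chosen to keep the example short and deterministic.

```python
from math import dist


def k_means(points, k, max_iterations=100):
    """Group points into k clusters; returns (centroids, assignment)."""
    centroids = list(points[:k])  # naive, deterministic initialization
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignment


# Hypothetical data points with two features each.
POINTS = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]

if __name__ == "__main__":
    centroids, assignment = k_means(POINTS, k=2)
    print(centroids)
    print(assignment)
```

No target feature is needed here: the algorithm only looks for structure among the features themselves, which is what makes the method descriptive.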

WHAT DO ALGORITHMS EAT?

               Feature 1   Feature 2   Feature 3   Target feature
Data point 1
Data point 2
Data point 3
Data point 4
Data point 5

PREDICTIVE MODELING METHODS

› Regression

› Classification
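
The difference between the two predictive tasks can be sketched in a few lines of Python: regression predicts a numeric target feature, classification predicts a label. The methods (least-squares line, k-nearest neighbours), the data and the labels "stays" and "leaves" are illustrative assumptions, not taken from the talk.

```python
from collections import Counter
from math import dist


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def knn_classify(train, query, k=3):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (features, target label).
TRAIN = [
    ((1.0, 1.0), "stays"), ((1.2, 0.8), "stays"), ((0.9, 1.1), "stays"),
    ((4.0, 4.2), "leaves"), ((4.1, 3.9), "leaves"), ((3.8, 4.0), "leaves"),
]

if __name__ == "__main__":
    slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print("predicted y at x=5:", slope * 5 + intercept)
    print("predicted class:", knn_classify(TRAIN, (3.9, 4.1)))
```

In both cases the model is trained on rows where the target feature is known and then applied to rows where it is not.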


WHY IS DATA SCIENCE RELEVANT?

› More data available

› A lot of software tools available

• R, Python, Weka, RapidMiner, Tableau, SPSS, SAS

• Hadoop, Spark, NoSQL databases

• Cloud tools

› Business understanding on how to apply?

Twitter @SolitaOy

www.solita.fi

THANK YOU!

TIMO AHO

Data Scientist, PhD

timo.aho@solita.fi