N. Fuhr, U. Duisburg-Essen · Introduction to Information Mining

Information Mining - Introduction

Norbert Fuhr

Dep. Computer Science and Applied Cognitive Science

Information Engineering

Tasks in Data Mining

● Classification: predicting class membership

● Numeric prediction: predicting a numeric value

● Association: determining associations between arbitrary features

● Clustering: grouping objects based on their similarity

Examples of Classification Tasks

● Will it rain tomorrow?

● Will the applicant cause a car crash next year?

● Fighting crime

● Will the customer be able to repay the loan?

● Will the device develop a defect soon?

● Is there a traffic jam?
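Questions like these can all be framed as assigning one of a few class labels to an observation. As a minimal sketch (not a method taken from the slides), the following k-nearest-neighbour classifier predicts "rain" or "dry" from two features. The feature names, the training values, and the choice of k = 3 are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical training data: (humidity, cloud_cover) -> weather on the next day.
TRAINING = [
    ((0.90, 0.80), "rain"),
    ((0.85, 0.90), "rain"),
    ((0.80, 0.70), "rain"),
    ((0.30, 0.20), "dry"),
    ((0.40, 0.10), "dry"),
    ((0.35, 0.30), "dry"),
]


def classify(features, training=TRAINING, k=3):
    """Predict a class label by majority vote among the k nearest examples."""
    by_distance = sorted(training, key=lambda ex: math.dist(features, ex[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(classify((0.88, 0.75)))  # a humid, cloudy day
print(classify((0.32, 0.25)))  # a dry, clear day
```

The same function could answer the other yes/no questions once suitable features and labelled examples are available.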

Classification: Spam Filtering

Spam detection software, running on the system "martini.is.inf.uni-due.de", has identified this incoming email as possible spam.

Content analysis details: (8.6 points, 6.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.0 DATE_IN_PAST_12_24     Date: is 12 to 24 hours before Received: date
 0.4 URI_HEX                URI: URI hostname has long hexadecimal sequence
 1.6 HTML_IMAGE_ONLY_28     BODY: HTML: images with 2400-2800 bytes of words
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5000]
 1.5 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                            [URIs: conferencebrain.net]
 1.5 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist
                            [URIs: conferencebrain.net]
 2.0 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: conferencebrain.net]
 1.5 URIBL_SBL              Contains an URL listed in the SBL blocklist
                            [URIs: conferencebrain.net]
-0.8 AWL                    AWL: From: address is in the auto white-list
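The report above suggests a simple additive classifier: each rule that fires contributes its points, and the message is flagged once the total reaches the required score. The sketch below reproduces that arithmetic with the scores as displayed in the report. The function names are hypothetical, and treating "required" as an inclusive threshold is an assumption. Summing the displayed values gives about 8.7 rather than the reported 8.6, presumably because the report rounds each rule score for display.

```python
# Rule scores copied from the report above (as displayed, one decimal place).
RULE_SCORES = {
    "DATE_IN_PAST_12_24": 1.0,
    "URI_HEX": 0.4,
    "HTML_IMAGE_ONLY_28": 1.6,
    "HTML_MESSAGE": 0.0,
    "BAYES_50": 0.0,
    "URIBL_WS_SURBL": 1.5,
    "URIBL_JP_SURBL": 1.5,
    "URIBL_BLACK": 2.0,
    "URIBL_SBL": 1.5,
    "AWL": -0.8,
}

REQUIRED_SCORE = 6.0  # "6.0 required" in the report


def total_score(scores):
    """Add up the points of all rules that fired."""
    return sum(scores.values())


def is_spam(scores, required=REQUIRED_SCORE):
    """Flag the message when the total reaches the threshold (assumed inclusive)."""
    return total_score(scores) >= required


print(round(total_score(RULE_SCORES), 1), is_spam(RULE_SCORES))
```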

Classification: Learning to Rank in Web Search

Examples of Numeric Prediction

● How many rolls will be sold tomorrow?

● How many visitors will need a hotel room in our city on xx.xx?

● How many travellers will want to fly from A to B on xx.xx?
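In contrast to classification, the target here is a number. One of the simplest approaches is a least-squares line through past observations. The sketch below is illustrative only: the predictor (forecast temperature) and all data values are hypothetical, and the slides do not prescribe this particular model.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: forecast temperature in °C and rolls sold on that day.
temperatures = [10, 15, 20, 25]
rolls_sold = [120, 140, 160, 180]

slope, intercept = fit_line(temperatures, rolls_sold)
predicted = slope * 18 + intercept  # expected sales for a forecast of 18 °C
print(slope, intercept, predicted)
```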

Personality Prediction Based on Facebook Likes

Examples of Associations

● Shopping cart analysis: men who buy diapers often also buy beer

● Analysis of transactions of credit cards, customer cards, Payback cards
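An association such as "diapers → beer" is usually quantified by its support (the share of all transactions containing both items) and its confidence (the share of diaper transactions that also contain beer). The sketch below computes both on a handful of hypothetical baskets; the data and function names are made up for illustration.

```python
# Hypothetical shopping baskets, one set of items per transaction.
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets=BASKETS):
    """Fraction of baskets that contain every item of the itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets=BASKETS):
    """Estimate of P(consequent | antecedent) from the baskets."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


print(support({"diapers", "beer"}))       # share of baskets with both items
print(confidence({"diapers"}, {"beer"}))  # share of diaper baskets with beer
```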

Clustering example
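The original slide shows its example as a figure that is not part of this transcript. As a stand-in, here is a minimal one-dimensional k-means sketch, one common way to group objects by similarity. The data points, the starting centroids, and the fixed iteration count are assumptions, and this is not necessarily the method shown on the slide.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical measurements forming two obvious groups.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 9.0])
print(centroids, clusters)
```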

Graph Mining

● Chemistry

● CAD

● Analysis of program code

● Social networks

● Web analytics

● Games

● Geology

● ...

Sequence Mining

● Shopping

● User interactions

● System logs

● DNA sequences

SID  sequence
10   <a(abc)(ac)d(cf)>
20   <(ad)c(bc)(ae)>
30   <(ef)(ab)(df)cb>
40   <eg(af)cbc>
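In this notation, each sequence is an ordered list of itemsets, and parentheses group items that occur together. A basic operation in sequence mining is counting how many sequences contain a given pattern as a subsequence (its support). The sketch below encodes the four sequences above as lists of sets; the function names and the two example patterns are chosen for illustration.

```python
# The four sequences from the table, keyed by sequence ID (SID).
DATABASE = {
    10: [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    20: [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    30: [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    40: [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
}


def contains(sequence, pattern):
    """True if every itemset of the pattern is a subset of some element of
    the sequence, with the matched elements appearing in the same order."""
    pos = 0
    for itemset in pattern:
        # Greedily take the earliest element that covers this itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(pattern, database=DATABASE):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for seq in database.values() if contains(seq, pattern))


print(support([{"a", "b"}, {"c"}]))  # pattern <(ab)c>
print(support([{"a"}, {"c"}]))       # pattern <ac>
```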

Process Mining

Summary

● Data mining applications are found mainly in the trade and service sectors, but increasingly in other areas as well

● Good predictions lead to better utilisation of limited resources (staff, capital, hotel rooms, planes, ...) and improve cost-effectiveness

● Advanced applications such as Industry 4.0, smart homes, self-driving cars, and big data rely heavily on machine learning and data mining methods