Berendt: Knowledge and the Web, 2015, http://www.cs.kuleuven.be/~berendt/teaching/
Knowledge and the Web –
Inferring new knowledge from data(bases):
Knowledge Discovery in Databases
Bettina Berendt
KU Leuven, Department of Computer Science
http://people.cs.kuleuven.be/~bettina.berendt/teaching
Last update: 25 November 2015
Where are we?
Agenda
Motivation: application examples
Forms of data analysis and styles of reasoning
The process of knowledge discovery
Description and prediction
Data understanding: two important notes (among other issues)
Types of learning tasks
Classification
Regression
Association-rule mining
Clustering
What should we recommend to a customer/user?
What’s spam and what isn’t?
Classification / prediction: how is that done?
In which weather will someone play (tennis etc.)?
Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No
Classification / prediction: What makes people happy?
“Classification along a numerical scale”: other forms of sentiment analysis
When we don’t know the classes yet, but need to discover them: What “news stories” are there today?
What “circles” of friends do you have?
What “circles” of friends do you have?
Topic detection: What topics exist in a collection of texts, and how do they evolve?
News texts, scientific publications, speeches, …
From your questions to the speakers
These days you hear a lot about Big Data. Nobody seems to have a really good definition for it, though. Do you see linked data as a part of Big Data, or more as something separate?
A note on last week’s remark on the challenges of wrong data “used by machines” vs. “used by people” (1)
A note on last week’s remark on the challenges of wrong data “used by machines” vs. “used by people” (2)
A note on last week’s remark on the challenges of wrong data “used by machines” vs. “used by people” (3)
Forms of data analysis
• Confirmatory
  • Hypothesis testing
  • Experimental procedure, data gathered for this purpose
  • Inferential statistics
  • Causality
• Exploratory
  • Data mining
  • Already-existing data
  • Data mining & machine learning models
  • “Correlation” (in a wide sense)
• Different basic assumptions, different evaluation methodologies, even when they use the same models (e.g. regression)!
Styles of reasoning
• Descriptive vs. predictive
• Deductive vs. inductive inference
• Data mining prediction is always inductive inference!
From your questions
Are there any economic indicators, related to the (country of representation of a) speaker that influence how many speeches are given by a certain country in the European parliament?
Are economically more powerful countries more influential in the European parliament?
Why does Germany have so much influence on European politics or is this a false statement?
Empiricism and apophenia
Empiricism and apophenia: correlation, causation, and instrumentality
“Correlation replaces causation”: Business logic and prediction vs. explanation ...
A related issue: number of data points / From your questions
Does the weather in Finland during the European Parliament elections affect the voting behaviour of the Finnish people?
Berendt: Advanced databases, first semester 2011, http://people.cs.kuleuven.be/~bettina.berendt/teaching
The KDD process: The output
The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data - Fayyad, Piatetsky-Shapiro, Smyth (1996)
non-trivial process: Multiple process
valid: Justified patterns/models
novel: Previously unknown
useful: Can be used
understandable: by human and machine
The process part of knowledge discovery
CRISP-DM
• CRoss Industry Standard Process for Data Mining
• A data mining process model that describes commonly used approaches that expert data miners use to tackle problems.
Knowledge discovery, machine learning, data mining
Knowledge discovery = the whole process
Machine learning: the application of induction algorithms and other algorithms that can be said to “learn” (= the “modeling” phase)
Data mining: sometimes = KD, sometimes = ML
How much time will you actually spend modelling?
Standard data mining algorithms work on single tables
Important Q for data preparation: How to get from an RDF graph to a table?
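One possible answer, sketched here under simplifying assumptions, is to pivot the graph: one row per subject and one column per predicate. The triples, the `ex:` names, and the `triples_to_table` helper below are hypothetical and only illustrate the idea. Joining multi-valued predicates with `|` is just one of several possible choices.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; all names are made up.
triples = [
    ("ex:speech1", "ex:speaker", "ex:alice"),
    ("ex:speech1", "ex:topic", "environment"),
    ("ex:speech2", "ex:speaker", "ex:bob"),
    ("ex:speech2", "ex:topic", "economy"),
    ("ex:speech2", "ex:language", "en"),
]


def triples_to_table(triples):
    """Pivot triples into one row per subject and one column per predicate."""
    rows = defaultdict(dict)
    for s, p, o in triples:
        # A repeated predicate is joined with "|" so each cell holds one value.
        rows[s][p] = o if p not in rows[s] else rows[s][p] + "|" + o
    columns = sorted({p for _, p, _ in triples})
    # Predicates a subject does not have become None (a missing value).
    table = [[s] + [rows[s].get(p) for p in columns] for s in sorted(rows)]
    return ["subject"] + columns, table


header, table = triples_to_table(triples)
```

Cells without a matching triple come out as `None`, so the mining tool that reads the table has to treat them as missing values.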
Descriptive and predictive modelling / learning
Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No
From your questions
Are economically more powerful countries more influential in the European parliament?
...
Economically powerful countries can be based on different factors, including
Gross Domestic Product per Capita
...
A simple descriptive statistic: Correlation
[Figure: four panels, y1, y2, y3 and y4, each plotted over a horizontal axis from 0 to 25]
“Truly numerical data”: Pearson correlation
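The formula on this slide did not survive extraction. As a minimal sketch, assuming the standard sample Pearson coefficient is meant, r is the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name and the example values are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly linear, increasing
r_down = pearson([1, 2, 3], [3, 2, 1])       # perfectly linear, decreasing
```

The coefficient only captures linear association, which is why it is tied to “truly numerical” data here.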
From your questions
Is there a correlation between the countries of the speakers who give speeches about the environment and the countries that have the best environmental policies? (pollution, renewable energy, waste generation, etc.)
Rank data: Spearman’s rank correlation coefficient
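A minimal sketch of how this coefficient can be computed, assuming the usual definition: replace each value by its rank (tied values share the average of their ranks) and take the Pearson correlation of the ranks. The helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A monotone but non-linear relation still gives a coefficient of 1.
rho = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
```

Because only the order of the values enters the computation, the coefficient is applicable to ordinal data as well as to numerical data.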
Unclear to me / From your questions
Is there a correlation between BBC coverage and the topic of the talks given at the European Parliament?
Is there a correlation between the government type of a country and how much its members talk about democracy?
Understand your data (1): Understand your concepts and how your variables measure them
Attributes
Outlook   Temperature  Humidity  Windy  Play
Sunny     85           85        False  No
Sunny     80           90        True   No
Overcast  83           86        False  Yes
Rainy     75           80        False  Yes
…         …            …         …      …
What’s in an attribute?
Each instance is described by a fixed predefined set of features, its “attributes”
But: number of attributes may vary in practice
  Possible solution: “irrelevant value” flag
Related problem: existence of an attribute may depend on value of another one
Possible attribute types (“levels of measurement”, aka “scales of measurement”):
  Nominal, ordinal, interval and ratio
Task: align example measures, scale of measurement, and allowed operations
Example:
  Temperature (Celsius)
  Grades at school/university
  Pass or no pass (exam)
  Metres
  Temperature (“warm”, “cold”, ...)
  Weather (“good”, “bad”)
  Weather (“sunny”, “windy”, “cold crisp day”, ...)
  Likert-scale values (“on a scale of 1-7, ...”)
  Duration of work tasks (in minutes)
  ECTS credits
Scale level:
  Nominal
  Ordinal
  Interval
  Ratio
Operations:
  =, ≠
  <, >
  +, -
  *, /
  %
  mode
  median
  arithmetic mean
  geom. mean
Nominal quantities
Values are distinct symbols
  Values themselves serve only as labels or names
  Nominal comes from the Latin word for name
Example: attribute “outlook” from weather data
  Values: “sunny”, “overcast”, and “rainy”
No relation is implied among nominal values (no ordering or distance measure)
Only equality tests can be performed
Ordinal quantities
Impose order on values
But: no distance between values defined
Example: attribute “temperature” in weather data
  Values: “hot” > “mild” > “cool”
Note: addition and subtraction don’t make sense
Example rule: temperature < hot ⇒ play = yes
Distinction between nominal and ordinal not always clear (e.g. attribute “outlook”)
Interval quantities
Interval quantities are not only ordered but measured in fixed and equal units
Example 1: attribute “temperature” expressed in degrees Fahrenheit
Example 2: attribute “year”
Difference of two values makes sense
Sum or product doesn’t make sense
  Zero point is not defined!
Ratio quantities
Ratio quantities are ones for which the measurement scheme defines a zero point
Example: attribute “distance”
  Distance between an object and itself is zero
Ratio quantities are treated as real numbers
  All mathematical operations are allowed
But: is there an “inherently” defined zero point?
  Answer depends on scientific knowledge (e.g. Fahrenheit knew no lower limit to temperature)
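The four scales can be read as a ladder: each level permits the summary statistics of the level below plus one more. The sketch below illustrates that reading under the usual textbook assignment (mode, median, arithmetic mean, geometric mean). The `ALLOWED` table and the `summarize` helper are hypothetical names, and the explicit `order` argument for ordinal labels is an assumption of this sketch.

```python
import statistics

# Statistics assumed meaningful at each scale of measurement.
ALLOWED = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean", "geometric_mean"],
}


def summarize(values, scale, order=None):
    """Compute only the statistics that make sense for the given scale.

    `order` lists ordinal labels from lowest to highest, e.g.
    ["cool", "mild", "hot"]; numeric values need no explicit order.
    """
    ranked = sorted(values, key=order.index if order else None)
    compute = {
        "mode": lambda: statistics.mode(values),
        # Lower median: always an actual data value, so no arithmetic is needed.
        "median": lambda: ranked[(len(ranked) - 1) // 2],
        "mean": lambda: statistics.fmean(values),
        "geometric_mean": lambda: statistics.geometric_mean(values),
    }
    return {name: compute[name]() for name in ALLOWED[scale]}


outlook = summarize(["sunny", "rainy", "sunny"], "nominal")
temperature = summarize(
    ["hot", "mild", "cool", "mild", "hot", "mild"],
    "ordinal",
    order=["cool", "mild", "hot"],
)
distance = summarize([2, 8], "ratio")
```

Asking for a mean of “hot” and “cool” is never attempted here, which mirrors the note that addition and subtraction make no sense for ordinal values.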
Understanding your data (2): Visualize!
[Figure: four panels, y1, y2, y3 and y4, each plotted over a horizontal axis from 0 to 25]
Understanding your data (3): How to visualize non-numerical data?
Is there a correlation between the government type of a country and how much its members talk about democracy?
How could you visualize data on this to avoid drawing wrong conclusions already at the outset?
Supervised and unsupervised learning and examples dealt with here
• Supervised learning
  • Classification / classifier learning
  • Regression
• Unsupervised learning
  • Association rule mining
  • Clustering
A question to the speakers that I don’t quite understand
A lot of hierarchies in RDF specifications are built using some human compromise between the properties of a concept and the hierarchy in which the concept is classified. Unsupervised learners already outperform humans in some classification tasks.
How does this automatisation influence the availability of linked open data?
How to: our proposal
• Basic KDD techniques: frame your research question in terms of one of these tasks, use software to analyse your data (e.g. RapidMiner)
• Advanced KDD techniques (topic detection, sentiment analysis): use 3rd-party software (Sebastijan will provide a list)
• More advanced ideas? Ask / consult with us!
From your questions
Which European politicians have a high chance of receiving a Nobel Prize?
For the sake of the argument, let us rephrase this a bit to give a typical classification task (see later for a more appropriate formalization):
People with what features (feature values) get a Nobel Prize?
Constructing decision trees
Strategy: top down
  Recursive divide-and-conquer fashion
  First: select attribute for root node
    Create branch for each possible attribute value
  Then: split instances into subsets
    One for each branch extending from the node
  Finally: repeat recursively for each branch, using only instances that reach the branch
Stop if all instances have the same class
Will illustrate key ideas with ID3, a very simple decision-tree learning algorithm
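The recursive strategy above can be sketched as follows. This is an illustration of the divide-and-conquer skeleton only, not a full ID3: the attribute-selection step is passed in as a function, and `first_attribute` is a deliberately naive placeholder rather than ID3's criterion. The majority-class fallback for exhausted attributes is an addition that the slide does not mention. The six rows are a projection of the weather data onto two attributes.

```python
from collections import Counter


def build_tree(rows, attributes, target, choose):
    """Top-down, recursive construction of a decision tree as nested dicts."""
    labels = [row[target] for row in rows]
    # Stop if all instances have the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Not on the slide: with no attributes left, fall back to the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # First: select the attribute for this node.
    best = choose(rows, attributes, target)
    remaining = [a for a in attributes if a != best]
    branches = {}
    # Then: one branch and one subset per attribute value that occurs.
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        # Finally: recurse, using only the instances that reach the branch.
        branches[value] = build_tree(subset, remaining, target, choose)
    return {best: branches}


def first_attribute(rows, attributes, target):
    """Placeholder selection rule: take attributes in the order given."""
    return attributes[0]


rows = [
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "no"},
    {"outlook": "rainy", "windy": True, "play": "no"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
]
tree = build_tree(rows, ["outlook", "windy"], "play", first_attribute)
```

Keeping the selection rule as a parameter makes it easy to swap in a purity-based criterion without touching the recursion.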
Which attribute to select?
![Page 59: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/59.jpg)
Which attribute to select?
![Page 60: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/60.jpg)
Criterion for attribute selection
Which is the best attribute?
Want to get the smallest tree
Heuristic: choose the attribute that produces the “purest” nodes
Popular impurity criterion: information gain
Information gain increases with the average purity of the subsets
Strategy: choose attribute that gives greatest information gain
![Page 61: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/61.jpg)
Computing information
Measure information in bits
Given a probability distribution, the info required to predict an event is the distribution’s entropy
Entropy gives the information required in bits (can involve fractions of bits!)
Formula for computing the entropy:
$$\mathrm{entropy}(p_1, p_2, \ldots, p_n) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 - \ldots - p_n \log_2 p_n$$
![Page 62: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/62.jpg)
Example: attribute Outlook
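$$\mathrm{info}([4,0]) = \mathrm{entropy}(1, 0) = -1 \log_2 1 - 0 \log_2 0 = 0 \text{ bits}$$
$$\mathrm{info}([2,3]) = \mathrm{entropy}(2/5, 3/5) = -\tfrac{2}{5} \log_2 \tfrac{2}{5} - \tfrac{3}{5} \log_2 \tfrac{3}{5} = 0.971 \text{ bits}$$
$$\mathrm{info}([2,3],[4,0],[3,2]) = \tfrac{5}{14} \times 0.971 + \tfrac{4}{14} \times 0 + \tfrac{5}{14} \times 0.971 = 0.693 \text{ bits}$$
(By convention, the term 0 log 0 is taken to be 0.)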
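The Outlook figures can be checked with a few lines of Python. This is a minimal sketch: the function names `entropy` and `info` mirror the slide notation but are otherwise illustrative, and each subset is given as a list of class counts such as `[2, 3]`.

```python
from math import log2

def entropy(*probabilities):
    """Entropy in bits; terms with p == 0 contribute 0 by convention."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

def info(*subsets):
    """Weighted average entropy of one or more subsets of class counts."""
    total = sum(sum(counts) for counts in subsets)
    result = 0.0
    for counts in subsets:
        n = sum(counts)
        result += n / total * entropy(*(c / n for c in counts))
    return result

print(info([4, 0]))                  # pure subset (Outlook = overcast)
print(info([2, 3]))                  # Outlook = sunny
print(info([2, 3], [4, 0], [3, 2]))  # expected information after the split
```

The last value comes out at roughly 0.6935, which the slide shows as 0.693.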
![Page 63: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/63.jpg)
Computing information gain
Information gain: information before splitting – information after splitting
gain(Outlook) = info([9,5]) – info([2,3],[4,0],[3,2]) = 0.940 – 0.693 = 0.247 bits
Information gain for attributes from weather data:
gain(Outlook) = 0.247 bits
gain(Temperature) = 0.029 bits
gain(Humidity) = 0.152 bits
gain(Windy) = 0.048 bits
![Page 64: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/64.jpg)
Continuing to split
gain(Temperature ) = 0.571 bits
gain(Humidity ) = 0.971 bits
gain(Windy ) = 0.020 bits
![Page 65: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/65.jpg)
Final decision tree
Note: not all leaves need to be pure; sometimes identical instances have different classes
Splitting stops when data can’t be split any further
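The whole procedure, from attribute selection to the final tree, might be sketched as follows. The data table is an assumption: it is the 14-instance nominal weather data set from Witten & Frank, which this part of the slides refers to but does not list. The names `info`, `gain`, `split` and `id3` are illustrative, and the sketch covers nominal attributes only.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# Assumed data: nominal weather data (outlook, temperature, humidity, windy, play).
ROWS = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]
DATA = [dict(zip(ATTRIBUTES + ["play"], row)) for row in ROWS]

def info(instances):
    """Entropy (in bits) of the class distribution of a set of instances."""
    counts = Counter(inst["play"] for inst in instances)
    n = len(instances)
    return sum(-c / n * log2(c / n) for c in counts.values())

def split(instances, attribute):
    """Group instances by their value for the given attribute."""
    subsets = {}
    for inst in instances:
        subsets.setdefault(inst[attribute], []).append(inst)
    return subsets

def gain(instances, attribute):
    """Information before splitting minus information after splitting."""
    subsets = split(instances, attribute).values()
    after = sum(len(s) / len(instances) * info(s) for s in subsets)
    return info(instances) - after

def id3(instances, attributes):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    classes = Counter(inst["play"] for inst in instances)
    if len(classes) == 1 or not attributes:
        # Pure node, or data cannot be split further: predict the majority class.
        return classes.most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(instances, a))
    remaining = [a for a in attributes if a != best]
    return {best: {value: id3(subset, remaining)
                   for value, subset in split(instances, best).items()}}

tree = id3(DATA, ATTRIBUTES)
print(tree)
```

On this data the root test is `outlook`, the sunny branch tests `humidity`, and the rainy branch tests `windy`, which agrees with the gains listed on the slides.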
![Page 66: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/66.jpg)
Wishlist for a purity measure
Properties we require from a purity measure:
When node is pure, measure should be zero
When impurity is maximal (i.e. all classes equally likely), measure should be maximal
Measure should obey multistage property (i.e. decisions can be made in several stages), e.g.:
$$\mathrm{measure}([2,3,4]) = \mathrm{measure}([2,7]) + \tfrac{7}{9} \times \mathrm{measure}([3,4])$$
Entropy is the only function that satisfies all three properties!
![Page 67: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/67.jpg)
Properties of the entropy
The multistage property:
Simplification of computation:
Note: instead of maximizing info gain we could just minimize information
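The two formulas named on this slide are not reproduced in the text. The statements below are the standard ones and are offered as an assumption about what the slide showed; the class counts [2,3,4] are only an illustrative example. The multistage property for three probabilities p, q, r with p + q + r = 1:

$$\mathrm{entropy}(p, q, r) = \mathrm{entropy}(p, q + r) + (q + r) \times \mathrm{entropy}\!\left(\frac{q}{q + r}, \frac{r}{q + r}\right)$$

The simplification of computation, which avoids fractions inside the logarithms:

$$\mathrm{info}([2,3,4]) = -\tfrac{2}{9} \log_2 \tfrac{2}{9} - \tfrac{3}{9} \log_2 \tfrac{3}{9} - \tfrac{4}{9} \log_2 \tfrac{4}{9} = \frac{-2 \log_2 2 - 3 \log_2 3 - 4 \log_2 4 + 9 \log_2 9}{9}$$

The note about minimizing information holds because the information before splitting is the same for every candidate attribute, so the attribute with the smallest information after splitting is also the one with the largest gain.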
![Page 68: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/68.jpg)
Variants
Top-down induction of decision trees: ID3, algorithm developed by Ross Quinlan
Various improvements, e.g. C4.5: deals with numeric attributes, missing values, noisy data
other measures instead of information gain (details: see exercise session / individual)

| Outlook | Temperature | Humidity | Windy | Play |
|---|---|---|---|---|
| Sunny | 85 | 85 | False | No |
| Sunny | 80 | 90 | True | No |
| Overcast | 83 | 86 | False | Yes |
| Rainy | 75 | 80 | False | Yes |
| … | … | … | … | … |
![Page 69: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/69.jpg)
Classification rules
Popular alternative to decision trees
Antecedent (pre-condition): a series of tests (just like the tests at the nodes of a decision tree)
Tests are usually logically ANDed together (but may also be general logical expressions)
Consequent (conclusion): classes, set of classes, or probability distribution assigned by rule
Individual rules are often logically ORed together
Conflicts arise if different conclusions apply
![Page 70: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/70.jpg)
An example
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
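One way to read this rule set is as an ordered list in which the first matching rule decides, which is one possible way of resolving conflicts. The sketch below assumes that reading; `RULES`, `DEFAULT` and `classify` are illustrative names.

```python
# Each rule: (antecedent as attribute -> required value, consequent class).
RULES = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": True}, "no"),
    ({"outlook": "overcast"}, "yes"),
    ({"humidity": "normal"}, "yes"),
]
DEFAULT = "yes"  # "if none of the above"

def classify(instance):
    """Return the consequent of the first rule whose tests all hold (logical AND)."""
    for antecedent, consequent in RULES:
        if all(instance.get(attr) == value for attr, value in antecedent.items()):
            return consequent
    return DEFAULT

print(classify({"outlook": "sunny", "humidity": "high", "windy": False}))  # no
print(classify({"outlook": "rainy", "humidity": "high", "windy": False}))  # yes
```

Under this ordered reading, a rainy, windy day with normal humidity is classified as "no" by the second rule, even though the fourth rule would say "yes" if it were applied on its own.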
![Page 71: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/71.jpg)
Transition: Trees for numeric prediction
Regression: the process of computing an expression that predicts a numeric quantity
Regression tree: “decision tree” where each leaf predicts a numeric quantity
Predicted value is average value of training instances that reach the leaf
Model tree: “regression tree” with linear regression models at the leaf nodes
Linear patches approximate continuous function
![Page 72: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/72.jpg)
An example
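| Outlook | Temperature | Humidity | Windy | Play-time |
|---|---|---|---|---|
| Sunny | Hot | High | False | 5 |
| Sunny | Hot | High | True | 0 |
| Overcast | Hot | High | False | 55 |
| Rainy | Mild | Normal | False | 40 |
| … | … | … | … | … |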
![Page 73: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/73.jpg)
Agenda
Motivation: application examples
Forms of data analysis and styles of reasoning
The process of knowledge discovery
Description and prediction
Data understanding: two important notes (among other issues)
Types of learning tasks
Classification
Regression
Association-rule mining
Clustering
![Page 74: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/74.jpg)
From your questions
Are economically more powerful countries more influential in the European parliament?
...
Economically powerful countries can be based on different factors, including
Gross Domestic Product per Capita
...
![Page 75: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/75.jpg)
Lead question
“How does the dependent variable depend on the independent one?”
“Can we predict the likely value of the dependent variable for a new data instance (with a given value of the independent variable)?”
![Page 76: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/76.jpg)
Introduction to Linear Regression(the statistical approach)
The Pearson correlation measures the degree to which a set of data points forms a straight-line relationship.
Regression is a statistical procedure that determines the equation for the straight line that best fits a specific set of data.
Slides 44-49: slightly adapted from https://home.ubalt.edu/tmitch/631/PowerPoint_Lectures/chapter17/chapter17.ppt
![Page 77: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/77.jpg)
Introduction to Linear Regression (cont.)
Any straight line can be represented by an equation of the form Y = bX + a, where b and a are constants.
The value of b is called the slope constant and determines the direction and degree to which the line is tilted.
The value of a is called the Y-intercept and determines the point where the line crosses the Y-axis.
![Page 78: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/78.jpg)
![Page 79: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/79.jpg)
Introduction to Linear Regression (cont.)
How well a set of data points fits a straight line can be measured by calculating the distance between the data points and the line.
The total error between the data points and the line is obtained by squaring each distance and then summing the squared values.
The regression equation is designed to produce the minimum sum of squared errors.
![Page 80: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/80.jpg)
Introduction to Linear Regression (cont.)
The equation for the regression line is
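The equation itself is not reproduced in the text. In the usual least-squares formulation the fitted line has the form Ŷ = bX + a, where the slope b is the sum of products of deviations divided by the sum of squared deviations of X, and the intercept is a = M_Y − b·M_X (M denoting a mean). The sketch below assumes that formulation; the data points are made up purely for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for the line Y = b*X + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations of X.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return b, a

def sum_squared_errors(xs, ys, b, a):
    """Total squared vertical distance between the points and the line."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # hypothetical values of the independent variable
ys = [2, 4, 5, 4, 5]   # hypothetical values of the dependent variable
b, a = fit_line(xs, ys)
print(b, a)                                # slope about 0.6, intercept about 2.2
print(sum_squared_errors(xs, ys, b, a))    # about 2.4
```

Any other choice of slope and intercept gives a larger sum of squared errors on these points, which is the sense in which the regression line fits best.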
![Page 81: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/81.jpg)
![Page 82: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/82.jpg)
From your questions
Are economically more powerful countries more influential in the European parliament?
...
Economically powerful countries can be based on different factors, including
Gross Domestic Product per Capita
Human Development Index
...
Multiple regression
(details: see exercise session)
![Page 83: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/83.jpg)
From your questions
Is there a correlation between the government type of a country and how much its members talk about democracy?
This has (assumed) categorical predictors, which can be modelled by dummy variables in a linear regression.
Dummy variables
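Dummy coding turns one categorical predictor into several 0/1 columns, one per category except a reference category, whose effect is absorbed by the intercept of the regression. A minimal sketch follows; the government types and the name `dummy_code` are hypothetical and only serve as an illustration.

```python
def dummy_code(values, reference):
    """Map a categorical column to 0/1 indicator columns, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

# Hypothetical government types, one per country.
government_types = ["republic", "monarchy", "republic", "federation"]
for row in dummy_code(government_types, reference="republic"):
    print(row)
```

Each resulting column can then enter a linear regression like any numeric predictor; its coefficient is read as the difference from the reference category.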
![Page 84: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/84.jpg)
From your questions
Which European politicians have a high chance of receiving a Nobel Prize?
![Page 85: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/85.jpg)
Logistic regression – input data
![Page 86: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/86.jpg)
Logistic regression – fitting a curve
![Page 87: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/87.jpg)
Logistic regression - prediction
![Page 88: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/88.jpg)
From your questions
Which European politicians have a high chance of receiving a Nobel Prize?
Note: Logistic regression also exists in multivariate form (= with multiple predictor variables)
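The prediction step of logistic regression can be sketched as follows: a weighted sum of the predictors is passed through the logistic (sigmoid) function to give a probability between 0 and 1. The coefficients and the feature values below are hypothetical; in practice they would come from fitting the curve to data, which this sketch does not do.

```python
from math import exp

def logistic(z):
    """The S-shaped logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def predict_probability(features, intercept, weights):
    """P(class = 1) for one instance; works for one or several predictors."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return logistic(z)

# Hypothetical fitted model with a single predictor.
p = predict_probability([3.0], intercept=-4.0, weights=[1.5])
print(round(p, 3))   # 0.622
print(p >= 0.5)      # True: predict the positive class at a 0.5 threshold
```

With several predictor variables, `features` and `weights` simply become longer lists.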
![Page 89: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/89.jpg)
Agenda
Motivation: application examples
Forms of data analysis and styles of reasoning
The process of knowledge discovery
Description and prediction
Data understanding: two important notes (among other issues)
Types of learning tasks
Classification
Regression
Association-rule mining
Clustering
![Page 90: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/90.jpg)
From your questions
To what extent are a politician’s topics of choice influenced by* their field of study during higher education?
* phrasing: See remark on “correlation vs. causation” above!
Are speeches in the European Parliament related to what the public think or search online?
![Page 91: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/91.jpg)
Motivation for association-rule learning/mining: store layout (Amazon, earlier: Wal-Mart, ...)
Where to put: spaghetti, butter?
![Page 92: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/92.jpg)
Data
"Market basket data": attributes with boolean domains
In a table each row is a basket (aka transaction)
Transaction ID Attributes (basket items)
1 Spaghetti, tomato sauce
2 Spaghetti, bread
3 Spaghetti, tomato sauce, bread
4 bread, butter
5 bread, tomato sauce
![Page 93: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/93.jpg)
Solution approach: The apriori principle and the pruning of the search tree (1)
[Figure: lattice of all itemsets over the four items]
1-itemsets: {spaghetti}, {tomato sauce}, {bread}, {butter}
2-itemsets: {spaghetti, tomato sauce}, {spaghetti, bread}, {spaghetti, butter}, {tomato sauce, bread}, {tomato sauce, butter}, {bread, butter}
3-itemsets: {spaghetti, tomato sauce, bread}, {spaghetti, tomato sauce, butter}, {spaghetti, bread, butter}, {tomato sauce, bread, butter}
4-itemset: {spaghetti, tomato sauce, bread, butter}
![Page 94: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/94.jpg)
Solution approach: The apriori principle and the pruning of the search tree (2)
[Figure: the same itemset lattice as in (1), from the 1-itemsets up to the 4-itemset over spaghetti, tomato sauce, bread and butter]
![Page 95: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/95.jpg)
Solution approach: The apriori principle and the pruning of the search tree (3)
[Figure: the same itemset lattice as in (1), from the 1-itemsets up to the 4-itemset over spaghetti, tomato sauce, bread and butter]
![Page 96: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/96.jpg)
Solution approach: The apriori principle and the pruning of the search tree (4)
[Figure: the same itemset lattice as in (1), from the 1-itemsets up to the 4-itemset over spaghetti, tomato sauce, bread and butter]
![Page 97: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/97.jpg)
More formally: Generating large k-itemsets with Apriori
Min. support = 40%
step 1: candidate 1-itemsets
Spaghetti: support = 3 (60%)
tomato sauce: support = 3 (60%)
bread: support = 4 (80%)
butter: support = 1 (20%)
Transaction ID Attributes (basket items)
1 Spaghetti, tomato sauce
2 Spaghetti, bread
3 Spaghetti, tomato sauce, bread
4 bread, butter
5 bread, tomato sauce
![Page 98: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/98.jpg)
Contd.
step 2: large 1-itemsets
Spaghetti
tomato sauce
bread
candidate 2-itemsets
{Spaghetti, tomato sauce}: support = 2 (40%)
{Spaghetti, bread}: support = 2 (40%)
{tomato sauce, bread}: support = 2 (40%)
Transaction ID Attributes (basket items)
1 Spaghetti, tomato sauce
2 Spaghetti, bread
3 Spaghetti, tomato sauce, bread
4 bread, butter
5 bread, tomato sauce
![Page 99: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/99.jpg)
Contd.
step 3: large 2-itemsets
{Spaghetti, tomato sauce}
{Spaghetti, bread}
{tomato sauce, bread}
candidate 3-itemsets
{Spaghetti, tomato sauce, bread}: support = 1 (20%)
step 4: large 3-itemsets
{ }
Transaction ID Attributes (basket items)
1 Spaghetti, tomato sauce
2 Spaghetti, bread
3 Spaghetti, tomato sauce, bread
4 bread, butter
5 bread, tomato sauce
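The level-wise generation of large itemsets walked through in steps 1 to 4 might be sketched as follows. This is a minimal, unoptimized illustration on the five baskets from the slides; the function names `support` and `apriori` are illustrative, and the minimum support is given as a fraction of the number of transactions.

```python
from itertools import combinations

TRANSACTIONS = [
    {"spaghetti", "tomato sauce"},
    {"spaghetti", "bread"},
    {"spaghetti", "tomato sauce", "bread"},
    {"bread", "butter"},
    {"bread", "tomato sauce"},
]

def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_support):
    """Return {itemset: support count} for all large itemsets."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    candidates = [frozenset([item]) for item in items]
    large = {}
    k = 1
    while candidates:
        counted = {c: support(c, transactions) for c in candidates}
        current = {c: s for c, s in counted.items() if s / n >= min_support}
        large.update(current)
        k += 1
        # Join step: unions of two large (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (apriori principle): every (k-1)-subset must itself be large.
        candidates = [c for c in joined
                      if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return large

for itemset, count in apriori(TRANSACTIONS, min_support=0.4).items():
    print(sorted(itemset), count)
```

With a minimum support of 40%, butter is dropped after the first pass, so no candidate containing butter is ever counted, and the single candidate 3-itemset fails with a support of 1.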
![Page 100: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/100.jpg)
From itemsets to association rules
Schema: If subset then large k-itemset with support s and confidence c
s = (support of large k-itemset) / # tuples
c = (support of large k-itemset) / (support of subset)
Example:
If {spaghetti} then {spaghetti, tomato sauce}
Support: s = 2 / 5 (40%)
Confidence: c = 2 / 3 (≈ 67%)
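Support and confidence of a rule follow directly from the two counts in the schema. The sketch below recomputes the example on the five baskets from the slides; `rule_stats` is an illustrative name.

```python
TRANSACTIONS = [
    {"spaghetti", "tomato sauce"},
    {"spaghetti", "bread"},
    {"spaghetti", "tomato sauce", "bread"},
    {"bread", "butter"},
    {"bread", "tomato sauce"},
]

def rule_stats(subset, itemset, transactions):
    """Support and confidence of the rule 'if subset then itemset'."""
    n_itemset = sum(1 for t in transactions if itemset <= t)
    n_subset = sum(1 for t in transactions if subset <= t)
    return n_itemset / len(transactions), n_itemset / n_subset

s, c = rule_stats({"spaghetti"}, {"spaghetti", "tomato sauce"}, TRANSACTIONS)
print(f"support = {s:.0%}, confidence = {c:.0%}")  # support = 40%, confidence = 67%
```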
![Page 101: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/101.jpg)
From local associations to global models: clustering
To what extent are a politician’s topics of choice influenced by their field of study during higher education?
Can we find clusters of educational background and topics?
![Page 102: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/102.jpg)
Agenda
Motivation: application examples
Forms of data analysis and styles of reasoning
The process of knowledge discovery
Description and prediction
Data understanding: two important notes (among other issues)
Types of learning tasks
Classification
Regression
Association-rule mining
Clustering
![Page 103: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/103.jpg)
The basic idea of clustering: group similar things
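[Figure: scatter plot of Attribute 1 (horizontal axis) against Attribute 2 (vertical axis), with the points falling into two groups, Group 1 and Group 2]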
![Page 104: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/104.jpg)
Concepts in Clustering
Defining distance between points
Euclidean distance
any other distance (cityblock metric, Levenshtein, Jaccard sim. ...)
A good clustering is one where
(Intra-cluster distance) the sum of distances between objects in the same cluster is minimized,
(Inter-cluster distance) while the distances between different clusters are maximized
Objective to minimize: F(Intra,Inter)
Clusters can be evaluated with “internal” as well as “external” measures
Internal measures are related to the inter/intra cluster distance
External measures are related to how representative the current clusters are of “true” classes
Jaccard similarity of two sets Q and R:
$$\frac{|Q \cap R|}{|Q \cup R|}$$
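Three of the measures named on this slide can be written down in a few lines. This is a minimal sketch: points are tuples of numbers, sets are Python sets, and the function names are illustrative.

```python
from math import sqrt

def euclidean(p, q):
    """Straight-line distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cityblock(p, q):
    """Cityblock (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def jaccard_similarity(q, r):
    """Size of the intersection divided by size of the union of two sets."""
    return len(q & r) / len(q | r)

print(euclidean((0, 0), (3, 4)))                             # 5.0
print(cityblock((0, 0), (3, 4)))                             # 7
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Jaccard is a similarity rather than a distance; 1 minus the similarity is commonly used when a distance is needed.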
![Page 105: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/105.jpg)
K Means Example (K=2)
Pick seeds
Reassign clusters
Compute centroids
Reassign clusters
Compute centroids
Reassign clusters
Converged!
Based on http://rakaposhi.eas.asu.edu/cse494/notes/f02-clustering.ppt
![Page 106: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/106.jpg)
K-means algorithm
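The algorithm itself is shown only as an image. The sketch below follows the steps of the K=2 example (pick seeds, reassign clusters, compute centroids, repeat until nothing changes) and is an illustration under stated assumptions, not the exact formulation on the slide: it uses squared Euclidean distance, takes the seeds from the caller, and runs on made-up points.

```python
def closest(point, centroids):
    """Index of the centroid nearest to the point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, seeds, max_iterations=100):
    centroids = list(seeds)
    assignment = None
    for _ in range(max_iterations):
        # Reassign clusters: each point goes to its nearest centroid.
        new_assignment = [closest(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Compute centroids: mean of the points assigned to each cluster.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = mean(members)
    return assignment, centroids

# Hypothetical data: two visually separate groups; the seeds are the first two points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
assignment, centroids = k_means(points, seeds=[points[0], points[1]])
print(assignment)   # [0, 0, 0, 1, 1, 1]
```

Both seeds lie in the same group here, yet the clusters separate after two reassignment rounds. With other seeds or data the result can differ, since k-means only finds a local optimum.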
![Page 107: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/107.jpg)
From local associations to global models: clustering
To what extent are a politician’s topics of choice influenced by their field of study during higher education?
Can we find clusters of educational background and topics?
![Page 108: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/108.jpg)
Clustering non-numerical data
(to follow)
![Page 109: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/109.jpg)
Agenda
Motivation: application examples
Forms of data analysis and styles of reasoning
The process of knowledge discovery
Description and prediction
Data understanding: two important notes (among other issues)
Types of learning tasks
Classification
Regression
Association-rule mining
Clustering
![Page 110: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/110.jpg)
Next lecture
More on KDD concepts and methods
for your projects
![Page 111: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/111.jpg)
Supervised and unsupervised learning and examples dealt with here
• Supervised learning
  • Classification / classifier learning
  • Regression
• Unsupervised learning
  • Association rule mining
  • Clustering
What’s the human input in both types?
![Page 112: Berendt: Knowledge and the Web, 2015, berendt/teaching/ 1 Knowledge and the Web – Inferring new knowledge from data(bases):](https://reader031.fdocuments.in/reader031/viewer/2022012922/5697bfd81a28abf838caed2d/html5/thumbnails/112.jpg)
References / background reading; acknowledgements
The slides are based on Witten, I.H., & Frank, E. (2005). Data Mining. Practical Machine Learning Tools and Techniques with Java Implementations. 2nd ed. Morgan Kaufmann. http://www.cs.waikato.ac.nz/%7Eml/weka/book.html
In particular, pp. 8-57 are based on the instructor slides for that book (chapters 1-4), available at http://books.elsevier.com/companions/9780120884070/:
http://books.elsevier.com/companions/9780120884070/revisionnotes/01~PDFs/chapter1.pdf (and ...chapter2.pdf, chapter3.pdf, chapter4.pdf) or
http://books.elsevier.com/companions/9780120884070/revisionnotes/02~ODP%20Files/chapter1.odp (and ...chapter2.odp, chapter3.odp, chapter4.odp)
Scales (aka levels) of measurement are explained well here:
http://en.wikipedia.org/wiki/Level_of_measurement [15 Nov 2014]