1 Data Mining for Enlightenment Bettina Berendt ~berendt.
1
Data Mining for Enlightenment
Bettina Berendt
www.cs.kuleuven.be/~berendt
2
Basics
Data Mining (DM) – used in the sense of Knowledge Discovery:
“the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”
(Fayyad et al., 1996)
Enlightenment:
“man's emergence from his self-imposed immaturity”
(Kant, 1784)
3
Data mining for ...
4
Putting it together: (One) first try
What makes people happy?
Classification learning
5
Results: Corpus-derived happiness factors
yay 86.67
shopping 79.56
awesome 79.71
birthday 78.37
lovely 77.39
concert 74.85
cool 73.72
cute 73.20
lunch 73.02
books 73.02
goodbye 18.81
hurt 17.39
tears 14.35
cried 11.39
upset 11.12
sad 11.11
cry 10.56
died 10.07
lonely 9.50
crying 5.50
[Mihalcea & Liu, Proc. CAAW 2006]
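A minimal sketch of how such a per-word score could be computed, assuming the happiness factor is the percentage of a word's occurrences that fall in posts labelled "happy". The function name, the whitespace tokenisation, and the toy posts are invented for illustration and are not taken from the cited paper.

```python
from collections import Counter

def happiness_factors(posts, min_count=1):
    """posts: iterable of (text, label) pairs, label being 'happy' or 'sad'.

    Returns {word: percentage of the word's occurrences found in happy posts}.
    """
    happy, total = Counter(), Counter()
    for text, label in posts:
        words = text.lower().split()  # naive tokenisation (assumption)
        total.update(words)
        if label == "happy":
            happy.update(words)
    return {w: 100.0 * happy[w] / n for w, n in total.items() if n >= min_count}

# Hypothetical toy corpus, only to exercise the function.
toy_posts = [
    ("yay birthday concert", "happy"),
    ("lovely lunch yay", "happy"),
    ("lonely lunch", "sad"),
    ("cried lonely", "sad"),
]
factors = happiness_factors(toy_posts)
```

On a real corpus, `min_count` would be raised so that rare words do not dominate the top and bottom of the list.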
6
The approach: DM for human learning
5 wrong (but popular) metaphors
about the Internet
articulation and reflection
social
multiple perspectives
active / constructive
situated and authentic; multiple contexts
Successful learning
is / has ...
Refutation and DM tool support
7
Interdisciplinary challenges
Engineering challenges
[Reputation challenges]
Interdisciplinary / application question challenges
Computational / DM methods challenges
8
Metaphor 1
9
The Internet is a textbook
10
Multi-purpose tools (with DM) for situated and authentic Internet use
Text and link analysis
11
Metaphor 2
12
The Internet is television
13
DM for active/constructive information use: Can you organize these results some more?
14
Organisation of the literature / bibliography construction
DM for active/constructive information use (1): Intelligent bibliography creation
[Berendt, Dingel, & Hanser, Proc. ECDL 2006; Berendt & Krause, submitted; Berendt & Kolbe, in prep.]
Citation-based clustering, text analysis (TF.IDF, ...) for semi-automatic ontology learning; embedded in authoring tool
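To make the TF.IDF step concrete, here is a minimal sketch of one common weighting variant (relative term frequency times the log of inverse document frequency). The variant actually used in the cited tool is not stated on the slide, and the three toy documents are invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights

# Hypothetical toy documents (e.g. paper titles).
docs = [
    "web usage mining".split(),
    "web ontology learning".split(),
    "citation clustering".split(),
]
w = tf_idf(docs)
```

Terms shared by several documents ("web") receive lower weights than terms that single out one document ("usage"), which is what makes the weights useful as input to clustering.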
15
Metaphor 3
16
The Internet is a pile of rubbish (biased / extremist / subjective)
17
DM for analyzing multiple perspectives:
[Fortuna, Galleguillos, & Cristianini, in press]
What characterizes different news sources?
Nearest neighbour / best reciprocal hit for document matching; Kernel Canonical Correlation Analysis and vector operations for finding topics and characteristic keywords
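The "best reciprocal hit" idea can be sketched as follows: two articles from different sources are matched only if each is the other's nearest neighbour. The bag-of-words cosine similarity and the toy headlines are assumptions made for this illustration; the cited work may use different features, and the Kernel Canonical Correlation Analysis step is not shown.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_reciprocal_hits(source_a, source_b):
    """Return index pairs (i, j) where a[i] and b[j] are mutual nearest neighbours."""
    va = [Counter(d.lower().split()) for d in source_a]
    vb = [Counter(d.lower().split()) for d in source_b]
    best_ab = [max(range(len(vb)), key=lambda j: cosine(va[i], vb[j]))
               for i in range(len(va))]
    best_ba = [max(range(len(va)), key=lambda i: cosine(va[i], vb[j]))
               for j in range(len(vb))]
    # Keep a pair only if the match is mutual and the texts share any terms.
    return [(i, j) for i, j in enumerate(best_ab)
            if best_ba[j] == i and cosine(va[i], vb[j]) > 0]

# Hypothetical headlines from two news sources.
a = ["election results announced in capital", "storm hits coast overnight"]
b = ["coast battered as storm hits", "capital reacts to election results",
     "football final tonight"]
pairs = best_reciprocal_hits(a, b)
```

Requiring the match in both directions filters out articles that have no counterpart in the other source, such as the third headline in `b`.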
18
DM for exploring multiple perspectives
Hyperlinks from blogs to mainstream news media (Germany, USA)
[Berendt, Schlegel, & Koch, in Kommunikation, Partizipation und Wirkungen im Social Web, in press]
How do different news media source / refer to one another?
HTML wrapping and link analysis; (not shown: Named Entity Recognition for retrieving textual links)
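A minimal sketch of the link-analysis step, assuming the goal is to count how often blog pages link to each external host. It uses only the Python standard library; the class name, function name, and example pages are hypothetical, and real blog HTML would need site-specific wrappers first.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collects the host names of all absolute links in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            host = urlparse(href).netloc if href else ""
            if host:  # relative links have no host and are skipped
                self.hosts.append(host.lower())

def count_link_targets(pages):
    """pages: iterable of HTML strings. Returns a Counter of linked hosts."""
    counts = Counter()
    for html in pages:
        parser = LinkCollector()
        parser.feed(html)
        counts.update(parser.hosts)
    return counts

# Hypothetical blog pages.
blog_pages = [
    '<p>See <a href="http://news.example.org/story1">this</a> '
    'and <a href="/local">that</a>.</p>',
    '<a href="http://news.example.org/story2">more</a> '
    '<a href="http://paper.example.com/x">other</a>',
]
targets = count_link_targets(blog_pages)
```

Aggregating the resulting counts per country of the linking blog would give the kind of comparison the slide describes.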
19
DM for making people active explorers of multiple perspectives and multiple contexts
[Berendt & Trümper, PASCAL Symposium, 2008]
Clustering for semi-automatic ontology learning; Named Entity Recognition; multi-dimensional similarity construct and filtering for nearest-neighbour search
20
Metaphor 4
21
The Internet is a dark cave
22
Social tagging for making people see, explore and generate multiple perspectives
See also [Vuorikari, Ochoa, & Duval, submitted]
23
Social browsing with the semantic pointer for making people see & explore multiple perspectives
[Ferlež, PASCAL Symposium, 2008; www.jureferlez.name/2007/07/text-mining-for-semantically-enabled.html]
Inter-page text-block similarity analysis; client-side usage tracking and real-time matching
24
DM for exploring how multiple perspectives evolve
[Griffith, 2007; http://wikiscanner.virgil.gr/]
Why is Scientology an uncontroversial organisation?
Usage tracking, feature construction by table lookup
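"Feature construction by table lookup" can be illustrated as enriching each logged edit with the organisation that owns the editing IP address. This is a sketch under assumptions: the lookup table, organisation names, and edit records are invented, and the address ranges are reserved documentation networks rather than real owners.

```python
import ipaddress

# Hypothetical lookup table: network range -> owning organisation.
ORG_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "Example Org A",
    ipaddress.ip_network("198.51.100.0/24"): "Example Org B",
}

def organisation_for(ip):
    """Return the organisation whose range contains ip, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for net, org in ORG_TABLE.items():
        if addr in net:
            return org
    return None

# Hypothetical usage log of anonymous edits.
edits = [
    {"page": "Some article", "ip": "192.0.2.17"},
    {"page": "Another article", "ip": "203.0.113.5"},
]
# The new feature "organisation" is derived purely by table lookup.
enriched = [dict(e, organisation=organisation_for(e["ip"])) for e in edits]
```

Grouping the enriched records by page and organisation then shows who edits which articles.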
25
DM for exploring how multiple perspectives evolve
26
Metaphor 5
27
The Internet is a library
28
... But you’re a document too!
[Owad, 2006; www.applefritter.com/bannedbooks]
Where do people live who will buy the Qur’an soon?
29
DM for demonstrating the Internet’s inference capabilities (how to create that book map)
Attribute matching in different schemas, view construction
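The inference step named here can be sketched as a join: once attributes in two differently structured sources are matched, a combined view links records that neither source connects on its own. All field names, records, and the mapping below are hypothetical and serve only to show the mechanism.

```python
# Two hypothetical sources with different schemas.
wishlists = [
    {"owner_name": "A. Reader", "owner_city": "Springfield", "title": "Book X"},
    {"owner_name": "B. Writer", "owner_city": "Shelbyville", "title": "Book Y"},
]
directory = [
    {"name": "A. Reader", "city": "Springfield", "street": "1 Example Street"},
]

# Attribute matching: field in the first schema -> field in the second.
ATTRIBUTE_MAP = {"owner_name": "name", "owner_city": "city"}

def build_view(left, right, mapping):
    """Join left and right records on the matched attributes."""
    index = {tuple(r[col] for col in mapping.values()): r for r in right}
    view = []
    for row in left:
        key = tuple(row[col] for col in mapping)
        match = index.get(key)
        if match is not None:
            view.append({**row, **match})
    return view

view = build_view(wishlists, directory, ATTRIBUTE_MAP)
```

The point of the demonstration is how little is needed: a two-attribute match is enough to attach a street address to a reading interest.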
30
DM for articulation and reflection
Repetition Organisation Elaboration
[Berendt, in Neues Handbuch Hochschullehre, 2006]
Proxy server
Logfile
ASP
Usage tracking, semantic graph coarsening
31
A conclusion ... and a vision
32
A conclusion ... and a vision
New happiness factors:
yay 86.67
shopping 79.56
awesome 79.71
learning 86.67
understanding 79.56
democracy 79.71
…
33
Caveat 1: Data preparation
One approach:
Tools for active (interactive) wrapper learning
34
Caveat 2: “Digging and surfing”
Reductive understanding is not always adequate and/or desired
Person
Context
Task
...
One approach: Treat it as a competency
35
Caveat 3: Cultural/economic bias
Land area, Population, Internet users
[www.worldmapper.org]
36
… Questions? Comments? Other?
Thank you …