Web Semantic & Mining
mohammed-al-haj -
Transcript of Web Semantic & Mining
8/3/2019 Web Semantic & Mining
http://slidepdf.com/reader/full/web-semantic-mining 1/50
Semantic Web Mining
The Next Evolution of the WWW
Overview
What is the Semantic Web?
Background
Components of the Semantic Web
Why the Semantic Web is needed
Machine Learning & the Semantic Web
What is Text Mining?
Mining the Web
How Is All This Related to the Semantic Web?
Mining the Semantic Web
Uses of the Semantic Web
Implementing the Semantic Web
Examples
Dream
I have a dream for the Web [in which computers]
become capable of analyzing all the data on the Web -
the content, links, and transactions between people and
computers. A 'Semantic Web', which should make this
possible, has yet to emerge, but when it does, the
day-to-day mechanisms of trade, bureaucracy and our daily
lives will be handled by machines talking to machines.
The 'intelligent agents' people have touted for ages will
finally materialize.
- Tim Berners-Lee, 1999
Computers don't understand meaning
"My mouse is broken. I need a new one..."
"My mouse is broken" vs. "My mouse is dead"
What is the Semantic Web?
The Semantic Web is a group of methods and technologies that allow machines to
understand the meaning - or "semantics" - of information on the World Wide Web.
[Figure: "Before Semantic Web": Web content connecting creators and users (WWW and beyond)]
[Figure: "Semantic Web Structure": semantic annotations, ontologies, logical support, languages, tools, and applications/services layered on top of Web content, creators, and users (WWW and beyond)]
What is the Semantic Web? (cont)
A framework that:
- Adds meaning to data
- Provides a mechanism for organizing, interpreting, and making use of that meaning
The Semantic Web is "an extended web of machine-readable information and automated
services that amplify the Web far beyond current capabilities" (Daconta et al., 2003)
What is the Semantic Web? (cont)
An enhancement to the current Web, not a replacement
"The Semantic Web will bring structure to the
meaningful content of Web pages, creating an
environment where software agents roaming
from page to page can readily carry out
sophisticated tasks for users" (Berners-Lee et al., 2001)
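Background
1969 - ARPANET, the precursor of the Internet, goes online as a DOD-funded communications network
1989 - Tim Berners-Lee (and others) at CERN propose the Web; HTML, modeled on SGML, follows shortly after
Early 1990s - Web browsers created to interpret HTML
1996 - XML development begins at the W3C (published as a Recommendation in 1998)
1990s+ - Tim Berners-Lee & W3C continue to pursue development of the Semantic Web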
Components of the Semantic Web
Four major components:
- XML
- Resource Description Framework (RDF)
- Ontologies
- Agents
Supplemental Components of the Semantic Web
Supplemental components:
- Uniform Resource Identifiers (URIs)
- Web services
- Inference rules
- Service discovery
- Semantic-aware applications
- Security and trust
- XML and RDF schemas
XML
HTML (XHTML) is a series of predefined
tags that add presentation to data
<b>This text is bold</b>
XML is a series of user-defined tags that
add information and structure to data
<author>John Smith</author>
XML (cont)
"XML has become the universal syntax for exchanging data between organizations"
(Daconta et al., 2003)
Issue: Some mechanism must exist for coordinating the meaning of the user-defined
tags and for understanding the context of that information
Company A: <name>Smith</name>
Company B: <employee>Jones</employee>
Company C: <name>Williams</name>
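The coordination problem can be made concrete with a small sketch. It parses the three fragments above with Python's standard `xml.etree.ElementTree` and looks up what each tag means per company. The `TAG_MEANING` table and the meanings in it (including the guess that Company C's `<name>` refers to a customer) are hypothetical; the slide only states that such a shared mapping is needed.

```python
import xml.etree.ElementTree as ET

# The three fragments from the slide, one per company.
FRAGMENTS = {
    "Company A": "<name>Smith</name>",
    "Company B": "<employee>Jones</employee>",
    "Company C": "<name>Williams</name>",
}

# Hypothetical shared agreement on what each company's tag means.
# The same tag (<name>) is assumed to mean different things for A and C.
TAG_MEANING = {
    ("Company A", "name"): "employee_surname",
    ("Company B", "employee"): "employee_surname",
    ("Company C", "name"): "customer_surname",
}


def interpret(company, fragment):
    """Parse one XML fragment and return (agreed meaning, text value)."""
    element = ET.fromstring(fragment)
    meaning = TAG_MEANING[(company, element.tag)]
    return meaning, element.text


interpreted = [interpret(company, xml) for company, xml in FRAGMENTS.items()]
for meaning, value in interpreted:
    print(meaning, value)
```

Without the mapping table, a program sees only the tag names and cannot tell that A and B describe the same kind of thing while A and C do not.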
Resource Description Framework (RDF)
A data model, commonly serialized as XML (RDF/XML), used to describe resources
Resources can include entities, concepts, properties and relations
Captures the metadata about the "externals" of a document
Can use a serialized model, RDF triples, special notation, or graphs to describe data
Resource Description Framework (RDF) (cont)
RDF triple (subject, predicate, object/literal):
The company sells software
The company is named Microsoft
John Smith is the president of Company X
[Figure: graph with subject "Company" linked by predicate "sells" to object "Software" and by predicate "is named" to literal "Microsoft"]
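As a rough illustration, the three statements on this slide can be written as (subject, predicate, object) tuples and queried by pattern. This is a plain-Python sketch, not an RDF library; the `Triple` type, the `match` helper, and the predicate spellings (`isNamed`, `isPresidentOf`) are names chosen here for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # an object resource or a literal value


# The statements from the slide, written as triples.
TRIPLES = [
    Triple("Company", "sells", "Software"),
    Triple("Company", "isNamed", "Microsoft"),
    Triple("John Smith", "isPresidentOf", "Company X"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Everything that is stated about "Company".
about_company = match(TRIPLES, subject="Company")
```

Because every statement has the same three-part shape, one generic matching function is enough to answer questions about any subject, predicate, or object.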
Ontologies
Provide the repositories for meaning interpretations
Provide a mechanism for defining the relationships among different words and,
for the Semantic Web, relationships among different resources
"the common words and concepts (the meaning) used to describe and represent an
area of knowledge" (Daconta et al., 2003)
Ontologies (cont)
Consist of:
- Taxonomies
  - "An organized set of terms." (McComb, 2004)
  - A classification and a tree (Daconta et al., 2003)
  - Hierarchical, tree-like structures similar to organizational charts
- Sets of inference rules
  - Used to organize semantics
Taxonomy
Taxonomy := Segmentation, classification and ordering of elements into a
classification system according to their relationships with each other
[Figure: tree rooted at Object with the nodes Person, Topic, Document; Researcher, Student, Semantics; Doctoral Student, PhD Student, Ontology, F-Logic]
Thesaurus
Terminology for a specific domain
Graph with primitives, 2 fixed relationships (similar, synonym)
Originates from bibliography
[Figure: the taxonomy tree (Object; Person, Topic, Document; Researcher, Student, Semantics; PhD Student, Doctoral Student; Ontology, F-Logic) extended with "similar" and "synonym" links]
Topic Map
Topics (nodes), relationships and occurrences (to documents)
ISO standard
Typically used for navigation and visualisation
[Figure: the thesaurus graph (Object; Person, Topic, Document; Researcher, Student, Semantics; PhD Student, Doctoral Student; Ontology, F-Logic; similar, synonym) extended with the relationships "knows", "writes", "described_in" and the properties "Affiliation" and "Tel"]
Ontology (in our sense)
Representation Language: Predicate Logic (F-Logic)
Standards: RDF(S); upcoming standard: OWL
[Figure: ontology graph. Concepts: Object, Person, Topic, Document, Researcher, Student, PhD Student, Doctoral Student, linked by is_a. Topics shown: Semantics, Ontology, F-Logic. Relations: knows, writes, described_in, is_about, subTopicOf, similar. Attributes: Affiliation, Tel. Instance (instance_of): York Sure, Affiliation AIFB, Tel +49 721 608 6592]
Rules:
P writes D and D is_about T imply P knows T
T described_in D corresponds to D is_about T
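The two ingredients of an ontology named on these slides, an is_a hierarchy and inference rules, can be sketched in a few lines of Python. The edges in `IS_A` are one plausible reading of the figure, and the rule implemented in `infer_knows` (a person who writes a document about a topic knows that topic) is one reading of the slide's rule text. The facts `doc1` and the function names are hypothetical.

```python
# Assumed is_a edges, read off the taxonomy figure (child -> parent).
IS_A = {
    "Researcher": "Person",
    "Student": "Person",
    "PhD Student": "Student",
    "Person": "Object",
    "Topic": "Object",
    "Document": "Object",
}


def ancestors(concept):
    """Follow is_a links upward and return all superconcepts in order."""
    result = []
    while concept in IS_A:
        concept = IS_A[concept]
        result.append(concept)
    return result


# Hypothetical facts: (person, document) and (document, topic) pairs.
WRITES = {("York Sure", "doc1")}
IS_ABOUT = {("doc1", "Ontology")}


def infer_knows(writes, is_about):
    """Apply the rule: P writes D and D is_about T imply P knows T."""
    return {
        (person, topic)
        for (person, doc) in writes
        for (other_doc, topic) in is_about
        if doc == other_doc
    }


print(ancestors("PhD Student"))
print(infer_knows(WRITES, IS_ABOUT))
```

The derived `knows` fact is never stored explicitly; it follows from the rule, which is what distinguishes an ontology with rules from a plain taxonomy.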
Agents
Also known as software agents
Provide automation services
Should not be designed to replace
humans or to make decisions
Perform tasks for users of the Semantic Web, using the data
described by XML, RDF and ontologies
What Machine Learning Can Do for the Semantic Web
Upgrading the current web to a semantic web involves a lot of work
Can partially be automated!
Examples:
- Learning ontologies
- Automatic document classification
- Information integration
- ...
Learning Ontologies
View:
- Manual creation of ontologies is very labour-intensive
- Fully automated creation of ontologies is not feasible
- Hence: develop tools that help build ontologies
Basic components:
- Good graphical interface (man-machine interaction)
- Powerful underlying machine learning techniques
Some Useful Techniques for Learning Ontologies
Term extraction from texts
- Identification of concepts
Hierarchical clustering
- Clustering: finding groups of "similar" things
- Hierarchical clustering: clusters of clusters
- Taxonomy can be constructed through hierarchical clustering of concepts
Association rules
- Find sets of terms that often occur together
- May indicate important relations
  - E.g., events in texts often co-occur with locations
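To illustrate how hierarchical clustering can suggest a taxonomy, the sketch below merges terms bottom-up by the similarity of the words they co-occur with. It is a minimal single-linkage agglomerative clustering using Jaccard similarity; the term contexts in `CONTEXTS` are invented toy data, and the slides do not prescribe this particular similarity measure or linkage.

```python
# Toy data (invented): each term with the set of words seen near it.
CONTEXTS = {
    "hotel": {"book", "stay", "room"},
    "hostel": {"book", "stay", "bed"},
    "golf": {"play", "course"},
    "tennis": {"play", "court"},
}


def jaccard(a, b):
    """Overlap of two sets, between 0 (disjoint) and 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity(c1, c2, contexts):
    """Single linkage: similarity of the two most similar members."""
    return max(jaccard(contexts[x], contexts[y]) for x in c1 for y in c2)


def agglomerate(contexts):
    """Repeatedly merge the two most similar clusters; return the merges."""
    clusters = [(term,) for term in sorted(contexts)]
    merges = []
    while len(clusters) > 1:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = max(
            pairs,
            key=lambda ij: cluster_similarity(
                clusters[ij[0]], clusters[ij[1]], contexts
            ),
        )
        merged = clusters[i] + clusters[j]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


merges = agglomerate(CONTEXTS)
for step, merged in enumerate(merges, start=1):
    print(step, merged)
```

Each merge corresponds to a candidate parent node: here the first two merges group the accommodation terms and the sports terms, and a human editor would still have to name and validate those nodes.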
What is Text Mining?
Text mining is about knowledge discovery from large collections of unstructured text.
It is not the same as data mining, which is more about discovering patterns in
structured data stored in databases.
Similar techniques are sometimes used; however, text mining has many additional
constraints caused by the unstructured nature of the text and the use of natural language.
Mining the Web
Analyze data that are available on the Web
Distinguish 3 types:
- Web content mining
  - Look at the contents of documents (text, ...)
- Web structure mining
  - Look at links between documents
- Web usage mining
  - Look at user logs (e.g. who accessed a web page, which links are used often, ...)
How Is All This Related to the Semantic Web?
- Machine learning can help with building the Semantic Web
- The Semantic Web will help with mining the Web, making Web interfaces and
  agents more intelligent
What the Semantic Web Can Do for Web Mining
Will make mining the web much easier
Reason 1: removal of ambiguity
- More precise knowledge of what is meant by certain terms
Reason 2: structured vs. unstructured data
- Learning from structured data is much easier than from unstructured data
Reason 3: availability of background knowledge
- Can be used to make better decisions when learning
Removal of Ambiguity
Example: text document classification
- E.g., given a text, tell which newsgroups it belongs to
Typical approach: "bag of words"
- Look only at which words occur in the text, and how often
- Each time a word occurs that occurs mainly in one particular class,
  increase the probability for that class
- But words are ambiguous!
- Increased classification accuracy can be expected by removing ambiguity
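The bag-of-words idea, and the way an ambiguous word can mislead it, can be shown with a tiny word-count classifier. This is a sketch in the style of naive Bayes with add-one smoothing and equal class priors. The newsgroup names and training sentences are invented for illustration.

```python
import math
from collections import Counter

# Invented toy training data: (newsgroup, text).
TRAINING = [
    ("rec.pets", "my mouse is sick and the cat is hungry"),
    ("rec.pets", "the vet examined the mouse"),
    ("comp.hardware", "my mouse and keyboard are broken"),
    ("comp.hardware", "replace the usb keyboard driver"),
]


def train(examples):
    """Count how often each word occurs in each class."""
    counts = {}
    for label, text in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts


def classify(text, counts):
    """Pick the class whose word frequencies best explain the text."""
    vocabulary = {word for c in counts.values() for word in c}
    scores = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        # Add-one smoothing so unseen words do not zero out a class.
        scores[label] = sum(
            math.log((word_counts[word] + 1) / (total + len(vocabulary)))
            for word in text.split()
        )
    return max(scores, key=scores.get)


counts = train(TRAINING)
print(classify("usb keyboard broken", counts))
print(classify("my mouse is broken", counts))
```

With this toy data the first text lands in `comp.hardware`, but "my mouse is broken" lands in `rec.pets`, because "mouse" is counted as the same word in both senses. If the text were annotated with the intended concept (animal vs. device), the two senses would be separate features.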
Mining From (Un)structured Data
Mining data = intensively querying data
Answering a query is
- Easy in structured data
  - Relational database, XML, ...
- Harder in semi-structured data (e.g., HTML)
- Hard in unstructured data
  - Information extraction needed
  - Could do this by learning a "wrapper"
  - This involves one extra layer of learning
Availability of Background Knowledge
Learning = finding relevant patterns in behaviour
Important to have the right context to describe these patterns
Example:
- Making interesting offers to clients
- "People who bought this book also bought ..."
- = "Instance-based" learning
  - Estimate profile of user
  - Find users with similar profile
  - Look at behaviour of those users to help current user
Availability of Background Knowledge
Can work better if more background knowledge is available, e.g., type of book, author, ...
- For instance, for books:
  - "similar profile" = users that up till now bought the same books as this user
    - May not be many people
  - "similar" = often bought books by the same author
    - Probably many more people, allows for a more reasonable guess
  - "similar" = often bought books of the same genre (fiction, ...)
    - May work even better
Ontologies (among others) provide such background knowledge
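The effect of background knowledge on "similar profile" can be sketched as follows: comparing users by exact titles finds nobody, while comparing them by genre finds a neighbour. The purchase data, the `GENRE` table, and the function names are invented for illustration; Jaccard overlap is just one simple similarity measure.

```python
# Invented purchase histories.
PURCHASES = {
    "current": {"Book A", "Book B"},
    "u1": {"Book C", "Book D"},
    "u2": {"Book E"},
}

# Invented background knowledge, e.g. as an ontology might provide it.
GENRE = {
    "Book A": "fiction",
    "Book B": "fiction",
    "Book C": "fiction",
    "Book D": "fiction",
    "Book E": "cooking",
}


def jaccard(a, b):
    """Overlap of two sets, between 0 (disjoint) and 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_users(user, purchases, key=lambda item: item):
    """Score other users by profile overlap; `key` maps a book to a feature."""
    profile = {key(item) for item in purchases[user]}
    scores = {
        other: jaccard(profile, {key(item) for item in items})
        for other, items in purchases.items()
        if other != user
    }
    return {other: score for other, score in scores.items() if score > 0}


by_title = similar_users("current", PURCHASES)
by_genre = similar_users("current", PURCHASES, key=GENRE.get)
print(by_title, by_genre)
```

Passing a different `key` function is the only change between the two runs, which mirrors the slide's point that the learning method stays the same while the available description of the items gets richer.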
Web Mining Revisited
The Semantic Web will change:
- Content mining
  - Clearer view of contents and meaning of documents
- Structure mining
  - More relevant structure
- Usage mining
  - More relevant information on actions of the user
Will in general improve the intelligence of systems
- E.g. a mail filter gets a better view of the contents of mails
Mining the Semantic Web
Knowledge base:
  Hotel: Wellnesshotel
  GolfCourse: Seaview
  belongsTo(Seaview, Wellnesshotel)
  ...
Association rule mining:
  Hotel(x), GolfCourse(y), belongsTo(y, x) -> hasStars(x, 5)
  support = 0.4 %, confidence = 89 %
Ontology:
  [Figure: concepts Organization, Hotel, GolfCourse with the properties name, cooperatesWith, belongsTo]
  FORALL X, Y
  Y:Hotel[cooperatesWith ->> X] <-
  X:ProjectHotel[cooperatesWith ->> Y].
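The support and confidence figures attached to the rule can be computed as in the sketch below. Support is the share of all records that satisfy both sides of the rule, and confidence is the share of records satisfying the left side that also satisfy the right side. The five hotel records are invented and do not reproduce the slide's 0.4 % and 89 %.

```python
# Invented knowledge-base extract: one record per hotel.
HOTELS = [
    {"name": "Wellnesshotel", "has_golf_course": True, "stars": 5},
    {"name": "H2", "has_golf_course": True, "stars": 5},
    {"name": "H3", "has_golf_course": True, "stars": 4},
    {"name": "H4", "has_golf_course": False, "stars": 5},
    {"name": "H5", "has_golf_course": False, "stars": 3},
]


def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    body = [r for r in records if antecedent(r)]
    both = [r for r in body if consequent(r)]
    support = len(both) / len(records)
    confidence = len(both) / len(body) if body else 0.0
    return support, confidence


# Rule: a hotel that has a golf course has five stars.
support, confidence = rule_stats(
    HOTELS,
    antecedent=lambda r: r["has_golf_course"],
    consequent=lambda r: r["stars"] == 5,
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

In a Semantic Web setting the antecedent would be evaluated against ontology-typed facts such as `belongsTo(Seaview, Wellnesshotel)` rather than a flat boolean field; the flat records here only keep the arithmetic visible.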
Semantic Web Usage Mining
From logfile analysis ...
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:03:51 +0100] "GET /search.html?l=ostsee%20strand&syn=023785&ord=asc HTTP/1.0" 200 1759
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:05:06 +0100] "GET /search.html?l=ostsee%20strand&p=low&syn=023785&ord=desc HTTP/1.0" 200 8450
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:06:41 +0100] "GET /mlesen.html?Item=3456&syn=023785 HTTP/1.0" 200 3478
... to semantic logfile analysis:
[Figure: the three requests annotated as "Search by Location", then "Search by Location and Price" (refine search), then "Look at individual Hotel" (choose item)]
Basic idea: associate each requested page with one or more ontological entities,
to better understand the process of navigation
[Berendt & Spiliopoulou 2000; Berendt 2002; Oberle 2003]
Use the gained knowledge to
- understand search strategies
- improve navigation design
- support personalization
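A minimal sketch of the basic idea: parse each log line, then map the requested page and its query parameters to concept names. The concept names (`SearchByLocation`, `SearchByPrice`, `ViewIndividualHotel`) and the reading of the parameters `l` as location and `p` as price are assumptions based on the figure labels, not a documented schema of the site.

```python
import re
from urllib.parse import parse_qs, urlsplit

# The three requests from the slide (whitespace inside URLs removed).
LOG_LINES = [
    'p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:03:51 +0100] '
    '"GET /search.html?l=ostsee%20strand&syn=023785&ord=asc HTTP/1.0" 200 1759',
    'p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:05:06 +0100] '
    '"GET /search.html?l=ostsee%20strand&p=low&syn=023785&ord=desc HTTP/1.0" 200 8450',
    'p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:06:41 +0100] '
    '"GET /mlesen.html?Item=3456&syn=023785 HTTP/1.0" 200 3478',
]

# Extract the requested URL from the quoted request part of a log line.
REQUEST = re.compile(r'"GET (?P<url>\S+) HTTP/[\d.]+"')


def concepts_for(line):
    """Map one log line to a list of (assumed) ontology concept names."""
    found = REQUEST.search(line)
    if found is None:
        return []
    url = urlsplit(found.group("url"))
    params = parse_qs(url.query)
    if url.path == "/search.html":
        concepts = []
        if "l" in params:  # assumed: l = location
            concepts.append("SearchByLocation")
        if "p" in params:  # assumed: p = price
            concepts.append("SearchByPrice")
        return concepts
    if url.path == "/mlesen.html" and "Item" in params:
        return ["ViewIndividualHotel"]
    return []


session = [concepts_for(line) for line in LOG_LINES]
print(session)
```

The resulting concept sequence describes the visit as a search that is refined and then ends in an item view, which is easier to mine for search strategies than the raw URLs.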
Text Document Clustering of Crawled Documents
[Figure: pipeline with the stages WWW, Focused Crawling, Clustering, Explanation]
Uses of the Semantic Web
Improve e-business processes
Improve business-to-business (B2B) communication
"assist human users in their day-to-day online activities" (Antoniou & van Harmelen, 2004)
"build knowledge and understanding from raw data" (Daconta et al., 2003)
- Improve knowledge management
- Improve information retrieval
- Automate tasking
- Integrate data
- Maximize customer value and profits
Implementing the Semantic Web
Convert data to XML format according to defined XML schemas
Expose applications as Web services
Build ontologies that specify semantic meanings and the relationships between data
Create agents that make use of the semantic data, automate search processes,
and automate other business processes
Issues Concerned with Implementing the Semantic Web
Cost
Security
Nonstandard technology issues
Semantic precision
Any Questions?
References
Antoniou, G., & van Harmelen, F. (2004). A semantic Web primer. Cambridge, MA: The MIT Press.
Athauda, R. I. (2000). Integration and querying of heterogeneous, autonomous, distributed database systems (Vol. 61/06, pp. 3126): Florida International University.
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), 34-43.
Carey, P., & Kemper, M. (2003). New perspectives on creating Web pages with HTML and Dynamic HTML (2nd ed.). Boston: Course Technology.
Daconta, M. C., Obrst, L. J., & Smith, K. T. (2003). The Semantic Web: A guide to the future of XML, Web services, and knowledge management. Indianapolis, IN: Wiley Publishing, Inc.
Ewalt, D. M. (2002, October 14). Semantic Web. InformationWeek, 35-44.
Galitz, W. O. (2002). The essential guide to user interface design. New York: John Wiley & Sons, Inc.
Gould, M. (1996). Rules in the virtual society. International Review of Law, Computers & Technology, 10(2), 199-218.
Kalakota, R., & Robinson, M. (2001). e-Business 2.0: Roadmap for success. Upper Saddle River, NJ: Addison-Wesley.
Lexico Publishing Group, L. (2004). Inference. Retrieved December 7, 2004, from http://dictionary.reference.com/search?q=inference
McComb, D. (2004). Semantics in business systems: The savvy manager's guide. San Francisco, CA: Morgan Kaufmann Publishers.
Tiwana, A. (2002). The knowledge management toolkit. Upper Saddle River, NJ: Prentice Hall PTR.
Warren, P. (2003). The next steps for the WWW: Putting meaning into the Web. Computing & Control Engineering, 14(2), 27-31.
Young, M. J. (2002). XML step by step (2nd ed.). Redmond, WA: Microsoft Press.