CSA3080: Adaptive Hypertext Systems I
CSA3080: Lecture 10 © 2003- Chris Staff
Dr. Christopher Staff, Department of Computer Science & AI
University of Malta
Lecture 10: Representing Data, Information, and Knowledge II
Surface-based approaches
• Semantic representations and the ability to reason would give computational systems enormous potential
• Currently, it is not known what the limitations of the Semantic Web might be
• But modelling knowledge is certainly expensive (in time, money, and computation)
Surface-based approaches
• Surface-based approaches use the information that is available to approximate knowledge (information in its correct context), but recognise their limitations
• E.g., Mulder [Kwok01] uses an extended boolean IR system to attempt to answer (certain types of) questions.
• Reference
– Kwok, C.C.T., Etzioni, O., and Weld, D.S., 2001, “Scaling Question-Answering to the Web”, in Proceedings of the 10th International WWW Conference, Hong Kong, May 1-5, 2001. http://citeseer.nj.nec.com/kwok01scaling.html
Surface-based approaches
• Mulder
– Turns questions into partial phrases, and then submits a phrase query to an IR system
– “Does John love Mary?” is turned into the query “John loves Mary”
– Documents containing the phrase are evidence
– What are the limitations?
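Mulder's actual reformulation rules are not given in this lecture. The following is a minimal sketch, assuming one hypothetical rewrite rule for questions of the form “Does X verb Y?” and a naive third-person-singular inflection; question_to_phrase and phrase_evidence are illustrative names, not Mulder's.

```python
import re
from typing import List, Optional


def third_person(verb: str) -> str:
    """Naive third-person-singular inflection (love -> loves, watch -> watches).
    Irregular verbs such as 'have' or 'go' are not handled."""
    if verb.endswith(("s", "x", "z", "ch", "sh")):
        return verb + "es"
    if len(verb) > 1 and verb.endswith("y") and verb[-2] not in "aeiou":
        return verb[:-1] + "ies"
    return verb + "s"


def question_to_phrase(question: str) -> Optional[str]:
    """Hypothetical rule: 'Does X <verb> Y?' -> 'X <verb>s Y'; None if no match."""
    match = re.fullmatch(r"Does\s+(\w+)\s+(\w+)\s+(.+?)\s*\?", question.strip())
    if match is None:
        return None
    subject, verb, rest = match.groups()
    return f"{subject} {third_person(verb)} {rest}"


def phrase_evidence(phrase: str, documents: List[str]) -> List[int]:
    """Indices of documents containing the exact phrase (case-insensitive,
    on word boundaries); each hit is counted as supporting evidence."""
    pattern = re.compile(r"\b" + re.escape(phrase.lower()) + r"\b")
    return [i for i, doc in enumerate(documents) if pattern.search(doc.lower())]


docs = [
    "Everyone knows John loves Mary.",
    "John met Mary in Rome.",
    "Nobody believes that John loves Mary.",
]
phrase = question_to_phrase("Does John love Mary?")
print(phrase, phrase_evidence(phrase, docs))  # John loves Mary [0, 2]
```

Note that the third document counts as evidence even though it casts doubt on the statement, which illustrates one possible answer to the question about limitations.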
Surface-based approaches
• So, given that we are operating in a hypertextual environment, what can we use to:
– i) identify what is of interest to a user?
• Assumption 1: the user’s interest is represented by a description
• Assumption 2: the description is a formal statement
– ii) adapt hyperspace to the user?
Surface-based approaches
• This addresses our immediate concerns:
– i) identify what is of interest to a user
• so that a user doesn’t have to describe it
• user modelling: next lecture
– ii) adapt hyperspace to the user
• so that a user doesn’t have to find it
• adaptation techniques: in the last lecture(s)
Surface-based approaches
• At their most fundamental:
– An IR system is a document representation plus an algorithm for matching a query to documents
• Assume binary weights for terms
– A hypertext is a collection of nodes and links
• IR and hypertext both allow user interaction
• What else can we say about these structures and about user interaction, with a view to learning about the user?
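The two minimal structures above can be sketched as follows. This assumes whitespace tokenisation and uses the number of shared terms (the inner product of binary vectors) as the matching algorithm; both choices are illustrative, not prescribed by the slide.

```python
from typing import Dict, List, Set, Tuple


def binary_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Document representation with binary weights: a term is present or absent."""
    return {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}


def match(query: str, index: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Matching algorithm: score = number of query terms a document contains
    (the inner product of binary query and document vectors)."""
    q = set(query.lower().split())
    scores = {doc_id: len(q & terms) for doc_id, terms in index.items()}
    hits = [d for d, s in scores.items() if s > 0]
    return [(d, scores[d]) for d in sorted(hits, key=lambda d: (-scores[d], d))]


# A hypertext as nodes plus directed links (node -> set of child nodes).
Hypertext = Dict[str, Set[str]]


def parents(hypertext: Hypertext, node: str) -> List[str]:
    """Nodes that link to `node`, sorted by name."""
    return sorted(src for src, children in hypertext.items() if node in children)


docs = {
    "d1": "adaptive hypertext systems",
    "d2": "information retrieval systems",
    "d3": "user modelling",
}
links: Hypertext = {"home": {"d1", "d2"}, "d1": {"d2"}, "d2": set(), "d3": set()}
print(match("hypertext systems", binary_index(docs)))  # [('d1', 2), ('d2', 1)]
print(parents(links, "d2"))  # ['d1', 'home']
```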
Surface-based approaches
• IR:
– User submits a query
– System returns relevant documents
– User reads/accesses some of them
– With relevance feedback, the user can select examples of relevant/non-relevant documents and the IR system will modify the query
– If we “remember” users, we can remember the terms they used and the documents they viewed
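The slide does not say how the IR system modifies the query. One classic technique is Rocchio's method, sketched below over binary term vectors; it is not necessarily what any system in this lecture uses, and the weights alpha = 1.0, beta = 0.75, gamma = 0.15 are commonly quoted defaults, treated here as assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set


def rocchio(query: Set[str], relevant: List[Set[str]], non_relevant: List[Set[str]],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Dict[str, float]:
    """Rocchio relevance feedback over binary term vectors:
    q' = alpha*q + beta*mean(relevant) - gamma*mean(non_relevant).
    Terms whose weight drops to zero or below are removed from the query."""
    weights: Dict[str, float] = defaultdict(float)
    for term in query:
        weights[term] += alpha
    for doc in relevant:
        for term in doc:
            weights[term] += beta / len(relevant)
    for doc in non_relevant:
        for term in doc:
            weights[term] -= gamma / len(non_relevant)
    return {term: w for term, w in weights.items() if w > 0}


new_query = rocchio(
    {"jaguar"},
    relevant=[{"jaguar", "car", "speed"}],
    non_relevant=[{"jaguar", "cat", "habitat"}],
)
print(sorted(new_query))  # ['car', 'jaguar', 'speed']
```

A system that “remembers” users could keep each user's modified query between sessions; that persistence is not shown here.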
Surface-based approaches
• IR:
– Documents may be relevant to different queries
• Can we learn anything from this?
– Some words in a query are used as context (to eliminate documents containing different word senses)
– Relevance feedback
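Using some query words as context can be sketched as a simple filter. This assumes the caller already knows which query term is ambiguous (detecting ambiguity is not shown) and treats the remaining query terms as the context; filter_by_context is a hypothetical name.

```python
from typing import Dict, Iterable, List


def filter_by_context(docs: Dict[str, str], ambiguous_term: str,
                      context_terms: Iterable[str]) -> List[str]:
    """Keep documents that contain the ambiguous query term together with at
    least one context term from the same query. Documents using the term
    without any of that context are assumed to use a different word sense."""
    context = {t.lower() for t in context_terms}
    kept = []
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        if ambiguous_term.lower() in terms and terms & context:
            kept.append(doc_id)
    return kept


docs = {
    "a": "the jaguar is the fastest car in its class",
    "b": "the jaguar hunts in the rainforest",
    "c": "car insurance rates",
}
print(filter_by_context(docs, "jaguar", ["car", "engine"]))  # ['a']
```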
Surface-based approaches
• Hypertext
– What’s a link, really?
– Navigation history
– Automatic link “typing”
– Contextualisation of information
• Is a document necessarily identically relevant to all parents?
• Is all of a document necessarily relevant to all parents?
– Can we learn anything about documents which link to the same child/children?
– Are assumptions made about information by authors along a path?
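One way to begin answering the question about documents that link to the same child is bibliographic coupling, a measure borrowed from citation analysis: two documents are coupled by the number of children they both link to. A minimal sketch; the link data is invented for illustration.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def shared_children(links: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """For each pair of documents, count the children both link to
    (bibliographic coupling). Pairs with no shared child are omitted."""
    coupled = {}
    for a, b in combinations(sorted(links), 2):
        common = links[a] & links[b]
        if common:
            coupled[(a, b)] = len(common)
    return coupled


links = {"p1": {"c1", "c2"}, "p2": {"c2", "c3"}, "p3": {"c4"}}
print(shared_children(links))  # {('p1', 'p2'): 1}
```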
Surface-based approaches
• HyperContext
– If we index multiple representations of the same document, will retrieval effectiveness improve?
– Can information be added to an interpretation (from its parents) to improve relevance?
– Can information be removed from an interpretation if it is non-relevant to a parent?
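The lecture does not specify how HyperContext builds its interpretations. The sketch below only illustrates the add/remove idea under stated assumptions: the child is split into segments (e.g. paragraphs), each parent contributes its link context (e.g. anchor text), and the stopword list and the names interpretation and interpretations are hypothetical.

```python
import re
from typing import Dict, List, Set

# Hypothetical stopword list, so that words like "the" do not count as overlap.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def terms(text: str) -> Set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def interpretation(child_segments: List[str], link_context: str) -> Set[str]:
    """One interpretation of a child document as seen from one parent.
    Segments sharing no term with the parent's link context are dropped
    (removal); the link-context terms are added even if the child never
    uses them (addition)."""
    context = terms(link_context)
    kept: Set[str] = set()
    for segment in child_segments:
        seg_terms = terms(segment)
        if seg_terms & context:
            kept |= seg_terms
    return kept | context


def interpretations(child_segments: List[str], parents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Index one representation of the same child per parent."""
    return {parent: interpretation(child_segments, ctx) for parent, ctx in parents.items()}


child = ["Jaguar engines and top speed", "Company history and founders"]
views = interpretations(child, {"cars.html": "jaguar car performance",
                                "history.html": "company history"})
print(sorted(views["cars.html"]))
# ['car', 'engines', 'jaguar', 'performance', 'speed', 'top']
```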
Surface-based approaches
• This surface-based approach improves retrieval by filtering out non-relevant terms from documents and by adding relevant terms to documents
– reducing the number of false positives
– increasing the chances of locating a relevant document
• It does nothing to expose the “meaning” of the data in the document
Other examples
• WebWatcher [Armstrong95]
– Adds the user’s search terms to the links on the path to a relevant document so that future users can be guided
– Added terms do not need to be present anywhere in the hypertext
• Reference
– Armstrong, R., Freitag, D., Joachims, T., and Mitchell, T., 1995, “WebWatcher: A Learning Apprentice for the World Wide Web”, in 1995 AAAI Spring Symposium on Information Gathering from Heterogeneous Distributed Environments, March 1995. http://citeseer.nj.nec.com/armstrong95webwatcher.html
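WebWatcher itself learns which links to recommend; the sketch below is a deliberately simpler illustration of the annotation idea only, assuming a search counts as successful when the user reaches a page they mark as relevant. The class LinkAnnotations and its methods are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class LinkAnnotations:
    """Accumulates users' search terms on the links they followed."""

    def __init__(self) -> None:
        self.terms: Dict[Tuple[str, str], Counter] = defaultdict(Counter)

    def record_success(self, path: List[str], query: str) -> None:
        """Add the query terms to every link on a path ending at a relevant page.
        The terms need not occur anywhere in the hypertext itself."""
        words = query.lower().split()
        for source, target in zip(path, path[1:]):
            self.terms[(source, target)].update(words)

    def suggest(self, page: str, out_links: List[str], query: str) -> Optional[str]:
        """Return the outgoing link whose annotations best match the query,
        or None if no annotation matches."""
        words = set(query.lower().split())

        def score(target: str) -> int:
            counts = self.terms.get((page, target), Counter())
            return sum(counts[w] for w in words)

        best = max(out_links, key=score, default=None)
        return best if best is not None and score(best) > 0 else None


guide = LinkAnnotations()
guide.record_success(["home", "courses", "csa3080"], "adaptive hypertext")
print(guide.suggest("home", ["news", "courses"], "hypertext"))  # courses
```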
Other examples
• Analysing Query Logs
– Can documents be clustered according to the terms that are used in queries?
– Can queries be automatically expanded to find documents relevant to what the user intended to ask for?
– Can we use the results of past similar queries?
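The first two questions can be sketched with simple set operations. This assumes a log of (query, clicked document) pairs and uses Jaccard overlap with an arbitrary threshold of 0.3 to decide which past queries are “similar”; both the log format and the threshold are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two term sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_query(query: str, past_queries: Iterable[str], threshold: float = 0.3) -> Set[str]:
    """Add the terms of sufficiently similar past queries to the new query."""
    q = set(query.lower().split())
    expanded = set(q)
    for past in past_queries:
        p = set(past.lower().split())
        if jaccard(q, p) >= threshold:
            expanded |= p
    return expanded


def query_profiles(log: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Describe each clicked document by the query terms that led to it;
    documents with similar profiles are candidates for the same cluster."""
    profiles: Dict[str, Set[str]] = defaultdict(set)
    for query, clicked_doc in log:
        profiles[clicked_doc] |= set(query.lower().split())
    return dict(profiles)


print(sorted(expand_query("adaptive hypertext",
                          ["adaptive hypertext user model", "football scores"])))
# ['adaptive', 'hypertext', 'model', 'user']
```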
Other examples
• Analysing “context paths” [Mizuuchi99]
– Terms “assumed” in Web pages may be explicit in the access paths to those Web pages
– Users who follow links will have read the information
– But the information will be missing from the destination page
• Reference
– Mizuuchi, Y., and Tajima, K., 1999, “Finding Context Paths for Web Pages”, in Proceedings of Hypertext 99. http://citeseer.nj.nec.com/mizuuchi99finding.html
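Mizuuchi and Tajima's own procedure for choosing significant paths is not reproduced here. This sketch only shows the basic observation: collect terms that appear along an access path (assumed here to be the anchor texts or titles of the pages followed) but never on the destination page. The function names are hypothetical.

```python
import re
from typing import List


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def context_path_terms(path_texts: List[str], destination_text: str) -> List[str]:
    """Terms that appear along the access path but never on the destination
    page, in the order they are first met on the path."""
    on_page = set(tokens(destination_text))
    missing: List[str] = []
    for text in path_texts:
        for term in tokens(text):
            if term not in on_page and term not in missing:
                missing.append(term)
    return missing


path = ["University of Malta", "Computer Science", "CSA3080"]
page = "Lecture 10 slides: surface-based approaches"
print(context_path_terms(path, page))
# ['university', 'of', 'malta', 'computer', 'science', 'csa3080']
```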
Other examples
– Use implicit link types to determine whether a path is “significant”
– Link types:
• intradirectory
• downward
• upward
• sibling
• intersite
– Link roles:
• entrance
• back
• jump
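The slide names the link types but not how they are detected. A plausible sketch, assuming the type is decided purely from the two URLs: intersite when the host differs, otherwise by comparing the directories of the source and target paths. These definitions, especially treating “sibling” as “different directory, neither ancestor nor descendant”, are assumptions and may differ from Mizuuchi and Tajima's; link roles are not modelled.

```python
from posixpath import dirname
from urllib.parse import urlparse


def link_type(source_url: str, target_url: str) -> str:
    """Classify a link as intersite, intradirectory, downward, upward or sibling."""
    src, dst = urlparse(source_url), urlparse(target_url)
    if src.netloc.lower() != dst.netloc.lower():
        return "intersite"
    src_dir = dirname(src.path) or "/"
    dst_dir = dirname(dst.path) or "/"
    if src_dir == dst_dir:
        return "intradirectory"
    if dst_dir.startswith(src_dir.rstrip("/") + "/"):
        return "downward"  # target lies in a subdirectory of the source
    if src_dir.startswith(dst_dir.rstrip("/") + "/"):
        return "upward"  # target lies in an ancestor directory
    return "sibling"


print(link_type("http://example.org/a/index.html", "http://example.org/a/b/page.html"))
# downward
```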
Conclusion
• Surface-based approaches to AHS frequently couple IR or log analysis with hypertext
• The IR aspect is typically term-feature based
• “Meaning” is embedded less in the words/phrases that occur in a document than in how the document is actually used
Conclusion
• These techniques can be coupled with NL techniques, such as Named Entity Recognition, to improve term recognition
– E.g., one document refers to the “President of the USA”, another to “George W. Bush”, and a query about George W. Bush is specified as “George Bush”
• These techniques still cannot reason about the content of documents
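Recognising entities needs a trained recogniser; the sketch below shows only the normalisation step that could follow it, assuming a hand-built, hypothetical alias table that maps each surface form to one canonical identifier so that all three forms in the example are indexed as the same term. A real table would also have to handle ambiguity (“George Bush” can denote more than one person) and time-dependent titles such as “President of the USA”.

```python
import re
from typing import Dict

# Hypothetical alias table: surface form -> canonical entity identifier.
ALIASES: Dict[str, str] = {
    "george w. bush": "ENTITY:GWBush",
    "george bush": "ENTITY:GWBush",
    "president of the usa": "ENTITY:GWBush",
    "president of usa": "ENTITY:GWBush",
}


def normalise_entities(text: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Lower-case the text and replace known surface forms with their canonical
    identifier, trying longer forms first so the most specific alias wins."""
    result = text.lower()
    for alias in sorted(aliases, key=len, reverse=True):
        pattern = r"\b" + re.escape(alias) + r"\b"
        result = re.sub(pattern, aliases[alias], result)
    return result


print(normalise_entities("The President of USA spoke"))  # the ENTITY:GWBush spoke
print(normalise_entities("George W. Bush visited Malta"))  # ENTITY:GWBush visited malta
```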