CSA3080: Adaptive Hypertext Systems I


CSA3080: Lecture 10 © 2003- Chris Staff


Dr. Christopher Staff

Department of Computer Science & AI

University of Malta

Lecture 10: Representing Data, Information, and Knowledge II



Surface-based approaches

• Semantic representations and the ability to reason would give computational systems enormous potential

• Currently, it is not known what the limitations of the Semantic Web might be

• But it is certainly expensive to model knowledge (time, money, computationally)




• Surface-based approaches attempt to approximate the use of information in its correct context (knowledge), while recognising their own limitations

• E.g., Mulder [Kwok01] uses an extended boolean IR system to attempt to answer (certain types of) questions.

• Reference

– Kwok, C.C.T., Etzioni, O., Weld, D.S., 2001, “Scaling Question-Answering to the Web”, in Proceedings of the 10th International WWW Conference, Hong Kong, May 1-5, 2001. http://citeseer.nj.nec.com/kwok01scaling.html




• Mulder

– Turns questions into partial phrases, and then submits a phrase query to an IR system

– “Does John love Mary?” is turned into the query “John loves Mary”

– Documents containing the phrase are evidence

– What are the limitations?
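The question-to-phrase step can be illustrated with a minimal Python sketch. It assumes a single hypothetical rewrite rule that covers only the "Does X verb Y?" shape of the example; the names `question_to_phrase` and `count_evidence` are invented for illustration, and none of this is Mulder's actual rule set.

```python
import re

# Hypothetical single rule: "Does/Do <subject> <verb> <rest>?" -> "<subject> <verb(s)> <rest>".
# Mulder's real reformulation rules are richer; this only covers the slide's example shape.
YES_NO = re.compile(r"^(does|do)\s+(\w+)\s+(\w+)\s+(.*?)\??$", re.IGNORECASE)


def question_to_phrase(question: str) -> str:
    """Turn a simple yes/no question into a declarative phrase query."""
    match = YES_NO.match(question.strip())
    if match is None:
        # Unknown question shape: fall back to the question text itself.
        return question.strip().rstrip("?")
    aux, subject, verb, rest = match.groups()
    if aux.lower() == "does":
        verb += "s"  # naive third-person inflection, wrong for irregular verbs
    return f"{subject} {verb} {rest}"


def count_evidence(phrase: str, documents: list[str]) -> int:
    """Count documents that contain the exact phrase (case-insensitive)."""
    needle = phrase.lower()
    return sum(needle in doc.lower() for doc in documents)


docs = [
    "Everyone knows that John loves Mary.",
    "Mary loves John, or so she says.",
    "John loves Mary's cooking, nothing more.",
]
phrase = question_to_phrase("Does John love Mary?")
print(phrase, count_evidence(phrase, docs))
```

In this sketch the third document is counted as evidence even though it does not support the answer, which is one limitation of pure phrase matching.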




• So, given that we are operating in a hypertextual environment, what can we use to

– i) identify what is of interest to a user

• Assumptions

• 1: the user interest is represented by a description

• 2: description is a formal statement

– ii) adapt hyperspace to the user




• This addresses our immediate concerns

– i) identify what is of interest to a user

• so that a user doesn’t have to describe it

• user modelling next lecture

– ii) adapt hyperspace to the user

• so that a user doesn’t have to find it

• adaptation techniques in the last lecture/s




• At their most fundamental

– An IR system is document representation + algorithm for matching query to documents

• Assume binary weights for terms

– A hypertext is a collection of nodes and links

• IR and Hypertext allow user interaction

• What else can we say about the structures and the user interaction, with a view to learning about the user?
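The two definitions above can be made concrete with a small Python sketch. It assumes binary term weights (a term is either in a document or not) and boolean AND matching; the document ids, texts and the `links` structure are invented for illustration.

```python
def terms(text: str) -> set[str]:
    """Binary representation: a term is either present (in the set) or absent."""
    return set(text.lower().split())


def match(query: str, documents: dict[str, str]) -> list[str]:
    """Return ids of documents that contain every query term (boolean AND)."""
    wanted = terms(query)
    return sorted(doc_id for doc_id, text in documents.items() if wanted <= terms(text))


# A hypertext as nodes (the documents) plus links (source -> targets).
documents = {
    "home": "adaptive hypertext systems course",
    "ir": "information retrieval systems match query terms",
    "um": "user modelling for adaptive systems",
}
links = {"home": ["ir", "um"], "ir": ["home"], "um": []}

print(match("adaptive systems", documents))
```

Neither structure records anything about the user yet; that information has to come from the interaction (queries submitted, links followed).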




• IR:

– User submits query

– System returns relevant documents

– User reads/accesses some

– With relevance feedback, user can select examples of relevant/non-relevant documents and IR system will modify the query

– If we “remember” users we can remember terms used/documents viewed
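One common way to modify a query from relevance feedback is a Rocchio-style update, which the slides do not name; it is used here only as an illustrative stand-in. The sketch assumes binary document representations, and the parameter values `alpha`, `beta` and `gamma` are conventional defaults rather than anything taken from the lecture.

```python
from collections import Counter


def rocchio(query_terms, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Return modified query weights; each document is a list of terms (binary presence)."""
    weights = Counter()
    for term in query_terms:
        weights[term] += alpha
    for doc in relevant_docs:
        for term in set(doc):
            weights[term] += beta / len(relevant_docs)
    for doc in nonrelevant_docs:
        for term in set(doc):
            weights[term] -= gamma / len(nonrelevant_docs)
    # Keep only terms with positive weight in the modified query.
    return {term: w for term, w in weights.items() if w > 0}


new_query = rocchio(
    query_terms=["jaguar"],
    relevant_docs=[["jaguar", "car", "engine"]],
    nonrelevant_docs=[["jaguar", "cat"]],
)
print(sorted(new_query))
```

Storing `new_query` per user is one simple way of "remembering" the terms a user has shown interest in.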




• IR:

– Documents may be relevant to different queries

• Can we learn anything from this?

– Some words in a query are used as context (to eliminate docs containing different word senses)

– Relevance feedback




• Hypertext

– What’s a link really?

– Navigation history

– Automatic link “typing”

– Contextualisation of information

• Is a document necessarily identically relevant to all parents?

• Is all of a document necessarily relevant to all parents?

– Can we learn anything about documents which link to the same child/children?

– Are assumptions made about information by authors along a path?




• HyperContext

– If we index multiple representations of the same document, will retrieval effectiveness improve?

– Can information be added to an interpretation (from its parents) to improve relevance?

– Can information be removed from an interpretation if it is non-relevant to a parent?




• This surface-based approach improves retrieval by filtering out non-relevant terms from documents and by adding relevant terms to documents

– reducing the number of false positives

– increasing the chances of locating a relevant document

• It does nothing to expose the “meaning” of the data in the document
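The idea of indexing several interpretations of one document, each with terms added or removed according to the parent, can be sketched as follows. This is a hypothetical illustration, not HyperContext's actual algorithm: the terms to add and remove are supplied by hand, and all page names and terms are invented.

```python
def make_interpretation(doc_terms, add=(), remove=()):
    """One parent-specific view of a document: drop some terms, add others."""
    return (set(doc_terms) - set(remove)) | set(add)


def search(index, query_terms):
    """Return (parent, document) keys whose interpretation contains all query terms."""
    wanted = set(query_terms)
    return sorted(key for key, terms in index.items() if wanted <= terms)


jaguar_doc = {"jaguar", "speed", "engine", "habitat"}

# The same document indexed once per parent that links to it.
index = {
    ("cars_page", "jaguar_doc"): make_interpretation(
        jaguar_doc, add={"car"}, remove={"habitat"}),
    ("zoo_page", "jaguar_doc"): make_interpretation(
        jaguar_doc, add={"cat"}, remove={"engine"}),
}

print(search(index, {"jaguar", "car"}))
```

A query for "jaguar habitat" now matches only the interpretation reached from `zoo_page`, which shows how removing terms reduces false positives. The terms are still only strings, so nothing about their meaning is exposed.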



Other examples

• WebWatcher [Armstrong95]

– Adds user’s search terms to links on path to relevant document so that future users can be guided

– Added terms do not need to be present anywhere in the hypertext

• Reference

– R. Armstrong, D. Freitag, T. Joachims, and T. Mitchell. WebWatcher: A learning apprentice for the World Wide Web. In 1995 AAAI Spring Symposium on Information Gathering from Heterogeneous Distributed Environments, March 1995. http://citeseer.nj.nec.com/armstrong95webwatcher.html
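The link-annotation idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not WebWatcher's actual learning method: links are scored by plain term overlap, and the function names and page ids are hypothetical.

```python
from collections import defaultdict

# (source page, target page) -> search terms of users who reached
# a relevant document through this link.
link_terms: dict[tuple[str, str], set[str]] = defaultdict(set)


def record_successful_path(path: list[str], search_terms: set[str]) -> None:
    """Attach the user's search terms to every link on the path they followed."""
    for link in zip(path, path[1:]):
        link_terms[link] |= search_terms


def recommend(page: str, out_links: list[str], search_terms: set[str]) -> list[str]:
    """Rank outgoing links by overlap between their annotations and the new user's terms."""
    scored = []
    for target in out_links:
        overlap = len(link_terms.get((page, target), set()) & search_terms)
        if overlap:
            scored.append((-overlap, target))
    return [target for _, target in sorted(scored)]


record_successful_path(["home", "research", "ml_papers"], {"machine", "learning"})
print(recommend("home", ["teaching", "research"], {"learning", "agents"}))
```

The annotations live on the links, so a term such as "learning" can guide a later user even if it never appears in the text of `home` or `research`.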



Other examples

• Analysing Query Logs

– Can documents be clustered according to the terms that are used in queries?

– Can queries be automatically expanded to find documents relevant to what the user intended to ask for?

– Can we use the results of past similar queries?
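As one possible reading of the last question, the sketch below reuses the documents accessed for past queries that are similar to a new one. It assumes a log of (query, accessed documents) pairs and uses Jaccard overlap of query terms with an arbitrary threshold of 0.5; the log entries and the function names are invented for illustration.

```python
def jaccard(a, b) -> float:
    """Jaccard similarity of two term collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def docs_from_similar_queries(log, query: str, threshold: float = 0.5) -> list[str]:
    """Collect documents accessed for past queries whose terms overlap enough with the new query."""
    query_terms = query.lower().split()
    found: list[str] = []
    for past_query, accessed in log:
        if jaccard(query_terms, past_query.lower().split()) >= threshold:
            for doc_id in accessed:
                if doc_id not in found:
                    found.append(doc_id)
    return found


query_log = [
    ("cheap flights malta", ["d1", "d2"]),
    ("malta weather", ["d3"]),
]
print(docs_from_similar_queries(query_log, "flights malta"))
```

The same log could be turned around to describe each document by the query terms that led to it, which is one starting point for clustering documents by query terms.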



Other examples

• Analysing “context paths” [Mizuuchi99]

– Terms “assumed” in Web pages may be explicit in the access paths to those Web pages

– Users who follow links will have read the information

– But the information will be missing from the destination page

• Reference

– Mizuuchi, Y., and Tajima, K., 1999, “Finding Context Paths for Web Pages”, in Proc. Hypertext 99. http://citeseer.nj.nec.com/mizuuchi99finding.html



Other examples

– Use implicit link types to determine whether a path is “significant”

– Link types:

• intradirectory

• downward

• upward

• sibling

• intersite

– Link roles:

• entrance

• back

• jump
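Implicit link types can be guessed from the URLs of the source and destination pages alone. The sketch below is a rough illustration using Python's standard `urllib.parse`; the definitions it encodes (for example, treating every other same-site link as "sibling") are assumptions made here and may differ from those in [Mizuuchi99]. Link roles are not covered.

```python
import posixpath
from urllib.parse import urlparse


def link_type(source_url: str, target_url: str) -> str:
    """Classify a link by comparing the host and directory of its two ends."""
    source, target = urlparse(source_url), urlparse(target_url)
    if source.netloc != target.netloc:
        return "intersite"
    source_dir = posixpath.dirname(source.path)
    target_dir = posixpath.dirname(target.path)
    if source_dir == target_dir:
        return "intradirectory"
    if target_dir.startswith(source_dir.rstrip("/") + "/"):
        return "downward"  # target lies below the source directory
    if source_dir.startswith(target_dir.rstrip("/") + "/"):
        return "upward"  # target lies above the source directory
    return "sibling"  # assumption: any other same-site link


print(link_type("http://a.org/x/index.html", "http://a.org/x/y/page.html"))
```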



Conclusion

• Surface-based approaches to AHS frequently couple IR or log analysis with hypertext

• The IR aspect is typically term-feature based

• “Meaning” is taken to lie less in the words/phrases that occur in a document than in how the document is actually used



Conclusion

• These techniques can be coupled with NL techniques, such as Entity Name Recognition, to improve term recognition

– E.g., President of USA in one doc is referred to as George W. Bush in another. Query (which is about GWB) is specified as “George Bush”

• Still cannot do reasoning about the content of documents
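The effect of recognising entity names before term matching can be sketched with a hand-written alias table. This is a hypothetical illustration: a real system would obtain the aliases from an entity recogniser, and the table, the token `entity_gwb` and the function names are invented here.

```python
# Hypothetical alias table mapping surface forms to one canonical token.
ALIASES = {
    "president of usa": "entity_gwb",
    "george w. bush": "entity_gwb",
    "george bush": "entity_gwb",
}


def normalise(text: str) -> str:
    """Replace every known alias with its canonical token (longest alias first)."""
    result = text.lower()
    for alias in sorted(ALIASES, key=len, reverse=True):
        result = result.replace(alias, ALIASES[alias])
    return result


def matching_docs(query: str, documents: list[str]) -> list[int]:
    """Return positions of documents containing every normalised query term."""
    wanted = set(normalise(query).split())
    return [i for i, doc in enumerate(documents)
            if wanted <= set(normalise(doc).split())]


documents = [
    "The President of USA visited Malta",
    "George W. Bush gave a speech",
]
print(matching_docs("George Bush", documents))
```

With the aliases applied, the query "George Bush" matches both documents; plain term matching on the raw text would find only the second. The sketch still only matches strings and does no reasoning about the content.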
