CSA3080: Adaptive Hypertext Systems I

Page 1: CSA3080: Adaptive Hypertext Systems I

1 of [email protected] University of Malta

CSA3080: Lecture 10 © 2003- Chris Staff

CSA3080: Adaptive Hypertext Systems I

Dr. Christopher Staff

Department of Computer Science & AI

University of Malta

Lecture 10: Representing Data, Information, and Knowledge II

Page 2: CSA3080: Adaptive Hypertext Systems I

Surface-based approaches

• Semantic representations and the ability to reason would give computational systems enormous potential

• Currently, it is not known what the limitations of the Semantic Web might be

• But modelling knowledge is certainly expensive (in time, money, and computation)

Page 3: CSA3080: Adaptive Hypertext Systems I

Surface-based approaches

• Surface-based approaches attempt to approximate the use of information in the correct context (knowledge), while recognising their own limitations

• E.g., Mulder [Kwok01] uses an extended boolean IR system to attempt to answer (certain types of) questions.

• Reference

– Kwok, C.C.T., Etzioni, O., Weld, D.S., 2001, “Scaling Question-Answering to the Web”, in Proceedings of the 10th International WWW Conference, Hong Kong, May 1-5, 2001. http://citeseer.nj.nec.com/kwok01scaling.html

Page 4: CSA3080: Adaptive Hypertext Systems I

Surface-based approaches

• Mulder

– Turns questions into partial phrases, and then submits a phrase query to an IR system

– “Does John love Mary?” is turned into the query “John loves Mary”

– Documents containing the phrase are evidence

– What are the limitations?
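The question-to-phrase step can be sketched in a few lines of Python. This is only a toy illustration of the idea on the slide, not Mulder's actual algorithm: the function names, the single "Does X verb Y?" pattern, the naive verb inflection, and the three sample documents are all assumptions made for the example.

```python
def question_to_phrase(question: str) -> str:
    """Rewrite a 'Does X verb Y?' question as the declarative phrase 'X verbs Y'."""
    words = question.strip().rstrip("?").split()
    if len(words) < 3 or words[0].lower() != "does":
        raise ValueError("only 'Does <subject> <verb> ...?' questions are handled")
    subject, verb, rest = words[1], words[2], words[3:]
    # Naive third-person inflection (assumption); real morphology is harder.
    suffix = "es" if verb.endswith(("s", "sh", "ch", "x", "o")) else "s"
    return " ".join([subject, verb + suffix, *rest])


def supporting_documents(phrase: str, documents: dict[str, str]) -> list[str]:
    """Return ids of documents containing the exact phrase (case-insensitive)."""
    needle = phrase.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]


# Hypothetical mini-collection, invented for this example.
docs = {
    "d1": "Everyone knows that John loves Mary.",
    "d2": "Mary loves John, or so the story goes.",
    "d3": "John loves Mary's cooking.",
}
phrase = question_to_phrase("Does John love Mary?")
print(phrase)                              # John loves Mary
print(supporting_documents(phrase, docs))  # ['d1', 'd3']
```

Note that d3 is returned even though it is not evidence for the question, and d2 is ignored even though it is related: one concrete answer to "What are the limitations?".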

Page 5: CSA3080: Adaptive Hypertext Systems I

Surface-based approaches

• So, given that we are operating in a hypertextual environment, what can we use to i) identify what is of interest to a user

– Assumptions

• 1: the user interest is represented by a description

• 2: the description is a formal statement

• ii) adapt hyperspace to the user

Page 6: CSA3080: Adaptive Hypertext Systems I

Surface-based approaches

• This addresses our immediate concerns

– i) identify what is of interest to a user

• so that a user doesn’t have to describe it

• user modelling next lecture

– ii) adapt hyperspace to the user

• so that a user doesn’t have to find it

• adaptation techniques in the last lecture/s

Page 7: CSA3080: Adaptive Hypertext Systems I

Surface-based approaches

• At their most fundamental

– An IR system is document representation + algorithm for matching query to documents

• Assume binary weights for terms

– A hypertext is a collection of nodes and links

• IR and Hypertext allow user interaction

• What else can we say about the structures, user interaction, with a view to learning about the user?
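The two definitions above are small enough to write down directly. The following is a minimal sketch, assuming binary term weights as the slide suggests; the names `binary_terms`, `match`, and `Hypertext`, the overlap-count ranking, and the sample data are illustrative choices rather than anything prescribed in the lecture.

```python
from dataclasses import dataclass, field


def binary_terms(text: str) -> set[str]:
    """Binary document representation: a term is either present or absent."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {w for w in words if w}


def match(query: str, index: dict[str, set[str]]) -> list[str]:
    """Rank documents by the number of query terms they contain."""
    q = binary_terms(query)
    scores = {doc_id: len(q & terms) for doc_id, terms in index.items()}
    hits = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(hits, key=lambda doc_id: (-scores[doc_id], doc_id))


@dataclass
class Hypertext:
    """A hypertext as a set of nodes plus a set of directed links."""
    nodes: set[str] = field(default_factory=set)
    links: set[tuple[str, str]] = field(default_factory=set)

    def add_link(self, source: str, target: str) -> None:
        self.nodes.update((source, target))
        self.links.add((source, target))

    def children(self, node: str) -> list[str]:
        return sorted(t for s, t in self.links if s == node)


# Hypothetical documents and links, invented for this example.
index = {
    "a": binary_terms("Adaptive hypertext systems"),
    "b": binary_terms("Information retrieval systems"),
    "c": binary_terms("Cooking with herbs"),
}
print(match("hypertext systems", index))  # ['a', 'b']

web = Hypertext()
web.add_link("a", "b")
web.add_link("a", "c")
print(web.children("a"))  # ['b', 'c']
```

Neither structure says anything about the user yet; the interaction traces (queries submitted, links followed) are what the following slides try to exploit.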

Page 8: CSA3080: Adaptive Hypertext Systems I

Surface-based approaches

• IR:

– User submits query

– System returns relevant documents

– User reads/accesses some

– With relevance feedback, user can select examples of relevant/non-relevant documents and IR system will modify the query

– If we “remember” users we can remember terms used/documents viewed
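One common way to let relevance feedback "modify the query" is a Rocchio-style update. The slides do not name a specific method, so the sketch below is an assumption: it keeps the binary document representation, and the function name `modify_query`, the weights `alpha`, `beta`, `gamma`, and the sample documents are all illustrative.

```python
def modify_query(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update over binary documents (each document is a set of terms).

    Terms from relevant examples are boosted, terms from non-relevant examples
    are penalised, and only terms with a positive weight are kept.
    """
    weights = {term: alpha for term in query}
    for docs, factor in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for term in doc:
                weights[term] = weights.get(term, 0.0) + factor / len(docs)
    return {term: w for term, w in weights.items() if w > 0}


# Hypothetical feedback session, invented for this example.
query = {"jaguar"}
relevant = [{"jaguar", "cat", "habitat"}, {"jaguar", "cat", "prey"}]
non_relevant = [{"jaguar", "car", "engine"}]

new_query = modify_query(query, relevant, non_relevant)
print(sorted(new_query))  # ['cat', 'habitat', 'jaguar', 'prey']
```

If the system "remembers" the user, the modified query (or just the terms the user typed and the documents they opened) can be stored and reused as a crude description of their interests.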

Page 9: CSA3080: Adaptive Hypertext Systems I

Surface-based approaches

• IR:

– Documents may be relevant to different queries

• Can we learn anything from this?

– Some words in a query are used as context (to eliminate documents containing different word senses)

– Relevance feedback

Page 10: CSA3080: Adaptive Hypertext Systems I

Surface-based approaches

• Hypertext

– What’s a link really?

– Navigation history

– Automatic link “typing”

– Contextualisation of information

• Is a document necessarily identically relevant to all parents?

• Is all of a document necessarily relevant to all parents?

– Can we learn anything about documents which link to the same child/children?

– Are assumptions made about information by authors along a path?

Page 11: CSA3080: Adaptive Hypertext Systems I

Surface-based approaches

• HyperContext

– If we index multiple representations of the same document, will retrieval effectiveness improve?

– Can information be added to an interpretation (from its parents) to improve relevance?

– Can information be removed from an interpretation if it is non-relevant to a parent?

Page 12: CSA3080: Adaptive Hypertext Systems I

Surface-based approaches

• This surface-based approach improves retrieval by filtering out non-relevant terms from documents and by adding relevant terms to documents

– reducing the number of false positives

– increasing the chances of locating a relevant document

• It does nothing to expose the “meaning” of the data in the document
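The idea of indexing one interpretation of a document per parent can be sketched as follows. This is not HyperContext's actual algorithm: how the terms to remove and add are chosen is simply given by hand here, and the names `interpret` and `search`, the conjunctive matching, and the sample page are assumptions for illustration.

```python
def interpret(doc_terms, remove, add):
    """Build the interpretation of a document in the context of one parent."""
    return (set(doc_terms) - set(remove)) | set(add)


def search(query_terms, interpretations):
    """Return the (parent, child) interpretations containing every query term."""
    q = set(query_terms)
    return sorted(key for key, terms in interpretations.items() if q <= terms)


# Hypothetical page reachable from two different parents.
page = {"python", "snake", "language", "code"}
interpretations = {
    ("reptiles", "python-page"): interpret(page, {"language", "code"}, {"reptile"}),
    ("programming", "python-page"): interpret(page, {"snake"}, {"programming"}),
}

# Filtering: the programming interpretation no longer matches a snake query.
print(search({"python", "snake"}, interpretations))
# [('reptiles', 'python-page')]

# Adding: "programming" never occurs in the page itself, yet it is found.
print(search({"programming"}, interpretations))
# [('programming', 'python-page')]
```

The terms are still treated as opaque strings throughout, which is the sense in which nothing about the "meaning" of the document is exposed.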

Page 13: CSA3080: Adaptive Hypertext Systems I

Other examples

• WebWatcher [Armstrong95]

– Adds user’s search terms to links on path to relevant document so that future users can be guided

– Added terms do not need to be present anywhere in the hypertext

• Reference

– Armstrong, R., Freitag, D., Joachims, T., and Mitchell, T., 1995, “WebWatcher: A Learning Apprentice for the World Wide Web”, in 1995 AAAI Spring Symposium on Information Gathering from Heterogeneous Distributed Environments, March 1995. http://citeseer.nj.nec.com/armstrong95webwatcher.html
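The link-annotation idea can be illustrated with a small sketch. It only mirrors the description on the slide; WebWatcher's real learning method is more elaborate, and the names `link_terms`, `record_success`, and `suggest_links`, the count-based scoring, and the sample paths are assumptions.

```python
from collections import Counter, defaultdict

# Terms attached to each link (source, target) by earlier successful users.
link_terms: defaultdict = defaultdict(Counter)


def record_success(path, search_terms):
    """Attach a user's search terms to every link on the path they followed."""
    for source, target in zip(path, path[1:]):
        link_terms[(source, target)].update(search_terms)


def suggest_links(page, search_terms):
    """Rank the outgoing links of a page by overlap with earlier users' terms."""
    scores = {}
    for (source, target), terms in link_terms.items():
        if source == page:
            score = sum(terms[t] for t in search_terms)
            if score > 0:
                scores[target] = score
    return sorted(scores, key=lambda target: (-scores[target], target))


# Hypothetical sessions, invented for this example.
record_success(["home", "research", "ml-papers"], ["machine", "learning"])
record_success(["home", "teaching", "csa3080"], ["hypertext"])

print(suggest_links("home", ["learning"]))   # ['research']
print(suggest_links("home", ["hypertext"]))  # ['teaching']
```

The term "learning" need not appear on the home page or on the research page: it is attached to the link purely because an earlier user who was looking for it went that way.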

Page 14: CSA3080: Adaptive Hypertext Systems I

Other examples

• Analysing Query Logs

– Can documents be clustered according to the terms that are used in queries?

– Can queries be automatically expanded to find documents relevant to what the user intended to ask for?

– Can we use the results of past similar queries?
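A crude way to explore these questions is to group logged query terms by the document the user went on to view, and to reuse those groups for expansion. The sketch below assumes a log of (query, viewed document) pairs; that log format, the function names, and the any-overlap expansion rule are illustrative assumptions, not a method given in the lecture.

```python
from collections import defaultdict


def terms_by_document(log):
    """Collect, for each document, the terms of all queries that led to it."""
    doc_terms = defaultdict(set)
    for query, doc_id in log:
        doc_terms[doc_id].update(query.lower().split())
    return dict(doc_terms)


def expand_query(query, log):
    """Add terms from past queries that led to the same documents."""
    q = set(query.lower().split())
    expanded = set(q)
    for terms in terms_by_document(log).values():
        if q & terms:  # any shared term counts as "similar" (a crude assumption)
            expanded |= terms
    return sorted(expanded)


# Hypothetical query log, invented for this example.
log = [
    ("cheap flights malta", "d1"),
    ("air malta tickets", "d1"),
    ("python tutorial", "d2"),
]

print(expand_query("flights", log))
# ['air', 'cheap', 'flights', 'malta', 'tickets']
```

Documents whose query-term sets overlap heavily could be clustered in the same spirit, although a single shared term is far too weak a signal for real use.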

Page 15: CSA3080: Adaptive Hypertext Systems I

Other examples

• Analysing “context paths” [Mizuuchi99]

– Terms “assumed” in Web pages may be explicit in the access paths to those Web pages

– Users who follow links will have read the information

– But the information will be missing from the destination page

• Reference

– Mizuuchi, Y., and Tajima, K., 1999, “Finding Context Paths for Web Pages”, in Proc. Hypertext 99. http://citeseer.nj.nec.com/mizuuchi99finding.html

Page 16: CSA3080: Adaptive Hypertext Systems I

Other examples

– Use implicit link types to determine whether a path is “significant”

– Link types:

• intradirectory

• downward

• upward

• sibling

• intersite

– Link roles:

• entrance

• back

• jump
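The link types listed above are "implicit" in the sense that they can be read off the URLs of the two pages. The sketch below is one plausible reading based on comparing directory paths; the exact definitions, the `other` fallback, and the example URLs are assumptions rather than the definitions used in [Mizuuchi99], and link roles are not covered.

```python
import posixpath
from urllib.parse import urlsplit


def directory(url: str) -> str:
    """Directory part of a URL path, always ending in '/'."""
    path = urlsplit(url).path or "/"
    folder = path if path.endswith("/") else posixpath.dirname(path)
    return folder.rstrip("/") + "/"


def link_type(source: str, target: str) -> str:
    """Classify a link by comparing the locations of its two end points."""
    if urlsplit(source).netloc.lower() != urlsplit(target).netloc.lower():
        return "intersite"
    src, dst = directory(source), directory(target)
    if src == dst:
        return "intradirectory"
    if dst.startswith(src):
        return "downward"  # target lives below the source directory
    if src.startswith(dst):
        return "upward"    # target lives above the source directory
    if posixpath.dirname(src.rstrip("/")) == posixpath.dirname(dst.rstrip("/")):
        return "sibling"   # the two directories share a parent
    return "other"         # fallback category, an assumption of this sketch


# Hypothetical URLs, invented for this example.
base = "http://example.org/courses/csa3080/index.html"
print(link_type(base, "http://example.org/courses/csa3080/lecture10.html"))  # intradirectory
print(link_type(base, "http://example.org/courses/csa3080/notes/ir.html"))   # downward
print(link_type(base, "http://example.org/courses/index.html"))              # upward
print(link_type(base, "http://example.org/courses/csa3090/index.html"))      # sibling
print(link_type(base, "http://example.com/"))                                # intersite
```

A path made up mostly of downward links from an entrance page is the kind of path that might be judged "significant" for supplying context, but that decision rule is not sketched here.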

Page 17: CSA3080: Adaptive Hypertext Systems I

Conclusion

• Surface-based approaches to AHS frequently couple IR or log analysis with hypertext

• The IR aspect is typically term-feature based

• “Meaning” lies less in the words/phrases that occur in a document than in how the document is actually used

Page 18: CSA3080: Adaptive Hypertext Systems I

Conclusion

• These techniques can be coupled with NL techniques, such as Entity Name Recognition, to improve term recognition

– E.g., the President of the USA in one document is referred to as George W. Bush in another. The query (which is about GWB) is specified as “George Bush”

• Still cannot do reasoning about the content of documents
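The George W. Bush example can be made concrete with a hand-built alias table that maps different surface forms to one entity identifier. A real entity-name recogniser would have to discover these forms itself; the table, the identifier, the function names, and the sample documents below are all assumptions made for the example.

```python
# Hand-built alias table (assumption). The mapping is only valid for the
# period of the lecture and ignores ambiguity, e.g. "George Bush" could
# also refer to George H. W. Bush.
ALIASES = {
    "george w. bush": "GEORGE_W_BUSH",
    "george bush": "GEORGE_W_BUSH",
    "president of usa": "GEORGE_W_BUSH",
}


def entities(text: str) -> set[str]:
    """Return the entity ids whose aliases occur in the text."""
    lowered = text.lower()
    return {entity for alias, entity in ALIASES.items() if alias in lowered}


def entity_search(query: str, documents: dict[str, str]) -> list[str]:
    """Return ids of documents mentioning any entity named in the query."""
    wanted = entities(query)
    return sorted(d for d, text in documents.items() if wanted & entities(text))


# Hypothetical documents, invented for this example.
docs = {
    "d1": "The President of USA visited Malta.",
    "d2": "George W. Bush signed the bill.",
    "d3": "Bush fires spread across Australia.",
}

print(entity_search("George Bush", docs))  # ['d1', 'd2']
```

A plain phrase match on "George Bush" would find neither d1 nor d2. The lookup improves term recognition, but it still performs no reasoning about what the documents say.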