Information Retrieval Prepared by: Cong Chau Supervised by: Prof. Esma Aimeur.

Information Retrieval

Prepared by: Cong Chau

Supervised by: Prof. Esma Aimeur

Cong Chau, Information Retrieval, March 2004

Agenda

• Introduction

• Present Technology

• Advanced Technology

• Trends

• Conclusion

• Definitions: data, information, knowledge.

• Information is power.

Taxonomy

• User task: Retrieval (Specific purpose, Filtering), Browsing

• Classic Models: Boolean, Vector, Probabilistic

• Set Theoretic: Fuzzy, Extended Boolean

• Algebraic: Generalized Vector, Latent Semantic Indexing, Neural Network

• Probabilistic: Inference Network, Belief Network

• Structured Models: Non-overlapping Lists, Proximal Nodes

• Browsing: Flat, Structure Guided, Hypertext

Boolean Model

• Based on set theory and Boolean algebra: AND, OR, NOT, ...
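
A minimal sketch of Boolean retrieval, assuming a toy collection and an inverted index built on the spot. The document texts and the helper name `postings` are hypothetical and only illustrate how AND, OR and NOT map onto set operations.

```python
# Hypothetical toy collection: document id -> text.
docs = {
    1: "data structure",
    2: "information retrieval",
    3: "information age",
    4: "knowledge base",
}

# Inverted index: term -> set of ids of the documents that contain it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

all_ids = set(docs)


def postings(term):
    """Return the set of documents containing the term (empty if unknown)."""
    return index.get(term, set())


# AND is set intersection, OR is set union, NOT is the complement.
and_result = postings("information") & postings("retrieval")
or_result = postings("data") | postings("knowledge")
not_result = all_ids - postings("information")
```

A document either matches the query or it does not; there is no ranking among the matching documents.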

Vector Model

• Assigns weights to index terms (based on their frequency of usage) and uses them to compute a degree of similarity between each document and the query.
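
A minimal sketch of vector-model ranking, assuming tf-idf weights and the cosine measure as the degree of similarity. The slide does not fix a weighting scheme, so this choice, the toy documents and the function names are assumptions.

```python
import math
from collections import Counter

# Hypothetical toy collection and query.
docs = ["data structure", "information retrieval", "information age"]
query = "information retrieval"

tokenized = [d.split() for d in docs]
n_docs = len(tokenized)
# Document frequency: in how many documents each term occurs.
df = Counter(term for tokens in tokenized for term in set(tokens))


def weights(tokens):
    """tf-idf weights: raw term frequency times log(N / document frequency)."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if t in df}


def cosine(u, v):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


q_vec = weights(query.split())
scores = [cosine(q_vec, weights(tokens)) for tokens in tokenized]
# Document indices ordered from most to least similar to the query.
ranking = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
```

Unlike the Boolean model, a document that shares only some query terms still receives a nonzero score.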

Probabilistic Model

• Each index term Ki is first assigned probabilities by an expert, for example P(Ki|R) = 0.3 and P(Ki|nonR) = 0.7.
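
A sketch of how such probabilities can be turned into a ranking score, assuming the usual log-odds term weight of the binary independence formulation (the slide itself does not give the formula). The term names and the second pair of probabilities are invented for illustration.

```python
import math


def term_weight(p_rel, p_nonrel):
    """Log-odds weight of a term that occurs in both the query and the document."""
    return math.log(p_rel / (1 - p_rel)) + math.log((1 - p_nonrel) / p_nonrel)


def score(doc_terms, query_terms, probs):
    """Sum the weights of the terms shared by the document and the query."""
    shared = set(doc_terms) & set(query_terms)
    return sum(term_weight(*probs[t]) for t in shared)


# Hypothetical estimates: term -> (P(Ki|R), P(Ki|nonR)).
probs = {
    "retrieval": (0.3, 0.7),  # the values quoted on the slide
    "ranking": (0.8, 0.2),    # invented, for contrast
}

w_slide = term_weight(0.3, 0.7)
s = score(["ranking", "retrieval", "model"], ["retrieval", "ranking"], probs)
```

With the slide's values the weight is negative: a term that is more likely in non-relevant documents than in relevant ones lowers the score.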

Fuzzy Model

• Deals with gradual degrees of membership rather than the abrupt Boolean values 0 or 1. It computes the relations (algebraic sums and algebraic products) between documents and fuzzy index terms.
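
A small sketch of the two operations named above, assuming the standard definitions: the algebraic product multiplies membership degrees, and the algebraic sum is one minus the product of their complements. The membership values are hypothetical.

```python
from math import prod


def algebraic_product(memberships):
    """Fuzzy AND: product of the membership degrees."""
    return prod(memberships)


def algebraic_sum(memberships):
    """Fuzzy OR: one minus the product of the complements."""
    return 1 - prod(1 - m for m in memberships)


# Hypothetical degrees (between 0 and 1) to which one document belongs
# to the fuzzy sets of two index terms.
doc_membership = {"information": 0.8, "retrieval": 0.5}
values = list(doc_membership.values())

and_degree = algebraic_product(values)
or_degree = algebraic_sum(values)
```

Both results stay between 0 and 1, so a document can match a query to a degree instead of all or nothing.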

Extended Boolean Model

• A Boolean model extended with partial matching and term weighting.
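
A sketch of partial matching with term weights, assuming the p-norm similarity formulas usually associated with the extended Boolean model. The weights and the default p = 2 are illustrative choices, not values from the slides.

```python
def sim_or(weights, p=2.0):
    """p-norm similarity of a document to a disjunctive (OR) query."""
    m = len(weights)
    return (sum(w ** p for w in weights) / m) ** (1 / p)


def sim_and(weights, p=2.0):
    """p-norm similarity of a document to a conjunctive (AND) query."""
    m = len(weights)
    return 1 - (sum((1 - w) ** p for w in weights) / m) ** (1 / p)


# Hypothetical document that carries the first query term with weight 0.8
# and does not contain the second query term at all.
doc_weights = [0.8, 0.0]

or_score = sim_or(doc_weights)
and_score = sim_and(doc_weights)
```

A strict Boolean AND would reject this document outright; here it still receives a small positive score, which is the partial matching the slide refers to.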

Generalized Vector Space Model

• Groups index terms that have somewhat similar meanings into a set, then associates binary (1 or 0) weights with the document-term pairs.

• For example, the similar words ‘data’, ‘information’ and ‘knowledge’ are grouped in a set. The documents linked with the word ‘data’, such as ‘data structure’ and ‘data base’, generate the first subset {1,1,0,0,0,0}. The word ‘information’ links with the two documents ‘information retrieval’ and ‘information age’ to form the second subset {0,0,1,1,0,0}. Similarly, the documents ‘knowledge management’ and ‘knowledge base’ give {0,0,0,0,1,1}.
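
The example above can be reproduced with a few lines of code. This is only a sketch of the binary document-term weights, with the six document titles taken from the slide in the order they are mentioned; the function name `incidence` is hypothetical.

```python
# The six documents of the example, in the order used on the slide.
docs = [
    "data structure",
    "data base",
    "information retrieval",
    "information age",
    "knowledge management",
    "knowledge base",
]
terms = ["data", "information", "knowledge"]


def incidence(term):
    """Binary weights: 1 if the document contains the term, else 0."""
    return [1 if term in doc.split() else 0 for doc in docs]


subsets = {term: incidence(term) for term in terms}
```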

Latent Semantic Indexing

• Matches documents through a term-document matrix instead of through their index terms alone.

• M = CTR
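
The slide does not define the factors in M = CTR. Assuming it denotes a three-factor decomposition of the term-document matrix M, as in the singular value decomposition commonly used for latent semantic indexing, the idea can be written with the conventional symbols U, Σ and V (these symbols are an assumption, not the slide's notation):

```latex
M = U \Sigma V^{T}, \qquad
M_k = U_k \Sigma_k V_k^{T} \quad \text{with } k < \operatorname{rank}(M)
```

Here M_k keeps only the k largest singular values, so documents and queries are compared in a k-dimensional space rather than in the full term space.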

Neural Network

• Nodes emit signals that are propagated from one node to another. The indexing is fixed, but the weights change over time.
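
A sketch of one round of signal propagation, assuming a three-layer network (query terms, document terms, documents) with normalized weights on the connections. This layout is an assumption, not something stated on the slide, and later rounds in which the weights change are not modelled. All names and weights are hypothetical.

```python
import math


def normalize(weights):
    """Divide each weight by the Euclidean norm of the whole vector."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def propagate(query_weights, doc_weights):
    """One round: query-term nodes signal term nodes, which signal documents."""
    signals = normalize(query_weights)  # signals leaving the query-term nodes
    activation = {}
    for doc_id, weights in doc_weights.items():
        edges = normalize(weights)  # weights on the term-to-document edges
        activation[doc_id] = sum(s * edges.get(t, 0.0) for t, s in signals.items())
    return activation


# Hypothetical query and documents with unit term weights.
query = {"information": 1.0, "retrieval": 1.0}
documents = {
    "d1": {"information": 1.0, "retrieval": 1.0},
    "d2": {"information": 1.0, "age": 1.0},
    "d3": {"data": 1.0, "structure": 1.0},
}
activation = propagate(query, documents)
```

With this normalization the activation reached after the first round equals the cosine measure between the query and each document.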

Inference Network Model

• It associates random variables with the index terms. Then, depending on the need, they are linked with prior probabilities, Boolean strategies, or ranking strategies.

Belief Network Model

• Uses not only a probabilistic interpretation but also set theory and a sample space, so it can separate the documents from the queries.

Proximal Nodes Model

• It organizes a book as a hierarchical tree, from the root (book) to branches (chapters), smaller branches (sections) and leaves (subsections).
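
A minimal sketch of such a hierarchy as a tree data structure. The class name `Node`, its fields and the sample titles are hypothetical; the sketch shows only the book, chapter, section and subsection levels described above, not a query language over them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str   # "book", "chapter", "section" or "subsection"
    title: str
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Collect the leaf nodes (subsections) below a node, left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def depth(node):
    """Number of levels from this node down to its deepest leaf."""
    return 1 + max((depth(child) for child in node.children), default=0)


# Hypothetical book with one chapter, one section and two subsections.
book = Node("book", "Information Retrieval", [
    Node("chapter", "Models", [
        Node("section", "Classic Models", [
            Node("subsection", "Boolean"),
            Node("subsection", "Vector"),
        ]),
    ]),
])

leaf_titles = [n.title for n in leaves(book)]
levels = depth(book)
```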

Non-overlapping List Model

• Flat: the document is divided into regions that do not overlap at all.
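
A small sketch of the non-overlap property, assuming each region is stored as a hypothetical (start, end) pair of character offsets with an exclusive end. The representation and the sample offsets are assumptions made for illustration.

```python
def is_non_overlapping(regions):
    """True if no two (start, end) regions share a position; end is exclusive."""
    ordered = sorted(regions)
    # After sorting by start, it is enough to compare each region with the next.
    return all(
        prev_end <= start
        for (_, prev_end), (start, _) in zip(ordered, ordered[1:])
    )


# Hypothetical flat list of chapter regions in one document.
chapters = [(0, 100), (100, 250), (250, 400)]
# A list that violates the model: the first region runs into the second.
overlapping = [(0, 120), (100, 250)]

chapters_ok = is_non_overlapping(chapters)
overlapping_ok = is_non_overlapping(overlapping)
```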

Browsing Models

• Flat model: each document is treated as a point in a one-dimensional linear array or a two-dimensional plane.

• Structure guided model: the information is organized in hierarchical levels.

Advanced Models

• Voice Browsing: uses a voice user interface (VUI) instead of a graphical user interface (GUI); the user interacts with a digital voice to get information. Example: Bell’s voice browsing number, 310-2355.

Advanced Models (cont.)

• Image Retrieval Model: divides the image into eight different regions, then indexes it using the ratio between red, green and blue (RGB), the grayscale conversion of the image, and the relationship between shape and color.

Advanced Models (cont.)

• Graph-Based Induction Model: uses stepwise pair expansion to extract distinctive patterns from the graph.

Problems

Advanced Models (cont.)

• Multimedia Retrieval Model: uses key frames of the video, together with audio, text and graphics, and probabilistic measures (mean, variance, clustering, etc.) to index the media.

• Multi-Agent System for Web Information Retrieval: uses many intelligent agents surfing the web; each tries to accomplish one part of a query, and then they collaborate to answer the requested query.

Trend

• Text (T), voice (V), graphics (G), and video (D) are combined to satisfy user demand. The following function is suggested:

Y = B0 + B1*T + B2*V + B3*G + B4*D
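
A short sketch of evaluating this linear function. The slide gives no values for B0 to B4 or for the inputs, so every number below is hypothetical, as is the function name `combined_score`.

```python
def combined_score(t, v, g, d, coeffs):
    """Evaluate Y = B0 + B1*T + B2*V + B3*G + B4*D."""
    b0, b1, b2, b3, b4 = coeffs
    return b0 + b1 * t + b2 * v + b3 * g + b4 * d


# Hypothetical coefficients (B0, B1, B2, B3, B4).
coeffs = (0.0, 0.4, 0.2, 0.2, 0.2)

# Hypothetical per-medium scores for one retrieved item:
# text, voice, graphics and video.
y = combined_score(t=1.0, v=0.5, g=0.0, d=0.5, coeffs=coeffs)
```

In practice the coefficients would have to be estimated from data; the slide does not say how.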

Multi-Agent

Information Retrieval in the Future

[Diagram labels: Human; Laser Pointing; Web Displays; Finger Pointing Videos; Speakers Sound Systems; Future]

Conclusion

• Taxonomy

• Advanced models

• Trends
