Information Retrieval Prepared by: Cong Chau Supervised by: Prof. Esma Aimeur.
Information Retrieval
Prepared by: Cong Chau
Supervised by: Prof. Esma Aimeur
Cong Chau, Information Retrieval, March 2004
Agenda
• Introduction
• Present Technology
• Advanced Technology
• Tendency
• Conclusion
• Definitions: data, information, knowledge.
• Information is power.
Taxonomy
• User task: Retrieval (Specific purpose, Filtering); Browsing
• Classic Models: Boolean, Vector, Probabilistic
• Structured Models: Non-overlapping Lists, Proximal Nodes
• Browsing: Flat, Structure Guided, Hypertext
• Set Theoretic: Fuzzy, Extended Boolean
• Algebraic: Generalized Vector, Latent Semantic Indexing, Neural Network
• Probabilistic: Inference Network, Belief Network
Boolean Model
• Based on set theory and Boolean algebra: queries combine index terms with AND, OR, NOT, ...
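As a minimal sketch of how Boolean retrieval can work, the Python example below maps AND, OR, and NOT to set intersection, union, and difference over an inverted index. The three documents and the helper names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy collection: document id -> text.
docs = {
    1: "data structure",
    2: "information retrieval",
    3: "information and data",
}

# Inverted index: term -> set of ids of the documents that contain it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

all_ids = set(docs)

def docs_with(term):
    """Documents containing the term (empty set if the term is unknown)."""
    return index.get(term, set())

def docs_without(term):
    """NOT: every document that does not contain the term."""
    return all_ids - docs_with(term)

both = docs_with("data") & docs_with("information")    # data AND information
either = docs_with("data") | docs_with("information")  # data OR information
no_data = docs_without("data")                         # NOT data
```

A document either satisfies the query or it does not; there is no ranking among the matching documents.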
Vector Model
• Assigns weights to the index terms (based on how frequently they are used) and ranks documents by their degree of similarity to the query.
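The sketch below illustrates one common way to turn frequency-based weights into a degree of similarity. It assumes raw term frequency as the weight and the cosine of the angle between vectors as the similarity measure; the slide names neither choice, and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text):
    """Weight each term by its raw frequency in the text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents and query.
docs = ["information retrieval models", "data structure", "information age"]
query = tf_vector("information retrieval")

# Rank documents from most to least similar to the query.
ranked = sorted(docs, key=lambda d: cosine(tf_vector(d), query), reverse=True)
```

Unlike the Boolean model, a document that matches only part of the query still receives a non-zero score and appears in the ranking.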
Probabilistic Model
• Each index term Ki is first assigned initial probabilities by an expert, for example P(Ki|R) = 0.3 and P(Ki|nonR) = 0.7, where R is the set of relevant documents.
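To show how such probabilities can be used for ranking, the sketch below assumes the log-odds term weight of the binary independence model associated with Robertson and Sparck Jones; the slide does not state which formula it uses. The terms and probability values in `probs` are hypothetical.

```python
import math

def term_weight(p_rel, p_nonrel):
    """Log-odds weight of one index term (binary independence model)."""
    return math.log(p_rel / (1 - p_rel)) + math.log((1 - p_nonrel) / p_nonrel)

def score(doc_terms, query_terms, probs):
    """Sum the weights of the terms shared by the document and the query."""
    shared = set(doc_terms) & set(query_terms)
    return sum(term_weight(*probs[t]) for t in shared if t in probs)

# Hypothetical (P(Ki|R), P(Ki|nonR)) pairs for two index terms.
probs = {"retrieval": (0.8, 0.2), "data": (0.3, 0.7)}
query = ["retrieval", "data"]

s1 = score(["retrieval", "models"], query, probs)
s2 = score(["data", "base"], query, probs)
```

With the values 0.3 and 0.7 the weight is negative: a term that is more likely to occur in non-relevant documents lowers the score of the documents that contain it.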
Fuzzy Model
• Deals with gradual degrees of membership between 0 and 1 rather than the abrupt Boolean values 0 or 1. It computes the relations between documents and fuzzy index terms using algebraic sums and algebraic products.
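The two operations can be sketched in a few lines of Python. The example assumes that membership degrees lie in [0, 1], that the algebraic product plays the role of AND, and that the algebraic sum plays the role of OR; the membership values used are hypothetical.

```python
from math import prod

def algebraic_product(memberships):
    """Fuzzy AND: the product of the membership degrees."""
    return prod(memberships)

def algebraic_sum(memberships):
    """Fuzzy OR: one minus the product of the complements."""
    return 1 - prod(1 - m for m in memberships)

# Hypothetical membership degrees of one document in the fuzzy sets of two terms.
degrees = [0.5, 0.8]
and_degree = algebraic_product(degrees)  # about 0.4
or_degree = algebraic_sum(degrees)       # about 0.9
```

When every degree is exactly 0 or 1, both functions reduce to the ordinary Boolean AND and OR.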
Extended Boolean Model
• A Boolean model extended with partial matching and term weighting.
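One way to combine partial matching with term weighting is the p-norm formulation of Salton, Fox, and Wu, sketched below. The slide gives no formula, so treating it as the intended one is an assumption; the term weights are assumed to lie in [0, 1] and the sample values are hypothetical.

```python
def sim_or(weights, p):
    """p-norm similarity of a document to the query 't1 OR t2 OR ...'."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1 / p)

def sim_and(weights, p):
    """p-norm similarity of a document to the query 't1 AND t2 AND ...'."""
    n = len(weights)
    return 1 - (sum((1 - w) ** p for w in weights) / n) ** (1 / p)

# Hypothetical weights of two query terms in one document.
weights = [0.6, 0.2]
or_p2 = sim_or(weights, 2)
and_p2 = sim_and(weights, 2)
```

With p = 1 both functions return the plain average of the weights, so AND and OR behave alike; larger values of p move the scores toward strict Boolean behavior.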
Generalized Vector Space Model
• Gathers a set of index terms that are somewhat similar in meaning, then associates binary weights (1 or 0) with the document-term pairs.
• For example: the similar words ‘data’, ‘information’, and ‘knowledge’ are grouped in a set. The documents linked with the word ‘data’, such as ‘data structure’ and ‘data base’, generate the first subset {1,1,0,0,0,0}. The word ‘information’ links with the two documents ‘information retrieval’ and ‘information age’ to form a second subset {0,0,1,1,0,0}. Similarly, the documents ‘knowledge management’ and ‘knowledge base’ give {0,0,0,0,1,1}.
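The binary subsets in the example can be reproduced with a short Python sketch. It only builds the 1/0 document-term weights for the six documents named above; the function name `binary_vector` is hypothetical, and the further steps of the generalized vector space model are not shown.

```python
# The six documents from the example, in the order used on the slide.
documents = [
    "data structure", "data base",
    "information retrieval", "information age",
    "knowledge management", "knowledge base",
]
terms = ["data", "information", "knowledge"]

def binary_vector(term, docs):
    """1 where the document contains the term, 0 elsewhere."""
    return [1 if term in doc.split() else 0 for doc in docs]

vectors = {t: binary_vector(t, documents) for t in terms}
```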
Latent Semantic Indexing
• Matches documents through a term-document matrix instead of matching their index terms directly.
• M = CTR
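The slide does not define C, T, or R. Assuming the factorization M = CTR refers to the singular value decomposition commonly used for latent semantic indexing, it can be written in the usual notation as follows; the symbols U, Σ, V, and k are not from the slide.

```latex
M = U \Sigma V^{\mathsf{T}}, \qquad
M_k = U_k \Sigma_k V_k^{\mathsf{T}} \quad (k < \operatorname{rank}(M))
```

Here M is the term-document matrix and M_k keeps only the k largest singular values, so documents and queries are compared in a k-dimensional concept space rather than in the space of individual terms.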
Neural Network
• Signals are emitted and propagated from one node to another. The indexing is fixed, but the weights change over time.
Inference Model
• Associates random variables with the index terms. Depending on the need, these are then combined with prior probabilities, Boolean operators, or ranking strategies.
Belief Network Model
• Uses not only a probabilistic interpretation but also set theory and a sample space, so it can separate the documents from the queries.
Proximal Nodes Model
• Divides a book into a hierarchical tree structure, from the root (book) to branches (chapters), smaller branches (sections), and leaves (subsections).
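The hierarchy can be pictured as a small tree data structure. The sketch below models only the book, chapter, section, and subsection levels and a lookup by level; it does not implement the query language of the proximal nodes model, and the class name, function name, and titles are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str    # "book", "chapter", "section", or "subsection"
    title: str
    children: list = field(default_factory=list)

def find(node, kind):
    """Return the titles of all nodes of the given kind, in document order."""
    found = [node.title] if node.kind == kind else []
    for child in node.children:
        found.extend(find(child, kind))
    return found

# Hypothetical book used only to illustrate the hierarchy.
book = Node("book", "Information Retrieval", [
    Node("chapter", "Models", [
        Node("section", "Classic Models", [
            Node("subsection", "Boolean"),
            Node("subsection", "Vector"),
        ]),
    ]),
    Node("chapter", "Trends"),
])

chapters = find(book, "chapter")
leaves = find(book, "subsection")
```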
Non-overlapping List Model
• Flat structure: the information in a list does not overlap at all.
Browsing Models
• Flat model: each document is treated as a point in a one-dimensional linear array or a two-dimensional planar array.
• Structure guided model: the information is organized in hierarchical levels.
Advanced Models
• Voice Browsing: uses a voice user interface (VUI) instead of a graphical user interface (GUI); we interact with a digital voice to get the information. Bell’s voice browsing number: 310-2355.
Advanced Models (cont.)
• Image Retrieval Model: divides the image into eight different regions, then indexes it using the ratio between red, green, and blue (RGB), the grayscale conversion of the image, and the relationship between shape and color.
Advanced Models (cont.)
• Graph-Based Induction Model: uses stepwise pair expansion to extract distinctive patterns from a graph.
Problems
Advanced Models (cont.)
• Multimedia Retrieval Model: uses the key frames of the video together with audio, text, and graphics, and applies probability theory (mean, variance, clustering, etc.) to index the media.
• Multi-Agent System Web Information Retrieval: uses many intelligent agents surfing the web. Each agent tries to accomplish one part of a query, and the agents then collaborate to answer the requested query.
Trend
• Text (T), voice (V), graphics (G), and video (D) are combined to satisfy that demand. The following function is suggested:
Y = B0 + B1*T + B2*V + B3*G + B4*D
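The function is a plain linear combination, which the sketch below evaluates directly. The slide does not say what Y measures or how the coefficients B0 to B4 are chosen, so the default coefficient values and the 0-to-1 input scale here are purely hypothetical.

```python
def combined_score(t, v, g, d, b=(0.0, 0.4, 0.2, 0.2, 0.2)):
    """Evaluate Y = B0 + B1*T + B2*V + B3*G + B4*D.

    t, v, g, d: contributions of text, voice, graphics, and video.
    b: the coefficients (B0, B1, B2, B3, B4); the defaults are illustrative only.
    """
    b0, b1, b2, b3, b4 = b
    return b0 + b1 * t + b2 * v + b3 * g + b4 * d

text_only = combined_score(1, 0, 0, 0)
all_media = combined_score(1, 1, 1, 1)
```

In practice the coefficients would have to be estimated from data, for example by fitting a regression, which is outside the scope of this sketch.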
Multi-Agent
Information Retrieval in future
[Figure labels: Human, Laser Pointing, Web Displays, Finger Pointing, Videos, Speakers, Sound Systems, Future]
Conclusion
• Taxonomy
• Advanced models
• Tendency