© 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J....

64
© 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly
  • date post

    22-Dec-2015
  • Category

    Documents

  • view

    217
  • download

    0

Transcript of © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J....

Page 1: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 1

CPE/CSC 580: Knowledge Management

CPE/CSC 580: Knowledge Management

Dr. Franz J. Kurfess

Computer Science Department

Cal Poly

Page 2: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 2

Course OverviewCourse Overview Introduction Knowledge Processing

Knowledge Acquisition, Representation and Manipulation

Knowledge Organization Classification, Categorization Ontologies, Taxonomies,

Thesauri

Knowledge Retrieval Information Retrieval Knowledge Navigation

Knowledge Presentation Knowledge Visualization

Knowledge Exchange Knowledge Capture, Transfer,

and Distribution

Usage of Knowledge Access Patterns, User Feedback

Knowledge Management Techniques Topic Maps, Agents

Knowledge Management Tools

Knowledge Management in Organizations

Page 3: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 3

Overview Knowledge RetrievalOverview Knowledge Retrieval

Motivation Objectives Finding Out About

Keywords and Queries Documents Indexing

Data Retrieval Access via Address, Field, Name

Information Retrieval Parsing Matching Against Indices Retrieval Assessment

Knowledge Retrieval Context Usage

Knowledge Discovery Data Mining Rule Extraction

Important Concepts and Terms

Chapter Summary

Page 4: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 4

LogisticsLogistics

Term Project APIs

Lab and Homework Assignments Deadline HW 1: May 1

Exams Midterm: Thursday, May 3

Page 5: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 5

Finding Out AboutFinding Out About

[Belew 2000]

Page 6: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 6

Pre-TestPre-Test

Page 7: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 7

MotivationMotivation

Page 8: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 8

ObjectivesObjectives

Page 9: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 10

Finding Out AboutFinding Out About

Keywords Queries Documents Indexing

[Belew 2000]

Page 10: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 11

KeywordsKeywords

linguistic atoms used to characterize the subject or content of a document words pieces of words (stems) phrases

provide the basis for a match between the user’s characterization of information need the contents of the document

problems ambiguity choice of keywords

[Belew 2000]

Page 11: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 12

QueriesQueries formulated in a query language

natural language interaction with human information providers

artificial language interaction with computers

especially search engines

vocabulary controlled

limited set of keywords may be used

uncontrolled any keywords may be used

syntax often Boolean operators (AND, OR) sometimes regular expressions

[Belew 2000]

Page 12: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 13

DocumentsDocuments

general interpretation any document that can be represented digitally

text, image, music, video, program, etc.

practical interpretation passage of text

strings of characters in an alphabet written natural language length may vary

longer documents may be composed of shorter ones

Page 13: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 14

Aboutness of DocumentsAboutness of Documents

describes the suitability of a document as answer to a query

assumptions all documents have equal aboutness

the probability of any document in a corpus to be considered relevant is equal for all documents

simplistic; not valid in reality

a paragraph is the smallest unit of text with appreciable aboutness

[Belew 2000]

Page 14: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 15

Structural Aspects of DocumentsStructural Aspects of Documents

documents may be composed of documents paragraphs, subsections, sections, chapters, parts footnotes, references

documents may contain meta-data information about the document not part of the content of the document itself may be used for organization and retrieval purposes can be abused by creators

usually to increase the perceived relevance

Page 15: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 16

Document ProxiesDocument Proxies

surrogates for the real document abridged representations

catalog, abstract

pointers bibliographical citation, URL

different media microfiches digital representations

Page 16: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 17

IndexingIndexing

a vocabulary of keywords is assigned to all documents of a corpus

an index maps each document doci to the set of keywords {kwj} it is aboutIndex : doci about {kwj}Index-1 : {kwj} describes doci

indexing of a document / corpus manual: humans select appropriate keywords automatic: a computer program selects the keywords

building the index relation between documents and sets of keywords is critical for information retrieval

[Belew 2000]

Page 17: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 18

FOA Conversation LoopFOA Conversation Loop

[Belew 2000]

Page 18: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 19

Data RetrievalData Retrieval

access to specific data items access via address, field, name typically used in data bases user asks for items with specific features

absence or presence of features values

system returns data items no irrelevant items

deterministic retrieval method

Page 19: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 20

Information Retrieval (IR)Information Retrieval (IR)

access to documents also referred to as document retrieval

access via keywords IR aspects

parsing matching against indices retrieval assessment

Page 20: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 21

Diagram Search EngineDiagram Search Engine

[Belew 2000]

Page 21: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 22

ParsingParsing

extraction of lexical features from documents mostly words

may require some manipulation of the extracted features e.g. stemming of words

used as the basis for automatic compilation of indices

[Belew 2000]

Page 22: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 23

Matching Against IndicesMatching Against Indices

identification of documents that are relevant for a particular query

keywords of the query are compared against the keywords that appear in the document either in the data or meta-data of the document

in addition to queries, other features of documents may be used descriptive features provided by the author or cataloger

usually meta-data

derived features computed from the contents of the document

[Belew 2000]

Page 23: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 24

Retrieved and Relevant DocumentsRetrieved and Relevant Documents

recall |retrieved relevant| / |relevant|

precision |retrieved relevant| / |retrieved|

[Belew 2000]

Page 24: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 25

Specificity vs. ExhaustivitySpecificity vs. Exhaustivity

[Belew 2000]

Page 25: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 26

Vector SpaceVector Space

interpretation of the index matrix relates documents and keywords

can grow extremely large binary matrix of 100,000 words * 1,000,000 documents sparsely populated: most entries will be 0

can be used to determine similarity of documents overlap in keywords proximity in the (virtual) vector space

associative memories can be used as hardware implementation extremely fast, but expensive to build

[Belew 2000]

Page 26: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 27

Vector Space DiagramVector Space Diagram

[Belew 2000]

Page 27: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 28

Document RetrievalDocument Retrieval

[Belew 2000]

Page 28: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 29

Retrieval AssessmentRetrieval Assessment

subjective assessment how well do the retrieved documents satisfy the request of

the user

objective assessment idealized omniscient expert determines the quality of the

response

[Belew 2000]

Page 29: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 30

Retrieval Assessment DiagramRetrieval Assessment Diagram

[Belew 2000]

Page 30: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 31

Relevance FeedbackRelevance Feedback

subjective assessment of retrieval resultsoften used to iteratively improve retrieval resultsmay be collected by the retrieval system for

statistical evaluationcan be viewed as a variant of object recognition

the object to be recognized is the prototypical document the user is looking for this document may or may not exist

the difference between the retrieved document(s) and the idealized prototype indicates the quality of the retrieval results

[Belew 2000]

Page 31: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 32

Relevance Feedback in Vector SpaceRelevance Feedback in Vector Space relevance feedback is used to

move the query towards the cluster of positive documents moving away from bad documents

does not necessarily improve the results

it can also be used as a filter for a constant stream of documents as in news channels or similar

situations

[Belew 2000]

Page 32: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 33

Query Session ExampleQuery Session Example

[Belew 2000]

Page 33: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 34

Consensual RelevanceConsensual Relevance

relevance feedback from multiple users identifies documents that many users found useful or

interesting used by some Web sites related to collaborative filtering can also be used as an evaluation method for search

engines performance criteria must be carefully considered

precision and recall, plus many others

[Belew 2000]

Page 34: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 35

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

IR DiagramIR Diagram

Term 1

Term 2

Term 3

Term 4

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

DocumentsQueryIndex

Corpus

Ke

ywo

rds

Page 35: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 36

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

IR DiagramIR Diagram

Term 1

Term 2

Term 3

Term 4

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

DocumentsQueryIndex

Corpus

Ke

ywo

rds

Page 36: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 37

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

IR DiagramIR Diagram

Term 1

Term 2

Term 3

Term 4

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

DocumentsQueryIndex

Corpus

Ke

ywo

rds

Page 37: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 38

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

IR DiagramIR Diagram

Term 1

Term 2

Term 3

Term 4

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

DocumentsQueryIndex

Corpus

Ke

ywo

rds

Page 38: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 39

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

IR DiagramIR Diagram

Term 1

Term 2

Term 3

Term 4

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

DocumentsQueryIndex

Corpus

Ke

ywo

rds

Page 39: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 40

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

KR DiagramKR Diagram

Term 1

Term 2Term 3

Term 4

Ke

ywo

rds

Documents

Query

Index

Doc. 5

Doc. 4

Doc. 3

Doc. 2

Doc. 1

Corpus

Term A

Term B

Term E

Term M

Term D

Term JTerm ITerm H

Term F

Term C

Term GTerm K

Term LOntology

Page 40: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 41

Knowledge RetrievalKnowledge Retrieval

Context Usage

Page 41: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 42

Context in Knowledge RetrievalContext in Knowledge Retrieval

in addition to keywords, relationships between keywords and documents are exploited explicit links

hypertext

related concepts thesaurus, ontology

proximity spatial: place, directory temporal: creation date/time

intermediate relations author/creator organization project

Page 42: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 43

Inference beyond the IndexInference beyond the Index

determines relationships between documentscitations are explicit references to relevant

documents bibliographic references legal citations hypertext

exampleNEC CiteSeer <http://citeseer.nj.nec.com>

Page 43: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 44

Additional Information SourcesAdditional Information Sources

[Belew 2000, after Kochen 1975]

Page 44: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 45

HypertextHypertext

inter-document links provide explicit relationships between documents can be used to determine the relevance of a document for

a query example:

Google <http://www.google.com>

intra-document links may offer additional context information for some terms footnotes, glossaries, related terms

Page 45: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 46

Adaptive Retrieval TechniquesAdaptive Retrieval Techniques

fine-tuning the matching between queries and retrieved documents learning of relationships between terms

training with term pairs (thesaurus) pattern detection in past queries automatic grouping of documents according to common features

clustering of documents

Page 46: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 47

Document ClassificationDocument Classification

Page 47: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 48

Query ModelQuery Model

query types (templates) frequently used types of queries

e.g. problem/solution, symptoms/diagnosis, problem/further checks, ...

category types abstractions of query types used to determine categories or topics for the grouping of

search results

context information current working document/directory previous queries

[Pratt, Hearst, Fagan 2000]

Page 48: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 49

Terminology ModelTerminology Model

individual terms are connected to related terms thesaurus/ontology

synonyms, super-/sub-classes, related terms

identifies labels for the category types

[Pratt, Hearst, Fagan 2000]

Page 49: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 50

MatchingMatching

categorizer determines the categories to be selected for the grouping

of results assigns retrieved documents to the categories

organizer arranges categories into a hierarchy

should be balanced and easy to browse by the user

depends on the distribution of the search results

[Pratt, Hearst, Fagan 2000]

Page 50: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 51

ResultsResults

retrieved documents are grouped into hierarchically arranged categories meaningful for the user the categories are related to the query the categories are related to each other all categories have similar size

not always achievable due to the distribution of documents

reduced search times higher user satisfaction

[Pratt, Hearst, Fagan 2000]

Page 51: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 52

DynaCat DynaCat

knowledge-based approach to the organization of search results

categorizes results into meaningful groups that correspond to the user’s query uses knowledge of query types and of the domain

terminology to generate hierarchical categoriesapplied to the domain of medicine

MEDLINE is an on-line repository of medical abstracts 9.2 million bibliographic entries from 3800 journals PubMed is a web-based search tool

returns titles as an relevance-ranked list links to “related articles”

Page 52: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 53

DyanCat ResultsDyanCat Results

Page 53: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 54

DynaCat Query TypesDynaCat Query Types

Page 54: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 55

DynaCat Search DynaCat Search

Page 55: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 56

Information vs. Knowledge RetrievalInformation vs. Knowledge Retrieval IR

keywords as main components of the query

index as match-making facility statistical basis for selection of

relevant documents (ordered) list of results

KR keywords plus context

information for the query index plus ontology for

matching query and documents

relationships between keywords and documents influence the selection of relevant documents

results are grouped into meaningful categories

Page 56: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 57

KR DiagramKR Diagram

Page 57: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 58

Knowledge DiscoveryKnowledge Discovery

Data Mining Rule Extraction

Page 58: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 59

Page 59: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 60

Reference [Kearns 00]Reference [Kearns 00]

[Kearns 00]

Page 60: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 61

Reference [Sommerville 01] Reference [Sommerville 01]

[Sommerville 01]

[Sommerville 01]

Page 61: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 62

Post-TestPost-Test

Page 62: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 64

Important Concepts and TermsImportant Concepts and Terms natural language processing neural network predicate logic propositional logic rational agent rationality Turing test

agent automated reasoning belief network cognitive science computer science hidden Markov model intelligence knowledge representation linguistics Lisp logic machine learning microworlds

Page 63: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 65

Summary Chapter-TopicSummary Chapter-Topic

Page 64: © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

© 2001 Franz J. Kurfess Knowledge Retrieval 66