Information Access I
An Introduction to Information Retrieval
GSLT,
Göteborg, September 2003
Barbara Gawronska, Högskolan i Skövde
Topics:
1st intensive week:
  Introduction
  Knowledge representations for IA
  Text categorization, indexing (theory + labs)
  Text summarization
  User aspects
2nd intensive week:
  Interactivity
  Multilingual systems and resources
  Evaluation
Schedule and content
Thursday 11/9
8-10 BG: An Introduction to Information Retrieval
Central notions:
  Information/knowledge/data/metadata
  Information Retrieval vs. Data Retrieval
  Information Extraction, abstracting, summarization
A survey of the history of Information Retrieval (Standard IR-models)
Schedule and content 2
10-12 BG: Representation of information and identification of significant text features (Standard IR-models)
Different types of information and knowledge representation:
  classical knowledge representation methods (top-down and bottom-up hierarchical classifications, thesauri)
  weighting techniques and co-occurrence-based techniques
The notion of Retrieval Status Value (RSV) and methods for RSV evaluation.
Schedule and content 3
15-17 HD: User aspects in information retrieval and text categorization
  presentation of search results using KWIC (key word in context), marking up words in documents, automatic dynamic spell checking of the search query, term expansion and synonym search
  categorization and clustering of texts
Schedule and content 4
Friday 12/9
8-10 HD: Information extraction and automatic text summarization
  information extraction techniques for text summarization (statistics, linguistics and heuristics)
  a demonstration of SweSum
  evaluation of automatic text summarization systems
10-12, 13-15 ED: Testing automatic indexing with predefined categories (labs)
Shannon and Weaver’s definition of information (1959)
[Figure: information as a function of the number of symbols in a code (x-axis: 1, 2, 4, 16, 32 symbols; y-axis: 0 to 1,6)]
$$\text{Information} = \sum_{i=1}^{n} p_i \log \frac{1}{p_i}$$
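The measure on this slide can be tried out with a minimal sketch. It assumes the usual reading of the formula, a sum of p_i log(1/p_i) over the symbols of a code, and base-10 logarithms; the base is an assumption, since the slide does not state it. The function name `information` is hypothetical.

```python
import math

def information(probabilities, base=10):
    """Return sum_i p_i * log(1 / p_i) for a probability distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Symbols with p = 0 contribute nothing to the sum.
    return sum(p * math.log(1.0 / p, base) for p in probabilities if p > 0)

# With n equally likely symbols the sum reduces to log(n).
for n in (1, 2, 4, 16, 32):
    uniform = [1.0 / n] * n
    print(n, round(information(uniform), 3))
```

For equally likely symbols the value grows with the logarithm of the number of symbols, so doubling the code size adds a constant amount of information.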
Data – Knowledge – Information... (Fuhr 1995, modified)
Metadata
Data: syntactic level (organized symbols, values of attributes)
Knowledge: semantic level (meaning representation)
Information: pragmatic level (”knowledge in action”)
Information Access
Information Recovery
Information Retrieval
(Information Recovery ≠ Information Retrieval; Information Retrieval includes a selection process)
Information Refinement:
  Multilingual Information Retrieval
  Information Extraction
  Abstracting, summarization
Data Retrieval vs. IR (Rijsbergen 1979)
                      DR                   IR
Inference             Deductive            Inductive
Model                 Deterministic        Probabilistic
Query language        Formal               Natural (as a goal)
Query specification   Full                 Partial
Matching              Exact                Best match
Search object         Matching the query   Relevant
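The contrast between exact matching (DR) and best matching (IR) can be illustrated with a small sketch. The records, the term-overlap score, and the function names `exact_match` and `best_match` are all hypothetical choices made for illustration; real IR systems use more refined ranking functions.

```python
def exact_match(records, query_terms):
    """Data-retrieval style: keep only records containing every query term."""
    wanted = set(query_terms)
    return [rid for rid, terms in records.items() if wanted <= set(terms)]

def best_match(records, query_terms):
    """IR style: rank records by the number of query terms they share."""
    wanted = set(query_terms)
    scored = [(len(wanted & set(terms)), rid) for rid, terms in records.items()]
    # Highest overlap first; ties broken by record id for a stable order.
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [rid for score, rid in ranked if score > 0]

records = {
    "d1": ["stock", "exchange", "banking"],
    "d2": ["stock", "market"],
    "d3": ["poetry", "tragedy"],
}
query = ["stock", "exchange", "market"]
print(exact_match(records, query))  # no record holds all three terms
print(best_match(records, query))   # partially matching records, ranked
```

The exact matcher returns nothing when no record satisfies the full query, while the best matcher still offers the partially matching records in ranked order.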
Data Retrieval vs. IR (2) (the German IR Research Group)
IR systems have to handle ”uncertain knowledge” (”unsicheres Wissen”):
Vague queries; reformulation frequently required
The problem of the user’s own understanding of his/her information need
Limitations of knowledge representations
The history of IR in brief
The early IR – ”a history of how indexes were created and searched” (Meadow et al 2000:20)
Index – a broad definition: a systematic scheme that places like material together
Thus, books arranged in alphabetical order can also be seen as an index
The history of IR in brief (2)
Pre-alphabetical systems: ca 2700 BC, the Sumerian culture: grouping by similarity among initial ideograms (birds, bowls, trees...)
ca 1500 BC – first phonetically based systems (syllable-based, later phoneme-based)
The history of IR in brief (3)
First attempts to utilize letter frequency in search: Arabs, 9th century
Categorization of documents in ancient libraries:
  Babylonian ”libraries”, 12th-7th century BC: categories like astronomy, geography, history, mathematics, natural science, laws and... linguistics
  The Alexandrian library; Callimachus (310-240 BC): 8 categories; subject matter and genre as criteria (history, laws, medicine, philosophy, lyric poetry, oratory, tragedy); a catalogue in 120 scrolls, the pinakes
The history of IR in brief (5)
1876 – Melvil Dewey, USA – Dewey Decimal Classification (DDC)
Universal Decimal Classification (UDC, Otlet & La Fontaine)
  10 main classes
  hierarchical organization
  max 10 branches from 1 node
  today 130 000 classes
The history of IR in brief (6)
Universal Decimal Classification, an example:
3 Social science, laws, administration
  33 National economics
    336 Finances
      336.7 Banking
        336.76 Stock exchange
          336.763 Share market
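The hierarchy in this example is carried by the notation itself: every broader class is a prefix of the narrower one. A minimal sketch, assuming codes of at most six digits with a single point after the third digit (as in the example); the function name `udc_ancestors` is hypothetical:

```python
def udc_ancestors(code):
    """List the broader classes of a UDC-style code, most general first."""
    digits = code.replace(".", "")
    result = []
    for length in range(1, len(digits) + 1):
        prefix = digits[:length]
        # Re-insert the point after the third digit, as in the slide's example.
        result.append(prefix if length <= 3 else prefix[:3] + "." + prefix[3:])
    return result

# Class labels copied from the example above.
labels = {
    "3": "Social science, laws, administration",
    "33": "National economics",
    "336": "Finances",
    "336.7": "Banking",
    "336.76": "Stock exchange",
    "336.763": "Share market",
}
for c in udc_ancestors("336.763"):
    print(c, labels[c])
```

Because broader classes are prefixes, a search for everything under Finances reduces to a prefix match on 336.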
The history of IR in brief (7)
Card catalogues – 18th/19th century
How to represent the content of a document in an index?
Precoordination vs. postcoordination of index terms
The history of IR in brief (8)
1950 –
M. Taube – the Uniterm systems
W.E. Batten – the optical coincidence system
(in both systems, the TERM serves as the starting point)
C. Mooers – the Zatocoding system
(cards represent DOCUMENTS and are provided with descriptors, coded as series of holes at the edge of the card)
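The two card organizations can be contrasted in a simplified sketch: document cards list descriptors, term cards list document numbers, and a postcoordinate search intersects term cards at query time. The data and the names `build_term_cards` and `coordinate` are hypothetical; the historical systems were physical card files, not programs.

```python
from collections import defaultdict

# Document-oriented cards: each document lists its descriptors.
document_cards = {
    1: {"banking", "finances"},
    2: {"banking", "laws"},
    3: {"finances", "laws", "banking"},
}

def build_term_cards(doc_cards):
    """Invert document cards into term cards: term -> set of document numbers."""
    term_cards = defaultdict(set)
    for doc_id, descriptors in doc_cards.items():
        for term in descriptors:
            term_cards[term].add(doc_id)
    return dict(term_cards)

def coordinate(term_cards, *terms):
    """Postcoordinate search: documents listed on every chosen term card."""
    sets = [term_cards.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

term_cards = build_term_cards(document_cards)
print(sorted(coordinate(term_cards, "banking", "laws")))
```

Combining terms only at search time is what distinguishes postcoordination from precoordinated index entries, where the combinations are fixed when the index is built.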
The history of IR in brief (9)
1950- First computerized IR systems (special purpose computers):
The Western Reserve Rapid Searching Selector (1957, Shera, Kent & Berry)
  Based on human-created telegraphic abstracts
  Aimed at technical texts
  Semantic categories like product, process, material...
The Minicard Selector (1959, Kessel & DeLucia)
The history of IR in brief (10)
Late 1950s - early 1970s: first IR-systems on general purpose computers (Bracken & Tillit 1957)
computerized IR should become more than simple string matching: the idea of utilizing word frequencies and inverse document frequency (idf) – Luhn, Bar-Hillel
first online IR services (MAC at MIT, MEDLINE, Lexis/Nexis)
The Internet (4 hosts 1969)
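The idea of weighting by word frequency and inverse document frequency can be sketched as follows. The slide gives no formula, so idf = log(N / df) is an assumption (one common textbook variant), and the toy documents and function names are hypothetical.

```python
import math
from collections import Counter

def inverse_document_frequency(documents):
    """Map each term to log(N / df); df = number of documents containing it."""
    n_docs = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}

def tf_idf(document, idf):
    """Weight each term of one document by its frequency times its idf."""
    tf = Counter(document)
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}

docs = [
    "the stock market fell".split(),
    "the stock exchange opened".split(),
    "the library catalogue".split(),
]
idf = inverse_document_frequency(docs)
weights = tf_idf(docs[0], idf)
print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

A word that occurs in every document, such as "the" here, gets weight zero, while words confined to few documents are weighted highest; this is what moves retrieval beyond plain string matching.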
IR today and in the future
From simple string matching towards NLP techniques (statistics/heuristics/morphology/semantics/pragmatics)
Natural language in queries
Integrating speech technology
Multilingual retrieval and extraction
Multimedia retrieval
A General Model of an IR system (Fuhr 1995:11)
Data Analysis Retrieved Information
Knowledge representation   Transformations
Information Retrieval
Internal Knowledge Structures
A Basic Model of a Document Retrieval System (Fuhr 1995:11)
Document   Analysis   Retrieved Documents or Document Information
Indexing, Classification, Clustering   Retrieval operations
(Boolean or stochastic)
Document Retrieval
Data Bank Structures
A document from different perspectives (Meghini et al. 91, modified)
Article from NyttIT
The Compulsory School Project: summary of the first year
2003-09-05
FU-kansliet
Johanna Österberg
For the past year, the University has been running the recruitment project ”Compulsory school pupils: our future students”.
By reaching out in various ways with information about university studies to compulsory school pupils, the goal is to demystify higher education and awaken interest in it in general, and in Högskolan i Skövde in particular. The aim is to open up the world of the university, increase diversity and reduce socially skewed recruitment.
Class visits
During autumn 2002, the University cooperated with Vasaskolan in Skövde and Centralskolan in Töreboda. At both schools, staff and students from the University met all final-year classes for about an hour to discuss the future and different choices in life. Differences between studying at lower or upper secondary school and at university were also discussed. In total, about 200 pupils took part in these meetings. The parents of these pupils also received brief information about university studies in connection with parents' meetings about the choice of upper secondary school.
Layout
”Logical” structure (head, title, author…)
Semantics