LIS 7450, Searching Electronic Databases
Basic: Database Structure & Database Construction
Dialog: Database Construction for Dialog (FYI)
Deborah A. Torres
Database Structure
Organization of Data Elements and records
Database Record
Record – the basic unit of information in a database (file).
Example: a bibliographic record contains descriptive information such as author, title, and publisher.
Fields
Field – a distinct part or section of a record (a unit of information within the record).
Example of personnel record fields: employee’s name, special identifier number, address, date of hire, etc.
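The record/field relationship can be sketched in a few lines of Python. This is only an illustration: the class name, field names, and sample values below are hypothetical and do not come from any particular database.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class PersonnelRecord:
    """One record; each attribute is one field (all names are hypothetical)."""
    name: str
    employee_id: str    # the "special identifier number"
    address: str
    date_of_hire: date


# Made-up sample values, for illustration only.
record = PersonnelRecord(
    name="Jane Doe",
    employee_id="E-0001",
    address="123 Example St.",
    date_of_hire=date(2020, 1, 15),
)

# List the field names that make up the record.
field_names = [f.name for f in fields(record)]
print(field_names)
```

Each field has its own format here (text for the name, a date for the date of hire), which is the kind of decision a database designer makes for every field.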
Field Design Decisions
For each field:
Decide what information is placed within that field and the format for that information (text, numeric).
Should there be subfields within a field?
What to call the fields?
Field codes (abbreviations, numbering)
Order of the fields
Example: MARC Record (a type of record you should be familiar with)
Record Fields & Codes
The 100 field contains author information.
The 245 field contains main title information.
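As a rough sketch of how numeric field codes work, a record can be thought of as a mapping from tag to content. The record values and the helper `get_field` below are hypothetical, and real MARC records also carry indicators and subfields that are left out here.

```python
# Tag -> human-readable label, for the two fields named above.
FIELD_LABELS = {"100": "author", "245": "main title"}

# A hypothetical record stored as tag -> field content (placeholder values).
marc_like_record = {
    "100": "Doe, Jane",
    "245": "An example title",
}


def get_field(record, label):
    """Return the content of the field whose label matches, or None."""
    for tag, name in FIELD_LABELS.items():
        if name == label:
            return record.get(tag)
    return None


print(get_field(marc_like_record, "author"))
print(get_field(marc_like_record, "main title"))
```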
Other Design Decisions
Hyphenated words: home-school
Stop words: high-frequency words not useful for searching
Single words and phrases: library, library science, color of money
Alternative spellings of words: color, colour
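To see why these decisions matter, here is a small sketch of two of them: whether to split hyphenated words, and whether to map alternative spellings to one form. The function names and the tiny spelling table are assumptions made for illustration only.

```python
import re


def tokenize(text, split_hyphens):
    """Lowercase the text and split it into words.

    If split_hyphens is False, a hyphenated word is kept as one term.
    """
    pattern = r"[a-z]+" if split_hyphens else r"[a-z]+(?:-[a-z]+)*"
    return re.findall(pattern, text.lower())


# Hypothetical table mapping an alternative spelling to a preferred form.
SPELLING_VARIANTS = {"colour": "color"}


def normalize(words):
    """Replace known alternative spellings with the preferred form."""
    return [SPELLING_VARIANTS.get(word, word) for word in words]


print(tokenize("Home-school", split_hyphens=True))
print(tokenize("Home-school", split_hyphens=False))
print(normalize(tokenize("Colour", split_hyphens=True)))
```

A searcher who types home-school will match different index terms depending on which choice the database made.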
Types of Databases
Bibliographic – references and abstracts of published documents
Full text – the complete text of articles, dictionary entries, codes of law, or other such documents.
Directory – factual information about organizations, companies, products, people, or materials.
Numeric – data in a tabular or statistically manipulated form, often with some added text.
Hybrid – a mix of record types. For example, a database may have full-text records for some publications and citations and abstracts for other source documents.
Database Construction
Basic Steps for automatic indexing of text documents
Six Basic Steps
Step 1: Parse text into words
Step 2: Compare to stoplist and eliminate stopwords
Step 3: Stem content words (reduce to root words) (skip this step if you decide not to stem)
Step 4: Count stemmed word occurrences
Step 5: Create union list of terms
Step 6: Create data structure for specific retrieval techniques (e.g., an inverted file)
Example: Simple Set of 5 One-Sentence Documents
D1: It is a dog eat dog world!
D2: While the world sleeps.
D3: Let sleeping dogs lie.
D4: I will eat my hat.
D5: My dog wears a hat.
“D” stands for document
Step 1: Parse Text into Words
D1: it is a dog eat dog world
D2: while the world sleeps
D3: let sleeping dogs lie
D4: I will eat my hat
D5: my dog wears a hat
Note: Some databases remove punctuation from words, such as the apostrophe in possessives; others preserve it. What difference would this make?
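A minimal sketch of the parsing step on the five example documents, assuming words are lowercased and split with a regular expression. The `keep_apostrophes` option is a hypothetical switch added to show the punctuation question raised in the note.

```python
import re

documents = {
    "D1": "It is a dog eat dog world!",
    "D2": "While the world sleeps.",
    "D3": "Let sleeping dogs lie.",
    "D4": "I will eat my hat.",
    "D5": "My dog wears a hat.",
}


def parse(text, keep_apostrophes=False):
    """Lowercase the text and return its words, dropping other punctuation."""
    pattern = r"[a-z']+" if keep_apostrophes else r"[a-z]+"
    return re.findall(pattern, text.lower())


parsed = {doc_id: parse(text) for doc_id, text in documents.items()}
for doc_id, words in parsed.items():
    print(doc_id, words)

# The punctuation decision changes the index terms for a possessive.
print(parse("the dog's hat", keep_apostrophes=True))
print(parse("the dog's hat", keep_apostrophes=False))
```

With apostrophes preserved, dog's is its own term and a search for dog may miss it; with apostrophes removed, a stray s becomes a term instead.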
Step 2: Eliminate Stop Words
D1: dog eat dog world
D2: world sleeps
D3: let sleeping dogs lie
D4: eat hat
D5: dog wears hat
Stop words are content-free words – those not useful in determining the content of the document.
Examples: pronouns (I, my), prepositions (of, by, on), articles and demonstratives (a, the, this)
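Stop word removal can be sketched as a simple filter. The stoplist below is an assumption: it contains just the words that disappear from the five example documents, not the stoplist of any real database.

```python
# Assumed stoplist, inferred from which words are dropped in the example.
STOPLIST = {"a", "i", "is", "it", "my", "the", "while", "will"}

# Output of the parsing step for the five example documents.
parsed = {
    "D1": ["it", "is", "a", "dog", "eat", "dog", "world"],
    "D2": ["while", "the", "world", "sleeps"],
    "D3": ["let", "sleeping", "dogs", "lie"],
    "D4": ["i", "will", "eat", "my", "hat"],
    "D5": ["my", "dog", "wears", "a", "hat"],
}


def remove_stopwords(words):
    """Keep only the words that are not on the stoplist."""
    return [word for word in words if word not in STOPLIST]


content_words = {doc_id: remove_stopwords(words) for doc_id, words in parsed.items()}
for doc_id, words in content_words.items():
    print(doc_id, words)
```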
Step 3: Stemming (remember not all databases stem words)
Before stemming:
D1: dog eat dog world
D2: world sleeps
D3: let sleeping dogs lie
D4: eat hat
D5: dog wears hat
After stemming:
D1: dog eat dog world
D2: world sleep
D3: let sleep dog lie
D4: eat hat
D5: dog wear hat
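The stemming step can be imitated with a toy suffix stripper. This is a deliberately crude sketch, not one of the published stemming algorithms; the suffix list and the minimum stem length of three letters are assumptions chosen so that the example documents come out as shown.

```python
# Toy suffix list; real stemmers use many more rules and exceptions.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


content_words = {
    "D1": ["dog", "eat", "dog", "world"],
    "D2": ["world", "sleeps"],
    "D3": ["let", "sleeping", "dogs", "lie"],
    "D4": ["eat", "hat"],
    "D5": ["dog", "wears", "hat"],
}

stemmed = {
    doc_id: [toy_stem(word) for word in words]
    for doc_id, words in content_words.items()
}
for doc_id, words in stemmed.items():
    print(doc_id, words)
```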
Types of Stemming Decisions
No Stemming: contract, contracts, contracted, contracting, contractor, contraction, contractual, contracture
Weak Stemming: Inflections: -s, -es, -ed, -ing, -’s
Strong Stemming: Derivations: -tion, -ly, -ally
Reduce words to a root variant; there are different stemming algorithms
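The effect of the stemming decision on the number of index terms can be sketched by grouping the example words by stem. Only no stemming and weak stemming are compared here; the helper names are hypothetical, and the plain suffix stripping used is an assumption rather than any database's actual algorithm. Strong stemming generally needs rules beyond simple suffix removal and is not attempted.

```python
from collections import defaultdict

WORDS = ["contract", "contracts", "contracted", "contracting",
         "contractor", "contraction", "contractual", "contracture"]

# Inflectional suffixes from the slide, longest first.
WEAK_SUFFIXES = ("ing", "es", "ed", "'s", "s")


def weak_stem(word):
    """Strip one inflectional suffix, keeping at least three letters."""
    for suffix in WEAK_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words, stemmer):
    """Map each stem to the list of original words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[stemmer(word)].append(word)
    return dict(groups)


no_stemming = group_by_stem(WORDS, lambda word: word)
weak = group_by_stem(WORDS, weak_stem)

print(len(no_stemming), "index terms without stemming")
print(len(weak), "index terms with weak stemming")
print(weak["contract"])
```

Without stemming, each of the eight forms is a separate index term; with weak stemming, the four inflected forms share the term contract.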
A bit more about stemming for searching…
Some databases automatically search for all of the words that come from the same stem/root word unless you indicate that you only want the word you entered.
Example: if you entered computer, the database would also search for computing, computers, computation, etc.
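A rough sketch of this search behavior: the query is stemmed, and every word in the vocabulary with the same stem is searched unless the searcher asks for an exact match. The vocabulary, the suffix list, and the `exact` flag are all assumptions made for illustration; real databases differ in how this option is expressed.

```python
# Toy suffix list, longest first; not a real stemming algorithm.
SUFFIXES = ("ation", "ers", "ing", "er", "s")

# Hypothetical vocabulary of words that occur in the database.
VOCABULARY = ["company", "computation", "computer", "computers", "computing"]


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand(query, exact=False):
    """Return the vocabulary words that a query would actually search."""
    if exact:
        return [word for word in VOCABULARY if word == query]
    target = stem(query)
    return [word for word in VOCABULARY if stem(word) == target]


print(expand("computer"))
print(expand("computer", exact=True))
```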
Step 4: Sort Words, Count Duplicates
Sort into alphabetical order:
D1: dog dog eat world
D2: sleep world
D3: dog let lie sleep
D4: eat hat
D5: dog hat wear
Count any duplicates:
D1: dog(2) eat world
D2: sleep world
D3: dog let lie sleep
D4: eat hat
D5: dog hat wear
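Sorting and counting can be sketched with `collections.Counter` from the Python standard library. The input is the stemmed output of the example documents; storing the counts in a dictionary is just one possible representation.

```python
from collections import Counter

# Stemmed words for the five example documents.
stemmed = {
    "D1": ["dog", "eat", "dog", "world"],
    "D2": ["world", "sleep"],
    "D3": ["let", "sleep", "dog", "lie"],
    "D4": ["eat", "hat"],
    "D5": ["dog", "wear", "hat"],
}

# For each document: count occurrences, then order the terms alphabetically.
counts = {
    doc_id: dict(sorted(Counter(words).items()))
    for doc_id, words in stemmed.items()
}

for doc_id, term_counts in counts.items():
    print(doc_id, term_counts)
```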
Step 5: Create Union List of Unique Terms
Unsorted List: dog eat world sleep world dog let lie sleep eat hat dog hat wear
Sorted List: dog dog dog eat eat hat hat let lie sleep sleep wear world world
Sorted, Unique List: dog eat hat let lie sleep wear world
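The three lists in this step can be reproduced with a short sketch, starting from the per-document term lists of Step 4. The variable names are illustrative.

```python
# Distinct terms per document, as produced by Step 4.
term_lists = {
    "D1": ["dog", "eat", "world"],
    "D2": ["sleep", "world"],
    "D3": ["dog", "let", "lie", "sleep"],
    "D4": ["eat", "hat"],
    "D5": ["dog", "hat", "wear"],
}

# Unsorted list: every document's terms, one after another.
unsorted_terms = [term for terms in term_lists.values() for term in terms]

# Sorted list: the same terms in alphabetical order, duplicates kept.
sorted_terms = sorted(unsorted_terms)

# Sorted, unique list: the union list of terms.
union_list = sorted(set(unsorted_terms))

print(unsorted_terms)
print(sorted_terms)
print(union_list)
```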
Step 6: Create Inverted Index (inverted file)
Union List (unique terms): dog eat hat let lie sleep wear world
Inverted Index:
dog: D1 D3 D5
eat: D1 D4
hat: D4 D5
let: D3
lie: D3
sleep: D2 D3
wear: D5
world: D1 D2
The inverted index has pointers to the documents in which each word occurs.
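An inverted index can be sketched as a dictionary that maps each term to the list of documents containing it. The `search_all` helper, which combines terms with Boolean AND, is an added illustration of how such an index gets used and is not part of the six steps.

```python
# Distinct terms per document, from the earlier steps.
term_lists = {
    "D1": ["dog", "eat", "world"],
    "D2": ["sleep", "world"],
    "D3": ["dog", "let", "lie", "sleep"],
    "D4": ["eat", "hat"],
    "D5": ["dog", "hat", "wear"],
}

# Build the inverted index: term -> list of document IDs (the "pointers").
inverted_index = {}
for doc_id, terms in term_lists.items():
    for term in terms:
        inverted_index.setdefault(term, []).append(doc_id)

for term in sorted(inverted_index):
    print(f"{term}: {' '.join(inverted_index[term])}")


def search_all(*terms):
    """Return the documents that contain every one of the given terms."""
    postings = [set(inverted_index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


print(search_all("dog", "hat"))
```

A search looks up each term in the index and never has to scan the documents themselves.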
Dialog Database Construction
FYI: For those interested in Dialog
Step 1: Create a linear file of records received from the Information Provider. Assign sequential accession numbers to the records.
Step 2: Label the fields within the records: AU for Author, TI for Title, etc. If a field is word-indexed, also label the words within each field. Exclude stop words: AN, FOR, THE, AND, FROM, TO, BY, WITH
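Steps 1 and 2 can be sketched as follows. This is an illustration only, not Dialog's actual implementation: the incoming records are made up, and recording each word's position within its field is an assumption about what labeling the words involves.

```python
# Stop words as listed above.
STOP_WORDS = {"AN", "FOR", "THE", "AND", "FROM", "TO", "BY", "WITH"}

# Hypothetical records received from an information provider.
incoming = [
    {"AU": "Doe, Jane", "TI": "An introduction to searching"},
    {"AU": "Roe, Richard", "TI": "Indexing for the web"},
]

# Step 1: linear file with sequential accession numbers starting at 1.
linear_file = {number: record for number, record in enumerate(incoming, start=1)}


def label_words(field_code, text):
    """Step 2: label each non-stop word with its field code and position."""
    labeled = []
    for position, word in enumerate(text.upper().split(), start=1):
        if word not in STOP_WORDS:
            labeled.append((field_code, position, word))
    return labeled


for accession, record in linear_file.items():
    print(accession, label_words("TI", record["TI"]))
```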
Step 3: Create the Basic Index: all words and phrases from fields containing subject-related terms.
Step 4: Create the Additional Indexes: all terms from all remaining fields.
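The split between the Basic Index and the Additional Indexes can be sketched like this. The records, the choice of TI and DE as the subject-related fields, and the `AU=` style keys are assumptions for illustration; the actual field assignments vary by database.

```python
from collections import defaultdict

# Assumed subject-related fields (title and descriptor); varies by database.
SUBJECT_FIELDS = {"TI", "DE"}

# Hypothetical records keyed by accession number.
records = {
    1: {"AU": "Doe, Jane", "TI": "Database searching", "DE": "Online searching"},
    2: {"AU": "Roe, Richard", "TI": "Database design", "DE": "Indexing"},
}

basic_index = defaultdict(set)
additional_indexes = defaultdict(set)

for accession, record in records.items():
    for code, value in record.items():
        text = value.upper()
        if code in SUBJECT_FIELDS:
            # Step 3: index the whole phrase and each word it contains.
            basic_index[text].add(accession)
            for word in text.split():
                basic_index[word].add(accession)
        else:
            # Step 4: all other fields go to a prefixed additional index.
            additional_indexes[f"{code}={text}"].add(accession)

print(sorted(basic_index["DATABASE"]))
print(sorted(additional_indexes["AU=DOE, JANE"]))
```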