Commercial Online Databases and the Internet
![Page 1: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/1.jpg)
Commercial Online Databases and the Internet
OSS ’99
Global Information Forum
May 24, 1999
Anne Caputo
Dow Jones Interactive Publishing
![Page 2: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/2.jpg)
Traditional Search Services Challenge the Web
The Internet Searchoff
• September 1997-February 1998
• Susan Feldman, DATASEARCH
Goal: Compare searching traditional online services with the World Wide Web
• Effectiveness in finding information
• When to use which one
• Strengths of each approach
![Page 3: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/3.jpg)
Searchoff Ground Rules
• Be a trained, experienced searcher
• Use a real question from a client
• Search either Dialog or Dow Jones Interactive
• Relevance rank the results
• Rank the top 30 retrieved documents on a scale of 1 to 5
![Page 4: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/4.jpg)
Subjects Searched
• Business: 38%
• Technology: 18%
• Medicine/Pharmaceuticals: 14%
• Science: 10%
• Humanities: 8%
• Engineering: 6%
• Other: 6%
![Page 5: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/5.jpg)
Web Search Engines Used
• Alta Vista: 45%
• Hotbot: 20%
• Excite: 14%
• Infoseek: 14%
• Lycos: 5%
• Webferret: 2%
![Page 6: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/6.jpg)
Internet Search-Off Results
[Bar chart. Categories: Relevance Points, # Documents. Series: Web totals (W), Dialog/Dow Jones totals (D). Vertical axis: 0 to 1400. Data labels: 484, 515, 1400, 1143.]
![Page 7: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/7.jpg)
Searching time
Total minutes searching time:
• DIALOG/Dow Jones: 594 minutes
• WWW search engines: 1230 minutes, plus formatting time
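As a quick worked comparison of the two totals above, the following sketch computes how much longer the Web searches took. The variable names are illustrative, and the formatting time mentioned on the slide is not quantified there, so it is left out.

```python
# Search times reported on the slide, in minutes.
dialog_dow_jones_minutes = 594
web_minutes = 1230

# Ratio and absolute difference between the two totals.
ratio = web_minutes / dialog_dow_jones_minutes
extra_minutes = web_minutes - dialog_dow_jones_minutes

print(f"Web searching took {ratio:.2f}x as long ({extra_minutes} more minutes)")
```

On these figures the Web searches took roughly twice as long, before any formatting time is counted.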
![Page 8: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/8.jpg)
Searching Assumptions: traditional search engines
• Information exists on the subject
• The information is high quality
• The information is current
• The information is expensive
• To find it, we need expertise and training to know how and where to search
• It will be a surprise if we can’t find something
![Page 9: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/9.jpg)
Searching assumptions: World Wide Web
• There MIGHT be information on the topic
• Quality and timeliness are unpredictable
• The information is free
• There’s no telling how the search engine works
• Searching requires no skill
• Searching requires no training
• It will be a surprise if we find something
![Page 10: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/10.jpg)
Retrieved Documents by Relevance
[Bar chart. Horizontal axis: RANKED 1 (less relevant) to RANKED 5 (more relevant). Vertical axis: number of documents, 0 to 300 and above. Series: Web (W), DIALOG/Dow Jones (D). Data labels: 350, 306, 38, 34, 26, 147, 52, 108, 60, 111, 117.]
![Page 11: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/11.jpg)
Conclusion
DIALOG training has influenced an entire generation of searchers: we automatically shift into Boolean
![Page 12: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/12.jpg)
Digression:
Nested Boolean searches don’t take advantage of the strong points of Web search engines
Statistical search engines search a whole territory. Boolean engines search for a point in that territory
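The contrast between a "point" and a "territory" can be illustrated with a toy example. This is only a sketch under simplifying assumptions: the documents and query are invented, and the "statistical" scoring here is a plain count of matching terms, whereas real engines use more elaborate weighting.

```python
# Invented mini-collection for illustration only.
docs = {
    "d1": "merger news for a small pharmaceutical company",
    "d2": "pharmaceutical company announces drug study results",
    "d3": "drug study and merger rumors at pharmaceutical company",
    "d4": "weather report",
}
query = ["pharmaceutical", "merger", "drug"]

def boolean_and(query, docs):
    """Return only documents that contain every query term (a 'point')."""
    return [name for name, text in docs.items()
            if all(term in text.split() for term in query)]

def ranked(query, docs):
    """Return every document matching at least one term, best match first."""
    scores = {name: sum(term in text.split() for term in query)
              for name, text in docs.items()}
    hits = [name for name in scores if scores[name] > 0]
    return sorted(hits, key=lambda name: scores[name], reverse=True)

boolean_hits = boolean_and(query, docs)  # only the exact match
ranked_hits = ranked(query, docs)        # the surrounding 'territory', ordered

print(boolean_hits)
print(ranked_hits)
```

The Boolean AND query returns only the single document with all three terms, while the ranked approach also surfaces partial matches below it.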
![Page 13: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/13.jpg)
Web Strategies
• Map the territory: use your searching skills to create lists of related terms
• Omit Boolean operators; let the search engine work without interference
• Put the most important and most rare words first
• Use MORE LIKE THIS to improve results
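One way to think about "most rare words first" is to order query terms by how few documents contain them. The sketch below assumes a small sample of text is available to estimate rarity; `rarest_first` and `sample_docs` are hypothetical names, and a searcher would normally make this judgment by experience, not by code.

```python
from collections import Counter

# Invented sample texts used only to estimate how common each word is.
sample_docs = [
    "company news and company reports",
    "drug company news",
    "thalidomide drug study at a company",
]

def rarest_first(terms, docs):
    """Order terms by the number of documents containing them, fewest first."""
    doc_freq = Counter()
    for text in docs:
        # Count each word once per document.
        doc_freq.update(set(text.split()))
    return sorted(terms, key=lambda term: doc_freq[term])

ordered = rarest_first(["company", "drug", "thalidomide"], sample_docs)
print(" ".join(ordered))
```

The rarest term leads the query and the most common one trails it.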
![Page 14: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/14.jpg)
Web Strategies
• Use phrases when possible to eliminate irrelevant materials
• Ignore the useless hits and pursue the good ones
• Don’t worry about finding six million documents. Just look at the top 30
• Rephrase the search
• Move to another search engine if you don’t find anything
![Page 15: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/15.jpg)
Conclusions: traditional search services
Strengths
• Predictable archives
• Chemical Engineering
• Electrical Engineering
• History and background on companies
• History and historical figures
• Market reports, industry reports
![Page 16: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/16.jpg)
Conclusions: traditional search services
• Current drug studies (authoritative)
• Industry newsletters and journals
• Financial industry coverage
• Scholarly journal articles
• High quality information
• Quick searches when you know the information is likely to be there
![Page 17: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/17.jpg)
Conclusions: The Web
• Pictures and illustrations
• Some conference coverage and papers
• Product information comes from company
• Small companies: products/background
• Medical statistics (current)
• If you know where to find the information
![Page 18: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/18.jpg)
Conclusions: use both
To supplement each other for:
• Standards
• Articles on topics of general interest
• Popular subjects
• Organizations
• Directory information
• Reviews/evaluations/how-to information
![Page 19: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/19.jpg)
Conclusions: use both
• Government regulations and other agency information
• Competitive intelligence
• Obscure topics
• Clues for finding information on and offline
![Page 20: Commercial Online Databases and the Internet](https://reader036.fdocuments.in/reader036/viewer/2022080917/568131a5550346895d981500/html5/thumbnails/20.jpg)
Conclusions: general
• Time is money. Free information that takes too long to find and format is expensive information
• The Web is a new tool. We need to learn to use both online sources well
• Vary strategies and approach to take advantage of each medium