Page 1: The Invisible Web

The Invisible Web

Gary Price, MLIS
George Washington University

Chris Sherman
Associate Editor, Search Engine Watch

Page 2: The Invisible Web

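How Search Engines Work

[Diagram: a Crawler fetches pages (URL1, URL2, URL3, URL4) from The Web, an Indexer adds them to the Search Engine's Database, and Your Browser sends a query ("Eggs?") that the Search Engine answers ("Eggs.") with ranked matches: Eggs - 90%, Eggo - 81%, Ego - 40%, Huh? - 10%. The sample page is "All About Eggs" by S. I. Am.]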
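The crawler, indexer, and search steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how any real engine works: the `WEB` dictionary, its page texts, and the function names are all made up for the example, and ranking here is a plain word count rather than the percentage scores shown on the slide.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for "The Web": URL -> page text (invented for this sketch).
WEB = {
    "URL1": "All About Eggs by S. I. Am. Eggs, eggs and more eggs.",
    "URL2": "Eggo waffles and eggs for breakfast.",
    "URL3": "The ego and the id.",
    "URL4": "Huh? Nothing relevant here.",
}

def crawl(web):
    """Crawler: visit every known URL and yield (url, text) pairs."""
    for url, text in web.items():
        yield url, text

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    """Indexer: map each word to a {url: occurrence count} dictionary."""
    index = defaultdict(dict)
    for url, text in pages:
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Search engine: return (url, count) pairs, most occurrences first."""
    hits = index.get(query.lower(), {})
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))

index = build_index(crawl(WEB))
results = search(index, "Eggs")
for url, count in results:
    print(url, count)
```

Anything the crawler never fetches never reaches the index, so it can never appear in `results`. That gap is the Invisible Web.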

Page 3: The Invisible Web

What is the Invisible Web?

• “Stuff” that search engine crawlers (spiders) cannot -- or will not -- add to their databases

• 2 to 50 times larger than the visible Web

• Resources often much higher quality than the visible Web

Page 4: The Invisible Web

What is the Invisible Web?

• Certain file formats (PDF, Flash, Office files, streaming media)
– Why? They aren’t HTML text
• Most real-time data (stock quotes, weather, airline flight info)
– Why? Ephemeral & storage intensive

Page 5: The Invisible Web

What is the Invisible Web?

• Dynamically generated pages (CGI, JavaScript, ASP, or most pages with “?” in URL)
– Why? Spider traps
• Web-accessible databases
– Why? Spiders can’t type
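The reasons given on the last two slides can be expressed as a simple URL filter. The sketch below is an assumption about how a cautious crawler of this era might decide to skip a URL; the extension lists, the function name `skip_reason`, and the `example.com` URLs are illustrative and are not taken from any particular search engine.

```python
import posixpath
from urllib.parse import urlparse

# Assumed extensions for the formats named on the slides (illustrative, not exhaustive).
NON_HTML_EXTENSIONS = {".pdf", ".swf", ".doc", ".xls", ".ppt", ".ram", ".asf"}
DYNAMIC_EXTENSIONS = {".cgi", ".asp"}

def skip_reason(url):
    """Return why a crawler might skip this URL, or None if it would crawl it."""
    parts = urlparse(url)
    extension = posixpath.splitext(parts.path)[1].lower()
    if parts.query:
        # A "?" followed by parameters usually means a script builds the page.
        return "dynamic page ('?' in URL): possible spider trap"
    if extension in DYNAMIC_EXTENSIONS:
        return "dynamically generated page: possible spider trap"
    if extension in NON_HTML_EXTENSIONS:
        return "not HTML text"
    return None

for url in (
    "http://example.com/index.html",
    "http://example.com/report.pdf",
    "http://example.com/search.asp?q=eggs",
):
    print(url, "->", skip_reason(url) or "crawl")
```

A database that sits behind a search form has no fixed URL to test at all, which is the point of "spiders can't type": its content only exists after someone submits a query.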

Page 6: The Invisible Web

Invisible Web Gateways

• Intelliseek
– http://www.invisibleweb.com
– http://beta.profusion.com
• Complete Planet
– http://www.completeplanet.com/
• Librarians’ Index to the Internet
– http://www.lii.org

Page 7: The Invisible Web

The Invisible Web & The Librarian

The Need For Knowledge!

• Awareness that the Invisible Web (IW) Exists
Maybe the IW Holds the Content Your Users Can’t Find! What is the cost in both wasted time/effort and total frustration?
• Let Others Know About the IW
• Awareness of The Synonyms
– Invisible Web
– Deep Web
– Hidden Web
• Let the Content be Your Calling Card
Focus Less on the Amount of IW Data

Page 8: The Invisible Web

The Invisible Web & The Librarian

Why is the IW Useful to the Librarian and the End User?

• Quality of Content (Authority)
• Deep Content on Subject Area (Comprehensiveness)
• Focused Databases (Limited Scope)
Smaller Universe of Documents to Search (Maximize Precision/Recall)
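Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved. The sketch below works through one made-up case to show why a smaller, focused universe of documents can help. The document IDs and both result lists are invented for illustration and do not describe any real search.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result list against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents actually answer the user's question (hypothetical IDs).
relevant = {"d1", "d2", "d3", "d4"}

# A general search tool returns ten results, only two of them relevant.
general_results = ["d1", "d2"] + [f"x{i}" for i in range(1, 9)]

# A focused database returns four results, three of them relevant.
focused_results = ["d1", "d2", "d3", "x1"]

print("general:", precision_recall(general_results, relevant))
print("focused:", precision_recall(focused_results, relevant))
```

In this made-up case the focused database scores higher on both measures (0.75 and 0.75, against 0.2 and 0.5) because it returns less noise and misses fewer relevant items.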

Page 9: The Invisible Web

The Invisible Web & The Librarian

Why is the IW Useful to the Librarian & the End User?

• Material Unavailable Elsewhere on the Web (Uniqueness)
• Many Options to Limit, Sort, Interact with the Data (Maximize Precision)
• Timeliness vs. Time Lag of General Search Tools (Currency)

Page 10: The Invisible Web

The Invisible Web & The Librarian

The IW, The Librarian, The Future

• What Happens If/When the General Search Tools Crawl IW Material? Good News? Bad News?
• General Search Tools May NOT:
– Offer Many Interactive/Limiting Tools
– Be Updated/Refreshed as Frequently (time lag)
Timeliness, making current info available, is one of the things the Net does well.

Page 11: The Invisible Web

The Invisible Web & The Librarian

The IW, The Librarian, The Future

• The Search Engine Business: Will IW Material be a Priority?
• Just One Dialog or SilverPlatter Database?
NO, in Terms of Content!!!
• Yes, Common Interface, Syntax
Perhaps XML will Assist

Page 12: The Invisible Web

The Invisible Web & The Librarian

Challenges

• It’s Not The Magic Bullet. It’s a Tool
• We Still Need Traditional Online Databases
• Learning Curve, Sorry!
• Database Selection, When To Use the IW?
• Numerous Interfaces, Syntax
• A Non-Stop Flow of New Material

Page 13: The Invisible Web

The Invisible Web & The Librarian

Things To Do!

• Build Your Own Collections
Internet Resource Collection Development
• Mine Entire Sites; Often the IW Material Gets Little or No Notice In Reviews
• Create Links DIRECT to the Interface When Possible
• “Save the Time of the Web Researcher”
• Keep Current

Page 14: The Invisible Web

The Invisible Web & The Librarian

Types of IW Content in Librarian Terms

• Bibliographic
– OPACs
– Subject Bibs
• Non-Bibliographic
– Full-Text
– Numeric
– Graphic
– Directory
– Real-Time

Page 15: The Invisible Web

Future Trends

• Killer apps will lead the way
– Research Index (CiteSeer)
• Search engines will work harder to “find” Invisible Web content
– Inktomi (Index Connect, Ultraseek)
– WhizBang (“wrappers”)

• No matter what, there will always be a problem!

Page 16: The Invisible Web

Coming Soon

Available: July 2001, CyberAge Books, 0-910965-51-X
http://www.invisible-web.net

Page 17: The Invisible Web

Invisible Web: Computer Science

• McAfee World Virus Map
– http://www.mcafee.com
• ResearchIndex
– http://www.researchindex.com

Page 18: The Invisible Web

Invisible Web: Company Research

• European High-Tech Industry Database
– http://www.tornado-insider.com/radar/
• Kompass
– http://www.kompass.com

Page 19: The Invisible Web

Invisible Web: Intellectual Property

• Delphion Intellectual Property Network
– http://www.delphion.com/
• ESP@CENET (European Patent Office) Patent Database
– http://ep.espacenet.com/

Page 20: The Invisible Web

Invisible Web: Dictionaries & Languages

• EuroDicAutom
– http://eurodic.ip.lu
• Verbix
– http://www.verbix.com/index.html

Page 21: The Invisible Web

Invisible Web: Art & Artists

• ADAM (Art, Design, Architecture & Media Information Gateway)
– http://adam.ac.uk/
• Artcyclopedia
– http://www.artcyclopedia.com/

Page 22: The Invisible Web

Invisible Web: Real-Time Information

• Flight Tracker
– http://www.trip.com/ft/home/0,2096,1-1,00.shtml
• J-Track 3-D Satellite Locator
– http://liftoff.msfc.nasa.gov/realtime/JTrack/Spacecraft.html

Page 23: The Invisible Web

Invisible Web: Maps and Driving Directions

• MapBlast
– http://www.mapblast.com
• Streetmap.co.uk
– http://www.streetmap.co.uk/

Page 24: The Invisible Web

Invisible Web: Government Info

• Parline Database
– http://www.ipu.org
• United Nations Daily Press Briefings
– http://www.un.org/News/

Page 25: The Invisible Web

Invisible Web: Health & Medicine

• Economics of Tobacco Control Database
– http://www1.worldbank.org/tobacco/database.asp
• International Digest of Health Legislation
– http://www.who.int

Page 26: The Invisible Web

Invisible Web: News & Current Events

• Cold North Wind Newspaper Archive Project
– http://www.coldnorthwind.com
• Financial Times Global Archive
– http://www.globalarchive.ft.com

Page 27: The Invisible Web

Invisible Web: Science

• Great Barrier Reef Online Image Catalogue
– http://www.gbrmpa.gov.au/corp_site/info_services/library/index.html
• Nuclear Explosions Database
– http://www.ausseis.gov.au/databases

Page 28: The Invisible Web

Invisible Web: Transportation

• Equasis (Merchant Ships)
– http://www.equasis.org/
• World Aircraft Accident Summary (WAAS) Fatal Airline Accident Subset
– http://www.waasinfo.net/