Metadata and the web
Richard Sapon-White
March 18, 2013
• The growth of the Web
• Metadata in the context of the Web
• Important metadata schemes: XML, HTML, MARC
• From 1996 to 2007:
  ◦ The Web grew from 77,138 Web sites to 125 million Web sites
• Access is provided through search engines
  ◦ Google is the most used search engine
• Search engines use Web crawlers (a.k.a. spiders or robots) to collect information on Web sites
  ◦ Crawlers copy Web pages and their locations to build a catalog of indexed pages
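The crawler behavior described above can be sketched in a few lines. This is only an illustration, not how any real search engine works: the `PAGES` dictionary stands in for the Web (all URLs and page contents are hypothetical), and the names `LinkAndTextParser` and `crawl` are invented for this sketch. The crawler starts at one page, follows hyperlinks, and records which words appear at which locations.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects the words and outgoing links found on one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Remember the target of every <a href="..."> hyperlink.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Record page text as lowercase words.
        self.words.extend(data.lower().split())


def crawl(pages, start_url):
    """Follow links from start_url; map each word to the URLs containing it."""
    index = {}
    seen = set()
    queue = [start_url]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in pages:
            continue
        seen.add(url)
        parser = LinkAndTextParser()
        parser.feed(pages[url])
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for href in parser.links:
            queue.append(urljoin(url, href))
    return index


# Hypothetical in-memory "Web": no network access is needed.
PAGES = {
    "http://example.org/": '<html><body><a href="/about.html">About</a> metadata home</body></html>',
    "http://example.org/about.html": "<html><body>about metadata schemes</body></html>",
    "http://example.org/hidden.html": "<html><body>unlinked deep page</body></html>",
}

index = crawl(PAGES, "http://example.org/")
print(sorted(index["metadata"]))
```

Because no page links to `hidden.html`, the sketch never reaches it, so its words never enter the index.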
Invisible Web, Deep Web
• Web crawlers cannot:
  ◦ submit queries to databases,
  ◦ parse file formats that they do not recognize,
  ◦ click buttons on Web forms, or
  ◦ log in to sites requiring authentication
• Therefore, much of the information on the Web is invisible!
  ◦ How much is invisible?
  ◦ Thousands of times larger than the indexed/visible Web!
• Topic Databases: subject-specific aggregations of information, such as SEC corporate filings, medical databases, patent records, etc.
• Internal site: searchable databases for the internal pages of large sites that are dynamically created, such as the knowledge base on the Microsoft site.
• Publications: searchable databases for current and archived articles.
• Shopping/Auction.
• Classifieds.
• Portals: broader sites that included more than one of these other categories in searchable databases.
• Library: searchable internal holdings, mostly for university libraries.
• Yellow and White Pages: people and business finders.
• Calculators: while not strictly databases, many do include an internal data component for calculating results. Mortgage calculators, dictionary look-ups, and translators between languages are examples.
• Jobs: job and resume postings.
• Message or Chat.
• General Search: searchable databases most often relevant to Internet search topics and information.
From: Michael K. Bergman, "The Deep Web: Surfacing Hidden Value," Journal of Electronic Publishing 7, no. 1 (August 2001). http://www.press.umich.edu/jep/07-01/bergman.html.
• Poor site design results in invisible Web sites
• To create Web sites for human and machine retrieval:
  ◦ Use hyperlinked hierarchies of categories
  ◦ Contribute Deep Web collections’ metadata to union catalogs (which can then be indexed by search engines)
• Google’s Sitemaps can provide a detailed list of pages on a site
  ◦ http://www.sitemaps.org/
  ◦ http://en.wikipedia.org/wiki/Help:Contents/Site_map
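As a rough illustration of the Sitemap idea, the sketch below builds a minimal Sitemap file that lists a site's pages so a crawler does not have to discover them by following links. It assumes the `urlset`/`url`/`loc` layout and namespace published at sitemaps.org; the function name `build_sitemap` and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace published by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemap XML string with one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, including one that no other page links to.
sitemap_xml = build_sitemap([
    "http://example.org/",
    "http://example.org/about.html",
    "http://example.org/hidden.html",
])
print(sitemap_xml)
```

A real Sitemap may also carry optional per-page elements such as a last-modified date; they are left out here to keep the sketch short.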
• Create conventional, MARC-based metadata
• Access via library catalogs, union catalogs
• Problems:
  ◦ Creating MARC records is labor-intensive, slow, and expensive
  ◦ Web sites are dynamic (content, URLs), so MARC records must be revised
• Solutions:
  ◦ Dublin Core
  ◦ META tags
  ◦ Resource Description Framework
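To give a feel for the first two solutions combined, the sketch below renders a simple Dublin Core record as HTML META tags, following the common convention of prefixing element names with `DC.` and declaring the schema with a `link` element. The helper name `dublin_core_meta` is hypothetical, and the record values are taken from this presentation's own title slide.

```python
from html import escape


def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as HTML <meta> tags."""
    # Declare what the "DC." prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        # Escape the value so quotes or ampersands cannot break the attribute.
        lines.append(
            '<meta name="DC.%s" content="%s">' % (element, escape(value, quote=True))
        )
    return "\n".join(lines)


record = {
    "title": "Metadata and the web",
    "creator": "Richard Sapon-White",
    "date": "2013-03-18",
}

dc_html = dublin_core_meta(record)
print(dc_html)
```

Unlike a MARC record held in a separate catalog, tags like these live in the page itself, so they can be revised whenever the page changes.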
Dublin Core PowerPoint
• Embed two metadata elements in the HTML <head> section of a Web page:
  ◦ Keywords
  ◦ Description
• Example:
  ◦ <META NAME="KEYWORDS" CONTENT="data standards, metadata, Web resources, World Wide Web, cultural heritage information, digital resources, Dublin Core, RDF, Semantic Web">
  ◦ <META NAME="DESCRIPTION" CONTENT="Version 3.0 of the site devoted to metadata: what it is, its types and uses, and how it can improve access to Web resources; includes a crosswalk.">
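The following sketch shows one way a program could read the keywords and description back out of a page's META tags, using Python's standard `html.parser` module. The class name `MetaTagParser` and the shortened sample page are assumptions made for illustration; whether and how a given search engine uses these tags is not covered here.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the content of keywords and description META tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in ("keywords", "description"):
            self.meta[name] = attrs.get("content") or ""


# Shortened, hypothetical page using the same tag style as the example above.
PAGE = """<html><head>
<META NAME="KEYWORDS" CONTENT="metadata, Dublin Core, RDF">
<META NAME="DESCRIPTION" CONTENT="A site devoted to metadata.">
</head><body></body></html>"""

parser = MetaTagParser()
parser.feed(PAGE)

# Split the comma-separated keyword list into individual terms.
keywords = [k.strip() for k in parser.meta["keywords"].split(",")]
print(keywords)
print(parser.meta["description"])
```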