Exploring the Deep Web Peter L. Kraus J. Willard Marriott Library – University of Utah.

Exploring the Deep Web

Peter L. Kraus

J. Willard Marriott Library – University of Utah

What is the Deep Web?

The Deep Web is the hidden part of the Web, containing a huge volume of content that is inaccessible to conventional search engines and, consequently, to most users.

How big is the Deep Web?

• 550 billion documents

• 500 times the content of the surface Web

• Google has identified 1.2 billion documents

• An Internet search typically covers about 0.03% (1/3000) of available content.
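
The figures on this slide can be cross-checked with a little arithmetic. The sketch below is only a rough consistency check, and it assumes that the "500 times" ratio applies to document counts (the slide may mean content volume instead). The variable names are illustrative and the numbers are taken from the slide as quoted, not independently verified.

```python
# Figures as quoted on the slide (not independently verified).
deep_web_docs = 550e9          # documents in the Deep Web
deep_to_surface_ratio = 500    # Deep Web is 500 times the surface Web
search_fraction = 1 / 3000     # share of content a typical search covers

# Assumption: the 500x ratio applies to document counts.
implied_surface_docs = deep_web_docs / deep_to_surface_ratio

print(f"Implied surface Web size: {implied_surface_docs:.2e} documents")
print(f"1/3000 as a percentage: {search_fraction:.3%}")
```

Under that assumption the implied surface Web is about 1.1 billion documents, the same order of magnitude as the 1.2 billion documents attributed to Google, and 1/3000 works out to roughly 0.033%.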

What’s in the Deep Web?

• Searchable databases

• Downloadable files & spreadsheets

• Image and multi-media files

• Data sets

• Various file formats such as .pdf

• Lots of government information

Why use the Deep Web?

• Higher-quality sources: selected and organized by subject experts

• Dynamic display

• Customized data sets

• Some data is visual and not word-searchable

• Regular search engines miss vast resources available in the Deep Web

Why are we talking about Government Sites in the Deep Web?

• Governments have the mandate and the capacity to gather information that individuals don’t

• Most government information is copyright free

• Government information is authoritative

• Governments have the financial and human resources to maintain Deep Web sites

The Web Today

• Federal government Web sites occupy only about 1% of the entire global Web, yet they hold 85% of the Deep Web.

• The content of these Web sites includes items in .html or .pdf format (reports, records, data sets, etc.): a diverse mix of files with little standardization or uniformity. The common term for this content is “Grey Literature”.

Definition of “Grey Literature”

• “That which is produced on all levels of government, academics, business and industry in print and electronic formats, but which is not controlled by commercial publishers”

Growth and Life of Federal Information

• The amount of information on federal Web sites grew 13-fold between 1992 and 2003

• The average life expectancy of a federal Web resource is 4 months (2003)

What can libraries do?

• LOCKSS-DOCS archival project (BYU and UU are members)

• Cooperative efforts in specific subject areas (Western Waters Digital Library)

• Individual institutional initiatives, such as institutional repositories that reflect the institution's research productivity (information often funded by federal grants)

Finding Naked People - Forsyth, Fleck (1996) (54 citations)

This paper demonstrates an automatic system for telling whether there are naked people present in an image. The approach combines color and texture properties to obtain a mask for skin regions, which is shown to be effective for a wide range of shades and colors of skin.

http.cs.berkeley.edu/~daf/newo2.ps.Z

Graph showing number of citations to “Finding Naked People”

Arches National Park: NASA Landsat 7, 10/3/99

searching for "University of Utah"

displaying records 1 - 25 of a total of 27

Development and Evaluation of Stitched Sandwich Panels. Larry E. Stanley; Daniel O. Adams. NASA Langley Research Center. NASA/CR-2001-211025, June 2001; 20010702

... test panels were produced initially at the University of Utah and later at NASA Langley Research Center ...

http://techreports.larc.nasa.gov/ltrs/PDF/2001/cr/NASA-2001-cr211025.pdf

Marriott Library, Salt Lake City, Utah, United States 9/18/2003 (TerraServer)

Utah Seismic Hazards (National Atlas)

International Deep Web Resources

• International organizations collect an amazing amount of data

• Statistical data is often best organized in database and spreadsheet format

• Like the US Government, individual countries post data files and databases

• This information may not be available in print sources in schools and libraries

United Nations Official Documents System

• http://documents.un.org/

Why use the ODS?

• Full-text official United Nations documents (1993 onward), online and free

• Retrospective digitization in progress

• Highly relevant material for almost any international topic

• Timely and authoritative

United Nations Statistical Databases

• Value of the information: authoritative, comparative, time series, compact

• Database topics include:

• Commodity trade

• Demographics

• Disability statistics

• Social indicators

• Statistics on men and women

http://unstats.un.org/unsd/databases.htm

Individual Country Statistics

• http://www.census.gov/main/www/stat_int.html

Why use this kind of information?

• Aggregate statistical sources are often not as up to date as the individual countries' own sources

• Individual countries are often more specific in their indicators than aggregate sources

• Information in databases, spreadsheets, and downloadable files is usually NOT searchable by web crawlers
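
The last point can be illustrated with a minimal sketch, assuming a simple crawler that only follows hyperlinks and never fills in search forms. The sample page, its paths, and the `LinkAndFormCounter` class are hypothetical and exist only for this illustration; real crawlers are more elaborate.

```python
from html.parser import HTMLParser

# Hypothetical page: one ordinary link plus a search form in front of a database.
SAMPLE_PAGE = """
<html><body>
  <a href="/about.html">About this agency</a>
  <form action="/search" method="post">
    <input type="text" name="query">
    <input type="submit" value="Search the database">
  </form>
</body></html>
"""

class LinkAndFormCounter(HTMLParser):
    """Records what a link-following crawler can reach and what it skips."""

    def __init__(self):
        super().__init__()
        self.links = []  # href targets a simple crawler would follow
        self.forms = []  # form actions that need typed input to return records

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "form":
            self.forms.append(attributes.get("action", ""))

parser = LinkAndFormCounter()
parser.feed(SAMPLE_PAGE)
print("Followed by a simple crawler:", parser.links)
print("Left unqueried (needs a typed search):", parser.forms)
```

In this sketch the crawler reaches the static page but never submits the form, so the records behind the search box stay out of its index.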

For Further Information

• Marriott Library, University of Utah

801-581-8394

www.lib.utah.edu/documents

[email protected]