Digital Libraries
Spring 2006, 1 March
Bharat Mehra
IS 520 (Organization and Representation of Information)
School of Information Sciences
University of Tennessee
Digital Libraries
What does the digital library concept mean to you
• as a user?
• as an information professional?
• as an author?
Is the Web a digital library? Why? Why not?
Your definition or notion?
Digital Libraries
What is the role of a librarian or information professional? How has this role changed in the context of digital libraries?
The Web: Implications for DLs
Ubiquitous information source: Why is the web “a much more engaging medium and teacher” than textbooks or a local librarian?
Identify pros and cons for specific situations in the different quadrants?
Finding Information on the Web
Web directories for browsing: Yahoo! -- human indexers/catalogers, classificatory structure
Web search engines for querying: AltaVista, Google -- robots, automatically generated indexes
Combination of directory and engine
Paradigm shift
Aspect            | Classic IR                       | Web IR
Collection        | professionals; selection policy  | polling (robot)
Representation    | description; access points       | full text; metadata
Search algorithms | master file; inverted indexes    | non-Boolean; proprietary
Interface         | good functionality; complex      | simplistic; trade-off
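The inverted indexes named in the comparison above can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and a simple AND lookup; the function names and the three sample documents are hypothetical, and real search engines use far more elaborate (often proprietary) ranking.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query):
    """Return ids of documents that contain every term in the query (AND)."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical sample collection, for illustration only.
docs = {
    1: "digital libraries extend classic information retrieval",
    2: "web search engines build indexes with robots",
    3: "digital collections on the web",
}
index = build_inverted_index(docs)
matches = search_all(index, "digital web")
```

The index is built once, so each query only touches the posting lists for its own terms instead of scanning every document.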
Digital Library Features
• community-based users
• extension and enhancement of classic IRs
• digital resources are multimedia: text, images, sounds, etc.
• technical capabilities for creating, searching, and using information
• distributed using networks (the Web, etc.)
Digital Library Features
Content of digital libraries includes
• data
• metadata that describe various aspects of the data
• links (or relations) to other data or metadata (internal or external)
• context portals to support individual users’ information needs and work tasks
Digital Library Projects
Digital Libraries Initiative, Phase 2 <http://www.dli2.nsf.gov/>
LC American Memory Website <http://memory.loc.gov/>
standards <http://lcweb.loc.gov/standards/metadata.html>
Example Digital Libraries
The National Science Digital Library <http://nsdl.org/>
Library portals extend and serve classrooms, offices, laboratories, homes, and public spaces.
Information Theory (for DLs)
Joseph Goguen: A theory of information should
• be useful for understanding and designing info systems (or DLs)
• address the meanings that users give to events, including social and political nuances
• address ethical issues
• account for the fact that different individuals and groups can construe meanings in very different ways
Joseph Goguen, “Towards a Social Ethical Theory of Information” in Social Science Research, Technical Systems and Cooperative Work, edited by Geoffrey Bowker, Les Gasser, Leigh Star and William Turner. (Erlbaum, 1997).
Goguen’s Info Qualities Relevant to DLs
1. Situated: Info can only be fully understood in relation to the particular, concrete situation in which it actually occurs
2. Local: Interpretations are constructed in some particular context, including a particular time, place, and group
3. Emergent: Info cannot be fully understood at the level of the individual, that is, at the level of individual psychology, because it arises through ongoing interactions with other people/technologies
4. Contingent: Interpretation of info depends upon the current situation, which may include the current interpretation of prior events
5. Embodied: Info is tied to documents/bodies in particular situations, so that the particular way that bodies are embedded in a situation may be essential to some interpretations
6. Vague: In practice, info is only elaborated to the degree that it is useful to do so; the rest is grounded in intangible knowledge
7. Open: Info cannot in general be given a final and complete form, but must remain open to revision in the light of future developments
“Wet” information: strongly situated; less mobile
“Dry” information: weakly situated; more mobile
Issues of Text Representation in DLs
Storing textual material involves representing its:
• Structure (characters, words, paragraphs, headings): represented by mark-up, e.g., Standard Generalized Markup Language (SGML)
• Appearance (choice of format, size of font, margins, line spacing, how headings are represented, location of figures): page-description languages precisely describe the appearance, e.g., TeX, PostScript, Portable Document Format (PDF)
Alternative renderings of a single document
Converting Text
Scanning: Optical character recognition
Encoding characters: ASCII, Unicode
Document type definitions (DTDs) in the Text Encoding Initiative (TEI), Encoded Archival Description (EAD)
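The difference between the ASCII and Unicode encodings listed on this slide can be seen in a few lines of Python. This is only an illustrative sketch; the sample word is an arbitrary choice, and UTF-8 is assumed as the Unicode encoding form.

```python
# "Caf\u00e9" is "Café"; the last letter (U+00E9) lies outside the 7-bit ASCII range.
text = "Caf\u00e9"

# Unicode assigns every character a code point.
code_points = [ord(ch) for ch in text]

# UTF-8 stores ASCII characters in one byte each and U+00E9 in two bytes.
utf8_bytes = text.encode("utf-8")

# Plain ASCII cannot represent the accented character at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Text produced by scanning and OCR needs an encoding that covers every character in the source, which is why Unicode is usually the safer default for multilingual collections.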
Three General Types of Metadata
1. Object-descriptor metadata (Dublin Core): designed to describe global characteristics of entire objects with external references
2. Internal/structural metadata (HTML, XML, RDF): designed to describe the internal semantic structure of objects with internal and external references
3. Display metadata (HTML, style sheets): designed to describe how objects or parts of objects should be visualized or displayed; not necessarily related to semantic structure
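As an illustration of the first type, object-descriptor metadata, the sketch below builds a small Dublin Core record with Python's standard `xml.etree.ElementTree` module. The wrapping `record` element, the helper name `make_dc_record`, and the field values are assumptions chosen for the example; only the Dublin Core element namespace is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def make_dc_record(fields):
    """Build a <record> element with one dc:* child per (name, value) pair."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Illustrative values describing a whole collection, not its internal structure.
record = make_dc_record({
    "title": "American Memory",
    "publisher": "Library of Congress",
    "identifier": "http://memory.loc.gov/",
    "type": "Collection",
})
xml_text = ET.tostring(record, encoding="unicode")
```

Each element describes the object as a whole, which is what separates this kind of metadata from structural markup inside the object or display rules in a style sheet.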
What is a Database?
A database is a collection of data that is organized so that its contents can easily be accessed, managed and updated. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network.
Relational Databases
A database system in which the database is organized and accessed according to the relationships between data items without the need for any consideration of physical orientation and relationship. Relationships between data items are expressed by means of tables.
Features of Databases
• Collection of data stored together as a unit
• Databases are useful for storing data and making it available for retrieval
• Within the database, data is organized into different tables
• Each table has columns and rows. Indexes on tables provide speedy access to data
• Information in the database can be retrieved, modified, or deleted using a query language like SQL
• Some common database systems are Oracle, SQL Server, DB2, Sybase, etc.
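The features above (tables with columns and rows, indexes, and retrieval, modification, and deletion through SQL) can be tried out with SQLite, which ships with Python. This is a minimal sketch: SQLite is used only because it needs no server, and the `items` table and its rows are hypothetical.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table has columns (attributes); each inserted row is one record.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
)
# An index on a column speeds up lookups that filter on it.
conn.execute("CREATE INDEX idx_items_year ON items (year)")

conn.executemany(
    "INSERT INTO items (title, year) VALUES (?, ?)",
    [("Item A", 1990), ("Item B", 2001), ("Item C", 2001)],
)

# Retrieve
titles_2001 = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM items WHERE year = ? ORDER BY title", (2001,)
    )
]

# Modify
conn.execute("UPDATE items SET year = ? WHERE title = ?", (1991, "Item A"))

# Delete
conn.execute("DELETE FROM items WHERE year = ?", (2001,))

remaining = conn.execute("SELECT title, year FROM items").fetchall()
```

The same SQL statements would work with little or no change on the larger systems listed above, which is one reason SQL is treated as the common query language.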
Relational Database Model
Data is presented as a collection of relations
Each relation is depicted as a table
Columns are attributes
Rows represent entities
Every table has a set of attributes that, taken together as a "key" (technically, a "superkey"), uniquely identify each entity
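The uniqueness requirement on a key can be checked mechanically. The sketch below tests whether a set of attributes distinguishes every row in one sample table; the function name and the employee rows are hypothetical. Strictly speaking, a superkey is a property of the schema (it must hold for every valid state of the table), so a check on one set of rows can only refute a candidate, not prove it.

```python
def is_superkey(rows, attributes):
    """Return True if no two rows share the same values for the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[name] for name in attributes)
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical rows; each dict is one entity (row), each key one attribute (column).
employees = [
    {"emp_id": 1, "name": "Elaine", "dept": "Sales"},
    {"emp_id": 2, "name": "Bill", "dept": "Sales"},
    {"emp_id": 3, "name": "Elaine", "dept": "Design"},
]

by_id = is_superkey(employees, ["emp_id"])
by_name = is_superkey(employees, ["name"])
by_name_and_dept = is_superkey(employees, ["name", "dept"])
```

Here `name` alone fails because two employees are called Elaine, while `emp_id` alone, or `name` and `dept` together, identify each row in this sample.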
Relational Database Model
Views in a database
A company maintains a database of its employees
• Other attributes of its employees: age, salary, emergency contacts, appraisal, etc.
• Different applications of the database have different needs: e.g., the company may need to make demographic data available to a governmental agency
• Only some attributes need be supplied, and others ought to be withheld to protect privacy: different views can be provided into the same data
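The employee example can be sketched with an SQL view, again using Python's built-in SQLite module. The table layout, the view name `demographics`, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employees (
        emp_id INTEGER PRIMARY KEY,
        name TEXT,
        age INTEGER,
        salary REAL,
        emergency_contact TEXT
    )
    """
)
# Hypothetical employees, for illustration only.
conn.executemany(
    "INSERT INTO employees (name, age, salary, emergency_contact) VALUES (?, ?, ?, ?)",
    [("Employee A", 34, 52000.0, "555-0100"), ("Employee B", 51, 61000.0, "555-0101")],
)

# The view exposes only the demographic attribute; names, salaries, and
# emergency contacts stay in the base table and are not visible through it.
conn.execute("CREATE VIEW demographics AS SELECT emp_id, age FROM employees")

cursor = conn.execute("SELECT * FROM demographics ORDER BY emp_id")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

An application that is given access only to `demographics` sees the same underlying data as the payroll system does, but through a narrower window.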
Database Design
Identify the entities that we are dealing with, their various attributes, and their relationships
An entity is some object with a real or conceptual existence in the world -- tofu, Advanced Java Class, Guggenheim Museum, Elaine, company
An attribute is a property of an entity -- address, size, mother, age
A relational column is an attribute
A relationship defines roles in which entities work together -- "Bill WORKS-FOR Motorola", "jbs TEACHES advanced-java"
RDBMSs represent relationships as tables
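One common way an RDBMS stores a relationship such as "Bill WORKS-FOR Motorola" is a third table whose rows pair the keys of the two entities. The sketch below uses SQLite; the table and column names are hypothetical, and other designs (for example, a foreign-key column on the person table) are equally valid when each person works for only one company.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- The relationship itself is a table: each row links one person to one company.
    CREATE TABLE works_for (
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        company_id INTEGER NOT NULL REFERENCES company(company_id),
        PRIMARY KEY (person_id, company_id)
    );

    INSERT INTO person (person_id, name) VALUES (1, 'Bill');
    INSERT INTO company (company_id, name) VALUES (1, 'Motorola');
    INSERT INTO works_for (person_id, company_id) VALUES (1, 1);
    """
)

# Joining the relationship table back to both entity tables recovers the statement.
facts = conn.execute(
    """
    SELECT person.name, company.name
    FROM works_for
    JOIN person ON person.person_id = works_for.person_id
    JOIN company ON company.company_id = works_for.company_id
    """
).fetchall()
```

Because the relationship lives in its own table, a person can work for several companies and a company can employ several people without changing the schema.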
Database Design as ER Diagrams
Rectangles represent entity types, diamonds relationship types, and ovals attributes. Underlined attribute names represent keys
• Rectangles: Object/concept nouns
• Diamonds: Verbs
• Ovals: Characteristics
Functions: Join
Microsoft Access provides a graphical user interface that makes it very easy to define and manipulate databases. E.g., membership records in an organization
Access allows you to define and then store a set of queries and give these queries names that are meaningful to you. Note the Tables and Queries tabs in particular (Reports is useful for generating hardcopy output, such as mailing labels).
Tables in Microsoft Access
Final Projects
o Two-student teams work on projects for DiscoverET.org or develop their own
o Each team will present final results to the class during a public forum and produce a document of the project
o Information Organization and Representation Portfolio (IORP): includes analysis and/or commentary related to class topics
o Class topics: intellectual works and their manifestations, metadata standards in various environments, cataloging and authority control, metadata coding and crosswalks, digital library development, subject access and vocabulary control, concept mapping, indexing and abstracting, classification systems, cognitive category analysis, system design
o Evaluation based on: creativity of project outcomes (recommendations/solutions proposed), relevance and practicality of implementation, thoroughness and examination of details
Final Project General Guidelines
o Purpose is to apply knowledge to real-life situations and to gain hands-on experience.
o I. You must sign up for the project and work in a two-student team.
o II. Each group must schedule a meeting with the instructor to discuss the project no later than the due date indicated in the schedule.
o III. Each group must document the process and activities. Turn in your project documentation including the following parts:
• Introduction: topic description and project goals; members
• Specific tasks that are distributed among members
• The final product plus description and examples (this is the main part of the document)
• Conclusions and experiences (summarize what you have learned and your thoughts; you may add what you would do differently if you were to do it again)
Final Projects: Road Map/TOC/Outline for the Information Organization Portfolio
I. Introduction
• What is your project? Expectations, required elements, etc.
• Issues/concerns specific to your project topic that play a role in developing an IOP
II. Class topics and their relationship to your project
• 3-5 key considerations about each topic that is significant in developing an IOP on the specific project
III. Case-Studies and their Critique based on class topics or more
• List of web resources (DL or web portal) with short description and location
• 3 or more case studies as relevant
• Comparative analysis
IS 520~Mehra
Final Projects: Road Map/TOC/Outline for the Information Organization Portfolio
IV. Design Solutions/Templates
• Design solutions reflecting key aspects
• Web design solutions
• Analysis of designs
V. Recommendations
VI. Future Considerations
VII. Documentation Report
Final Project Examples
1. On the existing DiscoverET.org website, develop an IORP for presenting community-based information for a selected subject category “Health.”
• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.
• Do a case-analysis of existing content and representation scheme(s) on websites of other community networks and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
Final Project Examples
2. On the existing DiscoverET.org website, develop an IORP for presenting community-based information for a selected subject category “Tourism.”
• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.
• Do a case-analysis of existing content and representation scheme(s) on websites of other community networks and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
Final Project Examples
3. For the existing DiscoverET.org website, develop an IORP for presenting community-based information for a new subject category of “Diversity Resources.”
• Do a case-analysis of existing content and representation scheme(s) (related to “Diversity”) on the website and provide alternative design solutions.
• Do a case-analysis and critique of existing content and representation scheme(s) on selected websites/web portals (other community networks) on the subject site and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
Final Project Examples
4. Select one county in Tennessee and develop an IORP for presenting community-based information for the county.
• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings for that county, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
• Provide a test-bed for implementation using one county selected from the adjoining states or from the following website: http://www.discoveret.org/index.php?p=DirCountySearch
Final Project Examples
5. Based on a study of the use of wikis in existing and emerging community-based web portals, develop an IORP for presenting community-based interactive communication and information-sharing tools via development of wikis on the DiscoverET.org website.
• Do a case-analysis and critique of existing content and representation scheme(s) on selected websites/web portals (other community networks) that have wikis and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities. Evaluate the forms of interaction taking place via the different wikis in the different settings.
• Present the pros and cons based upon your analysis while you make recommendations for the DiscoverET.org website. Present summary reports on the use of wikis as community-based interactive communication and information-sharing tools that include design options and an implementation plan for application.
Final Project Examples
6. Based on a study of the use of interactive databases for organizing, representing, and managing community-based information in representative case examples, provide a scheme for a community client (Fish) at DiscoverET.org that wants to develop a system to keep track of its activities/events and organize its work and human resources (time schedules, working responsibilities, etc.).
• Based on case-analysis and critique of existing content and representation scheme(s) in databases on selected websites/web portals (other community networks), identify what kinds of databases the client can use and discuss the pros and cons of each, cost-benefit ratios, etc.
• Your IORP should include a comprehensive collection of database examples, identification of entities and attributes for your designed database, classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
For the DiscoverET.org website
1. Present community-based information for a selected subject category “Health”
2. Present community-based information for a selected subject category “Tourism”: Pam, Suzanne
3. Present community-based information for a new subject category “Diversity Resources”: Hannah, Deborah
4. Select one county in Tennessee and develop an IORP for presenting community-based information for the county: Sara, Christa
5. Study of the use of wikis in existing and emerging community-based web portals: Margaret, Emily
6. Study of the use of interactive databases for organizing, representing, and managing community-based information in representative case examples: Bridger, Roger
Critical Reflection 7
In pairs identify a subject domain and select at least five items to form a template design for a digital library. Brainstorm various topics/aspects covered in class that will be pertinent for creating an effective information organization and representation scheme for your digital library. Design a database for your collection and identify key entities, attributes, and relationships. Present an ER Diagram to reflect some aspects of your database design.
Critical Reflection
Goals for the metadata and users: Are you clear about what you want to achieve with this metadata? Are you clear about your users’ use of the resources?
Granularity: What level of granularity is most appropriate to the items and user needs?
Sources of info: Is it clear or even stated where you get your information? For example, if title is a field, is the cataloger told where to find that info? With a videotape, for example, do you look on the label? The box?
Complexity of record creation: Are special skills required to formulate the records? Are the records designed to be created by the info ‘publisher’ or centrally by service providers?
Content: The content of different metadata record formats can be compared from aspects of structure and syntax, but perhaps most important is an evaluation of the usefulness and purpose of the info within them. How useful are the records you have created?
Works well or not: What fields or characteristics work well (or do not work well) in describing your objects?
Tweaking: How could/should the metadata be “tweaked” to accommodate your needs?