Distributed Access to Data Resources: Metadata Experiences from the NESSTAR Project

Simon Musgrave

Data Archive, University of Essex

Nesstar - Networked Social Science Tools and Resources

Reference Point

• Data Archive is supplier of data for ‘secondary analysis’ to research community

• Main focus is on metadata for dissemination

• Key research project (FASTER, 2000-2001) is on examining and using metadata throughout the data process from concept, through collection to analysis and interpretation

Setting the Scene

The project aimed to increase massively the use of data by developing a set of generic tools that make it easier to:

• locate multiple data sources across organisational and national boundaries

• browse detailed information about these data, especially the descriptive and contextual information

• tabulate and visualise these data quickly and easily for both naïve and experienced users

• disseminate these data and documentation, in whole or part, in forms suitable for immediate use

User demand or technological determinism?

• Technology
– improves productivity of existing activities (document processing, data access)
– creates new opportunities (new analysis techniques, interactive research, interoperability)

• New opportunities have to be assessed and evaluated against the dreams and expectations of users

• How can technological opportunities be harnessed to make it as easy as possible for data analysts to derive knowledge efficiently?

Types of End User

• Librarians
• Researchers
• Students
• Policy Makers
• Journalists

Types of Resource

• Data
– Micro
– Aggregate
– Geographical
– Qualitative
• Journals
• Models
• Analysis
• People

As much as possible:
• Identifiable
• Understandable
• Usable
• Interactive

Types of data

[Figure: chart with a percentage axis (0% to 100%), years 1987, 1997 and 2007, and series Paper, P-EDI and S-EDI]

S-EDI is secondary EDI, in other words re-use of data collected for other purposes. Source: Statistics Netherlands

Role of Metadata

Metadata is data about data; is all data metadata?

Statistical metadata

All the information needed for and relevant to collecting, processing, disseminating, accessing, understanding, and using statistical data (Statistics Netherlands)

Importance of standards

• Closed

• Proprietary

• De jure

• De facto

• Open

Acronym attack

Whose metadata standard do you use?

• W3C
– RDF - Resource Description Framework
– XML Schema
• OMG
– XMI - XML Metadata Interchange; MOF Repository format
• XML/EDI Group
– XML Repository Standard
• Meta Data Coalition
– Open Information Model

Metadata Developments

• Metadata is increasingly about machine to machine communication

• Metadata should be embedded with the data wherever possible to facilitate process control

• Structure, semantics and syntax become increasingly important to facilitate interoperability

Goal is to create the semantic Web - a web of data that can be processed directly or indirectly by machines - leaving people to be more intuitive and creative
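
As a minimal sketch of what embedding metadata with the data can make possible, the Python below bundles a few rows with a description of their variables and lets a program check the rows against that description. The package layout, the variable `sex` and its codes are assumptions made for illustration; they are not taken from NESSTAR or from any metadata standard.

```python
import json

# Hypothetical self-describing package: the metadata travels with the data.
PACKAGE_JSON = """
{
  "metadata": {
    "title": "Example survey extract",
    "variables": {
      "sex": {"label": "Sex of respondent", "codes": {"1": "Male", "2": "Female"}}
    }
  },
  "data": [{"sex": 1}, {"sex": 3}]
}
"""


def validate(package):
    """Return a list of problems found by checking rows against the embedded metadata."""
    problems = []
    variables = package["metadata"]["variables"]
    for i, row in enumerate(package["data"]):
        for name, spec in variables.items():
            if name not in row:
                problems.append(f"row {i}: missing {name}")
            elif str(row[name]) not in spec["codes"]:
                # JSON object keys are strings, so compare codes as strings.
                problems.append(f"row {i}: {name}={row[name]} is not a defined code")
    return problems


package = json.loads(PACKAGE_JSON)
problems = validate(package)
```

Because the code list sits next to the rows, the check needs no out-of-band documentation, which is the sense in which embedded metadata supports process control.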

Upstream metadata

– Statistical Concepts
– Data processes
– Storage structures
– Classification databases
– Survey questionnaires

Downstream metadata

– Availability
– Data structure
– Multi-lingual thesaurus
– Geographical referencing
– Analysis
– Articles
– Feedback

Types of metadata (1)

• Catalogue (DDI level 1 and 2)
– Dublin Core
– BIRON
– Most on-line catalogue
– Z39.50
– Thesaurus
• Starting point for resource discovery
– Find it - but then what?

Dublin Core

Content       Intellectual Property   Instantiation
Title         Creator                 Date
Subject       Publisher               Format
Description   Contributor             Identifier
Type          Rights                  Language
Source
Relation
Coverage

More
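
To make the fifteen elements concrete, here is a minimal Python sketch that serialises a catalogue entry as a flat Dublin Core record. The grouping follows the slide and the namespace URI is the standard Dublin Core 1.1 one; the `record` wrapper element and the example entry are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements, grouped as on the slide.
DC_GROUPS = {
    "Content": ["title", "subject", "description", "type",
                "source", "relation", "coverage"],
    "Intellectual Property": ["creator", "publisher", "contributor", "rights"],
    "Instantiation": ["date", "format", "identifier", "language"],
}
DC_ELEMENTS = {name for names in DC_GROUPS.values() for name in names}


def dublin_core_xml(record):
    """Serialise a {element: value} dict as a flat Dublin Core XML record."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in record.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical catalogue entry, for illustration only.
example = {
    "title": "Example Household Survey",
    "creator": "Example Statistical Office",
    "date": "1999",
    "language": "en",
}
xml_text = dublin_core_xml(example)
```

A record like this is enough to find a resource, but it says nothing about the variables inside it, which is the "find it - but then what?" gap that the content-level metadata is meant to close.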

Types of metadata (2)

• Content (DDI level 4)
– Data dictionary allowing detailed searching and browsing
  • Question text
  • Variable and value labels
– Entry point to the actual data
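
The kind of searching a data dictionary allows can be sketched in a few lines of Python. The `Variable` class, the `search_variables` function and the two sample variables are hypothetical; this is a simplified stand-in for content-level metadata, not the DDI structure itself.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One data dictionary entry: name, label, question text and value labels."""
    name: str
    label: str
    question: str
    value_labels: dict = field(default_factory=dict)


def search_variables(variables, term):
    """Return names of variables whose label, question or value labels contain term."""
    term = term.lower()
    hits = []
    for var in variables:
        texts = [var.label, var.question, *var.value_labels.values()]
        if any(term in text.lower() for text in texts):
            hits.append(var.name)
    return hits


# Hypothetical dictionary entries, for illustration only.
DICTIONARY = [
    Variable("v1", "Employment status", "Are you currently in paid work?",
             {1: "Yes", 2: "No"}),
    Variable("v2", "Housing tenure", "Do you own or rent your home?",
             {1: "Own", 2: "Rent"}),
]

matches = search_variables(DICTIONARY, "paid work")
```

Each hit is a variable name, so the search result is also the entry point to the data: the same name can be passed on to a tabulation or download request.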

Types of metadata (3)

• Contextual
– the sky’s the limit
– background - user guides, questionnaires
– multi-media descriptions
– teaching and learning
– concepts

Types of metadata (4)

• Quality, e.g.
– Methodology
– Response rates
– Responsible agent
– Processing procedures

Types of metadata (5)

• People - repositories of information, experts in
– the subject matter
– the analysis techniques
– the data source
– data and computer management

Recursive Metadata

The statistical production process

(Secondary) use of statistical data

The Lifetime of Data

Metadata systems:
• bridge the gap between the production process and the end-users
• facilitate two-way communication between producers and users

Metadata systems should not only pass on all relevant information to the end-users, but also allow the end-users to contribute to the metadata conversation.

Types of metadata (6)

• Bookmarks/hyperlinks
– searches
– datasets
– analysis (tables, models etc.)
– download

• Run manually or by active agent
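
One way to picture a bookmark that can be run manually or by an agent is a URL that records the kind of action and its parameters. The Python sketch below assumes a hypothetical server address and parameter names (`kind`, `dataset`, `rows`, `cols`); it does not describe the actual NESSTAR bookmark format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The four bookmark targets listed on the slide.
KINDS = {"search", "dataset", "analysis", "download"}


def make_bookmark(base_url, kind, **params):
    """Encode an action and its parameters as a shareable URL."""
    if kind not in KINDS:
        raise ValueError(f"unknown bookmark kind: {kind}")
    query = urlencode({"kind": kind, **params})
    return f"{base_url}?{query}"


def read_bookmark(url):
    """Decode a bookmark URL back into (kind, parameters)."""
    query = parse_qs(urlsplit(url).query)
    flat = {key: values[0] for key, values in query.items()}
    kind = flat.pop("kind")
    return kind, flat


# Hypothetical server and table specification, for illustration only.
bookmark = make_bookmark("http://example.org/server", "analysis",
                         dataset="survey99", rows="v1", cols="v2")
kind, params = read_bookmark(bookmark)
```

Because the bookmark is plain text, it can be pasted into a document or e-mail by a person, or stored and replayed on a schedule by a program.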

The Electronic Journal

• HTML and XML provide the facility to bring data and text together

• Readers have the opportunity to participate in the research process directly - information flow is two-way

• Using, creating and sharing bookmarks

Data

On-line documents

FASTER Objectives

• To create a flexible and intelligent presentation system to access statistical and other data in a distributed 'virtual' environment.

• Based on a Web/JAVA environment, and built around the careful specification of metadata content, it will allow the user to create their own personal data workbench.

• Implement full access control to the underlying data, taking care of both data confidentiality issues (including disclosure control) and the commercial opportunities for the data

FASTER - Flexible Access to Statistics, Tables and Electronic Resources

Partners

Role   Participant name                               Country
CO     The Data Archive, University of Essex          United Kingdom
CR     Norwegian Social Science Data Services         Norway
CR     Dansk Data Arkiv                               Denmark
CR     Centraal Bureau voor de Statistiek             Netherlands
CR     Universita di Milano                           Italy
AC     Central Statistics Office                      Ireland
AC     Statistisk Sentralbyrå                         Norway
AC     Centre National de la Recherche Scientifique   France
