Distributed Access to Data Resources: Metadata Experiences from the NESSTAR Project
Simon Musgrave
Data Archive, University of Essex
Nesstar - Networked Social Science Tools and Resources
Reference Point
• Data Archive is a supplier of data for ‘secondary analysis’ to the research community
• Main focus is on metadata for dissemination
• Key research project (FASTER, 2000-2001) is on examining and using metadata throughout the data process, from concept through collection to analysis and interpretation
Setting the Scene
The project aimed to increase massively the use of data by developing a set of generic tools that make it easier to:
• locate multiple data sources across organisational and national boundaries
• browse detailed information about these data, especially the descriptive and contextual information
• tabulate and visualise these data quickly and easily for both naïve and experienced users
• disseminate these data and documentation, in whole or part, in forms suitable for immediate use
User demand or technological determinism?
• Technology
  – improves productivity of existing activities (document processing, data access)
  – creates new opportunities (new analysis techniques, interactive research, interoperability)
• New opportunities have to be assessed and evaluated against the dreams and expectations of users
• How can technological opportunities be harnessed so that data analysts can derive knowledge as easily and efficiently as possible?
Types of End User
• Librarians
• Researchers
• Students
• Policy Makers
• Journalists
Types of Resource
• Data
  – Micro
  – Aggregate
  – Geographical
  – Qualitative
• Journals
• Models
• Analysis
• People
As much as possible:
• Identifiable
• Understandable
• Usable
• Interactive
Types of data
[Chart: Paper, P-EDI and S-EDI on a 0% to 100% scale for 1987, 1997 and 2007]
S-EDI is secondary EDI, in other words re-use of data collected for other purposes - source Statistics Netherlands
Role of Metadata
Metadata is data about data. Is all data metadata?
Statistical metadata
All the information needed for and relevant to collecting, processing, disseminating, accessing, understanding, and using statistical data (Statistics Netherlands)
Importance of standards
• Closed
• Proprietary
• De jure
• De facto
• Open
Acronym attack
Whose metadata standard do you use?
• W3C
  – RDF - Resource Description Framework
  – XML Schema
• OMG
  – XMI - XML Metadata Interchange; MOF Repository format
• XML/EDI Group
  – XML Repository Standard
• Meta Data Coalition
  – Open Information Model
Metadata Developments
• Metadata is increasingly about machine to machine communication
• Metadata should be embedded with the data wherever possible to facilitate process control
• Structure, semantics and syntax become increasingly important to facilitate interoperability
Goal is to create the semantic Web - a web of data that can be processed directly or indirectly by machines - leaving people to be more intuitive and creative
Upstream metadata
– Statistical Concepts
– Data processes
– Storage structures
– Classification databases
– Survey questionnaires
Downstream metadata
– Availability
– Data structure
– Multi-lingual thesaurus
– Geographical referencing
– Analysis
– Articles
– Feedback
Types of metadata (1)
• Catalogue (DDI level 1 and 2)
  – Dublin Core
  – BIRON
  – Most on-line catalogues
  – Z39.50
  – Thesaurus
• Starting point for resource discovery
  – Find it - but then what?
Dublin Core
Content       Intellectual Property   Instantiation
Title         Creator                 Date
Subject       Publisher               Format
Description   Contributor             Identifier
Type          Rights                  Language
Source
Relation
Coverage
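As a rough illustration of how a catalogue record might be organised along the three groupings on this slide, the sketch below files a flat record under Content, Intellectual Property and Instantiation. The function names and the sample record are invented for illustration; only the element names and groupings come from the slide.

```python
# Groupings and element names as listed on the slide.
DUBLIN_CORE_GROUPS = {
    "Content": ["Title", "Subject", "Description", "Type",
                "Source", "Relation", "Coverage"],
    "Intellectual Property": ["Creator", "Publisher", "Contributor", "Rights"],
    "Instantiation": ["Date", "Format", "Identifier", "Language"],
}


def group_record(record):
    """Arrange a flat record under the three groupings (hypothetical helper)."""
    grouped = {}
    for group, elements in DUBLIN_CORE_GROUPS.items():
        present = {name: record[name] for name in elements if name in record}
        if present:
            grouped[group] = present
    return grouped


def unknown_elements(record):
    """Return keys that are not among the elements listed on the slide."""
    known = {name for names in DUBLIN_CORE_GROUPS.values() for name in names}
    return sorted(set(record) - known)


# Invented sample record, not taken from any real catalogue.
sample_record = {
    "Title": "Example Household Survey",
    "Creator": "Example Research Institute",
    "Date": "1999",
    "Language": "en",
}
grouped = group_record(sample_record)
print(grouped)
```

A record like this helps a user find a dataset, which matches the "find it, but then what?" point: nothing in it describes individual variables.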
Types of metadata (2)
• Content (DDI level 4)
  – Data dictionary allowing detailed searching and browsing
    • Question text
    • Variable and value labels
  – Entry point to the actual data
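To make the idea of a searchable data dictionary concrete, here is a minimal sketch that searches question text, variable labels and value labels for a term. It does not use the DDI format itself; the Variable class, the search_variables function and the sample variables are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One data dictionary entry (hypothetical structure, not the DDI schema)."""
    name: str
    label: str
    question_text: str
    value_labels: dict = field(default_factory=dict)


def search_variables(variables, term):
    """Return names of variables whose label, question or value labels mention term."""
    term = term.lower()
    hits = []
    for var in variables:
        texts = [var.label, var.question_text, *var.value_labels.values()]
        if any(term in text.lower() for text in texts):
            hits.append(var.name)
    return hits


# Invented sample entries.
dictionary = [
    Variable("v1", "Employment status", "Are you currently in paid work?",
             {1: "Yes", 2: "No"}),
    Variable("v2", "Age group", "Which age group do you belong to?",
             {1: "Under 30", 2: "30 and over"}),
]

print(search_variables(dictionary, "paid work"))
```

The variable names returned by such a search are what would act as the entry point to the actual data, for example as the rows and columns of a table.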
Types of metadata (3)
• Contextual
  – the sky’s the limit
  – background - user guides, questionnaires
  – multi-media descriptions
  – teaching and learning
  – concepts
Types of metadata (4)
• Quality, e.g.
  – Methodology
  – Response rates
  – Responsible agent
  – Processing procedures
Types of metadata (5)
• People - repositories of information
  experts in
  – the subject matter
  – the analysis techniques
  – the data source
  – data and computer management
Recursive Metadata
The statistical production process
(Secondary) use of statistical data
The Lifetime of Data
Metadata systems:
• bridge the gap between the production process and the end-users
• facilitate two-way communication between producers and users
Metadata systems should not only pass on all relevant information to the end-users, but also allow the end-users to contribute to the metadata conversation.
Types of metadata (6)
• Bookmarks/hyperlinks
  – searches
  – datasets
  – analysis (tables, models etc.)
  – download
• Run manually or by active agent
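One way to picture a bookmark that can be run manually or by an agent is a URL whose query string records the dataset and the requested table. The sketch below is only an assumption about how that could look: the host name, parameter names and helper functions are hypothetical and do not describe the actual NESSTAR bookmark format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def make_bookmark(base_url, dataset, row_var, col_var):
    """Encode a tabulation request as a shareable URL (hypothetical scheme)."""
    query = urlencode({"dataset": dataset, "rows": row_var, "cols": col_var})
    return f"{base_url}?{query}"


def read_bookmark(url):
    """Recover the request parameters, as an agent re-running the bookmark might."""
    params = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in params.items()}


# Placeholder host and identifiers, invented for this example.
bookmark = make_bookmark("http://data.example.org/tabulate",
                         "survey-1999", "v1", "v2")
print(bookmark)
print(read_bookmark(bookmark))
```

Because the whole request lives in the link, the same bookmark can be followed by a person in a browser or replayed by a program on a schedule.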
The Electronic Journal
• HTML and XML provide the facility to bring data and text together
• Readers have the opportunity to participate in the research process directly - information flow is 2-way
• Using, creating and sharing bookmarks
Data
On-line documents
FASTER Objectives
• To create a flexible and intelligent presentation system to access statistical and other data in a distributed 'virtual' environment.
• Based on a Web/JAVA environment, and built around the careful specification of metadata content, it will allow the user to create their own personal data workbench.
• Implement full access control to the underlying data, taking care of both data confidentiality issues (including disclosure control) and the commercial opportunities for the data
FASTER - Flexible Access to Statistics, Tables and Electronic Resources
Partners
Role  Participant name                               Country
CO    The Data Archive, University of Essex          United Kingdom
CR    Norwegian Social Science Data Services         Norway
CR    Dansk Data Arkiv                               Denmark
CR    Centraal Bureau voor de Statistiek             Netherlands
CR    Universita di Milano                           Italy
AC    Central Statistics Office                      Ireland
AC    Statistisk Sentralbyrå                         Norway
AC    Centre National de la Recherche Scientifique   France