CESSDA Expert Seminar 2009 Atle Alvheim Norwegian Social Science Data Archive.


Page 1

CESSDA Expert Seminar 2009

Atle Alvheim, Norwegian Social Science Data Archive

Page 3

A common future?

The last 15 years have focused on building a common data infrastructure for the social sciences, based on modern web technology.

Page 4

1. The web: The idea that the archives could create an integrated catalog, Grenoble 1994

2. DDI: A richer and better data documentation format, R.Rockwell / ICPSR IASSIST 1995

3. Integrate 3-4 components:

Internet / web / Common catalog

DDI

Access, explore, analyse, download data: The social science dream machine, NESSTAR (J. Ryssevik / S. Musgrave)

ILSES Integrated Library and Survey Data Extraction Service

4. Richer services: FASTER (data types), LIMBER (attack the language barrier)

5. One single common entry point: Madiera, Metadater

Page 5

CESSDA METADATA HARVESTER

[Diagram labels: Server 1, Server 2, Server 3, Server 4, Server 5; Nesstar; OAI-PMH; Search, Browse; Lucene, ELSST, Topical List]
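
The harvesting step in this diagram can be illustrated with a small sketch. It assumes that each server answers standard OAI-PMH `ListRecords` requests; the base URL, record identifiers and sample response are hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In the protocol, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix is not sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response from one archive server (not real data).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:study-1</identifier></header></record>
    <record><header><identifier>oai:example:study-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
# The harvester would request the next batch with the returned token.
next_url = list_records_url("http://server1.example.org/oai", resumption_token=token)
```

A real harvester would repeat the request until no resumption token is returned, and then hand the collected records to the indexing tool.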

Page 6

[Diagram labels: Square files; Resources: CESSDA Template, Controlled vocabularies, Multilingual thesaurus, CESSDA classification; Harvester, Indexing tool, Portal; Server 1, 2, 3, 4; Publishing; Client; Browsing tool, Search tool]

Page 7

THE RESEARCHER

Looking for data... Cultivate knowledge...

THE ARCHIVES

THE PORTAL

Bridging the gap

Page 8

1. Greenland 2. Iceland 3. Faroe Islands 4. Norway 5. Sweden 6. Finland 7. Aaland Islands 8. Estonia 9. Latvia 10. Lithuania 11. Belarus 12. Ukraine 13. Moldova 14. Poland 15. Germany 16. Denmark 17. England 18. Scotland 19. Wales 20. Northern Ireland 21. Ireland

22. Netherlands 23. Belgium 24. Luxembourg 25. France 26. Portugal 27. Spain 28. Andorra 29. Monaco 30. Switzerland 31. Italy 32. San Marino 33. Vatican State 34. Slovenia 35. Liechtenstein 36. Austria 37. Czech Republic 38. Slovakia 39. Hungary 40. Romania 41. Bulgaria 42. Serbia

43. Croatia 44. Bosnia & Herzegovina 45. Montenegro 46. Kosovo 47. Albania 48. Macedonia 49. Greece 50. Cyprus South 51. Cyprus North 52. Malta 53. Turkey 54. Russia ? 55. Georgia ?? 56. Armenia ??? 57. Israel ????

30 Languages, 45 legal systems

We are supposed to support research and break down technical, linguistic, judicial and economic barriers

Several processes – timelines in a layered system

Page 9

Share formats and routines

Access and download

Instrument development

Control access

Page 11

1. Make a more powerful interface to data holdings
- more sophisticated search / browse possibilities, more focused, even across languages
- better possibilities to handle results

2. Handle more complex data structures: over time, across space, across languages, linking micro and macro

These we may see as "analytic dimensions"

3. Persistent identity: connect knowledge products back to the data used, turn the traditional picture upside down

These are more "practical management"

4. Handle problems of double storage. Data dynamics: more than one value in a table cell. Versioning, updating, comments, links, references. Adding to the data item

5. Single Sign On: need to pass information and access more than one server, logging

Page 12

Conceptualisation, Instrument, Data production (SIP), Data documentation (AIP)

Question DB

The researcher formulates a problem and needs data to analyse it

If data have to be collected, we need an instrument, a questionnaire

When data are collected, with necessary metadata, they represent a SIP

A questions and concepts DB is a very useful tool for developing instruments

To make data ready for archiving they have to be documented (and processed), lifted from a SIP to an AIP

Data documentation: Should be based on standardised procedures / best practices and common tools for all CESSDA (+) archives

DDI 2/3 expressed as a Template/DDI-profile, which is
a) a selection of elements, with status
b) element repositories
c) controlled vocabularies
d) multi-lingual thesaurus
e) gazetteer, geographic classification
f) CESSDA study classification

This requires software or a manual / clear guidelines. DDI becomes the glue that holds this whole system together.
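
A profile of this kind can be read as a machine-checkable rule set. The following is a minimal sketch, assuming a profile is a list of selected elements, each with a status (mandatory or optional) and an optional controlled vocabulary. The element names and vocabulary terms are hypothetical and are not taken from the CESSDA template.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ProfileElement:
    name: str
    mandatory: bool
    vocabulary: Optional[FrozenSet[str]] = None  # None means free text


# Hypothetical profile; element names are illustrative only.
PROFILE = [
    ProfileElement("title", mandatory=True),
    ProfileElement("abstract", mandatory=False),
    ProfileElement(
        "analysis_unit",
        mandatory=True,
        vocabulary=frozenset({"Individual", "Household", "Organisation"}),
    ),
]


def validate(record: Dict[str, str], profile: List[ProfileElement]) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element in profile:
        value = record.get(element.name)
        if value is None:
            if element.mandatory:
                problems.append(f"missing mandatory element: {element.name}")
            continue
        if element.vocabulary is not None and value not in element.vocabulary:
            problems.append(
                f"{element.name}: '{value}' is not in the controlled vocabulary"
            )
    # Elements outside the profile's selection are reported as well.
    allowed = {element.name for element in profile}
    for name in record:
        if name not in allowed:
            problems.append(f"element not selected in profile: {name}")
    return problems


good = validate({"title": "Survey 2009", "analysis_unit": "Individual"}, PROFILE)
bad = validate({"analysis_unit": "Person", "extra": "x"}, PROFILE)
```

The same check could run in a documentation tool at every archive, which is one way a shared profile can act as the glue between systems.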

A questions DB is potentially problematic for data documentation processes. Better to import directly via the questionnaire

Much data is generated by the public statistical system or other producers

Will make it possible to find questions from concepts (need an interface)

Learn from others
Encourage comparative research
Look up translations

Contact with user community

Metadata – standard

Tool for instrument development

Tool for data collection

Tool for documentation

Question DB, translations

An overarching plan

Integration of components

Data have a life-cycle

The archive: A Greenhouse or a Graveyard?

Page 13

[Diagram labels: Ingest; AIP; Data repositories: UKDA, DDA, FSD; Metadata; Data]

Question DB

When an AIP is inserted into an archive or storage, it can trigger an update of a question database.

Or do updates happen as a harvesting process?

To what degree are packages pre-defined or built for purposes?

A question database will be related to a basic storage. Do updates happen as a guarded / explicit process? What are the criteria?
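
The two update routes raised here, an update triggered by ingest or an update done by harvesting, can be contrasted in a small sketch. All class and function names are hypothetical; the sketch assumes an AIP carries a study identifier and a list of question texts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AIP:
    study_id: str
    questions: List[str]


@dataclass
class QuestionDB:
    # question text -> ids of the studies that use it
    index: Dict[str, List[str]] = field(default_factory=dict)

    def update_from(self, aip: AIP) -> None:
        for text in aip.questions:
            studies = self.index.setdefault(text, [])
            if aip.study_id not in studies:
                studies.append(aip.study_id)


@dataclass
class Archive:
    holdings: List[AIP] = field(default_factory=list)
    on_ingest: List[Callable[[AIP], None]] = field(default_factory=list)

    def ingest(self, aip: AIP) -> None:
        self.holdings.append(aip)
        for callback in self.on_ingest:  # route 1: ingest triggers the update
            callback(aip)


def harvest(archive: Archive, qdb: QuestionDB) -> None:
    """Route 2: the question DB pulls from the archive on its own schedule."""
    for aip in archive.holdings:
        qdb.update_from(aip)


archive = Archive()
pushed = QuestionDB()
archive.on_ingest.append(pushed.update_from)
archive.ingest(AIP("study-1", ["How old are you?"]))
archive.ingest(AIP("study-2", ["How old are you?", "Where do you live?"]))

pulled = QuestionDB()
harvest(archive, pulled)
```

Both routes end with the same index in this sketch. They differ in who controls the timing, which is where a guarded or explicit process with acceptance criteria would fit.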

Our AIPs

Page 14

Combinations of language, metadata standard, storage and archive:

FSD: Finnish and English, DDI 3.1, Other, Nesstar
DDA: Danish and English, DDI 2.x, Nesstar, Fedora
UKDA: English, DDI 3.0, DDI 2.0, Other, Fedora

Because of storage complexity, harvesting also becomes quite complex

Page 15

[Diagram labels: Data repositories: UKDA, DDA, FSD; Metadata; Data]

Data repositories are guarded by access policies. Policies are usually formulated at institution or repository level

Policies are activated by the crossing of the line between metadata and data, which is at data package level

Should policies be linked to packages instead of repositories? Should it be an obligatory part of metadata? Then we need to have policies formalised.
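
One way to picture a formalised policy is as a small record that travels with the metadata. The following is a sketch under that assumption, in which a package-level policy, when present, overrides the repository-level one. All names and policy fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessPolicy:
    requires_login: bool
    allowed_purposes: frozenset  # e.g. {"research", "teaching"}


@dataclass(frozen=True)
class Repository:
    name: str
    policy: AccessPolicy


@dataclass(frozen=True)
class DataPackage:
    identifier: str
    repository: Repository
    policy: Optional[AccessPolicy] = None  # package-level policy, if formalised


def effective_policy(package: DataPackage) -> AccessPolicy:
    """A package-level policy wins; otherwise fall back to the repository."""
    if package.policy is not None:
        return package.policy
    return package.repository.policy


def may_download(package: DataPackage, logged_in: bool, purpose: str) -> bool:
    """Metadata stay open; the policy applies when crossing over to the data."""
    policy = effective_policy(package)
    if policy.requires_login and not logged_in:
        return False
    return purpose in policy.allowed_purposes


open_policy = AccessPolicy(False, frozenset({"research", "teaching"}))
strict_policy = AccessPolicy(True, frozenset({"research"}))
repo = Repository("Repository A", open_policy)
survey = DataPackage("pkg-1", repo)                         # inherits repository policy
register = DataPackage("pkg-2", repo, policy=strict_policy)  # own policy
```

With policies attached at package level, a portal could evaluate them without knowing the internal rules of each repository.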

SSO / AAA

LOG-DB

Data repositories should be documented in national + common language

Different documentation templates for national and international language

Page 16

CV: LifeCycleEvent

Study Proposal, Study Design, Instrument Design, Funding, Interviewer training, Ethics Review, Sampling, Instrument pre-testing, Pilot study, Questionnaire translation, Documentation translation

DATA COLLECTION, Data collection reports, Post-collection processing, Data production, Initial data quality checks, Metadata production, Original release

DEPOSIT, Post-production processing, Data quality checks, Data editing, Data integration, Processing for Disclosure, Metadata editing, Preservation package production, Dissemination package, New version production, New version release / publication

From producer to consumer, the data archival work

Cover the whole data (or project?) life-cycle

Locate, explore and download

Page 17

[Diagram: NSD's Nesstar servers. Labels: NSDData, Metadata, Civic-active, ESS, Political system, InnovationNorway, Churchdata, Schoolnesstar, Welfaredata, Eurosphere, Opinion polls. Content types shown: Micro, Cube, Mix, Metadata, Qualitative]

The CESSDA data archives will in due time be data providers, aggregators and single service providers at the same time. This illustrates the present situation at NSD.

CESSDA complications: We need services that cover many servers and many conditions for use

Page 18

[Diagram repeated from Page 17: NSD's Nesstar servers]

Functionalities we need, with a scale from producer to consumer

NSDData: Multi-linguality, translation support, DDI-profile, ELSST

CivicActive: Switch absolute/relative figures, convert cubes to rectangular files

ESS: Complex files, link micro- and macro-levels

Political system: Auto-publishing from databases to service system

Eurosphere: Text, qualitative data, link servers

Complex servers: Selective login
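
Two of the functionalities listed above, switching between absolute and relative figures and converting a cube to a rectangular file, are simple enough to sketch. The cube layout (region by year) and the figures are hypothetical.

```python
from typing import Dict, List, Tuple

# A hypothetical two-dimensional cube: (region, year) -> count.
Cube = Dict[Tuple[str, int], float]


def cube_to_rows(cube: Cube) -> List[dict]:
    """Flatten a cube into a rectangular file: one row per cell."""
    return [
        {"region": region, "year": year, "value": value}
        for (region, year), value in sorted(cube.items())
    ]


def to_relative(cube: Cube) -> Cube:
    """Convert absolute figures to percentages of each year's total."""
    totals: Dict[int, float] = {}
    for (_, year), value in cube.items():
        totals[year] = totals.get(year, 0.0) + value
    return {
        key: (100.0 * value / totals[key[1]] if totals[key[1]] else 0.0)
        for key, value in cube.items()
    }


cube = {
    ("North", 2008): 30.0,
    ("South", 2008): 70.0,
    ("North", 2009): 50.0,
    ("South", 2009): 50.0,
}
rows = cube_to_rows(cube)
relative = to_relative(cube)
```

The row layout chosen here is one of several possible rectangular forms; a wide layout with one column per year would serve the same purpose.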

Page 19

The user authentication problem: almost always at institutional level

Page 20

[Diagram labels: Portal; Server 1 (DDA), Server 2, Server 3 (ZA), Server 4, Server 5 (UKDA), Server n; Dataset 1, Dataset 2, Dataset 3]

Users, affiliated with national institutions, acting on a common justification (research) and working within specific projects (have roles within projects?), want to access data resources in different institutions and countries

User

The user authorisation problem: very often at resource level
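
The split between the two problems can be sketched as two separate steps: the home institution answers "who is this?", and each resource answers "may this identity use me?". The institutions, users and access lists below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Identity:
    user: str
    institution: str


# Hypothetical data: the users each home institution vouches for.
INSTITUTION_MEMBERS: Dict[str, Set[str]] = {
    "University A": {"alice"},
    "University B": {"bob"},
}

# Hypothetical data: the institutions each dataset is open to.
DATASET_ACCESS: Dict[str, Set[str]] = {
    "Dataset 1": {"University A", "University B"},
    "Dataset 2": {"University A"},
}


def authenticate(user: str, institution: str) -> Optional[Identity]:
    """Authentication: the home institution confirms who the user is."""
    if user in INSTITUTION_MEMBERS.get(institution, set()):
        return Identity(user, institution)
    return None


def authorise(identity: Optional[Identity], dataset: str) -> bool:
    """Authorisation: each resource decides whether this identity may access it."""
    if identity is None:
        return False
    return identity.institution in DATASET_ACCESS.get(dataset, set())


alice = authenticate("alice", "University A")
bob = authenticate("bob", "University B")
stranger = authenticate("mallory", "University A")
```

A single sign-on service would carry the result of the first step across servers, so that each server only has to perform the second.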

Page 21

Complex?

Page 22

Complex? Complex?

Page 23

DDI 2/3 expressed as Template/DDI-profile, as a) selection of elements, with status b) Controlled vocabularies c) Multilingual thesaurus d) Gazetteer e) CESSDA classification

[Diagram: CESSDA Toolkit, with components numbered 1 to 12. Labels: Conceptualisation; Instrument; Data production (SIP); Data documentation; Question DB; Harmonisation (and concepts) DB; Tool; Intermediate storage; Ingest (AIP); Data repositories: UKDA, DDA, FSD; Metadata; Data; Politics (Repository or package level); SSO/AAA; Log database; Registry; Portal: Search, Browse; ELSST x time, space, methodology; ELSST Query service; Data loader: may handle multiple and complex data packages, explore and compare functionality; Download; Web browser]

Page 24

[Diagram: CESSDA web services (WS) architecture. Labels recovered from the figure follow.]

Banks: ConceptBank, ClassificationBank, GeoBank, QuestionBank, QuestionnaireBank, InstructionBank, UniverseBank, VariableBank, StudyBank, …Banks

Web services: CESSDA WS, C3DBWS, QDBWS, FutureWS, IngestWS

Applications: 3CDBApplications, QDBApplications, 3CDB/QDBApplications, FutureApplications

Other labels: 3CDB, QDB, FutureServices, MetadataIngester, NesstarPublisher, PublicationTool, ReportingTools, AdminTools, SecurityTools, Custom Exporter, DDI 3.0 Converter, LegacyDatabase, non-DDI Objects, DDI 1/2.x, DDI 3.0+, local objects

Annotations:

Could interact with WS for metadata preparation

Ingester performs quality assurance, splits metadata and maintains referential integrity for storage in CESSDA Bank

DDI centric back-end

CESSDA-DB stores all low level objects

Back-end maintenance and reporting tools

Web services exposed for public consumption

Internal web services stack

3CDB/QDB applications call relevant WS

Page 25

Ingestion/Registration Process

[Diagram labels: ConceptBank, ClassificationBank, GeoBank, QuestionBank, QuestionnaireBank, InstructionBank, UniverseBank, VariableBank, StudyBank, …Banks; MetadataIngester; NesstarPublisher; DDI 1/2.x; IngestWS; PublicationTool; LegacyDatabase; DDI 3.0+; Custom Exporter; DDI 3.0 Converter; MetadataRegistry; PublicationWS]

Example: Submission of a Nesstar DDI will typically result in creation of objects in the following banks: study, classifications, variables, instance (files) and possibly concepts, universes, questions, instructions if such variable level metadata have been compiled.

Example: A legacy system used for the production of questionnaires could create objects in the question, questionnaire, instruction, concepts, universes and classification banks. This may happen outside the context of a survey (question bank) and no variable would be associated with these objects.

Metadata optimization / harmonization: Optimization of the metadata (merging duplicates, aligning on harmonized objects, etc.) can be done using various automated, semi-automated or manual methods during the various stages of submission (this can also be performed later on).
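
As an illustration of the automated end of that range, the sketch below merges objects whose text is identical after a crude normalisation. The matching rule and all names are assumptions made for the example; real harmonisation would need far more careful matching and often a manual review step.

```python
import re
from typing import Dict, List, Tuple


def normalise(text: str) -> str:
    """Crude matching key: lower-case, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def merge_duplicates(
    objects: List[Tuple[str, str]]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Merge objects whose normalised text is identical.

    `objects` holds (object id, text) pairs. Returns (kept, redirects):
    `kept` maps each surviving id to its text, and `redirects` maps each
    merged id to the surviving id so existing references can be re-pointed.
    """
    kept: Dict[str, str] = {}
    redirects: Dict[str, str] = {}
    first_seen: Dict[str, str] = {}
    for object_id, text in objects:
        key = normalise(text)
        if key in first_seen:
            redirects[object_id] = first_seen[key]
        else:
            first_seen[key] = object_id
            kept[object_id] = text
    return kept, redirects


kept, redirects = merge_duplicates([
    ("q1", "How old are you?"),
    ("q2", "how old are  you"),
    ("q3", "What is your age?"),
])
```

Note that `q3` is not merged: it asks the same thing in different words, which is the kind of case left to semi-automated or manual methods.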

Submission: Object registration could be automated upon release of the metadata by the provider. Workflow can be implemented as necessary.

Repository: Many metadata repositories can exist around the network. These can be deployed at the provider level, or as shared metadata storage.

RepositoryWS

Interfaces: Note that metadata repositories also expose a set of general and specialized web services along with administrative / security interfaces.

Metadata Repositories (Banks)

Submission: Submission packages are prepared by providers in compliance with the CESSDA DDI3+ specification. Publication tools are used to manage packages and control the ingestion process. Packages are broken down and stored in various banks (as needed).
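
The breaking down of a package into banks can be sketched as follows. This is a minimal illustration, assuming each object in a package names its target bank, an identifier and the identifiers of objects it refers to; the package layout and bank names are hypothetical and do not represent the CESSDA DDI3+ specification.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical submission package: a list of objects, each a dict with
# "bank", "id" and an optional "refs" list of other object ids.
Package = List[dict]


def ingest(package: Package, banks: Dict[str, Dict[str, dict]]) -> None:
    """Check references, then store each object in its bank.

    Nothing is stored unless every reference resolves, either inside the
    package or to an object already held in a bank.
    """
    known = {obj_id for bank in banks.values() for obj_id in bank}
    known.update(obj["id"] for obj in package)
    for obj in package:
        for ref in obj.get("refs", []):
            if ref not in known:
                raise ValueError(f"{obj['id']} refers to unknown object {ref}")
    for obj in package:
        banks[obj["bank"]][obj["id"]] = obj


banks: Dict[str, Dict[str, dict]] = defaultdict(dict)
ingest(
    [
        {"bank": "study", "id": "s1"},
        {"bank": "question", "id": "q1", "refs": ["s1"]},
        {"bank": "variable", "id": "v1", "refs": ["s1", "q1"]},
    ],
    banks,
)

# A package with a dangling reference is rejected as a whole.
try:
    ingest([{"bank": "variable", "id": "v2", "refs": ["q9"]}], banks)
    rejected = False
except ValueError:
    rejected = True
```

Checking all references before storing anything is one simple way to keep referential integrity; a production ingester would also need quality assurance and versioning.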

Page 26

[Diagram repeated from Page 23: CESSDA Toolkit]