CESSDA Expert Seminar 2009 Atle Alvheim Norwegian Social Science Data Archive.
CESSDA Expert Seminar 2009
Atle Alvheim, Norwegian Social Science Data Archive
A common future ?
The last 15 years have been focused on building up a common data infrastructure for the social sciences, based on modern web technology
1. The web: The idea that the archives could create an integrated catalog, Grenoble 1994
2. DDI: A richer and better data documentation format, R.Rockwell / ICPSR IASSIST 1995
3. Integrate 3-4 components:
Internet / web / Common catalog
DDI
Access explore analyse download data: The social science dream machine NESSTAR J.Ryssevik / S.Musgrave
ILSES Integrated Library and Survey Data Extraction Service
4. Richer services, FASTER (Data types) LIMBER (Attack the language barrier)
5. One single common entry point: Madiera, Metadater
[Diagram: CESSDA metadata harvester. Labels: Server 1 to Server 5 (Nesstar); OAI-PMH; harvester; indexing tool (Lucene); portal with search and browse; ELSST; topical list; square files; publishing; client; browsing tool; search tool]
Resources: CESSDA template, controlled vocabularies, multilingual thesaurus, CESSDA classification
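The harvesting shown in this picture relies on OAI-PMH. As a minimal sketch of what a harvester does with one ListRecords response, the Python below builds a request URL and reads record identifiers and the resumption token from a response. The base URL, the sample XML and the function names are illustrative assumptions, not part of the CESSDA harvester, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (identifiers, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    ids = [
        e.text
        for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hand-written sample response (assumption), trimmed to the parts used here.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:server1.example:study-1</identifier></header></record>
    <record><header><identifier>oai:server1.example:study-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://server1.example/oai", resumption_token=token)
```

A harvester would repeat the request with each new token until the response carries none, then hand the collected records to the indexing tool.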
THE RESEARCHER
Looking for data... Cultivate knowledge...
THE ARCHIVES
THE PORTAL
Bridging the gap
1. Greenland 2. Iceland 3. Faroe Islands 4. Norway 5. Sweden 6. Finland 7. Aaland Islands 8. Estonia 9. Latvia 10. Lithuania 11. Belorussia 12. Ukraine 13. Moldova 14. Poland 15. Germany 16. Denmark 17. England 18. Scotland 19. Wales 20. Northern Ireland 21. Ireland
22. Netherlands 23. Belgium 24. Luxembourg 25. France 26. Portugal 27. Spain 28. Andorra 29. Monaco 30. Switzerland 31. Italy 32. San Marino 33. Vatican State 34. Slovenia 35. Liechtenstein 36. Austria 37. Czech Republic 38. Slovakia 39. Hungary 40. Romania 41. Bulgaria 42. Serbia
43. Croatia 44. Bosnia & Herzegovina 45. Montenegro 46. Kosovo 47. Albania 48. Macedonia 49. Greece 50. Cyprus South 51. Cyprus North 52. Malta 53. Turkey 54. Russia ? 55. Georgia ?? 56. Armenia ??? 57. Israel ????
30 Languages, 45 legal systems
We are supposed to support research and break down technical, linguistic, judicial and economic barriers
Several processes – timelines in a layered system
Share formats and routines
Access and download
Instrument development
Control access
1. Make a more powerful interface to data holdings
- more sophisticated search / browse possibilities, more focused, even across languages
- better possibilities to handle results
2. Handle more complex data structures: over time, across space, across languages, linking micro and macro
These we may see as ”analytic dimensions”
3. Persistent identity: connect knowledge products back to the data used, turn the traditional picture upside down
These are more ”practical management
4. Handle problems of double storage. Data dynamics, more than one value in a table cell. Versioning, updating, comments, links, references. Adding to the data item
5. Single Sign On: need to pass information and access more than one server, logging
Conceptualisation -> Instrument -> Data production (SIP) -> Data documentation (AIP)
Question DB
The researcher formulates a problem and needs data to analyse it
If data have to be collected, we need an instrument, a questionnaire
When data are collected, with necessary metadata, they represent a SIP
A questions- and concepts DB is a very useful tool to develop instruments
To make data ready for archiving they have to be documented (and processed), lifted from a SIP to an AIP
Data documentation: Should be based on standardised procedures / best practices and common tools for all CESSDA (+) archives
DDI 2/3 expressed as a Template/DDI-profile, which is a) selection of elements, with status b) element repositories c) controlled vocabularies d) multilingual thesaurus e) gazetteer, geographic classification f) CESSDA study classification
This requires software or a manual / clear guidelines. DDI becomes the glue that holds this whole system together.
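To illustrate what "software" for a template could mean in practice, here is a minimal sketch of checking one metadata record against a profile made of a selection of elements with status plus controlled vocabularies. The element names, status values and vocabulary terms are invented for the illustration and are not real DDI element names or CESSDA vocabularies.

```python
# Hypothetical profile: element name -> status, plus controlled vocabularies.
PROFILE = {
    "elements": {
        "title": "mandatory",
        "abstract": "mandatory",
        "kind_of_data": "recommended",
        "funding": "optional",
    },
    "vocabularies": {
        "kind_of_data": {"survey data", "aggregate data", "text"},
    },
}


def check_against_profile(record, profile=PROFILE):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # a) selection of elements, with status
    for element, status in profile["elements"].items():
        if status == "mandatory" and not record.get(element):
            problems.append(f"missing mandatory element: {element}")
    # c) controlled vocabularies
    for element, allowed in profile["vocabularies"].items():
        value = record.get(element)
        if value is not None and value not in allowed:
            problems.append(f"{element}: '{value}' is not in the controlled vocabulary")
    # Elements outside the selection are reported rather than silently kept.
    for element in record:
        if element not in profile["elements"]:
            problems.append(f"element not in profile: {element}")
    return problems


record = {"title": "Election study", "kind_of_data": "poll"}
problems = check_against_profile(record)
```

The same profile could be handed to every archive, so that all of them apply identical rules when lifting a SIP to an AIP.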
A question DB is potentially problematic for data documentation processes. Better to import directly via the questionnaire
Much data is generated by the public statistical system or other producers
Will make it possible to find questions from concepts (need an interface)
Learn from others. Encourage comparative research. Look up translations
Contact with user community
Metadata – standard
Tool for instrument development
Tool for data collection
Tool for documentation
Question DB, translations
An overarching plan
Integration of components
Data have a life-cycle
The archive: A Greenhouse or a Graveyard?
[Diagram: Ingest (AIP) into data repositories UKDA, DDA and FSD, each shown with metadata and data; Question DB]
When an AIP is inserted into an archive or storage it can trigger an update of a question database.
Or do updates happen as a harvesting process?
To what degree are packages pre-defined or built for purposes?
A question database will be related to a basic storage. Do updates happen as a guarded / explicit process? What are the criteria?
Our AIPs: combinations of language, metadata standard and storage per archive
Archive | Language | Metadata standard | Storage
FSD | Finnish and English | DDI 3.1, other | Nesstar
DDA | Danish and English | DDI 2.x | Nesstar, Fedora
UKDA | English | DDI 3.0, DDI 2.0, other | Fedora
Because of storage complexity harvesting also becomes quite complex
[Diagram: data repositories UKDA, DDA and FSD, each shown with metadata and data]
Data repositories are guarded by access policies. Policies are usually formulated at institution or repository level
Policies are activated by the crossing of the line between metadata and data, which is at data package level
Should policies be linked to packages instead of repositories? Should it be an obligatory part of metadata? Then we need to have policies formalised.
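One way to picture the question is a formalised policy that can sit at either level, with the package-level policy taking precedence when present. The sketch below is only an illustration of that idea; the class names, the single `requires_login` rule and the fallback order are assumptions, not an agreed CESSDA design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    name: str
    requires_login: bool


@dataclass
class Repository:
    name: str
    policy: Policy  # policy formulated at institution or repository level


@dataclass
class Package:
    identifier: str
    repository: Repository
    policy: Optional[Policy] = None  # formalised in the package metadata, if present


def effective_policy(package):
    """Package-level policy wins; otherwise fall back to the repository's."""
    if package.policy is not None:
        return package.policy
    return package.repository.policy


def may_download(package, logged_in):
    """Evaluated when the user crosses the line from metadata to data."""
    return logged_in or not effective_policy(package).requires_login


repo = Repository("Archive A", Policy("registered users only", True))
open_pkg = Package("study-1", repo, Policy("open access", False))
default_pkg = Package("study-2", repo)
```

Here `may_download(open_pkg, logged_in=False)` is allowed by the package's own policy, while `default_pkg` inherits the repository rule and requires a login.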
SSO / AAA
LOG-DB
Data repositories should be documented in national + common language
Different documentation templates for national and international language
CV: LifeCycleEvent
Study Proposal, Study Design, Instrument Design, Funding, Interviewer training, Ethics Review, Sampling, Instrument pre-testing, Pilot study, Questionnaire translation, Documentation translation
DATA COLLECTION, Data collection reports, Post-collection processing, Data production, Initial data quality checks, Metadata production, Original release
DEPOSIT, Post-production processing, Data quality checks, Data editing, Data integration, Processing for Disclosure, Metadata editing, Preservation package production, Dissemination package, New version production
New version release / publication
From producer to consumer, the data archival work
Cover the whole data (or project?) life-cycle
Locate, explore and download
[Diagram: NSD's Nesstar servers. Labels: NSDData, Metadata, Civic-active, ESS, Political system, Innovation Norway, Church data, School Nesstar, Welfare data, Eurosphere, Opinion polls]
Content types shown: Micro, Cube, Micro, Mix, Mix, Mix, Mix, Metadata, Qualitative, Micro
The CESSDA data archives will in due time be both data providers, aggregators and single service providers. This is an illustration of what would presently be the NSD situation.
CESSDA complications: We need services that cover many servers and many conditions for use
Functionalities we need, with a scale from producer to consumer
NSDData: Multi-linguality, translation support, DDI-profile, ELSST
CivicActive: Switch absolute/relative figures, convert cubes to rectangular files
ESS: Complex files, link micro- and macro-levels
Political system: Auto-publishing from databases to service system
Eurosphere: Text, qualitative data. Link servers
Complex servers: Selective login
The user authentication problem: almost always at institutional level
[Diagram: Portal; Server 1 (DDA), Server 2, Server 3 (ZA), Server 4, Server 5 (UKDA), Server n; Dataset 1, Dataset 2, Dataset 3; User]
Users affiliated with national institutions, acting on a common justification (research) and working within specific projects (do they have roles within projects?), want to access data resources in different institutions and countries
The user authorisation problem: very often at resource level
Complex?
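The split between the two problems can be sketched as two separate steps: the portal trusts a statement from the user's home institution (authentication), and each dataset then decides which roles may access it (authorisation). The institution names, roles and grant table below are hypothetical and only serve to show the separation.

```python
# Hypothetical data: institutions the portal trusts, and per-dataset grants.
TRUSTED_INSTITUTIONS = {"uni-bergen.example", "uni-essex.example"}
DATASET_GRANTS = {
    "dataset-1": {"roles": {"researcher", "student"}},
    "dataset-2": {"roles": {"researcher"}},
}


def authenticate(assertion):
    """Institution-level step: accept the home institution's statement about the user."""
    if assertion.get("institution") not in TRUSTED_INSTITUTIONS:
        return None
    return {"user": assertion["user"], "role": assertion.get("role", "student")}


def authorise(identity, dataset_id):
    """Resource-level step: each dataset decides which roles may access it."""
    grant = DATASET_GRANTS.get(dataset_id)
    return (
        identity is not None
        and grant is not None
        and identity["role"] in grant["roles"]
    )


student = authenticate(
    {"user": "kari", "institution": "uni-bergen.example", "role": "student"}
)
outsider = authenticate({"user": "x", "institution": "unknown.example"})
```

With this data the student passes authentication once but is authorised for `dataset-1` only, which is the sense in which the second problem sits at resource level.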
[Diagram: overall architecture, with components numbered 1 to 12 in the original figure]
DDI 2/3 expressed as Template/DDI-profile, as a) selection of elements, with status b) controlled vocabularies c) multilingual thesaurus d) gazetteer e) CESSDA classification
Portal: search, browse; ELSST x time, space, methodology; ELSST query service
Data loader: may handle multiple and complex data packages. Explore and compare functionality
Ingest (AIP) into data repositories UKDA, DDA and FSD, each shown with metadata and data
SSO/AAA; policies (repository or package level)
Life-cycle: Conceptualisation, Instrument, Data production (SIP), Data documentation
Databases: Harmonisation (and concepts) DB, Log database, Question DB
Other labels: Web browser, Download, Registry, CESSDA Toolkit, Tool, Intermediate storage
[Diagram: DDI-centric back-end and web services stack]
Inputs: non-DDI objects, DDI 1/2.x, DDI 3.0+, Legacy Database, local objects
Tools: Nesstar Publisher, Publication Tool, Custom Exporter, DDI 3.0 Converter, Reporting Tools, Admin Tools, Security Tools
Note (appears twice in the figure): could interact with WS for metadata preparation
Ingest: Ingest WS, Metadata Ingester. The ingester performs quality assurance, splits metadata and maintains referential integrity for storage in the CESSDA Bank
Banks: ConceptBank, ClassificationBank, GeoBank, QuestionBank, QuestionnaireBank, InstructionBank, UniverseBank, VariableBank, StudyBank, …Banks. CESSDA-DB stores all low level objects
Services: 3CDB, QDB, FutureServices
Internal web services stack: CESSDA WS, C3DBWS, QDBWS, FutureWS
Web services exposed for public consumption
Applications: 3CDB Applications, QDB Applications, 3CDB/QDB Applications, Future Applications. 3CDB/QDB applications call the relevant WS
Back-end maintenance and reporting tools
Ingestion/Registration Process
[Diagram. Components shown: DDI 1/2.x, DDI 3.0+, Legacy Database; Nesstar Publisher, Publication Tool, Custom Exporter, DDI 3.0 Converter; Ingest WS, Metadata Ingester; ConceptBank, ClassificationBank, GeoBank, QuestionBank, QuestionnaireBank, InstructionBank, UniverseBank, VariableBank, StudyBank, …Banks; Metadata Registry; Publication WS]
Example: Submission of a Nesstar DDI will typically result in creation of objects in the following banks: study, classifications, variables, instance (files) and possibly concepts, universes, questions, instructions if such variable level metadata have been compiled.
Example: A legacy system used for the production of questionnaires could create objects in the question, questionnaire, instruction, concepts, universes and classification banks. This may happen outside the context of a survey (question bank), and no variable would be associated with these objects.
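The first example above can be sketched as a small ingester that splits one submission into banks and checks referential integrity before anything is stored. The submission is shown as an already-parsed dictionary with invented identifiers (a real package would be DDI XML), only three banks are modelled, and the function name `ingest` is an assumption for the illustration.

```python
from collections import defaultdict

# Hypothetical, already-parsed submission; a real one would be DDI XML.
submission = {
    "study": {"id": "study-1", "title": "Election study"},
    "variables": [
        {"id": "v1", "label": "Age", "question": "q1"},
        {"id": "v2", "label": "Vote", "question": "q2"},
    ],
    "questions": [
        {"id": "q1", "text": "How old are you?"},
        {"id": "q2", "text": "Which party did you vote for?"},
    ],
}


def ingest(package, banks):
    """Split a package into banks; `banks` is a defaultdict(dict) keyed by bank name."""
    # Referential integrity: every question a variable points at must exist,
    # either in this package or already in the question bank.
    known = {q["id"] for q in package.get("questions", [])} | set(banks["question"])
    for var in package.get("variables", []):
        ref = var.get("question")
        if ref is not None and ref not in known:
            raise ValueError(f"variable {var['id']} references unknown question {ref}")
    # All checks passed, so nothing is stored half-way.
    study = package["study"]
    banks["study"][study["id"]] = study
    for q in package.get("questions", []):
        banks["question"][q["id"]] = q
    for var in package.get("variables", []):
        banks["variable"][var["id"]] = {**var, "study": study["id"]}
    return banks


banks = ingest(submission, defaultdict(dict))
```

The second example, a question bank filled outside the context of a survey, would need a variant that does not require a study; that case is left out of the sketch.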
Metadata optimization / harmonization: Optimization of the metadata (merging duplicates, aligning on harmonized objects, etc.) can be done using various automated, semi-automated or manual methods during the various stages of submission (this can also be performed later on).
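As an illustration of the automated end of that range, the sketch below merges questions whose text is identical after simple normalisation and records which identifier each duplicate was mapped to. Exact matching after normalisation is a stand-in chosen for brevity; near matches and translations would need the semi-automated or manual methods mentioned above. All names and sample questions are invented.

```python
import re


def normalise(text):
    """Lower-case, drop punctuation and collapse whitespace, for comparison only."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def merge_duplicate_questions(questions):
    """Return (unique_questions, id_map), where id_map maps each id to the kept id."""
    kept_by_key = {}
    id_map = {}
    for q in questions:
        key = normalise(q["text"])
        # The first question seen with a given normalised text is the one kept.
        kept = kept_by_key.setdefault(key, q)
        id_map[q["id"]] = kept["id"]
    return list(kept_by_key.values()), id_map


questions = [
    {"id": "q1", "text": "How old are you?"},
    {"id": "q2", "text": "how old are  you"},
    {"id": "q3", "text": "Which party did you vote for?"},
]
unique, id_map = merge_duplicate_questions(questions)
```

The returned mapping lets variables that pointed at a merged question be redirected to the kept one, so references stay valid after the merge.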
Submission: Object registration could be automated upon release of the metadata by the provider. Workflow can be implemented as necessary.
Repository: Many metadata repositories can exist around the network. These can be deployed at the provider level, or as shared metadata storage.
Repository WS
Interfaces: Note that metadata repositories also expose a set of general and specialized web services along with administrative / security interfaces.
Metadata Repositories (Banks)
Submission: Submission packages are prepared by providers in compliance with the CESSDA DDI3+ specification. Publication tools are used to manage packages and control the ingestion process. Packages are broken down and stored in various banks (as needed).