Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages
Lucas Mak, Metadata & [email protected]
Aaron Collie, Digital Curation [email protected]
What we wanted
What we found
What we did
We bridged a gap, but we didn’t close the bridge
Archivematica output: METS.xml (METS 1.8), AIP, DIP
Fedora Commons input: METS Fedora Extension
• fedora-batch-ingest.sh
• Datastreams!
[Diagram labels: Staging (12 TB), Dark Archive (84 TB), Serving (5 TB); AIP; DIP]
[Diagram labels: METS 1.8; XSL; Humans; METS Fedora Ext. 1.1]
[Diagram labels: ProQuest; Persistent ID (PID); METS with PQ_DATA, DC, MODS, PREMIS (…); DIP; AIP (“E”); METS Fedora Ext. 1.1]
Why?
• We wanted to be able to control and systematize ingest at the microservice level
– And we like the direction Archivematica is taking
• We wanted to pipe technical and preservation metadata into Fedora Commons
– This was the reason we got started
• We haven’t contributed to the open source community, and we wanted something to learn on
– We are thinking of it as professional development…
Comparing Archivematica & Fedora METS
Different schema
Archivematica: METS v. 1.8
• http://www.loc.gov/standards/mets/version18/mets.xsd
Fedora: Fedora METS 1.1
• http://fedora-commons.org/definitions/1/0/mets-fedora-ext1-1.xsd
Differences in structure, elements, attributes, & values allowed
<structMap>
Archivematica
• Physical structMap of the bag (i.e. directory structure)
Fedora: No <structMap> per v.1.1*
Solution: Structure represented by the GROUPID & SEQ attributes of <mets:file>
• SEQ by page no. embedded in filename
– Only physical arrangement is possible unless the file naming convention is changed to include logical info
• GROUPID by file type/usage (e.g. preservation master, high/low resolution access copies)
* <mets:structMap> is allowed in schema v.1.0 (used until Fedora 3.0)
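The SEQ and GROUPID derivation described on this slide could be sketched roughly as follows. This is only an illustration: the naming convention (page number as the trailing digits of the filename stem), the extension-to-group table, and the group labels are all hypothetical and not taken from the slides.

```python
import re
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a GROUPID label (file type/usage).
GROUP_BY_EXTENSION = {
    ".tif": "preservation_master",
    ".jp2": "access_high",
    ".jpg": "access_low",
}

# Hypothetical naming convention: the page number is the trailing run of
# digits in the filename stem, e.g. "thesis_0012.tif" is page 12.
PAGE_NUMBER = re.compile(r"(\d+)$")


def file_attributes(path):
    """Return the GROUPID and SEQ values to set on a <mets:file> element."""
    p = PurePosixPath(path)
    match = PAGE_NUMBER.search(p.stem)
    # SEQ reflects physical order only; no logical structure is recoverable
    # unless the filename itself encodes it.
    seq = str(int(match.group(1))) if match else None
    group = GROUP_BY_EXTENSION.get(p.suffix.lower(), "other")
    return {"GROUPID": group, "SEQ": seq}
```

For example, `file_attributes("objects/thesis_0012.tif")` gives a GROUPID of `preservation_master` and a SEQ of `12` under these assumptions.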
<fileSec>
Archivematica
• Two file groups: “Original” & “Submission documentation”
– Original: digital objects
– Submission documentation: descriptive metadata XML files
Fedora
• Datastreams to be ingested as files
– Files of digital objects and others (e.g. Archivematica METS)
– Descriptive metadata XML files are ingested as “inline XML datastreams”
» Copy all XML files in “Submission documentation” into separate <dmdSecFedora> elements
<amdSec>
Archivematica: Hierarchical structure
<amdSec ID="amdSec1">
  <techMD ID="techMD1"/>
  …
  <digiProvMD ID="digiProvMD1"/>
</amdSec>
<amdSec ID="amdSec2">
  <techMD ID="techMD2"/>
  …
  <digiProvMD ID="digiProvMD2"/>
</amdSec>
• 1 digital file has 1 <amdSec>
• All <techMD>, <rightsMD>, <sourceMD> and <digiProvMD> pertaining to the same file are nested under the same <amdSec>
Fedora: Flat structure
<amdSec ID="tech1"><techMD ID="tech1.0"/></amdSec>
<amdSec ID="digiProv1"><digiProvMD ID="digiProv1.0"/></amdSec>
<amdSec ID="tech2"><techMD ID="tech2.0"/></amdSec>
• To accommodate inline XML datastream versioning
– ID (syntax DSn.v) contains both:
» the number of the inline datastream (n) and
» the version number of the datastream (v)
– Each individual <amdSec> serves as a container, and its ID indicates the datastream number
– <techMD> and the like carry IDs that indicate the datastream version number
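The hierarchical-to-flat conversion above might be sketched like this. The presenters used XSL; this Python version is only an illustration of the same idea. The ID prefixes follow the slide's examples ("tech1", "digiProv1"), while the "rights" and "source" prefixes, the per-type counters, and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
AMDSEC = f"{{{METS_NS}}}amdSec"

# ID prefixes per metadata type; "tech" and "digiProv" mirror the slide,
# "rights" and "source" are assumed by analogy.
PREFIXES = {
    "techMD": "tech",
    "rightsMD": "rights",
    "sourceMD": "source",
    "digiProvMD": "digiProv",
}


def flatten_amdsecs(mets_root):
    """Split each nested <amdSec> into one flat <amdSec> per metadata section.

    Returns the new <amdSec> elements plus a map from each original
    amdSec ID to the IDs of the flat sections that replace it.
    """
    flat_sections = []
    id_map = {}
    counters = {}
    for amd_sec in mets_root.findall(AMDSEC):
        new_ids = []
        for child in amd_sec:
            local_name = child.tag.split("}")[-1]
            prefix = PREFIXES.get(local_name)
            if prefix is None:
                continue  # not one of the four administrative metadata types
            counters[prefix] = counters.get(prefix, 0) + 1
            section_id = f"{prefix}{counters[prefix]}"
            # The container ID gives the datastream number (n) ...
            section = ET.Element(AMDSEC, {"ID": section_id})
            # ... and the inner ID adds the version number (v), starting at 0.
            version = ET.SubElement(section, child.tag, {"ID": section_id + ".0"})
            version.extend(list(child))
            flat_sections.append(section)
            new_ids.append(section_id)
        id_map[amd_sec.get("ID")] = new_ids
    return flat_sections, id_map
```

The returned map is what a later step would need in order to repoint each file at its new, flat sections.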
ADMID attribute in <mets:file>
Archivematica
• Pointing to one <amdSec>, which has <techMD>, <rightsMD>, <sourceMD>, and <digiProvMD> nested within, per file
– <mets:file ID="file1" ADMID="amdSec1"/>
Fedora
• Pointing to multiple <amdSec>, each of which contains <techMD>, <rightsMD>, <sourceMD>, or <digiProvMD>, per file
– <mets:file ID="file1" ADMID="tech1 rights1 source1 digiProv1"/>
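Repointing a file from one nested section to several flat ones amounts to expanding the pointer attribute on <mets:file> (ADMID in the METS schema). A minimal sketch, assuming a lookup table from old section IDs to new ones is already available; the function name and the table are hypothetical.

```python
def rewrite_admid(admid, id_map):
    """Expand each old amdSec ID in a space-separated pointer value.

    IDs missing from id_map are kept unchanged.
    """
    new_ids = []
    for old_id in admid.split():
        new_ids.extend(id_map.get(old_id, [old_id]))
    return " ".join(new_ids)


# Hypothetical lookup table: one nested section becomes four flat ones.
example_map = {"amdSec1": ["tech1", "rights1", "source1", "digiProv1"]}
example_value = rewrite_admid("amdSec1", example_map)
```

Here `example_value` is the string `tech1 rights1 source1 digiProv1`, matching the Fedora example on the slide.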
<dmdSec>
Archivematica
• Only 1 Dublin Core record is allowed to describe the SIP
• Constrained by Archivematica workflow instead of METS schema
• Additional descriptive metadata XML records are included in “Submission documentation” folder
Fedora
• Fedora extension element: <dmdSecFedora>
• Allowed MDTYPE: MARC, EAD, DC, NISOIMG, LC-AV, VRA, TEI Header, DDI, FGDC, & OTHER
• Copy XML files in “Submission documentation” folder into separate <dmdSecFedora>
– MODS has to be labeled as “OTHER”
– Use namespace URI to assign correct “MDTYPE”
» Does not work with TEI Header or EAD
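Assigning MDTYPE from the namespace URI of each record's root element could look roughly like the sketch below. The lookup table is a small illustrative subset chosen for this example, not the presenters' actual mapping, and anything unrecognized falls back to OTHER.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of namespace URIs; not the presenters' actual table.
NAMESPACE_TO_MDTYPE = {
    "http://purl.org/dc/elements/1.1/": "DC",
    "http://www.openarchives.org/OAI/2.0/oai_dc/": "DC",
    "http://www.loc.gov/MARC21/slim": "MARC",
    # MODS is not among the allowed MDTYPE values, so it is labeled OTHER.
    "http://www.loc.gov/mods/v3": "OTHER",
}


def mdtype_for(xml_text):
    """Pick an MDTYPE from the namespace URI of the record's root element."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        # ElementTree spells qualified names as "{namespace-uri}localname".
        namespace = root.tag[1:].split("}", 1)[0]
    else:
        namespace = ""  # no namespace declared on the root element
    return NAMESPACE_TO_MDTYPE.get(namespace, "OTHER")
```

A record whose root element carries no recognized namespace cannot be typed this way and ends up as OTHER.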
<mets:metsHdr>
Archivematica
• Does not use (optional in METS schema)
Fedora
• RECORDSTATUS attribute to indicate whether the object is “active”, “inactive” or “deleted”
• Hard-coded with constant data
<mets:metsHdr RECORDSTATUS="A">
  <mets:agent ROLE="IPOWNER" TYPE="ORGANIZATION">
    <mets:name>MSU Libraries Digital and Multimedia Center</mets:name>
  </mets:agent>
</mets:metsHdr>
OWNERID attribute in <mets:file>
Archivematica
• Does not use (optional in METS schema)
Fedora
• To indicate whether the file is “managed by Fedora internally”, “externally referenced”, or “redirected”
– Though optional according to Fedora-METS schema
• Determine based on filename or file format
– Archivematica adds “checksum” into filename for files generated during the preservation workflow
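One way to decide OWNERID from the filename is sketched below. Several things here are assumptions rather than facts from the slides: that the token added to generated filenames is a 32-digit hexadecimal string (with or without UUID-style hyphens), that the codes are "M" for managed and "E" for externally referenced, and which kind of file receives which code. The policy is therefore left to the caller.

```python
import re

# Assumption: files generated during the workflow carry a 32-hex-digit token
# in their names, optionally hyphenated like a UUID (8-4-4-4-12).
GENERATED_TOKEN = re.compile(
    r"[0-9a-fA-F]{8}(-?[0-9a-fA-F]{4}){3}-?[0-9a-fA-F]{12}"
)


def is_generated(filename):
    """Guess whether a file was produced during the preservation workflow."""
    return GENERATED_TOKEN.search(filename) is not None


def owner_id(filename, generated_code, original_code):
    """Return the OWNERID code for a file.

    Both codes are supplied by the caller because the actual policy
    (which files are managed, which externally referenced) is not
    stated on the slide.
    """
    return generated_code if is_generated(filename) else original_code
```

A call such as `owner_id(name, "M", "E")` would mark generated files as managed and everything else as externally referenced; swapping the arguments reverses the policy.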
Proposed Workflow
[Diagram labels: Staging Area (12 TB); Dark Archive(s) (84 TB); Serving Share(s); Web Display; METS; AIP; DIP]
What a bridge gets us:
• Automatically extracts and captures technical & preservation metadata
• Eases handling of complex objects with lots of metadata or parts
• Maintains and manages separate AIP/DIP packages
What a full integration might benefit from:
• Archivematica A/DIP Content Model & Solution Pack
• Integrated AIP management
– Including dashboard GUI
– Including JMS messaging
• Integrated rebuilds from filesystem
– Currently supported in Fedora Commons
– On roadmap for Archivematica
• Automated ingest, improved handling
Questions?
Lucas Mak ([email protected]) Aaron Collie ([email protected])