Advanced Software Technology (Softwaretechnologie für Fortgeschrittene), Part Eide
Session IV: Media server operations and abstraction
(with contributions from Christian-Emil Ore, Jon Holmen, and other colleagues at the Unit for Digital Documentation, University of Oslo)
Köln, 17 December 2015
File processing
• Parse the processing XML
• Set up a production line
  – any conversion path built from the available conversions can be added
• Matrix of input and output formats with default scripts
  – the defaults can be overridden
• Scripts run in the background
  – queue handling
  – load balancing
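The format matrix and the production line can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual code: the format names, the script names in `DEFAULT_SCRIPTS`, and the functions `script_for` and `production_line` are hypothetical. A production line is taken to be the shortest chain of conversions found by breadth-first search over the matrix, and an override for one conversion takes precedence over its default script.

```python
from collections import deque

# Hypothetical matrix of (input format, output format) -> default script.
DEFAULT_SCRIPTS = {
    ("raw", "tiff"): "raw_to_tiff.sh",
    ("tiff", "jpeg"): "tiff_to_jpeg.sh",
    ("jpeg", "small_jpeg"): "scale_jpeg.sh",
}


def script_for(src, dst, overrides=None):
    """Return the script for one conversion; an override beats the default."""
    overrides = overrides or {}
    return overrides.get((src, dst), DEFAULT_SCRIPTS.get((src, dst)))


def production_line(src, dst, matrix=DEFAULT_SCRIPTS):
    """Shortest chain of (input, output) conversions from src to dst, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            # Pair consecutive formats: [a, b, c] -> [(a, b), (b, c)]
            return list(zip(path, path[1:]))
        for a, b in matrix:
            if a == path[-1] and b not in seen:
                seen.add(b)
                queue.append(path + [b])
    return None  # no conversion path exists


steps = production_line("raw", "small_jpeg")
print([script_for(a, b, {("tiff", "jpeg"): "my_tiff_to_jpeg.sh"}) for a, b in steps])
# ['raw_to_tiff.sh', 'my_tiff_to_jpeg.sh', 'scale_jpeg.sh']
```

Adding a new conversion means adding one entry to the matrix; any path that becomes reachable through it is then available to the production line without further changes.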
Remember the workflow
Figure: one information object for each original recording (the Work), modelled as a media_group with these media_units:
• media_unit a (original image, raw format), from Recording a (camera)
• media_unit b (large TIFF), from Recording b (processing software)
• media_unit c (large JPEG), from Recording c (image processing server)
• media_unit d (small JPEG), from Recording d (image processing server)
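The structure in the figure can be written down as a small data model. This is a sketch only: the class names `Recording`, `MediaUnit`, and `MediaGroup`, their fields, and the work title are assumptions for illustration, not the schema of the actual media server.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recording:
    """The event that produced a media_unit."""
    label: str
    agent: str  # camera, processing software, image processing server, ...


@dataclass
class MediaUnit:
    label: str
    description: str
    recording: Recording


@dataclass
class MediaGroup:
    """One information object per original recording (the Work)."""
    title: str
    units: List[MediaUnit] = field(default_factory=list)

    def add(self, label, description, recording):
        unit = MediaUnit(label, description, recording)
        self.units.append(unit)
        return unit


work = MediaGroup("Photograph of the original object")  # hypothetical title
work.add("a", "original image, raw format", Recording("a", "camera"))
work.add("b", "large TIFF", Recording("b", "processing software"))
work.add("c", "large JPEG", Recording("c", "image processing server"))
work.add("d", "small JPEG", Recording("d", "image processing server"))

for unit in work.units:
    print(f"media_unit {unit.label}: {unit.description} "
          f"(recording {unit.recording.label}, {unit.recording.agent})")
```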
File processing XML (example)
<IMAGE type="digital_kopi" method="convert" format="jpg" subpath="jpeg" rec_process="3"
       software="Imagemagick 6.2.9" settings="fileFormat JFIF contentFormat 24BITRGB">
  <IMAGE type="digital_kopi" method="convert" format="jpg" width="640" height="480"
         subpath="small" rec_process="3" software="Imagemagick 6.2.9"
         settings="maxScale 640x480 fileFormat JFIF contentFormat 24BITRGB" default="1"></IMAGE>
</IMAGE>
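Reading such a file is the first step of the production line. The sketch below uses Python's standard `xml.etree.ElementTree` and rests on one assumption not stated on the slide: a nested `IMAGE` element is derived from its enclosing `IMAGE`, so the outer element's `subpath` becomes the source of the inner conversion. The function name `conversion_steps` and the label `"original"` for the starting file are hypothetical.

```python
import xml.etree.ElementTree as ET

PROCESSING_XML = """
<IMAGE type="digital_kopi" method="convert" format="jpg" subpath="jpeg" rec_process="3"
       software="Imagemagick 6.2.9" settings="fileFormat JFIF contentFormat 24BITRGB">
  <IMAGE type="digital_kopi" method="convert" format="jpg" width="640" height="480"
         subpath="small" rec_process="3" software="Imagemagick 6.2.9"
         settings="maxScale 640x480 fileFormat JFIF contentFormat 24BITRGB" default="1"></IMAGE>
</IMAGE>
"""


def conversion_steps(element, source="original"):
    """Yield (source, target subpath, attributes) for each IMAGE element.

    Assumption: a nested IMAGE is converted from the output of its parent.
    """
    target = element.get("subpath")
    yield source, target, dict(element.attrib)
    for child in element.findall("IMAGE"):
        yield from conversion_steps(child, source=target)


root = ET.fromstring(PROCESSING_XML.strip())
for src, dst, attrs in conversion_steps(root):
    print(f"{src} -> {dst}: {attrs['method']} to {attrs['format']} ({attrs['settings']})")
```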
File processing script
Process status:
• 0: not converted
• 1: under conversion
• 2: converted
• 9: conversion error
Outer structure producing conversion jobs:
    while (!kill_signal)
        fork out a new process
        wait 10 seconds
Each forked-out job:
    dba:connect
    int i = dba:get_current_conversion_count (status 1)
    if (i > 10) die
    string proc_str = dba:get_proc_string with status 0
    dba:set_proc_string_status(proc_str) to 1
    boolean OK = true
    while (!end(proc_str))
        interpret(proc_str)
        string comm = pick_command(proc_str)
        int ret_val = system(comm)
        if (ret_val == 0)
            dba:record_new_file_name
        else
            OK = false
    if (OK == true)
        dba:set_proc_string_status(proc_str) to 2
    else
        dba:set_proc_string_status(proc_str) to 9
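One forked-out job can be sketched in Python with `sqlite3` standing in for the database. Several details are assumptions: the tables `jobs` and `produced`, the representation of a processing string as semicolon-separated shell commands (replacing `interpret` and `pick_command`), and the function names. Two choices differ from the pseudocode. The job is claimed with a conditional `UPDATE` so that two processes cannot take the same string, and the loop stops at the first failed command, since later steps usually need its output. The outer loop could start `run_one_job` in a new process (for example with `multiprocessing.Process`) every 10 seconds until a kill signal arrives; that part is left out.

```python
import sqlite3
import subprocess

# Status codes from the slide.
NOT_CONVERTED, UNDER_CONVERSION, CONVERTED, CONVERSION_ERROR = 0, 1, 2, 9
MAX_RUNNING = 10


def create_tables(conn: sqlite3.Connection):
    # Hypothetical schema: one row per processing string, one row per produced file.
    conn.execute("CREATE TABLE IF NOT EXISTS jobs "
                 "(id INTEGER PRIMARY KEY, commands TEXT, status INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS produced (job_id INTEGER, command TEXT)")


def shell(cmd):
    """Run a command through the shell and return its exit status."""
    return subprocess.run(cmd, shell=True).returncode


def run_one_job(conn: sqlite3.Connection, run=shell):
    """One forked-out job: claim a waiting processing string and execute it."""
    (running,) = conn.execute("SELECT COUNT(*) FROM jobs WHERE status = ?",
                              (UNDER_CONVERSION,)).fetchone()
    if running > MAX_RUNNING:
        return None  # the slide's "die": too many conversions already running
    row = conn.execute("SELECT id, commands FROM jobs WHERE status = ? ORDER BY id LIMIT 1",
                       (NOT_CONVERTED,)).fetchone()
    if row is None:
        return None  # nothing waiting
    job_id, commands = row
    # Claim the job only if no other process took it in the meantime.
    claimed = conn.execute("UPDATE jobs SET status = ? WHERE id = ? AND status = ?",
                           (UNDER_CONVERSION, job_id, NOT_CONVERTED)).rowcount
    conn.commit()
    if claimed == 0:
        return None
    ok = True
    for cmd in (c.strip() for c in commands.split(";") if c.strip()):
        if run(cmd) == 0:
            conn.execute("INSERT INTO produced (job_id, command) VALUES (?, ?)", (job_id, cmd))
        else:
            ok = False
            break  # later steps usually depend on the failed one
    status = CONVERTED if ok else CONVERSION_ERROR
    conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    conn.commit()
    return job_id, status
```

Passing `run` as a parameter keeps the shell call replaceable, which makes the job logic testable without running real conversion programs.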
Archiving script
• This script runs every night (a cron job on Unix)
• Files selected for archiving are those where
  – the archive date is NULL
  – error_field is NULL
Remember long-term preservation
Figure: long-term preservation (labels: tape station, disk RAID, digital original, other copies, tape duplicates)
Archiving script
data_list = dba:get_files_for_archiving
foreach (data_line in data_list)
    if (data_line:file exists)
        dba:write("Archive command with timestamp")
        int ret_val = archive(file)
        if (ret_val == "Error")
            dba:write("Archive error")
        else
            ret_val = query_archived_file
            int file_size = ret_val:filesize
            if (ret_val == "Archived > 1")
                dba:write("Archive warning: multi")
                call delete_test(file_size)
            else if (ret_val == "Archived = 1")
                dba:write("Archiving OK")
                call delete_test(file_size)
            else
                dba:write("Archived but not found error")
    else
        dba:write("File not found error")
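The nightly archiving run can be sketched in Python, again with `sqlite3` as the database. The tape system is not specified, so `archive(path) -> bool` and `query_archived(path) -> (copies, size)` are hypothetical hooks passed in by the caller, and the tables `files` and `archive_log` are assumed. The slide selects files whose `error_field` is NULL, so this sketch assumes errors are written to `error_field`, which keeps a failed file out of the next run until someone has looked at it; successful runs set the archive date.

```python
import os
import sqlite3
from datetime import datetime


def create_tables(conn: sqlite3.Connection):
    # Hypothetical schema for the files known to the media server.
    conn.execute("CREATE TABLE IF NOT EXISTS files (id INTEGER PRIMARY KEY, path TEXT, "
                 "archive_date TEXT, error_field TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS archive_log "
                 "(file_id INTEGER, logged_at TEXT, message TEXT)")


def archive_pending(conn: sqlite3.Connection, archive, query_archived, delete_test):
    """Archive every file whose archive_date and error_field are both NULL."""
    def log(file_id, message):
        stamp = datetime.now().isoformat(timespec="seconds")
        conn.execute("INSERT INTO archive_log VALUES (?, ?, ?)", (file_id, stamp, message))

    def fail(file_id, message):
        # Assumption: errors go to error_field, so the file is skipped from now on.
        log(file_id, message)
        conn.execute("UPDATE files SET error_field = ? WHERE id = ?", (message, file_id))

    rows = conn.execute("SELECT id, path FROM files "
                        "WHERE archive_date IS NULL AND error_field IS NULL").fetchall()
    for file_id, path in rows:
        if not os.path.exists(path):
            fail(file_id, "File not found error")
            continue
        log(file_id, "Archive command issued")
        if not archive(path):
            fail(file_id, "Archive error")
            continue
        copies, size = query_archived(path)
        if copies == 0:
            fail(file_id, "Archived but not found error")
            continue
        log(file_id, "Archive warning: multi" if copies > 1 else "Archiving OK")
        conn.execute("UPDATE files SET archive_date = ? WHERE id = ?",
                     (datetime.now().isoformat(timespec="seconds"), file_id))
        delete_test(file_id, size)  # the size reported by the archive, as on the slide
    conn.commit()
```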
Archiving scripts: delete files
delete_test:
    file_size_archiving = arg
    file_size_limit = dba:ask_limit_for_schema
    if (file_size_archiving > file_size_limit)
        dba:write(delete: yes)
delete:
    file_list = dba:files_where(delete = "yes", deleted = NULL)
    foreach (file in file_list)
        int ret_val = system(delete file)
        if (ret_val == 0)
            dba:write("Deleted date_time")
        else
            dba:write("Delete error")
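The two steps can be sketched in Python as well. The schema is an assumption: a `size_limits` table holding one limit per schema, and `delete_flag`, `deleted`, and `delete_error` columns on `files`. The function names are hypothetical. The rule follows the slide: an archived file larger than its schema's limit is flagged, and a separate step deletes flagged files that are not yet deleted, recording either the time or the error. `os.remove` replaces the shell `delete` command.

```python
import os
import sqlite3
from datetime import datetime


def create_tables(conn: sqlite3.Connection):
    # Hypothetical schema: per-schema size limits and per-file deletion state.
    conn.execute("CREATE TABLE IF NOT EXISTS size_limits "
                 "(schema_name TEXT PRIMARY KEY, max_bytes INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS files (id INTEGER PRIMARY KEY, path TEXT, "
                 "schema_name TEXT, delete_flag TEXT, deleted TEXT, delete_error TEXT)")


def delete_test(conn: sqlite3.Connection, file_id, archived_size):
    """Flag an archived file for deletion if it exceeds its schema's size limit."""
    (limit,) = conn.execute(
        "SELECT s.max_bytes FROM files f JOIN size_limits s ON f.schema_name = s.schema_name "
        "WHERE f.id = ?", (file_id,)).fetchone()
    if archived_size > limit:
        conn.execute("UPDATE files SET delete_flag = 'yes' WHERE id = ?", (file_id,))
    conn.commit()


def delete_flagged(conn: sqlite3.Connection):
    """Delete flagged, not yet deleted files; record the deletion time or the error."""
    rows = conn.execute("SELECT id, path FROM files "
                        "WHERE delete_flag = 'yes' AND deleted IS NULL").fetchall()
    for file_id, path in rows:
        try:
            os.remove(path)
        except OSError as err:
            conn.execute("UPDATE files SET delete_error = ? WHERE id = ?", (str(err), file_id))
        else:
            stamp = datetime.now().isoformat(timespec="seconds")
            conn.execute("UPDATE files SET deleted = ? WHERE id = ?", (stamp, file_id))
    conn.commit()
```

A failed deletion leaves `deleted` as NULL, so the file is tried again on the next run while the error stays visible for monitoring.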
Monitoring
• Zombie jobs
• Server load
• Memory consumption
  – leakage
  – fine on a PC, but with uptime > 100 days…
• Database
  – error messages
  – instability
  – abnormal behaviour
• Must be monitored by humans (using tools)
  – email messages with control data
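One such tool can be sketched as a check for zombie jobs combined with a control-data email. The assumptions: a `jobs` table with `status` (using the status codes of the file processing script) and a `started_at` ISO timestamp, and a two-hour threshold for calling a job a zombie. The function names are hypothetical. The message is built with the standard `email.message.EmailMessage` and returned rather than sent.

```python
import sqlite3
from datetime import datetime, timedelta
from email.message import EmailMessage


def find_zombies(conn: sqlite3.Connection, now, max_age=timedelta(hours=2)):
    """Jobs still marked 'under conversion' (status 1) long after they started."""
    cutoff = (now - max_age).isoformat(timespec="seconds")
    return conn.execute("SELECT id, started_at FROM jobs "
                        "WHERE status = 1 AND started_at < ? ORDER BY id", (cutoff,)).fetchall()


def control_report(conn: sqlite3.Connection, now, sender, recipient):
    """Build an email with job counts per status and any possible zombie jobs."""
    counts = conn.execute("SELECT status, COUNT(*) FROM jobs "
                          "GROUP BY status ORDER BY status").fetchall()
    zombies = find_zombies(conn, now)
    msg = EmailMessage()
    msg["Subject"] = f"Media server control data {now:%Y-%m-%d}" + (" [ZOMBIES]" if zombies else "")
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"status {status}: {n} job(s)" for status, n in counts]
    lines += [f"possible zombie: job {job_id}, started {started}" for job_id, started in zombies]
    msg.set_content("\n".join(lines))
    return msg
```

The message could be sent with `smtplib.SMTP(host).send_message(msg)`; a human still has to read it and decide whether a job is really stuck.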
Extending metadata
• Some metadata must be kept
  – e.g., process records
  – format details
• Some metadata can be changed
  – classification
  – motif description
• But keep old versions
  – institution history
  – legal liability
    • historical institutional/governmental racism
    • land rights
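Keeping old versions can be as simple as never overwriting a value. In this sketch every assignment is appended to a history of (timestamp, who, field, value), the current value is the last entry, and fields listed in `FIXED_FIELDS` may be set once only. The class name, the field names, and the choice of which fields are fixed are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

# Hypothetical choice: these must be kept as first recorded.
FIXED_FIELDS = {"process_record", "format"}


@dataclass
class MetadataRecord:
    # Append-only list of (timestamp, who, field, value); nothing is overwritten.
    history: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def set(self, name, value, who):
        if name in FIXED_FIELDS and self.versions(name):
            raise ValueError(f"{name!r} must be kept as recorded and cannot be changed")
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.history.append((stamp, who, name, value))

    def versions(self, name):
        """Every value the field has had, oldest first."""
        return [value for _, _, field_name, value in self.history if field_name == name]

    def current(self, name):
        values = self.versions(name)
        return values[-1] if values else None


record = MetadataRecord()
record.set("format", "image/tiff", who="conversion script")
record.set("classification", "landscape", who="cataloguer A")
record.set("classification", "settlement", who="cataloguer B")
print(record.current("classification"), record.versions("classification"))
```

Because the history also records who made each change and when, an old classification can later be explained as part of the institution's history instead of silently disappearing.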
Example system 1: Fedora
Based on the project's self-promotion: http://www.fedora-commons.org
• An open-source repository system
  – for the management and dissemination of digital content
  – robust
  – modular
• Especially suited for
  – digital libraries and archives
  – access as well as preservation
• Used to provide specialized access to
  – very large and complex digital collections
  – historic and cultural materials
  – scientific data
• User base that includes
  – academic and cultural heritage organizations
  – universities
  – research institutions
  – university libraries
  – national libraries
  – government agencies
• Long-term, flexible access to digital resources
  – can be used to support all types of digital content
  – digital collections
  – e-research
  – digital libraries and archives
  – digital preservation
  – institutional repositories
  – open access publishing
  – document management
  – digital asset management
Example system 2: Omeka
Based on the project's self-promotion: http://omeka.org
• A free, open-source digital publishing suite for
  – scholars
  – librarians
  – archivists
  – museum professionals
  – cultural enthusiasts
• to publish
  – archives
  – collections
  – exhibits
  – teaching materials
• Provides interaction for public audiences
• Extensible, scalable, and flexible
  – can handle large collections (> 1 million items)
  – element sets for institution-specific metadata may be added
  – the Zend Framework for PHP allows for customization
  – accepts and stores all types of files
    • images
    • video
    • audio
    • multi-page documents and PDFs
    • PowerPoint presentations
    • individual items may contain multiple files
    • etc.
  – extensible with dozens of available plugins
• Standards-based
  – metadata (tied to Dublin Core)
  – web design
• Interoperable
  – Dublin Core
  – Fedora Connect
• Data sharing
  – feeds
  – RDF
  – migration
• Re-purpose content
  – enter or import item metadata once
  – use items and metadata in multiple places across the website, including exhibits
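Metadata tied to Dublin Core is what makes this kind of sharing and re-use possible. As a rough illustration, the sketch below serialises an item as elements in the Dublin Core namespace using the standard `xml.etree.ElementTree`. The wrapper element `record`, the function name, and the example values are hypothetical; this is not Omeka's own export format, and the element names passed in should be taken from the fifteen Dublin Core elements (title, creator, subject, format, type, and so on).

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace
ET.register_namespace("dc", DC)


def dublin_core_record(item):
    """Serialise {element name: value or list of values} as dc:* XML elements."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in item.items():
        for value in (values if isinstance(values, list) else [values]):
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_record({
    "title": "Glass plate negative, harbour view",
    "type": "Image",
    "format": ["image/tiff", "image/jpeg"],
}))
```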
Towards abstract modelling
• How can this method be generalised?
  – some preliminary notes
• Learning strategies
• The role of theory
Figure: theory/modelling and implementation
What is an image?
• An image can be found
  – in a data file
  – on a 35mm film
  – on a paper positive
  – on a glass plate
  – …
• We do not care: they are all images
  – modelled as media_units
  – connected to media_groups
• Whether something is
  – another media_unit in an existing media_group, or
  – a derived media_group
• is a scholarly (content-based) choice
The transformation event
• Each transformation event happened in time
  – we may or may not know when
  – the actor(s) may be known or unknown
• Transfer events from
  – analogue to analogue
  – analogue to digital
  – digital to digital
  – (digital to analogue)
• are recorded in the same way
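Recording all four kinds of transfer "in the same way" suggests a single event type in which the time and the actors are optional. The sketch below is one way to express that; the class and field names are assumptions, and the example objects and actors are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Medium(Enum):
    ANALOGUE = "analogue"
    DIGITAL = "digital"


@dataclass
class TransformationEvent:
    source: str
    target: str
    source_medium: Medium
    target_medium: Medium
    when: Optional[str] = None  # None: we do not know when it happened
    actors: List[str] = field(default_factory=list)  # empty: actor(s) unknown

    @property
    def kind(self):
        return f"{self.source_medium.value} to {self.target_medium.value}"


A, D = Medium.ANALOGUE, Medium.DIGITAL
events = [
    TransformationEvent("glass plate", "paper positive", A, A),
    TransformationEvent("paper positive", "scan.tiff", A, D, actors=["scanner operator"]),
    TransformationEvent("scan.tiff", "scan.jpg", D, D, actors=["image processing server"]),
    TransformationEvent("scan.jpg", "exhibition print", D, A),
]
for e in events:
    print(f"{e.kind}: {e.source} -> {e.target}, when={e.when}, actors={e.actors or 'unknown'}")
```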
Chains of events
• Connected to an image there is a chain of events
  – like the passport of a person, with its stamps
• We can see where the image comes from
• We can step in at any point to redo processing
• Some images can be seen as caches
  – but the distinction between cached and not cached is not central
  – rather: some processes can be redone
  – but be aware of differences in detail caused by
    • program versions
    • libraries
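Reading the passport and stepping in to redo processing can be sketched as two small functions: `chain` walks back from a media_unit to its original recording, and `redo_warnings` lists the steps that would now be run with a different program version than the one recorded. The `Event` fields, the function names, and the version numbers in the example are assumptions for illustration; libraries could be tracked the same way as programs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    source: Optional[str]  # None for the original recording
    target: str
    program: str = ""
    version: str = ""


def chain(events: List[Event], unit: str) -> List[Event]:
    """Walk back from a media_unit to its origin; return the stamps oldest first."""
    by_target: Dict[str, Event] = {e.target: e for e in events}
    steps, seen = [], set()
    while unit in by_target and unit not in seen:  # `seen` guards against cycles
        seen.add(unit)
        event = by_target[unit]
        steps.append(event)
        if event.source is None:
            break
        unit = event.source
    return steps[::-1]


def redo_warnings(steps: List[Event], installed: Dict[str, str]) -> List[str]:
    """Steps that would be redone with a different program version than recorded."""
    return [f"{e.target}: made with {e.program} {e.version}, now {installed[e.program]}"
            for e in steps
            if e.program in installed and installed[e.program] != e.version]


events = [
    Event(None, "a", "camera"),
    Event("a", "b", "processing software", "1.0"),
    Event("b", "c", "Imagemagick", "6.2.9"),
    Event("c", "d", "Imagemagick", "6.2.9"),
]
steps = chain(events, "d")
print(" -> ".join(e.target for e in steps))
print(redo_warnings(steps, {"Imagemagick": "6.9.10"}))
```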
The memory of events
• Storing events is writing history
• This is obviously important for old stuff
  – museums try to track provenance
• But everything new will become old
• History is made by our scripts
  – we can record it or let it go
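Choosing to record history can be made the default in the scripts themselves. In this sketch, a decorator writes every run of a processing step down as an event: time, actor, step name, all arguments (defaults included), result, and the Python runtime. `EVENT_LOG` is a hypothetical stand-in for an events table, `make_small_jpeg` is a placeholder step that only computes a file name, and a real setup would also record the version of the conversion program and its libraries.

```python
import functools
import inspect
import platform
from datetime import datetime, timezone

EVENT_LOG = []  # hypothetical stand-in for an events table in the database


def recorded(actor):
    """Decorator that writes every run of a processing step down as an event."""
    def wrap(step):
        signature = inspect.signature(step)

        @functools.wraps(step)
        def run(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()  # record default settings too, not only those passed
            result = step(*args, **kwargs)
            EVENT_LOG.append({
                "when": datetime.now(timezone.utc).isoformat(timespec="seconds"),
                "actor": actor,
                "process": step.__name__,
                "arguments": dict(bound.arguments),
                "result": result,
                "runtime": f"{platform.python_implementation()} {platform.python_version()}",
            })
            return result
        return run
    return wrap


@recorded(actor="image processing server")
def make_small_jpeg(source, max_scale="640x480"):
    # Placeholder: a real step would call the conversion program here.
    return source.rsplit(".", 1)[0] + "_small.jpg"


make_small_jpeg("scan_17.tiff")
print(EVENT_LOG[-1]["process"], EVENT_LOG[-1]["arguments"])
```

Recording defaults matters because a later rerun with changed defaults would otherwise look identical in the log.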