Softwaretechnologie für Fortgeschrittene Teil Eide Stunde IV: Media server operations and abstraction (with contributions from Christian-Emil Ore, Jon Holmen, and other colleagues at the Unit for Digital Documentation, University of Oslo) Köln 17. Dezember 2015

File processing

• Parse processing XML
• Set up production line
  – any conversion path with possible conversions can be added
• Matrix of in and out formats and default scripts
  – can be overridden
• Scripts run in background
  – queue handling
  – load balancing
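
The matrix of in and out formats with overridable default scripts can be pictured as a lookup table keyed by format pair. This is only a sketch: the format names, script names, and the helpers `pick_script` and `plan_path` are assumptions made for illustration, not part of the system described in the slides.

```python
from typing import Dict, List, Optional, Tuple

FormatPair = Tuple[str, str]  # (input format, output format)

# Hypothetical default scripts, one per supported conversion.
DEFAULT_SCRIPTS: Dict[FormatPair, str] = {
    ("raw", "tiff"): "raw_to_tiff.sh",
    ("tiff", "jpg"): "tiff_to_jpg.sh",
    ("jpg", "jpg"): "rescale_jpg.sh",
}


def pick_script(src: str, dst: str,
                overrides: Optional[Dict[FormatPair, str]] = None) -> str:
    """Return the override for (src, dst) if one exists, else the default."""
    pair = (src, dst)
    if overrides and pair in overrides:
        return overrides[pair]
    if pair not in DEFAULT_SCRIPTS:
        raise KeyError(f"no conversion registered for {src} -> {dst}")
    return DEFAULT_SCRIPTS[pair]


def plan_path(formats: List[str],
              overrides: Optional[Dict[FormatPair, str]] = None) -> List[str]:
    """Turn a conversion path such as raw -> tiff -> jpg into a list of scripts."""
    return [pick_script(a, b, overrides) for a, b in zip(formats, formats[1:])]
```

Adding a new conversion path then only means adding entries to the table; a collection with special needs passes its own `overrides` instead of editing the defaults.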

Remember work flow

• Recording a (camera) → media_unit a (original image, raw format)
• Recording b (processing software) → media_unit b (large tiff)
• Recording c (image processing server) → media_unit c (large jpeg)
• Recording d (image processing server) → media_unit d (small jpeg)

One information object for each original recording (Work) (media_group)
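
As a rough illustration, the work flow can be written down as a small data model in which one media_group collects the media_units and each media_unit remembers the recording that produced it. The class and field names below are assumptions chosen to mirror the slide; the group title is invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recording:
    name: str      # e.g. "Recording a"
    producer: str  # camera, processing software, image processing server


@dataclass
class MediaUnit:
    name: str         # e.g. "media_unit a"
    description: str  # e.g. "original image, raw format"
    recording: Recording


@dataclass
class MediaGroup:
    """One information object (Work) per original recording."""
    title: str
    units: List[MediaUnit] = field(default_factory=list)

    def add(self, name: str, description: str,
            recording_name: str, producer: str) -> MediaUnit:
        unit = MediaUnit(name, description, Recording(recording_name, producer))
        self.units.append(unit)
        return unit


def build_example_group() -> MediaGroup:
    """Rebuild the four units from the slide inside one media_group."""
    group = MediaGroup("Example photograph")  # hypothetical title
    group.add("media_unit a", "original image, raw format", "Recording a", "camera")
    group.add("media_unit b", "large tiff", "Recording b", "processing software")
    group.add("media_unit c", "large jpeg", "Recording c", "image processing server")
    group.add("media_unit d", "small jpeg", "Recording d", "image processing server")
    return group
```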

File processing xml (example)

<IMAGE type="digital_kopi" method="convert" format="jpg" subpath="jpeg"
       rec_process="3" software="Imagemagick 6.2.9"
       settings="fileFormat JFIF contentFormat 24BITRGB">
  <IMAGE type="digital_kopi" method="convert" format="jpg" width="640" height="480"
         subpath="small" rec_process="3" software="Imagemagick 6.2.9"
         settings="maxScale 640x480 fileFormat JFIF contentFormat 24BITRGB" default="1">
  </IMAGE>
</IMAGE>
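
A minimal sketch of the "parse processing XML" step, using Python's standard `xml.etree.ElementTree`. It assumes that a nested IMAGE element describes a copy derived from the output of its parent element; that reading, the function name `list_conversions`, and the keys of the returned dictionaries are assumptions, not documented behaviour of the original system.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

PROCESSING_XML = """
<IMAGE type="digital_kopi" method="convert" format="jpg" subpath="jpeg"
       rec_process="3" software="Imagemagick 6.2.9"
       settings="fileFormat JFIF contentFormat 24BITRGB">
  <IMAGE type="digital_kopi" method="convert" format="jpg" width="640" height="480"
         subpath="small" rec_process="3" software="Imagemagick 6.2.9"
         settings="maxScale 640x480 fileFormat JFIF contentFormat 24BITRGB" default="1">
  </IMAGE>
</IMAGE>
"""


def list_conversions(xml_text: str) -> List[dict]:
    """Flatten the nested IMAGE elements into an ordered list of conversion steps."""
    steps: List[dict] = []

    def walk(element: ET.Element, source: Optional[str]) -> None:
        width, height = element.get("width"), element.get("height")
        steps.append({
            "subpath": element.get("subpath"),
            "format": element.get("format"),
            "source": source,  # None means: convert from the uploaded file (assumption)
            "size": (int(width), int(height)) if width and height else None,
            "default": element.get("default") == "1",
        })
        # Assumed meaning of nesting: children are derived from this element's output.
        for child in element.findall("IMAGE"):
            walk(child, element.get("subpath"))

    walk(ET.fromstring(xml_text.strip()), None)
    return steps
```

Each returned step could then be handed to the production line as one conversion job.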

File processing script

Process status:
• 0: not converted
• 1: under conversion
• 2: converted
• 9: conversion error

Outer structure producing conversion jobs:

while (!kill_signal)
    fork out new process
    wait 10 seconds
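
The outer loop could look roughly like this in Python. The sketch replaces the kill signal with a `threading.Event` and takes the "fork out new process" step as a callable, so that it can be tried without starting real processes; both choices, and the name `run_producer`, are assumptions.

```python
import threading
from typing import Callable


def run_producer(spawn_job: Callable[[], None],
                 stop: threading.Event,
                 interval_s: float = 10.0) -> int:
    """Start one job per interval until `stop` is set; return the number started."""
    launched = 0
    while not stop.is_set():
        # In production this could start a multiprocessing.Process or os.fork().
        spawn_job()
        launched += 1
        # Event.wait sleeps for the interval but returns early once stop is set.
        stop.wait(interval_s)
    return launched
```

A signal handler for SIGTERM would simply call `stop.set()`, which lets the loop finish its current round and exit cleanly.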

File processing script

Each forked-out job:

dba:connect
int i = dba:get_current_conversion_count (status 1)
if (i > 10) die
string proc_str = dba:get_proc_string with status 0
dba:set_proc_string_status(proc_str) to 1
boolean OK = true
while (!end(proc_str))
    interpret(proc_str)
    string comm = pick_command(proc_str)
    int ret_val = system(comm)
    if (ret_val == 0)
        dba:record_new_file_name
    else
        OK = false
if (OK == true)
    dba:set_proc_string_status(proc_str) to 2
else
    dba:set_proc_string_status(proc_str) to 9
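
A runnable approximation of one forked-out job, using `sqlite3` from the standard library as a stand-in for the `dba` layer. The `jobs` table, its columns, and the function names are hypothetical; recording the new file name is left out, and claiming a job is not made safe against two workers picking the same row.

```python
import shlex
import sqlite3
import subprocess
from typing import Callable, List, Optional

NOT_CONVERTED, UNDER_CONVERSION, CONVERTED, CONVERSION_ERROR = 0, 1, 2, 9
MAX_RUNNING = 10  # the limit used in the pseudocode above


def create_job_table(db: sqlite3.Connection) -> None:
    """Hypothetical schema: one row per processing string, one command per line."""
    db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, commands TEXT, status INTEGER)")


def run_one_job(db: sqlite3.Connection,
                run: Callable[[List[str]], int] = subprocess.call) -> Optional[int]:
    """Process one waiting job; return its final status, or None if nothing was done."""
    running = db.execute("SELECT COUNT(*) FROM jobs WHERE status = ?",
                         (UNDER_CONVERSION,)).fetchone()[0]
    if running > MAX_RUNNING:
        return None  # too many conversions already under way
    row = db.execute("SELECT id, commands FROM jobs WHERE status = ? ORDER BY id LIMIT 1",
                     (NOT_CONVERTED,)).fetchone()
    if row is None:
        return None  # nothing waiting
    job_id, commands = row
    db.execute("UPDATE jobs SET status = ? WHERE id = ?", (UNDER_CONVERSION, job_id))
    db.commit()

    ok = True
    for line in commands.splitlines():
        if run(shlex.split(line)) != 0:
            ok = False  # like the pseudocode, keep going but remember the failure

    final_status = CONVERTED if ok else CONVERSION_ERROR
    db.execute("UPDATE jobs SET status = ? WHERE id = ?", (final_status, job_id))
    db.commit()
    return final_status
```

Passing the command runner as a parameter makes it possible to exercise the status handling with a fake runner before any real conversion tool is involved.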

Archiving script

• This script is run every night (cron job on Unix)
• Files for archiving are files where
  – archive date is NULL
  – error_field is NULL

Remember long term preservation

[Diagram. Labels: tape station, disk RAID, Digital Original (three instances), Other copies, Tape duplicates]

Archiving script

data_list = dba:get_files_for_archiving
foreach (data_line from data_list)
    if (data_line:file exist)
        dba:write("Archive command with timestamp")
        int ret_val = archive(file)
        if (ret_val == "Error")
            dba:write("Archive error")
        else
            ret_val = query_archived_file
            int file_size = ret_val:filesize
            if (ret_val == "Archived > 1")
                dba:write("Archive warning: multi")
                call delete_test(file_size)
            else if (ret_val == "Archived = 1")
                dba:write("Archiving OK")
                call delete_test(file_size)
            else
                dba:write("Archived but not found error")
    else
        dba:write("File not found error")
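
The branching of the archiving loop can be tried out with a small Python sketch. The archive back end is represented by two callables, `archive` and `count_copies`, which are placeholders for whatever the tape system offers; the log is a plain list instead of database writes, and the call to delete_test is omitted.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def archive_files(paths: Iterable[str],
                  archive: Callable[[str], bool],
                  count_copies: Callable[[str], int]) -> List[Tuple[str, str]]:
    """Archive each file and return (path, message) pairs mirroring the pseudocode."""
    log: List[Tuple[str, str]] = []
    for path in paths:
        if not Path(path).exists():
            log.append((path, "File not found error"))
            continue
        log.append((path, "Archive command issued"))
        if not archive(path):
            log.append((path, "Archive error"))
            continue
        # Ask the archive how many copies of the file it now holds.
        copies = count_copies(path)
        if copies > 1:
            log.append((path, "Archive warning: multi"))
        elif copies == 1:
            log.append((path, "Archiving OK"))
        else:
            log.append((path, "Archived but not found error"))
    return log
```

Querying the archive after the archive command, rather than trusting its return value alone, is the point of the original design: a file only counts as archived when it can be found again.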

Archiving scripts: delete files

delete_test:

file_size_archiving = arg
file_size_limit = dba:ask_limit_for_schema
if (file_size_archiving > file_size_limit)
    dba:write(delete: yes)

delete:

file_list = dba:files_where(delete="yes", deleted="NULL")
foreach (file in file_list)
    int ret_val = system(delete file)
    if (ret_val == 0)
        dba:write("Deleted date_time")
    else
        dba:write("Delete error")
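
The two steps, marking and deleting, might be sketched as follows. File records are plain dictionaries here instead of database rows, and the key names (`archived_size`, `delete`, `deleted`, `error`) are assumptions made for the example.

```python
import os
from datetime import datetime, timezone
from typing import Callable, Dict, List


def mark_for_deletion(record: Dict, size_limit: int) -> None:
    """delete_test: flag the local copy once the archived size exceeds the limit."""
    if record["archived_size"] > size_limit:
        record["delete"] = "yes"


def delete_marked(records: List[Dict],
                  remove: Callable[[str], None] = os.remove) -> None:
    """delete: remove every flagged file that has not been deleted yet."""
    for record in records:
        if record.get("delete") != "yes" or record.get("deleted") is not None:
            continue
        try:
            remove(record["path"])
        except OSError:
            record["error"] = "Delete error"
        else:
            # Store when the file was deleted, as the pseudocode does.
            record["deleted"] = datetime.now(timezone.utc).isoformat()
```

Keeping the decision (`mark_for_deletion`) apart from the action (`delete_marked`) means a human can inspect what is flagged before anything disappears from disk.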

Monitoring

• Zombie jobs
• Server load
• Memory consumption
  – leakage
  – fine on a PC, but with uptime > 100 days…
• Database
  – error messages
  – instability
  – abnormal behaviour
• Must be monitored by humans (using tools)
  – email messages with control data
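
One possible shape for the "email messages with control data" is sketched below: find jobs that have been under conversion for suspiciously long and put them, together with a load figure, into a message built with the standard `email.message.EmailMessage`. The one-hour threshold, the job names, and the report layout are invented for the example, and actually sending the message is left out.

```python
from datetime import datetime, timedelta
from email.message import EmailMessage
from typing import Dict, List


def find_stale_jobs(started_at: Dict[str, datetime],
                    now: datetime,
                    max_age: timedelta) -> List[str]:
    """Return the jobs that have been running longer than max_age (possible zombies)."""
    return sorted(job for job, started in started_at.items() if now - started > max_age)


def build_report(stale_jobs: List[str], load_1min: float, recipient: str) -> EmailMessage:
    """Collect the control data in one message for a human to read."""
    message = EmailMessage()
    message["To"] = recipient
    message["Subject"] = f"Media server control data: {len(stale_jobs)} stale job(s)"
    lines = [f"Server load (1 min): {load_1min:.2f}", "Stale jobs:"]
    lines += [f"  {job}" for job in stale_jobs] or ["  none"]
    message.set_content("\n".join(lines))
    return message
```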

Extending metadata

• Some metadata must be kept
  – e.g., process records
  – format details
• Some metadata can be changed
  – classification
  – motif description
• But keep old versions
  – institution history
  – legal liability
    • historical institutional/governmental racism
    • land rights
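
The three rules (some fields are fixed, some may change, old versions are kept) can be illustrated with an append-only history. The field names in `PROTECTED_FIELDS` and the layout of a history entry are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# Hypothetical names for metadata that must be kept unchanged.
PROTECTED_FIELDS = frozenset({"process_record", "format_details"})

# (timestamp, field name, previous value, editor)
HistoryEntry = Tuple[str, str, Optional[str], str]


@dataclass
class MetadataRecord:
    values: Dict[str, str]
    history: List[HistoryEntry] = field(default_factory=list)

    def change(self, name: str, new_value: str, editor: str) -> None:
        """Change an editable field and keep the previous value in the history."""
        if name in PROTECTED_FIELDS:
            raise PermissionError(f"{name} must be kept unchanged")
        timestamp = datetime.now(timezone.utc).isoformat()
        self.history.append((timestamp, name, self.values.get(name), editor))
        self.values[name] = new_value
```

Because entries are only ever appended, an outdated classification remains available as evidence of what the institution once recorded.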

Example system 1: Fedora

Based on self-promotion: http://www.fedora-commons.org

• An open source repository system
  – for the management and dissemination of digital content
  – robust
  – modular
• Especially suited for
  – digital libraries and archives
  – access as well as preservation
• Used to provide specialized access to
  – very large and complex digital collections
  – historic and cultural materials
  – scientific data

Example system 1: Fedora

• User base that includes
  – academic and cultural heritage organizations
  – universities
  – research institutions
  – university libraries
  – national libraries
  – government agencies

Example system 1: Fedora

• Long-term flexible access to digital resources
  – can be used to support all types of digital content
  – digital collections
  – e-research
  – digital libraries and archives
  – digital preservation
  – institutional repositories
  – open access publishing
  – document management
  – digital asset management

Example system 2: Omeka

Based on self-promotion: http://omeka.org

• A free, open-source, digital publishing suite for
  – scholars
  – librarians
  – archivists
  – museum professionals
  – cultural enthusiasts
• to publish
  – archives
  – collections
  – exhibits
  – teaching materials
• Provide interaction for public audiences

Example system 2: Omeka

• Extensible, scalable, and flexible
  – can handle large collections (>1 million items)
  – element sets for institution-specific metadata may be added
  – Zend framework for PHP allows for customization
  – accepts and stores all types of files
    • images
    • video
    • audio
    • multi-page documents and PDFs
    • PowerPoint presentations
    • individual items may contain multiple files
    • etc.
  – extensible with dozens of available plugins

Example system 2: Omeka

• Standards-based
  – metadata (tied to Dublin Core)
  – web design
• Interoperable
  – Dublin Core
  – Fedora Connect
• Data sharing
  – feeds
  – RDF
  – migration
• Re-purpose content
  – enter or import item metadata once
  – use items and metadata in multiple instances across website, including exhibits

Towards abstract modelling

• How can this method be generalised?
  – Some preliminary notes
• Learning strategies
• The role of theory

[Diagram. Labels: theory/modelling, implementation]

What is an image?

• An image can be found
  – in a data file
  – on a 35mm film
  – on a paper positive
  – on a glass plate
  – …
• We do not care, they are all images
  – modelled as media_units
  – connected to media_groups
• Whether something is
  – another media_unit in an existing media_group, or
  – a derived media_group
• is a scholarly (content-based) choice

The transformation event

• Each transformation event happened in time
  – we may or may not know when
  – actor(s) may be known or unknown
• Transfer events from
  – analogue to analogue
  – analogue to digital
  – digital to digital
  – (digital to analogue)
• are recorded in the same way

Chains of events

• Connected to an image there is a chain of events
  – like the passport of a person with stamps
• Can see where the image comes from
• Can step in at any point to re-do processing
• Some images can be seen as caches
  – but the distinction between cached and not is not central
  – rather: some processes can be re-done
  – but be aware of detail differences
    • program versions
    • libraries
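
The passport idea can be made concrete by storing one event per produced image and walking the events backwards. In this sketch the derivation order a → b → c → d is an assumption for the example, as are the class and function names; unknown dates and actors are simply left as `None`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TransformationEvent:
    source: Optional[str]        # the unit this one was made from; None for an original
    result: str                  # the unit the event produced
    process: str                 # who or what carried out the transformation
    when: Optional[str] = None   # may be unknown
    actor: Optional[str] = None  # may be unknown


def example_events() -> Dict[str, TransformationEvent]:
    """Events keyed by the unit they produced (assumed derivation order)."""
    events = [
        TransformationEvent(None, "media_unit a", "camera"),
        TransformationEvent("media_unit a", "media_unit b", "processing software"),
        TransformationEvent("media_unit b", "media_unit c", "image processing server"),
        TransformationEvent("media_unit c", "media_unit d", "image processing server"),
    ]
    return {event.result: event for event in events}


def chain_for(unit: str, events: Dict[str, TransformationEvent]) -> List[TransformationEvent]:
    """Return the events leading to `unit`, oldest first."""
    chain: List[TransformationEvent] = []
    seen = set()
    current: Optional[str] = unit
    while current is not None and current in events and current not in seen:
        seen.add(current)  # guards against accidental cycles in the data
        chain.append(events[current])
        current = events[current].source
    chain.reverse()
    return chain
```

To re-do processing from some point, one would take the chain, pick the event to step in at, and re-run that event and the later ones, keeping in mind that program versions and libraries may have changed since.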

The memory of events

• Storing events: writing history
• This is obviously important for old stuff
  – museums try to track provenience
• But everything new will become old
• History is made by our scripts
  – we can record it or let it go