Benchmarks for Digital Preservation Tools
Kresimir Duretec, Artur Kulmukhametov, Andreas Rauber and Christoph Becker
Vienna University of Technology and University of Toronto
IPRES 2015, Chapel Hill, North Carolina, USA, November 3, 2015
Overview

1. Need for evidence in digital preservation
   • software tools in digital preservation and evidence around their quality
2. Software benchmarks in digital preservation
   • what, why, how
   • software quality
3. Benchmark proposals
   • photo migration
   • property extraction from documents
4. Are we ready? Next steps

https://goo.gl/ps6x8I (in case you get bored)
Evidence in digital preservation

Important topic!
Software tools in digital preservation

• many tools (>400 tools in the COPTR registry, http://coptr.digipres.org), e.g. DROID
• categories:
  • identification, validation & characterisation
  • migration
  • emulation
  • …
• high quality required
However!

Blog post at OPF by Paul, 19 October 2012:
http://openpreservation.org/blog/2012/10/19/practitioners-have-spoken-we-need-better-characterisation/
However!

Blog post at OPF by Johan, 31 January 2014:
http://openpreservation.org/blog/2014/01/31/why-cant-we-have-digital-preservation-tools-just-work/
However!

Panel at IPRES 2014
Evidence of software quality in digital preservation

• the need for evaluation is not new!
  • testbeds and numerous individual evaluation initiatives
• often, take-up is missing

[Timeline, 2002-2010: Dutch Digital Preservation Testbed, DELOS Digital Preservation Testbed, Planets Testbed, and now: ?]
Software benchmarks

• proposal: start creating software benchmarks to extend the evidence base around the software quality of digital preservation tools
• major characteristics: collaborative, open, public
Software benchmarks - definition

SPEC glossary: "a 'benchmark' is a test, or set of tests, designed to compare the performance of one computer system against the performance of others" (key words: test, compare, performance)

IEEE Glossary: "1. a standard against which measurements or comparisons can be made. 2. a procedure, problem, or test that can be used to compare systems or components to each other or to a standard. 3. a recovery file" (key words: measurements, comparison, test)

W. Tichy: "a benchmark is a task domain sample executed by a computer or by a human and computer. During execution, the human or computer records well-defined performance measurements. A benchmark provides a level playing field for competing ideas, and (assuming the benchmark is sufficiently representative) allows repeatable and objective comparisons" (key words: task domain sample, performance measurements, repeatable objective comparison)

Sim et al.: "a test or set of tests used to compare the performance of alternative tools or techniques" (key words: test, compare)
A DP software benchmark is a set of tests that enables rigorous comparison of software solutions for a specific task in digital preservation.
Benchmarks in other fields

• Software engineering
  • transactions
  • virtualisation
  • databases
  • …
• Information retrieval
  • text
  • video
  • music
  • …
Benchmarks in other fields

[Diagram: iterative process - the community identifies tools & methods that need benchmarks, produces specifications, meets yearly, and feeds results back into the next iteration]
Benchmark components

Software engineering:
• motivating comparison
• task sample
• performance measures
Benchmark components

Information retrieval:
• tasks
• datasets
• answersets
• measures
• representation formats
Benchmark components

Software engineering: motivating comparison, task sample, performance measures
Information retrieval: tasks, datasets, answersets, measures, representation formats
Digital preservation: motivating comparison, function, dataset, ground truth, performance measures
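The five digital preservation components listed above can be sketched as a simple record type. This is only an illustrative shape: the `DPBenchmark` name, field types, and the example instance are my assumptions, not a schema from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class DPBenchmark:
    """One digital preservation benchmark: one field per component above."""
    motivating_comparison: str                      # what is compared, and why
    function: str                                   # the specific task under test
    dataset: List[str]                              # digital objects the tools run on
    ground_truth: Optional[dict] = None             # correct answers (optional element)
    performance_measures: Dict[str, Callable] = field(default_factory=dict)

# Hypothetical instance for a migration task; the scorer is a placeholder.
raw_migration = DPBenchmark(
    motivating_comparison="compare migration tools on output fidelity",
    function="migrate from RAW to DNG",
    dataset=["photo1.raw", "photo2.raw"],
    performance_measures={"ssim": lambda original, migrated: 1.0},
)
print(raw_migration.ground_truth)  # None: ground truth is an optional element
```

Making ground truth default to `None` mirrors the slides' point that it is optional, while performance measures are kept as named scoring functions so several can be attached to one benchmark.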
Motivating comparison

• What is the benchmark supposed to compare?
• Which quality aspects are addressed?
• What would the benefits be?
Motivating comparison

• software quality has many aspects
• we need a model: ISO/IEC SQuaRE
Function

• the specific task the tool is expected to perform
Dataset

• a set of digital objects (images, audio, video, …)
• should be relevant, accurate, and complete
Ground truth

• the correct answers
• an optional element
• machine readable
Performance measures

• fitness of the benchmarked tool
• type: quantitative or qualitative
• calculated by human or by machine
RAW photo migration benchmark

• Motivating comparison: various RAW file formats; migration to DNG; many tools should be tested
• Function: migrate from RAW to DNG
• Dataset: a collection of photographs in RAW format
• Performance measure: SSIM
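SSIM here refers to the structural similarity index of the decoded images before and after migration. A real benchmark would use a windowed library implementation (e.g. scikit-image's `structural_similarity`); purely as an illustrative sketch, a single-window "global" SSIM over flattened grayscale pixel values looks like this:

```python
from statistics import fmean

def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM of two equally sized grayscale pixel sequences.

    A simplification of the windowed SSIM a real benchmark would use;
    the constants follow the common choices K1=0.01, K2=0.03.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean([(p - mx) * (p - mx) for p in x])            # variance of x
    vy = fmean([(q - my) * (q - my) for q in y])            # variance of y
    cov = fmean([(p - mx) * (q - my) for p, q in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

pixels = [10, 50, 90, 130, 170, 210]
print(global_ssim(pixels, pixels))  # identical images score 1.0
```

A lossless migration should score at (or very near) 1.0, while any visible degradation pulls the score down, which is what makes SSIM usable as a benchmark measure here.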
Property values extraction benchmark

• Motivating comparison: different document properties; coverage of needed properties; correctness of extracted values
• Function: extract property values from a document
• Dataset: a collection of documents in the MS Word 2007 file format
• Ground truth: the correct values
• Performance measures: coverage, accuracy, exact match ratio
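The slide names the three measures but does not define them. A minimal sketch of plausible definitions, assuming extracted properties and ground truth are dictionaries per document (the function names and exact semantics are my assumptions, not the authors'):

```python
def coverage(extracted, needed):
    """Fraction of needed properties for which the tool reported any value."""
    return len(needed & extracted.keys()) / len(needed)

def accuracy(extracted, truth):
    """Fraction of reported values that exactly match the ground truth."""
    if not extracted:
        return 0.0
    return sum(truth.get(k) == v for k, v in extracted.items()) / len(extracted)

def exact_match_ratio(per_doc_results, per_doc_truth):
    """Fraction of documents whose extracted properties all match the truth."""
    pairs = list(zip(per_doc_results, per_doc_truth))
    return sum(r == t for r, t in pairs) / len(pairs)

truth = {"pages": 12, "author": "A. N. Author", "title": "Report"}
reported = {"pages": 12, "author": "A. N. Author"}
print(coverage(reported, set(truth)))  # 2 of 3 needed properties -> 0.666...
print(accuracy(reported, truth))       # both reported values correct -> 1.0
```

The split matters: a tool can score high on accuracy while covering few properties, so coverage and the per-document exact match ratio catch failure modes that accuracy alone would hide.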
Benchmarking - a community-driven process

• benchmarks emerge from community consensus
  • community participation needs to be enabled
• based on established results where possible
• the community needs to incur the costs
  • continuous evolution of benchmarks is necessary
• Are we ready?
Is the community ready? Indicators

• projects
• workshops
• own conference, plus several other journals and conferences
• concerns: vocabularies, metrics
Conclusion
Next step

Create benchmark specifications.

Benchmarking forum workshop: Friday, 6 November, 9:00-17:00
Questions?
https://goo.gl/ps6x8I
In case you were not bored