Benchmarks for Digital Preservation Tools. Kresimir Duretec, Artur Kulmukhametov, Andreas Rauber and Christoph Becker


Transcript of Benchmarks for Digital Preservation Tools. Kresimir Duretec, Artur Kulmukhametov, Andreas Rauber and Christoph Becker

Page 1: Benchmarks for Digital Preservation Tools

Benchmarks for Digital Preservation Tools

Kresimir Duretec, Artur Kulmukhametov, Andreas Rauber and Christoph Becker

Vienna University of Technology and University of Toronto

IPRES 2015, Chapel Hill, North Carolina, USA, November 3, 2015

Page 2: Overview

1. Need for evidence in digital preservation
   • software tools in digital preservation and evidence around their quality
2. Software benchmarks in digital preservation
   • what, why, how
   • software quality
3. Benchmark proposals
   • photo migration
   • property extraction from documents
4. Are we ready? Next steps

https://goo.gl/ps6x8I (in case you get bored)

Page 3: Overview (repeat of Page 2)

Page 4: Evidence in digital preservation

Important topic!

Page 5: Evidence in digital preservation (repeat of Page 4)

Page 6: Software tools in digital preservation

• many tools (>400 in the COPTR registry, http://coptr.digipres.org)
• categories:
  • identification, validation & characterisation
  • migration
  • emulation
  • …
• high quality required

[Slide image: screenshot of the DROID tool]

Page 7: However!!!

Blog post at OPF by Paul, 19 October 2012: http://openpreservation.org/blog/2012/10/19/practitioners-have-spoken-we-need-better-characterisation/

Page 8: However!!!

Blog post at OPF by Johan, 31 January 2014: http://openpreservation.org/blog/2014/01/31/why-cant-we-have-digital-preservation-tools-just-work/

Page 9: However!!!

Panel at IPRES 2014

Page 10: Evidence of software quality in digital preservation

• the need for evaluation is not new: testbeds and numerous individual evaluation initiatives
• yet take-up is often missing

[Timeline, 2002-2010: Dutch Digital Preservation Testbed, DELOS Digital Preservation Testbed, Planets Testbed, ?]

Page 11: Overview (repeat of Page 2)

Page 12: Software benchmarks

• Proposal: start creating software benchmarks to extend the evidence base around the software quality of digital preservation tools
• Major characteristics: collaborative, open, public

Page 13: Software benchmarks - definition

Source: SPEC glossary
Definition: “a "benchmark" is a test, or set of tests, designed to compare the performance of one computer system against the performance of others”
Key words: test, compare, performance

Source: IEEE Glossary
Definition: “1. a standard against which measurements or comparisons can be made. 2. a procedure, problem, or test that can be used to compare systems or components to each other or to a standard. 3. a recovery file”
Key words: measurements, comparison, test

Source: W. Tichy
Definition: “a benchmark is a task domain sample executed by a computer or by a human and computer. During execution, the human or computer records well-defined performance measurements. A benchmark provides a level playing field for competing ideas, and (assuming the benchmark is sufficiently representative) allows repeatable and objective comparisons”
Key words: task domain sample, performance measurements, repeatable objective comparison

Source: Sim et al.
Definition: “a test or set of tests used to compare the performance of alternative tools or techniques”
Key words: test, compare

Page 14: Software benchmarks - definition (definition table repeated from Page 13)

A DP software benchmark is a set of tests which enable rigorous comparison of software solutions for a specific task in digital preservation.

Page 15: Benchmarks in other fields

• Software engineering: transactions, virtualisation, databases, …
• Information retrieval: text, video, music, …

Page 16: Benchmarks in other fields

[Diagram: an iterative process linking the community, benchmark specifications, tools & methods that need benchmarks, results, and a yearly meeting]

Page 17: Benchmark components

Software engineering:
• Motivating comparison
• Task sample
• Performance measures

Page 18: Benchmark components

Information retrieval:
• Tasks
• Datasets
• Answer sets
• Measures
• Representation formats

Page 19: Benchmark components

Software engineering: motivating comparison, task sample, performance measures
Information retrieval: tasks, datasets, answer sets, measures, representation formats
Digital preservation: motivating comparison, function, dataset, ground truth, performance measures (see the sketch below)
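
The digital preservation components listed above can be captured in a small data structure. Below is a minimal, hypothetical sketch in Python: the field names follow the slide, while the class itself, the types, and the example values are illustrative and not part of the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class DPBenchmark:
    """Illustrative container for the five benchmark components named on this slide."""
    motivating_comparison: str                      # what is compared, and why it matters
    function: str                                   # the specific task the tool must perform
    dataset: List[str]                              # identifiers of the digital objects to process
    ground_truth: Optional[Dict[str, dict]] = None  # correct answers per object (optional component)
    performance_measures: Dict[str, Callable] = field(default_factory=dict)  # name -> scoring function

# Hypothetical instantiation, loosely following the RAW photo migration proposal:
raw_migration = DPBenchmark(
    motivating_comparison="compare tools that migrate various RAW formats to DNG",
    function="migrate RAW to DNG",
    dataset=["photo_001.nef", "photo_002.cr2"],                 # placeholder object names
    performance_measures={"ssim": lambda before, after: 1.0},   # stub; a real scorer goes here
)
```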

Page 20: Motivating comparison


• What is the benchmark supposed to compare?
• Which quality aspects are addressed?
• What would be the benefits?

Page 21: Motivating comparison


• software quality has many aspects
• we need a model: ISO SQuaRE

Page 22: Function


• the specific task the tool is expected to perform

Page 23: Dataset


• a set of digital objects (images, audio, video, …)
• should be relevant, accurate and complete

Page 24: Ground truth


• the correct answers
• an optional element
• machine readable (see the example below)
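
As a concrete illustration of "machine readable", a ground-truth record could simply be serialised as JSON, one entry per object in the dataset. The file name and property values below are hypothetical, not taken from the paper.

```python
import json

# Hypothetical ground truth for one document: the values a correct
# property-extraction tool would be expected to return.
ground_truth = {
    "report.docx": {
        "page_count": 12,
        "author": "Jane Doe",
        "creation_date": "2014-05-02",
    }
}

# Writing it out as JSON keeps the ground truth machine readable and easy to diff.
print(json.dumps(ground_truth, indent=2))
```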

Page 25: Performance measures


• express the fitness of the benchmarked tool
• type: quantitative or qualitative
• calculated by human or by machine

Page 26: Overview (repeat of Page 2)

Page 27: RAW photo migration benchmark

• Motivating comparison: various RAW file formats; migration to DNG; many tools should be tested
• Function: migrate from RAW to DNG
• Dataset: a collection of photographs in RAW format
• Performance measure: SSIM (see the sketch below)
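
A minimal sketch of how the SSIM measure could be computed for this benchmark, assuming the rawpy and scikit-image libraries are available: both the original RAW file and the migrated DNG are rendered to RGB with identical default settings and then compared. The rendering pipeline and file names are assumptions, not something the paper prescribes.

```python
import rawpy                                         # decodes RAW and DNG files
from skimage.metrics import structural_similarity    # SSIM implementation

def render(path):
    """Decode a RAW/DNG file to an 8-bit RGB array using rawpy's default settings."""
    with rawpy.imread(path) as raw:
        return raw.postprocess()

def migration_ssim(original_path, migrated_path):
    """Render original and migrated files identically and compare with SSIM (1.0 = identical)."""
    before = render(original_path)
    after = render(migrated_path)
    return structural_similarity(before, after, channel_axis=-1)

# Usage (hypothetical file names):
# score = migration_ssim("photo_001.nef", "photo_001.dng")
```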

Page 28: Property values extraction benchmark

• Motivating comparison: different document properties; coverage of needed properties; correctness of extracted values
• Function: extract property values from a document
• Dataset: a collection of documents in the MS Word 2007 file format
• Ground truth: the correct values
• Performance measures: coverage, accuracy, exact match ratio (see the sketch below)
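
The transcript names coverage, accuracy and an exact match ratio but does not define them, so the sketch below is one plausible reading: coverage as the share of ground-truth properties the tool reports at all, accuracy as the share of reported properties whose values match exactly, and exact match ratio as the share of documents where every property matches. The function names and the tiny example are assumptions.

```python
def coverage(extracted, truth):
    """Share of ground-truth (document, property) pairs for which the tool reported any value."""
    wanted = {(doc, prop) for doc, props in truth.items() for prop in props}
    reported = {(doc, prop) for doc, props in extracted.items() for prop in props}
    return len(wanted & reported) / len(wanted)

def accuracy(extracted, truth):
    """Share of reported ground-truth properties whose value matches the ground truth exactly."""
    pairs = [(doc, prop) for doc, props in truth.items() for prop in props
             if prop in extracted.get(doc, {})]
    if not pairs:
        return 0.0
    correct = sum(extracted[doc][prop] == truth[doc][prop] for doc, prop in pairs)
    return correct / len(pairs)

def exact_match_ratio(extracted, truth):
    """Share of documents for which every ground-truth property was extracted correctly."""
    full_matches = sum(
        all(extracted.get(doc, {}).get(prop) == value for prop, value in props.items())
        for doc, props in truth.items()
    )
    return full_matches / len(truth)

# Tiny hypothetical example: one document, one of two properties extracted correctly.
truth = {"a.docx": {"pages": 3, "author": "Jane"}}
got = {"a.docx": {"pages": 3}}
print(coverage(got, truth), accuracy(got, truth), exact_match_ratio(got, truth))  # 0.5 1.0 0.0
```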

Page 29: Overview (repeat of Page 2)

Page 30: Benchmarking - a community-driven process

• benchmarks emerge from community consensus
• community participation needs to be enabled
• based on established results where possible
• the community needs to incur the costs
• continuous evolution of benchmarks is necessary
• Are we ready?

Page 31: Is the community ready? Indicators

• projects
• workshops
• its own conference, and several other journals and conferences
• concerns
• vocabularies
• metrics

Page 32: Conclusion

Page 33: Next step

Create benchmark specifications.

Benchmarking Forum workshop: Friday, 6 November, 9:00-17:00

Page 34: Questions?

https://goo.gl/ps6x8I (in case you were not bored)