Using JHOVE2 for Policy Assessment of Files

Richard Anderson
Code4LibCon Preconference

2/7/2011

http://code4lib.org/conference/2011/schedule#preconf
13:30-16:30 : Persimmon Room

Agenda 13:30-16:30

• What is JHOVE2?
• Characterization of digital objects
• Validation vs Assessment
• Examples of JHOVE2 output
• Source Units, Modules, Reportable Properties
• Implementation of Assessment
• Configuration of Assessment Rules

JHOVE2 is …

… a project to develop a next-generation open source framework and application for format-aware characterization

… a collaborative undertaking of the California Digital Library (CDL), Portico, and Stanford University

… funded by a two-year grant from the Library of Congress as part of its National Digital Information Infrastructure and Preservation Program (NDIIPP)

“What? So what?”

Characterization is the automated determination of the intrinsic and extrinsic properties of a formatted object

– Identification: determining the presumptive format of a digital object based on suggestive extrinsic hints and intrinsic signatures

– Feature extraction: reporting the intrinsic properties of an object significant for classification, analysis, and planning

– Validation

– Assessment

What's new in JHOVE2?

Processing of multi-file objects as well as embedded objects inside files

Recursive processing of container objects

Plug-in Format Modules

Buffered I/O

Internationalized output

Clean APIs and modern design patterns

Je ne sais quoi !

API design idioms

Separation of concerns

– Annotation and Reflection: confluence.ucop.edu/display/JHOVE2Info/Background+Papers

Inversion of Control (IOC) / Dependency Injection

– Martin Fowler: martinfowler.com/articles/injection.html

– Spring Framework: www.springsource.org/

Project Home

Domain name
– http://jhove2.org/

Code Repository
– https://bitbucket.org/jhove2/main/wiki/Home

• Public Wiki/Documentation
• Browse/Clone Source Code
• Download Release Packages
• Changeset History
• Issue Tracking

Mailing lists
– [email protected]
– [email protected]

JHOVE2 Documentation

Complete documentation

– User’s guide

– Architectural overview

– Module specifications

– Programmer’s guide

Characterization

Validation vs. Assessment

Validation is the determination of the level of conformance to the normative requirements of a format’s authoritative specification

– To the extent that there is community consensus on these requirements, validation is an objective determination
– Hard coded in JHOVE2 Modules

Assessment is the determination of the level of acceptability for a specific purpose on the basis of locally-defined policy rules

– Since these rules are locally configurable, assessment is a subjective determination
– Scripted via config files

Format Specifications

Format: Specification
JPEG 2000: JP2 (ISO/IEC 15444-1), JPX (ISO/IEC 15444-2)
PDF: PDF 1.0 – 1.7, ISO 32000-1, PDF/A-1 (ISO 19005-1), PDF/X-1 (ISO 15930-1), -1a (ISO 15930-4), -2 (ISO 15930-5), -3 (ISO 15930-6)
TIFF: TIFF 4 – 6, Class B, F, G, P, R, Y, TIFF/EP (ISO 12234-2), TIFF/IT (ISO 12639), GeoTIFF, Exif (JEITA CP-3451), DNG
UTF-8: ASCII (ANSI X3.4)
WAVE: BWF (EBU N22-1997)

Putting it another way …

Assessment is the evaluation of a source unit's reportable properties against a set of policy-based rules

Assessment is the evaluation of a source unit's

– File (UTF-8)
– File with embedded ByteStream(s) (TIFF with ICC profile)
– Aggregate (Directory, ZIP)
– ClumpSource (ShapeFile)

reportable properties against a set of policy-based rules

Assessment is the evaluation of a source unit's reportable properties

– Format Identification
– Features
– Validity

against a set of policy-based rules

Assessment is the evaluation of a source unit's reportable properties against a set of policy-based rules

– Is the item acceptable?
– Is there a preservation risk?
– What level of preservation service?
– Should we flag object for future action?

Practical Applications of Assessment

• Ingest workflows

• Migration workflows

• Digitization workflows

• Publishing workflows

Running JHOVE2

jhove2.sh -d Text -o outfile.txt myfile.xml

Display format choices are: Text (default), JSON, and XML.

File argument can be any of:
– Filename
– Directory name
– URL
– Set of space-delimited filepaths

http://bitbucket.org/jhove2/main/wiki/documents/JHOVE2-Users-Guide.pdf

JHOVE2 Output options

• Input File
– xml-schemaLocation-cannot-resolve.xml

• Text
– text-output.txt

• XML
– xml-output.xml

• JSON
– json-output.txt

JHOVE2 Output

FileSource:
  Path: E:\samples\xml\schema-sample.xml
  Size (byte): 9516
  LastModified: 2010-10-12T11:55:29-06:00
  SourceName: schema-sample.xml
  StartingOffset (byte): 0

Format Identification

PresumptiveFormats:
  PresumptiveFormat {FormatIdentification}:
    NativeIdentifier {I8R}:
      Namespace: PUID
      Value: fmt/101 (PRONOM Identifier)
    JHOVE2Identifier {I8R}:
      Namespace: JHOVE2
      Value: http://jhove2.org/terms/format/xml
    ...

PRONOM Format Registry
http://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=638

Name: Extensible Markup Language
Version: 1.0
Other names: XML (1.0)
Identifiers: PUID: fmt/101; Apple Uniform Type Identifier: public.xml; MIME: text/xml
Classification: Text (Mark-up)
Description: The Extensible Markup Language (XML) is a general purpose markup language for creating other, special purpose, markup languages, and is a simplified subset of SGML. …

Agent used for Identification

Module {DROIDIdentifier}:
  SignatureFile: …/DROID_SignatureFile_V20.xml
  Version: 2.0.0
  ReleaseDate: 2010-09-10
  WrappedProduct:
    Name: DROID
    Version: 4.0.0
    ReleaseDate: 2009-07-23
  ...

DROID
http://sourceforge.net/projects/droid/

DROID (Digital Record Object Identification) is an automatic file format identification tool. It is the first in a planned series of tools developed by The National Archives under the umbrella of its PRONOM technical registry service

XML Module

Module {XmlModule}:
  SaxParser:
    Parser: org.apache.xerces.parsers.SAXParser
  XmlDeclaration:
    Version: 1.0
    Encoding: UTF-8
    Standalone: no
  RootElement:
    Name: mets
    Namespace: http://www.loc.gov/METS/

XML Module (namespaces)

NamespaceInformation:
  NamespaceCount: 2
  Namespaces:
    Namespace:
      URI: http://www.loc.gov/METS/
      Declarations:
        Prefix: [default]
      SchemaLocations:
        SchemaLocation:
          Location: http://www.loc.gov/standards/mets/version15/mets.xsd
    Namespace:
      URI: http://www.loc.gov/mix/v10
      Declarations:
        Prefix: mix

XML Module (cont)

ValidationResults:
  ParserWarnings {ValidationMessageList}:
    ValidationMessageCount: 0
  ParserErrors {ValidationMessageList}:
    ValidationMessageCount: 0
  FatalParserErrors {ValidationMessageList}:
    ValidationMessageCount: 0
isWellFormed: true
isValid: true

Format Modules from JHOVE2 Team

ICC color profile
JPEG 2000
PDF
SGML
Shapefile
TIFF
UTF-8
WAVE
XML
Zip

JHOVE2 can identify (by DROID) many more formats than it can validate (by modules)

Other Module Development

3rd party development activities
– NetCDF and GRIB modules (Wegener Institute)
– Integration with DuraCloud (DuraSpace)
– ARC module (Bibliothèque nationale de France)
– WARC, JPEG, GIF modules (CDL, hopefully ;-)

Possible development efforts
– Additional format modules
– Configuration GUIs
– JHOVE2-as-a-service
– Integration with DAITSS, DSpace, Fedora, FITS, etc.

Suggestions, volunteers and funders welcome

AssessmentModule

Module {AssessmentModule}:
  AssessmentResultSets:
    AssessmentResultSet:
      RuleSetName: XmlRuleSet
      RuleSetDescription: RuleSet for Xml Module
      ObjectFilter: org.jhove2.module.format.xml.XmlModule
      BooleanResult: true
      AssessmentResults:
        AssessmentResult:
          RuleName: XmlValidityRule
          RuleDescription: Is the XML file acceptable?
          BooleanResult: true
          NarrativeResult: Acceptable

JHOVE2 Abstractions

• Source Unit
• Module
• Reportable
• Reportable Property
• Message

Source Unit

A formatted object about which characterization information can be meaningfully reported

– Unitary
   File, e.g. UTF-8 text file
   File inside of a container, e.g. TIFF inside a Zip
   Byte stream inside a file, e.g. ICC inside a TIFF

– Aggregate
   Directory
   Directory inside of a container
   Clump, e.g. Shapefile
   File set, e.g. command line arguments

For purposes of characterization, directories, file sets, and clumps are considered format types

Source Interface (Java)

public Set<FormatIdentification> getPresumptiveFormats() {
    return presumptiveFormatIdentifications;
}

public List<Module> getModules() {
    return this.modules;
}

public List<Source> getChildSources() {
    return this.children;
}
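Because each source unit exposes its child sources, a container can be characterized by walking the tree recursively. The following is a minimal sketch of such a depth-first walk. `DemoSource` and `SourceTreeDemo` are hypothetical stand-ins written for this illustration; they are not the JHOVE2 `Source` API, and only the `getChildSources()` accessor name is borrowed from the slide.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a source unit; not the real org.jhove2 Source type. */
class DemoSource {
    private final String name;
    private final List<DemoSource> children = new ArrayList<>();

    DemoSource(String name) {
        this.name = name;
    }

    /** Adds a child source and returns this source so calls can be chained. */
    DemoSource addChild(DemoSource child) {
        children.add(child);
        return this;
    }

    String getName() {
        return name;
    }

    List<DemoSource> getChildSources() {
        return children;
    }
}

class SourceTreeDemo {
    /** Depth-first walk: records the source itself, then each of its child sources. */
    static List<String> walk(DemoSource source) {
        List<String> visited = new ArrayList<>();
        collect(source, 0, visited);
        return visited;
    }

    private static void collect(DemoSource source, int depth, List<String> out) {
        out.add(depth + " " + source.getName());
        for (DemoSource child : source.getChildSources()) {
            collect(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // A Zip containing a TIFF (with an embedded ICC profile) and a text file
        DemoSource zip = new DemoSource("archive.zip")
            .addChild(new DemoSource("image.tif").addChild(new DemoSource("ICC profile")))
            .addChild(new DemoSource("notes.txt"));
        for (String line : walk(zip)) {
            System.out.println(line);
        }
    }
}
```

Each output line carries the nesting depth followed by the source name, so the embedded ICC profile is reported two levels below the Zip.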

Format Module

• implements Parser
• implements Validator
• implements Reportable
• imports org.jhove2.annotation.ReportableProperty

public long parse(JHOVE2 jhove2, Source source, Input input) {
    // extract features and
    // fill in the reportable properties fields
    . . .
}

Reportables

A Reportable is a named set of properties

– Reportables correspond to Java classes

– Including classes for sources and modules

Also define reportables for the major conceptual structures inherent to a format

– JPEG 2000: Box

– TIFF: IFH, IFD, IFD entry (“tag”)

– UTF-8: Character stream, character

– WAVE: Chunk

Reportable Interface

package org.jhove2.core;

public interface Reportable {
    public I8R getReportableIdentifier();
    public String getReportableName();
    public void setReportableName(String name);
}

public abstract class AbstractReportable implements Reportable {
    protected I8R reportableIdentifier;
    protected String reportableName;
}

A reportable class implements the Reportable marker interface

ReportableProperties

A ReportableProperty is a named, typed value

– org.jhove2.annotation.ReportableProperty
– Unique formal identifier
– Data type: scalar or collection Java types, JHOVE2 primitive types, or JHOVE2 reportables
– Typed value
– Description of correct semantic interpretation
– Properties correspond to fields

ReportableProperty Annotation

Each reportable property is represented by a field and accessor and mutator methods. The accessor method must be marked with the @ReportableProperty annotation.

public class MyReportable implements Reportable {
    protected String myProperty;

    @ReportableProperty(order=1, desc="description", ref="reference")
    public String getMyProperty() {
        return this.myProperty;
    }

    public void setMyProperty(String property) {
        this.myProperty = property;
    }
}
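The annotation-plus-reflection idiom can be sketched in a self-contained way as follows. This is an illustration only: `DemoProperty`, `DemoReportable`, and `PropertyReporter` are hypothetical names invented here, and the sketch does not reproduce how JHOVE2 itself discovers or displays properties. It shows how annotated accessors might be found at run time and reported in their declared order.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical annotation modeled on the slide; not org.jhove2.annotation.ReportableProperty. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DemoProperty {
    int order();
    String desc() default "";
}

/** Hypothetical reportable with two annotated accessors and one plain method. */
class DemoReportable {
    private final String myProperty = "example value";
    private final int size = 42;

    @DemoProperty(order = 2, desc = "Size in bytes")
    public int getSize() {
        return size;
    }

    @DemoProperty(order = 1, desc = "An illustrative property")
    public String getMyProperty() {
        return myProperty;
    }

    public String notReported() {
        return "ignored";
    }
}

class PropertyReporter {
    /** Finds annotated accessors by reflection and reports "Name: value" lines sorted by order. */
    static List<String> report(Object reportable) {
        List<Method> accessors = new ArrayList<>();
        for (Method m : reportable.getClass().getMethods()) {
            if (m.isAnnotationPresent(DemoProperty.class)) {
                accessors.add(m);
            }
        }
        accessors.sort(Comparator.comparingInt(m -> m.getAnnotation(DemoProperty.class).order()));

        List<String> lines = new ArrayList<>();
        for (Method m : accessors) {
            try {
                m.setAccessible(true);
                // Strip the "get" prefix to derive the property name
                lines.add(m.getName().substring(3) + ": " + m.invoke(reportable));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read " + m.getName(), e);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report(new DemoReportable())) {
            System.out.println(line);
        }
    }
}
```

The unannotated `notReported()` method is skipped, which is the point of marking accessors explicitly: the class author decides what counts as a reportable property.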

Wave Reportable Properties

chunks[ ]

formatChunkNotBeforeDataChunkMessage

missingRequiredFormatChunkMessage

missingRequiredDataChunkMessage

missingRequiredFactChunkMessage

isValid

childChunks[ ]

hasPadByte

identifier

isValid

size

UTF-8 Reportable Properties

byteOrderMark

c0Characters

c1Characters

codeBlocks

eOLMarkers

invalidCharacters[ ]

isValid

numCharacters

numLines

numNonCharacters

c0Control

c1Control

codeBlock

codePoint

codePointOutOfRange

coverage

invalidByteValues

isByteOrderMark

isC0Control

isC1Control

isNonCharacter

isValid

size

XML Reportable Properties

Fields for the reportable properties

protected String saxParser = "org.apache.xerces.parsers.SAXParser";
protected XmlDeclaration xmlDeclaration = new XmlDeclaration();
protected String xmlRootElementName;
protected List<XmlDTD> xmlDTDs;
protected HashMap<String,XmlNamespace> xmlNamespaceMap;
protected List<XmlNotation> xmlNotations;
protected List<String> xmlCharacterReferences;
protected List<XmlEntity> xmlEntitys;
protected List<XmlProcessingInstruction> xmlProcessingInstructions;
protected List<String> xmlComments;
protected XmlValidationResults xmlValidationResults;
protected boolean wellFormed;

Getter methods for reportable properties

import org.jhove2.annotation.ReportableProperty;

@ReportableProperty(order = 1, value = "Java class used to parse the XML")
public String getSaxParser() {
    return saxParser;
}

@ReportableProperty(order = 2, value = "XML Declaration data")
public XmlDeclaration getXmlDeclaration() {
    return xmlDeclaration;
}

@ReportableProperty(order = 3, value = "Name of the document's root element")
public String getXmlRootElementName() {
    return xmlRootElementName;
}

Messages

if (position == start && ch.isByteOrderMark()) {
    Object[] messageParms = new Object[] {position};
    this.bomMessage = new Message(Severity.INFO, Context.OBJECT,
        "org.jhove2.module.format.utf8.UTF8Module.bomMessage", messageParms);
}

Messages

• Messages are reportable properties
– Unique identifier: info:jhove2/message/…
– Context
   Process: Condition arising from the process of characterization
   Object: Condition arising in the object being characterized
– Severity: Error, Warning, Info
– Internationalizable
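One way to picture an internationalizable message is a message code plus parameters, with the wording looked up per locale at display time. The sketch below assumes exactly that and nothing more: `DemoMessage`, its in-memory template table, and the `demo.bomMessage` code are all hypothetical, and the real JHOVE2 `Message` class may resolve its text differently (for example from properties files).

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical message type for illustration; not the JHOVE2 Message class. */
class DemoMessage {
    enum Severity { ERROR, WARNING, INFO }
    enum Context { PROCESS, OBJECT }

    // Hypothetical in-memory templates: language code -> message code -> pattern
    private static final Map<String, Map<String, String>> TEMPLATES = new HashMap<>();
    static {
        Map<String, String> en = new HashMap<>();
        en.put("demo.bomMessage", "Byte order mark found at offset {0}");
        Map<String, String> de = new HashMap<>();
        de.put("demo.bomMessage", "Byte-Reihenfolge-Marke an Position {0} gefunden");
        TEMPLATES.put("en", en);
        TEMPLATES.put("de", de);
    }

    private final Severity severity;
    private final Context context;
    private final String code;
    private final Object[] params;

    DemoMessage(Severity severity, Context context, String code, Object... params) {
        this.severity = severity;
        this.context = context;
        this.code = code;
        this.params = params;
    }

    /** Resolves the pattern for the locale's language, falling back to English. */
    String localized(Locale locale) {
        Map<String, String> byCode =
            TEMPLATES.getOrDefault(locale.getLanguage(), TEMPLATES.get("en"));
        String pattern = byCode.getOrDefault(code, code);
        String text = new MessageFormat(pattern, locale).format(params);
        return "[" + severity + "/" + context + "] " + text;
    }

    public static void main(String[] args) {
        DemoMessage bom = new DemoMessage(Severity.INFO, Context.OBJECT, "demo.bomMessage", 0L);
        System.out.println(bom.localized(Locale.ENGLISH));
        System.out.println(bom.localized(Locale.GERMAN));
    }
}
```

Keeping the code and parameters separate from the wording means the same message object can be rendered in any supported language without touching the module that raised it.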

Assessment rules

Assertions (logical expressions) based on

– Presence/absence of a property
– Constraints on property values
– Combinations of properties/values

Predicate Logic

• Rules use a construct whose basic structure looks like this:

If (condition)

Then (consequent)

Else (alternative)

http://en.wikipedia.org/wiki/Conditional_(programming)

Condition

A condition is defined by a universal or existential quantifier

∀ “for all”
∃ “for any”
¬∃ “not any”

and an arbitrary set of predicates

{ALL_OF | ANY_OF | NONE_OF} (predicate) (predicate) ...

http://www.csm.ornl.gov/~sheldon/ds/sec1.6.html
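The three quantifiers map naturally onto "all match", "any match", and "none match" over a list of predicates. A minimal sketch under that reading follows. `DemoQuantifier` and `QuantifierDemo` are hypothetical names, and the predicates are plain Java lambdas over an integer rather than the scripted expression strings JHOVE2 rules use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical quantifier enum; the constant names follow the slide. */
enum DemoQuantifier {
    ALL_OF, ANY_OF, NONE_OF;

    /** Combines the predicate results for one subject according to this quantifier. */
    <T> boolean evaluate(T subject, List<Predicate<T>> predicates) {
        switch (this) {
            case ALL_OF:
                return predicates.stream().allMatch(p -> p.test(subject));
            case ANY_OF:
                return predicates.stream().anyMatch(p -> p.test(subject));
            default:
                return predicates.stream().noneMatch(p -> p.test(subject));
        }
    }
}

class QuantifierDemo {
    // Two illustrative predicates: "is positive" and "is even"
    private static final List<Predicate<Integer>> PREDICATES =
        Arrays.asList(n -> n > 0, n -> n % 2 == 0);

    static boolean check(DemoQuantifier quantifier, int value) {
        return quantifier.evaluate(Integer.valueOf(value), PREDICATES);
    }

    public static void main(String[] args) {
        for (DemoQuantifier q : DemoQuantifier.values()) {
            System.out.println(q + " for 3: " + check(q, 3));
        }
    }
}
```

For the value 3, which is positive but odd, only `ANY_OF` holds: one predicate passes, so "all" fails and "none" fails.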

Predicate

Each predicate is a string containing a boolean expression

xmlDeclaration.standalone == 'yes'

These assertions take the form: property relation value

Supported relational operators include:

== != < > <= >=

contains

exists ( != null)

XML Assessment rule

If ANY_OF
    validity == true ;
    (validity == undetermined) and (wellFormed == true)
Then Acceptable
Else Not acceptable
End If
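Written out as ordinary Java, the XML rule reduces to a single conditional that yields the consequent or the alternative. The sketch below is only an illustration of that logic: `XmlRuleDemo` and its three-state `Validity` enum are hypothetical, and in JHOVE2 the predicates are configured as scripted expressions rather than compiled code.

```java
/** Hypothetical hand-coded version of the XML assessment rule, for illustration only. */
class XmlRuleDemo {
    /** Three-state validity, mirroring the "undetermined" value used in the rule. */
    enum Validity { TRUE, FALSE, UNDETERMINED }

    /** ANY_OF: valid, or (validity undetermined and well-formed). */
    static String assess(Validity validity, boolean wellFormed) {
        boolean condition = validity == Validity.TRUE
            || (validity == Validity.UNDETERMINED && wellFormed);
        // Consequent when the condition holds, alternative otherwise
        return condition ? "Acceptable" : "Not acceptable";
    }

    public static void main(String[] args) {
        System.out.println(assess(Validity.TRUE, true));
        System.out.println(assess(Validity.UNDETERMINED, true));
        System.out.println(assess(Validity.UNDETERMINED, false));
        System.out.println(assess(Validity.FALSE, true));
    }
}
```

The second predicate covers files whose schema could not be checked: they are accepted as long as they are at least well-formed.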

JPEG 2000 Assessment Rule

If ALL_OF
    validity == true ;
    exists(colourBox) ;
    exists(resolutionBox.capture)
Then Acceptable
Else Not acceptable
End If

Wave Assessment rule

If ALL_OF
    validity == true ;
    exists(broadcastWaveExtensionChunk) ;
    waveFormatChunk.nSamplesPerSec == 96000 ;
    waveFormatChunk.nBitsPerSample == 24
Then Acceptable
Else Not acceptable
End If

TIFF Assessment rule

If ANY_OF
    validity == true ;
    ((ifd.messages contains 'offsetNotByteAligned') or (ifd.messages contains 'dateNotWellFormed'))
Then Acceptable
Else Not acceptable
End If

Rules Engines

• JSR 94: Java™ Rule Engine API
  http://jcp.org/en/jsr/detail?id=94

• Rule Engines Overview
  http://jadex-rules.informatik.uni-hamburg.de/xwiki/bin/view/Resources/Rule+Engines

• Top 10 Java Business Rule Engines
  http://blog.taragana.com/index.php/archive/top-10-java-business-rule-engines/

• Introduction to Drools
  http://www.intltechventures.com/presentations/2008-01-26-Introduction-to-Drools.pdf

Expression Languages

• Predicates (conditions) are evaluated using a domain-specific language that supports scripted examination of Java objects

• MVEL (MVFLEX Expression Language)
  http://mvel.codehaus.org/

• OGNL (Object-Graph Navigation Language)
  http://www.opensymphony.com/ognl

• Groovy
  http://groovy.codehaus.org/

• Open Source Expression Languages in Java
  http://java-source.net/open-source/expression-languages
  http://www.java-opensource.com/open-source/expression-languages.html

Assessment Module at work

public void assess(JHOVE2 jhove2, Source source) throws JHOVE2Exception {
    /* Assess the source unit. */
    this.configInfo = jhove2.getConfigInfo();
    List<Module> modules = source.getModules();
    for (Module module : modules) {
        assessObject(module);
        this.getModuleAccessor().persistModule(this);
    }
    assessObject(source);
    this.getModuleAccessor().persistModule(this);
}

AssessObject Method

private void assessObject(Object assessedObject) throws JHOVE2Exception {
    // Rule sets are looked up by the class name of the assessed object
    String objectFilter = assessedObject.getClass().getName();
    List<RuleSet> ruleSetList = getRuleSetFactory().getRuleSetList(objectFilter);
    if (ruleSetList != null) {
        for (RuleSet ruleSet : ruleSetList) {
            if (ruleSet.isEnabled()) {
                AssessmentResultSet resultSet = new AssessmentResultSet();
                assessmentResultSets.add(resultSet);
                resultSet.setRuleSet(ruleSet);
                resultSet.fireAllRules(assessedObject);
            }
        }
    }
}
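The lookup step in this method, selecting rule sets by the fully qualified class name of the assessed object, can be sketched independently of JHOVE2. In the example below, `DemoRuleSet` and `DemoAssessor` are hypothetical, a single Java predicate stands in for a whole rule set, and a plain map stands in for the rule set factory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical rule set: a name, an enabled flag, and one predicate standing in for its rules. */
class DemoRuleSet {
    final String name;
    final boolean enabled;
    final Predicate<Object> rule;

    DemoRuleSet(String name, boolean enabled, Predicate<Object> rule) {
        this.name = name;
        this.enabled = enabled;
        this.rule = rule;
    }
}

/** Hypothetical assessor that selects rule sets by the class name of the assessed object. */
class DemoAssessor {
    private final Map<String, List<DemoRuleSet>> byObjectFilter = new HashMap<>();

    void register(String objectFilter, DemoRuleSet ruleSet) {
        byObjectFilter.computeIfAbsent(objectFilter, k -> new ArrayList<>()).add(ruleSet);
    }

    /** Fires every enabled rule set registered for the object's class; returns "name=result". */
    List<String> assessObject(Object assessedObject) {
        List<String> results = new ArrayList<>();
        List<DemoRuleSet> ruleSets = byObjectFilter.get(assessedObject.getClass().getName());
        if (ruleSets != null) {
            for (DemoRuleSet ruleSet : ruleSets) {
                if (ruleSet.enabled) {
                    results.add(ruleSet.name + "=" + ruleSet.rule.test(assessedObject));
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        DemoAssessor assessor = new DemoAssessor();
        assessor.register("java.lang.String",
            new DemoRuleSet("NonEmpty", true, o -> !o.toString().isEmpty()));
        assessor.register("java.lang.String",
            new DemoRuleSet("Disabled", false, o -> false));
        System.out.println(assessor.assessObject("abc"));
        System.out.println(assessor.assessObject(Integer.valueOf(42)));
    }
}
```

Objects with no registered rule set simply produce no results, which matches the null check in the method above: assessment is opt-in per class.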

Fire Off the Rules

Sequence Diagram

Identification

Feature extraction

Assessment

Assessment Configuration

• Lists of properties for a Module can be generated using the ReportableInstanceTraverser utility

USAGE: java -cp CLASSPATH org.jhove2.app.util.traverser.ReportableInstanceTraverser fully-qualified-class-name output-file-path {optional boolean should-recurse(default true)}

• wave-property-list.txt

• tiff-module-properties.txt

Assessment Configuration

• Rules are configured using the ARules utility
– Utility developed by CDL to create rule set in XML
– Future plans: a GUI

• ARules output is a Spring config file

ARules configuration

ruleset XmlRuleSet enabled org.jhove2.module.format.xml.XmlModule
desc Ruleset for XML module

rule XmlStandaloneRule enabled
desc Does XML Declaration specify standalone status?
cons Is Standalone
alt Is Not Standalone
quant all
pred xmlDeclaration.standalone == "yes"

rule XmlAcceptableRule enabled
desc Is the XML status acceptable?
cons Acceptable
alt Not Acceptable
quant any
pred valid.name() == "True"
pred (valid.name() == "Undetermined") && (wellFormed.name() == "True")

RuleSet Spring Bean

<!-- RuleSet bean for the XmlModule -->
<bean id="XmlRuleSet" class="org.jhove2.module.assess.RuleSet" scope="singleton">
  <property name="name" value="XmlRuleSet"/>
  <property name="description" value="RuleSet for Xml Module"/>
  <property name="objectFilter" value="org.jhove2.module.format.xml.XmlModule"/>
  <property name="rules">
    <list value-type="org.jhove2.module.assess.Rule">
      <ref local="XmlStandaloneRule"/>
      <ref local="XmlValidityRule"/>
    </list>
  </property>
  <property name="enabled" value="true"/>
</bean>

Rule Spring Bean

<!-- Rule bean for evaluating validity value -->
<bean id="XmlValidityRule" class="org.jhove2.module.assess.Rule" scope="singleton">
  <property name="name" value="XmlValidityRule"/>
  <property name="description" value="Is the XML validity status acceptable?"/>
  <property name="consequent" value="Acceptable"/>
  <property name="alternative" value="Not Acceptable"/>
  <property name="quantifier" value="ANY_OF"/>
  <property name="predicates">
    <list value-type="java.lang.String">
      <value><![CDATA[ valid.toString() == 'true' ]]></value>
      <value><![CDATA[ (valid.toString() == 'undetermined') && (wellFormed.toString() == 'true') ]]></value>
    </list>
  </property>
  <property name="enabled" value="true"/>
</bean>

Spring Config Files

config
└───spring
    └───module
        ├───aggrefy
        │       jhove2-aggrefy-config.xml
        ├───assess
        │       jhove2-assess-config.xml
        │       jhove2-ruleset-xml-config.xml
        ├───digest
        │       jhove2-digest-config.xml
        ├───display
        │       jhove2-display-config.xml
        ├───identify
        │       jhove2-identify-config.xml

Assessment Output

Results stored as new characterization properties

Rule evaluation output includes
– Rule's name and brief description
– Boolean value of the condition that was evaluated
– Text value of the consequent or alternative
– Details of the predicate evaluation results

Assessment Output Example

Module {AssessmentModule}:
  AssessmentResultSets:
    AssessmentResultSet:
      RuleSetName: XmlRuleSet
      RuleSetDescription: Ruleset for XML module
      ObjectFilter: org.jhove2.module.format.xml.XmlModule
      BooleanResult: false
      AssessmentResults:
        AssessmentResult:
          RuleName: XmlStandaloneRule
          RuleDescription: Does XML Declaration specify standalone status?
          BooleanResult: false
          NarrativeResult: Is Not Standalone
          AssessmentDetails: ALL_OF { xmlDeclaration.standalone == "yes" => false; }
        AssessmentResult:
          RuleName: XmlAcceptableRule
          RuleDescription: Is the XML status acceptable?
          BooleanResult: true
          NarrativeResult: Acceptable
          AssessmentDetails: ANY_OF { valid.name() == "True" => true; (valid.name() == "Undetermined") && (wellFormed.name() == "True") => false; }

Actionable Outcomes?

– Assessment outcome is informational data
– Surrounding workflows may utilize assessment results to guide control mechanism
– JHOVE2 provides API, but does not initiate actions
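Since the assessment outcome is only data, any action has to be taken by the calling workflow. The sketch below shows one way a surrounding ingest workflow might turn per-rule boolean results into a routing decision. `IngestRouterDemo`, its `Action` values, and the "every rule must pass" policy are all assumptions made for this illustration; none of them come from JHOVE2.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical workflow step that consumes assessment results; not part of JHOVE2. */
class IngestRouterDemo {
    enum Action { ACCEPT, REVIEW }

    /** Assumed local policy: accept only if every rule evaluated to true. */
    static Action route(Map<String, Boolean> ruleResults) {
        for (boolean passed : ruleResults.values()) {
            if (!passed) {
                return Action.REVIEW;
            }
        }
        return Action.ACCEPT;
    }

    public static void main(String[] args) {
        // Rule names and boolean results as they might be read from an assessment report
        Map<String, Boolean> results = new LinkedHashMap<>();
        results.put("XmlStandaloneRule", false);
        results.put("XmlAcceptableRule", true);
        System.out.println(route(results));
    }
}
```

A different institution could apply a different policy to the same results, for example routing only on the acceptability rule, which is why the decision lives in the workflow rather than in the characterization tool.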

Assessment Enhancements

• Assessment Config file editing
– Make it easier for a non-programmer to edit
– Editing should be bullet-proofed if possible

• GUI User interface
– Presents a GUI treeview that lists reportable properties in a navigable hierarchy.

• Sanity checking
– Pre-test config files to ensure compatibility
   • Command-line invocation of the sanity checker
   • Run check whenever installed modules have been changed
– Also have robust reporting in case property is missing

JHOVE2 Community

Wiki
– http://jhove2.org/
– https://bitbucket.org/jhove2/main/wiki/Modules

Mailing lists
– [email protected]
– [email protected]