Glynn Edwards SAA – August 22, 2015 Director, ePADD Project Archival Stewardship of Email using ePADD Software

Page 1:

Glynn Edwards
SAA – August 22, 2015
Director, ePADD Project

Archival Stewardship of Email using ePADD Software

Page 2:
Page 3:

Developed and funded by:

Page 4:

ePADD program

Lifecycle Tools for Archival Email Stewardship

Lifecycle stages: Accessioning; Archival Processing; Preservation; Access

Functions: Collection Development; Pre-Acquisition Appraisal; Capture; Normalization; Item-level processing; Bulk processing; Intellectual Arrangement; Search Capability; Personal/Sensitive Information Processing; Packaging; Repository; Online Discovery; Access

Legend: Full support; Not Supported; Unknown

CERP Parser: Email message / Email message
DArcMail: Email message / Email message / Fielded
EMCAP Server Version: Email message / Email message / Server version only
Archivematica: Message + attachments / Message + attachments
PeDALS: Email message / Email message / Other: not declared
ePADD: Message + attachments / Message + attachments / NLP; fielded; full-text; lexicon / Identification (Reg. Ex.)
EAS: Message + attachments / Message + attachments / fielded; full-text / Identification (Reg. Ex.)
eMailchemy
MailStore Server: Message + attachments / Message + attachments / Full-text
AccessData FTK: Message + attachments / Message + attachments / Full-text / Identification (Reg. Ex.)
ZL Unified Archive: Message + attachments / Message + attachments / Full-text
Preservica Standard: Message + attachments / Message + attachments / Other: not declared
Paraben Email Examiner: Message + attachments / Message + attachments / Other: not declared
Aid4Mail Professional: Other: not declared
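
Several tools in the comparison list "Identification (Reg. Ex.)", that is, finding personal or sensitive information with regular expressions. The sketch below illustrates the general technique only. It is written in Python for brevity, the two patterns and the function name `find_sensitive` are hypothetical, and nothing here is taken from ePADD or any other listed tool.

```python
import re

# Hypothetical patterns for illustration; not the rule set of any listed tool.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def find_sensitive(text):
    """Return a list of (label, matched_text) pairs found in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits


sample = "My SSN is 123-45-6789 and my card is 4111 1111 1111 1111."
for label, value in find_sensitive(sample):
    print(label, value)
```

Pattern matching of this kind only flags candidates; an archivist would still review each hit before restricting a message.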

Page 5:

Appraisal Module

Page 6:
Page 7:

ePADD Technical Information

ePADD is written in Java and JavaScript and powered by Apache Tomcat (v7.0) using the Java EE Servlet API (v3.x) and JavaMail (v1.4.2). Text and metadata extraction, indexing and retrieval are performed by Apache Lucene (v4.7) and Apache Tika (v1.8). Charting and visualization are supported by the D3-based reusable chart library (v0.4.10). Oracle's Java Application Bundler and Launch4J are used for packaging on Mac and Windows platforms respectively. Other Java libraries from Apache Commons (Lang, CLI, IO, Logging, etc.) are also used. JSON formatting is performed with the libraries org.json and Gson.

ePADD has implemented its own natural language processing (NLP) toolkit, which is used for named entity extraction, disambiguation and other tasks. This toolkit supplants the Apache OpenNLP library used in earlier beta versions of the ePADD software: OpenNLP proved insufficient for our needs (at least for name recognition), and after various rounds of customization we built our own named entity recognizer. We continue to use Muse as an internal library within ePADD. The toolkit uses external datasets such as Wikipedia/DBpedia, Freebase, GeoNames, OCLC FAST and LC Subject Headings/LC Name Authority File.

The project is developed with IDEs such as IntelliJ IDEA and Eclipse, built with Apache Maven, Ant, and custom shell scripts, and managed with Git for source control and issue tracking. The ePADD software client is browser-based and compatible with Chrome and Firefox. It is optimized for Windows 7 and OS X 10.9/10.10 machines running Java 7 or 8.

Page 8:

Correspondents: Resolving multiple accounts into single entry
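
As a rough illustration of what resolving multiple accounts into a single entry can involve, the sketch below groups (display name, address) pairs that share either a normalized name or an address, using a small union-find structure. It is written in Python for brevity and is an assumption about the general approach, not ePADD's actual algorithm; `normalize_name` and `merge_correspondents` are hypothetical names.

```python
from collections import defaultdict


def normalize_name(name):
    """Lowercase and collapse whitespace so 'Jane  Doe' equals 'jane doe'."""
    return " ".join(name.lower().split())


def merge_correspondents(pairs):
    """Group (display name, address) pairs that share a name or an address."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every name to the address it appeared with.
    for name, address in pairs:
        union(("name", normalize_name(name)), ("addr", address.lower()))

    # Collect names and addresses under their shared root.
    groups = defaultdict(lambda: {"names": set(), "addresses": set()})
    for kind, value in list(parent):
        root = find((kind, value))
        key = "names" if kind == "name" else "addresses"
        groups[root][key].add(value)
    return sorted(
        (sorted(g["names"]), sorted(g["addresses"])) for g in groups.values()
    )


pairs = [
    ("Jane Doe", "jane@example.org"),
    ("jane doe", "jdoe@example.com"),
    ("J. Doe", "jdoe@example.com"),
    ("Bob Roe", "bob@example.net"),
]
for names, addresses in merge_correspondents(pairs):
    print(names, addresses)
```

Merging on a shared display name can wrongly combine two different people with the same name, so a real workflow would leave the final decision to the archivist.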

Page 9:

Actions: do not transfer – restrict – reviewed

Page 10:

Processing Module

Page 11:
Page 12:
Page 13:
Page 14:
Page 15:
Page 16:

Disambiguation of names

Page 17:

Discovery & Delivery (Access)

Page 18:
Page 19:

Query generator

Page 20:
Page 21:

1.1 release - August 2015

New features

Upload of CSV files of email addresses for matching with existing archive
Search by Date and Date Range
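
The two 1.1 features listed above can be pictured with a small sketch: read addresses from a CSV file, then keep the messages that involve any of those addresses and fall within an optional date range. This is an illustration in Python under stated assumptions, not ePADD code; the message dictionaries, the one-address-per-row CSV layout, and the names `load_addresses` and `match_messages` are all hypothetical.

```python
import csv
import io
from datetime import date


def load_addresses(csv_text):
    """Read email addresses from the first column of CSV text, lowercased."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0].strip().lower() for row in reader if row and row[0].strip()}


def match_messages(messages, addresses, start=None, end=None):
    """Keep messages whose sender or recipients are in addresses,
    optionally limited to an inclusive date range."""
    matched = []
    for msg in messages:
        people = {msg["from"].lower(), *(a.lower() for a in msg["to"])}
        if not people & addresses:
            continue
        if start is not None and msg["date"] < start:
            continue
        if end is not None and msg["date"] > end:
            continue
        matched.append(msg)
    return matched


messages = [
    {"id": 1, "from": "alice@example.org", "to": ["bob@example.net"], "date": date(2014, 3, 1)},
    {"id": 2, "from": "carol@example.com", "to": ["dave@example.com"], "date": date(2014, 6, 15)},
    {"id": 3, "from": "Bob@example.net", "to": ["alice@example.org"], "date": date(2015, 1, 10)},
]
addresses = load_addresses("bob@example.net\nerin@example.org\n")
hits = match_messages(messages, addresses, start=date(2015, 1, 1), end=date(2015, 12, 31))
print([m["id"] for m in hits])
```

Addresses are compared case-insensitively here, which is a simplifying assumption rather than a documented ePADD behavior.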

Page 22:

Future Roadmap

• Enhance Natural Language Processing Capability
• Enhance the Processing Module Features
• Enhance the Discovery/Delivery Module Features
• Recommend and Test Preservation Strategy
• Collaboration with other Platforms & Services
• Explore Sustainability Model
• Add Restriction Management/Annotation Functions
• Enhance the Error Handling Capability

Page 23:
Page 24: