CTS at LC - Access 2010

Post on 16-Dec-2014

CTS at LC, talk given at Access 2010 in Winnipeg.

Transcript of CTS at LC - Access 2010

CTS* at LC**

Daniel Chudnov - 2010-10-15 - dchud at loc gov
Access 2010 - Winnipeg

* Content Transfer Services
** Library of Congress

follow along at

slideshare.net / dchud

work in progress

transfer, verification, inventory, reporting, workflow

notification, status, access

hard to show

but that won’t stop me

tinyurl.com/cts2010

when i’m done this should make sense

NDNP

publishing breaking news*

online

* 100 years after it happens

chroniclingamerica.loc.gov

1,442,264 pages

last year at Access

2,692,369 pages

this year at Access

went live spring 2007

first two years 1.4M, last year 2.7M

56 TB content, 117 TB in copies

how?

1. built a better access system

faster ingest: from 1 month to 1 day

2. workflow

CTS does this

[charts: batches per month; page counts per month]

first batch received 2005-10

went live spring 2007

press event

the gap

first CTS workflow 2009-09

2010-09

3-4 month lag

2-3 month lag

1-2 month lag

ingest rate approaches receipt rate

this makes us smile

content transfer services

some requirements

LC started digitizing

in the 1980s

we have a lot

of stuff

in a lot

of places

distributed computing

environment

commercial MFT*

license

* Managed File Transfer

buy or build?

why, both, thank you

100s of collections

dozens of curatorial organizations

lots more stuff

coming every day

a long-term project

collecting and

making available

services for transfer of content

any content

lots of transfers: “movage”

several services

transfer, verification, inventory, reporting, workflow

notification, status, access

transfer across

systems, organizations

time

content transfer is

risky

copies fail

bits go bad

drives get lost

you forget what you did

you forget what you had

people retire

software breaks

hardware breaks

three blizzards in

DC

CTS helps make transfers

reliable and resilient

reliable

know when you’ve succeeded

BagIt

packing slip for data

data in a Bag

.
|-- bag-info.txt
|-- bagit.txt
|-- data
|   |-- batch.xml
|   |-- batch_1.xml
|   |-- batch_ne_dewitt_rework
|   |   |-- 00206538016_batch.xml
|   |   |-- 00206538028_batch.xml
|   |   `-- sn99021999
|   `-- sn99021999
|       |-- 00206538016
|       |   |-- 0000.jp2
|       |   |-- 0000.pdf
|       |   |-- 0000.tif
|       |   |-- 0000.xml
|       |   |-- 0001.jp2
|       |   |-- 0001.pdf
|       |   |-- 0001.tif
|       |   |-- 0001.xml

identifies a bag

.
|-- bag-info.txt
|-- bagit.txt
|-- data
|   |-- batch.xml
|   |-- batch_1.xml
|   |-- batch_ne_dewitt_rework
|   |   |-- 00206538016_batch.xml
|   |   |-- 00206538028_batch.xml
|   |   `-- sn99021999
|   `-- sn99021999
|       |-- 00206538016
|       |   |-- 0000.jp2
|       |   |-- 0000.pdf
|       |   |-- 0000.tif
|       |   |-- 0000.xml
|       |   |-- 0001.jp2
|       |   |-- 0001.pdf
|       |   |-- 0001.tif
|       |   |-- 0001.xml

where the data starts

.
|-- bag-info.txt
|-- bagit.txt
|-- data
|   |-- batch.xml
|   |-- batch_1.xml
|   |-- batch_ne_dewitt_rework
|   |   |-- 00206538016_batch.xml
|   |   |-- 00206538028_batch.xml
|   |   `-- sn99021999
|   `-- sn99021999
|       |-- 00206538016
|       |   |-- 0000.jp2
|       |   |-- 0000.pdf
|       |   |-- 0000.tif
|       |   |-- 0000.xml
|       |   |-- 0001.jp2
|       |   |-- 0001.pdf
|       |   |-- 0001.tif
|       |   |-- 0001.xml

packing slip

.
|-- bag-info.txt
|-- bagit.txt
|-- data
|   |-- batch.xml
|   |-- batch_1.xml
|   |-- batch_ne_dewitt_rework
|   |   |-- 00206538016_batch.xml
|   |   |-- 00206538028_batch.xml
|   |   `-- sn99021999
|   `-- sn99021999
|       |-- 00206538016
|       |   |-- 0000.jp2
|       |   |-- 0000.pdf
|       |   |-- 0000.tif
|       |   |-- 0000.xml
|       |   |-- 0001.jp2
|       |   |-- 0001.pdf
|       |   | ...
|-- manifest-md5.txt
`-- tagmanifest-md5.txt

71607ad119be88c842268a76f0b6b9e9 data/sn99021999/00206538107/1884091301/0621.pdf
c602d2ac07508059ce5f5597e239b97f data/sn99021999/00206538120/1885100601/0831.xml
a59795bd1584532d5cbc0b1d82f75cf8 data/sn99021999/00206538016/1880061401/0593.pdf
3c64fac7e2d49671e0d93908ae42a779 data/sn99021999/00206539616/1888101801/0905.xml
03158a560baa7479b3805d2b45ee02cd data/sn99021999/00206538028/1880111501/0405.tif
fa56ea18580e1446939ed62709e5b2db data/sn99021999/00206538077/1883061901/1145.pdf
bf4fb83ff8305e8256970a3466c1a12d data/sn99021999/00206538120/1885061501/0043.pdf
8f3649fc812de74b9d9443ee90a8ac9c data/sn99021999/00206538120/1885111101/1109.tif
e0b83a7f9ca228271fdaecf6348e1cec data/sn99021999/00206538120/1885101201/0871.xml
1c2f84e12792c123ba0aabedd0c0bbad data/sn99021999/00206538107/1884071401/0197.xml
080e557fe9f68037605e5b80df4bc4ac data/sn99021999/0020653820A/1888050701/0543.tif
532efe32c156459d9d9589caf618f502 data/sn99021999/00206538120/1885071401/0250.tif
ce607af59a96f2656d9448f38ffda072 data/sn99021999/0020653820A/1888052801/0731.pdf
60b626d8fd40aca1b425e86a004bb055 data/sn99021999/00206539628/1888111801/0088.xml
a467cd62350334c7aa83cf1e9056c1c6 data/sn99021999/00206539616/1888091701/0629.jp2
1a434f7a4d843a2c8ffe8d0824fafc3f data/sn99021999/00206538028/1880120801/0482.jp2
22996d89b4a3334256afaddcaa0238d8 data/sn99021999/00206538016/1874102001/0259.jp2
36f550da273ad4c592fee1761c98322a data/sn99021999/00206538016/1880052201/0518.jp2
7f7ccec3f2afae896338498372fd476e data/sn99021999/00206539616/1888080101/0200.pdf
c247a5d74d0e7f857c534d935661adbe data/sn99021999/00206538107/1884072601/0286.jp2
4d497a18a154adcc8636239378ab340b data/sn99021999/00206539628/1889021101/0868.pdf
2e8ca2558b54b5c49b2f20a355a60895 data/sn99021999/00206538065/1882092001/0136.xml
fb71493048e5010100f18012f5060d42 data/sn99021999/00206538028/1880123001/0569.xml
40b100432890b055a5defbfbea815d57 data/sn99021999/00206538107/1884090901/0590.xml
46f6d61480dadc1c988b0baa4de8b6c4 data/sn99021999/00206539628/1888122801/0463.pdf
1cb8af0648e8c9df395b63226fe7371f data/sn99021999/00206538016/1874101501/0244.pdf
9257834023c683b02f354888b2740b8f data/sn99021999/00206539616/1888102301/0956.xml
0d52b3b2b1c5459b7e8d500a8566b0bf data/sn99021999/00206538120/1885080801/0425.tif

indicates two things

1

what i think i’m sending you

2

whether you received it
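The packing-slip check can be sketched in a few lines of Python. This is a minimal illustration, not the BIL or CTS implementation: it assumes a bag directory containing a `manifest-md5.txt` with one `checksum path` pair per line, as in the manifest slide, and the function names (`md5_of`, `read_manifest`, `verify_bag`) are made up for this example.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(bag_dir):
    """Parse manifest-md5.txt into {relative path: expected checksum}."""
    expected = {}
    manifest = Path(bag_dir) / "manifest-md5.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, rel_path = line.split(None, 1)
            expected[rel_path.strip()] = checksum.lower()
    return expected


def verify_bag(bag_dir):
    """Return (missing, mismatched) lists of paths named in the manifest."""
    missing, mismatched = [], []
    for rel_path, checksum in read_manifest(bag_dir).items():
        target = Path(bag_dir) / rel_path
        if not target.is_file():
            missing.append(rel_path)      # the slip lists it, but it never arrived
        elif md5_of(target) != checksum:
            mismatched.append(rel_path)   # it arrived, but the bits differ
    return missing, mismatched
```

Two empty lists mean everything on the slip arrived intact. The sketch does not look for extra files under `data/` that the manifest never mentions; a fuller validator would report those as well.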

just like a

packing slip

works across space

works across systems

works across orgs

works across time

easy to make

md5deep

BIL

BagIt Library

Bagger

desktop GUI

BIL is free software; Bagger will be soon

sf.net/projects/loc-xferutils/

see also: BagIt

in Wikipedia

edsu++

reliability through bagging

resilience through

persistence

verify that copies succeed

know when copies fail

repeat until copies succeed
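The verify / know / repeat loop might look like the sketch below. It is an assumption-laden illustration rather than how CTS does it: `copy_until_verified` and its parameters are hypothetical names, the copy itself is plain `shutil.copyfile`, and "verified" here just means the destination MD5 matches the source MD5.

```python
import hashlib
import shutil
import time


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_until_verified(src, dst, max_attempts=3, wait_seconds=0.0,
                        copy=shutil.copyfile):
    """Copy src to dst, re-trying until the checksums match.

    Returns a list of (attempt number, succeeded, detail) tuples so that
    every attempt, including the failures, is on record.
    """
    expected = md5_of(src)
    attempts = []
    for attempt in range(1, max_attempts + 1):
        try:
            copy(src, dst)
            ok = md5_of(dst) == expected
            detail = "verified" if ok else "checksum mismatch"
        except OSError as error:
            ok, detail = False, "copy error: %s" % error
        attempts.append((attempt, ok, detail))
        if ok:
            return attempts
        time.sleep(wait_seconds)  # give a flaky mount or network time to recover
    raise RuntimeError("copy of %s failed after %d attempts: %r"
                       % (src, max_attempts, attempts))
```

Returning the attempt log rather than a bare success flag is deliberate: the failed attempts are what you want to have on hand when it is time to debug and diagnose.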

debug &

diagnose

record all of it

know what you have, know what you did

inventory

BagIt checksums in a DB

content properties: project, process, type

event timeline

receipt, verification

QR, copies

accept/reject, ingest/release

comments
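An inventory of this shape can be sketched with three tables: bags, the checksums from their manifests, and a timeline of events with comments. The sketch uses Python's built-in `sqlite3` purely as a stand-in (CTS itself runs on MySQL, per the stack slide); the schema, table names, and function names are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: one row per bag, per manifest entry, and per event.
SCHEMA = """
CREATE TABLE bag (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    project TEXT NOT NULL
);
CREATE TABLE bag_file (
    bag_id INTEGER NOT NULL REFERENCES bag(id),
    path TEXT NOT NULL,
    md5 TEXT NOT NULL,
    PRIMARY KEY (bag_id, path)
);
CREATE TABLE event (
    id INTEGER PRIMARY KEY,
    bag_id INTEGER NOT NULL REFERENCES bag(id),
    kind TEXT NOT NULL,
    success INTEGER NOT NULL,
    comment TEXT,
    happened_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_inventory(path=":memory:"):
    """Create a fresh inventory database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_bag(conn, name, project, manifest):
    """Store a bag and its {path: md5} manifest; return the new bag id."""
    cursor = conn.execute(
        "INSERT INTO bag (name, project) VALUES (?, ?)", (name, project))
    bag_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO bag_file (bag_id, path, md5) VALUES (?, ?, ?)",
        [(bag_id, path, md5) for path, md5 in manifest.items()])
    conn.commit()
    return bag_id


def record_event(conn, bag_id, kind, success=True, comment=None):
    """Append one event (receipt, verification, copy, ...) to the timeline."""
    conn.execute(
        "INSERT INTO event (bag_id, kind, success, comment) VALUES (?, ?, ?, ?)",
        (bag_id, kind, int(success), comment))
    conn.commit()


def timeline(conn, bag_id):
    """Return the bag's events, oldest first, as (kind, success, comment)."""
    rows = conn.execute(
        "SELECT kind, success, comment FROM event WHERE bag_id = ? ORDER BY id",
        (bag_id,))
    return [(kind, bool(success), comment) for kind, success, comment in rows]
```

Because events are only ever appended, the timeline doubles as the record of what was done, failures included.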

life cycle of some set of content

basic facts

all the copies, project details

event timeline

comments along the way

life cycle of NDNP batch

two key things

1. automated workflow

using jBPM

this part

process definition manages the steps

doesn’t let us forget

2. when content partners call

we can answer their questions

reporting

answering our own questions

annual reports: very important

file counts, overall size

etc.

used to be very difficult to determine

now: immediate, anytime
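Once file-level facts sit in a database, a report of file counts and overall size is a single query, and the CSV export is a few more lines. The sketch below is standalone and hypothetical: the two-table schema, the `size_bytes` column, and the function names are assumptions for illustration, with `sqlite3` standing in for whatever database holds the real inventory.

```python
import csv
import io
import sqlite3

# Hypothetical, minimal inventory tables for the report.
REPORT_SCHEMA = """
CREATE TABLE bag (id INTEGER PRIMARY KEY, name TEXT, project TEXT);
CREATE TABLE bag_file (bag_id INTEGER, path TEXT, size_bytes INTEGER);
"""


def project_report(conn):
    """Return [(project, bag count, file count, total bytes)] by project."""
    query = """
        SELECT b.project,
               COUNT(DISTINCT b.id),
               COUNT(f.path),
               COALESCE(SUM(f.size_bytes), 0)
        FROM bag AS b
        LEFT JOIN bag_file AS f ON f.bag_id = b.id
        GROUP BY b.project
        ORDER BY b.project
    """
    return conn.execute(query).fetchall()


def report_as_csv(rows):
    """Render report rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["project", "bags", "files", "bytes"])
    writer.writerows(rows)
    return buffer.getvalue()
```

The `LEFT JOIN` keeps projects whose bags have no files recorded yet, so they show up in the report with zero counts instead of vanishing.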

mostly NDNP

newer partners

also project reporting / planning

NDNP batches - one awardee

NDNP batches - all awardees (same data, CSV export)

provides 5000’ view

workflow

working status at a glance

a personalized view

overview of a whole project

overview of a system

overview of a person

not exactly “Facebook for bags”

but kinda

but wait, there’s more

browse live copies

go right to the content

many benefits

aaaand...

a RESTy web API

we can build complex workflows

with inventory

and reporting in CTS

we can build QR/workflow/auditing

outside of CTS with inventory and reporting through CTS

CTS: java, spring, mysql,

hibernate, velocity, tiles, jquery, jBPM, jetty

NDNP: python, django,

mysql, solr, apache

nice clean interfaces, nice separation

different coders, different styles

same benefits from using CTS

what’s next?

many more content collections

now:

NDNP, Web Archives

NDIIPP, Copyright Cards

next: P&P, G&M, WDL, AFC

Twitter, Copyright EDeposit

also coming:

more simple workflows

“Receive and Copy”

fits many use cases

receive, bag/verify

copy to archival, copy to access

works for recon, works for new stuff
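A "Receive and Copy" process can be sketched as an ordered list of named steps plus a small runner that logs each step and stops at the first failure, so the run can be resumed from that step once the problem is fixed. This is plain Python standing in for a real process engine such as jBPM; the step names, the context keys (`bag`, `archival`, `access`), and the simplistic `verify` step are all assumptions for illustration. `shutil.copytree(..., dirs_exist_ok=True)` needs Python 3.8 or later.

```python
import shutil
from pathlib import Path


def receive(ctx):
    """Fail unless the incoming bag directory is actually there."""
    if not Path(ctx["bag"]).is_dir():
        raise FileNotFoundError("no bag at %s" % ctx["bag"])


def verify(ctx):
    """Placeholder check; a real step would also verify every checksum."""
    if not (Path(ctx["bag"]) / "bagit.txt").is_file():
        raise ValueError("bagit.txt missing; not a bag")


def copy_to(key):
    """Build a step that copies the bag to the location stored in ctx[key]."""
    def step(ctx):
        shutil.copytree(ctx["bag"], ctx[key], dirs_exist_ok=True)
    return step


# The "process definition": the steps and their order live in one place.
RECEIVE_AND_COPY = [
    ("receive", receive),
    ("verify", verify),
    ("copy to archival", copy_to("archival")),
    ("copy to access", copy_to("access")),
]


def run_workflow(steps, ctx, start_at=0):
    """Run steps in order; return (index of next step to run, event log)."""
    log = []
    for index in range(start_at, len(steps)):
        name, action = steps[index]
        try:
            action(ctx)
        except Exception as error:
            log.append((name, "failed", str(error)))
            return index, log  # resume here after fixing the problem
        log.append((name, "done", ""))
    return len(steps), log
```

Keeping the step list as data is the point of the design: the definition, not anyone's memory, decides what happens next, and a failure leaves behind both a log entry and the index to resume from.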

and, get past typical problems

permissions, insufficient storage

failed copies

connection with

high expectation

and, finally, a UI redesign

thanks!

BagIt - wikipedia

sf.net/projects/loc-xferutils/

hooray for protovis

@dchud - dchud at loc gov