CTS at LC - Access 2010

CTS* at LC** - Daniel Chudnov - 2010-10-15 - dchud at loc gov - Access 2010 - Winnipeg
* Content Transfer Services ** Library of Congress
follow along at slideshare.net / dchud

description

CTS at LC, talk given at Access 2010 in Winnipeg.

Transcript of CTS at LC - Access 2010

Page 1: CTS at LC - Access 2010

CTS* at LC**

Daniel Chudnov - 2010-10-15 - dchud at loc gov

Access 2010 - Winnipeg

* Content Transfer Services

** Library of Congress

follow along at

slideshare.net / dchud

Page 2: CTS at LC - Access 2010

slideshare.net / dchud

Page 3: CTS at LC - Access 2010

work in progress

Page 4: CTS at LC - Access 2010

transfer, verification, inventory, reporting, workflow

notification, status, access

Page 5: CTS at LC - Access 2010

hard to show

but that won’t stop me

Page 6: CTS at LC - Access 2010
Page 7: CTS at LC - Access 2010
Page 8: CTS at LC - Access 2010
Page 9: CTS at LC - Access 2010

tinyurl.com/cts2010

Page 10: CTS at LC - Access 2010

when i’m done this should make sense

Page 11: CTS at LC - Access 2010

NDNP

Page 12: CTS at LC - Access 2010

publishing breaking news*

online

* 100 years after it happens

Page 13: CTS at LC - Access 2010
Page 14: CTS at LC - Access 2010

chroniclingamerica.loc.gov

Page 15: CTS at LC - Access 2010

1,442,264 pages

last year at Access

Page 16: CTS at LC - Access 2010

2,692,369 pages

this year at Access

Page 17: CTS at LC - Access 2010

went live spring 2007

Page 18: CTS at LC - Access 2010

first two years 1.4M

last year 2.7M

Page 19: CTS at LC - Access 2010

56 TB content

117 TB in copies

Page 20: CTS at LC - Access 2010

how?

Page 21: CTS at LC - Access 2010

1. built a better access system

Page 22: CTS at LC - Access 2010

faster ingest

from 1 month to 1 day

Page 23: CTS at LC - Access 2010

2. workflow

Page 24: CTS at LC - Access 2010

CTS does this

Page 25: CTS at LC - Access 2010

[charts: batches by month; page counts by month]

Page 26: CTS at LC - Access 2010

first batch received 2005-10

went live spring 2007

press event

the gap

Page 27: CTS at LC - Access 2010

first CTS workflow 2009-09

2010-09

3-4 month lag

2-3 month lag

1-2 month lag

Page 28: CTS at LC - Access 2010

ingest rate approaches receipt rate

Page 29: CTS at LC - Access 2010

this makes us smile

Page 30: CTS at LC - Access 2010
Page 31: CTS at LC - Access 2010

content transfer services

Page 32: CTS at LC - Access 2010

some requirements

Page 33: CTS at LC - Access 2010

LC started digitizing in the 1980s

Page 34: CTS at LC - Access 2010

we have a lot

of stuff

Page 35: CTS at LC - Access 2010

in a lot of places

Page 36: CTS at LC - Access 2010

distributed computing environment

Page 37: CTS at LC - Access 2010

commercial MFT* license

* Managed File Transfer

Page 38: CTS at LC - Access 2010

buy or build?

why, both, thank you

Page 39: CTS at LC - Access 2010
Page 40: CTS at LC - Access 2010

100s of collections

Page 41: CTS at LC - Access 2010

dozens of curatorial organizations

Page 42: CTS at LC - Access 2010

lots more stuff

coming every day

Page 43: CTS at LC - Access 2010

a long-term project

Page 44: CTS at LC - Access 2010

collecting and making available

Page 45: CTS at LC - Access 2010

services for transfer of content

Page 46: CTS at LC - Access 2010

any content

Page 47: CTS at LC - Access 2010

lots of transfers

“movage”

Page 48: CTS at LC - Access 2010

several services

Page 49: CTS at LC - Access 2010

transfer, verification, inventory, reporting, workflow

notification, status, access

Page 50: CTS at LC - Access 2010

transfer across

systems

organizations

time

Page 51: CTS at LC - Access 2010

content transfer is risky

Page 52: CTS at LC - Access 2010

copies fail

Page 53: CTS at LC - Access 2010

bits go bad

Page 54: CTS at LC - Access 2010

drives get lost

Page 55: CTS at LC - Access 2010

you forget what you did

Page 56: CTS at LC - Access 2010

you forget what you had

Page 57: CTS at LC - Access 2010

people retire

Page 58: CTS at LC - Access 2010

software breaks

Page 59: CTS at LC - Access 2010

hardware breaks

Page 60: CTS at LC - Access 2010

three blizzards in DC

Page 61: CTS at LC - Access 2010

CTS helps make transfers reliable and resilient

Page 62: CTS at LC - Access 2010

reliable

know when you’ve succeeded

Page 63: CTS at LC - Access 2010

BagIt

packing slip for data

Page 64: CTS at LC - Access 2010

data in a Bag

.
|-- bag-info.txt
|-- bagit.txt
|-- data
|   |-- batch.xml
|   |-- batch_1.xml
|   |-- batch_ne_dewitt_rework
|   |   |-- 00206538016_batch.xml
|   |   |-- 00206538028_batch.xml
|   |   `-- sn99021999
|   `-- sn99021999
|       |-- 00206538016
|       |   |-- 0000.jp2
|       |   |-- 0000.pdf
|       |   |-- 0000.tif
|       |   |-- 0000.xml
|       |   |-- 0001.jp2
|       |   |-- 0001.pdf
|       |   |-- 0001.tif
|       |   |-- 0001.xml

Page 65: CTS at LC - Access 2010

identifies a bag

.
|-- bag-info.txt
|-- bagit.txt
|-- data
|   |-- batch.xml
|   |-- batch_1.xml
|   |-- batch_ne_dewitt_rework
|   |   |-- 00206538016_batch.xml
|   |   |-- 00206538028_batch.xml
|   |   `-- sn99021999
|   `-- sn99021999
|       |-- 00206538016
|       |   |-- 0000.jp2
|       |   |-- 0000.pdf
|       |   |-- 0000.tif
|       |   |-- 0000.xml
|       |   |-- 0001.jp2
|       |   |-- 0001.pdf
|       |   |-- 0001.tif
|       |   |-- 0001.xml

Page 66: CTS at LC - Access 2010

where the data starts

.
|-- bag-info.txt
|-- bagit.txt
|-- data
|   |-- batch.xml
|   |-- batch_1.xml
|   |-- batch_ne_dewitt_rework
|   |   |-- 00206538016_batch.xml
|   |   |-- 00206538028_batch.xml
|   |   `-- sn99021999
|   `-- sn99021999
|       |-- 00206538016
|       |   |-- 0000.jp2
|       |   |-- 0000.pdf
|       |   |-- 0000.tif
|       |   |-- 0000.xml
|       |   |-- 0001.jp2
|       |   |-- 0001.pdf
|       |   |-- 0001.tif
|       |   |-- 0001.xml

Page 67: CTS at LC - Access 2010

packing slip

.
|-- bag-info.txt
|-- bagit.txt
|-- data
|   |-- batch.xml
|   |-- batch_1.xml
|   |-- batch_ne_dewitt_rework
|   |   |-- 00206538016_batch.xml
|   |   |-- 00206538028_batch.xml
|   |   `-- sn99021999
|   `-- sn99021999
|       |-- 00206538016
|       |   |-- 0000.jp2
|       |   |-- 0000.pdf
|       |   |-- 0000.tif
|       |   |-- 0000.xml
|       |   |-- 0001.jp2
|       |   |-- 0001.pdf
|       |   |   ...
|-- manifest-md5.txt
`-- tagmanifest-md5.txt

Page 68: CTS at LC - Access 2010

71607ad119be88c842268a76f0b6b9e9 data/sn99021999/00206538107/1884091301/0621.pdf
c602d2ac07508059ce5f5597e239b97f data/sn99021999/00206538120/1885100601/0831.xml
a59795bd1584532d5cbc0b1d82f75cf8 data/sn99021999/00206538016/1880061401/0593.pdf
3c64fac7e2d49671e0d93908ae42a779 data/sn99021999/00206539616/1888101801/0905.xml
03158a560baa7479b3805d2b45ee02cd data/sn99021999/00206538028/1880111501/0405.tif
fa56ea18580e1446939ed62709e5b2db data/sn99021999/00206538077/1883061901/1145.pdf
bf4fb83ff8305e8256970a3466c1a12d data/sn99021999/00206538120/1885061501/0043.pdf
8f3649fc812de74b9d9443ee90a8ac9c data/sn99021999/00206538120/1885111101/1109.tif
e0b83a7f9ca228271fdaecf6348e1cec data/sn99021999/00206538120/1885101201/0871.xml
1c2f84e12792c123ba0aabedd0c0bbad data/sn99021999/00206538107/1884071401/0197.xml
080e557fe9f68037605e5b80df4bc4ac data/sn99021999/0020653820A/1888050701/0543.tif
532efe32c156459d9d9589caf618f502 data/sn99021999/00206538120/1885071401/0250.tif
ce607af59a96f2656d9448f38ffda072 data/sn99021999/0020653820A/1888052801/0731.pdf
60b626d8fd40aca1b425e86a004bb055 data/sn99021999/00206539628/1888111801/0088.xml
a467cd62350334c7aa83cf1e9056c1c6 data/sn99021999/00206539616/1888091701/0629.jp2
1a434f7a4d843a2c8ffe8d0824fafc3f data/sn99021999/00206538028/1880120801/0482.jp2
22996d89b4a3334256afaddcaa0238d8 data/sn99021999/00206538016/1874102001/0259.jp2
36f550da273ad4c592fee1761c98322a data/sn99021999/00206538016/1880052201/0518.jp2
7f7ccec3f2afae896338498372fd476e data/sn99021999/00206539616/1888080101/0200.pdf
c247a5d74d0e7f857c534d935661adbe data/sn99021999/00206538107/1884072601/0286.jp2
4d497a18a154adcc8636239378ab340b data/sn99021999/00206539628/1889021101/0868.pdf
2e8ca2558b54b5c49b2f20a355a60895 data/sn99021999/00206538065/1882092001/0136.xml
fb71493048e5010100f18012f5060d42 data/sn99021999/00206538028/1880123001/0569.xml
40b100432890b055a5defbfbea815d57 data/sn99021999/00206538107/1884090901/0590.xml
46f6d61480dadc1c988b0baa4de8b6c4 data/sn99021999/00206539628/1888122801/0463.pdf
1cb8af0648e8c9df395b63226fe7371f data/sn99021999/00206538016/1874101501/0244.pdf
9257834023c683b02f354888b2740b8f data/sn99021999/00206539616/1888102301/0956.xml
0d52b3b2b1c5459b7e8d500a8566b0bf data/sn99021999/00206538120/1885080801/0425.tif

Page 69: CTS at LC - Access 2010

indicates two things

Page 70: CTS at LC - Access 2010

1

what i think i’m sending you

Page 71: CTS at LC - Access 2010

2

whether you received it

Page 72: CTS at LC - Access 2010

just like a packing slip
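
The receiving side of the packing-slip idea can be sketched in a few lines of Python. This is only an illustration, not the CTS or BagIt Library code: `verify_bag` and `md5_of` are hypothetical names, and the sketch assumes a bag directory holding a `manifest-md5.txt` whose lines are a checksum followed by a relative path, as in the listing on Page 68.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bag(bag_dir):
    """Check every manifest entry; return a list of (path, problem) tuples."""
    bag = Path(bag_dir)
    problems = []
    for line in (bag / "manifest-md5.txt").read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(None, 1)
        target = bag / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif md5_of(target) != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

An empty list means everything the sender listed arrived intact; anything else says which files to ask for again.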

Page 73: CTS at LC - Access 2010

works across space

Page 74: CTS at LC - Access 2010

works across systems

Page 75: CTS at LC - Access 2010

works across orgs

Page 76: CTS at LC - Access 2010

works across time

Page 77: CTS at LC - Access 2010

easy to make

Page 78: CTS at LC - Access 2010

md5deep
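
To show how little is involved in making the manifest, here is a rough Python equivalent of walking a payload directory and hashing each file. It is a sketch under assumptions, not how md5deep or the BagIt Library work internally: `write_manifest` is a hypothetical helper, it reads each file whole (fine for a demo, not for multi-gigabyte TIFFs), and it writes only `manifest-md5.txt`, not the other tag files a complete bag carries.

```python
import hashlib
import os


def write_manifest(bag_dir):
    """Write manifest-md5.txt for every file under <bag_dir>/data.

    Returns the number of files listed.
    """
    entries = []
    data_dir = os.path.join(bag_dir, "data")
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full = os.path.join(root, name)
            with open(full, "rb") as handle:
                checksum = hashlib.md5(handle.read()).hexdigest()
            # Manifest paths are relative to the bag and use forward slashes.
            rel = os.path.relpath(full, bag_dir).replace(os.sep, "/")
            entries.append((rel, checksum))
    entries.sort()  # stable, path-ordered output
    with open(os.path.join(bag_dir, "manifest-md5.txt"), "w") as out:
        for rel, checksum in entries:
            out.write(f"{checksum} {rel}\n")
    return len(entries)
```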

Page 79: CTS at LC - Access 2010

BIL

BagIt Library

Page 80: CTS at LC - Access 2010

Bagger

desktop GUI

Page 81: CTS at LC - Access 2010
Page 82: CTS at LC - Access 2010

BIL is free software

Bagger will be soon

Page 83: CTS at LC - Access 2010

sf.net/projects/loc-xferutils/

Page 84: CTS at LC - Access 2010

see also: BagIt in Wikipedia

edsu++

Page 85: CTS at LC - Access 2010

reliability through bagging

Page 86: CTS at LC - Access 2010

resilience through persistence

Page 87: CTS at LC - Access 2010

verify that copies succeed

Page 88: CTS at LC - Access 2010

know when copies fail

Page 89: CTS at LC - Access 2010

repeat until copies succeed
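
The verify / know when it failed / repeat loop from the last three slides might look roughly like this in Python. It is a sketch, not CTS code: `copy_until_verified` is a hypothetical name, the retry count and wait are arbitrary defaults, and a plain list stands in for wherever the attempts would really be recorded.

```python
import hashlib
import shutil
import time


def file_md5(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_until_verified(src, dst, expected_md5, log, max_attempts=3, wait_seconds=0.0):
    """Copy src to dst, re-trying until the copy's checksum matches.

    Every attempt is appended to `log`, whether it worked or not.
    Returns True once a copy verifies, False if all attempts fail.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            shutil.copyfile(src, dst)
            ok = file_md5(dst) == expected_md5
            detail = "verified" if ok else "checksum mismatch"
        except OSError as err:
            ok, detail = False, f"copy failed: {err}"
        log.append({"attempt": attempt, "ok": ok, "detail": detail})
        if ok:
            return True
        time.sleep(wait_seconds)  # pause before the next attempt
    return False
```

Keeping the failed attempts in the log is the point: it is what makes later debugging and diagnosis possible.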

Page 90: CTS at LC - Access 2010

debug & diagnose

Page 91: CTS at LC - Access 2010

record all of it

Page 92: CTS at LC - Access 2010

know what you have

know what you did

Page 93: CTS at LC - Access 2010

inventory

Page 94: CTS at LC - Access 2010

BagIt checksums in a DB
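
As a rough illustration of what putting manifest checksums in a database buys you, the sketch below loads manifest lines into a table and answers a file-count question with one query. The table layout and the names `open_inventory`, `load_manifest`, and `file_count` are invented for this example, and SQLite stands in for the MySQL database CTS actually uses; the real CTS schema is not shown in these slides.

```python
import sqlite3


def open_inventory(db_path=":memory:"):
    """Open (or create) a tiny inventory database with one table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bag_file ("
        " bag_id TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " md5 TEXT NOT NULL,"
        " PRIMARY KEY (bag_id, path))"
    )
    return conn


def load_manifest(conn, bag_id, manifest_lines):
    """Store each 'checksum path' manifest line; return how many were stored."""
    rows = []
    for line in manifest_lines:
        if line.strip():
            md5, path = line.split(None, 1)
            rows.append((bag_id, path.strip(), md5))
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT OR REPLACE INTO bag_file VALUES (?, ?, ?)", rows)
    return len(rows)


def file_count(conn, bag_id):
    """Return the number of files recorded for one bag."""
    query = "SELECT COUNT(*) FROM bag_file WHERE bag_id = ?"
    return conn.execute(query, (bag_id,)).fetchone()[0]
```

Once the checksums live in a table, questions like "how many files do we hold for this batch?" stop requiring a walk over the storage itself.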

Page 95: CTS at LC - Access 2010

content properties

project, process, type

Page 96: CTS at LC - Access 2010

event timeline

Page 97: CTS at LC - Access 2010

receipt, verification

QR, copies

accept/reject, ingest/release

comments

Page 98: CTS at LC - Access 2010

life cycle of some set of content

Page 99: CTS at LC - Access 2010

basic facts

all the copies

project details

Page 100: CTS at LC - Access 2010

event timeline

Page 101: CTS at LC - Access 2010

comments along the way

Page 102: CTS at LC - Access 2010

life cycle of NDNP batch

Page 103: CTS at LC - Access 2010

two key things

Page 104: CTS at LC - Access 2010

1. automated workflow using jBPM

Page 105: CTS at LC - Access 2010

this part

Page 106: CTS at LC - Access 2010

process definition manages the steps

doesn’t let us forget

Page 107: CTS at LC - Access 2010

2. when content partners call we can answer their questions

Page 108: CTS at LC - Access 2010

reporting

answering ourown questions

Page 109: CTS at LC - Access 2010

annual reports

very important

Page 110: CTS at LC - Access 2010

file counts

overall size

etc.

Page 111: CTS at LC - Access 2010

used to be very difficult to determine

Page 112: CTS at LC - Access 2010

now: immediate, anytime

Page 113: CTS at LC - Access 2010
Page 114: CTS at LC - Access 2010

mostly NDNP

newer partners

Page 115: CTS at LC - Access 2010

also project reporting / planning

Page 116: CTS at LC - Access 2010

NDNP batches - one awardee

Page 117: CTS at LC - Access 2010

NDNP batches - all awardees (same data, CSV export)

Page 118: CTS at LC - Access 2010

provides 5000’ view

Page 119: CTS at LC - Access 2010

workflow

Page 120: CTS at LC - Access 2010
Page 121: CTS at LC - Access 2010
Page 122: CTS at LC - Access 2010

working status at a glance

Page 123: CTS at LC - Access 2010

a personalized view

Page 124: CTS at LC - Access 2010

overview of a whole project

Page 125: CTS at LC - Access 2010

overview of a system

overview of a person

Page 126: CTS at LC - Access 2010

not exactly “Facebook for bags”

but kinda

Page 127: CTS at LC - Access 2010

but wait, there’s more

Page 128: CTS at LC - Access 2010

browse live copies

Page 129: CTS at LC - Access 2010
Page 130: CTS at LC - Access 2010

go right to the content

Page 131: CTS at LC - Access 2010

many benefits

Page 132: CTS at LC - Access 2010

aaaand...

a RESTy web API

Page 133: CTS at LC - Access 2010

we can build complex workflows

with inventory and reporting in CTS

Page 134: CTS at LC - Access 2010

we can build QR/workflow/auditing outside of CTS

with inventory and reporting through CTS

Page 135: CTS at LC - Access 2010

CTS: java, spring, mysql, hibernate, velocity, tiles, jquery, jBPM, jetty

Page 136: CTS at LC - Access 2010

NDNP: python, django, mysql, solr, apache

Page 137: CTS at LC - Access 2010

nice clean interfaces

nice separation

Page 138: CTS at LC - Access 2010

different coders, different styles

Page 139: CTS at LC - Access 2010

same benefits from using CTS

Page 140: CTS at LC - Access 2010

what’s next?

Page 141: CTS at LC - Access 2010

many more content collections

Page 142: CTS at LC - Access 2010

now:

NDNP, Web Archives

NDIIPP, Copyright Cards

Page 143: CTS at LC - Access 2010

next:

P&P, G&M, WDL, AFC

Twitter, Copyright EDeposit

Page 144: CTS at LC - Access 2010

also coming:

more simple workflows

Page 145: CTS at LC - Access 2010

“Receive and Copy”

Page 146: CTS at LC - Access 2010

fits many use cases

receive

bag/verify

copy to archival

copy to access
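
The four steps above can be pictured as an ordered list of actions where each outcome is recorded and the run stops at the first failure. The Python below is only a toy model of that idea: CTS defines its workflows in jBPM, and `run_receive_and_copy`, the step names, and the timeline record layout here are all hypothetical.

```python
from datetime import datetime, timezone


def run_receive_and_copy(bag_id, steps, timeline):
    """Run (name, action) steps in order for one bag.

    Each outcome is appended to `timeline`; the run stops at the
    first step that raises. Returns True only if every step succeeded.
    """
    for name, action in steps:
        try:
            action(bag_id)
            outcome = "ok"
        except Exception as err:  # record the failure instead of losing it
            outcome = f"failed: {err}"
        timeline.append(
            {
                "bag": bag_id,
                "step": name,
                "outcome": outcome,
                "at": datetime.now(timezone.utc).isoformat(),
            }
        )
        if outcome != "ok":
            return False
    return True
```

A caller would pass something like `[("receive", receive), ("bag/verify", verify), ("copy to archival", to_archival), ("copy to access", to_access)]`, where each callable is whatever does that step locally.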

Page 147: CTS at LC - Access 2010

works for recon

works for new stuff

Page 148: CTS at LC - Access 2010

and, get past typical problems

permissions

insufficient storage

failed copies

Page 149: CTS at LC - Access 2010

connection with high expectation

Page 150: CTS at LC - Access 2010

and, finally

a UI redesign

Page 151: CTS at LC - Access 2010

thanks!

Page 152: CTS at LC - Access 2010

BagIt - wikipedia

sf.net/projects/loc-xferutils/

hooray for protovis

@dchud - dchud at loc gov