The National Archives and Records Administration (NARA) Electronic Records Archives (ERA)


ERA Misconceptions and Facts

• ERA is not yet operational.

– NARA relies on ERA every day to preserve and provide access to electronic records, and ERA has been operating since 2008. See About the ERA Program for a description of the different instances of ERA that have been deployed since 2008.

• ERA is a paper digitization and scanning project.

– ERA is primarily an archive for "born-digital" electronic records created in the course of business by Federal agencies on their computer systems. However, ERA does have the capability to preserve and make available the electronic data resulting from scanning projects that digitize Federal records. As more and more analog Federal records are digitized by NARA and its Digitizing Partners, as well as other Federal agencies, they will be available using ERA's Online Public Access interface.

• Developing ERA will cost the American taxpayer a projected $1.4 Billion.

– The appropriation for the development phase of the ERA project, including development, program management, and operations and maintenance costs from 2002 to 2011, will total $457 Million. Development will conclude on September 30, 2011. Operations and maintenance costs thereafter are expected to be $25 Million to $30 Million per year for at least the next couple of years.

• ERA is just an electronic data storage archive. Why couldn't NARA just buy storage devices at a technology superstore like everyone else does?

– The requirement to provide a true digital archive for the National Archives that is capable of applying all the laws and regulations that apply to Federal, Presidential, and Congressional records means that ERA is far more complicated than just a set of data storage devices. For example, ERA also provides workflow support for many of the transactions that occur between NARA and its agency customers, capabilities to process and preserve electronic records, and an access interface to make electronic records available to the public.

• ERA can't solve the problem of long-term preservation of electronic records as hardware and software technology changes over time.

– ERA allowed NARA to make a quantum leap forward in the preservation of electronic records and to build a flexible and adaptable framework that will let NARA evolve as electronic recordkeeping evolves. Without ERA, NARA's legacy systems and processes for electronic records would not be able to handle the increasing volumes we are beginning to encounter.

• The public will be able to access and conduct full-text searches against all of the electronic records in ERA.

– The Online Public Access (OPA) piece of ERA does provide access and full-text search capabilities for electronic records that have been determined to be free of any access restrictions, and that contain text-based content. Many of the transfers of electronic records NARA receives have some access restrictions that prohibit NARA from releasing them to the public. NARA is required to protect sensitive information about individuals, for example, and it must conduct time-consuming reviews of these records to determine which records can be released and which need to be withheld under applicable laws and regulations. Additionally, all Presidential electronic records are subject to the Presidential Records Act and the access provisions therein. Therefore, while OPA does provide the capability to search the content of electronic records, that capability will not apply to all electronic records in the ERA System.

• ERA will "normalize" all electronic records into one single format.

– While ERA is implementing a Transformation Framework that will allow for the conversion of electronic records from one format to another more persistent format, many of the electronic records will remain in their original format until such time as the format approaches obsolescence. Many of the formats we encounter are ubiquitous, and can be rendered easily by the most popular, and often freely available, software programs. NARA will continually evaluate the formats we receive from Federal agencies to make sure the essential characteristics of electronic records in our holdings are preserved.

ERA's history

• 1969-1993: Formation, stagnation, then rejuvenation
• 1993: Armstrong v. The Executive Office of the President
• 1998: Transition to e-Government supports an archives of the future
• 2000: Establishment of the ERA Program Management Office
• 2004: Seeking a development contractor
• 2008: Initial Operating Capability
• 2009-2011: Last phases of development

ERA - Electronic Records Archives

http://www.archives.gov/era/

The National Archives and Records Administration (NARA)

Archives II, College Park, MD

Hubert Wajs, Ph.D.

Toruń, 2007.12.06

• Project history (Timeline)
• Project organization
– Resources (funding)
– Partners
• Project goals
• Problems and challenges

Timeline I

• 1997.12: John W. Carlin, Archivist of the United States, established the Electronic Records Work Group.
• 1998: NARA received its first government funds, from the National Science Foundation, for this working group to begin research on electronic archives.
• 2000: The ERA Project Office was created within NARA to conduct research on the preservation of electronic records (designing the ERA system).
• 2002.10.31: NARA Directive 101, Part 3, Section 6. The ERA Program officially began.
• 2005.08: NARA selected Lockheed Martin Corporation to build the ERA system.
• 2011: Full operating capability of the ERA system.

Funding

YEAR   MILLION USD   NOTES
1999   1.8           for 3 years
2001   20            President
2002   22.3          Congress
2003   12
2004   36            President
2005   308           Congress, through 2012

Partnerships in innovation - research partnerships

• Leading IT institutes and institutions
– National Institute of Standards and Technology,
– NASA,
– Army Research Laboratory,
– San Diego Supercomputer Center,
– Georgia Tech Research Institute,
– National Center for Supercomputing Applications,
– the University of Maryland
• And the 'archival' ones
– INTERPARES,
– The Library of Congress

GOALS

What is ERA?

ERA Vision (2004)
• ERA will authentically preserve and provide access to any kind of electronic record, free from dependency on any specific hardware or software, enabling NARA to carry out its mission into the future.

08.2006
• ERA will be a comprehensive system for preserving and providing continuing access to any type of electronic records created anywhere in the U.S. Federal Government, enabling NARA to carry out its mission into the future.

[Jarrellann Filsinger]

And what about "authentically preserve records"?

• Identification and Authentication
– A password
– A token
– A fingerprint
• Access control
– Roles to perform particular functions
– No more privilege than necessary to perform a job
• Audit log
• Integrity
– Message digest
– Virus detection
– Encryption
• System assurance
– Policies, standards, procedures
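The "message digest" item under Integrity can be illustrated with a minimal Python sketch. It assumes SHA-256 as the digest algorithm, and the helper names `compute_digest` and `verify_integrity` are hypothetical; the slides do not say which algorithm or tooling ERA actually uses.

```python
import hashlib


def compute_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a record's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_integrity(data: bytes, stored_digest: str) -> bool:
    """Check that a record still matches the digest stored at ingest time."""
    return compute_digest(data) == stored_digest


# Illustrative record content (assumption: any byte string would do).
record = b"Thank you very much for your letter of March 15, 1990"
digest_at_ingest = compute_digest(record)

unchanged = verify_integrity(record, digest_at_ingest)
tampered = verify_integrity(record + b"!", digest_at_ingest)
print(unchanged, tampered)
```

Storing the digest alongside the record lets an archive detect any later change to the bytes, which is one building block of the integrity claim; it does not by itself prove who created the record.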

ERA Objectives by David Lake 08.2006

• To preserve any type of electronic records,
• created using any type of application,
• on any computing platform,
• delivered on any digital media
• from any entity in the Federal Government and any donor,
• to provide discovery and delivery to anyone with an interest and legal right of access,
• Now and for the "Life of the Republic".

What for? In other words, why?

• The U.S. Code adopted an official definition of a record (44 U.S.C. 3301), stating that, regardless of medium, all documents related to the activities of the administration that are preserved or designated for preservation are evidence.
• 1998.10.22 - General Records Schedule 20 (GRS 20)
• 2000.03.20 - Supreme Court ruling

The Presidential Projects

• The Tennessee Valley Authority: Electricity for All
– 1933.05.18 FDR - Tennessee Valley Authority Act.
  • TVA was to improve navigability on the Tennessee River.
• 1942 MED - Manhattan Engineer District
– The Manhattan Project
• The Apollo Project
– 1961.05.25 President Kennedy delivering his famous Moon speech
  • "... I believe this nation should commit itself to achieving the goal, before this decade is out, of landing a man on the Moon and returning him safely to the Earth."
– 1969.07.20 Apollo 11
• The Independence Project
– 1973.11 President Richard M. Nixon prescribed an antidote for the energy crisis
• ERA - 'the showcase of American research'

NARA - Experience gained

• The Electronic and Special Media Records Services Division (NARA) has been collecting electronic records for 25 years.
• Holdings – 15 TB
– tapes – DLT 8000
– IBM 3480
– also paper documentation:
  • technical documentation of what is in the databases
  • documentation of the records-handling process

Growing amount of data

1993.01.20 - 2001.01.20

http://www.clintonlibrary.gov/

• E-mails taken over from the White House – 40 million objects.
• Cables (electronic diplomatic messages) from the Department of State – 25 million (not yet made available).
• Snapshots of the websites of the main government agencies (60% of all federal agencies) and

How to get this under control?

• How to
– describe it in the traditional way?
– carry out checking and migration to new media and formats?
– fit it into the storage rooms?
• The approach taken so far leads to a dead end, because the following are becoming ever more common:
– e-mail and
– document workflow systems.
• And on top of these will come:
– (digital) film recordings,
– high-definition television (HDTV),
– 3D models (especially VMR – Virtual Model Reality),
– computer-aided engineering documentation (CAD systems),
– GIS (Geographical Information Systems).

The Electronic Records Archives

• The ERA Program is not a simple continuation of past experience.
• The ERA system is meant to:
– accept ("Submission"),
– store ("Repository"), and
– make available ("Dissemination") electronic records;
– it is also meant to support NARA's management of various types of electronic records throughout their entire lifecycle.

Reference Model for an Open Archival Information System (OAIS) - ISO 14721:2003

• Digital repository
– Submission Information Packages – SIPs
– Archival Information Packages – AIPs
– Dissemination Information Packages – DIPs
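The relationship between the three OAIS package types can be sketched as a small data model. This is an illustration only: the class fields and the `ingest` and `disseminate` functions are assumptions made for the example, not part of ISO 14721 or of the ERA design.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what the producer hands over."""
    producer: str
    files: dict[str, bytes]


@dataclass
class AIP:
    """Archival Information Package: what the repository preserves."""
    identifier: str
    files: dict[str, bytes]
    preservation_metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class DIP:
    """Dissemination Information Package: what a user receives."""
    identifier: str
    files: dict[str, bytes]


def ingest(sip: SIP, identifier: str) -> AIP:
    """Turn a submission into an archival package, recording its producer."""
    return AIP(identifier, dict(sip.files), {"producer": sip.producer})


def disseminate(aip: AIP, requested: list[str]) -> DIP:
    """Build an access copy containing only the requested files."""
    return DIP(aip.identifier, {name: aip.files[name] for name in requested})


# Hypothetical producer, file names, and identifier, for illustration only.
sip = SIP("Example Agency", {"memo.txt": b"text", "data.csv": b"a,b"})
aip = ingest(sip, "aip-0001")
dip = disseminate(aip, ["memo.txt"])
print(sorted(aip.files), sorted(dip.files))
```

The point of the separation is that what is submitted, what is preserved, and what is delivered can differ, while all three remain traceable to the same archival identifier.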

Challenges

• Holdings growing at a geometric rate
• Pace of adoption of innovations
– Media
– Hardware
– Software
• Anticipating future (3-5 years) developments in:
– methods of communication
– media

Persistent Archives Project

• Grid
– NARA +
– University of Maryland +
– San Diego Supercomputer Center
• a conceptual space for studying data objects
– holdings distributed across different locations
– records management independent of the hardware and software platform
  • hardware: IBM, DELL, SUN, APPLE
  • software: SUN, UNIX, LINUX, WINDOWS

Authenticity: linking of identity metadata to a record

– Date record is made
– Date record is transmitted
– Date record is received
– Date record is filed
– Name of author (person or organization issuing the record)
– Name of addressee (person or organization for whom the record is intended)
– Name of writer (person or organization responsible for the record content)
– Name of originator (e-mail address of sender)
– Name of recipient(s) (person or organization to whom the record is sent)
– Name of creator (person or organization in whose archival fonds the record exists)
– Name of action matter (transaction or activities in the course of which the record is created)
– Name of documentary form (e-mail, report, memo etc.)
– Identification of digital components
– Identification of attachments (digital signature)
– Archival classification code
– Assertion about the creation of record
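A subset of the identity metadata listed above can be modelled as a simple record structure. The sketch below is hypothetical: the field names, the choice of which elements to include, and the `missing_elements` helper are assumptions for illustration, not a schema defined in the slides.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class IdentityMetadata:
    """A few of the identity elements from the list (illustrative subset)."""
    date_made: Optional[str] = None
    date_transmitted: Optional[str] = None
    date_received: Optional[str] = None
    author: Optional[str] = None
    addressee: Optional[str] = None
    writer: Optional[str] = None
    documentary_form: Optional[str] = None
    classification_code: Optional[str] = None


def missing_elements(meta: IdentityMetadata) -> list[str]:
    """List the identity elements that have not been filled in yet."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) is None]


# Values taken from the sample letter shown later in the slides.
letter = IdentityMetadata(
    date_made="1990-03-27",
    addressee="Ray Allen",
    writer="Doug Wead",
    documentary_form="letter",
)
print(missing_elements(letter))
```

A check of this kind shows how an archive could flag records whose identity metadata is still incomplete before asserting their authenticity.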

Using a DATA GRID I

1. User asks for data from the data grid
2. The data is found and returned
   – Where and how details are hidden

[Diagram: request (1) and response (2) between the user and the DATA GRID]

Using a DATA GRID II

1. User asks for data from the data grid
2. Data request goes to Storage Resource Broker (SRB)
3. Server looks up data in Metadata Catalog
4. Catalog tells which SRB server has data
   – Data grid has arbitrary number of servers (addresses and logical file names)
   – Heterogeneity of data (formats) is hidden from users
5. 1st server asks 2nd server for data
6. The data is found and returned
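The six steps above can be sketched as a toy simulation. This is not the real Storage Resource Broker API: the `SRBServer` class, the dictionary standing in for the Metadata Catalog, and the server and file names are all assumptions made for illustration.

```python
class SRBServer:
    """Toy stand-in for a Storage Resource Broker server (hypothetical API)."""

    def __init__(self, name):
        self.name = name
        self.storage = {}  # logical file name -> bytes held on this server
        self.peers = {}    # server name -> SRBServer it can forward to

    def get(self, logical_name, catalog):
        # Steps 3-4: look up in the catalog which server holds the data.
        holder = catalog[logical_name]
        if holder == self.name:
            return self.storage[logical_name]
        # Step 5: the first server asks the holding server for the data.
        return self.peers[holder].get(logical_name, catalog)


# The metadata catalog maps logical file names to server names.
catalog = {"/collection/email/0001": "srb-2"}

first, second = SRBServer("srb-1"), SRBServer("srb-2")
first.peers["srb-2"] = second
second.storage["/collection/email/0001"] = b"example record bytes"

# Steps 1-2 and 6: the user asks one server and gets the data back,
# without knowing where it is physically stored.
data = first.get("/collection/email/0001", catalog)
print(data)
```

The user only ever supplies a logical file name; which server answers, and in what storage system the bytes live, stays hidden behind the catalog lookup.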

Virtualization

• Logical arrangement of digital records
• Persistent identifier for the record is the logical file name.
• Arrangement hierarchy imposed on the logical file name as collection hierarchy (record group, record series, file, item), with Life Cycle Data Requirements Guide attributes associated with each level of the collection hierarchy.
• Information about all operations performed upon a digital record is mapped to the logical file name.
• Logical file name is the link between authenticity information and the record.
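The idea of a logical file name that both encodes the collection hierarchy and anchors the history of operations can be sketched as follows. The path syntax, the example name, and the helper functions are hypothetical; the slides do not specify how logical names are actually formed.

```python
from collections import defaultdict

# Hierarchy levels named on the slide, from broadest to narrowest.
LEVELS = ("record_group", "record_series", "file", "item")


def parse_logical_name(logical_name: str) -> dict:
    """Split a slash-separated logical file name into hierarchy levels."""
    parts = logical_name.strip("/").split("/")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} levels, got {len(parts)}")
    return dict(zip(LEVELS, parts))


# Every operation on a record is logged under its logical file name.
operations_log = defaultdict(list)


def record_operation(logical_name: str, operation: str) -> None:
    """Attach an operation to the record's persistent identifier."""
    operations_log[logical_name].append(operation)


name = "/rg-example/series-01/file-0001/item-0001"  # made-up identifier
record_operation(name, "ingested")
record_operation(name, "migrated to new storage")

print(parse_logical_name(name))
print(operations_log[name])
```

Because the log is keyed by the logical name rather than by a physical location, the record's history survives moves between servers or media.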

NARA and Presidential Libraries

Federation of five independent Data Grids

[Diagram: five data grids (NARA I, NARA II, SDSC, GT, UMd), each with its own MCat and SRB; TL]

E-records

• Project PERPOS (Presidential Electronic Records PilOt System)
• E-mails from George Herbert Walker Bush's White House (1989-1993)
• William Underwood from Georgia Tech – Information Technology & Telecommunications Laboratory
– Semantic technologies
  • Information extraction
  • Named entity task
– Automation of search and 'description' of e-holdings
– metadata

White House Correspondence

March 27, 1990

Dear Mr. Allen

Thank you very much for your letter of March 15, 1990 which stated your concerns and suggestions regarding the Americans with Disabilities Act.

In order to fulfill President Bush's campaign promise of bringing Americans with handicaps into the mainstream of American life, the Bush Administration supports the objectives of the A.D.A.

As you may know, the bill is still in House Committee for consideration and change. You can be sure that your thoughts have been fully noted and are appreciated.

Sincerely,
Doug Wead
Special Assistant to the President for Public Liaison

Ray Allen, President
American Cultural Traditions
P.O. Box 1995
Washington, D.C. 20013

Named Entities Extracted from the Letter

<date>March 27, 1990</date>
<greeting>Dear</greeting><person>Mr. Allen</person>
<p>Thank you very much for your letter of <date> March 15, 1990</date> which stated your concerns and suggestions regarding the Americans with Disabilities Act.</p>
<p>In order to fulfill <name>Bush's</name> campaign promise of bringing Americans with handicaps into the mainstream of American life, the Bush Administration supports the objectives of the A. D. A. </p>
<p>As you may know, the bill is still in <organization>House Committee</organization> for consideration and change. You can be sure that your thoughts have been fully noted and are appreciated.</p>
<formula of respect>Sincerely</formula of respect>
<person>Doug Wead</person>
<job title> Assistant to the President for Public Liaison</job title>
<person> Ray Allen</person>
<job title>President</job title>
<organization> American Cultural Traditions </organization>
<postal address>P.O. Box 1895 </postal address>
<location>Washington, D.C. </location>
<zipcode>20013</zipcode>
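To give a feel for how tags like those above might be produced automatically, here is a deliberately simple Python sketch that marks up only dates and ZIP codes using regular expressions. It is not the PERPOS method, which the slides describe only as "semantic technologies"; the patterns and the `tag_entities` function are assumptions for illustration.

```python
import re

# Matches dates written as "Month D, YYYY", the form used in the letter.
DATE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)
# Matches five-digit ZIP codes.
ZIPCODE = re.compile(r"\b\d{5}\b")


def tag_entities(text: str) -> str:
    """Wrap recognized dates and ZIP codes in entity tags."""
    text = DATE.sub(lambda m: f"<date>{m.group(0)}</date>", text)
    return ZIPCODE.sub(lambda m: f"<zipcode>{m.group(0)}</zipcode>", text)


sample = "your letter of March 15, 1990 ... Washington, D.C. 20013"
tagged = tag_entities(sample)
print(tagged)
```

Entities such as persons, organizations, and job titles cannot be found with fixed patterns like these; they require the kind of information-extraction techniques the project was investigating.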

Producer - Archive Workflow Network (PAWN)

• transfer of digital objects from the producer (from its current environment) to the archive (understood as in the OAIS model)
• METS standard (Metadata Encoding and Transmission Standard)
– 'push model' – when the producer prepares the data for transfer
– 'pull model' – when the archive itself takes the data over.
– A digital signature (PKI) is used when transferring data

What has been done so far?

• Preparing the system required an enormous amount of preliminary work:
• groups of experts from each of 18 government agencies defining what the system should do ('what?'),
– 'What?' should ERA do - not 'how?'
• detailed models, procedures, and requirements (about 850 requirements were defined in total).
– Concept of Operations
• Project documentation is available at: (http://www.archives.gov/era/about/documentation.html)
• The system is to have an open, modular architecture based on open source, for the U.S. federal administration
• Two-stage tender:
– (stage 1) selection of two companies, working separately on the design (1 year)
– (stage 2) selection of the final winner, Lockheed Martin Corporation, which will build the ERA system (2005.09.05)

Life cycle of the records

[Diagram labels: Office, Records manager, Archives, Archivist; time axis: Creation, Semi active, Appraisal and selection; ACCESS]

Life cycle of the records in ERA

[Diagram labels: Office, ERMS, Archives - ERA, PAWN, Archivist ???; records shown as 010101; time axis: Creation, Semi active ???, Appraisal and selection ???; ACCESS]