Page 1:

SAN DIEGO SUPERCOMPUTER CENTER

The NDIIPP/SDSC Partnership

David Minor
SDSC, UC San Diego
minor@sdsc.edu

Robert H. McDonald
SDSC, UC San Diego
mcdonald@sdsc.edu

Ardys Kozbial
UC San Diego Libraries, UC San Diego

Page 2:

Outline

• SDSC and Big Data (Mass Storage) (David)

• Current NDIIPP Projects (Robert)

• Overview of SDSC/LC Data Center Pilot (David)

• Overview of SDSC Digital Preservation (Robert)

• Points for Discussion and Feedback (David/Robert/Ardys)

Page 3:

SDSC at a Glance

• Original NSF supercomputer center (1985)

• Supports 3 High Performance Computing Systems

• Supports Data Applications for Science, Engineering, Social Sciences, Cultural Heritage Institutions

• 200 TB Disk Storage

• 25 PB Tape Storage

Page 4:

Data Structure at SDSC

Page 5:

SDSC and Data Cyberinfrastructure (aka Big Data)

• The mission of the San Diego Supercomputer Center (SDSC) is to empower communities in data-oriented research, education, and practice through the innovation and provision of Cyberinfrastructure

Cyberinfrastructure = resources (computers, data storage, networks, scientific instruments, experts, etc.) + “glue” (integrating software, systems, and organizations).

Virtually all modern research and education efforts are enabled by information and computational infrastructure

Page 6:

SDSC/NDIIPP COLLABORATION

NDIIPP/NSF DigArch

Data Center Pilot

Technical Architecture

Chronopolis

Page 7:

Data Center for Library of Congress Digital Holdings: A Pilot Project

Library of Congress: Office of Strategic Initiatives (National Digital Information Infrastructure and Preservation Program)

University of California, San Diego: San Diego Supercomputer Center and UCSD Libraries

Page 8:

Project Overview: “Building Trust in a Third Party Data Repository”

- Pilot project to be completed in 1 year

- Slightly less than $1 million

- Transfer, store and study multiple TBs of data

“… demonstrate the feasibility and performance of current approaches for a production digital Data Center to support the Library of Congress’ requirements.”

Page 9:

Data Collection: Prints and Photographs Division

Prokudin-Gorskii Photographs

http://www.loc.gov/exhibits/empire/

Page 10:

Data Collection: Prints and Photographs Division

Characteristics of the collection

• Different file types based on the original pieces

• Recreations of projections, based on files

• Unique file structure based on years of ad hoc storage

In many ways, a good example of digital memory: extending the lifespan and accessibility of a traditional collection using digital mechanisms.

Page 11:

Data Collection: Prints and Photographs Division

What are we doing with the collection?

• Providing a replica of their production environment

• Providing a new front end

• Providing extensive logging and monitoring

• Tasks accomplished using SRB

Page 12:

Data Collection: Web Archiving and Preservation Project

Characteristics of the collection

A living snapshot of this moment in history. These “documents” exist nowhere else.

• 6 TB of “born digital” materials

• Library had never indexed this much at once

• Special file format and software installations

Page 13:

Data Collection: Web Archiving and Preservation Project

What are we doing with the collection?

• Indexed all data by re-writing indexing software – took it from 30+ days of compute time to 7 days

• Installed and configured Wayback web access to replicate their environment

• Performed usability studies comparing our two sites.

Page 14:

Early Findings of Pilot

Page 15:

NDIIPP/SDSC Partnership: A MetaPartnership 2007-2008

Library of Congress: Office of Strategic Initiatives (National Digital Information Infrastructure and Preservation Program)

Chronopolis: San Diego Supercomputer Center, UCSD Libraries, National Center for Atmospheric Research, University of Maryland

California Digital Library

Inter-university Consortium for Political and Social Research

Page 16:

UC SAN DIEGO LIBRARIES

What Is Chronopolis?

Chronopolis:

• is a geographically distributed preservation environment that supports long-term management and stewardship of digital collections

• is implemented by developing and deploying a distributed data grid, and by supporting its human, policy, and technological infrastructure.

• includes technology forecasting and migration in support of long-term life-cycle management of the dedicated preservation environment.

[Figure: Digital Collections of Long-Term Value; Production Preservation Environment; Technology Forecasting and Migration; Administration, Policy, Outreach]

Page 17:

Chronopolis Vision

• Assessment of the needs of potential user communities and development of appropriate service models

• Development of roles and responsibilities of providers, partners, users

• Development of Memoranda of Understanding (MOUs), Service Level Agreements (SLAs), etc. to formalize trust relationships and manage expectations

• Assessment and prototyping of best practices for bit preservation, authentication, metadata, etc.

• Development of appropriate cost and risk models for long-term preservation

• Development of appropriate success metrics to evaluate usefulness, reliability, and usability of infrastructure

Page 18:

Who Is Chronopolis?

• Chronopolis is being developed by a national consortium led by SDSC and the UCSD Libraries.

• Initial Chronopolis provider sites include:

• SDSC and UCSD Libraries at UC San Diego

• University of Maryland

• National Center for Atmospheric Research (NCAR) in Boulder, CO

Page 19:

Chronopolis Demonstration Data Grid

[Figure: Chronopolis Provider Sites: SDSC/UCSDL, Univ. of Maryland, National Center for Atmospheric Research (NCAR)]

• The Chronopolis demonstration Data Grid is composed of 3 geographically distributed Chronopolis provider sites.

• Each provider takes on different roles with respect to a set of demonstration collections.

Demonstration collections include:

• National Virtual Observatory (NVO) [1 TB Digital Palomar Observatory Sky Survey]

• Copy of Inter-university Consortium for Political and Social Research (ICPSR) data [1 TB Web-accessible Data]

• NCAR Observational Data [3 TB of Observational and Re-Analysis Data]

Page 20:

2007-2008 Next Steps

Page 21:

Chronopolis Support for ICPSR for NDIIPP Collaboration

Develop automatic ingest mechanism and ingest extant ICPSR holdings (12-14 TB) into Chronopolis environment.

Provide mass storage for all ICPSR holdings within the Chronopolis preservation environment and standard auditing.

Enhance current standard agreements (MOU and SLA) with ICPSR.

Content authentication at ingest for initial collection verification.
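
The slides do not say how content authentication at ingest is carried out. A minimal sketch of one common approach, assuming the depositor supplies a checksum manifest that is recomputed after transfer, might look like the following. The function names, the choice of SHA-256, and the manifest layout are all illustrative assumptions, not the Chronopolis implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_ingest(manifest: dict[str, str], ingested_root: Path) -> dict[str, list[str]]:
    """Compare a depositor's manifest (hypothetical format) with what arrived."""
    received = build_manifest(ingested_root)
    return {
        # Listed by the depositor but absent after transfer.
        "missing": sorted(set(manifest) - set(received)),
        # Present after transfer but never listed by the depositor.
        "unexpected": sorted(set(received) - set(manifest)),
        # Present on both sides with differing digests.
        "corrupt": sorted(
            name
            for name in manifest.keys() & received.keys()
            if manifest[name] != received[name]
        ),
    }
```

An empty report in all three categories would indicate that the ingested copy matches what the depositor described. The same manifest could later serve as the baseline for routine audits.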

[Figure: NDIIPP Stewardship Network diagram, grouping services under NETWORK SERVICES and ORGANIZATIONAL FUNCTIONS, with a legend marking Chronopolis services used by ICPSR and Chronopolis services not used by ICPSR]

Page 22:

Chronopolis Support for CDL (Web at Risk) for NDIIPP Collaboration

Develop tool and methodology for ingest of select content from CDL Web at Risk Collections to Chronopolis grid environment.

Develop automated state metadata for collections ingested within Chronopolis for mapping to PREMIS standards.

Provide mass storage for select CDL Web at Risk Collections within the Chronopolis preservation environment and standard auditing.

Develop standard agreements (MOU and SLA) with CDL.

Content authentication at ingest for initial collection verification.
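
The "standard auditing" mentioned above is not described further in the slides. One possible reading, sketched here under the assumption that every provider site holds a full copy of each object (the slides do not state this), is a periodic comparison of per-object digests across sites, with the majority value treated as correct. The function name `audit_replicas`, the input layout, and the digest strings in the usage note are hypothetical.

```python
from collections import Counter


def audit_replicas(digests_by_site: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return, per object, the sites whose digest disagrees with the majority.

    digests_by_site maps a site name to {object_id: digest}. A site that
    lacks an object counts as disagreeing. When no digest is held by a
    strict majority of sites, every site is listed for that object.
    """
    sites = sorted(digests_by_site)
    object_ids = set().union(*(d.keys() for d in digests_by_site.values()))
    problems: dict[str, list[str]] = {}
    for object_id in sorted(object_ids):
        observed = {site: digests_by_site[site].get(object_id) for site in sites}
        counts = Counter(v for v in observed.values() if v is not None)
        digest, votes = counts.most_common(1)[0]
        if votes * 2 <= len(sites):
            # No strict majority: no copy can be singled out as good.
            problems[object_id] = list(sites)
            continue
        suspects = [site for site in sites if observed[site] != digest]
        if suspects:
            problems[object_id] = suspects
    return problems
```

For example, with three sites labelled `SDSC`, `UMD`, and `NCAR`, an object whose digest differs only at `UMD` would be reported as `{"obj": ["UMD"]}`, which tells an operator which copy to repair from the other two.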

[Figure: NDIIPP Stewardship Network diagram, grouping services under NETWORK SERVICES and ORGANIZATIONAL FUNCTIONS, with a legend marking Chronopolis services used by CDL and Chronopolis services not used by CDL]

Page 23:

Identified Technical Architecture Issues

LC/SDSC/UCSDL Pilot (Identified TechArch Areas)

• Content Transfer

• Network

• Integrity

• Verification

• Provenance

• Mass Storage

• Web Crawls

• Aggregated Domain Collections

• Software (Middleware)

• Formalized Trust

• “Contract Services”

NDIIPP SDSC Partnership (In Development)

• Content Transfer

• Upfront Network

• Tape to Tape (No No)

• Disk Staging

• Mass Storage

• Diverse WG around TechArch (Translators)

• Software (Middleware)

• Formalized Trust

Page 24:

Points for Discussion

• Content Transfer

• Storage and Scale

• Software – Middleware – Lock-in

• Human Resources

• Federations for Success (Formalized Trust)

Page 25:

Trends across NDIIPP

Page 26:

SDSC and NDIIPP

• We appreciate the opportunity to be involved in the NDIIPP partner network and its important work

• Goal for today: Discover technical architecture needs of NDIIPP Partners.

Page 27:

Page 28:

Other Slides follow

Page 29:

The NDIIPP Collaboration

• SDSC Support for Multiple Levels of NDIIPP Collaboration

• NDIIPP DigArch Program

• Digital Object Lifecycle for Video Broadcast

• LC DATA Pilot Project

• Content Transfer

• Formalized Trust Relationships

• NDIIPP Technical Architecture

• Partnerships with:

• CDL

• ICPSR

• Support for Digital Preservation

• Chronopolis