Discover Data Portal

Jeff Sale and Diane Baxter, San Diego Supercomputer Center, University of California, San Diego

Transcript of Discover Data Portal

Page 1: Discover Data Portal

Jeff Sale and Diane Baxter

San Diego Supercomputer Center

University of California, San Diego

Page 2: Discover Data Portal

Data: Evidence to unravel the mysteries of our Universe

Page 3: Discover Data Portal

Data provide answers to our children’s single most persistent question:

How do you know that?

Page 4: Discover Data Portal

Data Come From Every Field . . .

Astronomy

Physics

Life Sciences

Modeling and Simulation

Data Management and Mining

GAMESS

Geosciences

Page 5: Discover Data Portal

And are shared around the world.

SDSC

PRAGMA: Pacific Rim Grid Middleware Consortium

TeraGrid: National Research Resource Grid

GEON: Geosciences Grid

BIRN: Biomedical Informatics Grid

Open Science Grid: Physics-driven Grid infrastructure

NEES: Earthquake Engineering Grid

Using high-performance network connections, the TeraGrid integrates high-performance computers, data resources and tools, and high-end experimental facilities around the country. These integrated resources include more than 102 teraflops of computing capability and more than 15 petabytes (quadrillions of bytes) of online and archival data storage with rapid access and retrieval over high-performance networks. Through the TeraGrid, researchers can access over 100 discipline-specific databases. With this combination of resources, the TeraGrid is the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research.

TeraGrid is coordinated through the Grid Infrastructure Group (GIG) at the University of Chicago, working in partnership with the Resource Provider sites: Indiana University, Oak Ridge National Laboratory, National Center for Supercomputing Applications, Pittsburgh Supercomputing Center, Purdue University, San Diego Supercomputer Center, Texas Advanced Computing Center, and University of Chicago/Argonne National Laboratory.

Page 6: Discover Data Portal

Disciplinary Databases

Users

Portals, Domain Specific APIs provide access to data

Middleware federates data across disciplinary vocabularies

Organisms

Organs

Cells

Atoms

Biopolymers

Organelles

Cell Biology

Anatomy

Physiology

Proteomics

Medicinal Chemistry

Genomics

Life Sciences

Page 7: Discover Data Portal

How much data are we producing*?

Kilo 10^3

Mega 10^6

Giga 10^9

Tera 10^12

Peta 10^15

Exa 10^18

1 human brain at the micron level = 1 PetaByte

1 novel = 1 MegaByte

iPod Shuffle (up to 120 songs) = 512 MegaBytes

Printed materials in the Library of Congress = 10 TeraBytes

SDSC HPSS tape archive = 25 PetaBytes and growing

All worldwide information in one year = 2 ExaBytes

1 Low Resolution Photo = 100 KiloBytes

* Rough/average estimates

1 DVD = 9.4 GigaBytes
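
The comparisons on this slide can be checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal prefixes (each step a factor of 1,000, as in the table above) and uses the slide's rough estimates as given. The names PREFIXES, SIZES, humanize and how_many are hypothetical helpers invented for this example.

```python
# Decimal prefixes as listed on the slide: each step is a factor of 1,000.
PREFIXES = {
    "Kilo": 10**3, "Mega": 10**6, "Giga": 10**9,
    "Tera": 10**12, "Peta": 10**15, "Exa": 10**18,
}

# Rough/average estimates taken from the slide, expressed in bytes.
SIZES = {
    "low-resolution photo": 100 * PREFIXES["Kilo"],
    "novel": 1 * PREFIXES["Mega"],
    "iPod Shuffle": 512 * PREFIXES["Mega"],
    # 9.4 GigaBytes written as 9,400 MegaBytes to keep integer arithmetic.
    "DVD": 9_400 * PREFIXES["Mega"],
    "Library of Congress (print)": 10 * PREFIXES["Tera"],
    "human brain (micron level)": 1 * PREFIXES["Peta"],
    "SDSC HPSS tape archive": 25 * PREFIXES["Peta"],
    "worldwide information (one year)": 2 * PREFIXES["Exa"],
}


def humanize(n_bytes):
    """Format a byte count using the largest prefix that fits."""
    for name, factor in sorted(PREFIXES.items(), key=lambda kv: kv[1], reverse=True):
        if n_bytes >= factor:
            return f"{n_bytes / factor:g} {name}Bytes"
    return f"{n_bytes} Bytes"


def how_many(small, big):
    """Return how many whole copies of `small` fit into `big`."""
    return SIZES[big] // SIZES[small]


if __name__ == "__main__":
    print(humanize(SIZES["SDSC HPSS tape archive"]))
    print(how_many("novel", "Library of Congress (print)"), "novels per Library of Congress")
    print(how_many("DVD", "human brain (micron level)"), "DVDs per human brain")
```

By these estimates, the printed Library of Congress holds about ten million novels, and the SDSC tape archive holds about 2,500 printed Libraries of Congress.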

Page 8: Discover Data Portal

Computational tools are essential to comprehend that much data!

Integrate vast data collections from a wide variety of collection points

Visualize empirical results

Create mathematical models based on complex, interconnected data

Page 9: Discover Data Portal

Computational Models

Extend beyond data to:

Predict

Ask “what if” questions

Evaluate alternate hypotheses

Visualize vast, complex data collections

Manipulate multiple variables

Page 10: Discover Data Portal

Why is Data Literacy So Essential?

Data = the foundation of science

Data shared can solve problems

Data can bridge and connect fields, ideas and people

Computation, the “third leg” of research, depends upon data

Page 11: Discover Data Portal

From atomic interaction data that form a model of molecular dynamics

Page 12: Discover Data Portal

To light emission data from long-dead stars that explain the origins of the Universe

Page 13: Discover Data Portal

Teaching about data means teaching the “language and currency” of science.

A Window to Data

The Discover Data Education Portal