Microsoft Research Microsoft Research Jim Gray Distinguished Engineer Microsoft Research San...


Page 1: Microsoft Research Microsoft Research Jim Gray Distinguished Engineer Microsoft Research San Francisco SKYSERVER.

Microsoft Research

Jim Gray
Distinguished Engineer
Microsoft Research San Francisco

SKYSERVER

Page 2:

Microsoft Research

Organization goal: Advance state of the art

More than 700 staff, 55 areas

Labs in US, Europe, Asia
Internationally recognized teams

University organizational model
Open research environment
Close ties to universities

Close working relations with development.

Page 3:

My Research Goal

Information at your fingertips

Bring all scientific literature and data online

Focus on large database issues, and scalable servers.

[Diagram labels: Experiments & Instruments, Simulations, Literature, Other Archives, facts, questions, answers]

Page 4:

World Wide Telescope

Premise: Most Astronomy data is online

The Internet is the world’s best telescope

It has data on every part of the sky

In every measured spectral band:

As deep as the best instruments

It is up when you are up.
The “seeing” is always great (no working at night, no clouds, no moons, no..).

It’s a smart telescope: links data with literature.

Page 5:

SkyServer.SDSS.org
Built with Johns Hopkins U.

A modern archive
Raw data in file servers

Catalog data (derived objects) in Database

10 billion records, 2 TB

Also used for education
150 hours of online Astronomy

Interesting things
Based on Web Services

Spatial data search

Cloned by other surveys (a design template)
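The slide mentions spatial data search without describing how it works. As a rough illustration only, the sketch below shows the simplest form of such a search, a "cone search" that returns every catalog object within a given angular radius of a sky position. It is a brute-force scan over an in-memory list, not SkyServer's actual indexing scheme; the names `angular_separation_deg`, `cone_search`, and the `ra`/`dec` record fields are assumptions made for this example.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for very small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(objects, ra, dec, radius_deg):
    """Return the objects lying within radius_deg of (ra, dec). Brute force, O(n)."""
    return [o for o in objects
            if angular_separation_deg(o["ra"], o["dec"], ra, dec) <= radius_deg]

if __name__ == "__main__":
    # Hypothetical three-object catalog, for illustration only.
    catalog = [
        {"id": 1, "ra": 10.00, "dec": 20.0},
        {"id": 2, "ra": 10.05, "dec": 20.0},
        {"id": 3, "ra": 50.00, "dec": -5.0},
    ]
    print(cone_search(catalog, ra=10.0, dec=20.0, radius_deg=0.1))
```

A catalog with billions of records would need a spatial index so that only nearby objects are examined; the linear scan here is only meant to show what the query computes.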

Page 6:

Service Oriented Architecture
Data Federations of Web Services

Massive datasets live near their owners:
Near instrument software pipeline, apps

Near data knowledge and curation

Each Archive publishes a web service

Schema: documents the data

Methods on objects (queries)

Uniform access to multiple Archives

A common global schema

Scientists get “personalized” extracts
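To make the idea of a common global schema more concrete, here is a minimal in-process sketch, assuming each archive keeps its own local column names and publishes a mapping onto shared global names. Plain Python objects stand in for the web services; `GLOBAL_COLUMNS`, `Archive`, `federated_extract`, and all column names are hypothetical and not taken from the talk.

```python
# Hypothetical global schema shared by every archive in the federation.
GLOBAL_COLUMNS = ("object_id", "ra_deg", "dec_deg")

class Archive:
    """Stand-in for one archive's web service: a schema plus a data method."""

    def __init__(self, name, column_map, rows):
        # column_map maps each global column name to this archive's local name.
        missing = set(GLOBAL_COLUMNS) - set(column_map)
        if missing:
            raise ValueError(f"{name} does not map: {sorted(missing)}")
        self.name = name
        self.column_map = dict(column_map)
        self.rows = rows

    def schema(self):
        """Publish how local columns correspond to the global schema."""
        return dict(self.column_map)

    def fetch(self):
        """Return all rows, renamed into the global schema."""
        return [{g: row[l] for g, l in self.column_map.items()} for row in self.rows]

def federated_extract(archives, keep):
    """Uniform access: one filter applied to every archive, one combined extract."""
    extract = []
    for archive in archives:
        for record in archive.fetch():
            if keep(record):
                extract.append({"archive": archive.name, **record})
    return extract

if __name__ == "__main__":
    a = Archive("A", {"object_id": "objID", "ra_deg": "ra", "dec_deg": "dec"},
                [{"objID": 1, "ra": 10.0, "dec": 5.0}])
    b = Archive("B", {"object_id": "id", "ra_deg": "RAJ2000", "dec_deg": "DEJ2000"},
                [{"id": 7, "RAJ2000": 200.0, "DEJ2000": -3.0}])
    # A "personalized" extract: only northern-sky objects, from both archives.
    print(federated_extract([a, b], lambda r: r["dec_deg"] > 0))
```

The design choice illustrated is that data stays in each archive's own format and only the published mapping is shared, which matches the slide's point that datasets live near their owners.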


Page 7:

SkyQuery Structure

Each SkyNode publishes

Schema Web Service

Data Query Web Service

Portal

Plans Query (2 phase)

Integrates answers

Is itself a web service

[Diagram labels: 2MASS, INT, SDSS, FIRST, SkyQuery Portal, ImageCutout]
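The structure above can be sketched in a few lines, with ordinary objects standing in for the web services. The slide does not say what the two planning phases are; this sketch assumes one plausible reading, in which the portal first asks every node how many rows match and then runs the real query against the nodes in order of increasing count. The class and method names (`SkyNode`, `Portal`, `count`, `query`, `plan`) are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class SkyNode:
    """Stand-in for one archive exposing a schema service and a query service."""
    name: str
    rows: list  # list of dicts, one per catalog object

    def schema(self):
        """Schema service: the column names this node holds."""
        return sorted({key for row in self.rows for key in row})

    def count(self, predicate):
        """Cheap probe used for planning: how many rows would match?"""
        return sum(1 for row in self.rows if predicate(row))

    def query(self, predicate):
        """Query service: matching rows, tagged with the archive they came from."""
        return [dict(row, archive=self.name) for row in self.rows if predicate(row)]

class Portal:
    """Plans a query across nodes and integrates the answers."""

    def __init__(self, name, nodes):
        self.name = name
        self.nodes = list(nodes)

    def plan(self, predicate):
        # Assumed phase 1: probe each node, then order from fewest matches to most.
        return sorted(self.nodes, key=lambda node: node.count(predicate))

    def count(self, predicate):
        return sum(node.count(predicate) for node in self.nodes)

    def query(self, predicate):
        # Assumed phase 2: run the query in planned order and merge the results.
        results = []
        for node in self.plan(predicate):
            results.extend(node.query(predicate))
        return results

if __name__ == "__main__":
    big = SkyNode("big", [{"id": 2, "mag": 14.0}, {"id": 3, "mag": 16.0}])
    small = SkyNode("small", [{"id": 1, "mag": 15.0}])
    portal = Portal("portal", [big, small])
    print(portal.query(lambda row: row["mag"] < 20))
```

Because `Portal` offers the same `count` and `query` methods as `SkyNode`, a portal can be registered as a node of another portal, which mirrors the slide's remark that the portal is itself a web service.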

Page 8:

Federation: SkyQuery.Net

Combines 15 archives

Send query to portal, portal joins data from archives.

Problem: want to do multi-step data analysis (not just single query).

Solution: Allow personal databases on portal

Problem: some queries are monsters

Solution: “batch scheduler” on portal server, deposits answer in personal db.
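The two solutions on this slide fit together naturally: long queries are queued, run later, and their answers land in a per-user database where they can feed the next analysis step. Here is a toy sketch of that flow, assuming in-memory SQLite databases in place of the real archive and personal stores. `BatchPortal`, `submit`, `run_pending`, and the table names are hypothetical; the slide gives no implementation details.

```python
import sqlite3
from collections import deque

class BatchPortal:
    """Toy portal: queued queries run against an archive, answers go to a personal db."""

    def __init__(self):
        self.archive = sqlite3.connect(":memory:")  # stands in for the federated data
        self.personal = {}                          # user name -> that user's database
        self.jobs = deque()                         # first-in, first-out batch queue

    def personal_db(self, user):
        """Create the user's personal database on first use."""
        if user not in self.personal:
            self.personal[user] = sqlite3.connect(":memory:")
        return self.personal[user]

    def submit(self, user, sql, target_table):
        """Queue a query instead of running it while the user waits."""
        self.jobs.append((user, sql, target_table))

    def run_pending(self):
        """Run each queued query and deposit its answer in the owner's personal db."""
        while self.jobs:
            user, sql, table = self.jobs.popleft()
            cursor = self.archive.execute(sql)
            columns = [d[0] for d in cursor.description]
            rows = cursor.fetchall()
            db = self.personal_db(user)
            column_list = ", ".join(f'"{c}"' for c in columns)
            db.execute(f'CREATE TABLE "{table}" ({column_list})')
            marks = ", ".join("?" for _ in columns)
            db.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
            db.commit()

if __name__ == "__main__":
    portal = BatchPortal()
    portal.archive.execute("CREATE TABLE objects (id INTEGER, mag REAL)")
    portal.archive.executemany("INSERT INTO objects VALUES (?, ?)",
                               [(1, 17.5), (2, 19.0), (3, 16.2)])
    portal.submit("alice", "SELECT id, mag FROM objects WHERE mag < 18", "bright")
    portal.run_pending()
    # The deposited table can now be queried again as the next analysis step.
    print(portal.personal_db("alice").execute("SELECT COUNT(*) FROM bright").fetchone())
```

A real scheduler would also need quotas, priorities, and failure handling; the sketch only shows the queue-then-deposit pattern.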

Page 9:

Current Status: CERN → Pasadena

Multi Stream TCP/IP 7.1 Gbps ~900 MBps
New speed record @ http://ultralight.caltech.edu/lsr-winhec/

Single Stream TCP/IP 6.5 Gbps ~800 MBps

File Transfer Speed ~450 MBps

[Chart: transfer rate, axis labeled "mbps per second" from 0 to 7,000, by year from 2000 to 2005]
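The slide quotes each TCP/IP record both in gigabits per second (Gbps) and in megabytes per second (MBps). The short calculation below shows how the two relate, assuming decimal units (1 Gb = 1000 Mb) and 8 bits per byte; the function name is made up for this example.

```python
def gbps_to_mbytes_per_s(gbps):
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8

if __name__ == "__main__":
    for label, gbps in [("multi stream", 7.1), ("single stream", 6.5)]:
        print(f"{label}: {gbps} Gbps is about {gbps_to_mbytes_per_s(gbps):.0f} MBps")
```

The results come out a little under 900 MBps and a little over 800 MBps, consistent with the rounded figures on the slide. The file transfer figure of ~450 MBps is a separate disk-to-disk measurement and is not derived from the network rates.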

Page 10:

Challenge: Move Data from CERN to Remote Centers @ 1GBps

• Disk-to-Disk
• gigabyte / second data rates
• 80TB/day
• 30 petabytes by 2008
• 1 exabyte by 2014

[Diagram: tiered data distribution. Labels: Experiment, Filter, CERN, Tier 1, Tier 2, Tier 3, Tier 4, INP3, RAL, INFN, FNAL, …, Institute, Physics data cache, Workstations. Rates shown: ~PBps, ~5 GBps, ~1 GBps, .1 GBps]

OC192 = 9.9 Gbps

Graphics courtesy of Harvey Newman @ Caltech
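A quick back-of-the-envelope check ties the slide's numbers together. The sketch assumes decimal units (1 TB = 1000 GB, 1 PB = 1,000,000 GB) and a link that runs at its full rate around the clock; the function names are invented for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def tb_per_day(gbytes_per_s):
    """Terabytes moved in one day at a sustained rate in gigabytes per second."""
    return gbytes_per_s * SECONDS_PER_DAY / 1000

def days_to_move(petabytes, gbytes_per_s):
    """Days needed to move a volume in petabytes at a sustained rate in GB/s."""
    return petabytes * 1_000_000 / gbytes_per_s / SECONDS_PER_DAY

def gbits_to_gbytes_per_s(gbps):
    """Gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8

if __name__ == "__main__":
    print(f"1 GBps sustained: {tb_per_day(1):.1f} TB/day")
    print(f"30 PB at 1 GBps: {days_to_move(30, 1):.0f} days")
    print(f"OC192 at 9.9 Gbps: {gbits_to_gbytes_per_s(9.9):.2f} GBps")
```

At a sustained 1 GBps the ideal daily volume is 86.4 TB, so the slide's 80 TB/day leaves a small margin for overhead. Moving 30 petabytes at that rate would take close to a year, and a single OC192 link carries a little over 1.2 GBps.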

Page 11:

Summary

Microsoft Research is active inside and outside Microsoft.

World Wide Telescope is coming

Exemplifies service oriented architecture

Built with web services and databases

Has interesting spatial database algorithms

10Gbps Networking is coming,
x-64 is coming
and we are investing to make them real.

Details on my website:
http://research.microsoft.com/~Gray

Page 12:

© 2003 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.