Microsoft Research
Jim Gray, Distinguished Engineer, Microsoft Research San Francisco
SKYSERVER
Microsoft Research
Organization goal: Advance state of the art
More than 700 staff, 55 areas
Labs in US, Europe, Asia
Internationally recognized teams
University organizational model
  Open research environment
  Close ties to universities
Close working relations with development.
My Research Goal
Information at your fingertips
Bring all scientific literature and data online
Focus on large database issues and scalable servers.
[Figure: Experiments & Instruments, Simulations, Literature, and Other Archives each supply facts; questions go in and answers come out.]
World Wide Telescope
Premise: Most Astronomy data is online
The Internet is the world’s best telescope
  It has data on every part of the sky
  In every measured spectral band
  As deep as the best instruments
  It is up when you are up.
  The “seeing” is always great (no working at night, no clouds, no moons, no ...).
  It’s a smart telescope: links data with literature.
SkyServer.SDSS.org
Built with Johns Hopkins U.
A modern archive
  Raw data in file servers
  Catalog data (derived objects) in Database
  10 billion records, 2 TB
Also used for education
  150 hours of online Astronomy
Interesting things
  Based on Web Services
  Spatial data search
  Cloned by other surveys (a design template)
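
The slides do not say how the spatial data search is implemented. As a minimal sketch of what such a search answers, the following assumes a small in-memory catalog and does a brute-force "cone search": return every object within a given angular radius of a sky position. The names `angular_separation_deg`, `cone_search`, and the catalog layout are hypothetical; a production archive would use a spatial index in the database rather than a linear scan.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation, in degrees, between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(catalog, ra, dec, radius_deg):
    """Return the catalog objects within radius_deg of (ra, dec). Linear scan, for illustration only."""
    return [obj for obj in catalog
            if angular_separation_deg(ra, dec, obj["ra"], obj["dec"]) <= radius_deg]

# Hypothetical three-object catalog (positions in degrees).
catalog = [
    {"id": 1, "ra": 10.0, "dec": 0.0},
    {"id": 2, "ra": 10.5, "dec": 0.0},
    {"id": 3, "ra": 20.0, "dec": 0.0},
]
nearby = cone_search(catalog, ra=10.0, dec=0.0, radius_deg=1.0)
print([obj["id"] for obj in nearby])
```

With 10 billion records a linear scan is not practical, which is presumably why spatial search is listed as one of the interesting parts of the design.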
Service Oriented Architecture
Data Federations of Web Services
Massive datasets live near their owners:
  Near instrument software pipeline, apps
  Near data knowledge and curation
Each Archive publishes a web service
  Schema: documents the data
  Methods on objects (queries)
Uniform access to multiple Archives
  A common global schema
Scientists get “personalized” extracts
[Figure: federation diagram with labels SkyQuery Portal, Image Cutout, 2MASS, INT, SDSS, FIRST, and DB (databases).]
SkyQuery Structure
Each SkyNode publishes
  Schema Web Service
  Data Query Web Service
Portal
  Plans Query (2 phase)
  Integrates answers
  Is itself a web service
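
The structure above can be sketched in a few lines. This is an illustration under stated assumptions, not the SkyQuery implementation: the slide says only that the portal plans the query in two phases, so the reading used here (phase 1 asks every node for a match count, phase 2 runs the query node by node starting with the smallest result) is an assumption. The class and method names, and joining on a shared `obj_id` instead of matching objects by sky position, are also hypothetical simplifications.

```python
class SkyNode:
    """Stand-in for one archive: exposes a schema service and a data query service."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows  # list of dicts, e.g. {"obj_id": 1, "mag": 18.0}

    def schema(self):
        # Schema service: documents the data the node holds.
        columns = sorted({col for row in self.rows for col in row})
        return {"archive": self.name, "columns": columns}

    def count(self, predicate):
        # Cheap estimate used only for planning.
        return sum(1 for row in self.rows if predicate(row))

    def query(self, predicate):
        # Data query service: return the matching rows.
        return [row for row in self.rows if predicate(row)]


class Portal:
    """Plans the federated query, then integrates the per-node answers."""

    def __init__(self, nodes):
        self.nodes = nodes

    def plan(self, predicate):
        # Phase 1 (assumed): order nodes by how few rows each would return.
        return sorted(self.nodes, key=lambda node: node.count(predicate))

    def execute(self, predicate, key="obj_id"):
        # Phase 2 (assumed): query nodes in plan order, keeping objects found in every archive.
        result = None
        for node in self.plan(predicate):
            rows = {row[key]: row for row in node.query(predicate)}
            if result is None:
                result = {k: {node.name: row} for k, row in rows.items()}
            else:
                result = {k: {**found, node.name: rows[k]}
                          for k, found in result.items() if k in rows}
        return result or {}


# Hypothetical data for two archives.
sdss = SkyNode("SDSS", [{"obj_id": 1, "mag": 18.0}, {"obj_id": 2, "mag": 19.0}])
first = SkyNode("FIRST", [{"obj_id": 2, "mag": 17.0}, {"obj_id": 3, "mag": 25.0}])
portal = Portal([sdss, first])
matches = portal.execute(lambda row: row["mag"] < 20)
print(sorted(matches))
```

Starting with the most selective node keeps the intermediate result small, which matters when the archives sit on different sites and every step crosses the network.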
Federation: SkyQuery.Net
Combines 15 archives
Send query to portal, portal joins data from archives.
Problem: want to do multi-step data analysis (not just single query).
Solution: Allow personal databases on portal
Problem: some queries are monsters
Solution: “batch scheduler” on portal server, deposits answer in personal db.
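
The two solutions fit together: long queries run in the background and their results land in a per-user database, where later analysis steps can pick them up. A minimal sketch follows, assuming an in-memory SQLite database per user and a simple first-in, first-out queue. The `BatchScheduler` class, its methods, and the two-column result layout are hypothetical; the slides do not describe the real scheduler.

```python
import sqlite3
from collections import deque

class BatchScheduler:
    """Queues long-running queries and deposits each answer in the submitter's personal db."""

    def __init__(self):
        self.jobs = deque()
        self.personal_dbs = {}  # user name -> sqlite3 connection

    def personal_db(self, user):
        # Create the user's personal database on first use.
        if user not in self.personal_dbs:
            self.personal_dbs[user] = sqlite3.connect(":memory:")
        return self.personal_dbs[user]

    def submit(self, user, table, run_query):
        # run_query is a callable returning a list of (obj_id, mag) tuples.
        self.jobs.append((user, table, run_query))

    def run_pending(self):
        # Run queued jobs in submission order and store each result set.
        while self.jobs:
            user, table, run_query = self.jobs.popleft()
            rows = run_query()
            db = self.personal_db(user)
            # The table name is trusted in this sketch; a real service would validate it.
            db.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (obj_id INTEGER, mag REAL)')
            db.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
            db.commit()

scheduler = BatchScheduler()
scheduler.submit("alice", "bright", lambda: [(1, 18.0), (2, 19.5)])
scheduler.run_pending()

# Second analysis step: query the stored answer instead of re-running the monster query.
cursor = scheduler.personal_db("alice").execute('SELECT COUNT(*) FROM "bright" WHERE mag < 19')
print(cursor.fetchone()[0])
```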
Current Status: CERN → Pasadena
Multi Stream TCP/IP 7.1 Gbps ~900 MBps
New speed record @ http://ultralight.caltech.edu/lsr-winhec/
Single Stream TCP/IP 6.5 Gbps ~800 MBps
File Transfer Speed ~450 MBps
[Chart: transfer rate in Mbps (y-axis, 0 to 7,000) by year (x-axis, 2000 to 2005).]
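
The slide quotes each rate in both gigabits per second (Gbps) and megabytes per second (MBps). The conversion is a division by 8 bits per byte, sketched below assuming decimal prefixes (1 Gb = 1000 Mb). The function names are illustrative only.

```python
BITS_PER_BYTE = 8

def gbps_to_mbytes_per_s(gbps):
    """Convert gigabits per second to megabytes per second (decimal prefixes)."""
    return gbps * 1000 / BITS_PER_BYTE

def mbytes_per_s_to_gbps(mbytes_per_s):
    """Convert megabytes per second to gigabits per second (decimal prefixes)."""
    return mbytes_per_s * BITS_PER_BYTE / 1000

for label, gbps in [("multi stream", 7.1), ("single stream", 6.5)]:
    print(f"{label}: {gbps} Gbps is about {gbps_to_mbytes_per_s(gbps):.0f} MBps")

print(f"file transfer: 450 MBps is about {mbytes_per_s_to_gbps(450):.1f} Gbps")
```

The results, roughly 888 and 812 MBps, match the rounded ~900 and ~800 MBps figures on the slide. The ~450 MBps file transfer rate corresponds to about 3.6 Gbps, roughly half of what the multi-stream network test reached.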
Challenge: Move Data from CERN to Remote Centers @ 1 GBps
• Disk-to-Disk
• gigabyte / second data rates
• 80 TB/day
• 30 petabytes by 2008
• 1 exabyte by 2014
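
The daily volume follows directly from the target rate. A short worked calculation, assuming decimal units (1 TB = 1000 GB, 1 PB = 10^6 GB) and a link that sustains the full rate around the clock; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def tb_per_day(gbytes_per_s):
    """Terabytes moved in one day at a sustained rate given in gigabytes per second."""
    return gbytes_per_s * SECONDS_PER_DAY / 1000

def days_to_move(petabytes, gbytes_per_s):
    """Days needed to move a data set of the given size at a sustained rate."""
    gigabytes = petabytes * 1_000_000
    return gigabytes / gbytes_per_s / SECONDS_PER_DAY

print(f"1 GBps sustained: {tb_per_day(1):.1f} TB/day")
print(f"30 PB at 1 GBps: {days_to_move(30, 1):.0f} days")
```

At 1 GBps a link moves about 86 TB per day, in line with the slide's rounded 80 TB/day. A single copy of 30 petabytes at that rate would take roughly 347 days.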
[Figure: tiered data-distribution diagram. Labels: Experiment, Filter, CERN, Tier 1 (INP3, RAL, INFN, FNAL, ...), Tier 2, Tier 3, Tier 4, Institute, Workstations, physics data cache. Link rates shown: ~PBps, ~5 GBps, ~1 GBps, .1 GBps. OC192 = 9.9 Gbps.]
Graphics courtesy of Harvey Newman @ Caltech
Summary
Microsoft Research is active inside and outside Microsoft.
World Wide Telescope is coming
  Exemplifies service oriented architecture
  Built with web services and databases
  Has interesting spatial database algorithms
10 Gbps Networking is coming, x-64 is coming, and we are investing to make them real.
Details on my website: http://research.microsoft.com/~Gray
© 2003 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.