An Overview of Systems and Networking Research at Microsoft Research Michael B. Jones Systems and...
An Overview of Systems and Networking Research at Microsoft Research
Michael B. Jones
Systems and Networking Research Group, Microsoft Research
April 1999
Microsoft Research: A quick primer
Founded in 1991
Goal: Pursue strategic technologies for Microsoft
Original research groups:
  Natural Language Processing
  Operating Systems
  Programming Languages
Microsoft Research
Over 300 researchers in 27 areas
  Speech, Decision Theory, Graphics, Databases, to Statistical Physics
Research lab locations: Redmond, San Francisco, Cambridge (UK), Beijing
Internationally recognized research teams
  Hundreds of publications, presentations
  Leadership roles in professional societies, journals, conferences
Fastest Growing CS Research Organization In The World
Grew by factor of four from ’94 to ’97
Decided in ’97 to grow by a factor of three in three years
  200 in FY ’97 => 600 in FY ’00, primarily in Redmond
Major impact on Microsoft products
  Virtually all MS products shipped today use technology from Microsoft Research
Systems and Networking Research Group
One of the original three research groups at Microsoft Research in Redmond
  Formerly called the “Operating Systems Research Group”
  Name changed in 1998 to explicitly include networking
Group presently 15 members
  Working in four areas
Past Projects
Tiger
  Scalable, fault-tolerant multimedia file system using commodity hardware
Rialto
  Real-time kernel enabling predictable concurrent execution of independent real-time programs
Both were used in Microsoft's Interactive TV trial in 1996-1997 with NTT in Yokosuka, Japan
Current Research Areas
Networking
Distributed Computing
Operating Systems
Real-Time Systems
Group Members and Current Research Areas
Victor Bahl – Net
Bill Bolosky – OS
Gerald Cermak – Dist. Sys.
Scott Cutshall – OS
Rich Draves – Net
John Douceur – OS
Alessandro (Sandro) Forin – Net
Johannes Helander – OS
Galen Hunt – Dist. Sys.
Mike Jones – Real-Time Sys.
Steve Levi – Dist. Sys.
Venkat Padmanabhan – Net
Marvin Theimer – OS
Yi-Min Wang – Dist. Sys.
Brian Zill – Net
Networking Projects
Location Aware Systems and Services
Hardware Adapter for Light-Weight Mobile Networking
IPv6
Automatic Network Configuration
High Performance & Sys. Area Networking
DCOM over SAN
TCP Fast Start, Network Performance Improvement
Multicast-based Data Dissemination
Distributed Computing Projects
Millennium
  Distributed, Fault-Tolerant Applications
  Automatic Application Partitioning
  Distributed Java Virtual Machine
Operating Systems Projects
Componentized System Architecture
Single-Instance Store Filesystem
Unobtrusive Background Computation
Transactional Filesystem
Real-Time Systems Projects
Real-Time Scheduling
Real-Time Latency Measurement
Current Projects
Grouped by Research Areas
Networking Projects
Location Aware Systems and Services
In-building location-aware system
Wireless mobile nodes precisely compute their geographic location
Enable new class of mobile applications
  E.g., use nearest printer, etc.
Victor Bahl, Venkat Padmanabhan, Turner Whitted, Josh Broch (CMU)
Hardware Adapter for Light-Weight Mobile Networking
MCoM (Mobile Communicator) Project
Light-weight devices network in both ad-hoc and controlled manner
Investigates protocol and systems issues:
  Energy conservation
  Multi-hop routing
  In presence of link failures, mobility
Victor Bahl, Turner Whitted
IPv6
Internet Protocol Version 6 (IPv6) implementation for Windows NT
  Freely downloadable
  Numerous v6 utilities also available
Multi-homing issues
Rich Draves, Brian Zill, ISI (Allison Mankin, etc.)
Published in ’98 USENIX NT
Automatic Network Configuration
Algorithms for auto-configuring IP networks
Address and subnet assignment that optimize the network’s efficiency
Rich Draves, Chris King (Northeastern), Cheenu Venkatachary (WUSTL)
Published in InfoCom ’99
High Performance & System Area Networking
High-performance networking under NT
  VIA-like and memory-like interconnects
It’s WinSock!
  No need to rewrite apps
  No loss of performance
  Easily extensible (RDMA, registration, …)
Gigabit Ethernet
  Jumbo Frames
TCP Switch
  Layered WSP over SAN vendor’s WSP
Sandro Forin, Johannes Helander, NT
Published at DARPA NT Workshop
Hybrid SAN-TCP/IP Architecture
[Figure: user-mode and kernel-mode component stacks. Component labels: Winsock App, TDI App, Winsock, Switch, MsAfd, AFD, TCP/IP, SAN WS Provider, SAN WS Driver, SAN TDI Provider, SAN NDIS MiniPort, SAN MiniPort, SAN NIC. Interface labels: Winsock SPI, TDI, NDIS.]
DCOM Over SAN
Millennium Falcon project
Implement high-performance distributed object systems
  For clusters of servers
  Connected by SANs
Take full advantage of user-mode nets
Current implementation based on DCOM and VIA
Yi-Min Wang
TCP Fast Start, Network Performance Improvement
Reuse information learned in past
  Rather than rediscover it each time
  E.g., TCP congestion window
Venkat Padmanabhan, Randy Katz (Berkeley)
Published at Globecom ’98 Internet Mini-Conference
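The idea of reusing a previously learned congestion window can be illustrated with a minimal sketch. This is not the published TCP Fast Start mechanism, only a toy cache showing the "remember instead of rediscover" principle. The class name CwndCache, the freshness limit max_age_s, and the initial window of one segment are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field

INITIAL_CWND = 1  # segments; assumed slow-start default for this sketch


@dataclass
class CwndCache:
    """Hypothetical per-destination cache of the last observed congestion window."""
    max_age_s: float = 300.0  # assumed freshness limit, not from the slides
    _entries: dict = field(default_factory=dict)

    def remember(self, dest, cwnd, now=None):
        # Record the window a finished connection had reached, with a timestamp.
        stamp = time.monotonic() if now is None else now
        self._entries[dest] = (cwnd, stamp)

    def initial_cwnd(self, dest, now=None):
        # Start a new connection from the cached window if it is still fresh;
        # otherwise fall back to the slow-start default and rediscover it.
        now = time.monotonic() if now is None else now
        entry = self._entries.get(dest)
        if entry is None:
            return INITIAL_CWND
        cwnd, stamp = entry
        if now - stamp > self.max_age_s:
            del self._entries[dest]
            return INITIAL_CWND
        return cwnd
```

A stale entry is discarded on purpose: network conditions change, so an old window is a weaker hint than a recent one.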
Multicast-based Data Dissemination
Quantify potential benefits of multicast for information dissemination
  Based on HTTP logs
Evaluate algorithms/heuristics for deciding which data should be multicast
Venkat Padmanabhan
Distributed Computing Projects
Distributed, Fault-Tolerant Applications
Millennium Project
  Unifying vision behind several individual prototype projects
Galen Hunt, Yi-Min Wang, Gerald Cermak, Johannes Helander, Rick Rashid
Initial position paper published at HotOS-VI, 1997
Problem
  Building distributed, fault tolerant applications is too hard, costs too much
Goal
  Raise the level of abstraction provided by the operating system
  Individual computers, file systems, networks unimportant to component builders
Millennium
[Figure: layered diagram with labels App, NT, and COM+.]
Millennium: Raise the Level of Abstraction
Maintain single system image.
Transparent invocation, migration, and recovery.
Individual computers, file systems, and networks become unimportant to application developers.
[Figure labels: Millennium, Application.]
Automatic Application Partitioning
Millennium Coign Project
Galen Hunt
Published in OSDI ’99
[Figure: application before and after partitioning.]
Coign: Automatic Distributed Partitioning
Converts local COM applications into distributed client-server applications without source code.
The Plan:
1. Find Components in Application Binaries
2. Identify Interfaces and Measure Communication
3. Partition and Distribute Components
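Step 3 of the plan can be pictured as a graph problem: components are nodes, measured communication is edge weight, and a good partition keeps heavy traffic on one machine. The sketch below is a brute-force illustration under that assumption, not Coign's actual algorithm, which the slides do not describe. The function name best_partition, the component names, and the traffic figures are hypothetical.

```python
from itertools import product


def best_partition(components, traffic, pinned):
    """Exhaustively place unpinned components on 'client' or 'server'.

    traffic maps (a, b) pairs to measured communication volume.
    pinned maps components that must stay on one side (e.g. GUI, database).
    Returns (cost, assignment), where cost is the cross-machine traffic.
    """
    free = [c for c in components if c not in pinned]
    best_cost, best_assign = None, None
    for sides in product(("client", "server"), repeat=len(free)):
        assign = dict(pinned)
        assign.update(zip(free, sides))
        # Only traffic between components on different machines costs anything.
        cost = sum(w for (a, b), w in traffic.items() if assign[a] != assign[b])
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


# Hypothetical measurements for a four-component application.
components = ["gui", "logic", "cache", "db"]
traffic = {
    ("gui", "logic"): 10,
    ("logic", "cache"): 100,
    ("cache", "db"): 5,
    ("logic", "db"): 50,
}
pinned = {"gui": "client", "db": "server"}
cost, placement = best_partition(components, traffic, pinned)
```

Exhaustive search is exponential in the number of unpinned components, so it only suits tiny examples; a real system would need a proper graph-cutting algorithm.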
COP: Component Object Proxy
Transparently remote Win32 API calls
  Factor Win32 interface
  Automatically create DCOM interfaces
  Transparently insert proxy objects
Galen Hunt, Gerald Cermak
Millennium Continuum
Provides single system image for Windows API
Automatic object placement and migration at run-time
Language neutral
  At least Visual Basic, C, C++, Java
Based on COM+
Galen Hunt, Gerald Cermak, Rick Rashid
Distributed Java Virtual Machine
Millennium Borg project
Makes multiple JVMs appear to be one
Unmodified Java programs may run as distributed applications
Transparent distribution, migration
Johannes Helander
Operating Systems Projects
Componentized System Architecture
MMLite Project
Kernel object architecture stressing adaptability, minimalism, reusability
Many normally “built-in” components selectable, loadable
  E.g., Virtual Memory, IPC
Johannes Helander, Sandro Forin
Published at ’98 SigOps European Workshop
Single-Instance Store Filesystem
Enables single on-disk instance of files with multiple logical copies
  Sharing transparent to applications
  Replicas found in background, coalesced
Bill Bolosky, Scott Cutshall, John Douceur, NT filesystem group
Planned to ship with Windows 2000
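The coalescing idea can be sketched in a few lines: files with identical contents share one stored copy, while each file name keeps its own logical link to it. This is an in-memory toy, not the NT implementation. Using a SHA-256 digest to detect replicas, and the function name coalesce, are assumptions for illustration.

```python
import hashlib


def coalesce(files):
    """Collapse identical file contents into one stored instance.

    files maps file name -> bytes.
    Returns (store, links): store maps digest -> bytes (one copy per distinct
    content), links maps file name -> digest (the logical copies).
    """
    store, links = {}, {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        # Keep only the first instance of each distinct content.
        store.setdefault(digest, data)
        links[name] = digest
    return store, links


# Hypothetical example: two of the three files are replicas.
files = {
    "a/readme.txt": b"hello world",
    "b/readme.txt": b"hello world",
    "c/notes.txt": b"something else",
}
store, links = coalesce(files)
```

Readers of any name still get the right bytes through links, which is what makes the sharing transparent; a production system would also need copy-on-write when one logical copy is modified.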
Unobtrusive Background Computation
“How to be Really Nice”
Background processes that don’t interfere with foreground work
  Even if neither CPU-bound
Based on progress metrics
  Back off when statistically significant slowdown observed
John Douceur, Bill Bolosky
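A progress-based backoff policy might look like the sketch below. The slides do not say which statistical test is used, so this substitutes a simple rule: treat the recent progress rate as a significant slowdown if it falls more than k standard deviations below a baseline. The function name next_delay, the threshold k, and the doubling backoff are all assumptions.

```python
from statistics import mean, stdev


def next_delay(baseline_rates, recent_rate, current_delay,
               k=2.0, first_backoff=1.0, max_delay=60.0):
    """Return how long the background task should pause before its next step.

    baseline_rates: progress rates (e.g. items/second) measured when the
    background task ran without contention; needs at least two samples.
    recent_rate: the most recently measured progress rate.
    """
    threshold = mean(baseline_rates) - k * stdev(baseline_rates)
    if recent_rate < threshold:
        # Progress is unusually slow, so something else probably wants the
        # resource: back off, doubling the pause up to a cap.
        return min(max_delay, max(first_backoff, current_delay * 2))
    # Progress looks normal: run at full speed again.
    return 0.0
```

Measuring the task's own progress, rather than CPU load, is what lets this approach notice contention on any shared resource, which matches the slide's point that neither process need be CPU-bound.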
Transactional Filesystem
Research version of NTFS with transactional semantics
Marvin Theimer
Real-Time Systems Projects
Real-Time Scheduling
Scheduling abstractions enabling predictable concurrent execution of independent real-time programs
Mike Jones, John Regehr (Virginia), formerly Daniela Roşu (GA Tech), Marcel Roşu (GA Tech), George Candea (MIT)
Published in ’96 SigOps, ’97 SOSP, ’98 & ’99 USENIX Windows NT
Real-Time Latency Measurement
Understand, fix sources of long thread scheduling latencies in NT
Mike Jones, John Regehr (Virginia)
Published in ’98 NOSSDAV & ’99 HotOS
Problem: “Unimportant” Background Work
DEC dc21x4 PCI Fast 10/100 Ethernet
  6ms periodic DPC every 5s
  Autosense processing
Most of 6ms in five 0.88ms calls to routine that reads device register that:
  Writes a HW register – 1.5µs
  Stalls for 5µs
  Writes HW register again – 1.5µs
  Stalls for 5µs
  Reads a HW register – 1.5µs
  Stalls for 5µs
And does this 16 times! (once per bit)
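The figures on this slide can be combined with a little arithmetic. Every constant in the sketch below is copied from the slide and only the derived values are new. By this count the five calls take about 4.4 ms, roughly 73% of the 6 ms DPC. The itemized register accesses and stalls add up to 19.5 µs per bit, or about 312 µs per 16-bit read, and the slide does not itemize the rest of each 0.88 ms call. The function names are hypothetical.

```python
# All constants are taken from the slide; nothing here is newly measured.
DPC_MS = 6.0                 # duration of the periodic DPC
DPC_PERIOD_S = 5.0           # the DPC runs once every 5 seconds
CALLS_PER_DPC = 5            # calls to the register-reading routine
CALL_MS = 0.88               # duration of each call
BITS_PER_CALL = 16           # the sequence repeats once per bit
STEP_US = (1.5, 5.0, 1.5, 5.0, 1.5, 5.0)  # write, stall, write, stall, read, stall


def time_in_calls_ms():
    # Total time spent in the register-reading routine per DPC.
    return CALLS_PER_DPC * CALL_MS


def share_of_dpc():
    # Fraction of the DPC spent inside those calls.
    return time_in_calls_ms() / DPC_MS


def listed_us_per_bit():
    # Time for one pass through the itemized write/stall/read sequence.
    return sum(STEP_US)


def listed_us_per_call():
    # Itemized time for a full 16-bit read.
    return BITS_PER_CALL * listed_us_per_bit()


def cpu_fraction():
    # Long-run fraction of time consumed by the DPC.
    return (DPC_MS / 1000.0) / DPC_PERIOD_S
```

The long-run cost is tiny, about 0.12% of the time, which is why the work looks unimportant. The harm is the 6 ms during which other threads cannot be scheduled.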
Another Long DPC: Intel EE 16
Intel EtherExpress 16 ISA Ethernet
  17ms DPC every 10s
  Card reset for no received packets
Amusing Observation
  Unplugging Ethernet makes latency worse!
  Despite conventional wisdom to the contrary
Even Worse: Video Cards
Video cards and drivers conspire to hog the PCI bus
  Dragging large window locks out interrupts for up to 30ms
  Obliterates sound I/O, for instance
Can set registry key to ask drivers to behave, but not default
  No problem when set correctly
Manufacturers’ motivation:
  WinBench ~ 5% improvement
Video Card Misbehavior Details
Don’t check if card FIFO full before write
  Eliminates one PCI read
Stalls PCI bus if full to prevent overflow
  Uses “PCI disconnect” feature
For More Information
Systems and Networking Research Group web pages: http://research.microsoft.com/sn/