
Designing an OS for a Warehouse-scale Computer

Malte Schwarzkopf and Matthew P. Grosvenor

• “If your storage system has a few petabytes of data, you may have a datacenter; if you’re paged in the middle of the night because there are only a few petabytes of storage left, you may have a warehouse-scale computer.”

Image liberally borrowed from google.com

Warehouse Scale Computers… (The gory details)

• Current state of the art:
– Commodity components on custom boards
– Millions of cores, petabytes of RAM
– Exabytes of storage
– Hosts run a modified Linux kernel
– User-space services provide the programming API
• e.g. BigTable, GFS, Spanner, MapReduce, AppEngine, etc.
• Almost exclusively distributed applications


I think I’ve heard that before somewhere…

“A microkernel is the near-minimum amount of software that can provide the mechanisms needed to implement an operating system. These mechanisms include low-level address space management, thread management, and inter-process communication (IPC). Traditional operating system functions, such as device drivers, protocol stacks and file systems, are removed from the microkernel to run in user space.” - en.wikipedia.org/wiki/Microkernel

So, what do we actually need?


The µKernel gospel sayeth:

• Manage address spaces: as_create() / as_destroy()
• Run processes: proc_run() / proc_stop()
• Manage IPC messages: send_ipc() / recv_ipc()
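Rendered as a header, that minimal interface might look like the C sketch below. Only the call names come from the slide; all types and signatures are illustrative assumptions.

/* Sketch of the minimal microkernel interface named above.
 * Handles and signatures are invented for illustration. */
typedef unsigned long as_id_t;    /* address-space handle */
typedef unsigned long proc_id_t;  /* process handle */

/* Manage address spaces */
as_id_t as_create(void);
void    as_destroy(as_id_t as);

/* Run processes */
proc_id_t proc_run(as_id_t as, const char *image);
void      proc_stop(proc_id_t p);

/* Manage IPC messages */
int send_ipc(proc_id_t to, const void *msg, unsigned long len);
int recv_ipc(proc_id_t *from, void *buf, unsigned long max_len);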

That’s probably not quite enough.

The anatomy of a WSC.

[Figure: the anatomy of a WSC – many multi-socket hosts. Each core has an execution unit (EU) with private L1I$, L1D$, and L2$ caches; the cores on a socket share an L3$; hosts attach via NICs to “the network”.]

Wait, are you saying we should have a distributed OS over all of this?! *)

*) With caveats

• Manage address spaces: as_create() / as_destroy()
• Run processes: proc_run() / proc_stop()
• Manage IPC messages: send_ipc() / recv_ipc()
• Network stack: send_net() / recv_net()
• Disk I/O: read_disk() / write_disk()

The IPC, network, and disk calls all collapse into two primitives: READ and WRITE.
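One way to read that reduction: whatever the endpoint – a local process, a remote host, or a disk – the operation is one of two primitives over a kernel-issued reference. A minimal sketch, where wsc_read/wsc_write and ref_t are invented names:

typedef unsigned long ref_t;  /* kernel-issued reference */

/* One READ and one WRITE primitive subsume recv_ipc()/recv_net()/
 * read_disk() and send_ipc()/send_net()/write_disk(); the kernel
 * dispatches on what the reference denotes. */
long wsc_read(ref_t src, void *buf, unsigned long len);
long wsc_write(ref_t dst, const void *buf, unsigned long len);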

But, but, but…

• Reads/writes may not be local – where is my data?
– IPC to a process on the box, across the DC, [in another DC?]
– Disk read on the box, across the DC, RDMA, etc.
• Data consistency is an issue – how do we get it?
– Replication – where should I get it from?
– Updates – which one is most recent?
– Updates – what if an update happens while I’m working?
– Failure – what if the node holding the data fails?

1. Where is the data?
– All objects in the system (including processes) have a UUID
– Global name service, with a local cache (sounds like a TLB)
– If an object name is not in the cache, send a message to figure out where it is (ARP-like; see the sketch below). [N.B. an object may be in many places]
– Send an invalidation message on writes to an object
• How long do you wait for a response?
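The cache-then-broadcast resolution path might look like the sketch below. Every type and function here is a hypothetical illustration of the TLB/ARP-like scheme, not an actual implementation, and it simplifies by returning a single location even though an object may live in many places.

#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t hi, lo; } obj_uuid_t;              /* 128-bit object UUID */
typedef struct { uint32_t host; uint16_t port; } location_t; /* where an object lives */

bool cache_lookup(obj_uuid_t id, location_t *loc);  /* local cache: the "TLB" */
void cache_insert(obj_uuid_t id, location_t loc);
bool broadcast_resolve(obj_uuid_t id, location_t *loc,
                       int timeout_ms);             /* ARP-like network query */

/* Resolve an object's location: try the local cache first, fall back
 * to asking the network, and give up after timeout_ms – the "how long
 * do you wait?" question is pushed onto the caller. */
bool resolve(obj_uuid_t id, location_t *loc, int timeout_ms)
{
    if (cache_lookup(id, loc))
        return true;                          /* cache hit */
    if (!broadcast_resolve(id, loc, timeout_ms))
        return false;                         /* timed out, or object unknown */
    cache_insert(id, *loc);                   /* remember for next time */
    return true;
}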

1.5. References – locally valid identifiers for global names
• Data objects
• Processes
• Device buffers (net/disk)
• …
– Forged by the kernel (≈ capabilities)
– Slim in required information, but rich in potential information:

ref 0x66a67f7e3 { max write size: 4K; durable: false; proto: binary; […] }
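A kernel might lay such a reference out roughly as below. The field set mirrors the example ref above, but the struct itself is an invented illustration.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical kernel-side layout of a reference: a capability-like
 * handle the kernel forges, carrying a little required metadata plus
 * optional attributes. */
typedef struct {
    uint64_t id;              /* locally valid identifier for a global name */
    uint32_t max_write_size;  /* e.g. 4096 (4K) */
    bool     durable;         /* e.g. false */
    uint8_t  proto;           /* e.g. a PROTO_BINARY constant */
    /* […] further optional attributes */
} wsc_ref_t;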

2. Data consistency – update the API (see the sketch below):
– begin_write() – take a pointer to the data
– end_write() – notify hosts that the data has changed (i.e. a cache-invalidation message)
– begin_read() – take a copy of the data to work with
– end_read() – has the data changed since I started? (Do I care?)
• How long do you wait?
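Used together, the read pair supports an optimistic read loop like the sketch below. The names follow the slide; the signatures and the return-value convention of end_read() (true iff unchanged) are assumptions.

#include <stdbool.h>
#include <stddef.h>

typedef struct wsc_ref wsc_ref_t;  /* opaque reference handle */

const void *begin_read(wsc_ref_t *r, size_t *size);  /* snapshot copy */
bool        end_read(wsc_ref_t *r);   /* true iff unchanged since begin_read() */
void        process(const void *data, size_t size);  /* application work */

/* Optimistic read: work on a snapshot, retry if the object changed
 * underneath us – or don't retry, if the application doesn't care. */
void read_consistently(wsc_ref_t *r)
{
    const void *snap;
    size_t size;
    do {
        snap = begin_read(r, &size);  /* take a copy of the data */
        process(snap, size);
    } while (!end_read(r));           /* changed meanwhile? try again */
}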

If only we had a reliable network layer with bounded latency…

But hang on, we do!

Now we just have another cache-coherent layer with a large invalidation cost!

The WSC syscall interface.

• <ptr, size> begin_read(ref) / bool end_read(ref)
• <ptr> begin_write(ref, size) / bool end_write(ref)
• bool proc_run(ref) / bool proc_pause(ref)
• ref create(name) / ref lookup(name) / bool delete(ref)
• select([ref])
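As a C header sketch: the slide gives only the shapes, so the concrete parameter and return types below are assumptions.

#include <stdbool.h>
#include <stddef.h>

typedef struct wsc_ref wsc_ref_t;  /* opaque reference handle */

/* Reads and writes over references */
const void *begin_read(wsc_ref_t *r, size_t *size_out);
bool        end_read(wsc_ref_t *r);   /* bool result; meaning assumed */
void       *begin_write(wsc_ref_t *r, size_t size);
bool        end_write(wsc_ref_t *r);

/* Process control */
bool proc_run(wsc_ref_t *proc);
bool proc_pause(wsc_ref_t *proc);

/* Naming */
wsc_ref_t *create(const char *name);
wsc_ref_t *lookup(const char *name);
bool       delete(wsc_ref_t *r);

/* Block until any reference in the set is ready */
int select(wsc_ref_t *refs[], size_t n);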

What’s the goodness?

• Simple API: “declarative” programs
– Calls all take a flags argument:
• O_PERSISTENT / O_TEMPORARY
• O_REPLICATE=4
• O_WEAKLY_CONSISTENT
• …
– Forces the programmer to reason about the properties of their distributed application and its objects!
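For illustration, declaring an object's placement and consistency up front might look like this. The flag names follow the slide, but the bit encoding, the O_REPLICATE(n) form, and the flags parameter on create() are assumptions.

#define O_PERSISTENT        (1u << 0)
#define O_TEMPORARY         (1u << 1)
#define O_WEAKLY_CONSISTENT (1u << 2)
#define O_REPLICATE(n)      (((unsigned)(n) & 0xFu) << 4)  /* replica count */

typedef struct wsc_ref wsc_ref_t;
wsc_ref_t *create(const char *name, unsigned flags);  /* "calls all take a flags argument" */

/* A durable object with four replicas, declared at creation time: */
void example(void)
{
    wsc_ref_t *log = create("/service/log", O_PERSISTENT | O_REPLICATE(4));
    (void)log;  /* use the ref via begin_write()/end_write(), etc. */
}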

• Reduced complexity
– Easier to debug and reason about
– Can plug in “simulation” layers at the syscall interface (e.g. to simulate failures or high bounded latency)
– “Default refs” at proc_run() can supply useful primitives
• e.g. a timer or a network wakeup ref (integrated with the R2D2 scheduler); see the sketch below
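A process might then block on its default refs like this; all names are hypothetical, and select() follows the syscall table above.

#include <stddef.h>

typedef struct wsc_ref wsc_ref_t;

int select(wsc_ref_t *refs[], size_t n);  /* returns index of the ready ref */

/* Wait on the default refs handed over at proc_run(): wake on a timer
 * tick or on network activity, whichever comes first. */
int wait_for_work(wsc_ref_t *timer_ref, wsc_ref_t *net_wakeup_ref)
{
    wsc_ref_t *set[] = { timer_ref, net_wakeup_ref };
    return select(set, 2);
}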

How can we build this?

• FABLE
• R2D2
• Firmament
• Mirage
• D¢OS?