NeVer Mind networking: Using shared non- volatile memory in ......Non-volatile Random Access Memory...

30
NeVer Mind networking: Using shared non- volatile memory in scale-out software Stanko Novakovic , Paolo Faraboschi, Kimberly Keeton, Rob Schreiber, Edouard Bugnion, Babak Falsafi * EPFL, Switzerland HP Labs, USA * * *

Transcript of NeVer Mind networking: Using shared non- volatile memory in ......Non-volatile Random Access Memory...

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

    NeVer Mind networking: Using shared non-volatile memory in scale-out software

    Stanko Novakovic , Paolo Faraboschi, Kimberly Keeton, Rob Schreiber, Edouard Bugnion, Babak Falsafi

    *

    EPFL, Switzerland HP Labs, USA *

    * *

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 2

    Multiple non-volatile memory technologies

    Magnetic disk •  Block-based storage, access via programmed IO or DMA (e.g. ATA)

    Flash (PCIe-attached – via NVMe)

    •  Block-based storage, access via queue-pairs (i.e. SQ, CQ) and DMA

    Non-volatile Random Access Memory (NVRAM) •  Persistent with similar performance characteristics to DRAM •  Byte-addressable, direct access via LD/ST •  Examples: Resistive RAM (RRAM), Phase-change memory (PCM)

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 3

    Rack-scale systems with shared non-volatile memory

    Shared disk architecture

    Shared NVMe namespaces

    TheMachine/Firebox

    Cache coher. No No No Interface PIO, DMA QP-based LD/ST

    Technology Magnetic disk Flash NVRAM

    1. Direct shared access 2. Latency: small factor of DRAM 3. Bandwidth: DDR bandwidth

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 4

    The Machine from HP Labs

    High-performance, high-capacity rack-scale computing

    Photonic interconnect

    Special purpose cores Massive memory pool

    Petabytes of byte-addressable NVRAM based on memristor technology •  Shared across thousands of cores via photonic interconnect

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 5

    Outline

    Introduction to shared NVRAM ! Shared NVRAM model Hash Table Graph Processing System Conclusion

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 6

    Shared NVRAM model

    Shared NVRAM

    CPU

    DRAM

    I/O CPU

    DRAM

    I/O … CPU

    DRAM

    I/O

    Servers -  Do not share DRAM -  Share NVRAM

    * but not CC *

    Shared NVRAM -  Used as a heap -  Latency: 4-5x of DRAM -  Bandwidth: DDR rate

    server

    Can datacenter software benefit from such architecture?

    Memory Interconnect

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 7

    Experimental shared NVRAM platform

    Kernel Virtual Machines (KVM)

    Fraction of each guest’s address space is shared User code instrumented to delay NVRAM accesses

    Ttrans = lat + data_size/bandwidth ~0.5µs DDR3/QPI

    (unchanged)

    DRAM Linux

    VM VM VM …

    Shared NVRAM

    100ns ~0.5µs

    app app app

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 8

    Using shared NVRAM in datacenter software systems

    Data serving •  Distributed hash tables, graph stores ! Share data items via NVRAM

    Data processing (a.k.a. analytics) •  MapReduce, distributed graph processing, etc. ! Use shared NVRAM as communication medium

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 9

    Today: Hash Table w/ client-side hashing

    CPU I/O CPU I/O … CPU I/O

    client

    shardID ß h(key)…

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 10

    This work: CREW Hash Table for shared NVRAM

    CPU

    DRAM

    I/O CPU

    DRAM

    I/O … CPU

    DRAM

    I/O

    Each server owns write permission to one block and read permission to whole NVRAM 1 2 N

    Permisssions: Read-Write = {1} Read-Only = {1,2,..N}

    Data stored in NVRAM

    Memory Interconnect

    1 2 N

    CREW – Concurrent Read Exclusive Write

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 11

    CREW Hash Table for shared NVRAM (reads)

    CPU

    DRAM

    I/O CPU

    DRAM

    I/O … CPU

    DRAM

    I/O

    client

    serverID = uniform_rand() 1 2 N

    Memory Interconnect

    N

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 12

    CREW Hash Table for shared NVRAM (writes)

    CPU

    DRAM

    I/O CPU

    DRAM

    I/O … CPU

    DRAM

    I/O

    client

    serverID = uniform_rand() 1 2 N

    network

    Memory Interconnect

    N

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 13

    Hash table for shared NVRAM prototype

    Based on Redis KVS •  Each server runs separate instance of Redis •  Shared read-only access to data items

    YCSB clients run on separate servers

    •  Configured to compute hash or pick server in round-robin fashion

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 14

    Per-server network utilization (0.99 Zipf workload)

    0

    50

    100

    150

    200

    250

    300

    350

    400

    1 2 3 4 5 6 7 8 9 10 11 12

    1KB

    read

    look

    ups

    (thou

    sand

    s)

    servers

    Client-side hashing

    Consistent hashing on client side can cause load imbalance

    Redis (hashing)

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 15

    0

    50

    100

    150

    200

    250

    300

    350

    400

    1 2 3 4 5 6 7 8 9 10 11 12

    1KB

    read

    look

    ups

    (thou

    sand

    s)

    servers

    Client-side hashing Round-robin

    What does uniform utilization buy us?

    BW/CPU bottleneck

    Redis (shared NVRAM)

    Higher throughput due to shared access w/o violating SLA

    99th-percentile latency Redis (hashing)

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 16

    Today: Hash Table w/ proxying

    CPU I/O CPU I/O … CPU I/O

    client

    serverID = uniform_rand()

    proxy

    network

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 17

    HT w/ proxying vs. HT w/ shared access

    0

    10

    20

    30

    40

    50

    60

    70

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

    Look

    ups/

    seco

    nd (t

    hous

    ands

    )

    Fraction of reads

    Round-robin (w/ proxying) Round-robin

    ~2x improvement in performance for read-only workload

    Redis (shared NVRAM) Redis (proxying)

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 18

    Data serving recap

    Looked at two hash table (KVS) designs •  with client-side hashing and proxying

    KVS that uses shared NVRAM allows for shared read-only access •  Better load balancing over hashing •  Lower end-to-end latency over proxying

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 19

    Outline

    Introduction to shared NVRAM Shared NVRAM model Hash Table ! Graph Processing System Conclusion

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 20

    Bulk Synchronous Parallel (BSP) on scale-out

    CPU I/O CPU I/O … CPU I/O

    1 2 N Compute

    Buffered vertex updates

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 21

    BSP graph processing on scale-out

    CPU I/O CPU I/O … CPU I/O

    1 2 N

    network

    Compute

    Communicate

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 22

    BSP graph processing on shared NVRAM

    CPU I/O CPU I/O … CPU I/O

    1 2 N Compute

    Memory Interconnect

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 23

    BSP graph processing on shared NVRAM

    CPU I/O CPU I/O … CPU I/O

    1 2 N Compute Communicate

    DRAM à NVRAM

    FLUSH

    Memory Interconnect

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 24

    BSP graph processing on shared NVRAM

    CPU I/O CPU I/O … CPU I/O

    1 2 N Compute Communicate

    DRAM à NVRAM NOTIFY

    NOTIFY

    Memory Interconnect

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 25

    BSP graph processing on shared NVRAM

    CPU I/O CPU I/O … CPU I/O

    1 2 N Compute

    PULL

    Communicate DRAM à NVRAM NOTIFY NVRAM à DRAM

    Memory Interconnect

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 26

    BSP graph processing for shared NVRAM prototype

    Shared NVRAM BSP framework implemented in C++ from scratch Algorithms implemented as compute kernels Compare BSP over TCP/IP and BSP over NVRAM

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 27

    PageRank on Twitter graph

    Accelerating shuffle phase helps improve overall execution time

    0

    1

    2

    3

    4

    5

    6

    7

    8

    2 nodes 4 nodes 8 nodes 16 nodes

    Exe

    cutio

    n tim

    e (s

    econ

    ds)

    TCP/IP NVM Scale-out Shared NVRAM

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 28

    Related work

    Concurrent Read Exclusive Write (FARM NSDI’14, MICA NSDI’14) Graph update aggregation (Pregel SIGMOD’10) Shared disks and shared NVMe namespaces

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 29

    Conclusion

    Shared NVRAM

    CPU

    DRAM

    I/O CPU

    DRAM

    I/O … CPU

    DRAM

    I/O

    Memory Interconnect

    Rack-scale systems w/ shared NVRAM •  Direct access, byte-addressable

    This talk: how to make use of this arch. •  In common DC apps •  Studied KVS and graph processing

  • © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP CONFIDENTIAL 30

    Thank you