Gaurav slides


  • The Path to Exascale Computing: Challenges and Opportunities

    HPC Meet-up, 21st May

    Gaurav Kaul, Solutions Architect

    Intel

  • 2

    Outline

    Why Exascale?

    Existing Trends: The End of Moore's Law?

    Major Technology Challenges (aka Walls)

    Technologies On the Horizon

    Scaling Applications for Peta/Exa-Scale Era

    Summary

  • 3

    Performance Roadmap

    [Figure: peak performance in GFLOP (log scale, 1.E-04 to 1.E+08) versus year (1960 to 2020), with MFLOP, GFLOP, TFLOP, PFLOP and EFLOP milestones. Intervals labelled 12 Years, 11 Years and 10 Years; additional series labelled Client and Hand-held.]
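One way to read the roadmap is as an implied doubling time. The sketch below assumes that the 12, 11 and 10 year labels mark the time between successive 1000x milestones (for example TFLOP to PFLOP); that pairing is an interpretation of the chart labels, not something the slide states, and the function name is hypothetical.

```python
import math


def doubling_time_years(factor: float, years: float) -> float:
    """Years needed per 2x improvement if performance grows by `factor` over `years`."""
    if factor <= 1 or years <= 0:
        raise ValueError("factor must be > 1 and years must be > 0")
    # factor = 2 ** (years / doubling_time)  =>  doubling_time = years / log2(factor)
    return years / math.log2(factor)


if __name__ == "__main__":
    # Assumed reading of the chart: each interval covers a 1000x step.
    for interval in (12, 11, 10):
        t = doubling_time_years(1000, interval)
        print(f"1000x in {interval} years implies 2x roughly every {t:.2f} years")
```

Under this reading, every interval corresponds to performance doubling a little more often than once every 15 months.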

  • A bit of History

    4

  • The Top 500 Waterfall

    5

  • 50 years of Moore's Law

    6

  • Moore and Dennard Scaling

    7

  • 8

    Current Processor Performance Trends

  • Technology Scaling Outlook

    9

  • 10

    The Power & Energy Challenge

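    [Figure: power breakdown of a TFLOP machine across Compute, Memory, Com and Disk. "TFLOP Machine today": 200W, 150W, 100W, 100W and 4550W, totalling 5KW. "TFLOP Machine then, With Exa Technology": 5W, 2W, ~5W, ~3W and 5W, totalling ~20W.]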
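The totals on this slide are per TFLOP. As a rough sketch of why they matter at exascale, the snippet below scales them to 1 EFLOP (10^6 TFLOP). Linear scaling is an assumption made here for illustration; it ignores cooling, interconnect growth and other system-level overheads, and the function name is hypothetical.

```python
TFLOP_PER_EFLOP = 1_000_000  # 1 EFLOP = 10**6 TFLOP


def exaflop_power_megawatts(watts_per_tflop: float) -> float:
    """Power of a 1 EFLOP system in MW, assuming power scales linearly with TFLOP count."""
    total_watts = watts_per_tflop * TFLOP_PER_EFLOP
    return total_watts / 1e6  # W -> MW


if __name__ == "__main__":
    # Per-TFLOP totals taken from the slide: 5KW today, ~20W with exascale technology.
    for label, watts in (("today's technology", 5_000), ("exascale technology", 20)):
        print(f"1 EFLOP with {label}: {exaflop_power_megawatts(watts):,.0f} MW")
```

With these inputs the two cases differ by a factor of 250: gigawatts with today's per-TFLOP power versus tens of megawatts with the target figure.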

  • Promising Technologies

    11

  • Rethink System Level Architecture

    12

  • DRAM Scaling Using 3D Memory

    13

  • Innovative Packaging and I/O

    14

  • 15

    Needs a Paradigm Shift

    Evaluate each (old) architecture feature with new priorities

    Past and present priorities

    - Single thread performance: Frequency

    - Programming productivity: Legacy, compatibility; architecture features for productivity

    - Constraints: (1) Cost, (2) Reasonable Power/Energy

    Future priorities

    - Throughput performance: Parallelism

    - Power/Energy: Architecture features for energy; simplicity

    - Constraints: (1) Programming productivity, (2) Cost

  • Intel: Investing to Remove 6 Bottlenecks

    - Interconnect

    - Memory & Storage

    - Processor Performance

    - Reliability and Resiliency

    - Standard Programming Model for Parallelism

    - Power Efficiency

  • Impact on Applications

    17

  • The Many Ways to Parallelism

    18

  • And New Workloads will Emerge

    19

  • Code Modernization: The 4D Approach

    20

  • New for Knights Landing (Next Generation Intel Xeon Phi Products)

    2nd half '15: 1st commercial systems

    3+ TFLOPS in One Package: Parallel Performance & Density

    On-Package Memory: High Performance

    up to 16GB at launch

    5X Bandwidth vs DDR4

    Compute: Intel Silvermont Arch. (Intel Atom)

    Low-Power Cores with HPC Enhancements

    3X Single Thread Performance vs Prior Gen.

    Intel Xeon Processor Binary Compatible

    1/3X the Space

    5X Power Efficiency


    Integrated Fabric

    Intel Silvermont Arch. Enhanced for HPC

    Processor Package

    Conceptual: Not Actual Package Layout

    Platform Memory: DDR4 Bandwidth and Capacity Comparable to Intel Xeon Processors

    LEARN MORE: Knights Landing Webcast (Tuesday June 24th): https://www.brighttalk.com/webcast/10773/116329

    Jointly Developed with Micron Technology

  • 22

    What is an FPGA?

    FPGAs (Field Programmable Gate Arrays) are semiconductor devices that can be programmed

    - Desired functionality of the FPGA can be (re-)programmed by downloading a configuration into the device

    FPGAs offer several advantages over potential alternatives:

    - Lower one-time development cost and faster time to market than custom-designed chips (ASICs)

    - Ability to implement customer-specific functionality beyond what is available from standard products (ASSPs)

    - Customizable and reprogrammable after the device has been deployed in the field, unlike both ASICs and ASSPs

    http://commons.wikimedia.org/wiki/File:Fpga1a.gif
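To make "programmed by downloading a configuration" concrete, here is a toy Python model of a single lookup table (LUT), the kind of small configurable logic element that many FPGAs are built from. It is only an illustrative sketch: the class and method names are hypothetical, and a real device has many such elements plus programmable routing, configured through vendor tools rather than Python.

```python
from typing import Sequence


class LookupTable:
    """Toy model of one k-input lookup table whose behaviour is set by its configuration bits."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        self.bits = [0] * (2 ** num_inputs)  # unconfigured: always outputs 0

    def load_configuration(self, bits: Sequence[int]) -> None:
        """Replace the truth table, which changes the logic function the LUT implements."""
        if len(bits) != 2 ** self.num_inputs:
            raise ValueError(f"expected {2 ** self.num_inputs} configuration bits")
        self.bits = [b & 1 for b in bits]

    def evaluate(self, *inputs: int) -> int:
        """Return the output bit for the given input bits (first input is the most significant)."""
        if len(inputs) != self.num_inputs:
            raise ValueError(f"expected {self.num_inputs} inputs")
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.bits[index]


if __name__ == "__main__":
    lut = LookupTable(2)
    lut.load_configuration([0, 0, 0, 1])  # truth table for AND
    print("AND:", [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)])
    lut.load_configuration([0, 1, 1, 0])  # same hardware, now XOR
    print("XOR:", [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)])
```

The same object computes AND and then XOR without any structural change; only the loaded bits differ, which is the property the slide contrasts with fixed-function ASICs and ASSPs.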

  • Acceleration Architectural Landscape

    [Figure: energy efficiency (MOPS/mW, log scale from 0.01 to 1000) versus processor number (1 to 20, sorted by efficiency), covering microprocessors, reconfigurable devices and dedicated HW. Annotations: "More programmable", "More efficient", 10X, 100X. Source: ISSCC Proceedings]

    Potential for 10-100X higher performance/watt vs. general purpose cores

    23

  • 24

    FPGAs as Reconfigurable Accelerators

  • Intel Confidential Do Not Forward

    25

    Example Use Case: HFT

  • What will matter in 10 years

    26

  • 27

    What Next?

  • 28

    Summary