IA-64 Architecture - Linux Clusters ... Itanium Micro-Architecture, Itanium Update ... Protection Key...

  • 1

    IA-64 Architecture

    Sunil Saxena, Principal Engineer, Intel Corporation

    September 11th, 2000

    Copyright 2000 Intel Corporation. Linux Supercluster Users Conference

    Intel Labs

    IA Processor Roadmap

    [Figure: processor roadmap, performance plotted over time ('99, '00, '01, '02) and process generation (.25, .18, .13).]

    IA-64: Itanium™ processor, McKinley, Madison (IA-64 Perf), Deerfield (IA-64 Price/Perf). Extends IA headroom, scalability and availability for the most demanding environments.

    IA-32: Pentium III Xeon processor, Cascades, Foster, future IA-32. Outstanding performance for 32-bit volume apps.

    Strong Execution on Itanium Processor, Continued Focus on the Long Term

  • 2

    Agenda

    IA-64 Architecture

    • EPIC 101
      Application Architecture
      System Architecture
      Itanium Microarchitecture

    • Itanium Update

    • Useful URLs

    EPIC Design Philosophy

    Maximize performance via hardware & software synergy

    Advanced features enhance instruction level parallelism: Predication, Speculation, ...

    Massive hardware resources for parallel execution

    High performance EPIC building block

    Achieving performance at the most fundamental level

    [Figure: performance over time for successive design styles: CISC, RISC, OOO / SuperScalar, VLIW, EPIC.]

  • 3

    Instruction Format: Explicit Parallelism

    [Figure: 128-bit bundle, bit 127 down to bit 0: Instruction 2 | Instruction 1 | Instruction 0 | Template]

    • Breaking the sequential execution paradigm

    • Explicit instruction dependency: template

    • Flexibly groups any number of independent instructions

    • Explicitly scheduled parallelism

    • Enables compiler to create greater parallelism

    • Simplifies hardware by removing dynamic mechanisms

    • Fully interlocked: hardware provides compatibility

    The new instruction format enables scalability with compatibility
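As a rough illustration of the bundle layout on this slide, the Python sketch below splits a 128-bit bundle into its template and three instruction slots. The field widths (a 5-bit template in the low-order bits followed by three 41-bit slots) are taken from the published IA-64 architecture definition, not from the slide itself, and the names `pack_bundle` and `unpack_bundle` are hypothetical helpers invented for this example.

```python
TEMPLATE_BITS = 5     # assumed: template occupies bits 4..0
SLOT_BITS = 41        # assumed: each instruction slot is 41 bits wide
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Combine a template and three slot values into one 128-bit integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instruction slots")
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template does not fit in 5 bits")
    bundle = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("slot value does not fit in 41 bits")
        # Slot 0 sits just above the template, slot 2 ends at bit 127.
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit integer into (template, (slot0, slot1, slot2))."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must be a 128-bit unsigned value")
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(3)
    )
    return template, slots
```

The widths add up to the bundle size (5 + 3 × 41 = 128), which is why the template can describe all three slots without any spare bits.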

    Branches Limit Performance

    Control flow introduces branches

    Source:

        if a[i].ptr != 0
            b[i] = a[i].l;
        else
            b[i] = a[i].r;
        i = i + 1

    Traditional architectures: 4 basic blocks

        if:    load a[i].ptr
               p1, p2 = cmp a[i].ptr != 0
               branch if p2

        then:  load a[i].l
               store b[i]
               branch

        else:  load a[i].r
               store b[i]

               i = i + 1

  • 4

    Predication

    Source:

        if a[i].ptr != 0
            b[i] = a[i].l;
        else
            b[i] = a[i].r;
        i = i + 1

    Basic blocks:

        if:    load a[i].ptr
               p1, p2 = cmp a[i].ptr != 0
               branch if p2

        then:  load a[i].l
               store b[i]
               branch

        else:  load a[i].r
               store b[i]

               i = i + 1

    Predication removes branches and eliminates mispredicts

    Predication Enhances Parallelism

    Traditional architectures: 4 basic blocks

        if:    load a[i].ptr
               p1, p2 = cmp a[i].ptr != 0
               jump if p2

        then:  load a[i].l
               store b[i]
               jump

        else:  load a[i].r
               store b[i]

               i = i + 1

    IA-64™ architecture: 1 basic block

        load a[i].ptr
        p1, p2 = cmp a[i].ptr != 0
        load a[i].l    store b[i]
        load a[i].r    store b[i]
        i = i + 1

    Predication enables more effective use of parallel hardware
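The single-block form above can be sketched in Python as follows. This is only a software model of the idea, under assumptions: `Node` is a hypothetical record type standing in for the elements of `a`, and the function names are invented for this example. The loop over guarded operations stands in for hardware that ignores an instruction whose predicate is false.

```python
from collections import namedtuple

# Hypothetical record type for the elements of a[]; not from the slides.
Node = namedtuple("Node", ["ptr", "l", "r"])


def copy_branching(a, b, i):
    """Traditional form: control flow picks one of two arms."""
    if a[i].ptr != 0:
        b[i] = a[i].l
    else:
        b[i] = a[i].r
    return i + 1


def copy_predicated(a, b, i):
    """If-converted form: one straight-line block of guarded operations."""
    p1 = a[i].ptr != 0      # predicate for the 'then' arm
    p2 = not p1             # complementary predicate for the 'else' arm
    block = [
        (p1, lambda: b.__setitem__(i, a[i].l)),
        (p2, lambda: b.__setitem__(i, a[i].r)),
    ]
    for predicate, operation in block:
        # An operation whose predicate is false is skipped, which models
        # the hardware treating it as a no-op. The block has no branch.
        if predicate:
            operation()
    return i + 1
```

Both functions produce the same contents in `b`; the second one simply expresses the choice as data (the predicates) rather than as control flow.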

  • 5

    Memory Latency Causes Delays

    • Loads significantly affect performance

      Often first instruction in dependency chain of instructions

      Can incur high latencies

    Source:

        t1 = t1 + 1
        if t1 > t2
            j = a[t1 - t2]
            b[j]++

    Traditional architectures:

        add t1 + 1
        comp t1 > t2
        branch
        ------ barrier ------
        load a[t1 - t2]
        load b[j]
        add b[j] + 1

    Loads can cause exceptions

    Speculation with IA-64™ Architecture

    • Separate load behavior from exception behavior

      Speculative load instruction (load.s) initiates a load operation and detects exceptions

      Propagate an exception token (stored with destination register) from load.s to check.s

      Speculative check instruction (check.s) delivers any exceptions detected by load.s

        add t1 + 1
        load.s a[t1 - t2]     ; exception detection
        comp t1 > t2
        jump
          (exception token propagates)
        check.s               ; exception delivery
        load b[j]
        add b[j] + 1
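The split between exception detection and exception delivery can be modeled with a short Python sketch. Everything here is an assumption made for illustration: `DeferredException` stands in for the exception token stored with the destination register, `load_s` and `check_s` mimic the two instructions, and an out-of-range index plays the role of a faulting load.

```python
class DeferredException:
    """Hypothetical stand-in for the exception token kept with the register."""

    def __init__(self, error):
        self.error = error


def load_s(array, index):
    """Speculative load: detect a fault but return a token instead of raising."""
    if not 0 <= index < len(array):
        return DeferredException(IndexError(f"index {index} out of range"))
    return array[index]


def check_s(value):
    """Speculative check: deliver the exception recorded by load_s, if any."""
    if isinstance(value, DeferredException):
        raise value.error
    return value


def update(a, b, t1, t2):
    """Model of the slide's example with the load hoisted above the compare."""
    t1 = t1 + 1
    j = load_s(a, t1 - t2)      # started early; may carry a deferred fault
    if t1 > t2:
        j = check_s(j)          # fault is delivered only on the taken path
        b[j] += 1
    return t1
```

When the condition is false, the token is simply discarded, so a load that would have faulted never disturbs the program.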

  • 6

    Speculation Minimizes the Effect of Memory Latency

    • Give scheduling freedom to the compiler

      Allows load.s to be scheduled above branches

      check.s remains in home block, branches to fixup code if an exception is propagated

    Traditional architectures:

        add t1 + 1
        comp t1 > t2
        jump
        ------ barrier ------
        load a[t1 - t2]
        load b[j]
        add b[j] + 1

    IA-64 architecture:

        add t1 + 1
        load.s a[t1 - t2]     ; exception detection
        comp t1 > t2
        jump
          (exception token propagates)
        check.s               ; exception delivery
        load b[j]
        add b[j] + 1

    Predication & Speculation

    Source:

        if a[i].ptr != 0
            b[i] = a[i].l;
        else
            b[i] = a[i].r;
        i = i + 1

    With predication:

        load a[i].ptr
        p1, p2 = cmp a[i].ptr != 0
        load a[i].l    store b[i]
        load a[i].r    store b[i]
        i = i + 1

    With predication & speculation:

        load a[i].ptr
        load.s a[i].l    load.s a[i].r
        p1, p2 = cmp a[i].ptr != 0
        check.s    store b[i]
        check.s    store b[i]
        i = i + 1

    Predication and Speculation = higher ILP
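A minimal Python sketch of the combined form follows, under stated assumptions: records are plain dictionaries, a missing field plays the role of a faulting load, and `NAT`, `load_s`, `check_s`, and `copy_speculated` are hypothetical names invented for this example. Both fields are loaded speculatively before the comparison, and only the arm whose predicate is true checks its value.

```python
NAT = object()    # hypothetical marker acting as the deferred-exception token


def load_s(record, field):
    """Speculative load: a missing field yields the token instead of raising."""
    return record.get(field, NAT)


def check_s(value, field):
    """Deliver the deferred fault if the speculative load failed."""
    if value is NAT:
        raise KeyError(field)
    return value


def copy_speculated(a, b, i):
    record = a[i]
    left = load_s(record, "l")      # both loads start before the compare
    right = load_s(record, "r")
    p1 = record["ptr"] != 0
    p2 = not p1
    # Each check-and-store is guarded by its predicate, so a failed load
    # on the arm that is not selected is never reported.
    if p1:
        b[i] = check_s(left, "l")
    if p2:
        b[i] = check_s(right, "r")
    return i + 1
```

In the usage below, each record lacks the field that its predicate does not select, and the copy still succeeds:

```python
a = [{"ptr": 1, "l": 10}, {"ptr": 0, "r": 40}]
b = [None, None]
copy_speculated(a, b, 0)
copy_speculated(a, b, 1)
```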

  • 7

    Agenda

    IA-64 Architecture

    • EPIC 101
      Application Architecture
      System Architecture
      Itanium Micro-Architecture

    • Itanium Update

    • Useful URLs
