Post on 20-Mar-2016
An Instruction Set and Microarchitecture for Instruction
Level Distributed Processing
(Ho-Seop Kim and James E. Smith)
Haiying Qu, Electrical and Computer Engineering, University of Alberta
Introduction 1
ILP: Instruction Level Parallelism
  Achieved significant performance gains
ILDP: Instruction Level Distributed Processing
  Technology trend
Introduction 2
Proposed microarchitecture
  Short pipelines
  Distributed processing elements: in-order instruction processing within each element enables out-of-order execution overall
  Strand: a chain of dependent instructions
  Accumulator: inter-instruction communication within a strand
Instruction Set
64 General Purpose Registers: R0-R63
  Source or destination
8 Accumulators: A0-A7
Dead accumulator
Load/store Instruction
One accumulator value
One GPR
One parcel
  Ai <- mem(Aj)
  Ai <- mem(Rj)
  mem(Ai) <- Rj
  mem(Rj) <- Ai
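The four load/store forms above can be read as operations on a small register state. What follows is a minimal Python sketch, not the authors' simulator: the `State` class, the helper names, and the dict-backed flat memory are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    # Hypothetical model of the architected state named on the slide.
    gpr: list = field(default_factory=lambda: [0] * 64)  # R0-R63
    acc: list = field(default_factory=lambda: [0] * 8)   # A0-A7
    mem: dict = field(default_factory=dict)              # address -> value (assumed flat memory)

def load(s, ai, src_kind, j):
    """Ai <- mem(Aj) when src_kind == 'A'; Ai <- mem(Rj) when src_kind == 'R'."""
    addr = s.acc[j] if src_kind == "A" else s.gpr[j]
    s.acc[ai] = s.mem.get(addr, 0)  # unwritten addresses read as 0 (assumption)

def store_gpr(s, ai, rj):
    """mem(Ai) <- Rj: address from the accumulator, data from the GPR."""
    s.mem[s.acc[ai]] = s.gpr[rj]

def store_acc(s, rj, ai):
    """mem(Rj) <- Ai: address from the GPR, data from the accumulator."""
    s.mem[s.gpr[rj]] = s.acc[ai]

s = State()
s.gpr[3] = 0x100
s.acc[0] = 42
store_acc(s, rj=3, ai=0)          # mem(R3) <- A0
load(s, ai=1, src_kind="R", j=3)  # A1 <- mem(R3)
```

Every form touches exactly one accumulator and at most one GPR, which is the constraint the slide lists.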
Register Instruction
Operation: accumulator and GPR/immediate
Result: accumulator or GPR
  Ai <- Ai op Rj
  Ai <- Ai op immed
  Ai <- Rj op immed
  Rj <- Ai
  Rj <- Ai op immed
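The register instruction forms can be exercised with a short Python sketch. The operand encoding (tuples such as `("A", 0)` or `("imm", 4)`), the `OPS` table, and the function names are hypothetical, and no word-width wraparound is modeled.

```python
import operator

# Illustrative subset of operations; the slide only says "op".
OPS = {"add": operator.add, "sub": operator.sub,
       "and": operator.and_, "or": operator.or_, "xor": operator.xor}

def read(gpr, acc, operand):
    """Fetch the value of ('A', i), ('R', j), or ('imm', value)."""
    kind, value = operand
    if kind == "A":
        return acc[value]
    if kind == "R":
        return gpr[value]
    return value  # immediate

def execute(gpr, acc, dst, src1, op=None, src2=None):
    """dst <- src1 (a plain copy) or dst <- src1 op src2."""
    result = read(gpr, acc, src1)
    if op is not None:
        result = OPS[op](result, read(gpr, acc, src2))
    kind, index = dst
    (acc if kind == "A" else gpr)[index] = result

gpr, acc = [0] * 64, [0] * 8
gpr[5] = 10
execute(gpr, acc, ("A", 0), ("R", 5), "add", ("imm", 4))  # A0 <- R5 op immed
execute(gpr, acc, ("A", 0), ("A", 0), "add", ("R", 5))    # A0 <- A0 op R5
execute(gpr, acc, ("R", 6), ("A", 0))                     # R6 <- A0
execute(gpr, acc, ("R", 7), ("A", 0), "sub", ("imm", 1))  # R7 <- A0 op immed
```

The sketch does not check that each instruction matches one of the five legal forms; a real decoder would reject, for example, an operation on two GPRs.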
Branch/jump Instruction
Conditional branch: compare Ai with 0 or a GPR (all usual predicates)
Program counter (P)
Indirect jump: Ai or GPR
Return address: GPR
  P <- P + immed; Ai pred Rj
  P <- P + immed; Ai pred 0
  P <- Ai
  P <- Rj
  P <- Ai; Rj <- P++
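The program counter updates listed above can be sketched in Python as follows. The predicate names, the explicit `next_pc` argument (instruction sizes are not given on this slide), and the reading of `P++` as "address of the following instruction" are assumptions.

```python
import operator

# "All usual predicates", modeled here as the six integer comparisons.
PREDICATES = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}

def cond_branch(pc, next_pc, immed, a_value, pred, other=0):
    """P <- P + immed if (Ai pred other) holds, otherwise fall through.

    `other` is 0 for the compare-with-zero form or a GPR value otherwise.
    """
    taken = PREDICATES[pred](a_value, other)
    return pc + immed if taken else next_pc

def jump_and_link(next_pc, target):
    """P <- Ai; Rj <- P++ : returns (new P, value written to Rj)."""
    return target, next_pc

taken_pc = cond_branch(pc=100, next_pc=101, immed=-8, a_value=5, pred="gt")
fallthrough_pc = cond_branch(pc=100, next_pc=101, immed=-8, a_value=0, pred="gt")
new_pc, link = jump_and_link(next_pc=101, target=400)
```

The plain indirect jumps (`P <- Ai`, `P <- Rj`) simply assign the register value to P, so they need no helper.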
Example Code
Strand
Figure 3. Types of values and associated registers
Strand ends
Two strands intersect: copy one to a GPR
Output is a static global register
New strand
Figure 4. Issue timing
Stages
Fetch: 4 words, over 4 instructions
Parceling: break into individual instructions
Renaming: GPR
Steering: into a FIFO according to the accumulators
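The steering stage can be illustrated with a small Python sketch. The slides say only that instructions are steered into FIFOs according to their accumulators; the policy used here (accumulator number modulo the FIFO count), the FIFO count of 8, and the `steer` function itself are assumptions chosen for simplicity.

```python
from collections import deque

def steer(instructions, num_fifos=8):
    """Place each (text, accumulator_number) pair into a FIFO.

    Assumed policy: accumulator number modulo the FIFO count, so all
    instructions of a strand (same accumulator) stay in program order
    within one FIFO.
    """
    fifos = [deque() for _ in range(num_fifos)]
    for text, acc in instructions:
        fifos[acc % num_fifos].append(text)
    return fifos

program = [
    ("A0 <- mem(R1)", 0),
    ("A1 <- R2 + 4", 1),
    ("A0 <- A0 + R3", 0),
    ("R4 <- A0", 0),
    ("R5 <- A1", 1),
]
fifos = steer(program)
```

Instructions within one FIFO are processed in order, while different FIFOs can progress independently, which is how in-order elements yield out-of-order execution across strands.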
Figure 5 ILDP Processor Block Diagram
Some Concepts
PE: Processing Element
IR: Issue Register (a single reservation station)
ICN: Interconnection Network
Figure 6 Microarchitecture
Table 1 Complexity Comparison
Please note: the ILDP figures are based on one PE
Table 2 Benchmark Program Properties
Evaluation 1
Figure 7 Types of register values
Figure 8 Average strand length
Evaluation 2
Figure 9 Strand end
Figure 10 Instruction size
Evaluation 3
Figure 11 Cumulative strand re-use
Figure 12 IPC
Evaluation 4
Figure 13 Global register rename map read/write bandwidth
Table 3 Simulator Configurations
Discussion