Pipelining Idealism

Description

Explains the pipelining concepts used in processor design, mainly covering the idealisms assumed in pipelined processor design.

Transcript of Pipelining Idealism

  • 1. PIPELINING IDEALISM
    ANEESH R, Centre for Development of Advanced Computing (C-DAC), India, [email protected]

  • 2. Pipelining idealism
    The motivation of a k-stage pipelined design is to achieve a k-fold increase in throughput. This k-fold increase represents the ideal case; unavoidable deviations from the idealism in real pipelines are what make pipelined design challenging, and closing this idealism-realism gap is the designer's main task. The three points of pipelining idealism are:
    - Uniform sub-computations: the computation to be performed is evenly partitioned into sub-computations of uniform latency.
    - Identical sub-computations: the same computation is to be performed repeatedly on a large number of input data sets.
    - Independent sub-computations: all the repetitions of the same computation are mutually independent.

  • 3. Uniform sub-computations
    The computation to be pipelined can be evenly partitioned into k sub-computations of uniform latency, so the original design can be partitioned into k balanced (i.e., equal-latency) pipeline stages. If the latency of the original computation, and hence the clocking period of the non-pipelined design, is T, then the clocking period of a k-stage pipelined design is exactly T/k; the k-fold increase in throughput is achieved through the k-fold increase of the clocking rate. This idealized assumption may not hold in an actual pipeline design, because it may not be possible to partition the computation into perfectly balanced stages. In the running example, the 400 ns latency of the non-pipelined computation is partitioned into three stages with latencies of 125, 150, and 125 ns, respectively, so the original latency has not been evenly partitioned into three balanced stages.

  • 4. Uniform sub-computations (cont.)
    The clocking period of a pipelined design is dictated by the stage with the longest latency, so the stages with shorter latencies in effect incur some inefficiency or penalty. Here the first and third stages each have an inefficiency of 25 ns; this is the internal fragmentation of pipeline stages. The total latency required to perform the same computation increases from T to Tf, and the clocking period of the pipelined design is no longer T/k but Tf/k. Performing the three sub-computations now requires 450 ns instead of the original 400 ns, and the clocking period is not 133 ns (400/3 ns) but 150 ns.

  • 5. Uniform sub-computations (cont.)
    In actual designs, additional delay is introduced by the buffers placed between pipeline stages and by the need to ensure proper clocking of the stages. In the example, an additional 22 ns is required for proper clocking, which results in a cycle time of 172 ns for the three-stage pipelined design, whereas the ideal cycle time would have been 133 ns. The difference between 172 ns and 133 ns in the clocking period accounts for the shortfall from the idealized three-fold increase in throughput.
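The arithmetic on slides 3 to 5 can be checked with a short Python calculation. The sketch below is only an illustration of those numbers (the function name pipelined_cycle_time and the structure are assumptions, not from the slides): it shows how the ideal cycle time of T/k = 133 ns grows to 150 ns because of the unbalanced 125/150/125 ns partition, and to 172 ns once the 22 ns clocking overhead is added.

    # Minimal sketch of the slide 3-5 arithmetic; the numbers come from the slides,
    # the function itself is illustrative only.
    def pipelined_cycle_time(stage_latencies_ns, clocking_overhead_ns=0.0):
        """Cycle time is set by the slowest stage plus any latch/clocking overhead."""
        return max(stage_latencies_ns) + clocking_overhead_ns

    T = 400.0                        # latency of the non-pipelined computation (ns)
    stages = [125.0, 150.0, 125.0]   # unbalanced 3-stage partition from the slides
    k = len(stages)

    ideal_cycle = T / k                                   # ~133.3 ns
    unbalanced_cycle = pipelined_cycle_time(stages)       # 150 ns
    realistic_cycle = pipelined_cycle_time(stages, 22.0)  # 172 ns

    # Internal fragmentation: idle time of the faster stages in every cycle.
    fragmentation_ns = [max(stages) - s for s in stages]  # [25.0, 0.0, 25.0]

    print(f"ideal cycle time            : {ideal_cycle:.1f} ns")
    print(f"cycle time, unbalanced      : {unbalanced_cycle:.1f} ns")
    print(f"cycle time, +22 ns clocking : {realistic_cycle:.1f} ns")
    print(f"total pipelined latency Tf  : {unbalanced_cycle * k:.0f} ns (vs. original {T:.0f} ns)")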
  • 6. Uniform sub-computations (cont.)
    The uniform sub-computations point basically assumes two things:
    - there is no inefficiency introduced by partitioning the original computation into multiple sub-computations, and
    - there is no additional delay caused by the introduction of the inter-stage buffers and the clocking requirements.
    The additional delay incurred for proper pipeline clocking can be minimized by employing latches similar to the Earle latch. The partitioning of a computation into balanced pipeline stages constitutes the first challenge of pipelined design; the goal is to make the stages as balanced as possible in order to minimize internal fragmentation. Internal fragmentation is the primary cause of deviation from the first point of pipelining idealism, and this deviation leads to the shortfall from the idealized k-fold increase of throughput in a k-stage pipelined design.

  • 7. Identical sub-computations
    Many repetitions of the same computation are to be performed by the pipeline; that is, the same computation is repeated on multiple sets of input data, and each repetition requires the same sequence of sub-computations provided by the pipeline stages. This is certainly true for the pipelined floating-point multiplier, because that pipeline performs only one function, floating-point multiplication. Many pairs of floating-point numbers are to be multiplied, each pair of operands is sent through the same three pipeline stages, and all the pipeline stages are used by every repetition of the computation.

  • 8. Identical sub-computations (cont.)
    If a pipeline is designed to perform multiple functions, this assumption may not hold. An arithmetic pipeline, for example, can be designed to perform both addition and multiplication. Not all the pipeline stages may be required by each of the functions supported by the pipeline; a different subset of stages is used for each function, so a given computation may not need all the stages. Some data sets will therefore skip some pipeline stages, which effectively idle during those computations. These unused or idling pipeline stages introduce another form of pipeline inefficiency, called external fragmentation of pipeline stages. External fragmentation is a form of pipelining overhead and should be minimized in multifunction pipelines.

  • 9. Identical sub-computations (cont.)
    The identical sub-computations point effectively assumes that all pipeline stages are always utilized, and it also implies that there are many sets of data to be processed. It takes k cycles for the first data set to reach the last stage of the pipeline; these cycles are referred to as the pipeline fill time. After the last data set has entered the first pipeline stage, an additional k cycles are needed to drain the pipeline. During the fill and drain times, not all the stages are busy. Assuming that many sets of input data are processed, the fill and drain times constitute a very small fraction of the total time, so the pipeline stages can be considered, for all practical purposes, to be always busy.
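The fill-and-drain argument on slide 9 can be made concrete with a rough model. The sketch below is an assumed illustration rather than material from the slides: it models a k-stage pipeline that completes one result per cycle once full, and shows the speedup over a non-pipelined unit approaching the ideal k-fold figure as the number of data sets grows.

    # Rough model of pipeline fill/drain overhead; the formulas are standard
    # approximations assumed for illustration, not taken from the slides.
    def pipelined_cycles(n_data_sets, k_stages):
        """k cycles to fill the pipeline, then one completion per remaining data set."""
        return k_stages + (n_data_sets - 1)

    def speedup(n_data_sets, k_stages):
        """Speedup over a non-pipelined unit that needs k_stages cycles per data set."""
        return (n_data_sets * k_stages) / pipelined_cycles(n_data_sets, k_stages)

    k = 3
    for n in (3, 10, 100, 10_000):
        print(f"n = {n:>6}: speedup = {speedup(n, k):.2f} (ideal {k})")
    # As n grows, the fill and drain cycles are amortized over many data sets and
    # the speedup approaches the ideal k-fold increase in throughput.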
  • 10. Independent sub-computations
    The repetitions of the computation, or simply the computations, to be processed by the pipeline are independent: all the computations concurrently resident in the pipeline stages have no data or control dependences between any pair of them. This permits the pipeline to operate in "streaming" mode; a later computation need not wait for the completion of an earlier computation because of a dependence between them. For our pipelined floating-point multiplier this assumption holds: if there are multiple pairs of operands to be multiplied, the multiplication of one pair of operands does not depend on the result of another multiplication, so the pairs can be processed by the pipeline in streaming mode.

  • 11. Independent sub-computations (cont.)
    For some pipelines this point may not hold. A later computation may require the result of an earlier computation, and both computations may be concurrently resident in the pipeline stages. If the later computation has entered the pipeline stage that needs the result while the earlier computation has not yet reached the stage that produces that result, the later computation must wait in that stage; this is referred to as a pipeline stall. If a computation is stalled in a pipeline stage, all subsequent computations may have to be stalled as well. Pipeline stalls effectively introduce idling pipeline stages; this is essentially a dynamic form of external fragmentation and results in a reduction of pipeline throughput (see the sketch after this transcript). In designing pipelines that must process computations that are not necessarily independent, the goal is to produce a pipeline design that minimizes the number of pipeline stalls.

  • 13. This topic is adapted from Modern Processor Design by Shen and Lipasti.
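The throughput cost of stalls on slide 11 can be illustrated with the same rough model used above. The stall counts below are assumed purely for illustration and do not appear in the slides; the sketch simply shows that every stall cycle is an idled stage that pulls the completion rate below one result per cycle.

    # Rough model of stall overhead; stall counts are assumed for illustration only.
    def pipelined_cycles(n_data_sets, k_stages, stall_cycles=0):
        """Fill time + one completion per data set + cycles lost to stalls."""
        return k_stages + (n_data_sets - 1) + stall_cycles

    k, n = 3, 100
    for stalls in (0, 10, 50):
        cycles = pipelined_cycles(n, k, stalls)
        throughput = n / cycles  # completed computations per cycle
        print(f"stalls = {stalls:>2}: {cycles} cycles, throughput = {throughput:.2f} per cycle")
    # Each stall cycle is a dynamically idled stage (external fragmentation), so the
    # design goal is a pipeline organization that keeps the number of stalls small.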