Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors
Yedidya Hilewitz and Ruby B. Lee
Princeton Architecture Lab for Multimedia and Security
Department of Electrical Engineering
Princeton University
18th IEEE Symposium on Computer Arithmetic (ARITH-18)
Montpellier, France, June 25-27, 2007
PALMS Princeton University
Yedidya Hilewitz and Ruby B. Lee Performing Advanced Bit Manipulations Efficiently
2
Background and Motivation
Advanced bit manipulations are not well supported by commodity microprocessors.
These operations are instead performed using "programming tricks" (cf. Hacker's Delight).
Bit manipulations play a role in applications of increasing importance.
We propose a new shifter architecture in which the existing shifter is replaced by a unit that directly supports advanced bit manipulation operations.
Outline
Background and motivation
Advanced bit manipulation operations
  Delineation and example usage
New shift-permute functional unit
Summary and conclusions
Advanced Bit Manipulation Instructions
Bit Permutation
  Butterfly (bfly) and Inverse Butterfly (ibfly)
Bit Gather and Bit Scatter
  Parallel Extract and Parallel Deposit
Any of the n! permutations of n bits can be done with one pass of bfly and ibfly instructions (bfly + ibfly = general permutation circuit).
Each circuit has lg(n) stages of n 2:1 MUXes, split into n/2 pairs that either pass through or swap their inputs.
[Figure: 8-bit Butterfly]
[Figure: 8-bit Inverse Butterfly]
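The stage structure described above can be sketched in software. This is a minimal model, not the hardware design: the names `bfly`, `ibfly` and `_stage`, and the convention that bit k of a per-stage control word drives the k-th pair (counted upward from bit 0), are assumptions made for illustration.

```python
def _stage(x, n, d, ctrl):
    """One network stage: bit i is paired with bit i + d.

    Bit k of `ctrl` drives the k-th pair (pairs counted from bit 0 upward):
    1 swaps the two bits, 0 passes them through. This ordering is an
    assumption of the sketch, not something fixed by the slides.
    """
    out = x
    k = 0
    for i in range(n):
        if i & d:
            continue  # i is the upper bit of a pair that was already handled
        if (ctrl >> k) & 1 and ((x >> i) ^ (x >> (i + d))) & 1:
            out ^= (1 << i) | (1 << (i + d))  # bits differ, so a swap flips both
        k += 1
    return out


def bfly(x, n, ctrls):
    """Butterfly: pair distances n/2, n/4, ..., 1 (one control word per stage)."""
    d = n // 2
    for c in ctrls:
        x = _stage(x, n, d, c)
        d //= 2
    return x


def ibfly(x, n, ctrls):
    """Inverse butterfly: pair distances 1, 2, ..., n/2."""
    d = 1
    for c in ctrls:
        x = _stage(x, n, d, c)
        d *= 2
    return x
```

With every pair set to swap, either network moves bit i to bit i XOR (n-1), which is a full bit reversal: `bfly(0b11010010, 8, [0xF, 0xF, 0xF])` gives `0b01001011`.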
Bit Gather (Parallel Extract) and Bit Scatter (Parallel Deposit)
Parallel Extract
  pex r1 = r2, r3
  Extracts the bits of r2 flagged by 1's in r3, then compresses and right-justifies them in the result register.
  Parallel extract maps to the ibfly datapath.
Parallel Deposit
  pdep r1 = r2, r3
  Deposits the right-justified bits of r2 into the result register at the positions flagged by 1's in r3.
  Parallel deposit maps to the bfly datapath.
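The two definitions above can be pinned down with a bit-by-bit reference model. This is only a behavioural sketch (one loop iteration per bit, nothing like the single-pass datapath); the function signatures and the default width `n=64` are assumptions for illustration.

```python
def pex(src, mask, n=64):
    """Parallel extract: gather the bits of `src` flagged by 1's in `mask`
    and pack them, in order, into the low end of the result."""
    result = 0
    out_pos = 0
    for i in range(n):
        if (mask >> i) & 1:
            result |= ((src >> i) & 1) << out_pos
            out_pos += 1
    return result


def pdep(src, mask, n=64):
    """Parallel deposit: scatter the low-order bits of `src`, in order,
    to the positions flagged by 1's in `mask`; other positions are 0."""
    result = 0
    in_pos = 0
    for i in range(n):
        if (mask >> i) & 1:
            result |= ((src >> in_pos) & 1) << i
            in_pos += 1
    return result
```

For example, `pex(0b10110110, 0b11001100)` yields `0b1001`, and `pdep(0b1001, 0b11001100)` yields `0b10000100`. In this model, `pdep(pex(x, m), m)` equals `x & m`.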
Example Usage: Bioinformatics - DNA Sequence Reversal
DNA bases A, C, G and T are represented by two-bit codes.
Reversing a DNA sequence is equivalent to reversing the order of the bit pairs, which is a bfly or ibfly permutation.
1 ibfly instruction is equivalent to 11-23 ALU and shifter instructions: 2×(and, and, shift, shift, or) + a byte reverse instruction, at minimum.
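The minimum software sequence quoted above, 2×(and, and, shift, shift, or) plus a byte reverse, might look like the following for a 32-bit word holding 16 bases. The function name `reverse_pairs32` and the 32-bit width are assumptions; the slides do not fix a word size or a particular two-bit encoding.

```python
def reverse_pairs32(x):
    """Reverse the order of the sixteen 2-bit fields in a 32-bit word."""
    # Swap the two bit pairs inside every nibble (and, and, shift, shift, or).
    x = ((x & 0x33333333) << 2) | ((x >> 2) & 0x33333333)
    # Swap the two nibbles inside every byte (and, and, shift, shift, or).
    x = ((x & 0x0F0F0F0F) << 4) | ((x >> 4) & 0x0F0F0F0F)
    # Byte reverse; the slides assume a dedicated instruction for this step.
    return int.from_bytes(x.to_bytes(4, "little"), "big")
```

With the low byte holding the pairs 00 01 10 11, `reverse_pairs32(0x0000001B)` returns `0xE4000000`, whose top byte holds 11 10 01 00.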
Advanced Bit Manipulation Functional Unit
We propose adding a new functional unit to directly perform advanced bit manipulations.
To minimize the cost, we intend for this new functional unit to replace the shifter unit.
  The shifter currently performs basic bit manipulation operations.
  Our new functional unit represents an evolution of shifter designs.
Basic Bit Manipulation Operations
shift r1 = r2, s
extract r1 = r2, pos, len
mix r1 = r2, r3
rotate r1 = r2, s
deposit r1 = r2, pos, len
Parallel Extract and Parallel Deposit
[Figure panels: Parallel Extract; Parallel Deposit]
Evolution of Shifter Designs
[Figure panels: Barrel Shifter; Log Shifter; Our proposed design]
New Shifter Design
An inverse butterfly (or butterfly) circuit enhanced with an extra multiplexer stage is the basis of the new shifter design.
We will show that either the butterfly or the inverse butterfly circuit can individually perform rotations.
Rotations are the basic operation underlying shift, extract, deposit and mix.
The other basic bit manipulation operations are modeled as rotate + zeroing, sign bit propagation, or merging.
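The "rotate + zeroing, sign bit propagation, or merging" decomposition can be illustrated in software. This is a sketch under assumptions: the field semantics of `extract` and `deposit` (a field of `length` bits starting at bit `pos`, right-justified in the source or result) are inferred from the operand names, the helper names are hypothetical, and `mix` is left out because the slides do not define it.

```python
def rotr(x, s, n):
    """Right-rotate the n-bit value x by s positions."""
    s %= n
    return ((x >> s) | (x << (n - s))) & ((1 << n) - 1)


def shr(x, s, n):
    """Logical right shift (0 <= s < n) = rotate, then zero the top s bits."""
    return rotr(x, s, n) & ((1 << (n - s)) - 1)


def sar(x, s, n):
    """Arithmetic right shift = rotate, then fill the top s bits with the sign."""
    keep = (1 << (n - s)) - 1
    result = rotr(x, s, n) & keep
    if (x >> (n - 1)) & 1:
        result |= ((1 << n) - 1) & ~keep
    return result


def extract(x, pos, length, n):
    """Extract = rotate the field down to bit 0, then zero everything above it."""
    return rotr(x, pos, n) & ((1 << length) - 1)


def deposit(dst, x, pos, length, n):
    """Deposit = rotate the field up to bit `pos`, then merge it into dst.
    Assumes pos + length <= n."""
    field = ((1 << length) - 1) << pos
    return (dst & ~field & ((1 << n) - 1)) | (rotr(x, n - pos, n) & field)
```

In each case the rotation does all of the data movement; the final masking or merging step is the kind of work the extra multiplexer stage is described as handling.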
New Shift-Permute Functional Unit Implementation
Configuring Inverse Butterfly for Rotations
Hard problem: generating the control bits for rotations on the inverse butterfly circuit.
We derive an expression for the control bits based on a recursive function of the shift amount, s, and the stage number, j.
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
After each stage, the input is right rotated by 5, modulo the subcircuit width, within each subcircuit.
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
After stage 1, the input is right rotated by 5 (mod 2) = 1 within each 2-bit subcircuit.
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
After stage 2, the input is right rotated by 5 (mod 4) = 1 within each 4-bit subcircuit.
Bits that wrapped at the output of the previous stage are swapped.
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
After stage 3, the input is right rotated by 5.
Bits that wrapped at the output of the previous stage are passed through.
Rotations in general on n-bit Inverse Butterfly Circuit
Shift amount s < n/2 → swap the bits that wrapped, pass through the rest
Shift amount s ≥ n/2 → pass through the bits that wrapped, swap the rest
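The swap/pass-through rule can be checked with a small simulation that applies it at every stage, treating each stage's subcircuit width as the "n" of the rule and reducing s modulo that width. This is a sketch, not the proposed circuit: the name `ibfly_rotr` and the per-pair loop are illustrative assumptions.

```python
def rotr(x, s, n):
    """Reference right rotation of the n-bit value x by s."""
    s %= n
    return ((x >> s) | (x << (n - s))) & ((1 << n) - 1)


def ibfly_rotr(x, s, n):
    """Right-rotate x by s using only per-pair swap / pass-through stages."""
    d = 1  # pair distance; this stage's subcircuits are 2*d bits wide
    while d < n:
        m = s % d                   # each d-bit half is already rotated by m, so
                                    # its top m positions hold the wrapped bits
        upper = (s % (2 * d)) >= d  # shift amount at least half the subcircuit?
        out = x
        for base in range(0, n, 2 * d):
            for k in range(d):
                wrapped = k >= d - m
                # s < half: swap wrapped bits; s >= half: swap the others.
                swap = wrapped != upper
                lo, hi = base + k, base + k + d
                if swap and ((x >> lo) ^ (x >> hi)) & 1:
                    out ^= (1 << lo) | (1 << hi)
        x = out
        d *= 2
    return x
```

For the running example, `ibfly_rotr(x, 5, 8)` swaps every pair in stage 1, swaps only the wrapped pair of each 4-bit subcircuit in stage 2, and passes only the wrapped pair through in stage 3, matching the slides.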
Circuit Implementation of Rotation Control Bit Generator
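\[
cb_j = \bigl(f(s,j) \oplus s_{j-1}\bigr) \,\|\, s_{j-1}
\]
\[
f(s,j) =
\begin{cases}
\{\}, & j = 1 \\
\bigl(f(s,j-1) \lor s_{j-2}\bigr) \,\|\, s_{j-2} \,\|\, \bigl(f(s,j-1) \land s_{j-2}\bigr), & j \ge 2
\end{cases}
\]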
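One way a recursive control bit generator of this kind could be written is sketched below. Everything in it is an assumption for illustration: bit strings are Python lists read left to right from the highest pair of a subcircuit to the lowest, s_i denotes bit i of the shift amount, f(s, 1) is empty, f(s, j) is built as (f(s, j-1) OR s_{j-2}) || s_{j-2} || (f(s, j-1) AND s_{j-2}), and the stage-j control string is (f(s, j) XOR s_{j-1}) || s_{j-1}. The result is compared with the closed form implied by the swap/pass-through rule.

```python
def f(s, j):
    """Recursive helper string of length 2**(j-1) - 1 (assumed form)."""
    if j == 1:
        return []
    prev = f(s, j - 1)
    b = (s >> (j - 2)) & 1  # bit j-2 of the shift amount
    return [p | b for p in prev] + [b] + [p & b for p in prev]


def control_bits(s, j):
    """Stage-j control string, one bit per pair of a 2**j-bit subcircuit."""
    b = (s >> (j - 1)) & 1  # bit j-1 of the shift amount
    return [p ^ b for p in f(s, j)] + [b]


def control_bits_closed_form(s, j):
    """Same string written directly: the top (s mod 2**(j-1)) pairs are the
    wrapped ones; they swap when bit j-1 of s is 0 and pass through otherwise."""
    d = 1 << (j - 1)
    m = s % d
    b = (s >> (j - 1)) & 1
    return [(1 if i < m else 0) ^ b for i in range(d)]
```

For s = 5 this gives `[1]`, `[1, 0]` and `[0, 1, 1, 1]` for stages 1 to 3. The same string would be replicated across all subcircuits of a stage.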
Comparison to Barrel and Log Shifters

                                   Barrel    Log            IBFLY
# of Gates                         n^2       n × log4(n)    n × lg(n)
Control Lines                      n         lg(n)          n/2 × lg(n)
Gate Delay (of datapath)           1         log4(n)        lg(n)
Mux Width (Capacitance)            n         4              2
Relative Delay (Logical Effort)    1.16×     1              1.19×
Bit Manipulation Capabilities      basic     basic          basic + advanced
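To make the symbolic rows concrete, the expressions can be evaluated for a particular width. This sketch treats the table entries as plain formulas, which is an assumption (they may be intended as orders of growth rather than exact counts), and the function name `shifter_costs` is hypothetical.

```python
def shifter_costs(n):
    """Evaluate the gate, control-line and gate-delay expressions for width n."""
    lg = n.bit_length() - 1
    if n != 1 << lg or lg % 2:
        raise ValueError("n must be a power of 4 so that log4(n) is an integer")
    log4 = lg // 2
    return {
        "gates": {"barrel": n * n, "log": n * log4, "ibfly": n * lg},
        "control_lines": {"barrel": n, "log": lg, "ibfly": n // 2 * lg},
        "gate_delay": {"barrel": 1, "log": log4, "ibfly": lg},
    }
```

For n = 64 the gate expressions evaluate to 4096 (barrel), 192 (log) and 384 (IBFLY), and the control-line expressions to 64, 6 and 192.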
Summary and Conclusions
We proposed evolving the shifter to a new design using butterfly and inverse butterfly datapaths.
  The new shifter subsumes the basic shifter, the multimedia shift-permute unit and the advanced bit manipulation unit.
We have shown how to perform basic shifter operations on these datapaths:
  Rotation control bit generator
  Extra multiplexer stage for masking and merging
Use of the new shifter design in future microprocessor implementations allows for increased capabilities at only marginal cost.