Square Root Function- The Restoring Algorithm


Page 1: Square Root Function-  The Restoring Algorithm

Square Root Function- The Restoring Algorithm

VLSI–Lab project

Moran Amir Elior

Page 2: Square Root Function-  The Restoring Algorithm

Goals and needs

• The square root function performs the basic math operation f(A) = Q such that Q^2 = A.

• The root function is considered difficult to implement in hardware and requires an iterative process (or the use of a lookup table).

• We present a method that is exact (not an approximation). The results are Q and R such that:

Q^2 + R = A

Page 3: Square Root Function-  The Restoring Algorithm

Motivation

• The restoring method is based on a "binary" search over the range of possible results; the result has half as many bits as the input.

• At each step, the sign of the latest remainder is checked.

• If the remainder >= 0, we search in the upper domain; otherwise, we search in the lower domain.

• Since this is a square root, each step can divide the input by 4 rather than by 2.

Page 4: Square Root Function-  The Restoring Algorithm

The Restoring Algorithm

• Initial conditions:
> Let R (the remainder) equal A, the input.
> Let Q equal 0. Q = q1 … qn

• Iterative step (i is the index):
> if R>>2i >= {Q, 0, 1} then qj-1 = '1' ; R = R - {Q, 0, 1}
> if R>>2i < {Q, 0, 1} then qj-1 = '0' ; R = R

R and Q are best thought of as changing in width, bit by bit; in the actual hardware they are zero-padded on the left.

We compare R, which is initially the input, to the main terms of the square of Q (as was explained for the squaring function method): 2^6 a3, 2^4 a2, 2^2 a1, 2^0 a0 (4-bit example). If the term is larger than R, we append a zero to the result and keep the remainder; if it is smaller than or equal to R, we append a one to the result and subtract the term from the remainder, so that only the minor terms are left.
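The iterative step above can be sketched as a small software model. This is a minimal Python sketch under stated assumptions, not the project's implementation: the function name `restoring_sqrt` and the `n_bits` parameter are made up for illustration, the model uses one iteration per pair of input bits, and the subtraction is written with the term {Q, 0, 1} explicitly shifted to bit position 2i instead of relying on variable-width registers.

```python
def restoring_sqrt(a, n_bits=8):
    """Software model of the restoring square root (names are illustrative).

    Returns (q, r) such that q*q + r == a and 0 <= r <= 2*q.
    """
    if not 0 <= a < (1 << n_bits):
        raise ValueError("input does not fit in n_bits")
    steps = (n_bits + 1) // 2          # one result bit per pair of input bits
    q, r = 0, a                        # initial conditions: Q = 0, R = A
    for i in range(steps - 1, -1, -1):
        trial = (q << 2) | 1           # the term {Q, 0, 1}
        if (r >> (2 * i)) >= trial:
            r -= trial << (2 * i)      # subtraction is kept, new bit is 1
            q = (q << 1) | 1
        else:
            q <<= 1                    # R is left unchanged, new bit is 0
    return q, r
```

For a 4-bit input of 11, `restoring_sqrt(11, n_bits=4)` returns `(3, 2)`, which satisfies 3^2 + 2 = 11.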

Page 5: Square Root Function-  The Restoring Algorithm

Example – square root of 11
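Since the original figure for this slide did not survive, the example can be reproduced with a short trace. This is a hedged sketch: `trace_restoring_sqrt` is a hypothetical helper, and it assumes a 4-bit input, so two iterations (i = 1, then i = 0).

```python
def trace_restoring_sqrt(a, n_bits):
    """Return (q, r, rows); each row is (i, R >> 2i, {Q,0,1}, new bit, R after)."""
    steps = (n_bits + 1) // 2
    q, r = 0, a
    rows = []
    for i in range(steps - 1, -1, -1):
        trial = (q << 2) | 1               # the term {Q, 0, 1}
        shifted = r >> (2 * i)             # R >> 2i
        bit = 1 if shifted >= trial else 0
        if bit:
            r -= trial << (2 * i)          # subtract only when the test passes
        q = (q << 1) | bit
        rows.append((i, shifted, trial, bit, r))
    return q, r, rows


q, r, rows = trace_restoring_sqrt(11, 4)
for i, shifted, trial, bit, rem in rows:
    print(f"i={i}: R>>2i={shifted:b}  {{Q,0,1}}={trial:b}  bit={bit}  R={rem:b}")
print(f"Q={q}, R={r}")
```

For A = 11 (binary 1011) both tests pass, so both result bits are 1: the trace ends with Q = 3 and R = 2.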

Page 6: Square Root Function-  The Restoring Algorithm

Implementation issues

• The operations needed are:
> Subtraction
> Shifting

• We can use a simple data path for these operations.

• We can use multiple Conditional Subtraction (SC) units as well.

• For each of them, there are n/2+1 iterations.

Page 7: Square Root Function-  The Restoring Algorithm

Behavioral VHDL design
For Data Path implementation

Qj := "00000000";
R2j := D;
FOR j IN 4 DOWNTO 1 LOOP
    Shift8(Qj,j,'1',Q_t);
    Q_t(j+j-2) := '1';
    Subtract(R2j,Q_t, R_t, negative);
    IF (negative = '0') THEN
        Qj(j-1) := '1';
        R2j := R_t;
    ELSE
        Qj(j-1) := '0';
    END IF;
END LOOP;
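The loop above can be mirrored line by line in a software model, which is handy for checking expected values before simulation. This is a minimal Python sketch with loudly stated assumptions: the bodies of `Shift8` and `Subtract` are not shown in the slides, so the model assumes that `Shift8(Qj, j, ...)` shifts `Qj` left by `j` positions within 8 bits and that `Subtract` sets `negative` when `R2j - Q_t` is below zero. The name `datapath_sqrt8` is hypothetical.

```python
def datapath_sqrt8(d):
    """Model of the behavioral loop for an 8-bit input D (illustrative only)."""
    if not 0 <= d <= 0xFF:
        raise ValueError("D must be an 8-bit value")
    qj = 0                                  # Qj := "00000000"
    r2j = d                                 # R2j := D
    for j in range(4, 0, -1):               # FOR j IN 4 DOWNTO 1 LOOP
        q_t = (qj << j) & 0xFF              # assumed behaviour of Shift8
        q_t |= 1 << (2 * j - 2)             # Q_t(j+j-2) := '1'
        r_t = r2j - q_t                     # assumed behaviour of Subtract
        if r_t >= 0:                        # negative = '0'
            qj |= 1 << (j - 1)              # Qj(j-1) := '1'
            r2j = r_t
        # else: Qj(j-1) stays '0' and R2j keeps its previous value
    return qj, r2j
```

Under these assumptions `datapath_sqrt8(11)` returns `(3, 2)`, and the result satisfies Q^2 + R = D for every 8-bit input.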

Page 8: Square Root Function-  The Restoring Algorithm

Using a Data path

[Figure: data-path block diagram; labels: ALU, Q, R, multiplexer inputs 0/1, sign, load]

Page 9: Square Root Function-  The Restoring Algorithm

Using SC units
Again, the square root of 11 example

Page 10: Square Root Function-  The Restoring Algorithm

Considerations

• Design reuse: ALU already exists.
• Simplicity: SC units are easy to implement:

procedure SC ( signal CO, S : out Std_Logic ;
               signal R, D, CI, Q : in Std_Logic ) is
begin
    CO <= (R and D) or (R and CI) or (D and CI) ;
    S <= R xor ((D xor CI) and Q) ;
end SC ;

• Area: ~ same as ALU.
• Speed: ALU demands 4-5 cycles. The SC units can produce output much faster.
• Power: lower than ALU.
• ALU iteration count: q iterations
• SC unit count: 0.5*q^2 + 2.5*q - 1
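The SC equations and the unit-count formula can be checked with a small bit-level model. This is a sketch, assuming every signal is a plain 0/1 integer; the function names `sc_cell` and `sc_unit_count` are made up for illustration and are not part of the project's code.

```python
def sc_cell(r, d, ci, q):
    """Bit-level model of the SC procedure; all arguments are 0 or 1."""
    co = (r & d) | (r & ci) | (d & ci)   # CO: majority of R, D and CI
    s = r ^ ((d ^ ci) & q)               # S: R if Q = 0, R xor D xor CI if Q = 1
    return co, s


def sc_unit_count(q_bits):
    """SC unit count 0.5*q^2 + 2.5*q - 1, evaluated in integer arithmetic."""
    return (q_bits * q_bits + 5 * q_bits - 2) // 2
```

With Q = 0 the cell passes R through unchanged, which is the "restore" case; with Q = 1 it produces the full-adder sum of R, D and CI. For q = 4 result bits, `sc_unit_count(4)` gives 17, the number of SC units quoted for the square root extractor.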

Page 11: Square Root Function-  The Restoring Algorithm

Root function implementation

Page 12: Square Root Function-  The Restoring Algorithm

SC simple implementation

Page 13: Square Root Function-  The Restoring Algorithm

SC optimized implementation

Page 14: Square Root Function-  The Restoring Algorithm

Behavioral VHDL simulation

Page 15: Square Root Function-  The Restoring Algorithm

Behavioral VHDL simulation (Cont.)

Page 16: Square Root Function-  The Restoring Algorithm

Behavioral VHDL simulation (Cont.)

Page 17: Square Root Function-  The Restoring Algorithm

Results on Schematics

A   0   1   2   3   4   5   6   7
R   0   0   0   1   2   0   1   2
Q   0   0   1   1   1   2   2   2

Page 18: Square Root Function-  The Restoring Algorithm

Results on Schematics II

A   8   9   10  11  12  13  14  15
R   0   4   0   1   2   3   4   5
Q   0   2   3   3   3   3   3   3

Page 19: Square Root Function-  The Restoring Algorithm

Simulation results - Q

Page 20: Square Root Function-  The Restoring Algorithm

Simulation results - R

Page 21: Square Root Function-  The Restoring Algorithm

The SC unit maximal delay

1.62 ns SC max latency

Few transients with the maximal delays

[Plot: SC_sim_wResCap, voltage (V) vs. time (ns), 103.0 to 109.0 ns; signals v(D), v(Q), v(Cin), v(S), v(Cout); cursors x1 = 105.77 ns, x2 = 107.39 ns, dx = 1.62 ns]

[Plot: SC_sim_wResCap, voltage (V) vs. time (ns), 43.5 to 49.5 ns; signals v(Cout), v(S), v(Cin), v(Q), v(D); cursors x1 = 45.83 ns, x2 = 47.34 ns, dx = 1.51 ns]

Page 22: Square Root Function-  The Restoring Algorithm

Power
On 25 cycles

[Plot: noFF_sim_allCycles, power p(Vdd) (W) vs. time (us), 0.0 to 1.0 us]

The most power-consuming cycle is marked in red. 25 mW RMS

Page 23: Square Root Function-  The Restoring Algorithm

Transistor count & latency

• The SC unit:
> 34 MOS devices
> SC max latency ~ 2.5 ns (includes margin)

• The Square Root extractor:
> 17 SC units
> 17 * 34 = 578 MOS devices
> Circuit max latency: 15 x SC latency ~ 40 ns
> Max working frequency = 25 MHz
> RMS power on the most power-consuming cycle = 25 mW
> Highest power peak measured = 1 W
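The figures on this slide are related by simple arithmetic, which the following Python sketch reproduces. The constant names are assumptions chosen for illustration; the values are the ones quoted above. Note that 15 SC latencies at 2.5 ns come to 37.5 ns, so the quoted 40 ns is a rounded-up figure.

```python
SC_DEVICES = 34             # MOS devices per SC unit
SC_UNITS = 17               # SC units in the square root extractor
SC_LATENCY_NS = 2.5         # SC max latency, including margin
SC_LATENCY_FACTOR = 15      # the "15 x SC latency" factor from the slide
QUOTED_LATENCY_NS = 40.0    # circuit max latency as quoted on the slide

total_devices = SC_UNITS * SC_DEVICES               # 578 MOS devices
raw_latency_ns = SC_LATENCY_FACTOR * SC_LATENCY_NS  # 37.5 ns before rounding
max_freq_mhz = 1e3 / QUOTED_LATENCY_NS              # 1 / 40 ns, in MHz

print(total_devices, raw_latency_ns, max_freq_mhz)
```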

Page 24: Square Root Function-  The Restoring Algorithm

Performance evaluation

• Using the ALU scheme requires a minimum of 4 cycles => 400 ns

• The circuit improves speed by a factor of 10.
• The area is not much less than that of the ALU unit itself, excluding the peripheral logic we would have had to add.

Page 25: Square Root Function-  The Restoring Algorithm

Credits for pictures

• Alain Guyot's site for TIMA Laboratory
• http://tima-cmp.imag.fr/~guyot/Cours/Oparithm/english/Extrac.htm