
Kris Gaj

Office hours: Monday, 6:00-7:00 PM; Tuesday and Thursday, 7:30-8:30 PM; and by appointment

Research and teaching interests:
• cryptography
• computer arithmetic
• VLSI design and testing

Contact: Science & Technology II, room 223

[email protected]

(703) 993-1575

ECE 645

Part of:

MS in EE

MS in CpE

Digital Systems Design – pre-approved course
Other concentration areas – elective course

Certificate in VLSI Design/Manufacturing

PhD in IT

PhD in ECE

Spring 2007 Enrollment as of January 22, 2006

MS in CpE: 7

MS in EE: 4

PhD in CS: 1

Non-Degree: 4

My general area of interest is…

I want to specialize primarily in…

VLSI

Digital Systems Design

ASICs & FPGAs

VHDL/Verilog

CAD Tools

Reconfigurable Computing

Microelectronics

VLSI Fabrication

Nanoelectronics

CAD tools & Design Automation

Hardware description languages

FPGAs & Reconfigurable computing

Computer arithmetic

Front-end ASIC design (algorithmic down to gate level)

Back-end ASIC design (transistor down to device level)

Analog & Mixed Circuit Design

VLSI Fabrication

Micro- and Nanoelectronics

Semiconductor Devices

Recommended degree & concentration:

MS CpE: Digital Systems Design

MS EE: Microelectronics

Courses by level of design description (algorithmic, register-transfer, gate, transistor, layout, devices):

ECE 545 Introduction to VHDL
ECE 645 Computer Arithmetic
ECE 681 VLSI Design Automation
ECE 682 VLSI Test Concepts
ECE 586 Digital Integrated Circuits
ECE 587 Analog Integrated Circuits
ECE 699 Mixed Signals VLSI
ECE 584 Semiconductor Device Fundamentals
ECE 684 MOS Device Electronics
ECE 745 ULSI Microelectronics
ECE 699 Nanoelectronics

Legend: CpE core, EE core

MS CpE: DIGITAL SYSTEMS DESIGN

Concentration advisors: Kris Gaj, Ken Hintz, David Hwang

1. ECE 545 Introduction to VHDL – K. Gaj, D. Hwang, project, VHDL, Aldec/Synplicity/Xilinx and Synopsys Design Analyzer/PrimeTime

2. ECE 645 Computer Arithmetic: HW and SW Implementation – K. Gaj, project, VHDL, Aldec/Synplicity/Xilinx and Synopsys Design Analyzer/PrimeTime

3. ECE 681 VLSI Design Automation – T. Storey, project/lab, back-end design with Synopsys tools

4. ECE 586 Digital Integrated Circuits – D. Ioannou

Prerequisites

ECE 545 Introduction to VHDL

or

Permission of the instructor, granted assuming that you know
• VHDL or Verilog
• High level programming language (preferably C)

Course web page

ECE web page → Courses → Course web pages → ECE 645

http://teal.gmu.edu/courses/ECE645/index.htm

Computer Arithmetic

Lecture
• Homework 10 %
• Midterm exam 1 (in class) 20 %
• Midterm exam 2 (take-home) 20 %

Project
• Project 1 20 %
• Project 2 30 %

Advanced digital circuit design course covering

Efficient
• addition and subtraction
• multiplication
• division and modular reduction
• exponentiation

of

Integers
• unsigned and signed

Real numbers
• fixed point
• single and double precision floating point

Elements of the Galois field GF(2^n)
• polynomial base

Lecture topics

INTRODUCTION
1. Applications of computer arithmetic algorithms
2. Number representation
• Unsigned Integers
• Signed Integers
• Fixed-point real numbers
• Floating-point real numbers
• Elements of the Galois Field GF(2^n)

ADDITION AND SUBTRACTION
1. Basic addition, subtraction, and counting
2. Carry-lookahead, carry-select, and hybrid adders
3. Adders based on Parallel Prefix Networks

MULTIOPERAND ADDITION
1. Carry-save adders
2. Wallace and Dadda Trees
3. Adding multiple signed numbers

MULTIPLICATION
1. Tree and array multipliers
2. Sequential multipliers
3. Multiplication of signed numbers and squaring

DIVISION
1. Basic restoring and non-restoring sequential dividers
2. SRT and high-radix dividers
3. Array dividers

LONG INTEGER ARITHMETIC
1. Modular Exponentiation
2. Multi-Precision Arithmetic in Software

FLOATING POINT AND GALOIS FIELD ARITHMETIC
1. Floating-point units
2. Galois Field GF(2^n) units

Similar courses at other universities

• University of California, Santa Barbara, Behrooz Parhami, ECE252B: Computer Arithmetic.

• University of Massachusetts, Amherst, Israel Koren, ECE666: Digital Computer Arithmetic.

• Lehigh University, Michael Schulte, ECE496: High-Speed Computer Arithmetic.

• Worcester Polytechnic Institute, Berk Sunar, EE-579 V Computer Arithmetic Circuits.

• Stanford University, Michael Flynn, EE486: Advanced Computer Arithmetic.

• University of California, Davis, Vojin Oklobdzija, ECE278: Computer Arithmetic for Digital Implementation.

New in this course

• real-life project based on VHDL or Verilog HDL

• operations in the Galois Field (with application in cryptography and communications)

Possible topics for a Scholarly Paper or Research Project for the CpE & EE students

Advanced Computer Arithmetic
• Square root
• Exponential and logarithmic functions
• Trigonometric functions
• Hyperbolic functions

• Fault-Tolerant Arithmetic
• Low-Power Arithmetic
• High-Throughput Arithmetic

Literature (1)

Required textbook:

Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.

Recommended textbooks:

Milos D. Ercegovac and Tomas Lang, Digital Arithmetic, Morgan Kaufmann Publishers, 2004.

Israel Koren, Computer Arithmetic Algorithms, 2nd edition, A. K. Peters, Natick, MA, 2002.

Literature (2)

VHDL books (used in ECE 545 in Fall 2005)

1. Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998.

2. Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004.

Literature (3)

Supplementary books:

1. E. E. Swartzlander, Jr., Computer Arithmetic, vols. I and II, IEEE Computer Society Press, 1990.
2. Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography, Chapter 14, Efficient Implementation, CRC Press, Inc., 1998.
3. Christof Paar, Efficient VLSI Architectures for Bit Parallel Computation in Galois Fields, VDI Verlag, 1994.

Literature (3)

Proceedings of conferences
• ARITH - International Symposium on Computer Arithmetic
• ASIL - Asilomar Conference on Signals, Systems, and Computers
• ICCD - International Conference on Computer Design
• CHES - Workshop on Cryptographic Hardware and Embedded Systems

Journals and periodicals
• IEEE Transactions on Computers, in particular special issues on computer arithmetic: 8/70, 6/73, 7/77, 4/83, 8/90, 8/92, 8/94, 7/00, 3/05.
• IEEE Transactions on Circuits and Systems
• IEEE Transactions on Very Large Scale Integration
• IEE Proceedings: Computer and Digital Techniques
• Journal of VLSI Signal Processing

Homework

• reading assignments (main textbook + articles)

• analysis of hardware and software algorithms and implementations

• design of small hardware units using VHDL or Verilog

Optional assignments

Possibility of trading

analysis vs. design vs. coding

Midterm exams

Exam 1 - 2 hrs 30 minutes, in class: multiple choice + short problems

Exam 2 - 48 hrs, take-home: conceptual questions, analysis and design of arithmetic units using VHDL or Verilog HDL

Practice exams on the web

Tentative days of exams:
Exam 1 - Monday, March 26
Exam 2 - Saturday-Sunday, May 5-6

Project (1)

Project I (20% of grade)

Design and comparative analysis of fast adders (several hundred bits long)

Optimization criteria:
• minimum latency
• maximum throughput
• minimum area
• minimum product latency · area
• maximum ratio throughput/area
• scalability

Similar for all students
Done individually

Final report due: Monday, March 19

Project (2)

Project II (30% of grade)

Fast
• multiplication
• squaring
• division
• modular reduction, or
• modular exponentiation
of long unsigned or signed integers

or

Fast
• addition or
• multiplication
of floating-point numbers

Written report & oral presentation: Monday, May 14

Project II (rules)

• Real life application
• Requirements derived from the analysis of the application
• Typically both hardware and software design
• Several project topics proposed on the web
• You can choose project topic by yourself
• Can be done in a group of 1-3 students
• Cooperation (but not exchange of codes) between teams is encouraged
• Every team works on a slightly different problem
• Project topics should be more complex for larger teams

Project

Hardware
• VHDL (or Verilog) code
• Latency and/or throughput
• Area
• Scalability

Software
• High level language (C preferred)
• Execution time
• Memory requirements
• Scalability

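Degrees of freedom and possible trade-offs

[Diagram: trade-offs among speed, area, power, and testability, annotated with ECE 645, ECE 682, and ECE 586, 681]

Degrees of freedom and possible trade-offs

[Diagram: trade-off between speed and area, with speed split into latency and throughput]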

Timing parameters

parameter: definition; units; effect of pipelining
latency: time from input to output; ns; bad
throughput: number of output bits per time unit; Mbits/s; good
delay: time from point to point; ns
clock period: time from rising edge to rising edge of clock; ns; good
clock frequency: 1 / clock period; MHz; good
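The effect of pipelining on these timing parameters can be illustrated with a short calculation. This is only a sketch: the 64-bit unit, the 40 ns combinational delay, the four pipeline stages, and the 12 ns pipelined clock period are hypothetical numbers chosen for illustration, not figures from the course.

```python
def clock_frequency_mhz(clock_period_ns):
    """Clock frequency = 1 / clock period (period in ns, result in MHz)."""
    return 1000.0 / clock_period_ns


def latency_ns(clock_period_ns, cycles_from_input_to_output):
    """Latency = time from input to output."""
    return clock_period_ns * cycles_from_input_to_output


def throughput_mbits_per_s(bits_per_result, clock_period_ns, cycles_between_results=1):
    """Throughput = number of output bits per time unit (bits/ns * 1000 = Mbits/s)."""
    return bits_per_result * 1000.0 / (clock_period_ns * cycles_between_results)


# Hypothetical non-pipelined 64-bit unit: one 40 ns clock cycle per result.
plain = {
    "latency_ns": latency_ns(40.0, 1),
    "throughput": throughput_mbits_per_s(64, 40.0),
    "frequency_mhz": clock_frequency_mhz(40.0),
}

# Hypothetical 4-stage pipelined version: 12 ns clock (10 ns of logic per stage
# plus an assumed 2 ns of register overhead), one new result every cycle.
pipelined = {
    "latency_ns": latency_ns(12.0, 4),
    "throughput": throughput_mbits_per_s(64, 12.0),
    "frequency_mhz": clock_frequency_mhz(12.0),
}

for name, unit in (("non-pipelined", plain), ("pipelined", pipelined)):
    print(name, unit)
```

Under these assumptions the pipelined unit has a longer latency (48 ns instead of 40 ns) but a shorter clock period and a higher throughput, which matches the bad/good entries in the pipelining column.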

Project technologies

semi-custom Application Specific Integrated Circuits and Field Programmable Gate Arrays

Levels of design description

Algorithmic level

Register Transfer Level

Logic (gate) level

Circuit (transistor) level

Physical (layout) level

Level of description

most suitable for synthesis

Register Transfer Logic (RTL) Design Description

Combinational Logic

Combinational Logic

…

Clock

Registers

CAD software available at GMU (1)

VHDL simulators

• Aldec Active-HDL (under Windows)
  • available in the FPGA Lab, S&T II, room 203
  • student edition can be purchased on an individual basis ($59.95 + S&H)
    http://www.aldec.com/education/students/

• ModelSim Xilinx Edition III (under Windows)
  • available in the FPGA Lab, S&T II, room 203
  • limited version available for free for individual use at home as a part of Xilinx WebPACK


Tools used for logic synthesis

FPGA synthesis

• Synplicity Synplify Pro (under Windows)

• Xilinx XST (under Windows)
  • available in the FPGA Lab, S&T II, room 203
  • available for free as a part of WebPACK

ASIC synthesis

• Synopsys Design Compiler and PrimeTime (under Unix)
  • available from all PCs in the ECE educational labs using an X-terminal emulator
  • available remotely from home using a fast Internet connection


Tools used for implementation (mapping, placing & routing) in the FPGA technology

• Xilinx ISE (under Windows)
  • available in the FPGA Lab, S&T II, room 203

• Xilinx WebPACK (under Windows)
  • limited version available for free for individual use at home

How to learn VHDL for synthesis by yourself?

• Lecture slides for ECE 545 from Fall 2005

• Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998.

• Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004.

• Individual or small-group hands-on sessions with the TA

• Practice, Practice, Practice!!!

Testbench

testbench

design entity

Architecture 1 Architecture 2 Architecture N. . . .

Non-synthesizable

Synthesizable

Hardware Design Verification

[Diagram: representative inputs feed both the HDL design (VHDL or Verilog) and a reference model (C or MAGMA); the testbench compares the actual results of the HDL design with the expected results of the reference model.]
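The verification flow above (representative inputs, a reference model producing expected results, and a comparison against actual results) can be sketched in software. The slides name C or MAGMA for the reference model; Python is used here only to keep the sketch short. The function names, the choice of an unsigned adder as the unit under test, and the way actual results are supplied are all hypothetical.

```python
import random


def reference_add(a, b, width):
    """Reference model of an unsigned adder: (sum modulo 2**width, carry out)."""
    total = a + b
    return total & ((1 << width) - 1), total >> width


def make_vectors(width, count, seed=0):
    """Representative inputs: a few corner cases plus reproducible random pairs."""
    rng = random.Random(seed)
    top = (1 << width) - 1
    corners = [(0, 0), (top, 1), (1, top), (top, top)]
    randoms = [(rng.randint(0, top), rng.randint(0, top)) for _ in range(count)]
    return corners + randoms


def compare(width, vectors, actual_results):
    """Return a list of (a, b, expected, actual) for every mismatch."""
    mismatches = []
    for (a, b), actual in zip(vectors, actual_results):
        expected = reference_add(a, b, width)
        if actual != expected:
            mismatches.append((a, b, expected, actual))
    return mismatches


# Stand-in for results read back from an HDL simulation (assumption: in a real
# flow these would come from the simulator, not from the reference model).
vectors = make_vectors(8, 100)
simulated = [reference_add(a, b, 8) for a, b in vectors]
simulated[1] = (1, 0)  # inject one deliberate fault to show a reported mismatch
print(compare(8, vectors, simulated))
```

In an actual project the same vectors would typically be written to a file read by the VHDL or Verilog testbench, so that the HDL design and the reference model see identical inputs.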

Primary applications (1)

Execution units of general purpose microprocessors
• Integer units: Integers (8, 16, 32, 64 bits)
• Floating point units: Real numbers (32, 64 bits)

Digital signal and digital image processing
• Real or complex numbers (fixed-point or floating point)
• e.g., digital filters, Discrete Fourier Transform, Discrete Hilbert Transform
• General purpose DSP processors
• Specialized circuits

Coding
• Elements of the Galois fields GF(2^n) (4-64 bits)
• Error detection codes
• Error correcting codes

Secret-key (Symmetric) Cryptosystems

key of Alice and Bob - $K_{AB}$ (held by both Alice and Bob)

Alice Bob

Network

Encryption Decryption


Cryptography

Secret key cryptography
• Integers (16, 32 bits): IDEA, RC6, MARS
• Elements of the Galois field GF(2^n) (4, 8 bits): Twofish, Rijndael

Cipher: main operations; auxiliary operations
MARS: MUL32, 2 x ROL32, S-box 9x32; XOR, ADD/SUB32
RC6: 2 x SQR32, 2 x ROL32; XOR, ADD/SUB32
Twofish: 96 S-box 4x4, 24 MUL GF(2^8); XOR, ADD32
Serpent: 8 x 32 S-box 4x4; XOR
Rijndael: 16 S-box 8x8, 24 MUL GF(2^8); XOR

Public Key (Asymmetric) Cryptosystems

Public key of Bob - $K_B$
Private key of Bob - $k_B$

Alice Bob

Network

Encryption Decryption

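RSA as a trap-door one-way function

Encryption (PUBLIC KEY): $M \rightarrow C = f(M) = M^e \bmod N$

Decryption (PRIVATE KEY): $C \rightarrow M = f^{-1}(C) = C^d \bmod N$

$N = P \cdot Q$, where $P$, $Q$ are large prime numbers

$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$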
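The RSA relations above can be checked numerically with a toy key. The primes P = 61 and Q = 53, the exponent e = 17, and the message M = 65 are illustrative assumptions (far too small for real use), and `mod_exp` is a hypothetical helper showing the left-to-right binary (square-and-multiply) method, one common way to compute a modular exponentiation.

```python
def mod_exp(base, exponent, modulus):
    """Left-to-right binary (square-and-multiply) modular exponentiation."""
    result = 1
    base %= modulus
    for bit in bin(exponent)[2:]:      # exponent bits, most significant first
        result = (result * result) % modulus
        if bit == "1":
            result = (result * base) % modulus
    return result


# Toy parameters, chosen only for illustration.
P, Q = 61, 53
N = P * Q                      # public modulus
phi = (P - 1) * (Q - 1)
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent: e*d = 1 mod phi (Python 3.8+)

M = 65                         # message, must satisfy 0 <= M < N
C = mod_exp(M, e, N)           # encryption with the public key {e, N}
recovered = mod_exp(C, d, N)   # decryption with the private key

print(N, d, C, recovered)
```

Real RSA operands are 1000 or more bits long, which is why fast long-integer modular exponentiation matters both in hardware and in software.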

RSA keys

PUBLIC KEY: $\{e, N\}$

PRIVATE KEY: $\{d, P, Q\}$

$N = P \cdot Q$, where $P$, $Q$ are large prime numbers

$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$


Cryptography

Public key cryptography
• Long integers (1000-16,000 bits): RSA, DSA, Diffie-Hellman
• Elements of the Galois field GF(2^n) (150-500 bits): Elliptic Curve Cryptosystems


Cipher Breaking

Public key cryptography

RSA PUBLIC KEY: $\{e, N\}$

RSA PRIVATE KEY: $\{d, P, Q\}$

$N = P \cdot Q$, where $P$, $Q$ are large prime numbers

$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$

Estimate by RSA Security Inc. of the number and memory of PCs necessary to break RSA-1024

Attack time: 1 year

Single machine: PC, 500 MHz, 170 GB RAM

Number of machines: 342,000,000

Factoring 1024-bit RSA keys using Number Field Sieve (NFS)

Steps:
• Polynomial Selection
• Relation Collection: Sieving, Norm Factoring/Cofactoring
• Linear Algebra
• Square Root

Norm Factoring/Cofactoring: 200 bit & 350 bit numbers; ECM, p-1 method, Pollard rho

Sashisu Bajracharya, Ramakrishna Bachimanchi

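Comparison among technologies

Microprocessors, FPGAs, ASICs

SRC, COPACOBANA

FPGAs vs. Microprocessors

[Bar chart: values for the rho, p-1, and ECM methods on three platforms]
rho: Virtex2v6000 869; Spartan3s5000 637; Pentium4 2.8GHz 80
p-1: Virtex2v6000 857; Spartan3s5000 635; Pentium4 2.8GHz 76
ECM: Virtex2v6000 435; Spartan3s5000 315; Pentium4 2.8GHz 40
Speed-up labels shown in the chart: 10.8x, 7.8x, 11.3x, 8.4x, 10.8x, 7.9x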

Rho in an ASIC 130 nm

[Layout figure labels: Local Memory, Global Memory]

ASIC 130 nm vs. Virtex II 6000 – rho (24 units)

Area of Virtex II 6000 (estimation by R.J. Lim Fong, MS Thesis, VPI, 2004): 19.80 mm x 19.68 mm

Area of an ASIC with equivalent functionality: 2.7 mm x 2.82 mm

Area ratio: 51x

Number of computations per second using the same chip area

rho: Virtex2v6000 FPGA 869; 130 nm ASIC library 88,405 (101x)
ECM: Virtex2v6000 FPGA 435; 130 nm ASIC library 21,739 (50x)

Cofactorization Unit

interesting Computer Arithmetic project

Famous computer arithmetic bugs and flaws

Learn to deal with approximations

• In digital arithmetic one has to come to grips with approximation and questions like:
– When is approximation good enough
– What margin of error is acceptable

• Be aware of the applications you are designing the arithmetic circuit or program for

• Analyze the implications of your approximation

Calculators

$u = \sqrt{\sqrt{\cdots\sqrt{2}}}$ (square root taken 10 times) $= 1.000\,677\,131$

$v = 2^{1/1024} = 1.000\,677\,131$

$x = (((u^2)^2)\cdots)^2$ (squared 10 times) $= 1.999\,999\,963$

$x' = u^{1024} = 1.999\,999\,973$

$y = (((v^2)^2)\cdots)^2$ (squared 10 times) $= 1.999\,999\,983$

$y' = v^{1024} = 1.999\,999\,994$

Hidden digits in the internal representation of numbers
Different algorithms give slightly different results

Very good accuracy
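The calculator experiment above can be reproduced approximately in software. The sketch below assumes a machine that keeps 10 significant decimal digits and uses Python's `decimal` module to emulate it; a real calculator may keep hidden digits and use different algorithms, so the digits printed here need not match the values quoted above.

```python
from decimal import Decimal, getcontext

getcontext().prec = 10          # assumption: 10 significant decimal digits

two = Decimal(2)

# u: take the square root of 2 ten times.
u = two
for _ in range(10):
    u = u.sqrt()

# v: compute 2^(1/1024) directly.
v = two ** (Decimal(1) / Decimal(1024))


def square_ten_times(value):
    """Raise value to the power 1024 by ten successive squarings."""
    for _ in range(10):
        value = value * value
    return value


x = square_ten_times(u)         # (((u^2)^2)...)^2
x_prime = u ** 1024             # u^1024 via the library power function
y = square_ten_times(v)
y_prime = v ** 1024

for name, value in (("u", u), ("v", v), ("x", x), ("x'", x_prime),
                    ("y", y), ("y'", y_prime)):
    print(name, "=", value)
```

All four results should land close to 2 without being exactly 2, and the two ways of raising to the power 1024 may differ in their last digits, which is the point of the slide.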

Consequences of bad approximations

Example: Failure of Patriot Missile (1991 Feb. 25)

Source http://www.math.psu.edu/dna/455.f96/disasters.html

American Patriot Missile battery in Dhahran, Saudi Arabia, failed to intercept incoming Iraqi Scud missile. The Scud struck an American Army barracks, killing 28.

Cause, per GAO/IMTEC-92-26 report: “software problem” (inaccurate calculation of the time since boot)

Specifics of the problem: time in tenths of second as measured by the system’s internal clock was multiplied by 1/10 to get the time in seconds. Internal registers were 24 bits wide.

$1/10 = (0.0001\,1001\,1001\,1001\,1001\,100)_2$ (chopped to 24 b)

$\text{Error} \approx (0.1100\,1100)_2 \times 2^{-23} \approx 9.5 \times 10^{-8}$

Error in 100-hr operation period $\approx 9.5 \times 10^{-8} \times 100 \times 60 \times 60 \times 10 = 0.34$ s

Distance traveled by Scud $= (0.34\ \text{s}) \times (1676\ \text{m/s}) \approx 570$ m

This put the Scud outside the Patriot’s “range gate”. Ironically, the fact that the bad time calculation had been improved in some (but not all) code parts contributed to the problem, since it meant that inaccuracies did not cancel out.
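The Patriot numbers above can be re-derived with exact rational arithmetic. This is a sketch under one assumption: the constant 1/10 is chopped to 23 fractional bits, which is the number of fractional bits in the binary expansion shown above. The variable names are illustrative.

```python
from fractions import Fraction

FRACTION_BITS = 23                       # assumption: fractional bits kept after chopping

exact_tenth = Fraction(1, 10)
# Chop (truncate) 1/10 to FRACTION_BITS binary places.
chopped_tenth = Fraction(int(exact_tenth * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = exact_tenth - chopped_tenth

ticks_in_100_hours = 100 * 60 * 60 * 10  # the clock counts tenths of a second
time_error_s = error_per_tick * ticks_in_100_hours

scud_speed_m_per_s = 1676
distance_error_m = time_error_s * scud_speed_m_per_s

print(f"error per tick : {float(error_per_tick):.3e}")
print(f"time error     : {float(time_error_s):.3f} s")
print(f"distance error : {float(distance_error_m):.0f} m")
```

With unrounded intermediate values the distance comes out a few meters above the 570 m quoted above, because the slide rounds the time error to 0.34 s before multiplying.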

Example: Explosion of Ariane Rocket (1996 June 4)

Source http://www.math.psu.edu/dna/455.f96/disasters.html

Unmanned Ariane 5 rocket launched by the European Space Agency veered off its flight path, broke up, and exploded only 30 seconds after lift-off (altitude of 3700 m)

The $500 million rocket (with cargo) was on its 1st voyage after a decade of development costing $7 billion

Cause: “software error in the inertial reference system”

Specifics of the problem: a 64 bit floating point number relating to the horizontal velocity of the rocket was being converted to a 16 bit signed integer

An SRI (inertial reference system) software exception arose during conversion because the 64-bit floating point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767)
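The conversion problem described above is easy to reproduce. The sketch below is not the Ariane flight code; the function names and the sample value 40000.0 are hypothetical, chosen only to show what happens when a 64-bit floating-point value exceeds the range of a 16-bit signed integer, and how a checked or saturating conversion behaves.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value):
    """Convert to a 16-bit signed integer, raising if the value does not fit."""
    n = int(value)                       # truncate toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return n


def to_int16_wrapping(value):
    """What an unchecked conversion that keeps only the low 16 bits would yield."""
    return (int(value) + 2**15) % 2**16 - 2**15


def to_int16_saturating(value):
    """Clamp to the nearest representable 16-bit value instead of failing."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


horizontal_velocity = 40000.0            # hypothetical out-of-range value

try:
    to_int16_checked(horizontal_velocity)
except OverflowError as exc:
    print("exception:", exc)

print("wrapping  :", to_int16_wrapping(horizontal_velocity))
print("saturating:", to_int16_saturating(horizontal_velocity))
```

Which of the three behaviours is appropriate depends on the application; the point is that the range of the input has to be analyzed before the narrower format is chosen.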


Pentium bug (1)

October 1994: Thomas Nicely, Lynchburg College, Virginia, finds an error in his computer calculations, and traces it back to the Pentium processor

November 7, 1994: First press announcement, Electronic Engineering Times

Late 1994: Tim Coe, Vitesse Semiconductor, presents an example with the worst-case error

c = 4 195 835/3 145 727

Pentium = 1.333 739 06...
Correct result = 1.333 820 44...
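The size of the error in Tim Coe's example above can be quantified with a few lines of exact arithmetic. This sketch only uses the two numbers quoted on the slide; the variable names are illustrative, and the flawed result is taken to the eight decimal places shown.

```python
from fractions import Fraction

exact = Fraction(4195835, 3145727)           # the correct quotient, as a rational
pentium_reported = Fraction("1.33373906")    # flawed result, digits as quoted above

absolute_error = exact - pentium_reported
relative_error = absolute_error / exact

print(f"correct quotient : {float(exact):.8f}")
print(f"absolute error   : {float(absolute_error):.3e}")
print(f"relative error   : {float(relative_error):.3e}")
```

A relative error of about 6 parts in 100,000 means the flawed result is wrong from the fifth significant digit onward, far worse than the roughly 16 digits expected from double-precision division.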

Pentium bug (2)

Intel admits “subtle flaw”

November 30, 1994: Intel’s white paper about the bug and its possible consequences

Intel - average spreadsheet user affected once in 27,000 years
IBM - average spreadsheet user affected once every 24 days

Replacements based on customer needs

December 20, 1994: Announcement of no-question-asked replacements

Pentium bug (3)

Error traced back to the look-up table used by the radix-4 SRT division algorithm

2048 cells, 1066 non-zero values {-2, -1, 1, 2}

5 non-zero values not downloaded correctly to the lookup table due to an error in the C script
