331 Week 13.1Spring 2006 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13...

32
331 Week 13.1 Spring 2006 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Buses and I/O system [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]
  • date post

    20-Dec-2015
  • Category

    Documents

  • view

    218
  • download

    0

Transcript of 331 Week 13.1Spring 2006 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13...

331 Week 13.1 Spring 2006

14:332:331Computer Architecture and Assembly Language

Spring 2006

Week 13Buses and I/O system

[Adapted from Dave Patterson’s UCB CS152 slides and

Mary Jane Irwin’s PSU CSE331 slides]

331 Week 13.2 Spring 2006

Head’s Up This week’s material

Buses: Connecting I/O devices- Reading assignment – PH 8.4

Memory hierarchies- Reading assignment – PH 7.1 and B.8-9

Reminders Next week’s material Basics of caches

- Reading assignment – PH 7.2

331 Week 13.3 Spring 2006

Review: Major Components of a Computer

Processor

Control

Datapath

Memory

Devices

Input

Output

Cach

e

Main

M

emo

ry

Seco

nd

ary M

emo

ry(D

isk)

331 Week 13.4 Spring 2006

Input and Output Devices

I/O devices are incredibly diverse wrt Behavior Partner Data rate

Device Behavior Partner Data rate (KB/sec)Keyboard input human 0.01

Mouse input human 0.02

Laser printer output human 200.00

Graphics display output human 60,000.00

Network/LAN input or output

machine 500.00-6000.00

Floppy disk storage machine 100.00

Magnetic disk storage machine 2000.00-10,000.00

331 Week 13.5 Spring 2006

Magnetic Disk

Purpose Long term, nonvolatile storage Lowest level in the memory hierarchy

- slow, large, inexpensive

General structure A rotating platter coated with a magnetic surface

Use a moveable read/write head to access the disk

Advantages of hard disks over floppy disks Platters are more rigid (metal or glass) so they can be larger Higher density because it can be controlled more precisely Higher data rate because it spins faster Can incorporate more than one platter

331 Week 13.6 Spring 2006

Organization of a Magnetic Disk

Typical numbers (depending on the disk size) 1 to 15 (2 surface) platters per disk with 1” to 8” diameter 1,000 to 5,000 tracks per surface 63 to 256 sectors per track

- the smallest unit that can be read/written (typically 512 to 1,024 B)

Traditionally all tracks have the same number of sectors- Newer disks with smart controllers can record more sectors on

the outer tracks (constant bit density)

Platters

Track

Sector

331 Week 13.7 Spring 2006

Magnetic Disk Characteristic Cylinder: all the tracks under the heads

at a given point on all surfaces

Read/write data is a three-stage process: Seek time: position the arm over the

proper track (6 to 14 ms avg.)- due to locality of disk references

the actual average seek time may be only 25% to 33% of the advertised number

Rotational latency: wait for the desired sectorto rotate under the read/write head (½ of 1/RPM)

Transfer time: transfer a block of bits (sector)under the read-write head (2 to 20 MB/sec typical)

Controller time: the overhead the disk controller imposes in performing an disk I/O access (typically < 2 ms)

SectorTrack

Cylinder

HeadPlatter

331 Week 13.8 Spring 2006

Magnetic Disk Examples

Characteristic Sun X6713A Toshiba MK2016Disk diameter (inches) 3.5 2.5

Capacity 73 GB 20 GB

MTTF (k hr’s) 1,200 300

# of platters - heads 2 - 4

# cylinders 16,383

# B/sector - # sectors/track 512 - 63

Rotation speed (RPM) 10,000 4,200

Max. - Avg. seek time (ms) ? - 6.6 24 - 13

Avg. rot. latency (ms) 3 7.14

Transfer rate (PIO) 35 MB/sec 16.6 MB/sec

Power (watts) < 2.5

Volume (in3) 4.01

Weight (oz) 3.49

331 Week 13.9 Spring 2006

I/O System Interconnect Issues

A bus is a shared communication link (a set of wires used to connect multiple subsystems)

Performance Expandability Resilience in the face of failure – fault tolerance

Processor

ReceiverMainMemory

Keyboard

bus

331 Week 13.10 Spring 2006

Performance Measures

Latency (execution time, response time) is the total time from the start to finish of one instruction or action

usually used to measure processor performance

Throughput – total amount of work done in a given amount of time

aka execution bandwidth the number of operations performed per second

Bandwidth – amount of information communicated across an interconnect (e.g., a bus) per unit time

the bit width of the operation * rate of the operation usually used to measure I/O performance

331 Week 13.11 Spring 2006

I/O System Expandability

Cache Memory

Memory - I/O Bus

MainMemory

I/OController

Disk Disk

I/OController

I/OController

Terminal Network

interrupt signals

Usually have more than one I/O device in the system each I/O device is controlled by an I/O Controller

Processor

331 Week 13.12 Spring 2006

Bus Characteristics

Control lines Signal requests and acknowledgments Indicate what type of information is on the data lines

Data lines Data, complex commands, and addresses

Bus transaction consists of Sending the address Receiving (or sending) the data

Data Lines

Control Lines

331 Week 13.13 Spring 2006

Output (Read) Bus Transaction Defined by what they do to memory

read = output: transfers data from memory (read) to I/O device (write)

Processor Main Memory

Control

Data

Step 1: Processor sends read request and read address to memory

Processor

Control

DataMain Memory

Step 2: Memory accesses data

Processor

Control

DataMain Memory

Step 3: Memory transfers data to disk

331 Week 13.14 Spring 2006

Input (Write) Bus Transaction Defined by what they do to memory

write = input: transfers data from I/O device (read) to memory (write)

Processor Main Memory

Control

Data

Step 1: Processor sends write request and write address to memory

Processor

Control

DataMain Memory

Step 2: Disk transfers data to memory

331 Week 13.15 Spring 2006

Advantages and Disadvantages of Buses Advantages

Versatility:- New devices can be added easily

- Peripherals can be moved between computer systems that use the same bus standard

Low Cost:- A single set of wires is shared in multiple ways

Disadvantages It creates a communication bottleneck

- The bus bandwidth limits the maximum I/O throughput

The maximum bus speed is largely limited by- The length of the bus

- The number of devices on the bus

It needs to support a range of devices with widely varying latencies and data transfer rates

331 Week 13.16 Spring 2006

Types of Buses Processor-Memory Bus (proprietary)

Short and high speed Matched to the memory system to maximize the memory-

processor bandwidth Optimized for cache block transfers

I/O Bus (industry standard, e.g., SCSI, USB, ISA, IDE) Usually is lengthy and slower Needs to accommodate a wide range of I/O devices Connects to the processor-memory bus or backplane bus

Backplane Bus (industry standard, e.g., PCI) The backplane is an interconnection structure within the

chassis Used as an intermediary bus connecting I/O busses to the

processor-memory bus

331 Week 13.17 Spring 2006

A Two Bus System

I/O buses tap into the processor-memory bus via Bus Adaptors (that do speed matching between buses)

Processor-memory bus: mainly for processor-memory traffic

I/O busses: provide expansion slots for I/O devices

Processor Memory

Processor-Memory Bus

I/OBus

BusAdaptor

BusAdaptor

BusAdaptor

I/OBus

I/OBus

331 Week 13.18 Spring 2006

A Three Bus System

A small number of Backplane Buses tap into the Processor-Memory Bus

Processor-Memory Bus is used for processor memory traffic I/O buses are connected to the Backplane Bus

Advantage: loading on the Processor-Memory Bus is greatly reduced

Processor Memory

Processor-Memory Bus

BusAdaptor

Backplane Bus

BusAdaptor

BusAdaptor

I/O Bus

I/O Bus

331 Week 13.19 Spring 2006

I/O System Example (Apple Mac 7200)

Cache Memory

PCI

MainMemory

I/OController

I/OController

Graphic Terminal

Network

Processor

Typical of midrange to high-end desktop system in 1997

PCIInterface/Memory

ControllerI/O

ControllerI/O

Controller

SC

SI

bu

s

Disk

CDRom

Tape

Processor-Memory Bus

Serial portsAudio I/O

331 Week 13.20 Spring 2006

Example: Pentium System Organization

Processor-MemoryBus

PCI Bus

I/O Busses

Memory controller(“Northbridge”)

http://developer.intel.com/design/chipsets/850/animate.htm?iid=PCG+devside&

331 Week 13.21 Spring 2006

Synchronous and Asynchronous Buses Synchronous Bus

Includes a clock in the control lines A fixed protocol for communication that is relative to the clock Advantage: involves very little logic and can run very fast Disadvantages:

- Every device on the bus must run at the same clock rate

- To avoid clock skew, they cannot be long if they are fast

Asynchronous Bus It is not clocked, so requires handshaking protocol (req, ack)

- Implemented with additional control lines

Advantages:- Can accommodate a wide range of devices

- Can be lengthened without worrying about clock skew or synchronization problems

Disadvantage: slow(er)

331 Week 13.22 Spring 2006

Asynchronous Handshaking Protocol

1. Memory sees ReadReq, reads addr from data lines, and raises Ack

2. I/O device sees Ack and releases the ReadReq and data lines

3. Memory sees ReadReq go low and drops Ack

4. When memory has data ready, it places it on data lines and raises DataRdy

5. I/O device sees DataRdy, reads the data from data lines, and raises Ack

6. Memory sees Ack, releases the data lines, and drops DataRdy

7. I/O device sees DataRdy go low and drops Ack

Output (read) data from memory to an I/O device.

I/O device signals a request by raising ReadReq and putting the addr on the data lines

12

3

ReadReq

Data

Ack

DataRdy

addr data

4

5

6

7

331 Week 13.23 Spring 2006

Key Characteristics of Two Bus Standards

Characteristic Firewire (1394) USB 2.0Type I/O I/O

Data bus width(signals)

4 2

Clocking asynchronous asynchronous

Theoretical Peak bandwidth

50 MB/sec (Firewire 400) or 100 MB/sec

(Firewire 800)

0.2 MB/sec (low speed), 1.5 MB/sec (full) or 60MB/sec (high)

Hot plugable Yes yes

Max. devices 63 127

Max. length (copper wire)

4.5 meters 5 meters

331 Week 13.24 Spring 2006

Memory Hierarchy Technologies Random Access

“Random” is good: access time is the same for all locations DRAM: Dynamic Random Access Memory

- High density (1 transistor cells), low power, cheap, slow

- Dynamic: need to be “refreshed” regularly (~ every 8 ms)

SRAM: Static Random Access Memory- Low density (6 transistor cells), high power, expensive, fast

- Static: content will last “forever” (until power turned off)

Size: DRAM/SRAM 4 to 8 Cost/Cycle time: SRAM/DRAM 8 to 16

“Non-so-random” Access Technology Access time varies from location to location and from time to

time (e.g., Disk, CDROM)

331 Week 13.25 Spring 2006

Classical SRAM Organization (~Square)

row

decoder

rowaddress

data word

RAM Cell Array

word (row) select

bit (data) lines

Each intersection represents a 6-T SRAM cell

Column Selector & I/O Circuits

columnaddress

One memory row holds a block of data, so the column address selects the requested word from that block

331 Week 13.26 Spring 2006

data bitdata bit

Classical DRAM Organization (~Square Planes)

row

decoder

rowaddress

Column Selector & I/O Circuits

columnaddress

data bit

word (row) select

bit (data) lines

Each intersection represents a 1-T DRAM cell

The column addressselects the requested bit from the row in

eachplanedata word

. . .

. . .

RAM Cell Array

331 Week 13.27 Spring 2006

RAM Memory Definitions Caches use SRAM for speed

Main Memory is DRAM for density Addresses divided into 2 halves (row and column)

- RAS or Row Access Strobe triggering row decoder- CAS or Column Access Strobe triggering column selector

Performance of Main Memory DRAMs Latency: Time to access one word

- Access Time: time between request and when word arrives- Cycle Time: time between requests- Usually cycle time > access time

Bandwidth: How much data can be supplied per unit time- width of the data channel * the rate at which it can be used

331 Week 13.28 Spring 2006

Classical DRAM Operation

DRAM Organization: N rows x N column x M-bit Read or Write M-bit at a time Each M-bit access requires

a RAS / CAS cycle

Row Address

CAS

RAS

Col Address Row Address Col Address

1st M-bit Access 2nd M-bit Access

N r

ows

N cols

DRAM

M bits

RowAddress

ColumnAddress

M-bit OutputCycle Time

331 Week 13.29 Spring 2006

Ways to Improve DRAM Performance Memory interleaving

Fast Page Mode DRAMs – FPM DRAMs www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.

htm

Extended Data Out DRAMs – EDO DRAMs www.chips.ibm.com/products/memory/88H2011/88H2011.pdf

Synchronous DRAMS – SDRAMS www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D.

htm

Rambus DRAMS www.rambus.com/developer/quickfind_documents.html www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669B

.htm

Double Data Rate DRAMs – DDR DRAMS www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323H

A.htm

. . .

331 Week 13.30 Spring 2006

Increasing Bandwidth - InterleavingAccess pattern without Interleaving:

Start Access for D1

CPU Memory

Start Access for D2

D1 available

Access pattern with 4-way Interleaving:

CPU

MemoryBank 1

MemoryBank 0

MemoryBank 3

MemoryBank 2

Cycle Time

Access Time

Acc

ess

Ban

k 0

Access Bank 1

Access Bank 2

Access Bank 3

We can Access Bank 0 again

D2 available

331 Week 13.31 Spring 2006

Problems with Interleaving

How many banks? Ideally, the number of banks number of clocks we have to

wait to access the next word in the bank Only works for sequential accesses (i.e., first word requested

in first bank, second word requested in second bank, etc.)

Increasing DRAM sizes => fewer chips => harder to have banks

Growth bits/chip DRAM : 50%-60%/yr

Only can use for very large memory systems (e.g., those encountered in supercomputer systems)

331 Week 13.32 Spring 2006

N r

ows

N cols

DRAM

ColumnAddress

M-bit Output

M bitsN x M “SRAM”

RowAddress

Fast Page Mode DRAM Operation Fast Page Mode DRAM

N x M “SRAM” to save a row

Row Address

CAS

RAS

Col Address Col Address

1st M-bit Access

Col Address Col Address

2nd M-bit 3rd M-bit 4th M-bit

After a row is read into the SRAM “register”

Only CAS is needed to access other M-bit blocks on that row

RAS remains asserted while CAS is toggled