Computer Architecture All lecture.pdf


Transcript of Computer Architecture All lecture.pdf

Page 1: Computer Architecture All lecture.pdf

Lecture № 1

Introduction to the Computer Organization and Architecture. 1. Notions of the Computer Organization and Architecture.

2. Functions of the Computer System (CS):

Data processing;

Data storage;

Data movement;

Control.

3. Structure of the CS (hierarchy levels).

4. Multilevel computer organization.

Literature.

1. Stallings, W. Computer Organization and Architecture: Designing for Performance, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2002.

2. Hamacher, V. C., Vranesic, Z. G., Zaky, S. G. Computer Organization, 4th ed. McGraw-Hill International Editions, 1996.

3. Tanenbaum, A. S. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002.

Key words.

Architecture, structure, organization, function, instruction, coding, interface, heritage, processing, storage, movement, control, peripherals, Central Processing Unit (CPU), Main Memory, System Interconnection (System Bus), Input, Output, Register, Arithmetic and Logic Unit, Control Unit, Sequencing Logic, Decoder

Page 2: Computer Architecture All lecture.pdf

Definition 1. The Architecture of a Computer System (CS) is a specification of its interfaces, which determines data processing and includes the methods of data coding, the system of instructions, and the principles of software-hardware interaction. It can also be defined as the set of information that is necessary and sufficient for programming in machine code.

Definition 2. The operational units and their interconnections that realize the architecture of the CS constitute the Organization of the CS.

All members of the Intel x86 family share the same basic architecture, and the IBM System/370 family shares the same basic architecture; this gives code compatibility and software succession. Organization differs between different versions. Architecture is more conservative than organization.

Structure is the way the components of a subsystem are merged (united) into one whole unit.

Function is the operation of an individual component as a part of the structure.

All computer functions are: 1. Data processing; 2. Data storage; 3. Data movement; 4. Control.

Page 3: Computer Architecture All lecture.pdf

Functional view of a computer: a Data Movement Apparatus, a Control Mechanism, a Data Storage Facility and a Data Processing Facility, connected to the Operating Environment (the sources and destinations of data).

Page 4: Computer Architecture All lecture.pdf

Operation (1): Data movement, e.g. keyboard to screen (data passes through the Data Movement Apparatus under the Control Mechanism).

Page 5: Computer Architecture All lecture.pdf

Operation (2): Storage, e.g. an Internet download to disk (data moves from the environment into the Data Storage Facility).

Page 6: Computer Architecture All lecture.pdf

Operation (3): Processing from/to storage, e.g. updating a bank statement (data moves between the Data Storage Facility and the Data Processing Facility).

Page 7: Computer Architecture All lecture.pdf

Operation (4): Processing from storage to I/O, e.g. printing a bank statement (data is processed on its way between the Data Storage Facility and the environment).

Page 8: Computer Architecture All lecture.pdf

Structure - Top Level.

The Computer consists of the Central Processing Unit, Main Memory, Input/Output and the System Interconnection, and communicates with Peripherals over communication lines.

Central Processing Unit: manages the functioning of the system and executes the data processing functions.

Main Memory: stores the initial data and all information necessary for data processing.

System Interconnection: the mechanism that provides data interchange among the CPU, Main Memory and I/O.

Input/Output: relocates data between the computer and the environment in both directions.

Page 9: Computer Architecture All lecture.pdf

Structure - The CPU

The Computer consists of the CPU, Memory and I/O connected by the System Bus. The CPU itself consists of the Arithmetic and Logic Unit, the Control Unit, Registers and the Internal CPU Interconnection.

Registers: store operative information while the CPU executes the current operation.

Arithmetic and Logic Unit: executes all operations concerned with the substance of data processing.

Internal CPU Interconnection: the mechanism that provides the joint work of the CPU components.

Control Unit: controls the functioning of the CPU components.

Page 10: Computer Architecture All lecture.pdf

Structure - The Control Unit

Inside the CPU (ALU, Registers, Control Unit, Internal Bus), the Control Unit itself consists of Sequencing Logic, Control Unit Registers and Decoders, and Control Memory.

Sequencing Logic: serves for the execution of concrete actions; it has a finite set of internal states and a finite set of input values.

Decoders: transform an n-bit input binary word into a unique signal on one of the 2^n outputs of the circuit.

Control Memory: stores the data (the microprogram as a whole) that are directly used by the ALU and the CU itself; sometimes it is realized as a set of gates.
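To make the decoder's action concrete, here is a minimal sketch (in Python, purely illustrative; the function name is invented): an n-bit input word produces a single active signal on one of the 2^n outputs.

```python
def decode(word: int, n: int) -> list[int]:
    """Model an n-bit decoder: exactly one of the 2**n outputs is active (1)."""
    outputs = [0] * (2 ** n)
    outputs[word] = 1          # the value of the input word selects one output line
    return outputs

# A 3-bit input word activates exactly one of the 2**3 = 8 output lines.
print(decode(0b101, 3))        # [0, 0, 0, 0, 0, 1, 0, 0]
```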

Page 11: Computer Architecture All lecture.pdf


Multilevel computer organization. The electronic circuits of every computer can recognize and execute only a limited set of simple (primitive) instructions. That is why all programs must be transformed, before execution, into a sequence of these primitive instructions. Taken together, these primitive instructions form the language in which people communicate with the computer; such a language is called the machine language. Using a machine language is tiresome and difficult. To overcome these difficulties, series of levels of abstraction were constructed (an abstraction of a higher level is built on top of the lower one; here an abstraction means a set of instructions convenient for a human). This approach is called multilevel computer organization.

Languages, levels and virtual machines. Let the new (more convenient for the user) instructions together form a language L1. We denote the machine language by L0 (the computer can execute only these instructions). In order to run a program written in L1, it is necessary to replace each instruction of this program by an equivalent set of instructions in

Page 12: Computer Architecture All lecture.pdf


the language L0. As a result we get a program that the computer can execute. This technology is called translation.

Let us assume that there is a special program (written in L0) which "takes" programs in L1 as input data, considers each instruction in turn, immediately substitutes the equivalent set of L0 instructions and executes them. This technology is called interpretation (the program that performs the interpretation is called an interpreter).
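The difference between the two technologies can be sketched as follows (Python; the toy languages L1 and L0 and all names here are invented for illustration only): a translator produces the whole equivalent L0 program first, while an interpreter expands and executes each L1 instruction in turn.

```python
# Toy L0 "machine" instructions: ("LOAD", var), ("ADD", value), ("STORE", var)
# Toy L1 instruction: ("INC2", var)  -- add 2 to a variable

def expand(l1_instr):
    """Equivalent L0 sequence for one L1 instruction."""
    op, var = l1_instr
    assert op == "INC2"
    return [("LOAD", var), ("ADD", 2), ("STORE", var)]

def run_l0(program, memory):
    """Execute an L0 program against a dictionary used as memory."""
    acc = 0
    for op, arg in program:
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc += arg
        elif op == "STORE":
            memory[arg] = acc

def translate(l1_program):
    """Translator: produce the whole equivalent L0 program, to be run later."""
    return [l0 for instr in l1_program for l0 in expand(instr)]

def interpret(l1_program, memory):
    """Interpreter: expand and execute one L1 instruction at a time."""
    for instr in l1_program:
        run_l0(expand(instr), memory)

mem = {"x": 1}
run_l0(translate([("INC2", "x")]), mem)   # translation: x becomes 3
interpret([("INC2", "x")], mem)           # interpretation: x becomes 5
print(mem)
```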

Let us imagine that there exists a virtual machine whose machine language is L1, and denote it M1; the virtual machine with language L0 is M0. In fact M1 could be built, but only at great expense. In this way it is possible to write programs for virtual machines without worrying about translation and interpretation. Moreover, it is possible to create languages that are even more oriented towards the user: L2, L3, . . . , Ln, which are the machine languages of the virtual machines M2, M3 and so on. The invention of new languages may continue until we reach a language that satisfies the user's demands. Each of these languages uses the previous one as its base, so a computer can be considered as a system consisting of a series of levels.

There is an important relation between a language and a virtual machine. Every machine has a certain machine language, and the machine in essence defines the language. We will use the terms "level" and "virtual machine" as synonyms. It is important to remember that only programs in L0 can be

Page 13: Computer Architecture All lecture.pdf


executed by the computer without translation. Programmers are usually interested only in the language Ln, but anyone who wants to understand how the computer works needs to know all the levels.

Contemporary multilevel machines.

The majority of contemporary computers have two or more levels (up to six). The zero level is the hardware (its electronic circuits execute programs written in the language of the first level). In fact there is one more level below level zero, but it belongs to the sphere of electronic engineering and is not considered here (it is the level of physical devices).

The zero level is the digital logic level (its objects are gates; every gate is built from several transistors; a group of gates forms one bit of memory; bits of memory are combined into groups to form registers).

The first level is the microarchitecture level. On this level there is a set of 8 or 32 registers, which form a local memory, and an ALU (arithmetic logic unit). The ALU performs simple logic and arithmetic operations. The registers together with the ALU form the data path

Page 14: Computer Architecture All lecture.pdf


(a data path cycle consists of selecting one or two registers, having the ALU perform an operation on their contents and placing the result into one of the registers). The data path may be controlled by a special microprogram or directly by hardware. In machines where the data path is controlled by a microprogram, the microprogram is an interpreter for the instructions of level 2.

The second level is the instruction set architecture level. It includes the instructions that are executed by the microprogram interpreter or directly by hardware.

The third level is the operating system level. It may include instructions that also belong to the lower levels. The peculiarities of this level are: the presence of new instructions, a different memory organization, the ability to execute several programs simultaneously, and others. One part of the third-level instructions (the new ones) is interpreted by the operating system, and the other part (instructions identical to those of the second level) is interpreted by the microprogram; that is why this level has a hybrid character.

Page 15: Computer Architecture All lecture.pdf


The fourth level is the assembly language level. This level is a symbolic (not numeric) form of one of the lower-level languages. On this level it is possible to write programs in a form acceptable to people. These programs are first translated into the language of level 1, 2 or 3 and then interpreted by the corresponding virtual or actually existing machine (most programs of the 4th level are handled by a translator; programs of the 2nd and 3rd levels are interpreted). The program that performs the translation is called an assembler.

The fifth level consists of the high-level languages. This level consists of languages designed for application programmers. Programs written in these languages are usually translated into level 4 or level 3. The translators that process these programs are called compilers (sometimes the method of interpretation is used; e.g. programs in the Java language are often interpreted).

Conclusion: a computer is designed as a hierarchical structure of levels, each of which is built on top of the preceding one. Every level represents a certain abstraction with its own objects and operations.

Page 16: Computer Architecture All lecture.pdf


The set of data types, operations and features of each level is called its architecture.

At the end of the 1950s the IBM company decided that manufacturing families of computers, each member of which executes the same instructions, had many advantages both for the company and for the customers. To describe the level of compatibility of such computers, IBM introduced the term architecture. A new family of computers was to have one common architecture and many different implementations, differing in price and speed (while all of them could run the same programs). This was achieved with the help of interpretation (a technique proposed by Wilkes in 1951). Hardware without interpretation was used only in the most expensive computers.

Development of multilevel machines. Hardware consists of tangible objects: integrated circuits, printed circuit boards, cables, power supplies, storage units and input/output devices.

Software consists of detailed sequences of instructions and their computer representations (i.e. programs).

Page 17: Computer Architecture All lecture.pdf


At first the border between hardware and software was clear-cut. Over time this border has become blurred.

In reality hardware and software are logically equivalent: any operation performed by software can be built into hardware (preferably after it has been fully worked out), and vice versa.

The decision on how to divide functions between hardware and software is based on such factors as cost, speed, reliability and the frequency of anticipated changes; there are only a few hard rules determining what must belong to hardware and what to software.

Page 18: Computer Architecture All lecture.pdf

A computer with six levels (figure):

Level 5 - High-level language level

Level 4 - Assembly language level

Level 3 - Operating system level

Level 2 - Instruction set architecture level

Level 1 - Microarchitecture level

Level 0 - Digital logic level

Transitions between the levels are implemented by translation (compiler), translation (assembler), and interpretation (microprogram) or direct execution; levels 1 and 0 are the hardware.

Page 19: Computer Architecture All lecture.pdf

Data are the basic elements of information, such as numbers, letters, symbols and so on, which are processed or carried by a human, a computer or some other machine [sometimes the information itself, prepared for certain purposes (in a special form), is considered as data].

Information is the content (meaning) assigned to the data.

Format is a way of data representation, or a scheme of data positioning.

System is a set of material or abstract objects which are considered together as one whole (entity) and which have been united for achieving certain results.

Computer System is a device or a complex of devices which is intended for the mechanization or automation of data processing and which is constructed on the basis of electronic elements (transistors, logic circuits, magnetic elements and so on).

[Analog Computer is a computing device which processes data given in the form of continuously changing physical values whose magnitudes may be measured (such values may be angular or linear displacements, electric voltage, electric current, time and so on). These analog values are processed by mechanical or other physical methods, by measuring the results of such operations. Computers of this type are usually used for solving equations describing processes in real time, when the initial data are input from special measuring devices.]

[Digital Computer is an electronic computing device which receives discrete input data, processes it in accordance with a list of instructions stored inside it and generates the resulting output data. (Instructions may be considered as a special type of data coded in correspondence with a format; these instructions: a) manage data transfer both inside the computer itself and between the computer and its peripheral (input-output) devices, b) determine the arithmetic or logic operations to be performed.)]

[Hybrid Computer is a computing system in which elements of analog and digital computers are combined. These computers

Page 20: Computer Architecture All lecture.pdf

are used for solving equations by means of analog devices, while digital devices are used for storage, further processing and representation of the results.]

Configuration of a Computer System is the concrete composition of hardware devices and the interconnections among them that is used during a certain period of time. It determines the character of the considered system's work (a computer system contains a special program that allows the composition of the computer to be changed within the available framework).

Hardware consists of tangible (palpable) objects: integrated circuits, printed boards, cables, memory devices, printers, other technical devices and physical equipment.

Software is the detailed instructions that control the operation of a computer system.

Interface is:

(1) a relation between two processing components;

(2) a complete set of agreements (a language in the common sense) concerning input and output signals, by means of which the following data processors may exchange data: computer device - computer device; program - program medium; human being - data processing system; and some others. These agreements are called protocols. Protocols are sequences of technical requirements which must be met by the designers of any device so that its operation is successfully compatible with that of other devices.

Page 21: Computer Architecture All lecture.pdf

Questions for Quiz № 1 and № 2

1. In your own words explain the following notions (concepts) and give examples:

a) data, information, format;

b) computer (analog, digital, hybrid);

c) hardware, software, computer configuration;

d) function, structure, interface;

e) architecture, organization.

2. List the major components of a contemporary computer system and indicate their functions.

3. List the operations you use most often when you work with a computer and explain which of the computer's major components are engaged in the process of executing one of these operations.

4. Analyze the 5 definitions of computer architecture given below. Which of these definitions corresponds best to the officially accepted one? (Give a detailed explanation.)

1) "The design of the integrated system which provides a useful tool to the programmer" (Baer)

2) "The study of structure, behavior and design of computers" (Hayes)

3) "The design of the system specification at a general or subsystem level" (Abd-Alla)

4) "The art of designing a machine that will be a pleasure to work with" (Foster)

5) "The interface between the hardware and the lowest level software" (Hennessy and Patterson).

5. State the minimal number of levels of a virtual machine that can execute all the main computer functions (give an explanation).

6. What is the difference between a translator and an interpreter?

7. Why are computer hardware and computer software considered logically equivalent?

Page 22: Computer Architecture All lecture.pdf

List all operations (draw up a sketch) which are to be performed by the Computer System for:

1. - deleting a record;

2. - editing data;

3. - correcting data on the hard disk;

4. - copying data from the hard disk to a CD;

5. - printing a text from a CD;

6. - searching for a file on the hard disk;

7. - searching for a file on the Internet;

8. - copying a file from the Internet to the hard disk;

9. - correcting data on a diskette;

10. - renaming a file on the hard disk;

11. - archiving a file on the hard disk;

12. - archiving a file on a diskette;

13. - unarchiving a file on the hard disk;

14. - installing a program from a CD;

15. - deleting a record in a file on a diskette;

16. - printing a text from a site on the Internet;

17. - archiving a file on the hard disk and copying it to a diskette.

Page 23: Computer Architecture All lecture.pdf

Lecture №2

Computer Evolution and Performance

1. The Electronic Era of Computers, Generation I.

2. Structure of the von Neumann machine.

3. Structure of the IAS.

4. Generations II, III and IV. Moore's Law.

Literature.

1. Stallings, W. Computer Organization and Architecture: Designing for Performance, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2002.

2. Hamacher, V. C., Vranesic, Z. G., Zaky, S. G. Computer Organization, 4th ed. McGraw-Hill International Editions, 1996.

3. Tanenbaum, A. S. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002.

Page 24: Computer Architecture All lecture.pdf

ENIAC – background

Electronic Numerical Integrator And Computer. Built by Eckert and Mauchly at the University of Pennsylvania to compute trajectory tables for weapons. Started in 1943, finished in 1946 (too late for the war effort); used until 1955.

ENIAC - details

Decimal (not binary); 20 accumulators of 10 digits; programmed manually by switches; 18,000 vacuum tubes; 30 tons; 15,000 square feet; 140 kW power consumption; 5,000 additions per second.

von Neumann/Turing

EDVAC (Electronic Discrete Variable Automatic Computer): the stored-program concept; main memory storing programs and data; an ALU operating on binary data; a control unit interpreting instructions from memory and executing them; input and output equipment operated by the control unit. The IAS machine of the Princeton Institute for Advanced Studies was completed in 1952.


Key Concepts of the von Neumann Architecture.

Data and instructions are stored in a single read-write memory

subsystem;

The contents of this memory are addressable by location, without

regard to the type of data contained there;

Execution occurs in a sequential fashion (unless explicitly

modified) from one instruction to the next.

Page 25: Computer Architecture All lecture.pdf

Structure of the von Neumann machine

Main Memory (M): contains data and instructions.

Arithmetic and Logic Unit (with the Accumulator): processes data presented in binary form.

Program Control Unit (PCU): analyses the program's instructions fetched from M and organizes their execution.

Input/Output Equipment: works in accordance with signals coming from the PCU.

Page 26: Computer Architecture All lecture.pdf


IAS - details

1,000 × 40-bit words (1,000 cells, each cell contains 40 bits). A word holds either a binary number or 2 × 20-bit instructions (two instructions stored in the same cell).

Set of registers (storage in the CPU):

Memory Buffer Register (MBR): stores a word which is to be put into memory or which has just been taken out of memory.

Memory Address Register (MAR): stores the address of the memory cell to which we refer in order to write or read data.

Instruction Register (IR): stores the 8-bit operation code of the current instruction during its execution.

Instruction Buffer Register (IBR): serves for the temporary storage of the right-hand instruction that has been fetched.

Program Counter (PC): stores the address of the next instruction word to be fetched.

Accumulator (AC) and Multiplier Quotient (MQ): serve for the temporary storage of operands and results in the ALU.

IAS word formats: a number word consists of a sign bit (bit 0) and the 39-bit value (bits 1-39). An instruction word holds two instructions: the left instruction (operation code in bits 0-7, address in bits 8-19) and the right instruction (operation code in bits 20-27, address in bits 28-39).
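As an illustrative sketch of this format (Python; the helper name is invented), a 40-bit instruction word can be unpacked into its two instructions:

```python
def split_ias_word(word: int):
    """Split a 40-bit IAS instruction word (bit 0 = most significant bit)
    into (left_opcode, left_address, right_opcode, right_address)."""
    left_op    = (word >> 32) & 0xFF    # bits 0-7
    left_addr  = (word >> 20) & 0xFFF   # bits 8-19
    right_op   = (word >> 12) & 0xFF    # bits 20-27
    right_addr =  word        & 0xFFF   # bits 28-39
    return left_op, left_addr, right_op, right_addr

# Example: left instruction opcode 0x01, address 100; right opcode 0x02, address 200.
w = (0x01 << 32) | (100 << 20) | (0x02 << 12) | 200
print(split_ias_word(w))   # (1, 100, 2, 200)
```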

Page 27: Computer Architecture All lecture.pdf

Structure of IAS - detail

The Central Processing Unit consists of the Arithmetic and Logic Unit (Accumulator, MQ, MBR and the arithmetic and logic circuits) and the Program Control Unit (IBR, IR, MAR, PC and the control circuits). Instructions and data pass between the CPU and Main Memory; control signals pass to the Input/Output Equipment.

Page 28: Computer Architecture All lecture.pdf

Commercial Computers

1947 - Eckert-Mauchly Computer Corporation founded. UNIVAC I (Universal Automatic Computer) was built for the US Bureau of the Census 1950 calculations; the company became part of the Sperry-Rand Corporation. Late 1950s - UNIVAC II: faster, more memory, upward compatibility.

IBM

Punched-card processing equipment. 1953 - the 701, IBM's first stored-program computer, for scientific calculations. 1955 - the 702, for business applications. These led to the 700/7000 series.

Transistors

Replaced vacuum tubes; smaller, cheaper, less heat dissipation. A solid-state device made from silicon (sand). Invented in 1947 at Bell Labs by William Shockley et al.

Transistor Based Computers

Second-generation machines. NCR and RCA produced small transistor machines. IBM: the 7000 series. DEC (Digital Equipment Corporation), founded in 1957, produced the PDP-1.

Page 29: Computer Architecture All lecture.pdf

Microelectronics

Literally "small electronics". A computer is made up of gates, memory cells and interconnections; these can all be manufactured on a semiconductor, e.g. a silicon wafer.

Generations of Computer

Vacuum tube - 1946-1957.
Transistor - 1958-1964.
Small-scale integration - 1965 on: up to 100 devices on a chip.
Medium-scale integration - to 1971: 100-3,000 devices on a chip.
Large-scale integration - 1971-1977: 3,000-100,000 devices on a chip.
Very-large-scale integration - 1978 to date: 100,000-100,000,000 devices on a chip.
Ultra-large-scale integration: over 100,000,000 devices on a chip.

Page 30: Computer Architecture All lecture.pdf

Moore’s Law

Increased density of components on a chip. Gordon Moore, cofounder of Intel, observed that the number of transistors on a chip doubles every year. Since the 1970s development has slowed a little: the number of transistors now doubles every 18 months, while the cost of a chip has remained almost unchanged. Higher packing density means shorter electrical paths, giving higher performance; smaller size gives increased flexibility; power and cooling requirements are reduced; fewer interconnections increase reliability.
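A quick numeric illustration of the 18-month doubling rate (Python; the starting count, roughly that of the Intel 4004, is used only as an assumed example):

```python
def transistors(years: float, start_count: float = 2_300) -> float:
    """Transistor count after `years`, doubling every 18 months (1.5 years).
    The starting count of 2,300 is illustrative only."""
    return start_count * 2 ** (years / 1.5)

for years in (0, 3, 6, 9):
    print(years, round(transistors(years)))
# 0 -> 2300, 3 -> 9200, 6 -> 36800, 9 -> 147200
```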

Page 31: Computer Architecture All lecture.pdf

Growth in CPU Transistor Count

Page 32: Computer Architecture All lecture.pdf

IBM 360 series

1964. Replaced (and was not compatible with) the 7000 series. The first planned "family" of computers: similar or identical instruction sets, similar or identical operating systems, increasing speed, an increasing number of I/O ports (i.e. more terminals), increasing memory size and increasing cost across the family. The models used a multiplexed switch structure.

Page 33: Computer Architecture All lecture.pdf

DEC PDP-8

1964. The first minicomputer (named after the miniskirt!). It did not need an air-conditioned room and was small enough to sit on a lab bench. Price: $16,000, compared with $100k+ for an IBM 360. Used in embedded applications. Introduced the bus structure.

Page 34: Computer Architecture All lecture.pdf

DEC PDP-8 Bus Structure: the console controller, CPU, main memory and I/O modules are all attached to a single shared bus, the OMNIBUS.

Page 35: Computer Architecture All lecture.pdf

Semiconductor Memory

1970, Fairchild. The size of a single core (i.e. 1 bit of magnetic core storage), yet it holds 256 bits. Non-destructive read; much faster than core. Capacity approximately doubles each year.

Page 36: Computer Architecture All lecture.pdf

Intel

1971 - the 4004, the first microprocessor: all CPU components on a single chip; 4-bit. Followed in 1972 by the 8008 (8-bit); both were designed for specific applications. 1974 - the 8080, Intel's first general-purpose microprocessor.

Page 37: Computer Architecture All lecture.pdf

Speeding it up

Pipelining; on-board cache; on-board L1 & L2 caches; branch prediction; data flow analysis; speculative execution.

Performance Mismatch

Processor speed has increased and memory capacity has increased, but memory speed lags behind processor speed.

Page 38: Computer Architecture All lecture.pdf

DRAM and Processor Characteristics

Page 39: Computer Architecture All lecture.pdf

Trends in DRAM use

Page 40: Computer Architecture All lecture.pdf

Definition. Computer Performance (CP) is determined by the number of certain (well-known) operations per unit of time.

The generalized estimate of CP is the number of transactions per second.

The basic performance characteristics of a computer system are: processor speed, memory capacity, and interconnection data rates.
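A trivial worked example of this definition (Python; the figures are invented purely to show the calculation):

```python
# Hypothetical measurement, purely to illustrate "operations per unit of time".
operations_executed = 3_000_000_000     # operations completed during the run
elapsed_seconds = 2.0                   # measured execution time
ops_per_second = operations_executed / elapsed_seconds
print(f"{ops_per_second / 1e6:.0f} million operations per second")   # 1500
```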

Solutions

Increase the number of bits retrieved at one time: make DRAM "wider" rather than "deeper"; change the DRAM interface (cache). Reduce the frequency of memory accesses: more complex caches and cache on the chip. Increase the interconnection bandwidth: high-speed buses and a hierarchy of buses.

Page 41: Computer Architecture All lecture.pdf
Page 42: Computer Architecture All lecture.pdf

Def. 1. A Register is an area of internal (high-speed) memory for temporarily storing data.

Def. 2. A Word (computer word) is an assemblage of a certain fixed number of symbols (binary digits, bits) which is perceived by the computer as a single whole (not divisible) and has a definite meaning.

Two main units (the ALU and the Program Control Unit) take part in the execution of IAS instructions.

The major components of the ALU are:

1. Registers:

AC - the accumulator, which serves for the temporary storage of the senior (most significant) 40 of the 80 possible bits in which an input operand or an obtained result is coded;

MQ - the multiplier quotient register, which serves for the temporary storage of the junior (least significant) 40 bits in which an input operand or an obtained result is coded;

MBR - the memory buffer register, which stores a word that is to be written into memory or that has just been fetched from memory.

2. The Arithmetic & Logic Circuits perform the primary arithmetic and logical operations of the computer.

The Program Control Unit includes the following components:

1. Registers:

IBR - the instruction buffer register, which is intended for storing the right-hand instruction that has been fetched from memory;

IR - the instruction register, which serves for storing (the opcode of) the left-hand instruction that has just been fetched;

MAR - the memory address register, which is intended for storing the address of the word that is to be written into memory or read from it;

PC - the program counter, which stores the address of the word (holding the left and right instructions) that is to be executed next.

2. Control Circuits coordinate and control the other parts of the computer system. They read the stored program (one instruction at a time) and direct the other components of the computer system to perform the tasks required by the program. The series of operations required to process a single machine instruction is called the machine cycle.
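The machine cycle can be illustrated by a highly simplified sketch (Python; the three-instruction mini instruction set and the memory contents are invented and are far simpler than the real IAS set):

```python
# A toy machine cycle: fetch, decode, execute. Register names loosely follow
# the IAS (PC, MAR, MBR, IR, AC); the instruction set is invented.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", 0),
          10: 7, 11: 35, 12: 0}
PC, AC = 0, 0
running = True
while running:
    MAR = PC                      # address of the next instruction word
    MBR = memory[MAR]             # fetch the word from memory
    IR, address = MBR             # decode: opcode and address part
    PC += 1
    if IR == "LOAD":              # execute
        AC = memory[address]
    elif IR == "ADD":
        AC += memory[address]
    elif IR == "STORE":
        memory[address] = AC
    elif IR == "HALT":
        running = False
print(memory[12])                 # 42
```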

Instructions of IAS.

The IAS had 21 instructions. All of them may be divided into 5 groups:

Page 43: Computer Architecture All lecture.pdf

1. Data transfer instructions: these instructions move data from memory cells into the registers AC or MQ, or vice versa;

2. Unconditional jump instructions;

3. Conditional jump instructions;

4. Arithmetic instructions;

5. Instructions that modify a part of another instruction (address modification).

Page 44: Computer Architecture All lecture.pdf

Configuration of a typical Generation II computer (IBM 7094): the CPU and Main Memory are connected through a MULTIPLEXOR to several DATA CHANNELS, each of which serves a group of peripheral devices: 1, 8 - magnetic tape storage; 2 - puncher; 3 - printer; 4 - punched-card reader; 5 - magnetic drum; 6, 7 - magnetic disk storage; 9 - data communication equipment.

Page 45: Computer Architecture All lecture.pdf

Data Channel is an independent I/O block equipped with its own processor and its own system of instructions. These instructions are stored in the main memory subsystem but are executed only by the corresponding I/O processor. The CPU initializes a session (the process of starting, using and completing interactions between applications and computer devices for data transfer) through the channel by sending a specific signal to the I/O module, after which all necessary operations are performed by this module in accordance with a program fetched from the main memory. After completing the session the I/O module informs the CPU (by sending a special signal). Thus the CPU is released from executing tasks that are not natural to it.

Multiplexor is a device that serves as a central switch for data transfer among the data channels, the CPU and the main memory. It may be considered as a dispatcher (manager) of access to the main memory by the CPU and the data channels (it allows the channels and the CPU to work independently of one another).

Transistor is an electronic device based on a semiconductor crystal which has three or more electrodes; it is intended for the amplification, generation or transformation of electric oscillations.

Integrated circuit is an electronic device made by printing thousands or even millions of tiny transistors and some other electronic elements on a small silicon crystal (chip); they are connected in a certain way and considered as a single unit.

Base Electronic Elements of a Computer: a) Gate - a circuit that computes a logical function of its inputs and produces an output; b) Memory Cell - an electronic circuit with two stable states, with input and output lines and READ/WRITE (timing) control signals.
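A tiny behavioural sketch of these two elements (Python, illustrative only): the gate computes a logical function of its inputs, and the memory cell keeps one bit between READ and WRITE operations.

```python
def nand_gate(a: int, b: int) -> int:
    """Gate: computes a logical function of its inputs (here NAND)."""
    return 0 if (a and b) else 1

class MemoryCell:
    """Memory cell: holds one bit (two stable states) between operations."""
    def __init__(self):
        self.state = 0
    def write(self, bit: int):    # WRITE control signal asserted
        self.state = bit
    def read(self) -> int:        # READ control signal asserted
        return self.state

cell = MemoryCell()
cell.write(nand_gate(1, 0))      # NAND(1, 0) = 1
print(cell.read())               # 1
```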

Page 46: Computer Architecture All lecture.pdf

Questions to Lecture 2.

1. Describe the architecture and structural organization of computers of generations I, II, III and IV, and compare them.

2. Formulate and analyze Key Concepts of von Neumann

Architecture.

3. Describe the functional structure of von Neumann machine.

4. Describe the functional structure of IAS. List elements of

Architecture and Structure Organization (details) of IAS.

5. List and describe base electronic components of

contemporary computer.

6. Formulate and analyze Moore's Law.

7. What is Computer System Performance? List the basic characteristics of Computer System Performance.

Page 47: Computer Architecture All lecture.pdf

Arithmetic logic unit

From Wikipedia, the free encyclopedia


Arithmetic Logic Unit schematic symbol

Cascadable 8 Bit ALU Texas Instruments SN74AS888

In computing, an arithmetic logic unit (ALU) is a digital circuit that performs arithmetic and

logical operations. The ALU is a fundamental building block of the central processing unit

(CPU) of a computer, and even the simplest microprocessors contain one for purposes such as

maintaining timers. The processors found inside modern CPUs and graphics processing units

(GPUs) accommodate very powerful and very complex ALUs; a single component may

contain a number of ALUs.

Mathematician John von Neumann proposed the ALU concept in 1945, when he wrote a

report on the foundations for a new computer called the EDVAC. Research into ALUs

remains an important part of computer science, falling under Arithmetic and logic

structures in the ACM Computing Classification System.


Numerical systems

Main article: Signed number representations

An ALU must process numbers using the same format as the rest of the digital circuit. The format of modern processors is almost always the two's complement binary number representation. Early computers used a wide variety of number systems, including ones' complement, two's complement, sign-magnitude format, and even true decimal systems, with ten tubes per digit.

ALUs for each one of these numeric systems had different designs, and that influenced the current preference for two's complement, as this is the representation that makes it easier for the ALUs to calculate additions and subtractions.

The ones' complement and two's complement number systems allow for subtraction to be

accomplished by adding the negative of a number in a very simple way which negates the

need for specialized circuits to do subtraction; however, calculating the negative in two's

complement requires adding a one to the low order bit and propagating the carry. An

alternative way to do two's complement subtraction of A−B is to present a one to the carry

input of the adder and use ¬B rather than B as the second input.
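A small sketch of this point (Python, for an assumed 8-bit word): A - B is computed by adding ¬B with a carry-in of 1.

```python
BITS = 8
MASK = (1 << BITS) - 1           # 0xFF for an 8-bit word

def subtract(a: int, b: int) -> int:
    """Two's complement subtraction: A - B == A + ~B + 1 (carry-in of 1)."""
    return (a + (~b & MASK) + 1) & MASK

print(subtract(100, 58))         # 42
print(subtract(5, 7))            # 254, i.e. -2 in 8-bit two's complement
```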

Practical overview

Most of a processor's operations are performed by one or more ALUs. An ALU loads data

from input registers, an external Control Unit then tells the ALU what operation to perform on

that data, and then the ALU stores its result into an output register. The Control Unit is

responsible for moving the processed data between these registers, ALU and memory.

Simple operations

Page 49: Computer Architecture All lecture.pdf

A simple example arithmetic logic unit (2-bit ALU) that does AND, OR, XOR, and addition

Most ALUs can perform the following operations:

Integer arithmetic operations (addition, subtraction, and sometimes multiplication and

division, though this is more expensive)

Bitwise logic operations (AND, NOT, OR, XOR)

Bit-shifting operations (shifting or rotating a word by a specified number of bits to the

left or right, with or without sign extension). Shifts can be interpreted as

multiplications by 2 and divisions by 2.
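These operation classes can be sketched for an assumed 8-bit ALU (Python; the names are illustrative, not any real library's API):

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def alu(op: str, a: int, b: int = 0) -> int:
    """A toy 8-bit ALU covering the operation classes listed above."""
    if op == "ADD":  return (a + b) & MASK
    if op == "SUB":  return (a - b) & MASK
    if op == "AND":  return a & b
    if op == "OR":   return a | b
    if op == "XOR":  return a ^ b
    if op == "NOT":  return ~a & MASK
    if op == "SHL":  return (a << 1) & MASK   # shift left  = multiply by 2
    if op == "SHR":  return a >> 1            # shift right = divide by 2
    raise ValueError(op)

print(alu("ADD", 20, 22))   # 42
print(alu("SHL", 21))       # 42 (21 * 2)
print(alu("SHR", 84))       # 42 (84 / 2)
```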

Complex operations


Engineers can design an Arithmetic Logic Unit to calculate any operation. The more complex

the operation, the more expensive the ALU is, the more space it uses in the processor, the

more power it dissipates. Therefore, engineers compromise. They make the ALU powerful

enough to make the processor fast, but yet not so complex as to become prohibitive. For

example, computing the square root of a number might use:

1. Calculation in a single clock Design an extraordinarily complex ALU that calculates

the square root of any number in a single step.

2. Calculation pipeline Design a very complex ALU that calculates the square root of

any number in several steps. The intermediate results go through a series of circuits

arranged like a factory production line. The ALU can accept new numbers to calculate

even before having finished the previous ones. The ALU can now produce numbers as

fast as a single-clock ALU, although the results start to flow out of the ALU only after

an initial delay.

3. Iterative calculation Design a complex ALU that calculates the square root through

several steps. This usually relies on control from a complex control unit with built-in

microcode.

Page 50: Computer Architecture All lecture.pdf

4. Co-processor Design a simple ALU in the processor, and sell a separate specialized

and costly processor that the customer can install just beside this one, and implements

one of the options above.

5. Software libraries Tell the programmers that there is no co-processor and there is no

emulation, so they will have to write their own algorithms to calculate square roots by

software.

6. Software emulation Emulate the existence of the co-processor, that is, whenever a

program attempts to perform the square root calculation, make the processor check if

there is a co-processor present and use it if there is one; if there isn't one, interrupt the

processing of the program and invoke the operating system to perform the square root

calculation through some software algorithm.

The options above go from the fastest and most expensive one to the slowest and least

expensive one. Therefore, while even the simplest computer can calculate the most

complicated formula, the simplest computers will usually take a long time doing that because

of the several steps for calculating the formula.

Powerful processors like the Intel Core and AMD64 implement option #1 for several simple

operations, #2 for the most common complex operations and #3 for the extremely complex

operations.

Inputs and outputs

The inputs to the ALU are the data to be operated on (called operands) and a code from the

control unit indicating which operation to perform. Its output is the result of the computation.

In many designs the ALU also takes or generates as inputs or outputs a set of condition codes

from or to a status register. These codes are used to indicate cases such as carry-in or carry-

out, overflow, divide-by-zero, etc.
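As a rough sketch (Python, assumed 8-bit width, illustrative names), an addition can produce condition codes of the kind mentioned here: carry-out, zero, sign and signed overflow.

```python
def add_with_flags(a: int, b: int):
    """8-bit addition returning (result, flags) with carry, zero, sign, overflow."""
    full = a + b
    result = full & 0xFF
    flags = {
        "carry":    full > 0xFF,                                # carry out of bit 7
        "zero":     result == 0,
        "sign":     bool(result & 0x80),                        # most significant bit
        "overflow": ((a ^ result) & (b ^ result) & 0x80) != 0,  # signed overflow
    }
    return result, flags

print(add_with_flags(0x7F, 0x01))  # result 0x80: sign and overflow set
print(add_with_flags(0xFF, 0x01))  # result 0x00: carry and zero set
```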

ALUs vs. FPUs

A Floating Point Unit (FPU) also performs arithmetic operations between two values, but it does so for numbers in floating point representation, which is much more complicated than the two's

complement representation used in a typical ALU. In order to do these calculations, a FPU

has several complex circuits built-in, including some internal ALUs.

In modern practice, engineers typically refer to the ALU as the circuit that performs integer

arithmetic operations (like two's complement and BCD). Circuits that calculate more complex

formats like floating point, complex numbers, etc. usually receive a more specific name such

as FPU.


Page 51: Computer Architecture All lecture.pdf






Page 55: Computer Architecture All lecture.pdf

Computer

From Wikipedia, the free encyclopedia


A computer is a programmable machine that receives input, stores and automatically

manipulates data, and provides output in a useful format.

The first electronic computers were developed in the mid-20th century (1940–1945).

Originally, they were the size of a large room, consuming as much power as several hundred

modern personal computers (PCs).[1]

Modern computers based on integrated circuits are millions to billions of times more capable

than the early machines, and occupy a fraction of the space.[2]

Simple computers are small

enough to fit into mobile devices, and can be powered by a small battery. Personal computers

in their various forms are icons of the Information Age and are what most people think of as

"computers". However, the embedded computers found in many devices from MP3 players to

fighter aircraft and from toys to industrial robots are the most numerous.


History of computing

Main article: History of computing hardware

The first use of the word "computer" was recorded in 1613, referring to a person who carried

out calculations, or computations, and the word continued with the same meaning until the

middle of the 20th century. From the end of the 19th century onwards, the word began to take

on its more familiar meaning, describing a machine that carries out computations.[3]

Limited-function ancient computers

Page 57: Computer Architecture All lecture.pdf

The Jacquard loom, on display at the Museum of Science and Industry in Manchester,

England, was one of the first programmable devices.

The history of the modern computer begins with two separate technologies—automated

calculation and programmability—but no single device can be identified as the earliest

computer, partly because of the inconsistent application of that term. Examples of early

mechanical calculating devices include the abacus, the slide rule and arguably the astrolabe

and the Antikythera mechanism, an ancient astronomical computer built by the Greeks around

80 BC.[4]

The Greek mathematician Hero of Alexandria (c. 10–70 AD) built a mechanical

theater which performed a play lasting 10 minutes and was operated by a complex system of

ropes and drums that might be considered to be a means of deciding which parts of the

mechanism performed which actions and when.[5]

This is the essence of programmability.

The "castle clock", an astronomical clock invented by Al-Jazari in 1206, is considered to be

the earliest programmable analog computer.[6]

It displayed the zodiac, the solar

and lunar orbits, a crescent moon-shaped pointer travelling across a gateway causing

automatic doors to open every hour,[7][8]

and five robotic musicians who played music when

struck by levers operated by a camshaft attached to a water wheel. The length of day and

night could be re-programmed to compensate for the changing lengths of day and night

throughout the year.[6]

The Renaissance saw a re-invigoration of European mathematics and engineering. Wilhelm

Schickard's 1623 device was the first of a number of mechanical calculators constructed by

European engineers, but none fit the modern definition of a computer, because they could not

be programmed.

First general-purpose computers

In 1801, Joseph Marie Jacquard made an improvement to the textile loom by introducing a

series of punched paper cards as a template which allowed his loom to weave intricate

patterns automatically. The resulting Jacquard loom was an important step in the development

of computers because the use of punched cards to define woven patterns can be viewed as an

early, albeit limited, form of programmability.

Page 58: Computer Architecture All lecture.pdf

The Most Famous Image in the Early History of Computing[9]

This portrait of Jacquard was woven in silk on a Jacquard loom and required 24,000 punched

cards to create (1839). It was only produced to order. Charles Babbage owned one of these

portraits; it inspired him in using perforated cards in his analytical engine.[10]

It was the fusion of automatic calculation with programmability that produced the first

recognizable computers. In 1837, Charles Babbage was the first to conceptualize and design a

fully programmable mechanical computer, his analytical engine.[11]

Limited finances and

Babbage's inability to resist tinkering with the design meant that the device was never

completed ; nevertheless his son, Henry Babbage, completed a simplified version of the

analytical engine's computing unit (the mill) in 1888. He gave a successful demonstration of

its use in computing tables in 1906. This machine was given to the Science museum in South

Kensington in 1910.

In the late 1880s, Herman Hollerith invented the recording of data on a machine readable

medium. Prior uses of machine readable media, above, had been for control, not data. "After

some initial trials with paper tape, he settled on punched cards ..."[12]

To process these

punched cards he invented the tabulator, and the keypunch machines. These three inventions

were the foundation of the modern information processing industry. Large-scale automated

data processing of punched cards was performed for the 1890 United States Census by

Hollerith's company, which later became the core of IBM. By the end of the 19th century a

number of technologies that would later prove useful in the realization of practical computers

had begun to appear: the punched card, Boolean algebra, the vacuum tube (thermionic valve)

and the teleprinter.

During the first half of the 20th century, many scientific computing needs were met by

increasingly sophisticated analog computers, which used a direct mechanical or electrical

model of the problem as a basis for computation. However, these were not programmable and

generally lacked the versatility and accuracy of modern digital computers.

Page 59: Computer Architecture All lecture.pdf

Alan Turing is widely regarded to be the father of modern computer science. In 1936 Turing

provided an influential formalisation of the concept of the algorithm and computation with the

Turing machine, providing a blueprint for the electronic digital computer.[13]

Of his role in the

creation of the modern computer, Time magazine in naming Turing one of the 100 most

influential people of the 20th century, states: "The fact remains that everyone who taps at a

keyboard, opening a spreadsheet or a word-processing program, is working on an incarnation

of a Turing machine".[13]

The Zuse Z3, 1941, considered the world's first working programmable, fully automatic

computing machine.

The ENIAC, which became operational in 1946, is considered to be the first general-purpose

electronic computer.

EDSAC was one of the first computers to implement the stored program (von Neumann)

architecture.

Page 60: Computer Architecture All lecture.pdf

Die of an Intel 80486DX2 microprocessor (actual size: 12×6.75 mm) in its packaging.

The Atanasoff–Berry Computer (ABC) was among the first electronic digital binary

computing devices. Conceived in 1937 by Iowa State College physics professor John

Atanasoff, and built with the assistance of graduate student Clifford Berry,[14]

the machine

was not programmable, being designed only to solve systems of linear equations. The

computer did employ parallel computation. A 1973 court ruling in a patent dispute found that

the patent for the 1946 ENIAC computer derived from the Atanasoff–Berry Computer.

The inventor of the program-controlled computer was Konrad Zuse, who built the first

working computer in 1941 and later in 1955 the first computer based on magnetic storage.[15]

George Stibitz is internationally recognized as a father of the modern digital computer. While

working at Bell Labs in November 1937, Stibitz invented and built a relay-based calculator he

dubbed the "Model K" (for "kitchen table", on which he had assembled it), which was the first

to use binary circuits to perform an arithmetic operation. Later models added greater

sophistication including complex arithmetic and programmability.[16]

A succession of steadily more powerful and flexible computing devices were constructed in

the 1930s and 1940s, gradually adding the key features that are seen in modern computers.

The use of digital electronics (largely invented by Claude Shannon in 1937) and more flexible

programmability were vitally important steps, but defining one point along this road as "the

first digital electronic computer" is difficult.Shannon 1940

Notable achievements include.

Konrad Zuse's electromechanical "Z machines". The Z3 (1941) was the first working

machine featuring binary arithmetic, including floating point arithmetic and a measure

of programmability. In 1998 the Z3 was proved to be Turing complete, therefore being

the world's first operational computer.[17]

The non-programmable Atanasoff–Berry Computer (commenced in 1937, completed

in 1941) which used vacuum tube based computation, binary numbers, and

regenerative capacitor memory. The use of regenerative memory allowed it to be

much more compact than its peers (being approximately the size of a large desk or

workbench), since intermediate results could be stored and then fed back into the same

set of computation elements.

The secret British Colossus computers (1943),[18]

which had limited programmability

but demonstrated that a device using thousands of tubes could be reasonably reliable

and electronically reprogrammable. It was used for breaking German wartime codes.

The Harvard Mark I (1944), a large-scale electromechanical computer with limited

programmability.[19]

Page 61: Computer Architecture All lecture.pdf

The U.S. Army's Ballistic Research Laboratory ENIAC (1946), which used decimal

arithmetic and is sometimes called the first general purpose electronic computer (since

Konrad Zuse's Z3 of 1941 used electromagnets instead of electronics). Initially,

however, ENIAC had an inflexible architecture which essentially required rewiring to

change its programming.

Stored-program architecture

Several developers of ENIAC, recognizing its flaws, came up with a far more flexible and

elegant design, which came to be known as the "stored program architecture" or von

Neumann architecture. This design was first formally described by John von Neumann in the

paper First Draft of a Report on the EDVAC, distributed in 1945. A number of projects to

develop computers based on the stored-program architecture commenced around this time, the

first of these being completed in Great Britain. The first working prototype to be

demonstrated was the Manchester Small-Scale Experimental Machine (SSEM or "Baby") in

1948. The Electronic Delay Storage Automatic Calculator (EDSAC), completed a year after

the SSEM at Cambridge University, was the first practical, non-experimental implementation

of the stored program design and was put to use immediately for research work at the

university. Shortly thereafter, the machine originally described by von Neumann's paper—

EDVAC—was completed but did not see full-time use for an additional two years.

Nearly all modern computers implement some form of the stored-program architecture,

making it the single trait by which the word "computer" is now defined. While the

technologies used in computers have changed dramatically since the first electronic, general-

purpose computers of the 1940s, most still use the von Neumann architecture.

Beginning in the 1950s, Soviet scientists Sergei Sobolev and Nikolay Brusentsov conducted

research on ternary computers, devices that operated on a base three numbering system of −1,

0, and 1 rather than the conventional binary numbering system upon which most computers

are based. They designed the Setun, a functional ternary computer, at Moscow State

University. The device was put into limited production in the Soviet Union, but supplanted by

the more common binary architecture.

Semiconductors and microprocessors

Computers using vacuum tubes as their electronic elements were in use throughout the 1950s,

but by the 1960s had been largely replaced by transistor-based machines, which were smaller,

faster, cheaper to produce, required less power, and were more reliable. The first

transistorised computer was demonstrated at the University of Manchester in 1953.[20]

In the

1970s, integrated circuit technology and the subsequent creation of microprocessors, such as

the Intel 4004, further decreased size and cost and further increased speed and reliability of

computers. By the late 1970s, many products such as video recorders contained dedicated

computers called microcontrollers, and they started to appear as a replacement to mechanical

controls in domestic appliances such as washing machines. The 1980s witnessed home

computers and the now ubiquitous personal computer. With the evolution of the Internet,

personal computers are becoming as common as the television and the telephone in the

household[citation needed]

.

Modern smartphones are fully programmable computers in their own right, and as of 2009

may well be the most common form of such computers in existence[citation needed]

.

Page 62: Computer Architecture All lecture.pdf

Programs

The defining feature of modern computers which distinguishes them from all other machines

is that they can be programmed. That is to say that some type of instructions (the program)

can be given to the computer, and it will carry process them. While some computers may have

strange concepts "instructions" and "output" (see quantum computing), modern computers

based on the von Neumann architecture are often have machine code in the form of an

imperative programming language.

In practical terms, a computer program may be just a few instructions or extend to many

millions of instructions, as do the programs for word processors and web browsers for

example. A typical modern computer can execute billions of instructions per second

(gigaflops) and rarely makes a mistake over many years of operation. Large computer

programs consisting of several million instructions may take teams of programmers years to

write, and due to the complexity of the task almost certainly contain errors.

Stored program architecture

Main articles: Computer program and Computer programming

A 1970s punched card containing one line from a FORTRAN program. The card reads: "Z(1)

= Y + W(1)" and is labelled "PROJ039" for identification purposes.

This section applies to most common RAM machine-based computers.

In most cases, computer instructions are simple: add one number to another, move some data

from one location to another, send a message to some external device, etc. These instructions

are read from the computer's memory and are generally carried out (executed) in the order

they were given. However, there are usually specialized instructions to tell the computer to

jump ahead or backwards to some other place in the program and to carry on executing from

there. These are called "jump" instructions (or branches). Furthermore, jump instructions may

be made to happen conditionally so that different sequences of instructions may be used

depending on the result of some previous calculation or some external event. Many computers

directly support subroutines by providing a type of jump that "remembers" the location it

jumped from and another instruction to return to the instruction following that jump

instruction.

Program execution might be likened to reading a book. While a person will normally read

each word and line in sequence, they may at times jump back to an earlier place in the text or

skip sections that are not of interest. Similarly, a computer may sometimes go back and repeat

the instructions in some section of the program over and over again until some internal

Page 63: Computer Architecture All lecture.pdf

condition is met. This is called the flow of control within the program and it is what allows

the computer to perform tasks repeatedly without human intervention.

Comparatively, a person using a pocket calculator can perform a basic arithmetic operation

such as adding two numbers with just a few button presses. But to add together all of the

numbers from 1 to 1,000 would take thousands of button presses and a lot of time—with a

near certainty of making a mistake. On the other hand, a computer may be programmed to do

this with just a few simple instructions. For example:

mov #0, sum ; set sum to 0

mov #1, num ; set num to 1

loop: add num, sum ; add num to sum

add #1, num ; add 1 to num

cmp num, #1000 ; compare num to 1000

ble loop ; if num <= 1000, go back to 'loop'

halt ; end of program. stop running

Once told to run this program, the computer will perform the repetitive addition task without

further human intervention. It will almost never make a mistake and a modern PC can

complete the task in about a millionth of a second.[21]

Bugs

Errors in computer programs are called "bugs". Bugs may be benign and not affect the

usefulness of the program, or have only subtle effects. But in some cases they may cause the

program to "hang"—become unresponsive to input such as mouse clicks or keystrokes, or to

completely fail or "crash". Otherwise benign bugs may sometimes be harnessed for malicious

intent by an unscrupulous user writing an "exploit"—code designed to take advantage of a

bug and disrupt a computer's proper execution. Bugs are usually not the fault of the computer.

Since computers merely execute the instructions they are given, bugs are nearly always the

result of programmer error or an oversight made in the program's design.[22]

Machine code

In most computers, individual instructions are stored as machine code with each instruction

being given a unique number (its operation code or opcode for short). The command to add

two numbers together would have one opcode, the command to multiply them would have a

different opcode and so on. The simplest computers are able to perform any of a handful of

different instructions; the more complex computers have several hundred to choose from—

each with a unique numerical code. Since the computer's memory is able to store numbers, it

can also store the instruction codes. This leads to the important fact that entire programs

(which are just lists of these instructions) can be represented as lists of numbers and can

themselves be manipulated inside the computer in the same way as numeric data. The

fundamental concept of storing programs in the computer's memory alongside the data they

operate on is the crux of the von Neumann, or stored program, architecture. In some cases, a

computer might store some or all of its program in memory that is kept separate from the data

it operates on. This is called the Harvard architecture after the Harvard Mark I computer.

Modern von Neumann computers display some traits of the Harvard architecture in their

designs, such as in CPU caches.

Page 64: Computer Architecture All lecture.pdf

While it is possible to write computer programs as long lists of numbers (machine language)

and while this technique was used with many early computers,[23]

it is extremely tedious and

potentially error-prone to do so in practice, especially for complicated programs. Instead, each

basic instruction can be given a short name that is indicative of its function and easy to

remember—a mnemonic such as ADD, SUB, MULT or JUMP. These mnemonics are

collectively known as a computer's assembly language. Converting programs written in

assembly language into something the computer can actually understand (machine language)

is usually done by a computer program called an assembler. Machine languages and the

assembly languages that represent them (collectively termed low-level programming

languages) tend to be unique to a particular type of computer. For instance, an ARM

architecture computer (such as may be found in a PDA or a hand-held videogame) cannot

understand the machine language of an Intel Pentium or the AMD Athlon 64 computer that

might be in a PC.[24]

Higher-level languages and program design

Though considerably easier than in machine language, writing long programs in assembly

language is often difficult and is also error prone. Therefore, most practical programs are

written in more abstract high-level programming languages that are able to express the needs

of the programmer more conveniently (and thereby help reduce programmer error). High level

languages are usually "compiled" into machine language (or sometimes into assembly

language and then into machine language) using another computer program called a

compiler.[25]

High level languages are less related to the workings of the target computer than

assembly language, and more related to the language and structure of the problem(s) to be

solved by the final program. It is therefore often possible to use different compilers to

translate the same high level language program into the machine language of many different

types of computer. This is part of the means by which software like video games may be

made available for different computer architectures such as personal computers and various

video game consoles.

The task of developing large software systems presents a significant intellectual challenge.

Producing software with an acceptably high reliability within a predictable schedule and

budget has historically been difficult; the academic and professional discipline of software

engineering concentrates specifically on this challenge.

Function

Main articles: Central processing unit and Microprocessor

A general purpose computer has four main components: the arithmetic logic unit (ALU), the

control unit, the memory, and the input and output devices (collectively termed I/O). These

parts are interconnected by busses, often made of groups of wires.

Inside each of these parts are thousands to trillions of small electrical circuits which can be

turned off or on by means of an electronic switch. Each circuit represents a bit (binary digit)

of information so that when the circuit is on it represents a "1", and when off it represents a

"0" (in positive logic representation). The circuits are arranged in logic gates so that one or

more of the circuits may control the state of one or more of the other circuits.

Page 65: Computer Architecture All lecture.pdf

The control unit, ALU, registers, and basic I/O (and often other hardware closely linked with

these) are collectively known as a central processing unit (CPU). Early CPUs were composed

of many separate components but since the mid-1970s CPUs have typically been constructed

on a single integrated circuit called a microprocessor.

Control unit

Main articles: CPU design and Control unit

Diagram showing how a particular MIPS architecture instruction would be decoded by the

control system.

The control unit (often called a control system or central controller) manages the computer's

various components; it reads and interprets (decodes) the program instructions, transforming

them into a series of control signals which activate other parts of the computer.[26]

Control

systems in advanced computers may change the order of some instructions so as to improve

performance.

A key component common to all CPUs is the program counter, a special memory cell (a

register) that keeps track of which location in memory the next instruction is to be read

from.[27]

The control system's function is as follows—note that this is a simplified description, and

some of these steps may be performed concurrently or in a different order depending on the

type of CPU:

1. Read the code for the next instruction from the cell indicated by the program counter.

2. Decode the numerical code for the instruction into a set of commands or signals for

each of the other systems.

3. Increment the program counter so it points to the next instruction.

4. Read whatever data the instruction requires from cells in memory (or perhaps from an

input device). The location of this required data is typically stored within the

instruction code.

5. Provide the necessary data to an ALU or register.

6. If the instruction requires an ALU or specialized hardware to complete, instruct the

hardware to perform the requested operation.

7. Write the result from the ALU back to a memory location or to a register or perhaps an

output device.

8. Jump back to step (1).

Since the program counter is (conceptually) just another set of memory cells, it can be

changed by calculations done in the ALU. Adding 100 to the program counter would cause

the next instruction to be read from a place 100 locations further down the program.

Instructions that modify the program counter are often known as "jumps" and allow for loops

Page 66: Computer Architecture All lecture.pdf

(instructions that are repeated by the computer) and often conditional instruction execution

(both examples of control flow).

It is noticeable that the sequence of operations that the control unit goes through to process an

instruction is in itself like a short computer program—and indeed, in some more complex

CPU designs, there is another yet smaller computer called a microsequencer that runs a

microcode program that causes all of these events to happen.

Arithmetic/logic unit (ALU)

Main article: Arithmetic logic unit

The ALU is capable of performing two classes of operations: arithmetic and logic.[28]

The set of arithmetic operations that a particular ALU supports may be limited to adding and

subtracting or might include multiplying or dividing, trigonometry functions (sine, cosine,

etc.) and square roots. Some can only operate on whole numbers (integers) whilst others use

floating point to represent real numbers—albeit with limited precision. However, any

computer that is capable of performing just the simplest operations can be programmed to

break down the more complex operations into simple steps that it can perform. Therefore, any

computer can be programmed to perform any arithmetic operation—although it will take

more time to do so if its ALU does not directly support the operation. An ALU may also

compare numbers and return boolean truth values (true or false) depending on whether one is

equal to, greater than or less than the other ("is 64 greater than 65?").

Logic operations involve Boolean logic: AND, OR, XOR and NOT. These can be useful both

for creating complicated conditional statements and processing boolean logic.

Superscalar computers may contain multiple ALUs so that they can process several

instructions at the same time.[29]

Graphics processors and computers with SIMD and MIMD

features often provide ALUs that can perform arithmetic on vectors and matrices.

Memory

Main article: Computer data storage

Magnetic core memory was the computer memory of choice throughout the 1960s, until it

was replaced by semiconductor memory.

A computer's memory can be viewed as a list of cells into which numbers can be placed or

read. Each cell has a numbered "address" and can store a single number. The computer can be

Page 67: Computer Architecture All lecture.pdf

instructed to "put the number 123 into the cell numbered 1357" or to "add the number that is

in cell 1357 to the number that is in cell 2468 and put the answer into cell 1595". The

information stored in memory may represent practically anything. Letters, numbers, even

computer instructions can be placed into memory with equal ease. Since the CPU does not

differentiate between different types of information, it is the software's responsibility to give

significance to what the memory sees as nothing but a series of numbers.

In almost all modern computers, each memory cell is set up to store binary numbers in groups

of eight bits (called a byte). Each byte is able to represent 256 different numbers (2^8 = 256);

either from 0 to 255 or −128 to +127. To store larger numbers, several consecutive bytes may

be used (typically, two, four or eight). When negative numbers are required, they are usually

stored in two's complement notation. Other arrangements are possible, but are usually not

seen outside of specialized applications or historical contexts. A computer can store any kind

of information in memory if it can be represented numerically. Modern computers have

billions or even trillions of bytes of memory.

The CPU contains a special set of memory cells called registers that can be read and written to

much more rapidly than the main memory area. There are typically between two and one

hundred registers depending on the type of CPU. Registers are used for the most frequently

needed data items to avoid having to access main memory every time data is needed. As data

is constantly being worked on, reducing the need to access main memory (which is often slow

compared to the ALU and control units) greatly increases the computer's speed.

Computer main memory comes in two principal varieties: random-access memory or RAM

and read-only memory or ROM. RAM can be read and written to anytime the CPU

commands it, but ROM is pre-loaded with data and software that never changes, so the CPU

can only read from it. ROM is typically used to store the computer's initial start-up

instructions. In general, the contents of RAM are erased when the power to the computer is

turned off, but ROM retains its data indefinitely. In a PC, the ROM contains a specialized

program called the BIOS that orchestrates loading the computer's operating system from the

hard disk drive into RAM whenever the computer is turned on or reset. In embedded

computers, which frequently do not have disk drives, all of the required software may be

stored in ROM. Software stored in ROM is often called firmware, because it is notionally

more like hardware than software. Flash memory blurs the distinction between ROM and

RAM, as it retains its data when turned off but is also rewritable. It is typically much slower

than conventional ROM and RAM however, so its use is restricted to applications where high

speed is unnecessary.[30]

In more sophisticated computers there may be one or more RAM cache memories which are

slower than registers but faster than main memory. Generally computers with this sort of

cache are designed to move frequently needed data into the cache automatically, often without

the need for any intervention on the programmer's part.

Input/output (I/O)

Main article: Input/output

Page 68: Computer Architecture All lecture.pdf

Hard disk drives are common storage devices used with computers.

I/O is the means by which a computer exchanges information with the outside world.[31]

Devices that provide input or output to the computer are called peripherals.[32]

On a typical

personal computer, peripherals include input devices like the keyboard and mouse, and output

devices such as the display and printer. Hard disk drives, floppy disk drives and optical disc

drives serve as both input and output devices. Computer networking is another form of I/O.

Often, I/O devices are complex computers in their own right with their own CPU and

memory. A graphics processing unit might contain fifty or more tiny computers that perform

the calculations necessary to display 3D graphics[citation needed]

. Modern desktop computers

contain many smaller computers that assist the main CPU in performing I/O.

Multitasking

Main article: Computer multitasking

While a computer may be viewed as running one gigantic program stored in its main memory,

in some systems it is necessary to give the appearance of running several programs

simultaneously. This is achieved by multitasking i.e. having the computer switch rapidly

between running each program in turn.[33]

One means by which this is done is with a special signal called an interrupt which can

periodically cause the computer to stop executing instructions where it was and do something

else instead. By remembering where it was executing prior to the interrupt, the computer can

return to that task later. If several programs are running "at the same time", then the interrupt

generator might be causing several hundred interrupts per second, causing a program switch

each time. Since modern computers typically execute instructions several orders of magnitude

faster than human perception, it may appear that many programs are running at the same time

even though only one is ever executing in any given instant. This method of multitasking is

sometimes termed "time-sharing" since each program is allocated a "slice" of time in turn.[34]

Before the era of cheap computers, the principal use for multitasking was to allow many

people to share the same computer.

Seemingly, multitasking would cause a computer that is switching between several programs

to run more slowly — in direct proportion to the number of programs it is running. However,

most programs spend much of their time waiting for slow input/output devices to complete

their tasks. If a program is waiting for the user to click on the mouse or press a key on the

keyboard, then it will not take a "time slice" until the event it is waiting for has occurred. This

Page 69: Computer Architecture All lecture.pdf

frees up time for other programs to execute so that many programs may be run at the same

time without unacceptable speed loss.

Multiprocessing

Main article: Multiprocessing

Cray designed many supercomputers that used multiprocessing heavily.

Some computers are designed to distribute their work across several CPUs in a

multiprocessing configuration, a technique once employed only in large and powerful

machines such as supercomputers, mainframe computers and servers. Multiprocessor and

multi-core (multiple CPUs on a single integrated circuit) personal and laptop computers are

now widely available, and are being increasingly used in lower-end markets as a result.

Supercomputers in particular often have highly unique architectures that differ significantly

from the basic stored-program architecture and from general purpose computers.[35]

They

often feature thousands of CPUs, customized high-speed interconnects, and specialized

computing hardware. Such designs tend to be useful only for specialized tasks due to the large

scale of program organization required to successfully utilize most of the available resources

at once. Supercomputers usually see usage in large-scale simulation, graphics rendering, and

cryptography applications, as well as with other so-called "embarrassingly parallel" tasks.

Networking and the Internet

Main articles: Computer networking and Internet

Visualization of a portion of the routes on the Internet.

Page 70: Computer Architecture All lecture.pdf

Computers have been used to coordinate information between multiple locations since the

1950s. The U.S. military's SAGE system was the first large-scale example of such a system,

which led to a number of special-purpose commercial systems like Sabre.[36]

In the 1970s, computer engineers at research institutions throughout the United States began

to link their computers together using telecommunications technology. This effort was funded

by ARPA (now DARPA), and the computer network that it produced was called the

ARPANET.[37]

The technologies that made the Arpanet possible spread and evolved.

In time, the network spread beyond academic and military institutions and became known as

the Internet. The emergence of networking involved a redefinition of the nature and

boundaries of the computer. Computer operating systems and applications were modified to

include the ability to define and access the resources of other computers on the network, such

as peripheral devices, stored information, and the like, as extensions of the resources of an

individual computer. Initially these facilities were available primarily to people working in

high-tech environments, but in the 1990s the spread of applications like e-mail and the World

Wide Web, combined with the development of cheap, fast networking technologies like

Ethernet and ADSL saw computer networking become almost ubiquitous. In fact, the number

of computers that are networked is growing phenomenally. A very large proportion of

personal computers regularly connect to the Internet to communicate and receive information.

"Wireless" networking, often utilizing mobile phone networks, has meant networking is

becoming increasingly ubiquitous even in mobile computing environments.

Misconceptions

A computer does not need to be electric, nor even have a processor, nor RAM, nor even hard

disk. The minimal definition of a computer is anything that transforms information in a

purposeful way.

Required technology

Main article: Unconventional computing

Computational systems as flexible as a personal computer can be built out of almost anything.

For example, a computer can be made out of billiard balls (billiard ball computer); this is an

unintuitive and pedagogical example that a computer can be made out of almost anything.

More realistically, modern computers are made out of transistors made of photolithographed

semiconductors.

Historically, computers evolved from mechanical computers and eventually from vacuum

tubes to transistors.

There is active research to make computers out of many promising new types of technology,

such as optical computing, DNA computers, neural computers, and quantum computers. Some

of these can easily tackle problems that modern computers cannot (such as how quantum

computers can break some modern encryption algorithms by quantum factoring).

Computer architecture paradigms

Page 71: Computer Architecture All lecture.pdf

Some different paradigms of how to build a computer from the ground-up:

RAM machines

These are the types of computers with a CPU, computer memory, etc., which

understand basic instructions in a machine language. The concept evolved from the

Turing machine.

Brains

Brains are massively parallel processors made of neurons, wired in intricate patterns,

that communicate via electricity and neurotransmitter chemicals.

Programming languages

Such as the lambda calculus, or modern programming languages, are virtual

computers built on top of other computers.

Cellular automata

For example, the game of Life can create "gliders" and "loops" and other constructs

that transmit information; this paradigm can be applied to DNA computing, chemical

computing, etc.

Groups and committees

The linking of multiple computers (brains) is itself a computer

Logic gates are a common abstraction which can apply to most of the above digital or analog

paradigms.

The ability to store and execute lists of instructions called programs makes computers

extremely versatile, distinguishing them from calculators. The Church–Turing thesis is a

mathematical statement of this versatility: any computer with a minimum capability (being

Turing-complete) is, in principle, capable of performing the same tasks that any other

computer can perform. Therefore any type of computer (netbook, supercomputer, cellular

automaton, etc.) is able to perform the same computational tasks, given enough time and

storage capacity.

Limited-function computers

Conversely, a computer which is limited in function (one that is not "Turing-complete")

cannot simulate arbitrary things. For example, simple four-function calculators cannot

simulate a real computer without human intervention. As a more complicated example,

without the ability to program a gaming console, it can never accomplish what a

programmable calculator from the 1990s could (given enough time); the system as a whole is

not Turing-complete, even though it contains a Turing-complete component (the

microprocessor). Living organisms (the body, not the brain) are also limited-function

computers designed to make copies of themselves; they cannot be reprogrammed without

genetic engineering.

Virtual computers

A "computer" is commonly considered to be a physical device. However, one can create a

computer program which describes how to run a different computer, i.e. "simulating a

computer in a computer". Not only is this a constructive proof of the Church-Turing thesis,

but is also extremely common in all modern computers. For example, some programming

languages use something called an interpreter, which is a simulated computer built on top of

the basic computer; this allows programmers to write code (computer input) in a different

Page 72: Computer Architecture All lecture.pdf

language than the one understood by the base computer (the alternative is to use a compiler).

Additionally, virtual machines are simulated computers which virtually replicate a physical

computer in software, and are very commonly used by IT. Virtual machines are also a

common technique used to create emulators, such game console emulators.

Further topics

Glossary of computers

Artificial intelligence

A computer will solve problems in exactly the way they are programmed to, without regard to

efficiency nor alternative solutions nor possible shortcuts nor possible errors in the code.

Computer programs which learn and adapt are part of the emerging field of artificial

intelligence and machine learning.

Hardware

The term hardware covers all of those parts of a computer that are tangible objects. Circuits,

displays, power supplies, cables, keyboards, printers and mice are all hardware.

History of computing hardware

First Generation

(Mechanical/Electromechanical)

Calculators

Antikythera mechanism,

Difference engine, Norden

bombsight

Programmable Devices Jacquard loom, Analytical

engine, Harvard Mark I, Z3

Second Generation (Vacuum Tubes)

Calculators

Atanasoff–Berry Computer,

IBM 604, UNIVAC 60,

UNIVAC 120

Programmable Devices

Colossus, ENIAC,

Manchester Small-Scale

Experimental Machine,

EDSAC, Manchester Mark 1,

Ferranti Pegasus, Ferranti

Mercury, CSIRAC, EDVAC,

UNIVAC I, IBM 701, IBM

702, IBM 650, Z22

Third Generation (Discrete

transistors and SSI, MSI, LSI

Integrated circuits)

Mainframes

IBM 7090, IBM 7080, IBM

System/360, BUNCH

Minicomputer

PDP-8, PDP-11, IBM

System/32, IBM System/36

Fourth Generation (VLSI integrated

circuits)

Minicomputer VAX, IBM System i

4-bit microcomputer Intel 4004, Intel 4040

8-bit microcomputer

Intel 8008, Intel 8080,

Motorola 6800, Motorola

6809, MOS Technology 6502,

Page 73: Computer Architecture All lecture.pdf

Zilog Z80

16-bit microcomputer Intel 8088, Zilog Z8000,

WDC 65816/65802

32-bit microcomputer

Intel 80386, Pentium,

Motorola 68000, ARM

architecture

64-bit microcomputer[38]

Alpha, MIPS, PA-RISC,

PowerPC, SPARC, x86-64

Embedded computer Intel 8048, Intel 8051

Personal computer

Desktop computer, Home

computer, Laptop computer,

Personal digital assistant

(PDA), Portable computer,

Tablet PC, Wearable

computer

Theoretical/experimental

Quantum computer,

Chemical computer,

DNA computing, Optical

computer, Spintronics

based computer

Other Hardware Topics

Peripheral device

(Input/output)

Input Mouse, Keyboard, Joystick, Image scanner,

Webcam, Graphics tablet, Microphone

Output Monitor, Printer, Loudspeaker

Both Floppy disk drive, Hard disk drive, Optical

disc drive, Teleprinter

Computer busses

Short range RS-232, SCSI, PCI, USB

Long range (Computer

networking) Ethernet, ATM, FDDI

Software

Main article: Computer software

Software refers to parts of the computer which do not have a material form, such as

programs, data, protocols, etc. When software is stored in hardware that cannot easily be

modified (such as BIOS ROM in an IBM PC compatible), it is sometimes called "firmware"

to indicate that it falls into an uncertain area somewhere between hardware and software.

Computer software

Operating

system

Unix and BSD UNIX System V, IBM AIX, HP-UX, Solaris (SunOS),

IRIX, List of BSD operating systems

GNU/Linux List of Linux distributions, Comparison of Linux

distributions

Microsoft

Windows

Windows 95, Windows 98, Windows NT, Windows 2000,

Windows XP, Windows Vista, Windows 7

Page 74: Computer Architecture All lecture.pdf

DOS 86-DOS (QDOS), PC-DOS, MS-DOS, DR-DOS, FreeDOS

Mac OS Mac OS classic, Mac OS X

Embedded and

real-time List of embedded operating systems

Experimental Amoeba, Oberon/Bluebottle, Plan 9 from Bell Labs

Library

Multimedia DirectX, OpenGL, OpenAL

Programming

library C standard library, Standard Template Library

Data

Protocol TCP/IP, Kermit, FTP, HTTP, SMTP

File format HTML, XML, JPEG, MPEG, PNG

User

interface

Graphical user

interface (WIMP)

Microsoft Windows, GNOME, KDE, QNX Photon, CDE,

GEM, Aqua

Text-based user

interface

Command-line interface, Text user interface

Application

Office suite

Word processing, Desktop publishing, Presentation

program, Database management system, Scheduling &

Time management, Spreadsheet, Accounting software

Internet Access Browser, E-mail client, Web server, Mail transfer agent,

Instant messaging

Design and

manufacturing

Computer-aided design, Computer-aided manufacturing,

Plant management, Robotic manufacturing, Supply chain

management

Graphics

Raster graphics editor, Vector graphics editor, 3D modeler,

Animation editor, 3D computer graphics, Video editing,

Image processing

Audio

Digital audio editor, Audio playback, Mixing, Audio

synthesis, Computer music

Software

engineering

Compiler, Assembler, Interpreter, Debugger, Text editor,

Integrated development environment, Software performance

analysis, Revision control, Software configuration

management

Educational Edutainment, Educational game, Serious game, Flight

simulator

Games

Strategy, Arcade, Puzzle, Simulation, First-person shooter,

Platform, Massively multiplayer, Interactive fiction

Misc Artificial intelligence, Antivirus software, Malware scanner,

Installer/Package management systems, File manager

Programming languages

Main article: Programming language

Programming languages provide various ways of specifying programs for computers to run.

Unlike natural languages, programming languages are designed to permit no ambiguity and to

be concise. They are purely written languages and are often difficult to read aloud. They are

generally either translated into machine code by a compiler or an assembler before being run,

Page 75: Computer Architecture All lecture.pdf

or translated directly at run time by an interpreter. Sometimes programs are executed by a

hybrid method of the two techniques. There are thousands of different programming

languages—some intended to be general purpose, others useful only for highly specialized

applications.

Programming languages

Lists of programming

languages

Timeline of programming languages, List of programming

languages by category, Generational list of programming languages,

List of programming languages, Non-English-based programming

languages

Commonly used

Assembly languages ARM, MIPS, x86

Commonly used high-

level programming

languages

Ada, BASIC, C, C++, C#, COBOL, Fortran, Java, Lisp, Pascal,

Object Pascal

Commonly used

Scripting languages Bourne script, JavaScript, Python, Ruby, PHP, Perl

Professions and organizations

As the use of computers has spread throughout society, there are an increasing number of

careers involving computers.

Computer-related professions

Hardware-

related

Electrical engineering, Electronic engineering, Computer engineering,

Telecommunications engineering, Optical engineering, Nanoengineering

Software-

related

Computer science, Desktop publishing, Human–computer interaction,

Information technology, Information systems, Computational science, Software

engineering, Video game industry, Web design

The need for computers to work well together and to be able to exchange information has

spawned the need for many standards organizations, clubs and societies of both a formal and

informal nature.

Organizations

Standards groups ANSI, IEC, IEEE, IETF, ISO, W3C

Professional Societies ACM, AIS, IET, IFIP, BCS

Free/Open source software

groups

Free Software Foundation, Mozilla Foundation, Apache

Software Foundation

See also

Information technology portal

Computability theory

Page 76: Computer Architecture All lecture.pdf

Computer security

Computer insecurity

List of computer term etymologies

List of fictional computers

Pulse computation

Notes

1. ^ In 1946, ENIAC required an estimated 174 kW. By comparison, a modern laptop computer

may use around 30 W; nearly six thousand times less. "Approximate Desktop & Notebook

Power Usage". University of Pennsylvania.

http://www.upenn.edu/computing/provider/docs/hardware/powerusage.html. Retrieved 2009-

06-20.

2. ^ Early computers such as Colossus and ENIAC were able to process between 5 and 100

operations per second. A modern "commodity" microprocessor (as of 2007) can process

billions of operations per second, and many of these operations are more complicated and

useful than early computer operations. "Intel Core2 Duo Mobile Processor: Features". Intel

Corporation. http://www.intel.com/cd/channel/reseller/asmo-

na/eng/products/mobile/processors/core2duo_m/feature/index.htm. Retrieved 2009-06-20.

3. ^ computer, n.. Oxford English Dictionary (2 ed.). Oxford University Press. 1989.

http://dictionary.oed.com/. Retrieved 2009-04-10

4. ^ "Discovering How Greeks Computed in 100 B.C.". The New York Times. 31 July 2008.

http://www.nytimes.com/2008/07/31/science/31computer.html?hp. Retrieved 27 March 2010.

5. ^ "Heron of Alexandria". http://www.mlahanas.de/Greeks/HeronAlexandria2.htm. Retrieved

2008-01-15.

6. ^ a b Ancient Discoveries, Episode 11: Ancient Robots. History Channel.

http://www.youtube.com/watch?v=rxjbaQl0ad8. Retrieved 2008-09-06

7. ^ Howard R. Turner (1997), Science in Medieval Islam: An Illustrated Introduction, p. 184,

University of Texas Press, ISBN 0-292-78149-0

8. ^ Donald Routledge Hill, "Mechanical Engineering in the Medieval Near East", Scientific

American, May 1991, pp. 64–9 (cf. Donald Routledge Hill, Mechanical Engineering)

9. ^ From cave paintings to the internet HistoryofScience.com

10. ^ See: Anthony Hyman, ed., Science and Reform: Selected Works of Charles Babbage

(Cambridge, England: Cambridge University Press, 1989), page 298. It is in the collection of

the Science Museum in London, England. (Delve (2007), page 99.)

11. ^ The analytical engine should not be confused with Babbage's difference engine which was a

non-programmable mechanical calculator.

12. ^ "Columbia University Computing History: Herman Hollerith". Columbia.edu.

http://www.columbia.edu/acis/history/hollerith.html. Retrieved 2010-12-11.

13. ^ a b "Alan Turing – Time 100 People of the Century". Time Magazine.

http://205.188.238.181/time/time100/scientist/profile/turing.html. Retrieved 2009-06-13. "The

fact remains that everyone who taps at a keyboard, opening a spreadsheet or a word-

processing program, is working on an incarnation of a Turing machine"

14. ^ "Atanasoff-Berry Computer".

http://energysciencenews.com/phpBB3/viewtopic.php?f=1&t=98&p=264#p264. Retrieved

2010-11-20.

15. ^ "Spiegel: The inventor of the computer's biography was published". Spiegel.de. 2009-09-28.

http://www.spiegel.de/netzwelt/gadgets/0,1518,651776,00.html. Retrieved 2010-12-11.

16. ^ "Inventor Profile: George R. Stibitz". National Inventors Hall of Fame Foundation, Inc..

http://www.invent.org/hall_of_fame/140.html.

17. ^ Rojas, R. (1998). "How to make Zuse's Z3 a universal computer". IEEE Annals of the

History of Computing 20 (3): 51–54. doi:10.1109/85.707574.

18. ^ B. Jack Copeland, ed., Colossus: The Secrets of Bletchley Park's Codebreaking Computers,

Oxford University Press, 2006

Page 77: Computer Architecture All lecture.pdf

19. ^ ""Robot Mathematician Knows All The Answers", October 1944, Popular Science".

Books.google.com.

http://books.google.com/books?id=PyEDAAAAMBAJ&pg=PA86&dq=motor+gun+boat&hl=

en&ei=LxTqTMfGI4-

bnwfEyNiWDQ&sa=X&oi=book_result&ct=result&resnum=6&ved=0CEIQ6AEwBQ#v=one

page&q=motor%20gun%20boat&f=true. Retrieved 2010-12-11.

20. ^ Lavington 1998, p. 37

21. ^ This program was written similarly to those for the PDP-11 minicomputer and shows some

typical things a computer can do. All the text after the semicolons are comments for the

benefit of human readers. These have no significance to the computer and are ignored. (Digital

Equipment Corporation 1972)

22. ^ It is not universally true that bugs are solely due to programmer oversight. Computer

hardware may fail or may itself have a fundamental problem that produces unexpected results

in certain situations. For instance, the Pentium FDIV bug caused some Intel microprocessors

in the early 1990s to produce inaccurate results for certain floating point division operations.

This was caused by a flaw in the microprocessor design and resulted in a partial recall of the

affected devices.

23. ^ Even some later computers were commonly programmed directly in machine code. Some

minicomputers like the DEC PDP-8 could be programmed directly from a panel of switches.

However, this method was usually used only as part of the booting process. Most modern

computers boot entirely automatically by reading a boot program from some non-volatile

memory.

24. ^ However, there is sometimes some form of machine language compatibility between

different computers. An x86-64 compatible microprocessor like the AMD Athlon 64 is able to

run most of the same programs that an Intel Core 2 microprocessor can, as well as programs

designed for earlier microprocessors like the Intel Pentiums and Intel 80486. This contrasts

with very early commercial computers, which were often one-of-a-kind and totally

incompatible with other computers.

25. ^ High level languages are also often interpreted rather than compiled. Interpreted languages

are translated into machine code on the fly, while running, by another program called an

interpreter.

26. ^ The control unit's role in interpreting instructions has varied somewhat in the past. Although

the control unit is solely responsible for instruction interpretation in most modern computers,

this is not always the case. Many computers include some instructions that may only be

partially interpreted by the control system and partially interpreted by another device. This is

especially the case with specialized computing hardware that may be partially self-contained.

For example, EDVAC, one of the earliest stored-program computers, used a central control

unit that only interpreted four instructions. All of the arithmetic-related instructions were

passed on to its arithmetic unit and further decoded there.

27. ^ Instructions often occupy more than one memory address, so the program counters usually

increases by the number of memory locations required to store one instruction.

28. ^ David J. Eck (2000). The Most Complex Machine: A Survey of Computers and Computing.

A K Peters, Ltd.. p. 54. ISBN 9781568811284.

29. ^ Erricos John Kontoghiorghes (2006). Handbook of Parallel Computing and Statistics. CRC

Press. p. 45. ISBN 9780824740672.

30. ^ Flash memory also may only be rewritten a limited number of times before wearing out,

making it less useful for heavy random access usage. (Verma & Mielke 1988)

31. ^ Donald Eadie (1968). Introduction to the Basic Computer. Prentice-Hall. p. 12.

32. ^ Arpad Barna; Dan I. Porat (1976). Introduction to Microcomputers and the

Microprocessors. Wiley. p. 85. ISBN 9780471050513.

33. ^ Jerry Peek; Grace Todino, John Strang (2002). Learning the UNIX Operating System: A

Concise Guide for the New User. O'Reilly. p. 130. ISBN 9780596002619.

34. ^ Gillian M. Davis (2002). Noise Reduction in Speech Applications. CRC Press. p. 111.

ISBN 9780849309496.

Page 78: Computer Architecture All lecture.pdf

35. ^ However, it is also very common to construct supercomputers out of many pieces of cheap

commodity hardware; usually individual computers connected by networks. These so-called

computer clusters can often provide supercomputer performance at a much lower cost than

customized designs. While custom architectures are still used for most of the most powerful

supercomputers, there has been a proliferation of cluster computers in recent years. (TOP500

2006)

36. ^ Agatha C. Hughes (2000). Systems, Experts, and Computers. MIT Press. p. 161.

ISBN 9780262082853. "The experience of SAGE helped make possible the first truly large-

scale commercial real-time network: the SABRE computerized airline reservations system..."

37. ^ "A Brief History of the Internet". Internet Society.

http://www.isoc.org/internet/history/brief.shtml. Retrieved 2008-09-20.

38. ^ Most major 64-bit instruction set architectures are extensions of earlier designs. All of the

architectures listed in this table, except for Alpha, existed in 32-bit forms before their 64-bit

incarnations were introduced.

References

a Kempf, Karl (1961). Historical Monograph: Electronic Computers Within the

Ordnance Corps. Aberdeen Proving Ground (United States Army). http://ed-

thelen.org/comp-hist/U-S-Ord-61.html.

a Phillips, Tony (2000). "The Antikythera Mechanism I". American Mathematical

Society. http://www.math.sunysb.edu/~tony/whatsnew/column/antikytheraI-

0400/kyth1.html. Retrieved 2006-04-05.

a Shannon, Claude Elwood (1940). A symbolic analysis of relay and switching

circuits. Massachusetts Institute of Technology. http://hdl.handle.net/1721.1/11173.

Digital Equipment Corporation (1972) (PDF). PDP-11/40 Processor Handbook.

Maynard, MA: Digital Equipment Corporation.

http://bitsavers.vt100.net/dec/www.computer.museum.uq.edu.au_mirror/D-09-

30_PDP11-40_Processor_Handbook.pdf.

Verma, G.; Mielke, N. (1988). Reliability performance of ETOX based flash

memories. IEEE International Reliability Physics Symposium.

Meuer, Hans; Strohmaier, Erich; Simon, Horst; Dongarra, Jack (2006-11-13).

"Architectures Share Over Time". TOP500.

http://www.top500.org/lists/2006/11/overtime/Architectures. Retrieved 2006-11-27.

Lavington, Simon (1998). A History of Manchester Computers (2 ed.). Swindon: The

British Computer Society. ISBN 0902505018

Stokes, Jon (2007). Inside the Machine: An Illustrated Introduction to

Microprocessors and Computer Architecture. San Francisco: No Starch Press.

ISBN 978-1-59327-104-6.


Page 85: Computer Architecture All lecture.pdf

CPU design

From Wikipedia, the free encyclopedia


CPU design is the design engineering task of creating a central processing unit (CPU), a

component of computer hardware. It is a subfield of electronics engineering and computer

engineering.

Contents

1 Overview

2 Goals

3 Performance analysis and benchmarking

4 Markets

o 4.1 General purpose computing

4.1.1 High-end processor economics

o 4.2 Scientific computing

o 4.3 Embedded design

4.3.1 Embedded processor economics

4.3.2 Research and educational CPU design

4.3.3 Soft microprocessor cores

5 Micro-architectural concepts

6 Integrated heat spreader

7 Research Topics

8 References

9 See also

Overview

CPU design focuses on these areas:

1. datapaths (such as ALUs and pipelines)

2. control unit: logic which controls the datapaths

3. Memory components such as register files, caches

4. Clock circuitry such as clock drivers, PLLs, clock distribution networks

5. Pad transceiver circuitry

6. Logic gate cell library which is used to implement the logic

CPUs designed for high-performance markets might require custom designs for each of these

items to achieve frequency, power-dissipation, and chip-area goals.

CPUs designed for lower performance markets might lessen the implementation burden by:

Acquiring some of these items by purchasing them as intellectual property

Using control logic implementation techniques (logic synthesis using CAD tools) to

implement the other components - datapaths, register files, clocks

Page 86: Computer Architecture All lecture.pdf

Common logic styles used in CPU design include:

Unstructured random logic

Finite-state machines

Microprogramming (common from 1965 to 1985)

Programmable logic array (common in the 1980s, no longer common)

Device types used to implement the logic include:

Transistor-transistor logic Small Scale Integration logic chips - no longer used for

CPUs

Programmable Array Logic and Programmable logic devices - no longer used for

CPUs

Emitter-coupled logic (ECL) gate arrays - no longer common

CMOS gate arrays - no longer used for CPUs

CMOS ASICs - what's commonly used today, they're so common that the term ASIC

is not used for CPUs

Field-programmable gate arrays (FPGA) - common for soft microprocessors, and

more or less required for reconfigurable computing

A CPU design project generally has these major tasks:

Programmer-visible instruction set architecture, which can be implemented by a

variety of microarchitectures

Architectural study and performance modeling in ANSI C/C++ or SystemC

High-level synthesis (HLS) or RTL (e.g. logic) implementation

RTL Verification

Circuit design of speed critical components (caches, registers, ALUs)

Logic synthesis or logic-gate-level design

Timing analysis to confirm that all logic and circuits will run at the specified operating

frequency

Physical design including floorplanning, place and route of logic gates

Checking that RTL, gate-level, transistor-level and physical-level representations are

equivalent

Checks for signal integrity, chip manufacturability

As with most complex electronic designs, the logic verification effort (proving that the design

does not have bugs) now dominates the project schedule of a CPU.

Key CPU architectural innovations include index register, cache, virtual memory, instruction

pipelining, superscalar, CISC, RISC, virtual machine, emulators, microprogram, and stack.

Goals

The first CPUs were designed to do mathematical calculations faster and more reliably than

human computers.[1]

Each successive generation of CPU might be designed to achieve some of these goals:

higher performance levels of a single program or thread

Page 87: Computer Architecture All lecture.pdf

higher throughput levels of multiple programs/threads

less power consumption for the same performance level

lower cost for the same performance level

greater connectivity to build larger, more parallel systems

more specialization to aid in specific targeted markets

Re-designing a CPU core to a smaller die-area helps achieve several of these goals.

Shrinking everything (a "photomask shrink"), resulting in the same number of

transistors on a smaller die, improves performance (smaller transistors switch faster),

reduces power (smaller wires have less parasitic capacitance) and reduces cost (more

CPUs fit on the same wafer of silicon).

Releasing a CPU on the same size die, but with a smaller CPU core, keeps the cost

about the same but allows higher levels of integration within one VLSI chip

(additional cache, multiple CPUs, or other components), improving performance and

reducing overall system cost.

Performance analysis and benchmarking

Main article: Computer performance

Because there are too many programs to test a CPU's speed on all of them, benchmarks were

developed. The most famous benchmarks are the SPECint and SPECfp benchmarks

developed by Standard Performance Evaluation Corporation and the ConsumerMark

benchmark developed by the Embedded Microprocessor Benchmark Consortium EEMBC.

Some important measurements include:

Instructions per second - Most consumers pick a computer architecture (normally Intel

IA32 architecture) to be able to run a large base of pre-existing pre-compiled software.

Being relatively uninformed on computer benchmarks, some of them pick a particular

CPU based on operating frequency (see Megahertz Myth).

FLOPS - The number of floating point operations per second is often important in

selecting computers for scientific computations.

Performance per watt - System designers building parallel computers, such as Google,

pick CPUs based on their speed per watt of power, because the cost of powering the

CPU outweighs the cost of the CPU itself. [1][2]

Some system designers building parallel computers pick CPUs based on the speed per

dollar.

System designers building real-time computing systems want to guarantee worst-case

response. That is easier to do when the CPU has low interrupt latency and when it has

deterministic response. (DSP)

Computer programmers who program directly in assembly language want a CPU to

support a full featured instruction set.

Low power - For systems with limited power sources (e.g. solar, batteries, human

power).

Small size or low weight - for portable embedded systems, systems for spacecraft.

Environmental impact - Minimizing environmental impact of computers during

manufacturing and recycling, as well as during use. Reducing waste, reducing hazardous

materials. (see Green computing).

Page 88: Computer Architecture All lecture.pdf

Some of these measures conflict. In particular, many design techniques that make a CPU run

faster make the "performance per watt", "performance per dollar", and "deterministic

response" much worse, and vice versa.

Markets

There are several different markets in which CPUs are used. Since each of these markets differs in its requirements for CPUs, the devices designed for one market are in most cases inappropriate for the others.

General purpose computing

The vast majority of revenue generated from CPU sales is for general-purpose computing; that is, desktop, laptop and server computers commonly used in

businesses and homes. In this market, the Intel IA-32 architecture dominates, with its rivals

PowerPC and SPARC maintaining much smaller customer bases. Yearly, hundreds of

millions of IA-32 architecture CPUs are used by this market.

Since these devices are used to run countless different types of programs, these CPU designs

are not specifically targeted at one type of application or one function. The demands of being

able to run a wide range of programs efficiently have made these CPU designs among the more technically advanced, with the accompanying disadvantages of relatively high cost and high power consumption.

High-end processor economics

In 1984, most high-performance CPUs required four to five years to develop.[2]


Developing new, high-end CPUs is a very costly proposition. Both the logical complexity

(needing very large logic design and logic verification teams and simulation farms with

perhaps thousands of computers) and the high operating frequencies (needing large circuit

design teams and access to the state-of-the-art fabrication process) account for the high cost of

design for this type of chip. The design cost of a high-end CPU will be on the order of US

$100 million. Since the design of such high-end chips nominally takes about five years to

complete, to stay competitive a company has to fund at least two of these large design teams

to release products at the rate of 2.5 years per product generation.

As an example, the typical loaded cost for one computer engineer is often quoted as US $250,000 per year. This includes salary, benefits, CAD tools, computers, office space rent, etc. Assuming that 100 engineers are needed to design a CPU and that the project takes 4 years:

Total cost = $250,000 per engineer-year × 100 engineers × 4 years = $100,000,000.

Page 89: Computer Architecture All lecture.pdf

The above amount is just an example. The design teams for modern day general purpose

CPUs have several hundred team members.

Scientific computing

Main article: Supercomputer

A much smaller niche market (in revenue and units shipped) is scientific computing, used in

government research labs and universities. Previously much CPU design was done for this

market, but the cost-effectiveness of using mass-market CPUs has curtailed almost all

specialized designs for this market. The main remaining area of active hardware design and

research for scientific computing is for high-speed system interconnects.

Embedded design

As measured by units shipped, most CPUs are embedded in other machinery, such as

telephones, clocks, appliances, vehicles, and infrastructure. Embedded processors sell in volumes of many billions of units per year, though mostly at much lower price points than general-purpose processors.

These single-function devices differ from the more familiar general-purpose CPUs in several

ways:

Low cost is of utmost importance.

It is important to maintain a low power dissipation as embedded devices often have a

limited battery life and it is often impractical to include cooling fans.

To give lower system cost, peripherals are integrated with the processor on the same

silicon chip.

Keeping peripherals on-chip also reduces power consumption as external GPIO ports

typically require buffering so that they can source or sink the relatively high current

loads that are required to maintain a strong signal outside of the chip.

o Many embedded applications have a limited amount of physical space for

circuitry; keeping peripherals on-chip will reduce the space required for the

circuit board.

o The program and data memories are often integrated on the same chip. When

the only allowed program memory is ROM, the device is known as a

microcontroller.

For many embedded applications, interrupt latency will be more critical than in some

general-purpose processors.

Embedded processor economics

As of 2009, more CPUs are produced using the ARM architecture instruction set than any

other 32-bit instruction set. The ARM architecture and the first ARM chip were designed in

about one and a half years and 5 man years of work time.[3]

The 32-bit Parallax Propeller microcontroller architecture and the first chip were designed by

two people in about 10 man years of work time.[4]

Page 90: Computer Architecture All lecture.pdf

It is believed that the 8-bit AVR architecture and the first AVR microcontroller were conceived and designed by two students at the Norwegian Institute of Technology.

The 8-bit 6502 architecture and the first MOS Technology 6502 chip were designed in 13

months by a group of about 9 people.[5]

Research and educational CPU design

The 32 bit Berkeley RISC I and RISC II architecture and the first chips were mostly designed

by a series of students as part of a four quarter sequence of graduate courses.[6]

This design

became the basis of the commercial SPARC processor design.

For about a decade, every student taking the 6.004 class at MIT was part of a team—each

team had one semester to design and build a simple 8 bit CPU out of 7400 series integrated

circuits. One team of 4 students designed and built a simple 32 bit CPU during that semester. [7]

Some undergraduate courses require a team of 2 to 5 students to design, implement, and test a

simple CPU in a FPGA in a single 15 week semester. [8]

Soft microprocessor cores

For embedded systems, the highest performance levels are often not needed or desired due to

the power consumption requirements. This allows for the use of processors which can be

totally implemented by logic synthesis techniques. These synthesized processors can be

implemented in a much shorter amount of time, giving quicker time-to-market.

Main article: Soft microprocessor

Micro-architectural concepts

Main article: Microarchitecture

Integrated heat spreader

The integrated heat spreader (IHS) is usually made of copper covered with a nickel plating.

Research Topics

Main article: History of general purpose CPUs#1990 to today: looking forward

A variety of new CPU design ideas have been proposed, including reconfigurable logic,

clockless CPUs, and optical computing.

References

Page 91: Computer Architecture All lecture.pdf


1. ^ Brian Randell: The Origins of Digital Computers. Berlin: Springer 1973. ISBN 0-387-06169

2. ^ "New system manages hundreds of transactions per second" article by Robert Horst and

Sandra Metz, of Tandem Computers Inc., "Electronics" magazine, 1984 April 19: "While most

high-performance CPUs require four to five years to develop, The NonStop TXP processor

took just 2+1/2 years -- six months to develop a complete written specification, one year to

construct a working prototype, and another year to reach volume production."

3. ^ "ARM's way" 1998

4. ^ "Why the Propeller Works" by Chip Gracey

5. ^ "Interview with William Mensch"

6. ^ 'Design and Implementation of RISC I' - original journal article by C.E. Sequin and

D.A.Patterson

7. ^ "the VHS"

8. ^ "Teaching Computer Design with FPGAs" by Jan Gray

Notes

Hwang, Enoch (2006). Digital Logic and Microprocessor Design with VHDL.

Thomson. ISBN 0-534-46593-5. http://faculty.lasierra.edu/~ehwang/digitaldesign.

Processor Design: An Introduction - Detailed introduction to microprocessor design.

Somewhat incomplete and outdated, but still worthwhile.

See also

Central processing unit

History of general purpose CPUs

Microprocessor

Microarchitecture

Moore's law

Amdahl's law

System-on-a-chip

Reduced instruction set computer

Complex instruction set computer

Minimal instruction set computer

Electronic design automation

High-level synthesis


Retrieved from "http://en.wikipedia.org/wiki/CPU_design"

Categories: Central processing unit | Computer engineering

Hidden categories: All articles with unsourced statements | Articles with unsourced statements

from May 2010 | Wikipedia articles needing cleanup from December 2009 | All articles

needing cleanup | All articles with specifically marked weasel-worded phrases | Articles with

specifically marked weasel-worded phrases from March 2009 | Articles lacking in-text

citations from March 2009 | All articles lacking in-text citations

Personal tools

Log in / create account

Namespaces

Article

Discussion

Variants

Views

Read

Edit

View history

Actions

Search

Page 93: Computer Architecture All lecture.pdf

Navigation

Main page

Contents

Featured content

Current events

Random article

Donate to Wikipedia

Interaction

Help

About Wikipedia

Community portal

Recent changes

Contact Wikipedia

Toolbox

What links here

Related changes

Upload file

Special pages

Permanent link

Cite this page

Print/export

Create a book

Download as PDF

Printable version

Languages

ية عرب ال

Česky

Deutsch

Français

Nederlands

日本語

Polski

Português

Русский

Türkçe

This page was last modified on 8 February 2011 at 15:26.

Page 94: Computer Architecture All lecture.pdf

Text is available under the Creative Commons Attribution-ShareAlike License;

additional terms may apply. See Terms of Use for details.

Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit

organization.

Contact us

Privacy policy

About Wikipedia

Disclaimers

Page 95: Computer Architecture All lecture.pdf

Von Neumann architecture

From Wikipedia, the free encyclopedia



Schematic of the von Neumann architecture. The Control Unit and Arithmetic Logic Unit

form the main components of the Central Processing Unit (CPU)

The von Neumann architecture is a design model for a stored-program digital computer that

uses a central processing unit (CPU) and a single separate storage structure ("memory") to

hold both instructions and data. It is named after the mathematician and early computer

scientist John von Neumann. Such computers implement a universal Turing machine and have

a sequential architecture.

A stored-program digital computer is one that keeps its programmed instructions, as well as

its data, in read-write, random-access memory (RAM). Stored-program computers were an

advancement over the program-controlled computers of the 1940s, such as the Colossus and

the ENIAC, which were programmed by setting switches and inserting patch leads to route

data and to control signals between various functional units. In the vast majority of modern

computers, the same memory is used for both data and program instructions. The mechanisms

for transferring the data and instructions between the CPU and memory are, however,

considerably more complex than the original von Neumann architecture.

The terms "von Neumann architecture" and "stored-program computer" are generally used

interchangeably, and that usage is followed in this article.

Contents

1 Description

2 Development of the stored-program concept

3 Von Neumann bottleneck

Page 96: Computer Architecture All lecture.pdf

4 Early von Neumann-architecture computers

5 Early stored-program computers

6 Non-von Neumann processors

7 See also

8 References

o 8.1 Inline

o 8.2 General

9 External links

Description

The earliest computing machines had fixed programs. Some very simple computers still use

this design, either for simplicity or training purposes. For example, a desk calculator (in

principle) is a fixed program computer. It can do basic mathematics, but it cannot be used as a

word processor or a gaming console. Changing the program of a fixed-program machine

requires re-wiring, re-structuring, or re-designing the machine. The earliest computers were

not so much "programmed" as they were "designed". "Reprogramming", when it was possible

at all, was a laborious process, starting with flowcharts and paper notes, followed by detailed

engineering designs, and then the often-arduous process of physically re-wiring and re-

building the machine. It could take three weeks to set up a program on ENIAC and get it

working.[1]

The idea of the stored-program computer changed all that: a computer that by design includes

an instruction set and can store in memory a set of instructions (a program) that details the

computation.

A stored-program design also lets programs modify themselves while running. One early

motivation for such a facility was the need for a program to increment or otherwise modify

the address portion of instructions, which had to be done manually in early designs. This

became less important when index registers and indirect addressing became usual features of

machine architecture. Self-modifying code has largely fallen out of favor, since it is usually

hard to understand and debug, as well as being inefficient under modern processor pipelining

and caching schemes.

On a large scale, the ability to treat instructions as data is what makes assemblers, compilers

and other automated programming tools possible. One can "write programs which write

programs".[2]

On a smaller scale, I/O-intensive machine instructions such as the BITBLT

primitive used to modify images on a bitmap display, were once thought to be impossible to

implement without custom hardware. It was shown later that these instructions could be

implemented efficiently by "on the fly compilation" ("just-in-time compilation") technology,

e.g., code-generating programs—one form of self-modifying code that has remained popular.
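As a toy illustration of "writing programs which write programs" (this sketch is not from the article; the helper name make_adder is invented), the following Python fragment builds program text at run time and then executes it, a very small-scale version of what assemblers and just-in-time compilers do when they treat instructions as data.

def make_adder(n):
    # the program text is built as ordinary data...
    source = "def adder(x):\n    return x + %d\n" % n
    namespace = {}
    exec(source, namespace)      # ...and then turned back into executable code
    return namespace["adder"]

add5 = make_adder(5)
print(add5(3))                   # prints 8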

There are drawbacks to the von Neumann design. Aside from the von Neumann bottleneck

described below, program modifications can be quite harmful, either by accident or design. In

some simple stored-program computer designs, a malfunctioning program can damage itself,

other programs, or the operating system, possibly leading to a computer crash. Memory

protection and other forms of access control can usually protect against both accidental and

malicious program modification.

Page 97: Computer Architecture All lecture.pdf

Development of the stored-program concept

The mathematician Alan Turing, who had been alerted to a problem of mathematical logic by

the lectures of Max Newman at the University of Cambridge, wrote a paper in 1936 entitled

On Computable Numbers, with an Application to the Entscheidungsproblem, which was

published in the Proceedings of the London Mathematical Society.[3]

In it he described a

hypothetical machine which he called a "universal computing machine", and which is now

known as the "universal Turing machine". The hypothetical machine had an infinite store

(memory in today's terminology) that contained both instructions and data. The German

engineer Konrad Zuse independently wrote about this concept in 1936.[4]

John von Neumann

became acquainted with Turing when he was a visiting professor at Cambridge in 1935 and

also during the year that Turing spent at Princeton University in 1936-37. Whether he knew of

Turing's 1936 paper at that time is not clear.

Independently, J. Presper Eckert and John Mauchly, who were developing the ENIAC at the

Moore School of Electrical Engineering, at the University of Pennsylvania, wrote about the

stored-program concept in December 1943.[5][6]

In planning a new machine, EDVAC, Eckert

wrote in January 1944 that they would store data and programs in a new addressable memory

device, a mercury metal delay line memory. This was the first time that the construction of a practical stored-program machine was proposed. At that time, they were not aware of Turing's work.

Von Neumann was involved in the Manhattan Project at the Los Alamos National Laboratory,

which required huge amounts of calculation. This drew him to the ENIAC project, in the

summer of 1944. There he joined into the ongoing discussions on the design of this stored-

program computer, the EDVAC. As part of that group, he volunteered to write up a

description of it. The term "von Neumann architecture" arose from von Neumann's paper

First Draft of a Report on the EDVAC dated 30 June 1945, which included ideas from Eckert

and Mauchly. It was unfinished when his colleague Herman Goldstine circulated it with only

von Neumann's name on it, to the consternation of Eckert and Mauchly.[7]

The paper was read

by dozens of von Neumann's colleagues in America and Europe, and influenced the next

round of computer designs.

Von Neumann was, then, not alone in putting forward the idea of the stored-program

architecture, and Jack Copeland considers that it is "historically inappropriate, to refer to

electronic stored-program digital computers as 'von Neumann machines'".[8]

His Los Alamos

colleague Stan Frankel said of his regard for Turing's ideas:

I know that in or about 1943 or '44 von Neumann was well aware of the

fundamental importance of Turing's paper of 1936 ... Von Neumann introduced

me to that paper and at his urging I studied it with care. Many people have

acclaimed von Neumann as the "father of the computer" (in a modern sense of

the term) but I am sure that he would never have made that mistake himself.

He might well be called the midwife, perhaps, but he firmly emphasized to me,

and to others I am sure, that the fundamental conception is owing to Turing—

in so far as not anticipated by Babbage ... Both Turing and von Neumann, of

course, also made substantial contributions to the "reduction to practice" of

these concepts but I would not regard these as comparable in importance with

the introduction and explication of the concept of a computer able to store in

its memory its program of activities and of modifying that program in the

course of these activities. [9]

Page 98: Computer Architecture All lecture.pdf

Later, Turing produced a detailed technical report Proposed Electronic Calculator describing

the Automatic Computing Engine (ACE).[10]

He presented this to the Executive Committee of

the British National Physical Laboratory on 19 February 1946. Although Turing knew from

his wartime experience at Bletchley Park that what he proposed was feasible, the secrecy that

was maintained about Colossus for several decades prevented him from saying so. Various

successful implementations of the ACE design were produced.

Both von Neumann's and Turing's papers described stored program-computers, but von

Neumann's earlier paper achieved greater circulation and the computer architecture it outlined

became known as the "von Neumann architecture". In the 1953 book Faster than Thought

(edited by B.V. Bowden), a section in the chapter on Computers in America reads as

follows:[11]

THE MACHINE OF THE INSTITUTE FOR ADVANCED STUDIES,

PRINCETON

In 1945, Professor J. von Neumann, who was then working at the Moore

School of Engineering in Philadelphia, where the E.N.I.A.C. had been built,

issued on behalf of a group of his co-workers a report on the logical design of

digital computers. The report contained a fairly detailed proposal for the

design of the machine which has since become known as the E.D.V.A.C.

(electronic discrete variable automatic computer). This machine has only

recently been completed in America, but the von Neumann report inspired the

construction of the E.D.S.A.C. (electronic delay-storage automatic calculator)

in Cambridge (see page 130).

In 1947, Burks, Goldstine and von Neumann published another report which

outlined the design of another type of machine (a parallel machine this time)

which should be exceedingly fast, capable perhaps of 20,000 operations per

second. They pointed out that the outstanding problem in constructing such a

machine was in the development of a suitable memory, all the contents of

which were instantaneously accessible, and at first they suggested the use of a

special tube—called the Selectron, which had been invented by the Princeton

Laboratories of the R.C.A. These tubes were expensive and difficult to make, so

von Neumann subsequently decided to build a machine based on the Williams

memory. This machine, which was completed in June, 1952 in Princeton has

become popularly known as the Maniac. The design of this machine has

inspired that of half a dozen or more machines which are now being built in

America, all of which are known affectionately as "Johniacs."'

In the same book, the first two paragraphs of a chapter on ACE read as follows:[12]

AUTOMATIC COMPUTATION AT THE NATIONAL PHYSICAL

LABORATORY'

One of the most modern digital computers which embodies developments and

improvements in the technique of automatic electronic computing was recently

demonstrated at the National Physical Laboratory, Teddington, where it has

been designed and built by a small team of mathematicians and electronics

research engineers on the staff of the Laboratory, assisted by a number of

Page 99: Computer Architecture All lecture.pdf

production engineers from the English Electric Company, Limited. The

equipment so far erected at the Laboratory is only the pilot model of a much

larger installation which will be known as the Automatic Computing Engine,

but although comparatively small in bulk and containing only about 800

thermionic valves, as can be judged from Plates XII, XIII and XIV, it is an

extremely rapid and versatile calculating machine.

The basic concepts and abstract principles of computation by a machine were

formulated by Dr. A. M. Turing, F.R.S., in a paper read before the London

Mathematical Society in 1936, but work on such machines in Britain was

delayed by the war. In 1945, however, an examination of the problems was

made at the National Physical Laboratory by Mr. J. R. Womersley, then

superintendent of the Mathematics Division of the Laboratory. He was joined

by Dr. Turing and a small staff of specialists, and, by 1947, the preliminary

planning was sufficiently advanced to warrant the establishment of the special

group already mentioned. In April, 1948, the latter became the Electronics

Section of the Laboratory, under the charge of Mr. F. M. Colebrook.

Von Neumann bottleneck

The separation between the CPU and memory leads to the von Neumann bottleneck, the

limited throughput (data transfer rate) between the CPU and memory compared to the amount

of memory. In most modern computers, throughput is much smaller than the rate at which the

CPU can work. This seriously limits the effective processing speed when the CPU is required

to perform minimal processing on large amounts of data. The CPU is continuously forced to

wait for needed data to be transferred to or from memory. Since CPU speed and memory size

have increased much faster than the throughput between them, the bottleneck has become

more of a problem, a problem whose severity increases with every newer generation of CPU.

The term "von Neumann bottleneck" was coined by John Backus in his 1977 ACM Turing

Award lecture. According to Backus:

Surely there must be a less primitive way of making big changes in the store

than by pushing vast numbers of words back and forth through the von

Neumann bottleneck. Not only is this tube a literal bottleneck for the data

traffic of a problem, but, more importantly, it is an intellectual bottleneck that

has kept us tied to word-at-a-time thinking instead of encouraging us to think

in terms of the larger conceptual units of the task at hand. Thus programming

is basically planning and detailing the enormous traffic of words through the

von Neumann bottleneck, and much of that traffic concerns not significant data

itself, but where to find it.[13]

The performance problem can be alleviated (to some extent) by several mechanisms.

Providing a cache between the CPU and the main memory, providing separate caches with

separate access paths for data and instructions (the so-called Harvard architecture), and using

branch predictor algorithms and logic are three of the ways performance is increased. The

problem can also be sidestepped somewhat by using parallel computing, using for example

the NUMA architecture—this approach is commonly employed by supercomputers. It is less

clear whether the intellectual bottleneck that Backus criticized has changed much since 1977.

Backus's proposed solution has not had a major influence.

Modern functional

Page 100: Computer Architecture All lecture.pdf

programming and object-oriented programming are much less geared towards "pushing vast

numbers of words back and forth" than earlier languages like Fortran were, but internally, that

is still what computers spend much of their time doing, even highly parallel supercomputers.
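A back-of-the-envelope sketch in Python, with made-up numbers, shows why the bottleneck matters: if every instruction must pull an instruction word and an operand across the CPU-memory interface, the memory path, not the CPU, sets the ceiling on the instruction rate.

# All three constants are assumptions chosen only to illustrate the arithmetic.
MEM_BANDWIDTH = 25e9          # bytes/second across the CPU-memory interface
BYTES_PER_INSTRUCTION = 16    # instruction word + one operand fetched from memory
CPU_RATE = 4e9                # instructions/second the CPU itself could execute

memory_limited_rate = MEM_BANDWIDTH / BYTES_PER_INSTRUCTION
print("memory-limited rate:", memory_limited_rate)   # ~1.56e9 instructions/second
print("CPU-limited rate:   ", CPU_RATE)
# The smaller of the two numbers wins. Caches and separate instruction/data paths
# reduce the bytes that must actually cross the interface per instruction, which is
# how the mechanisms described above relieve the bottleneck.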

Early von Neumann-architecture computers

The First Draft described a design that was used by many universities and corporations to

construct their computers.[14]

Among these various computers, only ILLIAC and ORDVAC

had compatible instruction sets.

ORDVAC (U-Illinois) at Aberdeen Proving Ground, Maryland (completed Nov

1951[15]

)

IAS machine at Princeton University (Jan 1952)

MANIAC I at Los Alamos Scientific Laboratory (Mar 1952)

ILLIAC at the University of Illinois, (Sept 1952)

AVIDAC at Argonne National Laboratory (1953)

ORACLE at Oak Ridge National Laboratory (Jun 1953)

JOHNNIAC at RAND Corporation (Jan 1954)

BESK in Stockholm (1953)

BESM-1 in Moscow (1952)

DASK in Denmark (1955)

PERM in Munich (1956?)

SILLIAC in Sydney (1956)

WEIZAC in Rehovoth (1955)

Early stored-program computers

The date information in the following chronology is difficult to put into proper order. Some

dates are for first running a test program, some dates are the first time the computer was

demonstrated or completed, and some dates are for the first delivery or installation.

The IBM SSEC, publicly demonstrated on January 27, 1948, was a stored-program computer, but it was partially electromechanical and thus not fully electronic.

The Manchester SSEM (the Baby) was the first fully electronic computer to run a

stored program. It ran a factoring program for 52 minutes on June 21, 1948, after

running a simple division program and a program to show that two numbers were

relatively prime.

The ENIAC was modified to run as a primitive read-only stored-program computer

(using the Function Tables for program ROM) and was demonstrated as such on

September 16, 1948, running a program by Adele Goldstine for von Neumann.

The BINAC ran some test programs in February, March, and April 1949, although it

wasn't completed until September 1949.

The Manchester Mark 1 developed from the SSEM project. An intermediate version of

the Mark 1 was available to run programs in April 1949, but it wasn't completed until

October 1949.

The EDSAC ran its first program on May 6, 1949.

The EDVAC was delivered in August 1949, but it had problems that kept it from

being put into regular operation until 1951.

Page 101: Computer Architecture All lecture.pdf

The CSIR Mk I ran its first program in November 1949.

The SEAC was demonstrated in April 1950.

The Pilot ACE ran its first program on May 10, 1950 and was demonstrated in

December 1950.

The SWAC was completed in July 1950.

The Whirlwind was completed in December 1950 and was in actual use in April 1951.

The first ERA Atlas (later the commercial ERA 1101/UNIVAC 1101) was installed in

December 1950.

Non-von Neumann processors

The NEC µPD7281D pixel processor was the first non-von Neumann microprocessor.

Perhaps the most common kind of non-von Neumann structure used in modern computers is

content-addressable memory (CAM).

In some cases, emerging memristor technology may be able to circumvent the von Neumann

bottleneck.[16]

See also

Harvard architecture

Modified Harvard architecture

Turing machine

Random access machine

Little man computer

CARDboard Illustrative Aid to Computation

Von Neumann syndrome

Interconnect bottleneck

References

Inline

1. ^ Copeland (2006) p. 104.

2. ^ MFTL (My Favorite Toy Language) entry Jargon File 4.4.7,

http://catb.org/~esr/jargon/html/M/MFTL.html, retrieved 2008-07-11

3. ^ Turing, A.M. (1936), "On Computable Numbers, with an Application to the

Entscheidungsproblem", Proceedings of the London Mathematical Society, 2 42: 230–65,

1937, doi:10.1112/plms/s2-42.1.230 (and Turing, A.M. (1938), "On Computable Numbers,

with an Application to the Entscheidungsproblem: A correction", Proceedings of the London

Mathematical Society, 2 43: 544–6, 1937, doi:10.1112/plms/s2-43.6.544)

4. ^ The Life and Work of Konrad Zuse Part 10: Konrad Zuse and the Stored Program

Computer, archived from the original on June 1, 2008,

Page 102: Computer Architecture All lecture.pdf

http://web.archive.org/web/20080601160645/http://www.epemag.com/zuse/part10.htm,

retrieved 2008-07-11

5. ^ Lukoff, Herman (1979), From Dits to Bits...: A Personal History of the Electronic

Computer, Robotics Press, ISBN 978-0-89661-002-6

6. ^ ENIAC project administrator Grist Brainerd's December 1943 progress report for the first

period of the ENIAC's development implicitly proposed the stored program concept (while

simultaneously rejecting its implementation in the ENIAC) by stating that "in order to have

the simplest project and not to complicate matters" the ENIAC would be constructed without

any "automatic regulation".

7. ^ Copeland (2006) p. 113

8. ^ Copeland, Jack (2000), A Brief History of Computing: ENIAC and EDVAC,

http://www.alanturing.net/turing_archive/pages/Reference%20Articles/BriefHistofComp.html

#ACE, retrieved 27 January 2010

9. ^ Copeland, Jack (2000), A Brief History of Computing: ENIAC and EDVAC,

http://www.alanturing.net/turing_archive/pages/Reference%20Articles/BriefHistofComp.html

#ACE, retrieved 27 January 2010 which cites Randell, B. (1972), Meltzer, B.; Michie, D.,

eds., "On Alan Turing and the Origins of Digital Computers", Machine Intelligence 7

(Edinburgh: Edinburgh University Press): 10, ISBN 0902383264

10. ^ Copeland (2006) pp. 108-111

11. ^ Bowden (1953) pp. 176,177

12. ^ Bowden (1953) p. 135

13. ^ E. W. Dijkstra Archive: A review of the 1977 Turing Award Lecture,

http://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD692.html, retrieved 2008-07-

11

14. ^ Electronic Computer Project, http://www.ias.edu/spfeatures/john_von_neumann/electronic-

computer-project/[dead link]

15. ^ Illiac Design Techniques, report number UIUCDCS-R-1955-146, Digital Computer

Laboratory, University of Illinois at Urbana-Champaign, 1955

16. ^ Mouttet, Blaise L (2009), "Memristor Pattern Recognition Circuit Architecture for

Robotics", Proceedings of the 2nd International Multi-Conference on Engineering and

Technological Innovation II: 65–70,

http://www.iiis.org/CDs2008/CD2009SCI/CITSA2009/PapersPdf/I086AI.pdf

General

Bowden, B.V., ed. (1953), "Computers in America", Faster Than Thought: A

Symposium on Digital Computing Machines, London: Sir Isaac Pitman and Sons Ltd.

Rojas, Raúl; Hashagen, Ulf, eds. (2000), The First Computers: History and

Architectures, MIT Press, ISBN 0-262-18197-5

Davis, Martin (2000), The universal computer: the road from Leibniz to Turing, New

York: W W Norton & Company Inc., ISBN 0-393-04785-7

Can Programming be Liberated from the von Neumann Style?, John Backus, 1977

ACM Turing Award Lecture. Communications of the ACM, August 1978, Volume

21, Number 8. Online PDF

C. Gordon Bell and Allen Newell (1971), Computer Structures: Readings and

Examples, McGraw-Hill Book Company, New York. Massive (668 pages).

Copeland, Jack (2006), "Colossus and the Rise of the Modern Computer", in

Copeland, B. Jack, Colossus: The Secrets of Bletchley Park's Codebreaking

Computers, Oxford: Oxford University Press, ISBN 978-0-19-284055-4.

Page 103: Computer Architecture All lecture.pdf

External links

Harvard vs von Neumann

A tool that emulates the behavior of a von Neumann machine


Retrieved from "http://en.wikipedia.org/wiki/Von_Neumann_architecture"

Categories: Computer architecture | Flynn's Taxonomy | Reference models | Classes of

computers

Hidden categories: All articles with dead external links | Articles with dead external links

from July 2010 | Articles to be merged from October 2010 | All articles to be merged | All

articles with unsourced statements | Articles with unsourced statements from December 2010 |

Articles with unsourced statements from April 2010

Personal tools

Log in / create account

Namespaces

Article

Discussion

Variants

Page 104: Computer Architecture All lecture.pdf

Views

Read

Edit

View history

Actions

Search

Navigation

Main page

Contents

Featured content

Current events

Random article

Donate to Wikipedia

Interaction

Help

About Wikipedia

Community portal

Recent changes

Contact Wikipedia

Toolbox

What links here

Related changes

Upload file

Special pages

Permanent link

Cite this page

Print/export

Create a book

Download as PDF

Printable version

Languages

ية عرب ال

Asturianu

Беларуская

Bosanski

Page 105: Computer Architecture All lecture.pdf

Български

Català

Česky

Deutsch

Ελληνικά

Español

سی ار ف

Français

한국어

Hrvatski

Bahasa Indonesia

Íslenska

Italiano

עברית

Latina

Latviešu

Magyar

Nederlands

日本語

Norsk ( ok l)

Polski

Português

Ro ână

Русский

Shqip

Slovenčina

Српски / Srpski

Srpskohrvatski / Српскохрватски

Suomi

Svenska

ไทย Türkçe

Українська

中文

This page was last modified on 13 February 2011 at 21:43.

Text is available under the Creative Commons Attribution-ShareAlike License;

additional terms may apply. See Terms of Use for details.

Wikipedia® is a registered trade ark of the Wikimedia Foundation, Inc., a non-profit

organization.

Contact us

Privacy policy

About Wikipedia

Disclaimers

Page 106: Computer Architecture All lecture.pdf
Page 107: Computer Architecture All lecture.pdf

Lecture 3

1. Computer components. Hardware and Software programming.

2. The main cycle of instruction processing (MCIP).

Literature.

1. Stallings W. Computer Organization and Architecture. Designing and performance, 5th ed. – Upper Saddle River, NJ : Prentice Hall, 2002.

2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer organization, 4th ed. – McGRAW-HILL INTERNATIONAL EDITIONS, 1996.

3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. – Upper Saddle River, NJ : Prentice Hall, 2002.

Page 108: Computer Architecture All lecture.pdf

Programming in Hardware (Hardwired Program): Data → Customized Hardware (a sequence of arithmetic and logic functions) → Results.

Programming in Software: Instruction Codes → Instruction Interpreter → Control Signals → General-Purpose Arithmetic and Logic Functions; Data → General-Purpose Arithmetic and Logic Functions → Results.

Page 109: Computer Architecture All lecture.pdf

Program Concept

Hardwired systems are inflexible

General purpose hardware can do different tasks, given correct control signals

Instead of re-wiring, supply a new set of control signals

What is a program?

A sequence of steps

For each step, an arithmetic or logical operation is done

For each operation, a different set of control signals is needed

Function of Control Unit

For each operation a unique code is provided

e.g. ADD, MOVE

A hardware segment accepts the code and issues the control signals

We have a computer!
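A minimal Python sketch of this idea (the opcode names and signal names are invented for illustration, not the lecture's instruction set): a table maps each unique operation code to the set of control signals that the hardware segment would issue.

# Toy model of a control unit: one unique code per operation, and a fixed
# sequence of control signals issued for that code.
CONTROL_SIGNALS = {
    "ADD":  ["read_operand", "alu_add", "write_result"],
    "MOVE": ["read_operand", "write_result"],
}

def control_unit(opcode):
    """Accept an operation code and issue its control-signal sequence."""
    for signal in CONTROL_SIGNALS[opcode]:
        print("assert", signal)

control_unit("ADD")    # the same general-purpose hardware does a different task
control_unit("MOVE")   # simply because it is given a different set of control signals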

Page 110: Computer Architecture All lecture.pdf

Components

The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit

Data and instructions need to get into the system and results out

Input/output

Temporary storage of code and results is needed

Main memory

The CPU is typically in control. It exchanges data with memory. For this purpose, it typically

makes use of two internal (to the CPU) registers: a memory address register (MAR), which specifies the

address in memory for the next read or write, and a memory buffer register (MBR), which contains the

data to be written into the memory or receives the data read from the memory. Similarly, an I/O address

register (I/OAR) specifies a particular I/O device. An I/O buffer register (I/OBR) is used for the exchange of data between an I/O module and the CPU.

A memory module consists of a set of locations, defined by sequentially numbered addresses. Each location contains a binary number that can be interpreted as either an instruction or data. An I/O module transfers data from external devices to the CPU and memory, and vice versa. It contains internal buffers for

temporarily holding this data until it can be sent on.
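The registers named above can be summarised in a small sketch (a Python dataclass is used purely as an illustration; the abbreviations are the ones from the text, everything else is an assumption):

from dataclasses import dataclass

@dataclass
class CPURegisters:
    MAR: int = 0    # memory address register: address for the next read or write
    MBR: int = 0    # memory buffer register: word just read from, or to be written to, memory
    IOAR: int = 0   # I/O address register: selects a particular I/O device
    IOBR: int = 0   # I/O buffer register: data exchanged between an I/O module and the CPU

regs = CPURegisters()
regs.MAR = 300      # e.g. point the next memory access at location 300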

Page 111: Computer Architecture All lecture.pdf

Computer Components:

Top Level View

Page 112: Computer Architecture All lecture.pdf

Instruction Cycle

Two steps: Fetch and Execute.

The instruction fetch consists of reading an instruction from a location in

the memory.

The instruction execution may involve several operations and depends on

the nature of the instruction.

Page 113: Computer Architecture All lecture.pdf

Integer Format (16 bits): bit 0 = S (sign), bits 1–15 = Magnitude.

Instruction Format (16 bits): bits 0–3 = OpCode, bits 4–15 = Address.

Internal CPU Registers:
Program Counter (PC) = Address of Instruction
Instruction Register (IR) = Instruction Being Executed
Accumulator (AC) = Temporary Storage

Partial List of OpCodes:
0001 = Load AC from Memory
0010 = Store AC to Memory
0101 = Add to AC from Memory

Characteristics of the Hypothetical Machine
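A short Python sketch of this format (the helper name decode and the hexadecimal constants are assumptions for illustration): with bit 0 as the most significant bit, the opcode is the top four bits of the 16-bit word and the address is the remaining twelve.

def decode(word):
    opcode = (word >> 12) & 0xF      # bits 0-3
    address = word & 0x0FFF          # bits 4-15
    return opcode, address

OPCODES = {
    0x1: "Load AC from Memory",
    0x2: "Store AC to Memory",
    0x5: "Add to AC from Memory",
}

op, addr = decode(0x1940)            # the first instruction of the example below
print(OPCODES[op], "- address", hex(addr))   # Load AC from Memory - address 0x940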

Page 114: Computer Architecture All lecture.pdf

The instruction code is a group of bits that instruct the computer to perform a specific

operation. It is usually divided into parts, each having its own particular interpretation.

The most basic part of an instruction code is its operation part, which defines such

operations as add, subtract, multiply, shift, complement.

The number of bits required for the operation code of an instruction depends on the total

number of operations available in the computer.

At this point we must recognize the relationship between a computer operation and a micro

operation. An operation is a part of an instruction stored in the computer memory. It is a binary

code that tells the computer to perform a specific operation. The control unit receives the

instruction from memory and interprets the operation code bits. It then issues a sequence of

control signals to initiate micro operations in internal computer registers. For every operation

code, the control issues a sequence of micro operations needed for hardware implementation of

the specified operation. For this reason, an operation code is sometimes called a macrooperation

because it specifies a set of micro operations.

The operation must be performed on some data stored in processor registers or in memory.

An instruction code must therefore specify not only the operation but also registers or the

memory words where the operands are to be found, as well as the register or memory word

where the result is to be stored. So, the second part of the instruction code specifies an address,

which tells the control where to find operand in the memory. This operand is read from the

memory and used as the data to be operated on together with data stored in the processor

register.
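For illustration only (the register-transfer notation below is a sketch, not the machine's actual microprogram), one operation code such as ADD can stand for a whole sequence of micro operations:

# One macrooperation expands into an ordered list of micro operations.
MICRO_OPS = {
    "ADD": [                      # AC <- AC + M(X)
        "MAR <- IR(address)",     # the address part of the instruction selects the operand
        "MBR <- M(MAR)",          # read the operand from memory
        "AC  <- AC + MBR",        # the ALU adds it to the accumulator
    ],
}

for step in MICRO_OPS["ADD"]:
    print(step)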

Page 115: Computer Architecture All lecture.pdf

Fetch Cycle

Program Counter (PC) holds the address of the next instruction to fetch.

The processor fetches the instruction from the memory location pointed to by the PC.

The PC is then incremented (unless told otherwise).

The instruction is loaded into the Instruction Register (IR); the processor interprets the instruction and performs the required actions.

Page 116: Computer Architecture All lecture.pdf

Execute Cycle

Processor–memory: data transfer between CPU and main memory.

Processor–I/O: data transfer between CPU and an I/O module.

Data processing: some arithmetic or logical operation on data.

Control: alteration of the sequence of operations, e.g. a jump.

A combination of the above.

Page 117: Computer Architecture All lecture.pdf

Example of Program Execution

Page 118: Computer Architecture All lecture.pdf

Instruction Cycle – State Diagram (figure)

Page 119: Computer Architecture All lecture.pdf

The following example (originally a sequence of figures) traces the execution of a short program that adds the contents of memory location 940 to the contents of location 941 and stores the result in 941.

Memory (addresses and contents in hexadecimal):
300: 1940 (Load AC from location 940)
301: 5941 (Add contents of location 941 to AC)
302: 2941 (Store AC to location 941)
940: 0003
941: 0002

Registers of the CPU: PC, AC, MAR, MBR, IR.

1. Instruction fetch: MAR ← PC = 300; MBR ← M(300) = 1940; IR ← 1940; PC ← 301.

2. Loading of AC from memory (opcode 1): MAR ← 940; MBR ← M(940) = 0003; AC ← 0003.

3. Instruction fetch: MAR ← PC = 301; MBR ← M(301) = 5941; IR ← 5941; PC ← 302.

4. Add to AC from memory (opcode 5): MAR ← 941; MBR ← M(941) = 0002; AC ← AC + MBR = 0005. To the contents of AC the number read from memory is added: 3 + 2 = 5.

5. Instruction fetch: MAR ← PC = 302; MBR ← M(302) = 2941; IR ← 2941; PC ← 303.

6. Store AC to memory (opcode 2): MAR ← 941; MBR ← AC = 0005; M(941) ← 0005.
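The trace above can be reproduced with a minimal Python simulator of the hypothetical machine (a sketch only: the dictionary-based memory and the loop's stopping rule are assumptions, while the opcodes, addresses and values are the ones from the example).

# Memory of the hypothetical machine, addresses and contents in hexadecimal.
memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
          0x940: 0x0003, 0x941: 0x0002}
PC, AC = 0x300, 0

while PC in memory:
    IR = memory[PC]                       # fetch: MAR <- PC, MBR <- M(MAR), IR <- MBR
    PC += 1                               # increment PC
    opcode, MAR = IR >> 12, IR & 0xFFF    # decode
    if opcode == 0x1:                     # 0001 = Load AC from Memory
        AC = memory[MAR]
    elif opcode == 0x2:                   # 0010 = Store AC to Memory
        memory[MAR] = AC
    elif opcode == 0x5:                   # 0101 = Add to AC from Memory
        AC = AC + memory[MAR]
    else:
        break

print(hex(AC), hex(memory[0x941]))        # 0x5 0x5  ->  3 + 2 = 5, stored back in 941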

Page 124: Computer Architecture All lecture.pdf

The operation of the computer consists of the periodic repetition of the main cycle of instruction processing (MCIP). The following notation is used in the flowchart of the algorithm:

M(X) – the contents of the memory location at address X;

(X:Y) – bits X through Y, with bits numbered from the most significant to the least significant.

Each cycle consists of two phases: the fetch cycle and the execution cycle.

During the fetch cycle, the operation code of the next instruction is loaded into the instruction register IR, and the contents of the address field of the same instruction are loaded into the memory address register MAR. The instruction itself may be taken either from the instruction buffer register IBR or from the memory M. In the latter case, the word read from memory M is first loaded into the memory buffer register MBR, and only then are its individual components transferred to IBR, IR and MAR.

To simplify the electronic circuits that communicate with the memory unit, all read and write operations on memory are performed through a single pair of registers: one holds the address of the location, and the other holds the word (operand) being read from or written to memory.

Once the operation code has been loaded into the instruction register IR, the execution cycle begins. The control circuits decode the operation code and issue the corresponding control signals, which synchronize data transfers and the execution of arithmetic or logical operations by the ALU circuits.

The IAS instruction set comprised 21 instructions, which were grouped as follows:

Data transfer instructions, which move data from a given memory location into one of the two addressable ALU registers (the accumulator or the multiplier/quotient register), or from these registers into a given memory location.

Branch instructions (conditional/unconditional), which change the natural order of execution of the program's instructions.

Arithmetic instructions, which specify the four arithmetic operations (some arithmetic instructions have modifications).

Address-modification instructions, which make it possible to modify a program under program control by replacing the initially set values of the address fields in instructions.

Page 125: Computer Architecture All lecture.pdf


Page 126: Computer Architecture All lecture.pdf

Start

Fetch cycle:
 – Is the next instruction already in IBR?
   Yes (no storage request is required): IR ← IBR(0:7); MAR ← IBR(8:19).
   No: MAR ← PC; MBR ← M(MAR). Is the instruction in the left part of the word required?
     Yes: IBR ← MBR(20:39); IR ← MBR(0:7); MAR ← MBR(8:19).
     No: IR ← MBR(20:27); MAR ← MBR(28:39).
 – PC ← PC + 1.

Decoding of the instruction in IR.

Execution cycle (typical paths):
 – AC ← M(X): MBR ← M(MAR); AC ← MBR.
 – AC ← AC + M(X): MBR ← M(MAR); AC ← AC + MBR.
 – Jump to M(X, 0:19): PC ← MAR.
 – If AC ≥ 0, then go to M(X, 0:19): test AC ≥ 0; if Yes, PC ← MAR.

Simplified Scheme of MCIP in IAS.
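To make the fetch phase above concrete, here is a minimal sketch in C (my illustration, not code from the lecture). It assumes 40-bit IAS words stored right-justified in a 64-bit integer, uses the (X:Y) bit numbering defined above, and shows only the branch that takes the left instruction while saving the right half in IBR.

/* Minimal sketch (not the lecture's code): one pass of the IAS fetch phase.
   Assumptions: 40-bit words right-justified in a uint64_t; bit 0 is the most
   significant bit of the word, matching the (X:Y) notation used above. */
#include <stdint.h>
#include <stdio.h>

static uint64_t M[1024];                       /* main memory, 40-bit words        */

/* bits x..y of a value that is 'width' bits wide */
static uint64_t bits(uint64_t w, int width, int x, int y)
{
    return (w >> (width - 1 - y)) & ((1ULL << (y - x + 1)) - 1);
}

int main(void)
{
    uint64_t PC = 0, IBR = 0, IR = 0, MAR = 0, MBR = 0;
    int ibr_valid = 0;                         /* "is the next instruction in IBR?" */

    M[0] = 0x0101402050ULL;                    /* a sample word with two 20-bit instructions */

    if (!ibr_valid) {                          /* storage request is required       */
        MAR = PC;
        MBR = M[MAR];
        IBR = bits(MBR, 40, 20, 39);           /* save the right instruction in IBR */
        ibr_valid = 1;
        IR  = bits(MBR, 40, 0, 7);             /* opcode of the left instruction    */
        MAR = bits(MBR, 40, 8, 19);            /* its address field                 */
    } else {                                   /* no storage request is required    */
        IR  = bits(IBR, 20, 0, 7);
        MAR = bits(IBR, 20, 8, 19);
        ibr_valid = 0;
    }
    PC = PC + 1;

    printf("IR=%02llX MAR=%03llX IBR=%05llX PC=%llu\n",
           (unsigned long long)IR, (unsigned long long)MAR,
           (unsigned long long)IBR, (unsigned long long)PC);
    return 0;
}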

Page 127: Computer Architecture All lecture.pdf

Questions to Lecture 3.

1. What’s Hardwired Program? (What’s programming in

Hardware?)

2. What’s Software Program? (What’s programming in

Software?)

3. Describe the functional structure of computer components

(top-level view) from the point of view of the Interconnection Subsystem.

4. What's the Main Cycle of Instruction Processing (MCIP)?

5. Describe the architecture of “Hypothetical Machine”. What is

the difference between translator and interpreter?

6. Describe each step of MCIP on the “Hypothetical Machine”

for one concrete instruction.

7. Describe each step of MCIP on the IAS for one concrete

instruction.

Page 128: Computer Architecture All lecture.pdf

Lecture 4

Interrupts. The goal of the lecture: to analyze and study interrupts, classes of interrupts, program flow control, and the interrupt cycle.

Contents

1. Interrupts. Classes of interrupts.

2. Program Flow Control.

3. Interrupt Cycle.

Literature.

1. Stallings W. Computer Organization and Architecture. Designing and Performance, 5th ed. – Upper Saddle River, NJ: Prentice Hall, 2002.

2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer Organization, 4th ed. – McGraw-Hill International Editions, 1996.

3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. – Upper Saddle River, NJ: Prentice Hall, 2002.

Page 129: Computer Architecture All lecture.pdf

Interrupts

Interrupts are a mechanism by which other modules (e.g. I/O, memory) may interrupt the normal sequence of processing. An interrupt is a change in the control flow caused not by the running program itself but by something else, usually connected with the I/O process; it is a temporary cessation of a process caused by an event that is external to that process.

Classes of most common Interrupts:

Program

e.g. overflow, division by zero

Timer

Generated by internal processor timer

I/O

from I/O controller

Hardware failure

e.g. memory parity error

Interrupts are provided primarily as a way to improve processing efficiency.

Page 130: Computer Architecture All lecture.pdf


Program Flow Control

Program Flow Control is an abstraction (a kind of virtual view) over the set of all possible sequences of
execution in the program.

1, 2 and 3 – code segments that refer to sequences of instructions that do not involve I/O.

The WRITE calls are calls to an I/O program that is a system utility and that will perform the actual I/O operation.

The I/O operation consists of three sections:
4 – a sequence of instructions which prepare for the operation;
the I/O command – the actual I/O command;
5 – a sequence of instructions which complete the operation.

Page 131: Computer Architecture All lecture.pdf

The user’s program doesn’t have to contain any special code to accommodate

interrupts; the processor and the operating system are responsible for suspending the

user’s program and then resuming it at the same point.

Interrupt Cycle (IC)

With the interrupt mechanism the processor can be engaged in executing other instructions

while an I/O operation is in progress

IC is added to instruction cycle to accommodate interrupts

Processor checks for interrupt

Indicated by an interrupt signal if interrupts are pending (I/O module sends an interrupt request signal to the processor when the external device becomes ready to be serviced).

If no interrupt, fetch next instruction

If interrupt pending:

Suspend execution of current program

Save context

Set PC to start address of interrupt handler routine

Process interrupt

Restore context and continue interrupted program
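A minimal sketch of this cycle in C (my illustration, not lecture code; the six-instruction "program", the device model and the handler are invented for the example):

/* Minimal sketch: the basic instruction cycle extended with an interrupt cycle. */
#include <stdio.h>

static int interrupt_pending = 0;
static int interrupts_enabled = 1;

static void execute(int instr) { printf("executing instruction %d\n", instr); }

static void io_device_tick(int cycle)          /* pretend a device finishes at cycle 3 */
{
    if (cycle == 3) interrupt_pending = 1;
}

int main(void)
{
    int pc = 0, saved_pc = 0;

    while (pc < 6) {                           /* run a 6-instruction "program"        */
        int instr = pc;                        /* fetch cycle                          */
        pc = pc + 1;
        execute(instr);                        /* execute cycle                        */
        io_device_tick(instr);

        if (interrupts_enabled && interrupt_pending) {   /* interrupt cycle            */
            saved_pc = pc;                     /* save context (just PC here)          */
            printf("interrupt! running handler...\n");   /* PC set to handler start    */
            interrupt_pending = 0;             /* process the interrupt                */
            pc = saved_pc;                     /* restore context, continue program    */
        }
    }
    return 0;
}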

Page 132: Computer Architecture All lecture.pdf

[Flowchart: instruction cycle with interrupts – START → Fetch Cycle (fetch next instruction) → Execute Cycle (execute instruction) → Interrupt Cycle (check for interrupt; process interrupt, but only when interrupts are enabled) → back to fetch, or HALT; with interrupts disabled the interrupt cycle is skipped.]

Page 133: Computer Architecture All lecture.pdf

Instruction Cycle (with Interrupts) - State Diagram

Page 134: Computer Architecture All lecture.pdf

[Figure: Program timing, short I/O wait – (a) without interrupts the processor executes segments 1 and 4, then waits for the whole I/O operation before executing 5, 2 and 4, waits again, then executes 5 and 3; (b) with interrupts the processor executes segments 2a/2b and 3a/3b while the I/O operations are in progress, so the processor wait disappears.]

Page 135: Computer Architecture All lecture.pdf

[Figure: Program timing, long I/O wait – (a) without interrupts the CPU waits for each I/O operation to complete (1, 4, CPU wait, 5, 2, 4, CPU wait, 5, 3); (b) with interrupts the CPU executes the next code segment during each I/O operation, but because the I/O operation lasts longer than the segment, some CPU wait still remains.]

Page 136: Computer Architecture All lecture.pdf

[Figure: Algorithm of data-block input under the three I/O techniques.
Programmed I/O: (CPU→I/O) issue Read command to the I/O module; (I/O→CPU) read the status of the I/O module; check status – not ready: read the status again; error condition: handle the error; ready: (I/O→CPU) read a word from the I/O module; (CPU→Memory) write the word into memory; done? – no: issue the next Read command; yes: next instruction.
Interrupt-driven I/O: (CPU→I/O) issue Read command to the I/O module and do something else; on interrupt: (I/O→CPU) read the status of the I/O module; check status – error condition: handle the error; ready: (I/O→CPU) read a word from the I/O module; (CPU→Memory) write the word into memory; done? – no: issue the next Read command; yes: next instruction.
Direct Memory Access: (CPU→DMA) issue a Read-block command to the DMA module and do something else; on interrupt: (DMA→CPU) read the status of the DMA module; next instruction.]

Page 137: Computer Architecture All lecture.pdf

Three I/O Techniques.

Programmed I/O.

With the programmed I/O data is exchanged between the CPU and I/O module. The CPU

executes a program that gives it direct control of the I/O operation, including sensing device

status, sending a read or write command and transferring data.

To execute I/O-related instruction, the CPU issues an address, specifying the particular

module and external device, and I/O command. There are four types of I/O commands that I/O

module may receive when it is addressed by the CPU: control, test, read, and write.

Interrupt-driven I/O.

With interrupt-driven I/O, the CPU issues an I/O command, continues to execute other

instructions, and is interrupted by the I/O module when the latter has completed its work.

With both programmed and interrupt I/O the CPU is responsible for extracting data from

main memory for output and storing data in main memory for input. Both these forms suffer

from two inherent drawbacks:

1. The I/O transfer rate is limited by the speed with which the CPU can test and service a

device.

2. The CPU is tied up in managing an I/O transfer; a number of instructions must be executed

for each I/O transfer.

Page 138: Computer Architecture All lecture.pdf

Direct Memory Access (DMA).

DMA permits the I/O module and main memory to exchange data directly, without CPU

involvement.

DMA involves an additional module on the system bus (DMA controller). The DMA

controller is capable of “mimicking” (подмена) the CPU, indeed, of taking over control of the

system from the CPU. The technique works as follows: when CPU wishes to read or write a

block of data, it issues a command to the DMA, by sending the following information:

Whether a read or write is requested.

The address of the I/O device involved.

The starting location in memory to read from or write to.

The number of words to be read or written.

The CPU then continues with other work. It has delegated this I/O operation to the DMA

module, and that module will take care of it. The DMA transfers the entire block of data, one

word at a time, directly to or from the memory, without going through the CPU. When the

transfer is complete, the DMA sends an interrupt signal to the CPU. Thus the CPU is involved

only at the beginning and the end of the transfer.
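The sequence just described can be sketched as follows (illustrative C only; the structure fields and function names are assumptions, not a real DMA driver API):

/* Minimal sketch of the CPU–DMA interaction described above. */
#include <stdio.h>

struct dma_command {
    int is_read;          /* whether a read or a write is requested       */
    int device_address;   /* address of the I/O device involved           */
    int memory_start;     /* starting location in memory                  */
    int word_count;       /* number of words to be read or written        */
};

static int dma_done = 0;  /* set "by the DMA module" when the block transfer ends */

static void dma_start(struct dma_command cmd)    /* CPU issues the command        */
{
    printf("DMA: %s %d words, device %d, memory %d\n",
           cmd.is_read ? "read" : "write",
           cmd.word_count, cmd.device_address, cmd.memory_start);
    /* ... the DMA module now moves the whole block word by word,
       directly between the device and memory, without the CPU ... */
    dma_done = 1;                                /* interrupt signal to the CPU    */
}

int main(void)
{
    struct dma_command cmd = { 1, 2, 0x1000, 128 };
    dma_start(cmd);                              /* CPU involved at the beginning  */
    while (!dma_done) { /* CPU continues with other work */ }
    printf("CPU: DMA interrupt received, block transfer complete\n");  /* ...and at the end */
    return 0;
}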

Page 139: Computer Architecture All lecture.pdf

Multiple Interrupts

Disable interrupts (sequential mode):
 – the processor will ignore further interrupts whilst processing one interrupt;
 – interrupts remain pending and are checked after the first interrupt has been processed;
 – interrupts are handled in sequence as they occur.

Define priorities (priority, nested mode):
 – low-priority interrupts can be interrupted by higher-priority interrupts;
 – when the higher-priority interrupt has been processed, the processor returns to the previous interrupt.

Page 140: Computer Architecture All lecture.pdf

Multiple Interrupts - Sequential

This approach is nice and

simple, as interrupts are

handled in strict sequential

order.

The drawback of this approach

is that it doesn’t take into

account relative priority or

time critical needs.

Page 141: Computer Architecture All lecture.pdf

Multiple Interrupts - Nested

This approach is to define priorities for

interrupts and to allow an interrupt of

higher priority (“estate”) to cause a lower-

priority interrupt handler to be itself

interrupted.

Example.

Consider a system with 3 I/O devices:

a printer (priority 2);

a disk (priority 4);

a communication line (priority 5)

Let user program begins at t = 0. At t = 10,

a printer interrupt occurs, user information

is placed on the stack, and execution

continues at the printer interrupt service

routine (ISR). While this routine is still

executing, at t = 15, a communication

interrupt occurs. Since communication

line has higher priority, the interrupt is

honored, the printer ISR is interrupted.

Page 142: Computer Architecture All lecture.pdf

The state of printer is pushed onto the stack, and the execution continues at the communication

ISR. While this routine is executing, a disk interrupt occurs at t = 20. Since this interrupt is of lower

priority it is simply held, and communication ISR runs to completion. When communication ISR is

complete (t = 25) the previous processor state is restored, with the execution of the printer ISR.

However, before even a single instruction in this routine can be executed, the processor honors the

higher priority disk interrupt and control transfers to the disk ISR. Only when that routine is

complete (t = 35) , the printer ISR is resumed. When that routine completes (t = 40), control finally

returns to the user program.

[Figure: Time sequence of multiple interrupts – the user program starts at t = 0; the printer ISR runs from t = 10; the communication ISR preempts it at t = 15 and completes at t = 25; the disk ISR (requested at t = 20 and held) then runs until t = 35; the printer ISR resumes and completes at t = 40, when control returns to the user program.]
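The timeline above can be reproduced with a small simulation (my code, not the lecture's; it assumes every ISR needs 10 time units of service, which matches the numbers in the example):

/* Minimal sketch: nested handling of prioritized interrupts (printer = 2,
   comm = 5, disk = 4); a pending request of higher priority preempts the
   ISR (or the user program, priority 0) that is currently running. */
#include <stdio.h>

#define N 3
static const char *name[N]  = { "printer", "comm", "disk" };
static const int  prio[N]   = { 2, 5, 4 };
static const int  arrive[N] = { 10, 15, 20 };
static int        work[N]   = { 10, 10, 10 };   /* remaining service time         */

int main(void)
{
    int t, running = -1;                         /* -1 = user program              */
    int stack[N + 1], sp = 0;                    /* interrupted activities         */

    for (t = 0; t < 40; t++) {
        int best = running, i;
        for (i = 0; i < N; i++)                  /* highest-priority pending IRQ   */
            if (arrive[i] <= t && work[i] > 0 &&
                prio[i] > (best < 0 ? 0 : prio[best]))
                best = i;
        if (best != running) {                   /* preemption: push current       */
            stack[sp++] = running;
            running = best;
            printf("t=%2d: %s ISR starts\n", t, name[running]);
        }
        if (running >= 0 && --work[running] == 0) {     /* one ISR step; finished? */
            printf("t=%2d: %s ISR completes\n", t + 1, name[running]);
            running = stack[--sp];               /* resume what was interrupted    */
        }
    }
    printf("t=%2d: user program resumes\n", t);
    return 0;
}

Running it prints the same sequence as the figure: printer at t = 10, comm at t = 15, comm completes at t = 25, disk runs to t = 35, printer completes at t = 40.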

Page 143: Computer Architecture All lecture.pdf

[Figure: DMA transfer in a computer system – the CPU, memory, I/O device and DMA controller share the RD, WR, address and data lines. An address decoder produces the DS (DMA Select) and RS (Register Select) inputs of the DMA controller. The DMA controller exchanges DMA request / DMA acknowledge signals with the I/O device, uses BR (Bus Request) and BG (Bus Granted) to take the bus over from the CPU, contains the Address, WCR and CR registers, and sends an Interrupt to the CPU when the transfer is finished.]

Page 144: Computer Architecture All lecture.pdf

Address – the address register specifies the desired location of a word in memory;

WCR – the word-count register specifies the number of words that must be transferred;

CR – the control register specifies the mode of transfer;

RD – Read;  WR – Write;

DS – DMA Select;  RS – Register Select (these select one of the interface registers);

BG – Bus Granted;  BR – Bus Request.

Direct Memory Access Technique.

CPU initializes the DMA by sending the following information through Data Register:

1) the starting address of the Memory block;

2) the word count (number of words in this block);

3) type of operation (Read or Write);

4) a control bit to start the DMA transfer.

After that the CPU stops communicating with the DMA controller until it receives an interrupt signal or needs to check how

many words have been transferred.

Literature.

M.M. Mano, C.R. Kime . Logic and Computer Design Fundamentals. Part 2 (pp.557-

561).

Page 145: Computer Architecture All lecture.pdf

Address Space Allocation. The Address Space (AS) is the set of addresses which the

microprocessor is able to generate. The allocation of the general components within the address

space is standardized (unified).

Fig. 1. Typical allocation of the address space (real-mode PC; operating system MS-DOS).

Physical addr.  Segment addr.  Size         Region
00000h          0000h          1 Kb         Vectors of interruptions (interrupt vectors)
00400h          0040h          256 bytes    Area of BIOS's data
00500h          0050h          up to 640 Kb Free memory for application programs
                                            (the three regions above form the usual, conventional, memory – 640 Kb)
A0000h          A000h          64 Kb        Graphical video buffer
B0000h          B000h          32 Kb        Free addresses
B8000h          B800h          32 Kb        Text video buffer
C0000h          C000h          64 Kb        Permanent storage of BIOS extensions
D0000h          D000h          128 Kb       Free addresses
F0000h          F000h          64 Kb        Permanent storage of BIOS
                                            (regions A0000h–FFFFFh form the senior, upper, memory – 384 Kb)
100000h                        64 Kb        High Memory Area (HMA)
10FFF0h                        up to 4 Gb   Extended memory (Extended Memory Specification, XMS)
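The segment-address column is related to the physical-address column by the real-mode rule physical = segment × 16 + offset, which is why each segment value is the physical address with the last hexadecimal digit dropped. A tiny sketch (the sample values are mine):

/* Real-mode 8086 address formation: physical = segment * 16 + offset. */
#include <stdio.h>

int main(void)
{
    unsigned segment = 0xB800, offset = 0x0000;    /* text video buffer          */
    unsigned physical = (segment << 4) + offset;   /* B800h:0000h -> B8000h      */
    printf("%04X:%04X -> %05Xh\n", segment, offset, physical);
    return 0;
}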

Page 146: Computer Architecture All lecture.pdf

Hardware Organization of Interrupts.

Hardware interrupt signals generated in computer devices do not reach the microprocessor directly; they pass

through two interrupt controllers (the Leading, or master, Interrupts

Controller and the Driven, or slave, Interrupts Controller).

[Figure: the two cascaded interrupt controllers. The Leading (master) Interrupts Controller, base vector 08h, receives requests IRQ0–IRQ7; the Driven (slave) Interrupts Controller, base vector 70h, receives IRQ8–IRQ15. The devices shown on the request lines include the timer (IRQ0), keyboard, mouse, floppy disk, printer and hard disk. The controllers pass the INT signal and the vector's number to the processor.]

Page 147: Computer Architecture All lecture.pdf

[Figure: Procedure of interrupt service. The interrupt vector table occupies the lowest memory addresses: interrupt vector n occupies four bytes starting at address 4n – the IP of interrupt handler n at address 4n and its CS at address 4n + 2 (vectors 0, 1, …, n at addresses 0, 2, 4, 6, …, 4n, 4n + 2). When an interrupt is accepted, the processor pushes the Flags, CS and IP of the interrupted process onto the stack (SP points to them at the moment of the interrupt) and loads CS:IP from the vector of the interrupt, transferring control to the corresponding interrupt handler.]
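A small sketch of the addressing convention in the figure (the handler address used here is made up):

/* Vector n occupies 4 bytes at address 4n: IP at 4n, CS at 4n + 2. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t vector_table[256 * 2] = {0};      /* IP, CS pairs in low memory       */
    int n = 8;                                 /* e.g. the timer interrupt vector  */
    vector_table[2 * n]     = 0x1234;          /* IP of handler n  (byte addr 4n)  */
    vector_table[2 * n + 1] = 0xF000;          /* CS of handler n  (byte addr 4n+2)*/
    printf("vector %d at byte address %d: handler %04X:%04X\n",
           n, 4 * n, (unsigned)vector_table[2 * n + 1], (unsigned)vector_table[2 * n]);
    return 0;
}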

Page 148: Computer Architecture All lecture.pdf

Stages in the Evolution of the Input/Output Subsystem during the Development of Computer Systems.

The First Stage. The CPU directly controls all external devices. Today this technology is used only in the simplest microprocessor-controlled devices.

The Second Stage. A controller of the external device, or an input/output module, is included in the computer system. During data exchange the CPU uses the programmed input/output methodology without interrupts. The CPU delegates many control functions for individual units of the external device to the Input/Output module and is relieved of the details of the direct interface with the external device.

The Third Stage. The same system configuration is used as in the previous stage, but the exchange process is based on interrupts. The CPU no longer wastes time waiting for the external device to become ready to exchange the next portion of data.

The Fourth Stage. Input/Output modules gain the ability of direct access to RAM through the DMA controller, and the exchange process now proceeds practically without the CPU's participation. The CPU is needed only to initiate (start) a session of data exchange and to receive the signal that the session has terminated.

The Fifth Stage. At this stage the Input/Output module becomes an Input/Output processor with its own rights in the system, able to execute certain specialized instructions. The CPU sends this module (processor) only the instruction "run the program" (the program itself is stored in RAM). The module runs this program on its own, and after the program terminates it informs the CPU that the work is complete.

The Sixth Stage. Now the I/O module is able not only to run specialized programs, but it is also equipped with its own block of local memory. Thus, in essence, it becomes a full-fledged computer within the computer system, although a minimal participation of the CPU is still required here as well.


Page 150: Computer Architecture All lecture.pdf

Questions to Lecture № 4.

1. What do we mean under the Interrupts? What is the main

reason of using the Interrupt Mechanism?

2. Draw up diagrams of the Program Flow Control without

interrupts and with interrupts, describe each fragment of the

Program Flow Control.

3. Which classes of interrupts must be enabled constantly? (give

explanation)

4. Describe the mechanism of work with interrupts.

5. In the diagram “Program Flow Control” find points, which

correspond to interrupts of user’s program and explain the

necessity of using these interrupts.

6. How many techniques of I/O operations execution are used?

Describe each of these techniques and compare them.

7. Which approaches can be taken to dealing with multiple

interrupts? Show advantages and disadvantages of these

approaches.

Page 151: Computer Architecture All lecture.pdf

Lecture 5. System Buses.

I.

1. Interconnections of base computer components through the bus.

2. Bus structure.

3. Bus hierarchy.

II.

1. Elements of Bus Design (Types, Methods of Arbitration, Timing).

2. PCI bus. Instructions of PCI bus. Data transaction and arbitration of PCI bus.

Literature.

1. Stallings W. Computer Organization and Architecture. Designing and Performance, 5th ed. – Upper Saddle River, NJ: Prentice Hall, 2002.

2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer Organization, 4th ed. – McGraw-Hill International Editions, 1996.

3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. – Upper Saddle River, NJ: Prentice Hall, 2002.

Page 152: Computer Architecture All lecture.pdf

Connecting

In effect, a computer is a network of basic modules (CPU, Memory,

I/O), thus, there must be paths for connecting the modules together.

The way of connecting the various modules is called the

interconnection structure.

All the units must be connected

Different types of connections for different types of units

Memory

Input/Output

CPU

Page 153: Computer Architecture All lecture.pdf

Memory Connection

Receives and sends data

Receives addresses (of locations)

Receives control signals

Read

Write

Timing

[Figure: memory module – N words with addresses 0 … N − 1, Read and Write control inputs, an Address input, and Data in/out connections.]

Page 154: Computer Architecture All lecture.pdf

Input/Output Connection(1)

Similar to memory from computer’s viewpoint

Output

Receive data from computer

Send data to peripheral

Input

Receive data from peripheral

Send data to computer

Input/Output Connection(2)

Receive control signals from computer

Send control signals to peripherals

e.g. spin disk

Receive addresses from computer

e.g. port number to identify peripheral

Send interrupt signals (control)

[Figure: I/O module with M ports – Read, Write and Address inputs; internal data paths to and from the computer; external data paths to and from the peripheral devices; and outgoing interrupt signals.]

Page 155: Computer Architecture All lecture.pdf

CPU Connection

Reads instructions and data

Writes out data (after processing)

Sends control signals to other units

Receives (& acts on) interrupts

[Figure: CPU – receives instructions, data and interrupt signals; issues addresses, data and control signals.]

Page 156: Computer Architecture All lecture.pdf

The interconnection structure is determined by character of exchange

operations, which are specific for each module.

Major forms of input and output for the modules: Memory: Typically, a memory module will consists of N words of equal length. Each word

is assigned a unique numerical address (0, 1, …, N-1). A word of data can be read from or

written into the memory. The nature of the operations is indicated by READ or WRITE

control signals. The location for the operation is specified by an address.

I/O Module: It’s functionally similar to the memory (from internal point of view).

There are two operations READ and WRITE. Further, an I/O module may control more

than one external device. We can refer to each of the interfaces to an external device as a port

and give each a unique address (e.g., 0, 1, 2.,…, M-1). In addition, there are external data

paths for the input and output of data with an external device. Finally, an I/O module may be

able to send interrupt signals to the CPU.

CPU: CPU reads in instructions and data, writes out data after processing, and uses

control signals to control the overall operation of the system. It also receives

interrupt signals.

Page 157: Computer Architecture All lecture.pdf

Types of transfers supported by interconnection structure.

Memory to CPU: The CPU reads an instruction or unit of data

from memory.

CPU to Memory: The CPU writes a unit of data to memory.

I/O to CPU: The CPU reads data from I/O device via an I/O

module.

CPU to I/O: The CPU sends data to the I/O device.

I/O to or from the Memory: For these two cases, an I/O

module is allowed to exchange data directly with memory,

without going through the CPU, using direct memory access

(DMA).

Page 158: Computer Architecture All lecture.pdf

A multiplexer is a functional device that allows two or more data-link channels to share a single common data-transfer device.

Buses

There are a number of possible interconnection systems

Single and multiple BUS structures are most common

e.g. Control/Address/Data bus (PC) e.g. Unibus (DEC-PDP)

Page 159: Computer Architecture All lecture.pdf

What is a Bus?

A bus is a set of electric pathways and service

electronic devices (framing), providing exchange

of data among computer units and devices.

A communication pathway connecting two or more devices is a bus.

Often grouped

A number of channels in one bus e.g. 32 bit data bus is 32 separate single bit channels

Power lines may not be shown

What do buses look like?

Parallel lines on circuit boards

Ribbon (ленточный) cables

Strip (полоса) connectors on mother boards e.g. PCI (Peripheral Component Interconnect)

Sets of wires

Page 160: Computer Architecture All lecture.pdf

A system bus consists, typically, of from 50 to 100 separate lines, which can be

classified into three functional groups: data, address and control lines (power lines are

usually omitted ).

Data Bus (Line)

The data lines provide a path for moving data between system modules. Number of lines is

referred as WIDTH of the data bus (the number of lines determines how many bits can be

transferred at a time)

Carries data

Remember that there is no difference between “data” and “instruction” at this level!

Width of Data Bus is a key determinant of the system performance

8, 16, 32, 64 bit

Bus Structure

Page 161: Computer Architecture All lecture.pdf

Address Bus (Line)

Identify the source or destination of data

(e.g. CPU needs to read an instruction (data) from a given location in memory)

Address Bus width determines maximum memory capacity of the system.

Used to address both the Main Memory and I/O ports (the higher-order bits are used to select a particular module on the bus, and the lower-order bits select an address in the Memory or an I/O port within the module). E.g., if the width of the bus is equal to 8, then codes 01111111 and below specify cell addresses in the Main

Memory module (the module with address 0), and codes 10000000 and above specify I/O ports controlled by the module with address 1 (see the sketch below).
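A sketch of that 8-bit decoding rule (my example values, not from the lecture):

/* High-order bit selects the module, low-order bits select the cell or port. */
#include <stdio.h>

int main(void)
{
    unsigned addr   = 0x9C;          /* 10011100b                                   */
    unsigned module = addr >> 7;     /* 1 -> the I/O module, 0 -> the memory module */
    unsigned local  = addr & 0x7F;   /* cell address or port number inside module   */
    printf("address %02X -> module %u, local address %02X\n", addr, module, local);
    return 0;
}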

Page 162: Computer Architecture All lecture.pdf

Command signals specify operations to be performed. Typical control lines include:

Memory Write: Causes data on the bus to be written into the addressed location.

Memory Read: Causes data from the addressed location to be placed on the bus.

I/O Write: Causes data on the bus to be output to the addressed I/O port.

I/O Read: Causes data from the addressed I/O port to be placed on the bus.

Transfer ACK: Indicates that data have been accepted from or placed on the bus.

Bus Request: Indicates that a module needs to gain control of the bus.

Bus Grant: Indicates that a requesting module has been granted control of the bus.

Interrupt request: Indicates that interrupt is pending.

Interrupt ACK: Acknowledges that the pending interrupt has been recognized.

Control Bus(Line)

Is used to control the access to and the use of the data and

address lines.

Control and timing information(indicate validity of data and address information)

Memory read/write signal

Interrupt request

Page 163: Computer Architecture All lecture.pdf

Clock: Used to synchronize operations.

Reset: Initializes all modules.

The operation of any bus is as follows:

If one of the modules “wishes” to send data to another, it must do two things:

1. Obtain the use of the bus;

2. Transfer data through the bus.

If one of the modules “wishes” to receive data from the other module it must do:

1. Obtain the use of the bus;

2. Send a request to the other module by putting the corresponding code on the

address lines after forming signals on the appropriate control lines.

Computer systems contain a number of different buses that provide pathways

between components at various levels of the computer systems hierarchy.

Page 164: Computer Architecture All lecture.pdf

A bus that connects major computer components

(CPU, Memory, I/O) is called a System Bus.

Bus Interconnection Scheme (System Bus)

Page 165: Computer Architecture All lecture.pdf

[Figure: physical realization – CPU, memory and I/O module boards plugged into the bus.]

Page 166: Computer Architecture All lecture.pdf

Buses Hierarchy.

Single Bus Problems:

Lots of devices on one bus leads to:

Propagation delays (задержки распространения).

Long data paths mean that co-ordination of bus use can

adversely affect (неблагоприятно сказываться)

performance (dynamic characteristics become worse).

If aggregate data transfer approaches bus capacity the

system’s work may become unreliable.

Most systems use multiple buses organised by hierarchy

principle to overcome these problems

Page 167: Computer Architecture All lecture.pdf

Traditional Bus Architecture (ISA –Industry Standard

Architecture) (with cache)

Page 168: Computer Architecture All lecture.pdf

Up to now the traditional bus architecture has been widely used. In this case the computer system includes a Local Bus, which connects the CPU, the cache memory and some peripheral devices. The cache memory controller provides connections not only to the Local Bus but to the System Bus as well (all modules of the Main Memory are connected to the System Bus). Under such a structure all input-output traffic flows through the System Bus, bypassing the CPU, which allows the CPU to perform more important operations.

Connecting peripheral devices not directly to the System Bus but to an additional bus, the Expansion Bus, which buffers the data circulating between the Main Memory and the peripheral-device controllers, makes it possible to support a large variety of external devices and, at the same time, to separate the information flows "CPU – Memory" and "Memory – I/O controllers".

The appearance of new high-performance external devices demands higher data-transfer speeds over the buses; that is why one more bus, a High-Speed Bus, is often used in contemporary computer systems. This bus unites the high-speed external devices and is connected to the System Bus through a special matching module, the Bridge. Such a structure is called a Mezzanine Architecture.

The advantage of this structure is that high-speed peripheral devices are integrated with the processor and at the same time may work independently: the functioning of the bus does not depend on the CPU architecture, and vice versa.

Page 169: Computer Architecture All lecture.pdf

SCSI- Small Computer System Interface; LAN – Local Area Network; PI394 – Peripheral Interface (high-speed)

High Performance Bus

Page 170: Computer Architecture All lecture.pdf

Bus Types

Dedicated

Separate data & address lines

Multiplexed

Shared Lines

Address valid or Data valid control lines

Advantage – fewer lines

Disadvantages

More complex control

Reduction in performance

Physically Dedicated

The use of multiple buses, each of which may connect only some

certain modules (Expansion Bus, High-Speed Bus)

Advantage – high throughput (there is less contention on each bus)

Disadvantage – increased size and cost of the system.

Page 171: Computer Architecture All lecture.pdf

Bus Arbitration

More than one module controlling the bus

e.g. CPU and DMA controller

Only one module may control bus at one time (to be a master)

Arbitration may be centralised or distributed

Centralised

Arbitration

Single hardware device controlling bus access

Bus Controller

Arbiter

May be part of CPU or separate

Distributed Arbitration

Each module may claim the bus

Control logic on all modules

Page 172: Computer Architecture All lecture.pdf

Timing

Co-ordination of events on bus

Synchronous

Events determined by clock signals

Control Bus includes clock line

A single 1-0 is a bus cycle

All devices can read clock line

Usually sync on leading edge

Usually a single cycle for an event

Asynchronous

Scheme for controlling data transfers on the bus is based on the use of handshake(квитирование) between the initiator and the target.

The clock line is replaced by two timing control lines “READY” (“MSYN”) and “ACCEPT” (“SSYN”).

Page 173: Computer Architecture All lecture.pdf

Synchronous Timing Diagram

Read Operation

The CPU issues Read

signal and places memory

Address on the address bus,

issues a Start signal to

mark the presence (validity)

of the address. The memory

module recognizes the

address and after a delay of

1 bus cycle it places the

Data and Acknowledge

signal on the bus

Page 174: Computer Architecture All lecture.pdf

Timing refers to the way in which events are coordinated on the bus.

Asynchronous Timing Diagram

Read Operation

The CPU places Address and

Read signals on the bus.

After pausing for the signals

to stabilize, it issues an

MSYN (master sync) signal,

indicating the presence of

valid address and control

signals. The memory module

responds with Data and

SSYN (slave) signal,

indicating the response.

Page 175: Computer Architecture All lecture.pdf

With synchronous timing the occurrence of events on the bus is determined by

a clock.

The bus includes a clock line upon which a clock transmits a regular sequence of alternating 1s

and 0s of equal duration. A single 1-0 transmission is referred to as a clock cycle (bus

cycle) and defines a time slot (интервал). All other devices on the bus can read the clock

line, and all events start at the beginning of a clock cycle. Other bus signals may change at the

leading edge of the clock signal.

With asynchronous timing the occurrence of one event on a bus follows

and depends on the occurrence of a previous event.

Synchronous timing – Advantages: simple to implement and test. Disadvantages: less flexible, all devices are tied to a fixed clock rate; the system can't take advantage of advances in device performance.

Asynchronous timing – Advantages: flexible; allows the use of newer technology and a mixture of slow and fast devices. Disadvantages: more complex to implement and test.

In actual implementations, electronic switches are used. The output gate of

register is capable of being electrically disconnected from the bus or placing a

Page 176: Computer Architecture All lecture.pdf

0 or a 1 on the bus. Because it supports these three possibilities, such a gate is

said to have a three—state output. A separate control input is used either to

enable the gate output to drive the bus to 0 or to 1 or to put it in a high-

impedance (electrically disconnected) state. The latter state corresponds to the

open-circuit state of a mechanical switch.

Page 177: Computer Architecture All lecture.pdf

PCI Bus

Peripheral Component Interconnect – a high-bandwidth, processor-independent bus that functions as a mezzanine or peripheral bus

Intel released to public domain

32 or 64 bit, 33 (66)MHz, a transfer rate 264(528) Mbytes/sec

50 lines

Page 178: Computer Architecture All lecture.pdf

PCI Bus Lines (required)

1. Systems lines Including clock and reset

2. Address & Data 32 time lines for address/data Interrupt & validate lines

3. Interface Control Control the timing transactions and provide coordination among

initiators and targets

4. Arbitration Not shared Direct connection to PCI bus arbiter

5. Error lines

Page 179: Computer Architecture All lecture.pdf

PCI Bus Lines (Optional)

Interrupt lines – not shared.

Cache support.

64-bit Bus Extension – an additional 32 time-multiplexed lines, plus 2 lines that let devices agree to use 64-bit transfer.

JTAG/Boundary Scan – for testing procedures.

Page 180: Computer Architecture All lecture.pdf

PCI Commands

Transaction between initiator (master) and target

Master claims bus

Determine type of transaction e.g. I/O read/write

Address phase, followed by one or more data phases

Page 181: Computer Architecture All lecture.pdf

PCI Read Timing Diagram

All events are synchronized to the falling transitions of the

clock, which occur in the middle of each clock cycle.

The following are significant events, labeled on the diagram:

a. The master begins transaction by asserting FRAME (this

PCI signal indicates the start and duration of a transaction.

It is asserted at the start and unasserted when the initiator

(master) is ready to begin the final data phase). The master

also puts the start address on the AD (address line, which

is multiplexed and used for address and data transfer, 64

bits). On the C/BE lines (these multiplexed lines indicate

which of the four byte lanes carry meaningful data) the

master puts the READ command.

b. At the start of clock 2, the target will recognize its address.

c. The master ceases driving the AD bus and changes the

information on the C/BE lines to designate which AD lines

are to be used for transfer of the currently addressed data. The

initiator also asserts IRDY (Initiator Ready. Driven by current

bus master. During READ operation it indicates that the

master is prepared to accept data; during a WRITE operation

it indicates that valid data is present on AD). d. The selected target asserts DEVSEL (Device Select.

Asserted by target when it has recognized its address.

Indicates to current initiator, whether any device has been

selected) to indicate that it has recognized its address, it also

places the requested data on the AD and asserts TRDY to

indicate that valid data is present

e. The initiator reads data at the beginning of clock 4 and changes the bus enable lines as needed in preparation for the next READ.

f. The target deasserts TRDY to signal the initiator that there will not be new data during the coming cycle.

g. The target places the third data item on the bus, but the initiator is not yet ready to read data item, therefore it deasserts IRDY; this will cause

the target to maintain the third data item on the bus for an extra clock cycle.

Page 182: Computer Architecture All lecture.pdf

PCI Bus Arbitration

h. The initiator “knows” that the third data is the last, and so it deasserts FRAME to signal the target that this is the last data transfer, it also

asserts IRDY to signal that it is ready to complete the transfer.

i. The initiator deasserts IRDY, returning the bus to the idle state, and the target deasserts TRDY and DEVSEL.

PCI makes use of centralized, synchronous arbitration

scheme in which each master has unique request (REQ)

and grant (GNT) signals. These signal lines are attached to

a central arbiter and a simple request-grant is used to

grant access to the bus.

When two devices A and B are arbitrating for the bus,

the following sequence occurs:

a. At some point prior to the start of clock 1, A has

asserted its REQ signal. The arbiter samples this signal

at the beginning of clock cycle 1.

b. During clock cycle 1, B requests use of the bus by

asserting its REQ signal.

c. At the same time, the arbiter asserts GNT-A to grant

Bus access to A.

d. Bus master A samples GNT-A at the beginning of

clock 2 and learns that it has been granted bus access.

It also finds IRDY and TRDY unasserted, indicating

that bus is idle. Accordingly, it asserts FRAME and

places the address information on the address bus and

the command on the C/BE bus, it also continues to

assert REQ—A, because it has a second transaction to

perform after this one.

e. The bus arbiter samples all GNT lines at the beginning

of clock 3 and makes an arbitration decision to grant

the bus to B for the next transaction. It then asserts

GNT-B and deasserts GNT-A. B will not be able to

use the bus until it returns to an idle state.

Page 183: Computer Architecture All lecture.pdf

f. A deasserts FRAME to indicate that the last data transfer is in progress. It puts the data on the data bus and signals the target with IRDY.

The target reads the data at the beginning of the next clock cycle.

g. At the beginning of clock 5, B finds IRDY and FRAME deasserted and so it is able to take control of the bus by asserting FRAME. It

also deasserts its REQ line because it only wants to perform one transaction.
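As an aside, the request/grant mechanism itself can be sketched in a few lines (my illustration, not the PCI specification; the round-robin policy and the one-transaction-per-grant rule are assumptions made for the example):

/* Minimal sketch: a central arbiter samples the dedicated REQ line of every
   master at each clock and asserts at most one GNT. */
#include <stdio.h>

#define MASTERS 2                              /* master 0 = A, master 1 = B      */

int main(void)
{
    int req[MASTERS] = { 1, 1 };               /* both A and B are requesting     */
    int last = 1;                              /* last master that owned the bus  */

    for (int clock = 1; clock <= 4; clock++) {
        int pick = -1;
        for (int i = 1; i <= MASTERS; i++) {   /* round-robin search after 'last' */
            int m = (last + i) % MASTERS;
            if (req[m]) { pick = m; break; }
        }
        if (pick >= 0) {
            printf("clock %d: GNT-%c asserted\n", clock, 'A' + pick);
            req[pick] = 0;                     /* assume one transaction per grant */
            last = pick;
        } else {
            printf("clock %d: bus idle\n", clock);
        }
    }
    return 0;
}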


Page 204: Computer Architecture All lecture.pdf

Synchronous Bus (SB) On a SB all devices derive timing information from a common clock line.

Equally spaced pulses on this line define equal time intervals; each interval constitutes a bus cycle, during which one

data transfer can take place.

Such a scheme is illustrated below. In the scheme the address and data lines are shown as high and low at the same time

(this indicates that some lines are high and some low, depending on the particular address or data pattern being transmitted).

The crossing points indicate the times at which these patterns change.

A signal line in an indeterminate state (or high impedance state) is represented by an intermediate level halfway between

the low and high signal levels.

The sequence of events during an input (read) operation.

At time t0 the processor places the device address on the address lines and sets the mode control lines to indicate an input

operation. This information travels over the bus at a speed determined by its physical and electrical characteristics. The

clock pulse width t1 − t0 should be chosen such that it is greater than the maximum propagation delay between the CPU and any

devices connected to the bus. It should also be wide enough to allow all devices to decode the address and control signals so

that the addressed device can be ready to respond at time t1. The addressed device, recognizing that an input operation is

requested, places its input data on the data lines at time t1. At the end of the clock cycle, that is, at time t2, the CPU

strobes the data lines and loads the data into its input buffer (here, “strobe” means to determine the value of the data at a

given instant). For data to be loaded correctly into a storage device, the data must be available at the input of that device for

a period greater than the setup time of the device. Hence, the period t2 –t1 must be greater than the maximum propagation

time on the bus plus the setup time of the input buffer register of the CPU.

The procedure for an output operation is similar to that for the input sequence. The processor places the output data on

the data lines when it transmits the address and the mode information. At time t1, the addressed device strobes the data lines

and loads the data into its data buffer.

The synchronous bus scheme is simple and results in a simple design for the device interface. The clock speed must be

chosen such that it accommodates the longest delays on the bus and the slowest interface. Note, that the CPU has no way of

determining whether the addressed device has actually responded. It simply assumes that at t2 the output data have been

Page 205: Computer Architecture All lecture.pdf

received by I/O device or the input data are available on the data lines; if, because of malfunction, the device does not

respond, the error will not be detected.

Page 206: Computer Architecture All lecture.pdf

Asynchronous Bus (AsB)

An alternative scheme for controlling data transfers on the bus is based on the

use of handshake between the processor and the device being addressed. The

common clock is eliminated; hence, the resulting bus operation is

asynchronous. The clock line is replaced by two timing control lines, which we

refer as Ready and Accept. In principle, a data transfer controlled by a

handshake protocol proceeds as follows:

The processor places the address and mode information on bus;

Then it indicates to all devices that it has done so by activating the Ready

line;

When the addressed device receives the Ready signal, it performs the

required operation;

After this the addressed device informs the processor it has done so by

activating the Accept line;

The processor waits for the Accept signal before it removes its signals

from the bus (in the case of a read operation, it also strobes data into its

input buffer).
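A toy sketch of this exchange in C: the "bus" is modelled by a few shared variables, and the signal names follow the text. The device address 0x2A, the mode code 1 and the data value are assumptions made only for the illustration:

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model of one Ready/Accept handshake for an input (read) transfer. */
static unsigned address, mode, data_lines;
static bool ready, accept;

static void device_respond(void) {          /* addressed device side         */
    if (ready && address == 0x2A && mode == 1) {  /* 0x2A: assumed address   */
        data_lines = 0x1234;                /* place requested data on bus   */
        accept = true;                      /* tell the processor it is done */
    }
}

int main(void) {                            /* processor side                */
    address = 0x2A; mode = 1;               /* 1: assumed code for "input"   */
    ready = true;                           /* address and mode are valid    */
    device_respond();                       /* devices watch the Ready line  */
    if (accept) {                           /* wait for Accept before ...    */
        unsigned buffer = data_lines;       /* ... strobing the data         */
        ready = false;                      /* then remove Ready             */
        printf("read 0x%X from device\n", buffer);
    }
    return 0;
}
```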

Figure: Timing of an input transfer on a synchronous bus — the bus clock, the address and mode information, and the data lines are shown as voltage versus time over one bus cycle (instants t0, t1, t2); logic values 0 and 1 are separated by a forbidden range around the threshold voltage.

Page 207: Computer Architecture All lecture.pdf

Figure: Handshake control of data transfer during an input operation — the address and mode information, Ready, Accept, and data signals are shown as voltage versus time, with the instants t0 … t5 marked.

Page 208: Computer Architecture All lecture.pdf

Sequence of events on the handshake-controlled bus (annotations to the timing diagram, instants t0 … t5):
t0: The processor places the address and mode information on the bus.
t1: The processor sets the Ready line to 1 to inform the I/O units that the address and mode information is ready. The delay t1 – t0 is intended to allow for any skew that may occur on the bus; sufficient time should also be allowed for the interface circuitry to decode the address, and this delay is included in the period t1 – t0.
t2: The interface of the addressed device sets the Accept signal to 1 and gates the data from its data register to the data lines. If extra delays are introduced by the interface circuitry before it places the data on the bus, it must delay the Accept signal accordingly. The period t2 – t1 depends on the distance between the CPU and the device interface; it is also a function of the delays introduced by the interface circuitry.
t3: The Accept signal arrives at the processor, indicating that the input data are available on the bus. However, since the device interface is assumed to transmit the Accept signal at the same time that it places the data on the bus, the CPU must allow for bus skew. After a delay equal to the maximum bus skew, the CPU strobes the data into its input buffer and at the same time drops the Ready signal, indicating that it has received the data.
t4: The CPU removes the address and mode information from the bus. The delay t4 – t3 is again intended to allow for bus skew; erroneous addressing may take place if the address, as seen by some device on the bus, starts to change while the Ready signal is still equal to 1.
t5: When the device interface receives the 1-to-0 transition of the Ready signal, it removes the data and the Accept signal from the bus. This completes the input transfer.

Page 209: Computer Architecture All lecture.pdf

In this diagram it is assumed, that compensation for bus skew and address

decoding is performed by the CPU. This simplifies the I/O interface at the

device end, because the interface circuit can use the Ready signal directly

to gate other signals to or from the bus.

Skew occurs when two signals simultaneously transmitted from one source arrive at the destination at different times. This happens because different lines of the bus may have different propagation speeds. Thus, to guarantee that the Ready signal does not arrive at any device ahead of the address and mode information, the delay t1 – t0 should be larger than the maximum possible skew of the bus (note that, in the synchronous case, bus skew is accounted for as a part of the maximum propagation delay).

Mixed Synchronous/Asynchronous Bus (MS/AsB)

Another practical alternative is to use an asynchronous bus, but with a

provision that all signaling changes are synchronized with a clock. The time

elapsed between successive handshake signals is an integral number of clock

cycles. For example, the CPU may send an address during one clock cycle.

During that cycle, it asserts a signal indicating that the address is valid and that

all devices on the bus should decode this address. The addressed device

responds, when it is ready, by asserting an acknowledge signal and placing the

data on the bus (in the case of read operation). One or more clock cycles may

separate the request and the response, depending on the speed of the device

being addressed. Using the clock often leads to simpler designs of logic circuits

in device interfaces.

Many variations of the bus techniques are found in commercial computers.

For example, the bus in the 68000 family of processors has two modes of

operation, one asynchronous and one synchronous. The PowerPC bus uses the

mixed approach.

The choice of design involves trade-offs among many factors. Some of the

important considerations are:

Simplicity of the device interface;

Ability to accommodate device interfaces that introduce different amounts of delay;

Total time required for a bus transfer;

Ability to detect errors resulting from addressing a nonexistent device or

from an interface malfunction.

The fully asynchronous scheme provides the highest degree of flexibility and

reliability, but its device interface circuit is somewhat more complex than that

Page 210: Computer Architecture All lecture.pdf

of the synchronous or mixed bus. Asynchronous buses have an error-detecting

capability provided by interlocking the Ready and Accept signals. If the Accept

signal is not received within a fixed time-out period after Ready is set to 1, the

CPU assumes that an error has occurred. A bus error can be used to cause an

interrupt and execute a routine that either alerts the operator to the malfunction

or takes some other appropriate action.

Types of Operations of Data Transfer. Bus may support the following types of operations:

Read (data transfer from the slave to the master);
Write (data transfer from the master to the slave);
Read-Modify-Write (a read followed immediately by a write to the same location, so the address does not have to be placed on the address lines again for the write);
Read-after-Write (a write followed immediately by a read of the same location, also executed without changing the address).

Figure: Timing of data transfer with multiplexed lines and with dedicated (separate) lines. With multiplexed lines, a Write takes two cycles (address in the 1st cycle, data in the 2nd) and a Read takes three cycles (address in the 1st cycle, access time in the 2nd, data in the 3rd). With dedicated lines, the address and the data travel on separate lines, so the data can follow the address without waiting for the address lines to be freed.

Page 211: Computer Architecture All lecture.pdf

Figure: Timing of Read-Modify-Write (address, access time, data read, data write), Read-after-Write (address, data write, access time, data read) and Data-block Transfer (a single address followed by several data items).

Page 212: Computer Architecture All lecture.pdf

Configuration of Computer System on the base of PCI Bus.
Figure: The CPU and Cache Memory (on the local bus and cache memory bus) connect through the PCI Bridge to the Main Memory (main memory bus) and to the PCI bus. The PCI bus carries the Sound Board, Video Board, Graphic Board, LAN Adapter, SCSI Interface and a matching Module of the Extended Bus; the Basic I/O devices sit on the ISA bus behind that module.

Page 213: Computer Architecture All lecture.pdf

Main Characteristics of PCI Bus Introduced in ’90 by Intel. Improved by consortium of

manufacturers (PCI SIG (Special Interest Group)).

Uses 66 MHz clock, independent from that of the processor;

Capacity 528 Mbytes/s;

Cycle time 30 ns;

64-bit data and address lines (multiplexed);

Supports up to 16 slots;

Systems also include ISA slots for compatibility;

Mixed synchronous/asynchronous bus;

Centralized bus architecture

Structure of PCI Bus Lines. (required)

There are 5 groups of lines:

System Lines. Through these lines timing signals and signals of initial

setting are transferred. Namely:

CLK - Clock; all processes on the bus are synchronized with its rising (leading) edge. The frequency is 33 (66) MHz;

RST# - Reset of all registers, counters and potential signals

(return into the initial state).

Informational Lines. Through 32 (64) lines of this group, code signals of addresses and data are transferred; the other lines are used for interpretation and acknowledgement of data validity. Namely:

AD- Multiplexed lines, which are used for addresses and data

transfer;

C/BE- Multiplexed Lines of Bus commands and signals of

bytes selection. During the phase of data transfer signals on

these lines indicate, which of the bytes (4 bytes are

transferred at the same time) contain the necessary

information;

PAR- Signal of parity checking of data on the lines AD and C/BE, with a delay equal to one cycle.

Page 214: Computer Architecture All lecture.pdf

Interface Managing Lines. Through these lines signals, which

guarantee coordination of master and slave work during an exchange

process, are transferred. Namely:

FRAME# - The current master asserts a signal on this line in order to inform the other devices that a transaction is starting. The master releases (deasserts) this signal when the final phase of the transaction has begun;

IRDY#- Initiator (Master) Ready. A signal on this line is

formed by the current initiator (master). When an operation

Read is performed this signal is an indicator of master

readiness to accept the data. When an operation Write is

performed this signal is an indicator of data validity (the data

asserted on the AD lines).

TRDY# - Target Ready. A signal on this line is formed by

the current slave (target). When an operation Read is

performed this signal is an indicator of the data validity on

the lines AD. When an operation Write is performed this

signal is an indicator of slave readiness to accept the data.

STOP - The signal on this line is formed by the current slave and informs the master that the current transaction must be stopped.

IDSEL – Initialization Device Select. This line is used for

selecting a chip during Read or Write operations in the

configuration process (any device, attached to the PCI has

got 256 internal registers; states of these registers determine

the current configuration of slaves).

DEVSEL – Device Select. This line is asserted by a slave when it has recognized its own address during the address phase on the AD lines; for the master it serves as an indication that a slave has been selected.

Lines of Arbitration. Unlike the other lines, these lines are not shared: each module attached to the PCI bus is allotted its own pair of lines, connected directly to the bus arbiter. Namely:
REQ# - By using this line a device requests permission from the bus arbiter to use the bus.

GNT# - By signals on this line the arbiter informs a device,

which has made the request, that it has obtained a grant to

use the bus.

Lines of Errors Indication. Through these lines signals of appeared

errors are transferred. Namely:

Page 215: Computer Architecture All lecture.pdf

PERR# - Parity Error. Signals on this line inform, that the

control system has detected a parity error.

SERR# - System Error. Any device may assert a signal on this line to report a detected error (a parity error during the address phase or other errors detected while analysing the data codes).

Instructions of PCI Bus.

The functioning of the PCI Bus can be presented as a sequential execution of transactions. A transaction is a single session of data transfer (an address phase followed by one or more data phases). Any transaction is initiated by a master and supported by a slave. When a master begins a transaction, it places an instruction code on the C/BE lines during the address phase.

The PCI standard specifies the following instructions:

Interrupt Acknowledge. It is one of READ types of instructions,

which is envisaged for PCI’s controllers (during an address transfer the

code of the instruction isn’t asserted, but during data transfer phase on the

BE lines a code is asserted, which indicates the size of the required

interrupt identifier);

Special cycle. This instruction points out, that the master “wishes” to

send a message to one or to several slaves;

I/O READ and I/O WRITE. These instructions are used for data

transfer between the master and the selected device of I/O;

Memory READ and Memory WRITE. These instructions are used for transferring data between the master and the main memory.
Configuration READ and Configuration WRITE. These instructions allow the master to read the information concerning the current slave configuration and to renew its parameters if necessary;

Page 216: Computer Architecture All lecture.pdf

Transaction of PCI Bus
Figure: A PCI READ transaction followed by a WRITE transaction over clock cycles T1–T7, showing the CLK, AD, C/BE#, FRAME#, IRDY#, DEVSEL# and TRDY# lines. For the READ, the AD lines carry the address, a turnaround ("reverse") cycle and then the data; for the WRITE, the address and the data follow each other without a turnaround. The C/BE# lines carry the READ command and byte-enable permissions, then the WRITE command and byte-enable permissions; an empty cycle separates the two transactions.

During the T1 cycle (on the trailing edge of the clock):
1. The Master asserts the address on the AD lines;
2. The Master asserts the READ command on the C/BE# lines;
3. The Master asserts a signal (start of transaction) on the FRAME# line.
During the T2 cycle (on the trailing edge):
1. The Master releases the AD lines so that the Slave will be able to use them during the next cycle;
2. The Master changes the contents of the C/BE# lines, pointing out which of the bytes in the word it will read.
During the T3 cycle (on the trailing edge):
1. The Slave asserts a signal on the DEVSEL# line (confirmation to the Master that it has recognized the address and is ready to respond);
2. The Slave asserts the data on the AD lines;
3. The Slave asserts a signal on the TRDY# line (confirmation to the Master that the necessary data have been asserted).
The T4 cycle is in reality usually empty.
During the T5 cycle the same chip initializes a WRITE operation (so the T5 cycle is identical to T1).
During the T6 cycle the Master itself places the data on the AD lines (there is no necessity for a "reverse" turnaround cycle).
During the T7 cycle the memory accepts the data.

Page 217: Computer Architecture All lecture.pdf

Figure: Timing of data transfer on an asynchronous bus — the address and operation mode, Ready, Accept, and data signals are shown as voltage versus time, with the instants t0 … t5 marked.

Page 218: Computer Architecture All lecture.pdf

Figure: Timing of data transfer on a synchronous bus — the bus clock, the address and operation mode, and the data signals are shown as voltage versus time over one bus cycle (t0, t1, t2); logic values 0 and 1 are separated by a forbidden range around the threshold voltage.

Page 219: Computer Architecture All lecture.pdf

Timing of data transfer on a mixed bus

Page 220: Computer Architecture All lecture.pdf

PCI Bus Arbitration


Page 224: Computer Architecture All lecture.pdf

Questions to Lecture 5.

1. What is the interconnection structure, and by which factors is

it determined?

2. List the types of exchanges (input and output) that are characteristic of each module, draw up a sketch for the

CPU module (indicate the major forms of input and output)

and explain from which modules the CPU receives data (What

kind of operations are specific for the CPU module?).

3. What kind of buses does the System Bus include? What

function does each of these buses carry out?

4. What do we call the width of a bus? Which parameters of the

Computer System are determined by widths of some buses

included in the System Bus?

5. What operation does the control signal “I/O read” set?

6. What problems may arise, when only one (single) bus is used

in a computer system?

7. Give examples of using multiple bus structures in computer

systems and explain necessity of including each of the buses

in the system.

8. List and describe main generic types of buses.

9. Which methods of arbitration are used now? What’s the

difference between these methods?

Page 225: Computer Architecture All lecture.pdf

Centralized Arbitration
Figure: Single-Level Centralized Arbitration — devices 1–6 share a single Bus Request line and a single Bus Grant line connected to the Arbiter.
Figure: Two-Level Centralized Arbitration — devices 1–6 are connected to the Arbiter through separate Request and Grant lines for the First Level Bus and for the Second Level Bus.

Page 226: Computer Architecture All lecture.pdf

Distributed Arbitration
Figure: Distributed arbitration — devices 1–6 are chained through their in/out arbitration connections along an Arbitration Line (pulled up to 5 V) and share common Busy and Bus Request lines.
Arbitration of PCI Bus.
Figure: PCI bus arbitration — each Controller of Device has its own REQ# and GNT# line pair connected directly to the central ARBITER.

Page 227: Computer Architecture All lecture.pdf

Lecture № 6

Memory Subsystem. Internal Memory.

1. Functions and characteristics of Memory subsystem.

2. Semiconductor memories: RAM ( DRAM & SRAM), ROM.

3. Internal organization of Memory Chips.

Literature.
1. Stallings W. Computer Organization and Architecture. Designing and performance, 5th ed. – Upper Saddle River, NJ : Prentice Hall, 2002.
2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer organization, 4th ed. – McGRAW-HILL INTERNATIONAL EDITIONS, 1996.
3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. - Upper Saddle River, NJ : Prentice Hall, 2002.

Page 228: Computer Architecture All lecture.pdf

Memory is a functional part of Computer System which is

intended for accepting, storing and presentation of data.

Characteristics:
Location
Capacity
Unit of transfer
Access method
Performance
Physical type
Physical characteristics
Organisation

Location

CPU (Local - Registers )

Internal ( Main & Cache)

External (the external memory devices are such devices for the CPU access to which it is possible only through corresponding I/O module)

Capacity
Word size (the natural unit is the word)
Number of words or bytes (for external devices)

Unit of Transfer

Internal

Usually governed by data bus width and measured in words: 8, 16 or 32 bits

External

Usually a block which is much larger than a word and estimated in bytes or bits (designated as N)

Page 229: Computer Architecture All lecture.pdf

Addressable unit

Smallest location which can be uniquely addressed

Word internally

Cluster on some external devices.

These methods are used for the access to

external devices

Access Methods (1)

Sequential (The stored data and additional address information are divided into elements, called records)

Start at the current position and read through in order

Access time depends on location of data and previous location

e.g. tape

Direct

Individual blocks have unique address

Access is by jumping to the vicinity plus a sequential search

Access time depends on location and previous location

e.g. disk

In both types (direct and sequential) of accesses a combined mechanism of Read/Write is used

Page 230: Computer Architecture All lecture.pdf

These methods are used for the access to

internal devices

Access Methods (2)

Random

Individual addresses identify locations exactly

Access time is independent of location or previous access

e.g. RAM

Associative

Data is located by a comparison with contents of a portion of the store

Access time is independent of location or previous access

e.g. cache
Performance

Access time

Time between presenting the address and getting the valid data (for random access).

Time, which is necessary for the transference of the Read/Write mechanism in the required position with respect to the carrier (for direct and sequential accesses).

Memory Cycle time (TC)

Time may be required for the memory to “recover” before next access

Cycle time is access + recovery

Transfer Rate (R[bit/sec])

for direct and sequential accesses: R = N/(TN – TA),
where TN is the time needed to read or write a block of data of N bits, and
TA is the average access time
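A small worked example of this formula in C; the block size, access time and total time are assumed, illustrative values only:

```c
#include <stdio.h>

/* Worked example of R = N / (TN - TA). */
int main(void) {
    double N  = 8.0 * 4096.0;   /* block size: 4 Kbytes = 32768 bits        */
    double TA = 0.010;          /* assumed average access time: 10 ms       */
    double TN = 0.012;          /* assumed total time to finish the block   */

    double R = N / (TN - TA);   /* transfer rate in bits per second         */
    printf("R = %.0f bit/s (about %.1f Mbit/s)\n", R, R / 1e6);
    return 0;
}
```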

Page 231: Computer Architecture All lecture.pdf

Physical Types

Semiconductor

RAM

Magnetic

Disk & Tape

Optical

CD & DVD

Others

Hologram

Physical Characteristics

Decay, Volatility, Erasable or not, Power consumption

Organisation

Physical arrangement of bits into words

Not always obvious

e.g. interleaved (memory banks with interleaved addresses)

Page 232: Computer Architecture All lecture.pdf

There are relationships among these three characteristics:

Smaller access time – greater cost per bit

Greater capacity – smaller cost per bit

Greater capacity – greater access time

To meet performance requirements it is necessary to use

expensive, relatively lower-capacity memories with fast access

time.

The way out of this dilemma is not to rely on a single memory

component or technology, but to employ a memory hierarchy

Memory Hierarchy

Registers

In CPU

Internal or Main memory

May include one or more levels of cache

“RAM”

External memory

Backing store

The Bottom Line

How much?

Capacity

How fast?

Time is money

How expensive?

Page 233: Computer Architecture All lecture.pdf

Table: The Memory Hierarchy
Internal Memory: Registers; Cache; Main Memory
External Memory: Magnetic Disk; CD-R; CD-ROM
Off-line Storage: Magnetic Tape Storage; Magnetic Optical Disks; Optical Disks

As one goes down the hierarchy, the following occur:

a. Decreasing cost/bit

b. Increasing capacity

c. Increasing access time

d. Decreasing frequency of access of the Memory by the CPU

Thus: smaller, more expensive, faster memories are

supplemented by larger, cheaper, slower memories

Figure: Performance of a simple two-level memory — average access time [ms] as a function of the percentage of accesses involving only Level 1 [%].
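The curve in the figure follows from the usual two-level relation T_avg = h*T1 + (1 − h)*(T1 + T2), where h is the fraction of accesses satisfied by Level 1. A minimal sketch in C with assumed values of T1 and T2 (not figures from the lecture):

```c
#include <stdio.h>

/* Average access time of a simple two-level memory versus the hit ratio h. */
int main(void) {
    double T1 = 0.01, T2 = 10.0;            /* ms: fast level, slow level    */
    for (double h = 0.0; h <= 1.0001; h += 0.25) {
        double t_avg = h * T1 + (1.0 - h) * (T1 + T2);
        printf("h = %.2f  ->  T_avg = %6.3f ms\n", h, t_avg);
    }
    return 0;
}
```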

Page 234: Computer Architecture All lecture.pdf

The basis for the validity of condition d is a principle

known as locality of reference. During the course of

execution of a program, memory references by the

processor, for both instructions and data, tend to

cluster. Over a long period of time the CPU is

primarily working with these clusters of memory

references. It is possible to organize data across the

hierarchy such, that the percentage of accesses to

each succeedingly lower level is substantially less

than to the level above.

Other forms of memory may be included in the

hierarchy (Expanded Storage (intermediate storage), Disk

Cache).

Page 235: Computer Architecture All lecture.pdf

Cycle times of semiconductor memories range from a few

hundred nanoseconds to less than 10 nanoseconds.

Memory unit is called RAM if any location can be

accessed for Read or Write operation in some fixed

amount of time that is independent of the location’s

address

Semiconductor Memory

RAM, ROM, PROM, EPROM,

Flash Memory, EEPROM, CMOS

RAM

Read/Write at an arbitrary address (at random)

Volatile

Temporary storage

Static or dynamic

Dynamic RAM

Bits stored as a charge in capacitors

Charges leak

Need refreshing even when powered

Simpler construction

Smaller per bit

Less expensive

Need refresh circuits

Slower

Main memory

Page 236: Computer Architecture All lecture.pdf

Static RAM

Bits stored as on/off switches (using traditional flip-flop logic gate configurations)

No charges to leak

No refreshing needed when powered

More complex construction

Larger per bit

More expensive

Does not need refresh circuits
Faster

Cache

Read Only Memory (ROM)

Permanent storage

Micro-programming

Library subroutines

Systems programs (BIOS)

Function tables

CMOS – Complementary Metal-Oxide Semiconductor. CMOS is

intended for storing the Computer current configuration. It stores data

practically without using energy.

Page 237: Computer Architecture All lecture.pdf

Memory cells are usually organized in the form of an array, in

which each cell is capable of storing one bit of information.

For semiconductor memories one of the key

design issues is the number of bits of data that

may be read/written at a time.

At one extreme is an organization in which the physical arrangement of

cells in the array is the same as the logical arrangement (as perceived by the

processor) of words in the memory: the array is organized into W words of

B bits each and B bits are read/written at a time.

Types of ROM
ROM - written during manufacture; very expensive for small runs
PROM - Programmable (once); needs special equipment to program (a programmer)
Read-"mostly" memories:
EPROM - Erasable Programmable ROM; erased by UV light (all the storage at once)
EEPROM - Electrically Erasable PROM; takes much longer to write than to read
Flash memory - intermediate between EPROM and EEPROM; erased electrically

Organisation in detail

Page 238: Computer Architecture All lecture.pdf

At the other extreme is the so-called “one-bit-per-chip” organization, in

which data read/written one bit at a time.

A 16Mbit chip can be organised as 1M of 16 bit words

A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each word in chip 1 and so on

A 16Mbit chip can be organised as a 2048 x 2048 x 4-bit array.
Multiplexing the row address and the column address reduces the number of address pins: 11 pins address 2^11 = 2048 rows, and the same pins then address one of 2048 columns.
Adding one more address pin doubles the range of values, i.e. gives x4 the capacity; so far, we have gone through the following generations, at a rate of roughly one every three years: 1K, 4K, 16K, ..., 16M.
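A minimal C sketch of this row/column multiplexing for a 2048 x 2048 array; the cell address used is arbitrary, chosen only for illustration:

```c
#include <stdio.h>

/* A 22-bit cell address is sent over 11 address pins in two steps:
 * the upper 11 bits as the row address (with RAS), then the lower
 * 11 bits as the column address (with CAS).                        */
int main(void) {
    unsigned cell = 0x2ACF53 & 0x3FFFFF;     /* arbitrary 22-bit address  */
    unsigned row    = (cell >> 11) & 0x7FF;  /* upper 11 bits -> RAS step */
    unsigned column =  cell        & 0x7FF;  /* lower 11 bits -> CAS step */
    printf("cell 0x%06X -> row %u, column %u\n", cell, row, column);
    return 0;
}
```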

Refreshing

Refresh circuit included on chip

Disable chip

Count through rows

Read & Write back

Takes time

Slows down apparent performance

Page 239: Computer Architecture All lecture.pdf

Logically, the memory array is organized as 4 square arrays of

2048 by 2048 elements. The elements of the array are connected

by horizontal (row) and vertical (column) lines. Each horizontal

line connects to the Select terminal of each cell in its row, and

each vertical line connects to the Data-In/Sense terminal of each

cell in its column.

Address lines supply address of the word to be selected. In this

example 11 address lines are used to select one of 2048 rows;

Typical 16 Mb DRAM (4M x 4)

Page 240: Computer Architecture All lecture.pdf

additional 11 address lines select one of 2048 columns. Four

data lines are used for input and output of 4 bits to and from a

data buffer. The row line selects which row of cells is used for

reading or writing. Since only 4 bits are read/written to this

DRAM, there must be multiple DRAMs connected to the

memory controller in order to read/write a word of data to the

bus.

Integrated circuits are mounted on packages of DIP (dual in-line

package) type: pins are located in 2 rows(lines). The number of pins is

usually less or equal to 32.

Packaging

Page 241: Computer Architecture All lecture.pdf

Fig. (a) shows an example EPROM package (8-Mbit chip). It

is “one-word-per-chip” package. It includes 32 pins, which

support following signal lines:

(A0 – A19) the address of the word being accessed;

(D0 –D7)the data to be read out, consisting of 8 lines;

Vcc the power supply;

Vss the ground pin;

CE a chip enable;

Vpp a program voltage that is supplied during programming.

Fig. (b) shows an example DRAM pin package (16-Mbit

chip organized as 4M x 4).

Since RAM can be updated, the data pins are input/output.

The write enable (WE) and output enable (OE) pins indicate

whether this is write or read operation.

RAS means row address select, and CAS – column address

select.

Page 242: Computer Architecture All lecture.pdf

Module Organisation(1)

Page 243: Computer Architecture All lecture.pdf

If RAM chip contains only 1 bit per word, then we will need a

number of chips equal to number of bits per word. Fig.

Module Organisation (1) shows how a memory module

consisting of 256K 8—bit words could be organised. For 256K

words an 18-bit address is needed and is supplied to the module

from some external source. The address is presented to 8 chips,

each of which provides the input/output of 1 bit.

When the larger memory is required, an array of chips is

needed. The possible organization of 1Mbyte memory is shown

in Fig. Module Organisation (2). In this case, we have 4

columns of chip, each column contains 256K words. 20 address

lines are needed. The 18 of them are routed to 32 modules. The

other 2 are input to a group select logic module, that sends a chip

enable signal to one of 4 columns of modules.
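A minimal C sketch of the group-select logic just described: the 20-bit address is split into a 2-bit column (chip-enable) select and an 18-bit word address routed to every chip. The example address is arbitrary:

```c
#include <stdio.h>

/* Group-select logic for the 1-Mbyte module organisation. */
int main(void) {
    unsigned address = 0xC1234 & 0xFFFFF;       /* arbitrary 20-bit address */
    unsigned column  = (address >> 18) & 0x3;   /* selects 1 of 4 columns   */
    unsigned word    =  address & 0x3FFFF;      /* 18-bit address to chips  */
    printf("address 0x%05X -> enable column %u, word 0x%05X\n",
           address, column, word);
    return 0;
}
```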

Page 244: Computer Architecture All lecture.pdf


Module Organisation (2)

Page 245: Computer Architecture All lecture.pdf

Hamming's Correcting Control Code Formation
Figure (panels a–d): three overlapping circles A, B and C (a Venn diagram) are filled with data bits (1s and 0s); the parity of each circle is used to form the check bits, and recomputing the parities after a change locates the erroneous bit.

Page 246: Computer Architecture All lecture.pdf

Questions to Lecture 6

1. Describe existed methods of access to different types of

memory.

2. Which parameters are used for the estimation of memory device performance? What does each of these parameters

characterize?

3. Explain the necessity of a memory hierarchy employment.

4. What is RAM? Describe distinguishing characteristics of

RAM. What’s the difference between DRAM and SRAM?

5. What is ROM?

6. Explain the necessity of implementation of EPROM

(EEPROM, Flash Memory).

Page 247: Computer Architecture All lecture.pdf

Error detecting and correcting with the help of Correcting Codes.
Figure: On a write, the M data bits enter the memory and a function f computes K check bits from them; both the M data bits and the K check bits are stored. On a read, f is recomputed from the stored M data bits and compared with the stored K check bits; the comparison produces an error signal and drives a corrector, which delivers the corrected output data.
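A minimal sketch of this idea in C for a single-error-correcting Hamming code on a 4-bit data word (the same principle the Venn-diagram figure illustrates). The bit placement and the example data value are choices made for the illustration, not taken from the lecture:

```c
#include <stdio.h>

/* Hamming code for 4 data bits: data bits d1..d4 occupy codeword positions
 * 3,5,6,7; check bits occupy positions 1,2,4. Comparing the recomputed check
 * bits with the stored ones (the syndrome) gives the position of a bad bit. */

static unsigned checks(unsigned w) {                /* w: 7-bit word, bit0 = position 1 */
    unsigned b[8];
    for (int i = 1; i <= 7; i++) b[i] = (w >> (i - 1)) & 1;
    unsigned c1 = b[3] ^ b[5] ^ b[7];
    unsigned c2 = b[3] ^ b[6] ^ b[7];
    unsigned c4 = b[5] ^ b[6] ^ b[7];
    return c1 | (c2 << 1) | (c4 << 3);              /* placed at positions 1,2,4 */
}

int main(void) {
    unsigned data = 0xB;                            /* example data word 1011    */
    unsigned word = ((data & 1) << 2) | (((data >> 1) & 1) << 4) |
                    (((data >> 2) & 1) << 5) | (((data >> 3) & 1) << 6);
    word |= checks(word);                           /* store data + check bits   */

    unsigned received = word ^ (1u << 5);           /* flip position 6 in transit */
    unsigned syndrome = (checks(received) ^ received) & 0x0B; /* compare check bits */
    unsigned pos = ((syndrome >> 0) & 1) | (((syndrome >> 1) & 1) << 1) |
                   (((syndrome >> 3) & 1) << 2);    /* read syndrome as a number */
    if (pos) received ^= 1u << (pos - 1);           /* correct the single bad bit */
    printf("error at position %u, corrected word matches: %s\n",
           pos, received == word ? "yes" : "no");
    return 0;
}
```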

Page 248: Computer Architecture All lecture.pdf

Lecture № 7

CACHE MEMORY 1. Purpose and principles of work. Elements of Cache Design.

2. Mapping Function. Direct, associative and set associative techniques.

3. Cache organization in PENTIUM & PowerPC processors.

Literature.
1. Stallings W. Computer Organization and Architecture. Designing and performance, 5th ed. – Upper Saddle River, NJ : Prentice Hall, 2002.
2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer organization, 4th ed. – McGRAW-HILL INTERNATIONAL EDITIONS, 1996.
3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. - Upper Saddle River, NJ : Prentice Hall, 2002.

Page 249: Computer Architecture All lecture.pdf

Cache memory is intended to give a memory speed approaching that of the fastest memories available, and at the same time to provide this fast memory at the price of less expensive types of semiconductor memories.

Cache

Small amount of fast memory

Sits between normal main memory and CPU (off-chip cache)

May be located on CPU chip or module (on-chip cache)

Page 250: Computer Architecture All lecture.pdf

Cache operation - overview

CPU requests contents of memory location

Check cache for this data
If present, get from cache (fast)
If not present, read the required block from main memory into the cache, then deliver from cache to CPU
Cache includes tags to identify which block of main memory is in each cache slot

Page 251: Computer Architecture All lecture.pdf

Cache Design

Size (optimal size: between 1K and 512K)
Mapping Function (direct, associative, set associative)
Replacement Algorithm (LRU, FIFO, LFU, Random)
Write Policy (information integrity) (Write through, Write back)
Block Size (no definitive optimum value has been found)
Number of Caches (Single- or two-level, Unified or Split)

Page 252: Computer Architecture All lecture.pdf

Size does matter

Cost: more cache is expensive
Speed: a large cache is slightly slower than a small one; checking the cache for data takes time

Page 253: Computer Architecture All lecture.pdf

The Cache Efficiency is characterized by hit ratio. The hit ratio is a ratio of

all hits in the cache to the number of CPU’s accesses to the memory.

Typical Cache Organization

Page 254: Computer Architecture All lecture.pdf

Mapping Function

Cache of 64 KBytes
Cache slot of 4 bytes, i.e. the cache has 16K (2^14) lines (slots) of 4 bytes
16-MByte main memory, 24-bit address (2^24 = 16M)

Page 255: Computer Architecture All lecture.pdf

Direct Mapping

Each block of main memory maps to only one cache line i.e. if a block is in cache, it must be in one specific

place

Address is in two parts Least Significant w bits identify unique word

Most Significant s bits specify one memory block The MSBs are split into a cache line field r and a

tag of s-r (most significant)

Page 256: Computer Architecture All lecture.pdf

Direct Mapping Address Structure

Tag (s–r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits
24-bit address:
the low-order 2 bits select one of 4 words in the 4-byte block;
22-bit block identifier;
8-bit tag (= 22 – 14): the high-order 8 bits of the memory address of the block are stored in the 8 tag bits associated with its location in the cache;
14-bit slot or line (determines the position of the block in the cache).
No two blocks that map to the same line have the same Tag field.
Check contents of the cache by finding the line and checking the Tag.
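A minimal C sketch of splitting a 24-bit address into these direct-mapping fields; the address value is arbitrary:

```c
#include <stdio.h>

/* Direct mapping fields of the running example: 8-bit tag, 14-bit line, 2-bit word. */
int main(void) {
    unsigned address = 0x16339C & 0xFFFFFF;   /* arbitrary 24-bit address */
    unsigned word = address & 0x3;            /* low-order 2 bits         */
    unsigned line = (address >> 2) & 0x3FFF;  /* next 14 bits             */
    unsigned tag  = (address >> 16) & 0xFF;   /* high-order 8 bits        */
    printf("address 0x%06X -> tag 0x%02X, line 0x%04X, word %u\n",
           address, tag, line, word);
    return 0;
}
```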

Page 257: Computer Architecture All lecture.pdf

Direct Mapping Cache Line Table

Cache line | Main Memory blocks held
0          | 0, m, 2m, …, 2^s – m
1          | 1, m+1, 2m+1, …, 2^s – m + 1
…          | …
m–1        | m–1, 2m–1, 3m–1, …, 2^s – 1

Page 258: Computer Architecture All lecture.pdf

Direct Mapping Cache Organization

Page 259: Computer Architecture All lecture.pdf

Direct Mapping Example

Page 260: Computer Architecture All lecture.pdf

Direct Mapping: advantages &

disadvantages

Simple
Inexpensive
Fixed location for a given block: if a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high

Page 261: Computer Architecture All lecture.pdf

Associative Mapping

A main memory block can load into any line of cache

Memory address is interpreted as tag and word

Tag uniquely identifies block of memory Every line’s tag is examined for a match Cache searching gets expensive

Page 262: Computer Architecture All lecture.pdf

Associative Mapping Example

Page 263: Computer Architecture All lecture.pdf

Associative Mapping Address Structure
Tag (22 bits) | Word (2 bits)
22-bit tag stored with each 32-bit block of data
Compare the tag field with the tag entry in the cache to check for a hit
Least significant 2 bits of the address identify which 16-bit word is required from the 32-bit data block
e.g. Address: FFFFFC, Tag: FFFFFC, Data: 24682468, Cache line: 3FFF

Page 264: Computer Architecture All lecture.pdf

Set Associative Mapping

Cache is divided into a number of sets Each set contains a number of lines A given block maps to any line in a given set

e.g. Block B can be in any line of set i

e.g. 2 lines per set 2 way associative mapping

A given block can be in one of 2 lines in only one set

Page 265: Computer Architecture All lecture.pdf

Set Associative Mapping Example

13-bit set number
Block number in main memory is taken modulo 2^13
000000, 00A000, 00B000, 00C000 … map to the same set

Page 266: Computer Architecture All lecture.pdf

Two Way Set Associative Cache Organization

Page 267: Computer Architecture All lecture.pdf

Set Associative Mapping Address Structure

Use the set field to determine which cache set to look in
Compare the tag field to see if we have a hit
Tag (9 bits) | Set (13 bits) | Word (2 bits)
e.g.  Address    Tag    Data       Set number
      1FF 7FFC   1FF    12345678   1FFF
      001 7FFC   001    11223344   1FFF
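A minimal C sketch of a 2-way set-associative lookup using these fields. The address corresponds to the slide's "1FF 7FFC" example (tag 1FF, set 1FFF); the tag-store contents are invented purely to show one hit and one miss:

```c
#include <stdio.h>

/* 9-bit tag, 13-bit set, 2-bit word; check both ways of the addressed set. */
#define WAYS 2
#define SETS (1u << 13)

static unsigned tags[SETS][WAYS];             /* tag store (assumed valid)   */

int main(void) {
    unsigned address = 0xFFFFFC;              /* tag 1FF, set 1FFF, word 0   */
    unsigned word = address & 0x3;
    unsigned set  = (address >> 2) & 0x1FFF;  /* 13-bit set number           */
    unsigned tag  = (address >> 15) & 0x1FF;  /* 9-bit tag                   */

    tags[set][0] = 0x001;                     /* pretend way 0 holds another block */
    tags[set][1] = tag;                       /* pretend way 1 holds our block     */

    for (unsigned way = 0; way < WAYS; way++)
        if (tags[set][way] == tag)
            printf("hit in set 0x%04X, way %u (tag 0x%03X, word %u)\n",
                   set, way, tag, word);
    return 0;
}
```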

Page 268: Computer Architecture All lecture.pdf

Two Way Set Associative Mapping Example

Page 269: Computer Architecture All lecture.pdf

Replacement Algorithms (1)

Direct mapping

No choice Each block only maps to one line Replace that line

Page 270: Computer Architecture All lecture.pdf

Replacement Algorithms (2)

Associative & Set Associative

Hardware implemented algorithm (for speed)
Least Recently Used (LRU): e.g. in a 2-way set associative cache — which of the 2 blocks is LRU?
First In First Out (FIFO): replace the block that has been in the cache longest
Least Frequently Used (LFU): replace the block which has had the fewest hits
Random
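For the 2-way set associative case, LRU needs only one bit per set; a minimal sketch in C of that idea (one set only, names are my own):

```c
#include <stdbool.h>
#include <stdio.h>

/* One "use" bit per set remembers which way was touched last,
 * so the other way is the least recently used one.            */
static bool way1_used_last;                  /* false -> way 0 was used last */

static unsigned victim(void) {               /* way to replace on a miss     */
    return way1_used_last ? 0 : 1;
}

static void touch(unsigned way) {            /* record a hit in 'way'        */
    way1_used_last = (way == 1);
}

int main(void) {
    touch(0);                                /* hit in way 0                 */
    printf("replace way %u\n", victim());    /* -> way 1 is LRU              */
    touch(1);
    printf("replace way %u\n", victim());    /* -> way 0 is LRU              */
    return 0;
}
```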

Page 271: Computer Architecture All lecture.pdf

Write Policy

Must not overwrite a cache block unless main memory is up to date

Multiple CPUs may have individual caches
I/O may address main memory directly

Page 272: Computer Architecture All lecture.pdf

Write through

All writes go to main memory as well as to the cache
Multiple CPUs can monitor main memory traffic to keep the local (to the CPU) cache up to date
Lots of traffic
Slows down writes

Page 273: Computer Architecture All lecture.pdf

Write back

Updates initially made in cache only
Update bit for the cache slot is set when an update occurs
If a block is to be replaced, write it to main memory only if the update bit is set
Other caches get out of sync
I/O must access main memory through the cache
15% of memory references are writes

Page 274: Computer Architecture All lecture.pdf

Number of Caches

On-chip cache (L1): reduces the processor's external bus activity, speeds up execution times and increases overall system performance
External cache (L2): if an L2 SRAM cache is used, then frequently the missing information can be quickly retrieved; the data can be accessed using the fastest type of bus transfer
Contemporary designs include both L1 and L2 caches
The potential savings due to the use of the L2 cache depend on the hit rates in both the L1 and L2 caches

Page 275: Computer Architecture All lecture.pdf

The 80386 does not include an on-chip cache
The 80486 includes a single on-chip cache of 8 KBytes
The initial Pentium includes 2 on-chip caches (L1): one for data and one for instructions, each 8 KBytes, using a line size of 32 bytes and a two-way set associative organization
The Pentium Pro and Pentium II include 2 on-chip caches (L1) (size: 8–16 KBytes) and one off-chip cache (L2) of size from 256 KBytes up to 1 MByte.
The core of the processor includes four main nodes (units):
Pentium Cache Organization

Page 276: Computer Architecture All lecture.pdf

Node of Fetch/Decoding: fetches (in program order)

instructions from the Code Cache (L1), decodes them, forms a sequence of micro-instructions and saves them in the Micro-instructions Buffer.

Micro-instructions Buffer: stores the current sequence of the

micro-instructions prepared for execution.

Node of Distribution/Execution: plans the execution of the micro-operations taking into account their data dependences and the availability of the necessary resources (that is why instructions can be executed in an order which differs from the sequence in which they entered the Micro-instructions Buffer). This node organizes the speculative (forecasting) execution of micro-operations. After the micro-operations are executed, the node fetches the results from the cache and stores them in the processor's registers.

Node of Termination: determines when the result of forestalled micro-operations (operations executed speculatively, ahead of time) can be considered final and must be committed to the Data Cache; it also deletes from the buffers those instructions which are not necessary at this moment.

Page 277: Computer Architecture All lecture.pdf

Figure: Pentium II processor block diagram — the Interface Node with the Bus connects the System Bus and the external Cache L2 (256K–1M) to the on-chip Code Cache L1 (8, 16K) and Data Cache L1 (8, 16K); the processor core consists of the Node of Fetch/Decoding, the Micro-instructions Buffer, the Node of Distribution/Execution (which issues READ, LOAD and STORE accesses to the caches) and the Node of Termination.

Page 278: Computer Architecture All lecture.pdf

Figure: Structure of Pentium II Internal Data Cache — the cache is organized as two ways, Set (Way) 0 and Set (Way) 1, of 4 Kb each; every way consists of 128 elements (0–127) of two 32-byte banks (Bank 0, Bank 1); the service fields hold an LRU bit per element and, for each way, a Directory (Directory 0, Directory 1) whose entries contain the state bits of each line.

Page 279: Computer Architecture All lecture.pdf

Data Cache Consistency
To provide cache consistency the data cache supports the MESI protocol (modified/exclusive/shared/invalid). The data cache includes two status bits per tag, so each line can be in one of four states:
Modified: the line in the cache has been modified and differs from that in main memory, so it is available only in this cache.
Exclusive: the line in the cache is the same as that in main memory and is not present in any other cache.
Shared: the line in the cache is the same as that in main memory and may be present in another cache.
Invalid: the line in the cache does not contain valid data.
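A minimal sketch in C of the four states and one illustrative transition (a snooped write by another processor invalidates a locally held copy); this is a simplified example of the idea, not the full Pentium protocol:

```c
#include <stdio.h>

/* The four MESI line states and one example transition. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

static const char *name[] = { "Invalid", "Shared", "Exclusive", "Modified" };

static mesi_t snoop_remote_write(mesi_t state) {
    (void)state;                    /* any valid copy must be invalidated */
    return INVALID;
}

int main(void) {
    mesi_t line = SHARED;
    printf("before: %s\n", name[line]);
    line = snoop_remote_write(line);
    printf("after remote write: %s\n", name[line]);
    return 0;
}
```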

Page 280: Computer Architecture All lecture.pdf

The internal cache is controlled by two bits of the control registers: CD (cache disable) and NW (not write through). There are two Pentium instructions that can be used to control the cache: INVD – flushes the internal cache memory and signals the external cache (if any) to flush. WBINVD – performs the same function but also signals an external write-back cache to write back modified blocks before flushing.

Cache Control

Page 281: Computer Architecture All lecture.pdf

Table: PowerPC Internal Caches
Model        | Caches | Size      | Bytes/Line | Organization
PowerPC 601  |   1    | 32 KBytes |     32     | 8-way set associative
PowerPC 603  |   2    |  8 KBytes |     32     | 2-way set associative
PowerPC 604  |   2    | 16 KBytes |     32     | 4-way set associative
PowerPC 620  |   2    | 32 KBytes |     64     | 8-way set associative
PowerPC Cache Organization

Page 282: Computer Architecture All lecture.pdf

Figure: PowerPC 620 (G3) block diagram — a 128-bit L2/Bus Interface connects to the 32-KByte Instruction Cache and the 32-KByte Data Cache; the Instruction Cache feeds the Instruction Unit over a 128-bit path, while the Data Cache exchanges data with the Load/Store Unit over 64-bit paths; the execution core contains three Integer ALUs with Integer Registers, the Load/Store Unit, and a Floating-Point ALU with Floating-Point Registers.

Page 283: Computer Architecture All lecture.pdf

Questions to Lecture № 7

1. What is the main purpose of Cache Memory implementation?

2. Describe principles of Cache Memory work.

3. Enumerate elements of Cache Design.

4. Draw up a block-diagram of Pentium processor and explain

functions of its main nodes.

5. How is the Data Cache Consistency ensured?

Page 284: Computer Architecture All lecture.pdf

Lecture № 8

External Memory

1. Types of external memory. Data organization and formatting.

2. RAID (Six levels of RAID).

3. Optical memory.

Literature.

1. Stallings W. Computer Organization and Architecture. Designing and performance, 5th ed. – Upper Saddle River, NJ : Prentice Hall, 2002.

2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer organization,4th ed. – McGRAW-HILL INTERNATIONAL EDITIONS, 1996.

3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. - Upper Saddle River, NJ : Prentice Hall, 2002.

Types of External Memory

Magnetic Disk

RAID

Removable

Optical

CD-ROM

Page 285: Computer Architecture All lecture.pdf

CD-Writable (WORM)

CD-R/W

DVD

Magnetic Tape
Magnetic Disk

Metal or plastic disk coated with magnetizable material (iron oxide … rust)

Range of packaging

Floppy

Winchester hard disk

Removable hard disk

Data Organization and Formatting.

Page 286: Computer Architecture All lecture.pdf

Concentric rings or tracks (between 500 and 2000 tracks on one side)

Tracks divided into sectors (data are read by blocks = sectors; there can be between 10 and 100 sectors)

Adjacent tracks are separated by gaps. This prevents errors due to misalignment of the head (a conducting coil).

To simplify the electronics, the same number of bits are typically stored on each track.

Disk Data Layout
Figure: Disk data layout — concentric tracks divided into sectors, with inter-track gaps between adjacent tracks and inter-record gaps between adjacent sectors.
Winchester Disk Track Format
Figure: Winchester disk track format — starting from the index point, the track holds 30 physical sectors (Physical Sector 0 … Physical Sector 29) of 600 bytes each; every sector consists of Gap 1 (17 bytes), an ID Field (7 bytes, beginning with a byte that identifies the start of the ID), Gap 2 (41 bytes), a Data Field (515 bytes) and Gap 3 (20 bytes).

Page 287: Computer Architecture All lecture.pdf

ID Field structure (7 bytes): Synch Byte (1), Track # (2), Head # (1), Sector # (1), CRC (2).
Data Field structure (515 bytes): Synch Byte (1), Data (512), CRC (2) — the CRC holds the control sum of the field.

Data are transferred to and from the disk in blocks; accordingly, data are stored in block-size regions known as sectors. To avoid imposing unreasonable precision requirements, adjacent sectors are separated by intra-track (inter-record) gaps. In order to identify positions within a track, there must be starting points on the track and a way of identifying the start and the end of each sector. These requirements are handled by means of control data recorded on the disk; thus, the disk is formatted with some extra data used only by the disk drive and not accessible to the user. In Fig. Winchester Disk Track Format each track contains 30 fixed-length sectors of 600 bytes each. Every sector holds 512 bytes of data plus control information useful to the disk controller. The ID field is a unique identifier or address used to locate a particular sector. The SYNCH byte is a special bit pattern that marks the beginning of the field. The track number identifies a track on a surface, and the head number identifies a head, since this disk has multiple surfaces. The ID and data fields each contain an error-detecting code (CRC).

Characteristics of Disk Systems
Head Motion: Fixed head (one per track) / Movable head (one per surface)
Disk Portability: Non-removable disk / Removable disk
Sides: Single-sided / Double-sided
Platters: Single-platter / Multiple-platter
Head Mechanism: Contact (floppy) / Fixed gap / Aerodynamic gap (Winchester)


Page 288: Computer Architecture All lecture.pdf

Disk Access Time

Disk Access Time is the main Characteristic of Disk Performance. If removable heads are used and disk drive is operating, then to read/write, the head must be positioned at the desired track and at the beginning of the desired sector on that track. The time it takes to position the head at the track is known as seek time. In either case, once the track is selected, the system waits until the appropriate sector rotates to line up with the head. The time it takes for the sector to reach the head is known as rotational latency.

RAID (Six[seven] levels of RAID).

With the use of multiple disks, there is a wide variety of ways in which the data can be organised and in which redundancy can be added to improve reliability. Industry has agreed on a standardised scheme for multiple-disk database design, known as RAID (Redundant Array of Independent Disks) The RAID scheme consists of six levels. These levels do not imply a hierarchical relationship but designate different design architectures that share three common characteristics:

RAID is a set of physical disk drives viewed by the operating system as a single logical drive.

Data is distributed across the physical drives of an array.

Redundant disk capacity is used to store parity information, which guarantees data recoverability in case of a disk failure.

RAID systems of different levels differ by methods of realisation the second and the third characteristics.

Disk Access Time is equal to the sum of the Seek Time and the Rotational Latency Time.
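A small worked example of this sum in C. The seek time and rotation speed are assumed, typical-looking values, not figures from the lecture; the average rotational latency is taken as half a revolution:

```c
#include <stdio.h>

/* Disk Access Time = Seek Time + Rotational Latency. */
int main(void) {
    double seek_ms = 9.0;                       /* assumed average seek time */
    double rpm     = 7200.0;                    /* assumed rotation speed    */
    double latency_ms = 0.5 * 60000.0 / rpm;    /* half a revolution, in ms  */

    printf("average rotational latency: %.2f ms\n", latency_ms);
    printf("average access time:        %.2f ms\n", seek_ms + latency_ms);
    return 0;
}
```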

Page 289: Computer Architecture All lecture.pdf

RAID, Level 0

No redundancy (is not a true member of the RAID family)

Data is striped across all disks

Round Robin (cyclic) stripe organization

All the user and system data is viewed as being stored on a logical disk; the disk is divided into strips, these strips may be physical blocks, sectors or some other units. The strips are mapped round—robin to consecutive array members. A set of logically consecutive strips that maps exactly one strip to each array member is referred to as a stripe. In an n-disk array the first n logical strips are physically stored as the first strip on each of the n disks, the second n strips are distributed as the second strips on each physical disk, and so on.

Figure: Data Mapping for a RAID Level 0 Array — the Array Management Software maps strips 0–15 of the logical disk round-robin onto four physical disks: Physical Disk 0 holds strips 0, 4, 8, 12; Disk 1 holds 1, 5, 9, 13; Disk 2 holds 2, 6, 10, 14; Disk 3 holds 3, 7, 11, 15.

RAID 1 Mirrored

Page 290: Computer Architecture All lecture.pdf

Mirrored Disks

Data is striped across disks

2 copies of each stripe on separate disks

Read from either

Write to both

Recovery is simple Swap faulty disk & re-mirror No down time

Expensive

RAID 2 Redundancy Through Hamming Code

Disks are synchronized

Very small stripes

Error correction calculated across corresponding bits on disks

Multiple parity disks store Hamming code error correction in corresponding positions.

Lots of redundancy

Figure: RAID 1 (Mirrored) — strips 0–15 are mapped round-robin across four data disks, and an identical copy of each strip is kept on a second set of four mirror disks.

Page 291: Computer Architecture All lecture.pdf

Expensive
Not used

RAID 3

Bit-Interleaved Parity

Similar to RAID 2

Only one redundant disk, no matter how large the array

Simple parity bit for each set of corresponding bits

Data on failed drive can be reconstructed from surviving data and parity info

Very high transfer rates

RAID 4 Block-Level Parity

Each disk operates independently

Good for high I/O request rate

Large stripes

Figure: RAID 2 — bit-level strips b0–b3 on the data disks with Hamming-code check disks f0(b), f1(b), f2(b). Figure: RAID 3 — bit-level strips b0–b3 on the data disks with a single parity disk P(b).

Page 292: Computer Architecture All lecture.pdf

Bit-by-bit parity calculated across corresponding strips on each data disk
Parity stored in the corresponding strip on the parity disk

RAID 5 Block-Level Distributed Parity

Like RAID 4

Parity striped across all disks

Round robin allocation for parity stripe

Avoids the RAID 4 bottleneck at the parity disk

Commonly used in network servers
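A minimal sketch in C of the parity mechanism shared by RAID 3/4/5: the parity strip is the bit-by-bit XOR of the data strips, and a failed disk's strip is rebuilt by XOR-ing the surviving strips with the parity strip. The strip contents and sizes are invented for the illustration:

```c
#include <stdio.h>

/* XOR parity across strips and reconstruction of a lost strip. */
#define DISKS 4
#define STRIP 8                                   /* strip size in bytes    */

int main(void) {
    unsigned char strip[DISKS][STRIP] = {
        "DATA-00", "DATA-01", "DATA-02", "DATA-03" };
    unsigned char parity[STRIP] = {0}, rebuilt[STRIP] = {0};

    for (int d = 0; d < DISKS; d++)               /* compute the parity strip */
        for (int i = 0; i < STRIP; i++) parity[i] ^= strip[d][i];

    int failed = 2;                               /* pretend disk 2 is lost   */
    for (int i = 0; i < STRIP; i++) rebuilt[i] = parity[i];
    for (int d = 0; d < DISKS; d++)
        if (d != failed)
            for (int i = 0; i < STRIP; i++) rebuilt[i] ^= strip[d][i];

    printf("rebuilt strip from failed disk: %s\n", (char *)rebuilt);
    return 0;
}
```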

RAID 6 Redundancy Through 2 Different Codes

The scheme of functioning suggests calculation of 2 control codes stored in different blocks distributed through all disks.

Control codes P and Q are calculated by different algorithms; this allows lost data to be restored even when two disks have failed.

Figure: RAID 4 (Block-Level Parity) — strips 0–15 are distributed across four data disks, and block-level parity strips P(0-3), P(4-7), P(8-11), P(12-15) are stored on a dedicated parity disk. Figure: RAID 5 (Block-Level Distributed Parity) — the parity strips P(0-3) … P(16-19) are distributed round-robin across all the disks, interleaved with the data strips.

Page 293: Computer Architecture All lecture.pdf

The hardware is more complicated.
Figure: RAID 6 — data strips are distributed across the disks together with two independent check strips per stripe, P(0-3) … P(12-15) and Q(0-3) … Q(12-15), placed on different disks in a round-robin fashion.

Page 294: Computer Architecture All lecture.pdf

Optical Memory. Optical Disk Products:

CD

A nonerasable disk that stores digitized audio information. The standard system uses 12-cm disks and can record more than 60 minutes of uninterrupted

playing time.

CD-ROM

A nonerasable disk used for storing computer data. The standard system uses 12-cm disks and can hold more than 550 Mbytes.

DVD

Digital video disk. A technology for recording video signals and other large-volume data, based on methods of information (data) compression.

WORM

Write-Once Read Many is more easily written than CD-ROM, making single-copy disks commercially feasible; holds from 200 to 800 Mbytes of data.

Erasable Optical Disk

A disk that uses optical technology but that can be easily erased and rewritten. A typical capacity is 650 Mbytes.

Both the audio CD and the CD-ROM share similar technology. The main difference is in the formats of data presentation.

Page 295: Computer Architecture All lecture.pdf

Optical Storage CD-ROM

Originally for audio

650 (775) Mbytes, giving over 70 (73.2) minutes of audio

Poly-carbonate coated with highly reflective coat (aluminum)

Data stored as pits

Reads by reflecting laser

Constant packing density

Constant linear velocity (1.2 m/s)

CD-ROM Block Format (2352 bytes in total):

Sync (12 bytes): 00, FF x 10, 00
Id (4 bytes): Min, Sec, Sector, Mode
Data (2048 bytes)
Auxiliary / Layered ECC (288 bytes)

Mode 0 = blank data field
Mode 1 = 2048 bytes data + error correction

Page 296: Computer Architecture All lecture.pdf

Mode 2 = 2336 bytes data. The CD-ROM block format consists of the following fields: 1. Sync: identifies the beginning of a block; 2. Header: contains the block address and the mode byte; 3. Data: user's data; 4. Auxiliary: additional user data in mode 2 (in mode 1, this is a 288-byte error-correcting code).
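As a sketch of the mode 1 layout just described (the field offsets follow the 12 + 4 + 2048 + 288 = 2352 byte structure; the parsing function itself is only an illustration):

# Sketch: splitting a 2352-byte CD-ROM mode 1 block into its fields.
SECTOR_SIZE = 2352

def parse_mode1_block(block):
    assert len(block) == SECTOR_SIZE
    sync   = block[0:12]         # 00, FF x 10, 00 - marks the beginning of the block
    header = block[12:16]        # Min, Sec, Sector, Mode
    data   = block[16:2064]      # 2048 bytes of user data
    ecc    = block[2064:2352]    # 288 bytes of layered error-correcting code
    return sync, header, data, ecc

sync, header, data, ecc = parse_mode1_block(bytes(SECTOR_SIZE))
print(len(sync), len(header), len(data), len(ecc))   # 12 4 2048 288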

Random Access on CD-ROM

Difficult

Move head to rough position

Set correct speed

Read address

Adjust to required location

Page 297: Computer Architecture All lecture.pdf

Other Optical Storage

CD-Writable (WORM): now affordable; compatible with CD-ROM drives

CD-RW (Erasable): getting cheaper; mostly CD-ROM drive compatible

DVD Storage

Digital Video Disk: used to indicate a player for movies; only plays video disks.

Digital Versatile Disk: used to indicate a computer drive; will read computer disks and play video disks.

DVD technology

Multi-layer

Page 298: Computer Architecture All lecture.pdf

Very high capacity (4.7 Gbytes per layer)

Full length movie on single disk

Using MPEG compression

Finally standardised (honest!)

Movies carry regional coding

Players only play correct region films

Can be “fixed”

DVD-Writable

Loads of trouble with standards

First generation DVD drives may not read first generation DVD-W disks

First generation DVD drives may not read CD-RW disks

Magnetic Tape

Serial access

Slow

Page 299: Computer Architecture All lecture.pdf

Very cheap

Backup and archive

Page 300: Computer Architecture All lecture.pdf

Questions to Lecture 8

1. Why can RAID 0 not be considered a true member of the RAID family?

Compare RAID 5 and RAID 6 (illustrate the answer by pictures).

2. List the well-known Optical Disk products and describe their characteristics.

3. Give an example of CD-ROM block formats.

4. List the major characteristics of Disk System.

5. How is evaluated the Disk Access Time? What does the Disk Access Time

characterize? What is RAID? List three common characteristics of RAID.

6. Describe the typical Disk data layout (draw a picture).

7. How are sector positions within a track identified? Give an example of disk

track format (describe the meaning of each field).

Page 301: Computer Architecture All lecture.pdf

Lecture № 9 Virtual Memory

1. Virtual Memory Techniques.

2. Virtual Memory Address translation.

3. Use of an associative-mapped TLB.

Literature.

1. Stallings W. Computer Organization and Architecture. Designing and Performance, 5th ed. – Upper Saddle River, NJ : Prentice Hall, 2002.

2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer Organization, 4th ed. – McGRAW-HILL INTERNATIONAL EDITIONS, 1996.

3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. – Upper Saddle River, NJ : Prentice Hall, 2002.

Page 302: Computer Architecture All lecture.pdf

Virtual-memory Technique Usually only some parts of a program that are executed are first brought into the main memory; when a new part (segment) of a program is to be moved into the full memory, it must replace another segment already in the memory. In modern computers, the operating system moves programs and data automatically between the main memory and secondary storage. Techniques that automatically move program and data blocks into a physical main memory when they are required for execution

are called virtual memory techniques. The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage and is usually implemented in part by software techniques. Programs, and hence the processor, reference an instruction and data space, that is independent of the available physical main memory space. The binary addresses that the processor issues for either instructions or

data are called virtual or logical addresses.

Page 303: Computer Architecture All lecture.pdf

These addresses are translated into physical addresses by a combination of hardware and software components. If a virtual address refers to a part of the program or data space that is currently in the physical memory, then the contents of the appropriate location in the memory are accessed immediately. On the other hand, if the referenced address is not in the main memory, its contents must be brought into a suitable location in the main memory before they can be used.

The figure Virtual Memory Organization shows a typical

organization that implements virtual memory. A special hardware unit,

called Memory Management Unit (MMU), translates virtual

addresses into physical addresses. When the desired data are in the main memory, these data are fetched just as in the cache mechanism. If the data are not in the main memory, the MMU causes the operating system to bring the data into the memory from the disk. Transfer of data between the disk and the main memory is performed using DMA.

Page 304: Computer Architecture All lecture.pdf

Virtual Memory Organization

[Figure: the processor issues virtual addresses to the MMU; the MMU produces the physical addresses used by the cache and the main memory; data moves between disk storage and the main memory by DMA transfer.]

Page 305: Computer Architecture All lecture.pdf

Virtual Memory Address Translation

[Figure: the virtual address from the processor is split into a virtual page number (high-order bits) and an offset (low-order bits); the virtual page number is added to the page table base register to select an entry in the page table; that entry supplies control bits and the page frame in memory, which is combined with the offset to form the physical address in main memory.]

Page 306: Computer Architecture All lecture.pdf

Address Translation. A simple method for translating virtual addresses into physical addresses is to assume that all programs and data are composed of fixed-length units called pages. Each page consists of a block of words that occupy contiguous locations in the main memory. Pages commonly range from 2K to 16K bytes in length. They constitute the basic unit of information that is moved between the main memory and the disk whenever the translation mechanism determines that a move is required. Pages should not be too small, because the access time of a magnetic disk is much longer than that of the main memory. The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage and is usually implemented in part by software techniques. A virtual memory address translation method based on the concept of fixed-length pages: each virtual address generated by the processor, whether it is for an instruction fetch or an operand fetch/store operation, is interpreted as a virtual page number (high-order bits) followed by an offset (low-order bits) that specifies the location of a particular byte (or word) within a page. Information about the main memory location of each page is kept in a page table. This information includes the main memory address where the page is

Page 307: Computer Architecture All lecture.pdf

stored and the current status of the page. An area in the main memory that can hold one page is called a page frame. The starting address of the page table is kept in a page table base register. By adding the virtual page number to the contents of this register, the address of the corresponding entry in the page table is obtained. The contents of this location give the starting address of the page if that page currently resides in the main memory. Each entry in the page table also includes some control bits that describe the status of the page while it is in the main memory. One bit indicates the validity of the page, that is, whether the page is actually loaded in the main memory. This bit allows the operating system to invalidate the page without actually removing it. Another bit indicates whether the page has been modified during its residency in the memory. Other control bits indicate various restrictions that may be imposed on accessing the page. For example, a program may be given full read and write permission, or it may be restricted to read access only. The page table information is used by the MMU for every read and write access. Every access to a word specified by a virtual address demands two operations with the main memory. Thus, a straightforward virtual memory scheme would have the effect of doubling the memory access time. To overcome this

Page 308: Computer Architecture All lecture.pdf

problem, most virtual memory schemes make use of a special cache for page table entries. The page table is kept in the main memory, however, a copy of a small portion of the page table can be accommodated within the MMU. This portion consists of the page table entries that correspond to the most recently accessed pages. A small cache, usually called the Translation Lookaside Buffer (TLB) (буфер быстрой переадресации), is incorporated into the MMU for this purpose. In addition to the information that constitutes a page table entry, the TLB includes the virtual address of the entry. Address translation proceeds as follows. Given a virtual address, the MMU looks in the TLB for the referenced page. If the page table entry for this page is found in the TLB, the physical address is obtained immediately. If there is a miss in the TLB, then the required entry is obtained from the page table in the main memory, and the TLB is updated.
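A compact sketch of this translation path (the 4K-byte page size and the dictionary-based page table and TLB are illustrative assumptions, not details given in the lecture):

# Sketch: virtual-to-physical address translation with a small TLB
# consulted before the page table.
PAGE_SIZE = 4096
OFFSET_BITS = 12                       # 2**12 = 4096

page_table = {0: 5, 1: 9, 2: 7}        # virtual page number -> page frame
tlb = {}                               # most recently used page table entries

def translate(virtual_address):
    vpn = virtual_address >> OFFSET_BITS           # high-order bits
    offset = virtual_address & (PAGE_SIZE - 1)     # low-order bits
    if vpn in tlb:                                 # TLB hit: no page-table access
        frame = tlb[vpn]
    else:                                          # TLB miss: read the page table
        frame = page_table[vpn]                    # (a real MMU would also handle
        tlb[vpn] = frame                           #  page faults here)
    return (frame << OFFSET_BITS) | offset

print(hex(translate(0x1ABC)))          # virtual page 1 -> frame 9, giving 0x9abc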

Page 309: Computer Architecture All lecture.pdf

Use of an Associative-mapped TLB

[Figure: the virtual page number of the virtual address from the processor is compared with the virtual page numbers held in the TLB; on a hit, the matching entry supplies the control bits and the page frame, which is combined with the offset to form the physical address in main memory; on a miss, the page table in the main memory is consulted.]

Page 310: Computer Architecture All lecture.pdf

Questions to Lecture 9

1. Give a definition of Virtual Memory Techniques. Draw up a

scheme of Virtual Memory Organization and explain roles of

the MMU in this scheme.

2. Describe the process of Virtual Memory Address Translation

(draw up a scheme of this process).

Page 311: Computer Architecture All lecture.pdf

Glossary on the course Computer Organization and Architecture

Address Translation 19
Architecture of the Computer System (CS) 2
Bus 5, 6, 7, 8, 9
Bus Structure 5, 6
Cache 13, 14, 15
Clock cycle 8
CMOS 12
Computer 2, 3
Computer Performance 4
Central Processing Unit (CPU) 5
Data 2
Disk Access Time 17
EEPROM 10
EPROM 10
External Memory 16, 17
Flash Memory 10
Format 2
Function 4
Hardware 3
Information 2, 3
Interface 3
I/O Module 5
Memory 5, 10, 11
Memory Hierarchy 10
Mezzanine Architecture 8
Organization of the CS 4
PCI Bus 9
PROM 10
Protocols 3, 4
RAM 10, 11
Redundant Array of Independent Disks 17, 18
Replacement Algorithms 13, 14
ROM 12
Semiconductor Memory 10, 11, 12, 13
Software 3
Structure 4
TLB 19, 20
Virtual-memory Technique 18

Page 312: Computer Architecture All lecture.pdf

2

Data are base elements of information, such as numbers, letters,

symbols and so on, which are processed or carried out by human or

computer (or by some machine) [sometimes the information itself,

prepared for certain purposes (in a special form) is considered as

data].

Information is the meaning conferred (присваиваемое содержимое) on the data.

Format is a way of data representation, or a scheme of data

positioning.

Computer is a device or a complex of devices, which is intended

for mechanization or automating of data processing, and which is

constructed on the base of electronic elements (transistors, logic

circuits, magnet elements and so on).

[Analog Computer is a computing device, which processes data

given in a form of continuously changing physical values, the

meanings of which may be measured (such values may be angle or

linear transfers, electric voltage, electric current power, time and so

on). These analog values are processed by mechanical or some

other physical methods, by measuring results of such operations.

Such type of computers are usually used for solving equations,

describing processes in real scale of time, when initial data is input

from special measuring data monitors.]

[Digital Computer is an electronic computing device, which

receives a discrete input data, processes it in accordance with the

list of instructions stored inside it and generates resulting output

data. (Instructions may be considered as a special type of data,

which are coded in correspondence with format; these instructions:

a) manage data transfer as inside the computer itself, so the

computer internal and peripheral devices (input-output devices), b)

determine arithmetic or logic operations to be performed).]

Page 313: Computer Architecture All lecture.pdf

3

[Hybrid Computer is a computing system, in which elements of

analog and digital computers are combined. These computers are

used for solving equations by implementing analog devices, but for

storage, future processing and results representation digital devices

are implemented.]

Composition of Computer is called configuration.

Hardware consists of tangible (palpable) objects: integrated

circuits, printed boards, cables, memory devices, printers, some

others technical devices and physical equipment.

Software is the detailed instructions that control the operation of a computer system.

Interface is:

(1) a relation between two processing components;

(2) a complete set of agreements (a language in the common sense) concerning input and output signals, by means of which the following pairs of data processors may exchange data: computer device – computer device; program – program medium; human being – data processing system; and some others. These agreements are called protocols. Protocols are sequences of technical requirements that must be met by the designers of any device so that its work is successfully compatible (concordant) with other devices.

Definition Architecture of the Computer System (CS) is a

specification of its interfaces, which determines data

processing and includes: methods of data coding, system of

instructions, principles of software-hardware interaction. It

is also determined as a set of information, which is

necessary and sufficient for programming in the machinery

code.

Page 314: Computer Architecture All lecture.pdf

4

Definition The operational units and their interconnections

that realize the architecture of the CS is the Organization of

the CS. All Intel x86 family share the same basic architecture

The IBM System/370 family share the same basic architecture

This gives code compatibility and software succession. Organization differs between different versions. Architecture is more conservative than organization.

Structure is the way of merging (uniting) components of some subsystem in one (whole) unit.

Function is an operation of individual component as a part of the structure.

Definition. The Computer Performance (CP) is determined by

the number of certain (well-known) operations per unit of time.

The generalized estimation of the CP is a number of

transactions per second.

The basic performance characteristics of a computer

system: processor speed, memory capacity, interconnection

data rates.

The instruction fetch consists of reading an instruction from a

location in the memory. The instruction execution may involve several operations and depends

on the nature of the instruction.

Address Space(AS) is a set of addresses, which the

microprocessor is able to generate.

The way of connecting the various modules is called the

interconnection structure. The interconnection structure is determined by character of exchange

operations, which are specific for each module.

Major forms of input and output for the modules:

Memory: Typically, a memory module will consist of N words of equal

length. Each word is assigned a unique numerical address (0, 1, …, N-1).

Page 315: Computer Architecture All lecture.pdf

5

A word of data can be read from or written into the memory. The nature

of the operations is indicated by READ or WRITE control signals. The

location for the operation is specified by an address.

I/O Module: It’s functionally similar to the memory (from internal point

of view). There are two operations READ and WRITE. Further, an I/O

module may control more than one external device. We can refer to each

of the interfaces to an external device as a port and give each a unique

address (e.g., 0, 1, 2.,…, M-1). In addition, there are external data paths

for the input and output of data with an external device. Finally, an I/O

module may be able to send interrupt signals to the CPU.

CPU: CPU reads in instructions and data, writes out data after processing,

and uses control signals to control the overall operation of the system. It

also receives interrupt signals.

Types of transfers supported by interconnection structure.

Memory to CPU: The CPU reads an instruction or unit of data from

memory.

CPU to Memory: The CPU writes a unit of data to memory.

I/O to CPU: The CPU reads data from I/O device via an I/O module.

CPU to I/O: The CPU sends data to the I/O device.

I/O to or from the Memory: For these two cases, an I/O module is

allowed to exchange data directly with memory, without going through

the CPU, using direct memory access (DMA).

Multiplexer is a functional device which permits two or more data-link channels to share the same common data-transfer device.

A bus is a set of electric pathways and service

electronic devices (framing), providing exchange

of data among computer units and devices.

A communication pathway connecting two or more devices is a bus.

Bus Structure

A system bus consists, typically, of from 50 to 100 separate lines, which can

be classified into three functional groups: data, address and control lines

(power lines are usually omitted ).

Page 316: Computer Architecture All lecture.pdf

6

Data Bus (Line)

The data lines provide a path for moving data between system modules.

Number of lines is referred as WIDTH of the data bus (the number of lines

determines how many bits can be transferred at a time)

Address Bus (Line)

Identify the source or destination of data

(e.g. CPU needs to read an instruction (data) from a given

location in memory)

Address Bus width determines maximum memory capacity of the

system.
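For example (the width is an assumed figure for illustration): a 32-bit address bus can distinguish 2^32 = 4G different addresses, so at one byte per address it limits the directly addressable memory to 4 Gbytes.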

Used to address both the Main Memory and I/O ports (the higher-order bits are used to select a particular module on the bus, and the lower-order bits select an address in the Memory or an I/O port within the module). E.g., if the width of a bus is equal to 8, then codes 01111111 and below specify cell addresses in the Main Memory module (the module with address 0), and codes from 10000000 and higher specify I/O ports, which are under the control of a module with address 1.

Control Bus(Line)

Is used to control the access to and the use of the data and

address lines.

Control and timing information(indicate validity of data and address information)

Memory read/write signal

Interrupt request

Typical control lines include:

Memory Write: Causes data on the bus to be written

into the addressed location.

Memory Read: Causes data from the addressed

location to be placed on the bus.

Page 317: Computer Architecture All lecture.pdf

7

I/O Write: Causes data on the bus to be output to

the addressed I/O port.

I/O Read: Causes data from the addressed I/O port

to be placed on the bus.

Transfer ACK: Indicates that data have been

accepted from or placed on the bus.

Bus Request: Indicates that a module needs to gain

control of the bus.

Bus Grant: Indicates that a requesting module has

been granted control of the bus.

Interrupt request: Indicates that interrupt is

pending.

Interrupt ACK: Acknowledges that the pending

interrupt has been recognized.

Clock: Used to synchronize operations.

Reset: Initializes all modules.

The operation of any bus is as follows:

If one of the modules “wishes” to send data to

another, it must do two things:

1. Obtain the use of the bus;

2. Transfer data through the bus.

If one of the modules “wishes” to receive data from

the other module it must do:

1. Obtain the use of the bus;

2. Send request to the other module, by putting the

corresponding code on the address lines after

formation signals on the certain control lines.

Page 318: Computer Architecture All lecture.pdf

8

Computer systems contain a number of different

buses that provide pathways between components at

various levels of the computer systems hierarchy.

A bus that connects major computer components (CPU,

Memory, I/O) is called a System Bus.

Up to now the Traditional Bus Architecture has been widely used. In this

case the Computer System includes Local Bus, which connects the CPU,

Cache Memory and some peripheral devices. Cache Memory Controller

provides connections not only with the Local Bus, but with the System Bus

as well (all modules of the Main Memory are connected with the System

Bus). Under such structure all processes of input-output are realized through

the System Bus omitting the CPU, it allows the CPU to perform more

important operations.

Connecting peripheral devices not directly to the System Bus but to an additional bus, the Expansion Bus, which buffers data circulating between the Main Memory and the peripheral devices' controllers, makes it possible to support a large variety of external devices and, at the same time, to separate the information flows "CPU – Memory" and "Memory – I/O Controllers".

The appearance of new high-performance external devices demands an increase in the speed of data transfer through buses; that is why one more High-Speed Bus is often used in contemporary computer systems. This bus unites

high-speed external devices and is connected with the System Bus through

special concordance module (модуль согласования) - Bridge. Such kind of

structure is called Mezzanine Architecture (Мезонинная Архитектура).

The advantage of this structure: high-speed peripheral devices are

integrated with the processor and at the same time they may work

independently (themselves). It means that functioning of the bus doesn’t

depend on the CPU architecture and vice versa.

The bus includes a clock line upon which a clock transmits a regular

sequence of alternating 1s and 0s of equal duration. A single 1-0

transmission is referred to as a clock cycle (bus cycle) and defines a time

slot (интервал). All other devices on the bus can read the clock line, and all

events start at the beginning of a clock cycle. Other bus signals may change

at the leading edge of the clock signal.

With asynchronous timing the occurrence of one event on a bus

follows and depends on the occurrence of a previous event.

Page 319: Computer Architecture All lecture.pdf

9

In actual implementations, electronic switches are used. The output gate of

register is capable of being electrically disconnected from the bus or placing

a 0 or a 1 on the bus. Because it supports these three possibilities, such a gate

is said to have a three—state output. A separate control input is used either

to enable the gate output to drive the bus to 0 or to 1 or to put it in a high-

impedance (electrically disconnected) state. The latter state corresponds to

the open-circuit state of a mechanical switch.

PCI Bus

Peripheral Component Interconnect: high-bandwidth, processor-independent, functions as a mezzanine or peripheral bus

Intel released to public domain

32 or 64 bit, 33 (66) MHz, a transfer rate of 264 (528) Mbytes/sec

50 lines
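For illustration, the quoted peak rates follow directly from the bus width and clock, assuming 64-bit transfers: 8 bytes x 33 MHz = 264 Mbytes/sec, and 8 bytes x 66 MHz = 528 Mbytes/sec.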

PCI Bus Lines (required)

1. System lines: including clock and reset.

2. Address & Data: 32 time-multiplexed lines for address/data; interrupt and validate lines.

3. Interface Control: controls the timing of transactions and provides coordination among initiators and targets.

4. Arbitration: not shared; direct connection to the PCI bus arbiter.

5. Error lines.

PCI Bus Lines (Optional)

Interrupt lines

Not shared

Page 320: Computer Architecture All lecture.pdf

10

Cache support

64-bit Bus Extension

Additional 32 lines

Time multiplexed

2 lines to enable devices to agree to use 64-bit transfer

JTAG/Boundary Scan (For testing procedures)

Memory Hierarchy

Registers

In CPU

Internal or Main memory

May include one or more levels of cache

"RAM"

External memory

Backing store

Semiconductor Memory RAM, ROM, PROM, EPROM,

Flash Memory, EEPROM, CMOS Cycle times of semiconductor memories range from a few hundred

nanoseconds to less than 10 nanoseconds.

Memory unit is called RAM if any location can be accessed for Read or

Write operation in some fixed amount of time that is independent of the

location’s address

RAM

Read/Write at an arbitrary address (at random)

Page 321: Computer Architecture All lecture.pdf

11

Volatile

Temporary storage

Static or dynamic

Dynamic RAM

Bits stored as a charge in capacitors

Charges leak

Need refreshing even when powered

Simpler construction

Smaller per bit

Less expensive

Need refresh circuits

Slower

Main memory

Static RAM

Bits stored as on/off switches (using traditional flip-flop logic gate configurations)

No charges to leak

No refreshing needed when powered

More complex construction

Larger per bit

More expensive

Does not need refresh circuits

Page 322: Computer Architecture All lecture.pdf

12

Faster

Cache

Read Only Memory (ROM)

Permanent storage

Micro-programming

Library subroutines

Systems programs (BIOS)

Function tables

CMOS – Complementary Metal-Oxide Semiconductor. CMOS is

intended for storing the computer's current configuration. It stores data

practically without using energy.

Types of ROM

Written during manufacture

ROM, very expensive for small runs

Programmable (once)

PROM

Needs special equipment to program (programmer)

Read "mostly"

Erasable Programmable (EPROM)

Erased by UV (all the storage)

Electrically Erasable (EEPROM)

Takes much longer to write than read

Flash memory (intermediate between EPROM and EEPROM): erases memory electrically

Cache

Cache memory is intended to give memory speed approaching that of the

fastest memories available, and at the same time provide this fast memory

at the price of less expensive types of semiconductor memories.

Page 323: Computer Architecture All lecture.pdf

13

Small amount of fast memory

Sits between normal main memory and CPU (off-chip cache)

May be located on CPU chip or module (on-chip cache)

Cache Design

Size (optimal size: between 1K and 512K)

Mapping Function (direct, associative, set associative)

Replacement Algorithm (LRU, FIFO, LFU, Random)

Write Policy(Information integrity)(Write through, Write back)

Block Size (no definitive optimum value has been found)

Number of Caches (Single- or two-level, Unified or Split)

The Cache Efficiency is characterized by the hit ratio. The hit ratio is the ratio of the number of cache hits to the total number of the CPU's accesses to memory.
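For illustration (the formula and the figures are assumed examples, not from the lecture), the hit ratio h determines the average access time seen by the CPU under the simple model where a miss costs one main-memory access:

t_avg = h * t_cache + (1 - h) * t_main

e.g., with t_cache = 10 ns, t_main = 100 ns and h = 0.95, t_avg = 0.95 * 10 + 0.05 * 100 = 14.5 ns.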

Replacement Algorithms (1)

Direct mapping

No choice

Each block only maps to one line

Replace that line
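A one-line sketch of why there is no choice (the cache size of 128 lines is an assumed example):

# Sketch: direct mapping - each main-memory block maps to exactly one cache line.
NUM_LINES = 128

def cache_line(block_number):
    return block_number % NUM_LINES    # the only line the block can occupy

print(cache_line(5), cache_line(133))  # both print 5: the two blocks compete for line 5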

Replacement Algorithms (2) Associative & Set Associative

Hardware implemented algorithm (speed)

Least Recently used (LRU)

Page 324: Computer Architecture All lecture.pdf

14

e.g. in 2 way set associative

Which of the 2 block is LRU?

First in first out (FIFO)

replace block that has been in cache longest

Least frequently used

replace block which has had fewest hits

Random

Write Policy

Must not overwrite a cache block unless main memory is up to date

Multiple CPUs may have individual caches

I/O may address main memory directly

Write through

All writes go to main memory as well as cache

Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date

Lots of traffic

Slows down writes

Write back

Updates initially made in cache only

Update bit for cache slot is set when update occurs

Page 325: Computer Architecture All lecture.pdf

15

If block is to be replaced, write to main memory only if update bit is set

Other caches get out of sync

I/O must access main memory through cache

15% of memory references are writes

Number of Caches

On-chip cache (L1) Reduces the processor’s external bus activity

Speeds up execution times and increases overall system performance

External cache (L2) If an L2 SRAM cache is used, then frequently the missing

information can be quickly retrieved. The data can be accessed using the fastest type of bus transfer.

Contemporary designs include both L1 and L2 caches The potential savings due to the use of L2 cache depends on the

hit rates in both the L1 and L2 caches

Data Cache Consistency To provide cache consistency the data cache supports a protocol MESI (modified/exclusive/ shared/invalid). The data cache includes two status bits per tag, so each line can be in one of four states: Modified: the line in the cache has been modified and it differs

from that in the main memory, so it is available only in this cache. Exclusive: The line in the cache is the same as that in the main

memory and also is not present in any other cache. Shared: The line in the cache is the same as that in the main

memory and may be present in another cache. Invalid: the line in the cache does not contain valid data.

Cache Control The internal cache is controlled by two bits of the control registers: CD (cache disable) and NW (not write through).

Page 326: Computer Architecture All lecture.pdf

16

There are two Pentium instructions that can be used to control the cache: INVD – flushes the cache memory and signals the external cache (if any) to flush (принудительно обновлять); WBINVD – performs the same function but also signals an external write-back cache to write modified blocks back before flushing.

Types of External Memory

Magnetic Disk

RAID

Removable

Optical

CD-ROM

CD-Writable (WORM)

CD-R/W

DVD

Magnetic Tape

Magnetic Disk

Metal or plastic disk coated with magnetizable material (iron oxide … rust)

Range of packaging

Floppy

Winchester hard disk

Removable hard disk

Winchester Disk Track Format: each track contains 30 fixed-length sectors of 600 bytes each. Every sector holds 512 bytes of data, plus control information useful to the disk controller.

Page 327: Computer Architecture All lecture.pdf

17

The ID field is a unique identifier or address used to locate a particular sector. The SYNCH byte is a special bit pattern that marks the beginning of the field. The track number identifies a head, since this disk has multiple surfaces. The ID and data fields each contain an error-detecting code (CRC).

Characteristics of Disk Systems

Head Motion: Fixed head (one per track) / Movable head (one per surface)
Disk Portability: Non-removable disk / Removable disk
Sides: Single-sided / Double-sided
Platters: Single-platter / Multiple-platter
Head Mechanism: Contact (floppy) / Fixed gap / Aerodynamic gap (Winchester)

Disk Access Time. Disk Access Time is the main characteristic of Disk Performance. If movable heads are used and the disk drive is operating, then to read/write, the head must be positioned at the desired track and at the beginning of the desired sector on that track. The time it takes to position the head at the track is known as seek time. In either case, once the track is selected, the system waits until the appropriate sector rotates to line up with the head. The time it takes for the sector to reach the head is known as rotational latency. Disk Access Time is equal to the sum of the Seek time and the Rotational Latency time.
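A worked example (the figures are assumed, not from the lecture): for a drive spinning at 7200 rpm, the average rotational latency is half a revolution, 0.5 * (60 / 7200) s, which is about 4.2 ms; with an average seek time of 9 ms the Disk Access Time is roughly 9 + 4.2 = 13.2 ms per access.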

RAID (Six[seven] levels of RAID). With the use of multiple disks, there is a wide variety of ways in which the data can be organised and in which redundancy can be

Page 328: Computer Architecture All lecture.pdf

18

added to improve reliability. Industry has agreed on a standardised scheme for multiple-disk database design, known as RAID (Redundant Array of Independent Disks) The RAID scheme consists of six levels. These levels do not imply a

hierarchical relationship but designate different design architectures

that share three common characteristics:

RAID is a set of physical disk drives (набор приводов

магнитных дисков) viewed by operating system as a single logical

drive.

Data is distributed across the physical drives of an array.

Redundant disk capacity is used to store parity information

(контрольная информация), which guarantees data recoverability in

case of a disk failure.

RAID systems of different levels differ in the methods of realising the second and the third characteristics.

Virtual-memory Technique

Usually only some parts of a program that are executed are first brought into the main memory; when a new part (segment) of a program is to be moved into the full memory, it must replace another segment already in the memory. In modern computers, the operating system moves programs and data automatically between the main memory and secondary storage. Techniques that automatically move program and data blocks into a physical main memory when they are required for execution are called virtual memory techniques. The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage and is usually implemented in part by software techniques. Programs, and hence the processor, reference an instruction and data space, that is independent of the available physical main memory space. The binary addresses that the processor issues for either instructions or data are called virtual or logical addresses. A special hardware unit, called Memory Management Unit (MMU),

translates virtual addresses into physical addresses. When the desired data are in the main memory, these data are fetched just as in the cache

mechanism. If the data are not in the main memory, the MMU causes

the operating system to bring the data into the memory from the disk.

Page 329: Computer Architecture All lecture.pdf

19

Transfer of data between the disk and the main memory is performed

using the DMA.

Address Translation. A simple method for translating virtual addresses into physical addresses is to assume that all programs and data are composed of fixed-length units called pages. Each page consists of a block of words that occupy contiguous locations in the main memory. Pages commonly range from 2K to 16K bytes in length. They constitute the basic unit of information that is moved between the main memory and the disk whenever the translation mechanism determines that a move is required. Pages should not be too small, because the access time of a magnetic disk is much longer than that of the main memory. The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage and is usually implemented in part by software techniques. A virtual memory address translation method based on the concept of fixed-length pages: each virtual address generated by the processor, whether it is for an instruction fetch or an operand fetch/store operation, is interpreted as a virtual page number (high-order bits) followed by an offset (low-order bits) that specifies the location of a particular byte (or word) within a page. Information about the main memory location of each page is kept in a page table. This information includes the main memory address where the page is stored and the current status of the page. An area in the main memory that can hold one page is called a page frame. The starting address of the page table is kept in a page table base register. By adding the virtual page number to the contents of this register, the address of the corresponding entry in the page table is obtained. The contents of this location give the starting address of the page if that page currently resides in the main memory. The page table information is used by the MMU for every read and write

access. Every access to a word specified by a virtual address demands two operations with the main memory. Thus, a straightforward virtual memory

scheme would have the effect of doubling the memory access time. To

overcome this problem, most virtual memory schemes make use of a special

cache for page table entries. The page table is kept in the main memory,

however, a copy of a small portion of the page table can be accommodated

within the MMU. This portion consists of the page table entries that

correspond to the most recently accessed pages. A small cache, usually

called the Translation Lookaside Buffer (TLB) (буфер быстрой

Page 330: Computer Architecture All lecture.pdf

20

переадресации), is incorporated into the MMU for this purpose. In addition

to the information that constitutes a page table entry, the TLB includes the

virtual address of the entry.

Address translation proceeds as follows. Given a virtual address, the

MMU looks in the TLB for the referenced page. If the page table entry for

this page is found in the TLB, the physical address is obtained immediately.

If there is a miss in the TLB, then the required entry is obtained from the

page table in the main memory, and the TLB is updated.

Page 331: Computer Architecture All lecture.pdf

Questions

1. In your own words explain the following notions(concepts) and give

examples:

a) data, information, format;

b) computer (analog, digital, hybrid);

c) hardware, software, computer configuration;

d) function, structure, interface;

e) architecture, organization.

2. List the major components of contemporary computer system and

indicate their functions.

3. List operations, which you more often use, when you work with a

computer and explain, which of the computer’s major components are

engaged in a process of executing one of these operations.

4. Analyze the 5 given below definitions of Computer architecture.

Which of these definitions does more than others correspond to the

officially accepted one? (Give a detailed explanation).

1)”The design of the integrated system which provides a useful tool

to the programmer” (Bear)

2)”The study of structure, behavior and design of computers”

(Hayes)

3)”The design of the system specification at a general or subsystem

level”(Abd-Alla)

4)”The art of designing a machine that will be pleasure to work

with”(Foster)

5)”The interface between the hardware and the lowest level

software”(Hennessy and Patterson).

5. Call the minimal number of levels of virtual machine, which can

execute all main computer functions (give explanation).

6. What is the difference between translator and interpreter?

7. Why computer hardware and computer software are considered as

logically equivalent?

8. Describe Architecture and Structure Organization of computers of I,

II, III and IV generations, compare them.

9. Formulate and analyze Key Concepts of von Neumann Architecture.

10. Describe the functional structure of von Neumann machine.

11. Describe the functional structure of IAS. List elements of Architecture

and Structure Organization (details) of IAS.

Page 332: Computer Architecture All lecture.pdf

12. List and describe base electronic components of contemporary

computer.

13. Formulate and analyze Moore's Law.

14. What’s Computer System Performance? List the basic characteristics

of Computer System Performance.

15. What’s Hardwired Program? (What’s programming in Hardware?)

16. What’s Software Program? (What’s programming in Software?)

17. Describe the functional structure of Computer components (Top level

View) in the eye of Interconnection Subsystem.

18. What’s the Main Cycle of Instruction Processing (MCIP)?

19. Describe the architecture of “Hypothetical Machine”. What is the

difference between translator and interpreter?

20. Describe each step of MCIP on the “Hypothetical Machine” for one

concrete instruction.

21. Describe each step of MCIP on the IAS for one concrete instruction.

22. What do we mean under the Interrupts? What is the main reason of

using the Interrupt Mechanism?

23. Draw up diagrams of the Program Flow Control without interrupts

and with interrupts, describe each fragment of the Program Flow Control.

24 Which classes of interrupts must be enabled constantly? (give

explanation) .

25. Describe the mechanism of work with interrupts.

26. In the diagram “Program Flow Control” find points, which

correspond to interrupts of user’s program and explain the necessity of

using these interrupts.

27. How many techniques of I/O operations execution are used? Describe

each of these techniques and compare them.

28. Describe the Direct Memory Access technique.

29. Draw a scheme of DMA Transfer in Computer System.

30. Which approaches can be taken to dealing with multiple interrupts?

Show advantages and disadvantages of these approaches.

31. What is the interconnection structure, and by which factors is it

determined?

32. List the types of exchanges (input and output) that are

characteristically for each module, draw up a sketch for the CPU

module (indicate the major forms of input and output) and explain

from which modules the CPU receives data (What kind of operations

are specific for the CPU module?).

33. What kind of buses does the System Bus include? What function does

each of these buses carry out?

34. What do we call the width of a bus? Which parameters of the

Computer System are determined by widths of some buses included in

the System Bus?

Page 333: Computer Architecture All lecture.pdf

35. What operation does the control signal “I/O read” set?

36. What problems may arise, when only one (single) bus is used in a

computer system?

37. Give examples of using multiple bus structures in computer systems

and explain necessity of including each of the buses in the system.

38. List and describe main generic types of buses.

39 . Which methods of arbitration are used now? What’s the difference

between these methods?

40. Describe existing methods of access to different types of memory.

41. Which parameters are used for the estimation memory devices

performance? What does each of these parameters characterize?

42. Explain the necessity of a memory hierarchy employment.

43. What is RAM? Describe distinguishing characteristics of RAM.

What’s the difference between DRAM and SRAM?

44. What is ROM?

45. Explain the necessity of implementation of EPROM (EEROM,

Flash Memory).

46. What is the main purpose of Cache Memory implementation?

47. Describe principles of Cache Memory work.

48. Enumerate elements of Cache Design.

49. Draw up a block-diagram of Pentium processor and explain

functions of its main nodes.

50. How is ensured the Data Cache Consistency?

52. Why can RAID 0 not be considered a true member of the RAID

family? Compare RAID 5 and RAID 6 (illustrate the answer by

pictures).

53. List the well-known Optical Disk products and describe their

characteristics.

54. Give an example of CD-ROM block formats.

55. List the major characteristics of Disk System.

56. How is evaluated the Disk Access Time? What does the Disk

Access Time characterize? What is RAID? List three common

characteristics of RAID.

57. Describe the typical Disk data layout (draw a picture).

58. How are sector positions within a track identified? Give an

example of disk track format (describe the meaning of each field).

59. Give a definition of Virtual Memory Techniques. Draw up a

scheme of Virtual Memory Organization and explain roles of the

MMU in this scheme.

60. Describe the process of Virtual Memory Address Translation (draw

up a scheme of this process).


Page 337: Computer Architecture All lecture.pdf