CS 325: CS Hardware and Software Organization and Architecture
+CS 325: CS Hardware and Software Organization and Architecture
Computer Evolution and Performance 1
+Outline
Generations in Computer Organization
Milestones in Computer Organization
Von Neumann Architecture
Moore’s Law
CPU Transistor sizes and count
Memory Hierarchy
Performance
Cache Memory
Performance Issues and Solutions
+History – Generations in Computer Organization
Zeroth generation – mechanical computers (1642 – 1945): Blaise Pascal’s mechanical calculator
First generation – vacuum tubes (1945 – 1955): ENIAC, Colossus
Second generation – transistors (1955 – 1965): Harwell CADET, TX-0
Third generation – integrated circuits (1965 – 1980): IBM System/360
Fourth generation – VLSI (Very Large Scale Integration) (1980 – ?): microprocessors, supercomputers
+History – Milestones in Computer Organization
1642 – Blaise Pascal, Mechanical machine with gears used to add and subtract.
c. 1670 – Baron Gottfried Wilhelm von Leibniz, Mechanical machine with gears used to add, subtract, multiply, and divide.
1834 – Charles Babbage, Analytical Engine
Ada Lovelace wrote its “assembler”
1936 – Konrad Zuse, Z1
Calculator made of electromagnetic relays
+History – Milestones in Computer Organization
1940s – John Atanasoff, George Stibitz, Howard Aiken: each worked independently on calculating machines with properties such as binary arithmetic and capacitors for memory.
1943 – Alan Turing and British Government, COLOSSUS, the first electronic computer.
1946 – John Mauchly, John Presper Eckert, ENIAC, vacuum tubes
1949 – Maurice Wilkes, EDSAC, first stored program computer.
+History – Milestones in Computer Organization
1952 – John von Neumann, von Neumann architecture – most current computers use this basic design. Stored-program computer with shared memory for instructions and data.
1950s – Researchers at MIT, Transistorized Experimental Computer Zero (TX-0), first computer to use transistors (3600 transistors).
1960 – Digital Equipment Corporation (DEC), PDP-1, first minicomputer
1961 – IBM, 1401, popular small business computer.
+History – Milestones in Computer Organization
1962 – IBM, 7090 and 7094 using transistors.
Used mainly for scientific computing.
1963 – Burroughs, B5000, first computer designed for a high-level language (Algol, a precursor to C).
1964 – Seymour Cray, Control Data Corporation (CDC), 6600, 10x faster than IBM 7094
Highly parallelized CPU
1965 – DEC, PDP-8, first mass-market minicomputer.
Used a single bus.
+History – Milestones in Computer Organization
1964 – IBM, System/360, family of compatible computers (low end to high end). First computers with multiprogramming. Used integrated circuits (dozens of transistors on one “chip”).
1974 – Intel, 8080, first general purpose computer on a chip.
1974 – Cray-1, First vector computer (single instructions on vectors of numbers).
1978 – DEC, VAX, first 32-bit minicomputer.
+History – Milestones in Computer Organization
1978 – Steve Jobs, Steve Wozniak, Apple personal computer
1981 – IBM, IBM PC becomes the most popular personal computer.
Used MS-DOS by Microsoft as OS. CPU developed by Intel.
1985 – MIPS (company), MIPS, first commercial RISC computer.
1987 – Sun Microsystems, SPARC, popular RISC computer.
1990 – IBM, RS6000, first superscalar machine
+History – Milestones in Computer Organization
1993 – Intel, Pentium CPU released.
32-bit, 60 MHz, 3.2 million transistors, $878
1993 – NVIDIA is founded.
1997 – Intel, Pentium 2 CPU released.
1999 – Intel, Pentium 3 CPU released.
32-bit, 550 MHz
+History – Milestones in Computer Organization
2003 – AMD, Athlon 64 CPU is released.
2005 – AMD, Intel, dual-core CPUs released.
+Computer and CPU Organization
Definition: The terms processor, CPU, and computational engine refer
broadly to any mechanism that drives computation.
+Von Neumann Architecture
Stored Program/Data concept
Characteristic of most modern CPUs
Main memory stores programs (instructions) and data.
ALU operates on binary data.
Control unit interprets instructions from memory and passes information along to ALU.
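The stored-program idea above can be sketched as a toy machine in which instructions and data sit in one shared memory. This is only an illustration: the four opcodes, the accumulator design, and the `run` function are hypothetical and do not correspond to any real instruction set.

```python
# Toy stored-program machine: instructions and data share one memory list.
# The opcodes below are hypothetical, chosen only for illustration.
LOAD, ADD, STORE, HALT = 1, 2, 3, 0

def run(memory):
    """Fetch-decode-execute loop over a single shared memory."""
    pc = 0    # program counter (control unit state)
    acc = 0   # accumulator (register used by the ALU)
    while True:
        opcode, operand = memory[pc], memory[pc + 1]  # fetch
        pc += 2
        if opcode == LOAD:                            # decode and execute
            acc = memory[operand]
        elif opcode == ADD:
            acc = acc + memory[operand]               # ALU operation
        elif opcode == STORE:
            memory[operand] = acc
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 2}")

# Instructions occupy addresses 0-7; data occupies addresses 8-10.
program = [
    LOAD, 8,    # acc = memory[8]
    ADD, 9,     # acc = acc + memory[9]
    STORE, 10,  # memory[10] = acc
    HALT, 0,
    2, 3, 0,    # data: two operands and a result slot
]
result = run(list(program))
print(result[10])  # sum of the two operands stored at addresses 8 and 9
```

Because the program is just numbers in memory, a STORE aimed at addresses 0-7 would overwrite instructions, which is exactly what sharing one memory for instructions and data implies.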
+Von Neumann Architecture – Three Basic Components
CPU
Memory
I/O facilities: control units, buses
All interact to form a complete computer
+Structure of Von Neumann Architecture
+Milestones in Computer Organization
Moore’s Law: Number of transistors on a chip doubles every 18 months.
+Milestones in Computer Organization
Moore’s Law: Increased density of components on chip.
Originally, the number of transistors on a chip was thought to double every year.
Since the 1970s, development has slowed: the number of transistors doubles every 18 months.
Cost of a chip has remained almost unchanged.
Higher packing density means shorter electrical paths, giving higher performance.
Smaller size gives increased flexibility.
Reduced power and cooling requirements.
Fewer interconnections increase reliability.
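As a rough worked example of the doubling rule, the sketch below projects a transistor count forward under a fixed doubling period, and also recovers the doubling period implied by two observed counts. The function names and the starting count of one million are assumptions made for illustration.

```python
import math

def projected_transistors(n0, years, doubling_period_years=1.5):
    """Transistor count after `years`, assuming a fixed doubling period."""
    return n0 * 2 ** (years / doubling_period_years)

def implied_doubling_period(n0, n1, years):
    """Doubling period (in years) implied by growth from n0 to n1 over `years`."""
    return years / math.log2(n1 / n0)

# Doubling every 18 months means two doublings (4x) in 3 years.
print(projected_transistors(1_000_000, 3))
# Doubling every year, as originally thought, means three doublings (8x).
print(projected_transistors(1_000_000, 3, doubling_period_years=1.0))
```

The gap between the two results widens quickly: over 15 years the 18-month rule gives 10 doublings, while the one-year rule gives 15.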
+CPU Transistor Sizes - Intel
8086 – 29K transistors, 3µm
80186 – 29K transistors, 2µm
80286 – 134K transistors, 1.5µm
80386 – 855K transistors, 1µm
80486 – 1.6M transistors, 0.6µm
Pentium 1 – 4.5M transistors, 0.35µm
Pentium 2 – 7.5M transistors, 0.35µm
Pentium 3 – 9.5M transistors, 0.25µm
Pentium 4 – 42M transistors, 0.18µm
Pentium M – 140M transistors, 0.13µm
Pentium D – 230M transistors, 90nm
Core 2 – 291M transistors, 65nm
Current Intel architecture: Core i series “Haswell” i7 – 1.4B transistors, 22nm
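One way to read the list above is to compare the growth in transistor count with what the feature-size shrink alone would allow. The sketch assumes, as an idealization not stated in the slides, that the area of a transistor scales with the square of the feature size; `ideal_density_gain` is a hypothetical helper.

```python
def ideal_density_gain(old_feature_nm, new_feature_nm):
    """Density gain if transistor area scaled with the square of feature size."""
    return (old_feature_nm / new_feature_nm) ** 2

# Figures from the list above: 8086 (29K transistors, 3 µm = 3000 nm)
# and the Haswell i7 (1.4 billion transistors, 22 nm).
count_growth = 1.4e9 / 29e3
density_gain = ideal_density_gain(3000, 22)

print(f"transistor count grew about {count_growth:,.0f}x")
print(f"ideal density gain from the shrink alone: about {density_gain:,.0f}x")
# Whatever the shrink does not explain must come from elsewhere, for example
# a larger die, or from the idealized scaling assumption being too simple.
print(f"remaining factor: about {count_growth / density_gain:.1f}x")
```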
+Growth in CPU Transistor Count
+Memory Hierarchy Importance
1980: No cache memory on CPU.
1989: First Intel CPU that included cache memory.
1995: 2-level cache on CPU.
2003: 3-level cache on CPU.
2013: 4-level cache on CPU. Intel Haswell architecture with integrated Iris Pro Graphics
+How to Increase Performance?
Pipelining
On board cache memory
Branch prediction
Data flow analysis
Speculative execution
+Performance Balance
CPU performance increasing
Memory capacity increasing
Memory speed lagging behind CPU performance
+Core Memory
1950s – 1970s
1 core = 1 bit
Polarity determines logical “1” or “0”
Roughly 1 MHz clock rate.
Up to 32 kB storage.
+Semiconductor Memory
1970s – today
Fairchild: a chip the size of a single core (i.e. 1 bit of magnetic core storage) holds 256 bits
Non-destructive read, but volatile
SDRAM most common, uses capacitors.
Much faster than core
Today: 1.3 – 3.1 GHz
Capacity approximately doubles each year.
Today: 64 GB per single DIMM
+CPU (Logic) and Memory Performance Gap
+Solutions
Increase number of bits retrieved at one time: make DRAM “wider” rather than “deeper”
Change DRAM interface: cache
Reduce frequency of memory access: more complex cache and cache on chip
Increase interconnection bandwidth: high-speed buses, hierarchy of buses
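The benefit of reducing the frequency of memory access with a cache can be sketched with the usual average-access-time formula (hit time plus miss rate times miss penalty). The timings below are assumed, illustrative values rather than measurements, and `average_access_time` is a hypothetical helper.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns

# Assumed timings: 1 ns for a cache hit, 100 ns for a main-memory access.
no_cache = average_access_time(0, 1.0, 100)     # every access goes to DRAM
with_cache = average_access_time(1, 0.05, 100)  # 95% of accesses hit the cache
print(no_cache, with_cache)
```

Under these assumptions the average access time is dominated by the miss rate, which is why larger and more complex caches pay off even though main memory itself gets no faster.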
+Improvements in CPU Organization and Architecture
Increase hardware speed of processor: fundamentally due to shrinking logic gate size
More gates, packed more tightly, increasing clock rate
Propagation time for signals reduced
Increase size and speed of caches: dedicating part of processor chip
Cache access times drop significantly
Change processor organization and architecture: increase effective speed of execution, parallelism
+Problems with Clock Speed and Logic Density
Power: power density increases with density of logic and clock speed, making heat harder to dissipate
Resistor-Capacitor (RC) delay: speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting the components
Delay increases as RC product increases
Wire interconnects thinner, increasing resistance
Wires closer together, increasing capacitance
Memory latency: memory speeds lag processor speeds
Solution: More emphasis on organizational and architectural approaches
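The RC delay point can be illustrated with the textbook relation R = ρL/A for a wire of rectangular cross-section. The dimensions and the resistivity figure below are assumed, illustrative values, not data for any real process, and the helper names are hypothetical.

```python
def wire_resistance(resistivity, length, width, thickness):
    """R = rho * L / A for a wire with a rectangular cross-section (SI units)."""
    return resistivity * length / (width * thickness)

def rc_delay(resistance, capacitance):
    """RC time constant in seconds."""
    return resistance * capacitance

# Assumed values for illustration only.
RHO_COPPER = 1.7e-8   # ohm * metre, approximate bulk resistivity of copper
CAPACITANCE = 0.2e-12 # farads, held constant to isolate the resistance effect

wide = wire_resistance(RHO_COPPER, length=1e-3, width=200e-9, thickness=200e-9)
thin = wire_resistance(RHO_COPPER, length=1e-3, width=100e-9, thickness=100e-9)

# Halving both width and thickness quarters the cross-section,
# so resistance (and the RC delay at fixed capacitance) grows about 4x.
print(thin / wide)
print(rc_delay(thin, CAPACITANCE) / rc_delay(wide, CAPACITANCE))
```

In practice capacitance also rises as wires move closer together, so holding it constant here understates the effect.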
+Intel CPU Performance
+Increased Cache Capacity
Typically two or three levels of cache between processor and main memory.
Chip density increased: more cache memory on chip
Faster cache access
Pentium chip devoted about 10% of chip area to cache.
Pentium 4 devotes about 50%.
+More Complex Execution Logic
Enable parallel execution of instructions
Pipeline works like an assembly line: different stages of execution of different instructions at the same time along the pipeline
Superscalar allows multiple pipelines within a single processor: instructions that do not depend on one another can be executed in parallel
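The assembly-line picture can be made concrete by counting cycles. The sketch assumes an ideal pipeline: every stage takes one cycle and there are no stalls or hazards, which real processors do not achieve. The function names are hypothetical.

```python
def pipelined_cycles(n_instructions, n_stages):
    """Cycles to run n instructions through a k-stage pipeline with no stalls."""
    return n_stages + (n_instructions - 1)

def unpipelined_cycles(n_instructions, n_stages):
    """Cycles when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages

def speedup(n_instructions, n_stages):
    """Ideal speedup of the pipelined design over the unpipelined one."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(n_instructions, n_stages)

for n in (1, 10, 1000):
    print(n, speedup(n, n_stages=5))
```

The first instruction still needs all five cycles to pass through; after that, one instruction completes per cycle, so the speedup approaches the number of stages as the instruction count grows.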
+Diminishing Returns
Internal organization of processors is complex
Can get a great deal of parallelism
Further significant increases likely to be relatively modest
Benefits from cache are reaching limit
Increasing clock rate runs into power dissipation problem
Some fundamental physical limits are being reached
+New Approach – Multiple Cores
Multiple processors on single chip
Large shared cache
Within a processor, increase in performance proportional to square root of increase in complexity
If software can use multiple processors, doubling number of processors almost doubles performance
So, use two simpler processors on the chip rather than one more complex processor
With two processors, larger caches are justified
Power consumption of memory logic less than processing logic
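The trade-off between one complex processor and two simpler ones can be sketched numerically. The square-root rule comes from the slide above; the multi-core estimate uses Amdahl's law, which the slides do not name, and the 95% parallel fraction is an assumed figure. Both function names are hypothetical.

```python
import math

def single_core_performance(complexity):
    """Rule of thumb from the slide: performance ~ sqrt(complexity)."""
    return math.sqrt(complexity)

def multicore_speedup(n_cores, parallel_fraction):
    """Amdahl's law: speedup when only part of the work can run in parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

# Option A: spend a doubled transistor budget on one more complex core.
one_big_core = single_core_performance(2.0) / single_core_performance(1.0)
# Option B: spend it on two copies of the original core (assumed 95% parallel work).
two_simple_cores = multicore_speedup(2, parallel_fraction=0.95)

print(one_big_core, two_simple_cores)
```

Under these assumptions the two simpler cores come out well ahead, but only because the software is assumed to be highly parallel; with a parallel fraction near zero the second core adds almost nothing.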