Complete Lessonplan Aca 12 Unki
7/27/2019 Complete Lessonplan Aca 12 Unki
B.L.D.E.A's V.P. Dr. P.G. Halakatti College of Engineering & Technology, Bijapur.
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Title of the Course: Advanced Computer Architecture
Course Code: 10CS81
Type of the Course: Lecture
Designation: Core
Total Hrs.: 52
Hrs/Week: 04
Exam Hours: 03
Exam Marks: 100
Semester: 08
Course Assessment Methods: Continuous (Three IA Tests & One Main VTU Examination)
Prerequisites:
1. Familiarity with computer organization
2. Basic concepts of cache memory and microprocessor
Syllabus:
PART - A
UNIT - 1
FUNDAMENTALS OF COMPUTER DESIGN: Introduction; Classes of computers;
Defining computer architecture; Trends in Technology, power in Integrated Circuits and cost;
Dependability; Measuring, reporting and summarizing Performance; Quantitative Principles of computer design.
6 hours
UNIT - 2
PIPELINING: Introduction; Pipeline hazards; Implementation of pipeline; What makes
pipelining hard to implement?
6 Hours
UNIT - 3
INSTRUCTION LEVEL PARALLELISM 1: ILP: Concepts and challenges; Basic
Compiler Techniques for exposing ILP; Reducing Branch costs with prediction; OvercomingData hazards with Dynamic scheduling; Hardware-based speculation.
7 Hours
UNIT - 4
INSTRUCTION LEVEL PARALLELISM 2: Exploiting ILP using multiple issue and static scheduling; Exploiting ILP using dynamic scheduling, multiple issue and speculation; Advanced
Techniques for instruction delivery and Speculation; The Intel Pentium 4 as example.
7 Hours
PART - B
UNIT - 5
MULTIPROCESSORS AND THREAD LEVEL PARALLELISM: Introduction; Symmetric shared-memory architectures; Performance of symmetric shared-memory
multiprocessors; Distributed shared memory and directory-based coherence; Basics of
synchronization; Models of Memory Consistency.
7 Hours
UNIT - 6
REVIEW OF MEMORY HIERARCHY: Introduction; Cache performance; Cache Optimizations; Virtual memory.
6 Hours
UNIT - 7
MEMORY HIERARCHY DESIGN: Introduction; Advanced optimizations of Cache performance; Memory technology and optimizations; Protection: Virtual memory and virtual
machines.
6 Hours
UNIT - 8
HARDWARE AND SOFTWARE FOR VLIW AND EPIC: Introduction: Exploiting Instruction-Level Parallelism Statically; Detecting and Enhancing Loop-Level Parallelism;
Scheduling and Structuring Code for Parallelism; Hardware Support for Exposing Parallelism:
Predicated Instructions; Hardware Support for Compiler Speculation; The Intel IA-64
Architecture and Itanium Processor; Conclusions.
7 Hours
TEXT BOOK:
1. John L. Hennessy and David A. Patterson: Computer Architecture: A Quantitative Approach, 4th Edition, Elsevier, 2007.
REFERENCE BOOKS:
1. Kai Hwang: Advanced Computer Architecture: Parallelism, Scalability, Programmability, Tata McGraw-Hill, 2003.
2. David E. Culler, Jaswinder Pal Singh, Anoop Gupta: Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, 1999.
Course Overview and its relevance to program:
The term architecture in computer literature can be traced to the work of Lyle R. Johnson, Muhammad Usman Khan and Frederick P. Brooks, Jr., members in 1959 of the Machine Organization department in IBM's main research center. Johnson had the opportunity to write a proprietary research communication about Stretch, an IBM-developed supercomputer for Los Alamos Scientific Laboratory. In computer science and computer engineering, computer architecture or digital computer organization is the conceptual design and fundamental operational structure of a computer system. It is a blueprint and functional description of requirements and design implementations for the various parts of a computer, focusing largely on the way the central processing unit (CPU) performs internally and accesses addresses in memory. It may also be defined as the science and art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals.
Computer technology has made incredible progress in roughly the last 55 years. This rapid rate of improvement has come both from advances in the technology used to build computers and from innovation in computer design.
Advanced computer architecture aims to develop a thorough understanding of high-performance and energy-efficient computers as a basis for informed software performance engineering and as a foundation for advanced work in computer architecture, compiler design, operating systems and parallel processing.
This course covers pipelined CPU architecture: instruction set design and pipeline structure, dynamic scheduling using scoreboarding and Tomasulo's algorithm, register renaming, software instruction scheduling and software pipelining, superscalar and long-instruction-word architectures (VLIW, EPIC and Itanium), and branch prediction and speculative execution.
Cache memory associativity, allocation and replacement policies, multilevel caches and cache performance issues are covered, and uniprocessor cache coherency issues are discussed with examples. Implementations of shared memory, the cache coherency problem, the bus-based 'snooping' protocol, and scalable shared memory using directory-based cache coherency are explained with practical examples.
Applications:
1. To understand various computer architectures currently used in the market
2. To understand parallel programming
3. To design new computer architectures
PART - A
UNIT I
UNIT WISE PLAN
Chapter Number: 1 No of Hours: 06
Unit Title: FUNDAMENTALS OF COMPUTER DESIGN
Learning Objectives:
At the end of this unit, students will understand:
1. Classes of computers, Practical knowledge of computer architecture
2. Trends in Technology, Power in IC and cost
3. Quantitative Principles
4. Performance
5. Real processor examples
Lesson Plan:
L1. Introduction; Classes of computers
L2. Defining computer architecture
L3. Trends in Technology, power in Integrated Circuits and cost
L4. Dependability.
L5. Measuring, reporting and summarizing Performance
L6. Quantitative Principles of computer design
Assignment Questions:
Q1) Explain the growth in processor and computer performance using a graph.
Q2) Explain the different classes of computers.
Q3) Define computer architecture. Discuss the seven dimensions of an ISA.
Q4) Explain the meaning of the following MIPS instructions and explain the instruction formats.
Q5) List the most important functional requirements an architect faces.
Q6) Explain the different trends in technology.
Q7) Write the formulas for the following: (i) Power_dynamic (ii) Energy_dynamic (iii) Power_static. A 20% reduction in voltage may result in a 10% reduction in frequency. What would be the impact on dynamic power?
Q8) Write the formulas for the following:
(i) cost of IC (ii) cost of die (iii) dies per wafer (iv) die yield
Find the die yield for a die that is 2.0 cm on a side, assuming a defect density of 0.3 per cm² and α = 4.
Q9) Explain MTTF and MTTR. Calculate the reliability of a redundant power supply if the MTTF of a power supply is 5×10⁵ hours and it takes on average 48 hours for a human operator to repair the system. Assume two power supplies are available.
Q10) Explain the different desktop and server benchmarks.
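Q7 and Q8 above are plug-in computations. The sketch below works both out in Python; it assumes the standard quantitative-design relations (dynamic power ∝ C·V²·f, and a die-yield model of wafer yield × (1 + defect density × die area / α)^(−α) with α = 4), and the function names are illustrative, not from any library.

```python
def dynamic_power_ratio(voltage_scale, frequency_scale):
    """Dynamic power is proportional to C * V^2 * f, so scaling V and f
    scales power by voltage_scale^2 * frequency_scale."""
    return voltage_scale ** 2 * frequency_scale

def die_yield(wafer_yield, defect_density, die_area, alpha=4.0):
    """Die yield model: wafer_yield * (1 + defect_density * die_area / alpha)^(-alpha)."""
    return wafer_yield * (1 + defect_density * die_area / alpha) ** (-alpha)

# Q7: 20% lower voltage and 10% lower frequency
ratio = dynamic_power_ratio(0.8, 0.9)
print(f"dynamic power falls to {ratio:.1%} of the original")  # 57.6%

# Q8: 2.0 cm on a side -> 4 cm^2 die area, defect density 0.3 per cm^2, alpha = 4
y = die_yield(1.0, 0.3, 2.0 * 2.0)
print(f"die yield = {y:.3f}")  # 0.350
```

So the 20%/10% reductions cut dynamic power by roughly 42%, and about 35% of the dies on the wafer are good.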
UNIT II
UNIT WISE PLAN
Chapter Number: Appendix A No of Hours: 06
Unit Title: PIPELINING
Learning Objectives:
At the end of this unit, students will understand:
1. Pipeline basics, hazards
2. Implementation of pipeline
3. Pipeline to design parallel processors
4. Performance evaluation of pipelined processors
5. Applications of pipeline
Lesson Plan:
L1. Introduction
L2. Pipeline hazards
L3. Pipeline hazards continued
L4. Implementation of pipeline
L5. Implementation of pipeline continued
L6. What makes pipelining hard to implement?
Assignment Questions:
Q1) What is pipelining? Explain the basics of RISC instruction set.
Q2) Explain the simple implementation of a RISC instruction set.
Q3) Explain the classic five-stage pipeline for a RISC processor and explain the use of pipeline registers.
Q4) Assume that an unpipelined processor has a 1 ns clock cycle and that it uses 4 cycles for ALU operations and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 30%, 20% and 50% respectively. Suppose that, due to clock skew and setup, pipelining the processor adds 0.3 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline?
Q5) Explain the major hurdles of pipelining (pipeline hazards) in brief.
Q6) Explain in detail the data hazard with an example.
Q7) Discuss branch hazards along with reducing pipeline branch penalties and scheduling the branch delay slot.
Q8) Explain the simple implementation of MIPS with a neat diagram.
Q9) Explain the basic pipeline for MIPS and discuss the implementation of control for MIPS & branches.
Q10) Explain the five categories of exceptions.
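Q4 above can be worked through directly: the unpipelined average instruction time is the clock weighted by the operation mix, and the pipelined machine completes one instruction per (stretched) clock. A minimal sketch using the question's numbers:

```python
def avg_instruction_time_unpipelined(clock_ns, op_mix):
    """op_mix: list of (fraction, cycles) pairs. Average time = clock * sum(f * c)."""
    return clock_ns * sum(f * c for f, c in op_mix)

# Q4's numbers: ALU ops (30%) and branches (20%) take 4 cycles, memory ops (50%)
# take 5 cycles; pipelining stretches the 1 ns clock by 0.3 ns of overhead.
unpiped = avg_instruction_time_unpipelined(1.0, [(0.30, 4), (0.20, 4), (0.50, 5)])
piped = 1.0 + 0.3  # one instruction per clock, ignoring latency impact
print(f"unpipelined avg time = {unpiped} ns")   # 4.5 ns
print(f"speedup = {unpiped / piped:.2f}")       # 3.46
```

The pipeline overhead is why the speedup is about 3.46 rather than the ideal 4.5.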
UNIT III
UNIT WISE PLAN
Chapter Number: 2 No of Hours: 07
Unit Title: INSTRUCTION LEVEL PARALLELISM 1
Learning Objectives:
At the end of this unit, students will understand:
1. Parallel processing using ILP
2. Static and dynamic scheduling
3. Speculation
4. Implementation of scheduling algorithms.
5. Implementation of reducing branch costs
Lesson Plan:
L1. ILP: Concepts and challenges
L2. Basic Compiler Techniques for exposing ILP
L3. Reducing Branch costs with prediction
L4. Reducing Branch costs with prediction -Examples.
L5. Overcoming Data hazards with Dynamic scheduling
L6. Overcoming Data hazards with Dynamic scheduling- Examples
L7. Hardware-based speculation
Assignment Questions:
Q1) What is ILP? What are the ILP concepts and challenges?
Q2) Discuss data dependences and hazards.
Q3) Discuss control dependences with examples.
Q4) Explain the basic compiler techniques for exposing ILP with examples.
Q5) Explain the methods for reducing branch costs with prediction.
Q6) Explain the method for overcoming data hazards with dynamic scheduling.
Q7) Explain the various fields in a reservation station with an example.
Q8) Explain Tomasulo's algorithm using a loop-based example.
Q9) Explain hardware-based speculation and explain the basic structure of an FP unit using Tomasulo's algorithm, extended to handle speculation.
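Q5 concerns reducing branch costs with prediction. A minimal sketch of the classic 2-bit saturating-counter predictor, simulated in Python (the class and the example branch address are illustrative only):

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch: states 0-1 predict not-taken,
    states 2-3 predict taken; mispredictions move the counter one step."""
    def __init__(self):
        self.counters = {}  # branch address -> counter value in 0..3

    def predict(self, pc):
        return self.counters.get(pc, 0) >= 2  # True means "predict taken"

    def update(self, pc, taken):
        c = self.counters.get(pc, 0)
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)

# A loop branch taken 9 times then falling through once, repeated twice:
pred = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 2
hits = 0
for taken in outcomes:
    if pred.predict(0x400) == taken:
        hits += 1
    pred.update(0x400, taken)
print(f"{hits}/{len(outcomes)} predictions correct")  # 16/20
```

The 2-bit hysteresis is what keeps the single loop-exit misprediction from costing two wrong predictions on the next loop entry, unlike a 1-bit scheme.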
UNIT IV
UNIT WISE PLAN
Chapter Number: 2 No of Hours: 07
Unit Title: INSTRUCTION LEVEL PARALLELISM 2
Learning Objectives:
At the end of this unit, students will understand:
1. ILP: multiple issue and static scheduling
2. Dynamic scheduling
3. Instruction delivery
4. Exploiting ILP
5. Intel Pentium 4 for understanding ILP
Lesson Plan:
L1. Exploiting ILP using multiple issue and static scheduling
L2. Exploiting ILP using dynamic scheduling, multiple issue and speculation
L3. Exploiting ILP using dynamic scheduling, multiple issue and speculation-examples
L4. Advanced Techniques for instruction delivery and Speculation
L5. Advanced Techniques for instruction delivery and Speculation-examples
L6. The Intel Pentium 4 as example.
L7. The Intel Pentium 4 as example-analysis
Assignment Questions:
Q1) List the five primary approaches in use for multiple-issue processors and their primary characteristics.
Q2) Explain the basic VLIW approach for exploiting ILP using an example.
Q3) Explain exploiting ILP using dynamic scheduling, multiple issue and speculation with examples.
Q4) Explain increasing instruction fetch bandwidth for instruction delivery and speculation.
Q5) Explain the Pentium 4 microarchitecture with a neat diagram.
Q6) List the important characteristics of the recent Pentium 4 640.
Q7) Explain the analysis of the performance of the Pentium 4.
PART - B
UNIT V
UNIT WISE PLAN
Chapter Number: 4 No of Hours: 07
Unit Title: MULTIPROCESSORS AND THREAD LEVEL PARALLELISM
Learning Objectives:
At the end of this unit, students will understand:
1. Multiprocessors
2. Shared-memory architectures
3. Distributed shared memory
4. Performance of symmetric shared-memory multiprocessors
5. Synchronization and memory consistency
Lesson Plan:
L1. Introduction to multiprocessors
L2. Symmetric shared-memory architectures
L3. Performance of symmetric shared-memory multiprocessors
L4. Distributed shared memory
L5. Directory-based coherence
L6. Basics of synchronization
L7. Models of Memory Consistency
Assignment Questions:
Q1) Explain the taxonomy of parallel architectures and draw the basic structure of shared-memory and distributed-memory multiprocessors.
Q2) Suppose you want to achieve a speedup of 80 with 100 processors. What fraction of the original computation can be sequential?
Q3) What is multiprocessor cache coherence? Explain with an example.
Q4) What are the basic schemes for enforcing coherence? Explain in brief.
Q5) Explain snooping protocols and basic implementation techniques with an example protocol.
Q6) Explain the performance of symmetric shared-memory multiprocessors for a commercial workload.
Q7) Explain distributed shared memory and directory-based coherence with an example protocol.
Q8) Explain the basics of synchronization.
Q9) Explain models of memory consistency.
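Q2 above is a direct application of Amdahl's law: speedup = 1 / ((1 − f) + f/n), where f is the parallel fraction. Solving for the sequential fraction s = 1 − f with the question's numbers (a small sketch; the function name is illustrative):

```python
def max_speedup(parallel_fraction, n_processors):
    """Amdahl's law: speedup = 1 / ((1 - f) + f / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_processors)

# Q2: find the sequential fraction s that still permits speedup 80 on 100 processors.
# From 1/80 = s + (1 - s)/100, rearranging gives s = (1/80 - 1/100) / (1 - 1/100).
s = (1 / 80 - 1 / 100) / (1 - 1 / 100)
print(f"sequential fraction = {s:.4%}")              # 0.2525%
print(f"check: speedup = {max_speedup(1 - s, 100):.1f}")  # 80.0
```

Only about a quarter of one percent of the computation may be sequential, which is why near-linear speedups on large processor counts are so hard to achieve.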
UNIT VI
UNIT WISE PLAN
Chapter Number: Appendix C No of Hours: 06
Unit Title: REVIEW OF MEMORY HIERARCHY
Learning Objectives:
At the end of this unit, students will understand:
1. Cache memory
2. Virtual memory
3. Mathematical and theory aspects of cache
4. Problems based on cache
5. Cache Optimization methods
Lesson Plan:
L1. Introduction
L2. Cache performance
L3. Cache Optimizations
L4. Virtual memory
L5. Numerical Problems-1
L6. Numerical Problems-2
Assignment Questions:
Q1) Assume we have a computer where the CPI is 2.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 40% of the instructions. If the miss penalty is 35 clock cycles and the miss rate is 3%, how much faster would the computer be if all instructions were cache hits?
Q2) What do you mean by memory stall cycles? List the different formulas for memory stall cycles.
Q3) Explain different block placement methods with neat diagrams.
Q4) Explain the following terms: (i) write through (ii) write back (iii) write stall and write buffer (iv) write allocate (v) no-write allocate
Q5) Explain the organization of the Opteron data cache with a neat diagram.
Q6) Explain multilevel caches to reduce miss penalty. Discuss average memory access time, local miss rate and global miss rate w.r.t. multilevel caches.
Q7) Suppose that in 1000 memory references there are 50 misses in the first-level cache and 30 misses in the second-level cache. What are the various miss rates? Assume the miss penalty from the L2 cache to memory is 250 clock cycles, the hit time of the L2 cache is 15 clock cycles, the hit time of L1 is 1 clock cycle, and there are 1.4 memory references per instruction. What is the average memory access time and average stall cycles per instruction?
Q8) Compare paging and segmentation with neat diagrams.
Q9) List the typical levels in the memory hierarchy with their important features.
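Q7 above can be worked through with the standard local/global miss-rate definitions: the local L2 miss rate is measured against L1 misses, the global L2 miss rate against all references. A sketch with the question's numbers (variable names are illustrative):

```python
refs = 1000
l1_misses, l2_misses = 50, 30
refs_per_instr = 1.4
l1_hit, l2_hit, mem_penalty = 1, 15, 250  # all in clock cycles

l1_miss_rate = l1_misses / refs             # 5% (also the global L1 miss rate)
l2_local_miss_rate = l2_misses / l1_misses  # 60% of L1 misses also miss in L2
l2_global_miss_rate = l2_misses / refs      # 3% of all references go to memory

# AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 local miss rate * penalty)
amat = l1_hit + l1_miss_rate * (l2_hit + l2_local_miss_rate * mem_penalty)

# Stall cycles per instruction: every L1 miss pays the L2 hit time, every
# global L2 miss pays the memory penalty, scaled by references per instruction.
stalls_per_instr = refs_per_instr * (l1_miss_rate * l2_hit
                                     + l2_global_miss_rate * mem_penalty)
print(f"AMAT = {amat:.2f} cycles")                    # 9.25
print(f"stalls per instruction = {stalls_per_instr:.2f}")  # 11.55
```

Note how the 60% local L2 miss rate, not the 3% global rate, is what enters the AMAT formula: AMAT conditions on having already missed in L1.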
UNIT VII
UNIT WISE PLAN
Chapter Number: No of Hours: 06
Unit Title: MEMORY HIERARCHY DESIGN
Learning Objectives:
At the end of this unit, students will understand:
1. Memory hierarchy and cache optimization
2. Memory technology
3. Virtual machines
4. Cache performance
5. Protection using virtual memory and virtual machines
Lesson Plan:
L1. Introduction to memory hierarchy design
L2. Advanced optimizations of Cache performance
L3. Memory technology and optimizations
L4. Protection: Virtual memory
L5. Virtual machines
L6. Numerical problems
Assignment Questions:
Q1) Explain the following optimization techniques which reduce hit time:
(i) small and simple caches (ii) way prediction (iii) trace caches
Q2) Explain the compiler optimization techniques to reduce miss rate.
Q3) Differentiate between SRAM and DRAM. Draw the internal organization of a 64-Mbit DRAM.
Q4) List the eleven advanced optimizations of cache performance and explain any one.
Q5) Explain optimization techniques for increasing cache bandwidth.
Q6) Explain memory technology and optimizations.
Q7) Explain optimization techniques for reducing miss penalty.
Q8) Explain protection via virtual memory.
Q9) Explain protection via virtual machines.
Q10) Explain the Xen virtual machine.
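For Q2, one of the standard compiler optimizations to reduce miss rate is loop interchange: reorder nested loops so the array is traversed in the order it is laid out in memory, so consecutive accesses fall in the same cache line. The sketch below only illustrates the access-pattern transformation (Python lists hide the real memory layout, so no actual miss-rate change is measured here):

```python
# A small row-major 2-D array.
N = 4
a = [[r * N + c for c in range(N)] for r in range(N)]

# Before interchange: column-major walk over a row-major array.
# Successive accesses are N elements apart, so spatial locality is poor.
before = []
for c in range(N):
    for r in range(N):
        before.append(a[r][c])

# After interchange: row-major walk. Each cache line is fully consumed
# before moving on, which is exactly what the compiler transformation buys.
after = []
for r in range(N):
    for c in range(N):
        after.append(a[r][c])

print(after == sorted(before))  # same elements visited, cache-friendly order
```

The transformation is legal because both loop orders visit the same set of elements; only the visiting order, and hence the cache behaviour, changes.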
UNIT VIII
UNIT WISE PLAN
Chapter Number: Appendix G No of Hours: 07
Unit Title: HARDWARE AND SOFTWARE FOR VLIW AND EPIC
Learning Objectives:
At the end of this unit, students will understand:
1. VLIW
2. EPIC
3. Intel IA-64 Architecture, Itanium Processor
4. Loop-Level Parallelism, Code for Parallelism
5. Hardware Support for Parallelism
Lesson Plan:
L1. Introduction: Exploiting Instruction-Level Parallelism Statically
L2. Detecting and Enhancing Loop-Level Parallelism
L3. Scheduling and Structuring Code for Parallelism
L4. Hardware Support for Exposing Parallelism: Predicated Instructions
L5. Hardware Support for Compiler Speculation
L6. The Intel IA-64 Architecture
L7. Itanium Processor; Conclusions.
Assignment Questions:
Q1) Explain the methods, advantages and disadvantages of exploiting instruction-level parallelism statically.
Q2) Explain the methods for detecting and enhancing loop-level parallelism.
Q3) Explain software pipelining using symbolic loop unrolling.
Q4) Explain global code scheduling.
Q5) Explain hardware support for exposing parallelism using predicated instructions in detail.
Q6) Explain hardware support for compiler speculation.
Q7) Explain superblocks using a flowchart.
Q8) Explain the IA-64 instruction set architecture.
Q9) Explain the Itanium 2 processor in detail.