Prestasi dan Kos

27
Prestasi dan Kos ° Dr perspektif pembelian Ada pelbagai koleksi mesin, ada mesin yang - best performance ? - least cost ? - best performance / cost ? ° Dr perspektif rekabentuk Berdepan dgn pilihan rekabentuk, ada rekabentuk yang - best performance improvement ? - least cost ? - best performance / cost ? ° Kedua-dua perlu basis for comparison metric for evaluation ° Our goal is to understand cost & performance implications of architectural choices

description

Prestasi dan Kos. Dr perspektif pembelian Ada pelbagai koleksi mesin, ada mesin yang best performance ? least cost ? best performance / cost ? Dr perspektif rekabentuk Berdepan dgn pilihan rekabentuk, ada rekabentuk yang best performance improvement ? least cost ? - PowerPoint PPT Presentation

Transcript of Prestasi dan Kos

Page 1: Prestasi dan Kos

Prestasi dan Kos

° Dr perspektif pembelian

• Ada pelbagai koleksi mesin, ada mesin yang

- best performance ?

- least cost ?

- best performance / cost ?

° Dr perspektif rekabentuk

• Berdepan dgn pilihan rekabentuk, ada rekabentuk yang

- best performance improvement ?

- least cost ?

- best performance / cost ?

° Kedua-dua perlu

• basis for comparison

• metric for evaluation

° Our goal is to understand cost & performance implications of architectural choices

Page 2: Prestasi dan Kos

Two notions of “performance”

° Masa utk melakukan tugas (Execution Time)

– masa perlaksanaan, masa respon, latency (pendaman)

° Bil. tugasan sehari, sejam, seminggu, sesaat, ns. .. (Prestasi)

– prestasi, truput, bandwidth (lebar jalur)

Response time and throughput often are in opposition - why?

Plane

Boeing 747

BAD/Sud Concodre

Speed

610 mph

1350 mph

DC to Paris

6.5 hours

3 hours

Passengers

470

132

Throughput (pmph)

286,700

178,200

Yang mana berprestasi lbh tinggi?

Page 3: Prestasi dan Kos

Definasi Prestasi

° Prestasi ialah dlm bil. unit sesuatu benda-per-saat• Lbh besar lagi bagus

° Jk kita pertimbangkan ‘response time’• performance(x) = 1

execution_time(x)

° " X ialah n kali lbh laju dr Y" iaitu

performance(X) execution_time(Y)

n = ---------------------- = ----------------------

performance(Y) execution_time(X)

° Bilakah truput lbh penting drpd ‘execution time’?

° Bilakah ‘execution time’ lbh penting drpd truput?

Page 4: Prestasi dan Kos

Cth prestasi

• Masa Concorde lwn. Boeing 747?

• Concord iallah 1350 mph / 610 mph = 2.2 kali lbh laju

= 6.5 hours / 3 hours

• Truput bg Concorde lwn. Boeing 747 ?

• Concord is 178,200 pmph / 286,700 pmph = 0.62 “times faster”

• Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster”

• Boeing adalah 1.6 kali (“60%”) lbh laju sekiranya dlm truput

• Concord adalah 2.2 kali (“120%”) lbh laju dr segi masa penerbangan

• Apabila membincangkan prestasi pemproses, kita fokuskan kepada ‘execution time’ utk satu tugas (job) - kenapa?

Page 5: Prestasi dan Kos

Memahami Prestasi

° Sejauh mana perkara berikut memberi kesan kpd ‘response time’ dan truput?

• Meningkat kelajuan clock pemproses.

• Meningkatkan bilangan job dlm sistem (cth, satu komputer melayan multi pengguna).

• Meningkat bilangan pemproses dlm sistem yg menggunakan multi pemproses (cth, a network of ATM machines).

° Jk Pentium III melarikan satu program dlm masa 8 saat dan PowerPC melarikan program yg saman dlm masa 10 saat, berapa kali kelajuan Pentium Pro?

n = 10 / 8 = 1.25 kali lbh laju (or 25% faster)

Page 6: Prestasi dan Kos

Definasi Masa

° Ada beberapa definasi masa, bergantung kpd apa yg kita ukur:

• Response time : Jumlah masa utk menyelesaikan tugas, termasuklah masa yg digunakan utk perlaksanaan pd CPU, capaian cakera dan memori, tunggu I/O dan pemprosesan lain, dan OS overhead.

• CPU execution time : Jumlah masa yg digunakan oleh CPU utk menyelesaikan tugas yg diberi (tdk termasuk masa I/O atau masa larian program lain). Ia juga dikenali sbg CPU time.

• User CPU time : Jumlah masa yg diperlukan oleh CPU dlm program

• System CPU execution time : Jumlah masa yg diperlukan oleh OS utk melaksanakan tugas bg program tersebut.

° For example, a program may have a system CPU time of 22 sec., a user CPU time of 90 sec., a CPU execution time of 112 sec., and a response time of 162 sec..

Page 7: Prestasi dan Kos

Jam Komputer (Computer Clocks)

° ‘Computer clock’ runs at a constant rate and determines when events take placed in hardware.

Clk

° The clock cycle time is the amount of time for one clock period to elapse (e.g. 5 ns).

° The clock rate is the inverse of the clock cycle time.

° For example, if a computer has a clock cycle time of 5 ns, the clock rate is:

1 ---------------------- = 200 MHz

5 x 10 sec

clock period

-9

Page 8: Prestasi dan Kos

Computing CPU time

° The time to execute a given program can be computed asCPU time = CPU clock cycles x clock cycle time

Since clock cycle time and clock rate are reciprocalsCPU time = CPU clock cycles / clock rate

° The number of CPU clock cycles can be determined byCPU clock cycles = (instructions/program) x (clock cycles/instruction)

= Instruction count x CPI

which givesCPU time = Instruction count x CPI x clock cycle time

CPU time = Instruction count x CPI / clock rate

° The units for this are instructions cIock cycles secondsseconds = ----------------- x ----------------- x ---------------- program instruction clock cycle

Page 9: Prestasi dan Kos

Example of Computing CPU time

° If a computer has a clock rate of 50 HHz, how long does it take to execute a program with 1,000 instructions, if the CPI for the program is 3.5?

° Using the equationCPU time = Instruction count x CPI / clock rate

gives CPU time = 1000 x 3.5 / (50 x 10 )

° If a computer’s clock rate increases from 200 MHz to 250 MHz and the other factors remain the same, how many times faster will the computer be?

CPU time old clock rate new 250 MHz------------------- = ---------------------- = ---------------- = 1.25CPU time new clock rate old 200 MHZ

° What simplifying assumptions did we make?

6

Page 10: Prestasi dan Kos

Factors affecting CPU Performance

instr. count CPI clock rate

Program

Compiler

Instr. Set Arch.

Organization

Technology

° Which factors are affected by each of the following?

CPU time = Seconds = Instructions x Cycles x Seconds

Program Program Instruction Cycle

CPU time = Seconds = Instructions x Cycles x Seconds

Program Program Instruction Cycle

Page 11: Prestasi dan Kos

Computing CPI

° The CPI is the average number of cycles per instruction.

° If for each instruction type, we know its frequency and number of cycles need to execute it, we can compute the overall CPI as follows:

CPI = Σ CPI x F

° For examplei = 1

n

i i

Op F CPI CPI x F % Time

ALU 50% 1 .5 23%

Load 20% 5 1.0 45%

Store 10% 3 .3 14%

Branch 20% 2 .4 18%

Total 100% 2.2 100%

i i i i

Page 12: Prestasi dan Kos

Performance Summary

° The two main measure of performance are• execution time : time to do the task

• throughput : number of tasks completed per unit time

° Performance and execution time are reciprocals. Increasing performance, decreases execution time.

° The time to execute a given program can be computed as:

CPU time = Instruction count x CPI x clock cycle time

CPU time = Instruction count x CPI / clock rate

° These factors are affected by compiler technology, the instruction set architecture, the machine organization, and the underlying technology.

° When trying to improve performance, look at what occurs frequently => make the common case fast.

Page 13: Prestasi dan Kos

Computer Benchmarks

° A benchmark is a program or set of programs used to evaluate computer performance.

° Benchmarks allow us to make performance comparisons based on execution times

° Benchmarks should• Be representative of the type of applications run on the computer

• Not be overly dependent on one or two features of a computer

° Benchmarks can vary greatly in terms of their complexity and their usefulness.

Page 14: Prestasi dan Kos

Jenis Benchmark

Actual Target Workload

Full Application Benchmarks (e.g., SPEC benchmarks)

Small “Kernel” Benchmarks

Microbenchmarks

Pros Cons

• representative• very specific• non-portable• difficult to run, or measure• hard to identify cause

• portable• widely used• improvements useful in reality

• easy to run, early in design cycle

• identify peak capability and potential bottlenecks

•less representative

• does not measure memory system

• “peak” may be a long way from application performance

Page 15: Prestasi dan Kos

SPEC: System Performance Evaluation Cooperative

° Bencmark2 SPEC yg kerap digunakan dgn meluas utk merekodkan prestasi workstation dan PC.

• First Round SPEC CPU89- 10 programs yielding a single number

• Second Round SPEC CPU92- SPEC CINT92 (6 integer programs) and SPEC CFP92 (14

floating point programs)- Compiler flags can be set differently for different programs

• Third Round SPEC CPU95- New set of programs: SPEC CINT95 (8 integer programs)

and SPEC CFP95 (10 floating point) - Single compiler flag setting for all programs

• Fourth Round SPEC CPU2000- New set of programs: SPEC CINT2000 (12 integer

programs) and SPEC CFP2000 (14 floating point) - Single compiler flag setting for all programs

° Value reported is the SPEC ratio

° CPU time of reference machine / CPU time of measured machine

Page 16: Prestasi dan Kos

Benchmark2 SPEC yang lain

° JVM98: • Mengukur prestasi Java Virtual Machines

° SFS97: • Mengukur prestasi protokol2 network file server (NFS)

° Web99: • Mengukur prestasi aplikasi2 World Wide Web

° HPC96:• Mengukur prestasi aplikasi besar, industri

° APC, MEDIA, OPC• Mengukur prestasi aplikasi2 grafik

° For more information about the SPEC benchmarks see: http://www.spec.org.

Page 17: Prestasi dan Kos

Cth bagi Benchmark2 SPEC95

° Dibawah merupakan nisbah SPEC ratios utk pemproses Pentium dan Pentium Pro (Pentium+)

Clock Pentium Pentium+ Pentium Pentium+

Rate SPECint SPECint SPEFfp SPECfp

100 MHz 3.2 N/A 2.6 N/A

150 MHZ 4.4 6.0 3.0 5.1

200 MHZ 5.5 8.0 3.8 6.8

° Apa yg kita dapat drpd maklumat ini?

Page 18: Prestasi dan Kos

Peringkasan Prestasi

° Kaedah yg digunakan utk meringkas prestasi bg beberapa benchmark bergantung kpd jenis pengukuran.

° Utk satu set ‘execution time’, T1 ke Tn, gunakan arithmetic mean (AM) atau weighted arithmetic mean (WAM).

• AM = (T1 + T2 + … + TN) / N

• WAM = (W1*T1 + W2*T2 + … + WN*TN)

° Utk satu set ‘normalized execution time ratio’, R1 ke RN, gunakan geometric mean (GM).

• GM = (R1 * R2 *… * RN)^(1/N) /* the Nth root of the product */

• The geometric mean of exeuction time ratios is not proportional to the total execution time.

Page 19: Prestasi dan Kos

Cth bg Peringkasan Prestasi

° Dua program P1 dan P2 dijalankan pd komputer A dan B. Jadual berikut menunjukkan pelbagai keadah utk peringkasan prestasi.

Time A Time B A B A B

P1 1 10 1 10 0.1 1

P2 1000 100 1 0.1 10 1

AM 500.5 55 1 5.05 5.05 1

GM 31.6 31.6 1 1 1 1

| Normalized to A | Normalized to B |

° Apakah kebaikan dan keburukan bila menggunakan ‘geometric mean’ utk menjejak ‘normalized execution times’?

° Utk benchmark SPEC, adakah anda menggunakan arithmetic atau geometric mean? Kenapa?

Page 20: Prestasi dan Kos

Pengukuran Prestasi yg Tidak Bagus

° Ukuran yg digunakan dalam pemasaran utk mengukur prestasi komputer ialah MIPS dan MFLOPS

° MIPS : millions of instructions per second• MIPS = instruction count / (execution time x 10^6)

• Sbg cth, program yg melaksanakan 3 juta arahan dlm masa 2 saat ialah 1.5 MIPS

• Kelebihan : Mudah difahami dan diukur

• Keburukan : tidak menunjukkan prestasi sebenar, kerana arahan yg mudah lagi cepat.

° MFLOPS : millions of floating point operations per second• MFLOPS = floating point operations / (execution time x 10^6)

• Sbg cth, program yg melakasanakan 4 juta arahan dlm masa 5 saat ialah 0.8 MFLOPS

• Kelebihan : Mudah difahami dan diukur

• Keburukan : sama dgn MIPS, hanya ukur titik apungan

Page 21: Prestasi dan Kos

MIPS

° Example 2: Impact of optimizing compiler

Assume the following program makeup:

Operation Freq Clock Cycles

ALU 43% 1

Load 21% 2

Store 12% 2

Branch 24% 2

Assume a 20 ns clock, optimizing compiler eliminates 50% of all ALU operations

Page 22: Prestasi dan Kos

MIPS (cont.)

° Answer

Not Optimized:

Ave CPI = 0.43x1 + 0.21x2 + 0.12x2 + 0.24x2

= 1.57

MIPS = 50 MHz/1.57x10^6 = 31.8

Optimized:

Ave CPI = (0.43/2x1 + 0.21x2 + 0.12x2 + 0.24x2)

(1 - 0.43/2)

= 1.73

MIPS = 50 MHz/1.73x10^6 = 28.6

Page 23: Prestasi dan Kos

Hukum Amdahl

Kepantasan (Speedup) yg disebabkan oleh kemajuan ditakrif sbg:

ExTime old Performance new

Speedup = ------------- = -------------------

ExTime new Performance old

Katakanlah ‘enhancement accelerate’ ialah pecahan

Fractionenhanced of the task by a factor Speedupenhanced,

ExTimenew = ExTimeold x (1 - Fractionenhanced) + Fractionenhanced

Speedup =ExTimeold

ExTimenew

Speedupenhanced

=

1

(1 - Fractionenhanced) + Fractionenhanced

Speedupenhanced

Page 24: Prestasi dan Kos

Cth Hukum Amdahl’

° Arahan titik apungan ditingkatkan dua kali lbh laju, ttp pada hakikatnya hy 10% masa digunakan utk arahan ini. Brp pantas ianya pada mesin baru?

° Mesin baru ialah 1.053 kali lbh laju, atau 5.3% lbh laju.

° Jk arahan titik apungan 100 kali lbh pantas, berapakah kepantasan mesin baru?

Speedup =ExTimeold

ExTimenew

=

1

(1 - Fractionenhanced) + Fractionenhanced

Speedupenhanced

Speedup =1

(1 - 0.1) + 0.1/2= 1.053

Speedup =1

(1 - 0.1) + 0.1/100= 1.109

Page 25: Prestasi dan Kos

Menganggarkan Kemajuan Prestasi

° Andaikan pada masa ini pemproses memerlukan 10 saat utk melaksanakan satu program dan pretasi pemproses meningkat 50% setahun.

° Berapakah peningkatan prestasi dlm masa 5 tahun?

(1 + 0.5)^5 = 7.59

° Berapa lama masa diambil oleh pemproses utk melaksanakan program selepas 5 tahun?

ExTimenew = 10/7.59 = 1.32 seconds

° Apakah andaian yg dibuat utk masalah di atas?

Page 26: Prestasi dan Kos

Cth Prestasi

° Computer M1 dan M2 menggunakan set arahan yg sama.

° Clock rate M1 ialah 50 MHz dan M2 ialah 75 MHz.

° CPI bg M1 ialah 2.8 dan bagi M2 ialah 3.2 bg program yg diberi.

° Berapa kali pantas M2 drpd M1 utk program ini?

° Apakah clock rate bg M1 supaya ‘execution time’ kedua-duanya sama?

ExTimeM1 ICM1 x CPIM1 / Clock RateM1

=ExTimeM2 ICM2 x CPIM2 / Clock RateM2

=2.8/50

3.2/75= 1.31

Page 27: Prestasi dan Kos

Ringkasan bg Penilaian Prestasi

° Good benchmarks, spt benchmark SPEC, blh memberikan kaedah penilaian dan perbandingan prestasi komputer dgn tepat.

° Utk ‘execution time’ gunakan ‘arithmetic mean’, ttp utk ‘normalized execution time ratio’ gunakan ‘geomentric mean’.

° MIPS dan MFLOPS mudah digunakan, ttp ia menunjukkan nilai prestasi yg tidak tepat.

° Hukum Amdahl sesuai utk menentukan speedup yg disebabkan kemajuan.