Ppt

AN OVERVIEW OF ARM FAMILIES

ARM7TDMI ARM9TDMI ARM10 ARM11

ARM ARCHITECTURE FAMILIES

ARM's architecture is supported by all four major platform operating systems: Symbian OS, Palm OS, Windows CE, and Linux.

ARM is the industry-standard embedded microprocessor architecture and a leader in low-power, high-performance cores.

The ARM7 and ARM9 families have contributed to ARM's success. Each core family has several "children" that incorporate many different value-added features and combinations. Essentially, four main families are now available for license: ARM7, ARM9, ARM10, and ARM11.

Some words about ARM

ARM7

• The ARM7 family features hardened and synthesizable macrocells, with variants that incorporate a cache with either a memory protection unit (MPU) or a memory management unit (MMU).

• Other features include real-time debug (RTD) and real-time trace (RTT) technology.

ARM9

The ARM9 family consists of hardened macrocells, with variants that also include a cache with an MPU or MMU, as well as RTD and RTT. Although the ARM9E-S family was released under a different architecture version, ARMv5TE, the fundamental design of the core is based on the ARM9TDMI family. The "E" indicates that the family has a DSP-enhanced architecture, and the "S" indicates that the family is synthesizable.

Compared with the ARM7, the ARM9 offers decreased heat production and a lower overheating risk. It also improves clock frequency: shifting from a three-stage pipeline to a five-stage one lets the clock speed be approximately doubled on the same silicon fabrication process.

Cycle counts also improve. Many unmodified ARM7 binaries were measured as taking about 30% fewer cycles to execute on ARM9 cores. Key improvements include faster loads and stores, with many instructions now costing just one cycle (helped by both the modified Harvard architecture, which reduces bus and cache contention, and the new pipeline stages), and exposed pipeline interlocks, which enable compiler optimizations that reduce blockage between stages.
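Exposed interlocks matter because a compiler can reorder code around them. The sketch below is a toy model, assuming a one-cycle stall whenever an instruction reads a register loaded by the instruction immediately before it; the `Instr` type, `count_cycles`, and the penalty value are illustrative assumptions, and the ARM mnemonics serve only as labels.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed stall, in cycles, when an instruction reads a register loaded by the
# instruction immediately before it. Real penalties vary by core and load type.
LOAD_USE_PENALTY = 1


@dataclass(frozen=True)
class Instr:
    text: str              # assembly text, used only as a label
    dest: str | None       # register written, if any
    srcs: tuple[str, ...]  # registers read
    is_load: bool = False


def count_cycles(program: list[Instr]) -> int:
    """Issue one instruction per cycle, adding a stall for each load-use pair."""
    cycles = 0
    prev: Instr | None = None
    for ins in program:
        cycles += 1
        if prev is not None and prev.is_load and prev.dest in ins.srcs:
            cycles += LOAD_USE_PENALTY
        prev = ins
    return cycles


# Each ADD uses the register loaded by the instruction right before it.
naive = [
    Instr("LDR r0, [r4]", "r0", ("r4",), is_load=True),
    Instr("ADD r1, r0, #1", "r1", ("r0",)),
    Instr("LDR r2, [r5]", "r2", ("r5",), is_load=True),
    Instr("ADD r3, r2, #1", "r3", ("r2",)),
]

# The same work, reordered so each load is followed by an independent instruction.
scheduled = [naive[0], naive[2], naive[1], naive[3]]

print(count_cycles(naive), count_cycles(scheduled))  # 6 4
```

Under these assumptions the reordered sequence does the same work in 4 cycles instead of 6, which is the kind of scheduling a compiler can only do when it knows where the interlocks are.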

Difference from ARM7

Comparison of the ARM7TDMI with the ARM9TDMI families

(1) Pipeline Comparison

To increase performance, the pipeline of the ARM9TDMI core was re-engineered from the three-stage system used by the ARM7TDMI family to five stages. Operations previously performed in the execute stage of the ARM7 are spread across four stages of the ARM9 pipeline: decode, execute, memory, and write. This reorganization removed the critical paths that had limited the ARM7's execute stage, resulting in a much higher clock frequency.
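A rough way to see why splitting the execute stage raises the clock: the clock period must be long enough for the slowest stage, so shortening the critical stage shortens the period. The sketch below uses made-up stage delays (the dictionaries and helper names are hypothetical, not ARM data) to compare a three-stage design limited by a long execute stage with a five-stage design whose longest stage is half as long.

```python
from __future__ import annotations

# Hypothetical per-stage logic delays in nanoseconds, chosen only to illustrate
# the idea; they are not measured ARM7TDMI or ARM9TDMI figures.
ARM7_STAGES = {"fetch": 4.0, "decode": 3.0, "execute": 8.0}
ARM9_STAGES = {"fetch": 3.5, "decode": 3.5, "execute": 4.0, "memory": 3.5, "write": 2.0}


def critical_stage(stages: dict[str, float]) -> str:
    """The slowest stage sets the clock period (the critical path)."""
    return max(stages, key=stages.get)


def max_clock_mhz(stages: dict[str, float]) -> float:
    """Highest clock frequency at which every stage still finishes in one cycle."""
    period_ns = max(stages.values())
    return 1000.0 / period_ns  # a period of 1 ns corresponds to 1000 MHz


for name, stages in (("ARM7-style", ARM7_STAGES), ("ARM9-style", ARM9_STAGES)):
    print(f"{name}: {max_clock_mhz(stages):.0f} MHz, limited by {critical_stage(stages)}")
```

With these numbers the three-stage design tops out at 125 MHz and the five-stage one at 250 MHz. Real designs also pay a fixed pipeline-register overhead per stage, so deepening a pipeline usually gives less than a proportional gain.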

Another performance improvement is the processor's reduced cycles-per-instruction (CPI) rating, which comes from improved cycle counts for load and store instructions. Single load and store instructions are now single-cycle operations. This is an enhancement over the ARM7, where a load used the execute stage three times: first to calculate the address, second to access the memory and cache, and third to write the data to the register bank. On the ARM9, each step has its own pipeline stage requiring only one cycle, avoiding pipeline stalls.
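The effect on CPI can be sketched with a weighted average over an instruction mix. This is a simplified model under stated assumptions: ARM7 loads cost 3 cycles (one per use of the execute stage, as described above), ARM7 stores are assumed to cost 2, everything else costs 1, and on the ARM9 every class issues in one cycle with no cache misses or interlocks. The instruction mix is invented for illustration.

```python
from __future__ import annotations

# Assumed cycles per instruction class (no cache misses, no interlocks).
# The ARM7 store cost of 2 is an assumption; branches and multiplies are ignored.
ARM7_CYCLES = {"load": 3, "store": 2, "other": 1}
ARM9_CYCLES = {"load": 1, "store": 1, "other": 1}


def total_cycles(mix: dict[str, int], costs: dict[str, int]) -> int:
    """Sum of (instruction count x cycles per instruction) over all classes."""
    return sum(count * costs[kind] for kind, count in mix.items())


def cpi(mix: dict[str, int], costs: dict[str, int]) -> float:
    """Average cycles per instruction for the given mix."""
    return total_cycles(mix, costs) / sum(mix.values())


# A made-up mix of 100 instructions: 20 loads, 10 stores, 70 others.
mix = {"load": 20, "store": 10, "other": 70}
print(cpi(mix, ARM7_CYCLES), cpi(mix, ARM9_CYCLES))  # 1.5 1.0
```

For this particular mix the modelled CPI drops from 1.5 to 1.0; a real program's gain depends on its load/store density and on effects this sketch leaves out.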

The Harvard bus architecture creates separate instruction and data memory interfaces, enabling simultaneous access to instructions and data.

The ARM9TDMI represents a new family of CPU technology. The enhancements made to this core family double the performance of the ARM7TDMI family. The ARM7TDMI family is popular in applications where small die size, high performance, and low power consumption help reduce system costs, especially when the system does not require a cache. The ARM9TDMI family is used for high-performance applications that previously could not be implemented at the same cost.
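The benefit of separate instruction and data interfaces can be shown by counting bus slots. This is a deliberately crude lower bound, assuming every access completes in one cycle and each interface handles one access per cycle; `bus_limited_cycles` is a hypothetical helper, not a real timing model.

```python
def bus_limited_cycles(n_fetches: int, n_data_accesses: int, harvard: bool) -> int:
    """Minimum cycles imposed by memory traffic alone, assuming each interface
    completes one single-cycle access per cycle."""
    if harvard:
        # Separate instruction and data interfaces can be used in the same cycle.
        return max(n_fetches, n_data_accesses)
    # A single shared interface serializes every fetch and every data access.
    return n_fetches + n_data_accesses


# 100 hypothetical instructions, 30 of which are loads or stores.
print(bus_limited_cycles(100, 30, harvard=False))  # 130
print(bus_limited_cycles(100, 30, harvard=True))   # 100
```

With a single shared bus, every load or store steals a slot from instruction fetch; with split interfaces the data traffic overlaps with fetching.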

[Figure: processor connected to separate program and data memories (address, data, read and write lines); memory of a Windows XP PC (CPU and memory, addresses from 0 to 4294967295) holding a Microsoft Word document and an image being edited in Adobe Photoshop]

ARM10

ARM10E implements:

• Harvard 6-stage pipeline

• Support for the v5TE instruction set

• Embedded ICE RTII debug logic

• Full compatibility with the v4T architecture

• 390-700 MIPS integer performance, based on Dhrystone 2.1

• Branch prediction: eliminates 70% of branches on typical code sequences

• Separate load/store unit: 64-bit path to the register bank, so two registers can be loaded simultaneously

• Hit-under-miss caches: significantly reduce pipeline stalls

• Write buffer: holds up to 8 double-words (16 register values)

• New energy-saving power-down modes

The pipeline was lengthened by an additional stage, and the EmbeddedICE logic was improved to support real-time debug. Throughout, compatibility was maintained with ARMv5TE and v4T for ease of code migration. Performance enhancements include branch prediction, hit-under-miss support in the MMU and cache architecture, an improved write buffer that holds up to eight double-words, and a separate load/store unit. These features improve code performance by lowering the processor's average number of cycles per instruction, and they also help when code depends heavily on cache operations.
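A write buffer lets the core post a store and continue instead of waiting for memory. The sketch below is a toy FIFO model: its capacity follows the 8 double-word figure quoted above, but the `WriteBuffer` class, its methods, and the rule that memory only accepts an entry when `drain_one()` is called (a worst case with no background draining, one stall cycle per forced drain) are assumptions for illustration.

```python
from __future__ import annotations

from collections import deque

MASK64 = (1 << 64) - 1


class WriteBuffer:
    """Toy FIFO write buffer: stores are queued so the core can keep running.

    CAPACITY mirrors the quoted figure of 8 double-word entries (16 register
    values). Memory only accepts an entry when drain_one() is called, which
    models the worst case of a memory system that never drains in the background.
    """

    CAPACITY = 8

    def __init__(self) -> None:
        self.pending: deque[tuple[int, int]] = deque()
        self.memory: dict[int, int] = {}
        self.stall_cycles = 0

    def post(self, addr: int, value: int) -> None:
        # A full buffer forces the core to wait until one entry is written out.
        while len(self.pending) >= self.CAPACITY:
            self.drain_one()
            self.stall_cycles += 1  # assumed: one stall cycle per forced drain
        self.pending.append((addr, value & MASK64))

    def drain_one(self) -> bool:
        """Write the oldest pending entry to memory; return False if empty."""
        if not self.pending:
            return False
        addr, value = self.pending.popleft()
        self.memory[addr] = value
        return True

    def drain_all(self) -> None:
        while self.drain_one():
            pass


wb = WriteBuffer()
for i in range(10):
    wb.post(addr=i * 8, value=i)  # ten back-to-back double-word stores
print(wb.stall_cycles)  # 2: only the 9th and 10th stores find the buffer full
wb.drain_all()
```

The first eight stores are absorbed without stalling; a larger buffer pushes that point further out. Real write buffers may also merge entries or drain concurrently, which this sketch does not model.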

• It also supports an optional vector floating-point (VFP) unit, which significantly increases floating-point performance.

• VFP (Vector Floating Point) technology is an FPU (Floating-Point Unit) coprocessor extension to the ARM architecture

• It provides low-cost single-precision and double-precision floating-point computation

• VFP provides floating-point computation suitable for a wide spectrum of applications such as PDAs, smartphones, voice compression and decompression, three-dimensional graphics and digital audio, printers, set-top boxes, and automotive applications.

• The VFP architecture was intended to support execution of short "vector mode" instructions, but these operated on each vector element sequentially and so did not offer the performance of true SIMD parallelism.
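The point about sequential "vector mode" can be illustrated with a small emulation. This is a sketch under assumptions: 32 single-precision registers, vectors that wrap around within banks of eight registers, and a configurable length and stride. The helper names `element_reg` and `vector_fadds` are hypothetical (FADDS is borrowed from the VFP single-precision add mnemonic), and the rules for when an operation is treated as scalar are ignored.

```python
BANK_SIZE = 8  # assumed: short vectors wrap around inside a bank of 8 registers


def element_reg(base: int, i: int, stride: int) -> int:
    """Register number of element i of a short vector starting at `base`."""
    bank_start = base - base % BANK_SIZE
    return bank_start + (base - bank_start + i * stride) % BANK_SIZE


def vector_fadds(regs: list, d: int, n: int, m: int, length: int, stride: int = 1) -> int:
    """Emulate a short-vector single-precision add, one element at a time.

    Returns the number of sequential element operations performed.
    """
    steps = 0
    for i in range(length):
        rd, rn, rm = (element_reg(r, i, stride) for r in (d, n, m))
        regs[rd] = regs[rn] + regs[rm]  # elements are processed strictly in order
        steps += 1
    return steps


regs = [float(r) for r in range(32)]  # s0..s31 preloaded with their own index
steps = vector_fadds(regs, d=28, n=8, m=16, length=4)
print(steps, regs[28:32])  # 4 [24.0, 26.0, 28.0, 30.0]
```

One instruction describes four additions, but the loop makes the cost visible: each element is still a separate step, so code size shrinks without the per-instruction throughput of true SIMD.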

VFP UNIT

ARM11

• ARM11 is designed for high-performance and power-efficient applications.

• The ARM1136J-S was the first processor implementation to execute ARMv6 architecture instructions.

• It incorporates an 8-stage pipeline with separate load/store and arithmetic pipelines.

DIFFERENCE FROM ARM9

• SIMD instructions, which can double the speed of MPEG-4 and audio digital signal processing algorithms

• Cache is physically addressed, solving many cache aliasing problems and reducing context switch overhead.

• Unaligned and mixed-endian data access is supported.

• Reduced heat production and lower overheating risk

• Redesigned pipeline, supporting faster clock speeds (target up to 1 GHz):

  • Longer: 8 (vs 5) stages

  • Out-of-order completion for some operations (e.g. stores)

  • Dynamic branch prediction/folding (like XScale)

  • Cache misses don't block execution of non-dependent instructions

  • Load/store parallelism

  • ALU parallelism

  • 64-bit data paths (https://en.wikipedia.org/wiki/64-bit)
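The SIMD speed-up for MPEG-4 and audio code comes from packing several small values (pixels, samples) into one 32-bit register and operating on all of them at once. As an illustration, the sketch below emulates the lane arithmetic of two ARMv6 SIMD instructions, UADD8 (wrapping byte adds) and UQADD8 (saturating byte adds); the Python function names are our own, and flag side effects such as the GE bits are not modelled.

```python
def uadd8(a: int, b: int) -> int:
    """Emulate ARMv6 UADD8: four independent unsigned byte additions.

    Each 8-bit lane wraps modulo 256 and no carry crosses into the next lane.
    The GE flags that the real instruction sets are not modelled.
    """
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= ((x + y) & 0xFF) << shift
    return result


def uqadd8(a: int, b: int) -> int:
    """Emulate ARMv6 UQADD8: like UADD8, but each lane saturates at 255."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= min(x + y, 0xFF) << shift
    return result


a, b = 0x10FF2080, 0x01010180
print(hex(uadd8(a, b)))           # 0x11002100
print(hex(uqadd8(a, b)))          # 0x11ff21ff
print(hex((a + b) & 0xFFFFFFFF))  # 0x12002200: carries leak between bytes
```

The last line shows why a plain 32-bit add cannot stand in for four byte adds: carries spill from one lane into the next. Saturation, as in `uqadd8`, is what pixel arithmetic usually wants, since a bright pixel should clip at 255 rather than wrap to black.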
