A7 report
Uploaded by shahidullah-shahid
8/13/2019 A7 report
http://slidepdf.com/reader/full/a7-report 1/5
Abstract
ARM's newest processor, the Cortex-A7, is designed for the highly efficient, low-cost mainstream
mobile handset market. In addition, thanks to a new ARM innovation, this power-efficient
processor will also be used in high-end superphones and tablets as a companion to the
Cortex-A15 CPU, forming a complementary pair in a new approach called big.LITTLE processing.
This report discusses how the extremely power-efficient design enables entry-level smartphone
SoC designs as well as high-end mobile products. It describes in detail the design
choices considered, including the feature set and performance level, and how the simplified
pipeline enables dramatically lower power consumption. The processor is suited not only to
mobile devices but also to a range of other embedded markets.
Introduction
The Cortex-A7 processor was designed primarily for power efficiency and a small footprint. The
design team based the pipeline on the extremely power-efficient Cortex-A5 CPU, then added
microarchitecture enhancements to increase performance and architectural enhancements to
deliver full software compatibility with the Cortex-A15 CPU. These architectural enhancements
include support for virtualization, a 40-bit physical address space, and AMBA® 4 bus
interfaces. Virtualization and a large address space are unusual features for so small a CPU, but
they are critical to presenting a software view of the Cortex-A7 that is identical to that of the
high-end Cortex-A15 CPU.
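As a quick illustration of what the 40-bit physical address space mentioned above means in practice, the following minimal sketch compares it with a conventional 32-bit physical address. The function name and the GIB constant are illustrative only and are not part of any ARM tool or API.

```python
# Compare the reach of a 32-bit and a 40-bit physical address.
GIB = 1 << 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


for bits in (32, 40):
    print(f"{bits}-bit physical address: {addressable_bytes(bits) // GIB} GiB")
```

A 32-bit address reaches 4 GiB, while a 40-bit address reaches 1024 GiB (1 TiB), which is why the wider address matters for software written with the larger Cortex-A15 in mind.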
Like the Cortex-A5, Cortex-A9, and Cortex-A8 processors that came before it, the Cortex-A7
processor is a full ARMv7-A CPU, with support for the Thumb®-2 instruction set, optional
32-bit/64-bit floating-point acceleration, and optional NEON™ 128-bit SIMD architectural blocks.
The Cortex-A7 also includes support for TrustZone® to enable secure operating modes, which
are increasingly important in modern mobile OEM designs. For greater scalability, the
Cortex-A7 is also configurable as a multicore processor, supporting 1-4 cores in a coherent
cluster.
The Cortex-A7 has a simple in-order pipeline with significant but not complete dual-issue
capability; however, a careful choice of design features enables a single
Cortex-A7 core to outperform the full dual-issue Cortex-A8 CPU on some important benchmark
tests such as web browsing, while consuming up to 60% less power.
Instruction set
The original ARM implementation was hardwired without microcode, like the much simpler 8-
bit 6502 processor used in prior Acorn microcomputers.
The 32-bit ARM architecture (and the 64-bit architecture for the most part, see below for
exceptions) includes the following RISC features:
Load/store architecture.
No support for unaligned memory accesses in the original version of the architecture.
ARMv6 and later, except some microcontroller versions, support unaligned accesses for
half-word and single-word load/store instructions with some limitations, such as no
guaranteed atomicity.[31][32]
Uniform 16 × 32-bit register file (including the Program Counter, Stack Pointer and the Link
Register).
Fixed instruction width of 32 bits to ease decoding and pipelining, at the cost of
decreased code density. Later, the Thumb instruction set added 16-bit instructions and
increased code density.
Mostly single clock-cycle execution.
To compensate for the simpler design, compared with processors like the Intel 80286
and Motorola 68020, some additional design features were used:
Conditional execution of most instructions reduces branch overhead and compensates for the
lack of a branch predictor.
Arithmetic instructions alter condition codes only when desired.
32-bit barrel shifter can be used without performance penalty with most arithmetic
instructions and address calculations.
Powerful indexed addressing modes.
A link register supports fast leaf function calls.
A simple, but fast, 2-priority-level interrupt subsystem has switched register banks.
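Two of the features listed above, conditional execution and the barrel shifter, can be modelled in a few lines. The sketch below is a Python model, not ARM code: gcd_conditional mimics a well-known four-instruction ARM loop in which the subtractions are predicated on the flags of a single compare, and add_with_lsl mimics an ADD whose second operand passes through the shifter. Both function names are hypothetical, and the inputs to gcd_conditional are assumed to be positive integers.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register


def gcd_conditional(r0: int, r1: int) -> int:
    """Model of the loop:
        loop: CMP   r0, r1
              SUBGT r0, r0, r1
              SUBLT r1, r1, r0
              BNE   loop
    Inputs are assumed to be positive.
    """
    while True:
        # CMP sets the flags once; the SUBs do not update them.
        gt = r0 > r1
        lt = r0 < r1
        ne = r0 != r1
        if gt:  # SUBGT executes only when r0 > r1
            r0 = r0 - r1
        if lt:  # SUBLT executes only when r0 < r1
            r1 = r1 - r0
        if not ne:  # BNE is not taken once the values are equal
            return r0


def add_with_lsl(rn: int, rm: int, shift: int) -> int:
    """Model of ADD rd, rn, rm, LSL #shift (shift folded into the add)."""
    return (rn + ((rm << shift) & MASK32)) & MASK32


print(gcd_conditional(48, 18))  # greatest common divisor of 48 and 18
print(add_with_lsl(7, 7, 2))    # 7 + (7 << 2), i.e. 7 * 5
```

The loop body contains no branch other than the loop-back itself, which is the branch overhead saving the list refers to; the shifted add computes a multiply-by-five in what would be a single instruction.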
Arithmetic instructions
The ARM instruction set supports add, subtract, and multiply instructions. Integer divide
instructions are only implemented by ARM cores based on the following ARM architectures:
ARMv7-M and ARMv7E-M architectures always include divide instructions.
ARMv7-R architecture always includes divide instructions in the Thumb instruction set, but
only optionally in its 32-bit ARM instruction set.
ARMv7-A architecture optionally includes the divide instructions. The instructions might
not be implemented, or implemented only in the Thumb instruction set, or implemented
in both the Thumb and ARM instruction sets, or implemented if the Virtualization
Extensions are included.
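On a core without hardware divide instructions, division has to be carried out in software using the instructions that are available. The following is a minimal sketch of one common approach, shift-and-subtract (restoring) division on unsigned 32-bit values. It is an illustration of the technique only; the name udiv32 is hypothetical and this is not the routine any particular toolchain ships.

```python
def udiv32(numerator: int, denominator: int) -> tuple:
    """Unsigned 32-bit division using only shifts, compares and subtracts.

    Returns (quotient, remainder).
    """
    if denominator == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    # Walk the numerator from its most significant bit down to bit 0.
    for bit in range(31, -1, -1):
        # Bring the next numerator bit into the running remainder.
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        if remainder >= denominator:
            remainder -= denominator
            quotient |= 1 << bit
    return quotient, remainder


print(udiv32(100, 7))
```

The loop always runs 32 iterations, which gives a sense of why a single-instruction hardware divide is attractive where the architecture provides one.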
Registers
Registers R0 through R7 are the same across all CPU modes; they are never banked. R13 and
R14 are banked across all privileged CPU modes except system mode. That is, each mode that
can be entered because of an exception has its own R13 and R14. These registers generally
contain the stack pointer and the return address from function calls, respectively.
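The banking described above can be pictured with a small model. This is a simplified sketch under stated assumptions: it covers only R13 and R14, it uses the classic exception modes (fiq, irq, svc, abt, und), and it leaves out the additional registers that FIQ mode banks as well as any modes added by the security or virtualization extensions. The class and mode labels are illustrative.

```python
# Modes with their own R13 (SP) and R14 (LR); User and System share one pair.
BANKED_MODES = ("fiq", "irq", "svc", "abt", "und")


class RegisterFile:
    def __init__(self):
        self.shared = [0] * 16  # R0-R15 as seen from User/System mode
        self.banked = {mode: {13: 0, 14: 0} for mode in BANKED_MODES}
        self.mode = "usr"

    def _bank(self, reg):
        """Return the private bank for reg in the current mode, if any."""
        if reg in (13, 14) and self.mode in self.banked:
            return self.banked[self.mode]
        return None

    def read(self, reg):
        bank = self._bank(reg)
        return bank[reg] if bank is not None else self.shared[reg]

    def write(self, reg, value):
        bank = self._bank(reg)
        if bank is not None:
            bank[reg] = value
        else:
            self.shared[reg] = value


regs = RegisterFile()
regs.write(0, 42)        # R0 is never banked
regs.write(13, 0x8000)   # user-mode stack pointer
regs.mode = "irq"        # an interrupt switches mode
regs.write(13, 0x4000)   # IRQ mode has its own stack pointer
print(regs.read(0), hex(regs.read(13)))
regs.mode = "sys"        # System mode shares the User registers
print(hex(regs.read(13)))
```

Because each exception mode has a private stack pointer and return address, an exception handler can start running without first saving the interrupted code's R13 and R14.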
Software Benchmarks
The performance of the Cortex-A7 on a range of benchmarks is 15% to 20% higher than that of the
Cortex-A5. It trails slightly behind the Cortex-A8 on integer workloads where data and code are
L1-cache resident, but is faster at floating-point math and can also outperform the larger
Cortex-A8 CPU on typical modern workloads that have complicated branch and TLB behavior, due to
the memory system optimizations that were included in the Cortex-A7 design.
Large workloads like web browsing or compute-intensive apps stress the memory system of
a processor, and on these types of workloads the Cortex-A7 can be more than 20%
faster than the Cortex-A5 and can actually outperform the Cortex-A8 at an equivalent clock rate.
The Cortex-A8, with its full dual-issue superscalar design, outperforms the Cortex-A7 as expected
on integer benchmarks that have low cache miss rates and TLB miss rates, but on complex
workloads the memory system improvements in the Cortex-A7 have enabled the simpler processor
to outperform the more complex superscalar Cortex-A8.
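The crossover described above, where a simpler core wins once memory behavior dominates, follows from a standard first-order model: effective cycles per instruction equals the base CPI plus misses per instruction times the miss penalty. The sketch below applies that model to two imaginary cores. Every number in it is a hypothetical placeholder chosen to show the shape of the effect; none is a measurement of the Cortex-A7, Cortex-A8, or any real processor.

```python
def effective_cpi(base_cpi, misses_per_instr, miss_penalty_cycles):
    """First-order model: pipeline CPI plus stall cycles from misses."""
    return base_cpi + misses_per_instr * miss_penalty_cycles


# HYPOTHETICAL cores, not real parts: "wide" issues more per cycle but pays
# more per miss; "narrow" is simpler but has a faster memory system.
WIDE = {"base_cpi": 0.8, "miss_penalty_cycles": 40}
NARROW = {"base_cpi": 1.0, "miss_penalty_cycles": 25}

# HYPOTHETICAL workloads, expressed as misses per instruction.
WORKLOADS = {"cache-resident": 0.001, "memory-heavy": 0.03}

for name, mpi in WORKLOADS.items():
    wide = effective_cpi(WIDE["base_cpi"], mpi, WIDE["miss_penalty_cycles"])
    narrow = effective_cpi(NARROW["base_cpi"], mpi, NARROW["miss_penalty_cycles"])
    winner = "wide" if wide < narrow else "narrow"
    print(f"{name}: wide CPI {wide:.3f}, narrow CPI {narrow:.3f}, faster: {winner}")
```

With these placeholder inputs the wide core is faster when almost everything hits in cache, and the narrow core is faster once misses are frequent, which is the pattern the benchmark discussion describes.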
Power/Performance of Multicore Cortex-A7 SoCs
The Cortex-A7 improves on the MP model in the Cortex-A9 and Cortex-A5, based on lessons learned
from three generations of multicore designs at ARM. In particular, the Cortex-A7 incorporates
bandwidth optimizations such as 128-bit wide data read buses, 256-bit wide data write buses and
256-bit wide data snoop buses. The external interface to the SoC is also revised to the 128-bit
AMBA 4 master port, which helps multicore performance by increasing the bandwidth delivered to
the coherent cores in the SMP cluster.
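To put the bus widths above in perspective, the peak transfer rate of a bus is its width in bytes multiplied by the number of transfers per second. The sketch below assumes one transfer per clock cycle and a bus clock of 1 GHz. Both are assumptions made for round numbers and do not come from the report; only the bus widths do.

```python
ASSUMED_CLOCK_HZ = 1_000_000_000  # hypothetical 1 GHz bus clock

# Bus widths in bits, as listed in the text.
BUS_WIDTHS = {
    "data read": 128,
    "data write": 256,
    "data snoop": 256,
    "AMBA 4 master port": 128,
}


def peak_bytes_per_second(bus_width_bits: int, clock_hz: int) -> int:
    """Peak rate assuming exactly one full-width transfer per clock."""
    return (bus_width_bits // 8) * clock_hz


for name, width in BUS_WIDTHS.items():
    rate = peak_bytes_per_second(width, ASSUMED_CLOCK_HZ)
    print(f"{name}: {width // 8} bytes/cycle, {rate / 1e9:.0f} GB/s peak")
```

Under these assumptions a 128-bit bus moves 16 bytes per cycle and a 256-bit bus moves 32; sustained bandwidth on real hardware would be lower than these peak figures.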
Pipelines and other implementation issues
The ARM7 and earlier implementations have a three-stage pipeline, the stages being fetch,
decode and execute. Higher-performance designs, such as the ARM9, have deeper pipelines;
the Cortex-A8, for example, has thirteen stages. Additional implementation changes for higher
performance include a faster adder and more extensive branch prediction logic. The difference
between the ARM7DI and ARM7DMI cores, for example, was an improved multiplier; hence the added "M".
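The three-stage fetch, decode and execute pipeline can be visualised with a small timing model. The sketch below is an idealised illustration: it assumes every instruction spends exactly one cycle in each stage, with no stalls, branches or multi-cycle operations. The function name and instruction labels are hypothetical.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(instructions):
    """Return one dict per clock cycle mapping each stage to its instruction.

    A stage holds None while the pipeline is filling or draining.
    """
    n = len(instructions)
    total_cycles = n + len(STAGES) - 1 if n else 0
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction that entered 'depth' cycles ago
            row[stage] = instructions[index] if 0 <= index < n else None
        schedule.append(row)
    return schedule


program = ["i1", "i2", "i3", "i4"]
for cycle, row in enumerate(pipeline_schedule(program), start=1):
    print(cycle, row)
```

In this ideal model, four instructions complete in six cycles rather than the twelve they would take if each had to finish all three stages before the next began. A deeper pipeline follows the same pattern with more stages, and the cost of refilling it after a mispredicted branch is one reason deeper designs invest in branch prediction.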
Fig: Cortex-A7 pipeline
Conclusion
The Cortex-A7 CPU brings the performance of 2011 mainstream smartphones to the entry-level
smartphones and tablets of 2013, through enhancements to the microarchitecture, memory
interface improvements, and innovative power-efficient processor design. Large-volume, low-cost
smartphones will take advantage of the Cortex-A7 CPU’s low power and efficient performance.
In addition, the Cortex-A7 CPU enables big.LITTLE processing, a breakthrough innovation from
ARM, delivering the peak performance of the Cortex-A15 within a low average power budget driven
by the Cortex-A7. big.LITTLE processing will enable high-end smartphones and tablets in 2013
with lower power consumption than the high-end smartphones of today.
Finally, full architectural compatibility with the high-end Cortex-A15 in the small Cortex-A7
power and area footprint will enable applications we haven’t thought of yet.