ARM Increasing ProcPerf Advantage IP - RTC Group (rtcgroup.com/arm/2007/presentations/204 - Increasing...)

1 Confidential Restricted


Increasing Your Processor Performance with ARM Advantage

Memories and Standard Cells

Raviraj Mahatme

3rd October 2007


Flexibility Through ARM Processors

[Figure: ARM processor roadmap, with performance in DMIPS (100, 250, 300, 500, 600, 1000+, 2000+) plotted against time (2005, 2006). Processors shown: ARM7TDMI®, ARM7TDMI-S™, ARM7EJ-S™, ARM946E-S™, ARM966E-S™, ARM968E-S™, ARM926EJ-S™, ARM1026EJ-S™, ARM1136EJ-S™, ARM1156T2F-S™, ARM1176JZF-S™, ARM11™ MPCore™ x4, and, under the label ARM® Cortex™ “Intelligent Computing”, Cortex-M3, Cortex-R4, Cortex-A8 and Cortex-A9™.]


But….

Are you getting the optimum benefit?


Or is this what you encounter?

A fast processor with slow memory is like driving a sports car in heavy traffic…


It's about more than having the right core

[Figure: the right ARM core (ARM1176JZF-S) + Optimized ARM Physical IP, labelled WINNER.]


ARM Processor Performance Package

Processor Performance Package (PPP) is ARM Artisan Physical IP that is optimized for use with high performance ARM processors.

Specially designed and optimized memory instances for core cache memory

High Performance Advantage-HS 12 track standard cell library

Floor planning guidelines and other configuration files for “out of the box” implementation


Why choose the PPP?

Physical implementation of the processor determines system throughput.

Choice of cell library affects power and area numbers. Cache memory performance impacts processor performance.

PPP provides up to a 20% performance increase over mainstream Advantage memories

With minimal impact on dynamic power

Very little area impact.

Floor planning guidelines & other ARM documentation make implementation simple.


Performance Package Contents

ARM Advantage-HS Standard Cell Library

Fast CPU cache memory instances for several cache configurations.

Integration Documents

Library preparation for leading EDA tool flows.


Package flow

Step 1: ARM1176JZ[F]-S Configuration

Step 2: Prepare Libraries for EDA Flow

Step 3: Perform Implementation


Processor Configuration

Verilog Memory Wrappers
The wrappers are used to optimally connect the ARM1176JZ[F]-S processor signals to the fast memory instances provided.

Cache Configuration File
The cache configuration file defines the I and D cache sizes and is specific to the configuration chosen.

ARM1176JZ[F]-S Architectural Clock Gating
Support for high-level architectural clock gating constructs that cannot be inferred during RTL synthesis. The library integrated clock gating cell should be instanced directly in the ARM1176JZ[F]-S.

Validation of Configured Core
Test the connections between the core and memory instances using the integrated test bench provided with the ARM1176JZ[F]-S release.


Library Preparation

Standard Cell Library Preparation
For Synopsys flow
For Cadence flow

Memory Library Preparation
For Synopsys flow
For Cadence flow


Library Preparation: Standard Cell Library Preparation

For Synopsys Flow
Milkyway libraries of the standard cells are provided as part of the Advantage-HS standard cell library.

For Cadence Flow
VoltageStorm views are needed for a Cadence implementation flow. Scripts are provided for generating the views for both standard cells and memories.


Library Preparation: Memory Library Preparation

For Synopsys Flow
Scripts are provided for generating Milkyway views of memories.

For Cadence Flow
Scripts are provided for generating VoltageStorm views of memories. These scripts, however, require technology files (rcgentechfile and lef_def layer map file), which must be obtained from TSMC.


Implementation-Synopsys flow


Implementation-Cadence Flow


ARM Reference Methodology (iRM)

ARM Reference Methodologies are designed to provide ARM Partners with a simple, deterministic and rapid route from RTL to GDSII

The iRM takes a configured RTL representation of an ARM core and performs implementation to a cell level DRC/LVS clean representation

It provides an accompanying set of models for specific characteristics (timing, test, physical) of the final implementation

The Processor Performance Package can be easily integrated into an iRM if higher achievable performance or cache configuration changes are required


ARM1176JZ[F]-S Performance Package for TSMC 65LP

Advantage-HS™ High Performance Platform

Performance package includes:
12 track high-performance standard cell library
Optimized cache RAM instances
Automatic cache memory configuration for 8K, 16K and 32K cache options
Implementation guidelines
Library preparation for Synopsys and Cadence EDA tool flows

Frequency: 506 MHz
Area: 1.80 mm2
Dynamic Power: 0.363 mW/MHz
Static Power: 86.70 µW

Nominal Vt only
Frequency data from PrimeTime-SI @ ss, 1.08V, 125C (un-margined)
Power results Dhrystone @ tt, 1.2V, 25C
Area includes RAM @ 84% utilization
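As a rough worked example of how the quoted figures combine, the sketch below multiplies the dynamic power per MHz by the quoted frequency and adds the static power. The function name `estimate_power_mw` is an illustrative assumption, not part of the package. The result is only indicative, because the frequency is quoted at the slow corner (ss, 1.08V, 125C) while the power figures are quoted at the typical corner (tt, 1.2V, 25C).

```python
# Figures quoted on the slide for the ARM1176JZ[F]-S on TSMC 65LP.
FREQ_MHZ = 506.0            # PrimeTime-SI, ss / 1.08V / 125C, un-margined
DYNAMIC_MW_PER_MHZ = 0.363  # Dhrystone, tt / 1.2V / 25C
STATIC_UW = 86.70           # tt / 1.2V / 25C


def estimate_power_mw(freq_mhz=FREQ_MHZ,
                      dynamic_mw_per_mhz=DYNAMIC_MW_PER_MHZ,
                      static_uw=STATIC_UW):
    """Return (dynamic, static, total) power in mW.

    Assumption: dynamic power scales linearly with clock frequency,
    which is what a mW/MHz figure implies.
    """
    dynamic_mw = freq_mhz * dynamic_mw_per_mhz
    static_mw = static_uw / 1000.0  # convert microwatts to milliwatts
    return dynamic_mw, static_mw, dynamic_mw + static_mw


if __name__ == "__main__":
    dynamic, static, total = estimate_power_mw()
    print(f"dynamic ~ {dynamic:.1f} mW, static ~ {static:.4f} mW, "
          f"total ~ {total:.1f} mW")
```

At the quoted 506 MHz this gives roughly 184 mW of dynamic power, with static power contributing less than 0.1 mW.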


Performance without Penalty

Standard cell architecture and memory access timing are critical to CPU speed

Feature: Advantage-HS 12 track standard cell architecture is designed for high performance. Optimized memories improve access timing without compromising area.
Benefit: 20% performance increase.

Feature: automem configuration script for synthesis supporting cache sizes 8K/8K, 16K/16K, 32K/32K.
Benefit: Reduce time to market.

Feature: Using Lvt to achieve equivalent speed can add up to 5% wafer cost + additional mask cost.
Benefit: Save $.

Feature: ARM validated deliverables.
Benefit: Reduce risk.
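To make the "Save $" row concrete, the sketch below computes an upper bound on the extra cost of reaching the same speed with Lvt cells, using the slide's "up to 5% wafer cost + additional mask cost". The function name and every input value are hypothetical and chosen only for illustration. The slides quote no wafer price, volume or mask price.

```python
def lvt_extra_cost(wafer_cost, wafer_count, extra_mask_cost, wafer_uplift=0.05):
    """Upper-bound extra cost of an Lvt-based design.

    wafer_uplift defaults to 0.05, the "up to 5%" figure from the slide.
    All other inputs are supplied by the caller; none come from the slides.
    """
    if wafer_cost < 0 or wafer_count < 0 or extra_mask_cost < 0:
        raise ValueError("costs and counts must be non-negative")
    if wafer_uplift < 0:
        raise ValueError("wafer_uplift must be non-negative")
    return wafer_cost * wafer_count * wafer_uplift + extra_mask_cost


if __name__ == "__main__":
    # Hypothetical inputs: 1000 wafers at 3000 per wafer, 50000 for the extra mask.
    extra = lvt_extra_cost(wafer_cost=3000.0, wafer_count=1000,
                           extra_mask_cost=50000.0)
    print(f"extra cost of the Lvt route (upper bound): {extra:,.0f}")
```

The package's performance numbers are quoted for Rvt only, so under these assumptions this is the cost that the Rvt-only route avoids.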


ARM1176 Performance Package deliverables

ARM Advantage-HS standard cell library (CLN65LP)
12 track high cell architecture for high performance
Large cell set with over 900 cells and fine drive strength granularity
Multiple beta ratios for often used cells, enabling power/performance optimization
Robust power rail architecture to support high performance designs

Pre-configured RAM instances for all cache configurations
Performance numbers achieved using Rvt only
DFT views provided for FastScan and TetraMAX

Documentation includes:
Automatic memory configuration for L1 cache instances (8K/8K, 16K/16K, 32K/32K only)
Guidelines on the integration of TCM memories
Library preparation for Synopsys and Cadence EDA tool flows
Floor planning guidelines and references to other ARM documentation
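The restriction to symmetric L1 cache sizes can be expressed as a small pre-check run before synthesis. This is a minimal sketch: `check_cache_config` and `SUPPORTED_CACHE_KB` are hypothetical names, and this is not the automem script itself, whose interface is not described in the slides.

```python
# Hypothetical helper: illustrative only, not part of any ARM deliverable.
# Per-cache sizes listed in the slides: 8K/8K, 16K/16K, 32K/32K only.
SUPPORTED_CACHE_KB = (8, 16, 32)


def check_cache_config(icache_kb, dcache_kb):
    """Return a label such as '16K/16K' for a supported I/D cache pair.

    Raises ValueError for any pair outside the three symmetric
    configurations that the documentation covers.
    """
    if icache_kb != dcache_kb:
        raise ValueError(
            f"I-cache ({icache_kb}K) and D-cache ({dcache_kb}K) must match; "
            "only symmetric configurations are listed")
    if icache_kb not in SUPPORTED_CACHE_KB:
        raise ValueError(
            f"{icache_kb}K is not a listed cache size; "
            f"expected one of {SUPPORTED_CACHE_KB}")
    return f"{icache_kb}K/{dcache_kb}K"


if __name__ == "__main__":
    print(check_cache_config(16, 16))
```

Rejecting unlisted sizes early is a design choice of this sketch. It keeps a configuration with no pre-configured RAM instance from reaching the implementation step.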


Challenge – Implementation Ranges

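[Figure: implementation ranges plotted as frequency (200 to 400 MHz) against power (150 to 350 mW) around a nominal performance point, with arrows labelled "WANTED: Higher performance", "WANTED: Lower power" and "Higher area density".]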

You can accomplish all these with the Processor Performance Package and other ARM Physical IP


Mobile Applications Segment

High speed required for embedded processor (~650MHz)
High density for rest of the SoC (~300MHz)
Aggressive power management

Low leakage “LP/LL” processes
Multi-VT designs
Low voltage operation
Retention and shutdown modes

Processor Performance Package is the best choice for the higher-speed ARM processors
Advantage memory is the most appropriate choice for the high-speed section
Metro memory is the most appropriate choice for the high-density section


Enterprise and Digital Office Segment

High speed required over the entire chip (>750MHz)
Typically use G or high-speed processes
Speed is the key criterion

Processor Performance Package offers the ideal solution
Setup time + access time
Memories need to support pipelined outputs for better timing

High-capacity memories are also required
2-4Mbits of contiguous SRAM

Advantage & Advantage-HS memory with pipelined outputs is the most appropriate choice
In some cases, low VT devices may be used in the periphery to further improve access time
Large SRAMs greater than 1Mbit are also required


High-Speed Consumer Segment

High speed required for embedded processor (~650MHz)
High density for rest of the SoC (~300MHz)

Moderate power management
G or low leakage “LP/LL” processes
Multi-VT designs
Voltage islands

Large memories may be required
Up to 4Mbits of single-port SRAM

Advantage memory with mixed VT periphery is the most appropriate choice for the high-speed section
Metro memory with mixed VT periphery is the most appropriate choice for the high-density section
SRAMs larger than 1Mbit are available as instances


High-Density Consumer Segment

Moderate speed required over entire SoC (<300 MHz)
High density required for entire SoC
Moderate power management

Low leakage “LP/LL” processes
Multi-VT designs
Voltage islands

Low speed subsegment (<100MHz)
Very low leakage requirements
Low voltage operation

Metro memory with mixed VT periphery is the most appropriate choice for the moderate speed segment
Metro memory with all high VT periphery is the most appropriate choice for the low speed segment
Memory power management should be used across the chip
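The recommendations from the four segment slides can be collected into one lookup table. The sketch below is only a summary of what those slides say: the dictionary keys, the section labels and the `recommend` function are illustrative names, not ARM terminology or tooling.

```python
# Segment -> {part of the design: recommended Physical IP}, as stated on the
# segment slides. Key names are illustrative assumptions.
RECOMMENDED_IP = {
    "mobile": {
        "processor": "Processor Performance Package",
        "high-speed section": "Advantage memory",
        "high-density section": "Metro memory",
    },
    "enterprise and digital office": {
        "processor": "Processor Performance Package",
        "whole chip": "Advantage & Advantage-HS memory with pipelined outputs",
    },
    "high-speed consumer": {
        "high-speed section": "Advantage memory with mixed VT periphery",
        "high-density section": "Metro memory with mixed VT periphery",
    },
    "high-density consumer": {
        "moderate-speed segment": "Metro memory with mixed VT periphery",
        "low-speed segment": "Metro memory with all high VT periphery",
    },
}


def recommend(segment):
    """Return a copy of the recommendations for one application segment."""
    key = segment.strip().lower()
    if key not in RECOMMENDED_IP:
        raise KeyError(
            f"unknown segment {segment!r}; expected one of {sorted(RECOMMENDED_IP)}")
    return dict(RECOMMENDED_IP[key])


if __name__ == "__main__":
    for part, choice in recommend("Mobile").items():
        print(f"{part}: {choice}")
```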


65nm High Performance Platform

All of the options needed to give the optimum PPA trade-off
Available at multiple Vt
PMK for low power at nominal Vt (RVt)
Advantage-HS (LVt) with Cortex-A8 for maximum performance in consumer devices

65nm platforms available for TSMC and Common Platform

Product

Standard Cells
Advantage SC 10T RVt, HVt, LVt
Advantage PMK 10T RVt
Metro SC 8T RVt, HVt
Metro PMK 8T RVt
Advantage SC 12T RVt, HVt, LVt
Advantage PMK 12T RVt

Memory Generators
Advantage SRAM-SP 64 Rows/Bank
Advantage SRAM-DP
Advantage RF-SP
Advantage RF-2P
Advantage ROM-VIA
Metro SRAM-SP 128 Rows/Bank
emBISTRx

I/O Products
LVDS 850 MHz, 2.5V
HSTL Class I/II 2.5V
DDR1/2 flip-chip
DDR1/2 wire-bond 2.5V - CUP

High Speed Serial PHYs
PCI Express 1.1
PCI Express 2.0
XAUI 3.125Gbps
CEI Short-Reach 6.4Gbps
10G


45nm Low Power Mobile Platform

45nm platform example based on IBM CMOS11LP
TSMC 45GS platform also available for licensing today

Manufacturability becoming a major issue
Yield, variability, test/repair
Increased investment will pay off as it reduces cost for high-volume devices

Falcon with PMK delivers high performance and low power for Connected Mobile Computers

Standard Cells
Metro SC 9T RVt, HVt, LVt
Metro PMK 9T RVt, HVt, LVt
Advantage SC 12T RVt, HVt, LVt
Advantage PMK 12T RVt, HVt, LVt

Memory Generators
Advantage SRAM-SP (Large Bit Cell) 64 / 128 R/B
Advantage SRAM-SP (Small Bit Cell) 64 / 128 R/B
Advantage SRAM-DP 64 / 128 R/B
Advantage RF-SP 128 R/B
Advantage RF-2P 128 R/B
Advantage ROM-VIA 64 R/B

Memory Self-Test and Repair
emBISTRx

I/O Products - Inline/Staggered
GPIO Programmable LVDS SSTL_18 SSTL_2 USB 1.1 PCI-X HSTL Class I/II

DDR Products
MDDR


Conclusion

ARM Cell Libraries and Memories give you a predictable route to silicon with an industry-standard methodology.

The ARM Processor Performance Package helps you get the best PPA performance out of your ARM processor.

Reference methodology and other ARM documents make implementation an easy task.

You can target a variety of applications using the Processor Performance Package combined with other ARM Physical IP.


Thank you

For more information www.arm.com

or contact your ARM sales person