Douglas Densmore Dissertation Talk and DES/CHESS Seminar, May 15th, 2007


Page 1

A Design Flow for the Development, Characterization, and Refinement of System Level Architectural Services
Douglas Densmore
Dissertation Talk and DES/CHESS Seminar
May 15th, 2007

Committee: Prof. Alberto Sangiovanni-Vincentelli (EECS), Chair; Prof. Jan Rabaey (EECS); Prof. Lee Schruben (IEOR)

Page 2

Objective: To demonstrate that architecture service modeling in system level design (SLD) can allow abstraction and modularity while maintaining accuracy and efficiency.

Factors, solutions, techniques, and outcomes:
• Factors: Heterogeneity, Complexity, Time to Market
• Solutions: Modularity, Abstraction, Accuracy and Efficiency
• Techniques: #1 Event Based Architecture Service Modeling, #2 Architecture Service Characterization, #3 Architecture Service Refinement Verification

Page 3

Outline

1. Problem Statement
• Motivating Factors
• Design Trends and EDA Growth
• Software Solutions
• Programmable Platforms
• Naïve Approach
• My Improved Approach
2. Approach
3. Contribution

Page 4

Motivating Factors
Factor 1: Heterogeneity


Solution 1: Modularity

1. D. Edenfeld, et al., 2003 Technology Roadmap for Semiconductors, IEEE Computer, January 2004.

[Figure: Existing and predicted first integration of SoC technologies with standard CMOS processes (1), by year.]

[Figure: Intel's PXA270 in the Mypal A730 PDA (digital camera and a VGA-TFT display). Courtesy: http://www.intel.com/design/embeddedpca/applicationsprocessors/302302.htm]

Various Component Types
Various Communication Types

[Figure: System on a Chip (SoC): block diagram of the Intel PXA270, with blocks including PCMCIA, USB, the system bus, SRAM, and the Quick Capture Interface.]

Page 5

Motivating Factors
Factor 2: Complexity


Solution 2: Abstraction

[Figure: Potential Design Complexity and Designer Productivity, 1981 to 2009. Courtesy: 1999 International Technology Roadmap for Semiconductors (ITRS). Top curve: logic transistors per chip (M), complexity growth rate 58%/yr compounded. Bottom curve: productivity (K transistors/staff-month), productivity growth rate 21%/yr compounded. Annotation: Equivalent Added Complexity.]

Page 6

Motivating Factors
Factor 3: Time to Market


Solution 3: Accuracy and Efficiency
Challenge: Remain modular and abstract

Courtesy: http://www.ibm.com

• Being a year late effectively ends any chance of revenue!
• Being nine months late costs 50%+ of revenue.
• Being three months late still loses 15%+ of revenue.

37% of new digital products were late to market! (Ivo Bolsens, CTO Xilinx)

[Figure: months (y-axis, 0 to 18) versus year (1991, 2000, 2005) for digital consumer devices, set-top equipment, and automotive; labeled values 16, 11, and 10.7 months.]

Gartner DataQuest. Market Trends: ASIC and FPGA, Worldwide, 1Q05 Update edition, 2002-2008.

Page 7

[Figure: embedded systems market ($ millions, 0 to 80,000) for 2003, 2004, and 2009, split into embedded software (left), embedded ICs (center), and embedded boards (right); annotation: ~$78.7 billion.]

Ravi Krishnan. Future of Embedded Systems Technology. BCC Research, June 2005.

Design Trends and EDA Growth

[Figure: design complexity (# transistors) over time, showing the design trend moving from gate level to RTL to ESL, with today's design gap, the maximum tolerable design gap, and the methodology gap.]
[Figure: Gartner Dataquest projections of EDA industry revenue and of ESL revenues; annotations: ~22% growth for 2007; tremendous growth.]

Richard Goering. ESL May Rescue EDA, Analysts Say. EE Times, June 2005.


Page 8

Software Tools Solution

1. A. Sangiovanni-Vincentelli, Defining Platform-based Design, EE Design, March 5, 2002.
2. K. Keutzer, J. Rabaey, et al., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design, Vol. 19, No. 12, December 2000.
3. 2004 International Technology Roadmap for Semiconductors (ITRS).
4. F. Balarin, et al., Metropolis: an Integrated Electronic System Design Environment, IEEE Computer, Vol. 36, No. 4, April 2003.

[Figure: platform based design: the application space (application instance) and the architectural space (platform instance) meet at the platform (HW and SW) through platform mapping and platform design-space export, producing the system.]

Design Technology Improvements and Impact on Designer Productivity (3)
DT Improvement | Year | Productivity Delta | Productivity (Gates/Design-Year) | Cost of Component Affected | Description of Improvement
Electronic System Level (ES-Level) Methodology | 2005 | +60% | 200K | SW Development, Verification | Level above RTL including both HW and SW design.

• Metropolis Meta Modeling (MMM) language (4) and compiler are the core components.

• Backend tools provide various operations for manipulating designs and performing analysis.

Platform Based Design (1) is composed of three aspects:
1. Top Down Application Development
2. Platform Mapping
3. Bottom Up Design Space Exploration

Orthogonalization of concerns (2):
• Functionality and Architecture
• Behavior and Performance Indices
• Computation, Communication, and Coordination

[Figure: Metropolis framework: the front end of the meta model compiler translates the meta model language into abstract syntax trees used by back ends 1 through N, such as the simulator, synthesis, and verification tools.]

Page 9

Programmable Platforms

1. Tsugio Makimoto, Paradigm Shift in the Electronics Industry, UCB, March 2005.

What? A programmable platform is a system for implementing an electronic design, distinguished by its ability to be programmed regarding its functionality.

At the extremes:
• Software programmable: GPPs, DSPs
• Bit programmable: FPGAs, CPLDs

What devices should the tools target? Programmable platforms.

Platform FPGAs: FPGA fabric and embedded computation elements.
Strengths:
• Rapid time-to-market
• Versatile, flexible (increases product lifespan)
• In-field upgradeability
• Performance: 2-100x compared to GPPs
Weaknesses:
• Performance: 2-6x slower than ASICs
• Power: 13x compared to ASICs

Why? The next “digital wave” will require programmable devices (1).

[Figure: Makimoto's wave (source: Electronics Weekly, Jan 1991), alternating between standardization and customization: standard discretes ('57), custom LSIs ('67), memories and microprocessors ('77), ASICs ('87), and field programmability ('97), annotated "Modeling focus of this work". Courtesy: K. Keutzer.]

One set of models represents a very large design space of individual instantiations.

Page 10

Programmable Platform Focus
Classification | Description
Granularity | Abstraction level: CLB, functional unit, ISA
Host Coupling | Coupling to host processor: I/O, direct communication, same chip
Reconfiguration Methodology | How the device is programmed: static, dynamic, partial
Memory Organization | How computations access memory: large block, distributed

Design levels vs. design elements:
Design Level | Communication | Storage | Processing
Implementation | Switches, MUXes | RAM Organization | CLB/IP Block
uArch | Crossbar, Bus | Register File Size | Execution Unit Type
ISA | Address Size | Register Set | Custom Instructions
System Arch | Intercon. Network | Buffer Size | Number/Types of Tasks

K. Bondalapati, V. Prasanna, Reconfigurable Computing Systems, USC

P. Schaumont, et al, A Quick Safari Through the Reconfigurable Jungle, DAC, June 2001.


What do MY system level models need to capture?

[Figure: Xilinx Virtex II ML310 board with a Xilinx Virtex II XC2VP30, IBM’s CoreConnect architecture, MicroBlaze, and PowerPC.]

Page 11

Naïve Approach

[Diagram: abstract, modular SLD tools simulate an architecture model built from estimated performance data (datasheets, expertise) for 1. design space exploration and 2. synthesis. Separately, the implementation platform is reached from a "C" model and an RTL "golden model" through an inflexible automatic tool flow, with manual steps and lengthy feedback.]
Implementation gap! The two sides are disconnected (inaccurate!) and inefficient (miss time to market!). Bridge the gap!

Page 12

My Improved Approach

[Diagram: abstract, modular SLD connected to an actual programmable platform description, narrowing the gap.]
• Technique 1: Modeling style and characterization for programmable platforms. Estimated performance data is replaced by real performance data from the characterization flow, and models are functional level blocks of programmable components.
• Technique 2: Refinement verification. Manual, informal checks are replaced by formal checking methods, so that refined and abstract models are shown to be correct with respect to each other.

The new approach improves accuracy and efficiency by relating programmable devices and their tool flow to SLD (Metropolis), while retaining modularity and abstraction.

Page 13

Approach Statement

Problem: SLD of architecture service models is potentially inaccurate and inefficient.

My Approach: a PBD approach to architecture service modeling that allows modularity and abstraction. By relating service models to
• programmable platforms,
• platform characterization,
• and refinement verification,
they will retain accuracy and efficiency.

Design flow:
1. Select architecture services from libraries (general purpose and special purpose, e.g., Xilinx Virtex II, FLEET).
2. Assemble an SLD, transaction based architecture from services (abstract, modular).
3. Augment the model with real performance data.
4. Run simulation based design space exploration.
4a. Based on DSE results, modify the architecture model if needed.
4b. Perform a refinement check (event based, interface based, compositional component based) between the abstract and refined models: Yes? No?
5. Structure extractor: produce an actual programmable platform description (i.e., an MHS file), narrowing the gap.
6. Program the actual device directly.
Programmable functional modeling is not discussed in this work.

Chapter 2: System Level Architecture Services. Chapter 3: Architecture Services Characterization. Chapter 4: System Level Service Refinement.

Page 14

Outline Revisited

1. Problem Statement
2. Approach
• My Improved Approach
• Approach Statement
3. Contribution
• Architecture Service Descriptions
• Metropolis Overview
• Programmable Architecture Service Modeling
• Programmable Platform Characterization
• Example of Techniques

Focus: Modularity

Page 15


Architecture Service Taxonomy

[Figure: the three service types. Single Component, Single Interface: one component with one provided interface and a Cost function. Multiple Component, Multiple Interface: several components with multiple provided interfaces, an internal interface, and cost functions CostA (C1, C2) and CostB (C2). Multiple Component, Single Interface: components joined by internal interfaces behind one provided interface, with Cost (C1, C2, C3).]

Services are library elements, <F, C> where F is a set of interface functions (capabilities) and C is a set of cost models.

• Single Component, Single Interface (SCSI): one provided interface and one simple cost model.
• Multiple Component, Multiple Interface (MCMI): two or more provided interfaces, zero or more internal interfaces, one or more simple cost functions, and zero or more complex cost functions.
• Multiple Component, Single Interface (MCSI): one provided interface, one or more internal interfaces, and one or more complex cost functions.

Services are also classified as active or passive.
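A minimal Java sketch of the <F, C> taxonomy above. It assumes a service can be summarized by counts of its components, provided interfaces, internal interfaces, and cost functions, and it assumes that a "simple" cost function covers one component while a "complex" one spans several. `ServiceDescriptor` and its fields are hypothetical names for illustration, not part of Metropolis.

```java
// Hypothetical summary of a service <F, C>; not a Metropolis class.
final class ServiceDescriptor {
    final String name;
    final int components;          // components wrapped by the service
    final int providedInterfaces;  // interfaces in F offered to other services
    final int internalInterfaces;  // interfaces used only between components
    final int simpleCosts;         // cost models over one component (assumption)
    final int complexCosts;        // cost models spanning several components (assumption)
    final boolean active;          // active (own thread) vs. passive service

    ServiceDescriptor(String name, int components, int providedInterfaces,
                      int internalInterfaces, int simpleCosts, int complexCosts,
                      boolean active) {
        this.name = name;
        this.components = components;
        this.providedInterfaces = providedInterfaces;
        this.internalInterfaces = internalInterfaces;
        this.simpleCosts = simpleCosts;
        this.complexCosts = complexCosts;
        this.active = active;
    }

    /** Classifies the service using the SCSI/MCMI/MCSI definitions. */
    String taxonomy() {
        if (components == 1 && providedInterfaces == 1 && simpleCosts == 1) {
            return "SCSI";
        }
        if (components > 1 && providedInterfaces >= 2 && simpleCosts >= 1) {
            return "MCMI";
        }
        if (components > 1 && providedInterfaces == 1
                && internalInterfaces >= 1 && complexCosts >= 1) {
            return "MCSI";
        }
        return "unclassified";
    }
}
```

For example, a three-component DCT pipeline with one provided interface, two internal interfaces, and one complex cost function, `new ServiceDescriptor("DCT pipeline", 3, 1, 2, 0, 1, false)`, is classified as MCSI.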

[Figure: abstracting real platforms into services: a general purpose processor (CPU, Bus) and a Xilinx Virtex II Pro (Add, Multi, DCT, FFT), with each service carrying a cost function (CF).]

Page 16


Service Based Arch. Styles

[Figure: Architecture Style 1 (Branching) and Architecture Style 2 (Ring), built from SCSI, SCMI, MCSI, and MCMI services. Ovals denote passive services; squares denote active services.]

• Branching style: allows the use of all types of services.
• Ring style: allows the use of Single Interface (SI) services only.
• Both styles: allow the use of active/passive and single/multiple component services.

Assemble collections of services to provide larger sets of capabilities and cost functions.

Hierarchy: each style can be abstracted into composite services (e.g., MCMI).
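The two style rules can be checked mechanically before a netlist is assembled. In this sketch, `ArchitectureStyles` and its `ServiceKind` enum are hypothetical, and SCSI and MCSI are assumed to be the single interface (SI) services that the Ring style admits.

```java
// Hypothetical check of the composition rules for the two architecture styles.
final class ArchitectureStyles {
    enum ServiceKind { SCSI, SCMI, MCSI, MCMI }

    /** Single Interface (SI) services: exactly one provided interface. */
    static boolean isSingleInterface(ServiceKind kind) {
        return kind == ServiceKind.SCSI || kind == ServiceKind.MCSI;
    }

    /** Branching accepts every service type; Ring accepts SI services only. */
    static boolean allowed(String style, ServiceKind... services) {
        switch (style) {
            case "Branching":
                return true;
            case "Ring":
                for (ServiceKind kind : services) {
                    if (!isSingleInterface(kind)) {
                        return false;
                    }
                }
                return true;
            default:
                throw new IllegalArgumentException("unknown style: " + style);
        }
    }
}
```

Composite (hierarchical) services are not modeled here; a fuller version would let an assembled style itself be classified as a service.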

Page 17

Metropolis Objects

• Metropolis elements adhere to a “separation of concerns” ideology.

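[Figure: process Proc1 with ports P1 and P2, connected through interfaces I1 and I2 to medium Media1, and quantity manager QM1.]

• Processes (Computation): active objects, each a sequential executing thread.
• Media (Communication): passive objects that implement interface services.
• Quantity Managers (Coordination): schedule access to resources and quantities.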

Page 18

Metro. Netlists and Events

[Figure: a scheduled netlist (processes Proc1 and Proc2 with ports P1 and P2, interfaces I1 and I2, and medium Media1) alongside a scheduling netlist (quantity manager QM1 and GlobalTime).]

Metropolis architectures are created via two netlists:
• Scheduled: generates events (1) for services in the scheduled netlist.
• Scheduling: allows these events access to the services and annotates events with quantities.

1. E. Lee and A. Sangiovanni-Vincentelli, A Unified Framework for Comparing Models of Computation, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 17, No. 12, pp. 1217-1229, December 1998.

Page 19

Services in Design Flow

1. Select architecture services from libraries.
2. Assemble an SLD, transaction based architecture from services.
3. Produce an actual programmable platform description (i.e., an MHS file).
4. Program the actual device directly.

Process Expanded:
[Figure: Metropolis media for Xilinx Virtex II library services (PowerPC, uBlaze, SynMaster, PLB, OPB, BRAM, and the Mapping Process) pass through structure extraction by the structure extractor; characterization data is input from Chapter 3.]


Page 20

Programmable Arch. Modeling

• Computation Services

• Communication Services

• Other Services

[Figure: service models for PPC405, MicroBlaze, SynthSlave, SynthMaster, the Processor Local Bus (PLB), the On-Chip Peripheral Bus (OPB), the OPB/PLB bridge, BRAM, and the Mapping Process.]

Computation interfaces: Read(addr, offset, cnt, size), Write(addr, offset, cnt, size), Execute(operation, complexity)
Task before mapping: Read(addr, offset, cnt, size). Task after mapping: Read(0x34, 8, 10, 4).

Communication interfaces: addrTransfer(target, master), addrReq(base, offset, transType, device), addrAck(device), dataTransfer(device, readSeq, writeSeq), dataAck(device)

• Transaction Level
• IP Parameters
• I/O Interfaces

Services are organized by orthogonal aspects of the system. All services created here are XCMI (i.e., SCMI or MCMI) services, each with more than two provided interfaces.

Leverage function level granularity; 1-to-1 model/IP correspondence

Page 21

Sample Metropolis Service

Interface Function | Assumptions | Cycle Count
cpuRead(int bus) | Bus dependent | 1 (LMB), 7 (OPB) cycles
cpuWrite(int bus) | Bus dependent | 1 (LMB), 2 (OPB) cycles
fslRead(int size) | Transfer size | (1 * size) cycles
fslWrite(int size) | Transfer size | (1 * size) cycles
execute(int inst, int comp) | Valid INST field | (1 * complexity) cycles

Each service is profiled manually and given a set of cost models.


uBlaze
[Figure: the uBlaze medium with ports _portOPB, _portSM, _portChar, _portGT, _portMFSL, and _portSFSL.]

public medium uBlaze implements uBlazeISA, GPPOperation, OPBMaster {
    // Ports
    port OPBTrans _portOPB;
    // connection to characterizer
    port cycleLookup _portChar;
    // FSL ports
    port FSLMasterInterface[] _portMFSL;
    port FSLSlaveInterface[] _portSFSL;
    // connection to StateMedia
    port SchedReq _portSM;
    // StateMedia to global time
    port GTimeSMInterface _portGT;

    // Parameters
    private int C_FSL_LINKS;
    private int C_FSL_DATA_SIZE;
    private int C_USE_BARREL;
    private int C_USE_DIV;
    private int C_USE_HW_MUL;
    ...
}

Non-Ideal
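The cycle-count table can be read as a set of simple cost functions attached to the uBlaze service. The Java sketch below reproduces those numbers; `UBlazeCosts` and the `LMB`/`OPB` integer constants are hypothetical stand-ins for whatever bus encoding the real medium uses, and the `inst` argument is assumed to be a valid INST field, as the table requires.

```java
// Hypothetical cost functions mirroring the uBlaze cycle-count table.
// Not the Metropolis uBlaze medium; bus encoding is an assumption.
final class UBlazeCosts {
    static final int LMB = 0;
    static final int OPB = 1;

    static int cpuRead(int bus) {
        switch (bus) {
            case LMB: return 1;
            case OPB: return 7;
            default: throw new IllegalArgumentException("unknown bus " + bus);
        }
    }

    static int cpuWrite(int bus) {
        switch (bus) {
            case LMB: return 1;
            case OPB: return 2;
            default: throw new IllegalArgumentException("unknown bus " + bus);
        }
    }

    static int fslRead(int size)  { return 1 * size; }  // one cycle per unit of transfer size
    static int fslWrite(int size) { return 1 * size; }

    /** inst is assumed to be a valid INST field; cost scales with complexity. */
    static int execute(int inst, int complexity) { return 1 * complexity; }
}
```

Summing the costs gives a transaction-level estimate: an LMB read, a 4-word FSL write, and an operation of complexity 3 cost 1 + 4 + 3 = 8 cycles.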

Page 22

Programmable Arch. Modeling

• Coordination Services

[Figure: coordination services PPC Sched, OPB Sched, PLB Sched, MicroBlaze Sched, BRAM Sched, and General Sched, connected to global time (GTime).]

• Request(event e): adds the event to the pending queue of requested events.
• Resolve(): uses an algorithm to select an event from the pending queue.
• PostCond(): augments the event with information (annotation); this is typically the interaction with the quantity manager.
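The request/resolve/postcond cycle can be sketched outside Metropolis with an explicit pending queue. This Java version is an assumption for illustration: `Event` and `FifoSched` are hypothetical classes, `resolve()` uses a FIFO policy (real schedulers such as a bus arbiter would use their own), and `postcond()` annotates each event with a global time computed from a cycle count and a clock period.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical event; in Metropolis, events are language constructs.
final class Event {
    final String name;
    final int cycles;          // cost of the event, e.g. from a characterizer
    double timestampNs = -1;   // annotation added by postcond()

    Event(String name, int cycles) {
        this.name = name;
        this.cycles = cycles;
    }
}

// Hypothetical coordination service following request/resolve/postcond.
final class FifoSched {
    private final Deque<Event> pending = new ArrayDeque<>();
    private final double clockPeriodNs;
    private double globalTimeNs = 0.0;  // stands in for GTime
    private Event selected;

    FifoSched(double clockPeriodNs) { this.clockPeriodNs = clockPeriodNs; }

    /** request(e): add the event to the pending queue. */
    void request(Event e) { pending.addLast(e); }

    /** resolve(): choose one pending event (FIFO here). */
    void resolve() {
        if (selected == null) {
            selected = pending.pollFirst();
        }
    }

    /** postcond(): annotate the chosen event with global time and release it. */
    Event postcond() {
        Event e = selected;
        if (e == null) {
            return null;
        }
        globalTimeNs += e.cycles * clockPeriodNs;
        e.timestampNs = globalTimeNs;
        selected = null;
        return e;
    }

    /** stable(): nothing left to schedule. */
    boolean stable() { return pending.isEmpty() && selected == null; }
}
```

With a 10 ns clock, requesting events of 2 and 3 cycles and then alternating `resolve()` and `postcond()` stamps them at 20 ns and 50 ns.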

Page 23

Sample Metropolis QM Interfaces

public quantity PLBArb implements QuantityManager {…}
public quantity SeqQM implements QuantityManager {…}

Ports:
port StateMediumSched[] portTaskSM;

Quantity Manager interfaces:
public eval void request(event e, RequestClass rc) {…}
public update void resolve() {…}
public update void postcond() {…}
public eval boolean stable() {…}

Request Class interfaces:
public event getRequestEvent() {…}
public int getserviceType() {…}
public int getTaskId() {…}
public int getComplexity() {…}
public void setTaskId(int id) {…}
public int getFlag() {…}
public void setFlag(int flag) {…}
public int getDeviceId() {…}

Each resolve() function is unique

Page 24

Architecture Extensions for Preemption
• Some services are naturally preempted
– CPU context switch, bus transactions
• Notion of atomic transactions
– Prior to dispatching events to a quantity manager via the request() method, decompose events in the scheduled netlist into non-preemptable chunks.
– Maintain status with an FSM object (counter) and controller.

[Figure: a process (task) issues a transaction (e.g., a Read) to a service (media); a decoder (process) splits it into atomic transactions A, B, and C, which are dispatched as events to the quantity manager while an FSM steps through states S1, S2, and S3 from its initial state; a stack holds Trans0/FSM0 and Trans1/FSM1; StateMedia (SM) provide setMustDo() and setMustNotDo().]

1. A transaction (e.g., a Read) is introduced into the architecture model.
2. The decoder transforms the transaction into atomic transactions.
3. Dispatch each atomic transaction (AT) to the quantity manager, as the individual events that make up the AT.
4. Update the FSM to track the state of the transaction.
5. Communicate with preempted processes through StateMedia (setMustDo(), setMustNotDo()).
6. Use a stack data structure to store transactions and their FSMs.
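Steps 2, 4, and 6 can be illustrated with a small Java model: a decoder splits each transaction into atomic chunks, a counter plays the role of the FSM, and a stack keeps preempted transactions so they resume where they left off. `PreemptionSketch` and its methods are hypothetical; dispatching to the quantity manager is reduced to appending to a log, and StateMedia communication is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical atomic-transaction model for preemption; not Metropolis code.
final class PreemptionSketch {
    static final class Transaction {
        final String name;
        final List<String> atomicChunks;
        int fsmState = 0;  // index of the next chunk to dispatch

        Transaction(String name, List<String> atomicChunks) {
            this.name = name;
            this.atomicChunks = atomicChunks;
        }

        boolean finished() { return fsmState == atomicChunks.size(); }
    }

    private final Deque<Transaction> stack = new ArrayDeque<>();
    private final List<String> dispatched = new ArrayList<>();  // stands in for QM request()

    /** Decoder: split a transaction into caller-supplied atomic chunks. */
    static Transaction decode(String name, String... chunks) {
        return new Transaction(name, Arrays.asList(chunks));
    }

    /** A newly arriving transaction preempts the one on top of the stack. */
    void arrive(Transaction t) {
        if (!t.finished()) {
            stack.push(t);
        }
    }

    /** Dispatch one non-preemptable chunk; returns false when nothing is left. */
    boolean step() {
        Transaction top = stack.peek();
        if (top == null) {
            return false;
        }
        dispatched.add(top.name + ":" + top.atomicChunks.get(top.fsmState));
        top.fsmState++;                 // update the FSM
        if (top.finished()) {
            stack.pop();                // resume the preempted transaction, if any
        }
        return true;
    }

    List<String> log() { return dispatched; }
}
```

If Trans0 (chunks A, B, C) has dispatched A when Trans1 (chunks A, B) arrives, the log reads Trans0:A, Trans1:A, Trans1:B, Trans0:B, Trans0:C: preemption happens only between chunks, never inside one.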

Page 25

Architecture Extensions for Mapping
• Programmable platforms allow for both SW and HW implementations of a function.
• Need to express which architecture components can provide which services, and with what affinity.

Potential mapping strategies: Greedy, Best Average, Task Specific

[Figure: each service exports its capability list (the operations available and its ability, as an affinity out of 100, to perform them) to its associated mapping process. HW DCT (service): FFT 0/100, DCT 100/100, Execute 0/100; it can only perform DCT! uBlaze (general purpose uProc service): affinities of 2/100, 20/100, and 50/100 across FFT, DCT, and Execute; it can perform multiple operations.]

public HashMap getCapabilityList()
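A greedy strategy, one of the options listed above, can be sketched in a few lines of Java. The sketch assumes each service's capability list maps a task name to an integer affinity out of 100, as in the figure; `GreedyMapper` is a hypothetical class, and the uBlaze affinities used below are illustrative because the figure does not pin down which value belongs to which task.

```java
import java.util.Map;

// Hypothetical greedy mapper over exported capability lists (task -> affinity/100).
final class GreedyMapper {
    static String mapTask(String task, Map<String, Map<String, Integer>> capabilityLists) {
        String best = null;
        int bestAffinity = 0;  // affinity 0 means the service cannot run the task
        for (Map.Entry<String, Map<String, Integer>> service : capabilityLists.entrySet()) {
            Integer affinity = service.getValue().get(task);
            if (affinity != null && affinity > bestAffinity) {
                best = service.getKey();
                bestAffinity = affinity;
            }
        }
        return best;  // null if no service offers the task
    }
}
```

With HW DCT = {FFT: 0, DCT: 100, Execute: 0} and a uBlaze given illustrative affinities {FFT: 20, DCT: 2, Execute: 50}, DCT maps to HW DCT while FFT and Execute map to the uBlaze. A best-average or task-specific strategy would change only the selection rule.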

Page 26

Programmable Arch. Modeling
• Compose scheduling and scheduled netlists in a top level netlist.
• Extract structure for the programmable platform tool flow.
Modular modeling style: accurate and efficient.

[Figure: scheduled netlist (Mapping Process, MicroBlaze, OPB) and scheduling netlist (MicroBlaze Sched, OPB Sched); the structure extractor reads each service's type, parameters, etc., its connections, and the topology.]

Top level netlist:
public netlist XlinxCCArch {
    XilinxCCArchSched schedNetlist;
    XilinxCCArchScheduling schedulingNetlist;
    SchedToQuantity[] _stateMedia;
    ...
}

1. Assemble netlists.
2. Provide service parameters.
3. Simulate the model and decide on the final topology.
4. Extractor script tasks:
A. Identify parameters for each service, for example MHz, cache settings, etc.
B. Examine port connections to determine topology.
C. Examine address mapping for bus, I/O, etc.
D. Check port names, instance names, etc. for instantiation.
5. Gather information and parse it into the appropriate tool format: a file for the programmable platform tool flow (MHS).
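Step 5 can be pictured as serializing extracted parameters and bus connections into the tool's text format. The sketch below writes a single MHS-style `BEGIN`/`END` block; it is a simplified assumption about the format (real MHS files also carry ports, hardware versions, and global settings), and `MhsWriter`, the instance name `microblaze_0`, and the bus label `mb_opb` are illustrative choices, not output of the author's extractor.

```java
import java.util.Map;

// Hypothetical extractor step 5: one service instance -> one MHS-style block.
final class MhsWriter {
    static String block(String ipType, String instance,
                        Map<String, String> parameters,
                        Map<String, String> busInterfaces) {
        StringBuilder sb = new StringBuilder();
        sb.append("BEGIN ").append(ipType).append('\n');
        sb.append(" PARAMETER INSTANCE = ").append(instance).append('\n');
        for (Map.Entry<String, String> p : parameters.entrySet()) {
            sb.append(" PARAMETER ").append(p.getKey())
              .append(" = ").append(p.getValue()).append('\n');
        }
        for (Map.Entry<String, String> b : busInterfaces.entrySet()) {
            sb.append(" BUS_INTERFACE ").append(b.getKey())
              .append(" = ").append(b.getValue()).append('\n');
        }
        sb.append("END\n");
        return sb.toString();
    }
}
```

For a MicroBlaze with `C_USE_BARREL = 1` attached through `DOPB` to a bus named `mb_opb`, the call returns a block that opens with `BEGIN microblaze`, lists `PARAMETER INSTANCE = microblaze_0`, `PARAMETER C_USE_BARREL = 1`, and `BUS_INTERFACE DOPB = mb_opb`, and closes with `END`.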

Page 27

Characterization in Design Flow

Process Expanded:
1. Select a device or family.
2. Create systems S1, S2, S3, ..., SN (system creator).
3. Extract data from the systems (data extractor).
4. Categorize and store the data in the characterizer database: physical timing, execution time for processing, and transaction cycles. This is the real performance data supplied to the model.


1. Douglas Densmore, Adam Donlin, A. Sangiovanni-Vincentelli, FPGA Architecture Characterization in System Level Design, Submitted to CODES 2005.

2. Adam Donlin and Douglas Densmore, Method and Apparatus for Precharacterizing Systems for Use in System Level Design of Integrated Circuits, Patent Pending.

Work with Xilinx Research Labs


Page 28

Prog. Platform Characterization

1. Create template system description.

2. Generate many permutations of the architecture using this template and run them through the programmable platform tool flow.

3. Extract the desired performance information from the tool reports for database population.

Need to tie the model to actual implementation data!

Process from Structure Extraction
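Step 2 amounts to taking the Cartesian product of the template's parameter ranges. The Java sketch below assumes the template is a map from parameter name to candidate values; `TemplatePermutations` is hypothetical, and the parameter names in the usage note (PowerPC, BRAM, and uBlaze counts plus tight or loose addressing) are guesses modeled loosely on the P, B, U, and Addr. columns of the characterization table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical step 2: expand template parameter ranges into concrete systems.
final class TemplatePermutations {
    static List<Map<String, String>> expand(Map<String, List<String>> template) {
        List<Map<String, String>> systems = new ArrayList<>();
        systems.add(new LinkedHashMap<String, String>());  // one empty configuration
        for (Map.Entry<String, List<String>> param : template.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : systems) {
                for (String value : param.getValue()) {
                    Map<String, String> system = new LinkedHashMap<>(partial);
                    system.put(param.getKey(), value);
                    next.add(system);
                }
            }
            systems = next;
        }
        return systems;  // each entry would then go through the tool flow
    }
}
```

A template with 1 PowerPC, 2 or 3 BRAMs, 0 or 1 uBlaze, and tight or loose addressing expands to 8 systems, each of which would then be run through the tool flow and mined for timing and area (step 3).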

Page 29

Prog. Platform Characterization

Create the database ONCE, prior to simulation, and populate it with independent (modular) information:
1. Data detailing performance based on physical implementation (from the characterization flow shown).
2. Data detailing the composition of communication transactions (from the Metropolis model design).
3. Data detailing the processing elements' computation (from an ISS for the PPC).

Page 30

Characterized Data Organization

Each characterized system interface function has an entry. These indices can be hashed if appropriate.

Entries can share data or be independent.

Entries can have all, partial, or no information.

[Figure: characterizer database layout, queried by the Metropolis model through the Metro characterizer. Index method: System 1 ... System N. Physical timing: 4.2 ns, 4 ns, 3.8 ns, 3.2 ns. Computation timing: ISS uProc1: FFT 20 cycles, Filter 35 cycles; ISS uProc2: FFT 10 cycles, Filter 30 cycles. Transaction timing: Read = ACK, Trans, Data; Write = ACK, Data, ACK NULL.]

How is the data associated with each service interface function?
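One way to picture this organization is a hash map keyed by system and interface function, whose entries may hold physical timing, transaction composition, computation cycles, or any subset of them. The Java sketch below is an assumption, not the Metro Characterizer: `CharacterizerDb` is hypothetical, the string key stands in for the hashed index, and `transactionNs` assumes each transaction phase takes one clock cycle.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical characterizer database keyed by (system, interface function).
// Any field may be null: entries can hold all, partial, or no information.
final class CharacterizerDb {
    static final class Entry {
        final Double clockPeriodNs;   // physical timing
        final List<String> phases;    // transaction composition, e.g. ACK, Trans, Data
        final Integer computeCycles;  // computation timing, e.g. from an ISS

        Entry(Double clockPeriodNs, List<String> phases, Integer computeCycles) {
            this.clockPeriodNs = clockPeriodNs;
            this.phases = phases;
            this.computeCycles = computeCycles;
        }
    }

    private final Map<String, Entry> table = new HashMap<>();

    private static String key(String system, String function) {
        return system + "#" + function;
    }

    void put(String system, String function, Entry entry) {
        table.put(key(system, function), entry);
    }

    Entry get(String system, String function) {
        return table.get(key(system, function));
    }

    /** Transaction time, assuming (hypothetically) one clock cycle per phase. */
    double transactionNs(String system, String function) {
        Entry e = get(system, function);
        if (e == null || e.clockPeriodNs == null || e.phases == null) {
            throw new IllegalStateException("no timing for " + function + " on " + system);
        }
        return e.phases.size() * e.clockPeriodNs;
    }
}
```

Storing a 4.2 ns clock period and the phases ACK, Trans, Data for a Read on System 1 gives 3 × 4.2 = 12.6 ns; an entry holding only FFT = 20 cycles is still valid, it just cannot answer timing queries.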

Page 31

[Figure: Combo Frequency and Resource Usage: slice count (area, 0 to 4000) and MHz (performance, 0 to 140) over 64 samples of increasing system complexity (added BRAM: 1s; added uBlaze: 2s). Annotations: high spikes in adjacent (similar) samples; decreasing but not monotonic or linear; area measure often plateaus; periodic changes.]

Prog. Platform Characterization

[Figure: PowerPC System Address Changes: slice count (area, 0 to 2500) and MHz (performance, 0 to 140) over 15 samples, with curves for Loose Addr Slices, Tight Addr Slices, Loose Addr MHz, and Tight Addr MHz. Annotations: 10%+ delta; area curves overlap; top two curves; Table 3.3 data.]


Modular Characterization

Accurate & Efficient

P | B | U | Addr. | Area | Max MHz | MHz % | Area %
1 | 2 | 1 | T | 1611 | 119 | 16.17% | 39.7%
1 | 2 | 1 | L | 1613 | 102 | -14.07% | 0.12%
1 | 3 | 0 | T | 1334 | 117 | 14.56% | -17.29%
1 | 3 | 0 | L | 1337 | 95 | -18.57% | 0.22%
1 | 3 | 1 | T | 1787 | 120 | 26.04% | 33.65%

• As resource usage increases, system frequency generally decreases.
• The relationship is neither linear nor monotonic.
• A 15% change is equivalent to a speed grade for these devices.

PLB Write Transfer Performance Comparison
• Designs from rows 1, 3, and 5 of the table.
• Three abstraction levels: 1, 3, and 10 cycle transactions.
• Metropolis JPEG version: 112,500 write transactions for a 3 megapixel, 24-bit color depth, 95% compressed image.
• 19% difference between intuition and characterization.

The database was created once, prior to simulation.

Why can’t you just use a static estimation?
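A back-of-the-envelope Java sketch of why the characterized frequency matters. It assumes execution time is simply total cycles divided by clock frequency (no stalls or arbitration) and uses only numbers quoted on this slide; `WriteTimeEstimate` is hypothetical, and the sketch does not reproduce the 19% result, which came from simulation with characterized data.

```java
// Hypothetical estimate: time = cycles / clock frequency, no stalls assumed.
final class WriteTimeEstimate {
    static final long WRITE_TRANSACTIONS = 112_500L;  // JPEG example on this slide

    /** Total cycles for all writes at a given transaction abstraction level. */
    static long cycles(int cyclesPerTransaction) {
        return WRITE_TRANSACTIONS * cyclesPerTransaction;
    }

    /** Microseconds, since cycles / MHz = microseconds. */
    static double micros(long cycles, double mhz) {
        return cycles / mhz;
    }
}
```

For single-cycle transactions the 112,500 writes take about 945 µs at 119 MHz (row 1, tight addressing) but about 1,103 µs at 102 MHz (row 2, loose addressing), roughly 17% longer from an address-map choice that a single static estimate would not capture.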

Page 32

Modeling & Char. Review

[Figure: Branching Architecture Example. Scheduled netlist: Task1 to Task4, PPC, dedicated HW, PLB, and BRAM (service types SCSI and MCMI), plus a characterizer. Scheduling netlist: PPC Sched, DedHW Sched, PLB Sched, BRAM Sched, and GlobalTime. Legend: media (scheduled), process, quantity manager, quantity, enabled event, disabled event.]

Page 33

Outline Revisited

1. Problem Statement
2. Approach
3. Contribution
• Architecture Refinement Verification
• Vertical Refinement
• Horizontal Refinement
• Surface Refinement
• Depth Refinement
• Design Flow Examples
• Summary and Conclusions

Focus: Abstraction

Page 34

Arch. Refinement Verification

• Architectures often involve hierarchy and multiple abstraction levels.

– This is of limited use if it is not possible to check whether elements in the hierarchy, or less abstract components, are implementations of their counterparts.

• Asks “Can I substitute M1 for M2?” Reasons include:
1. Representing the internal structure of a component.
2. Recasting an architectural description in a new style.
3. Applying tools developed for one style to another style.

Refinement Technique | Description | Metropolis
Style/Pattern Based | Define template components. Prove once that they have a desired relationship. Build the architecture from them. | Potential; TTL, YAPI
Event Based | Properties (behaviors) are expressed as event lists. Explicitly look for these event patterns. | Discussed
Interface Based | Create a structure capturing all behavior of a component's interface. Compare two models. | Discussed
Compositional Component Based | Create structures capturing local behavior. Compose larger systems by synchronizing these smaller pieces. | Discussed

D. Garlan, Style-Based Refinement for Software Architectures, SIGSOFT 96, San Francisco, CA, pg. 72-75.

Page 35

Refinement Verification in Design Flow


Process Expanded:
1. Identify the changes to be made (structural or component):
A. Inter-component structural changes (compositional component based)
B. Structural changes between scheduled and scheduling components (event based)
C. Intra-component changes (interface based)
2. Run verification tools on the abstract and refined models: Yes? No?

[Figure: abstract and refined netlists of processes (P1 to P4) and media (M1 to M3, MN); a structural or component change splits P3 into P31 and P32; scheduled and scheduling components (P1, M1, P3) exchange events; component P2 (states A, B, C) gains more functionality.]

Refinement styles:
• Surface Refinement¹ – interface based; uses control flow graphs. Focuses on introducing new behaviors (reason 1).
• Vertical and Horizontal Refinement¹ – event based; uses event based properties. Focus on abstraction and synthesis (reasons 2 and 3).
• Depth Refinement – compositional component based; uses labeled transition systems. Focuses on reasons 1, 2, and 3.

1. Douglas Densmore, Metropolis Architecture Refinement Styles and Methodology, University of California, Berkeley, UCB/ERL M04/36, 14 September 2004.


36/60

Vertical Refinement

[Figure: a scheduled netlist (mapping processes, PPC405, PLB, BRAM, plus an added Cache and RTOS, arranged sequentially or concurrently) and a scheduling netlist (PPC, PLB, BRAM, Cache, and RTOS schedulers). The refinement introduces new origins and amounts of events that are scheduled and annotated.]

Original | Sequential | Concurrent 1 | Concurrent 2
E1 (CPURead) | E1 (RTOSRead) | E1 (CPURead) | E1 (CPURead)
E2 (BusRead) | E2 (CPURead) | E2 (CacheRead) | E2 (CacheRead)
E3 (MemRead) | E3 (BusRead) | | E3 (BusRead)
 | E4 (MemRead) | | E4 (MemRead)

• Definition: A manipulation of the scheduled netlist structure that introduces or removes events, changing their number or origin as seen by the scheduling netlist.
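A minimal Python sketch of what a vertical refinement does to the event stream, assuming event sequences are plain lists of event-type names as in the table above: it compares the sequences seen by the scheduling netlist before and after the change and reports which event types were introduced or removed. The helper `event_delta` is hypothetical and not part of Metropolis.

```python
from collections import Counter

def event_delta(abstract, refined):
    """Compare two event sequences as seen by the scheduling netlist.

    Returns (introduced, removed): event types, with counts, that occur
    more often in the refined or in the abstract sequence, respectively.
    """
    a, r = Counter(abstract), Counter(refined)
    # Counter subtraction keeps only strictly positive counts.
    return dict(r - a), dict(a - r)

original = ["CPURead", "BusRead", "MemRead"]
sequential = ["RTOSRead", "CPURead", "BusRead", "MemRead"]
concurrent = ["CPURead", "CacheRead", "BusRead", "MemRead"]

print(event_delta(original, sequential))  # ({'RTOSRead': 1}, {})
print(event_delta(original, concurrent))  # ({'CacheRead': 1}, {})
```

A non-empty result in either direction signals that the number or origin of events changed, which is exactly what distinguishes a vertical refinement from a pure reordering.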


37/60

Horizontal Refinement

[Figure: a scheduled netlist (mapping processes, two PPC405 processors, a control thread, PLB, Cache, BRAM, RTOS) and a scheduling netlist (PPC, PLB, BRAM, Cache, and RTOS schedulers, plus an arbiter). The refinement changes the ordering of event requests.]

Original* | Refined (interleaved)
E1 (BusRead) from CPU1 | E1 (BusRead) from CPU1
E2 (BusRead) from CPU1 | E3 (BusRead) from CPU2
E3 (BusRead) from CPU2 | E2 (BusRead) from CPU1
E4 (BusRead) from CPU2 | E4 (BusRead) from CPU2

• Definition: A manipulation of both the scheduled and scheduling netlists that changes the possible ordering of events as seen by the scheduling netlist.

* The original contains all possible orderings if it is abstract enough.
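A minimal sketch of a horizontal refinement check on the interleaving table, assuming each CPU issues its own bus requests in a fixed order and that the abstract model admits every interleaving of those per-CPU streams (per the footnote). Events are (event, source) pairs; `is_valid_interleaving` is a hypothetical helper, not a Metropolis API.

```python
def is_valid_interleaving(refined, abstract):
    """Check that `refined` only reorders events across sources.

    Both arguments are lists of (event, source) pairs. The refined ordering
    is accepted if it contains exactly the same events and, for every
    source, preserves that source's own event order.
    """
    if sorted(refined) != sorted(abstract):
        return False  # events were added or removed: not a pure reordering
    for src in {s for _, s in abstract}:
        mine = [e for e, s in refined if s == src]
        theirs = [e for e, s in abstract if s == src]
        if mine != theirs:
            return False
    return True

original = [("E1", "CPU1"), ("E2", "CPU1"), ("E3", "CPU2"), ("E4", "CPU2")]
interleaved = [("E1", "CPU1"), ("E3", "CPU2"), ("E2", "CPU1"), ("E4", "CPU2")]
swapped = [("E2", "CPU1"), ("E1", "CPU1"), ("E3", "CPU2"), ("E4", "CPU2")]

print(is_valid_interleaving(interleaved, original))  # True
print(is_valid_interleaving(swapped, original))      # False
```

Projecting onto each source reduces the interleaving question to one list comparison per CPU, so no enumeration of all orderings is needed.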


38/60

Event Based Properties

• Properties expressed as event sequences as seen by the scheduling netlist.

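Events: E1, E2, E3 (CPUExe); E4 (CPURead). Example properties: resource utilization and latency.

Event sequence per resource:
Resource | Bad resolve() | Good resolve()
CPU | E1, E2, E3, E4 | E4, E1, E2, E3
Bus | X, X, X, X, E4 | X, E4
Mem | X, X, X, X, X, E4 | X, X, E4

Pending and chosen events per resource (cycle):
Resource (cycle) | Bad resolve(): pending | chosen | Good resolve(): pending | chosen
CPU (0) | E1, E2 | E1 | E1, E2 | E1
CPU (1) | E2, E3 | E2 | E2, E3 | E2
Bus (1) | E1, EX | EX | E1, EX | EX
CPU (2) | E3 | E3 | E3 | E3
Bus (2) | E1, E2 | E2 | E1, E2 | E1
CPU (3) | | | |
Bus (3) | E1, E3 | E3 | E2, E3 | E2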
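A minimal sketch of an event-based latency property evaluated on the Mem row of the per-resource sequences, assuming each X is a slot the resource spends on other, irrelevant events. The bound of 3 slots and the helper names are hypothetical.

```python
def latency(sequence, event):
    """Number of slots before `event` appears in a resource's event sequence.

    'X' marks a slot in which the resource serves some other event.
    """
    return sequence.index(event)

def meets_latency(sequence, event, bound):
    """Event-based property: `event` is served within `bound` slots."""
    return event in sequence and latency(sequence, event) <= bound

# Memory event sequences for E4 (CPURead) under the two resolve() policies.
mem_bad = ["X", "X", "X", "X", "X", "E4"]
mem_good = ["X", "X", "E4"]

print(latency(mem_bad, "E4"), latency(mem_good, "E4"))  # 5 2
print(meets_latency(mem_bad, "E4", 3), meets_latency(mem_good, "E4", 3))  # False True
```

Because the property is stated purely over event sequences, the same check applies to any scheduling netlist, independent of how resolve() is implemented.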


39/60

Macro and MicroProperties

MicroProperty - The combination of one or more attributes (or quantities) and an event relation defined with these attributes.

MacroProperty – A property that implies a set of MicroProperties; it is defined as the property that ensures the others’ adherence. Satisfaction of the MacroProperty (i.e. the property holds) ensures that all MicroProperties it covers are also satisfied.
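A minimal sketch of how MacroProperty coverage can be computed: given the MacroProperties known to hold, collect every property they guarantee, following implication transitively because a covered property may itself be a MacroProperty. The abbreviations are the ones used on this slide, but the coverage edges in `COVERS` are hypothetical.

```python
def covered(holding, covers):
    """Return every property guaranteed by the properties in `holding`.

    `covers` maps a MacroProperty to the properties it implies.
    """
    result, stack = set(), list(holding)
    while stack:
        prop = stack.pop()
        if prop not in result:
            result.add(prop)
            stack.extend(covers.get(prop, ()))
    return result

# Hypothetical coverage edges (not taken from the figure).
COVERS = {
    "DC": ["DCo", "RA", "WA"],  # Data Consistency
    "DCo": ["SC", "DV"],        # Data Coherency
}

print(sorted(covered({"DC"}, COVERS)))  # ['DC', 'DCo', 'DV', 'RA', 'SC', 'WA']
```

Checking one MacroProperty therefore discharges all of the MicroProperties it covers, which is what makes the hierarchy worth building.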

[Figure: property hierarchy. Level 0: Snoop Complete (SC), Data Valid (DV), Sufficient Space (SS). Level 1: Data Coherency (DCo), Read Access (RA), Write Access (WA), No Overflow (NO). Level 2: Data Consistency (DC), Sufficient Bits (SB). Level 3: Data Precision (DP).]



40/60

Event Petri Net

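[Figure: a Model EPN and a Prop EPN. The Model EPN has start places start1–start3 and transitions tE1–tE6 for the events of interest (labels include CPURead, CPUWrite, CPUExecute, BusRead, BusWrite, MemRead, MemWrite). Connection transitions tC1–tC3 and places pC1–pC6 link it to the Prop EPN, whose places are the properties DV, SC, SS, NO, SB, RA, and WA and the MacroProperty places pDCo, pDC, and pDP, connected by transitions t1–t14 (some arcs weighted 2, 3, or 4).]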

Two Petri nets – one for the service model and one for the events of interest.

Model Event Petri Net – one transition set, tEN, represents the events of interest. Transitions are also used to indicate interface functions.

Property Event Petri Net – the initial marking vector is empty, and there is one place per MacroProperty, p<prop>. It is constructed so that, to create a token in each MacroProperty place, every transition must fire once and only once.

Link the two event Petri nets so that selected tEN transitions feed connection transitions, tCN, which produce the tokens needed by the property EPN.
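A minimal place/transition net sketch of the linking step, assuming arc weights are given per place. The three-transition wiring below (model event tE1, connection transition tC1, property transition t1 into pDCo) is hypothetical and only mirrors the naming scheme of the slide, not its actual net.

```python
class PetriNet:
    """Minimal place/transition net; arcs are dicts place -> weight."""

    def __init__(self, transitions, marking=None):
        # transitions: name -> (pre, post)
        self.transitions = transitions
        self.marking = dict(marking or {})

    def enabled(self, t):
        pre, _ = self.transitions[t]
        return all(self.marking.get(p, 0) >= w for p, w in pre.items())

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t} is not enabled")
        pre, post = self.transitions[t]
        for p, w in pre.items():
            self.marking[p] = self.marking.get(p, 0) - w
        for p, w in post.items():
            self.marking[p] = self.marking.get(p, 0) + w

net = PetriNet(
    {
        "tE1": ({"start1": 1}, {"pC1": 1}),         # model EPN: event of interest
        "tC1": ({"pC1": 1}, {"SC": 1, "DV": 1}),    # connection transition
        "t1": ({"SC": 1, "DV": 1}, {"pDCo": 1}),    # property EPN: MacroProperty place
    },
    marking={"start1": 1},  # property places start empty
)
for t in ["tE1", "tC1", "t1"]:
    net.fire(t)
print(net.marking["pDCo"])  # 1
```

After each transition has fired once, the MacroProperty place holds a token and no transition is enabled again, matching the "once and only once" construction.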



41/60

Surface Refinement Def.

• Defined as in Hierarchical Verification¹

– Model: an object that can generate a set $B$ of finite sequences of behaviors.
– Trace: an element $a \in B$.
– Given models $X$ and $Y$, $X$ refines $Y$, denoted $X \preceq Y$, if for every trace $a$ of $X$ the projection $a[\mathit{Obs}_Y]$ is a trace of $Y$.
– Two models are trace equivalent, $X \simeq Y$, if $X \preceq Y$ and $Y \preceq X$.

• The answer to the refinement problem (X,Y) is YES if X refines Y, otherwise NO


1. T. Henzinger, S. Qadeer, S. K. Rajamani, “You Assume, We Guarantee: Methodology and Case Studies,” 10th International Conference on Computer Aided Verification (CAV), Lecture Notes in Computer Science 1427, Springer-Verlag, 1998, pp. 440-451.
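A minimal sketch of the trace-containment definition, assuming models are given explicitly as finite sets of traces (tuples of call names) and $\mathit{Obs}_Y$ as a set of call names. The traces reuse abbreviated scheduler calls; `internalStep` is a hypothetical unobservable call added to show the projection.

```python
def project(trace, observable):
    """Projection a[Obs]: keep only the observable calls, preserving order."""
    return tuple(call for call in trace if call in observable)

def refines(x_traces, y_traces, y_observable):
    """X refines Y iff, for every trace a of X, a[Obs_Y] is a trace of Y."""
    return all(project(a, y_observable) in y_traces for a in x_traces)

def trace_equivalent(x_traces, x_observable, y_traces, y_observable):
    """X and Y are trace equivalent iff each refines the other."""
    return (refines(x_traces, y_traces, y_observable)
            and refines(y_traces, x_traces, x_observable))

OBS = {"terminated", "wRnd"}
abstract = {("terminated",), ("terminated", "wRnd"), ("terminated", "wRnd", "wRnd")}
refined = {("terminated",), ("terminated", "internalStep", "wRnd", "wRnd")}

print(refines(refined, abstract, OBS))      # True
print(refines({("wRnd",)}, abstract, OBS))  # False
```

Explicit trace sets only work for small, finite examples; the tool flows later in the talk answer the same question symbolically.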

[Figure: a component with ports P1, P2, and P3. Its surface is formed by its interfaces (ports), through which it provides and requires services. The internal operation, under an unknown MoC (dataflow, KPN, etc.), is not visible; only the interface calls on ports are observable.]

Surface: a restriction on the location and information available to define component behavior.

TraceM – Trace in Metropolis = a finite set of function calls to media via interfaces


42/60

Control Flow Graph
• Defined much like the control flow automaton of Henzinger et al.*
• Tuple $\langle Q, q_0, X, Op, \rightarrow \rangle$
– $Q$ – control locations
– $q_0$ – initial control location
– $X$ – set of variables
– $Op$ – function calls to media, basic block starts and ends
– $\rightarrow$ – transition relation

//sample code
process example {
    port Read port1;
    port Write port2;

    void thread() {
        int x = 0;
        while (x < 2) {
            port1.callRead();
            x++;
        }
        port2.callWrite();
    }
}


[Figure: graph for the model. Control location 1 (ProcessDeclNode) is the initial control location; x = 0 leads to control location 2 (LoopNode, the while loop), which branches on x < 2 and x >= 2. Locations 3 and 7 (ThisPortAccessNode) correspond to port1.callRead() and port2.callWrite(), with + and - marking the start and end of each call; locations 4 and 8 end basic blocks; locations 5 and 6 are the variable node collection for x++ (start and end); location 10 is the bookend of the LoopNode; location 9 is the sink state. A hypothetical automaton for the variable x has states x = 0, x = 1, and x = 2.]

* “Temporal Safety Proofs for Systems Code,” Henzinger et al.
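A minimal sketch of how the control flow graph for `process example` yields its observable trace of media calls. The encoding is hypothetical and coarsened: each edge is (source location, guard on x, call label or None, update of x, target), and the call start/end (+/-) and basic-block locations are collapsed into single edges, keeping only locations 1 (initial), 2 (loop), and 9 (sink) from the figure.

```python
# Coarsened, hypothetical encoding of the CFG for `process example`.
EDGES = [
    (1, lambda x: True, None, lambda x: 0, 2),                     # x = 0
    (2, lambda x: x < 2, "port1.callRead()", lambda x: x + 1, 2),  # read, then x++
    (2, lambda x: x >= 2, "port2.callWrite()", lambda x: x, 9),    # exit loop, write
]

def run(edges, start=1, sink=9, max_steps=100):
    """Follow the (deterministic) CFG and return the trace of media calls."""
    loc, x, trace = start, None, []
    for _ in range(max_steps):
        if loc == sink:
            return trace
        for src, guard, op, update, dst in edges:
            if src == loc and guard(x):
                if op is not None:
                    trace.append(op)
                x, loc = update(x), dst
                break
        else:
            raise RuntimeError(f"no enabled edge at location {loc}")
    raise RuntimeError("step bound exceeded")

print(run(EDGES))  # ['port1.callRead()', 'port1.callRead()', 'port2.callWrite()']
```

The recorded list is the surface behavior of the process: only the port calls are kept, while the updates of x stay internal.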


43/60

Surface Refinement Domains

[Figure: surface refinement domains. In the abstract model, Producer (prodLit()) and Adder (Add(input1, input2)) components communicate through a Switch Fabric component via move.source.* and move.dest.* operations. The refined model adds a Memory component (get(), put(), move.source.mem, move.dest.mem). Operations (OP) are grouped into a communication refinement domain, computation refinement domains 1 and 2, and storage refinement domain 1; domains are characterized by the tuple <C, P, OP>.]


44/60

Surface Refinement Example

1. Douglas Densmore, Sanjay Rekhi, A. Sangiovanni-Vincentelli, “MicroArchitecture Development via Successive Platform Refinement,” Design, Automation and Test in Europe (DATE), Paris, France, 2004.

Trace | FIFO Scheduler Process Trace (* function calls abbreviated)
T1 | terminated()
T2 | terminated(), wRnd()*
T3 | terminated(), wRnd()*, wRnd()*
T4 | terminated(), wRnd()*, Tnated()*, qData()*, putPolicy(), PR1S()*

$B_{ref} = \{T1, T3, T4\}$ and $B_{ab} = \{T1, T2, T3, T4\}$: since $B_{ref} \subseteq B_{ab}$, this is a refinement!

[Figure: control flow graphs (locations 1–12) of the refined scheduler (FIFO SchedulerRef) and the abstract scheduler (FIFO SchedulerAb), built from terminated(), whatRound(), checked_all, queryData(), putPolicy(), and putRound1_Status with branch guards True/False, Type & !Done, and Else; one graph also has a !Type & !Done branch.]

Trace containment check for single threaded processes


45/60

Surface Refinement Flow

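[Figure: surface refinement tool flow. The CFG backend (automatic) reads a Metropolis model (.mmm) and produces three outputs: (1) a visual representation for debugging; (2) a reactive module of the CFG (X), which is edited and composed in parallel (manual) with a witness module (W) and checked in MOCHA as X||W against the CFA (Y) developed in a previous iteration; (3) a KISS file of the CFA, processed by the SIS state_assign script (automatic) into a BLIF file, manually edited and converted with NEXLIF2EXE into an .exe model that is checked in FORTE against the BLIF file developed in a previous iteration. Both checks answer $X \preceq Y$.]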

Three primary branches:
1. Visual representation for debugging.
2. CFG conversion to a reactive module. Works with the MOCHA tool flow. Requires manual augmentation with a witness module, since Y has private variables.
3. CFG conversion to a KISS file. Works with the SIS and Forte tool flows. Requires manual edits to convert BLIF to EXLIF.


46/60

Depth Refinement - LTS

• Definition: A Labeled Transition System (LTS) is a tuple $\langle Q, Q_0, E, T, l \rangle$ where:
– $Q$ is a set of states,
– $Q_0 \subseteq Q$ is a set of initial states,
– $E$ is a finite set of transition labels or actions,
– $T \subseteq Q \times E \times Q$ is a labeled transition relation, and
– $l$ is an interpretation of each state on the system variables.

• An LTS has no notion of input signals.
– When LTSs are composed, a transition can instead be triggered when another LTS is in a given state.

• Depth Refinement – used to make inter-component structural changes.

[Figure: a read/write buffer service modeled as an LTS with states empty, notempty, and full and transitions labeled write, read, write2, and read2.]

Olga Kouchnarenko and Arnaud Lanoix. Refinement and Verification of Synchronized Component-Based Systems. In FME 2003: Formal Methods, Lecture Notes in Computer Science, volume 2805/2003, pages 341–358. Springer Berlin / Heidelberg, 2003
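A minimal sketch of the LTS tuple $\langle Q, Q_0, E, T, l \rangle$ as a Python dataclass, assuming finite sets and transitions stored as (source, label, target) triples. The two-slot buffer and its `count` variable (standing in for the interpretation $l$) are hypothetical illustrations of the buffer service in the figure.

```python
from dataclasses import dataclass

@dataclass
class LTS:
    """Labeled transition system <Q, Q0, E, T, l>."""
    states: set       # Q
    initial: set      # Q0, a subset of Q
    labels: set       # E
    transitions: set  # T, a subset of Q x E x Q, stored as (src, label, dst)
    valuation: dict   # l: state -> values of the system variables

    def successors(self, state, label):
        """States reachable from `state` by one transition labeled `label`."""
        return {d for (s, e, d) in self.transitions if s == state and e == label}

    def accepts(self, word):
        """True if some run starting in Q0 can perform the labels in `word`."""
        current = set(self.initial)
        for label in word:
            current = {d for q in current for d in self.successors(q, label)}
            if not current:
                return False
        return True

buffer = LTS(
    states={"empty", "notempty", "full"},
    initial={"empty"},
    labels={"write", "read"},
    transitions={
        ("empty", "write", "notempty"),
        ("notempty", "write", "full"),
        ("notempty", "read", "empty"),
        ("full", "read", "notempty"),
    },
    valuation={"empty": {"count": 0}, "notempty": {"count": 1}, "full": {"count": 2}},
)
print(buffer.accepts(["write", "write", "read"]))  # True
print(buffer.accepts(["read"]))                    # False
```

Since an LTS has no inputs, `accepts` simply asks whether the labels can occur in that order; synchronization with other components is handled separately by state-based guards.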



47/60

Refinement Rule 1

• If the refined LTS has a transition from one state to another, the abstract LTS must have the same transition between the corresponding (glued) states.

• Note: The two transitions must have the same label!

Strict transition refinement



48/60

Refinement Rule 2

• If there is a new (τ) transition in the refined LTS, its beginning and ending states must correspond to the same state in the abstract LTS.

Stuttering transition refinement



49/60

Refinement Rule 3

• The new (τ) transitions in the refinement cannot go on forever: there is no infinite sequence of τ transitions.

Lack of τ-divergence



50/60

Refinement Rule 4

• If there is a transition in the abstract LTS and the corresponding refined state has no transition, then
– there must be another refined state that corresponds to the same abstract state, and
– that state must take a transition to another refined state that is glued to some state of the abstract LTS.

External non-determinism preservation
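A minimal sketch that checks the four rules on finite LTSs, assuming the gluing relation is a total function `glue` from refined states to abstract states and that new transitions are labeled `tau`. The LTS is reduced to states and transitions here. Rule 4 is simplified: a refined state with no outgoing transition, glued to an abstract state that can move, is accepted only if another refined state glued to the same abstract state can move (with a total `glue`, its target is always glued to some abstract state). The buffer with intermediate state d1 is hypothetical.

```python
from dataclasses import dataclass

TAU = "tau"  # label reserved for new (stuttering) transitions

@dataclass
class LTS:
    states: set
    transitions: set  # (src, label, dst) triples

def rule1_strict(refined, abstract, glue):
    """Each non-tau refined transition exists, same label, between glued abstract states."""
    return all((glue[s], e, glue[d]) in abstract.transitions
               for s, e, d in refined.transitions if e != TAU)

def rule2_stuttering(refined, glue):
    """Each tau transition starts and ends in states glued to the same abstract state."""
    return all(glue[s] == glue[d] for s, e, d in refined.transitions if e == TAU)

def rule3_no_tau_divergence(refined):
    """No cycle made only of tau transitions (depth-first search for a back edge)."""
    succ = {q: [] for q in refined.states}
    for s, e, d in refined.transitions:
        if e == TAU:
            succ[s].append(d)
    mark = {q: "new" for q in refined.states}

    def on_cycle(q):
        mark[q] = "active"
        for n in succ[q]:
            if mark[n] == "active" or (mark[n] == "new" and on_cycle(n)):
                return True
        mark[q] = "done"
        return False

    return not any(mark[q] == "new" and on_cycle(q) for q in refined.states)

def rule4_no_new_deadlock(refined, abstract, glue):
    """Simplified external non-determinism preservation (see the lead-in)."""
    moves_r = {s for s, _, _ in refined.transitions}
    moves_a = {s for s, _, _ in abstract.transitions}
    for q in refined.states:
        if q not in moves_r and glue[q] in moves_a:
            if not any(q2 != q and glue[q2] == glue[q] and q2 in moves_r
                       for q2 in refined.states):
                return False
    return True

abstract = LTS({"empty", "notempty", "full"},
               {("empty", "write", "notempty"), ("notempty", "write", "full"),
                ("notempty", "read", "empty"), ("full", "read", "notempty")})
refined = LTS({"empty", "d1", "notempty", "full"},
              {("empty", "write", "d1"), ("d1", TAU, "notempty"),
               ("notempty", "write", "full"), ("notempty", "read", "empty"),
               ("full", "read", "notempty")})
glue = {"empty": "empty", "d1": "notempty", "notempty": "notempty", "full": "full"}

print(rule1_strict(refined, abstract, glue), rule2_stuttering(refined, glue),
      rule3_no_tau_divergence(refined), rule4_no_new_deadlock(refined, abstract, glue))
```

All four checks pass for this example; adding a tau edge back from notempty to d1 would create a τ-cycle and make rule 3 fail.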



51/60

Depth Ref. Design Flow
1. Create a .fts file capturing the LTS for each component of the refined and abstract systems.
   1. Define the observable events, OE.
   2. Transition labels correspond to OE.
2. Define gluing invariants in an .inv file.


[Figure: abstract and refined buffer LTSs (states empty, notempty, full, with transitions write and read); the refinement adds states d1 and d2 and transitions write2 and read2, and each refined state is related to an abstract state by the gluing relation.]

Transition System
// Two state values
type SIGNAL = {consume, wait}
local con : SIGNAL

// Can only be in one state
Invariant (con = consume) \/ (con = wait)

// Initial state
Initially (con = wait)

// Transition to consume ("get" event)
Transition get :
    enable (con = wait) ;
    assign con := consume

// Transition to wait ("stallC" event)
Transition stallC :
    enable (con = consume) ;
    assign con := wait

((con = consume) <--> (conR = consume))/\((con = wait) <--> ((conR = wait) \/ (conR = clean)))

// Buffer events (reads and writes)
// "write1" event is enabled when the LTSs are in the following states
(write1) when ((prod = produce) /\ (buf = empty) /\ (con != consume)),
(write3) when ((prod = produce) /\ (buf = notempty) /\ (con != consume)),
(read1) when ((prod != produce) /\ (buf = notempty) /\ (con = consume)),
(read3) when ((prod != produce) /\ (buf = full) /\ (con = consume)),

// Producer events
make when (prod = wait),
stall when (prod = produce),

// Consumer events
get when (con = wait),
stallC when (con = consume)

3. Define the synchronization between LTSs in a .sync file.
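A minimal sketch that evaluates the `when` guards of the .sync fragment on a global state, together with the gluing invariant from the .inv fragment. It assumes each component's current state is stored under the variable names used in those fragments (prod, buf, con, conR). It only decides which events are enabled, does not compute successor states, and includes only the buffer events that the fragment lists.

```python
# Synchronization guards transcribed from the .sync fragment:
# an event is enabled when the global state satisfies its `when` clause.
SYNC = {
    "write1": lambda s: s["prod"] == "produce" and s["buf"] == "empty" and s["con"] != "consume",
    "write3": lambda s: s["prod"] == "produce" and s["buf"] == "notempty" and s["con"] != "consume",
    "read1": lambda s: s["prod"] != "produce" and s["buf"] == "notempty" and s["con"] == "consume",
    "read3": lambda s: s["prod"] != "produce" and s["buf"] == "full" and s["con"] == "consume",
    "make": lambda s: s["prod"] == "wait",
    "stall": lambda s: s["prod"] == "produce",
    "get": lambda s: s["con"] == "wait",
    "stallC": lambda s: s["con"] == "consume",
}

def enabled_events(state):
    """Events whose `when` guard holds in the global state, sorted by name."""
    return sorted(event for event, guard in SYNC.items() if guard(state))

def glued(con, con_r):
    """Gluing invariant from the .inv fragment: abstract `con` vs. refined `conR`."""
    return ((con == "consume") == (con_r == "consume")
            and (con == "wait") == (con_r in ("wait", "clean")))

print(enabled_events({"prod": "produce", "buf": "empty", "con": "wait"}))
# ['get', 'stall', 'write1']
print(glued("wait", "clean"), glued("consume", "clean"))  # True False
```

Encoding guards as predicates over the other components' states is how the composition triggers transitions without input signals.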


52/60

Example Design

1. Select an application and understand its behavior.
2. Create a Metropolis functional model of this behavior.
3. Assemble an architecture from library services, or create your own services.
4. Map the functionality to the architecture.
5. Extract a structural file from the top-level netlist of the architecture.

[Figure: JPEG encoder functional model (block level: Preprocessing, DCT, Quantization, Huffman) mapped through mapping processes onto an architecture assembled from the IP library (MicroBlaze, On-Chip Peripheral Bus (OPB), SynthMaster, SynthSlave, BRAM); the structure extractor turns the top-level netlist into a file for the Xilinx EDK tool flow.]


53/60

Example Design Cont.

[Figure: the file for the Xilinx EDK tool flow feeds a permutation generator; each permutation (Permutation 1, Permutation 2, …, Permutation N) is run through the platform characterization tool (Xilinx EDK/ISE tools), and the results (ISS info, characterization data, transaction info) are stored in a characterizer database.]

Characterizer database entries, gathered automatically or manually:
• Software routines, e.g. int DCT(data) {begin calculate …}
• Hardware routines, e.g. DCT1 = 10 cycles, DCT2 = 5 cycles, FFT = 5 cycles
• Transactions, e.g. 32-bit read = Ack, Addr, Data, Trans, Ack

1. Feed the captured structural file to the permutation generator.
2. Feed the permutations to the Xilinx tools and extract the data.
3. Capture execution info for software and hardware services.
4. Provide transaction info for communication services.
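A minimal sketch of how characterization data could annotate a simulated trace of service invocations, assuming the database is a plain mapping from service name to cycles per invocation and using the example hardware-routine figures from this slide. `CHAR_DB` and `annotate` are hypothetical names, not part of the Metropolis or Xilinx tools.

```python
# Hypothetical characterizer database: cycles per invocation.
CHAR_DB = {
    "DCT1": 10,
    "DCT2": 5,
    "FFT": 5,
}

def annotate(trace, db):
    """Attach characterized cycle counts to each invocation; return (pairs, total)."""
    missing = [s for s in trace if s not in db]
    if missing:
        raise KeyError(f"uncharacterized services: {missing}")
    annotated = [(s, db[s]) for s in trace]
    return annotated, sum(c for _, c in annotated)

annotated, total = annotate(["DCT1", "FFT", "DCT2"], CHAR_DB)
print(total)  # 20
```

Failing loudly on an uncharacterized service keeps a simulation from silently reporting optimistic cycle counts.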


54/60

Example Design Cont.

[Figure: the JPEG encoder functional model (Preprocessing, DCT, Quantization, Huffman) mapped through mapping processes onto the MicroBlaze, OPB, SynthMaster, SynthSlave, and BRAM architecture, annotated with ISS info, characterization data, and transaction info.]

1. Simulate the design and observe the performance.
   Example: execution time 100 ms, 4000 bus cycles, average memory occupancy 500 KB.
2. Refine the design to meet performance requirements (e.g. a concurrent vertical refinement involving BRAM, or a surface refinement with a new algorithm).
3. Use refinement verification to check the validity of the design changes; the verification tool answers Yes or No.
   • Vertical or Horizontal
   • Depth or Surface
   • Refinement properties
4. Re-simulate to see whether your goals are met.
   Example: execution time 200 ms, 1000 bus cycles, average memory occupancy 100 KB.

Backend tool process:
1. The Abstract Syntax Tree (AST) retrieves the structure.
2. Control data flow graph for surface refinement: FORTE (Intel tool) and Reactive Modules (UC Berkeley).
3. Event traces for refinement properties: vertical and horizontal refinement.


55/60

MJPEG Encoding

[Figure: four MJPEG encoder architectures mapping the functional model onto MicroBlaze processors connected by FSLs, with mapping processes linking the functional model to the architecture model.]
• Arch 1 – completely sequential.
• Arch 2 – Y, Cr, and Cb components parallelized.
• Arch 3 – DCT and Quant separated.
• Arch 4 – Huffman operations parallelized.

Functional key: PreProcessing (P), DCT (D), Quantization (Q), Huffman Encoding (H), Table Modifications (TM), Collector (Col).
Mapping guide: uBlaze = Microblaze Soft Processor; FSL = Fast Simplex Link.

System | Est. Cycles | Char. Cycles | Real Cycles | Rankings
Arch 1 | 145282 (52%) | 228356 (25%) | 304585 | 4, 4, 4
Arch 2 | 103812 (33%) | 145659 (6%) | 154217 | 3, 3, 2
Arch 3 | 103935 (29%) | 145414 (1.2%) | 147036 | 2, 2, 3
Arch 4 | 103320 (28%) | 144432 (<+1%) | 143335 | 1, 1, 1
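A minimal sketch of how the percentages in the table can be computed, assuming they are relative errors with respect to the real cycle count; under that assumption the sketch closely reproduces the table's figures after rounding. The names `CYCLES` and `rel_error` are illustrative.

```python
# Cycle counts from the table: (estimated, characterized, real).
CYCLES = {
    "Arch 1": (145282, 228356, 304585),
    "Arch 2": (103812, 145659, 154217),
    "Arch 3": (103935, 145414, 147036),
    "Arch 4": (103320, 144432, 143335),
}

def rel_error(predicted, real):
    """Relative error of a prediction, in percent of the real cycle count."""
    return 100.0 * abs(predicted - real) / real

for arch, (est, char, real) in CYCLES.items():
    print(f"{arch}: est {rel_error(est, real):.1f}%, char {rel_error(char, real):.1f}%")
```

The characterized model is consistently far closer to the real implementation than the uncharacterized estimate, which is the point of the characterization step.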


56/60

Other case studies

• H.264 Deblocking Filter
  – 14 different mappings explored.
  – Execution time analyzed for computation, waiting, and communication operations.
  – The average difference between the Metropolis simulation and the actual implementation was 3.48%.

• SPI-5 Packet Processing
  – 6 architecture models developed.
  – Optimal FIFO length determined.


57/60

Summary and Conclusions
1. Heterogeneity → Modularity
– Functional block level Metropolis models of programmable services.
   • Direct structural correspondence aids accuracy; automatic structure extraction creates efficiency.
– Independent characterization process of actual hardware implementations.
   • Shown to be accurate; independence creates efficiency.

2. Complexity → Abstraction
– Depth/Surface Refinement allows internal changes to the model.
   • Trace-based formalism gives accuracy; automatic checking gives efficiency.
– Vertical/Horizontal Refinement allows structural changes to the model.
   • Event-based formalism gives accuracy; refinement property encapsulation gives efficiency.


58/60

Thanks

• Questions?
• Thanks

– Metropolis Team: Yoshi Watanabe, Felice Balarin, Roberto Passerone, Abhijit Davare, Haibo Zeng, Qi Zhu, Guang Yang, Trevor Meyerowitz, Alessandro Pinto

– Committee: Jan Rabaey, Alberto Sangiovanni-Vincentelli, John Wawrzynek, Lee Schruben

– Industrial: Adam Donlin (Xilinx), Sanjay Rekhi (Cypress)