Leveraging Model-Based Techniques for Component Level Architecture Analysis in
Product-Based Systems
by David Keith McKean
B.S. Electrical Engineering, May 1978, California State University Long Beach
M.S. Systems Engineering, The George Washington University
A Dissertation submitted to
The Faculty of
The School of Engineering and Applied Science
of The George Washington University
in partial fulfillment of the requirements
for the degree of Doctor of Philosophy
May 19, 2019
Dissertation directed by
Shahram Sarkani
Professor of Engineering Management and Systems Engineering
Thomas Mazzuchi
Professor of Engineering Management and Systems Engineering
The School of Engineering and Applied Science of The George Washington University
certifies that David Keith McKean has passed the Final Examination for the degree of
Doctor of Philosophy as of March 15, 2019. This is the final and approved form of the
dissertation.
Leveraging Model-Based Techniques for Component Level Architecture Analysis in
Product-Based Systems
David Keith McKean
Dissertation Research Committee:
Shahram Sarkani, Professor of Engineering Management and Systems
Engineering, Dissertation Co-Director
Thomas Mazzuchi, Professor of Engineering Management and Systems
Engineering, Dissertation Co-Director
Amirhossein Etemadi, Assistant Professor of Engineering Management and
Systems Engineering, Committee Member
Ebrahim Malalla, Assistant Professor of Engineering Management and Systems
Engineering, Committee Member
Timothy Blackburn, Professional Lecturer of Engineering Management and
Systems Engineering, Committee Member
© Copyright 2019 by David Keith McKean
All rights reserved
Dedication
First, I dedicate this work to my all-powerful Father and His beloved Son, for the grace
and mercy shown towards me, through continued mental and physical health without
which this work could not have been completed. To my best friend and soulmate, Alana,
your love and support throughout this entire effort has been invaluable. There are
multiple times I would have stopped this endeavor if not for your encouragement. To
Gil, your encouragement to “go big brain” every Saturday morning made it possible to
continue on. To my sons Jed and Wil, your willingness to listen helped clarify many
ideas and issues. And to my long-deceased father, you challenged me to succeed, you
taught me discipline, you taught me to see the best in, expect the best of, and have
compassion for every person. I hope, through this work, to have proven a worthy student
and son.
Finally, to my advisors, thank you for your knowledge, wisdom, and guidance throughout
the doctoral research process. To all doctoral class instructors, thank you for your time,
efforts, and passion to impart the knowledge needed to complete this endeavor.
Abstract of Dissertation
Leveraging Model-Based Techniques for Component Level Architecture Analysis in
Product-Based Systems
System design at the component level seeks to construct a design trade space of alternate
solutions comprising mapping(s) of system function(s) to physical hardware or software
product components. The design space is analyzed to determine a near-optimal next-level
allocated architecture solution that satisfies system function and quality requirements. Software
product components are targeted to increasingly complex computer systems that provide
heterogeneous combinations of processing resources. These processing technologies
facilitate performance (speed) optimization via algorithm parallelization. However, speed
optimization can conflict with electrical energy and thermal constraints. A multi-
disciplinary architecture analysis method is presented that considers all attribute
constraints required to synthesize a robust, optimum, extensible next-level solution. This
paper presents an extensible, executable model-based architecture attribute framework
that efficiently constructs a component-level design trade space. A proof-of-concept
performance attribute model is introduced that targets single-CPU systems. The model
produces static performance estimates that support optimization analysis and dynamic
performance estimation values that support simulation analysis. This model-based
approach replaces current spreadsheet-based analysis-of-alternatives approaches.
The ability to easily model computer resource alternatives and produce attribute
estimates improves design space exploration productivity. Performance estimation
improvements save time and money through reduced prototype requirements. Credible
architecture attribute estimates facilitate more informed design tradeoff discussions with
specialty engineers. This paper presents initial validation of a model-based architecture
attribute analysis method and model framework using a single computation thread
application on two laptop computers with different CPU configurations. Execution time
estimates are calibrated for several data input sizes using the first laptop. Actual
execution times on the second laptop are shown to be within 10 percent of execution time
estimates for all data input sizes.
Table of Contents
Dedication ......................................................................................................................... iv
Abstract .............................................................................................................................. v
Table of Contents ............................................................................................................ vii
List of Figures .................................................................................................................. xii
List of Tables ................................................................................................................. xvii
List of Acronyms .......................................................................................................... xviii
Chapter 1 - Research Problem ........................................................................................1
1.1 Problem Context ................................................................................................. 1
1.2 Statement of the Problem .................................................................................... 3
1.3 Research Goals and Objectives ........................................................................... 6
1.3.1 Research Scope and Constraints ..................................................................... 7
1.4 Rationale and Justification .................................................................................. 9
1.5 Relevance/Importance/Motivation .................................................................... 10
1.6 Research Contributions to the Systems Engineering Body of Knowledge ....... 13
1.7 Research Approach ........................................................................................... 15
1.8 Organization ...................................................................................................... 16
Chapter 2 - Literature Review.......................................................................................18
2.1 System Architecture Structure .......................................................................... 18
2.2 Component Architecture Views ........................................................................ 20
2.3 Component Physical Architecture .................................................................... 22
2.3.1 Performance Models ..................................................................................... 24
2.3.1.1 Single Computer Resource MOCs ......................................................... 25
2.3.1.2 Multiple Computer Resources................................................................ 28
2.3.2 Energy/Power/Thermal Models .................................................................... 29
2.4 Modeling Languages for Systems ..................................................................... 30
2.4.1 Functional Flow Block Diagrams (FFBDs) .................................................. 30
2.4.2 Integration Definition (IDEF) Language ...................................................... 31
2.4.3 Object-Process Methodology (OPM) ........................................................... 32
2.4.4 System Modeling Language (SysML) .......................................................... 33
2.5 MBSE Methodologies and Architecture Definition ......................................... 36
2.5.1 HW/SW Co-Design ...................................................................................... 37
2.5.2 Rational Harmony for Systems Engineering (SE) ........................................ 39
2.5.2.1 System Functional Analysis ................................................................... 40
2.5.2.2 Harmony-SE Architecture Analysis ....................................................... 41
2.5.2.3 Harmony-SE Architecture Design ......................................................... 42
2.5.3 Object-Oriented Systems Engineering Method (OOSEM) ........................... 43
2.5.4 Vitech Model-Based Systems Engineering .................................................. 49
2.5.5 IBM RUP-SE ................................................................................................ 52
2.5.6 Selected MBSE Methodology....................................................................... 55
2.6 HW/SW Partitioning Optimization ................................................................... 56
Chapter 3 - Research Methodology ..............................................................................60
3.1 Research Method .............................................................................................. 60
3.2 Component Architecture Attribute Introduction ............................................... 60
3.3 Component Architecture Attribute Analysis Method ....................................... 61
3.3.1 Define Key Component System Functions ................................................... 63
3.3.2 Assign Attribute/Thread Weights ................................................................. 64
3.3.3 Define Candidate Physical Architecture Solutions ....................................... 66
3.3.4 Model Function Attributes ............................................................................ 67
3.3.5 Model Physical Architecture Attributes........................................................ 67
3.3.6 Compute Attribute Cost ................................................................................ 67
3.3.7 Perform Optimization Analysis .................................................................... 70
3.3.8 Perform Simulation Analysis ........................................................................ 71
3.3.9 Compute Total Attribute Cost ....................................................................... 71
3.3.10 Select Solution Architecture.................................................................... 71
3.4 Architecture Attribute System Model Overview .............................................. 71
3.5 Architecture Attribute System Model Details ................................................... 81
3.6 Architecture Performance Attribute System Model ......................................... 84
3.6.1 Performance Attribute Logical Architecture Extensions .............................. 84
3.6.2 Performance Attribute Physical Architecture Extensions............................. 84
3.6.2.1 Performance Attribute Single Core (SC) CPU Computation Model ..... 88
3.6.2.1.1 SC Analysis Computation Model ................................................... 89
3.6.2.1.1.1 SC Complex Math Computation Model .................................. 92
3.6.2.1.1.2 SC Floating Point Math Computation Model .......................... 95
3.6.2.1.1.3 SC Integer Math Computation Model ...................................... 96
3.6.2.1.1.4 SC Trig Computation Model.................................................... 97
3.6.2.1.1.5 SC Arc Trig Computation Model............................................. 99
3.6.2.1.1.6 SC Miscellaneous Computation Model ................................. 100
3.6.3 Architecture Attribute System Model – Quantitative Model Interface ...... 101
3.7 Performance Attribute Statistical Performance Models ................................. 102
3.7.1 Statistical Process Model Development Computer Configuration ............. 103
3.7.2 Estimation Models ...................................................................................... 105
3.7.2.1 Execution Time Data Collection Workflow Step ................................ 105
3.7.2.2 State Definition (Estimation) Workflow Step ...................................... 107
3.7.2.3 Distribution Analysis Workflow Step .................................................. 107
3.7.2.4 Simulink Modeling (Estimation) Workflow Step ................................ 109
3.7.3 Simulation Analysis Models ....................................................................... 110
3.7.3.1 State Definition (Simulation) Workflow Step...................................... 110
3.7.3.2 Transition Analysis .............................................................................. 110
3.7.3.3 Simulink Modeling (Simulation) Workflow Step ................................ 115
Chapter 4 - Data Analysis and Results .......................................................................116
4.1 Case Study Definition ..................................................................................... 116
4.2 Case Study Architecture Attribute Workflow................................................. 117
4.3 Case Study Results Analysis ........................................................................... 119
4.3.1 Case Study Data Analysis Details............................................................... 123
4.3.2 Thread Cost ................................................................................................. 126
4.3.3 Simulation Analysis .................................................................................... 127
Chapter 5 - Conclusions ..............................................................................................128
5.1 Contributions to the field ................................................................................ 128
5.1.1 Limitations .................................................................................................. 129
5.2 Recommendations for Future Work................................................................ 130
Chapter 6 - Bibliography ............................................................................................132
Chapter 7 - COPYRIGHTS ........................................................................................144
Appendix A Oversized Figures ....................................................................................145
List of Figures
Figure 1-1 Enhanced Architecture Artifacts ....................................................................... 1
Figure 1-2 INCOSE MBSE Roadmap (Shah 2010) ......................................................... 13
Figure 1-3 MBSE Framework for Architecture Attribute Overview ................................ 16
Figure 2-1 System-of-Interest Structure (IEEE 2008) ...................................................... 19
Figure 2-2 IEEE-1220-1998 System Breakdown Structure (ISO 2016) .......................... 19
Figure 2-3 System Structure Example using EIA-632 Building Blocks (SAE 2014) ...... 20
Figure 2-4 Traditional HW/SW Partitioning Target Architecture (Wolf 2003) ............... 22
Figure 2-5 Modern HW/SW Partitioning Physical Architecture ...................................... 23
Figure 2-6 Embedded Microprocessor MOC Levels of Abstraction (Meyerowitz 2008) 24
Figure 2-7 Cyclostatic Dataflow (Bilsen 1996) ................................................................ 26
Figure 2-8 IDEF0 Activity Box (IEEE 2012) ................................................................... 32
Figure 2-9 SysML 1.0 Relationship To UML 2.0 (Balmelli 2007) .................................. 33
Figure 2-10 Foundational Pillars of SysML (OMG 2018) ............................................... 34
Figure 2-11 Modified SysML Diagram Taxonomy (Roedler 2012) ................................ 35
Figure 2-12 Harmony-SE Functional Analysis (Hoffmann 2013) ................................... 41
Figure 2-13 Harmony-SE Architecture Analysis (Hoffmann 2013) ................................. 42
Figure 2-14 Harmony-SE Architecture Design Process (Hoffmann 2013) ...................... 43
Figure 2-15 OOSEM Method Pyramid (Estefan 2008) .................................................... 44
Figure 2-16 OOSEM Specify and Design System Process (Friedenthal 2015)................ 45
Figure 2-17 OOSEM Define Logical Architecture Process (Friedenthal 2015) ............... 46
Figure 2-18 OOSEM Define Physical Architecture Process (Friedenthal 2015) ............. 47
Figure 2-19 OOSEM Optimize and Evaluate Alternatives Process (Friedenthal 2015) .. 47
Figure 2-20 Analysis Context Block Definition Diagram Example (Friedenthal 2015) .. 48
Figure 2-21 OOSEM Cost Effectiveness Analysis Parametric Model ............................. 49
Figure 2-22 Vitech STRATA™ MBSE Model (Long 2011) ........................................... 50
Figure 2-23 Vitech MBSE Architecture Diagram (Long 2011) ....................................... 50
Figure 2-24 CORE® System Design Repository (SDR) (Booth 2008) ........................... 51
Figure 2-25 Sample DAG ................................................................................................. 57
Figure 3-1 Component Architecture Attribute Analysis Workflow ................................. 62
Figure 3-2 System Function Definition Artifacts ............................................................. 64
Figure 3-3 Sample Architecture Attribute and Thread Weights ....................................... 65
Figure 3-4 DAG Current Supported Model CRs .............................................................. 70
Figure 3-5 Architecture Attribute Model Overview ......................................................... 73
Figure 3-6 Architecture Attribute Model Detail ............................................................... 81
Figure 3-7 Performance Attribute Model ......................................................................... 85
Figure 3-8 computeExecutionTime Operation Code Segment ......................................... 91
Figure 3-9 computeComplexExecutionTime Operation Code Segment .......................... 93
Figure 3-10 selectComplexAddBufferTime Operation Code Segment ............................ 94
Figure 3-11 computeComplexAddTime Operation Code Segment .................................. 95
Figure 3-12 computeTrigExecutionTime Operation Code Segment ................................ 98
Figure 3-13 computeCosExecutionTime Operation Code Segment ................................. 99
Figure 3-14 System Model - Quantitative Model Interface ............................................ 101
Figure 3-15 SPM Development Flow ............................................................................. 102
Figure 3-16 Intel Sandy Bridge Microarchitecture (Lempel 2011) ................................ 104
Figure 3-17 Complex Add Single Buffer Multimodal Distribution ............................... 106
Figure 3-18 Single Core Complex Add State Parameters .............................................. 109
Figure 3-19 Sample Hidden Markov Model ................................................................... 111
Figure 3-20 Complex Add Single Buffer HMM State Transition Probability Matrix ... 112
Figure 3-21 Complex Add Single Buffer State 1 Histogram Display ............................ 112
Figure 3-22 Complex Add Single Buffer State 1 Histogram Data ................................. 113
Figure 3-23 Complex Add Single Buffer State 1 Visible Output Data .......................... 114
Figure 4-1 Case Study Functional and Physical Definition ............................................ 117
Figure 4-2 Case Study Data Results ............................................................................... 122
Figure 4-3 Computer One 32768 Data Sample 02 Outlier Report (from Minitab) ........ 123
Figure 4-4 Computer One 32768 Data Sample 02 (Minus Outliers) Outlier Report...... 124
Figure 4-5 Sample Minitab Histogram and Probability Plots ......................................... 125
Figure 4-6 Sample Wilcoxon Signed Rank Test Results ................................................ 126
Figure A-1 Architecture Analysis Container IBD .......................................................... 145
Figure A-2 StartArchitectureAnalysisBlock Activity and AD ....................................... 146
Figure A-3 LogArch Container IBD ............................................................................... 147
Figure A-4 LogArch Container Perform Computations Execution Control AD ............ 148
Figure A-5 LogArch One Functional Thread One Architecture IBD ............................. 149
Figure A-6 LogArch Functional Threads Perform Computations Exec Ctl AD ............ 150
Figure A-7 LogArch One Functional Thread One Architecture IBD ............................. 151
Figure A-8 Functional Thread Functions Perform Computations Exec Ctl AD ............ 152
Figure A-9 Architecture Attributes Per Function IBD ................................................... 153
Figure A-10 Architecture Attributes Per Function Execution Control AD .................... 154
Figure A-11 Performance Attribute Per Function Execution Control AD ..................... 155
Figure A-12 PhyArch Container IBD (Three Candidates) ............................................. 156
Figure A-13 PhyArch Container Execution Control AD .............................................. 157
Figure A-14 PhyArch One Container IBD (Three Threads Shown) ............................. 158
Figure A-15 PhyArch One Container Execution Control AD ........................................ 159
Figure A-16 PhyArch One Container Thread One IBD (Three Functions Shown) ....... 160
Figure A-17 PhyArch One Thread One Container Execution Control AD .................... 161
Figure A-18 PhyArch One Container Thread One Function One IBD .......................... 162
Figure A-19 PhyArch One Thread One Function One Container Exec Ctl AD ............. 163
Figure A-20 PhyArch One Thread One Function One Container Interface AD ............ 164
Figure A-21 Function Physical Attribute Computation Container IBD ......................... 165
Figure A-22 Function Physical Attribute Comp Container Execution Control AD ....... 166
Figure A-23 Function Physical Performance Attribute IBD ......................................... 167
Figure A-24 Function Physical Performance Attribute Execution Control AD ............ 168
Figure A-25 Retrieve Performance Computations AD .................................................. 169
Figure A-26 Performance Single Core CM Container IBD ............................................ 170
Figure A-27 Single Core CM Execution Control AD ....................................................... 171
Figure A-28 Single Core CM Execution Time AD ........................................................... 172
Figure A-29 Single Core CM Analysis Container IBD .................................................... 173
Figure A-30 Single Core CM Analysis Container Propagate Start Activity .................. 174
Figure A-31 Single Core CM Analysis Container Execution Time AD ........................... 175
Figure A-32 Single Core CM Analysis Complex Container IBD ..................................... 176
Figure A-33 Single Core CM Analysis Complex Container Propagate Start Activity ... 177
Figure A-34 Single Core CM Analysis Complex Container Execution Time AD ............ 178
Figure A-35 Single Core CM Analysis Complex Add Container IBD .............................. 179
Figure A-36 Single Core CM Analysis Complex Add Container Execution Time AD .... 180
Figure A-37 MATLAB SIMULINK Complex Add Single Buffer Model ........................... 181
Figure A-38 Single Core CM Analysis Trig Container IBD ............................................ 182
Figure A-39 Single Core CM Analysis Trig Container Propagate Start Activity .......... 183
Figure A-40 Single Core CM Analysis Trig Container Execution Time AD ................... 184
Figure A-41 Single Core CM Analysis Trig Container IBD ............................................ 185
Figure A-42 Single Core CM Analysis Trig Cosine Container Execution Time AD ....... 186
Figure A-43 MATLAB SIMULINK Cosine Model ............................................................ 187
Figure A-44 Complex Add Single Buffer Hot State Pdf Parameters ............................. 188
Figure A-45 Complex Add Single Buffer Warm State Pdf Parameters ......................... 189
List of Tables
Table 2-1 RUP-SE Architecture Framework Model Levels (Cantor 2003) ..................... 53
Table 2-2 RUP-SE Architecture Viewpoints (Cantor 2003) ............................................ 54
Table 2-3 RUP-SE Sample Model Views (Cantor 2003) ................................................. 55
Table 2-4 HW/SW Partitioning Optimization Algorithm Summary ................................ 59
List of Acronyms
AD – Activity Diagram
CM – Computational Model
CR – Computer Resource
CPU – Central Processing Unit
FPGA – Field Programmable Gate Array
GPU – Graphics Processor Unit
GPGPU – General Purpose GPU
HW – Hardware
IBD – Internal Block Diagram
INCOSE – International Council on Systems Engineering
LogArch – Logical Architecture
MBSE – Model Based Systems Engineering
mCPU – multicore/manycore CPU
MPSoC – Multi-Processor SoC
PhyArch – Physical Architecture
SEBoK – Systems Engineering Body of Knowledge
sCPU – Single-Core CPU
SoC – System-on-a-Chip
SPM – Statistical Process Model
SW – Software
SysML – System Modeling Language
WCET – Worst Case Execution Time
Chapter 1 - Research Problem
1.1 Problem Context
System design transforms an input set of requirements to a next-level system
component architecture with associated requirements. Systems engineering standards,
such as ISO/IEC 15288 (IEEE 2008), ISO/IEC 24748-4 (formerly IEEE-1220) (ISO
2016), and EIA-632 (SAE 2014), provide conflicting system definitions of a component
(or component-level). Buede and Miller (Buede 2016) define the notion of a system
component as a subset of a physical architecture to which a subset of a functional
architecture is allocated. Each component consists of hardware, software, people,
facilities, or some combination thereof. Components have a hierarchical structure like
requirements and functions. This paper focuses on a system decomposition layer where
the next-level physical architecture is composed of Computer Resource (CR) and
software product components as shown in Figure 1-1.
Figure 1-1 Enhanced Architecture Artifacts
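The component notion above, a subset of the physical architecture to which a subset of the functional architecture is allocated, can be captured in a minimal data structure. A hypothetical sketch in Python (the class, function, and CR names are illustrative, not artifacts of this research):

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A physical component: a CR or a software product (illustrative)."""
    name: str
    kind: str                      # e.g. "sCPU", "mCPU", "GPU", "FPGA", "SW"
    allocated_functions: list = field(default_factory=list)

def allocate(component: Component, function: str) -> None:
    """Allocate one system function to one physical component."""
    component.allocated_functions.append(function)

# One candidate next-level allocated architecture (invented example)
cr = Component("node-1", "mCPU")
sw = Component("signal-proc-app", "SW")
allocate(cr, "F1: acquire sensor data")
allocate(sw, "F2: compute FFT")
```

A design trade space is then a set of such function-to-component mappings, one per candidate architecture alternative.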
Modern computer system physical CR architectures are comprised of single
heterogeneous CR nodes or multiple distributed CR nodes organized as a cluster, grid, or
cloud (Sadashiv 2011) configuration. Single node CRs can be composed of single core
Central Processing Units (sCPUs), multicore CPUs (mCPU), graphics processing units
(GPUs), or specialized hardware such as field programmable gate arrays (FPGAs). These
computing technologies enable performance (i.e. speed) optimization through algorithm
parallelization. However, performance optimization comes at the expense of conflicting
energy, thermal, and reliability concerns. Each CR possesses unique physical
performance (computation speed, energy efficiency, thermal characteristics, and
reliability) attributes. Typically, these CR technologies exhibit increased energy
consumption and heat generation for decreased computation time. Correspondingly,
energy consumption and heat generation are decreased with increased computation time.
Thus, the computation attribute inherently conflicts with the energy and heat attributes.
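The conflict can be seen with a back-of-envelope calculation: energy consumed is average power draw times execution time, so a faster CR configuration typically pays for its speed in both power and total energy. A sketch with invented attribute values (none of these numbers describe a real CR):

```python
# Illustrative (invented) attribute estimates for three CR options:
#          exec_time_s  avg_power_w
cr_options = {
    "sCPU": (40.0,  15.0),
    "mCPU": (12.0,  55.0),
    "GPU":  ( 3.0, 250.0),
}

for name, (t, p) in cr_options.items():
    energy_j = p * t              # energy consumed = power x time
    print(f"{name}: {t:5.1f} s, {p:6.1f} W, {energy_j:7.1f} J")
```

With these values the GPU is fastest but consumes the most energy (and generates the most heat), while the sCPU is slowest but cheapest in energy; the computation attribute and the energy/heat attributes pull the allocation decision in opposite directions.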
The systems engineer is challenged to perform an architecture analysis that
produces a next-level design trade space that considers attribute (e.g. execution time,
energy consumption, heat generation, etc.) estimates for various CR configurations. The
systems engineer must associate architecture analysis artifacts with the system model.
Past architecture analysis approaches have used utility curves (Hoffmann 2013),
simulations (Buede 2016), or prototypes to develop attribute estimates using specialty
tools. Artifacts produced by these specialty tools do not integrate with the system model.
Model Based System Engineering (MBSE) produces a system model as the
primary artifact (Ramos 2012). Friedenthal et al. (Friedenthal 2015) state that the
system model captures requirements, structure, behavior, parametrics (i.e. constraints),
and their interconnection relationships using a modeling language such as the Object
Management Group System Modeling Language™ (OMG SysML®) (OMG 2017).
System architecture captures system model structure, behavior, and interconnection
relationships at each system decomposition level. Figure 1-1, an adaptation of three
architecture views (functional, physical, allocated) plus requirements (Levis 1993),
elaborates the relationship between requirements and architecture. Alternatively, the
Systems Engineering Body of Knowledge (SEBoK) (BKCASE 2017) represents system
architecture as a system element structure (i.e. component) represented as a physical
architecture and behavior represented as a logical architecture. The logical architecture is
comprised of functional, behavioral, and temporal architecture views. The physical
architecture is comprised of a physical view of preferred system elements (i.e.
components) and their interfaces.
1.2 Statement of the Problem
The inability to accurately estimate architecture attributes for high
computation workload systems leads to suboptimal solutions that can introduce
significant technical risk, leading to cost overruns, schedule delays, or
program failure. The following are real world examples of problems that directly
trace to deficient architecture attribute analysis at the CR/software product level:
1. Software received from a subcontractor was installed on a quad
core general purpose computer system. Software was executed for
a data set of size 2 GB first for 8 hours and then for 24 hours with
no results. The software was declared unusable. Further
investigation revealed that the subcontractor used the software to
process data sets of size 100 MB producing results in 2 hours.
Further analysis determined that the results would be produced for
the 2 GB data set in 72 hours. Further analysis revealed that
software implementation on a 100 core GPU would result in 7
hours processing time for the 2 GB data set and 2 hours with a 400
core GPU.
2. Application software received from a subcontractor was installed
for execution on a 30 MHz embedded processor. The software had
two processing cycle time requirements of 10 msec and 25 msec.
The software required 18 msec and 40 msec to complete execution.
The application program required complete redesign introducing a
significant program delay. The money spent to develop the
software turned into a sunk cost. Investigation revealed that the
subcontractor did not have the capability to estimate, simulate, or
prototype computation and software execution on the actual
hardware.
3. A vendor was contracted to develop a mobile application for
remote sensor monitoring on a ruggedized mobile phone. The
system was required to operate for 8 hours on a single battery
charge for deployment purposes. The customer required delivery in
one year. The vendor delivered the product in 11 months. The
application’s use of processing and graphics resources on the
ruggedized phone resulted in phone operation for 2 hours on a
single battery charge. The vendor’s software application was
rejected by the customer. Investigation revealed that a capability
was not available to estimate energy consumption on the actual
ruggedized phone.
All of these problems were discovered late in system development (during system
integration or after system (or software) delivery). These are only three of many
system development failures that can be traced to inadequate analysis of
component-level computation, energy, and thermal attribute performance against
system functions. A systems engineer can avoid these problems
with architecture analysis methods and modeling capabilities that:
1. Associate algorithmic decisions in the functional architecture with CR decisions
in the physical architecture.
2. Accurately estimate attribute (computation (or execution) time, energy
consumption, heat produced, etc.) performance for selected CR configurations
early in the design life-cycle.
3. Understand the available multi-attribute trade space relative to a set of system
Measure of Performance (MOP) constraints.
4. Produce a near-optimal functional allocation to a selected physical CR
configuration.
The systems engineer must also consider the use of software (SW) CRs and
hardware (HW) CRs. Software CRs can be configured as multicore (MC) (2-8 cores) and
many-core (16+ cores) CPUs (mCPUs), as well as Graphics Processing Units (GPUs) or
General-Purpose GPUs (GPGPUs) (100s-1000s threads). Hardware CRs can be
configured as multiprocessor system-on-chip (MPSoC) (Wolf 2008), field programmable
gate arrays (FPGAs), and specialty processors (such as Digital Signal Processors)1. All of
these CRs possess varying levels (or none at all) of architecture attribute models for
computation (or execution) time, energy consumption, heat generation, Mean-Time-
Between-Failure (MTBF) (reliability attribute), and so on. The systems engineer can use
CR simulations (when available) to assess algorithm effects on attribute performance
such as computation (or execution) times, energy consumption, or heat generation. In
most cases, the systems engineer must resort to algorithm implementation on CR
prototypes to assess attribute performance.
1.3 Research Goals and Objectives
This paper addresses a series of research questions that address the problem
discussed in the preceding section:
1. Can an abstract computation model be developed that accurately
estimates architecture attribute (i.e. execution time, energy consumption,
heat production, etc.) performance for a selected CR (sCPU, mCPU,
GPU) configuration during component level architecture analysis?
a. Can execution time be estimated to within 10 percent (20 percent
threshold) at the thread (and hence function) level for normal
execution?
1 All identified HW resources are reconfigurable (or reprogrammable). Programs targeting these devices are
referred to as firmware (FW). Programs that target general purpose processors are referred to as SW. So
technically, the partitioning is between SW and FW.
b. Can execution time be estimated to within 25 percent at the thread
(and hence function) level for worst case execution?
2. Can the produced estimates be related to Measures of Performance
(MOP) constraints to develop an understanding of the available multi-
attribute trade space versus requirements?
3. Can executable model extensions to the component-level logical and
physical architecture be developed that enable the systems engineer to
perform architecture analysis within the system model environment?
An additional objective of the research associated with this paper is to develop
model constructs that serve as integration points with optimization model(s)/algorithm(s)
and simulation models.
This paper will demonstrate that MBSE can facilitate improvement of component
level trade studies through use of an executable logical/physical architecture model layer
extension. The paper documents development of a model framework for computation of
multiple attribute estimates (i.e. execution time, energy consumed, heat generated, etc.)
for various CR configurations. This paper specifically develops computation (i.e.
execution time) attribute estimates using an sCPU computation model developed during
this research effort. This paper records results of a proof-of-concept to demonstrate the
ability to produce sCPU execution time estimates for various input data sizes.
1.3.1 Research Scope and Constraints
The full executable architecture attribute model framework computes thread level
and function level multi-attribute estimates with associated costs. Thread/function costs
are provided to optimization algorithm(s). Optimization algorithm(s) return an optimum
allocation configuration. Thread/function costs are computed using attribute estimates
that include execution time, consumed energy, and generated heat. This paper performs
the foundational research to develop the executable architecture attribute framework.
Further, this paper develops, integrates, and performs proof-of-concept testing for an
sCPU computation model to estimate execution time.
A separate sCPU computation model must be developed for each
processor family. The processor family used in this paper is the 2nd Generation Intel®
Core™ processor family (Intel® Core™ i7, i5, and i3) (Lempel 2011). This processor
family was chosen because it supports both sCPU and mCPU configurations and because
test assets were available. Proof-of-concept testing was performed due to limited asset
availability. More thorough validation testing can be performed with expanded access to
test assets such as are available in the George Washington University High Performance
Computing facility located on the Virginia Science and Technology campus.
This paper reviews various optimization algorithms (that could potentially be used
to determine optimum allocation) but does not perform optimization algorithm
integration. Rather, the model framework is designed to construct and export Directed
Acyclic Graph (DAG) and associated cost data required by optimization algorithm(s).
The model framework enables optimization algorithm implementation within the
framework or integration external to the model framework.
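As an illustration of the kind of DAG and cost export described above (the node names, costs, and export structure are hypothetical, not the framework's actual format), a thread-level DAG can be built and ordered with standard-library Python:

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: return nodes in dependency order,
    raising ValueError if the graph contains a cycle."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph is not acyclic")
    return order

# Hypothetical thread-level DAG with illustrative per-node costs,
# packaged in the general shape an optimization algorithm consumes.
nodes = ["read", "fft", "detect"]
edges = [("read", "fft"), ("fft", "detect")]
costs = {"read": 0.4, "fft": 1.9, "detect": 0.7}
export = {"order": topological_order(nodes, edges), "costs": costs}
```

An external optimizer would then consume the ordered node list and cost table without needing access to the system model itself.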
The executable architecture attribute model and the toy problem presented in this
paper were developed using IBM® Rational® Rhapsody® Designer for Systems
Engineers SysML profile. SysML semantics are discussed in this paper at a level
sufficient for model user comprehension. Full SysML semantics are found in (OMG 2017) or (Friedenthal
2015).
1.4 Rationale and Justification
Past efforts have addressed CR/software product level functional analysis/
allocation as a HW/SW partitioning problem (Wolf 2003). Systems engineers first
developed HW/SW Co-design analysis methods in the 1990s (Wolf 2003) that
evolved into Co-simulation methods/environments in the early 2000s (Teich 2012)
that today exist as third generation integrated Co-design environments (Teich
2012). In conjunction, there has been significant research in solving the HW/SW
Partitioning problem using various optimization algorithmic approaches (Wu
2012). All of the referenced approaches assume a single core CPU as the SW
allocation target and an Application-Specific Integrated Circuit (ASIC) (or
FPGA) as the HW allocation target.
Recent research (Campeanu 2014) has begun to address multicore CPU,
GPU, and FPGA optimal performance (or computation) allocation. Past HW/SW
partitioning approaches address only architecture allocation optimization for
performance (or computation) while not considering performance (i.e. response
time) requirements. Recent research (Sapienza 2013) begins to address multi-
attribute architecture optimization that includes requirements. The SW domain
Palladio Component Model (PCM) (Becker 2009) enables performance (or
computation) and reliability prediction modeling but does not support allocation
optimization.
None of the aforementioned approaches addresses
quantification, optimization, and simulation of computation time, energy, and thermal
architecture attributes related to MOP constraints. This paper defines an
architecture attribute model framework used to estimate thread/function execution
time, energy consumption, and generated heat. A thread/function cost (i.e.
estimate divided by MOP constraint) is computed for each attribute for use by
optimization algorithm(s).
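The thread/function cost just described (estimate divided by MOP constraint, per attribute) can be sketched in a few lines of Python. The attribute names, units, and optional weights below are illustrative assumptions; the dissertation's full cost algorithm (section 3.3.6) additionally incorporates attribute and thread/function weights.

```python
def thread_cost(estimates, mop_constraints, weights=None):
    """Sum of per-attribute costs, where each attribute's cost is
    its estimate divided by its MOP constraint, optionally weighted."""
    total = 0.0
    for attr, estimate in estimates.items():
        weight = 1.0 if weights is None else weights.get(attr, 1.0)
        total += weight * estimate / mop_constraints[attr]
    return total

# Illustrative attribute names and values (not from the dissertation).
cost = thread_cost(
    estimates={"exec_time_ms": 8.0, "energy_mJ": 40.0, "heat_mW": 120.0},
    mop_constraints={"exec_time_ms": 10.0, "energy_mJ": 50.0, "heat_mW": 200.0},
)
# Each per-attribute ratio below 1.0 indicates the estimate is
# within its MOP constraint; the sum here is 0.8 + 0.8 + 0.6 = 2.2.
```

An optimization algorithm would compare such costs across candidate allocations of the same thread to different CRs.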
Additionally, none of the approaches integrates with a system model
consistent with an MBSE methodology. The architecture attribute model
developed in this paper maintains system model consistency during architecture
attribute analysis through extension of functional and physical architectures.
These model extensions improve architecture analysis visualization, making
analyses easier to understand. Integration of optimization and simulation interfaces with
the framework ensures analysis consistency with the attribute estimates and values
that form the design trade space. In addition, architecture selection and allocation
decisions are easily documented within the model framework maintaining
consistency with other decision analysis artifacts.
1.5 Relevance/Importance/Motivation
INCOSE’s Systems Engineering Vision 2025 (Beihoff 2014) identifies
“Virtual Engineering Part of The Digital Revolution” as one of the areas of “The
Future State of Systems Engineering”. The future state is defined as follows:
“Formal systems modeling is standard practice for specifying, analyzing,
designing, and verifying systems, and is fully integrated with other engineering
models. System models are adapted to the application domain and include a
broad spectrum of models for representing all aspects of systems. The use of
internet-driven knowledge representation and immersive technologies enable
highly efficient and shared human understanding of systems in a virtual
environment that span the full life cycle from concept through development,
manufacturing, operations, and support.”
The motivating technology for Virtual Engineering is Model-Based
Systems Engineering (MBSE). This paper proposes an MBSE architecture
attribute analysis method and executable architecture attribute model framework
that exhibit the capabilities defined by Vision 2025 as follows:
• “Tool suites, visualization and virtualization capabilities will mature to
efficiently support the development of integrated cross-disciplinary analyses
and design space explorations and optimizations, comprehensive
customer/market needs, requirements, architecture, design, operations and
servicing solutions.” This paper presents system model extensions that
encapsulate a computer resource virtualization environment consisting of
computation, energy, thermal, and reliability models, improving systems
engineering communication with computer, electrical, and mechanical
cross-discipline analysts. This paper also presents system model extensions that
facilitate design space exploration of functional algorithm formulation,
algorithm data processing size, and computer resource architecture (sCPU,
mCPU, GPU, GPGPU, FPGA, etc.), capacity (number of cores, number of threads,
number of gates, etc.), and clock speed. System model extensions also introduce
a multi-attribute node cost that enables multi-attribute optimization analysis.
• “Model-based approaches will move engineering and management from paper
documentation as a communications medium to a paperless environment, by
permitting the capture and review of systems design and performance in
digital form.” The system model extensions in this paper capture systems
design through architecture attribute analysis, trade space exploration, and
optimization analysis. Performance is captured by simulation of algorithm
computation execution, energy consumption, and heat generated for alternative
computer resource solution(s).
• “Model-based approaches will enable understanding of complex system
behavior much earlier in the product life cycle.” Product-based systems (e.g.
embedded systems) are increasing in complexity (Shah 2010) including computer
resource architecture. Multiple disciplines are required to support computer
resource systems analysis including software, computer, electrical, and
mechanical engineering. This paper describes methods and multi-view models
that enable effective and efficient multi-disciplinary analysis and synthesis of
component-level application algorithm functional and computer resource
physical architectures.
Figure 1-2 INCOSE MBSE Roadmap (Shah 2010)
Figure 1-2 shows the INCOSE vision for MBSE capability maturation
primarily through the 2010 decade. The research embodied by this thesis directly
supports the “Architecture model integrated with Simulation, Analysis, and
Visualization” capability shown in Figure 1-2.
1.6 Research Contributions to the Systems Engineering Body of Knowledge
Architecture analysis at the component-level (i.e. CR/SW product) requires the
systems engineer to address multi-disciplinary (software, computer, mechanical,
reliability, etc.) concerns to determine optimum architecture allocation to the next level.
Optimization algorithms for HW/SW partitioning have been developed to support
allocation decisions. These algorithms require computation of cost for each function or
thread (section 2.6). This research develops a unique cost algorithm (section 3.3.6) that
integrates multiple attribute (performance, energy, thermal) estimates, quality
requirements (i.e. Measure of Performance (MOP) constraints), attribute weights, and
thread/function weights (section 3.3.2). The key to successful application of the cost
algorithm is accurate attribute estimates. Attribute estimates must support both
thread-level and function-level estimates. Estimates must also support heterogeneous CR options
that include multi/many core CPUs, GPUs, GPGPUs, MPSoCs, FPGAs, and so on. This
research develops a unique model to estimate execution time (i.e. performance attribute)
for single-core CPUs. This research develops Statistical Performance Models (SPMs) for
several arithmetic operations typically used in signal processing applications. The SPMs are
integrated with algorithm computation requirements and CPU clock speed to compute
algorithm execution time estimates. This research also uses arithmetic operation SPMs to
produce simulated streams of algorithm execution times that can be used for simulation
analysis.
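As a hedged sketch of how arithmetic-operation SPMs might combine with algorithm computation counts and CPU clock speed, the following Python fragment uses invented cycle statistics and function names for illustration only; they are not the dissertation's actual SPM values:

```python
import random

# Illustrative (mean cycles, std dev) per operation, standing in for
# the statistically derived SPMs of a particular processor family.
SPM_CYCLES = {"add": (1.0, 0.1), "mul": (3.0, 0.3), "div": (20.0, 2.0)}

def estimate_exec_time(op_counts, clock_hz):
    """Mean execution time: total (count x mean cycles) / clock speed."""
    cycles = sum(n * SPM_CYCLES[op][0] for op, n in op_counts.items())
    return cycles / clock_hz

def simulate_exec_times(op_counts, clock_hz, runs, seed=0):
    """Draw per-run cycle counts from the per-operation normal models
    to produce a simulated stream of execution times."""
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        cycles = sum(n * rng.gauss(*SPM_CYCLES[op])
                     for op, n in op_counts.items())
        samples.append(cycles / clock_hz)
    return samples
```

For example, an algorithm requiring 1000 additions and 500 multiplications on a 1 GHz clock would yield a mean estimate of 2500 cycles, i.e. 2.5 microseconds, while the simulated stream supplies variability for downstream simulation analysis.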
These cost and estimation algorithms are integrated into new function and
physical architecture model extensions to the system model using SysML. These model
extensions introduce model-based attribute estimation capabilities to the system model.
Integration of the cost algorithm into the System Model enables further integration of
optimization algorithms into the System Model facilitating total architecture analysis
within the system model context. This research then builds on the performance attribute
model extensions to define a complete model framework for performance, energy, and
thermal attributes along with sCPU, mCPU, and GPU CRs.
The algorithms and models developed by this research enable the systems
engineer to perform architecture attribute analyses, share attribute estimation and
simulation data with specialty engineers, and make better informed design decisions
resulting in a robust design architecture. Additionally, the systems engineer is able to
perform model-based analysis activities that: 1) construct a design trade space, 2)
perform optimization analysis of the design trade space, 3) perform simulation analysis
associated with the design trade space, 4) provide detailed design data to trade studies,
and 5) determine the size of data to be processed by selected algorithms. Current
literature indicates that these activities are performed external to the system model using:
1) domain specific modeling environments, 2) unique software or scripts, or 3) Excel
spreadsheets.
1.7 Research Approach
MBSE framework elements developed by this research are shown in
Figure 1-3. ‘Component Function Modeling’ encapsulates research workflow
steps that produced functional architecture model extensions. ‘Performance Modeling’
encapsulates research workflow steps that produced functional and physical architecture
model extensions to introduce speed, energy, and thermal attributes that support
optimization analysis and simulation analysis. ‘Quantitative Performance Modeling’
research workflow steps developed arithmetic operation SPMs, estimation algorithms,
cost algorithms, and definition of SysML model elements required to integrate SPMs
through MATLAB Simulink model interfaces. The ‘Optimization Analysis Interface’
introduced model constructs and parameters that enable integration with SW/SW and
HW/SW optimization algorithms. ‘Simulation Analysis Interface’ introduced model
constructs that enable “implementation of” or “integration with” simulation analysis
tools.
Figure 1-3 MBSE Framework for Architecture Attribute Overview
This research effort focused on development of the performance (i.e. execution
time) architecture attribute to estimate execution time on sCPUs. A toy problem was
used to identify a single computational thread for analysis, performance estimation, and
proof-of-concept testing on two different sCPU computer systems. Future work will
demonstrate the breadth of applicability of the framework to other platforms and
attributes (energy [i.e. energy consumption], thermal [i.e. heat produced], reliability,
etc.).
1.8 Organization
Section 2 presents a literature review addressing the multi-dimensional challenge
facing the introduction of component level architecture analysis using model-based
architecture attributes (e.g. performance, energy, thermal, reliability) into existing MBSE
methodologies. First, the component system decomposition level is introduced to the
reader. Current definitions of component level architecture views are reviewed from the
literature. Component physical architecture is introduced with associated attributes
(performance, energy, thermal, reliability) and attribute models. SysML is reviewed at a
user level from the perspective of architecture development. Finally, several current
MBSE methodologies are reviewed with respect to architecture development. Section 3
introduces the architecture attribute development process, defines executable extensions
to application logical and physical architectures for performing architecture attribute
estimates, and describes the development of quantitative computation models that support
performance estimation. Section 4 presents a nominal example to demonstrate proof-of-
concept and discusses associated results. Section 5 provides research conclusions and
recommendations for future research.
Chapter 2 - Literature Review
This literature review investigates fundamental concepts associated with industry
standard practices regarding component-level architecture definition. Aspects discussed
include system structure, architecture views, architecture attributes, and architecture
development methods. Traditional system design methods for HW/SW co-design have
evolved over the past two decades (Wolf 2003) (Teich 2012) focusing primarily on the
function performance attribute. Traditional optimization methods for HW/SW
partitioning have used cost as the basis for optimization. Cost has been primarily based
on function performance. Optimization algorithms have used various flow graph
techniques to represent the analysis model.
2.1 System Architecture Structure
Developers of product systems (BKCASE 2017) have utilized one of three
popular standards: ISO/IEC/IEEE 15288, ISO/IEC 24748-4-2016 (formerly IEEE Std
1220-2005), or ANSI/EIA 632. Harmonization of these standards is on-going as
described by Roedler (Roedler 2012), where ISO/IEC/IEEE-15288 is responsible for
defining the process requirement framework, ISO/IEC/IEEE 24748-4 defines SE planning
activities, and ANSI/EIA-632A defines detailed process descriptions. These standards, in
their current configuration, define three concepts for system structure. The system
structure (i.e. decomposition) defined by ISO/IEC/IEEE 15288 is shown in Figure 2-1.
Figure 2-1 System-of-Interest Structure (IEEE 2008)
ISO/IEC/IEEE 26702 defines system breakdown structure (Figure 2-2) and
ANSI/EIA-632 defines system structure using building blocks (Figure 2-3).
Figure 2-2 IEEE-1220-1998 System Breakdown Structure (ISO 2016)
Figure 2-3 System Structure Example using EIA-632 Building Blocks (SAE 2014)
Buede (Buede 2016) defines a system component as an element of the physical
architecture which receives system function allocations. Buede (Buede 2016) states that a
component can represent SW/HW integration, specific HW element, specific SW
element, people, facilities, or a combination of these elements similar to Figure 2-2. This
dissertation will build on the Buede component definition, as all algorithms will be
allocated to one or more CRs executing associated software2.
2.2 Component Architecture Views
ISO/IEC/IEEE 15288 (IEEE 2008) influences evolution of the SEBoK (BKCASE
2017) which defines the notion of a System Architecture for a System of Interest (SoI).
The SEBoK defines system architecture as “abstract, conceptualization-oriented, global,
2 The option to allocate to firmware is introduced with the FPGA and other specialty processors (e.g.
MpSOC, DSP) and is not covered by this research effort.
and focused to achieve the mission and life cycle concepts of the system” (BKCASE
2017). The SEBoK also states that system architecture “focuses on high-level structure in
systems and system elements” (BKCASE 2017). The architecture exhibits heuristics that
organize into four domains: static, dynamic, temporal, and environmental (Maier 2009)
where:
• Static Domain - encapsulates physical structure and physical interfaces
• Dynamic Domain - encapsulates logical structure including functions, functional
interactions, reactions to events, and effectiveness (e.g. performance)
• Temporal Domain – encapsulates temporal execution characteristics of functions
both cyclic and acyclic
• Environmental Domain – encapsulates system enablers (i.e. production, logistics
support), safety, climatic, and electromagnetic.
The SEBoK (BKCASE 2017) defines a logical architecture (mapping of Dynamic
and Temporal domains) and a physical architecture (mapping of Static domain). The
SEBoK (BKCASE 2017) decomposes the logical architecture into functional
architecture, behavioral architecture, and temporal architecture views. These architecture
views are developed at each decomposition level (i.e. system, subsystem, component).
This research will build on component-level logical architecture extensions to the
dynamic domain through the addition of logical attributes (e.g. number of computations
for various computation types). This research also extends the physical architecture
through addition of a temporal domain to support execution time calculations and an
environmental domain to support calculations of consumed energy, generated heat, and so on.
2.3 Component Physical Architecture
Buede (Buede 2016) defines a component as “a subset of the physical realization
(and the physical architecture) of the system to which a subset of the system’s functions
has been (or will be) allocated.” Components can be hardware, software, people,
facilities, etc. Traditional embedded systems define component level physical
architecture as one CPU and one Reconfigurable Unit (i.e. Application Specific
Integrated Circuit (ASIC)) (Figure 2-4) (Wolf 2003). Allocation to software in this
architecture is to the CPU. Allocation to hardware is to the Reconfigurable Unit (e.g.
FPGA).
Figure 2-4 Traditional HW/SW Partitioning Target Architecture (Wolf 2003)
Modern Embedded Systems (and Software and Information Technology Systems
(Maier 2009)) form a component physical architecture through various combinations of
heterogeneous computer resources (Campeanu 2014) (Figure 2-5).
Figure 2-5 Modern HW/SW Partitioning Physical Architecture
Allocation to SW in this architecture is to a sCPU, mCPU, GPU, or GPGPU.
Allocation to HW in this architecture is a reconfigurable unit (e.g. FPGA). This research
defines a CR (i.e. sCPU, mCPU, GPU, or GPGPU that executes associated software) as
the allocation target.
Research has been conducted to model performance, energy consumption, heat
generated, and reliability for sCPU, mCPU, and GPU resources as discussed in the
following sections.
2.3.1 Performance Models
A system’s functional semantics (i.e. algorithms) are formally represented by a
Model of Computation (MOC) (Fernandez 2009). The MOC defines rules that govern the
execution, interaction, and complexity associated with a group of connected
computational elements (Savage 1998). MOC rules dictate computational element
analysis, synthesis, simulation, and verification methods. MOCs have been developed at
different levels of abstraction (e.g. logic circuits, machines, languages, etc.) (Savage
1998) and domains (state, dataflow, event, etc.) (Meyerowitz 2008).
Figure 2-6 Embedded Microprocessor MOC Levels of Abstraction (Meyerowitz 2008)
Figure 2-6 identifies four levels of MOCs that will be referenced in the following
sections. Microarchitecture refers to the representation level of microprocessor
implementation. Timing refers to timing granularity. Typical model speeds are provided
for each level. RTL models simulate a gate level MOC. Cycle-accurate models simulate
cycle-level program execution on target microarchitecture. Instruction level models
simulate instruction counts without the target microarchitecture. Algorithm level models
simulate a compiled application executing at native speed on a host system (Meyerowitz
2008).
2.3.1.1 Single Computer Resource MOCs
The logic circuit is an RTL-level MOC that can be represented as a Directed
Acyclic Graph (DAG). DAG nodes (i.e. vertices) model gates and DAG flows model
gate output-input connections (Savage 1998). FPGAs are composed of logic circuits used
to implement algorithms at the HW, or more appropriately FW, level. Finite State
Machines (FSMs) (Mealy 1955) are one example of a state-based machine (Savage 1998)
MOC at the RTL level. FSMs can be connected to form a single FSM. Harel (Harel 1990)
introduced hierarchy, concurrency, and timing extensions to FSMs at the software level.
The Random-Access Machine (RAM) is an FSM that models a general-purpose computer
(Savage 1998). The RAM models a CPU FSM connected to a Random Access Memory
FSM. The RAM is another example of an RTL-level MOC.
Dataflow MOCs feature stateless functional transformation of input data to output
data at the algorithm level. Dataflows have firing rules determined by input channel
tokens that are characterized as follows:
• Single-Rate Data-Flow (SRDF), or Marked Directed (Commoner 1971),
graph: One token is consumed and produced on each graph edge
(Commoner 1971) for each graph vertex (or node) firing.
• Multi-Rate Data-Flow (MRDF): Greater than one token is consumed or
produced for each node (or actor) firing (Schaumont 2013).
• Dynamic Data-Flow (DDF): Token consumption and production depend
on conditional values that may not be known at compile time. Node firing
is based on Boolean valued tokens (modeled using select and switch
operators) (Buck 1993).
• Cyclostatic Data-Flow (CSDF) (Bilsen 1996): Token consumption and
production follow a corresponding firing sequence shown in Figure 2-7. In
CSDF each task Vj possesses an execution pattern, fj(1) … fj(Pj), of length
Pj. The sequence is defined as follows: on each nth firing of vertex Vj,
function fj((n − 1) mod Pj + 1) is executed. Consequently, token consumption
and production are also cyclostatic sequences. The production on edge eu by
vertex Vj is represented by a sequence xj,u(1), xj,u(2), …, xj,u(Pj) of constant
integers. The nth firing of Vj produces xj,u((n − 1) mod Pj + 1) tokens on
edge eu. In an analogous manner, vertex Vk fires when every input contains at
least yk,u((n − 1) mod Pk + 1) tokens.
Figure 2-7 Cyclostatic Dataflow (Bilsen 1996)
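The firing-sequence indexing in the CSDF definition above (the nth firing selects pattern entry (n − 1) mod Pj + 1) reduces to a one-line lookup; this Python fragment is purely illustrative:

```python
def csdf_tokens(pattern, n):
    """Tokens produced (or consumed) on the nth firing (n is 1-based)
    of a vertex with the given cyclostatic pattern."""
    return pattern[(n - 1) % len(pattern)]

# With pattern (1, 2, 0), firings 1..6 yield 1, 2, 0, 1, 2, 0 tokens.
```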
Synchronous Dataflow (SDF) (Lee 1987) is another example of a simple,
analyzable dataflow MOC. An SDF graph is a construction of synchronous actors that
describe an algorithm. Each synchronous actor consumes a specified a priori number of
input samples for each input and produces a specified number of samples for each output.
Powerful SDF techniques exist that demonstrate graph consistency, determine memory
requirements (Moreira 2010), and enable execution scheduling on single or multiple
CPUs (Lee 1987) to construct a deterministic solution. A fully constructed SDF
implements a cycle-accurate model.
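The graph-consistency property mentioned above can be made concrete: for each SDF edge whose source actor produces p tokens and whose sink consumes c tokens per firing, the actor repetition counts must satisfy r_source × p = r_sink × c. The following Python sketch (the function name and graph encoding are assumptions, not taken from (Lee 1987)) solves these balance equations for a connected graph:

```python
from fractions import Fraction
from functools import reduce
from math import lcm

def sdf_repetition_vector(edges):
    """Solve the SDF balance equations r[src]*prod == r[dst]*cons.

    edges: list of (src, dst, prod, cons) for a connected graph.
    Returns the smallest integer repetition vector, or None if the
    graph is inconsistent (a sample-rate mismatch)."""
    rates = {edges[0][0]: Fraction(1)}  # seed an arbitrary actor at rate 1
    pending = list(edges)
    while pending:
        progress, rest = False, []
        for src, dst, prod, cons in pending:
            if src in rates and dst in rates:
                if rates[src] * prod != rates[dst] * cons:
                    return None  # inconsistent graph
                progress = True
            elif src in rates:
                rates[dst] = rates[src] * Fraction(prod, cons)
                progress = True
            elif dst in rates:
                rates[src] = rates[dst] * Fraction(cons, prod)
                progress = True
            else:
                rest.append((src, dst, prod, cons))
        if not progress and rest:
            raise ValueError("graph must be connected")
        pending = rest
    # Scale the fractional rates to the smallest integer vector.
    scale = reduce(lcm, (r.denominator for r in rates.values()))
    return {actor: int(r * scale) for actor, r in rates.items()}
```

For example, with edge A→B producing 2 and consuming 3 tokens, and edge B→C producing 1 and consuming 2, the smallest consistent repetition vector fires A three times, B twice, and C once per iteration.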
MOCs have been modeled via process networks, embedded in synchronous
languages, and embedded in toolsets. Dataflow Process Network (PN) (Lee 1995) MOCs
have been developed that exhibit various properties such as process concurrency (via
Kahn Process Networks (KPN) (Kahn 1974)), nondeterminism (KPN extension), stream
behavior (Gaudiot 1991), and hierarchy. Other examples of process network MOCs
include Petri Nets (Murata 1989) and Communicating Sequential Processes (Hoare
1978).
Examples of synchronous languages that embed timed MOCs include Esterel
(Berry 2000), for control dominated systems, and LUSTRE (Halbwachs 1991) and SIGNAL
(LeGuernic 1991) for dataflow dominated systems. Esterel, LUSTRE, and SIGNAL are
examples of instruction-level MOCs. The Discrete Event (DE) MOC adds the timing
concept to events. DE is the basis for the system-level language SystemC
(SOURCEFORGE n.d.) and VHDL and Verilog Hardware Description Languages
(HDLs). The SystemC DE MOC is instruction-level and VHDL/Verilog HDL DE MOC
is register-transfer level.
Giotto is a SW toolset that includes a timed MOC. The MOC utilizes “known”
Worst-Case Execution Times (WCETs) for tasks. The Giotto compiler assembles a
task/communication schedule that satisfies timing requirements (Henzinger 2001). The
Ptolemy project (Hylands 2003) provides another SW toolset that implements several
MOCs including Boolean Dataflow (BDF) (Buck 1993), SDF, DDF, Multidimensional
Synchronous Dataflow (MDSDF), and PN domains with a focus on MOC hierarchical
connection. The Metropolis Electronic System Design Environment (Balarin 2003)
features a metamodel that incorporates existing MOCs at multiple levels of abstraction.
The metamodel also accommodates addition of new MOCs.
However, none of the aforementioned MOC environments computes algorithm
execution time based on function computation characteristics. This research develops an
abstract MOC based on algorithm computations. Furthermore, this research implements
the computation model using a SysML (section 2.4) centric MBSE environment.
2.3.1.2 Multiple Computer Resources
Many of the MOCs identified in the previous section also support applications
that require parallel programs and environments. Additional stochastic analytical models
(Boyd 1994) (Tikir 2007) and statistical performance models (Asanovic 2006)
(Thomasian 1986) that estimate mCPU software performance have been developed but
are difficult to use by non-experts (Tikir 2007). The roofline model features a visual
(two-dimensional graph) performance model integrating Floating-Point (FP)
performance, “operational intensity” (FP operations per main memory operation), and
memory performance (Williams 2009). Amdahl’s Law extended to mCPUs presents a
very intuitive multicore performance model (Hill 2008). Amdahl’s Law states:
𝑆𝑝𝑒𝑒𝑑𝑢𝑝𝑝𝑎𝑟𝑎𝑙𝑙𝑒𝑙(𝑓, 𝑛) = 1
(1−𝑓)+ 𝑓
𝑛
(1)
29
where f is the fraction of an algorithm that can be parallelized, (1 − f) is the sequential portion of the algorithm, and n is the number of CPU cores. Hill (Hill 2008) introduces variations of Amdahl's Law for symmetric, asymmetric, and dynamic multi-core architectures. Though this research does not implement mCPU computational models, the intention is to use the sCPU computational model to compute an sCPU execution time and then apply the modified Amdahl's Law in (1) to compute mCPU execution time as follows:
ExecutionTime_mCPU = (1 / Speedup_parallel) × ExecutionTime_sCPU   (2)
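Equations (1) and (2) can be exercised directly. A minimal sketch with illustrative values (a 90% parallelizable algorithm, four cores, a 1.0 s single-core execution time):

```python
def speedup_parallel(f, n):
    """Amdahl's Law, Eq. (1): speedup for parallel fraction f on n cores."""
    return 1.0 / ((1.0 - f) + f / n)

def execution_time_mcpu(t_scpu, f, n):
    """Eq. (2): mCPU execution time from the sCPU execution time."""
    return t_scpu / speedup_parallel(f, n)

# f = 0.9, n = 4: speedup = 1 / (0.1 + 0.225) ≈ 3.08,
# so a 1.0 s sCPU task takes ≈ 0.325 s on the mCPU.
speedup = speedup_parallel(0.9, 4)
t_mcpu = execution_time_mcpu(1.0, 0.9, 4)
```

Note that as n grows, speedup is bounded by 1/(1 − f), which is why the sequential fraction dominates multicore performance estimates.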
GPU performance models target either the low level (i.e. instruction pipeline or shared/global memory access) (Zhang 2011) or the program control level (Cui 2012). One example abstract GPU computation model (Hong 2009) computes function execution time using arithmetic intensity and memory access time. A CPU-GPU data transfer performance model (Van Werkhoven 2014) can be used to estimate interface transfer times when a function executing on the CPU (GPU) is followed by a function executing on the GPU (CPU). Further research is required to select or develop an appropriate abstract GPU computation model.
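A simplified host-device transfer estimate, in the spirit of the linear models underlying (Van Werkhoven 2014), treats transfer time as a fixed link latency plus size divided by bandwidth. The latency and bandwidth figures below are assumed placeholders, not measured values:

```python
def transfer_time(bytes_moved, latency_s, bandwidth_bps):
    """Linear CPU-GPU transfer model: fixed latency plus size/bandwidth.
    A deliberate simplification; real links exhibit piecewise behavior."""
    return latency_s + bytes_moved / bandwidth_bps

# 64 MiB over a hypothetical 10 GB/s link with 10 microsecond latency.
t = transfer_time(64 * 2**20, 10e-6, 10e9)
```

Such a model lets an architecture analysis charge an interface cost whenever consecutive functions are allocated to different processing resources.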
The architecture attribute model integrates a dataflow MOC with the functional architecture extensions. A dataflow MOC and SPM are integrated with the physical architecture extensions for arithmetic computations, as presented in section 3.6.
2.3.2 Energy/Power/Thermal Models
Hundreds of single-core, multicore, and GPU energy, power, and thermal models have been developed over the last five years.
Study of these models will require considerable research to determine which are
applicable for use in the architecture attribute model.
2.4 Modeling Languages for Systems
The SEBoK (BKCASE 2017) identifies six prevalent modeling languages used to
build descriptive system models:
1. Functional Flow Block Diagram (FFBD)
2. Integration Definition for Functional Modeling (IDEF0)
3. Object-Process Methodology (OPM)
4. Systems Modeling Language (SysML)
5. Department of Defense Architecture Framework (DoDAF)
6. Web Ontology Language (OWL)
Modeling languages 1-4 are pertinent to this paper. Each is presented below with details relevant to the selection of a modeling language.
2.4.1 Functional Flow Block Diagrams (FFBDs)
Functional Flow Block Diagrams (FFBDs) were introduced in the late 1950s by TRW Corporation (Oliver 1997). FFBDs are multi-tiered, order-sequenced diagrams of the flow of system functions, but they do not represent time duration within or between functions (NASA 2007). FFBDs are generated during functional analysis and define the “what” for each functional event (NASA 2007). Functional decomposition is the method used to decompose higher level, more abstract FFBDs into lower level, more detailed FFBDs across multiple tiers. The resulting functional hierarchy represents the functional architecture. Original FFBD syntax is used to depict sequence, concurrency (“AND”), selection (“OR”), and iteration (Oliver 1997).
TRW developed enhancements to FFBDs to make them executable (Oliver 1997). First, data was added to FFBD flows (Alford 1977). Behavior Diagrams (BDs) then introduced additional flow notations, graph MOCs, and hierarchical control concepts (Alford 1992) to produce executable EFFBDs. SysML Activity Diagrams subsume EFFBDs, as discussed in section 2.4.4.
2.4.2 Integration Definition (IDEF) Language
The IDEF language modeling methodology originated in the 1970s via the USAF Integrated Computer Aided Manufacturing (ICAM) program. IDEF models were defined in three domains:
• IDEF0: Syntax and semantics to produce a system function (activity, process, operation, action) structural representation (functional model), function relationships, and the data required to integrate identified functions
• IDEF1: Syntax and semantics to produce a system information structural representation (information model), information complexity, and an application-independent information view that can be transformed into a database design
3 IDEF0 was originally introduced as Federal Information Processing Standard (FIPS) 183 in December 1993. IEEE subsumed FIPS 183 and introduced it as IEEE Std 1320.1-1998 (IEEE Standard for Functional Modeling Language – Syntax and Semantics for IDEF0). ISO/IEC then subsumed IEEE Std 1320.1-1998 and introduced it as ISO/IEC/IEEE 31320-1 (Information Technology – Modeling Languages – Part 1: Syntax and Semantics for IDEF0) in 2012.
4 IDEF1 was originally introduced as FIPS 184 in December 1993. IEEE subsumed FIPS 184 and introduced it as IEEE Std 1320.2-1998 (IEEE Standard for Conceptual Modeling Language Syntax and Semantics for IDEF1X/Sub 97). ISO/IEC then subsumed IEEE Std 1320.2-1998 and introduced it as ISO/IEC/IEEE 31320-2 (Information Technology – Modeling Languages – Part 2: Syntax and Semantics for IDEF1X97) in 2012.
• IDEF2: Syntax and semantics to represent system behavior over time (dynamics model), specifically the behavior of manufacturing system resources.
IDEF0 is based on the Structured Analysis and Design Technique™. IDEF0 and SADT™ produce hierarchically decomposed activity and data models (Ross 1977) using activity boxes, as shown in Figure 2-8. These models are used to build a system functional architecture. IDEF0 (SADT) models are not executable because they do not fully specify system behavior.
Figure 2-8 IDEF0 Activity Box (IEEE 2012)
2.4.3 Object-Process Methodology (OPM)
OPM encapsulates both a modeling language (graphics and natural language) and a model development methodology (Dori 2002). OPM holistically integrates structure and behavior in a single model (Dori 2006). The methodology produces OPM diagrams that represent structure with object (i.e. function) entities and behavior with process and state entities (Dori 2006). OPM diagram entities are interconnected with links and triggers. OPM defines several links: input, output, consumption, result, state-specified consumption, and state-specified result. OPM diagram entities contain sufficient semantics to simulate individual diagram behavior.
5 IDEF2 did not continue
2.4.4 System Modeling Language (SysML)
The development of SysML was initiated by a UML™ for Systems Engineering
Request for Proposal (OMG 2003) promulgated by the Object Management Group
(OMG) in March 2003. SysML was conceived as an extension of the Unified Modeling
Language (UML) as shown in Figure 2-9. SysML Partners presented a technical
approach along with language features for SysML (Friedenthal 2004) at the 2004
INCOSE International Symposium that represented a response to the UML for SE RFP.
Over the next two years other competing SysML specifications were proposed to the
OMG. In 2006 a development team from more than ten companies (Balmelli 2007)
merged the competing specifications to form SysML 1.0. The current version of SysML
available from OMG is 1.5 (OMG 2017).
Figure 2-9 SysML 1.0 Relationship To UML 2.0 (Balmelli 2007)
The foundational viewpoints of SysML are Structure, Behavior, Requirements,
and Parametrics (Friedenthal, Moore and Steiner 2015) as shown in Figure 2-10.
Figure 2-10 Foundational Pillars of SysML (OMG 2018)
Figure 2-11 shows a taxonomy of SysML diagrams organized around four
viewpoints (Structure, Behavior, Specification, Parametric) suggested by the SysML
specification (OMG 2017). Requirements and Parametric diagrams are unique to SysML
(OMG 2017) (Friedenthal, Moore and Steiner 2015). Block Definition Diagrams are
modified from UML 2.0 Class diagrams (Friedenthal, Moore and Steiner 2015). Internal
Block Diagrams are modified from UML 2.0 Composite Structure diagrams (Friedenthal,
Moore and Steiner 2015). Activity Diagrams are modified to account for the differences
in Activity Modeling between SysML and UML 2.0 (Bock 2006). Package, Use Case,
Sequence, and State Machine diagrams are used as in UML 2.0.
Figure 2-11 Modified SysML Diagram Taxonomy (Roedler 2012)
SysML diagrams consist of model elements (e.g. blocks, activities, states, etc.)
representing graphic nodes and model elements (e.g. associations, dependencies, links,
etc.) representing interconnection paths (OMG 2017). Each of the nine SysML diagrams
supports a subset of graphic and interconnection model elements. The diagrams pertinent
to this paper are summarized below: (Friedenthal, Moore and Steiner 2015) (OMG 2017)
• Package Diagram (PD) – organizes a system model using the package
model element. Packages encapsulate model elements in a viewpoint,
view, domain, or namespace. Package diagrams are used to describe
package relationships.
• Block Definition Diagram (BDD) – defines block structure elements with
associated composition and classification relationships. Blocks support
port and flow interfaces.
• Internal Block Diagram (IBD) – defines block part structure elements with
associated interfaces and interconnections. Interfaces include Full Ports
and Proxy Ports that support provided and required features.
Interconnections are supported by links that connect ports and flows.
• Activity Diagram (AD) – defines the execution order of actions depending
on availability of action inputs, controls, and outputs. This flow-based
behavior models how action inputs are transformed into action outputs.
SysML was selected for model development over other modeling languages
because:
• SysML provides the capability to model executable functional architecture extensions that compute functional attributes (e.g. number of computations, energy efficiency, etc.)
• SysML provides the capability to define a physical architecture with executable extensions that compute physical attributes (e.g. execution time, consumed energy, generated heat, etc.)
• SysML blocks facilitate integration of external models (e.g. MATLAB® Simulink), tools, and environments. This research imports quantitative computation MATLAB® Simulink models for execution time attribute computation.
2.5 MBSE Methodologies and Architecture Definition
This section begins with a discussion of the HW/SW co-design concept that pre-
dates MBSE. The section then summarizes four leading MBSE Methodology (Estefan
2008) approaches to architecture definition focused at the component level. The MBSE
methodologies are:
• IBM® Rational® Harmony Systems Engineering (Harmony-SE)
• Object-Oriented Systems Engineering Method (OOSEM)
• Vitech Model-Based Systems Engineering (MBSE)
• IBM Rational Unified Process for Systems Engineering (RUP-SE)
2.5.1 HW/SW Co-Design
HW/SW Co-design has been used in the development of electronic products
containing embedded systems6 in many application domains (e.g. mobile devices,
automobiles, home appliances, avionics, and so on).7 HW/SW co-design is defined as the
synergistic concurrent design of HW/SW to satisfy system requirements (De Micheli
1997). System designers were faced with the complex challenge of estimating and
optimizing embedded system performance for various partition alternatives. Teich
(Teich 2012) has identified four generations of evolving co-design methods and
environments from single CPU/ASIC target (thru mid-90s), co-simulation/complex
targets (thru mid-00s), co-synthesis/ heterogeneous multi-core/ASIC targets (thru early-
10s), to the present and future evolution of co-design methods for different system types
and attributes. Three sample co-design methodologies that have been used to produce
executable models are Electronic System Level (ESL) design, Platform Based Design
(PBD), and Model-based co-design.
6 Embedded Systems (ES) in these domains are implemented as Card-Based (e.g. 3U or 6U cards on
compactPCI® backplane), System-on-a-Chip (SoC), MPSoC, or Network-on-a-Chip (NoC). 7 ES domains can be classified as safety-critical, mission-critical, dependable, cyber-physical, and resilient
affecting attribute priorities.
The ESL (Teich 2012) co-design synthesis methodology consists of five steps: 1) modeling/specification, 2) performance estimation, 3) module (or node) mapping, 4) design space exploration (including optimization), and 5) automatic generation of the selected implementation. Step 1 develops an actor-oriented model that identifies
application actors with associated communication. SystemC is used to specify actor
behavior used to generate an executable specification. Step 2 builds an architecture
template that contains performance data for each actor as well as a SW and/or HW
module. The architecture template also contains all module permutations (processors,
HW modules, communication infrastructures) that satisfy overall requirements for
throughput and size. SW performance is estimated from model code transformations.
HW performance is obtained from an external tool. Step 4 uses an evolutionary multi-
objective optimization algorithm to support tradeoffs among attributes (e.g. FPGA gate
count vs throughput). ESL is supported by the SystemCoDesigner (Keinert 2009) tool.
The double roof co-design model (Teich 2012) coordinates the HW and SW implementation processes while maintaining linkage with the ESL design approach.
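The selection step in ESL design space exploration (Step 4) can be illustrated with a brute-force Pareto filter over candidate design points. This is only a sketch of the non-dominated-selection idea; tools such as SystemCoDesigner use evolutionary multi-objective algorithms rather than enumeration, and the gate-count/execution-time pairs below are invented:

```python
def pareto_front(points):
    """Return the non-dominated design points. Each point is
    (gate_count, exec_time); lower is better in both dimensions."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return front

# Hypothetical candidate implementations: (FPGA gate count, exec time in s).
designs = [(1000, 5.0), (2000, 3.0), (1500, 6.0), (3000, 2.9)]
front = pareto_front(designs)  # (1500, 6.0) is dominated by (1000, 5.0)
```

The Pareto front captures exactly the attribute tradeoff the text mentions (e.g. FPGA gate count versus throughput): no front member can be improved in one attribute without worsening the other.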
PBD is based on the “platform” concept. Sangiovanni-Vincentelli defines a
platform as a library of computation and communication components used to compose a
design at a level of abstraction (Sangiovanni-Vincentelli 2007). PBD maps functionality
to a platform instance (top-down) and builds a corresponding platform (bottom-up) by
library component selection that meets propagated performance constraints. The
“middle” defines the functional-platform interface and is described by a semantic domain
that supports mapping of functions to platforms. PBD functionality can be represented
using Hardware/Software design language(s) and/or computation model (homogeneous
or heterogeneous). PBD platform (i.e. architecture) is represented using software (e.g.
OMG Unified Modeling Language (UML)) or hardware (e.g. Transaction Level
Modeling (TLM), communication-based, or microprocessor-based) modeling techniques.
Each architecture block is assigned a cost (i.e. execution time, power consumed, etc.)
used for subsequent optimization. The Metropolis framework (Balarin 2003) supports
the PBD methodology by providing a meta-model language parser (for functional and
architecture specifications) and interfaces to various back-end tools (simulator(s),
algorithm plug-ins, logic of constraints (LOC) checker(s), and other verification tools).
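The PBD mapping step can be sketched as a library lookup: for each function, select the lowest-cost platform component whose performance meets the propagated constraint. The library contents, timings, and costs below are hypothetical illustrations of the concept, not Metropolis data:

```python
# Hypothetical platform library: per-function implementation alternatives
# with an execution time (s) and a relative cost.
LIBRARY = {
    "fft": [("sw_cpu", {"time": 8.0, "cost": 1.0}),
            ("hw_fpga", {"time": 1.0, "cost": 5.0})],
    "filter": [("sw_cpu", {"time": 2.0, "cost": 1.0}),
               ("hw_fpga", {"time": 0.5, "cost": 4.0})],
}

def map_function(name, max_time):
    """Pick the lowest-cost component satisfying the latency constraint."""
    feasible = [(comp, attrs) for comp, attrs in LIBRARY[name]
                if attrs["time"] <= max_time]
    if not feasible:
        raise ValueError(f"no feasible component for {name}")
    return min(feasible, key=lambda c: c[1]["cost"])[0]

# With a 4.0 s budget the FFT must go to the FPGA; the filter stays in SW.
```

This mirrors the top-down/bottom-up meeting point of PBD: the constraint comes down from the functional side, the component attributes come up from the platform library.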
Model-based Co-design is another approach to HW/SW Co-design in which a system model, consisting of structural, functional, and dynamic models, is constructed for an embedded system. A simulation model is then developed using virtual prototypes, implemented via the Discrete Event System Specification (DEVS) language, and mapped to a HW/SW architecture (Schulz 1998).
Current research is addressing processor/interface/memory allocation
optimization for mCPU-GPU computational resources. One SW/SW Co-design approach
proposes use of the OMG Modeling and Analysis for Real-Time and Embedded Systems
(MARTE) (OMG 2008) Unified Modeling Language (UML) profile (Campeanu 2014)
for descriptive qualitative component modeling combined with a GPU analytical model.
2.5.2 Rational Harmony for Systems Engineering (SE)
The Harmony-SE MBSE process is shown on the left side of Figure 2-12. The
process is iterated to decompose a system level to the next system level (e.g. system to
subsystem). Key process model objectives (Hoffmann 2013) are to:
• Identify required system functions
• Identify associated system states and modes
• Allocate system functions and states/modes to next level structure
These modeling objectives emphasize state-based behavior and function identification and allocation, not detailed functional behavior (Hoffmann 2013).
2.5.2.1 System Functional Analysis
System Functional Analysis transforms functional requirements into an
executable functional model (Figure 2-12) with associated function descriptions. The
functional model contains a combination of SysML Internal Block, Sequence, Activity,
and Statechart diagrams. The diagrams are developed using one of three alternatives
shown in Figure 2-12. Each alternative implements Use Case scenarios that are identified
during Requirements Analysis (Hoffmann 2013). Statecharts are the primary executable
model diagrams used for system functional analysis. Internal Block, Activity, and
Sequence diagrams are used for model execution in later stages of the design process.
Figure 2-12 Harmony-SE Functional Analysis (Hoffmann 2013)
2.5.2.2 Harmony-SE Architecture Analysis
The main objective of Design Synthesis (Figure 2-13 right side) is development of
a physical architecture (i.e. next level entities) that perform required functions within
performance constraints (Hoffmann 2013). Architecture Analysis uses Trade Studies to
select the best approach to achieve the required capability for each system function. The
process flow for Architecture Analysis is shown in Figure 2-13. Harmony-SE uses the
Weighted Objectives Method (Cross 2008) to evaluate alternatives by building a
weighted objectives table for each system function. A Weighted Objectives (Cross 2008)
calculation is performed in the Determine Solution action to arrive at the preferred
solution for each system function (Hoffmann 2013).
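The Weighted Objectives calculation can be sketched directly: each alternative's utility is the weight-scaled sum of its criterion scores, and the highest total wins. The criteria, weights, and scores below are illustrative, not from Harmony-SE documentation:

```python
def weighted_objectives(weights, scores):
    """Weighted Objectives Method (Cross 2008): score each alternative
    as the sum of weight * criterion_score over all criteria."""
    return {alt: sum(weights[c] * s for c, s in crit.items())
            for alt, crit in scores.items()}

weights = {"performance": 0.5, "cost": 0.3, "risk": 0.2}
scores = {
    "alternative_A": {"performance": 8, "cost": 6, "risk": 9},
    "alternative_B": {"performance": 9, "cost": 4, "risk": 7},
}
totals = weighted_objectives(weights, scores)
best = max(totals, key=totals.get)  # preferred solution for this function
```

In Harmony-SE this table is built once per system function, so the Determine Solution action amounts to running this calculation for every row of the trade study.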
Figure 2-13 Harmony-SE Architecture Analysis (Hoffmann 2013)
2.5.2.3 Harmony-SE Architecture Design
Whereas Architecture Analysis refines the functional architecture, Architecture Design focuses on allocation of system functions to an architectural structure. Each Use
Case Scenario defined during System Functional Analysis is evolved from a black-box to
white-box view (also known as Use Case Realization) (Hoffmann 2013). Additionally,
the next level physical architecture structure is defined with parts and interfaces
(including ports). The physical architecture is modeled with BDDs and IBDs. Function
activities and state machines are allocated to the physical architecture. Function
allocations to parts and interface collaborations are verified through model execution (Figure 2-14).
Figure 2-14 Harmony-SE Architecture Design Process (Hoffmann 2013)
2.5.3 Object-Oriented Systems Engineering Method (OOSEM)
OOSEM applies object-oriented principles at system levels but somewhat
differently than how they are applied to software development (Friedenthal, Moore and
Steiner 2015). OOSEM integrates traditional structured analysis methods with certain
object-oriented methods. OOSEM uses traditional SE process concepts such as
requirements engineering and trade studies shown in Figure 2-15. OOSEM uses
methods common to other Object-Oriented Systems Engineering (OOSE) such as Use
Cases, black/white box descriptions, and SysML, as shown in Figure 2-15. OOSEM also includes unique methods, such as logical decomposition, partitioning criteria, and node allocation, that pertain to architecture development.
Figure 2-15 OOSEM Method Pyramid (Estefan 2008)
The top-level OOSEM system development process is shown in Figure 2-16.
Each pass through the process produces specification(s) at the next level. The process is
repeated at system, system element, and component levels. The process is performed
recursively until requirements are specified for software, database, hardware, and
operational procedures. The Define Logical Architecture block (Figure 2-16) decomposes
current level logical components to next level logical components including logical
component interaction to satisfy current level requirements. The Synthesize Candidate
Physical Architectures (Figure 2-16) block allocates next level logical components next
level physical components. The Optimize and Evaluate Alternatives (Figure 2-16) block
performs design optimization and design trade studies.
Figure 2-16 OOSEM Specify and Design System Process (Friedenthal 2015)
Logical architecture definition (Figure 2-17) decomposes current level logical
components into next level logical components. Logical components abstract physical
components that satisfy required functionality without dictating implementation
constraints. Logical scenarios describe logical component interactions that realize system
element block functionality. Logical component interconnection is defined using internal
block diagrams (Figure 2-17). Initial next level logical components can again be
decomposed to repartition functionality and properties. Next level logical components
are specified in the same way as current level logical components (Figure 2-17). If a
logical component is characterized by state-based behavior, the logical component can be
specified by a state machine (Figure 2-17).
Figure 2-17 OOSEM Define Logical Architecture Process (Friedenthal 2015)
The process shown in Figure 2-18 evaluates alternative next level physical architectures that satisfy next level logical architecture requirements. The physical architecture is defined by physical
components, component relationships, and component distribution among system
elements (or nodes). Partitioning criteria (such as performance, reliability, etc.) are
defined for use in partition analysis (Figure 2-17). Logical component architecture
function, control, and persistent store elements are mapped to SysML nodes (e.g. block,
activity, etc.) to define a logical node architecture (Figure 2-18). Physical components
are mapped to SysML nodes to define a physical node architecture (Figure 2-18). At the
lowest level, logical node elements are mapped to software, persistent data, hardware, or
operator procedures (Figure 2-18). Critical component properties are identified for use in
trade study analysis to evaluate and select a refined physical architecture (Figure 2-18).
Figure 2-18 OOSEM Define Physical Architecture Process (Friedenthal 2015)
OOSEM Optimize and Evaluate Alternatives follows the flow shown in Figure
2-19.
Figure 2-19 OOSEM Optimize and Evaluate Alternatives Process (Friedenthal 2015)
A block definition diagram is used to model “define analysis context” (Figure
2-19) for trade studies as shown in Figure 2-20. Block definition and parametric SysML
diagrams are used to further elaborate the equations associated with each analysis (Figure
2-21).
Figure 2-20 Analysis Context Block Definition Diagram Example (Friedenthal 2015)
Figure 2-21 OOSEM Cost Effectiveness Analysis Parametric Model
Figure 2-21 models analysis equation(s) but is not an executable model.
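An executable counterpart to a cost-effectiveness parametric model binds parameter values and evaluates the constraint equations directly. The equation forms and numbers below are assumptions for illustration only; they are not taken from Figure 2-21:

```python
def life_cycle_cost(dev_cost, unit_cost, units, support_per_year, years):
    """Assumed life-cycle cost roll-up: development + production + support."""
    return dev_cost + unit_cost * units + support_per_year * years

def cost_effectiveness(effectiveness, lcc):
    """Assumed measure: effectiveness per unit of life-cycle cost."""
    return effectiveness / lcc

# Hypothetical binding: $2M development, 10 units at $50k,
# $100k/year support for 10 years.
lcc = life_cycle_cost(2.0e6, 5.0e4, 10, 1.0e5, 10)
ce = cost_effectiveness(0.92, lcc)
```

This is the gap the research targets: SysML parametric diagrams document such equations, but an external evaluation engine (or model import) is needed to actually compute attribute values during a trade study.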
2.5.4 Vitech Model-Based Systems Engineering
Vitech defines an MBSE development approach called STRATA™ (Long 2011)
that analyzes and decomposes a system into layers of increasing granularity as shown in
Figure 2-22. Each more detailed layer strategically converges to a final solution. A layer
integrates four domains: Requirements, Behavior, Architecture, and Verification and
Validation (V&V). V&V criteria at each layer must be met before proceeding to the next
layer.
Figure 2-22 Vitech STRATA™ MBSE Model (Long 2011)
At Layer 1, the most general layer, functions are derived from system requirements. Functions are allocated to the system architecture. Vitech provides a representative flow
and set of artifacts for system architecture development shown in Figure 2-23. The
resulting system architecture exists within an Operational (or Enterprise) architecture.
The system architecture provides a context to relate operational entities with system
entities (Long 2011).
Figure 2-23 Vitech MBSE Architecture Diagram (Long 2011)
The behavior domain is supported by diagrams that identify functions, function
control flow, and function data flow. Control flow diagrams include Functional Flow
Block Diagrams (FFBDs), Enhanced Functional Flow Block Diagrams (EFFBDs), and
Activity Diagrams (ADs). Data flow diagrams include N2 charts and Sequence
Diagrams (SDs) (Long 2011). The Layer 1 behavior domain identifies functions at the
system level. The Layer 2 behavior domain identifies functions and functional threads
(control/data flows) at the subsystem level. Subsequent layers (i.e. through Layer N)
decompose Layer 2 subsystems into more granular functions and functional threads.
The architecture domain encapsulates a physical hierarchy which receives
allocations of functions. The Layer 1 architecture identifies the system and associated
external counterparts (Long 2011). The Layer 2 architecture evaluates system
partitioning strategies using criteria such as complexity (interface and testing),
performance, technology risk, future performance, and future technology insertion.
Architecture decomposition and partition evaluation continue to Layer N.
Figure 2-24 CORE® System Design Repository (SDR) (Booth 2008)
CORE® (Figure 2-24) supports Vitech MBSE by providing a central repository,
called the System Design Repository, and tool framework that (Vitech 2013):
• Integrates requirements
• Executes behavior models
• Facilitates architecture development
• Supports verification and validation
• Produces system documentation
2.5.5 IBM RUP-SE
The IBM Rational Unified Process® (RUP®) provides an iterative lifecycle
development framework. RUP was originally intended for software development but was
extended to support systems engineering. RUP® for Systems Engineering (RUP-SE)
(Nolan 2008) instantiates Model-driven Systems Development (Balmelli 2006). RUP-SE
facilitates Architecture-centric system development using Unified Modeling Language
(UML) 2.0 semantics (Cantor 2003). The RUP-SE architecture framework defines a set
of model levels (Table 2-1), viewpoints (Table 2-2), and views (Table 2-3) consistent
with ISO/IEC/IEEE-42010 (ISO/IEC JTC 1/SC 7 2011). Views elaborate viewpoints at
each model level as shown in Table 2-3.
Model level | Expresses
Context | System black box – the system and its actors (though this is a black-box view of the system, it is a white-box view of the enterprise containing the system)
Analysis | System white box – initial system partitioning in each viewpoint that establishes the conceptual approach
Design | Realization of the analysis level in hardware, software, and people
Implementation | Realization of the design model into specific configurations
Table 2-1 RUP-SE Architecture Framework Model Levels (Cantor 2003)
Each model level encapsulates a level of specificity from abstract to concrete.
Each model level groups artifacts of similar detail. A model level does not represent a
decomposition level. Each model level can encapsulate multiple decomposition levels.
RUP-SE identifies system, subsystem, sub-subsystem, and classes as decomposition
levels.
The Analysis model level Logical Viewpoint (Table 2-2) encapsulates technology-independent functional decomposition artifacts for the system, subsystem, sub-subsystem, and so on. Similarly, the Analysis model level Distribution Viewpoint (Table 2-2) defines localities to distribute functionality. Design level models capture decisions that drive implementation. Design level models are descriptive models, not quantitative or executable models. The analysis-to-design level transition maps subsystems, localities, and classes to software, hardware, and worker designs. Supplementary (or non-functional) requirements constrain distribution choices. RUP-SE supports the concept of “design trades” in the construction of alternate design level distribution conceptual approaches that are analyzed in terms of feasibility, quality, and cost.
Viewpoint | Expresses | Concern
Worker | Roles and responsibilities of system workers | Worker activities, human system interaction, human performance specification
Logical | Logical decomposition of the system as a coherent set of SysML blocks that collaborate to provide the desired behavior | Adequate system functionality to realize use cases; system extensibility and maintainability; internal reuse; good cohesion and connectivity
Distribution | Distribution of the physical elements that can host the logical services | Adequate system physical characteristics to host functionality and meet supplementary requirements
Information | Information stored and processed by the system | Sufficient system capacity to store data; sufficient system throughput to provide timely data access
Geometric | Spatial relationships between physical systems | Manufacturability, accessibility
Process | Threads of control that carry out computational elements | Sufficient partitioning of processing to support concurrency and reliability needs
Table 2-2 RUP-SE Architecture Viewpoints (Cantor 2003)
RUP-SE employs the Object Management Group (OMG) System Modeling
Language (SysML) to model various viewpoint views. The context level logical
viewpoint view uses SysML Use Case diagrams (Table 2-3) to model actors, functions,
and provide functional descriptions at the system decomposition level. SysML block
diagrams are used to model the structural aspect of function decomposition and SysML
activity diagrams are used to model functional flow. SysML sequence diagrams are used
to model external and internal interactions. The view artifacts shown in Table 2-3 offer no place to analyze the black-box (computer resource and software) multi-attribute characteristics with the associated distribution (or allocation).
Model levels | Worker | Logical | Information | Distribution | Process | Geometric
Context | Role definition, activity modeling | Use case diagram specification | Enterprise data view | Domain-dependent views | Domain-dependent views |
Analysis | Partitioning of system | Product logical decomposition | Product data conceptual schema | Product locality view | Product process view | Layouts
Design | Operator instructions | Software component design | Product data schema | ECM (electronic control media) design | Timing diagrams | MCAD (mechanical computer-assisted design)
Implementation | Hardware and software configuration
Table 2-3 RUP-SE Sample Model Views (Cantor 2003)
2.5.6 Selected MBSE Methodology
All of the synopsized methodologies address functional (or behavioral) and
physical architecture development. All methodologies identify trade studies as a method
to evaluate various logical architecture to physical architecture allocation solutions. The
Harmony-SE methodology specifically elaborates an architecture analysis method
(Figure 2-13). The method integrates with model-based functional analysis (i.e. logical
architecture development) and model-based architecture design. The method forms the
baseline for definition of a model-based architecture analysis method.
2.6 HW/SW Partitioning Optimization
HW/SW partitioning is considered a sub-process of HW/SW Co-design where the
system designer makes function allocation decisions between CPU (SW) and FPGA (or
ASIC) (HW). All decisions are made to synthesize an optimum solution based on one or
more cost criteria (latency, area, cost, etc.). Early approaches (early 90s) placed all
functionality in HW (HW-approach) or SW (SW-approach) and partitioned to optimize
(minimize or maximize) a defined cost function (Wolf 2003). Vulcan-II (HW approach)
(Gupta 1992) partitioned a system graph model (based on data-flow graphs) into HW or
SW modules using an initial assumption of all HW modules. The system graph model is
produced from a system behavior model described using HardwareC. A module is moved
to SW upon satisfaction of timing constraint(s) and communication overhead (cost
function) minimization. Cosyma (SW approach) (Ernst 1993) partitioned an Extended
Syntax (ES) graph model (i.e. annotated control and data-flow graphs) into HW and SW
modules using an initial assumption of all SW modules. The ES model is produced from
a CX system description. Later (mid 90s) the Lyngby Co-synthesis System (LYCOS)
(Madsen 1997) approach represented functional behavior using control/data-flow graphs
overlain with a fine-grained computation model consisting of Basic Scheduling Blocks
(BSBs). Starting with all SW blocks, LYCOS uses the PACE (Knudsen 1996) dynamic
programming algorithm to map blocks to HW maximizing speedup for a defined
hardware area (i.e. maximize cost function).
There has been considerable research into the application of optimization algorithms to the HW/SW partitioning problem over the last two decades. The problem is NP-hard in terms of computational complexity, meaning that exact optimization algorithm execution time increases exponentially with the addition of analysis nodes. Optimization
algorithms analyze graph node/edge costs to optimize overall cost. Graph models
include Directed Acyclic Graph (DAG), Data Flow Graph (DFG), Control/Data Graph
(CDFG), and BSB. The TGFF (Dick 1998) tool generates pseudorandom task graphs
that implements DAGs for the purpose of performing and comparing HW/SW allocation
optimization algorithms. A sample DAG for three functions allocated to SW and HW
processing resources is shown in Figure 2-25.
Figure 2-25 Sample DAG
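A DAG of this kind can be encoded directly as node and edge cost annotations. The sketch below is illustrative only: three functions F1..F3, each with one node per candidate mapping (SW on a CPU, HW on an FPGA), with edges carrying communication costs; all numbers are invented.

```python
# Illustrative DAG encoding in the spirit of the Figure 2-25 sample:
# one node per (function, mapping) pair, annotated with an execution
# cost estimate. Edge costs model inter-function communication.

nodes = {
    ("F1", "SW"): 10.0, ("F1", "HW"): 2.0,
    ("F2", "SW"): 6.0,  ("F2", "HW"): 1.5,
    ("F3", "SW"): 8.0,  ("F3", "HW"): 3.0,
}

def edge_cost(src, dst, same=0.5, cross=2.0):
    """Hypothetical communication cost; crossing the SW/HW boundary costs more."""
    return same if src[1] == dst[1] else cross

def path_cost(mapping):
    """Total cost of the chain F1 -> F2 -> F3 under a given SW/HW mapping."""
    order = ["F1", "F2", "F3"]
    total = sum(nodes[(f, mapping[f])] for f in order)
    for a, b in zip(order, order[1:]):
        total += edge_cost((a, mapping[a]), (b, mapping[b]))
    return total

all_sw = path_cost({"F1": "SW", "F2": "SW", "F3": "SW"})  # 25.0
```

A partitioning optimizer then searches over the 2^n possible mappings (here only 8) for the cheapest assignment.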
(López-Vallejo 2003) summarizes optimization algorithms developed in the
1990s. Table 2-4 summarizes some of the many HW/SW partitioning optimization
algorithms developed since the mid-2000s, categorized by solution precision (exact
versus heuristic/near-optimal), algorithm method, graph model method, and granularity
(coarse [task level] versus fine [instruction level]). Exact algorithm classes include
integer linear and dynamic programming.8 Heuristic algorithm classes include Simulated
Annealing (SA), Evolutionary, Knapsack (KS), and genetic (sub-classed as gene-based
Genetic Algorithm (GA), meme-based Memetic Algorithm (MA), Ant Colony (AC),
Particle Swarm (PS), and Shuffled Frog Leaping (SFL)). Many GAs are further refined
by search method, including Non-dominated Sort (NS), Tabu Search (TS), and Pareto (P).
(Elbeltagi 2005) compares five evolutionary optimization algorithms (GA, MA, PS, AC,
SFL), albeit not on the HW/SW partitioning problem.

8 In the 1990s, exact optimization algorithms were developed using integer programming (Niemann and Marwedel,
Hardware/Software Partitioning Using Integer Programming 1996) and mixed integer linear programming
(Niemann and Marwedel, An Algorithm for Hardware/Software Partitioning Using Mixed Integer Linear
Programming 1997).
Reference           Solution Precision   Algorithm Method   Graph Method          Granularity
(Kuang 2005)        Exact                Linear (Integer)   DAG                   Coarse
(Banerjee 2006)     Exact                Linear (Integer)   Task Graph            Coarse
(Wu 2006)           Exact                Dynamic            CDFG                  Coarse
(Henkel 2001)       Near-Optimal         SA                 BSB                   Coarse/Fine
(Banerjee 2004)     Near-Optimal         SA                 DAG                   Coarse
(Jing 2013)         Near-Optimal         SA + Greedy        DFG                   Coarse
(Liu 2013)          Near-Optimal         SA + TS            DAG                   Coarse
(Wu 2013)           Near-Optimal         KS + TS            CDFG                  Fine
(Schlichter 2006)   Near-Optimal         Evolutionary       Specification Graph   Coarse
(Zitzler 2001)      Near-Optimal         GA + P             N/A                   Coarse
(Deb 2002)          Near-Optimal         GA + NS            N/A                   Coarse
(Mudry 2006)        Near-Optimal         GA                 Source Code           Fine
(Li 2014)           Near-Optimal         GA                 DAG                   Coarse
(Lin 2014)          Near-Optimal         GA + TS            DAG                   Coarse
(Kang 2013)         Near-Optimal         GA + PSO           DAG                   Coarse
(Yu-dong 2009)      Near-Optimal         AC                 CDFG                  Coarse
(Du 2014)           Near-Optimal         SFL                DAG                   Coarse
Table 2-4 HW/SW Partitioning Optimization Algorithm Summary
The algorithms shown in Table 2-4 are candidates for determining the optimum
allocation of logical architecture functions to a candidate physical architecture.
Inspection of the Table 2-4 Graph Method column shows that the DAG is the most
popular node model for HW/SW optimization algorithms. The architecture attribute
model is therefore constructed to integrate DAG node structures that can be exported
to optimization algorithms.9
9 Optimization algorithm integration and evaluation is not covered by this research effort.
Chapter 3 - Research Methodology
3.1 Research Method
The method employed by this research effort was to develop a framework that
includes a model-based analysis method and supporting models. The research method
was executed in four phases:
• Define Architecture Attribute Analysis method
• Develop executable Architecture Attribute Model Framework
• Develop Statistical Performance Models
• Perform Case Study
The following sections provide details of the activities and artifacts produced by
each research phase.
3.2 Component Architecture Attribute Introduction
Functional analysis at the component level develops a component-level functional
architecture. The architecture comprises a group of functional threads as discussed in
(McKean 2019). Each thread is assigned a Component Response Time (CRT) MOP
constraint. Each thread CRT decomposes to a group of thread function latency
constraints. Each thread and thread function is also assigned energy and thermal constraints.
The physical architecture identifies candidate CR solutions. Each CR solution
contains a combination of sCPU, mCPU, and GPU processing resources. This paper
replicates the thread/function structure in both the functional and physical architectures,
replacing the term "function" in the functional architecture with "function node" in the
physical architecture. In addition, function nodes are mapped to DAG nodes for
integration with optimization algorithms.
The architecture attribute model defined in this paper extends both the functional and
physical architectures (sections 3.4 through 3.6). This paper presents a detailed
performance attribute model exposition, including development of an sCPU computation
model (i.e. SPM) for mathematical operations (section 3.6.2.1). Section 3.3 introduces the
architecture attribute analysis method, detailing the modeling activities that produce
executable architecture attribute models.
3.3 Component Architecture Attribute Analysis Method
Figure 3-1 defines an architecture attribute analysis method that incorporates
modeling workflow elements required to develop an architecture attribute model (defined
in sections 3.4 and 3.5). The method replaces the existing Rational Harmony-SE
architecture analysis workflow of Figure 2-13.
Figure 3-1 Component Architecture Attribute Analysis Workflow
The model defines two execution states corresponding to two method use cases:
optimization and simulation. The model also supports two analysis modes: 'Most Likely',
where the model computes an Expected Value (EV) (e.g. expected execution time for the
performance attribute), and 'Worst Case', where the model computes a worst-case value
(e.g. Worst-Case Execution Time (WCET) (Wilhelm 2008) for the performance attribute).
The following sections provide method details for each workflow step, with model
blocks designated by bold font and block attributes by italic font.
3.3.1 Define Key Component System Functions
This workflow step is synonymous with ‘System Functional Analysis’.
Harmony-SE implements this workflow step in Figure 2-12 to produce an executable
logical architecture. The architecture consists of structural diagrams (BDDs and IBDs).
BDDs model functional decomposition. Block ports/interfaces define interaction points
between blocks. IBDs model the internal part structure of blocks. Each part contains a
behavior diagram (AD or SMD).
Figure 3-2 details various elements of a component logical architecture. Each
Use Case at the component level is composed of one or more scenarios (Pohl 2010) (or
function threads). Each function thread is composed of one or more functions, as in
Figure 3-2. For streaming applications (e.g. Digital Signal Processing) each function
encapsulates an algorithm. Algorithms can be evaluated at the thread level (for example,
spatial domain filter versus frequency domain filter) or at the individual function level
(for example, radix-2, radix-4, or split-radix Fast Fourier Transform (FFT) algorithms
(Balducci 1997)).
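Function-level algorithm alternatives can be compared by their nominal operation counts. The sketch below uses the textbook approximations for FFT complex-multiplication counts (trivial twiddle factors not removed); it illustrates how one function's algorithm choice feeds an operation-count-based performance attribute, and is not taken from the dissertation's model.

```python
import math

# Nominal complex-multiply counts for two FFT radix choices
# (standard textbook approximations).

def fft_mults(n, radix):
    stages = math.log2(n)
    if radix == 2:
        return (n / 2) * stages        # radix-2: (N/2) * log2(N)
    if radix == 4:
        return (3 * n / 8) * stages    # radix-4: (3N/8) * log2(N)
    raise ValueError("unsupported radix")

assert fft_mults(1024, 2) == 5120.0    # 512 * 10
assert fft_mults(1024, 4) == 3840.0    # 384 * 10
```

For a 1024-point transform the radix-4 variant needs roughly 25 percent fewer complex multiplies, which is exactly the kind of difference a function-level trade would surface.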
Figure 3-2 System Function Definition Artifacts
3.3.2 Assign Attribute/Thread Weights
This process step assigns weights for Performance, Energy, and Thermal
architecture attributes (see AttributeWeight block in Figure 3-5). Architecture attribute
weights are uniformly applied to all functions of all threads. The weights are assigned to
each attribute based on the system domain. The performance attribute would be most
prominent in streaming applications (e.g. signal processing) embedded in sensor, combat,
command and control and other similar systems that do not have energy or thermal (i.e.
cooling) constraints. The energy attribute would be most prominent for smart phone
applications and other energy constrained (e.g. battery-operated) systems. The thermal
attribute would be most prominent in automotive and other high thermal environment
systems.
One approach to determining attribute/thread weights is the swing weight
matrix (Parnell 2009), as discussed in McKean et al. (McKean 2019). Another approach
is the Weighted Objectives Method (Cross 2008), which uses relative weights among
architecture attributes and thread execution frequency. Figure 3-3 illustrates an example
configuration of weighted architecture attributes for a system where performance is
determined to be the most important attribute, thermal is valued at 60 percent relative to
performance, and energy is valued at 20 percent relative to performance. Attributes do
not have to be placed at the top and bottom of the scale. Weights are numbered from 1 to
10 (set to the reciprocals of the values shown in Figure 3-3) when used to compute
minimum cost (see section 3.3.6), and from 10 to 1 when used to compute maximum cost.
Figure 3-3 Sample Architecture Attribute and Thread Weights
This workflow step also assigns thread weights (see ThreadWeight block in
Figure 3-5) for each thread frequency (i.e. Very High, Normal, etc.). Each thread is
uniquely assigned a thread weight (Thread_X_Weight in Figure 3-5). Threads are
organized into Very High, High, Normal, and Seldom thread frequency. Each category is
further sub-divided into normal and abnormal (or failure) threads (Carson 2013). Thread
categories are weighted similarly to architecture attributes. Figure 3-3 shows an example
set of thread frequency weights. Specific architecture attribute and thread frequency
weights are determined uniquely for each system domain based on system requirements
and environment.
The swing matrix method (McKean 2019) is preferred for applications that have a
large number of function threads and/or functions that are grouped by importance and
variation. The weighted objectives method is preferred for applications that have a small
number of function threads and/or functions or where there is a desire to uniquely weight
each function thread and/or function attribute.
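The weighted-objectives idea above can be sketched as a simple scaling step: relative attribute weights (performance most important, thermal at 60 percent and energy at 20 percent of performance, as in the Figure 3-3 example) are mapped onto a 1-10 range for minimum-cost computation. The linear scaling used here is one plausible reading, not the dissertation's exact procedure.

```python
# Map relative attribute weights onto the 1-10 range used for
# minimum-cost computation (illustrative scaling only).

relative = {"performance": 1.0, "thermal": 0.6, "energy": 0.2}

def scale_weights(rel, lo=1.0, hi=10.0):
    """Linearly map relative weights (0..max] onto [lo, hi]."""
    top = max(rel.values())
    return {k: lo + (hi - lo) * v / top for k, v in rel.items()}

w = scale_weights(relative)  # performance maps to 10.0; the others scale below it
```

Thread frequency weights (Very High through Seldom) would be produced the same way from their own relative ordering.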
3.3.3 Define Candidate Physical Architecture Solutions
This workflow step identifies a set of CR alternative solutions that form a
physical trade space for analysis. Each CR solution instantiates the
Candidate_X_PhysicalArchitecture block in Figure 3-5. Each CR solution is defined
as a NumberCpus (0 if no multicore or >1) operating at CpuClockFrequency (in GHz)
and NumberGpuThreads (0 if no GPU) operating at GpuClockFrequency (in GHz). A
single CPU configuration is included for every CR alternative operating at
CpuClockFrequency. Model specifics are provided in section 3.4.
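The CR solution parameters above can be captured as a small record. The field names paraphrase the block attributes and the `resources` logic is a plausible reading of the text, not the model's implementation.

```python
from dataclasses import dataclass

# Minimal record mirroring Candidate_X_PhysicalArchitecture parameters.

@dataclass
class CandidatePhysicalArchitecture:
    number_cpus: int          # 0 if not multicore, otherwise > 1
    cpu_clock_ghz: float      # CpuClockFrequency
    number_gpu_threads: int   # 0 if no GPU
    gpu_clock_ghz: float      # GpuClockFrequency

    def resources(self):
        """Processing resources offered by this CR solution."""
        res = ["sCPU"]        # a single-CPU configuration is always included
        if self.number_cpus > 1:
            res.append("mCPU")
        if self.number_gpu_threads > 0:
            res.append("GPU")
        return res

cr = CandidatePhysicalArchitecture(4, 2.4, 1024, 1.1)
assert cr.resources() == ["sCPU", "mCPU", "GPU"]
```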
3.3.4 Model Function Attributes
This workflow step adds executable architecture attribute model elements that
extend each logical architecture (Figure 3-6 left side) function. Model elements are
added to each function of all function threads. The model currently implements the
performance attribute. Specific model details are provided for the performance attribute
in section 3.6.1.
3.3.5 Model Physical Architecture Attributes
This workflow step adds executable architecture attribute model elements that
extend each function-function node (Figure 3-6 right side) pair for all CR architecture
(SC, MC, GPU) elements for all supported architecture attributes. The model currently
implements the performance attribute. Specific model details are provided for the
performance attribute in section 3.6.2.
3.3.6 Compute Attribute Cost
This workflow step computes thread and function node costs (NOTE: Costs are
NOT monetary, but computed values for use in optimization algorithms as discussed in
section 2.6). Computed costs will be minimum or maximum costs as dictated by the
optimization algorithm. The architecture attribute model supports both minimum and
maximum cost computations. The cost algorithms defined in this section compute
minimum costs. Each attribute cost computation is patterned after computation of a
Technical Performance Measure (TPM)10. Attribute and thread weights provided from
section 3.3.2 must be consistent with minimum (or maximum) cost computations.
10 TPM definition from the SEBoK (BKCASE Editorial Board 2017) is “Measures of attributes of a system
element within the system to determine how well the system or system element is satisfying specified
PhysArch_X_Thread_Y blocks (Figure 3-5 right side) encapsulate thread cost
values. PhysArch_X_Thread_Y_FunctionNode_Z blocks (Figure 3-5 right side)
encapsulate function node cost values. Figure 3-5 depicts the system Measure of
Effectiveness (MOE)11 'System Response Time' flowing down to the component-level
'Component Response Time' MOP for each thread.
Function node WeightedNodeCost (i.e. CostNode) is computed for each CR to
minimize node cost12. WeightedNodeCost is computed for each CR as follows:
$$ WeightedCostNode_{CR} = WeightedCostPerf_{CR} + WeightedCostEnergy_{CR} + WeightedCostHeat_{CR} \quad (3) $$
where CR is SC, MC, or GPU; WeightedCostPerf is the WeightedNodePerformanceCost
attribute (Figure 3-6 right side); WeightedCostEnergy is the WeightedNodeEnergyCost
attribute (Figure 3-6 right side); and WeightedCostHeat is the WeightedNodeHeatCost
attribute (Figure 3-6 right side). WeightedNodePerformanceCost is computed for each CR
using the following:
$$ WeightedCostPerf_{CR} = W_p \cdot \frac{ExecTimeEst_{CR}}{LatencyMOP} \quad (4) $$
where CR is SC, MC, GPU; ExecTimeEst is estimated execution time (Figure 3-6
right side), LatencyMOP is allocated function Latency MOP (Figure 3-6 center), and
weight Wp is the PerformanceWeight attribute from AttributeWeight (Figure 3-5 right
side).
requirements" (Roedler and Jones 2005, 1-65). Here the system element is a component and the requirements are MOPs.
11 These constraints are derived from quality requirements (Pohl 2010).
12 Some optimization algorithms prefer to maximize node cost. In that case the MOP is moved to the numerator
and the estimate to the denominator in equations (4), (5), and (6).
WeightedNodeEnergyCost is computed for each CR using the following:
$$ WeightedCostEnergy_{CR} = W_E \cdot \frac{EnergyConsumedEst_{CR}}{EnergyConsumedMOP} \quad (5) $$
where CR is SC, MC, GPU; EnergyConsumedEst is estimated consumed energy
(Figure 3-6 right side), EnergyConsumedMOP is allocated function energy consumption,
and weight WE is the EnergyWeight attribute from AttributeWeight (Figure 3-5 right
side).
WeightedNodeHeatCost is computed for each CR using the following:
$$ WeightedCostHeat_{CR} = W_T \cdot \frac{HeatGeneratedEst_{CR}}{HeatGeneratedMOP} \quad (6) $$
where CR is SC, MC, GPU; HeatGeneratedEst is estimated generated heat (Figure
3-6 right side), HeatGeneratedMOP is allocated function heat generation, and weight WT is
the ThermalWeight attribute from AttributeWeight (Figure 3-5 right side).
Function node costs are combined to form a thread cost according to the
following:
$$ WeightedCostThread = W_{Th} \cdot \left( NodeCost_{FN\_1,CR} + NodeCost_{FN\_2,CR} + \cdots + NodeCost_{FN\_n,CR} \right) \quad (7) $$
where CR is SC, MC, GPU; NodeCostFN_1 through NodeCostFN_n are weighted
node costs computed by equation (3) above; and WTh is one of the attributes from
ThreadWeight.
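The cost computations of equations (3) through (7) reduce to a few lines of arithmetic. The sketch below transcribes them directly; the numeric weights, estimates, and MOPs are invented for illustration.

```python
# Minimum-cost computations per equations (3)-(7).

def weighted_attr_cost(weight, estimate, mop):
    return weight * estimate / mop          # equations (4), (5), (6)

def weighted_node_cost(perf, energy, heat):
    return perf + energy + heat             # equation (3)

def weighted_thread_cost(w_thread, node_costs):
    return w_thread * sum(node_costs)       # equation (7)

# One function node on one CR, with Wp=10, WE=2, WT=6 (illustrative):
perf = weighted_attr_cost(10, estimate=0.8, mop=1.0)   # 8.0
energy = weighted_attr_cost(2, estimate=0.5, mop=2.0)  # 0.5
heat = weighted_attr_cost(6, estimate=1.0, mop=4.0)    # 1.5
node = weighted_node_cost(perf, energy, heat)          # 10.0
thread = weighted_thread_cost(0.9, [node])             # 9.0
```

Maximum-cost variants (footnote 12) simply invert the estimate/MOP ratio in `weighted_attr_cost`.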
3.3.7 Perform Optimization Analysis
This workflow step supports execution of an optimization algorithm, encapsulated
by OptimizationAnalysis (Figure 3-5), to select the optimum solution from the trade
space created by all Candidate_X_PhysicalArchitecture blocks. Many different meta-
heuristic techniques have been used to optimize HW-SW partitioning (see section 2.6).
The OptimizationAnalysis block provides a model placeholder for implementation of
one or more of these algorithms. HW-SW optimization algorithms require evaluation for
applicability to SW-SW optimization (and is not the subject of this research). This
research defines a DAG (see section 2.6) framework (Figure 3-4) that includes all CRs
currently supported by this model that is exported to the OptimizationAnalysis block.
Figure 3-4 DAG Current Supported Model CRs
Figure 3-4 shows a sample set of DAG nodes for three functions executing on
each CR.
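As a placeholder for what the OptimizationAnalysis block might do, the sketch below exhaustively enumerates CR assignments over DAG nodes like those of Figure 3-4 (three functions, each mappable to sCPU, mCPU, or GPU) and keeps the minimum-cost mapping. Exhaustive search is only viable for tiny trade spaces; the section 2.6 heuristics (SA, GA, etc.) would replace it as node counts grow. The node costs are invented.

```python
from itertools import product

# Exhaustive minimum-cost CR assignment over per-node cost tables.

node_cost = {
    "F1": {"SC": 8.0, "MC": 5.0, "GPU": 3.0},
    "F2": {"SC": 4.0, "MC": 4.5, "GPU": 6.0},
    "F3": {"SC": 7.0, "MC": 3.0, "GPU": 2.5},
}

def optimize(costs):
    funcs = sorted(costs)
    best = min(product(*(costs[f] for f in funcs)),
               key=lambda crs: sum(costs[f][c] for f, c in zip(funcs, crs)))
    return dict(zip(funcs, best)), sum(costs[f][c] for f, c in zip(funcs, best))

mapping, total = optimize(node_cost)
# mapping == {"F1": "GPU", "F2": "SC", "F3": "GPU"}; total == 9.5
```

With no inter-node coupling in the cost table, the optimum is simply each node's cheapest CR; real formulations add communication and resource-contention terms that make the search hard.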
3.3.8 Perform Simulation Analysis
This workflow step executes a simulation capability, encapsulated by
SimulationAnalysis (Figure 3-5), to analyze architecture attribute
simulation data produced from the trade space when the model executes in the simulation
state. Generation of architecture attribute simulation data is discussed in section 3.7.3.
3.3.9 Compute Total Attribute Cost
This process step determines the total cost for each solution encapsulated by each
Candidate_X_PhysicalArchitecture block. This method builds a
Candidate_X_Physical Architecture SolutionCost by summing the
Thread_X_Physical OptimumThreadCost (i.e. maximum cost) returned by the
OptimizationAnalysis block for all solution threads.
The method in the preceding paragraph is repeated for each
Candidate_X_Logical Architecture block to develop a series of
Candidate_X_PhysicalArchitecture solution costs.
3.3.10 Select Solution Architecture
This method selects the Candidate_X_PhysicalArchitecture with the minimum
(or maximum, if optimization algorithm uses maximum cost) Solution Cost as the
preferred physical architecture.
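Sections 3.3.9 and 3.3.10 together amount to a sum followed by an argmin. The sketch below shows that reduction with invented per-thread optimum costs.

```python
# Sum per-thread optimum costs into a SolutionCost per candidate
# physical architecture, then pick the minimum-cost candidate.

thread_costs = {            # candidate -> OptimumThreadCost per thread
    "Candidate_1": [9.5, 4.2, 7.1],
    "Candidate_2": [8.0, 6.3, 5.0],
    "Candidate_3": [12.4, 3.1, 6.0],
}

solution_costs = {cand: sum(costs) for cand, costs in thread_costs.items()}
preferred = min(solution_costs, key=solution_costs.get)
# Candidate_2 has the lowest total (about 19.3), so preferred == "Candidate_2"
```

For a maximum-cost optimization the selection would use `max` instead of `min`.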
3.4 Architecture Attribute System Model Overview
Figure 3-5 presents a component level SysML Block Definition Diagram (BDD)
that defines a set of candidate logical architectures and a set of candidate physical
architectures. The user configures the model to support one of three trade study
scenarios:
• One logical architecture with multiple physical architectures –
enables evaluation of a single algorithm on multiple CR configurations.
For example, to evaluate one algorithm on two physical architectures set
NumberLogicalArchitectures to 1, NumberLogicalThreads to 1 for
logical architecture one, NumberLogicalThreadFunctions to 1 for thread
one of logical architecture one, NumberPhysicalArchitectures to 2,
NumberPhysicalThreads to 1 for both physical architectures one and two,
and NumberPhysicalThreadFunctions to 1 for thread one of both
physical architectures one and two.
• Multiple logical architectures with a single physical architecture –
enables evaluation of multiple algorithms on a single CR configuration.
For example, to evaluate two algorithms on one physical architecture set
NumberLogicalArchitectures to 2, NumberLogicalThreads to 1 for both
logical architectures one and two, NumberLogicalThreadFunctions to 1
for thread one of both logical architectures one and two,
NumberPhysicalArchitectures to 1, NumberPhysicalThreads to 1 for
physical architecture one, and NumberPhysicalThreadFunctions to 1 for
thread one of physical architecture one.
• Multiple logical architectures on multiple physical architectures –
enables evaluation of multiple algorithms on multiple CR configurations.
For example, to evaluate two algorithms on two physical architectures set
NumberLogicalArchitectures to 2, NumberLogicalThreads to 1 for both
logical architectures one and two, NumberLogicalThreadFunctions to 1
for thread one of both logical architectures one and two,
NumberPhysicalArchitectures to 2, NumberPhysicalThreads to 1 for both
physical architectures one and two, and NumberPhysicalThreadFunctions to 1 for
thread one of both physical architectures one and two.
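Each trade-study scenario reduces to setting a handful of model counts. Below, the "one logical / two physical" example is encoded as plain configuration data; the keys paraphrase the block attribute names rather than reproducing the model's exact identifiers.

```python
# Configuration data for the first trade-study scenario: one logical
# architecture (one thread, one function) against two physical
# architectures (each with one thread, one function).

one_logical_two_physical = {
    "NumberLogicalArchitectures": 1,
    "NumberLogicalThreads": {1: 1},               # logical arch 1: 1 thread
    "NumberLogicalThreadFunctions": {(1, 1): 1},  # arch 1, thread 1: 1 function
    "NumberPhysicalArchitectures": 2,
    "NumberPhysicalThreads": {1: 1, 2: 1},
    "NumberPhysicalThreadFunctions": {(1, 1): 1, (2, 1): 1},
}
```

The other two scenarios differ only in which of the logical or physical counts exceed one.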
Figure 3-5 Architecture Attribute Model Overview
The left side of Figure 3-5 decomposes a logical architecture container that
encapsulates one or more candidate logical architecture(s). Each logical architecture
encapsulates one or more functional thread(s). Each functional thread encapsulates one
or more function(s). The right side of Figure 3-5 decomposes a physical architecture
container that encapsulates one or more candidate physical architecture(s). Each physical
architecture encapsulates one or more physical functional thread(s). Each physical
functional thread encapsulates one or more physical function node(s). The logical and
physical architecture structures are identical. Figure 3-5 provides a descriptive model
BDD of the structure of the logical and physical architecture containers. However, Figure 3-5
does not provide enough semantic detail to build an executable model. Within SysML,
IBDs, ADs, and SMDs provide the semantic detail required to construct executable
models. This section presents the model constructs necessary to build an executable
model that performs architecture analysis: structurally decomposing the logical architecture
to the function level, structurally decomposing the physical architecture to the function
level, and describing architecture analysis behavior down to the logical/physical function level.
The ArchitectureAnalysis block (Figure A-1) models a container that functions
as the seed model element for construction of an executable model. The
ArchitectureAnalysis IBD models a group of SysML parts (i.e. block instantiations) and part
interfaces (i.e. ports) shown in Figure A-1. The ArchitectureAnalysis IBD instantiates
and interconnects the four main blocks used to perform architecture attribute analysis: 1)
LogicalArchitectureContainer (via CMP_LogicalArchitectureContainerPart), 2)
PhysicalArchitectureContainer (via CMP_PhysicalArchitectureContainerPart),
3) OptimizationAnalysis (via OptimizationAnalysisPart), and 4)
SimulationAnalysis (via SimulationAnalysisPart). The ArchitectureAnalysis IBD also
instantiates and interconnects nine support blocks that configure various trade study
parameters:
• Number of Logical Architectures (1-4): CMP_NumberLogicalArchitecturesBlock
(Figure A-1 CMP_NumberLogicalArchitecturesPart).
• Number of Physical Architectures (1-4): CMP_NumberPhysicalArchitecturesBlock
(Figure A-1 CMP_NumberPhysicalArchitecturesPart).
• Analysis State (Analysis or Simulation): AnalysisStateBlock (Figure
A-1 AnalysisStatePart).
• Analysis Mode (Most Likely or WCET): AnalysisModeBlock (Figure
A-1 AnalysisModePart).
• Optimization Mode (ThreadLevel or FunctionLevel): OptimizationModeBlock
(Figure A-1 OptimizationModePart).
• Simulation Mode (ThreadLevel or FunctionLevel): SimulationModeBlock
(Figure A-1 SimulationModePart).
• Thread Weight (see section 3.3.2): ThreadWeightBlock (Figure A-1
ThreadWeightPart).
• Attribute Weight (see section 3.3.2): AttributeWeightBlock (Figure
A-1 AttributeWeightPart).
• Thread Constraints (see section 3.3.6): ThreadConstraintsBlock (Figure
A-1 ThreadConstraintsPart). The current
ThreadConstraintsBlock definition contains a SystemResponseTime
constraint for each physical thread (1-4) of each physical architecture (1-4)
and a Latency constraint for each physical function (1-4) of each physical
thread (1-4) of each physical architecture (1-4).
Finally, the ArchitectureAnalysis IBD instantiates and interconnects the
StartArchitectureAnalysisBlock (i.e. StartArchitectureAnalysisPart) that
provides overall architecture analysis executable model control. The IBD for
StartArchitectureAnalysisBlock is shown on the left side of Figure A-2. The
IBD defines one activity whose Activity Diagram (AD) is shown on the right side
of Figure A-2. The AD issues a StartArchitectureAnalysis event for each
logical architecture (CMP_LogicalArchitectureContainerPart in Figure A-3)
to CMP_PhysicalArchitectureContainerPart (Figure A-12) to step through
each physical architecture. The number of logical architectures analyzed is
defined by the CMP_NumberLogicalArchitecturesBlock and the number of
physical architectures by the OUT_NumberLogicalArchitectures
attribute of the CMP_NumberPhysicalArchitecturesBlock.
From a structural perspective, the LogicalArchitectureContainer (i.e.
CMP_LogicalArchitectureContainerPart) encapsulates four logical
architecture block instantiations (see Figure A-3). Each logical architecture block
encapsulates four logical functional thread block instantiations (see Figure A-5).13
Each logical functional thread encapsulates four logical
functions (see Figure A-7).14 From a corresponding structural perspective, the
PhysicalArchitectureContainer (i.e. CMP_PhysicalArchitectureContainerPart)
encapsulates four physical architecture block instantiations (see Figure
A-12). Each physical architecture block encapsulates four physical functional
thread block instantiations (see Figure A-14).15 Each physical architecture
functional thread encapsulates four function nodes (see Figure A-16).16
13 NOTE: The model currently supports four logical threads and can be easily expanded to support more
logical threads.
14 NOTE: The model currently supports four logical thread functions and can be easily expanded to
support more logical thread functions.
From a behavior perspective, the PhysicalArchitectureContainer
propagates a start event to initiate attribute computations and propagates a
results-available event to retrieve computed attribute values. The executable model
element PhysicalArchitectureContainer includes a
CMP_PhysArch_ExecutionControl block (i.e. CMP_PhysArch_ExecutionControlPart) that
provides execution behavior control for each of the four physical architecture
block instantiations (see Figure A-12). Execution control is provided via the
Activity Diagram (AD) in Figure A-13. The AD accepts the StartArchitectureAnalysis
event as the behavior entry point. The AD then retrieves the number of
physical architectures to analyze and issues the StartPhysicalArchitecture_1 event
to physical architecture one. Physical architecture one, represented by the
CMP_Candidate_1_PhysicalArchitecture block, includes a
CMP_PhysArchOne_ExecutionControl block (i.e.
CMP_Candidate_1_PhysicalArchitecturePart) that provides execution behavior control for each of
the four physical thread block instantiations (see Figure A-14). Execution control
is provided via the AD in Figure A-15. The AD accepts the
StartPhysicalArchitecture_1 event as the physical architecture one behavior entry
15 NOTE: The model currently supports four physical threads and can be easily expanded to support more
physical threads.
16 NOTE: The model currently supports four physical thread functions and can be easily expanded to
support more physical thread functions.
point. The AD then retrieves the current logical architecture being analyzed and
the number of threads associated with the current logical architecture, and issues
the StartPhysArchOneThreadOne event to physical architecture one thread one.
Physical architecture one thread one, represented by the
CMP_PhysArchOne_ThreadOne block, includes a
CMP_PhysArchOneThrOne_ExecutionControl block (i.e.
CMP_PhysArchOneThrOne_ExecutionControlPart) that provides execution behavior control for each of four physical
thread function block instantiations (see Figure A-16). Execution control is
provided via the AD in Figure A-17. The AD accepts the
StartPhysArchOneThreadOne event as the physical architecture one thread one behavior entry point. The
AD then retrieves the current logical architecture being analyzed and the number
of thread functions associated with the current logical architecture, and issues the
StartPhysArchOneThreadOneFunctionOne event to physical architecture one
thread one function one. The CMP_PhysArchOne_ThreadOne_FunctionOne
block (i.e. CMP_PhysArchOne_ThreadOne_FunctionOnePart) encapsulates
physical architecture one thread one function one (see Figure A-18). The
CMP_FunctionPhysicalContainerBlock (i.e. CMP_FunctionPhysicalContainerPart
in Figure A-18) represents the interface point to the physical architecture attribute
layer (section 3.5). Execution control is provided via two ADs. The first AD
(Figure A-19) implements behavior for the
CMP_PhysArchOneThrOneFuncOne_ExecutionControlBlock (i.e.
CMP_PhysArchOneThrOneFuncOne_ExecutionControlPart in Figure A-18) that provides execution behavior control
for physical function block instantiations. The AD performs an initial state that
identifies the function by thread number and function number. The AD accepts the
StartPhysArchOneThreadOneFunctionOne event and issues the
StartFuncPhysAttributeComputations event with arguments thread number and function number
(NOTE: the behavior thread continues in section 3.5).
The second AD (Figure A-20) implements behavior for the
CMP_PhysArchOneThrOneFuncOne_InterfaceBlock
(CMP_PhysArchOneThrOneFuncOne_InterfacePart in Figure A-18). The AD performs an initial state that
identifies the function by thread number and function number. The AD accepts the
FuncPhysAttributeComputationResultsAvailable event and compares the thread
number and function number event arguments to the function identification (i.e.
the thread and function numbers read at initialization). If the comparison is true,
the AD executes operation retrieveFuncPhysAttributeValues to retrieve
computed physical attributes (i.e. Execution Time, Energy Consumed, Heat
Generated) for all CRs (i.e. SC, MC, GPU). The AD next executes
computeFunctionOneCosts to compute function costs using the computed physical attributes.
Finally, the AD issues the RsltPhysArchOneThrOneFuncOneComplete event to the
CMP_PhysArchOneThrOneFuncOne_ExecutionControlBlock AD (see
Figure A-19), which accepts the event and issues the
PhysArchOneThrOneFuncOneResultsAvailable event to propagate function physical attributes and costs.
The CMP_PhysArchOneThrOne_ExecutionControlBlock AD (see Figure
A-17) accepts the event and increments the thread one function number. If the
current thread function number is less than the number of thread one functions,
the AD issues the StartPhysArchOneThreadOneFunctionTwo event to CMP_
PhysArchOne_ThreadOne_FunctionTwoPart (Figure A-16). Otherwise, the
AD issues the PhysArchOneThrOneResultsAvailable event to the
CMP_PhysArchOne_ExecutionControlBlock AD (see Figure A-15).17 The
CMP_PhysArchOne_ExecutionControlBlock AD accepts the event and increments the physical
architecture one thread number. If the current thread number is less than the
number of physical architecture one threads, the AD issues the
StartPhysArchOneThreadTwo event to CMP_PhysArchOne_ThreadTwoPart (Figure A-14).
Otherwise, the AD issues the PhysArchOneResultsAvailable event to the
CMP_PhysArch_ExecutionControlBlock AD (see Figure A-13).18 The
CMP_PhysArch_ExecutionControlBlock AD accepts the event and increments the physical
architecture number. If the current physical architecture number is less than the
number of physical architectures, the AD issues the StartPhysicalArchitecture_2
event to CMP_Candidate_2_PhysicalArchitecturePart (Figure A-12).
Otherwise, the AD issues the PhysArchResultsAvailable event to the
StartArchitectureAnalysisBlock AD (see Figure A-2).19 The StartArchitectureAnalysisBlock AD
accepts the event and increments the logical architecture number. If the current
logical architecture number is less than the number of logical architectures, the
AD issues the StartArchitectureAnalysis event to
CMP_PhysicalArchitectureContainerBlock (Figure A-1). Otherwise, the AD proceeds to the final state,
indicating that architecture analysis is complete.
17 The same process is repeated for thread functions three and four, if appropriate.
18 The same process is repeated for threads three and four, if appropriate.
19 The same process is repeated for physical architectures three and four, if appropriate.
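Stripped of the SysML event plumbing, the execution-control behavior above is a nested iteration over logical architectures, physical architectures, threads, and function nodes. The sketch below is a behavioral stand-in, not the model itself; the counts and the `compute` hook are illustrative.

```python
# Nested-loop equivalent of the event-driven execution control in
# section 3.4: each loop level corresponds to one family of Start/
# ResultsAvailable events.

def run_architecture_analysis(n_logical, n_physical, n_threads, n_functions,
                              compute):
    results = []
    for la in range(1, n_logical + 1):                 # StartArchitectureAnalysis
        for pa in range(1, n_physical + 1):            # StartPhysicalArchitecture_X
            for th in range(1, n_threads + 1):         # StartPhysArchXThreadY
                for fn in range(1, n_functions + 1):   # ...ThreadYFunctionZ
                    results.append(compute(la, pa, th, fn))
    return results                                     # PhysArchResultsAvailable

r = run_architecture_analysis(2, 2, 1, 2, compute=lambda *ids: ids)
assert len(r) == 8 and r[0] == (1, 1, 1, 1)
```

The event-based realization in the model exists because SysML executable semantics are event driven, not because the control flow itself is complex.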
3.5 Architecture Attribute System Model Details
This section extends both the logical and physical architectures from the function
level to the architecture attribute level, as shown in the Figure 3-6 BDD. Figure 3-6 provides a
descriptive model of the structure of the attribute extensions to the logical and physical
architectures in Figure 3-5. However, Figure 3-6 does not provide enough semantic detail
to build an executable model. The remainder of this section presents the model
constructs necessary to extend the executable model definition in section 3.4 below the
function level.
Figure 3-6 Architecture Attribute Model Detail
Figure A-9 structurally extends candidate one logical architecture, thread one,
function one through the addition of a performance attribute (i.e.
CMP_LogArchOneThrOneFuncOne_PerfAttrPart), an energy attribute (i.e.
CMP_LogArchOneThrOneFuncOne_EnergyAttrPart), and a thermal attribute (i.e.
CMP_LogArchOneThrOneFuncOne_ThermAttrPart). The provided attribute set is identical for
every model function (four logical architectures, four threads per logical architecture,
four functions per thread). The CMP_LogArchOneThrOneFuncOne_PerfAttrBlock,
instantiated by CMP_LogArchOneThrOneFuncOne_PerfAttrPart, encapsulates the
number of arithmetic computations required by the function algorithm. Arithmetic
computations supported by the model are 1) Complex/Floating Point/Integer addition20,
multiplication, and division; 2) trigonometric (cos, sin, tan); 3) arc trigonometric (arccos,
arcsin, arctan, four quadrant arctan); 4) miscellaneous (log, exp, sqrt). The CMP_Log
ArchOneThrOneFuncOne_EnergyAttrBlock, instantiated by CMP_LogArchOneThr
OneFuncOne_EnergyAttrPart, encapsulates required function energy attributes such as
energy efficiency and number of arithmetic computations. The CMP_LogArchOneThr
OneFuncOne_ThermAttrBlock, instantiated by CMP_LogArchOneThrOneFunc
One_ThermAttrPart, encapsulates required function thermal attributes such as number
of arithmetic computations.
Figure A-21 structurally extends the physical architecture for candidate one
physical architecture, thread one, function one through addition of a performance
attribute (i.e. CMP_FuncPhys_PerformanceContainerPart), energy attribute (i.e.
CMP_FuncPhys_EnergyContainerPart) and thermal attribute (i.e. CMP_FuncPhys_
ThermContainerPart). The attribute set is provided for every model function (four
physical architectures, four threads per physical architecture, four functions per thread).
The CMP_FuncPhys_PerformanceContainerBlock, instantiated by CMP_FuncPhys_
PerformanceContainerPart, encapsulates a SC, MC, and GPU computation model.
The CMP_FuncPhys_EnergyContainerBlock, instantiated by CMP_FuncPhys_
EnergyContainerPart, encapsulates SC, MC, and GPU energy models. The CMP_
FuncPhys_ThermalContainerBlock, instantiated by CMP_FuncPhys_Thermal
ContainerPart, encapsulates SC, MC, and GPU thermal models.
20 Subtraction computations are treated as addition with a two’s complement operand and are considered equivalent to addition. Therefore, algorithm subtractions are counted as algorithm additions.
Logical architecture attribute extensions contain no behavior model elements.
Figure A-22 AD defines behavior for candidate one physical architecture, thread one,
function one attribute processing. Upon receipt of the StartFuncPhysAttribute
Computations event, the AD constructs and sends the StartFuncPhysEnergyAttribute
Computations to the CMP_FuncPhys_EnergyContainerBlock (i.e. CMP_FuncPhys_
EnergyContainerPart) in Figure A-21. Upon completion of energy attribute processing,
the CMP_FuncPhys_EnergyContainerBlock sends a CMP_FuncPhys_EnergyResults
Available event to the CMP_FuncPhys_AttributeExecutionControlBlock (i.e. CMP_
FuncPhys_AttributeExecutionControlPart) in Figure A-21. Upon receipt of the event,
the AD constructs and sends the StartFuncPhysPerformanceAttributeComputations to
the CMP_FuncPhys_PerformanceContainerBlock (i.e. CMP_FuncPhys_
PerformanceContainerPart) in Figure A-21. Upon completion of performance attribute
processing, the CMP_FuncPhys_PerformanceContainerBlock sends a CMP_Func
Phys_PerformanceResultsAvailable event to the CMP_FuncPhys_AttributeExecution
ControlBlock in Figure A-21. Upon receipt of the event, the AD constructs and sends
the StartFuncPhysThermalAttributeComputations to the CMP_FuncPhys_Thermal
ContainerBlock (i.e. CMP_FuncPhys_ThermalContainerPart) in Figure A-21.
Upon completion of thermal attribute processing, the CMP_FuncPhys_Thermal
ContainerBlock sends a CMP_FuncPhys_ThermalResultsAvailable event to the
CMP_FuncPhys_AttributeExecutionControlBlock in Figure A-21. Upon receipt of
the event, the AD constructs and sends the FuncPhysAttributeComputationsAvailable
event to the CMP_PhysArchOneThrOneFuncOne_InterfaceBlock (i.e. CMP_
PhysArchOneThrOneFuncOne_InterfaceBlock) in Figure A-18.
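The energy, performance, and thermal computations above run strictly serially: each stage starts only when the previous stage's ResultsAvailable event arrives. The following is a minimal hypothetical sketch of that chain; the callables are stand-ins for the container blocks, and only the event names are taken from the model.

```python
def process_function_attributes(compute_energy, compute_performance, compute_thermal):
    # Serial chain: each stage runs only after the previous stage's
    # ResultsAvailable event; the final event signals overall completion.
    trace = []
    compute_energy()
    trace.append("CMP_FuncPhys_EnergyResultsAvailable")
    compute_performance()
    trace.append("CMP_FuncPhys_PerformanceResultsAvailable")
    compute_thermal()
    trace.append("CMP_FuncPhys_ThermalResultsAvailable")
    trace.append("FuncPhysAttributeComputationsAvailable")
    return trace
```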
3.6 Architecture Performance Attribute System Model
This section presents the model constructs necessary to extend the architecture
performance attribute. The model constructs for the energy and thermal attributes are
identical to the performance attribute.
3.6.1 Performance Attribute Logical Architecture Extensions
There are no logical architecture extensions beyond those discussed in section 3.5.
3.6.2 Performance Attribute Physical Architecture Extensions
This section extends the physical architecture from the architecture attribute level as
shown in Figure 3-7 BDD. Figure 3-7 provides a descriptive model of the structure of
performance attribute extensions that introduce an SC, MC, and GPU computational model
decomposition level. Each computational model is further decomposed into analysis and
simulation computational model blocks. As in the preceding sections, Figure 3-7 does not
provide enough semantic detail to build an executable model. The remainder of this
section presents model constructs necessary to extend the executable model definition in
section 3.5.
Figure 3-7 Performance Attribute Model
Figure A-23 structurally extends the executable physical performance attribute
model through the addition of SC_ComputationModel_ContainerBlock (i.e. SC_
ComputationModel_ContainerPart), MC_ComputationModel_ContainerBlock (i.e.
MC_ComputationModel_ContainerPart), and GPU_ComputationModel_Container
Block (i.e. GPU_ComputationModel_ContainerPart) that encapsulate CR arithmetic
computation models. The CMP_FuncPhys_PerformanceExecutionControlBlock (i.e.
CMP_FuncPhys_PerformanceExecutionControlPart) encapsulates CR computation
model execution control and CMP_FuncPhys_PerformanceComputationsBlock (i.e.
CMP_FuncPhys_PerformanceComputationsPart) encapsulates retrieval of the set of
arithmetic computations used to compute a function execution time.
Upon receipt of the StartFuncPhysPerformanceAttributeComputations event, the
CMP_FuncPhys_PerformanceExecutionControlBlock AD (Figure A-24) builds the
SetupPerformanceAttributeComputation event using the current logical architecture
number, current thread number, current function number, and computer resource
identification. The event is then sent to the CMP_FuncPhys_Performance
ComputationsBlock. Upon event receipt, the CMP_FuncPhys_Performance
ComputationsBlock AD (Figure A-25) tests for CR. If CR is SC, then the Figure A-25
AD uses the current logical architecture number, thread number, and function number to
build the StartRetrievePerformanceComputations event. The event is relayed through the
physical architecture to the CMP_LogArch_ExecutionControl block (i.e. CMP_Log
Arch_ExecutionControlPart) AD (Figure A-4). The CMP_LogArch_Execution
Control AD builds and sends LogArchOne(Two/Three/Four)_StartRetrievePerfComps
event if current logical architecture is one(two/three/four) to the CMP_LogArch
One(Two/Three/Four)_ExecutionControl block.
Each logical architecture/thread/function performance computation retrieval
processing is identical. Figure A-6 presents performance computation retrieval behavior
for the CMP_LogArchOne_ExecutionControl block. The CMP_LogArchOne_
ExecutionControl AD builds and sends LogArchOneThrOne(Two/Three/Four)_
StartRetrievePerfComps event if current thread number is one(two/three/four) to the
CMP_LogArchOneThrOne(Two/Three/Four)_ExecutionControl block. Figure A-8
presents performance computation retrieval behavior for the CMP_LogArchOneThr
One_ExecutionControl block. The CMP_LogArchOneThrOne_ExecutionControl
AD builds and sends LogArchOneThrOneFuncOne(Two/Three/Four)_StartRetrieve
PerfComps event if current function number is one(two/three/four) to the CMP_Log
ArchOneThrOneFuncOne(Two/Three/Four)_ExecutionControl block.
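The cascaded dispatch above effectively composes an event name from the current logical architecture, thread, and function numbers. The sketch below is a hypothetical Python rendering of that name construction; the real model performs it across three separate ADs rather than in one function.

```python
def build_retrieve_event(logical, thread, function):
    # Mirrors the LogArch<N> -> Thr<M> -> Func<K> dispatch by composing the
    # ordinal words used in the model's event names.
    ordinal = {1: "One", 2: "Two", 3: "Three", 4: "Four"}
    return (f"LogArch{ordinal[logical]}Thr{ordinal[thread]}"
            f"Func{ordinal[function]}_StartRetrievePerfComps")
```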
The previously described behavior results in a single function executing a series
of set operations provided by the AlgorithmPerformanceInterfaceBlock to set the
number of arithmetic computations (Complex/Floating Point/Integer Add/Multiply/
Divide, Trig, ArcTrig, and Miscellaneous) required by the accessed logical architecture
function. A series of PerfCompResultsAvailable events is promulgated
through the logical architecture. The right side of Figure A-8 responds to the appropriate
LogArchOneThrOneFuncOne(Two/Three/Four)_PerfCompResultsAvailable event and
sends LogArchOneThrOne_PerfCompResultsAvailable event to CMP_LogArch
One_ExecutionControl block. The right side of Figure A-6 responds to the appropriate
LogArchOneThrOne(Two/Three/Four)_PerfCompResultsAvailable event and sends
LogArchOne_PerfCompResultsAvailable event to CMP_LogArch_ExecutionControl
block. The right side of Figure A-4 responds to the appropriate LogArchOne(Two/Three/
Four)_PerfCompResultsAvailable event and sends PerformanceComputationResults
Available event to the CMP_PhysicalArchitectureContainer block. The Performance
ComputationResultsAvailable event is promulgated to the CMP_FuncPhys_
PerformanceComputationsBlock AD (Figure A-25). The right side of Figure A-25
checks that the appropriate function computations have been retrieved and then sends
StartFuncPhysScPerformanceAttributeComputations event to the SC_Computation
Model_ContainerBlock (Figure A-23).
If CR is MC, the Figure A-25 AD sends StartFuncPhysMcPerformanceAttribute
Computations event to the MC_ComputationModel_ContainerBlock (Figure A-23). If
CR is GPU, the Figure A-25 AD sends StartFuncPhysGpuPerformanceAttribute
Computations event to the GPU_ComputationModel_ContainerBlock (Figure A-23).
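The computer-resource test in the Figure A-25 AD is a three-way dispatch on the CR type. A hypothetical table-driven sketch of that routing, using the start-event names from the model:

```python
def dispatch_performance_start(cr_type):
    # Maps CR type to the start event sent to its computation-model container
    # (SC/MC/GPU ComputationModel_ContainerBlock in Figure A-23).
    start_events = {
        "SC":  "StartFuncPhysScPerformanceAttributeComputations",
        "MC":  "StartFuncPhysMcPerformanceAttributeComputations",
        "GPU": "StartFuncPhysGpuPerformanceAttributeComputations",
    }
    return start_events[cr_type]
```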
3.6.2.1 Performance Attribute Single Core (SC) CPU Computation Model
Figure A-26 structurally extends the SC_ComputationModel_ContainerBlock
from section 3.6.2 through addition of the SC_CM_AnalysisContainerBlock (i.e.
SC_CM_AnalysisContainerPart) and the SC_CM_SimulationContainerBlock (i.e.
SC_CM_SimulationContainerPart). SC_CM_AnalysisContainerBlock computes an
estimated most-likely or WCET SC execution time based on the state of the Analysis
Mode configuration parameter21. SC_CM_SimulationContainerBlock generates a
number of simulated SC execution times (see section 3.7.3). The SC_Computation
Model_ExecutionControlBlock (i.e. SC_ComputationModel_ExecutionControlPart)
is responsible for propagating an execution start event to the appropriate block based on
the state of the AnalysisState configuration parameter22. SC_CM_ ExecutionTimeBlock
(i.e. SC_CM_ExecutionTimePart) is responsible for propagating SC execution time
results.
Figure A-27 presents the AD encapsulated by the SC_ComputationModel_
ExecutionControlBlock. The AD accepts StartFuncPhysScPerformanceAttribute
Computations event and retrieves the AnalysisState configuration parameter. The AD
sends StartScAnalysisExecutionTimeComputationEvent to the SC_CM_Analysis
ContainerBlock if AnalysisState is set to ‘Optimization’. The AD sends StartSc
SimulationExecutionTimeComputationEvent to SC_CM_SimulationContainerBlock if
AnalysisState is set to ‘Simulation’.
21 AnalysisMode configuration parameter is managed by the AnalysisModeBlock.
22 AnalysisState configuration parameter is managed by the AnalysisStateBlock.
Figure A-28 presents the AD encapsulated by the SC_CM_ExecutionTime
Block. The AD manages two concurrent swimlanes. The SC_Analysis swimlane accepts
ScAnalysisExecutionTimeAvailableEvent, sets SC_Execution_Time to the retrieved
estimated execution time, and sends CMP_FuncPhys_ScPerformanceResultsAvailable
event to promulgate estimated execution time results. The SC_Simulation swimlane
accepts ScSimulationExecutionTimeAvailableEvent, sets SC_Execution_Time to the
retrieved simulation execution times, and sends CMP_FuncPhys_ScPerformanceResults
Available event to promulgate simulated execution time results.
3.6.2.1.1 SC Analysis Computation Model
Figure A-29 structurally extends the SC_CM_AnalysisContainerBlock from
section 3.6.2.1 through addition of the SC_AnalComputation_CmplxContainerBlock
(i.e. SC_AnalComputation_CmplxContainerPart), the SC_AnalComputation_Float
ContainerBlock (i.e. SC_AnalComputation_FloatContainerPart), the SC_Anal
Computation_IntContainerBlock (i.e. SC_AnalComputation_IntContainerPart), the
SC_AnalComputation_TrigContainerBlock (i.e. SC_AnalComputation_Trig
ContainerPart), the SC_AnalComputation_ArcTrigContainerBlock (i.e. SC_Anal
Computation_ArcTrigContainerPart), and the SC_AnalComputation_MiscContainer
Block (i.e. SC_AnalComputation_MiscContainerPart). The SC_Promulgate
AnalysisExecutionTimeStartBlock (i.e. SC_PromulgateAnalysisExecutionTime
StartPart) is responsible for propagating an execution start event to all math operation
blocks. SC_AnalysisExecutionTimeBlock (i.e. SC_AnalysisExecutionTimePart) is
responsible for propagating SC analysis execution time results.
Figure A-30 presents the AD encapsulated by the SC_PromulgateAnalysis
ExecutionTimeStartBlock. The AD accepts the StartScAnalysisExecutionTime
ComputationEvent event. The AD concurrently sends StartTrigExecutionTime
ComputationEvent to the SC_AnalComputation_TrigContainerBlock, StartArcTrig
ExecutionTimeComputationEvent to the SC_AnalComputation_ArcTrigContainer
Block, StartCmplxExecutionTimeComputationEvent to the SC_AnalComputation_
CmplxContainerBlock, StartFloatExecutionTimeComputationEvent to the SC_Anal
Computation_FloatContainerBlock, StartIntExecutionTimeComputationEvent to the
SC_AnalComputation_IntContainerBlock, and StartMiscExecutionTimeComputation
Event to the SC_AnalComputation_MiscContainerBlock.
Figure 3-8 computeExecutionTime Operation Code Segment
Figure A-31 presents the AD encapsulated by the SC_AnalysisExecutionTime
Block. The AD accepts TrigExecutionTimeAvailableEvent, ArcTrigExecutionTime
AvailableEvent, MiscExecutionTimeAvailableEvent, CmplxComputationExecutionTime
AvailableEvent, FloatExecutionTimeAvailableEvent, and IntExecutionTimeAvailable
Event. The AD must receive all events before continuing to compute execution time. At
a high level, the execution time is computed as shown in Figure 3-8. LOC_CompExecTime
is an accumulation of individual math group execution times. OUT_CompExecTime is
computed according to equation 8 below:
OUT_CompExecTime = ACF * LOC_CompExecTime    (8)
where the Architecture Calibration Factor (ACF) accounts for CR memory and
computation efficiencies. ACF is discussed further in section 4.3.
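Equation 8 can be sketched directly. The function below is a hypothetical stand-in for the computeExecutionTime operation in Figure 3-8; the parameter names are assumptions, not copied from the model.

```python
def compute_execution_time(group_exec_times, acf):
    # LOC_CompExecTime accumulates the per-math-group execution times;
    # the Architecture Calibration Factor then scales the total (equation 8).
    loc_comp_exec_time = sum(group_exec_times)
    return acf * loc_comp_exec_time
```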
3.6.2.1.1.1 SC Complex Math Computation Model
Figure A-32 structurally extends the SC_AnalComputation_CmplxContainer
Block (Figure A-29) through addition of the SC_AnalComputation_CmplxAdd
ContainerBlock (i.e. SC_AnalComputation_CmplxAddContainerPart), the
SC_AnalComputation_CmplxDivContainerBlock (i.e. SC_AnalComputation_
CmplxDivContainerPart), and the SC_AnalComputation_CmplxMulContainer
Block (i.e. SC_AnalComputation_CmplxMulContainerPart). The SC_Promulgate
AnalysisCmplxExecutionTimeStartBlock (i.e. SC_PromulgateAnalysisCmplx
ExecutionTimeStartPart) is responsible for propagating an execution start event to all
complex math operation blocks. SC_AnalysisCmplxExecutionTimeBlock (i.e.
SC_AnalysisCmplxExecutionTimePart) is responsible for propagating SC analysis
complex math execution time results.
Figure A-33 presents the AD encapsulated by the SC_PromulgateAnalysis
CmplxExecutionTimeStartBlock. The AD accepts the StartCmplxExecutionTime
ComputationEvent event. The AD concurrently sends StartCmplxAddExecutionTime
ComputationEvent to the SC_AnalComputation_CmplxAddContainerBlock,
StartCmplxDivExecutionTimeComputationEvent to the SC_AnalComputation_Cmplx
DivContainerBlock, and StartCmplxMulExecutionTimeComputationEvent to the
SC_AnalComputation_CmplxMulContainerBlock.
Figure A-34 presents the AD encapsulated by the SC_AnalysisCmplxExecution
TimeBlock. The AD accepts CmplxAddExecTimeAvailableEvent, CmplxDivExecTime
AvailableEvent, and CmplxMulExecTimeAvailableEvent. The AD must receive all events
before continuing to compute complex execution time. Execution time is computed as
shown in Figure 3-9.
Figure 3-9 computeComplexExecutionTime Operation Code Segment
Figure A-35 structurally extends the SC_AnalComputation_CmplxAdd
ContainerBlock (Figure A-32). This block provides support for one to five algorithm
buffers (e.g. three input buffers and two output buffers). The container block
encapsulates the SC_Anal_CmplxAddSingle(Double/Triple/Quad/Quint)BufferQmif
Block (i.e. SC_Anal_CmplxAddSingle(Double/Triple/Quad/Quint)BufferQmifPart).
These blocks/parts provide the interface from the system model to SPMs (see section
3.6.3). SC_Anal_ComplexAddExecutionTimeBlock (i.e. SC_Anal_ComplexAdd
ExecutionTimePart) is responsible for propagating SC analysis complex add math
execution time results.
Figure 3-10 selectComplexAddBufferTime Operation Code Segment
Figure A-36 presents the AD encapsulated by the SC_Anal_ComplexAdd
ExecutionTimeBlock. The AD accepts the StartCmplxAddExecutionTimeComputation
Event. The AD waits for input flows from all SPM interfaces before continuing to
compute complex add execution time. Complex add times are selected using
IN_NumberComplexAddBuffers attribute (Figure 3-10).
Execution time (Figure 3-11) is computed for the selected AnalysisMode23 (i.e.
IN_AnalysisType attribute) according to equation 9 below:
OUT_ComplexAddExecTime = (CFref / CFsel) * Nops * ExecTimeModeEV    (9)
where CFref is the reference clock frequency, CFsel is the selected clock
frequency, CFref / CFsel is the clock ratio (i.e. IN_ClkRatio attribute), Nops is the number of
math operations (i.e. IN_NumComplexAddComps attribute), and ExecTimeModeEV is the
selected execution time from Figure 3-10.
Figure 3-11 computeComplexAddTime Operation Code Segment
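The buffer selection of Figure 3-10 and the equation 9 computation of Figure 3-11 can be combined in one hypothetical sketch; the dictionary-based buffer table and parameter names below are illustrative assumptions, not the model's attributes.

```python
def compute_complex_add_time(clk_ratio, n_ops, buffer_times, n_buffers):
    # buffer_times maps buffer count (1..5) to the SPM execution time for the
    # Single/Double/Triple/Quad/Quint buffer variant (Figure 3-10 selection);
    # equation 9 then scales by clock ratio and operation count.
    exec_time_mode_ev = buffer_times[n_buffers]
    return clk_ratio * n_ops * exec_time_mode_ev
```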
3.6.2.1.1.2 SC Floating Point Math Computation Model
SC_AnalComputation_FloatContainerBlock (Figure A-29) is structurally
extended through addition of the SC_AnalComputation_FloatAddContainerBlock (i.e.
SC_AnalComputation_FloatAddContainerPart), the SC_AnalComputation_Float
DivContainerBlock (i.e. SC_AnalComputation_FloatDivContainerPart), and the
SC_AnalComputation_FloatMulContainerBlock (i.e. SC_AnalComputation_Float
MulContainerPart).
23 MostLikely and Wcet are currently supported with the rest as future growth.
The SC_PromulgateAnalysisFloatExecutionTimeStartBlock
(i.e. SC_PromulgateAnalysisFloatExecutionTimeStartPart) is responsible for
propagating an execution start event to all float math operation blocks. SC_Analysis
FloatExecutionTimeBlock (i.e. SC_AnalysisFloatExecutionTimePart) is responsible
for propagating SC analysis float math execution time results.
All SC_AnalComputation_FloatContainerBlock parts are interconnected
exactly the same as SC_AnalComputation_CmplxContainerBlock (Figure A-32) parts.
SC_PromulgateAnalysisFloatExecutionTimeStartBlock AD implementation is
identical to the SC_PromulgateAnalysisCmplxExecutionTimeStartBlock AD. SC_
AnalysisFloatExecutionTimeBlock implementation is identical to the SC_Analysis
CmplxExecutionTimeBlock AD.
3.6.2.1.1.3 SC Integer Math Computation Model
SC_AnalComputation_IntContainerBlock (Figure A-29) is structurally
extended through addition of the SC_AnalComputation_IntAddContainerBlock (i.e.
SC_AnalComputation_IntAddContainerPart), the SC_AnalComputation_IntDiv
ContainerBlock (i.e. SC_AnalComputation_IntDivContainerPart), and the SC_Anal
Computation_IntMulContainerBlock (i.e. SC_AnalComputation_IntMulContainer
Part). The SC_PromulgateAnalysisIntExecutionTimeStartBlock (i.e. SC_
PromulgateAnalysisIntExecutionTimeStartPart) is responsible for propagating an
execution start event to all integer math operation blocks.
SC_AnalysisIntExecutionTimeBlock (i.e. SC_AnalysisIntExecutionTimePart) is
responsible for propagating SC analysis Integer math execution time results.
All SC_AnalComputation_IntContainerBlock parts are interconnected exactly
the same as SC_AnalComputation_CmplxContainerBlock (Figure A-32) parts.
SC_PromulgateAnalysisIntExecutionTimeStartBlock AD implementation is identical
to the SC_PromulgateAnalysisCmplxExecutionTimeStartBlock AD. SC_Analysis
IntExecutionTimeBlock implementation is identical to the SC_AnalysisCmplx
ExecutionTimeBlock AD.
3.6.2.1.1.4 SC Trig Computation Model
Figure A-38 structurally extends the SC_AnalComputation_TrigContainer
Block (Figure A-29) through addition of the SC_AnalComputation_CosContainer
Block (i.e. SC_AnalComputation_CosContainerPart), the SC_AnalComputation_
SinContainerBlock (i.e. SC_AnalComputation_SinContainerPart), and the SC_Anal
Computation_TanContainerBlock (i.e. SC_AnalComputation_TanContainerPart).
The SC_PromulgateAnalysisTrigExecutionTimeStartBlock (i.e. SC_Promulgate
AnalysisTrigExecutionStartPart) is responsible for propagating an execution start
event to all trig blocks. SC_AnalysisTrigExecutionTimeBlock (i.e. SC_AnalysisTrig
ExecutionTimePart) is responsible for propagating SC analysis trig execution time
results.
Figure A-39 presents the AD encapsulated by the SC_PromulgateAnalysisTrig
ExecutionTimeStartBlock. The AD accepts the StartTrigExecutionTimeComputation
Event event. The AD concurrently sends StartCosExecutionTimeComputationEvent to
the SC_AnalComputation_CosContainerBlock, StartSinExecutionTimeComputation
Event to the SC_AnalComputation_SinContainerBlock, and StartTanExecutionTime
ComputationEvent to the SC_AnalComputation_TanContainerBlock.
Figure A-40 presents the AD encapsulated by the SC_AnalysisTrigExecution
TimeBlock. The AD accepts CosExecTimeAvailableEvent, SinExecTimeAvailableEvent,
and TanExecTimeAvailableEvent. The AD must receive all events before continuing to
compute trig execution time. Execution time is computed as shown in Figure 3-12.
Figure 3-12 computeTrigExecutionTime Operation Code Segment
Figure A-41 structurally extends the SC_AnalComputation_CosContainer
Block (Figure A-38). This block provides support for one algorithm buffer. The
container block encapsulates the SC_Anal_CosQmifBlock (i.e. SC_Anal_CosQmif
Part). This block/part provides the interface from the system model to SPMs (see
section 3.6.3). SC_Anal_CosExecutionTimeBlock (i.e. SC_Anal_CosExecution
TimePart) is responsible for propagating SC analysis cosine math execution time results.
Figure A-42 presents the AD encapsulated by the SC_Anal_CosExecutionTime
Block. The AD accepts StartCosExecutionTimeComputationEvent. The AD waits for
input flows from minimum and maximum SPM interfaces before continuing to compute
cosine execution time. Execution time (Figure 3-13) is computed for the selected
AnalysisMode24 (i.e. IN_AnalysisType attribute) according to equation 10 below:
OUT_CosExecTime = (CFref / CFsel) * Nops * ExecTimeModeEV    (10)
where CFref is the reference clock frequency, CFsel is the selected clock
frequency, CFref / CFsel is the clock ratio (i.e. IN_ClkRatio attribute), Nops is the number of
math operations (i.e. IN_NumCosComps attribute), and ExecTimeModeEV is the selected
execution time (i.e. IN_CosMostLikely or IN_CosMax attribute).
Figure 3-13 computeCosExecutionTime Operation Code Segment
3.6.2.1.1.5 SC Arc Trig Computation Model
SC_AnalComputation_ArcTrigContainerBlock (Figure A-29) is structurally
extended through addition of the SC_AnalComputation_ArcCosContainerBlock (i.e.
SC_AnalComputation_ArcCosContainerPart), the SC_AnalComputation_ArcSin
ContainerBlock (i.e. SC_AnalComputation_ArcSinContainerPart), the SC_Anal
Computation_ArcTanContainerBlock (i.e. SC_AnalComputation_ArcTan
ContainerPart), and the SC_AnalComputation_ArcTanFourQuadContainerBlock
(i.e. SC_AnalComputation_ArcTanFourQuadContainerPart).
24 MostLikely and Wcet are currently supported with the rest as future growth.
The SC_
PromulgateAnalysisArcTrigExecutionTimeStartBlock (i.e. SC_PromulgateAnalysis
ArcTrigExecutionTimeStartPart) is responsible for propagating an execution start
event to all arc trig math operation blocks. SC_AnalysisArcTrigExecutionTimeBlock
(i.e. SC_AnalysisArcTrigExecutionTimePart) is responsible for propagating SC
analysis arc trig math execution time results.
All SC_AnalComputation_ArcTrigContainerBlock parts are interconnected
exactly the same as SC_AnalComputation_TrigContainerBlock (Figure A-38) parts.
SC_PromulgateAnalysisArcTrigExecutionTimeStartBlock AD implementation is
identical to the SC_PromulgateAnalysisTrigExecutionTimeStartBlock AD.
SC_AnalysisArcTrigExecutionTimeBlock implementation is identical to the
SC_AnalysisTrigExecutionTimeBlock AD.
3.6.2.1.1.6 SC Miscellaneous Computation Model
SC_AnalComputation_MiscContainerBlock (Figure A-29) is structurally
extended through addition of the SC_AnalComputation_ExpContainerBlock (i.e.
SC_AnalComputation_ExpContainerPart), the SC_AnalComputation_Log
ContainerBlock (i.e. SC_AnalComputation_LogContainerPart), and the SC_Anal
Computation_SqrtContainerBlock (i.e. SC_AnalComputation_SqrtContainerPart).
The SC_PromulgateAnalysisMiscExecutionTimeStartBlock (i.e. SC_Promulgate
AnalysisMiscExecutionTimeStartPart) is responsible for propagating an execution
start event to all miscellaneous math operation blocks. SC_AnalysisMiscExecution
TimeBlock (i.e. SC_AnalysisMiscExecutionTimePart) is responsible for propagating
SC analysis miscellaneous math execution time results.
All SC_AnalComputation_MiscContainerBlock parts are interconnected
exactly the same as SC_AnalComputation_TrigContainerBlock (Figure A-38) parts.
SC_PromulgateAnalysisMiscExecutionTimeStartBlock AD implementation is
identical to the SC_PromulgateAnalysisTrigExecutionTimeStartBlock AD. SC_
AnalysisMiscExecutionTimeBlock implementation is identical to the SC_AnalysisTrig
ExecutionTimeBlock AD.
3.6.3 Architecture Attribute System Model – Quantitative Model Interface
The interface between the executable architecture attribute system model and each
math operation SPM is mechanized through two model constructs. The first is a
specialized block stereotyped as a SimulinkBlock, shown in Figure 3-14. The second is a
MATLAB® Simulink® model discussed in section 3.7.2.4.
Figure 3-14 System Model - Quantitative Model Interface
The SimulinkBlock encapsulates a MATLAB® Simulink® model with the flow
ports on the right side of the block matching Simulink model ports. Data can flow in,
out, or in/out of the flow ports. The rate at which data is produced or consumed by each
flow port is controlled by the m_SampleTime attribute, set to 50 milliseconds in Figure 3-14.
3.7 Performance Attribute Statistical Performance Models
This section discusses a series of workflow steps to develop and integrate
estimation and simulation SPMs. Both models are integrated with the component
physical architecture system model during the “Physical Architecture Modeling”
workflow step shown in Figure 3-15. Figure 3-7 depicts physical architecture performance
attribute (i.e. computational SPM) and physical architecture system model layers. A
similar model construct exists for energy (thermal) attributes where energy (thermal)
model(s) replace performance SPMs.
Figure 3-15 SPM Development Flow
The Figure 3-15 SPM development flow runs from bottom to top. The products at
each level represent increasing levels of abstraction: from observed data, to statistical
models, to Simulink models, to SysML models.
From this point forward, the discussion focuses on development of SC CPU
computational SPMs to support estimation (i.e. optimization analysis state) and
simulation (i.e. simulation analysis state). Section 3.7.1 describes the processor and
memory architecture configuration and assumptions made for statistical performance
model development.
3.7.1 Statistical Performance Model Development Computer Configuration
A 2nd Gen Intel® Core™ microarchitecture (formerly known as Sandy Bridge)
(Lempel 2011) 2.40 GHz dual-core i3 processor was used for statistical performance
model development. An overview of the Sandy Bridge microarchitecture is shown in
Figure 3-16. The microarchitecture implements one on-chip L1 Instruction Cache (32
KByte) per core. The microarchitecture implements an on-chip L1 Data Cache (32
KByte) and an on-chip L2 Data Cache (256 KByte) per core. The microarchitecture also
implements an on-chip Last Level (or L3) Data Cache (3072 KByte) shared by all cores.
Finally, the microarchitecture provides an Integrated Memory Controller two channel
interface to 8 GByte bulk memory (i.e. Double Data Rate type three (DDR3)
Synchronous Dynamic Random-Access Memory). Dual DDR3 memory operates at a
channel transfer rate of 21,328 MB/s.
The second asset used during the case study (section 4.1) is a 4th Gen Intel®
Core™ microarchitecture (known as Broadwell) 2.50 GHz dual-core i5 processor.
Instruction cache size, data cache size, and bulk memory size and speed are identical
between the two architectures. For this reason, it was decided to only address CPU clock
speed in this research effort.
Figure 3-16 Intel Sandy Bridge Microarchitecture (Lempel 2011)
The software developed to collect arithmetic operation execution time data was
designed to utilize all of the available on-chip cache memory plus some bulk memory.
This was done in order to force collection of longer (i.e. more conservative) execution
times resulting from the utilization of slower memories. The strategy for this research
effort was to collect execution time data for one memory usage profile and calibrate
execution times for other memory usage profiles. Future research can address execution
time dependencies on memory size, speed, and usage profiles.
The software environment used for SPM development was MATLAB R2017a.
This software package was chosen to perform arithmetic operations with associated time
collection at the application level. Arithmetic operation SPMs need to be developed only
once for each processor family (e.g. Intel, Arm, etc.).
3.7.2 Estimation Models
This section describes development of an arithmetic computation SPM library.
The library currently consists of nineteen arithmetic operations: Complex/Floating
point/Integer Add/Multiply/Divide (9 individual models), Cos/Sin/Tan (3 individual
models), Arc Cos/Sin/Tan/TanFourQuad (4 individual models), Exp/Log/Sqrt (3
individual models). Add, Multiply, and Divide operations are modeled for one through
five buffers, which supports processing up to five matrix dimensions. Each arithmetic
operation SPM was produced by collecting observation data, defining states, and
performing distribution analysis. A total of fifty-five models were developed for this
research effort.
The arithmetic operations chosen for this research effort primarily support vector
based (i.e. one dimension) signal processing algorithms. This class of algorithms is used
for embedded sensor processing in many application domains such as automotive,
aircraft, ship, submarine, manufacturing control, chemical processing, and so on. Section
5.2 discusses enhancements to this arithmetic operation library.
3.7.2.1 Execution Time Data Collection Workflow Step
Each arithmetic operation SPM was produced by first generating a set of
observation data at the software application level. Execution times observed at the
application level encompass the computer CPU and memory architecture, operating
system, compiler, and software application activities required to perform basic
mathematical operations. This observation data can be used to build a coarse-grained
system-level execution time estimate.
Observation data was produced using an application level MATLAB script
performing vector arithmetic operations. Each observation data point represents the
execution time associated with 100,000 mathematical operations (e.g. cosine, integer add,
complex multiply, etc.). Each operand and result used 64-bit (or 8 byte) long word (or
double) data types. The collected execution time is then divided by 100,000 to derive the
average execution time per math operation. The procedure is repeated 50,000 times to
form the observation data set. Therefore, each observation data set consists of 50,000
sample points.
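The collection procedure can be sketched outside MATLAB; the following Python analogue (with far smaller sample counts than the dissertation's 50,000 samples of 100,000 operations, purely for brevity) times a batch of operations and averages:

```python
import time
import math

def observe_op(op, n_ops=100_000):
    """Time n_ops invocations of a math operation and return the
    average execution time per operation (seconds)."""
    start = time.perf_counter()
    for _ in range(n_ops):
        op()
    return (time.perf_counter() - start) / n_ops

def collect_samples(op, n_samples=50_000, n_ops=100_000):
    """Repeat the averaged timing to build an observation data set."""
    return [observe_op(op, n_ops) for _ in range(n_samples)]

# Small counts so the sketch runs quickly:
samples = collect_samples(lambda: math.cos(1.2345), n_samples=100, n_ops=1_000)
```

Each entry of `samples` is one observation data point, i.e. one averaged per-operation execution time.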
Figure 3-17 Complex Add Single Buffer Multimodal Distribution
The 50,000 sample points are used to build a histogram. Figure 3-17 (left side)
depicts a histogram constructed from observation data for the Complex Add Single
Buffer operation. The histogram reveals a multimodal distribution.
3.7.2.2 State Definition (Estimation) Workflow Step
A group of unimodal distributions (Figure 3-17 right side) is formed from the
multimodal distribution. Each unimodal distribution represents a group of execution
times associated with executable code and operand memory location (i.e. cache memory
level) for the associated arithmetic operation. Code and data residence in faster memories
at the time of arithmetic operation execution results in faster execution times.
The minimum observed execution time, called the Best-Case Execution Time
(BCET) (Wilhelm 2008), is defined as the State 1 minimum execution time. The
maximum execution time for State 1 is chosen such that the state histogram represents a
unimodal distribution (e.g. Figure 3-17 right side). The maximum execution time for State
1 becomes the minimum execution time for State 2. The process repeats until the last
state where the maximum execution time is the maximum observed execution time (i.e.
WCET (Wilhelm 2008)). The multimodal distribution is decomposed into a series of
unimodal distributions each bounded by the state minimum and maximum execution
times. A histogram is built using each state’s data (e.g. Figure 3-17 right side) that
represents an empirical unimodal distribution.
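The decomposition step can be illustrated with a short sketch; the boundary values below are hypothetical placeholders, not values from the dissertation's data:

```python
def split_into_states(samples, boundaries):
    """Decompose a multimodal observation set into per-state unimodal
    groups. boundaries[i] is the maximum execution time of state i+1
    (which is also the minimum of state i+2); the BCET and WCET bound
    the first and last states."""
    states = [[] for _ in range(len(boundaries) + 1)]
    for t in sorted(samples):
        idx = sum(t > b for b in boundaries)  # index of the state whose bounds hold
        states[idx].append(t)
    return states

# Hypothetical execution times and state boundaries:
samples = [1.0, 1.1, 1.2, 2.0, 2.1, 5.0]
states = split_into_states(samples, boundaries=[1.5, 3.0])
# states[0] holds samples <= 1.5, states[1] holds (1.5, 3.0], states[2] the rest
```

Each `states[i]` list is then binned into its own histogram, giving the empirical unimodal distributions.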
3.7.2.3 Distribution Analysis Workflow Step
Each state unimodal distribution is fit with multiple candidate distribution models
(i.e. colored lines of Figure 3-17 right side). Maximum likelihood estimation (Myung
2003) is used to estimate distribution parameters (e.g. mu, standard deviation, shape) for
each candidate distribution. Ninety-five percent confidence intervals are computed for all
distribution parameters. Covariances are also computed for all distribution parameters.
Figure A-44 shows candidate distributions and associated distribution parameters for
Complex Add Single Buffer operation hot states. Figure A-45 shows candidate
distributions and associated distribution parameters for the warm states of the Complex
Add Single Buffer operation. Hot states (Figure 3-17 left side) are associated with faster
execution time where executable code and operands are loaded in the fastest cache
memories. Warm states (Figure 3-17 left side) are associated with slower execution time
where executable code and operands have to be loaded into cache memory or retrieved
from slower bulk memory.
The Bayesian Information Criterion (BIC) (Schwarz 1978) is computed for each
distribution using the maximum likelihood value. The lowest (most negative) BIC value
identifies the best distribution fit. BIC is preferred over the Akaike Information Criterion
(AIC) (Akaike 1974) in this scenario for two reasons:
• BIC says more about absolute model quality
• AIC selects the best model fit from a set but says nothing about absolute
model quality
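The MLE-plus-BIC selection step can be sketched as follows, using SciPy as a stand-in for the dissertation's tooling; the candidate distribution names and synthetic data are assumptions for illustration:

```python
import numpy as np
from scipy import stats

def best_fit_by_bic(data, candidates=("norm", "lognorm", "gamma")):
    """Fit each candidate distribution by maximum likelihood and rank by
    the Bayesian Information Criterion, BIC = k*ln(n) - 2*ln(L); the
    lowest (most negative) BIC identifies the best-fitting distribution."""
    n = len(data)
    results = {}
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)                      # MLE parameter estimates
        loglik = np.sum(dist.logpdf(data, *params))  # maximized log-likelihood
        bic = len(params) * np.log(n) - 2.0 * loglik
        results[name] = (bic, params)
    best = min(results, key=lambda name: results[name][0])
    return best, results

rng = np.random.default_rng(1)
data = rng.normal(loc=10.0, scale=0.5, size=2000)  # synthetic unimodal state data
best, results = best_fit_by_bic(data)
```

The extra-parameter penalty `k*ln(n)` is what lets BIC compare model quality across candidates with different parameter counts.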
The mu for each selected state distribution (Figure A-44 and Figure A-45) is
placed in an Excel spreadsheet (Figure 3-18) along with mu 95% confidence interval
values. State probability is computed by dividing the number of state sample points
(‘HMM State Count’ column in Figure 3-18) by the total number of sample points
(50,000). A similar spreadsheet is built for each mathematical operation. Each
spreadsheet feeds a MATLAB® Simulink® model discussed in the next section.
Figure 3-18 Single Core Complex Add State Parameters
3.7.2.4 Simulink Modeling (Estimation) Workflow Step
A MATLAB® Simulink® model (Figure A-43) is built for each math operation.
The model computes expected value (EV) using state mus and probabilities according to
equation 11.
ExecTime_EV = μ_ST1 · p_ST1 + μ_ST2 · p_ST2 + ⋯ + μ_STn · p_STn    (11)
The Simulink model computes EVs for Hot states, Warm states, and a Cold
state. (The Cold state encapsulates the state with the highest execution time values; this
state occurs when all arithmetic operation operands are retrieved from bulk memory.)
The Simulink model also computes an EV for all states, designated as “Most
Likely”, and determines the minimum and maximum (i.e. WCET) observed execution
times to complete the set of six values computed for each math operation. The Simulink
model reads state mus, probabilities, and observation data from
the math operation spreadsheet generated in section 3.7.2.3. The Simulink model outputs
computed values via output ports that are associated with flow ports in the system model
(see section 3.6.3).
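Equation 11 reduces to a probability-weighted sum over states; a minimal sketch with hypothetical state means and probabilities:

```python
def expected_exec_time(mus, probs):
    """Expected execution time per equation (11):
    ExecTime_EV = sum_i mu_STi * p_STi."""
    assert abs(sum(probs) - 1.0) < 1e-9, "state probabilities must sum to 1"
    return sum(m * p for m, p in zip(mus, probs))

# Hypothetical hot/warm/cold state means (usec) and state probabilities:
ev = expected_exec_time([0.010, 0.025, 0.120], [0.90, 0.08, 0.02])
# ev = 0.010*0.90 + 0.025*0.08 + 0.120*0.02 = 0.0134
```

Grouping states (hot only, warm only, all states) and reusing the same sum yields the separate Hot, Warm, and Most Likely EVs.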
3.7.3 Simulation Analysis Models
The purpose of the simulation SPM is to produce a representative set of execution
time samples for use in simulation analysis. A simulation SPM has been developed for
each arithmetic operation in the computation library (see section 3.7.2). Each simulation
SPM implements a Hidden Markov Model (HMM) as discussed in section 3.7.3.2. The
HMM state model uses the same boundary execution times (section 3.7.2.2) but differs in
structure as discussed in section 3.7.3.2. MATLAB® Simulink® models could not yet be
built to interface the simulation SPMs to the system model, for the reasons discussed in
section 3.7.3.3.
3.7.3.1 State Definition (Simulation) Workflow Step
The states defined in section 3.7.2.2 are also used for simulation. However,
instead of being organized into Hot, Warm, and Cold groups, the states are organized into
a single group. This approach was used because there is no need to distinguish among
state groups when generating simulated execution time samples.
3.7.3.2 Transition Analysis
Inspection of math operation observation data revealed patterns of execution
times that result from instruction caching, data caching, dynamic power management,
operating system thread breaks, etc. An HMM is used to model visible state outputs and
an underlying hidden state behavior. A sample HMM is shown in Figure 3-19.
Figure 3-19 Sample Hidden Markov Model
It is assumed that at each time step t the model occupies a state ωi(t) and emits a
visible output φi(t), χi(t), or ψi(t). Visible outputs can be continuous functions or discrete
outputs; this research restricts outputs to discrete values. In any state ωi(t) the
probability of a particular visible output is defined by the probability ρik. The
unobservable states (or nodes) (ωi) and transition probabilities operate the same as in a
basic Markov model.
The observable data set is analyzed to develop an HMM state transition
probability matrix. Figure 3-20 shows the state transition probability matrix for the
complex add single buffer math operation. Each observable data value (i.e. current state)
is compared to the next data value (i.e. next state). The appropriate State Transition
Count Matrix cell is incremented by one. The values for each State Transition Count
Matrix row are summed and placed in the HMM State Count column. The State
Transition Probability matrix is then built by dividing each cell of the State Transition
Count Matrix by the appropriate HMM State Count column cell.
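The counting-and-normalizing procedure for the transition matrix can be sketched as follows (the example state sequence is hypothetical):

```python
import numpy as np

def transition_matrix(state_seq, n_states):
    """Build an HMM state transition probability matrix from an observed
    state sequence: count (current state, next state) pairs, then divide
    each row by its HMM State Count (the row sum)."""
    counts = np.zeros((n_states, n_states))
    for cur, nxt in zip(state_seq[:-1], state_seq[1:]):
        counts[cur, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave rows for never-visited states as zeros instead of dividing by 0:
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

A = transition_matrix([0, 0, 1, 0, 0, 1, 2, 0], n_states=3)
# e.g. A[0] = [0.5, 0.5, 0.0]: from state 0, half the transitions stay in 0
```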
Figure 3-20 Complex Add Single Buffer HMM State Transition Probability Matrix
A set of discrete visible outputs with associated output probabilities must be
developed for each state. Figure 3-21 shows an example histogram display built from the
histogram data table (Figure 3-22).
Figure 3-21 Complex Add Single Buffer State 1 Histogram Display
Figure 3-22 Complex Add Single Buffer State 1 Histogram Data
The output probability is computed using the data in Figure 3-22 as follows:
1. Sum all non-zero state histogram values
2. Divide each non-zero state histogram value by the sum
The resulting table in Figure 3-23 shows each discrete output and its associated
output probability.
Figure 3-23 Complex Add Single Buffer State 1 Visible Output Data
The state transition probability matrix and the discrete outputs with associated
output probabilities for all states fully define the HMM for each math operation. The
HMM is used to build a stream of simulated execution time values for each math
operation.
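A sketch of how such a fully defined HMM can generate a simulated execution time stream (a Python stand-in for the role MATLAB's hmmgenerate plays in this workflow; the matrices below are hypothetical):

```python
import numpy as np

def hmm_generate(n, trans, emit_vals, emit_probs, start_state=0, seed=None):
    """Generate n simulated execution time values from an HMM defined by
    a state transition probability matrix and per-state discrete visible
    outputs with output probabilities. An explicit seed controls
    reproducibility of the stream."""
    rng = np.random.default_rng(seed)
    state = start_state
    out = []
    for _ in range(n):
        # Emit a visible output from the current state's discrete set:
        out.append(rng.choice(emit_vals[state], p=emit_probs[state]))
        # Transition to the next hidden state:
        state = rng.choice(len(trans), p=trans[state])
    return out

trans = [[0.9, 0.1], [0.3, 0.7]]      # hypothetical 2-state transition matrix
emit_vals = [[1.0, 1.1], [2.0, 2.2]]  # visible execution times per state (usec)
emit_probs = [[0.6, 0.4], [0.5, 0.5]]
stream = hmm_generate(1000, trans, emit_vals, emit_probs, seed=42)
```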
3.7.3.3 Simulink Modeling (Simulation) Workflow Step
A MATLAB® Simulink® model is to be constructed to encapsulate each HMM.
The Simulink model provides a stream of simulated execution time values to the system
model. At this time, two issues prevent construction of the Simulink models required for
execution time simulation. First, the MATLAB function hmmgenerate is used to
generate a stream of visible output states using the HMM developed in section 3.7.3.2.
The function takes the number of samples, the state transition probability matrix, and the
visible output probability matrix as inputs to produce a stream of observable output data.
However, this function is currently not supported by the MATLAB Simulink code
generator, which produces code to export the behavior associated with the Simulink
model into the Rhapsody system model. Second, the hmmgenerate function reproduces
an identical stream of visible state outputs for each function invocation. MathWorks is
aware of both issues and is currently working on solutions. Those solutions will be
integrated into the model when available.
Chapter 4 - Data Analysis and Results
4.1 Case Study Definition
A simple case study is presented to compare execution time estimates, generated
using the section 3.6.2.1.1 arithmetic computation SPMs integrated with the executable
performance attribute SysML model, against actual execution times. The goals for this
case study are to 1) provide initial validation of the executable architecture attribute
model and architecture attribute analysis workflow, 2) evaluate execution time (i.e.
performance attribute) estimates versus observed actual execution times, 3) compute
performance cost for use with optimization analysis, and 4) select an algorithm data input
size that meets performance constraints. These goals were accomplished through a
proof-of-concept application using a single computational thread for two different sCPU
computer systems.
The case study implements a single functional thread consisting of three
functions. Thread execution time is required to satisfy (i.e. be less than or equal to) a
Component Response Time (CRT) MOP constraint. The CRT value chosen for the case
study is 15 msec, as shown in Figure 4-1. Two proposed sCPU solutions, Computer 1 and
Computer 2 in Figure 4-1 with memory architectures defined in section 3.7.1, are used to
evaluate estimated thread execution time versus actual execution time. Analysis is
performed for fifteen algorithm data input sizes (256, 512, 1024, 2048, 4096, 8192,
16384, 32768, 65536, 131072, 262144, 528376, 1048576, 2097152, 4096384). An
execution time estimate for each data input size is produced by the executable model for
both Analysis Modes (Most Likely and WCET).
Figure 4-1 Case Study Functional and Physical Definition
4.2 Case Study Architecture Attribute Workflow
The first step of the architecture analysis workflow (section 3.1) used for this case
study is ‘Define Key Component System Functions’. For this case study, the function
thread model (Figure 4-1) consists of a Fast Fourier Transform (FFT), followed by a
Gaussian Filter – Frequency Domain (GFFD), followed by an Inverse FFT. The
FFT/IFFT algorithm selected was the Cooley-Tukey algorithm (Cooley and Tukey 1965).
The GFFD algorithm was adapted from a two-dimensional gaussian filter (Gonzalez,
Woods and Eddins 2009) to a one-dimensional gaussian filter. The functional architecture, physical
architecture, and performance attribute extensions are modeled using the blocks, IBDs,
and ADs defined in sections 3.4 through 3.6.
The ‘Assign Attribute/Thread Weights’ workflow step assigns Wp, We, and Wh
attribute weights to 1, 0, and 0 for thread/function cost computation (section 3.3.6),
because the case study focus is on the performance attribute. The ‘Define Candidate
Physical Architecture Solutions’ workflow step defines Computers 1 and 2 as candidate
physical solutions (Figure 4-1).
The next step of the architecture analysis method (section 3.3) used for this case
study is to ‘Model Function Architecture Attributes’. The functional architecture (Figure
3-5) is modeled with a LogArch_1_Thread_1 block, LogArch_1_Thread_1_Function_n
where n=1..3 for each of three functions (Figure 3-6). The functional architecture is
extended with blocks for LogArch_1_Thread_1_Function_n_Performance where n=1..3
for each of three functions (Figure 3-7). An operation is developed for each function
performance block to compute the number of algorithm computations for each arithmetic
operation (LogArch_1_Thread_X_ Function_Y_Performance block in Figure 3-7).
The ‘Model Physical Architecture Attributes’ (Figure 3-1) workflow step begins
by adding all physical architecture blocks (Figure 3-5, Figure 3-6, Figure 3-7)
corresponding to those presented in the previous paragraph to the executable model. Additionally,
all physical architecture performance attribute extensions (Figure A-29 through Figure
A-43) for all arithmetic operations are modeled for the SC_CM_AnalysisContainer
(Figure 3-7). (This case study fully exercises the sCPU performance attribute Complex
Add, Complex Multiply, Floating Point Multiply, Floating Point Divide, Integer Add,
Sin, and Exp computation models.) The container computes the
SC_Anal_EstExecutionTime attribute for each
estimated function execution time. Function estimated execution times are summed to
form a thread execution time.
The FFT/IFFT was implemented using code obtained from
LIBROW™ (Chernenko n.d.). The GFFD was developed for this research effort. All
code was implemented in C++ and compiled (optimized for speed) using Visual
Studio 2015. The code was executed on each computer identified in Figure 4-1. Timing
data was collected using the SmartBear™ AQtime™ Pro (version 8.1) memory and
performance profiling tool. Statistical analysis for the collected data was performed
using the Minitab™ 18 Statistical Software product.
4.3 Case Study Results Analysis
The results presented focus on validating use of execution time SPMs developed
in section 3.6.2.1.1 to perform thread execution time estimates that support the Figure 3-1
‘Compute Attribute Cost’ workflow step. Thread execution time estimates are considered
useful if actual thread execution time is within 10 percent of the most-likely estimate and
25 percent of the WCET estimate.
The general approach, discussed over the next three paragraphs, was to use an
initial set of Computer 1 estimates and observed results to build the ACF (see
equation 8 of section 3.6.2.1.1) that “calibrates” the execution time estimation for
unmodeled effects/factors. The ACF is then used to compute execution time estimates for
Computer 1 and also for Computer 2, to validate the results on a different platform.
Figure 4-2 presents actual thread execution times (in microseconds) versus data
input size for both Computers 1 and 2. Actual optimized (i.e. for speed) mean values
(Rows 3 and 15) are derived by performing statistical analysis on 128 collected samples
of thread execution time. Collected samples are analyzed to remove outlier values that
result from unmodeled system dynamic effects such as the operating system lowering
CPU frequency due to CPU overheating and runtime thread breaks (NOTE: These
system dynamic effects may be normal and expected, and require further analysis for
inclusion in the arithmetic operation SPMs in section 3.7). The remaining values are
analyzed to produce a mean (NOTE: Details of the analysis are provided in section
4.3.1). This process is repeated 32 times. The 32 means are used to produce an overall
mean (Rows 3 and 15) and 95% confidence interval lower (Rows 1 and 13) and upper
(Rows 2 and 14) bounds. Figure 4-2 also presents estimates produced by the executable
model (i.e. developed in section 4.2) versus data input size for Computer 1 (Rows 5
through 7) and Computer 2 (Rows 17 through 19). ‘OP vs ML Est’ (Row 8) is computed
by dividing the ‘OP Mean’ (Row 3) actual time by the ‘MostLikelyEst’ (Row 5) estimated
time. The ACF (Row 9) architecture calibration factors were originally used to calibrate
Computer 1 execution time estimates to within 10 percent of actual Computer 1
execution times. ACF values (Row 9) are selected to produce more conservative
estimates than the values computed in Row 8 in order to introduce design margin. ‘OP
Adj Perf Est’ (Row 10) shows the calibrated estimate produced by multiplying Row 5
and Row 9. ‘C1 % Act vs Est’ (Row 11) presents the error for Computer 1 estimated and
actual results as the percent difference between Rows 10 and 3.
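The calibration arithmetic described above can be sketched as follows (the margin factor and timing values are hypothetical illustrations, not the values of Figure 4-2):

```python
def calibrate(estimates, actuals, margin=1.05):
    """Compute Architecture Calibration Factors from an initial set of
    most-likely estimates and observed mean times (ACF ~ actual/estimate),
    padded by a design margin so calibrated estimates stay conservative,
    then apply them to the raw estimates."""
    acf = [margin * a / e for a, e in zip(actuals, estimates)]
    calibrated = [e * f for e, f in zip(estimates, acf)]
    return acf, calibrated

# Hypothetical most-likely estimates and observed means (usec),
# one small and one large data input block size:
acf, adj = calibrate([100.0, 4000.0], [90.0, 4400.0])
# acf[0] < 1: cache-resident block sizes run faster than estimated;
# acf[1] > 1: larger block sizes slow down from bulk memory interaction.
```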
It should be noted that ACF values less than 1 are applied to execution time
estimates for data input block sizes less than or equal to 262144. Smaller data input block
size arithmetic operations execute faster because operand data and results are primarily
located in data cache memory. An ACF greater than 1 is used for data input block sizes
greater than 262144 because of increased interaction with bulk memory.
The ACF (Row 9) is used to produce Computer 2 ‘MostLikelyEst’ (Row 17)
execution time estimates. Comparison of ‘MostLikelyEst’ (Row 17) execution time
estimates with Computer 2 ‘OP Mean’ (Row 15) actual execution times results in ‘C2 %
Act vs Est’ (Row 20) errors within 10 percent. Computer 1 and Computer 2 results are
thus validated using the calibration approach. This was expected because, though
the CPUs have different architectures and run at different speeds, the cache memory
architectures are the same for both computers. Most of the time for arithmetic operations
is spent retrieving and storing data, which has been accounted for through development of
the SPMs and the ACF. The implication of these results is that the execution time
estimation approach can be used across the entire family of Intel i3, i5, and i7 processors
using the standard cache memory architecture. More testing must be performed with
additional processors and different algorithms to completely validate the calibration
approach.
Figure 4-2 Case Study Data Results
Another goal of the case study was to verify WCET estimates versus actual
results. It quickly became evident that WCET estimates produced by the executable
model were substantially higher than actual observed results for all thread data input
sizes. However, it was observed that a combination of uncalibrated cold state and warm
state estimates could be used to produce WCET execution time estimates that are within
the 25 percent target of actual measured WCET execution times for all data input sizes
for both Computers 1 and 2.
The executable model was used to produce cold state and warm state execution
time estimates for Computer 1 (Rows 6 and 7) and Computer 2 (Rows 18 and 19). Actual
WCET execution times collected during data collection are provided in Figure 4-2 for
Computer 1 (Row 4) and Computer 2 (Row 16). ‘C1 WCET % Act vs Est’ (Row 12)
presents the error for Computer 1 estimated and actual results as the percent difference
between Rows 6/7 and 4. ‘C2 WCET % Act vs Est’ (Row 21) presents the error for
Computer 2 estimated and actual results as the percent difference between Rows 18/19
and 16.
4.3.1 Case Study Data Analysis Details
This section provides details on the data analysis performed to produce Figure 4-2
Rows 1-3 and 13-15 values. The procedure and analysis were conducted for both
Computer 1 and 2. The function thread was executed 128 times for each data input size
producing 128 execution time data samples. Each sample set is input to Minitab for
statistical analysis. The Outlier test is used to identify data that can skew analysis results,
as shown in Figure 4-3. For example, the data sample highlighted with a red box (i.e.
Row 38) is identified as an outlier in Figure 4-3.
Figure 4-3 Computer One 32768 Data Sample 02 Outlier Report (from Minitab)
Figure 4-4 shows an Outlier report after removing the Row 38 and Row 67 outliers from
the data sample set. Outlier values result from two operating conditions:
1. Thread execution experiences a thread break by the operating system.
This condition was detected and annotated using the AQtime tool.
2. CPU frequency is slowed by Dynamic Power Management (DPM)
software due to detection of a CPU overtemperature condition. This
condition was observed using the Speccy (by Piriform) tool.
Figure 4-4 Computer One 32768 Data Sample 02 (Minus Outliers) Outlier Report
The remaining data sample is analyzed to determine a median value and 95%
confidence interval. First, a histogram is produced in Minitab with a hypothesized
Largest Extreme Value overlay, as shown in Figure 4-5 (Left Side). After visual
inspection of the plot, a Probability Plot is produced in Minitab with a hypothesized
Largest Extreme Value overlay, as shown in Figure 4-5 (Right Side). The p-value
indicates that the data does not properly fit a Largest Extreme Value distribution.
Figure 4-5 Sample Minitab Histogram and Probability Plots
Since distribution characteristics cannot be determined, the data is treated as a
sample related to the data that was collected and analyzed to produce the ACF (Figure
4-2 Row 9) and associated adjusted estimates (Figure 4-2 Rows 10 and 17). A Wilcoxon
signed-rank non-parametric hypothesis test is performed on the data. The test is run in
two stages. The first stage produces a hypothesized median value and 95% Confidence
Interval (Low and High values), shown in Figure 4-6 (Left Side), using the Minitab
1-sample Wilcoxon test. The second stage reruns the 1-sample Wilcoxon test in Minitab
with a hypothesized median value (i.e. the estimated execution time from Figure 4-2
Rows 10 and 17). The test result is shown in Figure 4-6 (Right Side). The Computer One
32768 estimate of 9429.32 (Figure 4-2 Row 10) was used for the test shown in Figure
4-6. The p-value of zero rejects the null hypothesis, concluding that the data set median
value is less than the hypothesized estimated value.
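The second-stage test can be reproduced with SciPy's signed-rank implementation as a stand-in for Minitab's 1-sample Wilcoxon test (the sample below is synthetic, not the case study data):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
sample = rng.normal(9000.0, 50.0, size=120)  # synthetic execution times (usec)

# One-sample signed-rank test of the sample median against a hypothesized
# value (here, a calibrated estimate), via differences from that value:
hypothesized = 9429.32
stat, p = wilcoxon(sample - hypothesized, alternative="less")
# A near-zero p-value rejects the null hypothesis, concluding that the
# data-set median is less than the hypothesized estimate.
```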
Figure 4-6 Sample Wilcoxon Signed Rank Test Results
The median and 95% confidence interval values for all 32 data samples are
averaged to produce the data recorded in Figure 4-2 Rows 1-3 (Computer 1) and Rows
13-15 (Computer 2) for each data input size.
4.3.2 Thread Cost
The last four rows of Figure 4-2 address the ‘Compute Attribute Cost’ workflow
step (Figure 3-1, section 3.3.6) used in this case study. Figure 4-2 Rows 23 and 25 show
the computed C1 and C2 thread performance cost (Equation 4), using the C1 (Row 10) and
C2 (Row 17) estimated execution times (ExecTimeEst) divided by the ‘CRT MOP’ (Row
22) (LatencyReq) cost function for each candidate data input block size.
greater than one indicates that thread execution time does not satisfy the ‘CRT MOP’ for
that data input block size. A cost value less than or equal to one indicates that thread
execution time satisfies the ‘CRT MOP’ for that data input block size. The computed
cost values show a maximum data input block size of 32768 satisfies the ‘CRT MOP’ for
both computers. Any smaller data input block size also satisfies the ‘CRT MOP’.
Computer 2 has lower cost for each data input block size, which is no surprise, since
execution-time performance is the sole weighted factor in the cost equation and
Computer 2 has greater performance capability. The primary goal of thread algorithm
design is to process the largest amount of data for the minimum cost. In addition, thread
performance cost for the selected data input block size is assigned to DAG nodes for
Computer 1 and 2 for subsequent use in the OptimizationAnalysis block (Figure 3-5).
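The cost computation and feasibility screen can be sketched as follows (the per-size execution time estimates are hypothetical, not Figure 4-2 values):

```python
def performance_cost(exec_time_est, latency_req):
    """Thread performance cost per the case study: estimated execution
    time divided by the CRT MOP latency requirement. Cost <= 1 means
    the thread satisfies the constraint; cost > 1 means it does not."""
    return exec_time_est / latency_req

CRT_MOP = 15_000.0  # usec (the 15 msec Component Response Time MOP)

# Hypothetical calibrated estimates for increasing data input block sizes:
costs = {size: performance_cost(t, CRT_MOP)
         for size, t in [(16384, 4800.0), (32768, 9400.0), (65536, 21000.0)]}
feasible = [size for size, c in sorted(costs.items()) if c <= 1.0]
# Here the largest feasible block size is 32768, mirroring the
# case-study pattern of picking the maximum size with cost <= 1.
```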
Computed thread costs for all data input sizes and computers define an SC trade
space value set. The final data input size is chosen based on overall objectives. If
minimum thread cost is the overarching constraint, then a data input size of 256 would be
the preferred solution. If processing the maximum amount of data is the overarching
constraint, then a data input size of 32768 would be the preferred solution. If lower
computer purchase cost is an additional overarching constraint, then Computer 1 would
be the preferred solution. Otherwise, Computer 2 would be the preferred solution.
4.3.3 Simulation Analysis
Demonstration of the simulation SPMs developed during this research effort is
beyond the scope of this paper due to the issues discussed in section 3.7.3.3. This
capability will be validated in future research efforts.
Chapter 5 - Conclusions
5.1 Contributions to the field
This dissertation makes the following contributions:
• Presents a component-level model-based architecture analysis method for
developing a design trade space, performing optimization analysis, and
performing simulation analysis.
• Presents system model logical and physical architecture extensions using
SysML. These extensions enable evaluation of multiple functional threads
on multiple physical architectures. Functional threads can be used to
model complex applications or to evaluate different algorithm approaches.
• Implements physical architecture system models that enable computation
of performance, energy, and thermal attributes for single core, multi/many
core, and GPU computer resource architectures.
• Implements and performs an initial proof-of-concept for an executable
abstract single core computation model capable of producing thread
algorithm performance (i.e. execution time) estimates within 10 percent
for normal execution and 25 percent for worst-case execution. The
computation model provides the basis for multi/many core performance
estimates.
• Computes DAG node (thread/function) costs. These costs will be exported
to optimization algorithms implemented in MATLAB. A separate cost is
computed for each node and target physical architecture combination for
use by the optimization algorithm.
These contributions produce a component architecture (software and computer
resource) that possesses reduced technical risk. Risk is reduced through multi-attribute
(i.e. performance, energy, thermal, etc.) optimization that considers constraints. The
systems engineer is provided with analytical data, via the architecture attribute view, that
facilitates communication regarding domain-specific engineering artifacts (e.g. software
architecture with software engineers, computer resource architecture with computer
engineers, energy and thermal design with mechanical engineers, and redundancy
architecture with reliability engineers) earlier in the system design cycle. Requirements
for building a prototype are greatly reduced because performance, energy, and thermal
attributes are quantified through use of methods and artifacts produced by this research.
The attribute framework defined by this research is extensible. Domain specific (e.g.
mobile, cyber, etc.) logical and physical architecture attributes can be added to extend the
model and node cost equations. Finally, analysis efficiency and quality are improved via
an executable model. Analysis results are obtained quickly for changes/additions of
algorithms, computer resource solutions, and computer resource configurations without
the use of utility curves.
5.1.1 Limitations
The single core computation model must be separately developed for each
computer architecture. Examples of popular architecture families include the Advanced
Reduced Instruction Set Computer (RISC) Machine (ARM), used in embedded
systems, and the PowerPC RISC machine, historically used in Apple computers.
5.2 Recommendations for Future Work
The foundational research associated with this paper can be built upon in several
different ways:
• Enhance operations supported by and fidelity of statistical performance
models:
o Add memory size and speed parameter dependencies
o Add additional operations such as comparison and search to
support algorithms in data analysis
o Add matrix operations such as add and multiply to support multi-
dimensional algorithms in the areas of image processing, machine
learning, computer vision, and artificial intelligence
o Analyze outliers for inclusion into SPMs
• Integrate computation model for multi/many-core processors (utilize
Amdahl’s Law to compute speedup), GPU, and Field Programmable Gate
Arrays (FPGAs) within a single node
• Integrate computation model for distributed node architectures (e.g.
cluster)
• Incorporate performance estimates for state-based algorithms
• Integrate energy consumption, generated heat (i.e. thermal), and reliability
quantitative models for all physical architectures (single core, multi/many
core, GPU, FPGA)
• Integrate multi-attribute optimization algorithms
• Complete simulation analysis capability
• Integrate application specific architecture attributes. For example, mobile
applications that utilize intensive graphics consume energy and produce
heat at an accelerated rate. Lower energy consumption and heat
generation occur when screen pixels are black. A logical architecture
attribute such as percent screen pixels black can be added to the model
that associates with physical architecture energy consumption and heat
generation.
Additionally, computation models and quantitative energy consumption, heat
generation, and reliability models can be integrated for different processor families
(ARM, PowerPC, etc.).
Chapter 6 - Bibliography
Akaike, H. 1974. "A New Look at the Statistical Model Identification." IEEE
Transactions on Automatic Control (IEEE) 19 (6): 716-723.
doi:10.1109/TAC.1974.1100705.
Alford, Mack. 1992. "Strengthening the Systems/Software Engineering Interface for Real
Time Systems." Proceedings of the Second International Symposium of the
National Council on Systems Engineering. Seattle, Wash. 411-418.
Alford, Mack W. 1977. "A Requirements Engineering Methodology for Real-Time
Processing Requirements." IEEE Transactions on Software Engineering SE-3 (1):
60-69.
Asanovic, Krste, Ras Bodik, Bryan C. Catanzaro, Joseph J. Gebis, Parry Husbands, Kurt
Keutzer, David A. Patterson, et al. 2006. "The Landscape of Parallel Computing:
A View from Berkeley." Berkeley EECS Electrical Engineering and Computer
Science. University of California, Berkeley. December 18. Accessed June 15,
2017. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html.
Balarin, Felice, Yosinori Watanabe, Harry Hsieh, Luciano Lavagno, Claudio Passerone,
and Alberto Sangiovanni-Vincentelli. 2003. "Metropolis: an integrated electronic
system design environment." Computer, April 08: 45-52.
doi:10.1109/MC.2003.1193228.
Balducci, M., A. Ganapathiraju, J. Hamaker, J. Picone, A. Choudary, and A. Skjellum.
1997. "Benchmarking of FFT Algorithms." IEEE Southeastcon Proceedings.
Blacksburg, VA. doi:10.1109/SECON.1997.598704.
Balmelli, Laurent. 2007. "An Overview of the Systems Modeling Language for Products
and Systems Development." Journal of Object Technology 6 (6): 149-177.
Accessed October 16, 2017. http://www.jot.fm/issues/issue_2007_07/article2.
Balmelli, Laurent, D. Brown, Murray Cantor, and M. Mott. 2006. "Model-driven Systems
Development." IBM Systems Journal 45 (3): 569-585.
Banerjee, Sudarshan, and Nikil Dutt. 2004. "Efficient Search Space Exploration for HW-
SW Partitioning." Proceedings of the 2nd IEEE/ACM/IFIP International
Conference on Hardware/Software Codesign and System Synthesis. Stockholm:
ACM. 122-127.
Banerjee, Sudarshan, Elaheh Bozorgzadeh, and Nikil D. Dutt. 2006. "Integrating Physical
Constraints in HW-SW Partitioning for Architectures With Partial Dynamic
Reconfiguration." IEEE Transactions on Very Large Scale Integration (VLSI)
Systems 14 (11): 1189-1202. doi:10.1109/TVLSI.2006.886411.
Becker, Steffen, Heiko Koziolek, and Ralf Reussner. 2009. "The Palladio component
model for model-driven performance prediction." The Journal of Systems and
Software (Elsevier Science Inc.) 82 (1): 3-22. doi:10.1016/j.jss.2008.03.066.
Beihoff, Bruce, Christopher Oster, Sanford Friedenthal, Chris Paredis, Duncan Kemp,
Heinz Stoewer, David Nichols, and Jon Wade. 2014. A World in Motion - Systems
Engineering Vision 2025. San Diego, CA: INCOSE.
Berry, Gerard. 2000. "The Foundations of Esterel." In Proof, Language, and Interaction:
Essays in Honour of Robin Milner, 425-454. Cambridge, MA: MIT Press.
Bilsen, Greet, Marc Engels, Rudy Lauwereins, and Jean Peperstraete. 1996. "Cycle-static
dataflow." IEEE Transactions on Signal Processing 44 (2): 397-408.
doi:10.1109/78.485935.
BKCASE Editorial Board. 2017. The Guide to the Systems Engineering Body of
Knowledge (SEBoK). Edited by R.D. Adcock (EIC). Vers. 1.8. Hoboken, NJ: The
Trustees of the Stevens Institute of Technology. Accessed October 23, 2017.
www.sebokwiki.org.
Bock, Conrad. 2006. "SysML and UML 2 Support for Activity Modeling." Systems
Engineering (INCOSE) 9 (2): 160-186. Accessed December 15, 2017.
doi:10.1002/sys.20046.
Booth, S. 2008. "System Engineering and Architecting with CORE." INCOSE WMA
Chapter Meeting.
http://www.incose.org/wma/library/docs/INCOSE_WMA.080408.01.pdf.
Boyd, E. L., W. Azeem, Hsien-Hsin Lee, Tien-Pao Shih, Shih-Hao Hung, and E. S.
Davidson. 1994. "A Hierarchical Approach to Modeling and Improving the
Performance of Scientific Applications on the KSR1." International Conference
on Parallel Processing (ICPP 1994). North Carolina: IEEE. 188-192.
doi:10.1109/ICPP.1994.30.
Buck, Joseph T., and Edward A. Lee. 1993. "Scheduling Dynamic dataflow graphs with
bounded memory using the token flow model." 1993 IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP-93).
Minneapolis, MN: IEEE. 429-432. doi:10.1109/ICASSP.1993.319147.
Buede, Dennis M., and William D. Miller. 2016. The Engineering Design of Systems:
Models and Methods. Hoboken, NJ: John Wiley & Sons, Inc.
Campeanu, Gabriel, Jan Carlson, and Severine Sentilles. 2014. "Component Allocation
Optimization for Heterogeneous CPU-GPU Embedded Systems." Software
Engineering and Advanced Applications (SEAA), 2014 40th EUROMICRO
Conference on, August 27-29: 229-236. doi:10.1109/SEAA.2014.29.
Cantor, Murray. 2003. "Rational Unified Process for Systems Engineering Part 1:
Introducing RUP SE Version 2.0." August. Accessed November 01, 2017.
http://vincentvanrooijen.com/container%5Cprocess%5CRational%20Unified%20
Process%20for%20Systems%20Engineering%20-%201.pdf.
—. 2003. "Rational Unified Process for Systems Engineering Part II: System
Architecture." September. Accessed November 01, 2017.
http://vincentvanrooijen.com/container%5CArchitecture%5CRational%20Unified
%20Process%20for%20Systems%20Engineering%20Part%20II%20-
%20System%20Architecting.pdf.
Carson, Ronald S., and Barbara J. Sheeley. 2013. "Functional Architecture as the Core of
Model-Based Systems Engineering." INCOSE International Symposium.
Philadelphia, PA. 29-45. doi:10.1002/j.2334-5837.2013.tb03002.x.
Chernenko, Sergey. n.d. Article 10 Fast Fourier Transform - FFT. LIBROW. Accessed
March 21, 2018. www.librow.com/articles/article-10.
Commoner, F., A. W. Holt, S. Even, and A. Pnueli. 1971. "Marked Directed Graphs."
Journal of Computer and System Sciences (Elsevier) 5 (5): 511-523.
doi:10.1016/S0022-0000(71)80013-2.
Cooley, James W., and John W. Tukey. 1965. "An Algorithm for the Machine
Calculation of Complex Fourier Series." Mathematics of Computation (American
Mathematical Society) 19 (90): 297-301. doi:10.2307/2003354.
Cross, Nigel. 2008. Engineering Design Methods: Strategies for Product Design. Fourth.
West Sussex, Eng.: John Wiley & Sons Ltd.
Cui, Zheng, Yun Liang, Kyle Rupnow, and Deming Chen. 2012. "An Accurate GPU
Performance Model for Effective Control Flow Divergence Optimization." 26th
International Parallel & Distributed Processing Symposium (IPDPS). Shanghai,
China: IEEE. 83-94. doi:10.1109/IPDPS.2012.18.
De Micheli, Giovanni, and Rajesh K. Gupta. 1997. "Hardware/Software Co-Design."
Proceedings of the IEEE (IEEE) 85 (3): 349-365. doi:10.1109/5.558708.
Deb, Kalyanmoy, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. 2002. "A fast and
elitist multiobjective genetic algorithm: NSGA-II." IEEE Transactions on
Evolutionary Computation 6 (2): 182-197. doi:10.1109/4235.996017.
Dick, Robert P., David L. Rhodes, and Wayne Wolf. 1998. "TGFF: Task Graphs for
Free." Proceedings of the 6th International Workshop on Hardware/Software
Codesign. Seattle, Washington, USA: IEEE Computer Society. 97-101.
Dori, Dov E.F. 2006. "Object-Process Methodology." In Encyclopedia of Knowledge
Management, by David G. Schwartz, 683-693. Hershey: Idea Group Reference.
—. 2002. Object-Process Methodology: A Holistic Systems Paradigm. Berlin: Springer-
Verlag.
Du, Jiayi, Xiangsheng Kong, Xin Zuo, Lingyan Zhang, and Aijia Ouyang. 2014.
"Shuffled frog leaping algorithm for hardware/software partitioning." Journal of
Computers 9 (11): 2752-2760.
Elbeltagi, Emad, Tarek Hegazy, and Donald Grierson. 2005. "Comparison among five
evolutionary-based optimization algorithms." Advanced engineering informatics
19 (1): 43-53.
Ernst, Rolf, Jorg Henkel, and Thomas Benner. 1993. "Hardware-software cosynthesis for
microcontrollers." IEEE Design & Test of computers (IEEE Computer Society) 10
(4): 64-75. doi:10.1109/54.245964.
Estefan, Jeff A. 2008. "Survey of Model-Based Systems Engineering (MBSE)
Methodologies." Vers. B. INCOSE MBSE Initiative. May 23. Accessed October
01, 2017. http://www.omgsysml.org/MBSE_Methodology_Survey_RevB.pdf.
Fernandez, Maribel. 2009. Models of Computation: An Introduction to Computability
Theory. Springer.
Fisher, Gerard H. 1998. "Model-Based Systems Engineering of Automotive Systems."
Digital Avionics Systems Conference, 1998. Proceedings, 17th DASC. The
AIAA/IEEE/SAE. Bellevue, WA: IEEE. B15/1-B15/7.
doi:10.1109/DASC.1998.741455.
Friedenthal, Sanford A., and Cris Kobryn. 2004. "Extending UML to Support a Systems
Modeling Language." Annual INCOSE International Symposium. Toulouse,
France: INCOSE. 686-706. doi:10.1002/j.2334-5837.2004.tb00527.x.
Friedenthal, Sanford, Alan Moore, and Rick Steiner. 2015. A Practical Guide to SysML:
the systems modeling language. Third. Morgan Kaufmann.
Friedenthal, Sanford, and Mark Sampson. 2012. "Model-based Systems Engineering
(MBSE) Initiative." INCOSE MBSE Workshop. Jacksonville, FL: INCOSE,
January 21-22. Accessed November 01, 2017.
http://www.omgwiki.org/MBSE/lib/exe/fetch.php?media=mbse:mbse_iw_2012-
introduction-2012-01-21-friedenthal-c.pptx.
Gaudiot, Jean-Luc. 1991. "Stream Languages and data-flow." In Advanced Topics in
data-flow computing, edited by Jean-Luc Gaudiot and Lubomir Bic, 439-454.
Englewood Cliffs, NJ: Prentice Hall.
Gonzalez, Rafael C., Richard E. Woods, and Steven L. Eddins. 2009. "Lowpass
(Smoothing) Frequency Domain Filters." Chap. 4.5.2 in Digital Image Processing
Using MATLAB, 826. Gatesmark Publishing.
Gupta, Rajesh K., Claudionor Nunes Coellho Jr., and Giovanni De Micheli. 1992.
"Synthesis and simulation of digital systems containing interacting hardware and
software components." Proceedings of the 29th ACM/IEEE Design Automation
Conference. IEEE Computer Society Press. 225-230.
Halbwachs, N., P. Caspi, P. Raymond, and D. Pilaud. 1991. "The synchronous data flow
programming language LUSTRE." Proceedings of the IEEE (IEEE) 79 (9): 1305-
1320. doi:10.1109/5.97300.
Harel, D., H. Lachover, A. Naamad, A. Pnueli, M. Politi, R. Sherman, A. Shtull-Trauring,
and M. Trakhtenbrot. 1990. "STATEMATE: a working environment for the
development of complex reactive systems." IEEE Transactions on Software
Engineering 16 (4): 403-414.
Henkel, Jorg, and Rolf Ernst. 2001. "An Approach to Automated Hardware/Software
Partitioning Using a Flexible Granularity that is Driven by High-Level Estimation
Techniques." IEEE Transactions on Very Large Scale Integration (VLSI) Systems
9 (2): 273-289.
Henzinger, Thomas A., Benjamin Horowitz, and Christoph Meyer Kirsch. 2001.
"Embedded Control Systems Development with Giotto." Proceedings of the ACM
SIGPLAN workshop on Languages, compilers, and tools for embedded systems
(LCTES '01). Snow Bird, UT. 64-72. doi:10.1145/384197.384208.
Hill, Mark D., and Michael R. Marty. 2008. "Amdahl's Law in the Multicore Era."
Computer (IEEE) 41 (7): 33-38. doi:10.1109/MC.2008.209.
Hoare, C. A. R. 1978. "Communicating Sequential Processes." Communications of the
ACM (ACM) 21 (8): 666-677. doi:10.1145/359576.359585.
Hoffmann, Hans-Peter. 2013. "IBM Rational Harmony Deskbook Rel. 4.1." July.
Accessed October 15, 2017.
https://www.ibm.com/developerworks/community/groups/service/html/communit
yview?communityUuid=dbc39547-3619-4c31-9535-
0b583a4e6190#fullpageWidgetId=W62078615f88f_4809_afad_c27cdc9d7e71&fi
le=2132d88d-4dde-40b4-8102-254ca4456c82.
Hong, Sunpyo, and Hyesoon Kim. 2009. "Memory-Level and Thread-Level Parallelism
Aware GPU Architecture Performance Analytical Model." Proceedings of the
36th Annual International Symposium on Computer Architecture (ISCA '09).
Austin, TX: ACM. 152-163. doi:10.1145/1555754.1555775.
Hylands, Christopher, Edward Lee, Jie Liu, Xiaojun Liu, Stephen Neuendorffer, Yuhong
Xiong, Yang Zhao, and Haiyang Zheng. 2003. "Overview of the Ptolemy
Project." Ptolemy Project Heterogeneous Modeling and Design. University of
California, Berkeley. July 02. Accessed December 28, 2017.
https://ptolemy.eecs.berkeley.edu/publications/papers/03/overview.
IEEE Computer Society Software & Systems Engineering Standards Committee. 2008.
Systems and Software Engineering - System life cycle processes. Standard,
Geneva/Piscataway: ISO/IEC-IEEE.
Institute of Electrical and Electronics Engineers. 2012. Information Technology -
Modeling Languages - Part 1: Syntax and Semantics for IDEF0. Standard,
Geneva: International Standards Organization, 120.
ISO. 2016. ISO/IEC/IEEE International Standard for Systems and Software Engineering-
Life Cycle Management-Part 4: Systems Engineering Planning. Standard,
Geneva: ISO.
ISO/IEC JTC 1/SC 7. 2011. Systems and Software Engineering - Architecture
Description (ISO/IEC/IEEE 42010:2011). December. Accessed October 05, 2017.
http://cabibbo.dia.uniroma3.it/asw/altrui/iso-iec-ieee-42010-2011.pdf.
Jing, Yiming, Jishun Kuang, Jiayi Du, and Biao Hu. 2013. "Application of improved
simulated annealing optimization algorithms in hardware/software partitioning of
the reconfigurable system-on-chip." Proceedings of the International Conference
on Parallel Computing in Fluid Dynamics. Changsha, China: Springer-Verlag.
532-540.
Kahn, Gilles. 1974. "The semantics of a simple language for parallel programming."
Proceedings of the IFIP Congress. 471-475. Accessed December 26, 2017.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.597.5710&rep=rep1&t
ype=pdf.
Kang, Yan, He Lu, and Jing He. 2013. "A PSO-based Genetic Algorithm for Scheduling
of Tasks in a Heterogeneous Distributed System." Journal of Software 8 (6):
1443-1450.
Keinert, Joachim, Thomas Schlichter, Joachim Falk, Jens Gladigau, Christian Haubelt,
Jurgen Teich, and Michael Meredith. 2009. "SystemCoDesigner—an automatic
ESL synthesis approach by design space exploration and behavioral synthesis for
streaming applications." ACM Transactions on Design Automation of Electronic
Systems (TODAES) 14 (1): 1-23.
Knudsen, Peter Voigt, and Jan Madsen. 1996. "PACE: A dynamic programming
algorithm for hardware/software partitioning." Proceedings of the 4th
International Workshop on Hardware/Software Co-Design. IEEE Computer
Society. 85-92.
Kuang, Shiann-Rong, Chin-Yang Chen, and Ren-Zheng Liao. 2005. "Partitioning and
Pipelined Scheduling of Embedded System Using Integer Linear Programming."
Proceedings of the 11th International Conference on Parallel and Distributed
Systems (ICPADS'05). Fukuoka, Japan: IEEE Computer Society. 37-41.
doi:10.1109/ICPADS.2005.219.
Lee, Edward A., and David G. Messerschmitt. 1987. "Synchronous data flow."
Proceedings of the IEEE (IEEE) 75 (9): 1235-1245.
doi:10.1109/PROC.1987.13876.
Lee, Edward A., and Thomas M. Parks. 1995. "Dataflow process networks." Proceedings
of the IEEE 83 (5): 773-801. doi:10.1109/5.381846.
Lee, Edward Ashford, and David G. Messerschmitt. 1987. "Static Scheduling of
Synchronous Data Flow Programs for Digital Signal Processing." IEEE
Transactions on Computers C-36 (1): 24-35. doi:10.1109/TC.1987.5009446.
LeGuernic, P., T. Gautier, M. Le Borgne, and C. Le Maire. 1991. "Programming real-
time applications with SIGNAL." Proceedings of the IEEE (IEEE) 79 (9): 1321-
1336. doi:10.1109/5.97301.
Lempel, Oded. 2011. "2nd Generation Intel Core Processor Family: Intel Core i7, i5, and
i3." 2011 IEEE Hot Chips 23 Symposium (HCS). Stanford, CA, USA: IEEE.
doi:10.1109/HOTCHIPS.2011.7477509.
—. 2011. "2nd Generation Intel Core Processor Family: Intel Core i7, i5, and i3." Intel.
July 28. Accessed October 20, 2017. https://www.hotchips.org/wp-
content/uploads/hc_archives/hc23/HC23.19.9-Desktop-CPUs/HC23.19.911-
Sandy-Bridge-Lempel-Intel-Rev%207.pdf.
Levis, A. 1993. "National Missile Defense (NMD) Command and Control Methodology
Development." Contract Data Requirements List A005 report for US Army
Contract MDA 903-88-019, Delivery Order 0042, Center of Excellence in
Command, Control, Communication, and Intelligence, George Mason University,
Fairfax, VA.
Li, Guoshuai, Jinfu Feng, Junhua Hu, Cong Wang, and Duo Qi. 2014.
"Hardware/Software Partitioning Algorithm Based on Genetic Algorithm."
Journal of Computers 9 (6): 1309-1315.
Lin, Geng, Wenxing Zhu, and M Montaz Ali. 2014. "A tabu search-based memetic
algorithm for hardware/software partitioning." Mathematical Problems in
Engineering 1-15.
Liu, Peng, Jigang Wu, and Yongji Wang. 2013. "Hybrid algorithms for
hardware/software partitioning and scheduling on reconfigurable devices."
Mathematical and Computer Modelling 58 (1): 409-420.
Long, David, and Zane Scott. 2011. "A Primer for Model-Based Systems Engineering."
Vers. 2nd Edition. Vitech Corporation. October. Accessed October 07, 2017.
http://www.vitechcorp.com/resources/mbse.shtml.
López-Vallejo, Marisa, and Juan Carlos López. 2003. "On the hardware-software
partitioning problem: System modeling and partitioning techniques." ACM
Transactions on Design Automation of Electronic Systems (TODAES) 8 (3): 269-
297.
Madsen, Jan, Jesper Grode, Peter Voigt Knudsen, Morten Elo Petersen, and Anne
Haxthausen. 1997. "LYCOS: The Lyngby co-synthesis system." Design
Automation for Embedded Systems (Kluwer Academic Publishers) 2 (2): 195-235.
Maier, Mark W., and Eberhardt Rechtin. 2009. The Art of Systems Architecting. Third.
Boca Raton, FL: CRC Press. ISBN 978-1420079135.
McKean, David, James D Moreland Jr., and Steven Doskey. 2019. "Use of model-based
architecture attributes to construct a component-level trade space." INCOSE
Systems Engineering (Wiley Online Library). doi:10.1002/sys.21478.
Mealy, George H. 1955. "A Method for Synthesizing Sequential Circuits." Bell System
Technical Journal (Wiley Online Library) 34 (5): 1045-1079. Accessed December
15, 2017. doi:10.1002/j.1538-7305.1955.tb03788.x.
Meyerowitz, Trevor C. 2008. Single and Multi-CPU Performance Modeling for
Embedded Systems. Dissertation, Electrical Engineering and Computer Sciences,
Graduate Division, University of California at Berkeley.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-36.html.
Moreira, Orlando, Twan Basten, Marc Geilen, and Sander Stuijk. 2010. "Buffer Sizing
for Rate-Optimal Single-Rate Data-Flow Scheduling Revisited." IEEE
Transactions on Computers (IEEE) 59 (2): 188-201. doi:10.1109/TC.2009.155.
Mudry, Pierre-Andre, Guillaume Zufferey, and Gianluca Tempesti. 2006. "A
Dynamically Constrained Genetic Algorithm for Hardware-software
Partitioning." Proceedings of the 8th Annual Conference on Genetic and
Evolutionary Computation. Seattle, Washington, USA. 769-776.
doi:10.1145/1143997.1144134.
Murata, Tadao. 1989. "Petri Nets: Properties, analysis and applications." Proceedings of
the IEEE (IEEE) 77 (4): 541-580. doi:10.1109/5.24143.
Myung, Jae. 2003. "Tutorial on Maximum Likelihood Estimation." Journal of
Mathematical Psychology (Academic Press) 47: 90-100.
NASA. 2007. NASA Systems Engineering Handbook. NASA/SP-2007-6105 Rev 1.
Handbook, NASA, 360. Accessed June 2018.
Niemann, Ralf, and Peter Marwedel. 1997. "An Algorithm for Hardware/Software
Partitioning Using Mixed Integer Linear Programming." Design Automation for
Embedded Systems (Kluwer Academic Publishers) 2 (2): 165-193.
doi:10.1023/A:1008832202436.
—. 1996. "Hardware/Software Partitioning Using Integer Programming." EDTC '96
Proceedings of the 1996 European conference on Design and Test. Paris, France:
IEEE Computer Society. 473-479.
Nolan, Brian, Barclay Brown, Laurent Balmelli, Tim Bohn, and Ueli Wahli. 2008.
Model Driven Systems Development with Rational Products (IBM Redbook).
International Technical Support Organization. February.
Accessed October 02, 2017. https://www-
01.ibm.com/events/wwe/grp/grp004.nsf/vLookupPDFs/Rational%20MBSE-
MDSD%20Redbook%202008/$file/Rational%20MBSE-
MDSD%20Redbook%202008.pdf.
Object Management Group (OMG). 2018. What is SysML? OMG. Accessed June 15,
2018. http://www.omgsysml.org/what-is-sysml.htm.
Object Management Group. 2008. "MARTE Specification." Object Management Group.
June 08. http://www.omg.org/omgmarte/Specification.htm.
—. 2017. "OMG System Modeling Language Specification." Vers. 1.5. OMG. May.
Accessed October 16, 2017. http://www.omg.org/spec/SysML/About-SysML/.
—. 2003. "UML for Systems Engineering RFP." March 28. Accessed 10 19, 2017.
http://syseng.omg.org/UML_for_SE_RFP.htm.
Oliver, David W., Timothy P. Kelliher, and James G. Keegan, Jr. 1997. Engineering
Complex Systems With Objects and Models. New York: McGraw-Hill.
Parnell, Gregory S., and Timothy E. Trainor. 2009. "Using the Swing Weight Matrix to
Weight Multiple Objectives." INCOSE International Symposium. Singapore. 283-
298. doi:10.1002/j.2334-5837.2009.tb00949.x.
Pohl, Klaus. 2010. Requirements Engineering: Fundamentals, Principles, and
Techniques. Heidelberg: Springer.
Ramos, Ana Luisa, Jose Vasconcelos Ferreira, and Jaume Barcelo. 2012. "Model-Based
Systems Engineering: An Emerging Approach for Modern Systems." IEEE
Transactions on Systems, Man, and Cybernetics - Part C: Applications and
Reviews 42 (1): 101-111.
Roedler, G. J., and C. Jones. 2005. Technical Measurement: A Collaborative Project of
PSM, INCOSE, and Industry. Practical Software & Systems Management (PSM),
San Diego, CA: INCOSE, 65.
Roedler, Garry. 2012. "Harmonization of Key Systems Engineering Resources."
Proceedings 15th Annual NDIA Systems Engineering Conference. San Diego,
CA. Accessed November 01, 2017.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.386.8520&rep=rep1&t
ype=pdf.
Ross, Douglas T. 1977. "Structured Analysis (SA): A Language for Communicating
Ideas." IEEE Transactions on Software Engineering SE-3 (1): 16-34.
doi:10.1109/TSE.1977.229900.
Sadashiv, Naidila, and S. M. Dilip Kumar. 2011. "Cluster, grid and cloud computing: A
detailed comparison." 2011 6th International Conference on Computer Science
and Education (ICCSE). Singapore: IEEE. 477-482.
doi:10.1109/ICCSE.2011.6028683.
Sangiovanni-Vincentelli, Alberto. 2007. "Quo vadis, SLD? reasoning about the trends
and challenges of system level design." Proceedings of the IEEE (IEEE) 95 (3):
467-506.
Sapienza, Gaetana, Tiberiu Seceleanu, and Ivica Crnknovic. 2013. "Partitioning Decision
Process for Embedded Hardware and Software Deployment." Computer Software
and Applications Conference Workshops (COMPSACW), 2013 IEEE 37th
Annual. Japan: IEEE. 674-680. doi:10.1109/COMPSACW.2013.131.
Savage, John E. 1998. Models of Computation: Exploring the Power of Computation.
Addison-Wesley.
Schaumont, Patrick R. 2013. "Data Flow Modeling and Transformation." Chap. 2 in A
Practical Introduction to Hardware/Software Codesign, 31-59. New York:
Springer. doi:10.1007/978-1-4614-3737-6.
Schwarz, Gideon. 1978. "Estimating the Dimension of a Model." The Annals of
Statistics (The Institute of Mathematical Statistics) 6 (2): 461-464.
Schlichter, Thomas, Martin Lukasiewycz, Christian Haubelt, and Jürgen Teich. 2006.
"Improving system level design space exploration by incorporating sat-solvers
into multi-objective evolutionary algorithms." IEEE Computer Society Annual
Symposium on Emerging VLSI Technologies and Architectures (ISVLSI'06).
Karlsruhe: IEEE. 6 pp.
Schulz, Stephan, Jerzy W. Rozenblit, Michael Mrva, and Klaus Buchenrieder. 1998.
"Model-Based Codesign." IEEE Computer 60-67.
Shah, A. A., A. A. Kerzhner, D. Schaefer, and C. J.J. Paredis. 2010. Multi-View Modeling
to Support Embedded Systems Engineering in SysML. Vol. 5765, in Graph
Transformations and Model-Driven Engineering. Lecture Notes in Computer
Science, edited by G. Engels, C. Lewerentz, W. Schafer, A. Schurr and B.
Westfechtel, 580-601. Berlin, Heidelberg: Springer.
doi:10.1007/978-3-642-17322-6_25.
Society of Automotive Engineers. 2014. Processes for Engineering a System.
Warrendale: SAE International.
SOURCEFORGE. n.d. Open SystemC Initiative (OSCI). Slashdot Media. Accessed
December 28, 2017. https://sourceforge.net/p/systemc/wiki/Home.
Teich, Jurgen. 2012. "Hardware/Software Codesign: The Past, the Present, and Predicting
the Future." Proceedings of the IEEE (IEEE) 100 (Special Centennial Issue):
1411-1430. doi:10.1109/JPROC.2011.2182009.
Thomasian, Alexander, and Paul F. Bay. 1986. "Analytic Queueing Network Models for
Parallel Processing of Task Systems." IEEE Transactions on Computers (IEEE)
C-35 (12): 1045-1054. doi:10.1109/TC.1986.1676712.
Tikir, Mustafa M, Laura Carrington, Erich Strohmaier, and Allan Snavely. 2007. "A
genetic algorithm approach to modeling the performance of memory-bound
computations." ACM/IEEE Conference on Supercomputing (SC '07). Reno, NV:
IEEE. doi:10.1145/1362622.1362686.
Van Werkhoven, B., J. Maassen, F. J. Seinstra, and H. E. Bal. 2014. "Performance
Models for CPU-GPU Data Transfers." 14th IEEE/ACM International Symposium
on Cluster, Cloud, and Grid Computing (CCGrid). Chicago, IL: IEEE. 11-20.
doi:10.1109/CCGrid.2014.16.
Vitech. 2013. "CORE 9 Unlocking the Power of MBSE - Product Slick." Accessed
October 15,
2017. www.vitechcorp.com/products/files/core4pageslick.pdf.
Wilhelm, Reinhard, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing,
David Whalley, Guillem Bernat, et al. 2008. "The worst-case execution-time
problem -- overview of methods and survey of tools." ACM Transactions on
Embedded Computing Systems (TECS) (ACM) 7 (3): 36-53.
doi:10.1145/1347375.1347389.
Williams, Samuel, Andrew Waterman, and David Patterson. 2009. "Roofline: an
Insightful Visual Performance Model for Multicore Architectures."
Communications of the ACM (ACM) 52 (4): 65-76.
doi:10.1145/1498765.1498785.
Wolf, Wayne. 2003. "A decade of hardware/software codesign." Computer (IEEE
Computer Society) 36 (4): 38-43. doi:10.1109/MC.2003.1193227.
Wolf, Wayne, Ahmed Amine Jerraya, and Grant Martin. 2008. "Multiprocessor System-
on-Chip (MPSoC) Technology." IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems 27 (10): 1701-1713.
Wu, Jigang, and Thambipillai Srikanthan. 2006. "Low-complex dynamic programming
algorithm for hardware/software partitioning." Information Processing Letters
(Elsevier) 98 (2): 41-46. doi:10.1016/j.ipl.2005.12.008.
Wu, Jigang, Pu Wang, Siew-Kei Lam, and Thambipillai Srikanthan. 2013. "Efficient
heuristic and tabu search for hardware/software partitioning." The Journal of
Supercomputing 66 (1): 118-134.
Wu, Jigang, Qiqiang Sun, and Thambipillai Srikanthan. 2012. "Algorithmic aspects for
multiple-choice hardware/software partitioning." Computers & Operations
Research (Elsevier) 39 (12): 3281-3292. doi:10.1016/j.cor.2012.04.013.
Yu-dong, Zhang, Wu Le-nan, and Wei Geng. 2009. "Hardware/software partition using
adaptive ant colony algorithm." Control and Decision 24 (9): 1385-1389.
Zhang, Yao, and John D. Owens. 2011. "A Quantitative Performance Analysis Model for
GPU Architectures." 17th International Symposium on High Performance
Computer Architecture (HPCA). San Antonio, TX: IEEE. 382-393.
doi:10.1109/HPCA.2011.5749745.
Zitzler, Eckart, Marco Laumanns, and Lothar Thiele. 2001. SPEA2: Improving the
Strength Pareto Evolutionary Algorithm. TIK-Report 103, Computer Engineering
and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH),
Zurich, 21 pp.
Chapter 7 - COPYRIGHTS
A World in Motion – Systems Engineering Vision 2025 (INCOSE)
This product was prepared by the Systems Engineering Vision 2025 Project Team of the
International Council on Systems Engineering (INCOSE). It is approved by the INCOSE
Technical Operations for release as an INCOSE Technical Product.
Copyright ©2014 by INCOSE, subject to the following restrictions:
Author use: Authors have full rights to use their contributions in a totally unfettered way
with credit to the INCOSE Technical Product.
INCOSE use: Permission to reproduce this document and to prepare derivative works
from this document for INCOSE use is granted provided this copyright notice is included
with all reproductions and derivative works.
External Use: This document may be shared or distributed to non-INCOSE third parties.
Requests for permission to reproduce this document in whole are granted provided it is
not altered in any way.
Extracts for use in other works are permitted provided this copyright notice and INCOSE
attribution are included with all reproductions; and, all uses including derivative works
and commercial use, acquire additional permission for use of images unless indicated as a
public image in the General Domain.
Requests for permission to prepare derivative works of this document or any for
commercial use will be denied unless covered by other formal agreements with INCOSE.
Contact INCOSE Administration Office, 7670 Opportunity Rd., Suite 220, San Diego,
CA 92111-2222, USA.
Appendix A Oversized Figures
This appendix contains figures referenced in the main document that are best represented in landscape orientation.
Figure A-1 Architecture Analysis Container IBD
[Oversized diagram: ibd [Block] CMP_ArchitectureAnalysis [IBD_CMP_ArchitectureAnalysis]. The internal block diagram connects the OptimizationAnalysisPart1, SimulationAnalysisPart1, StartArchitectureAnalysisPart1, CMP_PhysicalArchitectureContainerPart1, and CMP_LogicalArchitectureContainerPart1 parts, together with supporting mode, state, weight, and constraint parts (OptimizationModePart1, SimulationModePart1, AnalysisModePart1, AnalysisStatePart1, AttributeWeightPart1, ThreadWeightPart1, ThreadConstraintsPart1, CMP_NumberPhysicalArchitecturesPart1, CMP_NumberLogicalArchitecturesPart1), through typed interface blocks (IB_*) on proxy ports. The full diagram is not reproducible in text form.]
Figure A-2 StartArchitectureAnalysisBlock Activity and AD
[Oversized diagram: ibd [Block] StartArchitectureAnalysisBlock [IBD_StartArchitectureAnalysis], containing the ACT_StartArchitectureAnalysis activity. On Initial_Activity_Entry the activity reads IN_NumberLogicalArchitectures = numlogarchsin.getNumberLogicalArchitectures() and sends StartArchitectureAnalysis to startanalout. On each Subsequent_Activity_Entry, triggered by PhysicalArchitectureResultsAvailable, the guard [LOC_CurrentLogicalArchitecture < IN_NumberLogicalArchitectures] selects the path that executes LOC_CurrentLogicalArchitecture++; logarchconout.incrementCurrentLogicalArchitecture(); and again sends StartArchitectureAnalysis to startanalout; the [else] branch ends the activity. The full diagram is not reproducible in text form.]
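The control flow that Figure A-2 encodes can be sketched in plain Python. This is an illustrative model only, not the dissertation's implementation: the class name, method names, and the `events` list standing in for the `startanalout` port are all hypothetical; the SysML activity itself carries this logic in the model.

```python
# Illustrative sketch of the ACT_StartArchitectureAnalysis control flow.
# All names here are hypothetical stand-ins for diagram elements.
class StartArchitectureAnalysisSketch:
    def __init__(self, number_logical_architectures):
        # Corresponds to IN_NumberLogicalArchitectures in the diagram.
        self.total = number_logical_architectures
        # Corresponds to LOC_CurrentLogicalArchitecture.
        self.current = 1
        # Stands in for events sent to the startanalout port.
        self.events = []

    def initial_entry(self):
        # Initial_Activity_Entry: kick off analysis of the first candidate.
        self.events.append(("StartArchitectureAnalysis", self.current))

    def on_physical_results_available(self):
        # Subsequent_Activity_Entry, triggered by
        # PhysicalArchitectureResultsAvailable.
        if self.current < self.total:
            # Guard branch: advance to the next logical architecture
            # and start its analysis.
            self.current += 1
            self.events.append(("StartArchitectureAnalysis", self.current))
            return True
        return False  # [else] branch: every candidate has been analyzed.
```

Driving the sketch with four candidate architectures produces one start event per candidate, matching the loop the activity diagram describes.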
Figure A-3 LogArch Container IBD
[Oversized diagram: ibd [Block] CMP_LogicalArchitectureContainer [IBD_CMP_LogArch_Container]. The container holds four candidate logical architecture parts (CMP_Candidate_1_LogicalArchitecturePart1 through CMP_Candidate_4_LogicalArchitecturePart1) and a CMP_LogArch_ExecutionControlPart1 part, connected through the IB_LogArch*_StartRetrieveArchAttrs and IB_LogArch*_ArchAttrResultsAvailable event interfaces plus functional-thread and attribute-parameter interface blocks (AlgorithmPerformanceInterfaceBlock, IB_AttrEnergyParams, IB_AttrThermalParams). A diagram note states: "The model currently supports 4 logical architectures. This enables the tradeoff of four logical architectures. The model can be extended if more are required." The full diagram is not reproducible in text form.]
Figure A-4 LogArch Container Perform Computations Execution Control AD
act [Block] CMP_LogArch_ExecutionControl [ACT_LogArchExecutionControl]
[Diagram text: StartEventProcessing receives the StartRetrieveLogicalArchitectureAttributes event, sets its parameters (LogArchNum, ThrNum, FuncNum, AttrTyp), and dispatches on LOC_LogicalArchitectureCommand: values 1 through 4 send LogArchOne_ through LogArchFour_StartRetrieveArchAttrs(ThrNum, FuncNum, AttrTyp) to the corresponding candidate port; any other value sets LOC_ErrorFlag. ResultsAvailableProcessing receives a candidate's ArchAttrResultsAvailable event and checks the logical-architecture number, thread number, function number, and attribute type; when LOC_LogicalArchitectureMatch is true it sets LOC_LogicalArchitectureStatus (1 through 4) and sends ArchitectureAttributeResultsAvailable(LogArch, ThrNum, FuncNum, AttrTyp) to rsltarchattro, otherwise it sets LOC_ErrorFlag.]
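The control flow in Figure A-4 amounts to a command-based dispatcher: a start event carries a logical-architecture number that selects which candidate receives the retrieval request, and returning results are validated against the same numbering before being propagated upward. A minimal Python sketch of that pattern follows; all names are illustrative, not taken from the SysML model, which dispatches via event ports rather than method calls.

```python
# Sketch of the Figure A-4 dispatch pattern: route a start event to one of
# four candidate logical architectures and validate the returned results.
# Illustrative only; the SysML model uses send-signal actions on event ports.

NUM_LOG_ARCHS = 4  # the model currently supports four candidates


class ExecutionControl:
    def __init__(self):
        self.error_flag = False   # LOC_ErrorFlag
        self.status = None        # LOC_LogicalArchitectureStatus

    def start_event(self, log_arch, thr_num, func_num, attr_typ):
        """StartEventProcessing: dispatch to the selected candidate."""
        if 1 <= log_arch <= NUM_LOG_ARCHS:
            # stands in for LogArch<N>_StartRetrieveArchAttrs(...)
            return ("start_retrieve", log_arch, thr_num, func_num, attr_typ)
        self.error_flag = True  # no matching candidate
        return None

    def results_event(self, log_arch, thr_num, func_num, attr_typ):
        """ResultsAvailableProcessing: validate and propagate results."""
        if 1 <= log_arch <= NUM_LOG_ARCHS:
            self.status = log_arch
            # stands in for ArchitectureAttributeResultsAvailable(...)
            return ("results_available", log_arch, thr_num, func_num, attr_typ)
        self.error_flag = True
        return None
```

The same pattern recurs one level down at the thread (Figure A-6) and function (Figure A-8) tiers, with the command variable narrowing to a thread or function number.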
Figure A-5 LogArch One Functional Thread One Architecture IBD
ibd [Block] CMP_Candidate_1_LogicalArchitecture [IBD_LogArchOne]
[Diagram text: CMP_Candidate_1_LogicalArchitecture contains parts CMP_LogArchOne_ThreadOne through CMP_LogArchOne_ThreadFour. Each thread part has an InputDataSize input (fed by a corresponding CMP_LogArchOneThr*_DataInputSize part), StartRetrieveArchAttrs and ArchAttrResultsAvailable event ports, attribute-parameter outputs (AlgorithmPerformanceInterfaceBlock, IB_AttrEnergyParams, IB_AttrThermalParams), and a number-of-functions output. CMP_LogArchOne_ExecutionControl connects the architecture-level start and results ports to the four thread parts. Note: The model currently supports four functional threads per logical architecture and can be extended if more are required.]
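The containment hierarchy in Figures A-5 and A-7 — a logical architecture holding four functional threads, each holding four functions — can be represented as nested records, and parameterizing the fan-out is one way the stated four-thread, four-function limit could be extended. A hypothetical sketch (these class and function names are not part of the model):

```python
# Hypothetical sketch of the logical-architecture containment hierarchy:
# architecture -> functional threads -> functions, with configurable fan-out.
from dataclasses import dataclass, field


@dataclass
class Function:
    """Leaf element; holds per-attribute results (Figure A-9 scope)."""
    attrs: dict = field(default_factory=dict)  # e.g. {"Performance": ...}


@dataclass
class FunctionalThread:
    input_data_size: int = 0
    functions: list = field(default_factory=list)


@dataclass
class LogicalArchitecture:
    threads: list = field(default_factory=list)


def build_logical_architecture(num_threads=4, funcs_per_thread=4):
    """Build the default 4x4 hierarchy; larger values extend the model."""
    return LogicalArchitecture(
        threads=[
            FunctionalThread(
                functions=[Function() for _ in range(funcs_per_thread)])
            for _ in range(num_threads)
        ])
```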
Figure A-6 LogArch Functional Threads Perform Computations Exec Ctl AD
act [Block] CMP_LogArchOne_ExecutionControl [ACT_LogArchOneExecutionControl]
[Diagram text: StartEventProcessing receives LogArchOne_StartRetrieveArchAttrs, sets its parameters (ThrNum, FuncNum, AttrTyp), and dispatches on LOC_ThreadNumberCommand: values 1 through 4 send LogArchOneThrOne_ through LogArchOneThrFour_StartRetrieveArchAttrs(FuncNum, AttrTyp) to the corresponding thread port; any other value sets LOC_ErrorFlag. ResultsAvailableProcessing receives a thread's ArchAttrResultsAvailable event and checks the thread number, function number, and attribute type; when LOC_ThreadNumberMatch is true it sets LOC_ThreadNumberStatus (1 through 4) and sends LogArchOne_ArchAttrResultsAvailable(ThrNum, FuncNum, AttrTyp) to rsltlogarchonearchattro, otherwise it sets LOC_ErrorFlag.]
Figure A-7 LogArch One Functional Thread One Architecture IBD
ibd [Block] CMP_LogArchOne_ThreadOne [IBD_LogArchOne_ThreadOne]
[Diagram text: CMP_LogArchOne_ThreadOne contains parts CMP_LogArchOne_ThreadOne_FunctionOne through CMP_LogArchOne_ThreadOne_FunctionFour. Each function part receives the thread's InputDataSize and has StartRetrieveArchAttrs and ArchAttrResultsAvailable event ports and attribute-parameter outputs. CMP_LogArchOneThrOne_ExecutionControl connects the thread-level start and results ports to the four function parts. Note: The model currently supports four functions per thread and can be extended if more are required.]
Figure A-8 Functional Thread Functions Perform Computations Exec Ctl AD
act [Block] CMP_LogArchOneThrOne_ExecutionControl [ACT_LogArchOneThrOne_ExecutionControl]
[Diagram text: StartEventProcessing receives LogArchOneThrOne_StartRetrieveArchAttrs, sets its parameters (FuncNum, AttrTyp), and dispatches on LOC_FunctionNumberCommand: values 1 through 4 send LogArchOneThrOneFuncOne_ through LogArchOneThrOneFuncFour_StartRetrieveArchAttrs(AttrTyp) to the corresponding function port; any other value sets LOC_ErrorFlag. ResultsAvailableProcessing receives a function's ArchAttrResultsAvailable event and checks the function number and attribute type; when LOC_FunctionNumberMatch is true it sets LOC_FunctionNumberStatus (1 through 4) and sends LogArchOneThrOne_ArchAttrResultsAvailable(FuncNum, AttrTyp) to rsltlogarchonethronearchattro, otherwise it sets LOC_ErrorFlag.]
Figure A-9 Architecture Attributes Per Function IBD
ibd [Block] CMP_LogArchOne_ThreadOne_FunctionOne [IBD_LogArchOneThrOneFuncOne]
[Diagram text: CMP_LogArchOne_ThreadOne_FunctionOne contains per-attribute parts CMP_LogArchOneThrOneFuncOne_PerfAttr, CMP_LogArchOneThrOneFuncOne_EnergyAttr, and CMP_LogArchOneThrOneFuncOne_ThermAttr, each with attribute-specific start and results event ports and parameter outputs (AlgorithmPerformanceInterfaceBlock, AlgorithmEnergyInterfaceBlock, AlgorithmThermalInterfaceBlock). CMP_LogArchOneThrOneFuncOne_ExecutionControl connects the function-level start and results ports to the three attribute parts. Note: This IBD currently supports only the Performance attribute; subsequent versions will add support for the Energy and Thermal attributes, and the model also requires the addition of a function that computes the performance computations required for this function.]
Figure A-10 Architecture Attributes Per Function Execution Control AD
act [Block] CMP_LogArchOneThrOneFuncOne_ExecutionControl [ACT_LogArchOneThrOneFuncOne_ExecutionControl]
[Diagram text: InitiateAttributeRetrieval receives LogArchOneThrOneFuncOne_StartRetrieveArchAttrs, sets AttrTyp, and dispatches on LOC_AttributeType: Performance, Energy, or Thermal sends the corresponding StartRetrievePerfAttrs, StartRetrieveEnerAttrs, or StartRetrieveThermAttrs event; any other value sets LOC_ErrorFlag. PropagateAttributeResults receives a PerfAttr, EnerAttr, or ThermAttr ResultsAvailable event and checks the attribute type; when LOC_AttributeTypeMatch is true it sets LOC_AttributeTypeStatus (Performance, Energy, or Thermal) and sends LogArchOneThrOneFuncOne_ArchAttrResultsAvailable(AttrTyp) to rsltlogarchonethronefunconearchattro, otherwise it sets LOC_ErrorFlag.]
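At the bottom of the hierarchy, Figure A-10 dispatches on the attribute type rather than an element number: the attribute type selects which of the three retrieval paths to start. A minimal sketch of that selection, with illustrative names (the model itself uses SysML events, not a lookup table):

```python
# Sketch of the Figure A-10 pattern: per-function execution control selects
# the Performance, Energy, or Thermal retrieval path from the attribute type.
# Illustrative names only.
from enum import Enum


class AttrType(Enum):
    PERFORMANCE = 1
    ENERGY = 2
    THERMAL = 3


# Maps each attribute type to the start event it triggers.
START_EVENTS = {
    AttrType.PERFORMANCE: "StartRetrievePerfAttrs",
    AttrType.ENERGY: "StartRetrieveEnerAttrs",
    AttrType.THERMAL: "StartRetrieveThermAttrs",
}


def initiate_attribute_retrieval(attr_typ):
    """InitiateAttributeRetrieval: pick the attribute-specific start event,
    or flag an error for an unknown attribute type."""
    event = START_EVENTS.get(attr_typ)
    if event is None:
        return {"error_flag": True}
    return {"event": event, "error_flag": False}
```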
Figure A-11 Performance Attribute Per Function Execution Control AD
act [Block] CMP_LogArchOne_ThreadOne_FunctionOne_PerformanceAttrBlock[ACT_LogArchOneThrOneFuncOnePerformanceAttribute]
LogArchOneThrOneFuncOne_StartRetrievePerfAttrs
setPerformanceComputationValues
LogArchOneThrOneFuncOne_PerfAttrResultsAvailable to rsltlogarchonethronefunconeperfattro
Figure A-12 PhyArch Container IBD (Three Candidates)
ibd [Block] CMP_PhysicalArchitectureContainer [IBD_CMP_PhysArch_Container]
[Diagram text: the physical architecture container exposes logical-architecture attribute ports (start and results events, attribute parameters, thread and functions-per-thread counts), analysis ports (IB_ThreadConstraints, IB_ThreadWeight, IB_AttributeWeight, IB_NumberPhysArchs, IB_StartAnalysisEvent, IB_AnalysisMode, IB_AnalysisState), simulation ports (IB_SimControl, IB_SimulationData, IB_SimMode), and optimization ports (IB_OptimControl, IB_DagThreadData, IB_DagFunctionData, IB_OptimMode). Parts CMP_Candidate_1_PhysicalArchitecture through CMP_Candidate_3_PhysicalArchitecture replicate these interfaces, and CMP_PhysArch_ExecutionControl sequences them through IB_StartPhysArchOne/Two/Three and IB_PhysArchOne/Two/ThreeResultsAvailable, reporting completion on IB_PhysArchResultsAvailable. Note: The model supports four physical architecture containers; three are shown in this diagram.]
Figure A-13 PhyArch Container Execution Control AD
act [Block] CMP_PhysArch_ExecutionControl[ACT_PhysArch_ExecutionControl]
FinishControl
PhysArchThreeResultsAvailable
LOC_CurrentPhysArch = 0;
PhysArchResultsAvailable to rsltphysarchout
PhysicalArchitectureThreeControl
PhysArchTwoResultsAvailable
StartPhysicalArchitecture_3 to strtphysarchthree
LOC_CurrentPhysArch++;
DN_2[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureTwoControl
PhysArchOneResultsAvailable
StartPhysicalArchitecture_2 to strtphysarchtwo
LOC_CurrentPhysArch++;
DN_1[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureOneControl
StartArchitectureAnalysis
StartPhysicalArchitecture_1 to strtphysarchone
IN_NumPhysArchs =
numphysarchsin.getNumberPhysicalArchitectures();
FinishControl
PhysArchThreeResultsAvailable
LOC_CurrentPhysArch = 0;
PhysArchResultsAvailable to rsltphysarchout
PhysicalArchitectureThreeControl
PhysArchTwoResultsAvailable
StartPhysicalArchitecture_3 to strtphysarchthree
LOC_CurrentPhysArch++;
DN_2[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureTwoControl
PhysArchOneResultsAvailable
StartPhysicalArchitecture_2 to strtphysarchtwo
LOC_CurrentPhysArch++;
DN_1[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureOneControl
StartArchitectureAnalysis
StartPhysicalArchitecture_1 to strtphysarchone
IN_NumPhysArchs =
numphysarchsin.getNumberPhysicalArchitectures();
FinishControl
PhysArchThreeResultsAvailable
LOC_CurrentPhysArch = 0;
PhysArchResultsAvailable to rsltphysarchout
PhysicalArchitectureThreeControl
PhysArchTwoResultsAvailable
StartPhysicalArchitecture_3 to strtphysarchthree
LOC_CurrentPhysArch++;
DN_2[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureTwoControl
PhysArchOneResultsAvailable
StartPhysicalArchitecture_2 to strtphysarchtwo
LOC_CurrentPhysArch++;
DN_1[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureOneControl
StartArchitectureAnalysis
StartPhysicalArchitecture_1 to strtphysarchone
IN_NumPhysArchs =
numphysarchsin.getNumberPhysicalArchitectures();
FinishControl
PhysArchThreeResultsAvailable
LOC_CurrentPhysArch = 0;
PhysArchResultsAvailable to rsltphysarchout
PhysicalArchitectureThreeControl
PhysArchTwoResultsAvailable
StartPhysicalArchitecture_3 to strtphysarchthree
LOC_CurrentPhysArch++;
DN_2[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureTwoControl
PhysArchOneResultsAvailable
StartPhysicalArchitecture_2 to strtphysarchtwo
LOC_CurrentPhysArch++;
DN_1[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureOneControl
StartArchitectureAnalysis
StartPhysicalArchitecture_1 to strtphysarchone
IN_NumPhysArchs =
numphysarchsin.getNumberPhysicalArchitectures();
FinishControl
PhysArchThreeResultsAvailable
LOC_CurrentPhysArch = 0;
PhysArchResultsAvailable to rsltphysarchout
PhysicalArchitectureThreeControl
PhysArchTwoResultsAvailable
StartPhysicalArchitecture_3 to strtphysarchthree
LOC_CurrentPhysArch++;
DN_2[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureTwoControl
PhysArchOneResultsAvailable
StartPhysicalArchitecture_2 to strtphysarchtwo
LOC_CurrentPhysArch++;
DN_1[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureOneControl
StartArchitectureAnalysis
StartPhysicalArchitecture_1 to strtphysarchone
IN_NumPhysArchs =
numphysarchsin.getNumberPhysicalArchitectures();
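The execution-control activity above is a counter-driven dispatch loop. As a hedged illustration only, the following Python sketch mirrors that logic; the counter and guard come from the diagram's action language, while the driver function and its callable parameters (`start_architecture`, `publish_results`) are hypothetical stand-ins for the diagram's start and results-available events:

```python
def run_physical_architectures(num_phys_archs, start_architecture, publish_results):
    """Sequential sketch of ACT_PhysArch_ExecutionControl.

    num_phys_archs: value of IN_NumPhysArchs
    start_architecture(i): stands in for the StartPhysicalArchitecture_<i>
        event; it returns once the matching results-available event arrives.
    publish_results(): stands in for PhysArchResultsAvailable to rsltphysarchout.
    """
    current = 0                        # LOC_CurrentPhysArch = 0;
    start_architecture(current + 1)    # StartPhysicalArchitecture_1 always fires
    current += 1                       # LOC_CurrentPhysArch++;
    while current < num_phys_archs:    # DN_n guard [LOC_CurrentPhysArch < IN_NumPhysArchs]
        start_architecture(current + 1)
        current += 1
    publish_results()                  # [else] branch, then FinishControl
```

The same pattern repeats at the thread level (Figure A-15) and the function level (Figure A-17), with only the counter, limit, and event names changing.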
Figure A-14 PhyArch One Container IBD (Three Threads Shown)
ibd [Block] CMP_Candidate_1_PhysicalArchitecture [IBD_Cand_1_PhysicalContainer]
[Internal block diagram: the container holds hardware parameter parts (PhysArchOne_CpuClockPart1, PhysArchOne_GpuClockBlock1, PhysArchOne_NumberCpus1, PhysArchOne_NumberGpuThreads1), thread parts CMP_PhysArchOne_ThreadOnePart1 through CMP_PhysArchOne_ThreadThreePart1, the DAG part CMP_PhysArchOne_DagThreadsPart1, and the execution control part CMP_PhysArchOne_ExecutionControl1, connected via interface-block ports for start/result events, thread simulation and cost data, attribute parameters and weights, and CPU/GPU clock-ratio and count values.]
The model provides four threads. Three are shown in this diagram.
Figure A-15 PhyArch One Container Execution Control AD
act [Block] CMP_PhysArchOne_ExecutionControl [ACT_PhysArchOneContainer]
[Activity diagram: on StartPhysicalArchitecture_1 the activity sets LOC_CurrentLogicalArch = logarchconout.getCurrentLogicalArchitecture(), IN_PhysArchOneNumberThreads = determineNumberThreads(), and LOC_CurrentPhysArchOneThread = 0, then sends StartPhysArchOneThreadOne to strtphysarchonethroneout. As each PhysArchOneThr<n>ResultsAvailable event arrives, it executes LOC_CurrentPhysArchOneThread++ and, while the counter is below IN_PhysArchOneNumberThreads, starts the next thread; otherwise it sends PhysArchOneResultsAvailable to rsltphysarchone and reaches FinishThreadControl.]
Figure A-16 PhyArch One Container Thread One IBD (Three Functions Shown)
Figure A-17 PhyArch One Thread One Container Execution Control AD
act [Block] CMP_PhysArchOneThrOne_ExecutionControl [ACT_PhysArchOneThreadOneExecutionContainer]
[Activity diagram: on StartPhysArchOneThreadOne the activity sets LOC_CurrentLogicalArch = logarchconout.getCurrentLogicalArchitecture(), IN_PhysArchOneThreadOneNumberFunctions = determineNumberThreadFunctions(), and LOC_CurrentPhysArchOneThreadOneFunction = 0, then sends StartPhysArchOneThreadOneFunctionOne to strtphysarchonethronefunconeout. As each PhysArchOneThrOneFunc<n>ResultsAvailable event arrives, it increments the counter and, while it remains below IN_PhysArchOneThreadOneNumberFunctions, starts the next function; otherwise it sends PhysArchOneThrOneResultsAvailable to rsltphysarchonethroneout and reaches FinishThreadFunctionControl.]
Figure A-18 PhyArch One Container Thread One Function One IBD
ibd [Block] CMP_PhysArchOne_ThreadOne_FunctionOne [IBD_PhysArchOneThrOneFuncOne]
[Internal block diagram: the function container holds the execution control part CMP_PhysArchOneThrOneFuncOne_ExecutionControlPart1, the physical attribute computation part CMP_FunctionPhysicalContainerPart1, the identifier part CMP_PhysArchOneThrOneFuncOne_IdPart1, and the interface part CMP_PhysArchOneThrOneFuncOne_InterfacePart1, connected via interface-block ports for start/result events, attribute parameters and weights, function simulation and cost data, and CPU/GPU clock-ratio and count values.]
This model is replicated for each function of each thread.
Figure A-19 PhyArch One Thread One Function One Container Exec Ctl AD
act [Block] CMP_PhysArchOneThrOneFuncOne_ExecutionControlBlock [ACT_PhysArchOneThrOneFuncOneExecutionControl]
[Activity diagram: in InitialState the activity sets LOC_ThrOneFuncOneThreadNumber = thronefunconeidi.getThreadNumber() and LOC_ThrOneFuncOneFunctionNumber = thronefunconeidi.getFunctionNumber(). On StartPhysArchOneThreadOneFunctionOne it sends StartFuncPhysAttributeComputations(ThrNum, FuncNum) to strtfuncphysattrcompout (ProcessFunctionAttributes); on RsltPhysArchOneThrOneFuncOneComplete it sends PhysArchOneThrOneFuncOneResultsAvailable to rsltphysarchonethronefunconeout (CompleteFunction).]
Figure A-20 PhyArch One Thread One Function One Container Interface AD
act [Block] CMP_PhysArchOneThrOneFuncOne_InterfaceBlock [ACT_PhysArchOneThrOneFuncOneInterface]
[Activity diagram: in InitialState the activity sets LOC_ThrOneFuncOneThrNum = thronefunconeidi.getThreadNumber() and LOC_ThrOneFuncOneFuncNum = thronefunconeidi.getFunctionNumber(). On FuncPhysAttributeComputationResultsAvailable(ThrNum, FuncNum) it calls setReceivedThreadFunctionValues; when DN_1[LOC_ReceivedMatch == True] holds, it sets LOC_ReceivedMatch = False, executes LOC_ReceivedCount++, calls retrieveFuncPhysAttributeValues and computeFunctionOneCosts, and sends RsltPhysArchOneThrOneFuncOneComplete to rsltpaonethronefunconecompo (ProcessFunctionAttributes); otherwise it waits for the next event.]
Figure A-21 Function Physical Attribute Computation Container IBD
ibd [Block] CMP_FunctionPhysicalContainerBlock [IBD_FunctionPhysicalContainer]
[Internal block diagram: the container holds the energy, performance, and thermal computation parts (CMP_FuncPhys_EnergyContainerPart1, CMP_FuncPhys_PerformanceContainerPart1, CMP_FuncPhys_ThermalContainerPart1) and the attribute execution control part CMP_FuncPhys_AttributeExecutionControlPart1, connected via interface-block ports for start/result events, attribute algorithm parameters (AlgorithmPerformanceInterfaceBlock, AlgorithmEnergyInterfaceBlock, AlgorithmThermalInterfaceBlock), attribute data, and CPU/GPU clock-ratio and count values.]
The current model supports energy, performance, and thermal attributes. The model can be expanded to support other attributes (e.g. reliability, risk, etc.).
166
Figure A-22 Function Physical Attribute Comp Container Execution Control AD
act [Block] CMP_FunctionPhysicalAttributeExecutionControlBlock [ACT_FuncPhysAttributeExecutionControl]
ProcessAttributeComputationsAvailableProcessing
ThermalComputationResultsAvailable to rsltthermcompo
EnergyComputationResultsAvailable to rsltenercompo
PerformanceComputationResultsAvailable to rsltperfcompo
setResultsEventParams
AttrTypLogArch FuncNumThrNum
ArchitectureAttributeResultsAvailable
AttrTypFuncNumThrNumLogArch
checkCommandStatusParams
DN_4[LOC_AttributeTypeStatus == Thermal]
DN_3[LOC_AttributeTypeStatus == Energy]
[else]
DN_2[LOC_AttributeTypeStatus == Performance]
[else]
DN_1
[LOC_ParamMatch == true]LOC_ErrorFlag = true;
[else]
LOC_ErrorFlag = true;
[else]
FinishAttributeComputationProcessing
FuncPhysAttributeComputationResultsAvailable(ThrNum, FuncNum) to rsltfuncphysattrcompo
FuncNumThrNum
CMP_FuncPhys_ThermalResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartThermalAttributeComputations
StartFuncPhysThermalAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtthermattrcompo
FuncNumThrNum LogArchNum
getLogicalArchitectureNumber
RETURN
CMP_FuncPhys_EnergyResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartEnergyAttributeComputations
StartFuncPhysEnergyAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtenerattrcompo
FuncNumThrNum LogArchNum
CMP_FuncPhys_PerformanceResultsAvailable
getLogicalArchitectureNumber
RETURN
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartPerformanceAttributeComputations
StartFuncPhysPerformanceAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtperfattrcompo
LogArchNumFuncNumThrNum
StartFuncPhysAttributeComputations
FuncNumThrNum
getLogicalArchitectureNumber
RETURN
setLogicalArchitectureNumber
setStartEventParams
FuncNumThrNum
getFunctionNumber
RETURN
getThreadNumber
RETURN
ProcessAttributeComputationsAvailableProcessing
ThermalComputationResultsAvailable to rsltthermcompo
EnergyComputationResultsAvailable to rsltenercompo
PerformanceComputationResultsAvailable to rsltperfcompo
setResultsEventParams
AttrTypLogArch FuncNumThrNum
ArchitectureAttributeResultsAvailable
AttrTypFuncNumThrNumLogArch
checkCommandStatusParams
DN_4[LOC_AttributeTypeStatus == Thermal]
DN_3[LOC_AttributeTypeStatus == Energy]
[else]
DN_2[LOC_AttributeTypeStatus == Performance]
[else]
DN_1
[LOC_ParamMatch == true]LOC_ErrorFlag = true;
[else]
LOC_ErrorFlag = true;
[else]
FinishAttributeComputationProcessing
FuncPhysAttributeComputationResultsAvailable(ThrNum, FuncNum) to rsltfuncphysattrcompo
FuncNumThrNum
CMP_FuncPhys_ThermalResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartThermalAttributeComputations
StartFuncPhysThermalAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtthermattrcompo
FuncNumThrNum LogArchNum
getLogicalArchitectureNumber
RETURN
CMP_FuncPhys_EnergyResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartEnergyAttributeComputations
StartFuncPhysEnergyAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtenerattrcompo
FuncNumThrNum LogArchNum
CMP_FuncPhys_PerformanceResultsAvailable
getLogicalArchitectureNumber
RETURN
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartPerformanceAttributeComputations
StartFuncPhysPerformanceAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtperfattrcompo
LogArchNumFuncNumThrNum
StartFuncPhysAttributeComputations
FuncNumThrNum
getLogicalArchitectureNumber
RETURN
setLogicalArchitectureNumber
setStartEventParams
FuncNumThrNum
getFunctionNumber
RETURN
getThreadNumber
RETURN
ProcessAttributeComputationsAvailableProcessing
ThermalComputationResultsAvailable to rsltthermcompo
EnergyComputationResultsAvailable to rsltenercompo
PerformanceComputationResultsAvailable to rsltperfcompo
setResultsEventParams
AttrTypLogArch FuncNumThrNum
ArchitectureAttributeResultsAvailable
AttrTypFuncNumThrNumLogArch
checkCommandStatusParams
DN_4[LOC_AttributeTypeStatus == Thermal]
DN_3[LOC_AttributeTypeStatus == Energy]
[else]
DN_2[LOC_AttributeTypeStatus == Performance]
[else]
DN_1
[LOC_ParamMatch == true]LOC_ErrorFlag = true;
[else]
LOC_ErrorFlag = true;
[else]
FinishAttributeComputationProcessing
FuncPhysAttributeComputationResultsAvailable(ThrNum, FuncNum) to rsltfuncphysattrcompo
FuncNumThrNum
CMP_FuncPhys_ThermalResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartThermalAttributeComputations
StartFuncPhysThermalAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtthermattrcompo
FuncNumThrNum LogArchNum
getLogicalArchitectureNumber
RETURN
CMP_FuncPhys_EnergyResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartEnergyAttributeComputations
StartFuncPhysEnergyAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtenerattrcompo
FuncNumThrNum LogArchNum
CMP_FuncPhys_PerformanceResultsAvailable
getLogicalArchitectureNumber
RETURN
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartPerformanceAttributeComputations
StartFuncPhysPerformanceAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtperfattrcompo
LogArchNumFuncNumThrNum
StartFuncPhysAttributeComputations
FuncNumThrNum
getLogicalArchitectureNumber
RETURN
setLogicalArchitectureNumber
setStartEventParams
FuncNumThrNum
getFunctionNumber
RETURN
getThreadNumber
RETURN
ProcessAttributeComputationsAvailableProcessing
ThermalComputationResultsAvailable to rsltthermcompo
EnergyComputationResultsAvailable to rsltenercompo
PerformanceComputationResultsAvailable to rsltperfcompo
setResultsEventParams
AttrTypLogArch FuncNumThrNum
ArchitectureAttributeResultsAvailable
AttrTypFuncNumThrNumLogArch
checkCommandStatusParams
DN_4[LOC_AttributeTypeStatus == Thermal]
DN_3[LOC_AttributeTypeStatus == Energy]
[else]
DN_2[LOC_AttributeTypeStatus == Performance]
[else]
DN_1
[LOC_ParamMatch == true]LOC_ErrorFlag = true;
[else]
LOC_ErrorFlag = true;
[else]
FinishAttributeComputationProcessing
FuncPhysAttributeComputationResultsAvailable(ThrNum, FuncNum) to rsltfuncphysattrcompo
FuncNumThrNum
CMP_FuncPhys_ThermalResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
Figure A-23 Function Physical Performance Attribute IBD
Figure A-24 Function Physical Performance Attribute Execution Control AD
act [Block] CMP_FuncPhys_PerformanceExecutionControlBlock [ACT_FuncPhysPerformanceExecutionControl]
FinishPerformanceAttributeProcessing
CMP_FuncPhys_PerformanceResultsAvailable to rsltfuncphysperfo
CMP_FuncPhys_GpuPerformanceResultsAvailable
DBG_GpuEventsReceived = DBG_GpuEventsReceived + 1;
LOC_LogicalArchitectureNumber = 0;
LOC_ThreadNumber = 0;
LOC_FunctionNumber = 0;
LOC_NumberCpus = 0;
LOC_NumberGpuThreads = 0;
LOC_CurrentCrId = SC;
InitiateGpuPerformanceAttributeProcessing
CMP_FuncPhys_PerformanceResultsAvailable to rsltfuncphysperfo
StartFuncPhysGpuPerformanceAttributeComputations to strtgpuperfattrcompo
CMP_FuncPhys_McPerformanceResultsAvailable
setGpuComputations
DN_3[LOC_NumberGpuThreads > 0]
DBG_McEventsReceived = DBG_McEventsReceived + 1;
LOC_LogicalArchitectureNumber = 0;
LOC_ThreadNumber = 0;
LOC_FunctionNumber = 0;
LOC_NumberCpus = 0;
[else]
LOC_CurrentCrId = GPU;
InitiateMcOrGpuPerformanceAttributeProcessing
StartFuncPhysGpuPerformanceAttributeComputations to strtgpuperfattrcompo
StartFuncPhysMcPerformanceAttributeComputations to strtmcperfattrcompo
CMP_FuncPhys_PerformanceResultsAvailable to rsltfuncphysperfo
CMP_FuncPhys_ScPerformanceResultsAvailable
setGpuComputations
setMcComputations
DN_1[LOC_NumberCpus > 1]
DN_2[LOC_NumberGpuThreads > 0]
[else]
DBG_ScEventsReceived++;
LOC_LogicalArchitectureNumber = 0;
LOC_ThreadNumber = 0;
LOC_FunctionNumber = 0;
[else]
LOC_NumberCpus = numcpui.getNumberCpus();
LOC_NumberGpuThreads = numgpui.getNumberGpuThreads();
LOC_CurrentCrId = MC;
LOC_CurrentCrId = GPU;
InitiateScPerformanceAttributeProcessing
StartFuncPhysScPerformanceAttributeComputations to strtscperfattrcompo
PerformanceComputationResultsAvailable
setScComputations
LOC_CurrentCrId = SC;
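Read as plain text, the execution-control activity in Figure A-24 starts single-core (SC) performance computations first, then fans out to multi-core (MC) and GPU computations depending on the platform (LOC_NumberCpus, LOC_NumberGpuThreads). A minimal Python sketch of the DN_1/DN_2/DN_3 dispatch decisions follows; the function name and string identifiers are illustrative, not taken from the model:

```python
# Hypothetical sketch of the performance execution-control flow: start
# with single-core (SC) computations, then fan out to multi-core (MC)
# and/or GPU computations depending on the platform configuration.

def next_computation(current_cr_id, number_cpus, number_gpu_threads):
    """Return the next computation to start, or None when performance
    attribute processing can finish. Mirrors the DN_1/DN_2/DN_3
    decision nodes in the activity diagram."""
    if current_cr_id == "SC":
        if number_cpus > 1:          # DN_1: multi-core platform
            return "MC"
        if number_gpu_threads > 0:   # DN_2: GPU threads, single CPU
            return "GPU"
        return None                  # SC results alone suffice
    if current_cr_id == "MC":
        if number_gpu_threads > 0:   # DN_3: GPU computations remain
            return "GPU"
        return None
    return None                      # GPU results received: finish
```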
Figure A-25 Retrieve Performance Computations AD
act [Block] CMP_FuncPhys_PerformanceComputationsBlock [ACT_AlgorithmComputationsTestContainer]
setParams
FuncNumThrNumLogArchNum
StartRetrieveLogicalArchitectureAttributes(LogArch, ThrNum, FuncNum, AttrTyp) to strtarchattro
AttrTyp FuncNumThrNumLogArch
getLogicalArchitectureNumber
RETURN
getThreadNumber
RETURN
getFunctionNumber
RETURN
getAttributeType
RETURN
StartFuncPhysPerformanceAttributeComputations
FuncNumThrNumLogArchNum
Figure A-26 Performance Single Core CM Container IBD
Figure A-27 Single Core CM Execution Control AD
act [Block] SC_ComputationModel_ExecutionControl [ACT_SC_CM_ExecutionControl]
StartFuncPhysScPerformanceAttributeComputations
DN_1
StartScAnalysisExecutionTimeComputationEvent to stscanalexeccompo
[LOC_AnalysisState == Analysis]
StartScSimulationExecutionTimeComputationEvent to stscsimexeccompo
[else]
LOC_AnalysisState = asi.getAnalysisState();
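The execution-control activity in Figure A-27 simply forwards the start event to either the analysis or the simulation computation chain, keyed on the current analysis state. A hedged sketch of that dispatch (the `emit` callback stands in for the model's event ports, and the state strings are assumptions):

```python
def start_sc_performance_computations(analysis_state, emit):
    """Forward the start event to the analysis or simulation chain.

    analysis_state stands in for asi.getAnalysisState() in the model;
    emit is a callback used here in place of the model's event ports.
    """
    if analysis_state == "Analysis":
        emit("StartScAnalysisExecutionTimeComputationEvent")
    else:
        emit("StartScSimulationExecutionTimeComputationEvent")
```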
Figure A-28 Single Core CM Execution Time AD
act [Block] SC_CM_ExecutionTimeBlock [ACT_SC_CM_ExecutionTime]
SC_Simulation
CMP_FuncPhys_ScPerformanceResultsAvailable to rsltfuncphysscperfo
ScSimulationExecutionTimeAvailableEvent
SC_ExecutionTime =
scsimexeci.getScSimulationExecutionTime();
SC_Analysis
CMP_FuncPhys_ScPerformanceResultsAvailable to rsltfuncphysscperfo
ScAnalysisExecutionTimeAvailableEvent
SC_Execution_Time =
scanalexeci.getScAnalysisExecutionTime();
Figure A-29 Single Core CM Analysis Container IBD
ibd [Block] SC_CM_AnalysisContainer [IBD_SC_CM_AnalysisContainer]
IB_ScAnalysisExecutionTime
scanalexeco
IB_ScAnalysisExecutionTimeEvent
scanalexecevo
IB_StartScAnalysisExecutionTimeComputationEvent
stscanalexeccompi
IB_CpuClockRatio
cpuckrati
IB_AnalysisMode
ati
intcompi
floatcompi
misccompi
arctrigcompi
cmplxcompi
trigcompi
SC_AnalComputation_CmplxContainerPart1
StartCmplxExecutionTimeComputationEventInterfaceBlock
stcmplxexeccompi
CmplxExecutionTimeEventInterfaceBlock
cmplxexeci
CmplxExecutionTimeInterfaceBlock
cmplxexeco
NumberCmplxAddsInterfaceBlock, NumberCmplxDivsInterfaceBlock, NumberComplexMulsInterfaceBlock
cmplxcompi
IB_CpuClockRatio
ckrat_2
IB_AnalysisMode
ati_2
SC_AnalComputation_TrigContainerPart1
TrigExecutionTimeEventInterfaceBlock
trigexeci
TrigExecutionTimeInterfaceBlock
trigexeco
StartTrigExecutionTimeComputationEventInterfaceBlock
sttrigexeccompi
NumberCosComputationsInterfaceBlock_V1, NumberSinComputationsInterfaceBlock_V1, NumberTanComputationsInterfaceBlock_V1
trigcompi
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
SC_AnalysisExecutionTimePart1
IB_ScAnalysisExecutionTimeEvent
scanalexecevo
IntExecutionTimeInterfaceBlock
intexeci
IntExecutionTimeEventInterfaceBlock
intexeco
FloatExecutionTimeInterfaceBlock
floatexeci
FloatExecutionTimeEventInterfaceBlock
floatexeco
MiscExecutionTimeInterfaceBlock
miscexeci
MiscExecutionTimeEventInterfaceBlock
miscexeco
ArcTrigExecutionTimeInterfaceBlock
arctrigexeci
ArcTrigExecutionTimeEventInterfaceBlock
arctrigexeco
CmplxExecutionTimeInterfaceBlock
cmplxexeci
TrigExecutionTimeInterfaceBlock
trigexeci
CmplxExecutionTimeEventInterfaceBlock
cmplxexeco
IB_ScAnalysisExecutionTime
scanalexeco
TrigExecutionTimeEventInterfaceBlock
trigexeco
SC_PromulgateAnalysisExecutionTimeStartPart1
StartIntExecutionTimeComputationEventInterfaceBlock
stintexeccompi
StartFloatExecutionTimeComputationEventInterfaceBlock
stfloatexeccompi
StartMiscExecutionTimeComputationEventInterfaceBlock
stmiscexeccompi
StartArcTrigExecutionTimeComputationEventInterfaceBlock
starctrigexeccompi
StartCmplxExecutionTimeComputationEventInterfaceBlock
stcmplxexeccompi
StartTrigExecutionTimeComputationEventInterfaceBlock
sttrigexeccompi
IB_StartScAnalysisExecutionTimeComputationEvent
stscanalexeccompi
SC_AnalComputation__ArcTrigContainerPart1
ArcTrigExecutionTimeInterfaceBlock
arctrigexeco
NumberArcCosComputationsInterfaceBlock, NumberArcSinComputationsInterfaceBlock, NumberArcTanComputationsInterfaceBlock, NumberArcTanFourQuadComputationsInterfaceBlock
arctrigcompi
StartArcTrigExecutionTimeComputationEventInterfaceBlock
starctrigexeccompi
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
ArcTrigExecutionTimeEventInterfaceBlock
arctrigexeci
SC_AnalComputation_MiscContainerPart1
MiscExecutionTimeInterfaceBlock
miscexeco
MiscExecutionTimeEventInterfaceBlock
miscexeci
NumberLogComputationsInterfaceBlock, NumberSqrtComputationsInterfaceBlock
misccompi
StartMiscExecutionTimeComputationEventInterfaceBlock
stmiscexeccompi
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
SC_AnalComputation_FloatContainerPart1
NumberFloatAddsInterfaceBlock, NumberFloatDivsInterfaceBlock, NumberFloatMulsInterfaceBlock
floatcompi
FloatExecutionTimeInterfaceBlock
floatexeco
FloatExecutionTimeEventInterfaceBlock
floatexeci
StartFloatExecutionTimeComputationEventInterfaceBlock
stfloatexeccompi
IB_CpuClockRatio
ckrat_2
IB_AnalysisMode
ati_2
SC_AnalComputation_IntContainerPart1
NumberIntAddsInterfaceBlock, NumberIntDivsInterfaceBlock, NumberIntMulsInterfaceBlock
intcompi
IntExecutionTimeEventInterfaceBlock
intexeci
IntExecutionTimeInterfaceBlock
intexeco
IB_CpuClockRatio
ckrat_2
IB_AnalysisMode
ati_2
StartIntExecutionTimeComputationEventInterfaceBlock
stintexeccompi
The Single Core Computation Model currently supports Integer, Floating Point, and Complex Add/Multiply/Divide operations; Cos/Sin/Tan trigonometric operations; ArcCos/ArcSin/ArcTan/ArcTanFourQuad inverse trigonometric operations; and Log/Exp/Sqrt miscellaneous arithmetic operations. Other math operations can be added to the model. The Single Core Computation Model Simulation Container has the same structure and behavior as the Analysis Container.
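Conceptually, the container sums per-operation-class execution times, each scaled by the operation count, with the total adjusted by the CPU clock ratio. A minimal sketch of that accumulation (the operation-class names and times are placeholders, not values from the model):

```python
def sc_execution_time(op_counts, op_times, cpu_clock_ratio=1.0):
    """Estimate single-core execution time as the clock-ratio-scaled sum
    of (operation count x per-operation time) over all operation classes.

    op_counts : dict mapping an operation class (e.g. 'int_add', 'cos',
                'cmplx_mul') to the number of such operations.
    op_times  : dict mapping the same classes to a per-operation time.
    """
    return cpu_clock_ratio * sum(
        count * op_times[op] for op, count in op_counts.items()
    )
```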
Figure A-30 Single Core CM Analysis Container Propagate Start Activity
act [Block] SC_PromulgateAnalysisExecutionStartEventBlock [ACT_PromulgateExecutionStartEventBlock]
StartScAnalysisExecutionTimeComputationEvent
setStartEventReceivedFlag
StartTrigExecutionTimeComputationEvent to sttrigexeccompi
StartCmplxExecutionTimeComputationEvent to stcmplxexeccompi
StartArcTrigExecutionTimeComputationEvent to starctrigexeccompi
StartMiscExecutionTimeComputationEvent to stmiscexeccompi
StartFloatExecutionTimeComputationEvent to stfloatexeccompi
StartIntExecutionTimeComputationEvent to stintexeccompi
Figure A-31 Single Core CM Analysis Container Execution Time AD
act [Block] SC_AnalysisComputationTimeBlock [ACT_SC_ComputationTimeBlock]
TrigExecutionTimeAvailableEvent
CmplxComputationExecutionTimeAvailableEvent
computeExecutionTime
ScAnalysisExecutionTimeAvailableEvent to scanalexecevo
ArcTrigExecutionTimeAvailableEvent
MiscExecutionTimeAvailableEvent
FloatExecutionTimeAvailableEvent
IntExecutionTimeAvailableEvent
Figure A-32 Single Core CM Analysis Complex Container IBD
ibd [Block] SC_AnalComputation_CmplxContainerBlock [IBD_SC_CmplxComputationContainer]
StartCmplxExecutionTimeComputationEventInterfaceBlock
stcmplxexeccompi
CmplxExecutionTimeEventInterfaceBlock
cmplxexeci
CmplxExecutionTimeInterfaceBlock
cmplxexeco
NumberCmplxAddsInterfaceBlock, NumberCmplxDivsInterfaceBlock, NumberComplexMulsInterfaceBlock
cmplxcompi
IB_CpuClockRatio
ckrat_2
IB_AnalysisMode
ati_2
SC_AnalComputation_CmplxAddContainerPart1
ComplexAddExecutionTimeEventInterfaceBlock
cmplxaddexeci_1
StartCmplxAddExecutionTimeComputationEventInterfaceBlock
stcmplxaddexeccompi
ComplexAddExecutionTimeInterfaceBlock
cmplxaddexeco_1
NumberCmplxAddsInterfaceBlock
cmplxaddi_1
IB_CpuClockRatio
ckrat_1
IB_AnalysisMode
ati_1
SC_AnalComputation_CmplxMulContainerPart1
ComplexMulExecutionTimeEventInterfaceBlock
cmplxmulexeci_1
StartCmplxMulExecutionTimeComputationEventInterfaceBlock
stcmplxmulexeccompi
NumberComplexMulsInterfaceBlock
cmplxmulcompi_1
ComplexMulExecutionTimeInterfaceBlock
cmplxmulexeco_1
IB_CpuClockRatio
ckrat_1
IB_AnalysisMode
ati_1
SC_AnalComputation_CmplxDivContainerPart1
ComplexDivExecutionTimeEventInterfaceBlock
cmplxdivexeci_1
StartCmplxDivExecutionTimeComputationEventInterfaceBlock
stcmplxdivexeccompi
NumberCmplxDivsInterfaceBlock
cmplxdivcompi_1
ComplexDivExecutionTimeInterfaceBlock
cmplxdivexeco_1
IB_CpuClockRatio
ckrat_1
IB_AnalysisMode
ati_1
SC_AnalysisCmplxExecutionTimePart1
ComplexDivExecutionTimeInterfaceBlock
cmplxdivexeci_2
ComplexMulExecutionTimeInterfaceBlock
cmplxmulexeci_2
ComplexAddExecutionTimeInterfaceBlock
cmplxaddexeci_2
CmplxExecutionTimeEventInterfaceBlock
cmplxexeci
CmplxExecutionTimeInterfaceBlock
cmplxexeco
ComplexMulExecutionTimeEventInterfaceBlock
cmplxmulexeco_2
ComplexDivExecutionTimeEventInterfaceBlock
cmplxdivexeco_2
ComplexAddExecutionTimeEventInterfaceBlock
cmplxaddexeco_2
SC_PromulgateAnalysisCmplxExecutionTimeStartPart1
StartCmplxMulExecutionTimeComputationEventInterfaceBlock
stcmplxmulexeccompi
StartCmplxDivExecutionTimeComputationEventInterfaceBlock
stcmplxdivexeccompi
StartCmplxAddExecutionTimeComputationEventInterfaceBlock
stcmplxaddexeccompi
StartCmplxExecutionTimeComputationEventInterfaceBlock
stcmplxexeccompi
The model structure is identical for the Floating Point and Integer computations.
Figure A-33 Single Core CM Analysis Complex Container Propagate Start Activity
act [Block] SC_PromulgateCmplxExecutionStartEventBlock [ACT_SC_PromulgateStartCmplxExecutionEvent]
StartCmplxExecutionTimeComputationEvent
setStartEventReceivedFlag
StartCmplxAddExecutionTimeComputationEvent to stcmplxaddexeccompi
StartCmplxDivExecutionTimeComputationEvent to stcmplxdivexeccompi
StartCmplxMulExecutionTimeComputationEvent to stcmplxmulexeccompi
Figure A-34 Single Core CM Analysis Complex Container Execution Time AD
act [Block] SC_CmplxTimeBlock [ACT_CmplxTimeBlock]
ComplexAddExecTimeAvailableEvent
ComplexDivExecTimeAvailableEvent
ComplexMulExecTimeAvailableEvent
computeCmplxComputationExecutionTime
CmplxComputationExecutionTimeAvailableEvent to cmplxexeci
Figure A-35 Single Core CM Analysis Complex Add Container IBD
ibd [Block] SC_AnalComputation_CmplxAddContainerBlock [IBD_CmplxAddContainer]
ComplexAddExecutionTimeEventInterfaceBlock
cmplxaddexeci_1
StartCmplxAddExecutionTimeComputationEventInterfaceBlock
stcmplxaddexeccompi
ComplexAddExecutionTimeInterfaceBlock
cmplxaddexeco_1
NumberCmplxAddsInterfaceBlock
cmplxaddi_1
IB_CpuClockRatio
ckrat_1
IB_AnalysisMode
ati_1
SC_Anal_ComplexAddExecutionTimePart1
ComplexAddExecutionTimeEventInterfaceBlock
cmplxaddexeci
StartCmplxAddExecutionTimeComputationEventInterfaceBlock
stcmplxaddexeccompi
ComplexAddExecutionTimeInterfaceBlock
cmplxaddexeco
NumberCmplxAddsInterfaceBlock
cmplxaddcompi
IB_CpuClockRatio
ckrati
IB_AnalysisMode
ati
IN_CmplxAddQuadWarm:double
IN_CmplxAddQuadMostLikely:double
IN_CmplxAddQuadMin:double
IN_CmplxAddQuadMax:double
IN_CmplxAddQuadHot:double
IN_CmplxAddQuadCold:double
IN_CmplxAddQuintWarm:double
IN_CmplxAddQuintMostLikely:double
IN_CmplxAddQuintMin:double
IN_CmplxAddQuintMax:double
IN_CmplxAddQuintHot:double
IN_CmplxAddQuintCold:double
IN_CmplxAddTripleWarm:double
IN_CmplxAddTripleMostLikely:double
IN_CmplxAddTripleMax:double
IN_CmplxAddTripleMin:double
IN_CmplxAddTripleHot:double
IN_CmplxAddTripleCold:double
IN_CmplxAddDoubleWarm:double
IN_CmplxAddDoubleMostLikely:double
IN_CmplxAddDoubleMin:double
IN_CmplxAddDoubleMax:double
IN_CmplxAddDoubleHot:double
IN_CmplxAddDoubleCold:double
IN_CmplxAddSingleWarm:double
IN_CmplxAddSingleMostLikely:double
IN_CmplxAddSingleMin:double
IN_CmplxAddSingleMax:double
IN_CmplxAddSingleHot:double
IN_CmplxAddSingleCold:double
SC_Anal_CmplxAddSingleBufferQmifPart1
SCcmplxAddSingleMaxLikely:real_T
SCcmplxAddSingleWarmMaxLikely:real_T
SCcmplxAddSingleHotMaxLikely:real_T
SCcmplxAddSingleWcet:real_T
SCcmplxAddSingleWarmMuMax:real_T
SCcmplxAddSingleMin:real_T
SC_Anal_CmplxAddDoubleBufferQmifPart1
SCcmplxAddDoubleMaxLikely:real_T
SCcmplxAddDoubleWarmMaxLikely:real_T
SCcmplxAddDoubleHotMaxLikely:real_T
SCcmplxAddDoubleWcet:real_T
SCcmplxAddDoubleWarmMuMax:real_T
SCcmplxAddDoubleMin:real_T
SC_Anal_CmplxAddTripleBufferQmifPart1
SCcmplxAddTripleMaxLikely:real_T
SCcmplxAddTripleWarmMaxLikely:real_T
SCcmplxAddTripleHotMaxLikely:real_T
SCcmplxAddTripleWcet:real_T
SCcmplxAddTripleWarmMuMax:real_T
SCcmplxAddTripleMin:real_T
SC_Anal_CmplxAddQuadBufferQmifPart1
SCcmplxAddQuadMaxLikely:real_T
SCcmplxAddQuadWarmMaxLikely:real_T
SCcmplxAddQuadHotMaxLikely:real_T
SCcmplxAddQuadWcet:real_T
SCcmplxAddQuadWarmMuMax:real_T
SCcmplxAddQuadMin:real_T
SC_Anal_CmplxAddQuintBufferQmifPart1
SCcmplxAddQuintMaxLikely:real_T
SCcmplxAddQuintWarmMaxLikely:real_T
SCcmplxAddQuintHotMaxLikely:real_T
SCcmplxAddQuintWcet:real_T
SCcmplxAddQuintWarmMuMax:real_T
SCcmplxAddQuintMin:real_T
Supports algorithms with up to five buffers (e.g., three input and two output).
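The complex-add container carries a parallel timing-parameter set (Min/Max/MostLikely/Hot/Warm/Cold) for each supported buffer count, one through five. Selecting the right set reduces to indexing by buffer count, roughly as sketched below; the set names mirror the Single/Double/Triple/Quad/Quint naming in the IBD, but the selection helper itself is an illustrative assumption, not part of the model:

```python
# Hypothetical per-buffer-count timing parameter set selection,
# mirroring the Single/Double/Triple/Quad/Quint naming in the IBD.
BUFFER_SET_NAMES = {1: "Single", 2: "Double", 3: "Triple", 4: "Quad", 5: "Quint"}

def select_buffer_params(params_by_set, number_of_buffers):
    """Pick the timing parameter set for an algorithm that touches
    number_of_buffers buffers (e.g. three input plus two output = 5)."""
    if not 1 <= number_of_buffers <= 5:
        raise ValueError("the model supports one to five buffers")
    return params_by_set[BUFFER_SET_NAMES[number_of_buffers]]
```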
Figure A-36 Single Core CM Analysis Complex Add Container Execution Time AD
act [Block] SC_Anal_ComplexAddExecutionTimeBlock [ACT_ComplexAddExecTime]
CmplxAddSingleColdWaitCount++; CmplxAddDoubleColdWaitCount++;
CmplxAddSingleColdWaitCount = 0; CmplxAddDoubleColdWaitCount = 0;
ComplexAddExecTimeAvailableEvent to cmplxaddexeci
CxAddSing_1
[else]
[IN_CmplxAddSingleCold == 0.0]
CxAddDoub_1
[IN_CmplxAddDoubleCold == 0]
[else]
CmplxAddTripleColdWaitCount = 0;
CmplxAddTripleColdWaitCount++;
CxAddTrip_1
[IN_CmplxAddTripleCold == 0]
[else]
CmplxAddQuadColdWaitCount = 0;
CmplxAddQuadColdWaitCount++;
CxAddQuad_1
[IN_CmplxAddQuadCold == 0]
[else]
selectComplexAddBufferTime
computeComplexAddTime
CmplxAddQuintColdWaitCount = 0;
CmplxAddQuintColdWaitCount++;
CxAddQuint_1
[IN_CmplxAddQuintCold == 0]
[else]
StartCmplxAddExecutionTimeComputationEvent
Figure A-37 MATLAB SIMULINK Complex Add Single Buffer Model
Figure A-38 Single Core CM Analysis Trig Container IBD
ibd [Block] SC_AnalComputation_TrigContainerBlock [IBD_SC_TrigComputationContainer]
TrigExecutionTimeEventInterfaceBlock
trigexeci
TrigExecutionTimeInterfaceBlock
trigexeco
StartTrigExecutionTimeComputationEventInterfaceBlock
sttrigexeccompi
NumberCosComputationsInterfaceBlock_V1, NumberSinComputationsInterfaceBlock_V1, NumberTanComputationsInterfaceBlock_V1
trigcompi
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
SC_AnalysisTrigExecutionTimePart1
TanExecutionTimeInterfaceBlock
tanexeci_1
SinExecutionTimeInterfaceBlock
sinexeci_1
CosExecutionTimeInterfaceBlock
cosexeci_1
TrigExecutionTimeEventInterfaceBlock
trigexeci
TrigExecutionTimeInterfaceBlock
trigexeco
SinExecutionTimeEventInterfaceBlock
sinexeco_1
CosExecutionTimeEventInterfaceBlock
cosexeco_1
TanExecutionTimeEventInterfaceBlock
tanexeco_1
SC_AnalComputation_CosContainerPart1
CosExecutionTimeEventInterfaceBlock
cosexeci_1
StartCosExecutionTimeComputationEventInterfaceBlock
stcosexeccompi
IB_AnalysisMode
ati_1
CosExecutionTimeInterfaceBlock
cosexeco_1
NumberCosComputationsInterfaceBlock_V1
coscompi_1
IB_CpuClockRatio
ckrati_1
SC_AnalComputation_SinContainerPart1
SinExecutionTimeEventInterfaceBlock
sinexeci_1
StartSinExecutionTimeComputationEventInterfaceBlock
stsinexeccompi
SinExecutionTimeInterfaceBlock
sinexeco_1
NumberSinComputationsInterfaceBlock_V1
sincompi_1
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
SC_AnalComputation_TanContainerPart1
TanExecutionTimeEventInterfaceBlock
tanexeci_1
TanExecutionTimeInterfaceBlock
tanexeco_1
NumberTanComputationsInterfaceBlock_V1
tancompi_1
StartTanExecutionTimeComputationEventInterfaceBlock
sttanexeccompi
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
SC_PromulgateAnalysisTrigExecutionTimeStartPart1
StartTanExecutionTimeComputationEventInterfaceBlock
sttanexeccompi
StartSinExecutionTimeComputationEventInterfaceBlock
stsinexeccompi
StartCosExecutionTimeComputationEventInterfaceBlock
stcosexeccompi
StartTrigExecutionTimeComputationEventInterfaceBlock
sttrigexeccompi
This Analysis Container is replicated in the companion Simulation Container.
Figure A-39 Single Core CM Analysis Trig Container Propagate Start Activity
act [Block] SC_PromulgateTrigExecutionStartEventBlock [ACT_PromulgateStatTrigExecutionTimeEvent]
StartTrigExecutionTimeComputationEvent
setStartEventReceivedFlag
StartCosExecutionTimeComputationEvent to stcosexeccompi
StartSinExecutionTimeComputationEvent to stsinexeccompi
StartTanExecutionTimeComputationEvent to sttanexeccompi
Figure A-40 Single Core CM Analysis Trig Container Execution Time AD
act [Block] SC_TrigTimeBlock [ACT_TrigTimeBlock]
CosExecTimeAvailableEvent
SinExecTimeAvailableEvent
computeTrigComputationExecutionTime
TrigExecutionTimeAvailableEvent to trigexeci
TanExecTimeAvailableEvent
Figure A-41 Single Core CM Analysis Trig Cosine Container IBD
ibd [Block] SC_AnalComputation_CosContainerBlock [IBD_SC_CosComputationContainer]
CosExecutionTimeEventInterfaceBlock
cosexeci_1
StartCosExecutionTimeComputationEventInterfaceBlock
stcosexeccompi
IB_AnalysisMode
ati_1
CosExecutionTimeInterfaceBlock
cosexeco_1
NumberCosComputationsInterfaceBlock_V1
coscompi_1
IB_CpuClockRatio
ckrati_1
SC_AnalysisCosExecutionTimePart1
CosExecutionTimeEventInterfaceBlock
cosexeci
StartCosExecutionTimeComputationEventInterfaceBlock
stcosexeccompi
CosExecutionTimeInterfaceBlock
cosexeco
NumberCosComputationsInterfaceBlock_V1
coscompi
IB_CpuClockRatio
ckrati
IB_AnalysisMode
ati
IN_CosCold:double
IN_CosMostLikely:double
IN_CosWarm:double
IN_CosHot:double
IN_CosMax:double
IN_CosMin:double
SC_Anal_CosQmifPart1
SCcosWarmMuMax:real_T
SCcosMaxLikelihood:real_T
SCcosWarmMaxLikelihood:real_T
SCcosHotMaxLikelihood:real_T
SCcosWorstCaseExecutionTime:real_T
SCcosMin:real_T
The model structure shown here is replicated for TrigSin, TrigTan, ArcCos, ArcSin, ArcTan, ArcTanFourQuad, MiscExp, MiscLog, and MiscSqrt.
Figure A-42 Single Core CM Analysis Trig Cosine Container Execution Time AD
act [Block] SC_AnalysisCosExecutionTimeBlock [ACT_CosTimeBlock]
computeCosExecutionTime
CosMinWaitCount++; CosMaxWaitCount++;
CosMinIn_1
[else]
CosMinWaitCount = 0;
[IN_CosMin == 0]
CosMaxWaitCount = 0;
CosMaxIn_1
[IN_CosMax == 0]
[else]
CosExecTimeAvailableEvent to cosexeci
StartCosExecutionTimeComputationEvent
setStartEventReceivedFlag
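One reading of the cosine execution-time activity in Figure A-42: it waits until both the Min and Max parameter inputs have been populated (non-zero) before computing, incrementing a wait counter on each zero read and resetting it once the input arrives. A small sketch of that guard pattern; the function name and counter keys are illustrative assumptions, not model identifiers:

```python
def cos_inputs_ready(cos_min, cos_max, wait_counts):
    """Return True once both parameter inputs are populated; otherwise
    bump the corresponding wait counter, as the CosMinIn_1/CosMaxIn_1
    decision nodes appear to do."""
    ready = True
    if cos_min == 0:
        wait_counts["min"] += 1   # still waiting on IN_CosMin
        ready = False
    else:
        wait_counts["min"] = 0
    if cos_max == 0:
        wait_counts["max"] += 1   # still waiting on IN_CosMax
        ready = False
    else:
        wait_counts["max"] = 0
    return ready
```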
Figure A-43 MATLAB SIMULINK Cosine Model
Figure A-44 Complex Add Single Buffer Hot State Pdf Parameters
[Table content: for each of the twelve HMM states (1,1) through (3,4), the fitted candidate distribution (generalized extreme value, loglogistic, t location-scale, beta, generalized Pareto, or lognormal) with its negative log-likelihood (NLogL), BIC, AIC, and AICc scores; the fitted parameter names, descriptions, and values (ParamName, ParamDescr, ParamValue for shape, scale, and location parameters); parameter confidence intervals (Paramci); and parameter covariance matrices (ParamCov). Column alignment of the numeric entries was lost in extraction.]
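The NLogL, AIC, BIC, and AICc rows in Figure A-44 are related by the standard information-criterion formulas, e.g. AIC = 2k + 2·NLogL for k fitted parameters. A minimal sketch of that arithmetic, cross-checked against the generalized-extreme-value column (NLogL = -568076.0794, three parameters); the sample size n is not reported in the table, so the value below is an assumed placeholder, which affects BIC and AICc but not AIC:

```python
import math

def info_criteria(nlogl: float, k: int, n: int) -> tuple:
    """AIC, BIC, and AICc from a negative log-likelihood (nlogl),
    k fitted parameters, and n observations."""
    aic = 2 * k + 2 * nlogl
    bic = k * math.log(n) + 2 * nlogl
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    return aic, bic, aicc

# GEV column of Figure A-44: three parameters (shape k, scale sigma, location mu).
aic, bic, aicc = info_criteria(nlogl=-568076.0794, k=3, n=25_000)  # n assumed
```

The resulting AIC of about -1136146.16 matches the tabulated value; a lower (more negative) criterion indicates the better-supported distribution family for that state.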
Figure A-45 Complex Add Single Buffer Warm State Pdf Parameters
[Table content: for each of the sixteen HMM states (1,1) through (4,4), the fitted candidate distribution (generalized extreme value, loglogistic, t location-scale, beta, generalized Pareto, inverse Gaussian, or lognormal) with its negative log-likelihood (NLogL), BIC, AIC, and AICc scores; the fitted parameter names, descriptions, and values (ParamName, ParamDescr, ParamValue for shape, scale, and location parameters); parameter confidence intervals (Paramci); and parameter covariance matrices (ParamCov). Column alignment of the numeric entries was lost in extraction.]
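The per-state fits tabulated in Figures A-44 and A-45 were produced with MATLAB distribution fitting; the selection idea (fit several candidate families to a state's execution-time samples, then rank by information criterion) can be sketched as follows. This is an illustrative Python/SciPy stand-in, not the dissertation's pipeline: the data are synthetic, and SciPy's genextreme, lognorm, and genpareto stand in for the MATLAB fitdist families named in the tables.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic stand-in for one HMM state's execution-time samples (seconds);
# not dissertation data.
data = rng.gumbel(loc=5.0e-9, scale=3.0e-11, size=5000)

# SciPy analogs of three of the candidate families from Figures A-44/A-45.
candidates = {
    "generalized extreme value": stats.genextreme,
    "lognormal": stats.lognorm,
    "generalized pareto": stats.genpareto,
}

scores = {}
for name, dist in candidates.items():
    params = dist.fit(data)                      # maximum-likelihood fit
    nlogl = -np.sum(dist.logpdf(data, *params))  # negative log-likelihood
    k = len(params)
    scores[name] = 2 * k + 2 * nlogl             # AIC; lowest wins

best = min(scores, key=scores.get)
```

The same loop extends naturally to the full candidate list and to BIC/AICc ranking, mirroring the per-state comparisons in the tables.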