Leveraging Model-Based Techniques for Component Level Architecture Analysis in
Product-Based Systems
by David Keith McKean
B.S. Electrical Engineering, May 1978, California State University Long Beach
M.S. Systems Engineering, The George Washington University
A Dissertation submitted to
The Faculty of
The School of Engineering and Applied Science
of The George Washington University
in partial fulfillment of the requirements
for the degree of Doctor of Philosophy
May 19, 2019
Dissertation directed by
Shahram Sarkani
Professor of Engineering Management and Systems Engineering
Thomas Mazzuchi
Professor of Engineering Management and Systems Engineering
The School of Engineering and Applied Science of The George Washington University
certifies that David Keith McKean has passed the Final Examination for the degree of
Doctor of Philosophy as of March 15, 2019. This is the final and approved form of the
dissertation.
Leveraging Model-Based Techniques for Component Level Architecture Analysis in
Product-Based Systems
David Keith McKean
Dissertation Research Committee:
Shahram Sarkani, Professor of Engineering Management and Systems
Engineering, Dissertation Co-Director
Thomas Mazzuchi, Professor of Engineering Management and Systems
Engineering, Dissertation Co-Director
Amirhossein Etemadi, Assistant Professor of Engineering Management and
Systems Engineering, Committee Member
Ebrahim Malalla, Assistant Professor of Engineering Management and Systems
Engineering, Committee Member
Timothy Blackburn, Professional Lecturer of Engineering Management and
Systems Engineering, Committee Member
© Copyright 2019 by David Keith McKean
All rights reserved
Dedication
First, I dedicate this work to my all-powerful Father and His beloved Son, for the grace
and mercy shown towards me, through continued mental and physical health without
which this work could not have been completed. To my best friend and soulmate, Alana,
your love and support throughout this entire effort has been invaluable. There are
multiple times I would have stopped this endeavor if not for your encouragement. To
Gil, your encouragement to “go big brain” every Saturday morning made it possible to
continue on. To my sons Jed and Wil, your willingness to listen helped clarify many
ideas and issues. And to my long-deceased father, you challenged me to succeed, you
taught me discipline, you taught me to see the best in, expect the best of, and have
compassion for every person. I hope, through this work, to have proven a worthy student
and son.
Finally, to my advisors, thank you for your knowledge, wisdom, and guidance throughout
the doctoral research process. To all doctoral class instructors, thank you for your time,
efforts, and passion to impart the knowledge needed to complete this endeavor.
Abstract of Dissertation
Leveraging Model-Based Techniques for Component Level Architecture Analysis in
Product-Based Systems
System design at the component level seeks to construct a design trade space of alternate
solutions comprising mapping(s) of system function(s) to physical hardware or software
product components. The design space is analyzed to determine a near-optimal next-level
allocated architecture solution that satisfies system function and quality requirements. Software
product components are targeted to increasingly complex computer systems that provide
heterogeneous combinations of processing resources. These processing technologies
facilitate performance (speed) optimization via algorithm parallelization. However, speed
optimization can conflict with electrical energy and thermal constraints. A multi-
disciplinary architecture analysis method is presented that considers all attribute
constraints required to synthesize a robust, optimum, extensible next-level solution. This
paper presents an extensible, executable model-based architecture attribute framework
that efficiently constructs a component-level design trade space. A proof-of-concept
performance attribute model is introduced that targets single-CPU systems. The model
produces static performance estimates that support optimization analysis and dynamic
performance estimation values that support simulation analysis. This model-based
approach replaces current spreadsheet-based analysis-of-alternatives approaches.
The ability to easily model computer resource alternatives and produce attribute
estimates improves design space exploration productivity. Performance estimation
improvements save time and money through reduced prototype requirements. Credible
architecture attribute estimates facilitate more informed design tradeoff discussions with
specialty engineers. This paper presents initial validation of a model-based architecture
attribute analysis method and model framework using a single computation thread
application on two laptop computers with different CPU configurations. Execution time
estimates are calibrated for several data input sizes using the first laptop. Actual
execution times on the second laptop are shown to be within 10 percent of execution time
estimates for all data input sizes.
Table of Contents
Dedication ......................................................................................................................... iv
Abstract .............................................................................................................................. v
Table of Contents ............................................................................................................ vii
List of Figures .................................................................................................................. xii
List of Tables ................................................................................................................. xvii
List of Acronyms .......................................................................................................... xviii
Chapter 1 - Research Problem ........................................................................................1
1.1 Problem Context ................................................................................................. 1
1.2 Statement of the Problem .................................................................................... 3
1.3 Research Goals and Objectives ........................................................................... 6
1.3.1 Research Scope and Constraints ..................................................................... 7
1.4 Rationale and Justification .................................................................................. 9
1.5 Relevance/Importance/Motivation .................................................................... 10
1.6 Research Contributions to the Systems Engineering Body of Knowledge ....... 13
1.7 Research Approach ........................................................................................... 15
1.8 Organization ...................................................................................................... 16
Chapter 2 - Literature Review.......................................................................................18
2.1 System Architecture Structure .......................................................................... 18
2.2 Component Architecture Views ........................................................................ 20
2.3 Component Physical Architecture .................................................................... 22
2.3.1 Performance Models ..................................................................................... 24
2.3.1.1 Single Computer Resource MOCs ......................................................... 25
2.3.1.2 Multiple Computer Resources................................................................ 28
2.3.2 Energy/Power/Thermal Models .................................................................... 29
2.4 Modeling Languages for Systems ..................................................................... 30
2.4.1 Functional Flow Block Diagrams (FFBDs) .................................................. 30
2.4.2 Integration Definition (IDEF) Language ...................................................... 31
2.4.3 Object-Process Methodology (OPM) ........................................................... 32
2.4.4 System Modeling Language (SysML) .......................................................... 33
2.5 MBSE Methodologies and Architecture Definition ......................................... 36
2.5.1 HW/SW Co-Design ...................................................................................... 37
2.5.2 Rational Harmony for Systems Engineering (SE) ........................................ 39
2.5.2.1 System Functional Analysis ................................................................... 40
2.5.2.2 Harmony-SE Architecture Analysis ....................................................... 41
2.5.2.3 Harmony-SE Architecture Design ......................................................... 42
2.5.3 Object-Oriented Systems Engineering Method (OOSEM) ........................... 43
2.5.4 Vitech Model-Based Systems Engineering .................................................. 49
2.5.5 IBM RUP-SE ................................................................................................ 52
2.5.6 Selected MBSE Methodology....................................................................... 55
2.6 HW/SW Partitioning Optimization ................................................................... 56
Chapter 3 - Research Methodology ..............................................................................60
3.1 Research Method .............................................................................................. 60
3.2 Component Architecture Attribute Introduction ............................................... 60
3.3 Component Architecture Attribute Analysis Method ....................................... 61
3.3.1 Define Key Component System Functions ................................................... 63
3.3.2 Assign Attribute/Thread Weights ................................................................. 64
3.3.3 Define Candidate Physical Architecture Solutions ....................................... 66
3.3.4 Model Function Attributes ............................................................................ 67
3.3.5 Model Physical Architecture Attributes........................................................ 67
3.3.6 Compute Attribute Cost ................................................................................ 67
3.3.7 Perform Optimization Analysis .................................................................... 70
3.3.8 Perform Simulation Analysis ........................................................................ 71
3.3.9 Compute Total Attribute Cost ....................................................................... 71
3.3.10 Select Solution Architecture.................................................................... 71
3.4 Architecture Attribute System Model Overview .............................................. 71
3.5 Architecture Attribute System Model Details ................................................... 81
3.6 Architecture Performance Attribute System Model ......................................... 84
3.6.1 Performance Attribute Logical Architecture Extensions .............................. 84
3.6.2 Performance Attribute Physical Architecture Extensions............................. 84
3.6.2.1 Performance Attribute Single Core (SC) CPU Computation Model ..... 88
3.6.2.1.1 SC Analysis Computation Model ................................................... 89
3.6.2.1.1.1 SC Complex Math Computation Model .................................. 92
3.6.2.1.1.2 SC Floating Point Math Computation Model .......................... 95
3.6.2.1.1.3 SC Integer Math Computation Model ...................................... 96
3.6.2.1.1.4 SC Trig Computation Model.................................................... 97
3.6.2.1.1.5 SC Arc Trig Computation Model............................................. 99
3.6.2.1.1.6 SC Miscellaneous Computation Model ................................. 100
3.6.3 Architecture Attribute System Model – Quantitative Model Interface ...... 101
3.7 Performance Attribute Statistical Performance Models ................................. 102
3.7.1 Statistical Process Model Development Computer Configuration ............. 103
3.7.2 Estimation Models ...................................................................................... 105
3.7.2.1 Execution Time Data Collection Workflow Step ................................ 105
3.7.2.2 State Definition (Estimation) Workflow Step ...................................... 107
3.7.2.3 Distribution Analysis Workflow Step .................................................. 107
3.7.2.4 Simulink Modeling (Estimation) Workflow Step ................................ 109
3.7.3 Simulation Analysis Models ....................................................................... 110
3.7.3.1 State Definition (Simulation) Workflow Step...................................... 110
3.7.3.2 Transition Analysis .............................................................................. 110
3.7.3.3 Simulink Modeling (Simulation) Workflow Step ................................ 115
Chapter 4 - Data Analysis and Results .......................................................................116
4.1 Case Study Definition ..................................................................................... 116
4.2 Case Study Architecture Attribute Workflow................................................. 117
4.3 Case Study Results Analysis ........................................................................... 119
4.3.1 Case Study Data Analysis Details............................................................... 123
4.3.2 Thread Cost ................................................................................................. 126
4.3.3 Simulation Analysis .................................................................................... 127
Chapter 5 - Conclusions ..............................................................................................128
5.1 Contributions to the field ................................................................................ 128
5.1.1 Limitations .................................................................................................. 129
5.2 Recommendations for Future Work................................................................ 130
Chapter 6 - Bibliography ............................................................................................132
Chapter 7 - COPYRIGHTS ........................................................................................144
Appendix A Oversized Figures ....................................................................................145
List of Figures
Figure 1-1 Enhanced Architecture Artifacts ....................................................................... 1
Figure 1-2 INCOSE MBSE Roadmap (Shah 2010) ......................................................... 13
Figure 1-3 MBSE Framework for Architecture Attribute Overview ................................ 16
Figure 2-1 System-of-Interest Structure (IEEE 2008) ...................................................... 19
Figure 2-2 IEEE-1220-1998 System Breakdown Structure (ISO 2016) .......................... 19
Figure 2-3 System Structure Example using EIA-632 Building Blocks (SAE 2014) ...... 20
Figure 2-4 Traditional HW/SW Partitioning Target Architecture (Wolf 2003) ............... 22
Figure 2-5 Modern HW/SW Partitioning Physical Architecture ...................................... 23
Figure 2-6 Embedded Microprocessor MOC Levels of Abstraction (Meyerowitz 2008) 24
Figure 2-7 Cyclostatic Dataflow (Bilsen 1996) ................................................................ 26
Figure 2-8 IDEF0 Activity Box (IEEE 2012) ................................................................... 32
Figure 2-9 SysML 1.0 Relationship To UML 2.0 (Balmelli 2007) .................................. 33
Figure 2-10 Foundational Pillars of SysML (OMG 2018) ............................................... 34
Figure 2-11 Modified SysML Diagram Taxonomy (Roedler 2012) ................................ 35
Figure 2-12 Harmony-SE Functional Analysis (Hoffmann 2013) ................................... 41
Figure 2-13 Harmony-SE Architecture Analysis (Hoffmann 2013) ................................. 42
Figure 2-14 Harmony-SE Architecture Design Process (Hoffmann 2013) ...................... 43
Figure 2-15 OOSEM Method Pyramid (Estefan 2008) .................................................... 44
Figure 2-16 OOSEM Specify and Design System Process (Friedenthal 2015)................ 45
Figure 2-17 OOSEM Define Logical Architecture Process (Friedenthal 2015) ............... 46
Figure 2-18 OOSEM Define Physical Architecture Process (Friedenthal 2015) ............. 47
Figure 2-19 OOSEM Optimize and Evaluate Alternatives Process (Friedenthal 2015) .. 47
Figure 2-20 Analysis Context Block Definition Diagram Example (Friedenthal 2015) .. 48
Figure 2-21 OOSEM Cost Effectiveness Analysis Parametric Model ............................. 49
Figure 2-22 Vitech STRATA™ MBSE Model (Long 2011) ........................................... 50
Figure 2-23 Vitech MBSE Architecture Diagram (Long 2011) ....................................... 50
Figure 2-24 CORE® System Design Repository (SDR) (Booth 2008) ........................... 51
Figure 2-25 Sample DAG ................................................................................................. 57
Figure 3-1 Component Architecture Attribute Analysis Workflow ................................. 62
Figure 3-2 System Function Definition Artifacts ............................................................. 64
Figure 3-3 Sample Architecture Attribute and Thread Weights ....................................... 65
Figure 3-4 DAG Current Supported Model CRs .............................................................. 70
Figure 3-5 Architecture Attribute Model Overview ......................................................... 73
Figure 3-6 Architecture Attribute Model Detail ............................................................... 81
Figure 3-7 Performance Attribute Model ......................................................................... 85
Figure 3-8 computeExecutionTime Operation Code Segment ......................................... 91
Figure 3-9 computeComplexExecutionTime Operation Code Segment .......................... 93
Figure 3-10 selectComplexAddBufferTime Operation Code Segment ............................ 94
Figure 3-11 computeComplexAddTime Operation Code Segment .................................. 95
Figure 3-12 computeTrigExecutionTime Operation Code Segment ................................ 98
Figure 3-13 computeCosExecutionTime Operation Code Segment ................................. 99
Figure 3-14 System Model - Quantitative Model Interface ............................................ 101
Figure 3-15 SPM Development Flow ............................................................................. 102
Figure 3-16 Intel Sandy Bridge Microarchitecture (Lempel 2011) ................................ 104
Figure 3-17 Complex Add Single Buffer Multimodal Distribution ............................... 106
Figure 3-18 Single Core Complex Add State Parameters .............................................. 109
Figure 3-19 Sample Hidden Markov Model ................................................................... 111
Figure 3-20 Complex Add Single Buffer HMM State Transition Probability Matrix ... 112
Figure 3-21 Complex Add Single Buffer State 1 Histogram Display ............................ 112
Figure 3-22 Complex Add Single Buffer State 1 Histogram Data ................................. 113
Figure 3-23 Complex Add Single Buffer State 1 Visible Output Data .......................... 114
Figure 4-1 Case Study Functional and Physical Definition ............................................ 117
Figure 4-2 Case Study Data Results ............................................................................... 122
Figure 4-3 Computer One 32768 Data Sample 02 Outlier Report (from Minitab) ........ 123
Figure 4-4 Computer One 32768 Data Sample 02 (Minus Outliers) Outlier Report...... 124
Figure 4-5 Sample Minitab Histogram and Probability Plots ......................................... 125
Figure 4-6 Sample Wilcoxon Signed Rank Test Results ................................................ 126
Figure A-1 Architecture Analysis Container IBD .......................................................... 145
Figure A-2 StartArchitectureAnalysisBlock Activity and AD ....................................... 146
Figure A-3 LogArch Container IBD ............................................................................... 147
Figure A-4 LogArch Container Perform Computations Execution Control AD ............ 148
Figure A-5 LogArch One Functional Thread One Architecture IBD ............................. 149
Figure A-6 LogArch Functional Threads Perform Computations Exec Ctl AD ............ 150
Figure A-7 LogArch One Functional Thread One Architecture IBD ............................. 151
Figure A-8 Functional Thread Functions Perform Computations Exec Ctl AD ............ 152
Figure A-9 Architecture Attributes Per Function IBD ................................................... 153
Figure A-10 Architecture Attributes Per Function Execution Control AD .................... 154
Figure A-11 Performance Attribute Per Function Execution Control AD ..................... 155
Figure A-12 PhyArch Container IBD (Three Candidates) ............................................. 156
Figure A-13 PhyArch Container Execution Control AD .............................................. 157
Figure A-14 PhyArch One Container IBD (Three Threads Shown) ............................. 158
Figure A-15 PhyArch One Container Execution Control AD ........................................ 159
Figure A-16 PhyArch One Container Thread One IBD (Three Functions Shown) ....... 160
Figure A-17 PhyArch One Thread One Container Execution Control AD .................... 161
Figure A-18 PhyArch One Container Thread One Function One IBD .......................... 162
Figure A-19 PhyArch One Thread One Function One Container Exec Ctl AD ............. 163
Figure A-20 PhyArch One Thread One Function One Container Interface AD ............ 164
Figure A-21 Function Physical Attribute Computation Container IBD ......................... 165
Figure A-22 Function Physical Attribute Comp Container Execution Control AD ....... 166
Figure A-23 Function Physical Performance Attribute IBD ......................................... 167
Figure A-24 Function Physical Performance Attribute Execution Control AD ............ 168
Figure A-25 Retrieve Performance Computations AD .................................................. 169
Figure A-26 Performance Single Core CM Container IBD ............................................ 170
Figure A-27 Single Core CM Execution Control AD ....................................................... 171
Figure A-28 Single Core CM Execution Time AD ........................................................... 172
Figure A-29 Single Core CM Analysis Container IBD .................................................... 173
Figure A-30 Single Core CM Analysis Container Propagate Start Activity .................. 174
Figure A-31 Single Core CM Analysis Container Execution Time AD ........................... 175
Figure A-32 Single Core CM Analysis Complex Container IBD ..................................... 176
Figure A-33 Single Core CM Analysis Complex Container Propagate Start Activity ... 177
Figure A-34 Single Core CM Analysis Complex Container Execution Time AD ............ 178
Figure A-35 Single Core CM Analysis Complex Add Container IBD .............................. 179
Figure A-36 Single Core CM Analysis Complex Add Container Execution Time AD .... 180
Figure A-37 MATLAB SIMULINK Complex Add Single Buffer Model ........................... 181
Figure A-38 Single Core CM Analysis Trig Container IBD ............................................ 182
Figure A-39 Single Core CM Analysis Trig Container Propagate Start Activity .......... 183
Figure A-40 Single Core CM Analysis Trig Container Execution Time AD ................... 184
Figure A-41 Single Core CM Analysis Trig Container IBD ............................................ 185
Figure A-42 Single Core CM Analysis Trig Cosine Container Execution Time AD ....... 186
Figure A-43 MATLAB SIMULINK Cosine Model ............................................................ 187
Figure A-44 Complex Add Single Buffer Hot State Pdf Parameters ............................. 188
Figure A-45 Complex Add Single Buffer Warm State Pdf Parameters ......................... 189
List of Tables
Table 2-1 RUP-SE Architecture Framework Model Levels (Cantor 2003) ..................... 53
Table 2-2 RUP-SE Architecture Viewpoints (Cantor 2003) ............................................ 54
Table 2-3 RUP-SE Sample Model Views (Cantor 2003) ................................................. 55
Table 2-4 HW/SW Partitioning Optimization Algorithm Summary ................................ 59
List of Acronyms
AD – Activity Diagram
CM – Computational Model
CR – Computer Resource
CPU – Central Processing Unit
FPGA – Field Programmable Gate Array
GPU – Graphics Processor Unit
GPGPU – General Purpose GPU
HW – Hardware
IBD – Internal Block Diagram
INCOSE – International Council on Systems Engineering
LogArch – Logical Architecture
MBSE – Model Based Systems Engineering
mCPU – multicore/manycore CPU
MPSoC – Multi-Processor SoC
PhyArch – Physical Architecture
SEBoK – Systems Engineering Body of Knowledge
sCPU – Single-Core CPU
SoC – System-on-a-Chip
SPM – Statistical Process Model
SW – Software
SysML – System Modeling Language
WCET – Worst Case Execution Time
Chapter 1 - Research Problem
1.1 Problem Context
System design transforms an input set of requirements to a next-level system
component architecture with associated requirements. Systems engineering standards,
such as ISO/IEC 15288 (IEEE 2008), ISO/IEC 24748-4 (formerly IEEE-1220) (ISO
2016), and EIA-632 (SAE 2014), provide conflicting system definitions of a component
(or component-level). Buede and Miller (Buede 2016) define the notion of a system
component as a subset of a physical architecture to which a subset of a functional
architecture is allocated. Each component consists of hardware, software, people,
facilities, or some combination thereof. Components have a hierarchical structure like
requirements and functions. This paper focuses on a system decomposition layer where
the next-level physical architecture is composed of Computer Resource (CR) and
software product components as shown in Figure 1-1.
Figure 1-1 Enhanced Architecture Artifacts
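The component notion above, a subset of the physical architecture to which a subset of the functional architecture is allocated, can be captured in a minimal data structure. A hypothetical sketch in Python (the class, function, and CR names are illustrative, not artifacts of this research):

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A physical component: a CR or a software product (illustrative)."""
    name: str
    kind: str                      # e.g. "sCPU", "mCPU", "GPU", "FPGA", "SW"
    allocated_functions: list = field(default_factory=list)

def allocate(component: Component, function: str) -> None:
    """Allocate one system function to one physical component."""
    component.allocated_functions.append(function)

# One candidate next-level allocated architecture (invented example)
cr = Component("node-1", "mCPU")
sw = Component("signal-proc-app", "SW")
allocate(cr, "F1: acquire sensor data")
allocate(sw, "F2: compute FFT")
```

A design trade space is then a set of such function-to-component mappings, one per candidate architecture alternative.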
Modern computer system physical CR architectures are comprised of single
heterogeneous CR nodes or multiple distributed CR nodes organized as a cluster, grid, or
cloud (Sadashiv 2011) configuration. Single node CRs can be composed of single core
Central Processing Units (sCPUs), multicore CPUs (mCPU), graphics processing units
(GPUs), or specialized hardware such as field programmable gate arrays (FPGAs). These
computing technologies enable performance (i.e. speed) optimization through algorithm
parallelization. However, performance optimization comes at the expense of conflicting
energy, thermal, and reliability concerns. Each CR possesses unique physical
performance (computation speed, energy efficiency, thermal characteristics, and
reliability) attributes. Typically, these CR technologies exhibit increased energy
consumption and heat generation for decreased computation time. Correspondingly,
energy consumption and heat generation are decreased with increased computation time.
Thus, the computation attribute inherently conflicts with the energy and heat attributes.
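The conflict can be seen with a back-of-envelope calculation: energy consumed is average power draw times execution time, so a faster CR configuration typically pays for its speed in both power and total energy. A sketch with invented attribute values (none of these numbers describe a real CR):

```python
# Illustrative (invented) attribute estimates for three CR options:
#          exec_time_s  avg_power_w
cr_options = {
    "sCPU": (40.0,  15.0),
    "mCPU": (12.0,  55.0),
    "GPU":  ( 3.0, 250.0),
}

for name, (t, p) in cr_options.items():
    energy_j = p * t              # energy consumed = power x time
    print(f"{name}: {t:5.1f} s, {p:6.1f} W, {energy_j:7.1f} J")
```

With these values the GPU is fastest but consumes the most energy (and generates the most heat), while the sCPU is slowest but cheapest in energy; the computation attribute and the energy/heat attributes pull the allocation decision in opposite directions.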
The systems engineer is challenged to perform an architecture analysis that
produces a next-level design trade space that considers attribute (e.g. execution time,
energy consumption, heat generation, etc.) estimates for various CR configurations. The
systems engineer must associate architecture analysis artifacts with the system model.
Past architecture analysis approaches have used utility curves (Hoffmann 2013),
simulations (Buede 2016), or prototypes to develop attribute estimates using specialty
tools. Artifacts produced by these specialty tools do not integrate with the system model.
Model Based System Engineering (MBSE) produces a system model as the
primary artifact (Ramos 2012). Friedenthal et al. (Friedenthal 2015) state that the
system model captures requirements, structure, behavior, parametrics (i.e. constraints),
and their interconnection relationships using a modeling language such as the Object
Management Group System Modeling Language™ (OMG SysML®) (OMG 2017).
System architecture captures system model structure, behavior, and interconnection
relationships at each system decomposition level. Figure 1-1, an adaptation of three
architecture views (functional, physical, allocated) plus requirements (Levis 1993),
elaborates the relationship between requirements and architecture. Alternatively, the
Systems Engineering Body of Knowledge (SEBoK) (BKCASE 2017) represents system
architecture as a system element structure (i.e. component) represented as a physical
architecture and behavior represented as a logical architecture. The logical architecture is
comprised of functional, behavioral, and temporal architecture views. The physical
architecture is comprised of a physical view of preferred system elements (i.e.
components) and their interfaces.
1.2 Statement of the Problem
The inability to accurately estimate architecture attributes for high
computation workload systems leads to suboptimal solutions that can introduce
significant technical risk, leading to cost overruns, schedule delays, or
program failure. The following are real world examples of problems that directly
trace to deficient architecture attribute analysis at the CR/software product level:
1. Software received from a subcontractor was installed on a quad
core general purpose computer system. Software was executed for
a data set of size 2 GB first for 8 hours and then for 24 hours with
no results. The software was declared unusable. Further
investigation revealed that the subcontractor used the software to
process data sets of size 100 MB producing results in 2 hours.
Further analysis determined that the results would be produced for
the 2 GB data set in 72 hours. Further analysis revealed that
software implementation on a 100 core GPU would result in 7
hours processing time for the 2 GB data set and 2 hours with a 400
core GPU.
2. Application software received from a subcontractor was installed
for execution on a 30 MHz embedded processor. The software had
two processing cycle time requirements of 10 msec and 25 msec.
The software required 18 msec and 40 msec to complete execution.
The application program required complete redesign introducing a
significant program delay. The money spent to develop the
software turned into a sunk cost. Investigation revealed that the
subcontractor did not have the capability to estimate, simulate, or
prototype computation and software execution on the actual
hardware.
3. A vendor was contracted to develop a mobile application for
remote sensor monitoring on a ruggedized mobile phone. The
system was required to operate for 8 hours on a single battery
charge for deployment purposes. The customer required delivery in
one year. The vendor delivered the product in 11 months. The
application’s use of processing and graphics resources on the
ruggedized phone resulted in phone operation for 2 hours on a
single battery charge. The vendor’s software application was
rejected by the customer. Investigation revealed that a capability
was not available to estimate energy consumption on the actual
ruggedized phone.
All of these problems were discovered late in system development (during system
integration or after system (or software) delivery). These are only three of many
system development failures that can be traced to inadequate analysis of
component-level computation, energy, and thermal attribute performance against
system functions. A systems engineer can avoid these problems
with architecture analysis methods and modeling capabilities that:
1. Associate algorithmic decisions in the functional architecture with CR decisions
in the physical architecture.
2. Accurately estimate attribute (computation (or execution) time, energy
consumption, heat produced, etc.) performance for selected CR configurations
early in the design life-cycle.
3. Understand the available multi-attribute trade space relative to a set of system
Measure of Performance (MOP) constraints.
4. Produce a near-optimal functional allocation to a selected physical CR
configuration.
The systems engineer must also consider the use of software (SW) CRs and
hardware (HW) CRs. Software CRs can be configured as multicore (MC) (2-8 cores) and
many-core (16+ cores) CPUs (mCPUs), as well as Graphics Processing Units (GPUs) or
General-Purpose GPUs (GPGPUs) (100s-1000s threads). Hardware CRs can be
configured as multiprocessor system-on-chip (MPSoC) (Wolf 2008), field programmable
gate arrays (FPGAs), and specialty processors (such as Digital Signal Processors)1. All of
these CRs possess varying levels (or none at all) of architecture attribute models for
computation (or execution) time, energy consumption, heat generation, Mean-Time-
Between-Failure (MTBF) (reliability attribute), and so on. The systems engineer can use
CR simulations (when available) to assess algorithm effects on attribute performance
such as computation (or execution) times, energy consumption, or heat generation. In
most cases, the systems engineer must resort to algorithm implementation on CR
prototypes to assess attribute performance.
1.3 Research Goals and Objectives
This paper addresses a series of research questions that address the problem
discussed in the preceding section:
1. Can an abstract computation model be developed that accurately
estimates architecture attribute (i.e. execution time, energy consumption,
heat production, etc.) performance for a selected CR (sCPU, mCPU,
GPU) configuration during component level architecture analysis?
a. Can execution time be estimated to within 10 percent (20 percent
threshold) at the thread (and hence function) level for normal
execution?
1 All identified HW resources are reconfigurable (or reprogrammable). Programs targeting these devices are
referred to as firmware (FW). Programs that target general purpose processors are referred to as SW. So
technically, the partitioning is between SW and FW.
b. Can execution time be estimated to within 25 percent at the thread
(and hence function) level for worst case execution?
2. Can the produced estimates be related to Measures of Performance
(MOP) constraints to develop an understanding of the available multi-
attribute trade space versus requirements?
3. Can executable model extensions to the component-level logical and
physical architecture be developed that enable the systems engineer to
perform architecture analysis within the system model environment?
An additional objective of the research associated with this paper is to develop
model constructs that serve as integration points with optimization model(s)/algorithm(s)
and simulation models.
This paper will demonstrate that MBSE can facilitate improvement of component
level trade studies through use of an executable logical/physical architecture model layer
extension. The paper documents development of a model framework for computation of
multiple attribute estimates (i.e. execution time, energy consumed, heat generated, etc.)
for various CR configurations. This paper specifically develops computation (i.e.
execution time) attribute estimates using an sCPU computation model developed during
this research effort. This paper records results of a proof-of-concept to demonstrate the
ability to produce sCPU execution time estimates for various input data sizes.
1.3.1 Research Scope and Constraints
The full executable architecture attribute model framework computes thread level
and function level multi-attribute estimates with associated costs. Thread/function costs
are provided to optimization algorithm(s). Optimization algorithm(s) return an optimum
allocation configuration. Thread/function costs are computed using attribute estimates
that include execution time, consumed energy, and generated heat. This paper performs
the foundational research to develop the executable architecture attribute framework.
Further, this paper develops, integrates, and performs proof-of-concept testing for an
sCPU computation model to estimate execution time.
A separate sCPU computation model must be developed for each
processor family. The processor family used in this paper is the 2nd Generation Intel®
Core™ processor family (Intel® Core™ i7, i5, and i3) (Lempel 2011). This processor
family was chosen because it supports both sCPU and mCPU configurations and because
test assets were available. Proof-of-concept testing was performed due to limited asset
availability. More thorough validation testing can be performed with expanded access to
test assets such as are available in the George Washington University High Performance
Computing facility located on the Virginia Science and Technology campus.
This paper reviews various optimization algorithms (that could potentially be used
to determine optimum allocation) but does not perform optimization algorithm
integration. Rather, the model framework is designed to construct and export Directed
Acyclic Graph (DAG) and associated cost data required by optimization algorithm(s).
The model framework enables optimization algorithm implementation within the
framework or integration external to the model framework.
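As an illustration of the kind of DAG and cost export described above (the node names, costs, and export structure are hypothetical, not the framework's actual format), a thread-level DAG can be built and ordered with standard-library Python:

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: return nodes in dependency order,
    raising ValueError if the graph contains a cycle."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph is not acyclic")
    return order

# Hypothetical thread-level DAG with illustrative per-node costs,
# packaged in the general shape an optimization algorithm consumes.
nodes = ["read", "fft", "detect"]
edges = [("read", "fft"), ("fft", "detect")]
costs = {"read": 0.4, "fft": 1.9, "detect": 0.7}
export = {"order": topological_order(nodes, edges), "costs": costs}
```

An external optimizer would then consume the ordered node list and cost table without needing access to the system model itself.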
The executable architecture attribute model and the toy problem presented in this
paper were developed using IBM® Rational® Rhapsody® Designer for Systems
Engineers SysML profile. SysML semantics are discussed in this paper at a level
sufficient for model user comprehension. Full SysML semantics are found in (OMG 2017) or (Friedenthal
2015).
1.4 Rationale and Justification
Past efforts have addressed CR/software product level functional analysis/
allocation as a HW/SW partitioning problem (Wolf 2003). Systems engineers first
developed HW/SW Co-design analysis methods in the 1990s (Wolf 2003) that
evolved into Co-simulation methods/environments in the early 2000s (Teich 2012)
that today exist as third generation integrated Co-design environments (Teich
2012). In conjunction, there has been significant research in solving the HW/SW
Partitioning problem using various optimization algorithmic approaches (Wu
2012). All of the referenced approaches assume a single core CPU as the SW
allocation target and an Application-Specific Integrated Circuit (ASIC) (or
FPGA) as the HW allocation target.
Recent research (Campeanu 2014) has begun to address multicore CPU,
GPU, and FPGA optimal performance (or computation) allocation. Past HW/SW
partitioning approaches address only architecture allocation optimization for
performance (or computation) while not considering performance (i.e. response
time) requirements. Recent research (Sapienza 2013) begins to address multi-
attribute architecture optimization that includes requirements. The SW domain
Palladio Component Model (PCM) (Becker 2009) enables performance (or
computation) and reliability prediction modeling but does not support allocation
optimization.
None of the aforementioned approaches addresses
quantification, optimization, and simulation of computation time, energy, and thermal
architecture attributes related to MOP constraints. This paper defines an
architecture attribute model framework used to estimate thread/function execution
time, energy consumption, and generated heat. A thread/function cost (i.e.
estimate divided by MOP constraint) is computed for each attribute for use by
optimization algorithm(s).
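The thread/function cost just described (estimate divided by MOP constraint, per attribute) can be sketched in a few lines of Python. The attribute names, units, and optional weights below are illustrative assumptions; the dissertation's full cost algorithm (section 3.3.6) additionally incorporates attribute and thread/function weights.

```python
def thread_cost(estimates, mop_constraints, weights=None):
    """Sum of per-attribute costs, where each attribute's cost is
    its estimate divided by its MOP constraint, optionally weighted."""
    total = 0.0
    for attr, estimate in estimates.items():
        weight = 1.0 if weights is None else weights.get(attr, 1.0)
        total += weight * estimate / mop_constraints[attr]
    return total

# Illustrative attribute names and values (not from the dissertation).
cost = thread_cost(
    estimates={"exec_time_ms": 8.0, "energy_mJ": 40.0, "heat_mW": 120.0},
    mop_constraints={"exec_time_ms": 10.0, "energy_mJ": 50.0, "heat_mW": 200.0},
)
# Each per-attribute ratio below 1.0 indicates the estimate is
# within its MOP constraint; the sum here is 0.8 + 0.8 + 0.6 = 2.2.
```

An optimization algorithm would compare such costs across candidate allocations of the same thread to different CRs.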
Additionally, none of the approaches integrates with a system model
consistent with an MBSE methodology. The architecture attribute model
developed in this paper maintains system model consistency during architecture
attribute analysis through extension of functional and physical architectures.
These model extensions improve architecture analysis visualization, making
analyses easier to understand. Integration of optimization and simulation interfaces with
the framework ensures analysis consistency with the attribute estimates and values
that form the design trade space. In addition, architecture selection and allocation
decisions are easily documented within the model framework maintaining
consistency with other decision analysis artifacts.
1.5 Relevance/Importance/Motivation
INCOSE’s Systems Engineering Vision 2025 (Beihoff 2014) identifies
“Virtual Engineering Part of The Digital Revolution” as one of the areas of “The
Future State of Systems Engineering”. The future state is defined as follows:
“Formal systems modeling is standard practice for specifying, analyzing,
designing, and verifying systems, and is fully integrated with other engineering
models. System models are adapted to the application domain and include a
broad spectrum of models for representing all aspects of systems. The use of
internet-driven knowledge representation and immersive technologies enable
highly efficient and shared human understanding of systems in a virtual
environment that span the full life cycle from concept through development,
manufacturing, operations, and support.”
The motivating technology for Virtual Engineering is Model-Based
Systems Engineering (MBSE). This paper proposes an MBSE architecture
attribute analysis method and executable architecture attribute model framework
that exhibit the capabilities defined by Vision 2025 as follows:
• “Tool suites, visualization and virtualization capabilities will mature to
efficiently support the development of integrated cross-disciplinary analyses
and design space explorations and optimizations, comprehensive
customer/market needs, requirements, architecture, design, operations and
servicing solutions.” This paper presents system model extensions that
encapsulate a computer resource virtualization environment consisting of
computation, energy, thermal, and reliability models, improving systems
engineering communication with computer, electrical, and mechanical
cross-discipline analysts. This paper also presents system model extensions that
facilitate design space exploration of functional algorithm formulation,
algorithm data processing size, and computer resource architecture (sCPU,
mCPU, GPU, GPGPU, FPGA, etc.), capacity (number of cores, number of threads,
number of gates, etc.), and clock speed. System model extensions also introduce
a multi-attribute node cost that enables multi-attribute optimization analysis.
• “Model-based approaches will move engineering and management from paper
documentation as a communications medium to a paperless environment, by
permitting the capture and review of systems design and performance in
digital form.” The system model extensions in this paper capture systems
design through architecture attribute analysis, trade space exploration, and
optimization analysis. Performance is captured by simulation of algorithm
computation execution, energy consumption, and heat generated for alternative
computer resource solution(s).
• “Model-based approaches will enable understanding of complex system
behavior much earlier in the product life cycle.” Product-based systems (e.g.
embedded systems) are increasing in complexity (Shah 2010) including computer
resource architecture. Multiple disciplines are required to support computer
resource systems analysis including software, computer, electrical, and
mechanical engineering. This paper describes methods and multi-view models
that enable effective and efficient multi-disciplinary analysis and synthesis of
component-level application algorithm functional and computer resource
physical architectures.
Figure 1-2 INCOSE MBSE Roadmap (Shah 2010)
Figure 1-2 shows the INCOSE vision for MBSE capability maturation
primarily through the 2010 decade. The research embodied by this thesis directly
supports the “Architecture model integrated with Simulation, Analysis, and
Visualization” capability shown in Figure 1-2.
1.6 Research Contributions to the Systems Engineering Body of Knowledge
Architecture analysis at the component-level (i.e. CR/SW product) requires the
systems engineer to address multi-disciplinary (software, computer, mechanical,
reliability, etc.) concerns to determine optimum architecture allocation to the next level.
Optimization algorithms for HW/SW partitioning have been developed to support
allocation decisions. These algorithms require computation of cost for each function or
thread (section 2.6). This research develops a unique cost algorithm (section 3.3.6) that
integrates multiple attribute (performance, energy, thermal) estimates, quality
requirements (i.e. Measure of Performance (MOP) constraints), attribute weights, and
thread/function weights (section 3.3.2). The key to successful application of the cost
algorithm is accurate attribute estimates. Attribute estimates must support both
thread-level and function-level estimates. Estimates must also support heterogeneous CR options
that include multi/many core CPUs, GPUs, GPGPUs, MPSoCs, FPGAs, and so on. This
research develops a unique model to estimate execution time (i.e. performance attribute)
for single-core CPUs. This research develops Statistical Performance Models (SPMs) for
several arithmetic operations typically used in signal processing applications. The SPMs are
integrated with algorithm computation requirements and CPU clock speed to compute
algorithm execution time estimates. This research also uses arithmetic operation SPMs to
produce simulated streams of algorithm execution times that can be used for simulation
analysis.
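As a hedged sketch of how arithmetic-operation SPMs might combine with algorithm computation counts and CPU clock speed, the following Python fragment uses invented cycle statistics and function names for illustration only; they are not the dissertation's actual SPM values:

```python
import random

# Illustrative (mean cycles, std dev) per operation, standing in for
# the statistically derived SPMs of a particular processor family.
SPM_CYCLES = {"add": (1.0, 0.1), "mul": (3.0, 0.3), "div": (20.0, 2.0)}

def estimate_exec_time(op_counts, clock_hz):
    """Mean execution time: total (count x mean cycles) / clock speed."""
    cycles = sum(n * SPM_CYCLES[op][0] for op, n in op_counts.items())
    return cycles / clock_hz

def simulate_exec_times(op_counts, clock_hz, runs, seed=0):
    """Draw per-run cycle counts from the per-operation normal models
    to produce a simulated stream of execution times."""
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        cycles = sum(n * rng.gauss(*SPM_CYCLES[op])
                     for op, n in op_counts.items())
        samples.append(cycles / clock_hz)
    return samples
```

For example, an algorithm requiring 1000 additions and 500 multiplications on a 1 GHz clock would yield a mean estimate of 2500 cycles, i.e. 2.5 microseconds, while the simulated stream supplies variability for downstream simulation analysis.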
These cost and estimation algorithms are integrated into new function and
physical architecture model extensions to the system model using SysML. These model
extensions introduce model-based attribute estimation capabilities to the system model.
Integration of the cost algorithm into the System Model enables further integration of
optimization algorithms into the System Model facilitating total architecture analysis
within the system model context. This research then builds on the performance attribute
model extensions to define a complete model framework for performance, energy, and
thermal attributes along with sCPU, mCPU, and GPU CRs.
The algorithms and models developed by this research enable the systems
engineer to perform architecture attribute analyses, share attribute estimation and
simulation data with specialty engineers, and make better informed design decisions
resulting in a robust design architecture. Additionally, the systems engineer is able to
perform model-based analysis activities that: 1) construct a design trade space, 2)
perform optimization analysis of the design trade space, 3) perform simulation analysis
associated with the design trade space, 4) provide detailed design data to trade studies,
and 5) determine the size of data to be processed by selected algorithms. Current
literature indicates that these activities are performed external to the system model using:
1) domain specific modeling environments, 2) unique software or scripts, or 3) Excel
spreadsheets.
1.7 Research Approach
MBSE framework elements developed by this research are shown in
Figure 1-3. ‘Component Function Modeling’ encapsulates research workflow
steps that produced functional architecture model extensions. ‘Performance Modeling’
encapsulates research workflow steps that produced functional and physical architecture
model extensions to introduce speed, energy, and thermal attributes that support
optimization analysis and simulation analysis. ‘Quantitative Performance Modeling’
research workflow steps developed arithmetic operation SPMs, estimation algorithms,
cost algorithms, and definition of SysML model elements required to integrate SPMs
through MATLAB Simulink model interfaces. The ‘Optimization Analysis Interface’
introduced model constructs and parameters that enable integration with SW/SW and
HW/SW optimization algorithms. ‘Simulation Analysis Interface’ introduced model
constructs that enable “implementation of” or “integration with” simulation analysis
tools.
Figure 1-3 MBSE Framework for Architecture Attribute Overview
This research effort focused on development of the performance (i.e. execution
time) architecture attribute to estimate execution time on sCPUs. A toy problem was
used to identify a single computational thread for analysis, performance estimation, and
proof-of-concept testing on two different sCPU computer systems. Future work will
demonstrate the breadth of applicability of the framework to other platforms and
attributes (energy [i.e. energy consumption], thermal [i.e. heat produced], reliability,
etc.).
1.8 Organization
Section 2 presents a literature review addressing the multi-dimensional challenge
facing the introduction of component level architecture analysis using model-based
architecture attributes (e.g. performance, energy, thermal, reliability) into existing MBSE
methodologies. First, the component system decomposition level is introduced to the
reader. Current definitions of component level architecture views are reviewed from the
literature. Component physical architecture is introduced with associated attributes
(performance, energy, thermal, reliability) and attribute models. SysML is reviewed at a
user level from the perspective of architecture development. Finally, several current
MBSE methodologies are reviewed with respect to architecture development. Section 3
introduces the architecture attribute development process, defines executable extensions
to application logical and physical architectures for performing architecture attribute
estimates, and describes the development of quantitative computation models that support
performance estimation. Section 4 presents a nominal example to demonstrate proof-of-
concept and discusses associated results. Section 5 provides research conclusions and
recommendations for future research.
Chapter 2 - Literature Review
This literature review investigates fundamental concepts associated with industry
standard practices regarding component-level architecture definition. Aspects discussed
include system structure, architecture views, architecture attributes, and architecture
development methods. Traditional system design methods for HW/SW co-design have
evolved over the past two decades (Wolf 2003) (Teich 2012) focusing primarily on the
function performance attribute. Traditional optimization methods for HW/SW
partitioning have used cost as the basis for optimization. Cost has been primarily based
on function performance. Optimization algorithms have used various flow graph
techniques to represent the analysis model.
2.1 System Architecture Structure
Developers of product systems (BKCASE 2017) have utilized one of three
popular standards: ISO/IEC/IEEE 15288, ISO/IEC 24748-4-2016 (formerly IEEE Std
1220-2005), or ANSI/EIA 632. Harmonization of these standards is on-going as
described by Roedler (Roedler 2012), where ISO/IEC/IEEE-15288 is responsible for
defining the process requirement framework, ISO/IEC/IEEE 24748-4 defines SE planning
activities, and ANSI/EIA-632A defines detailed process descriptions. These standards, in
their current configuration, define three concepts for system structure. The system
structure (i.e. decomposition) defined by ISO/IEC/IEEE 15288 is shown in Figure 2-1.
Figure 2-1 System-of-Interest Structure (IEEE 2008)
ISO/IEC/IEEE 26702 defines system breakdown structure (Figure 2-2) and
ANSI/EIA-632 defines system structure using building blocks (Figure 2-3).
Figure 2-2 IEEE-1220-1998 System Breakdown Structure (ISO 2016)
Figure 2-3 System Structure Example using EIA-632 Building Blocks (SAE 2014)
Buede (Buede 2016) defines a system component as an element of the physical
architecture which receives system function allocations. Buede (Buede 2016) states that a
component can represent SW/HW integration, specific HW element, specific SW
element, people, facilities, or a combination of these elements similar to Figure 2-2. This
dissertation will build on the Buede component definition, as all algorithms will be
allocated to one or more CRs executing associated software2.
2.2 Component Architecture Views
ISO/IEC/IEEE 15288 (IEEE 2008) influences evolution of the SEBoK (BKCASE
2017) which defines the notion of a System Architecture for a System of Interest (SoI).
The SEBoK defines system architecture as “abstract, conceptualization-oriented, global,
2 The option to allocate to firmware is introduced with the FPGA and other specialty processors (e.g.
MpSOC, DSP) and is not covered by this research effort.
and focused to achieve the mission and life cycle concepts of the system” (BKCASE
2017). The SEBoK also states that system architecture “focuses on high-level structure in
systems and system elements” (BKCASE 2017). The architecture exhibits heuristics that
organize into four domains: static, dynamic, temporal, and environmental (Maier 2009)
where:
• Static Domain - encapsulates physical structure and physical interfaces
• Dynamic Domain - encapsulates logical structure including functions, functional
interactions, reactions to events, and effectiveness (e.g. performance)
• Temporal Domain – encapsulates temporal execution characteristics of functions
both cyclic and acyclic
• Environmental Domain – encapsulates system enablers (i.e. production, logistics
support), safety, climatic, and electromagnetic.
The SEBoK (BKCASE 2017) defines a logical architecture (mapping of Dynamic
and Temporal domains) and a physical architecture (mapping of Static domain). The
SEBoK (BKCASE 2017) decomposes the logical architecture into functional
architecture, behavioral architecture, and temporal architecture views. These architecture
views are developed at each decomposition level (i.e. system, subsystem, component).
This research will build on component-level logical architecture extensions to the
dynamic domain through the addition of logical attributes (e.g. number of computations
for various computation types). This research also extends the physical architecture
through addition of a temporal domain to support execution time calculations and an
environmental domain to support calculations of consumed energy, generated heat, and so on.
2.3 Component Physical Architecture
Buede (Buede 2016) defines a component as “a subset of the physical realization
(and the physical architecture) of the system to which a subset of the system’s functions
has been (or will be) allocated.” Components can be hardware, software, people,
facilities, etc. Traditional embedded systems define component level physical
architecture as one CPU and one Reconfigurable Unit (i.e. Application Specific
Integrated Circuit (ASIC)) (Figure 2-4) (Wolf 2003). Allocation to software in this
architecture is to the CPU. Allocation to hardware is to the Reconfigurable Unit (e.g.
FPGA).
Figure 2-4 Traditional HW/SW Partitioning Target Architecture (Wolf 2003)
Modern Embedded Systems (and Software and Information Technology Systems
(Maier 2009)) form a component physical architecture through various combinations of
heterogeneous computer resources (Campeanu 2014) (Figure 2-5).
Figure 2-5 Modern HW/SW Partitioning Physical Architecture
Allocation to SW in this architecture is to a sCPU, mCPU, GPU, or GPGPU.
Allocation to HW in this architecture is a reconfigurable unit (e.g. FPGA). This research
defines a CR (i.e. sCPU, mCPU, GPU, or GPGPU that executes associated software) as
the allocation target.
Research has been conducted to model performance, energy consumption, heat
generated, and reliability for sCPU, mCPU, and GPU resources as discussed in the
following sections.
2.3.1 Performance Models
A system’s functional semantics (i.e. algorithms) are formally represented by a
Model of Computation (MOC) (Fernandez 2009). The MOC defines rules that govern the
execution, interaction, and complexity associated with a group of connected
computational elements (Savage 1998). MOC rules dictate computational element
analysis, synthesis, simulation, and verification methods. MOCs have been developed at
different levels of abstraction (e.g. logic circuits, machines, languages, etc.) (Savage
1998) and domains (state, dataflow, event, etc.) (Meyerowitz 2008).
Figure 2-6 Embedded Microprocessor MOC Levels of Abstraction (Meyerowitz 2008)
Figure 2-6 identifies four levels of MOCs that will be referenced in the following
sections. Microarchitecture refers to the representation level of microprocessor
implementation. Timing refers to timing granularity. Typical model speeds are provided
for each level. RTL models simulate a gate level MOC. Cycle-accurate models simulate
cycle-level program execution on target microarchitecture. Instruction level models
simulate instruction counts without the target microarchitecture. Algorithm level models
simulate a compiled application executing at native speed on a host system (Meyerowitz
2008).
2.3.1.1 Single Computer Resource MOCs
The logic circuit is an RTL-level MOC that can be represented as a Directed
Acyclic Graph (DAG). DAG nodes (i.e. vertices) model gates and DAG flows model
gate output-input connections (Savage 1998). FPGAs are composed of logic circuits used
to implement algorithms at the HW, or more appropriately FW, level. Finite State
Machines (FSMs) (Mealy 1955) are one example of a state-based machine (Savage 1998)
MOC at the RTL level. FSMs can be connected to form a single FSM. Harel (Harel 1990)
introduced hierarchy, concurrency, and timing extensions to FSMs at the software level.
The Random-Access Machine (RAM) is an FSM that models a general-purpose computer
(Savage 1998). The RAM models a CPU FSM connected to a Random Access Memory
FSM. The RAM is another example of an RTL-level MOC.
Dataflow MOCs feature stateless functional transformation of input data to output
data at the algorithm level. Dataflows have firing rules determined by input channel
tokens that are characterized as follows:
• Single-Rate Data-Flow (SRDF), or Marked Directed (Commoner 1971),
graph: One token is consumed and produced on each graph edge
(Commoner 1971) for each graph vertex (or node) firing.
• Multi-Rate Data-Flow (MRDF): Greater than one token is consumed or
produced for each node (or actor) firing (Schaumont 2013).
• Dynamic Data-Flow (DDF): Token consumption and production depend
on conditional values that may not be known at compile time. Node firing
is based on Boolean valued tokens (modeled using select and switch
operators) (Buck 1993).
• Cyclostatic Data-Flow (CSDF) (Bilsen 1996): Token consumption and
production follow a corresponding firing sequence shown in Figure 2-7. In
CSDF each task Vj possesses an execution pattern, fj(1) … fj(Pj), of length
Pj. The sequence is defined as follows: on each nth firing of vertex Vj,
function fj((n − 1) mod Pj + 1) is executed. Consequently, token consumption
and production are also cyclostatic sequences. The production on edge eu by
vertex Vj is represented by a sequence xj,u(1), xj,u(2), …, xj,u(Pj) of constant
integers. The nth firing of Vj produces xj,u((n − 1) mod Pj + 1) tokens on
edge eu. In an analogous manner, vertex Vk fires when every input contains at
least yk,u((n − 1) mod Pk + 1) tokens.
Figure 2-7 Cyclostatic Dataflow (Bilsen 1996)
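The firing-sequence indexing in the CSDF definition above (the nth firing selects pattern entry (n − 1) mod Pj + 1) reduces to a one-line lookup; this Python fragment is purely illustrative:

```python
def csdf_tokens(pattern, n):
    """Tokens produced (or consumed) on the nth firing (n is 1-based)
    of a vertex with the given cyclostatic pattern."""
    return pattern[(n - 1) % len(pattern)]

# With pattern (1, 2, 0), firings 1..6 yield 1, 2, 0, 1, 2, 0 tokens.
```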
Synchronous Dataflow (SDF) (Lee 1987) is another example of a simple,
analyzable dataflow MOC. An SDF graph is a construction of synchronous actors that
describe an algorithm. Each synchronous actor consumes a specified a priori number of
input samples for each input and produces a specified number of samples for each output.
Powerful SDF techniques exist that demonstrate graph consistency, determine memory
requirements (Moreira 2010), and enable execution scheduling on single or multiple
CPUs (Lee 1987) to construct a deterministic solution. A fully constructed SDF
implements a cycle-accurate model.
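The graph-consistency property mentioned above can be made concrete: for each SDF edge whose source actor produces p tokens and whose sink consumes c tokens per firing, the actor repetition counts must satisfy r_source × p = r_sink × c. The following Python sketch (the function name and graph encoding are assumptions, not taken from (Lee 1987)) solves these balance equations for a connected graph:

```python
from fractions import Fraction
from functools import reduce
from math import lcm

def sdf_repetition_vector(edges):
    """Solve the SDF balance equations r[src]*prod == r[dst]*cons.

    edges: list of (src, dst, prod, cons) for a connected graph.
    Returns the smallest integer repetition vector, or None if the
    graph is inconsistent (a sample-rate mismatch)."""
    rates = {edges[0][0]: Fraction(1)}  # seed an arbitrary actor at rate 1
    pending = list(edges)
    while pending:
        progress, rest = False, []
        for src, dst, prod, cons in pending:
            if src in rates and dst in rates:
                if rates[src] * prod != rates[dst] * cons:
                    return None  # inconsistent graph
                progress = True
            elif src in rates:
                rates[dst] = rates[src] * Fraction(prod, cons)
                progress = True
            elif dst in rates:
                rates[src] = rates[dst] * Fraction(cons, prod)
                progress = True
            else:
                rest.append((src, dst, prod, cons))
        if not progress and rest:
            raise ValueError("graph must be connected")
        pending = rest
    # Scale the fractional rates to the smallest integer vector.
    scale = reduce(lcm, (r.denominator for r in rates.values()))
    return {actor: int(r * scale) for actor, r in rates.items()}
```

For example, with edge A→B producing 2 and consuming 3 tokens, and edge B→C producing 1 and consuming 2, the smallest consistent repetition vector fires A three times, B twice, and C once per iteration.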
MOCs have been modeled via process networks, embedded in synchronous
languages, and embedded in toolsets. Dataflow Process Network (PN) (Lee 1995) MOCs
have been developed that exhibit various properties such as process concurrency (via
Kahn Process Networks (KPN) (Kahn 1974)), nondeterminism (KPN extension), stream
behavior (Gaudiot 1991), and hierarchy. Other examples of process network MOCs
include Petri Nets (Murata 1989) and Communicating Sequential Processes (Hoare
1978).
Examples of synchronous languages that embed timed MOCs include Esterel
(Berry 2000), for control dominated systems, and LUSTRE (Halbwachs 1991) and SIGNAL
(LeGuernic 1991) for dataflow dominated systems. Esterel, LUSTRE, and SIGNAL are
examples of instruction-level MOCs. The Discrete Event (DE) MOC adds the timing
concept to events. DE is the basis for the system-level language SystemC
(SOURCEFORGE n.d.) and VHDL and Verilog Hardware Description Languages
(HDLs). The SystemC DE MOC is instruction-level and VHDL/Verilog HDL DE MOC
is register-transfer level.
Giotto is a SW toolset that includes a timed MOC. The MOC utilizes “known”
Worst-Case Execution Times (WCETs) for tasks. The Giotto compiler assembles a
task/communication schedule that satisfies timing requirements (Henzinger 2001). The
Ptolemy project (Hylands 2003) provides another SW toolset that implements several
MOCs including Boolean Dataflow (BDF) (Buck 1993), SDF, DDF, Multidimensional
Synchronous Dataflow (MDSDF), and PN domains with a focus on MOC hierarchical
connection. The Metropolis Electronic System Design Environment (Balarin 2003)
features a metamodel that incorporates existing MOCs at multiple levels of abstraction.
The metamodel also accommodates addition of new MOCs.
However, none of the aforementioned MOC environments computes algorithm
execution time based on function computation characteristics. This research develops an
abstract MOC based on algorithm computations. Furthermore, this research implements
the computation model using a SysML (section 2.4) centric MBSE environment.
2.3.1.2 Multiple Computer Resources
Many of the MOCs identified in the previous section also support applications
that require parallel programs and environments. Additional stochastic analytical models
(Boyd 1994) (Tikir 2007) and statistical performance models (Asanovic 2006)
(Thomasian 1986) that estimate mCPU software performance have been developed but
are difficult to use by non-experts (Tikir 2007). The roofline model features a visual
(two-dimensional graph) performance model integrating Floating-Point (FP)
performance, “operational intensity” (FP operations per main memory operation), and
memory performance (Williams 2009). Amdahl’s Law extended to mCPUs presents a
very intuitive multicore performance model (Hill 2008). Amdahl’s Law states:
𝑆𝑝𝑒𝑒𝑑𝑢𝑝𝑝𝑎𝑟𝑎𝑙𝑙𝑒𝑙(𝑓, 𝑛) = 1
(1−𝑓)+ 𝑓
𝑛
(1)
29
where f is the fraction of an algorithm that can be parallelized, (1 − f) is the sequential portion of the algorithm, and n is the number of CPU cores. Hill (Hill 2008) introduces variations of Amdahl's Law for symmetric, asymmetric, and dynamic multi-core architectures. Though this research does not implement mCPU computational models, the intention is to use the sCPU computational model to compute an sCPU execution time and then apply the modified Amdahl's Law in (1) to compute mCPU execution time as follows:
ExecutionTime_mCPU = (1 / Speedup_parallel) × ExecutionTime_sCPU   (2)
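Equations (1) and (2) can be exercised directly. A minimal sketch with illustrative values (a 90% parallelizable algorithm, four cores, a 1.0 s single-core execution time):

```python
def speedup_parallel(f, n):
    """Amdahl's Law, Eq. (1): speedup for parallel fraction f on n cores."""
    return 1.0 / ((1.0 - f) + f / n)

def execution_time_mcpu(t_scpu, f, n):
    """Eq. (2): mCPU execution time from the sCPU execution time."""
    return t_scpu / speedup_parallel(f, n)

# f = 0.9, n = 4: speedup = 1 / (0.1 + 0.225) ≈ 3.08,
# so a 1.0 s sCPU task takes ≈ 0.325 s on the mCPU.
speedup = speedup_parallel(0.9, 4)
t_mcpu = execution_time_mcpu(1.0, 0.9, 4)
```

Note that as n grows, speedup is bounded by 1/(1 − f), which is why the sequential fraction dominates multicore performance estimates.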
GPU performance models target either the low level (i.e. instruction pipeline or shared/global memory access) (Zhang 2011) or the program control level (Cui 2012). One example abstract GPU computation model (Hong 2009) computes function execution time using arithmetic intensity and memory access time. A CPU-GPU data transfer performance model (Van Werkhoven 2014) can be used to estimate interface transfer times when a function executing on the CPU (GPU) is followed by a function executing on the GPU (CPU). Further research is required to select or develop an appropriate abstract GPU computation model.
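A simplified host-device transfer estimate, in the spirit of the linear models underlying (Van Werkhoven 2014), treats transfer time as a fixed link latency plus size divided by bandwidth. The latency and bandwidth figures below are assumed placeholders, not measured values:

```python
def transfer_time(bytes_moved, latency_s, bandwidth_bps):
    """Linear CPU-GPU transfer model: fixed latency plus size/bandwidth.
    A deliberate simplification; real links exhibit piecewise behavior."""
    return latency_s + bytes_moved / bandwidth_bps

# 64 MiB over a hypothetical 10 GB/s link with 10 microsecond latency.
t = transfer_time(64 * 2**20, 10e-6, 10e9)
```

Such a model lets an architecture analysis charge an interface cost whenever consecutive functions are allocated to different processing resources.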
The architecture attribute model integrates a dataflow MOC with the functional architecture extensions. A dataflow MOC and SPM are integrated with the physical architecture extensions for arithmetic computations, as presented in section 3.6.
2.3.2 Energy/Power/Thermal Models
Hundreds of single-core, multicore, and GPU energy, power, and thermal models have been developed over the last five years.
Study of these models will require considerable research to determine which are
applicable for use in the architecture attribute model.
2.4 Modeling Languages for Systems
The SEBoK (BKCASE 2017) identifies six prevalent modeling languages used to
build descriptive system models:
1. Functional Flow Block Diagram (FFBD)
2. Integration Definition for Functional Modeling (IDEF0)
3. Object-Process Methodology (OPM)
4. Systems Modeling Language (SysML)
5. Department of Defense Architecture Framework (DoDAF)
6. Web Ontology Language (OWL)
Modeling languages 1-4 are pertinent to this paper. Each is presented below with details relevant to the selection of a modeling language.
2.4.1 Functional Flow Block Diagrams (FFBDs)
Functional Flow Block Diagrams (FFBDs) were introduced in the late 1950s by TRW Corporation (Oliver 1997). FFBDs are multi-tiered, order-sequenced diagrams of the flow of system functions, but they do not represent time duration within or between functions (NASA 2007). FFBDs are generated during functional analysis and define the “what” for each functional event (NASA 2007). Functional decomposition is the method used to decompose higher level, more abstract FFBDs into lower level, more detailed FFBDs across multiple tiers. The resulting functional hierarchy represents the functional architecture. Original FFBD syntax is used to depict sequence, concurrency (“AND”), selection (“OR”), and iteration (Oliver 1997).
TRW developed enhancements to FFBDs to make them executable (Oliver 1997). First, data was added to FFBD flows (Alford 1977). Behavior Diagrams (BDs) then introduced additional flow notations, graph MOCs, and hierarchical control concepts (Alford 1992) to produce executable EFFBDs. SysML Activity Diagrams subsume EFFBDs, as discussed in section 2.4.4.
2.4.2 Integration Definition (IDEF) Language
The IDEF language modeling methodology originated in the 1970s via the USAF Integrated Computer Aided Manufacturing (ICAM) program. IDEF models were defined in three domains:
• IDEF0: Syntax and semantics to produce a system function (activity, process, operation, action) structural representation (functional model), function relationships, and the data required to integrate identified functions
• IDEF1: Syntax and semantics to produce a system information structural representation (information model), information complexity, and an application-independent information view that can be transformed into a database design
3 IDEF0 was originally introduced as Federal Information Processing Standard (FIPS) 183 in December 1993. IEEE subsumed FIPS 183 and introduced it as IEEE Std 1320.1-1998 (IEEE Standard for Functional Modeling Language – Syntax and Semantics for IDEF0). ISO/IEC then subsumed IEEE Std 1320.1-1998 and introduced it as ISO/IEC/IEEE 31320-1 (Information Technology – Modeling Languages – Part 1: Syntax and Semantics for IDEF0) in 2012.
4 IDEF1 was originally introduced as FIPS 184 in December 1993. IEEE subsumed FIPS 184 and introduced it as IEEE Std 1320.2-1998 (IEEE Standard for Conceptual Modeling Language Syntax and Semantics for IDEF1X/Sub 97). ISO/IEC then subsumed IEEE Std 1320.2-1998 and introduced it as ISO/IEC/IEEE 31320-2 (Information Technology – Modeling Languages – Part 2: Syntax and Semantics for IDEF1X97) in 2012.
• IDEF2: Syntax and semantics to represent system behavior over time (dynamics model), specifically the behavior of manufacturing system resources.
IDEF0 is based on the Structured Analysis and Design Technique™. IDEF0 and SADT™ produce hierarchically decomposed activity and data models (Ross 1977) using activity boxes, as shown in Figure 2-8. These models are used to build a system functional architecture. IDEF0 (SADT) models are not executable because they do not fully specify system behavior.
Figure 2-8 IDEF0 Activity Box (IEEE 2012)
2.4.3 Object-Process Methodology (OPM)
OPM encapsulates both a modeling language (graphics and natural language) and a model development methodology (Dori 2002). OPM holistically integrates structure and behavior in a single model (Dori 2006). The methodology produces OPM diagrams that represent structure with object (i.e. function) entities and behavior with process and state entities (Dori 2006). OPM diagram entities are interconnected with links and triggers. OPM defines several links: input, output, consumption, result, state-specified consumption, and state-specified result. OPM diagram entities contain sufficient semantics to simulate individual diagram behavior.
5 IDEF2 did not continue
2.4.4 System Modeling Language (SysML)
The development of SysML was initiated by a UML™ for Systems Engineering
Request for Proposal (OMG 2003) promulgated by the Object Management Group
(OMG) in March 2003. SysML was conceived as an extension of the Unified Modeling
Language (UML) as shown in Figure 2-9. SysML Partners presented a technical
approach along with language features for SysML (Friedenthal 2004) at the 2004
INCOSE International Symposium that represented a response to the UML for SE RFP.
Over the next two years other competing SysML specifications were proposed to the
OMG. In 2006 a development team from more than ten companies (Balmelli 2007)
merged the competing specifications to form SysML 1.0. The current version of SysML
available from OMG is 1.5 (OMG 2017).
Figure 2-9 SysML 1.0 Relationship To UML 2.0 (Balmelli 2007)
The foundational viewpoints of SysML are Structure, Behavior, Requirements,
and Parametrics (Friedenthal, Moore and Steiner 2015) as shown in Figure 2-10.
Figure 2-10 Foundational Pillars of SysML (OMG 2018)
Figure 2-11 shows a taxonomy of SysML diagrams organized around four
viewpoints (Structure, Behavior, Specification, Parametric) suggested by the SysML
specification (OMG 2017). Requirements and Parametric diagrams are unique to SysML
(OMG 2017) (Friedenthal, Moore and Steiner 2015). Block Definition Diagrams are
modified from UML 2.0 Class diagrams (Friedenthal, Moore and Steiner 2015). Internal
Block Diagrams are modified from UML 2.0 Composite Structure diagrams (Friedenthal,
Moore and Steiner 2015). Activity Diagrams are modified to account for the differences
in Activity Modeling between SysML and UML 2.0 (Bock 2006). Package, Use Case,
Sequence, and State Machine diagrams are used as in UML 2.0.
Figure 2-11 Modified SysML Diagram Taxonomy (Roedler 2012)
SysML diagrams consist of model elements (e.g. blocks, activities, states, etc.)
representing graphic nodes and model elements (e.g. associations, dependencies, links,
etc.) representing interconnection paths (OMG 2017). Each of the nine SysML diagrams
supports a subset of graphic and interconnection model elements. The diagrams pertinent
to this paper are summarized below: (Friedenthal, Moore and Steiner 2015) (OMG 2017)
• Package Diagram (PD) – organizes a system model using the package
model element. Packages encapsulate model elements in a viewpoint,
view, domain, or namespace. Package diagrams are used to describe
package relationships.
• Block Definition Diagram (BDD) – defines block structure elements with
associated composition and classification relationships. Blocks support
port and flow interfaces.
• Internal Block Diagram (IBD) – defines block part structure elements with
associated interfaces and interconnections. Interfaces include Full Ports
and Proxy Ports that support provided and required features.
Interconnections are supported by links that connect ports and flows.
• Activity Diagram (AD) – defines the execution order of actions depending
on availability of action inputs, controls, and outputs. This flow-based
behavior models how action inputs are transformed into action outputs.
SysML was selected for model development over other modeling languages
because:
• SysML provides the capability to model executable functional architecture extensions that compute functional attributes (e.g. number of computations, energy efficiency, etc.)
• SysML provides the capability to define a physical architecture with executable extensions that compute physical attributes (e.g. execution time, consumed energy, generated heat, etc.)
• SysML blocks facilitate integration of external models (e.g. MATLAB® Simulink), tools, and environments. This research imports quantitative computation MATLAB® Simulink models for execution time attribute computation.
2.5 MBSE Methodologies and Architecture Definition
This section begins with a discussion of the HW/SW co-design concept that pre-
dates MBSE. The section then summarizes four leading MBSE Methodology (Estefan
2008) approaches to architecture definition focused at the component level. The MBSE
methodologies are:
• IBM® Rational® Harmony Systems Engineering (Harmony-SE)
• Object-Oriented Systems Engineering Method (OOSEM)
• Vitech Model-Based Systems Engineering (MBSE)
• IBM Rational Unified Process for Systems Engineering (RUP-SE)
2.5.1 HW/SW Co-Design
HW/SW Co-design has been used in the development of electronic products
containing embedded systems6 in many application domains (e.g. mobile devices,
automobiles, home appliances, avionics, and so on).7 HW/SW co-design is defined as the
synergistic concurrent design of HW/SW to satisfy system requirements (De Micheli
1997). System designers were faced with the complex challenge of estimating and
optimizing embedded system performance for various partition alternatives. Teich
(Teich 2012) has identified four generations of evolving co-design methods and
environments from single CPU/ASIC target (thru mid-90s), co-simulation/complex
targets (thru mid-00s), co-synthesis/ heterogeneous multi-core/ASIC targets (thru early-
10s), to the present and future evolution of co-design methods for different system types
and attributes. Three sample co-design methodologies that have been used to produce
executable models are Electronic System Level (ESL) design, Platform Based Design
(PBD), and Model-based co-design.
6 Embedded Systems (ES) in these domains are implemented as Card-Based (e.g. 3U or 6U cards on
compactPCI® backplane), System-on-a-Chip (SoC), MPSoC, or Network-on-a-Chip (NoC). 7 ES domains can be classified as safety-critical, mission-critical, dependable, cyber-physical, and resilient
affecting attribute priorities.
The ESL (Teich 2012) co-design synthesis methodology consists of five steps: 1) modeling/specification, 2) performance estimation, 3) module (or node) mapping, 4) design space exploration (including optimization), and 5) automatic generation of the selected implementation. Step 1 develops an actor-oriented model that identifies
application actors with associated communication. SystemC is used to specify actor
behavior used to generate an executable specification. Step 2 builds an architecture
template that contains performance data for each actor as well as a SW and/or HW
module. The architecture template also contains all module permutations (processors,
HW modules, communication infrastructures) that satisfy overall requirements for
throughput and size. SW performance is estimated from model code transformations.
HW performance is obtained from an external tool. Step 4 uses an evolutionary multi-
objective optimization algorithm to support tradeoffs among attributes (e.g. FPGA gate
count vs throughput). ESL is supported by the SystemCoDesigner (Keinert 2009) tool.
The double roof co-design model (Teich 2012) coordinates the HW and SW implementation processes while maintaining linkage with the ESL design approach.
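The selection step in ESL design space exploration (Step 4) can be illustrated with a brute-force Pareto filter over candidate design points. This is only a sketch of the non-dominated-selection idea; tools such as SystemCoDesigner use evolutionary multi-objective algorithms rather than enumeration, and the gate-count/execution-time pairs below are invented:

```python
def pareto_front(points):
    """Return the non-dominated design points. Each point is
    (gate_count, exec_time); lower is better in both dimensions."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return front

# Hypothetical candidate implementations: (FPGA gate count, exec time in s).
designs = [(1000, 5.0), (2000, 3.0), (1500, 6.0), (3000, 2.9)]
front = pareto_front(designs)  # (1500, 6.0) is dominated by (1000, 5.0)
```

The Pareto front captures exactly the attribute tradeoff the text mentions (e.g. FPGA gate count versus throughput): no front member can be improved in one attribute without worsening the other.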
PBD is based on the “platform” concept. Sangiovanni-Vincentelli defines a
platform as a library of computation and communication components used to compose a
design at a level of abstraction (Sangiovanni-Vincentelli 2007). PBD maps functionality
to a platform instance (top-down) and builds a corresponding platform (bottom-up) by
library component selection that meets propagated performance constraints. The
“middle” defines the functional-platform interface and is described by a semantic domain
that supports mapping of functions to platforms. PBD functionality can be represented
using Hardware/Software design language(s) and/or computation model (homogeneous
or heterogeneous). PBD platform (i.e. architecture) is represented using software (e.g.
OMG Unified Modeling Language (UML)) or hardware (e.g. Transaction Level
Modeling (TLM), communication-based, or microprocessor-based) modeling techniques.
Each architecture block is assigned a cost (i.e. execution time, power consumed, etc.)
used for subsequent optimization. The Metropolis framework (Balarin 2003) supports
the PBD methodology by providing a meta-model language parser (for functional and
architecture specifications) and interfaces to various back-end tools (simulator(s),
algorithm plug-ins, logic of constraints (LOC) checker(s), and other verification tools).
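The PBD mapping step can be sketched as a library lookup: for each function, select the lowest-cost platform component whose performance meets the propagated constraint. The library contents, timings, and costs below are hypothetical illustrations of the concept, not Metropolis data:

```python
# Hypothetical platform library: per-function implementation alternatives
# with an execution time (s) and a relative cost.
LIBRARY = {
    "fft": [("sw_cpu", {"time": 8.0, "cost": 1.0}),
            ("hw_fpga", {"time": 1.0, "cost": 5.0})],
    "filter": [("sw_cpu", {"time": 2.0, "cost": 1.0}),
               ("hw_fpga", {"time": 0.5, "cost": 4.0})],
}

def map_function(name, max_time):
    """Pick the lowest-cost component satisfying the latency constraint."""
    feasible = [(comp, attrs) for comp, attrs in LIBRARY[name]
                if attrs["time"] <= max_time]
    if not feasible:
        raise ValueError(f"no feasible component for {name}")
    return min(feasible, key=lambda c: c[1]["cost"])[0]

# With a 4.0 s budget the FFT must go to the FPGA; the filter stays in SW.
```

This mirrors the top-down/bottom-up meeting point of PBD: the constraint comes down from the functional side, the component attributes come up from the platform library.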
Model-based Co-design is another approach to HW/SW Co-design in which a system model, consisting of structural, functional, and dynamic models, is constructed for an embedded system. A simulation model is then developed using virtual prototypes, implemented via the Discrete Event System Specification (DEVS) language, and mapped to a HW/SW architecture (Schulz 1998).
Current research is addressing processor/interface/memory allocation
optimization for mCPU-GPU computational resources. One SW/SW Co-design approach
proposes use of the OMG Modeling and Analysis for Real-Time and Embedded Systems
(MARTE) (OMG 2008) Unified Modeling Language (UML) profile (Campeanu 2014)
for descriptive qualitative component modeling combined with a GPU analytical model.
2.5.2 Rational Harmony for Systems Engineering (SE)
The Harmony-SE MBSE process is shown on the left side of Figure 2-12. The
process is iterated to decompose a system level to the next system level (e.g. system to
subsystem). Key process model objectives (Hoffmann 2013) are to:
• Identify required system functions
• Identify associated system states and modes
• Allocate system functions and states/modes to next level structure
These modeling objectives emphasize state-based behavior and function identification and allocation, not detailed functional behavior (Hoffmann 2013).
2.5.2.1 System Functional Analysis
System Functional Analysis transforms functional requirements into an
executable functional model (Figure 2-12) with associated function descriptions. The
functional model contains a combination of SysML Internal Block, Sequence, Activity,
and Statechart diagrams. The diagrams are developed using one of three alternatives
shown in Figure 2-12. Each alternative implements Use Case scenarios that are identified
during Requirements Analysis (Hoffmann 2013). Statecharts are the primary executable
model diagrams used for system functional analysis. Internal Block, Activity, and
Sequence diagrams are used for model execution in later stages of the design process.
Figure 2-12 Harmony-SE Functional Analysis (Hoffmann 2013)
2.5.2.2 Harmony-SE Architecture Analysis
The main objective of Design Synthesis (Figure 2-13 right side) is development of
a physical architecture (i.e. next level entities) that perform required functions within
performance constraints (Hoffmann 2013). Architecture Analysis uses Trade Studies to
select the best approach to achieve the required capability for each system function. The
process flow for Architecture Analysis is shown in Figure 2-13. Harmony-SE uses the
Weighted Objectives Method (Cross 2008) to evaluate alternatives by building a
weighted objectives table for each system function. A Weighted Objectives (Cross 2008)
calculation is performed in the Determine Solution action to arrive at the preferred
solution for each system function (Hoffmann 2013).
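The Weighted Objectives calculation can be sketched directly: each alternative's utility is the weight-scaled sum of its criterion scores, and the highest total wins. The criteria, weights, and scores below are illustrative, not from Harmony-SE documentation:

```python
def weighted_objectives(weights, scores):
    """Weighted Objectives Method (Cross 2008): score each alternative
    as the sum of weight * criterion_score over all criteria."""
    return {alt: sum(weights[c] * s for c, s in crit.items())
            for alt, crit in scores.items()}

weights = {"performance": 0.5, "cost": 0.3, "risk": 0.2}
scores = {
    "alternative_A": {"performance": 8, "cost": 6, "risk": 9},
    "alternative_B": {"performance": 9, "cost": 4, "risk": 7},
}
totals = weighted_objectives(weights, scores)
best = max(totals, key=totals.get)  # preferred solution for this function
```

In Harmony-SE this table is built once per system function, so the Determine Solution action amounts to running this calculation for every row of the trade study.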
Figure 2-13 Harmony-SE Architecture Analysis (Hoffmann 2013)
2.5.2.3 Harmony-SE Architecture Design
Whereas Architecture Analysis refines the functional architecture, Architecture Design focuses on allocation of system functions to an architectural structure. Each Use
Case Scenario defined during System Functional Analysis is evolved from a black-box to
white-box view (also known as Use Case Realization) (Hoffmann 2013). Additionally,
the next level physical architecture structure is defined with parts and interfaces
(including ports). The physical architecture is modeled with BDDs and IBDs. Function
activities and state machines are allocated to the physical architecture. Function
allocations to parts and interface collaborations are verified through model execution (Figure 2-14).
Figure 2-14 Harmony-SE Architecture Design Process (Hoffmann 2013)
2.5.3 Object-Oriented Systems Engineering Method (OOSEM)
OOSEM applies object-oriented principles at system levels but somewhat
differently than how they are applied to software development (Friedenthal, Moore and
Steiner 2015). OOSEM integrates traditional structured analysis methods with certain
object-oriented methods. OOSEM uses traditional SE process concepts such as
requirements engineering and trade studies shown in Figure 2-15. OOSEM uses
methods common to other Object-Oriented Systems Engineering (OOSE) such as Use
Cases, black/white box descriptions, and SysML, as shown in Figure 2-15. OOSEM also includes unique methods, such as logical decomposition, partitioning criteria, and node allocation, that pertain to architecture development.
Figure 2-15 OOSEM Method Pyramid (Estefan 2008)
The top-level OOSEM system development process is shown in Figure 2-16.
Each pass through the process produces specification(s) at the next level. The process is
repeated at system, system element, and component levels. The process is performed
recursively until requirements are specified for software, database, hardware, and
operational procedures. The Define Logical Architecture block (Figure 2-16) decomposes
current level logical components to next level logical components including logical
component interaction to satisfy current level requirements. The Synthesize Candidate
Physical Architectures (Figure 2-16) block allocates next level logical components next
level physical components. The Optimize and Evaluate Alternatives (Figure 2-16) block
performs design optimization and design trade studies.
Figure 2-16 OOSEM Specify and Design System Process (Friedenthal 2015)
Logical architecture definition (Figure 2-17) decomposes current level logical
components into next level logical components. Logical components abstract physical
components that satisfy required functionality without dictating implementation
constraints. Logical scenarios describe logical component interactions that realize system
element block functionality. Logical component interconnection is defined using internal
block diagrams (Figure 2-17). Initial next level logical components can again be
decomposed to repartition functionality and properties. Next level logical components
are specified in the same way as current level logical components (Figure 2-17). If a
logical component is characterized by state-based behavior, the logical component can be
specified by a state machine (Figure 2-17).
Figure 2-17 OOSEM Define Logical Architecture Process (Friedenthal 2015)
The process shown in Figure 2-18 evaluates alternative next level physical architectures that satisfy next level logical architecture requirements. The physical architecture is defined by physical
components, component relationships, and component distribution among system
elements (or nodes). Partitioning criteria (such as performance, reliability, etc.) are
defined for use in partition analysis (Figure 2-17). Logical component architecture
function, control, and persistent store elements are mapped to SysML nodes (e.g. block,
activity, etc.) to define a logical node architecture (Figure 2-18). Physical components
are mapped to SysML nodes to define a physical node architecture (Figure 2-18). At the
lowest level, logical node elements are mapped to software, persistent data, hardware, or
operator procedures (Figure 2-18). Critical component properties are identified for use in
trade study analysis to evaluate and select a refined physical architecture (Figure 2-18).
Figure 2-18 OOSEM Define Physical Architecture Process (Friedenthal 2015)
OOSEM Optimize and Evaluate Alternatives follows the flow shown in Figure
2-19.
Figure 2-19 OOSEM Optimize and Evaluate Alternatives Process (Friedenthal 2015)
A block definition diagram is used to model “define analysis context” (Figure
2-19) for trade studies as shown in Figure 2-20. Block definition and parametric SysML
diagrams are used to further elaborate the equations associated with each analysis (Figure
2-21).
Figure 2-20 Analysis Context Block Definition Diagram Example (Friedenthal 2015)
Figure 2-21 OOSEM Cost Effectiveness Analysis Parametric Model
Figure 2-21 models analysis equation(s) but is not an executable model.
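An executable counterpart to a cost-effectiveness parametric model binds parameter values and evaluates the constraint equations directly. The equation forms and numbers below are assumptions for illustration only; they are not taken from Figure 2-21:

```python
def life_cycle_cost(dev_cost, unit_cost, units, support_per_year, years):
    """Assumed life-cycle cost roll-up: development + production + support."""
    return dev_cost + unit_cost * units + support_per_year * years

def cost_effectiveness(effectiveness, lcc):
    """Assumed measure: effectiveness per unit of life-cycle cost."""
    return effectiveness / lcc

# Hypothetical binding: $2M development, 10 units at $50k,
# $100k/year support for 10 years.
lcc = life_cycle_cost(2.0e6, 5.0e4, 10, 1.0e5, 10)
ce = cost_effectiveness(0.92, lcc)
```

This is the gap the research targets: SysML parametric diagrams document such equations, but an external evaluation engine (or model import) is needed to actually compute attribute values during a trade study.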
2.5.4 Vitech Model-Based Systems Engineering
Vitech defines an MBSE development approach called STRATA™ (Long 2011)
that analyzes and decomposes a system into layers of increasing granularity as shown in
Figure 2-22. Each more detailed layer strategically converges to a final solution. A layer
integrates four domains: Requirements, Behavior, Architecture, and Verification and
Validation (V&V). V&V criteria at each layer must be met before proceeding to the next
layer.
Figure 2-22 Vitech STRATA™ MBSE Model (Long 2011)
At Layer 1, the most general layer, functions are derived from system requirements. Functions are allocated to the system architecture. Vitech provides a representative flow
and set of artifacts for system architecture development shown in Figure 2-23. The
resulting system architecture exists within an Operational (or Enterprise) architecture.
The system architecture provides a context to relate operational entities with system
entities (Long 2011).
Figure 2-23 Vitech MBSE Architecture Diagram (Long 2011)
The behavior domain is supported by diagrams that identify functions, function
control flow, and function data flow. Control flow diagrams include Functional Flow
Block Diagrams (FFBDs), Enhanced Functional Flow Block Diagrams (EFFBDs), and
Activity Diagrams (ADs). Data flow diagrams include N2 charts and Sequence
Diagrams (SDs) (Long 2011). The Layer 1 behavior domain identifies functions at the
system level. The Layer 2 behavior domain identifies functions and functional threads
(control/data flows) at the subsystem level. Subsequent layers (i.e. through Layer N)
decompose Layer 2 subsystems into more granular functions and functional threads.
The architecture domain encapsulates a physical hierarchy which receives
allocations of functions. The Layer 1 architecture identifies the system and associated
external counterparts (Long 2011). The Layer 2 architecture evaluates system
partitioning strategies using criteria such as complexity (interface and testing),
performance, technology risk, future performance, and future technology insertion.
Architecture decomposition and partition evaluation continue to Layer N.
Figure 2-24 CORE® System Design Repository (SDR) (Booth 2008)
CORE® (Figure 2-24) supports Vitech MBSE by providing a central repository,
called the System Design Repository, and tool framework that (Vitech 2013):
• Integrates requirements
• Executes behavior models
• Facilitates architecture development
• Supports verification and validation
• Produces system documentation
2.5.5 IBM RUP-SE
The IBM Rational Unified Process® (RUP®) provides an iterative lifecycle
development framework. RUP was originally intended for software development but was
extended to support systems engineering. RUP® for Systems Engineering (RUP-SE)
(Nolan 2008) instantiates Model-driven Systems Development (Balmelli 2006). RUP-SE
facilitates Architecture-centric system development using Unified Modeling Language
(UML) 2.0 semantics (Cantor 2003). The RUP-SE architecture framework defines a set
of model levels (Table 2-1), viewpoints (Table 2-2), and views (Table 2-3) consistent
with ISO/IEC/IEEE-42010 (ISO/IEC JTC 1/SC 7 2011). Views elaborate viewpoints at
each model level as shown in Table 2-3.
Model level | Expresses
Context | System black box – the system and its actors (though this is a black-box view of the system, it is a white-box view of the enterprise containing the system)
Analysis | System white box – initial system partitioning in each viewpoint that establishes the conceptual approach
Design | Realization of the analysis level in hardware, software, and people
Implementation | Realization of the design model into specific configurations
Table 2-1 RUP-SE Architecture Framework Model Levels (Cantor 2003)
Each model level encapsulates a level of specificity from abstract to concrete.
Each model level groups artifacts of similar detail. A model level does not represent a
decomposition level. Each model level can encapsulate multiple decomposition levels.
RUP-SE identifies system, subsystem, sub-subsystem, and classes as decomposition
levels.
The Analysis model level Logical Viewpoint (Table 2-2) encapsulates technology-independent functional decomposition artifacts for the system, subsystem, sub-subsystem, and so on. Similarly, the Analysis model level Distribution Viewpoint (Table 2-2) defines localities to distribute functionality. Design level models capture decisions that drive implementation. Design level models are descriptive models, not quantitative or executable models. The analysis-to-design level transition maps subsystems, localities, and classes to software, hardware, and worker designs. Supplementary (or non-functional) requirements constrain distribution choices. RUP-SE supports the concept of “design trades” in the construction of alternate design level distribution conceptual approaches that are analyzed in terms of feasibility, quality, and cost.
Viewpoint | Expresses | Concern
Worker | Roles and responsibilities of system workers | Worker activities, human system interaction, human performance specification
Logical | Logical decomposition of the system as a coherent set of SysML blocks that collaborate to provide the desired behavior | Adequate system functionality to realize use cases; system extensibility and maintainability; internal reuse; good cohesion and connectivity
Distribution | Distribution of the physical elements that can host the logical services | Adequate system physical characteristics to host functionality and meet supplementary requirements
Information | Information stored and processed by the system | Sufficient system capacity to store data; sufficient system throughput to provide timely data access
Geometric | Spatial relationships between physical systems | Manufacturability, accessibility
Process | Threads of control that carry out computational elements | Sufficient partitioning of processing to support concurrency and reliability needs
Table 2-2 RUP-SE Architecture Viewpoints (Cantor 2003)
RUP-SE employs the Object Management Group (OMG) System Modeling
Language (SysML) to model various viewpoint views. The context level logical
viewpoint view uses SysML Use Case diagrams (Table 2-3) to model actors, functions,
and provide functional descriptions at the system decomposition level. SysML block
diagrams are used to model the structural aspect of function decomposition and SysML
activity diagrams are used to model functional flow. SysML sequence diagrams are used
to model external and internal interactions. The view artifacts shown in Table 2-3 offer no place to analyze the black-box (computer resource and software) multi-attribute characteristics with the associated distribution (or allocation).
Model levels | Worker | Logical | Information | Distribution | Process | Geometric
Context | Role definition, activity modeling | Use case diagram specification | Enterprise data view | Domain-dependent views | Domain-dependent views |
Analysis | Partitioning of system | Product logical decomposition | Product data conceptual schema | Product locality view | Product process view | Layouts
Design | Operator instructions | Software component design | Product data schema | ECM (electronic control media) design | Timing diagrams | MCAD (mechanical computer-assisted design)
Implementation | Hardware and software configuration
Table 2-3 RUP-SE Sample Model Views (Cantor 2003)
2.5.6 Selected MBSE Methodology
All of the synopsized methodologies address functional (or behavioral) and
physical architecture development. All methodologies identify trade studies as a method
to evaluate various logical architecture to physical architecture allocation solutions. The
Harmony-SE methodology specifically elaborates an architecture analysis method
(Figure 2-13). The method integrates with model-based functional analysis (i.e. logical
architecture development) and model-based architecture design. The method forms the
baseline for definition of a model-based architecture analysis method.
2.6 HW/SW Partitioning Optimization
HW/SW partitioning is considered a sub-process of HW/SW Co-design where the
system designer makes function allocation decisions between CPU (SW) and FPGA (or
ASIC) (HW). All decisions are made to synthesize an optimum solution based on one or
more cost criteria (latency, area, cost, etc.). Early approaches (early 90s) placed all
functionality in HW (HW-approach) or SW (SW-approach) and partitioned to optimize
(minimize or maximize) a defined cost function (Wolf 2003). Vulcan-II (HW approach)
(Gupta 1992) partitioned a system graph model (based on data-flow graphs) into HW or
SW modules using an initial assumption of all HW modules. The system graph model is
produced from a system behavior model described using HardwareC. A module is moved
to SW upon satisfaction of timing constraint(s) and communication overhead (cost
function) minimization. Cosyma (SW approach) (Ernst 1993) partitioned an Extended
Syntax (ES) graph model (i.e. annotated control and data-flow graphs) into HW and SW
modules using an initial assumption of all SW modules. The ES model is produced from
a CX system description. Later (mid 90s) the Lyngby Co-synthesis System (LYCOS)
(Madsen 1997) approach represented functional behavior using control/data-flow graphs
overlain with a fine-grained computation model consisting of Basic Scheduling Blocks
(BSBs). Starting with all SW blocks, LYCOS uses the PACE (Knudsen 1996) dynamic
programming algorithm to map blocks to HW maximizing speedup for a defined
hardware area (i.e. maximize cost function).
There has been considerable research into the application of optimization algorithms to the HW/SW partitioning problem over the last two decades. The problem is NP-hard in terms of computational complexity, meaning that exact optimization algorithm execution time increases exponentially with the addition of analysis nodes. Optimization
algorithms analyze graph node/edge costs to optimize overall cost. Graph models
include Directed Acyclic Graph (DAG), Data Flow Graph (DFG), Control/Data Graph
(CDFG), and BSB. The TGFF (Dick 1998) tool generates pseudorandom task graphs
that implements DAGs for the purpose of performing and comparing HW/SW allocation
optimization algorithms. A sample DAG for three functions allocated to SW and HW
processing resources is shown in Figure 2-25.
Figure 2-25 Sample DAG
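A DAG of this kind can be encoded directly as node and edge cost annotations. The sketch below is illustrative only: three functions F1..F3, each with one node per candidate mapping (SW on a CPU, HW on an FPGA), with edges carrying communication costs; all numbers are invented.

```python
# Illustrative DAG encoding in the spirit of the Figure 2-25 sample:
# one node per (function, mapping) pair, annotated with an execution
# cost estimate. Edge costs model inter-function communication.

nodes = {
    ("F1", "SW"): 10.0, ("F1", "HW"): 2.0,
    ("F2", "SW"): 6.0,  ("F2", "HW"): 1.5,
    ("F3", "SW"): 8.0,  ("F3", "HW"): 3.0,
}

def edge_cost(src, dst, same=0.5, cross=2.0):
    """Hypothetical communication cost; crossing the SW/HW boundary costs more."""
    return same if src[1] == dst[1] else cross

def path_cost(mapping):
    """Total cost of the chain F1 -> F2 -> F3 under a given SW/HW mapping."""
    order = ["F1", "F2", "F3"]
    total = sum(nodes[(f, mapping[f])] for f in order)
    for a, b in zip(order, order[1:]):
        total += edge_cost((a, mapping[a]), (b, mapping[b]))
    return total

all_sw = path_cost({"F1": "SW", "F2": "SW", "F3": "SW"})  # 25.0
```

A partitioning optimizer then searches over the 2^n possible mappings (here only 8) for the cheapest assignment.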
(López-Vallejo 2003) summarizes optimization algorithms developed in the
1990s. Table 2-4 summarizes some of the many HW/SW partitioning optimization
algorithms developed since the mid-2000s, categorized by solution precision (exact
versus heuristic/near-optimal), algorithm method, graph model method, and granularity
(coarse [task level] versus fine [instruction level]). Exact algorithm classes include
integer linear and dynamic programming.8 Heuristic algorithm classes include Simulated
Annealing (SA), Evolutionary, Knapsack (KS), and genetic (sub-classed as gene-based
Genetic Algorithm (GA), meme-based Memetic Algorithm (MA), Ant Colony (AC),
Particle Swarm (PS), and Shuffled Frog Leaping (SFL)). Many GAs are further refined
by search method, including Non-dominated Sort (NS), Tabu Search (TS), and Pareto (P).
(Elbeltagi 2005) compares five evolutionary optimization algorithms (GA, MA, PS, AC,
SFL), albeit not on the HW/SW partitioning problem.

8 In the 1990s, exact optimization algorithms were developed using integer programming (Niemann and Marwedel,
Hardware/Software Partitioning Using Integer Programming 1996) and mixed integer linear programming
(Niemann and Marwedel, An Algorithm for Hardware/Software Partitioning Using Mixed Integer Linear
Programming 1997).
Reference           Solution Precision   Algorithm Method   Graph Method          Granularity
(Kuang 2005)        Exact                Linear (Integer)   DAG                   Coarse
(Banerjee 2006)     Exact                Linear (Integer)   Task Graph            Coarse
(Wu 2006)           Exact                Dynamic            CDFG                  Coarse
(Henkel 2001)       Near-Optimal         SA                 BSB                   Coarse/Fine
(Banerjee 2004)     Near-Optimal         SA                 DAG                   Coarse
(Jing 2013)         Near-Optimal         SA + Greedy        DFG                   Coarse
(Liu 2013)          Near-Optimal         SA + TS            DAG                   Coarse
(Wu 2013)           Near-Optimal         KS + TS            CDFG                  Fine
(Schlichter 2006)   Near-Optimal         Evolutionary       Specification Graph   Coarse
(Zitzler 2001)      Near-Optimal         GA + P             N/A                   Coarse
(Deb 2002)          Near-Optimal         GA + NS            N/A                   Coarse
(Mudry 2006)        Near-Optimal         GA                 Source Code           Fine
(Li 2014)           Near-Optimal         GA                 DAG                   Coarse
(Lin 2014)          Near-Optimal         GA + TS            DAG                   Coarse
(Kang 2013)         Near-Optimal         GA + PSO           DAG                   Coarse
(Yu-dong 2009)      Near-Optimal         AC                 CDFG                  Coarse
(Du 2014)           Near-Optimal         SFL                DAG                   Coarse
Table 2-4 HW/SW Partitioning Optimization Algorithm Summary
The algorithms shown in Table 2-4 are candidates for determining the optimum
allocation of logical architecture functions to a candidate physical architecture.
Inspection of the Table 2-4 Graph Method column shows that the DAG is the most
popular node model for HW/SW optimization algorithms. The architecture attribute
model is therefore constructed to integrate DAG node structures that can be exported
to optimization algorithms.9
9 Optimization algorithm integration and evaluation is not covered by this research effort.
Chapter 3 - Research Methodology
3.1 Research Method
The method employed by this research effort was to develop a framework that
includes a model-based analysis method and supporting models. The research method
was executed in four phases:
• Define Architecture Attribute Analysis method
• Develop executable Architecture Attribute Model Framework
• Develop Statistical Performance Models
• Perform Case Study
The following sections provide details of the activities and artifacts produced by
each research phase.
3.2 Component Architecture Attribute Introduction
Functional analysis at the component level develops a component-level functional
architecture. The architecture comprises a group of functional threads as discussed in
(McKean 2019). Each thread is assigned a Component Response Time (CRT) MOP
constraint. Each thread CRT decomposes to a group of thread function latency
constraints. Each thread and thread function is also assigned energy and thermal constraints.
The physical architecture identifies candidate CR solutions. Each CR solution
contains a combination of sCPU, mCPU, and GPU processing resources. This paper
replicates the thread/function structure in both the functional and physical architectures,
replacing the term "function" in the functional architecture with "function node" in the
physical architecture. In addition, function nodes are mapped to DAG nodes for
integration with optimization algorithms.
The architecture attribute model defined in this paper extends both the functional and
physical architectures (sections 3.4 through 3.6). This paper presents a detailed
performance attribute model exposition, including development of an sCPU computation
model (i.e. SPM) for mathematical operations (section 3.6.2.1). Section 3.3 introduces the
architecture attribute analysis method, detailing the modeling activities that produce
executable architecture attribute models.
3.3 Component Architecture Attribute Analysis Method
Figure 3-1 defines an architecture attribute analysis method that incorporates
modeling workflow elements required to develop an architecture attribute model (defined
in sections 3.4 and 3.5). The method replaces the existing Rational Harmony-SE
architecture analysis workflow of Figure 2-13.
Figure 3-1 Component Architecture Attribute Analysis Workflow
The model defines two execution states corresponding to two method use cases:
optimization and simulation. The model also supports two analysis modes: 'Most Likely',
where the model computes an Expected Value (EV) (e.g. expected execution time for the
performance attribute), and 'Worst Case', where the model computes a worst-case value
(e.g. Worst-Case Execution Time (WCET) (Wilhelm 2008) for the performance attribute).
The following sections provide method details for each workflow step, with model
blocks designated by bold font and block attributes by italic font.
3.3.1 Define Key Component System Functions
This workflow step is synonymous with ‘System Functional Analysis’.
Harmony-SE implements this workflow step in Figure 2-12 to produce an executable
logical architecture. The architecture consists of structural diagrams (BDDs and IBDs).
BDDs model functional decomposition. Block ports/interfaces define interaction points
between blocks. IBDs model the internal part structure of blocks. Each part contains a
behavior diagram (AD or SMD).
Figure 3-2 details various elements of a component logical architecture. Each
Use Case at the component level is composed of one or more scenarios (Pohl 2010) (or
function threads). Each function thread is composed of one or more functions, as in
Figure 3-2. For streaming applications (e.g. Digital Signal Processing) each function
encapsulates an algorithm. Algorithms can be evaluated at the thread level (for example,
spatial domain filter versus frequency domain filter) or at the individual function level
(for example, radix-2, radix-4, or split-radix Fast Fourier Transform (FFT) algorithms
(Balducci 1997)).
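Function-level algorithm alternatives can be compared by their nominal operation counts. The sketch below uses the textbook approximations for FFT complex-multiplication counts (trivial twiddle factors not removed); it illustrates how one function's algorithm choice feeds an operation-count-based performance attribute, and is not taken from the dissertation's model.

```python
import math

# Nominal complex-multiply counts for two FFT radix choices
# (standard textbook approximations).

def fft_mults(n, radix):
    stages = math.log2(n)
    if radix == 2:
        return (n / 2) * stages        # radix-2: (N/2) * log2(N)
    if radix == 4:
        return (3 * n / 8) * stages    # radix-4: (3N/8) * log2(N)
    raise ValueError("unsupported radix")

assert fft_mults(1024, 2) == 5120.0    # 512 * 10
assert fft_mults(1024, 4) == 3840.0    # 384 * 10
```

For a 1024-point transform the radix-4 variant needs roughly 25 percent fewer complex multiplies, which is exactly the kind of difference a function-level trade would surface.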
Figure 3-2 System Function Definition Artifacts
3.3.2 Assign Attribute/Thread Weights
This process step assigns weights for Performance, Energy, and Thermal
architecture attributes (see AttributeWeight block in Figure 3-5). Architecture attribute
weights are uniformly applied to all functions of all threads. The weights are assigned to
each attribute based on the system domain. The performance attribute would be most
prominent in streaming applications (e.g. signal processing) embedded in sensor, combat,
command and control and other similar systems that do not have energy or thermal (i.e.
cooling) constraints. The energy attribute would be most prominent for smart phone
applications and other energy constrained (e.g. battery-operated) systems. The thermal
attribute would be most prominent in automotive and other high thermal environment
systems.
One approach to determining attribute/thread weights is the swing weight
matrix (Parnell 2009), as discussed in McKean et al. (McKean 2019). Another approach
is the Weighted Objectives Method (Cross 2008), which uses relative weights among
architecture attributes and thread execution frequency. Figure 3-3 illustrates an example
configuration of weighted architecture attributes for a system where performance is
determined to be the most important attribute, thermal is valued at 60 percent relative to
performance, and energy is valued at 20 percent relative to performance. Attributes do
not have to be placed at the top and bottom of the scale. Weights are numbered from 1 to
10 (set to the reciprocals of the values shown in Figure 3-3) when used to compute
minimum cost (see section 3.3.6), and from 10 to 1 when used to compute maximum cost.
Figure 3-3 Sample Architecture Attribute and Thread Weights
This workflow step also assigns thread weights (see ThreadWeight block in
Figure 3-5) for each thread frequency (i.e. Very High, Normal, etc.). Each thread is
uniquely assigned a thread weight (Thread_X_Weight in Figure 3-5). Threads are
organized into Very High, High, Normal, and Seldom thread frequency. Each category is
further sub-divided into normal and abnormal (or failure) threads (Carson 2013). Thread
categories are weighted similarly to architecture attributes. Figure 3-3 shows an example
set of thread frequency weights. Specific architecture attribute and thread frequency
weights are determined uniquely for each system domain based on system requirements
and environment.
The swing matrix method (McKean 2019) is preferred for applications that have a
large number of function threads and/or functions that are grouped by importance and
variation. The weighted objectives method is preferred for applications that have a small
number of function threads and/or functions or where there is a desire to uniquely weight
each function thread and/or function attribute.
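The weighted-objectives idea above can be sketched as a simple scaling step: relative attribute weights (performance most important, thermal at 60 percent and energy at 20 percent of performance, as in the Figure 3-3 example) are mapped onto a 1-10 range for minimum-cost computation. The linear scaling used here is one plausible reading, not the dissertation's exact procedure.

```python
# Map relative attribute weights onto the 1-10 range used for
# minimum-cost computation (illustrative scaling only).

relative = {"performance": 1.0, "thermal": 0.6, "energy": 0.2}

def scale_weights(rel, lo=1.0, hi=10.0):
    """Linearly map relative weights (0..max] onto [lo, hi]."""
    top = max(rel.values())
    return {k: lo + (hi - lo) * v / top for k, v in rel.items()}

w = scale_weights(relative)  # performance maps to 10.0; the others scale below it
```

Thread frequency weights (Very High through Seldom) would be produced the same way from their own relative ordering.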
3.3.3 Define Candidate Physical Architecture Solutions
This workflow step identifies a set of CR alternative solutions that form a
physical trade space for analysis. Each CR solution instantiates the
Candidate_X_PhysicalArchitecture block in Figure 3-5. Each CR solution is defined
as a NumberCpus (0 if no multicore or >1) operating at CpuClockFrequency (in GHz)
and NumberGpuThreads (0 if no GPU) operating at GpuClockFrequency (in GHz). A
single CPU configuration is included for every CR alternative operating at
CpuClockFrequency. Model specifics are provided in section 3.4.
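The CR solution parameters above can be captured as a small record. The field names paraphrase the block attributes and the `resources` logic is a plausible reading of the text, not the model's implementation.

```python
from dataclasses import dataclass

# Minimal record mirroring Candidate_X_PhysicalArchitecture parameters.

@dataclass
class CandidatePhysicalArchitecture:
    number_cpus: int          # 0 if not multicore, otherwise > 1
    cpu_clock_ghz: float      # CpuClockFrequency
    number_gpu_threads: int   # 0 if no GPU
    gpu_clock_ghz: float      # GpuClockFrequency

    def resources(self):
        """Processing resources offered by this CR solution."""
        res = ["sCPU"]        # a single-CPU configuration is always included
        if self.number_cpus > 1:
            res.append("mCPU")
        if self.number_gpu_threads > 0:
            res.append("GPU")
        return res

cr = CandidatePhysicalArchitecture(4, 2.4, 1024, 1.1)
assert cr.resources() == ["sCPU", "mCPU", "GPU"]
```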
3.3.4 Model Function Attributes
This workflow step adds executable architecture attribute model elements that
extend each logical architecture (Figure 3-6 left side) function. Model elements are
added to each function of all function threads. The model currently implements the
performance attribute. Specific model details are provided for the performance attribute
in section 3.6.1.
3.3.5 Model Physical Architecture Attributes
This workflow step adds executable architecture attribute model elements that
extend each function-function node (Figure 3-6 right side) pair for all CR architecture
(SC, MC, GPU) elements for all supported architecture attributes. The model currently
implements the performance attribute. Specific model details are provided for the
performance attribute in section 3.6.2.
3.3.6 Compute Attribute Cost
This workflow step computes thread and function node costs (NOTE: Costs are
NOT monetary, but computed values for use in optimization algorithms as discussed in
section 2.6). Computed costs will be minimum or maximum costs as dictated by the
optimization algorithm. The architecture attribute model supports both minimum and
maximum cost computations. The cost algorithms defined in this section compute
minimum costs. Each attribute cost computation is patterned after computation of a
Technical Performance Measure (TPM)10. Attribute and thread weights provided from
section 3.3.2 must be consistent with minimum (or maximum) cost computations.
10 TPM definition from the SEBoK (BKCASE Editorial Board 2017) is “Measures of attributes of a system
element within the system to determine how well the system or system element is satisfying specified
PhysArch_X_Thread_Y blocks (Figure 3-5 right side) encapsulate thread cost
values. PhysArch_X_Thread_Y_FunctionNode_Z blocks (Figure 3-5 right side)
encapsulate function node cost values. Figure 3-5 depicts the system Measure of
Effectiveness (MOE)11 'System Response Time' flowing down to the component-level
'Component Response Time' MOP for each thread.
Function node WeightedNodeCost (i.e. CostNode) is computed for each CR to
minimize node cost12. WeightedNodeCost is computed for each CR as follows:
$$ WeightedCostNode_{CR} = WeightedCostPerf_{CR} + WeightedCostEnergy_{CR} + WeightedCostHeat_{CR} \quad (3) $$
where CR is SC, MC, or GPU; WeightedCostPerf is the WeightedNodePerformanceCost
attribute (Figure 3-6 right side); WeightedCostEnergy is the WeightedNodeEnergyCost
attribute (Figure 3-6 right side); and WeightedCostHeat is the WeightedNodeHeatCost
attribute (Figure 3-6 right side). WeightedNodePerformanceCost is computed for each CR
using the following:
$$ WeightedCostPerf_{CR} = W_p \cdot \frac{ExecTimeEst_{CR}}{LatencyMOP} \quad (4) $$
where CR is SC, MC, GPU; ExecTimeEst is estimated execution time (Figure 3-6
right side), LatencyMOP is allocated function Latency MOP (Figure 3-6 center), and
weight Wp is the PerformanceWeight attribute from AttributeWeight (Figure 3-5 right
side).
requirements" (Roedler and Jones 2005, 1-65). Here the system element is a component and the requirements are MOPs.
11 These constraints are derived from quality requirements (Pohl 2010).
12 Some optimization algorithms prefer to maximize node cost. In that case the MOP is moved to the numerator
and the estimate to the denominator in equations (4), (5), and (6).
WeightedNodeEnergyCost is computed for each CR using the following:
$$ WeightedCostEnergy_{CR} = W_E \cdot \frac{EnergyConsumedEst_{CR}}{EnergyConsumedMOP} \quad (5) $$
where CR is SC, MC, GPU; EnergyConsumedEst is estimated consumed energy
(Figure 3-6 right side), EnergyConsumedMOP is allocated function energy consumption,
and weight WE is the EnergyWeight attribute from AttributeWeight (Figure 3-5 right
side).
WeightedNodeHeatCost is computed for each CR using the following:
$$ WeightedCostHeat_{CR} = W_T \cdot \frac{HeatGeneratedEst_{CR}}{HeatGeneratedMOP} \quad (6) $$
where CR is SC, MC, GPU; HeatGeneratedEst is estimated generated heat (Figure
3-6 right side), HeatGeneratedMOP is allocated function heat generation, and weight WT is
the ThermalWeight attribute from AttributeWeight (Figure 3-5 right side).
Function node costs are combined to form a thread cost according to the
following:
$$ WeightedCostThread = W_{Th} \cdot \left( NodeCost_{FN\_1,CR} + NodeCost_{FN\_2,CR} + \cdots + NodeCost_{FN\_n,CR} \right) \quad (7) $$
where CR is SC, MC, GPU; NodeCostFN_1 through NodeCostFN_n are weighted
node costs computed by equation (3) above; and WTh is one of the attributes from
ThreadWeight.
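The cost computations of equations (3) through (7) reduce to a few lines of arithmetic. The sketch below transcribes them directly; the numeric weights, estimates, and MOPs are invented for illustration.

```python
# Minimum-cost computations per equations (3)-(7).

def weighted_attr_cost(weight, estimate, mop):
    return weight * estimate / mop          # equations (4), (5), (6)

def weighted_node_cost(perf, energy, heat):
    return perf + energy + heat             # equation (3)

def weighted_thread_cost(w_thread, node_costs):
    return w_thread * sum(node_costs)       # equation (7)

# One function node on one CR, with Wp=10, WE=2, WT=6 (illustrative):
perf = weighted_attr_cost(10, estimate=0.8, mop=1.0)   # 8.0
energy = weighted_attr_cost(2, estimate=0.5, mop=2.0)  # 0.5
heat = weighted_attr_cost(6, estimate=1.0, mop=4.0)    # 1.5
node = weighted_node_cost(perf, energy, heat)          # 10.0
thread = weighted_thread_cost(0.9, [node])             # 9.0
```

Maximum-cost variants (footnote 12) simply invert the estimate/MOP ratio in `weighted_attr_cost`.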
3.3.7 Perform Optimization Analysis
This workflow step supports execution of an optimization algorithm, encapsulated
by OptimizationAnalysis (Figure 3-5), to select the optimum solution from the trade
space created by all Candidate_X_PhysicalArchitecture blocks. Many different meta-
heuristic techniques have been used to optimize HW-SW partitioning (see section 2.6).
The OptimizationAnalysis block provides a model placeholder for implementation of
one or more of these algorithms. HW-SW optimization algorithms require evaluation for
applicability to SW-SW optimization (and is not the subject of this research). This
research defines a DAG (see section 2.6) framework (Figure 3-4) that includes all CRs
currently supported by this model that is exported to the OptimizationAnalysis block.
Figure 3-4 DAG Current Supported Model CRs
Figure 3-4 shows a sample set of DAG nodes for three functions executing on
each CR.
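As a placeholder for what the OptimizationAnalysis block might do, the sketch below exhaustively enumerates CR assignments over DAG nodes like those of Figure 3-4 (three functions, each mappable to sCPU, mCPU, or GPU) and keeps the minimum-cost mapping. Exhaustive search is only viable for tiny trade spaces; the section 2.6 heuristics (SA, GA, etc.) would replace it as node counts grow. The node costs are invented.

```python
from itertools import product

# Exhaustive minimum-cost CR assignment over per-node cost tables.

node_cost = {
    "F1": {"SC": 8.0, "MC": 5.0, "GPU": 3.0},
    "F2": {"SC": 4.0, "MC": 4.5, "GPU": 6.0},
    "F3": {"SC": 7.0, "MC": 3.0, "GPU": 2.5},
}

def optimize(costs):
    funcs = sorted(costs)
    best = min(product(*(costs[f] for f in funcs)),
               key=lambda crs: sum(costs[f][c] for f, c in zip(funcs, crs)))
    return dict(zip(funcs, best)), sum(costs[f][c] for f, c in zip(funcs, best))

mapping, total = optimize(node_cost)
# mapping == {"F1": "GPU", "F2": "SC", "F3": "GPU"}; total == 9.5
```

With no inter-node coupling in the cost table, the optimum is simply each node's cheapest CR; real formulations add communication and resource-contention terms that make the search hard.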
3.3.8 Perform Simulation Analysis
This workflow step executes a simulation capability, encapsulated by
SimulationAnalysis (Figure 3-5), to analyze architecture attribute
simulation data produced from the trade space when the model executes in the simulation
state. Generation of architecture attribute simulation data is discussed in section 3.7.3.
3.3.9 Compute Total Attribute Cost
This process step determines the total cost for each solution encapsulated by each
Candidate_X_PhysicalArchitecture block. This method builds a
Candidate_X_Physical Architecture SolutionCost by summing the
Thread_X_Physical OptimumThreadCost (i.e. maximum cost) returned by the
OptimizationAnalysis block for all solution threads.
The method in the preceding paragraph is repeated for each
Candidate_X_Logical Architecture block to develop a series of
Candidate_X_PhysicalArchitecture solution costs.
3.3.10 Select Solution Architecture
This method selects the Candidate_X_PhysicalArchitecture with the minimum
(or maximum, if optimization algorithm uses maximum cost) Solution Cost as the
preferred physical architecture.
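Sections 3.3.9 and 3.3.10 together amount to a sum followed by an argmin. The sketch below shows that reduction with invented per-thread optimum costs.

```python
# Sum per-thread optimum costs into a SolutionCost per candidate
# physical architecture, then pick the minimum-cost candidate.

thread_costs = {            # candidate -> OptimumThreadCost per thread
    "Candidate_1": [9.5, 4.2, 7.1],
    "Candidate_2": [8.0, 6.3, 5.0],
    "Candidate_3": [12.4, 3.1, 6.0],
}

solution_costs = {cand: sum(costs) for cand, costs in thread_costs.items()}
preferred = min(solution_costs, key=solution_costs.get)
# Candidate_2 has the lowest total (about 19.3), so preferred == "Candidate_2"
```

For a maximum-cost optimization the selection would use `max` instead of `min`.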
3.4 Architecture Attribute System Model Overview
Figure 3-5 presents a component level SysML Block Definition Diagram (BDD)
that defines a set of candidate logical architectures and a set of candidate physical
architectures. The user configures the model to support one of three trade study
scenarios:
• One logical architecture with multiple physical architectures –
enables evaluation of a single algorithm on multiple CR configurations.
For example, to evaluate one algorithm on two physical architectures set
NumberLogicalArchitectures to 1, NumberLogicalThreads to 1 for
logical architecture one, NumberLogicalThreadFunctions to 1 for thread
one of logical architecture one, NumberPhysicalArchitectures to 2,
NumberPhysicalThreads to 1 for both physical architectures one and two,
and NumberPhysicalThreadFunctions to 1 for thread one of both
physical architectures one and two.
• Multiple logical architectures with a single physical architecture –
enables evaluation of multiple algorithms on a single CR configuration.
For example, to evaluate two algorithms on one physical architecture set
NumberLogicalArchitectures to 2, NumberLogicalThreads to 1 for both
logical architectures one and two, NumberLogicalThreadFunctions to 1
for thread one of both logical architectures one and two,
NumberPhysicalArchitectures to 1, NumberPhysicalThreads to 1 for
physical architecture one, and NumberPhysicalThreadFunctions to 1 for
thread one of physical architecture one.
• Multiple logical architectures on multiple physical architectures –
enables evaluation of multiple algorithms on multiple CR configurations.
For example, to evaluate two algorithms on two physical architectures set
NumberLogicalArchitectures to 2, NumberLogicalThreads to 1 for both
logical architectures one and two, NumberLogicalThreadFunctions to 1
for thread one of both logical architectures one and two,
NumberPhysicalArchitectures to 2, NumberPhysicalThreads to 1 for both
physical architectures one and two, and NumberPhysicalThreadFunctions to 1 for
thread one of both physical architectures one and two.
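Each trade-study scenario reduces to setting a handful of model counts. Below, the "one logical / two physical" example is encoded as plain configuration data; the keys paraphrase the block attribute names rather than reproducing the model's exact identifiers.

```python
# Configuration data for the first trade-study scenario: one logical
# architecture (one thread, one function) against two physical
# architectures (each with one thread, one function).

one_logical_two_physical = {
    "NumberLogicalArchitectures": 1,
    "NumberLogicalThreads": {1: 1},               # logical arch 1: 1 thread
    "NumberLogicalThreadFunctions": {(1, 1): 1},  # arch 1, thread 1: 1 function
    "NumberPhysicalArchitectures": 2,
    "NumberPhysicalThreads": {1: 1, 2: 1},
    "NumberPhysicalThreadFunctions": {(1, 1): 1, (2, 1): 1},
}
```

The other two scenarios differ only in which of the logical or physical counts exceed one.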
Figure 3-5 Architecture Attribute Model Overview
The left side of Figure 3-5 decomposes a logical architecture container that
encapsulates one or more candidate logical architecture(s). Each logical architecture
encapsulates one or more functional thread(s). Each functional thread encapsulates one
or more function(s). The right side of Figure 3-5 decomposes a physical architecture
container that encapsulates one or more candidate physical architecture(s). Each physical
architecture encapsulates one or more physical functional thread(s). Each physical
functional thread encapsulates one or more physical function node(s). The logical and
physical architecture structures are identical. Figure 3-5 provides a descriptive model
BDD of the structure of the logical and physical architecture containers. However, Figure 3-5
does not provide enough semantic detail to build an executable model. Within SysML,
IBDs, ADs, and SMDs provide the semantic detail required to construct executable
models. This section presents the model constructs necessary to build an executable
model that performs architecture analysis: structurally decomposing the logical architecture
to the function level, structurally decomposing the physical architecture to the function
level, and describing architecture analysis behavior down to the logical/physical function level.
The ArchitectureAnalysis block (Figure A-1) models a container that functions
as the seed model element for construction of an executable model. The
ArchitectureAnalysis IBD models a group of SysML parts (i.e. block instantiations) and part
interfaces (i.e. ports) shown in Figure A-1. The ArchitectureAnalysis IBD instantiates
and interconnects the four main blocks used to perform architecture attribute analysis: 1)
LogicalArchitectureContainer (via CMP_LogicalArchitectureContainerPart), 2)
PhysicalArchitectureContainer (via CMP_PhysicalArchitectureContainerPart),
3) OptimizationAnalysis (via OptimizationAnalysisPart), and 4)
SimulationAnalysis (via SimulationAnalysisPart). The ArchitectureAnalysis IBD also
instantiates and interconnects nine support blocks that configure various trade study
parameters:
• Number of Logical Architectures (1-4): CMP_NumberLogicalArchitecturesBlock
(Figure A-1 CMP_NumberLogicalArchitecturesPart).
• Number of Physical Architectures (1-4): CMP_NumberPhysicalArchitecturesBlock
(Figure A-1 CMP_NumberPhysicalArchitecturesPart).
• Analysis State (Analysis or Simulation): AnalysisStateBlock (Figure
A-1 AnalysisStatePart).
• Analysis Mode (Most Likely or WCET): AnalysisModeBlock (Figure
A-1 AnalysisModePart).
• Optimization Mode (ThreadLevel or FunctionLevel): OptimizationModeBlock
(Figure A-1 OptimizationModePart).
• Simulation Mode (ThreadLevel or FunctionLevel): SimulationModeBlock
(Figure A-1 SimulationModePart).
• Thread Weight (see section 3.3.2): ThreadWeightBlock (Figure A-1
ThreadWeightPart).
• Attribute Weight (see section 3.3.2): AttributeWeightBlock (Figure
A-1 AttributeWeightPart).
• Thread Constraints (see section 3.3.6): ThreadConstraintsBlock (Figure
A-1 ThreadConstraintsPart). The current
ThreadConstraintsBlock definition contains a SystemResponseTime
constraint for each physical thread (1-4) of each physical architecture (1-4)
and a Latency constraint for each physical function (1-4) of each physical
thread (1-4) of each physical architecture (1-4).
Finally, the ArchitectureAnalysis IBD instantiates and interconnects the
StartArchitectureAnalysisBlock (i.e. StartArchitectureAnalysisPart) that
provides overall architecture analysis executable model control. The IBD for
StartArchitectureAnalysisBlock is shown on the left side of Figure A-2. The
IBD defines one activity whose Activity Diagram (AD) is shown on the right side
of Figure A-2. The AD issues a StartArchitectureAnalysis event for each
logical architecture (CMP_LogicalArchitectureContainerPart in Figure A-3)
to CMP_PhysicalArchitectureContainerPart (Figure A-12) to step through
each physical architecture. The number of logical architectures analyzed is
defined by the CMP_NumberLogicalArchitecturesBlock and the number of
physical architectures by the OUT_NumberLogicalArchitectures
attribute of the CMP_NumberPhysicalArchitecturesBlock.
From a structural perspective, the LogicalArchitectureContainer (i.e.
CMP_LogicalArchitectureContainerPart) encapsulates four logical
architecture block instantiations (see Figure A-3). Each logical architecture block
encapsulates four logical functional thread block instantiations (see Figure A-5).13
Each logical functional thread encapsulates four logical
functions (see Figure A-7).14 From a corresponding structural perspective, the
PhysicalArchitectureContainer (i.e. CMP_PhysicalArchitectureContainerPart)
encapsulates four physical architecture block instantiations (see Figure
A-12). Each physical architecture block encapsulates four physical functional
thread block instantiations (see Figure A-14).15 Each physical architecture
functional thread encapsulates four function nodes (see Figure A-16).16
13 NOTE: The model currently supports four logical threads and can be easily expanded to support more
logical threads.
14 NOTE: The model currently supports four logical thread functions and can be easily expanded to
support more logical thread functions.
From a behavior perspective, the PhysicalArchitectureContainer
propagates a start event to initiate attribute computations and propagates a
results-available event to retrieve computed attribute values. The executable model
element PhysicalArchitectureContainer includes a
CMP_PhysArch_ExecutionControl block (i.e. CMP_PhysArch_ExecutionControlPart) that
provides execution behavior control for each of the four physical architecture
block instantiations (see Figure A-12). Execution control is provided via the
Activity Diagram (AD) in Figure A-13. The AD accepts the StartArchitectureAnalysis
event as the behavior entry point. The AD then retrieves the number of
physical architectures to analyze and issues the StartPhysicalArchitecture_1 event
to physical architecture one. Physical architecture one, represented by the
CMP_Candidate_1_PhysicalArchitecture block, includes a
CMP_PhysArchOne_ExecutionControl block (i.e.
CMP_Candidate_1_PhysicalArchitecturePart) that provides execution behavior control for each of
the four physical thread block instantiations (see Figure A-14). Execution control
is provided via the AD in Figure A-15. The AD accepts the
StartPhysicalArchitecture_1 event as the physical architecture one behavior entry
15 NOTE: The model currently supports four physical threads and can be easily expanded to support more
physical threads.
16 NOTE: The model currently supports four physical thread functions and can be easily expanded to
support more physical thread functions.
point. The AD then retrieves the current logical architecture being analyzed and
the number of threads associated with the current logical architecture, and issues
the StartPhysArchOneThreadOne event to physical architecture one thread one.
Physical architecture one thread one, represented by the
CMP_PhysArchOne_ThreadOne block, includes a
CMP_PhysArchOneThrOne_ExecutionControl block (i.e.
CMP_PhysArchOneThrOne_ExecutionControlPart) that provides execution behavior control for each of four physical
thread function block instantiations (see Figure A-16). Execution control is
provided via the AD in Figure A-17. The AD accepts the
StartPhysArchOneThreadOne event as the physical architecture one thread one behavior entry point. The
AD then retrieves the current logical architecture being analyzed and the number
of thread functions associated with the current logical architecture, and issues the
StartPhysArchOneThreadOneFunctionOne event to physical architecture one
thread one function one. The CMP_PhysArchOne_ThreadOne_FunctionOne
block (i.e. CMP_PhysArchOne_ThreadOne_FunctionOnePart) encapsulates
physical architecture one thread one function one (see Figure A-18). The
CMP_FunctionPhysicalContainerBlock (i.e. CMP_FunctionPhysicalContainerPart
in Figure A-18) represents the interface point to the physical architecture attribute
layer (section 3.5). Execution control is provided via two ADs. The first AD
(Figure A-19) implements behavior for the
CMP_PhysArchOneThrOneFuncOne_ExecutionControlBlock (i.e.
CMP_PhysArchOneThrOneFuncOne_ExecutionControlPart in Figure A-18) that provides execution behavior control
for physical function block instantiations. The AD performs an initial state that
identifies the function by thread number and function number. The AD accepts the
StartPhysArchOneThreadOneFunctionOne event and issues the
StartFuncPhysAttributeComputations event with arguments thread number and function number
(NOTE: the behavior thread continues in section 3.5).
The second AD (Figure A-20) implements behavior for the
CMP_PhysArchOneThrOneFuncOne_InterfaceBlock
(CMP_PhysArchOneThrOneFuncOne_InterfacePart in Figure A-18). The AD performs an initial state that
identifies the function by thread number and function number. The AD accepts the
FuncPhysAttributeComputationResultsAvailable event and compares the thread
number and function number event arguments to the function identification (i.e.
the thread and function numbers read at initialization). If the comparison is true,
the AD executes operation retrieveFuncPhysAttributeValues to retrieve
computed physical attributes (i.e. Execution Time, Energy Consumed, Heat
Generated) for all CRs (i.e. SC, MC, GPU). The AD next executes
computeFunctionOneCosts to compute function costs using the computed physical attributes.
Finally, the AD issues the RsltPhysArchOneThrOneFuncOneComplete event to the
CMP_PhysArchOneThrOneFuncOne_ExecutionControlBlock AD (see
Figure A-19), which accepts the event and issues the
PhysArchOneThrOneFuncOneResultsAvailable event to propagate function physical attributes and costs.
The CMP_PhysArchOneThrOne_ExecutionControlBlock AD (see Figure
A-17) accepts the event and increments the thread one function number. If the
current thread function number is less than the number of thread one functions,
the AD issues the StartPhysArchOneThreadOneFunctionTwo event to CMP_
PhysArchOne_ThreadOne_FunctionTwoPart (Figure A-16). Otherwise, the
AD issues the PhysArchOneThrOneResultsAvailable event to the
CMP_PhysArchOne_ExecutionControlBlock AD (see Figure A-15).17 The
CMP_PhysArchOne_ExecutionControlBlock AD accepts the event and increments the physical
architecture one thread number. If the current thread number is less than the
number of physical architecture one threads, the AD issues the
StartPhysArchOneThreadTwo event to CMP_PhysArchOne_ThreadTwoPart (Figure A-14).
Otherwise, the AD issues the PhysArchOneResultsAvailable event to the
CMP_PhysArch_ExecutionControlBlock AD (see Figure A-13).18 The
CMP_PhysArch_ExecutionControlBlock AD accepts the event and increments the physical
architecture number. If the current physical architecture number is less than the
number of physical architectures, the AD issues the StartPhysicalArchitecture_2
event to CMP_Candidate_2_PhysicalArchitecturePart (Figure A-12).
Otherwise, the AD issues the PhysArchResultsAvailable event to the
StartArchitectureAnalysisBlock AD (see Figure A-2).19 The StartArchitectureAnalysisBlock AD
accepts the event and increments the logical architecture number. If the current
logical architecture number is less than the number of logical architectures, the
AD issues the StartArchitectureAnalysis event to
CMP_PhysicalArchitectureContainerBlock (Figure A-1). Otherwise, the AD proceeds to the final state,
indicating that architecture analysis is complete.
17 The same process is repeated for thread functions three and four, if appropriate.
18 The same process is repeated for threads three and four, if appropriate.
19 The same process is repeated for physical architectures three and four, if appropriate.
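Stripped of the SysML event plumbing, the execution-control behavior above is a nested iteration over logical architectures, physical architectures, threads, and function nodes. The sketch below is a behavioral stand-in, not the model itself; the counts and the `compute` hook are illustrative.

```python
# Nested-loop equivalent of the event-driven execution control in
# section 3.4: each loop level corresponds to one family of Start/
# ResultsAvailable events.

def run_architecture_analysis(n_logical, n_physical, n_threads, n_functions,
                              compute):
    results = []
    for la in range(1, n_logical + 1):                 # StartArchitectureAnalysis
        for pa in range(1, n_physical + 1):            # StartPhysicalArchitecture_X
            for th in range(1, n_threads + 1):         # StartPhysArchXThreadY
                for fn in range(1, n_functions + 1):   # ...ThreadYFunctionZ
                    results.append(compute(la, pa, th, fn))
    return results                                     # PhysArchResultsAvailable

r = run_architecture_analysis(2, 2, 1, 2, compute=lambda *ids: ids)
assert len(r) == 8 and r[0] == (1, 1, 1, 1)
```

The event-based realization in the model exists because SysML executable semantics are event driven, not because the control flow itself is complex.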
3.5 Architecture Attribute System Model Details
This section extends both the logical and physical architectures from the function
level to the architecture attribute level, as shown in the Figure 3-6 BDD. Figure 3-6 provides a
descriptive model of the structure of the attribute extensions to the logical and physical
architectures in Figure 3-5. However, Figure 3-6 does not provide enough semantic detail
to build an executable model. The remainder of this section presents the model
constructs necessary to extend the executable model definition in section 3.4 below the
function level.
Figure 3-6 Architecture Attribute Model Detail
Figure A-9 structurally extends candidate one logical architecture, thread one,
function one through the addition of a performance attribute (i.e.
CMP_LogArchOneThrOneFuncOne_PerfAttrPart), an energy attribute (i.e.
CMP_LogArchOneThrOneFuncOne_EnergyAttrPart), and a thermal attribute (i.e.
CMP_LogArchOneThrOneFuncOne_ThermAttrPart). The provided attribute set is identical for
every model function (four logical architectures, four threads per logical architecture,
four functions per thread). The CMP_LogArchOneThrOneFuncOne_PerfAttrBlock,
instantiated by CMP_LogArchOneThrOneFuncOne_PerfAttrPart, encapsulates the
number of arithmetic computations required by the function algorithm. Arithmetic
computations supported by the model are 1) Complex/Floating Point/Integer addition20,
multiplication, and division; 2) trigonometric (cos, sin, tan); 3) arc trigonometric (arccos,
arcsin, arctan, four quadrant arctan); 4) miscellaneous (log, exp, sqrt). The CMP_Log
ArchOneThrOneFuncOne_EnergyAttrBlock, instantiated by CMP_LogArchOneThr
OneFuncOne_EnergyAttrPart, encapsulates required function energy attributes such as
energy efficiency and number of arithmetic computations. The CMP_LogArchOneThr
OneFuncOne_ThermAttrBlock, instantiated by CMP_LogArchOneThrOneFunc
One_ThermAttrPart, encapsulates required function thermal attributes such as number
of arithmetic computations.
Figure A-21 structurally extends the physical architecture for candidate one
physical architecture, thread one, function one through addition of a performance
attribute (i.e. CMP_FuncPhys_PerformanceContainerPart), energy attribute (i.e.
CMP_FuncPhys_EnergyContainerPart) and thermal attribute (i.e. CMP_FuncPhys_
ThermContainerPart). The attribute set is provided for every model function (four
physical architectures, four threads per physical architecture, four functions per thread).
The CMP_FuncPhys_PerformanceContainerBlock, instantiated by CMP_FuncPhys_
PerformanceContainerPart, encapsulates a SC, MC, and GPU computation model.
The CMP_FuncPhys_EnergyContainerBlock, instantiated by CMP_FuncPhys_
EnergyContainerPart, encapsulates SC, MC, and GPU energy models. The CMP_
FuncPhys_ThermalContainerBlock, instantiated by CMP_FuncPhys_Thermal
ContainerPart, encapsulates SC, MC, and GPU thermal models.
20 Subtraction computations are treated as addition with a two’s complement operand and are considered equivalent to addition. Therefore, algorithm subtractions are counted as algorithm additions.
Logical architecture attribute extensions contain no behavior model elements.
Figure A-22 AD defines behavior for candidate one physical architecture, thread one,
function one attribute processing. Upon receipt of the StartFuncPhysAttribute
Computations event, the AD constructs and sends the StartFuncPhysEnergyAttribute
Computations to the CMP_FuncPhys_EnergyContainerBlock (i.e. CMP_FuncPhys_
EnergyContainerPart) in Figure A-21. Upon completion of energy attribute processing,
the CMP_FuncPhys_EnergyContainerBlock sends a CMP_FuncPhys_EnergyResults
Available event to the CMP_FuncPhys_AttributeExecutionControlBlock (i.e. CMP_
FuncPhys_AttributeExecutionControlPart) in Figure A-21. Upon receipt of the event,
the AD constructs and sends the StartFuncPhysPerformanceAttributeComputations to
the CMP_FuncPhys_PerformanceContainerBlock (i.e. CMP_FuncPhys_
PerformanceContainerPart) in Figure A-21. Upon completion of performance attribute
processing, the CMP_FuncPhys_PerformanceContainerBlock sends a CMP_Func
Phys_PerformanceResultsAvailable event to the CMP_FuncPhys_AttributeExecution
ControlBlock in Figure A-21. Upon receipt of the event, the AD constructs and sends
the StartFuncPhysThermalAttributeComputations to the CMP_FuncPhys_Thermal
ContainerBlock (i.e. CMP_FuncPhys_ThermalContainerPart) in Figure A-21.
Upon completion of thermal attribute processing, the CMP_FuncPhys_Thermal
ContainerBlock sends a CMP_FuncPhys_ThermalResultsAvailable event to the
CMP_FuncPhys_AttributeExecutionControlBlock in Figure A-21. Upon receipt of
the event, the AD constructs and sends the FuncPhysAttributeComputationsAvailable
event to the CMP_PhysArchOneThrOneFuncOne_InterfaceBlock (i.e. CMP_
PhysArchOneThrOneFuncOne_InterfaceBlock) in Figure A-18.
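The energy, performance, and thermal computations above run strictly serially: each stage starts only when the previous stage's ResultsAvailable event arrives. The following is a minimal hypothetical sketch of that chain; the callables are stand-ins for the container blocks, and only the event names are taken from the model.

```python
def process_function_attributes(compute_energy, compute_performance, compute_thermal):
    # Serial chain: each stage runs only after the previous stage's
    # ResultsAvailable event; the final event signals overall completion.
    trace = []
    compute_energy()
    trace.append("CMP_FuncPhys_EnergyResultsAvailable")
    compute_performance()
    trace.append("CMP_FuncPhys_PerformanceResultsAvailable")
    compute_thermal()
    trace.append("CMP_FuncPhys_ThermalResultsAvailable")
    trace.append("FuncPhysAttributeComputationsAvailable")
    return trace
```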
3.6 Architecture Performance Attribute System Model
This section presents the model constructs necessary to extend the architecture
performance attribute. The model constructs for the energy and thermal attributes are
identical to the performance attribute.
3.6.1 Performance Attribute Logical Architecture Extensions
There are no logical architecture extensions beyond those discussed in section 3.5.
3.6.2 Performance Attribute Physical Architecture Extensions
This section extends the physical architecture from the architecture attribute level as
shown in Figure 3-7 BDD. Figure 3-7 provides a descriptive model of the structure of
performance attribute extensions that introduce an SC, MC, and GPU computational model
decomposition level. Each computational model is further decomposed into analysis and
simulation computational model blocks. As in the preceding sections, Figure 3-7 does not
provide enough semantic detail to build an executable model. The remainder of this
section presents model constructs necessary to extend the executable model definition in
section 3.5.
Figure 3-7 Performance Attribute Model
Figure A-23 structurally extends the executable physical performance attribute
model through the addition of SC_ComputationModel_ContainerBlock (i.e. SC_
ComputationModel_ContainerPart), MC_ComputationModel_ContainerBlock (i.e.
MC_ComputationModel_ContainerPart), and GPU_ComputationModel_Container
Block (i.e. GPU_ComputationModel_ContainerPart) that encapsulate CR arithmetic
computation models. The CMP_FuncPhys_PerformanceExecutionControlBlock (i.e.
CMP_FuncPhys_PerformanceExecutionControlPart) encapsulates CR computation
model execution control and CMP_FuncPhys_PerformanceComputationsBlock (i.e.
CMP_FuncPhys_PerformanceComputationsPart) encapsulates retrieval of the set of
arithmetic computations used to compute a function execution time.
Upon receipt of the StartFuncPhysPerformanceAttributeComputations event, the
CMP_FuncPhys_PerformanceExecutionControlBlock AD (Figure A-24) builds the
SetupPerformanceAttributeComputation event using the current logical architecture
number, current thread number, current function number, and computer resource
identification. The event is then sent to the CMP_FuncPhys_Performance
ComputationsBlock. Upon event receipt, the CMP_FuncPhys_Performance
ComputationsBlock AD (Figure A-25) tests for CR. If CR is SC, then the Figure A-25
AD uses the current logical architecture number, thread number, and function number to
build the StartRetrievePerformanceComputations event. The event is relayed through the
physical architecture to the CMP_LogArch_ExecutionControl block (i.e. CMP_Log
Arch_ExecutionControlPart) AD (Figure A-4). The CMP_LogArch_Execution
Control AD builds and sends LogArchOne(Two/Three/Four)_StartRetrievePerfComps
event if current logical architecture is one(two/three/four) to the CMP_LogArch
One(Two/Three/Four)_ExecutionControl block.
Each logical architecture/thread/function performance computation retrieval
processing is identical. Figure A-6 presents performance computation retrieval behavior
for the CMP_LogArchOne_ExecutionControl block. The CMP_LogArchOne_
ExecutionControl AD builds and sends LogArchOneThrOne(Two/Three/Four)_
StartRetrievePerfComps event if current thread number is one(two/three/four) to the
CMP_LogArchOneThrOne(Two/Three/Four)_ExecutionControl block. Figure A-8
presents performance computation retrieval behavior for the CMP_LogArchOneThr
One_ExecutionControl block. The CMP_LogArchOneThrOne_ExecutionControl
AD builds and sends LogArchOneThrOneFuncOne(Two/Three/Four)_StartRetrieve
PerfComps event if current function number is one(two/three/four) to the CMP_Log
ArchOneThrOneFuncOne(Two/Three/Four)_ExecutionControl block.
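The cascaded dispatch above effectively composes an event name from the current logical architecture, thread, and function numbers. The sketch below is a hypothetical Python rendering of that name construction; the real model performs it across three separate ADs rather than in one function.

```python
def build_retrieve_event(logical, thread, function):
    # Mirrors the LogArch<N> -> Thr<M> -> Func<K> dispatch by composing the
    # ordinal words used in the model's event names.
    ordinal = {1: "One", 2: "Two", 3: "Three", 4: "Four"}
    return (f"LogArch{ordinal[logical]}Thr{ordinal[thread]}"
            f"Func{ordinal[function]}_StartRetrievePerfComps")
```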
The previously described behavior results in a single function executing a series
of set operations provided by the AlgorithmPerformanceInterfaceBlock to set the
number of arithmetic computations (Complex/Floating Point/Integer Add/Multiply/
Divide, Trig, ArcTrig, and Miscellaneous) required by the accessed logical architecture
function. A series of PerfCompResultsAvailable events is promulgated
through the logical architecture. The right side of Figure A-8 responds to the appropriate
LogArchOneThrOneFuncOne(Two/Three/Four)_PerfCompResultsAvailable event and
sends LogArchOneThrOne_PerfCompResultsAvailable event to CMP_LogArch
One_ExecutionControl block. The right side of Figure A-6 responds to the appropriate
LogArchOneThrOne(Two/Three/Four)_PerfCompResultsAvailable event and sends
LogArchOne_PerfCompResultsAvailable event to CMP_LogArch_ExecutionControl
block. The right side of Figure A-4 responds to the appropriate LogArchOne(Two/Three/
Four)_PerfCompResultsAvailable event and sends PerformanceComputationResults
Available event to the CMP_PhysicalArchitectureContainer block. The Performance
ComputationResultsAvailable event is promulgated to the CMP_FuncPhys_
PerformanceComputationsBlock AD (Figure A-25). The right side of Figure A-25
checks that the appropriate function computations have been retrieved and then sends
StartFuncPhysScPerformanceAttributeComputations event to the SC_Computation
Model_ContainerBlock (Figure A-23).
If CR is MC, the Figure A-25 AD sends StartFuncPhysMcPerformanceAttribute
Computations event to the MC_ComputationModel_ContainerBlock (Figure A-23). If
CR is GPU, the Figure A-25 AD sends StartFuncPhysGpuPerformanceAttribute
Computations event to the GPU_ComputationModel_ContainerBlock (Figure A-23).
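The computer-resource test in the Figure A-25 AD is a three-way dispatch on the CR type. A hypothetical table-driven sketch of that routing, using the start-event names from the model:

```python
def dispatch_performance_start(cr_type):
    # Maps CR type to the start event sent to its computation-model container
    # (SC/MC/GPU ComputationModel_ContainerBlock in Figure A-23).
    start_events = {
        "SC":  "StartFuncPhysScPerformanceAttributeComputations",
        "MC":  "StartFuncPhysMcPerformanceAttributeComputations",
        "GPU": "StartFuncPhysGpuPerformanceAttributeComputations",
    }
    return start_events[cr_type]
```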
3.6.2.1 Performance Attribute Single Core (SC) CPU Computation Model
Figure A-26 structurally extends the SC_ComputationModel_ContainerBlock
from section 3.6.2 through addition of the SC_CM_AnalysisContainerBlock (i.e.
SC_CM_AnalysisContainerPart) and the SC_CM_SimulationContainerBlock (i.e.
SC_CM_SimulationContainerPart). SC_CM_AnalysisContainerBlock computes an
estimated most-likely or WCET SC execution time based on the state of the Analysis
Mode configuration parameter21. SC_CM_SimulationContainerBlock generates a
number of simulated SC execution times (see section 3.7.3). The SC_Computation
Model_ExecutionControlBlock (i.e. SC_ComputationModel_ExecutionControlPart)
is responsible for propagating an execution start event to the appropriate block based on
the state of the AnalysisState configuration parameter22. SC_CM_ ExecutionTimeBlock
(i.e. SC_CM_ExecutionTimePart) is responsible for propagating SC execution time
results.
Figure A-27 presents the AD encapsulated by the SC_ComputationModel_
ExecutionControlBlock. The AD accepts StartFuncPhysScPerformanceAttribute
Computations event and retrieves the AnalysisState configuration parameter. The AD
sends StartScAnalysisExecutionTimeComputationEvent to the SC_CM_Analysis
ContainerBlock if AnalysisState is set to ‘Optimization’. The AD sends StartSc
SimulationExecutionTimeComputationEvent to SC_CM_SimulationContainerBlock if
AnalysisState is set to ‘Simulation’.
21 AnalysisMode configuration parameter is managed by the AnalysisModeBlock.
22 AnalysisState configuration parameter is managed by the AnalysisStateBlock.
Figure A-28 presents the AD encapsulated by the SC_CM_ExecutionTime
Block. The AD manages two concurrent swimlanes. The SC_Analysis swimlane accepts
ScAnalysisExecutionTimeAvailableEvent, sets SC_Execution_Time to the retrieved
estimated execution time, and sends CMP_FuncPhys_ScPerformanceResultsAvailable
event to promulgate estimated execution time results. The SC_Simulation swimlane
accepts ScSimulationExecutionTimeAvailableEvent, sets SC_Execution_Time to the
retrieved simulation execution times, and sends CMP_FuncPhys_ScPerformanceResults
Available event to promulgate simulated execution time results.
3.6.2.1.1 SC Analysis Computation Model
Figure A-29 structurally extends the SC_CM_AnalysisContainerBlock from
section 3.6.2.1 through addition of the SC_AnalComputation_CmplxContainerBlock
(i.e. SC_AnalComputation_CmplxContainerPart), the SC_AnalComputation_Float
ContainerBlock (i.e. SC_AnalComputation_FloatContainerPart), the SC_Anal
Computation_IntContainerBlock (i.e. SC_AnalComputation_IntContainerPart), the
SC_AnalComputation_TrigContainerBlock (i.e. SC_AnalComputation_Trig
ContainerPart), the SC_AnalComputation_ArcTrigContainerBlock (i.e. SC_Anal
Computation_ArcTrigContainerPart), and the SC_AnalComputation_MiscContainer
Block (i.e. SC_AnalComputation_MiscContainerPart). The SC_Promulgate
AnalysisExecutionTimeStartBlock (i.e. SC_PromulgateAnalysisExecutionTime
StartPart) is responsible for propagating an execution start event to all math operation
blocks. SC_AnalysisExecutionTimeBlock (i.e. SC_AnalysisExecutionTimePart) is
responsible for propagating SC analysis execution time results.
Figure A-30 presents the AD encapsulated by the SC_PromulgateAnalysis
ExecutionTimeStartBlock. The AD accepts the StartScAnalysisExecutionTime
ComputationEvent event. The AD concurrently sends StartTrigExecutionTime
ComputationEvent to the SC_AnalComputation_TrigContainerBlock, StartArcTrig
ExecutionTimeComputationEvent to the SC_AnalComputation_ArcTrigContainer
Block, StartCmplxExecutionTimeComputationEvent to the SC_AnalComputation_
CmplxContainerBlock, StartFloatExecutionTimeComputationEvent to the SC_Anal
Computation_FloatContainerBlock, StartIntExecutionTimeComputationEvent to the
SC_AnalComputation_IntContainerBlock, and StartMiscExecutionTimeComputation
Event to the SC_AnalComputation_MiscContainerBlock.
Figure 3-8 computeExecutionTime Operation Code Segment
Figure A-31 presents the AD encapsulated by the SC_AnalysisExecutionTime
Block. The AD accepts TrigExecutionTimeAvailableEvent, ArcTrigExecutionTime
AvailableEvent, MiscExecutionTimeAvailableEvent, CmplxComputationExecutionTime
AvailableEvent, FloatExecutionTimeAvailableEvent, and IntExecutionTimeAvailable
Event. The AD must receive all events before continuing to compute execution time. At
a high level, the execution time is computed as shown in Figure 3-8. LOC_CompExecTime
is an accumulation of individual math group execution times. OUT_CompExecTime is
computed according to equation 8 below:
OUT_CompExecTime = ACF * LOC_CompExecTime    (8)
where the Architecture Calibration Factor (ACF) accounts for CR memory and
computation efficiencies. ACF is discussed further in section 4.3.
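Equation 8 can be sketched directly. The function below is a hypothetical stand-in for the computeExecutionTime operation in Figure 3-8; the parameter names are assumptions, not copied from the model.

```python
def compute_execution_time(group_exec_times, acf):
    # LOC_CompExecTime accumulates the per-math-group execution times;
    # the Architecture Calibration Factor then scales the total (equation 8).
    loc_comp_exec_time = sum(group_exec_times)
    return acf * loc_comp_exec_time
```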
3.6.2.1.1.1 SC Complex Math Computation Model
Figure A-32 structurally extends the SC_AnalComputation_CmplxContainer
Block (Figure A-29) through addition of the SC_AnalComputation_CmplxAdd
ContainerBlock (i.e. SC_AnalComputation_CmplxAddContainerPart), the
SC_AnalComputation_CmplxDivContainerBlock (i.e. SC_AnalComputation_
CmplxDivContainerPart), and the SC_AnalComputation_CmplxMulContainer
Block (i.e. SC_AnalComputation_CmplxMulContainerPart). The SC_Promulgate
AnalysisCmplxExecutionTimeStartBlock (i.e. SC_PromulgateAnalysisCmplx
ExecutionTimeStartPart) is responsible for propagating an execution start event to all
complex math operation blocks. SC_AnalysisCmplxExecutionTimeBlock (i.e.
SC_AnalysisCmplxExecutionTimePart) is responsible for propagating SC analysis
complex math execution time results.
Figure A-33 presents the AD encapsulated by the SC_PromulgateAnalysis
CmplxExecutionTimeStartBlock. The AD accepts the StartCmplxExecutionTime
ComputationEvent event. The AD concurrently sends StartCmplxAddExecutionTime
ComputationEvent to the SC_AnalComputation_CmplxAddContainerBlock,
StartCmplxDivExecutionTimeComputationEvent to the SC_AnalComputation_Cmplx
DivContainerBlock, and StartCmplxMulExecutionTimeComputationEvent to the
SC_AnalComputation_CmplxMulContainerBlock.
Figure A-34 presents the AD encapsulated by the SC_AnalysisCmplxExecution
TimeBlock. The AD accepts CmplxAddExecTimeAvailableEvent, CmplxDivExecTime
AvailableEvent, and CmplxMulExecTimeAvailableEvent. The AD must receive all events
before continuing to compute complex execution time. Execution time is computed as
shown in Figure 3-9.
Figure 3-9 computeComplexExecutionTime Operation Code Segment
Figure A-35 structurally extends the SC_AnalComputation_CmplxAdd
ContainerBlock (Figure A-32). This block provides support for one to five algorithm
buffers (e.g. three input buffers and two output buffers). The container block
encapsulates the SC_Anal_CmplxAddSingle(Double/Triple/Quad/Quint)BufferQmif
Block (i.e. SC_Anal_CmplxAddSingle(Double/Triple/Quad/Quint)BufferQmifPart).
These blocks/parts provide the interface from the system model to SPMs (see section
3.6.3). SC_Anal_ComplexAddExecutionTimeBlock (i.e. SC_Anal_ComplexAdd
ExecutionTimePart) is responsible for propagating SC analysis complex add math
execution time results.
Figure 3-10 selectComplexAddBufferTime Operation Code Segment
Figure A-36 presents the AD encapsulated by the SC_Anal_ComplexAdd
ExecutionTimeBlock. The AD accepts the StartCmplxAddExecutionTimeComputation
Event. The AD waits for input flows from all SPM interfaces before continuing to
compute complex add execution time. Complex add times are selected using
IN_NumberComplexAddBuffers attribute (Figure 3-10).
Execution time (Figure 3-11) is computed for the selected AnalysisMode23 (i.e.
IN_AnalysisType attribute) according to equation 9 below:
OUT_ComplexAddExecTime = (CFref / CFsel) * Nops * ExecTimeModeEV    (9)
where CFref is the reference clock frequency, CFsel is the selected clock
frequency, CFref / CFsel is the clock ratio (i.e. IN_ClkRatio attribute), Nops is the number of
math operations (i.e. IN_NumComplexAddComps attribute), and ExecTimeModeEV is the
selected execution time from Figure 3-10.
Figure 3-11 computeComplexAddTime Operation Code Segment
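The buffer selection of Figure 3-10 and the equation 9 computation of Figure 3-11 can be combined in one hypothetical sketch; the dictionary-based buffer table and parameter names below are illustrative assumptions, not the model's attributes.

```python
def compute_complex_add_time(clk_ratio, n_ops, buffer_times, n_buffers):
    # buffer_times maps buffer count (1..5) to the SPM execution time for the
    # Single/Double/Triple/Quad/Quint buffer variant (Figure 3-10 selection);
    # equation 9 then scales by clock ratio and operation count.
    exec_time_mode_ev = buffer_times[n_buffers]
    return clk_ratio * n_ops * exec_time_mode_ev
```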
3.6.2.1.1.2 SC Floating Point Math Computation Model
SC_AnalComputation_FloatContainerBlock (Figure A-29) is structurally
extended through addition of the SC_AnalComputation_FloatAddContainerBlock (i.e.
SC_AnalComputation_FloatAddContainerPart), the SC_AnalComputation_Float
DivContainerBlock (i.e. SC_AnalComputation_FloatDivContainerPart), and the
SC_AnalComputation_FloatMulContainerBlock (i.e. SC_AnalComputation_Float
MulContainerPart).
23 MostLikely and Wcet are currently supported with the rest as future growth.
The SC_PromulgateAnalysisFloatExecutionTimeStartBlock
(i.e. SC_PromulgateAnalysisFloatExecutionTimeStartPart) is responsible for
propagating an execution start event to all float math operation blocks. SC_Analysis
FloatExecutionTimeBlock (i.e. SC_AnalysisFloatExecutionTimePart) is responsible
for propagating SC analysis float math execution time results.
All SC_AnalComputation_FloatContainerBlock parts are interconnected
exactly the same as SC_AnalComputation_CmplxContainerBlock (Figure A-32) parts.
SC_PromulgateAnalysisFloatExecutionTimeStartBlock AD implementation is
identical to the SC_PromulgateAnalysisCmplxExecutionTimeStartBlock AD. SC_
AnalysisFloatExecutionTimeBlock implementation is identical to the SC_Analysis
CmplxExecutionTimeBlock AD.
3.6.2.1.1.3 SC Integer Math Computation Model
SC_AnalComputation_IntContainerBlock (Figure A-29) is structurally
extended through addition of the SC_AnalComputation_IntAddContainerBlock (i.e.
SC_AnalComputation_IntAddContainerPart), the SC_AnalComputation_IntDiv
ContainerBlock (i.e. SC_AnalComputation_IntDivContainerPart), and the SC_Anal
Computation_IntMulContainerBlock (i.e. SC_AnalComputation_IntMulContainer
Part). The SC_PromulgateAnalysisIntExecutionTimeStartBlock (i.e. SC_
PromulgateAnalysisIntExecutionTimeStartPart) is responsible for propagating an
execution start event to all integer math operation blocks.
SC_AnalysisIntExecutionTimeBlock (i.e. SC_AnalysisIntExecutionTimePart) is
responsible for propagating SC analysis Integer math execution time results.
All SC_AnalComputation_IntContainerBlock parts are interconnected exactly
the same as SC_AnalComputation_CmplxContainerBlock (Figure A-32) parts.
SC_PromulgateAnalysisIntExecutionTimeStartBlock AD implementation is identical
to the SC_PromulgateAnalysisCmplxExecutionTimeStartBlock AD. SC_Analysis
IntExecutionTimeBlock implementation is identical to the SC_AnalysisCmplx
ExecutionTimeBlock AD.
3.6.2.1.1.4 SC Trig Computation Model
Figure A-38 structurally extends the SC_AnalComputation_TrigContainer
Block (Figure A-29) through addition of the SC_AnalComputation_CosContainer
Block (i.e. SC_AnalComputation_CosContainerPart), the SC_AnalComputation_
SinContainerBlock (i.e. SC_AnalComputation_SinContainerPart), and the SC_Anal
Computation_TanContainerBlock (i.e. SC_AnalComputation_TanContainerPart).
The SC_PromulgateAnalysisTrigExecutionTimeStartBlock (i.e. SC_Promulgate
AnalysisTrigExecutionStartPart) is responsible for propagating an execution start
event to all trig blocks. SC_AnalysisTrigExecutionTimeBlock (i.e. SC_AnalysisTrig
ExecutionTimePart) is responsible for propagating SC analysis trig execution time
results.
Figure A-39 presents the AD encapsulated by the SC_PromulgateAnalysisTrig
ExecutionTimeStartBlock. The AD accepts the StartTrigExecutionTimeComputation
Event event. The AD concurrently sends StartCosExecutionTimeComputationEvent to
the SC_AnalComputation_CosContainerBlock, StartSinExecutionTimeComputation
Event to the SC_AnalComputation_SinContainerBlock, and StartTanExecutionTime
ComputationEvent to the SC_AnalComputation_TanContainerBlock.
Figure A-40 presents the AD encapsulated by the SC_AnalysisTrigExecution
TimeBlock. The AD accepts CosExecTimeAvailableEvent, SinExecTimeAvailableEvent,
and TanExecTimeAvailableEvent. The AD must receive all events before continuing to
compute trig execution time. Execution time is computed as shown in Figure 3-12.
Figure 3-12 computeTrigExecutionTime Operation Code Segment
Figure A-41 structurally extends the SC_AnalComputation_CosContainer
Block (Figure A-38). This block provides support for one algorithm buffer. The
container block encapsulates the SC_Anal_CosQmifBlock (i.e. SC_Anal_CosQmif
Part). This block/part provides the interface from the system model to SPMs (see
section 3.6.3). SC_Anal_CosExecutionTimeBlock (i.e. SC_Anal_CosExecution
TimePart) is responsible for propagating SC analysis cosine math execution time results.
Figure A-42 presents the AD encapsulated by the SC_Anal_CosExecutionTime
Block. The AD accepts StartCosExecutionTimeComputationEvent. The AD waits for
input flows from minimum and maximum SPM interfaces before continuing to compute
cosine execution time. Execution time (Figure 3-13) is computed for the selected
AnalysisMode24 (i.e. IN_AnalysisType attribute) according to equation 10 below:
OUT_CosExecTime = (CFref / CFsel) * Nops * ExecTimeModeEV    (10)
where CFref is the reference clock frequency, CFsel is the selected clock
frequency, CFref / CFsel is the clock ratio (i.e. IN_ClkRatio attribute), Nops is the number of
math operations (i.e. IN_NumCosComps attribute), and ExecTimeModeEV is the selected
execution time (i.e. IN_CosMostLikely or IN_CosMax attribute).
Figure 3-13 computeCosExecutionTime Operation Code Segment
3.6.2.1.1.5 SC Arc Trig Computation Model
SC_AnalComputation_ArcTrigContainerBlock (Figure A-29) is structurally
extended through addition of the SC_AnalComputation_ArcCosContainerBlock (i.e.
SC_AnalComputation_ArcCosContainerPart), the SC_AnalComputation_ArcSin
ContainerBlock (i.e. SC_AnalComputation_ArcSinContainerPart), the SC_Anal
Computation_ArcTanContainerBlock (i.e. SC_AnalComputation_ArcTan
ContainerPart), and the SC_AnalComputation_ArcTanFourQuadContainerBlock
(i.e. SC_AnalComputation_ArcTanFourQuadContainerPart).
24 MostLikely and Wcet are currently supported with the rest as future growth.
The SC_
PromulgateAnalysisArcTrigExecutionTimeStartBlock (i.e. SC_PromulgateAnalysis
ArcTrigExecutionTimeStartPart) is responsible for propagating an execution start
event to all arc trig math operation blocks. SC_AnalysisArcTrigExecutionTimeBlock
(i.e. SC_AnalysisArcTrigExecutionTimePart) is responsible for propagating SC
analysis arc trig math execution time results.
All SC_AnalComputation_ArcTrigContainerBlock parts are interconnected
exactly the same as SC_AnalComputation_TrigContainerBlock (Figure A-38) parts.
SC_PromulgateAnalysisArcTrigExecutionTimeStartBlock AD implementation is
identical to the SC_PromulgateAnalysisTrigExecutionTimeStartBlock AD.
SC_AnalysisArcTrigExecutionTimeBlock implementation is identical to the
SC_AnalysisTrigExecutionTimeBlock AD.
3.6.2.1.1.6 SC Miscellaneous Computation Model
SC_AnalComputation_MiscContainerBlock (Figure A-29) is structurally
extended through addition of the SC_AnalComputation_ExpContainerBlock (i.e.
SC_AnalComputation_ExpContainerPart), the SC_AnalComputation_Log
ContainerBlock (i.e. SC_AnalComputation_LogContainerPart), and the SC_Anal
Computation_SqrtContainerBlock (i.e. SC_AnalComputation_SqrtContainerPart).
The SC_PromulgateAnalysisMiscExecutionTimeStartBlock (i.e. SC_Promulgate
AnalysisMiscExecutionTimeStartPart) is responsible for propagating an execution
start event to all miscellaneous math operation blocks. SC_AnalysisMiscExecution
TimeBlock (i.e. SC_AnalysisMiscExecutionTimePart) is responsible for propagating
SC analysis miscellaneous math execution time results.
All SC_AnalComputation_MiscContainerBlock parts are interconnected
exactly the same as SC_AnalComputation_TrigContainerBlock (Figure A-38) parts.
SC_PromulgateAnalysisMiscExecutionTimeStartBlock AD implementation is
identical to the SC_PromulgateAnalysisTrigExecutionTimeStartBlock AD. SC_
AnalysisMiscExecutionTimeBlock implementation is identical to the SC_AnalysisTrig
ExecutionTimeBlock AD.
3.6.3 Architecture Attribute System Model – Quantitative Model Interface
The interface between the executable architecture attribute system model and each
math operation SPM is mechanized through two model constructs. The first is a
specialized block stereotyped as a SimulinkBlock, shown in Figure 3-14. The second is a
MATLAB® Simulink® model discussed in section 3.7.2.4.
Figure 3-14 System Model - Quantitative Model Interface
The SimulinkBlock encapsulates a MATLAB® Simulink® model with the flow
ports on the right side of the block matching Simulink model ports. Data can flow in,
out, or in/out of the flow ports. The rate at which data is produced or consumed by each
flow port is controlled by the m_SampleTime attribute, set to 50 milliseconds in Figure 3-14.
3.7 Performance Attribute Statistical Performance Models
This section discusses a series of workflow steps to develop and integrate
estimation and simulation SPMs. Both models are integrated with the component
physical architecture system model during the “Physical Architecture Modeling”
workflow step shown in Figure 3-15. Figure 3-7 depicts physical architecture performance
attribute (i.e. computational SPM) and physical architecture system model layers. A
similar model construct exists for energy (thermal) attributes where energy (thermal)
model(s) replace performance SPMs.
Figure 3-15 SPM Development Flow
The Figure 3-15 SPM development flow runs from bottom to top. The products at
each level represent increasing levels of abstraction: from observed data, to statistical
models, to Simulink models, to SysML models.
From this point forward, the discussion focuses on development of SC CPU
computational SPMs to support estimation (i.e. optimization analysis state) and
simulation (i.e. simulation analysis state). Section 3.7.1 describes the processor and
memory architecture configuration and assumptions made for statistical performance
model development.
3.7.1 Statistical Performance Model Development Computer Configuration
A 2nd Gen Intel® Core™ microarchitecture (formerly known as Sandy Bridge)
(Lempel 2011) 2.40 GHz dual-core i3 processor was used for statistical performance
model development. An overview of the Sandy Bridge microarchitecture is shown in
Figure 3-16. The microarchitecture implements one on-chip L1 Instruction Cache (32
KByte) per core. The microarchitecture implements an on-chip L1 Data Cache (32
KByte) and an on-chip L2 Data Cache (256 KByte) per core. The microarchitecture also
implements an on-chip Last Level (or L3) Data Cache (3072 KByte) shared by all cores.
Finally, the microarchitecture provides an Integrated Memory Controller two channel
interface to 8 GByte bulk memory (i.e. Double Data Rate type three (DDR3)
Synchronous Dynamic Random-Access Memory). Dual DDR3 memory operates at a
channel transfer rate of 21,328 MB/s.
The second asset used during the case study (section 4.1) is a 4th Gen Intel®
Core™ microarchitecture (known as Broadwell) 2.50 GHz dual-core i5 processor.
Instruction cache size, data cache size, and bulk memory size and speed are identical
between the two architectures. For this reason, it was decided to only address CPU clock
speed in this research effort.
Figure 3-16 Intel Sandy Bridge Microarchitecture (Lempel 2011)
The software developed to collect arithmetic operation execution time data was
designed to utilize all of the available on-chip cache memory plus some bulk memory.
This was done in order to force collection of longer (i.e. more conservative) execution
times resulting from the utilization of slower memories. The strategy for this research
effort was to collect execution time data for one memory usage profile and calibrate
execution times for other memory usage profiles. Future research can address execution
time dependencies on memory size, speed, and usage profiles.
The software environment used for SPM development was MATLAB R2017a.
This software package was chosen to perform arithmetic operations with associated time
collection at the application level. Arithmetic operation SPMs need to be developed only
once for each processor family (e.g. Intel, Arm, etc.).
3.7.2 Estimation Models
This section describes development of an arithmetic computation SPM library.
The library currently consists of nineteen arithmetic operations: Complex/Floating
point/Integer Add/Multiply/Divide (9 individual models), Cos/Sin/Tan (3 individual
models), Arc Cos/Sin/Tan/TanFourQuad (4 individual models), Exp/Log/Sqrt (3
individual models). Add, Multiply, and Divide operations are modeled for one through
five buffers, which supports processing up to five matrix dimensions. Each arithmetic
operation SPM was produced by collecting observation data, defining states, and
performing distribution analysis. A total of fifty-five models were developed for this
research effort.
The arithmetic operations chosen for this research effort primarily support vector
based (i.e. one dimension) signal processing algorithms. This class of algorithms is used
for embedded sensor processing in many application domains such as automotive,
aircraft, ship, submarine, manufacturing control, chemical processing, and so on. Section
5.2 discusses enhancements to this arithmetic operation library.
3.7.2.1 Execution Time Data Collection Workflow Step
Each arithmetic operation SPM was produced by first generating a set of
observation data at the software application level. Execution times observed at the
application level encompass the computer CPU and memory architecture, operating
system, compiler, and software application activities required to perform basic
mathematical operations. This observation data can be used to build a coarse-grained
system-level execution time estimate.
Observation data was produced using an application level MATLAB script
performing vector arithmetic operations. Each observation data point represents the
execution time associated with 100,000 mathematical operations (e.g. cosine, integer add,
complex multiply, etc.). Each operand and result used 64-bit (or 8 byte) long word (or
double) data types. The collected execution time is then divided by 100,000 to derive the
average execution time per math operation. The procedure is repeated 50,000 times to
form the observation data set. Therefore, each observation data set consists of 50,000
sample points.
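The collection procedure can be sketched outside MATLAB; the following Python analogue (with far smaller sample counts than the dissertation's 50,000 samples of 100,000 operations, purely for brevity) times a batch of operations and averages:

```python
import time
import math

def observe_op(op, n_ops=100_000):
    """Time n_ops invocations of a math operation and return the
    average execution time per operation (seconds)."""
    start = time.perf_counter()
    for _ in range(n_ops):
        op()
    return (time.perf_counter() - start) / n_ops

def collect_samples(op, n_samples=50_000, n_ops=100_000):
    """Repeat the averaged timing to build an observation data set."""
    return [observe_op(op, n_ops) for _ in range(n_samples)]

# Small counts so the sketch runs quickly:
samples = collect_samples(lambda: math.cos(1.2345), n_samples=100, n_ops=1_000)
```

Each entry of `samples` is one observation data point, i.e. one averaged per-operation execution time.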
Figure 3-17 Complex Add Single Buffer Multimodal Distribution
The 50,000 sample points are used to build a histogram. Figure 3-17 (left side)
depicts a histogram constructed from observation data for the Complex Add Single
Buffer operation. The histogram reveals a multimodal distribution.
3.7.2.2 State Definition (Estimation) Workflow Step
A group of unimodal distributions (Figure 3-17 right side) is formed from the
multimodal distribution. Each unimodal distribution represents a group of execution
times associated with executable code and operand memory location (i.e. cache memory
level) for the associated arithmetic operation. Code and data residence in faster memories
at the time of arithmetic operation execution results in faster execution times.
The minimum observed execution time, called the Best-Case Execution Time
(BCET) (Wilhelm 2008), is defined as the State 1 minimum execution time. The
maximum execution time for State 1 is chosen such that the state histogram represents a
unimodal distribution (e.g. Figure 3-17 right side). The maximum execution time for State
1 becomes the minimum execution time for State 2. The process repeats until the last
state where the maximum execution time is the maximum observed execution time (i.e.
WCET (Wilhelm 2008)). The multimodal distribution is decomposed into a series of
unimodal distributions each bounded by the state minimum and maximum execution
times. A histogram is built using each state’s data (e.g. Figure 3-17 right side) that
represents an empirical unimodal distribution.
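The decomposition step can be illustrated with a short sketch; the boundary values below are hypothetical placeholders, not values from the dissertation's data:

```python
def split_into_states(samples, boundaries):
    """Decompose a multimodal observation set into per-state unimodal
    groups. boundaries[i] is the maximum execution time of state i+1
    (which is also the minimum of state i+2); the BCET and WCET bound
    the first and last states."""
    states = [[] for _ in range(len(boundaries) + 1)]
    for t in sorted(samples):
        idx = sum(t > b for b in boundaries)  # index of the state whose bounds hold
        states[idx].append(t)
    return states

# Hypothetical execution times and state boundaries:
samples = [1.0, 1.1, 1.2, 2.0, 2.1, 5.0]
states = split_into_states(samples, boundaries=[1.5, 3.0])
# states[0] holds samples <= 1.5, states[1] holds (1.5, 3.0], states[2] the rest
```

Each `states[i]` list is then binned into its own histogram, giving the empirical unimodal distributions.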
3.7.2.3 Distribution Analysis Workflow Step
Each state unimodal distribution is fit with multiple candidate distribution models
(i.e. colored lines of Figure 3-17 right side). Maximum likelihood estimation (Myung
2003) is used to estimate distribution parameters (e.g. mu, standard deviation, shape) for
each candidate distribution. Ninety-five percent confidence intervals are computed for all
distribution parameters. Covariances are also computed for all distribution parameters.
Figure A-44 shows candidate distributions and associated distribution parameters for
Complex Add Single Buffer operation hot states. Figure A-45 shows candidate
distributions and associated distribution parameters for the warm states of the Complex
Add Single Buffer operation. Hot states (Figure 3-17 left side) are associated with faster
execution time where executable code and operands are loaded in the fastest cache
memories. Warm states (Figure 3-17 left side) are associated with slower execution time
where executable code and operands have to be loaded into cache memory or retrieved
from slower bulk memory.
The Bayesian Information Criterion (BIC) (Schwarz 1978) is computed for each
distribution using the maximum likelihood value. The lowest (most negative) BIC value
identifies the best distribution fit. BIC is preferred over the Akaike Information Criterion
(AIC) (Akaike 1974) in this scenario for two reasons:
• BIC says more about absolute model quality
• AIC selects the best model fit from a set but says nothing about absolute
model quality
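The MLE-plus-BIC selection step can be sketched as follows, using SciPy as a stand-in for the dissertation's tooling; the candidate distribution names and synthetic data are assumptions for illustration:

```python
import numpy as np
from scipy import stats

def best_fit_by_bic(data, candidates=("norm", "lognorm", "gamma")):
    """Fit each candidate distribution by maximum likelihood and rank by
    the Bayesian Information Criterion, BIC = k*ln(n) - 2*ln(L); the
    lowest (most negative) BIC identifies the best-fitting distribution."""
    n = len(data)
    results = {}
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)                      # MLE parameter estimates
        loglik = np.sum(dist.logpdf(data, *params))  # maximized log-likelihood
        bic = len(params) * np.log(n) - 2.0 * loglik
        results[name] = (bic, params)
    best = min(results, key=lambda name: results[name][0])
    return best, results

rng = np.random.default_rng(1)
data = rng.normal(loc=10.0, scale=0.5, size=2000)  # synthetic unimodal state data
best, results = best_fit_by_bic(data)
```

The extra-parameter penalty `k*ln(n)` is what lets BIC compare model quality across candidates with different parameter counts.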
The mu for each selected state distribution (Figure A-44 and Figure A-45) is
placed in an Excel spreadsheet (Figure 3-18) along with mu 95% confidence interval
values. State probability is computed by dividing the number of state sample points
(‘HMM State Count’ column in Figure 3-18) by the total number of sample points
(50,000). A similar spreadsheet is built for each mathematical operation. Each
spreadsheet feeds a MATLAB® Simulink® model discussed in the next section.
Figure 3-18 Single Core Complex Add State Parameters
3.7.2.4 Simulink Modeling (Estimation) Workflow Step
A MATLAB® Simulink® model (Figure A-43) is built for each math operation.
The model computes expected value (EV) using state mus and probabilities according to
equation 11.
ExecTime_EV = μ_ST1 · p_ST1 + μ_ST2 · p_ST2 + ⋯ + μ_STn · p_STn    (11)
The Simulink model computes EVs for Hot states, Warm states, and a Cold
state. (The Cold state encapsulates the state with the highest execution time values; this
state occurs when all arithmetic operation operands are retrieved from bulk memory.)
The Simulink model also computes an EV for all states, designated as “Most
Likely”, and determines the minimum and maximum (i.e. WCET) observed execution
times to complete the set of six values computed for each math operation. The Simulink
model reads state mus, probabilities, and observation data from
the math operation spreadsheet generated in section 3.7.2.3. The Simulink model outputs
computed values via output ports that are associated with flow ports in the system model
(see section 3.6.3).
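Equation 11 reduces to a probability-weighted sum over states; a minimal sketch with hypothetical state means and probabilities:

```python
def expected_exec_time(mus, probs):
    """Expected execution time per equation (11):
    ExecTime_EV = sum_i mu_STi * p_STi."""
    assert abs(sum(probs) - 1.0) < 1e-9, "state probabilities must sum to 1"
    return sum(m * p for m, p in zip(mus, probs))

# Hypothetical hot/warm/cold state means (usec) and state probabilities:
ev = expected_exec_time([0.010, 0.025, 0.120], [0.90, 0.08, 0.02])
# ev = 0.010*0.90 + 0.025*0.08 + 0.120*0.02 = 0.0134
```

Grouping states (hot only, warm only, all states) and reusing the same sum yields the separate Hot, Warm, and Most Likely EVs.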
3.7.3 Simulation Analysis Models
The purpose of the simulation SPM is to produce a representative set of execution
time samples for use in simulation analysis. A simulation SPM has been developed for
each arithmetic operation in the computation library (see section 3.7.2). Each simulation
SPM implements a Hidden Markov Model (HMM) as discussed in section 3.7.3.2. The
HMM state model uses the same boundary execution times (section 3.7.2.2) but differs in
structure as discussed in section 3.7.3.2. MATLAB® Simulink® models could not yet be
built to interface the simulation SPMs to the system model, for the reasons discussed in
section 3.7.3.3.
3.7.3.1 State Definition (Simulation) Workflow Step
The states defined in section 3.7.2.2 are also used for simulation. However,
instead of being organized into Hot, Warm, and Cold groups, the states are organized into
a single group. This approach was used because there is no need to distinguish among
state groups when generating simulated execution time samples.
3.7.3.2 Transition Analysis
Inspection of math operation observation data revealed patterns of execution
times that result from instruction caching, data caching, dynamic power management,
operating system thread breaks, etc. An HMM is used to model visible state outputs and
an underlying hidden state behavior. A sample HMM is shown in Figure 3-19.
Figure 3-19 Sample Hidden Markov Model
It is assumed that at each time step t the model occupies a state ωi(t) and emits a
visible output φi(t), χi(t), or ψi(t). Visible outputs can be continuous functions or discrete
outputs; this research restricts outputs to discrete values. In any state ωi(t) the
probability of a particular visible output is defined by the probability ρik. The
unobservable states (or nodes) (ωi) and transition probabilities operate the same as in a
basic Markov model.
The observable data set is analyzed to develop an HMM state transition
probability matrix. Figure 3-20 shows the state transition probability matrix for the
complex add single buffer math operation. Each observable data value (i.e. current state)
is compared to the next data value (i.e. next state). The appropriate State Transition
Count Matrix cell is incremented by one. The values for each State Transition Count
Matrix row are summed and placed in the HMM State Count column. The State
Transition Probability matrix is then built by dividing each cell of the State Transition
Count Matrix by the appropriate HMM State Count column cell.
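The counting-and-normalizing procedure for the transition matrix can be sketched as follows (the example state sequence is hypothetical):

```python
import numpy as np

def transition_matrix(state_seq, n_states):
    """Build an HMM state transition probability matrix from an observed
    state sequence: count (current state, next state) pairs, then divide
    each row by its HMM State Count (the row sum)."""
    counts = np.zeros((n_states, n_states))
    for cur, nxt in zip(state_seq[:-1], state_seq[1:]):
        counts[cur, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave rows for never-visited states as zeros instead of dividing by 0:
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

A = transition_matrix([0, 0, 1, 0, 0, 1, 2, 0], n_states=3)
# e.g. A[0] = [0.5, 0.5, 0.0]: from state 0, half the transitions stay in 0
```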
Figure 3-20 Complex Add Single Buffer HMM State Transition Probability Matrix
A set of discrete visible outputs with associated output probabilities must be
developed for each state. Figure 3-21 shows an example histogram display built from the
histogram data table (Figure 3-22).
Figure 3-21 Complex Add Single Buffer State 1 Histogram Display
Figure 3-22 Complex Add Single Buffer State 1 Histogram Data
The output probability is computed using the data in Figure 3-22 as follows:
1. Sum all non-zero state histogram values
2. Divide each non-zero state histogram value by the sum
The resulting table in Figure 3-23 shows each discrete output and its associated
output probability.
Figure 3-23 Complex Add Single Buffer State 1 Visible Output Data
The state transition probability matrix and the discrete outputs with associated
output probabilities for all states fully define the HMM for each math operation. The
HMM is used to build a stream of simulated execution time values for each math
operation.
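A sketch of how such a fully defined HMM can generate a simulated execution time stream (a Python stand-in for the role MATLAB's hmmgenerate plays in this workflow; the matrices below are hypothetical):

```python
import numpy as np

def hmm_generate(n, trans, emit_vals, emit_probs, start_state=0, seed=None):
    """Generate n simulated execution time values from an HMM defined by
    a state transition probability matrix and per-state discrete visible
    outputs with output probabilities. An explicit seed controls
    reproducibility of the stream."""
    rng = np.random.default_rng(seed)
    state = start_state
    out = []
    for _ in range(n):
        # Emit a visible output from the current state's discrete set:
        out.append(rng.choice(emit_vals[state], p=emit_probs[state]))
        # Transition to the next hidden state:
        state = rng.choice(len(trans), p=trans[state])
    return out

trans = [[0.9, 0.1], [0.3, 0.7]]      # hypothetical 2-state transition matrix
emit_vals = [[1.0, 1.1], [2.0, 2.2]]  # visible execution times per state (usec)
emit_probs = [[0.6, 0.4], [0.5, 0.5]]
stream = hmm_generate(1000, trans, emit_vals, emit_probs, seed=42)
```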
3.7.3.3 Simulink Modeling (Simulation) Workflow Step
A MATLAB® Simulink® model is to be constructed to encapsulate each HMM.
The Simulink model provides a stream of simulated execution time values to the system
model. At this time, two issues prevent construction of the Simulink models required for
execution time simulation. First, the MATLAB function hmmgenerate is used to
generate a stream of visible output states using the HMM developed in section 3.7.3.2.
The function takes the number of samples, the state transition probability matrix, and the
visible output probability matrix as inputs to produce a stream of observable output data.
However, this function is currently not supported by the MATLAB Simulink code
generator, which produces code to export the behavior associated with the Simulink
model into the Rhapsody system model. Second, the hmmgenerate function reproduces
an identical stream of visible state outputs for each function invocation. MathWorks is
aware of both issues and is currently working on solutions. Those solutions will be
integrated into the model when available.
Chapter 4 - Data Analysis and Results
4.1 Case Study Definition
A simple case study is presented to compare execution time estimates, generated
using the section 3.6.2.1.1 arithmetic computation SPMs integrated with the executable
performance attribute SysML model, against actual execution times. The goals for this
case study are to 1) provide initial validation of the executable architecture attribute
model and architecture attribute analysis workflow, 2) evaluate execution time (i.e.
performance attribute) estimates versus observed actual execution times, 3) compute
performance cost for use with optimization analysis, and 4) select an algorithm data input
size that meets performance constraints. These goals were accomplished through a
proof-of-concept application using a single computational thread for two different sCPU
computer systems.
The case study implements a single functional thread consisting of three
functions. Thread execution time is required to satisfy (i.e. be less than or equal to) a
Component Response Time (CRT) MOP constraint. The CRT value chosen for the case
study is 15 msec, as shown in Figure 4-1. Two proposed sCPU solutions, Computer 1 and
Computer 2 in Figure 4-1 with memory architectures defined in section 3.7.1, are used to
evaluate estimated thread execution time versus actual execution time. Analysis is
performed for fifteen algorithm data input sizes (256, 512, 1024, 2048, 4096, 8192,
16384, 32768, 65536, 131072, 262144, 528376, 1048576, 2097152, 4096384). An
execution time estimate for each data input size is produced by the executable model for
both Analysis Modes (Most Likely and WCET).
Figure 4-1 Case Study Functional and Physical Definition
4.2 Case Study Architecture Attribute Workflow
The first step of the architecture analysis workflow (section 3.1) used for this case
study is ‘Define Key Component System Functions’. For this case study, the function
thread model (Figure 4-1) consists of a Fast Fourier Transform (FFT), followed by a
Gaussian Filter – Frequency Domain (GFFD), followed by an Inverse FFT. The
FFT/IFFT algorithm selected was the Cooley-Tukey algorithm (Cooley and Tukey 1965).
The GFFD algorithm was adapted from a two-dimensional gaussian filter (Gonzalez,
Woods and Eddins 2009) to a one-dimensional gaussian filter. The functional architecture, physical
architecture, and performance attribute extensions are modeled using the blocks, IBDs,
and ADs defined in sections 3.4 through 3.6.
The ‘Assign Attribute/Thread Weights’ workflow step assigns Wp, We, and Wh
attribute weights to 1, 0, and 0 for thread/function cost computation (section 3.3.6),
because the case study focus is on the performance attribute. The ‘Define Candidate
Physical Architecture Solutions’ workflow step defines Computers 1 and 2 as candidate
physical solutions (Figure 4-1).
The next step of the architecture analysis method (section 3.3) used for this case
study is to ‘Model Function Architecture Attributes’. The functional architecture (Figure
3-5) is modeled with a LogArch_1_Thread_1 block, LogArch_1_Thread_1_Function_n
where n=1..3 for each of three functions (Figure 3-6). The functional architecture is
extended with blocks for LogArch_1_Thread_1_Function_n_Performance where n=1..3
for each of three functions (Figure 3-7). An operation is developed for each function
performance block to compute the number of algorithm computations for each arithmetic
operation (LogArch_1_Thread_X_ Function_Y_Performance block in Figure 3-7).
The ‘Model Physical Architecture Attributes’ (Figure 3-1) workflow step begins
by adding all physical architecture blocks (Figure 3-5, Figure 3-6, Figure 3-7)
corresponding to those presented in the previous paragraph to the executable model. Additionally,
all physical architecture performance attribute extensions (Figure A-29 through Figure
A-43) for all arithmetic operations are modeled for the SC_CM_AnalysisContainer
(Figure 3-7). (This case study fully exercises the sCPU performance attribute Complex
Add, Complex Multiply, Floating Point Multiply, Floating Point Divide, Integer Add,
Sin, and Exp computation models.) The container computes the
SC_Anal_EstExecutionTime attribute for each
estimated function execution time. Function estimated execution times are summed to
form a thread execution time.
The FFT/IFFT was implemented using code obtained from
LIBROW™ (Chernenko n.d.). The GFFD was developed for this research effort. All
code was implemented in C++ and compiled (optimized for speed) using Visual
Studio 2015. The code was executed on each computer identified in Figure 4-1. Timing
data was collected using the SmartBear™ AQtime™ Pro (version 8.1) memory and
performance profiling tool. Statistical analysis for the collected data was performed
using the Minitab™ 18 Statistical Software product.
4.3 Case Study Results Analysis
The results presented focus on validating use of execution time SPMs developed
in section 3.6.2.1.1 to perform thread execution time estimates that support the Figure 3-1
‘Compute Attribute Cost’ workflow step. Thread execution time estimates are considered
useful if actual thread execution time is within 10 percent of the most-likely estimate and
25 percent of the WCET estimate.
The general approach, discussed over the next three paragraphs, was to use an
initial set of Computer 1 estimates and observed results to build the ACF (see
equation 8 of section 3.6.2.1.1) that “calibrates” the execution time estimation for
unmodeled effects/factors. The ACF is then used to compute execution time estimates for
Computer 1 and also for Computer 2, to validate the results on a different platform.
Figure 4-2 presents actual thread execution times (in microseconds) versus data
input size for both Computers 1 and 2. Actual optimized (i.e. for speed) mean values
(Rows 3 and 15) are derived by performing statistical analysis on 128 collected samples
of thread execution time. Collected samples are analyzed to remove outlier values that
result from unmodeled system dynamic effects such as the operating system lowering
CPU frequency due to CPU overheating and runtime thread breaks (NOTE: These
system dynamic effects may be normal and expected, and require further analysis for
inclusion in the arithmetic operation SPMs in section 3.7). The remaining values are
analyzed to produce a mean (NOTE: Details of the analysis are provided in section
4.3.1). This process is repeated 32 times. The 32 means are used to produce an overall
mean (Rows 3 and 15) and 95% confidence interval lower (Rows 1 and 13) and upper
(Rows 2 and 14) bounds. Figure 4-2 also presents estimates produced by the executable
model (i.e. developed in section 4.2) versus data input size for Computer 1 (Rows 5
through 7) and Computer 2 (Rows 17 through 19). ‘OP vs ML Est’ (Row 8) is computed
by dividing the ‘OP Mean’ (Row 3) actual time by the ‘MostLikelyEst’ (Row 5) estimated
time. The ACF (Row 9) architecture calibration factors were originally used to calibrate
Computer 1 execution time estimates to within 10 percent of actual Computer 1
execution times. ACF values (Row 9) are selected to produce more conservative
estimates than the values computed in Row 8 in order to introduce design margin. ‘OP
Adj Perf Est’ (Row 10) shows the calibrated estimate produced by multiplying Row 5
and Row 9. ‘C1 % Act vs Est’ (Row 11) presents the error for Computer 1 estimated and
actual results as the percent difference between Rows 10 and 3.
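The calibration arithmetic described above can be sketched as follows (the margin factor and timing values are hypothetical illustrations, not the values of Figure 4-2):

```python
def calibrate(estimates, actuals, margin=1.05):
    """Compute Architecture Calibration Factors from an initial set of
    most-likely estimates and observed mean times (ACF ~ actual/estimate),
    padded by a design margin so calibrated estimates stay conservative,
    then apply them to the raw estimates."""
    acf = [margin * a / e for a, e in zip(actuals, estimates)]
    calibrated = [e * f for e, f in zip(estimates, acf)]
    return acf, calibrated

# Hypothetical most-likely estimates and observed means (usec),
# one small and one large data input block size:
acf, adj = calibrate([100.0, 4000.0], [90.0, 4400.0])
# acf[0] < 1: cache-resident block sizes run faster than estimated;
# acf[1] > 1: larger block sizes slow down from bulk memory interaction.
```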
It should be noted that ACF values less than 1 are applied to execution time
estimates for data input block sizes less than or equal to 262144. Smaller data input block
size arithmetic operations execute faster because operand data and results are primarily
located in data cache memory. An ACF greater than 1 is used for data input block sizes
greater than 262144 because of increased interaction with bulk memory.
The ACF (Row 9) is used to produce Computer 2 ‘MostLikelyEst’ (Row 17)
execution time estimates. Comparison of ‘MostLikelyEst’ (Row 17) execution time
estimates with Computer 2 ‘OP Mean’ (Row 15) actual execution times results in ‘C2 %
Act vs Est’ (Row 20) errors within 10 percent. Computer 1 and Computer 2 results are
thus validated using the calibration approach. This was expected because, though
the CPUs have different architectures and run at different speeds, the cache memory
architectures are the same for both computers. Most of the time for arithmetic operations
is spent retrieving and storing data, which has been accounted for through development of
the SPMs and the ACF. The implication of these results is that the execution time
estimation approach can be used across the entire family of Intel i3, i5, and i7 processors
using the standard cache memory architecture. More testing must be performed with
additional processors and different algorithms to completely validate the calibration
approach.
Figure 4-2 Case Study Data Results
Another goal of the case study was to verify WCET estimates versus actual
results. It quickly became evident that WCET estimates produced by the executable
model were substantially higher than actual observed results for all thread data input
sizes. However, it was observed that a combination of uncalibrated cold state and warm
state estimates could be used to produce WCET execution time estimates that are within
the 25 percent target of actual measured WCET execution times for all data input sizes
for both Computers 1 and 2.
The executable model was used to produce cold state and warm state execution
time estimates for Computer 1 (Rows 6 and 7) and Computer 2 (Rows 18 and 19). Actual
WCET execution times collected during data collection are provided in Figure 4-2 for
Computer 1 (Row 4) and Computer 2 (Row 16). ‘C1 WCET % Act vs Est’ (Row 12)
presents the error for Computer 1 estimated and actual results as the percent difference
between Rows 6/7 and 4. ‘C2 WCET % Act vs Est’ (Row 21) presents the error for
Computer 2 estimated and actual results as the percent difference between Rows 18/19
and 16.
4.3.1 Case Study Data Analysis Details
This section provides details on the data analysis performed to produce Figure 4-2
Rows 1-3 and 13-15 values. The procedure and analysis were conducted for both
Computer 1 and 2. The function thread was executed 128 times for each data input size
producing 128 execution time data samples. Each sample set is input to Minitab for
statistical analysis. The Outlier test is used to identify data that can skew analysis results,
as shown in Figure 4-3. For example, the data sample highlighted with a red box (i.e.
Row 38) is identified as an outlier in Figure 4-3.
Figure 4-3 Computer One 32768 Data Sample 02 Outlier Report (from Minitab)
Figure 4-4 shows an Outlier report after removing the Row 38 and Row 67 outliers from
the data sample set. Outlier values result from two operating conditions:
1. Thread execution experiences a thread break by the operating system.
This condition was detected and annotated using the AQtime tool.
2. CPU frequency is slowed by Dynamic Power Management (DPM)
software due to detection of a CPU overtemperature condition. This
condition was observed using the Speccy (by Piriform) tool.
Figure 4-4 Computer One 32768 Data Sample 02 (Minus Outliers) Outlier Report
The remaining data sample is analyzed to determine a median value and 95%
confidence interval. First, a histogram is produced in Minitab with a hypothesized
Largest Extreme Value overlay, as shown in Figure 4-5 (Left Side). After visual
inspection of the plot, a Probability Plot is produced in Minitab with a hypothesized
Largest Extreme Value overlay, as shown in Figure 4-5 (Right Side). The p-value
indicates that the data does not properly fit a Largest Extreme Value distribution.
Figure 4-5 Sample Minitab Histogram and Probability Plots
Since distribution characteristics cannot be determined, the data is treated as a
sample related to the data that was collected and analyzed to produce the ACF (Figure
4-2 Row 9) and associated adjusted estimates (Figure 4-2 Rows 10 and 17). A Wilcoxon
signed-rank non-parametric hypothesis test is performed on the data. The test is run in
two stages. The first stage produces a hypothesized median value and 95% Confidence
Interval (Low and High values), shown in Figure 4-6 (Left Side), using the Minitab
1-sample Wilcoxon test. The second stage reruns the 1-sample Wilcoxon test in Minitab
with a hypothesized median value (i.e. the estimated execution time from Figure 4-2
Rows 10 and 17). The test result is shown in Figure 4-6 (Right Side). The Computer One
32768 estimate of 9429.32 (Figure 4-2 Row 10) was used for the test shown in Figure
4-6. The p-value of zero rejects the null hypothesis, concluding that the data set median
value is less than the hypothesized estimated value.
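The second-stage test can be reproduced with SciPy's signed-rank implementation as a stand-in for Minitab's 1-sample Wilcoxon test (the sample below is synthetic, not the case study data):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
sample = rng.normal(9000.0, 50.0, size=120)  # synthetic execution times (usec)

# One-sample signed-rank test of the sample median against a hypothesized
# value (here, a calibrated estimate), via differences from that value:
hypothesized = 9429.32
stat, p = wilcoxon(sample - hypothesized, alternative="less")
# A near-zero p-value rejects the null hypothesis, concluding that the
# data-set median is less than the hypothesized estimate.
```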
Figure 4-6 Sample Wilcoxon Signed Rank Test Results
The median and 95% confidence interval values for all 32 data samples are
averaged to produce the data recorded in Figure 4-2 Rows 1-3 (Computer 1) and Rows
13-15 (Computer 2) for each data input size.
4.3.2 Thread Cost
The last four rows of Figure 4-2 address the ‘Compute Attribute Cost’ workflow
step (Figure 3-1, section 3.3.6) used in this case study. Figure 4-2 Rows 23 and 25 show
the computed C1 and C2 thread performance cost (Equation 4), using the C1 (Row 10) and
C2 (Row 17) estimated execution times (ExecTimeEst) divided by the ‘CRT MOP’ (Row
22) (LatencyReq) cost function for each candidate data input block size.
greater than one indicates that thread execution time does not satisfy the ‘CRT MOP’ for
that data input block size. A cost value less than or equal to one indicates that thread
execution time satisfies the ‘CRT MOP’ for that data input block size. The computed
cost values show a maximum data input block size of 32768 satisfies the ‘CRT MOP’ for
both computers. Any smaller data input block size also satisfies the ‘CRT MOP’.
Computer 2 has lower cost for each data input block size, which is no surprise, since
execution-time performance is the sole weighted factor in the cost equation and
Computer 2 has greater performance capability. The primary goal of thread algorithm
design is to process the largest amount of data for the minimum cost. In addition, thread
performance cost for the selected data input block size is assigned to DAG nodes for
Computer 1 and 2 for subsequent use in the OptimizationAnalysis block (Figure 3-5).
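The cost computation and feasibility screen can be sketched as follows (the per-size execution time estimates are hypothetical, not Figure 4-2 values):

```python
def performance_cost(exec_time_est, latency_req):
    """Thread performance cost per the case study: estimated execution
    time divided by the CRT MOP latency requirement. Cost <= 1 means
    the thread satisfies the constraint; cost > 1 means it does not."""
    return exec_time_est / latency_req

CRT_MOP = 15_000.0  # usec (the 15 msec Component Response Time MOP)

# Hypothetical calibrated estimates for increasing data input block sizes:
costs = {size: performance_cost(t, CRT_MOP)
         for size, t in [(16384, 4800.0), (32768, 9400.0), (65536, 21000.0)]}
feasible = [size for size, c in sorted(costs.items()) if c <= 1.0]
# Here the largest feasible block size is 32768, mirroring the
# case-study pattern of picking the maximum size with cost <= 1.
```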
Computed thread costs for all data input sizes and computers define an SC trade
space value set. The final data input size is chosen based on overall objectives. If
minimum thread cost is the overarching constraint, then a data input size of 256 would be
the preferred solution. If processing the maximum amount of data is the overarching
constraint, then a data input size of 32768 would be the preferred solution. If lower
computer purchase cost is an additional overarching constraint, then Computer 1 would
be the preferred solution. Otherwise, Computer 2 would be the preferred solution.
4.3.3 Simulation Analysis
Demonstration of the simulation SPMs developed during this research effort is
beyond the scope of this paper due to the issues discussed in section 3.7.3.3. This
capability will be validated in future research efforts.
Chapter 5 - Conclusions
5.1 Contributions to the field
This dissertation makes the following contributions:
• Presents a component-level model-based architecture analysis method for
developing a design trade space, performing optimization analysis, and
performing simulation analysis.
• Presents system model logical and physical architecture extensions using
SysML. These extensions enable evaluation of multiple functional threads
on multiple physical architectures. Functional threads can be used to
model complex applications or to evaluate different algorithm approaches.
• Implements physical architecture system models that enable computation
of performance, energy, and thermal attributes for single core, multi/many
core, and GPU computer resource architectures.
• Implements and performs an initial proof-of-concept for an executable
abstract single core computation model capable of producing thread
algorithm performance (i.e. execution time) estimates within 10 percent
for normal execution and 25 percent for worst-case execution. The
computation model provides the basis for multi/many core performance
estimates.
• Computes DAG node (thread/function) costs. These costs will be exported
to optimization algorithms implemented in MATLAB. A separate cost is
computed for each node and target physical architecture combination for
use by the optimization algorithm.
These contributions produce a component architecture (software and computer
resource) that possesses reduced technical risk. Risk is reduced through multi-attribute
(i.e. performance, energy, thermal, etc.) optimization that considers constraints. The
systems engineer is provided with analytical data, via the architecture attribute view, that
facilitates communication regarding domain-specific engineering artifacts (e.g. software
architecture with software engineers, computer resource architecture with computer
engineers, energy and thermal design with mechanical engineers, and redundancy
architecture with reliability engineers) earlier in the system design cycle. Requirements
for building a prototype are greatly reduced because performance, energy, and thermal
attributes are quantified through use of methods and artifacts produced by this research.
The attribute framework defined by this research is extensible. Domain specific (e.g.
mobile, cyber, etc.) logical and physical architecture attributes can be added to extend the
model and node cost equations. Finally, analysis efficiency and quality are improved via
an executable model. Analysis results are obtained quickly for changes/additions of
algorithms, computer resource solutions, and computer resource configurations without
the use of utility curves.
5.1.1 Limitations
The single core computation model must be separately developed for each
computer architecture. Examples of popular architecture families include the Advanced
Reduced Instruction Set Computer (RISC) Machine (ARM), used in embedded
systems, and the PowerPC RISC machine, historically used in Apple computers.
5.2 Recommendations for Future Work
The foundational research associated with this paper can be built upon in several
different ways:
• Enhance operations supported by and fidelity of statistical performance
models:
o Add memory size and speed parameter dependencies
o Add additional operations such as comparison and search to
support algorithms in data analysis
o Add matrix operations such as add and multiply to support multi-
dimensional algorithms in the areas of image processing, machine
learning, computer vision, and artificial intelligence
o Analyze outliers for inclusion into SPMs
• Integrate computation model for multi/many-core processors (utilize
Amdahl’s Law to compute speedup), GPU, and Field Programmable Gate
Arrays (FPGAs) within a single node
• Integrate computation model for distributed node architectures (e.g.
cluster)
• Incorporate performance estimates for state-based algorithms
• Integrate energy consumption, generated heat (i.e. thermal), and reliability
quantitative models for all physical architectures (single core, multi/many
core, GPU, FPGA)
• Integrate multi-attribute optimization algorithms
• Complete simulation analysis capability
• Integrate application specific architecture attributes. For example, mobile
applications that utilize intensive graphics consume energy and produce
heat at an accelerated rate. Lower energy consumption and heat
generation occur when screen pixels are black. A logical architecture
attribute such as percent screen pixels black can be added to the model
that associates with physical architecture energy consumption and heat
generation.
Additionally, computation models and quantitative energy consumption, heat
generation, and reliability models can be integrated for different processor families
(ARM, PowerPC, etc.).
Chapter 6 - Bibliography
Akaike, H. 1974. "A New Look at the Statistical Model Identification." IEEE
Transactions on Automatic Control (IEEE) 19 (6): 716-723.
doi:10.1109/TAC.1974.1100705.
Alford, Mack. 1992. "Strengthening the Systems/Software Engineering Interface for Real
Time Systems." Proceedings of the Second International Symposium of the
National Council on Systems Engineering. Seattle, Wash. 411-418.
Alford, Mack W. 1977. "A Requirements Engineering Methodology for Real-Time
Processing Requirements." IEEE Transactions on Software Engineering SE-3 (1):
60-69.
Asanovic, Krste, Ras Bodik, Bryan C. Catanzaro, Joseph J. Gebis, Parry Husbands, Kurt
Keutzer, David A. Patterson, et al. 2006. "The Landscape of Parallel Computing:
A View from Berkeley." Berkeley EECS Electrical Engineering and Computer
Science. University of California, Berkeley. December 18. Accessed June 15,
2017. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html.
Balarin, Felice, Yosinori Watanabe, Harry Hsieh, Luciano Lavagno, Claudio Passerone,
and Alberto Sangiovanni-Vincentelli. 2003. "Metropolis: an integrated electronic
system design environment." Computer, April 08: 45-52.
doi:10.1109/MC.2003.1193228.
Balducci, M., A. Ganapathiraju, J. Hamaker, J. Picone, A. Choudary, and A. Skjellum.
1997. "Benchmarking of FFT Algorithms." IEEE Southeastcon Proceedings.
Blacksburg, VA. doi:10.1109/SECON.1997.598704.
Balmelli, Laurent. 2007. "An Overview of the Systems Modeling Language for Products
and Systems Development." Journal of Object Technology 6 (6): 149-177.
Accessed October 16, 2017. http://www.jot.fm/issues/issue_2007_07/article2.
Balmelli, Laurent, D. Brown, Murray Cantor, and M. Mott. 2006. "Model-driven Systems
Development." IBM Systems Journal 45 (3): 569-585.
Banerjee, Sudarshan, and Nikil Dutt. 2004. "Efficient Search Space Exploration for HW-
SW Partitioning." Proceedings of the 2nd IEEE/ACM/IFIP International
Conference on Hardware/Software Codesign and System Synthesis. Stockholm:
ACM. 122-127.
Banerjee, Sudarshan, Elaheh Bozorgzadeh, and Nikil D. Dutt. 2006. "Integrating Physical
Constraints in HW-SW Partitioning for Architectures With Partial Dynamic
Reconfiguration." IEEE Transactions on Very Large Scale Integration (VLSI)
Systems 14 (11): 1189-1202. doi:10.1109/TVLSI.2006.886411.
Becker, Steffen, Heiko Koziolek, and Ralf Reussner. 2009. "The Palladio component
model for model-driven performance prediction." The Journal of Systems and
Software (Elsevier Science Inc.) 82 (1): 3-22. doi:10.1016/j.jss.2008.03.066.
Beihoff, Bruce, Christopher Oster, Sanford Friedenthal, Chris Paredis, Duncan Kemp,
Heinz Stoewer, David Nichols, and Jon Wade. 2014. A World in Motion - Systems
Engineering Vision 2025. San Diego, CA: INCOSE.
Berry, Gerard. 2000. "The Foundations of Esterel." In Proof, Language, and Interaction:
Essays in Honour of Robin Milner, 425-454. Cambridge, MA: MIT Press.
Bilsen, Greet, Marc Engels, Rudy Lauwereins, and Jean Peperstraete. 1996. "Cycle-static
dataflow." IEEE Transactions on Signal Processing 44 (2): 397-408.
doi:10.1109/78.485935.
BKCASE Editorial Board. 2017. The Guide to the Systems Engineering Body of
Knowledge (SEBoK). Edited by R.D. Adcock (EIC). Vers. 1.8. Hoboken, NJ: The
Trustees of the Stevens Institute of Technology. Accessed October 23, 2017.
www.sebokwiki.org.
Bock, Conrad. 2006. "SysML and UML 2 Support for Activity Modeling." Systems
Engineering (INCOSE) 9 (2): 160-186. Accessed December 15, 2017.
doi:10.1002/sys.20046.
Booth, S. 2008. "System Engineering and Architecting with CORE." INCOSE WMA
Chapter Meeting.
http://www.incose.org/wma/library/docs/INCOSE_WMA.080408.01.pdf.
Boyd, E. L., W. Azeem, Hsien-Hsin Lee, Tien-Pao Shih, Shih-Hao Hung, and E. S.
Davidson. 1994. "A Hierarchical Approach to Modeling and Improving the
Performance of Scientific Applications on the KSR1." International Conference
on Parallel Processing (ICPP 1994). North Carolina: IEEE. 188-192.
doi:10.1109/ICPP.1994.30.
Buck, Joseph T., and Edward A. Lee. 1993. "Scheduling Dynamic dataflow graphs with
bounded memory using the token flow model." 1993 IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP-93).
Minneapolis, MN: IEEE. 429-432. doi:10.1109/ICASSP.1993.319147.
Buede, Dennis M., and William D. Miller. 2016. The Engineering Design of Systems:
Models and Methods. Hoboken, NJ: John Wiley & Sons, Inc.
Campeanu, Gabriel, Jan Carlson, and Severine Sentilles. 2014. "Component Allocation
Optimization for Heterogeneous CPU-GPU Embedded Systems." Software
Engineering and Advanced Applications (SEAA), 2014 40th EUROMICRO
Conference on, August 27-29: 229-236. doi:10.1109/SEAA.2014.29.
Cantor, Murray. 2003. "Rational Unified Process for Systems Engineering Part 1:
Introducing RUP SE Version 2.0." August. Accessed November 01, 2017.
http://vincentvanrooijen.com/container%5Cprocess%5CRational%20Unified%20
Process%20for%20Systems%20Engineering%20-%201.pdf.
—. 2003. "Rational Unified Process for Systems Engineering Part II: System
Architecture." September. Accessed November 01, 2017.
http://vincentvanrooijen.com/container%5CArchitecture%5CRational%20Unified
%20Process%20for%20Systems%20Engineering%20Part%20II%20-
%20System%20Architecting.pdf.
Carson, Ronald S., and Barbara J. Sheeley. 2013. "Functional Architecture as the Core of
Model-Based Systems Engineering." INCOSE International Symposium.
Philadelphia, PA. 29-45. doi:10.1002/j.2334-5837.2013.tb03002.x.
Chernenko, Sergey. n.d. Article 10 Fast Fourier Transform - FFT. LIBROW. Accessed
March 21, 2018. www.librow.com/articles/article-10.
Commoner, F., A. W. Holt, S. Even, and A. Pnueli. 1971. "Marked Directed Graphs."
Journal of Computer and System Sciences (Elsevier) 5 (5): 511-523.
doi:10.1016/S0022-0000(71)80013-2.
Cooley, James W., and John W. Tukey. 1965. "An Algorithm for the Machine
Calculation of Complex Fourier Series." Mathematics of Computation (American
Mathematical Society) 19 (90): 297-301. doi:10.2307/2003354.
Cross, Nigel. 2008. Engineering Design Methods: Strategies for Product Design. Fourth.
West Sussex, Eng.: John Wiley & Sons Ltd.
Cui, Zheng, Yun Liang, Kyle Rupnow, and Deming Chen. 2012. "An Accurate GPU
Performance Model for Effective Control Flow Divergence Optimization." 26th
International Parallel & Distributed Processing Symposium (IPDPS). Shanghai,
China: IEEE. 83-94. doi:10.1109/IPDPS.2012.18.
De Micheli, Giovanni, and Rajesh K. Gupta. 1997. "Hardware/Software Co-Design."
Proceedings of the IEEE (IEEE) 85 (3): 349-365. doi:10.1109/5.558708.
Deb, Kalyanmoy, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. 2002. "A fast and
elitist multiobjective genetic algorithm: NSGA-II." IEEE Transactions on
Evolutionary Computation 6 (2): 182-197. doi:10.1109/4235.996017.
Dick, Robert P., David L. Rhodes, and Wayne Wolf. 1998. "TGFF: Task Graphs for
Free." Proceedings of the 6th International Workshop on Hardware/Software
Codesign. Seattle, Washington, USA: IEEE Computer Society. 97-101.
Dori, Dov E.F. 2006. "Object-Process Methodology." In Encyclopedia of Knowledge
Management, by David G. Schwartz, 683-693. Hershey: Idea Group Reference.
—. 2002. Object-Process Methodology: A Holistic Systems Paradigm. Berlin: Springer-
Verlag.
Du, Jiayi, Xiangsheng Kong, Xin Zuo, Lingyan Zhang, and Aijia Ouyang. 2014.
"Shuffled frog leaping algorithm for hardware/software partitioning." Journal of
Computers 9 (11): 2752-2760.
Elbeltagi, Emad, Tarek Hegazy, and Donald Grierson. 2005. "Comparison among five
evolutionary-based optimization algorithms." Advanced engineering informatics
19 (1): 43-53.
Ernst, Rolf, Jorg Henkel, and Thomas Benner. 1993. "Hardware-software cosynthesis for
microcontrollers." IEEE Design & Test of computers (IEEE Computer Society) 10
(4): 64-75. doi:10.1109/54.245964.
Estefan, Jeff A. 2008. "Survey of Model-Based Systems Engineering (MBSE)
Methodologies." Vers. B. INCOSE MBSE Initiative. May 23. Accessed October
01, 2017. http://www.omgsysml.org/MBSE_Methodology_Survey_RevB.pdf.
Fernandez, Maribel. 2009. Models of Computation: An Introduction to Computability
Theory. Springer.
Fisher, Gerard H. 1998. "Model-Based Systems Engineering of Automotive Systems."
Digital Avionics Systems Conference, 1998. Proceedings, 17th DASC. The
AIAA/IEEE/SAE. Bellevue, WA: IEEE. B15/1-B15/7.
doi:10.1109/DASC.1998.741455.
Friedenthal, Sanford A., and Cris Kobryn. 2004. "Extending UML to Support a Systems
Modeling Language." Annual INCOSE International Symposium. Toulouse,
France: INCOSE. 686-706. doi:10.1002/j.2334-5837.2004.tb00527.x.
Friedenthal, Sanford, Alan Moore, and Rick Steiner. 2015. A Practical Guide to SysML:
the systems modeling language. Third. Morgan Kaufmann.
Friedenthal, Sanford, and Mark Sampson. 2012. "Model-based Systems Engineering
(MBSE) Initiative." INCOSE MBSE Workshop. Jacksonville, FL: INCOSE,
January 21-22. Accessed November 01, 2017.
http://www.omgwiki.org/MBSE/lib/exe/fetch.php?media=mbse:mbse_iw_2012-
introduction-2012-01-21-friedenthal-c.pptx.
Gaudiot, Jean-Luc. 1991. "Stream Languages and data-flow." In Advanced Topics in
data-flow computing, edited by Jean-Luc Gaudiot and Lubomir Bic, 439-454.
Englewood Cliffs, NJ: Prentice Hall.
Gonzalez, Rafael C., Richard E. Woods, and Steven L. Eddins. 2009. "Lowpass
(Smoothing) Frequency Domain Filters." Chap. 4.5.2 in Digital Image Processing
Using MATLAB, 826. Gatesmark Publishing.
Gupta, Rajesh K., Claudionor Nunes Coellho Jr., and Giovanni De Micheli. 1992.
"Synthesis and simulation of digital systems containing interacting hardware and
software components." Proceedings of the 29th ACM/IEEE Design Automation
Conference. IEEE Computer Society Press. 225-230.
Halbwachs, N., P. Caspi, P. Raymond, and D. Pilaud. 1991. "The synchronous data flow
programming language LUSTRE." Proceedings of the IEEE (IEEE) 79 (9): 1305-
1320. doi:10.1109/5.97300.
Harel, D., H. Lachover, A. Naamad, A. Pnueli, M. Politi, R. Sherman, A. Shtull-Trauring,
and M. Trakhtenbrot. 1990. "STATEMATE: a working environment for the
development of complex reactive systems." IEEE Transactions on Software
Engineering 16 (4): 403-414.
Henkel, Jorg, and Rolf Ernst. 2001. "An Approach to Automated Hardware/Software
Partitioning Using a Flexible Granularity that is Driven by High-Level Estimation
Techniques." IEEE Transactions on Very Large Scale Integration (VLSI) Systems
9 (2): 273-289.
Henzinger, Thomas A., Benjamin Horowitz, and Christoph Meyer Kirsch. 2001.
"Embedded Control Systems Development with Giotto." Proceedings of the ACM
SIGPLAN workshop on Languages, compilers, and tools for embedded systems
(LCTES '01). Snow Bird, UT. 64-72. doi:10.1145/384197.384208.
Hill, Mark D., and Michael R. Marty. 2008. "Amdahl's Law in the Multicore Era."
Computer (IEEE) 41 (7): 33-38. doi:10.1109/MC.2008.209.
Hoare, C. A. R. 1978. "Communicating Sequential Processes." Communications of the
ACM (ACM) 21 (8): 666-677. doi:10.1145/359576.359585.
Hoffmann, Hans-Peter. 2013. "IBM Rational Harmony Deskbook Rel. 4.1." July.
Accessed October 15, 2017.
https://www.ibm.com/developerworks/community/groups/service/html/communit
yview?communityUuid=dbc39547-3619-4c31-9535-
0b583a4e6190#fullpageWidgetId=W62078615f88f_4809_afad_c27cdc9d7e71&fi
le=2132d88d-4dde-40b4-8102-254ca4456c82.
Hong, Sunpyo, and Hyesoon Kim. 2009. "Memory-Level and Thread-Level Parallelism
Aware GPU Architecture Performance Analytical Model." Proceedings of the
36th Annual International Symposium on Computer Architecture (ISCA '09).
Austin, TX: ACM. 152-163. doi:10.1145/1555754.1555775.
Hylands, Christopher, Edward Lee, Jie Liu, Xiaojun Liu, Stephen Neuendorffer, Yuhong
Xiong, Yang Zhao, and Haiyang Zheng. 2003. "Overview of the Ptolemy
Project." Ptolemy Project Heterogeneous Modeling and Design. University of
California, Berkeley. July 02. Accessed December 28, 2017.
https://ptolemy.eecs.berkeley.edu/publications/papers/03/overview.
IEEE Computer Society Software & Systems Engineering Standards Committee. 2008.
Systems and Software Engineering - System life cycle processes. Standard,
Geneva/Piscataway: ISO/IEC-IEEE.
Institute of Electrical and Electronics Engineers. 2012. Information Technology -
Modeling Languages - Part 1: Syntax and Semantics for IDEF0. Standard,
Geneva: International Standards Organization, 120.
ISO. 2016. ISO/IEC/IEEE International Standard for Systems and Software Engineering-
Life Cycle Management-Part 4: Systems Engineering Planning. Standard,
Geneva: ISO.
ISO/IEC JTC 1/SC 7. 2011. Systems and Software Engineering - Architecture
Description (ISO/IEC/IEEE 42010:2011). December. Accessed October 05, 2017.
http://cabibbo.dia.uniroma3.it/asw/altrui/iso-iec-ieee-42010-2011.pdf.
Jing, Yiming, Jishun Kuang, Jiayi Du, and Biao Hu. 2013. "Application of improved
simulated annealing optimization algorithms in hardware/software partitioning of
the reconfigurable system-on-chip." Proceedings of the International Conference
on Parallel Computing in Fluid Dynamics. Changsha, China: Springer-Verlag.
532-540.
Kahn, Gilles. 1974. "The semantics of a simple language for parallel programming."
Proceedings of the IFIP Congress. 471-475. Accessed December 26, 2017.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.597.5710&rep=rep1&t
ype=pdf.
Kang, Yan, He Lu, and Jing He. 2013. "A PSO-based Genetic Algorithm for Scheduling
of Tasks in a Heterogeneous Distributed System." Journal of Software 8 (6):
1443-1450.
Keinert, Joachim, Thomas Schlichter, Joachim Falk, Jens Gladigau, Christian Haubelt,
Jurgen Teich, and Michael Meredith. 2009. "SystemCoDesigner—an automatic
ESL synthesis approach by design space exploration and behavioral synthesis for
streaming applications." ACM Transactions on Design Automation of Electronic
Systems (TODAES) 14 (1): 1-23.
Knudsen, Peter Voigt, and Jan Madsen. 1996. "PACE: A dynamic programming
algorithm for hardware/software partitioning." Proceedings of the 4th
International Workshop on Hardware/Software Co-Design. IEEE Computer
Society. 85-92.
Kuang, Shiann-Rong, Chin-Yang Chen, and Ren-Zheng Liao. 2005. "Partitioning and
Pipelined Scheduling of Embedded System Using Integer Linear Programming."
Proceedings of the 11th International Conference on Parallel and Distributed
Systems (ICPADS'05). Fukuoka, Japan: IEEE Computer Society. 37-41.
doi:10.1109/ICPADS.2005.219.
Lee, Edward A., and David G. Messerschmitt. 1987. "Synchronous data flow."
Proceedings of the IEEE (IEEE) 75 (9): 1235-1245.
doi:10.1109/PROC.1987.13876.
Lee, Edward A., and Thomas M. Parks. 1995. "Dataflow process networks." Proceedings
of the IEEE 83 (5): 773-801. doi:10.1109/5.381846.
Lee, Edward Ashford, and David G. Messerschmitt. 1987. "Static Scheduling of
Synchronous Data Flow Programs for Digital Signal Processing." IEEE
Transactions on Computers C-36 (1): 24-35. doi:10.1109/TC.1987.5009446.
LeGuernic, P., T. Gautier, M. Le Borgne, and C. Le Maire. 1991. "Programming real-
time applications with SIGNAL." Proceedings of the IEEE (IEEE) 79 (9): 1321-
1336. doi:10.1109/5.97301.
Lempel, Oded. 2011. "2nd Generation Intel Core Processor Family: Intel Core i7, i5, and
i3." 2011 IEEE Hot Chips 23 Symposium (HCS). Stanford, CA, USA: IEEE.
doi:10.1109/HOTCHIPS.2011.7477509.
—. 2011. "2nd Generation Intel Core Processor Family: Intel Core i7, i5, and i3." Intel.
July 28. Accessed October 20, 2017. https://www.hotchips.org/wp-
content/uploads/hc_archives/hc23/HC23.19.9-Desktop-CPUs/HC23.19.911-
Sandy-Bridge-Lempel-Intel-Rev%207.pdf.
Levis, A. 1993. "National Missile Defense (NMD) Command and Control Methodology
Development." Contract Data Requirements List A005 report for US Army
Contract MDA 903-88-019, Delivery Order 0042, Center of Excellence in
Command, Control, Communication, and Intelligence, George Mason University,
Fairfax, VA.
Li, Guoshuai, Jinfu Feng, Junhua Hu, Cong Wang, and Duo Qi. 2014.
"Hardware/Software Partitioning Algorithm Based on Genetic Algorithm."
Journal of Computers 9 (6): 1309-1315.
Lin, Geng, Wenxing Zhu, and M Montaz Ali. 2014. "A tabu search-based memetic
algorithm for hardware/software partitioning." Mathematical Problems in
Engineering 1-15.
Liu, Peng, Jigang Wu, and Yongji Wang. 2013. "Hybrid algorithms for
hardware/software partitioning and scheduling on reconfigurable devices."
Mathematical and Computer Modelling 58 (1): 409-420.
Long, David, and Zane Scott. 2011. "A Primer for Model-Based Systems Engineering."
Vers. 2nd Edition. Vitech Corporation. October. Accessed October 07, 2017.
http://www.vitechcorp.com/resources/mbse.shtml.
López-Vallejo, Marisa, and Juan Carlos López. 2003. "On the hardware-software
partitioning problem: System modeling and partitioning techniques." ACM
Transactions on Design Automation of Electronic Systems (TODAES) 8 (3): 269-
297.
Madsen, Jan, Jesper Grode, Peter Voigt Knudsen, Morten Elo Petersen, and Anne
Haxthausen. 1997. "LYCOS: The Lyngby co-synthesis system." Design
Automation for Embedded Systems (Kluwer Academic Publishers) 2 (2): 195-235.
Maier, Mark W., and Eberhardt Rechtin. 2009. The Art of Systems Architecting. Third.
Boca Raton, FL: CRC Press. ISBN 978-1420079135.
McKean, David, James D Moreland Jr., and Steven Doskey. 2019. "Use of model-based
architecture attributes to construct a component-level trade space." INCOSE
Systems Engineering (Wiley Online Library). doi:10.1002/sys.21478.
Mealy, George H. 1955. "A Method for Synthesizing Sequential Circuits." Bell System
Technical Journal (Wiley Online Library) 34 (5): 1045-1079. Accessed December
15, 2017. doi:10.1002/j.1538-7305.1955.tb03788.x.
Meyerowitz, Trevor C. 2008. Single and Multi-CPU Performance Modeling for
Embedded Systems. Dissertation, Electrical Engineering and Computer Sciences,
Graduate Division, University of California at Berkeley.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-36.html.
Moreira, Orlando, Twan Basten, Marc Geilen, and Sander Stuijk. 2010. "Buffer Sizing
for Rate-Optimal Single-Rate Data-Flow Scheduling Revisited." IEEE
Transactions on Computers (IEEE) 59 (2): 188-201. doi:10.1109/TC.2009.155.
Mudry, Pierre-Andre, Guillaume Zufferey, and Gianluca Tempesti. 2006. "A
Dynamically Constrained Genetic Algorithm for Hardware-software
Partitioning." Proceedings of the 8th Annual Conference on Genetic and
Evolutionary Computation. Seattle, Washington, USA. 769-776.
doi:10.1145/1143997.1144134.
Murata, Tadao. 1989. "Petri Nets: Properties, analysis and applications." Proceedings of
the IEEE (IEEE) 77 (4): 541-580. doi:10.1109/5.24143.
Myung, Jae. 2003. "Tutorial on Maximum Likelihood Estimation." Journal of
Mathematical Psychology (Academic Press) 47: 90-100.
NASA. 2007. NASA Systems Engineering Handbook. NASA/SP-2007-6105 Rev 1.
Handbook, NASA, 360. Accessed June 2018.
Niemann, Ralf, and Peter Marwedel. 1997. "An Algorithm for Hardware/Software
Partitioning Using Mixed Integer Linear Programming." Design Automation for
Embedded Systems (Kluwer Academic Publishers) 2 (2): 165-193.
doi:10.1023/A:1008832202436.
—. 1996. "Hardware/Software Partitioning Using Integer Programming." EDTC '96
Proceedings of the 1996 European conference on Design and Test. Paris, France:
IEEE Computer Society. 473-479.
Nolan, Brian, Barclay Brown, Laurent Balmelli, Tim Bohn, and Ueli Wahli. 2008.
Model Driven Systems Development with Rational Products (IBM Redbook).
International Technical Support Organization. February.
Accessed October 02, 2017. https://www-
01.ibm.com/events/wwe/grp/grp004.nsf/vLookupPDFs/Rational%20MBSE-
MDSD%20Redbook%202008/$file/Rational%20MBSE-
MDSD%20Redbook%202008.pdf.
Object Management Group (OMG). 2018. What is SysML? OMG. Accessed June 15,
2018. http://www.omgsysml.org/what-is-sysml.htm.
Object Management Group. 2008. "MARTE Specification." Object Management Group.
June 08. http://www.omg.org/omgmarte/Specification.htm.
—. 2017. "OMG System Modeling Language Specification." Vers. 1.5. OMG. May.
Accessed October 16, 2017. http://www.omg.org/spec/SysML/About-SysML/.
—. 2003. "UML for Systems Engineering RFP." March 28. Accessed 10 19, 2017.
http://syseng.omg.org/UML_for_SE_RFP.htm.
Oliver, David W., Timothy P. Kelliher, and James G. Keegan, Jr. 1997. Engineering
Complex Systems With Objects and Models. New York: McGraw-Hill.
Parnell, Gregory S., and Timothy E. Trainor. 2009. "Using the Swing Weight Matrix to
Weight Multiple Objectives." INCOSE International Symposium. Singapore. 283-
298. doi:10.1002/j.2334-5837.2009.tb00949.x.
Pohl, Klaus. 2010. Requirements Engineering: Fundamentals, Principles, and
Techniques. Heidelberg: Springer.
Ramos, Ana Luisa, Jose Vasconcelos Ferreira, and Jaume Barcelo. 2012. "Model-Based
Systems Engineering: An Emerging Approach for Modern Systems." IEEE
Transactions on Systems, Man, and Cybernetics - Part C: Applications and
Reviews 42 (1): 101-111.
Roedler, G. J., and C. Jones. 2005. Technical Measurement: A Collaborative Project of
PSM, INCOSE, and Industry. Practical Software & Systems Management (PSM),
San Diego, CA: INCOSE, 65.
Roedler, Garry. 2012. "Harmonization of Key Systems Engineering Resources."
Proceedings 15th Annual NDIA Systems Engineering Conference. San Diego,
CA. Accessed November 01, 2017.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.386.8520&rep=rep1&t
ype=pdf.
Ross, Douglas T. 1977. "Structured Analysis (SA): A Language for Communicating
Ideas." IEEE Transactions on Software Engineering SE-3 (1): 16-34.
doi:10.1109/TSE.1977.229900.
Sadashiv, Naidila, and S. M. Dilip Kumar. 2011. "Cluster, grid and cloud computing: A
detailed comparison." 2011 6th International Conference on Computer Science
and Education (ICCSE). Singapore: IEEE. 477-482.
doi:10.1109/ICCSE.2011.6028683.
Sangiovanni-Vincentelli, Alberto. 2007. "Quo vadis, SLD? reasoning about the trends
and challenges of system level design." Proceedings of the IEEE (IEEE) 95 (3):
467-506.
Sapienza, Gaetana, Tiberiu Seceleanu, and Ivica Crnknovic. 2013. "Partitioning Decision
Process for Embedded Hardware and Software Deployment." Computer Software
and Applications Conference Workshops (COMPSACW), 2013 IEEE 37th
Annual. Japan: IEEE. 674-680. doi:10.1109/COMPSACW.2013.131.
Savage, John E. 1998. Models of Computation: Exploring the Power of Computation.
Addison-Wesley.
Schaumont, Patrick R. 2013. "Data Flow Modeling and Transformation." Chap. 2 in A
Practical Introduction to Hardware/Software Codesign, 31-59. New York:
Springer. doi:10.1007/978-1-4614-3737-6.
Schwarz, Gideon. 1978. "Estimating the Dimension of a Model." The Annals of
Statistics (The Institute of Mathematical Statistics) 6 (2): 461-464.
Schlichter, Thomas, Martin Lukasiewycz, Christian Haubelt, and Jürgen Teich. 2006.
"Improving system level design space exploration by incorporating sat-solvers
into multi-objective evolutionary algorithms." IEEE Computer Society Annual
Symposium on Emerging VLSI Technologies and Architectures (ISVLSI'06).
Karlsruhe: IEEE. 6 pp.
Schulz, Stephan, Jerzy W. Rozenblit, Michael Mrva, and Klaus Buchenrieder. 1998.
"Model-Based Codesign." IEEE Computer 60-67.
Shah, A. A., A. A. Kerzhner, D. Schaefer, and C. J.J. Paredis. 2010. Multi-View Modeling
to Support Embedded Systems Engineering in SysML. Vol. 5765, in Graph
Transformations and Model-Driven Engineering. Lecture Notes in Computer
Science, edited by G. Engels, C. Lewerentz, W. Schafer, A. Schurr and B.
Westfechtel, 580-601. Berlin, Heidelberg: Springer.
doi:10.1007/978-3-642-17322-6_25.
Society of Automotive Engineers. 2014. Processes for Engineering a System.
Warrendale: SAE International.
SOURCEFORGE. n.d. Open SystemC Initiative (OSCI). Slashdot Media. Accessed
December 28, 2017. https://sourceforge.net/p/systemc/wiki/Home.
Teich, Jurgen. 2012. "Hardware/Software Codesign: The Past, the Present, and Predicting
the Future." Proceedings of the IEEE (IEEE) 100 (Special Centennial Issue):
1411-1430. doi:10.1109/JPROC.2011.2182009.
Thomasian, Alexander, and Paul F. Bay. 1986. "Analytic Queueing Network Models for
Parallel Processing of Task Systems." IEEE Transactions on Computers (IEEE)
C-35 (12): 1045-1054. doi:10.1109/TC.1986.1676712.
Tikir, Mustafa M, Laura Carrington, Erich Strohmaier, and Allan Snavely. 2007. "A
genetic algorithm approach to modeling the performance of memory-bound
computations." ACM/IEEE Conference on Supercomputing (SC '07). Reno, NV:
IEEE. doi:10.1145/1362622.1362686.
Van Werkhoven, B., J. Maassen, F. J. Seinstra, and H. E. Bal. 2014. "Performance
Models for CPU-GPU Data Transfers." 14th IEEE/ACM International Symposium
on Cluster, Cloud, and Grid Computing (CCGrid). Chicago, IL: IEEE. 11-20.
doi:10.1109/CCGrid.2014.16.
Vitech. 2013. "CORE 9 Unlocking the Power of MBSE - Product Slick." Accessed
October 15,
2017. www.vitechcorp.com/products/files/core4pageslick.pdf.
Wilhelm, Reinhard, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing,
David Whalley, Guillem Bernat, et al. 2008. "The worst-case execution-time
problem -- overview of methods and survey of tools." ACM Transactions on
Embedded Computing Systems (TECS) (ACM) 7 (3): 36-53.
doi:10.1145/1347375.1347389.
Williams, Samuel, Andrew Waterman, and David Patterson. 2009. "Roofline: an
Insightful Visual Performance Model for Multicore Architectures."
Communications of the ACM (ACM) 52 (4): 65-76.
doi:10.1145/1498765.1498785.
Wolf, Wayne. 2003. "A decade of hardware/software codesign." Computer (IEEE
Computer Society) 36 (4): 38-43. doi:10.1109/MC.2003.1193227.
Wolf, Wayne, Ahmed Amine Jerraya, and Grant Martin. 2008. "Multiprocessor System-
on-Chip (MPSoC) Technology." IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems 27 (10): 1701-1713.
Wu, Jigang, and Thambipillai Srikanthan. 2006. "Low-complex dynamic programming
algorithm for hardware/software partitioning." Information Processing Letters
(Elsevier) 98 (2): 41-46. doi:10.1016/j.ipl.2005.12.008.
Wu, Jigang, Pu Wang, Siew-Kei Lam, and Thambipillai Srikanthan. 2013. "Efficient
heuristic and tabu search for hardware/software partitioning." The Journal of
Supercomputing 66 (1): 118-134.
Wu, Jigang, Qiqiang Sun, and Thambipillai Srikanthan. 2012. "Algorithmic aspects for
multiple-choice hardware/software partitioning." Computers & Operations
Research (Elsevier) 39 (12): 3281-3292. doi:10.1016/j.cor.2012.04.013.
Yu-dong, Zhang, Wu Le-nan, and Wei Geng. 2009. "Hardware/software partition using
adaptive ant colony algorithm." Control and Decision 24 (9): 1385-1389.
Zhang, Yao, and John D. Owens. 2011. "A Quantitative Performance Analysis Model for
GPU Architectures." 17th International Symposium on High Performance
Computer Architecture (HPCA). San Antonio, TX: IEEE. 382-393.
doi:10.1109/HPCA.2011.5749745.
Zitzler, Eckart, Marco Laumanns, and Lothar Thiele. 2001. SPEA2: Improving the
Strength Pareto Evolutionary Algorithm. TIK-Report 103, Computer Engineering
and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH),
Zurich, 21 pp.
Chapter 7 - COPYRIGHTS
A World in Motion – Systems Engineering Vision 2025 (INCOSE)
This product was prepared by the Systems Engineering Vision 2025 Project Team of the
International Council on Systems Engineering (INCOSE). It is approved by the INCOSE
Technical Operations for release as an INCOSE Technical Product.
Copyright ©2014 by INCOSE, subject to the following restrictions:
Author use: Authors have full rights to use their contributions in a totally unfettered way
with credit to the INCOSE Technical Product.
INCOSE use: Permission to reproduce this document and to prepare derivative works
from this document for INCOSE use is granted provided this copyright notice is included
with all reproductions and derivative works.
External Use: This document may be shared or distributed to non-INCOSE third parties.
Requests for permission to reproduce this document in whole are granted provided it is
not altered in any way.
Extracts for use in other works are permitted provided this copyright notice and INCOSE
attribution are included with all reproductions; and, all uses including derivative works
and commercial use, acquire additional permission for use of images unless indicated as a
public image in the General Domain.
Requests for permission to prepare derivative works of this document or any for
commercial use will be denied unless covered by other formal agreements with INCOSE.
Contact INCOSE Administration Office, 7670 Opportunity Rd., Suite 220, San Diego,
CA 92111-2222, USA.
Appendix A Oversized Figures
This appendix contains figures referenced in the main document that are best represented in landscape orientation.
Figure A-1 Architecture Analysis Container IBD
[Oversized diagram: ibd [Block] CMP_ArchitectureAnalysis [IBD_CMP_ArchitectureAnalysis]. The internal block diagram connects the OptimizationAnalysisPart1, SimulationAnalysisPart1, StartArchitectureAnalysisPart1, CMP_PhysicalArchitectureContainerPart1, and CMP_LogicalArchitectureContainerPart1 parts, together with supporting mode, state, weight, and constraint parts (OptimizationModePart1, SimulationModePart1, AnalysisModePart1, AnalysisStatePart1, AttributeWeightPart1, ThreadWeightPart1, ThreadConstraintsPart1, CMP_NumberPhysicalArchitecturesPart1, CMP_NumberLogicalArchitecturesPart1), through typed interface blocks (IB_*) on proxy ports. The full diagram is not reproducible in text form.]
Figure A-2 StartArchitectureAnalysisBlock Activity and AD
[Oversized diagram: ibd [Block] StartArchitectureAnalysisBlock [IBD_StartArchitectureAnalysis], containing the ACT_StartArchitectureAnalysis activity. On Initial_Activity_Entry the activity reads IN_NumberLogicalArchitectures = numlogarchsin.getNumberLogicalArchitectures() and sends StartArchitectureAnalysis to startanalout. On each Subsequent_Activity_Entry, triggered by PhysicalArchitectureResultsAvailable, the guard [LOC_CurrentLogicalArchitecture < IN_NumberLogicalArchitectures] selects the path that executes LOC_CurrentLogicalArchitecture++; logarchconout.incrementCurrentLogicalArchitecture(); and again sends StartArchitectureAnalysis to startanalout; the [else] branch ends the activity. The full diagram is not reproducible in text form.]
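The control flow that Figure A-2 encodes can be sketched in plain Python. This is an illustrative model only, not the dissertation's implementation: the class name, method names, and the `events` list standing in for the `startanalout` port are all hypothetical; the SysML activity itself carries this logic in the model.

```python
# Illustrative sketch of the ACT_StartArchitectureAnalysis control flow.
# All names here are hypothetical stand-ins for diagram elements.
class StartArchitectureAnalysisSketch:
    def __init__(self, number_logical_architectures):
        # Corresponds to IN_NumberLogicalArchitectures in the diagram.
        self.total = number_logical_architectures
        # Corresponds to LOC_CurrentLogicalArchitecture.
        self.current = 1
        # Stands in for events sent to the startanalout port.
        self.events = []

    def initial_entry(self):
        # Initial_Activity_Entry: kick off analysis of the first candidate.
        self.events.append(("StartArchitectureAnalysis", self.current))

    def on_physical_results_available(self):
        # Subsequent_Activity_Entry, triggered by
        # PhysicalArchitectureResultsAvailable.
        if self.current < self.total:
            # Guard branch: advance to the next logical architecture
            # and start its analysis.
            self.current += 1
            self.events.append(("StartArchitectureAnalysis", self.current))
            return True
        return False  # [else] branch: every candidate has been analyzed.
```

Driving the sketch with four candidate architectures produces one start event per candidate, matching the loop the activity diagram describes.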
Figure A-3 LogArch Container IBD
[Oversized diagram: ibd [Block] CMP_LogicalArchitectureContainer [IBD_CMP_LogArch_Container]. The container holds four candidate logical architecture parts (CMP_Candidate_1_LogicalArchitecturePart1 through CMP_Candidate_4_LogicalArchitecturePart1) and a CMP_LogArch_ExecutionControlPart1 part, connected through the IB_LogArch*_StartRetrieveArchAttrs and IB_LogArch*_ArchAttrResultsAvailable event interfaces plus functional-thread and attribute-parameter interface blocks (AlgorithmPerformanceInterfaceBlock, IB_AttrEnergyParams, IB_AttrThermalParams). A diagram note states: "The model currently supports 4 logical architectures. This enables the tradeoff of four logical architectures. The model can be extended if more are required." The full diagram is not reproducible in text form.]
Figure A-4 LogArch Container Perform Computations Execution Control AD
act [Block] CMP_LogArch_ExecutionControl [ACT_LogArchExecutionControl]
[Diagram text: StartEventProcessing receives the StartRetrieveLogicalArchitectureAttributes event, sets its parameters (LogArchNum, ThrNum, FuncNum, AttrTyp), and dispatches on LOC_LogicalArchitectureCommand: values 1 through 4 send LogArchOne_ through LogArchFour_StartRetrieveArchAttrs(ThrNum, FuncNum, AttrTyp) to the corresponding candidate port; any other value sets LOC_ErrorFlag. ResultsAvailableProcessing receives a candidate's ArchAttrResultsAvailable event and checks the logical-architecture number, thread number, function number, and attribute type; when LOC_LogicalArchitectureMatch is true it sets LOC_LogicalArchitectureStatus (1 through 4) and sends ArchitectureAttributeResultsAvailable(LogArch, ThrNum, FuncNum, AttrTyp) to rsltarchattro, otherwise it sets LOC_ErrorFlag.]
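The control flow in Figure A-4 amounts to a command-based dispatcher: a start event carries a logical-architecture number that selects which candidate receives the retrieval request, and returning results are validated against the same numbering before being propagated upward. A minimal Python sketch of that pattern follows; all names are illustrative, not taken from the SysML model, which dispatches via event ports rather than method calls.

```python
# Sketch of the Figure A-4 dispatch pattern: route a start event to one of
# four candidate logical architectures and validate the returned results.
# Illustrative only; the SysML model uses send-signal actions on event ports.

NUM_LOG_ARCHS = 4  # the model currently supports four candidates


class ExecutionControl:
    def __init__(self):
        self.error_flag = False   # LOC_ErrorFlag
        self.status = None        # LOC_LogicalArchitectureStatus

    def start_event(self, log_arch, thr_num, func_num, attr_typ):
        """StartEventProcessing: dispatch to the selected candidate."""
        if 1 <= log_arch <= NUM_LOG_ARCHS:
            # stands in for LogArch<N>_StartRetrieveArchAttrs(...)
            return ("start_retrieve", log_arch, thr_num, func_num, attr_typ)
        self.error_flag = True  # no matching candidate
        return None

    def results_event(self, log_arch, thr_num, func_num, attr_typ):
        """ResultsAvailableProcessing: validate and propagate results."""
        if 1 <= log_arch <= NUM_LOG_ARCHS:
            self.status = log_arch
            # stands in for ArchitectureAttributeResultsAvailable(...)
            return ("results_available", log_arch, thr_num, func_num, attr_typ)
        self.error_flag = True
        return None
```

The same pattern recurs one level down at the thread (Figure A-6) and function (Figure A-8) tiers, with the command variable narrowing to a thread or function number.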
Figure A-5 LogArch One Functional Thread One Architecture IBD
ibd [Block] CMP_Candidate_1_LogicalArchitecture [IBD_LogArchOne]
[Diagram text: CMP_Candidate_1_LogicalArchitecture contains parts CMP_LogArchOne_ThreadOne through CMP_LogArchOne_ThreadFour. Each thread part has an InputDataSize input (fed by a corresponding CMP_LogArchOneThr*_DataInputSize part), StartRetrieveArchAttrs and ArchAttrResultsAvailable event ports, attribute-parameter outputs (AlgorithmPerformanceInterfaceBlock, IB_AttrEnergyParams, IB_AttrThermalParams), and a number-of-functions output. CMP_LogArchOne_ExecutionControl connects the architecture-level start and results ports to the four thread parts. Note: The model currently supports four functional threads per logical architecture and can be extended if more are required.]
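The containment hierarchy in Figures A-5 and A-7 — a logical architecture holding four functional threads, each holding four functions — can be represented as nested records, and parameterizing the fan-out is one way the stated four-thread, four-function limit could be extended. A hypothetical sketch (these class and function names are not part of the model):

```python
# Hypothetical sketch of the logical-architecture containment hierarchy:
# architecture -> functional threads -> functions, with configurable fan-out.
from dataclasses import dataclass, field


@dataclass
class Function:
    """Leaf element; holds per-attribute results (Figure A-9 scope)."""
    attrs: dict = field(default_factory=dict)  # e.g. {"Performance": ...}


@dataclass
class FunctionalThread:
    input_data_size: int = 0
    functions: list = field(default_factory=list)


@dataclass
class LogicalArchitecture:
    threads: list = field(default_factory=list)


def build_logical_architecture(num_threads=4, funcs_per_thread=4):
    """Build the default 4x4 hierarchy; larger values extend the model."""
    return LogicalArchitecture(
        threads=[
            FunctionalThread(
                functions=[Function() for _ in range(funcs_per_thread)])
            for _ in range(num_threads)
        ])
```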
Figure A-6 LogArch Functional Threads Perform Computations Exec Ctl AD
act [Block] CMP_LogArchOne_ExecutionControl [ACT_LogArchOneExecutionControl]
[Diagram text: StartEventProcessing receives LogArchOne_StartRetrieveArchAttrs, sets its parameters (ThrNum, FuncNum, AttrTyp), and dispatches on LOC_ThreadNumberCommand: values 1 through 4 send LogArchOneThrOne_ through LogArchOneThrFour_StartRetrieveArchAttrs(FuncNum, AttrTyp) to the corresponding thread port; any other value sets LOC_ErrorFlag. ResultsAvailableProcessing receives a thread's ArchAttrResultsAvailable event and checks the thread number, function number, and attribute type; when LOC_ThreadNumberMatch is true it sets LOC_ThreadNumberStatus (1 through 4) and sends LogArchOne_ArchAttrResultsAvailable(ThrNum, FuncNum, AttrTyp) to rsltlogarchonearchattro, otherwise it sets LOC_ErrorFlag.]
Figure A-7 LogArch One Functional Thread One Architecture IBD
ibd [Block] CMP_LogArchOne_ThreadOne [IBD_LogArchOne_ThreadOne]
[Diagram text: CMP_LogArchOne_ThreadOne contains parts CMP_LogArchOne_ThreadOne_FunctionOne through CMP_LogArchOne_ThreadOne_FunctionFour. Each function part receives the thread's InputDataSize and has StartRetrieveArchAttrs and ArchAttrResultsAvailable event ports and attribute-parameter outputs. CMP_LogArchOneThrOne_ExecutionControl connects the thread-level start and results ports to the four function parts. Note: The model currently supports four functions per thread and can be extended if more are required.]
Figure A-8 Functional Thread Functions Perform Computations Exec Ctl AD
act [Block] CMP_LogArchOneThrOne_ExecutionControl [ACT_LogArchOneThrOne_ExecutionControl]
[Diagram text: StartEventProcessing receives LogArchOneThrOne_StartRetrieveArchAttrs, sets its parameters (FuncNum, AttrTyp), and dispatches on LOC_FunctionNumberCommand: values 1 through 4 send LogArchOneThrOneFuncOne_ through LogArchOneThrOneFuncFour_StartRetrieveArchAttrs(AttrTyp) to the corresponding function port; any other value sets LOC_ErrorFlag. ResultsAvailableProcessing receives a function's ArchAttrResultsAvailable event and checks the function number and attribute type; when LOC_FunctionNumberMatch is true it sets LOC_FunctionNumberStatus (1 through 4) and sends LogArchOneThrOne_ArchAttrResultsAvailable(FuncNum, AttrTyp) to rsltlogarchonethronearchattro, otherwise it sets LOC_ErrorFlag.]
Figure A-9 Architecture Attributes Per Function IBD
ibd [Block] CMP_LogArchOne_ThreadOne_FunctionOne [IBD_LogArchOneThrOneFuncOne]
[Diagram text: CMP_LogArchOne_ThreadOne_FunctionOne contains per-attribute parts CMP_LogArchOneThrOneFuncOne_PerfAttr, CMP_LogArchOneThrOneFuncOne_EnergyAttr, and CMP_LogArchOneThrOneFuncOne_ThermAttr, each with attribute-specific start and results event ports and parameter outputs (AlgorithmPerformanceInterfaceBlock, AlgorithmEnergyInterfaceBlock, AlgorithmThermalInterfaceBlock). CMP_LogArchOneThrOneFuncOne_ExecutionControl connects the function-level start and results ports to the three attribute parts. Note: This IBD currently supports only the Performance attribute; subsequent versions will add support for the Energy and Thermal attributes, and the model also requires the addition of a function that computes the performance computations required for this function.]
Figure A-10 Architecture Attributes Per Function Execution Control AD
act [Block] CMP_LogArchOneThrOneFuncOne_ExecutionControl [ACT_LogArchOneThrOneFuncOne_ExecutionControl]
[Diagram text: InitiateAttributeRetrieval receives LogArchOneThrOneFuncOne_StartRetrieveArchAttrs, sets AttrTyp, and dispatches on LOC_AttributeType: Performance, Energy, or Thermal sends the corresponding StartRetrievePerfAttrs, StartRetrieveEnerAttrs, or StartRetrieveThermAttrs event; any other value sets LOC_ErrorFlag. PropagateAttributeResults receives a PerfAttr, EnerAttr, or ThermAttr ResultsAvailable event and checks the attribute type; when LOC_AttributeTypeMatch is true it sets LOC_AttributeTypeStatus (Performance, Energy, or Thermal) and sends LogArchOneThrOneFuncOne_ArchAttrResultsAvailable(AttrTyp) to rsltlogarchonethronefunconearchattro, otherwise it sets LOC_ErrorFlag.]
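At the bottom of the hierarchy, Figure A-10 dispatches on the attribute type rather than an element number: the attribute type selects which of the three retrieval paths to start. A minimal sketch of that selection, with illustrative names (the model itself uses SysML events, not a lookup table):

```python
# Sketch of the Figure A-10 pattern: per-function execution control selects
# the Performance, Energy, or Thermal retrieval path from the attribute type.
# Illustrative names only.
from enum import Enum


class AttrType(Enum):
    PERFORMANCE = 1
    ENERGY = 2
    THERMAL = 3


# Maps each attribute type to the start event it triggers.
START_EVENTS = {
    AttrType.PERFORMANCE: "StartRetrievePerfAttrs",
    AttrType.ENERGY: "StartRetrieveEnerAttrs",
    AttrType.THERMAL: "StartRetrieveThermAttrs",
}


def initiate_attribute_retrieval(attr_typ):
    """InitiateAttributeRetrieval: pick the attribute-specific start event,
    or flag an error for an unknown attribute type."""
    event = START_EVENTS.get(attr_typ)
    if event is None:
        return {"error_flag": True}
    return {"event": event, "error_flag": False}
```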
Figure A-11 Performance Attribute Per Function Execution Control AD
act [Block] CMP_LogArchOne_ThreadOne_FunctionOne_PerformanceAttrBlock[ACT_LogArchOneThrOneFuncOnePerformanceAttribute]
LogArchOneThrOneFuncOne_StartRetrievePerfAttrs
setPerformanceComputationValues
LogArchOneThrOneFuncOne_PerfAttrResultsAvailable to rsltlogarchonethronefunconeperfattro
Figure A-12 PhyArch Container IBD (Three Candidates)
ibd [Block] CMP_PhysicalArchitectureContainer [IBD_CMP_PhysArch_Container]
[Diagram text: the physical architecture container exposes logical-architecture attribute ports (start and results events, attribute parameters, thread and functions-per-thread counts), analysis ports (IB_ThreadConstraints, IB_ThreadWeight, IB_AttributeWeight, IB_NumberPhysArchs, IB_StartAnalysisEvent, IB_AnalysisMode, IB_AnalysisState), simulation ports (IB_SimControl, IB_SimulationData, IB_SimMode), and optimization ports (IB_OptimControl, IB_DagThreadData, IB_DagFunctionData, IB_OptimMode). Parts CMP_Candidate_1_PhysicalArchitecture through CMP_Candidate_3_PhysicalArchitecture replicate these interfaces, and CMP_PhysArch_ExecutionControl sequences them through IB_StartPhysArchOne/Two/Three and IB_PhysArchOne/Two/ThreeResultsAvailable, reporting completion on IB_PhysArchResultsAvailable. Note: The model supports four physical architecture containers; three are shown in this diagram.]
Figure A-13 PhyArch Container Execution Control AD
act [Block] CMP_PhysArch_ExecutionControl[ACT_PhysArch_ExecutionControl]
FinishControl
PhysArchThreeResultsAvailable
LOC_CurrentPhysArch = 0;
PhysArchResultsAvailable to rsltphysarchout
PhysicalArchitectureThreeControl
PhysArchTwoResultsAvailable
StartPhysicalArchitecture_3 to strtphysarchthree
LOC_CurrentPhysArch++;
DN_2[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureTwoControl
PhysArchOneResultsAvailable
StartPhysicalArchitecture_2 to strtphysarchtwo
LOC_CurrentPhysArch++;
DN_1[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureOneControl
StartArchitectureAnalysis
StartPhysicalArchitecture_1 to strtphysarchone
IN_NumPhysArchs =
numphysarchsin.getNumberPhysicalArchitectures();
FinishControl
PhysArchThreeResultsAvailable
LOC_CurrentPhysArch = 0;
PhysArchResultsAvailable to rsltphysarchout
PhysicalArchitectureThreeControl
PhysArchTwoResultsAvailable
StartPhysicalArchitecture_3 to strtphysarchthree
LOC_CurrentPhysArch++;
DN_2[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureTwoControl
PhysArchOneResultsAvailable
StartPhysicalArchitecture_2 to strtphysarchtwo
LOC_CurrentPhysArch++;
DN_1[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureOneControl
StartArchitectureAnalysis
StartPhysicalArchitecture_1 to strtphysarchone
IN_NumPhysArchs =
numphysarchsin.getNumberPhysicalArchitectures();
FinishControl
PhysArchThreeResultsAvailable
LOC_CurrentPhysArch = 0;
PhysArchResultsAvailable to rsltphysarchout
PhysicalArchitectureThreeControl
PhysArchTwoResultsAvailable
StartPhysicalArchitecture_3 to strtphysarchthree
LOC_CurrentPhysArch++;
DN_2[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureTwoControl
PhysArchOneResultsAvailable
StartPhysicalArchitecture_2 to strtphysarchtwo
LOC_CurrentPhysArch++;
DN_1[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureOneControl
StartArchitectureAnalysis
StartPhysicalArchitecture_1 to strtphysarchone
IN_NumPhysArchs =
numphysarchsin.getNumberPhysicalArchitectures();
FinishControl
PhysArchThreeResultsAvailable
LOC_CurrentPhysArch = 0;
PhysArchResultsAvailable to rsltphysarchout
PhysicalArchitectureThreeControl
PhysArchTwoResultsAvailable
StartPhysicalArchitecture_3 to strtphysarchthree
LOC_CurrentPhysArch++;
DN_2[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureTwoControl
PhysArchOneResultsAvailable
StartPhysicalArchitecture_2 to strtphysarchtwo
LOC_CurrentPhysArch++;
DN_1[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureOneControl
StartArchitectureAnalysis
StartPhysicalArchitecture_1 to strtphysarchone
IN_NumPhysArchs =
numphysarchsin.getNumberPhysicalArchitectures();
FinishControl
PhysArchThreeResultsAvailable
LOC_CurrentPhysArch = 0;
PhysArchResultsAvailable to rsltphysarchout
PhysicalArchitectureThreeControl
PhysArchTwoResultsAvailable
StartPhysicalArchitecture_3 to strtphysarchthree
LOC_CurrentPhysArch++;
DN_2[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureTwoControl
PhysArchOneResultsAvailable
StartPhysicalArchitecture_2 to strtphysarchtwo
LOC_CurrentPhysArch++;
DN_1[LOC_CurrentPhysArch < IN_NumPhysArchs]
PhysArchResultsAvailable to rsltphysarchout
[else]
PhysicalArchitectureOneControl
StartArchitectureAnalysis
StartPhysicalArchitecture_1 to strtphysarchone
IN_NumPhysArchs =
numphysarchsin.getNumberPhysicalArchitectures();
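The execution-control activity above is a counter-driven dispatch loop. As a hedged illustration only, the following Python sketch mirrors that logic; the counter and guard come from the diagram's action language, while the driver function and its callable parameters (`start_architecture`, `publish_results`) are hypothetical stand-ins for the diagram's start and results-available events:

```python
def run_physical_architectures(num_phys_archs, start_architecture, publish_results):
    """Sequential sketch of ACT_PhysArch_ExecutionControl.

    num_phys_archs: value of IN_NumPhysArchs
    start_architecture(i): stands in for the StartPhysicalArchitecture_<i>
        event; it returns once the matching results-available event arrives.
    publish_results(): stands in for PhysArchResultsAvailable to rsltphysarchout.
    """
    current = 0                        # LOC_CurrentPhysArch = 0;
    start_architecture(current + 1)    # StartPhysicalArchitecture_1 always fires
    current += 1                       # LOC_CurrentPhysArch++;
    while current < num_phys_archs:    # DN_n guard [LOC_CurrentPhysArch < IN_NumPhysArchs]
        start_architecture(current + 1)
        current += 1
    publish_results()                  # [else] branch, then FinishControl
```

The same pattern repeats at the thread level (Figure A-15) and the function level (Figure A-17), with only the counter, limit, and event names changing.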
Figure A-14 PhyArch One Container IBD (Three Threads Shown)
ibd [Block] CMP_Candidate_1_PhysicalArchitecture [IBD_Cand_1_PhysicalContainer]
[Internal block diagram: the container holds hardware parameter parts (PhysArchOne_CpuClockPart1, PhysArchOne_GpuClockBlock1, PhysArchOne_NumberCpus1, PhysArchOne_NumberGpuThreads1), thread parts CMP_PhysArchOne_ThreadOnePart1 through CMP_PhysArchOne_ThreadThreePart1, the DAG part CMP_PhysArchOne_DagThreadsPart1, and the execution control part CMP_PhysArchOne_ExecutionControl1, connected via interface-block ports for start/result events, thread simulation and cost data, attribute parameters and weights, and CPU/GPU clock-ratio and count values.]
The model provides four threads. Three are shown in this diagram.
Figure A-15 PhyArch One Container Execution Control AD
act [Block] CMP_PhysArchOne_ExecutionControl [ACT_PhysArchOneContainer]
[Activity diagram: on StartPhysicalArchitecture_1 the activity sets LOC_CurrentLogicalArch = logarchconout.getCurrentLogicalArchitecture(), IN_PhysArchOneNumberThreads = determineNumberThreads(), and LOC_CurrentPhysArchOneThread = 0, then sends StartPhysArchOneThreadOne to strtphysarchonethroneout. As each PhysArchOneThr<n>ResultsAvailable event arrives, it executes LOC_CurrentPhysArchOneThread++ and, while the counter is below IN_PhysArchOneNumberThreads, starts the next thread; otherwise it sends PhysArchOneResultsAvailable to rsltphysarchone and reaches FinishThreadControl.]
Figure A-16 PhyArch One Container Thread One IBD (Three Functions Shown)
Figure A-17 PhyArch One Thread One Container Execution Control AD
act [Block] CMP_PhysArchOneThrOne_ExecutionControl [ACT_PhysArchOneThreadOneExecutionContainer]
[Activity diagram: on StartPhysArchOneThreadOne the activity sets LOC_CurrentLogicalArch = logarchconout.getCurrentLogicalArchitecture(), IN_PhysArchOneThreadOneNumberFunctions = determineNumberThreadFunctions(), and LOC_CurrentPhysArchOneThreadOneFunction = 0, then sends StartPhysArchOneThreadOneFunctionOne to strtphysarchonethronefunconeout. As each PhysArchOneThrOneFunc<n>ResultsAvailable event arrives, it increments the counter and, while it remains below IN_PhysArchOneThreadOneNumberFunctions, starts the next function; otherwise it sends PhysArchOneThrOneResultsAvailable to rsltphysarchonethroneout and reaches FinishThreadFunctionControl.]
Figure A-18 PhyArch One Container Thread One Function One IBD
ibd [Block] CMP_PhysArchOne_ThreadOne_FunctionOne [IBD_PhysArchOneThrOneFuncOne]
[Internal block diagram: the function container holds the execution control part CMP_PhysArchOneThrOneFuncOne_ExecutionControlPart1, the physical attribute computation part CMP_FunctionPhysicalContainerPart1, the identifier part CMP_PhysArchOneThrOneFuncOne_IdPart1, and the interface part CMP_PhysArchOneThrOneFuncOne_InterfacePart1, connected via interface-block ports for start/result events, attribute parameters and weights, function simulation and cost data, and CPU/GPU clock-ratio and count values.]
This model is replicated for each function of each thread.
Figure A-19 PhyArch One Thread One Function One Container Exec Ctl AD
act [Block] CMP_PhysArchOneThrOneFuncOne_ExecutionControlBlock [ACT_PhysArchOneThrOneFuncOneExecutionControl]
[Activity diagram: in InitialState the activity sets LOC_ThrOneFuncOneThreadNumber = thronefunconeidi.getThreadNumber() and LOC_ThrOneFuncOneFunctionNumber = thronefunconeidi.getFunctionNumber(). On StartPhysArchOneThreadOneFunctionOne it sends StartFuncPhysAttributeComputations(ThrNum, FuncNum) to strtfuncphysattrcompout (ProcessFunctionAttributes); on RsltPhysArchOneThrOneFuncOneComplete it sends PhysArchOneThrOneFuncOneResultsAvailable to rsltphysarchonethronefunconeout (CompleteFunction).]
Figure A-20 PhyArch One Thread One Function One Container Interface AD
act [Block] CMP_PhysArchOneThrOneFuncOne_InterfaceBlock [ACT_PhysArchOneThrOneFuncOneInterface]
[Activity diagram: in InitialState the activity sets LOC_ThrOneFuncOneThrNum = thronefunconeidi.getThreadNumber() and LOC_ThrOneFuncOneFuncNum = thronefunconeidi.getFunctionNumber(). On FuncPhysAttributeComputationResultsAvailable(ThrNum, FuncNum) it calls setReceivedThreadFunctionValues; when DN_1[LOC_ReceivedMatch == True] holds, it sets LOC_ReceivedMatch = False, executes LOC_ReceivedCount++, calls retrieveFuncPhysAttributeValues and computeFunctionOneCosts, and sends RsltPhysArchOneThrOneFuncOneComplete to rsltpaonethronefunconecompo (ProcessFunctionAttributes); otherwise it waits for the next event.]
Figure A-21 Function Physical Attribute Computation Container IBD
ibd [Block] CMP_FunctionPhysicalContainerBlock [IBD_FunctionPhysicalContainer]
[Internal block diagram: the container holds the energy, performance, and thermal computation parts (CMP_FuncPhys_EnergyContainerPart1, CMP_FuncPhys_PerformanceContainerPart1, CMP_FuncPhys_ThermalContainerPart1) and the attribute execution control part CMP_FuncPhys_AttributeExecutionControlPart1, connected via interface-block ports for start/result events, attribute algorithm parameters (AlgorithmPerformanceInterfaceBlock, AlgorithmEnergyInterfaceBlock, AlgorithmThermalInterfaceBlock), attribute data, and CPU/GPU clock-ratio and count values.]
The current model supports energy, performance, and thermal attributes. The model can be expanded to support other attributes (e.g. reliability, risk, etc.).
166
Figure A-22 Function Physical Attribute Comp Container Execution Control AD
act [Block] CMP_FunctionPhysicalAttributeExecutionControlBlock [ACT_FuncPhysAttributeExecutionControl]
ProcessAttributeComputationsAvailableProcessing
ThermalComputationResultsAvailable to rsltthermcompo
EnergyComputationResultsAvailable to rsltenercompo
PerformanceComputationResultsAvailable to rsltperfcompo
setResultsEventParams
AttrTypLogArch FuncNumThrNum
ArchitectureAttributeResultsAvailable
AttrTypFuncNumThrNumLogArch
checkCommandStatusParams
DN_4[LOC_AttributeTypeStatus == Thermal]
DN_3[LOC_AttributeTypeStatus == Energy]
[else]
DN_2[LOC_AttributeTypeStatus == Performance]
[else]
DN_1
[LOC_ParamMatch == true]LOC_ErrorFlag = true;
[else]
LOC_ErrorFlag = true;
[else]
FinishAttributeComputationProcessing
FuncPhysAttributeComputationResultsAvailable(ThrNum, FuncNum) to rsltfuncphysattrcompo
FuncNumThrNum
CMP_FuncPhys_ThermalResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartThermalAttributeComputations
StartFuncPhysThermalAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtthermattrcompo
FuncNumThrNum LogArchNum
getLogicalArchitectureNumber
RETURN
CMP_FuncPhys_EnergyResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartEnergyAttributeComputations
StartFuncPhysEnergyAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtenerattrcompo
FuncNumThrNum LogArchNum
CMP_FuncPhys_PerformanceResultsAvailable
getLogicalArchitectureNumber
RETURN
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartPerformanceAttributeComputations
StartFuncPhysPerformanceAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtperfattrcompo
LogArchNumFuncNumThrNum
StartFuncPhysAttributeComputations
FuncNumThrNum
getLogicalArchitectureNumber
RETURN
setLogicalArchitectureNumber
setStartEventParams
FuncNumThrNum
getFunctionNumber
RETURN
getThreadNumber
RETURN
ProcessAttributeComputationsAvailableProcessing
ThermalComputationResultsAvailable to rsltthermcompo
EnergyComputationResultsAvailable to rsltenercompo
PerformanceComputationResultsAvailable to rsltperfcompo
setResultsEventParams
AttrTypLogArch FuncNumThrNum
ArchitectureAttributeResultsAvailable
AttrTypFuncNumThrNumLogArch
checkCommandStatusParams
DN_4[LOC_AttributeTypeStatus == Thermal]
DN_3[LOC_AttributeTypeStatus == Energy]
[else]
DN_2[LOC_AttributeTypeStatus == Performance]
[else]
DN_1
[LOC_ParamMatch == true]LOC_ErrorFlag = true;
[else]
LOC_ErrorFlag = true;
[else]
FinishAttributeComputationProcessing
FuncPhysAttributeComputationResultsAvailable(ThrNum, FuncNum) to rsltfuncphysattrcompo
FuncNumThrNum
CMP_FuncPhys_ThermalResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartThermalAttributeComputations
StartFuncPhysThermalAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtthermattrcompo
FuncNumThrNum LogArchNum
getLogicalArchitectureNumber
RETURN
CMP_FuncPhys_EnergyResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartEnergyAttributeComputations
StartFuncPhysEnergyAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtenerattrcompo
FuncNumThrNum LogArchNum
CMP_FuncPhys_PerformanceResultsAvailable
getLogicalArchitectureNumber
RETURN
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartPerformanceAttributeComputations
StartFuncPhysPerformanceAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtperfattrcompo
LogArchNumFuncNumThrNum
StartFuncPhysAttributeComputations
FuncNumThrNum
getLogicalArchitectureNumber
RETURN
setLogicalArchitectureNumber
setStartEventParams
FuncNumThrNum
getFunctionNumber
RETURN
getThreadNumber
RETURN
ProcessAttributeComputationsAvailableProcessing
ThermalComputationResultsAvailable to rsltthermcompo
EnergyComputationResultsAvailable to rsltenercompo
PerformanceComputationResultsAvailable to rsltperfcompo
setResultsEventParams
AttrTypLogArch FuncNumThrNum
ArchitectureAttributeResultsAvailable
AttrTypFuncNumThrNumLogArch
checkCommandStatusParams
DN_4[LOC_AttributeTypeStatus == Thermal]
DN_3[LOC_AttributeTypeStatus == Energy]
[else]
DN_2[LOC_AttributeTypeStatus == Performance]
[else]
DN_1
[LOC_ParamMatch == true]LOC_ErrorFlag = true;
[else]
LOC_ErrorFlag = true;
[else]
FinishAttributeComputationProcessing
FuncPhysAttributeComputationResultsAvailable(ThrNum, FuncNum) to rsltfuncphysattrcompo
FuncNumThrNum
CMP_FuncPhys_ThermalResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartThermalAttributeComputations
StartFuncPhysThermalAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtthermattrcompo
FuncNumThrNum LogArchNum
getLogicalArchitectureNumber
RETURN
CMP_FuncPhys_EnergyResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartEnergyAttributeComputations
StartFuncPhysEnergyAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtenerattrcompo
FuncNumThrNum LogArchNum
CMP_FuncPhys_PerformanceResultsAvailable
getLogicalArchitectureNumber
RETURN
getFunctionNumber
RETURN
getThreadNumber
RETURN
StartPerformanceAttributeComputations
StartFuncPhysPerformanceAttributeComputations(LogArchNum, ThrNum, FuncNum) to strtperfattrcompo
LogArchNumFuncNumThrNum
StartFuncPhysAttributeComputations
FuncNumThrNum
getLogicalArchitectureNumber
RETURN
setLogicalArchitectureNumber
setStartEventParams
FuncNumThrNum
getFunctionNumber
RETURN
getThreadNumber
RETURN
ProcessAttributeComputationsAvailableProcessing
ThermalComputationResultsAvailable to rsltthermcompo
EnergyComputationResultsAvailable to rsltenercompo
PerformanceComputationResultsAvailable to rsltperfcompo
setResultsEventParams
AttrTypLogArch FuncNumThrNum
ArchitectureAttributeResultsAvailable
AttrTypFuncNumThrNumLogArch
checkCommandStatusParams
DN_4[LOC_AttributeTypeStatus == Thermal]
DN_3[LOC_AttributeTypeStatus == Energy]
[else]
DN_2[LOC_AttributeTypeStatus == Performance]
[else]
DN_1
[LOC_ParamMatch == true]LOC_ErrorFlag = true;
[else]
LOC_ErrorFlag = true;
[else]
FinishAttributeComputationProcessing
FuncPhysAttributeComputationResultsAvailable(ThrNum, FuncNum) to rsltfuncphysattrcompo
FuncNumThrNum
CMP_FuncPhys_ThermalResultsAvailable
getFunctionNumber
RETURN
getThreadNumber
RETURN
Figure A-23 Function Physical Performance Attribute IBD
Figure A-24 Function Physical Performance Attribute Execution Control AD
act [Block] CMP_FuncPhys_PerformanceExecutionControlBlock [ACT_FuncPhysPerformanceExecutionControl]
FinishPerformanceAttributeProcessing
CMP_FuncPhys_PerformanceResultsAvailable to rsltfuncphysperfo
CMP_FuncPhys_GpuPerformanceResultsAvailable
DBG_GpuEventsReceived = DBG_GpuEventsReceived + 1;
LOC_LogicalArchitectureNumber = 0;
LOC_ThreadNumber = 0;
LOC_FunctionNumber = 0;
LOC_NumberCpus = 0;
LOC_NumberGpuThreads = 0;
LOC_CurrentCrId = SC;
InitiateGpuPerformanceAttributeProcessing
CMP_FuncPhys_PerformanceResultsAvailable to rsltfuncphysperfo
StartFuncPhysGpuPerformanceAttributeComputations to strtgpuperfattrcompo
CMP_FuncPhys_McPerformanceResultsAvailable
setGpuComputations
DN_3[LOC_NumberGpuThreads > 0]
DBG_McEventsReceived = DBG_McEventsReceived + 1;
LOC_LogicalArchitectureNumber = 0;
LOC_ThreadNumber = 0;
LOC_FunctionNumber = 0;
LOC_NumberCpus = 0;
[else]
LOC_CurrentCrId = GPU;
InitiateMcOrGpuPerformanceAttributeProcessing
StartFuncPhysGpuPerformanceAttributeComputations to strtgpuperfattrcompo
StartFuncPhysMcPerformanceAttributeComputations to strtmcperfattrcompo
CMP_FuncPhys_PerformanceResultsAvailable to rsltfuncphysperfo
CMP_FuncPhys_ScPerformanceResultsAvailable
setGpuComputations
setMcComputations
DN_1[LOC_NumberCpus > 1]
DN_2[LOC_NumberGpuThreads > 0]
[else]
DBG_ScEventsReceived++;
LOC_LogicalArchitectureNumber = 0;
LOC_ThreadNumber = 0;
LOC_FunctionNumber = 0;
[else]
LOC_NumberCpus = numcpui.getNumberCpus();
LOC_NumberGpuThreads = numgpui.getNumberGpuThreads();
LOC_CurrentCrId = MC;
LOC_CurrentCrId = GPU;
InitiateScPerformanceAttributeProcessing
StartFuncPhysScPerformanceAttributeComputations to strtscperfattrcompo
PerformanceComputationResultsAvailable
setScComputations
LOC_CurrentCrId = SC;
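Read as plain text, the execution-control activity in Figure A-24 starts single-core (SC) performance computations first, then fans out to multi-core (MC) and GPU computations depending on the platform (LOC_NumberCpus, LOC_NumberGpuThreads). A minimal Python sketch of the DN_1/DN_2/DN_3 dispatch decisions follows; the function name and string identifiers are illustrative, not taken from the model:

```python
# Hypothetical sketch of the performance execution-control flow: start
# with single-core (SC) computations, then fan out to multi-core (MC)
# and/or GPU computations depending on the platform configuration.

def next_computation(current_cr_id, number_cpus, number_gpu_threads):
    """Return the next computation to start, or None when performance
    attribute processing can finish. Mirrors the DN_1/DN_2/DN_3
    decision nodes in the activity diagram."""
    if current_cr_id == "SC":
        if number_cpus > 1:          # DN_1: multi-core platform
            return "MC"
        if number_gpu_threads > 0:   # DN_2: GPU threads, single CPU
            return "GPU"
        return None                  # SC results alone suffice
    if current_cr_id == "MC":
        if number_gpu_threads > 0:   # DN_3: GPU computations remain
            return "GPU"
        return None
    return None                      # GPU results received: finish
```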
Figure A-25 Retrieve Performance Computations AD
act [Block] CMP_FuncPhys_PerformanceComputationsBlock [ACT_AlgorithmComputationsTestContainer]
setParams
FuncNumThrNumLogArchNum
StartRetrieveLogicalArchitectureAttributes(LogArch, ThrNum, FuncNum, AttrTyp) to strtarchattro
AttrTyp FuncNumThrNumLogArch
getLogicalArchitectureNumber
RETURN
getThreadNumber
RETURN
getFunctionNumber
RETURN
getAttributeType
RETURN
StartFuncPhysPerformanceAttributeComputations
FuncNumThrNumLogArchNum
Figure A-26 Performance Single Core CM Container IBD
Figure A-27 Single Core CM Execution Control AD
act [Block] SC_ComputationModel_ExecutionControl [ACT_SC_CM_ExecutionControl]
StartFuncPhysScPerformanceAttributeComputations
DN_1
StartScAnalysisExecutionTimeComputationEvent to stscanalexeccompo
[LOC_AnalysisState == Analysis]
StartScSimulationExecutionTimeComputationEvent to stscsimexeccompo
[else]
LOC_AnalysisState = asi.getAnalysisState();
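The execution-control activity in Figure A-27 simply forwards the start event to either the analysis or the simulation computation chain, keyed on the current analysis state. A hedged sketch of that dispatch (the `emit` callback stands in for the model's event ports, and the state strings are assumptions):

```python
def start_sc_performance_computations(analysis_state, emit):
    """Forward the start event to the analysis or simulation chain.

    analysis_state stands in for asi.getAnalysisState() in the model;
    emit is a callback used here in place of the model's event ports.
    """
    if analysis_state == "Analysis":
        emit("StartScAnalysisExecutionTimeComputationEvent")
    else:
        emit("StartScSimulationExecutionTimeComputationEvent")
```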
Figure A-28 Single Core CM Execution Time AD
act [Block] SC_CM_ExecutionTimeBlock [ACT_SC_CM_ExecutionTime]
SC_Simulation
CMP_FuncPhys_ScPerformanceResultsAvailable to rsltfuncphysscperfo
ScSimulationExecutionTimeAvailableEvent
SC_ExecutionTime =
scsimexeci.getScSimulationExecutionTime();
SC_Analysis
CMP_FuncPhys_ScPerformanceResultsAvailable to rsltfuncphysscperfo
ScAnalysisExecutionTimeAvailableEvent
SC_Execution_Time =
scanalexeci.getScAnalysisExecutionTime();
Figure A-29 Single Core CM Analysis Container IBD
ibd [Block] SC_CM_AnalysisContainer [IBD_SC_CM_AnalysisContainer]
IB_ScAnalysisExecutionTime
scanalexeco
IB_ScAnalysisExecutionTimeEvent
scanalexecevo
IB_StartScAnalysisExecutionTimeComputationEvent
stscanalexeccompi
IB_CpuClockRatio
cpuckrati
IB_AnalysisMode
ati
intcompi
floatcompi
misccompi
arctrigcompi
cmplxcompi
trigcompi
SC_AnalComputation_CmplxContainerPart1
StartCmplxExecutionTimeComputationEventInterfaceBlock
stcmplxexeccompi
CmplxExecutionTimeEventInterfaceBlock
cmplxexeci
CmplxExecutionTimeInterfaceBlock
cmplxexeco
NumberCmplxAddsInterfaceBlock, NumberCmplxDivsInterfaceBlock, NumberComplexMulsInterfaceBlock
cmplxcompi
IB_CpuClockRatio
ckrat_2
IB_AnalysisMode
ati_2
SC_AnalComputation_TrigContainerPart1
TrigExecutionTimeEventInterfaceBlock
trigexeci
TrigExecutionTimeInterfaceBlock
trigexeco
StartTrigExecutionTimeComputationEventInterfaceBlock
sttrigexeccompi
NumberCosComputationsInterfaceBlock_V1, NumberSinComputationsInterfaceBlock_V1, NumberTanComputationsInterfaceBlock_V1
trigcompi
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
SC_AnalysisExecutionTimePart1
IB_ScAnalysisExecutionTimeEvent
scanalexecevo
IntExecutionTimeInterfaceBlock
intexeci
IntExecutionTimeEventInterfaceBlock
intexeco
FloatExecutionTimeInterfaceBlock
floatexeci
FloatExecutionTimeEventInterfaceBlock
floatexeco
MiscExecutionTimeInterfaceBlock
miscexeci
MiscExecutionTimeEventInterfaceBlock
miscexeco
ArcTrigExecutionTimeInterfaceBlock
arctrigexeci
ArcTrigExecutionTimeEventInterfaceBlock
arctrigexeco
CmplxExecutionTimeInterfaceBlock
cmplxexeci
TrigExecutionTimeInterfaceBlock
trigexeci
CmplxExecutionTimeEventInterfaceBlock
cmplxexeco
IB_ScAnalysisExecutionTime
scanalexeco
TrigExecutionTimeEventInterfaceBlock
trigexeco
SC_PromulgateAnalysisExecutionTimeStartPart1
StartIntExecutionTimeComputationEventInterfaceBlock
stintexeccompi
StartFloatExecutionTimeComputationEventInterfaceBlock
stfloatexeccompi
StartMiscExecutionTimeComputationEventInterfaceBlock
stmiscexeccompi
StartArcTrigExecutionTimeComputationEventInterfaceBlock
starctrigexeccompi
StartCmplxExecutionTimeComputationEventInterfaceBlock
stcmplxexeccompi
StartTrigExecutionTimeComputationEventInterfaceBlock
sttrigexeccompi
IB_StartScAnalysisExecutionTimeComputationEvent
stscanalexeccompi
SC_AnalComputation__ArcTrigContainerPart1
ArcTrigExecutionTimeInterfaceBlock
arctrigexeco
NumberArcCosComputationsInterfaceBlock, NumberArcSinComputationsInterfaceBlock, NumberArcTanComputationsInterfaceBlock, NumberArcTanFourQuadComputationsInterfaceBlock
arctrigcompi
StartArcTrigExecutionTimeComputationEventInterfaceBlock
starctrigexeccompi
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
ArcTrigExecutionTimeEventInterfaceBlock
arctrigexeci
SC_AnalComputation_MiscContainerPart1
MiscExecutionTimeInterfaceBlock
miscexeco
MiscExecutionTimeEventInterfaceBlock
miscexeci
NumberLogComputationsInterfaceBlock, NumberSqrtComputationsInterfaceBlock
misccompi
StartMiscExecutionTimeComputationEventInterfaceBlock
stmiscexeccompi
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
SC_AnalComputation_FloatContainerPart1
NumberFloatAddsInterfaceBlock, NumberFloatDivsInterfaceBlock, NumberFloatMulsInterfaceBlock
floatcompi
FloatExecutionTimeInterfaceBlock
floatexeco
FloatExecutionTimeEventInterfaceBlock
floatexeci
StartFloatExecutionTimeComputationEventInterfaceBlock
stfloatexeccompi
IB_CpuClockRatio
ckrat_2
IB_AnalysisMode
ati_2
SC_AnalComputation_IntContainerPart1
NumberIntAddsInterfaceBlock, NumberIntDivsInterfaceBlock, NumberIntMulsInterfaceBlock
intcompi
IntExecutionTimeEventInterfaceBlock
intexeci
IntExecutionTimeInterfaceBlock
intexeco
IB_CpuClockRatio
ckrat_2
IB_AnalysisMode
ati_2
StartIntExecutionTimeComputationEventInterfaceBlock
stintexeccompi
The Single Core Computation Model currently supports Integer, Floating Point, and Complex Add/Multiply/Divide operations; Cos/Sin/Tan trigonometric operations; ArcCos/ArcSin/ArcTan/ArcTanFourQuad inverse trigonometric operations; and Log/Exp/Sqrt miscellaneous arithmetic operations. Other math operations can be added to the model. The Single Core Computation Model Simulation Container has the same structure and behavior as the Analysis Container.
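Conceptually, the container sums per-operation-class execution times, each scaled by the operation count, with the total adjusted by the CPU clock ratio. A minimal sketch of that accumulation (the operation-class names and times are placeholders, not values from the model):

```python
def sc_execution_time(op_counts, op_times, cpu_clock_ratio=1.0):
    """Estimate single-core execution time as the clock-ratio-scaled sum
    of (operation count x per-operation time) over all operation classes.

    op_counts : dict mapping an operation class (e.g. 'int_add', 'cos',
                'cmplx_mul') to the number of such operations.
    op_times  : dict mapping the same classes to a per-operation time.
    """
    return cpu_clock_ratio * sum(
        count * op_times[op] for op, count in op_counts.items()
    )
```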
Figure A-30 Single Core CM Analysis Container Propagate Start Activity
act [Block] SC_PromulgateAnalysisExecutionStartEventBlock [ACT_PromulgateExecutionStartEventBlock]
StartScAnalysisExecutionTimeComputationEvent
setStartEventReceivedFlag
StartTrigExecutionTimeComputationEvent to sttrigexeccompi
StartCmplxExecutionTimeComputationEvent to stcmplxexeccompi
StartArcTrigExecutionTimeComputationEvent to starctrigexeccompi
StartMiscExecutionTimeComputationEvent to stmiscexeccompi
StartFloatExecutionTimeComputationEvent to stfloatexeccompi
StartIntExecutionTimeComputationEvent to stintexeccompi
Figure A-31 Single Core CM Analysis Container Execution Time AD
act [Block] SC_AnalysisComputationTimeBlock [ACT_SC_ComputationTimeBlock]
TrigExecutionTimeAvailableEvent
CmplxComputationExecutionTimeAvailableEvent
computeExecutionTime
ScAnalysisExecutionTimeAvailableEvent to scanalexecevo
ArcTrigExecutionTimeAvailableEvent
MiscExecutionTimeAvailableEvent
FloatExecutionTimeAvailableEvent
IntExecutionTimeAvailableEvent
Figure A-32 Single Core CM Analysis Complex Container IBD
ibd [Block] SC_AnalComputation_CmplxContainerBlock [IBD_SC_CmplxComputationContainer]
StartCmplxExecutionTimeComputationEventInterfaceBlock
stcmplxexeccompi
CmplxExecutionTimeEventInterfaceBlock
cmplxexeci
CmplxExecutionTimeInterfaceBlock
cmplxexeco
NumberCmplxAddsInterfaceBlock, NumberCmplxDivsInterfaceBlock, NumberComplexMulsInterfaceBlock
cmplxcompi
IB_CpuClockRatio
ckrat_2
IB_AnalysisMode
ati_2
SC_AnalComputation_CmplxAddContainerPart1
ComplexAddExecutionTimeEventInterfaceBlock
cmplxaddexeci_1
StartCmplxAddExecutionTimeComputationEventInterfaceBlock
stcmplxaddexeccompi
ComplexAddExecutionTimeInterfaceBlock
cmplxaddexeco_1
NumberCmplxAddsInterfaceBlock
cmplxaddi_1
IB_CpuClockRatio
ckrat_1
IB_AnalysisMode
ati_1
SC_AnalComputation_CmplxMulContainerPart1
ComplexMulExecutionTimeEventInterfaceBlock
cmplxmulexeci_1
StartCmplxMulExecutionTimeComputationEventInterfaceBlock
stcmplxmulexeccompi
NumberComplexMulsInterfaceBlock
cmplxmulcompi_1
ComplexMulExecutionTimeInterfaceBlock
cmplxmulexeco_1
IB_CpuClockRatio
ckrat_1
IB_AnalysisMode
ati_1
SC_AnalComputation_CmplxDivContainerPart1
ComplexDivExecutionTimeEventInterfaceBlock
cmplxdivexeci_1
StartCmplxDivExecutionTimeComputationEventInterfaceBlock
stcmplxdivexeccompi
NumberCmplxDivsInterfaceBlock
cmplxdivcompi_1
ComplexDivExecutionTimeInterfaceBlock
cmplxdivexeco_1
IB_CpuClockRatio
ckrat_1
IB_AnalysisMode
ati_1
SC_AnalysisCmplxExecutionTimePart1
ComplexDivExecutionTimeInterfaceBlock
cmplxdivexeci_2
ComplexMulExecutionTimeInterfaceBlock
cmplxmulexeci_2
ComplexAddExecutionTimeInterfaceBlock
cmplxaddexeci_2
CmplxExecutionTimeEventInterfaceBlock
cmplxexeci
CmplxExecutionTimeInterfaceBlock
cmplxexeco
ComplexMulExecutionTimeEventInterfaceBlock
cmplxmulexeco_2
ComplexDivExecutionTimeEventInterfaceBlock
cmplxdivexeco_2
ComplexAddExecutionTimeEventInterfaceBlock
cmplxaddexeco_2
SC_PromulgateAnalysisCmplxExecutionTimeStartPart1
StartCmplxMulExecutionTimeComputationEventInterfaceBlock
stcmplxmulexeccompi
StartCmplxDivExecutionTimeComputationEventInterfaceBlock
stcmplxdivexeccompi
StartCmplxAddExecutionTimeComputationEventInterfaceBlock
stcmplxaddexeccompi
StartCmplxExecutionTimeComputationEventInterfaceBlock
stcmplxexeccompi
The model structure is identical for the Floating Point and Integer computations.
Figure A-33 Single Core CM Analysis Complex Container Propagate Start Activity
act [Block] SC_PromulgateCmplxExecutionStartEventBlock [ACT_SC_PromulgateStartCmplxExecutionEvent]
StartCmplxExecutionTimeComputationEvent
setStartEventReceivedFlag
StartCmplxAddExecutionTimeComputationEvent to stcmplxaddexeccompi
StartCmplxDivExecutionTimeComputationEvent to stcmplxdivexeccompi
StartCmplxMulExecutionTimeComputationEvent to stcmplxmulexeccompi
Figure A-34 Single Core CM Analysis Complex Container Execution Time AD
act [Block] SC_CmplxTimeBlock [ACT_CmplxTimeBlock]
ComplexAddExecTimeAvailableEvent
ComplexDivExecTimeAvailableEvent
ComplexMulExecTimeAvailableEvent
computeCmplxComputationExecutionTime
CmplxComputationExecutionTimeAvailableEvent to cmplxexeci
Figure A-35 Single Core CM Analysis Complex Add Container IBD
ibd [Block] SC_AnalComputation_CmplxAddContainerBlock [IBD_CmplxAddContainer]
ComplexAddExecutionTimeEventInterfaceBlock
cmplxaddexeci_1
StartCmplxAddExecutionTimeComputationEventInterfaceBlock
stcmplxaddexeccompi
ComplexAddExecutionTimeInterfaceBlock
cmplxaddexeco_1
NumberCmplxAddsInterfaceBlock
cmplxaddi_1
IB_CpuClockRatio
ckrat_1
IB_AnalysisMode
ati_1
SC_Anal_ComplexAddExecutionTimePart1
ComplexAddExecutionTimeEventInterfaceBlock
cmplxaddexeci
StartCmplxAddExecutionTimeComputationEventInterfaceBlock
stcmplxaddexeccompi
ComplexAddExecutionTimeInterfaceBlock
cmplxaddexeco
NumberCmplxAddsInterfaceBlock
cmplxaddcompi
IB_CpuClockRatio
ckrati
IB_AnalysisMode
ati
IN_CmplxAddQuadWarm:double
IN_CmplxAddQuadMostLikely:double
IN_CmplxAddQuadMin:double
IN_CmplxAddQuadMax:double
IN_CmplxAddQuadHot:double
IN_CmplxAddQuadCold:double
IN_CmplxAddQuintWarm:double
IN_CmplxAddQuintMostLikely:double
IN_CmplxAddQuintMin:double
IN_CmplxAddQuintMax:double
IN_CmplxAddQuintHot:double
IN_CmplxAddQuintCold:double
IN_CmplxAddTripleWarm:double
IN_CmplxAddTripleMostLikely:double
IN_CmplxAddTripleMax:double
IN_CmplxAddTripleMin:double
IN_CmplxAddTripleHot:double
IN_CmplxAddTripleCold:double
IN_CmplxAddDoubleWarm:double
IN_CmplxAddDoubleMostLikely:double
IN_CmplxAddDoubleMin:double
IN_CmplxAddDoubleMax:double
IN_CmplxAddDoubleHot:double
IN_CmplxAddDoubleCold:double
IN_CmplxAddSingleWarm:double
IN_CmplxAddSingleMostLikely:double
IN_CmplxAddSingleMin:double
IN_CmplxAddSingleMax:double
IN_CmplxAddSingleHot:double
IN_CmplxAddSingleCold:double
SC_Anal_CmplxAddSingleBufferQmifPart1
SCcmplxAddSingleMaxLikely:real_T
SCcmplxAddSingleWarmMaxLikely:real_T
SCcmplxAddSingleHotMaxLikely:real_T
SCcmplxAddSingleWcet:real_T
SCcmplxAddSingleWarmMuMax:real_T
SCcmplxAddSingleMin:real_T
SC_Anal_CmplxAddDoubleBufferQmifPart1
SCcmplxAddDoubleMaxLikely:real_T
SCcmplxAddDoubleWarmMaxLikely:real_T
SCcmplxAddDoubleHotMaxLikely:real_T
SCcmplxAddDoubleWcet:real_T
SCcmplxAddDoubleWarmMuMax:real_T
SCcmplxAddDoubleMin:real_T
SC_Anal_CmplxAddTripleBufferQmifPart1
SCcmplxAddTripleMaxLikely:real_T
SCcmplxAddTripleWarmMaxLikely:real_T
SCcmplxAddTripleHotMaxLikely:real_T
SCcmplxAddTripleWcet:real_T
SCcmplxAddTripleWarmMuMax:real_T
SCcmplxAddTripleMin:real_T
SC_Anal_CmplxAddQuadBufferQmifPart1
SCcmplxAddQuadMaxLikely:real_T
SCcmplxAddQuadWarmMaxLikely:real_T
SCcmplxAddQuadHotMaxLikely:real_T
SCcmplxAddQuadWcet:real_T
SCcmplxAddQuadWarmMuMax:real_T
SCcmplxAddQuadMin:real_T
SC_Anal_CmplxAddQuintBufferQmifPart1
SCcmplxAddQuintMaxLikely:real_T
SCcmplxAddQuintWarmMaxLikely:real_T
SCcmplxAddQuintHotMaxLikely:real_T
SCcmplxAddQuintWcet:real_T
SCcmplxAddQuintWarmMuMax:real_T
SCcmplxAddQuintMin:real_T
Supports algorithms with up to five buffers (e.g., three input and two output).
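The complex-add container carries a parallel timing-parameter set (Min/Max/MostLikely/Hot/Warm/Cold) for each supported buffer count, one through five. Selecting the right set reduces to indexing by buffer count, roughly as sketched below; the set names mirror the Single/Double/Triple/Quad/Quint naming in the IBD, but the selection helper itself is an illustrative assumption, not part of the model:

```python
# Hypothetical per-buffer-count timing parameter set selection,
# mirroring the Single/Double/Triple/Quad/Quint naming in the IBD.
BUFFER_SET_NAMES = {1: "Single", 2: "Double", 3: "Triple", 4: "Quad", 5: "Quint"}

def select_buffer_params(params_by_set, number_of_buffers):
    """Pick the timing parameter set for an algorithm that touches
    number_of_buffers buffers (e.g. three input plus two output = 5)."""
    if not 1 <= number_of_buffers <= 5:
        raise ValueError("the model supports one to five buffers")
    return params_by_set[BUFFER_SET_NAMES[number_of_buffers]]
```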
Figure A-36 Single Core CM Analysis Complex Add Container Execution Time AD
act [Block] SC_Anal_ComplexAddExecutionTimeBlock [ACT_ComplexAddExecTime]
CmplxAddSingleColdWaitCount++; CmplxAddDoubleColdWaitCount++;
CmplxAddSingleColdWaitCount = 0; CmplxAddDoubleColdWaitCount = 0;
ComplexAddExecTimeAvailableEvent to cmplxaddexeci
CxAddSing_1
[else]
[IN_CmplxAddSingleCold == 0.0]
CxAddDoub_1
[IN_CmplxAddDoubleCold == 0]
[else]
CmplxAddTripleColdWaitCount = 0;
CmplxAddTripleColdWaitCount++;
CxAddTrip_1
[IN_CmplxAddTripleCold == 0]
[else]
CmplxAddQuadColdWaitCount = 0;
CmplxAddQuadColdWaitCount++;
CxAddQuad_1
[IN_CmplxAddQuadCold == 0]
[else]
selectComplexAddBufferTime
computeComplexAddTime
CmplxAddQuintColdWaitCount = 0;
CmplxAddQuintColdWaitCount++;
CxAddQuint_1
[IN_CmplxAddQuintCold == 0]
[else]
StartCmplxAddExecutionTimeComputationEvent
Figure A-37 MATLAB SIMULINK Complex Add Single Buffer Model
Figure A-38 Single Core CM Analysis Trig Container IBD
ibd [Block] SC_AnalComputation_TrigContainerBlock [IBD_SC_TrigComputationContainer]
TrigExecutionTimeEventInterfaceBlock
trigexeci
TrigExecutionTimeInterfaceBlock
trigexeco
StartTrigExecutionTimeComputationEventInterfaceBlock
sttrigexeccompi
NumberCosComputationsInterfaceBlock_V1, NumberSinComputationsInterfaceBlock_V1, NumberTanComputationsInterfaceBlock_V1
trigcompi
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
SC_AnalysisTrigExecutionTimePart1
TanExecutionTimeInterfaceBlock
tanexeci_1
SinExecutionTimeInterfaceBlock
sinexeci_1
CosExecutionTimeInterfaceBlock
cosexeci_1
TrigExecutionTimeEventInterfaceBlock
trigexeci
TrigExecutionTimeInterfaceBlock
trigexeco
SinExecutionTimeEventInterfaceBlock
sinexeco_1
CosExecutionTimeEventInterfaceBlock
cosexeco_1
TanExecutionTimeEventInterfaceBlock
tanexeco_1
SC_AnalComputation_CosContainerPart1
CosExecutionTimeEventInterfaceBlock
cosexeci_1
StartCosExecutionTimeComputationEventInterfaceBlock
stcosexeccompi
IB_AnalysisMode
ati_1
CosExecutionTimeInterfaceBlock
cosexeco_1
NumberCosComputationsInterfaceBlock_V1
coscompi_1
IB_CpuClockRatio
ckrati_1
SC_AnalComputation_SinContainerPart1
SinExecutionTimeEventInterfaceBlock
sinexeci_1
StartSinExecutionTimeComputationEventInterfaceBlock
stsinexeccompi
SinExecutionTimeInterfaceBlock
sinexeco_1
NumberSinComputationsInterfaceBlock_V1
sincompi_1
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
SC_AnalComputation_TanContainerPart1
TanExecutionTimeEventInterfaceBlock
tanexeci_1
TanExecutionTimeInterfaceBlock
tanexeco_1
NumberTanComputationsInterfaceBlock_V1
tancompi_1
StartTanExecutionTimeComputationEventInterfaceBlock
sttanexeccompi
IB_CpuClockRatio
ckrati_1
IB_AnalysisMode
ati_1
SC_PromulgateAnalysisTrigExecutionTimeStartPart1
StartTanExecutionTimeComputationEventInterfaceBlock
sttanexeccompi
StartSinExecutionTimeComputationEventInterfaceBlock
stsinexeccompi
StartCosExecutionTimeComputationEventInterfaceBlock
stcosexeccompi
StartTrigExecutionTimeComputationEventInterfaceBlock
sttrigexeccompi
This Analysis Container is replicated in the companion Simulation Container.
Figure A-39 Single Core CM Analysis Trig Container Propagate Start Activity
act [Block] SC_PromulgateTrigExecutionStartEventBlock [ACT_PromulgateStatTrigExecutionTimeEvent]
StartTrigExecutionTimeComputationEvent
setStartEventReceivedFlag
StartCosExecutionTimeComputationEvent to stcosexeccompi
StartSinExecutionTimeComputationEvent to stsinexeccompi
StartTanExecutionTimeComputationEvent to sttanexeccompi
Figure A-40 Single Core CM Analysis Trig Container Execution Time AD
act [Block] SC_TrigTimeBlock [ACT_TrigTimeBlock]
CosExecTimeAvailableEvent
SinExecTimeAvailableEvent
computeTrigComputationExecutionTime
TrigExecutionTimeAvailableEvent to trigexeci
TanExecTimeAvailableEvent
Figure A-41 Single Core CM Analysis Trig Cosine Container IBD
ibd [Block] SC_AnalComputation_CosContainerBlock [IBD_SC_CosComputationContainer]
CosExecutionTimeEventInterfaceBlock
cosexeci_1
StartCosExecutionTimeComputationEventInterfaceBlock
stcosexeccompi
IB_AnalysisMode
ati_1
CosExecutionTimeInterfaceBlock
cosexeco_1
NumberCosComputationsInterfaceBlock_V1
coscompi_1
IB_CpuClockRatio
ckrati_1
SC_AnalysisCosExecutionTimePart1
CosExecutionTimeEventInterfaceBlock
cosexeci
StartCosExecutionTimeComputationEventInterfaceBlock
stcosexeccompi
CosExecutionTimeInterfaceBlock
cosexeco
NumberCosComputationsInterfaceBlock_V1
coscompi
IB_CpuClockRatio
ckrati
IB_AnalysisMode
ati
IN_CosCold:double
IN_CosMostLikely:double
IN_CosWarm:double
IN_CosHot:double
IN_CosMax:double
IN_CosMin:double
SC_Anal_CosQmifPart1
SCcosWarmMuMax:real_T
SCcosMaxLikelihood:real_T
SCcosWarmMaxLikelihood:real_T
SCcosHotMaxLikelihood:real_T
SCcosWorstCaseExecutionTime:real_T
SCcosMin:real_T
The model structure shown here is replicated for TrigSin, TrigTan, ArcCos, ArcSin, ArcTan, ArcTanFourQuad, MiscExp, MiscLog, and MiscSqrt.
Figure A-42 Single Core CM Analysis Trig Cosine Container Execution Time AD
act [Block] SC_AnalysisCosExecutionTimeBlock [ACT_CosTimeBlock]
computeCosExecutionTime
CosMinWaitCount++; CosMaxWaitCount++;
CosMinIn_1
[else]
CosMinWaitCount = 0;
[IN_CosMin == 0]
CosMaxWaitCount = 0;
CosMaxIn_1
[IN_CosMax == 0]
[else]
CosExecTimeAvailableEvent to cosexeci
StartCosExecutionTimeComputationEvent
setStartEventReceivedFlag
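One reading of the cosine execution-time activity in Figure A-42: it waits until both the Min and Max parameter inputs have been populated (non-zero) before computing, incrementing a wait counter on each zero read and resetting it once the input arrives. A small sketch of that guard pattern; the function name and counter keys are illustrative assumptions, not model identifiers:

```python
def cos_inputs_ready(cos_min, cos_max, wait_counts):
    """Return True once both parameter inputs are populated; otherwise
    bump the corresponding wait counter, as the CosMinIn_1/CosMaxIn_1
    decision nodes appear to do."""
    ready = True
    if cos_min == 0:
        wait_counts["min"] += 1   # still waiting on IN_CosMin
        ready = False
    else:
        wait_counts["min"] = 0
    if cos_max == 0:
        wait_counts["max"] += 1   # still waiting on IN_CosMax
        ready = False
    else:
        wait_counts["max"] = 0
    return ready
```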
Figure A-43 MATLAB SIMULINK Cosine Model
Figure A-44 Complex Add Single Buffer Hot State Pdf Parameters
[Table content: for each of the twelve HMM states (1,1) through (3,4), the fitted candidate distribution (generalized extreme value, loglogistic, t location-scale, beta, generalized Pareto, or lognormal) with its negative log-likelihood (NLogL), BIC, AIC, and AICc scores; the fitted parameter names, descriptions, and values (ParamName, ParamDescr, ParamValue for shape, scale, and location parameters); parameter confidence intervals (Paramci); and parameter covariance matrices (ParamCov). Column alignment of the numeric entries was lost in extraction.]
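The NLogL, AIC, BIC, and AICc rows in Figure A-44 are related by the standard information-criterion formulas, e.g. AIC = 2k + 2·NLogL for k fitted parameters. A minimal sketch of that arithmetic, cross-checked against the generalized-extreme-value column (NLogL = -568076.0794, three parameters); the sample size n is not reported in the table, so the value below is an assumed placeholder, which affects BIC and AICc but not AIC:

```python
import math

def info_criteria(nlogl: float, k: int, n: int) -> tuple:
    """AIC, BIC, and AICc from a negative log-likelihood (nlogl),
    k fitted parameters, and n observations."""
    aic = 2 * k + 2 * nlogl
    bic = k * math.log(n) + 2 * nlogl
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    return aic, bic, aicc

# GEV column of Figure A-44: three parameters (shape k, scale sigma, location mu).
aic, bic, aicc = info_criteria(nlogl=-568076.0794, k=3, n=25_000)  # n assumed
```

The resulting AIC of about -1136146.16 matches the tabulated value; a lower (more negative) criterion indicates the better-supported distribution family for that state.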
Figure A-45 Complex Add Single Buffer Warm State Pdf Parameters
[Table content: for each of the sixteen HMM states (1,1) through (4,4), the fitted candidate distribution (generalized extreme value, loglogistic, t location-scale, beta, generalized Pareto, inverse Gaussian, or lognormal) with its negative log-likelihood (NLogL), BIC, AIC, and AICc scores; the fitted parameter names, descriptions, and values (ParamName, ParamDescr, ParamValue for shape, scale, and location parameters); parameter confidence intervals (Paramci); and parameter covariance matrices (ParamCov). Column alignment of the numeric entries was lost in extraction.]
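The per-state fits tabulated in Figures A-44 and A-45 were produced with MATLAB distribution fitting; the selection idea (fit several candidate families to a state's execution-time samples, then rank by information criterion) can be sketched as follows. This is an illustrative Python/SciPy stand-in, not the dissertation's pipeline: the data are synthetic, and SciPy's genextreme, lognorm, and genpareto stand in for the MATLAB fitdist families named in the tables.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic stand-in for one HMM state's execution-time samples (seconds);
# not dissertation data.
data = rng.gumbel(loc=5.0e-9, scale=3.0e-11, size=5000)

# SciPy analogs of three of the candidate families from Figures A-44/A-45.
candidates = {
    "generalized extreme value": stats.genextreme,
    "lognormal": stats.lognorm,
    "generalized pareto": stats.genpareto,
}

scores = {}
for name, dist in candidates.items():
    params = dist.fit(data)                      # maximum-likelihood fit
    nlogl = -np.sum(dist.logpdf(data, *params))  # negative log-likelihood
    k = len(params)
    scores[name] = 2 * k + 2 * nlogl             # AIC; lowest wins

best = min(scores, key=scores.get)
```

The same loop extends naturally to the full candidate list and to BIC/AICc ranking, mirroring the per-state comparisons in the tables.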