AAA which application form ?
![Page 1: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/1.jpg)
AAA: which application form ?
R. de Simone, INRIA EPI Aoste
semi-technical considerations
![Page 2: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/2.jpg)
Adequation Algorithme-Architecture
Allocation Application / Architecture
mapping comprises:
– spatial allocation: distribution, placement (of tasks on resources)
– temporal allocation: scheduling
original formulation: Yves Sorel (1988)
since then rephrased as Platform-based design, Y-Chart approach,…
[Figure: Application and Architecture, joined by mapping (adequation)]
![Page 3: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/3.jpg)
Adequation Algorithme-Architecture
Allocation Application / Architecture
mapping comprises:
– spatial allocation: distribution, placement (of tasks on resources)
– temporal allocation: scheduling
architecture trends:
– networks of processors (multicore, GPGPU/MIC, many-buzz)
– communication bandwidth is the issue, on-chip networks… spatial routing, temporal arbitration, what else ?
[Figure: Application and Architecture, joined by mapping]
![Page 4: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/4.jpg)
Adequation Algorithme-Architecture
Allocation Application / Architecture
what kind of application description models ?
– where concurrency and timing could be extracted, made explicit
– under favorable conditions, this could be performed at compile time (static, time-predictable)
– dimensioning should be feasible to optimally fit the architecture (cache sizes, PRET notions,…)
– existing theories developed in neighboring domains could be combined and adapted
GOAL is to refine the application so it reflects the architecture
[Figure: Application and Architecture, joined by mapping]
![Page 5: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/5.jpg)
Draft example: Mapping
[Figure: architecture with Proc1, Proc2 and a Bus]
![Page 6: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/6.jpg)
Mapping = spatial distribution/routing + temporal scheduling
[Figure: architecture with Proc1, Proc2 and a Bus]
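As a rough illustration of "mapping = spatial + temporal allocation", the sketch below checks a candidate mapping on a Proc1 / Bus / Proc2 platform. Everything in it is hypothetical and not taken from the slides: the task names `f`, `send`, `g`, their durations, and the `check_mapping` helper. Spatial allocation is a dictionary from tasks to resources, temporal allocation a dictionary from tasks to start times.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    duration: int

def check_mapping(tasks, deps, placement, start):
    """Return the list of violations of a mapping (an empty list means valid).

    placement: task name -> resource name (spatial allocation)
    start:     task name -> start time    (temporal allocation)
    deps:      list of (producer, consumer) task-name pairs
    """
    dur = {t.name: t.duration for t in tasks}
    end = {n: start[n] + dur[n] for n in dur}
    problems = []
    # Precedence: a consumer may not start before its producer has finished.
    for src, dst in deps:
        if start[dst] < end[src]:
            problems.append(f"{dst} starts before {src} ends")
    # Exclusivity: two tasks placed on the same resource may not overlap in time.
    names = sorted(dur)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if placement[a] == placement[b] and start[a] < end[b] and start[b] < end[a]:
                problems.append(f"{a} and {b} overlap on {placement[a]}")
    return problems

# Hypothetical application: f runs on Proc1, its result crosses the Bus, g runs on Proc2.
tasks = [Task("f", 2), Task("send", 1), Task("g", 3)]
deps = [("f", "send"), ("send", "g")]
placement = {"f": "Proc1", "send": "Bus", "g": "Proc2"}

good = check_mapping(tasks, deps, placement, {"f": 0, "send": 2, "g": 3})
bad = check_mapping(tasks, deps, placement, {"f": 0, "send": 1, "g": 3})
print(good)
print(bad)
```

The communication is treated as one more task placed on the bus, so the same two checks cover computation and transfer alike.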
![Page 7: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/7.jpg)
Applications: the scope MoCCs
Syn/Poly-chronous
Process Networks
Affine bounds nested loops
explicit distribution, concurrency extraction
explicit scheduling, time constraints extraction
![Page 8: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/8.jpg)
Applications: the scope MoCCs
• Exercise: count the communities (do not forget Model-Driven Engineering for model transformations, and Classical or Real-Time Scheduling)
Syn/Poly-chronous
Process Networks
Affine bounds nested loops
explicit distribution, concurrency extraction
explicit scheduling, time constraints extraction
![Page 9: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/9.jpg)
Motivations for the joint use of these models
• Similar restrictions !!
– little (in fact no) concern for data values
– control largely data-independent (uninterpreted functions)
– finite-state control property
– role of conflict-freeness as functional determinism
– formal models and mathematical analysis
– issues tackled at compile time
but…
– largely developed independently (even if in neighboring groups)
– lack of common vocabulary, or of common framework, or common drive (risk of losing the AAA grade ?)
…Or is it just me who needs to go back to school ?
![Page 10: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/10.jpg)
Process Networks
• Data-flow: computations triggered by data availability
– Pure Data-Flow: Marked Graphs, SDF extension
– w/ data route switches: Boolean DF, Cyclo-Static DF, Kahn PNs
• Conflict-freeness: all computations amount to the same partial order, only with different scheduling/time assignments
• ASAP schedule: provides best throughput
• Correctness issues: safety and liveness
• Optimization issues: optimize buffer sizes while preserving throughput
Second-level semantics: computations according to schedules (activation conditions)
Representation as polychronous descriptions
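The ASAP execution of a marked graph can be sketched in a few lines. This is a minimal illustration, not a tool from the talk: the `asap_run` helper and the two-node example (`f`, `g`, one initial token) are assumptions made here. At each step every enabled node fires, and the firing history of each node is recorded as a binary word.

```python
def asap_run(nodes, edges, marking, steps):
    """Simulate the ASAP execution of a marked graph.

    edges:   list of (source, target) pairs, one per place (channel)
    marking: list of initial token counts, one per edge
    Returns the firing word of each node as a string of 0s and 1s.
    """
    tokens = list(marking)
    words = {n: "" for n in nodes}
    for _step in range(steps):
        # A node is enabled when every incoming edge holds at least one token.
        enabled = {
            n for n in nodes
            if all(tokens[k] > 0 for k, (src, dst) in enumerate(edges) if dst == n)
        }
        # All enabled nodes fire together: consume on inputs, produce on outputs.
        for k, (src, dst) in enumerate(edges):
            tokens[k] += (src in enabled) - (dst in enabled)
        for n in nodes:
            words[n] += "1" if n in enabled else "0"
    return words

# Hypothetical example: a cycle f -> g -> f with a single token on the g -> f edge.
words = asap_run(["f", "g"], [("f", "g"), ("g", "f")], [0, 1], 6)
print(words)
```

With one token circulating on a cycle of two nodes, each node fires every other step. Since each edge has a single consumer, firing all enabled nodes at once never creates a conflict over tokens.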
![Page 11: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/11.jpg)
Explicit timing for Process Networks
• Classical scheduling of Marked Graphs, Synchronous Data Flow graphs
• Latency-Insensitive Design
– Ultimately k-periodic schedules
Represented with
– N-synchronous formalisms
– Signal and affine clock calculus
Syn/Poly-chronous
Process Networks
![Page 12: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/12.jpg)
Ultimately k-periodic scheduling
[Figure: process network with nodes f and g, annotated with binary schedule words such as (11010), (10101), (10110), (01101), (01011) and [00100]]
loop pause; emit clk; pause; pause; emit clk; pause; emit clk; pause;
endloop
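An ultimately k-periodic schedule can be written as a finite prefix followed by a periodic part repeated forever. The sketch below uses hypothetical helper names (`unfold`, `rate`, `clock_of_loop`) and reads the loop above under the usual Esterel assumption that each `pause` closes the current instant. It unfolds such words and derives the activation word of one loop iteration.

```python
from fractions import Fraction
from itertools import chain, cycle, islice

def unfold(prefix, period, n):
    """First n letters of the ultimately periodic word prefix.(period)^omega."""
    return "".join(islice(chain(prefix, cycle(period)), n))

def rate(period):
    """Long-run fraction of active instants; the finite prefix does not matter."""
    return Fraction(period.count("1"), len(period))

def clock_of_loop(body):
    """Activation word of one iteration of a loop made of 'pause' and 'emit clk'.

    Assumes the body ends with a pause, so that no emission leaks into
    the next iteration. Each pause closes the current instant; an instant
    is written 1 if clk was emitted during it, 0 otherwise.
    """
    word, emitted = "", False
    for stmt in body:
        if stmt == "emit clk":
            emitted = True
        elif stmt == "pause":
            word += "1" if emitted else "0"
            emitted = False
    return word

body = ["pause", "emit clk", "pause", "pause", "emit clk", "pause", "emit clk", "pause"]
period = clock_of_loop(body)
print(period, rate(period), unfold("", period, 12))
```

Under that reading the loop activates clk three instants out of five. The other five-letter words on the slide also contain three 1s, so they have the same rate.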
![Page 13: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/13.jpg)
Explicit routing
• Regular switching patterns in BDF, CSDF, KPNs
– Extending scheduling with similar expressivity (ultimately periodic infinite binary words)
– KRGs: (K-periodically) Routed Graphs
– Axiomatics for an equational theory of communication re-wiring
• Allows one to consider sharing of communications (on common interconnect structures)
– Interplay of scheduling and routing (arbitration between communications using the same channels)
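Routing driven by a periodic binary word can be illustrated with two small functions. This is only a sketch: the names `select` and `merge`, the convention that 1 means "second output/input", and the pattern `11010` used as test data are choices made here, not definitions from the KRG work.

```python
from itertools import cycle, islice

def select(tokens, pattern):
    """Route each token to output 1 or output 0 according to a periodic binary pattern."""
    out0, out1 = [], []
    for tok, bit in zip(tokens, cycle(pattern)):
        (out1 if bit == "1" else out0).append(tok)
    return out0, out1

def merge(in0, in1, pattern):
    """Interleave two streams: a 1 takes the next token from in1, a 0 from in0.

    Assumes the pattern is consistent with the number of tokens on each input.
    """
    it0, it1 = iter(in0), iter(in1)
    out = []
    for bit in islice(cycle(pattern), len(in0) + len(in1)):
        out.append(next(it1 if bit == "1" else it0))
    return out

tokens = list(range(10))
out0, out1 = select(tokens, "11010")
print(out0, out1)
print(merge(out0, out1, "11010"))
```

A select followed by a merge with the same pattern restores the original stream. That is a simple example of the kind of identity an equational theory of re-wiring can state.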
Syn/Poly-chronous
Process Networks
![Page 14: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/14.jpg)
Interconnect modeling and optimization
• On-Chip Networks
• Switching conditions of Select/Merge nodes can configure different communication paths, possibly overlapping in time
• Predictable routing schemes (ultimately k-periodic) will match the temporal schedules obtained in classical Process Network scheduling theory
[Figure: nodes C1, C2, C3 and C4 interconnected through Select nodes and Merge nodes]
![Page 15: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/15.jpg)
[Figure: one possible routing configuration between C1, C2, C3 and C4, with 0/1 switch values on the interconnect]
![Page 16: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/16.jpg)
[Figure: another possible configuration between C1, C2, C3 and C4, with 0/1 switch values on the interconnect]
![Page 17: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/17.jpg)
Nested loops with affine bounds
• Parallel compilation models
– static control (regular indices, not while (data_cond) loops)
– Iteration space (multidimensional arrays, polyhedra)
– Source-to-source transformations (rewrite as program, only with DOSEQ and DOPAR at various levels)
– Improving data locality
• Variants
– Systems of Uniform Recurrence Equations (SUREs, MMAlpha)
– Geometrical description formalisms (Array-OL)
![Page 18: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/18.jpg)
Example + Reduced Dependence Graphs
Process Network
Affine bounds
nested loops
loop i = 1 to N
  loop j = 1 to N
    a(i,j) := a(i-1,j-1) + a(i, j-1)
  end
end
dependence levels: 2, 1
direction vectors: (0,1), (1,1)
DOSEQ j = 1 to N
  DOPAR i = 1 to N
    a(i,j) := a(i-1,j-1) + a(i, j-1)
  end
end
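The equivalence of the two loop nests can be checked by direct execution. In the sketch below the border values (row 0 and column 0 set to a constant) are an assumption, since the slide does not give initial values. The `order` argument is a hypothetical device: it permutes the iterations of the DOPAR loop to mimic an arbitrary parallel execution order.

```python
def run_original(n, border):
    """loop i / loop j version, executed in textual order."""
    a = [[border] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            a[i][j] = a[i - 1][j - 1] + a[i][j - 1]
    return a

def run_doseq_dopar(n, border, order):
    """DOSEQ j / DOPAR i version; `order` permutes the i iterations of each column."""
    a = [[border] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):              # sequential: column j needs column j-1
        for i in order(range(1, n + 1)):   # any order: only column j-1 is read
            a[i][j] = a[i - 1][j - 1] + a[i][j - 1]
    return a

ref = run_original(6, 1)
print(run_doseq_dopar(6, 1, reversed) == ref)
print(run_doseq_dopar(6, 1, list) == ref)
```

Both direction vectors have a positive j component, so once j is the outer loop every dependence is carried by it and the inner i iterations are independent.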
![Page 19: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/19.jpg)
From Nested Loops to Process Networks
• Existing efforts: basic idea is to assign one computing node to each assignment
• COMPAAN, Pico Express
• Tiling, chunking
• Stefanov et al., Polyhedral Process Networks
• Multidimensional SDF for Array-OL/Gaspard
Process Network
Affine bounds
nested loops
![Page 20: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/20.jpg)
Dream (or nightmare?)
• How can one produce Process Network descriptions from nested loops (or better said, SUREs or RDGs) ?
• Of course limited, but very often already polyhedra (for bounds) separated from dependency graphs (for computations)
• Needs to split clearly between (sequential) iterations and parallel ones (currently expanded)
• Issue of potential (application) vs real (architecture) concurrency/parallelism
[Figure: iteration space over i and j, with dimensions labelled 8 and 4; a(i,j) depends on a(i,j-1) and a(i-1,j-1)]
![Page 21: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/21.jpg)
Dream (or nightmare?)
• Greenish nodes provide constants (in parallel at the bottom, sequentially on the left)
• Blue nodes each compute 4 values in parallel and shift the last one right across the border
Lack of generality, certainly when values to be sent across boundaries are not ordered as expected at the target
[Figure: iteration space over i and j, with dimensions labelled 8 and 4; a(i,j) depends on a(i,j-1) and a(i-1,j-1); nodes labelled a[1,1..4], a[1,5..8] and a[1..4,0]; edges annotated with rates 1, 3, 4 and words 0.(1)]
![Page 22: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/22.jpg)
Challenges of it
• Not entirely clear (to me at least) how data locality and transfer (if any) are dealt with in different works
• Typical: a data block is transferred in some order (row), and consumed in a different order (column)
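The cost of such an order mismatch can be estimated with a small simulation. The model is an assumption made for illustration only: one element arrives per step in row-major order, the consumer takes elements in column-major order as soon as they are available, and the element that has just arrived counts as stored. The helper name `peak_buffer` is hypothetical.

```python
def peak_buffer(rows, cols):
    """Peak number of stored elements for a rows x cols block that arrives in
    row-major order and is consumed in column-major order as early as possible."""
    arrival = [(r, c) for r in range(rows) for c in range(cols)]
    wanted = [(r, c) for c in range(cols) for r in range(rows)]
    stored, peak, nxt = set(), 0, 0
    for elem in arrival:
        stored.add(elem)
        peak = max(peak, len(stored))
        # Drain every element that is now next in the consumer's order.
        while nxt < len(wanted) and wanted[nxt] in stored:
            stored.remove(wanted[nxt])
            nxt += 1
    return peak

print(peak_buffer(4, 4))
print(peak_buffer(1, 5))
```

For a 4 x 4 block the peak is 10 of the 16 elements, whereas a single row, where both orders coincide, needs only 1.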
![Page 23: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/23.jpg)
SoC/NoC intelligent routers ?
• To match (and be programmed from) the application, routers
– should be able to fork the data
– should only care about directions (not single target)
– should let communications cross one another if not using the same lines
– should be able to buffer values (at least in the size of the processor array)
[Figure: grid of 3 rows of 4 nodes, each labelled P Core]
![Page 24: AAA which application form ?](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816661550346895dd9eb70/html5/thumbnails/24.jpg)
ThAAAnk You
Questions anybody ?