Efficient Model Partitioning for Distributed Model Transformations
SLE'16, 1 Nov. 2016, Amsterdam, Netherlands
Amine Benelallam, Massimo Tisi (AtlanMod team, Nantes, France)
Jesús Sánchez Cuadrado, Juan de Lara (Universidad Autónoma de Madrid, Spain)
Jordi Cabot (ICREA, Open University of Catalonia, Spain)
[Figure: architecture of distributed model transformation. Task nodes (workers) and data nodes share a distributed (MOF-compliant) model access and persistence API with concurrent read/write, under a coordination layer. The input model (elements a to h) is divided into Split 1 and Split 2 and processed in three phases: data distribution, parallel local transformation, and parallel global composition.]
System assumptions
● On-demand loading, to ensure that only needed elements are loaded
● Concurrent read/write to the persistence backend
● Fast look-up of already loaded elements by using caching and/or indexing mechanisms
A. Benelallam et al.: Distributed model-to-model transformation with ATL on MapReduce. In Proceedings of the 2015 ACM SIGPLAN Int. Conf. on SLE
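The first and third assumptions can be illustrated with a minimal Python sketch. This is not the API of ATL-MR or of any persistence framework: `LazyModel`, `backend`, and `backend_reads` are hypothetical names, and a plain dictionary stands in for the persistence backend.

```python
class LazyModel:
    """Loads elements on demand and caches them (illustrative only)."""

    def __init__(self, backend):
        self.backend = backend      # hypothetical stand-in for the persistence backend
        self.cache = {}             # already loaded elements, indexed by id
        self.backend_reads = 0      # how many times the backend was actually read

    def get(self, element_id):
        # Fast look-up: return the cached element if it was loaded before.
        if element_id in self.cache:
            return self.cache[element_id]
        # On-demand loading: read the backend only for elements that are needed.
        self.backend_reads += 1
        element = self.backend[element_id]
        self.cache[element_id] = element
        return element


model = LazyModel({"p1": "Package", "c1": "Class", "t1": "Type"})
model.get("c1")
model.get("c1")   # served from the cache, no backend read
model.get("t1")
print(model.backend_reads, sorted(model.cache))
```

Only the two requested elements are ever read; `p1` stays in the backend until something asks for it.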
Model Partitioning for Distributed MTs
What makes it different from other distributed applications?
I need an example!
Class2Relational
Atlanmod Transformation Language (ATL)
module Class2Relational;
create OUT : Relational from IN : Class;
rule Class2Table {
  from
    c : Class!Class (not c.isAbstract)
  to
    out : Relational!Table (
      col <- Sequence{key}->union(c.attr->select(e | not e.multiValued))
                          ->union(c.assoc->select(e | not e.mvalued)),
      keys <- Sequence{key}->union(c.assoc->select(e | not e.mvalued))
    ),
    key : Relational!Column (
      name <- c.name + 'objectId',
      type <- thisModule.getObjectIdType
    )
}
[ … ]
[Figure annotations on the listing: module, rule, input pattern, output pattern, guard, binding, ATL helper]
Running example
Model elmt.  Dependencies
p1           {p1, c1, a1, c2, t1}
c1           {c1, a1, att1, t1}
a1           {a1, c2}
att1         {att1, t1}
c2           {c2, att2, att3, t1}
att2         {att2, t1}
att3         {att3, t1}
t1           {t1}
[Figure: input model with elements p1 : Package, c1 : Class, c2 : Class, t1 : Type, att1 : Attribute, a1 : Assoc, att2 : Attribute, att3 : Attribute]
Partitioning: Scenario I
[Figure: the input model is distributed over two task nodes through the distributed (MOF-compliant) model access API, using the dependency table of the running example. Task node 1 ends up with p1, att1, a1, c1, t1, att3, att2, c2 (8 elements); task node 2 ends up with att1, a1, c1, t1, att3, att2, c2 (7 elements).]
8 + 7 = 15
Partitioning: Scenario II
[Figure: the same input model and dependency table with a different assignment. Task node 1 ends up with p1, att1, a1, c1, t1, c2 (6 elements); task node 2 ends up with c2, t1, att3, att2 (4 elements).]
6 + 4 = 10 (33% improvement)
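The totals of the two scenarios can be recomputed from the dependency table with a short Python sketch. The two assignments below are assumptions: the transcript does not state them explicitly, so they were chosen to be consistent with the per-node counts on the slides. `loaded` and `total_loads` are illustrative names.

```python
# Dependency table from the running example.
DEPS = {
    "p1": {"p1", "c1", "a1", "c2", "t1"},
    "c1": {"c1", "a1", "att1", "t1"},
    "a1": {"a1", "c2"},
    "att1": {"att1", "t1"},
    "c2": {"c2", "att2", "att3", "t1"},
    "att2": {"att2", "t1"},
    "att3": {"att3", "t1"},
    "t1": {"t1"},
}


def loaded(split):
    """Elements a task node must load to transform the elements in its split."""
    needed = set()
    for element in split:
        needed |= DEPS[element]
    return needed


def total_loads(splits):
    """Sum of the elements loaded by every task node."""
    return sum(len(loaded(split)) for split in splits)


# Assumed assignments, chosen to match the counts shown on the slides.
scenario_1 = [{"p1", "att1", "att2", "att3"}, {"c1", "c2", "a1", "t1"}]
scenario_2 = [{"p1", "c1", "a1", "att1"}, {"c2", "att2", "att3", "t1"}]

cost_1 = total_loads(scenario_1)   # 8 + 7
cost_2 = total_loads(scenario_2)   # 6 + 4
print(cost_1, cost_2, round(100 * (cost_1 - cost_2) / cost_1))
```

Both splits hold four elements each, so the load is balanced in either case; only data locality changes the number of elements that have to be loaded.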
#1 Dense structure
● Even though models are structured:
  ○ Their density is often high and irregular
  ○ The structure of the computation is only known at runtime
#2 Varying complexity
● Graph computation is often data-driven and dictated by the structure of the graph
● Irregular computation structure => irregular computation cost
[Figure: example graphs labelled simple, complex, and highly dense]
Model-data partitioning
● Access patterns tend to have poor data locality
● High data-access-to-computation ratio
Difficult to:
● Guarantee a balanced computational load
● Ensure good data locality
Proposal
● Existing graph-data partitioning approaches are not suitable; they either:
  a. Assume that the dependency graph exists
  b. Reason only on vertex connectivity
● We propose a two-step approach: footprint extraction (I), then model partitioning (II)
I- Footprint extraction
● Extract access patterns as sequences of steps
● Resulting footprints have the form:
  [sourceType][. '('?[propertyName][')'?∗]?]+
● Parse OCL expressions in guards, bindings, and helpers
● Visit the OCL AST and perform one of the following unary or binary operations:
  ○ ⊲ : chain the NavigationCallExp
  ○ ⊕ : decouple the LHS and RHS into two separate footprints (e.g. conditional expression)
  ○ Ⓧ : if the RHS is accessible from the LHS then ⊲, otherwise ⊕ (e.g. select)
● Organize footprints by sourceType
I- Footprint extraction: worked example
p.classes -> reject (c | c.isAbstract) -> collect (cc | cc.assoc -> select (a | a.mValued)) -> flatten();
[Figure: AST of the expression, annotated bottom-up over successive slides with the footprint computed at each node. Nodes: OpCallExp (flatten), IteratorExp (collect), IteratorExp (reject), IteratorExp (select), NavCallExp (classes), NavCallExp (assoc), AttrCallExp (isAbstract), AttrCallExp (multiValued), VarExp (p), VarExp (c), VarExp (cc), VarExp (a).]
Intermediate footprints: FP = {Package(p)}, FP = {Package.classes}, FP = {Class(c)}, FP = {Class(cc)}, FP = {Attribute(a)}, FP = {Class.ass}
Operators applied: ⊲ (four times), Ⓧ (three times)
Final footprint: FP = {Package.classes.ass}
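The three operators can be sketched in Python and replayed on the expression above. This is a simplified illustration, not the extraction algorithm itself: the `REFERENCES` metamodel fragment, the `Footprint` class, and the reading of "accessible" as "the RHS source type is the type reached at the end of the LHS" are all assumptions, and the iterator variable `a` is typed `Association` in this assumed metamodel.

```python
from dataclasses import dataclass

# Assumed fragment of the Class metamodel: (owner type, reference) -> target type.
REFERENCES = {
    ("Package", "classes"): "Class",
    ("Class", "assoc"): "Association",
    ("Class", "attr"): "Attribute",
}


@dataclass(frozen=True)
class Footprint:
    source: str          # sourceType
    steps: tuple = ()    # navigated property names

    def end_type(self):
        """Type reached after following every step from the source type."""
        current = self.source
        for step in self.steps:
            current = REFERENCES[(current, step)]
        return current

    def __str__(self):
        return ".".join((self.source, *self.steps))


def chain(lhs, *props):
    """⊲: extend lhs with further navigation steps."""
    return Footprint(lhs.source, lhs.steps + props)


def decouple(lhs, rhs):
    """⊕: keep lhs and rhs as two separate footprints."""
    return {lhs, rhs}


def chain_if_accessible(lhs, rhs):
    """Ⓧ: chain when rhs starts where lhs ends, otherwise decouple."""
    if lhs.end_type() == rhs.source:
        return {chain(lhs, *rhs.steps)}
    return decouple(lhs, rhs)


# p.classes
classes = chain(Footprint("Package"), "classes")
# reject(c | c.isAbstract): the attribute access adds no navigation step
(after_reject,) = chain_if_accessible(classes, Footprint("Class"))
# cc.assoc -> select(a | a.mValued)
assoc = chain(Footprint("Class"), "assoc")
(after_select,) = chain_if_accessible(assoc, Footprint("Association"))
# collect(cc | ...): Class is reachable from Package.classes, so the two are chained
result = chain_if_accessible(after_reject, after_select)
print(sorted(str(fp) for fp in result))
```

When the right-hand side is not reachable, for example `chain_if_accessible(classes, Footprint("DataType"))`, the sketch falls back to ⊕ and returns two separate footprints.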
I- Resulting footprints
Rules                 Footprints
Package2Schema        Package.classes.assoc, Package.types
Class2Table           Class.assoc, Class.attr, DataType.allInstances
Attribute2Column      Attribute.type
MVAttribute2Column    Attribute.type, Attribute.owner
Association2Column    DataType.allInstances
MVAssociation2Column  Association.type, DataType.allInstances

Types        Footprints
Package      Package.classes.assoc, Package.types
Class        Class.assoc, Class.attr, DataType.allInstances
Attribute    Attribute.type, Attribute.owner
Association  Association.type, DataType.allInstances
Type         Ø
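The step from the per-rule table to the per-type table can be sketched as a grouping in Python. The rule footprints are copied from the table above; `RULE_INPUT_TYPE` is an assumption inferred from the rule names, since the transcript shows only the input pattern of Class2Table.

```python
# Footprints per rule, as listed in the table above.
RULE_FOOTPRINTS = {
    "Package2Schema": ["Package.classes.assoc", "Package.types"],
    "Class2Table": ["Class.assoc", "Class.attr", "DataType.allInstances"],
    "Attribute2Column": ["Attribute.type"],
    "MVAttribute2Column": ["Attribute.type", "Attribute.owner"],
    "Association2Column": ["DataType.allInstances"],
    "MVAssociation2Column": ["Association.type", "DataType.allInstances"],
}

# Assumed input type of each rule, inferred from the rule names.
RULE_INPUT_TYPE = {
    "Package2Schema": "Package",
    "Class2Table": "Class",
    "Attribute2Column": "Attribute",
    "MVAttribute2Column": "Attribute",
    "Association2Column": "Association",
    "MVAssociation2Column": "Association",
}


def footprints_by_type(rule_footprints, rule_input_type):
    """Union the footprints of all rules that match the same source type."""
    by_type = {}
    for rule, footprints in rule_footprints.items():
        bucket = by_type.setdefault(rule_input_type[rule], [])
        for footprint in footprints:
            if footprint not in bucket:   # drop duplicates, keep first-seen order
                bucket.append(footprint)
    return by_type


by_type = footprints_by_type(RULE_FOOTPRINTS, RULE_INPUT_TYPE)
for source_type, footprints in by_type.items():
    print(source_type, footprints)
```

A type matched by no rule, such as Type, gets no entry, which corresponds to the Ø row.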
II- Model partitioning
● Greedy, bi-objective algorithm:
  a. Maximizing data locality
  b. Balancing the machine load
● On-the-fly approximation of the dependency graph in the form of <machine-id, nextStep>
● A buffer to delay the processing of elements that do not participate in the construction of the approximate dependency graph
● Instant assignment based on a score function
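A heavily simplified Python sketch of greedy streaming assignment with a score function follows. It is not the algorithm of the talk: it omits the buffer, and it reads the exact dependency table instead of approximating dependencies from footprints as <machine-id, nextStep> pairs. The score (overlap with what a machine already loads, ties broken in favour of the lighter machine), the capacity of 4 elements per machine, and the stream order are all assumptions.

```python
# Dependency table of the running example (the real approach only approximates it).
DEPS = {
    "p1": {"p1", "c1", "a1", "c2", "t1"},
    "c1": {"c1", "a1", "att1", "t1"},
    "a1": {"a1", "c2"},
    "att1": {"att1", "t1"},
    "c2": {"c2", "att2", "att3", "t1"},
    "att2": {"att2", "t1"},
    "att3": {"att3", "t1"},
    "t1": {"t1"},
}


def greedy_partition(stream, deps, machines=2, capacity=4):
    """Assign each streamed element, as it arrives, to the best-scoring machine."""
    assigned = {m: [] for m in range(1, machines + 1)}
    loaded = {m: set() for m in assigned}   # what each machine has to load so far

    def score(machine, element):
        overlap = len(deps[element] & loaded[machine])   # objective a: data locality
        return (overlap, -len(assigned[machine]))        # objective b: load balance

    for element in stream:
        # Only machines with room left are candidates; this sketch assumes
        # the stream never exceeds machines * capacity elements.
        candidates = [m for m in assigned if len(assigned[m]) < capacity]
        best = max(candidates, key=lambda m: score(m, element))
        assigned[best].append(element)
        loaded[best] |= deps[element]
    return assigned, loaded


# Assumed stream order.
stream = ["p1", "att3", "a1", "c1", "c2", "t1", "att1", "att2"]
assigned, loaded = greedy_partition(stream, DEPS)
print(assigned)
print([len(loaded[m]) for m in assigned])
```

On this input the sketch fills machine 1 with p1, att3, a1, c1 and machine 2 with c2, t1, att1, att2, which requires loading 7 and 5 elements respectively. A different stream order can give a different result, since every decision is final once made.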
II- Model partitioning: example
● Parameters
  ○ avgSize = 4
  ○ var = 2
  ○ buffCap = 2
Input stream (as printed on the slide): p1, att3, a1, c1, c2, t1, att1, att2
The per-machine dependency table (columns: elmt., per-machine dependencies) has no entries yet.
II- Model partitioning: dependency table during streaming
[Figure: over successive slides, elements leave the input stream and are placed on a machine or in the buffer. The per-machine dependency table evolves as follows.]
After p1 is taken from the stream (remaining: att3, a1, c1, c2, t1, att1, att2):
  c1    {<1,assoc>}
  c2    {<1,assoc>}
After t1 is taken from the stream: table unchanged
After att1 and att2 are taken from the stream (remaining: att3, a1, c1, c2):
  c1    {<1,assoc>; <2,Ø>}
  c2    {<1,assoc>}
  t1    {<2,Ø>}
After c2 is taken from the stream (remaining: att3, a1, c1):
  c1    {<1,assoc>; <2,Ø>}
  c2    {<1,assoc>}
  t1    {<2,Ø>}
  att2  {<1,Ø>}
  att3  {<1,Ø>}
Last state shown:
  c1    {<1,assoc>; <2,Ø>}
  c2    {<1,assoc>}
  t1    {<2,Ø>; <1,Ø>; <1,Ø>; <1,Ø>}
  att2  {<1,Ø>}
  att3  {<1,Ø>}
  a1    {<2,Ø>}
II- Model partitioning: result
[Figure: final assignment of the model elements to the two machines.]
7 + 5 = 12 (20% improvement)
Evaluation
[Figure: evaluation setup. Components shown: Eclipse Modeling Framework, NeoEMF/HBase, XML Metadata Interchange, HDFS, and ATL-MR with an ATL-MR master and ATL-MR slaves on Hadoop task nodes and Hadoop data nodes. Steps: 1. Distribute input, 2. Monitor, 3. Return output.]
Evaluation results
Limitations
● The performance of our approach may be reduced when:
  a. elements have a large number of dependencies, sometimes exceeding the average split size (e.g. the allInstances() operation)
  b. the approximate graph contains false-positive dependencies (e.g. select() or reject())
  c. the order of the streamed elements is unfavourable
Conclusion
● We presented our solution for efficient partitioning of distributed MTs as a greedy algorithm.
  ○ We introduced an algorithm for footprint extraction
  ○ We presented our greedy algorithm for streaming model partitioning
  ○ We experimentally showed the scalability of our solution (up to 16% on average)
● In future work we plan to:
  ○ Extend our work to balanced edge partitioning and conduct a more exhaustive study of the impact of model density on the partitioning strategy
  ○ Improve the distribution of the intermediate transformation data (tracing information)
Questions
Check us out on GitHub: https://github.com/atlanmod/ATL_MR