Toward Remote Object Coherence with Compiled Object ...engelen/cpctalk06.pdf · CPC Jan 10, 2006 1...
Transcript of Toward Remote Object Coherence with Compiled Object ...engelen/cpctalk06.pdf · CPC Jan 10, 2006 1...
CPC Jan 10, 2006 1
Toward Remote ObjectCoherence with Compiled
Object Serialization forDistributed Computing with
XML Web Services
Robert van Engelen and Wei Zhang
Florida State University
Madhusudhan Govindaraju
State University of New Yorkat Binghamton
CPC Jan 10, 2006 2
Outline
Motivation:is object-level coherence in XML Web services feasible?
XML serialization:the good, the bad, and the ugly
The gSOAP project:a compiled serialization approach for intercommunication
Results and conclusions
CPC Jan 10, 2006 3
Motivation
“Compiler techniques are necessary toimprove speed and ensure object coherence
in XML-based distributed computing”
CPC Jan 10, 2006 4
Motivation (cont’d)
Object serialization has increasingly becomemore powerful and popular as a means to builddistributed systems Typically based on remote procedure call (RPC) or
its OO equivalent remote method invocation (RMI)
XML is gaining popularity as a serialization format
Message-passing interfaces are also popular, buttheir low-level APIs are not designed to exchangecompound data types and entire object graphs
CPC Jan 10, 2006 5
The Good:a Case for XML Serialization
XML serialization (e.g. used by XML Web services)is gaining popularity, because:+ Binary serialization protocols are inflexible, while XML
is an extensible format that supports compositionality
+ Binary serialization protocols do not interoperate wellacross platforms and through firewalls
+ Hierarchical business data is easily expressed in XML
+ Significant proliferation of XML warehousing
+ Newer generation of office tools use XML-baseddocument formats
CPC Jan 10, 2006 6
The Bad:XML Serialization Issues
XML serialization has shortcomings:– How bad does XML serialization impact
performance and network bandwidth?
– “XML = trees”, so how can XML be used to serializeobject graphs and ensure object-level coherence?
Can compiler techniques help to mitigate theseconcerns?
CPC Jan 10, 2006 7
The Ugly:SOAP RPC Encoding
Plain-old XML to serialize object graphs won’t work:– Naïve approach serializes object graphs into XML trees
(this is fine with business data and most numerical data)
– No standard object referencing mechanism to implementgraph edges of object graphs
– No namespaces to distinguish objects that share thesame name but are structurally different
SOAP RPC encoding standard supports object graphs!
CPC Jan 10, 2006 8
The Ugly (2):SOAP RPC Encoding
XML namespaces to separateoperations and type definitions
The XSD type system to representvalues of primitive types in XML,such as bool, integer, float, string,base64
A new XML array type to encodesparse and partial arrays
Id-href XML attributes toimplement graph edges to encodemulti-ref objects
<env:Envelope> <env:Body env:encodingStyle=“…”> <ns:myRemoteOp> <arg1 xsi:type=“xsd:int”>123</arg1> <arg2 enc:arrayType=“xsd:string[5]”> <item>abc</item> <item href=“#_1”/> <item href=“#_1”/> <item enc:position=“[5]”>xyz</item> </arg2> </ns:myRemoteOp> <mref id=“_1”>def</mref> </env:Body></env:Envelope>
CPC Jan 10, 2006 9
The Ugly (3):SOAP RPC Encoding
At first glance this looks promising, but …– Serialization is still lossy when floating point values
are encoded in text form (due to roundoff)
– No requirements on multi-ref encoding with id-href:
no assurance that the logical structure of an object
graph is unchanged at the receiving side
– Encodes XML with only forward pointing references
to multi-referenced nodes (fixed in SOAP 1.2)
CPC Jan 10, 2006 10
The Ugly (4):SOAP Document/literal Style
More problems with SOAP document/literal style:– Document/literal style requires support for the entire XML
Schema standard, but XML Schemas are intended forvalidation and not for data modeling
– “Impedance mismatch”: programming language typesystems are different than XML schema’s type system
– No default fall-back referencing mechanism such as id-href to implement graph edges, but can use staticdefinition of optional graph edges for each data type
CPC Jan 10, 2006 11
XML Web Services
Object
graph x
Send request
Object
graph
XML
XML Serialization XML Deserialization
Object
graph y
Receive request
Object
graph x’
Receive response
Object
graph
XML
XML Deserialization XML Serialization
Object
graph y’
Send response
Server:transform y
into y’
Client Server
CPC Jan 10, 2006 12
Web Service Tools:Lossy Serialization in XML
Object
graph x
Internal binary format X
Object
graph y
XML
XML Serialization
XML Deserialization
Serialization in XML can be lossy, because object graph x cannot be recoveredfrom its serialized XML form if we are not careful about the choice of mapping
CPC Jan 10, 2006 13
Basic Requirement:Object Graph Isomorphism
Object
graph x
Representation X
Object
graph y
Representation Y
Mapping M
Inverse mapping M -1
Object graphs x and y are isomorph if M is bijective and has an inverse M-1
Object-level coherence requires bijective mappings, so that distributedand serialized object graphs are always isomorph
CPC Jan 10, 2006 14
Object Coherence RequiresLossless XML Serialization
Object
graph x
Internal binary format X
Object
graph y
XML
Serialization method M
Deserialization method M -1
Serialization is lossless if an object graph x can be recovered from itsserialized XML form y = M(x) by deserializing x = M-1(y)
We consider structural equivalence only (location in memory is irrelevant)e.g. as in JavaRMI, IIOP, and DCOM
CPC Jan 10, 2006 15
Compiled XML Serialization
Points-toanalysis
Ptr hash table(graph edges)
XML engineOptimizedserializer
XMLstream
Compiler
Source codetype definitions
generates generates
XML schemadefinitions
CPC Jan 10, 2006 16
Compiled XML Deserialization
Pointerremapper
Id hash table(resolve id-ref)
XML tokenizerLL(1) parser
& deserializerXML
stream
Compiler
Source codetype definitions
generates
Look-asidebuffers (cache)
CPC Jan 10, 2006 17
Static Structure Analysis forSOAP RPC Encoding
<x> <x href=“456”/> … <x href=“456”/> …</x>…<mref id=“456”>…</mref>
<x> <y href=“123”/> … <y href=“123”/> …</x>…<mref id=“123”>…</mref>
<x> <y>…</y> <z>…</z> …</x>
ExampleXML
Instance
<complexType name=“X”> <sequence>…</sequence></complexType
<complexType name=“X”> <sequence>…</sequence></complexType><complexType name=“Y”> <sequence>…</sequence></complexType
<complexType name=“X”> <sequence> <element name=“y” type=“tns:Y”/> <element name=“z” type=“tns:Z”/> </sequence> …</complexType>
XMLSchema
Structure
X
Y Z
X
Y
X X
X
X
CPC Jan 10, 2006 18
Static Structure Analysis forDocument/Literal Encoding
<x ref=“456”>…</x>…<x ref=“456”>…</x>…<x id=“456”>…</x>
<x ref=“123”>…</x>…<x ref=“123”>…</x>…<y id=“123”>…</y>
<x> <y>…</y> <z>…</z> …</x>
ExampleXML
Instance
<complexType name=“X”> <sequence>…</sequence> <attribute name=“id” type=“ID”/> <attribute name=“ref” type=“REF”/></complexType
<complexType name=“X”> <sequence>…</sequence> <attribute name=“ref” type=“REF”/></complexType><complexType name=“Y”> <sequence>…</sequence> <attribute name=“id” type=“ID”/></complexType
<complexType name=“X”> <sequence> <element name=“y” type=“tns:Y”/> <element name=“z” type=“tns:Z”/> </sequence> …</complexType>
XMLSchema
Structure
X
Y Z
X
Y
X X
X
X
CPC Jan 10, 2006 19
Static Structure Analysis forCompiled XML Serialization
Construct a data model Compiler determines which
data type instances canrefer to other data typeinstances at run time
Generate type-specific points-to analysis algorithm Only needed for pointer
types and structs/classeswith pointer fields
Generate type-specificoptimized serialization code Serializer uses id-ref linking
based on pointer analysis
Points-toanalysis
Optimizedserializer
Compiler
Source codetype definitions
generates generates
Ptr hash table(graph edges)
CPC Jan 10, 2006 20
Constructing a Plausible DataModel from Source Code
typedef int SSN;
struct Node{ int val; int *ptr; float num; struct Node *next;};
val ptr num next
Node
SSN
Data model graph(arcs denote all possible
pointer references)
Source code typedefinitions
CPC Jan 10, 2006 21
Generating an XML SchemaDefinition for Serialized XML
val ptr num next
Node
SSN
<simpleType name=“SSN”> <restriction base=“int”/></simpleType>
<complexType name=“Node”> <sequence> <element name=“val” type=“int”/> <element name=“ptr” type=“int” minOccurs=“0”/> <element name=“num” type=“float”/> <element name=“next” type=“tns:Node” minOccurs=“0”/> </sequence></complexType>Data model graph
CPC Jan 10, 2006 22
Generating Code for RuntimePoints-To Analysis
serialize_pointerToint(int *p){ if (p != NULL) { // lookup and mark (p,TYPE_int) // as target in ptr hash table }}
serialize_Node(struct Node *p){ if (p != NULL) { // lookup and mark (p,TYPE_Node) // as target in ptr hash table mark_embedded(&p->val,TYPE_int); serialize_pointerToint(p->ptr); // skip p->num serialize_pointerToNode(p->next); }}
val ptr num next
Node
SSN
Data model graph
CPC Jan 10, 2006 23
Generating Serialization Codeput_pointerToint(int *p){ if (p != NULL) { // lookup (p,TYPE_int) // if embedded, then output “ref” // if single, then output value // if multi, then output value // with “id” and mark embedded }}
put_Node(struct Node *p){ if (p != NULL) { // lookup (p,TYPE_Node) // if embedded, then output “ref” // if single, then output value // if multi, then output value // with “id” and mark embedded put_int(&p->val); put_pointerToint(p->ptr); put_float(&p->num) put_pointerToNode(p->next); }}
val ptr num next
Node
SSN
CPC Jan 10, 2006 24
Serialization Example
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=B
single14intC4
single116NodeB3
embedded14intB2
multi216NodeA1
RefTypePtrCountPtrSizePtrTypePtrLocID
Ptr hash table after runtime points-to analysis by thegenerated algorithm constructed from the data model
CPC Jan 10, 2006 25
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=Bembedded, id=2
multi, id=1
single
single
multi, id=1
CPC Jan 10, 2006 26
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=B
<Node id=“_1”> </Node>
embedded, id=2
multi, id=1
single
single
multi, id=1
CPC Jan 10, 2006 27
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=B
<Node id=“_1”> <val>123</val> </Node>
embedded, id=2
multi, id=1
single
single
multi, id=1
CPC Jan 10, 2006 28
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=B
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> </Node>
embedded, id=2
multi, id=1
single
single
multi, id=1
CPC Jan 10, 2006 29
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=B
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> </Node>
embedded, id=2
multi, id=1
single
single
multi, id=1
CPC Jan 10, 2006 30
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=B
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next>
</next> </Node>
embedded, id=2
multi, id=1
single
single
multi, id=1
CPC Jan 10, 2006 31
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=B
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next> <val id=“_2”>456</val> </next>
</Node>
embedded, id=2
multi, id=1
single
single
multi, id=1
CPC Jan 10, 2006 32
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=B
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next> <val id=“_2”>456</val> <ptr>789</ptr> </next>
</Node>
embedded, id=2
multi, id=1
single
single
multi, id=1
CPC Jan 10, 2006 33
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=B
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next> <val id=“_2”>456</val> <ptr>789</ptr> <num>2.3</num> </next>
</Node>
embedded, id=2
multi, id=1
single
single
multi, id=1
CPC Jan 10, 2006 34
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=B
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next> <val id=“_2”>456</val> <ptr>789</ptr> <num>2.3</num> <next ref=“#_1”/> </next>
</Node>
embedded, id=2
multi, id=1
single
single
multi, id=1
CPC Jan 10, 2006 35
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Nodeloc=A
SSNloc=C
val=456
ptr=C
num=2.3
next=A
Nodeloc=B
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next> <val id=“_2”>456</val> <ptr>789</ptr> <num>2.3</num> <next ref=“#_1”/> </next></Node>
embedded, id=2
multi, id=1
single
single
multi, id=1
CPC Jan 10, 2006 36
Compiled XML Deserialization
For each data type, generateoptimized deserialization code: Generate a recursive descent
LL(1) XML parser that isoptimized for the data type
Embed deserializationoperations as semantic actionsin the parser
Invoke id hash table lookups assemantic actions to resolve id-ref references on the fly
The pointer remapper ensures theconsistency of pointers in theobject graph when nodes aremoved in the construction process
Pointerremapper
Id hash table(resolve id-ref)
LL(1) parser & deserializer
Compiler
Source codetype definitions
generates
CPC Jan 10, 2006 37
Deserialization Example<Node id=“_1”> </Node>
val=?
ptr=?
num=?
next=?
Nodeid=1
Recursive descent parsingwith object deserializationoperations as semantic actions
CPC Jan 10, 2006 38
Example (cont’d)
val=123
ptr=?
num=?
next=?
Node
<Node id=“_1”> <val>123</val> </Node>
id=1
Recursive descent parsingwith object deserializationoperations as semantic actions
CPC Jan 10, 2006 39
Example (cont’d)
val=123
ptr=?
num=?
next=?
Node
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> </Node>
id=1
ref=2(unresolved)
Recursive descent parsingwith object deserializationoperations as semantic actions
CPC Jan 10, 2006 40
Example (cont’d)
val=123
ptr=?
num=1.4
next=?
Node
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num>
</Node>
id=1
ref=2(unresolved)
Recursive descent parsingwith object deserializationoperations as semantic actions
CPC Jan 10, 2006 41
Example (cont’d)
val=123
ptr=?
num=1.4
next=B
Node
val=?
ptr=?
num=?
next=?
Node
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next> </next></Node>
id=1
ref=2(unresolved)
Recursive descent parsingwith object deserializationoperations as semantic actions
CPC Jan 10, 2006 42
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
Node
val=456
ptr=?
num=?
next=?
Node
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next> <val id=“_2”>456</val> </next></Node>
id=1
id=2
Recursive descent parsingwith object deserializationoperations as semantic actions
CPC Jan 10, 2006 43
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Node
SSN
val=456
ptr=C
num=?
next=?
Node
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next> <val id=“_2”>456</val> <ptr>789</ptr>
</next></Node>
id=1
id=2
Recursive descent parsingwith object deserializationoperations as semantic actions
CPC Jan 10, 2006 44
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Node
SSN
val=456
ptr=C
num=2.3
next=?
Node
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next> <val id=“_2”>456</val> <ptr>789</ptr> <num>2.3</num>
</next></Node>
id=1
id=2
Recursive descent parsingwith object deserializationoperations as semantic actions
CPC Jan 10, 2006 45
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Node
SSN
val=456
ptr=C
num=2.3
next=A
Node
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next> <val id=“_2”>456</val> <ptr>789</ptr> <num>2.3</num> <next ref=“#_1”/> </next></Node>
id=1
Recursive descent parsingwith object deserializationoperations as semantic actions
CPC Jan 10, 2006 46
Example (cont’d)
val=123
ptr=B
num=1.4
next=B
=789
Node
SSN
val=456
ptr=C
num=2.3
next=A
Node
<Node id=“_1”> <val>123</val> <ptr ref=“#_2”/> <num>1.4</num> <next> <val id=“_2”>456</val> <ptr>789</ptr> <num>2.3</num> <next ref=“#_1”/> </next></Node>
id=1
Recursive descent parsingwith object deserializationoperations as semantic actions
CPC Jan 10, 2006 47
Implementation: the gSOAP Toolkit
Web service applicationsare build in two stages:1. Generate service
definitions in familiarC/C++ header file format
2. Generate client/serverstubs and skeletonsource code andserialization code
Note: the soapcpp2compiler can also beused to generate WSDLfrom header files
soapClient.cppsoapServer.cppsoapC.cpp
wsdl2h tool
soapcpp2compiler
Service defs:service.wsdl
Header file defs:service.h
CPC Jan 10, 2006 48
1. Analyze WSDL/Schema andGenerate C/C++ Header File
Header file defs:service.h
wsdl2h tool
Documented methods and class diagrams
Service defs:service.wsdl
CPC Jan 10, 2006 49
2. Analyze Header File andGenerate Stubs and Serializers
stats
Network fabric
gSOAP engineHTTP/S + TCP/IP + UDP
schema-optserializer
schema-optdeserializer
LL(1) XML parsersand (de)serializers
client/server API
gSOAP application
logging
WSSE
Plugins
Native C/C++ data
XML data
soapcpp2compiler
soapClient.cppsoapServer.cppsoapC.cpp
Header file defs:service.h
CPC Jan 10, 2006 50
Performance Results3241 callsper second
gSOAP serialization+deserialization perf. with 1.2K structures
CPC Jan 10, 2006 51
Performance Comparison toOther XML Parsers
Experimental parsers
CPC Jan 10, 2006 52
Conclusions
Is XML serialization a viable alternative to binaryserialization protocols?+ It has the advantages of cross-platform interoperability
and flexibility to manipulate XML with tools such as XSLT,filters, data mining, persistent storage, etc.
– Disadvantage is that naïve serialization can be lossy,implying object-level coherence issues
– Disadvantage is the low performance of XML parsers
+ However, compiled serialization speedup is significant