Transcript of PowerPoint slides
© 2006 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
RDF and SOA
David Booth, Ph.D. <[email protected]>
HP Software
2007-02-27
Latest version: http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-slides.ppt
These slides are based on the paper at http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm
2 April 13, 2023
Some issues mentioned yesterday
• Ken: "Someone walks in with a 40 page schema, and someone else walks in with a 60 page schema"
• Namespace/vocabulary/schema management problem
• Chris: "that concept is almost what I need, but not quite, so I need to define my own"
• Paul: "Super billing system, with everyone's features"
• Jonathan: "Top down policy enforcement is bucking the tide"
• Skip: "too difficult to reuse [schemas]"
• Nick: "I see customers spending 80% of their time on getting their documents right, not on the interfaces or how to exchange them"
• Ben: "WS interop doesn't solve my problem. I need my services to interoperate"
Outline
• The problem with XML in SOA
−Babelization
−Versioning
−What's wrong with "THE model"
−Lack of consistent semantics across models
• The case for RDF in SOA
−Data integration
−Consistent semantics
−Versioning
−Validation
• Bridging between XML and RDF
−XML as an RDF serialization
Problems
Problem 1: Babelization
• Success of Web services causes new challenges
• Each Web service defines its own interface:
−XML messages in/out
• WSDL doc describes a “language” for interacting with that service
• As services multiply, these languages multiply
−What do the messages mean?
• Result: “Babelization” (http://www.w3.org/2002/Talks/1218-semweb-dbooth/slide43-0.html)
Babelization impedes integration
• Each service speaks its own language
−Cannot easily connect them in new ways
• Each message type has its own data model
−Cannot easily integrate data
Problem 2: Versioning
• Need to be able to independently version both clients and services
• XML is brittle
• Versioning is a constant problem
History of data model design in XML
• Version 1: This is the model.
• Version 2: Oops! No, *this* is the model.
• Version 3: This is the model today, but here's an extensibility point for tomorrow.
. . . Then, after integrating with another model:
• Version 4: This is the *super* model (with extensibility for tomorrow, of course).
. . . Eventually, after integrating with more models:
• Version 5: This is the super-duper-*ultra* model (with extensibility, of course)
• etc.
Schema Hell
• Model gets very complex
• Each app/component only uses one part of it
• Lots of optionality: hard to know what is really used
−Example: RosettaNet Purchase Order
−Over 500 elements, 75% optional
−Trading partners take 2-3 months to negotiate through optionality
Moral: There is no such thing as THE model.
• There are many models. There always will be.
• Standard model is always beneficial when practical
−But only possible in the micro -- not the macro
• Why?
−Different apps need different models
−Hard to get agreement as org/committee grows
−App needs change over time
−Org changes over time
Problem 3: No consistent semantics
• 1000s of applications
• 1000s of schemas
• How do they relate to each other?
−<foo:CustAddress> == <bar:ShippingAddr>
The Case for RDF in SOA
RDF
• Resource Description Framework: a relational data model framework
• W3C standard for more than 6 years
• Language for making statements about things
• Used to express both:
−Ontologies (with OWL), and
−Instance data
Key features of RDF
• Syntax independent (specifies model)
−Some existing serializations: RDF/XML, N3, Turtle
• Consistent semantics
−Based on URIs
• Great for data integration problems
−Data "mashups"
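As a rough illustration of the first two features, here is a minimal sketch in plain Python with no RDF library. The example.org URIs and the helper `to_ntriples` are hypothetical names chosen for this example. A statement is a (subject, predicate, object) triple whose terms are URIs or literals, and the same set of triples could be written out in any agreed serialization; the sketch emits lines in the style of N-Triples and does not handle escaping.

```python
from typing import NamedTuple

class Literal(NamedTuple):
    """Wrapper so literal values can be told apart from URI strings."""
    value: str

# Hypothetical URIs, used only for illustration.
EX = "http://example.org/"
graph = {
    (EX + "cust/42", EX + "terms#name", Literal("Alice")),
    (EX + "cust/42", EX + "terms#livesIn", EX + "city/NewYork"),
}

def to_ntriples(triples):
    """Serialize triples one per line, N-Triples style (no escaping handled)."""
    def fmt(term):
        return f'"{term.value}"' if isinstance(term, Literal) else f"<{term}>"
    lines = [f"{fmt(s)} {fmt(p)} {fmt(o)} ." for s, p, o in triples]
    return "\n".join(sorted(lines))

print(to_ntriples(graph))
```

The model is the set of triples; the text produced by `to_ntriples` is only one possible way of writing it down.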
Why RDF excels at data integration
• New data models can be easily added
• Old and new data models co-exist in merged model
• Relationships between the old and new models are expressed explicitly
• Both old and new can be used simultaneously
Example: Blue App has model
Red App has model
• Need to integrate Red & Blue models into new Green model. How?
Model integration in XML
• Option 1: Red and Blue models become subtrees of Green model
−Very little gained
−Relationships are implicit in the processing code
• Option 2: Design a new model
• How is it different in RDF?
Step 1: Merge RDF
• Same nodes (URIs) join automatically
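The slide's diagram is not part of this transcript, so the following is only a sketch of the idea under assumed data. Each model is a plain Python set of (subject, predicate, object) tuples, and the `red#`, `blue#` and customer URIs are hypothetical. Merging is then a set union, and statements that share a subject URI end up attached to the same node.

```python
# Hypothetical URIs; each model is just a set of (subject, predicate, object).
CUST = "http://example.org/cust/42"          # node known to both apps
RED = "http://example.org/red#"
BLUE = "http://example.org/blue#"

red_data = {(CUST, RED + "custName", "Alice")}
blue_data = {(CUST, BLUE + "shipCity", "New York")}

# Merging is set union: neither model has to be restructured.
merged = red_data | blue_data

def describe(graph, node):
    """Return every (predicate, object) pair stated about a node."""
    return {(p, o) for s, p, o in graph if s == node}

# Both apps used the same URI, so the facts now hang off a single node.
print(describe(merged, CUST))
```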
Step 2: Add relationships between Red & Blue models
• (Relationships are also RDF)
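A minimal sketch of what "relationships are also RDF" can mean, again with plain Python tuples. `owl:equivalentProperty` is a real OWL term, but the two properties it links here, and the helper `apply_equivalences`, are assumptions made for the example. The helper does a single pass over direct equivalences only; it is not a full reasoner.

```python
RED = "http://example.org/red#"      # hypothetical namespace
BLUE = "http://example.org/blue#"    # hypothetical namespace
# Real OWL term; linking these two particular properties is an assumption.
EQUIV = "http://www.w3.org/2002/07/owl#equivalentProperty"

merged = {
    ("http://example.org/cust/42", RED + "custName", "Alice"),
    ("http://example.org/cust/7", BLUE + "customerName", "Bob"),
}

# The relationship is one more triple in the same graph.
merged.add((RED + "custName", EQUIV, BLUE + "customerName"))

def apply_equivalences(graph):
    """Copy each statement across every declared equivalent property (one pass)."""
    pairs = {(s, o) for s, p, o in graph if p == EQUIV}
    pairs |= {(b, a) for a, b in pairs}          # equivalence is symmetric
    inferred = {(s, b, o) for s, p, o in graph for a, b in pairs if p == a}
    return graph | inferred
```

After `apply_equivalences(merged)`, a Blue-style query for `customerName` also finds Alice, and a Red-style query for `custName` also finds Bob, without either app's data being rewritten.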
Step 3: Define Green model
• (Making use of Red & Blue models)
What the Blue app sees
• No difference!
What the Red app sees
• No difference!
What the Green app sees
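The three "what each app sees" slides rely on diagrams that are not in this transcript. As a simplified sketch of the point, assume each app looks only at predicates from the namespaces it knows; the namespaces and the `view` helper are hypothetical. After the merge, the Blue and Red apps get exactly what they had before, while a Green app that builds on both sees the combined data.

```python
RED = "http://example.org/red#"      # hypothetical namespaces
BLUE = "http://example.org/blue#"
CUST = "http://example.org/cust/42"

blue_data = {(CUST, BLUE + "shipCity", "New York")}
red_data = {(CUST, RED + "custName", "Alice")}
merged = blue_data | red_data

def view(graph, namespaces):
    """Keep only statements whose predicate belongs to one of the namespaces."""
    return {t for t in graph if t[1].startswith(tuple(namespaces))}

blue_view = view(merged, [BLUE])          # what the Blue app queries for
red_view = view(merged, [RED])            # what the Red app queries for
green_view = view(merged, [RED, BLUE])    # Green builds on both models
```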
Consistent semantics
• RDF facilitates consistent semantics
−Terms have the same meaning across apps
−Based on URIs
• Semantics of different terms can be related declaratively
• ESB can convert formats, but gives no assurance of consistent semantics
• Example: Security entitlements across applications
RDF and versioning of message models
Model change | Impact in XML (Closed World Assumption) | Impact in RDF (Open World Assumption)
Extra info | Must be anticipated by schema, else error | Ignored
Info expressed differently | Rewrite your code | Use inference rules to recover old info when needed
Info gone | Out of luck | Out of luck
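The first two rows of the table can be sketched in plain Python. Everything here is assumed for illustration: the `v1#` and `v2#` vocabularies, the order URI, and the rule `recover_full_name`. A version-2 message carries an extra field and splits the name into two parts; a version-1 consumer never looks at the extra field, and a small inference rule rebuilds the old `fullName` statement when it is needed.

```python
V1 = "http://example.org/v1#"   # hypothetical vocabulary versions
V2 = "http://example.org/v2#"
ORDER = "http://example.org/order/1"

# A version-2 message: it carries an extra field, and expresses the name as
# two parts instead of the single v1#fullName.
message_v2 = {
    (ORDER, V2 + "givenName", "Ada"),
    (ORDER, V2 + "familyName", "Lovelace"),
    (ORDER, V2 + "giftWrap", "yes"),          # extra info
}

def value(graph, subject, predicate):
    """Return one object for (subject, predicate), or None if not stated."""
    return next((o for s, p, o in graph if s == subject and p == predicate), None)

def recover_full_name(graph):
    """Inference rule: derive v1#fullName from the v2 name parts."""
    inferred = set()
    for s in {s for s, _, _ in graph}:
        given = value(graph, s, V2 + "givenName")
        family = value(graph, s, V2 + "familyName")
        if given is not None and family is not None:
            inferred.add((s, V1 + "fullName", f"{given} {family}"))
    return graph | inferred

# A version-1 consumer asks only for what it knows; giftWrap is never touched.
old_view = recover_full_name(message_v2)
print(value(old_view, ORDER, V1 + "fullName"))
```

The third row has no counterpart in code: if the information is no longer in the message, no rule can bring it back.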
Versioning of message models
• RDF makes message model versioning easier:
−Syntax independence
−Open World Assumption (OWA)
• RDF does not address process flow versioning
−REST addresses that
• RDF : message models :: REST : process flows
−RDF is a uniform data model framework (triples)
−REST is a uniform interface
−Both are about looser coupling
Document Validation
• XML is closed world (normally)
• RDF uses open world assumption (normally)
• Validation requires different approaches
• There are pros and cons
Document validation
Condition | XML | RDF
Extra data | Error (normally) | Ignored
Missing data | Error (normally) | Inferred
Example: Missing data in XML vs. RDF
• Suppose:
−Address requires a city name, but
−City name is missing
• Q: CityName == "New York"?
• In XML (closed world assumption):
−Error.
• In RDF (open world assumption):
−Sure, why not? (No evidence to the contrary)
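The contrast can be sketched with two query helpers over a set of plain Python tuples. The URIs and both function names are hypothetical. Under the closed world reading, a missing statement counts as false (an XML validator would go further and report an error); under the open world reading, a missing statement is simply unknown, shown here as `None`.

```python
ADDR = "http://example.org/addr/1"            # hypothetical URIs
CITY = "http://example.org/terms#cityName"
STREET = "http://example.org/terms#street"

address = {(ADDR, STREET, "5th Avenue")}      # no city name stated

def ask_closed(graph, s, p, o):
    """Closed world: anything not stated is treated as false."""
    return (s, p, o) in graph

def ask_open(graph, s, p, o):
    """Open world: True if stated, None (unknown) if nothing is stated.

    Simplification: a different stated value counts as False, which assumes
    the property is single-valued. RDF itself does not assume that.
    """
    stated = {obj for subj, pred, obj in graph if subj == s and pred == p}
    if o in stated:
        return True
    return False if stated else None
```

Asking whether the city is "New York" gives `False` from `ask_closed` and `None` from `ask_open`: nothing in the data contradicts it.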
Validation in RDF
• Techniques are available
• World can be temporarily closed
• Sample SPARQL query can check data
−If query succeeds, data is complete
• Tools can also help detect errors (e.g., Eyeball, by Chris Dollin, HP Labs Bristol)
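The slide proposes a SPARQL query as the completeness check. The sketch below is a plain-Python stand-in for such a query, not SPARQL itself, and it assumes a hypothetical `Address` class and `cityName` property (`rdf:type` is the real RDF term). It treats one document's triples as the whole world for the duration of the check and reports every address that has no city.

```python
TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"   # rdf:type (real term)
ADDRESS = "http://example.org/terms#Address"               # hypothetical class
CITY = "http://example.org/terms#cityName"                 # hypothetical property

def addresses_missing_city(graph):
    """Return every node typed as Address that has no cityName statement."""
    addresses = {s for s, p, o in graph if p == TYPE and o == ADDRESS}
    with_city = {s for s, p, o in graph if p == CITY}
    return addresses - with_city

def is_complete(graph):
    """Treat this one graph as the whole world for the duration of the check."""
    return not addresses_missing_city(graph)

doc = {
    ("http://example.org/addr/1", TYPE, ADDRESS),
    ("http://example.org/addr/1", CITY, "New York"),
    ("http://example.org/addr/2", TYPE, ADDRESS),   # city missing
}
```

Here `is_complete(doc)` is false because the second address has no city, which mirrors "if query succeeds, data is complete".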
Validating Messages
• WSDL use is lopsided:
−Written solely from the perspective of the service
−Specifies input and output schemas
• Service may not know how its clients use its data!
Kinds of Data Validation
• Model integrity
−Same for producer and consumer
• Suitability for a particular use
−Differs for producer and consumer
Producer Versus Consumer Validation
• Data producer should provide validator for data it creates (model integrity)
−E.g., sample SPARQL query
• Data consumer should provide validator for data it expects (suitability for use)
−E.g., sample SPARQL query
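A minimal sketch of the two validators, written as plain Python functions standing in for the sample SPARQL queries the slide mentions. The order terms and function names are hypothetical. The producer's check states what it promises about every order it emits; the consumer's check states what this particular consumer, a shipping step, needs before it can use the data.

```python
ORDER_ID = "http://example.org/terms#orderId"     # hypothetical terms
TOTAL = "http://example.org/terms#total"
SHIP_TO = "http://example.org/terms#shipTo"

def has_all(graph, subject, predicates):
    """True if the subject has at least one statement for every predicate."""
    present = {p for s, p, o in graph if s == subject}
    return set(predicates) <= present

def producer_validates(graph, order):
    """Model integrity: what the producing service promises about its output."""
    return has_all(graph, order, [ORDER_ID, TOTAL])

def shipping_consumer_validates(graph, order):
    """Suitability for use: what this particular consumer needs."""
    return has_all(graph, order, [ORDER_ID, SHIP_TO])

ORDER = "http://example.org/order/1"
message = {(ORDER, ORDER_ID, "A-17"), (ORDER, TOTAL, "99.00")}
```

The same message can pass the producer's check and still fail a consumer's, which is why the two validators are kept separate.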
RDF and efficiency
• RDF is not inherently inefficient
−If you need the work done, it must happen somewhere
−Jena, ARQ, etc., are pretty good
• There *is* a learning curve for RDF
−Must learn what is efficient and inefficient, just as in programming and RDBMS queries
• Can process directly in XML if RDF power is not needed
Fundamental Trends
• Need for easier service and data integration is increasing
−Flooded with data
−Continually integrating new models & services
• Need for consistent semantics is increasing
Bridging XML and RDF
But Web services already use XML!
• XML is well known and used
• Legacy apps may require specific XML or other formats that cannot be changed
• Standard RDF/XML serialization is verbose & ugly
• How can we gain the benefits of RDF while still accommodating XML?
Recall: RDF is syntax independent
• Specifies info model -- not syntax!
• Can be serialized in any agreed-upon way
• Can define specialized RDF serializations!
Defining new RDF serializations
• New XML (or other) formats can be defined as specialized serializations of RDF
−Mapping converts XML/other to RDF
−XSLT/other can define mapping
−Namespace or GRDDL can specify transformation
• Analogous to microformats or micromodels, but:
−not restricted to HTML/XHTML
−not necessarily using standards-based ontologies
• Can use app-specific ontologies
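The slide names XSLT and GRDDL as ways to define and locate the mapping. As a sketch of what such a mapping does, here is a plain-Python stand-in using the standard `xml.etree.ElementTree` module. The `<customer>` format, the URI scheme and the app-specific `terms#` vocabulary are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TERMS = "http://example.org/terms#"      # hypothetical app-specific ontology
BASE = "http://example.org/customer/"    # hypothetical URI scheme

def customer_xml_to_triples(xml_text):
    """Map <customer id="..."><name/><city/></customer> to a set of triples."""
    root = ET.fromstring(xml_text)
    subject = BASE + root.attrib["id"]
    triples = set()
    for child in root:
        # Each child element becomes one statement about the customer.
        triples.add((subject, TERMS + child.tag, (child.text or "").strip()))
    return triples

sample = '<customer id="42"><name>Alice</name><city>New York</city></customer>'
print(sorted(customer_xml_to_triples(sample)))
```

With a mapping like this published alongside the format, the custom XML can be read as one more serialization of RDF.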
Mapping existing formats to RDF
• Existing XML/other formats also can be mapped to RDF
−Treat as specialized RDF serializations
−XSLT/other can define mapping
• Allows service to treat XML/other input as RDF
−Both old & new formats
Normalizing to RDF
[Diagram labels: Client; XML/other, RDF; Service (Normalize to RDF, Core app processing, RDF Engine / Store)]
• Documents on the wire can be XML, other or RDF
• Input can be normalized to RDF
• App processing can use RDF engine/store
Supporting multiple client formats
[Diagram labels: Client; XML/other, RDF; Service (Normalize to RDF, Core app processing, RDF Engine / Store)]
• Different clients may require different formats: versions, etc.
−Can even change dynamically
• Can be normalized to RDF for common processing
• Core app is unaffected
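A minimal sketch of normalizing several client formats to one set of triples, in plain Python with `xml.etree.ElementTree`. The two XML versions, the dispatch on the root element name, and all URIs are assumptions for illustration. The core processing function only ever receives triples, so adding a third format would mean adding one more normalizer and nothing else.

```python
import xml.etree.ElementTree as ET

TERMS = "http://example.org/terms#"   # hypothetical vocabulary
BASE = "http://example.org/customer/" # hypothetical URI scheme

def from_v1(root):
    # v1 format: <cust id="42" name="Alice"/>
    return {(BASE + root.attrib["id"], TERMS + "name", root.attrib["name"])}

def from_v2(root):
    # v2 format: <customer><id>42</id><fullName>Alice</fullName></customer>
    return {(BASE + root.findtext("id"), TERMS + "name", root.findtext("fullName"))}

NORMALIZERS = {"cust": from_v1, "customer": from_v2}   # keyed by root element

def normalize(xml_text):
    """Pick a mapping from the document's root element and return triples."""
    root = ET.fromstring(xml_text)
    return NORMALIZERS[root.tag](root)

def core_app(triples):
    """Core processing sees only triples, whatever the wire format was."""
    return sorted(o for s, p, o in triples if p == TERMS + "name")
```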
Service Output
[Diagram labels: Client; Service (Normalize to RDF, Core app processing, RDF Engine / Store, Serialize as XML/other/RDF)]
• Serialize to whatever formats are required
−Generate XML/other directly (or even RDF!), or
−SPARQL query can generate specific view first
Granularity
• Mapping from XML to RDF can be done with any level of granularity
• Fine grained:
−Every element, attribute, etc., in XML maps to 1+ RDF assertions
−Permits more detailed inferences
−Adds more complexity & processing up front
• Coarse grained:
−Entire chunk of XML maps into RDF
• XML chunk is retained
• RDF metadata can annotate XML chunk
−Simpler, less processing up front
−Information inside chunk is less accessible
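The two ends of the granularity range can be sketched side by side in plain Python. The `<order>` document, the URIs and both function names are hypothetical. The fine-grained mapping produces one triple per child element, so the contents are visible to queries and rules; the coarse-grained mapping keeps the XML as a single opaque literal and only annotates it.

```python
import xml.etree.ElementTree as ET

TERMS = "http://example.org/terms#"   # hypothetical vocabulary
DOC = "http://example.org/doc/1"      # hypothetical URI for the message

xml_text = "<order><item>widget</item><qty>3</qty></order>"

def fine_grained(xml_text):
    """One triple per child element: contents are visible to queries and rules."""
    root = ET.fromstring(xml_text)
    return {(DOC, TERMS + child.tag, child.text) for child in root}

def coarse_grained(xml_text):
    """Keep the XML as one opaque literal and annotate it with RDF metadata."""
    return {
        (DOC, TERMS + "format", "order-xml"),
        (DOC, TERMS + "content", xml_text),
    }
```

With `coarse_grained`, finding the quantity means parsing the retained XML chunk again; with `fine_grained`, it is already a statement in the graph.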
Choosing appropriate granularity
Consider:
• What is significant from a distributed systems perspective?
• What may be interpreted differently by someone else if its semantics are not pinned down?
• When finer granularity is added later, differences may show up
Summary of Principles for RDF in SOA
1. Define interface contracts as though message content is RDF
a. Permit custom XML/other serializations as needed
b. Provide machine-processable mappings to RDF
c. Treat the RDF version as authoritative
2. Client and service should each provide both:
a. A model integrity validator for data it creates; and
b. A suitability for use validator for data it expects.
3. Choose RDF granularity that makes sense
Conclusions
• Value of RDF in data integration is well proven
• We have some evidence of its value in SOA, but need:
−More exploration of graceful adoption paths
−More work on transformation techniques
−More work on validating RDF models in SOA context
−Best practices for the above