PowerPoint slides

49
© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice RDF and SOA David Booth, Ph.D. <[email protected]> HP Software 2007-02-27 Latest version: http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-slides.ppt These slides are based on the paper at http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm

Transcript of PowerPoint slides

Page 1: PowerPoint slides

© 2006 Hewlett-Packard Development Company, L.P.The information contained herein is subject to change without notice

RDF and SOA

David Booth, Ph.D. <[email protected]>HP Software2007-02-27

Latest version: http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-slides.pptThese slides are based on the paper athttp://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm

Page 2: PowerPoint slides

2 April 13, 2023

Some issues mentioned yesterday• Ken: "Someone walks in with a 40 page schema, and

someone else walks in with a 60 page schema"• Namespace/vocabulary/schema management problem• Chris: "that concept is almost what I need, but not

quite, so I need to define my own"• Paul: "Super billing system, with everyone's features"• Jonathan: "Top down policy enforcement is bucking

the tide"• Skip: "too difficult to reuse [schemas]"• Nick: "I see customers spending 80% of their time on

getting their documents right, not on the interfaces or how to exchange them"

• Ben: "WS interop doesn't solve my problem. I need my services to interoperate"

Page 3: PowerPoint slides

3 April 13, 2023

Outline• The problem with XML in SOA

−Babelization−Versioning−What's wrong with "THE model"−Lack of consistent semantics across models

• The case for RDF in SOA−Data integration−Consistent semantics−Versioning−Validation

• Bridging between XML and RDF−XML as an RDF serialization

Page 4: PowerPoint slides

4 April 13, 2023

Problems

Page 5: PowerPoint slides

5 April 13, 2023

Problem 1: Babelization• Success of Web services causes new

challenges• Each Web service defines its own interface:

−XML messages in/out• WSDL doc describes a “language” for

interacting with that service• As services multiply, these languages

multiply−What do the messages mean?

• Result: “Babelization” (http://www.w3.org/2002/Talks/1218-semweb-dbooth/slide43-0.html )

Page 6: PowerPoint slides

6 April 13, 2023

Babelization impedes integration• Each service speaks its own language

−Cannot easily connect them in new ways

• Each message type has its own data model−Cannot easily integrate data

Page 7: PowerPoint slides

7 April 13, 2023

Problem 2: Versioning• Need to be able to independently version

both clients and services• XML is brittle• Versioning is a constant problem

Page 8: PowerPoint slides

8 April 13, 2023

History of data model design in XML• Version 1: This is the model.• Version 2: Oops! No, *this* is the model.• Version 3: This is the model today, but here's an

extensibility point for tomorrow. . . . Then, after integrating with another model:

• Version 4: This is the *super* model (with extensibility for tomorrow, of course). . . . Eventually, after integrating with more models:

• Version 5: This is the super-duper-*ultra* model (with extensibility, of course)

• etc.

Page 9: PowerPoint slides

9 April 13, 2023

Schema Hell• Model gets very complex• Each app/component only uses one part of

it• Lots of optionality: hard to know what is

really used−Example: RosettaNet Purchase Order

−Over 500 elements, 75% optional

−Trading partners take 2-3months to negotiate through optionality

Page 10: PowerPoint slides

10 April 13, 2023

Moral: There is no such thing as THE model. • There are many models. There always will

be.• Standard model is always beneficial when

practical−But only possible in the micro -- not the macro

• Why?−Different apps need different models

−Hard to get agreement as org/committee grows

−App needs change over time

−Org changes over time

Page 11: PowerPoint slides

11 April 13, 2023

Problem 3: No consistent semantics• 1000s of applications• 1000s of schemas• How do they relate to each other?

−<foo:CustAddress> == <bar:ShippingAddr>

Page 12: PowerPoint slides

12 April 13, 2023

The Case for RDF in SOA

Page 13: PowerPoint slides

13 April 13, 2023

RDF• Relational data model framework• W3C standard >6 years• Language for making statements about

things• Used to express both:

−Ontologies (with OWL), and

−Instance data

Page 14: PowerPoint slides

14 April 13, 2023

Key features of RDF• Syntax independent (specifies model)

−Some existing serializations: RDF/XML, N3, Turtle

• Consistent semantics−Based on URIs

• Great for data integration problems−Data "mashups"

Page 15: PowerPoint slides

15 April 13, 2023

Why RDF excels at data integration• New data models can be easily added• Old and new data models co-exist in

merged model• Relationships between the old and new

models are expressed explicitly• Both old and new can be used

simultaneously

Page 16: PowerPoint slides

16 April 13, 2023

Example: Blue App has model

Page 17: PowerPoint slides

17 April 13, 2023

Red App has model

• Need to integrate Red & Blue models into new Green model. How?

Page 18: PowerPoint slides

18 April 13, 2023

Model integration in XML• Option 1: Red and Blue models become

subtrees of Green model−Very little gained

−Relationships are implicit in the processing code

• Option 2: Design a new model

• How is it different in RDF?

Page 19: PowerPoint slides

19 April 13, 2023

Step 1: Merge RDF• Same nodes (URIs) join automatically

Page 20: PowerPoint slides

20 April 13, 2023

Step 2: Add relationships between Red & Blue models• (Relationships are also RDF)

Page 21: PowerPoint slides

21 April 13, 2023

Step 3: Define Green model• (Making use of Red

& Blue models)

Page 22: PowerPoint slides

22 April 13, 2023

What the Blue app sees• No difference!

Page 23: PowerPoint slides

23 April 13, 2023

What the Red app sees• No difference!

Page 24: PowerPoint slides

24 April 13, 2023

What the Green app sees

Page 25: PowerPoint slides

25 April 13, 2023

Consistent semantics• RDF facilitates consistent semantics

−Terms have the same meaning across apps

−Based on URIs

• Semantics of different terms can be related declaratively

• ESB can convert formats, but give no assurance of consistent semantics

• Example: Security entitlements across applications

Page 26: PowerPoint slides

26 April 13, 2023

RDF and versioning of message models

Model Change

Impact in XML(Closed World Assumption)

Impact in RDF(Open World Assumption)

Extra info Must be anticipated by schema, else error

Ignored

Info expressed differently

Rewrite your code

Use inference rules to recover old info when needed

Info gone Out of luck Out of luck

Page 27: PowerPoint slides

27 April 13, 2023

Versioning of message models• RDF makes message model versioning easier:

−Syntax independence

−Open World Assumption (OWA)

• RDF does not address process flow versioning−REST addresses that

• RDF : message models :: REST : process flows−RDF is uniform data model framework (triples)

−REST is uniform interface

−Both are about looser coupling

Page 28: PowerPoint slides

28 April 13, 2023

Document Validation• XML is closed world (normally)• RDF uses open world assumption (normally)• Validation requires different approaches• There are pros and cons

Page 29: PowerPoint slides

29 April 13, 2023

Document validation

XML RDF

Extradata

Error(normally)

Ignored

Missingdata

Error(normally)

Inferred

Page 30: PowerPoint slides

30 April 13, 2023

Example: Missing data in XML vs. RDF• Suppose:

−Address requires a city name, but

−City name is missing

• Q: CityName == "New York"?

• In XML (closed world assumption): −Error.

• In RDF (open world assumption): −Sure, why not? (No evidence to the contrary)

Page 31: PowerPoint slides

31 April 13, 2023

Validation in RDF• Techniques are available• World can be temporarily closed• Sample SPARQL query can check data

−If query succeeds, data is complete

• Tools can also help detect errors (e.g., Eyeball, by Chris Dollin, HP Labs Bristol)

Page 32: PowerPoint slides

32 April 13, 2023

Validating Messages• WSDL use is lopsided:

−Written solely from the perspective of the service

−Specifies input and output schemas

• Service may not know how its clients use its data!

Page 33: PowerPoint slides

33 April 13, 2023

Kinds of Data Validation• Model integrity

−Same for producer and consumer

• Suitability for a particular use−Differs for producer and consumer

Page 34: PowerPoint slides

34 April 13, 2023

Producer Versus Consumer Validation• Data producer should provide validator for

data it creates (model integrity)−E.g., sample SPARQL query

• Data consumer should provide validator for data it expects (suitability for use)−E.g., sample SPARQL query

Page 35: PowerPoint slides

35 April 13, 2023

RDF and efficiency• RDF is not inherently inefficient

−If you need the work done, it must happen somewhere

−Jena, Arq, etc., are pretty good

• There *is* a learning curve for RDF−Must learn what is efficient and inefficient, just

as in programming and RDBMS queries

• Can process directly in XML if RDF power is not needed

Page 36: PowerPoint slides

36 April 13, 2023

Fundamental Trends• Need for easier service and data

integration is increasing−Flooded with data

−Continually integrating new models & services

• Need for consistent semantics is increasing

Page 37: PowerPoint slides

37 April 13, 2023

Bridging XML and RDF

Page 38: PowerPoint slides

38 April 13, 2023

But Web services already use XML!• XML is well known and used• Legacy apps may require specific XML or

other formats that cannot be changed• Standard RDF/XML serialization is verbose

& ugly

• How can we gain the benefits of RDF while still accommodating XML?

Page 39: PowerPoint slides

39 April 13, 2023

Recall: RDF is syntax independent• Specifies info model -- not syntax!• Can be serialized in any agreed-upon way• Can define specialized RDF serializations!

Page 40: PowerPoint slides

40 April 13, 2023

Defining new RDF serializations• New XML (or other) formats can be defined

as specialized serializations of RDF−Mapping converts XML/other to RDF

−XSLT/other can define mapping

−Namespace or GRDDL can specify transformation

• Analogous to microformats or micromodels, but:−not restricted to HTML/xhtml

−not necessarily using standards-based ontologies• Can use app-specific ontologies

Page 41: PowerPoint slides

41 April 13, 2023

Mapping existing formats to RDF• Existing XML/other formats also can be

mapped to RDF−Treat as specialized RDF serializations

−XSLT/other can define mapping

• Allows service to treat XML/other input as RDF−Both old & new formats

Page 42: PowerPoint slides

42 April 13, 2023

Normalizing to RDF

Normalizeto RDF

RDF Engine / Store

Service

• Documents on the wire can be XML, other or RDF

• Input can be normalized to RDF• App processing can use RDF engine/store

XML/other RDF

ClientCore appProcessing

Page 43: PowerPoint slides

43 April 13, 2023

Supporting multiple client formats

Normalizeto RDF

RDF Engine / Store

Core appProcessing

Service

• Different clients may require different formats: versions, etc.−Can even change dynamically

• Can be normalized to RDF for common processing• Core app is unaffected

XML/other RDF

Client

Page 44: PowerPoint slides

44 April 13, 2023

Service Output

Client

Normalizeto RDF

Serialize asXML/other/RDF

RDF Engine / Store

Service

• Serialize to whatever formats are required −Generate XML/other directly (or even RDF!), or

−SPARQL query can generate specific view virst

Core appProcessing

Page 45: PowerPoint slides

45 April 13, 2023

Granularity• Mapping from XML to RDF can be done with any

level of granularity• Fine grained:

−Every element, attribute, etc., in XML maps to 1+ RDF assertion

−Permits more detailed inferences

−Adds more complexity & processing up front

• Coarse grained: −Entire chunk of XML maps into RDF

• XML chunk is retained

• RDF metadata can annotate XML chunk

−Simpler, less processing up front

−Information inside chunk is less accessible

Page 46: PowerPoint slides

46 April 13, 2023

Choosing appropriate granularity

Consider:• What is significant from a distributed

systems perspective?• What may be interpreted differently by

someone else if its semantics are not pinned down?

• When finer granularity is added later, differences may show up

Page 47: PowerPoint slides

47 April 13, 2023

Summary of Principles for RDF in SOA1. Define interface contracts as though message

content is RDFa. Permit custom XML/other serializations as needed

b. Provide machine-processable mappings to RDF

c. Treat the RDF version as authoritative

2. Client and service should each provide both:a. A model integrity validator for data it creates; and

b. A suitability for use validator for data it expects.

3. Choose RDF granularity that makes sense

Page 48: PowerPoint slides

48 April 13, 2023

Conclusions• Value of RDF in data integration is well

proven• We have some evidence of its value in SOA,

but need:−More exploration of graceful adoption paths

−More work on transformation techniques

−More work on validating RDF models in SOA context

−Best practices for the above

Page 49: PowerPoint slides

49 April 13, 2023