Sean McGrath http://www.propylon.com 1
Performing impossible feats of XML processing with pipelining
XML Open 2004
Sean McGrath
Propylon
http://www.propylon.com
http://seanmcgrath.blogspot.com
Contents
• The pipelining philosophy
• Major functional elements of pipelines
• Some examples
• Pipelining and Grids
• Pipelining and Web Services/SOAs
• Some anticipated objections (and answers)
• Some musings
• Some technology pointers
What is XML pipelining?
• It is an architectural framework for developing robust, scalable, manageable XML processing systems.
• It is based on proven mechanical manufacturing patterns. Specifically:
  – Assembly lines (divide and conquer)
  – Component assembly and component re-use
What is XML pipelining and why is it useful?
• A way of thinking about systems that focuses on XML dataflows rather than object APIs. (This is a critical and non-trivial shift of focus for many programmers!)
• Why? Because pipelining provides a mechanical, inspiration-free, genius-free way of handling the mind-boggling complexity of large XML transformation projects.
Pipelining Philosophy
XML is all about complex hierarchical data structures…
Pipelining Philosophy
Henry Ford’s Model T Ford Assembly Line – 1914
Cars are complex, hierarchical structures
Pipelining Philosophy
Lunch Assembly Line. NY, 2004
Lunch is a complex, hierarchical structure
Pipelining Philosophy
We are complex, hierarchical structures
Pipelining philosophy
• What have these scenes got in common?
  – Complex construction of cars, tuna melts and tendons made possible and efficient through
    • the assembly line manufacturing pattern of divide and conquer
    • re-usable component processes and component materials
• Why not apply this approach to XML “manufacturing”?
Pipeline philosophy
• Why does the assembly line approach work?
  – Transformation task decomposition
  – Re-usable transformation components
• Transformation decomposition is the key to complexity management. Just ask:
  – Henry Ford
  – Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”)
  – George Miller (7 ± 2)
  – Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776)
  – Any electrical or chemical engineer.
Pipeline philosophy
• Component re-use is the key to productivity
  – Ask any form of engineer (electrical, chemical etc.) apart from software engineers…
  – Component re-use remains a holy grail in software engineering
  – Pipelining is yet another attempt, based on data transformation and data flow rather than algorithms
Pipeline philosophy
• A lot of data processing for the foreseeable future will consist of XML to XML transformation
• A lot of non-XML data processing can consist of XML to XML transformations with the addition of top and tail transformations to non-XML formats
• An XML pipeliner’s mantra:
  1. Get data into XML as quickly as possible
  2. Keep it in XML until the last possible minute
  3. Bring all your XML tools to bear on solving the data processing problem
Pipeline philosophy
[Diagram: Non-XML Input → Top Transformation → Input XML → … → Output XML → Tail Transformation → Non-XML Output]
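A minimal Python sketch of the top and tail idea, assuming CSV as the non-XML input and a plain text report as the non-XML output. The function names and the sample data are illustrative only and do not come from the talk; the CSV headers are assumed to be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def top_csv_to_xml(text):
    """Top transformation: non-XML (CSV) in, XML out."""
    root = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(text)):
        row = ET.SubElement(root, "row")
        for name, value in record.items():
            # Assumes each CSV header is usable as an element name.
            ET.SubElement(row, name).text = value
    return root


def upper_case_names(root):
    """Middle stage: XML in, XML out."""
    for name in root.iter("name"):
        name.text = (name.text or "").upper()
    return root


def tail_xml_to_text(root):
    """Tail transformation: XML in, non-XML (plain text) out."""
    return "\n".join(
        f"{row.findtext('name')}: {row.findtext('qty')}"
        for row in root.iter("row")
    )


csv_in = "name,qty\nwidget,3\ngadget,5\n"
report = tail_xml_to_text(upper_case_names(top_csv_to_xml(csv_in)))
print(report)
```

Everything between the top and the tail sees only XML, so any XML tool could replace the middle stage.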
Pipeline philosophy
• The philosophy hinges on the fact that every complex XML transformation can be broken down into a series of smaller ones that can be chained together
[Diagram: an XPipe. Input XML → Task 1 → Task 2 → … → Task n-1 → Task n → Output XML]
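The chaining can be sketched in a few lines of Python, assuming each task is a function that takes XML text and returns XML text. `run_pipeline` and the two sample stages are hypothetical names invented for this illustration.

```python
import xml.etree.ElementTree as ET
from functools import reduce


def run_pipeline(stages, xml_text):
    """Feed the output of each stage into the next one."""
    return reduce(lambda doc, stage: stage(doc), stages, xml_text)


def normalize_text(xml_text):
    """Stage 1: strip surrounding whitespace from element text."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.text:
            el.text = el.text.strip()
    return ET.tostring(root, encoding="unicode")


def add_count(xml_text):
    """Stage 2: record the number of children on the root element."""
    root = ET.fromstring(xml_text)
    root.set("count", str(len(root)))
    return ET.tostring(root, encoding="unicode")


out = run_pipeline(
    [normalize_text, add_count],
    "<doc><item> a </item><item>b</item></doc>",
)
print(out)  # <doc count="2"><item>a</item><item>b</item></doc>
```

Each stage parses and serialises on its own, which costs time but keeps the stages independent black boxes.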
Pipeline philosophy
• Only so many ways to re-arrange an XML tree structure
• A finite number of fundamental transformations, from which all transformations can be derived
Pipeline philosophy
1. Starting point: data at time T1 conforming to “spec” A. End point: data at time T2 conforming to “spec” B.
2. Transformation Analysis/Decomposition – decompose the problem of getting from A to B into independent XML in, XML out stages
3. Decide what transformation components you already have.
4. Implement the ones you don’t – make them re-usable for the next transformation project.
Pipeline philosophy
– Transformation analysis & decomposition leads to
  • a series of small, manageable, “stand alone” problems with an XML input “spec” and an XML output “spec”. “Spec” = schemas + structure rules + narrative.
  • Can build, test, use and then re-use these transformation components
  • Very team development friendly – parallel development of loosely coupled components
  • Very debugging friendly – log2(n) “chops” to find any given problem.
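The log2(n) claim can be illustrated with a binary search over the intermediate outputs of a pipeline. This is a sketch under two assumptions: the final output is known to be bad, and once a fault appears it stays visible in every later output. `first_bad_stage`, `looks_ok` and the sample stages are hypothetical.

```python
import xml.etree.ElementTree as ET


def first_bad_stage(stages, doc, looks_ok):
    """Return (index of the first stage with a bad output, chops used)."""
    outputs = []
    for stage in stages:
        doc = stage(doc)
        outputs.append(doc)  # keep every intermediate document
    lo, hi = 0, len(stages) - 1  # invariant: the fault lies in stages[lo..hi]
    chops = 0
    while lo < hi:
        mid = (lo + hi) // 2
        chops += 1
        if looks_ok(outputs[mid]):
            lo = mid + 1  # still fine here, so the fault is later
        else:
            hi = mid  # already broken here, so the fault is here or earlier
    return lo, chops


def make_stage(name):
    """Build a stage that appends an empty <name/> element to the root."""
    def stage(xml_text):
        root = ET.fromstring(xml_text)
        ET.SubElement(root, name)
        return ET.tostring(root, encoding="unicode")
    return stage


stages = [make_stage("step") for _ in range(8)]
stages[5] = make_stage("oops")  # the faulty component


def looks_ok(xml_text):
    return ET.fromstring(xml_text).find("oops") is None


index, chops = first_bad_stage(stages, "<doc/>", looks_ok)
print(index, chops)  # 5 3
```

With 8 stages the fault is located after 3 inspections, instead of reading all 8 intermediate documents.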
Pipeline debugging
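[Diagram: the top/tail pipeline (Non-XML Input → Top Transformation → Input XML → … → Output XML → Tail Transformation → Non-XML Output), annotated with Schema A and Schema B, with the intermediate steps labelled Schema Delta 1 … Schema Delta N and XML Delta 1 … XML Delta N]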
Pipeline philosophy
• The answer to the SAX/DOM question is “mu”. (More on this later)
• No such thing as “the” correct abstraction for processing XML
• Pipeline approach means you can mix ‘n’ match black-box components that internally use whatever paradigm best suits the problem
  • Lexical
  • SAX, StAX, DOM, XOM
  • COmega, XSLT, XQuery
  • XDuce, Pyxie, Java, C#, Groovy, Ruby, Haskell, WebIt! etc.
Sample Pipeline
[Diagram: a pipeline fed from a DB/CMS. Stage labels: Character Set Mods; Add Doctype + validate + strip doctype; Re-arrange Elements; Stats + FTP; XHTML Generate; Validation; SQL Replace. Implementation labels on the stages: Lexical; Schematron/RelaxNG/Rhino; Jython; Java; XSLT; Lexical; DOM.]
Pipeline philosophy
• Many XML transformations end up monolithic
• Assertion: developers would use a more component based approach to XML processing if they did not have to write the plumbing (orchestration, exception handling) themselves
  – “Gee, this problem is complex. Maybe I’ll do it in multiple stages! Gee, now I have to orchestrate the stages somehow. Batch files/shell scripts/driver program – all ugly and error prone. Maybe I’ll just write a single program after all. Besides, it will run faster...”
Pipeline philosophy
• “Professional developers spend 50 percent of their time writing plumbing” – Adam Bosworth
• Pipelining promotes the creation of a reusable plumbing “layer” letting developers concentrate on the application in hand.
Philosophy Summary
• Think flow - data processing == data transformation w.r.t. time – Michael Jackson
• XML is the current runaway winner in the self-descriptive data stakes and a very good IDDL (Intermediate Data Description Language) for all types of data that are not natively XML based
Philosophy Summary
• Inside every complex XML transformation is a sequence of simpler XML transformations trying to get out – a pipeline
• Decomposed transformation:
  – new transformations +
  – already componentized transformations
  – -> Component Reuse Nirvana
Pipeline Philosophy
[Diagram: three levels, each drawn with In and Out connectors. Level 0 – transformation component. Level 1 – pipeline. Level 2 – rudimentary orchestration.]
Simple pipeline transformation component examples
• Fundamental Operation – Rename Element
  – Rename
    • Input : <foo>baz</foo>
    • Output: <bar>baz</bar>
[Diagram: tree foo → baz becomes tree bar → baz]
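A rename component might look like the following Python sketch. The factory function `rename` is a hypothetical name; it assumes the component receives and returns XML text.

```python
import xml.etree.ElementTree as ET


def rename(old, new):
    """Build a component that renames every <old> element to <new>."""
    def component(xml_text):
        root = ET.fromstring(xml_text)
        # iter() includes the root itself, so a root-level <old> is renamed too.
        for el in list(root.iter(old)):
            el.tag = new
        return ET.tostring(root, encoding="unicode")
    return component


foo_to_bar = rename("foo", "bar")
print(foo_to_bar("<foo>baz</foo>"))  # <bar>baz</bar>
```

Parameterising the component by element names is what makes it re-usable in the next transformation project.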
Simple pipeline transformation component examples
• Fundamental Operation – Peel
  • Input : <foo><bar>baz</bar></foo>
  • Output: <foo>baz</foo>
[Diagram: tree foo → bar → baz becomes tree foo → baz]
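A possible Python sketch of Peel: the named element is removed and its content is spliced into its parent. The helper names are hypothetical, and the root element itself is never peeled because it has no parent to receive its content.

```python
import xml.etree.ElementTree as ET


def _add_text(parent, pos, text):
    """Attach text immediately before child position pos of parent."""
    if not text:
        return
    if pos == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[pos - 1]
        prev.tail = (prev.tail or "") + text


def unwrap(parent, child):
    """Replace child by its own content (text, children, tail) in parent."""
    pos = list(parent).index(child)
    kids = list(child)
    parent.remove(child)
    _add_text(parent, pos, child.text)
    for offset, kid in enumerate(kids):
        parent.insert(pos + offset, kid)
    _add_text(parent, pos + len(kids), child.tail)


def peel(name):
    """Build a component that peels every non-root <name> element."""
    def component(xml_text):
        root = ET.fromstring(xml_text)
        while True:
            # Restart the search after each change to the tree.
            hit = next(
                ((p, c) for p in root.iter() for c in p if c.tag == name),
                None,
            )
            if hit is None:
                break
            unwrap(*hit)
        return ET.tostring(root, encoding="unicode")
    return component


print(peel("bar")("<foo><bar>baz</bar></foo>"))  # <foo>baz</foo>
```

Most of the code deals with ElementTree's text/tail model; the operation itself is the three lines in the middle of `unwrap`.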
Simple pipeline transformation component examples
• Compound Operation – Matryoshka
  • Input:
    – <foo><bar>baz</bar></foo>
  • Output:
    – <foo></foo><bar></bar>baz
[Diagram: nested tree foo → bar → baz becomes the flat sequence foo, bar, baz]
Simple pipeline transformation component examples
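• KlingonCloak
  – Input:
    • <foo><bar>baz</bar></foo>
  – Output:
    • <tag name="foo"><tag name="bar">baz</tag></tag>
[Diagram: tree foo → bar → baz becomes tree tag → tag → baz, with each original element name carried in an attribute. The diagram labels the attribute type; the output text above labels it name.]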
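A Python sketch of KlingonCloak and its inverse, following the `name` attribute used in the slide's output text. The function names are hypothetical, and the sketch assumes the input elements do not already carry a `name` attribute (it would be overwritten).

```python
import xml.etree.ElementTree as ET


def klingon_cloak(xml_text):
    """Hide every element name inside a generic <tag name="..."> element."""
    root = ET.fromstring(xml_text)
    for el in list(root.iter()):
        el.set("name", el.tag)
        el.tag = "tag"
    return ET.tostring(root, encoding="unicode")


def decloak(xml_text):
    """Inverse operation: restore element names from the name attribute."""
    root = ET.fromstring(xml_text)
    for el in list(root.iter("tag")):
        el.tag = el.attrib.pop("name")
    return ET.tostring(root, encoding="unicode")


doc = "<foo><bar>baz</bar></foo>"
cloaked = klingon_cloak(doc)
print(cloaked)           # <tag name="foo"><tag name="bar">baz</tag></tag>
print(decloak(cloaked))  # <foo><bar>baz</bar></foo>
```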
Simple pipeline transformation component examples
• Reading a file is an XML to XML transformation
  – Input: <file>lewisscarrol.xml</file>
  – Output: <poem><line>’Twas brillig, and the slithy toves did gyre and gimble in the wabe</line>…</poem>
Simple pipeline transformation component examples
• Arithmetic is an XML to XML transformation
  – Input: <expr>1 + 2</expr>
  – Output: <res>3</res>
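The arithmetic example can be sketched in Python as below. The element names `expr` and `res` come from the slide; the choice of supported operators (+, -, *, /) and the use of the `ast` module to avoid a bare `eval` are assumptions made for this illustration.

```python
import ast
import operator
import xml.etree.ElementTree as ET

# Only these binary operators are accepted in this sketch.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed expression made of numbers and the operators above."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported expression")


def arithmetic(xml_text):
    """<expr>…</expr> in, <res>…</res> out."""
    expr = ET.fromstring(xml_text)
    tree = ast.parse(expr.text.strip(), mode="eval")
    res = ET.Element("res")
    res.text = str(_evaluate(tree.body))
    return ET.tostring(res, encoding="unicode")


print(arithmetic("<expr>1 + 2</expr>"))  # <res>3</res>
```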
Simple pipeline transformation component examples
• Unix pipe utilities e.g. tr
  – Input: hello world
  – Output: HELLO WORLD
A little orchestration in a transformation component
• Conditionals are XML to XML transformation “tee junctions” triggered by XPaths
[Diagram: In → “if XPath” junction → “if XPath TRUE” branch / “if XPath FALSE” branch]
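A tee junction can be sketched as a component that picks one of two downstream components. The names `tee` and `tag_with` are hypothetical. Note that ElementTree supports only a subset of XPath, so the trigger expression here is limited to that subset.

```python
import xml.etree.ElementTree as ET


def tee(xpath, if_true, if_false):
    """Route the document to if_true when xpath matches, else to if_false."""
    def component(xml_text):
        root = ET.fromstring(xml_text)
        branch = if_true if root.find(xpath) is not None else if_false
        return branch(xml_text)
    return component


def tag_with(value):
    """Sample branch: stamp the root element with a route attribute."""
    def component(xml_text):
        root = ET.fromstring(xml_text)
        root.set("route", value)
        return ET.tostring(root, encoding="unicode")
    return component


route = tee(".//urgent", tag_with("fast"), tag_with("normal"))
print(route("<msg><urgent/></msg>"))
print(route("<msg><body/></msg>"))
```

The junction is itself XML in, XML out, so it can sit anywhere a plain component can.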
Validation as a transformation component
[Diagram: an XComponent takes XML A on its Input and emits XML A’ on its Output, with a Validation Log on its Error output. Implementation labels: RelaxNG, Schematron, Jython/Java/JACL.]
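A sketch of validation as a component with a separate error output. Plain Python predicates stand in for RelaxNG or Schematron rules here, and the document is passed through unchanged; both choices are assumptions made for illustration, as are all the names.

```python
import xml.etree.ElementTree as ET


def validator(rules, log):
    """Build a pass-through component that appends rule failures to log.

    rules is a list of (message, predicate) pairs; each predicate receives
    the parsed root element and returns True when the rule holds.
    """
    def component(xml_text):
        root = ET.fromstring(xml_text)
        for message, holds in rules:
            if not holds(root):
                log.append(message)
        return xml_text  # the main output carries the document onwards
    return component


rules = [
    ("order needs an id attribute", lambda r: r.get("id") is not None),
    ("order needs at least one item", lambda r: r.find("item") is not None),
]
validation_log = []
check = validator(rules, validation_log)

out = check('<order id="7"/>')
print(out)             # <order id="7"/>
print(validation_log)  # ['order needs at least one item']
```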
Sample Transformation Component Examples
• Once you start thinking in terms of pipes – components appear everywhere:
  – Regular fragmentations
  – Doctype changer
  – Namespace normalizer
  – Character set transcoder
  – Hash generator
  – Architectural form processing
  – RelaxNG/Schematron etc.
First objection
• “It will be dog slow” or (stronger form):
  – “Re-usable tree transforming components won’t work in my shop – my XML files are too big to schlep around in strings, never mind DOMs!”
Document fulcra and the scatter/gather pattern
• For any given transformation t to be performed on documents conforming to schema s, there is a fragment expression that can be used to chop each document into n pieces, on which t can be performed.
• I call these points fulcra; they are a function of (t, s)
Identifying Fulcra
• For data-oriented XML, the fulcra often coincide with the “record” iteration in the XML schema and may be independent of t.
• For document-oriented XML, the fulcra are much more dependent on t.
Document fulcra and scatter/gather pattern
• Having identified the fulcra:
  – Chop the input document into fragments – scatter phase
  – Perform t on each fragment
  – Join all the processed fragments together to constitute the output document – gather phase
• Three stage pipeline – scatter & gather either side of the core component
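The three stage pipeline can be sketched in Python as follows. The sketch assumes the simplest kind of fulcrum: every child of the root is a “record” matched by one element name, with no text between the records. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def scatter(xml_text, fulcrum):
    """Chop the document into one serialised fragment per fulcrum element."""
    root = ET.fromstring(xml_text)
    shell = (root.tag, dict(root.attrib))  # what is left once fragments go
    fragments = [
        ET.tostring(el, encoding="unicode") for el in root.findall(fulcrum)
    ]
    return shell, fragments


def gather(shell, fragments):
    """Rebuild the output document around the processed fragments."""
    tag, attrib = shell
    root = ET.Element(tag, attrib)
    root.extend([ET.fromstring(fragment) for fragment in fragments])
    return ET.tostring(root, encoding="unicode")


def scatter_gather(t, fulcrum):
    """Wrap transformation t so that it runs fragment by fragment."""
    def component(xml_text):
        shell, fragments = scatter(xml_text, fulcrum)
        return gather(shell, [t(fragment) for fragment in fragments])
    return component


def shout(fragment):
    """Sample t: upper-case the text of a single record."""
    rec = ET.fromstring(fragment)
    rec.text = rec.text.upper()
    return ET.tostring(rec, encoding="unicode")


doc = "<log><rec>a</rec><rec>b</rec><rec>c</rec></log>"
print(scatter_gather(shout, "rec")(doc))
# <log><rec>A</rec><rec>B</rec><rec>C</rec></log>
```

`t` only ever sees one small fragment, so it can use whatever tree-based tooling is most convenient.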
Document Fulcra
[Diagram, with a TIME axis: Input Doc → Scatter → n fragments → Invoke t (once per fragment) → n fragments → Gather → Output Doc]
Document Fulcra
• Note the data domain decomposition – SETI@Home meets XML markup.
• Trivially parallelizable
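Because the fragments are independent, the “invoke t” step maps directly onto a worker pool. The sketch below uses threads from the Python standard library only to show the shape; in CPython, CPU-bound transformations would need separate processes or machines to gain real speed. `parallel_map` and `double` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def parallel_map(t, fragments, workers=4):
    """Apply t to every fragment concurrently, keeping fragment order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(t, fragments))


def double(fragment):
    """Sample t: double the number held in one record."""
    rec = ET.fromstring(fragment)
    rec.text = str(int(rec.text) * 2)
    return ET.tostring(rec, encoding="unicode")


fragments = [f"<rec>{n}</rec>" for n in range(6)]
results = parallel_map(double, fragments)
print(results[3])  # <rec>6</rec>
```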
Document Fulcra
• A good fulcra based scatter/gather will make performance head north faster, cheaper and with a higher upper limit than any amount of hand-crafted, genius level XML coding of your transformations in horrid SAX or lexical parse mode.
  – Massive parallelism will kill all von Neumann throughput arguments
    • Documents per second, not seconds per document – throughput is the true measure of XML processing speed
    • Document fulcra – locality of reference (Denning) applies to XML processing (more on this later)
More objections (with more answers)
• It will be slow
  – No it won’t. Premature optimization is the root of all evil!
  – Speed is a three headed monster. I’m old enough to have left the X axis and am currently heading for Y through Z
[Diagram: The 3 Axes to Speed. Axes: Speed of Execution, Speed of Development, Speed of Modification. Points plotted: Me at age 26, Me at age 39, Me at age 49 (Projected).]
Some objections (with some answers)
• Component based software? Harumph! We have heard that one before…
  – Pipelines are data flow based, not API based (COM, VBX, CORBA)
  – Two pin interfaces and minimal “verbs”
  – The XML “payload” is what is important, not the API – RESTian
Revisiting the XSLT/DOM -> SAX non sequitur
• XSLT and DOM are memory bound – trade off between ease of use and resource usage – ease of use favoured
• SAX is not memory bound – trade off between ease of use and resource usage – low resource usage favoured
• On xml-dev, users are often advised to rewrite their apps using SAX! Ugh!
XSLT/DOM -> pipeline
• Pipelines and scatter/gather allow you to keep the ease of use of XSLT/DOM with the finite resource utilization of SAX
• As long as you can identify a good fulcrum function
  – They exist more often than not
  – If they exist, they are very easily found and “drop out” of document analysis – e.g. XPath expressions in XSLT stylesheet templates
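One way to get tree-style convenience with bounded memory, sketched with the Python standard library: stream the document and hand each completed fulcrum element to ordinary tree code, then discard it. It assumes the fulcrum is a single element name that does not nest inside itself; `stream_fragments` and `total_qty` are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET


def stream_fragments(source, fulcrum_tag):
    """Yield each fulcrum element once it is complete, then free its content."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == fulcrum_tag:
            yield elem
            elem.clear()  # drop the fragment's children and text


def total_qty(source):
    """Tree-style access (findtext) on one small fragment at a time."""
    total = 0
    for rec in stream_fragments(source, "rec"):
        total += int(rec.findtext("qty"))
    return total


data = io.BytesIO(
    b"<log><rec><qty>2</qty></rec><rec><qty>5</qty></rec></log>"
)
print(total_qty(data))  # 7
```

Only one record's subtree is populated at any moment, however long the log is.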
Pipelining and Grids
• Grid Technologies – computational power “on tap” (http://www.gridforum.org)
• A match made in heaven (bandwidth permitting)
An XML Processing Grid – on demand
[Diagram: In → XGrid Interface (XJCL) → XGrid Computational Power Sources → Out, with a DMZ boundary marked. The diagram shows two Out connectors.]
Grids - caveats
• For large data volumes it is simply not feasible to shunt the data over the wire – Jim Gray
• Organizations are sensitive about their data going beyond firewalls
• Pay-per-use “racks” in your back-office are a better bet.
  – Rent a grid the way you would rent a chainsaw.
A Service Oriented Architecture
[Diagram: a layered architecture with a Human Facing Layer, an Integration Layer and a Business + Persistence Layer. Other labels: Service (×2), Shared Service (×2), Information, e-Forms, Case Management, Transport Adapters, Message Router.]
“service” = XML transformation with optional side effects
Pipelines and Service Oriented Architecture
• Can usefully blur the distinction between a message queue and a transformation pipe
• Services have the same XML-in, XML-out interface
  – All components can be services
  – All pipes can be services
  – All SOAs can be services…
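The “everything has the same interface” point can be sketched as closure under composition: a pipe built from components is itself a component, so it can be dropped into a larger pipe. `pipe` and `mark` are hypothetical names, and a plain function call stands in for a service invocation.

```python
import xml.etree.ElementTree as ET


def pipe(*stages):
    """A pipe is itself an XML-in, XML-out component, so pipes nest."""
    def component(xml_text):
        for stage in stages:
            xml_text = stage(xml_text)
        return xml_text
    return component


def mark(name):
    """Sample component: set attribute name="1" on the root element."""
    def component(xml_text):
        root = ET.fromstring(xml_text)
        root.set(name, "1")
        return ET.tostring(root, encoding="unicode")
    return component


inner = pipe(mark("a"), mark("b"))  # a pipe of two components
outer = pipe(inner, mark("c"))      # a pipe that contains a pipe
print(outer("<doc/>"))
```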
Federated SOAs
[Diagram: three Portals connected by pipeline transformations]
Musings #1 - Debugging
• Pipelines are very debugging friendly
  – log2(N) time required for fault diagnosis
  – “Probes” in the form of loggers and RelaxNG validators are easily plugged in to a pipe (as transformation components) to watch what is going on.
  – Pre/Post condition on/off switch is a useful “design by contract” debugger
  – XML-aware browsers as “breakpoints”
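A sketch of a probe and of switchable pre/post conditions, both written as wrappers around ordinary XML-in, XML-out components. The names `contract` and `probe`, and the use of Python predicates as conditions, are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET


def contract(stage, pre, post, enabled=True):
    """Wrap stage with pre/post conditions; enabled is the on/off switch."""
    def component(xml_text):
        if enabled and not pre(ET.fromstring(xml_text)):
            raise ValueError(f"{stage.__name__}: pre-condition failed")
        out = stage(xml_text)
        if enabled and not post(ET.fromstring(out)):
            raise ValueError(f"{stage.__name__}: post-condition failed")
        return out
    return component


def probe(log):
    """A logger that is itself a pass-through transformation component."""
    def component(xml_text):
        log.append(xml_text)
        return xml_text
    return component


def foo_to_bar(xml_text):
    """Sample stage: rename the root element to bar."""
    root = ET.fromstring(xml_text)
    root.tag = "bar"
    return ET.tostring(root, encoding="unicode")


def is_foo(root):
    return root.tag == "foo"


def is_bar(root):
    return root.tag == "bar"


seen = []
checked = contract(foo_to_bar, pre=is_foo, post=is_bar)
out = probe(seen)(checked("<foo>baz</foo>"))
print(out)   # <bar>baz</bar>
print(seen)  # ['<bar>baz</bar>']
```

With `enabled=False` the same wrapper runs the stage with no checks, which is the production setting of the switch.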
Musings #2 – Validation – grammars versus rules versus FYIs
• Pipelines make it natural to segregate “business rules” from “grammar rules” and can dramatically simplify both
• Some of the most useful business “rules” are non dyadic. “FYIs” are really, really useful monitoring/QA tools.
Musings #3 – Inbetween-ing and component development
• Transformation analysts spec the transformation
• Only need to code new components
• Spec == Documentation of what the transform needs to do with pre/post etc. but no code
• Provides built in JIT-style acceptance test via the pre/post conditions
• Outsource friendly, parallelisability friendly and third-party market friendly
Musing #4 - Web Services
• First generation will be a total blind alley – RPC
• Document Oriented Messaging – not Object Oriented Messaging -> SOAs
• The next stage in encapsulation and loose coupling – something like pipelining will be a pre-requisite in a doc/literal world.
Musing #5 – naming and parametric typing
• Naming components is a really hard problem
• Programmers don’t do metadata
• Finding components to re-use is a real problem – the Google lesson
• Numerous components that do the same thing but optimized on different axes:
  – Space
  – Time
  – Infoset considerations
Musing #6 – Pre-validation Transformation
• We are killing ourselves seeking one-shot expressivity in schema validation languages
• Many complex validations become a lot simpler if you do some transformation(s) first
  – Co-occurrence constraints
  – Contextual constraints
• Clear analog with formatting (pre-flow transformation(s) + flow = DSSSL/XSL)
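A small sketch of a pre-validation transformation for a co-occurrence constraint. The invented rule is “an item needs an isbn when kind is book, and a label when kind is cd”. Folding the kind attribute into the element name first turns the conditional rule into a flat lookup. The vocabulary, the `REQUIRED` table and the function names are all hypothetical, and every item is assumed to carry a kind attribute.

```python
import xml.etree.ElementTree as ET

# After the transformation, each element name has a fixed set of
# required attributes: no conditional logic is left in the validator.
REQUIRED = {"item-book": {"isbn"}, "item-cd": {"label"}}


def split_by_kind(xml_text):
    """Pre-validation transform: fold @kind into the element name."""
    root = ET.fromstring(xml_text)
    for item in list(root.iter("item")):
        item.tag = "item-" + item.attrib.pop("kind")
    return ET.tostring(root, encoding="unicode")


def missing_attributes(xml_text):
    """Simple validation: list (element, attribute) pairs that are missing."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, name)
        for el in root.iter()
        for name in sorted(REQUIRED.get(el.tag, set()) - set(el.attrib))
    ]


order = '<order><item kind="book" isbn="123"/><item kind="cd"/></order>'
problems = missing_attributes(split_by_kind(order))
print(problems)  # [('item-cd', 'label')]
```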
Musing #7 – grids, scheduling and compilers
• Scheduling transformations on a pipeline grid is hard – manufacturing lore needs to be brought to bear (e.g. Flow Shop Scheduling).
• Pipe -> Component via compiler is a powerful idea
  – Both for grids (IO optimisation) and for general program distribution
  – Pipe compilation can beat the IO problems while retaining the simple, componentised development approach.
  – Back to the future with Jackson’s Program Inversion
Musing #8 – Higher order transformations
• What if, instead of transforming an instance, you transformed a grammar?
• Auto-generation of instance transformation primitives
• Limited to non-PCDATA transforms and side-effect free transforms but useful nonetheless
Some pipeline-related open source technologies
• | – Unix Pipes
• SAX Filters
• XBeans
• Cocoon
• XPipe (sadly under resourced)
• AxKit
• xvif
• DSDL
• Ant, W3C Pipeline Note
Thank you
(question,answer?)*