XPipe - An XML Processing Methodology


Page 1: XPipe - An XML Processing Methodology

XML 2001, Sean McGrath http://www.propylon.com

XPipe - An XML Processing Methodology

XML 2001 Florida, USA

Sean McGrath

CTO

Propylon

Page 2: XPipe - An XML Processing Methodology

What is XPipe?

• It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems.

• It is based on proven mechanical manufacturing techniques. Specifically:
– The Assembly Line Principle
– Component assembly and component re-use

Page 3: XPipe - An XML Processing Methodology

What is XPipe?

• An open source project hosted on SourceForge
– http://xpipe.sourceforge.net

• A contribution to the blossoming meme of using pipeline-based processing to tame the burgeoning complexity of XML transformations
– (If you do not find XML transformation complicated, you are not sufficiently well informed.)
– (And no, XSLT does not solve all your problems)

• A way of thinking about systems that focuses on information flows rather than APIs

Page 4: XPipe - An XML Processing Methodology

Contents of this talk

• The XPipe philosophy
• Major functional elements
• Some examples
• Relationship to other technologies
• The XGrid
• Some anticipated objections (and answers)
• Current status
• Current problems
• Future plans

Page 5: XPipe - An XML Processing Methodology

XPipe Philosophy

Henry Ford’s Model T Ford Assembly Line – 1914

Cars are complex, hierarchical structures

Page 6: XPipe - An XML Processing Methodology

XPipe Philosophy

Lunch Assembly Line – 2001

Lunch is a complex, hierarchical structure

Page 7: XPipe - An XML Processing Methodology

XPipe Philosophy

We are complex, hierarchical structures

Page 8: XPipe - An XML Processing Methodology

XPipe philosophy

• What have these scenes got in common?
– Complex construction of cars, tuna melts and tendons is made possible and efficient through
  • assembly line manufacturing
  • re-usable component processes and component materials

• Why not apply this approach to XML “manufacturing”?

Page 9: XPipe - An XML Processing Methodology

XPipe philosophy

• Why does the assembly line approach work?
– Transformation task decomposition
– Re-usable transformation components

• Transformation decomposition is the key to complexity management. Just ask:
– Henry Ford
– Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”)
– George Miller (7±2)
– Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776)
– Any electrical or chemical engineer.

Page 10: XPipe - An XML Processing Methodology

XPipe philosophy

• Component re-use is the key to productivity
– Ask any form of engineer (electrical, chemical etc.) apart from software engineers…
– Component re-use remains a holy grail in software engineering
– XPipe is yet another attempt…

Page 11: XPipe - An XML Processing Methodology

XPipe philosophy

• A lot of data processing will consist of XML to XML transformation
• A lot of non-XML data processing can consist of XML to XML transformations with the addition of top and tail transformations
• Mantra
– Get data into XML as quickly as possible
– Keep it in XML until the last possible minute
– Bring all your XML tools to bear on solving the data processing problem
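
The mantra can be sketched in a few lines. This is only an illustration, assuming a CSV input and a plain-text report as output; the function names (top_csv_to_xml, upper_names, tail_xml_to_text) and the element names are invented for the sketch and are not part of XPipe.

```python
import csv
import io
import xml.etree.ElementTree as ET


def top_csv_to_xml(csv_text):
    """Top transformation: get non-XML data into XML as early as possible."""
    root = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, "row")
        for field, value in record.items():
            # Assumes the CSV column names are valid XML element names.
            ET.SubElement(row, field).text = value
    return root


def upper_names(root):
    """An XML to XML step: upper-case the text of every <name> element."""
    for name in root.iter("name"):
        name.text = (name.text or "").upper()
    return root


def tail_xml_to_text(root):
    """Tail transformation: leave XML only at the last possible moment."""
    return "\n".join(
        f"{row.findtext('name')}: {row.findtext('qty')}" for row in root.iter("row")
    )


report = tail_xml_to_text(upper_names(top_csv_to_xml("name,qty\nwidget,3\ngadget,5\n")))
```

Everything between the top and tail steps sees only XML, so any XML tool could replace or extend the middle step.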

Page 12: XPipe - An XML Processing Methodology

XPipe philosophy

[Diagram: Non-XML Input → Top Transformation → Input XML → … → Output XML → Tail Transformation → Non-XML Output]

Page 13: XPipe - An XML Processing Methodology

XPipe philosophy

• The philosophy hinges on the fact that every complex XML transformation can be broken down into a series of smaller ones that can be chained together

[Diagram: Input XML → Task 1 → Task 2 → … → Task n-1 → Task n → Output XML; the chain as a whole is labelled XPipe]
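
Chaining of this kind can be sketched as plain function composition. The helper make_pipe and the two tasks below are hypothetical names chosen for illustration; they are not the XPipe API.

```python
from functools import reduce
import xml.etree.ElementTree as ET


def make_pipe(*tasks):
    """Chain tasks so that the output tree of one is the input of the next."""
    def run(tree):
        return reduce(lambda current, task: task(current), tasks, tree)
    return run


def lowercase_tags(root):
    """Task 1 (illustrative): lower-case every element name."""
    for elem in root.iter():
        elem.tag = elem.tag.lower()
    return root


def stamp_root(root):
    """Task 2 (illustrative): mark the document as processed."""
    root.set("processed", "yes")
    return root


pipe = make_pipe(lowercase_tags, stamp_root)
result = ET.tostring(pipe(ET.fromstring("<Doc><Title>Hi</Title></Doc>")), encoding="unicode")
# result == '<doc processed="yes"><title>Hi</title></doc>'
```

Each task can be written and tested on its own, because its only contract is "XML tree in, XML tree out".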

Page 14: XPipe - An XML Processing Methodology

XPipe philosophy

• Only so many ways to re-arrange an XML tree structure

• A finite number of fundamental transformations, from which all higher order transformations can be derived

Page 15: XPipe - An XML Processing Methodology

XPipe philosophy

– Transformation decomposition leads to
  • a series of small, manageable, “stand alone” problems with an XML input “spec” and an XML output “spec”.
  • Can build, test, use and then re-use these transformation components
  • Very team development friendly
  • High cohesion, loose coupling – just like the professor advised

Page 16: XPipe - An XML Processing Methodology

XPipe philosophy

• Pipeline approach means you can mix ‘n’ match black-box components that internally use whatever paradigm best suits the problem
– Lexical
– SAX
– DOM
– XSLT
– XDuce, Pyxie, Haskell…

Page 17: XPipe - An XML Processing Methodology

Sample XPipe

[Diagram: a sample XPipe fed from a DB/CMS. Stage labels: Character Set Mods; Add Doctype + validate + strip doctype; Re-arrange Elements; Stats + FTP; Generate XHTML; Validation; SQL Replace. Implementation labels: Lexical; Schematron/RelaxNG/Rhino; Jython; Java; XSLT; Lexical; DOM.]

Page 18: XPipe - An XML Processing Methodology

XPipe philosophy

• Assertion: developers would use a component based approach to XML processing if they did not have to write the plumbing (orchestration, exception handling) themselves
– “Gee, this problem is complex. Maybe I’ll do it in multiple stages! Gee, now I have to orchestrate the stages somehow. Batch files/shell scripts/driver program – all ugly and error prone. Maybe I’ll just write a single program after all…”

Page 19: XPipe - An XML Processing Methodology

XPipe philosophy

• “Professional developers spend 50 percent of their time writing plumbing” – Adam Bosworth

• XPipe aims to look after the plumbing, letting developers concentrate on the interesting stuff

Page 20: XPipe - An XML Processing Methodology

Major Functional Elements – XComponents

• Developed in any language that runs on the Java Virtual Machine (Jython, Java, XSLT, Rhino (JavaScript) etc.)

• All XComponents are standalone programs of the form
– [Name] [InputXML] [OutputXML] [ErrorXML]
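
A standalone program of that form might look like the sketch below. It is written in Python 3 rather than a JVM language, the transformation is a placeholder, and the <errors>/<error> element names are assumptions; the slides do not specify the format of the error stream.

```python
import sys
import xml.etree.ElementTree as ET


def transform(root):
    """Placeholder transformation: rename every <foo> element to <bar>."""
    for elem in root.iter("foo"):
        elem.tag = "bar"
    return root


def main(argv):
    """Usage: rename.py INPUT_XML OUTPUT_XML ERROR_XML"""
    if len(argv) != 4:
        print(main.__doc__, file=sys.stderr)
        return 2
    _, input_path, output_path, error_path = argv
    errors = ET.Element("errors")  # assumed shape of the error document
    try:
        tree = ET.parse(input_path)
        transform(tree.getroot())
        tree.write(output_path, encoding="utf-8", xml_declaration=True)
        status = 0
    except (ET.ParseError, OSError) as exc:
        ET.SubElement(errors, "error").text = str(exc)
        status = 1
    # The error document is always written, even when it is empty.
    ET.ElementTree(errors).write(error_path, encoding="utf-8", xml_declaration=True)
    return status


# Act as a command-line program only when arguments are supplied,
# so the module can also be imported and tested.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Because the only interface is three file names, an executive can run such a component without knowing what it does internally.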

Page 21: XPipe - An XML Processing Methodology

Major Functional Elements – XComponents

• XComponents are described in XML form. An XComponent consists of:
– Documentation
– Unit Tests (input, output XML stream pairs)
– Metadata for retrieval
– Input and Output predicates – declarative (DTD/RelaxNG/Schema) or procedural (code)
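
To make the four parts concrete, here is a sketch that loads such a description into a Python object. The descriptor format is hypothetical: every element and attribute name below is invented for illustration and is not taken from the schemas published by the XPipe project.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

# Hypothetical descriptor; names are assumptions made for this sketch.
DESCRIPTOR = """
<xcomponent name="rename-foo-to-bar">
  <documentation>Renames every foo element to bar.</documentation>
  <metadata><keyword>rename</keyword><keyword>fundamental</keyword></metadata>
  <predicates input="foo.rng" output="bar.rng"/>
  <tests>
    <test><input><foo>baz</foo></input><output><bar>baz</bar></output></test>
  </tests>
</xcomponent>
"""


@dataclass
class XComponentDescription:
    name: str
    documentation: str
    keywords: list = field(default_factory=list)
    input_predicate: str = ""
    output_predicate: str = ""
    tests: list = field(default_factory=list)  # (input_xml, output_xml) pairs


def load_description(xml_text):
    """Parse the hypothetical descriptor into an XComponentDescription."""
    root = ET.fromstring(xml_text)
    predicates = root.find("predicates")
    tests = [
        (ET.tostring(test.find("input")[0], encoding="unicode"),
         ET.tostring(test.find("output")[0], encoding="unicode"))
        for test in root.iter("test")
    ]
    return XComponentDescription(
        name=root.get("name"),
        documentation=root.findtext("documentation", default=""),
        keywords=[keyword.text for keyword in root.iter("keyword")],
        input_predicate=predicates.get("input"),
        output_predicate=predicates.get("output"),
        tests=tests,
    )


description = load_description(DESCRIPTOR)
```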

Page 22: XPipe - An XML Processing Methodology

Major Functional Elements – XComponent Unit Tester

• Standalone program analogous to JUnit or PyUnit but for XML transformation component testing

• Very outsource-friendly and “inbetweenable” approach (specify everything but the code == spec+doc+test harness all in one)
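
A unit tester of this kind can be sketched as a loop over input/output pairs. This is an assumption about how such a tool could work, not the XPipe tester itself; run_unit_tests and rename_foo_to_bar are illustrative names. Outputs are compared in canonical form (xml.etree.ElementTree.canonicalize, Python 3.8 or later) so that attribute order and empty-element syntax do not cause false failures.

```python
import xml.etree.ElementTree as ET


def rename_foo_to_bar(xml_text):
    """Component under test (illustrative): rename <foo> elements to <bar>."""
    root = ET.fromstring(xml_text)
    for elem in root.iter("foo"):
        elem.tag = "bar"
    return ET.tostring(root, encoding="unicode")


def run_unit_tests(component, cases):
    """Run a component over (input, expected output) pairs.

    Returns a list of (case number, expected, actual) for every failure.
    """
    failures = []
    for number, (given, expected) in enumerate(cases, start=1):
        actual = component(given)
        if ET.canonicalize(actual) != ET.canonicalize(expected):
            failures.append((number, expected, actual))
    return failures


cases = [
    ("<foo>baz</foo>", "<bar>baz</bar>"),
    ('<foo b="2" a="1"/>', '<bar a="1" b="2"></bar>'),
]
failures = run_unit_tests(rename_foo_to_bar, cases)
```

The test cases are pure data, which is what makes the "specify everything but the code" approach possible.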

Page 23: XPipe - An XML Processing Methodology

Major Functional Elements – XPipes

• Described in XML

• Consist of
– Documentation
– Input/Output Predicates (Schemas/Code)
– Test Suite
– References to XComponents which are resolved when the XPipe is installed

Page 24: XPipe - An XML Processing Methodology

Major Functional Elements – XPipe Executive

• Uniprocessor
– XPipe executed on 1 machine, possibly with separate threads for each XComponent task

• Multiprocessor
– XML based protocol to implement “Job Shop” work distribution over a P2P network
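
The uniprocessor case with one thread per task can be sketched with queues between stages. This is a minimal illustration, not the XPipe executive: run_threaded is a hypothetical name, the tasks are simple string functions, and there is no exception handling (a task that raises would stall the pipe).

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the document stream


def _stage(task, inbox, outbox):
    """Run one task in its own thread, passing results downstream."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(task(item))


def run_threaded(tasks, documents):
    """Push documents through tasks, one thread per task, preserving order."""
    queues = [queue.Queue() for _ in range(len(tasks) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(task, queues[i], queues[i + 1]))
        for i, task in enumerate(tasks)
    ]
    for thread in threads:
        thread.start()
    for document in documents:
        queues[0].put(document)
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


results = run_threaded(
    [str.strip, lambda s: s.replace("<foo>", "<bar>").replace("</foo>", "</bar>")],
    ["  <foo>1</foo> ", "<foo>2</foo>"],
)
```

While the second stage works on document 1, the first stage can already start on document 2, which is the assembly line idea applied to threads.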

Page 25: XPipe - An XML Processing Methodology

Major Functional Elements – XPipe Monitor

Page 26: XPipe - An XML Processing Methodology

Some related open technologies

• | - Unix Pipes
• SAX Filters
• TrAX
• XBeans
• Cocoon
• AxKit
• JXTA
• Translets
• TupleSpaces

Page 27: XPipe - An XML Processing Methodology

Simple XComponent examples

• Fundamental Operation – Rename Element
– Rename
  • Input : <foo>baz</foo>
  • Output: <bar>baz</bar>

[Diagram: tree foo → baz becomes tree bar → baz]
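
As an illustration, Rename can be written in a few lines with Python's standard ElementTree module. The function name and signature are assumptions for this sketch, not the XPipe component.

```python
import xml.etree.ElementTree as ET


def rename(root, old, new):
    """Rename every element called `old` to `new`, leaving content untouched."""
    for elem in root.iter(old):
        elem.tag = new
    return root


output = ET.tostring(rename(ET.fromstring("<foo>baz</foo>"), "foo", "bar"), encoding="unicode")
# output == "<bar>baz</bar>"
```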

Page 28: XPipe - An XML Processing Methodology

Simple XComponent examples

• Fundamental Operation – Peel
  • Input : <foo><bar>baz</bar></foo>
  • Output: <foo>baz</foo>

[Diagram: tree foo → bar → baz becomes tree foo → baz]
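
Peel can be sketched by rebuilding the tree and splicing the content of each peeled element into its parent. The function names are hypothetical, and the sketch assumes the root element itself is never the one being peeled.

```python
import xml.etree.ElementTree as ET


def _append_text(out, text):
    """Append character data at the current end of `out`."""
    if not text:
        return
    if len(out):
        out[-1].tail = (out[-1].tail or "") + text
    else:
        out.text = (out.text or "") + text


def _copy_content(src, out, tag):
    """Copy src's text and children into out, peeling elements named `tag`."""
    _append_text(out, src.text)
    for child in src:
        if child.tag == tag:
            # Splice the child's content in place of the child itself.
            _copy_content(child, out, tag)
        else:
            copy = ET.SubElement(out, child.tag, child.attrib)
            _copy_content(child, copy, tag)
        _append_text(out, child.tail)


def peel(root, tag):
    """Return a new tree in which every descendant named `tag` is removed
    but its content is kept."""
    out = ET.Element(root.tag, root.attrib)
    _copy_content(root, out, tag)
    return out


output = ET.tostring(peel(ET.fromstring("<foo><bar>baz</bar></foo>"), "bar"), encoding="unicode")
# output == "<foo>baz</foo>"
```

Most of the code deals with mixed content: ElementTree stores text after an element in that element's tail, so peeling has to move text as well as children.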

Page 29: XPipe - An XML Processing Methodology

Simple XComponent examples

• Compound Operation – Matryoshka
  • Input:
  – <foo><bar>baz</bar></foo>
  • Output:
  – <foo></foo><bar></bar>baz

[Diagram: nested tree foo → bar → baz becomes the flat sequence foo, bar, baz]

Page 30: XPipe - An XML Processing Methodology

Simple XComponent examples

• KlingonCloak
– Input:
  • <foo><bar>baz</bar></foo>
– Output:
  • <tag name="foo"><tag name="bar">baz</tag></tag>

[Diagram: tree foo → bar → baz becomes tree tag type="foo" → tag type="bar" → baz]
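
A sketch of KlingonCloak, following the textual example (attribute called name). The function name is illustrative, and as a simplification any attributes on the original elements are dropped.

```python
import xml.etree.ElementTree as ET


def klingon_cloak(elem):
    """Replace every element with <tag name="...">, keeping text and nesting."""
    cloaked = ET.Element("tag", {"name": elem.tag})
    cloaked.text = elem.text
    cloaked.tail = elem.tail
    for child in elem:
        cloaked.append(klingon_cloak(child))
    return cloaked


output = ET.tostring(
    klingon_cloak(ET.fromstring("<foo><bar>baz</bar></foo>")), encoding="unicode"
)
# output == '<tag name="foo"><tag name="bar">baz</tag></tag>'
```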

Page 31: XPipe - An XML Processing Methodology

Sample XComponents

• Once you start thinking in terms of Pipes – components appear everywhere:
– Regular fragmentations
– Doctype changer
– Namespace normalizer
– Character set transcoder
– Hash generator
– RelaxNG/Schematron etc.

• A validator can be thought of as a component in an XPipe that mirrors its input on its output

Page 32: XPipe - An XML Processing Methodology

Validation as an XComponent

[Diagram: XML A enters the XComponent on Input and leaves as XML A’ on Output; a Validation Log is written to Error. The XComponent is implemented with RelaxNG, Schematron, Jython/Java/JACL.]
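
A validating component of this shape can be sketched as a function that returns its input unchanged plus a log. The predicate here is procedural (plain code) rather than RelaxNG or Schematron, and the <validation-log>/<error> element names are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET


def validate(input_xml, predicate):
    """Return (output_xml, error_xml) for a validating component.

    The output mirrors the input unchanged; findings go to the error stream.
    `predicate` is a procedural check that returns a list of messages.
    """
    log = ET.Element("validation-log")
    try:
        root = ET.fromstring(input_xml)
    except ET.ParseError as exc:
        ET.SubElement(log, "error").text = f"not well formed: {exc}"
    else:
        for message in predicate(root):
            ET.SubElement(log, "error").text = message
    return input_xml, ET.tostring(log, encoding="unicode")


def every_item_has_id(root):
    """Illustrative predicate: every <item> must carry an id attribute."""
    return [
        f"item {position} has no id attribute"
        for position, item in enumerate(root.iter("item"), start=1)
        if item.get("id") is None
    ]


source = '<list><item id="a"/><item/></list>'
output_xml, error_xml = validate(source, every_item_has_id)
```

Because the output is identical to the input, a validator can be dropped between any two stages of a pipe without affecting the stages around it.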

Page 33: XPipe - An XML Processing Methodology

The XGrid

• Grid Technologies – computational power “on tap” (http://www.gridforum.org)

• The XGrid – computational power “on tap” to execute XPipes

Page 34: XPipe - An XML Processing Methodology

The XGrid

[Diagram: XML Data and an XPipe are submitted through the XGrid Interface (XJCL) to the XGrid, which draws on Computational Power Sources]

Page 35: XPipe - An XML Processing Methodology

Some objections (with some answers)

• It will be slow
– No it won’t - premature optimization is the root of all evil!
– Speed is a three headed monster. I’m old enough to have left the X axis and currently heading for Y through Z

[Diagram: “The 3 Axes to Speed”. Axes: Speed of Development, Speed of Execution, Speed of Modification. Points: Me at age 26, Me at age 36, Me at age 46 (Projected)]

Page 36: XPipe - An XML Processing Methodology

Some objections (with some answers)

• It will be slow (cont.)
– Massive parallelism will kill all von Neumann throughput arguments
  • Documents per second, not seconds per document
– A myriad of “compile time” optimizations on XPipes possible
– Keep the architecture simple – and speed will sort itself out

Page 37: XPipe - An XML Processing Methodology

Some objections (with some answers)

• Pipes are not rich enough, real data flows require graphs
– Inside every graph is a collection of straight segments
– Do the smallest thing that can possibly work
– XComponents can conditionally flow data in different directions – graph
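
One way to read the last point: a single component that picks a straight segment per document turns a set of pipes into a graph. The sketch below is an interpretation under that assumption; router, run and mark are hypothetical names.

```python
import xml.etree.ElementTree as ET


def run(tasks, root):
    """Run a straight segment: each task feeds the next."""
    for task in tasks:
        root = task(root)
    return root


def router(choose, branches):
    """Build a component that sends each document down one straight segment."""
    def component(root):
        return run(branches[choose(root)], root)
    return component


def mark(value):
    """Illustrative task factory: set a `route` attribute on the root."""
    def task(root):
        root.set("route", value)
        return root
    return task


by_root_tag = router(
    choose=lambda root: "invoice" if root.tag == "invoice" else "other",
    branches={"invoice": [mark("billing")], "other": [mark("archive")]},
)
routed = run([by_root_tag], ET.fromstring("<invoice/>")).get("route")
# routed == "billing"
```

The router is itself just another component, so the surrounding pipe stays a simple straight line.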

Page 38: XPipe - An XML Processing Methodology

Some objections (with some answers)

• Component based software? Harrumph! We have heard that one before…
– XPipe is data flow based not API based (COM, VBX, CORBA). The payload is what is important – not the plumbing
– Information integration (needed on the server side) – not application integration (needed on the client side)

Page 39: XPipe - An XML Processing Methodology

Current Status

• Schemas for XPipes and XComponents on xpipe.sourceforge.net
– feedback required

• Sample components (Java/XSLT/Jython) and some documentation

• Simple, illustrative XPipe uniprocessor executives

• Draft of XJCL – XGrid Job Control Language

Page 40: XPipe - An XML Processing Methodology

Current Status

• Uniprocessor XPipe used to develop
– 80-C pipe from Hub notation for a complex document type to a legacy mainframe display notation. 120 page spec.
– 20-C pipe for semantic validation of legislation documents
– XPipe and XComponent validators

Page 41: XPipe - An XML Processing Methodology

Current Problems

• Everybody agrees that an XML document is a tree but:
– The content and structure of the tree depend on the parser
– The content and structure of re-generated XML (The round-tripping problem)

Page 42: XPipe - An XML Processing Methodology

Current Problems

• Naming things
– Taxonomy of XTLs (XML Transformation Languages)
– Taxonomy of re-usable XComponents and XPipes

Page 43: XPipe - An XML Processing Methodology

Current Problems

• Flexible transformation scheduling is hard

• Optimal transformation scheduling is very hard

• Packaging

Page 44: XPipe - An XML Processing Methodology

Future Plans

• Evangelize the idea that DTD validated XML 1.0 is just Well Formed XML that has been through a pipe consisting of:
– A transclusion component (entity expansion)
– A macro pre-processor (conditional marked sections)
– An attribute decorator (implied/fixed attributes)
– A grammar checker
– …
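
Two of these stages can be sketched as ordinary components. This is a simplified illustration: the tables below are hypothetical stand-ins for what a DTD would declare, the grammar check only tests which children are allowed (not their order or count), and entity expansion and conditional sections are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for DTD declarations.
DEFAULT_ATTRIBUTES = {"para": {"align": "left"}}
ALLOWED_CHILDREN = {"doc": {"para"}, "para": set()}


def decorate_attributes(root):
    """Attribute decorator: add declared defaults that the document omits."""
    for elem in root.iter():
        for name, value in DEFAULT_ATTRIBUTES.get(elem.tag, {}).items():
            if elem.get(name) is None:
                elem.set(name, value)
    return root


def check_grammar(root):
    """Grammar checker: reject undeclared elements and disallowed children."""
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            raise ValueError(f"undeclared element <{parent.tag}>")
        for child in parent:
            if child.tag not in allowed:
                raise ValueError(f"<{child.tag}> not allowed in <{parent.tag}>")
    return root


def validating_pipe(xml_text):
    """Well formed XML in, decorated and grammar-checked tree out."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well formed
    return check_grammar(decorate_attributes(root))


valid = ET.tostring(validating_pipe("<doc><para>Hi</para></doc>"), encoding="unicode")
# valid == '<doc><para align="left">Hi</para></doc>'
```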

Page 45: XPipe - An XML Processing Methodology

Valid XML

[Diagram: Well Formed XML → Parameter Entity Expansion → Conditional Sections → General Entity Expansion → Attribute Decoration → Grammar Validation → Valid XML]

Page 46: XPipe - An XML Processing Methodology

Future Plans

• XPipes and XComponents as web services (SOAP/XML-RPC, UDDI etc.)

• Getting the P2P and Grid Technology communities’ input into XGrid.

• Getting help to develop the XPipe reference implementation on SourceForge

Page 47: XPipe - An XML Processing Methodology

Future Plans

• Development of commercial implementations of XPipe integrated with leading EAI systems (Ongoing)

• Use of SCADA tools to develop XPipe process control and monitoring systems

Page 48: XPipe - An XML Processing Methodology

Future Plans

• Use of Animation Engineering techniques for CAXTE tools (Computer Aided XML Transformation Engineering)

• Digging around hierarchy theory, self-assembly, bio-informatics and nanofabrication for concepts and tools applicable to XML transformations

Page 49: XPipe - An XML Processing Methodology

In conclusion

• XPipe is simple

• Simplicity works!

• Plenty of evidence outside of XML engineering that this approach will work

• Plenty of lore and tools from other fields of science can be brought to bear to build systems using the XPipe approach

Page 50: XPipe - An XML Processing Methodology

Thank you

– http://xpipe.sourceforge.net