Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.
![Page 1: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/1.jpg)
Diaries of a Desperate (XML|XProc) Hacker
![Page 2: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/2.jpg)
Diaries of a Desperate (XML|XProc) Hacker
James Fuller, Lead Engineer | MarkLogic
![Page 3: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/3.jpg)
Background
• Engineer on MarkLogic API team (History meters, Management API, etc…)
• W3C XML Processing WG (XProc v2.0)
• 2001 started with XML tech (EXSLT), XML Prague, etc…
• Open source contrib.
• Thank you to the organisers of XProc XML London 2015
![Page 4: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/4.jpg)
Agenda
1. XML Hacker Desperation
2. XMLCalabash & depify
3. Show & Tell
4. XProc Hacker Desperation
5. Summary
6. Goto pub
* Raise your hand to ask a question
* Yes, I am going to ‘powerpoint’ you
![Page 5: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/5.jpg)
xkcd.com - http://xkcd.com/208/ [xkcd-ref]
The D.P.H.
Email !!!
![Page 6: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/6.jpg)
D.P.H. – a twinkling in SGML eye
• Desperate Perl Hacker
– Paul Grosso 1997 xml-dev link
– Google images ‘desperate perl hacker’ link
– Etymological cousin of ‘Just Another Perl Hacker’ (JAPH) – Randal Schwartz aka Merlin
• What’s it all about?
– GSD
– Opaque one-liners (Perl Golf encouraged)
– Even better if (regex|pipes|sed|awk) involved
– Challenge: Be able to munge XML with Perl
![Page 7: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/7.jpg)
Desperate XML Hacker
• GAD (Get it All Done) with XML Stack
• ‘clever’ (and|or) ‘clear’
• Highly productive, albeit marooned and anxious on ‘XML island’
• Working with XML means working with documents, and that means working with document workflows
![Page 8: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/8.jpg)
All programmers are desperate
emacs
xpath
xslt
xquery
marklogic
xml
emacs
json
java
gradle
ant
bash…..
![Page 9: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/9.jpg)
• Day 1 - transform an xml doc with XSLT
• Day 2 - run transform on set of docs
• Day 3 - generate multiple output formats
• Day 4 - read docs from database
• Day 5 - put results into database
• Day 6 - notify when it's done
• Day 7 - run assertions and validate results
• Day 8 - generate png from svg for each document
• Day 8 - zip up files and upload them (w/ oauth)
• Day 9 - create EPub
• And so forth …
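Days 1 and 2 of the diary above could be sketched in Java with the JAXP API that ships with the JDK. This is only one possible shape under stated assumptions: the class name `TransformDocs`, the tiny stylesheet and the sample documents are invented for illustration and do not come from the talk.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: compile one stylesheet, run it over a set of documents.
final class TransformDocs {

    // Illustrative stylesheet (an assumption): turns <doc name="x"/> into <title>x</title>.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/doc'>"
      + "<title><xsl:value-of select='@name'/></title>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> transformAll(String stylesheet, List<String> docs) {
        try {
            // Day 1: compile the stylesheet once so it can be reused.
            Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheet)));
            List<String> results = new ArrayList<>();
            // Day 2: apply the same transform to every document in the set.
            for (String doc : docs) {
                StringWriter out = new StringWriter();
                compiled.newTransformer().transform(
                    new StreamSource(new StringReader(doc)), new StreamResult(out));
                results.add(out.toString());
            }
            return results;
        } catch (TransformerException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> docs = List.of("<doc name='a'/>", "<doc name='b'/>");
        transformAll(STYLESHEET, docs).forEach(System.out::println);
    }
}
```

Even this small loop already owns the "main loop" of the workflow, and the later days (database access, notification, packaging) would each bolt more logic onto it.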
![Page 10: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/10.jpg)
Technology Selection
– XSLT
– XQuery
– Bash scripts
– Makefiles
– Ant
– Java
– All of the above?
![Page 11: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/11.jpg)
Ad hoc pipelines
TRANSFORM
GENERATE
PACKAGE
zip
notify
upload
![Page 12: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/12.jpg)
Pipelines manage complexity
• Transformation decomposition is the key to complexity management, just ask:
– Henry Ford
– Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”)
– George Miller (7+/-2)
– Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776)
– Any electrical/chemical engineer
– Michael A. Jackson
[McGrath2004] Sean McGrath. Performing impossible feats of XML processing with pipelining, Proc XML Open 2004.
• Easy to build, test and reuse
• Segregation of business rules from grammar rules
• Enable group collaboration
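The decomposition idea can be sketched as a chain of small steps, each one simple enough to test on its own. The step names below (`transform`, `generate`, `pack`) echo the boxes on the ad hoc pipelines slide, but their bodies and the `PipelineSketch` class are hypothetical stand-ins, not code from the talk.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: a pipeline is just an ordered list of single-purpose steps.
final class PipelineSketch {

    // Run the steps left to right, feeding each step's output to the next.
    static UnaryOperator<String> pipeline(List<UnaryOperator<String>> steps) {
        return input -> {
            String current = input;
            for (UnaryOperator<String> step : steps) {
                current = step.apply(current);
            }
            return current;
        };
    }

    // Stand-in for a transform step: rename the root element.
    static String transform(String doc) {
        return doc.replace("<draft>", "<doc>").replace("</draft>", "</doc>");
    }

    // Stand-in for a generate step: wrap the document in an HTML shell.
    static String generate(String doc) {
        return "<html><body>" + doc + "</body></html>";
    }

    // Stand-in for a package step: mark the result as packaged.
    static String pack(String doc) {
        return "package[" + doc + "]";
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> steps = List.of(
            PipelineSketch::transform, PipelineSketch::generate, PipelineSketch::pack);
        System.out.println(pipeline(steps).apply("<draft>hi</draft>"));
        // prints: package[<html><body><doc>hi</doc></body></html>]
    }
}
```

Because every step has the same shape, steps can be reordered, swapped or reused without touching the others, which is the property the bullets above point at.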
![Page 13: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/13.jpg)
Michael Kay Balisage 2009 – ‘You Pull, I’ll Push: on the Polarity of Pipelines’
• ‘the code of each step in the pipeline is kept very simple’
• ‘very easy to assemble an application from a set of components, thus maximizing the potential for component reuse’
• ‘there is no requirement that each step in a pipeline should use the same technology; it's easy to mix XSLT, XQuery, Java and so on in different stages.’
http://www.balisage.net/Proceedings/vol3/html/Kay01/BalisageVol3-Kay01.html
![Page 14: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/14.jpg)
![Page 15: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/15.jpg)
Use all the XML technologies …
![Page 16: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/16.jpg)
XML – The Good Parts

| | Modern XML Tier 1 | Modern XML Tier 2 |
|---|---|---|
| Core | XML 1.0, Namespaces, XPath 1.0/2.0/3.0 | XML Canonicalization |
| Transform/Query | XSLT 1.0/2.0/3.0, XQuery 1.0/3.0 | XSLT 1.0/2.0 (in browser) |
| Processing | SAX, DOM | XProc?, XOM |
| Other | XML Catalog | XForms |
| Schema | Schematron, XML Schema 1.0 | RELAX-NG, XML Schema 1.1 |
| Semantics | RDF, OWL | SPARQL, SPARQL Update |
| Vocabularies* | SVG, ‘Office’ Doc ML…. | MathML, Docbook, DITA, XHTML |

- Amended from XML Amsterdam 2012 Keynote
![Page 17: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/17.jpg)
Dependency Adoption (technology selection)
![Page 18: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/18.jpg)
Helter skelter
Dependency Adoption
![Page 19: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/19.jpg)
Helter skelter: http://upload.wikimedia.org/wikipedia/commons/thumb/b/ba/Helter_skelter.jpg/440px-Helter_skelter.jpg
It's more like this
![Page 20: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/20.jpg)
The right Tool
![Page 21: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/21.jpg)
Obligatory Jedi slide
![Page 22: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/22.jpg)
But it works!
![Page 23: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/23.jpg)
Java and XML
![Page 24: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/24.jpg)
xml:Father - “XML gives Java something to do.”
• XML, Java, and the future of the Web, 1997, Jon Bosak - http://www.ibiblio.org/pub/sun-info/standards/xml/why/xmlapps.htm
• SAX, DOM
• Unicode support
• Distributed
• Care and feeding of the Java VM
• Invoke abstraction (classpath, jar fun)
![Page 25: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/25.jpg)
Do Java and XML work better together?
![Page 26: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/26.jpg)
Not enough time
![Page 27: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/27.jpg)
Not enough time
![Page 28: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/28.jpg)
Desire to be Productive
![Page 29: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/29.jpg)
10x programmers is not a myth
• Augustine, N. R. 1979. "Augustine’s Laws and Major System Development Programs." Defense Systems Management Review: 50-76.
• Boehm, Barry W., and Philip N. Papaccio. 1988. "Understanding and Controlling Software Costs." IEEE Transactions on Software Engineering SE-14, no. 10 (October): 1462-77.
• Boehm, Barry, et al, 2000. Software Cost Estimation with Cocomo II, Boston, Mass.: Addison Wesley, 2000.
• Boehm, Barry W., T. E. Gray, and T. Seewaldt. 1984. "Prototyping Versus Specifying: A Multiproject Experiment." IEEE Transactions on Software Engineering SE-10, no. 3 (May): 290-303. Also in Jones 1986b.
• Card, David N. 1987. "A Software Technology Evaluation Program." Information and Software Technology 29, no. 6 (July/August): 291-300.
• Curtis, Bill. 1981. "Substantiating Programmer Variability." Proceedings of the IEEE 69, no. 7: 846.
• Curtis, Bill, et al. 1986. "Software Psychology: The Need for an Interdisciplinary Program." Proceedings of the IEEE 74, no. 8: 1092-1106.
• DeMarco, Tom, and Timothy Lister. 1985. "Programmer Performance and the Effects of the Workplace." Proceedings of the 8th International Conference on Software Engineering. Washington, D.C.: IEEE Computer Society Press, 268-72.
• DeMarco, Tom and Timothy Lister, 1999. Peopleware: Productive Projects and Teams, 2d Ed. New York: Dorset House, 1999.
• Mills, Harlan D. 1983. Software Productivity. Boston, Mass.: Little, Brown.
• Sackman, H., W.J. Erikson, and E. E. Grant. 1968. "Exploratory Experimental Studies Comparing Online and Offline Programming Performance." Communications of the ACM 11, no. 1 (January): 3-11.
• Valett, J., and F. E. McGarry. 1989. "A Summary of Software Measurement Experiences in the Software Engineering Laboratory." Journal of Systems and Software 9, no. 2 (February): 137-48.
• Weinberg, Gerald M., and Edward L. Schulman. 1974. "Goals and Performance in Computer Programming." Human Factors 16, no. 1 (February): 70-77.
![Page 30: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/30.jpg)
Except when it is a myth
• technical debt
– Maintainable/Upgrade
– Add new features
– Enterprise requirements
• more bugs
• brittle code
Upfront design
Technology selection
Balancing trade-offs to achieve sum gain
![Page 31: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/31.jpg)
reflection
• Desperate people do desperate things
– Use all the XML technologies
– Dependency adoption
– Not the right tool
– Not enough time
– Being productive
![Page 32: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/32.jpg)
avoid being a D.X.H.
• Careful technology selection
• Manage your dependencies
• Avoid distributing logic up/down/across tech stack (hint: don’t use bash, makefiles, ant, etc)
• Simplify interaction with Java (VM)
• Model pipelines (hint: XProc)
![Page 33: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/33.jpg)
avoid being a D.X.H.
• Use XProc (XMLCalabash)
– XProc is designed for XML processing pipelines
– Extensible
– Simplify and aggregate logic
• Use XProc extension steps (depify)
– XProc w/o extension steps is half of XProc
– Provide façade over other technologies
![Page 34: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/34.jpg)
We use pipelines
• John Lumley – worked with DITA OT
• Sandro Cirulli - workflow (pull scm, push db, process)
• Nic Gibson – conversion workflows
• Philip Fearon - types of workflows (seq and concurrent) with XMLFlow
• Andrew Sales – schematron on word docs (used Ant)
• ….
• most talks mentioned workflow/pipeline
– ~100 mentions in proceedings
– guesstimate ~6 mentions per hour during the talks
![Page 35: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/35.jpg)
Desperate XProc Hacker
• XProc learning curve
– v1.0 verbose in places
– XProc generic by design
– Some ‘Batteries not included’
• XProc v2.0 addresses this
– Simplify connecting steps
– Simplify parameters (maps)
– Flow control
– Metadata
– Anything ‘flows’
– avt/tvt
– Syntactic optimisations
• depify provides a way to distribute and reuse extension steps
beats the problems that arise using ‘hairball’ approach
![Page 36: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/36.jpg)
XMLCalabash & depify
• XMLCalabash – XProc processor
– Norm Walsh
– http://xmlcalabash.com/
• depify – XProc dependency management
– http://depify.com/
![Page 37: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/37.jpg)
XMLCalabash extension steps
![Page 38: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/38.jpg)
package com.example.library;

import com.xmlcalabash.library.DefaultStep;
… elided …
import com.xmlcalabash.runtime.XAtomicStep;

@XMLCalabash(
    name = "ex:hello-world",
    type = "{http://example.org/xmlcalabash/steps}hello-world")
public class HelloWorld extends DefaultStep {
    private WritablePipe result = null;

    public HelloWorld(XProcRuntime runtime, XAtomicStep step) {
        super(runtime, step);
    }

    public void setOutput(String port, WritablePipe pipe) {
        result = pipe;
    }

    public void reset() {
        result.resetWriter();
    }

    public void run() throws SaxonApiException {
        super.run();
        … elided …
        tree.addText("Hello World");
        … elided …
        result.write(tree.getResult());
    }
}
![Page 39: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/39.jpg)
<p:library version="1.0"
    xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    xmlns:ex="http://example.org/xmlcalabash/steps">
  <p:declare-step type="ex:hello-world">
    <p:output port="result"/>
  </p:declare-step>
</p:library>
Library for the step
![Page 40: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/40.jpg)
M Filemode    Length  Date         Time      File
- ----------  ------  -----------  --------  -----------------------------------------------------
  drwxr-xr-x       0   8-Mar-2015  10:43:38  META-INF/
  -rw-r--r--     843   8-Mar-2015  10:43:38  META-INF/MANIFEST.MF
  drwxr-xr-x       0   8-Mar-2015  10:43:38  com/
  drwxr-xr-x       0   8-Mar-2015  10:43:38  com/example/
  drwxr-xr-x       0   8-Mar-2015  10:43:38  com/example/library/
  -rw-r--r--    2062   8-Mar-2015  10:43:38  com/example/library/HelloWorld.class
  drwxr-xr-x       0   8-Mar-2015  10:43:38  META-INF/annotations/
  -rw-r--r--      31   8-Mar-2015  10:43:38  META-INF/annotations/com.xmlcalabash.core.XMLCalabash
  -rw-r--r--     294  19-Feb-2015  15:41:00  example-library.xpl
- ----------  ------  -----------  --------  -----------------------------------------------------
                3230                         9 files
library xpl included in jar
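One way to confirm that a step jar really carries its library file is to list its `.xpl` entries with the JDK's `java.util.jar` API. The sketch below is an illustration only: `StepJarCheck` and `writeDemoJar` are hypothetical names, and the demo jar it builds merely mimics two entry names from the listing above with placeholder contents.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical check: which XProc library files does a step jar contain?
final class StepJarCheck {

    // Return the names of all entries in the jar that end with ".xpl".
    static List<String> xplEntries(Path jar) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            List<String> names = new ArrayList<>();
            jarFile.stream()
                   .map(JarEntry::getName)
                   .filter(name -> name.endsWith(".xpl"))
                   .forEach(names::add);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Build a throwaway jar with placeholder contents, for demonstration only.
    static Path writeDemoJar() {
        try {
            Path jar = Files.createTempFile("example-step", ".jar");
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
                out.putNextEntry(new JarEntry("com/example/library/HelloWorld.class"));
                out.closeEntry(); // empty placeholder, not a real class file
                out.putNextEntry(new JarEntry("example-library.xpl"));
                out.write("<!-- placeholder library -->".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(xplEntries(writeDemoJar()));
        // prints: [example-library.xpl]
    }
}
```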
![Page 41: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/41.jpg)
• depify.com
• depify client
• depify github
depify
![Page 42: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/42.jpg)
• Usage of XMLCalabash
• Usage of depify
• Develop your own step
• Distribute with depify
![Page 43: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/43.jpg)
depify future
• Gradle plugin
• Depify into other repos to enable day zero bootstrap (w/ yum, etc)
• Integration (expath package management)
• More steps
• More steps
• More steps
![Page 44: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/44.jpg)
Summary
• XProc extension steps provide reuse
• XProc v2.0 lets you work in broader context
• Pipelines manage complexity
• depify specifically built for XProc (XMLCalabash)
• Reuse with existing mechanisms (ex. Maven)
![Page 45: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/45.jpg)
How to Become a Delighted XProc Hacker
• model pipelines with XProc (XMLCalabash)
• try out ext steps (depify)
• GSD
• reuse and distribute new steps (depify)
• goto pub
• Stop using bash, makefiles, ant or bending XML tech to control main loop
• Stop making ad hoc pipelines
![Page 46: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649ea25503460f94ba5f98/html5/thumbnails/46.jpg)
<pub/>
Thank you for your attention and time, questions ?