Groovy Xml processing

Page 1: Groovy Xml processing

XML Processing

The Groovy approach

XML is like a human: it starts out cute when it’s small and gets annoying when it becomes bigger.

Page 2: Groovy Xml processing

AGENDA

• XML, should we Groovy it?

• Parsing XML
  • Comparing Java and Groovy XML parsing
  • DOM Category
    • Downsides
  • What’s GPath
  • Using XmlParser
    • Downsides
  • Using XmlSlurper
  • XmlParser vs XmlSlurper
  • So here is my advice

• Creating XML
  • Comparing Java and Groovy XML generation
  • GString, it's not what you think!
  • MarkupBuilder
  • StreamingMarkupBuilder
  • Comparing builders

• Wait, XML processing is not OXM

• XML using Groovy: conclusion

Page 3: Groovy Xml processing

XML, should we Groovy it?

• Groovy does not force us to duplicate our efforts.

• Use the Java-based approaches as needed, especially for legacy XML processing code.

• If we are writing new code to process XML, though, we should use the Groovy facilities.

Here is why.

Page 4: Groovy Xml processing

Sample XML

<langs type="current">
  <language>Java</language>
  <language>Groovy</language>
  <language>JavaScript</language>
</langs>

• Parsing this trivial XML document is decidedly nontrivial in the Java language: it takes about 30 lines of code.

Page 5: Groovy Xml processing

import org.xml.sax.SAXException;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.IOException;

public class ParseXml {
    public static void main(String[] args) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse("src/languages.xml");

            // print the "type" attribute
            Element langs = doc.getDocumentElement();
            System.out.println("type = " + langs.getAttribute("type"));

            // print the "language" elements
            NodeList list = langs.getElementsByTagName("language");
            for (int i = 0; i < list.getLength(); i++) {
                Element language = (Element) list.item(i);
                System.out.println(language.getTextContent());
            }
        } catch (ParserConfigurationException pce) {
            pce.printStackTrace();
        } catch (SAXException se) {
            se.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}

Page 6: Groovy Xml processing

Groovy Code

def langs = new XmlParser().parse("languages.xml")
println "type = ${langs.attribute("type")}"
langs.language.each {
    println it.text()
}

// output:
// type = current
// Java
// Groovy
// JavaScript

Page 7: Groovy Xml processing

Groovy 1-0 Java

• The Groovy code is significantly shorter than the equivalent Java code.

• It is also far more expressive: writing langs.language.each feels like working directly with the XML, unlike the Java version, thanks to the dynamic nature of Groovy and GPath.

Page 8: Groovy Xml processing

DOM Category

• We can use Groovy categories to define dynamic methods on classes (an idea borrowed from Objective-C).

• Groovy provides a category for working with the Document Object Model (DOM) that adds convenience methods.

• DOMCategory lets us navigate the DOM structure through the DOM API, with the convenience of GPath queries.
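As a minimal, self-contained sketch of the idea (the inline XML string and the variable names are assumptions made for illustration), the category can be applied to a DOM built from a string rather than a file:

```groovy
import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory

// Parse a small inline document into a standard org.w3c.dom.Document
def doc = DOMBuilder.parse(new StringReader(
    '<langs type="current"><language>Java</language><language>Groovy</language></langs>'))
def root = doc.documentElement

def type
def names
use(DOMCategory) {
    // Inside the use block, DOM elements accept GPath-style navigation
    type = root.'@type'
    names = root.language.collect { it.text() }
}

println "type = ${type}, languages = ${names}"
```

The GPath-style calls are only available inside the use block; the values they produce can be carried out of it in ordinary variables.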

Page 9: Groovy Xml processing

What’s GPath?

• GPath is much like XPath: where XPath navigates the hierarchy of an XML document, GPath navigates the hierarchy of objects (POJO/POGO) and XML using dot notation.

• Example: car.engine.power

XML:

<car year="20">
  <engine>
    <power/>
  </engine>
</car>

POGO/POJO: car.getEngine().getPower()

• We can access the year attribute of a car using car.'@year' (or car.@year).

For more info: http://groovy.codehaus.org/GPath
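The following sketch shows the same dot notation applied to a plain object and to XML. The Car and Engine classes and the sample values are hypothetical, and the import assumes Groovy 3 or later, where XmlSlurper lives in groovy.xml (older versions import it automatically from groovy.util, so the import line would be dropped there).

```groovy
import groovy.xml.XmlSlurper

// Hypothetical POGOs mirroring the car/engine/power example
class Engine { int power }
class Car { int year; Engine engine }

def pogo = new Car(year: 2012, engine: new Engine(power: 150))
def pogoPower = pogo.engine.power        // property access, same as getEngine().getPower()

def xml = new XmlSlurper().parseText(
    '<car year="2012"><engine><power>150</power></engine></car>')
def xmlPower = xml.engine.power.text()   // the same dot notation over XML elements
def xmlYear = xml.@year.text()           // attribute access with @

println "POGO power: ${pogoPower}, XML power: ${xmlPower}, XML year: ${xmlYear}"
```

Note that values read from XML are strings, while the POGO keeps its declared types.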

Page 10: Groovy Xml processing

Sample XML

<languages>
  <language name="C++">
    <author>Stroustrup</author>
  </language>
  <language name="Java">
    <author>Gosling</author>
  </language>
  <language name="Lisp">
    <author>McCarthy</author>
  </language>
  <language name="Modula-2">
    <author>Wirth</author>
  </language>
  <language name="Oberon-2">
    <author>Wirth</author>
  </language>
  <language name="Pascal">
    <author>Wirth</author>
  </language>
</languages>

Page 11: Groovy Xml processing

DOM Category

document = groovy.xml.DOMBuilder.parse(new FileReader('languages.xml'))
rootElement = document.documentElement

use(groovy.xml.dom.DOMCategory) {
    println "Languages and authors"
    languages = rootElement.language
    languages.each { language ->
        println "${language.'@name'} authored by ${language.author[0].text()}"
    }

    def languagesByAuthor = { authorName ->
        languages.findAll { it.author[0].text() == authorName }.collect {
            it.'@name' }.join(', ')
    }

    println "Languages by Wirth:" + languagesByAuthor('Wirth')
}

Page 12: Groovy Xml processing

Output

Languages and authors
C++ authored by Stroustrup
Java authored by Gosling
Lisp authored by McCarthy
Modula-2 authored by Wirth
Oberon-2 authored by Wirth
Pascal authored by Wirth
Languages by Wirth:Modula-2, Oberon-2, Pascal

Page 13: Groovy Xml processing

Downside

• One restriction is that the code must be placed inside a use block.

Page 14: Groovy Xml processing

XmlParser

• The class groovy.util.XmlParser exploits Groovy’s dynamic typing and metaprogramming capabilities.

• The code is much like the DOMCategory example we just saw, but without the use block.

• XmlParser adds the convenience of iterators to the elements, so we can navigate easily using methods such as each(), collect(), and find().
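A small sketch of those iterator methods on an inline document follows. The XML string is a shortened, assumed version of the sample file, and the import assumes Groovy 3 or later (on older versions XmlParser is imported automatically from groovy.util).

```groovy
import groovy.xml.XmlParser

def root = new XmlParser().parseText('''<languages>
  <language name="Java"><author>Gosling</author></language>
  <language name="Pascal"><author>Wirth</author></language>
  <language name="Oberon-2"><author>Wirth</author></language>
</languages>''')

// collect() gathers one value per matching child element
def names = root.language.collect { it.@name }

// find() returns the first element that satisfies the closure
def firstByWirth = root.language.find { it.author.text() == 'Wirth' }.@name

println "All: ${names.join(', ')}; first by Wirth: ${firstByWirth}"
```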

Page 15: Groovy Xml processing

XmlParser

languages = new XmlParser().parse('languages.xml')

println "Languages and authors"
languages.each {
    println "${it.@name} authored by ${it.author[0].text()}"
}

def languagesByAuthor = { authorName ->
    languages.findAll { it.author[0].text() == authorName }.collect { it.@name }.join(', ')
}

println "Languages by Wirth:" + languagesByAuthor('Wirth')

Page 16: Groovy Xml processing

Downside

• It does not preserve the XML InfoSet (see References), and it ignores the XML comments and processing instructions in documents.

• For large documents, the memory usage of XmlParser might become prohibitive.

Page 17: Groovy Xml processing

XmlSlurper

Nearly the same code as XmlParser; only the class name and the languages.language navigation differ.

languages = new XmlSlurper().parse('languages.xml')

println "Languages and authors"
languages.language.each {
    println "${it.@name} authored by ${it.author[0].text()}"
}

def languagesByAuthor = { authorName ->
    languages.language.findAll { it.author[0].text() == authorName }.collect {
        it.@name }.join(', ')
}

println "Languages by Wirth:" + languagesByAuthor('Wirth')

Page 18: Groovy Xml processing

XmlSlurper

• Namespaces

<languages xmlns:computer="Computer" xmlns:natural="Natural">
  <computer:language name="Java"/>
  <computer:language name="Groovy"/>
  <computer:language name="Erlang"/>
  <natural:language name="English"/>
  <natural:language name="German"/>
  <natural:language name="French"/>
</languages>

Page 19: Groovy Xml processing

XmlSlurper

languages = new XmlSlurper().parse(
    'computerAndNaturalLanguages.xml').declareNamespace(human: 'Natural')

print "Languages: "
println languages.language.collect { it.@name }.join(', ')

print "Natural languages: "
println languages.'human:language'.collect { it.@name }.join(', ')

Output:

Languages: Java, Groovy, Erlang, English, German, French
Natural languages: English, German, French

Page 20: Groovy Xml processing

XmlParser vs XmlSlurper

• The difference is that the parser's structure is evaluated only once, while the slurper's paths may be evaluated on demand. "On demand" can be read as "more memory efficient but slower".

• Ultimately it depends on how many paths or requests you make.

– If you only want to know the value of an attribute in a certain part of the XML and then be done with it:
  • XmlParser processes all nodes, so a lot of objects are created and memory and CPU are spent.
  • XmlSlurper will not create the extra objects.

– If you need all parts of the document anyway, the slurper loses its advantage and will be slower.

Page 21: Groovy Xml processing

XmlParser vs XmlSlurper

• Both can transform the document, but the slurper assumes the document is constant, so you first have to write the changes out and create a new slurper to read the new XML in. The parser lets you see the changes right away.
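The difference can be sketched as follows. The inline XML and variable names are assumptions for illustration, XmlUtil.serialize is just one way of writing the changes out, and the imports assume Groovy 3 or later.

```groovy
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

def source = '<languages><language name="Java"/></languages>'

// XmlParser: the appended node is part of the tree immediately
def parsed = new XmlParser().parseText(source)
parsed.appendNode('language', [name: 'Groovy'])
def parserCount = parsed.language.size()

// XmlSlurper: the change is recorded but not visible through the original result
def slurped = new XmlSlurper().parseText(source)
slurped.appendNode { language(name: 'Groovy') }
def slurperCount = slurped.language.size()

// Writing the document out and slurping it again makes the change visible
def reparsed = new XmlSlurper().parseText(XmlUtil.serialize(slurped))
def reparsedCount = reparsed.language.size()

println "parser: ${parserCount}, slurper: ${slurperCount}, after re-reading: ${reparsedCount}"
```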

Page 22: Groovy Xml processing

Creating XML

• Again, Groovy doesn’t force us to use its own facilities. We can also use the full power of Java-based XML processors, such as Xerces, from Groovy.

Page 23: Groovy Xml processing

Comparing Java and Groovy XML generation

import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringWriter;

public class CreateXml {
    public static void main(String[] args) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.newDocument();

            Element langs = doc.createElement("langs");
            langs.setAttribute("type", "current");
            doc.appendChild(langs);

            Element language1 = doc.createElement("language");
            Text text1 = doc.createTextNode("Java");
            language1.appendChild(text1);
            langs.appendChild(language1);

            Element language2 = doc.createElement("language");
            Text text2 = doc.createTextNode("Groovy");
            language2.appendChild(text2);
            langs.appendChild(language2);

Page 24: Groovy Xml processing

            // Output the XML
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter sw = new StringWriter();
            StreamResult sr = new StreamResult(sw);
            DOMSource source = new DOMSource(doc);
            transformer.transform(source, sr);
            String xmlString = sw.toString();
            System.out.println(xmlString);
        } catch (ParserConfigurationException pce) {
            pce.printStackTrace();
        } catch (TransformerConfigurationException e) {
            e.printStackTrace();
        } catch (TransformerException e) {
            e.printStackTrace();
        }
    }
}

Page 25: Groovy Xml processing

Comparing Java and Groovy XML generation

• I know that some of you are crying, "Foul!" right now. Plenty of third-party libraries can make this code more straightforward; JDOM and dom4j are two popular ones. But none of the Java libraries comes close to the simplicity of using a Groovy MarkupBuilder.

Page 26: Groovy Xml processing

Comparing Java and Groovy XML generation

def xml = new groovy.xml.MarkupBuilder()
xml.langs(type: "current") {
    language("Java")
    language("Groovy")
    language("JavaScript")
}

That’s it!

• We are back to a nearly 1:1 ratio of code to XML.

• It's almost like a DSL for building XML, thanks to Groovy's Metaobject Protocol (MOP).

Page 27: Groovy Xml processing

Metaobject Protocol (MOP)

• Metaprogramming means writing programs that manipulate programs, including themselves.

• In Groovy, we can use the MOP to invoke methods dynamically and to synthesize classes and methods on the fly. This can give us the feeling that our object has conveniently changed its class.

Page 28: Groovy Xml processing

Metaobject Protocol (MOP)

• The Java language is static: the Java compiler ensures that all methods exist before you can call them.

• Groovy's builders demonstrate that one language's bug is another language's feature.

• The API docs for MarkupBuilder contain no langs() method, no language() method, nor any other element name.

• Luckily, Groovy can catch these calls to methods that don't exist and do something productive with them. In the case of a MarkupBuilder, it takes the phantom method calls and generates well-formed XML.
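To illustrate the mechanism, here is a toy sketch built on Groovy's methodMissing hook. TinyTagBuilder is a hypothetical class invented for this example and is not how MarkupBuilder is actually implemented; it only shows that a call to an undeclared method can be intercepted and turned into markup.

```groovy
// Hypothetical toy class: every call to an undeclared method becomes an XML element
class TinyTagBuilder {
    def methodMissing(String name, args) {
        // Use the method name as the tag and the first argument, if any, as its text
        "<${name}>${args ? args[0] : ''}</${name}>".toString()
    }
}

def tiny = new TinyTagBuilder()
def tag = tiny.language('Groovy')   // no language() method is declared anywhere
println tag
```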

Page 29: Groovy Xml processing

GString: It's Not What You Think!

• Snapshot from the Groovy API documentation

Nice suggestion, Gosnell!

Page 30: Groovy Xml processing

GString

• We can use GString’s ability to embed expressions into a string, along with Groovy’s facility for creating multiline strings. This facility is useful for creating small XML fragments that we may need in code and tests.

Page 31: Groovy Xml processing

langs = ['C++' : 'Stroustrup', 'Java' : 'Gosling', 'Lisp' : 'McCarthy']

content = ''
langs.each { language, author ->
    fragment = """
  <language name="${language}">
    <author>${author}</author>
  </language>
"""
    content += fragment
}

xml = "<languages>${content}</languages>"
println xml

Page 32: Groovy Xml processing

Downside

• This only works for small fragments of XML.

• The preferred approach in Groovy applications is to use builders, so we don’t have to mess with string manipulation.
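One practical reason to prefer builders, offered here as an illustration rather than as the slide's own argument, is escaping: a GString inserts values verbatim, while MarkupBuilder escapes characters that XML reserves. The value 'R&D' is an assumed example.

```groovy
import groovy.xml.MarkupBuilder

def name = 'R&D'   // assumed sample value containing a reserved character

// GString interpolation copies the value as-is, so the bare & makes the XML ill-formed
def viaGString = "<language>${name}</language>".toString()

// MarkupBuilder escapes reserved characters in element text
def writer = new StringWriter()
new MarkupBuilder(writer).language(name)
def viaBuilder = writer.toString()

println viaGString
println viaBuilder
```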

Page 33: Groovy Xml processing

MarkupBuilder

def sw = new StringWriter()
def html = new groovy.xml.MarkupBuilder(sw)
html.html {
    head {
        title("Links")
    }
    body {
        h1("Here are my HTML bookmarks")
        table(border: 1) {
            tr {
                th("what")
                th("where")
            }
            tr {
                td("Groovy Articles")
                td {
                    a(href: "http://ibm.com/developerworks", "DeveloperWorks")
                }
            }
        }
    }
}

def f = new File("index.html")
f.write(sw.toString())

Page 34: Groovy Xml processing

MarkupBuilder output:

<html>
  <head>
    <title>Links</title>
  </head>
  <body>
    <h1>Here are my HTML bookmarks</h1>
    <table border='1'>
      <tr>
        <th>what</th>
        <th>where</th>
      </tr>
      <tr>
        <td>Groovy Articles</td>
        <td>
          <a href='http://ibm.com/developerworks'>DeveloperWorks</a>
        </td>
      </tr>
    </table>
  </body>
</html>

Page 35: Groovy Xml processing

Downside

• For large XML documents, it is not memory efficient.

• It lacks support for some XML structures, such as namespaces, processing instructions, and comments.

• For these reasons, StreamingMarkupBuilder should be used instead.

Page 36: Groovy Xml processing

StreamingMarkupBuilder

langs = ['C++' : 'Stroustrup', 'Java' : 'Gosling', 'Lisp' : 'McCarthy']

xmlDocument = new groovy.xml.StreamingMarkupBuilder().bind {
    mkp.xmlDeclaration()
    mkp.declareNamespace(computer: "Computer")
    languages {
        comment << "Created using StreamingMarkupBuilder"
        langs.each { key, value ->
            computer.language(name: key) {
                author(value)
            }
        }
    }
}

println xmlDocument

Page 37: Groovy Xml processing

StreamingMarkupBuilder

<?xml version="1.0"?>
<languages xmlns:computer='Computer'>
  <!--Created using StreamingMarkupBuilder-->
  <computer:language name='C++'>
    <author>Stroustrup</author>
  </computer:language>
  <computer:language name='Java'>
    <author>Gosling</author>
  </computer:language>
  <computer:language name='Lisp'>
    <author>McCarthy</author>
  </computer:language>
</languages>

Page 38: Groovy Xml processing

MarkupBuilder vs StreamingMarkupBuilder

• MarkupBuilder creates a representation of the document in memory, which is then written out to whichever stream is designated.

• StreamingMarkupBuilder only evaluates the document on demand, where the demand is driven by the writer requesting the next item.

• This inversion of control means that, for StreamingMarkupBuilder, the document is never actually represented in memory; only the program that generates the document is.
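The on-demand behaviour can be observed with a small sketch. The counter variable is an assumption added purely to make the evaluation visible: the closure passed to bind does not run until the result is actually written.

```groovy
import groovy.xml.StreamingMarkupBuilder

def evaluated = 0   // counts how many times the markup closure has run

def doc = new StreamingMarkupBuilder().bind {
    evaluated++
    languages {
        language('Groovy')
    }
}
def before = evaluated          // bind() alone has not run the closure yet

def out = new StringWriter()
doc.writeTo(out)                // writing the document triggers the evaluation
def after = evaluated

println "before: ${before}, after: ${after}"
println out
```

Because the document is regenerated on each write, any expensive work inside the closure is repeated every time the result is written or converted to a string.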

Page 39: Groovy Xml processing

Wait, XML processing is not OXM

• OXM, Object/XML Mapping, is the act of converting an XML document to and from an object.

• Sometimes it is important to work with Java objects instead of their XML form, e.g. for SOAP web service requests and responses.

• In web services we need to send and receive the exact field types, so the dynamic nature of Groovy doesn't have much to offer for these static, direct field mappings. Instead, the Java Architecture for XML Binding (JAXB) or another library (e.g. JiBX) should be used.

Page 40: Groovy Xml processing

XML using Groovy: Conclusions

• What we covered is comparable to the Java API for XML Processing (JAXP).

• Groovy greatly simplifies XML processing and results in much shorter and more expressive code.

• GPath lets us traverse XML and POGOs in a similar way.

• Builders and parsers use the MOP and Groovy's dynamic features to provide a DSL-like experience for XML processing.

• Groovy is strongly recommended for XML processing, especially if we are about to write new code.

Page 41: Groovy Xml processing

References

• InfoSet: http://www.informit.com/library/content.aspx?b=STY_XML_21days&seqNum=40