Groovy Xml processing

XML Processing

The Groovy approach

XML is like a human: it starts out cute when it’s small and gets annoying when it becomes bigger.

AGENDA

• XML: should we Groovy it?

• Parsing XML

• Comparing Java and Groovy XML parsing

• DOMCategory

• Downsides

• What’s GPath

• Using XmlParser

• Downsides

• Using XmlSlurper

• XmlParser vs. XmlSlurper

• So here is my advice

• Creating XML

• Comparing Java and Groovy XML generation

• GString: It's Not What You Think!

• MarkupBuilder

• StreamingMarkupBuilder

• Comparing builders

• Wait, XML processing is not OXM

• XML using Groovy: conclusion

XML: should we Groovy it?

• Groovy does not force us to duplicate our efforts.

• Use the Java-based approaches as needed, especially for legacy XML processing code.

• If we’re writing new code to process XML, though, we should use the Groovy facilities.

Here is why.

Sample XML

    <langs type="current">
        <language>Java</language>
        <language>Groovy</language>
        <language>JavaScript</language>
    </langs>

• Parsing this trivial XML document is decidedly nontrivial in Java: it takes about 30 lines of code.

    import org.xml.sax.SAXException;
    import org.w3c.dom.*;
    import javax.xml.parsers.*;
    import java.io.IOException;

    public class ParseXml {
        public static void main(String[] args) {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            try {
                DocumentBuilder db = dbf.newDocumentBuilder();
                Document doc = db.parse("src/languages.xml");

                // print the "type" attribute
                Element langs = doc.getDocumentElement();
                System.out.println("type = " + langs.getAttribute("type"));

                // print the "language" elements
                NodeList list = langs.getElementsByTagName("language");
                for (int i = 0; i < list.getLength(); i++) {
                    Element language = (Element) list.item(i);
                    System.out.println(language.getTextContent());
                }
            } catch (ParserConfigurationException pce) {
                pce.printStackTrace();
            } catch (SAXException se) {
                se.printStackTrace();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            }
        }
    }

Groovy Code

    def langs = new XmlParser().parse("languages.xml")
    println "type = ${langs.attribute("type")}"
    langs.language.each {
        println it.text()
    }

    // output:
    // type = current
    // Java
    // Groovy
    // JavaScript

Groovy 1-0 Java

• The Groovy code is significantly shorter than the equivalent Java code.

• It is also far more expressive. Writing langs.language.each feels like working directly with the XML, unlike the Java version, thanks to the dynamic nature of Groovy and GPath.

DOMCategory

• We can use Groovy categories to define dynamic methods on classes (an idea borrowed from Objective-C).

• Groovy provides a category for working with the Document Object Model (DOM) that adds convenience methods.

• DOMCategory lets us navigate the DOM structure using the DOM API, with the convenience of GPath queries.

What’s GPath?

• GPath works much like XPath, which navigates the hierarchy of an XML document, but GPath navigates the hierarchy of both objects (POJOs/POGOs) and XML using dot notation.

• Example: car.engine.power

    XML:
    <car year="20">
        <engine>
            <power/>
        </engine>
    </car>

    POGO/POJO: car.getEngine().getPower()

• We can access the year attribute of a car using car.'@year' (or car.@year).

For more info: http://groovy.codehaus.org/GPath
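As a minimal sketch of the dot notation above, the following script walks the same car.engine.power path over a POGO and over XML. The Car and Engine classes and the values 2010 and 150 are illustrative assumptions, not part of the original slides. The groovy.xml.XmlSlurper import assumes Groovy 3 or later (older releases ship the class as groovy.util.XmlSlurper).

```groovy
import groovy.xml.XmlSlurper

// Hypothetical POGOs, named after the car/engine example above.
class Engine { int power }
class Car { int year; Engine engine }

// GPath over objects: property access goes through getEngine().getPower().
def pogo = new Car(year: 2010, engine: new Engine(power: 150))
def pogoPower = pogo.engine.power

// GPath over XML: the same dot notation, plus @ for attributes.
def xml = new XmlSlurper().parseText('''
<car year="2010">
  <engine><power>150</power></engine>
</car>''')
def xmlPower = xml.engine.power.text()
def xmlYear = xml.@year.text()
def xmlYearQuoted = xml.'@year'.text()   // quoted form of the same attribute access
```

The XML side returns text, so a conversion such as toInteger() is still needed before comparing it with the typed POGO property.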

Sample XML

    <languages>
        <language name="C++">
            <author>Stroustrup</author>
        </language>
        <language name="Java">
            <author>Gosling</author>
        </language>
        <language name="Lisp">
            <author>McCarthy</author>
        </language>
        <language name="Modula-2">
            <author>Wirth</author>
        </language>
        <language name="Oberon-2">
            <author>Wirth</author>
        </language>
        <language name="Pascal">
            <author>Wirth</author>
        </language>
    </languages>

    document = groovy.xml.DOMBuilder.parse(new FileReader('languages.xml'))
    rootElement = document.documentElement

    use(groovy.xml.dom.DOMCategory) {
        println "Languages and authors"
        languages = rootElement.language

        languages.each { language ->
            println "${language.'@name'} authored by ${language.author[0].text()}"
        }

        def languagesByAuthor = { authorName ->
            languages.findAll { it.author[0].text() == authorName }.collect {
                it.'@name' }.join(', ')
        }

        println "Languages by Wirth:" + languagesByAuthor('Wirth')
    }

DOMCategory output

    Languages and authors
    C++ authored by Stroustrup
    Java authored by Gosling
    Lisp authored by McCarthy
    Modula-2 authored by Wirth
    Oberon-2 authored by Wirth
    Pascal authored by Wirth
    Languages by Wirth:Modula-2, Oberon-2, Pascal

Downside

• One restriction is that we must place the code inside a use block.

XmlParser

• The class groovy.util.XmlParser exploits Groovy’s dynamic typing and metaprogramming capabilities.

• The code is much like the DOMCategory example we just saw, without the use block.

• XmlParser adds the convenience of iterators to the elements, so we can navigate easily using methods such as each(), collect(), and find().

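    languages = new XmlParser().parse('languages.xml')

    println "Languages and authors"
    languages.each {
        println "${it.@name} authored by ${it.author[0].text()}"
    }

    def languagesByAuthor = { authorName ->
        languages.findAll { it.author[0].text() == authorName }.collect { it.@name }.join(', ')
    }

    println "Languages by Wirth:" + languagesByAuthor('Wirth')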


XmlParser downsides

• It does not preserve the XML InfoSet (see References), and it ignores the XML comments and processing instructions in documents.

• For large documents, the memory usage of XmlParser might become prohibitive.

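XmlSlurper

Nearly the same code as with XmlParser:

    languages = new XmlSlurper().parse('languages.xml')

    println "Languages and authors"
    languages.language.each {
        println "${it.@name} authored by ${it.author[0].text()}"
    }

    def languagesByAuthor = { authorName ->
        languages.language.findAll { it.author[0].text() == authorName }.collect { it.@name }.join(', ')
    }

    println "Languages by Wirth:" + languagesByAuthor('Wirth')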


XmlSlurper and namespaces

    <languages xmlns:computer="Computer" xmlns:natural="Natural">
        <computer:language name="Java"/>
        <computer:language name="Groovy"/>
        <computer:language name="Erlang"/>
        <natural:language name="English"/>
        <natural:language name="German"/>
        <natural:language name="French"/>
    </languages>

    languages = new XmlSlurper().parse(
        'computerAndNaturalLanguages.xml').declareNamespace(human: 'Natural')

    print "Languages: "
    println languages.language.collect { it.@name }.join(', ')

    print "Natural languages: "
    println languages.'human:language'.collect { it.@name }.join(', ')

Output:

    Languages: Java, Groovy, Erlang, English, German, French
    Natural languages: English, German, French

XmlParser vs. XmlSlurper

• XmlParser evaluates the document structure once, up front; XmlSlurper may evaluate paths on demand. "On demand" can be read as "more memory efficient but slower".

• Ultimately, the choice depends on how many paths or requests we evaluate:

– If we only want the value of an attribute in a certain part of the XML and are then done with it, XmlParser still processes all nodes and creates a lot of objects, spending memory and CPU, whereas XmlSlurper does not create the extra objects.

– If we need all parts of the document anyway, XmlSlurper loses its advantage and will be slower.

• Both can transform the document, but XmlSlurper assumes the document is constant, so we first have to write the changes out and create a new slurper to read the new XML in. XmlParser lets us see the changes right away.
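The difference in how updates become visible can be sketched as follows. This is an illustration under stated assumptions, not code from the slides: the one-element input document is made up, the groovy.xml imports assume Groovy 3 or later, and XmlUtil.serialize is just one way to write the slurped document back out before re-reading it.

```groovy
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

def source = '<langs><language>Java</language></langs>'   // illustrative input

// XmlParser: the Node tree is mutable and queries see the change at once.
def parsed = new XmlParser().parseText(source)
parsed.appendNode('language', 'Groovy')
def parserCount = parsed.language.size()

// XmlSlurper: record the change, write the document out, then read it back in.
def slurped = new XmlSlurper().parseText(source)
slurped.appendNode { language('Groovy') }
def rewritten = XmlUtil.serialize(slurped)
def reparsed = new XmlSlurper().parseText(rewritten)
def reparsedCount = reparsed.language.size()
```

The extra serialize-and-reparse round trip is the cost the bullet above describes for XmlSlurper.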

Creating XML

• Again, Groovy doesn’t force us to use its own facilities. We can also use the full power of Java-based XML processors, such as Xerces, from Groovy.

Comparing Java and Groovy XML generation

    import org.w3c.dom.*;
    import javax.xml.parsers.*;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import java.io.StringWriter;

    public class CreateXml {
        public static void main(String[] args) {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            try {
                DocumentBuilder db = dbf.newDocumentBuilder();
                Document doc = db.newDocument();

                Element langs = doc.createElement("langs");
                langs.setAttribute("type", "current");
                doc.appendChild(langs);

                Element language1 = doc.createElement("language");
                Text text1 = doc.createTextNode("Java");
                language1.appendChild(text1);
                langs.appendChild(language1);

                Element language2 = doc.createElement("language");
                Text text2 = doc.createTextNode("Groovy");
                language2.appendChild(text2);
                langs.appendChild(language2);

                // Output the XML
                TransformerFactory tf = TransformerFactory.newInstance();
                Transformer transformer = tf.newTransformer();
                transformer.setOutputProperty(OutputKeys.INDENT, "yes");
                StringWriter sw = new StringWriter();
                StreamResult sr = new StreamResult(sw);
                DOMSource source = new DOMSource(doc);
                transformer.transform(source, sr);
                String xmlString = sw.toString();
                System.out.println(xmlString);
            } catch (ParserConfigurationException pce) {
                pce.printStackTrace();
            } catch (TransformerConfigurationException e) {
                e.printStackTrace();
            } catch (TransformerException e) {
                e.printStackTrace();
            }
        }
    }


• I know that some of you are crying "Foul!" right now. Plenty of third-party libraries can make this code more straightforward; JDOM and dom4j are two popular ones. But none of the Java libraries comes close to the simplicity of using a Groovy MarkupBuilder.


    def xml = new groovy.xml.MarkupBuilder()
    xml.langs(type: "current") {
        language("Java")
        language("Groovy")
        language("JavaScript")
    }

That’s it!

• We are back to a nearly 1:1 ratio of code to XML.

• It's almost like a DSL for building XML, thanks to Groovy's Metaobject Protocol (MOP).

Metaobject Protocol (MOP)

• Metaprogramming means writing programs that manipulate programs, including themselves.

• In Groovy, we can use the MOP to invoke methods dynamically and to synthesize classes and methods on the fly. This can give us the feeling that our object favorably changed its class.

• The Java language is static: the Java compiler ensures that all methods exist before we can call them.

• Groovy's builders demonstrate that one language's bug is another language's feature.

• The API docs for MarkupBuilder contain no langs() method, no language() method, and no method for any other element name.

• Luckily, Groovy can catch these calls to methods that don't exist and do something productive with them. In the case of MarkupBuilder, it takes the phantom method calls and generates well-formed XML.
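To make the idea of catching phantom method calls concrete, here is a toy sketch built on Groovy's methodMissing hook. TinyXmlBuilder is a hypothetical class written for this illustration; it is not how MarkupBuilder is actually implemented, and it does no escaping or indentation.

```groovy
// Hypothetical toy builder; illustrates the MOP hook only.
class TinyXmlBuilder {
    private final StringBuilder out = new StringBuilder()

    // Groovy calls methodMissing when no declared method matches the call.
    def methodMissing(String name, args) {
        def argList = args as List
        def attrs = argList.find { it instanceof Map } ?: [:]
        def body = argList.find { it instanceof Closure }
        def text = argList.find { it instanceof CharSequence }

        out << "<${name}"
        for (e in attrs) {
            out << " ${e.key}='${e.value}'"
        }
        out << '>'
        if (text != null) out << text
        if (body) {
            // Route nested phantom calls back to this builder.
            body.delegate = this
            body.resolveStrategy = Closure.DELEGATE_FIRST
            body.call()
        }
        out << "</${name}>"
        this
    }

    String toString() { out.toString() }
}

def builder = new TinyXmlBuilder()
builder.langs(type: 'current') {
    language('Java')
    language('Groovy')
}
def result = builder.toString()
```

Neither langs() nor language() is declared anywhere; every element name arrives as the name argument of methodMissing.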

GString: It's Not What You Think!

• Snapshot from the Groovy API documentation

Nice suggestion, Gosnell!

GString

• We can use GString’s ability to embed expressions into a string, along with Groovy’s facility for creating multiline strings. This is useful for creating small XML fragments that we may need in code and tests.

    langs = ['C++' : 'Stroustrup', 'Java' : 'Gosling', 'Lisp' : 'McCarthy']
    content = ''

    langs.each { language, author ->
        fragment = """
      <language name="${language}">
        <author>${author}</author>
      </language>
    """
        content += fragment
    }

    xml = "<languages>${content}</languages>"
    println xml

Downside

• It only works well for small fragments of XML.

• The preferred approach in Groovy applications is to use builders, so we don’t have to mess with string manipulation.
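One concrete reason to avoid string manipulation is escaping. The sketch below compares plain GString interpolation with MarkupBuilder for a value that contains XML special characters; the value itself is an illustrative assumption.

```groovy
import groovy.xml.MarkupBuilder

def name = 'C & C++ <legacy>'   // illustrative value with XML special characters

// Plain GString interpolation copies the characters through unchanged,
// so the resulting fragment is not well-formed XML.
def byHand = "<language>${name}</language>".toString()

// MarkupBuilder escapes the text content while writing it.
def writer = new StringWriter()
new MarkupBuilder(writer).language(name)
def built = writer.toString()
```

With the GString version, escaping is our responsibility; with the builder it happens as part of generating the element.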

MarkupBuilder

    def sw = new StringWriter()
    def html = new groovy.xml.MarkupBuilder(sw)

    html.html {
        head {
            title("Links")
        }
        body {
            h1("Here are my HTML bookmarks")
            table(border: 1) {
                tr {
                    th("what")
                    th("where")
                }
                tr {
                    td("Groovy Articles")
                    td {
                        a(href: "http://ibm.com/developerworks", "DeveloperWorks")
                    }
                }
            }
        }
    }

    def f = new File("index.html")
    f.write(sw.toString())

MarkupBuilder output:

    <html>
      <head>
        <title>Links</title>
      </head>
      <body>
        <h1>Here are my HTML bookmarks</h1>
        <table border='1'>
          <tr>
            <th>what</th>
            <th>where</th>
          </tr>
          <tr>
            <td>Groovy Articles</td>
            <td>
              <a href='http://ibm.com/developerworks'>DeveloperWorks</a>
            </td>
          </tr>
        </table>
      </body>
    </html>

Downside

• For large XML documents, it’s not memory efficient.

• It misses some XML constructs, such as namespaces, processing instructions, and comments.

• For these reasons, StreamingMarkupBuilder should be used instead.

StreamingMarkupBuilder

    langs = ['C++' : 'Stroustrup', 'Java' : 'Gosling', 'Lisp' : 'McCarthy']

    xmlDocument = new groovy.xml.StreamingMarkupBuilder().bind {
        mkp.xmlDeclaration()
        mkp.declareNamespace(computer: "Computer")
        languages {
            comment << "Created using StreamingMarkupBuilder"
            langs.each { key, value ->
                computer.language(name: key) {
                    author(value)
                }
            }
        }
    }

    println xmlDocument

StreamingMarkupBuilder output (formatted for readability):

    <?xml version="1.0"?>
    <languages xmlns:computer='Computer'>
      <!--Created using StreamingMarkupBuilder-->
      <computer:language name='C++'>
        <author>Stroustrup</author>
      </computer:language>
      <computer:language name='Java'>
        <author>Gosling</author>
      </computer:language>
      <computer:language name='Lisp'>
        <author>McCarthy</author>
      </computer:language>
    </languages>

MarkupBuilder vs. StreamingMarkupBuilder

• MarkupBuilder creates a representation of the document in memory, which is then written out to whichever stream is designated.

• StreamingMarkupBuilder only evaluates the document on demand, where the demand is driven by the writer requesting the next item.

• This inversion of control means that with StreamingMarkupBuilder the document is never actually represented in memory; only the program that generates the document is.
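The on-demand behavior can be observed with a small sketch. The element names and the map contents are illustrative assumptions; the point is that bind() only captures the closure, and the markup is generated when the result is written to a writer.

```groovy
import groovy.xml.StreamingMarkupBuilder

def langs = ['C++': 'Stroustrup', 'Java': 'Gosling']   // illustrative data

// bind() captures the closure; no markup has been generated yet.
def writable = new StreamingMarkupBuilder().bind {
    languages {
        langs.each { name, creator ->
            language(name: name) { author(creator) }
        }
    }
}

// This entry is added after bind() but still appears in the output,
// because the closure only runs once the writer asks for the markup.
langs['Lisp'] = 'McCarthy'

def out = new StringWriter()
writable.writeTo(out)
def xml = out.toString()
```

In real use the StringWriter would typically be replaced by a file or response writer, so the generated markup goes straight to its destination.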

Wait, XML processing is not OXM

• OXM (Object/XML Mapping) is the act of converting an XML document to and from an object.

• It matters when we need to work with Java objects instead of their XML, e.g. SOAP web service requests and responses.

• In web services we need to send and receive exact field types, so Groovy's dynamic nature has little to offer for this kind of static, direct field mapping. Instead, the Java Architecture for XML Binding (JAXB) or another library (e.g. JiBX) should be used.

XML using Groovy: Conclusions

• What we covered can be compared with JAXP, the Java API for XML Processing.

• Groovy greatly simplifies XML processing and results in much shorter, more expressive code.

• GPath lets us traverse XML and POGOs in a similar way.

• Builders and parsers use the MOP and Groovy's dynamic features to provide a DSL-like experience for XML processing.

• Groovy is strongly recommended for XML processing, especially when we are about to write new code.

References

• InfoSet: http://www.informit.com/library/content.aspx?b=STY_XML_21days&seqNum=40
