Groovy Xml processing
Uploaded by mohamed-fawy
XML Processing
The Groovy approach
XML is like a human: it starts out cute when it’s small and gets annoying when it becomes bigger.
AGENDA
• XML, should we Groovy it?
• Parsing XML
• Comparing Java and Groovy XML parsing
• DOMCategory
• Downsides
• What’s GPath?
• Using XmlParser
• Downsides
• Using XmlSlurper
• XmlParser vs XmlSlurper
• So here is my advice
• Creating XML
• Comparing Java and Groovy XML generation
• GString, it's not what you think!
• MarkupBuilder
• StreamingMarkupBuilder
• Comparing builders
• Wait, XML processing is not OXM
• XML using Groovy: conclusion
XML, should we Groovy it?
• Groovy does not force us to duplicate our efforts.
• Use the Java-based approaches as needed, especially for legacy XML processing code.
• If we are writing new code to process XML, though, we should use the Groovy facilities.
Here is why.
Sample XML
<langs type="current">
<language>Java</language>
<language>Groovy</language>
<language>JavaScript</language>
</langs>
• Parsing this trivial XML document is decidedly nontrivial in Java: it takes about 30 lines of code.
import org.xml.sax.SAXException;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.IOException;

public class ParseXml {
    public static void main(String[] args) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse("src/languages.xml");

            // print the "type" attribute
            Element langs = doc.getDocumentElement();
            System.out.println("type = " + langs.getAttribute("type"));

            // print the "language" elements
            NodeList list = langs.getElementsByTagName("language");
            for (int i = 0; i < list.getLength(); i++) {
                Element language = (Element) list.item(i);
                System.out.println(language.getTextContent());
            }
        } catch (ParserConfigurationException pce) {
            pce.printStackTrace();
        } catch (SAXException se) {
            se.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}
Groovy Code
def langs = new XmlParser().parse("languages.xml")
println "type = ${langs.attribute("type")}"
langs.language.each {
    println it.text()
}
// output:
// type = current
// Java
// Groovy
// JavaScript
Groovy 1-0 Java
• The Groovy code is significantly shorter than the equivalent Java code.
• It is also far more expressive: writing langs.language.each feels like working directly with the XML rather than with an API, thanks to the dynamic nature of Groovy and GPath.
DOMCategory
• We can use Groovy categories to define dynamic methods on classes (an idea borrowed from Objective-C).
• Groovy provides a category for working with the Document Object Model (DOM) that adds convenience methods.
• DOMCategory lets us navigate the DOM structure using the DOM API with the convenience of GPath queries.
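To make the category idea concrete before turning to DOMCategory, here is a minimal sketch. ShoutCategory and its shout() method are hypothetical names invented for this illustration; they are not part of Groovy or of the original slides.

```groovy
// Hypothetical category class, for illustration only
class ShoutCategory {
    // The first parameter is the object the method is added to
    static String shout(String self) {
        self.toUpperCase() + '!'
    }
}

def shouted
use(ShoutCategory) {
    // Inside the block every String gains a shout() method
    shouted = 'groovy'.shout()
}

// Outside the block the method is gone again
def availableOutside = true
try {
    'groovy'.shout()
} catch (MissingMethodException ignored) {
    availableOutside = false
}

println "$shouted, available outside use block: $availableOutside"
```

DOMCategory works the same way: its static methods are only visible inside the use block, which is why the DOM examples have to be wrapped in one.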
What’s GPath?
• Much as XPath helps navigate the hierarchy of an XML document, GPath lets us navigate the hierarchy of both objects (POJO/POGO) and XML using dot notation.
• Example: car.engine.power
XML: <car year="20"><engine><power/></engine></car>
POGO/POJO: car.getEngine().getPower()
• We can access the year attribute of a car using car.'@year' (or car.@year).
For more info : http://groovy.codehaus.org/GPath
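The following sketch shows the same dot notation applied to a POGO and to XML. The Car and Engine classes and the sample values are assumptions made up for the illustration, and the import assumes Groovy 3 or later (on older versions the class lives in groovy.util).

```groovy
import groovy.xml.XmlSlurper   // Groovy 3+; older versions use groovy.util.XmlSlurper

// Hypothetical POGOs, for illustration only
class Engine { int power }
class Car { int year; Engine engine }

// GPath on objects: property access stands in for getEngine().getPower()
def pogo = new Car(year: 2010, engine: new Engine(power: 150))
def pogoPower = pogo.engine.power

// GPath on XML: the same path, plus @ for attributes
def xml = new XmlSlurper().parseText(
    '<car year="2010"><engine><power>150</power></engine></car>')
def xmlPower = xml.engine.power.text()
def xmlYear = xml.@year.text()

println "POGO power: $pogoPower, XML power: $xmlPower, XML year: $xmlYear"
```

Note that values read from XML come back as text, so a numeric comparison would need a conversion such as toInteger().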
Sample XML
<languages>
<language name="C++">
<author>Stroustrup</author>
</language>
<language name="Java">
<author>Gosling</author>
</language>
<language name="Lisp">
<author>McCarthy</author>
</language>
<language name="Modula-2">
<author>Wirth</author>
</language>
<language name="Oberon-2">
<author>Wirth</author>
</language>
<language name="Pascal">
<author>Wirth</author>
</language>
</languages>
document = groovy.xml.DOMBuilder.parse(new FileReader('languages.xml'))
rootElement = document.documentElement
use(groovy.xml.dom.DOMCategory) {
    println "Languages and authors"
    languages = rootElement.language
    languages.each { language ->
        println "${language.'@name'} authored by ${language.author[0].text()}"
    }
    def languagesByAuthor = { authorName ->
        languages.findAll { it.author[0].text() == authorName }.collect {
            it.'@name'
        }.join(', ')
    }
    println "Languages by Wirth:" + languagesByAuthor('Wirth')
}
DOMCategory output
Languages and authors
C++ authored by Stroustrup
Java authored by Gosling
Lisp authored by McCarthy
Modula-2 authored by Wirth
Oberon-2 authored by Wirth
Pascal authored by Wirth
Languages by Wirth:Modula-2, Oberon-2, Pascal
Downside
• One restriction is that the code must be placed inside a use block.
XmlParser
• The class groovy.util.XmlParser exploits Groovy’s dynamic typing and metaprogramming capabilities.
• The code is much like the DOMCategory example, without the use block.
• XmlParser adds the convenience of iterators to the elements, so we can navigate easily using methods such as each(), collect(), and find().
languages = new XmlParser().parse('languages.xml')
println "Languages and authors"
languages.each {
    println "${it.@name} authored by ${it.author[0].text()}"
}
def languagesByAuthor = { authorName ->
    languages.findAll { it.author[0].text() == authorName }.collect { it.@name }.join(', ')
}
println "Languages by Wirth:" + languagesByAuthor('Wirth')
XmlParser
Downside
• It does not preserve the XML InfoSet (see References), and it ignores the XML comments and processing instructions in documents.
• For large documents, the memory usage of XmlParser might become prohibitive.
XmlSlurper
Nearly the same code as with XmlParser
languages = new XmlSlurper().parse('languages.xml')
println "Languages and authors"
languages.language.each {
    println "${it.@name} authored by ${it.author[0].text()}"
}
def languagesByAuthor = { authorName ->
    languages.language.findAll { it.author[0].text() == authorName }.collect {
        it.@name
    }.join(', ')
}
println "Languages by Wirth:" + languagesByAuthor('Wirth')
XmlSlurper
• Namespaces
<languages xmlns:computer="Computer" xmlns:natural="Natural">
<computer:language name="Java"/>
<computer:language name="Groovy"/>
<computer:language name="Erlang"/>
<natural:language name="English"/>
<natural:language name="German"/>
<natural:language name="French"/>
</languages>
XmlSlurper
languages = new XmlSlurper().parse(
    'computerAndNaturalLanguages.xml').declareNamespace(human: 'Natural')
print "Languages: "
println languages.language.collect { it.@name }.join(', ')
print "Natural languages: "
println languages.'human:language'.collect { it.@name }.join(', ')
Output :
Languages: Java, Groovy, Erlang, English, German, French
Natural languages: English, German, French
XmlParser vs XmlSlurper
• The difference is that the parser builds its node structure once, up front, while the slurper evaluates paths on demand. "On demand" can be read as "more memory efficient but slower".
• Ultimately it depends on how many paths/requests we evaluate.
– If we only want the value of an attribute in a certain part of the XML and are then done with it:
• XmlParser processes all nodes first, so a lot of objects are created and memory and CPU are spent.
• XmlSlurper does not create the extra objects.
– If we need all parts of the document anyway, the slurper loses its advantage and will be slower.
XmlParser vs XmlSlurper
• Both can transform the document, but the slurper treats the document as constant, so we would first have to write the changes out and create a new slurper to read the new XML in. The parser lets us see the changes right away.
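A small sketch of that difference, assuming Groovy 3 or later for the groovy.xml imports. The one-element document is invented for the illustration, and XmlUtil.serialize is just one of several ways to write the slurped document out again.

```groovy
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

def source = '<languages><language name="Java"/></languages>'

// XmlParser: the tree is a mutable Node structure, so the edit shows up at once
def parsed = new XmlParser().parseText(source)
parsed.appendNode('language', [name: 'Groovy'])
def parserCount = parsed.language.size()

// XmlSlurper: the edit is recorded but not yet visible to GPath queries
def slurped = new XmlSlurper().parseText(source)
slurped.appendNode { language(name: 'Groovy') }
def slurperCountBefore = slurped.language.size()

// Write the changes out and slurp the result again to see them
def reparsed = new XmlSlurper().parseText(XmlUtil.serialize(slurped))
def slurperCountAfter = reparsed.language.size()

println "parser: $parserCount, slurper before: $slurperCountBefore, after: $slurperCountAfter"
```

The extra serialize-and-reparse step is the cost of the slurper's lazy model; code that edits a document repeatedly is usually simpler with the parser.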
Creating XML
• Again, Groovy doesn’t force us to use its own facilities. We can use the full power of Java-based XML processors, such as Xerces, from Groovy as well.
Comparing Java and Groovy XML generation
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringWriter;

public class CreateXml {
    public static void main(String[] args) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.newDocument();

            Element langs = doc.createElement("langs");
            langs.setAttribute("type", "current");
            doc.appendChild(langs);

            Element language1 = doc.createElement("language");
            Text text1 = doc.createTextNode("Java");
            language1.appendChild(text1);
            langs.appendChild(language1);

            Element language2 = doc.createElement("language");
            Text text2 = doc.createTextNode("Groovy");
            language2.appendChild(text2);
            langs.appendChild(language2);

            // Output the XML
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter sw = new StringWriter();
            StreamResult sr = new StreamResult(sw);
            DOMSource source = new DOMSource(doc);
            transformer.transform(source, sr);
            String xmlString = sw.toString();
            System.out.println(xmlString);
        } catch (ParserConfigurationException pce) {
            pce.printStackTrace();
        } catch (TransformerConfigurationException e) {
            e.printStackTrace();
        } catch (TransformerException e) {
            e.printStackTrace();
        }
    }
}
Comparing Java and Groovy XML generation
• I know that some of you are crying "Foul!" right now. Plenty of third-party libraries can make this code more straightforward; JDOM and dom4j are two popular ones. But none of the Java libraries comes close to the simplicity of using a Groovy MarkupBuilder.
Comparing Java and Groovy XML generation
def xml = new groovy.xml.MarkupBuilder()
xml.langs(type:"current"){
language("Java")
language("Groovy")
language("JavaScript")
}
That’s it!
• We are back to a nearly 1:1 ratio of code to XML.
• It's almost like a DSL for building XML, thanks to Groovy MOPping.
Metaobject Protocol (MOP)
• Metaprogramming means writing programs that manipulate programs, including themselves.
• In Groovy, we can use the MOP to invoke methods dynamically and to synthesize classes and methods on the fly. This can give us the feeling that our object favorably changed its class.
Metaobject Protocol (MOP)
• The Java language is static: the Java compiler ensures that all methods exist before you can call them
• Groovy's Builder demonstrates that one language's bug is another language's feature.
• The API docs for MarkupBuilder contain no langs() method, no language() method, nor any other element name.
• Luckily, Groovy can catch these calls to methods that don't exist and do something productive with them. In the case of a MarkupBuilder, it takes the phantom method calls and generates well-formed XML.
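The mechanism can be sketched with a toy builder that relies on Groovy's methodMissing hook. TinyBuilder is a hypothetical class written for this illustration; it is not how MarkupBuilder is actually implemented, and it does no escaping or attribute handling.

```groovy
// Hypothetical TinyBuilder, for illustration only
class TinyBuilder {
    StringBuilder out = new StringBuilder()

    // Groovy calls this whenever a method that does not exist is invoked
    def methodMissing(String name, args) {
        def arg = args ? args[0] : null
        out << "<$name>"
        if (arg instanceof Closure) {
            // Route nested phantom calls back to this builder
            arg.delegate = this
            arg.resolveStrategy = Closure.DELEGATE_FIRST
            arg()
        } else if (arg != null) {
            out << arg
        }
        out << "</$name>"
        this
    }
}

def builder = new TinyBuilder()
builder.langs {
    language('Java')
    language('Groovy')
}
def result = builder.out.toString()
println result
```

Neither langs() nor language() is declared anywhere, yet each call turns into an element, which is the same trick that lets MarkupBuilder accept arbitrary element names.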
GString: It's Not What You Think!
• Snapshot from the Groovy API documentation
Nice suggestion, Gosnell!
GString
• We can use GString’s ability to embed expressions into a string, along with Groovy’s facility for creating multiline strings. This is useful for creating small XML fragments that we may need in code and tests.
langs = ['C++' : 'Stroustrup', 'Java' : 'Gosling', 'Lisp' : 'McCarthy']
content = ''
langs.each { language, author ->
fragment = """
<language name="${language}">
<author>${author}</author>
</language>
"""
content += fragment
}
xml = "<languages>${content}</languages>"
println xml
Downside
• Only works for small fragments of XML.
• The preferred approach in Groovy applications is to use builders, so we don’t have to mess with string manipulation.
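One concrete reason string manipulation gets messy is escaping. The sketch below is an illustration added here, with a made-up author value; it compares plain GString interpolation with MarkupBuilder for text that contains a reserved character.

```groovy
import groovy.xml.MarkupBuilder

// Sample value chosen for the illustration
def author = 'Kernighan & Ritchie'

// GString interpolation copies the text verbatim, leaving a bare '&'
def viaGString = "<author>${author}</author>"

// MarkupBuilder escapes reserved characters while writing
def writer = new StringWriter()
new MarkupBuilder(writer).author(author)
def viaBuilder = writer.toString()

println viaGString
println viaBuilder
```

The first string is not well-formed XML, while the builder output carries the ampersand as an entity without any extra code on our side.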
MarkupBuilder
def sw = new StringWriter()
def html = new groovy.xml.MarkupBuilder(sw)
html.html {
    head {
        title("Links")
    }
    body {
        h1("Here are my HTML bookmarks")
        table(border: 1) {
            tr {
                th("what")
                th("where")
            }
            tr {
                td("Groovy Articles")
                td {
                    a(href: "http://ibm.com/developerworks", "DeveloperWorks")
                }
            }
        }
    }
}
def f = new File("index.html")
f.write(sw.toString())
MarkupBuilder output:
<html>
<head>
<title>Links</title>
</head>
<body>
<h1>Here are my HTML bookmarks</h1>
<table border='1'>
<tr>
<th>what</th>
<th>where</th>
</tr>
<tr>
<td>Groovy Articles</td>
<td>
<a href='http://ibm.com/developerworks'>DeveloperWorks</a>
</td>
</tr>
</table>
</body>
</html>
Downside
• For large XML documents, it’s not memory efficient.
• It misses some XML constructs, such as namespaces, processing instructions, and comments.
• For these reasons, StreamingMarkupBuilder should be used.
StreamingMarkupBuilder
langs = ['C++' : 'Stroustrup', 'Java' : 'Gosling', 'Lisp' : 'McCarthy']
xmlDocument = new groovy.xml.StreamingMarkupBuilder().bind {
    mkp.xmlDeclaration()
    mkp.declareNamespace(computer: "Computer")
    languages {
        comment << "Created using StreamingMarkupBuilder"
        langs.each { key, value ->
            computer.language(name: key) {
                author(value)
            }
        }
    }
}
println xmlDocument
StreamingMarkupBuilder
<?xml version="1.0"?>
<languages xmlns:computer='Computer'>
<!--Created using StreamingMarkupBuilder-->
<computer:language name='C++'>
<author>Stroustrup</author>
</computer:language>
<computer:language name='Java'>
<author>Gosling</author>
</computer:language>
<computer:language name='Lisp'>
<author>McCarthy</author>
</computer:language>
</languages>
MarkupBuilder vs StreamingMarkupBuilder
• MarkupBuilder creates a representation of the document in memory, which is then written out to whichever stream is designated.
• StreamingMarkupBuilder only evaluates the document on demand, where the demand is driven by the writer requesting the next item.
• This inversion of control means that with StreamingMarkupBuilder the document is never actually represented in memory; only the program that generates the document is.
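The on-demand behaviour can be observed with a flag that the markup closure sets when it runs. This is a small sketch written for illustration; the evaluated flag and the tiny document are assumptions, not part of the original slides.

```groovy
import groovy.xml.StreamingMarkupBuilder

def evaluated = false

// bind() only wraps the closure; nothing is generated yet
def doc = new StreamingMarkupBuilder().bind {
    evaluated = true
    languages {
        language(name: 'Groovy')
    }
}
def beforeWrite = evaluated

// The closure runs when a writer asks for the content
def out = new StringWriter()
doc.writeTo(out)
def afterWrite = evaluated

println "evaluated before write: $beforeWrite, after write: $afterWrite"
println out
```

Because the result of bind() is a Writable, it can be handed to any Writer, such as a file or a servlet response, without first building the whole document as a string.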
Wait, XML processing is not OXM
• OXM (Object/XML Mapping) is the act of converting an XML document to and from an object.
• It matters when we need to work with Java objects instead of their XML, e.g. SOAP web service requests/responses.
• In web services we need to send and receive exact field types, so the dynamic nature of Groovy doesn’t have much to offer for these static, direct field mappings. Instead, the Java Architecture for XML Binding (JAXB) or another library (e.g. JiBX) should be used.
XML using Groovy: conclusions
• What we covered can be compared with JAXP, the Java API for XML Processing.
• Groovy greatly simplifies the processing and results in much shorter and more expressive code.
• GPath lets us traverse XML and POGOs in a similar way.
• Builders and parsers use MOPping and Groovy's dynamic features to provide a DSL-like API for XML processing.
• Groovy is strongly recommended for XML processing, especially if we are about to write new code.
References
• InfoSet: http://www.informit.com/library/content.aspx?b=STY_XML_21days&seqNum=40