DOM and SAX
Jussi Pohjolainen, TAMK University of Applied Sciences
DOM and SAX
• DOM and SAX
– Platform- and language-independent APIs for reading or manipulating XML documents
• API: Application Programming Interface, a set of functions, procedures, methods, classes and interfaces
• DOM and SAX are implemented in most programming languages: Java, PHP, ...
Differences between DOM and SAX
                     DOM                                   SAX
Standardization      W3C Recommendation                    No formal specification
Manipulation         Reading and writing (manipulation)    Reading only
Memory consumption   Depends on the size of the source     Very low
                     XML file; can be large
XML handling         Tree-based                            Event-based
SAX
Overview of SAX
• SAX: Simple API for XML
• Originally a Java-only API
– Nowadays SAX is supported in almost all programming languages
• Uses an event-driven model
• Memory usage is low
• Only for reading XML documents
Event-driven?
• SAX uses an event-driven model for reading XML documents
• The basic idea is that a SAX parser reads the XML document sequentially, "one piece at a time", instead of loading it into memory as a whole.
• Handler functions react when the parser finds elements and other parts of the XML document.
– When the parser finds a start tag, a certain function is called; when it finds an end tag, another function is called.
Example (Wikipedia)
<?xml version="1.0" encoding="UTF-8"?>
<RootElement param="value">
    <FirstElement>
        Some Text
    </FirstElement>
    <SecondElement param2="something">
        Pre-Text <Inline>Inlined text</Inline> Post-text.
    </SecondElement>
</RootElement>
Example (Wikipedia)
• XML Processing Instruction, named xml, with attributes version equal to "1.0" and encoding equal to "UTF-8" (strictly speaking this line is the XML declaration, which parsers usually do not report as a processing-instruction event)
• XML Element start, named RootElement, with an attribute param equal to "value"
• XML Element start, named FirstElement
• XML Text node, with data equal to "Some Text" (note: text processing, with regard to spaces, can be changed)
• XML Element end, named FirstElement
• ...
PHP and SAX
// Create an XML parser
$xml_parser = xml_parser_create();

// Register the handler functions
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

// Open the XML file ($file is assumed to hold its path)
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

// Read and parse the file in 4096-byte chunks
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);
PHP and SAX
function startElement($parser, $name, $attrs) {
    // Do something
}

function endElement($parser, $name) {
    // Do something
}

function characterData($parser, $data) {
    echo $data;
}
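To make the event sequence concrete, the following is a minimal sketch that feeds the Wikipedia sample to PHP's XML parser and logs every event it reports. The $events array, the closures used as handlers, and keeping the document in a string instead of a file are all assumptions made for this illustration; they are not part of the original slides.

```php
<?php
// The sample document from the slides, kept in a string so that the
// sketch runs without an external file (an assumption of this example).
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<RootElement param="value">
    <FirstElement>
        Some Text
    </FirstElement>
    <SecondElement param2="something">
        Pre-Text <Inline>Inlined text</Inline> Post-text.
    </SecondElement>
</RootElement>
XML;

$events = [];   // hypothetical log of the events seen by the handlers

$parser = xml_parser_create();
// Keep element names as written; by default the parser upper-cases them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Called for every start tag and every end tag.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start: $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end: $name";
    }
);

// Called for character data; whitespace-only chunks are skipped here.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        $text = trim($data);
        if ($text !== "") {
            $events[] = "text: $text";
        }
    }
);

// The whole document is passed in one call, so the last argument is true.
if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(
        xml_error_string(xml_get_error_code($parser))
    );
}

echo implode("\n", $events), "\n";
```

Because the handlers only append to a list, memory use stays independent of the document structure; nothing resembling a tree is ever built.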
Benefits of SAX
• Excellent API when just reading the contents of an XML file
• Easy and clean API
• Does not require many resources (mobile devices!)
DOM
DOM
• The Document Object Model (DOM) is a platform- and language-independent standard object model for representing HTML or XML and related formats.
• W3C Recommendation
• Can be used for manipulating XML documents
• Different versions: DOM 1, DOM 2 and DOM 3
Basic Idea behind DOM
• API for manipulating XML documents
• DOM loads the XML document into memory and creates a tree model of the XML data.
– Can consume a lot of memory if documents are large
Tree and Nodes
• Tree consists of nodes
• Node can be
– Element (Element)
– Text (Text)
– Attribute (Attr)
– CDATA (CDATASection)
– Comment (Comment)
– Etc.
Nodes and Relationships
• Node has references to its
– first child (firstChild)
– last child (lastChild)
– next sibling (nextSibling)
– previous sibling (previousSibling)
– parent (parentNode)
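The relationships listed above can be tried out with a short sketch in PHP DOM. The document string and the variable names are made up for this illustration; the XML is written without indentation so that no whitespace text nodes appear between the elements.

```php
<?php
$dom = new DOMDocument();
// Hypothetical sample data, written on one line to avoid whitespace nodes.
$dom->loadXML(
    '<kirjat>'
    . '<kirja><nimi>Tuntematon Sotilas</nimi></kirja>'
    . '<kirja><nimi>Learn Java</nimi></kirja>'
    . '</kirjat>'
);

$root  = $dom->documentElement;   // <kirjat>
$first = $root->firstChild;       // first <kirja>
$last  = $root->lastChild;        // second <kirja>

// Walk the children of the root by following nextSibling.
$names = [];
for ($node = $root->firstChild; $node !== null; $node = $node->nextSibling) {
    // Each <kirja> has a single child, <nimi>.
    $names[] = $node->firstChild->textContent;
}

echo implode(", ", $names), "\n";
echo $first->parentNode->nodeName, "\n";                        // the parent element
echo $first->nextSibling->isSameNode($last) ? "yes" : "no", "\n";
echo $first->previousSibling === null ? "no previous sibling" : "has previous", "\n";
```

Following nextSibling until it becomes null is one common way to iterate without first building a NodeList.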
Node's contents
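• Some nodes have contents (nodeValue)
– Attribute's value
– Element's value (text; in PHP DOM the nodeValue of an element returns its text content, whereas the W3C DOM defines it as null)
– Comment's value (text)
– etc.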
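A small sketch of nodeValue for different node types follows. The sample document and the variable names are assumptions made for the illustration only.

```php
<?php
$dom = new DOMDocument();
// Hypothetical one-line sample: an attribute, a comment and a text node.
$dom->loadXML('<book id="42"><!-- a comment --><title>Learn Java</title></book>');

$book    = $dom->documentElement;
$attr    = $book->getAttributeNode("id");   // Attr node
$comment = $book->firstChild;               // Comment node
$title   = $book->lastChild;                // Element node <title>
$text    = $title->firstChild;              // Text node inside <title>

echo $attr->nodeValue, "\n";      // 42
echo $comment->nodeValue, "\n";   // the comment text, including its spaces
echo $text->nodeValue, "\n";      // Learn Java
// PHP-specific behaviour: for an element, nodeValue equals its text content.
echo $title->nodeValue, "\n";     // Learn Java
```

Code that should stay portable to other DOM implementations would read the Text child (or textContent) instead of relying on an element's nodeValue.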
Collections
• NodeList (List of nodes)
– length
– item( index )
• NamedNodeMap (List of attributes)
– getNamedItem( name )
– item( index )
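Both collection types can be seen in a short PHP sketch, where they appear as DOMNodeList and DOMNamedNodeMap. The sample document, its attribute names and the variable names are assumptions for this illustration.

```php
<?php
$dom = new DOMDocument();
// Hypothetical sample data with two attributes per book.
$dom->loadXML(
    '<books>'
    . '<book id="1" lang="fi"><name>A</name></book>'
    . '<book id="2" lang="en"><name>B</name></book>'
    . '</books>'
);

// NodeList: an ordered list of nodes.
$books = $dom->getElementsByTagName("book");
echo $books->length, "\n";                        // number of <book> elements
$second = $books->item(1);                        // items are indexed from 0
echo $second->getElementsByTagName("name")->item(0)->nodeValue, "\n";

// NamedNodeMap: the attributes of one element, accessible by name or index.
$attrs = $books->item(0)->attributes;
echo $attrs->getNamedItem("lang")->nodeValue, "\n";
echo $attrs->item(0)->nodeName, "\n";

// item() returns null when the index is out of range.
var_dump($books->item(5));
```

Checking the result of item() or getNamedItem() for null before using it avoids errors when an element or attribute is missing.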
Example using PHP DOM
// Load the XML document
$dom = new DOMDocument();
$dom->load("books.xml");

// NodeList of name elements
$listOfNodes = $dom->getElementsByTagName("name");

// Browse all nodes
foreach ($listOfNodes as $node) {
    print $node->nodeValue;
}
Example using PHP DOM
// Load the XML document
$dom = new DOMDocument();
$dom->load("books.xml");

// Create element <book></book>
$book = $dom->createElement("book");

// Create element <title>some contents</title>; the text is added as a
// text node so that characters such as & and < are escaped
$title = $dom->createElement("title");
$title->appendChild($dom->createTextNode($_GET['title']));

// <book><title>some contents</title></book>
$book->appendChild($title);

// Add the book under the root element of "books.xml"
$dom->documentElement->appendChild($book);

// Save
$dom->save("books.xml");
Removing element
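$elements = $dom->getElementsByTagName("kirja");
$element  = $elements->item(0);
// childNodes is a property, not a method
$children = $element->childNodes;
// Note: if the file is indented, item(0) may be a whitespace text node
$child    = $element->removeChild($children->item(0));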
<kirjat>
    <kirja>
        <nimi>Tuntematon Sotilas</nimi>
    </kirja>
    <kirja>
        <nimi>Learn Java</nimi>
    </kirja>
</kirjat>
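When the document is indented, the first child of an element is often a whitespace text node rather than an element. The following sketch shows one way to handle that, by setting preserveWhiteSpace to false before loading. Keeping the XML in a string and the variable names used here are assumptions for the illustration.

```php
<?php
// The sample document, indented, held in a string for this sketch.
$xml = <<<'XML'
<kirjat>
    <kirja>
        <nimi>Tuntematon Sotilas</nimi>
    </kirja>
    <kirja>
        <nimi>Learn Java</nimi>
    </kirja>
</kirjat>
XML;

$dom = new DOMDocument();
// Must be set before loading: drops whitespace-only text nodes
// that exist only because of the indentation.
$dom->preserveWhiteSpace = false;
$dom->loadXML($xml);

$kirja = $dom->getElementsByTagName("kirja")->item(0);

// With the whitespace gone, the first child is the <nimi> element.
$removed = $kirja->removeChild($kirja->childNodes->item(0));

echo $removed->nodeName, "\n";                              // name of the removed node
echo $dom->getElementsByTagName("nimi")->length, "\n";      // remaining <nimi> elements
echo $dom->saveXML();
```

removeChild returns the detached node, so it can be inspected or inserted elsewhere in the tree afterwards.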
PHP DOM and Encoding
• PHP DOM uses UTF-8 internally
• Everything you put into an XML document using PHP DOM must be converted to UTF-8.
• utf8_encode(..) and utf8_decode(..) convert between ISO-8859-1 and UTF-8 (both are deprecated as of PHP 8.2)
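As an illustration of the conversion step, here is a minimal sketch that converts an ISO-8859-1 string to UTF-8 before handing it to PHP DOM. It uses mb_convert_encoding instead of utf8_encode and therefore assumes that the mbstring extension is available; the element name and the sample word are made up for the example.

```php
<?php
// "äiti" as ISO-8859-1 bytes (0xE4 is "ä" in that encoding).
$latin1 = "\xE4iti";

// Convert to UTF-8 before giving the text to PHP DOM
// (assumes the mbstring extension is installed).
$utf8 = mb_convert_encoding($latin1, "UTF-8", "ISO-8859-1");

// Declare UTF-8 as the document encoding so that saveXML()
// writes the character as UTF-8 bytes.
$dom  = new DOMDocument("1.0", "UTF-8");
$word = $dom->createElement("sana");          // hypothetical element name
$word->appendChild($dom->createTextNode($utf8));
$dom->appendChild($word);

echo $dom->saveXML();

// Reading back: values returned by PHP DOM are UTF-8 as well.
$back = mb_convert_encoding($word->nodeValue, "ISO-8859-1", "UTF-8");
```

If the input is already UTF-8, no conversion is needed; converting it a second time would corrupt non-ASCII characters.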