Post on 15-Jul-2015
Applied XML Programming for Microsoft .NET
PART 2
XML Readers
In the Microsoft .NET Framework, two distinct sets of classes provide for XML-driven
reading and writing operations. These classes are known collectively as XML readers and writers. The base class for readers is
XmlReader, whereas XmlWriter provides the base programming interface for writers.
The Programming Interface of Readers
XmlReader is an abstract class available from the System.Xml namespace. It defines
the set of functionalities that an XML reader exposes to let developers access an XML
stream in a noncached, forward-only, read-only way.
An XML reader works on a read-only stream by jumping from one node to the next in a
forward-only direction. The XML reader maintains an internal pointer to the current
node and its attributes and text but has no notion of previous and next nodes. You can't
modify text or attributes, and you can move only forward from the current node. If you
are visiting attribute nodes, however, you can move back to the parent node or access
an attribute by index.
The specification for the XmlReader class recommends that any derived class should
check at least whether the XML source is well-formed and throw exceptions if an error
is encountered. XML exceptions are handled through the tailor-made XmlException class. The XmlReader class
specification does not say anything about XML validation.
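The well-formedness check can be observed from calling code. The following is a minimal sketch, assuming the XML arrives as an in-memory string; the WellFormedCheck class and the IsWellFormed helper are hypothetical names introduced only for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedCheck
{
    // Hypothetical helper: reads the whole document and reports whether
    // the reader raised an XmlException along the way.
    public static bool IsWellFormed(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // Read() throws XmlException at the first well-formedness error.
            while (reader.Read()) { }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed("<a><b/></a>"));   // matched tags
        Console.WriteLine(IsWellFormed("<a><b></a>"));    // <b> is never closed
    }
}
```

The caught XmlException also exposes LineNumber and LinePosition properties, which are useful when the error has to be reported to the user.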
An OOP Refresher
1. In the .NET Framework, an interface is a container for a named collection of method,
property, and event definitions, referred to as a contract. An interface can be used as a
reference type, but it is not a creatable type.
2. A class is a container that can include data and function members (methods,
properties, events, operators, and constructors). Classes support inheritance from
other classes as well as from interfaces. Any class from which another class inherits is
called a base class.
3. An abstract class can declare members without providing an implementation for them.
Like interfaces, abstract classes are not creatable but can be used as reference types.
An abstract class differs from an interface in that it has a slightly richer set of internal
members (constructors, constants, and operators). Members of an abstract class can
be scoped as private, public, or protected, whereas members of an interface are
public. In addition, child classes can implement multiple interfaces but can
inherit from only one class.
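The three notions can be put side by side in a few lines. This is only an illustrative sketch; the IDescribable, Shape, and Square types are hypothetical and do not belong to the .NET Framework. Note that the abstract class here mixes one unimplemented member with one implemented member.

```csharp
using System;
using System.Globalization;

// An interface: a contract with no implementation.
public interface IDescribable
{
    string Describe();
}

// An abstract class: not creatable, but it can hold state, a constructor,
// and implemented members next to abstract ones.
public abstract class Shape : IDescribable
{
    private readonly string m_name;

    protected Shape(string name)
    {
        m_name = name;
    }

    // Declared here, implemented by derived classes.
    public abstract double Area();

    public string Describe()
    {
        return m_name + " with area " + Area().ToString(CultureInfo.InvariantCulture);
    }
}

// A concrete class: inherits from exactly one base class.
public sealed class Square : Shape
{
    private readonly double m_side;

    public Square(double side) : base("Square")
    {
        m_side = side;
    }

    public override double Area()
    {
        return m_side * m_side;
    }
}

public static class OopRefresherDemo
{
    public static void Main()
    {
        // Both the interface and the abstract class work as reference types.
        IDescribable item = new Square(2);
        Console.WriteLine(item.Describe());
    }
}
```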
Parsing with the XmlTextReader Class
The XmlTextReader class is designed to provide fast access to streams of XML data in
a forward-only and read-only manner. The reader verifies that the submitted XML is
well-formed. It also performs a quick check for correctness on the referenced DTD, if
one exists. In no case, though, does this reader validate against a schema or DTD. If
you need more functionality (for example, validation), you must resort to other reader
classes such as XmlNodeReader or XmlValidatingReader.
An instance of the XmlTextReader class can be created in a number of ways and from
a variety of sources, including disk files, URLs, streams, and text readers. To process
an XML file, you start by creating an instance of the class, as shown here:
XmlTextReader reader = new XmlTextReader(file);
Accessing Nodes
The following example shows how to use an XmlTextReader object to parse the
contents of an XML file and build the node layout. Let's begin by considering the
following XML data:
<platforms type="software">
<platform vendor="Microsoft">.NET</platform>
<platform vendor="" OpenSource="yes">Linux</platform>
<platform vendor="Microsoft">Win32</platform>
<platform vendor="Sun">Java</platform>
</platforms>
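A node layout for this document can be built with a simple read loop. The sketch below is one possible way to do it, not necessarily the original listing: it assumes the XML is supplied through a TextReader and records only element nodes, indented by depth. The NodeLayoutDemo class and the GetNodeLayout helper are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class NodeLayoutDemo
{
    // Hypothetical helper: returns one line per element start tag,
    // indented by two spaces per nesting level.
    public static string GetNodeLayout(TextReader source)
    {
        StringBuilder sb = new StringBuilder();
        XmlTextReader reader = new XmlTextReader(source);
        try
        {
            while (reader.Read())
            {
                // Skip text, white space, and end tags; keep element starts only.
                if (reader.NodeType == XmlNodeType.Element)
                {
                    sb.Append(new string(' ', reader.Depth * 2));
                    sb.Append('<').Append(reader.Name).Append(">\n");
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The sample data, with a space separating the two attributes
        // of the Linux entry so that the document is well-formed.
        string xml =
            "<platforms type=\"software\">" +
            "<platform vendor=\"Microsoft\">.NET</platform>" +
            "<platform vendor=\"\" OpenSource=\"yes\">Linux</platform>" +
            "<platform vendor=\"Microsoft\">Win32</platform>" +
            "<platform vendor=\"Sun\">Java</platform>" +
            "</platforms>";
        Console.Write(GetNodeLayout(new StringReader(xml)));
    }
}
```

To process a disk file instead of a string, the same loop works with a reader created as new XmlTextReader(file).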
Character Encoding
XML documents can contain an attribute to specify the encoding. Character encoding
provides a mapping between numeric indexes and corresponding characters that users
read from a document. The following declaration shows how to set the required
encoding for an XML document:
<?xml version="1.0" encoding="ISO-8859-5"?>
The Encoding property of the XML reader returns the character encoding found in the document. The default encoding is UTF-8 (UCS Transformation Format, 8 bits).
Accessing Attributes
Of all the node types supplied in the .NET Framework, only Element, DocumentType,
and XmlDeclaration support attributes. To check whether a given node contains
attributes, use the HasAttributes Boolean property. The AttributeCount property returns
the number of attributes available for the current node.
This next example demonstrates how to programmatically access any sequence of
attributes for a node and concatenate their names and values in a single string.
Consider the following XML fragment:
<employee id="1" lastname="Users" firstname="Joe" />
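The attribute walk can be sketched as follows. This is a minimal illustration, assuming the reader is already positioned on an element node; the AttributeListDemo class and the GetAttributeList helper are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class AttributeListDemo
{
    // Hypothetical helper: concatenates name="value" pairs for the
    // element the reader is currently positioned on.
    public static string GetAttributeList(XmlTextReader reader)
    {
        StringBuilder sb = new StringBuilder();
        if (reader.HasAttributes)
        {
            // MoveToNextAttribute starts from the first attribute when the
            // reader is on the element and returns false after the last one.
            while (reader.MoveToNextAttribute())
            {
                sb.Append(reader.Name);
                sb.Append("=\"");
                sb.Append(reader.Value);
                sb.Append("\" ");
            }
            // Restore the position to the parent element.
            reader.MoveToElement();
        }
        return sb.ToString().TrimEnd();
    }

    public static void Main()
    {
        string xml = "<employee id=\"1\" lastname=\"Users\" firstname=\"Joe\" />";
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        reader.MoveToContent();   // skip to the employee element
        Console.WriteLine(GetAttributeList(reader));
        reader.Close();
    }
}
```

An index-based loop over AttributeCount with MoveToAttribute(i) would work equally well; MoveToNextAttribute simply avoids the explicit counter.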
Attribute Normalization
The W3C XML 1.0 Recommendation defines attribute normalization as the preliminary
process that an attribute value should be subjected to prior to being returned to the
application. The normalization process can be summarized in a few basic rules:
1. Any character reference in the value is expanded to the character it denotes.
2. Any white space character (blanks, carriage returns, linefeeds, and tabs) is replaced with a blank (ASCII 0x20) character.
3. Any leading or trailing sequence of blanks is discarded.
4. Any other sequence of blanks is replaced with a single blank character (ASCII 0x20).
The XmlTextReader parser lets you toggle the normalization process on and off
through the Normalization Boolean property. By default, the Normalization property is
set to false, meaning that attribute values are not normalized. If the normalization
process is disabled, an attribute can contain any character, including characters in the
&#00; to &#20; range, which are normally considered invalid and not permitted. When
normalization is on, using any of those character entities results in an XmlException
being thrown.
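The effect of the Normalization property is easy to observe on an attribute value that contains a tab. The sketch below assumes a one-element document held in a string; the NormalizationDemo class and the ReadFirstAttribute helper are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NormalizationDemo
{
    // Hypothetical helper: returns the value of the first attribute of the
    // root element, read with normalization switched on or off.
    public static string ReadFirstAttribute(string xml, bool normalize)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        // The property must be set before the attribute is parsed.
        reader.Normalization = normalize;
        try
        {
            reader.MoveToContent();
            return reader.GetAttribute(0);
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        string xml = "<item note=\"a\tb\" />";   // a literal tab inside the value
        string raw = ReadFirstAttribute(xml, false);
        string normalized = ReadFirstAttribute(xml, true);
        Console.WriteLine("Tab kept when off: " + (raw.IndexOf('\t') >= 0));
        Console.WriteLine("Tab kept when on:  " + (normalized.IndexOf('\t') >= 0));
    }
}
```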
Parsing XML Fragments
The XmlTextReader class provides the basic set of functionalities to process any XML
data coming from a disk file, a stream, or a URL. This kind of reader works sequentially,
reading one node after the next, and deliberately does not provide any ad hoc search
function to parse only a particular subtree.
In the .NET Framework, to process only fragments of XML data, excerpted from a
variety of sources, you can take one of two routes. You can initialize the text reader
with the XML string that represents the fragment, or you can use another, more
specific, reader class—the XmlNodeReader class.
Parsing Well-Formed XML Strings
The trick to initializing a text reader from a string is all in packing the string into a
StringReader object. One of the XmlTextReader constructors looks like this:
public XmlTextReader(TextReader);
TextReader is an abstract class that represents a .NET reader object capable of
reading a sequence of characters no matter where they are physically stored. The
StringReader class inherits from TextReader and simply makes itself capable of
reading the characters of an in-memory string. Because StringReader derives from
TextReader, you can safely use it to initialize XmlTextReader.
string xmlText = "…";
StringReader strReader = new StringReader(xmlText);
XmlTextReader reader = new XmlTextReader(strReader);
Writing a Custom XML Reader
We have one more topic to consider on the subject of XML readers, which opens up a
whole new world of opportunities: creating customized XML readers. An XML reader
class is merely a programming interface for reading data that appears to be XML. The
XmlTextReader class represents the simplest and the fastest of all possible XML
readers but, and this is what really matters, it is just one reader. Its inherent simplicity
and effectiveness stem from two key points. First, the class operates as a read-only,
forward-only, nonvalidating parser. Second, the class is assumed to work on native
XML data. It has no need, and no subsequent overhead, to map input data internally to
XML data structures.
Mapping Data Structures to XML Nodes
INI files have been a fundamental part of Microsoft Windows applications.
You can read and write the contents of an INI file using the file I/O classes, or you might resort to making
calls to the underlying Win32 unmanaged platform.
Mapping CSV Files to XML
1. A CSV file consists of one or more lines of text. Each line contains strings of text separated by
commas. Each line of a CSV file can be naturally associated with a database row in which each token maps to a column.
2. Likewise, a line in a CSV file can also be correlated to an XML node with as many attributes as the comma-separated tokens. The following listing shows a typical CSV file:
Davolio,Nancy,Sales Representative
Fuller,Andrew,Sales Manager
Leverling,Janet,Sales Representative
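The line-to-node correlation can be sketched in a few lines. This is not the custom reader itself, only an illustration of the mapping; the CsvMappingDemo class, the CsvLineToXml helper, the row element name, and the col1, col2, col3 attribute names are all assumptions made for the example, and the naive Split call does not handle quoted fields that contain commas.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CsvMappingDemo
{
    // Hypothetical helper: turns one CSV line into a single empty element
    // with one attribute per comma-separated token.
    public static string CsvLineToXml(string line)
    {
        string[] tokens = line.Split(',');
        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(output);
        writer.WriteStartElement("row");
        for (int i = 0; i < tokens.Length; i++)
        {
            // The writer escapes any markup characters in the value.
            writer.WriteAttributeString("col" + (i + 1), tokens[i]);
        }
        writer.WriteEndElement();
        writer.Close();
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(CsvLineToXml("Davolio,Nancy,Sales Representative"));
        Console.WriteLine(CsvLineToXml("Fuller,Andrew,Sales Manager"));
    }
}
```

Going through XmlTextWriter rather than plain string concatenation keeps the output well-formed even when a token contains characters such as a quote or an ampersand.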
Exposing Data as XML
In a true XML reader, methods like ReadInnerXml and ReadOuterXml serve the
purpose of returning the XML source code embedded in, or sitting around, the currently
selected node. For a CSV reader, of course, there is no XML source code to return.
You might want to return an XML description of the current CSV node, however.
Assuming that this is how you want the CSV reader to work, the ReadInnerXml method
for a CSV XML reader can only return either null or the empty string, as shown in the
following code. By design, in fact, each element has an empty body.
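public override string ReadInnerXml()
{
    // Nothing to return unless the reader is positioned on a row
    if (m_readState != ReadState.Interactive)
        return null;
    // A CSV row element always has an empty body
    return String.Empty;
}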
In contrast, the outer XML text for a CSV node can be designed like a node with a
sequence of attributes, as follows:
<row attr1="…" attr2="…" />
The source code to obtain this output is shown here:
public override string ReadOuterXml()
{
    // Nothing to return unless the reader is positioned on a row
    if (m_readState != ReadState.Interactive)
        return null;
    StringBuilder sb = new StringBuilder("");
    sb.Append("<");
    sb.Append(CsvRowName);
    sb.Append(" ");
    // Each item enumerated from m_tokenValues is an attribute name
    foreach(object o in m_tokenValues)
    {
        sb.Append(o);
        sb.Append("=");
        sb.Append(QuoteChar);
        sb.Append(m_tokenValues[o.ToString()].ToString());
        sb.Append(QuoteChar);
        // Separate this attribute from the next one
        sb.Append(" ");
    }
    sb.Append("/>");
    return sb.ToString();
}
The CSV XML Reader in Action
In this section, you'll see the CSV XML reader in action and learn how to instantiate and
use it in the context of a realistic application. In particular, I'll show you how to load the
contents of a CSV file into a DataTable object to appear in a Windows Forms DataGrid
control.
You start by instantiating the reader object, passing the name of the CSV file to be
processed and a Boolean flag. The Boolean value indicates whether the values in the
first row of the CSV source file must be read as the column names or as data. If you
pass false, the row is considered a plain data row and each column name is formed by
a prefix and a progressive number. You control the prefix through the CsvColumnPrefix
property.
// Instantiate the reader on a CSV file
XmlCsvReader reader;
reader = new XmlCsvReader("employees.csv", hasHeader.Checked);
reader.CsvColumnPrefix = colPrefix.Text;
reader.Read();
// Define the target table
DataTable dt = new DataTable();
for(int i=0; i<reader.AttributeCount; i++)
{
    reader.MoveToAttribute(i);
    DataColumn col = new DataColumn(reader.Name, typeof(string));
    dt.Columns.Add(col);
}
reader.MoveToElement();
Before you load data rows into the table and populate the data grid, you must define the
layout of the target DataTable object. To do that, you must iterate over the attributes of one
row, typically the first row. You move to each of the attributes in the first row and
create a DataColumn object with the same name as the attribute and specified as a
string type. You then add the DataColumn object to the DataTable object and continue
until you've added all the attributes. The MoveToElement call restores the focus to the
CSV row element.
// Loop through the rows and populate a DataTable
do
{
    DataRow row = dt.NewRow();
    for(int i=0; i<reader.AttributeCount; i++)
    {
        row[i] = reader[i].ToString();
    }
    dt.Rows.Add(row);
}
while (reader.Read());
reader.Close();
// Bind the table to the grid
dataGrid1.DataSource = dt;
Next you walk through the various data rows of the CSV file and create a new DataRow
object for each. The row will then be filled in with the values of the attributes. Because
the reader is already positioned in the first row when the loop begins, you must use a
do…while loop instead of the perhaps more natural while loop. At the end of the loop,
you simply close the reader and bind the freshly created DataTable object to the
DataGrid control for display.
The CSV XML reader now reads the column names from the first row in the
source file.
Readers and XML Readers
To cap off our examination of XML readers and custom readers, let's spend a few
moments looking at the difference between an XML reader and a generic reader for a
non-XML data structure.
A reader is a basic and key concept in the .NET Framework. Several different types of
reader classes exist in the .NET Framework: binary readers, text readers, XML
readers, and database readers, just to name a few. Of course, you can add your own
data-specific readers to the list. But that's the point. How would you write your new
reader? The simplest answer would be, you write the reader by inheriting from one of
the existing reader classes.
Further Reading
An article that summarizes in a few pages the essence of XML readers and writers was written for the January 2001 issue of MSDN Magazine. Although based on a beta version of .NET, it is still of significant value and can be found at http://msdn.microsoft.com/msdnmag/issues/01/01/xml/xml.asp. Fresh, up-to-date, and handy information about XML in the .NET world (and other topics) can be found monthly in the "Extreme XML" column on MSDN Online.
If you need to know more about ADO.NET and its integration with XML, you can check out my book Building Web Solutions with ASP.NET and ADO.NET (Microsoft Press, 2002) or David Sceppa's book Microsoft ADO.NET (Core Reference) (Microsoft Press, 2002).
XML extensions for SQL Server 2000 are described in detail in Chapter 2. Finally, for a very informative article about the development of XML custom readers, see "Implementing XmlReader Classes for Non-XML Data Structures and Formats," available on MSDN at http://msdn.microsoft.com/library/enus/dndotnet/html/Custxmlread.asp.