The .NET Framework simplifies many programming tasks, but its XML DOM implementation could still be improved. For example, creating a new node is still a multistep process: first the node must be created, then it must be given a value, and finally it must be inserted into the document. XML in .NET would greatly benefit from a wrapper class to hide some of the dirty work. This was the inspiration for XMLObject, a class that wraps some of the more common operations and adds a few tricks of its own. In this article, we provide an overview of its functionality, focusing on features that have no easy equivalent in the plain .NET Framework classes. We also provide an overview of XPath, a core XML technology for locating individual nodes within a document.
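To make the multistep process concrete, here is a minimal sketch that uses only the standard System.Xml classes. The class name DomSteps and the helper AddChild are hypothetical names chosen for illustration; they are not part of XMLObject.

```csharp
using System;
using System.Xml;

public static class DomSteps
{
    // Adds <name>value</name> under the first node matching parentPath
    // and returns the resulting document as a string.
    public static string AddChild(string xml, string parentPath, string name, string value)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNode parent = doc.SelectSingleNode(parentPath);
        if (parent == null)
            throw new ArgumentException("No node matches " + parentPath);

        XmlElement child = doc.CreateElement(name);  // step 1: create the node
        child.InnerText = value;                     // step 2: give it a value
        parent.AppendChild(child);                   // step 3: insert it into the document

        return doc.OuterXml;
    }

    public static void Main()
    {
        // Prints <XML><node><subnode>node1</subnode></node></XML>
        Console.WriteLine(AddChild("<XML><node/></XML>", "/XML/node", "subnode", "node1"));
    }
}
```

A wrapper earns its keep by collapsing those three calls, plus the lookup of the parent node, into a single method.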
The most basic operations on an XML document are those that create, change, and retrieve values from nodes. These are grouped into three sets of functions in the XMLObject class. The functions in the first set start with the word "Get" and retrieve values. The functions in the second set start with the word "Set" and generally take three parameters: an XPath to the node at which to start operating, a list of node names separated by "/", and a value. These methods find the starting node, then look for the nodes specified by the second path and create them only if they are not already there. They can therefore be used to change the value of a node without creating an ever-growing node tree. The functions in the third set are prefixed with "Create" and also generally take three parameters, but once they have navigated to their starting node, they always create all the nodes in the specified path, regardless of whether those nodes already exist. These methods are better suited to creating groups of nodes under a particular parent node. For example, the following code:
    XMLObject o = new XMLObject();
    o.SetXML("<XML><node/></XML>");
    o.CreateNode("node", "subnode", "node1");
    o.CreateNode("node", "subnode", "node2");
    o.SetNode("node", "subnode", "node3");
results in this XML:
    <?xml version="1.0" encoding="utf-16"?>
    <XML>
      <node>
        <subnode>node3</subnode>
        <subnode>node2</subnode>
      </node>
    </XML>
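The difference between the Set and Create families can be sketched with plain System.Xml calls. This is not the XMLObject source; CreateChild and SetChild are hypothetical stand-ins that only mimic the behavior described above for a single level of nesting.

```csharp
using System;
using System.Xml;

public static class SetVersusCreate
{
    // Hypothetical stand-in for a Create-style call: always appends a new child.
    public static void CreateChild(XmlNode parent, string name, string value)
    {
        XmlElement child = parent.OwnerDocument.CreateElement(name);
        child.InnerText = value;
        parent.AppendChild(child);
    }

    // Hypothetical stand-in for a Set-style call: reuses the first matching
    // child and creates one only when none exists.
    public static void SetChild(XmlNode parent, string name, string value)
    {
        XmlNode existing = parent.SelectSingleNode(name);
        if (existing == null)
            CreateChild(parent, name, value);
        else
            existing.InnerText = value;
    }

    public static string Demo()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<XML><node/></XML>");
        XmlNode node = doc.SelectSingleNode("/XML/node");

        CreateChild(node, "subnode", "node1");
        CreateChild(node, "subnode", "node2");
        SetChild(node, "subnode", "node3");  // overwrites node1 instead of adding a third child
        return doc.OuterXml;
    }

    public static void Main()
    {
        Console.WriteLine(Demo());
    }
}
```

Because the Set-style call matches the first existing subnode, the value "node1" is replaced by "node3" and the document still contains only two subnode elements, which matches the XML shown above.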
Special mention should be made of the SetDocumentRoot function. This function creates a virtual document: all subsequent operations on the XMLObject treat the subset of the document identified by the path passed in as if it were the entire document. The syntax for passing paths into the Get/Set/Create functions requires that the path start with "/" if the root node name is specified. If "/" is not the first character, the name of the current document root is prepended to the path. The exception to this rule is SetDocumentRoot itself, which treats paths starting with "/" as relative to the true root of the document stored in the wrapper. This allows the function to navigate back out of the virtual document as well as to focus on a subset for simplified operations. While this may sound complex, it is quite intuitive in practice.
Therefore, the following code:
o.SetDocumentRoot("node");
at the end of the code block above, results in the following document:
    <?xml version="1.0" encoding="utf-16"?>
    <node>
      <subnode>node3</subnode>
      <subnode>node2</subnode>
    </node>
Note that the entire document is still in the wrapper, but all operations will act as if the subset returned is the full document until the document root is set to "". The exception is SaveAllXML, which always saves the entire document.
Encoded Nodes
As well as wrapping the methods provided by the DOM and providing some typed variations, the wrapper provides methods to Create/Get/Set and Compare encoded nodes. This is done via an array of delegates called StringEncoders, through a delegate called StringEncoder. The delegate looks like this:
public delegate string StringEncoder (string sEncode, bool bEncode);
The implementation of a function to use with this should accept the string, and return either a decoded or encoded string based on the value of bEncode. Within the XMLObject is a basic example that simply adds two to the key value of every character. These functions make it easy to store passwords and other sensitive data in an encoded manner, and with the CompareEncodedNode method, it is possible to compare data that has been entered against the database by encoding the data to compare, so that every instance of the sensitive data in memory is encoded at all times.
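As an illustration, a function compatible with the StringEncoder delegate might look like the following sketch. It follows the "add two to every character" idea described above, but it is written from that description and is not the code shipped with XMLObject; the names ShiftEncoder and ShiftByTwo are hypothetical.

```csharp
using System;
using System.Text;

// Same signature as the delegate shown in the article.
public delegate string StringEncoder(string sEncode, bool bEncode);

public static class ShiftEncoder
{
    // Shifts every character code up by two to encode, and down by two to decode.
    // This only obscures the text; it is not encryption.
    public static string ShiftByTwo(string sEncode, bool bEncode)
    {
        int shift = bEncode ? 2 : -2;
        StringBuilder result = new StringBuilder(sEncode.Length);
        foreach (char c in sEncode)
            result.Append((char)(c + shift));
        return result.ToString();
    }

    public static void Main()
    {
        StringEncoder encoder = new StringEncoder(ShiftByTwo);
        string encoded = encoder("secret", true);
        Console.WriteLine(encoded);                  // prints "ugetgv"
        Console.WriteLine(encoder(encoded, false));  // prints "secret"
    }
}
```

A single function handles both directions, which is why the delegate takes the bEncode flag instead of requiring a pair of delegates.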
The StringEncoders property was a difficult design decision. On the one hand, making it static would mean you could set the encoding once for all instances of the object; on the other hand, it would mean that any change would affect all instances, possibly unintentionally. In the end, we did not make it static, and we recommend writing a helper function that sets the desired encoding functions, so that they can be set easily whenever an XMLObject is created. You should use the System.Security.Cryptography namespace when looking to implement your own encryption methods.
Validation and Exceptions
XML can be validated in a number of ways, the most common being against a DTD or an XSD. (If you're not familiar with these technologies, investigate some of the online tutorials at http://www.xml.org/.) An XSD in particular is compiled before it is used for validation, so it makes sense to compile it once and carry the resulting object within the wrapper for speed. Additionally, the wrapper can be set so that any operation that changes the document state also performs validation against the XSD or DTD provided. The SetXSD function is used to pass in the XSD, then the AutoValidateDTD property is set to True when validation is required. This property can also be set via the SetXSD function.
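The compile-once idea can be sketched with the standard classes. This example assumes XmlSchemaSet and XmlReaderSettings, which arrived in versions of the Framework later than the one the wrapper was written for, so it shows the principle and not the wrapper's internals. The schema and the names SchemaCheck and IsValid are made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaCheck
{
    // A tiny illustrative schema: <XML> must contain one or more <node> elements.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='XML'><xs:complexType><xs:sequence>" +
        "<xs:element name='node' type='xs:string' maxOccurs='unbounded'/>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Compiled once, then reused for every validation.
    static readonly XmlSchemaSet Schemas = BuildSchemas();

    static XmlSchemaSet BuildSchemas()
    {
        XmlSchemaSet set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(Xsd)));
        set.Compile();
        return set;
    }

    // Returns true when the XML string satisfies the schema.
    public static bool IsValid(string xml)
    {
        bool valid = true;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = Schemas;
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) valid = false;
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }  // validation happens as the reader advances
        }
        return valid;
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("<XML><node>a</node></XML>"));
        Console.WriteLine(IsValid("<XML><other/></XML>"));
    }
}
```

Holding the compiled schema set in a field means repeated validations do not pay the compilation cost again, which is the reason given above for carrying the compiled object inside the wrapper.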
The .NET Framework generally promotes the use of exceptions. Within the XMLObject you have three options, available through the ExceptionHandling property, and settable with the XMLObject.Exceptions enumeration. The three options are as follows:
- Normal: Simply passes exceptions back to the client function.
- None: Does not throw exceptions for anticipated errors; rather, stores the error state internally and sets a flag within the object.
- Custom: For anticipated exceptions, throws an XMLWrapperException, which wraps the original exception and provides some additional methods for accessing the information.
The XMLWrapperException object contains an enum that specifies the type of exception wrapped. The available values are None, XML, XSL, XSDSchema, XPath, and DTD. It is poor design to simply catch all exceptions, so the wrapper catches only those exceptions that we anticipate being possible through bad data and similar causes. For example, if the entire system is out of memory, we cannot do anything about that, so we do not attempt to catch the resultant exception. If ExceptionHandling is set to "None," then the GetErrorState method returns True if an error has occurred since the last time it was called, and resets itself internally. The GetLastError function then returns the XMLWrapperException that contains the details of the error caught. Note that only the last error is stored, so errors can be lost if you do not check often enough. This is similar to the way Win32 does things and is not recommended, but it could be useful under some circumstances, so the facility is provided for use with caution.
For example, the following code shows how to make our object use custom exceptions, and then how this simplifies catching errors and displaying their values. Of course, this will still only catch errors that we have anticipated as being possible within the XMLObject due to bad input, and so on.
    o.ExceptionHandling = XMLObject.Exceptions.Custom;
    try
    {
        o.CreateNode("/XML", "/deliberately invalid string", "test");
    }
    catch (XMLWrapperException ex)
    {
        MessageBox.Show(ex.ToString(), ex.exceptionType.ToString() + " Exception");
    }
XML Namespaces
Proper use of XML cannot occur without knowledge of XML namespaces, which function very similarly to C++/C# namespaces as a way of grouping together like objects. The XMLObject provides a Namespaces property, which can be modified using values in the XMLObject.namespaces enum. The values are set to powers of two so that they can easily be added and removed using logical operators. The enum has the values shown in Table 1.
The private SetNamespaces function is the place to change the code if you decide to add any other namespace values to the enum.
The following code will first set the object to use the xlink namespace only, and then add the dt and xsd namespaces. The final line removes the xlink namespace while preserving the other values.
    o.Namespaces = XMLObject.namespaces.xlink;
    o.Namespaces |= XMLObject.namespaces.dt | XMLObject.namespaces.xsd;
    o.Namespaces &= ~XMLObject.namespaces.xlink;
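The reason powers of two matter can be seen in a small stand-alone sketch. The DemoNamespaces enum below is hypothetical: it borrows three member names from the text, but the numeric values are assumptions and are not taken from Table 1.

```csharp
using System;

// Hypothetical flags enum; the values are illustrative powers of two.
[Flags]
public enum DemoNamespaces
{
    none = 0,
    xlink = 1,
    dt = 2,
    xsd = 4
}

public static class FlagDemo
{
    // Repeats the three steps from the article against the hypothetical enum.
    public static DemoNamespaces Run()
    {
        DemoNamespaces ns = DemoNamespaces.xlink;      // xlink only
        ns |= DemoNamespaces.dt | DemoNamespaces.xsd;  // add dt and xsd
        ns &= ~DemoNamespaces.xlink;                   // remove xlink, keep the rest
        return ns;
    }

    // True when every bit of flag is present in set.
    public static bool Has(DemoNamespaces set, DemoNamespaces flag)
    {
        return (set & flag) == flag;
    }

    public static void Main()
    {
        DemoNamespaces ns = Run();
        Console.WriteLine(Has(ns, DemoNamespaces.xlink));  // False
        Console.WriteLine(Has(ns, DemoNamespaces.dt));     // True
        Console.WriteLine(Has(ns, DemoNamespaces.xsd));    // True
    }
}
```

Each member occupies its own bit, so OR adds a namespace, AND with the complement removes one, and neither operation disturbs the other bits.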
One exciting feature that the .NET Framework provides as standard is regular expressions. The XMLObject constructor takes an optional bool as a parameter (actually, C# does not allow optional parameters; we have a parameterless constructor that calls the version that takes a bool, passing in False). This value determines whether the object incurs the additional overhead of maintaining two dictionaries, one to track node values and one to track node names. To assist with this, the object's namespace also contains a number of helper functions, because C# unfortunately does not, to date, support generic programming, and the provided containers are therefore not strongly typed. Additionally, the XmlNodeList class has no methods for adding nodes, so we provide our own XMLNodeList class, which is what the RegEx functions return. This class supports the IEnumerable and IEnumerator interfaces, so it can be used with foreach and is just as easy to use as the one provided in the XML namespace.
Two methods are provided, RegExNodesByPath and RegExNodesByValue. As the names suggest, one matches node paths, the other matches node values. If these functions are called when no string lists exist, they will be built prior to the main execution of these functions.
A number of static methods are provided, which in C# means they need to be scoped by the class name, not the name of an object. These include GetNodePath, GetText, and MakePretty (which formats XML with proper line breaks and tabs).
Putting together the RegEx and the static functions, the following example brings up two message boxes, both with "/XML/node/subnode" in the title, one with the message "node2," and the other "node3."
    Regex re = new Regex("node[0-9]");
    XMLNodeList nl = o.RegExNodesByValue(re);
    foreach (XmlNode n in nl)
    {
        MessageBox.Show(XMLObject.GetText(n), XMLObject.GetNodePath(n));
    }
The XMLObject also provides a Compare function, which takes an XMLObject and three "out" parameter strings. The XMLObject that is used to call the function compares its contents with that of the object passed in, and sets the values of the three provided strings to XML reflecting the nodes added, changed, and removed between the two objects.
The object provides four methods for transforming XML with XSL: two that take a StreamReader to read XSL files from media, and two that take the XSL as a string. The difference between the two in each case is that one also takes an XsltArgumentList, which makes it possible to pass parameters into an XSL stylesheet. XSL is a very powerful and exciting technology, and if you're not familiar with it, you should investigate the references provided. Additionally, the Windows Forms application included in the demo project provides shortcut keys for the most common XSL commands, making it easier to edit and test your own stylesheets using the object.
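Passing a parameter into a stylesheet can be sketched with the standard classes. The example assumes XslCompiledTransform, which belongs to later versions of the Framework than the one the wrapper targeted, so it shows the idea and not the wrapper's own code. The stylesheet, the parameter name greeting, and the class name TransformDemo are invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TransformDemo
{
    // Illustrative stylesheet: writes the greeting parameter, then the text of /XML/node.
    const string Xsl =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:param name='greeting'/>" +
        "<xsl:template match='/'>" +
        "<xsl:value-of select='$greeting'/>, <xsl:value-of select='/XML/node'/>" +
        "</xsl:template></xsl:stylesheet>";

    public static string Transform(string xml, string greeting)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(XmlReader.Create(new StringReader(Xsl)));

        // The argument list carries values for the stylesheet's xsl:param elements.
        XsltArgumentList args = new XsltArgumentList();
        args.AddParam("greeting", "", greeting);

        StringWriter output = new StringWriter();
        xslt.Transform(XmlReader.Create(new StringReader(xml)), args, output);
        return output.ToString();
    }

    public static void Main()
    {
        // Prints "Hello, world"
        Console.WriteLine(Transform("<XML><node>world</node></XML>", "Hello"));
    }
}
```

The same stylesheet can then be reused with different argument lists, which is the main benefit of the overloads that accept one.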
The object provides nearly 60 properties and methods, the bulk of which have not been covered here. Also available for download is the Windows Forms project, which allows the user to enter XML with parameters and then choose methods from a drop down list to see what they do. I have an ASP.NET web page at graus.dns2go.com/XMLWrapperDocs that allows you to both view documentation for the entire object and, for most functions, enter data into the site and see the resultant XML. The site makes extensive use of the XML comments generated by C# in combination with XSL stylesheets that run through the XMLObject class, so it can serve as a case study of use of the object.
XPath Queries
At the core of a lot of these functions is an XPath query. An XPath query is a little like a DOS path, except it does not have to be as literal. Most people have entered something like "d:\Win98\setup" into the DOS window, and an XPath can work in exactly the same way. The following XML document will form the basis of my explanation of XPath; all example paths will be explained by the value of the node they return from this document. This document represents a list of a couple of books that I have found invaluable in preparing this article.
    <XML>
      <book isbn="1861005067">
        <title>XSLT Programmers Reference</title>
        <author>Michael Kay</author>
      </book>
      <book isbn="0735616485">
        <title>Inside C#</title>
        <author>Tom Archer</author>
        <sections>
          <section number="1">C# Class Fundamentals</section>
          <section number="2">Writing Code</section>
          <section number="3">Advanced C#</section>
        </sections>
      </book>
      <book isbn="0735614229">
        <title>Applied Microsoft .NET Framework Programming</title>
        <author>Jeffrey Richter</author>
      </book>
    </XML>
The simplest type of XPath simply specifies the path to a node. For example, searching for /XML/book/title will return the three book titles. It is also possible to use a double slash (//), which can stand for any number of steps. In other words, /XML//section will return the three sections of "Inside C#," and would also return any other nodes named "section" within the document. Additional special characters are listed in Table 2.
Therefore, the XPath /XML//@isbn will return the three isbn attribute nodes on the book elements, and /XML/book[2]/title will return only the title of the second book.
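These paths can be tried directly against the sample document with the standard XmlDocument class. The following is a small sketch; the class name XPathPaths and its helpers Count and First are hypothetical names for this example only.

```csharp
using System;
using System.Xml;

public static class XPathPaths
{
    // The book list from the article, written as a single string.
    const string Books =
        "<XML>" +
        "<book isbn='1861005067'><title>XSLT Programmers Reference</title>" +
        "<author>Michael Kay</author></book>" +
        "<book isbn='0735616485'><title>Inside C#</title><author>Tom Archer</author>" +
        "<sections>" +
        "<section number='1'>C# Class Fundamentals</section>" +
        "<section number='2'>Writing Code</section>" +
        "<section number='3'>Advanced C#</section>" +
        "</sections></book>" +
        "<book isbn='0735614229'><title>Applied Microsoft .NET Framework Programming</title>" +
        "<author>Jeffrey Richter</author></book>" +
        "</XML>";

    static XmlDocument Load()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Books);
        return doc;
    }

    // Number of nodes the path selects.
    public static int Count(string xpath)
    {
        return Load().SelectNodes(xpath).Count;
    }

    // Text of the first node the path selects.
    public static string First(string xpath)
    {
        return Load().SelectSingleNode(xpath).InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(Count("/XML/book/title"));     // 3
        Console.WriteLine(Count("/XML//section"));       // 3
        Console.WriteLine(Count("/XML//@isbn"));         // 3
        Console.WriteLine(First("/XML/book[2]/title"));  // Inside C#
    }
}
```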
As well as specifying a path, an XPath can specify an axis. This means that you can specify which nodes you want to select relative to a certain point. The default axis is child; in other words, a step with no explicit axis selects children of the current node. The possible values are shown in Table 3 along with the result of placing the axis in question into the XPath /XML/book[2]/axis::*.
Finally, you can select the sort of nodes you want to match with a node test. The node tests will limit matches to only the node types named, and are comment(), node(), processing-instruction(), and text(). Also, nodename is shorthand for child::nodename, while @nodename is short for attribute::nodename.
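A few axes and one node test can be tried against the sample document as follows. This is a sketch using the standard XmlDocument class; the names XPathAxes, AxisCount, and SecondTitleText are hypothetical, and only a handful of the axes are exercised.

```csharp
using System;
using System.Xml;

public static class XPathAxes
{
    // The book list from the article, written as a single string.
    const string Books =
        "<XML>" +
        "<book isbn='1861005067'><title>XSLT Programmers Reference</title>" +
        "<author>Michael Kay</author></book>" +
        "<book isbn='0735616485'><title>Inside C#</title><author>Tom Archer</author>" +
        "<sections>" +
        "<section number='1'>C# Class Fundamentals</section>" +
        "<section number='2'>Writing Code</section>" +
        "<section number='3'>Advanced C#</section>" +
        "</sections></book>" +
        "<book isbn='0735614229'><title>Applied Microsoft .NET Framework Programming</title>" +
        "<author>Jeffrey Richter</author></book>" +
        "</XML>";

    static XmlDocument Load()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Books);
        return doc;
    }

    // Counts the nodes selected by /XML/book[2]/axis::* for the given axis name.
    public static int AxisCount(string axis)
    {
        return Load().SelectNodes("/XML/book[2]/" + axis + "::*").Count;
    }

    // Uses the text() node test to fetch the text node inside the second title.
    public static string SecondTitleText()
    {
        return Load().SelectSingleNode("/XML/book[2]/title/text()").Value;
    }

    public static void Main()
    {
        Console.WriteLine(AxisCount("child"));              // 3: title, author, sections
        Console.WriteLine(AxisCount("descendant"));         // 6: the above plus three sections
        Console.WriteLine(AxisCount("following-sibling"));  // 1: the third book
        Console.WriteLine(AxisCount("attribute"));          // 1: isbn
        Console.WriteLine(SecondTitleText());               // Inside C#
    }
}
```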
In addition to the flexibility offered by selecting by axis and node type, it's possible to select using built-in functions. Of the node functions, the most commonly used are probably count(), last(), and position(). count() returns the number of nodes in the node set passed to it, last() returns the number of nodes in the current node list, and position() returns the position of the current node within that list. If we did not know how many sections were in "Inside C#," we could still return the last section node with /XML/book[2]/sections/section[last()]. We could also list every section except the first with /XML/book[2]/sections/section[position() > 1], or even every odd-numbered one with /XML/book[2]/sections/section[position() mod 2 > 0]. In fact, the last example is the way an XSL stylesheet would generate an HTML table with alternating row colors. Table 4 shows the available functions grouped by type.
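The three predicates above can be checked with a short sketch against a trimmed copy of the sample document (only the first two books are kept, which is enough for book[2] to resolve). The class XPathFunctions and its helpers are hypothetical; count() is evaluated through XPathNavigator.Evaluate because it returns a number and not a node set.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class XPathFunctions
{
    // Trimmed copy of the article's book list.
    const string Books =
        "<XML>" +
        "<book isbn='1861005067'><title>XSLT Programmers Reference</title></book>" +
        "<book isbn='0735616485'><title>Inside C#</title>" +
        "<sections>" +
        "<section number='1'>C# Class Fundamentals</section>" +
        "<section number='2'>Writing Code</section>" +
        "<section number='3'>Advanced C#</section>" +
        "</sections></book>" +
        "</XML>";

    static XmlDocument Load()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Books);
        return doc;
    }

    // Returns the number attributes of the selected section nodes, comma separated.
    public static string Numbers(string xpath)
    {
        List<string> numbers = new List<string>();
        foreach (XmlNode n in Load().SelectNodes(xpath))
            numbers.Add(n.Attributes["number"].Value);
        return string.Join(",", numbers.ToArray());
    }

    // Evaluates count() over the sections of the second book.
    public static double SectionCount()
    {
        XPathNavigator nav = Load().CreateNavigator();
        return (double)nav.Evaluate("count(/XML/book[2]/sections/section)");
    }

    public static void Main()
    {
        string sections = "/XML/book[2]/sections/section";
        Console.WriteLine(Numbers(sections + "[last()]"));              // 3
        Console.WriteLine(Numbers(sections + "[position() > 1]"));      // 2,3
        Console.WriteLine(Numbers(sections + "[position() mod 2 > 0]")); // 1,3
        Console.WriteLine(SectionCount());                              // 3
    }
}
```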
Conclusion
Although this has been somewhat of a whirlwind tour of XPath, we hope it is clear that XPath offers a lot more flexibility than just the ability to specify a path as you may do in a filesystem. There are many web sites with further information on XPath, XSL, XSD, and DTDs. We welcome any comments or suggestions with regard to the XMLObject wrapper class, or questions in regard to this article. We can be reached via e-mail or through the XMLWrapper web site, at graus.dns2go.com/XMLWrapperDocs.
Christian Graus has been programming since 1985. He is especially passionate about the C++ Standard Library, but is also interested in other languages, such as C#. He can be contacted at [email protected]. Matt Cole has been programming for six years. He now works at Dytech Solutions working mainly with ASP.NET and C# building n-tiered systems. He can be contacted at [email protected].