Fabio Vazquez (Sao Paulo, Brazil) is a Software Development Engineer for the Microsoft Dynamics Ax Globalization Team at Microsoft. He is a former Microsoft Most Valuable Professional (MVP) in C# and VFP and a frequent speaker at software development conferences in Brazil, including the Microsoft PDC (Professional Developers Conference) and Tech-Ed. Fabio has two books published and has a great passion for agile development practices.
Introduction
Information has long been available from many different sources, and the Internet has played a singular role in making it accessible anywhere, at any time. The inevitable side effect of all this availability is that data becomes disorganized: it is stored in all kinds of formats and must be accessed through many different protocols. Even in controlled environments such as a company LAN, data is likely to be scattered, with no standard way to access it. Increasingly, however, information is becoming available in XML format, whether through queries against an RDBMS such as SQL Server or Oracle or through the consumption of XML Web Services. In the early days of XML, its use was largely restricted to data exchange between B2B solutions, and applications such as BizTalk Server started to emerge. These solutions fit best in the middleware or database layers and stayed far from the user interface. More recently, though, XML coming from Web Services or business components has been arriving at the presentation layers of our applications. As more and more XML arrives there, we developers have to provide reliable and productive means of showing it to our users.
In this article we are going to discuss some of the techniques and tools we can use to present the contents of XML documents to end users. As an example of an XML vocabulary used in the real world, we are going to look at RSS. RSS has been gaining popularity lately, especially because it is the basis for Web Logs. Web Logs consist of news and other content published periodically by individuals, and they are becoming more and more common on the web today.
XML in the Real World
As XML has evolved, different uses have been developed for it. Some vocabularies have emerged with great success, such as SVG (Scalable Vector Graphics - http://www.w3.org/TR/SVG11/), MathML (Mathematical Markup Language - http://www.w3.org/TR/REC-MathML/), VoiceXML (Voice Extensible Markup Language - http://www.w3.org/TR/voicexml20/) and others. Some of these vocabularies are "official", like the ones cited above, as they are supported by a committee such as the W3C. But others emerge from independent initiatives and become very important. One vocabulary that originated outside the W3C, and which will be used in this article, is RSS. RSS is an XML vocabulary for describing news and related content from online sources. RSS enables the display of "channels". Channels are composed of items retrievable through URLs. RSS files are updated periodically and each item has a title, a link and a brief description. Usually these items are news headlines, but lately RSS has been used to describe entries in a personal log (also known as a Web Log). People are starting to put their personal logs on the Internet and to use RSS to store them. Companies are also publishing their news this way.
Since RSS documents have contents that usually will be read by people, normally the web sites that publish news in RSS format also have an HTML version of them. Some specialized sites are available that aggregate news from various sources and organize them in categories. http://Moreover.com, http://newsisfree.com, www.Syndi8.com and http://MyRSS.com, for example, have a great catalog of news entries available.
Where Does RSS Come From?
RSS was first created by Netscape in 1999. Later, other companies adopted RSS and created new versions of the vocabulary. Nowadays there are a number of different versions around, but two of them are the most widely used: version 0.91 and version 2.0. Version 0.91 is simpler than the others, so for the sake of simplicity we are going to use RSS documents written in version 0.91. The following listing shows the RSS file we are going to use in some examples in this article. Save this content in a file named RSS.xml and keep it for the tests described later.
<?xml version="1.0" ?>
<rss version="0.91">
  <channel>
    <!-- The channel's own title, link, description and language elements go here. -->
    <managingEditor>fabio@ilabore.com.br</managingEditor>
    <webMaster>fabio@ilabore.com.br</webMaster>
    <item>
      <title>Visual FoxPro 8.0 Released</title>
      <link>http://msdn.microsoft.com/vfoxpro</link>
      <description>
        Microsoft has released the new version of Visual FoxPro, VFP 8.0, to MSDN subscribers.
        The version for non-subscribers must be available by mid-March/2003.
      </description>
    </item>
    <item>
      <title>Happy Birthday XML!</title>
      <link>http://www.w3.org/2003/02/xml-at-5.html</link>
      <description>
        The Extensible Markup Language completes five years of existence as a W3C recommendation
        on 10 February 2003.
      </description>
    </item>
    <item>
      <title>Understanding SOAP</title>
      <link>http://msdn.microsoft.com/webservices/understanding/webservicebasics/default.aspx?pull=/library/en-us//dnsoap/html/understandsoap.asp</link>
      <description>
        Aaron Skonnard, the famous XML guru, talks about the SOAP protocol
        from a pure XML perspective.
      </description>
    </item>
  </channel>
</rss>
If you are curious about RSS, you can give it a try by accessing news from well-known sources like CNET, Mono Project, MTV and others. See some links below:
As lots of UTMag readers are Visual FoxPro developers, here are some Web Logs from distinguished personalities of the Visual FoxPro community (actually, the only ones at the time of this writing, as far as I know):
With so many sources of interesting information available, it is imperative that we have a software tool capable of organizing it all for us. This kind of software does exist: such tools are called News Aggregators. News aggregators work much like News Readers, but they can also read and manage RSS sources. If you are interested in subscribing to some cool Web Logs you have found, don't forget to evaluate these tools. Some very interesting ones are NewzCrawler and NewsGator. I usually use Syndirella, mostly because it is an open-source project written in C# and serves as good study material as well.
If you are interested in more information about RSS, refer to Mark Pilgrim's excellent article at XML.com.
This Tree is Not Very Nice
XML documents are plain text files and, as such, they do not look good when opened in a text editor. Browsers like Internet Explorer, on the other hand, are able to show XML files in a more pleasant manner, but the result is still very unfriendly for end users (and sometimes for developers too).
Figure 1: our sample XML file opened with Internet Explorer 6.0
We need a better way to show this XML content. One of the alternatives we have is to use CSS (Cascading Style Sheets). CSS is a W3C recommendation that allows the designer to redefine the appearance of HTML and XML elements. When used with HTML, CSS can change colors, borders, fonts, spacing and almost all the visual characteristics of the elements. With XML we do not have the predefined, fixed tags we have in HTML; we create our own tags (elements), and CSS allows us to define the complete appearance of each element in the document.
In our XML example, we can define that the <title> element, child of the <item> element, will look like a block with large fonts and a gray background; and the <description> element, also child of the <item> element, will appear with smaller fonts in blue color and will have an indentation of 20 pixels.
In order to do this, we have to write a CSS file with these styles defined. The CSS will have redefinitions for each element we want to change:
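A minimal sketch of such a style sheet follows. It assumes the file name RSSStyle.css referenced later in this article; the font sizes and the exact shade of gray are illustrative choices, not necessarily the values used for the figures.

```css
/* RSSStyle.css: illustrative values only */

/* <title> inside <item>: a block with a large font on a gray background */
item title {
    display: block;
    font-size: 18pt;
    background-color: #cccccc;
}

/* <description> inside <item>: smaller blue text, indented 20 pixels */
item description {
    display: block;
    font-size: 10pt;
    color: blue;
    margin-left: 20px;
}
```

Descendant selectors ("item title") are used here rather than child selectors because older browsers such as Internet Explorer 6 do not support the latter.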
All we need to do now is to associate the XML file with the CSS. We can do this by using the XML Processing Instruction <?xml-stylesheet ... ?>. Processing Instructions do not have their content previously defined by the XML recommendation. They serve exactly to instruct the XML application (the web browser for example) on how to behave in some situations. It is a kind of “customizer” so to say.
Below are the relevant parts of our sample XML, changed to include the <?xml-stylesheet ... ?> processing instruction:
<?xml version="1.0" ?>
<?xml-stylesheet href="RSSStyle.css" type="text/css" ?>
<rss version="0.91">
<!-- Rest of the document omitted for clarity reasons. -->
</rss>
<?xml-stylesheet ?> has two important attributes: “href”, which defines the location of the CSS file and “type”, which defines the MIME type of the style sheet (text/css in this case). This means you can create different presentations for an ordinary XML file and have it shown by simply changing the “href” attribute of the <?xml-stylesheet ?> processing instruction.
The following figure shows the document as it is presented in Internet Explorer after the <?xml-stylesheet ?> processing instruction has been applied.
Figure 2: XML document with a CSS style sheet associated
Notice that the old tree aspect was replaced by a more friendly one. Each element redefined by the CSS file had its appearance changed and the browser rendered it appropriately.
Although we now have something much more powerful, using CSS to style XML files still has many limitations. We cannot, for example, change the order in which the elements appear, because it depends on their physical order in the original XML file. We cannot sort elements or calculate values derived from the ones in the document (we could not, for example, count the number of <item> elements and put this information somewhere in the output). Pure CSS does not allow us to embed new elements or attributes to get new formatting options, either. So we need something more flexible and robust to support our needs when presenting the content of XML files. Next, we will see some better options for presenting our XML content; but before that, let's briefly talk about an important technology that will be present in a great part of the work we do with XML: XPath.
XPath
XPath (XML Path Language) describes a syntax that allows us to access data within XML documents. The different parts of an XML document are called nodes. XML nodes can be elements, attributes, Processing Instructions, CDATA sections, text, etc. XPath enables the selection and filtering of the various nodes within a document, much like SQL allows this kind of operation over relational tables. The XPath language uses a very familiar syntax which resembles the way we address files in a file system. In order to reference a file in a Windows-based file system we use the backslash character (\) to separate the folder and file names. XPath also uses a character to separate the element and attribute names in an XML hierarchy, and this is the slash character (/). So, we could use the following valid XPath expression to access the element named "channel" which is a child of the "rss" element:
/rss/channel
Notice the use of the slash character to separate the element names. Notice also that another slash character was used in the beginning of the expression. This means that this XPath expression defines an absolute path, beginning at the root node. An XPath expression can also be relative to the node currently in context (the context node). This kind of expression does not use an absolute path, but a relative one. We can easily identify a relative path because it does not start with a slash character.
Actually, there are two syntaxes for XPath expressions: the abbreviated syntax (the easiest one) and the unabbreviated syntax (the not so easy one). In this article we are going to use only the abbreviated syntax. If you want more information about XPath syntaxes, please refer to the W3C Web site at http://www.w3.org/TR/xpath.
The following XPath expression addresses the attribute named “version”, which is part of the element “rss”, which is the document root element.
/rss/@version
The "@" character in XPath expressions indicates that the name that follows it is an attribute. All attributes used in XPath expressions written in the abbreviated syntax must be prefixed with an "@" character. This allows the XPath processor to distinguish elements from attributes.
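To see these expressions at work, here is a small console sketch. It uses the XmlDocument class discussed later in this article; the inline XML is a hypothetical, trimmed-down stand-in for RSS.xml, and the class name XPathDemo is invented for the example.

```csharp
using System;
using System.Xml;

class XPathDemo
{
    static void Main()
    {
        // A trimmed-down stand-in for RSS.xml (hypothetical content).
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<rss version=\"0.91\"><channel>" +
            "<item><title>First</title></item>" +
            "<item><title>Second</title></item>" +
            "</channel></rss>");

        // Absolute path: starts at the root node.
        XmlNode channel = doc.SelectSingleNode("/rss/channel");

        // Attribute: the "@" prefix selects the version attribute of <rss>.
        XmlNode version = doc.SelectSingleNode("/rss/@version");
        Console.WriteLine(version.Value);          // prints 0.91

        // Relative path: evaluated with <channel> as the context node.
        foreach (XmlNode title in channel.SelectNodes("item/title"))
        {
            Console.WriteLine(title.InnerText);    // prints First, then Second
        }
    }
}
```

The last expression has no leading slash, so it is resolved relative to the node on which SelectNodes() is called rather than from the root.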
Our Programming Language Can Help Us
One of the choices we have to overcome the limitations of CSS style sheets is to employ an XML parser to read the contents of the XML file and use our programming language of choice to format the output the way that best fits our needs. Using an XML parser we can extract exactly the elements and attributes we want and merge them with our output text, much as we do when retrieving data from relational data sources through ADO.NET, for example. Consider the following code, which uses the XmlTextReader class from the .NET Framework to open our sample XML file and extract some data from it:
// Requires: using System.Xml;
private void Page_Load(object sender, System.EventArgs e)
{
    // To hold the resulting HTML content
    System.Text.StringBuilder sb = new System.Text.StringBuilder("");
    // Defines an XmlTextReader object to read the RSS file from disk
    XmlTextReader reader = new XmlTextReader(Server.MapPath("rss.xml"));
    reader.WhitespaceHandling = WhitespaceHandling.None;
    // Now iterates through the XML and composes the literal HTML output
    reader.MoveToContent();
    while ( reader.Read() )
    {
        switch (reader.LocalName)
        {
            // <channel> element: emit its title and link, then stop
            case "channel":
            {
                while(reader.Read())
                {
                    if (reader.LocalName == "title" && reader.NodeType == XmlNodeType.Element)
                    {
                        // Move from the start tag to its text node
                        reader.Read();
                        sb.AppendFormat("<h1>{0}</h1>", reader.Value);
                    }
                    if (reader.LocalName == "link" && reader.NodeType == XmlNodeType.Element)
                    {
                        reader.Read();
                        sb.AppendFormat("<a href='{0}'>{0}</a>", reader.Value);
                        break;
                    }
                }
                break;
            }
            // <item> element: this loop consumes every remaining item
            case "item":
            {
                while(reader.Read())
                {
                    if (reader.LocalName == "title" && reader.NodeType == XmlNodeType.Element)
                    {
                        reader.Read();
                        sb.Append("<hr>");
                        sb.AppendFormat("<h2>{0}</h2>", reader.Value);
                    }
                    if (reader.LocalName == "description" && reader.NodeType == XmlNodeType.Element)
                    {
                        reader.Read();
                        sb.AppendFormat("<p>{0}</p>", reader.Value);
                    }
                }
                break;
            }
        }
    }
    Response.Write(sb.ToString());
    // Releases the underlying file
    reader.Close();
}
The XmlTextReader class is part of the System.Xml namespace. This class can be used to read the different nodes of an XML document and process them as desired. XmlTextReader differs from the DOM model implemented by the XmlDocument class because it does not load the whole document into memory. Documents processed through the XmlDocument class put the entire XML tree (all the nodes and their hierarchy) into memory, and this can become slow and memory-hungry when large XML documents are used. The XmlTextReader class implements a "pull" model, where the nodes are traversed and discarded from memory as they are read.
So, this class provides a read-only, forward-only way of reading XML documents. This can be very efficient in some situations because you don't have all the overhead associated with loading the entire document into memory as we would have with the XmlDocument class. XmlDocument, on the other hand, provides some capabilities we don't have with XmlTextReader, as we will see later in this article.
The previous example shows that by using the XmlTextReader class we gain more flexibility than pure CSS offers; but the code is still verbose, procedural and time-consuming to write. There are better ways of doing the same thing. See the code below for an example that uses the XmlDocument class to load the document into memory and select the necessary nodes:
private void Page_Load(object sender, System.EventArgs e)
{
    // Defines an XmlTextReader object to read the RSS file from disk
    XmlTextReader reader = new XmlTextReader(Server.MapPath("rss.xml"));
    reader.WhitespaceHandling = WhitespaceHandling.None;
    // Loads the whole document into an XmlDocument (DOM) instance
    XmlDocument doc = new XmlDocument();
    XmlNodeList nodes;
    doc.Load(reader);
    // The document is now in memory, so the reader can be closed
    reader.Close();
    // Selects every <item> element with an XPath expression
    nodes = doc.SelectNodes("//item");
    foreach(XmlNode node in nodes)
    {
        // Writes the title of each item on its own line
        Response.Write(node.SelectSingleNode("title").InnerText + "<br>");
    }
}
The previous example uses the XmlDocument class of the System.Xml namespace. This class is an implementation of the W3C DOM Level 2 recommendation. It is not much different from the Win32 MSXML library you might be familiar with if you have worked with XML in languages such as Visual FoxPro or Visual Basic.
Using the XmlDocument class, we can select the nodes we want using XPath expressions in the SelectNodes() method. This method receives an XPath expression as a parameter and returns the result of evaluating it as an XmlNodeList object. As the name implies, an XmlNodeList represents a collection of XmlNode objects, each one representing an XML node returned by the XPath expression. Our example selects the nodes to be used in the output and iterates through them in order to get their values and present them to the user. Notice that we can use a foreach construct to iterate through the nodes of an XmlNodeList object.
Both examples above are functional but still suffer from a very undesirable problem: they do not separate presentation from data and processing. All the code is mixed together, which hurts the maintainability and independence of the various parts involved in the process.
Next, we are going to see a better way to process and present XML files. The technology responsible for that is called XSLT (Extensible Stylesheet Language Transformations). In the next section we will see how XSLT transformations work and how they will be useful to take our input XML file and transform it into an HTML output document.
XSLT Comes to The Rescue
To overcome all the limitations we have pointed out so far, we can use XSLT. XSLT is a W3C recommendation (http://www.w3.org/TR/xslt) which is completely implemented in the .NET Framework. We are going to touch on some concepts of XSLT, but a complete overview of it deserves an entire book. In fact, such books about XSLT already exist...
What is XSLT
XSLT is a technology that allows an XML document to be transformed into another document.
The resulting documents are primarily text-based formats like HTML, WML, RTF or any other text file. Nowadays, XSLT is mostly used to generate HTML files from XML documents, and this is the kind of transformation we are going to discuss in the next sections.
An XSLT transformation works by taking two files: an arbitrary well-formed XML Document and a style sheet document. The first is called the input file and the second the transformation file. An XSLT processor is applied over the input file and uses the transformation file in order to generate a third document called output file. The output file could be an XML document as well, but it could be any type of text file, as we commented above.
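The same three-file process can also be driven from code. The sketch below is an illustration rather than part of the article's sample application: it assumes the RSS.xml file and the RSSNormal.xsl style sheet created later in this article sit in the current folder, invents the output name RSS.html and the class name TransformDemo, and uses the XslCompiledTransform class, which is available from .NET Framework 2.0 onward (version 1.x offered XslTransform instead).

```csharp
using System.Xml.Xsl;

class TransformDemo
{
    static void Main()
    {
        // The XSLT processor.
        XslCompiledTransform xslt = new XslCompiledTransform();

        // The transformation file (the style sheet).
        xslt.Load("RSSNormal.xsl");

        // Input file -> output file. "RSS.html" is a hypothetical name.
        xslt.Transform("RSS.xml", "RSS.html");
    }
}
```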
The figure below illustrates how the XSLT process occurs:
Figure 3: XSLT process
XSLT Templates
A transformation done by an XSLT processor can be thought of as one or more operations over the structure and data of the input document. These operations can consist of grouping, filtering, ordering, arithmetic and string operations, and others. As you may know, all of these operations can also be done by a program like the ones shown earlier, written in an API-oriented manner. XSLT, on the other hand, is a declarative language that defines rules to be applied to certain parts of the input document. These rules are put in templates. The templates are called one or more times during the processing of the input document and are responsible for generating the output.
It is common to design XSLT templates that are applied to a certain node or group of nodes of the input XML document. The key to defining which node or group of nodes will be processed by a given XSLT template is XPath. The general structure of an XSLT template that will be applied to the "/rss/channel/item" nodes of an RSS document is shown here:
<template match="/rss/channel/item">
<!-- Template rules go here -->
</template>
The XSLT template element has an important attribute, "match", which holds the XPath pattern that identifies the nodes we want to process. The template will be executed for each node that matches this pattern. In the example above, the template will be applied to all "item" elements which are children of the "channel" element which, in turn, is a child of the "rss" element.
The template shown earlier looks like this when put in a complete XSLT stylesheet:
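Assuming it differs from the second version shown further on in this article only by lacking the root template, that stylesheet would be:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/rss/channel/item">
    <!-- Template rules go here -->
  </xsl:template>
</xsl:stylesheet>
```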
Notice that the elements used in the previous example are prefixed by "xsl:". This is just a prefix for the XML Namespace http://www.w3.org/1999/XSL/Transform (if you are not familiar with XML Namespaces, please refer to the W3C page on XML Namespaces). This namespace is where the XSLT elements and attributes live, and it must be declared on the root element of the XSLT document. The root element of an XSLT stylesheet is the <stylesheet> element. All XSLT elements inside it must be in the "http://www.w3.org/1999/XSL/Transform" namespace. This namespace is defined by the W3C, and XSLT processors recognize it. Remember that XML Namespaces are just URI (Uniform Resource Identifier) names. Although they generally look like URLs, they do not need to exist as Internet resources: no connection is made during processing, so you do not need an Internet connection in order to use XSLT.
But something is still missing in this document. Every XSLT processor applies built-in template rules to the nodes that no template in the style sheet matches. Starting at the root, these rules visit every element and copy the text it contains to the output. This is not desirable in our example because we want to display only the news titles and their descriptions.
To avoid this behavior, we must define a template that matches the document root and calls the template for the "item" elements explicitly. Another version of our XSLT style sheet is shown here:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:apply-templates select="/rss/channel/item" />
  </xsl:template>
  <xsl:template match="/rss/channel/item">
    <!-- Template rules go here -->
  </xsl:template>
</xsl:stylesheet>
In this example we can see that the template for the document root ("/") has a child element named "apply-templates". This element selects the nodes identified by the XPath expression in its "select" attribute and runs the matching template for each of them. In our case, the template whose "match" attribute has the value "/rss/channel/item" will be called for each "item" element of the input XML document.
But wait! We have no rules in the template defined for the "item" elements. We postponed this until now. So, let's put something useful into our template!
Suppose we want to create an HTML document as the output of our XSLT processing, and that all we want to show are the titles of the news, each one in a separate paragraph. All we need to do is include the appropriate HTML code in the <template> element. We know how to define a paragraph in HTML, right? You bet! That is the "<P>" tag! But how can we retrieve a value from the input XML document? Not surprisingly, there is an XSLT element that does this job for us: the <value-of> element. This element also has a "select" attribute, which defines the XPath expression used to get a value from a node of the input document. The resulting value replaces the <value-of> element when the output is generated.
The following listing shows the <value-of> element in use:
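A minimal version of such a style sheet, built on the two-template skeleton shown earlier, could look like this. The html and body wrapper elements are an assumption added so that the output is a complete HTML page.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="/rss/channel/item" />
      </body>
    </html>
  </xsl:template>
  <xsl:template match="/rss/channel/item">
    <!-- "title" is a relative path: the <title> child of the current <item> -->
    <p><xsl:value-of select="title" /></p>
  </xsl:template>
</xsl:stylesheet>
```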
To associate this style sheet with an XML document, we do something similar to what we did to associate the CSS style sheet before. We still use the <?xml-stylesheet ?> processing instruction but now, instead of specifying the MIME type "text/css", we specify the value "text/xsl". The "href" attribute now points to our XSLT file.
Put the code in the listing above into a file named "RSSNormal.xsl" and save it in the same folder as the RSS.xml file you created before. Now, change the <?xml-stylesheet ?> processing instruction of the RSS.xml file so that it looks like this:
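Following the pattern of the CSS example earlier in this article, the top of RSS.xml would presumably read:

```xml
<?xml version="1.0" ?>
<?xml-stylesheet href="RSSNormal.xsl" type="text/xsl" ?>
<rss version="0.91">
  <!-- Rest of the document omitted for clarity reasons. -->
</rss>
```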
Now, open the RSS.xml file in Internet Explorer and you will see that the XSLT style sheet we just created and associated with the XML file was applied and the new appearance looks like this:
Figure 4: XML with XSLT style sheet applied
Internet Explorer Also Has its Transformations
Since version 5.0, Internet Explorer has been able to show XML files in a tree structure. The elements can be expanded and collapsed at any level, which eases the visualization of documents, especially the more complex ones.
Internet Explorer actually uses a resource from the MSXML Win32 COM library to transform the XML files it opens. This transformation is done through XSLT, and we can easily see the source of the transformation document by typing the following line in the Address Bar of the browser:
res://msxml.dll/defaultss.xsl
Each version of the MSXML library implements its own default style sheet. You can visualize the style sheets for each version by changing the line above to correspond to the desired version of the MSXML DLL, for example:
When we are developing web applications with ASP.NET, we can use an easy strategy to transform XML documents through XSLT. The .NET Framework provides a Web Control named "Xml", whose class is located in the "System.Web.UI.WebControls" namespace. To use this Web Control in a Web Form, all we have to do is open our Web Form in design mode and drag the "Xml" control from the ToolBox onto the Form surface. Figure 5 shows the Xml Web Control in the Visual Studio .NET Toolbox.
The Xml Web Control is very easy to use. For it to show our XML document, we only have to specify the location of the XML file in the "DocumentSource" property: right-click the Xml Web Control in the design view and choose Properties from the context menu. Specify the location of your XML file in the "DocumentSource" property and your Xml Web Control will already be functional, although not very nice. To test it, click the "Start" button on the standard toolbar or press F5. A new instance of the Web browser will open with content that looks like the one shown in the figure below:
Figure 6: Raw XML Document
OK, I agree with you! This is not so beautiful. Let's improve the quality of this presentation then...
The key is the XSLT file we created before. It is very easy to associate our XSLT file with the source XML by using the Xml Web Control. In the properties window of the control you will see a property named "TransformSource". It is not hard to guess its purpose: it associates the XML file referenced in the "DocumentSource" property with the XSLT style sheet it points to. Enter the name of the XSLT file (RSSNormal.xsl) in the "TransformSource" property and run the application again. This time you will see a far better output, thanks to the XSLT transformation applied to the raw XML. The figure below shows the result of the transformation, now much improved.
Figure 7: XML with a simple XSL Style Sheet applied
Figure 8: Drop Down List in Design mode
Notice that you can choose to change the presentation of the XML simply by associating other XSLT transformation style sheets in the "TransformSource" property of the Xml Web Server Control.
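In the .aspx source of the Web Form, the two properties discussed above appear as attributes of the control's tag. A minimal sketch, assuming the designer's default control ID of Xml1 and that both files sit in the application folder:

```xml
<asp:Xml id="Xml1" runat="server"
    DocumentSource="RSS.xml"
    TransformSource="RSSNormal.xsl"></asp:Xml>
```

Pointing "TransformSource" at a different style sheet is all it takes to change the presentation.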
Let's improve our Web Form by adding a drop-down list that allows the user to switch the visualization of the page dynamically. There will be two options in this drop-down list control, and selecting one of them switches the XSLT applied to the input XML document and displays the resulting content. One option will show a simple view of the news (just the headlines) and the other will show more detailed information about each news item. We get this behavior by switching the XSLT style sheet and reprocessing the page.
From the ToolBox, drag a DropDownList Web Control onto the Form and name it "DropDownView", for example. Try to put it above the Xml Control. If you have problems positioning the controls the way you like, try changing the DOCUMENT "pageLayout" property from GridLayout to FlowLayout. Your Web Form will now look like Figure 8 when viewed in design mode.
Set the "AutoPostBack" property of the Drop Down control to "True", in order to resubmit the form as soon as the user changes its value.
Double-click a blank area of the Web Form, or right-click it and choose "View Code" from the context menu (F7 is the fastest way of doing the same thing). Paste the following code into the Page_Load method:
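A minimal sketch of what Page_Load needs to do here is to fill the drop-down list with its two options. The class name WebForm1 and the option captions are assumptions; "DropDownView" is the control name chosen above.

```csharp
using System;
using System.Web.UI;
using System.Web.UI.WebControls;

public class WebForm1 : Page   // class name is an assumption
{
    // Declared by the designer when the control is dropped on the form.
    protected DropDownList DropDownView;

    private void Page_Load(object sender, EventArgs e)
    {
        // Add the options only on the first request; on postbacks
        // the list is restored from view state.
        if (!IsPostBack)
        {
            DropDownView.Items.Add("Normal");    // index 0: headlines only
            DropDownView.Items.Add("Detailed");  // index 1: detailed view
        }
    }
}
```

Checking IsPostBack keeps the options from being added again every time "AutoPostBack" resubmits the form.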
The Page_Load method of our Web Form class is associated with its Load event. Normally we do not see this association happening, but it is right under our nose. If you are curious to see how it works, expand the section of your source code labeled "Web Form Designer generated code". There you will see the code responsible for associating the events with the corresponding methods. This code is located in the "InitializeComponent" method and is automatically generated by the Web Form Designer. It is not a good idea to change this code, since it will be overwritten the next time you change some event configuration through the designer.
Now that we have a drop-down control with two options, we should make it react whenever the user changes the selection. Let's do this by handling the "SelectedIndexChanged" event of the DropDownList control. Double-click the control and VS.NET will show you the skeleton of the "DropDownView_SelectedIndexChanged" method. Put the following content into this method in order to change the XSLT style sheet depending on the option chosen by the user in the drop-down list:
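One way to write this handler is sketched below. The DropDownList control raises the SelectedIndexChanged event, so the handler is named accordingly. The sketch assumes the Xml control kept its default ID, Xml1, that the class is called WebForm1, and that the first option in the list is the simple view and the second the detailed one.

```csharp
using System;
using System.Web.UI;
using System.Web.UI.WebControls;

public class WebForm1 : Page   // class name is an assumption
{
    protected DropDownList DropDownView;
    // Assumed default ID of the Xml control.
    protected System.Web.UI.WebControls.Xml Xml1;

    private void DropDownView_SelectedIndexChanged(object sender, EventArgs e)
    {
        // Index 1 is assumed to be the detailed option; anything else
        // falls back to the headlines-only style sheet.
        if (DropDownView.SelectedIndex == 1)
        {
            Xml1.TransformSource = "RSSDetailed.xsl";
        }
        else
        {
            Xml1.TransformSource = "RSSNormal.xsl";
        }
    }
}
```

Because the event is handled before the page is rendered, the Xml control picks up the new "TransformSource" value on the same request.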
Now we need to create the RSSDetailed.xsl file. This transformation will show more detailed information about the RSS news feed and will format the output differently. The listing below shows the code for the RSSDetailed.xsl style sheet:
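As one possible version of such a style sheet (a hypothetical sketch, not necessarily the markup that produced Figure 9), the following shows each item's title as a link followed by its description, and also counts the items, something pure CSS could not do:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" />
  <xsl:template match="/">
    <html>
      <body>
        <p>Items in this feed: <xsl:value-of select="count(/rss/channel/item)" /></p>
        <xsl:apply-templates select="/rss/channel/item" />
      </body>
    </html>
  </xsl:template>
  <xsl:template match="/rss/channel/item">
    <hr />
    <h2>
      <!-- normalize-space() trims line breaks around the URL in <link> -->
      <a href="{normalize-space(link)}"><xsl:value-of select="title" /></a>
    </h2>
    <p><xsl:value-of select="description" /></p>
  </xsl:template>
</xsl:stylesheet>
```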
Just copy this code into a file named “RSSDetailed.xsl” and put it in the same folder as the RSS.xml and RSSNormal.xsl. After that, you can run the program again and see how the style sheets can be changed during the execution of the Web Form.
Figure 9: Web Form in execution with a detailed visualization
Conclusion
Displaying the contents of XML files is a task we must be prepared to do as developers. As we saw, there are a number of alternatives, each with its advantages and drawbacks. XSLT is certainly an excellent choice, given its declarative nature and the fact that it separates the presentation logic from the data. Knowing the choices you have and the tools you can use, you will be able to make the most appropriate decision for your specific needs.