
Presenting XML Content in .NET

Fabio Vazquez, Microsoft
Fabio Vazquez (Sao Paulo, Brazil) is a Software Development Engineer for the Microsoft Dynamics Ax Globalization Team at Microsoft. He is a former Microsoft Most Valuable Professional (MVP) in C# and VFP and a frequent speaker at software development conferences in Brazil, including the Microsoft PDC (Professional Developers Conference) and Tech-Ed. Fabio has two books published and has a great passion for agile development practices.

Introduction

Information has long been available from many different sources, and the Internet has played a singular role in making it accessible anywhere, anytime. The inevitable side effect of all this availability is that data becomes disorganized: it is stored in all kinds of formats and has to be accessed through many different protocols. Even in controlled environments such as a company LAN, data is likely to be scattered, with no standard way to access it. However, an increasing amount of information is now available in XML format, whether through queries against an RDBMS such as SQL Server or Oracle or through the consumption of XML Web Services. In the early days of XML, its use was mostly restricted to data exchange between B2B solutions, and applications such as BizTalk Server started to emerge. These solutions fit best in middleware or even database layers and stayed far from the user-interface services. Recently, however, XML coming from Web Services or business components has been arriving at the presentation layers of our applications. As more and more XML arrives there, we developers have to provide reliable and productive means of showing it to our users.

In this article we are going to discuss some of the techniques and tools we can use to present the contents of XML documents to end users. As an example of an XML vocabulary used in the real world, we are going to look at RSS. RSS has been gaining popularity lately, especially because it is the basis for Web Logs. Web Logs consist of news and other content published periodically by individuals, and they are becoming more and more common on the web today.

XML in the Real World

As XML has evolved, different uses have been developed for it. Some vocabularies have emerged with great success, such as SVG (Scalable Vector Graphics - http://www.w3.org/TR/SVG11/), MathML (Mathematical Markup Language - http://www.w3.org/TR/REC-MathML/), VoiceXML (Voice Extensible Markup Language - http://www.w3.org/TR/voicexml20/) and others. Some of these vocabularies are “official”, like the ones cited above, because they are backed by a standards body such as the W3C. Others emerge from independent initiatives and become very important. One of the vocabularies that came from outside the W3C, and the one used in this article, is RSS. RSS is an XML vocabulary for describing news and related content from online sources. RSS enables the display of "channels". Channels are composed of items retrievable through URLs. RSS files are updated periodically, and each item has a title, a link and a brief description. Usually these items are news headlines, but lately RSS has been used to describe entries in a personal log (also known as a Web Log). People are starting to put their personal logs on the Internet and to use RSS to store them. Companies are also publishing their news this way.

Since RSS documents have contents that usually will be read by people, normally the web sites that publish news in RSS format also have an HTML version of them. Some specialized sites are available that aggregate news from various sources and organize them in categories. http://Moreover.com, http://newsisfree.com, www.Syndi8.com and http://MyRSS.com, for example, have a great catalog of news entries available.

Where Does RSS Come From?

RSS was first created by Netscape in 1999. Later, other companies adopted RSS and created new versions of the vocabulary. Nowadays there are a number of different versions around, but two of them are the most widely used: version 0.91 and version 2.0. Version 0.91 is simpler than the other versions, so for the sake of simplicity we are going to use RSS documents written in version 0.91 in this article. The following listing shows the RSS file we are going to use in some of the examples. Save this content in a file named RSS.xml and keep it for the tests described later in this article.

<?xml version="1.0" ?>
<?xml-stylesheet href="RSSStyle.css" type="text/css" ?>
<rss version="0.91">
   <channel>
      <title>Example of RSS Feed</title>
      <link>http://www.ilabore.com.br</link>
      <description>News for XML, .NET and Visual FoxPro developers</description>
      <language>en-us</language>
      <copyright>Copyright © 2002-2003, Fabio Vazquez</copyright>
      <managingEditor>fabio@ilabore.com.br</managingEditor>
      <webMaster>fabio@ilabore.com.br</webMaster>
      <item>
         <title>Visual FoxPro 8.0 Released</title>
         <link>http://msdn.microsoft.com/vfoxpro</link>
         <description>
         Microsoft has released the new version of Visual FoxPro, VFP 8.0, to MSDN subscribers.
         The version for non-subscribers should be available by mid-March 2003.
         </description>
      </item>
      <item>
         <title>Happy Birthday XML!</title>
         <link>http://www.w3.org/2003/02/xml-at-5.html</link>
         <description>
         The Extensible Markup Language completes five years of existence as a W3C recommendation
         on 10 February 2003.
         </description>
      </item>
      <item>
         <title>Understanding SOAP</title>
         <link>http://msdn.microsoft.com/webservices/understanding/webservicebasics/default.aspx?pull=/library/en-us//dnsoap/html/understandsoap.asp</link>
         <description>
            Aaron Skonnard, the famous XML guru, talks about the SOAP protocol 
            from a pure XML perspective.
         </description>
      </item>
   </channel>
</rss>

If you are curious about RSS, you can give it a try by accessing news from famous sources like CNET, Mono Project, MTV and others. See some links below:

Personal Web logs can also be found. The following links point to some technology personalities' RSS files:

And finally, after reading and studying so much, we deserve some moments of fun. The following are links to RSS documents about miscellaneous topics:

As lots of UTMag readers are Visual FoxPro developers, here are some Web Logs from distinguished personalities of the Visual FoxPro community (actually, the only ones at the time of this writing, as far as I know):

Please, Aggregate this Mess

With so many sources of interesting information available, we need a software tool capable of organizing all of it for us. This kind of software does exist; such tools are called News Aggregators. News aggregators work much like News Readers, but they can also read and manage RSS sources. If you are interested in subscribing to some cool Web Logs you have found, don’t forget to evaluate these tools. Some very interesting ones are NewzCrawler and NewsGator. I usually use Syndirella, mostly because it is an open-source project written in C# and serves as good study material as well.

If you are interested in more information about RSS, refer to Mark Pilgrim’s excellent article at XML.com.

This Tree is Not Very Nice

XML documents are plain text files and, as such, they don't look good when opened in a text editor. Browsers like Internet Explorer, on the other hand, are able to show XML files in a more pleasant manner, but the result is still very unfriendly for end users (and sometimes for developers too).

Figure 1: our sample XML file opened with Internet Explorer 6.0

We need a better way to show this XML content. One of the alternatives we have is to use CSS (Cascading Style Sheets). CSS is a W3C recommendation which allows the designer to redefine the behavior and appearance of HTML and XML elements. When used with HTML, CSS can change colors, borders, fonts, spacing and almost all the visual characteristics of the elements. With XML, we don't have the pre-defined fixed tags we have in HTML; we create our own tags (elements), and CSS allows us to define the complete appearance of each element in the document.

In our XML example, we can define that the <title> element, child of the <item> element, will look like a block with a bold, medium-sized font in white on a navy background; and the <description> element, also a child of the <item> element, will appear in a smaller font with an indentation of 20 points. In order to do this, we have to write a CSS file with these styles defined. The CSS will have redefinitions for each element we want to change:

rss, channel, title, copyright, description
{
   display:block;
   font-family:Arial;
}
rss 
{
   background-color:#CCCCCC;
}
item 
{
   background-color:White;
   display:block;
   font-family:Arial;
   border:solid 1pt black;
   padding: 5pt 5pt 5pt 5pt;
   margin: 5pt 5pt 5pt 5pt;
}
language, managingEditor, webMaster, link
{
   display:none;
}
channel title
{
   font-size:large;
   color:Black;
}
item title 
{
   background-color:Navy;
   font-size:medium;
   font-weight:bold;
   color:white;
   margin-bottom:8pt;
}
item link
{
   font-size:x-small;
   font-weight:normal;
   color:blue;
   margin-left:20pt;
}
item description 
{
   text-justify:distribute;
   margin-left:20pt;
   font-size:smaller;
}
All we need to do now is to associate the XML file with the CSS. We can do this by using the XML Processing Instruction <?xml-stylesheet ... ?>. The content of a Processing Instruction is not defined in advance by the XML recommendation. Processing Instructions exist precisely to tell the XML application (the web browser, for example) how to behave in certain situations. They are a kind of “customizer”, so to speak.

Below are the relevant parts of our sample XML changed to include the <?xml-stylesheet .. ?> processing instruction:

<?xml version="1.0" ?>
<?xml-stylesheet href="RSSStyle.css" type="text/css" ?>
<rss version="0.91">
  <!-- Rest of the document omitted for clarity reasons. -->
</rss>

<?xml-stylesheet ?> has two important attributes: “href”, which defines the location of the CSS file and “type”, which defines the MIME type of the style sheet (text/css in this case). This means you can create different presentations for an ordinary XML file and have it shown by simply changing the “href” attribute of the <?xml-stylesheet ?> processing instruction.

The following figure shows the document as it is presented in Internet Explorer after the <?xml-stylesheet ?> processing instruction has been applied.

Figure 2: XML document with a CSS style sheet associated

Notice that the old tree aspect was replaced by a more friendly one. Each element redefined by the CSS file had its appearance changed and the browser rendered it appropriately.

Although we now have something much more powerful, using CSS to style XML files still has lots of limitations. We can't, for example, change the order in which the elements appear, because it depends on the physical order they have in the original XML file. We can't sort elements or calculate values derived from the ones within the document (we couldn’t, for example, count the number of <item> elements and put this information someplace in the output). Pure CSS doesn't allow us to embed new elements or attributes in order to get new formatting options, either. So we need something more flexible and robust to support our needs when presenting the content of XML files. Next, we will see some better options for presenting our XML content; but before that, let's briefly talk about an important technology that will be present in a great part of the work we do with XML: XPath.

XPath

XPath (XML Path Language) describes a syntax that allows us to access data within XML documents. The different parts of any XML document are called nodes. XML nodes can be elements, attributes, Processing Instructions, CDATA sections, text, etc. XPath enables the selection and filtering of the various nodes within a document, much like SQL allows this kind of operation over relational tables. The XPath language uses a very familiar syntax which resembles the way we address files in a file system. In order to reference a file in a Windows-based file system we use the backslash character (\) to separate the folder and file names. XPath also uses a character to separate the element and attribute names in an XML hierarchy, and this is the slash character (/). So, we could use the following valid XPath expression to access the element named "channel" which is a child of the "rss" element:

/rss/channel

Notice the use of the slash character to separate the element names. Notice also that another slash character was used in the beginning of the expression. This means that this XPath expression defines an absolute path, beginning at the root node. An XPath expression can also be relative to the node currently in context (the context node). This kind of expression does not use an absolute path, but a relative one. We can easily identify a relative path because it does not start with a slash character.

Actually, there are two syntaxes for XPath expressions: the abbreviated syntax (the easiest one) and the unabbreviated syntax (the not so easy one). In this article we are going to use only the abbreviated syntax. If you want more information about XPath syntaxes, please refer to the W3C Web site at http://www.w3.org/TR/xpath.

The following XPath expression addresses the attribute named “version”, which is part of the element “rss”, which is the document root element.

/rss/@version

The "@" character in XPath expressions indicates that the name that follows it is an attribute. All attributes used in XPath expressions that use the abbreviated syntax must be prefixed with an “@” character. This allows the XPath processor to distinguish which node is an element and which is an attribute.
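
To make the expressions above concrete, here is a minimal C# sketch that evaluates them with the XmlDocument class from System.Xml. The class name XPathSketch, its method names and the cut-down inline RSS string are assumptions made for illustration; they are not part of the article's sample project.

```csharp
using System;
using System.Xml;

public static class XPathSketch
{
    // A cut-down version of the article's RSS.xml, inlined so the sketch is self-contained.
    private const string Rss =
        "<rss version=\"0.91\"><channel><title>Example of RSS Feed</title>" +
        "<item><title>Happy Birthday XML!</title></item></channel></rss>";

    // Evaluates an absolute XPath expression and returns the text of the
    // first node it selects, or null when nothing matches.
    public static string Select(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Rss);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }

    // Shows a relative path: "title" has no leading slash, so it is
    // evaluated against the <channel> context node, not the document root.
    public static string ChannelTitle()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Rss);
        XmlNode channel = doc.SelectSingleNode("/rss/channel");
        return channel.SelectSingleNode("title").InnerText;
    }
}
```

Calling XPathSketch.Select("/rss/@version") selects the attribute node, while XPathSketch.ChannelTitle() picks the title that is a direct child of the channel and not the one inside the item.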

Our Programming Language Can Help Us

One of the choices we have to overcome the limitations of CSS style sheets is to employ an XML parser to read the contents of the XML file and use our programming language of choice to format the output the way that best fits our needs. Using an XML parser we can extract exactly the elements and attributes we want and merge them with our output text. This is much the same as what we do when we retrieve data from relational data sources through ADO.NET, for example. Consider the following code, which uses the XmlTextReader class from the .NET Framework to open our sample XML file and extract some data from it:

private void Page_Load(object sender, System.EventArgs e)
{
    // To hold the resulting HTML content
    System.Text.StringBuilder sb = new System.Text.StringBuilder("");
	
    // Defines an XmlTextReader object to read the RSS feed from the local rss.xml file
    XmlTextReader reader = new XmlTextReader(Server.MapPath("rss.xml"));
    reader.WhitespaceHandling = WhitespaceHandling.None;

    // Now iterates through the XML and composes the literal HTML output
    reader.MoveToContent();
    while ( reader.Read() )
    {
        switch (reader.LocalName)
        {
            // <channel> element
            case "channel":
            {
                while(reader.Read())
                {
                    if (reader.LocalName == "title" && reader.NodeType == XmlNodeType.Element)
                    {
                        reader.Read();
                        sb.AppendFormat("<h1>{0}</h1>", reader.Value);
                    }
                    if (reader.LocalName == "link" && reader.NodeType == XmlNodeType.Element)
                    {
                        reader.Read();
                        sb.AppendFormat("<a href='{0}'>{0}</a>", reader.Value);
                        break;
                    }
                }
                break;
            }

            // <item> element
            case "item":
            {
                while(reader.Read())
                {
                    if (reader.LocalName == "title" && reader.NodeType == XmlNodeType.Element)
                    {
                        reader.Read();
                        sb.Append("<hr>");
                        sb.AppendFormat("<h2>{0}</h2>", reader.Value);
                    }
                    if (reader.LocalName == "description" && reader.NodeType == XmlNodeType.Element)
                    {
                        reader.Read();
                        sb.AppendFormat("<p>{0}</p>", reader.Value);
                    }
                }
                break;
            }
        }
    }

    Response.Write(sb.ToString());

    // Closes the reader so the underlying file is released
    reader.Close();
}

The XmlTextReader class is part of the System.Xml namespace. This class can be used to read the different nodes of an XML document and process them as desired. XmlTextReader is different from the DOM model implemented by the XmlDocument class because it doesn't load the whole document into memory. Documents processed through the XmlDocument class put the entire XML tree (all the nodes and their hierarchy) into memory, and this can become slow in scenarios where large XML documents are used. The XmlTextReader class implements a “pull” model, where the nodes are traversed and discarded from memory as they are read.

So, this class provides a read-only, forward-only way of reading XML documents. This can be very efficient in some situations because you don't have all the overhead associated with loading the entire document into memory as we would have with the XmlDocument class. XmlDocument, on the other hand, provides some capabilities we don't have with XmlTextReader, as we will see later in this article.

The previous example shows that by using the XmlTextReader class we gain more flexibility over pure CSS; but the code is still very verbose, procedural and time-consuming to write. There are better ways of doing the same kind of thing. See the code below for an example that uses the XmlDocument class to load the document into memory and to select the necessary nodes:

private void Page_Load(object sender, System.EventArgs e)
{
    // Defines an XmlTextReader object to read the RSS feed from the local rss.xml file
    XmlTextReader reader = new XmlTextReader(Server.MapPath("rss.xml"));
    reader.WhitespaceHandling = WhitespaceHandling.None;

    // Loads the feed into an XmlDocument and selects the nodes we want with XPath
    XmlDocument doc = new XmlDocument();
    XmlNodeList nodes;
    doc.Load(reader);
    nodes = doc.SelectNodes("//item");

    foreach(XmlNode node in nodes) 
    {	
        Response.Write(node.SelectSingleNode("title").InnerText + "<br>");
    }
}

The previous example uses the XmlDocument class of the System.Xml namespace. This class is an implementation of the W3C DOM Level 2 recommendation. It is not much different from the Win32 MSXML library you might be familiar with if you have worked with XML in languages such as Visual FoxPro or Visual Basic.

Using the XmlDocument class, we can select the nodes we want using XPath expressions in the SelectNodes() method. This method receives an XPath expression as a parameter and returns the result of evaluating it as an XmlNodeList object. As the name implies, an XmlNodeList represents a collection of XmlNode objects, each one representing an XML node returned by the XPath expression. Our example selects the nodes that are to be used to format the output and iterates through them in order to get their values and present them to the user. Notice that we can use a foreach construct to iterate through the nodes of an XmlNodeList object.
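
As a small illustration of what XPath adds over pure CSS, the following sketch counts the item elements and collects their titles. It is only a sketch under stated assumptions: the ItemSummary class and its method names are hypothetical, and the feed is passed in as a string instead of being loaded through Server.MapPath as in the listing above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class ItemSummary
{
    // Counts the <item> elements with the XPath count() function.
    public static int CountItems(string rssXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(rssXml);
        XPathNavigator navigator = doc.CreateNavigator();
        // Evaluate() returns a double for XPath expressions that yield a number.
        double count = (double)navigator.Evaluate("count(/rss/channel/item)");
        return Convert.ToInt32(count);
    }

    // Collects the text of every item title, in document order.
    public static List<string> Titles(string rssXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(rssXml);
        List<string> titles = new List<string>();
        foreach (XmlNode title in doc.SelectNodes("/rss/channel/item/title"))
        {
            titles.Add(title.InnerText);
        }
        return titles;
    }
}
```

The count could then be written to the page next to the titles, which is exactly the kind of derived value a CSS style sheet cannot produce.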

Both examples above are functional but still suffer from a very undesirable problem: they don't separate presentation from data and processing. All the code is mixed together, and this hurts the maintainability and independence of the various parts involved in the process.

Next, we are going to see a better way to process and present XML files. The technology responsible for that is called XSLT (Extensible Stylesheet Language Transformations). In the next section we will see how XSLT transformations work and how they will be useful to take our input XML file and transform it into an HTML output document.

XSLT Comes to The Rescue

To overcome all the limitations we have pointed out so far, we can use XSLT. XSLT is a W3C recommendation (http://www.w3.org/TR/xslt), and version 1.0 of it is completely implemented in the .NET Framework. We are going to touch on some concepts of XSLT, but a complete overview of it deserves an entire book, and such books already exist.

What is XSLT

XSLT is a technology which allows an XML document to be transformed into another document.

These resultant documents are primarily text-based formats like HTML, WML, RTF or any other text file. Nowadays, XSLT has its major use in the generation of HTML files from XML documents, and this is the kind of transformations we are going to discuss in the next sections.

An XSLT transformation works with two files: an arbitrary well-formed XML document and a style sheet document. The first is called the input file and the second the transformation file. An XSLT processor reads the input file and uses the transformation file in order to generate a third document called the output file. The output file could be an XML document as well, but it could be any type of text file, as we commented above. The figure below illustrates how the XSLT process occurs:

Figure 3: XSLT process
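
The process in the figure can be sketched in C# as follows. This is a minimal sketch, not the article's own code: it assumes the XslCompiledTransform class from System.Xml.Xsl (available from .NET Framework 2.0 onward; version 1.x offered the similar XslTransform class), and the XsltRunner class and its Transform method are hypothetical names. Both documents are passed as strings to keep the example self-contained.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltRunner
{
    // Applies the transformation file (stylesheetXml) to the input file
    // (inputXml) and returns the output document as a string.
    public static string Transform(string inputXml, string stylesheetXml)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();

        // Loads and compiles the style sheet.
        using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(styleReader);
        }

        // Runs the processor over the input document; no parameters are passed.
        using (XmlReader inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (StringWriter output = new StringWriter())
        {
            xslt.Transform(inputReader, null, output);
            return output.ToString();
        }
    }
}
```

In a Web page, the StringWriter could be replaced by Response.Output so that the result goes straight to the browser.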

XSLT Templates

A transformation done by an XSLT processor can be thought of as one or more operations over the structure and data of the input document. These operations can consist of grouping, filtering, ordering, arithmetic and string operations, and others. As you may know, all these kinds of operations can also be done by a program like the ones shown earlier, written in an API-oriented manner. XSLT, on the other hand, is a declarative language that defines rules to be applied to certain parts of the input document. These rules are put in templates. The templates are called one or more times during the processing of the input document, and they are responsible for generating the resulting output.

It is common to design XSLT templates that are applied to a certain node or group of nodes of the input XML document. XPath is the key to defining which node or group of nodes a given XSLT template processes. The general structure of an XSLT template that is applied to the "/rss/channel/item" nodes of an RSS document is shown here:

<template match="/rss/channel/item">
   <!-- Template rules go here -->
</template>

The XSLT template element has an important attribute, "match", which specifies the XPath expression that identifies the nodes we want to process. The template will be executed for each node matched by the XPath expression in the "match" attribute. In the example above, the template will be applied to all "item" elements which are children of the "channel" element which, in turn, is a child of the "rss" element.

The template shown earlier looks like this when put in a complete XSLT stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <xsl:template match="/rss/channel/item">
     <!-- Template rules go here -->
   </xsl:template>

</xsl:stylesheet>

Notice that the elements used in the previous example are prefixed by "xsl:". This is just a prefix bound to the XML namespace http://www.w3.org/1999/XSL/Transform (if you are not familiar with XML Namespaces, please refer to the W3C XML Namespaces recommendation). This is the namespace the XSLT elements and attributes belong to, and it is declared on the root element of the XSLT document, which is the <xsl:stylesheet> element. All XSLT elements inside it must be in the "http://www.w3.org/1999/XSL/Transform" namespace. This namespace is defined by the W3C, and XSLT processors recognize it. Remember that XML namespaces are just URIs (Uniform Resource Identifiers) and do not need to exist as Internet resources. Even though namespace names usually look like URLs, you do not need an Internet connection in order to use XSLT: they are just names, and no connection is made during processing.

But something is still missing in this document: every XSLT stylesheet has implicit templates, known as the built-in (or default) template rules. By default they walk the whole document from the root node down and copy to the output the text of every node that no explicit template handles. This is not desirable in our example because we want to display only the news titles and their descriptions.
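
The effect of the implicit templates can be observed with a short experiment. The sketch below is an illustration under assumptions: it uses the XslCompiledTransform class (.NET Framework 2.0 or later), a cut-down inline feed, a hypothetical BuiltInRulesDemo class, and an added <xsl:output method="text"/> element so that the result is plain text. The stylesheet contains only the still-empty template for the "item" elements.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class BuiltInRulesDemo
{
    // Cut-down feed: one channel title and one item.
    private const string Rss =
        "<rss version=\"0.91\"><channel><title>Example of RSS Feed</title>" +
        "<item><title>Happy Birthday XML!</title></item></channel></rss>";

    // Only the (empty) item template is defined; there is no template for "/".
    private const string Stylesheet =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">" +
        "<xsl:output method=\"text\"/>" +
        "<xsl:template match=\"/rss/channel/item\"><!-- no rules yet --></xsl:template>" +
        "</xsl:stylesheet>";

    // Runs the transformation and returns whatever the processor emitted.
    public static string Run()
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }
        using (XmlReader input = XmlReader.Create(new StringReader(Rss)))
        using (StringWriter output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

The channel title shows up in the result even though no template mentions it, while the item title does not, because the empty "item" template takes over for that part of the tree and emits nothing.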

To avoid this behavior, we must define a template that matches the root node and explicitly applies the template for the "item" elements. Another version of our XSLT style sheet is shown here:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <xsl:template match="/">
      <xsl:apply-templates select="/rss/channel/item" />
   </xsl:template>

   <xsl:template match="/rss/channel/item">
     <!-- Template rules go here -->
   </xsl:template>

</xsl:stylesheet>

In this example we can see that the template for the root node ("/") has a child element named "apply-templates". This element selects the nodes returned by the XPath expression in its "select" attribute and, for each of them, runs the template that matches it. In our case, the template that has the "match" attribute of value "/rss/channel/item" will be called for each "item" element of the input XML document.

But wait! We have no rules in the template defined for the "item" elements. Actually, we postponed this until now. So, let's put something useful into our template!

Suppose we want to create an HTML document as the output of our XSLT processing; now suppose that all we want to show are the titles of the news, each one on a separate paragraph. All we need to do is to include the appropriate HTML code into the