By now, I hope you're convinced that you can use XSLT to convert big piles of XML data into other useful things. Our examples to this point have pretty much gone through the XML source in what's referred to as document order. We'd like to go through our XML documents in a couple of other common ways, though:
We could sort some or all of the XML elements, then generate output based on the sorted elements.
We could group the data, selecting all elements that have some property in common, then sort the groups of elements.
We'll give several examples of these operations in this chapter.
The simplest way to rearrange our XML elements is to use the <xsl:sort> element. This element temporarily rearranges a collection of elements based on criteria we define in our stylesheet.
For our first example, we'll have a set of U.S. postal addresses that we want to sort. (No chauvinism is intended here; obviously every country has different conventions for mailing addresses. We just needed a short sample document that can be sorted in many useful ways.) Here's our original document:
<?xml version="1.0"?>
<addressbook>
  <address>
    <name>
      <title>Mr.</title>
      <first-name>Chester Hasbrouck</first-name>
      <last-name>Frisby</last-name>
    </name>
    <street>1234 Main Street</street>
    <city>Sheboygan</city>
    <state>WI</state>
    <zip>48392</zip>
  </address>
  <address>
    <name>
      <first-name>Mary</first-name>
      <last-name>Backstayge</last-name>
    </name>
    <street>283 First Avenue</street>
    <city>Skunk Haven</city>
    <state>MA</state>
    <zip>02718</zip>
  </address>
  <address>
    <name>
      <title>Ms.</title>
      <first-name>Natalie</first-name>
      <last-name>Attired</last-name>
    </name>
    <street>707 Breitling Way</street>
    <city>Winter Harbor</city>
    <state>ME</state>
    <zip>00218</zip>
  </address>
  <address>
    <name>
      <first-name>Harry</first-name>
      <last-name>Backstayge</last-name>
    </name>
    <street>283 First Avenue</street>
    <city>Skunk Haven</city>
    <state>MA</state>
    <zip>02718</zip>
  </address>
  <address>
    <name>
      <first-name>Mary</first-name>
      <last-name>McGoon</last-name>
    </name>
    <street>103 Bryant Street</street>
    <city>Boylston</city>
    <state>VA</state>
    <zip>27318</zip>
  </address>
  <address>
    <name>
      <title>Ms.</title>
      <first-name>Amanda</first-name>
      <last-name>Reckonwith</last-name>
    </name>
    <street>930-A Chestnut Street</street>
    <city>Lynn</city>
    <state>MA</state>
    <zip>02930</zip>
  </address>
</addressbook>
We'd like to generate a list of these addresses, sorted by <last-name>. We'll use the magical <xsl:sort> element to do the work. Our stylesheet looks like this:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" indent="no"/>

  <xsl:strip-space elements="*"/>

  <!-- The text node below contains a single line break -->
  <xsl:variable name="newline">
    <xsl:text>
</xsl:text>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:for-each select="addressbook/address">
      <xsl:sort select="name/last-name"/>
      <xsl:value-of select="name/title"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="name/first-name"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="name/last-name"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="street"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="city"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="state"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="zip"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="$newline"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
The heart of our stylesheet is the pair of <xsl:for-each> and <xsl:sort> elements. The <xsl:for-each> element selects the items with which we'll work, and the <xsl:sort> element rearranges them before we write them out.
Notice that we're generating a text file (<xsl:output method="text"/>). (You could generate an HTML file or something more complicated if you want.) To invoke the stylesheet engine, we run this command:
java org.apache.xalan.xslt.Process -in names.xml -xsl namesorter1.xsl -out names.text
Here are the results we get from our first attempt at sorting:
Ms. Natalie Attired
707 Breitling Way
Winter Harbor, ME 00218

 Mary Backstayge
283 First Avenue
Skunk Haven, MA 02718

 Harry Backstayge
283 First Avenue
Skunk Haven, MA 02718

Mr. Chester Hasbrouck Frisby
1234 Main Street
Sheboygan, WI 48392

 Mary McGoon
103 Bryant Street
Boylston, VA 27318

Ms. Amanda Reckonwith
930-A Chestnut Street
Lynn, MA 02930
As you can see from the output, the addresses in our original document were sorted by last name. All we had to do was add <xsl:sort> to our stylesheet, and all the elements were magically reordered. If you aren't convinced that XSLT can increase your productivity as a programmer, try writing the Java code and DOM method calls to do the same thing.
We can do a couple of things to improve our original stylesheet, however. For one thing, there's an annoying blank space at the start of every name that doesn't have a <title> element. A more significant improvement is that we'd like to sort addresses by <first-name> within <last-name>. In our last example, Mary Backstayge should appear after Harry Backstayge. Here's how we can modify our stylesheet to use more than one sort key:
<xsl:template match="/">
  <xsl:for-each select="addressbook/address">
    <xsl:sort select="name/last-name"/>
    <xsl:sort select="name/first-name"/>
    ...
We've simply added a second <xsl:sort> element to our stylesheet. This element does what we want; it sorts the <address> elements by <first-name> within <last-name>. To be thoroughly obsessive about our output, we can use an <xsl:if> element to get rid of that annoying blank space in front of names with no <title> element:
<xsl:if test="name/title">
  <xsl:value-of select="name/title"/>
  <xsl:text> </xsl:text>
</xsl:if>
Now our output is perfect:
Ms. Natalie Attired
707 Breitling Way
Winter Harbor, ME 00218

Harry Backstayge
283 First Avenue
Skunk Haven, MA 02718

Mary Backstayge
283 First Avenue
Skunk Haven, MA 02718

Mr. Chester Hasbrouck Frisby
1234 Main Street
Sheboygan, WI 48392

Mary McGoon
103 Bryant Street
Boylston, VA 27318

Ms. Amanda Reckonwith
930-A Chestnut Street
Lynn, MA 02930
Now that we've seen a couple of examples of how <xsl:sort> works, we'll go over its syntax, its attributes, and where you can use it.
Why does the syntax look like this? I'm so glad you asked that question. One thing the XSLT working group could have done is something like this:
<xsl:for-each select="addressbook/address" sort-key-1="name/last-name" sort-key-2="name/first-name"/>
The problem with this approach is that no matter how many sort-key-x attributes you define, out of sheer perverseness, someone will cry out that they really need the sort-key-8293 attribute. To avoid this messy problem, the XSLT designers decided to let you specify the sort keys by using a number of <xsl:sort> elements. The first is the primary sort key, the second is the secondary sort key, the 8293rd one is the eight-thousand-two-hundred-and-ninety-third sort key, etc.
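To make the ordering of sort keys concrete, here is a minimal sketch with three <xsl:sort> elements. It assumes the address book document shown earlier; the choice of state as the primary key and the one-line output format are illustrative assumptions, not part of the original example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="addressbook/address">
      <!-- Primary key: state -->
      <xsl:sort select="state"/>
      <!-- Secondary key: last name, consulted only when states are equal -->
      <xsl:sort select="name/last-name"/>
      <!-- Tertiary key: first name, consulted only when the first two keys are equal -->
      <xsl:sort select="name/first-name"/>
      <xsl:value-of select="state"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="name/last-name"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="name/first-name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the sample data, the three MA addresses should come first (Harry Backstayge, Mary Backstayge, then Amanda Reckonwith), followed by the ME, VA, and WI addresses.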
Well, that's why the syntax looks the way it does, but how does it actually work? When I first saw this syntax:
<xsl:for-each select="addressbook/address">
  <xsl:sort select="name/last-name"/>
  <xsl:sort select="name/first-name"/>
  <xsl:apply-templates select="."/>
</xsl:for-each>
I thought it meant that all the nodes were sorted during each iteration through the <xsl:for-each> element. That seemed incredibly inefficient; if you've sorted all the nodes, why re-sort them each time through the <xsl:for-each> element? Actually, the XSLT processor handles all <xsl:sort> elements before it does anything else, then it processes the <xsl:for-each> element as if the <xsl:sort> elements weren't there.
It's less efficient, but if it makes you feel better about the syntax, you could write the stylesheet like this:
<xsl:template match="/">
  <xsl:for-each select="addressbook/address">
    <xsl:sort select="name/last-name"/>
    <xsl:sort select="name/first-name"/>
    <xsl:for-each select="."> <!-- This is slower, but it works -->
      <xsl:apply-templates select="."/>
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>
(Don't actually do this. I'm only trying to make a point.) This stylesheet generates the same results as our earlier stylesheet.
The <xsl:sort> element has several attributes. The data-type attribute, discussed here, can have three kinds of values:
data-type="text"
data-type="number"
A QName that identifies a particular datatype. The stated goal of the XSLT working group is that the datatypes defined in the XML Schema specification will eventually be supported here.
The XSLT specification defines the behavior for data-type="text" and data-type="number". Consider this XML document:
<?xml version="1.0"?>
<numberlist>
  <number>127</number>
  <number>23</number>
  <number>10</number>
</numberlist>
We'll sort these values using the default value (data-type="text"):
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" indent="no"/>

  <xsl:strip-space elements="*"/>

  <!-- The text node below contains a single line break -->
  <xsl:variable name="newline">
    <xsl:text>
</xsl:text>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:for-each select="numberlist/number">
      <xsl:sort select="."/>
      <xsl:value-of select="."/>
      <xsl:value-of select="$newline"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
When we sort these elements using data-type="text", here's what we get:
10
127
23
We get this result because a text-based sort puts anything that starts with a "1" before anything that starts with a "2." If we change the <xsl:sort> element to be <xsl:sort select="." data-type="number"/>, we get these results:
10
23
127
If you use something else here (data-type="floating-point", for example), what the XSLT processor does is anybody's guess. The XSLT specification allows for other values here, but it's up to the XSLT processor to decide how (or if) it wants to process those values. Check your processor's documentation to see if it does anything relevant or useful for values other than data-type="text" or data-type="number".
A final note: if you're using data-type="number", and any of the values aren't numbers, those non-numeric values will sort before the numeric values. That means if you're using order="ascending", the non-numeric values appear first; if you use order="descending", the non-numeric values appear last.
<?xml version="1.0"?>
<numberlist>
  <number>127</number>
  <number>23</number>
  <number>zzz</number>
  <number>10</number>
  <number>yyy</number>
</numberlist>
Given this less-than-perfect data, here are the correctly sorted results (using data-type="number" and the default ascending order):
zzz
yyy
10
23
127
Notice that the non-numeric values were not sorted; they simply appear in the output document in the order in which they were encountered.
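If the non-numeric values are simply noise, one option is to filter them out before sorting. The following is a minimal sketch rather than part of the original example. It assumes the <numberlist> document shown above, relies on the XPath rule that NaN is never equal to itself, and adds order="descending" so the largest values come first.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- number(.) = number(.) is false only when number(.) is NaN,
         so the predicate keeps just the values that convert to numbers -->
    <xsl:for-each select="numberlist/number[number(.) = number(.)]">
      <!-- Sort numerically, largest value first -->
      <xsl:sort select="." data-type="number" order="descending"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the document above, this should write 127, 23, and 10, one per line, and skip zzz and yyy entirely.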
The <xsl:sort> element can appear inside two elements:
<xsl:apply-templates>
<xsl:for-each>
If you use an <xsl:sort> element inside <xsl:for-each>, the <xsl:sort> element(s) must appear first. If you tried something like this, you'd get an exception from the XSLT processor:
<xsl:for-each select="addressbook/address">
  <xsl:sort select="name/last-name"/>
  <xsl:value-of select="name/title"/>
  <xsl:sort select="name/first-name"/> <!-- NOT LEGAL! -->
  ...
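The examples so far put <xsl:sort> inside <xsl:for-each>. As a sketch of the other legal location, here is the same two-key sort written with <xsl:apply-templates>. It assumes the address book document shown earlier; the separate template for <address> and its one-line output are illustrative choices.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The selected nodes are sorted first, then handed to the
         matching template in sorted order -->
    <xsl:apply-templates select="addressbook/address">
      <xsl:sort select="name/last-name"/>
      <xsl:sort select="name/first-name"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Writes one line per address: first name, space, last name -->
  <xsl:template match="address">
    <xsl:value-of select="name/first-name"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name/last-name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The names should come out in the same order as in the two-key <xsl:for-each> example: Attired, the two Backstayges (Harry before Mary), Frisby, McGoon, and Reckonwith.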
We've pretty much covered the <xsl:sort> element at this point. To add another wrinkle to our example, we'll change the stylesheet so that it selects a subset of the addresses, then sorts that subset. We'll sort only the addresses from states that start with the letter M. As you'd expect, we'll do this magic with an XPath expression that limits the elements to be sorted:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" indent="no"/>

  <xsl:strip-space elements="*"/>

  <!-- The text node below contains a single line break -->
  <xsl:variable name="newline">
    <xsl:text>
</xsl:text>
  </xsl:variable>

  <xsl:template match="/">
    <!-- The predicate keeps only addresses whose state starts with M -->
    <xsl:for-each select="addressbook/address[starts-with(state, 'M')]">
      <xsl:sort select="name/last-name"/>
      <xsl:sort select="name/first-name"/>
      <xsl:if test="name/title">
        <xsl:value-of select="name/title"/>
        <xsl:text> </xsl:text>
      </xsl:if>
      <xsl:value-of select="name/first-name"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="name/last-name"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="street"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="city"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="state"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="zip"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="$newline"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
Here are the results, which include only the addresses from states beginning with the letter M, sorted by first name within last name:
Ms. Natalie Attired
707 Breitling Way
Winter Harbor, ME 00218

Harry Backstayge
283 First Avenue
Skunk Haven, MA 02718

Mary Backstayge
283 First Avenue
Skunk Haven, MA 02718

Ms. Amanda Reckonwith
930-A Chestnut Street
Lynn, MA 02930
Notice that in the <xsl:for-each> element, we used a predicate in our XPath expression so that only addresses containing <state> elements whose contents begin with M are selected. This example starts us on the path to grouping nodes. We could do lots of other things here:
We could generate output that prints all the unique Zip Codes, along with the number of addresses that have those Zip Codes.
For each unique Zip Code (or state, or last name, etc.) we could sort on a field and list all addresses with that Zip Code.
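As a rough preview of the first idea, here is one way the unique Zip Codes and their counts could be produced in XSLT 1.0. This is a sketch under stated assumptions, not necessarily the approach this chapter goes on to use: it applies a common technique built on <xsl:key> and generate-id() (often called Muenchian grouping), the key name by-zip is hypothetical, and the input is the address book document shown earlier.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical key name: indexes every <address> by its <zip> value -->
  <xsl:key name="by-zip" match="address" use="zip"/>

  <xsl:template match="/">
    <!-- Keep only the first <address> for each distinct zip: an address
         passes if it is the first node the key returns for its own zip -->
    <xsl:for-each select="addressbook/address[generate-id() =
                          generate-id(key('by-zip', zip)[1])]">
      <!-- Text sort, so leading zeros in Zip Codes are preserved -->
      <xsl:sort select="zip"/>
      <xsl:value-of select="zip"/>
      <xsl:text>: </xsl:text>
      <!-- Number of addresses that share this zip -->
      <xsl:value-of select="count(key('by-zip', zip))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the sample data, 02718 should be reported with a count of 2 (the two Backstayge addresses), and every other Zip Code with a count of 1.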
Copyright © 2002 O'Reilly & Associates. All rights reserved.