One of the most common uses of XPath is to create location paths. A location path describes the location of something in an XML document. In our examples in the previous chapter, we used location paths on the match and select attributes of various XSLT elements. Those location paths described the parts of the XML document we wanted to work with. Most of the XPath expressions you'll use are location paths, and most of them are pretty simple. Before we dive into the wonders of XPath, we need to discuss the context.
One of the most important concepts in XPath is the context. Everything we do in XPath is interpreted with respect to the context. You can think of an XML document as a hierarchy of directories in a filesystem. In our sonnet example, we could imagine that sonnet is a directory at the root level of the filesystem. The sonnet directory would, in turn, contain directories named auth:author, title, and lines. In this example, the context would be the current directory. If I go to a command line and execute a particular command (such as dir *.js), the results I get vary depending on the current directory. Similarly, the results of evaluating an XPath expression will probably vary based on the context.
Most of the time, we can think of the context as the node in the tree from which any expression is evaluated. To be completely accurate, the context consists of five things:
The context node (the "current directory"). The XPath expression is evaluated from this node.
Two integers, the context position and the context size. These integers are important when we're processing a group of nodes. For example, we could write an XPath expression that selects all of the <li> elements in a given document. The context size refers to the number of <li> items selected by that expression, and the context position refers to the position of the <li> we're currently processing.
A set of variables. This set includes names and values of all variables that are currently in scope.
A set of all the functions available to XPath expressions. Some of these functions are defined by the XPath and XSLT standards themselves; others might be extension functions defined by whoever created the stylesheet. (You'll read more about extension functions in Chapter 8, "Extending XSLT".)
The set of namespace declarations in scope, which maps each namespace prefix used in the expression to a namespace URL.
Having said all that, most of the time you can ignore everything but the context node. To use our command line analogy one more time, if you're at a command line, you have a current directory; you also have (depending on your operating system) a number of environment variables defined. For most commands, you can focus on the current directory and ignore the environment variables.
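It may help to picture the context as a data structure. The following minimal Python sketch does that. The Context class and the contexts_for helper are our own hypothetical names and are not part of any XSLT processor's API. The fields follow the XPath 1.0 definition of the context. The loop shows how the context position and context size change as a processor works through a list of <li> elements.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Context:
    """Hypothetical model of an XPath 1.0 evaluation context."""
    node: ET.Element                                 # the context node
    position: int                                    # context position (1-based)
    size: int                                        # context size
    variables: dict = field(default_factory=dict)    # variable bindings in scope
    functions: dict = field(default_factory=dict)    # function library
    namespaces: dict = field(default_factory=dict)   # prefix -> namespace URL


def contexts_for(nodes):
    """Yield the context a processor would use for each node in a node list."""
    size = len(nodes)
    for position, node in enumerate(nodes, start=1):
        yield Context(node=node, position=position, size=size)


ul = ET.fromstring("<ul><li>a</li><li>b</li><li>c</li></ul>")
for ctx in contexts_for(ul.findall("li")):
    print(ctx.node.text, ctx.position, ctx.size)   # a 1 3, then b 2 3, then c 3 3
```

In XPath itself, the position() and last() functions return the context position and the context size.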
Now that we've talked about what a context is and why it matters, we'll look at some location paths. We'll start with a variety of simple location paths; as we go along, we'll look at more complex location paths that use all the various features of XPath. We already looked at one of the simplest XPath expressions:
<xsl:template match="/">
This template matches the root node of the document. We saw another simple XPath expression in the <xsl:value-of> element:
<xsl:value-of select="."/>
This expression selects the context node, which is represented by a period. To complete our tour of very simple location paths, we can use the double period (..) to select the parent of the context node:
<xsl:value-of select=".."/>
All these XPath expressions have one thing in common: they don't use element names. As you might have noticed in our Hello World example, you can use element names to select elements that have a particular name:
<xsl:apply-templates select="greeting"/>
In this example, we select all of the <greeting> elements in the current context and apply the appropriate template to each of them. Turning to our XML sonnet, we can create location paths that specify more than one level in the document hierarchy:
<xsl:apply-templates select="lines/line"/>
This example selects all <line> elements that are contained in any <lines> elements in the current context. If the current context doesn't have any <lines> elements, then this expression returns an empty node-set. If the current context has plenty of <lines> elements, but none of them contain any <line> elements, this expression also returns an empty node-set.
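You can try the same idea outside XSLT with Python's xml.etree.ElementTree module. It supports a limited subset of XPath, and that subset includes simple relative paths like lines/line. In this sketch the short document is a hypothetical, trimmed-down stand-in for our sonnet. We've added an empty <lines> element to show the second case above. The same relative path gives different results depending on which element it's evaluated from.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the sample sonnet.
sonnet = ET.fromstring("""<sonnet type="Shakespearean">
  <title>Sonnet 130</title>
  <lines>
    <line>My mistress' eyes are nothing like the sun,</line>
    <line>Coral is far more red than her lips red.</line>
  </lines>
  <lines/>
</sonnet>""")
title = sonnet.find("title")

# The same relative path, evaluated from two different context elements.
from_sonnet = sonnet.findall("lines/line")   # both <line> elements
from_title = title.findall("lines/line")     # empty: <title> has no <lines> child

# A <lines> element with no <line> children contributes nothing.
empty_lines = sonnet.findall("lines")[1].findall("line")
```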
The XPath specification talks about two kinds of XPath expressions, relative and absolute. Our previous example is a relative XPath expression because the nodes it specifies depend on the current context. An absolute XPath expression begins with a slash (/), which tells the XSLT processor to start at the root of the document, regardless of the current context. In other words, you can evaluate an absolute XPath expression from any context node you want, and the results will be the same. Here's an absolute XPath expression:
<xsl:apply-templates select="/sonnet/lines/line"/>
The good thing about an absolute expression is that you don't have to worry about the context node. Another benefit is that it makes it easy for the XSLT processor to find all nodes that match this expression: what we've said in this expression is that there must be a <sonnet> element at the root of the document, that element must contain at least one <lines> element, and that at least one of those <lines> elements must contain at least one <line> element. If any of those conditions fail, the XSLT processor can stop looking through the tree and return an empty node-set.
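ElementTree has no document node, and its Element.findall() method refuses paths that begin with a slash. This sketch therefore uses a small hypothetical helper, select_absolute(), to mimic an absolute path, and it handles only simple child steps. The helper first checks the first step against the root element and gives up immediately if that check fails, as described above. It then evaluates the rest of the path from the root. It never looks at the current context node, which is why the result is the same wherever you call it from.

```python
import xml.etree.ElementTree as ET

sonnet = ET.fromstring(
    "<sonnet><title>Sonnet 130</title>"
    "<lines><line>one</line><line>two</line></lines></sonnet>"
)
title = sonnet.find("title")


def select_absolute(root, path):
    """Hypothetical helper: evaluate a simple absolute path like /sonnet/lines/line.

    ElementTree has no document node, so the caller passes the root element
    and the first step is compared against it by hand. Only child steps are
    handled; // is not supported here.
    """
    first, _, rest = path.lstrip("/").partition("/")
    if root.tag != first:
        return []            # no matching element at the root: stop looking
    return root.findall(rest) if rest else [root]


relative = title.findall("lines/line")                    # depends on the context
absolute = select_absolute(sonnet, "/sonnet/lines/line")  # same from anywhere
```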
A possible disadvantage of using absolute XPath expressions is that it could make your templates more difficult to reuse. Both of these templates process <line> elements, but the second one is more difficult to reuse:
<xsl:template match="line"> ... </xsl:template>
<xsl:template match="/sonnet/lines/line"> ... </xsl:template>
If the second template has wonderful code for processing <line> elements, but your document contains <line> elements that don't match the absolute XPath expression, you can't reuse that template. You should keep that in mind as you design your templates.
Up until now, we've discussed XPath expressions that used either element names (/sonnet/lines/line) or special characters (/ or ..) to select elements from an XML document. Obviously, XML documents contain things other than elements; we'll talk about how to select those other things here.
To select an attribute, use the at-sign (@) along with the attribute name. In our sample sonnet, you can select the type attribute of the <sonnet> element with the XPath expression /sonnet/@type. If the context node is the <sonnet> element itself, then the relative XPath expression @type does the same thing.
To select the text of an element, use the XPath node test text(). The XPath expression /sonnet/auth:author/last-name/text() selects the text node inside the <last-name> element in our example document. Be aware that text() selects only the text nodes that are direct children of an element. The string value of the element itself is the concatenation of all of its descendant text nodes. Thus, <xsl:value-of select="/sonnet/auth:author"/> returns the following text:
ShakespeareWilliamBritish15641616
That's probably not the output you want. If you want to provide spacing, line breaks, or other formatting, you need to process each child element individually. For example, you can apply templates to /sonnet/auth:author/* or select each child's text() separately.
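This sketch uses ElementTree to show the difference between an element's string value and its text() children. The <author> element and its child names are hypothetical. We've dropped the auth: prefix to keep it short. itertext() walks every descendant text node, so it matches the string value. ElementTree stores an element's own text-node children in .text and in each child's .tail, and those are roughly what author/text() would select.

```python
import xml.etree.ElementTree as ET

# Hypothetical author record, written without whitespace between elements.
author = ET.fromstring(
    "<author><last-name>Shakespeare</last-name><first-name>William</first-name>"
    "<nationality>British</nationality><year-of-birth>1564</year-of-birth>"
    "<year-of-death>1616</year-of-death></author>"
)

# String value: all descendant text nodes, concatenated in document order.
string_value = "".join(author.itertext())

# Roughly author/text(): only text nodes that are direct children of <author>.
# There are none here, because nothing sits between the child elements.
direct_text = [t for t in [author.text] + [child.tail for child in author] if t]

# Handling each child separately lets you add your own separators.
formatted = ", ".join(f"{child.tag}: {child.text}" for child in author)
print(string_value)   # ShakespeareWilliamBritish15641616
print(formatted)
```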
By this point, we've covered most of the things you're ever likely to do with an XPath expression. You can use a couple of other XPath node tests to describe parts of an XML document. The comment() and processing-instruction() node tests allow you to select comments and processing instructions from the XML document. Going back to our sample sonnet, the XPath expression /processing-instruction() returns the two processing instructions (named xml-stylesheet and cocoon-process). The expression /sonnet/comment() returns the comment node that begins, "Is there an official title for this sonnet?"
Processing comment nodes in this way can actually be useful. If you've entered comments into an XML document, you can use the comment() node test to display your comments only when you want. Here's an XSLT template you could use:
<xsl:template match="comment()">
  <div class="comment">
    <p><xsl:value-of select="."/></p>
  </div>
</xsl:template>
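Elsewhere in your stylesheet, you could define CSS properties to print comments in a large, bold, purple font. To remove all comments from your output document, go to your stylesheet and comment out any <xsl:apply-templates select="comment()"/> statements. Keep in mind that a plain <xsl:apply-templates/> also selects comment children. In that case, comment out the comment() template itself; without it, XSLT's built-in template rule for comments outputs nothing.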
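ElementTree discards comments and processing instructions that sit outside the root element. To mirror the comment() and processing-instruction() node tests, this sketch therefore uses the standard library's xml.dom.minidom, which keeps them as nodes. The document is a hypothetical miniature of our sonnet, and the pseudo-attributes on the two processing instructions are placeholders.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml-stylesheet href="sonnet.xsl" type="text/xsl"?>'
    '<?cocoon-process type="xslt"?>'
    '<sonnet type="Shakespearean">'
    '<!-- Is there an official title for this sonnet? -->'
    '<title>Sonnet 130</title>'
    '</sonnet>'
)

# Like /processing-instruction(): PI children of the root (document) node.
pis = [n.target for n in doc.childNodes
       if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE]

# Like /sonnet/comment(): comment children of the <sonnet> element.
comments = [n.data.strip() for n in doc.documentElement.childNodes
            if n.nodeType == Node.COMMENT_NODE]

print(pis)        # ['xml-stylesheet', 'cocoon-process']
print(comments)   # ['Is there an official title for this sonnet?']
```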
XPath has one other kind of node, the rarely used namespace node. To retrieve namespace nodes, you have to use something called the namespace axis; we'll discuss axes soon. One note about namespaces applies whether or not you ever use namespace nodes. When matching names, the namespace prefix isn't important; only the namespace URL it maps to matters. As an example, our sample sonnet used the auth namespace prefix, which maps to the value http://www.authors.com/. If a stylesheet uses the namespace prefix writers to refer to the same URL, then the XPath expression /sonnet/writers:* would return the <auth:author> element. Even though the namespace prefixes are different, the URLs they refer to are the same.
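ElementTree follows the same rule when you pass it a prefix-to-URL mapping, so we can check the point with a short sketch. The document below is a hypothetical fragment of our sonnet. In the first query, the stylesheet-side prefix writers is bound to the same URL as the document's auth prefix. The second query uses the auth prefix with a URL that lacks the trailing slash.

```python
import xml.etree.ElementTree as ET

sonnet = ET.fromstring(
    '<sonnet xmlns:auth="http://www.authors.com/">'
    '<auth:author><last-name>Shakespeare</last-name></auth:author>'
    '<title>Sonnet 130</title>'
    '</sonnet>'
)

# Different prefix, same namespace URL: the name test still matches.
found = sonnet.findall("writers:author",
                       namespaces={"writers": "http://www.authors.com/"})

# Same prefix, different URL (no trailing slash): nothing matches,
# because namespace URLs are compared as plain strings.
missed = sonnet.findall("auth:author",
                        namespaces={"auth": "http://www.authors.com"})

print(found[0].tag)   # {http://www.authors.com/}author
```

ElementTree reports the element's name in {URL}localname form. That makes it plain that the document's own prefix never took part in the match.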
Having said all that, the chances that you'll ever need to use namespace nodes are pretty slim.
XPath features three wildcards:
The asterisk (*), which selects all element nodes in the current context. Be aware that the asterisk wildcard selects element nodes only; attributes, text nodes, comments, and processing instructions aren't included. You can also use a namespace prefix with an asterisk: in our sample sonnet, the XPath expression auth:* returns all element nodes in the current context that are associated with the namespace URL http://www.authors.com/.
The at-sign and asterisk (@*), which selects all attribute nodes in the current context. You can use a namespace prefix with the attribute wildcard. In our sample sonnet, @auth:* returns all attribute nodes in the current context that are associated with the namespace URL http://www.authors.com/.
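The node() node test, which selects all nodes in the current context, regardless of type. On the default child axis, this includes elements, text nodes, comments, and processing instructions. Attributes and namespace nodes are never children of an element, so node() returns them only when you use it on the attribute or namespace axis (attribute::node() or namespace::node()).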
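ElementTree's own * is not a faithful stand-in for XPath's: when comments are kept in the tree, its * returns them too. This sketch therefore spells out the wildcards as small helper functions over an ElementTree element. The names star, star_ns, at_star, and child_nodes are our own hypothetical ones, not library calls. ElementTree keeps attributes in a dictionary rather than as nodes. It also needs a TreeBuilder with insert_comments=True (Python 3.8 or later) to keep comments at all.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
sonnet = ET.fromstring(
    '<sonnet xmlns:auth="http://www.authors.com/" type="Shakespearean">'
    'Intro<!-- a comment --><auth:author/><title>Sonnet 130</title>'
    '</sonnet>',
    parser=parser,
)


def star(elem):
    """Like *: element children only (comments have a non-string tag)."""
    return [c for c in elem if isinstance(c.tag, str)]


def star_ns(elem, url):
    """Like prefix:*: element children in one namespace URL."""
    return [c for c in star(elem) if c.tag.startswith("{" + url + "}")]


def at_star(elem):
    """Like @*: every attribute (namespace declarations are not attributes)."""
    return dict(elem.attrib)


def child_nodes(elem):
    """Like node() on the child axis: text, comments, elements, no attributes."""
    nodes = [elem.text] if elem.text else []
    for c in elem:
        nodes.append(c)
        if c.tail:                 # text after a child is also a child of elem
            nodes.append(c.tail)
    return nodes
```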
In addition to these wildcards, XPath includes the double slash (//), which indicates that zero or more levels of the hierarchy may occur between the slashes. For example, the XPath expression //line selects all <line> elements, regardless of where they appear in the document. This is an absolute XPath expression because it begins with a slash. You can also use the double slash at any point in an XPath expression. The expression /sonnet//line selects all <line> elements that are descendants of the <sonnet> element at the root of the XML document. The double slash is shorthand for /descendant-or-self::node()/, so /sonnet//line is equivalent to /sonnet/descendant-or-self::node()/child::line. In this case, it selects the same nodes as /sonnet/descendant::line.
WARNING: The double slash (//) is a very powerful operator, but be aware that it can make your stylesheets incredibly inefficient. If we use the XPath expression //line, the XSLT processor has to check every node in the document to see if there are any <line> elements. The more specific you can be in your XPath expressions, the less work the XSLT processor has to do, and the faster your stylesheets will execute. To return to our filesystem metaphor, if I go to a Windows command prompt and type dir /s c:\*.xml, the operating system has to look in every subdirectory for any *.xml files that might be there. However, if I type dir /s c:\doug\projects\xml-docs\*.xml, the operating system has far fewer places to look, and the command will execute much faster.
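This sketch makes the definition of // concrete by expanding it by hand into descendant-or-self::node() followed by a child step. It uses ElementTree's iter(), which yields an element and then all of its descendant elements in document order. The document and the helper names are hypothetical, and for simplicity the sketch considers only element nodes. ElementTree's own .//line path gives the same answer. The examined count shows that every element in the subtree becomes a candidate parent, which is the cost the warning above describes.

```python
import xml.etree.ElementTree as ET

sonnet = ET.fromstring(
    "<sonnet><lines><line>one</line><line>two</line></lines>"
    "<notes><stanza><line>three</line></stanza></notes></sonnet>"
)


def descendant_or_self(elem):
    """descendant-or-self::node(), restricted to elements: elem, then descendants."""
    return list(elem.iter())


def double_slash(elem, name):
    """elem//name, spelled out as descendant-or-self::node()/child::name."""
    return [child for d in descendant_or_self(elem) for child in d
            if child.tag == name]


everywhere = double_slash(sonnet, "line")    # like /sonnet//line
builtin = sonnet.findall(".//line")          # ElementTree's own spelling
specific = sonnet.findall("lines/line")      # like /sonnet/lines/line
examined = len(descendant_or_self(sonnet))   # elements // has to look inside
```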
To this point, we've been able to select child elements, attributes, text, comments, and processing instructions with some fairly simple XPath expressions. Obviously, we might want to select many other things, such as:
All ancestors of the context node
All descendants of the context node
All previous siblings or following siblings of the context node (siblings are nodes that have the same parent)
To select these things, XPath provides a number of axes that let you specify various collections of nodes. There are thirteen axes in all; we'll discuss all of them here, even though most of them won't be particularly useful to you. To use an axis in an XPath expression, type the name of the axis, a double colon (::), and then a node test. The node test is usually the name of the element you want to select, or a test such as * or node() if you don't want to restrict the selection by name.
Before we define all of the axes, though, we need to talk about XPath's unabbreviated syntax.
To this point, all the XPath expressions we've looked at used the XPath abbreviated syntax. Most of the time, that's what you'll use; however, most of the lesser-used axes can only be specified with the unabbreviated syntax. For example, when we wrote an XPath expression to select all of the <line> elements in the current context, we used the abbreviated syntax:
<xsl:apply-templates select="line"/>
If you really enjoy typing, you can use the unabbreviated syntax to specify that you want all of the <line> children of the current context:
<xsl:apply-templates select="child::line"/>
We'll go through all of the axes now, pointing out which ones have an abbreviated syntax.
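The following list contains all of the axes defined by the XPath standard, with a brief description of each one.
ancestor: The parent of the context node, the parent's parent, and so on, up to and including the root node.
ancestor-or-self: The context node and all of its ancestors.
attribute: The attributes of the context node. This axis is abbreviated with the at-sign (@), so @type is short for attribute::type.
child: The children of the context node. This is the default axis, so line is short for child::line.
descendant: The children of the context node, their children, and so on. Attributes and namespace nodes are never included.
descendant-or-self: The context node and all of its descendants. The double slash (//) is short for /descendant-or-self::node()/.
following: All nodes that come after the context node in document order, excluding its descendants, attributes, and namespace nodes.
following-sibling: All siblings that come after the context node. This axis is empty for attribute and namespace nodes.
namespace: The namespace nodes of the context node.
parent: The parent of the context node, if it has one. The double period (..) is short for parent::node().
preceding: All nodes that come before the context node in document order, excluding its ancestors, attributes, and namespace nodes.
preceding-sibling: All siblings that come before the context node. This axis is empty for attribute and namespace nodes.
self: The context node itself. The single period (.) is short for self::node().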
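ElementTree has no axis syntax, and its elements don't even know their parents. A handful of the axes are still easy to emulate once you build a child-to-parent map. The helper names below are hypothetical. The sketch covers parent, ancestor, following-sibling, and preceding-sibling, and it handles elements only. The two reverse axes, ancestor and preceding-sibling, are returned nearest node first. That is also the order XPath uses when it numbers positions on a reverse axis, so preceding-sibling::line[1] is the closest preceding <line>.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<sonnet><title/><lines><line n='1'/><line n='2'/><line n='3'/></lines></sonnet>"
)

# ElementTree elements have no parent pointer, so record each child's parent.
parent_of = {child: parent for parent in root.iter() for child in parent}


def parent(node):
    """parent::node() (..): empty for the root element."""
    return [parent_of[node]] if node in parent_of else []


def ancestors(node):
    """ancestor::*, nearest first."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def siblings_split(node):
    """Hypothetical helper: (siblings before node, siblings after node)."""
    if node not in parent_of:
        return [], []
    siblings = list(parent_of[node])
    i = siblings.index(node)
    return siblings[:i], siblings[i + 1:]


def following_siblings(node):
    return siblings_split(node)[1]


def preceding_siblings(node):
    return siblings_split(node)[0][::-1]    # reverse axis: nearest first


line2 = root.find("lines/line[@n='2']")
```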
There's one more aspect of XPath expressions that we haven't discussed: predicates. Predicates are filters that restrict the nodes selected by an XPath expression. Each predicate is evaluated for every candidate node and converted to a Boolean value (either true or false). The one exception is a predicate whose value is a number, which is true only when it equals the node's position. If the predicate is true for a given node, that node will be selected; otherwise, the node is not selected. Predicates always appear inside square brackets ([]). Here's an example:
<xsl:apply-templates select="line[3]"/>
This expression selects the third <line> element in the current context. If there are two or fewer <line> elements in the current context, this XPath expression returns an empty node-set. Several things can be part of a predicate; we'll go through them here.
A number inside square brackets selects nodes that have a particular position. For example, the XPath expression line[7] selects the seventh <line> element in the context node. XPath also provides the Boolean operators and and or to combine tests within a predicate. The expression line[position()=3 and @style] matches the third <line> element only if it has a style attribute. The expression line[position()=3 or @style] matches the third <line> element as well as every <line> element that has a style attribute. The union operator (|) combines node-sets rather than predicates, so line[3|7] is an error. To select the third and seventh <line> elements in the current context, write line[3] | line[7] or line[position()=3 or position()=7].
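ElementTree understands simple positional predicates such as line[3]. It does not understand and, or, position(), or the union operator. This sketch therefore writes those cases out as Python list comprehensions over a hypothetical set of seven <line> elements, with enumerate(..., 1) playing the role of position().

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<lines>"
    "<line>1</line><line style='indent'>2</line><line style='indent'>3</line>"
    "<line>4</line><line>5</line><line>6</line><line>7</line>"
    "</lines>"
)
line = doc.findall("line")   # the node list the predicates filter, in document order

# line[3]: ElementTree handles a simple numeric predicate itself.
third = doc.findall("line[3]")

# line[position()=3 and @style] and line[position()=3 or @style]
both = [n for pos, n in enumerate(line, 1) if pos == 3 and "style" in n.attrib]
either = [n for pos, n in enumerate(line, 1) if pos == 3 or "style" in n.attrib]

# line[3] | line[7]: the union of two node-sets, kept in document order.
wanted = (line[2], line[6])
union = [n for n in line if any(n is w for w in wanted)]
```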
In addition to numbers, we can use XPath and XSLT functions inside predicates. Here are some examples:
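ElementTree's limited XPath does accept last() in a positional predicate, so line[last()] works directly. For other function calls, this sketch again writes the predicate out by hand. The two hand-written cases correspond to the XPath predicates line[position() mod 2 = 0] and line[contains(., 'red')]. The four lines are taken from Sonnet 130 purely as sample data.

```python
import xml.etree.ElementTree as ET

lines = ET.fromstring(
    "<lines>"
    "<line>My mistress' eyes are nothing like the sun,</line>"
    "<line>Coral is far more red than her lips red.</line>"
    "<line>If snow be white, why then her breasts be dun,</line>"
    "<line>If hairs be wires, black wires grow on her head.</line>"
    "</lines>"
)

# line[last()]: supported directly by ElementTree.
last_line = lines.findall("line[last()]")

line = lines.findall("line")

# line[position() mod 2 = 0]: every even-numbered line.
even = [n for pos, n in enumerate(line, 1) if pos % 2 == 0]

# line[contains(., 'red')]: "." is the string value of each candidate line.
with_red = [n for n in line if "red" in "".join(n.itertext())]
```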
Copyright © 2002 O'Reilly & Associates. All rights reserved.