In this chapter, we review the design rationale behind XSLT and XPath and discuss the basics of XML. We also talk about other web standards and how they relate to XSLT and XPath. We conclude the chapter with a brief discussion of how to set up an XSLT processor on your machine so you can work with the examples throughout the book.
XML has gone from working group to entrenched buzzword in record time. Its flexibility as a language for representing structured data has made it the lingua franca for data interchange. Early adopters used programming interfaces such as the Document Object Model (DOM) and the Simple API for XML (SAX) to parse and process XML documents. As XML becomes mainstream, however, it's clear that the average web citizen can't be expected to hack Java, Visual Basic, Perl, or Python code to work with documents. What's needed is a flexible, powerful, yet relatively simple language capable of processing XML.
What's needed is XSLT.
XSLT, the Extensible Stylesheet Language for Transformations, is an official recommendation of the World Wide Web Consortium (W3C). It provides a flexible, powerful language for transforming XML documents into something else. That something else can be an HTML document, another XML document, a Portable Document Format (PDF) file, a Scalable Vector Graphics (SVG) file, a Virtual Reality Modeling Language (VRML) file, Java code, a flat text file, a JPEG file, or most anything you want. You write an XSLT stylesheet to define the rules for transforming an XML document, and the XSLT processor does the work.
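To make the division of labor concrete, here is a minimal sketch of a stylesheet that turns a tiny XML document into HTML. The input document and its <greeting> element are hypothetical, invented only for this illustration; the xsl: elements are standard XSLT 1.0.

```xml
<?xml version="1.0"?>
<!--
  Hypothetical input document (an assumption for this sketch):

    <greeting>Hello, World!</greeting>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Rule for the document root: build the HTML skeleton,
       then ask the processor to handle the <greeting> element -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="greeting"/>
      </body>
    </html>
  </xsl:template>

  <!-- Rule for <greeting>: wrap its text in a heading -->
  <xsl:template match="greeting">
    <h1><xsl:value-of select="."/></h1>
  </xsl:template>

</xsl:stylesheet>
```

Given the input above, an XSLT processor should produce an HTML page whose body contains a single <h1> heading with the text of the greeting. The stylesheet only states the rules; walking the document and writing the result is the processor's job.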
The W3C has defined two families of standards for stylesheets. The older and simpler is Cascading Style Sheets (CSS), a mechanism used to define various properties of markup elements; the other is the Extensible Stylesheet Language (XSL), of which XSLT is a part. Although CSS can be used with XML, it is most often used to style HTML documents. You can use CSS properties to specify that certain elements be rendered in blue, or in 58-point type, or in boldface. That's all well and good, but there are many things that CSS can't do:
CSS can't change the order in which elements appear in a document. If you want to sort certain elements or filter elements based on a certain property, CSS won't do the job.
CSS can't do computations. If you want to calculate and output a value (maybe you want to add up the numeric value of all <price> elements in a document), CSS won't do the job.
CSS can't combine multiple documents. If you want to combine 53 purchase order documents and print a summary of all items ordered in those purchase orders, CSS won't do the job.
WARNING: Don't take this section as a criticism of CSS; XSLT and CSS were designed for different purposes. One fairly common use of XSLT is to generate an HTML document that contains CSS properties. See Section 3.5, "The XPath View of an XML Document" in Chapter 3, "XPath: A Syntax for Describing Needles and Haystacks" for an example that uses XSLT to generate CSS properties.
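By way of contrast, here is a short sketch of how XSLT handles two of the tasks that CSS can't: reordering elements and computing a value. The <order>, <item>, <name>, and <price> elements are hypothetical names chosen for this illustration; xsl:sort and the XPath functions sum() and format-number() are standard XSLT 1.0.

```xml
<?xml version="1.0"?>
<!--
  Hypothetical input document (an assumption for this sketch):

    <order>
      <item><name>Widget</name><price>4.50</price></item>
      <item><name>Anvil</name><price>20.00</price></item>
      <item><name>Gasket</name><price>1.25</price></item>
    </order>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/order">
    <!-- Reordering: list the items alphabetically by name,
         regardless of their order in the source document -->
    <xsl:for-each select="item">
      <xsl:sort select="name"/>
      <xsl:value-of select="name"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="price"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- Computation: add up the numeric value of every <price> -->
    <xsl:text>Total: </xsl:text>
    <xsl:value-of select="format-number(sum(item/price), '0.00')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input, the output lists Anvil, Gasket, and Widget in that order, followed by a total of 25.75. Filtering works along the same lines with an XPath predicate such as item[price &gt; 2] in the select attribute. Combining several documents typically goes through the XSLT document() function, which is not shown here.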
XSLT was created to be a more powerful, flexible language for transforming documents. In this book, we go through all the features of XSLT and discuss each of them in terms of practical examples. Some of XSLT's design goals specify that:
An XSLT stylesheet should be an XML document. This means that you can write a stylesheet that transforms a second stylesheet into another stylesheet (we actually do this in Chapter 4, "Branching and Control Elements"). This kind of recursive thinking is common in XSLT.
The XSLT language should be based on pattern matching. Most of our stylesheets consist of rules (called templates in XSLT) used to transform a document. Each rule says, "When you see part of a document that looks like this, here's how you convert it into something else." This is probably different from any programming you've previously done.
XSLT should be free of side effects. This lets an XSLT processor optimize its work, for example by applying many different stylesheet rules simultaneously. The biggest impact of this is that variables can't be modified. Once a variable is initialized, you can't change its value; if variables could be changed, then processing one stylesheet rule might have side effects that impact other stylesheet rules. This is almost certainly different from any programming you've previously done.
XSLT is heavily influenced by the design of functional programming languages, such as Lisp, Scheme, and Haskell. These languages also encourage immutable variables (Haskell requires them). Where XSLT defines templates, functional programming languages define programs as a series of functions, each of which generates a well-defined output (free from side effects, of course) in response to a well-defined input. XSLT's goal is the same: to execute the instructions of a given template without affecting the execution of any other template.
Instead of looping, XSLT uses iteration and recursion. Given that variables can't be changed, how do you do something like a for or do-while loop? XSLT offers two alternatives: iteration and recursion. Iteration means that you can write an XSLT template that says, "get all the things that look like this, and here's what I want you to do with each of them." That's not quite a do-while loop, but what you usually write in a procedural language amounts to, "do this while there are any items left to process." In that case, iteration does exactly what you want.
Recursion takes some getting used to. If you must implement something like a for statement (for i=1 to 10 do, for example), recursion is the way to go. There are a number of examples of recursion throughout the book; you can flip ahead to Section 4.7, "A Stylesheet That Emulates a for Loop" in Chapter 4, "Branching and Control Elements" for more information.
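The design goals above can be sketched in one small stylesheet. This is not the stylesheet from Section 4.7, only a minimal illustration under stated assumptions: the <list> and <entry> elements, the template name count-up, and the parameter names i and max are all hypothetical. It shows a rule selected by pattern matching, iteration with xsl:for-each, and a recursive named template that emulates "for i=1 to 3" without ever changing a variable.

```xml
<?xml version="1.0"?>
<!--
  Hypothetical input document (an assumption for this sketch):

    <list>
      <entry>alpha</entry>
      <entry>beta</entry>
    </list>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Pattern matching: this rule applies when the processor
       sees a <list> element at the top of the document -->
  <xsl:template match="/list">

    <!-- Iteration: "get all the <entry> elements, and here's
         what to do with each of them"; no loop counter needed -->
    <xsl:for-each select="entry">
      <xsl:value-of select="position()"/>
      <xsl:text>. </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- Recursion: start the emulated "for i=1 to 3" loop -->
    <xsl:call-template name="count-up">
      <xsl:with-param name="i" select="1"/>
      <xsl:with-param name="max" select="3"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical named template: writes one line for $i, then
       calls itself with $i + 1 until $i exceeds $max -->
  <xsl:template name="count-up">
    <xsl:param name="i"/>
    <xsl:param name="max"/>
    <xsl:if test="$i &lt;= $max">
      <xsl:text>i = </xsl:text>
      <xsl:value-of select="$i"/>
      <xsl:text>&#10;</xsl:text>
      <!-- $i is never reassigned; each recursive call simply
           receives a new parameter with the next value -->
      <xsl:call-template name="count-up">
        <xsl:with-param name="i" select="$i + 1"/>
        <xsl:with-param name="max" select="$max"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input, the stylesheet writes "1. alpha" and "2. beta", followed by "i = 1", "i = 2", and "i = 3", each on its own line. Every invocation of count-up sees its own value of $i, so no template can disturb the state that another template depends on.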
Given these design goals, what are XSLT's strengths? Here are some scenarios:
Your web site needs to deliver information to a variety of devices. You need to support ordinary desktop browsers, as well as pagers, mobile phones, and other low-resolution, low-function devices. It would be great if you could create your information in structured documents, then transform those documents into all the formats you need.
You need to exchange data with your partners, but all of you use different database systems. It would be great if you could define a common XML data format, then transform documents written in that format into the import files you need (SQL statements, comma-separated values, etc.).
To stay on the cutting edge, your web site gets a complete visual redesign every few months. Even though things such as server-side includes and CSS can help, they can't do everything. It would be great if your data were in a flexible format that could be transformed into any look and feel, simplifying the redesign process.
You have documents in several different formats. All the documents are machine-readable, but it's a hassle to write programs to parse and process all of them. It would be great if you could combine all of the documents into a single format, then generate summary documents and reports based on that collection of documents. It would be even better if the report could contain calculated values, automatically generated graphics, and formatting for high-quality printing.
Throughout the book, we'll demonstrate XSLT solutions for problems just like these. Most chapters focus on particular techniques, such as sorting, grouping, and generating links between pieces of data. We wrap up with a case study that discusses a real-world content-management scenario and illustrates how XSLT was used to solve a number of problems.
Copyright © 2002 O'Reilly & Associates. All rights reserved.