When Sun released the Java API for XML Parsing, generally referred to as JAXP, they managed to launch a series of contradictions into the Java world. In one fell swoop, they released the most important API that wasn't an API to Java developers, and caused great confusion with the simplest API. People switched to a new parser without knowing they had switched to a new parser. There is a lot of confusion surrounding JAXP, not only about how to use it, but even about what it is.
In this chapter, I'll first address some of the confusion about what JAXP is and is not.[13] Then you'll get a look at JAXP 1.0, which is still used heavily. Once you get the basics, we will move on to JAXP 1.1, the latest version (not quite released as of the writing of this chapter, but almost certainly available by publication time). That will give you a leg up on the new features in the latest version, and in particular the TrAX API included in JAXP 1.1. Buckle up, and be prepared to finally understand the mystery behind JAXP.
[13]If this chapter feels a little bit like déjà vu, you may have read an earlier version of this text at IBM DeveloperWorks. There were originally two articles published at http://www.ibm.com/developer that explored JAXP. This chapter is an updated and slightly modified version of those articles.
Before diving into code, it's important to cover some basic concepts. Strictly speaking, JAXP is an API, but it is more accurately called an abstraction layer. It does not provide a new means of parsing XML, add to SAX, DOM, or JDOM, or provide new functionality in handling Java and XML. Instead, it makes it easier to deal with some difficult tasks with DOM and SAX. It also makes it possible to handle vendor-specific tasks encountered when using the DOM and SAX APIs, which in turn allows those APIs to be used in a vendor-neutral way.
While I'll go through these features individually, the thing you really need to get a handle on is that JAXP does not provide parsing functionality! Without SAX, DOM, or another XML parsing API, you cannot parse XML. I have seen many requests for a comparison of DOM, SAX, or JDOM to JAXP. Making these comparisons is impossible because the first three APIs serve a completely different purpose than JAXP. SAX, DOM, and JDOM all parse XML. JAXP provides a means to get to these APIs and the results of parsing a document. It doesn't offer a new way to parse the document itself. This is a critical distinction to make if you're going to use JAXP correctly. It will also most likely put you miles ahead of many of your fellow XML developers.
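To make the distinction concrete, here is a minimal sketch of where JAXP stops and the parsing API begins. It assumes a modern JDK, which bundles a JAXP-compliant parser; the class name JaxpDomSketch and the method rootElementName are made up for illustration. Only the two javax.xml.parsers classes belong to JAXP. Everything that touches the parsed document is plain DOM.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class JaxpDomSketch {

    // Parses the supplied XML text and returns the name of its root element.
    static String rootElementName(String xml) {
        try {
            // JAXP: locate whatever parser is configured and hand it back.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();

            // The underlying parser does the work; the result is a DOM Document.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // From here on, this is the DOM API (org.w3c.dom), not JAXP.
            return doc.getDocumentElement().getTagName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootElementName("<book><title>Java and XML</title></book>"));
    }
}
```

Running main prints book. Notice that nothing in the javax.xml.parsers package is ever asked about elements, attributes, or text; those questions all go to DOM.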
If you're still dubious, download the JAXP 1.0 distribution from Sun's web site at http://java.sun.com/xml and you'll get an idea of how basic JAXP is. In the included jar (jaxp.jar), you will find only six classes! How hard could this API be? All of the classes (part of the javax.xml.parsers package) sit on top of an existing parser. And two of these classes are for error handling. JAXP is simpler than people think.
Part of the trouble stems from the fact that Sun's parser is included with the JAXP download. The parser classes are all in the parser.jar archive as part of the com.sun.xml.parser package and related subpackages. This parser (now code-named Crimson) is not part of JAXP. It is part of the JAXP distribution, but it is not part of the JAXP API. Confusing? A little bit. Think about it this way: JDOM downloads include the Apache Xerces parser. That parser isn't part of JDOM but is used by JDOM, so it's included to ensure that JDOM is usable out of the box. The same principle applies to JAXP, but it isn't as clearly publicized: JAXP comes with Sun's parser so it can be used immediately. However, many people refer to the classes included in Sun's parser as if they were part of the JAXP API itself. For example, a common question on newsgroups is, "How can I use the XMLDocument class that comes with JAXP? What is its purpose?" The answer is somewhat complicated.
First, the com.sun.xml.tree.XMLDocument class is not part of JAXP. It is part of Sun's parser. So the question is misleading from the start. Second, the whole point of JAXP is to provide vendor independence when dealing with parsers. The same code, using JAXP, could be used with Sun's XML parser, Apache's Xerces XML parser, and Oracle's XML parser. Using a Sun-specific class, then, is a bad idea. It violates the entire point of using JAXP. Are you starting to see how this subject has gotten muddied up? The parser and the API in the JAXP distribution (at least the one from Sun) have been lumped together, and developers mistake classes and features from one for part of the other.
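The vendor-neutral style can be sketched as follows. This is an illustration under assumptions rather than a definitive recipe: the class name VendorNeutralSketch and the method configuredParsers are hypothetical, and the code is written for a modern JDK. The source imports only javax.xml.parsers and org.xml.sax types, never a vendor package; the vendor's class names show up only at runtime, as strings.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class VendorNeutralSketch {

    // Returns the implementation class names of the configured DOM and SAX
    // parsers, in that order. No vendor class is referenced at compile time.
    static String[] configuredParsers() {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            SAXParser saxParser =
                SAXParserFactory.newInstance().newSAXParser();
            return new String[] {
                builder.getClass().getName(),
                saxParser.getClass().getName()
            };
        } catch (FactoryConfigurationError e) {
            // Thrown when no usable factory implementation can be located.
            throw new IllegalStateException("No JAXP parser is configured", e);
        } catch (ParserConfigurationException | SAXException e) {
            // Thrown when a factory exists but cannot build the requested parser.
            throw new IllegalStateException("Parser could not be created", e);
        }
    }

    public static void main(String[] args) {
        for (String name : configuredParsers()) {
            System.out.println(name);
        }
    }
}
```

The names printed depend entirely on which parser is installed, which is the point: swap the parser and this code neither changes nor needs recompiling. Casting the result to something like com.sun.xml.tree.XMLDocument would throw that benefit away.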
There is another confusing issue related to JAXP. JAXP 1.0 supports only SAX 1.0 and DOM Level 1. It is generally Sun's policy not to ship any API or product based on a working draft, beta, or other nonfinal version of underlying APIs. When JAXP 1.0 was finalized, Sun settled on SAX 1.0, as SAX 2.0 was still in beta, and DOM Level 1, as Level 2 was still a candidate recommendation. Many users layered JAXP on top of existing parsers (like Apache Xerces, for example) that had SAX 2.0 and DOM Level 2 support, and suddenly lost functionality. The result was a lot of questions about how to use features that simply couldn't be used with JAXP. It was also right about this time that SAX 2.0 went from beta to final, which really threw things into a mess. However, that hasn't stopped many who didn't need these later versions of DOM and SAX from putting JAXP 1.0 into production, and so I'd be remiss in not covering both the old version (1.0) and the new version (1.1), which does support SAX 2.0 and DOM Level 2. The rest of this chapter is split into two parts: the first dealing with JAXP 1.0, and the second with 1.1. Since 1.1 builds on what 1.0 provided in terms of functionality, you should read both sections regardless of the version of the API you're using.
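As a preview of what the newer version opens up, here is a hedged sketch of reaching SAX 2.0 through JAXP. It assumes JAXP 1.1 or later (a modern JDK qualifies), since SAXParser.getXMLReader() is not available in JAXP 1.0. The class name Sax2ThroughJaxpSketch, the method qualifiedElements, and the urn:example:books namespace are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class Sax2ThroughJaxpSketch {

    // Returns every element in document order as "{namespace-uri}local-name".
    static List<String> qualifiedElements(String xml) {
        final List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Namespace processing is off by default in JAXP; turn it on.
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();

            // JAXP hands over the SAX 2.0 XMLReader; the rest is plain SAX.
            XMLReader reader = parser.getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    seen.add("{" + uri + "}" + localName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("Could not parse the document", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String xml = "<b:book xmlns:b=\"urn:example:books\"><b:title/></b:book>";
        System.out.println(qualifiedElements(xml));
    }
}
```

The namespace URI and local name passed to startElement are SAX 2.0 features. A SAX 1.0 handler sees only the raw element name, which is why users who layered JAXP 1.0 over a SAX 2.0 parser found this kind of information out of reach.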
Copyright © 2002 O'Reilly & Associates. All rights reserved.