As you get into the more advanced features of SAX, you certainly don't reduce the number of problems you can get yourself into. However, these problems often become more subtle, which makes for some tricky bugs to track down. I'll point out a few of these common problems.
As I mentioned in the section on EntityResolvers, you should always ensure that you return null as a starting point for resolveEntity( ) method implementations. Luckily, Java ensures that you return something from the method, but I've often seen code like this:
public InputSource resolveEntity(String publicID, String systemID)
    throws IOException, SAXException {

    InputSource inputSource = new InputSource( );

    // Handle references to online version of copyright.xml
    if (systemID.equals(
            "http://www.newInstance.com/javaxml2/copyright.xml")) {
        inputSource.setSystemId(
            "file:///c:/javaxml2/ch04/xml/copyright.xml");
    }

    // In the default case, return null
    return inputSource;
}
As you can see, an InputSource is created initially and then the system ID is set on that source. The problem here is that if no if blocks are entered, an InputSource with no system or public ID, as well as no specified Reader or InputStream, is returned. This can lead to unpredictable results; in some parsers, things continue with no problems. In other parsers, though, returning an empty InputSource results in entities being ignored, or in exceptions being thrown. In other words, return null at the end of every resolveEntity( ) implementation, and you won't have to worry about these details.
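For comparison, here is a minimal sketch of the same resolver written so that the default case really does return null. The class name NullDefaultEntityResolver is my own invention for illustration; the URIs are the ones from the fragment above. An InputSource is created only inside the branch that actually has something to resolve, so there is never an empty one to return by accident.

```java
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this sketch
class NullDefaultEntityResolver implements EntityResolver {

    @Override
    public InputSource resolveEntity(String publicID, String systemID)
            throws IOException, SAXException {

        // Handle references to online version of copyright.xml;
        // the InputSource is built only when there is a match
        if ("http://www.newInstance.com/javaxml2/copyright.xml"
                .equals(systemID)) {
            return new InputSource(
                "file:///c:/javaxml2/ch04/xml/copyright.xml");
        }

        // In the default case, return null so the parser
        // resolves the entity in its normal way
        return null;
    }
}
```

With this shape, adding more if blocks later doesn't change the fallback: anything that isn't matched still falls through to the final return null.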
I've described setting properties and features in this chapter, their effect on validation, and also the DTDHandler interface. In all that discussion of DTDs and validation, it's possible you got a few things mixed up; I want to be clear that the DTDHandler interface has nothing at all to do with validation. I've seen many developers register a DTDHandler and wonder why validation isn't occurring. However, DTDHandler doesn't do anything but provide notification of notation and unparsed entity declarations! Probably not what the developer expected. Remember that it's a feature that turns on validation, not a handler instance:
reader.setFeature("http://xml.org/sax/features/validation", true);
Anything less than this (short of a parser validating by default) won't get you validation, and probably won't make you very happy.
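If you want to be sure the feature actually took effect, you can ask the reader afterward. The following is only a sketch: the class and method names (ValidationSetup, enableValidation, isValidating) are made up for illustration, and the exception handling assumes you would rather get back false than have an exception propagate when a parser doesn't recognize or support the feature.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper class, used only for this sketch
class ValidationSetup {

    static final String VALIDATION_FEATURE =
        "http://xml.org/sax/features/validation";

    // Reports whether the reader currently has validation turned on
    static boolean isValidating(XMLReader reader) {
        try {
            return reader.getFeature(VALIDATION_FEATURE);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // The parser can't report on this feature
            return false;
        }
    }

    // Requests validation and reports whether the request stuck
    static boolean enableValidation(XMLReader reader) {
        try {
            reader.setFeature(VALIDATION_FEATURE, true);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // The parser can't turn this feature on
            return false;
        }
        return isValidating(reader);
    }
}
```

Calling reader.setDTDHandler( ) before or after this makes no difference to the result; only the feature does.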
I've talked about pipelines in SAX in this chapter, and hopefully you got an idea of how useful they could be. However, there's an error I see among filter beginners time and time again, and it's a frustrating one to deal with. The problem is setting up the pipeline chain incorrectly: this occurs when each filter does not set the preceding filter as its parent, ending in an XMLReader instance. Check out this code fragment:
public void buildTree(DefaultTreeModel treeModel,
                      DefaultMutableTreeNode base, String xmlURI)
    throws IOException, SAXException {

    // Create instances needed for parsing
    XMLReader reader =
        XMLReaderFactory.createXMLReader(vendorParserClass);
    XMLWriter writer =
        new XMLWriter(reader, new FileWriter("snapshot.xml"));
    NamespaceFilter filter =
        new NamespaceFilter(reader,
            "http://www.oreilly.com/javaxml2",
            "http://www.oreilly.com/catalog/javaxml2");
    ContentHandler jTreeContentHandler =
        new JTreeContentHandler(treeModel, base, reader);
    ErrorHandler jTreeErrorHandler = new JTreeErrorHandler( );

    // Register content handler
    reader.setContentHandler(jTreeContentHandler);

    // Register error handler
    reader.setErrorHandler(jTreeErrorHandler);

    // Register entity resolver
    reader.setEntityResolver(new SimpleEntityResolver( ));

    // Parse
    InputSource inputSource = new InputSource(xmlURI);
    reader.parse(inputSource);
}
See anything wrong? Parsing is occurring on the XMLReader instance, not at the end of the pipeline chain. In addition, the NamespaceFilter instance sets its parent to the XMLReader, instead of the XMLWriter instance that should precede it in the chain. These errors are not obvious, and will throw your intended pipeline into chaos. In this example, no filtering will occur at all, because parsing occurs on the reader, not the filters. If you correct that error, you still won't get output, as the writer is left out of the pipeline through improper setting of the NamespaceFilter's parent. Setting the parent properly sets you up, though, and you'll finally get the behavior you expected in the first place. Be very careful with parentage and parsing when handling SAX pipelines.
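To make the wiring rules concrete, here is a small self-contained sketch of a correctly built chain. It does not use the XMLWriter, NamespaceFilter, or JTree classes from the fragment above; instead, a made-up TagFilter (built on the standard XMLFilterImpl helper) appends a label to each element name so you can see which filters an event passed through. The class names, labels, and sample document are all assumptions for illustration. The three things to notice are that each filter's parent is the link before it, that the handler is registered on the last filter, and that parse( ) is called on the last filter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical classes, used only for this sketch
class PipelineSketch {

    // A stand-in filter that appends a label to each element name
    static class TagFilter extends XMLFilterImpl {
        private final String label;

        TagFilter(XMLReader parent, String label) {
            super(parent);
            this.label = label;
        }

        @Override
        public void startElement(String uri, String localName,
                String qName, Attributes atts) throws SAXException {
            super.startElement(uri, localName, qName + label, atts);
        }

        @Override
        public void endElement(String uri, String localName,
                String qName) throws SAXException {
            super.endElement(uri, localName, qName + label);
        }
    }

    // Parses the XML through reader -> first -> second and returns
    // the element names as seen at the end of the chain
    static List<String> run(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Each filter's parent is the preceding link in the chain
        TagFilter first = new TagFilter(reader, "-a");
        TagFilter second = new TagFilter(first, "-b");

        final List<String> names = new ArrayList<>();

        // Handlers go on the LAST filter, not on the reader
        second.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes atts) {
                names.add(qName);
            }
        });

        // Parsing is started on the LAST filter as well
        second.parse(new InputSource(new StringReader(xml)));
        return names;
    }
}
```

If you call reader.parse( ) instead of second.parse( ), the filters are never hooked up to the reader and the list stays empty. If you pass reader rather than first as the parent of second, the first filter is skipped and only the "-b" label shows up.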
Copyright © 2002 O'Reilly & Associates. All rights reserved.