In many cases, you'll find that the XML modules on CPAN satisfy 90 percent of your needs. Of course, that final 10 percent is the difference between being an essential member of your company's staff and ending up slated for the next round of layoffs. We're going to give you your money's worth out of this book by showing you in gruesome detail how XML processing in Perl works at the lowest levels (relative to any other kind of specialized text munging you may perform with Perl). To start, let's go over some basic truths:
It doesn't matter where it comes from.
By the time the XML parsing part of a program gets its hands on a document, it doesn't give a camel's hump where the thing came from. It could have been received over a network, constructed from a database, or read from disk. To the parser, it's good (or bad) XML, and that's all it knows.
Mind you, the program as a whole might care a great deal. If we write a program that implements XML-RPC, for example, it had better know exactly how to use TCP to fetch and send all that XML data over the Internet! We can have it do that fetching and sending however we like, as long as the end product is the same: a clean XML document fit to pass to the XML processor that lies at the program's core.
We will get into some detailed examples of larger programs later in this book.
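In the meantime, here is a minimal sketch of the point, assuming the XML::Parser module from CPAN is installed. The sample document and the variable names are ours, made up for illustration. The same XML is handed to the parser once as an in-memory string (standing in for something received over a network) and once as a file on disk, and the resulting trees are compared.

```perl
use strict;
use warnings;
use XML::Parser;
use File::Temp qw(tempfile);
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order, so dumps can be compared

# A made-up sample document.
my $xml = '<note to="camel"><body>Hello</body></note>';

# The Tree style makes parse() return a nested array describing the document.
my $parser = XML::Parser->new(Style => 'Tree');

# Source 1: a string already in memory (say, read from a socket).
my $from_string = $parser->parse($xml);

# Source 2: the same text written to a temporary file on disk.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} $xml;
close $fh or die "Cannot close $path: $!";
my $from_file = $parser->parsefile($path);

# The parser has no idea which route the document took.
my $same = Dumper($from_string) eq Dumper($from_file);
print $same ? "identical\n" : "different\n";
```

Only the code that fetches the document knows about strings, files, or sockets. Everything after the call to the parser works on the same structure either way.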
Structurally, all XML documents are similar.
No matter why or how they were put together or to what purpose they'll be applied, all XML documents must follow the same basic rules of well-formedness: exactly one root element, no overlapping elements, all attributes quoted, and so on. Every XML processor's parser component will, at its core, need to do the same things as every other XML processor. This, in turn, means that all these processors can share a common base. Perl XML-processing programs usually take advantage of this by using one of the many free parsing modules rather than reimplementing basic XML parsing procedures every time.
Furthermore, the one-document, one-element nature of XML makes processing a pleasantly fractal experience, as any document invoked through an external entity by another document magically becomes "just another element" within the invoker, and the same code that crawled the first document can skitter into the meat of any reference (and anything to which the reference might refer) without batting an eye.
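To see the well-formedness rules listed above in action, here is a small sketch that again assumes XML::Parser is installed. The helper name is_well_formed and the sample documents are our own inventions. It relies on the fact that XML::Parser dies when it meets a document that is not well-formed, so wrapping the parse in eval turns the parser into a yes-or-no checker.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: true if the parser accepts the document, false if it dies.
sub is_well_formed {
    my ($xml) = @_;
    my $parser = XML::Parser->new;    # no handlers; we only care about errors
    return eval { $parser->parse($xml); 1 } ? 1 : 0;
}

# One good document and one violation of each rule named in the text.
my @samples = (
    [ 'well-formed'          => '<strip title="Camel Days"><panel>Hi</panel></strip>' ],
    [ 'two root elements'    => '<strip/><strip/>' ],
    [ 'overlapping elements' => '<strip><panel></strip></panel>' ],
    [ 'unquoted attribute'   => '<strip title=Camel/>' ],
);

for my $sample (@samples) {
    my ($label, $xml) = @$sample;
    printf "%-22s %s\n", $label, is_well_formed($xml) ? 'accepted' : 'rejected';
}
```

Only the first sample should be reported as accepted. Note that the checker knows nothing about comic strips; it applies the same structural rules to every document, which is exactly why one parsing module can serve so many programs.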
In meaning, all XML applications are different.
An XML application is the raison d'être of any one XML document: the higher-level set of rules the document follows in order to serve some useful purpose, be it filling out a configuration file, preparing a network transmission, or describing a comic strip. XML applications exist not only to bless humble documents with a higher sense of purpose, but also to require that those documents be written according to a given application specification.
DTDs help enforce the consistency of this structure. However, you don't have to have a formal validation scheme to make an application. You may want to create some validation rules, though, if you need to make sure that your successors (including yourself, two weeks in the future) do not stray from the path you had in mind when they make changes to the program. You should also create a validation scheme if you want to allow others to write programs that generate the same flavor of XML.
Most of the XML hacking you'll accomplish will capitalize on this document/application duality. In most cases, your software will consist of parts that cover all three of these facts:
It will accept input in an appropriate way -- listening to a network socket, for example, or reading a file from disk. This behavior is very ordinary and Perlish: do whatever's necessary here to get that data.
It will pass captured input to some kind of XML processor. Dollars to doughnuts says you'll use one of the parsers that other people in the Perl community have already written and continue to maintain, such as XML::Simple, or the more sophisticated modules we'll discuss later.
Finally, it will Do Something with whatever that processor did to the XML. Maybe it will output more XML (or HTML), update a database, or send mail to your mom. This is the defining point of your XML application -- it takes the XML and does something meaningful with it. While we won't cover the infinite possibilities here, we will discuss the crucial ties between the XML processor and the rest of your program.
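The three parts just described can be sketched in a few dozen lines. This is only an illustration under our own assumptions: it uses XML::Parser (which must be installed from CPAN), a made-up configuration format with option elements, and hypothetical subroutine names (fetch_input, extract_options, report). An in-memory filehandle stands in for the socket or disk file a real program would read.

```perl
use strict;
use warnings;
use XML::Parser;

# Part 1: accept input. Any readable filehandle will do.
sub fetch_input {
    my ($fh) = @_;
    local $/;                  # slurp mode: read the whole stream at once
    return scalar <$fh>;
}

# Part 2: hand the captured text to an XML processor.
# The Start handler fires once for each opening tag.
sub extract_options {
    my ($xml) = @_;
    my %options;
    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $element, %attr) = @_;
                $options{ $attr{name} } = $attr{value}
                    if $element eq 'option';
            },
        },
    );
    $parser->parse($xml);      # dies if the document is not well-formed
    return \%options;
}

# Part 3: Do Something with the result. Here, format a plain-text report.
sub report {
    my ($options) = @_;
    return join '', map { "$_ = $options->{$_}\n" } sort keys %$options;
}

# A made-up configuration document.
my $document = <<'XML';
<config>
  <option name="color" value="blue"/>
  <option name="size" value="large"/>
</config>
XML

open my $fh, '<', \$document or die "Cannot open in-memory handle: $!";
my $xml = fetch_input($fh);
close $fh;

print report(extract_options($xml));
```

Swapping the first part for a network listener, or the third for a database update, would leave the middle untouched. That separation is the main reason to keep the three parts distinct.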
Copyright © 2002 O'Reilly & Associates. All rights reserved.