
Perl Cookbook

22.4. Making Simple Changes to Elements or Text

22.4.1. Problem

You want to filter some XML. For example, you want to make substitutions in the body of a document, add a price to every book described in an XML document, or change <book id="1"> to <book><id>1</id>.

22.4.2. Solution

Use the XML::SAX::Machines module from CPAN:

#!/usr/bin/perl -w

use MySAXFilter1;
use MySAXFilter2;
use XML::SAX::ParserFactory;
use XML::SAX::Machines qw(Pipeline);

my $machine = Pipeline(MySAXFilter1 => MySAXFilter2); # or more
$machine->parse_uri($FILENAME);

Write a handler that inherits from XML::SAX::Base, as in Recipe 22.3. Whenever you want to emit a SAX event, call the corresponding method in your superclass. For example:

$self->SUPER::start_element($tag_struct);
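
As an illustration of the first case in the Problem, here is a minimal sketch of a filter that makes a substitution in character data. The names ReplaceText and Collector, and the substitution itself, are invented for this example; only XML::SAX::Base and its Handler parameter come from the SAX modules. Events are fed in by hand rather than by a parser. Note that a parser may deliver one run of text as several characters events, so a pattern that straddles two events would be missed by this simple version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ReplaceText;    # hypothetical filter name
use base qw(XML::SAX::Base);

# Rewrite character data, then hand the event to whatever comes next.
sub characters {
    my ($self, $data) = @_;
    my %event = %$data;                  # copy so the caller's hash is untouched
    $event{Data} =~ s/\bPerl\b/PERL/g;   # illustrative substitution only
    $self->SUPER::characters(\%event);
}

package Collector;      # hypothetical end-of-chain handler for the demo
sub new        { return bless { text => '' }, shift }
sub characters { my ($self, $data) = @_; $self->{text} .= $data->{Data} }

package main;

my $collector = Collector->new;
my $filter    = ReplaceText->new(Handler => $collector);

# Drive the filter by hand; a real run would get events from a parser.
$filter->characters({ Data => 'Programming Perl' });
print $collector->{text}, "\n";    # prints "Programming PERL"
```

Events the filter does not override (start_element, end_element, and so on) fall through to the XML::SAX::Base versions, which forward them unchanged.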

22.4.3. Discussion

A SAX filter accepts SAX events and triggers new ones. The XML::SAX::Base module detects whether your handler object is being used as a filter. If so, the XML::SAX::Base methods pass the SAX events on to the next filter in the chain. If not, the XML::SAX::Base methods consume events but do not emit them. This makes it almost as simple to emit events as it is to consume them.

The XML::SAX::Machines module chains the filters for you. Import its Pipeline function, then say:

my $machine = Pipeline(Filter1 => Filter2 => Filter3 => Filter4);
$machine->parse_uri($FILENAME);

SAX events triggered by parsing the XML file go to Filter1, which sends possibly different events to Filter2, which in turn sends events to Filter3, and so on to Filter4. The last filter should print or otherwise do something with the incoming SAX events. If you pass a reference to a typeglob (such as \*STDOUT) as the final stage, XML::SAX::Machines writes the XML to the filehandle in that typeglob.
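
To make the event flow concrete, the sketch below wires two filters together by hand using the Handler parameter of XML::SAX::Base, which is conceptually the arrangement a pipeline sets up for you. The Shout, Bracket, and Collector classes are hypothetical and exist only for this demonstration, and the event is fed in directly instead of coming from a parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Shout;          # hypothetical first filter: uppercases text
use base qw(XML::SAX::Base);
sub characters {
    my ($self, $data) = @_;
    $self->SUPER::characters({ %$data, Data => uc $data->{Data} });
}

package Bracket;        # hypothetical second filter: wraps text in brackets
use base qw(XML::SAX::Base);
sub characters {
    my ($self, $data) = @_;
    $self->SUPER::characters({ %$data, Data => "[$data->{Data}]" });
}

package Collector;      # hypothetical final stage: remembers what it saw
sub new        { return bless { text => '' }, shift }
sub characters { my ($self, $data) = @_; $self->{text} .= $data->{Data} }

package main;

# Build the chain back to front: Shout -> Bracket -> Collector.
my $collector = Collector->new;
my $bracket   = Bracket->new(Handler => $collector);
my $shout     = Shout->new(Handler => $bracket);

$shout->characters({ Data => 'perl' });
print $collector->{text}, "\n";    # prints "[PERL]"
```

The text is uppercased before it is bracketed, which shows that each filter sees the events emitted by the one before it.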

Example 22-5 shows a filter that turns the id attribute in book elements from the XML document in Example 22-1 into a new id element. For example, <book id="1"> becomes <book><id>1</id>.

Example 22-5. filters-rewriteids

package RewriteIDs;
# RewriteIDs.pm -- turns "id" attributes into elements

use base qw(XML::SAX::Base);

my $ID_ATTRIB = "{}id";   # attribute hash key: "{NamespaceURI}LocalName", no namespace here

sub start_element {
    my ($self, $data) = @_;

    if ($data->{Name} eq 'book') {
        my $id = $data->{Attributes}{$ID_ATTRIB}{Value};
        delete $data->{Attributes}{$ID_ATTRIB};
        $self->SUPER::start_element($data);

        # make new element parameter data structure for the <id> tag
        my $id_node = {};
        %$id_node = %$data;          # copy the <book> event, not the filter object
        $id_node->{Name} = 'id';     # more complex if namespaces involved
        $id_node->{LocalName} = 'id';
        $id_node->{Attributes} = {};

        # build the <id>$id</id>
        $self->SUPER::start_element($id_node);
        $self->SUPER::characters({ Data => $id });
        $self->SUPER::end_element($id_node);
    } else {
        $self->SUPER::start_element($data);
    }
}

1;
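
The same pattern covers the "add a price to every book" case from the Problem. The following is a sketch only: the AddPrice and EventLog names are hypothetical, the price is a placeholder rather than real data, and the events are generated by hand so the order of emitted events is easy to see. It overrides end_element so that the new element appears just before each closing book tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package AddPrice;       # hypothetical filter name
use base qw(XML::SAX::Base);

my $PRICE = '0.00';     # placeholder value, not taken from any real data

# Emit <price>...</price> just before each </book>.
sub end_element {
    my ($self, $data) = @_;

    if ($data->{Name} eq 'book') {
        my $price_node = {
            Name         => 'price',
            LocalName    => 'price',
            Prefix       => '',
            NamespaceURI => '',
            Attributes   => {},
        };
        $self->SUPER::start_element($price_node);
        $self->SUPER::characters({ Data => $PRICE });
        $self->SUPER::end_element($price_node);
    }
    $self->SUPER::end_element($data);    # now close the original element
}

package EventLog;       # hypothetical handler that records event order
sub new { return bless { events => [] }, shift }
sub start_element { my ($self, $d) = @_; push @{ $self->{events} }, "start:$d->{Name}" }
sub characters    { my ($self, $d) = @_; push @{ $self->{events} }, "text:$d->{Data}" }
sub end_element   { my ($self, $d) = @_; push @{ $self->{events} }, "end:$d->{Name}" }

package main;

my $log    = EventLog->new;
my $filter = AddPrice->new(Handler => $log);
my $book   = { Name => 'book', LocalName => 'book', Prefix => '',
               NamespaceURI => '', Attributes => {} };

$filter->start_element($book);
$filter->end_element($book);
print join(' ', @{ $log->{events} }), "\n";
# prints "start:book start:price text:0.00 end:price end:book"
```

Calling SUPER::end_element inside the override does not recurse, because the superclass method forwards the event to the next handler instead of back into this filter.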

Example 22-6 is the driver program: it uses XML::SAX::Machines to build the pipeline that processes books.xml and prints the altered XML.

Example 22-6. filters-rewriteprog

#!/usr/bin/perl -w
# rewrite-ids -- call RewriteIDs SAX filter to turn id attrs into elements

use RewriteIDs;
use XML::SAX::Machines qw(:all);

my $machine = Pipeline(RewriteIDs => \*STDOUT);   # reference to the STDOUT typeglob
$machine->parse_uri("books.xml");

The output of Example 22-6 is as follows (truncated for brevity):

<book><id>1</id>
    <title>Programming Perl</title>
 ...
<book><id>2</id>
    <title>Perl &amp; LWP</title>
 ...

To save the XML to the file new-books.xml, use the XML::SAX::Writer module:

#!/usr/bin/perl -w

use RewriteIDs;
use XML::SAX::Machines qw(:all);
use XML::SAX::Writer;

my $writer = XML::SAX::Writer->new(Output => "new-books.xml");
my $machine = Pipeline(RewriteIDs => $writer);
$machine->parse_uri("books.xml");

The Output parameter also accepts a scalar reference, to have the XML appended to the scalar; an array reference, to have the XML appended to the array, one array element per SAX event; or a filehandle, to have the XML printed to that filehandle.
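
As a rough sketch of the scalar-reference form, the program below captures the generated XML in a variable. To keep it short it connects a parser from XML::SAX::ParserFactory directly to the writer instead of building a pipeline, and the input string is invented for the example; in the recipe's setting the same $writer would be the last stage passed to Pipeline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $xml    = '';                                        # output accumulates here
my $writer = XML::SAX::Writer->new(Output => \$xml);
my $parser = XML::SAX::ParserFactory->parser(Handler => $writer);

# Sample input invented for this sketch.
$parser->parse_string('<book id="1"><title>Programming Perl</title></book>');

print "captured ", length($xml), " characters\n";
print $xml, "\n";
```

Holding the result in a scalar is convenient when the XML needs further processing in the same program before it is written anywhere.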

22.4.4. See Also

The documentation for the modules XML::SAX::Machines and XML::SAX::Writer




Copyright © 2003 O'Reilly & Associates. All rights reserved.