You want to filter some XML. For example, you want to make substitutions in the body of a document, or add a price to every book described in an XML document, or you want to change <book id="1"> to <book><id>1</id>.
Use the XML::SAX::Machines module from CPAN:
#!/usr/bin/perl -w
use MySAXFilter1;
use MySAXFilter2;
use XML::SAX::ParserFactory;
use XML::SAX::Machines qw(Pipeline);
my $machine = Pipeline(MySAXFilter1 => MySAXFilter2);   # or more
$machine->parse_uri($FILENAME);
Write a handler that inherits from XML::SAX::Base, as in Recipe 22.3. Then, whenever you want to emit a SAX event, call the corresponding method in your superclass. For example:
$self->SUPER::start_element($tag_struct);
A SAX filter accepts SAX events and triggers new ones. The XML::SAX::Base module detects whether your handler object is called as a filter. If so, the XML::SAX::Base methods pass the SAX events on to the next filter in the chain. If your handler object is not called as a filter, then the XML::SAX::Base methods consume events but do not emit them. This makes it almost as simple to emit events as it is to consume them.
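As a minimal sketch of the pass-through behavior described above, the following filter overrides only characters and lets XML::SAX::Base forward every other event unchanged. The package name SubstituteText, the sample document, and the Perl-to-PERL substitution are hypothetical, chosen purely for illustration. It also assumes each text run arrives as a single characters event, which a parser is not obliged to guarantee.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package SubstituteText;          # hypothetical filter name
use base qw(XML::SAX::Base);

# Only text events are intercepted; all other events fall through
# to XML::SAX::Base, which hands them to the next handler.
sub characters {
    my ($self, $data) = @_;
    my %copy = %$data;           # work on a copy of the event data
    $copy{Data} =~ s/\bPerl\b/PERL/g;
    $self->SUPER::characters(\%copy);
}

package main;
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $output = '';
my $writer = XML::SAX::Writer->new(Output => \$output);   # collect XML in a scalar
my $filter = SubstituteText->new(Handler => $writer);     # filter feeds the writer
my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);

$parser->parse_string('<title>Programming Perl</title>');
print $output, "\n";
```

Here the chain is wired by hand (parser, then filter, then writer) so that the role of the Handler argument is visible.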
The XML::SAX::Machines module chains the filters for you. Import its Pipeline function, then say:
my $machine = Pipeline(Filter1 => Filter2 => Filter3 => Filter4);
$machine->parse_uri($FILENAME);
SAX events triggered by parsing the XML file go to Filter1, which sends possibly different events to Filter2, which in turn sends events to Filter3, and so on to Filter4. The last filter should print or otherwise do something with the incoming SAX events. If you pass a reference to a typeglob, XML::SAX::Machines writes the XML to the filehandle in that typeglob.
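The flow of events through a chain can be sketched with two small filters defined in a single file. The names UpcaseText and RenameBook and the sample document are hypothetical. The sketch passes filter objects rather than class names to Pipeline, on the assumption that the classes live in the script itself and so cannot be loaded by name, and it ends the chain with an explicit XML::SAX::Writer object.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package UpcaseText;              # hypothetical first filter
use base qw(XML::SAX::Base);

# Upper-case all character data, then pass the event along.
sub characters {
    my ($self, $data) = @_;
    $self->SUPER::characters({ %$data, Data => uc $data->{Data} });
}

package RenameBook;              # hypothetical second filter
use base qw(XML::SAX::Base);

# Rename <book> to <item> on both the start and end events.
sub start_element {
    my ($self, $data) = @_;
    $self->SUPER::start_element(_rename($data));
}

sub end_element {
    my ($self, $data) = @_;
    $self->SUPER::end_element(_rename($data));
}

sub _rename {
    my ($data) = @_;
    return $data unless $data->{Name} eq 'book';
    return { %$data, Name => 'item', LocalName => 'item' };
}

package main;
use XML::SAX::Machines qw(Pipeline);
use XML::SAX::Writer;

my $xml     = '';
my $writer  = XML::SAX::Writer->new(Output => \$xml);
my $machine = Pipeline(UpcaseText->new, RenameBook->new, $writer);

$machine->parse_string('<book><title>Perl</title></book>');
print $xml, "\n";
```

Events reach UpcaseText first, then RenameBook, and finally the writer, so the second filter sees the text already upper-cased.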
Example 22-5 shows a filter that turns the id attribute in book elements from the XML document in Example 22-1 into a new id element. For example, <book id="1"> becomes <book><id>1</id>.
package RewriteIDs;
# RewriteIDs.pm -- turns "id" attributes into elements

use base qw(XML::SAX::Base);

my $ID_ATTRIB = "{}id";   # the attribute hash entry we're interested in

sub start_element {
    my ($self, $data) = @_;

    if ($data->{Name} eq 'book') {
        my $id = $data->{Attributes}{$ID_ATTRIB}{Value};
        delete $data->{Attributes}{$ID_ATTRIB};
        $self->SUPER::start_element($data);

        # make new element parameter data structure for the <id> tag,
        # starting from a copy of the <book> element's data
        my $id_node = { };
        %$id_node = %$data;
        $id_node->{Name} = 'id';          # more complex if namespaces involved
        $id_node->{Attributes} = { };

        # build the <id>$id</id>
        $self->SUPER::start_element($id_node);
        $self->SUPER::characters({ Data => $id });
        $self->SUPER::end_element($id_node);
    } else {
        $self->SUPER::start_element($data);
    }
}

1;
Example 22-6 is the stub that uses XML::SAX::Machines to create the pipeline for processing books.xml and print the altered XML.
#!/usr/bin/perl -w
# rewrite-ids -- call RewriteIDs SAX filter to turn id attrs into elements
use RewriteIDs;
use XML::SAX::Machines qw(:all);
my $machine = Pipeline(RewriteIDs => \*STDOUT);
$machine->parse_uri("books.xml");
The output of Example 22-6 is as follows (truncated for brevity):
<book><id>1</id>
    <title>Programming Perl</title>
    ...
<book><id>2</id>
    <title>Perl &amp; LWP</title>
    ...
To save the XML to the file new-books.xml, use the XML::SAX::Writer module:
#!/usr/bin/perl -w
use RewriteIDs;
use XML::SAX::Machines qw(:all);
use XML::SAX::Writer;
my $writer = XML::SAX::Writer->new(Output => "new-books.xml");
my $machine = Pipeline(RewriteIDs => $writer);
$machine->parse_uri("books.xml");
You can also pass other values as the Output parameter: a scalar reference to have the XML appended to the scalar; an array reference to have the XML appended to the array, one array element per SAX event; or a filehandle to have the XML printed to that filehandle.
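The scalar and array destinations can be compared side by side with a short sketch. The sample document and the variable names are hypothetical, and no filter is involved: the parser feeds XML::SAX::Writer directly so that only the Output behavior is on display.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $doc = '<book><id>1</id></book>';    # hypothetical sample input

# Scalar reference: the serialized XML is appended to $string.
my $string = '';
my $string_writer = XML::SAX::Writer->new(Output => \$string);
XML::SAX::ParserFactory->parser(Handler => $string_writer)->parse_string($doc);

# Array reference: each SAX event contributes its own array element.
my @chunks;
my $array_writer = XML::SAX::Writer->new(Output => \@chunks);
XML::SAX::ParserFactory->parser(Handler => $array_writer)->parse_string($doc);

print "scalar: $string\n";
print "array holds ", scalar(@chunks), " pieces\n";
```

The array form may be convenient when you want to inspect or post-process the output event by event instead of as one string.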
Copyright © 2003 O'Reilly & Associates. All rights reserved.