You want to use the Document Object Model (DOM) to access and perhaps change the parse tree of an XML file.
Use the XML::LibXML module from CPAN:
    use XML::LibXML;
    my $parser = XML::LibXML->new( );
    my $dom    = $parser->parse_string($XML);
    # or
    my $dom    = $parser->parse_file($FILENAME);
    my $root   = $dom->getDocumentElement;
DOM is a framework of classes for representing XML parse trees. Each element is a node in the tree, and you can perform operations on it such as finding its child nodes (the XML elements in this case), adding another child node, and moving the node somewhere else in the tree. The parse_string, parse_file, and parse_fh (filehandle) methods all return a DOM document object that you can use to find nodes in the tree.
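As a minimal sketch of that navigation, the snippet below parses a small inline document and walks from the root to its element children and back to their parent. The sample XML and the variable names are hypothetical and stand in for a real file.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document, not the books file from Example 22-1.
my $xml = '<library>'
        . '<book id="1"><title>First</title></book>'
        . '<book id="2"><title>Second</title></book>'
        . '</library>';

my $dom  = XML::LibXML->new->parse_string($xml);
my $root = $dom->documentElement;

# childNodes returns every child node; keep only the element nodes.
my @children = grep { $_->nodeType == XML_ELEMENT_NODE } $root->childNodes;

foreach my $child (@children) {
    # Each node knows its own name, its attributes, and its parent.
    printf "%s id=%s parent=%s\n",
        $child->nodeName,
        $child->getAttribute("id"),
        $child->parentNode->nodeName;
}
```

With whitespace between elements, childNodes would also return text nodes, which is why the sketch filters on the node type instead of assuming every child is an element.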
For example, given the books XML from Example 22-1, Example 22-2 shows one way to print the titles.
    #!/usr/bin/perl -w
    # dom-titledumper -- display titles in books file using DOM

    use XML::LibXML;
    use Data::Dumper;
    use strict;

    my $parser = XML::LibXML->new;
    my $dom = $parser->parse_file("books.xml") or die;

    # get all the title elements
    my @titles = $dom->getElementsByTagName("title");
    foreach my $t (@titles) {
        # get the text node inside the <title> element, and print its value
        print $t->firstChild->data, "\n";
    }
The getElementsByTagName method returns a list of the element nodes in the document that have the specified tag name. Here we get a list of the title elements, then go through each title to find its contents. We know that each title has only a single piece of text, so we assume the first child node is text and print its contents.
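When that assumption might not hold, one alternative is the textContent method, which concatenates the text of all descendants. The sketch below contrasts the two approaches on a hypothetical title that contains nested markup; the sample XML is invented for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical title with a nested <em> element.
my $dom = XML::LibXML->new->parse_string(
    '<book><title>Programming <em>Perl</em>, 3rd Edition</title></book>'
);

# List assignment keeps the first matching element.
my ($title) = $dom->getElementsByTagName("title");

# firstChild is only the leading text node, so the nested text is missed.
my $first = $title->firstChild->data;

# textContent gathers the text of every descendant node.
my $all = $title->textContent;

print "firstChild:  '$first'\n";   # firstChild:  'Programming '
print "textContent: '$all'\n";     # textContent: 'Programming Perl, 3rd Edition'
```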
If we wanted to confirm that the node was a text node, we could say:
die "the title contained something other than text!" if $t->firstChild->nodeType != 3;
This ensures that the first node is of type 3 (text). Table 22-1 shows LibXML's numeric node types, which the nodeType method returns.
Node type | Number |
---|---|
Element | 1 |
Attribute | 2 |
Text | 3 |
CDATA Section | 4 |
Entity Ref | 5 |
Entity | 6 |
Processing Instruction | 7 |
Comment | 8 |
Document | 9 |
Document Type | 10 |
Document Fragment | 11 |
Notation | 12 |
HTML Document | 13 |
DTD Node | 14 |
Element Decl | 15 |
Attribute Decl | 16 |
Entity Decl | 17 |
Namespace Decl | 18 |
XInclude Start | 19 |
XInclude End | 20 |
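Comparing against a bare number works, but XML::LibXML also provides named constants for these values, such as XML_TEXT_NODE and XML_COMMENT_NODE, which tend to read better. The following sketch classifies the children of a hypothetical title element; the sample XML and the @kinds array are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);   # node type constants such as XML_TEXT_NODE

# Hypothetical element holding a comment followed by text.
my $dom = XML::LibXML->new->parse_string(
    '<title><!-- draft -->Perl Cookbook</title>'
);
my $title = $dom->documentElement;

my @kinds;
foreach my $node ($title->childNodes) {
    if    ($node->nodeType == XML_TEXT_NODE)    { push @kinds, "text" }
    elsif ($node->nodeType == XML_COMMENT_NODE) { push @kinds, "comment" }
    else                                        { push @kinds, "other" }
}

print join(", ", @kinds), "\n";   # comment, text
```

Here the first child is a comment, so a check of firstChild alone against the text node type would fail for this title.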
You can also create and insert new nodes, or move and delete existing ones, to change the parse tree. Example 22-3 shows how you would add a randomly generated price value to each book element.
    #!/usr/bin/perl -w
    # dom-addprice -- add price element to books

    use XML::LibXML;
    use Data::Dumper;
    use strict;

    my $parser = XML::LibXML->new;
    my $dom = $parser->parse_file("books.xml") or die;
    my $root = $dom->documentElement;

    # get list of all the "book" elements
    my @books = $root->getElementsByTagName("book");

    foreach my $book (@books) {
        my $price = sprintf("\$%d.95", 19 + 5 * int rand 5);  # random price
        my $price_text_node = $dom->createTextNode($price);   # contents of <price>
        my $price_element = $dom->createElement("price");     # create <price>
        $price_element->appendChild($price_text_node);        # put contents into <price>
        $book->appendChild($price_element);                   # put <price> into <book>
    }

    print $dom->toString;
We use createTextNode and createElement to build the new price element and its contents. Then we use appendChild to append the element to the end of the current book element's existing contents. The toString method emits a document as XML, which lets you easily write XML filters like this one using DOM.
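Moving and deleting nodes follow the same pattern. The sketch below, which uses a hypothetical inline document with an invented draft element, deletes nodes with removeChild and reorders two book elements with insertBefore. It relies on the behavior that inserting a node already in the tree moves it instead of copying it.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document; <draft/> is an invented marker element.
my $dom = XML::LibXML->new->parse_string(
    '<books>'
  . '<book><title>A</title><draft/></book>'
  . '<book><title>B</title></book>'
  . '</books>'
);
my $root  = $dom->documentElement;
my @books = $root->getElementsByTagName("book");

# Delete: detach every <draft> element from its parent.
foreach my $draft ($root->getElementsByTagName("draft")) {
    $draft->parentNode->removeChild($draft);
}

# Move: put the second book in front of the first.
$root->insertBefore($books[1], $books[0]);

print $dom->toString;
```

After these steps the document should hold the B book first and the A book second, with no draft element left.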
The XML::LibXML::DOM manpage gives a quick introduction to the features of XML::LibXML's DOM support and references the manpages for the DOM classes (e.g., XML::LibXML::Node). Those manpages list the methods for the objects.
The documentation for the XML::LibXML::DOM, XML::LibXML::Document, XML::LibXML::Element, and XML::LibXML::Node modules
Copyright © 2003 O'Reilly & Associates. All rights reserved.