Perl & XML

6.4. XML::SimpleObject

Using built-in data types is fine, but as your code becomes more complex and hard to read, you may start to pine for the neater interfaces of objects. Doing things like testing a node's type, getting the last child of an element, or changing the representation of data without breaking the rest of the program is easier with objects. It's not surprising that there are more object-oriented modules for XML than you can shake a stick at.

Dan Brian's XML::SimpleObject starts the tour of object models for XML trees. It takes the structure returned by XML::Parser in tree mode and changes it from a hierarchy of lists into a hierarchy of objects. Each object represents an element and provides methods to access its children. As with XML::Simple, elements are accessed by their names, passed as arguments to the methods.
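
To make "a hierarchy of lists" concrete, here is a minimal sketch of what XML::Parser hands back in tree mode before XML::SimpleObject wraps it in objects. The sample string and the variable names are ours, made up for illustration; the nesting it prints is the raw structure you would otherwise have to pick apart by hand.

```perl
use strict;
use warnings;
use XML::Parser;
use Data::Dumper;

# a tiny, whitespace-free sample document (made up for this illustration)
my $xml = '<ancestor><name>Glook</name></ancestor>';

my $parser = XML::Parser->new( Style => 'Tree' );
my $raw    = $parser->parse( $xml );

# Tree mode returns a two-item list: the root element's name, then its
# content. Content is a list that starts with an attribute hash, followed
# by tag/content pairs; the pseudo-tag 0 marks character data.
my( $root_name, $content )               = @$raw;
my( $attrs, $child_tag, $child_content ) = @$content;

print "root element: $root_name\n";
print "first child:  $child_tag\n";
print "its text:     $child_content->[2]\n";

# dump the whole structure to see the nesting
local $Data::Dumper::Indent = 1;
print Dumper( $raw );
```

Reaching the text of one element already takes two levels of positional indexing, which is the bookkeeping the object interface hides.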

Let's see how useful this module is. Example 6-5 is a silly data file representing a genealogical tree. We're going to write a program that parses this file into an object tree and then traverses the tree to print out a text description.

Example 6-5. A genealogical tree

<ancestry>
  <ancestor><name>Glook the Magnificent</name>
    <children>
      <ancestor><name>Glimshaw the Brave</name></ancestor>
      <ancestor><name>Gelbar the Strong</name></ancestor>
      <ancestor><name>Glurko the Healthy</name>
        <children>
          <ancestor><name>Glurff the Sturdy</name></ancestor>
          <ancestor><name>Glug the Strange</name>
            <children>
              <ancestor><name>Blug the Insane</name></ancestor>
              <ancestor><name>Flug the Disturbed</name></ancestor>
            </children>
          </ancestor>
        </children>
      </ancestor>
    </children>
  </ancestor>
</ancestry>

Example 6-6 is our program. It starts by parsing the file with XML::Parser in tree mode and passing the result to an XML::SimpleObject constructor. Next, we write a routine, begat( ), to traverse the tree and output text recursively. At each ancestor, it prints the name. To find out whether there are progeny, it tests that child( 'children' ) returns a defined value and that the resulting element has children of its own. If so, it descends the tree to process them too.

Example 6-6. An XML::SimpleObject program

use strict;
use warnings;
use XML::Parser;
use XML::SimpleObject;

# parse the data file and build a tree object
my $file = shift @ARGV or die "Usage: $0 FILE\n";
my $parser = XML::Parser->new( ErrorContext => 2, Style => "Tree" );
my $tree = XML::SimpleObject->new( $parser->parsefile( $file ));

# output a text description
print "My ancestry starts with ";
begat( $tree->child( 'ancestry' )->child( 'ancestor' ), '' );

# describe a generation of ancestry
sub begat {
    my( $anc, $indent ) = @_;

    # output the ancestor's name
    print $indent . $anc->child( 'name' )->value;

    # if there are children, recurse over them
    if( $anc->child( 'children' ) and $anc->child( 'children' )->children ) {
        print " who begat...\n";
        my @children = $anc->child( 'children' )->children;
        foreach my $child ( @children ) {
            begat( $child, $indent . '   ' );
        }
    } else {
        print "\n";
    }
}

To prove it works, here's the output. In the program, we added indentation to show the descent through generations:

My ancestry starts with Glook the Magnificent who begat...
   Glimshaw the Brave
   Gelbar the Strong
   Glurko the Healthy who begat...
      Glurff the Sturdy
      Glug the Strange who begat...
         Blug the Insane
         Flug the Disturbed

We used several different methods to access data in objects. child( ) returns a reference to an XML::SimpleObject object that represents a child of the source node. children( ) returns a list of such references. value( ) looks for a character data node inside the source node and returns a scalar value. Passing an element name to child( ) or children( ) restricts the search to the nodes that match it. For example, child( 'name' ) picks out the <name> element from a set of children. If the search fails, the method returns undef.
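
The three methods can be tried in isolation on a fragment of the genealogy. This is a small sketch rather than a complete program: the XML string is a cut-down copy of Example 6-5 written without whitespace between tags, and <spouse> is an element we invented only to show a failed search.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::SimpleObject;

# a cut-down piece of the genealogy, with no whitespace between tags
my $xml = '<ancestor><name>Glug the Strange</name>'
        . '<children>'
        . '<ancestor><name>Blug the Insane</name></ancestor>'
        . '<ancestor><name>Flug the Disturbed</name></ancestor>'
        . '</children></ancestor>';

my $parser = XML::Parser->new( ErrorContext => 2, Style => 'Tree' );
my $tree   = XML::SimpleObject->new( $parser->parse( $xml ));

# child( NAME ) gives one object; value( ) gives its character data
my $anc  = $tree->child( 'ancestor' );
my $name = $anc->child( 'name' )->value;
print "$name\n";

# children( NAME ) gives a list of objects with that element name
my @kids = $anc->child( 'children' )->children( 'ancestor' );
foreach my $kid ( @kids ) {
    my $kid_name = $kid->child( 'name' )->value;
    print "   $kid_name\n";
}

# a failed search gives undef, so test before chaining another call
# (<spouse> does not exist in the data; it is here only to fail)
my $spouse = $anc->child( 'spouse' );
print "no <spouse> element\n" unless defined $spouse;
```

Checking the result of child( ) before calling a method on it matters because chaining onto undef is a fatal error in Perl.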

This is a good start, but as the module's name suggests, it may be a little too simple for some applications. There are limited ways to access nodes, mostly by getting a child or list of children. Accessing a single element by name with child( ) also breaks down when several siblings share that name, since the name alone can't say which one you mean.
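
One consequence of the limited access methods is that anything beyond "get a child" has to be written by hand. As a rough sketch, here is a depth-first search for an ancestor by name. find_ancestor( ) is a hypothetical helper of our own, not part of XML::SimpleObject, and it assumes the element layout of Example 6-5.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::SimpleObject;

# a cut-down piece of the genealogy, with no whitespace between tags
my $xml = '<ancestry><ancestor><name>Glug the Strange</name>'
        . '<children>'
        . '<ancestor><name>Blug the Insane</name></ancestor>'
        . '<ancestor><name>Flug the Disturbed</name></ancestor>'
        . '</children></ancestor></ancestry>';

my $parser = XML::Parser->new( ErrorContext => 2, Style => 'Tree' );
my $tree   = XML::SimpleObject->new( $parser->parse( $xml ));

# find_ancestor( ) is a hypothetical helper, not a module method.
# It returns the first <ancestor> whose <name> text equals $wanted,
# searching depth first, or nothing if no ancestor matches.
sub find_ancestor {
    my( $anc, $wanted ) = @_;

    my $name = $anc->child( 'name' );
    return $anc if defined $name and $name->value eq $wanted;

    my $kids = $anc->child( 'children' );
    return unless defined $kids;

    # several siblings share the name 'ancestor', so take the whole list
    foreach my $kid ( $kids->children( 'ancestor' )) {
        my $found = find_ancestor( $kid, $wanted );
        return $found if defined $found;
    }
    return;
}

my $top   = $tree->child( 'ancestry' )->child( 'ancestor' );
my $found = find_ancestor( $top, 'Flug the Disturbed' );
my $none  = find_ancestor( $top, 'Nobody the Absent' );

print defined $found ? "found " . $found->child( 'name' )->value . "\n"
                     : "not found\n";
```

Modules with richer tree models typically offer this kind of search as a built-in query, which is the trade-off the simpler interface makes.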

Unfortunately, this module's objects lack a way to get XML back out, so outputting a document from this structure is not easy. Still, if simplicity is what you need, this module is an easy object-oriented solution to learn and use.



Copyright © 2002 O'Reilly & Associates. All rights reserved.