The last object model we'll examine before jumping into standards-based solutions is Ken MacLeod's XML::Grove. Like XML::SimpleObject, it takes the XML::Parser output in tree mode and changes it into an object hierarchy. The difference is that each node type is represented by a different class. Therefore, an element would be mapped to XML::Grove::Element, a processing instruction to XML::Grove::PI, and so on. Text nodes are still scalar values.
Another feature of this module is that the declarations in the internal subset are captured in lists accessible through the XML::Grove object. Every entity or notation declaration is available for your perusal.
To see the object model in action, consider the following program, which counts the distribution of elements and other nodes, and then prints a list of node types and their frequency.
First, we initialize the parser with the style "grove" (to tell XML::Parser that it needs to use XML::Parser::Grove to process its output):
    use XML::Parser;
    use XML::Parser::Grove;
    use XML::Grove;

    my $parser = XML::Parser->new( Style => 'grove', NoExpand => '1' );
    my $grove = $parser->parsefile( shift @ARGV );
Next, we access the contents of the grove by calling the contents( ) method. This method returns a list including the root element and any comments or PIs outside of it. A subroutine called tabulate( ) counts nodes and descends recursively through the tree. Finally, the results are printed:
    # tabulate elements and other nodes
    my %dist;
    foreach( @{$grove->contents} ) {
        &tabulate( $_, \%dist );
    }
    print "\nNODES:\n\n";
    foreach( sort keys %dist ) {
        print "$_: " . $dist{$_} . "\n";
    }
Here is the subroutine that handles each node in the tree. Since each node type has its own class, we can use ref( ) to identify it. A text node is a plain scalar, for which ref( ) returns an empty string, so it falls through to the final else branch. Attributes are not treated as nodes in this model, but are available through the element class's method attributes( ) as a hash. The call to contents( ) allows the routine to continue processing the element's children:
    # given a node and a table, find out what the node is, add to the count,
    # and recurse if necessary
    #
    sub tabulate {
        my( $node, $table ) = @_;

        my $type = ref( $node );
        if( $type eq 'XML::Grove::Element' ) {
            $table->{ 'element' }++;
            $table->{ 'element (' . $node->name . ')' }++;
            foreach( keys %{$node->attributes} ) {
                $table->{ "attribute ($_)" }++;
            }
            foreach( @{$node->contents} ) {
                &tabulate( $_, $table );
            }
        } elsif( $type eq 'XML::Grove::Entity' ) {
            $table->{ 'entity-ref (' . $node->name . ')' }++;
        } elsif( $type eq 'XML::Grove::PI' ) {
            $table->{ 'PI (' . $node->target . ')' }++;
        } elsif( $type eq 'XML::Grove::Comment' ) {
            $table->{ 'comment' }++;
        } else {
            $table->{ 'text-node' }++
        }
    }
Here's a typical result, when run on an XML datafile:
    NODES:

    PI (a): 1
    attribute (date): 1
    attribute (style): 12
    attribute (type): 2
    element: 30
    element (category): 2
    element (inventory): 1
    element (item): 6
    element (location): 6
    element (name): 12
    element (note): 3
    text-node: 100
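If XML::Grove is not installed, the same dispatch-on-ref( ) technique can be tried out in isolation. The following is only a minimal sketch: the packages My::Element and My::Comment are hypothetical stand-ins invented for this illustration, not part of XML::Grove, and the tree is built by hand rather than parsed. The sketch mirrors the idea described above, in which each node type has its own class and text nodes remain plain scalars.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in classes for illustration only;
# these are NOT the real XML::Grove classes.
package My::Element;
sub new {
    my ( $class, %args ) = @_;
    return bless {
        name       => $args{name},
        attributes => $args{attributes} || {},
        contents   => $args{contents}   || [],
    }, $class;
}
sub name       { $_[0]{name} }
sub attributes { $_[0]{attributes} }
sub contents   { $_[0]{contents} }

package My::Comment;
sub new { my ( $class, $text ) = @_; return bless { data => $text }, $class }

package main;

# Count a node by type, then recurse into an element's children.
sub tabulate {
    my ( $node, $table ) = @_;
    my $type = ref $node;    # empty string for a plain scalar (text node)
    if ( $type eq 'My::Element' ) {
        $table->{ 'element (' . $node->name . ')' }++;
        $table->{"attribute ($_)"}++ for keys %{ $node->attributes };
        tabulate( $_, $table ) for @{ $node->contents };
    }
    elsif ( $type eq 'My::Comment' ) {
        $table->{comment}++;
    }
    else {
        $table->{'text-node'}++;
    }
}

# Hand-built tree standing in for:
#   <item type="tool"><name>hammer</name><!-- check stock --> heavy</item>
my $tree = My::Element->new(
    name       => 'item',
    attributes => { type => 'tool' },
    contents   => [
        My::Element->new( name => 'name', contents => ['hammer'] ),
        My::Comment->new('check stock'),
        ' heavy',
    ],
);

my %dist;
tabulate( $tree, \%dist );
print "$_: $dist{$_}\n" for sort keys %dist;
```

Run as written, this prints one line each for attribute (type), comment, element (item), and element (name), each with a count of 1, followed by text-node: 2. The two text nodes are the strings 'hammer' and ' heavy', which reach the final else branch because ref( ) returns an empty string for unblessed scalars.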
Copyright © 2002 O'Reilly & Associates. All rights reserved.