Perl Cookbook

22.1. Parsing XML into Data Structures

22.1.1. Problem

You want a Perl data structure (a combination of hashes and arrays) that corresponds to the structure and content of an XML file. For example, you have XML representing a configuration file, and you'd like to say $xml->{config}{server}{hostname} to access the contents of <config><server><hostname>...</hostname>.

22.1.2. Solution

Use the XML::Simple module from CPAN. If your XML is in a file, pass the filename to XMLin:

use XML::Simple;
$ref = XMLin($FILENAME, ForceArray => 1);

If your XML is in a string, pass the string to XMLin:

use XML::Simple;
$ref = XMLin($STRING, ForceArray => 1);
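
As a minimal sketch of the string form, the following parses a small, made-up configuration document (the element names, hostname, and port are illustrative assumptions, not taken from a real file). XMLin discards the root element by default, and ForceArray => 1 wraps every nested element in an array, so the hostname is reached through index 0 at each level:

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical configuration document, for illustration only.
my $STRING = <<'XML';
<config>
  <server>
    <hostname>www.example.com</hostname>
    <port>8080</port>
  </server>
</config>
XML

my $ref = XMLin($STRING, ForceArray => 1);

# The root element (config) does not appear as a key unless
# KeepRoot => 1 is passed to XMLin.
my $hostname = $ref->{server}[0]{hostname}[0];
print "hostname: $hostname\n";
```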

22.1.3. Discussion

Here's the data structure that XML::Simple produces from the XML in Example 22-1:

{
  'book' => {
    '1' => {
      'authors' => [
        {
          'author' => [
            {
              'firstname' => [ 'Larry' ],
              'lastname'  => [ 'Wall'  ]
            },
            {
              'firstname' => [ 'Tom' ],
              'lastname'  => [ 'Christiansen' ]
            },
            {
              'firstname' => [ 'Jon' ],
              'lastname'  => [ 'Orwant' ]
            }
          ]
        }
      ],
      'edition' => [ '3' ],
      'title'   => [ 'Programming Perl' ],
      'isbn'    => [ '0-596-00027-8' ]
    },
    '2' => {
      'authors' => [
        {
          'author' => [
            {
              'firstname' => [ 'Sean' ],
              'lastname'  => [ 'Burke' ]
            }
          ]
        }
      ],
      'edition' => [ '1' ],
      'title'   => [ 'Perl & LWP' ],
      'isbn'    => [ '0-596-00178-9' ]
    },
    '3' => {
      'authors' => [ {  } ],
      'edition' => [ '1' ],
      'title'   => [ 'Anonymous Perl' ],
      'isbn'    => [ '0-555-00178-0' ]
    },
  }
}

The basic function of XML::Simple is to turn an element that contains other elements into a hash. If a containing element holds several identically named elements (the book elements here, for example), they become an array of hashes unless XML::Simple knows they are uniquely identified by an attribute (as happens here with the id attribute), in which case they become a hash keyed on that attribute's value.
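
To make the shape concrete, here is a sketch that walks a structure like the one shown above and collects each book's authors. The XML is an assumption reconstructed from that data structure (trimmed to two books); the real Example 22-1 may be laid out differently. The %authors_of hash is a hypothetical helper introduced only for this illustration:

```perl
use strict;
use warnings;
use XML::Simple;

# Assumed input, reconstructed from the data structure shown above.
my $xml = <<'XML';
<books>
  <book id="1">
    <title>Programming Perl</title>
    <edition>3</edition>
    <authors>
      <author><firstname>Larry</firstname><lastname>Wall</lastname></author>
      <author><firstname>Tom</firstname><lastname>Christiansen</lastname></author>
      <author><firstname>Jon</firstname><lastname>Orwant</lastname></author>
    </authors>
    <isbn>0-596-00027-8</isbn>
  </book>
  <book id="3">
    <title>Anonymous Perl</title>
    <edition>1</edition>
    <authors />
    <isbn>0-555-00178-0</isbn>
  </book>
</books>
XML

my $ref = XMLin($xml, ForceArray => 1);

my %authors_of;    # title => array of "First Last" strings
for my $id (sort keys %{ $ref->{book} }) {
    my $book  = $ref->{book}{$id};      # books are keyed on their id attribute
    my $title = $book->{title}[0];      # ForceArray: every element is an array

    # An empty <authors/> element has no author key, so fall back to an empty list.
    my $authors = $book->{authors}[0]{author} || [];
    $authors_of{$title} =
        [ map { "$_->{firstname}[0] $_->{lastname}[0]" } @$authors ];

    my $names = join(", ", @{ $authors_of{$title} }) || "(no authors)";
    print "$title: $names\n";
}
```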

By default, XML::Simple assumes that if an element has an attribute called id, name, or key, then that attribute is a unique identifier for the element. This is controlled by the KeyAttr option to the XMLin function. For example, set KeyAttr to an empty list to disable this conversion from arrays of elements to a hash by attribute:

$ref = XMLin($xml, ForceArray => 1, KeyAttr => [  ]);
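
A short sketch of the effect, using a cut-down, assumed version of the book list: with KeyAttr set to an empty list, the book elements stay in an array in document order, and id remains in each hash as an ordinary scalar attribute value.

```perl
use strict;
use warnings;
use XML::Simple;

# Assumed, abbreviated input for illustration.
my $xml = <<'XML';
<books>
  <book id="1"><title>Programming Perl</title></book>
  <book id="2"><title>Perl &amp; LWP</title></book>
</books>
XML

my $ref = XMLin($xml, ForceArray => 1, KeyAttr => []);

# $ref->{book} is an array reference here, not a hash keyed on id.
for my $book (@{ $ref->{book} }) {
    print "$book->{id}: $book->{title}[0]\n";
}
```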

For more fine-grained control, specify a hash that maps the element name to the attribute that holds a unique identifier. For example, to create a hash on the id attribute of book elements and no others, say:

$ref = XMLin($xml, ForceArray => 1, KeyAttr => { book => "id" });
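
The following sketch shows the hash form at work on an assumed document in which two different elements carry an id attribute (the catalog and publisher elements are hypothetical, added only to show the contrast). Only book is folded into a hash; publisher stays an array and keeps its id:

```perl
use strict;
use warnings;
use XML::Simple;

# Assumed input: both book and publisher have an id attribute.
my $xml = <<'XML';
<catalog>
  <book id="1"><title>Programming Perl</title></book>
  <book id="2"><title>Perl &amp; LWP</title></book>
  <publisher id="ora"><city>Sebastopol</city></publisher>
</catalog>
XML

my $ref = XMLin($xml, ForceArray => 1, KeyAttr => { book => "id" });

# book elements are keyed on id ...
print "Book 2: $ref->{book}{2}{title}[0]\n";

# ... but publisher is not named in KeyAttr, so it remains an array.
print "Publisher: $ref->{publisher}[0]{id}\n";
```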

The ForceArray option is what creates all of those one-element arrays in the data structure. Without it, XML::Simple collapses each one-element array into its single member, so book 3 looks like this:

'3' => {
  'authors' => {  },
  'edition' => '1',
  'title' => 'Anonymous Perl',
  'isbn' => '0-555-00178-0'
},

Although this format is easier to read, it's also harder to program for. If you know that no element repeats, you can leave ForceArray off. But if some elements repeat and some don't, you need ForceArray to ensure a consistent data structure. Having the data sometimes directly available, sometimes inside an array, complicates the code.
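
The inconsistency is easy to demonstrate. In this sketch (the two one-line documents are made up for illustration), the same author element comes back as a plain string when it occurs once and as an array when it repeats, so code that has to handle both needs an extra ref check. With ForceArray => 1 the check disappears:

```perl
use strict;
use warnings;
use XML::Simple;

# Two hypothetical documents that differ only in how often author occurs.
my $one = '<book><author>Sean Burke</author></book>';
my $two = '<book><author>Larry Wall</author><author>Tom Christiansen</author></book>';

# Without ForceArray, the type of {author} depends on the input.
my $loose_one = XMLin($one);    # author is a plain string
my $loose_two = XMLin($two);    # author is an array reference

# Code that must cope with both shapes has to test with ref().
my @names = ref($loose_one->{author})
          ? @{ $loose_one->{author} }
          : ( $loose_one->{author} );

# With ForceArray, author is always an array reference.
my $forced_one = XMLin($one, ForceArray => 1);
my $forced_two = XMLin($two, ForceArray => 1);
print "$_\n" for @{ $forced_one->{author} }, @{ $forced_two->{author} };
```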

The XML::Simple module has further options that control the data structure built from the XML; read the module's manpage for details. Be aware that XML::Simple is really useful only for highly structured data, like the kind used in configuration files. It's awkward to use with XML that represents documents rather than data structures, and it doesn't let you work with XML features like processing instructions or comments. For all but the simplest XML, we recommend that you use a DOM or SAX parser instead.

22.1.4. See Also

The documentation for the CPAN module XML::Simple; Recipe 22.10




Copyright © 2003 O'Reilly & Associates. All rights reserved.