
Learning Perl Objects, References & Modules

Chapter 5. Manipulating Complex Data Structures

Contents:

Using the Debugger to View Complex Data
Viewing Complex Data with Data::Dumper
Storing Complex Data with Storable
The map and grep Operators
Using map
Applying a Bit of Indirection
Selecting and Altering Complex Data
Exercises

Now that you've seen the basics of references, let's look at additional ways to manipulate complex data. We'll start by using the debugger to examine complex data structures and then use Data::Dumper to show the data under programmatic control. Next, you'll learn to store and retrieve complex data easily and quickly using Storable, and finally we'll wrap up with a review of grep and map and see how they apply to complex data.

5.1. Using the Debugger to View Complex Data

The Perl debugger can display complex data easily. For example, let's single-step through one version of the byte-counting program from Chapter 4:

my %total_bytes;
while (<>) {
  my ($source, $destination, $bytes) = split;
  $total_bytes{$source}{$destination} += $bytes;
}
for my $source (sort keys %total_bytes) {
  for my $destination (sort keys %{ $total_bytes{$source} }) {
    print "$source => $destination:",
     " $total_bytes{$source}{$destination} bytes\n";
  }
  print "\n";
}

Here's the data you'll use to test it:

professor.hut gilligan.crew.hut 1250
professor.hut lovey.howell.hut 910
thurston.howell.hut lovey.howell.hut 1250
professor.hut lovey.howell.hut 450
ginger.girl.hut professor.hut 1218
ginger.girl.hut maryann.girl.hut 199

You can run the program under the debugger in a number of ways. One of the easiest is to invoke Perl with the -d switch on the command line:

myhost% perl -d bytecounts bytecounts-in

Loading DB routines from perl5db.pl version 1.19
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(bytecounts:2):        my %total_bytes;
  DB<1> s
main::(bytecounts:3):        while (<>) {
  DB<1> s
main::(bytecounts:4):          my ($source, $destination, $bytes) = split;
  DB<1> s
main::(bytecounts:5):          $total_bytes{$source}{$destination} += $bytes;
  DB<1> x $source, $destination, $bytes
0  'professor.hut'
1  'gilligan.crew.hut'
2  1250

If you're playing along at home, be aware that each new release of the debugger works differently than any other, so your screen probably won't look exactly like this. Also, if you get stuck at any time, type h for help, or look at perldoc perldebug.

Each line of code is shown before it is executed. That means that, at this point, you're about to trigger autovivification, and your keys are already in place. The s command single-steps the program, while the x command dumps a list of values in a nice format. You can see that $source, $destination, and $bytes are correct, and now it's time to update the data:

  DB<2> s
main::(bytecounts:3):        while (<>) {

You've created the hash entries through autovivification. Let's see what you've got:

  DB<2> x \%total_bytes
0  HASH(0x132dc)
   'professor.hut' => HASH(0x37a34)
      'gilligan.crew.hut' => 1250

When x is given a hash reference, it dumps the entire contents of the hash, showing the key/value pairs. If any of the values are also hash references, they are dumped as well, recursively. What you'll see is that the %total_bytes hash has a single key of professor.hut, whose corresponding value is another hash reference. The referenced hash contains a single key of gilligan.crew.hut, with a value of 1250, as expected.
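If you'd like to see the same autovivification outside the debugger, here's a minimal sketch. It isn't part of the bytecounts program; the three values are typed in by hand to stand in for the first line of input.

```perl
use strict;
use warnings;

# A fresh, empty top-level hash, as in the byte-counting program.
my %total_bytes;

# Hand-typed stand-ins for the first line of the sample data.
my ($source, $destination, $bytes) =
  ('professor.hut', 'gilligan.crew.hut', 1250);

# No inner hash exists yet. Because the element is used as a hash
# reference, Perl creates one (autovivification), and += then adds
# to the new, undefined element without a warning.
$total_bytes{$source}{$destination} += $bytes;

print ref($total_bytes{$source}), "\n";           # prints HASH
print $total_bytes{$source}{$destination}, "\n";  # prints 1250
```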

Let's see what happens just after the next assignment:

  DB<3> s
main::(bytecounts:4):          my ($source, $destination, $bytes) = split;
  DB<3> s
main::(bytecounts:5):          $total_bytes{$source}{$destination} += $bytes;
  DB<3> x $source, $destination, $bytes
0  'professor.hut'
1  'lovey.howell.hut'
2  910
  DB<4> s
main::(bytecounts:3):        while (<>) {
  DB<4> x \%total_bytes
0  HASH(0x132dc)
   'professor.hut' => HASH(0x37a34)
      'gilligan.crew.hut' => 1250
      'lovey.howell.hut' => 910

Now you've added bytes flowing from professor.hut to lovey.howell.hut. The top-level hash hasn't changed, but the second-level hash has added a new entry. Let's continue:

  DB<5> s
main::(bytecounts:4):          my ($source, $destination, $bytes) = split;
  DB<6> s
main::(bytecounts:5):          $total_bytes{$source}{$destination} += $bytes;
  DB<6> x $source, $destination, $bytes
0  'thurston.howell.hut'
1  'lovey.howell.hut'
2  1250
  DB<7> s
main::(bytecounts:3):        while (<>) {
  DB<7> x \%total_bytes
0  HASH(0x132dc)
   'professor.hut' => HASH(0x37a34)
      'gilligan.crew.hut' => 1250
      'lovey.howell.hut' => 910
   'thurston.howell.hut' => HASH(0x2f9538)
      'lovey.howell.hut' => 1250

Ah, now it's getting interesting. A new entry in the top-level hash has a key of thurston.howell.hut, and a new hash reference, autovivified initially to an empty hash. Immediately after the new empty hash was put in place, a new key/value pair was added, indicating 1250 bytes transferred from thurston.howell.hut to lovey.howell.hut. Let's step some more:
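The debugger steps a whole statement at a time, so you never get to see that intermediate empty hash. One way to glimpse it, sketched here as a separate experiment rather than as part of bytecounts, is to look two levels deep with exists before assigning anything:

```perl
use strict;
use warnings;

my %total_bytes;

# Merely looking two levels deep autovivifies the first level: the
# inner lookup needs a hash to look in, so Perl creates an empty one.
my $found = exists $total_bytes{'thurston.howell.hut'}{'lovey.howell.hut'};

my $inner = $total_bytes{'thurston.howell.hut'};
print "inner entry exists: ", ($found ? "yes" : "no"), "\n";  # no
print "top-level keys: ", scalar(keys %total_bytes), "\n";    # 1
print "inner keys: ", scalar(keys %$inner), "\n";             # 0

# The update then fills the empty hash that is already in place.
$total_bytes{'thurston.howell.hut'}{'lovey.howell.hut'} += 1250;
print "inner keys now: ", scalar(keys %$inner), "\n";         # 1
```

With that in mind, back to the debugger session: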

  DB<8> s
main::(bytecounts:4):          my ($source, $destination, $bytes) = split;
  DB<8> s
main::(bytecounts:5):          $total_bytes{$source}{$destination} += $bytes;
  DB<8> x $source, $destination, $bytes
0  'professor.hut'
1  'lovey.howell.hut'
2  450
  DB<9> s
main::(bytecounts:3):        while (<>) {
  DB<9> x \%total_bytes
0  HASH(0x132dc)
   'professor.hut' => HASH(0x37a34)
      'gilligan.crew.hut' => 1250
      'lovey.howell.hut' => 1360
   'thurston.howell.hut' => HASH(0x2f9538)
      'lovey.howell.hut' => 1250

Now you're adding more bytes from professor.hut to lovey.howell.hut, reusing the existing hash element: 910 plus 450 gives 1360, and the inner hash is still the one at 0x37a34. Nothing too exciting there. Let's keep stepping:
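You can check that reuse in a few lines of standalone Perl. This is only a sketch with the two byte counts typed in by hand; the $before variable is a name made up for the illustration.

```perl
use strict;
use warnings;

my %total_bytes;
$total_bytes{'professor.hut'}{'lovey.howell.hut'} += 910;

# Remember which inner hash is in place before the second update.
my $before = $total_bytes{'professor.hut'};

$total_bytes{'professor.hut'}{'lovey.howell.hut'} += 450;

# Two references compare equal numerically when they point at the same hash.
print "same inner hash\n" if $before == $total_bytes{'professor.hut'};
print $total_bytes{'professor.hut'}{'lovey.howell.hut'}, "\n";  # prints 1360
```

Now, on with the stepping: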

  DB<10> s
main::(bytecounts:4):          my ($source, $destination, $bytes) = split;
  DB<10> s
main::(bytecounts:5):          $total_bytes{$source}{$destination} += $bytes;
  DB<10> x $source, $destination, $bytes
0  'ginger.girl.hut'
1  'professor.hut'
2  1218
  DB<11> s
main::(bytecounts:3):        while (<>) {
  DB<11> x \%total_bytes
0  HASH(0x132dc)
   'ginger.girl.hut' => HASH(0x297474)
      'professor.hut' => 1218
   'professor.hut' => HASH(0x37a34)
      'gilligan.crew.hut' => 1250
      'lovey.howell.hut' => 1360
   'thurston.howell.hut' => HASH(0x2f9538)
      'lovey.howell.hut' => 1250

This time, you added a new source, ginger.girl.hut. Notice that the top-level hash now has three elements, and each element has a different hash reference value. Let's step some more:

  DB<12> s
main::(bytecounts:4):          my ($source, $destination, $bytes) = split;
  DB<12> s
main::(bytecounts:5):          $total_bytes{$source}{$destination} += $bytes;
  DB<12> x $source, $destination, $bytes
0  'ginger.girl.hut'
1  'maryann.girl.hut'
2  199
  DB<13> s
main::(bytecounts:3):        while (<>) {
  DB<13> x \%total_bytes
0  HASH(0x132dc)
   'ginger.girl.hut' => HASH(0x297474)
      'maryann.girl.hut' => 199
      'professor.hut' => 1218
   'professor.hut' => HASH(0x37a34)
      'gilligan.crew.hut' => 1250
      'lovey.howell.hut' => 1360
   'thurston.howell.hut' => HASH(0x2f9538)
      'lovey.howell.hut' => 1250

Now you've added a second destination to the hash that records information for all bytes originating at ginger.girl.hut. Because that was the final line of data (in this run), a step brings you down to the lower foreach loop:

  DB<14> s
main::(bytecounts:8):        for my $source (sort keys %total_bytes) {

The debugger doesn't show you the list that the expression inside those parentheses produces, but you can evaluate the same expression yourself and display it:

  DB<14> x sort keys %total_bytes
0  'ginger.girl.hut'
1  'professor.hut'
2  'thurston.howell.hut'

This is the list the foreach now scans. These are all the sources for transferred bytes seen in this particular logfile. Here's what happens when you step into the inner loop:

  DB<15> s
main::(bytecounts:9):          for my $destination (sort keys %{ $total_bytes{$source} }) {

At this point, you can work from the inside out to see exactly which values the list expression inside the parentheses will produce. Let's look at them:

  DB<15> x $source
0  'ginger.girl.hut'
  DB<16> x $total_bytes{$source}
0  HASH(0x297474)
   'maryann.girl.hut' => 199
   'professor.hut' => 1218
  DB<18> x keys %{ $total_bytes{$source } }
0  'maryann.girl.hut'
1  'professor.hut'
  DB<19> x sort keys %{ $total_bytes{$source } }
0  'maryann.girl.hut'
1  'professor.hut'

Note that dumping $total_bytes{$source} shows that it is a hash reference. Also, the sort appears not to have done anything, but that's only because keys happened to return the keys in sorted order this time; the output of keys is not necessarily in any sorted order. The next step finds the data:
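Here's a small sketch of why the sort is still worth having. The %destinations hash is a hand-built copy of the inner hash dumped above, not something the program itself creates under that name.

```perl
use strict;
use warnings;

# A hand-built copy of the inner hash for ginger.girl.hut.
my %destinations = (
  'professor.hut'    => 1218,
  'maryann.girl.hut' => 199,
);

my @unsorted = keys %destinations;       # whatever order Perl hands back
my @sorted   = sort keys %destinations;  # always in string order

print "keys:      @unsorted\n";
print "sort keys: @sorted\n";
```

The second line of output always lists maryann.girl.hut first; the first line may or may not. Back in the debugger: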

  DB<20> s
main::(bytecounts:10):            print "$source => $destination:",
main::(bytecounts:11):              " $total_bytes{$source}{$destination} bytes\n";
  DB<20> x $source, $destination
0  'ginger.girl.hut'
1  'maryann.girl.hut'
  DB<21> x $total_bytes{$source}{$destination}
0  199

As you can see, with the debugger, you can easily show the data, even structured data, to help you understand your program.
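If you want to confirm the final totals without stepping through anything, the following sketch runs the same two loops over the six sample lines. The @log array is an assumption made for the example: it stands in for the bytecounts-in file so that the code needs no input file.

```perl
use strict;
use warnings;

# The six sample log lines, stored in an array instead of read from a file.
my @log = (
  'professor.hut gilligan.crew.hut 1250',
  'professor.hut lovey.howell.hut 910',
  'thurston.howell.hut lovey.howell.hut 1250',
  'professor.hut lovey.howell.hut 450',
  'ginger.girl.hut professor.hut 1218',
  'ginger.girl.hut maryann.girl.hut 199',
);

# Same accumulation as in bytecounts: one inner hash per source.
my %total_bytes;
for my $line (@log) {
  my ($source, $destination, $bytes) = split ' ', $line;
  $total_bytes{$source}{$destination} += $bytes;
}

# Same report as in bytecounts, with both levels sorted by name.
for my $source (sort keys %total_bytes) {
  for my $destination (sort keys %{ $total_bytes{$source} }) {
    print "$source => $destination:",
      " $total_bytes{$source}{$destination} bytes\n";
  }
  print "\n";
}
```

The totals it prints should match the last full dump of %total_bytes in the debugger session above.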




Copyright © 2003 O'Reilly & Associates. All rights reserved.