Perl Cookbook

20.20. Program: htmlsub

This program makes substitutions in HTML files so that changes happen only in normal text, not inside tags or attribute values. If you had the file scooby.html that contained:

<HTML><HEAD><TITLE>Hi!</TITLE></HEAD>
<BODY><H1>Welcome to Scooby World!</H1>
I have <A HREF="pictures.html">pictures</A> of the crazy dog
himself.  Here's one!<P>
<IMG SRC="scooby.jpg" ALT="Good doggy!"><P>
<BLINK>He's my hero!</BLINK>  I would like to meet him some day,
and get my picture taken with him.<P>
P.S. I am deathly ill.  <A HREF="shergold.html">Please send
cards</A>.
</BODY></HTML>

you could use htmlsub to change every occurrence of the word "picture" in the document text to read "photo". It prints the new document on STDOUT:

% htmlsub picture photo scooby.html
<HTML><HEAD><TITLE>Hi!</TITLE></HEAD>
<BODY><H1>Welcome to Scooby World!</H1>
I have <A HREF="pictures.html">photos</A> of the crazy dog
himself.  Here's one!<P>
<IMG SRC="scooby.jpg" ALT="Good doggy!"><P>
<BLINK>He's my hero!</BLINK>  I would like to meet him some day,
and get my photo taken with him.<P>
P.S. I am deathly ill.  <A HREF="shergold.html">Please send
cards</A>.
</BODY></HTML>
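
Note that the HREF="pictures.html" attribute comes through unchanged. To see why that matters, the following minimal sketch compares a plain s/// over the whole line with a substitution restricted to the text between tags. It uses only core Perl; sub_outside_tags is a hypothetical helper written for this illustration, and its regex-based tag splitting is only a rough stand-in for a real HTML parser (it would be fooled by a ">" inside an attribute value or a comment, for example).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of htmlsub): replace $from with $to
# only in the pieces of $html that lie outside <...> tags.
sub sub_outside_tags {
    my ($from, $to, $html) = @_;

    # The capturing parentheses make split return the tags as well,
    # so the list alternates: text, tag, text, tag, ...
    my @pieces = split /(<[^>]*>)/, $html;

    for my $i (0 .. $#pieces) {
        next if $i % 2;                      # odd indices hold the tags
        $pieces[$i] =~ s/\Q$from\E/$to/g;    # even indices hold plain text
    }
    return join '', @pieces;
}

my $html = 'I have <A HREF="pictures.html">pictures</A> of the crazy dog';

# Naive approach: also rewrites the HREF attribute, breaking the link.
(my $naive = $html) =~ s/picture/photo/g;
print "$naive\n";
# prints: I have <A HREF="photos.html">photos</A> of the crazy dog

# Tag-aware approach: the link target survives.
print sub_outside_tags('picture', 'photo', $html), "\n";
# prints: I have <A HREF="pictures.html">photos</A> of the crazy dog
```

The naive version silently points the link at a file that does not exist, which is the failure htmlsub is designed to avoid.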

The program is shown in Example 20-12.

Example 20-12. htmlsub

  #!/usr/bin/perl -w
  # htmlsub - make substitutions in normal text of HTML files
  # from Gisle Aas <[email protected]>
  
  sub usage { die "Usage: $0 <from> <to> <file>...\n" }
  
  my $from = shift or usage;
  my $to   = shift or usage;
  usage unless @ARGV;
  
  # Build the HTML::Filter subclass to do the substituting.
  
  package MyFilter;
  use HTML::Filter;
  @ISA=qw(HTML::Filter);
  use HTML::Entities qw(decode_entities encode_entities);
  
  sub text
  {
     my $self = shift;
     my $text = decode_entities($_[0]);
     $text =~ s/\Q$from/$to/go;       # most important line
     $self->SUPER::text(encode_entities($text));
  }
  
  # Now use the class.
  
  package main;
  foreach (@ARGV) {
      MyFilter->new->parse_file($_);
  }
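
HTML::Filter's own documentation marks the module as deprecated in favor of the default handler in HTML::Parser. As a hedged sketch of the same idea with the version 3 HTML::Parser API, the code below passes everything except text through unchanged and applies the decode, substitute, encode sequence to text only. The html_sub function and its return-a-string interface are assumptions made for this illustration, not part of the original program. It also leaves the contents of elements such as SCRIPT and STYLE alone (the is_cdata case), which is a deliberate difference from Example 20-12.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();
use HTML::Entities qw(decode_entities encode_entities);

# Hypothetical helper: return a copy of $html with $from replaced by $to
# in normal text only. Tags, comments, and declarations are copied as-is.
sub html_sub {
    my ($from, $to, $html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,

        # Everything without a specific handler is appended unchanged.
        default_h => [ sub { $out .= $_[0] }, 'text' ],

        # Text events: decode entities, substitute, then re-encode.
        text_h => [
            sub {
                my ($raw, $is_cdata) = @_;
                if ($is_cdata) {        # e.g. inside <SCRIPT> or <STYLE>
                    $out .= $raw;
                    return;
                }
                my $text = decode_entities($raw);
                $text =~ s/\Q$from\E/$to/g;
                $out .= encode_entities($text);
            },
            'text,is_cdata',
        ],
    );

    $parser->unbroken_text(1);   # deliver each run of text in one piece
    $parser->parse($html);
    $parser->eof;                # flush any pending text
    return $out;
}

print html_sub('picture', 'photo',
               '<A HREF="pictures.html">pictures &amp; more</A>'), "\n";
# prints: <A HREF="pictures.html">photos &amp; more</A>
```

Setting unbroken_text avoids the case where a word to be replaced is split across two text events and therefore missed.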


Copyright © 2003 O'Reilly & Associates. All rights reserved.