2.7. Example: Babelfish

Submitting a POST query to Babelfish is as simple as:

my ($content, $message, $is_success) = do_POST(
  'http://babelfish.altavista.com/translate.dyn',
  [ 'urltext' => "I like pie", 'lp' => "en_fr", 'enc' => 'utf8' ],
);

If the request succeeded ($is_success will tell us this), $content will be an HTML page that contains the translated text. At the time of this writing, the translation is inside the only textarea element on the page, so it can be extracted with just this regexp:

$content =~ m{<textarea.*?>(.*?)</textarea>}is;

The translated text is now in $1, if the match succeeded.
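To see the match in isolation, here is a minimal sketch that runs the same regexp against a hand-written HTML fragment. The fragment and the extract_textarea( ) helper are assumptions made up for illustration; they are not an actual Babelfish response or part of the book's code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the body of the first textarea
# element in $html, or undef if there is none.
sub extract_textarea {
  my ($html) = @_;
  # /i ignores case in the tag names; /s lets "." match newlines,
  # so the capture can span several lines.
  if ($html =~ m{<textarea.*?>(.*?)</textarea>}is) {
    return $1;
  }
  return undef;
}

# Made-up stand-in for the page returned by the service.
my $content = <<'END_HTML';
<html><body>
<form><TEXTAREA name="q" rows="3">
  J'aime la tarte
</TEXTAREA></form>
</body></html>
END_HTML

my $found = extract_textarea($content);
print defined $found ? "Captured: [$found]\n" : "No textarea found\n";
```

Note that the capture keeps whatever whitespace and newlines sit between the tags, so the result usually needs tidying before use.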

Knowing this, it's easy to wrap the whole procedure up in a function that takes the text to translate and a language-pair specification (such as en_fr for English to French), and returns the translation. Example 2-8 is such a function.

Example 2-8. Using Babelfish to translate

sub translate {
  my ($text, $language_path) = @_;

  my ($content, $message, $is_success) = do_POST(
    'http://babelfish.altavista.com/translate.dyn',
    [ 'urltext' => $text, 'lp' => $language_path, 'enc' => 'utf8' ],
  );
  die "Error in translation $language_path: $message\n"
   unless $is_success;

  if ($content =~ m{<textarea.*?>(.*?)</textarea>}is) {
    my $translation = $1;
    # Trim whitespace:
    $translation =~ s/\s+/ /g;
    $translation =~ s/^ //s;
    $translation =~ s/ $//s;
    return $translation;
  } else {
    die "Can't find translation in response to $language_path";
  }
}

The translate( ) subroutine constructs the request and extracts the translation from the response, collapsing each run of whitespace to a single space and trimming any space at either end. If the request couldn't be completed, or if no textarea can be found in the response, the subroutine throws an exception by calling die( ).
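Because translate( ) reports failure with die( ), a caller that wants to carry on after an error has to trap the exception. The following is a minimal sketch of doing so with eval. The translate( ) here is a stand-in that always fails (an assumption, so the example runs without network access), and try_translate( ) is a hypothetical wrapper, not part of the book's code.

```perl
use strict;
use warnings;

# Stand-in for the real translate( ): always fails, so we can
# exercise the error path without a network connection.
sub translate {
  my ($text, $language_path) = @_;
  die "Error in translation $language_path: 500 simulated failure\n";
}

# Hypothetical wrapper: returns the translation, or undef on failure.
sub try_translate {
  my ($text, $language_path) = @_;
  my $result;
  # The trailing 1 makes eval return true only if nothing died.
  my $ok = eval { $result = translate($text, $language_path); 1 };
  unless ($ok) {
    warn "Translation failed: $@";
    return undef;
  }
  return $result;
}

my $out = try_translate("I like pie", "en_fr");
print defined $out ? "$out\n" : "(no translation)\n";
```

Whether to trap the error or let it end the program depends on the caller; a one-shot command-line tool can reasonably just let die( ) terminate it.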

The translate( ) subroutine could be used to automate on-demand translation of important content from one language to another. But machine translation is still a fairly new technology, and the real value of it is to be found in translating from English into another language and then back into English, just for fun. (Incidentally, there's a CPAN module that takes care of all these details for you, called Lingua::Translate, but here we're interested in how to carry out the task, rather than whether someone's already figured it out and posted it to CPAN.)

The alienate program given in Example 2-9 does just this (the definitions of translate( ) and do_POST( ) have been omitted from the listing for brevity).

Example 2-9. The alienate program

#!/usr/bin/perl -w
# alienate - translate text
use strict;
my $lang;
if (@ARGV and $ARGV[0] =~ m/^-(\w\w)$/s) {
  # If the language is specified as a switch like "-fr"
  $lang = lc $1;
  shift @ARGV;
} else {
  # Otherwise just pick a language at random:
  my @languages = qw(it fr de es ja pt);
  # I.e.: Italian, French, German, Spanish, Japanese, Portuguese.
  $lang = $languages[rand @languages];
}

die "What to translate?\n" unless @ARGV;
my $in = join(' ', @ARGV);

print " => via $lang => ",
  translate(
    translate($in, 'en_' . $lang),
    $lang . '_en'
  ), "\n";
exit;

# definitions of do_POST( ) and translate( ) go here

Call the alienate program like this:

% alienate [-lang] phrase

Specify a language with -lang, for example -fr to translate via French. If you don't specify a language, one will be randomly chosen for you. The phrase to translate is taken from the command line following any switches.
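The switch handling can be tried on its own. The sketch below pulls the same test out of alienate into a hypothetical parse_lang_switch( ) helper (the name and the return convention are assumptions for illustration) and runs it on a few argument lists.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the switch handling in alienate:
# returns the language code (undef if no switch was given)
# followed by the remaining arguments.
sub parse_lang_switch {
  my @args = @_;
  my $lang;
  if (@args and $args[0] =~ m/^-(\w\w)$/s) {
    $lang = lc $1;   # "-FR" and "-fr" are treated alike
    shift @args;
  }
  return ($lang, @args);
}

for my $case ([ '-FR', 'I', 'like', 'pie' ],
              [ 'I like pie' ],
              [ '-french', 'pie' ]) {
  my ($lang, @rest) = parse_lang_switch(@$case);
  printf "%-8s phrase: %s\n",
    defined $lang ? "lang=$lang" : 'lang=?', join(' ', @rest);
}
```

Because the pattern accepts exactly two word characters after the hyphen, an argument such as -french is not recognized as a switch and is treated as part of the phrase.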

Here are some runs of alienate:

% alienate -de "Pearls before swine!"
=> via de => Beads before pigs!

% alienate "Bond, James Bond"
=> via fr => Link, Link Of James

% alienate "Shaken, not stirred"
=> via pt => Agitated, not agitated

% alienate -it "Shaken, not stirred"
=> via it => Mental patient, not stirred

% alienate -it "Guess what! I'm a computer!"
=> via it => Conjecture that what! They are a calculating!

% alienate 'It was more fun than a barrel of monkeys'
=> via de => It was more fun than a barrel drop hammer

% alienate -ja 'It was more fun than a barrel of monkeys'
=> via ja => That the barrel of monkey at times was many pleasures



Copyright © 2002 O'Reilly & Associates. All rights reserved.