
sed & awk

7.4. Pattern Matching

The "Hello, world" program does not demonstrate the power of pattern-matching rules. In this section, we look at a number of small, even trivial examples that nonetheless demonstrate this central feature of awk scripts.

When awk reads an input line, it tests that line against the pattern of each rule in the script, in order. A rule's action is executed only for lines that match its pattern. If no action is specified, the line that matches the pattern is printed (executing the print statement is the default action). Consider the following script:
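As a minimal sketch of the default action, the two shell functions below (hypothetical names, made-up input) differ only in whether the print action is written out. Both should print only the lines that contain "an".

```shell
#!/bin/sh
# A rule with a pattern but no action: awk falls back to the default action, print.
implicit_print() { awk '/an/'; }

# The same rule with the default action written out explicitly.
explicit_print() { awk '/an/ { print }'; }

# Both commands print only "banana", the one input line that matches /an/.
printf 'apple\nbanana\ncherry\n' | implicit_print
printf 'apple\nbanana\ncherry\n' | explicit_print
```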

/^$/ { print "This is a blank line." }

This script reads: if the input line is blank, then print "This is a blank line." The pattern is written as a regular expression that identifies a blank line. The action, like most of those we've seen so far, contains a single print statement.
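The anchors do the work here: ^ matches the start of the line and $ matches the end, so /^$/ matches only a line with nothing at all between them. A line holding only spaces or tabs is not matched. The sketch below uses made-up input and hypothetical function names; the second rule and its message are an illustrative variant, not part of the text, and accept whitespace-only lines as well.

```shell
#!/bin/sh
# The rule from the text: matches only lines that contain no characters.
blank_lines() { awk '/^$/ { print "This is a blank line." }'; }

# Hypothetical variant: zero or more spaces or tabs between start and end of line.
blankish_lines() { awk '/^[ \t]*$/ { print "This line is empty or whitespace-only." }'; }

# Input: a word, an empty line, a line of three spaces, another word.
printf 'one\n\n   \ntwo\n' | blank_lines      # one message (the empty line)
printf 'one\n\n   \ntwo\n' | blankish_lines   # two messages
```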

If we place this script in a file named awkscr and use an input file named test that contains three blank lines, then the following command executes the script:

$ awk -f awkscr test
This is a blank line.
This is a blank line.
This is a blank line.

(From this point on, we'll assume that our scripts are placed in a separate file and invoked using the -f command-line option.) The result tells us that there are three blank lines in test. This script ignores lines that are not blank.
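For comparison, here is a sketch of the two ways to invoke the same program. It recreates awkscr and a small test file in a temporary directory (the file contents are made up), then runs the program once with -f and once inline on the command line. Both runs should report the same three blank lines.

```shell
#!/bin/sh
# Scratch directory so the example files do not clobber anything real.
tmp=$(mktemp -d)

# awkscr: the one-rule script from the text.
cat > "$tmp/awkscr" <<'EOF'
/^$/ { print "This is a blank line." }
EOF

# test: made-up input containing three blank lines.
printf 'first\n\nsecond\n\n\nthird\n' > "$tmp/test"

# The program read from a file with -f ...
from_file=$(awk -f "$tmp/awkscr" "$tmp/test")
# ... and the same program given inline, quoted for the shell.
inline=$(awk '/^$/ { print "This is a blank line." }' "$tmp/test")

printf '%s\n' "$from_file"
rm -rf "$tmp"
```

The -f form avoids shell quoting issues entirely, which is one reason to prefer it once a script grows beyond a line or two.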

Let's add several new rules to the script. This script is now going to analyze the input and classify it as an integer, a string, or a blank line.

# test for integer, string or empty line.
/[0-9]+/    { print "That is an integer" }
/[A-Za-z]+/ { print "This is a string" }
/^$/        { print "This is a blank line." }

The general idea is that if a line of input matches any of these patterns, the associated print statement is executed. The + metacharacter is part of the extended set of regular expression metacharacters and means "one or more." Therefore, a line containing a sequence of one or more digits anywhere in it is considered an integer; because the pattern is not anchored, the rest of the line is not checked. Here's a sample run, taking input from standard input:
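To see what "one or more" means in practice, the sketch below (hypothetical function names, made-up input) anchors the digit class to the whole line and compares + with *, which means "zero or more." The * version also matches the empty line, because zero digits satisfy it.

```shell
#!/bin/sh
# One or more digits, and nothing else, on the line.
plus_digits() { awk '/^[0-9]+$/ { print "match: [" $0 "]" }'; }

# Zero or more digits: an empty line also qualifies.
star_digits() { awk '/^[0-9]*$/ { print "match: [" $0 "]" }'; }

# Input lines: "42", an empty line, and "4T".
printf '42\n\n4T\n' | plus_digits   # match: [42]
printf '42\n\n4T\n' | star_digits   # match: [42], then match: []
```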

$ awk -f awkscr
4
That is an integer
t
This is a string
4T
That is an integer
This is a string
RETURN
This is a blank line.
44
That is an integer
CTRL-D
$
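The interactive session above can also be reproduced non-interactively, which is handy for checking a script. This sketch embeds the same three rules in a shell function (the name classify is hypothetical) and pipes in the same five lines. It should print the same six messages, including two for 4T.

```shell
#!/bin/sh
# The three rules from awkscr, given inline so the sketch is self-contained.
classify() {
  awk '
    # test for integer, string or empty line.
    /[0-9]+/    { print "That is an integer" }
    /[A-Za-z]+/ { print "This is a string" }
    /^$/        { print "This is a blank line." }
  '
}

# Same input as the interactive session: 4, t, 4T, an empty line, 44.
printf '4\nt\n4T\n\n44\n' | classify
```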

Note that input "4T" was identified as both an integer and a string. A line can match more than one rule. You can write a stricter rule set to prevent a line from matching more than one rule. You can also write actions that are designed to skip other parts of the script.
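One way to do both is sketched below: anchor each pattern with ^ and $ so it must match the whole line, and end each action with awk's next statement, which skips the remaining rules for the current line. The final rule has no pattern, so it applies to any line that gets that far; its message and the function name are made up for illustration. With these rules, 4T produces exactly one message.

```shell
#!/bin/sh
# Stricter rule set: anchored patterns must match the whole line,
# and next stops processing the current line once a rule has fired.
classify_strict() {
  awk '
    /^[0-9]+$/    { print "That is an integer"; next }
    /^[A-Za-z]+$/ { print "This is a string"; next }
    /^$/          { print "This is a blank line."; next }
    # Reached only when no rule above matched (hypothetical message).
                  { print "This is something else" }
  '
}

printf '4\nt\n4T\n\n44\n' | classify_strict
```

The anchored patterns are already mutually exclusive; next is what keeps the catch-all rule from also firing for lines that one of them did match.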

We will be exploring the use of pattern-matching rules throughout this chapter.

7.4.1. Describing Your Script

Adding comments as you write the script is a good practice. A comment begins with the "#" character and ends at a newline. Unlike sed, awk allows comments anywhere in the script.
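A small sketch of where comments may appear: on a line of their own, after a rule's opening brace, inside an action, and after a statement. The function name and input are hypothetical; note that the comment text avoids apostrophes because the program is single-quoted for the shell.

```shell
#!/bin/sh
# Comments can stand on their own line, follow a rule, or sit inside an action.
commented() {
  awk '
    # A whole-line comment before the first rule.
    /^$/ {                             # a comment after the opening brace
      # a comment inside the action body
      print "This is a blank line."    # a comment after a statement
    }
  '
}

printf 'text\n\nmore text\n' | commented
```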

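NOTE: If you are supplying your awk program on the command line, rather than putting it in a file, do not use a single quote anywhere in your program. The program text is itself enclosed in single quotes for the shell, so an embedded single quote ends it prematurely and the shell misinterprets the rest of the command line.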
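If a single quote is unavoidable, there are common workarounds, sketched below with hypothetical function names: keep the program in a file, write the quote as the octal escape \047 inside an awk string (POSIX awk recognizes \ddd escapes in strings), or end the shell's quoted string, insert an escaped quote, and start a new quoted string.

```shell
#!/bin/sh
# 1. Keep the program in a file and run it with -f (the approach used in this chapter).

# 2. Write the apostrophe as the octal escape \047 inside an awk string.
octal_quote() { awk '{ print "It\047s a line" }'; }

# 3. Close the shell quote, add an escaped quote, reopen the quote: '\''
shell_quote() { awk '{ print "It'\''s a line" }'; }

echo x | octal_quote   # It's a line
echo x | shell_quote   # It's a line
```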

As we begin writing scripts, we'll use comments to describe the action:

#  blank.awk -- Print message for each blank line.
/^$/ { print "This is a blank line." }

This comment offers the name of the script, blank.awk, and briefly describes what the script does. A particularly useful comment for longer scripts is one that identifies the expected structure of the input file. For instance, in the next section, we are going to look at writing a script that reads a file containing names and phone numbers. The introductory comments for this program should be:

# blocklist.awk -- print name and address in block form.
# fields: name, company, street, city, state and zip, phone

It is useful to embed this information in the script because the script won't work unless the structure of the input file corresponds to that expected by the person who wrote the script.
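A header comment like this can also be backed by a simple check. The sketch below assumes, purely for illustration, that fields are separated by a comma and optional spaces and that "state and zip" is a single field, giving six fields per line; the function name and sample records are made up. It reports any input line whose field count differs from what the header describes.

```shell
#!/bin/sh
check_fields() {
  # ASSUMPTION: comma-separated fields, six per line, as listed in the header.
  awk -F', *' '
    # fields: name, company, street, city, state and zip, phone
    NF != 6 { print "line " NR ": expected 6 fields, found " NF }
  '
}

# Made-up records: the first has all six fields, the second only two.
printf '%s\n' \
  'Jane Doe, Acme Corp., 12 Main St., Springfield, IL 62701, 555-0100' \
  'Jane Doe, 555-0100' | check_fields
```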




Copyright © 2003 O'Reilly & Associates. All rights reserved.