You invoke sed and awk in much the same way. The command-line syntax is:
command [options] script filename
Like almost all UNIX programs, sed and awk can take input from standard input and send output to standard output. If a filename is specified, input is taken from that file. The output contains the processed information. By default, standard output is the display screen, and the output from these programs typically goes there. It can also be sent to a file, using I/O redirection in the shell, but it must not go to the same file that supplies input to the program, because the shell empties the output file before the program has read it.
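As a rough sketch of these input and output paths, the commands below run one sed script three ways. The file names sample.txt and result.txt are invented for illustration, and the substitute command (s) serves only as a stand-in script.

```shell
# Create a small input file (sample.txt is a hypothetical name).
printf 'one\ntwo\n' > sample.txt

# Input from a named file; output goes to standard output (the screen).
sed 's/one/1/' sample.txt

# Input from standard input, supplied here through a pipe.
cat sample.txt | sed 's/one/1/'

# Output redirected to a different file. Do not redirect to sample.txt
# itself: the shell would empty it before sed had a chance to read it.
sed 's/one/1/' sample.txt > result.txt
```

After these commands, result.txt holds the edited lines and sample.txt is unchanged.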
The options for each command are different. We will demonstrate many of these options in upcoming sections. (The complete list of command-line options for sed can be found in Appendix A, Quick Reference for sed; the complete list of options for awk is in Appendix B, Quick Reference for awk.)
The script specifies what instructions to perform. If specified on the command line, the script must be enclosed in single quotes if it contains a space or any characters that might be interpreted by the shell ($ and *, for instance).
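A small illustration of the quoting rule, assuming a two-line file called names.txt (a made-up name). The awk script contains spaces, braces, and a $, all of which the shell would otherwise interpret.

```shell
# Create a small input file (names.txt is a hypothetical name).
printf 'John Daggett\nAlice Ford\n' > names.txt

# Single quotes pass the script to awk exactly as typed; without them,
# the shell would split the script at the spaces and try to expand $1.
awk '{ print $1 }' names.txt > first.txt
cat first.txt
```

The awk script prints the first whitespace-separated field of each line, so first.txt should contain John and Alice on separate lines.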
One option common to both sed and awk is the -f option that allows you to specify the name of a script file. As a script grows in size, it is convenient to place it in a file. Thus, you might invoke sed as follows:
sed -f scriptfile inputfile
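To make the invocation concrete, here is a minimal sketch that builds both files first. The contents of scriptfile and inputfile are invented for the example, as is the name awkscript.

```shell
# A two-line input file and a sed script file with one instruction per line.
printf 'one\ntwo\n' > inputfile
printf 's/one/1/\ns/two/2/\n' > scriptfile

# sed reads its instructions from scriptfile and applies them to inputfile.
sed -f scriptfile inputfile > sed.out
cat sed.out

# awk accepts -f in the same way (awkscript is a hypothetical name).
printf '{ print $1 }\n' > awkscript
awk -f awkscript inputfile > awk.out
cat awk.out
```

Because the script now lives in a file, no shell quoting is needed around the instructions themselves.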
Figure 2.1 shows the basic operation of sed and awk. Each program reads one input line at a time from the input file, makes a copy of the input line, and executes the instructions specified in the script on that copy. Thus, changes made to the input line do not affect the actual input file.
A script is where you tell the program what to do. At least one line of instruction is required. Short scripts can be specified on the command line; longer scripts are usually placed in a file where they can easily be revised and tested. In writing a script, keep in mind the sequence in which instructions will be executed and how each instruction changes the input line.
In sed and awk, each instruction has two parts: a pattern and a procedure. The pattern is a regular expression delimited with slashes (/). A procedure specifies one or more actions to be performed.
As each line of input is read, the program reads the first instruction in the script and checks the pattern against the current line. If there is no match, the procedure is ignored and the next instruction is read. If there is a match, then the action or actions specified in the procedure are followed. All of the instructions are read, not just the first instruction that matches the input line.
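The following sketch shows that every instruction is tried against every line. It uses awk with three pattern-and-procedure pairs on a two-line file; the file name sample.txt and the labels in the output are made up for the example.

```shell
# Two sample lines (sample.txt is a hypothetical name).
printf 'Sal Carpenter, Boston MA\nAlice Ford, Richmond VA\n' > sample.txt

# The first line matches both /MA/ and /Boston/, so two procedures run
# for it. The second line matches only /VA/.
awk '/MA/ { print "MA rule: " $0 }
/Boston/ { print "Boston rule: " $0 }
/VA/ { print "VA rule: " $0 }' sample.txt > rules.out
cat rules.out
```

The first input line appears twice in the output, once for each instruction it matches, which shows that processing does not stop at the first match.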
When all the applicable instructions have been interpreted and applied for a single line, sed outputs the line and repeats the cycle for each input line. Awk, on the other hand, does not automatically output the line; the instructions in your script control what is finally done with it.
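The difference in output behavior can be seen with a short experiment. This is only a sketch; sample.txt is a hypothetical file and the scripts are deliberately trivial.

```shell
# Two sample lines (sample.txt is a hypothetical name).
printf 'John Daggett\nAlice Ford\n' > sample.txt

# sed: the instruction matches nothing, yet every line is still output.
sed 's/xyz/abc/' sample.txt

# awk: only the line that matches is output, because only its
# procedure calls print.
awk '/Alice/ { print }' sample.txt

# awk: the procedure counts the match but never prints, so there is
# no output at all.
awk '/Alice/ { n++ }' sample.txt
```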
The contents of a procedure are very different in sed and awk. In sed, the procedure consists of editing commands like those used in the line editor. Most commands consist of a single letter.
In awk, the procedure consists of programming statements and functions. A procedure must be surrounded by braces.
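For a side-by-side look at the two procedure styles, the sketch below applies the same pattern in each program. The file sample.txt is invented for the example.

```shell
# Two sample lines (sample.txt is a hypothetical name).
printf 'Eric Adams, Sudbury MA\nHubert Sims, Roanoke VA\n' > sample.txt

# sed: the pattern /MA/ is followed by the single-letter command d,
# which deletes matching lines; the remaining line is output.
sed '/MA/d' sample.txt > sed.out
cat sed.out

# awk: the same pattern is followed by a procedure in braces that
# prints the first field of each matching line.
awk '/MA/ { print $1 }' sample.txt > awk.out
cat awk.out
```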
In the sections that follow, we'll look at a few scripts that process a sample mailing list.
The examples in the upcoming sections use a sample file named list. It contains a list of names and addresses, as shown below.
$ cat list
John Daggett, 341 King Road, Plymouth MA
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Eric Adams, 20 Post Road, Sudbury MA
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston MA
If you like, create this file on your system or use a similar one of your own making. Because many of the examples in this chapter are short and interactive, you can enter them at your keyboard and verify the results.
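One possible way to create the file without opening an editor is a shell here-document. This is only a sketch: it writes a file named list in the current directory and overwrites any existing file of that name.

```shell
# Write the eight sample records to a file named list. Quoting the EOF
# marker keeps the shell from interpreting anything inside the text.
cat > list <<'EOF'
John Daggett, 341 King Road, Plymouth MA
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Eric Adams, 20 Post Road, Sudbury MA
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston MA
EOF

# Confirm the contents.
cat list
```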