We conclude this chapter by presenting sample tasks that involve complex pattern-matching concepts. Rather than solve the problems right away, we'll work toward the solutions step by step.
Suppose you have a few lines with this general form:
the best of times; the worst of times: moving
The coolest of times; the worst of times: moving
The lines that you're concerned with always end with moving, but you never know what the first two words might be. You want to change any line that ends with moving to read:
The greatest of times; the worst of times: moving
Since the changes must occur on certain lines, you need to specify a context-sensitive global replacement. Using :g/moving$/ will match lines that end with moving. Next, you realize that your search pattern could be any number of any character, so the metacharacters .* come to mind. But these will match the whole line unless you somehow restrict the match. Here's your first attempt:
:g/moving$/s/.*of/The greatest of/
This search string, you decide, will match from the beginning of the line to the first of. Since you needed to specify the word of to restrict the search, you simply repeat it in the replacement. Here's the resulting line:
The greatest of times: moving
Something went wrong. The replacement gobbled the line up to the second of instead of the first. Here's why. When given a choice, the action of "match any number of any character" will match as much text as possible. In this case, since the word of appears twice, your search string finds:
the best of times; the worst of
rather than:
the best of
Your search pattern needs to be more restrictive:
:g/moving$/s/.*of times;/The greatest of times;/
Now the .* will match all characters up to the instance of the phrase of times;. Since there's only one instance, it has to be the first.
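The difference between the two attempts can be checked outside the editor. What follows is a minimal sketch in Python, whose re module is only a stand-in for ex's pattern matcher (an assumption: the two syntaxes are close but not identical). The helper name substitute_on_moving_lines is hypothetical. It mimics :g/moving$/s/old/new/ by making one substitution on each line that ends with moving.

```python
import re

def substitute_on_moving_lines(lines, pattern, replacement):
    """Mimic :g/moving$/s/pattern/replacement/ on a list of lines."""
    result = []
    for line in lines:
        if re.search(r"moving$", line):
            # count=1 mirrors ex's default of one substitution per line
            line = re.sub(pattern, replacement, line, count=1)
        result.append(line)
    return result

sample = [
    "the best of times; the worst of times: moving",
    "The coolest of times; the worst of times: moving",
    "a line that should be left alone",   # made-up line, not selected
]

# Greedy .* runs all the way to the LAST "of" on the line.
greedy = substitute_on_moving_lines(sample, r".*of", "The greatest of")

# Adding " times;" leaves only one place where the match can end.
restricted = substitute_on_moving_lines(
    sample, r".*of times;", "The greatest of times;"
)

for line in greedy + restricted:
    print(line)
```

The greedy version loses the middle of each selected line, while the restricted version keeps it. The third line passes through untouched in both cases.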
There are cases, though, when it is inconvenient, or even incorrect, to use the .* metacharacters. For example, you might find yourself typing many words to restrict your search pattern, or you might be unable to restrict the pattern by specific words (if the text in the lines varies widely). The next section presents such a case.
Suppose you want to switch the order of all last names and first names in a (text) database. The lines look like this:
Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567
Name: Joy, Susan S.; Areas: Graphics; Phone: 999-3333
The name of each field ends with a colon, and each field is separated by a semicolon. Using the top line as an example, you want to change Feld, Ray to Ray Feld. We'll present some commands that look promising but don't work. After each command, we show you the line the way it looked before the change and after the change.
:%s/: \(.*\), \(.*\);/: \2 \1;/
Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567      Before
Name: UNIX Feld, Ray; Areas: PC; Phone: 123-4567       After
In this attempt, the first hold buffer saved Feld, Ray; Areas: PC and the second hold buffer saved UNIX. Note that the first hold buffer contains more than you want. Since it was not sufficiently restricted by the pattern that follows it, the hold buffer was able to save up to the second comma. Now you try to restrict the contents of the first hold buffer:
Now you try to restrict the contents of the first hold buffer:
:%s/: \(....\), \(.*\);/: \2 \1;/
Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567      Before
Name: Ray; Areas: PC, UNIX Feld; Phone: 123-4567       After
Here you've managed to save the last name in the first hold buffer, but now the second hold buffer will save anything up to the last semicolon on the line. Now you restrict the second hold buffer, too:
:%s/: \(....\), \(...\);/: \2 \1;/
Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567      Before
Name: Ray Feld; Areas: PC, UNIX; Phone: 123-4567       After
This gives you what you want, but only in the specific case of a four-letter last name and a three-letter first name. (The previous attempt included the same mistake.) Why not just return to the first attempt, but this time be more selective about the end of the search pattern?
:%s/: \(.*\), \(.*\); Area/: \2 \1; Area/
Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567      Before
Name: Ray Feld; Areas: PC, UNIX; Phone: 123-4567       After
This works, but we'll continue the discussion by introducing an additional concern. Suppose that the Areas field isn't always present or isn't always the second field. The above command won't work on such lines.
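To see what each of the four attempts actually saves, you can inspect the hold buffers directly. This is a rough sketch in Python rather than ex. Python's re module writes hold buffers as plain ( ) instead of \( \), and its behavior is assumed here to match ex for these simple patterns. The helper hold_buffers and the record without an Areas field are made up for illustration.

```python
import re

# The four search patterns tried so far, in Python's group syntax.
attempts = {
    "unrestricted": r": (.*), (.*);",
    "fixed-width last name": r": (....), (.*);",
    "fixed-width both names": r": (....), (...);",
    "anchored on Area": r": (.*), (.*); Area",
}

def hold_buffers(pattern, line):
    """Return the two saved substrings, or None when the pattern fails."""
    match = re.search(pattern, line)
    return match.groups() if match else None

line = "Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567"
# Hypothetical record in which the Areas field is missing.
no_area = "Name: Feld, Ray; Phone: 123-4567"

for label, pattern in attempts.items():
    print(label)
    print("   with Areas:   ", hold_buffers(pattern, line))
    print("   without Areas:", hold_buffers(pattern, no_area))
```

On the record without an Areas field, the last pattern finds nothing at all, so the substitution would silently skip that line.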
We introduce this problem to make a point. Whenever you rethink a pattern match, it's usually better to work toward refining the variables (the metacharacters), rather than using specific text to restrict patterns. The more variables you use in your patterns, the more powerful your commands will be.
In the current example, think again about the patterns you want to switch. Each word starts with an uppercase letter and is followed by any number of lowercase letters, so you can match the names like this:
[A-Z][a-z]*
A last name might also have more than one uppercase letter (McFly, for example), so you'd want to search for this possibility in the second and succeeding letters:
[A-Z][A-Za-z]*
It doesn't hurt to use this for the first name, too (you never know when McGeorge Bundy will turn up). Your command now becomes:
:%s/: \([A-Z][A-Za-z]*\), \([A-Z][A-Za-z]*\);/: \2 \1;/
Quite forbidding, isn't it? It still doesn't cover the case of a name like Joy, Susan S. Since the first-name field might include a middle initial, you need to add a space and a period within the second pair of brackets. But enough is enough. Sometimes, specifying exactly what you want is more difficult than specifying what you don't want. In your sample database, the last names end with a comma, so a last-name field can be thought of as a string of characters that are not commas:
[^,]*
This pattern matches characters up until the first comma. Similarly, the first-name field is a string of characters that are not semicolons:
[^;]*
Putting these more efficient patterns back into your previous command, you get:
:%s/: \([^,]*\), \([^;]*\);/: \2 \1;/
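The following sketch compares the explicit-letters pattern with the negated character classes on both sample records. It uses Python's re module as a stand-in for ex (an assumption, with ( ) in place of \( \)), and the helper name swap is hypothetical.

```python
import re

records = [
    "Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567",
    "Name: Joy, Susan S.; Areas: Graphics; Phone: 999-3333",
]

# Spelling out what a name looks like: a capital, then letters.
explicit = re.compile(r": ([A-Z][A-Za-z]*), ([A-Z][A-Za-z]*);")

# Spelling out what a name is NOT: [^,]* stops at the first comma,
# [^;]* stops at the first semicolon.
negated = re.compile(r": ([^,]*), ([^;]*);")

def swap(pattern, record):
    """Swap the two saved substrings once, like :s/.../: \\2 \\1;/ in ex."""
    return pattern.sub(r": \2 \1;", record, count=1)

explicit_results = [swap(explicit, record) for record in records]
negated_results = [swap(negated, record) for record in records]

for before, after in zip(records, negated_results):
    print(before)
    print(after)
```

The explicit pattern leaves the Susan S. record unchanged because of the middle initial, while the negated classes handle both records.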
The same command could also be entered as a context-sensitive replacement. If all lines begin with Name, you can say:
:g/^Name/s/: \([^,]*\), \([^;]*\);/: \2 \1;/
You can also add an asterisk after the first space, in order to match a colon that has extra spaces (or no spaces) after it:
:g/^Name/s/: *\([^,]*\), \([^;]*\);/: \2 \1;/
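A short Python sketch of this final command shows the effect of both refinements: the restriction to lines beginning with Name and the optional spaces after the colon. Python's re module stands in for ex here (an assumption). The function swap_names and the last three records are invented for the example.

```python
import re

# ": *" accepts a colon followed by any number of spaces, including none.
NAME_FIELD = re.compile(r": *([^,]*), ([^;]*);")

def swap_names(lines):
    """Swap last and first names, but only on lines that begin with Name."""
    return [
        NAME_FIELD.sub(r": \2 \1;", line, count=1)
        if line.startswith("Name")      # plays the role of :g/^Name/
        else line
        for line in lines
    ]

lines = [
    "Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567",
    "Name:Joy, Susan S.; Areas: Graphics; Phone: 999-3333",  # no space
    "Name:   Bundy, McGeorge; Phone: 555-0100",              # extra spaces
    "Note: first, check the phone list; then call",          # not a Name line
]

for line in swap_names(lines):
    print(line)
```

Because the replacement text always writes a colon and a single space, the output spacing is normalized as a side effect. The Note line would match the pattern, so only the Name restriction protects it.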
As we've usually seen the :g command used, it selects lines that are typically then edited by subsequent commands on the same line -- for example, we select lines with g, and then make substitutions on them, or select them and delete them:
:g/mg[ira]box/s/box/square/g
:g/^$/d
However, in his two-part tutorial in UNIX World,[9] Walter Zintz makes an interesting point about the g command. This command selects lines -- but the associated editing commands need not actually affect the lines that are selected.
[9] Part 1, "vi Tips for Power Users," appears in the April 1990 issue of UNIX World. Part 2, "Using vi to Automate Complex Edits," appears in the May 1990 issue. The examples presented are from Part 2.
Instead, he demonstrates a technique by which you can repeat ex commands some arbitrary number of times. For example, suppose you want to place ten copies of lines 12 through 17 of your file at the end of your current file. You could type:
:1,10g/^/ 12,17t$
This is a very unexpected use of g, but it works! The g command selects line 1, executes the specified t command, then goes on to line 2, to execute the next copy command. When line 10 is reached, ex will have made ten copies.
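The logic of this trick can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not of how ex is implemented. The function repeat_copy and the 20-line sample file are invented, and the model assumes the file has at least as many lines as the addresses require.

```python
def repeat_copy(lines, times, first, last):
    """Model :1,{times}g/^/ {first},{last}t$ with 1-based line numbers."""
    buffer = list(lines)
    # g marks every line in the range 1..times, since ^ matches any line.
    # The marked lines serve only as a counter; they are never edited.
    marked = range(1, times + 1)
    for _ in marked:
        block = buffer[first - 1:last]   # lines first..last, inclusive
        buffer.extend(block)             # t$ puts the copy at the end
    return buffer

original = [f"line {n}" for n in range(1, 21)]   # made-up 20-line file
result = repeat_copy(original, times=10, first=12, last=17)

print(len(original), "lines before,", len(result), "lines after")
print(result[-6:])
```

Since every copy lands at the end of the file, lines 12 through 17 never move, and each pass copies the same six lines.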
Here's another advanced g example, again building on suggestions provided in Zintz's article. Suppose you're editing a document that consists of several parts. Part 2 of this file is shown below, using ellipses to show omitted text and displaying line numbers for reference:
301  Part 2
302  Capability Reference
303  .LP
304  Chapter 7
305  Introduction to the Capabilities
306  This and the next three chapters ...
400  ... and a complete index at the end.
401  .LP
402  Chapter 8
403  Screen Dimensions
404  Before you can do anything useful
405  on the screen, you need to know ...
555  .LP
556  Chapter 9
557  Editing the Screen
558  This chapter discusses ...
821  .LP
822  Part 3:
823  Advanced Features
824  .LP
825  Chapter 10
The chapter numbers appear on one line, their titles appear on the line below, and the chapter text begins on the line below that (lines 306, 404, and 558 in the listing). The first thing you'd like to do is copy the beginning line of each chapter, sending it to an already existing file called begin.
Here's the command that does this:
:g /^Chapter/ .+2w >> begin
You must be at the top of your file before issuing this command. First you search for Chapter at the start of a line, but then you want to run the command on the beginning line of each chapter -- the second line below Chapter. Because a line beginning with Chapter is now selected as the current line, the line address .+2 will indicate the second line below it. The equivalent line addresses +2 or ++ work as well. You want to write these lines to an existing file named begin, so you issue the w command with the append operator >>.
Suppose you want to send the beginnings of chapters that are only within Part 2. You need to restrict the lines selected by g, so you change your command to this:
:/^Part 2/,/^Part 3/g /^Chapter/ .+2w >> begin
Here, the g command selects the lines that begin with Chapter, but it searches only that portion of the file from a line starting with Part 2 through a line starting with Part 3. If you issue the above command, the last lines of the file begin will read as follows:
This and the next three chapters ...
Before you can do anything useful
This chapter discusses ...
These are the lines that begin Chapters 7, 8, and 9.
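The selection logic of the range-restricted command can be sketched in Python as follows. This models only which lines are chosen. The function chapter_openings is hypothetical, the sample document is a condensed version of the listing with the omitted text left out, and the lines after Chapter 10 are made up so that the range restriction has something to exclude. A list stands in for the file begin.

```python
import re

def chapter_openings(lines, start_pat, end_pat):
    """Collect the line two below each ^Chapter line inside a range."""
    # Like /^Part 2/,/^Part 3/ issued from the top of the file.
    start = next(i for i, text in enumerate(lines)
                 if re.search(start_pat, text))
    end = next(i for i, text in enumerate(lines)
               if i > start and re.search(end_pat, text))
    collected = []
    for i in range(start, end + 1):
        if re.search(r"^Chapter", lines[i]):
            collected.append(lines[i + 2])   # the address .+2
    return collected

document = [
    "Part 2", "Capability Reference", ".LP",
    "Chapter 7", "Introduction to the Capabilities",
    "This and the next three chapters ...",
    ".LP",
    "Chapter 8", "Screen Dimensions",
    "Before you can do anything useful",
    "on the screen, you need to know ...",
    ".LP",
    "Chapter 9", "Editing the Screen", "This chapter discusses ...",
    ".LP",
    "Part 3:", "Advanced Features", ".LP",
    "Chapter 10", "A made-up title", "Made-up text outside the range",
]

begin = ["(existing contents of begin)"]    # stand-in for the file
begin.extend(chapter_openings(document, r"^Part 2", r"^Part 3"))
print("\n".join(begin))
```

Chapter 10 lies below the Part 3 line, so its opening line is not collected.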
In addition to the lines you've just sent, you'd like to copy chapter titles to the end of the document, in preparation for making a table of contents. You can use the vertical bar to tack a second command after your first command, like so:
:/^Part 2/,/^Part 3/g /^Chapter/ .+2w >> begin | +t$
Remember that with any subsequent command, line addresses are relative to the previous command. The first command has marked lines (within Part 2) that start with Chapter, and the chapter titles appear on a line below such lines. Therefore, to access chapter titles in the second command, the line address is + (or the equivalents +1 or .+1). Then use t$ to copy the chapter titles to the end of the file.
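As a rough model of the two-command version, the Python sketch below runs both actions for each marked line. It is an illustration under simplifying assumptions, not a description of ex internals: the Part 2 range restriction is left out for brevity, the function openings_and_titles is invented, and a list stands in for the file begin.

```python
import re

def openings_and_titles(lines, begin_file):
    """For each ^Chapter line, save line .+2 and copy line + to the end."""
    buffer = list(lines)
    # g marks the matching lines before it runs any command, so the
    # titles appended during the loop are never visited themselves.
    marked = [i for i, text in enumerate(buffer)
              if re.search(r"^Chapter", text)]
    for i in marked:
        begin_file.append(buffer[i + 2])   # .+2w >> begin
        buffer.append(buffer[i + 1])       # +t$
    return buffer

part_two = [
    "Chapter 7", "Introduction to the Capabilities",
    "This and the next three chapters ...",
    ".LP",
    "Chapter 8", "Screen Dimensions",
    "Before you can do anything useful",
    ".LP",
    "Chapter 9", "Editing the Screen", "This chapter discusses ...",
]

begin = []
edited = openings_and_titles(part_two, begin)
print("\n".join(edited[len(part_two):]))
```

Both addresses are measured from the same marked Chapter line, which is why the title is one line down in the second command rather than one line up from the text.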
As these examples illustrate, thought and experimentation may lead you to some unusual editing solutions. Don't be afraid to try things! Just be sure to back up your file first!