Unless you tell it otherwise, sort divides each line into fields at whitespace (blanks or tabs), and sorts the lines by field, from left to right.
That is, it sorts on the basis of field 0 (leftmost), but when the leftmost fields are the same, it sorts on the basis of field 1, and so on. This is hard to put into words, but it's really just common sense. Suppose your office inventory manager created a file like this:
supplies pencils 148
furniture chairs 40
kitchen knives 22
kitchen forks 20
supplies pens 236
furniture couches 10
furniture tables 7
supplies paper 29
You'd want all the supplies sorted into categories, and within each category, you'd want them sorted alphabetically:
% sort supplies
furniture chairs 40
furniture couches 10
furniture tables 7
kitchen forks 20
kitchen knives 22
supplies paper 29
supplies pencils 148
supplies pens 236
Of course, you don't always want to sort from left to right. The command-line option +n tells sort to start sorting on field n; -n tells sort to stop sorting on field n. Remember (again) that sort counts fields from left to right, starting with 0.[66] Here's an example. We want to sort a list of telephone numbers of authors, presidents, and blues singers:
[66]I harp on this because I always get confused and have to look it up in the manual page.
Robert M Johnson 344-0909
Lyndon B Johnson 933-1423
Samuel H Johnson 754-2542
Michael K Loukides 112-2535
Jerry O Peek 267-2345
Timothy F O'Reilly 443-2434
According to standard "telephone book rules," we want these names sorted by last name, first name, and middle initial. We don't want the phone number to play a part in the sorting. So we want to start sorting on field 2, stop sorting on field 3, continue sorting on field 0, sort on field 1, and (just to make sure) stop sorting on field 2 (the last name). We can code this as follows:
% sort +2 -3 +0 -2 phonelist
Lyndon B Johnson 933-1423
Robert M Johnson 344-0909
Samuel H Johnson 754-2542
Michael K Loukides 112-2535
Timothy F O'Reilly 443-2434
Jerry O Peek 267-2345
A few notes:
We need the -3 option to prevent sort from sorting on the telephone number after sorting on the last name. Without -3, the "Robert Johnson" entry would appear before "Lyndon Johnson" because it has a lower phone number.
We don't need to state +1 explicitly. Unless you give an explicit "stop" field, +1 is implied after +0.
If two names are completely identical, we probably don't care what happens next. However, just to be sure that something unexpected doesn't take place, we end the option list with -2, which says, "After sorting on the middle initial, don't do any further sorting."
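If your sort rejects the +n and -n notation (POSIX marks it obsolete, and many current versions want -k instead), the same telephone-book sort can be sketched with -k keys. This is a minimal sketch, not the article's own command: -k counts fields from 1 rather than 0, so +2 -3 becomes -k 3,3 and +0 -2 becomes -k 1,2. The temporary file, the variable names, and the LC_ALL=C setting (used only to make the ordering predictable) are assumptions for illustration.

```shell
# Assumption: force byte-order collation so results do not depend on the locale.
export LC_ALL=C

# Recreate the phonelist file from the article in a temporary file.
phonelist=$(mktemp)
cat > "$phonelist" <<'EOF'
Robert M Johnson 344-0909
Lyndon B Johnson 933-1423
Samuel H Johnson 754-2542
Michael K Loukides 112-2535
Jerry O Peek 267-2345
Timothy F O'Reilly 443-2434
EOF

# Last name only (field 3), then first name and middle initial (fields 1-2).
with_stop=$(sort -k 3,3 -k 1,2 "$phonelist")

# Same, but with no stop field on the first key: it now runs from the
# last name to the end of the line, so the phone number joins in.
without_stop=$(sort -k 3 -k 1,2 "$phonelist")

printf '%s\n' "$with_stop"
printf '%s\n' '---'
printf '%s\n' "$without_stop"

rm -f "$phonelist"
```

The first listing should start with Lyndon B Johnson, as in the output above; the second should start with Robert M Johnson, which is the effect the first note describes.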
There are a couple of variations that are worth mentioning. You may never need them unless you're really serious about sorting data files, but it's good to keep them in the back of your mind. First, you can add any "collation" operations (discard blanks, numeric sort, etc.) to the end of a field specifier to describe how you want that field sorted. Using our previous example, let's say that if two names are identical, you want them sorted in numeric phone number order. The following command does the trick:
% sort +2 -3 +0 -2 +3n phonelist
The +3n option says "do a numeric sort on the fourth field." If you're worried about initial blanks (perhaps some of the phone numbers have area codes), use +3nb.
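Here is a rough sketch of the numeric tie-breaker in -k notation, for versions of sort that no longer accept +3n. The key -k 4,4n is assumed to play the role of +3n (fields count from 1 with -k). The second "Robert M Johnson" line and its number are made up purely to force a tie on the name; the temporary file and variable names are also just for illustration.

```shell
# Assumption: force byte-order collation so results do not depend on the locale.
export LC_ALL=C

phonelist=$(mktemp)
cat > "$phonelist" <<'EOF'
Robert M Johnson 344-0909
Lyndon B Johnson 933-1423
Robert M Johnson 98-7654
EOF
# The 98-7654 entry is hypothetical; it exists only to create a duplicate name.

# Last name, then first name and initial, then the phone number compared
# as a number (sort reads the digits up to the hyphen: 98 and 344).
by_number=$(sort -k 3,3 -k 1,2 -k 4,4n "$phonelist")
printf '%s\n' "$by_number"

rm -f "$phonelist"
```

With a plain character comparison, 344-0909 would come before 98-7654 because "3" sorts before "9"; the n modifier is what puts 98 first.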
Second, you can specify individual columns within any field for sorting, using the notation +n.c, where n is a field number, and c is a character position within the field. Likewise, the notation -n.c says "stop sorting at the character before character c." If you're counting characters, be sure to use the -b (ignore leading blanks) option; otherwise, the blanks in front of a field count as characters, and it will be very difficult to figure out what character you're counting.
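As an illustration of character positions, here is a sketch that sorts the phone list on the last four digits of each number. It uses the -k field.character form rather than +n.c, and with -k both fields and characters are counted from 1. With -b in effect, character 1 of field 4 is the first digit, so the four digits after the hyphen are characters 5 through 8. The choice of key, the temporary file, and the variable names are assumptions made for the example.

```shell
# Assumption: force byte-order collation so results do not depend on the locale.
export LC_ALL=C

phonelist=$(mktemp)
cat > "$phonelist" <<'EOF'
Robert M Johnson 344-0909
Lyndon B Johnson 933-1423
Samuel H Johnson 754-2542
Michael K Loukides 112-2535
Jerry O Peek 267-2345
Timothy F O'Reilly 443-2434
EOF

# -b makes character counting start at the first non-blank character of
# the field; the key is characters 5 through 8 of field 4.
by_suffix=$(sort -b -k 4.5,4.8 "$phonelist")
printf '%s\n' "$by_suffix"

rm -f "$phonelist"
```

Leave out -b and the blank in front of each phone number counts as character 1, which shifts the key one position to the left.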
-- ML
Copyright © 2003 O'Reilly & Associates. All rights reserved.