8.3. Signals

We said earlier that typing CTRL-Z to suspend a job is similar to typing CTRL-C to stop a job, except that you can resume the job later. They are actually similar in a deeper way: both are particular cases of the act of sending a signal to a process.

A signal is a message that one process sends to another when some abnormal event takes place or when it wants the other process to do something. Most often, a process sends a signal to a subprocess it created. You're undoubtedly already comfortable with the idea that one process can communicate with another through an I/O pipeline; think of a signal as another way for processes to communicate with each other. (In fact, any textbook on operating systems will tell you that both are examples of the general concept of interprocess communication, or IPC.)[117]

[117] Pipes and signals were the only IPC mechanisms in early versions of Unix. More modern versions have additional mechanisms, such as sockets, named pipes, and shared memory. Named pipes are accessible to shell programmers through the mkfifo(1) command, which is beyond the scope of this book.

Depending on the version of Unix, there are two or three dozen types of signals, including a few that can be used for whatever purpose a programmer wishes. Signals have numbers (from 1 to the number of signals the system supports) and names; we'll use the latter. You can get a list of all the signals on your system by typing kill -l. Bear in mind, when you write shell code involving signals, that signal names are more portable to other versions of Unix than signal numbers.
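As a small illustration of names versus numbers, the sketch below asks kill -l to translate a number into a name. It assumes that TERM is signal 15; that holds on every common Unix system, but it is exactly the kind of number a portable script should avoid hard-coding.

```shell
# Translate a signal number into its name.
# Assumption: 15 is TERM on this system, as it is on all common Unix versions.
name=$(kill -l 15)
echo "signal 15 is $name"

# In scripts, prefer the name to the number:
#     kill -s TERM pid      rather than      kill -15 pid
```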

8.3.1. Control-Key Signals

When you type CTRL-C, you tell the shell to send the INT (for "interrupt") signal to the current job; CTRL-Z sends TSTP (for "terminal stop"). You can also send the current job a QUIT signal by typing CTRL-\ (control-backslash); this is sort of like a "stronger" version of CTRL-C.[118] You would normally use CTRL-\ when (and only when) CTRL-C doesn't work.

[118] CTRL-\ can also cause the running program to leave a file called core in the program's current directory. This file contains an image of the process to which you sent the signal; a programmer could use it to help debug the program that was running. The file's name is a (very) old-fashioned term for a computer's memory. Other signals leave these "core dumps" as well; you should feel free to delete them unless a systems programmer tells you otherwise.

As we'll see soon, there is also a "panic" signal called KILL that you can send to a process when even CTRL-\ doesn't work. But it isn't attached to any control key, which means that you can't use it to stop the currently running process. INT, TSTP, and QUIT are the only signals you can use with control keys (although some systems have additional control-key signals).

You can customize the control keys used to send signals with options of the stty(1) command. These vary from system to system -- consult your man page for the command -- but the usual syntax is stty signame char. signame is a name for the signal that, unfortunately, is often not the same as the names we use here. Table 1-7 in Chapter 1 lists stty names for signals and tty-driver actions found on all modern versions of Unix. char is the control character, which you can give in the same notation we use. For example, to set your INT key to CTRL-X on most systems, use:

stty intr ^X

Now that we've told you how to do this, we should add that we don't recommend it. Changing your signal keys could lead to trouble if someone else has to stop a runaway process on your machine.
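If you do want to experiment, a cautious approach is to save the terminal settings first and restore them afterwards. The following is only a sketch: it relies on stty -g (print the settings in a form that stty can read back) and stty -a (print all settings), and the variable names saved and result are ours, chosen for illustration.

```shell
# Only touch terminal settings when standard input really is a terminal.
if [ -t 0 ]; then
    saved=$(stty -g)        # save every current setting in a restorable form
    stty intr '^X'          # temporarily move INT to CTRL-X
    stty -a                 # show all settings, including the intr key
    stty "$saved"           # put everything back the way it was
    result=restored
else
    result=skipped          # no terminal (e.g., run from a batch job)
fi
echo "terminal settings: $result"
```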

Most of the other signals are used by the operating system to advise processes of error conditions, like a bad machine code instruction, bad memory address, division by zero, or other events such as input being available on a file descriptor or a timer ("alarm" in Unix terminology) going off. The remaining signals are used for esoteric error conditions that are of interest only to low-level systems programmers; newer versions of Unix have more and more arcane signal types.

8.3.2. kill

You can use the built-in shell command kill to send a signal to any process you've created -- not just the currently running job. kill takes as argument the process ID, job number, or command name of the process to which you want to send the signal. By default, kill sends the TERM ("terminate") signal, which usually has the same effect as the INT signal that you send with CTRL-C. But you can specify a different signal by using the -s option and the signal name, or the -n option and a signal number.

kill is so named because of the nature of the default TERM signal, but there is another reason, which has to do with the way Unix handles signals in general. The full details are too complex to go into here, but the following explanation should suffice.

Most signals cause a process that receives them to roll over and die; therefore, if you send any one of these signals, you "kill" the process that receives it. However, programs can be set up to "trap" specific signals and take some other action. For example, a text editor would do well to save the file being edited before terminating when it receives a signal such as INT, TERM, or QUIT. Determining what to do when various signals come in is part of the fun of Unix systems programming.
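To make the idea of trapping concrete, here is a minimal sketch that uses the shell's own trap built-in, which this section does not otherwise cover. A background subshell plays the role of the editor: when it receives TERM, it "saves its work" (writes one line to a scratch file) and then exits cleanly. The file name and the message are invented for the example.

```shell
# Scratch file standing in for the "file being edited" (hypothetical name).
tmp=${TMPDIR:-/tmp}/trapdemo.$$

(
    # On TERM, save something and exit with status 0 instead of just dying.
    trap 'echo "saved before exit" > "$tmp"; exit 0' TERM
    while :; do sleep 1; done
) &
pid=$!

sleep 1                 # give the subshell time to install its trap
kill -s TERM "$pid"
wait "$pid"
status=$?               # 0, because the trap chose the exit status

msg=$(cat "$tmp")
rm -f "$tmp"
echo "$msg (exit status $status)"
```

Because the subshell caught the signal and exited on its own terms, its exit status is an ordinary one rather than a "death by signal" value.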

Here is an example of kill. Say you have a fred process in the background, with process ID 480 and job number 1, that needs to be stopped. You would start with this command:

kill %1

If you were successful, you would see a message like this:

[1] + Terminated                fred &

If you don't see this, then the TERM signal failed to terminate the job. The next step would be to try QUIT:

kill -s QUIT %1

If that worked, you would see this message:

[1] + Quit(coredump)           fred &

The shell indicates the signal that killed the program ("Quit") and the fact that it produced a core file. When a program exits normally, the exit status it returns to the shell is a value between 0 and 255. When a program dies from having been sent a signal, it exits, not with a status value of its own choosing, but rather with the status 256+N, where N is the number of the signal it received. (With ksh88 and most other shells, normal exit statuses are between 0 and 127, and the "death by signal" exit status is 128+N. Caveat emptor.)
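You can watch this happen with a short experiment. The sketch below starts a sleep in the background, sends it TERM, and then inspects the exit status that wait reports. The exact number depends on the shell (256+N or 128+N, as described above), so the example lets kill -l decode it rather than doing the arithmetic itself.

```shell
sleep 30 &                      # a harmless stand-in for a runaway job
pid=$!

kill -s TERM "$pid"
wait "$pid"
status=$?                       # "death by signal" status; value is shell-dependent

signame=$(kill -l "$status")    # kill -l accepts an exit status and names the signal
echo "exit status $status means signal $signame"
```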

If even QUIT doesn't work, the last-ditch method would be to use KILL:

kill -s KILL %1

(Notice how this has the flavor of "yelling" at the runaway process.) This produces the message:

[1] + Killed                    fred &

It is impossible for a process to trap a KILL signal -- the operating system should terminate the process immediately and unconditionally. If it doesn't, then either your process is in one of the "funny states" that we'll see later in this chapter, or (far less likely) there's a bug in your version of Unix.
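The difference between a signal that can be ignored and one that cannot is easy to demonstrate. In this sketch, a subshell arranges to ignore TERM and then becomes a sleep process (an ignored signal stays ignored across exec). TERM has no effect on it, but KILL does. The variable names are ours.

```shell
# Start a process that ignores TERM.
( trap '' TERM; exec sleep 30 ) &
pid=$!
sleep 1                         # let the subshell set things up

kill -s TERM "$pid"
sleep 1
# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    survived=yes
else
    survived=no
fi

kill -s KILL "$pid"             # cannot be trapped or ignored
wait "$pid"
signame=$(kill -l "$?")
echo "survived TERM: $survived; finally ended by $signame"
```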

On job-control systems, there is an additional uncatchable signal: STOP. This is like TSTP, in that it suspends the targeted job. But unlike TSTP, it cannot be caught or ignored. It is a more drastic signal than TSTP, but less so than QUIT or TERM, since a stopped job may still be continued with fg or bg. The Korn shell provides the predefined alias stop='kill -s STOP' to make stopping jobs easier.
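Here is a sketch of STOP in action outside of interactive job control. A background loop appends a line to a log file every second; STOP freezes it, and the CONT ("continue") signal, which is what fg and bg send on your behalf, lets it run again. The log file name is invented for the example, and the two-second pauses are arbitrary.

```shell
log=${TMPDIR:-/tmp}/stopdemo.$$
: > "$log"

( while :; do echo tick >> "$log"; sleep 1; done ) &
pid=$!
sleep 2

kill -s STOP "$pid"                     # suspend; cannot be caught or ignored
before=$(( $(wc -l < "$log") ))
sleep 2
frozen=$(( $(wc -l < "$log") ))         # unchanged while the job is stopped

kill -s CONT "$pid"                     # resume
sleep 2
resumed=$(( $(wc -l < "$log") ))        # growing again

kill -s TERM "$pid"
wait "$pid" 2>/dev/null
rm -f "$log"
echo "lines: $before before, $frozen while stopped, $resumed after continuing"
```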

Task 8-1 is another example of how to use the kill command.

Task 8-1

Write a function called killalljobs that kills all background jobs.[119]

[119] To test your understanding of how the shell works, answer this question: why can't this be done as a separate script?

The solution to this task is simple, relying on jobs -p:

function killalljobs {
    kill "$@" $(jobs -p)
}
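A quick way to try the function out is sketched below. It repeats the definition in the portable name() form so that the example is self-contained, starts two sleep commands as stand-in jobs, and then checks how each one ended. Any arguments you give are passed through to kill, so killalljobs -s QUIT would send QUIT instead of the default TERM.

```shell
killalljobs() {
    kill "$@" $(jobs -p)        # same body as above; "$@" carries any kill options
}

sleep 30 & first=$!             # two stand-in background jobs
sleep 30 & second=$!

killalljobs                     # sends the default TERM to both

wait "$first";  s1=$?
wait "$second"; s2=$?
sig1=$(kill -l "$s1")
sig2=$(kill -l "$s2")
echo "first job ended by $sig1, second job ended by $sig2"
```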

You may be tempted to use the KILL signal immediately, instead of trying TERM (the default) and QUIT first. Don't do this. TERM and QUIT are designed to give a process the chance to clean up before exiting, whereas KILL will stop the process, wherever it may be in its computation. Use KILL only as a last resort.

You can use the kill command with any process you create, not just jobs in the background of your current shell. For example, if you use a windowing system, then you may have several terminal windows, each of which runs its own shell. If one shell is running a process that you want to stop, you can kill it from another window -- but you can't refer to it with a job number because it's running under a different shell. You must instead use its process ID.

8.3.3. ps

This is probably the only situation in which a casual user would need to know the ID of a process. The command ps(1) gives you this information; however, it can give you lots of extra information that you must wade through as well.

ps is a complex command. It takes many options, some of which differ from one version of Unix to another. To add to the confusion, you may need different options on different Unix versions to get the same information! We will use options available on the two major types of Unix systems, those derived from System V (such as most of the versions for Intel x86 PCs, as well as Solaris, IBM's AIX and Hewlett-Packard's HP-UX) and BSD (Compaq's Ultrix, SunOS 4.x, and also GNU/Linux). If you aren't sure which kind of Unix version you have, try the System V options first.

You can invoke ps in its simplest form without any options. In this case, it prints a line of information about the current login shell and any processes running under it (i.e., background jobs). For example, if you invoked four background jobs, ps on System V-derived versions of Unix would produce output that looks something like this:

   PID TTY      TIME COMD
   146 pts/10   0:03 ksh
  2349 pts/10   0:03 fred
  2367 pts/10   0:17 bob
  2387 pts/10   0:06 george
  2389 pts/10   0:09 dave
  2390 pts/10   0:00 ps

The output on BSD-derived systems looks like this:

   PID TT STAT  TIME COMMAND
   146 10 S     0:03 /bin/ksh -i
  2349 10 R     0:03 fred
  2367 10 D     0:17 bob
  2387 10 S     0:06 george
  2389 10 R     0:09 dave
  2390 10 R     0:00 ps

(You can ignore the STAT column.) This is a bit like the jobs command. PID is the process ID; TTY (or TT) is the terminal (or pseudo-terminal, if you are using a windowing system) the process was invoked from; TIME is the amount of processor time (not real or "wall clock" time) the process has used so far; COMD (or COMMAND) is the command. Notice that the BSD version includes the command's arguments, if any; also notice that the first line reports on the parent shell process, and in the last line, ps reports on itself.

ps without arguments lists all processes started from the current terminal or pseudo-terminal. But since ps is not a shell command, it doesn't correlate process IDs with the shell's job numbers. It also doesn't help you find the ID of the runaway process in another shell window.
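Within a single shell, the jobs command can make that correlation for you: jobs -l prints each job's process ID next to its job number. The sketch below starts one stand-in job and shows the listing; the exact layout of the line varies from shell to shell.

```shell
sleep 30 &                  # a stand-in background job
pid=$!

listing=$(jobs -l)          # job number, process ID, state, and command
echo "$listing"

kill "$pid"                 # clean up
wait "$pid" 2>/dev/null
```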

To get this information, use ps -a (for "all"); this lists information on a different set of processes, depending on your Unix version.

8.3.3.1. System V

Instead of listing all processes that were started under a specific terminal, ps -a on System V-derived systems lists all processes associated with any terminal that aren't group leaders. For our purposes, a "group leader" is the parent shell of a terminal or window. Therefore, if you are using a windowing system, ps -a lists all jobs started in all windows (by all users), but not their parent shells.

Assume that, in the above example, you have only one terminal or window. Then ps -a prints the same output as plain ps except for the first line, since that's the parent shell. This doesn't seem to be very useful.

But consider what happens when you have multiple windows open. Let's say you have three windows, all running terminal emulators like xterm for the X Window System. You start background jobs fred, bob, and dave in windows with pseudo-terminal numbers 1, 2, and 3, respectively. This situation is shown in Figure 8-1.

Figure 8-1. Background jobs in multiple windows

Assume you are in the uppermost window. If you type ps, you see something like this:

   PID TTY      TIME COMD
   146 pts/1    0:03 ksh
  2349 pts/1    0:03 fred
  2390 pts/1    0:00 ps

But if you type ps -a, you see this:

   PID TTY      TIME COMD
  2349 pts/1    0:03 fred
  2367 pts/2    0:17 bob
  2389 pts/3    0:09 dave
  2390 pts/1    0:00 ps

Now you should see how ps -a can help you track down a runaway process. If it's dave, you can type kill 2389. If that doesn't work, try kill -s QUIT 2389, or in the worst case, kill -s KILL 2389.
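If you would rather not pick the process ID out by eye, a short awk filter can do it. The sketch below works on a copy of the listing shown above so that its result is predictable; in real use you would pipe the output of ps -a into awk instead. It assumes the four-column System V layout, with the command name in the last column.

```shell
# A copy of the sample ps -a listing from the text.
ps_output='   PID TTY      TIME COMD
  2349 pts/1    0:03 fred
  2367 pts/2    0:17 bob
  2389 pts/3    0:09 dave
  2390 pts/1    0:00 ps'

# Skip the header line, then print the PID of the line whose command is dave.
pid=$(printf '%s\n' "$ps_output" |
      awk -v name=dave 'NR > 1 && $4 == name { print $1 }')

echo "would run: kill $pid"
```

If more than one process has the same name, this prints several process IDs, so check the result before passing it to kill.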

8.3.3.2. BSD

On BSD-derived systems,[120] ps -a lists all jobs that were started on any terminal; in other words, it's a bit like concatenating the results of plain ps for every user on the system. Given the above scenario, ps -a will show you all processes that the System V version shows, plus the group leaders (parent shells).

[120] ps on GNU/Linux systems acts like the BSD version.

Unfortunately, ps -a (on any version of Unix) will not report processes that are in certain pathological conditions where they "forget" things like what shell invoked them and what terminal they belong to. Such processes have colorful names (zombies, orphans) that are actually used in Unix technical literature, not just informally by professional systems programmers. If you have a serious runaway process problem, it's possible that the process has entered one of these states.

Let's not worry about why or how a process gets this way. All you need to understand is that the process doesn't show up when you type ps -a. You need another option to ps to see it: on System V, it's ps -e ("everything"), whereas on BSD, it's ps -ax.

These options tell ps to list processes that either weren't started from terminals or "forgot" what terminal they were started from. The former category includes lots of processes that you probably didn't even know existed: these include basic processes that run the system and so-called daemons (pronounced "demons") that handle system services like mail, printing, network file systems, etc.

In fact, the output of ps -e or ps -ax is an excellent source of education about Unix system internals, if you're curious about them. Run the command on your system and, for each line of the listing that looks interesting, invoke man on the process name or look it up in the Unix Programmer's Manual for your system.

User shells and processes are listed at the very bottom of ps -e or ps -ax output; this is where you should look for runaway processes. Notice that many processes in the listing have ? instead of a terminal. Either these aren't supposed to have one (such as the basic daemons) or they're runaways. Therefore it's likely that if ps -a doesn't find a process you're trying to kill, ps -e (or ps -ax) will list it with ? in the TTY (or TT) column. You can determine which process you want by looking at the COMD (or COMMAND) column.

8.3.4. kill: The Full Story

The kill command is really misnamed. It should have been called sendsignal or something similar, since it sends signals to processes. (The name in fact derives from the kill(2) system call, which the kill command uses to send signals, and which is similarly misnamed.)

As we saw earlier, kill -l gives you the full list of available signal names on your system. The behavior of the built-in version of kill has been considerably rationalized in ksh93. The options and what they do are summarized in Table 8-2.

Table 8-2. Options for kill

Option                          Meaning
kill job ...                    Send the TERM signal to each named job. This is the normal usage.
kill -l                         List the names of all supported signals.
kill -l signal ...              When signal is a number, print its name. If it's a name, print its number. If signal is a number greater than 256, it's treated as an exit status. The shell subtracts 256 and prints the corresponding signal.
kill -s signal-name job ...     Send the signal named by signal-name to each given job.
kill -n signal-number job ...   Send the numeric signal given by the signal-number to each given job.
kill -signal job ...            Send the signal specified by signal to each given job. signal may be either a number or a signal name. This form is considered to be obsolete; it is provided for compatibility with ksh88 and the external kill(1) command.

One place to take advantage of kill's ability to turn a number into a name is in issuing diagnostics. When a job dies due to a signal, the exit status is 256 plus the signal number. Thus, you might use code like this to produce a meaningful diagnostic from within a script:

es=$?        # save exit status of the job that just finished
if ((es >= 256)); then        # ksh93 reports death by signal as 256 + signal number
    print "job received signal $(kill -l $((es - 256)))"
fi


Copyright © 2003 O'Reilly & Associates. All rights reserved.