Now that you know how CGI works, let's talk about how Apache implements mod_cgi. This is important because it will help you understand the limitations of mod_cgi and why mod_perl is such a big improvement. This discussion will also build a foundation for the rest of the performance chapters of this book.
Apache 1.3 on all Unix flavors uses the forking model.[8] When you start the server, a single process, called the parent process, is started. Its main responsibility is starting and killing child processes as needed. Various Apache configuration directives let you control how many child processes are spawned initially, the number of spare idle processes, and the maximum number of processes the parent process is allowed to fork.
[8] In Chapter 24 we talk about Apache 2.0, which introduces a few more server models.
Each child process has its own lifespan, which is controlled by the configuration directive MaxRequestsPerChild. This directive specifies the number of requests a child serves before it is instructed to step down and is replaced by another process. Figure 1-3 illustrates this lifecycle.
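To make the lifespan rule concrete, here is a minimal Python sketch of a single child slot being recycled. It is a toy model rather than Apache's code: the `Child` class, the `serve` function, and the starting pid of 1000 are all hypothetical names chosen for illustration. The only behavior taken from Apache is that a limit of 0 means the child never expires.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class Child:
    pid: int          # hypothetical pid, for illustration only
    served: int = 0   # requests handled so far by this child


def serve(total_requests: int, max_requests_per_child: int) -> List[int]:
    """Return the pid that handled each request, in order.

    Models one child slot: once a child has served
    max_requests_per_child requests it steps down and a fresh
    child takes its place. A limit of 0 means "never expire".
    """
    pids = itertools.count(1000)
    child = Child(next(pids))
    handled_by = []
    for _ in range(total_requests):
        handled_by.append(child.pid)
        child.served += 1
        if max_requests_per_child and child.served >= max_requests_per_child:
            child = Child(next(pids))  # the old child is replaced
    return handled_by


print(serve(7, 3))  # three requests per child, then a replacement
```

With a limit of 3, the first three requests go to one child, the next three to its replacement, and so on.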
When a client initiates a request, the parent process checks whether there is an idle child process and, if so, tells it to handle the request. If there are no idle processes, the parent checks whether it is allowed to fork more processes. If it is, a new process is forked to handle the request. Otherwise, the incoming request is queued until a child process becomes available to handle it.
The maximum number of queued requests is set by the ListenBacklog configuration directive. When this limit is reached, a client issuing a new request cannot connect and sees an error indicating that the server is unreachable.
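The decision sequence just described (use an idle child, otherwise fork if allowed, otherwise queue, otherwise turn the client away) can be sketched as a small Python model. This follows the description in the text, not Apache's source code. The `Parent` class, its method names, and the `max_children` parameter are hypothetical; `listen_backlog` stands in for the ListenBacklog directive.

```python
from collections import deque


class Parent:
    """Toy model of the parent's bookkeeping; not Apache code."""

    def __init__(self, max_children: int, listen_backlog: int):
        self.max_children = max_children      # hypothetical name for the fork limit
        self.listen_backlog = listen_backlog  # models ListenBacklog
        self.idle = 0                         # children waiting for work
        self.busy = 0                         # children serving a request
        self.queue = deque()                  # requests waiting for a child

    def dispatch(self, request: str) -> str:
        if self.idle > 0:                     # 1. reuse an idle child
            self.idle -= 1
            self.busy += 1
            return "idle child"
        if self.idle + self.busy < self.max_children:
            self.busy += 1                    # 2. fork a new child
            return "forked"
        if len(self.queue) < self.listen_backlog:
            self.queue.append(request)        # 3. wait in the queue
            return "queued"
        return "refused"                      # 4. backlog is full

    def finish(self):
        """A busy child completes its request.

        It picks up the oldest queued request if there is one;
        otherwise it becomes idle. Returns the request taken, or None.
        """
        if self.queue:
            return self.queue.popleft()
        self.busy -= 1
        self.idle += 1
        return None


parent = Parent(max_children=2, listen_backlog=1)
outcomes = [parent.dispatch(name) for name in ("a", "b", "c", "d")]
print(outcomes)
```

With room for two children and one queued request, the first two requests cause forks, the third waits in the queue, and the fourth is turned away.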
This is how requests for static objects, such as HTML documents and images, are processed. When a CGI request is received, an additional step is performed: mod_cgi in the child Apache process forks a new process to execute the CGI script. When the script has completed processing the request, the forked process exits.
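The extra step that mod_cgi performs can be imitated in a few lines of Python. This is only an analogy, assuming a Python script stands in for the CGI script: the `handle_cgi_request` function and the `CGI_SCRIPT` string are hypothetical, and a real server would also pass the request details to the script through environment variables, which this sketch omits.

```python
import os
import subprocess
import sys

# A stand-in for a CGI script: it emits a header, a blank line, and a body.
CGI_SCRIPT = """\
import os
print("Content-Type: text/plain")
print()
print("pid", os.getpid())
"""


def handle_cgi_request(script_source: str):
    """Run the script in a brand-new process and collect its output.

    The new process (and its interpreter) exists only for this one
    request and exits as soon as the script finishes.
    """
    proc = subprocess.run(
        [sys.executable, "-c", script_source],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


status, body = handle_cgi_request(CGI_SCRIPT)
print("server pid:", os.getpid())
print(body)
```

The pid reported by the script differs from the pid of the process that launched it, which is the point: the script never runs inside the server process itself.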
One of the benefits of this model is that if something causes the child process to die (e.g., a badly written CGI script), it won't cause the whole service to fail. In fact, only the client that initiated the request will notice there was a problem.
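The isolation benefit can be demonstrated with a rough sketch in which one of three scripts fails. The `run_isolated` function and the mapping of a failed script to a 500 status are assumptions made for illustration; they are not how mod_cgi is implemented.

```python
import subprocess
import sys


def run_isolated(script_source: str):
    """Run a script in its own process; a failure stays in that process."""
    proc = subprocess.run(
        [sys.executable, "-c", script_source],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Only this request sees the problem.
        return 500, "Internal Server Error"
    return 200, proc.stdout.strip()


scripts = [
    'print("ok")',
    'raise RuntimeError("bug in script")',  # a badly written script
    'print("still serving")',
]
responses = [run_isolated(source) for source in scripts]
print(responses)
```

The failing script affects only its own response. The requests before and after it are served normally, because the process that runs the loop is never touched by the failure.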
Many free (and non-free) CGI scripts are badly written, but they still work, which is why no one tries to improve them. Examples of poor CGI programming practices include forgetting to close open files, using uninitialized global variables, ignoring the warnings Perl generates, and forgetting to turn on taint checks (thus creating huge security holes that are happily used by crackers to break into online systems).
Why do these sloppily written scripts work under mod_cgi? The reason lies in the way mod_cgi invokes them: every time a Perl CGI script is run, a new process is forked, and a new Perl interpreter is loaded. This Perl interpreter lives for the span of the request's life, and when the script exits (no matter how), the process and the interpreter exit as well, cleaning up on the way. When a new interpreter is started, it has no history of previous requests. All the variables are created from scratch, and all the files are reopened if needed. Although this detail may seem obvious, it will be of paramount importance when we discuss mod_perl.
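The "no history" property is easy to see in a small experiment. The sketch below uses Python rather than Perl, and every name in it (`SLOPPY_SCRIPT`, `run_fresh`, `run_persistent`) is hypothetical. It runs the same careless script twice in fresh interpreters and then twice inside one long-lived namespace, which loosely approximates what happens when an interpreter persists between requests.

```python
import contextlib
import io
import subprocess
import sys

# A careless script: it relies on a global that it never resets.
SLOPPY_SCRIPT = """\
counter = globals().get("counter", 0) + 1
print(counter)
"""


def run_fresh(source: str) -> str:
    """Run the script in a new interpreter, as mod_cgi would."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


def run_persistent(source: str, namespace: dict) -> str:
    """Run the script inside a namespace that outlives the request."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue().strip()


fresh = [run_fresh(SLOPPY_SCRIPT) for _ in range(2)]

shared = {}  # survives between "requests"
persistent = [run_persistent(SLOPPY_SCRIPT, shared) for _ in range(2)]

print("fresh interpreter each time:", fresh)
print("persistent namespace:       ", persistent)
```

In a fresh interpreter the counter starts over on every run, so the sloppiness goes unnoticed. In the persistent namespace the value left behind by the first run leaks into the second.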
There are several drawbacks to mod_cgi that triggered the development of improved web technologies. The first problem lies in the fact that a new process is forked and a new Perl interpreter is loaded for each CGI script invocation. This has several implications:
It adds the overhead of forking, although this is almost insignificant on modern Unix systems.
Loading the Perl interpreter adds significant overhead to server response times.
The script's source code and the modules that it uses need to be loaded into memory and compiled each time from scratch. This adds even more overhead to response times.
Process termination on the script's completion makes it impossible to create persistent variables, which in turn prevents the establishment of persistent database connections and in-memory databases.
Starting a new interpreter removes the benefit of memory sharing that could be obtained by preloading code modules at server startup. Also, database connections can't be pre-opened at server startup.
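To get a feel for the interpreter startup cost, the following sketch times the same trivial response produced two ways: by launching a fresh interpreter, and by calling a function in an interpreter that is already running. It uses Python as a stand-in for Perl, the function names are hypothetical, and the timings depend heavily on the machine, so treat the numbers as an illustration rather than a benchmark.

```python
import subprocess
import sys
import time


def respond() -> str:
    """The "script": trivial on purpose, so startup cost dominates."""
    return "Hello"


def via_fresh_interpreter() -> str:
    """Produce the same response by starting a new interpreter."""
    proc = subprocess.run(
        [sys.executable, "-c", 'print("Hello")'],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


def timed(fn):
    """Return (result, elapsed seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


fresh_result, fresh_seconds = timed(via_fresh_interpreter)
warm_result, warm_seconds = timed(respond)

print(f"fresh interpreter: {fresh_seconds:.6f} s")
print(f"already running:   {warm_seconds:.6f} s")
```

Both paths return the same text. The difference in elapsed time is the price of creating a process and loading an interpreter for every request.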
Another drawback is limited functionality: mod_cgi allows developers to write only content handlers within CGI scripts. If you need to access the much broader core functionality Apache provides, such as authentication or URL rewriting, you must resort to third-party Apache modules written in C, which can make the production server environment cumbersome. More components require more administration work to keep the server in a healthy state.
Copyright © 2003 O'Reilly & Associates. All rights reserved.