The new features introduced by Apache 2.0 and the Perl 5.6 and 5.8 generations provide the base of the new mod_perl 2.0 features. In addition, mod_perl 2.0 reimplements itself from scratch, providing such new features as a new build and testing framework. Let's look at the major changes since mod_perl 1.0.
In order to adapt to the Apache 2.0 threads architecture (for threaded MPMs), mod_perl 2.0 needs to use thread-safe Perl interpreters, also known as ithreads (interpreter threads). This mechanism is enabled at compile time and ensures that each Perl interpreter instance is reentrant—that is, multiple Perl interpreters can be used concurrently within the same process without locking, as each instance has its own copy of any mutable data (symbol tables, stacks, etc.). This of course requires that each Perl interpreter instance is accessed by only one thread at any given time.
The first mod_perl generation has only a single PerlInterpreter, which is constructed by the parent process, then inherited across the forks to child processes. mod_perl 2.0 has a configurable number of PerlInterpreters and two classes of interpreters, parent and clone. A parent is like in mod_perl 1.0, where the main interpreter created at startup time compiles any preloaded Perl code. A clone is created from the parent using the Perl API perl_clone( ) function. At request time, parent interpreters are used only for making more clones, as the clones are the interpreters that actually handle requests. Care is taken by Perl to copy only mutable data, which means that no runtime locking is required and read-only data such as the syntax tree is shared from the parent, which should reduce the overall mod_perl memory footprint.
Rather than creating a PerlInterperter for each thread, by default mod_perl creates a pool of interpreters. The pool mechanism helps cut down memory usage a great deal. As already mentioned, the syntax tree is shared between all cloned interpreters. If your server is serving more than just mod_perl requests, having a smaller number of PerlInterpreters than the number of threads will clearly cut down on memory usage. Finally, perhaps the biggest win is memory reuse: as calls are made into Perl subroutines, memory allocations are made for variables when they are used for the first time. Subsequent use of variables may allocate more memory; e.g., if a scalar variable needs to hold a longer string than it did before, or an array has new elements added. As an optimization, Perl hangs onto these allocations, even though their values go out of scope. mod_perl 2.0 has much better control over which PerlInterpreters are used for incoming requests. The interpreters are stored in two linked lists, one for available interpreters and another for busy ones. When needed to handle a request, one interpreter is taken from the head of the available list, and it's put back at the head of the same list when it's done. This means that if, for example, you have ten interpreters configured to be cloned at startup time, but no more than five are ever used concurrently, those five continue to reuse Perl's allocations, while the other five remain much smaller, but ready to go if the need arises.
The interpreters pool mechanism has been abstracted into an API known as tipool (thread item pool). This pool, currently used to manage a pool of PerlInterpreter objects, can be used to manage any data structure in which you wish to have a smaller number of items than the number of configured threads.
It's important to notice that the Perl ithreads implementation ensures that Perl code is thread-safe, at least with respect to the Apache threads in which it is running. However, it does not ensure that functions and extensions that call into third-party C/C++ libraries are thread-safe. In the case of non-thread-safe extensions, if it is not possible to fix those routines, care needs to be taken to serialize calls into such functions (either at the XS or Perl level). See Perl 5.8.0's perlthrtut manpage.
Note that while Perl data is thread-private unless explicitly shared and threads themselves are separate execution threads, the threads can affect process-scope state, affecting all the threads. For example, if one thread does chdir("/tmp"), the current working directory of all threads is now /tmp. While each thread can correct its current working directory by storing the original value, there are functions whose process-scope changes cannot be undone. For example, chroot( ) changes the root directory of all threads, and this change is not reversible. Refer to the perlthrtut manpage for more information.
As we mentioned earlier, Apache 2.0 uses two APIs:
The Apache Portable Runtime (APR) API, which implements a portable and efficient API to generically work with files, threads, processes, shared memory, etc.
The Apache API, which handles issues specific to the web server
mod_perl 2.0 provides its own very flexible special-purpose XS code generator, which is capable of doing things none of the existing generators can handle. It's possible that in the future this generator will be generalized and used for other projects of a high complexity.
This generator creates the Perl glue code for the public APR and Apache APIs, almost without a need for any extra code (just a few thin wrappers to make the API more Perlish).
Since APR can be used outside of Apache, the Perl APR:: modules can be used outside of Apache as well.
In addition to the already mentioned new features in mod_perl 2.0, the following are of major importance:
Apache 2.0 protocol modules are supported. Later we will see an example of a protocol module running on top of mod_perl 2.0.
mod_perl 2.0 provides a very simple-to-use interface to the Apache filtering API; this is of great interest because in mod_perl 1.0 the Apache::Filter and Apache::OutputChain modules, used for filtering, had to go to great lengths to implement filtering and couldn't be used for filtering output generated by non-Perl modules. Moreover, incoming-stream filtering has now become possible. We will discuss filtering and see a few examples later on.
A feature-full and flexible Apache::Test framework was developed especially for mod_perl testing. While intended to test the core mod_perl features, it is also used by third-party module writers to easily test their modules. Moreover, Apache::Test was adopted by Apache and is currently used to test the Apache 1.3, 2.0, and other ASF projects. Anything that runs on top of Apache can be tested with Apache::Test, whether the target is written in Perl, C, PHP, etc.
The support of the new MPMs makes mod_perl 2.0 able to scale better on a wider range of platforms. For example, if you've happened to try mod_perl 1.0 on Win32 you probably know that parallel requests had to be serialized—i.e., only a single request could be processed at a time, rendering the Win32 platform unusable with mod_perl as a heavy production service. Thanks to the new Apache MPM design, mod_perl 2.0 can now efficiently process parallel requests on Win32 platforms (using its native win32 MPM).
mod_perl 2.0 provides new configuration directives for the newly added features and improves upon existing ones. For example, the PerlOptions directive provides fine-grained configuration for what were compile-time only options in the first mod_perl generation. The Perl*FilterHandler directives provide a much simpler Apache filtering API, hiding most of the details underneath. We will talk in detail about these and other options in the section Section 24.5.
The new Apache::Directive module provides a Perl interface to the Apache configuration tree, which is another new feature in Apache 2.0.
The rewrite of mod_perl gives us a chance to build a smarter, stronger, and faster implementation based on lessons learned over the years since mod_perl was introduced. There are some optimizations that can be made in the mod_perl source code, some that can be made in the Perl space by optimizing its syntax tree, and some that are a combination of both.
Copyright © 2003 O'Reilly & Associates. All rights reserved.