A new feature of Apache 2.0 is the ability to create filters, as described in Chapter 6. These are modules (or parts of modules) that modify the output or input of other modules in some way. Over the course of Apache's development, it has often been said that filters could only be done in a threaded server, because then you can make the process look just like reading and writing files. Early attempts to do it without threading met the argument that the required "inside out" model would be too hard for most module writers to handle. So, when Apache 2.0 came along with threading as a standard feature, there was much rejoicing. But wait! Unfortunately, even in 2.0, there are platforms that don't handle threading and process models that don't use it even if the platform supports it. So, we were back at square one. But, strangely, a new confidence in the ability of module writers meant that people suddenly believed that they could handle the "inside out" programming model.[73] And so, bucket brigades were born.
[73]So called because, instead of simply reading input and writing output, one must be prepared to receive some input, then return before a complete chunk is available, and then get called again with the next bit, possibly several times before anything completes. This requires saving state between each invocation and is considerably more painful than straightforward sequential I/O.
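The footnote's point can be sketched in plain C: an incremental processor must stash its progress in a context structure between calls. This toy example (all names here are illustrative, not part of the Apache API) counts complete lines across arbitrarily split chunks:

```c
#include <stddef.h>

/* State that must survive between invocations -- in an Apache filter
 * this would live in the filter's ctx pointer. */
typedef struct {
    size_t lines;   /* complete lines seen so far */
    int midline;    /* are we currently in the middle of a line? */
} chunk_state;

/* Called once per chunk, possibly many times before the input is
 * complete; a line may straddle any number of chunk boundaries. */
static void process_chunk(chunk_state *st, const char *data, size_t len)
{
    size_t i;

    for (i = 0; i < len; ++i) {
        if (data[i] == '\n') {
            ++st->lines;
            st->midline = 0;
        }
        else
            st->midline = 1;
    }
}
```

The caller is free to split the input anywhere; the state structure is what makes the "inside out" style workable.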
The general concept is that each "layer" in the filter stack can talk to the next layer up (or down, depending on whether it is an input filter or an output filter) and deal with the I/O between them by handing up (or down) "bucket brigades," which are a list of "buckets." Each bucket can contain some data, which should be dealt with in order by the filter, which, in turn, generates new bucket brigades and buckets.
Of course, there is an obvious asymmetry between input filters and output filters. Despite its obviousness, it takes a bit of getting used to when writing filters. An output filter is called with a bucket brigade and told "here, deal with the contents of this." In turn, it creates new bucket brigades and hands them on to the downstream filters. In contrast, an input filter gets asked "could you please fill this brigade?" and must, in turn, call lower-level filters to seed the input.
Of course, there are special cases for the ends of brigades — the "bottom" end will actually receive or send data (often through a special bucket) and the "top" end will consume or generate data without any higher (for output) or lower (for input) filter feeding it.
Why do we have buckets and bucket brigades? Why not pass buckets between the filters and dispense with brigades? The simple answer is that it is likely that filters will generate more than one bucket from time to time and would then have to store the "extra" ones until needed. Why make each one do that — why not have a standard mechanism? Once that's agreed, it is then natural to hand the brigade between layers instead of the buckets — it reduces the number of calls that have to be made without increasing complexity at all.
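As a rough analogy — plain C, not the real APR types — a brigade is an ordered linked list of buckets that the consumer walks and destroys in turn, which is why a filter that produces several chunks at once can hand them all over in a single call:

```c
#include <stdlib.h>
#include <string.h>

/* A toy "bucket": one chunk of data in a singly linked list.
 * The real apr_bucket is far richer (types, splitting, morphing). */
typedef struct toy_bucket {
    const char *data;
    size_t len;
    struct toy_bucket *next;
} toy_bucket;

static toy_bucket *make_bucket(const char *s, toy_bucket *next)
{
    toy_bucket *b = malloc(sizeof *b);

    b->data = s;
    b->len = strlen(s);
    b->next = next;
    return b;
}

/* A toy "brigade" is just the head of the list; consuming it means
 * processing each bucket in order and destroying it. */
static size_t consume_brigade(toy_bucket *head)
{
    size_t total = 0;

    while (head) {
        toy_bucket *next = head->next;
        total += head->len;   /* "process" this bucket's data */
        free(head);           /* consuming destroys the bucket */
        head = next;
    }
    return total;
}
```

One call hands over the whole list, however many buckets the producer generated.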
The bucket interface is documented in srclib/apr-util/include/apr_buckets.h.
Buckets come in various flavors — currently there are file, pipe, and socket buckets. There are buckets that are simply data in memory, but even these have various types — transient, heap, pool, memory-mapped, and immortal. There are also special EOS (end of stream) and flush buckets. Even though all buckets provide a way to read the bucket data (or as much as is currently available) via apr_bucket_read( ) — which is actually more like a peek interface — it is still necessary to consume the data somehow, either by destroying the bucket, reducing it in size, or splitting it. The read can be chosen to be either blocking or nonblocking — in either case, if data is available, it will all be returned.
Note that because the data is not destroyed by the read operation, it may be necessary for the bucket to change type and/or add extra buckets to the brigade — for example, consider a socket bucket: when you read it, it will read whatever is currently available from the socket and replace itself with a memory bucket containing that data. It will also add a new socket bucket following the memory bucket. (It can't simply insert the memory bucket before the socket bucket — that way, you'd have no way to find the pointer to the memory bucket, or even know it had been created.) So, although the current bucket pointer remains valid, it may change type as a result of a read, and the contents of the brigade may also change.
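A toy simulation of this morphing behavior (again plain C, not the APR API): "reading" a source-type bucket converts it in place into a memory bucket holding the data just read, and inserts a fresh source bucket after it, so the caller's existing pointer remains valid:

```c
#include <stdlib.h>
#include <string.h>

typedef enum { MB_SOURCE, MB_MEMORY } morph_type;

typedef struct morph_bucket {
    morph_type type;
    char data[16];               /* payload for memory buckets (toy size) */
    size_t len;
    struct morph_bucket *next;
} morph_bucket;

/* "Read" bucket b: if it is a source bucket, morph it into a memory
 * bucket containing whatever data was available, and insert a new
 * source bucket after it so more can be read later. */
static void morph_read(morph_bucket *b, const char *available)
{
    if (b->type == MB_SOURCE) {
        morph_bucket *src = calloc(1, sizeof *src);

        src->type = MB_SOURCE;
        src->next = b->next;

        b->type = MB_MEMORY;     /* caller's pointer b stays valid... */
        b->len = strlen(available);
        memcpy(b->data, available, b->len);
        b->next = src;           /* ...but the list has grown behind it */
    }
}
```

This mirrors the socket-bucket case in the text: the pointer you held still points at the (now memory) bucket, and the new source bucket follows it in the brigade.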
Although one cannot destructively read from a brigade, one can write to one — there are lots of functions to do that, ranging from apr_brigade_putc( ) to apr_brigade_printf( ).
EOS buckets indicate the end of the current stream (e.g., the end of a request), and flush buckets indicate that the filter should flush any stored data (assuming it can, of course). It is vital to obey such instructions (and pass them on), as failure will often cause deadlocks.
An output filter is given a bucket brigade, does whatever it does, and hands a new brigade (or brigades) down to the next filter in the output filter stack. To be used at all, a filter must first be registered. This is normally done in the hook registering function by calling ap_register_output_filter( ), like so:
ap_register_output_filter("filter name",filter_function,AP_FTYPE_RESOURCE);
where the first parameter is the name of the filter — this can be used in the configuration file to specify when a filter should be used. The second is the actual filter function, and the third says what type of filter it is (the possible types being AP_FTYPE_RESOURCE, AP_FTYPE_CONTENT_SET, AP_FTYPE_PROTOCOL, AP_FTYPE_TRANSCODE, AP_FTYPE_CONNECTION or AP_FTYPE_NETWORK). In reality, all the type does is determine where in the stack the filter appears. The filter function is called by the filter above it in the stack, which hands it its filter structure and a bucket brigade.
Once the filter is registered, it can be invoked either by configuration, or for more complex cases, the module can decide whether to insert it in the filter stack. If this is desired, the thing to do is to hook the "insert filter" hook, which is called when the filter stack is being set up. A typical hook would look like this:
ap_hook_insert_filter(filter_inserter,NULL,NULL,APR_HOOK_MIDDLE);
where filter_inserter( ) is a function that decides whether to insert the filter, and if so, inserts it. To do the insertion of the filter, you call:
ap_add_output_filter("filter name",ctx,r,r->connection);
where "filter name" is the same name as was used to register the filter in the first place and r is the request structure. The second parameter, ctx in this example, is an optional pointer to a context structure to be set in the filter structure. This can contain arbitrary information that the module needs the filter function to know in the usual way. The filter can retrieve it from the filter structure it is handed on each invocation:
static apr_status_t filter_function(ap_filter_t *f,apr_bucket_brigade *pbbIn)
{
    filter_context *ctx=f->ctx;
where filter_context is a type you can choose freely (but had better match the type of the context variable you passed to ap_add_output_filter( )). The third and fourth parameters are the request and connection structures — the connection structure is always required, but the request structure is only needed if the filter applies to a single request rather than the whole connection.
As an example, I have written a complete output filter. This one is pretty frivolous — it simply converts the output to all uppercase. The current source should be available in modules/experimental/mod_case_filter.c. (Note that the comments to this example fall after the line(s) to which they refer.)
#include "httpd.h"
#include "http_config.h"
#include "apr_general.h"
#include "util_filter.h"
#include "apr_buckets.h"
#include "http_request.h"
#include <stdlib.h>
#include <ctype.h>
First, we include the necessary headers.
static const char s_szCaseFilterName[]="CaseFilter";
Next, we declare the filter name as a const string — it is used both to register the filter and later to insert it, so declaring it once avoids errors caused by typos.
module case_filter_module;
This is simply a forward declaration of the module structure.
typedef struct
{
    int bEnabled;
} CaseFilterConfig;
The module allows us to enable or disable the filter in the server configuration — if it is disabled, it doesn't get inserted into the output filter chain. Here's the structure where we store that info.
static void *CaseFilterCreateServerConfig(apr_pool_t *p,server_rec *s)
{
    CaseFilterConfig *pConfig=apr_pcalloc(p,sizeof *pConfig);

    pConfig->bEnabled=0;

    return pConfig;
}
This creates the server configuration structure (note that this means it must be a per-server option, not a location-dependent one). All modules that need per-server configuration must do this.
static void CaseFilterInsertFilter(request_rec *r)
{
    CaseFilterConfig *pConfig=ap_get_module_config(r->server->module_config,
                                                   &case_filter_module);

    if(!pConfig->bEnabled)
        return;

    ap_add_output_filter(s_szCaseFilterName,NULL,r,r->connection);
}
This function inserts the output filter into the filter stack — note that it does this purely by the name of the filter. It is also possible to insert the filter automatically by using the AddOutputFilter or SetOutputFilter directives.
static apr_status_t CaseFilterOutFilter(ap_filter_t *f,
                                        apr_bucket_brigade *pbbIn)
{
    apr_bucket *pbktIn;
    apr_bucket_brigade *pbbOut;

    pbbOut=apr_brigade_create(f->r->pool);
Since we are going to pass on data every time, we need to create a brigade to which to add the data.
APR_BRIGADE_FOREACH(pbktIn,pbbIn) {
Now loop over each of the buckets passed into us.
        const char *data;
        apr_size_t len;
        char *buf;
        apr_size_t n;
        apr_bucket *pbktOut;

        if(APR_BUCKET_IS_EOS(pbktIn))
        {
            apr_bucket *pbktEOS=apr_bucket_eos_create();
            APR_BRIGADE_INSERT_TAIL(pbbOut,pbktEOS);
            continue;
        }
If the bucket is an EOS, then pass it on down.
apr_bucket_read(pbktIn,&data,&len,APR_BLOCK_READ);
Read all the data in the bucket, blocking to ensure there actually is some!
buf=malloc(len);
Allocate a new buffer for the output data. (We need to do this because we may add another bucket to the brigade, so a transient bucket wouldn't do — its contents would get overwritten on the next pass through the loop.) However, we use a buffer on the heap rather than in the pool so it can be released as soon as we're finished with it.
        for(n=0 ; n < len ; ++n)
            buf[n]=toupper(data[n]);
Convert whatever data we read into uppercase and store it in the new buffer.
pbktOut=apr_bucket_heap_create(buf,len,0);
Create the new bucket, and add our data to it. The final 0 means "don't copy this, we've already allocated memory for it."
APR_BRIGADE_INSERT_TAIL(pbbOut,pbktOut);
And add it to the tail of the output brigade.
    }

    return ap_pass_brigade(f->next,pbbOut);
}
Once we've finished, pass the brigade down the filter chain.
static const char *CaseFilterEnable(cmd_parms *cmd, void *dummy, int arg)
{
    CaseFilterConfig *pConfig=ap_get_module_config(cmd->server->module_config,
                                                   &case_filter_module);

    pConfig->bEnabled=arg;

    return NULL;
}
This just sets the configuration option to enable or disable the filter.
static const command_rec CaseFilterCmds[] =
{
    AP_INIT_FLAG("CaseFilter", CaseFilterEnable, NULL, RSRC_CONF,
                 "Run a case filter on this host"),
    { NULL }
};
And this creates the command to set it.
static void CaseFilterRegisterHooks(apr_pool_t *p)
{
    ap_hook_insert_filter(CaseFilterInsertFilter,NULL,NULL,APR_HOOK_MIDDLE);
Every module must register its hooks, so this module registers the filter inserter hook.
    ap_register_output_filter(s_szCaseFilterName,CaseFilterOutFilter,
                              AP_FTYPE_RESOURCE);
It is also a convenient (and correct) place to register the filter itself, so we do.
}

module case_filter_module =
{
    STANDARD20_MODULE_STUFF,
    NULL,                         /* create per-directory config */
    NULL,                         /* merge per-directory config */
    CaseFilterCreateServerConfig, /* create per-server config */
    NULL,                         /* merge per-server config */
    CaseFilterCmds,               /* command table */
    CaseFilterRegisterHooks       /* register hooks */
};
Finally, we have to register the various functions in the module structure. And there we are: a simple output filter. There are two ways to invoke this filter, either add:
CaseFilter on
in a Directory or Location section, invoking it through its own directives, or (for example):
AddOutputFilter CaseFilter html
which associates it with all .html files using the standard filter directives.
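Putting the two styles together, a hypothetical httpd.conf fragment (the location path is purely illustrative) might read:

```apache
# Invoke through the module's own directive for one location...
<Location /upper>
    CaseFilter on
</Location>

# ...or attach it to all .html files with the standard filter directive
AddOutputFilter CaseFilter html
```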
An input filter is called when input is required. It is handed a brigade to fill, a mode parameter (the mode can either be blocking, nonblocking, or peek), and a number of bytes to read — 0 means "read a line." Most input filters will, of course, call the filter below them to get data, process it in some way, then fill the brigade with the resulting data.
As with output filters, the filter must be registered:
ap_register_input_filter("filter name", filter_function, AP_FTYPE_RESOURCE);
where the parameters are as described earlier for output filters. Note that there is currently no attempt to avoid collisions in filter names, which is probably a mistake. As with output filters, you have to insert the filter at the right moment — all is the same as earlier, except the functions say "input" instead of "output," of course.
Naturally, input filters are similar to but not the same as output filters. It is probably simplest to illustrate the differences with an example. The following filter converts the case of request data (note, just the data, not the headers — so to see anything happen, you need to do a POST request). It should be available in modules/experimental/mod_case_filter_in.c. (Note the comments follow the line(s) of code to which they refer.)
#include "httpd.h"
#include "http_config.h"
#include "apr_general.h"
#include "util_filter.h"
#include "apr_buckets.h"
#include "http_request.h"
#include <stdlib.h>
#include <ctype.h>
As always, we start with the headers we need.
static const char s_szCaseFilterName[]="CaseFilter";
And then we see the name of the filter. Note that this is the same as the example output filter — this is fine, because there's never an ambiguity between input and output filters.
module case_filter_in_module;
This is just the usual required forward declaration.
typedef struct
{
    int bEnabled;
} CaseFilterInConfig;
This is a structure to hold on to whether this filter is enabled or not.
typedef struct
{
    apr_bucket_brigade *pbbTmp;
} CaseFilterInContext;
Unlike the output filter, we need a context — this is to hold a temporary bucket brigade. We keep it in the context to avoid recreating it each time we are called, which would be inefficient.
static void *CaseFilterInCreateServerConfig(apr_pool_t *p,server_rec *s)
{
    CaseFilterInConfig *pConfig=apr_pcalloc(p,sizeof *pConfig);

    pConfig->bEnabled=0;

    return pConfig;
}
Here is just standard stuff creating the server config structure (note that ap_pcalloc( ) actually sets the whole structure to zeros anyway, so the explicit initialization of bEnabled is redundant, but useful for documentation purposes).
static void CaseFilterInInsertFilter(request_rec *r)
{
    CaseFilterInConfig *pConfig=ap_get_module_config(r->server->module_config,
                                                     &case_filter_in_module);
    CaseFilterInContext *pCtx;

    if(!pConfig->bEnabled)
        return;
If the filter is enabled (by the CaseFilterIn directive), then...
    pCtx=apr_palloc(r->pool,sizeof *pCtx);
    pCtx->pbbTmp=apr_brigade_create(r->pool);
Create the filter context discussed previously, and...
ap_add_input_filter(s_szCaseFilterName,pCtx,r,NULL);
insert the filter. Note that because of where we're hooked, this happens after the request headers have been read.
}
Now we move on to the actual filter function.
static apr_status_t CaseFilterInFilter(ap_filter_t *f,
                                       apr_bucket_brigade *pbbOut,
                                       ap_input_mode_t eMode,
                                       apr_size_t *pnBytes)
{
    CaseFilterInContext *pCtx=f->ctx;
First we get the context we created earlier.
    apr_status_t ret;

    ap_assert(APR_BRIGADE_EMPTY(pCtx->pbbTmp));
Because we're reusing the temporary bucket brigade each time we are called, it's a good idea to ensure that it's empty — it should be impossible for it not to be, hence the use of an assertion instead of emptying it.
ret=ap_get_brigade(f->next,pCtx->pbbTmp,eMode,pnBytes);
Get the next filter down to read some input, using the same parameters as we got, except it fills the temporary brigade instead of ours.
    if(eMode == AP_MODE_PEEK || ret != APR_SUCCESS)
        return ret;
If we are in peek mode, all we have to do is return success if there is data available. Since the next filter down has to do the same, and we only have data if it has, then we can simply return at this point. This may not be true for more complex filters, of course! Also, if there was an error in the next filter, we should return now regardless of mode.
while(!APR_BRIGADE_EMPTY(pCtx->pbbTmp)) {
Now we loop over all the buckets read by the filter below.
        apr_bucket *pbktIn=APR_BRIGADE_FIRST(pCtx->pbbTmp);
        apr_bucket *pbktOut;
        const char *data;
        apr_size_t len;
        char *buf;
        apr_size_t n;

        /* It is tempting to do this...
         *
         *   APR_BUCKET_REMOVE(pbktIn);
         *   APR_BRIGADE_INSERT_TAIL(pbbOut,pbktIn);
         *
         * and change the case of the bucket data in place, but that would
         * be wrong for a file or socket bucket, for example...
         */
As the comment says, simply moving the bucket across would be tempting. We could do a hybrid — move buckets that are allocated in memory and copy buckets that are external resources, for example. This would make the code considerably more complex, though it might be more efficient as a result.
        if(APR_BUCKET_IS_EOS(pbktIn))
        {
            APR_BUCKET_REMOVE(pbktIn);
            APR_BRIGADE_INSERT_TAIL(pbbOut,pbktIn);
            continue;
        }
Once we've read an EOS, we should pass it on.
        ret=apr_bucket_read(pbktIn,&data,&len,eMode);
        if(ret != APR_SUCCESS)
            return ret;
Again, we read the bucket in the same mode in which we were called (which, at this point, is either blocking or nonblocking, but definitely not peek) to ensure that we don't block if we shouldn't, and do if we should.
        buf=malloc(len);
        for(n=0 ; n < len ; ++n)
            buf[n]=toupper(data[n]);
We allocate the new buffer on the heap, because it will be consumed and destroyed by the layers above us — if we used a pool buffer, it would last as long as the request does, which is likely to be wasteful of memory.
pbktOut=apr_bucket_heap_create(buf,len,0,NULL);
As always, the bucket for the buffer needs to have a matching type (note that we could ask the bucket to copy the data onto the heap, but we don't).
APR_BRIGADE_INSERT_TAIL(pbbOut,pbktOut);
Add the new bucket to the output brigade.
apr_bucket_delete(pbktIn);
And delete the one we got from below.
    }

    return APR_SUCCESS;
If we get here, everything must have gone fine, so return success.
}

static const char *CaseFilterInEnable(cmd_parms *cmd, void *dummy, int arg)
{
    CaseFilterInConfig *pConfig
        =ap_get_module_config(cmd->server->module_config,&case_filter_in_module);

    pConfig->bEnabled=arg;

    return NULL;
}
This simply sets the Boolean enable flag in the configuration for this module. Note that we've used per-server configuration, but we could equally well use per-request, since the filter is added after the request is processed.
static const command_rec CaseFilterInCmds[] =
{
    AP_INIT_FLAG("CaseFilterIn", CaseFilterInEnable, NULL, RSRC_CONF,
                 "Run an input case filter on this host"),
Associate the configuration command with the function that sets it.
    { NULL }
};

static void CaseFilterInRegisterHooks(apr_pool_t *p)
{
    ap_hook_insert_filter(CaseFilterInInsertFilter,NULL,NULL,APR_HOOK_MIDDLE);
Hook the filter insertion hook — this gets called after the request header has been processed, but before any response is written or request body is read.
ap_register_input_filter(s_szCaseFilterName,CaseFilterInFilter, AP_FTYPE_RESOURCE);
This is a convenient point to register the filter.
}

module case_filter_in_module =
{
    STANDARD20_MODULE_STUFF,
    NULL,                           /* create per-directory config */
    NULL,                           /* merge per-directory config */
    CaseFilterInCreateServerConfig, /* create per-server config */
    NULL,                           /* merge per-server config */
    CaseFilterInCmds,               /* command table */
    CaseFilterInRegisterHooks       /* register hooks */
};
Finally, we associate the various functions with the correct slots in the module structure. Incidentally, some people prefer to put the module structure at the beginning of the source — I prefer the end because it avoids having to predeclare all the functions used in it.
Copyright © 2003 O'Reilly & Associates. All rights reserved.