<div id="path">
<h1>Guide to writing output filters</h1>
    <p>There are a number of common pitfalls encountered when writing
    output filters; this page aims to document best practice for
    authors of new or existing filters.</p>
    <p>This document is applicable to both version 2.0 and version 2.2
    of the Apache HTTP Server; it specifically targets
    <code>RESOURCE</code>-level or <code>CONTENT_SET</code>-level
    filters, though some advice is generic to all types of filter.</p>
  </div>
<div class="section">
<h2><a name="basics" id="basics">Filters and bucket brigades</a></h2>
    <p>Each time a filter is invoked, it is passed a <em>bucket
    brigade</em>, containing a sequence of <em>buckets</em> which
    represent both data content and metadata.  Every bucket has a
    <em>bucket type</em>; a number of bucket types are defined and
    used by the <code>httpd</code> core modules (and the
    <code>apr-util</code> library which provides the bucket brigade
    interface), but modules are free to define their own types.</p>
    <div class="note">Output filters must be prepared to process
    buckets of non-standard types; with a few exceptions, a filter
    need not care about the types of buckets being filtered.</div>
    <p>A filter can tell whether a bucket represents either data or
    metadata using the <code>APR_BUCKET_IS_METADATA</code> macro.
    Generally, all metadata buckets should be passed down the filter
    chain by an output filter.  Filters may transform, delete, and
    insert data buckets as appropriate.</p>
    <p>There are two metadata bucket types which all filters must pay
    attention to: the <code>EOS</code> bucket type, and the
    <code>FLUSH</code> bucket type.  An <code>EOS</code> bucket
    indicates that the end of the response has been reached and no
    further buckets need be processed.  A <code>FLUSH</code> bucket
    indicates that the filter should flush any buffered buckets (if
    applicable) down the filter chain immediately.</p>
    <div class="note"><code>FLUSH</code> buckets are sent when the
    content generator (or an upstream filter) knows that there may be
    a delay before more content can be sent.  By passing
    <code>FLUSH</code> buckets down the filter chain immediately,
    filters ensure that the client is not kept waiting for pending
    data longer than necessary.</div>
    <p>Filters can create <code>FLUSH</code> buckets and pass these
    down the filter chain if desired.  Generating <code>FLUSH</code>
    buckets unnecessarily, or too frequently, can harm network
    utilisation since it may force large numbers of small packets to
    be sent, rather than a small number of larger packets.  The
    section on <a href="#nonblock">Non-blocking bucket reads</a>
    covers a case where filters are encouraged to generate
    <code>FLUSH</code> buckets.</p>
    <div class="example"><h3>Example bucket brigade</h3><p><code>
    HEAP FLUSH FILE EOS</code></p></div>
    <p>This shows a bucket brigade which may be passed to a filter; it
    contains two metadata buckets (<code>FLUSH</code> and
    <code>EOS</code>), and two data buckets (<code>HEAP</code> and
    <code>FILE</code>).</p>
<div class="section">
<h2><a name="invocation" id="invocation">Filter invocation</a></h2>
    <p>For any given request, an output filter might be invoked only
    once and be given a single brigade representing the entire response.
    It is also possible that the number of times a filter is invoked
    for a single response is proportional to the size of the content
    being filtered, with the filter being passed a brigade containing
    a single bucket each time.  Filters must operate correctly in
    either case.</p>
    <div class="warning">An output filter which allocates long-lived
    memory every time it is invoked may consume memory proportional to
    response size.  Output filters which need to allocate memory
    should do so once per response; see <a href="#state">Maintaining
    state</a> below.</div>
    <p>An output filter can distinguish the final invocation for a
    given response by the presence of an <code>EOS</code> bucket in
    the brigade.  Any buckets in the brigade after an EOS should be
    ignored.</p>
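    <p>The EOS rule can be sketched with a toy model.  The types and
    function names below are hypothetical stand-ins invented for
    illustration; they are not the APR bucket API, and the brigade is
    modelled as a plain array.  The function reports how many leading
    buckets deserve processing and whether this is the final
    invocation.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for bucket types; these are NOT the APR definitions. */
typedef enum { TOY_HEAP, TOY_FILE, TOY_FLUSH, TOY_EOS } toy_type;

/*
 * Return how many leading buckets should be processed: everything up to
 * and including the first EOS.  *saw_eos is set to 1 when an EOS was
 * found (the final invocation for the response), otherwise to 0.
 */
static size_t toy_buckets_to_process(const toy_type *bb, size_t n,
                                     int *saw_eos)
{
    size_t i;

    assert(saw_eos != NULL);
    *saw_eos = 0;
    for (i = 0; i < n; i++) {
        if (bb[i] == TOY_EOS) {
            *saw_eos = 1;
            return i + 1;   /* anything after EOS is ignored */
        }
    }
    return n;
}
```

    <p>In a real filter the same decision would presumably be made while
    walking the brigade, rather than in a separate pass.</p>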
    <p>An output filter should never pass an empty brigade down the
    filter chain.  To be defensive, filters should be prepared to
    accept an empty brigade, and should return success without passing
    this brigade on down the filter chain.  The handling of an empty
    brigade should have no side effects (such as changing any state
    private to the filter).</p>
    <div class="example"><h3>How to handle an empty brigade</h3><pre class="prettyprint lang-c">apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    if (APR_BRIGADE_EMPTY(bb)) {
        /* Nothing to do: succeed without passing anything downstream. */
        return APR_SUCCESS;
    }
    ....</pre></div>
<div class="section">
<h2><a name="brigade" id="brigade">Brigade structure</a></h2>
    <p>A bucket brigade is a doubly-linked list of buckets.  The list
    is terminated (at both ends) by a <em>sentinel</em> which can be
    distinguished from a normal bucket by comparing it with the
    pointer returned by <code>APR_BRIGADE_SENTINEL</code>.  The list
    sentinel is in fact not a valid bucket structure; any attempt to
    call normal bucket functions (such as
    <code>apr_bucket_read</code>) on the sentinel will have undefined
    behaviour (typically crashing the process).</p>
    <p>There are a variety of functions and macros for traversing and
    manipulating bucket brigades; see the <a href="http://apr.apache.org/docs/apr-util/trunk/group___a_p_r___util___bucket___brigades.html">apr_bucket.h</a>
    header for complete coverage.  Commonly used macros include:</p>
    <dl>
      <dt><code>APR_BRIGADE_FIRST(bb)</code></dt>
      <dd>returns the first bucket in brigade bb</dd>
      <dt><code>APR_BRIGADE_LAST(bb)</code></dt>
      <dd>returns the last bucket in brigade bb</dd>
      <dt><code>APR_BUCKET_NEXT(e)</code></dt>
      <dd>gives the next bucket after bucket e</dd>
      <dt><code>APR_BUCKET_PREV(e)</code></dt>
      <dd>gives the bucket before bucket e</dd>
    </dl>
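    <p>The ring-with-sentinel layout can be illustrated with a small
    self-contained model.  Everything named <code>toy_*</code> or
    <code>TOY_*</code> below is hypothetical and only mirrors the shape
    of the macros listed above.  One simplification to note: here the
    sentinel is a full node, whereas the text above says the real
    sentinel is not a valid bucket structure.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Toy ring node; NOT the APR bucket structure. */
typedef struct toy_bucket {
    struct toy_bucket *next, *prev;
    int payload;
} toy_bucket;

/* The brigade owns only the sentinel, which terminates both ends. */
typedef struct { toy_bucket sentinel; } toy_brigade;

#define TOY_SENTINEL(bb) (&(bb)->sentinel)
#define TOY_FIRST(bb)    ((bb)->sentinel.next)
#define TOY_LAST(bb)     ((bb)->sentinel.prev)
#define TOY_NEXT(e)      ((e)->next)

/* An empty brigade is a sentinel that points at itself. */
static void toy_init(toy_brigade *bb)
{
    assert(bb != NULL);
    bb->sentinel.next = bb->sentinel.prev = TOY_SENTINEL(bb);
    bb->sentinel.payload = 0;
}

/* Link bucket e in just before the sentinel, i.e. at the tail. */
static void toy_insert_tail(toy_brigade *bb, toy_bucket *e)
{
    assert(bb != NULL && e != NULL);
    e->prev = TOY_LAST(bb);
    e->next = TOY_SENTINEL(bb);
    TOY_LAST(bb)->next = e;
    TOY_LAST(bb) = e;
}

/* Walk from the first bucket until the sentinel is reached again. */
static int toy_sum(toy_brigade *bb)
{
    int sum = 0;
    toy_bucket *e;

    for (e = TOY_FIRST(bb); e != TOY_SENTINEL(bb); e = TOY_NEXT(e)) {
        sum += e->payload;
    }
    return sum;
}
```

    <p>The loop condition compares against the sentinel instead of
    testing for <code>NULL</code>; that is the pattern the iteration
    examples in this guide rely on.</p>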
    <p>The <code>apr_bucket_brigade</code> structure itself is
    allocated out of a pool, so if a filter creates a new brigade, it
    must ensure that memory use is correctly bounded.  A filter which
    allocates a new brigade out of the request pool
    (<code>r-&gt;pool</code>) on every invocation, for example, will fall
    foul of the <a href="#invocation">warning above</a> concerning
    memory use.  Such a filter should instead create a brigade on the
    first invocation per request, and store that brigade in its <a href="#state">state structure</a>.</p>
    <div class="warning"><p>It is rarely advisable to use
    <code>apr_brigade_destroy</code> to "destroy" a brigade; do so only
    if you know for certain that the brigade will never be used
    again, and even then use it sparingly.  The
    memory used by the brigade structure will not be released by
    calling this function (since it comes from a pool), but the
    associated pool cleanup is unregistered.  Using
    <code>apr_brigade_destroy</code> can in fact cause memory leaks;
    if a "destroyed" brigade contains buckets when its
    containing pool is destroyed, those buckets will <em>not</em> be
    immediately destroyed.</p>
    <p>In general, filters should use <code>apr_brigade_cleanup</code>
    in preference to <code>apr_brigade_destroy</code>.</p></div>
<div class="section">
<h2><a name="buckets" id="buckets">Processing buckets</a></h2>
    <p>When dealing with non-metadata buckets, it is important to
    understand that the "<code>apr_bucket *</code>" object is an
    abstract <em>representation</em> of data:</p>
    <ol>
      <li>The amount of data represented by the bucket may or may not
      have a determinate length; for a bucket which represents data of
      indeterminate length, the <code>-&gt;length</code> field is set to
      the value <code>(apr_size_t)-1</code>.  For example, buckets of
      the <code>PIPE</code> bucket type have an indeterminate length;
      they represent the output from a pipe.</li>
      <li>The data represented by a bucket may or may not be mapped
      into memory.  The <code>FILE</code> bucket type, for example,
      represents data stored in a file on disk.</li>
    </ol>
    <p>Filters read the data from a bucket using the
    <code>apr_bucket_read</code> function.  When this function is
    invoked, the bucket may <em>morph</em> into a different bucket
    type, and may also insert a new bucket into the bucket brigade.
    This must happen for buckets which represent data not mapped into
    memory.</p>
    <p>For example, consider a bucket brigade containing a
    single <code>FILE</code> bucket representing an entire file, 24
    kilobytes in size:</p>
    <div class="example"><p><code>FILE(0K-24K)</code></p></div>
    <p>When this bucket is read, it will read a block of data from the
    file, morph into a <code>HEAP</code> bucket to represent that
    data, and return the data to the caller.  It also inserts a new
    <code>FILE</code> bucket representing the remainder of the file;
    after the <code>apr_bucket_read</code> call, the brigade looks
    like:</p>
    <div class="example"><p><code>HEAP(8K) FILE(8K-24K)</code></p></div>
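    <p>The morphing step can be mimicked with a toy model.  The
    structure, the <code>toy_read</code> function and the 8K block
    size are assumptions made for illustration only (the block size is
    taken from the example above, not from the library); no file I/O
    is performed and none of this is APR code.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BLOCK 8192  /* assumed block size, matching the 8K above */

typedef enum { TOY_FILE, TOY_HEAP } toy_type;

/* Toy bucket: a typed byte range in a singly linked list. */
typedef struct toy_bucket {
    toy_type type;
    size_t start;               /* offset into the imaginary file */
    size_t length;              /* number of bytes represented */
    struct toy_bucket *next;
} toy_bucket;

/*
 * "Read" bucket e.  A FILE bucket longer than one block morphs into a
 * HEAP bucket covering the first block, and a new FILE bucket for the
 * remainder is linked in after it.  Returns 0 on success, -1 if the
 * new bucket cannot be allocated.
 */
static int toy_read(toy_bucket *e)
{
    assert(e != NULL);
    if (e->type != TOY_FILE) {
        return 0;               /* already in memory: nothing to do */
    }
    if (e->length > TOY_BLOCK) {
        toy_bucket *rest = malloc(sizeof *rest);
        if (rest == NULL) {
            return -1;
        }
        rest->type = TOY_FILE;
        rest->start = e->start + TOY_BLOCK;
        rest->length = e->length - TOY_BLOCK;
        rest->next = e->next;
        e->next = rest;
        e->length = TOY_BLOCK;
    }
    e->type = TOY_HEAP;
    return 0;
}
```

    <p>Calling <code>toy_read</code> once on a 24K <code>FILE</code>
    bucket leaves an 8K <code>HEAP</code> bucket followed by a
    <code>FILE</code> bucket covering the remaining 16K, which matches
    the brigade shown above.</p>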
<div class="section">
<h2><a name="filtering" id="filtering">Filtering brigades</a></h2>
    <p>The basic function of any output filter will be to iterate
    through the passed-in brigade and transform (or simply examine)
    the content in some manner.  The implementation of the iteration
    loop is critical to producing a well-behaved output filter.</p>
    <p>Consider an example which loops through the entire brigade as
    follows:</p>
    <div class="example"><h3>Bad output filter -- do not imitate!</h3><pre class="prettyprint lang-c">apr_bucket *e = APR_BRIGADE_FIRST(bb);
const char *data;
apr_size_t len;

while (e != APR_BRIGADE_SENTINEL(bb)) {
    /* Every read morphs the bucket into in-memory data that is then kept. */
    apr_bucket_read(e, &amp;data, &amp;len, APR_BLOCK_READ);
    e = APR_BUCKET_NEXT(e);
}

return ap_pass_brigade(f-&gt;next, bb);</pre></div>
    <p>The above implementation would consume memory proportional to
    content size.  If passed a <code>FILE</code> bucket, for example,
    the entire file contents would be read into memory as each
    <code>apr_bucket_read</code> call morphed a <code>FILE</code>
    bucket into a <code>HEAP</code> bucket.</p>
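    <p>The memory claim can be checked with simple arithmetic.  The
    sketch below is a toy calculation, not APR code: it assumes a
    hypothetical 8K read block and assumes that data passed downstream
    is released immediately.  It compares the peak number of bytes held
    when every block is read before anything is passed on with the peak
    when each block is passed on as soon as it has been read.</p>

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BLOCK 8192  /* assumed read block size */

/* Size of the block that starts at offset off in a file of file_len bytes. */
static size_t toy_block_len(size_t file_len, size_t off)
{
    assert(off < file_len);
    return file_len - off < TOY_BLOCK ? file_len - off : TOY_BLOCK;
}

/* Peak bytes held when all blocks are read before the brigade is passed. */
static size_t toy_peak_read_all(size_t file_len)
{
    size_t held = 0, peak = 0, off;

    for (off = 0; off < file_len; off += TOY_BLOCK) {
        held += toy_block_len(file_len, off);   /* block stays in memory */
        if (held > peak) {
            peak = held;
        }
    }
    return peak;
}

/* Peak bytes held when each block is passed on and released at once. */
static size_t toy_peak_streaming(size_t file_len)
{
    size_t held = 0, peak = 0, off;

    for (off = 0; off < file_len; off += TOY_BLOCK) {
        size_t n = toy_block_len(file_len, off);
        held += n;
        if (held > peak) {
            peak = held;
        }
        held -= n;      /* passed downstream, then cleaned up */
    }
    return peak;
}
```

    <p>For a 24K file the first function reports a peak equal to the
    whole file, while the second never exceeds one block, whatever the
    file size.</p>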
    <p>In contrast, the implementation below will consume a fixed
    amount of memory to filter any brigade.  A temporary brigade is
    needed and must be allocated only once per response; see the <a href="#state">Maintaining state</a> section.</p>
    <div class="example"><h3>Better output filter</h3><pre class="prettyprint lang-c">apr_bucket *e;
const char *data;
apr_size_t len;
apr_status_t rv;

while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
   rv = apr_bucket_read(e, &amp;data, &amp;len, APR_BLOCK_READ);
   if (rv) ...;
   /* Remove bucket e from bb. */
   APR_BUCKET_REMOVE(e);
   /* Insert it into the temporary brigade. */
   APR_BRIGADE_INSERT_HEAD(tmpbb, e);
   /* Pass brigade downstream. */
   rv = ap_pass_brigade(f-&gt;next, tmpbb);
   if (rv) ...;
   apr_brigade_cleanup(tmpbb);
}</pre></div>
<div class="section">
<h2><a name="state" id="state">Maintaining state</a></h2>
    <p>A filter which needs to maintain state over multiple
    invocations per response can use the <code>-&gt;ctx</code> field of
    its <code>ap_filter_t</code> structure.  It is typical to store a
    temporary brigade in such a structure, to avoid having to allocate
    a new brigade per invocation as described in the <a href="#brigade">Brigade structure</a> section.</p>
  <div class="example"><h3>Example code to maintain filter state</h3><pre class="prettyprint lang-c">struct dummy_state {
   apr_bucket_brigade *tmpbb;
   int filter_state;
   ....
};

apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    struct dummy_state *state;

    state = f-&gt;ctx;
    if (state == NULL) {

       /* First invocation for this response: initialise state structure.
        */
       f-&gt;ctx = state = apr_palloc(f-&gt;r-&gt;pool, sizeof *state);

       state-&gt;tmpbb = apr_brigade_create(f-&gt;r-&gt;pool, f-&gt;c-&gt;bucket_alloc);
       state-&gt;filter_state = ...;

    }
    ...</pre></div>
<div class="section">
<h2><a name="buffer" id="buffer">Buffering buckets</a></h2>
    <p>If a filter decides to store buckets beyond the duration of a
    single filter function invocation (for example storing them in its
    <code>-&gt;ctx</code> state structure), those buckets must be <em>set
    aside</em>.  This is necessary because some bucket types provide
    buckets which represent temporary resources (such as stack memory)
    which will fall out of scope as soon as the filter chain completes
    processing the brigade.</p>
    <p>To set aside a bucket, the <code>apr_bucket_setaside</code>
    function can be called.  Not all bucket types can be set aside, but
    if successful, the bucket will have morphed to ensure it has a
    lifetime at least as long as the pool given as an argument to the
    <code>apr_bucket_setaside</code> function.</p>
    <p>Alternatively, the <code>ap_save_brigade</code> function can be
    used, which will move all the buckets into a separate brigade
    containing buckets with a lifetime as long as the given pool
    argument.  This function must be used with care, taking into
    account the following points:</p>
    <ol>
      <li>On return, <code>ap_save_brigade</code> guarantees that all
      the buckets in the returned brigade will represent data mapped
      into memory.  If given an input brigade containing, for example,
      a <code>PIPE</code> bucket, <code>ap_save_brigade</code> will
      consume an arbitrary amount of memory to store the entire output
      of the pipe.</li>
      <li>When <code>ap_save_brigade</code> reads from buckets which
      cannot be set aside, it will always perform blocking reads,
      removing the opportunity to use <a href="#nonblock">Non-blocking
      bucket reads</a>.</li>
      <li>If <code>ap_save_brigade</code> is used without passing a
      non-NULL "<code>saveto</code>" (destination) brigade parameter,
      the function will create a new brigade, which may cause memory
      use to be proportional to content size as described in the <a href="#brigade">Brigade structure</a> section.</li>
    </ol>
    <div class="warning">Filters must ensure that any buffered data is
    processed and passed down the filter chain during the last
    invocation for a given response (a brigade containing an EOS
    bucket).  Otherwise such data will be lost.</div>
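    <p>The requirement to drain buffered data on the final invocation
    can be sketched as follows.  This is a toy model under stated
    assumptions: the state structure, the 16-byte buffer and the
    <code>toy_pass</code> stand-in (which only counts bytes) are all
    hypothetical, and an <code>eos</code> flag replaces the EOS
    bucket.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUF_MAX 16  /* arbitrary small buffer for the illustration */

typedef struct {
    char buf[TOY_BUF_MAX];
    size_t used;        /* bytes currently buffered */
    size_t passed;      /* total bytes handed downstream so far */
} toy_state;

/* Stand-in for passing data down the chain: here it only counts bytes. */
static void toy_pass(toy_state *st, size_t n)
{
    st->passed += n;
}

/*
 * Buffer input until the buffer fills, then pass it on.  A non-zero eos
 * marks the last invocation, so whatever is still buffered must be
 * passed on as well; otherwise it would be lost.
 */
static void toy_filter(toy_state *st, const char *data, size_t len, int eos)
{
    assert(st != NULL);
    while (len > 0) {
        size_t room = TOY_BUF_MAX - st->used;
        size_t n = len < room ? len : room;

        memcpy(st->buf + st->used, data, n);
        st->used += n;
        data += n;
        len -= n;
        if (st->used == TOY_BUF_MAX) {
            toy_pass(st, st->used);
            st->used = 0;
        }
    }
    if (eos && st->used > 0) {
        toy_pass(st, st->used);     /* final drain */
        st->used = 0;
    }
}
```

    <p>Without the final drain, a response whose length is not a
    multiple of the buffer size would silently lose its tail.</p>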
<div class="section">
<h2><a name="nonblock" id="nonblock">Non-blocking bucket reads</a></h2>
    <p>The <code>apr_bucket_read</code> function takes an
    <code>apr_read_type_e</code> argument which determines whether a
    <em>blocking</em> or <em>non-blocking</em> read will be performed
    from the data source.  A good filter will first attempt to read
    from every data bucket using a non-blocking read; if that fails
    with <code>APR_EAGAIN</code>, it should send a <code>FLUSH</code>
    bucket down the filter chain, and then retry using a blocking read.</p>
    <p>This mode of operation ensures that any filters further down the
    filter chain will flush any buffered buckets if a slow content
    source is being used.</p>
    <p>A CGI script is an example of a slow content source which is
    implemented as a bucket type. <code class="module"><a href="/mod/mod_cgi.html">mod_cgi</a></code> will send
    <code>PIPE</code> buckets which represent the output from a CGI
    script; reading from such a bucket will block when waiting for the
    CGI script to produce more output.</p>
    <div class="example"><h3>Example code using non-blocking bucket reads</h3><pre class="prettyprint lang-c">apr_bucket *e;
const char *data;
apr_size_t len;
apr_read_type_e mode = APR_NONBLOCK_READ;

while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
    apr_status_t rv;

    rv = apr_bucket_read(e, &amp;data, &amp;len, mode);
    if (APR_STATUS_IS_EAGAIN(rv) &amp;&amp; mode == APR_NONBLOCK_READ) {

        /* Pass down a brigade containing a flush bucket: */
        APR_BRIGADE_INSERT_TAIL(tmpbb, apr_bucket_flush_create(...));
        rv = ap_pass_brigade(f-&gt;next, tmpbb);
        apr_brigade_cleanup(tmpbb);
        if (rv != APR_SUCCESS) return rv;

        /* Retry, using a blocking read. */
        mode = APR_BLOCK_READ;
        continue;
    } else if (rv != APR_SUCCESS) {
        /* handle errors */
    }

    /* Next time, try a non-blocking read first. */
    mode = APR_NONBLOCK_READ;
    ...
}</pre></div>
<div class="section">
<h2><a name="rules" id="rules">Ten rules for output filters</a></h2>
    <p>In summary, here is a set of rules for all output filters to
    follow:</p>
    <ol>
      <li>Output filters should not pass empty brigades down the filter
      chain, but should be tolerant of being passed empty
      brigades.</li>
      <li>Output filters must pass all metadata buckets down the filter
      chain; <code>FLUSH</code> buckets should be respected by passing
      any pending or buffered buckets down the filter chain.</li>
      <li>Output filters should ignore any buckets following an
      <code>EOS</code> bucket.</li>
      <li>Output filters must process a fixed amount of data at a
      time, to ensure that memory consumption is not proportional to
      the size of the content being filtered.</li>
      <li>Output filters should be agnostic with respect to bucket
      types, and must be able to process buckets of unfamiliar
      type.</li>
      <li>After calling <code>ap_pass_brigade</code> to pass a brigade
      down the filter chain, output filters should call
      <code>apr_brigade_cleanup</code> to ensure the brigade is empty
      before reusing that brigade structure; output filters should
      never use <code>apr_brigade_destroy</code> to "destroy"
      brigades.</li>
      <li>Output filters must <em>set aside</em> any buckets which are
      preserved beyond the duration of the filter function.</li>
      <li>Output filters must not ignore the return value of
      <code>ap_pass_brigade</code>, and must return appropriate errors
      back up the filter chain.</li>
      <li>Output filters must only create a fixed number of bucket
      brigades for each response, rather than one per invocation.</li>
      <li>Output filters should first attempt non-blocking reads from
      each data bucket, and send a <code>FLUSH</code> bucket down the
      filter chain if the read would block, before retrying with a
      blocking read.</li>
    </ol>
  </div></div>
