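package HTTP::Proxy::BodyFilter;

use strict;
use Carp;

# Constructor shared by all subclasses: init() is called only if the
# subclass defines it, and receives all the arguments passed to new().
sub new {
    my $class = shift;
    my $self = bless {}, $class;
    $self->init(@_) if $self->can('init');
    return $self;
}

# Accessor for the HTTP::Proxy object that owns the filter: a true
# argument is stored, and the current value is returned in all cases.
sub proxy {
    my ( $self, $new ) = @_;
    return $new ? $self->{_hpbf_proxy} = $new : $self->{_hpbf_proxy};
}

# Subclasses must override filter()
sub filter {
    croak "HTTP::Proxy::BodyFilter cannot be used as a filter";
}

sub will_modify { 1 } # by default, we expect the filter to modify data

1;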

__END__

=head1 NAME

HTTP::Proxy::BodyFilter - A base class for HTTP message body filters

=head1 SYNOPSIS

    package MyFilter;

    use base qw( HTTP::Proxy::BodyFilter );

    # a simple modification that may break things
    sub filter {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
        $$dataref =~ s/PERL/Perl/g;
    }

    1;

=head1 DESCRIPTION

The HTTP::Proxy::BodyFilter class is used to create filters for
HTTP request/response body data.

=head2 Creating a BodyFilter

A BodyFilter is just a derived class that implements some methods
called by the proxy. Of all the methods presented below, only
C<filter()> B<must> be defined in the derived class.

=over 4

=item filter()

The signature of the filter() method is the following:

    sub filter {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
        ...
    }

where $self is the filter object, $dataref is a reference to the chunk
of body data received, $message is a reference to either an
HTTP::Request or an HTTP::Response object, and $protocol is a reference
to the LWP::Protocol protocol object.

Note that this subroutine signature looks a lot like that of the
callbacks of LWP::UserAgent (except that $message is either an
HTTP::Request or an HTTP::Response object).

$buffer is a reference to a buffer where some of the unprocessed data
can be stored for the next time the filter is called (see L</Using
a buffer to store data for a later use> for details). Thanks to the
built-in HTTP::Proxy::BodyFilter::* filters, this is rarely needed.

The headers of the message can be accessed with
C<< $message->headers() >>. This HTTP::Headers object is the one
that was sent to the client (if the filter is on the response stack)
or to the origin server (if the filter is on the request stack).
Modifying it in the filter() method is useless, since the headers
have already been sent.

Since $dataref is a I<reference> to the data string, the referent
can be modified, and the changes will be passed on to the filters
that follow, until the data reaches its recipient.

An HTTP::Proxy::BodyFilter object is a blessed hash, and the base class
reserves only hash keys that start with C<_hpbf>, so subclasses are free
to use any other key.

=item new()

The constructor is defined for all subclasses. Initialisation tasks
(if any) for subclasses should be done in the C<init()> method (see below).

=item init()

This method is called by the C<new()> constructor to perform all
initialisation tasks. It is called only once in the filter's lifetime.
Defining it is optional: C<new()> only calls it if the subclass provides one.

It receives all the parameters passed to C<new()>.

=item begin()

Some filters might require initialisation before they are able to handle
the data. If a C<begin()> method is defined in your subclass, the proxy
will call it before sending data to the C<filter()> method.

It is called once per HTTP message handled by the filter, before data
processing begins.

The method signature is as follows:

    sub begin {
        my ( $self, $message ) = @_;
        ...
    }

=item end()

Some filters might require finalisation after they have finished handling
the data. If an C<end()> method is defined in your subclass, the proxy
will call it after it has finished sending data to the C<filter()> method.

It is called once per HTTP message handled by the filter, after all data
processing is done.

This method does not expect any parameters (other than the filter
object itself).

=item will_modify()

This method returns a boolean value that indicates whether the filter
will modify the body data on the fly.

The default implementation returns a I<true> value.

=back
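
To show how these methods fit together, here is a minimal sketch of a
complete subclass. It is self-contained: a copy of the base class above
stands in for HTTP::Proxy::BodyFilter so the sketch runs without
HTTP::Proxy installed, the MyCounter class and its C<word> option are
hypothetical, and the loop at the end only imitates the call order
described in this section (C<init()> once through C<new()>, then
C<begin()>, C<filter()> for each chunk and C<end()> for each message).
It is not how HTTP::Proxy itself drives filters.

```perl

    use strict;
    use warnings;

    # Stand-in for HTTP::Proxy::BodyFilter, copied from the module above
    # so that this sketch runs without HTTP::Proxy installed.
    package HTTP::Proxy::BodyFilter;
    use Carp;
    sub new {
        my $class = shift;
        my $self  = bless {}, $class;
        $self->init(@_) if $self->can('init');
        return $self;
    }
    sub filter { croak "HTTP::Proxy::BodyFilter cannot be used as a filter" }
    sub will_modify { 1 }

    # Hypothetical filter: replaces a word and counts the replacements.
    package MyCounter;
    use parent -norequire, 'HTTP::Proxy::BodyFilter';

    sub init {                      # called once, with new()'s arguments
        my ( $self, %args ) = @_;
        $self->{word} = $args{word} || 'PERL';
    }

    sub begin {                     # called once per message
        my ( $self, $message ) = @_;
        $self->{count} = 0;
    }

    sub filter {                    # called once per chunk of body data
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
        $self->{count} += ( $$dataref =~ s/\Q$self->{word}\E/Perl/g ) || 0;
    }

    sub end {                       # called once per message, no parameters
        my ($self) = @_;
        $self->{log} = "replaced $self->{count} time(s)";
    }

    sub will_modify { 1 }           # this filter rewrites the body

    package main;

    # Imitate the call order described above for one message in two chunks.
    my $f = MyCounter->new( word => 'PERL' );
    $f->begin(undef);               # no real HTTP::Response in this sketch
    my @out;
    for my $chunk ( "I like PERL. ", "PERL rocks." ) {
        my $data   = $chunk;
        my $buffer = '';
        $f->filter( \$data, undef, undef, \$buffer );
        push @out, $data;
    }
    $f->end;
    print join( '', @out ), "\n";   # I like Perl. Perl rocks.
    print $f->{log}, "\n";          # replaced 2 time(s)

```

Because C<filter()> works on C<$$dataref> in place, each chunk leaves
the filter already rewritten, while the counter kept in C<$self>
survives across chunks and is reset by C<begin()> for the next message.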

=head2 Using a buffer to store data for a later use

Some filters cannot handle arbitrary data: for example, a filter that
lowercases tag names might apply a simple regex
such as C<s/E<lt>\s*(\w+)([^E<gt>]*)E<gt>/E<lt>\L$1\E$2E<gt>/g>.
But the filter will fail if the chunk of data contains a tag
that is cut before the final C<E<gt>>.

It would be extremely complicated and error-prone to let each filter
(and its author) do its own buffering, so the HTTP::Proxy architecture
handles this too. Each time it calls a filter, the proxy passes it
a reference to an empty string ($buffer in the above signature) that
the filter can use to store some data for the next run.

When the reference is C<undef>, the filter cannot
store any data, because this is the very last run, needed to gather
all the data left in all buffers.

It is recommended to store as little data as possible in the buffer,
so as to avoid (badly) reproducing what HTTP::Proxy::BodyFilter::complete
does.

In particular, remember that all the data that remains in
the buffer after the last piece of data is received from the origin
server will be sent back to your filter in one big piece.
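
The lowercasing example above can be made robust with C<$buffer>. The
following is a minimal sketch, assuming that buffered data is put back
in front of the next chunk and that the final run receives an C<undef>
buffer, as described in this section. The C<lc_tags()> and
C<run_chunks()> subroutines are hypothetical: C<run_chunks()> only
imitates those buffering rules and is not HTTP::Proxy's code, and
C<lc_tags()> is a plain subroutine with the filter() signature (C<$self>
is unused) rather than a full subclass. Unlike the one-line regex above,
it also lowercases closing tags.

```perl

    use strict;
    use warnings;

    # Hypothetical filter with the filter() signature ($self is unused):
    # lowercase tag names, keeping a tag cut at the end of the chunk in
    # $buffer until the next run.
    sub lc_tags {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;

        # Not the last run, and the chunk ends inside a tag (a '<' with no
        # '>' after it): move that tail into the buffer.
        if ( defined $buffer && $$dataref =~ s/(<[^>]*)\z// ) {
            $$buffer = $1;
        }

        # Lowercase opening and closing tag names in complete tags.
        $$dataref =~ s{<\s*(/?\w+)([^>]*)>}{<\L$1\E$2>}g;
    }

    # Hypothetical driver imitating the rules described above: the buffer
    # starts empty, its content is put back in front of the next chunk, and
    # the final run gets an undef buffer along with whatever is left.
    sub run_chunks {
        my ( $filter, @chunks ) = @_;
        my ( $out, $pending ) = ( '', '' );
        for my $chunk (@chunks) {
            my $data   = $pending . $chunk;
            my $buffer = '';
            $filter->( undef, \$data, undef, undef, \$buffer );
            $out .= $data;
            $pending = $buffer;
        }
        $filter->( undef, \$pending, undef, undef, undef );    # last run
        return $out . $pending;
    }

    my $html = run_chunks( \&lc_tags, '<P CLASS="x">Hi</P', '><BR>' );
    print "$html\n";    # <p CLASS="x">Hi</p><br>

```

Only the unfinished tail of a chunk is kept, which follows the advice
above to store as little as possible. On the final run nothing can be
buffered, so an unclosed tag such as C<a E<lt>B> is passed through
unchanged rather than lost.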

=head2 The store and forward approach

HTTP::Proxy implements a I<store and forward> mechanism, for those filters
which need to have the whole message body to work. It's enabled simply by
pushing the HTTP::Proxy::BodyFilter::complete filter on the filter stack.

The data is stored in memory by the "complete" filter, which passes it
on to the following filter once the full message body has been received.
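
The buffering rules are enough to express the store and forward idea.
The following minimal sketch is not the code of
HTTP::Proxy::BodyFilter::complete; it only illustrates the idea under
the assumptions of the previous section (buffered data is put back in
front of the next chunk, and the last run gets an C<undef> buffer). The
hypothetical C<store_and_forward()> filter moves every chunk into
C<$buffer>, so the last run sees the whole body and a substitution can
match text split across chunks. C<run_chunks()> is a hypothetical
driver, not part of HTTP::Proxy.

```perl

    use strict;
    use warnings;

    # Hypothetical store-and-forward filter (filter() signature, $self unused).
    sub store_and_forward {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
        if ( defined $buffer ) {        # not the last run: hold everything back
            $$buffer  = $$dataref;
            $$dataref = '';
            return;
        }
        # Last run: $$dataref now holds the complete body.
        $$dataref =~ s/PERL/Perl/g;
    }

    # Hypothetical driver imitating the buffering rules: buffered data is put
    # back in front of the next chunk, and the last run gets an undef buffer.
    # It returns every piece the filter let through, in order.
    sub run_chunks {
        my ( $filter, @chunks ) = @_;
        my @emitted;
        my $pending = '';
        for my $chunk (@chunks) {
            my $data   = $pending . $chunk;
            my $buffer = '';
            $filter->( undef, \$data, undef, undef, \$buffer );
            push @emitted, $data;
            $pending = $buffer;
        }
        $filter->( undef, \$pending, undef, undef, undef );    # last run
        push @emitted, $pending;
        return @emitted;
    }

    # "PERL" is split across two chunks, so a per-chunk filter would miss it.
    my @pieces = run_chunks( \&store_and_forward, 'I like PE', 'RL!' );
    print join( '|', @pieces ), "\n";    # ||I like Perl!

```

Every intermediate run emits an empty string and the full body is held
in memory until the end, which is the price paid for seeing the whole
message at once.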

=head2 Standard BodyFilters

Standard HTTP::Proxy::BodyFilter classes have lowercase names.

The following BodyFilters are included in the HTTP::Proxy distribution:

=over 4

=item lines

This filter makes sure that the next filter in the filter chain will
only receive complete lines. The "chunks" of data received by the
following filters will either end with C<\n> or be the last
piece of data for the current HTTP message body.

=item htmltext

This class lets you create a filter that runs a given code reference
against text included in an HTML document (outside C<E<lt>scriptE<gt>>
and C<E<lt>styleE<gt>> tags). HTML entities are not included in the text.

=item htmlparser

Creates a filter from an HTML::Parser object.

=item simple

This class lets you create a simple body filter from a code reference.

=item save

Stores the message body in a file.

=item complete

This filter stores the whole message body in memory, thus allowing
some actions to be taken only when the full page has been received
by the proxy.

=item tags

The HTTP::Proxy::BodyFilter::tags filter makes sure that the next filter
in the filter chain will only receive complete tags. The current
implementation is not 100% perfect, though.

=back
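
The buffer is also what makes a filter like lines possible. As an
illustration only, and not the implementation shipped with HTTP::Proxy,
a hypothetical C<complete_lines()> filter can keep the unfinished last
line of each chunk in C<$buffer>, so that every piece it passes on ends
with C<\n>, except on the final run. The C<run_chunks()> driver is
hypothetical and only imitates the buffering rules described in
L</Using a buffer to store data for a later use>.

```perl

    use strict;
    use warnings;

    # Hypothetical line-oriented filter (filter() signature, $self unused):
    # only complete lines are let through, except on the last run.
    sub complete_lines {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
        return unless defined $buffer;    # last run: pass everything on
        # Whatever follows the last newline is an unfinished line.
        if ( $$dataref =~ s/([^\n]+)\z// ) {
            $$buffer = $1;
        }
    }

    # Hypothetical driver imitating the buffering rules: buffered data is put
    # back in front of the next chunk, and the last run gets an undef buffer.
    # It returns every piece the filter let through, in order.
    sub run_chunks {
        my ( $filter, @chunks ) = @_;
        my @emitted;
        my $pending = '';
        for my $chunk (@chunks) {
            my $data   = $pending . $chunk;
            my $buffer = '';
            $filter->( undef, \$data, undef, undef, \$buffer );
            push @emitted, $data;
            $pending = $buffer;
        }
        $filter->( undef, \$pending, undef, undef, undef );    # last run
        push @emitted, $pending;
        return @emitted;
    }

    my @pieces = run_chunks( \&complete_lines, "one\ntw", "o\nthr", "ee" );
    # @pieces is ("one\n", "two\n", "", "three")

```

In this sketch a chunk without any newline produces an empty piece,
since the whole chunk waits in the buffer until a newline or the end of
the body arrives.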

Please read each filter's documentation for more details about their use.

=head1 USEFUL METHODS FOR SUBCLASSES

Some methods are available to filters, so that they can make use of
what little knowledge they might need of HTTP::Proxy's internals. They
are mostly accessors.

=over 4

=item proxy()

Gets a reference to the HTTP::Proxy object that owns the filter.
This gives access to some of the proxy methods. When called with a
true argument, it stores that argument as the owner and returns it.

=back
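
As a small illustration of the accessor, here is a sketch with a
hypothetical C<FakeProxy> class standing in for a real HTTP::Proxy
object, and a hypothetical C<PortStamp> filter that queries it. The base
class is reduced to copies of C<new()> and C<proxy()> from the code
above so the sketch runs on its own, and setting the owner by hand only
stands in for the link that the owning proxy is presumably responsible
for establishing.

```perl

    use strict;
    use warnings;

    # Stand-in copies of new() and proxy() from HTTP::Proxy::BodyFilter above.
    package HTTP::Proxy::BodyFilter;
    sub new {
        my $class = shift;
        my $self  = bless {}, $class;
        $self->init(@_) if $self->can('init');
        return $self;
    }
    sub proxy {
        my ( $self, $new ) = @_;
        return $new ? $self->{_hpbf_proxy} = $new : $self->{_hpbf_proxy};
    }

    # Hypothetical stand-in for an HTTP::Proxy object.
    package FakeProxy;
    sub new  { my ( $class, $port ) = @_; return bless { port => $port }, $class }
    sub port { return $_[0]{port} }

    # Hypothetical filter that asks its owning proxy for information.
    package PortStamp;
    use parent -norequire, 'HTTP::Proxy::BodyFilter';
    sub filter {
        my ( $self, $dataref ) = @_;
        $$dataref .= sprintf '<!-- via port %d -->', $self->proxy->port;
    }

    package main;

    my $filter = PortStamp->new;
    $filter->proxy( FakeProxy->new(8080) );   # setter: records the owner
    my $body = 'hello';
    $filter->filter( \$body );
    print "$body\n";                           # hello<!-- via port 8080 -->

```

Note that C<proxy()> only stores its argument when that argument is
true: C<< $filter->proxy(undef) >> leaves the current owner in place and
simply returns it.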

=head1 AUTHOR

Philippe "BooK" Bruhat, E<lt>book@cpan.orgE<gt>.

=head1 SEE ALSO

L<HTTP::Proxy>, L<HTTP::Proxy::HeaderFilter>.

=head1 COPYRIGHT

Copyright 2003-2005, Philippe Bruhat.

=head1 LICENSE

This module is free software; you can redistribute it or modify it under
the same terms as Perl itself.

=cut