package HTTP::Proxy::BodyFilter;

use strict;
use Carp;

# The constructor is shared by all subclasses; subclass-specific
# set-up goes in an optional init() method.
sub new {
    my $class = shift;
    my $self = bless {}, $class;
    $self->init(@_) if $self->can('init');
    return $self;
}

# Get (or set, when given an argument) the HTTP::Proxy object
# that owns this filter.
sub proxy {
    my ( $self, $new ) = @_;
    return $new ? ( $self->{_hpbf_proxy} = $new ) : $self->{_hpbf_proxy};
}

# Subclasses must override this method.
sub filter {
    croak "HTTP::Proxy::BodyFilter cannot be used as a filter";
}

sub will_modify { 1 } # by default, we expect the filter to modify data

1;

__END__

=head1 NAME

HTTP::Proxy::BodyFilter - A base class for HTTP messages body filters

=head1 SYNOPSIS

    package MyFilter;

    use base qw( HTTP::Proxy::BodyFilter );

    # a simple modification that may break things
    sub filter {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
        $$dataref =~ s/PERL/Perl/g;
    }

    1;

=head1 DESCRIPTION

The HTTP::Proxy::BodyFilter class is used to create filters for
HTTP request/response body data.

=head2 Creating a BodyFilter

A BodyFilter is just a derived class that implements some methods
called by the proxy. Of all the methods presented below, only
C<filter()> B<must> be defined in the derived class.

=over 4

=item filter()

The signature of the filter() method is the following:

    sub filter {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
        ...
    }

where $self is the filter object, $dataref is a reference to the chunk
of body data received, $message is a reference to either an
HTTP::Request or an HTTP::Response object, and $protocol is a reference
to the LWP::Protocol protocol object.

Note that this subroutine signature looks a lot like that of the
callbacks of LWP::UserAgent (except that $message is either an
HTTP::Request or an HTTP::Response object).
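To see this calling convention in isolation, the sketch below calls a
filter() method directly, the way the proxy would, with a reference to
a chunk of data and a reference to an empty buffer. It is a hypothetical,
stand-alone sketch: the class name FixPerlCase is made up, the class does
not inherit from HTTP::Proxy::BodyFilter (so that it runs without
HTTP::Proxy installed), and the two C<undef> arguments merely stand in
for the real message and protocol objects.

```perl

    package FixPerlCase;    # hypothetical example class

    use strict;
    use warnings;

    # Bare constructor so the sketch runs stand-alone; a real filter
    # would instead "use base qw( HTTP::Proxy::BodyFilter )".
    sub new {
        my $class = shift;
        return bless {}, $class;
    }

    sub filter {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;

        # Edit the referent in place: the modified string is what
        # the caller sees once filter() returns.
        $$dataref =~ s/PERL/Perl/g;
    }

    package main;

    my $filter = FixPerlCase->new;
    my $chunk  = 'I like PERL.';
    my $buffer = '';

    # undef stands in for the HTTP::Response and LWP::Protocol objects.
    $filter->filter( \$chunk, undef, undef, \$buffer );
    print "$chunk\n";    # prints "I like Perl."

```

The caller's $chunk changes because filter() works on the referent,
not on a copy of the data.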

$buffer is a reference to a buffer where some of the unprocessed data
can be stored for the next time the filter will be called (see L</Using
a buffer to store data for a later use> for details). Thanks to the
built-in HTTP::Proxy::BodyFilter::* filters, this is rarely needed.

It is possible to access the headers of the message with
C<< $message->headers() >>. This HTTP::Headers object is the one
that was sent to the client (if the filter is on the response stack)
or to the origin server (if the filter is on the request stack).
Modifying it in the filter() method is useless, since the headers
have already been sent.

Since $dataref is a I<reference> to the data string, the referent
can be modified and the changes will be transmitted through the
filters that follow, until the data reaches its recipient.

An HTTP::Proxy::BodyFilter object is a blessed hash, and the base class
reserves only hash keys that start with C<_hpbf>.

=item new()

The constructor is defined for all subclasses. Initialisation tasks
(if any) for subclasses should be done in the C<init()> method (see below).

=item init()

This method is called by the C<new()> constructor to perform all
initialisation tasks. It is called once in the filter's lifetime.

It receives all the parameters passed to C<new()>.

=item begin()

Some filters might require initialisation before they are able to handle
the data. If a C<begin()> method is defined in your subclass, the proxy
will call it before sending data to the C<filter()> method.

It is called once per HTTP message handled by the filter, before data
processing begins.

The method signature is as follows:

    sub begin {
        my ( $self, $message ) = @_;
        ...
    }

=item end()

Some filters might require finalisation after they are finished handling
the data.
If an C<end()> method is defined in your subclass, the proxy
will call it after it has finished sending data to the C<filter()> method.

It is called once per HTTP message handled by the filter, after all data
processing is done.

This method does not expect any parameters.

=item will_modify()

This method returns a boolean value that indicates whether the filter
will modify the body data on the fly.

The default implementation returns a I<true> value.

=back

=head2 Using a buffer to store data for a later use

Some filters cannot handle arbitrary data: for example, a filter that
basically lowercases tag names will apply a simple regex
such as C<s/E<lt>\s*(\w+)([^E<gt>]*)E<gt>/E<lt>\L$1\E$2E<gt>/g>.
But the filter will fail if the chunk of data contains a tag
that is cut before the final C<E<gt>>.

It would be extremely complicated and error-prone to let each filter
(and its author) do its own buffering, so the HTTP::Proxy architecture
handles this too. Each time it calls a filter, the proxy passes it
a reference to an empty string ($buffer in the above signature) that
the filter can use to store some data for the next run.

When $buffer is C<undef> instead of a reference, the filter cannot
store any data, because this is the very last run, which gathers
all the data left in all buffers.

It is recommended to store as little data as possible in the buffer,
so as to avoid (badly) reproducing what HTTP::Proxy::BodyFilter::complete
does.

In particular, remember that all the data that remains in
the buffer after the last piece of data is received from the origin
server will be sent back to your filter in one big piece.

=head2 The store and forward approach

HTTP::Proxy implements a I<store and forward> mechanism for those filters
that need the whole message body to work.
It is enabled simply by
pushing the HTTP::Proxy::BodyFilter::complete filter onto the filter stack.

The data is stored in memory by the "complete" filter, which passes it
on to the following filter once the full message body has been received.

=head2 Standard BodyFilters

The names of the standard HTTP::Proxy::BodyFilter classes are lowercase.

The following BodyFilters are included in the HTTP::Proxy distribution:

=over 4

=item lines

This filter makes sure that the next filter in the filter chain will
only receive complete lines. The "chunks" of data received by the
following filters will either end with C<\n> or be the last
piece of data for the current HTTP message body.

=item htmltext

This class lets you create a filter that runs a given code reference
against text included in an HTML document (outside C<E<lt>scriptE<gt>>
and C<E<lt>styleE<gt>> tags). HTML entities are not included in the text.

=item htmlparser

Creates a filter from an HTML::Parser object.

=item simple

This class lets you create a simple body filter from a code reference.

=item save

Stores the message body in a file.

=item complete

This filter stores the whole message body in memory, thus allowing
some actions to be taken only when the full page has been received
by the proxy.

=item tags

The HTTP::Proxy::BodyFilter::tags filter makes sure that the next filter
in the filter chain will only receive complete tags. The current
implementation is not 100% perfect, though.

=back

Please read each filter's documentation for more details about its use.

=head1 USEFUL METHODS FOR SUBCLASSES

Some methods are available to filters, so that they can make use of
what little knowledge they might have of HTTP::Proxy's internals. They
are mostly accessors.

=over 4

=item proxy()

Gets a reference to the HTTP::Proxy object that owns the filter.
This gives access to some of the proxy methods.

=back

=head1 AUTHOR

Philippe "BooK" Bruhat, E<lt>book@cpan.orgE<gt>.

=head1 SEE ALSO

L<HTTP::Proxy>, L<HTTP::Proxy::HeaderFilter>.

=head1 COPYRIGHT

Copyright 2003-2005, Philippe Bruhat.

=head1 LICENSE

This module is free software; you can redistribute it or modify it under
the same terms as Perl itself.

=cut