package HTTP::Proxy::BodyFilter::htmlparser;

use strict;
use Carp;
use Scalar::Util qw( blessed );
use HTTP::Proxy::BodyFilter;
use vars qw( @ISA );
@ISA = qw( HTTP::Proxy::BodyFilter );

sub init {
    # blessed() guards against undef or unblessed arguments
    croak "First parameter must be a HTML::Parser object"
      unless blessed( $_[1] ) && $_[1]->isa('HTML::Parser');

    my $self = shift;
    $self->{_parser} = shift;

    # the only supported option is rw (read-write filter)
    my %args = (@_);
    $self->{rw} = delete $args{rw};
}

sub filter {
    my ( $self, $dataref, $message, $protocol, $buffer ) = @_;

    # reset the output and expose the context to the parser's handlers
    @{ $self->{_parser} }{qw( output message protocol )} =
      ( "", $message, $protocol );

    $self->{_parser}->parse($$dataref);
    $self->{_parser}->eof if not defined $buffer;    # last chunk
    $$dataref = $self->{_parser}{output} if $self->{rw};
}

sub will_modify { $_[0]->{rw} }

1;

__END__

=head1 NAME

HTTP::Proxy::BodyFilter::htmlparser - Filter using HTML::Parser

=head1 SYNOPSIS

    use HTTP::Proxy::BodyFilter::htmlparser;

    # $parser is a HTML::Parser object
    $proxy->push_filter(
        mime     => 'text/html',
        response => HTTP::Proxy::BodyFilter::htmlparser->new( $parser ),
    );

=head1 DESCRIPTION

The HTTP::Proxy::BodyFilter::htmlparser filter lets you create a
filter based on the HTML::Parser object of your choice.

This filter takes a HTML::Parser object as an argument to its constructor.
The filter is either read-only or read-write. A read-only filter
cannot change the data on the fly: the body is passed on untouched.
A read-write filter has to rewrite the response body completely.

With a read-write filter, you B<must> recreate the whole body data. This
is mainly because HTML::Parser has its own buffering
system, and there is no easy way to correlate the data that triggered
a HTML::Parser event with its original position in the chunk sent by the
origin server. See below for details.
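
As an illustration of the read-only case, here is a minimal sketch that
collects link targets while leaving the body untouched. The C<@links>
array and the sample body are hypothetical, and C<filter()> is called
by hand with C<undef> for the message, protocol and buffer arguments
only to try the filter outside of a running proxy (the proxy normally
makes this call itself).

    use HTML::Parser;
    use HTTP::Proxy::BodyFilter::htmlparser;

    my @links;    # hypothetical collector, filled by the handler

    my $parser = HTML::Parser->new( api_version => 3 );
    $parser->handler(
        start => sub {
            my ( $tag, $attr ) = @_;
            push @links, $attr->{href}
              if $tag eq 'a' && defined $attr->{href};
        },
        'tagname,attr'
    );

    # no rw parameter: the filter is read-only
    my $filter = HTTP::Proxy::BodyFilter::htmlparser->new($parser);

    my $body = '<a href="http://example.com/">link</a>';
    my $copy = $body;

    # an undefined buffer marks the last chunk
    $filter->filter( \$copy, undef, undef, undef );

After the call, C<@links> holds the C<href> value and C<$copy> is still
equal to C<$body>, since a read-only filter never replaces the data.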

Note that a simple filter that modifies the HTML text (not the tags) can
be created more easily with HTTP::Proxy::BodyFilter::htmltext.

=head2 Creating a HTML::Parser that rewrites pages

A read-write filter is declared by passing C<rw =E<gt> 1> to the constructor:

     HTTP::Proxy::BodyFilter::htmlparser->new( $parser, rw => 1 );

To be able to modify the body of a message, a filter created with
HTTP::Proxy::BodyFilter::htmlparser must rewrite it completely. The
HTML::Parser object can update a special attribute named C<output>.
To do so, the HTML::Parser handler has to request the C<self>
argument (that is to say, require access to the parser itself) and
update its C<output> key.

The following attributes are added to the HTML::Parser object by this filter:

=over 4

=item output

A string that will hold the data sent back by the proxy.

This string will be used as a replacement for the body data only
if the filter is read-write, that is to say, if it was initialised with
C<rw =E<gt> 1>.

Data should always be B<appended> to C<$parser-E<gt>{output}>.

=item message

A reference to the HTTP::Message that triggered the filter.

=item protocol

A reference to the LWP::Protocol object.

=back
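
The rewriting scheme described above can be sketched as follows. This is
only one possible arrangement, not a required one: a C<default> handler
copies the source text of every event to C<output>, and a C<comment>
handler that appends nothing removes HTML comments. The sample data is
hypothetical, and the direct call to C<filter()> with C<undef>
arguments is there only to try the filter without a running proxy.

    use HTML::Parser;
    use HTTP::Proxy::BodyFilter::htmlparser;

    my $parser = HTML::Parser->new( api_version => 3 );

    # copy the source text of every event through unchanged
    $parser->handler(
        default => sub {
            my ( $self, $text ) = @_;
            $self->{output} .= $text if defined $text;
        },
        'self,text'
    );

    # comments are dropped: this handler appends nothing to output
    $parser->handler( comment => sub { }, 'self' );

    my $filter =
      HTTP::Proxy::BodyFilter::htmlparser->new( $parser, rw => 1 );

    my $data = '<p>kept<!-- dropped --></p>';

    # an undefined buffer marks the last chunk
    $filter->filter( \$data, undef, undef, undef );
    print "$data\n";

Because the filter is read-write, C<$data> is replaced by whatever the
handlers appended to C<output>. Any event left without a handler would
silently disappear from the page, which is why the C<default> handler
matters here.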

=head1 METHODS

This filter defines three methods, called automatically:

=over 4

=item filter()

The C<filter()> method handles all the interactions with the HTML::Parser
object.

=item init()

Initialise the filter with the HTML::Parser object passed to the constructor.

=item will_modify()

This method returns a boolean value that indicates to the system
if it will modify the data passing through. The value is actually
the value of the C<rw> parameter passed to the constructor.

=back
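
Although these methods are normally called by the proxy, the link
between C<rw> and C<will_modify()> can be checked by hand. A minimal
sketch, with hypothetical variable names:

    use HTML::Parser;
    use HTTP::Proxy::BodyFilter::htmlparser;

    my $ro = HTTP::Proxy::BodyFilter::htmlparser->new(
        HTML::Parser->new( api_version => 3 ) );
    my $rw = HTTP::Proxy::BodyFilter::htmlparser->new(
        HTML::Parser->new( api_version => 3 ), rw => 1 );

    # will_modify() returns the rw value given to the constructor
    print "read-only filter modifies data\n"  if $ro->will_modify;
    print "read-write filter modifies data\n" if $rw->will_modify;

Only the second message is printed, since C<rw> was not set for the
first filter.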

=head1 SEE ALSO

L<HTTP::Proxy>, L<HTTP::Proxy::BodyFilter>,
L<HTTP::Proxy::BodyFilter::htmltext>.

=head1 AUTHOR

Philippe "BooK" Bruhat, E<lt>book@cpan.orgE<gt>.

=head1 COPYRIGHT

Copyright 2003-2006, Philippe Bruhat.

=head1 LICENSE

This module is free software; you can redistribute it or modify it under
the same terms as Perl itself.

=cut