=head1 NAME

Pod::Simple - framework for parsing Pod

=head1 SYNOPSIS

 TODO
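
A minimal sketch of typical use, assuming the C<Pod::Simple::Text> formatter
subclass that ships with this distribution. The sample document and the
variable names are illustrative only:

```perl
use strict;
use warnings;
use Pod::Simple::Text;

# A tiny in-memory Pod document (illustrative content).
my $pod = "=head1 NAME\n\nDemo - an example document\n\n=cut\n";

my $parser   = Pod::Simple::Text->new;
my $rendered = '';

# The output destination has to be chosen before parsing starts.
$parser->output_string(\$rendered);
$parser->parse_string_document($pod);

print $rendered;
```

Other formatter subclasses should be usable the same way, since they inherit
these methods from Pod::Simple.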

=head1 DESCRIPTION

Pod::Simple is a Perl library for parsing text in the Pod ("plain old
documentation") markup language that is typically used for writing
documentation for Perl and for Perl modules. The Pod format is explained
in L<perlpod>; the most common formatter is called C<perldoc>.

Be sure to read L</ENCODING> if your Pod contains non-ASCII characters.

Pod formatters can use Pod::Simple to parse Pod documents and render them into
plain text, HTML, or any number of other formats. Typically, such formatters
will be subclasses of Pod::Simple, and so they will inherit its methods, like
C<parse_file>.

Note, however, that Pod::Simple does not parse Perl itself. If a file
contains a Perl program with a multi-line quoted string, and some lines of
that string look like Pod, Pod::Simple will treat those lines as Pod. This
can be avoided by writing such strings as indented here-documents instead.

If you're reading this document just because you have a Pod-processing
subclass that you want to use, this document (plus the documentation for the
subclass) is probably all you need to read.

If you're reading this document because you want to write a formatter
subclass, continue reading it and then read L<Pod::Simple::Subclassing>, and
then possibly even read L<perlpodspec> (some of which is for parser-writers,
but much of which is notes to formatter-writers).

=head1 MAIN METHODS

=over

=item C<< $parser = I<SomeClass>->new(); >>

This returns a new parser object, where I<C<SomeClass>> is a subclass
of Pod::Simple.

=item C<< $parser->output_fh( *OUT ); >>

This sets the filehandle that C<$parser>'s output will be written to.
You can pass C<*STDOUT> or C<*STDERR>; otherwise you should probably do
something like this:

    my $outfile = "output.txt";
    open TXTOUT, '>', $outfile or die "Can't write to $outfile: $!";
    $parser->output_fh(*TXTOUT);

...before you call one of the C<< $parser->parse_I<whatever> >> methods.

=item C<< $parser->output_string( \$somestring ); >>

This sets the string that C<$parser>'s output will be sent to,
instead of any filehandle.

=item C<< $parser->parse_file( I<$some_filename> ); >>

=item C<< $parser->parse_file( *INPUT_FH ); >>

This reads the Pod content of the file (or filehandle) that you specify,
and processes it with that C<$parser> object, according to however
C<$parser>'s class works, and according to whatever parser options you
have set up for this C<$parser> object.

=item C<< $parser->parse_string_document( I<$all_content> ); >>

This works just like C<parse_file> except that it reads the Pod
content not from a file, but from a string that you have already
in memory.

=item C<< $parser->parse_lines( I<...@lines...>, undef ); >>

This processes the lines in C<@lines> (where each list item must be a
defined value, and must contain exactly one line of content -- so no
items like C<"foo\nbar"> are allowed).  The final C<undef> is used to
indicate the end of document being parsed.

The other C<parse_I<whatever>> methods are meant to be called only once
per C<$parser> object; but C<parse_lines> can be called as many times per
C<$parser> object as you want, as long as the last call (and only
the last call) ends with an C<undef> value.
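
As a rough illustration of feeding a document in pieces, the following sketch
calls C<parse_lines> several times and passes C<undef> only at the very end.
It assumes the C<Pod::Simple::Text> formatter; the document content is made up
for the example:

```perl
use strict;
use warnings;
use Pod::Simple::Text;

my $parser = Pod::Simple::Text->new;
my $text   = '';
$parser->output_string(\$text);

# Each list item is exactly one line, with no embedded newline.
$parser->parse_lines('=head1 NAME', '', 'Demo - a tiny document');

# More lines can arrive later, for example from a socket or a pipe.
$parser->parse_lines('', 'More text arrives later.', '');

# Only the final call ends with undef, which marks the end of the document.
$parser->parse_lines(undef);

print $text;
```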

=item C<< $parser->content_seen >>

This returns true only if there has been any real content seen for this
document. Returns false in cases where the document contains content,
but does not make use of any Pod markup.
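
A small sketch of the distinction, assuming the C<Pod::Simple::Text>
formatter. The helper C<saw_pod> is hypothetical and exists only for this
example:

```perl
use strict;
use warnings;
use Pod::Simple::Text;

# Hypothetical helper: parse a string and report whether Pod was seen.
sub saw_pod {
    my ($source) = @_;
    my $parser  = Pod::Simple::Text->new;
    my $discard = '';
    $parser->output_string(\$discard);    # keep the rendering off STDOUT
    $parser->parse_string_document($source);
    return $parser->content_seen ? 1 : 0;
}

my $with_pod    = saw_pod("=head1 NAME\n\nDemo\n\n=cut\n");
my $without_pod = saw_pod("my \$x = 1;\n");    # plain Perl, no Pod markup

print "with Pod: $with_pod, without Pod: $without_pod\n";
```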

=item C<< I<SomeClass>->filter( I<$filename> ); >>

=item C<< I<SomeClass>->filter( I<*INPUT_FH> ); >>

=item C<< I<SomeClass>->filter( I<\$document_content> ); >>

This is a shortcut method for creating a new parser object, setting the
output handle to STDOUT, and then processing the specified file (or
filehandle, or in-memory document). This is handy for one-liners like
this:

  perl -MPod::Simple::Text -e "Pod::Simple::Text->filter('thingy.pod')"

=back

=head1 SECONDARY METHODS

Some of these methods might be of interest to general users, as
well as of interest to formatter-writers.

Note that the general pattern here is that the accessor-methods
read the attribute's value with C<< $value = $parser->I<attribute> >>
and set the attribute's value with
C<< $parser->I<attribute>(I<newvalue>) >>.  For each accessor, I typically
only mention one syntax or another, based on which I think you are actually
most likely to use.

=over

=item C<< $parser->parse_characters( I<SOMEVALUE> ) >>

The Pod parser normally expects to read octets and to convert those octets
to characters based on the C<=encoding> declaration in the Pod source.  Set
this option to a true value to indicate that the Pod source is already a Perl
character stream.  This tells the parser to ignore any C<=encoding> command
and to skip all the code paths involving decoding octets.

=item C<< $parser->no_whining( I<SOMEVALUE> ) >>

If you set this attribute to a true value, you will suppress the
parser's complaints about irregularities in the Pod coding. By default,
this attribute's value is false, meaning that irregularities will
be reported.

Note that setting this attribute to a true value won't suppress one or two
kinds of complaints about rarely occurring unrecoverable errors.

=item C<< $parser->no_errata_section( I<SOMEVALUE> ) >>

If you set this attribute to a true value, you will stop the parser from
generating a "POD ERRORS" section at the end of the document. By
default, this attribute's value is false, meaning that an errata section
will be generated, as necessary.

=item C<< $parser->complain_stderr( I<SOMEVALUE> ) >>

If you set this attribute to a true value, it will send reports of
parsing errors to STDERR. By default, this attribute's value is false,
meaning that no output is sent to STDERR.

Setting C<complain_stderr> also sets C<no_errata_section>.
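
The effect of C<no_errata_section> can be sketched as follows, assuming the
C<Pod::Simple::Text> formatter. The sample document deliberately uses an
unknown directive, C<=bogus>, to provoke a complaint, and the helper
C<render> is hypothetical:

```perl
use strict;
use warnings;
use Pod::Simple::Text;

# The unknown "=bogus" directive makes the parser complain.
my $pod = "=head1 NAME\n\nDemo\n\n=bogus directive\n";

# Hypothetical helper: render $pod, optionally without the errata section.
sub render {
    my (%opt) = @_;
    my $parser = Pod::Simple::Text->new;
    $parser->no_errata_section(1) if $opt{no_errata_section};
    my $out = '';
    $parser->output_string(\$out);
    $parser->parse_string_document($pod);
    return $out;
}

my $default = render();
my $quiet   = render(no_errata_section => 1);

printf "default output has errata section: %s\n",
    $default =~ /POD ERRORS/ ? 'yes' : 'no';
printf "quiet output has errata section:   %s\n",
    $quiet =~ /POD ERRORS/ ? 'yes' : 'no';
```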

=item C<< $parser->source_filename >>

This returns the filename that this parser object was set to read from.

=item C<< $parser->doc_has_started >>

This returns true if C<$parser> has read from a source, and has seen
Pod content in it.

=item C<< $parser->source_dead >>

This returns true if C<$parser> has read from a source, and come to the
end of that source.

=item C<< $parser->strip_verbatim_indent( I<SOMEVALUE> ) >>

The perlpod spec for a Verbatim paragraph is "It should be reproduced
exactly...", which means that the whitespace you've used to indent your
verbatim blocks will be preserved in the output. This can be annoying for
outputs such as HTML, where that whitespace will remain in front of every
line. It's an unfortunate case where syntax is turned into semantics.

If the POD you're parsing adheres to a consistent indentation policy, you can
have such indentation stripped from the beginning of every line of your
verbatim blocks. This method tells Pod::Simple what to strip. For two-space
indents, you'd use:

  $parser->strip_verbatim_indent('  ');

For tab indents, you'd use a tab character:

  $parser->strip_verbatim_indent("\t");

If the POD is inconsistent about the indentation of verbatim blocks, but you
have figured out a heuristic to determine how much a particular verbatim block
is indented, you can pass a code reference instead. The code reference will be
executed with one argument, an array reference of all the lines in the
verbatim block, and should return the value to be stripped from each line. For
example, if you decide that you're fine to use the first line of the verbatim
block to set the standard for indentation of the rest of the block, you can
look at the first line and return the appropriate value, like so:

  $parser->strip_verbatim_indent(sub {
      my $lines = shift;
      (my $indent = $lines->[0]) =~ s/\S.*//;
      return $indent;
  });

If you'd rather treat each line individually, you can do that, too, by just
transforming them in-place in the code reference and returning C<undef>. Say
that you don't want I<any> lines indented. You can do something like this:

  $parser->strip_verbatim_indent(sub {
      my $lines = shift;
      s/^\s+// for @{ $lines };
      return undef;
  });
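
To see the effect of a fixed-string indent, the following sketch parses a
small document with C<Pod::Simple::SimpleTree> (chosen here only because its
tree output is easy to inspect) and pulls out the text of the verbatim block.
The sample document is illustrative:

```perl
use strict;
use warnings;
use Pod::Simple::SimpleTree;

# A verbatim block whose lines all carry a two-space base indent.
my $pod = "=pod\n\n  foo\n    bar\n\n=cut\n";

my $parser = Pod::Simple::SimpleTree->new;
$parser->strip_verbatim_indent('  ');
$parser->parse_string_document($pod);

# The tree looks like [ 'Document', \%attrs, @children ].
my $root = $parser->root;
my ($verbatim) = grep { ref $_ && $_->[0] eq 'Verbatim' }
                 @{$root}[ 2 .. $#$root ];

# Join the text children of the Verbatim node.
my $text = join '', grep { !ref } @{$verbatim}[ 2 .. $#$verbatim ];

print "$text\n";
```

Only the common two-space prefix is removed, so the second line keeps the
extra indentation that distinguishes it from the first.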

=item C<< $parser->expand_verbatim_tabs( I<n> ) >>

Default: 8

If tabs remain in a verbatim block after any indentation has been stripped,
this method call indicates what to do with them.  C<0>
means leave them as tabs; any other number indicates that each tab is to
be translated so as to have tab stops every C<n> columns.

This is independent of other methods (except that it operates after any
verbatim input stripping is done).

Like the other methods, the input parameter is not checked for validity.
A value of C<undef>, or one containing non-digits, has the same effect as 8.

=back

=head1 TERTIARY METHODS

=over

=item C<< $parser->abandon_output_fh() >>X<abandon_output_fh>

Cancel output to the file handle. Any POD read by the C<$parser> is not
affected.

=item C<< $parser->abandon_output_string() >>X<abandon_output_string>

Cancel output to the output string. Any POD read by the C<$parser> is not
affected.

=item C<< $parser->accept_code( @codes ) >>X<accept_code>

Alias for L<< accept_codes >>.

=item C<< $parser->accept_codes( @codes ) >>X<accept_codes>

Allows C<$parser> to accept a list of L<perlpod/Formatting Codes>. This can be
used to implement user-defined codes.

=item C<< $parser->accept_directive_as_data( @directives ) >>X<accept_directive_as_data>

Allows C<$parser> to accept a list of directives for data paragraphs. A
directive is the label of a L<perlpod/Command Paragraph>. A data paragraph is
one delimited by C<< =begin/=for/=end >> directives. This can be used to
implement user-defined directives.

=item C<< $parser->accept_directive_as_processed( @directives ) >>X<accept_directive_as_processed>

Allows C<$parser> to accept a list of directives for processed paragraphs. A
directive is the label of a L<perlpod/Command Paragraph>. A processed
paragraph is also known as L<perlpod/Ordinary Paragraph>. This can be used to
implement user-defined directives.

=item C<< $parser->accept_directive_as_verbatim( @directives ) >>X<accept_directive_as_verbatim>

Allows C<$parser> to accept a list of directives for L<perlpod/Verbatim
Paragraph>. A directive is the label of a L<perlpod/Command Paragraph>. This
can be used to implement user-defined directives.

=item C<< $parser->accept_target( @targets ) >>X<accept_target>

Alias for L<< accept_targets >>.

=item C<< $parser->accept_target_as_text( @targets ) >>X<accept_target_as_text>

Alias for L<< accept_targets_as_text >>.

=item C<< $parser->accept_targets( @targets ) >>X<accept_targets>

Accepts targets for C<< =begin/=for/=end >> sections of the POD.

=item C<< $parser->accept_targets_as_text( @targets ) >>X<accept_targets_as_text>

Accepts targets for C<< =begin/=for/=end >> sections that should be parsed as
POD. For details, see L<< perlpodspec/About Data Paragraphs >>.
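
The difference between the two kinds of target can be sketched with
C<Pod::Simple::SimpleTree>, whose tree output makes the resulting element
names visible. The target name C<demo> and the helper C<region_children> are
made up for this example, and the element names in the comments (C<Data> and
C<Para>) are what this sketch expects to see rather than a documented
guarantee:

```perl
use strict;
use warnings;
use Pod::Simple::SimpleTree;

my $pod = "=pod\n\n=begin demo\n\nI<raw> content\n\n=end demo\n\n=cut\n";

# Hypothetical helper: parse $pod, optionally accepting the "demo" target
# via the named method, and list the element names inside the region.
sub region_children {
    my ($method) = @_;
    my $parser = Pod::Simple::SimpleTree->new;
    $parser->$method('demo') if $method;
    $parser->parse_string_document($pod);

    my $root = $parser->root;
    my @top  = $root ? @{$root}[ 2 .. $#$root ] : ();
    my ($region) = grep { ref $_ && $_->[0] eq 'for' } @top;
    return () unless $region;    # the region was ignored entirely

    return map { $_->[0] } grep { ref } @{$region}[ 2 .. $#$region ];
}

my @ignored = region_children();                            # target not accepted
my @as_data = region_children('accept_targets');            # expected: Data
my @as_text = region_children('accept_targets_as_text');    # expected: Para

print "ignored: [@ignored] data: [@as_data] text: [@as_text]\n";
```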

=item C<< $parser->any_errata_seen() >>X<any_errata_seen>

Returns true if any errata were seen.

I<Example:>

  die "too many errors\n" if $parser->any_errata_seen();

=item C<< $parser->errata_seen() >>X<errata_seen>

Returns a hash reference of all errata seen, both whines and screams. The
keys of the hash are line numbers, and each value is an array reference of
the errors reported for that line.

I<Example:>

  if ( $parser->any_errata_seen() ) {
     $logger->log( $parser->errata_seen() );
  }

=item C<< $parser->detected_encoding() >>X<detected_encoding>

Return the encoding corresponding to C<< =encoding >>, but only if the
encoding was recognized and handled.

=item C<< $parser->encoding() >>X<encoding>

Return encoding of the document, even if the encoding is not correctly
handled.

=item C<< $parser->parse_from_file( $source, $to ) >>X<parse_from_file>

Parses from C<$source> file to C<$to> file. Similar to L<<
Pod::Parser/parse_from_file >>.

=item C<< $parser->scream( @error_messages ) >>X<scream>

Log an error that can't be ignored.

=item C<< $parser->unaccept_code( @codes ) >>X<unaccept_code>

Alias for L<< unaccept_codes >>.

=item C<< $parser->unaccept_codes( @codes ) >>X<unaccept_codes>

Removes C<< @codes >> as valid codes for the parse.

=item C<< $parser->unaccept_directive( @directives ) >>X<unaccept_directive>

Alias for L<< unaccept_directives >>.

=item C<< $parser->unaccept_directives( @directives ) >>X<unaccept_directives>

Removes C<< @directives >> as valid directives for the parse.

=item C<< $parser->unaccept_target( @targets ) >>X<unaccept_target>

Alias for L<< unaccept_targets >>.

=item C<< $parser->unaccept_targets( @targets ) >>X<unaccept_targets>

Removes C<< @targets >> as valid targets for the parse.

=item C<< $parser->version_report() >>X<version_report>

Returns a string describing the version.

=item C<< $parser->whine( @error_messages ) >>X<whine>

Log an error, unless C<no_whining> has been set to a true value.

=back

=head1 ENCODING

The Pod::Simple parser expects to read B<octets>.  The parser will decode the
octets into Perl's internal character string representation using the value of
the C<=encoding> declaration in the POD source.

If the POD source does not include an C<=encoding> declaration, the parser will
attempt to guess the encoding (selecting either UTF-8 or CP 1252) by examining
the first non-ASCII bytes and applying the heuristic described in
L<perlpodspec>.  (If the POD source contains only ASCII bytes, the
encoding is assumed to be ASCII.)

If you set the C<parse_characters> option to a true value, the parser will
expect characters rather than octets, will ignore any C<=encoding>
declaration, and will make no attempt to decode the input.
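
The two input modes can be compared with a short sketch. It uses
C<Pod::Simple::SimpleTree> so that the parsed text can be inspected directly;
the helper C<first_para_text> and the sample documents are assumptions made
for this example:

```perl
use strict;
use warnings;
use Pod::Simple::SimpleTree;

# Hypothetical helper: return the text of the first ordinary paragraph.
sub first_para_text {
    my ($pod, %opt) = @_;
    my $parser = Pod::Simple::SimpleTree->new;
    $parser->parse_characters(1) if $opt{characters};
    $parser->parse_string_document($pod);

    my $root = $parser->root;
    my ($para) = grep { ref $_ && $_->[0] eq 'Para' } @{$root}[ 2 .. $#$root ];
    return join '', grep { !ref } @{$para}[ 2 .. $#$para ];
}

# Octets: "\xC3\xA9" is the UTF-8 encoding of U+00E9, declared via =encoding.
my $from_octets = first_para_text(
    "=encoding UTF-8\n\n=pod\n\ncaf\xC3\xA9\n\n=cut\n"
);

# Characters: the string is already decoded, so the parser is told to skip
# decoding, and the =encoding line is ignored.
my $from_chars = first_para_text(
    "=encoding UTF-8\n\n=pod\n\ncaf\x{E9}\n\n=cut\n",
    characters => 1,
);

print $from_octets eq $from_chars ? "same text\n" : "different text\n";
```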

=head1 SEE ALSO

L<Pod::Simple::Subclassing>

L<perlpod|perlpod>

L<perlpodspec|perlpodspec>

L<Pod::Escapes|Pod::Escapes>

L<perldoc>

=head1 SUPPORT

Questions or discussion about POD and Pod::Simple should be sent to the
pod-people@perl.org mail list. Send an empty email to
pod-people-subscribe@perl.org to subscribe.

This module is managed in an open GitHub repository,
L<https://github.com/perl-pod/pod-simple/>. Feel free to fork and contribute, or
to clone L<git://github.com/perl-pod/pod-simple.git> and send patches!

Please use L<https://github.com/perl-pod/pod-simple/issues/new> to file a bug
report.

=head1 COPYRIGHT AND DISCLAIMERS

Copyright (c) 2002 Sean M. Burke.

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of
merchantability or fitness for a particular purpose.

=head1 AUTHOR

Pod::Simple was created by Sean M. Burke <sburke@cpan.org>.
But don't bother him, he's retired.

Pod::Simple is maintained by:

=over

=item * Allison Randal C<allison@perl.org>

=item * Hans Dieter Pearcey C<hdp@cpan.org>

=item * David E. Wheeler C<dwheeler@cpan.org>

=item * Karl Williamson C<khw@cpan.org>

=back

Documentation has been contributed by:

=over

=item * Gabor Szabo C<szabgab@gmail.com>

=item * Shawn H Corey  C<SHCOREY at cpan.org>

=back

=cut