1: #!/usr/bin/perl-5.005
2    eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
3	if $running_under_some_shell;
4
5$DEF_PM_SECTION = '3pm' || '3';
6
7=head1 NAME
8
9pod2man - translate embedded Perl pod directives into man pages
10
11=head1 SYNOPSIS
12
13B<pod2man>
14[ B<--section=>I<manext> ]
15[ B<--release=>I<relpatch> ]
16[ B<--center=>I<string> ]
17[ B<--date=>I<string> ]
18[ B<--fixed=>I<font> ]
19[ B<--official> ]
20[ B<--lax> ]
21I<inputfile>
22
23=head1 DESCRIPTION
24
25B<pod2man> converts its input file containing embedded pod directives (see
26L<perlpod>) into nroff source suitable for viewing with nroff(1) or
27troff(1) using the man(7) macro set.
28
29Besides the obvious pod conversions, B<pod2man> also takes care of
30func(), func(n), and simple variable references like $foo or @bar so
31you don't have to use code escapes for them; complex expressions like
32C<$fred{'stuff'}> will still need to be escaped, though.  Other nagging
33little roffish things that it catches include translating the minus in
34something like foo-bar, making a long dash--like this--into a real em
35dash, fixing up "paired quotes", putting a little space after the
36parens in something like func(), making C++ and PI look right, making
37double underbars have a little tiny space between them, making ALLCAPS
38a teeny bit smaller in troff(1), and escaping backslashes so you don't
39have to.
40
41=head1 OPTIONS
42
43=over 8
44
45=item center
46
47Set the centered header to a specific string.  The default is
48"User Contributed Perl Documentation", unless the C<--official> flag is
49given, in which case the default is "Perl Programmers Reference Guide".
50
51=item date
52
53Set the left-hand footer string to this value.  By default,
54the modification date of the input file will be used.
55
56=item fixed
57
58The fixed font to use for code refs.  Defaults to CW.
59
60=item official
61
62Set the default header to indicate that this page is of
63the standard release in case C<--center> is not given.
64
65=item release
66
67Set the centered footer.  By default, this is the current
68perl release.
69
70=item section
71
72Set the section for the C<.TH> macro.  The standard conventions on
73sections are to use 1 for user commands,  2 for system calls, 3 for
74functions, 4 for devices, 5 for file formats, 6 for games, 7 for
75miscellaneous information, and 8 for administrator commands.  This works
76best if you put your Perl man pages in a separate tree, like
77F</usr/local/perl/man/>.  By default, section 1 will be used
78unless the file ends in F<.pm> in which case section 3 will be selected.
79
80=item lax
81
82Don't complain when required sections aren't present.
83
84=back
85
86=head1 Anatomy of a Proper Man Page
87
88For those not sure of the proper layout of a man page, here's
89an example of the skeleton of a proper man page.  Head of the
90major headers should be setout as a C<=head1> directive, and
91are historically written in the rather startling ALL UPPER CASE
92format, although this is not mandatory.
93Minor headers may be included using C<=head2>, and are
94typically in mixed case.
95
96=over 10
97
98=item NAME
99
100Mandatory section; should be a comma-separated list of programs or
101functions documented by this podpage, such as:
102
103    foo, bar - programs to do something
104
105=item SYNOPSIS
106
107A short usage summary for programs and functions, which
108may someday be deemed mandatory.
109
110=item DESCRIPTION
111
112Long drawn out discussion of the program.  It's a good idea to break this
113up into subsections using the C<=head2> directives, like
114
115    =head2 A Sample Subection
116
117    =head2 Yet Another Sample Subection
118
119=item OPTIONS
120
121Some people make this separate from the description.
122
123=item RETURN VALUE
124
125What the program or function returns if successful.
126
127=item ERRORS
128
129Exceptions, return codes, exit stati, and errno settings.
130
131=item EXAMPLES
132
133Give some example uses of the program.
134
135=item ENVIRONMENT
136
137Envariables this program might care about.
138
139=item FILES
140
141All files used by the program.  You should probably use the FE<lt>E<gt>
142for these.
143
144=item SEE ALSO
145
146Other man pages to check out, like man(1), man(7), makewhatis(8), or catman(8).
147
148=item NOTES
149
150Miscellaneous commentary.
151
152=item CAVEATS
153
154Things to take special care with; sometimes called WARNINGS.
155
156=item DIAGNOSTICS
157
158All possible messages the program can print out--and
159what they mean.
160
161=item BUGS
162
163Things that are broken or just don't work quite right.
164
165=item RESTRICTIONS
166
167Bugs you don't plan to fix :-)
168
169=item AUTHOR
170
171Who wrote it (or AUTHORS if multiple).
172
173=item HISTORY
174
175Programs derived from other sources sometimes have this, or
176you might keep a modification log here.
177
178=back
179
180=head1 EXAMPLES
181
182    pod2man program > program.1
183    pod2man some_module.pm > /usr/perl/man/man3/some_module.3
184    pod2man --section=7 note.pod > note.7
185
186=head1 DIAGNOSTICS
187
188The following diagnostics are generated by B<pod2man>.  Items
189marked "(W)" are non-fatal, whereas the "(F)" errors will cause
190B<pod2man> to immediately exit with a non-zero status.
191
192=over 4
193
194=item bad option in paragraph %d of %s: ``%s'' should be [%s]<%s>
195
196(W) If you start include an option, you should set it off
197as bold, italic, or code.
198
199=item can't open %s: %s
200
201(F) The input file wasn't available for the given reason.
202
203=item Improper man page - no dash in NAME header in paragraph %d of %s
204
205(W) The NAME header did not have an isolated dash in it.  This is
206considered important.
207
208=item Invalid man page - no NAME line in %s
209
210(F) You did not include a NAME header, which is essential.
211
212=item roff font should be 1 or 2 chars, not `%s'  (F)
213
214(F) The font specified with the C<--fixed> option was not
215a one- or two-digit roff font.
216
217=item %s is missing required section: %s
218
219(W) Required sections include NAME, DESCRIPTION, and if you're
220using a section starting with a 3, also a SYNOPSIS.  Actually,
221not having a NAME is a fatal.
222
223=item Unknown escape: %s in %s
224
225(W) An unknown HTML entity (probably for an 8-bit character) was given via
226a C<EE<lt>E<gt>> directive.  Besides amp, lt, gt, and quot, recognized
227entities are Aacute, aacute, Acirc, acirc, AElig, aelig, Agrave, agrave,
228Aring, aring, Atilde, atilde, Auml, auml, Ccedil, ccedil, Eacute, eacute,
229Ecirc, ecirc, Egrave, egrave, ETH, eth, Euml, euml, Iacute, iacute, Icirc,
230icirc, Igrave, igrave, Iuml, iuml, Ntilde, ntilde, Oacute, oacute, Ocirc,
231ocirc, Ograve, ograve, Oslash, oslash, Otilde, otilde, Ouml, ouml, szlig,
232THORN, thorn, Uacute, uacute, Ucirc, ucirc, Ugrave, ugrave, Uuml, uuml,
233Yacute, yacute, and yuml.
234
235=item Unmatched =back
236
237(W) You have a C<=back> without a corresponding C<=over>.
238
239=item Unrecognized pod directive: %s
240
241(W) You specified a pod directive that isn't in the known list of
242C<=head1>, C<=head2>, C<=item>, C<=over>, C<=back>, or C<=cut>.
243
244
245=back
246
247=head1 NOTES
248
249If you would like to print out a lot of man page continuously, you
250probably want to set the C and D registers to set contiguous page
251numbering and even/odd paging, at least on some versions of man(7).
252Settting the F register will get you some additional experimental
253indexing:
254
255    troff -man -rC1 -rD1 -rF1 perl.1 perldata.1 perlsyn.1 ...
256
257The indexing merely outputs messages via C<.tm> for each
258major page, section, subsection, item, and any C<XE<lt>E<gt>>
259directives.
260
261
262=head1 RESTRICTIONS
263
264None at this time.
265
266=head1 BUGS
267
268The =over and =back directives don't really work right.  They
269take absolute positions instead of offsets, don't nest well, and
270making people count is suboptimal in any event.
271
272=head1 AUTHORS
273
274Original prototype by Larry Wall, but so massively hacked over by
275Tom Christiansen such that Larry probably doesn't recognize it anymore.
276
277=cut
278
279$/ = "";
280$cutting = 1;
281@Indices = ();
282
283# We try first to get the version number from a local binary, in case we're
284# running an installed version of Perl to produce documentation from an
285# uninstalled newer version's pod files.
286if ($^O ne 'plan9' and $^O ne 'dos' and $^O ne 'os2' and $^O ne 'MSWin32') {
287  my $perl = (-x './perl' && -f './perl' ) ?
288                 './perl' :
289                 ((-x '../perl' && -f '../perl') ?
290                      '../perl' :
291                      '');
292  ($version,$patch) = `$perl -e 'print $]'` =~ /^(\d\.\d{3})(\d{2})?/ if $perl;
293}
294# No luck; we'll just go with the running Perl's version
295($version,$patch) = $] =~ /^(.{5})(\d{2})?/ unless $version;
296$DEF_RELEASE  = "perl $version";
297$DEF_RELEASE .= ", patch $patch" if $patch;
298
299
300sub makedate {
301    my $secs = shift;
302    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime($secs);
303    my $mname = (qw{Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec})[$mon];
304    $year += 1900;
305    return "$mday/$mname/$year";
306}
307
308use Getopt::Long;
309
310$DEF_SECTION = 1;
311$DEF_CENTER = "User Contributed Perl Documentation";
312$STD_CENTER = "Perl Programmers Reference Guide";
313$DEF_FIXED = 'CW';
314$DEF_LAX = 0;
315
316sub usage {
317    warn "$0: @_\n" if @_;
318    die <<EOF;
319usage: $0 [options] podpage
320Options are:
321	--section=manext      (default "$DEF_SECTION")
322	--release=relpatch    (default "$DEF_RELEASE")
323	--center=string       (default "$DEF_CENTER")
324	--date=string         (default "$DEF_DATE")
325	--fixed=font	      (default "$DEF_FIXED")
326	--official	      (default NOT)
327	--lax                 (default NOT)
328EOF
329}
330
331$uok = GetOptions( qw(
332	section=s
333	release=s
334	center=s
335	date=s
336	fixed=s
337	official
338	lax
339	help));
340
341$DEF_DATE = makedate((stat($ARGV[0]))[9] || time());
342
343usage("Usage error!") unless $uok;
344usage() if $opt_help;
345usage("Need one and only one podpage argument") unless @ARGV == 1;
346
347$section = $opt_section || ($ARGV[0] =~ /\.pm$/
348				? $DEF_PM_SECTION : $DEF_SECTION);
349$RP = $opt_release || $DEF_RELEASE;
350$center = $opt_center || ($opt_official ? $STD_CENTER : $DEF_CENTER);
351$lax = $opt_lax || $DEF_LAX;
352
353$CFont = $opt_fixed || $DEF_FIXED;
354
355if (length($CFont) == 2) {
356    $CFont_embed = "\\f($CFont";
357}
358elsif (length($CFont) == 1) {
359    $CFont_embed = "\\f$CFont";
360}
361else {
362    die "roff font should be 1 or 2 chars, not `$CFont_embed'";
363}
364
365$date = $opt_date || $DEF_DATE;
366
367for (qw{NAME DESCRIPTION}) {
368# for (qw{NAME DESCRIPTION AUTHOR}) {
369    $wanna_see{$_}++;
370}
371$wanna_see{SYNOPSIS}++ if $section =~ /^3/;
372
373
374$name = @ARGV ? $ARGV[0] : "<STDIN>";
375$Filename = $name;
376if ($section =~ /^1/) {
377    require File::Basename;
378    $name = uc File::Basename::basename($name);
379}
380$name =~ s/\.(pod|p[lm])$//i;
381
382# Lose everything up to the first of
383#     */lib/*perl*	standard or site_perl module
384#     */*perl*/lib	from -D prefix=/opt/perl
385#     */*perl*/		random module hierarchy
386# which works.
387$name =~ s-//+-/-g;
388if ($name =~ s-^.*?/lib/[^/]*perl[^/]*/--i
389	or $name =~ s-^.*?/[^/]*perl[^/]*/lib/--i
390	or $name =~ s-^.*?/[^/]*perl[^/]*/--i) {
391    # Lose ^site(_perl)?/.
392    $name =~ s-^site(_perl)?/--;
393    # Lose ^arch/.	(XXX should we use Config? Just for archname?)
394    $name =~ s~^(.*-$^O|$^O-.*)/~~o;
395    # Lose ^version/.
396    $name =~ s-^\d+\.\d+/--;
397}
398
399# Translate Getopt/Long to Getopt::Long, etc.
400$name =~ s(/)(::)g;
401
402if ($name ne 'something') {
403    FCHECK: {
404	open(F, "< $ARGV[0]") || die "can't open $ARGV[0]: $!";
405	while (<F>) {
406	    next unless /^=\b/;
407	    if (/^=head1\s+NAME\s*$/) {  # an /m would forgive mistakes
408		$_ = <F>;
409		unless (/\s*-+\s+/) {
410		    $oops++;
411		    warn "$0: Improper man page - no dash in NAME header in paragraph $. of $ARGV[0]\n"
412                } else {
413		    my @n = split /\s+-+\s+/;
414		    if (@n != 2) {
415			$oops++;
416			warn "$0: Improper man page - malformed NAME header in paragraph $. of $ARGV[0]\n"
417		    }
418		    else {
419			$n[0] =~ s/\n/ /g;
420			$n[1] =~ s/\n/ /g;
421			%namedesc = @n;
422		    }
423		}
424		last FCHECK;
425	    }
426	    next if /^=cut\b/;	# DB_File and Net::Ping have =cut before NAME
427	    next if /^=pod\b/;  # It is OK to have =pod before NAME
428	    next if /^=(for|begin|end)\s+comment\b/;  # It is OK to have =for =begin or =end comment before NAME
429	    die "$0: Invalid man page - 1st pod line is not NAME in $ARGV[0]\n" unless $lax;
430	}
431	die "$0: Invalid man page - no documentation in $ARGV[0]\n" unless $lax;
432    }
433    close F;
434}
435
436print <<"END";
437.rn '' }`
438''' \$RCSfile\$\$Revision\$\$Date\$
439'''
440''' \$Log\$
441'''
442.de Sh
443.br
444.if t .Sp
445.ne 5
446.PP
447\\fB\\\\\$1\\fR
448.PP
449..
450.de Sp
451.if t .sp .5v
452.if n .sp
453..
454.de Ip
455.br
456.ie \\\\n(.\$>=3 .ne \\\\\$3
457.el .ne 3
458.IP "\\\\\$1" \\\\\$2
459..
460.de Vb
461.ft $CFont
462.nf
463.ne \\\\\$1
464..
465.de Ve
466.ft R
467
468.fi
469..
470'''
471'''
472'''     Set up \\*(-- to give an unbreakable dash;
473'''     string Tr holds user defined translation string.
474'''     Bell System Logo is used as a dummy character.
475'''
476.tr \\(*W-|\\(bv\\*(Tr
477.ie n \\{\\
478.ds -- \\(*W-
479.ds PI pi
480.if (\\n(.H=4u)&(1m=24u) .ds -- \\(*W\\h'-12u'\\(*W\\h'-12u'-\\" diablo 10 pitch
481.if (\\n(.H=4u)&(1m=20u) .ds -- \\(*W\\h'-12u'\\(*W\\h'-8u'-\\" diablo 12 pitch
482.ds L" ""
483.ds R" ""
484'''   \\*(M", \\*(S", \\*(N" and \\*(T" are the equivalent of
485'''   \\*(L" and \\*(R", except that they are used on ".xx" lines,
486'''   such as .IP and .SH, which do another additional levels of
487'''   double-quote interpretation
488.ds M" """
489.ds S" """
490.ds N" """""
491.ds T" """""
492.ds L' '
493.ds R' '
494.ds M' '
495.ds S' '
496.ds N' '
497.ds T' '
498'br\\}
499.el\\{\\
500.ds -- \\(em\\|
501.tr \\*(Tr
502.ds L" ``
503.ds R" ''
504.ds M" ``
505.ds S" ''
506.ds N" ``
507.ds T" ''
508.ds L' `
509.ds R' '
510.ds M' `
511.ds S' '
512.ds N' `
513.ds T' '
514.ds PI \\(*p
515'br\\}
516END
517
518print <<'END';
519.\"	If the F register is turned on, we'll generate
520.\"	index entries out stderr for the following things:
521.\"		TH	Title
522.\"		SH	Header
523.\"		Sh	Subsection
524.\"		Ip	Item
525.\"		X<>	Xref  (embedded
526.\"	Of course, you have to process the output yourself
527.\"	in some meaninful fashion.
528.if \nF \{
529.de IX
530.tm Index:\\$1\t\\n%\t"\\$2"
531..
532.nr % 0
533.rr F
534.\}
535END
536
537print <<"END";
538.TH $name $section "$RP" "$date" "$center"
539.UC
540END
541
542push(@Indices, qq{.IX Title "$name $section"});
543
544while (($name, $desc) = each %namedesc) {
545    for ($name, $desc) { s/^\s+//; s/\s+$//; }
546    push(@Indices, qq(.IX Name "$name - $desc"\n));
547}
548
549print <<'END';
550.if n .hy 0
551.if n .na
552.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
553.de CQ          \" put $1 in typewriter font
554END
555print ".ft $CFont\n";
556print <<'END';
557'if n "\c
558'if t \\&\\$1\c
559'if n \\&\\$1\c
560'if n \&"
561\\&\\$2 \\$3 \\$4 \\$5 \\$6 \\$7
562'.ft R
563..
564.\" @(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2
565.	\" AM - accent mark definitions
566.bd B 3
567.	\" fudge factors for nroff and troff
568.if n \{\
569.	ds #H 0
570.	ds #V .8m
571.	ds #F .3m
572.	ds #[ \f1
573.	ds #] \fP
574.\}
575.if t \{\
576.	ds #H ((1u-(\\\\n(.fu%2u))*.13m)
577.	ds #V .6m
578.	ds #F 0
579.	ds #[ \&
580.	ds #] \&
581.\}
582.	\" simple accents for nroff and troff
583.if n \{\
584.	ds ' \&
585.	ds ` \&
586.	ds ^ \&
587.	ds , \&
588.	ds ~ ~
589.	ds ? ?
590.	ds ! !
591.	ds /
592.	ds q
593.\}
594.if t \{\
595.	ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
596.	ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
597.	ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
598.	ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
599.	ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
600.	ds ? \s-2c\h'-\w'c'u*7/10'\u\h'\*(#H'\zi\d\s+2\h'\w'c'u*8/10'
601.	ds ! \s-2\(or\s+2\h'-\w'\(or'u'\v'-.8m'.\v'.8m'
602.	ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
603.	ds q o\h'-\w'o'u*8/10'\s-4\v'.4m'\z\(*i\v'-.4m'\s+4\h'\w'o'u*8/10'
604.\}
605.	\" troff and (daisy-wheel) nroff accents
606.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
607.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
608.ds v \\k:\h'-(\\n(.wu*9/10-\*(#H)'\v'-\*(#V'\*(#[\s-4v\s0\v'\*(#V'\h'|\\n:u'\*(#]
609.ds _ \\k:\h'-(\\n(.wu*9/10-\*(#H+(\*(#F*2/3))'\v'-.4m'\z\(hy\v'.4m'\h'|\\n:u'
610.ds . \\k:\h'-(\\n(.wu*8/10)'\v'\*(#V*4/10'\z.\v'-\*(#V*4/10'\h'|\\n:u'
611.ds 3 \*(#[\v'.2m'\s-2\&3\s0\v'-.2m'\*(#]
612.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
613.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
614.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
615.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
616.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
617.ds ae a\h'-(\w'a'u*4/10)'e
618.ds Ae A\h'-(\w'A'u*4/10)'E
619.ds oe o\h'-(\w'o'u*4/10)'e
620.ds Oe O\h'-(\w'O'u*4/10)'E
621.	\" corrections for vroff
622.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
623.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
624.	\" for low resolution devices (crt and lpr)
625.if \n(.H>23 .if \n(.V>19 \
626\{\
627.	ds : e
628.	ds 8 ss
629.	ds v \h'-1'\o'\(aa\(ga'
630.	ds _ \h'-1'^
631.	ds . \h'-1'.
632.	ds 3 3
633.	ds o a
634.	ds d- d\h'-1'\(ga
635.	ds D- D\h'-1'\(hy
636.	ds th \o'bp'
637.	ds Th \o'LP'
638.	ds ae ae
639.	ds Ae AE
640.	ds oe oe
641.	ds Oe OE
642.\}
643.rm #[ #] #H #V #F C
644END
645
646$indent = 0;
647
648$begun = "";
649
650# Unrolling [^A-Z>]|[A-Z](?!<) gives:    // MRE pp 165.
651my $nonest = '(?:[^A-Z>]*(?:[A-Z](?!<)[^A-Z>]*)*)';
652
653while (<>) {
654    if ($cutting) {
655	next unless /^=/;
656	$cutting = 0;
657    }
658    if ($begun) {
659	if (/^=end\s+$begun/) {
660            $begun = "";
661	}
662	elsif ($begun =~ /^(roff|man)$/) {
663	    print STDOUT $_;
664        }
665	next;
666    }
667    chomp;
668
669    # Translate verbatim paragraph
670
671    if (/^\s/) {
672	@lines = split(/\n/);
673	for (@lines) {
674	    1 while s
675		{^( [^\t]* ) \t ( \t* ) }
676		{ $1 . ' ' x (8 - (length($1)%8) + 8 * (length($2))) }ex;
677	    s/\\/\\e/g;
678	    s/\A/\\&/s;
679	}
680	$lines = @lines;
681	makespace() unless $verbatim++;
682	print ".Vb $lines\n";
683	print join("\n", @lines), "\n";
684	print ".Ve\n";
685	$needspace = 0;
686	next;
687    }
688
689    $verbatim = 0;
690
691    if (/^=for\s+(\S+)\s*/s) {
692	if ($1 eq "man" or $1 eq "roff") {
693	    print STDOUT $',"\n\n";
694	} else {
695	    # ignore unknown for
696	}
697	next;
698    }
699    elsif (/^=begin\s+(\S+)\s*/s) {
700	$begun = $1;
701	if ($1 eq "man" or $1 eq "roff") {
702	    print STDOUT $'."\n\n";
703	}
704	next;
705    }
706
707    # check for things that'll hosed our noremap scheme; affects $_
708    init_noremap();
709
710    if (!/^=item/) {
711
712	# trofficate backslashes; must do it before what happens below
713	s/\\/noremap('\\e')/ge;
714
715	# protect leading periods and quotes against *roff
716	# mistaking them for directives
717	s/^(?:[A-Z]<)?[.']/\\&$&/gm;
718
719	# first hide the escapes in case we need to
720	# intuit something and get it wrong due to fmting
721
722	1 while s/([A-Z]<$nonest>)/noremap($1)/ge;
723
724	# func() is a reference to a perl function
725	s{
726	    \b
727	    (
728		[:\w]+ \(\)
729	    )
730	} {I<$1>}gx;
731
732	# func(n) is a reference to a perl function or a man page
733	s{
734	    ([:\w]+)
735	    (
736		\( [^\051]+ \)
737	    )
738	} {I<$1>\\|$2}gx;
739
740	# convert simple variable references
741	s/(\s+)([\$\@%][\w:]+)(?!\()/${1}C<$2>/g;
742
743	if (m{ (
744		    [\-\w]+
745		    \(
746			[^\051]*?
747			[\@\$,]
748			[^\051]*?
749		    \)
750		)
751	    }x && $` !~ /([LCI]<[^<>]*|-)$/ && !/^=\w/)
752	{
753	    warn "$0: bad option in paragraph $. of $ARGV: ``$1'' should be [LCI]<$1>\n";
754	    $oops++;
755	}
756
757	while (/(-[a-zA-Z])\b/g && $` !~ /[\w\-]$/) {
758	    warn "$0: bad option in paragraph $. of $ARGV: ``$1'' should be [CB]<$1>\n";
759	    $oops++;
760	}
761
762	# put it back so we get the <> processed again;
763	clear_noremap(0); # 0 means leave the E's
764
765    } else {
766	# trofficate backslashes
767	s/\\/noremap('\\e')/ge;
768
769    }
770
771    # need to hide E<> first; they're processed in clear_noremap
772    s/(E<[^<>]+>)/noremap($1)/ge;
773
774
775    $maxnest = 10;
776    while ($maxnest-- && /[A-Z]</) {
777
778	# can't do C font here
779	s/([BI])<($nonest)>/font($1) . $2 . font('R')/eg;
780
781	# files and filelike refs in italics
782	s/F<($nonest)>/I<$1>/g;
783
784	# no break -- usually we want C<> for this
785	s/S<($nonest)>/nobreak($1)/eg;
786
787	# LREF: a la HREF L<show this text|man/section>
788	s:L<([^|>]+)\|[^>]+>:$1:g;
789
790	# LREF: a manpage(3f)
791	s:L<([a-zA-Z][^\s\/]+)(\([^\)]+\))?>:the I<$1>$2 manpage:g;
792
793	# LREF: an =item on another manpage
794	s{
795	    L<
796		([^/]+)
797		/
798		(
799		    [:\w]+
800		    (\(\))?
801		)
802	    >
803	} {the C<$2> entry in the I<$1> manpage}gx;
804
805	# LREF: an =item on this manpage
806	s{
807	   ((?:
808	    L<
809		/
810		(
811		    [:\w]+
812		    (\(\))?
813		)
814	    >
815	    (,?\s+(and\s+)?)?
816	  )+)
817	} { internal_lrefs($1) }gex;
818
819	# LREF: a =head2 (head1?), maybe on a manpage, maybe right here
820	# the "func" can disambiguate
821	s{
822	    L<
823		(?:
824		    ([a-zA-Z]\S+?) /
825		)?
826		"?(.*?)"?
827	    >
828	}{
829	    do {
830		$1 	# if no $1, assume it means on this page.
831		    ?  "the section on I<$2> in the I<$1> manpage"
832		    :  "the section on I<$2>"
833	    }
834	}gesx; # s in case it goes over multiple lines, so . matches \n
835
836	s/Z<>/\\&/g;
837
838	# comes last because not subject to reprocessing
839	s/C<($nonest)>/noremap("${CFont_embed}${1}\\fR")/eg;
840    }
841
842    if (s/^=//) {
843	$needspace = 0;		# Assume this.
844
845	s/\n/ /g;
846
847	($Cmd, $_) = split(' ', $_, 2);
848
849	$dotlevel = 1;
850	if ($Cmd eq 'head1') {
851	   $dotlevel = 1;
852	}
853	elsif ($Cmd eq 'head2') {
854	   $dotlevel = 1;
855	}
856	elsif ($Cmd eq 'item') {
857	   $dotlevel = 2;
858	}
859
860	if (defined $_) {
861	    &escapes($dotlevel);
862	    s/"/""/g;
863	}
864
865	clear_noremap(1);
866
867	if ($Cmd eq 'cut') {
868	    $cutting = 1;
869	}
870	elsif ($Cmd eq 'head1') {
871	    s/\s+$//;
872	    delete $wanna_see{$_} if exists $wanna_see{$_};
873	    print qq{.SH "$_"\n};
874      push(@Indices, qq{.IX Header "$_"\n});
875	}
876	elsif ($Cmd eq 'head2') {
877	    print qq{.Sh "$_"\n};
878      push(@Indices, qq{.IX Subsection "$_"\n});
879	}
880	elsif ($Cmd eq 'over') {
881	    push(@indent,$indent);
882	    $indent += ($_ + 0) || 5;
883	}
884	elsif ($Cmd eq 'back') {
885	    $indent = pop(@indent);
886	    warn "$0: Unmatched =back in paragraph $. of $ARGV\n" unless defined $indent;
887	    $needspace = 1;
888	}
889	elsif ($Cmd eq 'item') {
890	    s/^\*( |$)/\\(bu$1/g;
891	    # if you know how to get ":s please do
892	    s/\\\*\(L"([^"]+?)\\\*\(R"/'$1'/g;
893	    s/\\\*\(L"([^"]+?)""/'$1'/g;
894	    s/[^"]""([^"]+?)""[^"]/'$1'/g;
895	    # here do something about the $" in perlvar?
896	    print STDOUT qq{.Ip "$_" $indent\n};
897      push(@Indices, qq{.IX Item "$_"\n});
898	}
899	elsif ($Cmd eq 'pod') {
900	    # this is just a comment
901	}
902	else {
903	    warn "$0: Unrecognized pod directive in paragraph $. of $ARGV: $Cmd\n";
904	}
905    }
906    else {
907	if ($needspace) {
908	    &makespace;
909	}
910	&escapes(0);
911	clear_noremap(1);
912	print $_, "\n";
913	$needspace = 1;
914    }
915}
916
917print <<"END";
918
919.rn }` ''
920END
921
922if (%wanna_see && !$lax) {
923    @missing = keys %wanna_see;
924    warn "$0: $Filename is missing required section"
925	.  (@missing > 1 && "s")
926	.  ": @missing\n";
927    $oops++;
928}
929
930foreach (@Indices) { print "$_\n"; }
931
932exit;
933#exit ($oops != 0);
934
935#########################################################################
936
937sub nobreak {
938    my $string = shift;
939    $string =~ s/ /\\ /g;
940    $string;
941}
942
943sub escapes {
944    my $indot = shift;
945
946    s/X<(.*?)>/mkindex($1)/ge;
947
948    # translate the minus in foo-bar into foo\-bar for roff
949    s/([^0-9a-z-])-([^-])/$1\\-$2/g;
950
951    # make -- into the string version \*(-- (defined above)
952    s/\b--\b/\\*(--/g;
953    s/"--([^"])/"\\*(--$1/g;  # should be a better way
954    s/([^"])--"/$1\\*(--"/g;
955
956    # fix up quotes; this is somewhat tricky
957    my $dotmacroL = 'L';
958    my $dotmacroR = 'R';
959    if ( $indot == 1 ) {
960	$dotmacroL = 'M';
961	$dotmacroR = 'S';
962    }
963    elsif ( $indot >= 2 ) {
964	$dotmacroL = 'N';
965	$dotmacroR = 'T';
966    }
967    if (!/""/) {
968	s/(^|\s)(['"])/noremap("$1\\*($dotmacroL$2")/ge;
969	s/(['"])($|[\-\s,;\\!?.])/noremap("\\*($dotmacroR$1$2")/ge;
970    }
971
972    #s/(?!")(?:.)--(?!")(?:.)/\\*(--/g;
973    #s/(?:(?!")(?:.)--(?:"))|(?:(?:")--(?!")(?:.))/\\*(--/g;
974
975
976    # make sure that func() keeps a bit a space tween the parens
977    ### s/\b\(\)/\\|()/g;
978    ### s/\b\(\)/(\\|)/g;
979
980    # make C++ into \*C+, which is a squinched version (defined above)
981    s/\bC\+\+/\\*(C+/g;
982
983    # make double underbars have a little tiny space between them
984    s/__/_\\|_/g;
985
986    # PI goes to \*(PI (defined above)
987    s/\bPI\b/noremap('\\*(PI')/ge;
988
989    # make all caps a teeny bit smaller, but don't muck with embedded code literals
990    my $hidCFont = font('C');
991    if ($Cmd !~ /^head1/) { # SH already makes smaller
992	# /g isn't enough; 1 while or we'll be off
993
994#	1 while s{
995#	    (?!$hidCFont)(..|^.|^)
996#	    \b
997#	    (
998#		[A-Z][\/A-Z+:\-\d_$.]+
999#	    )
1000#	    (s?)
1001#	    \b
1002#	} {$1\\s-1$2\\s0}gmox;
1003
1004	1 while s{
1005	    (?!$hidCFont)(..|^.|^)
1006	    (
1007		\b[A-Z]{2,}[\/A-Z+:\-\d_\$]*\b
1008	    )
1009	} {
1010	    $1 . noremap( '\\s-1' .  $2 . '\\s0' )
1011	}egmox;
1012
1013    }
1014}
1015
1016# make troff just be normal, but make small nroff get quoted
1017# decided to just put the quotes in the text; sigh;
1018sub ccvt {
1019    local($_,$prev) = @_;
1020    noremap(qq{.CQ "$_" \n\\&});
1021}
1022
1023sub makespace {
1024    if ($indent) {
1025	print ".Sp\n";
1026    }
1027    else {
1028	print ".PP\n";
1029    }
1030}
1031
1032sub mkindex {
1033    my ($entry) = @_;
1034    my @entries = split m:\s*/\s*:, $entry;
1035    push @Indices, ".IX Xref " . join ' ', map {qq("$_")} @entries;
1036    return '';
1037}
1038
1039sub font {
1040    local($font) = shift;
1041    return '\\f' . noremap($font);
1042}
1043
1044sub noremap {
1045    local($thing_to_hide) = shift;
1046    $thing_to_hide =~ tr/\000-\177/\200-\377/;
1047    return $thing_to_hide;
1048}
1049
1050sub init_noremap {
1051	# escape high bit characters in input stream
1052	s/([\200-\377])/"E<".ord($1).">"/ge;
1053}
1054
1055sub clear_noremap {
1056    my $ready_to_print = $_[0];
1057
1058    tr/\200-\377/\000-\177/;
1059
1060    # trofficate backslashes
1061    # s/(?!\\e)(?:..|^.|^)\\/\\e/g;
1062
1063    # now for the E<>s, which have been hidden until now
1064    # otherwise the interative \w<> processing would have
1065    # been hosed by the E<gt>
1066    s {
1067	    E<
1068	    (
1069	        ( \d + )
1070	        | ( [A-Za-z]+ )
1071	    )
1072	    >
1073    } {
1074	 do {
1075	     defined $2
1076		? chr($2)
1077		:
1078	     exists $HTML_Escapes{$3}
1079		? do { $HTML_Escapes{$3} }
1080		: do {
1081		    warn "$0: Unknown escape in paragraph $. of $ARGV: ``$&''\n";
1082		    "E<$1>";
1083		}
1084	 }
1085    }egx if $ready_to_print;
1086}
1087
1088sub internal_lrefs {
1089    local($_) = shift;
1090    local $trailing_and = s/and\s+$// ? "and " : "";
1091
1092    s{L</([^>]+)>}{$1}g;
1093    my(@items) = split( /(?:,?\s+(?:and\s+)?)/ );
1094    my $retstr = "the ";
1095    my $i;
1096    for ($i = 0; $i <= $#items; $i++) {
1097	$retstr .= "C<$items[$i]>";
1098	$retstr .= ", " if @items > 2 && $i != $#items;
1099	$retstr .= " and " if $i+2 == @items;
1100    }
1101
1102    $retstr .= " entr" . ( @items > 1  ? "ies" : "y" )
1103	    .  " elsewhere in this document";
1104    # terminal space to avoid words running together (pattern used
1105    # strips terminal spaces)
1106    $retstr .= " " if length $trailing_and;
1107    $retstr .=  $trailing_and;
1108
1109    return $retstr;
1110
1111}
1112
1113BEGIN {
1114%HTML_Escapes = (
1115    'amp'	=>	'&',	#   ampersand
1116    'lt'	=>	'<',	#   left chevron, less-than
1117    'gt'	=>	'>',	#   right chevron, greater-than
1118    'quot'	=>	'"',	#   double quote
1119
1120    "Aacute"	=>	"A\\*'",	#   capital A, acute accent
1121    "aacute"	=>	"a\\*'",	#   small a, acute accent
1122    "Acirc"	=>	"A\\*^",	#   capital A, circumflex accent
1123    "acirc"	=>	"a\\*^",	#   small a, circumflex accent
1124    "AElig"	=>	'\*(AE',	#   capital AE diphthong (ligature)
1125    "aelig"	=>	'\*(ae',	#   small ae diphthong (ligature)
1126    "Agrave"	=>	"A\\*`",	#   capital A, grave accent
1127    "agrave"	=>	"A\\*`",	#   small a, grave accent
1128    "Aring"	=>	'A\\*o',	#   capital A, ring
1129    "aring"	=>	'a\\*o',	#   small a, ring
1130    "Atilde"	=>	'A\\*~',	#   capital A, tilde
1131    "atilde"	=>	'a\\*~',	#   small a, tilde
1132    "Auml"	=>	'A\\*:',	#   capital A, dieresis or umlaut mark
1133    "auml"	=>	'a\\*:',	#   small a, dieresis or umlaut mark
1134    "Ccedil"	=>	'C\\*,',	#   capital C, cedilla
1135    "ccedil"	=>	'c\\*,',	#   small c, cedilla
1136    "Eacute"	=>	"E\\*'",	#   capital E, acute accent
1137    "eacute"	=>	"e\\*'",	#   small e, acute accent
1138    "Ecirc"	=>	"E\\*^",	#   capital E, circumflex accent
1139    "ecirc"	=>	"e\\*^",	#   small e, circumflex accent
1140    "Egrave"	=>	"E\\*`",	#   capital E, grave accent
1141    "egrave"	=>	"e\\*`",	#   small e, grave accent
1142    "ETH"	=>	'\\*(D-',	#   capital Eth, Icelandic
1143    "eth"	=>	'\\*(d-',	#   small eth, Icelandic
1144    "Euml"	=>	"E\\*:",	#   capital E, dieresis or umlaut mark
1145    "euml"	=>	"e\\*:",	#   small e, dieresis or umlaut mark
1146    "Iacute"	=>	"I\\*'",	#   capital I, acute accent
1147    "iacute"	=>	"i\\*'",	#   small i, acute accent
1148    "Icirc"	=>	"I\\*^",	#   capital I, circumflex accent
1149    "icirc"	=>	"i\\*^",	#   small i, circumflex accent
1150    "Igrave"	=>	"I\\*`",	#   capital I, grave accent
1151    "igrave"	=>	"i\\*`",	#   small i, grave accent
1152    "Iuml"	=>	"I\\*:",	#   capital I, dieresis or umlaut mark
1153    "iuml"	=>	"i\\*:",	#   small i, dieresis or umlaut mark
1154    "Ntilde"	=>	'N\*~',		#   capital N, tilde
1155    "ntilde"	=>	'n\*~',		#   small n, tilde
1156    "Oacute"	=>	"O\\*'",	#   capital O, acute accent
1157    "oacute"	=>	"o\\*'",	#   small o, acute accent
1158    "Ocirc"	=>	"O\\*^",	#   capital O, circumflex accent
1159    "ocirc"	=>	"o\\*^",	#   small o, circumflex accent
1160    "Ograve"	=>	"O\\*`",	#   capital O, grave accent
1161    "ograve"	=>	"o\\*`",	#   small o, grave accent
1162    "Oslash"	=>	"O\\*/",	#   capital O, slash
1163    "oslash"	=>	"o\\*/",	#   small o, slash
1164    "Otilde"	=>	"O\\*~",	#   capital O, tilde
1165    "otilde"	=>	"o\\*~",	#   small o, tilde
1166    "Ouml"	=>	"O\\*:",	#   capital O, dieresis or umlaut mark
1167    "ouml"	=>	"o\\*:",	#   small o, dieresis or umlaut mark
1168    "szlig"	=>	'\*8',		#   small sharp s, German (sz ligature)
1169    "THORN"	=>	'\\*(Th',	#   capital THORN, Icelandic
1170    "thorn"	=>	'\\*(th',,	#   small thorn, Icelandic
1171    "Uacute"	=>	"U\\*'",	#   capital U, acute accent
1172    "uacute"	=>	"u\\*'",	#   small u, acute accent
1173    "Ucirc"	=>	"U\\*^",	#   capital U, circumflex accent
1174    "ucirc"	=>	"u\\*^",	#   small u, circumflex accent
1175    "Ugrave"	=>	"U\\*`",	#   capital U, grave accent
1176    "ugrave"	=>	"u\\*`",	#   small u, grave accent
1177    "Uuml"	=>	"U\\*:",	#   capital U, dieresis or umlaut mark
1178    "uuml"	=>	"u\\*:",	#   small u, dieresis or umlaut mark
1179    "Yacute"	=>	"Y\\*'",	#   capital Y, acute accent
1180    "yacute"	=>	"y\\*'",	#   small y, acute accent
1181    "yuml"	=>	"y\\*:",	#   small y, dieresis or umlaut mark
1182);
1183}
1184
1185