History log of /openbsd-current/usr.bin/mandoc/mdoc.c
Revision Date Author Comments
# 1.164 06-Apr-2020 schwarze

Support manual tagging of .Pp, .Bd, .D1, .Dl, .Bl, and .It.
In HTML output, improve the logic for writing inside permalinks:
skip them when there is no child content or when there is a risk
that the children might contain flow content.


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.163 31-Dec-2018 schwarze

Cleanup, no functional change:
Use the new parser flag ROFF_NOFILL in the mdoc(7) parser, too,
instead of the old MDOC_LITERAL, which was an alias for the
former MAN_LITERAL.


# 1.162 31-Dec-2018 schwarze

Cleanup, minus 15 LOC, no functional change:
Simplify the way the man(7) and mdoc(7) validators are called.
Reset the parser state with a common function before calling them.
There is no need to again reset the parser state afterwards,
the parsers are no longer used after validation.
This allows getting rid of man_node_validate() and mdoc_node_validate()
as separate functions.


# 1.161 30-Dec-2018 schwarze

Cleanup, no functional change:

The struct roff_man used to be a bad mixture of internal parser
state and public parsing results. Move the public results to the
parsing result struct roff_meta, which is already public. Move the
rest of struct roff_man to the parser-internal header roff_int.h.

Since the validators need access to the parser state, call them
from the top level parser during mparse_result() rather than from
the main programs, also reducing code duplication.

This keeps parser internal state out of the main programs (five
in mandoc portable) and out of eight formatters.


# 1.160 14-Dec-2018 schwarze

Almost mechanical diff to remove the "struct mparse *" argument
from mandoc_msg(), where it is no longer used.
While here, rename mandoc_vmsg() to mandoc_msg() and retire the
old version: There is really no point in having another function
merely to save "%s" in a few places.
Minus 140 lines of code.


# 1.159 04-Dec-2018 schwarze

Clean up the validation of .Pp, .PP, .sp, and .br. Make sure all
combinations are handled, and are handled in a systematic manner.
This resolves some erratic duplicate handling, handles a number of
missing cases, and improves diagnostics in various respects.

Move validation of .br and .sp to the roff validation module
rather than doing that twice in the mdoc and man validation modules.
Move the node relinking function to the roff library where it belongs.

In validation functions, only look at the node itself, at previous
nodes, and at descendants, not at following nodes or ancestors,
such that only nodes are inspected which are already validated.


Revision tags: OPENBSD_6_4_BASE
# 1.158 17-Aug-2018 schwarze

Remove more pointer arithmetic passing via regions outside the array
that is undefined according to the C standard. Robert Elz <kre at
munnari dot oz dot au> pointed out i wasn't quite done yet.
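A minimal sketch of the kind of problem this revision describes, not the actual mandoc code: the token numbers and the names tok_names and tok_name() are hypothetical. It shows a lookup table whose tokens start at an offset, where subtracting the offset from the array pointer is undefined, while subtracting it from the index is fine.

```c
#include <stddef.h>

/* Hypothetical token numbers: the macro tokens start at an offset. */
enum { TOK_FIRST = 100, TOK_DD = TOK_FIRST, TOK_DT, TOK_OS, TOK_MAX };

static const char *const tok_names[TOK_MAX - TOK_FIRST] = {
	"Dd", "Dt", "Os"
};

/*
 * Undefined, even if the result is only dereferenced in range:
 *	const char *const *biased = tok_names - TOK_FIRST;
 *	... biased[tok] ...
 * The intermediate pointer lies far before the start of the array.
 */

/* Defined: subtract on the integer index, then index the real array. */
const char *
tok_name(int tok)
{
	if (tok < TOK_FIRST || tok >= TOK_MAX)
		return NULL;
	return tok_names[tok - TOK_FIRST];
}
```

For example, tok_name(TOK_DT) yields "Dt" and any token outside the table yields NULL.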


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.157 11-Aug-2017 schwarze

Make the "new sentence, new line" check stricter, allowing digits
in the last two letters of the last word of the sentence.
No false positives in base or Xenocara.
Suggested by and OK jmc@.


# 1.156 17-Jun-2017 schwarze

correct handling of blank lines after \c


# 1.155 07-Jun-2017 schwarze

Also catch "new sentence, new line" if there are three blanks
between the sentences. Thomas Klausner says he has seen some
of these, and i don't see any false positives.


# 1.154 07-Jun-2017 schwarze

Make "new sentence, new line" detection stricter:
Also catch cases where the new sentence starts with a one-letter word
and the input line is broken right after that word.
Suggested by Thomas Klausner <wiz @ NetBSD>.

It's merely a three-bit diff, changing one byte from 0x34 to 0x33,
so what can possibly go wrong...
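The "three-bit" count can be checked by hand: 0x34 is 0011 0100 and 0x33 is 0011 0011, so their XOR is 0000 0111, which has three bits set. A small sketch that does the same arithmetic (the helper name bits_changed() is made up for illustration):

```c
/* Count the bits that differ between two bytes. */
int
bits_changed(unsigned char before, unsigned char after)
{
	unsigned int diff = (unsigned int)(before ^ after);
	int n = 0;

	while (diff != 0) {
		n += (int)(diff & 1u);
		diff >>= 1;
	}
	return n;
}
```

Here bits_changed(0x34, 0x33) returns 3.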


# 1.153 05-May-2017 schwarze

Move .sp to the roff modules. Enough infrastructure is in place
now that this actually saves code: -70 LOC.


# 1.152 29-Apr-2017 schwarze

Parser unification: use nice ohashes for all three request and macro tables;
no functional change, minus two source files, minus 200 lines of code.


# 1.151 24-Apr-2017 schwarze

Continue parser unification:
* Make enum rofft an internal interface as enum roff_tok in "roff.h".
* Represent mdoc and man macros in enum roff_tok.
* Make TOKEN_NONE a proper enum value and use it throughout.
* Put the prologue macros first in the macro tables.
* Unify mdoc_macroname[] and man_macroname[] into roff_name[].


Revision tags: OPENBSD_6_1_BASE
# 1.150 03-Mar-2017 schwarze

remove a few redundant conditions that jsg@ found with cppcheck


# 1.149 16-Feb-2017 schwarze

Remove the ENDBODY_NOSPACE flag, simplifying the code.

Comparing to groff output, it appears that all cases where it was used
and made a difference actually require the opposite, ENDBODY_SPACE.

I have no idea why i added it back in 2010; maybe to compensate for
some other bug that has long been fixed.


# 1.148 28-Jan-2017 schwarze

Add a warning "new sentence, new line".
This does not attempt to pinpoint each and every offender, but
instead tries very hard to avoid false positives: Currently, there
are only two false positives in the whole OpenBSD base system.
Only do this in mdoc(7), not in man(7), because manuals written
in man(7) typically have much worse problems than this.
OK jmc@ on a previous version of the patch


# 1.147 10-Jan-2017 schwarze

unify names of AST node flags; no change of cpp output


# 1.146 20-Aug-2016 schwarze

If a column list starts with implicit rows (that is, rows without .It)
and roff-level nodes (e.g. tbl or eqn) follow, don't run into an
assertion. Instead, wrap the roff-level nodes in their own row.
Issue found by tb@ with afl(1).


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.145 30-Oct-2015 schwarze

If a .Bd block has no arguments at all, drop the block and only keep
its contents. Removing a gratuitous difference to groff output
found after a related bug report from krw@.


# 1.144 20-Oct-2015 schwarze

In order to become able to generate syntax tree nodes on the roff(7)
level, validation must be separated from parsing and rewinding.
This first big step moves calling of the mdoc(7) post_*() functions
out of the parser loop into their own mdoc_validate() pass, while
using a new mdoc_state() module to make syntax tree state handling
available to both the parser loop and the validation pass.


# 1.143 12-Oct-2015 schwarze

To make the code more readable, delete 283 /* FALLTHROUGH */ comments
that were right between two adjacent case statements. Keep only
those 24 where the first case actually executes some code before
falling through to the next case.


# 1.142 06-Oct-2015 schwarze

modernize style: "return" is not a function; ok cmp(1)


Revision tags: OPENBSD_5_8_BASE
# 1.141 23-Apr-2015 schwarze

Unify mdoc_deroff() and man_deroff() into a common function deroff().
No functional change except that for mdoc(7), it now skips leading
escape sequences just like it already did for man(7).
Escape sequences rarely occur in mdoc(7) code and if they do,
skipping them is an improvement in this context.
Minus 30 lines of code.


# 1.140 23-Apr-2015 schwarze

Get rid of two empty wrapper functions. No functional change.


# 1.139 19-Apr-2015 schwarze

Unify trickier node handling functions.
* man_elem_alloc() -> roff_elem_alloc()
* man_block_alloc() -> roff_block_alloc()
The functions mdoc_elem_alloc() and mdoc_block_alloc() remain for
now because they need to do mdoc(7)-specific argument processing.


# 1.138 19-Apr-2015 schwarze

Unify some node handling functions that use TOKEN_NONE.
* mdoc_word_alloc(), man_word_alloc() -> roff_word_alloc()
* mdoc_word_append(), man_word_append() -> roff_word_append()
* mdoc_addspan(), man_addspan() -> roff_addtbl()
* mdoc_addeqn(), man_addeqn() -> roff_addeqn()
Minus 50 lines of code, no functional change.


# 1.137 19-Apr-2015 schwarze

Decouple the token code for "no request or macro" from the individual
high-level parsers to allow further unification of functions that
only need to recognize this code, but that don't care about different
high-level macrosets beyond that.


# 1.136 19-Apr-2015 schwarze

Unify node handling functions:
* node_alloc() for mdoc and man_node_alloc() -> roff_node_alloc()
* node_append() for mdoc and man_node_append() -> roff_node_append()
* mdoc_head_alloc() and man_head_alloc() -> roff_head_alloc()
* mdoc_body_alloc() and man_body_alloc() -> roff_body_alloc()
* mdoc_node_unlink() and man_node_unlink() -> roff_node_unlink()
* mdoc_node_free() and man_node_free() -> roff_node_free()
* mdoc_node_delete() and man_node_delete() -> roff_node_delete()
Minus 130 lines of code, no functional change.


# 1.135 18-Apr-2015 schwarze

Delete the wrapper functions mdoc_meta(), man_meta(), mdoc_node(),
man_node() from the mandoc(3) semi-public interface and the internal
wrapper functions print_mdoc() and print_man() from the HTML formatters.
Minus 60 lines of code, no functional change.


# 1.134 18-Apr-2015 schwarze

Unify {mdoc,man}_{alloc,reset,free}() into roff_man_{alloc,reset,free}().
Minus 80 lines of code, no functional change.
Written on the train from Koeln to Wolfsburg returning from p2k15.


# 1.133 18-Apr-2015 schwarze

Move mdoc_hash_init() and man_hash_init() to libmandoc.h
and call them from mparse_alloc() and choose_parser(),
preparing unified allocation of struct roff_man.


# 1.132 18-Apr-2015 schwarze

Profit from the unified struct roff_man and reduce the number of
arguments of mparse_result() by one. No functional change.
Written on the ICE Bruxelles-Koeln on the way back from p2k15.


# 1.131 18-Apr-2015 schwarze

Replace the structs mdoc and man by a unified struct roff_man.
Almost completely mechanical, no functional change.
Written on the train from Exeter to London returning from p2k15.


# 1.130 02-Apr-2015 schwarze

Third step towards parser unification:
Replace struct mdoc_meta and struct man_meta by a unified struct roff_meta.
Written on the train from London to Exeter on the way to p2k15.


# 1.129 02-Apr-2015 schwarze

Second step towards parser unification:
Replace struct mdoc_node and struct man_node by a unified struct roff_node.
To be able to use the tok member for both mdoc(7) and man(7) without
defining all the macros in roff.h, sacrifice a tiny bit of type safety
and make tok an int rather than an enum.
Almost mechanical, no functional change.
Written on the Eurostar from Bruxelles to London on the way to p2k15.


# 1.128 02-Apr-2015 schwarze

First step towards parser unification:
Replace enum mdoc_type and enum man_type by a unified enum roff_type.
Almost mechanical, no functional change.
Written on the ICE train from Frankfurt to Bruxelles on the way to p2k15.


Revision tags: OPENBSD_5_7_BASE
# 1.127 12-Feb-2015 schwarze

Do not confuse .Bl -column lists that just broke another block
with newly opened .Bl -column lists;
fixing an assertion failure jsg@ found with afl:
test case #481, Bl It Bl -column It Bd El text text El


# 1.126 12-Feb-2015 schwarze

Delete the mdoc_node.pending pointer and the function calculating
it, make_pending(), which was the most difficult function of the
whole mdoc(7) parser. After almost five years of maintaining this
hellhole, i just noticed the pointer isn't needed after all.

Blocks are always rewound in the reverse order they were opened;
that even holds for broken blocks. Consequently, it is sufficient
to just mark broken blocks with the flag MDOC_BROKEN and breaking
blocks with the flag MDOC_ENDED. When rewinding, instead of iterating
the pending pointers, just iterate from each broken block to its
parents, rewinding all that are MDOC_ENDED and stopping after
processing the first ancestor that is not MDOC_BROKEN. For ENDBODY
markers, use the mdoc_node.body pointer in place of the former
mdoc_node.pending.
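One possible reading of the rewinding rule above, as a sketch only: struct node, the NODE_* flag names, NODE_CLOSED, and rewind_from() are hypothetical stand-ins, and "closing" is reduced to setting a flag. Starting at a broken block, the walk goes up through the parents, closes every node that already saw its end macro, and stops after the first node that is not itself broken.

```c
#include <stddef.h>

#define NODE_BROKEN	(1 << 0)	/* stand-in for MDOC_BROKEN */
#define NODE_ENDED	(1 << 1)	/* stand-in for MDOC_ENDED */
#define NODE_CLOSED	(1 << 2)	/* hypothetical: rewinding is done */

struct node {
	struct node	*parent;
	int		 flags;
};

/*
 * Walk from n towards the root.  Close every node that is ENDED,
 * and stop after processing the first node that is not BROKEN.
 * Returns the number of nodes closed.
 */
int
rewind_from(struct node *n)
{
	int closed = 0;

	for (; n != NULL; n = n->parent) {
		if (n->flags & NODE_ENDED) {
			n->flags |= NODE_CLOSED;
			closed++;
		}
		if (!(n->flags & NODE_BROKEN))
			break;
	}
	return closed;
}
```

With a chain such as Ao > Bo > So where So is broken, Bo is both ended and broken, and Ao is only ended, a single walk from So closes Bo and Ao and never touches the root. No per-node pending pointer is needed because the parent links already encode the order in which the blocks were opened.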

This also fixes an assertion failure found by jsg@ with afl,
test case #467 (Bo Bl It Bd Bc It), where (surprise surprise)
the pending pointer got corrupted.

Improved functionality, minus one function, minus one struct field,
minus 50 lines of code.


# 1.125 05-Feb-2015 schwarze

Simplify by deleting the "lastline" member of struct mdoc_node.
Minus one struct member, minus 17 lines of code, no functional change.


# 1.124 02-Feb-2015 schwarze

Get rid of all calls to rew_sub() in blk_exp_close(); only ten calls
remain in other functions. As a bonus, this fixes an assertion failure
jsg@ found some time ago with afl (test case 982) and improves minor
details in error reporting.


# 1.123 15-Jan-2015 schwarze

Fatal errors no longer exist.
If a file can be opened, mandoc will produce some output;
at worst, the output may be almost empty.
Simplifies error handling and frees a message type for future use.


# 1.122 28-Nov-2014 schwarze

Simplify by making the eqn and tbl steering functions void;
no functional change, minus 15 lines of code.


# 1.121 28-Nov-2014 schwarze

Simplify by making the mdoc parser callbacks void, and some cleanup;
no functional change, minus 50 lines of code.


# 1.120 28-Nov-2014 schwarze

Simplify the code by making various mdoc parser helper functions void.
No functional change, minus 130 lines of code.


# 1.119 28-Nov-2014 schwarze

Simplify code by making mdoc validation handlers void.
No functional change, minus 90 lines of code.


# 1.118 19-Nov-2014 schwarze

Escape sequences terminate high-level macro names, and when doing so,
they are ignored, just in the same way as for request names
and for low-level macro names.
This also cures a warning in the pod2man(1) preamble.


# 1.117 20-Oct-2014 schwarze

correct the spacing after in-line equations
that start at the beginning of an input line
but end before the end of an input line


# 1.116 20-Oct-2014 schwarze

correct spacing before inline equations


# 1.115 16-Oct-2014 schwarze

Implement in-line equations, much needed by Xenocara manuals.
Put the steering into the roff parser rather than into the mdoc
parser such that it works for all macro languages and on both text
and macro lines.
Line breaks and blank characters generated before and after in-line
equations are not perfect yet, but let's do one thing at a time.


# 1.114 06-Sep-2014 schwarze

Simplify by handling empty request lines at the one logical place
in the roff parser instead of in three other places in other parsers.
No functional change.


# 1.113 08-Aug-2014 schwarze

Bring the handling of defective prologues even closer to groff,
in particular relaxing the distinction between prologue and body
and further improving messages.
* The last .Dd wins and the last .Os wins, even in the body.
* The last .Dt before the first body macro wins.
* Missing title in .Dt defaults to UNTITLED. Warn about it.
* Missing section in .Dt does not default to 1. But warn about it.
* Do not warn multiple times about the same mdoc(7) prologue macro.
* Warn about missing .Os.
* Incomplete .TH defaults to empty strings. Warn about it.


# 1.112 08-Aug-2014 schwarze

mention requests and macros in more messages


# 1.111 08-Aug-2014 schwarze

Simplify: replace one global flag by one local variable
and remove three unused global flags. No functional change.


Revision tags: OPENBSD_5_6_BASE
# 1.110 09-Jul-2014 schwarze

mark defos as const; nobody needs to change it,
and it is occasionally useful to be able to pass literal strings


# 1.109 07-Jul-2014 schwarze

no need to skip content before first section header


# 1.108 06-Jul-2014 schwarze

Clean up messages related to plain text and to escape sequences.
* Mention invalid escape sequences and string names, and fallbacks.
* Hierarchical naming.


# 1.107 02-Jul-2014 schwarze

Implement the obsolete macros .En .Es .Fr .Ot for backward compatibility,
since this is hardly more complicated than explicitly ignoring them
as we did in the past. Of course, do not use them!


# 1.106 01-Jul-2014 schwarze

Clean up the warnings related to document structure.
* Hierarchical naming of the related enum mandocerr items.
* Mention the offending macro, section title, or string.
While here, improve some wordings:
* Descriptive instead of imperative style.
* Uniform style for "missing" and "skipping".
* Where applicable, mention the fallback used.


# 1.105 20-Jun-2014 schwarze

Start systematic improvements of error reporting.
So far, this covers all WARNINGs related to the prologue.

1) hierarchical naming of MANDOCERR_* constants
2) mention the macro name in messages where that adds clarity
3) add one missing MANDOCERR_DATE_MISSING msg
4) fix the wording of one message related to the man(7) prologue

Started on the plane back from Ottawa.


# 1.104 25-Apr-2014 schwarze

Fix a minor optimization i broke in bsd.lv rev. 1.163 on August 20, 2010:
Do not bother looking into the hash table when the length of the macro
already tells us it's invalid. No functional change.
Noticed by jsg@, thanks!
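A sketch of the optimization, under stated assumptions: the table contents and the name macro_find() are made up, a linear scan stands in for the real hash table, and the valid name lengths (two or three characters) are an assumption about mdoc(7) macro names rather than something this log states.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, tiny macro table; the real one is a hash table. */
static const char *const macro_names[] = { "Dd", "Dt", "Os", "Sh", "Brq" };
#define MACRO_COUNT (sizeof(macro_names) / sizeof(macro_names[0]))

/* Return the table index of name, or -1 if it is not a known macro. */
int
macro_find(const char *name)
{
	size_t len = strlen(name);
	size_t i;

	/* Cheap rejection first: assumed valid lengths are 2 or 3. */
	if (len < 2 || len > 3)
		return -1;

	/* Linear scan standing in for the hash table lookup. */
	for (i = 0; i < MACRO_COUNT; i++)
		if (strcmp(name, macro_names[i]) == 0)
			return (int)i;
	return -1;
}
```

The result is the same with or without the length check, which is why the commit is "no functional change"; the check only avoids a lookup that cannot succeed.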


# 1.103 20-Apr-2014 schwarze

KNF: case (FOO): -> case FOO, remove /* LINTED */ and /* ARGSUSED */,
remove trailing whitespace and blanks before tabs, improve some indenting;
no functional change


# 1.102 30-Mar-2014 schwarze

Implement the roff(7) .ll (line length) request.
Found by naddy@ in the textproc/enchant(1) port.
Of course, do not use this in new manuals.


# 1.101 23-Mar-2014 schwarze

If an .Nd block contains macros, avoid fragmented entries in mandocdb(8),
instead use the .Nd content recursively.
Improves a couple of index entries in base.


# 1.100 21-Mar-2014 schwarze

avoid repetitive code for asprintf error handling


# 1.99 21-Mar-2014 schwarze

The files mandoc.c and mandoc.h contained both specialised low-level
functions used for multiple languages (mdoc, man, roff), for example
mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
functions. Split the auxiliaries out into their own file and header.
While here, do some #include cleanup.


Revision tags: OPENBSD_5_5_BASE
# 1.98 05-Jan-2014 schwarze

Add an option -Q (quick) to mandocdb(8)
for accelerated generation of reduced-size databases.

Implement this by allowing the parsers to optionally
abort the parse sequence after the NAME section.

While here, garbage collect the unused void *arg attribute
of struct mparse and mparse_alloc().

This reduces the processing time of mandocdb(8) on /usr/share/man
by a factor of 2 and the database size by a factor of 4.
However, it still takes 5 times the time and 6 times the space
of makewhatis(8), so more work is clearly needed.


# 1.97 30-Dec-2013 schwarze

Simplify: Remove an unused argument from the mandoc_eos() function.
No functional change.


# 1.96 24-Dec-2013 schwarze

When deciding whether two consecutive macros are on the same input line,
we have to compare the line where the first one *ends* (not where it begins)
to the line where the second one starts.
This fixes the bug that .Bk allowed output line breaks right after block
macros spanning more than one input line, even when the next macro follows
on the same line.
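The comparison can be sketched as follows; struct macro_span and follows_on_same_line() are hypothetical names, and line numbers are plain integers here.

```c
/* Hypothetical record of the input lines a macro occupies. */
struct macro_span {
	int	first_line;	/* line where the macro begins */
	int	last_line;	/* line where the macro ends */
};

/*
 * Two consecutive macros share an input line if the first one
 * ends on the line where the second one starts.
 */
int
follows_on_same_line(const struct macro_span *prev,
    const struct macro_span *next)
{
	return prev->last_line == next->first_line;
}
```

For a block spanning lines 3 to 5 followed by a macro on line 5, comparing the two starting lines (3 and 5) would wrongly report different lines, while the comparison above reports the same line.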


# 1.95 21-Oct-2013 schwarze

There are three kinds of input lines: text lines, macros taking
positional arguments (like Dt Fn Xr) and macros taking text as
arguments (like Nd Sh Em %T An). In the past, even the latter put
each word of their arguments into its own MDOC_TEXT node; instead,
concatenate arguments unless delimiters, keeps or spacing mode
prevent that. Regarding mandoc(1), this is internal refactoring,
no output change intended.

Once we will switch mandocdb(8) from DB to SQLite in the future,
this is going to be required to support search expressions crossing
word boundaries, and it will reduce both database sizes and build
times by a bit more than 5% each.


# 1.94 03-Oct-2013 schwarze

Support setting arbitrary roff(7) number registers,
preserving read support for the ".nr nS" SYNOPSIS state register;
read support for arbitrary registers is still not available.

Inspired by NetBSD roff.c rev. 1.18 (Christos Zoulas, March 21, 2013),
but implemented differently. I don't want to have yet another different
implementation of a hash table in mandoc - it would be the second one
in roff.c alone and the fifth one in mandoc grand total.
Instead, i designed and implemented roff_setreg() and roff_getreg()
to be similar to roff_setstrn() and roff_getstrn().

Once we feel the need to optimize, we can introduce one common
hash table implementation for everything in mandoc.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.93 17-Nov-2012 schwarze

Cleanup naming of local variables to make the code easier on the eye:
Settle for "struct man *man", "struct mdoc *mdoc", "struct meta *meta"
and avoid the confusing "*m" which was sometimes this, sometimes that.
No functional change.

ok kristaps@ some time ago


# 1.92 16-Nov-2012 schwarze

Fix a crash triggered by .Bl -tag .It Xo .El .Sh found by florian@.

* When allocating a body end marker, copy the pointer to the normalized
block information from the body block, avoiding the risk of subsequent
null pointer dereference.
* When inserting the body end marker into the syntax tree, do not try to
copy that pointer from the parent block, because not being a direct child
of the block it belongs to is the whole point of a body end marker.
* Even non-callable blocks (like Bd and Bl) can break other blocks;
when this happens, postpone closing them out in the usual way.


Revision tags: OPENBSD_5_2_BASE
# 1.91 18-Jul-2012 schwarze

Fix handling of paragraph macros inside lists:
* When they are trailing the last item, move them outside the list.
* When they are trailing any other non-compact item, drop them.

Improves formatting of 40 pages, e.g. grep(1), ksh(1), netstat(1),
ath(4), bsd.port.mk(5), pf.conf(5), mount(8), crypto(9).


# 1.90 18-Jul-2012 schwarze

The mdoc(7) \*(Ba predefined string actually forces roman font;
that's stupid because it may break enclosing font changes,
but let's do the same for groff bug compatibility.

--> Never use \*(Ba, use just plain "|"! <--

Also, predefined strings are already expanded by the roff(7) parser,
so the mdoc(7) parser has to look for the expanded string.

Formatting improvements in ksh(1), less(1), atan2(3),
hostapd.conf(5), snmpd.conf(5), and mknod(8).


# 1.89 16-Jul-2012 schwarze

Several -mdoc parser improvements related to vertical spacing:
* So far, .Pp and .Lp were removed before paragraph type blocks.
* Now also remove .br before paragraph type blocks.
* Treat .Lp as a paragraph like .Pp, so remove .Pp, .Lp, .br before it.
* Do not treat .sp as a paragraph, don't remove anything before it.
* After .Sh, .Ss, .Pp, and .Lp, remove .Pp, .Lp, .sp, .br, and blank lines.
* After .sp and .br, remove .br.


# 1.88 07-Jul-2012 schwarze

Support the .cc request; code by kristaps@, tests by me.
Needed for sqlite3(1) as reported by espie@.


# 1.87 24-May-2012 schwarze

Support -Ios='OpenBSD 5.1' to override uname(3) as the source of the
default value for the mdoc(7) .Os macro.
Needed for man.cgi on the OpenBSD website.

Problem with man.cgi first noticed by deraadt@;
beck@ and deraadt@ agree with the way to solve the issue.


Revision tags: OPENBSD_5_1_BASE
# 1.86 30-Sep-2011 schwarze

implement .Ap .Bd .Bo .Bq .D1 .Ic .Lp .Oo .Pf .Po .Ss .Sx .Sy .br .sp
implement .Bl -bullet
add more information to the .TH line
escape dots at the beginnings of lines
add trailing newline character at the end of the file
do not misinterpret the ROOT block as .Ap


# 1.85 18-Sep-2011 schwarze

sync to version 1.11.7 from kristaps@
main new feature: support the roff(7) .tr request
plus various bugfixes and some refactoring

regressions are so minor that it's better to get this in
and fix them in the tree


# 1.84 18-Sep-2011 schwarze

sync to version 1.11.5:
adding an implementation of the eqn(7) language
by kristaps@

So far, only .EQ/.EN blocks are handled, in-line equations are not, and
rendering is not yet very pretty, but the parser is fairly complete.


Revision tags: OPENBSD_5_0_BASE
# 1.83 24-Apr-2011 schwarze

Merge version 1.11.1:
Again lots of cleanup and maintenance work by kristaps@.
- simplify error reporting: fewer function pointers, more mandoc_[v]msg
- main: split document parsing out of main.c into read.c
- roff, mdoc, man: improved recognition of control characters
- roff: better handling of if/else stack overflows
- roff: add some predefined strings for backward compatibility
- mdoc, man: empty sections are not errors
- mdoc: move delimiter handling to libmdoc
- some header restructuring and some minor features and fixes
This merge causes two minor regressions
that i will fix in separate commits right afterwards.


# 1.82 21-Apr-2011 schwarze

Merge version 1.10.10:
lots of cleanup and maintenance work by kristaps@.
- move some main.c globals into struct curparse
- move mandoc_*alloc to mandoc.h such that all code can use them
- make mandoc_isdelim available to formatting frontends
- dissolve mdoc_strings.c, move the code where it is used
- make all error reporting functions void, their return values were useless
- and various minor cleanups and fixes


# 1.81 20-Mar-2011 schwarze

Import the foundation for eqn(7) support.
Written by kristaps@.

For now, i'm adding one line to each of the four frontends
to just pass the input text through to the output,
not yet interpreting any of the eqn keywords.


# 1.80 07-Mar-2011 schwarze

Clean up date handling,
as a first step to get rid of the frequent petty warnings in this area:
- always store dates as strings, not as seconds since the Epoch
- for input, try the three most common formats everywhere
- for unrecognized format, just pass the date through verbatim
- when there is no date at all, still use the current date
Originally triggered by a one-line patch from Tim van der Molen,
<tbvdm at xs4all dot nl>, which is included here.
Feedback and OK on manual parts from jmc@.
"please check this in" kristaps@


Revision tags: OPENBSD_4_9_BASE
# 1.79 10-Feb-2011 schwarze

Tbl code maintenance by kristaps@.
- Remember the line-number of a tbl_span, and use it in messages.
- Put *_span_alloc() functions right into the *_addspan() ones,
since these are the only places they are called from.


# 1.78 09-Jan-2011 schwarze

Make sure coding errors cannot make us miss fatal parsing errors
by assert(3)ing valid parser state in the main parsing functions;
from kristaps@.


# 1.77 04-Jan-2011 schwarze

Merge kristaps@'s cleaner tbl integration, removing mine;
there are still a few bugs, but fixing these will be easier in tree.


# 1.76 01-Jan-2011 schwarze

Clean up {mdoc,man}_{p,v}msg invocations:
Ignore the return values, they are constant anyway.
From kristaps@.


# 1.75 29-Dec-2010 schwarze

Reorg by Kristaps: In libmdoc, replace the union of pointers to structs
of macro-specific data by a pointer to a union of structs, which makes the
code simpler and more robust at the expense of a small memory overhead.
Merging was somewhat difficult because we mustn't break tbl(1) support
which the bsd.lv version does not yet have.
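A sketch of the two layouts, with made-up struct and member names (bl_data, bd_data, an_data, node_data, and the norm pointer are illustrative, not the libmdoc definitions). With a union of pointers, every allocation must match the member actually in use; with a pointer to a union of structs, one allocation of the union size serves every macro, at the cost of some unused bytes for the smaller members.

```c
#include <stdlib.h>

/* Hypothetical macro-specific data. */
struct bl_data { int type; int compact; };
struct bd_data { int type; };
struct an_data { int split; };

/* Before: a union of pointers, each needing its own sized allocation. */
union data_ptrs {
	struct bl_data	*bl;
	struct bd_data	*bd;
	struct an_data	*an;
};

/* After: a pointer to a union of structs, one allocation fits all. */
union node_data {
	struct bl_data	 bl;
	struct bd_data	 bd;
	struct an_data	 an;
};

struct node {
	union node_data	*norm;
};

/* Allocate zeroed macro data; returns 1 on success, 0 on failure. */
int
node_init(struct node *n)
{
	n->norm = calloc(1, sizeof(*n->norm));
	return n->norm != NULL;
}

void
node_free(struct node *n)
{
	free(n->norm);
	n->norm = NULL;
}
```

The memory overhead mentioned in the commit corresponds to sizeof(union node_data) being the size of the largest member, even for nodes that only need a smaller one.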


# 1.74 26-Dec-2010 schwarze

Behave more like groff (both old and new): Specifying both .%T and .%J in
an .Rs block causes the title to be quoted instead of underlined, such
that journal title and article title appear visually different.
Original diff from kristaps@, simplified by me, tweaked again by kristaps@.


# 1.73 21-Dec-2010 schwarze

Migrate .An to use a pointer to its data, like everybody else.
In preparation for a simpler ref-counted system for node data.
From kristaps@.


# 1.72 21-Dec-2010 schwarze

Vertical spacing improvements from kristaps@, small tweaks by me:
Add a "last child" member to struct mdoc_node.
Remove .Pp or .Lp if it is the first or last child of an .Sh or .Ss body.
Thus, no need to do the same in the front-ends any longer.
Tolerate some cases of .Pp inside .Bl.


# 1.71 02-Dec-2010 schwarze

Properly initialize the manual section to a default when .Dt is missing.
Without this, we died on an assertion.
Problem noted and patch provided by kristaps@.


# 1.70 01-Dec-2010 schwarze

Merge mdoc_action.c into mdoc_validate.c, because having two places to do
basically the same things just causes code duplication and confusion.
Work by kristaps@, including a few bugfixes he found during the merge,
and reapplying OpenBSD changes on top.


# 1.69 28-Nov-2010 schwarze

To avoid FATAL errors, we have been parsing and ignoring the roff
requests .am, .ami, .am1, .dei, and .rm for a long time.
Since ignoring them can (rarely) cause information loss and serious
misformatting, throw an ERROR: NOT IMPLEMENTED when finding them.
Implementing them would not be too difficult, but they are so rare
in practice that i can find better use for my time right now.

In this context,
- Put the string "NOT IMPLEMENTED" into two other error messages
as well, to distinguish them from those caused by broken input.
- Print the string "unknown macro" once, not twice in the error message
associated with MANDOCERR_MACRO, and begin printing the buffer at the
point where the unknown macro really is, not at the start of line.


# 1.68 16-Oct-2010 schwarze

Do not abort() on tbl errors, reduce the risk that tbl stuff kills a build,
and provide more useful tbl error messages in a non-intrusive way.


# 1.67 16-Oct-2010 schwarze

Support tbl(1) code embedded into mdoc(7) input files.
Very similar to what i have done in man(7) yesterday.
Allows to build cpu(4) on HPPA, wi(4), and phantasia(6).
Now we are able to build all tbl code in base.


# 1.66 27-Sep-2010 schwarze

Merge the last bits of 1.10.6 (released today), most were already in:
* ignore double-.Pp
* ignore .Pp before .Bd and .Bl (unless -compact is specified)
* avoid double blank line upon .Pp, .br and friends in literal context
* cast enums to int when passing them to exit(3) to please lint(1)
While merging, fix a regression introduced by kristaps@:
Outside literal mode, double blank lines must both be printed.
To achieve this again after kristaps@ improvements in 1.10.6,
treat such blank lines as .sp (instead of .Pp as in 1.10.5)
and drop .Pp before .sp just like dropping .Pp before .Pp.


# 1.65 20-Aug-2010 schwarze

Implement a simple, consistent user interface for error handling.
We now have sufficient practical experience to know what we want,
so this is intended to be final:
- provide -Wlevel (warning, error or fatal) to select what you care about
- provide -Wstop to stop after parsing a file with warnings you care about
- provide consistent exit status codes for those warnings you care about
- fully document what warnings, errors and fatal errors mean
- remove all other cruft from the user interface, less is more:
- remove all -f knobs along with the whole -f option
- remove the old -Werror because calling warnings "fatal" is silly
- always finish parsing each file, unless fatal errors prevent that
This commit also includes a couple of related simplifications behind
the scenes regarding error handling.
Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and
Sascha Wildner (DragonFly BSD) agree with the general direction.


# 1.64 18-Aug-2010 schwarze

Simplify and sync the code and comments for copying the macro name
in man_pmacro() and mdoc_pmacro(). In particular, no need to use
isgraph(3) here, that has already been done in main.c.
Joint work by Kristaps and myself, ok kristaps@.


Revision tags: OPENBSD_4_8_BASE
# 1.63 07-Aug-2010 schwarze

Groff allows the initial macro on a line to be delimited by a space
or by a tab; so allow the tab in mandoc, too.
Bug found by me, fix by kristaps@, "sure" deraadt@.


# 1.62 16-Jul-2010 schwarze

Text ending in a full stop, exclamation mark or question mark
should not flag the end of a sentence if:

1) The punctuation is followed by closing delimiters
and not preceded by alphanumeric characters, like in
"There is no full stop (.) in this sentence"

or

2) The punctuation is a child of a macro
and not preceded by alphanumeric characters, like in
"There is no full stop
.Pq \&.
in this sentence"

jmc@ and sobrado@ like this
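
Rule 1 above can be sketched roughly as follows. This is only an
illustration: the names ends_sentence() and is_closing_delim() and the
chosen set of closing delimiters are assumptions, not the code that was
committed.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper; the set of closing delimiters is an assumption. */
static bool
is_closing_delim(char c)
{
	return c == ')' || c == ']' || c == '"' || c == '\'';
}

/*
 * Return true if a text line should flag the end of a sentence.
 * Sketch of rule 1 only: punctuation that is followed by closing
 * delimiters counts only if an alphanumeric character precedes it.
 */
bool
ends_sentence(const char *line)
{
	size_t end = strlen(line);
	size_t pos = end;

	/* Skip closing delimiters at the end of the line. */
	while (pos > 0 && is_closing_delim(line[pos - 1]))
		pos--;

	/* The last remaining character must be '.', '!' or '?'. */
	if (pos == 0 || strchr(".!?", line[pos - 1]) == NULL)
		return false;

	/* Punctuation directly at the end of the line: end of sentence. */
	if (pos == end)
		return true;

	/* Followed by delimiters: require a preceding alphanumeric. */
	return pos >= 2 && isalnum((unsigned char)line[pos - 2]);
}
```

With this sketch, "There is no full stop (.)" is not treated as the end
of a sentence, while "(See above.)" still is.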


# 1.61 13-Jul-2010 schwarze

Merge release 1.10.4 (all code by kristaps@), providing four new features:
1) Proper .Bk support: allow output line breaks at input line breaks,
but keep input lines together in the output, finally fixing
synopses like aucat(1), mail(1) and tmux(1).
2) Mostly finished -Tps (PostScript) output.
3) Implement -Thtml output for .Nm blocks and .Bk -words.
4) Allow iterative interpolation of user-defined roff(7) strings.
Also contains some minor bugfixes and some performance improvements.


# 1.60 01-Jul-2010 schwarze

In the mdoc(7) parser, inspect roff registers early such that all parts
of the parser can use the resulting cues. In particular, this allows
to use .nr nS to force SYNOPSIS-style .Nm indentation outside the
SYNOPSIS as needed by ifconfig(8).

To actually make this useable, .Pp must rewind .Nm, or the rest of the
section would end up indented. Implement a quick hack for now,
a generic solution can be designed later.

ok kristaps@ sobrado@


# 1.59 29-Jun-2010 schwarze

Support for badly nested blocks, written around the time of
the Rostock mandoc hackathon and tested and polished since,
supporting constructs like:

.Ao Bo Ac Bc (exp breaking exp)
.Aq Bo eol Bc (imp breaking exp)
.Ao Bq Ac eol (exp breaking imp)
.Ao Bo So Bc Ac Sc (double break, inner before outer)
.Ao Bo So Ac Bc Sc (double break, outer before inner)
.Ao Bo Ac So Bc Sc (broken breaker)
.Ao Bo So Bc Do Ac Sc Dc (broken double breaker)

There are still two known issues which are tricky:

1) Breaking two identical explicit blocks (Ao Bo Bo Ac or Aq Bo Bo eol)
fails outright, triggering a bogus syntax error.
2) Breaking a block by two identical explicit blocks (Ao Ao Bo Ac Ac Bc
or Ao Ao Bq Ac Ac eol) still has a minor rendering error left:
"<ao1 <ao2 [bo ac2> ac1> bc]>" should not have the final ">".

We can fix these later in the tree, let's not grow this diff too large.

"get it in" kristaps@


# 1.58 27-Jun-2010 schwarze

Full .nr nS support, unbreaking the kernel manuals.

Kristaps coded this from scratch after reading my .nr patch;
it is simpler and more powerful.

Registers live in struct regset in regs.h, struct man and struct mdoc
contain pointers to it. The nS register is cleared when parsing .Sh.
Frontends respect the MDOC_SYNPRETTY flag set in mdoc node_alloc.


# 1.57 26-Jun-2010 schwarze

merge release 1.10.2
* bug fixes:
- interaction of ASCII_HYPH with special chars (found by Ulrich Spoerlein)
- handling of roff conditionals (found by Ulrich Spoerlein)
- .Bd -offset will no more default to 6n
* maintenance:
- more caching of .Bd and .Bl arguments for efficiency
- deconstify man(7) validation routines
- add FreeBSD library names (provided by Ulrich Spoerlein)
* start PostScript font-switching


# 1.56 06-Jun-2010 schwarze

Merge bsd.lv version 1.10.1 (to be released soon).

The main step forward is that this now has *much* better .Bl -column
support, now supporting many manuals that previously errored out
without producing any output.

Other fixes include:
* do not die from multiple list types, use the first and warn
* in .Bl without a type, default to -item
* various tweaks to .Dt
* fix .In, .Fd, .Ft, .Fn and .Fo formatting
* some documentation fixes and additions
* and fix a couple of bugs reported by Ulrich Spoerlein:
* better support for roff block-end "\}" without a preceding dot
* .In must not break the line outside SYNOPSIS
* spelling in some error messages

While merging, fix one regression in .In spacing
that needs to go to bsd.lv, too.


# 1.55 26-May-2010 schwarze

When a word does not fully fit onto the output line, but it contains
at least one hyphen, we already had support for breaking the line at the
last fitting hyphen. This patch improves this functionality by only
breaking at hyphens in free-form text, and by not breaking at hyphens
* at the beginning or end of a word or
* immediately preceded or followed by another hyphen or
* escaped by a preceding backslash.

Before this patch, differences in break-at-hyphen support were one
of the major sources of noise in automatic comparisons to mdoc(7)
groff output. Now, the remaining differences are hard to find among
the noise coming from other sources.

Where there are still differences, what we do seems to be better than
what groff does, see e.g. the chio(1) exchange and position commands
for one of the now rare examples.

idea and coding by kristaps@

Besides, this was the last substantial code difference left
between bsd.lv and openbsd.org. We are now in full sync.
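
The three exclusions listed above can be sketched as a small predicate.
The function name hyphen_breakable() is hypothetical and the sketch
ignores the free-form-text condition; it is not the code by kristaps@.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the hyphen at word[i] is an acceptable place
 * to break the output line.  Hypothetical helper for illustration.
 */
bool
hyphen_breakable(const char *word, size_t i)
{
	size_t len = strlen(word);

	if (i >= len || word[i] != '-')
		return false;

	/* Not at the beginning or end of the word. */
	if (i == 0 || i + 1 == len)
		return false;

	/* Not immediately preceded or followed by another hyphen. */
	if (word[i - 1] == '-' || word[i + 1] == '-')
		return false;

	/* Not escaped by a preceding backslash. */
	if (word[i - 1] == '\\')
		return false;

	return true;
}
```

A formatter would then scan a word that does not fit and break at the
last index for which this predicate holds and the prefix still fits.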


# 1.54 23-May-2010 schwarze

Unified error and warning message system for all of mandoc,
featuring three message levels, as agreed during the mandoc hackathon:
* FATAL parser failure, cannot produce any output from this input file:
eventually, we hope to convert most of these to ERRORs.
* ERROR, meaning mandoc cannot cope fully with the input syntax and will
probably lose information or produce structurally garbled output;
it will try to produce output anyway but exit non-zero at the end,
which is eventually intended to make the ports infrastructure happy.
* WARNING, meaning you should clean up the input file, but output
is probably mostly OK, so this will not cause error-exit at the end.
This commit is mostly just converting the old system to the new one; before
the classification will become really reliable, we must check all messages.

In particular,
* set up a new central message string table in main.c
* drop the old message string tables from man.c and mdoc.c
* get rid of the piece-meal merr enums in libman and libmdoc
* reduce number of error/warning functions from 16 to 6 (still a lot...)

While here, handle a few problems more gracefully:
* allow .Rv and .Ex to work without a prior .Nm
* allow .An to ignore extra arguments
* allow undeclared columns in .Bl -column

Written by kristaps@.
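
The relation between the three levels, output, and exit status can be
sketched as below. All identifiers here (msg_level, run_state, report,
can_produce_output, exit_status) are made up for illustration, and the
concrete exit value 1 is an assumption; the text above only says
"non-zero".

```c
#include <stdbool.h>

/* Hypothetical message levels, ordered by severity. */
enum msg_level { MSG_NONE, MSG_WARNING, MSG_ERROR, MSG_FATAL };

/* Hypothetical per-file state: remembers the worst level seen. */
struct run_state {
	enum msg_level worst;
};

/* Record a message; only the most severe level is kept. */
void
report(struct run_state *st, enum msg_level lvl)
{
	if (lvl > st->worst)
		st->worst = lvl;
}

/* FATAL means no output can be produced from this input file. */
bool
can_produce_output(const struct run_state *st)
{
	return st->worst < MSG_FATAL;
}

/* WARNING does not cause an error exit; ERROR and FATAL do. */
int
exit_status(const struct run_state *st)
{
	return st->worst >= MSG_ERROR ? 1 : 0;
}
```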


# 1.53 20-May-2010 schwarze

Support nested roff instructions:
* allow roff_parseln() to be re-run
* allow roff_parseln() to manipulate the line buffer offset
* support the offset in the man and mdoc libraries
* adapt .if, .ie, .el, .ig, .am* and .de* support
* interpret some instructions even in conditional-negative context
Coded by kristaps during the last day of the mandoc hackathon.

To avoid regressions in the OpenBSD tree, commit this together
with some small local additions:
* detect roff block end "\}" even on macro lines
* actually implement the ".if n" conditional
* ignore .ds, .rm and .tr in libroff

Also back my old .if/.ie/.el-handling out of libman, reverting:
man.h 1.15 man.c 1.25 man_macro.c 1.15 man_validate.c 1.19
man_action.c 1.15 man_term.c 1.28 man_html.c 1.9.


# 1.52 16-May-2010 schwarze

Rewrite the main mdoc text parser, mdoc_ptext()
to make it easier to understand and to fix various bugs:
* strip white space from the end of MDOC_TEXT elements in literal mode
* in literal mode, a line may be blank even when containing tabs
* escaped backslashes do not escape following characters
ok kristaps@


# 1.51 16-May-2010 schwarze

allow the single quote as a control character in place of the dot
at all relevant places;
from kristaps@


# 1.50 15-May-2010 schwarze

allow non-numeric manual sections in -mdoc;
while here, allow LIBRARY in section 9;
by kristaps@


# 1.49 15-May-2010 schwarze

various improvements regarding errors and warnings from Joerg Sonnenberger:
* If the last -column .Bl isn't specified, it is auto-sized.
* An invalid .St argument should be a warning, not an error.
Just put the argument into the output.
* An invalid .At argument should be a warning, not an error.
Just print the argument, like new groff does.
* Remove warnings concerning manual section (like 1, 6, 8).
It was only used for .Ex and not really useful.
* Remove warnings concerning page section (like SYNOPSIS).
These were only used for .Fd and .Lb and not really useful.


# 1.48 14-May-2010 schwarze

Integrate kristaps@'s end-of-sentence (EOS) framework
which is simpler and more powerful than mine, and remove mine.

* man(7) now has EOS handling, too
* put EOS detection into its own function in libmandoc
* use node and termp flags to communicate the EOS condition
* no more EOS pseudo-macro
* no more non-printable EOS marker character on the formatter level

This slightly breaks EOS detection after trailing punctuation
in mdoc(7) macros, but that will be restored soon.


# 1.47 14-May-2010 schwarze

Merge 1.9.25, keeping local patches;
this does not merge kristaps' end-of-sentences handling yet,
i will check that separately. This one includes:
* handle \*(Ba as a delimiter
* introduce ARGS_PEND for .Bl -column .It end-of-line special casing
* section ordering: expect EXIT STATUS at the right place
* line break fixes in SYNOPSIS
* allow literal contexts to have arbitrary line lengths
* the input file column number can not be used to identify the beginning
of a line because white space is allowed after the initial '.'
* proper leading spaces in -man -Tascii mode
* do not let Lb break lines in -mdoc -Thtml LIBRARY


# 1.46 14-May-2010 schwarze

merge 1.9.24, keeping local patches; some changes:
* preserve multiple consecutive space characters in input
* do not restrict .Cd and .Rv to certain sections (requested by Joerg)
* do not run lookup() on quoted words
* enum return types for mdoc_args and mdoc_argv
* fix auto-closing of LINK tag in -Txhtml (from Daniel Friesel)
* various lint and manual fixes


# 1.45 08-May-2010 schwarze

merge bsd.lv rev 1.123:
sync mdoc.c's static function names with man.c


# 1.44 08-May-2010 schwarze

handle text lines beginning with \." as comments, like groff does,
even though this is not correct comment syntax (so warn, too)
reported by Claus Assmann on misc@, fix by kristaps@


# 1.43 04-May-2010 schwarze

end-of-sentence markers at the end of .Fn argument lists
ruin indentation of the next line in the SYNOPSIS section;
bug found by jacekm@ in err(3)


# 1.42 27-Apr-2010 schwarze

Fix a subtle bug noticed by naddy@ in pftop(8), thanks!

When converting blank lines to .Pp outside literal context,
it could happen that the following node ended up as a child
of the .Pp element, but it must always be a sibling.


# 1.41 22-Apr-2010 schwarze

Fix a segfault reported by nicm@, introduced in rev. 1.38.
When finding a blank line, trying to parse it is a bad idea.
Instead, after adding .Pp to the AST, just return from parsetext().


# 1.40 07-Apr-2010 schwarze

Merge the good parts of 1.9.23,
avoid the bad parts of 1.9.23, and keep local patches.

Input in general:
* Basic handling of roff-style font escapes \f, \F.
* Quoted punctuation does not count as punctuation.

mdoc(7) parser:
* Make .Pf callable; noted by Claus Assmann.
* Let .Bd and .Bl ignore unknown arguments; noted by deraadt@.
* Do not warn when .Er is used outside certain sections.
* Replace mdoc_node_free[list] by mdoc_node_delete.
* Replace #define by enum for rew*() return values.

man(7) parser:
* When .TH is missing, use default section and date.

Output in general:
* Curly braces do not count as punctuation.
* No space after .Fl w/o args when a macro follows on the same line.

HTML output:
* Unify PAIR_*_INIT macros, introduce new PAIR_ID_INIT().
* Print whitespace after, not before .Vt .Fn .Ft .Fo.

Checked that all manuals in base still build.


# 1.39 04-Apr-2010 schwarze

When the prologue lacks required information, do not error out,
but warn, set up some default values, and prod on.
Unbreaking the ports build for textproc/sgmlformat;
reported by naddy@, thanks.


# 1.38 03-Apr-2010 schwarze

* outside literal context in mdoc(7), handle blank lines like .Pp
* a missing NAME section in mdoc(7) need not be fatal

ok deraadt@


# 1.37 02-Apr-2010 schwarze

merge 1.9.22, keeping local patches
* convert mdoc tokens from #define to enum
* fix a segfault with .Xo/.Xc in explicit blocks
* Thorn is \*(Th, not \*(TH; noticed by Joerg Sonnenberger


# 1.36 25-Mar-2010 schwarze

fix a stupid out-of-bounds read access introduced in the previous
revision, in the code searching for the end of a sentence


Revision tags: OPENBSD_4_7_BASE
# 1.35 02-Mar-2010 schwarze

Proper inter-sentence spacing for mdoc(7).
When a text line or a non-block macro line in the source code ends
in any of ".!?", consider that an end of sentence (EOS).
This makes Jason's rule "new sentence, new line" even more important.
Let the parser detect the EOS and insert a token into the AST.
Let the -Tascii frontend render the EOS token as a double space before
the next word.


# 1.34 18-Feb-2010 schwarze

sync to release 1.9.15:
* corrected .Vt handling (spotted by Joerg Sonnenberger)
* corrected .Xr argument handling (based on my patch)
* removed \\ escape sequence (because it is for low-level roff only)
* warn about trailing whitespace (suggested by jmc@)
* -Txhtml support
* and some general cleanup and doc improvements


# 1.33 02-Jan-2010 schwarze

complete the sync to 1.9.15-pre2: mostly minor fixes
* bugfix: do not restore TERMP flags when leaving lists, just reset them
* and a few HTML fixes
* clarity: return width from a2width, not width+2, and adapt to it
* manual: document .Bl and .Fl
* portability: no need to escape '%' in macro names


# 1.32 22-Dec-2009 schwarze

sync to 1.9.12, mostly portability and refactoring:

correctness/functionality:
- bugfix: do not die when overstep hits the right margin
- new option: -fign-escape
- and various HTML features

portability:
- replace bzero(3) by memset(3), which is ANSI C
- replace err(3)/warn(3) by perror(3)/exit(3), which is ANSI C
- use argv[0] instead of __progname
- add time.h to various files for FreeBSD compilation

simplicity:
- do not allocate header/footer data dynamically in *_term.c
- provide and use malloc frontends that error out on failure

for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/


# 1.31 27-Oct-2009 schwarze

sync to 1.9.11: adapt printing of dates to groff conventions,
NetBSD portability fixes and some minor bugfixes and feature enhancements;
also checked that my hyphenation code still works on top of this


# 1.30 21-Oct-2009 schwarze

sync to 1.9.9, featuring:
* -Thtml output mode
* roff scaling units
* and some minor fixes
for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/


# 1.29 19-Oct-2009 schwarze

sync to 1.9.6: multiple improvements to references (.Rs)
* validate and order .Rs child nodes
* underline book title (.%B) and issuer (.%I)
* enclose title of article (.%T) in quotes
* avoid calling mdoc_verr directly, use a proper error code instead


# 1.28 19-Oct-2009 schwarze

sync to 1.9.6: u_char lives in <sys/types.h>
noticed by uqs at spoerlein dot net on FreeBSD,
where <stdlib.h> does not include <sys/types.h>


# 1.27 21-Sep-2009 schwarze

sync to 1.9.5: lookup hashes are now static tables
shortening the code, and, according to kristaps@, speeding it up


# 1.26 18-Sep-2009 schwarze

sync to 1.9.2: non-printable characters in macro names are errors;
from joerg at netbsd dot org


# 1.25 22-Aug-2009 schwarze

sync to 1.9.1: set mdoc_next flags in mdoc_*_alloc routines, where they belong


# 1.24 22-Aug-2009 schwarze

sync to 1.9.0: polishing the core code of mdoc macro handling
1) If a macro is not parsed, do not parse it. Of course, without
parsing it, we cannot produce "macro-like parameter" warnings,
but these were useless anyway.
2) If a macro is not callable, do not print a useless warning when
it occurs as a parameter, just display the raw characters.
3) Below .Bl -column, check whether macros are callable.
4) Like groff, allow whitespace after the initial dot on macro lines.


# 1.23 22-Aug-2009 schwarze

sync to 1.8.5: Error string is now file:line:col: message.
Fixed column reporting (off by one).
Use fprintf instead of warnx for parse errors (like cc).


# 1.22 26-Jul-2009 schwarze

sync to 1.8.1: rewrite quoted literal handling correctly,
rewrite TABSEP handling in a simpler way,
and retire ECOLEMPTY, ARGS_QUOTED and ARGS_ARGVLIKE


# 1.21 26-Jul-2009 schwarze

sync to 1.8.1: removed excessively verbose EARGVPARM warning


# 1.20 26-Jul-2009 schwarze

sync to 1.8.1: support .br and .sp


# 1.19 26-Jul-2009 schwarze

sync to 1.8.1: libmdoc now breaks up free-form lines into tokens;
will simplify LITERAL mode in front-end


# 1.18 18-Jul-2009 schwarze

sync to 1.8.0: move mdoc_a2att, mdoc_a2st, and mdoc_a2lib to libmdoc


# 1.17 12-Jul-2009 schwarze

sync to 1.7.23: pass warning code to mdoc_pwarn() instead of warning message
define additional warning macro mdoc_nwarn()
remove obsolete warning functions mdoc_warn(), pwarn(), vwarn(), nwarn()
remove various now unused "enum mdoc_warn" and "enum mwarn"


# 1.16 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_perr() instead of error string
and use the so improved mdoc_nerr() at many places;
get rid of now unused static functions perr()


# 1.15 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_nerr() instead of error string
and use the so improved mdoc_nerr() at many places


# 1.14 12-Jul-2009 schwarze

sync to 1.7.23: unify the various "enum merr" into libman.h and libmdoc.h,
use it as a new argument to mdoc_err(), the same way as for man_err(),
and use string tables instead of switch statements to select error messages


# 1.13 12-Jul-2009 schwarze

sync to 1.7.23: third step to get rid of enum mdoc_warn:
mdoc_verr is not using enum mdoc_warn, so use it at a few more places


# 1.12 12-Jul-2009 schwarze

sync to 1.7.23: second step to get rid of enum mdoc_warn:
remove type from mdoc_vwarn arguments, and use this function where appropriate


# 1.11 12-Jul-2009 schwarze

sync to 1.7.23: first step to get rid of enum mdoc_warn:
unify manwarn() and mdocwarn() into mwarn()


Revision tags: OPENBSD_4_6_BASE
# 1.10 23-Jun-2009 schwarze

sync to 1.7.20: like for the -man case, add an nchild counter to the -mdoc
nodes, simplifying the validation code; no functional change


# 1.9 19-Jun-2009 schwarze

sync to 1.7.19: more elegant section handling


# 1.8 18-Jun-2009 schwarze

sync to 1.7.19: comment cleanup; no functional change


# 1.7 18-Jun-2009 schwarze

sync to 1.7.19: improved comment handling


# 1.6 18-Jun-2009 schwarze

complete sync to 1.7.17: garbage collect unused functions
mdoc_msg, mdoc_pmsg, mdoc_vmsg, and mdoc_nwarn


# 1.5 15-Jun-2009 schwarze

bring back miod@'s "real functions" patch (rev. 1.2)
which was erroneously backed out in rev. 1.4, sorry;
ok kristaps@


# 1.4 15-Jun-2009 schwarze

sync to 1.7.16:
reduce code duplication in warning and error reporting functions
while here, garbage collect three unused function prototypes


# 1.3 14-Jun-2009 schwarze

sync to 1.7.16: comments, whitespace and spelling fixes; no functional change


# 1.2 15-Apr-2009 miod

Replace variadic macros with real functions, so that this compiles on
platforms still using gcc 2.
ok deraadt@


# 1.1 06-Apr-2009 kristaps

Initial check-in of mandoc for formatting manuals. ok deraadt@


# 1.159 04-Dec-2018 schwarze

Clean up the validation of .Pp, .PP, .sp, and .br. Make sure all
combinations are handled, and are handled in a systematic manner.
This resolves some erratic duplicate handling, handles a number of
missing cases, and improves diagnostics in various respects.

Move validation of .br and .sp to the roff validation module
rather than doing that twice in the mdoc and man validation modules.
Move the node relinking function to the roff library where it belongs.

In validation functions, only look at the node itself, at previous
nodes, and at descendants, not at following nodes or ancestors,
such that only nodes are inspected which are already validated.


Revision tags: OPENBSD_6_4_BASE
# 1.158 17-Aug-2018 schwarze

Remove more pointer arithmetic passing via regions outside the array
that is undefined according to the C standard. Robert Elz <kre at
munnari dot oz dot au> pointed out i wasn't quite done yet.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.157 11-Aug-2017 schwarze

Make the "new sentence, new line" check stricter, allowing digits
in the last two letters of the last word of the sentence.
No false positives in base or Xenocara.
Suggested by and OK jmc@.


# 1.156 17-Jun-2017 schwarze

correct handling of blank lines after \c


# 1.155 07-Jun-2017 schwarze

Also catch "new sentence, new line" if there are three blanks
between the sentences. Thomas Klausner says he has seen some
of these, and i don't see any false positives.


# 1.154 07-Jun-2017 schwarze

Make "new sentence, new line" detection stricter:
Also catch cases where the new sentence starts with a one-letter word
and the input line is broken right after that word.
Suggested by Thomas Klausner <wiz @ NetBSD>.

It's merely a three-bit diff, changing one byte from 0x34 to 0x33,
so what can possibly go wrong...
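
The "three-bit" remark can be checked by counting the bits in which
0x34 and 0x33 differ. The helper name bit_diff() is hypothetical and
only serves this illustration.

```c
/*
 * Count the bit positions in which two bytes differ.
 */
int
bit_diff(unsigned char a, unsigned char b)
{
	unsigned char x = a ^ b;	/* set bits mark differences */
	int n = 0;

	while (x != 0) {
		n += x & 1;
		x >>= 1;
	}
	return n;
}
```

Since 0x34 is 00110100 and 0x33 is 00110011 in binary, the two bytes
differ in exactly the three lowest bits.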


# 1.153 05-May-2017 schwarze

Move .sp to the roff modules. Enough infrastructure is in place
now that this actually saves code: -70 LOC.


# 1.152 29-Apr-2017 schwarze

Parser unification: use nice ohashes for all three request and macro tables;
no functional change, minus two source files, minus 200 lines of code.


# 1.151 24-Apr-2017 schwarze

Continue parser unification:
* Make enum rofft an internal interface as enum roff_tok in "roff.h".
* Represent mdoc and man macros in enum roff_tok.
* Make TOKEN_NONE a proper enum value and use it throughout.
* Put the prologue macros first in the macro tables.
* Unify mdoc_macroname[] and man_macroname[] into roff_name[].


Revision tags: OPENBSD_6_1_BASE
# 1.150 03-Mar-2017 schwarze

remove a few redundant conditions that jsg@ found with cppcheck


# 1.149 16-Feb-2017 schwarze

Remove the ENDBODY_NOSPACE flag, simplifying the code.

Comparing to groff output, it appears that all cases where it was used
and made a difference actually require the opposite, ENDBODY_SPACE.

I have no idea why i added it back in 2010; maybe to compensate for
some other bug that has long been fixed.


# 1.148 28-Jan-2017 schwarze

Add a warning "new sentence, new line".
This does not attempt to pinpoint each and every offender, but
instead tries very hard to avoid false positives: Currently, there
are only two false positives in the whole OpenBSD base system.
Only do this in mdoc(7), not in man(7), because manuals written
in man(7) typically have much worse problems than this.
OK jmc@ on a previous version of the patch


# 1.147 10-Jan-2017 schwarze

unify names of AST node flags; no change of cpp output


# 1.146 20-Aug-2016 schwarze

If a column list starts with implicit rows (that is, rows without .It)
and roff-level nodes (e.g. tbl or eqn) follow, don't run into an
assertion. Instead, wrap the roff-level nodes in their own row.
Issue found by tb@ with afl(1).


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.145 30-Oct-2015 schwarze

If a .Bd block has no arguments at all, drop the block and only keep
its contents. Removing a gratuitous difference to groff output
found after a related bug report from krw@.


# 1.144 20-Oct-2015 schwarze

In order to become able to generate syntax tree nodes on the roff(7)
level, validation must be separated from parsing and rewinding.
This first big step moves calling of the mdoc(7) post_*() functions
out of the parser loop into their own mdoc_validate() pass, while
using a new mdoc_state() module to make syntax tree state handling
available to both the parser loop and the validation pass.


# 1.143 12-Oct-2015 schwarze

To make the code more readable, delete 283 /* FALLTHROUGH */ comments
that were right between two adjacent case statements. Keep only
those 24 where the first case actually executes some code before
falling through to the next case.


# 1.142 06-Oct-2015 schwarze

modernize style: "return" is not a function; ok cmp(1)


Revision tags: OPENBSD_5_8_BASE
# 1.141 23-Apr-2015 schwarze

Unify mdoc_deroff() and man_deroff() into a common function deroff().
No functional change except that for mdoc(7), it now skips leading
escape sequences just like it already did for man(7).
Escape sequences rarely occur in mdoc(7) code and if they do,
skipping them is an improvement in this context.
Minus 30 lines of code.


# 1.140 23-Apr-2015 schwarze

Get rid of two empty wrapper functions. No functional change.


# 1.139 19-Apr-2015 schwarze

Unify trickier node handling functions.
* man_elem_alloc() -> roff_elem_alloc()
* man_block_alloc() -> roff_block_alloc()
The functions mdoc_elem_alloc() and mdoc_block_alloc() remain for
now because they need to do mdoc(7)-specific argument processing.


# 1.138 19-Apr-2015 schwarze

Unify some node handling functions that use TOKEN_NONE.
* mdoc_word_alloc(), man_word_alloc() -> roff_word_alloc()
* mdoc_word_append(), man_word_append() -> roff_word_append()
* mdoc_addspan(), man_addspan() -> roff_addtbl()
* mdoc_addeqn(), man_addeqn() -> roff_addeqn()
Minus 50 lines of code, no functional change.


# 1.137 19-Apr-2015 schwarze

Decouple the token code for "no request or macro" from the individual
high-level parsers to allow further unification of functions that
only need to recognize this code, but that don't care about different
high-level macrosets beyond that.


# 1.136 19-Apr-2015 schwarze

Unify node handling functions:
* node_alloc() for mdoc and man_node_alloc() -> roff_node_alloc()
* node_append() for mdoc and man_node_append() -> roff_node_append()
* mdoc_head_alloc() and man_head_alloc() -> roff_head_alloc()
* mdoc_body_alloc() and man_body_alloc() -> roff_body_alloc()
* mdoc_node_unlink() and man_node_unlink() -> roff_node_unlink()
* mdoc_node_free() and man_node_free() -> roff_node_free()
* mdoc_node_delete() and man_node_delete() -> roff_node_delete()
Minus 130 lines of code, no functional change.


# 1.135 18-Apr-2015 schwarze

Delete the wrapper functions mdoc_meta(), man_meta(), mdoc_node(),
man_node() from the mandoc(3) semi-public interface and the internal
wrapper functions print_mdoc() and print_man() from the HTML formatters.
Minus 60 lines of code, no functional change.


# 1.134 18-Apr-2015 schwarze

Unify {mdoc,man}_{alloc,reset,free}() into roff_man_{alloc,reset,free}().
Minus 80 lines of code, no functional change.
Written on the train from Koeln to Wolfsburg returning from p2k15.


# 1.133 18-Apr-2015 schwarze

Move mdoc_hash_init() and man_hash_init() to libmandoc.h
and call them from mparse_alloc() and choose_parser(),
preparing unified allocation of struct roff_man.


# 1.132 18-Apr-2015 schwarze

Profit from the unified struct roff_man and reduce the number of
arguments of mparse_result() by one. No functional change.
Written on the ICE Bruxelles-Koeln on the way back from p2k15.


# 1.131 18-Apr-2015 schwarze

Replace the structs mdoc and man by a unified struct roff_man.
Almost completely mechanical, no functional change.
Written on the train from Exeter to London returning from p2k15.


# 1.130 02-Apr-2015 schwarze

Third step towards parser unification:
Replace struct mdoc_meta and struct man_meta by a unified struct roff_meta.
Written on the train from London to Exeter on the way to p2k15.


# 1.129 02-Apr-2015 schwarze

Second step towards parser unification:
Replace struct mdoc_node and struct man_node by a unified struct roff_node.
To be able to use the tok member for both mdoc(7) and man(7) without
defining all the macros in roff.h, sacrifice a tiny bit of type safety
and make tok an int rather than an enum.
Almost mechanical, no functional change.
Written on the Eurostar from Bruxelles to London on the way to p2k15.


# 1.128 02-Apr-2015 schwarze

First step towards parser unification:
Replace enum mdoc_type and enum man_type by a unified enum roff_type.
Almost mechanical, no functional change.
Written on the ICE train from Frankfurt to Bruxelles on the way to p2k15.


Revision tags: OPENBSD_5_7_BASE
# 1.127 12-Feb-2015 schwarze

Do not confuse .Bl -column lists that just broke another block
with newly opened .Bl -column lists;
fixing an assertion failure jsg@ found with afl:
test case #481, Bl It Bl -column It Bd El text text El


# 1.126 12-Feb-2015 schwarze

Delete the mdoc_node.pending pointer and the function calculating
it, make_pending(), which was the most difficult function of the
whole mdoc(7) parser. After almost five years of maintaining this
hellhole, i just noticed the pointer isn't needed after all.

Blocks are always rewound in the reverse order they were opened;
that even holds for broken blocks. Consequently, it is sufficient
to just mark broken blocks with the flag MDOC_BROKEN and breaking
blocks with the flag MDOC_ENDED. When rewinding, instead of iterating
the pending pointers, just iterate from each broken block to its
parents, rewinding all that are MDOC_ENDED and stopping after
processing the first ancestor that is not MDOC_BROKEN. For ENDBODY
markers, use the mdoc_node.body pointer in place of the former
mdoc_node.pending.

This also fixes an assertion failure found by jsg@ with afl,
test case #467 (Bo Bl It Bd Bc It), where (surprise surprise)
the pending pointer got corrupted.

Improved functionality, minus one function, minus one struct field,
minus 50 lines of code.
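
One reading of the rewinding rule described above, as a minimal sketch.
The struct, the flag names NODE_BROKEN, NODE_ENDED and NODE_CLOSED, and
the function rewind_broken() are hypothetical stand-ins; in particular,
"rewinding" is reduced here to setting a flag, which is an assumption
made to keep the example short.

```c
#include <stddef.h>

/* Hypothetical flags, loosely modelled on MDOC_BROKEN and MDOC_ENDED. */
#define NODE_BROKEN	(1 << 0)	/* block was broken by another block */
#define NODE_ENDED	(1 << 1)	/* block already saw its end marker */
#define NODE_CLOSED	(1 << 2)	/* set by this sketch when rewound */

struct node {
	struct node	*parent;
	int		 flags;
};

/*
 * Walk from a broken block to its ancestors.  Rewind every ancestor
 * that is marked ENDED, and stop after processing the first ancestor
 * that is not marked BROKEN.  Returns the number of blocks rewound.
 */
int
rewind_broken(struct node *n)
{
	struct node	*p;
	int		 count = 0;

	for (p = n->parent; p != NULL; p = p->parent) {
		if (p->flags & NODE_ENDED) {
			p->flags |= NODE_CLOSED;
			count++;
		}
		if ((p->flags & NODE_BROKEN) == 0)
			break;
	}
	return count;
}
```

Because blocks are rewound in the reverse order they were opened, a
single walk up the parent chain is enough and no separate "pending"
pointer has to be maintained.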


# 1.125 05-Feb-2015 schwarze

Simplify by deleting the "lastline" member of struct mdoc_node.
Minus one struct member, minus 17 lines of code, no functional change.


# 1.124 02-Feb-2015 schwarze

Get rid of all calls to rew_sub() in blk_exp_close(); only ten calls
remain in other functions. As a bonus, this fixes an assertion failure
jsg@ found some time ago with afl (test case 982) and improves minor
details in error reporting.


# 1.123 15-Jan-2015 schwarze

Fatal errors no longer exist.
If a file can be opened, mandoc will produce some output;
at worst, the output may be almost empty.
Simplifies error handling and frees a message type for future use.


# 1.122 28-Nov-2014 schwarze

Simplify by making the eqn and tbl steering functions void;
no functional change, minus 15 lines of code.


# 1.121 28-Nov-2014 schwarze

Simplify by making the mdoc parser callbacks void, and some cleanup;
no functional change, minus 50 lines of code.


# 1.120 28-Nov-2014 schwarze

Simplify the code by making various mdoc parser helper functions void.
No functional change, minus 130 lines of code.


# 1.119 28-Nov-2014 schwarze

Simplify code by making mdoc validation handlers void.
No functional change, minus 90 lines of code.


# 1.118 19-Nov-2014 schwarze

Escape sequences terminate high-level macro names, and when doing so,
they are ignored, just in the same way as for request names
and for low-level macro names.
This also cures a warning in the pod2man(1) preamble.


# 1.117 20-Oct-2014 schwarze

correct the spacing after in-line equations
that start at the beginning of an input line
but end before the end of an input line


# 1.116 20-Oct-2014 schwarze

correct spacing before inline equations


# 1.115 16-Oct-2014 schwarze

Implement in-line equations, much needed by Xenocara manuals.
Put the steering into the roff parser rather than into the mdoc
parser such that it works for all macro languages and on both text
and macro lines.
Line breaks and blank characters generated before and after in-line
equations are not perfect yet, but let's do one thing at a time.


# 1.114 06-Sep-2014 schwarze

Simplify by handling empty request lines at the one logical place
in the roff parser instead of in three other places in other parsers.
No functional change.


# 1.113 08-Aug-2014 schwarze

Bring the handling of defective prologues even closer to groff,
in particular relaxing the distinction between prologue and body
and further improving messages.
* The last .Dd wins and the last .Os wins, even in the body.
* The last .Dt before the first body macro wins.
* Missing title in .Dt defaults to UNTITLED. Warn about it.
* Missing section in .Dt does not default to 1. But warn about it.
* Do not warn multiple times about the same mdoc(7) prologue macro.
* Warn about missing .Os.
* Incomplete .TH defaults to empty strings. Warn about it.


# 1.112 08-Aug-2014 schwarze

mention requests and macros in more messages


# 1.111 08-Aug-2014 schwarze

Simplify: replace one global flag by one local variable
and remove three unused global flags. No functional change.


Revision tags: OPENBSD_5_6_BASE
# 1.110 09-Jul-2014 schwarze

mark defos as const; nobody needs to change it,
and it is occasionally useful to be able to pass literal strings


# 1.109 07-Jul-2014 schwarze

no need to skip content before first section header


# 1.108 06-Jul-2014 schwarze

Clean up messages related to plain text and to escape sequences.
* Mention invalid escape sequences and string names, and fallbacks.
* Hierarchical naming.


# 1.107 02-Jul-2014 schwarze

Implement the obsolete macros .En .Es .Fr .Ot for backward compatibility,
since this is hardly more complicated than explicitly ignoring them
as we did in the past. Of course, do not use them!


# 1.106 01-Jul-2014 schwarze

Clean up the warnings related to document structure.
* Hierarchical naming of the related enum mandocerr items.
* Mention the offending macro, section title, or string.
While here, improve some wordings:
* Descriptive instead of imperative style.
* Uniform style for "missing" and "skipping".
* Where applicable, mention the fallback used.


# 1.105 20-Jun-2014 schwarze

Start systematic improvements of error reporting.
So far, this covers all WARNINGs related to the prologue.

1) hierarchical naming of MANDOCERR_* constants
2) mention the macro name in messages where that adds clarity
3) add one missing MANDOCERR_DATE_MISSING msg
4) fix the wording of one message related to the man(7) prologue

Started on the plane back from Ottawa.


# 1.104 25-Apr-2014 schwarze

Fix a minor optimization i broke in bsd.lv rev. 1.163 on August 20, 2010:
Do not bother looking into the hash table when the length of the macro
already tells us it's invalid. No functional change.
Noticed by jsg@, thanks!
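
A minimal sketch of the idea, assuming that every valid mdoc(7) macro
name is two or three bytes long. The table, the counter, and the sk_*
names are made up for illustration; a linear search stands in for the
real hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_TOK_NONE (-1)

/* A tiny stand-in for the real macro table. */
static const char *const sk_names[] = { "Dd", "Dt", "Os", "Sh", "Brq" };

/* Counts how often the (comparatively expensive) table is consulted. */
int sk_table_hits;

int
sk_table_find(const char *name)
{
	size_t	 i;

	sk_table_hits++;
	for (i = 0; i < sizeof(sk_names) / sizeof(sk_names[0]); i++)
		if (strcmp(name, sk_names[i]) == 0)
			return (int)i;
	return SK_TOK_NONE;
}

int
sk_lookup(const char *name)
{
	size_t	 len;

	assert(name != NULL);
	len = strlen(name);
	/* Assumption: every valid macro name is 2 or 3 bytes long. */
	if (len < 2 || len > 3)
		return SK_TOK_NONE;
	return sk_table_find(name);
}
```

Names of the wrong length are rejected without touching the table, so
sk_table_hits only grows for plausible candidates.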


# 1.103 20-Apr-2014 schwarze

KNF: case (FOO): -> case FOO, remove /* LINTED */ and /* ARGSUSED */,
remove trailing whitespace and blanks before tabs, improve some indenting;
no functional change


# 1.102 30-Mar-2014 schwarze

Implement the roff(7) .ll (line length) request.
Found by naddy@ in the textproc/enchant(1) port.
Of course, do not use this in new manuals.


# 1.101 23-Mar-2014 schwarze

If an .Nd block contains macros, avoid fragmented entries in mandocdb(8),
instead use the .Nd content recursively.
Improves a couple of index entries in base.


# 1.100 21-Mar-2014 schwarze

avoid repetitive code for asprintf error handling


# 1.99 21-Mar-2014 schwarze

The files mandoc.c and mandoc.h contained both specialised low-level
functions used for multiple languages (mdoc, man, roff), for example
mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
functions. Split the auxiliaries out into their own file and header.
While here, do some #include cleanup.


Revision tags: OPENBSD_5_5_BASE
# 1.98 05-Jan-2014 schwarze

Add an option -Q (quick) to mandocdb(8)
for accelerated generation of reduced-size databases.

Implement this by allowing the parsers to optionally
abort the parse sequence after the NAME section.

While here, garbage collect the unused void *arg attribute
of struct mparse and mparse_alloc().

This reduces the processing time of mandocdb(8) on /usr/share/man
by a factor of 2 and the database size by a factor of 4.
However, it still takes 5 times the time and 6 times the space
of makewhatis(8), so more work is clearly needed.


# 1.97 30-Dec-2013 schwarze

Simplify: Remove an unused argument from the mandoc_eos() function.
No functional change.


# 1.96 24-Dec-2013 schwarze

When deciding whether two consecutive macros are on the same input line,
we have to compare the line where the first one *ends* (not where it begins)
to the line where the second one starts.
This fixes the bug that .Bk allowed output line breaks right after block
macros spanning more than one input line, even when the next macro follows
on the same line.
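
The comparison can be illustrated with a small sketch. The struct and
its field names are hypothetical, they are not the members of struct
mdoc_node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the input lines a macro covers. */
struct sk_span {
	int	 first_line;	/* line where the macro begins */
	int	 last_line;	/* line where the macro ends */
};

/*
 * Two consecutive macros share an input line if the first one ENDS
 * on the line where the second one begins.
 */
int
sk_same_line(const struct sk_span *prev, const struct sk_span *next)
{
	assert(prev != NULL && next != NULL);
	return prev->last_line == next->first_line;
}
```

For a block spanning lines 3 to 5 followed by a macro on line 5, this
reports "same line", whereas comparing the two first_line values would
not.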


# 1.95 21-Oct-2013 schwarze

There are three kinds of input lines: text lines, macros taking
positional arguments (like Dt Fn Xr) and macros taking text as
arguments (like Nd Sh Em %T An). In the past, even the latter put
each word of their arguments into its own MDOC_TEXT node; instead,
concatenate arguments unless delimiters, keeps or spacing mode
prevent that. Regarding mandoc(1), this is internal refactoring,
no output change intended.

Once we will switch mandocdb(8) from DB to SQLite in the future,
this is going to be required to support search expressions crossing
word boundaries, and it will reduce both database sizes and build
times by a bit more than 5% each.


# 1.94 03-Oct-2013 schwarze

Support setting arbitrary roff(7) number registers,
preserving read support for the ".nr nS" SYNOPSIS state register;
read support for arbitrary registers is still not available.

Inspired by NetBSD roff.c rev. 1.18 (Christos Zoulas, March 21, 2013),
but implemented differently. I don't want to have yet another different
implementation of a hash table in mandoc - it would be the second one
in roff.c alone and the fifth one in mandoc grand total.
Instead, i designed and implemented roff_setreg() and roff_getreg()
to be similar to roff_setstrn() and roff_getstrn().

Once we feel the need to optimize, we can introduce one common
hash table implementation for everything in mandoc.
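
A set/get pair kept in a plain linked list rather than a hash table
could look roughly like this. It is only a sketch of the approach; the
sk_* names and signatures are hypothetical and do not reproduce
roff_setreg() or roff_getreg().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One number register in a singly linked list. */
struct sk_reg {
	struct sk_reg	*next;
	char		*name;
	int		 val;
};

/* Set a register, creating it if needed; returns -1 if out of memory. */
int
sk_setreg(struct sk_reg **head, const char *name, int val)
{
	struct sk_reg	*r;
	size_t		 sz;

	assert(head != NULL && name != NULL);
	for (r = *head; r != NULL; r = r->next)
		if (strcmp(r->name, name) == 0) {
			r->val = val;
			return 0;
		}
	if ((r = malloc(sizeof(*r))) == NULL)
		return -1;
	sz = strlen(name) + 1;
	if ((r->name = malloc(sz)) == NULL) {
		free(r);
		return -1;
	}
	memcpy(r->name, name, sz);
	r->val = val;
	r->next = *head;
	*head = r;
	return 0;
}

/* Return the value of a register, or 0 if it was never set. */
int
sk_getreg(const struct sk_reg *head, const char *name)
{
	assert(name != NULL);
	for (; head != NULL; head = head->next)
		if (strcmp(head->name, name) == 0)
			return head->val;
	return 0;
}

/* Free the whole list. */
void
sk_freeregs(struct sk_reg *head)
{
	struct sk_reg	*next;

	for (; head != NULL; head = next) {
		next = head->next;
		free(head->name);
		free(head);
	}
}
```

Lookup is linear in the number of registers, which is acceptable as
long as documents set only a handful of them.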


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.93 17-Nov-2012 schwarze

Cleanup naming of local variables to make the code easier on the eye:
Settle for "struct man *man", "struct mdoc *mdoc", "struct meta *meta"
and avoid the confusing "*m" which was sometimes this, sometimes that.
No functional change.

ok kristaps@ some time ago


# 1.92 16-Nov-2012 schwarze

Fix a crash triggered by .Bl -tag .It Xo .El .Sh found by florian@.

* When allocating a body end marker, copy the pointer to the normalized
block information from the body block, avoiding the risk of subsequent
null pointer dereference.
* When inserting the body end marker into the syntax tree, do not try to
copy that pointer from the parent block, because not being a direct child
of the block it belongs to is the whole point of a body end marker.
* Even non-callable blocks (like Bd and Bl) can break other blocks;
when this happens, postpone closing them out in the usual way.


Revision tags: OPENBSD_5_2_BASE
# 1.91 18-Jul-2012 schwarze

Fix handling of paragraph macros inside lists:
* When they are trailing the last item, move them outside the list.
* When they are trailing any other non-compact item, drop them.


Improves formatting of 40 pages, e.g. grep(1), ksh(1), netstat(1),
ath(4), bsd.port.mk(5), pf.conf(5), mount(8), crypto(9).


# 1.90 18-Jul-2012 schwarze

The mdoc(7) \*(Ba predefined string actually forces roman font;
that's stupid because it may break enclosing font changes,
but let's do the same for groff bug compatibility.

--> Never use \*(Ba, use just plain "|"! <--

Also, predefined strings are already expanded by the roff(7) parser,
so the mdoc(7) parser has to look for the expanded string.

Formatting improvements in ksh(1), less(1), atan2(3),
hostapd.conf(5), snmpd.conf(5), and mknod(8).


# 1.89 16-Jul-2012 schwarze

Several -mdoc parser improvements related to vertical spacing:
* So far, .Pp and .Lp were removed before paragraph type blocks.
* Now also remove .br before paragraph type blocks.
* Treat .Lp as a paragraph like .Pp, so remove .Pp, .Lp, .br before it.
* Do not treat .sp as a paragraph, don't remove anything before it.
* After .Sh, .Ss, .Pp, and .Lp, remove .Pp, .Lp, .sp, .br, and blank lines.
* After .sp and .br, remove .br.


# 1.88 07-Jul-2012 schwarze

Support the .cc request; code by kristaps@, tests by me.
Needed for sqlite3(1) as reported by espie@.


# 1.87 24-May-2012 schwarze

Support -Ios='OpenBSD 5.1' to override uname(3) as the source of the
default value for the mdoc(7) .Os macro.
Needed for man.cgi on the OpenBSD website.

Problem with man.cgi first noticed by deraadt@;
beck@ and deraadt@ agree with the way to solve the issue.


Revision tags: OPENBSD_5_1_BASE
# 1.86 30-Sep-2011 schwarze

implement .Ap .Bd .Bo .Bq .D1 .Ic .Lp .Oo .Pf .Po .Ss .Sx .Sy .br .sp
implement .Bl -bullet
add more information to the .TH line
escape dots at the beginnings of lines
add trailing newline character at the end of the file
do not misinterpret the ROOT block as .Ap


# 1.85 18-Sep-2011 schwarze

sync to version 1.11.7 from kristaps@
main new feature: support the roff(7) .tr request
plus various bugfixes and some refactoring

regressions are so minor that it's better to get this in
and fix them in the tree


# 1.84 18-Sep-2011 schwarze

sync to version 1.11.5:
adding an implementation of the eqn(7) language
by kristaps@

So far, only .EQ/.EN blocks are handled, in-line equations are not, and
rendering is not yet very pretty, but the parser is fairly complete.


Revision tags: OPENBSD_5_0_BASE
# 1.83 24-Apr-2011 schwarze

Merge version 1.11.1:
Again lots of cleanup and maintenance work by kristaps@.
- simplify error reporting: less function pointers, more mandoc_[v]msg
- main: split document parsing out of main.c into read.c
- roff, mdoc, man: improved recognition of control characters
- roff: better handling of if/else stack overflows
- roff: add some predefined strings for backward compatibility
- mdoc, man: empty sections are not errors
- mdoc: move delimiter handling to libmdoc
- some header restructuring and some minor features and fixes
This merge causes two minor regressions
that i will fix in separate commits right afterwards.


# 1.82 21-Apr-2011 schwarze

Merge version 1.10.10:
lots of cleanup and maintenance work by kristaps@.
- move some main.c globals into struct curparse
- move mandoc_*alloc to mandoc.h such that all code can use them
- make mandoc_isdelim available to formatting frontends
- dissolve mdoc_strings.c, move the code where it is used
- make all error reporting functions void, their return values were useless
- and various minor cleanups and fixes


# 1.81 20-Mar-2011 schwarze

Import the foundation for eqn(7) support.
Written by kristaps@.

For now, i'm adding one line to each of the four frontends
to just pass the input text through to the output,
not yet interpreting any of the eqn keywords.


# 1.80 07-Mar-2011 schwarze

Clean up date handling,
as a first step to get rid of the frequent petty warnings in this area:
- always store dates as strings, not as seconds since the Epoch
- for input, try the three most common formats everywhere
- for unrecognized format, just pass the date through verbatim
- when there is no date at all, still use the current date
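
The "try several formats, else pass through" approach from the list
above might be sketched as follows. The three formats tried here
(Mdocdate keyword, "Month D, YYYY", and ISO "YYYY-MM-DD") are an
assumption about what "most common" means, the sk_* names are
hypothetical, and the fallback to the current date for an empty input
is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char *const sk_months[12] = {
	"January", "February", "March", "April", "May", "June", "July",
	"August", "September", "October", "November", "December"
};

/* Return 1..12 for a full or three-letter English month name, else 0. */
int
sk_month(const char *s)
{
	int	 i;

	for (i = 0; i < 12; i++)
		if (strcmp(s, sk_months[i]) == 0 ||
		    (strlen(s) == 3 && strncmp(s, sk_months[i], 3) == 0))
			return i + 1;
	return 0;
}

/*
 * Store the date as a string: "Month D, YYYY" if the input matches
 * one of the assumed formats, the unchanged input otherwise.
 * Returns 1 if the format was recognized, 0 if not.
 */
int
sk_normdate(const char *in, char *out, size_t outsz)
{
	char	 mon[16];
	int	 y = 0, m = 0, d = 0;

	assert(in != NULL && out != NULL && outsz > 0);
	if (sscanf(in, "$Mdocdate: %15s %d %d", mon, &d, &y) == 3 ||
	    sscanf(in, "%15s %d, %d", mon, &d, &y) == 3)
		m = sk_month(mon);
	else if (sscanf(in, "%d-%d-%d", &y, &m, &d) != 3)
		m = 0;
	if (m < 1 || m > 12 || d < 1 || d > 31) {
		/* Unrecognized: pass the date through verbatim. */
		snprintf(out, outsz, "%s", in);
		return 0;
	}
	snprintf(out, outsz, "%s %d, %d", sk_months[m - 1], d, y);
	return 1;
}
```

Because the result is always a string, an odd date in the input
produces odd output rather than a parse failure.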
Originally triggered by a one-line patch from Tim van der Molen,
<tbvdm at xs4all dot nl>, which is included here.
Feedback and OK on manual parts from jmc@.
"please check this in" kristaps@


Revision tags: OPENBSD_4_9_BASE
# 1.79 10-Feb-2011 schwarze

Tbl code maintenance by kristaps@.
- Remember the line-number of a tbl_span, and use it in messages.
- Put *_span_alloc() functions right into the *_addspan() ones,
since these are the only places they are called from.


# 1.78 09-Jan-2011 schwarze

Make sure coding errors cannot make us miss fatal parsing errors
by assert(3)ing valid parser state in the main parsing functions;
from kristaps@.


# 1.77 04-Jan-2011 schwarze

Merge kristaps@' cleaner tbl integration, removing mine;
there are still a few bugs, but fixing these will be easier in tree.


# 1.76 01-Jan-2011 schwarze

Clean up {mdoc,man}_{p,v}msg invocations:
Ignore the return values, they are constant anyway.
From kristaps@.


# 1.75 29-Dec-2010 schwarze

Reorg by Kristaps: In libmdoc, replace the union of pointers to structs
of macro-specific data by a pointer to a union of structs, which makes the
code simpler and more robust at the expense of a small memory overhead.
Merging was somewhat difficult because we mustn't break tbl(1) support
which the bsd.lv version does not yet have.
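
The two layouts can be contrasted in a short sketch. All type and
member names below are invented for illustration and are not the
declarations from mdoc.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-macro data. */
struct sk_an { int auth; };
struct sk_bd { int type; int comp; };
struct sk_bl { int type; int comp; int ncols; };

/* Before: a union of pointers, each struct allocated on its own. */
struct sk_node_old {
	union {
		struct sk_an	*An;
		struct sk_bd	*Bd;
		struct sk_bl	*Bl;
	} data;
};

/* After: one pointer to a union of structs. */
union sk_data {
	struct sk_an	 An;
	struct sk_bd	 Bd;
	struct sk_bl	 Bl;
};

struct sk_node_new {
	union sk_data	*data;
};

/* A single allocation covers whichever macro the node turns out to be. */
int
sk_node_init(struct sk_node_new *n)
{
	assert(n != NULL);
	n->data = calloc(1, sizeof(*n->data));
	return n->data == NULL ? -1 : 0;
}

void
sk_node_reset(struct sk_node_new *n)
{
	assert(n != NULL);
	free(n->data);
	n->data = NULL;
}
```

The memory overhead mentioned above comes from every allocation having
the size of the largest member, even for macros with little data.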


# 1.74 26-Dec-2010 schwarze

Behave more like groff (both old and new): Specifying both .%T and .%J in
an .Rs block causes the title to be quoted instead of underlined, such
that journal title and article title appear visually different.
Original diff from kristaps@, simplified by me, tweaked again by kristaps@.


# 1.73 21-Dec-2010 schwarze

Migrate .An to use a pointer to its data, like everybody else.
In preparation for a simpler ref-counted system for node data.
From kristaps@.


# 1.72 21-Dec-2010 schwarze

Vertical spacing improvements from kristaps@, small tweaks by me:
Add a "last child" member to struct mdoc_node.
Remove .Pp or .Lp if it is the first or last child of an .Sh or .Ss body.
Thus, no need to do the same in the front-ends any longer.
Tolerate some cases of .Pp inside .Bl.


# 1.71 02-Dec-2010 schwarze

Properly initialize the manual section to a default when .Dt is missing.
Without this, we died on an assertion.
Problem noted and patch provided by kristaps@.


# 1.70 01-Dec-2010 schwarze

Merge mdoc_action.c into mdoc_validate.c, because having two places to do
basically the same things just causes code duplication and confusion.
Work by kristaps@, including a few bugfixes he found during the merge,
and reapplying OpenBSD changes on top.


# 1.69 28-Nov-2010 schwarze

To avoid FATAL errors, we have been parsing and ignoring the roff
requests .am, .ami, .am1, .dei, and .rm for a long time.
Since ignoring them can (rarely) cause information loss and serious
misformatting, throw an ERROR: NOT IMPLEMENTED when finding them.
Implementing them would not be too difficult, but they are so rare
in practice that i can find better use for my time right now.

In this context,
- Put the string "NOT IMPLEMENTED" into two other error messages
as well, to distinguish them from those caused by broken input.
- Print the string "unknown macro" once, not twice in the error message
associated with MANDOCERR_MACRO, and begin printing the buffer at the
point where the unknown macro really is, not at the start of line.


# 1.68 16-Oct-2010 schwarze

Do not abort() on tbl errors, reduce the risk that tbl stuff kills a build,
and provide more useful tbl error messages in a non-intrusive way.


# 1.67 16-Oct-2010 schwarze

Support tbl(1) code embedded into mdoc(7) input files.
Very similar to what i have done in man(7) yesterday.
Allows to build cpu(4) on HPPA, wi(4), and phantasia(6).
Now we are able to build all tbl code in base.


# 1.66 27-Sep-2010 schwarze

Merge the last bits of 1.10.6 (released today), most were already in:
* ignore double-.Pp
* ignore .Pp before .Bd and .Bl (unless -compact is specified)
* avoid double blank line upon .Pp, .br and friends in literal context
* cast enums to int when passing them to exit(3) to please lint(1)
While merging, fix a regression introduced by kristaps@:
Outside literal mode, double blank lines must both be printed.
To achieve this again after kristaps@ improvements in 1.10.6,
treat such blank lines as .sp (instead of .Pp as in 1.10.5)
and drop .Pp before .sp just like dropping .Pp before .Pp.


# 1.65 20-Aug-2010 schwarze

Implement a simple, consistent user interface for error handling.
We now have sufficient practical experience to know what we want,
so this is intended to be final:
- provide -Wlevel (warning, error or fatal) to select what you care about
- provide -Wstop to stop after parsing a file with warnings you care about
- provide consistent exit status codes for those warnings you care about
- fully document what warnings, errors and fatal errors mean
- remove all other cruft from the user interface, less is more:
- remove all -f knobs along with the whole -f option
- remove the old -Werror because calling warnings "fatal" is silly
- always finish parsing each file, unless fatal errors prevent that
This commit also includes a couple of related simplifications behind
the scenes regarding error handling.
Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and
Sascha Wildner (DragonFly BSD) agree with the general direction.


# 1.64 18-Aug-2010 schwarze

Simplify and sync the code and comments for copying the macro name
in man_pmacro() and mdoc_pmacro(). In particular, no need to use
isgraph(3) here, that has already been done in main.c.
Joint work by Kristaps and myself, ok kristaps@.


Revision tags: OPENBSD_4_8_BASE
# 1.63 07-Aug-2010 schwarze

Groff allows the initial macro on a line to be delimited by a space
or by a tab; so allow the tab in mandoc, too.
Bug found by me, fix by kristaps@, "sure" deraadt@.
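
In code, the change amounts to treating the tab like the space when
looking for the end of the macro name. A minimal sketch with a
hypothetical helper, where p points just past the control character:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the length of the macro name starting at p, where the name
 * is terminated by a space, a tab, or the end of the line.
 */
size_t
sk_macro_len(const char *p)
{
	size_t	 len = 0;

	assert(p != NULL);
	while (p[len] != '\0' && p[len] != ' ' && p[len] != '\t')
		len++;
	return len;
}
```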


# 1.62 16-Jul-2010 schwarze

Text ending in a full stop, exclamation mark or question mark
should not flag the end of a sentence if:

1) The punctuation is followed by closing delimiters
and not preceded by alphanumeric characters, like in
"There is no full stop (.) in this sentence"

or

2) The punctuation is a child of a macro
and not preceded by alphanumeric characters, like in
"There is no full stop
.Pq \&.
in this sentence"

jmc@ and sobrado@ like this
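
Rule 1 might be sketched like this. The function name is hypothetical,
the set of closing delimiters is an assumption, and rule 2 (punctuation
as a child of a macro) needs parser context and is not covered.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Scan backwards from the end of the text.  Closing delimiters seen
 * before any punctuation mark the punctuation as enclosed.  Enclosed
 * punctuation only ends a sentence if an alphanumeric character
 * directly precedes it.
 */
int
sk_is_eos(const char *s)
{
	size_t	 i;
	int	 enclosed = 0, found = 0;

	assert(s != NULL);
	for (i = strlen(s); i > 0; i--) {
		switch (s[i - 1]) {
		case ')':
		case ']':
		case '"':
		case '\'':
			if (!found)
				enclosed = 1;
			continue;
		case '.':
		case '!':
		case '?':
			found = 1;
			continue;
		default:
			break;
		}
		return found &&
		    (!enclosed || isalnum((unsigned char)s[i - 1]));
	}
	return found && !enclosed;
}
```

With this sketch, "There is no full stop (.)" is not an end of
sentence, while "(See the manual.)" still is.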


# 1.61 13-Jul-2010 schwarze

Merge release 1.10.4 (all code by kristaps@), providing four new features:
1) Proper .Bk support: allow output line breaks at input line breaks,
but keep input lines together in the output, finally fixing
synopses like aucat(1), mail(1) and tmux(1).
2) Mostly finished -Tps (PostScript) output.
3) Implement -Thtml output for .Nm blocks and .Bk -words.
4) Allow iterative interpolation of user-defined roff(7) strings.
Also contains some minor bugfixes and some performance improvements.


# 1.60 01-Jul-2010 schwarze

In the mdoc(7) parser, inspect roff registers early such that all parts
of the parser can use the resulting cues. In particular, this allows
to use .nr nS to force SYNOPSIS-style .Nm indentation outside the
SYNOPSIS as needed by ifconfig(8).

To actually make this useable, .Pp must rewind .Nm, or the rest of the
section would end up indented. Implement a quick hack for now,
a generic solution can be designed later.

ok kristaps@ sobrado@


# 1.59 29-Jun-2010 schwarze

Support for badly nested blocks, written around the time of
the Rostock mandoc hackathon and tested and polished since,
supporting constructs like:

.Ao Bo Ac Bc (exp breaking exp)
.Aq Bo eol Bc (imp breaking exp)
.Ao Bq Ac eol (exp breaking imp)
.Ao Bo So Bc Ac Sc (double break, inner before outer)
.Ao Bo So Ac Bc Sc (double break, outer before inner)
.Ao Bo Ac So Bc Sc (broken breaker)
.Ao Bo So Bc Do Ac Sc Dc (broken double breaker)

There are still two known issues which are tricky:

1) Breaking two identical explicit blocks (Ao Bo Bo Ac or Aq Bo Bo eol)
fails outright, triggering a bogus syntax error.
2) Breaking a block by two identical explicit blocks (Ao Ao Bo Ac Ac Bc
or Ao Ao Bq Ac Ac eol) still has a minor rendering error left:
"<ao1 <ao2 [bo ac2> ac1> bc]>" should not have the final ">".

We can fix these later in the tree, let's not grow this diff too large.

"get it in" kristaps@


# 1.58 27-Jun-2010 schwarze

Full .nr nS support, unbreaking the kernel manuals.

Kristaps coded this from scratch after reading my .nr patch;
it is simpler and more powerful.

Registers live in struct regset in regs.h, struct man and struct mdoc
contain pointers to it. The nS register is cleared when parsing .Sh.
Frontends respect the MDOC_SYNPRETTY flag set in mdoc node_alloc.


# 1.57 26-Jun-2010 schwarze

merge release 1.10.2
* bug fixes:
- interaction of ASCII_HYPH with special chars (found by Ulrich Spoerlein)
- handling of roff conditionals (found by Ulrich Spoerlein)
- .Bd -offset will no more default to 6n
* maintenance:
- more caching of .Bd and .Bl arguments for efficiency
- deconstify man(7) validation routines
- add FreeBSD library names (provided by Ulrich Spoerlein)
* start PostScript font-switching


# 1.56 06-Jun-2010 schwarze

Merge bsd.lv version 1.10.1 (to be released soon).

The main step forward is that this now has *much* better .Bl -column
support, now supporting many manuals that previously errored out
without producing any output.

Other fixes include:
* do not die from multiple list types, use the first and warn
* in .Bl without a type, default to -item
* various tweaks to .Dt
* fix .In, .Fd, .Ft, .Fn and .Fo formatting
* some documentation fixes and additions
* and fix a couple of bugs reported by Ulrich Spoerlein:
* better support for roff block-end "\}" without a preceding dot
* .In must not break the line outside SYNOPSIS
* spelling in some error messages

While merging, fix one regression in .In spacing
that needs to go to bsd.lv, too.


# 1.55 26-May-2010 schwarze

When a word does not fully fit onto the output line, but it contains
at least one hyphen, we already had support for breaking the line at the
last fitting hyphen. This patch improves this functionality by only
breaking at hyphens in free-form text, and by not breaking at hyphens
* at the beginning or end of a word or
* immediately preceded or followed by another hyphen or
* escaped by a preceding backslash.
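
The three exclusions can be written down as a small predicate. This is
a sketch with a hypothetical name; the restriction to free-form text
depends on parser state and is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the line may be broken after the hyphen at word[i]. */
int
sk_can_break_at(const char *word, size_t i)
{
	size_t	 len;

	assert(word != NULL);
	len = strlen(word);
	if (i >= len || word[i] != '-')
		return 0;
	/* Not at the beginning or end of a word. */
	if (i == 0 || i + 1 == len)
		return 0;
	/* Not next to another hyphen. */
	if (word[i - 1] == '-' || word[i + 1] == '-')
		return 0;
	/* Not escaped by a preceding backslash. */
	if (word[i - 1] == '\\')
		return 0;
	return 1;
}
```

For example, "well-known" may break at index 4, while neither hyphen
in "a--b" nor the escaped hyphen in "a\-b" qualifies.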

Before this patch, differences in break-at-hyphen support were one
of the major sources of noise in automatic comparisons to mdoc(7)
groff output. Now, the remaining differences are hard to find among
the noise coming from other sources.

Where there are still differences, what we do seems to be better than
what groff does, see e.g. the chio(1) exchange and position commands
for one of the now rare examples.

idea and coding by kristaps@

Besides, this was the last substantial code difference left
between bsd.lv and openbsd.org. We are now in full sync.


# 1.54 23-May-2010 schwarze

Unified error and warning message system for all of mandoc,
featuring three message levels, as agreed during the mandoc hackathon:
* FATAL parser failure, cannot produce any output from this input file:
eventually, we hope to convert most of these to ERRORs.
* ERROR, meaning mandoc cannot cope fully with the input syntax and will
probably lose information or produce structurally garbled output;
it will try to produce output anyway but exit non-zero at the end,
which is eventually intended to make the ports infrastructure happy.
* WARNING, meaning you should clean up the input file, but output
is probably mostly OK, so this will not cause error-exit at the end.
This commit is mostly just converting the old system to the new one; before
the classification will become really reliable, we must check all messages.

In particular,
* set up a new central message string table in main.c
* drop the old message string tables from man.c and mdoc.c
* get rid of the piece-meal merr enums in libman and libmdoc
* reduce number of error/warning functions from 16 to 6 (still a lot...)

While here, handle a few problems more gracefully:
* allow .Rv and .Ex to work without a prior .Nm
* allow .An to ignore extra arguments
* allow undeclared columns in .Bl -column

Written by kristaps@.


# 1.53 20-May-2010 schwarze

Support nested roff instructions:
* allow roff_parseln() to be re-run
* allow roff_parseln() to manipulate the line buffer offset
* support the offset in the man and mdoc libraries
* adapt .if, .ie, .el, .ig, .am* and .de* support
* interpret some instructions even in conditional-negative context
Coded by kristaps during the last day of the mandoc hackathon.

To avoid regressions in the OpenBSD tree, commit this together
with some small local additions:
* detect roff block end "\}" even on macro lines
* actually implement the ".if n" conditional
* ignore .ds, .rm and .tr in libroff

Also back my old .if/.ie/.el-handling out of libman, reverting:
man.h 1.15 man.c 1.25 man_macro.c 1.15 man_validate.c 1.19
man_action.c 1.15 man_term.c 1.28 man_html.c 1.9.


# 1.52 16-May-2010 schwarze

Rewrite the main mdoc text parser, mdoc_ptext()
to make it easier to understand and to fix various bugs:
* strip white space from the end of MDOC_TEXT elements in literal mode
* in literal mode, a line may be blank even when containing tabs
* escaped backslashes do not escape following characters
ok kristaps@


# 1.51 16-May-2010 schwarze

allow the single quote as a control character in place of the dot
at all relevant places;
from kristaps@


# 1.50 15-May-2010 schwarze

allow non-numeric manual sections in -mdoc;
while here, allow LIBRARY in section 9;
by kristaps@


# 1.49 15-May-2010 schwarze

various improvements regarding errors and warnings from Joerg Sonnenberger:
* If the last -column .Bl isn't specified, it is auto-sized.
* An invalid .St argument should be a warning, not an error.
Just put the argument into the output.
* An invalid .At argument should be a warning, not an error.
Just print the argument, like new groff does.
* Remove warnings concerning manual section (like 1, 6, 8).
It was only used for .Ex and not really useful.
* Remove warnings concerning page section (like SYNOPSIS).
These were only used for .Fd and .Lb and not really useful.


# 1.48 14-May-2010 schwarze

Integrate kristaps@' end-of-sentence (EOS) framework
which is simpler and more powerful than mine, and remove mine.

* man(7) now has EOS handling, too
* put EOS detection into its own function in libmandoc
* use node and termp flags to communicate the EOS condition
* no more EOS pseudo-macro
* no more non-printable EOS marker character on the formatter level

This slightly breaks EOS detection after trailing punctuation
in mdoc(7) macros, but that will be restored soon.


# 1.47 14-May-2010 schwarze

Merge 1.9.25, keeping local patches;
this does not merge kristaps' end-of-sentences handling yet,
i will check that separately. This one includes:
* handle \*(Ba as a delimiter
* introduce ARGS_PEND for .Bl -column .It end-of-line special casing
* section ordering: expect EXIT STATUS at the right place
* line break fixes in SYNOPSIS
* allow literal contexts to have arbitrary line lengths
* the input file column number can not be used to identify the beginning
of a line because white space is allowed after the initial '.'
* proper leading spaces in -man -Tascii mode
* do not let Lb break lines in -mdoc -Thtml LIBRARY


# 1.46 14-May-2010 schwarze

merge 1.9.24, keeping local patches; some changes:
* preserve multiple consecutive space characters in input
* do not restrict .Cd and .Rv to certain sections (requested by Joerg)
* do not run lookup() on quoted words
* enum return types for mdoc_args and mdoc_argv
* fix auto-closing of LINK tag in -Txhtml (from Daniel Friesel)
* various lint and manual fixes


# 1.45 08-May-2010 schwarze

merge bsd.lv rev 1.123:
sync mdoc.c's static function names with man.c


# 1.44 08-May-2010 schwarze

handle text lines beginning with \." as comments, like groff does,
even though this is not correct comment syntax (so warn, too)
reported by Claus Assmann on misc@, fix by kristaps@


# 1.43 04-May-2010 schwarze

end-of-sentence markers at the end of .Fn argument lists
ruin indentation of the next line in the SYNOPSIS section;
bug found by jacekm@ in err(3)


# 1.42 27-Apr-2010 schwarze

Fix a subtle bug noticed by naddy@ in pftop(8), thanks!

When converting blank lines to .Pp outside literal context,
it could happen that the following node ended up as a child
of the .Pp element, but it must always be a sibling.


# 1.41 22-Apr-2010 schwarze

Fix a segfault reported by nicm@, introduced in rev. 1.38.
When finding a blank line, trying to parse it is a bad idea.
Instead, after adding .Pp to the AST, just return from parsetext().


# 1.40 07-Apr-2010 schwarze

Merge the good parts of 1.9.23,
avoid the bad parts of 1.9.23, and keep local patches.

Input in general:
* Basic handling of roff-style font escapes \f, \F.
* Quoted punctuation does not count as punctuation.

mdoc(7) parser:
* Make .Pf callable; noted by Claus Assmann.
* Let .Bd and .Bl ignore unknown arguments; noted by deraadt@.
* Do not warn when .Er is used outside certain sections.
* Replace mdoc_node_free[list] by mdoc_node_delete.
* Replace #define by enum for rew*() return values.

man(7) parser:
* When .TH is missing, use default section and date.

Output in general:
* Curly braces do not count as punctuation.
* No space after .Fl w/o args when a macro follows on the same line.

HTML output:
* Unify PAIR_*_INIT macros, introduce new PAIR_ID_INIT().
* Print whitespace after, not before .Vt .Fn .Ft .Fo.

Checked that all manuals in base still build.


# 1.39 04-Apr-2010 schwarze

When the prologue lacks required information, do not error out,
but warn, set up some default values, and plod on.
Unbreaking the ports build for textproc/sgmlformat;
reported by naddy@, thanks.


# 1.38 03-Apr-2010 schwarze

* outside literal context in mdoc(7), handle blank lines like .Pp
* a missing NAME section in mdoc(7) need not be fatal

ok deraadt@


# 1.37 02-Apr-2010 schwarze

merge 1.9.22, keeping local patches
* convert mdoc tokens from #define to enum
* fix a segfault with .Xo/.Xc in explicit blocks
* Thorn is \*(Th, not \*(TH; noticed by Joerg Sonnenberger


# 1.36 25-Mar-2010 schwarze

fix a stupid out-of-bounds read access introduced in the previous
revision, in the code searching for the end of a sentence


Revision tags: OPENBSD_4_7_BASE
# 1.35 02-Mar-2010 schwarze

Proper inter-sentence spacing for mdoc(7).
When a text line or a non-block macro line in the source code ends
in any of ".!?", consider that an end of sentence (EOS).
This makes Jason's rule "new sentence, new line" even more important.
Let the parser detect the EOS and insert a token into the AST.
Let the -Tascii frontend render the EOS token as a double space before
the next word.


# 1.34 18-Feb-2010 schwarze

sync to release 1.9.15:
* corrected .Vt handling (spotted by Joerg Sonnenberger)
* corrected .Xr argument handling (based on my patch)
* removed \\ escape sequence (because it is for low-level roff only)
* warn about trailing whitespace (suggested by jmc@)
* -Txhtml support
* and some general cleanup and doc improvements


# 1.33 02-Jan-2010 schwarze

complete the sync to 1.9.15-pre2: mostly minor fixes
* bugfix: do not restore TERMP flags when leaving lists, just reset them
* and a few HTML fixes
* clarity: return width from a2width, not width+2, and adapt to it
* manual: document .Bl and .Fl
* portability: no need to escape '%' in macro names


# 1.32 22-Dec-2009 schwarze

sync to 1.9.12, mostly portability and refactoring:

correctness/functionality:
- bugfix: do not die when overstep hits the right margin
- new option: -fign-escape
- and various HTML features

portability:
- replace bzero(3) by memset(3), which is ANSI C
- replace err(3)/warn(3) by perror(3)/exit(3), which is ANSI C
- use argv[0] instead of __progname
- add time.h to various files for FreeBSD compilation

simplicity:
- do not allocate header/footer data dynamically in *_term.c
- provide and use malloc frontends that error out on failure

for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/


# 1.31 27-Oct-2009 schwarze

sync to 1.9.11: adapt printing of dates to groff conventions,
NetBSD portability fixes and some minor bugfixes and feature enhancements;
also checked that my hyphenation code still works on top of this


# 1.30 21-Oct-2009 schwarze

sync to 1.9.9, featuring:
* -Thtml output mode
* roff scaling units
* and some minor fixes
for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/


# 1.29 19-Oct-2009 schwarze

sync to 1.9.6: multiple improvements to references (.Rs)
* validate and order .Rs child nodes
* underline book title (.%B) and issuer (.%I)
* enclose title of article (.%T) in quotes
* avoid calling mdoc_verr directly, use a proper error code instead


# 1.28 19-Oct-2009 schwarze

sync to 1.9.6: u_char lives in <sys/types.h>
noticed by uqs at spoerlein dot net on FreeBSD,
where <stdlib.h> does not include <sys/types.h>


# 1.27 21-Sep-2009 schwarze

sync to 1.9.5: lookup hashes are now static tables
shortening the code, and, according to kristaps@, speeding it up


# 1.26 18-Sep-2009 schwarze

sync to 1.9.2: non-printable characters in macro names are errors;
from joerg at netbsd dot org


# 1.25 22-Aug-2009 schwarze

sync to 1.9.1: set mdoc_next flags in mdoc_*_alloc routines, where they belong


# 1.24 22-Aug-2009 schwarze

sync to 1.9.0: polishing the core code of mdoc macro handling
1) If a macro is not parsed, do not parse it. Of course, without
parsing it, we cannot produce "macro-like parameter" warnings,
but these were useless anyway.
2) If a macro is not callable, do not print a useless warning when
it occurs as a parameter, just display the raw characters.
3) Below .Bl -column, check whether macros are callable.
4) Like groff, allow whitespace after the initial dot on macro lines.


# 1.23 22-Aug-2009 schwarze

sync to 1.8.5: Error string is now file:line:col: message.
Fixed column reporting (off by one).
Use fprintf instead of warnx for parse errors (like cc).


# 1.22 26-Jul-2009 schwarze

sync to 1.8.1: rewrite quoted literal handling correctly,
rewrite TABSEP handling in a simpler way,
and retire ECOLEMPTY, ARGS_QUOTED and ARGS_ARGVLIKE


# 1.21 26-Jul-2009 schwarze

sync to 1.8.1: removed excessively verbose EARGVPARM warning


# 1.20 26-Jul-2009 schwarze

sync to 1.8.1: support .br and .sp


# 1.19 26-Jul-2009 schwarze

sync to 1.8.1: libmdoc now breaks up free-form lines into tokens;
will simplify LITERAL mode in front-end


# 1.18 18-Jul-2009 schwarze

sync to 1.8.0: move mdoc_a2att, mdoc_a2st, and mdoc_a2lib to libmdoc


# 1.17 12-Jul-2009 schwarze

sync to 1.7.23: pass warning code to mdoc_pwarn() instead of warning message
define additional warning macro mdoc_nwarn()
remove obsolete warning functions mdoc_warn(), pwarn(), vwarn(), nwarn()
remove various now unused "enum mdoc_warn" and "enum mwarn"


# 1.16 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_perr() instead of error string
and use the so improved mdoc_nerr() at many places;
get rid of now unused static functions perr()


# 1.15 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_nerr() instead of error string
and use the so improved mdoc_nerr() at many places


# 1.14 12-Jul-2009 schwarze

sync to 1.7.23: unify the various "enum merr" into libman.h and libmdoc.h,
use it as a new argument to mdoc_err(), the same way as for man_err(),
and use string tables instead of switch statements to select error messages


# 1.13 12-Jul-2009 schwarze

sync to 1.7.23: third step to get rid of enum mdoc_warn:
mdoc_verr is not using enum mdoc_warn, so use it at a few more places


# 1.12 12-Jul-2009 schwarze

sync to 1.7.23: second step to get rid of enum mdoc_warn:
remove type from mdoc_vwarn arguments, and use this function where appropriate


# 1.11 12-Jul-2009 schwarze

sync to 1.7.23: first step to get rid of enum mdoc_warn:
unify manwarn() and mdocwarn() into mwarn()


Revision tags: OPENBSD_4_6_BASE
# 1.10 23-Jun-2009 schwarze

sync to 1.7.20: like for the -man case, add an nchild counter to the -mdoc
nodes, simplifying the validation code; no functional change


# 1.9 19-Jun-2009 schwarze

sync to 1.7.19: more elegant section handling


# 1.8 18-Jun-2009 schwarze

sync to 1.7.19: comment cleanup; no functional change


# 1.7 18-Jun-2009 schwarze

sync to 1.7.19: improved comment handling


# 1.6 18-Jun-2009 schwarze

complete sync to 1.7.17: garbage collect unused functions
mdoc_msg, mdoc_pmsg, mdoc_vmsg, and mdoc_nwarn


# 1.5 15-Jun-2009 schwarze

bring back miod@'s "real functions" patch (rev. 1.2)
which was erroneously backed out in rev. 1.4, sorry;
ok kristaps@


# 1.4 15-Jun-2009 schwarze

sync to 1.7.16:
reduce code duplication in warning and error reporting functions
while here, garbage collect three unused function prototypes


# 1.3 14-Jun-2009 schwarze

sync to 1.7.16: comments, whitespace and spelling fixes; no functional change


# 1.2 15-Apr-2009 miod

Replace variadic macros with real functions, so that this compiles on
platforms still using gcc 2.
ok deraadt@


# 1.1 06-Apr-2009 kristaps

Initial check-in of mandoc for formatting manuals. ok deraadt@


# 1.161 30-Dec-2018 schwarze

Cleanup, no functional change:

The struct roff_man used to be a bad mixture of internal parser
state and public parsing results. Move the public results to the
parsing result struct roff_meta, which is already public. Move the
rest of struct roff_man to the parser-internal header roff_int.h.

Since the validators need access to the parser state, call them
from the top level parser during mparse_result() rather than from
the main programs, also reducing code duplication.

This keeps parser internal state out of the main programs (five
in mandoc portable) and out of eight formatters.


# 1.160 14-Dec-2018 schwarze

Almost mechanical diff to remove the "struct mparse *" argument
from mandoc_msg(), where it is no longer used.
While here, rename mandoc_vmsg() to mandoc_msg() and retire the
old version: There is really no point in having another function
merely to save "%s" in a few places.
Minus 140 lines of code.


# 1.159 04-Dec-2018 schwarze

Clean up the validation of .Pp, .PP, .sp, and .br. Make sure all
combinations are handled, and are handled in a systematic manner.
This resolves some erratic duplicate handling, handles a number of
missing cases, and improves diagnostics in various respects.

Move validation of .br and .sp to the roff validation module
rather than doing that twice in the mdoc and man validation modules.
Move the node relinking function to the roff library where it belongs.

In validation functions, only look at the node itself, at previous
nodes, and at descendants, not at following nodes or ancestors,
such that only nodes are inspected which are already validated.


Revision tags: OPENBSD_6_4_BASE
# 1.158 17-Aug-2018 schwarze

Remove more pointer arithmetic passing via regions outside the array
that is undefined according to the C standard. Robert Elz <kre at
munnari dot oz dot au> pointed out i wasn't quite done yet.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.157 11-Aug-2017 schwarze

Make the "new sentence, new line" check stricter, allowing digits
in the last two letters of the last word of the sentence.
No false positives in base or Xenocara.
Suggested by and OK jmc@.


# 1.156 17-Jun-2017 schwarze

correct handling of blank lines after \c


# 1.155 07-Jun-2017 schwarze

Also catch "new sentence, new line" if there are three blanks
between the sentences. Thomas Klausner says he has seen some
of these, and i don't see any false positives.


# 1.154 07-Jun-2017 schwarze

Make "new sentence, new line" detection stricter:
Also catch cases where the new sentence starts with a one-letter word
and the input line is broken right after that word.
Suggested by Thomas Klausner <wiz @ NetBSD>.

It's merely a three-bit diff, changing one byte from 0x34 to 0x33,
so what can possibly go wrong...


# 1.153 05-May-2017 schwarze

Move .sp to the roff modules. Enough infrastructure is in place
now that this actually saves code: -70 LOC.


# 1.152 29-Apr-2017 schwarze

Parser unification: use nice ohashes for all three request and macro tables;
no functional change, minus two source files, minus 200 lines of code.


# 1.151 24-Apr-2017 schwarze

Continue parser unification:
* Make enum rofft an internal interface as enum roff_tok in "roff.h".
* Represent mdoc and man macros in enum roff_tok.
* Make TOKEN_NONE a proper enum value and use it throughout.
* Put the prologue macros first in the macro tables.
* Unify mdoc_macroname[] and man_macroname[] into roff_name[].


Revision tags: OPENBSD_6_1_BASE
# 1.150 03-Mar-2017 schwarze

remove a few redundant conditions that jsg@ found with cppcheck


# 1.149 16-Feb-2017 schwarze

Remove the ENDBODY_NOSPACE flag, simplifying the code.

Comparing to groff output, it appears that all cases where it was used
and made a difference actually require the opposite, ENDBODY_SPACE.

I have no idea why i added it back in 2010; maybe to compensate for
some other bug that has long been fixed.


# 1.148 28-Jan-2017 schwarze

Add a warning "new sentence, new line".
This does not attempt to pinpoint each and every offender, but
instead tries very hard to avoid false positives: Currently, there
are only two false positives in the whole OpenBSD base system.
Only do this in mdoc(7), not in man(7), because manuals written
in man(7) typically have much worse problems than this.
OK jmc@ on a previous version of the patch


# 1.147 10-Jan-2017 schwarze

unify names of AST node flags; no change of cpp output


# 1.146 20-Aug-2016 schwarze

If a column list starts with implicit rows (that is, rows without .It)
and roff-level nodes (e.g. tbl or eqn) follow, don't run into an
assertion. Instead, wrap the roff-level nodes in their own row.
Issue found by tb@ with afl(1).


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.145 30-Oct-2015 schwarze

If a .Bd block has no arguments at all, drop the block and only keep
its contents. Removing a gratuitous difference to groff output
found after a related bug report from krw@.


# 1.144 20-Oct-2015 schwarze

In order to become able to generate syntax tree nodes on the roff(7)
level, validation must be separated from parsing and rewinding.
This first big step moves calling of the mdoc(7) post_*() functions
out of the parser loop into their own mdoc_validate() pass, while
using a new mdoc_state() module to make syntax tree state handling
available to both the parser loop and the validation pass.


# 1.143 12-Oct-2015 schwarze

To make the code more readable, delete 283 /* FALLTHROUGH */ comments
that were right between two adjacent case statements. Keep only
those 24 where the first case actually executes some code before
falling through to the next case.


# 1.142 06-Oct-2015 schwarze

modernize style: "return" is not a function; ok cmp(1)


Revision tags: OPENBSD_5_8_BASE
# 1.141 23-Apr-2015 schwarze

Unify mdoc_deroff() and man_deroff() into a common function deroff().
No functional change except that for mdoc(7), it now skips leading
escape sequences just like it already did for man(7).
Escape sequences rarely occur in mdoc(7) code and if they do,
skipping them is an improvement in this context.
Minus 30 lines of code.


# 1.140 23-Apr-2015 schwarze

Get rid of two empty wrapper functions. No functional change.


# 1.139 19-Apr-2015 schwarze

Unify trickier node handling functions.
* man_elem_alloc() -> roff_elem_alloc()
* man_block_alloc() -> roff_block_alloc()
The functions mdoc_elem_alloc() and mdoc_block_alloc() remain for
now because they need to do mdoc(7)-specific argument processing.


# 1.138 19-Apr-2015 schwarze

Unify some node handling functions that use TOKEN_NONE.
* mdoc_word_alloc(), man_word_alloc() -> roff_word_alloc()
* mdoc_word_append(), man_word_append() -> roff_word_append()
* mdoc_addspan(), man_addspan() -> roff_addtbl()
* mdoc_addeqn(), man_addeqn() -> roff_addeqn()
Minus 50 lines of code, no functional change.


# 1.137 19-Apr-2015 schwarze

Decouple the token code for "no request or macro" from the individual
high-level parsers to allow further unification of functions that
only need to recognize this code, but that don't care about different
high-level macrosets beyond that.


# 1.136 19-Apr-2015 schwarze

Unify node handling functions:
* node_alloc() for mdoc and man_node_alloc() -> roff_node_alloc()
* node_append() for mdoc and man_node_append() -> roff_node_append()
* mdoc_head_alloc() and man_head_alloc() -> roff_head_alloc()
* mdoc_body_alloc() and man_body_alloc() -> roff_body_alloc()
* mdoc_node_unlink() and man_node_unlink() -> roff_node_unlink()
* mdoc_node_free() and man_node_free() -> roff_node_free()
* mdoc_node_delete() and man_node_delete() -> roff_node_delete()
Minus 130 lines of code, no functional change.


# 1.135 18-Apr-2015 schwarze

Delete the wrapper functions mdoc_meta(), man_meta(), mdoc_node(),
man_node() from the mandoc(3) semi-public interface and the internal
wrapper functions print_mdoc() and print_man() from the HTML formatters.
Minus 60 lines of code, no functional change.


# 1.134 18-Apr-2015 schwarze

Unify {mdoc,man}_{alloc,reset,free}() into roff_man_{alloc,reset,free}().
Minus 80 lines of code, no functional change.
Written on the train from Koeln to Wolfsburg returning from p2k15.


# 1.133 18-Apr-2015 schwarze

Move mdoc_hash_init() and man_hash_init() to libmandoc.h
and call them from mparse_alloc() and choose_parser(),
preparing unified allocation of struct roff_man.


# 1.132 18-Apr-2015 schwarze

Profit from the unified struct roff_man and reduce the number of
arguments of mparse_result() by one. No functional change.
Written on the ICE Bruxelles-Koeln on the way back from p2k15.


# 1.131 18-Apr-2015 schwarze

Replace the structs mdoc and man by a unified struct roff_man.
Almost completely mechanical, no functional change.
Written on the train from Exeter to London returning from p2k15.


# 1.130 02-Apr-2015 schwarze

Third step towards parser unification:
Replace struct mdoc_meta and struct man_meta by a unified struct roff_meta.
Written on the train from London to Exeter on the way to p2k15.


# 1.129 02-Apr-2015 schwarze

Second step towards parser unification:
Replace struct mdoc_node and struct man_node by a unified struct roff_node.
To be able to use the tok member for both mdoc(7) and man(7) without
defining all the macros in roff.h, sacrifice a tiny bit of type safety
and make tok an int rather than an enum.
Almost mechanical, no functional change.
Written on the Eurostar from Bruxelles to London on the way to p2k15.


# 1.128 02-Apr-2015 schwarze

First step towards parser unification:
Replace enum mdoc_type and enum man_type by a unified enum roff_type.
Almost mechanical, no functional change.
Written on the ICE train from Frankfurt to Bruxelles on the way to p2k15.


Revision tags: OPENBSD_5_7_BASE
# 1.127 12-Feb-2015 schwarze

Do not confuse .Bl -column lists that just broke another block
with newly opened .Bl -column lists;
fixing an assertion failure jsg@ found with afl:
test case #481, Bl It Bl -column It Bd El text text El


# 1.126 12-Feb-2015 schwarze

Delete the mdoc_node.pending pointer and the function calculating
it, make_pending(), which was the most difficult function of the
whole mdoc(7) parser. After almost five years of maintaining this
hellhole, i just noticed the pointer isn't needed after all.

Blocks are always rewound in the reverse order they were opened;
that even holds for broken blocks. Consequently, it is sufficient
to just mark broken blocks with the flag MDOC_BROKEN and breaking
blocks with the flag MDOC_ENDED. When rewinding, instead of iterating
the pending pointers, just iterate from each broken block to its
parents, rewinding all that are MDOC_ENDED and stopping after
processing the first ancestor that is not MDOC_BROKEN. For ENDBODY
markers, use the mdoc_node.body pointer in place of the former
mdoc_node.pending.

This also fixes an assertion failure found by jsg@ with afl,
test case #467 (Bo Bl It Bd Bc It), where (surprise surprise)
the pending pointer got corrupted.

Improved functionality, minus one function, minus one struct field,
minus 50 lines of code.
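
The rewinding walk described above can be modelled with a toy tree.
This is a sketch under stated assumptions, not the parser code: struct
node, the NODE_* flags (standing in for MDOC_BROKEN and MDOC_ENDED) and
rewind_ancestors() are all hypothetical, and "rewinding" is reduced to
setting a NODE_CLOSED flag.

```c
#include <stddef.h>

/* Hypothetical stand-ins for MDOC_BROKEN and MDOC_ENDED. */
#define NODE_BROKEN	(1 << 0)	/* block was broken by another block */
#define NODE_ENDED	(1 << 1)	/* block has seen its end, awaits rewind */
#define NODE_CLOSED	(1 << 2)	/* toy marker: block has been rewound */

/* Toy syntax tree node; the real node carries far more state. */
struct node {
	struct node	*parent;
	int		 flags;
};

/*
 * Walk from a broken block up through its ancestors.  Rewind (here:
 * mark NODE_CLOSED) every ancestor that is NODE_ENDED, and stop after
 * processing the first ancestor that is not NODE_BROKEN.
 * Returns the number of ancestors rewound.
 */
static int
rewind_ancestors(struct node *broken)
{
	struct node	*n;
	int		 count;

	count = 0;
	for (n = broken->parent; n != NULL; n = n->parent) {
		if (n->flags & NODE_ENDED) {
			n->flags |= NODE_CLOSED;
			count++;
		}
		if ((n->flags & NODE_BROKEN) == 0)
			break;	/* first unbroken ancestor: done */
	}
	return count;
}
```

The point of the design is that only the parent pointers and two flag
bits are needed; no separate per-node "pending" pointer has to be kept
consistent while blocks break each other.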


# 1.125 05-Feb-2015 schwarze

Simplify by deleting the "lastline" member of struct mdoc_node.
Minus one struct member, minus 17 lines of code, no functional change.


# 1.124 02-Feb-2015 schwarze

Get rid of all calls to rew_sub() in blk_exp_close(); only ten calls
remain in other functions. As a bonus, this fixes an assertion failure
jsg@ found some time ago with afl (test case 982) and improves minor
details in error reporting.


# 1.123 15-Jan-2015 schwarze

Fatal errors no longer exist.
If a file can be opened, mandoc will produce some output;
at worst, the output may be almost empty.
Simplifies error handling and frees a message type for future use.


# 1.122 28-Nov-2014 schwarze

Simplify by making the eqn and tbl steering functions void;
no functional change, minus 15 lines of code.


# 1.121 28-Nov-2014 schwarze

Simplify by making the mdoc parser callbacks void, and some cleanup;
no functional change, minus 50 lines of code.


# 1.120 28-Nov-2014 schwarze

Simplify the code by making various mdoc parser helper functions void.
No functional change, minus 130 lines of code.


# 1.119 28-Nov-2014 schwarze

Simplify code by making mdoc validation handlers void.
No functional change, minus 90 lines of code.


# 1.118 19-Nov-2014 schwarze

Escape sequences terminate high-level macro names, and when doing so,
they are ignored, just in the same way as for request names
and for low-level macro names.
This also cures a warning in the pod2man(1) preamble.


# 1.117 20-Oct-2014 schwarze

correct the spacing after in-line equations
that start at the beginning of an input line
but end before the end of an input line


# 1.116 20-Oct-2014 schwarze

correct spacing before inline equations


# 1.115 16-Oct-2014 schwarze

Implement in-line equations, much needed by Xenocara manuals.
Put the steering into the roff parser rather than into the mdoc
parser such that it works for all macro languages and on both text
and macro lines.
Line breaks and blank characters generated before and after in-line
equations are not perfect yet, but let's do one thing at a time.


# 1.114 06-Sep-2014 schwarze

Simplify by handling empty request lines at the one logical place
in the roff parser instead of in three other places in other parsers.
No functional change.


# 1.113 08-Aug-2014 schwarze

Bring the handling of defective prologues even closer to groff,
in particular relaxing the distinction between prologue and body
and further improving messages.
* The last .Dd wins and the last .Os wins, even in the body.
* The last .Dt before the first body macro wins.
* Missing title in .Dt defaults to UNTITLED. Warn about it.
* Missing section in .Dt does not default to 1. But warn about it.
* Do not warn multiple times about the same mdoc(7) prologue macro.
* Warn about missing .Os.
* Incomplete .TH defaults to empty strings. Warn about it.


# 1.112 08-Aug-2014 schwarze

mention requests and macros in more messages


# 1.111 08-Aug-2014 schwarze

Simplify: replace one global flag by one local variable
and remove three unused global flags. No functional change.


Revision tags: OPENBSD_5_6_BASE
# 1.110 09-Jul-2014 schwarze

mark defos as const; nobody needs to change it,
and it is occasionally useful to be able to pass literal strings


# 1.109 07-Jul-2014 schwarze

no need to skip content before first section header


# 1.108 06-Jul-2014 schwarze

Clean up messages related to plain text and to escape sequences.
* Mention invalid escape sequences and string names, and fallbacks.
* Hierarchical naming.


# 1.107 02-Jul-2014 schwarze

Implement the obsolete macros .En .Es .Fr .Ot for backward compatibility,
since this is hardly more complicated than explicitly ignoring them
as we did in the past. Of course, do not use them!


# 1.106 01-Jul-2014 schwarze

Clean up the warnings related to document structure.
* Hierarchical naming of the related enum mandocerr items.
* Mention the offending macro, section title, or string.
While here, improve some wordings:
* Descriptive instead of imperative style.
* Uniform style for "missing" and "skipping".
* Where applicable, mention the fallback used.


# 1.105 20-Jun-2014 schwarze

Start systematic improvements of error reporting.
So far, this covers all WARNINGs related to the prologue.

1) hierarchical naming of MANDOCERR_* constants
2) mention the macro name in messages where that adds clarity
3) add one missing MANDOCERR_DATE_MISSING msg
4) fix the wording of one message related to the man(7) prologue

Started on the plane back from Ottawa.


# 1.104 25-Apr-2014 schwarze

Fix a minor optimization i broke in bsd.lv rev. 1.163 on August 20, 2010:
Do not bother looking into the hash table when the length of the macro
already tells us it's invalid. No functional change.
Noticed by jsg@, thanks!
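
A minimal sketch of that kind of optimization, assuming that valid
mdoc(7) macro names are two or three characters long. The table, the
lookup counter and the function names below are hypothetical and a
linear search stands in for the real hash table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real macro hash table. */
static const char *const known_macros[] = {
	"Dd", "Dt", "Os", "Sh", "Pp", "Brq"
};

/* Counts table consultations, to make the saving observable. */
static int lookups;

/* Linear search standing in for the hash lookup; -1 if not found. */
static int
table_find(const char *name)
{
	size_t	 i;

	lookups++;
	for (i = 0; i < sizeof(known_macros) / sizeof(known_macros[0]); i++)
		if (strcmp(name, known_macros[i]) == 0)
			return (int)i;
	return -1;
}

/*
 * Reject names whose length alone proves them invalid before
 * touching the table.  Assumption: names have 2 or 3 characters.
 */
static int
macro_find(const char *name)
{
	size_t	 len;

	len = strlen(name);
	if (len < 2 || len > 3)
		return -1;
	return table_find(name);
}
```

With this guard, a name such as "S" or "Toolong" never reaches the
table, while a well-formed but unknown name such as "Zz" still costs
one lookup.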


# 1.103 20-Apr-2014 schwarze

KNF: case (FOO): -> case FOO, remove /* LINTED */ and /* ARGSUSED */,
remove trailing whitespace and blanks before tabs, improve some indenting;
no functional change


# 1.102 30-Mar-2014 schwarze

Implement the roff(7) .ll (line length) request.
Found by naddy@ in the textproc/enchant(1) port.
Of course, do not use this in new manuals.


# 1.101 23-Mar-2014 schwarze

If an .Nd block contains macros, avoid fragmented entries in mandocdb(8),
instead use the .Nd content recursively.
Improves a couple of index entries in base.


# 1.100 21-Mar-2014 schwarze

avoid repetitive code for asprintf error handling


# 1.99 21-Mar-2014 schwarze

The files mandoc.c and mandoc.h contained both specialised low-level
functions used for multiple languages (mdoc, man, roff), for example
mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
functions. Split the auxiliaries out into their own file and header.
While here, do some #include cleanup.


Revision tags: OPENBSD_5_5_BASE
# 1.98 05-Jan-2014 schwarze

Add an option -Q (quick) to mandocdb(8)
for accelerated generation of reduced-size databases.

Implement this by allowing the parsers to optionally
abort the parse sequence after the NAME section.

While here, garbage collect the unused void *arg attribute
of struct mparse and mparse_alloc().

This reduces the processing time of mandocdb(8) on /usr/share/man
by a factor of 2 and the database size by a factor of 4.
However, it still takes 5 times the time and 6 times the space
of makewhatis(8), so more work is clearly needed.


# 1.97 30-Dec-2013 schwarze

Simplify: Remove an unused argument from the mandoc_eos() function.
No functional change.


# 1.96 24-Dec-2013 schwarze

When deciding whether two consecutive macros are on the same input line,
we have to compare the line where the first one *ends* (not where it begins)
to the line where the second one starts.
This fixes the bug that .Bk allowed output line breaks right after block
macros spanning more than one input line, even when the next macro follows
on the same line.


# 1.95 21-Oct-2013 schwarze

There are three kinds of input lines: text lines, macros taking
positional arguments (like Dt Fn Xr) and macros taking text as
arguments (like Nd Sh Em %T An). In the past, even the latter put
each word of their arguments into its own MDOC_TEXT node; instead,
concatenate arguments unless delimiters, keeps or spacing mode
prevent that. Regarding mandoc(1), this is internal refactoring,
no output change intended.

Once we will switch mandocdb(8) from DB to SQLite in the future,
this is going to be required to support search expressions crossing
word boundaries, and it will reduce both database sizes and build
times by a bit more than 5% each.


# 1.94 03-Oct-2013 schwarze

Support setting arbitrary roff(7) number registers,
preserving read support for the ".nr nS" SYNOPSIS state register;
read support for arbitrary registers is still not available.

Inspired by NetBSD roff.c rev. 1.18 (Christos Zoulas, March 21, 2013),
but implemented differently. I don't want to have yet another different
implementation of a hash table in mandoc - it would be the second one
in roff.c alone and the fifth one in mandoc grand total.
Instead, i designed and implemented roff_setreg() and roff_getreg()
to be similar to roff_setstrn() and roff_getstrn().

Once we feel the need to optimize, we can introduce one common
hash table implementation for everything in mandoc.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.93 17-Nov-2012 schwarze

Cleanup naming of local variables to make the code easier on the eye:
Settle for "struct man *man", "struct mdoc *mdoc", "struct meta *meta"
and avoid the confusing "*m" which was sometimes this, sometimes that.
No functional change.

ok kristaps@ some time ago


# 1.92 16-Nov-2012 schwarze

Fix a crash triggered by .Bl -tag .It Xo .El .Sh found by florian@.

* When allocating a body end marker, copy the pointer to the normalized
block information from the body block, avoiding the risk of subsequent
null pointer dereference.
* When inserting the body end marker into the syntax tree, do not try to
copy that pointer from the parent block, because not being a direct child
of the block it belongs to is the whole point of a body end marker.
* Even non-callable blocks (like Bd and Bl) can break other blocks;
when this happens, postpone closing them out in the usual way.


Revision tags: OPENBSD_5_2_BASE
# 1.91 18-Jul-2012 schwarze

Fix handling of paragraph macros inside lists:
* When they are trailing the last item, move them outside the list.
* When they are trailing any other non-compact item, drop them.

Improves formatting of 40 pages, e.g. grep(1), ksh(1), netstat(1),
ath(4), bsd.port.mk(5), pf.conf(5), mount(8), crypto(9).


# 1.90 18-Jul-2012 schwarze

The mdoc(7) \*(Ba predefined string actually forces roman font;
that's stupid because it may break enclosing font changes,
but let's do the same for groff bug compatibility.

--> Never use \*(Ba, use just plain "|"! <--

Also, predefined strings are already expanded by the roff(7) parser,
so the mdoc(7) parser has to look for the expanded string.

Formatting improvements in ksh(1), less(1), atan2(3),
hostapd.conf(5), snmpd.conf(5), and mknod(8).


# 1.89 16-Jul-2012 schwarze

Several -mdoc parser improvements related to vertical spacing:
* So far, .Pp and .Lp were removed before paragraph type blocks.
* Now also remove .br before paragraph type blocks.
* Treat .Lp as a paragraph like .Pp, so remove .Pp, .Lp, .br before it.
* Do not treat .sp as a paragraph, don't remove anything before it.
* After .Sh, .Ss, .Pp, and .Lp, remove .Pp, .Lp, .sp, .br, and blank lines.
* After .sp and .br, remove .br.


# 1.88 07-Jul-2012 schwarze

Support the .cc request; code by kristaps@, tests by me.
Needed for sqlite3(1) as reported by espie@.


# 1.87 24-May-2012 schwarze

Support -Ios='OpenBSD 5.1' to override uname(3) as the source of the
default value for the mdoc(7) .Os macro.
Needed for man.cgi on the OpenBSD website.

Problem with man.cgi first noticed by deraadt@;
beck@ and deraadt@ agree with the way to solve the issue.


Revision tags: OPENBSD_5_1_BASE
# 1.86 30-Sep-2011 schwarze

implement .Ap .Bd .Bo .Bq .D1 .Ic .Lp .Oo .Pf .Po .Ss .Sx .Sy .br .sp
implement .Bl -bullet
add more information to the .TH line
escape dots at the beginnings of lines
add trailing newline character at the end of the file
do not misinterpret the ROOT block as .Ap


# 1.85 18-Sep-2011 schwarze

sync to version 1.11.7 from kristaps@
main new feature: support the roff(7) .tr request
plus various bugfixes and some refactoring

regressions are so minor that it's better to get this in
and fix them in the tree


# 1.84 18-Sep-2011 schwarze

sync to version 1.11.5:
adding an implementation of the eqn(7) language
by kristaps@

So far, only .EQ/.EN blocks are handled, in-line equations are not, and
rendering is not yet very pretty, but the parser is fairly complete.


Revision tags: OPENBSD_5_0_BASE
# 1.83 24-Apr-2011 schwarze

Merge version 1.11.1:
Again lots of cleanup and maintenance work by kristaps@.
- simplify error reporting: less function pointers, more mandoc_[v]msg
- main: split document parsing out of main.c into read.c
- roff, mdoc, man: improved recognition of control characters
- roff: better handling of if/else stack overflows
- roff: add some predefined strings for backward compatibility
- mdoc, man: empty sections are not errors
- mdoc: move delimiter handling to libmdoc
- some header restructuring and some minor features and fixes
This merge causes two minor regressions
that i will fix in separate commits right afterwards.


# 1.82 21-Apr-2011 schwarze

Merge version 1.10.10:
lots of cleanup and maintenance work by kristaps@.
- move some main.c globals into struct curparse
- move mandoc_*alloc to mandoc.h such that all code can use them
- make mandoc_isdelim available to formatting frontends
- dissolve mdoc_strings.c, move the code where it is used
- make all error reporting functions void, their return values were useless
- and various minor cleanups and fixes


# 1.81 20-Mar-2011 schwarze

Import the foundation for eqn(7) support.
Written by kristaps@.

For now, i'm adding one line to each of the four frontends
to just pass the input text through to the output,
not yet interpreting any of the eqn keywords.


# 1.80 07-Mar-2011 schwarze

Clean up date handling,
as a first step to get rid of the frequent petty warnings in this area:
- always store dates as strings, not as seconds since the Epoch
- for input, try the three most common formats everywhere
- for unrecognized format, just pass the date through verbatim
- when there is no date at all, still use the current date
Originally triggered by a one-line patch from Tim van der Molen,
<tbvdm at xs4all dot nl>, which is included here.
Feedback and OK on manual parts from jmc@.
"please check this in" kristaps@


Revision tags: OPENBSD_4_9_BASE
# 1.79 10-Feb-2011 schwarze

Tbl code maintenance by kristaps@.
- Remember the line-number of a tbl_span, and use it in messages.
- Put *_span_alloc() functions right into the *_addspan() ones,
since these are the only places they are called from.


# 1.78 09-Jan-2011 schwarze

Make sure coding errors cannot make us miss fatal parsing errors
by assert(3)ing valid parser state in the main parsing functions;
from kristaps@.


# 1.77 04-Jan-2011 schwarze

Merge kristaps@' cleaner tbl integration, removing mine;
there are still a few bugs, but fixing these will be easier in tree.


# 1.76 01-Jan-2011 schwarze

Clean up {mdoc,man}_{p,v}msg invocations:
Ignore the return values, they are constant anyway.
From kristaps@.


# 1.75 29-Dec-2010 schwarze

Reorg by Kristaps: In libmdoc, replace the union of pointers to structs
of macro-specific data by a pointer to a union of structs, which makes the
code simpler and more robust at the expense of a small memory overhead.
Merging was somewhat difficult because we mustn't break tbl(1) support
which the bsd.lv version does not yet have.


# 1.74 26-Dec-2010 schwarze

Behave more like groff (both old and new): Specifying both .%T and .%J in
an .Rs block causes the title to be quoted instead of underlined, such
that journal title and article title appear visually different.
Original diff from kristaps@, simplified by me, tweaked again by kristaps@.


# 1.73 21-Dec-2010 schwarze

Migrate .An to use a pointer to its data, like everybody else.
In preparation for a simpler ref-counted system for node data.
From kristaps@.


# 1.72 21-Dec-2010 schwarze

Vertical spacing improvements from kristaps@, small tweaks by me:
Add a "last child" member to struct mdoc_node.
Remove .Pp or .Lp if it is the first or last child of an .Sh or .Ss body.
Thus, no need to do the same in the front-ends any longer.
Tolerate some cases of .Pp inside .Bl.


# 1.71 02-Dec-2010 schwarze

Properly initialize the manual section to a default when .Dt is missing.
Without this, we died on an assertion.
Problem noted and patch provided by kristaps@.


# 1.70 01-Dec-2010 schwarze

Merge mdoc_action.c into mdoc_validate.c, because having two places to do
basically the same things just causes code duplication and confusion.
Work by kristaps@, including a few bugfixes he found during the merge,
and reapplying OpenBSD changes on top.


# 1.69 28-Nov-2010 schwarze

To avoid FATAL errors, we have been parsing and ignoring the roff
requests .am, .ami, .am1, .dei, and .rm for a long time.
Since ignoring them can (rarely) cause information loss and serious
misformatting, throw an ERROR: NOT IMPLEMENTED when finding them.
Implementing them would not be too difficult, but they are so rare
in practice that i can find better use for my time right now.

In this context,
- Put the string "NOT IMPLEMENTED" into two other error messages
as well, to distinguish them from those caused by broken input.
- Print the string "unknown macro" once, not twice in the error message
associated with MANDOCERR_MACRO, and begin printing the buffer at the
point where the unknown macro really is, not at the start of line.


# 1.68 16-Oct-2010 schwarze

Do not abort() on tbl errors, reduce the risk that tbl stuff kills a build,
and provide more useful tbl error messages in a non-intrusive way.


# 1.67 16-Oct-2010 schwarze

Support tbl(1) code embedded into mdoc(7) input files.
Very similar to what i have done in man(7) yesterday.
Allows building cpu(4) on HPPA, wi(4), and phantasia(6).
Now we are able to build all tbl code in base.


# 1.66 27-Sep-2010 schwarze

Merge the last bits of 1.10.6 (released today), most were already in:
* ignore double-.Pp
* ignore .Pp before .Bd and .Bl (unless -compact is specified)
* avoid double blank line upon .Pp, .br and friends in literal context
* cast enums to int when passing them to exit(3) to please lint(1)
While merging, fix a regression introduced by kristaps@:
Outside literal mode, double blank lines must both be printed.
To achieve this again after kristaps@ improvements in 1.10.6,
treat such blank lines as .sp (instead of .Pp as in 1.10.5)
and drop .Pp before .sp just like dropping .Pp before .Pp.


# 1.65 20-Aug-2010 schwarze

Implement a simple, consistent user interface for error handling.
We now have sufficient practical experience to know what we want,
so this is intended to be final:
- provide -Wlevel (warning, error or fatal) to select what you care about
- provide -Wstop to stop after parsing a file with warnings you care about
- provide consistent exit status codes for those warnings you care about
- fully document what warnings, errors and fatal errors mean
- remove all other cruft from the user interface, less is more:
- remove all -f knobs along with the whole -f option
- remove the old -Werror because calling warnings "fatal" is silly
- always finish parsing each file, unless fatal errors prevent that
This commit also includes a couple of related simplifications behind
the scenes regarding error handling.
Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and
Sascha Wildner (DragonFly BSD) agree with the general direction.


# 1.64 18-Aug-2010 schwarze

Simplify and sync the code and comments for copying the macro name
in man_pmacro() and mdoc_pmacro(). In particular, no need to use
isgraph(3) here, that has already been done in main.c.
Joint work by Kristaps and myself, ok kristaps@.


Revision tags: OPENBSD_4_8_BASE
# 1.63 07-Aug-2010 schwarze

Groff allows the initial macro on a line to be delimited by a space
or by a tab; so allow the tab in mandoc, too.
Bug found by me, fix by kristaps@, "sure" deraadt@.


# 1.62 16-Jul-2010 schwarze

Text ending in a full stop, exclamation mark or question mark
should not flag the end of a sentence if:

1) The punctuation is followed by closing delimiters
and not preceded by alphanumeric characters, like in
"There is no full stop (.) in this sentence"

or

2) The punctuation is a child of a macro
and not preceded by alphanumeric characters, like in
"There is no full stop
.Pq \&.
in this sentence"

jmc@ and sobrado@ like this


# 1.61 13-Jul-2010 schwarze

Merge release 1.10.4 (all code by kristaps@), providing four new features:
1) Proper .Bk support: allow output line breaks at input line breaks,
but keep input lines together in the output, finally fixing
synopses like aucat(1), mail(1) and tmux(1).
2) Mostly finished -Tps (PostScript) output.
3) Implement -Thtml output for .Nm blocks and .Bk -words.
4) Allow iterative interpolation of user-defined roff(7) strings.
Also contains some minor bugfixes and some performance improvements.


# 1.60 01-Jul-2010 schwarze

In the mdoc(7) parser, inspect roff registers early such that all parts
of the parser can use the resulting cues. In particular, this allows
using .nr nS to force SYNOPSIS-style .Nm indentation outside the
SYNOPSIS as needed by ifconfig(8).

To actually make this useable, .Pp must rewind .Nm, or the rest of the
section would end up indented. Implement a quick hack for now,
a generic solution can be designed later.

ok kristaps@ sobrado@


# 1.59 29-Jun-2010 schwarze

Support for badly nested blocks, written around the time of
the Rostock mandoc hackathon and tested and polished since,
supporting constructs like:

.Ao Bo Ac Bc (exp breaking exp)
.Aq Bo eol Bc (imp breaking exp)
.Ao Bq Ac eol (exp breaking imp)
.Ao Bo So Bc Ac Sc (double break, inner before outer)
.Ao Bo So Ac Bc Sc (double break, outer before inner)
.Ao Bo Ac So Bc Sc (broken breaker)
.Ao Bo So Bc Do Ac Sc Dc (broken double breaker)

There are still two known issues which are tricky:

1) Breaking two identical explicit blocks (Ao Bo Bo Ac or Aq Bo Bo eol)
fails outright, triggering a bogus syntax error.
2) Breaking a block by two identical explicit blocks (Ao Ao Bo Ac Ac Bc
or Ao Ao Bq Ac Ac eol) still has a minor rendering error left:
"<ao1 <ao2 [bo ac2> ac1> bc]>" should not have the final ">".

We can fix these later in the tree, let's not grow this diff too large.

"get it in" kristaps@


# 1.58 27-Jun-2010 schwarze

Full .nr nS support, unbreaking the kernel manuals.

Kristaps coded this from scratch after reading my .nr patch;
it is simpler and more powerful.

Registers live in struct regset in regs.h, struct man and struct mdoc
contain pointers to it. The nS register is cleared when parsing .Sh.
Frontends respect the MDOC_SYNPRETTY flag set in mdoc node_alloc.


# 1.57 26-Jun-2010 schwarze

merge release 1.10.2
* bug fixes:
- interaction of ASCII_HYPH with special chars (found by Ulrich Spoerlein)
- handling of roff conditionals (found by Ulrich Spoerlein)
- .Bd -offset will no longer default to 6n
* maintenance:
- more caching of .Bd and .Bl arguments for efficiency
- deconstify man(7) validation routines
- add FreeBSD library names (provided by Ulrich Spoerlein)
* start PostScript font-switching


# 1.56 06-Jun-2010 schwarze

Merge bsd.lv version 1.10.1 (to be released soon).

The main step forward is that this now has *much* better .Bl -column
support, now supporting many manuals that previously errored out
without producing any output.

Other fixes include:
* do not die from multiple list types, use the first and warn
* in .Bl without a type, default to -item
* various tweaks to .Dt
* fix .In, .Fd, .Ft, .Fn and .Fo formatting
* some documentation fixes and additions
* and fix a couple of bugs reported by Ulrich Spoerlein:
* better support for roff block-end "\}" without a preceding dot
* .In must not break the line outside SYNOPSIS
* spelling in some error messages

While merging, fix one regression in .In spacing
that needs to go to bsd.lv, too.


# 1.55 26-May-2010 schwarze

When a word does not fully fit onto the output line, but it contains
at least one hyphen, we already had support for breaking the line at the
last fitting hyphen. This patch improves this functionality by only
breaking at hyphens in free-form text, and by not breaking at hyphens
* at the beginning or end of a word or
* immediately preceded or followed by another hyphen or
* escaped by a preceding backslash.

Before this patch, differences in break-at-hyphen support were one
of the major sources of noise in automatic comparisons to mdoc(7)
groff output. Now, the remaining differences are hard to find among
the noise coming from other sources.

Where there are still differences, what we do seems to be better than
what groff does, see e.g. the chio(1) exchange and position commands
for one of the now rare examples.

idea and coding by kristaps@

Besides, this was the last substantial code difference left
between bsd.lv and openbsd.org. We are now in full sync.
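
The three exclusions listed in rev. 1.55 can be sketched as a small
predicate. This is only an illustration: the function name
sk_hyphen_breakable is hypothetical, it is not the mandoc
implementation, and it ignores the free-form-text condition as well as
the case of an escaped backslash in front of the hyphen.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the hyphen at word[i] may serve
 * as a line break point according to the three rules above, else 0.
 */
static int
sk_hyphen_breakable(const char *word, size_t i)
{
	size_t len = strlen(word);

	if (i >= len || word[i] != '-')
		return 0;
	/* Not at the beginning or end of a word. */
	if (i == 0 || i + 1 == len)
		return 0;
	/* Not immediately preceded or followed by another hyphen. */
	if (word[i - 1] == '-' || word[i + 1] == '-')
		return 0;
	/* Not escaped by a preceding backslash. */
	if (word[i - 1] == '\\')
		return 0;
	return 1;
}
```

A caller would scan a word that overflows the output line from right to
left and break at the last position that both fits and passes this
check.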


# 1.54 23-May-2010 schwarze

Unified error and warning message system for all of mandoc,
featuring three message levels, as agreed during the mandoc hackathon:
* FATAL parser failure, cannot produce any output from this input file:
eventually, we hope to convert most of these to ERRORs.
* ERROR, meaning mandoc cannot cope fully with the input syntax and will
probably lose information or produce structurally garbled output;
it will try to produce output anyway but exit non-zero at the end,
which is eventually intended to make the ports infrastructure happy.
* WARNING, meaning you should clean up the input file, but output
is probably mostly OK, so this will not cause error-exit at the end.
This commit is mostly just converting the old system to the new one; before
the classification will become really reliable, we must check all messages.

In particular,
* set up a new central message string table in main.c
* drop the old message string tables from man.c and mdoc.c
* get rid of the piece-meal merr enums in libman and libmdoc
* reduce number of error/warning functions from 16 to 6 (still a lot...)

While here, handle a few problems more gracefully:
* allow .Rv and .Ex to work without a prior .Nm
* allow .An to ignore extra arguments
* allow undeclared columns in .Bl -column

Written by kristaps@.


# 1.53 20-May-2010 schwarze

Support nested roff instructions:
* allow roff_parseln() to be re-run
* allow roff_parseln() to manipulate the line buffer offset
* support the offset in the man and mdoc libraries
* adapt .if, .ie, .el, .ig, .am* and .de* support
* interpret some instructions even in conditional-negative context
Coded by kristaps during the last day of the mandoc hackathon.

To avoid regressions in the OpenBSD tree, commit this together
with some small local additions:
* detect roff block end "\}" even on macro lines
* actually implement the ".if n" conditional
* ignore .ds, .rm and .tr in libroff

Also back my old .if/.ie/.el-handling out of libman, reverting:
man.h 1.15 man.c 1.25 man_macro.c 1.15 man_validate.c 1.19
man_action.c 1.15 man_term.c 1.28 man_html.c 1.9.


# 1.52 16-May-2010 schwarze

Rewrite the main mdoc text parser, mdoc_ptext()
to make it easier to understand and to fix various bugs:
* strip white space from the end of MDOC_TEXT elements in literal mode
* in literal mode, a line may be blank even when containing tabs
* escaped backslashes do not escape following characters
ok kristaps@


# 1.51 16-May-2010 schwarze

allow the single quote as a control character in place of the dot
at all relevant places;
from kristaps@


# 1.50 15-May-2010 schwarze

allow non-numeric manual sections in -mdoc;
while here, allow LIBRARY in section 9;
by kristaps@


# 1.49 15-May-2010 schwarze

various improvements regarding errors and warnings from Joerg Sonnenberger:
* If the last -column .Bl isn't specified, it is auto-sized.
* An invalid .St argument should be a warning, not an error.
Just put the argument into the output.
* An invalid .At argument should be a warning, not an error.
Just print the argument, like new groff does.
* Remove warnings concerning manual section (like 1, 6, 8).
It was only used for .Ex and not really useful.
* Remove warnings concerning page section (like SYNOPSIS).
These were only used for .Fd and .Lb and not really useful.


# 1.48 14-May-2010 schwarze

Integrate kristaps@' end-of-sentence (EOS) framework
which is simpler and more powerful than mine, and remove mine.

* man(7) now has EOS handling, too
* put EOS detection into its own function in libmandoc
* use node and termp flags to communicate the EOS condition
* no more EOS pseudo-macro
* no more non-printable EOS marker character on the formatter level

This slightly breaks EOS detection after trailing punctuation
in mdoc(7) macros, but that will be restored soon.


# 1.47 14-May-2010 schwarze

Merge 1.9.25, keeping local patches;
this does not merge kristaps' end-of-sentences handling yet,
i will check that separately. This one includes:
* handle \*(Ba as a delimiter
* introduce ARGS_PEND for .Bl -column .It end-of-line special casing
* section ordering: expect EXIT STATUS at the right place
* line break fixes in SYNOPSIS
* allow literal contexts to have arbitrary line lengths
* the input file column number can not be used to identify the beginning
of a line because white space is allowed after the initial '.'
* proper leading spaces in -man -Tascii mode
* do not let Lb break lines in -mdoc -Thtml LIBRARY


# 1.46 14-May-2010 schwarze

merge 1.9.24, keeping local patches; some changes:
* preserve multiple consecutive space characters in input
* do not restrict .Cd and .Rv to certain sections (requested by Joerg)
* do not run lookup() on quoted words
* enum return types for mdoc_args and mdoc_argv
* fix auto-closing of LINK tag in -Txhtml (from Daniel Friesel)
* various lint and manual fixes


# 1.45 08-May-2010 schwarze

merge bsd.lv rev 1.123:
sync mdoc.c's static function names with man.c


# 1.44 08-May-2010 schwarze

handle text lines beginning with \." as comments, like groff does,
even though this is not correct comment syntax (so warn, too)
reported by Claus Assmann on misc@, fix by kristaps@


# 1.43 04-May-2010 schwarze

end-of-sentence markers at the end of .Fn argument lists
ruin indentation of the next line in the SYNOPSIS section;
bug found by jacekm@ in err(3)


# 1.42 27-Apr-2010 schwarze

Fix a subtle bug noticed by naddy@ in pftop(8), thanks!

When converting blank lines to .Pp outside literal context,
it could happen that the following node ended up as a child
of the .Pp element, but it must always be a sibling.


# 1.41 22-Apr-2010 schwarze

Fix a segfault reported by nicm@, introduced in rev. 1.38.
When finding a blank line, trying to parse it is a bad idea.
Instead, after adding .Pp to the AST, just return from parsetext().


# 1.40 07-Apr-2010 schwarze

Merge the good parts of 1.9.23,
avoid the bad parts of 1.9.23, and keep local patches.

Input in general:
* Basic handling of roff-style font escapes \f, \F.
* Quoted punctuation does not count as punctuation.

mdoc(7) parser:
* Make .Pf callable; noted by Claus Assmann.
* Let .Bd and .Bl ignore unknown arguments; noted by deraadt@.
* Do not warn when .Er is used outside certain sections.
* Replace mdoc_node_free[list] by mdoc_node_delete.
* Replace #define by enum for rew*() return values.

man(7) parser:
* When .TH is missing, use default section and date.

Output in general:
* Curly braces do not count as punctuation.
* No space after .Fl w/o args when a macro follows on the same line.

HTML output:
* Unify PAIR_*_INIT macros, introduce new PAIR_ID_INIT().
* Print whitespace after, not before .Vt .Fn .Ft .Fo.

Checked that all manuals in base still build.


# 1.39 04-Apr-2010 schwarze

When the prologue lacks required information, do not error out,
but warn, set up some default values, and prod on.
Unbreaking the ports build for textproc/sgmlformat;
reported by naddy@, thanks.


# 1.38 03-Apr-2010 schwarze

* outside literal context in mdoc(7), handle blank lines like .Pp
* a missing NAME section in mdoc(7) need not be fatal

ok deraadt@


# 1.37 02-Apr-2010 schwarze

merge 1.9.22, keeping local patches
* convert mdoc tokens from #define to enum
* fix a segfault with .Xo/.Xc in explicit blocks
* Thorn is \*(Th, not \*(TH; noticed by Joerg Sonnenberger


# 1.36 25-Mar-2010 schwarze

fix a stupid out-of-bounds read access introduced in the previous
revision, in the code searching for the end of a sentence


Revision tags: OPENBSD_4_7_BASE
# 1.35 02-Mar-2010 schwarze

Proper inter-sentence spacing for mdoc(7).
When a text line or a non-block macro line in the source code ends
in any of ".!?", consider that an end of sentence (EOS).
This makes Jason's rule "new sentence, new line" even more important.
Let the parser detect the EOS and insert a token into the AST.
Let the -Tascii frontend render the EOS token as a double space before
the next word.
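
The basic end-of-sentence test described in rev. 1.35 amounts to
checking the last character of an input line. The sketch below uses a
hypothetical function name, sk_line_ends_sentence, and covers only that
basic rule; the refinements made in later revisions (closing
delimiters, macro children) are not modelled.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the line ends in one of ".!?",
 * which rev. 1.35 treats as an end of sentence (EOS), else 0.
 */
static int
sk_line_ends_sentence(const char *line)
{
	size_t len = strlen(line);

	if (len == 0)
		return 0;
	return strchr(".!?", line[len - 1]) != NULL;
}
```

Because only the end of the input line is inspected, a sentence ending
in the middle of a line goes undetected, which is why the "new
sentence, new line" rule matters.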


# 1.34 18-Feb-2010 schwarze

sync to release 1.9.15:
* corrected .Vt handling (spotted by Joerg Sonnenberger)
* corrected .Xr argument handling (based on my patch)
* removed \\ escape sequence (because it is for low-level roff only)
* warn about trailing whitespace (suggested by jmc@)
* -Txhtml support
* and some general cleanup and doc improvements


# 1.33 02-Jan-2010 schwarze

complete the sync to 1.9.15-pre2: mostly minor fixes
* bugfix: do not restore TERMP flags when leaving lists, just reset them
* and a few HTML fixes
* clarity: return width from a2width, not width+2, and adapt to it
* manual: document .Bl and .Fl
* portability: no need to escape '%' in macro names


# 1.32 22-Dec-2009 schwarze

sync to 1.9.12, mostly portability and refactoring:

correctness/functionality:
- bugfix: do not die when overstep hits the right margin
- new option: -fign-escape
- and various HTML features

portability:
- replace bzero(3) by memset(3), which is ANSI C
- replace err(3)/warn(3) by perror(3)/exit(3), which is ANSI C
- use argv[0] instead of __progname
- add time.h to various files for FreeBSD compilation

simplicity:
- do not allocate header/footer data dynamically in *_term.c
- provide and use malloc frontends that error out on failure

for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/


# 1.31 27-Oct-2009 schwarze

sync to 1.9.11: adapt printing of dates to groff conventions,
NetBSD portability fixes and some minor bugfixes and feature enhancements;
also checked that my hyphenation code still works on top of this


# 1.30 21-Oct-2009 schwarze

sync to 1.9.9, featuring:
* -Thtml output mode
* roff scaling units
* and some minor fixes
for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/


# 1.29 19-Oct-2009 schwarze

sync to 1.9.6: multiple improvements to references (.Rs)
* validate and order .Rs child nodes
* underline book title (.%B) and issuer (.%I)
* enclose title of article (.%T) in quotes
* avoid calling mdoc_verr directly, use a proper error code instead


# 1.28 19-Oct-2009 schwarze

sync to 1.9.6: u_char lives in <sys/types.h>
noticed by uqs at spoerlein dot net on FreeBSD,
where <stdlib.h> does not include <sys/types.h>


# 1.27 21-Sep-2009 schwarze

sync to 1.9.5: lookup hashes are now static tables
shortening the code, and, according to kristaps@, speeding it up


# 1.26 18-Sep-2009 schwarze

sync to 1.9.2: non-printable characters in macro names are errors;
from joerg at netbsd dot org


# 1.25 22-Aug-2009 schwarze

sync to 1.9.1: set mdoc_next flags in mdoc_*_alloc routines, where they belong


# 1.24 22-Aug-2009 schwarze

sync to 1.9.0: polishing the core code of mdoc macro handling
1) If a macro is not parsed, do not parse it. Of course, without
parsing it, we cannot produce "macro-like parameter" warnings,
but these were useless anyway.
2) If a macro is not callable, do not print a useless warning when
it occurs as a parameter, just display the raw characters.
3) Below .Bl -column, check whether macros are callable.
4) Like groff, allow whitespace after the initial dot on macro lines.


# 1.23 22-Aug-2009 schwarze

sync to 1.8.5: Error string is now file:line:col: message.
Fixed column reporting (off by one).
Use fprintf instead of warnx for parse errors (like cc).


# 1.22 26-Jul-2009 schwarze

sync to 1.8.1: rewrite quoted literal handling correctly,
rewrite TABSEP handling in a simpler way,
and retire ECOLEMPTY, ARGS_QUOTED and ARGS_ARGVLIKE


# 1.21 26-Jul-2009 schwarze

sync to 1.8.1: removed excessively verbose EARGVPARM warning


# 1.20 26-Jul-2009 schwarze

sync to 1.8.1: support .br and .sp


# 1.19 26-Jul-2009 schwarze

sync to 1.8.1: libmdoc now breaks up free-form lines into tokens;
will simplify LITERAL mode in front-end


# 1.18 18-Jul-2009 schwarze

sync to 1.8.0: move mdoc_a2att, mdoc_a2st, and mdoc_a2lib to libmdoc


# 1.17 12-Jul-2009 schwarze

sync to 1.7.23: pass warning code to mdoc_pwarn() instead of warning message
define additional warning macro mdoc_nwarn()
remove obsolete warning functions mdoc_warn(), pwarn(), vwarn(), nwarn()
remove various now unused "enum mdoc_warn" and "enum mwarn"


# 1.16 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_perr() instead of error string
and use the so improved mdoc_nerr() at many places;
get rid of now unused static functions perr()


# 1.15 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_nerr() instead of error string
and use the so improved mdoc_nerr() at many places


# 1.14 12-Jul-2009 schwarze

sync to 1.7.23: unify the various "enum merr" into libman.h and libmdoc.h,
use it as a new argument to mdoc_err(), the same way as for man_err(),
and use string tables instead of switch statements to select error messages


# 1.13 12-Jul-2009 schwarze

sync to 1.7.23: third step to get rid of enum mdoc_warn:
mdoc_verr is not using enum mdoc_warn, so use it at a few more places


# 1.12 12-Jul-2009 schwarze

sync to 1.7.23: second step to get rid of enum mdoc_warn:
remove type from mdoc_vwarn arguments, and use this function where appropriate


# 1.11 12-Jul-2009 schwarze

sync to 1.7.23: first step to get rid of enum mdoc_warn:
unify manwarn() and mdocwarn() into mwarn()


Revision tags: OPENBSD_4_6_BASE
# 1.10 23-Jun-2009 schwarze

sync to 1.7.20: like for the -man case, add an nchild counter to the -mdoc
nodes, simplifying the validation code; no functional change


# 1.9 19-Jun-2009 schwarze

sync to 1.7.19: more elegant section handling


# 1.8 18-Jun-2009 schwarze

sync to 1.7.19: comment cleanup; no functional change


# 1.7 18-Jun-2009 schwarze

sync to 1.7.19: improved comment handling


# 1.6 18-Jun-2009 schwarze

complete sync to 1.7.17: garbage collect unused functions
mdoc_msg, mdoc_pmsg, mdoc_vmsg, and mdoc_nwarn


# 1.5 15-Jun-2009 schwarze

bring back miod@'s "real functions" patch (rev. 1.2)
which was erroneously backed out in rev. 1.4, sorry;
ok kristaps@


# 1.4 15-Jun-2009 schwarze

sync to 1.7.16:
reduce code duplication in warning and error reporting functions
while here, garbage collect three unused function prototypes


# 1.3 14-Jun-2009 schwarze

sync to 1.7.16: comments, whitespace and spelling fixes; no functional change


# 1.2 15-Apr-2009 miod

Replace variadic macros with real functions, so that this compiles on
platforms still using gcc 2.
ok deraadt@


# 1.1 06-Apr-2009 kristaps

Initial check-in of mandoc for formatting manuals. ok deraadt@


# 1.160 14-Dec-2018 schwarze

Almost mechanical diff to remove the "struct mparse *" argument
from mandoc_msg(), where it is no longer used.
While here, rename mandoc_vmsg() to mandoc_msg() and retire the
old version: There is really no point in having another function
merely to save "%s" in a few places.
Minus 140 lines of code.


# 1.159 04-Dec-2018 schwarze

Clean up the validation of .Pp, .PP, .sp, and .br. Make sure all
combinations are handled, and are handled in a systematic manner.
This resolves some erratic duplicate handling, handles a number of
missing cases, and improves diagnostics in various respects.

Move validation of .br and .sp to the roff validation module
rather than doing that twice in the mdoc and man validation modules.
Move the node relinking function to the roff library where it belongs.

In validation functions, only look at the node itself, at previous
nodes, and at descendants, not at following nodes or ancestors,
such that only nodes are inspected which are already validated.


Revision tags: OPENBSD_6_4_BASE
# 1.158 17-Aug-2018 schwarze

Remove more pointer arithmetic passing via regions outside the array
that is undefined according to the C standard. Robert Elz <kre at
munnari dot oz dot au> pointed out i wasn't quite done yet.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.157 11-Aug-2017 schwarze

Make the "new sentence, new line" check stricter, allowing digits
in the last two letters of the last word of the sentence.
No false positives in base or Xenocara.
Suggested by and OK jmc@.


# 1.156 17-Jun-2017 schwarze

correct handling of blank lines after \c


# 1.155 07-Jun-2017 schwarze

Also catch "new sentence, new line" if there are three blanks
between the sentences. Thomas Klausner says he has seen some
of these, and i don't see any false positives.


# 1.154 07-Jun-2017 schwarze

Make "new sentence, new line" detection stricter:
Also catch cases where the new sentence starts with a one-letter word
and the input line is broken right after that word.
Suggested by Thomas Klausner <wiz @ NetBSD>.

It's merely a three-bit diff, changing one byte from 0x34 to 0x33,
so what can possibly go wrong...


# 1.153 05-May-2017 schwarze

Move .sp to the roff modules. Enough infrastructure is in place
now that this actually saves code: -70 LOC.


# 1.152 29-Apr-2017 schwarze

Parser unification: use nice ohashes for all three request and macro tables;
no functional change, minus two source files, minus 200 lines of code.


# 1.151 24-Apr-2017 schwarze

Continue parser unification:
* Make enum rofft an internal interface as enum roff_tok in "roff.h".
* Represent mdoc and man macros in enum roff_tok.
* Make TOKEN_NONE a proper enum value and use it throughout.
* Put the prologue macros first in the macro tables.
* Unify mdoc_macroname[] and man_macroname[] into roff_name[].


Revision tags: OPENBSD_6_1_BASE
# 1.150 03-Mar-2017 schwarze

remove a few redundant conditions that jsg@ found with cppcheck


# 1.149 16-Feb-2017 schwarze

Remove the ENDBODY_NOSPACE flag, simplifying the code.

Comparing to groff output, it appears that all cases where it was used
and made a difference actually require the opposite, ENDBODY_SPACE.

I have no idea why i added it back in 2010; maybe to compensate for
some other bug that has long been fixed.


# 1.148 28-Jan-2017 schwarze

Add a warning "new sentence, new line".
This does not attempt to pinpoint each and every offender, but
instead tries very hard to avoid false positives: Currently, there
are only two false positives in the whole OpenBSD base system.
Only do this in mdoc(7), not in man(7), because manuals written
in man(7) typically have much worse problems than this.
OK jmc@ on a previous version of the patch


# 1.147 10-Jan-2017 schwarze

unify names of AST node flags; no change of cpp output


# 1.146 20-Aug-2016 schwarze

If a column list starts with implicit rows (that is, rows without .It)
and roff-level nodes (e.g. tbl or eqn) follow, don't run into an
assertion. Instead, wrap the roff-level nodes in their own row.
Issue found by tb@ with afl(1).


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.145 30-Oct-2015 schwarze

If a .Bd block has no arguments at all, drop the block and only keep
its contents. Removing a gratuitous difference to groff output
found after a related bug report from krw@.


# 1.144 20-Oct-2015 schwarze

In order to become able to generate syntax tree nodes on the roff(7)
level, validation must be separated from parsing and rewinding.
This first big step moves calling of the mdoc(7) post_*() functions
out of the parser loop into their own mdoc_validate() pass, while
using a new mdoc_state() module to make syntax tree state handling
available to both the parser loop and the validation pass.


# 1.143 12-Oct-2015 schwarze

To make the code more readable, delete 283 /* FALLTHROUGH */ comments
that were right between two adjacent case statements. Keep only
those 24 where the first case actually executes some code before
falling through to the next case.


# 1.142 06-Oct-2015 schwarze

modernize style: "return" is not a function; ok cmp(1)


Revision tags: OPENBSD_5_8_BASE
# 1.141 23-Apr-2015 schwarze

Unify mdoc_deroff() and man_deroff() into a common function deroff().
No functional change except that for mdoc(7), it now skips leading
escape sequences just like it already did for man(7).
Escape sequences rarely occur in mdoc(7) code and if they do,
skipping them is an improvement in this context.
Minus 30 lines of code.


# 1.140 23-Apr-2015 schwarze

Get rid of two empty wrapper functions. No functional change.


# 1.139 19-Apr-2015 schwarze

Unify trickier node handling functions.
* man_elem_alloc() -> roff_elem_alloc()
* man_block_alloc() -> roff_block_alloc()
The functions mdoc_elem_alloc() and mdoc_block_alloc() remain for
now because they need to do mdoc(7)-specific argument processing.


# 1.138 19-Apr-2015 schwarze

Unify some node handling functions that use TOKEN_NONE.
* mdoc_word_alloc(), man_word_alloc() -> roff_word_alloc()
* mdoc_word_append(), man_word_append() -> roff_word_append()
* mdoc_addspan(), man_addspan() -> roff_addtbl()
* mdoc_addeqn(), man_addeqn() -> roff_addeqn()
Minus 50 lines of code, no functional change.


# 1.137 19-Apr-2015 schwarze

Decouple the token code for "no request or macro" from the individual
high-level parsers to allow further unification of functions that
only need to recognize this code, but that don't care about different
high-level macrosets beyond that.


# 1.136 19-Apr-2015 schwarze

Unify node handling functions:
* node_alloc() for mdoc and man_node_alloc() -> roff_node_alloc()
* node_append() for mdoc and man_node_append() -> roff_node_append()
* mdoc_head_alloc() and man_head_alloc() -> roff_head_alloc()
* mdoc_body_alloc() and man_body_alloc() -> roff_body_alloc()
* mdoc_node_unlink() and man_node_unlink() -> roff_node_unlink()
* mdoc_node_free() and man_node_free() -> roff_node_free()
* mdoc_node_delete() and man_node_delete() -> roff_node_delete()
Minus 130 lines of code, no functional change.


# 1.135 18-Apr-2015 schwarze

Delete the wrapper functions mdoc_meta(), man_meta(), mdoc_node(),
man_node() from the mandoc(3) semi-public interface and the internal
wrapper functions print_mdoc() and print_man() from the HTML formatters.
Minus 60 lines of code, no functional change.


# 1.134 18-Apr-2015 schwarze

Unify {mdoc,man}_{alloc,reset,free}() into roff_man_{alloc,reset,free}().
Minus 80 lines of code, no functional change.
Written on the train from Koeln to Wolfsburg returning from p2k15.


# 1.133 18-Apr-2015 schwarze

Move mdoc_hash_init() and man_hash_init() to libmandoc.h
and call them from mparse_alloc() and choose_parser(),
preparing unified allocation of struct roff_man.


# 1.132 18-Apr-2015 schwarze

Profit from the unified struct roff_man and reduce the number of
arguments of mparse_result() by one. No functional change.
Written on the ICE Bruxelles-Koeln on the way back from p2k15.


# 1.131 18-Apr-2015 schwarze

Replace the structs mdoc and man by a unified struct roff_man.
Almost completely mechanical, no functional change.
Written on the train from Exeter to London returning from p2k15.


# 1.130 02-Apr-2015 schwarze

Third step towards parser unification:
Replace struct mdoc_meta and struct man_meta by a unified struct roff_meta.
Written on the train from London to Exeter on the way to p2k15.


# 1.129 02-Apr-2015 schwarze

Second step towards parser unification:
Replace struct mdoc_node and struct man_node by a unified struct roff_node.
To be able to use the tok member for both mdoc(7) and man(7) without
defining all the macros in roff.h, sacrifice a tiny bit of type safety
and make tok an int rather than an enum.
Almost mechanical, no functional change.
Written on the Eurostar from Bruxelles to London on the way to p2k15.


# 1.128 02-Apr-2015 schwarze

First step towards parser unification:
Replace enum mdoc_type and enum man_type by a unified enum roff_type.
Almost mechanical, no functional change.
Written on the ICE train from Frankfurt to Bruxelles on the way to p2k15.


Revision tags: OPENBSD_5_7_BASE
# 1.127 12-Feb-2015 schwarze

Do not confuse .Bl -column lists that just broke another block
with newly opened .Bl -column lists;
fixing an assertion failure jsg@ found with afl:
test case #481, Bl It Bl -column It Bd El text text El


# 1.126 12-Feb-2015 schwarze

Delete the mdoc_node.pending pointer and the function calculating
it, make_pending(), which was the most difficult function of the
whole mdoc(7) parser. After almost five years of maintaining this
hellhole, i just noticed the pointer isn't needed after all.

Blocks are always rewound in the reverse order they were opened;
that even holds for broken blocks. Consequently, it is sufficient
to just mark broken blocks with the flag MDOC_BROKEN and breaking
blocks with the flag MDOC_ENDED. When rewinding, instead of iterating
the pending pointers, just iterate from each broken block to its
parents, rewinding all that are MDOC_ENDED and stopping after
processing the first ancestor that is not MDOC_BROKEN. For ENDBODY
markers, use the mdoc_node.body pointer in place of the former
mdoc_node.pending.

This also fixes an assertion failure found by jsg@ with afl,
test case #467 (Bo Bl It Bd Bc It), where (surprise surprise)
the pending pointer got corrupted.

Improved functionality, minus one function, minus one struct field,
minus 50 lines of code.
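
The rewinding walk described above can be sketched roughly as follows.
This is a minimal illustration that follows the wording of this log
entry only; the struct layout, the SK_* flag names and the function
sk_rewind_from() are hypothetical and are not taken from mdoc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the flags named in the log entry. */
#define SK_BROKEN (1 << 0)	/* counterpart of MDOC_BROKEN */
#define SK_ENDED  (1 << 1)	/* counterpart of MDOC_ENDED */
#define SK_CLOSED (1 << 2)	/* set by this sketch when a block is rewound */

/* Hypothetical node: only the fields the walk needs. */
struct sk_node {
	struct sk_node	*parent;
	int		 flags;
};

/*
 * Starting from a broken block, visit its ancestors in order.
 * Rewind (here: mark as closed) every ancestor that has ended,
 * and stop after processing the first ancestor that is not broken.
 * Returns the number of ancestors that were rewound.
 */
static int
sk_rewind_from(struct sk_node *n)
{
	struct sk_node	*p;
	int		 closed = 0;

	assert(n != NULL);
	if ((n->flags & SK_BROKEN) == 0)
		return 0;	/* nothing is pending behind this block */

	for (p = n->parent; p != NULL; p = p->parent) {
		if (p->flags & SK_ENDED) {
			p->flags |= SK_CLOSED;
			closed++;
		}
		if ((p->flags & SK_BROKEN) == 0)
			break;	/* first non-broken ancestor: done */
	}
	return closed;
}
```

Because only the parent pointers are followed, no separate per-node
pending pointer has to be kept consistent while the tree changes.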


# 1.125 05-Feb-2015 schwarze

Simplify by deleting the "lastline" member of struct mdoc_node.
Minus one struct member, minus 17 lines of code, no functional change.


# 1.124 02-Feb-2015 schwarze

Get rid of all calls to rew_sub() in blk_exp_close(); only ten calls
remain in other functions. As a bonus, this fixes an assertion failure
jsg@ found some time ago with afl (test case 982) and improves minor
details in error reporting.


# 1.123 15-Jan-2015 schwarze

Fatal errors no longer exist.
If a file can be opened, mandoc will produce some output;
at worst, the output may be almost empty.
Simplifies error handling and frees a message type for future use.


# 1.122 28-Nov-2014 schwarze

Simplify by making the eqn and tbl steering functions void;
no functional change, minus 15 lines of code.


# 1.121 28-Nov-2014 schwarze

Simplify by making the mdoc parser callbacks void, and some cleanup;
no functional change, minus 50 lines of code.


# 1.120 28-Nov-2014 schwarze

Simplify the code by making various mdoc parser helper functions void.
No functional change, minus 130 lines of code.


# 1.119 28-Nov-2014 schwarze

Simplify code by making mdoc validation handlers void.
No functional change, minus 90 lines of code.


# 1.118 19-Nov-2014 schwarze

Escape sequences terminate high-level macro names, and when doing so,
they are ignored, just in the same way as for request names
and for low-level macro names.
This also cures a warning in the pod2man(1) preamble.


# 1.117 20-Oct-2014 schwarze

correct the spacing after in-line equations
that start at the beginning of an input line
but end before the end of an input line


# 1.116 20-Oct-2014 schwarze

correct spacing before inline equations


# 1.115 16-Oct-2014 schwarze

Implement in-line equations, much needed by Xenocara manuals.
Put the steering into the roff parser rather than into the mdoc
parser such that it works for all macro languages and on both text
and macro lines.
Line breaks and blank characters generated before and after in-line
equations are not perfect yet, but let's do one thing at a time.


# 1.114 06-Sep-2014 schwarze

Simplify by handling empty request lines at the one logical place
in the roff parser instead of in three other places in other parsers.
No functional change.


# 1.113 08-Aug-2014 schwarze

Bring the handling of defective prologues even closer to groff,
in particular relaxing the distinction between prologue and body
and further improving messages.
* The last .Dd wins and the last .Os wins, even in the body.
* The last .Dt before the first body macro wins.
* Missing title in .Dt defaults to UNTITLED. Warn about it.
* Missing section in .Dt does not default to 1. But warn about it.
* Do not warn multiple times about the same mdoc(7) prologue macro.
* Warn about missing .Os.
* Incomplete .TH defaults to empty strings. Warn about it.


# 1.112 08-Aug-2014 schwarze

mention requests and macros in more messages


# 1.111 08-Aug-2014 schwarze

Simplify: replace one global flag by one local variable
and remove three unused global flags. No functional change.


Revision tags: OPENBSD_5_6_BASE
# 1.110 09-Jul-2014 schwarze

mark defos as const; nobody needs to change it,
and it is occasionally useful to be able to pass literal strings


# 1.109 07-Jul-2014 schwarze

no need to skip content before first section header


# 1.108 06-Jul-2014 schwarze

Clean up messages related to plain text and to escape sequences.
* Mention invalid escape sequences and string names, and fallbacks.
* Hierarchical naming.


# 1.107 02-Jul-2014 schwarze

Implement the obsolete macros .En .Es .Fr .Ot for backward compatibility,
since this is hardly more complicated than explicitly ignoring them
as we did in the past. Of course, do not use them!


# 1.106 01-Jul-2014 schwarze

Clean up the warnings related to document structure.
* Hierarchical naming of the related enum mandocerr items.
* Mention the offending macro, section title, or string.
While here, improve some wordings:
* Descriptive instead of imperative style.
* Uniform style for "missing" and "skipping".
* Where applicable, mention the fallback used.


# 1.105 20-Jun-2014 schwarze

Start systematic improvements of error reporting.
So far, this covers all WARNINGs related to the prologue.

1) hierarchical naming of MANDOCERR_* constants
2) mention the macro name in messages where that adds clarity
3) add one missing MANDOCERR_DATE_MISSING msg
4) fix the wording of one message related to the man(7) prologue

Started on the plane back from Ottawa.


# 1.104 25-Apr-2014 schwarze

Fix a minor optimization i broke in bsd.lv rev. 1.163 on August 20, 2010:
Do not bother looking into the hash table when the length of the macro
already tells us it's invalid. No functional change.
Noticed by jsg@, thanks!
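
The kind of shortcut meant here might look like the following sketch.
It assumes that valid macro names are two or three characters long;
the name table, the linear search standing in for the hash table, and
all sk_* identifiers are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_TOK_NONE (-1)	/* hypothetical "no such macro" value */

/* A tiny hypothetical name table; the index serves as the token. */
static const char *const sk_names[] = { "Dd", "Dt", "Os", "Sh", "Pp" };

/* Stand-in for the real hash lookup: a plain linear search. */
static int
sk_table_find(const char *name)
{
	size_t	 i;

	for (i = 0; i < sizeof(sk_names) / sizeof(sk_names[0]); i++)
		if (strcmp(sk_names[i], name) == 0)
			return (int)i;
	return SK_TOK_NONE;
}

/*
 * Reject names of impossible length before consulting the table at all.
 */
static int
sk_lookup(const char *name)
{
	size_t	 len;

	assert(name != NULL);
	len = strlen(name);
	if (len < 2 || len > 3)
		return SK_TOK_NONE;	/* too short or too long: skip lookup */
	return sk_table_find(name);
}
```

The result is the same with or without the length test, which is why
such a change can be described as having no functional effect.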


# 1.103 20-Apr-2014 schwarze

KNF: case (FOO): -> case FOO, remove /* LINTED */ and /* ARGSUSED */,
remove trailing whitespace and blanks before tabs, improve some indenting;
no functional change


# 1.102 30-Mar-2014 schwarze

Implement the roff(7) .ll (line length) request.
Found by naddy@ in the textproc/enchant(1) port.
Of course, do not use this in new manuals.


# 1.101 23-Mar-2014 schwarze

If an .Nd block contains macros, avoid fragmented entries in mandocdb(8),
instead use the .Nd content recursively.
Improves a couple of index entries in base.


# 1.100 21-Mar-2014 schwarze

avoid repetitive code for asprintf error handling


# 1.99 21-Mar-2014 schwarze

The files mandoc.c and mandoc.h contained both specialised low-level
functions used for multiple languages (mdoc, man, roff), for example
mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
functions. Split the auxiliaries out into their own file and header.
While here, do some #include cleanup.


Revision tags: OPENBSD_5_5_BASE
# 1.98 05-Jan-2014 schwarze

Add an option -Q (quick) to mandocdb(8)
for accelerated generation of reduced-size databases.

Implement this by allowing the parsers to optionally
abort the parse sequence after the NAME section.

While here, garbage collect the unused void *arg attribute
of struct mparse and mparse_alloc().

This reduces the processing time of mandocdb(8) on /usr/share/man
by a factor of 2 and the database size by a factor of 4.
However, it still takes 5 times the time and 6 times the space
of makewhatis(8), so more work is clearly needed.


# 1.97 30-Dec-2013 schwarze

Simplify: Remove an unused argument from the mandoc_eos() function.
No functional change.


# 1.96 24-Dec-2013 schwarze

When deciding whether two consecutive macros are on the same input line,
we have to compare the line where the first one *ends* (not where it begins)
to the line where the second one starts.
This fixes the bug that .Bk allowed output line breaks right after block
macros spanning more than one input line, even when the next macro follows
on the same line.
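
As a rough sketch of the corrected comparison, assuming each node
records both the input line where it starts and the one where it ends
(struct sk_span and its field names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the input lines a macro covers. */
struct sk_span {
	int	 first_line;	/* line where the macro starts */
	int	 last_line;	/* line where the macro ends */
};

/*
 * Two consecutive macros share an input line if the second one
 * starts on the line where the first one ends.
 */
static int
sk_same_line(const struct sk_span *prev, const struct sk_span *next)
{
	assert(prev != NULL && next != NULL);
	return prev->last_line == next->first_line;
}
```

Comparing prev->first_line instead would give the wrong answer for any
block that spans more than one input line.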


# 1.95 21-Oct-2013 schwarze

There are three kinds of input lines: text lines, macros taking
positional arguments (like Dt Fn Xr) and macros taking text as
arguments (like Nd Sh Em %T An). In the past, even the latter put
each word of their arguments into its own MDOC_TEXT node; instead,
concatenate arguments unless delimiters, keeps or spacing mode
prevent that. Regarding mandoc(1), this is internal refactoring,
no output change intended.

Once we will switch mandocdb(8) from DB to SQLite in the future,
this is going to be required to support search expressions crossing
word boundaries, and it will reduce both database sizes and build
times by a bit more than 5% each.


# 1.94 03-Oct-2013 schwarze

Support setting arbitrary roff(7) number registers,
preserving read support for the ".nr nS" SYNOPSIS state register;
read support for arbitrary registers is still not available.

Inspired by NetBSD roff.c rev. 1.18 (Christos Zoulas, March 21, 2013),
but implemented differently. I don't want to have yet another different
implementation of a hash table in mandoc - it would be the second one
in roff.c alone and the fifth one in mandoc grand total.
Instead, i designed and implemented roff_setreg() and roff_getreg()
to be similar to roff_setstrn() and roff_getstrn().

Once we feel the need to optimize, we can introduce one common
hash table implementation for everything in mandoc.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.93 17-Nov-2012 schwarze

Cleanup naming of local variables to make the code easier on the eye:
Settle for "struct man *man", "struct mdoc *mdoc", "struct meta *meta"
and avoid the confusing "*m" which was sometimes this, sometimes that.
No functional change.

ok kristaps@ some time ago


# 1.92 16-Nov-2012 schwarze

Fix a crash triggered by .Bl -tag .It Xo .El .Sh found by florian@.

* When allocating a body end marker, copy the pointer to the normalized
block information from the body block, avoiding the risk of subsequent
null pointer dereference.
* When inserting the body end marker into the syntax tree, do not try to
copy that pointer from the parent block, because not being a direct child
of the block it belongs to is the whole point of a body end marker.
* Even non-callable blocks (like Bd and Bl) can break other blocks;
when this happens, postpone closing them out in the usual way.


Revision tags: OPENBSD_5_2_BASE
# 1.91 18-Jul-2012 schwarze

Fix handling of paragraph macros inside lists:
* When they are trailing the last item, move them outside the list.
* When they are trailing any other non-compact item, drop them.

Improves formatting of 40 pages, e.g. grep(1), ksh(1), netstat(1),
ath(4), bsd.port.mk(5), pf.conf(5), mount(8), crypto(9).


# 1.90 18-Jul-2012 schwarze

The mdoc(7) \*(Ba predefined string actually forces roman font;
that's stupid because it may break enclosing font changes,
but let's do the same for groff bug compatibility.

--> Never use \*(Ba, use just plain "|"! <--

Also, predefined strings are already expanded by the roff(7) parser,
so the mdoc(7) parser has to look for the expanded string.

Formatting improvements in ksh(1), less(1), atan2(3),
hostapd.conf(5), snmpd.conf(5), and mknod(8).


# 1.89 16-Jul-2012 schwarze

Several -mdoc parser improvements related to vertical spacing:
* So far, .Pp and .Lp were removed before paragraph type blocks.
* Now also remove .br before paragraph type blocks.
* Treat .Lp as a paragraph like .Pp, so remove .Pp, .Lp, .br before it.
* Do not treat .sp as a paragraph, don't remove anything before it.
* After .Sh, .Ss, .Pp, and .Lp, remove .Pp, .Lp, .sp, .br, and blank lines.
* After .sp and .br, remove .br.


# 1.88 07-Jul-2012 schwarze

Support the .cc request; code by kristaps@, tests by me.
Needed for sqlite3(1) as reported by espie@.


# 1.87 24-May-2012 schwarze

Support -Ios='OpenBSD 5.1' to override uname(3) as the source of the
default value for the mdoc(7) .Os macro.
Needed for man.cgi on the OpenBSD website.

Problem with man.cgi first noticed by deraadt@;
beck@ and deraadt@ agree with the way to solve the issue.


Revision tags: OPENBSD_5_1_BASE
# 1.86 30-Sep-2011 schwarze

implement .Ap .Bd .Bo .Bq .D1 .Ic .Lp .Oo .Pf .Po .Ss .Sx .Sy .br .sp
implement .Bl -bullet
add more information to the .TH line
escape dots at the beginnings of lines
add trailing newline character at the end of the file
do not misinterpret the ROOT block as .Ap


# 1.85 18-Sep-2011 schwarze

sync to version 1.11.7 from kristaps@
main new feature: support the roff(7) .tr request
plus various bugfixes and some refactoring

regressions are so minor that it's better to get this in
and fix them in the tree


# 1.84 18-Sep-2011 schwarze

sync to version 1.11.5:
adding an implementation of the eqn(7) language
by kristaps@

So far, only .EQ/.EN blocks are handled, in-line equations are not, and
rendering is not yet very pretty, but the parser is fairly complete.


Revision tags: OPENBSD_5_0_BASE
# 1.83 24-Apr-2011 schwarze

Merge version 1.11.1:
Again lots of cleanup and maintenance work by kristaps@.
- simplify error reporting: less function pointers, more mandoc_[v]msg
- main: split document parsing out of main.c into read.c
- roff, mdoc, man: improved recognition of control characters
- roff: better handling of if/else stack overflows
- roff: add some predefined strings for backward compatibility
- mdoc, man: empty sections are not errors
- mdoc: move delimiter handling to libmdoc
- some header restructuring and some minor features and fixes
This merge causes two minor regressions
that i will fix in separate commits right afterwards.


# 1.82 21-Apr-2011 schwarze

Merge version 1.10.10:
lots of cleanup and maintenance work by kristaps@.
- move some main.c globals into struct curparse
- move mandoc_*alloc to mandoc.h such that all code can use them
- make mandoc_isdelim available to formatting frontends
- dissolve mdoc_strings.c, move the code where it is used
- make all error reporting functions void, their return values were useless
- and various minor cleanups and fixes


# 1.81 20-Mar-2011 schwarze

Import the foundation for eqn(7) support.
Written by kristaps@.

For now, i'm adding one line to each of the four frontends
to just pass the input text through to the output,
not yet interpreting any of the eqn keywords.


# 1.80 07-Mar-2011 schwarze

Clean up date handling,
as a first step to get rid of the frequent petty warnings in this area:
- always store dates as strings, not as seconds since the Epoch
- for input, try the three most common formats everywhere
- for unrecognized format, just pass the date through verbatim
- when there is no date at all, still use the current date
Originally triggered by a one-line patch from Tim van der Molen,
<tbvdm at xs4all dot nl>, which is included here.
Feedback and OK on manual parts from jmc@.
"please check this in" kristaps@


Revision tags: OPENBSD_4_9_BASE
# 1.79 10-Feb-2011 schwarze

Tbl code maintenance by kristaps@.
- Remember the line-number of a tbl_span, and use it in messages.
- Put *_span_alloc() functions right into the *_addspan() ones,
since these are the only places they are called from.


# 1.78 09-Jan-2011 schwarze

Make sure coding errors cannot make us miss fatal parsing errors
by assert(3)ing valid parser state in the main parsing functions;
from kristaps@.


# 1.77 04-Jan-2011 schwarze

Merge kristaps@' cleaner tbl integration, removing mine;
there are still a few bugs, but fixing these will be easier in tree.


# 1.76 01-Jan-2011 schwarze

Clean up {mdoc,man}_{p,v}msg invocations:
Ignore the return values, they are constant anyway.
From kristaps@.


# 1.75 29-Dec-2010 schwarze

Reorg by Kristaps: In libmdoc, replace the union of pointers to structs
of macro-specific data by a pointer to a union of structs, which makes the
code simpler and more robust at the expense of a small memory overhead.
Merging was somewhat difficult because we mustn't break tbl(1) support
which the bsd.lv version does not yet have.


# 1.74 26-Dec-2010 schwarze

Behave more like groff (both old and new): Specifying both .%T and .%J in
an .Rs block causes the title to be quoted instead of underlined, such
that journal title and article title appear visually different.
Original diff from kristaps@, simplified by me, tweaked again by kristaps@.


# 1.73 21-Dec-2010 schwarze

Migrate .An to use a pointer to its data, like everybody else.
In preparation for a simpler ref-counted system for node data.
From kristaps@.


# 1.72 21-Dec-2010 schwarze

Vertical spacing improvements from kristaps@, small tweaks by me:
Add a "last child" member to struct mdoc_node.
Remove .Pp or .Lp if it is the first or last child of an .Sh or .Ss body.
Thus, no need to do the same in the front-ends any longer.
Tolerate some cases of .Pp inside .Bl.


# 1.71 02-Dec-2010 schwarze

Properly initialize the manual section to a default when .Dt is missing.
Without this, we died on an assertion.
Problem noted and patch provided by kristaps@.


# 1.70 01-Dec-2010 schwarze

Merge mdoc_action.c into mdoc_validate.c, because having two places to do
basically the same things just causes code duplication and confusion.
Work by kristaps@, including a few bugfixes he found during the merge,
and reapplying OpenBSD changes on top.


# 1.69 28-Nov-2010 schwarze

To avoid FATAL errors, we have been parsing and ignoring the roff
requests .am, .ami, .am1, .dei, and .rm for a long time.
Since ignoring them can (rarely) cause information loss and serious
misformatting, throw an ERROR: NOT IMPLEMENTED when finding them.
Implementing them would not be too difficult, but they are so rare
in practice that i can find better use for my time right now.

In this context,
- Put the string "NOT IMPLEMENTED" into two other error messages
as well, to distinguish them from those caused by broken input.
- Print the string "unknown macro" once, not twice in the error message
associated with MANDOCERR_MACRO, and begin printing the buffer at the
point where the unknown macro really is, not at the start of line.


# 1.68 16-Oct-2010 schwarze

Do not abort() on tbl errors, reduce the risk that tbl stuff kills a build,
and provide more useful tbl error messages in a non-intrusive way.


# 1.67 16-Oct-2010 schwarze

Support tbl(1) code embedded into mdoc(7) input files.
Very similar to what i have done in man(7) yesterday.
Allows to build cpu(4) on HPPA, wi(4), and phantasia(6).
Now we are able to build all tbl code in base.


# 1.66 27-Sep-2010 schwarze

Merge the last bits of 1.10.6 (released today), most were already in:
* ignore double-.Pp
* ignore .Pp before .Bd and .Bl (unless -compact is specified)
* avoid double blank line upon .Pp, .br and friends in literal context
* cast enums to int when passing them to exit(3) to please lint(1)
While merging, fix a regression introduced by kristaps@:
Outside literal mode, double blank lines must both be printed.
To achieve this again after kristaps@ improvements in 1.10.6,
treat such blank lines as .sp (instead of .Pp as in 1.10.5)
and drop .Pp before .sp just like dropping .Pp before .Pp.


# 1.65 20-Aug-2010 schwarze

Implement a simple, consistent user interface for error handling.
We now have sufficient practical experience to know what we want,
so this is intended to be final:
- provide -Wlevel (warning, error or fatal) to select what you care about
- provide -Wstop to stop after parsing a file with warnings you care about
- provide consistent exit status codes for those warnings you care about
- fully document what warnings, errors and fatal errors mean
- remove all other cruft from the user interface, less is more:
- remove all -f knobs along with the whole -f option
- remove the old -Werror because calling warnings "fatal" is silly
- always finish parsing each file, unless fatal errors prevent that
This commit also includes a couple of related simplifications behind
the scenes regarding error handling.
Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and
Sascha Wildner (DragonFly BSD) agree with the general direction.


# 1.64 18-Aug-2010 schwarze

Simplify and sync the code and comments for copying the macro name
in man_pmacro() and mdoc_pmacro(). In particular, no need to use
isgraph(3) here, that has already been done in main.c.
Joint work by Kristaps and myself, ok kristaps@.


Revision tags: OPENBSD_4_8_BASE
# 1.63 07-Aug-2010 schwarze

Groff allows the initial macro on a line to be delimited by a space
or by a tab; so allow the tab in mandoc, too.
Bug found by me, fix by kristaps@, "sure" deraadt@.


# 1.62 16-Jul-2010 schwarze

Text ending in a full stop, exclamation mark or question mark
should not flag the end of a sentence if:

1) The punctuation is followed by closing delimiters
and not preceded by alphanumeric characters, like in
"There is no full stop (.) in this sentence"

or

2) The punctuation is a child of a macro
and not preceded by alphanumeric characters, like in
"There is no full stop
.Pq \&.
in this sentence"

jmc@ and sobrado@ like this


# 1.61 13-Jul-2010 schwarze

Merge release 1.10.4 (all code by kristaps@), providing four new features:
1) Proper .Bk support: allow output line breaks at input line breaks,
but keep input lines together in the output, finally fixing
synopses like aucat(1), mail(1) and tmux(1).
2) Mostly finished -Tps (PostScript) output.
3) Implement -Thtml output for .Nm blocks and .Bk -words.
4) Allow iterative interpolation of user-defined roff(7) strings.
Also contains some minor bugfixes and some performance improvements.


# 1.60 01-Jul-2010 schwarze

In the mdoc(7) parser, inspect roff registers early such that all parts
of the parser can use the resulting cues. In particular, this allows
to use .nr nS to force SYNOPSIS-style .Nm indentation outside the
SYNOPSIS as needed by ifconfig(8).

To actually make this useable, .Pp must rewind .Nm, or the rest of the
section would end up indented. Implement a quick hack for now,
a generic solution can be designed later.

ok kristaps@ sobrado@


# 1.59 29-Jun-2010 schwarze

Support for badly nested blocks, written around the time of
the Rostock mandoc hackathon and tested and polished since,
supporting constructs like:

.Ao Bo Ac Bc (exp breaking exp)
.Aq Bo eol Bc (imp breaking exp)
.Ao Bq Ac eol (exp breaking imp)
.Ao Bo So Bc Ac Sc (double break, inner before outer)
.Ao Bo So Ac Bc Sc (double break, outer before inner)
.Ao Bo Ac So Bc Sc (broken breaker)
.Ao Bo So Bc Do Ac Sc Dc (broken double breaker)

There are still two known issues which are tricky:

1) Breaking two identical explicit blocks (Ao Bo Bo Ac or Aq Bo Bo eol)
fails outright, triggering a bogus syntax error.
2) Breaking a block by two identical explicit blocks (Ao Ao Bo Ac Ac Bc
or Ao Ao Bq Ac Ac eol) still has a minor rendering error left:
"<ao1 <ao2 [bo ac2> ac1> bc]>" should not have the final ">".

We can fix these later in the tree, let's not grow this diff too large.

"get it in" kristaps@


# 1.58 27-Jun-2010 schwarze

Full .nr nS support, unbreaking the kernel manuals.

Kristaps coded this from scratch after reading my .nr patch;
it is simpler and more powerful.

Registers live in struct regset in regs.h, struct man and struct mdoc
contain pointers to it. The nS register is cleared when parsing .Sh.
Frontends respect the MDOC_SYNPRETTY flag set in mdoc node_alloc.


# 1.57 26-Jun-2010 schwarze

merge release 1.10.2
* bug fixes:
- interaction of ASCII_HYPH with special chars (found by Ulrich Spoerlein)
- handling of roff conditionals (found by Ulrich Spoerlein)
- .Bd -offset will no more default to 6n
* maintenance:
- more caching of .Bd and .Bl arguments for efficiency
- deconstify man(7) validation routines
- add FreeBSD library names (provided by Ulrich Spoerlein)
* start PostScript font-switching


# 1.56 06-Jun-2010 schwarze

Merge bsd.lv version 1.10.1 (to be released soon).

The main step forward is that this now has *much* better .Bl -column
support, now supporting many manuals that previously errored out
without producing any output.

Other fixes include:
* do not die from multiple list types, use the first and warn
* in .Bl without a type, default to -item
* various tweaks to .Dt
* fix .In, .Fd, .Ft, .Fn and .Fo formatting
* some documentation fixes and additions
* and fix a couple of bugs reported by Ulrich Spoerlein:
* better support for roff block-end "\}" without a preceding dot
* .In must not break the line outside SYNOPSIS
* spelling in some error messages

While merging, fix one regression in .In spacing
that needs to go to bsd.lv, too.


# 1.55 26-May-2010 schwarze

When a word does not fully fit onto the output line, but it contains
at least one hyphen, we already had support for breaking the line at the
last fitting hyphen. This patch improves this functionality by only
breaking at hyphens in free-form text, and by not breaking at hyphens
* at the beginning or end of a word or
* immediately preceded or followed by another hyphen or
* escaped by a preceding backslash.
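
The three exclusions listed above can be sketched as a small predicate.
This is an illustration only: the function name sk_hyphen_breakable()
is hypothetical, and the free-form-text condition is assumed to be
checked by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide whether the hyphen at word[i] is an acceptable break point.
 * Returns 1 if the line may be broken after it, 0 otherwise.
 */
static int
sk_hyphen_breakable(const char *word, size_t i)
{
	size_t	 len;

	assert(word != NULL);
	len = strlen(word);
	if (i >= len || word[i] != '-')
		return 0;	/* not a hyphen at all */
	if (i == 0 || i + 1 == len)
		return 0;	/* at the beginning or end of the word */
	if (word[i - 1] == '-' || word[i + 1] == '-')
		return 0;	/* adjacent to another hyphen */
	if (word[i - 1] == '\\')
		return 0;	/* escaped by a preceding backslash */
	return 1;
}
```

A caller would scan a word that does not fit for the last index
where this predicate holds and that still fits on the output line.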

Before this patch, differences in break-at-hyphen support were one
of the major sources of noise in automatic comparisons to mdoc(7)
groff output. Now, the remaining differences are hard to find among
the noise coming from other sources.

Where there are still differences, what we do seems to be better than
what groff does, see e.g. the chio(1) exchange and position commands
for one of the now rare examples.

idea and coding by kristaps@

Besides, this was the last substantial code difference left
between bsd.lv and openbsd.org. We are now in full sync.


# 1.54 23-May-2010 schwarze

Unified error and warning message system for all of mandoc,
featuring three message levels, as agreed during the mandoc hackathon:
* FATAL parser failure, cannot produce any output from this input file:
eventually, we hope to convert most of these to ERRORs.
* ERROR, meaning mandoc cannot cope fully with the input syntax and will
probably lose information or produce structurally garbled output;
it will try to produce output anyway but exit non-zero at the end,
which is eventually intended to make the ports infrastructure happy.
* WARNING, meaning you should clean up the input file, but output
is probably mostly OK, so this will not cause error-exit at the end.
This commit is mostly just converting the old system to the new one; before
the classification will become really reliable, we must check all messages.

In particular,
* set up a new central message string table in main.c
* drop the old message string tables from man.c and mdoc.c
* get rid of the piece-meal merr enums in libman and libmdoc
* reduce number of error/warning functions from 16 to 6 (still a lot...)

While here, handle a few problems more gracefully:
* allow .Rv and .Ex to work without a prior .Nm
* allow .An to ignore extra arguments
* allow undeclared columns in .Bl -column

Written by kristaps@.


# 1.53 20-May-2010 schwarze

Support nested roff instructions:
* allow roff_parseln() to be re-run
* allow roff_parseln() to manipulate the line buffer offset
* support the offset in the man and mdoc libraries
* adapt .if, .ie, .el, .ig, .am* and .de* support
* interpret some instructions even in conditional-negative context
Coded by kristaps during the last day of the mandoc hackathon.

To avoid regressions in the OpenBSD tree, commit this together
with some small local additions:
* detect roff block end "\}" even on macro lines
* actually implement the ".if n" conditional
* ignore .ds, .rm and .tr in libroff

Also back my old .if/.ie/.el-handling out of libman, reverting:
man.h 1.15 man.c 1.25 man_macro.c 1.15 man_validate.c 1.19
man_action.c 1.15 man_term.c 1.28 man_html.c 1.9.


# 1.52 16-May-2010 schwarze

Rewrite the main mdoc text parser, mdoc_ptext()
to make it easier to understand and to fix various bugs:
* strip white space from the end of MDOC_TEXT elements in literal mode
* in literal mode, a line may be blank even when containing tabs
* escaped backslashes do not escape following characters
ok kristaps@


# 1.51 16-May-2010 schwarze

allow the single quote as a control character in place of the dot
at all relevant places;
from kristaps@


# 1.50 15-May-2010 schwarze

allow non-numeric manual sections in -mdoc;
while here, allow LIBRARY in section 9;
by kristaps@


# 1.49 15-May-2010 schwarze

various improvements regarding errors and warnings from Joerg Sonnenberger:
* If the last -column .Bl isn't specified, it is auto-sized.
* An invalid .St argument should be a warning, not an error.
Just put the argument into the output.
* An invalid .At argument should be a warning, not an error.
Just print the argument, like new groff does.
* Remove warnings concerning manual section (like 1, 6, 8).
It was only used for .Ex and not really useful.
* Remove warnings concerning page section (like SYNOPSIS).
These were only used for .Fd and .Lb and not really useful.


# 1.48 14-May-2010 schwarze

Integrate kristaps@' end-of-sentence (EOS) framework
which is simpler and more powerful than mine, and remove mine.

* man(7) now has EOS handling, too
* put EOS detection into its own function in libmandoc
* use node and termp flags to communicate the EOS condition
* no more EOS pseudo-macro
* no more non-printable EOS marker character on the formatter level

This slightly breaks EOS detection after trailing punctuation
in mdoc(7) macros, but that will be restored soon.


# 1.47 14-May-2010 schwarze

Merge 1.9.25, keeping local patches;
this does not merge kristaps' end-of-sentences handling yet,
i will check that separately. This one includes:
* handle \*(Ba as a delimiter
* introduce ARGS_PEND for .Bl -column .It end-of-line special casing
* section ordering: expect EXIT STATUS at the right place
* line break fixes in SYNOPSIS
* allow literal contexts to have arbitrary line lengths
* the input file column number can not be used to identify the beginning
of a line because white space is allowed after the initial '.'
* proper leading spaces in -man -Tascii mode
* do not let Lb break lines in -mdoc -Thtml LIBRARY


# 1.46 14-May-2010 schwarze

merge 1.9.24, keeping local patches; some changes:
* preserve multiple consecutive space characters in input
* do not restrict .Cd and .Rv to certain sections (requested by Joerg)
* do not run lookup() on quoted words
* enum return types for mdoc_args and mdoc_argv
* fix auto-closing of LINK tag in -Txhtml (from Daniel Friesel)
* various lint and manual fixes


# 1.45 08-May-2010 schwarze

merge bsd.lv rev 1.123:
sync mdoc.c's static function names with man.c


# 1.44 08-May-2010 schwarze

handle text lines beginning with \." as comments, like groff does,
even though this is not correct comment syntax (so warn, too)
reported by Claus Assmann on misc@, fix by kristaps@


# 1.43 04-May-2010 schwarze

end-of-sentence markers at the end of .Fn argument lists
ruin indentation of the next line in the SYNOPSIS section;
bug found by jacekm@ in err(3)


# 1.42 27-Apr-2010 schwarze

Fix a subtle bug noticed by naddy@ in pftop(8), thanks!

When converting blank lines to .Pp outside literal context,
it could happen that the following node ended up as a child
of the .Pp element, but it must always be a sibling.


# 1.41 22-Apr-2010 schwarze

Fix a segfault reported by nicm@, introduced in rev. 1.38.
When finding a blank line, trying to parse it is a bad idea.
Instead, after adding .Pp to the AST, just return from parsetext().


# 1.40 07-Apr-2010 schwarze

Merge the good parts of 1.9.23,
avoid the bad parts of 1.9.23, and keep local patches.

Input in general:
* Basic handling of roff-style font escapes \f, \F.
* Quoted punctuation does not count as punctuation.

mdoc(7) parser:
* Make .Pf callable; noted by Claus Assmann.
* Let .Bd and .Bl ignore unknown arguments; noted by deraadt@.
* Do not warn when .Er is used outside certain sections.
* Replace mdoc_node_free[list] by mdoc_node_delete.
* Replace #define by enum for rew*() return values.

man(7) parser:
* When .TH is missing, use default section and date.

Output in general:
* Curly braces do not count as punctuation.
* No space after .Fl w/o args when a macro follows on the same line.

HTML output:
* Unify PAIR_*_INIT macros, introduce new PAIR_ID_INIT().
* Print whitespace after, not before .Vt .Fn .Ft .Fo.

Checked that all manuals in base still build.


# 1.39 04-Apr-2010 schwarze

When the prologue lacks required information, do not error out,
but warn, set up some default values, and prod on.
Unbreaking the ports build for textproc/sgmlformat;
reported by naddy@, thanks.


# 1.38 03-Apr-2010 schwarze

* outside literal context in mdoc(7), handle blank lines like .Pp
* a missing NAME section in mdoc(7) need not be fatal

ok deraadt@


# 1.37 02-Apr-2010 schwarze

merge 1.9.22, keeping local patches
* convert mdoc tokens from #define to enum
* fix a segfault with .Xo/.Xc in explicit blocks
* Thorn is \*(Th, not \*(TH; noticed by Joerg Sonnenberger


# 1.36 25-Mar-2010 schwarze

fix a stupid out-of-bounds read access introduced in the previous
revision, in the code searching for the end of a sentence


Revision tags: OPENBSD_4_7_BASE
# 1.35 02-Mar-2010 schwarze

Proper inter-sentence spacing for mdoc(7).
When a text line or a non-block macro line in the source code ends
in any of ".!?", consider that an end of sentence (EOS).
This makes Jason's rule "new sentence, new line" even more important.
Let the parser detect the EOS and insert a token into the AST.
Let the -Tascii frontend render the EOS token as a double space before
the next word.
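
The end-of-sentence rule described above might be sketched roughly as follows. This is a simplified illustration, not the code from this revision: the helper name line_ends_sentence() is hypothetical, and the sketch only looks at the very last byte of the line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the input line ends in one of
 * ".!?", 0 otherwise.  Only the last byte is inspected.
 */
static int
line_ends_sentence(const char *line)
{
	size_t len;

	assert(line != NULL);
	len = strlen(line);
	if (len == 0)
		return 0;
	return strchr(".!?", line[len - 1]) != NULL;
}
```

A parser using such a check would append an EOS marker to the syntax tree whenever it returns 1, leaving the double-space rendering to the formatter.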


# 1.34 18-Feb-2010 schwarze

sync to release 1.9.15:
* corrected .Vt handling (spotted by Joerg Sonnenberger)
* corrected .Xr argument handling (based on my patch)
* removed \\ escape sequence (because it is for low-level roff only)
* warn about trailing whitespace (suggested by jmc@)
* -Txhtml support
* and some general cleanup and doc improvements


# 1.33 02-Jan-2010 schwarze

complete the sync to 1.9.15-pre2: mostly minor fixes
* bugfix: do not restore TERMP flags when leaving lists, just reset them
* and a few HTML fixes
* clarity: return width from a2width, not width+2, and adapt to it
* manual: document .Bl and .Fl
* portability: no need to escape '%' in macro names


# 1.32 22-Dec-2009 schwarze

sync to 1.9.12, mostly portability and refactoring:

correctness/functionality:
- bugfix: do not die when overstep hits the right margin
- new option: -fign-escape
- and various HTML features

portability:
- replace bzero(3) by memset(3), which is ANSI C
- replace err(3)/warn(3) by perror(3)/exit(3), which is ANSI C
- use argv[0] instead of __progname
- add time.h to various files for FreeBSD compilation

simplicity:
- do not allocate header/footer data dynamically in *_term.c
- provide and use malloc frontends that error out on failure

for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/


# 1.31 27-Oct-2009 schwarze

sync to 1.9.11: adapt printing of dates to groff conventions,
NetBSD portability fixes and some minor bugfixes and feature enhancements;
also checked that my hyphenation code still works on top of this


# 1.30 21-Oct-2009 schwarze

sync to 1.9.9, featuring:
* -Thtml output mode
* roff scaling units
* and some minor fixes
for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/


# 1.29 19-Oct-2009 schwarze

sync to 1.9.6: multiple improvements to references (.Rs)
* validate and order .Rs child nodes
* underline book title (.%B) and issuer (.%I)
* enclose title of article (.%T) in quotes
* avoid calling mdoc_verr directly, use a proper error code instead


# 1.28 19-Oct-2009 schwarze

sync to 1.9.6: u_char lives in <sys/types.h>
noticed by uqs at spoerlein dot net on FreeBSD,
where <stdlib.h> does not include <sys/types.h>


# 1.27 21-Sep-2009 schwarze

sync to 1.9.5: lookup hashes are now static tables
shortening the code, and, according to kristaps@, speeding it up


# 1.26 18-Sep-2009 schwarze

sync to 1.9.2: non-printable characters in macro names are errors;
from joerg at netbsd dot org


# 1.25 22-Aug-2009 schwarze

sync to 1.9.1: set mdoc_next flags in mdoc_*_alloc routines, where they belong


# 1.24 22-Aug-2009 schwarze

sync to 1.9.0: polishing the core code of mdoc macro handling
1) If a macro is not parsed, do not parse it. Of course, without
parsing it, we cannot produce "macro-like parameter" warnings,
but these were useless anyway.
2) If a macro is not callable, do not print a useless warning when
it occurs as a parameter, just display the raw characters.
3) Below .Bl -column, check whether macros are callable.
4) Like groff, allow whitespace after the initial dot on macro lines.


# 1.23 22-Aug-2009 schwarze

sync to 1.8.5: Error string is now file:line:col: message.
Fixed column reporting (off by one).
Use fprintf instead of warnx for parse errors (like cc).


# 1.22 26-Jul-2009 schwarze

sync to 1.8.1: rewrite quoted literal handling correctly,
rewrite TABSEP handling in a simpler way,
and retire ECOLEMPTY, ARGS_QUOTED and ARGS_ARGVLIKE


# 1.21 26-Jul-2009 schwarze

sync to 1.8.1: removed excessively verbose EARGVPARM warning


# 1.20 26-Jul-2009 schwarze

sync to 1.8.1: support .br and .sp


# 1.19 26-Jul-2009 schwarze

sync to 1.8.1: libmdoc now breaks up free-form lines into tokens;
will simplify LITERAL mode in front-end


# 1.18 18-Jul-2009 schwarze

sync to 1.8.0: move mdoc_a2att, mdoc_a2st, and mdoc_a2lib to libmdoc


# 1.17 12-Jul-2009 schwarze

sync to 1.7.23: pass warning code to mdoc_pwarn() instead of warning message
define additional warning macro mdoc_nwarn()
remove obsolete warning functions mdoc_warn(), pwarn(), vwarn(), nwarn()
remove various now unused "enum mdoc_warn" and "enum mwarn"


# 1.16 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_perr() instead of error string
and use the so improved mdoc_nerr() at many places;
get rid of now unused static functions perr()


# 1.15 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_nerr() instead of error string
and use the so improved mdoc_nerr() at many places


# 1.14 12-Jul-2009 schwarze

sync to 1.7.23: unify the various "enum merr" into libman.h and libmdoc.h,
use it as a new argument to mdoc_err(), the same way as for man_err(),
and use string tables instead of switch statements to select error messages


# 1.13 12-Jul-2009 schwarze

sync to 1.7.23: third step to get rid of enum mdoc_warn:
mdoc_verr is not using enum mdoc_warn, so use it at a few more places


# 1.12 12-Jul-2009 schwarze

sync to 1.7.23: second step to get rid of enum mdoc_warn:
remove type from mdoc_vwarn arguments, and use this function where appropriate


# 1.11 12-Jul-2009 schwarze

sync to 1.7.23: first step to get rid of enum mdoc_warn:
unify manwarn() and mdocwarn() into mwarn()


Revision tags: OPENBSD_4_6_BASE
# 1.10 23-Jun-2009 schwarze

sync to 1.7.20: like for the -man case, add an nchild counter to the -mdoc
nodes, simplifying the validation code; no functional change


# 1.9 19-Jun-2009 schwarze

sync to 1.7.19: more elegant section handling


# 1.8 18-Jun-2009 schwarze

sync to 1.7.19: comment cleanup; no functional change


# 1.7 18-Jun-2009 schwarze

sync to 1.7.19: improved comment handling


# 1.6 18-Jun-2009 schwarze

complete sync to 1.7.17: garbage collect unused functions
mdoc_msg, mdoc_pmsg, mdoc_vmsg, and mdoc_nwarn


# 1.5 15-Jun-2009 schwarze

bring back miod@'s "real functions" patch (rev. 1.2)
which was erroneously backed out in rev. 1.4, sorry;
ok kristaps@


# 1.4 15-Jun-2009 schwarze

sync to 1.7.16:
reduce code duplication in warning and error reporting functions
while here, garbage collect three unused function prototypes


# 1.3 14-Jun-2009 schwarze

sync to 1.7.16: comments, whitespace and spelling fixes; no functional change


# 1.2 15-Apr-2009 miod

Replace variadic macros with real functions, so that this compiles on
platforms still using gcc 2.
ok deraadt@
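
As an illustration of the kind of change described here, a variadic macro can be replaced by an ordinary function built on <stdarg.h>. The sketch below is hypothetical and not taken from this commit; the name format_warning() and the macro shown in the comment are invented for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical example.  A variadic macro such as
 *   #define FORMAT_WARNING(buf, sz, ...) snprintf(buf, sz, __VA_ARGS__)
 * depends on compiler support for variadic macros.  A real function
 * taking a va_list avoids that dependency.
 */
static int
format_warning(char *buf, size_t sz, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && fmt != NULL);
	va_start(ap, fmt);
	n = vsnprintf(buf, sz, fmt, ap);
	va_end(ap);
	return n;
}
```

Callers use it exactly like the macro, for example format_warning(buf, sizeof(buf), "%s:%d", file, line).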


# 1.1 06-Apr-2009 kristaps

Initial check-in of mandoc for formatting manuals. ok deraadt@


# 1.158 17-Aug-2018 schwarze

Remove more pointer arithmetic passing via regions outside the array
that is undefined according to the C standard. Robert Elz <kre at
munnari dot oz dot au> pointed out i wasn't quite done yet.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.157 11-Aug-2017 schwarze

Make the "new sentence, new line" check stricter, allowing digits
in the last two letters of the last word of the sentence.
No false positives in base or Xenocara.
Suggested by and OK jmc@.


# 1.156 17-Jun-2017 schwarze

correct handling of blank lines after \c


# 1.155 07-Jun-2017 schwarze

Also catch "new sentence, new line" if there are three blanks
between the sentences. Thomas Klausner says he has seen some
of these, and i don't see any false positives.


# 1.154 07-Jun-2017 schwarze

Make "new sentence, new line" detection stricter:
Also catch cases where the new sentence starts with a one-letter word
and the input line is broken right after that word.
Suggested by Thomas Klausner <wiz @ NetBSD>.

It's merely a three-bit diff, changing one byte from 0x34 to 0x33,
so what can possibly go wrong...


# 1.153 05-May-2017 schwarze

Move .sp to the roff modules. Enough infrastructure is in place
now that this actually saves code: -70 LOC.


# 1.152 29-Apr-2017 schwarze

Parser unification: use nice ohashes for all three request and macro tables;
no functional change, minus two source files, minus 200 lines of code.


# 1.151 24-Apr-2017 schwarze

Continue parser unification:
* Make enum rofft an internal interface as enum roff_tok in "roff.h".
* Represent mdoc and man macros in enum roff_tok.
* Make TOKEN_NONE a proper enum value and use it throughout.
* Put the prologue macros first in the macro tables.
* Unify mdoc_macroname[] and man_macroname[] into roff_name[].


Revision tags: OPENBSD_6_1_BASE
# 1.150 03-Mar-2017 schwarze

remove a few redundant conditions that jsg@ found with cppcheck


# 1.149 16-Feb-2017 schwarze

Remove the ENDBODY_NOSPACE flag, simplifying the code.

Comparing to groff output, it appears that all cases where it was used
and made a difference actually require the opposite, ENDBODY_SPACE.

I have no idea why i added it back in 2010; maybe to compensate for
some other bug that has long been fixed.


# 1.148 28-Jan-2017 schwarze

Add a warning "new sentence, new line".
This does not attempt to pinpoint each and every offender, but
instead tries very hard to avoid false positives: Currently, there
are only two false positives in the whole OpenBSD base system.
Only do this in mdoc(7), not in man(7), because manuals written
in man(7) typically have much worse problems than this.
OK jmc@ on a previous version of the patch


# 1.147 10-Jan-2017 schwarze

unify names of AST node flags; no change of cpp output


# 1.146 20-Aug-2016 schwarze

If a column list starts with implicit rows (that is, rows without .It)
and roff-level nodes (e.g. tbl or eqn) follow, don't run into an
assertion. Instead, wrap the roff-level nodes in their own row.
Issue found by tb@ with afl(1).


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.145 30-Oct-2015 schwarze

If a .Bd block has no arguments at all, drop the block and only keep
its contents. Removing a gratuitous difference to groff output
found after a related bug report from krw@.


# 1.144 20-Oct-2015 schwarze

In order to become able to generate syntax tree nodes on the roff(7)
level, validation must be separated from parsing and rewinding.
This first big step moves calling of the mdoc(7) post_*() functions
out of the parser loop into their own mdoc_validate() pass, while
using a new mdoc_state() module to make syntax tree state handling
available to both the parser loop and the validation pass.


# 1.143 12-Oct-2015 schwarze

To make the code more readable, delete 283 /* FALLTHROUGH */ comments
that were right between two adjacent case statements. Keep only
those 24 where the first case actually executes some code before
falling through to the next case.


# 1.142 06-Oct-2015 schwarze

modernize style: "return" is not a function; ok cmp(1)


Revision tags: OPENBSD_5_8_BASE
# 1.141 23-Apr-2015 schwarze

Unify mdoc_deroff() and man_deroff() into a common function deroff().
No functional change except that for mdoc(7), it now skips leading
escape sequences just like it already did for man(7).
Escape sequences rarely occur in mdoc(7) code and if they do,
skipping them is an improvement in this context.
Minus 30 lines of code.


# 1.140 23-Apr-2015 schwarze

Get rid of two empty wrapper functions. No functional change.


# 1.139 19-Apr-2015 schwarze

Unify trickier node handling functions.
* man_elem_alloc() -> roff_elem_alloc()
* man_block_alloc() -> roff_block_alloc()
The functions mdoc_elem_alloc() and mdoc_block_alloc() remain for
now because they need to do mdoc(7)-specific argument processing.


# 1.138 19-Apr-2015 schwarze

Unify some node handling functions that use TOKEN_NONE.
* mdoc_word_alloc(), man_word_alloc() -> roff_word_alloc()
* mdoc_word_append(), man_word_append() -> roff_word_append()
* mdoc_addspan(), man_addspan() -> roff_addtbl()
* mdoc_addeqn(), man_addeqn() -> roff_addeqn()
Minus 50 lines of code, no functional change.


# 1.137 19-Apr-2015 schwarze

Decouple the token code for "no request or macro" from the individual
high-level parsers to allow further unification of functions that
only need to recognize this code, but that don't care about different
high-level macrosets beyond that.


# 1.136 19-Apr-2015 schwarze

Unify node handling functions:
* node_alloc() for mdoc and man_node_alloc() -> roff_node_alloc()
* node_append() for mdoc and man_node_append() -> roff_node_append()
* mdoc_head_alloc() and man_head_alloc() -> roff_head_alloc()
* mdoc_body_alloc() and man_body_alloc() -> roff_body_alloc()
* mdoc_node_unlink() and man_node_unlink() -> roff_node_unlink()
* mdoc_node_free() and man_node_free() -> roff_node_free()
* mdoc_node_delete() and man_node_delete() -> roff_node_delete()
Minus 130 lines of code, no functional change.


# 1.135 18-Apr-2015 schwarze

Delete the wrapper functions mdoc_meta(), man_meta(), mdoc_node(),
man_node() from the mandoc(3) semi-public interface and the internal
wrapper functions print_mdoc() and print_man() from the HTML formatters.
Minus 60 lines of code, no functional change.


# 1.134 18-Apr-2015 schwarze

Unify {mdoc,man}_{alloc,reset,free}() into roff_man_{alloc,reset,free}().
Minus 80 lines of code, no functional change.
Written on the train from Koeln to Wolfsburg returning from p2k15.


# 1.133 18-Apr-2015 schwarze

Move mdoc_hash_init() and man_hash_init() to libmandoc.h
and call them from mparse_alloc() and choose_parser(),
preparing unified allocation of struct roff_man.


# 1.132 18-Apr-2015 schwarze

Profit from the unified struct roff_man and reduce the number of
arguments of mparse_result() by one. No functional change.
Written on the ICE Bruxelles-Koeln on the way back from p2k15.


# 1.131 18-Apr-2015 schwarze

Replace the structs mdoc and man by a unified struct roff_man.
Almost completely mechanical, no functional change.
Written on the train from Exeter to London returning from p2k15.


# 1.130 02-Apr-2015 schwarze

Third step towards parser unification:
Replace struct mdoc_meta and struct man_meta by a unified struct roff_meta.
Written on the train from London to Exeter on the way to p2k15.


# 1.129 02-Apr-2015 schwarze

Second step towards parser unification:
Replace struct mdoc_node and struct man_node by a unified struct roff_node.
To be able to use the tok member for both mdoc(7) and man(7) without
defining all the macros in roff.h, sacrifice a tiny bit of type safety
and make tok an int rather than an enum.
Almost mechanical, no functional change.
Written on the Eurostar from Bruxelles to London on the way to p2k15.


# 1.128 02-Apr-2015 schwarze

First step towards parser unification:
Replace enum mdoc_type and enum man_type by a unified enum roff_type.
Almost mechanical, no functional change.
Written on the ICE train from Frankfurt to Bruxelles on the way to p2k15.


Revision tags: OPENBSD_5_7_BASE
# 1.127 12-Feb-2015 schwarze

Do not confuse .Bl -column lists that just broke another block
with newly opened .Bl -column lists;
fixing an assertion failure jsg@ found with afl:
test case #481, Bl It Bl -column It Bd El text text El


# 1.126 12-Feb-2015 schwarze

Delete the mdoc_node.pending pointer and the function calculating
it, make_pending(), which was the most difficult function of the
whole mdoc(7) parser. After almost five years of maintaining this
hellhole, i just noticed the pointer isn't needed after all.

Blocks are always rewound in the reverse order they were opened;
that even holds for broken blocks. Consequently, it is sufficient
to just mark broken blocks with the flag MDOC_BROKEN and breaking
blocks with the flag MDOC_ENDED. When rewinding, instead of iterating
the pending pointers, just iterate from each broken block to its
parents, rewinding all that are MDOC_ENDED and stopping after
processing the first ancestor that is not MDOC_BROKEN. For ENDBODY
markers, use the mdoc_node.body pointer in place of the former
mdoc_node.pending.
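
A literal reading of the rewinding rule above might look like the following sketch. It is a toy model under stated assumptions, not the parser code: struct node, the rewound field, and rewind_ancestors() are hypothetical, and the flag names merely echo MDOC_BROKEN and MDOC_ENDED.

```c
#include <stddef.h>

#define NODE_BROKEN	(1 << 0)  /* block was broken by another block */
#define NODE_ENDED	(1 << 1)  /* block has ended but is not yet rewound */

/* Hypothetical, heavily reduced syntax tree node. */
struct node {
	struct node	*parent;
	int		 flags;
	int		 rewound;
};

/*
 * Starting from a broken block, walk up the ancestors, rewind every
 * one marked NODE_ENDED, and stop after the first ancestor that is
 * not itself NODE_BROKEN.  Returns the number of blocks rewound.
 */
static int
rewind_ancestors(struct node *broken)
{
	struct node *n;
	int count;

	count = 0;
	for (n = broken->parent; n != NULL; n = n->parent) {
		if (n->flags & NODE_ENDED) {
			n->rewound = 1;
			count++;
		}
		if ((n->flags & NODE_BROKEN) == 0)
			break;
	}
	return count;
}
```

The point of the design is that only parent links and two flag bits are consulted, so no separate pointer has to be kept consistent while blocks break each other.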

This also fixes an assertion failure found by jsg@ with afl,
test case #467 (Bo Bl It Bd Bc It), where (surprise surprise)
the pending pointer got corrupted.

Improved functionality, minus one function, minus one struct field,
minus 50 lines of code.


# 1.125 05-Feb-2015 schwarze

Simplify by deleting the "lastline" member of struct mdoc_node.
Minus one struct member, minus 17 lines of code, no functional change.


# 1.124 02-Feb-2015 schwarze

Get rid of all calls to rew_sub() in blk_exp_close(); only ten calls
remain in other functions. As a bonus, this fixes an assertion failure
jsg@ found some time ago with afl (test case 982) and improves minor
details in error reporting.


# 1.123 15-Jan-2015 schwarze

Fatal errors no longer exist.
If a file can be opened, mandoc will produce some output;
at worst, the output may be almost empty.
Simplifies error handling and frees a message type for future use.


# 1.122 28-Nov-2014 schwarze

Simplify by making the eqn and tbl steering functions void;
no functional change, minus 15 lines of code.


# 1.121 28-Nov-2014 schwarze

Simplify by making the mdoc parser callbacks void, and some cleanup;
no functional change, minus 50 lines of code.


# 1.120 28-Nov-2014 schwarze

Simplify the code by making various mdoc parser helper functions void.
No functional change, minus 130 lines of code.


# 1.119 28-Nov-2014 schwarze

Simplify code by making mdoc validation handlers void.
No functional change, minus 90 lines of code.


# 1.118 19-Nov-2014 schwarze

Escape sequences terminate high-level macro names, and when doing so,
they are ignored, just in the same way as for request names
and for low-level macro names.
This also cures a warning in the pod2man(1) preamble.


# 1.117 20-Oct-2014 schwarze

correct the spacing after in-line equations
that start at the beginning of an input line
but end before the end of an input line


# 1.116 20-Oct-2014 schwarze

correct spacing before inline equations


# 1.115 16-Oct-2014 schwarze

Implement in-line equations, much needed by Xenocara manuals.
Put the steering into the roff parser rather than into the mdoc
parser such that it works for all macro languages and on both text
and macro lines.
Line breaks and blank characters generated before and after in-line
equations are not perfect yet, but let's do one thing at a time.


# 1.114 06-Sep-2014 schwarze

Simplify by handling empty request lines at the one logical place
in the roff parser instead of in three other places in other parsers.
No functional change.


# 1.113 08-Aug-2014 schwarze

Bring the handling of defective prologues even closer to groff,
in particular relaxing the distinction between prologue and body
and further improving messages.
* The last .Dd wins and the last .Os wins, even in the body.
* The last .Dt before the first body macro wins.
* Missing title in .Dt defaults to UNTITLED. Warn about it.
* Missing section in .Dt does not default to 1. But warn about it.
* Do not warn multiple times about the same mdoc(7) prologue macro.
* Warn about missing .Os.
* Incomplete .TH defaults to empty strings. Warn about it.


# 1.112 08-Aug-2014 schwarze

mention requests and macros in more messages


# 1.111 08-Aug-2014 schwarze

Simplify: replace one global flag by one local variable
and remove three unused global flags. No functional change.


Revision tags: OPENBSD_5_6_BASE
# 1.110 09-Jul-2014 schwarze

mark defos as const; nobody needs to change it,
and it is occasionally useful to be able to pass literal strings


# 1.109 07-Jul-2014 schwarze

no need to skip content before first section header


# 1.108 06-Jul-2014 schwarze

Clean up messages related to plain text and to escape sequences.
* Mention invalid escape sequences and string names, and fallbacks.
* Hierarchical naming.


# 1.107 02-Jul-2014 schwarze

Implement the obsolete macros .En .Es .Fr .Ot for backward compatibility,
since this is hardly more complicated than explicitly ignoring them
as we did in the past. Of course, do not use them!


# 1.106 01-Jul-2014 schwarze

Clean up the warnings related to document structure.
* Hierarchical naming of the related enum mandocerr items.
* Mention the offending macro, section title, or string.
While here, improve some wordings:
* Descriptive instead of imperative style.
* Uniform style for "missing" and "skipping".
* Where applicable, mention the fallback used.


# 1.105 20-Jun-2014 schwarze

Start systematic improvements of error reporting.
So far, this covers all WARNINGs related to the prologue.

1) hierarchical naming of MANDOCERR_* constants
2) mention the macro name in messages where that adds clarity
3) add one missing MANDOCERR_DATE_MISSING msg
4) fix the wording of one message related to the man(7) prologue

Started on the plane back from Ottawa.


# 1.104 25-Apr-2014 schwarze

Fix a minor optimization i broke in bsd.lv rev. 1.163 on August 20, 2010:
Do not bother looking into the hash table when the length of the macro
already tells us it's invalid. No functional change.
Noticed by jsg@, thanks!
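
The optimization can be pictured with the sketch below. It rests on assumptions that are not stated in this log entry: that valid macro names are two or three bytes long, and that a linear search over a tiny illustrative table may stand in for the real hash lookup. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tiny stand-in for the real macro table; contents are illustrative. */
static const char *const macro_names[] = { "Dd", "Dt", "Os", "Sh", "Brq" };

/* Stand-in for the hash lookup: index of the name, or -1. */
static int
table_find(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(macro_names) / sizeof(macro_names[0]); i++)
		if (strcmp(macro_names[i], name) == 0)
			return (int)i;
	return -1;
}

/*
 * Assumption for this sketch: valid macro names are two or three
 * bytes long, so anything else is rejected without a lookup.
 */
static int
macro_lookup(const char *name)
{
	size_t len;

	assert(name != NULL);
	len = strlen(name);
	if (len < 2 || len > 3)
		return -1;
	return table_find(name);
}
```

The result is the same with or without the length check; the check only saves the lookup for names that cannot match.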


# 1.103 20-Apr-2014 schwarze

KNF: case (FOO): -> case FOO, remove /* LINTED */ and /* ARGSUSED */,
remove trailing whitespace and blanks before tabs, improve some indenting;
no functional change


# 1.102 30-Mar-2014 schwarze

Implement the roff(7) .ll (line length) request.
Found by naddy@ in the textproc/enchant(1) port.
Of course, do not use this in new manuals.


# 1.101 23-Mar-2014 schwarze

If an .Nd block contains macros, avoid fragmented entries in mandocdb(8),
instead use the .Nd content recursively.
Improves a couple of index entries in base.


# 1.100 21-Mar-2014 schwarze

avoid repetitive code for asprintf error handling


# 1.99 21-Mar-2014 schwarze

The files mandoc.c and mandoc.h contained both specialised low-level
functions used for multiple languages (mdoc, man, roff), for example
mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
functions. Split the auxiliaries out into their own file and header.
While here, do some #include cleanup.


Revision tags: OPENBSD_5_5_BASE
# 1.98 05-Jan-2014 schwarze

Add an option -Q (quick) to mandocdb(8)
for accelerated generation of reduced-size databases.

Implement this by allowing the parsers to optionally
abort the parse sequence after the NAME section.

While here, garbage collect the unused void *arg attribute
of struct mparse and mparse_alloc().

This reduces the processing time of mandocdb(8) on /usr/share/man
by a factor of 2 and the database size by a factor of 4.
However, it still takes 5 times the time and 6 times the space
of makewhatis(8), so more work is clearly needed.


# 1.97 30-Dec-2013 schwarze

Simplify: Remove an unused argument from the mandoc_eos() function.
No functional change.


# 1.96 24-Dec-2013 schwarze

When deciding whether two consecutive macros are on the same input line,
we have to compare the line where the first one *ends* (not where it begins)
to the line where the second one starts.
This fixes the bug that .Bk allowed output line breaks right after block
macros spanning more than one input line, even when the next macro follows
on the same line.


# 1.95 21-Oct-2013 schwarze

There are three kinds of input lines: text lines, macros taking
positional arguments (like Dt Fn Xr) and macros taking text as
arguments (like Nd Sh Em %T An). In the past, even the latter put
each word of their arguments into its own MDOC_TEXT node; instead,
concatenate arguments unless delimiters, keeps or spacing mode
prevent that. Regarding mandoc(1), this is internal refactoring,
no output change intended.

Once we will switch mandocdb(8) from DB to SQLite in the future,
this is going to be required to support search expressions crossing
word boundaries, and it will reduce both database sizes and build
times by a bit more than 5% each.


# 1.94 03-Oct-2013 schwarze

Support setting arbitrary roff(7) number registers,
preserving read support for the ".nr nS" SYNOPSIS state register;
read support for arbitrary registers is still not available.

Inspired by NetBSD roff.c rev. 1.18 (Christos Zoulas, March 21, 2013),
but implemented differently. I don't want to have yet another different
implementation of a hash table in mandoc - it would be the second one
in roff.c alone and the fifth one in mandoc grand total.
Instead, i designed and implemented roff_setreg() and roff_getreg()
to be similar to roff_setstrn() and roff_getstrn().

Once we feel the need to optimize, we can introduce one common
hash table implementation for everything in mandoc.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.93 17-Nov-2012 schwarze

Cleanup naming of local variables to make the code easier on the eye:
Settle for "struct man *man", "struct mdoc *mdoc", "struct meta *meta"
and avoid the confusing "*m" which was sometimes this, sometimes that.
No functional change.

ok kristaps@ some time ago


# 1.92 16-Nov-2012 schwarze

Fix a crash triggered by .Bl -tag .It Xo .El .Sh found by florian@.

* When allocating a body end marker, copy the pointer to the normalized
block information from the body block, avoiding the risk of subsequent
null pointer dereference.
* When inserting the body end marker into the syntax tree, do not try to
copy that pointer from the parent block, because not being a direct child
of the block it belongs to is the whole point of a body end marker.
* Even non-callable blocks (like Bd and Bl) can break other blocks;
when this happens, postpone closing them out in the usual way.


Revision tags: OPENBSD_5_2_BASE
# 1.91 18-Jul-2012 schwarze

Fix handling of paragraph macros inside lists:
* When they are trailing the last item, move them outside the list.
* When they are trailing any other non-compact item, drop them.

Improves formatting of 40 pages, e.g. grep(1), ksh(1), netstat(1),
ath(4), bsd.port.mk(5), pf.conf(5), mount(8), crypto(9).


# 1.90 18-Jul-2012 schwarze

The mdoc(7) \*(Ba predefined string actually forces roman font;
that's stupid because it may break enclosing font changes,
but let's do the same for groff bug compatibility.

--> Never use \*(Ba, use just plain "|"! <--

Also, predefined strings are already expanded by the roff(7) parser,
so the mdoc(7) parser has to look for the expanded string.

Formatting improvements in ksh(1), less(1), atan2(3),
hostapd.conf(5), snmpd.conf(5), and mknod(8).


# 1.89 16-Jul-2012 schwarze

Several -mdoc parser improvements related to vertical spacing:
* So far, .Pp and .Lp were removed before paragraph type blocks.
* Now also remove .br before paragraph type blocks.
* Treat .Lp as a paragraph like .Pp, so remove .Pp, .Lp, .br before it.
* Do not treat .sp as a paragraph, don't remove anything before it.
* After .Sh, .Ss, .Pp, and .Lp, remove .Pp, .Lp, .sp, .br, and blank lines.
* After .sp and .br, remove .br.


# 1.88 07-Jul-2012 schwarze

Support the .cc request; code by kristaps@, tests by me.
Needed for sqlite3(1) as reported by espie@.


# 1.87 24-May-2012 schwarze

Support -Ios='OpenBSD 5.1' to override uname(3) as the source of the
default value for the mdoc(7) .Os macro.
Needed for man.cgi on the OpenBSD website.

Problem with man.cgi first noticed by deraadt@;
beck@ and deraadt@ agree with the way to solve the issue.


Revision tags: OPENBSD_5_1_BASE
# 1.86 30-Sep-2011 schwarze

implement .Ap .Bd .Bo .Bq .D1 .Ic .Lp .Oo .Pf .Po .Ss .Sx .Sy .br .sp
implement .Bl -bullet
add more information to the .TH line
escape dots at the beginnings of lines
add trailing newline character at the end of the file
do not misinterpret the ROOT block as .Ap


# 1.85 18-Sep-2011 schwarze

sync to version 1.11.7 from kristaps@
main new feature: support the roff(7) .tr request
plus various bugfixes and some refactoring

regressions are so minor that it's better to get this in
and fix them in the tree


# 1.84 18-Sep-2011 schwarze

sync to version 1.11.5:
adding an implementation of the eqn(7) language
by kristaps@

So far, only .EQ/.EN blocks are handled, in-line equations are not, and
rendering is not yet very pretty, but the parser is fairly complete.


Revision tags: OPENBSD_5_0_BASE
# 1.83 24-Apr-2011 schwarze

Merge version 1.11.1:
Again lots of cleanup and maintenance work by kristaps@.
- simplify error reporting: less function pointers, more mandoc_[v]msg
- main: split document parsing out of main.c into read.c
- roff, mdoc, man: improved recognition of control characters
- roff: better handling of if/else stack overflows
- roff: add some predefined strings for backward compatibility
- mdoc, man: empty sections are not errors
- mdoc: move delimiter handling to libmdoc
- some header restructuring and some minor features and fixes
This merge causes two minor regressions
that i will fix in separate commits right afterwards.


# 1.82 21-Apr-2011 schwarze

Merge version 1.10.10:
lots of cleanup and maintenance work by kristaps@.
- move some main.c globals into struct curparse
- move mandoc_*alloc to mandoc.h such that all code can use them
- make mandoc_isdelim available to formatting frontends
- dissolve mdoc_strings.c, move the code where it is used
- make all error reporting functions void, their return values were useless
- and various minor cleanups and fixes


# 1.81 20-Mar-2011 schwarze

Import the foundation for eqn(7) support.
Written by kristaps@.

For now, i'm adding one line to each of the four frontends
to just pass the input text through to the output,
not yet interpreting any of the eqn keywords.


# 1.80 07-Mar-2011 schwarze

Clean up date handling,
as a first step to get rid of the frequent petty warnings in this area:
- always store dates as strings, not as seconds since the Epoch
- for input, try the three most common formats everywhere
- for unrecognized format, just pass the date through verbatim
- when there is no date at all, still use the current date
Originally triggered by a one-line patch from Tim van der Molen,
<tbvdm at xs4all dot nl>, which is included here.
Feedback and OK on manual parts from jmc@.
"please check this in" kristaps@


Revision tags: OPENBSD_4_9_BASE
# 1.79 10-Feb-2011 schwarze

Tbl code maintenance by kristaps@.
- Remember the line-number of a tbl_span, and use it in messages.
- Put *_span_alloc() functions right into the *_addspan() ones,
since these are the only places they are called from.


# 1.78 09-Jan-2011 schwarze

Make sure coding errors cannot make us miss fatal parsing errors
by assert(3)ing valid parser state in the main parsing functions;
from kristaps@.


# 1.77 04-Jan-2011 schwarze

Merge kristaps@' cleaner tbl integration, removing mine;
there are still a few bugs, but fixing these will be easier in tree.


# 1.76 01-Jan-2011 schwarze

Clean up {mdoc,man}_{p,v}msg invocations:
Ignore the return values, they are constant anyway.
From kristaps@.


# 1.75 29-Dec-2010 schwarze

Reorg by Kristaps: In libmdoc, replace the union of pointers to structs
of macro-specific data by a pointer to a union of structs, which makes the
code simpler and more robust at the expense of a small memory overhead.
Merging was somewhat difficult because we mustn't break tbl(1) support
which the bsd.lv version does not yet have.
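
The layout change described in 1.75 can be sketched roughly as follows.
This is only an illustration: the payload structs (bl_data, bd_data),
their fields, and the helpers node_new() and node_free() are hypothetical
and are not taken from libmdoc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical macro-specific payloads; the real libmdoc fields differ. */
struct bl_data { int compact; int ncols; };
struct bd_data { int compact; int literal; };

/* Old layout: a union of pointers, one separately allocated struct each. */
union data_ptrs {
	struct bl_data	*bl;
	struct bd_data	*bd;
};

/* New layout: one pointer to a union of structs. */
union node_data {
	struct bl_data	 bl;
	struct bd_data	 bd;
};

struct node {
	int		 tok;	/* which macro this node represents */
	union node_data	*data;	/* NULL when the macro carries no data */
};

/* Allocate a node; the union is always sized for its largest member. */
struct node *
node_new(int tok, int has_data)
{
	struct node	*n;

	n = calloc(1, sizeof(*n));
	assert(n != NULL);
	n->tok = tok;
	if (has_data) {
		n->data = calloc(1, sizeof(*n->data));
		assert(n->data != NULL);
	}
	return n;
}

/* Release a node together with its optional payload. */
void
node_free(struct node *n)
{
	if (n == NULL)
		return;
	free(n->data);
	free(n);
}
```

In this sketch the "small memory overhead" shows up as every payload
allocation taking the size of the largest union member, in exchange for
a single allocation and a single pointer to manage per node.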


# 1.74 26-Dec-2010 schwarze

Behave more like groff (both old and new): Specifying both .%T and .%J in
an .Rs block causes the title to be quoted instead of underlined, such
that journal title and article title appear visually different.
Original diff from kristaps@, simplified by me, tweaked again by kristaps@.


# 1.73 21-Dec-2010 schwarze

Migrate .An to use a pointer to its data, like everybody else.
In preparation for a simpler ref-counted system for node data.
From kristaps@.


# 1.72 21-Dec-2010 schwarze

Vertical spacing improvements from kristaps@, small tweaks by me:
Add a "last child" member to struct mdoc_node.
Remove .Pp or .Lp if it is the first or last child of an .Sh or .Ss body.
Thus, no need to do the same in the front-ends any longer.
Tolerate some cases of .Pp inside .Bl.


# 1.71 02-Dec-2010 schwarze

Properly initialize the manual section to a default when .Dt is missing.
Without this, we died on an assertion.
Problem noted and patch provided by kristaps@.


# 1.70 01-Dec-2010 schwarze

Merge mdoc_action.c into mdoc_validate.c, because having two places to do
basically the same things just causes code duplication and confusion.
Work by kristaps@, including a few bugfixes he found during the merge,
and reapplying OpenBSD changes on top.


# 1.69 28-Nov-2010 schwarze

To avoid FATAL errors, we have been parsing and ignoring the roff
requests .am, .ami, .am1, .dei, and .rm for a long time.
Since ignoring them can (rarely) cause information loss and serious
misformatting, throw an ERROR: NOT IMPLEMENTED when finding them.
Implementing them would not be too difficult, but they are so rare
in practice that i can find better use for my time right now.

In this context,
- Put the string "NOT IMPLEMENTED" into two other error messages
as well, to distinguish them from those caused by broken input.
- Print the string "unknown macro" once, not twice in the error message
associated with MANDOCERR_MACRO, and begin printing the buffer at the
point where the unknown macro really is, not at the start of line.


# 1.68 16-Oct-2010 schwarze

Do not abort() on tbl errors, reduce the risk that tbl stuff kills a build,
and provide more useful tbl error messages in a non-intrusive way.


# 1.67 16-Oct-2010 schwarze

Support tbl(1) code embedded into mdoc(7) input files.
Very similar to what i have done in man(7) yesterday.
Allows to build cpu(4) on HPPA, wi(4), and phantasia(6).
Now we are able to build all tbl code in base.


# 1.66 27-Sep-2010 schwarze

Merge the last bits of 1.10.6 (released today), most were already in:
* ignore double-.Pp
* ignore .Pp before .Bd and .Bl (unless -compact is specified)
* avoid double blank line upon .Pp, .br and friends in literal context
* cast enums to int when passing them to exit(3) to please lint(1)
While merging, fix a regression introduced by kristaps@:
Outside literal mode, double blank lines must both be printed.
To achieve this again after kristaps@ improvements in 1.10.6,
treat such blank lines as .sp (instead of .Pp as in 1.10.5)
and drop .Pp before .sp just like dropping .Pp before .Pp.


# 1.65 20-Aug-2010 schwarze

Implement a simple, consistent user interface for error handling.
We now have sufficient practical experience to know what we want,
so this is intended to be final:
- provide -Wlevel (warning, error or fatal) to select what you care about
- provide -Wstop to stop after parsing a file with warnings you care about
- provide consistent exit status codes for those warnings you care about
- fully document what warnings, errors and fatal errors mean
- remove all other cruft from the user interface, less is more:
- remove all -f knobs along with the whole -f option
- remove the old -Werror because calling warnings "fatal" is silly
- always finish parsing each file, unless fatal errors prevent that
This commit also includes a couple of related simplifications behind
the scenes regarding error handling.
Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and
Sascha Wildner (DragonFly BSD) agree with the general direction.


# 1.64 18-Aug-2010 schwarze

Simplify and sync the code and comments for copying the macro name
in man_pmacro() and mdoc_pmacro(). In particular, no need to use
isgraph(3) here, that has already been done in main.c.
Joint work by Kristaps and myself, ok kristaps@.


Revision tags: OPENBSD_4_8_BASE
# 1.63 07-Aug-2010 schwarze

Groff allows the initial macro on a line to be delimited by a space
or by a tab; so allow the tab in mandoc, too.
Bug found by me, fix by kristaps@, "sure" deraadt@.
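
A minimal sketch of macro-name extraction that accepts the tab delimiter
from 1.63, together with the single-quote control character (1.51) and
blanks after the control character (1.24). The function name macro_name()
and its interface are made up for illustration; this is not the code of
mdoc_pmacro().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy the macro name from an input line into out and return its length,
 * or 0 if the line is not a macro line or the name does not fit.
 */
size_t
macro_name(const char *line, char *out, size_t outsz)
{
	size_t	 i, k, len;

	assert(line != NULL && out != NULL && outsz > 0);
	out[0] = '\0';

	/* Control character: the dot, or the single quote. */
	if (line[0] != '.' && line[0] != '\'')
		return 0;

	/* Blanks are allowed between the control character and the name. */
	i = 1;
	while (line[i] == ' ' || line[i] == '\t')
		i++;

	/* The name ends at a space, a tab, or the end of the line. */
	len = 0;
	while (line[i + len] != '\0' && line[i + len] != ' ' &&
	    line[i + len] != '\t')
		len++;

	if (len == 0 || len >= outsz)
		return 0;
	for (k = 0; k < len; k++)
		out[k] = line[i + k];
	out[len] = '\0';
	return len;
}
```

With this sketch, a line such as ".Nm<TAB>mandoc" yields the name "Nm",
just like ".Nm mandoc" does.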


# 1.62 16-Jul-2010 schwarze

Text ending in a full stop, exclamation mark or question mark
should not flag the end of a sentence if:

1) The punctuation is followed by closing delimiters
and not preceded by alphanumeric characters, like in
"There is no full stop (.) in this sentence"

or

2) The punctuation is a child of a macro
and not preceded by alphanumeric characters, like in
"There is no full stop
.Pq \&.
in this sentence"

jmc@ and sobrado@ like this
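
Case 1 of 1.62 might be sketched like this. The set of closing
delimiters and the name word_ends_sentence() are assumptions made for
the example; case 2 (punctuation as a macro child) needs parser context
and is not modelled here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed set of closing delimiters; the real list may differ. */
static int
is_closer(char c)
{
	return c == ')' || c == ']' || c == '"' || c == '\'';
}

static int
is_eos_punct(char c)
{
	return c == '.' || c == '!' || c == '?';
}

/* Return 1 if the word p of length sz ends a sentence, else 0. */
int
word_ends_sentence(const char *p, size_t sz)
{
	int	 enclosed = 0;

	assert(p != NULL);

	/* Step back over trailing closing delimiters. */
	while (sz > 0 && is_closer(p[sz - 1])) {
		enclosed = 1;
		sz--;
	}
	if (sz == 0 || !is_eos_punct(p[sz - 1]))
		return 0;

	/* Bare punctuation inside delimiters, as in "(.)", is no EOS. */
	if (enclosed)
		return sz >= 2 && isalnum((unsigned char)p[sz - 2]);
	return 1;
}
```

Under these assumptions, "(.)" does not end a sentence, while
"sentence." and "end.)" both do.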


# 1.61 13-Jul-2010 schwarze

Merge release 1.10.4 (all code by kristaps@), providing four new features:
1) Proper .Bk support: allow output line breaks at input line breaks,
but keep input lines together in the output, finally fixing
synopses like aucat(1), mail(1) and tmux(1).
2) Mostly finished -Tps (PostScript) output.
3) Implement -Thtml output for .Nm blocks and .Bk -words.
4) Allow iterative interpolation of user-defined roff(7) strings.
Also contains some minor bugfixes and some performance improvements.


# 1.60 01-Jul-2010 schwarze

In the mdoc(7) parser, inspect roff registers early such that all parts
of the parser can use the resulting cues. In particular, this allows
to use .nr nS to force SYNOPSIS-style .Nm indentation outside the
SYNOPSIS as needed by ifconfig(8).

To actually make this useable, .Pp must rewind .Nm, or the rest of the
section would end up indented. Implement a quick hack for now,
a generic solution can be designed later.

ok kristaps@ sobrado@


# 1.59 29-Jun-2010 schwarze

Support for badly nested blocks, written around the time of
the Rostock mandoc hackathon and tested and polished since,
supporting constructs like:

.Ao Bo Ac Bc (exp breaking exp)
.Aq Bo eol Bc (imp breaking exp)
.Ao Bq Ac eol (exp breaking imp)
.Ao Bo So Bc Ac Sc (double break, inner before outer)
.Ao Bo So Ac Bc Sc (double break, outer before inner)
.Ao Bo Ac So Bc Sc (broken breaker)
.Ao Bo So Bc Do Ac Sc Dc (broken double breaker)

There are still two known issues which are tricky:

1) Breaking two identical explicit blocks (Ao Bo Bo Ac or Aq Bo Bo eol)
fails outright, triggering a bogus syntax error.
2) Breaking a block by two identical explicit blocks (Ao Ao Bo Ac Ac Bc
or Ao Ao Bq Ac Ac eol) still has a minor rendering error left:
"<ao1 <ao2 [bo ac2> ac1> bc]>" should not have the final ">".

We can fix these later in the tree, let's not grow this diff too large.

"get it in" kristaps@


# 1.58 27-Jun-2010 schwarze

Full .nr nS support, unbreaking the kernel manuals.

Kristaps coded this from scratch after reading my .nr patch;
it is simpler and more powerful.

Registers live in struct regset in regs.h, struct man and struct mdoc
contain pointers to it. The nS register is cleared when parsing .Sh.
Frontends respect the MDOC_SYNPRETTY flag set in mdoc node_alloc.


# 1.57 26-Jun-2010 schwarze

merge release 1.10.2
* bug fixes:
- interaction of ASCII_HYPH with special chars (found by Ulrich Spoerlein)
- handling of roff conditionals (found by Ulrich Spoerlein)
- .Bd -offset will no more default to 6n
* maintenance:
- more caching of .Bd and .Bl arguments for efficiency
- deconstify man(7) validation routines
- add FreeBSD library names (provided by Ulrich Spoerlein)
* start PostScript font-switching


# 1.56 06-Jun-2010 schwarze

Merge bsd.lv version 1.10.1 (to be released soon).

The main step forward is that this now has *much* better .Bl -column
support, now supporting many manuals that previously errored out
without producing any output.

Other fixes include:
* do not die from multiple list types, use the first and warn
* in .Bl without a type, default to -item
* various tweaks to .Dt
* fix .In, .Fd, .Ft, .Fn and .Fo formatting
* some documentation fixes and additions
* and fix a couple of bugs reported by Ulrich Spoerlein:
* better support for roff block-end "\}" without a preceding dot
* .In must not break the line outside SYNOPSIS
* spelling in some error messages

While merging, fix one regression in .In spacing
that needs to go to bsd.lv, too.


# 1.55 26-May-2010 schwarze

When a word does not fully fit onto the output line, but it contains
at least one hyphen, we already had support for breaking the line at the
last fitting hyphen. This patch improves this functionality by only
breaking at hyphens in free-form text, and by not breaking at hyphens
* at the beginning or end of a word or
* immediately preceded or followed by another hyphen or
* escaped by a preceding backslash.

Before this patch, differences in break-at-hyphen support were one
of the major sources of noise in automatic comparisons to mdoc(7)
groff output. Now, the remaining differences are hard to find among
the noise coming from other sources.

Where there are still differences, what we do seems to be better than
what groff does, see e.g. the chio(1) exchange and position commands
for one of the now rare examples.

idea and coding by kristaps@

Besides, this was the last substantial code difference left
between bsd.lv and openbsd.org. We are now in full sync.
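
The three hyphen rules listed in 1.55 can be written down as a small
predicate. This is a sketch only: can_break_at() is a hypothetical name,
and the restriction to free-form text is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the line may be broken after the hyphen at word[i]. */
int
can_break_at(const char *word, size_t i)
{
	size_t	 len;

	assert(word != NULL);
	len = strlen(word);

	if (i >= len || word[i] != '-')
		return 0;
	/* Not at the beginning or end of the word. */
	if (i == 0 || i + 1 == len)
		return 0;
	/* Not immediately preceded or followed by another hyphen. */
	if (word[i - 1] == '-' || word[i + 1] == '-')
		return 0;
	/* Not escaped by a preceding backslash. */
	if (word[i - 1] == '\\')
		return 0;
	return 1;
}
```

A formatter could scan a word that does not fit from right to left and
break at the last index for which this predicate holds and the prefix
still fits on the output line.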


# 1.54 23-May-2010 schwarze

Unified error and warning message system for all of mandoc,
featuring three message levels, as agreed during the mandoc hackathon:
* FATAL parser failure, cannot produce any output from this input file:
eventually, we hope to convert most of these to ERRORs.
* ERROR, meaning mandoc cannot cope fully with the input syntax and will
probably lose information or produce structurally garbled output;
it will try to produce output anyway but exit non-zero at the end,
which is eventually intended to make the ports infrastructure happy.
* WARNING, meaning you should clean up the input file, but output
is probably mostly OK, so this will not cause error-exit at the end.
This commit is mostly just converting the old system to the new one; before
the classification will become really reliable, we must check all messages.

In particular,
* set up a new central message string table in main.c
* drop the old message string tables from man.c and mdoc.c
* get rid of the piece-meal merr enums in libman and libmdoc
* reduce number of error/warning functions from 16 to 6 (still a lot...)

While here, handle a few problems more gracefully:
* allow .Rv and .Ex to work without a prior .Nm
* allow .An to ignore extra arguments
* allow undeclared columns in .Bl -column

Written by kristaps@.


# 1.53 20-May-2010 schwarze

Support nested roff instructions:
* allow roff_parseln() to be re-run
* allow roff_parseln() to manipulate the line buffer offset
* support the offset in the man and mdoc libraries
* adapt .if, .ie, .el, .ig, .am* and .de* support
* interpret some instructions even in conditional-negative context
Coded by kristaps during the last day of the mandoc hackathon.

To avoid regressions in the OpenBSD tree, commit this together
with some small local additions:
* detect roff block end "\}" even on macro lines
* actually implement the ".if n" conditional
* ignore .ds, .rm and .tr in libroff

Also back my old .if/.ie/.el-handling out of libman, reverting:
man.h 1.15 man.c 1.25 man_macro.c 1.15 man_validate.c 1.19
man_action.c 1.15 man_term.c 1.28 man_html.c 1.9.


# 1.52 16-May-2010 schwarze

Rewrite the main mdoc text parser, mdoc_ptext()
to make it easier to understand and to fix various bugs:
* strip white space from the end of MDOC_TEXT elements in literal mode
* in literal mode, a line may be blank even when containing tabs
* escaped backslashes do not escape following characters
ok kristaps@
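
The third item of 1.52 amounts to counting backslashes: an escaped
backslash is a complete pair and must not escape the character after it.
A minimal sketch, with the hypothetical helper is_escaped() and the
assumption that i is a valid index into s:

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the character at index i is escaped, 0 otherwise. */
int
is_escaped(const char *s, size_t i)
{
	size_t	 nback = 0;

	assert(s != NULL);

	/* Count the run of backslashes immediately before s[i]. */
	while (nback < i && s[i - 1 - nback] == '\\')
		nback++;

	/* Pairs cancel out: only an odd run escapes s[i]. */
	return nback % 2 == 1;
}
```

For the input text a\-b the hyphen counts as escaped, whereas in a\\-b
the two backslashes form one escaped backslash and the hyphen is plain.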


# 1.51 16-May-2010 schwarze

allow the single quote as a control character in place of the dot
at all relevant places;
from kristaps@


# 1.50 15-May-2010 schwarze

allow non-numeric manual sections in -mdoc;
while here, allow LIBRARY in section 9;
by kristaps@


# 1.49 15-May-2010 schwarze

various improvements regarding errors and warnings from Joerg Sonnenberger:
* If the last -column .Bl isn't specified, it is auto-sized.
* An invalid .St argument should be a warning, not an error.
Just put the argument into the output.
* An invalid .At argument should be a warning, not an error.
Just print the argument, like new groff does.
* Remove warnings concerning manual section (like 1, 6, 8).
It was only used for .Ex and not really useful.
* Remove warnings concerning page section (like SYNOPSIS).
These were only used for .Fd and .Lb and not really useful.


# 1.48 14-May-2010 schwarze

Integrate kristaps@' end-of-sentence (EOS) framework
which is simpler and more powerful than mine, and remove mine.

* man(7) now has EOS handling, too
* put EOS detection into its own function in libmandoc
* use node and termp flags to communicate the EOS condition
* no more EOS pseudo-macro
* no more non-printable EOS marker character on the formatter level

This slightly breaks EOS detection after trailing punctuation
in mdoc(7) macros, but that will be restored soon.


# 1.47 14-May-2010 schwarze

Merge 1.9.25, keeping local patches;
this does not merge kristaps' end-of-sentences handling yet,
i will check that separately. This one includes:
* handle \*(Ba as a delimiter
* introduce ARGS_PEND for .Bl -column .It end-of-line special casing
* section ordering: expect EXIT STATUS at the right place
* line break fixes in SYNOPSIS
* allow literal contexts to have arbitrary line lengths
* the input file column number can not be used to identify the beginning
of a line because white space is allowed after the initial '.'
* proper leading spaces in -man -Tascii mode
* do not let Lb break lines in -mdoc -Thtml LIBRARY


# 1.46 14-May-2010 schwarze

merge 1.9.24, keeping local patches; some changes:
* preserve multiple consecutive space characters in input
* do not restrict .Cd and .Rv to certain sections (requested by Joerg)
* do not run lookup() on quoted words
* enum return types for mdoc_args and mdoc_argv
* fix auto-closing of LINK tag in -Txhtml (from Daniel Friesel)
* various lint and manual fixes


# 1.45 08-May-2010 schwarze

merge bsd.lv rev 1.123:
sync mdoc.c's static function names with man.c


# 1.44 08-May-2010 schwarze

handle text lines beginning with \." as comments, like groff does,
even though this is not correct comment syntax (so warn, too)
reported by Claus Assmann on misc@, fix by kristaps@


# 1.43 04-May-2010 schwarze

end-of-sentence markers at the end of .Fn argument lists
ruin indentation of the next line in the SYNOPSIS section;
bug found by jacekm@ in err(3)


# 1.42 27-Apr-2010 schwarze

Fix a subtle bug noticed by naddy@ in pftop(8), thanks!

When converting blank lines to .Pp outside literal context,
it could happen that the following node ended up as a child
of the .Pp element, but it must always be a sibling.


# 1.41 22-Apr-2010 schwarze

Fix a segfault reported by nicm@, introduced in rev. 1.38.
When finding a blank line, trying to parse it is a bad idea.
Instead, after adding .Pp to the AST, just return from parsetext().


# 1.40 07-Apr-2010 schwarze

Merge the good parts of 1.9.23,
avoid the bad parts of 1.9.23, and keep local patches.

Input in general:
* Basic handling of roff-style font escapes \f, \F.
* Quoted punctuation does not count as punctuation.

mdoc(7) parser:
* Make .Pf callable; noted by Claus Assmann.
* Let .Bd and .Bl ignore unknown arguments; noted by deraadt@.
* Do not warn when .Er is used outside certain sections.
* Replace mdoc_node_free[list] by mdoc_node_delete.
* Replace #define by enum for rew*() return values.

man(7) parser:
* When .TH is missing, use default section and date.

Output in general:
* Curly braces do not count as punctuation.
* No space after .Fl w/o args when a macro follows on the same line.

HTML output:
* Unify PAIR_*_INIT macros, introduce new PAIR_ID_INIT().
* Print whitespace after, not before .Vt .Fn .Ft .Fo.

Checked that all manuals in base still build.


# 1.39 04-Apr-2010 schwarze

When the prologue lacks required information, do not error out,
but warn, set up some default values, and prod on.
Unbreaking the ports build for textproc/sgmlformat;
reported by naddy@, thanks.


# 1.38 03-Apr-2010 schwarze

* outside literal context in mdoc(7), handle blank lines like .Pp
* a missing NAME section in mdoc(7) need not be fatal

ok deraadt@


# 1.37 02-Apr-2010 schwarze

merge 1.9.22, keeping local patches
* convert mdoc tokens from #define to enum
* fix a segfault with .Xo/.Xc in explicit blocks
* Thorn is \*(Th, not \*(TH; noticed by Joerg Sonnenberger


# 1.36 25-Mar-2010 schwarze

fix a stupid out-of-bounds read access introduced in the previous
revision, in the code searching for the end of a sentence


Revision tags: OPENBSD_4_7_BASE
# 1.35 02-Mar-2010 schwarze

Proper inter-sentence spacing for mdoc(7).
When a text line or a non-block macro line in the source code ends
in any of ".!?", consider that an end of sentence (EOS).
This makes Jason's rule "new sentence, new line" even more important.
Let the parser detect the EOS and insert a token into the AST.
Let the -Tascii frontend render the EOS token as a double space before
the next word.


# 1.34 18-Feb-2010 schwarze

sync to release 1.9.15:
* corrected .Vt handling (spotted by Joerg Sonnenberger)
* corrected .Xr argument handling (based on my patch)
* removed \\ escape sequence (because it is for low-level roff only)
* warn about trailing whitespace (suggested by jmc@)
* -Txhtml support
* and some general cleanup and doc improvements


# 1.33 02-Jan-2010 schwarze

complete the sync to 1.9.15-pre2: mostly minor fixes
* bugfix: do not restore TERMP flags when leaving lists, just reset them
* and a few HTML fixes
* clarity: return width from a2width, not width+2, and adapt to it
* manual: document .Bl and .Fl
* portability: no need to escape '%' in macro names


# 1.32 22-Dec-2009 schwarze

sync to 1.9.12, mostly portability and refactoring:

correctness/functionality:
- bugfix: do not die when overstep hits the right margin
- new option: -fign-escape
- and various HTML features

portability:
- replace bzero(3) by memset(3), which is ANSI C
- replace err(3)/warn(3) by perror(3)/exit(3), which is ANSI C
- use argv[0] instead of __progname
- add time.h to various files for FreeBSD compilation

simplicity:
- do not allocate header/footer data dynamically in *_term.c
- provide and use malloc frontends that error out on failure

for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/
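
As an illustration of the portability and simplicity items in 1.32, an
allocation frontend that errors out on failure could look roughly like
this. The name xmalloc_zeroed() is hypothetical; it only combines the
memset(3) and perror(3)/exit(3) replacements named above and is not the
function actually added to mandoc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate size bytes of zeroed memory, or report the error and exit. */
void *
xmalloc_zeroed(size_t size)
{
	void	*p;

	assert(size > 0);
	p = malloc(size);
	if (p == NULL) {
		perror("malloc");	/* instead of err(3) */
		exit(EXIT_FAILURE);
	}
	memset(p, 0, size);		/* instead of bzero(3) */
	return p;
}
```

Callers of such a frontend never need to check for NULL, which is where
the simplification comes from.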


# 1.31 27-Oct-2009 schwarze

sync to 1.9.11: adapt printing of dates to groff conventions,
NetBSD portability fixes and some minor bugfixes and feature enhancements;
also checked that my hyphenation code still works on top of this


# 1.30 21-Oct-2009 schwarze

sync to 1.9.9, featuring:
* -Thtml output mode
* roff scaling units
* and some minor fixes
for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/


# 1.29 19-Oct-2009 schwarze

sync to 1.9.6: multiple improvements to references (.Rs)
* validate and order .Rs child nodes
* underline book title (.%B) and issuer (.%I)
* enclose title of article (.%T) in quotes
* avoid calling mdoc_verr directly, use a proper error code instead


# 1.28 19-Oct-2009 schwarze

sync to 1.9.6: u_char lives in <sys/types.h>
noticed by uqs at spoerlein dot net on FreeBSD,
where <stdlib.h> does not include <sys/types.h>


# 1.27 21-Sep-2009 schwarze

sync to 1.9.5: lookup hashes are now static tables
shortening the code, and, according to kristaps@, speeding it up


# 1.26 18-Sep-2009 schwarze

sync to 1.9.2: non-printable characters in macro names are errors;
from joerg at netbsd dot org


# 1.25 22-Aug-2009 schwarze

sync to 1.9.1: set mdoc_next flags in mdoc_*_alloc routines, where they belong


# 1.24 22-Aug-2009 schwarze

sync to 1.9.0: polishing the core code of mdoc macro handling
1) If a macro is not parsed, do not parse it. Of course, without
parsing it, we cannot produce "macro-like parameter" warnings,
but these were useless anyway.
2) If a macro is not callable, do not print a useless warning when
it occurs as a parameter, just display the raw characters.
3) Below .Bl -column, check whether macros are callable.
4) Like groff, allow whitespace after the initial dot on macro lines.


# 1.23 22-Aug-2009 schwarze

sync to 1.8.5: Error string is now file:line:col: message.
Fixed column reporting (off by one).
Use fprintf instead of warnx for parse errors (like cc).


# 1.22 26-Jul-2009 schwarze

sync to 1.8.1: rewrite quoted literal handling correctly,
rewrite TABSEP handling in a simpler way,
and retire ECOLEMPTY, ARGS_QUOTED and ARGS_ARGVLIKE


# 1.21 26-Jul-2009 schwarze

sync to 1.8.1: removed excessively verbose EARGVPARM warning


# 1.20 26-Jul-2009 schwarze

sync to 1.8.1: support .br and .sp


# 1.19 26-Jul-2009 schwarze

sync to 1.8.1: libmdoc now breaks up free-form lines into tokens;
will simplify LITERAL mode in front-end


# 1.18 18-Jul-2009 schwarze

sync to 1.8.0: move mdoc_a2att, mdoc_a2st, and mdoc_a2lib to libmdoc


# 1.17 12-Jul-2009 schwarze

sync to 1.7.23: pass warning code to mdoc_pwarn() instead of warning message
define additional warning macro mdoc_nwarn()
remove obsolete warning functions mdoc_warn(), pwarn(), vwarn(), nwarn()
remove various now unused "enum mdoc_warn" and "enum mwarn"


# 1.16 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_perr() instead of error string
and use the so improved mdoc_nerr() at many places;
get rid of now unused static functions perr()


# 1.15 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_nerr() instead of error string
and use the so improved mdoc_nerr() at many places


# 1.14 12-Jul-2009 schwarze

sync to 1.7.23: unify the various "enum merr" into libman.h and libmdoc.h,
use it as a new argument to mdoc_err(), the same way as for man_err(),
and use string tables instead of switch statements to select error messages


# 1.13 12-Jul-2009 schwarze

sync to 1.7.23: third step to get rid of enum mdoc_warn:
mdoc_verr is not using enum mdoc_warn, so use it at a few more places


# 1.12 12-Jul-2009 schwarze

sync to 1.7.23: second step to get rid of enum mdoc_warn:
remove type from mdoc_vwarn arguments, and use this function where appropriate


# 1.11 12-Jul-2009 schwarze

sync to 1.7.23: first step to get rid of enum mdoc_warn:
unify manwarn() and mdocwarn() into mwarn()


Revision tags: OPENBSD_4_6_BASE
# 1.10 23-Jun-2009 schwarze

sync to 1.7.20: like for the -man case, add an nchild counter to the -mdoc
nodes, simplifying the validation code; no functional change


# 1.9 19-Jun-2009 schwarze

sync to 1.7.19: more elegant section handling


# 1.8 18-Jun-2009 schwarze

sync to 1.7.19: comment cleanup; no functional change


# 1.7 18-Jun-2009 schwarze

sync to 1.7.19: improved comment handling


# 1.6 18-Jun-2009 schwarze

complete sync to 1.7.17: garbage collect unused functions
mdoc_msg, mdoc_pmsg, mdoc_vmsg, and mdoc_nwarn


# 1.5 15-Jun-2009 schwarze

bring back miod@'s "real functions" patch (rev. 1.2)
which was erroneously backed out in rev. 1.4, sorry;
ok kristaps@


# 1.4 15-Jun-2009 schwarze

sync to 1.7.16:
reduce code duplication in warning and error reporting functions
while here, garbage collect three unused function prototypes


# 1.3 14-Jun-2009 schwarze

sync to 1.7.16: comments, whitespace and spelling fixes; no functional change


# 1.2 15-Apr-2009 miod

Replace variadic macros with real functions, so that this compiles on
platforms still using gcc 2.
ok deraadt@
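
The kind of change made in 1.2 can be illustrated as follows. Both the
macro in the comment and the function fmt_msg() are invented for this
example; the actual macros replaced by miod@ are not shown in this log.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * A variadic macro might have looked like this (hypothetical):
 *	#define fmt_msg(buf, sz, ...) snprintf(buf, sz, __VA_ARGS__)
 * The real function below does the same job using <stdarg.h> only.
 */
int
fmt_msg(char *buf, size_t sz, const char *fmt, ...)
{
	va_list	 ap;
	int	 n;

	assert(buf != NULL && fmt != NULL);
	va_start(ap, fmt);
	n = vsnprintf(buf, sz, fmt, ap);
	va_end(ap);
	return n;
}
```

Call sites stay unchanged, since fmt_msg(buf, sizeof(buf), "%s:%d",
file, line) reads the same for the macro and for the function.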


# 1.1 06-Apr-2009 kristaps

Initial check-in of mandoc for formatting manuals. ok deraadt@


Revision tags: OPENBSD_6_2_BASE
# 1.157 11-Aug-2017 schwarze

Make the "new sentence, new line" check stricter, allowing digits
in the last two letters of the last word of the sentence.
No false positives in base or Xenocara.
Suggested by and OK jmc@.


# 1.156 17-Jun-2017 schwarze

correct handling of blank lines after \c


# 1.155 07-Jun-2017 schwarze

Also catch "new sentence, new line" if there are three blanks
between the sentences. Thomas Klausner says he has seen some
of these, and i don't see any false positives.


# 1.154 07-Jun-2017 schwarze

Make "new sentence, new line" detection stricter:
Also catch cases where the new sentence starts with a one-letter word
and the input line is broken right after that word.
Suggested by Thomas Klausner <wiz @ NetBSD>.

It's merely a three-bit diff, changing one byte from 0x34 to 0x33,
so what can possibly go wrong...
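
The "three-bit" remark is easy to check: 0x34 XOR 0x33 is 0x07, which
has three bits set. A tiny sketch with the hypothetical helper
bits_changed():

```c
#include <assert.h>

/* Count the bits that differ between two bytes. */
int
bits_changed(unsigned char a, unsigned char b)
{
	unsigned int	 x = (unsigned int)(a ^ b);
	int		 n = 0;

	while (x != 0) {
		n += (int)(x & 1u);
		x >>= 1;
	}
	assert(n >= 0 && n <= 8);	/* a byte has eight bits */
	return n;
}
```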


# 1.153 05-May-2017 schwarze

Move .sp to the roff modules. Enough infrastructure is in place
now that this actually saves code: -70 LOC.


# 1.152 29-Apr-2017 schwarze

Parser unification: use nice ohashes for all three request and macro tables;
no functional change, minus two source files, minus 200 lines of code.


# 1.151 24-Apr-2017 schwarze

Continue parser unification:
* Make enum rofft an internal interface as enum roff_tok in "roff.h".
* Represent mdoc and man macros in enum roff_tok.
* Make TOKEN_NONE a proper enum value and use it throughout.
* Put the prologue macros first in the macro tables.
* Unify mdoc_macroname[] and man_macroname[] into roff_name[].


Revision tags: OPENBSD_6_1_BASE
# 1.150 03-Mar-2017 schwarze

remove a few redundant conditions that jsg@ found with cppcheck


# 1.149 16-Feb-2017 schwarze

Remove the ENDBODY_NOSPACE flag, simplifying the code.

Comparing to groff output, it appears that all cases where it was used
and made a difference actually require the opposite, ENDBODY_SPACE.

I have no idea why i added it back in 2010; maybe to compensate for
some other bug that has long been fixed.


# 1.148 28-Jan-2017 schwarze

Add a warning "new sentence, new line".
This does not attempt to pinpoint each and every offender, but
instead tries very hard to avoid false positives: Currently, there
are only two false positives in the whole OpenBSD base system.
Only do this in mdoc(7), not in man(7), because manuals written
in man(7) typically have much worse problems than this.
OK jmc@ on a previous version of the patch


# 1.147 10-Jan-2017 schwarze

unify names of AST node flags; no change of cpp output


# 1.146 20-Aug-2016 schwarze

If a column list starts with implicit rows (that is, rows without .It)
and roff-level nodes (e.g. tbl or eqn) follow, don't run into an
assertion. Instead, wrap the roff-level nodes in their own row.
Issue found by tb@ with afl(1).


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.145 30-Oct-2015 schwarze

If a .Bd block has no arguments at all, drop the block and only keep
its contents. Removing a gratuitous difference to groff output
found after a related bug report from krw@.


# 1.144 20-Oct-2015 schwarze

In order to become able to generate syntax tree nodes on the roff(7)
level, validation must be separated from parsing and rewinding.
This first big step moves calling of the mdoc(7) post_*() functions
out of the parser loop into their own mdoc_validate() pass, while
using a new mdoc_state() module to make syntax tree state handling
available to both the parser loop and the validation pass.


# 1.143 12-Oct-2015 schwarze

To make the code more readable, delete 283 /* FALLTHROUGH */ comments
that were right between two adjacent case statements. Keep only
those 24 where the first case actually executes some code before
falling through to the next case.
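
The distinction drawn in 1.143 looks like this in practice. The function
punct_weight() and its values are made up purely to show the two kinds
of fall-through:

```c
#include <assert.h>

/* Toy classifier, only meant to demonstrate the comment style. */
int
punct_weight(int c)
{
	int	 w = 0;

	switch (c) {
	case '.':
	case '!':
	case '?':
		/* Adjacent labels share one body: no comment needed. */
		w = 2;
		break;
	case ';':
		w++;
		/* FALLTHROUGH */
	case ',':
		/* Reached directly for ',' and by falling through for ';'. */
		w++;
		break;
	default:
		break;
	}
	assert(w >= 0 && w <= 2);
	return w;
}
```

Only the ';' case executes code before falling through, so only that
spot keeps the comment.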


# 1.142 06-Oct-2015 schwarze

modernize style: "return" is not a function; ok cmp(1)


Revision tags: OPENBSD_5_8_BASE
# 1.141 23-Apr-2015 schwarze

Unify mdoc_deroff() and man_deroff() into a common function deroff().
No functional change except that for mdoc(7), it now skips leading
escape sequences just like it already did for man(7).
Escape sequences rarely occur in mdoc(7) code and if they do,
skipping them is an improvement in this context.
Minus 30 lines of code.


# 1.140 23-Apr-2015 schwarze

Get rid of two empty wrapper functions. No functional change.


# 1.139 19-Apr-2015 schwarze

Unify trickier node handling functions.
* man_elem_alloc() -> roff_elem_alloc()
* man_block_alloc() -> roff_block_alloc()
The functions mdoc_elem_alloc() and mdoc_block_alloc() remain for
now because they need to do mdoc(7)-specific argument processing.


# 1.138 19-Apr-2015 schwarze

Unify some node handling functions that use TOKEN_NONE.
* mdoc_word_alloc(), man_word_alloc() -> roff_word_alloc()
* mdoc_word_append(), man_word_append() -> roff_word_append()
* mdoc_addspan(), man_addspan() -> roff_addtbl()
* mdoc_addeqn(), man_addeqn() -> roff_addeqn()
Minus 50 lines of code, no functional change.


# 1.137 19-Apr-2015 schwarze

Decouple the token code for "no request or macro" from the individual
high-level parsers to allow further unification of functions that
only need to recognize this code, but that don't care about different
high-level macrosets beyond that.


# 1.136 19-Apr-2015 schwarze

Unify node handling functions:
* node_alloc() for mdoc and man_node_alloc() -> roff_node_alloc()
* node_append() for mdoc and man_node_append() -> roff_node_append()
* mdoc_head_alloc() and man_head_alloc() -> roff_head_alloc()
* mdoc_body_alloc() and man_body_alloc() -> roff_body_alloc()
* mdoc_node_unlink() and man_node_unlink() -> roff_node_unlink()
* mdoc_node_free() and man_node_free() -> roff_node_free()
* mdoc_node_delete() and man_node_delete() -> roff_node_delete()
Minus 130 lines of code, no functional change.


# 1.135 18-Apr-2015 schwarze

Delete the wrapper functions mdoc_meta(), man_meta(), mdoc_node(),
man_node() from the mandoc(3) semi-public interface and the internal
wrapper functions print_mdoc() and print_man() from the HTML formatters.
Minus 60 lines of code, no functional change.


# 1.134 18-Apr-2015 schwarze

Unify {mdoc,man}_{alloc,reset,free}() into roff_man_{alloc,reset,free}().
Minus 80 lines of code, no functional change.
Written on the train from Koeln to Wolfsburg returning from p2k15.


# 1.133 18-Apr-2015 schwarze

Move mdoc_hash_init() and man_hash_init() to libmandoc.h
and call them from mparse_alloc() and choose_parser(),
preparing unified allocation of struct roff_man.


# 1.132 18-Apr-2015 schwarze

Profit from the unified struct roff_man and reduce the number of
arguments of mparse_result() by one. No functional change.
Written on the ICE Bruxelles-Koeln on the way back from p2k15.


# 1.131 18-Apr-2015 schwarze

Replace the structs mdoc and man by a unified struct roff_man.
Almost completely mechanical, no functional change.
Written on the train from Exeter to London returning from p2k15.


# 1.130 02-Apr-2015 schwarze

Third step towards parser unification:
Replace struct mdoc_meta and struct man_meta by a unified struct roff_meta.
Written on the train from London to Exeter on the way to p2k15.


# 1.129 02-Apr-2015 schwarze

Second step towards parser unification:
Replace struct mdoc_node and struct man_node by a unified struct roff_node.
To be able to use the tok member for both mdoc(7) and man(7) without
defining all the macros in roff.h, sacrifice a tiny bit of type safety
and make tok an int rather than an enum.
Almost mechanical, no functional change.
Written on the Eurostar from Bruxelles to London on the way to p2k15.


# 1.128 02-Apr-2015 schwarze

First step towards parser unification:
Replace enum mdoc_type and enum man_type by a unified enum roff_type.
Almost mechanical, no functional change.
Written on the ICE train from Frankfurt to Bruxelles on the way to p2k15.


Revision tags: OPENBSD_5_7_BASE
# 1.127 12-Feb-2015 schwarze

Do not confuse .Bl -column lists that just broke another block
with newly opened .Bl -column lists;
fixing an assertion failure jsg@ found with afl:
test case #481, Bl It Bl -column It Bd El text text El


# 1.126 12-Feb-2015 schwarze

Delete the mdoc_node.pending pointer and the function calculating
it, make_pending(), which was the most difficult function of the
whole mdoc(7) parser. After almost five years of maintaining this
hellhole, i just noticed the pointer isn't needed after all.

Blocks are always rewound in the reverse order they were opened;
that even holds for broken blocks. Consequently, it is sufficient
to just mark broken blocks with the flag MDOC_BROKEN and breaking
blocks with the flag MDOC_ENDED. When rewinding, instead of iterating
the pending pointers, just iterate from each broken block to its
parents, rewinding all that are MDOC_ENDED and stopping after
processing the first ancestor that is not MDOC_BROKEN. For ENDBODY
markers, use the mdoc_node.body pointer in place of the former
mdoc_node.pending.

This also fixes an assertion failure found by jsg@ with afl,
test case #467 (Bo Bl It Bd Bc It), where (surprise surprise)
the pending pointer got corrupted.

Improved functionality, minus one function, minus one struct field,
minus 50 lines of code.
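
The rewinding walk described in this revision can be sketched roughly as
follows. This is a minimal illustration, not the code from mdoc_macro.c:
struct rw_node, the RW_* flags and rw_rewind_pending() are hypothetical
names, and "rewinding" is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags and node layout, for illustration only. */
#define RW_BROKEN (1 << 0)	/* block was broken by another block */
#define RW_ENDED  (1 << 1)	/* block already saw its closing macro */
#define RW_CLOSED (1 << 2)	/* set by this sketch when rewinding */

struct rw_node {
	struct rw_node	*parent;
	int		 flags;
};

/*
 * Walk from a broken block towards the root.  Rewind every ancestor
 * that is already ended, and stop after processing the first
 * ancestor that is not itself broken.
 * Returns the number of ancestors that were rewound.
 */
int
rw_rewind_pending(struct rw_node *n)
{
	int	 rewound = 0;

	assert(n != NULL);
	for (n = n->parent; n != NULL; n = n->parent) {
		if (n->flags & RW_ENDED) {
			n->flags |= RW_CLOSED;
			rewound++;
		}
		if ((n->flags & RW_BROKEN) == 0)
			break;
	}
	return rewound;
}
```

Because only the parent chain is followed, no extra pointer per node is
needed, which is the point of dropping the pending pointer.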


# 1.125 05-Feb-2015 schwarze

Simplify by deleting the "lastline" member of struct mdoc_node.
Minus one struct member, minus 17 lines of code, no functional change.


# 1.124 02-Feb-2015 schwarze

Get rid of all calls to rew_sub() in blk_exp_close(); only ten calls
remain in other functions. As a bonus, this fixes an assertion failure
jsg@ found some time ago with afl (test case 982) and improves minor
details in error reporting.


# 1.123 15-Jan-2015 schwarze

Fatal errors no longer exist.
If a file can be opened, mandoc will produce some output;
at worst, the output may be almost empty.
Simplifies error handling and frees a message type for future use.


# 1.122 28-Nov-2014 schwarze

Simplify by making the eqn and tbl steering functions void;
no functional change, minus 15 lines of code.


# 1.121 28-Nov-2014 schwarze

Simplify by making the mdoc parser callbacks void, and some cleanup;
no functional change, minus 50 lines of code.


# 1.120 28-Nov-2014 schwarze

Simplify the code by making various mdoc parser helper functions void.
No functional change, minus 130 lines of code.


# 1.119 28-Nov-2014 schwarze

Simplify code by making mdoc validation handlers void.
No functional change, minus 90 lines of code.


# 1.118 19-Nov-2014 schwarze

Escape sequences terminate high-level macro names, and when doing so,
they are ignored, just in the same way as for request names
and for low-level macro names.
This also cures a warning in the pod2man(1) preamble.


# 1.117 20-Oct-2014 schwarze

correct the spacing after in-line equations
that start at the beginning of an input line
but end before the end of an input line


# 1.116 20-Oct-2014 schwarze

correct spacing before inline equations


# 1.115 16-Oct-2014 schwarze

Implement in-line equations, much needed by Xenocara manuals.
Put the steering into the roff parser rather than into the mdoc
parser such that it works for all macro languages and on both text
and macro lines.
Line breaks and blank characters generated before and after in-line
equations are not perfect yet, but let's do one thing at a time.


# 1.114 06-Sep-2014 schwarze

Simplify by handling empty request lines at the one logical place
in the roff parser instead of in three other places in other parsers.
No functional change.


# 1.113 08-Aug-2014 schwarze

Bring the handling of defective prologues even closer to groff,
in particular relaxing the distinction between prologue and body
and further improving messages.
* The last .Dd wins and the last .Os wins, even in the body.
* The last .Dt before the first body macro wins.
* Missing title in .Dt defaults to UNTITLED. Warn about it.
* Missing section in .Dt does not default to 1. But warn about it.
* Do not warn multiple times about the same mdoc(7) prologue macro.
* Warn about missing .Os.
* Incomplete .TH defaults to empty strings. Warn about it.


# 1.112 08-Aug-2014 schwarze

mention requests and macros in more messages


# 1.111 08-Aug-2014 schwarze

Simplify: replace one global flag by one local variable
and remove three unused global flags. No functional change.


Revision tags: OPENBSD_5_6_BASE
# 1.110 09-Jul-2014 schwarze

mark defos as const; nobody needs to change it,
and it is occasionally useful to be able to pass literal strings


# 1.109 07-Jul-2014 schwarze

no need to skip content before first section header


# 1.108 06-Jul-2014 schwarze

Clean up messages related to plain text and to escape sequences.
* Mention invalid escape sequences and string names, and fallbacks.
* Hierarchical naming.


# 1.107 02-Jul-2014 schwarze

Implement the obsolete macros .En .Es .Fr .Ot for backward compatibility,
since this is hardly more complicated than explicitly ignoring them
as we did in the past. Of course, do not use them!


# 1.106 01-Jul-2014 schwarze

Clean up the warnings related to document structure.
* Hierarchical naming of the related enum mandocerr items.
* Mention the offending macro, section title, or string.
While here, improve some wordings:
* Descriptive instead of imperative style.
* Uniform style for "missing" and "skipping".
* Where applicable, mention the fallback used.


# 1.105 20-Jun-2014 schwarze

Start systematic improvements of error reporting.
So far, this covers all WARNINGs related to the prologue.

1) hierarchical naming of MANDOCERR_* constants
2) mention the macro name in messages where that adds clarity
3) add one missing MANDOCERR_DATE_MISSING msg
4) fix the wording of one message related to the man(7) prologue

Started on the plane back from Ottawa.


# 1.104 25-Apr-2014 schwarze

Fix a minor optimization i broke in bsd.lv rev. 1.163 on August 20, 2010:
Do not bother looking into the hash table when the length of the macro
already tells us it's invalid. No functional change.
Noticed by jsg@, thanks!
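
A rough sketch of the length check mentioned above, assuming that valid
mdoc(7) macro names are two or three characters long. The lookup table,
mt_hash_find() and mt_macro_token() are hypothetical stand-ins, not the
real hash code; the counter only exists to show when the table is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MT_TOK_INVALID (-1)

/* Hypothetical stand-in for the real hash table; counts its lookups. */
static int mt_lookups;
static const char *const mt_names[] = { "Dd", "Dt", "Os", "Sh", "Nm", "Pp" };

int
mt_hash_find(const char *name)
{
	size_t	 i;

	mt_lookups++;
	for (i = 0; i < sizeof(mt_names) / sizeof(mt_names[0]); i++)
		if (strcmp(name, mt_names[i]) == 0)
			return (int)i;
	return MT_TOK_INVALID;
}

/*
 * Assumption: a name shorter than two or longer than three
 * characters can never be a macro, so skip the table lookup.
 */
int
mt_macro_token(const char *name)
{
	size_t	 len;

	assert(name != NULL);
	len = strlen(name);
	if (len < 2 || len > 3)
		return MT_TOK_INVALID;
	return mt_hash_find(name);
}
```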


# 1.103 20-Apr-2014 schwarze

KNF: case (FOO): -> case FOO, remove /* LINTED */ and /* ARGSUSED */,
remove trailing whitespace and blanks before tabs, improve some indenting;
no functional change


# 1.102 30-Mar-2014 schwarze

Implement the roff(7) .ll (line length) request.
Found by naddy@ in the textproc/enchant(1) port.
Of course, do not use this in new manuals.


# 1.101 23-Mar-2014 schwarze

If an .Nd block contains macros, avoid fragmented entries in mandocdb(8),
instead use the .Nd content recursively.
Improves a couple of index entries in base.


# 1.100 21-Mar-2014 schwarze

avoid repetitive code for asprintf error handling


# 1.99 21-Mar-2014 schwarze

The files mandoc.c and mandoc.h contained both specialised low-level
functions used for multiple languages (mdoc, man, roff), for example
mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
functions. Split the auxiliaries out into their own file and header.
While here, do some #include cleanup.


Revision tags: OPENBSD_5_5_BASE
# 1.98 05-Jan-2014 schwarze

Add an option -Q (quick) to mandocdb(8)
for accelerated generation of reduced-size databases.

Implement this by allowing the parsers to optionally
abort the parse sequence after the NAME section.

While here, garbage collect the unused void *arg attribute
of struct mparse and mparse_alloc().

This reduces the processing time of mandocdb(8) on /usr/share/man
by a factor of 2 and the database size by a factor of 4.
However, it still takes 5 times the time and 6 times the space
of makewhatis(8), so more work is clearly needed.


# 1.97 30-Dec-2013 schwarze

Simplify: Remove an unused argument from the mandoc_eos() function.
No functional change.


# 1.96 24-Dec-2013 schwarze

When deciding whether two consecutive macros are on the same input line,
we have to compare the line where the first one *ends* (not where it begins)
to the line where the second one starts.
This fixes the bug that .Bk allowed output line breaks right after block
macros spanning more than one input line, even when the next macro follows
on the same line.
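
The comparison described above, as a minimal sketch. struct ln_span and
ln_same_line() are hypothetical; the real node structure carries much more
than these two line numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node carrying only the line information needed here. */
struct ln_span {
	int	 line;		/* input line where the macro starts */
	int	 end_line;	/* input line where the macro ends */
};

/*
 * Nonzero if 'next' starts on the input line where 'prev' ends.
 * Comparing prev->line instead would be wrong for macros
 * spanning more than one input line.
 */
int
ln_same_line(const struct ln_span *prev, const struct ln_span *next)
{
	assert(prev != NULL && next != NULL);
	assert(prev->end_line >= prev->line);
	return prev->end_line == next->line;
}
```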


# 1.95 21-Oct-2013 schwarze

There are three kinds of input lines: text lines, macros taking
positional arguments (like Dt Fn Xr) and macros taking text as
arguments (like Nd Sh Em %T An). In the past, even the latter put
each word of their arguments into its own MDOC_TEXT node; instead,
concatenate arguments unless delimiters, keeps or spacing mode
prevent that. Regarding mandoc(1), this is internal refactoring,
no output change intended.

Once we will switch mandocdb(8) from DB to SQLite in the future,
this is going to be required to support search expressions crossing
word boundaries, and it will reduce both database sizes and build
times by a bit more than 5% each.


# 1.94 03-Oct-2013 schwarze

Support setting arbitrary roff(7) number registers,
preserving read support for the ".nr nS" SYNOPSIS state register;
read support for arbitrary registers is still not available.

Inspired by NetBSD roff.c rev. 1.18 (Christos Zoulas, March 21, 2013),
but implemented differently. I don't want to have yet another different
implementation of a hash table in mandoc - it would be the second one
in roff.c alone and the fifth one in mandoc grand total.
Instead, i designed and implemented roff_setreg() and roff_getreg()
to be similar to roff_setstrn() and roff_getstrn().

Once we feel the need to optimize, we can introduce one common
hash table implementation for everything in mandoc.
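
For illustration, a register store without a hash table might look like
the sketch below. It assumes a plain singly linked list of name/value
pairs; struct rg_reg, rg_set(), rg_get() and rg_free() are hypothetical
names and do not reproduce the signatures of roff_setreg() and
roff_getreg().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical register store: a singly linked list. */
struct rg_reg {
	struct rg_reg	*next;
	char		*name;
	int		 value;
};

/* Set 'name' to 'value', creating it if needed; 0 on allocation failure. */
int
rg_set(struct rg_reg **head, const char *name, int value)
{
	struct rg_reg	*r;
	size_t		 sz;

	assert(head != NULL && name != NULL);
	for (r = *head; r != NULL; r = r->next) {
		if (strcmp(r->name, name) == 0) {
			r->value = value;
			return 1;
		}
	}
	if ((r = malloc(sizeof(*r))) == NULL)
		return 0;
	sz = strlen(name) + 1;
	if ((r->name = malloc(sz)) == NULL) {
		free(r);
		return 0;
	}
	memcpy(r->name, name, sz);
	r->value = value;
	r->next = *head;
	*head = r;
	return 1;
}

/* Return the value of 'name', or 'dflt' if it was never set. */
int
rg_get(const struct rg_reg *head, const char *name, int dflt)
{
	assert(name != NULL);
	for (; head != NULL; head = head->next)
		if (strcmp(head->name, name) == 0)
			return head->value;
	return dflt;
}

/* Release the whole list. */
void
rg_free(struct rg_reg *head)
{
	struct rg_reg	*next;

	for (; head != NULL; head = next) {
		next = head->next;
		free(head->name);
		free(head);
	}
}
```

Lookup is linear in the number of registers, which is acceptable as long
as manuals set only a handful of them.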


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.93 17-Nov-2012 schwarze

Cleanup naming of local variables to make the code easier on the eye:
Settle for "struct man *man", "struct mdoc *mdoc", "struct meta *meta"
and avoid the confusing "*m" which was sometimes this, sometimes that.
No functional change.

ok kristaps@ some time ago


# 1.92 16-Nov-2012 schwarze

Fix a crash triggered by .Bl -tag .It Xo .El .Sh found by florian@.

* When allocating a body end marker, copy the pointer to the normalized
block information from the body block, avoiding the risk of subsequent
null pointer dereference.
* When inserting the body end marker into the syntax tree, do not try to
copy that pointer from the parent block, because not being a direct child
of the block it belongs to is the whole point of a body end marker.
* Even non-callable blocks (like Bd and Bl) can break other blocks;
when this happens, postpone closing them out in the usual way.


Revision tags: OPENBSD_5_2_BASE
# 1.91 18-Jul-2012 schwarze

Fix handling of paragraph macros inside lists:
* When they are trailing the last item, move them outside the list.
* When they are trailing any other non-compact item, drop them.

Improves formatting of 40 pages, e.g. grep(1), ksh(1), netstat(1),
ath(4), bsd.port.mk(5), pf.conf(5), mount(8), crypto(9).


# 1.90 18-Jul-2012 schwarze

The mdoc(7) \*(Ba predefined string actually forces roman font;
that's stupid because it may break enclosing font changes,
but let's do the same for groff bug compatibility.

--> Never use \*(Ba, use just plain "|"! <--

Also, predefined strings are already expanded by the roff(7) parser,
so the mdoc(7) parser has to look for the expanded string.

Formatting improvements in ksh(1), less(1), atan2(3),
hostapd.conf(5), snmpd.conf(5), and mknod(8).


# 1.89 16-Jul-2012 schwarze

Several -mdoc parser improvements related to vertical spacing:
* So far, .Pp and .Lp were removed before paragraph type blocks.
* Now also remove .br before paragraph type blocks.
* Treat .Lp as a paragraph like .Pp, so remove .Pp, .Lp, .br before it.
* Do not treat .sp as a paragraph, don't remove anything before it.
* After .Sh, .Ss, .Pp, and .Lp, remove .Pp, .Lp, .sp, .br, and blank lines.
* After .sp and .br, remove .br.


# 1.88 07-Jul-2012 schwarze

Support the .cc request; code by kristaps@, tests by me.
Needed for sqlite3(1) as reported by espie@.


# 1.87 24-May-2012 schwarze

Support -Ios='OpenBSD 5.1' to override uname(3) as the source of the
default value for the mdoc(7) .Os macro.
Needed for man.cgi on the OpenBSD website.

Problem with man.cgi first noticed by deraadt@;
beck@ and deraadt@ agree with the way to solve the issue.


Revision tags: OPENBSD_5_1_BASE
# 1.86 30-Sep-2011 schwarze

implement .Ap .Bd .Bo .Bq .D1 .Ic .Lp .Oo .Pf .Po .Ss .Sx .Sy .br .sp
implement .Bl -bullet
add more information to the .TH line
escape dots at the beginnings of lines
add trailing newline character at the end of the file
do not misinterpret the ROOT block as .Ap


# 1.85 18-Sep-2011 schwarze

sync to version 1.11.7 from kristaps@
main new feature: support the roff(7) .tr request
plus various bugfixes and some refactoring

regressions are so minor that it's better to get this in
and fix them in the tree


# 1.84 18-Sep-2011 schwarze

sync to version 1.11.5:
adding an implementation of the eqn(7) language
by kristaps@

So far, only .EQ/.EN blocks are handled, in-line equations are not, and
rendering is not yet very pretty, but the parser is fairly complete.


Revision tags: OPENBSD_5_0_BASE
# 1.83 24-Apr-2011 schwarze

Merge version 1.11.1:
Again lots of cleanup and maintenance work by kristaps@.
- simplify error reporting: less function pointers, more mandoc_[v]msg
- main: split document parsing out of main.c into read.c
- roff, mdoc, man: improved recognition of control characters
- roff: better handling of if/else stack overflows
- roff: add some predefined strings for backward compatibility
- mdoc, man: empty sections are not errors
- mdoc: move delimiter handling to libmdoc
- some header restructuring and some minor features and fixes
This merge causes two minor regressions
that i will fix in separate commits right afterwards.


# 1.82 21-Apr-2011 schwarze

Merge version 1.10.10:
lots of cleanup and maintenance work by kristaps@.
- move some main.c globals into struct curparse
- move mandoc_*alloc to mandoc.h such that all code can use them
- make mandoc_isdelim available to formatting frontends
- dissolve mdoc_strings.c, move the code where it is used
- make all error reporting functions void, their return values were useless
- and various minor cleanups and fixes


# 1.81 20-Mar-2011 schwarze

Import the foundation for eqn(7) support.
Written by kristaps@.

For now, i'm adding one line to each of the four frontends
to just pass the input text through to the output,
not yet interpreting any of the eqn keywords.


# 1.80 07-Mar-2011 schwarze

Clean up date handling,
as a first step to get rid of the frequent petty warnings in this area:
- always store dates as strings, not as seconds since the Epoch
- for input, try the three most common formats everywhere
- for unrecognized format, just pass the date through verbatim
- when there is no date at all, still use the current date
Originally triggered by a one-line patch from Tim van der Molen,
<tbvdm at xs4all dot nl>, which is included here.
Feedback and OK on manual parts from jmc@.
"please check this in" kristaps@


Revision tags: OPENBSD_4_9_BASE
# 1.79 10-Feb-2011 schwarze

Tbl code maintenance by kristaps@.
- Remember the line-number of a tbl_span, and use it in messages.
- Put *_span_alloc() functions right into the *_addspan() ones,
since these are the only places they are called from.


# 1.78 09-Jan-2011 schwarze

Make sure coding errors cannot make us miss fatal parsing errors
by assert(3)ing valid parser state in the main parsing functions;
from kristaps@.


# 1.77 04-Jan-2011 schwarze

Merge kristaps@' cleaner tbl integration, removing mine;
there are still a few bugs, but fixing these will be easier in tree.


# 1.76 01-Jan-2011 schwarze

Clean up {mdoc,man}_{p,v}msg invocations:
Ignore the return values, they are constant anyway.
From kristaps@.


# 1.75 29-Dec-2010 schwarze

Reorg by Kristaps: In libmdoc, replace the union of pointers to structs
of macro-specific data by a pointer to a union of structs, which makes the
code simpler and more robust at the expense of a small memory overhead.
Merging was somewhat difficult because we mustn't break tbl(1) support
which the bsd.lv version does not yet have.
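
The data layout change can be pictured as below. This is only a sketch:
the structs nd_bl and nd_bd, their members and the helper functions are
hypothetical, chosen to show the difference between a union of pointers
and a pointer to a union.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical macro-specific data. */
struct nd_bl { int type; int compact; };
struct nd_bd { int type; int offset; };

/* Before: a union of pointers, each member allocated on its own. */
union nd_old_data {
	struct nd_bl	*bl;
	struct nd_bd	*bd;
};

/* After: one pointer to a union of structs, one allocation fits all. */
union nd_data {
	struct nd_bl	 bl;
	struct nd_bd	 bd;
};

struct nd_node {
	union nd_data	*data;	/* NULL for macros without extra data */
};

/*
 * Allocate zeroed macro data.  The allocation always has the size
 * of the largest member, which is the small memory overhead.
 */
int
nd_data_alloc(struct nd_node *n)
{
	assert(n != NULL);
	n->data = calloc(1, sizeof(*n->data));
	return n->data != NULL;
}

void
nd_data_free(struct nd_node *n)
{
	assert(n != NULL);
	free(n->data);
	n->data = NULL;
}
```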


# 1.74 26-Dec-2010 schwarze

Behave more like groff (both old and new): Specifying both .%T and .%J in
an .Rs block causes the title to be quoted instead of underlined, such
that journal title and article title appear visually different.
Original diff from kristaps@, simplified by me, tweaked again by kristaps@.


# 1.73 21-Dec-2010 schwarze

Migrate .An to use a pointer to its data, like everybody else.
In preparation for a simpler ref-counted system for node data.
From kristaps@.


# 1.72 21-Dec-2010 schwarze

Vertical spacing improvements from kristaps@, small tweaks by me:
Add a "last child" member to struct mdoc_node.
Remove .Pp or .Lp if it is the first or last child of an .Sh or .Ss body.
Thus, no need to do the same in the front-ends any longer.
Tolerate some cases of .Pp inside .Bl.


# 1.71 02-Dec-2010 schwarze

Properly initialize the manual section to a default when .Dt is missing.
Without this, we died on an assertion.
Problem noted and patch provided by kristaps@.


# 1.70 01-Dec-2010 schwarze

Merge mdoc_action.c into mdoc_validate.c, because having two places to do
basically the same things just causes code duplication and confusion.
Work by kristaps@, including a few bugfixes he found during the merge,
and reapplying OpenBSD changes on top.


# 1.69 28-Nov-2010 schwarze

To avoid FATAL errors, we have been parsing and ignoring the roff
requests .am, .ami, .am1, .dei, and .rm for a long time.
Since ignoring them can (rarely) cause information loss and serious
misformatting, throw an ERROR: NOT IMPLEMENTED when finding them.
Implementing them would not be too difficult, but they are so rare
in practice that i can find better use for my time right now.

In this context,
- Put the string "NOT IMPLEMENTED" into two other error messages
as well, to distinguish them from those caused by broken input.
- Print the string "unknown macro" once, not twice in the error message
associated with MANDOCERR_MACRO, and begin printing the buffer at the
point where the unknown macro really is, not at the start of line.


# 1.68 16-Oct-2010 schwarze

Do not abort() on tbl errors, reduce the risk that tbl stuff kills a build,
and provide more useful tbl error messages in a non-intrusive way.


# 1.67 16-Oct-2010 schwarze

Support tbl(1) code embedded into mdoc(7) input files.
Very similar to what i have done in man(7) yesterday.
Allows to build cpu(4) on HPPA, wi(4), and phantasia(6).
Now we are able to build all tbl code in base.


# 1.66 27-Sep-2010 schwarze

Merge the last bits of 1.10.6 (released today), most were already in:
* ignore double-.Pp
* ignore .Pp before .Bd and .Bl (unless -compact is specified)
* avoid double blank line upon .Pp, .br and friends in literal context
* cast enums to int when passing them to exit(3) to please lint(1)
While merging, fix a regression introduced by kristaps@:
Outside literal mode, double blank lines must both be printed.
To achieve this again after kristaps@ improvements in 1.10.6,
treat such blank lines as .sp (instead of .Pp as in 1.10.5)
and drop .Pp before .sp just like dropping .Pp before .Pp.


# 1.65 20-Aug-2010 schwarze

Implement a simple, consistent user interface for error handling.
We now have sufficient practical experience to know what we want,
so this is intended to be final:
- provide -Wlevel (warning, error or fatal) to select what you care about
- provide -Wstop to stop after parsing a file with warnings you care about
- provide consistent exit status codes for those warnings you care about
- fully document what warnings, errors and fatal errors mean
- remove all other cruft from the user interface, less is more:
- remove all -f knobs along with the whole -f option
- remove the old -Werror because calling warnings "fatal" is silly
- always finish parsing each file, unless fatal errors prevent that
This commit also includes a couple of related simplifications behind
the scenes regarding error handling.
Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and
Sascha Wildner (DragonFly BSD) agree with the general direction.


# 1.64 18-Aug-2010 schwarze

Simplify and sync the code and comments for copying the macro name
in man_pmacro() and mdoc_pmacro(). In particular, no need to use
isgraph(3) here, that has already been done in main.c.
Joint work by Kristaps and myself, ok kristaps@.


Revision tags: OPENBSD_4_8_BASE
# 1.63 07-Aug-2010 schwarze

Groff allows the initial macro on a line to be delimited by a space
or by a tab; so allow the tab in mandoc, too.
Bug found by me, fix by kristaps@, "sure" deraadt@.


# 1.62 16-Jul-2010 schwarze

Text ending in a full stop, exclamation mark or question mark
should not flag the end of a sentence if:

1) The punctuation is followed by closing delimiters
and not preceded by alphanumeric characters, like in
"There is no full stop (.) in this sentence"

or

2) The punctuation is a child of a macro
and not preceded by alphanumeric characters, like in
"There is no full stop
.Pq \&.
in this sentence"

jmc@ and sobrado@ like this


# 1.61 13-Jul-2010 schwarze

Merge release 1.10.4 (all code by kristaps@), providing four new features:
1) Proper .Bk support: allow output line breaks at input line breaks,
but keep input lines together in the output, finally fixing
synopses like aucat(1), mail(1) and tmux(1).
2) Mostly finished -Tps (PostScript) output.
3) Implement -Thtml output for .Nm blocks and .Bk -words.
4) Allow iterative interpolation of user-defined roff(7) strings.
Also contains some minor bugfixes and some performance improvements.


# 1.60 01-Jul-2010 schwarze

In the mdoc(7) parser, inspect roff registers early such that all parts
of the parser can use the resulting cues. In particular, this allows
to use .nr nS to force SYNOPSIS-style .Nm indentation outside the
SYNOPSIS as needed by ifconfig(8).

To actually make this useable, .Pp must rewind .Nm, or the rest of the
section would end up indented. Implement a quick hack for now,
a generic solution can be designed later.

ok kristaps@ sobrado@


# 1.59 29-Jun-2010 schwarze

Support for badly nested blocks, written around the time of
the Rostock mandoc hackathon and tested and polished since,
supporting constructs like:

.Ao Bo Ac Bc (exp breaking exp)
.Aq Bo eol Bc (imp breaking exp)
.Ao Bq Ac eol (exp breaking imp)
.Ao Bo So Bc Ac Sc (double break, inner before outer)
.Ao Bo So Ac Bc Sc (double break, outer before inner)
.Ao Bo Ac So Bc Sc (broken breaker)
.Ao Bo So Bc Do Ac Sc Dc (broken double breaker)

There are still two known issues which are tricky:

1) Breaking two identical explicit blocks (Ao Bo Bo Ac or Aq Bo Bo eol)
fails outright, triggering a bogus syntax error.
2) Breaking a block by two identical explicit blocks (Ao Ao Bo Ac Ac Bc
or Ao Ao Bq Ac Ac eol) still has a minor rendering error left:
"<ao1 <ao2 [bo ac2> ac1> bc]>" should not have the final ">".

We can fix these later in the tree, let's not grow this diff too large.

"get it in" kristaps@


# 1.58 27-Jun-2010 schwarze

Full .nr nS support, unbreaking the kernel manuals.

Kristaps coded this from scratch after reading my .nr patch;
it is simpler and more powerful.

Registers live in struct regset in regs.h, struct man and struct mdoc
contain pointers to it. The nS register is cleared when parsing .Sh.
Frontends respect the MDOC_SYNPRETTY flag set in mdoc node_alloc.


# 1.57 26-Jun-2010 schwarze

merge release 1.10.2
* bug fixes:
- interaction of ASCII_HYPH with special chars (found by Ulrich Spoerlein)
- handling of roff conditionals (found by Ulrich Spoerlein)
- .Bd -offset will no more default to 6n
* maintenance:
- more caching of .Bd and .Bl arguments for efficiency
- deconstify man(7) validation routines
- add FreeBSD library names (provided by Ulrich Spoerlein)
* start PostScript font-switching


# 1.56 06-Jun-2010 schwarze

Merge bsd.lv version 1.10.1 (to be released soon).

The main step forward is that this now has *much* better .Bl -column
support, now supporting many manuals that previously errored out
without producing any output.

Other fixes include:
* do not die from multiple list types, use the first and warn
* in .Bl without a type, default to -item
* various tweaks to .Dt
* fix .In, .Fd, .Ft, .Fn and .Fo formatting
* some documentation fixes and additions
* and fix a couple of bugs reported by Ulrich Spoerlein:
* better support for roff block-end "\}" without a preceding dot
* .In must not break the line outside SYNOPSIS
* spelling in some error messages

While merging, fix one regression in .In spacing
that needs to go to bsd.lv, too.


# 1.55 26-May-2010 schwarze

When a word does not fully fit onto the output line, but it contains
at least one hyphen, we already had support for breaking the line at the
last fitting hyphen. This patch improves this functionality by only
breaking at hyphens in free-form text, and by not breaking at hyphens
* at the beginning or end of a word or
* immediately preceded or followed by another hyphen or
* escaped by a preceding backslash.
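
The three exclusions listed above can be sketched as a single predicate.
hy_can_break() is a hypothetical helper operating on one word of free-form
text; it is not the formatter's actual code, which also has to consider
the remaining width of the output line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide whether the hyphen at word[i] is an acceptable place
 * to break the output line.
 */
int
hy_can_break(const char *word, size_t i)
{
	size_t	 len;

	assert(word != NULL);
	len = strlen(word);
	if (i >= len || word[i] != '-')
		return 0;
	/* Not at the beginning or end of a word. */
	if (i == 0 || i + 1 == len)
		return 0;
	/* Not immediately preceded or followed by another hyphen. */
	if (word[i - 1] == '-' || word[i + 1] == '-')
		return 0;
	/* Not escaped by a preceding backslash. */
	if (word[i - 1] == '\\')
		return 0;
	return 1;
}
```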

Before this patch, differences in break-at-hyphen support were one
of the major sources of noise in automatic comparisons to mdoc(7)
groff output. Now, the remaining differences are hard to find among
the noise coming from other sources.

Where there are still differences, what we do seems to be better than
what groff does, see e.g. the chio(1) exchange and position commands
for one of the now rare examples.

idea and coding by kristaps@

Besides, this was the last substantial code difference left
between bsd.lv and openbsd.org. We are now in full sync.


# 1.54 23-May-2010 schwarze

Unified error and warning message system for all of mandoc,
featuring three message levels, as agreed during the mandoc hackathon:
* FATAL parser failure, cannot produce any output from this input file:
eventually, we hope to convert most of these to ERRORs.
* ERROR, meaning mandoc cannot cope fully with the input syntax and will
probably lose information or produce structurally garbled output;
it will try to produce output anyway but exit non-zero at the end,
which is eventually intended to make the ports infrastructure happy.
* WARNING, meaning you should clean up the input file, but output
is probably mostly OK, so this will not cause error-exit at the end.
This commit is mostly just converting the old system to the new one; before
the classification will become really reliable, we must check all messages.

In particular,
* set up a new central message string table in main.c
* drop the old message string tables from man.c and mdoc.c
* get rid of the piece-meal merr enums in libman and libmdoc
* reduce number of error/warning functions from 16 to 6 (still a lot...)

While here, handle a few problems more gracefully:
* allow .Rv and .Ex to work without a prior .Nm
* allow .An to ignore extra arguments
* allow undeclared columns in .Bl -column

Written by kristaps@.


# 1.53 20-May-2010 schwarze

Support nested roff instructions:
* allow roff_parseln() to be re-run
* allow roff_parseln() to manipulate the line buffer offset
* support the offset in the man and mdoc libraries
* adapt .if, .ie, .el, .ig, .am* and .de* support
* interpret some instructions even in conditional-negative context
Coded by kristaps during the last day of the mandoc hackathon.

To avoid regressions in the OpenBSD tree, commit this together
with some small local additions:
* detect roff block end "\}" even on macro lines
* actually implement the ".if n" conditional
* ignore .ds, .rm and .tr in libroff

Also back my old .if/.ie/.el-handling out of libman, reverting:
man.h 1.15 man.c 1.25 man_macro.c 1.15 man_validate.c 1.19
man_action.c 1.15 man_term.c 1.28 man_html.c 1.9.


# 1.52 16-May-2010 schwarze

Rewrite the main mdoc text parser, mdoc_ptext()
to make it easier to understand and to fix various bugs:
* strip white space from the end of MDOC_TEXT elements in literal mode
* in literal mode, a line may be blank even when containing tabs
* escaped backslashes do not escape following characters
ok kristaps@


# 1.51 16-May-2010 schwarze

allow the single quote as a control character in place of the dot
at all relevant places;
from kristaps@


# 1.50 15-May-2010 schwarze

allow non-numeric manual sections in -mdoc;
while here, allow LIBRARY in section 9;
by kristaps@


# 1.49 15-May-2010 schwarze

various improvements regarding errors and warnings by Joerg Sonnenberger:
* If the last -column .Bl isn't specified, it is auto-sized.
* An invalid .St argument should be a warning, not an error.
Just put the argument into the output.
* An invalid .At argument should be a warning, not an error.
Just print the argument, like new groff does.
* Remove warnings concerning manual section (like 1, 6, 8).
It was only used for .Ex and not really useful.
* Remove warnings concerning page section (like SYNOPSIS).
These were only used for .Fd and .Lb and not really useful.


# 1.48 14-May-2010 schwarze

Integrate kristaps@' end-of-sentence (EOS) framework
which is simpler and more powerful than mine, and remove mine.

* man(7) now has EOS handling, too
* put EOS detection into its own function in libmandoc
* use node and termp flags to communicate the EOS condition
* no more EOS pseudo-macro
* no more non-printable EOS marker character on the formatter level

This slightly breaks EOS detection after trailing punctuation
in mdoc(7) macros, but that will be restored soon.


# 1.47 14-May-2010 schwarze

Merge 1.9.25, keeping local patches;
this does not merge kristaps' end-of-sentences handling yet,
i will check that separately. This one includes:
* handle \*(Ba as a delimiter
* introduce ARGS_PEND for .Bl -column .It end-of-line special casing
* section ordering: expect EXIT STATUS at the right place
* line break fixes in SYNOPSIS
* allow literal contexts to have arbitrary line lengths
* the input file column number can not be used to identify the beginning
of a line because white space is allowed after the initial '.'
* proper leading spaces in -man -Tascii mode
* do not let Lb break lines in -mdoc -Thtml LIBRARY


# 1.46 14-May-2010 schwarze

merge 1.9.24, keeping local patches; some changes:
* preserve multiple consecutive space characters in input
* do not restrict .Cd and .Rv to certain sections (requested by Joerg)
* do not run lookup() on quoted words
* enum return types for mdoc_args and mdoc_argv
* fix auto-closing of LINK tag in -Txhtml (from Daniel Friesel)
* various lint and manual fixes


# 1.45 08-May-2010 schwarze

merge bsd.lv rev 1.123:
sync mdoc.c's static function names with man.c


# 1.44 08-May-2010 schwarze

handle text lines beginning with \." as comments, like groff does,
even though this is not correct comment syntax (so warn, too)
reported by Claus Assmann on misc@, fix by kristaps@


# 1.43 04-May-2010 schwarze

end-of-sentence markers at the end of .Fn argument lists
ruin indentation of the next line in the SYNOPSIS section;
bug found by jacekm@ in err(3)


# 1.42 27-Apr-2010 schwarze

Fix a subtle bug noticed by naddy@ in pftop(8), thanks!

When converting blank lines to .Pp outside literal context,
it could happen that the following node ended up as a child
of the .Pp element, but it must always be a sibling.


# 1.41 22-Apr-2010 schwarze

Fix a segfault reported by nicm@, introduced in rev. 1.38.
When finding a blank line, trying to parse it is a bad idea.
Instead, after adding .Pp to the AST, just return from parsetext().


# 1.40 07-Apr-2010 schwarze

Merge the good parts of 1.9.23,
avoid the bad parts of 1.9.23, and keep local patches.

Input in general:
* Basic handling of roff-style font escapes \f, \F.
* Quoted punctuation does not count as punctuation.

mdoc(7) parser:
* Make .Pf callable; noted by Claus Assmann.
* Let .Bd and .Bl ignore unknown arguments; noted by deraadt@.
* Do not warn when .Er is used outside certain sections.
* Replace mdoc_node_free[list] by mdoc_node_delete.
* Replace #define by enum for rew*() return values.

man(7) parser:
* When .TH is missing, use default section and date.

Output in general:
* Curly braces do not count as punctuation.
* No space after .Fl w/o args when a macro follows on the same line.

HTML output:
* Unify PAIR_*_INIT macros, introduce new PAIR_ID_INIT().
* Print whitespace after, not before .Vt .Fn .Ft .Fo.

Checked that all manuals in base still build.


# 1.39 04-Apr-2010 schwarze

When the prologue lacks required information, do not error out,
but warn, set up some default values, and plod on.
Unbreaking the ports build for textproc/sgmlformat;
reported by naddy@, thanks.


# 1.38 03-Apr-2010 schwarze

* outside literal context in mdoc(7), handle blank lines like .Pp
* a missing NAME section in mdoc(7) need not be fatal

ok deraadt@


# 1.37 02-Apr-2010 schwarze

merge 1.9.22, keeping local patches
* convert mdoc tokens from #define to enum
* fix a segfault with .Xo/.Xc in explicit blocks
* Thorn is \*(Th, not \*(TH; noticed by Joerg Sonnenberger


# 1.36 25-Mar-2010 schwarze

fix a stupid out-of-bounds read access introduced in the previous
revision, in the code searching for the end of a sentence


Revision tags: OPENBSD_4_7_BASE
# 1.35 02-Mar-2010 schwarze

Proper inter-sentence spacing for mdoc(7).
When a text line or a non-block macro line in the source code ends
in any of ".!?", consider that an end of sentence (EOS).
This makes Jason's rule "new sentence, new line" even more important.
Let the parser detect the EOS and insert a token into the AST.
Let the -Tascii frontend render the EOS token as a double space before
the next word.


# 1.34 18-Feb-2010 schwarze

sync to release 1.9.15:
* corrected .Vt handling (spotted by Joerg Sonnenberger)
* corrected .Xr argument handling (based on my patch)
* removed \\ escape sequence (because it is for low-level roff only)
* warn about trailing whitespace (suggested by jmc@)
* -Txhtml support
* and some general cleanup and doc improvements


# 1.33 02-Jan-2010 schwarze

complete the sync to 1.9.15-pre2: mostly minor fixes
* bugfix: do not restore TERMP flags when leaving lists, just reset them
* and a few HTML fixes
* clarity: return width from a2width, not width+2, and adapt to it
* manual: document .Bl and .Fl
* portability: no need to escape '%' in macro names


# 1.32 22-Dec-2009 schwarze

sync to 1.9.12, mostly portability and refactoring:

correctness/functionality:
- bugfix: do not die when overstep hits the right margin
- new option: -fign-escape
- and various HTML features

portability:
- replace bzero(3) by memset(3), which is ANSI C
- replace err(3)/warn(3) by perror(3)/exit(3), which is ANSI C
- use argv[0] instead of __progname
- add time.h to various files for FreeBSD compilation

simplicity:
- do not allocate header/footer data dynamically in *_term.c
- provide and use malloc frontends that error out on failure

for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/


# 1.31 27-Oct-2009 schwarze

sync to 1.9.11: adapt printing of dates to groff conventions,
NetBSD portability fixes and some minor bugfixes and feature enhancements;
also checked that my hyphenation code still works on top of this


# 1.30 21-Oct-2009 schwarze

sync to 1.9.9, featuring:
* -Thtml output mode
* roff scaling units
* and some minor fixes
for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/


# 1.29 19-Oct-2009 schwarze

sync to 1.9.6: multiple improvements to references (.Rs)
* validate and order .Rs child nodes
* underline book title (.%B) and issuer (.%I)
* enclose title of article (.%T) in quotes
* avoid calling mdoc_verr directly, use a proper error code instead


# 1.28 19-Oct-2009 schwarze

sync to 1.9.6: u_char lives in <sys/types.h>
noticed by uqs at spoerlein dot net on FreeBSD,
where <stdlib.h> does not include <sys/types.h>


# 1.27 21-Sep-2009 schwarze

sync to 1.9.5: lookup hashes are now static tables
shortening the code, and, according to kristaps@, speeding it up


# 1.26 18-Sep-2009 schwarze

sync to 1.9.2: non-printable characters in macro names are errors;
from joerg at netbsd dot org


# 1.25 22-Aug-2009 schwarze

sync to 1.9.1: set mdoc_next flags in mdoc_*_alloc routines, where they belong


# 1.24 22-Aug-2009 schwarze

sync to 1.9.0: polishing the core code of mdoc macro handling
1) If a macro is not parsed, do not parse it. Of course, without
parsing it, we cannot produce "macro-like parameter" warnings,
but these were useless anyway.
2) If a macro is not callable, do not print a useless warning when
it occurs as a parameter, just display the raw characters.
3) Below .Bl -column, check whether macros are callable.
4) Like groff, allow whitespace after the initial dot on macro lines.


# 1.23 22-Aug-2009 schwarze

sync to 1.8.5: Error string is now file:line:col: message.
Fixed column reporting (off by one).
Use fprintf instead of warnx for parse errors (like cc).


# 1.22 26-Jul-2009 schwarze

sync to 1.8.1: rewrite quoted literal handling correctly,
rewrite TABSEP handling in a simpler way,
and retire ECOLEMPTY, ARGS_QUOTED and ARGS_ARGVLIKE


# 1.21 26-Jul-2009 schwarze

sync to 1.8.1: removed excessively verbose EARGVPARM warning


# 1.20 26-Jul-2009 schwarze

sync to 1.8.1: support .br and .sp


# 1.19 26-Jul-2009 schwarze

sync to 1.8.1: libmdoc now breaks up free-form lines into tokens;
will simplify LITERAL mode in front-end


# 1.18 18-Jul-2009 schwarze

sync to 1.8.0: move mdoc_a2att, mdoc_a2st, and mdoc_a2lib to libmdoc


# 1.17 12-Jul-2009 schwarze

sync to 1.7.23: pass warning code to mdoc_pwarn() instead of warning message
define additional warning macro mdoc_nwarn()
remove obsolete warning functions mdoc_warn(), pwarn(), vwarn(), nwarn()
remove various now unused "enum mdoc_warn" and "enum mwarn"


# 1.16 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_perr() instead of error string
and use the so improved mdoc_nerr() at many places;
get rid of now unused static functions perr()


# 1.15 12-Jul-2009 schwarze

sync to 1.7.23: pass error code to mdoc_nerr() instead of error string
and use the so improved mdoc_nerr() at many places


# 1.14 12-Jul-2009 schwarze

sync to 1.7.23: unify the various "enum merr" into libman.h and libmdoc.h,
use it as a new argument to mdoc_err(), the same way as for man_err(),
and use string tables instead of switch statements to select error messages


# 1.13 12-Jul-2009 schwarze

sync to 1.7.23: third step to get rid of enum mdoc_warn:
mdoc_verr is not using enum mdoc_warn, so use it at a few more places


# 1.12 12-Jul-2009 schwarze

sync to 1.7.23: second step to get rid of enum mdoc_warn:
remove type from mdoc_vwarn arguments, and use this function where appropriate


# 1.11 12-Jul-2009 schwarze

sync to 1.7.23: first step to get rid of enum mdoc_warn:
unify manwarn() and mdocwarn() into mwarn()


Revision tags: OPENBSD_4_6_BASE
# 1.10 23-Jun-2009 schwarze

sync to 1.7.20: like for the -man case, add an nchild counter to the -mdoc
nodes, simplifying the validation code; no functional change


# 1.9 19-Jun-2009 schwarze

sync to 1.7.19: more elegant section handling


# 1.8 18-Jun-2009 schwarze

sync to 1.7.19: comment cleanup; no functional change


# 1.7 18-Jun-2009 schwarze

sync to 1.7.19: improved comment handling


# 1.6 18-Jun-2009 schwarze

complete sync to 1.7.17: garbage collect unused functions
mdoc_msg, mdoc_pmsg, mdoc_vmsg, and mdoc_nwarn


# 1.5 15-Jun-2009 schwarze

bring back miod@'s "real functions" patch (rev. 1.2)
which was erroneously backed out in rev. 1.4, sorry;
ok kristaps@


# 1.4 15-Jun-2009 schwarze

sync to 1.7.16:
reduce code duplication in warning and error reporting functions
while here, garbage collect three unused function prototypes


# 1.3 14-Jun-2009 schwarze

sync to 1.7.16: comments, whitespace and spelling fixes; no functional change


# 1.2 15-Apr-2009 miod

Replace variadic macros with real functions, so that this compiles on
platforms still using gcc 2.
ok deraadt@


# 1.1 06-Apr-2009 kristaps

Initial check-in of mandoc for formatting manuals. ok deraadt@