History log of /netbsd-current/usr.sbin/makemandb/makemandb.c
Revision Date Author Comments
# 1.67 01-Jan-2023 gutteridge

makemandb.c: spell "metadata" consistently


Revision tags: netbsd-10-base
# 1.66 30-Oct-2022 gutteridge

makemandb.c: fix grammar in a comment


# 1.65 26-Oct-2022 andvar

fix various typos in comments and makefs README file.


# 1.64 11-Sep-2022 gutteridge

makemandb/*: fix spelling of database and consistency of SQLite


# 1.63 06-Jun-2022 skrll

Don't index outside the mdocs array of function pointers. Analysis and
suggested fixes from Tom Lane. I played it safe and went with (my
variation of) the minimal fix.

port-hppa/56118: sporadic app crashes in HPPA -current
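
The kind of out-of-bounds access described in this entry can be illustrated with a minimal sketch of a bounds-checked handler table. The names `tok`, `handlers`, and `dispatch` are hypothetical and are not taken from makemandb.c; the actual fix may look different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token values; TOK_MAX is one past the last valid index. */
enum tok { TOK_NM, TOK_ND, TOK_SH, TOK_MAX };

typedef int (*handler_fn)(int);

static int handle_nm(int depth) { return depth + 1; }
static int handle_nd(int depth) { return depth + 2; }
static int handle_sh(int depth) { return depth + 3; }

/* Table of function pointers indexed by token value. */
static const handler_fn handlers[TOK_MAX] = {
	[TOK_NM] = handle_nm,
	[TOK_ND] = handle_nd,
	[TOK_SH] = handle_sh,
};

/* Returns the handler's result, or -1 if tok is not a valid index. */
int dispatch(int tok, int depth)
{
	/* Reject anything outside [0, TOK_MAX) before touching the array. */
	if (tok < 0 || tok >= TOK_MAX)
		return -1;
	/* Every in-range slot is populated in this sketch. */
	assert(handlers[tok] != NULL);
	return handlers[tok](depth);
}
```

Calling `dispatch(TOK_MAX, 0)` returns -1 instead of reading one element past the end of the table.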


# 1.62 06-Apr-2022 gutteridge

makemandb.c: fail sooner if man page dirs can't be found

There's no point initializing database state if we're then going to
fail to locate any man page sources. Make all the initial state checks
contiguous for simplicity and readability. Also, free the variable
"command" on the error path, and correct the error message.


# 1.61 05-Dec-2021 msaitoh

s/trival/trivial/ in comment.


Revision tags: cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 netbsd-9-2-RELEASE cjep_staticlib_x-base netbsd-9-1-RELEASE phil-wifi-20200421 phil-wifi-20200411 is-mlppp-base phil-wifi-20200406 netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609
# 1.60 18-May-2019 abhinav

branches: 1.60.2;
PR misc/54213: Fix performance of whatis(1) when no matches are found

In revision 1.6 of whatis.c the query was modified to also return matches for names found
in the MLINKS of the man pages. However, it was slow, probably because it
required a join and, more importantly, because a WHERE condition on an FTS virtual table column
is very slow. To avoid the join and the expensive WHERE condition on the virtual table,
add the name_desc column to the mandb_links table as well. This restores the performance
of whatis(1) to the original level at the expense of slight data duplication.

Bump the schema version to force a database rebuild that accounts for the new column.


# 1.59 11-Mar-2019 christos

remove unneeded header.


# 1.58 11-Mar-2019 christos

adjust to the new mandoc api


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.57 24-Aug-2018 abhinav

Adjust makemandb for the latest mandoc

ok christos@


# 1.56 16-Aug-2018 kre

In the latest mandoc (mdocml) the signature (prototype) of
mparse_alloc() altered - update the call here to compensate.

This fixes the build (of makemandb), but I am not sure that
the changed version is what is desired - someone who knows
something about all of this should validate ... I just copied
the invocation from mandoc's demandoc.c (which seems likely
to be at least a similar kind of usage).


Revision tags: netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2
# 1.55 10-May-2017 abhinav

branches: 1.55.8; 1.55.10;
Get rid of unnecessary variable.


# 1.54 02-May-2017 abhinav

We do need to copy the return value from dirname(3), since it is a static
buffer and can be overwritten in between. I overzealously removed this in one
of my previous commits.
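
A minimal sketch of the copying pattern this entry refers to, assuming only POSIX dirname(3) semantics (the result may point into the argument or into static storage). The helper name `dirname_copy` is hypothetical and is not the code used in makemandb.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a heap-allocated copy of the directory part of path.
 * The caller owns the result and must free() it.
 */
char *dirname_copy(const char *path)
{
	assert(path != NULL);

	/* dirname() may modify its argument, so work on a scratch copy. */
	char *scratch = strdup(path);
	if (scratch == NULL)
		return NULL;

	/* The result may point into scratch or into a static buffer. */
	char *dir = dirname(scratch);

	/* Copy it before scratch is freed or dirname() is called again. */
	char *result = strdup(dir);
	free(scratch);
	return result;
}
```

Because each result is a private copy, two results can be held at once without the second call overwriting the first.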


Revision tags: prg-localcount2-base1
# 1.53 01-May-2017 abhinav

Avoid dereferencing pointer at multiple places, instead use a local variable.


# 1.52 01-May-2017 abhinav

Remove the table name parameter from the check_md5 function.

There is only one table storing the md5 checksums, so we can hardcode the table
name instead of passing it as a function argument.


# 1.51 01-May-2017 abhinav

Avoid copying strings where it is not needed.


# 1.50 30-Apr-2017 abhinav

Avoid a call to strncmp when comparing only the first character of the string.
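
As an illustration of the idea only (the function names below are hypothetical and do not appear in makemandb.c), a one-character strncmp call and a direct character comparison give the same answer:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Library call used to compare a single character. */
bool starts_with_dot_strncmp(const char *s)
{
	assert(s != NULL);
	return strncmp(s, ".", 1) == 0;
}

/* Equivalent direct comparison of the first character. */
bool starts_with_dot(const char *s)
{
	assert(s != NULL);
	return s[0] == '.';
}
```

Both forms are safe on an empty string, since the first byte is then the terminating NUL.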


# 1.49 29-Apr-2017 abhinav

Bring the comment in sync with code (after changes brought by the last commit).


# 1.48 29-Apr-2017 abhinav

Don't parse Nm macro when it occurs anywhere outside the NAME section.

mandoc(3) already generates the text node representing the value for the .Nm macro.
Doing our own parsing for .Nm on top of that leads to large duplication of text
in the database. This gets especially bad for man pages with large NAME sections,
such as queue(3).


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.47 20-Apr-2017 joerg

branches: 1.47.2;
Use libarchive 3.x interface and not obsolete 2.x versions.


Revision tags: pgoyette-localcount-20170320 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.46 19-Dec-2016 abhinav

branches: 1.46.2;
Escape hyphen when parsing .Nd


# 1.45 17-Dec-2016 abhinav

Don't ignore symlinks.
There can be symlinks which are pointing to man pages not installed in
one of the _default locations mentioned in man.conf or MANPATH. For example
there are man pages in /usr/pkg/man which are symlinked to pages in
/usr/pkg/lib/perl5/man. If we ignore symlinks, we would not be able to
index such pages installed outside the default set of directories.

(Also, the symlink test was incorrect, so we never noticed this issue)

Ok christos@, wiz@


Revision tags: pgoyette-localcount-20161104
# 1.44 03-Oct-2016 abhinav

We don't need to parse the sections we don't index, so stop early. Saves a few
instructions.


# 1.43 03-Oct-2016 abhinav

With the latest release of mandoc, makemandb(8) started to parse some
sections multiple times. This happened because pmdoc_Sh(), the handler function
responsible for parsing the Sh macros, used to recursively go through all the child
nodes and then the next nodes, starting from the top-level Sh block node.
Now, once it has processed all the child nodes of the top-level block node,
it moves to the next node, which is the top-level block node of the next section, so
one call to pmdoc_Sh() caused a complete pass through the
man page. Since mandoc(3) calls pmdoc_Sh() for each .Sh macro in the man
page, some of the sections were parsed multiple times.
This never happened with previous versions of mandoc, so we never noticed.

I've fixed this by starting the parse sequence of the Sh macro from its body, which guarantees
that we will stop once that section ends.

ok christos@
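
The traversal problem described in this entry can be sketched with a simplified tree. The `struct node` below is a hypothetical stand-in for mandoc's node type (the real structure has separate head and body nodes), and the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified node: text is non-NULL for text nodes. */
struct node {
	const char *text;
	struct node *child;	/* first child */
	struct node *next;	/* next sibling */
};

/* Count text nodes reachable through child and next links from n. */
size_t count_text(const struct node *n)
{
	size_t total = 0;

	for (; n != NULL; n = n->next) {
		if (n->text != NULL)
			total++;
		total += count_text(n->child);
	}
	return total;
}

/*
 * Count only the text inside one section by starting at the section's
 * children, so the walk never follows the section's own next pointer.
 */
size_t count_section_text(const struct node *section)
{
	assert(section != NULL);
	return count_text(section->child);
}
```

Starting the walk at the section node itself follows its `next` link into later sections, which is the repeated-parsing behavior the entry describes. Starting from the section's contents stops at the end of that section.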


Revision tags: localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726
# 1.42 17-Jul-2016 abhinav

Use deroff() from mandoc(3) to directly parse the Nd macro rather
than parsing it by hand.

With the latest mandoc(3), the .Nd macro was getting parsed twice. This fixes
that problem and cleans up the code as well.

ok christos@


# 1.41 17-Jul-2016 abhinav

Fix extraction of section number and machine architecture from man pages.
With the latest API, mdoc_validate()/man_validate() needs to be called before
reading the roff_man.meta field, otherwise it is NULL.

Also, if a man page doesn't specify machine architecture, don't default to '?';
let it be stored as null in the db. Otherwise, the output of apropos(1) shows
the names of the results as \?/<title>


# 1.40 15-Jul-2016 christos

Sync with API changes.


Revision tags: pgoyette-localcount-base
# 1.39 06-Jul-2016 abhinav

branches: 1.39.2;
Avoid possible buffer overflow while parsing NAME section of man(7) pages.
Also, simplify copying of strings, use estrdupn instead of emalloc + memcpy.

Patch from christos@, XXX comment by me


# 1.38 05-Jul-2016 abhinav

Reuse variable from previous line.


# 1.37 13-Apr-2016 christos

PR/51062: Abhinav Upadhyay: Allow non numeric sections to be indexed and
searched by apropos(1).
Fold long lines.


# 1.36 13-Apr-2016 christos

PR/51040: Abhinav Upadhyay: Fix memory leak


# 1.35 13-Apr-2016 christos

PR/51039: Abhinav Upadhyay: Check for return value of chdir(2)


# 1.34 13-Apr-2016 christos

PR/51034: Abhinav Upadhyay: Close database connection when failed to commit


# 1.33 31-Mar-2016 christos

PR/51034: Abhinav Upadhyay: makemandb(8): Close database connection when
failed to commit


# 1.32 24-Mar-2016 christos

PR/51006: Abhinav Upadhyay: makemandb(8) should parse escape sequences
in the NAME section


# 1.31 28-Jan-2016 christos

Don't crash if we have a missing section.


# 1.30 18-Dec-2015 christos

Adjust to the new mdocml


# 1.29 07-Apr-2015 plunky

largely apply patch from PR bin/47392 by Abhinav Upadhyay

change some comments to reflect reality, rename a variable to enhance
readability, and add an assert for safety.


# 1.28 12-Mar-2015 joerg

MDOC_MAX is a valid token if the type is text. Adjust.


# 1.27 04-Mar-2015 christos

- handle section numbers that are not single digits
- don't allocate and free needlessly


# 1.26 02-Mar-2015 joerg

Explicitly deal with end of lists. PR 49708.


# 1.25 18-Oct-2014 snj

src is too big these days to tolerate superfluous apostrophes. It's
"its", people!


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.24 03-Jun-2014 wiz

branches: 1.24.2;
Fix a bug that caused an error about a UNIQUE constraint violation.
Patch from Abhinav Upadhyay.


# 1.23 24-May-2014 wiz

Replace non-breaking space with hyphen, and call hyphen replacement
from one more place.
Improves 'man -k midi' output.

From Abhinav Upadhyay.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.22 10-Feb-2014 chs

branches: 1.22.2;
in update_db(), extract the full list of files to update from the db
before actually updating anything, since changing the db while the query
that extracts the list of files is still in progress results in
the extraction query failing before it finds everything.


# 1.21 05-Jan-2014 joerg

Sync with interface change in mdocml 1.12.3.


# 1.20 13-Nov-2013 wiz

Skip files of size 0 from indexing.

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 15-May-2013 christos

avoid stdio assertion, failing later


Revision tags: agc-symver-base
# 1.18 10-Feb-2013 christos

remove trailing whitespace


Revision tags: yamt-pagecache-base8
# 1.17 14-Jan-2013 christos

Since mdocml decided to name headers that conflict with system ones (term.h)
move the header inclusion one up.


Revision tags: yamt-pagecache-base7
# 1.16 08-Nov-2012 christos

If you cannot parse .SH NAME, like in the case of the ksh93 man page
where the .SH is followed by a conditional:

.SH NAME
.if \nZ=0 \{\
text text text
.\}

at least don't core-dump.


Revision tags: yamt-pagecache-base6
# 1.15 06-Oct-2012 wiz

Make mandb path configurable. makemandb (and related tools) use
the path from the _mandb variable from man.conf now.

Set _mandb in man.conf to same value as was used before.

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.14 07-Sep-2012 wiz

branches: 1.14.2;
Use emalloc in one more place, like the rest of the code does.
From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.13 29-Aug-2012 wiz

Add -Q flag:
Print only fatal error messages (i.e., when the database is left in
an inconsistent state and needs manual intervention).

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.12 25-Aug-2012 wiz

Sync usage with manpage.


# 1.11 11-Aug-2012 wiz

Bug fix for PR 46733:
> makemandb always reports the same number for "Total Number of new or
> updated pages enountered" and "Total number of (hard or symbolic)
> links found".

Patch from Abhinav Upadhyay.


# 1.10 08-Jul-2012 uwe

Fix typo in a message.


Revision tags: yamt-pagecache-base5
# 1.9 07-May-2012 wiz

PR 46419 by Abhinav Upadhyay using his updated patch:
Clean up after removing man page aliases.


# 1.8 04-May-2012 wiz

The new apropos(1) incorrectly displays hyphens in the first line
of the search results for a few man pages (for man(7) based man
pages).

Use patch from Abhinav Upadhyay in PR 46408 to fix this.


Revision tags: yamt-pagecache-base4
# 1.7 02-Mar-2012 joerg

branches: 1.7.2;
Fix inverted condition when handling stale entries.
From Abhinav Upadhyay.


# 1.6 27-Feb-2012 joerg

Expand workaround for .so usage to do the chdir call just before
starting parsing, not during the tree iteration. This gives it a chance
to work.


# 1.5 16-Feb-2012 joerg

Add support for compressed man pages in all the usual formats.


# 1.4 15-Feb-2012 joerg

Also handle hyphen replacement if it was used as plain input and no
backslash sequence was used at all in the line.


# 1.3 15-Feb-2012 joerg

Be a bit more friendly to man pages using the roff .so command by
changing the current directory to the parent of the man -p entry, e.g.
/usr/share/man for /usr/share/man1.


Revision tags: netbsd-6-base
# 1.2 07-Feb-2012 joerg

branches: 1.2.2;
Fix C&P error with $NetBSD$


# 1.1 07-Feb-2012 joerg

Import the new apropos/whatis.

This code has been developed by Abhinav Upadhyay as part of Google's Summer
of Code 2011. It uses libmandoc to parse man pages and builds a Full
Text Index in a SQLite database. The combination of indexing the full
manual page, filtering out stop words and ranking individual matches
based on the section gives a much improved user experience.

The old makewhatis and friends are kept under MKMAKEMANDB=no for now.


# 1.65 26-Oct-2022 andvar

fix various typos in comments and makefs README file.


# 1.64 11-Sep-2022 gutteridge

makemandb/*: fix spelling of database and consistency of SQLite


# 1.63 06-Jun-2022 skrll

Don't index outside the mdocs array of function pointers. Analysis and
suggested fixes from Tom Lane. I played it safe and went with (my
variation of) the minimal fix.

port-hppa/56118: sporadic app crashes in HPPA -current


# 1.62 06-Apr-2022 gutteridge

makemandb.c: fail sooner if man page dirs can't be found

There's no point initializing database state if we're then going to
fail to locate any man page sources. Make all the initial state checks
contiguous for simplicity and readability. Also, free the variable
"command" on the error path, and correct the error message.


# 1.61 05-Dec-2021 msaitoh

s/trival/trivial/ in comment.


Revision tags: cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 netbsd-9-2-RELEASE cjep_staticlib_x-base netbsd-9-1-RELEASE phil-wifi-20200421 phil-wifi-20200411 is-mlppp-base phil-wifi-20200406 netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609
# 1.60 18-May-2019 abhinav

branches: 1.60.2;
PR misc/54213: Fix performance of whatis(1) when no matches are found

In revision 1.6 of whatis.c the query was modified to return matches for names found
in MLINKS of the man pages as well. However it was slow. The reason probably being that it
required a join. But more importantly the where condition on an FTS virtual table column
is very slow. To avoid the join and the expensive where condition on the virtual table,
add the name_desc column to the mandb_links table as well. This improves the performance
of whatis(1) to the original level at the expense of slight data duplication.

Bump the schema to force database rebuild to take account for the new column addition


# 1.59 11-Mar-2019 christos

remove unneeded header.


# 1.58 11-Mar-2019 christos

adjust to the new mandoc api


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.57 24-Aug-2018 abhinav

Adjust makemandb for the latest mandoc

ok christos@


# 1.56 16-Aug-2018 kre

In the latest mandoc (mdocml) the signature (prototype) of
mparse_alloc() altered - update the call here to compensate.

This fixes the build (of makemandb), but I am not sure that
the changed version is what is desired - someone who knows
something about all of this should validate ... I just copied
the invocation from mandoc's demandoc.c (which seems likely
to be at least a similar kind of usage).


Revision tags: netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2
# 1.55 10-May-2017 abhinav

branches: 1.55.8; 1.55.10;
Get rid of unnecessary variable.


# 1.54 02-May-2017 abhinav

We do need to copy the return value from dirname(3) since there it is a static
buffer and can be overwritten in between. I overzealously removed this in one
of my previous commits.


Revision tags: prg-localcount2-base1
# 1.53 01-May-2017 abhinav

Avoid dereferencing pointer at multiple places, instead use a local variable.


# 1.52 01-May-2017 abhinav

Remove the table name parameter from the check_md5 function.

There is only one table storing the md5 checksums, so we can hardcode the table
name instead of passing it as a function argument.


# 1.51 01-May-2017 abhinav

Avoid copying strings where it is not needed.


# 1.50 30-Apr-2017 abhinav

Avoid a call to strncmp when comparing only the first character of the string.


# 1.49 29-Apr-2017 abhinav

Bring the comment in sync with code (after changes brought by the last commit).


# 1.48 29-Apr-2017 abhinav

Don't parse Nm macro when it occurs anywhere outside the NAME section.

mandoc(3) already generates the text node representing the value for the .Nm macro.
Doing our own parsing for .Nm on top of that leads to large duplication of text
in the database. This gets specially worse for man pages with large NAME sections,
such as queue(3).


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.47 20-Apr-2017 joerg

branches: 1.47.2;
Use libarchive 3.x interface and not obsolete 2.x versions.


Revision tags: pgoyette-localcount-20170320 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.46 19-Dec-2016 abhinav

branches: 1.46.2;
Escape hyphen when parsing .Nd


# 1.45 17-Dec-2016 abhinav

Don't ignore symlinks.
There can be symlinks which are pointing to man pages not installed in
one of the _default locations mentioned in man.conf or MANPATH. For example
there are man pages in /usr/pkg/man which are symlinked to pages in
/usr/pkg/lib/perl5/man. If we ignore symlinks, we would not be able to
index such pages installed outside the default set of directories.

(Also, the symlink test was incorrect, so we never noticed this issue.)

Ok christos@, wiz@


Revision tags: pgoyette-localcount-20161104
# 1.44 03-Oct-2016 abhinav

We don't need to parse the sections we don't index, so stop early. Saves a few
instructions.


# 1.43 03-Oct-2016 abhinav

With the latest release of mandoc, makemandb(8) started to parse some
sections multiple times. This happened because pmdoc_Sh(), the handler function
responsible for parsing the Sh macros, recursively goes through all the child
nodes and then the next nodes, starting from the top level Sh block node.
Once it has processed all the child nodes of the top level block node,
it moves to the next node, which is the top level block node of the next section.
In this way, one call to pmdoc_Sh() caused a complete pass through the
man page. Since mandoc(3) calls pmdoc_Sh() for each .Sh macro in the man
page, some of the sections were parsed multiple times.
This never happened with the previous versions of mandoc, so we never noticed.

I've fixed this by starting the parse sequence of the Sh macro from its body, which guarantees
that we will stop once that section ends.

ok christos@
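
The traversal problem described above can be sketched with a simplified tree. This is an illustration only: `struct node`, `walk` and `walk_section` are hypothetical stand-ins, not the mandoc(3) node type or the real pmdoc_Sh() code. Walking from a section's top level block follows its `next` pointer into the following sections, while walking from the block's contents stops where the section ends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a syntax tree node. */
struct node {
	struct node *child;	/* first child, or NULL */
	struct node *next;	/* next sibling, or NULL */
};

/*
 * Visit n, all of its descendants, and every sibling that follows n.
 * Returns the number of nodes visited.
 */
static int
walk(const struct node *n)
{
	int count = 0;

	for (; n != NULL; n = n->next)
		count += 1 + walk(n->child);
	return count;
}

/*
 * Visit only what is inside one section: start from the block's first
 * child instead of the block itself, so the block's own next pointer
 * (the following section) is never followed.
 */
static int
walk_section(const struct node *block)
{
	return block == NULL ? 0 : walk(block->child);
}
```

Calling `walk` on the first section's block also counts every later section, which is the repeated parsing the commit describes; `walk_section` counts only the nodes that belong to the given section.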


Revision tags: localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726
# 1.42 17-Jul-2016 abhinav

Use deroff() from mandoc(3) to directly parse the Nd macro rather
than parsing it by hand.

With the latest mandoc(3), the .Nd macro was getting parsed twice. This fixes
that problem and cleans up the code as well.

ok christos@


# 1.41 17-Jul-2016 abhinav

Fix extraction of section number and machine architecture from man pages.
With the latest API, mdoc_validate()/man_validate() needs to be called before
reading the roff_man.meta field, otherwise it is NULL.

Also, if a man page doesn't specify a machine architecture, don't default to '?';
let it be stored as null in the db. Otherwise, the output of apropos(1) shows
the names of the results as \?/<title>.


# 1.40 15-Jul-2016 christos

Sync with API changes.


Revision tags: pgoyette-localcount-base
# 1.39 06-Jul-2016 abhinav

branches: 1.39.2;
Avoid possible buffer overflow while parsing NAME section of man(7) pages.
Also, simplify copying of strings: use estrndup instead of emalloc + memcpy.

Patch from christos@, XXX comment by me


# 1.38 05-Jul-2016 abhinav

Reuse variable from previous line.


# 1.37 13-Apr-2016 christos

PR/51062: Abhinav Upadhyay: Allow non-numeric sections to be indexed and
searched by apropos(1).
Fold long lines.


# 1.36 13-Apr-2016 christos

PR/51040: Abhinav Upadhyay: Fix memory leak


# 1.35 13-Apr-2016 christos

PR/51039: Abhinav Upadhyay: Check for return value of chdir(2)


# 1.34 13-Apr-2016 christos

PR/51034: Abhinav Upadhyay: Close database connection when failed to commit


# 1.33 31-Mar-2016 christos

PR/51034: Abhinav Upadhyay: makemandb(8): Close database connection when
failed to commit


# 1.32 24-Mar-2016 christos

PR/51006: Abhinav Upadhyay: makemandb(8) should parse escape sequences
in the NAME section


# 1.31 28-Jan-2016 christos

Don't crash if we have a missing section.


# 1.30 18-Dec-2015 christos

Adjust to the new mdocml


# 1.29 07-Apr-2015 plunky

Largely apply patch from PR bin/47392 by Abhinav Upadhyay.

Change some comments to reflect reality and a variable name to enhance
readability, and add an assert for safety.


# 1.28 12-Mar-2015 joerg

MDOC_MAX is a valid token if the type is text. Adjust.


# 1.27 04-Mar-2015 christos

- handle section numbers that are not single digits
- don't allocate and free needlessly


# 1.26 02-Mar-2015 joerg

Explicitly deal with end of lists. PR 49708.


# 1.25 18-Oct-2014 snj

src is too big these days to tolerate superfluous apostrophes. It's
"its", people!


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.24 03-Jun-2014 wiz

branches: 1.24.2;
Fix a bug that caused an error about a UNIQUE constraint violation.
Patch from Abhinav Upadhyay.


# 1.23 24-May-2014 wiz

Replace non-breaking space with hyphen, and call hyphen replacement
from one more place.
Improves 'man -k midi' output.

From Abhinav Upadhyay.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.22 10-Feb-2014 chs

branches: 1.22.2;
in update_db(), extract the full list of files to update from the db
before actually updating anything, since changing the db while the query
that extracts the list of files is still in progress results in
the extraction query failing before it finds everything.


# 1.21 05-Jan-2014 joerg

Sync with interface change in mdocml 1.12.3.


# 1.20 13-Nov-2013 wiz

Skip files of size 0 when indexing.

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 15-May-2013 christos

avoid stdio assertion, failing later


Revision tags: agc-symver-base
# 1.18 10-Feb-2013 christos

remove trailing whitespace


Revision tags: yamt-pagecache-base8
# 1.17 14-Jan-2013 christos

Since mdocml decided to name headers that conflict with system ones (term.h),
move the header inclusion one up.


Revision tags: yamt-pagecache-base7
# 1.16 08-Nov-2012 christos

If you cannot parse .SH NAME, like in the case of the ksh93 man page
where the .SH is followed by a conditional:

.SH NAME
.if \nZ=0 \{\
text text text
.\}

at least don't core-dump.


Revision tags: yamt-pagecache-base6
# 1.15 06-Oct-2012 wiz

Make mandb path configurable. makemandb (and related tools) use
the path from the _mandb variable from man.conf now.

Set _mandb in man.conf to same value as was used before.

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.14 07-Sep-2012 wiz

branches: 1.14.2;
Use emalloc in one more place, like the rest of the code does.
From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.13 29-Aug-2012 wiz

Add -Q flag:
Print only fatal error messages (i.e., when the database is left in
an inconsistent state and needs manual intervention).

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.12 25-Aug-2012 wiz

Sync usage with manpage.


# 1.11 11-Aug-2012 wiz

Bug fix for PR 46733:
> makemandb always reports the same number for "Total Number of new or
> updated pages enountered" and "Total number of (hard or symbolic)
> links found".

Patch from Abhinav Upadhyay.


# 1.10 08-Jul-2012 uwe

Fix typo in a message.


Revision tags: yamt-pagecache-base5
# 1.9 07-May-2012 wiz

PR 46419 by Abhinav Upadhyay using his updated patch:
Clean up after removing man page aliases.


# 1.8 04-May-2012 wiz

The new apropos(1) incorrectly displays hyphens in the first line
of the search results for a few man pages (for man(7) based man
pages).

Use patch from Abhinav Upadhyay in PR 46408 to fix this.


Revision tags: yamt-pagecache-base4
# 1.7 02-Mar-2012 joerg

branches: 1.7.2;
Fix inverted condition when handling stale entries.
From Abhinav Upadhyay.


# 1.6 27-Feb-2012 joerg

Expand workaround for .so usage to do the chdir call just before
starting parsing, not during the tree iteration. This gives it a chance
to work.


# 1.5 16-Feb-2012 joerg

Add support for compressed man pages in all the usual formats.


# 1.4 15-Feb-2012 joerg

Also handle hyphen replacement if it was used as plain input and no
backslash sequence was used at all in the line.


# 1.3 15-Feb-2012 joerg

Be a bit more friendly to man pages using the roff .so command by
changing the current directory to the parent of the man -p entry, e.g.
/usr/share/man for /usr/share/man1.


Revision tags: netbsd-6-base
# 1.2 07-Feb-2012 joerg

branches: 1.2.2;
Fix C&P error with $NetBSD$


# 1.1 07-Feb-2012 joerg

Import the new apropos/whatis.

This code has been developed by Abhinav Upadhyay as part of Google's Summer
of Code 2011. It uses libmandoc to parse man pages and builds a Full
Text Index in a SQLite database. The combination of indexing the full
manual page, filtering out stop words and ranking individual matches
based on the section gives a much improved user experience.

The old makewhatis and friends are kept under MKMAKEMANDB=no for now.



Add support for compressed man pages in all the usual formats.


# 1.4 15-Feb-2012 joerg

Also handle hyphen replacement if it was used as plain input and no
backslash sequence was used at all in the line.


# 1.3 15-Feb-2012 joerg

Be a bit more friendly to man pages using the roff .so command by
changing the current directory to the parent of the man -p entry, e.g.
/usr/share/man for /usr/share/man1.


Revision tags: netbsd-6-base
# 1.2 07-Feb-2012 joerg

branches: 1.2.2;
Fix C&P error with $NetBSD$


# 1.1 07-Feb-2012 joerg

Import the new apropos/whatis.

This code has been developed by Abhinav Upadhyay as part of Google's Summer
of Code 2011. It uses libmandoc to parse man pages and builds a Full
Text Index in a SQLite database. The combination of indexing the full
manual page, filtering out stop words and ranking individual matches
based on the section gives a much improved user experience.

The old makewhatis and friends are kept under MKMAKEMANDB=no for now.


# 1.62 06-Apr-2022 gutteridge

makemandb.c: fail sooner if man page dirs can't be found

There's no point initializing database state if we're then going to
fail to locate any man page sources. Make all the initial state checks
contiguous for simplicity and readability. Also, free the variable
"command" on the error path, and correct the error message.


# 1.61 05-Dec-2021 msaitoh

s/trival/trivial/ in comment.


Revision tags: cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 netbsd-9-2-RELEASE cjep_staticlib_x-base netbsd-9-1-RELEASE phil-wifi-20200421 phil-wifi-20200411 is-mlppp-base phil-wifi-20200406 netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609
# 1.60 18-May-2019 abhinav

PR misc/54213: Fix performance of whatis(1) when no matches are found

In revision 1.6 of whatis.c the query was modified to return matches for names found
in MLINKS of the man pages as well. However, it was slow, probably because it
required a join. More importantly, a WHERE condition on an FTS virtual table column
is very slow. To avoid the join and the expensive WHERE condition on the virtual table,
add the name_desc column to the mandb_links table as well. This restores the performance
of whatis(1) to the original level at the expense of slight data duplication.

Bump the schema version to force a database rebuild that accounts for the new column.


# 1.59 11-Mar-2019 christos

remove unneeded header.


# 1.58 11-Mar-2019 christos

adjust to the new mandoc api


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.57 24-Aug-2018 abhinav

Adjust makemandb for the latest mandoc

ok christos@


# 1.56 16-Aug-2018 kre

In the latest mandoc (mdocml) the signature (prototype) of
mparse_alloc() altered - update the call here to compensate.

This fixes the build (of makemandb), but I am not sure that
the changed version is what is desired - someone who knows
something about all of this should validate ... I just copied
the invocation from mandoc's demandoc.c (which seems likely
to be at least a similar kind of usage).


Revision tags: netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2
# 1.55 10-May-2017 abhinav

branches: 1.55.8; 1.55.10;
Get rid of unnecessary variable.


# 1.54 02-May-2017 abhinav

We do need to copy the return value from dirname(3) since it is a static
buffer and can be overwritten in between. I overzealously removed this in one
of my previous commits.


Revision tags: prg-localcount2-base1
# 1.53 01-May-2017 abhinav

Avoid dereferencing pointer at multiple places, instead use a local variable.


# 1.52 01-May-2017 abhinav

Remove the table name parameter from the check_md5 function.

There is only one table storing the md5 checksums, so we can hardcode the table
name instead of passing it as a function argument.


# 1.51 01-May-2017 abhinav

Avoid copying strings where it is not needed.


# 1.50 30-Apr-2017 abhinav

Avoid a call to strncmp when comparing only the first character of the string.


# 1.49 29-Apr-2017 abhinav

Bring the comment in sync with code (after changes brought by the last commit).


# 1.48 29-Apr-2017 abhinav

Don't parse Nm macro when it occurs anywhere outside the NAME section.

mandoc(3) already generates the text node representing the value for the .Nm macro.
Doing our own parsing for .Nm on top of that leads to large duplication of text
in the database. This gets especially bad for man pages with large NAME sections,
such as queue(3).


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.47 20-Apr-2017 joerg

branches: 1.47.2;
Use libarchive 3.x interface and not obsolete 2.x versions.


Revision tags: pgoyette-localcount-20170320 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.46 19-Dec-2016 abhinav

branches: 1.46.2;
Escape hyphen when parsing .Nd


# 1.45 17-Dec-2016 abhinav

Don't ignore symlinks.
There can be symlinks which are pointing to man pages not installed in
one of the _default locations mentioned in man.conf or MANPATH. For example
there are man pages in /usr/pkg/man which are symlinked to pages in
/usr/pkg/lib/perl5/man. If we ignore symlinks, we would not be able to
index such pages installed outside the default set of directories.

(Also, the symlink test was incorrect, so we never noticed this issue)

Ok christos@, wiz@


Revision tags: pgoyette-localcount-20161104
# 1.44 03-Oct-2016 abhinav

We don't need to parse the sections we don't index, so stop early. Saves a few
instructions.


# 1.43 03-Oct-2016 abhinav

With the latest release of mandoc, makemandb(8) started to parse some
sections multiple times. This happened because pmdoc_Sh(), the handler function
responsible for parsing the Sh macros, recursively goes through all the child
nodes and then the next nodes, starting from the top-level Sh block node.
Once it has processed all the child nodes of the top-level block node,
it moves to the next node, which is the top-level block node of the next section, so
one call to pmdoc_Sh() caused a complete pass through the
man page. Since mandoc(3) calls pmdoc_Sh() for each .Sh macro in the man
page, some of the sections were parsed multiple times.
This never happened with the previous versions of mandoc, so we never noticed.

I've fixed this by starting the parse sequence of the Sh macro from its body, which guarantees
that we will stop once that section ends.

ok christos@


Revision tags: localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726
# 1.42 17-Jul-2016 abhinav

Use deroff() from mandoc(3) to directly parse the Nd macro rather
than parsing it by hand.

With the latest mandoc(3), the .Nd macro was getting parsed twice. This fixes
that problem and cleans up the code as well.

ok christos@


# 1.41 17-Jul-2016 abhinav

Fix extraction of section number and machine architecture from man pages.
With the latest API, mdoc_validate()/man_validate() needs to be called before
reading the roff_man.meta field, otherwise it is NULL.

Also, if a man page doesn't specify machine architecture, don't default to '?';
let it be stored as null in the db. Otherwise, the output of apropos(1) shows
the names of the results as \?/<title>


# 1.40 15-Jul-2016 christos

Sync with API changes.


Revision tags: pgoyette-localcount-base
# 1.39 06-Jul-2016 abhinav

branches: 1.39.2;
Avoid possible buffer overflow while parsing NAME section of man(7) pages.
Also, simplify copying of strings, use estrdupn instead of emalloc + memcpy.

Patch from christos@, XXX comment by me


# 1.38 05-Jul-2016 abhinav

Reuse variable from previous line.


# 1.37 13-Apr-2016 christos

PR/51062: Abhinav Upadhyay: Allow non numeric sections to be indexed and
searched by apropos(1).
Fold long lines.


# 1.36 13-Apr-2016 christos

PR/51040: Abhinav Upadhyay: Fix memory leak


# 1.35 13-Apr-2016 christos

PR/51039: Abhinav Upadhyay: Check for return value of chdir(2)


# 1.34 13-Apr-2016 christos

PR/51034: Abhinav Upadhyay: Close database connection when failed to commit


# 1.33 31-Mar-2016 christos

PR/51034: Abhinav Upadhyay: makemandb(8): Close database connection when
failed to commit


# 1.32 24-Mar-2016 christos

PR/51006: Abhinav Upadhyay: makemandb(8) should parse escape sequences
in the NAME section


# 1.60 18-May-2019 abhinav

PR misc/54213: Fix performance of whatis(1) when no matches are found

In revision 1.6 of whatis.c the query was modified to return matches for names found
in MLINKS of the man pages as well. However it was slow. The reason probably being that it
required a join. But more importantly the where condition on an FTS virtual table column
is very slow. To avoid the join and the expensive where condition on the virtual table,
add the name_desc column to the mandb_links table as well. This improves the performance
of whatis(1) to the original level at the expense of slight data duplication.

Bump the schema to force database rebuild to take account for the new column addition


# 1.59 11-Mar-2019 christos

remove unneeded header.


# 1.58 11-Mar-2019 christos

adjust to the new mandoc api


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.57 24-Aug-2018 abhinav

Adjust makemandb for the latest mandoc

ok christos@


# 1.56 16-Aug-2018 kre

In the latest mandoc (mdocml) the signature (prototype) of
mparse_alloc() altered - update the call here to compensate.

This fixes the build (of makemandb), but I am not sure that
the changed version is what is desired - someone who knows
something about all of this should validate ... I just copied
the invocation from mandoc's demandoc.c (which seems likely
to be at least a similar kind of usage).


Revision tags: netbsd-8-1-RC1 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2
# 1.55 10-May-2017 abhinav

branches: 1.55.8;
Get rid of unnecessary variable.


# 1.54 02-May-2017 abhinav

We do need to copy the return value from dirname(3) since there it is a static
buffer and can be overwritten in between. I overzealously removed this in one
of my previous commits.


Revision tags: prg-localcount2-base1
# 1.53 01-May-2017 abhinav

Avoid dereferencing pointer at multiple places, instead use a local variable.


# 1.52 01-May-2017 abhinav

Remove the table name parameter from the check_md5 function.

There is only one table storing the md5 checksums, so we can hardcode the table
name instead of passing it as a function argument.


# 1.51 01-May-2017 abhinav

Avoid copying strings where it is not needed.


# 1.50 30-Apr-2017 abhinav

Avoid a call to strncmp when comparing only the first character of the string.


# 1.49 29-Apr-2017 abhinav

Bring the comment in sync with code (after changes brought by the last commit).


# 1.48 29-Apr-2017 abhinav

Don't parse Nm macro when it occurs anywhere outside the NAME section.

mandoc(3) already generates the text node representing the value for the .Nm macro.
Doing our own parsing for .Nm on top of that leads to large duplication of text
in the database. This gets specially worse for man pages with large NAME sections,
such as queue(3).


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.47 20-Apr-2017 joerg

branches: 1.47.2;
Use libarchive 3.x interface and not obsolete 2.x versions.


Revision tags: pgoyette-localcount-20170320 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.46 19-Dec-2016 abhinav

branches: 1.46.2;
Escape hyphen when parsing .Nd


# 1.45 17-Dec-2016 abhinav

Don't ignore symlinks.
There can be symlinks which are pointing to man pages not installed in
one of the _default locations mentioned in man.conf or MANPATH. For example
there are man pages in /usr/pkg/man which are symlinked to pages in
/usr/pkg/lib/perl5/man. If we ignore symlinks, we would not be able to
index such pages installed outside the default set of directories.

(Also, the symlink test was incorrect, so we never noticed this issue)

Ok christos@, wiz@


Revision tags: pgoyette-localcount-20161104
# 1.44 03-Oct-2016 abhinav

We don't need to parse the sections we don't index, so stop early. Saves a few
instructions.


# 1.43 03-Oct-2016 abhinav

With the latest release of mandoc, makemandb(8) started to parse some
sections multiple times. This started to happen because pmdoc_Sh(), the handler function
responsible for parsing the Sh macros, used to recursively go through all the child
nodes and then the next nodes starting from the top level Sh block node.
Now, once it has processed all the child nodes of the top level block node,
it moves to the next node, which is the top level block node of the next section, and
in this way one call to pmdoc_Sh() was causing a complete pass through the
man page. Since mandoc(3) calls pmdoc_Sh() for each .Sh macro in the man
page, it would result in parsing some of the sections multiple times.
This never happened with the previous versions of mandoc, so we never noticed.

I've fixed this by starting the parse sequence of the Sh macro from its body, which guarantees
that we will stop once that section ends.

ok christos@


Revision tags: localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726
# 1.42 17-Jul-2016 abhinav

Use deroff() from mandoc(3) to directly parse the Nd macro rather
than parsing it by hand.

With the latest mandoc(3), the .Nd macro was getting parsed twice. This fixes
that problem and cleans up the code as well.

ok christos@


# 1.41 17-Jul-2016 abhinav

Fix extraction of section number and machine architecture from man pages.
With the latest API, mdoc_validate()/man_validate() needs to be called before
reading the roff_man.meta field, otherwise it is NULL.

Also, if a man page doesn't specify machine architecture, don't default to '?';
let it be stored as null in the db. Otherwise, the output of apropos(1) shows
the names of the results as \?/<title>


# 1.40 15-Jul-2016 christos

Sync with API changes.


Revision tags: pgoyette-localcount-base
# 1.39 06-Jul-2016 abhinav

branches: 1.39.2;
Avoid possible buffer overflow while parsing NAME section of man(7) pages.
Also, simplify copying of strings, use estrndup instead of emalloc + memcpy.

Patch from christos@, XXX comment by me


# 1.38 05-Jul-2016 abhinav

Reuse variable from previous line.


# 1.37 13-Apr-2016 christos

PR/51062: Abhinav Upadhyay: Allow non-numeric sections to be indexed and
searched by apropos(1).
Fold long lines.


# 1.36 13-Apr-2016 christos

PR/51040: Abhinav Upadhyay: Fix memory leak


# 1.35 13-Apr-2016 christos

PR/51039: Abhinav Upadhyay: Check for return value of chdir(2)


# 1.34 13-Apr-2016 christos

PR/51034: Abhinav Upadhyay: Close database connection when failed to commit


# 1.33 31-Mar-2016 christos

PR/51034: Abhinav Upadhyay: makemandb(8): Close database connection when
failed to commit


# 1.32 24-Mar-2016 christos

PR/51006: Abhinav Upadhyay: makemandb(8) should parse escape sequences
in the NAME section


# 1.31 28-Jan-2016 christos

Don't crash if we have a missing section.


# 1.30 18-Dec-2015 christos

Adjust to the new mdocml


# 1.29 07-Apr-2015 plunky

largely apply patch from PR bin/47392 by Abhinav Upadhyay

change some comments to reflect reality, rename a variable to enhance
readability, and add an assert for safety.


# 1.28 12-Mar-2015 joerg

MDOC_MAX is a valid token if the type is text. Adjust.


# 1.27 04-Mar-2015 christos

- handle section numbers that are not single digits
- don't allocate and free needlessly


# 1.26 02-Mar-2015 joerg

Explicitly deal with end of lists. PR 49708.


# 1.25 18-Oct-2014 snj

src is too big these days to tolerate superfluous apostrophes. It's
"its", people!


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.24 03-Jun-2014 wiz

branches: 1.24.2;
Fix a bug that caused an error about a UNIQUE constraint violation.
Patch from Abhinav Upadhyay.


# 1.23 24-May-2014 wiz

Replace non-breaking space with hyphen, and call hyphen replacement
from one more place.
Improves 'man -k midi' output.

From Abhinav Upadhyay.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.22 10-Feb-2014 chs

branches: 1.22.2;
in update_db(), extract the full list of files to update from the db
before actually updating anything, since changing the db while the query
that extracts the list of files is still in progress results in
the extraction query failing before it finds everything.


# 1.21 05-Jan-2014 joerg

Sync with interface change in mdocml 1.12.3.


# 1.20 13-Nov-2013 wiz

Skip files of size 0 from indexing.

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 15-May-2013 christos

avoid stdio assertion, failing later


Revision tags: agc-symver-base
# 1.18 10-Feb-2013 christos

remove trailing whitespace


Revision tags: yamt-pagecache-base8
# 1.17 14-Jan-2013 christos

Since mdocml decided to name headers that conflict with system ones (term.h)
move the header inclusion one up.


Revision tags: yamt-pagecache-base7
# 1.16 08-Nov-2012 christos

If you cannot parse .SH NAME, like in the case of the ksh93 man page
where the .SH is followed by a conditional:

.SH NAME
.if \nZ=0 \{\
text text text
.\}

at least don't core-dump.


Revision tags: yamt-pagecache-base6
# 1.15 06-Oct-2012 wiz

Make mandb path configurable. makemandb (and related tools) use
the path from the _mandb variable from man.conf now.

Set _mandb in man.conf to same value as was used before.

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.14 07-Sep-2012 wiz

branches: 1.14.2;
Use emalloc in one more place, like the rest of the code does.
From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.13 29-Aug-2012 wiz

Add -Q flag:
Print only fatal error messages (i.e., when the database is left in
an inconsistent state and needs manual intervention).

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.12 25-Aug-2012 wiz

Sync usage with manpage.


# 1.11 11-Aug-2012 wiz

Bug fix for PR 46733:
> makemandb always reports the same number for "Total Number of new or
> updated pages enountered" and "Total number of (hard or symbolic)
> links found".

Patch from Abhinav Upadhyay.


# 1.10 08-Jul-2012 uwe

Fix typo in a message.


Revision tags: yamt-pagecache-base5
# 1.9 07-May-2012 wiz

PR 46419 by Abhinav Upadhyay using his updated patch:
Clean up after removing man page aliases.


# 1.8 04-May-2012 wiz

The new apropos(1) incorrectly displays hyphens in the first line
of the search results for a few man pages (for man(7) based man
pages).

Use patch from Abhinav Upadhyay in PR 46408 to fix this.


Revision tags: yamt-pagecache-base4
# 1.7 02-Mar-2012 joerg

branches: 1.7.2;
Fix inverted condition when handling stale entries.
From Abhinav Upadhyay.


# 1.6 27-Feb-2012 joerg

Expand workaround for .so usage to do the chdir call just before
starting parsing, not during the tree iteration. This gives it a chance
to work.


# 1.5 16-Feb-2012 joerg

Add support for compressed man pages in all the usual formats.


# 1.4 15-Feb-2012 joerg

Also handle hyphen replacement if it was used as plain input and no
backslash sequence was used at all in the line.


# 1.3 15-Feb-2012 joerg

Be a bit more friendly to man pages using the roff .so command by
changing the current directory to the parent of the man -p entry, e.g.
/usr/share/man for /usr/share/man1.


Revision tags: netbsd-6-base
# 1.2 07-Feb-2012 joerg

branches: 1.2.2;
Fix C&P error with $NetBSD$


# 1.1 07-Feb-2012 joerg

Import the new apropos/whatis.

This code has been developed by Abhinav Upadhyay as part of Google's Summer
of Code 2011. It uses libmandoc to parse man pages and builds a Full
Text Index in a SQLite database. The combination of indexing the full
manual page, filtering out stop words and ranking individual matches
based on the section gives a much improved user experience.

The old makewhatis and friends are kept under MKMAKEMANDB=no for now.


# 1.59 11-Mar-2019 christos

remove unneeded header.


# 1.58 11-Mar-2019 christos

adjust to the new mandoc api


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.57 24-Aug-2018 abhinav

Adjust makemandb for the latest mandoc

ok christos@


# 1.56 16-Aug-2018 kre

In the latest mandoc (mdocml) the signature (prototype) of
mparse_alloc() altered - update the call here to compensate.

This fixes the build (of makemandb), but I am not sure that
the changed version is what is desired - someone who knows
something about all of this should validate ... I just copied
the invocation from mandoc's demandoc.c (which seems likely
to be at least a similar kind of usage).


# 1.54 02-May-2017 abhinav

We do need to copy the return value from dirname(3) since there it is a static
buffer and can be overwritten in between. I overzealously removed this in one
of my previous commits.


Revision tags: prg-localcount2-base1
# 1.53 01-May-2017 abhinav

Avoid dereferencing pointer at multiple places, instead use a local variable.


# 1.52 01-May-2017 abhinav

Remove the table name parameter from the check_md5 function.

There is only one table storing the md5 checksums, so we can hardcode the table
name instead of passing it as a function argument.


# 1.51 01-May-2017 abhinav

Avoid copying strings where it is not needed.


# 1.50 30-Apr-2017 abhinav

Avoid a call to strncmp when comparing only the first character of the string.


# 1.49 29-Apr-2017 abhinav

Bring the comment in sync with code (after changes brought by the last commit).


# 1.48 29-Apr-2017 abhinav

Don't parse Nm macro when it occurs anywhere outside the NAME section.

mandoc(3) already generates the text node representing the value for the .Nm macro.
Doing our own parsing for .Nm on top of that leads to large duplication of text
in the database. This gets specially worse for man pages with large NAME sections,
such as queue(3).


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.47 20-Apr-2017 joerg

branches: 1.47.2;
Use libarchive 3.x interface and not obsolete 2.x versions.


Revision tags: pgoyette-localcount-20170320 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.46 19-Dec-2016 abhinav

branches: 1.46.2;
Escape hyphen when parsing .Nd


# 1.45 17-Dec-2016 abhinav

Don't ignore symlinks.
There can be symlinks which are pointing to man pages not installed in
one of the _default locations mentioned in man.conf or MANPATH. For example
there are man pages in /usr/pkg/man which are symlinked to pages in
/usr/pkg/lib/perl5/man. If we ignore symlinks, we would not be able to
index such pages installed outside the default set of directories.

(Also, the symlink test was incorrect, so we never noticed this issue)

Ok christos@, wiz@


Revision tags: pgoyette-localcount-20161104
# 1.44 03-Oct-2016 abhinav

We don't need to parse the sections we don't index, so stop early. Saves a few
instructions.


# 1.43 03-Oct-2016 abhinav

With the latest release of mandoc, makemandb(8) started to parse some
sections multiple times. This happened because pmdoc_Sh(), the handler function
responsible for parsing the Sh macros, used to recursively go through all the child
nodes and then the next nodes, starting from the top level Sh block node.
Now, once it has processed all the child nodes of the top level block node,
it moves to the next node, which is the top level block node of the next section.
In this way, one call to pmdoc_Sh() was causing a complete pass through the
man page. Since mandoc(3) calls pmdoc_Sh() for each .Sh macro in the man
page, some of the sections were parsed multiple times.
This never happened with the previous versions of mandoc, so we never noticed.

I've fixed this by starting the parse sequence of the Sh macro from its body, which guarantees
that we will stop once that section ends.

ok christos@
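
The effect described in this entry can be reproduced on a toy tree. The sketch below is an assumption-laden simplification: struct node and walk() are hypothetical, and the real mandoc(3) node structure and the pmdoc_Sh() handler carry far more state. It only shows why a walk that follows both child and next pointers covers later sections when started at a section block, but stays inside the section when started at its body.

```c
#include <stddef.h>

/* Hypothetical node type; much simpler than the real mandoc(3) nodes. */
struct node {
	struct node *child;	/* first child */
	struct node *next;	/* next sibling */
};

/* Count nodes reached by descending into children, then following siblings. */
static size_t
walk(const struct node *n)
{
	size_t count = 0;

	for (; n != NULL; n = n->next)
		count += 1 + walk(n->child);
	return count;
}
```

With two sibling section blocks, each holding one body node with one text child, walk() started at the first section block visits all six nodes, while walk() started at the first section's body visits only that body and its text node.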


Revision tags: localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726
# 1.42 17-Jul-2016 abhinav

Use deroff() from mandoc(3) to directly parse the Nd macro rather
than parsing it by hand.

With the latest mandoc(3), the .Nd macro was getting parsed twice. This fixes
that problem and cleans up the code as well.

ok christos@


# 1.41 17-Jul-2016 abhinav

Fix extraction of section number and machine architecture from man pages.
With the latest API, mdoc_validate()/man_validate() needs to be called before
reading the roff_man.meta field, otherwise it is NULL.

Also, if a man page doesn't specify machine architecture, don't default to '?';
let it be stored as null in the db. Otherwise, the output of apropos(1) shows
the names of the results as \?/<title>


# 1.40 15-Jul-2016 christos

Sync with API changes.


Revision tags: pgoyette-localcount-base
# 1.39 06-Jul-2016 abhinav

branches: 1.39.2;
Avoid possible buffer overflow while parsing NAME section of man(7) pages.
Also, simplify copying of strings: use estrdupn instead of emalloc + memcpy.

Patch from christos@, XXX comment by me
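
The simplification mentioned in this entry can be sketched with standard functions. This is not the makemandb.c code: the helper names are hypothetical, and POSIX strndup(3) and malloc(3) stand in for the error-checking wrappers named in the commit message.

```c
#define _POSIX_C_SOURCE 200809L	/* for strndup(3) */
#include <stdlib.h>
#include <string.h>

/* Two-step copy: allocate len + 1 bytes, copy, terminate by hand. */
static char *
copy_prefix_manual(const char *s, size_t len)
{
	char *p = malloc(len + 1);

	if (p == NULL)
		return NULL;
	memcpy(p, s, len);	/* reads exactly len bytes from s */
	p[len] = '\0';
	return p;
}

/* One-step copy: strndup(3) sizes, copies and terminates in one call. */
static char *
copy_prefix(const char *s, size_t len)
{
	return strndup(s, len);	/* stops early at a NUL in s */
}
```

One design difference is worth noting: the manual version reads len bytes unconditionally, so it overruns the source if s is shorter than len, whereas strndup(3) copies at most len bytes and stops at the terminator.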


# 1.38 05-Jul-2016 abhinav

Reuse variable from previous line.


# 1.37 13-Apr-2016 christos

PR/51062: Abhinav Upadhyay: Allow non-numeric sections to be indexed and
searched by apropos(1).
Fold long lines.


# 1.36 13-Apr-2016 christos

PR/51040: Abhinav Upadhyay: Fix memory leak


# 1.35 13-Apr-2016 christos

PR/51039: Abhinav Upadhyay: Check for return value of chdir(2)


# 1.34 13-Apr-2016 christos

PR/51034: Abhinav Upadhyay: Close database connection when failed to commit


# 1.33 31-Mar-2016 christos

PR/51034: Abhinav Upadhyay: makemandb(8): Close database connection when
failed to commit


# 1.32 24-Mar-2016 christos

PR/51006: Abhinav Upadhyay: makemandb(8) should parse escape sequences
in the NAME section


# 1.31 28-Jan-2016 christos

Don't crash if we have a missing section.


# 1.30 18-Dec-2015 christos

Adjust to the new mdocml


# 1.29 07-Apr-2015 plunky

largely apply patch from PR bin/47392 by Abhinav Upadhyay

change some comments to reflect reality and a variable name to enhance
readability, and add an assert for safety.


# 1.28 12-Mar-2015 joerg

MDOC_MAX is a valid token if the type is text. Adjust.


# 1.27 04-Mar-2015 christos

- handle section numbers that are not single digits
- don't allocate and free needlessly


# 1.26 02-Mar-2015 joerg

Explicitly deal with end of lists. PR 49708.


# 1.25 18-Oct-2014 snj

src is too big these days to tolerate superfluous apostrophes. It's
"its", people!


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.24 03-Jun-2014 wiz

branches: 1.24.2;
Fix a bug that caused an error about a UNIQUE constraint violation.
Patch from Abhinav Upadhyay.


# 1.23 24-May-2014 wiz

Replace non-breaking space with hyphen, and call hyphen replacement
from one more place.
Improves 'man -k midi' output.

From Abhinav Upadhyay.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.22 10-Feb-2014 chs

branches: 1.22.2;
in update_db(), extract the full list of files to update from the db
before actually updating anything, since changing the db while the query
that extracts the list of files is still in progress results in
the extraction query failing before it finds everything.


# 1.21 05-Jan-2014 joerg

Sync with interface change in mdocml 1.12.3.


# 1.20 13-Nov-2013 wiz

Skip files of size 0 when indexing.

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 15-May-2013 christos

avoid stdio assertion, failing later


Revision tags: agc-symver-base
# 1.18 10-Feb-2013 christos

remove trailing whitespace


Revision tags: yamt-pagecache-base8
# 1.17 14-Jan-2013 christos

Since mdocml decided to name headers that conflict with system ones (term.h),
move the header inclusion one up.


Revision tags: yamt-pagecache-base7
# 1.16 08-Nov-2012 christos

If you cannot parse .SH NAME, like in the case of the ksh93 man page
where the .SH is followed by a conditional:

.SH NAME
.if \nZ=0 \{\
text text text
.\}

at least don't core-dump.


Revision tags: yamt-pagecache-base6
# 1.15 06-Oct-2012 wiz

Make mandb path configurable. makemandb (and related tools) use
the path from the _mandb variable from man.conf now.

Set _mandb in man.conf to the same value as was used before.

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.14 07-Sep-2012 wiz

branches: 1.14.2;
Use emalloc in one more place, like the rest of the code does.
From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.13 29-Aug-2012 wiz

Add -Q flag:
Print only fatal error messages (i.e., when the database is left in
an inconsistent state and needs manual intervention).

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.12 25-Aug-2012 wiz

Sync usage with manpage.


# 1.11 11-Aug-2012 wiz

Bug fix for PR 46733:
> makemandb always reports the same number for "Total Number of new or
> updated pages enountered" and "Total number of (hard or symbolic)
> links found".

Patch from Abhinav Upadhyay.


# 1.10 08-Jul-2012 uwe

Fix typo in a message.


Revision tags: yamt-pagecache-base5
# 1.9 07-May-2012 wiz

PR 46419 by Abhinav Upadhyay using his updated patch:
Clean up after removing man page aliases.


# 1.8 04-May-2012 wiz

The new apropos(1) incorrectly displays hyphens in the first line
of the search results for a few man pages (for man(7) based man
pages).

Use patch from Abhinav Upadhyay in PR 46408 to fix this.


Revision tags: yamt-pagecache-base4
# 1.7 02-Mar-2012 joerg

branches: 1.7.2;
Fix inverted condition when handling stale entries.
From Abhinav Upadhyay.


# 1.6 27-Feb-2012 joerg

Expand workaround for .so usage to do the chdir call just before
starting parsing, not during the tree iteration. This gives it a chance
to work.


# 1.5 16-Feb-2012 joerg

Add support for compressed man pages in all the usual formats.


# 1.4 15-Feb-2012 joerg

Also handle hyphen replacement if it was used as plain input and no
backslash sequence was used at all in the line.


# 1.3 15-Feb-2012 joerg

Be a bit more friendly to man pages using the roff .so command by
changing the current directory to the parent of the man -p entry, e.g.
/usr/share/man for /usr/share/man1.


Revision tags: netbsd-6-base
# 1.2 07-Feb-2012 joerg

branches: 1.2.2;
Fix C&P error with $NetBSD$


# 1.1 07-Feb-2012 joerg

Import the new apropos/whatis.

This code has been developed by Abhinav Upadhyay as part of Google's Summer
of Code 2011. It uses libmandoc to parse man pages and builds a Full
Text Index in a SQLite database. The combination of indexing the full
manual page, filtering out stop words and ranking individual matches
based on the section gives a much improved user experience.

The old makewhatis and friends are kept under MKMAKEMANDB=no for now.


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.47 20-Apr-2017 joerg

Use libarchive 3.x interface and not obsolete 2.x versions.


Revision tags: pgoyette-localcount-20170320 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.46 19-Dec-2016 abhinav

branches: 1.46.2;
Escape hyphen when parsing .Nd


# 1.45 17-Dec-2016 abhinav

Don't ignore symlinks.
There can be symlinks which are pointing to man pages not installed in
one of the _default locations mentioned in man.conf or MANPATH. For example
there are man pages in /usr/pkg/man which are symlinked to pages in
/usr/pkg/lib/perl5/man. If we ignore symlinks, we would not be able to
index such pages installed outside the default set of directories.

(Also, the symlink test was incorecct, so we never noticed this issue)

Ok christos@, wiz@


Revision tags: pgoyette-localcount-20161104
# 1.44 03-Oct-2016 abhinav

We don't need to parse the sections we don't index, so stop early. Saves few
instructions.


# 1.43 03-Oct-2016 abhinav

With the latest release of mandoc, makemandb(8) started to parse some
sections multiple times. This started to happen because, pmdoc_Sh(), the handler function
responsible for parsing the Sh macros, used to recursively go through all the child
nodes and then the next nodes starting from top level Sh block node.
Now, once it has processed all the child nodes of the top level block node,
it moves to the next node, which is the top level block node of the next section and
in this way one call to pmdoc_Sh() was causing a complete pass through the
man page. Since, mandoc(3) calls pmdoc_Sh() for each .Sh macro in the man
page, it would result in parsing some of the sections multiple times.
This never happened with the previous versions of mandoc, so we never noticed.

I've fixed this by starting the parse sequence of the Sh macro from its body, which gurantees
that we will stop once that section ends.

ok christos@


Revision tags: localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726
# 1.42 17-Jul-2016 abhinav

Use deroff() from mandoc(3) to directly parse the Nd macro rather
than parsing it by hand.

With the latest mandoc(3), the .Nd macro was getting parsed twice. This fixes
that problem and cleans up the code as well.

ok christos@


# 1.41 17-Jul-2016 abhinav

Fix extraction of section number and machine architecture from man pages.
With the latest API, mdoc_validate()/man_validate() needs to be called before
reading the roff_man.meta field, otherwise it is NULL.

Also, if a man page doesn't specify machine architecture, don't default to '?'
, let it be stored as null in the db. Otherwise, the output of apropos(1) shows
the names of the results as \?/<title>


# 1.40 15-Jul-2016 christos

Sync with API changes.


Revision tags: pgoyette-localcount-base
# 1.39 06-Jul-2016 abhinav

branches: 1.39.2;
Avoid possible buffer overflow while parsing NAME section of man(7) pages.
Also, simplify copyging of strings, use estrdupn instead of emalloc + memcpy.

Patch from christos@, XXX comment by me


# 1.38 05-Jul-2016 abhinav

Reuse variable from previous line.


# 1.37 13-Apr-2016 christos

PR/51062: Abhinav Upadhyay: Allow non numeric sections to be indexed and
searched by apropos(1).
Fold long lines.


# 1.36 13-Apr-2016 christos

PR/51040: Abhinav Upadhyay: Fix memory leak


# 1.35 13-Apr-2016 christos

PR/51039: Abhinav Upadhyay: Check for return value of chdir(2)


# 1.34 13-Apr-2016 christos

PR/51034: Abhinav Upadhyay: Close database connection when failed to commit


# 1.33 31-Mar-2016 christos

PR/51034: Abhinav Upadhyay: makemandb(8): Close database connection when
failed to commit


# 1.32 24-Mar-2016 christos

PR/51006: Abhinav Upadhyay: makemandb(8) should parse escape sequences
in the NAME section


# 1.31 28-Jan-2016 christos

Don't crash if we have a missing section.


# 1.30 18-Dec-2015 christos

Adjust to the new mdocml


# 1.29 07-Apr-2015 plunky

largely apply patch from PR bin/47392 by Abhinav Upadhyay

change some comments to reflect reality, a variable name to enhance
readability, and adds an assert for safety.


# 1.28 12-Mar-2015 joerg

MDOC_MAX is a valid token if the type is text. Adjust.


# 1.27 04-Mar-2015 christos

- handle section numbers that are not single digits
- don't allocate and free needlessly


# 1.26 02-Mar-2015 joerg

Explicitly deal with end of lists. PR 49708.


# 1.25 18-Oct-2014 snj

src is too big these days to tolerate superfluous apostrophes. It's
"its", people!


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.24 03-Jun-2014 wiz

branches: 1.24.2;
Fix a bug that caused an error about a UNIQUE constraint violation.
Patch from Abhinav Upadhyay.


# 1.23 24-May-2014 wiz

Replace non-breaking space with hyphen, and call hyphen replacement
from one more place.
Improves 'man -k midi' output.

From Abhinav Upadhyay.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.22 10-Feb-2014 chs

branches: 1.22.2;
in update_db(), extract the full list of files to update from the db
before actually updating anything, since changing the db while the query
that extracts the list of files is still in progress results in
the extraction query failing before it finds everything.
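
The ordering described in 1.22 can be sketched as below. This is a minimal
illustration, not the update_db() code: struct file_row, collect_stale_ids()
and update_rows() are hypothetical names, and an in-memory array stands in
for the database table. The read pass finishes and its results are copied
out before any row is modified.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical in-memory stand-in for a database table of indexed files. */
struct file_row {
	int id;
	bool stale;	/* true when the file needs to be re-indexed */
	bool updated;	/* set once the row has been refreshed */
};

/*
 * Phase 1: read-only pass that copies out the ids of every stale row.
 * Returns a malloc'ed array (caller frees) and stores its length in *n.
 */
static int *
collect_stale_ids(const struct file_row *rows, size_t nrows, size_t *n)
{
	int *ids = malloc((nrows ? nrows : 1) * sizeof(*ids));
	size_t count = 0;

	*n = 0;
	if (ids == NULL)
		return NULL;
	for (size_t i = 0; i < nrows; i++)
		if (rows[i].stale)
			ids[count++] = rows[i].id;
	*n = count;
	return ids;
}

/* Phase 2: apply the updates only after the read pass has finished. */
static size_t
update_rows(struct file_row *rows, size_t nrows, const int *ids, size_t n)
{
	size_t done = 0;

	for (size_t k = 0; k < n; k++)
		for (size_t i = 0; i < nrows; i++)
			if (rows[i].id == ids[k]) {
				rows[i].stale = false;
				rows[i].updated = true;
				done++;
			}
	return done;
}
```

A caller would run collect_stale_ids() to completion, then hand the returned
array to update_rows() and free it afterwards.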


# 1.21 05-Jan-2014 joerg

Sync with interface change in mdocml 1.12.3.


# 1.20 13-Nov-2013 wiz

Skip files of size 0 from indexing.

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 15-May-2013 christos

avoid stdio assertion, failing later


Revision tags: agc-symver-base
# 1.18 10-Feb-2013 christos

remove trailing whitespace


Revision tags: yamt-pagecache-base8
# 1.17 14-Jan-2013 christos

Since mdocml decided to name headers that conflict with system ones (term.h)
move the header inclusion one up.


Revision tags: yamt-pagecache-base7
# 1.16 08-Nov-2012 christos

If you cannot parse .SH NAME, like in the case of the ksh93 man page
where the .SH is followed by a conditional:

.SH NAME
.if \nZ=0 \{\
text text text
.\}

at least don't core-dump.


Revision tags: yamt-pagecache-base6
# 1.15 06-Oct-2012 wiz

Make mandb path configurable. makemandb (and related tools) use
the path from the _mandb variable from man.conf now.

Set _mandb in man.conf to same value as was used before.

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.14 07-Sep-2012 wiz

branches: 1.14.2;
Use emalloc in one more place, like the rest of the code does.
From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.13 29-Aug-2012 wiz

Add -Q flag:
Print only fatal error messages (i.e., when the database is left in
an inconsistent state and needs manual intervention).

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.


# 1.12 25-Aug-2012 wiz

Sync usage with manpage.


# 1.11 11-Aug-2012 wiz

Bug fix for PR 46733:
> makemandb always reports the same number for "Total Number of new or
> updated pages enountered" and "Total number of (hard or symbolic)
> links found".

Patch from Abhinav Upadhyay.


# 1.10 08-Jul-2012 uwe

Fix typo in a message.


Revision tags: yamt-pagecache-base5
# 1.9 07-May-2012 wiz

PR 46419 by Abhinav Upadhyay using his updated patch:
Clean up after removing man page aliases.


# 1.8 04-May-2012 wiz

The new apropos(1) incorrectly displays hyphens in the first line
of the search results for a few man pages (for man(7) based man
pages).

Use patch from Abhinav Upadhyay in PR 46408 to fix this.


Revision tags: yamt-pagecache-base4
# 1.7 02-Mar-2012 joerg

branches: 1.7.2;
Fix inverted condition when handling stale entries.
From Abhinav Upadhyay.


# 1.6 27-Feb-2012 joerg

Expand workaround for .so usage to do the chdir call just before
starting parsing, not during the tree iteration. This gives it a chance
to work.


# 1.5 16-Feb-2012 joerg

Add support for compressed man pages in all the usual formats.


# 1.4 15-Feb-2012 joerg

Also handle hyphen replacement if it was used as plain input and no
backslash sequence was used at all in the line.


# 1.3 15-Feb-2012 joerg

Be a bit more friendly to man pages using the roff .so command by
changing the current directory to the parent of the man -p entry, e.g.
/usr/share/man for /usr/share/man/man1.


Revision tags: netbsd-6-base
# 1.2 07-Feb-2012 joerg

branches: 1.2.2;
Fix C&P error with $NetBSD$


# 1.1 07-Feb-2012 joerg

Import the new apropos/whatis.

This code has been developed by Abhinav Upadhyay as part of Google's Summer
of Code 2011. It uses libmandoc to parse man pages and builds a Full
Text Index in a SQLite database. The combination of indexing the full
manual page, filtering out stop words and ranking individual matches
based on the section gives a much improved user experience.

The old makewhatis and friends are kept under MKMAKEMANDB=no for now.


# 1.46 19-Dec-2016 abhinav

Escape hyphen when parsing .Nd


# 1.45 17-Dec-2016 abhinav

Don't ignore symlinks.
There can be symlinks which are pointing to man pages not installed in
one of the _default locations mentioned in man.conf or MANPATH. For example
there are man pages in /usr/pkg/man which are symlinked to pages in
/usr/pkg/lib/perl5/man. If we ignore symlinks, we would not be able to
index such pages installed outside the default set of directories.

(Also, the symlink test was incorrect, so we never noticed this issue)

Ok christos@, wiz@


Revision tags: pgoyette-localcount-20161104
# 1.44 03-Oct-2016 abhinav

We don't need to parse the sections we don't index, so stop early. Saves a
few instructions.


# 1.43 03-Oct-2016 abhinav

With the latest release of mandoc, makemandb(8) started to parse some
sections multiple times. This happened because pmdoc_Sh(), the handler
function responsible for parsing the Sh macros, used to recursively go
through all the child nodes and then the next nodes, starting from the top
level Sh block node. Once it has processed all the child nodes of the top
level block node, it moves to the next node, which is the top level block
node of the next section, and in this way one call to pmdoc_Sh() caused a
complete pass through the man page. Since mandoc(3) calls pmdoc_Sh() for
each .Sh macro in the man page, this resulted in parsing some of the
sections multiple times. This never happened with the previous versions of
mandoc, so we never noticed.

I've fixed this by starting the parse sequence of the Sh macro from its
body, which guarantees that we will stop once that section ends.

ok christos@
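
The difference between the two starting points in 1.43 can be sketched as
below. struct node is a hypothetical, simplified stand-in for a mandoc
syntax-tree node, and the block's first child is assumed to be the start of
the section body. None of these names come from makemandb.c or mandoc(3).

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a mandoc syntax-tree node. */
struct node {
	const char *text;	/* non-NULL for text nodes */
	struct node *child;	/* first child */
	struct node *next;	/* next sibling */
};

/* Count the text nodes reachable from n through child and next links. */
static size_t
walk(const struct node *n)
{
	size_t count = 0;

	for (; n != NULL; n = n->next) {
		if (n->text != NULL)
			count++;
		count += walk(n->child);
	}
	return count;
}

/* Old behaviour: starting at the section block also follows its siblings. */
static size_t
parse_section_from_block(const struct node *sh_block)
{
	return walk(sh_block);
}

/* Fixed behaviour: start at the body, so the walk ends with the section. */
static size_t
parse_section_from_body(const struct node *sh_block)
{
	return walk(sh_block->child);
}
```

Calling parse_section_from_block() once per section visits later sections
repeatedly, while parse_section_from_body() visits each section's nodes
exactly once.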


Revision tags: localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726
# 1.42 17-Jul-2016 abhinav

Use deroff() from mandoc(3) to directly parse the Nd macro rather
than parsing it by hand.

With the latest mandoc(3), the .Nd macro was getting parsed twice. This fixes
that problem and cleans up the code as well.

ok christos@

