1\input texinfo
2@c %**start of header
3@setfilename regex.info
4@settitle Regex
5@c %**end of header
6
7@c \\{fill-paragraph} works better (for me, anyway) if the text in the
8@c source file isn't indented.
9@paragraphindent 2
10
11@c Define a new index for our magic constants.
12@defcodeindex cn
13
14@c Put everything in one index (arbitrarily chosen to be the concept index).
15@syncodeindex cn cp
16@syncodeindex ky cp
17@syncodeindex pg cp
18@syncodeindex tp cp
19@syncodeindex vr cp
20
21@c Here is what we use in the Info `dir' file:
22@c * Regex: (regex).	Regular expression library.
23
24
25@ifinfo
26This file documents the GNU regular expression library.
27
28Copyright (C) 1992, 1993 Free Software Foundation, Inc.
29
30Permission is granted to make and distribute verbatim copies of this
31manual provided the copyright notice and this permission notice are
32preserved on all copies.
33
34@ignore
35Permission is granted to process this file through TeX and print the
36results, provided the printed document carries a copying permission
37notice identical to this one except for the removal of this paragraph
38(this paragraph not being relevant to the printed manual).
39@end ignore
40
41Permission is granted to copy and distribute modified versions of this
42manual under the conditions for verbatim copying, provided also that the
43section entitled ``GNU General Public License'' is included exactly as
44in the original, and provided that the entire resulting derived work is
45distributed under the terms of a permission notice identical to this one.
46
47Permission is granted to copy and distribute translations of this manual
48into another language, under the above conditions for modified versions,
49except that the section entitled ``GNU General Public License'' may be
50included in a translation approved by the Free Software Foundation
51instead of in the original English.
52@end ifinfo
53
54
55@titlepage
56
57@title Regex
58@subtitle edition 0.12a
59@subtitle 19 September 1992
60@author Kathryn A. Hargreaves
61@author Karl Berry
62
63@page
64
65@vskip 0pt plus 1filll
66Copyright @copyright{} 1992 Free Software Foundation.
67
68Permission is granted to make and distribute verbatim copies of this
69manual provided the copyright notice and this permission notice are
70preserved on all copies.
71
72Permission is granted to copy and distribute modified versions of this
73manual under the conditions for verbatim copying, provided also that the
74section entitled ``GNU General Public License'' is included exactly as
75in the original, and provided that the entire resulting derived work is
76distributed under the terms of a permission notice identical to this
77one.
78
79Permission is granted to copy and distribute translations of this manual
80into another language, under the above conditions for modified versions,
81except that the section entitled ``GNU General Public License'' may be
82included in a translation approved by the Free Software Foundation
83instead of in the original English.
84
85@end titlepage
86
87
88@ifinfo
89@node Top, Overview, (dir), (dir)
90@top Regular Expression Library
91
92This manual documents how to program with the GNU regular expression
93library.  This is edition 0.12a of the manual, 19 September 1992.
94
95The first part of this master menu lists the major nodes in this Info
96document, including the index.  The rest of the menu lists all the
97lower level nodes in the document.
98
99@menu
100* Overview::
101* Regular Expression Syntax::
102* Common Operators::
103* GNU Operators::
104* GNU Emacs Operators::
105* What Gets Matched?::
106* Programming with Regex::
107* Copying::			Copying and sharing Regex.
108* Index::			General index.
109 --- The Detailed Node Listing ---
110
111Regular Expression Syntax
112
113* Syntax Bits::
114* Predefined Syntaxes::
115* Collating Elements vs. Characters::
116* The Backslash Character::
117
118Common Operators
119
120* Match-self Operator::			Ordinary characters.
121* Match-any-character Operator::	.
122* Concatenation Operator::		Juxtaposition.
123* Repetition Operators::		*  +  ? @{@}
124* Alternation Operator::		|
125* List Operators::			[...]  [^...]
126* Grouping Operators::			(...)
127* Back-reference Operator::		\digit
128* Anchoring Operators::			^  $
129
130Repetition Operators    
131
132* Match-zero-or-more Operator::  *
133* Match-one-or-more Operator::   +
134* Match-zero-or-one Operator::   ?
135* Interval Operators::           @{@}
136
137List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})
138
139* Character Class Operators::   [:class:]
140* Range Operator::          start-end
141
142Anchoring Operators    
143
144* Match-beginning-of-line Operator::  ^
145* Match-end-of-line Operator::        $
146
147GNU Operators
148
149* Word Operators::
150* Buffer Operators::
151
152Word Operators
153
154* Non-Emacs Syntax Tables::
155* Match-word-boundary Operator::	\b
156* Match-within-word Operator::		\B
157* Match-beginning-of-word Operator::	\<
158* Match-end-of-word Operator::		\>
159* Match-word-constituent Operator::	\w
160* Match-non-word-constituent Operator::	\W
161
162Buffer Operators    
163
164* Match-beginning-of-buffer Operator::	\`
165* Match-end-of-buffer Operator::	\'
166
167GNU Emacs Operators
168
169* Syntactic Class Operators::
170
171Syntactic Class Operators
172
173* Emacs Syntax Tables::
174* Match-syntactic-class Operator::	\sCLASS
175* Match-not-syntactic-class Operator::  \SCLASS
176
177Programming with Regex
178
179* GNU Regex Functions::
180* POSIX Regex Functions::
181* BSD Regex Functions::
182
183GNU Regex Functions
184
185* GNU Pattern Buffers::         The re_pattern_buffer type.
186* GNU Regular Expression Compiling::  re_compile_pattern ()
187* GNU Matching::                re_match ()
188* GNU Searching::               re_search ()
189* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
190* Searching with Fastmaps::     re_compile_fastmap ()
191* GNU Translate Tables::        The `translate' field.
192* Using Registers::             The re_registers type and related fns.
193* Freeing GNU Pattern Buffers::  regfree ()
194
195POSIX Regex Functions
196
197* POSIX Pattern Buffers::		The regex_t type.
198* POSIX Regular Expression Compiling::	regcomp ()
199* POSIX Matching::			regexec ()
200* Reporting Errors::			regerror ()
201* Using Byte Offsets::			The regmatch_t type.
202* Freeing POSIX Pattern Buffers::	regfree ()
203
204BSD Regex Functions
205
206* BSD Regular Expression Compiling::	re_comp ()
207* BSD Searching::			re_exec ()
208@end menu
209@end ifinfo
210@node Overview, Regular Expression Syntax, Top, Top
211@chapter Overview
212
213A @dfn{regular expression} (or @dfn{regexp}, or @dfn{pattern}) is a text
214string that describes some (mathematical) set of strings.  A regexp
215@var{r} @dfn{matches} a string @var{s} if @var{s} is in the set of
216strings described by @var{r}.
217
218Using the Regex library, you can:
219
220@itemize @bullet
221
222@item
223see if a string matches a specified pattern as a whole, and 
224
225@item
226search within a string for a substring matching a specified pattern.
227
228@end itemize
229
230Some regular expressions match only one string, i.e., the set they
231describe has only one member.  For example, the regular expression
232@samp{foo} matches the string @samp{foo} and no others.  Other regular
233expressions match more than one string, i.e., the set they describe has
234more than one member.  For example, the regular expression @samp{f*}
235matches the set of strings made up of any number (including zero) of
236@samp{f}s.  As you can see, some characters in regular expressions match
237themselves (such as @samp{f}) and some don't (such as @samp{*}); the
238ones that don't match themselves instead let you specify patterns that
239describe many different strings.
240
241To either match or search for a regular expression with the Regex
242library functions, you must first compile it with a Regex pattern
243compiling function.  A @dfn{compiled pattern} is a regular expression
244converted to the internal format used by the library functions.  Once
245you've compiled a pattern, you can use it for matching or searching any
246number of times.
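
As a minimal sketch of this compile-once, use-many-times pattern, the
following C fragment uses the @sc{posix} functions @code{regcomp},
@code{regexec}, and @code{regfree} (@pxref{POSIX Regex Functions}).  The
helper name @code{count_containing} is hypothetical and chosen only for
illustration; it is not part of Regex.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Compile PATTERN once, then search each of the COUNT strings in TEXTS
   for a substring that matches it.  Returns how many strings contain a
   match, or -1 if PATTERN does not compile.  (Hypothetical helper.)  */
static int
count_containing (const char *pattern, const char *const *texts, size_t count)
{
  regex_t compiled;
  size_t i;
  int hits = 0;

  assert (pattern != NULL && texts != NULL);

  /* Compile a single time; REG_NOSUB says we only want yes/no answers.  */
  if (regcomp (&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  /* Reuse the same compiled pattern for every search.  */
  for (i = 0; i < count; i++)
    if (regexec (&compiled, texts[i], 0, NULL, 0) == 0)
      hits++;

  regfree (&compiled);
  return hits;
}
```

For instance, given the strings @samp{foo}, @samp{seafood}, @samp{bar},
and @samp{f}, a call with the pattern @samp{foo} should report that two
of them contain a match.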
247
248The Regex library consists of two source files: @file{regex.h} and
249@file{regex.c}.  
250@pindex regex.h
251@pindex regex.c
252Regex provides three groups of functions with which you can operate on
253regular expressions.  One group---the @sc{gnu} group---is more powerful
254but not completely compatible with the other two, namely the @sc{posix}
255and Berkeley @sc{unix} groups; its interface was designed specifically
256for @sc{gnu}.  The other groups have the same interfaces as do the
257regular expression functions in @sc{posix} and Berkeley
258@sc{unix}.
259
260We wrote this chapter with programmers in mind, not users of
261programs---such as Emacs---that use Regex.  We describe the Regex
262library in its entirety, not how to write regular expressions that a
263particular program understands.
264
265
266@node Regular Expression Syntax, Common Operators, Overview, Top
267@chapter Regular Expression Syntax
268
269@cindex regular expressions, syntax of
270@cindex syntax of regular expressions
271
272@dfn{Characters} are things you can type.  @dfn{Operators} are things in
273a regular expression that match one or more characters.  You compose
274regular expressions from operators, which in turn you specify using one
275or more characters.
276
277Most characters represent what we call the match-self operator, i.e.,
278they match themselves; we call these characters @dfn{ordinary}.  Other
279characters represent either all or parts of fancier operators; e.g.,
280@samp{.} represents what we call the match-any-character operator
281(which, no surprise, matches (almost) any character); we call these
282characters @dfn{special}.  Two different things determine what
283characters represent what operators:
284
285@enumerate
286@item
287the regular expression syntax your program has told the Regex library to
288recognize, and
289
290@item
291the context of the character in the regular expression.
292@end enumerate
293
294In the following sections, we describe these things in more detail.
295
296@menu
297* Syntax Bits::
298* Predefined Syntaxes::
299* Collating Elements vs. Characters::
300* The Backslash Character::
301@end menu
302
303
304@node Syntax Bits, Predefined Syntaxes,  , Regular Expression Syntax
305@section Syntax Bits 
306
307@cindex syntax bits
308
309In any particular syntax for regular expressions, some characters are
310always special, others are sometimes special, and others are never
311special.  The particular syntax that Regex recognizes for a given
312regular expression depends on the value in the @code{syntax} field of
313the pattern buffer of that regular expression.
314
315You get a pattern buffer by compiling a regular expression.  @xref{GNU
316Pattern Buffers}, and @ref{POSIX Pattern Buffers}, for more information
317on pattern buffers.  @xref{GNU Regular Expression Compiling}, @ref{POSIX
318Regular Expression Compiling}, and @ref{BSD Regular Expression
319Compiling}, for more information on compiling.
320
321Regex considers the value of the @code{syntax} field to be a collection
322of bits; we refer to these bits as @dfn{syntax bits}.  In most cases,
323they affect what characters represent what operators.  We describe the
324meanings of the operators to which we refer in @ref{Common Operators},
325@ref{GNU Operators}, and @ref{GNU Emacs Operators}.  
326
327For reference, here is the complete list of syntax bits, in alphabetical
328order:
329
330@table @code
331
@cnindex RE_BACKSLASH_ESCAPE_IN_LISTS
@item RE_BACKSLASH_ESCAPE_IN_LISTS
If this bit is set, then @samp{\} inside a list (@pxref{List Operators})
quotes (makes ordinary, if it's special) the following character; if
this bit isn't set, then @samp{\} is an ordinary character inside lists.
(@xref{The Backslash Character}, for what @samp{\} does outside of lists.)
338
339@cnindex RE_BK_PLUS_QM
340@item RE_BK_PLUS_QM
If this bit is set, then @samp{\+} represents the match-one-or-more
operator and @samp{\?} represents the match-zero-or-one operator; if
this bit isn't set, then @samp{+} represents the match-one-or-more
operator and @samp{?} represents the match-zero-or-one operator.  This
bit is irrelevant if @code{RE_LIMITED_OPS} is set.
346
347@cnindex RE_CHAR_CLASSES
348@item RE_CHAR_CLASSES
349If this bit is set, then you can use character classes in lists; if this
350bit isn't set, then you can't.
351
352@cnindex RE_CONTEXT_INDEP_ANCHORS
353@item RE_CONTEXT_INDEP_ANCHORS
354If this bit is set, then @samp{^} and @samp{$} are special anywhere outside
355a list; if this bit isn't set, then these characters are special only in
356certain contexts.  @xref{Match-beginning-of-line Operator}, and
357@ref{Match-end-of-line Operator}.
358
359@cnindex RE_CONTEXT_INDEP_OPS
360@item RE_CONTEXT_INDEP_OPS
361If this bit is set, then certain characters are special anywhere outside
362a list; if this bit isn't set, then those characters are special only in
363some contexts and are ordinary elsewhere.  Specifically, if this bit
364isn't set then @samp{*}, and (if the syntax bit @code{RE_LIMITED_OPS}
365isn't set) @samp{+} and @samp{?} (or @samp{\+} and @samp{\?}, depending
366on the syntax bit @code{RE_BK_PLUS_QM}) represent repetition operators
367only if they're not first in a regular expression or just after an
368open-group or alternation operator.  The same holds for @samp{@{} (or
369@samp{\@{}, depending on the syntax bit @code{RE_NO_BK_BRACES}) if
370it is the beginning of a valid interval and the syntax bit
371@code{RE_INTERVALS} is set.
372
373@cnindex RE_CONTEXT_INVALID_OPS
374@item RE_CONTEXT_INVALID_OPS
375If this bit is set, then repetition and alternation operators can't be
376in certain positions within a regular expression.  Specifically, the
377regular expression is invalid if it has:
378
379@itemize @bullet
380
381@item
382a repetition operator first in the regular expression or just after a
383match-beginning-of-line, open-group, or alternation operator; or
384
385@item
386an alternation operator first or last in the regular expression, just
387before a match-end-of-line operator, or just after an alternation or
388open-group operator.
389
390@end itemize
391
If this bit isn't set, then you can put the characters representing the
repetition and alternation operators anywhere in a regular expression.
Whether or not they will in fact be operators in certain positions
depends on other syntax bits.
396
397@cnindex RE_DOT_NEWLINE
398@item RE_DOT_NEWLINE
399If this bit is set, then the match-any-character operator matches
400a newline; if this bit isn't set, then it doesn't.
401
402@cnindex RE_DOT_NOT_NULL
403@item RE_DOT_NOT_NULL
404If this bit is set, then the match-any-character operator doesn't match
405a null character; if this bit isn't set, then it does.
406
407@cnindex RE_INTERVALS
408@item RE_INTERVALS
409If this bit is set, then Regex recognizes interval operators; if this bit
410isn't set, then it doesn't.
411
412@cnindex RE_LIMITED_OPS
413@item RE_LIMITED_OPS
414If this bit is set, then Regex doesn't recognize the match-one-or-more,
415match-zero-or-one or alternation operators; if this bit isn't set, then
416it does.
417
418@cnindex RE_NEWLINE_ALT
419@item RE_NEWLINE_ALT
420If this bit is set, then newline represents the alternation operator; if
421this bit isn't set, then newline is ordinary.
422
423@cnindex RE_NO_BK_BRACES
424@item RE_NO_BK_BRACES
425If this bit is set, then @samp{@{} represents the open-interval operator
426and @samp{@}} represents the close-interval operator; if this bit isn't
427set, then @samp{\@{} represents the open-interval operator and
428@samp{\@}} represents the close-interval operator.  This bit is relevant
429only if @code{RE_INTERVALS} is set.
430
431@cnindex RE_NO_BK_PARENS
432@item RE_NO_BK_PARENS
433If this bit is set, then @samp{(} represents the open-group operator and
434@samp{)} represents the close-group operator; if this bit isn't set, then
435@samp{\(} represents the open-group operator and @samp{\)} represents
436the close-group operator.
437
438@cnindex RE_NO_BK_REFS
439@item RE_NO_BK_REFS
440If this bit is set, then Regex doesn't recognize @samp{\}@var{digit} as
441the back reference operator; if this bit isn't set, then it does.
442
443@cnindex RE_NO_BK_VBAR
444@item RE_NO_BK_VBAR
445If this bit is set, then @samp{|} represents the alternation operator;
446if this bit isn't set, then @samp{\|} represents the alternation
447operator.  This bit is irrelevant if @code{RE_LIMITED_OPS} is set.
448
449@cnindex RE_NO_EMPTY_RANGES
450@item RE_NO_EMPTY_RANGES
451If this bit is set, then a regular expression with a range whose ending
452point collates lower than its starting point is invalid; if this bit
453isn't set, then Regex considers such a range to be empty.
454
455@cnindex RE_UNMATCHED_RIGHT_PAREN_ORD
456@item RE_UNMATCHED_RIGHT_PAREN_ORD
457If this bit is set and the regular expression has no matching open-group
458operator, then Regex considers what would otherwise be a close-group
459operator (based on how @code{RE_NO_BK_PARENS} is set) to match @samp{)}.
460
461@end table
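
To illustrate how syntax bits combine, here is a hedged sketch that uses
the @sc{gnu} functions @code{re_set_syntax}, @code{re_compile_pattern},
and @code{re_match} (@pxref{GNU Regex Functions}).  It assumes the copy
of Regex shipped with the @sc{gnu} C library, where these declarations
appear only if @code{_GNU_SOURCE} is defined before any header is
included.  The helper @code{match_with_syntax} is a hypothetical name
used only for this example.

```c
#define _GNU_SOURCE 1   /* assumption: glibc; exposes the GNU interface */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Compile PATTERN under the syntax bits SYNTAX and match it against the
   start of TEXT.  Returns the number of characters matched, -1 if there
   is no match, or -2 on a compilation or internal error.
   (Hypothetical helper.)  */
static int
match_with_syntax (reg_syntax_t syntax, const char *pattern, const char *text)
{
  struct re_pattern_buffer buffer;
  reg_syntax_t saved;
  const char *error;
  int matched;

  assert (pattern != NULL && text != NULL);

  /* Start with no allocated buffer, no fastmap, and no translate table.  */
  memset (&buffer, 0, sizeof buffer);

  /* The syntax setting is global, so restore the old value afterwards.  */
  saved = re_set_syntax (syntax);
  error = re_compile_pattern (pattern, strlen (pattern), &buffer);
  re_set_syntax (saved);
  if (error != NULL)
    return -2;

  matched = re_match (&buffer, text, (int) strlen (text), 0, NULL);
  regfree (&buffer);
  return matched;
}
```

With @code{RE_NO_BK_PARENS | RE_NO_BK_VBAR}, the pattern @samp{(a|b)}
should match the one-character string @samp{b}.  With no syntax bits set,
the same pattern should match only the literal five-character string
@samp{(a|b)}, and grouping and alternation must instead be written as
@samp{\(a\|b\)}.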
462
463
464@node Predefined Syntaxes, Collating Elements vs. Characters, Syntax Bits, Regular Expression Syntax
465@section Predefined Syntaxes    
466
467If you're programming with Regex, you can set a pattern buffer's
468(@pxref{GNU Pattern Buffers}, and @ref{POSIX Pattern Buffers})
469@code{syntax} field either to an arbitrary combination of syntax bits
470(@pxref{Syntax Bits}) or else to the configurations defined by Regex.
471These configurations define the syntaxes used by certain
472programs---@sc{gnu} Emacs,
473@cindex Emacs 
474@sc{posix} Awk,
475@cindex POSIX Awk
476traditional Awk, 
477@cindex Awk
478Grep,
479@cindex Grep
480@cindex Egrep
481Egrep---in addition to syntaxes for @sc{posix} basic and extended
482regular expressions.
483
The predefined syntaxes---taken directly from @file{regex.h}---are:
485
486@example
487[[[ syntaxes ]]]
488@end example
489
490@node Collating Elements vs. Characters, The Backslash Character, Predefined Syntaxes, Regular Expression Syntax
491@section Collating Elements vs.@: Characters    
492
493@sc{posix} generalizes the notion of a character to that of a
494collating element.  It defines a @dfn{collating element} to be ``a
495sequence of one or more bytes defined in the current collating sequence
496as a unit of collation.''
497
498This generalizes the notion of a character in
499two ways.  First, a single character can map into two or more collating
500elements.  For example, the German
501@tex
502`\ss'
503@end tex
504@ifinfo
505``es-zet''
506@end ifinfo
507collates as the collating element @samp{s} followed by another collating
508element @samp{s}.  Second, two or more characters can map into one
509collating element.  For example, the Spanish @samp{ll} collates after
510@samp{l} and before @samp{m}.
511
512Since @sc{posix}'s ``collating element'' preserves the essential idea of
513a ``character,'' we use the latter, more familiar, term in this document.
514
515@node The Backslash Character,  , Collating Elements vs. Characters, Regular Expression Syntax
516@section The Backslash Character
517
518@cindex \
519The @samp{\} character has one of four different meanings, depending on
520the context in which you use it and what syntax bits are set
521(@pxref{Syntax Bits}).  It can: 1) stand for itself, 2) quote the next
522character, 3) introduce an operator, or 4) do nothing.
523
524@enumerate
525@item
526It stands for itself inside a list
527(@pxref{List Operators}) if the syntax bit
528@code{RE_BACKSLASH_ESCAPE_IN_LISTS} is not set.  For example, @samp{[\]}
529would match @samp{\}.
530
531@item
532It quotes (makes ordinary, if it's special) the next character when you
533use it either:
534
535@itemize @bullet
536@item
537outside a list,@footnote{Sometimes
538you don't have to explicitly quote special characters to make
539them ordinary.  For instance, most characters lose any special meaning
540inside a list (@pxref{List Operators}).  In addition, if the syntax bits
541@code{RE_CONTEXT_INVALID_OPS} and @code{RE_CONTEXT_INDEP_OPS}
542aren't set, then (for historical reasons) the matcher considers special
543characters ordinary if they are in contexts where the operations they
544represent make no sense; for example, then the match-zero-or-more
545operator (represented by @samp{*}) matches itself in the regular
546expression @samp{*foo} because there is no preceding expression on which
547it can operate.  It is poor practice, however, to depend on this
548behavior; if you want a special character to be ordinary outside a list,
549it's better to always quote it, regardless.} or
550
551@item
552inside a list and the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is set.
553
554@end itemize
555
556@item
It introduces an operator when followed by certain ordinary
characters---sometimes only when certain syntax bits are set.  See the
cases @code{RE_BK_PLUS_QM}, @code{RE_NO_BK_BRACES}, @code{RE_NO_BK_VBAR},
@code{RE_NO_BK_PARENS}, @code{RE_NO_BK_REFS} in @ref{Syntax Bits}.  Also:
561
562@itemize @bullet
563@item
564@samp{\b} represents the match-word-boundary operator
565(@pxref{Match-word-boundary Operator}).
566
567@item
568@samp{\B} represents the match-within-word operator
569(@pxref{Match-within-word Operator}).
570
571@item
572@samp{\<} represents the match-beginning-of-word operator @*
573(@pxref{Match-beginning-of-word Operator}).
574
575@item
576@samp{\>} represents the match-end-of-word operator
577(@pxref{Match-end-of-word Operator}).
578
579@item
580@samp{\w} represents the match-word-constituent operator
581(@pxref{Match-word-constituent Operator}).
582
583@item
584@samp{\W} represents the match-non-word-constituent operator
585(@pxref{Match-non-word-constituent Operator}).
586
587@item
588@samp{\`} represents the match-beginning-of-buffer
589operator and @samp{\'} represents the match-end-of-buffer operator
590(@pxref{Buffer Operators}).
591
592@item
593If Regex was compiled with the C preprocessor symbol @code{emacs}
594defined, then @samp{\s@var{class}} represents the match-syntactic-class
595operator and @samp{\S@var{class}} represents the
596match-not-syntactic-class operator (@pxref{Syntactic Class Operators}).
597
598@end itemize
599
600@item
601In all other cases, Regex ignores @samp{\}.  For example,
602@samp{\n} matches @samp{n}.
603
604@end enumerate
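
The first three roles of @samp{\} can be observed through the
@sc{posix} interface, as in the sketch below.  It assumes @sc{posix}
basic and extended syntax, in which @samp{\} is ordinary inside a list
(that is, @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is not in effect).  The
helper @code{contains_match} is a hypothetical name.  Remember that each
backslash must itself be doubled inside a C string literal.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if some substring of TEXT matches PATTERN, 0 if none does,
   and -1 if PATTERN does not compile.  CFLAGS is 0 for POSIX basic
   syntax or REG_EXTENDED for POSIX extended syntax.
   (Hypothetical helper.)  */
static int
contains_match (const char *pattern, int cflags, const char *text)
{
  regex_t compiled;
  int found;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&compiled, pattern, cflags | REG_NOSUB) != 0)
    return -1;
  found = regexec (&compiled, text, 0, NULL, 0) == 0;
  regfree (&compiled);
  return found;
}
```

Under these assumptions, the basic-syntax pattern @samp{a\.b} (quoting)
should be found in @samp{a.b} but not in @samp{axb}; the list
@samp{[\]} (standing for itself) should be found in any string that
contains a backslash; and the basic-syntax pattern @samp{\(a\)}
(introducing the grouping operators) should be found in @samp{a}.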
605
606@node Common Operators, GNU Operators, Regular Expression Syntax, Top
607@chapter Common Operators
608
609You compose regular expressions from operators.  In the following
610sections, we describe the regular expression operators specified by
611@sc{posix}; @sc{gnu} also uses these.  Most operators have more than one
612representation as characters.  @xref{Regular Expression Syntax}, for
613what characters represent what operators under what circumstances.
614
615For most operators that can be represented in two ways, one
616representation is a single character and the other is that character
617preceded by @samp{\}.  For example, either @samp{(} or @samp{\(}
618represents the open-group operator.  Which one does depends on the
619setting of a syntax bit, in this case @code{RE_NO_BK_PARENS}.  Why is
620this so?  Historical reasons dictate some of the varying
621representations, while @sc{posix} dictates others.  
622
623Finally, almost all characters lose any special meaning inside a list
624(@pxref{List Operators}).
625
626@menu
627* Match-self Operator::			Ordinary characters.
628* Match-any-character Operator::	.
629* Concatenation Operator::		Juxtaposition.
630* Repetition Operators::		*  +  ? @{@}
631* Alternation Operator::		|
632* List Operators::			[...]  [^...]
633* Grouping Operators::			(...)
634* Back-reference Operator::		\digit
635* Anchoring Operators::			^  $
636@end menu
637
638@node Match-self Operator, Match-any-character Operator,  , Common Operators
639@section The Match-self Operator (@var{ordinary character})
640
641This operator matches the character itself.  All ordinary characters
642(@pxref{Regular Expression Syntax}) represent this operator.  For
643example, @samp{f} is always an ordinary character, so the regular
644expression @samp{f} matches only the string @samp{f}.  In
645particular, it does @emph{not} match the string @samp{ff}.
646
647@node Match-any-character Operator, Concatenation Operator, Match-self Operator, Common Operators
648@section The Match-any-character Operator (@code{.})
649
650@cindex @samp{.}
651
This operator matches any single character, printing or nonprinting,
with two exceptions.  It doesn't match a:
654
655@table @asis
656@item newline
657if the syntax bit @code{RE_DOT_NEWLINE} isn't set.
658
659@item null
660if the syntax bit @code{RE_DOT_NOT_NULL} is set.
661
662@end table
663
664The @samp{.} (period) character represents this operator.  For example,
665@samp{a.b} matches any three-character string beginning with @samp{a}
666and ending with @samp{b}.
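
A small sketch can confirm these claims about the match-self and
match-any-character operators.  It uses the @sc{posix} functions and
checks that the reported match spans the whole string; the helper
@code{matches_whole} is a hypothetical name, and the check relies on
@code{regexec} reporting the longest of the leftmost matches.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Return 1 if the whole of TEXT matches the POSIX extended regular
   expression PATTERN, 0 if it does not, and -1 if PATTERN does not
   compile.  (Hypothetical helper.)  */
static int
matches_whole (const char *pattern, const char *text)
{
  regex_t compiled;
  regmatch_t span;
  int whole;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&compiled, pattern, REG_EXTENDED) != 0)
    return -1;

  /* The match must start at offset 0 and end at the end of TEXT.  */
  whole = regexec (&compiled, text, 1, &span, 0) == 0
          && span.rm_so == 0
          && (size_t) span.rm_eo == strlen (text);

  regfree (&compiled);
  return whole;
}
```

With this helper, @samp{a.b} should match @samp{axb} and @samp{a b} as a
whole, but neither @samp{ab} nor @samp{axxb}; likewise @samp{f} should
match @samp{f} but not @samp{ff}.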
667
668@node Concatenation Operator, Repetition Operators, Match-any-character Operator, Common Operators
669@section The Concatenation Operator
670
671This operator concatenates two regular expressions @var{a} and @var{b}.
672No character represents this operator; you simply put @var{b} after
673@var{a}.  The result is a regular expression that will match a string if
674@var{a} matches its first part and @var{b} matches the rest.  For
675example, @samp{xy} (two match-self operators) matches @samp{xy}.
676
677@node Repetition Operators, Alternation Operator, Concatenation Operator, Common Operators
678@section Repetition Operators    
679
680Repetition operators repeat the preceding regular expression a specified
681number of times.
682
683@menu
684* Match-zero-or-more Operator::  *
685* Match-one-or-more Operator::   +
686* Match-zero-or-one Operator::   ?
687* Interval Operators::           @{@}
688@end menu
689
690@node Match-zero-or-more Operator, Match-one-or-more Operator,  , Repetition Operators
691@subsection The Match-zero-or-more Operator (@code{*})
692
693@cindex @samp{*}
694
695This operator repeats the smallest possible preceding regular expression
696as many times as necessary (including zero) to match the pattern.
697@samp{*} represents this operator.  For example, @samp{o*}
698matches any string made up of zero or more @samp{o}s.  Since this
699operator operates on the smallest preceding regular expression,
700@samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}.  So,
701@samp{fo*} matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
702
703Since the match-zero-or-more operator is a suffix operator, it may be
704useless as such when no regular expression precedes it.  This is the
705case when it:
706
707@itemize @bullet
708@item 
709is first in a regular expression, or
710
711@item 
712follows a match-beginning-of-line, open-group, or alternation
713operator.
714
715@end itemize
716
717@noindent
718Three different things can happen in these cases:
719
720@enumerate
721@item
722If the syntax bit @code{RE_CONTEXT_INVALID_OPS} is set, then the
723regular expression is invalid.
724
725@item
726If @code{RE_CONTEXT_INVALID_OPS} isn't set, but
727@code{RE_CONTEXT_INDEP_OPS} is, then @samp{*} represents the
728match-zero-or-more operator (which then operates on the empty string).
729
730@item
731Otherwise, @samp{*} is ordinary.
732
733@end enumerate
734
735@cindex backtracking
736The matcher processes a match-zero-or-more operator by first matching as
737many repetitions of the smallest preceding regular expression as it can.
738Then it continues to match the rest of the pattern.  
739
740If it can't match the rest of the pattern, it backtracks (as many times
741as necessary), each time discarding one of the matches until it can
742either match the entire pattern or be certain that it cannot get a
743match.  For example, when matching @samp{ca*ar} against @samp{caaar},
744the matcher first matches all three @samp{a}s of the string with the
745@samp{a*} of the regular expression.  However, it cannot then match the
746final @samp{ar} of the regular expression against the final @samp{r} of
747the string.  So it backtracks, discarding the match of the last @samp{a}
748in the string.  It can then match the remaining @samp{ar}.
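
The outcome of the example above can be checked with the sketch below.
The @sc{posix} interface it uses does not expose the backtracking
steps, only the final match, and an implementation need not backtrack
internally to reach the same answer.  The helper @code{match_span} is a
hypothetical name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Search TEXT for the POSIX basic regular expression PATTERN.  On
   success store the offsets of the match in *START and *END and return
   1; return 0 if nothing matches and -1 if PATTERN does not compile.
   (Hypothetical helper.)  */
static int
match_span (const char *pattern, const char *text, int *start, int *end)
{
  regex_t compiled;
  regmatch_t span;
  int status;

  assert (pattern != NULL && text != NULL && start != NULL && end != NULL);
  if (regcomp (&compiled, pattern, 0) != 0)
    return -1;
  status = regexec (&compiled, text, 1, &span, 0);
  regfree (&compiled);
  if (status != 0)
    return 0;

  *start = (int) span.rm_so;   /* offset of the first matched character */
  *end = (int) span.rm_eo;     /* offset just past the last one */
  return 1;
}
```

Searching @samp{caaar} for @samp{ca*ar} should report a match covering
offsets 0 through 5, that is, the entire string: the @samp{a*} ends up
matching only the first two @samp{a}s so that the final @samp{ar} can
match.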
749
750
751@node Match-one-or-more Operator, Match-zero-or-one Operator, Match-zero-or-more Operator, Repetition Operators
752@subsection The Match-one-or-more Operator (@code{+} or @code{\+})
753
754@cindex @samp{+} 
755
756If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't recognize
757this operator.  Otherwise, if the syntax bit @code{RE_BK_PLUS_QM} isn't
758set, then @samp{+} represents this operator; if it is, then @samp{\+}
759does.
760
761This operator is similar to the match-zero-or-more operator except that
762it repeats the preceding regular expression at least once;
763@pxref{Match-zero-or-more Operator}, for what it operates on, how some
764syntax bits affect it, and how Regex backtracks to match it.
765
For example, if @samp{+} represents the match-one-or-more operator,
then @samp{ca+r} matches @samp{car} and @samp{caaaar}, among others,
but not @samp{cr}.
769
770@node Match-zero-or-one Operator, Interval Operators, Match-one-or-more Operator, Repetition Operators
771@subsection The Match-zero-or-one Operator (@code{?} or @code{\?})
772@cindex @samp{?}
773
774If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
775recognize this operator.  Otherwise, if the syntax bit
776@code{RE_BK_PLUS_QM} isn't set, then @samp{?} represents this operator;
777if it is, then @samp{\?} does.
778
779This operator is similar to the match-zero-or-more operator except that
780it repeats the preceding regular expression once or not at all;
781@pxref{Match-zero-or-more Operator}, to see what it operates on, how
782some syntax bits affect it, and how Regex backtracks to match it.
783
For example, if @samp{?} represents the match-zero-or-one operator,
then @samp{ca?r} matches both @samp{car} and @samp{cr}, but nothing
else.
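
The examples given for the match-one-or-more and match-zero-or-one
operators can be tried with the following sketch.  It assumes
@sc{posix} extended syntax, in which @samp{+} and @samp{?} represent
these operators, and it anchors the pattern so that only whole-string
matches count.  The helper @code{count_whole_matches} and its 256-byte
limit are hypothetical choices made for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Count how many of the COUNT strings in TEXTS match PATTERN as a
   whole.  Returns -1 if the anchored pattern is too long or does not
   compile.  (Hypothetical helper.)  */
static int
count_whole_matches (const char *pattern, const char *const *texts,
                     size_t count)
{
  char anchored[256];
  regex_t compiled;
  size_t i;
  int written;
  int hits = 0;

  assert (pattern != NULL && texts != NULL);

  /* Wrap PATTERN as ^(PATTERN)$ so that it must cover the whole string.  */
  written = snprintf (anchored, sizeof anchored, "^(%s)$", pattern);
  if (written < 0 || (size_t) written >= sizeof anchored)
    return -1;
  if (regcomp (&compiled, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  for (i = 0; i < count; i++)
    if (regexec (&compiled, texts[i], 0, NULL, 0) == 0)
      hits++;

  regfree (&compiled);
  return hits;
}
```

Of the strings @samp{car}, @samp{caaaar}, and @samp{cr}, the pattern
@samp{ca+r} should match the first two.  Of @samp{car}, @samp{cr}, and
@samp{caar}, the pattern @samp{ca?r} should also match the first two.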
787
788@node Interval Operators,  , Match-zero-or-one Operator, Repetition Operators
789@subsection Interval Operators (@code{@{} @dots{} @code{@}} or @code{\@{} @dots{} @code{\@}})
790
791@cindex interval expression
792@cindex @samp{@{}
793@cindex @samp{@}}
794@cindex @samp{\@{}
795@cindex @samp{\@}}
796
797If the syntax bit @code{RE_INTERVALS} is set, then Regex recognizes
798@dfn{interval expressions}.  They repeat the smallest possible preceding
799regular expression a specified number of times.
800
If the syntax bit @code{RE_NO_BK_BRACES} is set, @samp{@{} represents
the @dfn{open-interval operator} and @samp{@}} represents the
@dfn{close-interval operator}; otherwise, @samp{\@{} and @samp{\@}} do.

Specifically, supposing that @samp{@{} and @samp{@}} represent the
open-interval and close-interval operators, then:
807
808@table @code
@item @{@var{count}@}
matches exactly @var{count} occurrences of the preceding regular
expression.

@item @{@var{min},@}
matches @var{min} or more occurrences of the preceding regular
expression.

@item @{@var{min},@var{max}@}
matches at least @var{min} but no more than @var{max} occurrences of
the preceding regular expression.
820
821@end table
822
823The interval expression (but not necessarily the regular expression that
824contains it) is invalid if:
825
826@itemize @bullet
827@item
828@var{min} is greater than @var{max}, or 
829
830@item
any of @var{count}, @var{min}, or @var{max} is outside the range
zero to @code{RE_DUP_MAX} (a symbol that @file{regex.h} defines).
834
835@end itemize
836
837If the interval expression is invalid and the syntax bit
838@code{RE_NO_BK_BRACES} is set, then Regex considers all the
839characters in the would-be interval to be ordinary.  If that bit
840isn't set, then the regular expression is invalid.
841
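The second case can be observed through the @sc{posix} interface.  This
sketch assumes that @code{regcomp} without @code{REG_EXTENDED} selects
a syntax in which @code{RE_NO_BK_BRACES} is not set, so that
@samp{\@{} and @samp{\@}} are the interval operators and an invalid
interval such as @samp{a\@{3,2\@}} makes the whole regular expression
invalid.  The helper name @code{compiles} is hypothetical.

@verbatim
```c
#include <regex.h>

/* Hypothetical helper: return 1 if PATTERN compiles under the POSIX
   basic syntax, 0 if regcomp rejects it.  */
int compiles(const char *pattern)
{
    regex_t re;

    if (regcomp(&re, pattern, 0) != 0)
        return 0;
    regfree(&re);
    return 1;
}
```
@end verbatim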
842If the interval expression is valid but there is no preceding regular
843expression on which to operate, then if the syntax bit
844@code{RE_CONTEXT_INVALID_OPS} is set, the regular expression is invalid.
845If that bit isn't set, then Regex considers all the characters---other
846than backslashes, which it ignores---in the would-be interval to be
847ordinary.
848
849
850@node Alternation Operator, List Operators, Repetition Operators, Common Operators
851@section The Alternation Operator (@code{|} or @code{\|})
852
853@kindex |
854@kindex \|
855@cindex alternation operator
856@cindex or operator
857
858If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
859recognize this operator.  Otherwise, if the syntax bit
860@code{RE_NO_BK_VBAR} is set, then @samp{|} represents this operator;
861otherwise, @samp{\|} does.
862
863Alternatives match one of a choice of regular expressions:
864if you put the character(s) representing the alternation operator between
865any two regular expressions @var{a} and @var{b}, the result matches
866the union of the strings that @var{a} and @var{b} match.  For
867example, supposing that @samp{|} is the alternation operator, then
868@samp{foo|bar|quux} would match any of @samp{foo}, @samp{bar} or
869@samp{quux}.
870
871@ignore
872@c Nobody needs to disallow empty alternatives any more.
873If the syntax bit @code{RE_NO_EMPTY_ALTS} is set, then if either of the regular
874expressions @var{a} or @var{b} is empty, the
875regular expression is invalid.  More precisely, if this syntax bit is
876set, then the alternation operator can't:
877
878@itemize @bullet
879@item
880be first or last in a regular expression;
881
882@item
883follow either another alternation operator or an open-group operator
884(@pxref{Grouping Operators}); or
885
886@item
887precede a close-group operator.
888
889@end itemize
890
891@noindent
892For example, supposing @samp{(} and @samp{)} represent the open and
893close-group operators, then @samp{|foo}, @samp{foo|}, @samp{foo||bar},
894@samp{foo(|bar)}, and @samp{(foo|)bar} would all be invalid.
895@end ignore
896
897The alternation operator operates on the @emph{largest} possible
898surrounding regular expressions.  (Put another way, it has the lowest
899precedence of any regular expression operator.)
900Thus, the only way you can
901delimit its arguments is to use grouping.  For example, if @samp{(} and
902@samp{)} are the open and close-group operators, then @samp{fo(o|b)ar}
903would match either @samp{fooar} or @samp{fobar}.  (@samp{foo|bar} would
904match @samp{foo} or @samp{bar}.)
905
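The examples above can be checked with a small sketch.  It assumes the
@sc{posix} interface with @code{REG_EXTENDED}, under which @samp{|},
@samp{(}, and @samp{)} represent the alternation and grouping
operators; the helper name @code{matches_whole} is hypothetical.

@verbatim
```c
#include <regex.h>

/* Hypothetical helper: return 1 if PATTERN matches all of TEXT,
   0 if it doesn't, and -1 if PATTERN fails to compile.  */
int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* Whole-string match: starts at 0 and ends at the null byte.  */
    ok = regexec(&re, text, 1, &m, 0) == 0
         && m.rm_so == 0 && text[m.rm_eo] == '\0';
    regfree(&re);
    return ok;
}
```
@end verbatim

Here @code{matches_whole ("fo(o|b)ar", "fobar")} returns 1, whereas
@code{matches_whole ("foo|bar", "fobar")} returns 0, because without
the group the alternation splits the whole pattern into @samp{foo} and
@samp{bar}.
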
906@cindex backtracking
907The matcher usually tries all combinations of alternatives so as to 
908match the longest possible string.  For example, when matching
909@samp{(fooq|foo)*(qbarquux|bar)} against @samp{fooqbarquux}, it cannot
910take, say, the first (``depth-first'') combination it could match, since
911then it would be content to match just @samp{fooqbar}.  
912
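A sketch of the example above, assuming the @sc{posix} interface with
@code{REG_EXTENDED} and an implementation that reports the longest
match as described; the helper name @code{match_length} is
hypothetical.

@verbatim
```c
#include <regex.h>

/* Hypothetical helper: return the length of the first match of
   PATTERN in TEXT, -1 if there is no match, or -2 if PATTERN fails
   to compile.  */
int match_length(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int length = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -2;
    if (regexec(&re, text, 1, &m, 0) == 0)
        length = (int) (m.rm_eo - m.rm_so);
    regfree(&re);
    return length;
}
```
@end verbatim

A matcher that tries all combinations should report 11 for
@code{match_length ("(fooq|foo)*(qbarquux|bar)", "fooqbarquux")}, the
length of the whole string, rather than 7, the length of
@samp{fooqbar}.
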
913@comment xx something about leftmost-longest
914
915
916@node List Operators, Grouping Operators, Alternation Operator, Common Operators
917@section List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})
918
919@cindex matching list
920@cindex @samp{[}
921@cindex @samp{]}
922@cindex @samp{^}
923@cindex @samp{-}
924@cindex @samp{\}
925@cindex @samp{[^}
926@cindex nonmatching list
927@cindex matching newline
928@cindex bracket expression
929
A @dfn{list}, also called a @dfn{bracket expression}, is a set of one or
more items.  An @dfn{item} is a character,
932@ignore
933(These get added when they get implemented.)
934a collating symbol, an equivalence class expression, 
935@end ignore
936a character class expression, or a range expression.  The syntax bits
937affect which kinds of items you can put in a list.  We explain the last
938two items in subsections below.  Empty lists are invalid.
939
940A @dfn{matching list} matches a single character represented by one of
941the list items.  You form a matching list by enclosing one or more items
942within an @dfn{open-matching-list operator} (represented by @samp{[})
943and a @dfn{close-list operator} (represented by @samp{]}).  
944
945For example, @samp{[ab]} matches either @samp{a} or @samp{b}.
946@samp{[ad]*} matches the empty string and any string composed of just
947@samp{a}s and @samp{d}s in any order.  Regex considers invalid a regular
948expression with a @samp{[} but no matching
949@samp{]}.
950
951@dfn{Nonmatching lists} are similar to matching lists except that they
952match a single character @emph{not} represented by one of the list
953items.  You use an @dfn{open-nonmatching-list operator} (represented by
954@samp{[^}@footnote{Regex therefore doesn't consider the @samp{^} to be
955the first character in the list.  If you put a @samp{^} character first
956in (what you think is) a matching list, you'll turn it into a
957nonmatching list.}) instead of an open-matching-list operator to start a
958nonmatching list.  
959
960For example, @samp{[^ab]} matches any character except @samp{a} or
961@samp{b}.  
962
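The matching and nonmatching lists shown so far can be exercised with a
short sketch.  It assumes the @sc{posix} interface with
@code{REG_EXTENDED}; the helper name @code{matches_whole} is
hypothetical.  For instance, @code{matches_whole ("[ad]*", "")} and
@code{matches_whole ("[ad]*", "adda")} return 1, and
@code{matches_whole ("[^ab]", "a")} returns 0.

@verbatim
```c
#include <regex.h>

/* Hypothetical helper: return 1 if PATTERN matches all of TEXT,
   0 if it doesn't, and -1 if PATTERN fails to compile.  */
int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* Whole-string match: starts at 0 and ends at the null byte.  */
    ok = regexec(&re, text, 1, &m, 0) == 0
         && m.rm_so == 0 && text[m.rm_eo] == '\0';
    regfree(&re);
    return ok;
}
```
@end verbatim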
If the @code{posix_newline} field in the pattern buffer (@pxref{GNU
Pattern Buffers}) is set, then nonmatching lists do not match a newline.
965
966Most characters lose any special meaning inside a list.  The special
967characters inside a list follow.
968
969@table @samp
970@item ]
971ends the list if it's not the first list item.  So, if you want to make
972the @samp{]} character a list item, you must put it first.
973
974@item \
975quotes the next character if the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is
976set.
977
978@ignore
979Put these in if they get implemented.
980
981@item [.
982represents the open-collating-symbol operator (@pxref{Collating Symbol
983Operators}).
984
985@item .]
986represents the close-collating-symbol operator.
987
988@item [=
989represents the open-equivalence-class operator (@pxref{Equivalence Class
990Operators}).
991
992@item =]
993represents the close-equivalence-class operator.
994
995@end ignore
996
997@item [:
998represents the open-character-class operator (@pxref{Character Class
999Operators}) if the syntax bit @code{RE_CHAR_CLASSES} is set and what
1000follows is a valid character class expression.
1001
1002@item :]
1003represents the close-character-class operator if the syntax bit
1004@code{RE_CHAR_CLASSES} is set and what precedes it is an
1005open-character-class operator followed by a valid character class name.
1006
1007@item - 
1008represents the range operator (@pxref{Range Operator}) if it's
1009not first or last in a list or the ending point of a range.
1010
1011@end table
1012
1013@noindent
1014All other characters are ordinary.  For example, @samp{[.*]} matches 
1015@samp{.} and @samp{*}.  
1016
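A brief sketch of these rules, assuming the @sc{posix} interface with
@code{REG_EXTENDED}; the helper name @code{matches_whole} is
hypothetical.  It lets you confirm that a @samp{]} placed first is a
list item, that a @samp{-} placed last is one too, and that @samp{.}
and @samp{*} are ordinary inside a list.

@verbatim
```c
#include <regex.h>

/* Hypothetical helper: return 1 if PATTERN matches all of TEXT,
   0 if it doesn't, and -1 if PATTERN fails to compile.  */
int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* Whole-string match: starts at 0 and ends at the null byte.  */
    ok = regexec(&re, text, 1, &m, 0) == 0
         && m.rm_so == 0 && text[m.rm_eo] == '\0';
    regfree(&re);
    return ok;
}
```
@end verbatim

For example, @code{matches_whole ("[]a]", "]")} and
@code{matches_whole ("[.*]", "*")} return 1, while
@code{matches_whole ("[.*]", "x")} returns 0.
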
1017@menu
1018* Character Class Operators::   [:class:]
1019* Range Operator::          start-end
1020@end menu
1021
1022@ignore
1023(If collating symbols and equivalence class expressions get implemented,
1024then add this.)
1025
1026node Collating Symbol Operators
1027subsubsection Collating Symbol Operators (@code{[.} @dots{} @code{.]})
1028
1029If the syntax bit @code{XX} is set, then you can represent
1030collating symbols inside lists.  You form a @dfn{collating symbol} by
1031putting a collating element between an @dfn{open-collating-symbol
1032operator} and an @dfn{close-collating-symbol operator}.  @samp{[.}
1033represents the open-collating-symbol operator and @samp{.]} represents
1034the close-collating-symbol operator.  For example, if @samp{ll} is a
1035collating element, then @samp{[[.ll.]]} would match @samp{ll}.
1036
1037node Equivalence Class Operators
1038subsubsection Equivalence Class Operators (@code{[=} @dots{} @code{=]})
1039@cindex equivalence class expression in regex
1040@cindex @samp{[=} in regex
1041@cindex @samp{=]} in regex
1042
1043If the syntax bit @code{XX} is set, then Regex recognizes equivalence class
1044expressions inside lists.  A @dfn{equivalence class expression} is a set
1045of collating elements which all belong to the same equivalence class.
1046You form an equivalence class expression by putting a collating
1047element between an @dfn{open-equivalence-class operator} and a
1048@dfn{close-equivalence-class operator}.  @samp{[=} represents the
1049open-equivalence-class operator and @samp{=]} represents the
1050close-equivalence-class operator.  For example, if @samp{a} and @samp{A}
1051were an equivalence class, then both @samp{[[=a=]]} and @samp{[[=A=]]}
1052would match both @samp{a} and @samp{A}.  If the collating element in an
1053equivalence class expression isn't part of an equivalence class, then
1054the matcher considers the equivalence class expression to be a collating
1055symbol.
1056
1057@end ignore
1058
1059@node Character Class Operators, Range Operator,  , List Operators
1060@subsection Character Class Operators (@code{[:} @dots{} @code{:]})
1061
1062@cindex character classes
1063@cindex @samp{[:} in regex
1064@cindex @samp{:]} in regex
1065
If the syntax bit @code{RE_CHAR_CLASSES} is set, then Regex
recognizes character class expressions inside lists.  A @dfn{character
1068class expression} matches one character from a given class.  You form a
1069character class expression by putting a character class name between an
1070@dfn{open-character-class operator} (represented by @samp{[:}) and a
1071@dfn{close-character-class operator} (represented by @samp{:]}).  The
1072character class names and their meanings are:
1073
1074@table @code
1075
1076@item alnum 
1077letters and digits
1078
1079@item alpha
1080letters
1081
1082@item blank
1083system-dependent; for @sc{gnu}, a space or tab
1084
1085@item cntrl
1086control characters (in the @sc{ascii} encoding, code 0177 and codes
1087less than 040)
1088
1089@item digit
1090digits
1091
1092@item graph
1093same as @code{print} except omits space
1094
1095@item lower 
1096lowercase letters
1097
1098@item print
printable characters (in the @sc{ascii} encoding, space through tilde,
i.e., codes 040 through 0176)
1101
1102@item punct
1103neither control nor alphanumeric characters
1104
1105@item space
space, tab, carriage return, newline, vertical tab, and form feed
1107
1108@item upper
1109uppercase letters
1110
1111@item xdigit
1112hexadecimal digits: @code{0}--@code{9}, @code{a}--@code{f}, @code{A}--@code{F}
1113
1114@end table
1115
1116@noindent
1117These correspond to the definitions in the C library's @file{<ctype.h>}
1118facility.  For example, @samp{[:alpha:]} corresponds to the standard
1119facility @code{isalpha}.  Regex recognizes character class expressions
1120only inside of lists; so @samp{[[:alpha:]]} matches any letter, but
1121@samp{[:alpha:]} outside of a bracket expression and not followed by a
1122repetition operator matches just itself.
1123
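The correspondence with @file{<ctype.h>} can be checked mechanically.
The following sketch assumes the @sc{posix} interface with
@code{REG_EXTENDED} and the default C locale; the helper name
@code{agrees_with_ctype} is hypothetical.  It compares a list against a
classification function for every @sc{ascii} character except the null
character.

@verbatim
```c
#include <ctype.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the list LIST accepts exactly the
   characters 1..127 for which CLASSIFY returns nonzero, 0 if the two
   disagree on some character, and -1 if LIST fails to compile.  */
int agrees_with_ctype(const char *list, int (*classify)(int))
{
    regex_t re;
    char text[2] = { 0, 0 };
    int c, ok = 1;

    if (regcomp(&re, list, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    for (c = 1; c < 128 && ok; c++) {
        int in_list, in_class;

        text[0] = (char) c;       /* one-character subject string */
        in_list = regexec(&re, text, 0, NULL, 0) == 0;
        in_class = classify(c) != 0;
        ok = in_list == in_class;
    }
    regfree(&re);
    return ok;
}
```
@end verbatim

For example, @code{agrees_with_ctype ("[[:alpha:]]", isalpha)} is
expected to return 1, and
@code{agrees_with_ctype ("[[:alpha:]]", isdigit)} to return 0.
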
1124@node Range Operator,  , Character Class Operators, List Operators
1125@subsection The Range Operator (@code{-})
1126
1127Regex recognizes @dfn{range expressions} inside a list. They represent
1128those characters
1129that fall between two elements in the current collating sequence.  You
1130form a range expression by putting a @dfn{range operator} between two 
1131@ignore
1132(If these get implemented, then substitute this for ``characters.'')
1133of any of the following: characters, collating elements, collating symbols,
1134and equivalence class expressions.  The starting point of the range and
1135the ending point of the range don't have to be the same kind of item,
1136e.g., the starting point could be a collating element and the ending
1137point could be an equivalence class expression.  If a range's ending
1138point is an equivalence class, then all the collating elements in that
1139class will be in the range.
1140@end ignore
1141characters.@footnote{You can't use a character class for the starting
1142or ending point of a range, since a character class is not a single
1143character.} @samp{-} represents the range operator.  For example,
@samp{a-f} within a list represents all the characters from @samp{a}
through @samp{f}, inclusive.
1147
1148If the syntax bit @code{RE_NO_EMPTY_RANGES} is set, then if the range's
1149ending point collates less than its starting point, the range (and the
1150regular expression containing it) is invalid.  For example, the regular
1151expression @samp{[z-a]} would be invalid.  If this bit isn't set, then
1152Regex considers such a range to be empty.
1153
1154Since @samp{-} represents the range operator, if you want to make a
1155@samp{-} character itself
1156a list item, you must do one of the following:
1157
1158@itemize @bullet
1159@item
1160Put the @samp{-} either first or last in the list.
1161
1162@item
1163Include a range whose starting point collates strictly lower than
1164@samp{-} and whose ending point collates equal or higher.  Unless a
1165range is the first item in a list, a @samp{-} can't be its starting
1166point, but @emph{can} be its ending point.  That is because Regex
1167considers @samp{-} to be the range operator unless it is preceded by
1168another @samp{-}.  For example, in the @sc{ascii} encoding, @samp{)},
1169@samp{*}, @samp{+}, @samp{,}, @samp{-}, @samp{.}, and @samp{/} are
1170contiguous characters in the collating sequence.  You might think that
1171@samp{[)-+--/]} has two ranges: @samp{)-+} and @samp{--/}.  Rather, it
1172has the ranges @samp{)-+} and @samp{+--}, plus the character @samp{/}, so
1173it matches, e.g., @samp{,}, not @samp{.}.
1174
1175@item
1176Put a range whose starting point is @samp{-} first in the list.
1177
1178@end itemize
1179
1180For example, @samp{[-a-z]} matches a lowercase letter or a hyphen (in
1181English, in @sc{ascii}).
1182
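A short sketch of ranges and hyphens in lists, assuming the @sc{posix}
interface with @code{REG_EXTENDED}, an @sc{ascii}-based C locale, and a
syntax in which @code{RE_NO_EMPTY_RANGES} is set (which we assume the
@sc{posix} syntaxes do).  The helper name @code{matches_whole} is
hypothetical.

@verbatim
```c
#include <regex.h>

/* Hypothetical helper: return 1 if PATTERN matches all of TEXT,
   0 if it doesn't, and -1 if PATTERN fails to compile.  */
int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* Whole-string match: starts at 0 and ends at the null byte.  */
    ok = regexec(&re, text, 1, &m, 0) == 0
         && m.rm_so == 0 && text[m.rm_eo] == '\0';
    regfree(&re);
    return ok;
}
```
@end verbatim

Under those assumptions, @code{matches_whole ("[-a-z]", "-")} and
@code{matches_whole ("[a-z-]", "-")} return 1, and
@code{matches_whole ("[z-a]", "a")} returns @minus{}1 because the
pattern is rejected at compile time.
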
1183
1184@node Grouping Operators, Back-reference Operator, List Operators, Common Operators
1185@section Grouping Operators (@code{(} @dots{} @code{)} or @code{\(} @dots{} @code{\)})
1186
1187@kindex (
1188@kindex )
1189@kindex \(
1190@kindex \)
1191@cindex grouping
1192@cindex subexpressions
1193@cindex parenthesizing
1194
1195A @dfn{group}, also known as a @dfn{subexpression}, consists of an
1196@dfn{open-group operator}, any number of other operators, and a
1197@dfn{close-group operator}.  Regex treats this sequence as a unit, just
1198as mathematics and programming languages treat a parenthesized
1199expression as a unit.
1200
1201Therefore, using @dfn{groups}, you can:
1202
1203@itemize @bullet
1204@item
1205delimit the argument(s) to an alternation operator (@pxref{Alternation
1206Operator}) or a repetition operator (@pxref{Repetition
1207Operators}).
1208
1209@item 
1210keep track of the indices of the substring that matched a given group.
1211@xref{Using Registers}, for a precise explanation.
1212This lets you:
1213
1214@itemize @bullet
1215@item
1216use the back-reference operator (@pxref{Back-reference Operator}).
1217
1218@item 
1219use registers (@pxref{Using Registers}).
1220
1221@end itemize
1222
1223@end itemize
1224
1225If the syntax bit @code{RE_NO_BK_PARENS} is set, then @samp{(} represents
1226the open-group operator and @samp{)} represents the
1227close-group operator; otherwise, @samp{\(} and @samp{\)} do.
1228
1229If the syntax bit @code{RE_UNMATCHED_RIGHT_PAREN_ORD} is set and a
1230close-group operator has no matching open-group operator, then Regex
1231considers it to match @samp{)}.
1232
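To illustrate how a group records the indices of the substring it
matched, here is a sketch that uses the @sc{posix} interface, whose
@code{regmatch_t} array plays a role similar to registers.  It assumes
@code{REG_EXTENDED}, under which @samp{(} and @samp{)} are the grouping
operators; the helper name @code{group_span} is hypothetical.

@verbatim
```c
#include <regex.h>

/* Hypothetical helper: search TEXT for PATTERN and store in *START
   and *END the offsets of the substring matched by group GROUP
   (0 for the whole match, 1 through 9 for groups).  Return 1 on
   success, 0 if there is no match or the group did not participate,
   and -1 if GROUP is out of range or PATTERN fails to compile.  */
int group_span(const char *pattern, const char *text, int group,
               int *start, int *end)
{
    regex_t re;
    regmatch_t m[10];
    int ok = 0;

    if (group < 0 || group > 9
        || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 10, m, 0) == 0 && m[group].rm_so != -1) {
        *start = (int) m[group].rm_so;
        *end = (int) m[group].rm_eo;
        ok = 1;
    }
    regfree(&re);
    return ok;
}
```
@end verbatim

Searching @samp{xfobar} for @samp{fo(o|b)ar} this way gives offsets 1
and 6 for the whole match and offsets 3 and 4 for @w{group 1}, which
matched @samp{b}.
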
1233
1234@node Back-reference Operator, Anchoring Operators, Grouping Operators, Common Operators
@section The Back-reference Operator (@code{\}@var{digit})
1236
1237@cindex back references
1238
1239If the syntax bit @code{RE_NO_BK_REF} isn't set, then Regex recognizes
1240back references.  A back reference matches a specified preceding group.
1241The back reference operator is represented by @samp{\@var{digit}}
1242anywhere after the end of a regular expression's @w{@var{digit}-th}
1243group (@pxref{Grouping Operators}).
1244
1245@var{digit} must be between @samp{1} and @samp{9}.  The matcher assigns
1246numbers 1 through 9 to the first nine groups it encounters.  By using
1247one of @samp{\1} through @samp{\9} after the corresponding group's
1248close-group operator, you can match a substring identical to the
1249one that the group does.
1250
1251Back references match according to the following (in all examples below,
1252@samp{(} represents the open-group, @samp{)} the close-group, @samp{@{}
1253the open-interval and @samp{@}} the close-interval operator):
1254
1255@itemize @bullet
1256@item
1257If the group matches a substring, the back reference matches an
1258identical substring.  For example, @samp{(a)\1} matches @samp{aa} and
1259@samp{(bana)na\1bo\1} matches @samp{bananabanabobana}.  Likewise,
1260@samp{(.*)\1} matches any (newline-free if the syntax bit
1261@code{RE_DOT_NEWLINE} isn't set) string that is composed of two
1262identical halves; the @samp{(.*)} matches the first half and the
1263@samp{\1} matches the second half.
1264
1265@item
1266If the group matches more than once (as it might if followed
1267by, e.g., a repetition operator), then the back reference matches the
1268substring the group @emph{last} matched.  For example,
1269@samp{((a*)b)*\1\2} matches @samp{aabababa}; first @w{group 1} (the
1270outer one) matches @samp{aab} and @w{group 2} (the inner one) matches
1271@samp{aa}.  Then @w{group 1} matches @samp{ab} and @w{group 2} matches
1272@samp{a}.  So, @samp{\1} matches @samp{ab} and @samp{\2} matches
1273@samp{a}.
1274
1275@item
1276If the group doesn't participate in a match, i.e., it is part of an
1277alternative not taken or a repetition operator allows zero repetitions
1278of it, then the back reference makes the whole match fail.  For example,
1279@samp{(one()|two())-and-(three\2|four\3)} matches @samp{one-and-three}
1280and @samp{two-and-four}, but not @samp{one-and-four} or
@samp{two-and-three}.  To see why, suppose the pattern has matched
@samp{one-and-}: its @w{group 2} has matched the empty string and its
@w{group 3} hasn't participated in the match.  If it then matches
@samp{four}, it must next back reference @w{group 3} (because @samp{\3}
follows the @samp{four}), and the match fails because @w{group 3} didn't
participate in the match.
1287
1288@end itemize
1289
You can use a back reference as an argument to a repetition operator.  For
example, @samp{(a(b))\2*} matches @samp{a} followed by one or more
@samp{b}s.  Similarly, @samp{(a(b))\2@{3@}} matches @samp{abbbb}.
1293
1294If there is no preceding @w{@var{digit}-th} subexpression, the regular
1295expression is invalid.
1296
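Some of the simpler back-reference examples in this section can be
tried with the sketch below.  It assumes the @sc{posix} interface
without @code{REG_EXTENDED}, so the grouping operators are written
@samp{\(} and @samp{\)} and the interval operators @samp{\@{} and
@samp{\@}}.  Each pattern is anchored with @samp{^} and @samp{$} so
that it must match the whole string.  The helper name
@code{bre_matches} is hypothetical.

@verbatim
```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN (POSIX basic syntax) is
   found in TEXT, 0 if not, and -1 if PATTERN fails to compile.  */
int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```
@end verbatim

For instance, @code{bre_matches ("^\\(.*\\)\\1$", "abcabc")} returns 1
and the same pattern against @samp{abcabd} returns 0.  A pattern such
as @samp{\(a\)\2}, which refers to a group that doesn't exist, is
expected to fail to compile, so the helper returns @minus{}1.
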
1297
1298@node Anchoring Operators,  , Back-reference Operator, Common Operators
1299@section Anchoring Operators    
1300
1301@cindex anchoring
1302@cindex regexp anchoring
1303
1304These operators can constrain a pattern to match only at the beginning or
1305end of the entire string or at the beginning or end of a line.
1306
1307@menu
1308* Match-beginning-of-line Operator::  ^
1309* Match-end-of-line Operator::        $
1310@end menu
1311
1312
1313@node Match-beginning-of-line Operator, Match-end-of-line Operator,  , Anchoring Operators
1314@subsection The Match-beginning-of-line Operator (@code{^})
1315
1316@kindex ^
1317@cindex beginning-of-line operator
1318@cindex anchors
1319
1320This operator can match the empty string either at the beginning of the
1321string or after a newline character.  Thus, it is said to @dfn{anchor}
1322the pattern to the beginning of a line.
1323
1324In the cases following, @samp{^} represents this operator.  (Otherwise,
1325@samp{^} is ordinary.)
1326
1327@itemize @bullet
1328
1329@item
1330It (the @samp{^}) is first in the pattern, as in @samp{^foo}.
1331
1332@cnindex RE_CONTEXT_INDEP_ANCHORS @r{(and @samp{^})}
1333@item
1334The syntax bit @code{RE_CONTEXT_INDEP_ANCHORS} is set, and it is outside
1335a bracket expression.
1336
1337@cindex open-group operator and @samp{^}
1338@cindex alternation operator and @samp{^}
1339@item
1340It follows an open-group or alternation operator, as in @samp{a\(^b\)}
1341and @samp{a\|^b}.  @xref{Grouping Operators}, and @ref{Alternation
1342Operator}.
1343
1344@end itemize
1345
1346These rules imply that some valid patterns containing @samp{^} cannot be
1347matched; for example, @samp{foo^bar} if @code{RE_CONTEXT_INDEP_ANCHORS}
1348is set.
1349
1350@vindex not_bol @r{field in pattern buffer}
1351If the @code{not_bol} field is set in the pattern buffer (@pxref{GNU
1352Pattern Buffers}), then @samp{^} fails to match at the beginning of the
1353string.  @xref{POSIX Matching}, for when you might find this useful.
1354
1355@vindex newline_anchor @r{field in pattern buffer}
If the @code{newline_anchor} field is not set in the pattern buffer, then
@samp{^} fails to match after a newline.  This is useful when you do not
regard the string to be matched as broken into lines.
1359
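The @sc{posix} interface offers flags with comparable effects, which the
following sketch uses.  It assumes that @code{REG_NEWLINE} (a
@code{regcomp} flag) asks for newlines to be treated as line
separators and that @code{REG_NOTBOL} (a @code{regexec} flag) says the
start of the string is not the beginning of a line.  The helper name
@code{found} is hypothetical.

@verbatim
```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN with CFLAGS, search TEXT with
   EFLAGS, and return 1 if a match is found, 0 if not, or -1 if
   PATTERN fails to compile.  */
int found(const char *pattern, const char *text, int cflags, int eflags)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    ok = regexec(&re, text, 0, NULL, eflags) == 0;
    regfree(&re);
    return ok;
}
```
@end verbatim

With these flags, @code{found ("^bar", "foo\nbar", REG_NEWLINE, 0)}
returns 1, @code{found ("^bar", "foo\nbar", 0, 0)} returns 0, and
@code{found ("^foo", "foo", 0, REG_NOTBOL)} returns 0.
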
1360
1361@node Match-end-of-line Operator,  , Match-beginning-of-line Operator, Anchoring Operators
1362@subsection The Match-end-of-line Operator (@code{$})
1363
1364@kindex $
1365@cindex end-of-line operator
1366@cindex anchors
1367
1368This operator can match the empty string either at the end of
1369the string or before a newline character in the string.  Thus, it is
1370said to @dfn{anchor} the pattern to the end of a line.
1371
It is always represented by @samp{$}.  For example, @samp{foo$} usually
matches @samp{foo} and also the first three characters of
@samp{foo\nbar} (where @samp{\n} stands for a newline character).
1375
1376Its interaction with the syntax bits and pattern buffer fields is
1377exactly the dual of @samp{^}'s; see the previous section.  (That is,
1378``beginning'' becomes ``end'', ``next'' becomes ``previous'', and
1379``after'' becomes ``before''.)
1380
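A sketch of the same duality using the @sc{posix} interface, assuming
that @code{REG_NEWLINE} makes newlines act as line separators and that
@code{REG_NOTEOL} says the end of the string is not the end of a line.
The helper name @code{match_end} is hypothetical.

@verbatim
```c
#include <regex.h>

/* Hypothetical helper: compile PATTERN with CFLAGS, search TEXT with
   EFLAGS, and return the offset just past the first match, -1 if
   there is no match, or -2 if PATTERN fails to compile.  */
int match_end(const char *pattern, const char *text, int cflags,
              int eflags)
{
    regex_t re;
    regmatch_t m;
    int end = -1;

    if (regcomp(&re, pattern, cflags) != 0)
        return -2;
    if (regexec(&re, text, 1, &m, eflags) == 0)
        end = (int) m.rm_eo;
    regfree(&re);
    return end;
}
```
@end verbatim

Here @code{match_end ("foo$", "foo\nbar", REG_NEWLINE, 0)} returns 3,
while the same call without @code{REG_NEWLINE} returns @minus{}1.
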
1381
1382@node GNU Operators, GNU Emacs Operators, Common Operators, Top
1383@chapter GNU Operators
1384
1385Following are operators that @sc{gnu} defines (and @sc{posix} doesn't).
1386
1387@menu
1388* Word Operators::
1389* Buffer Operators::
1390@end menu
1391
1392@node Word Operators, Buffer Operators,  , GNU Operators
1393@section Word Operators
1394
1395The operators in this section require Regex to recognize parts of words.
1396Regex uses a syntax table to determine whether or not a character is
1397part of a word, i.e., whether or not it is @dfn{word-constituent}.
1398
1399@menu
1400* Non-Emacs Syntax Tables::
1401* Match-word-boundary Operator::	\b
1402* Match-within-word Operator::		\B
1403* Match-beginning-of-word Operator::	\<
1404* Match-end-of-word Operator::		\>
1405* Match-word-constituent Operator::	\w
1406* Match-non-word-constituent Operator::	\W
1407@end menu
1408
1409@node Non-Emacs Syntax Tables, Match-word-boundary Operator,  , Word Operators
1410@subsection Non-Emacs Syntax Tables    
1411
1412A @dfn{syntax table} is an array indexed by the characters in your
1413character set.  In the @sc{ascii} encoding, therefore, a syntax table
1414has 256 elements.  Regex always uses a @code{char *} variable
1415@code{re_syntax_table} as its syntax table.  In some cases, it
1416initializes this variable and in others it expects you to initialize it.
1417
1418@itemize @bullet
1419@item
1420If Regex is compiled with the preprocessor symbols @code{emacs} and
1421@code{SYNTAX_TABLE} both undefined, then Regex allocates
1422@code{re_syntax_table} and initializes an element @var{i} either to
1423@code{Sword} (which it defines) if @var{i} is a letter, number, or
1424@samp{_}, or to zero if it's not.
1425
1426@item
1427If Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
1428defined, then Regex expects you to define a @code{char *} variable
1429@code{re_syntax_table} to be a valid syntax table.
1430
1431@item
1432@xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
1433the preprocessor symbol @code{emacs} defined.
1434
1435@end itemize
1436
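As an illustration of the first case, the following sketch fills a
256-element table in the way described.  The names @code{SWORD},
@code{init_word_table}, and @code{is_word_constituent} are hypothetical;
in particular, @code{SWORD} only stands in for the @code{Sword} value
that Regex defines, whose actual value may differ.

@verbatim
```c
#include <ctype.h>

/* Stand-in for the Sword value that Regex defines (an assumption).  */
enum { SWORD = 1 };

/* Fill TABLE as the first case describes: SWORD for letters, digits,
   and '_'; zero for every other character.  */
void init_word_table(char table[256])
{
    int c;

    for (c = 0; c < 256; c++)
        table[c] = (isalnum(c) || c == '_') ? SWORD : 0;
}

/* Return nonzero if C is word-constituent according to TABLE.  */
int is_word_constituent(const char table[256], unsigned char c)
{
    return table[c] == SWORD;
}
```
@end verbatim
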
1437@node Match-word-boundary Operator, Match-within-word Operator, Non-Emacs Syntax Tables, Word Operators
1438@subsection The Match-word-boundary Operator (@code{\b})
1439
1440@cindex @samp{\b}
1441@cindex word boundaries, matching
1442
1443This operator (represented by @samp{\b}) matches the empty string at
1444either the beginning or the end of a word.  For example, @samp{\brat\b}
1445matches the separate word @samp{rat}.
1446
1447@node Match-within-word Operator, Match-beginning-of-word Operator, Match-word-boundary Operator, Word Operators
1448@subsection The Match-within-word Operator (@code{\B})
1449
1450@cindex @samp{\B}
1451
1452This operator (represented by @samp{\B}) matches the empty string within
1453a word. For example, @samp{c\Brat\Be} matches @samp{crate}, but
1454@samp{dirty \Brat} doesn't match @samp{dirty rat}.
1455
1456@node Match-beginning-of-word Operator, Match-end-of-word Operator, Match-within-word Operator, Word Operators
1457@subsection The Match-beginning-of-word Operator (@code{\<})
1458
1459@cindex @samp{\<}
1460
1461This operator (represented by @samp{\<}) matches the empty string at the
1462beginning of a word.
1463
1464@node Match-end-of-word Operator, Match-word-constituent Operator, Match-beginning-of-word Operator, Word Operators
1465@subsection The Match-end-of-word Operator (@code{\>})
1466
1467@cindex @samp{\>}
1468
1469This operator (represented by @samp{\>}) matches the empty string at the
1470end of a word.
1471
1472@node Match-word-constituent Operator, Match-non-word-constituent Operator, Match-end-of-word Operator, Word Operators
1473@subsection The Match-word-constituent Operator (@code{\w})
1474
1475@cindex @samp{\w}
1476
1477This operator (represented by @samp{\w}) matches any word-constituent
1478character.
1479
1480@node Match-non-word-constituent Operator,  , Match-word-constituent Operator, Word Operators
1481@subsection The Match-non-word-constituent Operator (@code{\W})
1482
1483@cindex @samp{\W}
1484
1485This operator (represented by @samp{\W}) matches any character that is
1486not word-constituent.
1487
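The word operators described in this section can be tried together with
a sketch like the one below.  Because they are @sc{gnu} extensions, it
assumes a C library (such as the GNU C Library) whose @code{regcomp}
recognizes them under @code{REG_EXTENDED}, and the default syntax table
in which letters, digits, and @samp{_} are word-constituent.  The helper
name @code{found} is hypothetical.

@verbatim
```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN is found anywhere in TEXT,
   0 if not, and -1 if PATTERN fails to compile.  */
int found(const char *pattern, const char *text)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```
@end verbatim

For example, @code{found ("\\brat\\b", "a rat ran")} returns 1 but
@code{found ("\\brat\\b", "crate")} returns 0; likewise
@code{found ("rat\\>", "brat")} returns 1 and
@code{found ("rat\\>", "rats")} returns 0.
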
1488
1489@node Buffer Operators,  , Word Operators, GNU Operators
1490@section Buffer Operators    
1491
1492Following are operators which work on buffers.  In Emacs, a @dfn{buffer}
1493is, naturally, an Emacs buffer.  For other programs, Regex considers the
1494entire string to be matched as the buffer.
1495
1496@menu
1497* Match-beginning-of-buffer Operator::	\`
1498* Match-end-of-buffer Operator::	\'
1499@end menu
1500
1501
1502@node Match-beginning-of-buffer Operator, Match-end-of-buffer Operator,  , Buffer Operators
1503@subsection The Match-beginning-of-buffer Operator (@code{\`})
1504
1505@cindex @samp{\`}
1506
1507This operator (represented by @samp{\`}) matches the empty string at the
1508beginning of the buffer.
1509
1510@node Match-end-of-buffer Operator,  , Match-beginning-of-buffer Operator, Buffer Operators
1511@subsection The Match-end-of-buffer Operator (@code{\'})
1512
1513@cindex @samp{\'}
1514
1515This operator (represented by @samp{\'}) matches the empty string at the
1516end of the buffer.
1517
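The difference between the buffer operators and the line anchors shows
up when the string contains a newline.  This sketch assumes a C library
(such as the GNU C Library) whose @code{regcomp} recognizes these
@sc{gnu} operators, and it compiles with @code{REG_NEWLINE} so that
@samp{^} and @samp{$} match at line boundaries.  The helper name
@code{found} is hypothetical.

@verbatim
```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN is found anywhere in TEXT,
   0 if not, and -1 if PATTERN fails to compile.  Newlines in TEXT
   separate lines for the purposes of '^' and '$'.  */
int found(const char *pattern, const char *text)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_NEWLINE | REG_NOSUB) != 0)
        return -1;
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```
@end verbatim

Against the two-line string @samp{foo\nbar}, @samp{^bar} is found but
@samp{\`bar} is not, and @samp{foo$} is found but @samp{foo\'} is not.
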
1518
1519@node GNU Emacs Operators, What Gets Matched?, GNU Operators, Top
1520@chapter GNU Emacs Operators
1521
1522Following are operators that @sc{gnu} defines (and @sc{posix} doesn't)
1523that you can use only when Regex is compiled with the preprocessor
1524symbol @code{emacs} defined.  
1525
1526@menu
1527* Syntactic Class Operators::
1528@end menu
1529
1530
1531@node Syntactic Class Operators,  ,  , GNU Emacs Operators
1532@section Syntactic Class Operators
1533
1534The operators in this section require Regex to recognize the syntactic
1535classes of characters.  Regex uses a syntax table to determine this.
1536
1537@menu
1538* Emacs Syntax Tables::
1539* Match-syntactic-class Operator::	\sCLASS
1540* Match-not-syntactic-class Operator::  \SCLASS
1541@end menu
1542
1543@node Emacs Syntax Tables, Match-syntactic-class Operator,  , Syntactic Class Operators
1544@subsection Emacs Syntax Tables
1545
1546A @dfn{syntax table} is an array indexed by the characters in your
1547character set.  In the @sc{ascii} encoding, therefore, a syntax table
1548has 256 elements.
1549
1550If Regex is compiled with the preprocessor symbol @code{emacs} defined,
1551then Regex expects you to define and initialize the variable
1552@code{re_syntax_table} to be an Emacs syntax table.  Emacs' syntax
1553tables are more complicated than Regex's own (@pxref{Non-Emacs Syntax
1554Tables}).  @xref{Syntax, , Syntax, emacs, The GNU Emacs User's Manual},
1555for a description of Emacs' syntax tables.
1556
1557@node Match-syntactic-class Operator, Match-not-syntactic-class Operator, Emacs Syntax Tables, Syntactic Class Operators
1558@subsection The Match-syntactic-class Operator (@code{\s}@var{class})
1559
1560@cindex @samp{\s}
1561
1562This operator matches any character whose syntactic class is represented
1563by a specified character.  @samp{\s@var{class}} represents this operator
1564where @var{class} is the character representing the syntactic class you
1565want.  For example, @samp{w} represents the syntactic
1566class of word-constituent characters, so @samp{\sw} matches any
1567word-constituent character.
1568
1569@node Match-not-syntactic-class Operator,  , Match-syntactic-class Operator, Syntactic Class Operators
1570@subsection The Match-not-syntactic-class Operator (@code{\S}@var{class})
1571
1572@cindex @samp{\S}
1573
1574This operator is similar to the match-syntactic-class operator except
1575that it matches any character whose syntactic class is @emph{not}
1576represented by the specified character.  @samp{\S@var{class}} represents
1577this operator.  For example, @samp{w} represents the syntactic class of
1578word-constituent characters, so @samp{\Sw} matches any character that is
1579not word-constituent.
1580
1581
1582@node What Gets Matched?, Programming with Regex, GNU Emacs Operators, Top
1583@chapter What Gets Matched?
1584
Regex usually matches strings according to the ``leftmost longest''
rule; that is, it chooses the longest of the leftmost matches.  For a
regular expression containing subexpressions, this does not mean that it
simply chooses the longest match for each subexpression, left to
right; the overall match must also be the longest possible one.
1590
1591For example, @samp{(ac*)(c*d[ac]*)\1} matches @samp{acdacaaa}, not
1592@samp{acdac}, as it would if it were to choose the longest match for the
1593first subexpression.
1594
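The two halves of the rule can be seen separately in a sketch that uses
simpler patterns than the one above.  It assumes the @sc{posix}
interface with @code{REG_EXTENDED} and an implementation that follows
the leftmost-longest rule; the helper name @code{match_span} and both
patterns are our own illustrations.

@verbatim
```c
#include <regex.h>

/* Hypothetical helper: search TEXT for PATTERN and store the offsets
   of the overall match in *START and *END.  Return 1 if a match is
   found, 0 if not, and -1 if PATTERN fails to compile.  */
int match_span(const char *pattern, const char *text,
               int *start, int *end)
{
    regex_t re;
    regmatch_t m;
    int ok = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0) {
        *start = (int) m.rm_so;
        *end = (int) m.rm_eo;
        ok = 1;
    }
    regfree(&re);
    return ok;
}
```
@end verbatim

Searching @samp{abcdef} for @samp{ab|bcdef} yields offsets 0 and 2: the
match starting leftmost wins even though @samp{bcdef} is longer.
Searching @samp{xabc} for @samp{a|ab|abc} yields offsets 1 and 4: among
the matches starting at the leftmost position, the longest wins.
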
1595
1596@node Programming with Regex, Copying, What Gets Matched?, Top
1597@chapter Programming with Regex
1598
1599Here we describe how you use the Regex data structures and functions in
1600C programs.  Regex has three interfaces: one designed for @sc{gnu}, one
1601compatible with @sc{posix} and one compatible with Berkeley @sc{unix}.
1602
1603@menu
1604* GNU Regex Functions::
1605* POSIX Regex Functions::
1606* BSD Regex Functions::
1607@end menu
1608
1609
1610@node GNU Regex Functions, POSIX Regex Functions,  , Programming with Regex
1611@section GNU Regex Functions
1612
1613If you're writing code that doesn't need to be compatible with either
1614@sc{posix} or Berkeley @sc{unix}, you can use these functions.  They
1615provide more options than the other interfaces.
1616
1617@menu
1618* GNU Pattern Buffers::         The re_pattern_buffer type.
1619* GNU Regular Expression Compiling::  re_compile_pattern ()
1620* GNU Matching::                re_match ()
1621* GNU Searching::               re_search ()
1622* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
1623* Searching with Fastmaps::     re_compile_fastmap ()
1624* GNU Translate Tables::        The `translate' field.
1625* Using Registers::             The re_registers type and related fns.
1626* Freeing GNU Pattern Buffers::  regfree ()
1627@end menu
1628
1629
1630@node GNU Pattern Buffers, GNU Regular Expression Compiling,  , GNU Regex Functions
1631@subsection GNU Pattern Buffers
1632
1633@cindex pattern buffer, definition of
1634@tindex re_pattern_buffer @r{definition}
1635@tindex struct re_pattern_buffer @r{definition}
1636
1637To compile, match, or search for a given regular expression, you must
1638supply a pattern buffer.  A @dfn{pattern buffer} holds one compiled
1639regular expression.@footnote{Regular expressions are also referred to as
1640``patterns,'' hence the name ``pattern buffer.''}
1641
1642You can have several different pattern buffers simultaneously, each
1643holding a compiled pattern for a different regular expression.
1644
1645@file{regex.h} defines the pattern buffer @code{struct} as follows:
1646
1647@example
1648[[[ pattern_buffer ]]]
1649@end example
1650
1651
1652@node GNU Regular Expression Compiling, GNU Matching, GNU Pattern Buffers, GNU Regex Functions
1653@subsection GNU Regular Expression Compiling
1654
1655In @sc{gnu}, you can both match and search for a given regular
1656expression.  To do either, you must first compile it in a pattern buffer
1657(@pxref{GNU Pattern Buffers}).
1658
1659@cindex syntax initialization
1660@vindex re_syntax_options @r{initialization}
1661Regular expressions match according to the syntax with which they were
1662compiled; with @sc{gnu}, you indicate what syntax you want by setting
1663the variable @code{re_syntax_options} (declared in @file{regex.h} and
1664defined in @file{regex.c}) before calling the compiling function,
1665@code{re_compile_pattern} (see below).  @xref{Syntax Bits}, and
1666@ref{Predefined Syntaxes}.
1667
1668You can change the value of @code{re_syntax_options} at any time.
1669Usually, however, you set its value once and then never change it.
1670
1671@cindex pattern buffer initialization
1672@code{re_compile_pattern} takes a pattern buffer as an argument.  You
1673must initialize the following fields:
1674
1675@table @code
1676
@item translate
@vindex translate @r{initialization}
1681Initialize this to point to a translate table if you want one, or to
1682zero if you don't.  We explain translate tables in @ref{GNU Translate
1683Tables}.
1684
1685@item fastmap
1686@vindex fastmap @r{initialization}
1687Initialize this to nonzero if you want a fastmap, or to zero if you
1688don't.
1689
1690@item buffer
1691@itemx allocated
1692@vindex buffer @r{initialization}
1693@vindex allocated @r{initialization}
1694@findex malloc
1695If you want @code{re_compile_pattern} to allocate memory for the
1696compiled pattern, set both of these to zero.  If you have an existing
1697block of memory (allocated with @code{malloc}) you want Regex to use,
1698set @code{buffer} to its address and @code{allocated} to its size (in
1699bytes).
1700
1701@code{re_compile_pattern} uses @code{realloc} to extend the space for
1702the compiled pattern as necessary.
1703
1704@end table
1705
1706To compile a pattern buffer, use:
1707
1708@findex re_compile_pattern
1709@example
1710char * 
1711re_compile_pattern (const char *@var{regex}, const int @var{regex_size}, 
1712                    struct re_pattern_buffer *@var{pattern_buffer})
1713@end example
1714
1715@noindent
1716@var{regex} is the regular expression's address, @var{regex_size} is its
1717length, and @var{pattern_buffer} is the pattern buffer's address.
1718
1719If @code{re_compile_pattern} successfully compiles the regular
1720expression, it returns zero and sets @code{*@var{pattern_buffer}} to the
1721compiled pattern.  It sets the pattern buffer's fields as follows:
1722
1723@table @code
1724@item buffer
1725@vindex buffer @r{field, set by @code{re_compile_pattern}}
1726to the compiled pattern.
1727
1728@item used
1729@vindex used @r{field, set by @code{re_compile_pattern}}
1730to the number of bytes the compiled pattern in @code{buffer} occupies.
1731
1732@item syntax
1733@vindex syntax @r{field, set by @code{re_compile_pattern}}
1734to the current value of @code{re_syntax_options}.
1735
1736@item re_nsub
1737@vindex re_nsub @r{field, set by @code{re_compile_pattern}}
1738to the number of subexpressions in @var{regex}.
1739
1740@item fastmap_accurate
1741@vindex fastmap_accurate @r{field, set by @code{re_compile_pattern}}
to zero on the theory that the pattern you're compiling is different
from the one previously compiled into @code{buffer}; in that case (since
1744you can't make a fastmap without a compiled pattern), 
1745@code{fastmap} would either contain an incompatible fastmap, or nothing
1746at all.
1747
1748@c xx what else?
1749@end table
1750
1751If @code{re_compile_pattern} can't compile @var{regex}, it returns an
1752error string corresponding to one of the errors listed in @ref{POSIX
1753Regular Expression Compiling}.
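
As a minimal sketch of the steps above, the following function zeroes a
pattern buffer, so that Regex allocates the compiled pattern itself and
uses neither a translate table nor a fastmap, and then compiles
@var{regex} under the current value of @code{re_syntax_options}.  The
name @code{compile_or_report} is ours, not part of Regex, and defining
@code{_GNU_SOURCE} is an assumption that only matters when
@file{regex.h} comes from the @sc{gnu} C library.

@example
#define _GNU_SOURCE  /* Assumption: only needed with the GNU C library.  */
#include <stdio.h>
#include <string.h>
#include <regex.h>

/* Hypothetical helper: compile REGEX into *BUF.  Return 0 on success,
   or -1 after printing the error string on failure.  */
int
compile_or_report (const char *regex, struct re_pattern_buffer *buf)
@{
  const char *error;

  /* Zero every field, including buffer, allocated, translate and
     fastmap.  */
  memset (buf, 0, sizeof *buf);
  error = re_compile_pattern (regex, strlen (regex), buf);
  if (error != 0)
    @{
      fprintf (stderr, "%s: %s\n", regex, error);
      return -1;
    @}
  return 0;
@}
@end example

@noindent
A caller would check the result before matching or searching, and would
eventually free the buffer (@pxref{Freeing GNU Pattern Buffers}).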
1754
1755
1756@node GNU Matching, GNU Searching, GNU Regular Expression Compiling, GNU Regex Functions
1757@subsection GNU Matching 
1758
1759@cindex matching with GNU functions
1760
1761Matching the @sc{gnu} way means trying to match as much of a string as
1762possible starting at a position within it you specify.  Once you've compiled
1763a pattern into a pattern buffer (@pxref{GNU Regular Expression
1764Compiling}), you can ask the matcher to match that pattern against a
1765string using:
1766
1767@findex re_match
1768@example
1769int
1770re_match (struct re_pattern_buffer *@var{pattern_buffer}, 
1771          const char *@var{string}, const int @var{size}, 
1772          const int @var{start}, struct re_registers *@var{regs})
1773@end example
1774
1775@noindent
1776@var{pattern_buffer} is the address of a pattern buffer containing a
1777compiled pattern.  @var{string} is the string you want to match; it can
1778contain newline and null characters.  @var{size} is the length of that
1779string.  @var{start} is the string index at which you want to
1780begin matching; the first character of @var{string} is at index zero.
@xref{Using Registers}, for an explanation of @var{regs}; you can safely
1782pass zero.
1783
1784@code{re_match} matches the regular expression in @var{pattern_buffer}
1785against the string @var{string} according to the syntax in
@var{pattern_buffer}'s @code{syntax} field.  (@xref{GNU Regular
1787Expression Compiling}, for how to set it.)  The function returns
1788@math{-1} if the compiled pattern does not match any part of
1789@var{string} and @math{-2} if an internal error happens; otherwise, it
1790returns how many (possibly zero) characters of @var{string} the pattern
1791matched.
1792
1793An example: suppose @var{pattern_buffer} points to a pattern buffer
1794containing the compiled pattern for @samp{a*}, and @var{string} points
1795to @samp{aaaaab} (whereupon @var{size} should be 6). Then if @var{start}
1796is 2, @code{re_match} returns 3, i.e., @samp{a*} would have matched the
1797last three @samp{a}s in @var{string}.  If @var{start} is 0,
1798@code{re_match} returns 5, i.e., @samp{a*} would have matched all the
1799@samp{a}s in @var{string}.  If @var{start} is either 5 or 6, it returns
1800zero.
1801
1802If @var{start} is not between zero and @var{size}, then
1803@code{re_match} returns @math{-1}.
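
The example above can be tried with a short sketch like the following.
The helper name @code{match_length_at} and its @math{-3} result for a
pattern that fails to compile are ours, not part of Regex; defining
@code{_GNU_SOURCE} is an assumption that only matters with the @sc{gnu}
C library.

@example
#define _GNU_SOURCE  /* Assumption: only needed with the GNU C library.  */
#include <string.h>
#include <regex.h>

/* Hypothetical helper: compile REGEX under the current syntax and
   return what re_match reports for STRING at index START, or -3 if
   REGEX does not compile.  */
int
match_length_at (const char *regex, const char *string, int start)
@{
  struct re_pattern_buffer buf;
  int result;

  memset (&buf, 0, sizeof buf);
  if (re_compile_pattern (regex, strlen (regex), &buf) != 0)
    return -3;
  /* Pass zero for REGS, since only the match length is wanted.  */
  result = re_match (&buf, string, (int) strlen (string), start, 0);
  regfree (&buf);
  return result;
@}
@end example

@noindent
With this helper, @code{match_length_at ("a*", "aaaaab", 2)} corresponds
to the first case described above and should yield 3.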
1804
1805
1806@node GNU Searching, Matching/Searching with Split Data, GNU Matching, GNU Regex Functions
1807@subsection GNU Searching 
1808
1809@cindex searching with GNU functions
1810
1811@dfn{Searching} means trying to match starting at successive positions
1812within a string.  The function @code{re_search} does this.
1813
1814Before calling @code{re_search}, you must compile your regular
1815expression.  @xref{GNU Regular Expression Compiling}.
1816
1817Here is the function declaration:
1818
1819@findex re_search
1820@example
1821int 
1822re_search (struct re_pattern_buffer *@var{pattern_buffer}, 
1823           const char *@var{string}, const int @var{size}, 
1824           const int @var{start}, const int @var{range}, 
1825           struct re_registers *@var{regs})
1826@end example
1827
1828@noindent
1829@vindex start @r{argument to @code{re_search}}
1830@vindex range @r{argument to @code{re_search}}
1831whose arguments are the same as those to @code{re_match} (@pxref{GNU
1832Matching}) except that the two arguments @var{start} and @var{range}
1833replace @code{re_match}'s argument @var{start}.
1834
1835If @var{range} is positive, then @code{re_search} attempts a match
1836starting first at index @var{start}, then at @math{@var{start} + 1} if
1837that fails, and so on, up to @math{@var{start} + @var{range}}; if
1838@var{range} is negative, then it attempts a match starting first at
index @var{start}, then at @math{@var{start} - 1} if that fails, and so
1840on.  
1841
1842If @var{start} is not between zero and @var{size}, then @code{re_search}
1843returns @math{-1}.  When @var{range} is positive, @code{re_search}
1844adjusts @var{range} so that @math{@var{start} + @var{range} - 1} is
1845between zero and @var{size}, if necessary; that way it won't search
1846outside of @var{string}.  Similarly, when @var{range} is negative,
1847@code{re_search} adjusts @var{range} so that @math{@var{start} +
1848@var{range} + 1} is between zero and @var{size}, if necessary.
1849
1850If the @code{fastmap} field of @var{pattern_buffer} is zero,
1851@code{re_search} matches starting at consecutive positions; otherwise,
1852it uses @code{fastmap} to make the search more efficient.
1853@xref{Searching with Fastmaps}.
1854
1855If no match is found, @code{re_search} returns @math{-1}.  If
1856a match is found, it returns the index where the match began.  If an
1857internal error happens, it returns @math{-2}.
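
The following sketch shows a forward and a backward search in terms of
@var{start} and @var{range}.  The helper name @code{search_from} and its
@math{-3} result for a pattern that fails to compile are ours; defining
@code{_GNU_SOURCE} is an assumption that only matters with the @sc{gnu}
C library.

@example
#define _GNU_SOURCE  /* Assumption: only needed with the GNU C library.  */
#include <string.h>
#include <regex.h>

/* Hypothetical helper: compile REGEX under the current syntax and
   return what re_search reports for STRING, given START and RANGE,
   or -3 if REGEX does not compile.  */
int
search_from (const char *regex, const char *string, int start, int range)
@{
  struct re_pattern_buffer buf;
  int result;

  memset (&buf, 0, sizeof buf);  /* No fastmap, no translate table.  */
  if (re_compile_pattern (regex, strlen (regex), &buf) != 0)
    return -3;
  result = re_search (&buf, string, (int) strlen (string),
                      start, range, 0);
  regfree (&buf);
  return result;
@}
@end example

@noindent
For instance, @code{search_from ("b", "aaaaab", 0, 6)} searches forward
from index 0 and should return 5, while
@code{search_from ("a", "aaaaab", 5, -5)} searches backward from index 5
and should return 4.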
1858
1859
1860@node Matching/Searching with Split Data, Searching with Fastmaps, GNU Searching, GNU Regex Functions
1861@subsection Matching and Searching with Split Data
1862
1863Using the functions @code{re_match_2} and @code{re_search_2}, you can
1864match or search in data that is divided into two strings.  
1865
1866The function:
1867
1868@findex re_match_2
1869@example
1870int
1871re_match_2 (struct re_pattern_buffer *@var{buffer}, 
1872            const char *@var{string1}, const int @var{size1}, 
1873            const char *@var{string2}, const int @var{size2}, 
1874            const int @var{start}, 
1875            struct re_registers *@var{regs}, 
1876            const int @var{stop})
1877@end example
1878
1879@noindent
1880is similar to @code{re_match} (@pxref{GNU Matching}) except that you
1881pass @emph{two} data strings and sizes, and an index @var{stop} beyond
1882which you don't want the matcher to try matching.  As with
@code{re_match}, if it succeeds, @code{re_match_2} returns how many
characters of the two strings it matched.  Regard @var{string1} and
1885@var{string2} as concatenated when you set the arguments @var{start} and
1886@var{stop} and use the contents of @var{regs}; @code{re_match_2} never
1887returns a value larger than @math{@var{size1} + @var{size2}}.  
1888
1889The function:
1890
1891@findex re_search_2
1892@example
1893int
1894re_search_2 (struct re_pattern_buffer *@var{buffer}, 
1895             const char *@var{string1}, const int @var{size1}, 
1896             const char *@var{string2}, const int @var{size2}, 
1897             const int @var{start}, const int @var{range}, 
1898             struct re_registers *@var{regs}, 
1899             const int @var{stop})
1900@end example
1901
1902@noindent
1903is similarly related to @code{re_search}.
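
As a rough illustration, the sketch below matches a pattern against
data held in two pieces.  The helper name @code{match_split} and its
@math{-3} result for a pattern that fails to compile are ours; defining
@code{_GNU_SOURCE} is an assumption that only matters with the @sc{gnu}
C library.

@example
#define _GNU_SOURCE  /* Assumption: only needed with the GNU C library.  */
#include <string.h>
#include <regex.h>

/* Hypothetical helper: match REGEX against S1 followed by S2,
   beginning at index START of the concatenation and going no further
   than index STOP.  Return what re_match_2 reports, or -3 if REGEX
   does not compile.  */
int
match_split (const char *regex, const char *s1, const char *s2,
             int start, int stop)
@{
  struct re_pattern_buffer buf;
  int result;

  memset (&buf, 0, sizeof buf);
  if (re_compile_pattern (regex, strlen (regex), &buf) != 0)
    return -3;
  result = re_match_2 (&buf, s1, (int) strlen (s1),
                       s2, (int) strlen (s2), start, 0, stop);
  regfree (&buf);
  return result;
@}
@end example

@noindent
Here @code{match_split ("a*", "aaa", "aab", 2, 6)} treats the data as
@samp{aaaaab}, so it should return 3, just as @code{re_match} does for
the unsplit string.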
1904
1905
1906@node Searching with Fastmaps, GNU Translate Tables, Matching/Searching with Split Data, GNU Regex Functions
1907@subsection Searching with Fastmaps
1908
1909@cindex fastmaps
1910If you're searching through a long string, you should use a fastmap.
1911Without one, the searcher tries to match at consecutive positions in the
1912string.  Generally, most of the characters in the string could not start
1913a match.  It takes much longer to try matching at a given position in the
1914string than it does to check in a table whether or not the character at
1915that position could start a match.  A @dfn{fastmap} is such a table.
1916
1917More specifically, a fastmap is an array indexed by the characters in
1918your character set.  Under the @sc{ascii} encoding, therefore, a fastmap
1919has 256 elements.  If you want the searcher to use a fastmap with a
1920given pattern buffer, you must allocate the array and assign the array's
1921address to the pattern buffer's @code{fastmap} field.  You either can
1922compile the fastmap yourself or have @code{re_search} do it for you;
1923when @code{fastmap} is nonzero, it automatically compiles a fastmap the
1924first time you search using a particular compiled pattern.  
1925
1926To compile a fastmap yourself, use:
1927
1928@findex re_compile_fastmap
1929@example
1930int
1931re_compile_fastmap (struct re_pattern_buffer *@var{pattern_buffer})
1932@end example
1933
1934@noindent
1935@var{pattern_buffer} is the address of a pattern buffer.  If the
1936character @var{c} could start a match for the pattern,
1937@code{re_compile_fastmap} makes
1938@code{@var{pattern_buffer}->fastmap[@var{c}]} nonzero.  It returns
1939@math{0} if it can compile a fastmap and @math{-2} if there is an
1940internal error.  For example, if @samp{|} is the alternation operator
1941and @var{pattern_buffer} holds the compiled pattern for @samp{a|b}, then
1942@code{re_compile_fastmap} sets @code{fastmap['a']} and
1943@code{fastmap['b']} (and no others).
1944
1945@code{re_search} uses a fastmap as it moves along in the string: it
1946checks the string's characters until it finds one that's in the fastmap.
1947Then it tries matching at that character.  If the match fails, it
1948repeats the process.  So, by using a fastmap, @code{re_search} doesn't
1949waste time trying to match at positions in the string that couldn't
1950start a match.
1951
1952If you don't want @code{re_search} to use a fastmap,
1953store zero in the @code{fastmap} field of the pattern buffer before
1954calling @code{re_search}.
1955
1956Once you've initialized a pattern buffer's @code{fastmap} field, you
1957need never do so again---even if you compile a new pattern in
1958it---provided the way the field is set still reflects whether or not you
1959want a fastmap.  @code{re_search} will still either do nothing if
1960@code{fastmap} is null or, if it isn't, compile a new fastmap for the
1961new pattern.
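
The following sketch attaches a fastmap to a pattern buffer, compiles
it explicitly, and asks whether a given character could start a match.
The helper name @code{could_start_match} is ours.  We select
@code{RE_SYNTAX_POSIX_EXTENDED} so that @samp{|} is the alternation
operator, and we detach the fastmap before freeing the buffer on the
assumption that the freeing function might otherwise release it as
well.  Defining @code{_GNU_SOURCE} is an assumption that only matters
with the @sc{gnu} C library.

@example
#define _GNU_SOURCE  /* Assumption: only needed with the GNU C library.  */
#include <stdlib.h>
#include <string.h>
#include <regex.h>

/* Hypothetical helper: return 1 if C could start a match for REGEX,
   0 if it could not, and -1 on any error.  */
int
could_start_match (const char *regex, unsigned char c)
@{
  struct re_pattern_buffer buf;
  reg_syntax_t saved_syntax = re_syntax_options;
  char *fastmap = malloc (256);  /* One element per character code.  */
  int result = -1;

  if (fastmap == 0)
    return -1;
  memset (&buf, 0, sizeof buf);
  buf.fastmap = fastmap;
  re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
  if (re_compile_pattern (regex, strlen (regex), &buf) == 0
      && re_compile_fastmap (&buf) == 0)
    result = fastmap[c] != 0;
  re_syntax_options = saved_syntax;

  /* Assumption: detach the fastmap so that we alone free it.  */
  buf.fastmap = 0;
  regfree (&buf);
  free (fastmap);
  return result;
@}
@end example

@noindent
For the pattern @samp{a|b} discussed above,
@code{could_start_match ("a|b", 'a')} should return 1 and
@code{could_start_match ("a|b", 'c')} should return 0.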
1962
1963@node GNU Translate Tables, Using Registers, Searching with Fastmaps, GNU Regex Functions
1964@subsection GNU Translate Tables
1965
1966If you set the @code{translate} field of a pattern buffer to a translate
1967table, then the @sc{gnu} Regex functions to which you've passed that
1968pattern buffer use it to apply a simple transformation
1969to all the regular expression and string characters at which they look.
1970
1971A @dfn{translate table} is an array indexed by the characters in your
1972character set.  Under the @sc{ascii} encoding, therefore, a translate
1973table has 256 elements.  The array's elements are also characters in
1974your character set.  When the Regex functions see a character @var{c},
1975they use @code{translate[@var{c}]} in its place, with one exception: the
character after a @samp{\} is not translated.  (This ensures that the
operators, e.g., @samp{\B} and @samp{\b}, are always distinguishable.)
1978
1979For example, a table that maps all lowercase letters to the
1980corresponding uppercase ones would cause the matcher to ignore
1981differences in case.@footnote{A table that maps all uppercase letters to
1982the corresponding lowercase ones would work just as well for this
1983purpose.}  Such a table would map all characters except lowercase letters
1984to themselves, and lowercase letters to the corresponding uppercase
1985ones.  Under the @sc{ascii} encoding, here's how you could initialize
1986such a table (we'll call it @code{case_fold}):
1987
1988@example
1989for (i = 0; i < 256; i++)
1990  case_fold[i] = i;
1991for (i = 'a'; i <= 'z'; i++)
1992  case_fold[i] = i - ('a' - 'A');
1993@end example
1994
1995You tell Regex to use a translate table on a given pattern buffer by
1996assigning that table's address to the @code{translate} field of that
1997buffer.  If you don't want Regex to do any translation, put zero into
1998this field.  You'll get weird results if you change the table's contents
1999anytime between compiling the pattern buffer, compiling its fastmap, and
2000matching or searching with the pattern buffer.
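
Putting the pieces together, the sketch below builds the
@code{case_fold} table, installs it before compiling, and matches
without regard to case.  The helper name @code{match_ignoring_case} and
its @math{-3} error result are ours.  The cast through @code{void *} is
there because we assume the exact character type of the
@code{translate} field may differ between versions of @file{regex.h};
defining @code{_GNU_SOURCE} is an assumption that only matters with the
@sc{gnu} C library.

@example
#define _GNU_SOURCE  /* Assumption: only needed with the GNU C library.  */
#include <stdlib.h>
#include <string.h>
#include <regex.h>

/* Hypothetical helper: match REGEX against the start of STRING,
   ignoring case (ASCII only).  Return what re_match reports, or -3
   if memory runs out or REGEX does not compile.  */
int
match_ignoring_case (const char *regex, const char *string)
@{
  struct re_pattern_buffer buf;
  unsigned char *case_fold = malloc (256);
  int i, result = -3;

  if (case_fold == 0)
    return -3;
  for (i = 0; i < 256; i++)
    case_fold[i] = i;
  for (i = 'a'; i <= 'z'; i++)
    case_fold[i] = i - ('a' - 'A');

  memset (&buf, 0, sizeof buf);
  buf.translate = (void *) case_fold;  /* Install before compiling.  */
  if (re_compile_pattern (regex, strlen (regex), &buf) == 0)
    result = re_match (&buf, string, (int) strlen (string), 0, 0);

  /* Assumption: detach the table so that we alone free it.  */
  buf.translate = 0;
  regfree (&buf);
  free (case_fold);
  return result;
@}
@end example

@noindent
With this table in place, @code{match_ignoring_case ("hello", "HeLLo")}
should return 5.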
2001
2002@node Using Registers, Freeing GNU Pattern Buffers, GNU Translate Tables, GNU Regex Functions
2003@subsection Using Registers
2004
A group in a regular expression can match a (possibly empty) substring
2006of the string that regular expression as a whole matched.  The matcher
2007remembers the beginning and end of the substring matched by
2008each group.
2009
2010To find out what they matched, pass a nonzero @var{regs} argument to a
2011@sc{gnu} matching or searching function (@pxref{GNU Matching} and
2012@ref{GNU Searching}), i.e., the address of a structure of this type, as
2013defined in @file{regex.h}:
2014
2015@c We don't bother to include this directly from regex.h,
2016@c since it changes so rarely.
2017@example
2018@tindex re_registers
2019@vindex num_regs @r{in @code{struct re_registers}}
2020@vindex start @r{in @code{struct re_registers}}
2021@vindex end @r{in @code{struct re_registers}}
2022struct re_registers
2023@{
2024  unsigned num_regs;
2025  regoff_t *start;
2026  regoff_t *end;
2027@};
2028@end example
2029
2030Except for (possibly) the @var{num_regs}'th element (see below), the
2031@var{i}th element of the @code{start} and @code{end} arrays records
2032information about the @var{i}th group in the pattern.  (They're declared
2033as C pointers, but this is only because not all C compilers accept
2034zero-length arrays; conceptually, it is simplest to think of them as
2035arrays.)
2036
2037The @code{start} and @code{end} arrays are allocated in various ways,
2038depending on the value of the @code{regs_allocated}
2039@vindex regs_allocated
2040field in the pattern buffer passed to the matcher.
2041
2042The simplest and perhaps most useful is to let the matcher (re)allocate
2043enough space to record information for all the groups in the regular
2044expression.  If @code{regs_allocated} is @code{REGS_UNALLOCATED},
2045@vindex REGS_UNALLOCATED
the matcher allocates @math{1 + @var{re_nsub}} elements (@code{re_nsub}
is another field in the pattern buffer; @pxref{GNU Pattern Buffers}),
sets the extra element to @math{-1}, and sets @code{regs_allocated} to
@code{REGS_REALLOCATE}.
2049@vindex REGS_REALLOCATE
2050Then on subsequent calls with the same pattern buffer and @var{regs}
2051arguments, the matcher reallocates more space if necessary.
2052
2053It would perhaps be more logical to make the @code{regs_allocated} field
2054part of the @code{re_registers} structure, instead of part of the
2055pattern buffer.  But in that case the caller would be forced to
2056initialize the structure before passing it.  Much existing code doesn't
2057do this initialization, and it's arguably better to avoid it anyway.
2058
2059@code{re_compile_pattern} sets @code{regs_allocated} to
2060@code{REGS_UNALLOCATED},
2061so if you use the GNU regular expression
2062functions, you get this behavior by default.
2063
@c xx document re_set_registers
2065
2066@sc{posix}, on the other hand, requires a different interface:  the
2067caller is supposed to pass in a fixed-length array which the matcher
fills.  Therefore, if @code{regs_allocated} is @code{REGS_FIXED},
2069@vindex REGS_FIXED
2070the matcher simply fills that array.
2071
2072The following examples illustrate the information recorded in the
2073@code{re_registers} structure.  (In all of them, @samp{(} represents the
2074open-group and @samp{)} the close-group operator.  The first character
2075in the string @var{string} is at index 0.)
2076
2077@c xx i'm not sure this is all true anymore.
2078
2079@itemize @bullet
2080
2081@item 
2082If the regular expression has an @w{@var{i}-th}
2083group not contained within another group that matches a
2084substring of @var{string}, then the function sets
2085@code{@w{@var{regs}->}start[@var{i}]} to the index in @var{string} where
2086the substring matched by the @w{@var{i}-th} group begins, and
2087@code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
2088substring's end.  The function sets @code{@w{@var{regs}->}start[0]} and
2089@code{@w{@var{regs}->}end[0]} to analogous information about the entire
2090pattern.
2091
2092For example, when you match @samp{((a)(b))} against @samp{ab}, you get:
2093
2094@itemize @bullet
2095@item
20960 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]} 
2097
2098@item
20990 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]} 
2100
2101@item
21020 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]} 
2103
2104@item
21051 in @code{@w{@var{regs}->}start[3]} and 2 in @code{@w{@var{regs}->}end[3]} 
2106@end itemize
2107
2108@item
2109If a group matches more than once (as it might if followed by,
2110e.g., a repetition operator), then the function reports the information
2111about what the group @emph{last} matched.
2112
2113For example, when you match the pattern @samp{(a)*} against the string
2114@samp{aa}, you get:
2115
2116@itemize @bullet
2117@item
21180 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]} 
2119
2120@item
21211 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]} 
2122@end itemize
2123
2124@item
2125If the @w{@var{i}-th} group does not participate in a
2126successful match, e.g., it is an alternative not taken or a
2127repetition operator allows zero repetitions of it, then the function
2128sets @code{@w{@var{regs}->}start[@var{i}]} and
2129@code{@w{@var{regs}->}end[@var{i}]} to @math{-1}.
2130
2131For example, when you match the pattern @samp{(a)*b} against
2132the string @samp{b}, you get:
2133
2134@itemize @bullet
2135@item
21360 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]} 
2137
2138@item
2139@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]} 
2140@end itemize
2141
2142@item
2143If the @w{@var{i}-th} group matches a zero-length string, then the
2144function sets @code{@w{@var{regs}->}start[@var{i}]} and
2145@code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
2146zero-length string.  
2147
2148For example, when you match the pattern @samp{(a*)b} against the string
2149@samp{b}, you get:
2150
2151@itemize @bullet
2152@item
21530 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]} 
2154
2155@item
21560 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]} 
2157@end itemize
2158
2159@ignore
2160The function sets @code{@w{@var{regs}->}start[0]} and
2161@code{@w{@var{regs}->}end[0]} to analogous information about the entire
2162pattern.
2163
2164For example, when you match the pattern @samp{(a*)} against the empty
2165string, you get:
2166
2167@itemize @bullet
2168@item
21690 in @code{@w{@var{regs}->}start[0]} and 0 in @code{@w{@var{regs}->}end[0]} 
2170
2171@item
21720 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]} 
2173@end itemize
2174@end ignore
2175
2176@item
2177If an @w{@var{i}-th} group contains a @w{@var{j}-th} group 
2178in turn not contained within any other group within group @var{i} and
2179the function reports a match of the @w{@var{i}-th} group, then it
2180records in @code{@w{@var{regs}->}start[@var{j}]} and
2181@code{@w{@var{regs}->}end[@var{j}]} the last match (if it matched) of
2182the @w{@var{j}-th} group.
2183
2184For example, when you match the pattern @samp{((a*)b)*} against the
2185string @samp{abb}, @w{group 2} last matches the empty string, so you
2186get what it previously matched:
2187
2188@itemize @bullet
2189@item
21900 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]} 
2191
2192@item
21932 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]} 
2194
2195@item
21962 in @code{@w{@var{regs}->}start[2]} and 2 in @code{@w{@var{regs}->}end[2]} 
2197@end itemize
2198
2199When you match the pattern @samp{((a)*b)*} against the string
2200@samp{abb}, @w{group 2} doesn't participate in the last match, so you
2201get:
2202
2203@itemize @bullet
2204@item
22050 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]} 
2206
2207@item
22082 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]} 
2209
2210@item
22110 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]} 
2212@end itemize
2213
2214@item
2215If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
2216in turn not contained within any other group within group @var{i}
2217and the function sets 
2218@code{@w{@var{regs}->}start[@var{i}]} and 
2219@code{@w{@var{regs}->}end[@var{i}]} to @math{-1}, then it also sets
2220@code{@w{@var{regs}->}start[@var{j}]} and
2221@code{@w{@var{regs}->}end[@var{j}]} to @math{-1}.
2222
2223For example, when you match the pattern @samp{((a)*b)*c} against the
2224string @samp{c}, you get:
2225
2226@itemize @bullet
2227@item
22280 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]} 
2229
2230@item
2231@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]} 
2232
2233@item
2234@math{-1} in @code{@w{@var{regs}->}start[2]} and @math{-1} in @code{@w{@var{regs}->}end[2]} 
2235@end itemize
2236
2237@end itemize
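
The cases listed above can be explored with a sketch along these
lines, which lets the matcher allocate the register arrays and then
reads one group's offsets.  The helper name @code{group_span} is ours.
We select @code{RE_SYNTAX_POSIX_EXTENDED} so that @samp{(} and @samp{)}
are the group operators, and we assume the arrays the matcher allocates
can be released with @code{free}.  Defining @code{_GNU_SOURCE} is an
assumption that only matters with the @sc{gnu} C library.

@example
#define _GNU_SOURCE  /* Assumption: only needed with the GNU C library.  */
#include <stdlib.h>
#include <string.h>
#include <regex.h>

/* Hypothetical helper: search STRING for REGEX and store where group
   GROUP starts and ends in *START and *END.  Return 0 on success and
   -1 if REGEX does not compile, nothing matches, or GROUP is out of
   range.  */
int
group_span (const char *regex, const char *string, unsigned group,
            int *start, int *end)
@{
  struct re_pattern_buffer buf;
  struct re_registers regs;
  reg_syntax_t saved_syntax = re_syntax_options;
  int size = (int) strlen (string);
  int result = -1;

  memset (&buf, 0, sizeof buf);
  memset (&regs, 0, sizeof regs);
  re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
  if (re_compile_pattern (regex, strlen (regex), &buf) == 0
      && re_search (&buf, string, size, 0, size, &regs) >= 0
      && group < regs.num_regs)
    @{
      *start = regs.start[group];
      *end = regs.end[group];
      result = 0;
    @}
  re_syntax_options = saved_syntax;

  free (regs.start);  /* Still zero if the matcher allocated nothing.  */
  free (regs.end);
  regfree (&buf);
  return result;
@}
@end example

@noindent
For example, asking for group 3 of @samp{((a)(b))} against @samp{ab}
should store 1 and 2, and asking for group 1 of @samp{(a)*b} against
@samp{b} should store @math{-1} in both places.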
2238
2239@node Freeing GNU Pattern Buffers,  , Using Registers, GNU Regex Functions
2240@subsection Freeing GNU Pattern Buffers
2241
2242To free any allocated fields of a pattern buffer, you can use the
2243@sc{posix} function described in @ref{Freeing POSIX Pattern Buffers},
2244since the type @code{regex_t}---the type for @sc{posix} pattern
2245buffers---is equivalent to the type @code{re_pattern_buffer}.  After
2246freeing a pattern buffer, you need to again compile a regular expression
2247in it (@pxref{GNU Regular Expression Compiling}) before passing it to
2248a matching or searching function.
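
One point deserves care when freeing.  We assume here, based on the
implementations we know of, that the freeing function also releases
whatever the @code{fastmap} and @code{translate} fields point to.  If
those tables did not come from @code{malloc}, or if you share them
among several pattern buffers, a sketch like the following detaches
them first.  The helper name @code{release_keeping_tables} is ours.
Defining @code{_GNU_SOURCE} is an assumption that only matters with the
@sc{gnu} C library.

@example
#define _GNU_SOURCE  /* Assumption: only needed with the GNU C library.  */
#include <regex.h>

/* Hypothetical helper: free the compiled pattern in *BUF, but leave
   the caller's fastmap and translate table alone.  */
void
release_keeping_tables (struct re_pattern_buffer *buf)
@{
  buf->fastmap = 0;    /* The caller still owns the fastmap.  */
  buf->translate = 0;  /* The caller still owns the translate table.  */
  regfree (buf);
@}
@end example

@noindent
After this call the caller must set @code{fastmap} and @code{translate}
again, if they are wanted, before compiling another pattern in the
buffer.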
2249
2250
2251@node POSIX Regex Functions, BSD Regex Functions, GNU Regex Functions, Programming with Regex
2252@section POSIX Regex Functions
2253
2254If you're writing code that has to be @sc{posix} compatible, you'll need
2255to use these functions. Their interfaces are as specified by @sc{posix},
2256draft 1003.2/D11.2.
2257
2258@menu
2259* POSIX Pattern Buffers::		The regex_t type.
2260* POSIX Regular Expression Compiling::	regcomp ()
2261* POSIX Matching::			regexec ()
2262* Reporting Errors::			regerror ()
2263* Using Byte Offsets::			The regmatch_t type.
2264* Freeing POSIX Pattern Buffers::	regfree ()
2265@end menu
2266
2267
2268@node POSIX Pattern Buffers, POSIX Regular Expression Compiling,  , POSIX Regex Functions
2269@subsection POSIX Pattern Buffers
2270
2271To compile or match a given regular expression the @sc{posix} way, you
2272must supply a pattern buffer exactly the way you do for @sc{gnu}
2273(@pxref{GNU Pattern Buffers}).  @sc{posix} pattern buffers have type
2274@code{regex_t}, which is equivalent to the @sc{gnu} pattern buffer
2275type @code{re_pattern_buffer}.
2276
2277
2278@node POSIX Regular Expression Compiling, POSIX Matching, POSIX Pattern Buffers, POSIX Regex Functions
2279@subsection POSIX Regular Expression Compiling
2280
2281With @sc{posix}, you can only search for a given regular expression; you
2282can't match it.  To do this, you must first compile it in a
2283pattern buffer, using @code{regcomp}.
2284
2285@ignore
2286Before calling @code{regcomp}, you must initialize this pattern buffer
2287as you do for @sc{gnu} (@pxref{GNU Regular Expression Compiling}).  See
2288below, however, for how to choose a syntax with which to compile.
2289@end ignore
2290
2291To compile a pattern buffer, use:
2292
2293@findex regcomp
2294@example
2295int
2296regcomp (regex_t *@var{preg}, const char *@var{regex}, int @var{cflags})
2297@end example
2298
2299@noindent
2300@var{preg} is the initialized pattern buffer's address, @var{regex} is
2301the regular expression's address, and @var{cflags} is the compilation
2302flags, which Regex considers as a collection of bits.  Here are the
2303valid bits, as defined in @file{regex.h}:
2304
2305@table @code
2306
2307@item REG_EXTENDED
2308@vindex REG_EXTENDED
2309says to use @sc{posix} Extended Regular Expression syntax; if this isn't
2310set, then says to use @sc{posix} Basic Regular Expression syntax.
2311@code{regcomp} sets @var{preg}'s @code{syntax} field accordingly.
2312
2313@item REG_ICASE
2314@vindex REG_ICASE
2315@cindex ignoring case
2316says to ignore case; @code{regcomp} sets @var{preg}'s @code{translate}
2317field to a translate table which ignores case, replacing anything you've
2318put there before.
2319
2320@item REG_NOSUB
2321@vindex REG_NOSUB
2322says to set @var{preg}'s @code{no_sub} field; @pxref{POSIX Matching},
2323for what this means.
2324
2325@item REG_NEWLINE
2326@vindex REG_NEWLINE
2327says that a:
2328
2329@itemize @bullet
2330
2331@item
2332match-any-character operator (@pxref{Match-any-character
2333Operator}) doesn't match a newline.
2334
2335@item
2336nonmatching list not containing a newline (@pxref{List
2337Operators}) matches a newline.
2338
2339@item
2340match-beginning-of-line operator (@pxref{Match-beginning-of-line
2341Operator}) matches the empty string immediately after a newline,
2342regardless of how @code{REG_NOTBOL} is set (@pxref{POSIX Matching}, for
2343an explanation of @code{REG_NOTBOL}).
2344
2345@item
match-end-of-line operator (@pxref{Match-end-of-line
2347Operator}) matches the empty string immediately before a newline,
2348regardless of how @code{REG_NOTEOL} is set (@pxref{POSIX Matching},
2349for an explanation of @code{REG_NOTEOL}).
2350
2351@end itemize
2352
2353@end table
2354
2355If @code{regcomp} successfully compiles the regular expression, it
returns zero and sets @code{*@var{preg}} to the compiled
2357pattern. Except for @code{syntax} (which it sets as explained above), it
2358also sets the same fields the same way as does the @sc{gnu} compiling
2359function (@pxref{GNU Regular Expression Compiling}).
2360
2361If @code{regcomp} can't compile the regular expression, it returns one
2362of the error codes listed here.  (Except when noted differently, the
syntax in all examples below is basic regular expression syntax.)
2364
2365@table @code
2366
2367@comment repetitions
2368@item REG_BADRPT
2369For example, the consecutive repetition operators @samp{**} in
2370@samp{a**} are invalid.  As another example, if the syntax is extended
2371regular expression syntax, then the repetition operator @samp{*} with
2372nothing on which to operate in @samp{*} is invalid.
2373
2374@item REG_BADBR
2375For example, the @var{count} @samp{-1} in @samp{a\@{-1} is invalid.
2376
2377@item REG_EBRACE
2378For example, @samp{a\@{1} is missing a close-interval operator.
2379
2380@comment lists
2381@item REG_EBRACK
2382For example, @samp{[a} is missing a close-list operator.
2383
2384@item REG_ERANGE
2385For example, the range ending point @samp{z} that collates lower than
does its starting point @samp{a} in @samp{[z-a]} is invalid.  Also, the
range with the character class @samp{[:alpha:]} as its starting point in
@samp{[[:alpha:]-|]} is invalid.
2389
2390@item REG_ECTYPE
For example, the character class name @samp{foo} in @samp{[[:foo:]]} is
2392invalid.
2393
2394@comment groups
2395@item REG_EPAREN
2396For example, @samp{a\)} is missing an open-group operator and @samp{\(a}
2397is missing a close-group operator.
2398
2399@item REG_ESUBREG
2400For example, the back reference @samp{\2} that refers to a nonexistent
2401subexpression in @samp{\(a\)\2} is invalid.
2402
2403@comment unfinished business
2404
2405@item REG_EEND
2406Returned when a regular expression causes no other more specific error.
2407
2408@item REG_EESCAPE
2409For example, the trailing backslash @samp{\} in @samp{a\} is invalid, as is the
2410one in @samp{\}.
2411
2412@comment kitchen sink
2413@item REG_BADPAT
2414For example, in the extended regular expression syntax, the empty group
2415@samp{()} in @samp{a()b} is invalid.
2416
2417@comment internal
2418@item REG_ESIZE
2419Returned when a regular expression needs a pattern buffer larger than
242065536 bytes.
2421
2422@item REG_ESPACE
Returned when a regular expression causes Regex to run out of memory.
2424
2425@end table
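
As a minimal sketch of how these codes reach your program, the
hypothetical helper @code{compile_status} below (it is not part of
Regex) compiles a pattern and hands back whatever @code{regcomp}
returned.  It assumes the standard @sc{posix} prototype for
@code{regcomp}; which code a given invalid pattern produces can differ
between implementations.

@example
#include <regex.h>

/* Hypothetical helper: compile PATTERN with CFLAGS and return
   whatever regcomp returned.  On success the pattern buffer is
   freed again, because this sketch only cares about the code.  */
int
compile_status (const char *pattern, int cflags)
@{
  regex_t preg;
  int status = regcomp (&preg, pattern, cflags);

  if (status == 0)
    regfree (&preg);
  return status;
@}
@end example

With this helper, the pattern @samp{[a} from the table above would be
expected to yield @code{REG_EBRACK}, and a valid pattern yields zero.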
2426
2427
2428@node POSIX Matching, Reporting Errors, POSIX Regular Expression Compiling, POSIX Regex Functions
2429@subsection POSIX Matching 
2430
2431Matching the @sc{posix} way means trying to match a null-terminated
2432string starting at its first character.  Once you've compiled a pattern
2433into a pattern buffer (@pxref{POSIX Regular Expression Compiling}), you
2434can ask the matcher to match that pattern against a string using:
2435
2436@findex regexec
2437@example
2438int
2439regexec (const regex_t *@var{preg}, const char *@var{string}, 
2440         size_t @var{nmatch}, regmatch_t @var{pmatch}[], int @var{eflags})
2441@end example
2442
2443@noindent
2444@var{preg} is the address of a pattern buffer for a compiled pattern.
2445@var{string} is the string you want to match.  
2446
2447@xref{Using Byte Offsets}, for an explanation of @var{pmatch}.  If you
2448pass zero for @var{nmatch} or you compiled @var{preg} with the
2449compilation flag @code{REG_NOSUB} set, then @code{regexec} will ignore
2450@var{pmatch}; otherwise, you must allocate it to have at least
@var{nmatch} elements.  @code{regexec} will record @var{nmatch} byte
offsets in @var{pmatch}, and set to @math{-1} any unused elements up to
@code{@var{pmatch}[@var{nmatch} - 1]}.
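
The following sketch shows one way to size and pass @var{pmatch}.  The
helper @code{match_groups} and the constant @code{NMATCH} are
hypothetical names chosen for illustration, and the pattern is compiled
with @code{REG_EXTENDED} purely so that the example can use an
alternation.

@example
#include <regex.h>

#define NMATCH 4

/* Hypothetical helper: compile the extended regular expression
   PATTERN, match it against STRING, and fill PMATCH, which must
   have NMATCH elements.  Returns zero on a match and a nonzero
   code otherwise.  */
int
match_groups (const char *pattern, const char *string,
              regmatch_t pmatch[NMATCH])
@{
  regex_t preg;
  int status = regcomp (&preg, pattern, REG_EXTENDED);

  if (status != 0)
    return status;
  status = regexec (&preg, string, NMATCH, pmatch, 0);
  regfree (&preg);
  return status;
@}
@end example

Matching @samp{(a)|(b)} against @samp{b} this way should leave both
offsets of @code{pmatch[1]} and of @code{pmatch[3]} at @math{-1}: the
first group took no part in the match, and there is no third group.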
2454
2455@var{eflags} specifies @dfn{execution flags}---namely, the two bits
2456@code{REG_NOTBOL} and @code{REG_NOTEOL} (defined in @file{regex.h}).  If
2457you set @code{REG_NOTBOL}, then the match-beginning-of-line operator
2458(@pxref{Match-beginning-of-line Operator}) always fails to match.
2459This lets you match against pieces of a line, as you would need to if,
2460say, searching for repeated instances of a given pattern in a line; it
2461would work correctly for patterns both with and without
2462match-beginning-of-line operators.  @code{REG_NOTEOL} works analogously
2463for the match-end-of-line operator (@pxref{Match-end-of-line
2464Operator}); it exists for symmetry.
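
Here is a sketch of the repeated-instances case just described.  The
helper @code{count_matches} is a hypothetical name, and the way it
steps past empty matches is one possible choice rather than anything
Regex prescribes.

@example
#include <regex.h>

/* Hypothetical helper: count the nonoverlapping matches of the
   basic regular expression PATTERN in LINE.  Returns -1 if
   PATTERN does not compile.  */
int
count_matches (const char *pattern, const char *line)
@{
  regex_t preg;
  regmatch_t pmatch[1];
  const char *p = line;
  int eflags = 0;
  int count = 0;

  if (regcomp (&preg, pattern, 0) != 0)
    return -1;

  while (regexec (&preg, p, 1, pmatch, eflags) == 0)
    @{
      count++;
      /* Step past the match.  After an empty match at P, step one
         character instead so that the loop always advances.  */
      if (pmatch[0].rm_eo > 0)
        p += pmatch[0].rm_eo;
      else if (*p != '\0')
        p++;
      else
        break;
      /* P no longer points at the beginning of the line.  */
      eflags = REG_NOTBOL;
    @}

  regfree (&preg);
  return count;
@}
@end example

With this sketch, @code{count_matches ("^ab", "abab")} yields 1: the
second @samp{ab} is found only by a call that has @code{REG_NOTBOL}
set, so @samp{^} cannot match there.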
2465
2466@code{regexec} tries to find a match for @var{preg} in @var{string}
2467according to the syntax in @var{preg}'s @code{syntax} field.
2468(@xref{POSIX Regular Expression Compiling}, for how to set it.)  The
2469function returns zero if the compiled pattern matches @var{string} and
2470@code{REG_NOMATCH} (defined in @file{regex.h}) if it doesn't.
2471
2472@node Reporting Errors, Using Byte Offsets, POSIX Matching, POSIX Regex Functions
2473@subsection Reporting Errors
2474
If either @code{regcomp} or @code{regexec} fails, it returns a nonzero
error code, the possibilities for which are defined in @file{regex.h}.
2477@xref{POSIX Regular Expression Compiling}, and @ref{POSIX Matching}, for
2478what these codes mean.  To get an error string corresponding to these
2479codes, you can use:
2480
2481@findex regerror
2482@example
2483size_t
2484regerror (int @var{errcode},
2485          const regex_t *@var{preg},
2486          char *@var{errbuf},
2487          size_t @var{errbuf_size})
2488@end example
2489
2490@noindent
2491@var{errcode} is an error code, @var{preg} is the address of the pattern
2492buffer which provoked the error, @var{errbuf} is the error buffer, and
2493@var{errbuf_size} is @var{errbuf}'s size.
2494
2495@code{regerror} returns the size in bytes of the error string
2496corresponding to @var{errcode} (including its terminating null).  If
2497@var{errbuf} and @var{errbuf_size} are nonzero, it also returns in
2498@var{errbuf} the first @math{@var{errbuf_size} - 1} characters of the
2499error string, followed by a null.  
2500@var{errbuf_size} must be a nonnegative number less than or equal to the
2501size in bytes of @var{errbuf}.
2502
2503You can call @code{regerror} with a null @var{errbuf} and a zero
2504@var{errbuf_size} to determine how large @var{errbuf} need be to
2505accommodate @code{regerror}'s error string.
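
The two-call approach can be sketched as follows.  The helper
@code{error_message} is a hypothetical name; it assumes the caller
frees the string it returns.

@example
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a freshly allocated copy of the
   error string for ERRCODE, or NULL if memory runs out.  The
   caller frees the result.  */
char *
error_message (int errcode, const regex_t *preg)
@{
  /* First call: a null buffer of size zero only reports the size
     needed, including the terminating null.  */
  size_t size = regerror (errcode, preg, NULL, 0);
  char *errbuf = malloc (size);

  /* Second call: fill the buffer now that it is large enough.  */
  if (errbuf != NULL)
    regerror (errcode, preg, errbuf, size);
  return errbuf;
@}
@end example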
2506
2507@node Using Byte Offsets, Freeing POSIX Pattern Buffers, Reporting Errors, POSIX Regex Functions
2508@subsection Using Byte Offsets
2509
In @sc{posix}, variables of type @code{regmatch_t} hold information
analogous to, but not identical to, that in @sc{gnu}'s registers
(@pxref{Using Registers}).  To get information about registers in
@sc{posix}, pass to @code{regexec} a nonzero @var{pmatch}, i.e.,
the address of an array of structures of this type, defined in
@file{regex.h}:
2516
2517@tindex regmatch_t
2518@example
2519typedef struct
2520@{
2521  regoff_t rm_so;
2522  regoff_t rm_eo;
2523@} regmatch_t;
2524@end example
2525
2526When reading in @ref{Using Registers}, about how the matching function
2527stores the information into the registers, substitute @var{pmatch} for
@var{regs}, @code{@w{@var{pmatch}[@var{i}].}rm_so} for
@code{@w{@var{regs}->}start[@var{i}]} and
@code{@w{@var{pmatch}[@var{i}].}rm_eo} for
@code{@w{@var{regs}->}end[@var{i}]}.
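
As an illustration of reading these fields, the sketch below copies the
text matched by one group out of the original string.  The helper
@code{copy_group} is a hypothetical name, and truncating to fit the
buffer is an assumption made to keep the example short.

@example
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copy the text matched by group I into BUF
   (of BUFSIZE bytes), truncating if necessary.  Returns the
   length of the matched text, or -1 if group I took no part in
   the match.  */
int
copy_group (const char *string, const regmatch_t pmatch[], size_t i,
            char *buf, size_t bufsize)
@{
  regoff_t start = pmatch[i].rm_so;
  regoff_t end = pmatch[i].rm_eo;
  size_t len;

  if (start == -1)
    return -1;

  len = (size_t) (end - start);
  if (bufsize > 0)
    @{
      size_t n = len < bufsize - 1 ? len : bufsize - 1;

      memcpy (buf, string + start, n);
      buf[n] = '\0';
    @}
  return (int) len;
@}
@end example

The matched text for group @var{i} starts at byte offset @code{rm_so}
and stops just before byte offset @code{rm_eo}, so its length is the
difference between the two.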
2532
2533@node Freeing POSIX Pattern Buffers,  , Using Byte Offsets, POSIX Regex Functions
2534@subsection Freeing POSIX Pattern Buffers
2535
2536To free any allocated fields of a pattern buffer, use:
2537
2538@findex regfree
2539@example
2540void 
2541regfree (regex_t *@var{preg})
2542@end example
2543
2544@noindent
2545@var{preg} is the pattern buffer whose allocated fields you want freed.
2546@code{regfree} also sets @var{preg}'s @code{allocated} and @code{used}
2547fields to zero.  After freeing a pattern buffer, you need to again
2548compile a regular expression in it (@pxref{POSIX Regular Expression
2549Compiling}) before passing it to the matching function (@pxref{POSIX
2550Matching}).
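
A sketch of the compile, match, free cycle, reusing one pattern buffer
for two patterns in turn, might look like this.  The helper
@code{matches_both} is a hypothetical name; it compiles with
@code{REG_NOSUB} because it needs no byte offsets.

@example
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if both FIRST and SECOND (basic
   regular expressions) match STRING, and 0 otherwise.  */
int
matches_both (const char *first, const char *second,
              const char *string)
@{
  regex_t preg;
  int ok;

  if (regcomp (&preg, first, REG_NOSUB) != 0)
    return 0;
  ok = regexec (&preg, string, 0, NULL, 0) == 0;
  regfree (&preg);

  /* PREG no longer holds a compiled pattern, so compile again
     before matching.  */
  if (regcomp (&preg, second, REG_NOSUB) != 0)
    return 0;
  ok = ok && regexec (&preg, string, 0, NULL, 0) == 0;
  regfree (&preg);
  return ok;
@}
@end example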
2551
2552
2553@node BSD Regex Functions,  , POSIX Regex Functions, Programming with Regex
2554@section BSD Regex Functions
2555
If you're writing code that has to be Berkeley @sc{unix} compatible,
you'll need to use these functions, whose interfaces are the same as those
in Berkeley @sc{unix}.
2559
2560@menu
2561* BSD Regular Expression Compiling::	re_comp ()
2562* BSD Searching::			re_exec ()
2563@end menu
2564
2565@node BSD Regular Expression Compiling, BSD Searching,  , BSD Regex Functions
2566@subsection  BSD Regular Expression Compiling
2567
2568With Berkeley @sc{unix}, you can only search for a given regular
2569expression; you can't match one.  To search for it, you must first
compile it.  Before you compile it, you must choose the regular
expression syntax to compile it under by setting the
variable @code{re_syntax_options} (declared in @file{regex.h}) to some
syntax (@pxref{Regular Expression Syntax}).
2574
2575To compile a regular expression use:
2576
2577@findex re_comp
2578@example
2579char *
2580re_comp (char *@var{regex})
2581@end example
2582
2583@noindent
2584@var{regex} is the address of a null-terminated regular expression.
2585@code{re_comp} uses an internal pattern buffer, so you can use only the
2586most recently compiled pattern buffer.  This means that if you want to
2587use a given regular expression that you've already compiled---but it
2588isn't the latest one you've compiled---you'll have to recompile it.  If
you call @code{re_comp} with a null pointer (@emph{not} the empty
string) as the argument, it doesn't change the contents of the pattern
buffer.
2592
2593If @code{re_comp} successfully compiles the regular expression, it
2594returns zero.  If it can't compile the regular expression, it returns
2595an error string.  @code{re_comp}'s error messages are identical to those
2596of @code{re_compile_pattern} (@pxref{GNU Regular Expression
2597Compiling}).
2598
2599@node BSD Searching,  , BSD Regular Expression Compiling, BSD Regex Functions
2600@subsection BSD Searching 
2601
2602Searching the Berkeley @sc{unix} way means searching in a string
2603starting at its first character and trying successive positions within
2604it to find a match.  Once you've compiled a pattern using @code{re_comp}
2605(@pxref{BSD Regular Expression Compiling}), you can ask Regex
2606to search for that pattern in a string using:
2607
2608@findex re_exec
2609@example
2610int
2611re_exec (char *@var{string})
2612@end example
2613
2614@noindent
2615@var{string} is the address of the null-terminated string in which you
2616want to search.
2617
2618@code{re_exec} returns either 1 for success or 0 for failure.  It
2619automatically uses a @sc{gnu} fastmap (@pxref{Searching with Fastmaps}).
2620
2621
2622@node Copying, Index, Programming with Regex, Top
2623@appendix GNU GENERAL PUBLIC LICENSE
2624@center Version 2, June 1991
2625
2626@display
2627Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
2628675 Mass Ave, Cambridge, MA 02139, USA
2629
2630Everyone is permitted to copy and distribute verbatim copies
2631of this license document, but changing it is not allowed.
2632@end display
2633
2634@unnumberedsec Preamble
2635
2636  The licenses for most software are designed to take away your
2637freedom to share and change it.  By contrast, the GNU General Public
2638License is intended to guarantee your freedom to share and change free
2639software---to make sure the software is free for all its users.  This
2640General Public License applies to most of the Free Software
2641Foundation's software and to any other program whose authors commit to
2642using it.  (Some other Free Software Foundation software is covered by
2643the GNU Library General Public License instead.)  You can apply it to
2644your programs, too.
2645
2646  When we speak of free software, we are referring to freedom, not
2647price.  Our General Public Licenses are designed to make sure that you
2648have the freedom to distribute copies of free software (and charge for
2649this service if you wish), that you receive source code or can get it
2650if you want it, that you can change the software or use pieces of it
2651in new free programs; and that you know you can do these things.
2652
2653  To protect your rights, we need to make restrictions that forbid
2654anyone to deny you these rights or to ask you to surrender the rights.
2655These restrictions translate to certain responsibilities for you if you
2656distribute copies of the software, or if you modify it.
2657
2658  For example, if you distribute copies of such a program, whether
2659gratis or for a fee, you must give the recipients all the rights that
2660you have.  You must make sure that they, too, receive or can get the
2661source code.  And you must show them these terms so they know their
2662rights.
2663
2664  We protect your rights with two steps: (1) copyright the software, and
2665(2) offer you this license which gives you legal permission to copy,
2666distribute and/or modify the software.
2667
2668  Also, for each author's protection and ours, we want to make certain
2669that everyone understands that there is no warranty for this free
2670software.  If the software is modified by someone else and passed on, we
2671want its recipients to know that what they have is not the original, so
2672that any problems introduced by others will not reflect on the original
2673authors' reputations.
2674
2675  Finally, any free program is threatened constantly by software
2676patents.  We wish to avoid the danger that redistributors of a free
2677program will individually obtain patent licenses, in effect making the
2678program proprietary.  To prevent this, we have made it clear that any
2679patent must be licensed for everyone's free use or not licensed at all.
2680
2681  The precise terms and conditions for copying, distribution and
2682modification follow.
2683
2684@iftex
2685@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
2686@end iftex
2687@ifinfo
2688@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
2689@end ifinfo
2690
2691@enumerate
2692@item
2693This License applies to any program or other work which contains
2694a notice placed by the copyright holder saying it may be distributed
2695under the terms of this General Public License.  The ``Program'', below,
2696refers to any such program or work, and a ``work based on the Program''
2697means either the Program or any derivative work under copyright law:
2698that is to say, a work containing the Program or a portion of it,
2699either verbatim or with modifications and/or translated into another
2700language.  (Hereinafter, translation is included without limitation in
2701the term ``modification''.)  Each licensee is addressed as ``you''.
2702
2703Activities other than copying, distribution and modification are not
2704covered by this License; they are outside its scope.  The act of
2705running the Program is not restricted, and the output from the Program
2706is covered only if its contents constitute a work based on the
2707Program (independent of having been made by running the Program).
2708Whether that is true depends on what the Program does.
2709
2710@item
2711You may copy and distribute verbatim copies of the Program's
2712source code as you receive it, in any medium, provided that you
2713conspicuously and appropriately publish on each copy an appropriate
2714copyright notice and disclaimer of warranty; keep intact all the
2715notices that refer to this License and to the absence of any warranty;
2716and give any other recipients of the Program a copy of this License
2717along with the Program.
2718
2719You may charge a fee for the physical act of transferring a copy, and
2720you may at your option offer warranty protection in exchange for a fee.
2721
2722@item
2723You may modify your copy or copies of the Program or any portion
2724of it, thus forming a work based on the Program, and copy and
2725distribute such modifications or work under the terms of Section 1
2726above, provided that you also meet all of these conditions:
2727
2728@enumerate a
2729@item
2730You must cause the modified files to carry prominent notices
2731stating that you changed the files and the date of any change.
2732
2733@item
2734You must cause any work that you distribute or publish, that in
2735whole or in part contains or is derived from the Program or any
2736part thereof, to be licensed as a whole at no charge to all third
2737parties under the terms of this License.
2738
2739@item
2740If the modified program normally reads commands interactively
2741when run, you must cause it, when started running for such
2742interactive use in the most ordinary way, to print or display an
2743announcement including an appropriate copyright notice and a
2744notice that there is no warranty (or else, saying that you provide
2745a warranty) and that users may redistribute the program under
2746these conditions, and telling the user how to view a copy of this
2747License.  (Exception: if the Program itself is interactive but
2748does not normally print such an announcement, your work based on
2749the Program is not required to print an announcement.)
2750@end enumerate
2751
2752These requirements apply to the modified work as a whole.  If
2753identifiable sections of that work are not derived from the Program,
2754and can be reasonably considered independent and separate works in
2755themselves, then this License, and its terms, do not apply to those
2756sections when you distribute them as separate works.  But when you
2757distribute the same sections as part of a whole which is a work based
2758on the Program, the distribution of the whole must be on the terms of
2759this License, whose permissions for other licensees extend to the
2760entire whole, and thus to each and every part regardless of who wrote it.
2761
2762Thus, it is not the intent of this section to claim rights or contest
2763your rights to work written entirely by you; rather, the intent is to
2764exercise the right to control the distribution of derivative or
2765collective works based on the Program.
2766
2767In addition, mere aggregation of another work not based on the Program
2768with the Program (or with a work based on the Program) on a volume of
2769a storage or distribution medium does not bring the other work under
2770the scope of this License.
2771
2772@item
2773You may copy and distribute the Program (or a work based on it,
2774under Section 2) in object code or executable form under the terms of
2775Sections 1 and 2 above provided that you also do one of the following:
2776
2777@enumerate a
2778@item
2779Accompany it with the complete corresponding machine-readable
2780source code, which must be distributed under the terms of Sections
27811 and 2 above on a medium customarily used for software interchange; or,
2782
2783@item
2784Accompany it with a written offer, valid for at least three
2785years, to give any third party, for a charge no more than your
2786cost of physically performing source distribution, a complete
2787machine-readable copy of the corresponding source code, to be
2788distributed under the terms of Sections 1 and 2 above on a medium
2789customarily used for software interchange; or,
2790
2791@item
2792Accompany it with the information you received as to the offer
2793to distribute corresponding source code.  (This alternative is
2794allowed only for noncommercial distribution and only if you
2795received the program in object code or executable form with such
2796an offer, in accord with Subsection b above.)
2797@end enumerate
2798
2799The source code for a work means the preferred form of the work for
2800making modifications to it.  For an executable work, complete source
2801code means all the source code for all modules it contains, plus any
2802associated interface definition files, plus the scripts used to
2803control compilation and installation of the executable.  However, as a
2804special exception, the source code distributed need not include
2805anything that is normally distributed (in either source or binary
2806form) with the major components (compiler, kernel, and so on) of the
2807operating system on which the executable runs, unless that component
2808itself accompanies the executable.
2809
2810If distribution of executable or object code is made by offering
2811access to copy from a designated place, then offering equivalent
2812access to copy the source code from the same place counts as
2813distribution of the source code, even though third parties are not
2814compelled to copy the source along with the object code.
2815
2816@item
2817You may not copy, modify, sublicense, or distribute the Program
2818except as expressly provided under this License.  Any attempt
2819otherwise to copy, modify, sublicense or distribute the Program is
2820void, and will automatically terminate your rights under this License.
2821However, parties who have received copies, or rights, from you under
2822this License will not have their licenses terminated so long as such
2823parties remain in full compliance.
2824
2825@item
2826You are not required to accept this License, since you have not
2827signed it.  However, nothing else grants you permission to modify or
2828distribute the Program or its derivative works.  These actions are
2829prohibited by law if you do not accept this License.  Therefore, by
2830modifying or distributing the Program (or any work based on the
2831Program), you indicate your acceptance of this License to do so, and
2832all its terms and conditions for copying, distributing or modifying
2833the Program or works based on it.
2834
2835@item
2836Each time you redistribute the Program (or any work based on the
2837Program), the recipient automatically receives a license from the
2838original licensor to copy, distribute or modify the Program subject to
2839these terms and conditions.  You may not impose any further
2840restrictions on the recipients' exercise of the rights granted herein.
2841You are not responsible for enforcing compliance by third parties to
2842this License.
2843
2844@item
2845If, as a consequence of a court judgment or allegation of patent
2846infringement or for any other reason (not limited to patent issues),
2847conditions are imposed on you (whether by court order, agreement or
2848otherwise) that contradict the conditions of this License, they do not
2849excuse you from the conditions of this License.  If you cannot
2850distribute so as to satisfy simultaneously your obligations under this
2851License and any other pertinent obligations, then as a consequence you
2852may not distribute the Program at all.  For example, if a patent
2853license would not permit royalty-free redistribution of the Program by
2854all those who receive copies directly or indirectly through you, then
2855the only way you could satisfy both it and this License would be to
2856refrain entirely from distribution of the Program.
2857
2858If any portion of this section is held invalid or unenforceable under
2859any particular circumstance, the balance of the section is intended to
2860apply and the section as a whole is intended to apply in other
2861circumstances.
2862
2863It is not the purpose of this section to induce you to infringe any
2864patents or other property right claims or to contest validity of any
2865such claims; this section has the sole purpose of protecting the
2866integrity of the free software distribution system, which is
2867implemented by public license practices.  Many people have made
2868generous contributions to the wide range of software distributed
2869through that system in reliance on consistent application of that
2870system; it is up to the author/donor to decide if he or she is willing
2871to distribute software through any other system and a licensee cannot
2872impose that choice.
2873
2874This section is intended to make thoroughly clear what is believed to
2875be a consequence of the rest of this License.
2876
2877@item
2878If the distribution and/or use of the Program is restricted in
2879certain countries either by patents or by copyrighted interfaces, the
2880original copyright holder who places the Program under this License
2881may add an explicit geographical distribution limitation excluding
2882those countries, so that distribution is permitted only in or among
2883countries not thus excluded.  In such case, this License incorporates
2884the limitation as if written in the body of this License.
2885
2886@item
2887The Free Software Foundation may publish revised and/or new versions
2888of the General Public License from time to time.  Such new versions will
2889be similar in spirit to the present version, but may differ in detail to
2890address new problems or concerns.
2891
2892Each version is given a distinguishing version number.  If the Program
2893specifies a version number of this License which applies to it and ``any
2894later version'', you have the option of following the terms and conditions
2895either of that version or of any later version published by the Free
2896Software Foundation.  If the Program does not specify a version number of
2897this License, you may choose any version ever published by the Free Software
2898Foundation.
2899
2900@item
2901If you wish to incorporate parts of the Program into other free
2902programs whose distribution conditions are different, write to the author
2903to ask for permission.  For software which is copyrighted by the Free
2904Software Foundation, write to the Free Software Foundation; we sometimes
2905make exceptions for this.  Our decision will be guided by the two goals
2906of preserving the free status of all derivatives of our free software and
2907of promoting the sharing and reuse of software generally.
2908
2909@iftex
2910@heading NO WARRANTY
2911@end iftex
2912@ifinfo
2913@center NO WARRANTY
2914@end ifinfo
2915
2916@item
2917BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
2918FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
2919OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
2920PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
2921OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
2922MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
2923TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
2924PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
2925REPAIR OR CORRECTION.
2926
2927@item
2928IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
2929WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
2930REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
2931INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
2932OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
2933TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
2934YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
2935PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
2936POSSIBILITY OF SUCH DAMAGES.
2937@end enumerate
2938
2939@iftex
2940@heading END OF TERMS AND CONDITIONS
2941@end iftex
2942@ifinfo
2943@center END OF TERMS AND CONDITIONS
2944@end ifinfo
2945
2946@page
2947@unnumberedsec Appendix: How to Apply These Terms to Your New Programs
2948
2949  If you develop a new program, and you want it to be of the greatest
2950possible use to the public, the best way to achieve this is to make it
2951free software which everyone can redistribute and change under these terms.
2952
2953  To do so, attach the following notices to the program.  It is safest
2954to attach them to the start of each source file to most effectively
2955convey the exclusion of warranty; and each file should have at least
2956the ``copyright'' line and a pointer to where the full notice is found.
2957
2958@smallexample
2959@var{one line to give the program's name and a brief idea of what it does.}
2960Copyright (C) 19@var{yy}  @var{name of author}
2961
2962This program is free software; you can redistribute it and/or modify
2963it under the terms of the GNU General Public License as published by
2964the Free Software Foundation; either version 2 of the License, or
2965(at your option) any later version.
2966
2967This program is distributed in the hope that it will be useful,
2968but WITHOUT ANY WARRANTY; without even the implied warranty of
2969MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
2970GNU General Public License for more details.
2971
2972You should have received a copy of the GNU General Public License
2973along with this program; if not, write to the Free Software
2974Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
2975@end smallexample
2976
2977Also add information on how to contact you by electronic and paper mail.
2978
2979If the program is interactive, make it output a short notice like this
2980when it starts in an interactive mode:
2981
2982@smallexample
2983Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
2984Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
2985This is free software, and you are welcome to redistribute it
2986under certain conditions; type `show c' for details.
2987@end smallexample
2988
2989The hypothetical commands @samp{show w} and @samp{show c} should show
2990the appropriate parts of the General Public License.  Of course, the
2991commands you use may be called something other than @samp{show w} and
2992@samp{show c}; they could even be mouse-clicks or menu items---whatever
2993suits your program.
2994
2995You should also get your employer (if you work as a programmer) or your
2996school, if any, to sign a ``copyright disclaimer'' for the program, if
2997necessary.  Here is a sample; alter the names:
2998
2999@example
3000Yoyodyne, Inc., hereby disclaims all copyright interest in the program
3001`Gnomovision' (which makes passes at compilers) written by James Hacker.
3002
3003@var{signature of Ty Coon}, 1 April 1989
3004Ty Coon, President of Vice
3005@end example
3006
3007This General Public License does not permit incorporating your program into
3008proprietary programs.  If your program is a subroutine library, you may
3009consider it more useful to permit linking proprietary applications with the
3010library.  If this is what you want to do, use the GNU Library General
3011Public License instead of this License.
3012
3013
3014@node Index,  , Copying, Top
3015@unnumbered Index
3016
3017@printindex cp
3018
3019@contents
3020
3021@bye
3022