1218Sconklin\input texinfo
2218Sconklin@c %**start of header
3218Sconklin@setfilename regex.info
4218Sconklin@settitle Regex
5218Sconklin@c %**end of header
6218Sconklin
7218Sconklin@c \\{fill-paragraph} works better (for me, anyway) if the text in the
8218Sconklin@c source file isn't indented.
9218Sconklin@paragraphindent 2
10218Sconklin
11218Sconklin@c Define a new index for our magic constants.
12218Sconklin@defcodeindex cn
13218Sconklin
14218Sconklin@c Put everything in one index (arbitrarily chosen to be the concept index).
15218Sconklin@syncodeindex cn cp
16218Sconklin@syncodeindex ky cp
17218Sconklin@syncodeindex pg cp
18218Sconklin@syncodeindex tp cp
19218Sconklin@syncodeindex vr cp
20218Sconklin
21218Sconklin@c Here is what we use in the Info `dir' file:
22218Sconklin@c * Regex: (regex).	Regular expression library.
23218Sconklin
24218Sconklin
25218Sconklin@ifinfo
26218SconklinThis file documents the GNU regular expression library.
27218Sconklin
28218SconklinCopyright (C) 1992, 1993 Free Software Foundation, Inc.
29218Sconklin
30218SconklinPermission is granted to make and distribute verbatim copies of this
31218Sconklinmanual provided the copyright notice and this permission notice are
32218Sconklinpreserved on all copies.
33218Sconklin
34218Sconklin@ignore
35218SconklinPermission is granted to process this file through TeX and print the
36218Sconklinresults, provided the printed document carries a copying permission
37218Sconklinnotice identical to this one except for the removal of this paragraph
38218Sconklin(this paragraph not being relevant to the printed manual).
39218Sconklin@end ignore
40218Sconklin
41218SconklinPermission is granted to copy and distribute modified versions of this
42218Sconklinmanual under the conditions for verbatim copying, provided also that the
43218Sconklinsection entitled ``GNU General Public License'' is included exactly as
44218Sconklinin the original, and provided that the entire resulting derived work is
45218Sconklindistributed under the terms of a permission notice identical to this one.
46218Sconklin
47218SconklinPermission is granted to copy and distribute translations of this manual
48218Sconklininto another language, under the above conditions for modified versions,
49218Sconklinexcept that the section entitled ``GNU General Public License'' may be
50218Sconklinincluded in a translation approved by the Free Software Foundation
51218Sconklininstead of in the original English.
52218Sconklin@end ifinfo
53218Sconklin
54218Sconklin
55218Sconklin@titlepage
56218Sconklin
57218Sconklin@title Regex
58218Sconklin@subtitle edition 0.12a
59218Sconklin@subtitle 19 September 1992
60218Sconklin@author Kathryn A. Hargreaves
61218Sconklin@author Karl Berry
62218Sconklin
63218Sconklin@page
64218Sconklin
65218Sconklin@vskip 0pt plus 1filll
66218SconklinCopyright @copyright{} 1992 Free Software Foundation.
67218Sconklin
68218SconklinPermission is granted to make and distribute verbatim copies of this
69218Sconklinmanual provided the copyright notice and this permission notice are
70218Sconklinpreserved on all copies.
71218Sconklin
72218SconklinPermission is granted to copy and distribute modified versions of this
73218Sconklinmanual under the conditions for verbatim copying, provided also that the
74218Sconklinsection entitled ``GNU General Public License'' is included exactly as
75218Sconklinin the original, and provided that the entire resulting derived work is
76218Sconklindistributed under the terms of a permission notice identical to this
77218Sconklinone.
78218Sconklin
79218SconklinPermission is granted to copy and distribute translations of this manual
80218Sconklininto another language, under the above conditions for modified versions,
81218Sconklinexcept that the section entitled ``GNU General Public License'' may be
82218Sconklinincluded in a translation approved by the Free Software Foundation
83218Sconklininstead of in the original English.
84218Sconklin
85218Sconklin@end titlepage
86218Sconklin
87218Sconklin
88218Sconklin@ifinfo
89218Sconklin@node Top, Overview, (dir), (dir)
90218Sconklin@top Regular Expression Library
91218Sconklin
92218SconklinThis manual documents how to program with the GNU regular expression
93218Sconklinlibrary.  This is edition 0.12a of the manual, 19 September 1992.
94218Sconklin
95218SconklinThe first part of this master menu lists the major nodes in this Info
96218Sconklindocument, including the index.  The rest of the menu lists all the
97218Sconklinlower level nodes in the document.
98218Sconklin
99218Sconklin@menu
100218Sconklin* Overview::
101218Sconklin* Regular Expression Syntax::
102218Sconklin* Common Operators::
103218Sconklin* GNU Operators::
104218Sconklin* GNU Emacs Operators::
105218Sconklin* What Gets Matched?::
106218Sconklin* Programming with Regex::
107218Sconklin* Copying::			Copying and sharing Regex.
108218Sconklin* Index::			General index.
109218Sconklin --- The Detailed Node Listing ---
110218Sconklin
111218SconklinRegular Expression Syntax
112218Sconklin
113218Sconklin* Syntax Bits::
114218Sconklin* Predefined Syntaxes::
115218Sconklin* Collating Elements vs. Characters::
116218Sconklin* The Backslash Character::
117218Sconklin
118218SconklinCommon Operators
119218Sconklin
120218Sconklin* Match-self Operator::			Ordinary characters.
121218Sconklin* Match-any-character Operator::	.
122218Sconklin* Concatenation Operator::		Juxtaposition.
123218Sconklin* Repetition Operators::		*  +  ? @{@}
124218Sconklin* Alternation Operator::		|
125218Sconklin* List Operators::			[...]  [^...]
126218Sconklin* Grouping Operators::			(...)
127218Sconklin* Back-reference Operator::		\digit
128218Sconklin* Anchoring Operators::			^  $
129218Sconklin
130218SconklinRepetition Operators    
131218Sconklin
132218Sconklin* Match-zero-or-more Operator::  *
133218Sconklin* Match-one-or-more Operator::   +
134218Sconklin* Match-zero-or-one Operator::   ?
135218Sconklin* Interval Operators::           @{@}
136218Sconklin
137218SconklinList Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})
138218Sconklin
139218Sconklin* Character Class Operators::   [:class:]
140218Sconklin* Range Operator::          start-end
141218Sconklin
142218SconklinAnchoring Operators    
143218Sconklin
144218Sconklin* Match-beginning-of-line Operator::  ^
145218Sconklin* Match-end-of-line Operator::        $
146218Sconklin
147218SconklinGNU Operators
148218Sconklin
149218Sconklin* Word Operators::
150218Sconklin* Buffer Operators::
151218Sconklin
152218SconklinWord Operators
153218Sconklin
154218Sconklin* Non-Emacs Syntax Tables::
155218Sconklin* Match-word-boundary Operator::	\b
156218Sconklin* Match-within-word Operator::		\B
157218Sconklin* Match-beginning-of-word Operator::	\<
158218Sconklin* Match-end-of-word Operator::		\>
159218Sconklin* Match-word-constituent Operator::	\w
160218Sconklin* Match-non-word-constituent Operator::	\W
161218Sconklin
162218SconklinBuffer Operators    
163218Sconklin
164218Sconklin* Match-beginning-of-buffer Operator::	\`
165218Sconklin* Match-end-of-buffer Operator::	\'
166218Sconklin
167218SconklinGNU Emacs Operators
168218Sconklin
169218Sconklin* Syntactic Class Operators::
170218Sconklin
171218SconklinSyntactic Class Operators
172218Sconklin
173218Sconklin* Emacs Syntax Tables::
174218Sconklin* Match-syntactic-class Operator::	\sCLASS
175218Sconklin* Match-not-syntactic-class Operator::  \SCLASS
176218Sconklin
177218SconklinProgramming with Regex
178218Sconklin
179218Sconklin* GNU Regex Functions::
180218Sconklin* POSIX Regex Functions::
181218Sconklin* BSD Regex Functions::
182218Sconklin
183218SconklinGNU Regex Functions
184218Sconklin
185218Sconklin* GNU Pattern Buffers::         The re_pattern_buffer type.
186218Sconklin* GNU Regular Expression Compiling::  re_compile_pattern ()
187218Sconklin* GNU Matching::                re_match ()
188218Sconklin* GNU Searching::               re_search ()
189218Sconklin* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
190218Sconklin* Searching with Fastmaps::     re_compile_fastmap ()
191218Sconklin* GNU Translate Tables::        The `translate' field.
192218Sconklin* Using Registers::             The re_registers type and related fns.
193218Sconklin* Freeing GNU Pattern Buffers::  regfree ()
194218Sconklin
195218SconklinPOSIX Regex Functions
196218Sconklin
197218Sconklin* POSIX Pattern Buffers::		The regex_t type.
198218Sconklin* POSIX Regular Expression Compiling::	regcomp ()
199218Sconklin* POSIX Matching::			regexec ()
200218Sconklin* Reporting Errors::			regerror ()
201218Sconklin* Using Byte Offsets::			The regmatch_t type.
202218Sconklin* Freeing POSIX Pattern Buffers::	regfree ()
203218Sconklin
204218SconklinBSD Regex Functions
205218Sconklin
206218Sconklin* BSD Regular Expression Compiling::	re_comp ()
207218Sconklin* BSD Searching::			re_exec ()
208218Sconklin@end menu
209218Sconklin@end ifinfo
210218Sconklin@node Overview, Regular Expression Syntax, Top, Top
211218Sconklin@chapter Overview
212218Sconklin
213218SconklinA @dfn{regular expression} (or @dfn{regexp}, or @dfn{pattern}) is a text
214218Sconklinstring that describes some (mathematical) set of strings.  A regexp
215218Sconklin@var{r} @dfn{matches} a string @var{s} if @var{s} is in the set of
216218Sconklinstrings described by @var{r}.
217218Sconklin
218218SconklinUsing the Regex library, you can:
219218Sconklin
220218Sconklin@itemize @bullet
221218Sconklin
222218Sconklin@item
223218Sconklinsee if a string matches a specified pattern as a whole, and 
224218Sconklin
225218Sconklin@item
226218Sconklinsearch within a string for a substring matching a specified pattern.
227218Sconklin
228218Sconklin@end itemize
229218Sconklin
230218SconklinSome regular expressions match only one string, i.e., the set they
231218Sconklindescribe has only one member.  For example, the regular expression
232218Sconklin@samp{foo} matches the string @samp{foo} and no others.  Other regular
233218Sconklinexpressions match more than one string, i.e., the set they describe has
234218Sconklinmore than one member.  For example, the regular expression @samp{f*}
235218Sconklinmatches the set of strings made up of any number (including zero) of
236218Sconklin@samp{f}s.  As you can see, some characters in regular expressions match
237218Sconklinthemselves (such as @samp{f}) and some don't (such as @samp{*}); the
238218Sconklinones that don't match themselves instead let you specify patterns that
239218Sconklindescribe many different strings.
240218Sconklin
241218SconklinTo either match or search for a regular expression with the Regex
242218Sconklinlibrary functions, you must first compile it with a Regex pattern
243218Sconklincompiling function.  A @dfn{compiled pattern} is a regular expression
244218Sconklinconverted to the internal format used by the library functions.  Once
245218Sconklinyou've compiled a pattern, you can use it for matching or searching any
246218Sconklinnumber of times.
247218Sconklin
248218SconklinThe Regex library consists of two source files: @file{regex.h} and
249218Sconklin@file{regex.c}.  
250218Sconklin@pindex regex.h
251218Sconklin@pindex regex.c
252218SconklinRegex provides three groups of functions with which you can operate on
253218Sconklinregular expressions.  One group---the @sc{gnu} group---is more powerful
254218Sconklinbut not completely compatible with the other two, namely the @sc{posix}
255218Sconklinand Berkeley @sc{unix} groups; its interface was designed specifically
256218Sconklinfor @sc{gnu}.  The other groups have the same interfaces as do the
257218Sconklinregular expression functions in @sc{posix} and Berkeley
258218Sconklin@sc{unix}.
259218Sconklin
260218SconklinWe wrote this chapter with programmers in mind, not users of
261218Sconklinprograms---such as Emacs---that use Regex.  We describe the Regex
262218Sconklinlibrary in its entirety, not how to write regular expressions that a
263218Sconklinparticular program understands.
264218Sconklin
265218Sconklin
266218Sconklin@node Regular Expression Syntax, Common Operators, Overview, Top
267218Sconklin@chapter Regular Expression Syntax
268218Sconklin
269218Sconklin@cindex regular expressions, syntax of
270218Sconklin@cindex syntax of regular expressions
271218Sconklin
272218Sconklin@dfn{Characters} are things you can type.  @dfn{Operators} are things in
273218Sconklina regular expression that match one or more characters.  You compose
274218Sconklinregular expressions from operators, which in turn you specify using one
275218Sconklinor more characters.
276218Sconklin
277218SconklinMost characters represent what we call the match-self operator, i.e.,
278218Sconklinthey match themselves; we call these characters @dfn{ordinary}.  Other
279218Sconklincharacters represent either all or parts of fancier operators; e.g.,
280218Sconklin@samp{.} represents what we call the match-any-character operator
281218Sconklin(which, no surprise, matches (almost) any character); we call these
282218Sconklincharacters @dfn{special}.  Two different things determine what
283218Sconklincharacters represent what operators:
284218Sconklin
285218Sconklin@enumerate
286218Sconklin@item
287218Sconklinthe regular expression syntax your program has told the Regex library to
288218Sconklinrecognize, and
289218Sconklin
290218Sconklin@item
291218Sconklinthe context of the character in the regular expression.
292218Sconklin@end enumerate
293218Sconklin
294218SconklinIn the following sections, we describe these things in more detail.
295218Sconklin
296218Sconklin@menu
297218Sconklin* Syntax Bits::
298218Sconklin* Predefined Syntaxes::
299218Sconklin* Collating Elements vs. Characters::
300218Sconklin* The Backslash Character::
301218Sconklin@end menu
302218Sconklin
303218Sconklin
304218Sconklin@node Syntax Bits, Predefined Syntaxes,  , Regular Expression Syntax
305218Sconklin@section Syntax Bits 
306218Sconklin
307218Sconklin@cindex syntax bits
308218Sconklin
309218SconklinIn any particular syntax for regular expressions, some characters are
310218Sconklinalways special, others are sometimes special, and others are never
311218Sconklinspecial.  The particular syntax that Regex recognizes for a given
312218Sconklinregular expression depends on the value in the @code{syntax} field of
313218Sconklinthe pattern buffer of that regular expression.
314218Sconklin
315218SconklinYou get a pattern buffer by compiling a regular expression.  @xref{GNU
316218SconklinPattern Buffers}, and @ref{POSIX Pattern Buffers}, for more information
317218Sconklinon pattern buffers.  @xref{GNU Regular Expression Compiling}, @ref{POSIX
318218SconklinRegular Expression Compiling}, and @ref{BSD Regular Expression
319218SconklinCompiling}, for more information on compiling.
320218Sconklin
321218SconklinRegex considers the value of the @code{syntax} field to be a collection
322218Sconklinof bits; we refer to these bits as @dfn{syntax bits}.  In most cases,
323218Sconklinthey affect what characters represent what operators.  We describe the
324218Sconklinmeanings of the operators to which we refer in @ref{Common Operators},
325218Sconklin@ref{GNU Operators}, and @ref{GNU Emacs Operators}.  
326218Sconklin
327218SconklinFor reference, here is the complete list of syntax bits, in alphabetical
328218Sconklinorder:
329218Sconklin
330218Sconklin@table @code
331218Sconklin
@cnindex RE_BACKSLASH_ESCAPE_IN_LISTS
@item RE_BACKSLASH_ESCAPE_IN_LISTS
If this bit is set, then @samp{\} inside a list (@pxref{List Operators})
quotes (makes ordinary, if it's special) the following character; if
this bit isn't set, then @samp{\} is an ordinary character inside lists.
(@xref{The Backslash Character}, for what @samp{\} does outside of lists.)
338218Sconklin
339218Sconklin@cnindex RE_BK_PLUS_QM
340218Sconklin@item RE_BK_PLUS_QM
If this bit is set, then @samp{\+} represents the match-one-or-more
operator and @samp{\?} represents the match-zero-or-one operator; if
this bit isn't set, then @samp{+} represents the match-one-or-more
operator and @samp{?} represents the match-zero-or-one operator.  This
bit is irrelevant if @code{RE_LIMITED_OPS} is set.
346218Sconklin
347218Sconklin@cnindex RE_CHAR_CLASSES
348218Sconklin@item RE_CHAR_CLASSES
349218SconklinIf this bit is set, then you can use character classes in lists; if this
350218Sconklinbit isn't set, then you can't.
351218Sconklin
352218Sconklin@cnindex RE_CONTEXT_INDEP_ANCHORS
353218Sconklin@item RE_CONTEXT_INDEP_ANCHORS
354218SconklinIf this bit is set, then @samp{^} and @samp{$} are special anywhere outside
355218Sconklina list; if this bit isn't set, then these characters are special only in
356218Sconklincertain contexts.  @xref{Match-beginning-of-line Operator}, and
357218Sconklin@ref{Match-end-of-line Operator}.
358218Sconklin
359218Sconklin@cnindex RE_CONTEXT_INDEP_OPS
360218Sconklin@item RE_CONTEXT_INDEP_OPS
361218SconklinIf this bit is set, then certain characters are special anywhere outside
362218Sconklina list; if this bit isn't set, then those characters are special only in
363218Sconklinsome contexts and are ordinary elsewhere.  Specifically, if this bit
364218Sconklinisn't set then @samp{*}, and (if the syntax bit @code{RE_LIMITED_OPS}
365218Sconklinisn't set) @samp{+} and @samp{?} (or @samp{\+} and @samp{\?}, depending
366218Sconklinon the syntax bit @code{RE_BK_PLUS_QM}) represent repetition operators
367218Sconklinonly if they're not first in a regular expression or just after an
368218Sconklinopen-group or alternation operator.  The same holds for @samp{@{} (or
369218Sconklin@samp{\@{}, depending on the syntax bit @code{RE_NO_BK_BRACES}) if
370218Sconklinit is the beginning of a valid interval and the syntax bit
371218Sconklin@code{RE_INTERVALS} is set.
372218Sconklin
373218Sconklin@cnindex RE_CONTEXT_INVALID_OPS
374218Sconklin@item RE_CONTEXT_INVALID_OPS
375218SconklinIf this bit is set, then repetition and alternation operators can't be
376218Sconklinin certain positions within a regular expression.  Specifically, the
377218Sconklinregular expression is invalid if it has:
378218Sconklin
379218Sconklin@itemize @bullet
380218Sconklin
381218Sconklin@item
382218Sconklina repetition operator first in the regular expression or just after a
383218Sconklinmatch-beginning-of-line, open-group, or alternation operator; or
384218Sconklin
385218Sconklin@item
386218Sconklinan alternation operator first or last in the regular expression, just
387218Sconklinbefore a match-end-of-line operator, or just after an alternation or
388218Sconklinopen-group operator.
389218Sconklin
390218Sconklin@end itemize
391218Sconklin
If this bit isn't set, then you can put the characters representing the
repetition and alternation operators anywhere in a regular expression.
Whether or not they will in fact be operators in certain positions
depends on other syntax bits.
396218Sconklin
397218Sconklin@cnindex RE_DOT_NEWLINE
398218Sconklin@item RE_DOT_NEWLINE
399218SconklinIf this bit is set, then the match-any-character operator matches
400218Sconklina newline; if this bit isn't set, then it doesn't.
401218Sconklin
402218Sconklin@cnindex RE_DOT_NOT_NULL
403218Sconklin@item RE_DOT_NOT_NULL
404218SconklinIf this bit is set, then the match-any-character operator doesn't match
405218Sconklina null character; if this bit isn't set, then it does.
406218Sconklin
407218Sconklin@cnindex RE_INTERVALS
408218Sconklin@item RE_INTERVALS
409218SconklinIf this bit is set, then Regex recognizes interval operators; if this bit
410218Sconklinisn't set, then it doesn't.
411218Sconklin
412218Sconklin@cnindex RE_LIMITED_OPS
413218Sconklin@item RE_LIMITED_OPS
414218SconklinIf this bit is set, then Regex doesn't recognize the match-one-or-more,
415218Sconklinmatch-zero-or-one or alternation operators; if this bit isn't set, then
416218Sconklinit does.
417218Sconklin
418218Sconklin@cnindex RE_NEWLINE_ALT
419218Sconklin@item RE_NEWLINE_ALT
420218SconklinIf this bit is set, then newline represents the alternation operator; if
421218Sconklinthis bit isn't set, then newline is ordinary.
422218Sconklin
423218Sconklin@cnindex RE_NO_BK_BRACES
424218Sconklin@item RE_NO_BK_BRACES
425218SconklinIf this bit is set, then @samp{@{} represents the open-interval operator
426218Sconklinand @samp{@}} represents the close-interval operator; if this bit isn't
427218Sconklinset, then @samp{\@{} represents the open-interval operator and
428218Sconklin@samp{\@}} represents the close-interval operator.  This bit is relevant
429218Sconklinonly if @code{RE_INTERVALS} is set.
430218Sconklin
431218Sconklin@cnindex RE_NO_BK_PARENS
432218Sconklin@item RE_NO_BK_PARENS
433218SconklinIf this bit is set, then @samp{(} represents the open-group operator and
434218Sconklin@samp{)} represents the close-group operator; if this bit isn't set, then
435218Sconklin@samp{\(} represents the open-group operator and @samp{\)} represents
436218Sconklinthe close-group operator.
437218Sconklin
438218Sconklin@cnindex RE_NO_BK_REFS
439218Sconklin@item RE_NO_BK_REFS
440218SconklinIf this bit is set, then Regex doesn't recognize @samp{\}@var{digit} as
441218Sconklinthe back reference operator; if this bit isn't set, then it does.
442218Sconklin
443218Sconklin@cnindex RE_NO_BK_VBAR
444218Sconklin@item RE_NO_BK_VBAR
445218SconklinIf this bit is set, then @samp{|} represents the alternation operator;
446218Sconklinif this bit isn't set, then @samp{\|} represents the alternation
447218Sconklinoperator.  This bit is irrelevant if @code{RE_LIMITED_OPS} is set.
448218Sconklin
449218Sconklin@cnindex RE_NO_EMPTY_RANGES
450218Sconklin@item RE_NO_EMPTY_RANGES
451218SconklinIf this bit is set, then a regular expression with a range whose ending
452218Sconklinpoint collates lower than its starting point is invalid; if this bit
453218Sconklinisn't set, then Regex considers such a range to be empty.
454218Sconklin
455218Sconklin@cnindex RE_UNMATCHED_RIGHT_PAREN_ORD
456218Sconklin@item RE_UNMATCHED_RIGHT_PAREN_ORD
457218SconklinIf this bit is set and the regular expression has no matching open-group
458218Sconklinoperator, then Regex considers what would otherwise be a close-group
459218Sconklinoperator (based on how @code{RE_NO_BK_PARENS} is set) to match @samp{)}.
460218Sconklin
461218Sconklin@end table
462218Sconklin
463218Sconklin
464218Sconklin@node Predefined Syntaxes, Collating Elements vs. Characters, Syntax Bits, Regular Expression Syntax
465218Sconklin@section Predefined Syntaxes    
466218Sconklin
467218SconklinIf you're programming with Regex, you can set a pattern buffer's
468218Sconklin(@pxref{GNU Pattern Buffers}, and @ref{POSIX Pattern Buffers})
469218Sconklin@code{syntax} field either to an arbitrary combination of syntax bits
470218Sconklin(@pxref{Syntax Bits}) or else to the configurations defined by Regex.
471218SconklinThese configurations define the syntaxes used by certain
472218Sconklinprograms---@sc{gnu} Emacs,
473218Sconklin@cindex Emacs 
474218Sconklin@sc{posix} Awk,
475218Sconklin@cindex POSIX Awk
476218Sconklintraditional Awk, 
477218Sconklin@cindex Awk
478218SconklinGrep,
479218Sconklin@cindex Grep
480218Sconklin@cindex Egrep
481218SconklinEgrep---in addition to syntaxes for @sc{posix} basic and extended
482218Sconklinregular expressions.
483218Sconklin
The predefined syntaxes---taken directly from @file{regex.h}---are:
485218Sconklin
486218Sconklin@example
487218Sconklin[[[ syntaxes ]]]
488218Sconklin@end example
489218Sconklin
490218Sconklin@node Collating Elements vs. Characters, The Backslash Character, Predefined Syntaxes, Regular Expression Syntax
491218Sconklin@section Collating Elements vs.@: Characters    
492218Sconklin
493218Sconklin@sc{posix} generalizes the notion of a character to that of a
494218Sconklincollating element.  It defines a @dfn{collating element} to be ``a
495218Sconklinsequence of one or more bytes defined in the current collating sequence
496218Sconklinas a unit of collation.''
497218Sconklin
498218SconklinThis generalizes the notion of a character in
499218Sconklintwo ways.  First, a single character can map into two or more collating
500218Sconklinelements.  For example, the German
501218Sconklin@tex
502218Sconklin`\ss'
503218Sconklin@end tex
504218Sconklin@ifinfo
505218Sconklin``es-zet''
506218Sconklin@end ifinfo
507218Sconklincollates as the collating element @samp{s} followed by another collating
508218Sconklinelement @samp{s}.  Second, two or more characters can map into one
509218Sconklincollating element.  For example, the Spanish @samp{ll} collates after
510218Sconklin@samp{l} and before @samp{m}.
511218Sconklin
512218SconklinSince @sc{posix}'s ``collating element'' preserves the essential idea of
513218Sconklina ``character,'' we use the latter, more familiar, term in this document.
514218Sconklin
515218Sconklin@node The Backslash Character,  , Collating Elements vs. Characters, Regular Expression Syntax
516218Sconklin@section The Backslash Character
517218Sconklin
518218Sconklin@cindex \
519218SconklinThe @samp{\} character has one of four different meanings, depending on
520218Sconklinthe context in which you use it and what syntax bits are set
521218Sconklin(@pxref{Syntax Bits}).  It can: 1) stand for itself, 2) quote the next
522218Sconklincharacter, 3) introduce an operator, or 4) do nothing.
523218Sconklin
524218Sconklin@enumerate
525218Sconklin@item
526218SconklinIt stands for itself inside a list
527218Sconklin(@pxref{List Operators}) if the syntax bit
528218Sconklin@code{RE_BACKSLASH_ESCAPE_IN_LISTS} is not set.  For example, @samp{[\]}
529218Sconklinwould match @samp{\}.
530218Sconklin
531218Sconklin@item
532218SconklinIt quotes (makes ordinary, if it's special) the next character when you
533218Sconklinuse it either:
534218Sconklin
535218Sconklin@itemize @bullet
536218Sconklin@item
537218Sconklinoutside a list,@footnote{Sometimes
538218Sconklinyou don't have to explicitly quote special characters to make
539218Sconklinthem ordinary.  For instance, most characters lose any special meaning
540218Sconklininside a list (@pxref{List Operators}).  In addition, if the syntax bits
541218Sconklin@code{RE_CONTEXT_INVALID_OPS} and @code{RE_CONTEXT_INDEP_OPS}
542218Sconklinaren't set, then (for historical reasons) the matcher considers special
543218Sconklincharacters ordinary if they are in contexts where the operations they
represent make no sense; for example, @samp{*} (which normally
represents the match-zero-or-more operator) matches itself in the
regular expression @samp{*foo} because there is no preceding expression
on which it can operate.  It is poor practice, however, to depend on this
548218Sconklinbehavior; if you want a special character to be ordinary outside a list,
549218Sconklinit's better to always quote it, regardless.} or
550218Sconklin
551218Sconklin@item
552218Sconklininside a list and the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is set.
553218Sconklin
554218Sconklin@end itemize
555218Sconklin
556218Sconklin@item
557218SconklinIt introduces an operator when followed by certain ordinary
558218Sconklincharacters---sometimes only when certain syntax bits are set.  See the
cases @code{RE_BK_PLUS_QM}, @code{RE_NO_BK_BRACES}, @code{RE_NO_BK_VBAR},
@code{RE_NO_BK_PARENS}, @code{RE_NO_BK_REFS} in @ref{Syntax Bits}.  Also:
561218Sconklin
562218Sconklin@itemize @bullet
563218Sconklin@item
564218Sconklin@samp{\b} represents the match-word-boundary operator
565218Sconklin(@pxref{Match-word-boundary Operator}).
566218Sconklin
567218Sconklin@item
568218Sconklin@samp{\B} represents the match-within-word operator
569218Sconklin(@pxref{Match-within-word Operator}).
570218Sconklin
571218Sconklin@item
572218Sconklin@samp{\<} represents the match-beginning-of-word operator @*
573218Sconklin(@pxref{Match-beginning-of-word Operator}).
574218Sconklin
575218Sconklin@item
576218Sconklin@samp{\>} represents the match-end-of-word operator
577218Sconklin(@pxref{Match-end-of-word Operator}).
578218Sconklin
579218Sconklin@item
580218Sconklin@samp{\w} represents the match-word-constituent operator
581218Sconklin(@pxref{Match-word-constituent Operator}).
582218Sconklin
583218Sconklin@item
584218Sconklin@samp{\W} represents the match-non-word-constituent operator
585218Sconklin(@pxref{Match-non-word-constituent Operator}).
586218Sconklin
587218Sconklin@item
588218Sconklin@samp{\`} represents the match-beginning-of-buffer
589218Sconklinoperator and @samp{\'} represents the match-end-of-buffer operator
590218Sconklin(@pxref{Buffer Operators}).
591218Sconklin
592218Sconklin@item
593218SconklinIf Regex was compiled with the C preprocessor symbol @code{emacs}
594218Sconklindefined, then @samp{\s@var{class}} represents the match-syntactic-class
595218Sconklinoperator and @samp{\S@var{class}} represents the
596218Sconklinmatch-not-syntactic-class operator (@pxref{Syntactic Class Operators}).
597218Sconklin
598218Sconklin@end itemize
599218Sconklin
600218Sconklin@item
601218SconklinIn all other cases, Regex ignores @samp{\}.  For example,
602218Sconklin@samp{\n} matches @samp{n}.
603218Sconklin
604218Sconklin@end enumerate
605218Sconklin
606218Sconklin@node Common Operators, GNU Operators, Regular Expression Syntax, Top
607218Sconklin@chapter Common Operators
608218Sconklin
609218SconklinYou compose regular expressions from operators.  In the following
610218Sconklinsections, we describe the regular expression operators specified by
611218Sconklin@sc{posix}; @sc{gnu} also uses these.  Most operators have more than one
612218Sconklinrepresentation as characters.  @xref{Regular Expression Syntax}, for
613218Sconklinwhat characters represent what operators under what circumstances.
614218Sconklin
615218SconklinFor most operators that can be represented in two ways, one
616218Sconklinrepresentation is a single character and the other is that character
617218Sconklinpreceded by @samp{\}.  For example, either @samp{(} or @samp{\(}
618218Sconklinrepresents the open-group operator.  Which one does depends on the
619218Sconklinsetting of a syntax bit, in this case @code{RE_NO_BK_PARENS}.  Why is
620218Sconklinthis so?  Historical reasons dictate some of the varying
621218Sconklinrepresentations, while @sc{posix} dictates others.  
622218Sconklin
623218SconklinFinally, almost all characters lose any special meaning inside a list
624218Sconklin(@pxref{List Operators}).
625218Sconklin
626218Sconklin@menu
627218Sconklin* Match-self Operator::			Ordinary characters.
628218Sconklin* Match-any-character Operator::	.
629218Sconklin* Concatenation Operator::		Juxtaposition.
630218Sconklin* Repetition Operators::		*  +  ? @{@}
631218Sconklin* Alternation Operator::		|
632218Sconklin* List Operators::			[...]  [^...]
633218Sconklin* Grouping Operators::			(...)
634218Sconklin* Back-reference Operator::		\digit
635218Sconklin* Anchoring Operators::			^  $
636218Sconklin@end menu
637218Sconklin
638218Sconklin@node Match-self Operator, Match-any-character Operator,  , Common Operators
639218Sconklin@section The Match-self Operator (@var{ordinary character})
640218Sconklin
641218SconklinThis operator matches the character itself.  All ordinary characters
642218Sconklin(@pxref{Regular Expression Syntax}) represent this operator.  For
643218Sconklinexample, @samp{f} is always an ordinary character, so the regular
644218Sconklinexpression @samp{f} matches only the string @samp{f}.  In
645218Sconklinparticular, it does @emph{not} match the string @samp{ff}.
646218Sconklin
647218Sconklin@node Match-any-character Operator, Concatenation Operator, Match-self Operator, Common Operators
648218Sconklin@section The Match-any-character Operator (@code{.})
649218Sconklin
650218Sconklin@cindex @samp{.}
651218Sconklin
This operator matches any single character, printing or nonprinting,
except that it won't match a:
654218Sconklin
655218Sconklin@table @asis
656218Sconklin@item newline
657218Sconklinif the syntax bit @code{RE_DOT_NEWLINE} isn't set.
658218Sconklin
659218Sconklin@item null
660218Sconklinif the syntax bit @code{RE_DOT_NOT_NULL} is set.
661218Sconklin
662218Sconklin@end table
663218Sconklin
664218SconklinThe @samp{.} (period) character represents this operator.  For example,
665218Sconklin@samp{a.b} matches any three-character string beginning with @samp{a}
666218Sconklinand ending with @samp{b}.
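
As a rough sketch of this behavior through the POSIX interface (the
helper name @code{dot_matches} is made up for this illustration, and
POSIX extended syntax is assumed), the following C fragment searches a
string for @samp{a.b}.  Passing @code{REG_NEWLINE} to @code{regcomp} is
the POSIX way of asking that match-any-character operators not match a
newline.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the POSIX extended pattern "a.b"
   matches somewhere in STR, 0 otherwise.  EXTRA_CFLAGS is OR-ed into
   the regcomp flags; pass REG_NEWLINE or 0.  */
static int
dot_matches (const char *str, int extra_cflags)
{
  regex_t re;
  int found;

  if (regcomp (&re, "a.b", REG_EXTENDED | REG_NOSUB | extra_cflags) != 0)
    return 0;
  found = regexec (&re, str, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

Under these assumptions, @code{dot_matches ("axb", 0)} should return 1,
while @code{dot_matches ("a\nb", REG_NEWLINE)} should return 0.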
667218Sconklin
668218Sconklin@node Concatenation Operator, Repetition Operators, Match-any-character Operator, Common Operators
669218Sconklin@section The Concatenation Operator
670218Sconklin
671218SconklinThis operator concatenates two regular expressions @var{a} and @var{b}.
672218SconklinNo character represents this operator; you simply put @var{b} after
673218Sconklin@var{a}.  The result is a regular expression that will match a string if
674218Sconklin@var{a} matches its first part and @var{b} matches the rest.  For
675218Sconklinexample, @samp{xy} (two match-self operators) matches @samp{xy}.
676218Sconklin
677218Sconklin@node Repetition Operators, Alternation Operator, Concatenation Operator, Common Operators
678218Sconklin@section Repetition Operators    
679218Sconklin
680218SconklinRepetition operators repeat the preceding regular expression a specified
681218Sconklinnumber of times.
682218Sconklin
683218Sconklin@menu
684218Sconklin* Match-zero-or-more Operator::  *
685218Sconklin* Match-one-or-more Operator::   +
686218Sconklin* Match-zero-or-one Operator::   ?
687218Sconklin* Interval Operators::           @{@}
688218Sconklin@end menu
689218Sconklin
690218Sconklin@node Match-zero-or-more Operator, Match-one-or-more Operator,  , Repetition Operators
691218Sconklin@subsection The Match-zero-or-more Operator (@code{*})
692218Sconklin
693218Sconklin@cindex @samp{*}
694218Sconklin
695218SconklinThis operator repeats the smallest possible preceding regular expression
696218Sconklinas many times as necessary (including zero) to match the pattern.
697218Sconklin@samp{*} represents this operator.  For example, @samp{o*}
698218Sconklinmatches any string made up of zero or more @samp{o}s.  Since this
699218Sconklinoperator operates on the smallest preceding regular expression,
700218Sconklin@samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}.  So,
701218Sconklin@samp{fo*} matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
702218Sconklin
703218SconklinSince the match-zero-or-more operator is a suffix operator, it may be
704218Sconklinuseless as such when no regular expression precedes it.  This is the
705218Sconklincase when it:
706218Sconklin
707218Sconklin@itemize @bullet
708218Sconklin@item 
709218Sconklinis first in a regular expression, or
710218Sconklin
711218Sconklin@item 
712218Sconklinfollows a match-beginning-of-line, open-group, or alternation
713218Sconklinoperator.
714218Sconklin
715218Sconklin@end itemize
716218Sconklin
717218Sconklin@noindent
718218SconklinThree different things can happen in these cases:
719218Sconklin
720218Sconklin@enumerate
721218Sconklin@item
722218SconklinIf the syntax bit @code{RE_CONTEXT_INVALID_OPS} is set, then the
723218Sconklinregular expression is invalid.
724218Sconklin
725218Sconklin@item
726218SconklinIf @code{RE_CONTEXT_INVALID_OPS} isn't set, but
727218Sconklin@code{RE_CONTEXT_INDEP_OPS} is, then @samp{*} represents the
728218Sconklinmatch-zero-or-more operator (which then operates on the empty string).
729218Sconklin
730218Sconklin@item
731218SconklinOtherwise, @samp{*} is ordinary.
732218Sconklin
733218Sconklin@end enumerate
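
The third case can be observed with the POSIX interface, assuming POSIX
basic syntax, in which a leading @samp{*} is ordinary.  This is only a
sketch; the helper name @code{bre_matches_whole} is invented here.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: return 1 if the POSIX basic pattern PATTERN
   matches all of STR, 0 otherwise (including on a compile error).  */
static int
bre_matches_whole (const char *pattern, const char *str)
{
  regex_t re;
  regmatch_t m;
  int whole = 0;

  if (regcomp (&re, pattern, 0) != 0)
    return 0;
  /* regexec reports the leftmost-longest match, so a match of the
     whole string shows up as the span [0, strlen (str)).  */
  if (regexec (&re, str, 1, &m, 0) == 0)
    whole = m.rm_so == 0 && (size_t) m.rm_eo == strlen (str);
  regfree (&re);
  return whole;
}
```

Here @code{bre_matches_whole ("*a", "*a")} should return 1, because the
leading @samp{*} stands for itself, whereas
@code{bre_matches_whole ("*a", "a")} should return 0.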
734218Sconklin
735218Sconklin@cindex backtracking
736218SconklinThe matcher processes a match-zero-or-more operator by first matching as
737218Sconklinmany repetitions of the smallest preceding regular expression as it can.
738218SconklinThen it continues to match the rest of the pattern.  
739218Sconklin
740218SconklinIf it can't match the rest of the pattern, it backtracks (as many times
741218Sconklinas necessary), each time discarding one of the matches until it can
742218Sconklineither match the entire pattern or be certain that it cannot get a
743218Sconklinmatch.  For example, when matching @samp{ca*ar} against @samp{caaar},
744218Sconklinthe matcher first matches all three @samp{a}s of the string with the
745218Sconklin@samp{a*} of the regular expression.  However, it cannot then match the
746218Sconklinfinal @samp{ar} of the regular expression against the final @samp{r} of
747218Sconklinthe string.  So it backtracks, discarding the match of the last @samp{a}
748218Sconklinin the string.  It can then match the remaining @samp{ar}.
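
The examples in this section can be checked with a small C sketch.  It
assumes the POSIX interface with extended syntax, and the helper name
@code{matches_whole} is hypothetical.  It shows only the outcome of a
match, not the backtracking steps themselves.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the POSIX extended pattern PATTERN
   matches all of STR, 0 otherwise (including on a compile error).  */
static int
matches_whole (const char *pattern, const char *str)
{
  char anchored[256];
  regex_t re;
  int n, ok;

  /* Anchor a group around the pattern so that it must span STR.  */
  n = snprintf (anchored, sizeof anchored, "^(%s)$", pattern);
  if (n < 0 || (size_t) n >= sizeof anchored)
    return 0;
  if (regcomp (&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  ok = regexec (&re, str, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

For instance, @code{matches_whole ("fo*", "foo")} and
@code{matches_whole ("ca*ar", "caaar")} should both return 1, while
@code{matches_whole ("fo*", "fofo")} should return 0 because only the
@samp{o} repeats.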
749218Sconklin
750218Sconklin
751218Sconklin@node Match-one-or-more Operator, Match-zero-or-one Operator, Match-zero-or-more Operator, Repetition Operators
752218Sconklin@subsection The Match-one-or-more Operator (@code{+} or @code{\+})
753218Sconklin
754218Sconklin@cindex @samp{+} 
755218Sconklin
756218SconklinIf the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't recognize
757218Sconklinthis operator.  Otherwise, if the syntax bit @code{RE_BK_PLUS_QM} isn't
758218Sconklinset, then @samp{+} represents this operator; if it is, then @samp{\+}
759218Sconklindoes.
760218Sconklin
761218SconklinThis operator is similar to the match-zero-or-more operator except that
762218Sconklinit repeats the preceding regular expression at least once;
763218Sconklin@pxref{Match-zero-or-more Operator}, for what it operates on, how some
764218Sconklinsyntax bits affect it, and how Regex backtracks to match it.
765218Sconklin
766218SconklinFor example, supposing that @samp{+} represents the match-one-or-more
767218Sconklinoperator; then @samp{ca+r} matches, e.g., @samp{car} and
768218Sconklin@samp{caaaar}, but not @samp{cr}.
769218Sconklin
770218Sconklin@node Match-zero-or-one Operator, Interval Operators, Match-one-or-more Operator, Repetition Operators
771218Sconklin@subsection The Match-zero-or-one Operator (@code{?} or @code{\?})
772218Sconklin@cindex @samp{?}
773218Sconklin
774218SconklinIf the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
775218Sconklinrecognize this operator.  Otherwise, if the syntax bit
776218Sconklin@code{RE_BK_PLUS_QM} isn't set, then @samp{?} represents this operator;
777218Sconklinif it is, then @samp{\?} does.
778218Sconklin
779218SconklinThis operator is similar to the match-zero-or-more operator except that
780218Sconklinit repeats the preceding regular expression once or not at all;
781218Sconklin@pxref{Match-zero-or-more Operator}, to see what it operates on, how
782218Sconklinsome syntax bits affect it, and how Regex backtracks to match it.
783218Sconklin
784218SconklinFor example, supposing that @samp{?} represents the match-zero-or-one
785218Sconklinoperator; then @samp{ca?r} matches both @samp{car} and @samp{cr}, but
786218Sconklinnothing else.
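
The following sketch tries the match-one-or-more and match-zero-or-one
examples.  It assumes the POSIX interface with extended syntax, where
@samp{+} and @samp{?} represent the operators (as when
@code{RE_BK_PLUS_QM} isn't set); the helper name @code{matches_whole} is
hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the POSIX extended pattern PATTERN
   matches all of STR, 0 otherwise (including on a compile error).  */
static int
matches_whole (const char *pattern, const char *str)
{
  char anchored[256];
  regex_t re;
  int n, ok;

  /* Anchor a group around the pattern so that it must span STR.  */
  n = snprintf (anchored, sizeof anchored, "^(%s)$", pattern);
  if (n < 0 || (size_t) n >= sizeof anchored)
    return 0;
  if (regcomp (&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  ok = regexec (&re, str, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

With it, @code{matches_whole ("ca+r", "cr")} should return 0 and
@code{matches_whole ("ca?r", "cr")} should return 1.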
787218Sconklin
788218Sconklin@node Interval Operators,  , Match-zero-or-one Operator, Repetition Operators
789218Sconklin@subsection Interval Operators (@code{@{} @dots{} @code{@}} or @code{\@{} @dots{} @code{\@}})
790218Sconklin
791218Sconklin@cindex interval expression
792218Sconklin@cindex @samp{@{}
793218Sconklin@cindex @samp{@}}
794218Sconklin@cindex @samp{\@{}
795218Sconklin@cindex @samp{\@}}
796218Sconklin
797218SconklinIf the syntax bit @code{RE_INTERVALS} is set, then Regex recognizes
798218Sconklin@dfn{interval expressions}.  They repeat the smallest possible preceding
799218Sconklinregular expression a specified number of times.
800218Sconklin
801218SconklinIf the syntax bit @code{RE_NO_BK_BRACES} is set, @samp{@{} represents
802218Sconklinthe @dfn{open-interval operator} and @samp{@}} represents the
@dfn{close-interval operator}; otherwise, @samp{\@{} and @samp{\@}} do.
804218Sconklin
805218SconklinSpecifically, supposing that @samp{@{} and @samp{@}} represent the
806218Sconklinopen-interval and close-interval operators; then:
807218Sconklin
808218Sconklin@table @code
809218Sconklin@item  @{@var{count}@}
810218Sconklinmatches exactly @var{count} occurrences of the preceding regular
811218Sconklinexpression.
812218Sconklin
@item @{@var{min},@}
814218Sconklinmatches @var{min} or more occurrences of the preceding regular
815218Sconklinexpression.
816218Sconklin
@item @{@var{min}, @var{max}@}
818218Sconklinmatches at least @var{min} but no more than @var{max} occurrences of
819218Sconklinthe preceding regular expression.
820218Sconklin
821218Sconklin@end table
822218Sconklin
823218SconklinThe interval expression (but not necessarily the regular expression that
824218Sconklincontains it) is invalid if:
825218Sconklin
826218Sconklin@itemize @bullet
827218Sconklin@item
828218Sconklin@var{min} is greater than @var{max}, or 
829218Sconklin
830218Sconklin@item
any of @var{count}, @var{min}, or @var{max} is outside the range
zero to @code{RE_DUP_MAX} (a symbol that @file{regex.h} defines).
834218Sconklin
835218Sconklin@end itemize
836218Sconklin
837218SconklinIf the interval expression is invalid and the syntax bit
838218Sconklin@code{RE_NO_BK_BRACES} is set, then Regex considers all the
839218Sconklincharacters in the would-be interval to be ordinary.  If that bit
840218Sconklinisn't set, then the regular expression is invalid.
841218Sconklin
842218SconklinIf the interval expression is valid but there is no preceding regular
843218Sconklinexpression on which to operate, then if the syntax bit
844218Sconklin@code{RE_CONTEXT_INVALID_OPS} is set, the regular expression is invalid.
845218SconklinIf that bit isn't set, then Regex considers all the characters---other
846218Sconklinthan backslashes, which it ignores---in the would-be interval to be
847218Sconklinordinary.
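
As an illustration of the three valid interval forms, here is a minimal
sketch.  It assumes the POSIX interface with extended syntax, where
@samp{@{} and @samp{@}} represent the interval operators (as when
@code{RE_NO_BK_BRACES} is set); the helper name @code{matches_whole} is
hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the POSIX extended pattern PATTERN
   matches all of STR, 0 otherwise (including on a compile error).  */
static int
matches_whole (const char *pattern, const char *str)
{
  char anchored[256];
  regex_t re;
  int n, ok;

  /* Anchor a group around the pattern so that it must span STR.  */
  n = snprintf (anchored, sizeof anchored, "^(%s)$", pattern);
  if (n < 0 || (size_t) n >= sizeof anchored)
    return 0;
  if (regcomp (&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  ok = regexec (&re, str, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

For example, @code{matches_whole ("a@{2,3@}", "aaa")} should return 1,
and @code{matches_whole ("a@{2,3@}", "aaaa")} should return 0.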
848218Sconklin
849218Sconklin
850218Sconklin@node Alternation Operator, List Operators, Repetition Operators, Common Operators
851218Sconklin@section The Alternation Operator (@code{|} or @code{\|})
852218Sconklin
853218Sconklin@kindex |
854218Sconklin@kindex \|
855218Sconklin@cindex alternation operator
856218Sconklin@cindex or operator
857218Sconklin
858218SconklinIf the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
859218Sconklinrecognize this operator.  Otherwise, if the syntax bit
860218Sconklin@code{RE_NO_BK_VBAR} is set, then @samp{|} represents this operator;
861218Sconklinotherwise, @samp{\|} does.
862218Sconklin
863218SconklinAlternatives match one of a choice of regular expressions:
864218Sconklinif you put the character(s) representing the alternation operator between
865218Sconklinany two regular expressions @var{a} and @var{b}, the result matches
866218Sconklinthe union of the strings that @var{a} and @var{b} match.  For
867218Sconklinexample, supposing that @samp{|} is the alternation operator, then
868218Sconklin@samp{foo|bar|quux} would match any of @samp{foo}, @samp{bar} or
869218Sconklin@samp{quux}.
870218Sconklin
871218Sconklin@ignore
872218Sconklin@c Nobody needs to disallow empty alternatives any more.
873218SconklinIf the syntax bit @code{RE_NO_EMPTY_ALTS} is set, then if either of the regular
874218Sconklinexpressions @var{a} or @var{b} is empty, the
875218Sconklinregular expression is invalid.  More precisely, if this syntax bit is
876218Sconklinset, then the alternation operator can't:
877218Sconklin
878218Sconklin@itemize @bullet
879218Sconklin@item
880218Sconklinbe first or last in a regular expression;
881218Sconklin
882218Sconklin@item
883218Sconklinfollow either another alternation operator or an open-group operator
884218Sconklin(@pxref{Grouping Operators}); or
885218Sconklin
886218Sconklin@item
887218Sconklinprecede a close-group operator.
888218Sconklin
889218Sconklin@end itemize
890218Sconklin
891218Sconklin@noindent
892218SconklinFor example, supposing @samp{(} and @samp{)} represent the open and
893218Sconklinclose-group operators, then @samp{|foo}, @samp{foo|}, @samp{foo||bar},
894218Sconklin@samp{foo(|bar)}, and @samp{(foo|)bar} would all be invalid.
895218Sconklin@end ignore
896218Sconklin
897218SconklinThe alternation operator operates on the @emph{largest} possible
898218Sconklinsurrounding regular expressions.  (Put another way, it has the lowest
899218Sconklinprecedence of any regular expression operator.)
900218SconklinThus, the only way you can
901218Sconklindelimit its arguments is to use grouping.  For example, if @samp{(} and
902218Sconklin@samp{)} are the open and close-group operators, then @samp{fo(o|b)ar}
903218Sconklinwould match either @samp{fooar} or @samp{fobar}.  (@samp{foo|bar} would
904218Sconklinmatch @samp{foo} or @samp{bar}.)
905218Sconklin
906218Sconklin@cindex backtracking
907218SconklinThe matcher usually tries all combinations of alternatives so as to 
908218Sconklinmatch the longest possible string.  For example, when matching
909218Sconklin@samp{(fooq|foo)*(qbarquux|bar)} against @samp{fooqbarquux}, it cannot
910218Sconklintake, say, the first (``depth-first'') combination it could match, since
911218Sconklinthen it would be content to match just @samp{fooqbar}.  
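
The longest-match behavior described above can be observed with a short
sketch.  It assumes the POSIX interface with extended syntax, where
@samp{|}, @samp{(}, and @samp{)} represent the operators; the helper
name @code{match_length} is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: return the length of the match that regexec
   reports for the POSIX extended pattern PATTERN in STR, or -1 if
   there is no match or the pattern does not compile.  */
static int
match_length (const char *pattern, const char *str)
{
  regex_t re;
  regmatch_t m;
  int len = -1;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  /* Ask only for the overall match (register 0).  */
  if (regexec (&re, str, 1, &m, 0) == 0)
    len = (int) (m.rm_eo - m.rm_so);
  regfree (&re);
  return len;
}
```

With a matcher that follows the POSIX longest-match rule,
@code{match_length ("(fooq|foo)*(qbarquux|bar)", "fooqbarquux")} should
return 11, the length of the whole string, rather than 7, the length of
@samp{fooqbar}.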
912218Sconklin
913218Sconklin@comment xx something about leftmost-longest
914218Sconklin
915218Sconklin
916218Sconklin@node List Operators, Grouping Operators, Alternation Operator, Common Operators
917218Sconklin@section List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})
918218Sconklin
919218Sconklin@cindex matching list
920218Sconklin@cindex @samp{[}
921218Sconklin@cindex @samp{]}
922218Sconklin@cindex @samp{^}
923218Sconklin@cindex @samp{-}
924218Sconklin@cindex @samp{\}
925218Sconklin@cindex @samp{[^}
926218Sconklin@cindex nonmatching list
927218Sconklin@cindex matching newline
928218Sconklin@cindex bracket expression
929218Sconklin
930218Sconklin@dfn{Lists}, also called @dfn{bracket expressions}, are a set of one or
931218Sconklinmore items.  An @dfn{item} is a character,
932218Sconklin@ignore
933218Sconklin(These get added when they get implemented.)
934218Sconklina collating symbol, an equivalence class expression, 
935218Sconklin@end ignore
936218Sconklina character class expression, or a range expression.  The syntax bits
937218Sconklinaffect which kinds of items you can put in a list.  We explain the last
938218Sconklintwo items in subsections below.  Empty lists are invalid.
939218Sconklin
940218SconklinA @dfn{matching list} matches a single character represented by one of
941218Sconklinthe list items.  You form a matching list by enclosing one or more items
942218Sconklinwithin an @dfn{open-matching-list operator} (represented by @samp{[})
943218Sconklinand a @dfn{close-list operator} (represented by @samp{]}).  
944218Sconklin
945218SconklinFor example, @samp{[ab]} matches either @samp{a} or @samp{b}.
946218Sconklin@samp{[ad]*} matches the empty string and any string composed of just
947218Sconklin@samp{a}s and @samp{d}s in any order.  Regex considers invalid a regular
948218Sconklinexpression with a @samp{[} but no matching
949218Sconklin@samp{]}.
950218Sconklin
951218Sconklin@dfn{Nonmatching lists} are similar to matching lists except that they
952218Sconklinmatch a single character @emph{not} represented by one of the list
953218Sconklinitems.  You use an @dfn{open-nonmatching-list operator} (represented by
954218Sconklin@samp{[^}@footnote{Regex therefore doesn't consider the @samp{^} to be
955218Sconklinthe first character in the list.  If you put a @samp{^} character first
956218Sconklinin (what you think is) a matching list, you'll turn it into a
957218Sconklinnonmatching list.}) instead of an open-matching-list operator to start a
958218Sconklinnonmatching list.  
959218Sconklin
960218SconklinFor example, @samp{[^ab]} matches any character except @samp{a} or
961218Sconklin@samp{b}.  
962218Sconklin
If the @code{posix_newline} field in the pattern buffer (@pxref{GNU
Pattern Buffers}) is set, then nonmatching lists do not match a newline.
965218Sconklin
Most characters lose any special meaning inside a list.  The characters
that remain special inside a list are:
968218Sconklin
969218Sconklin@table @samp
970218Sconklin@item ]
971218Sconklinends the list if it's not the first list item.  So, if you want to make
972218Sconklinthe @samp{]} character a list item, you must put it first.
973218Sconklin
974218Sconklin@item \
975218Sconklinquotes the next character if the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is
976218Sconklinset.
977218Sconklin
978218Sconklin@ignore
979218SconklinPut these in if they get implemented.
980218Sconklin
981218Sconklin@item [.
982218Sconklinrepresents the open-collating-symbol operator (@pxref{Collating Symbol
983218SconklinOperators}).
984218Sconklin
985218Sconklin@item .]
986218Sconklinrepresents the close-collating-symbol operator.
987218Sconklin
988218Sconklin@item [=
989218Sconklinrepresents the open-equivalence-class operator (@pxref{Equivalence Class
990218SconklinOperators}).
991218Sconklin
992218Sconklin@item =]
993218Sconklinrepresents the close-equivalence-class operator.
994218Sconklin
995218Sconklin@end ignore
996218Sconklin
997218Sconklin@item [:
998218Sconklinrepresents the open-character-class operator (@pxref{Character Class
999218SconklinOperators}) if the syntax bit @code{RE_CHAR_CLASSES} is set and what
1000218Sconklinfollows is a valid character class expression.
1001218Sconklin
1002218Sconklin@item :]
1003218Sconklinrepresents the close-character-class operator if the syntax bit
1004218Sconklin@code{RE_CHAR_CLASSES} is set and what precedes it is an
1005218Sconklinopen-character-class operator followed by a valid character class name.
1006218Sconklin
1007218Sconklin@item - 
1008218Sconklinrepresents the range operator (@pxref{Range Operator}) if it's
1009218Sconklinnot first or last in a list or the ending point of a range.
1010218Sconklin
1011218Sconklin@end table
1012218Sconklin
1013218Sconklin@noindent
1014218SconklinAll other characters are ordinary.  For example, @samp{[.*]} matches 
1015218Sconklin@samp{.} and @samp{*}.  
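
A small sketch can confirm how these rules play out.  It assumes the
POSIX interface with extended syntax; the helper name
@code{list_matches_char} is made up for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the list LIST (written with its
   brackets, e.g. "[ab]") matches the single character C, 0 otherwise
   (including on a compile error).  */
static int
list_matches_char (const char *list, char c)
{
  char subject[2];
  regex_t re;
  int ok;

  /* A list matches exactly one character, so a one-character subject
     string needs no anchoring.  */
  subject[0] = c;
  subject[1] = '\0';
  if (regcomp (&re, list, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  ok = regexec (&re, subject, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

For example, @code{list_matches_char ("[.*]", '*')} should return 1,
since @samp{*} is ordinary inside a list, and
@code{list_matches_char ("[]a]", ']')} should return 1, since a
@samp{]} placed first is a list item.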
1016218Sconklin
1017218Sconklin@menu
1018218Sconklin* Character Class Operators::   [:class:]
1019218Sconklin* Range Operator::          start-end
1020218Sconklin@end menu
1021218Sconklin
1022218Sconklin@ignore
1023218Sconklin(If collating symbols and equivalence class expressions get implemented,
1024218Sconklinthen add this.)
1025218Sconklin
1026218Sconklinnode Collating Symbol Operators
1027218Sconklinsubsubsection Collating Symbol Operators (@code{[.} @dots{} @code{.]})
1028218Sconklin
1029218SconklinIf the syntax bit @code{XX} is set, then you can represent
1030218Sconklincollating symbols inside lists.  You form a @dfn{collating symbol} by
1031218Sconklinputting a collating element between an @dfn{open-collating-symbol
operator} and a @dfn{close-collating-symbol operator}.  @samp{[.}
1033218Sconklinrepresents the open-collating-symbol operator and @samp{.]} represents
1034218Sconklinthe close-collating-symbol operator.  For example, if @samp{ll} is a
1035218Sconklincollating element, then @samp{[[.ll.]]} would match @samp{ll}.
1036218Sconklin
1037218Sconklinnode Equivalence Class Operators
1038218Sconklinsubsubsection Equivalence Class Operators (@code{[=} @dots{} @code{=]})
1039218Sconklin@cindex equivalence class expression in regex
1040218Sconklin@cindex @samp{[=} in regex
1041218Sconklin@cindex @samp{=]} in regex
1042218Sconklin
1043218SconklinIf the syntax bit @code{XX} is set, then Regex recognizes equivalence class
expressions inside lists.  An @dfn{equivalence class expression} is a set
1045218Sconklinof collating elements which all belong to the same equivalence class.
1046218SconklinYou form an equivalence class expression by putting a collating
1047218Sconklinelement between an @dfn{open-equivalence-class operator} and a
1048218Sconklin@dfn{close-equivalence-class operator}.  @samp{[=} represents the
1049218Sconklinopen-equivalence-class operator and @samp{=]} represents the
1050218Sconklinclose-equivalence-class operator.  For example, if @samp{a} and @samp{A}
1051218Sconklinwere an equivalence class, then both @samp{[[=a=]]} and @samp{[[=A=]]}
1052218Sconklinwould match both @samp{a} and @samp{A}.  If the collating element in an
1053218Sconklinequivalence class expression isn't part of an equivalence class, then
1054218Sconklinthe matcher considers the equivalence class expression to be a collating
1055218Sconklinsymbol.
1056218Sconklin
1057218Sconklin@end ignore
1058218Sconklin
1059218Sconklin@node Character Class Operators, Range Operator,  , List Operators
1060218Sconklin@subsection Character Class Operators (@code{[:} @dots{} @code{:]})
1061218Sconklin
1062218Sconklin@cindex character classes
1063218Sconklin@cindex @samp{[:} in regex
1064218Sconklin@cindex @samp{:]} in regex
1065218Sconklin
If the syntax bit @code{RE_CHAR_CLASSES} is set, then Regex
1067218Sconklinrecognizes character class expressions inside lists.  A @dfn{character
1068218Sconklinclass expression} matches one character from a given class.  You form a
1069218Sconklincharacter class expression by putting a character class name between an
1070218Sconklin@dfn{open-character-class operator} (represented by @samp{[:}) and a
1071218Sconklin@dfn{close-character-class operator} (represented by @samp{:]}).  The
1072218Sconklincharacter class names and their meanings are:
1073218Sconklin
1074218Sconklin@table @code
1075218Sconklin
1076218Sconklin@item alnum 
1077218Sconklinletters and digits
1078218Sconklin
1079218Sconklin@item alpha
1080218Sconklinletters
1081218Sconklin
1082218Sconklin@item blank
1083218Sconklinsystem-dependent; for @sc{gnu}, a space or tab
1084218Sconklin
1085218Sconklin@item cntrl
1086218Sconklincontrol characters (in the @sc{ascii} encoding, code 0177 and codes
1087218Sconklinless than 040)
1088218Sconklin
1089218Sconklin@item digit
1090218Sconklindigits
1091218Sconklin
1092218Sconklin@item graph
1093218Sconklinsame as @code{print} except omits space
1094218Sconklin
1095218Sconklin@item lower 
1096218Sconklinlowercase letters
1097218Sconklin
1098218Sconklin@item print
printable characters (in the @sc{ascii} encoding, space through
tilde, that is, codes 040 through 0176)
1101218Sconklin
1102218Sconklin@item punct
1103218Sconklinneither control nor alphanumeric characters
1104218Sconklin
1105218Sconklin@item space
space, horizontal tab, carriage return, newline, vertical tab, and form feed
1107218Sconklin
1108218Sconklin@item upper
1109218Sconklinuppercase letters
1110218Sconklin
1111218Sconklin@item xdigit
1112218Sconklinhexadecimal digits: @code{0}--@code{9}, @code{a}--@code{f}, @code{A}--@code{F}
1113218Sconklin
1114218Sconklin@end table
1115218Sconklin
1116218Sconklin@noindent
1117218SconklinThese correspond to the definitions in the C library's @file{<ctype.h>}
1118218Sconklinfacility.  For example, @samp{[:alpha:]} corresponds to the standard
1119218Sconklinfacility @code{isalpha}.  Regex recognizes character class expressions
only inside of lists; so @samp{[[:alpha:]]} matches any letter, but
@samp{[:alpha:]} outside of a bracket expression is just an ordinary
list that matches any one of the characters @samp{:}, @samp{a},
@samp{l}, @samp{p}, and @samp{h}.
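
To try out individual classes, one might use a sketch like the
following.  It assumes the POSIX interface with extended syntax and the
default C locale; the helper name @code{class_matches_char} is
hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the character class named NAME
   (e.g. "alpha") matches the single character C, 0 otherwise
   (including when NAME is not a valid class name).  */
static int
class_matches_char (const char *name, char c)
{
  char pattern[64];
  char subject[2];
  regex_t re;
  int n, ok;

  /* A character class expression is valid only inside a list.  */
  n = snprintf (pattern, sizeof pattern, "[[:%s:]]", name);
  if (n < 0 || (size_t) n >= sizeof pattern)
    return 0;
  subject[0] = c;
  subject[1] = '\0';
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  ok = regexec (&re, subject, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

For example, @code{class_matches_char ("alpha", 'x')} should return 1
and @code{class_matches_char ("alpha", '5')} should return 0.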
1123218Sconklin
1124218Sconklin@node Range Operator,  , Character Class Operators, List Operators
1125218Sconklin@subsection The Range Operator (@code{-})
1126218Sconklin
1127218SconklinRegex recognizes @dfn{range expressions} inside a list. They represent
1128218Sconklinthose characters
1129218Sconklinthat fall between two elements in the current collating sequence.  You
1130218Sconklinform a range expression by putting a @dfn{range operator} between two 
1131218Sconklin@ignore
1132218Sconklin(If these get implemented, then substitute this for ``characters.'')
1133218Sconklinof any of the following: characters, collating elements, collating symbols,
1134218Sconklinand equivalence class expressions.  The starting point of the range and
1135218Sconklinthe ending point of the range don't have to be the same kind of item,
1136218Sconkline.g., the starting point could be a collating element and the ending
1137218Sconklinpoint could be an equivalence class expression.  If a range's ending
1138218Sconklinpoint is an equivalence class, then all the collating elements in that
1139218Sconklinclass will be in the range.
1140218Sconklin@end ignore
1141218Sconklincharacters.@footnote{You can't use a character class for the starting
1142218Sconklinor ending point of a range, since a character class is not a single
1143218Sconklincharacter.} @samp{-} represents the range operator.  For example,
1144218Sconklin@samp{a-f} within a list represents all the characters from @samp{a}
1145218Sconklinthrough @samp{f}
1146218Sconklininclusively.
1147218Sconklin
1148218SconklinIf the syntax bit @code{RE_NO_EMPTY_RANGES} is set, then if the range's
1149218Sconklinending point collates less than its starting point, the range (and the
1150218Sconklinregular expression containing it) is invalid.  For example, the regular
1151218Sconklinexpression @samp{[z-a]} would be invalid.  If this bit isn't set, then
1152218SconklinRegex considers such a range to be empty.
1153218Sconklin
1154218SconklinSince @samp{-} represents the range operator, if you want to make a
1155218Sconklin@samp{-} character itself
1156218Sconklina list item, you must do one of the following:
1157218Sconklin
1158218Sconklin@itemize @bullet
1159218Sconklin@item
1160218SconklinPut the @samp{-} either first or last in the list.
1161218Sconklin
1162218Sconklin@item
1163218SconklinInclude a range whose starting point collates strictly lower than
1164218Sconklin@samp{-} and whose ending point collates equal or higher.  Unless a
1165218Sconklinrange is the first item in a list, a @samp{-} can't be its starting
1166218Sconklinpoint, but @emph{can} be its ending point.  That is because Regex
1167218Sconklinconsiders @samp{-} to be the range operator unless it is preceded by
1168218Sconklinanother @samp{-}.  For example, in the @sc{ascii} encoding, @samp{)},
1169218Sconklin@samp{*}, @samp{+}, @samp{,}, @samp{-}, @samp{.}, and @samp{/} are
1170218Sconklincontiguous characters in the collating sequence.  You might think that
1171218Sconklin@samp{[)-+--/]} has two ranges: @samp{)-+} and @samp{--/}.  Rather, it
1172218Sconklinhas the ranges @samp{)-+} and @samp{+--}, plus the character @samp{/}, so
1173218Sconklinit matches, e.g., @samp{,}, not @samp{.}.
1174218Sconklin
1175218Sconklin@item
1176218SconklinPut a range whose starting point is @samp{-} first in the list.
1177218Sconklin
1178218Sconklin@end itemize
1179218Sconklin
1180218SconklinFor example, @samp{[-a-z]} matches a lowercase letter or a hyphen (in
1181218SconklinEnglish, in @sc{ascii}).
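
The placement rules for @samp{-} can be checked with a sketch such as
this one.  It assumes the POSIX interface with extended syntax and the
default C locale, where ranges follow the @sc{ascii} order; the helper
name @code{in_list} is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the list LIST (written with its
   brackets, e.g. "[a-f]") matches the single character C, 0 otherwise
   (including on a compile error).  */
static int
in_list (const char *list, char c)
{
  char subject[2];
  regex_t re;
  int ok;

  subject[0] = c;
  subject[1] = '\0';
  if (regcomp (&re, list, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  ok = regexec (&re, subject, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

Here @code{in_list ("[-a-z]", '-')} and @code{in_list ("[a-z-]", '-')}
should both return 1, because a @samp{-} that is first or last in the
list is a list item, while @code{in_list ("[a-f]", 'g')} should
return 0.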
1182218Sconklin
1183218Sconklin
1184218Sconklin@node Grouping Operators, Back-reference Operator, List Operators, Common Operators
1185218Sconklin@section Grouping Operators (@code{(} @dots{} @code{)} or @code{\(} @dots{} @code{\)})
1186218Sconklin
1187218Sconklin@kindex (
1188218Sconklin@kindex )
1189218Sconklin@kindex \(
1190218Sconklin@kindex \)
1191218Sconklin@cindex grouping
1192218Sconklin@cindex subexpressions
1193218Sconklin@cindex parenthesizing
1194218Sconklin
1195218SconklinA @dfn{group}, also known as a @dfn{subexpression}, consists of an
1196218Sconklin@dfn{open-group operator}, any number of other operators, and a
1197218Sconklin@dfn{close-group operator}.  Regex treats this sequence as a unit, just
1198218Sconklinas mathematics and programming languages treat a parenthesized
1199218Sconklinexpression as a unit.
1200218Sconklin
1201218SconklinTherefore, using @dfn{groups}, you can:
1202218Sconklin
1203218Sconklin@itemize @bullet
1204218Sconklin@item
1205218Sconklindelimit the argument(s) to an alternation operator (@pxref{Alternation
1206218SconklinOperator}) or a repetition operator (@pxref{Repetition
1207218SconklinOperators}).
1208218Sconklin
1209218Sconklin@item 
1210218Sconklinkeep track of the indices of the substring that matched a given group.
1211218Sconklin@xref{Using Registers}, for a precise explanation.
1212218SconklinThis lets you:
1213218Sconklin
1214218Sconklin@itemize @bullet
1215218Sconklin@item
1216218Sconklinuse the back-reference operator (@pxref{Back-reference Operator}).
1217218Sconklin
1218218Sconklin@item 
1219218Sconklinuse registers (@pxref{Using Registers}).
1220218Sconklin
1221218Sconklin@end itemize
1222218Sconklin
1223218Sconklin@end itemize
1224218Sconklin
1225218SconklinIf the syntax bit @code{RE_NO_BK_PARENS} is set, then @samp{(} represents
1226218Sconklinthe open-group operator and @samp{)} represents the
1227218Sconklinclose-group operator; otherwise, @samp{\(} and @samp{\)} do.
1228218Sconklin
1229218SconklinIf the syntax bit @code{RE_UNMATCHED_RIGHT_PAREN_ORD} is set and a
1230218Sconklinclose-group operator has no matching open-group operator, then Regex
1231218Sconklinconsiders it to match @samp{)}.
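
To see how a group records the indices of the substring it matched,
consider the following sketch.  It assumes the POSIX interface with
extended syntax, where @samp{(} and @samp{)} represent the grouping
operators and @code{regexec} reports the indices in an array of
@code{regmatch_t}; the helper name @code{group_span} is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: match the POSIX extended pattern PATTERN
   against STR and store in *START and *END the offsets of the
   substring matched by group number GROUP (0 means the whole match).
   Return 1 on success, or 0 if there is no match, GROUP is out of
   range, or the group did not participate in the match.  */
static int
group_span (const char *pattern, const char *str, int group,
            int *start, int *end)
{
  regex_t re;
  regmatch_t m[10];
  int ok = 0;

  if (group < 0 || group >= 10)
    return 0;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return 0;
  /* Unused entries of M are filled with -1 offsets.  */
  if (regexec (&re, str, 10, m, 0) == 0 && m[group].rm_so != -1)
    {
      *start = (int) m[group].rm_so;
      *end = (int) m[group].rm_eo;
      ok = 1;
    }
  regfree (&re);
  return ok;
}
```

Matching @samp{fo(o|b)ar} against @samp{xfobar} this way should report
offsets 1 and 6 for the whole match and offsets 3 and 4 for group 1,
which matched the @samp{b}.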
1232218Sconklin
1233218Sconklin
1234218Sconklin@node Back-reference Operator, Anchoring Operators, Grouping Operators, Common Operators
@section The Back-reference Operator (@code{\}@var{digit})
1236218Sconklin
1237218Sconklin@cindex back references
1238218Sconklin
1239218SconklinIf the syntax bit @code{RE_NO_BK_REF} isn't set, then Regex recognizes
1240218Sconklinback references.  A back reference matches a specified preceding group.
1241218SconklinThe back reference operator is represented by @samp{\@var{digit}}
1242218Sconklinanywhere after the end of a regular expression's @w{@var{digit}-th}
1243218Sconklingroup (@pxref{Grouping Operators}).
1244218Sconklin
1245218Sconklin@var{digit} must be between @samp{1} and @samp{9}.  The matcher assigns
1246218Sconklinnumbers 1 through 9 to the first nine groups it encounters.  By using
1247218Sconklinone of @samp{\1} through @samp{\9} after the corresponding group's
1248218Sconklinclose-group operator, you can match a substring identical to the
1249218Sconklinone that the group does.
1250218Sconklin
1251218SconklinBack references match according to the following (in all examples below,
1252218Sconklin@samp{(} represents the open-group, @samp{)} the close-group, @samp{@{}
1253218Sconklinthe open-interval and @samp{@}} the close-interval operator):
1254218Sconklin
1255218Sconklin@itemize @bullet
1256218Sconklin@item
1257218SconklinIf the group matches a substring, the back reference matches an
1258218Sconklinidentical substring.  For example, @samp{(a)\1} matches @samp{aa} and
1259218Sconklin@samp{(bana)na\1bo\1} matches @samp{bananabanabobana}.  Likewise,
1260218Sconklin@samp{(.*)\1} matches any (newline-free if the syntax bit
1261218Sconklin@code{RE_DOT_NEWLINE} isn't set) string that is composed of two
1262218Sconklinidentical halves; the @samp{(.*)} matches the first half and the
1263218Sconklin@samp{\1} matches the second half.
1264218Sconklin
1265218Sconklin@item
1266218SconklinIf the group matches more than once (as it might if followed
1267218Sconklinby, e.g., a repetition operator), then the back reference matches the
1268218Sconklinsubstring the group @emph{last} matched.  For example,
1269218Sconklin@samp{((a*)b)*\1\2} matches @samp{aabababa}; first @w{group 1} (the
1270218Sconklinouter one) matches @samp{aab} and @w{group 2} (the inner one) matches
1271218Sconklin@samp{aa}.  Then @w{group 1} matches @samp{ab} and @w{group 2} matches
1272218Sconklin@samp{a}.  So, @samp{\1} matches @samp{ab} and @samp{\2} matches
1273218Sconklin@samp{a}.
1274218Sconklin
1275218Sconklin@item
1276218SconklinIf the group doesn't participate in a match, i.e., it is part of an
1277218Sconklinalternative not taken or a repetition operator allows zero repetitions
1278218Sconklinof it, then the back reference makes the whole match fail.  For example,
1279218Sconklin@samp{(one()|two())-and-(three\2|four\3)} matches @samp{one-and-three}
1280218Sconklinand @samp{two-and-four}, but not @samp{one-and-four} or
@samp{two-and-three}.  To see why, suppose the pattern has matched
@samp{one-and-}: its @w{group 2} has matched the empty string and its
@w{group 3} hasn't participated in the match.  If it then matches
@samp{four}, it must next back reference @w{group 3}, because @samp{\3}
follows the @samp{four}; since @w{group 3} didn't participate in the
match, the match fails.
1287218Sconklin
1288218Sconklin@end itemize
1289218Sconklin
1290218SconklinYou can use a back reference as an argument to a repetition operator.  For
example, @samp{(a(b))\2*} matches @samp{a} followed by one or more
1292218Sconklin@samp{b}s.  Similarly, @samp{(a(b))\2@{3@}} matches @samp{abbbb}.
1293218Sconklin
1294218SconklinIf there is no preceding @w{@var{digit}-th} subexpression, the regular
1295218Sconklinexpression is invalid.
1296218Sconklin
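The first and last rules above can be checked with a short program.
What follows is a minimal sketch rather than part of Regex: it assumes
the @sc{posix} interface (@code{regcomp} and @code{regexec}) with basic
syntax, in which the open-group and close-group operators are written
@samp{\(} and @samp{\)}, and the helper name @code{bre_matches} is our
own.

@verbatim
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN (POSIX basic syntax) matches somewhere in TEXT,
   0 if it does not, and -1 if PATTERN does not compile.  */
int
bre_matches (const char *pattern, const char *text)
{
  regex_t re;
  int status;

  assert (pattern != NULL && text != NULL);

  /* A back reference to a group that does not precede it makes the
     pattern invalid, so compilation fails here.  */
  if (regcomp (&re, pattern, 0) != 0)
    return -1;

  status = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return status == 0;
}
```
@end verbatim

With this helper, @code{bre_matches ("^\\(.*\\)\\1$", "abcabc")}
returns 1 because the string consists of two identical halves, the same
pattern returns 0 for @samp{abcabd}, and
@code{bre_matches ("\\(a\\)\\2", "aa")} returns @math{-1} because there
is no second group to refer to.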
1297218Sconklin
1298218Sconklin@node Anchoring Operators,  , Back-reference Operator, Common Operators
1299218Sconklin@section Anchoring Operators    
1300218Sconklin
1301218Sconklin@cindex anchoring
1302218Sconklin@cindex regexp anchoring
1303218Sconklin
1304218SconklinThese operators can constrain a pattern to match only at the beginning or
1305218Sconklinend of the entire string or at the beginning or end of a line.
1306218Sconklin
1307218Sconklin@menu
1308218Sconklin* Match-beginning-of-line Operator::  ^
1309218Sconklin* Match-end-of-line Operator::        $
1310218Sconklin@end menu
1311218Sconklin
1312218Sconklin
1313218Sconklin@node Match-beginning-of-line Operator, Match-end-of-line Operator,  , Anchoring Operators
1314218Sconklin@subsection The Match-beginning-of-line Operator (@code{^})
1315218Sconklin
1316218Sconklin@kindex ^
1317218Sconklin@cindex beginning-of-line operator
1318218Sconklin@cindex anchors
1319218Sconklin
1320218SconklinThis operator can match the empty string either at the beginning of the
1321218Sconklinstring or after a newline character.  Thus, it is said to @dfn{anchor}
1322218Sconklinthe pattern to the beginning of a line.
1323218Sconklin
1324218SconklinIn the cases following, @samp{^} represents this operator.  (Otherwise,
1325218Sconklin@samp{^} is ordinary.)
1326218Sconklin
1327218Sconklin@itemize @bullet
1328218Sconklin
1329218Sconklin@item
1330218SconklinIt (the @samp{^}) is first in the pattern, as in @samp{^foo}.
1331218Sconklin
1332218Sconklin@cnindex RE_CONTEXT_INDEP_ANCHORS @r{(and @samp{^})}
1333218Sconklin@item
1334218SconklinThe syntax bit @code{RE_CONTEXT_INDEP_ANCHORS} is set, and it is outside
1335218Sconklina bracket expression.
1336218Sconklin
1337218Sconklin@cindex open-group operator and @samp{^}
1338218Sconklin@cindex alternation operator and @samp{^}
1339218Sconklin@item
1340218SconklinIt follows an open-group or alternation operator, as in @samp{a\(^b\)}
1341218Sconklinand @samp{a\|^b}.  @xref{Grouping Operators}, and @ref{Alternation
1342218SconklinOperator}.
1343218Sconklin
1344218Sconklin@end itemize
1345218Sconklin
1346218SconklinThese rules imply that some valid patterns containing @samp{^} cannot be
1347218Sconklinmatched; for example, @samp{foo^bar} if @code{RE_CONTEXT_INDEP_ANCHORS}
1348218Sconklinis set.
1349218Sconklin
1350218Sconklin@vindex not_bol @r{field in pattern buffer}
1351218SconklinIf the @code{not_bol} field is set in the pattern buffer (@pxref{GNU
1352218SconklinPattern Buffers}), then @samp{^} fails to match at the beginning of the
1353218Sconklinstring.  @xref{POSIX Matching}, for when you might find this useful.
1354218Sconklin
1355218Sconklin@vindex newline_anchor @r{field in pattern buffer}
If the @code{newline_anchor} field is not set in the pattern buffer, then
1357218Sconklin@samp{^} fails to match after a newline.  This is useful when you do not
1358218Sconklinregard the string to be matched as broken into lines.
1359218Sconklin
1360218Sconklin
1361218Sconklin@node Match-end-of-line Operator,  , Match-beginning-of-line Operator, Anchoring Operators
1362218Sconklin@subsection The Match-end-of-line Operator (@code{$})
1363218Sconklin
1364218Sconklin@kindex $
1365218Sconklin@cindex end-of-line operator
1366218Sconklin@cindex anchors
1367218Sconklin
1368218SconklinThis operator can match the empty string either at the end of
1369218Sconklinthe string or before a newline character in the string.  Thus, it is
1370218Sconklinsaid to @dfn{anchor} the pattern to the end of a line.
1371218Sconklin
It is always represented by @samp{$}.  For example, @samp{foo$} usually
matches both @samp{foo} and the first three characters of
@samp{foo\nbar}.
1375218Sconklin
1376218SconklinIts interaction with the syntax bits and pattern buffer fields is
1377218Sconklinexactly the dual of @samp{^}'s; see the previous section.  (That is,
1378218Sconklin``beginning'' becomes ``end'', ``next'' becomes ``previous'', and
1379218Sconklin``after'' becomes ``before''.)
1380218Sconklin
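To see how the two anchors behave around newlines and at the ends of
the string, you can experiment with the @sc{posix} interface, whose
flags @code{REG_NEWLINE}, @code{REG_NOTBOL} and @code{REG_NOTEOL}
control the same behaviors.  The following is only a sketch under that
assumption; the helper name @code{first_match} is hypothetical.

@verbatim
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the offset in TEXT at which PATTERN first matches, or -1 if
   it does not match or does not compile.  CFLAGS go to regcomp and
   EFLAGS to regexec.  */
int
first_match (const char *pattern, int cflags, const char *text, int eflags)
{
  regex_t re;
  regmatch_t whole[1];
  int status;

  assert (pattern != NULL && text != NULL);

  if (regcomp (&re, pattern, cflags) != 0)
    return -1;

  status = regexec (&re, text, 1, whole, eflags);
  regfree (&re);
  return status == 0 ? (int) whole[0].rm_so : -1;
}
```
@end verbatim

For the string @code{"foo\nbar"}, the pattern @samp{^bar} matches at
offset 4 when @code{REG_NEWLINE} is given and not at all otherwise;
@samp{foo$} behaves the same way at offset 0.  Passing
@code{REG_NOTBOL} keeps @samp{^foo} from matching at the start of the
string.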
1381218Sconklin
1382218Sconklin@node GNU Operators, GNU Emacs Operators, Common Operators, Top
1383218Sconklin@chapter GNU Operators
1384218Sconklin
1385218SconklinFollowing are operators that @sc{gnu} defines (and @sc{posix} doesn't).
1386218Sconklin
1387218Sconklin@menu
1388218Sconklin* Word Operators::
1389218Sconklin* Buffer Operators::
1390218Sconklin@end menu
1391218Sconklin
1392218Sconklin@node Word Operators, Buffer Operators,  , GNU Operators
1393218Sconklin@section Word Operators
1394218Sconklin
1395218SconklinThe operators in this section require Regex to recognize parts of words.
1396218SconklinRegex uses a syntax table to determine whether or not a character is
1397218Sconklinpart of a word, i.e., whether or not it is @dfn{word-constituent}.
1398218Sconklin
1399218Sconklin@menu
1400218Sconklin* Non-Emacs Syntax Tables::
1401218Sconklin* Match-word-boundary Operator::	\b
1402218Sconklin* Match-within-word Operator::		\B
1403218Sconklin* Match-beginning-of-word Operator::	\<
1404218Sconklin* Match-end-of-word Operator::		\>
1405218Sconklin* Match-word-constituent Operator::	\w
1406218Sconklin* Match-non-word-constituent Operator::	\W
1407218Sconklin@end menu
1408218Sconklin
1409218Sconklin@node Non-Emacs Syntax Tables, Match-word-boundary Operator,  , Word Operators
1410218Sconklin@subsection Non-Emacs Syntax Tables    
1411218Sconklin
1412218SconklinA @dfn{syntax table} is an array indexed by the characters in your
1413218Sconklincharacter set.  In the @sc{ascii} encoding, therefore, a syntax table
1414218Sconklinhas 256 elements.  Regex always uses a @code{char *} variable
1415218Sconklin@code{re_syntax_table} as its syntax table.  In some cases, it
1416218Sconklininitializes this variable and in others it expects you to initialize it.
1417218Sconklin
1418218Sconklin@itemize @bullet
1419218Sconklin@item
1420218SconklinIf Regex is compiled with the preprocessor symbols @code{emacs} and
1421218Sconklin@code{SYNTAX_TABLE} both undefined, then Regex allocates
1422218Sconklin@code{re_syntax_table} and initializes an element @var{i} either to
@code{Sword} (which it defines) if @var{i} is a letter, digit, or
1424218Sconklin@samp{_}, or to zero if it's not.
1425218Sconklin
1426218Sconklin@item
1427218SconklinIf Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
1428218Sconklindefined, then Regex expects you to define a @code{char *} variable
1429218Sconklin@code{re_syntax_table} to be a valid syntax table.
1430218Sconklin
1431218Sconklin@item
1432218Sconklin@xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
1433218Sconklinthe preprocessor symbol @code{emacs} defined.
1434218Sconklin
1435218Sconklin@end itemize
1436218Sconklin
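The default initialization described in the first case can be pictured
as follows.  This is an illustrative sketch, not the code in
@file{regex.c}: the names @code{word_table}, @code{init_word_table} and
@code{is_word_constituent} are hypothetical, and @code{WORD_CLASS}
merely stands in for @code{Sword}, whose actual value we do not rely
on.

@verbatim
```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define TABLE_SIZE (UCHAR_MAX + 1)  /* 256 when characters have 8 bits */
#define WORD_CLASS 1                /* hypothetical stand-in for Sword */

static char word_table[TABLE_SIZE];

/* Mark letters, digits and `_' as word-constituent and everything
   else as zero.  The letter loops assume an ASCII-like character set
   in which each range is contiguous.  */
void
init_word_table (void)
{
  int c;

  memset (word_table, 0, sizeof word_table);
  for (c = 'a'; c <= 'z'; c++)
    word_table[c] = WORD_CLASS;
  for (c = 'A'; c <= 'Z'; c++)
    word_table[c] = WORD_CLASS;
  for (c = '0'; c <= '9'; c++)
    word_table[c] = WORD_CLASS;
  word_table['_'] = WORD_CLASS;
}

/* Return nonzero if character code C is word-constituent.  */
int
is_word_constituent (int c)
{
  assert (c >= 0 && c < TABLE_SIZE);
  return word_table[c] == WORD_CLASS;
}
```
@end verbatim

After calling @code{init_word_table}, looking up @samp{a}, @samp{7} or
@samp{_} reports a word-constituent character, while @samp{-} and the
space character do not.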
1437218Sconklin@node Match-word-boundary Operator, Match-within-word Operator, Non-Emacs Syntax Tables, Word Operators
1438218Sconklin@subsection The Match-word-boundary Operator (@code{\b})
1439218Sconklin
1440218Sconklin@cindex @samp{\b}
1441218Sconklin@cindex word boundaries, matching
1442218Sconklin
1443218SconklinThis operator (represented by @samp{\b}) matches the empty string at
1444218Sconklineither the beginning or the end of a word.  For example, @samp{\brat\b}
1445218Sconklinmatches the separate word @samp{rat}.
1446218Sconklin
1447218Sconklin@node Match-within-word Operator, Match-beginning-of-word Operator, Match-word-boundary Operator, Word Operators
1448218Sconklin@subsection The Match-within-word Operator (@code{\B})
1449218Sconklin
1450218Sconklin@cindex @samp{\B}
1451218Sconklin
1452218SconklinThis operator (represented by @samp{\B}) matches the empty string within
1453218Sconklina word. For example, @samp{c\Brat\Be} matches @samp{crate}, but
1454218Sconklin@samp{dirty \Brat} doesn't match @samp{dirty rat}.
1455218Sconklin
1456218Sconklin@node Match-beginning-of-word Operator, Match-end-of-word Operator, Match-within-word Operator, Word Operators
1457218Sconklin@subsection The Match-beginning-of-word Operator (@code{\<})
1458218Sconklin
1459218Sconklin@cindex @samp{\<}
1460218Sconklin
1461218SconklinThis operator (represented by @samp{\<}) matches the empty string at the
1462218Sconklinbeginning of a word.
1463218Sconklin
1464218Sconklin@node Match-end-of-word Operator, Match-word-constituent Operator, Match-beginning-of-word Operator, Word Operators
1465218Sconklin@subsection The Match-end-of-word Operator (@code{\>})
1466218Sconklin
1467218Sconklin@cindex @samp{\>}
1468218Sconklin
1469218SconklinThis operator (represented by @samp{\>}) matches the empty string at the
1470218Sconklinend of a word.
1471218Sconklin
1472218Sconklin@node Match-word-constituent Operator, Match-non-word-constituent Operator, Match-end-of-word Operator, Word Operators
1473218Sconklin@subsection The Match-word-constituent Operator (@code{\w})
1474218Sconklin
1475218Sconklin@cindex @samp{\w}
1476218Sconklin
1477218SconklinThis operator (represented by @samp{\w}) matches any word-constituent
1478218Sconklincharacter.
1479218Sconklin
1480218Sconklin@node Match-non-word-constituent Operator,  , Match-word-constituent Operator, Word Operators
1481218Sconklin@subsection The Match-non-word-constituent Operator (@code{\W})
1482218Sconklin
1483218Sconklin@cindex @samp{\W}
1484218Sconklin
1485218SconklinThis operator (represented by @samp{\W}) matches any character that is
1486218Sconklinnot word-constituent.
1487218Sconklin
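The word operators can be tried out together in a few lines.  The
sketch below assumes the GNU C Library, whose @code{regcomp} accepts
these @sc{gnu} operators in basic syntax; other implementations of the
@sc{posix} interface may reject them.  The helper name
@code{word_op_found} is hypothetical.

@verbatim
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN matches somewhere in TEXT, 0 if it does not,
   and -1 if PATTERN does not compile.  */
int
word_op_found (const char *pattern, const char *text)
{
  regex_t re;
  int status;

  assert (pattern != NULL && text != NULL);

  if (regcomp (&re, pattern, 0) != 0)
    return -1;

  status = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return status == 0;
}
```
@end verbatim

Under that assumption, @samp{\brat\b} is found in @samp{dirty rat} but
not in @samp{crate}, @samp{\<rat} is not found in @samp{brat}, and
@samp{rat\>} is found in @samp{brat} but not in @samp{rats}.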
1488218Sconklin
1489218Sconklin@node Buffer Operators,  , Word Operators, GNU Operators
1490218Sconklin@section Buffer Operators    
1491218Sconklin
1492218SconklinFollowing are operators which work on buffers.  In Emacs, a @dfn{buffer}
1493218Sconklinis, naturally, an Emacs buffer.  For other programs, Regex considers the
1494218Sconklinentire string to be matched as the buffer.
1495218Sconklin
1496218Sconklin@menu
1497218Sconklin* Match-beginning-of-buffer Operator::	\`
1498218Sconklin* Match-end-of-buffer Operator::	\'
1499218Sconklin@end menu
1500218Sconklin
1501218Sconklin
1502218Sconklin@node Match-beginning-of-buffer Operator, Match-end-of-buffer Operator,  , Buffer Operators
1503218Sconklin@subsection The Match-beginning-of-buffer Operator (@code{\`})
1504218Sconklin
1505218Sconklin@cindex @samp{\`}
1506218Sconklin
1507218SconklinThis operator (represented by @samp{\`}) matches the empty string at the
1508218Sconklinbeginning of the buffer.
1509218Sconklin
1510218Sconklin@node Match-end-of-buffer Operator,  , Match-beginning-of-buffer Operator, Buffer Operators
1511218Sconklin@subsection The Match-end-of-buffer Operator (@code{\'})
1512218Sconklin
1513218Sconklin@cindex @samp{\'}
1514218Sconklin
1515218SconklinThis operator (represented by @samp{\'}) matches the empty string at the
1516218Sconklinend of the buffer.
1517218Sconklin
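The difference between the buffer operators and the line anchors shows
up only when the string contains newlines.  The following sketch
assumes the GNU C Library, whose @code{regcomp} accepts the buffer
operators, and uses @code{REG_NEWLINE} so that @samp{^} and @samp{$}
match at line boundaries; the helper name @code{buffer_op_offset} is
hypothetical.

@verbatim
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the offset in TEXT at which PATTERN first matches when lines
   are significant (REG_NEWLINE), or -1 if it does not match or does
   not compile.  */
int
buffer_op_offset (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t whole[1];
  int status;

  assert (pattern != NULL && text != NULL);

  if (regcomp (&re, pattern, REG_NEWLINE) != 0)
    return -1;

  status = regexec (&re, text, 1, whole, 0);
  regfree (&re);
  return status == 0 ? (int) whole[0].rm_so : -1;
}
```
@end verbatim

For the string @code{"foo\nbar"}, the line anchor in @samp{^bar}
matches at offset 4, whereas the buffer operator in @samp{\`bar} does
not match at all, since @samp{bar} is not at the beginning of the
buffer.  Likewise @samp{foo$} matches but @samp{foo\'} does not.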
1518218Sconklin
1519218Sconklin@node GNU Emacs Operators, What Gets Matched?, GNU Operators, Top
1520218Sconklin@chapter GNU Emacs Operators
1521218Sconklin
1522218SconklinFollowing are operators that @sc{gnu} defines (and @sc{posix} doesn't)
1523218Sconklinthat you can use only when Regex is compiled with the preprocessor
1524218Sconklinsymbol @code{emacs} defined.  
1525218Sconklin
1526218Sconklin@menu
1527218Sconklin* Syntactic Class Operators::
1528218Sconklin@end menu
1529218Sconklin
1530218Sconklin
1531218Sconklin@node Syntactic Class Operators,  ,  , GNU Emacs Operators
1532218Sconklin@section Syntactic Class Operators
1533218Sconklin
1534218SconklinThe operators in this section require Regex to recognize the syntactic
1535218Sconklinclasses of characters.  Regex uses a syntax table to determine this.
1536218Sconklin
1537218Sconklin@menu
1538218Sconklin* Emacs Syntax Tables::
1539218Sconklin* Match-syntactic-class Operator::	\sCLASS
1540218Sconklin* Match-not-syntactic-class Operator::  \SCLASS
1541218Sconklin@end menu
1542218Sconklin
1543218Sconklin@node Emacs Syntax Tables, Match-syntactic-class Operator,  , Syntactic Class Operators
1544218Sconklin@subsection Emacs Syntax Tables
1545218Sconklin
1546218SconklinA @dfn{syntax table} is an array indexed by the characters in your
1547218Sconklincharacter set.  In the @sc{ascii} encoding, therefore, a syntax table
1548218Sconklinhas 256 elements.
1549218Sconklin
1550218SconklinIf Regex is compiled with the preprocessor symbol @code{emacs} defined,
1551218Sconklinthen Regex expects you to define and initialize the variable
1552218Sconklin@code{re_syntax_table} to be an Emacs syntax table.  Emacs' syntax
1553218Sconklintables are more complicated than Regex's own (@pxref{Non-Emacs Syntax
1554218SconklinTables}).  @xref{Syntax, , Syntax, emacs, The GNU Emacs User's Manual},
1555218Sconklinfor a description of Emacs' syntax tables.
1556218Sconklin
1557218Sconklin@node Match-syntactic-class Operator, Match-not-syntactic-class Operator, Emacs Syntax Tables, Syntactic Class Operators
1558218Sconklin@subsection The Match-syntactic-class Operator (@code{\s}@var{class})
1559218Sconklin
1560218Sconklin@cindex @samp{\s}
1561218Sconklin
1562218SconklinThis operator matches any character whose syntactic class is represented
1563218Sconklinby a specified character.  @samp{\s@var{class}} represents this operator
1564218Sconklinwhere @var{class} is the character representing the syntactic class you
1565218Sconklinwant.  For example, @samp{w} represents the syntactic
1566218Sconklinclass of word-constituent characters, so @samp{\sw} matches any
1567218Sconklinword-constituent character.
1568218Sconklin
1569218Sconklin@node Match-not-syntactic-class Operator,  , Match-syntactic-class Operator, Syntactic Class Operators
1570218Sconklin@subsection The Match-not-syntactic-class Operator (@code{\S}@var{class})
1571218Sconklin
1572218Sconklin@cindex @samp{\S}
1573218Sconklin
1574218SconklinThis operator is similar to the match-syntactic-class operator except
1575218Sconklinthat it matches any character whose syntactic class is @emph{not}
1576218Sconklinrepresented by the specified character.  @samp{\S@var{class}} represents
1577218Sconklinthis operator.  For example, @samp{w} represents the syntactic class of
1578218Sconklinword-constituent characters, so @samp{\Sw} matches any character that is
1579218Sconklinnot word-constituent.
1580218Sconklin
1581218Sconklin
1582218Sconklin@node What Gets Matched?, Programming with Regex, GNU Emacs Operators, Top
1583218Sconklin@chapter What Gets Matched?
1584218Sconklin
1585218SconklinRegex usually matches strings according to the ``leftmost longest''
1586218Sconklinrule; that is, it chooses the longest of the leftmost matches.  This
does not mean that, for a regular expression containing subexpressions,
it simply chooses the longest match for each subexpression, left to
right; the overall match must also be the longest possible one.
1590218Sconklin
1591218SconklinFor example, @samp{(ac*)(c*d[ac]*)\1} matches @samp{acdacaaa}, not
1592218Sconklin@samp{acdac}, as it would if it were to choose the longest match for the
1593218Sconklinfirst subexpression.
1594218Sconklin
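The same rule can be observed with a smaller pattern that needs no back
reference.  The sketch below is an illustration only: it assumes the
@sc{posix} interface with extended syntax, and both the pattern
@samp{(a|ab)(c|bcd)} and the helper name @code{match_length} are our
own choices rather than anything taken from Regex.

@verbatim
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the length of the leftmost match of PATTERN (POSIX extended
   syntax) in TEXT, or -1 if there is no match or PATTERN does not
   compile.  */
int
match_length (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t whole[1];
  int status;

  assert (pattern != NULL && text != NULL);

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  status = regexec (&re, text, 1, whole, 0);
  regfree (&re);
  return status == 0 ? (int) (whole[0].rm_eo - whole[0].rm_so) : -1;
}
```
@end verbatim

Matching @samp{(a|ab)(c|bcd)} against @samp{abcd} gives a length of 4:
the first group settles for @samp{a} so that the second can take
@samp{bcd}.  Had the first group taken its own longest alternative,
@samp{ab}, the overall match would have been only @samp{abc}.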
1595218Sconklin
1596218Sconklin@node Programming with Regex, Copying, What Gets Matched?, Top
1597218Sconklin@chapter Programming with Regex
1598218Sconklin
1599218SconklinHere we describe how you use the Regex data structures and functions in
1600218SconklinC programs.  Regex has three interfaces: one designed for @sc{gnu}, one
1601218Sconklincompatible with @sc{posix} and one compatible with Berkeley @sc{unix}.
1602218Sconklin
1603218Sconklin@menu
1604218Sconklin* GNU Regex Functions::
1605218Sconklin* POSIX Regex Functions::
1606218Sconklin* BSD Regex Functions::
1607218Sconklin@end menu
1608218Sconklin
1609218Sconklin
1610218Sconklin@node GNU Regex Functions, POSIX Regex Functions,  , Programming with Regex
1611218Sconklin@section GNU Regex Functions
1612218Sconklin
1613218SconklinIf you're writing code that doesn't need to be compatible with either
1614218Sconklin@sc{posix} or Berkeley @sc{unix}, you can use these functions.  They
1615218Sconklinprovide more options than the other interfaces.
1616218Sconklin
1617218Sconklin@menu
1618218Sconklin* GNU Pattern Buffers::         The re_pattern_buffer type.
1619218Sconklin* GNU Regular Expression Compiling::  re_compile_pattern ()
1620218Sconklin* GNU Matching::                re_match ()
1621218Sconklin* GNU Searching::               re_search ()
1622218Sconklin* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
1623218Sconklin* Searching with Fastmaps::     re_compile_fastmap ()
1624218Sconklin* GNU Translate Tables::        The `translate' field.
1625218Sconklin* Using Registers::             The re_registers type and related fns.
1626218Sconklin* Freeing GNU Pattern Buffers::  regfree ()
1627218Sconklin@end menu
1628218Sconklin
1629218Sconklin
1630218Sconklin@node GNU Pattern Buffers, GNU Regular Expression Compiling,  , GNU Regex Functions
1631218Sconklin@subsection GNU Pattern Buffers
1632218Sconklin
1633218Sconklin@cindex pattern buffer, definition of
1634218Sconklin@tindex re_pattern_buffer @r{definition}
1635218Sconklin@tindex struct re_pattern_buffer @r{definition}
1636218Sconklin
1637218SconklinTo compile, match, or search for a given regular expression, you must
1638218Sconklinsupply a pattern buffer.  A @dfn{pattern buffer} holds one compiled
1639218Sconklinregular expression.@footnote{Regular expressions are also referred to as
1640218Sconklin``patterns,'' hence the name ``pattern buffer.''}
1641218Sconklin
1642218SconklinYou can have several different pattern buffers simultaneously, each
1643218Sconklinholding a compiled pattern for a different regular expression.
1644218Sconklin
1645218Sconklin@file{regex.h} defines the pattern buffer @code{struct} as follows:
1646218Sconklin
1647218Sconklin@example
1648218Sconklin[[[ pattern_buffer ]]]
1649218Sconklin@end example
1650218Sconklin
1651218Sconklin
1652218Sconklin@node GNU Regular Expression Compiling, GNU Matching, GNU Pattern Buffers, GNU Regex Functions
1653218Sconklin@subsection GNU Regular Expression Compiling
1654218Sconklin
1655218SconklinIn @sc{gnu}, you can both match and search for a given regular
1656218Sconklinexpression.  To do either, you must first compile it in a pattern buffer
1657218Sconklin(@pxref{GNU Pattern Buffers}).
1658218Sconklin
1659218Sconklin@cindex syntax initialization
1660218Sconklin@vindex re_syntax_options @r{initialization}
1661218SconklinRegular expressions match according to the syntax with which they were
1662218Sconklincompiled; with @sc{gnu}, you indicate what syntax you want by setting
1663218Sconklinthe variable @code{re_syntax_options} (declared in @file{regex.h} and
1664218Sconklindefined in @file{regex.c}) before calling the compiling function,
1665218Sconklin@code{re_compile_pattern} (see below).  @xref{Syntax Bits}, and
1666218Sconklin@ref{Predefined Syntaxes}.
1667218Sconklin
1668218SconklinYou can change the value of @code{re_syntax_options} at any time.
1669218SconklinUsually, however, you set its value once and then never change it.
1670218Sconklin
1671218Sconklin@cindex pattern buffer initialization
1672218Sconklin@code{re_compile_pattern} takes a pattern buffer as an argument.  You
1673218Sconklinmust initialize the following fields:
1674218Sconklin
1675218Sconklin@table @code
1676218Sconklin
1679218Sconklin@item translate
1680218Sconklin@vindex translate @r{initialization}
1681218SconklinInitialize this to point to a translate table if you want one, or to
1682218Sconklinzero if you don't.  We explain translate tables in @ref{GNU Translate
1683218SconklinTables}.
1684218Sconklin
1685218Sconklin@item fastmap
1686218Sconklin@vindex fastmap @r{initialization}
Initialize this to the address of a fastmap array if you want a fastmap
(@pxref{Searching with Fastmaps}), or to zero if you don't.
1689218Sconklin
1690218Sconklin@item buffer
1691218Sconklin@itemx allocated
1692218Sconklin@vindex buffer @r{initialization}
1693218Sconklin@vindex allocated @r{initialization}
1694218Sconklin@findex malloc
1695218SconklinIf you want @code{re_compile_pattern} to allocate memory for the
1696218Sconklincompiled pattern, set both of these to zero.  If you have an existing
1697218Sconklinblock of memory (allocated with @code{malloc}) you want Regex to use,
1698218Sconklinset @code{buffer} to its address and @code{allocated} to its size (in
1699218Sconklinbytes).
1700218Sconklin
1701218Sconklin@code{re_compile_pattern} uses @code{realloc} to extend the space for
1702218Sconklinthe compiled pattern as necessary.
1703218Sconklin
1704218Sconklin@end table
1705218Sconklin
1706218SconklinTo compile a pattern buffer, use:
1707218Sconklin
1708218Sconklin@findex re_compile_pattern
1709218Sconklin@example
1710218Sconklinchar * 
1711218Sconklinre_compile_pattern (const char *@var{regex}, const int @var{regex_size}, 
1712218Sconklin                    struct re_pattern_buffer *@var{pattern_buffer})
1713218Sconklin@end example
1714218Sconklin
1715218Sconklin@noindent
1716218Sconklin@var{regex} is the regular expression's address, @var{regex_size} is its
1717218Sconklinlength, and @var{pattern_buffer} is the pattern buffer's address.
1718218Sconklin
1719218SconklinIf @code{re_compile_pattern} successfully compiles the regular
1720218Sconklinexpression, it returns zero and sets @code{*@var{pattern_buffer}} to the
1721218Sconklincompiled pattern.  It sets the pattern buffer's fields as follows:
1722218Sconklin
1723218Sconklin@table @code
1724218Sconklin@item buffer
1725218Sconklin@vindex buffer @r{field, set by @code{re_compile_pattern}}
1726218Sconklinto the compiled pattern.
1727218Sconklin
1728218Sconklin@item used
1729218Sconklin@vindex used @r{field, set by @code{re_compile_pattern}}
1730218Sconklinto the number of bytes the compiled pattern in @code{buffer} occupies.
1731218Sconklin
1732218Sconklin@item syntax
1733218Sconklin@vindex syntax @r{field, set by @code{re_compile_pattern}}
1734218Sconklinto the current value of @code{re_syntax_options}.
1735218Sconklin
1736218Sconklin@item re_nsub
1737218Sconklin@vindex re_nsub @r{field, set by @code{re_compile_pattern}}
1738218Sconklinto the number of subexpressions in @var{regex}.
1739218Sconklin
1740218Sconklin@item fastmap_accurate
1741218Sconklin@vindex fastmap_accurate @r{field, set by @code{re_compile_pattern}}
1742218Sconklinto zero on the theory that the pattern you're compiling is different
from the one previously compiled into @code{buffer}; in that case (since
1744218Sconklinyou can't make a fastmap without a compiled pattern), 
1745218Sconklin@code{fastmap} would either contain an incompatible fastmap, or nothing
1746218Sconklinat all.
1747218Sconklin
1748218Sconklin@c xx what else?
1749218Sconklin@end table
1750218Sconklin
1751218SconklinIf @code{re_compile_pattern} can't compile @var{regex}, it returns an
1752218Sconklinerror string corresponding to one of the errors listed in @ref{POSIX
1753218SconklinRegular Expression Compiling}.
1754218Sconklin
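Putting the steps of this section together might look like the
following.  This is a sketch under stated assumptions, not a definitive
recipe: it assumes the GNU C Library, whose @file{regex.h} exposes the
@sc{gnu} interface when @code{_GNU_SOURCE} is defined, it picks the
predefined syntax @code{RE_SYNTAX_POSIX_EXTENDED} arbitrarily, and the
helper name @code{compile_extended} is hypothetical.

@verbatim
```c
#define _GNU_SOURCE             /* expose the GNU interface in <regex.h> */
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Compile REGEX into *BUF using the POSIX extended syntax.  Return
   zero on success or an error string on failure, exactly as
   re_compile_pattern does.  */
const char *
compile_extended (struct re_pattern_buffer *buf, const char *regex)
{
  assert (buf != NULL && regex != NULL);

  /* Zero every field.  In particular `buffer' and `allocated' are
     zero, so Regex allocates the space itself, and `translate' and
     `fastmap' are zero, so neither table is used.  */
  memset (buf, 0, sizeof *buf);

  /* Choose the syntax before compiling.  */
  re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;

  return re_compile_pattern (regex, strlen (regex), buf);
}
```
@end verbatim

After a successful call such as
@code{compile_extended (&buf, "(a)(b)c")}, the @code{re_nsub} field of
@code{buf} is 2; release the compiled pattern with @code{regfree} when
you are done with it.  An unbalanced pattern such as @samp{(a} makes
the helper return an error string instead of zero.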
1755218Sconklin
1756218Sconklin@node GNU Matching, GNU Searching, GNU Regular Expression Compiling, GNU Regex Functions
1757218Sconklin@subsection GNU Matching 
1758218Sconklin
1759218Sconklin@cindex matching with GNU functions
1760218Sconklin
1761218SconklinMatching the @sc{gnu} way means trying to match as much of a string as
1762218Sconklinpossible starting at a position within it you specify.  Once you've compiled
1763218Sconklina pattern into a pattern buffer (@pxref{GNU Regular Expression
1764218SconklinCompiling}), you can ask the matcher to match that pattern against a
1765218Sconklinstring using:
1766218Sconklin
1767218Sconklin@findex re_match
1768218Sconklin@example
1769218Sconklinint
1770218Sconklinre_match (struct re_pattern_buffer *@var{pattern_buffer}, 
1771218Sconklin          const char *@var{string}, const int @var{size}, 
1772218Sconklin          const int @var{start}, struct re_registers *@var{regs})
1773218Sconklin@end example
1774218Sconklin
1775218Sconklin@noindent
1776218Sconklin@var{pattern_buffer} is the address of a pattern buffer containing a
1777218Sconklincompiled pattern.  @var{string} is the string you want to match; it can
1778218Sconklincontain newline and null characters.  @var{size} is the length of that
1779218Sconklinstring.  @var{start} is the string index at which you want to
1780218Sconklinbegin matching; the first character of @var{string} is at index zero.
@xref{Using Registers}, for an explanation of @var{regs}; you can safely
1782218Sconklinpass zero.
1783218Sconklin
1784218Sconklin@code{re_match} matches the regular expression in @var{pattern_buffer}
1785218Sconklinagainst the string @var{string} according to the syntax in
@var{pattern_buffer}'s @code{syntax} field.  (@xref{GNU Regular
1787218SconklinExpression Compiling}, for how to set it.)  The function returns
1788218Sconklin@math{-1} if the compiled pattern does not match any part of
1789218Sconklin@var{string} and @math{-2} if an internal error happens; otherwise, it
1790218Sconklinreturns how many (possibly zero) characters of @var{string} the pattern
1791218Sconklinmatched.
1792218Sconklin
1793218SconklinAn example: suppose @var{pattern_buffer} points to a pattern buffer
1794218Sconklincontaining the compiled pattern for @samp{a*}, and @var{string} points
1795218Sconklinto @samp{aaaaab} (whereupon @var{size} should be 6). Then if @var{start}
1796218Sconklinis 2, @code{re_match} returns 3, i.e., @samp{a*} would have matched the
1797218Sconklinlast three @samp{a}s in @var{string}.  If @var{start} is 0,
1798218Sconklin@code{re_match} returns 5, i.e., @samp{a*} would have matched all the
1799218Sconklin@samp{a}s in @var{string}.  If @var{start} is either 5 or 6, it returns
1800218Sconklinzero.
1801218Sconklin
1802218SconklinIf @var{start} is not between zero and @var{size}, then
1803218Sconklin@code{re_match} returns @math{-1}.
1804218Sconklin
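The example above can be reproduced in code.  The following sketch
assumes the GNU C Library's @file{regex.h} with @code{_GNU_SOURCE}
defined and the @code{RE_SYNTAX_POSIX_EXTENDED} syntax; the helper name
@code{match_at} is hypothetical, and its use of @math{-2} for a pattern
that does not compile is a convention of the helper, not of Regex.

@verbatim
```c
#define _GNU_SOURCE             /* expose the GNU interface in <regex.h> */
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Compile REGEX and return what re_match returns when asked to match
   it against TEXT starting at index START.  Return -2 if REGEX does
   not compile.  */
int
match_at (const char *regex, const char *text, int start)
{
  struct re_pattern_buffer buf;
  int result;

  assert (regex != NULL && text != NULL);

  memset (&buf, 0, sizeof buf);
  re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
  if (re_compile_pattern (regex, strlen (regex), &buf) != NULL)
    return -2;

  /* Zero for the registers argument: we only want the match length.  */
  result = (int) re_match (&buf, text, strlen (text), start, NULL);
  regfree (&buf);
  return result;
}
```
@end verbatim

With @samp{a*} and @samp{aaaaab}, the helper returns 3 for a start of
2, 5 for a start of 0, zero for a start of 5, and @math{-1} for a start
of 7, which lies outside the string.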
1805218Sconklin
1806218Sconklin@node GNU Searching, Matching/Searching with Split Data, GNU Matching, GNU Regex Functions
1807218Sconklin@subsection GNU Searching 
1808218Sconklin
1809218Sconklin@cindex searching with GNU functions
1810218Sconklin
1811218Sconklin@dfn{Searching} means trying to match starting at successive positions
1812218Sconklinwithin a string.  The function @code{re_search} does this.
1813218Sconklin
1814218SconklinBefore calling @code{re_search}, you must compile your regular
1815218Sconklinexpression.  @xref{GNU Regular Expression Compiling}.
1816218Sconklin
1817218SconklinHere is the function declaration:
1818218Sconklin
1819218Sconklin@findex re_search
1820218Sconklin@example
1821218Sconklinint 
1822218Sconklinre_search (struct re_pattern_buffer *@var{pattern_buffer}, 
1823218Sconklin           const char *@var{string}, const int @var{size}, 
1824218Sconklin           const int @var{start}, const int @var{range}, 
1825218Sconklin           struct re_registers *@var{regs})
1826218Sconklin@end example
1827218Sconklin
1828218Sconklin@noindent
1829218Sconklin@vindex start @r{argument to @code{re_search}}
1830218Sconklin@vindex range @r{argument to @code{re_search}}
1831218Sconklinwhose arguments are the same as those to @code{re_match} (@pxref{GNU
1832218SconklinMatching}) except that the two arguments @var{start} and @var{range}
1833218Sconklinreplace @code{re_match}'s argument @var{start}.
1834218Sconklin
1835218SconklinIf @var{range} is positive, then @code{re_search} attempts a match
1836218Sconklinstarting first at index @var{start}, then at @math{@var{start} + 1} if
1837218Sconklinthat fails, and so on, up to @math{@var{start} + @var{range}}; if
1838218Sconklin@var{range} is negative, then it attempts a match starting first at
index @var{start}, then at @math{@var{start} - 1} if that fails, and so
1840218Sconklinon.  
1841218Sconklin
1842218SconklinIf @var{start} is not between zero and @var{size}, then @code{re_search}
1843218Sconklinreturns @math{-1}.  When @var{range} is positive, @code{re_search}
1844218Sconklinadjusts @var{range} so that @math{@var{start} + @var{range} - 1} is
1845218Sconklinbetween zero and @var{size}, if necessary; that way it won't search
1846218Sconklinoutside of @var{string}.  Similarly, when @var{range} is negative,
1847218Sconklin@code{re_search} adjusts @var{range} so that @math{@var{start} +
1848218Sconklin@var{range} + 1} is between zero and @var{size}, if necessary.
1849218Sconklin
1850218SconklinIf the @code{fastmap} field of @var{pattern_buffer} is zero,
1851218Sconklin@code{re_search} matches starting at consecutive positions; otherwise,
1852218Sconklinit uses @code{fastmap} to make the search more efficient.
1853218Sconklin@xref{Searching with Fastmaps}.
1854218Sconklin
1855218SconklinIf no match is found, @code{re_search} returns @math{-1}.  If
1856218Sconklina match is found, it returns the index where the match began.  If an
1857218Sconklininternal error happens, it returns @math{-2}.
1858218Sconklin
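Forward and backward searches differ only in the sign of @var{range}.
The sketch below shows both; it assumes the GNU C Library's
@file{regex.h} with @code{_GNU_SOURCE} defined and the
@code{RE_SYNTAX_POSIX_EXTENDED} syntax, and the helper name
@code{search_from} is hypothetical.

@verbatim
```c
#define _GNU_SOURCE             /* expose the GNU interface in <regex.h> */
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Compile REGEX and return what re_search returns for TEXT with the
   given START and RANGE.  Return -2 if REGEX does not compile.  */
int
search_from (const char *regex, const char *text, int start, int range)
{
  struct re_pattern_buffer buf;
  int result;

  assert (regex != NULL && text != NULL);

  memset (&buf, 0, sizeof buf);
  re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
  if (re_compile_pattern (regex, strlen (regex), &buf) != NULL)
    return -2;

  result = (int) re_search (&buf, text, strlen (text), start, range, NULL);
  regfree (&buf);
  return result;
}
```
@end verbatim

For the five-character string @samp{xxaab}, searching for @samp{a+}
forward (a start of 0 and a range of 5) returns 2, the first position
where a match begins.  Searching backward (a start of 5 and a range of
@math{-5}) returns 3, the first position tried from the end at which a
match can begin.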
1859218Sconklin
1860218Sconklin@node Matching/Searching with Split Data, Searching with Fastmaps, GNU Searching, GNU Regex Functions
1861218Sconklin@subsection Matching and Searching with Split Data
1862218Sconklin
1863218SconklinUsing the functions @code{re_match_2} and @code{re_search_2}, you can
1864218Sconklinmatch or search in data that is divided into two strings.  
1865218Sconklin
1866218SconklinThe function:
1867218Sconklin
1868218Sconklin@findex re_match_2
1869218Sconklin@example
1870218Sconklinint
1871218Sconklinre_match_2 (struct re_pattern_buffer *@var{buffer}, 
1872218Sconklin            const char *@var{string1}, const int @var{size1}, 
1873218Sconklin            const char *@var{string2}, const int @var{size2}, 
1874218Sconklin            const int @var{start}, 
1875218Sconklin            struct re_registers *@var{regs}, 
1876218Sconklin            const int @var{stop})
1877218Sconklin@end example
1878218Sconklin
1879218Sconklin@noindent
1880218Sconklinis similar to @code{re_match} (@pxref{GNU Matching}) except that you
1881218Sconklinpass @emph{two} data strings and sizes, and an index @var{stop} beyond
1882218Sconklinwhich you don't want the matcher to try matching.  As with
@code{re_match}, if it succeeds, @code{re_match_2} returns how many
characters of the data it matched.  Regard @var{string1} and
1885218Sconklin@var{string2} as concatenated when you set the arguments @var{start} and
1886218Sconklin@var{stop} and use the contents of @var{regs}; @code{re_match_2} never
1887218Sconklinreturns a value larger than @math{@var{size1} + @var{size2}}.  
1888218Sconklin
1889218SconklinThe function:
1890218Sconklin
1891218Sconklin@findex re_search_2
1892218Sconklin@example
1893218Sconklinint
1894218Sconklinre_search_2 (struct re_pattern_buffer *@var{buffer}, 
1895218Sconklin             const char *@var{string1}, const int @var{size1}, 
1896218Sconklin             const char *@var{string2}, const int @var{size2}, 
1897218Sconklin             const int @var{start}, const int @var{range}, 
1898218Sconklin             struct re_registers *@var{regs}, 
1899218Sconklin             const int @var{stop})
1900218Sconklin@end example
1901218Sconklin
1902218Sconklin@noindent
1903218Sconklinis similarly related to @code{re_search}.
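
Because the offsets these functions report refer to the two strings as
if they were concatenated, extracting matched text takes a little index
arithmetic.  The following is a minimal sketch, not part of Regex: the
helper name @code{copy_split_range} and its parameters are hypothetical,
and it assumes the offsets came from a successful call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the characters in the half-open range [from, to) of the virtual
   concatenation string1+string2 into out, and null-terminate it.
   Returns the number of characters copied, or -1 if the range is
   invalid or out is too small.  (Hypothetical helper, not part of Regex.)  */
static int
copy_split_range (const char *string1, int size1,
                  const char *string2, int size2,
                  int from, int to, char *out, size_t out_size)
{
  int n = 0;

  assert (string1 != NULL && string2 != NULL && out != NULL);

  if (from < 0 || to < from || to > size1 + size2
      || (size_t) (to - from) + 1 > out_size)
    return -1;

  /* Part of the range that lies in string1.  */
  if (from < size1)
    {
      int end1 = to < size1 ? to : size1;
      memcpy (out, string1 + from, (size_t) (end1 - from));
      n = end1 - from;
    }

  /* Part of the range that lies in string2; indices shift by size1.  */
  if (to > size1)
    {
      int begin2 = from > size1 ? from - size1 : 0;
      memcpy (out + n, string2 + begin2, (size_t) (to - size1 - begin2));
      n += to - size1 - begin2;
    }

  out[n] = '\0';
  return n;
}
```

With @samp{hello } as the first string and @samp{world} as the second,
the range from 3 to 8 straddles the boundary and yields @samp{lo wo}.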
1904218Sconklin
1905218Sconklin
1906218Sconklin@node Searching with Fastmaps, GNU Translate Tables, Matching/Searching with Split Data, GNU Regex Functions
1907218Sconklin@subsection Searching with Fastmaps
1908218Sconklin
1909218Sconklin@cindex fastmaps
1910218SconklinIf you're searching through a long string, you should use a fastmap.
1911218SconklinWithout one, the searcher tries to match at consecutive positions in the
1912218Sconklinstring.  Generally, most of the characters in the string could not start
1913218Sconklina match.  It takes much longer to try matching at a given position in the
1914218Sconklinstring than it does to check in a table whether or not the character at
1915218Sconklinthat position could start a match.  A @dfn{fastmap} is such a table.
1916218Sconklin
1917218SconklinMore specifically, a fastmap is an array indexed by the characters in
1918218Sconklinyour character set.  Under the @sc{ascii} encoding, therefore, a fastmap
1919218Sconklinhas 256 elements.  If you want the searcher to use a fastmap with a
1920218Sconklingiven pattern buffer, you must allocate the array and assign the array's
address to the pattern buffer's @code{fastmap} field.  You can either
compile the fastmap yourself or have @code{re_search} do it for you;
1923218Sconklinwhen @code{fastmap} is nonzero, it automatically compiles a fastmap the
1924218Sconklinfirst time you search using a particular compiled pattern.  
1925218Sconklin
1926218SconklinTo compile a fastmap yourself, use:
1927218Sconklin
1928218Sconklin@findex re_compile_fastmap
1929218Sconklin@example
1930218Sconklinint
1931218Sconklinre_compile_fastmap (struct re_pattern_buffer *@var{pattern_buffer})
1932218Sconklin@end example
1933218Sconklin
1934218Sconklin@noindent
1935218Sconklin@var{pattern_buffer} is the address of a pattern buffer.  If the
1936218Sconklincharacter @var{c} could start a match for the pattern,
1937218Sconklin@code{re_compile_fastmap} makes
1938218Sconklin@code{@var{pattern_buffer}->fastmap[@var{c}]} nonzero.  It returns
1939218Sconklin@math{0} if it can compile a fastmap and @math{-2} if there is an
1940218Sconklininternal error.  For example, if @samp{|} is the alternation operator
1941218Sconklinand @var{pattern_buffer} holds the compiled pattern for @samp{a|b}, then
1942218Sconklin@code{re_compile_fastmap} sets @code{fastmap['a']} and
1943218Sconklin@code{fastmap['b']} (and no others).
1944218Sconklin
1945218Sconklin@code{re_search} uses a fastmap as it moves along in the string: it
1946218Sconklinchecks the string's characters until it finds one that's in the fastmap.
1947218SconklinThen it tries matching at that character.  If the match fails, it
1948218Sconklinrepeats the process.  So, by using a fastmap, @code{re_search} doesn't
1949218Sconklinwaste time trying to match at positions in the string that couldn't
1950218Sconklinstart a match.
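
The scan described above can be sketched in a few lines.  This is an
illustration of the idea only, not the library's actual code:
@code{next_candidate} is a hypothetical name, and the sketch assumes a
256-element fastmap whose nonzero entries mark the characters that could
start a match.

```c
#include <assert.h>
#include <stddef.h>

/* Return the first index at or after pos whose character is marked in
   fastmap, or -1 if there is none.  fastmap must have 256 entries,
   one per possible unsigned char value.  (Hypothetical helper.)  */
static int
next_candidate (const char *fastmap, const char *string, int size, int pos)
{
  assert (fastmap != NULL && string != NULL);

  for (; pos < size; pos++)
    if (fastmap[(unsigned char) string[pos]])
      return pos;   /* A match could start here; run the matcher.  */

  return -1;        /* No remaining position can start a match.  */
}
```

A searcher built this way runs the full matcher only at the returned
positions: for the string @samp{xxbxa}, with only @samp{a} and @samp{b}
in the fastmap, those are positions 2 and 4.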
1951218Sconklin
1952218SconklinIf you don't want @code{re_search} to use a fastmap,
1953218Sconklinstore zero in the @code{fastmap} field of the pattern buffer before
1954218Sconklincalling @code{re_search}.
1955218Sconklin
1956218SconklinOnce you've initialized a pattern buffer's @code{fastmap} field, you
1957218Sconklinneed never do so again---even if you compile a new pattern in
1958218Sconklinit---provided the way the field is set still reflects whether or not you
1959218Sconklinwant a fastmap.  @code{re_search} will still either do nothing if
1960218Sconklin@code{fastmap} is null or, if it isn't, compile a new fastmap for the
1961218Sconklinnew pattern.
1962218Sconklin
1963218Sconklin@node GNU Translate Tables, Using Registers, Searching with Fastmaps, GNU Regex Functions
1964218Sconklin@subsection GNU Translate Tables
1965218Sconklin
1966218SconklinIf you set the @code{translate} field of a pattern buffer to a translate
1967218Sconklintable, then the @sc{gnu} Regex functions to which you've passed that
1968218Sconklinpattern buffer use it to apply a simple transformation
1969218Sconklinto all the regular expression and string characters at which they look.
1970218Sconklin
1971218SconklinA @dfn{translate table} is an array indexed by the characters in your
1972218Sconklincharacter set.  Under the @sc{ascii} encoding, therefore, a translate
1973218Sconklintable has 256 elements.  The array's elements are also characters in
1974218Sconklinyour character set.  When the Regex functions see a character @var{c},
1975218Sconklinthey use @code{translate[@var{c}]} in its place, with one exception: the
character after a @samp{\} is not translated.  (This ensures that the
operators, e.g., @samp{\B} and @samp{\b}, are always distinguishable.)
1978218Sconklin
1979218SconklinFor example, a table that maps all lowercase letters to the
1980218Sconklincorresponding uppercase ones would cause the matcher to ignore
1981218Sconklindifferences in case.@footnote{A table that maps all uppercase letters to
1982218Sconklinthe corresponding lowercase ones would work just as well for this
1983218Sconklinpurpose.}  Such a table would map all characters except lowercase letters
1984218Sconklinto themselves, and lowercase letters to the corresponding uppercase
1985218Sconklinones.  Under the @sc{ascii} encoding, here's how you could initialize
1986218Sconklinsuch a table (we'll call it @code{case_fold}):
1987218Sconklin
1988218Sconklin@example
1989218Sconklinfor (i = 0; i < 256; i++)
1990218Sconklin  case_fold[i] = i;
1991218Sconklinfor (i = 'a'; i <= 'z'; i++)
1992218Sconklin  case_fold[i] = i - ('a' - 'A');
1993218Sconklin@end example
1994218Sconklin
1995218SconklinYou tell Regex to use a translate table on a given pattern buffer by
1996218Sconklinassigning that table's address to the @code{translate} field of that
1997218Sconklinbuffer.  If you don't want Regex to do any translation, put zero into
1998218Sconklinthis field.  You'll get weird results if you change the table's contents
1999218Sconklinanytime between compiling the pattern buffer, compiling its fastmap, and
2000218Sconklinmatching or searching with the pattern buffer.
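
For reference, here is the case-folding initialization as a
self-contained function, together with a helper showing how a character
comparison behaves once both sides go through the table.  Both names are
hypothetical, the sketch assumes the @sc{ascii} encoding, and it uses
@code{unsigned char} elements for convenience; check @file{regex.h} for
the element type your version of Regex expects in the @code{translate}
field.

```c
#include <assert.h>
#include <stddef.h>

/* Fill table (256 entries) so that lowercase ASCII letters map to
   uppercase and every other character maps to itself.  */
static void
init_case_fold (unsigned char table[256])
{
  int i;

  assert (table != NULL);

  for (i = 0; i < 256; i++)
    table[i] = (unsigned char) i;
  for (i = 'a'; i <= 'z'; i++)
    table[i] = (unsigned char) (i - ('a' - 'A'));
}

/* Compare two characters the way a matcher using the table would:
   translate both, then compare.  (Hypothetical helper.)  */
static int
translated_equal (const unsigned char table[256], char c1, char c2)
{
  return table[(unsigned char) c1] == table[(unsigned char) c2];
}
```

Indexing through @code{unsigned char} matters on systems where plain
@code{char} is signed, since a negative index would fall outside the
table.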
2001218Sconklin
2002218Sconklin@node Using Registers, Freeing GNU Pattern Buffers, GNU Translate Tables, GNU Regex Functions
2003218Sconklin@subsection Using Registers
2004218Sconklin
A group in a regular expression can match a (possibly empty) substring
of the string that the regular expression as a whole matched.  The matcher
2007218Sconklinremembers the beginning and end of the substring matched by
2008218Sconklineach group.
2009218Sconklin
2010218SconklinTo find out what they matched, pass a nonzero @var{regs} argument to a
2011218Sconklin@sc{gnu} matching or searching function (@pxref{GNU Matching} and
2012218Sconklin@ref{GNU Searching}), i.e., the address of a structure of this type, as
2013218Sconklindefined in @file{regex.h}:
2014218Sconklin
2015218Sconklin@c We don't bother to include this directly from regex.h,
2016218Sconklin@c since it changes so rarely.
2017218Sconklin@example
2018218Sconklin@tindex re_registers
2019218Sconklin@vindex num_regs @r{in @code{struct re_registers}}
2020218Sconklin@vindex start @r{in @code{struct re_registers}}
2021218Sconklin@vindex end @r{in @code{struct re_registers}}
2022218Sconklinstruct re_registers
2023218Sconklin@{
2024218Sconklin  unsigned num_regs;
2025218Sconklin  regoff_t *start;
2026218Sconklin  regoff_t *end;
2027218Sconklin@};
2028218Sconklin@end example
2029218Sconklin
2030218SconklinExcept for (possibly) the @var{num_regs}'th element (see below), the
2031218Sconklin@var{i}th element of the @code{start} and @code{end} arrays records
2032218Sconklininformation about the @var{i}th group in the pattern.  (They're declared
2033218Sconklinas C pointers, but this is only because not all C compilers accept
2034218Sconklinzero-length arrays; conceptually, it is simplest to think of them as
2035218Sconklinarrays.)
2036218Sconklin
2037218SconklinThe @code{start} and @code{end} arrays are allocated in various ways,
2038218Sconklindepending on the value of the @code{regs_allocated}
2039218Sconklin@vindex regs_allocated
2040218Sconklinfield in the pattern buffer passed to the matcher.
2041218Sconklin
The simplest and perhaps most useful way is to let the matcher
(re)allocate enough space to record information for all the groups in
the regular expression.  If @code{regs_allocated} is
@code{REGS_UNALLOCATED},
@vindex REGS_UNALLOCATED
the matcher allocates @math{1 + @var{re_nsub}} elements (@code{re_nsub}
is another field in the pattern buffer; @pxref{GNU Pattern Buffers}),
sets the extra element to @math{-1}, and sets @code{regs_allocated} to
@code{REGS_REALLOCATE}.
2049218Sconklin@vindex REGS_REALLOCATE
2050218SconklinThen on subsequent calls with the same pattern buffer and @var{regs}
2051218Sconklinarguments, the matcher reallocates more space if necessary.
2052218Sconklin
2053218SconklinIt would perhaps be more logical to make the @code{regs_allocated} field
2054218Sconklinpart of the @code{re_registers} structure, instead of part of the
2055218Sconklinpattern buffer.  But in that case the caller would be forced to
2056218Sconklininitialize the structure before passing it.  Much existing code doesn't
2057218Sconklindo this initialization, and it's arguably better to avoid it anyway.
2058218Sconklin
2059218Sconklin@code{re_compile_pattern} sets @code{regs_allocated} to
2060218Sconklin@code{REGS_UNALLOCATED},
so if you use the @sc{gnu} regular expression
2062218Sconklinfunctions, you get this behavior by default.
2063218Sconklin
@c xx document re_set_registers
2065218Sconklin
2066218Sconklin@sc{posix}, on the other hand, requires a different interface:  the
2067218Sconklincaller is supposed to pass in a fixed-length array which the matcher
2068218Sconklinfills.  Therefore, if @code{regs_allocated} is @code{REGS_FIXED} 
2069218Sconklin@vindex REGS_FIXED
2070218Sconklinthe matcher simply fills that array.
2071218Sconklin
2072218SconklinThe following examples illustrate the information recorded in the
2073218Sconklin@code{re_registers} structure.  (In all of them, @samp{(} represents the
2074218Sconklinopen-group and @samp{)} the close-group operator.  The first character
2075218Sconklinin the string @var{string} is at index 0.)
2076218Sconklin
2077218Sconklin@c xx i'm not sure this is all true anymore.
2078218Sconklin
2079218Sconklin@itemize @bullet
2080218Sconklin
2081218Sconklin@item 
2082218SconklinIf the regular expression has an @w{@var{i}-th}
2083218Sconklingroup not contained within another group that matches a
2084218Sconklinsubstring of @var{string}, then the function sets
2085218Sconklin@code{@w{@var{regs}->}start[@var{i}]} to the index in @var{string} where
2086218Sconklinthe substring matched by the @w{@var{i}-th} group begins, and
2087218Sconklin@code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
2088218Sconklinsubstring's end.  The function sets @code{@w{@var{regs}->}start[0]} and
2089218Sconklin@code{@w{@var{regs}->}end[0]} to analogous information about the entire
2090218Sconklinpattern.
2091218Sconklin
2092218SconklinFor example, when you match @samp{((a)(b))} against @samp{ab}, you get:
2093218Sconklin
209421643Sjkh@itemize @bullet
2095218Sconklin@item
2096218Sconklin0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]} 
2097218Sconklin
2098218Sconklin@item
2099218Sconklin0 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]} 
2100218Sconklin
2101218Sconklin@item
2102218Sconklin0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]} 
2103218Sconklin
2104218Sconklin@item
2105218Sconklin1 in @code{@w{@var{regs}->}start[3]} and 2 in @code{@w{@var{regs}->}end[3]} 
2106218Sconklin@end itemize
2107218Sconklin
2108218Sconklin@item
2109218SconklinIf a group matches more than once (as it might if followed by,
2110218Sconkline.g., a repetition operator), then the function reports the information
2111218Sconklinabout what the group @emph{last} matched.
2112218Sconklin
2113218SconklinFor example, when you match the pattern @samp{(a)*} against the string
2114218Sconklin@samp{aa}, you get:
2115218Sconklin
211621643Sjkh@itemize @bullet
2117218Sconklin@item
2118218Sconklin0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]} 
2119218Sconklin
2120218Sconklin@item
2121218Sconklin1 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]} 
2122218Sconklin@end itemize
2123218Sconklin
2124218Sconklin@item
2125218SconklinIf the @w{@var{i}-th} group does not participate in a
2126218Sconklinsuccessful match, e.g., it is an alternative not taken or a
2127218Sconklinrepetition operator allows zero repetitions of it, then the function
2128218Sconklinsets @code{@w{@var{regs}->}start[@var{i}]} and
2129218Sconklin@code{@w{@var{regs}->}end[@var{i}]} to @math{-1}.
2130218Sconklin
2131218SconklinFor example, when you match the pattern @samp{(a)*b} against
2132218Sconklinthe string @samp{b}, you get:
2133218Sconklin
213421643Sjkh@itemize @bullet
2135218Sconklin@item
2136218Sconklin0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]} 
2137218Sconklin
2138218Sconklin@item
2139218Sconklin@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]} 
2140218Sconklin@end itemize
2141218Sconklin
2142218Sconklin@item
2143218SconklinIf the @w{@var{i}-th} group matches a zero-length string, then the
2144218Sconklinfunction sets @code{@w{@var{regs}->}start[@var{i}]} and
2145218Sconklin@code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
2146218Sconklinzero-length string.  
2147218Sconklin
2148218SconklinFor example, when you match the pattern @samp{(a*)b} against the string
2149218Sconklin@samp{b}, you get:
2150218Sconklin
215121643Sjkh@itemize @bullet
2152218Sconklin@item
2153218Sconklin0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]} 
2154218Sconklin
2155218Sconklin@item
2156218Sconklin0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]} 
2157218Sconklin@end itemize
2158218Sconklin
2159218Sconklin@ignore
2160218SconklinThe function sets @code{@w{@var{regs}->}start[0]} and
2161218Sconklin@code{@w{@var{regs}->}end[0]} to analogous information about the entire
2162218Sconklinpattern.
2163218Sconklin
2164218SconklinFor example, when you match the pattern @samp{(a*)} against the empty
2165218Sconklinstring, you get:
2166218Sconklin
216721643Sjkh@itemize @bullet
2168218Sconklin@item
2169218Sconklin0 in @code{@w{@var{regs}->}start[0]} and 0 in @code{@w{@var{regs}->}end[0]} 
2170218Sconklin
2171218Sconklin@item
2172218Sconklin0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]} 
2173218Sconklin@end itemize
2174218Sconklin@end ignore
2175218Sconklin
2176218Sconklin@item
2177218SconklinIf an @w{@var{i}-th} group contains a @w{@var{j}-th} group 
2178218Sconklinin turn not contained within any other group within group @var{i} and
2179218Sconklinthe function reports a match of the @w{@var{i}-th} group, then it
2180218Sconklinrecords in @code{@w{@var{regs}->}start[@var{j}]} and
2181218Sconklin@code{@w{@var{regs}->}end[@var{j}]} the last match (if it matched) of
2182218Sconklinthe @w{@var{j}-th} group.
2183218Sconklin
For example, when you match the pattern @samp{((a*)b)*} against the
string @samp{abb}, @w{group 2} last matches the empty string (at
index 2), so you get:
2187218Sconklin
218821643Sjkh@itemize @bullet
2189218Sconklin@item
2190218Sconklin0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]} 
2191218Sconklin
2192218Sconklin@item
2193218Sconklin2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]} 
2194218Sconklin
2195218Sconklin@item
2196218Sconklin2 in @code{@w{@var{regs}->}start[2]} and 2 in @code{@w{@var{regs}->}end[2]} 
2197218Sconklin@end itemize
2198218Sconklin
When you match the pattern @samp{((a)*b)*} against the string
@samp{abb}, @w{group 2} doesn't participate in the last match, so you
get what it previously matched:
2202218Sconklin
220321643Sjkh@itemize @bullet
2204218Sconklin@item
2205218Sconklin0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]} 
2206218Sconklin
2207218Sconklin@item
2208218Sconklin2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]} 
2209218Sconklin
2210218Sconklin@item
2211218Sconklin0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]} 
2212218Sconklin@end itemize
2213218Sconklin
2214218Sconklin@item
2215218SconklinIf an @w{@var{i}-th} group contains a @w{@var{j}-th} group
2216218Sconklinin turn not contained within any other group within group @var{i}
2217218Sconklinand the function sets 
2218218Sconklin@code{@w{@var{regs}->}start[@var{i}]} and 
2219218Sconklin@code{@w{@var{regs}->}end[@var{i}]} to @math{-1}, then it also sets
2220218Sconklin@code{@w{@var{regs}->}start[@var{j}]} and
2221218Sconklin@code{@w{@var{regs}->}end[@var{j}]} to @math{-1}.
2222218Sconklin
2223218SconklinFor example, when you match the pattern @samp{((a)*b)*c} against the
2224218Sconklinstring @samp{c}, you get:
2225218Sconklin
222621643Sjkh@itemize @bullet
2227218Sconklin@item
2228218Sconklin0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]} 
2229218Sconklin
2230218Sconklin@item
2231218Sconklin@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]} 
2232218Sconklin
2233218Sconklin@item
2234218Sconklin@math{-1} in @code{@w{@var{regs}->}start[2]} and @math{-1} in @code{@w{@var{regs}->}end[2]} 
2235218Sconklin@end itemize
2236218Sconklin
2237218Sconklin@end itemize
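
When you read the registers back, the main thing to handle is the
difference between a group that matched the empty string (equal,
nonnegative offsets) and one that did not participate at all
(@math{-1}).  The following sketch shows one way to do that; the helper
@code{copy_group} is hypothetical, and it takes the @code{start} and
@code{end} arrays directly, so it makes no assumption about how they
were allocated.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy the text that group i matched into out.  start and end are the
   arrays from a filled-in register structure.  Returns the length of
   the group's text (0 for an empty match), or -1 if the group did not
   participate in the match or out is too small.  (Hypothetical helper.)  */
static int
copy_group (const char *string, const regoff_t *start, const regoff_t *end,
            unsigned i, char *out, size_t out_size)
{
  size_t len;

  assert (string != NULL && start != NULL && end != NULL && out != NULL);

  if (start[i] < 0 || end[i] < start[i])
    return -1;                      /* Group i did not participate.  */

  len = (size_t) (end[i] - start[i]);
  if (len + 1 > out_size)
    return -1;                      /* Not enough room in out.  */

  memcpy (out, string + start[i], len);
  out[len] = '\0';
  return (int) len;
}
```

Given the register values listed above for @samp{(a)*b} matched against
@samp{b}, the helper returns @math{-1} for group 1; for @samp{(a*)b}
matched against the same string it returns 0 and stores an empty string.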
2238218Sconklin
2239218Sconklin@node Freeing GNU Pattern Buffers,  , Using Registers, GNU Regex Functions
2240218Sconklin@subsection Freeing GNU Pattern Buffers
2241218Sconklin
2242218SconklinTo free any allocated fields of a pattern buffer, you can use the
2243218Sconklin@sc{posix} function described in @ref{Freeing POSIX Pattern Buffers},
2244218Sconklinsince the type @code{regex_t}---the type for @sc{posix} pattern
2245218Sconklinbuffers---is equivalent to the type @code{re_pattern_buffer}.  After
2246218Sconklinfreeing a pattern buffer, you need to again compile a regular expression
2247218Sconklinin it (@pxref{GNU Regular Expression Compiling}) before passing it to
2248218Sconklina matching or searching function.
2249218Sconklin
2250218Sconklin
2251218Sconklin@node POSIX Regex Functions, BSD Regex Functions, GNU Regex Functions, Programming with Regex
2252218Sconklin@section POSIX Regex Functions
2253218Sconklin
2254218SconklinIf you're writing code that has to be @sc{posix} compatible, you'll need
2255218Sconklinto use these functions. Their interfaces are as specified by @sc{posix},
2256218Sconklindraft 1003.2/D11.2.
2257218Sconklin
2258218Sconklin@menu
2259218Sconklin* POSIX Pattern Buffers::		The regex_t type.
2260218Sconklin* POSIX Regular Expression Compiling::	regcomp ()
2261218Sconklin* POSIX Matching::			regexec ()
2262218Sconklin* Reporting Errors::			regerror ()
2263218Sconklin* Using Byte Offsets::			The regmatch_t type.
2264218Sconklin* Freeing POSIX Pattern Buffers::	regfree ()
2265218Sconklin@end menu
2266218Sconklin
2267218Sconklin
2268218Sconklin@node POSIX Pattern Buffers, POSIX Regular Expression Compiling,  , POSIX Regex Functions
2269218Sconklin@subsection POSIX Pattern Buffers
2270218Sconklin
2271218SconklinTo compile or match a given regular expression the @sc{posix} way, you
2272218Sconklinmust supply a pattern buffer exactly the way you do for @sc{gnu}
2273218Sconklin(@pxref{GNU Pattern Buffers}).  @sc{posix} pattern buffers have type
2274218Sconklin@code{regex_t}, which is equivalent to the @sc{gnu} pattern buffer
2275218Sconklintype @code{re_pattern_buffer}.
2276218Sconklin
2277218Sconklin
2278218Sconklin@node POSIX Regular Expression Compiling, POSIX Matching, POSIX Pattern Buffers, POSIX Regex Functions
2279218Sconklin@subsection POSIX Regular Expression Compiling
2280218Sconklin
2281218SconklinWith @sc{posix}, you can only search for a given regular expression; you
2282218Sconklincan't match it.  To do this, you must first compile it in a
2283218Sconklinpattern buffer, using @code{regcomp}.
2284218Sconklin
2285218Sconklin@ignore
2286218SconklinBefore calling @code{regcomp}, you must initialize this pattern buffer
2287218Sconklinas you do for @sc{gnu} (@pxref{GNU Regular Expression Compiling}).  See
2288218Sconklinbelow, however, for how to choose a syntax with which to compile.
2289218Sconklin@end ignore
2290218Sconklin
2291218SconklinTo compile a pattern buffer, use:
2292218Sconklin
2293218Sconklin@findex regcomp
2294218Sconklin@example
2295218Sconklinint
2296218Sconklinregcomp (regex_t *@var{preg}, const char *@var{regex}, int @var{cflags})
2297218Sconklin@end example
2298218Sconklin
2299218Sconklin@noindent
2300218Sconklin@var{preg} is the initialized pattern buffer's address, @var{regex} is
2301218Sconklinthe regular expression's address, and @var{cflags} is the compilation
2302218Sconklinflags, which Regex considers as a collection of bits.  Here are the
2303218Sconklinvalid bits, as defined in @file{regex.h}:
2304218Sconklin
2305218Sconklin@table @code
2306218Sconklin
2307218Sconklin@item REG_EXTENDED
2308218Sconklin@vindex REG_EXTENDED
2309218Sconklinsays to use @sc{posix} Extended Regular Expression syntax; if this isn't
2310218Sconklinset, then says to use @sc{posix} Basic Regular Expression syntax.
2311218Sconklin@code{regcomp} sets @var{preg}'s @code{syntax} field accordingly.
2312218Sconklin
2313218Sconklin@item REG_ICASE
2314218Sconklin@vindex REG_ICASE
2315218Sconklin@cindex ignoring case
2316218Sconklinsays to ignore case; @code{regcomp} sets @var{preg}'s @code{translate}
2317218Sconklinfield to a translate table which ignores case, replacing anything you've
2318218Sconklinput there before.
2319218Sconklin
2320218Sconklin@item REG_NOSUB
2321218Sconklin@vindex REG_NOSUB
2322218Sconklinsays to set @var{preg}'s @code{no_sub} field; @pxref{POSIX Matching},
2323218Sconklinfor what this means.
2324218Sconklin
2325218Sconklin@item REG_NEWLINE
2326218Sconklin@vindex REG_NEWLINE
2327218Sconklinsays that a:
2328218Sconklin
2329218Sconklin@itemize @bullet
2330218Sconklin
2331218Sconklin@item
2332218Sconklinmatch-any-character operator (@pxref{Match-any-character
2333218SconklinOperator}) doesn't match a newline.
2334218Sconklin
2335218Sconklin@item
2336218Sconklinnonmatching list not containing a newline (@pxref{List
2337218SconklinOperators}) matches a newline.
2338218Sconklin
2339218Sconklin@item
2340218Sconklinmatch-beginning-of-line operator (@pxref{Match-beginning-of-line
2341218SconklinOperator}) matches the empty string immediately after a newline,
2342218Sconklinregardless of how @code{REG_NOTBOL} is set (@pxref{POSIX Matching}, for
2343218Sconklinan explanation of @code{REG_NOTBOL}).
2344218Sconklin
2345218Sconklin@item
match-end-of-line operator (@pxref{Match-end-of-line
Operator}) matches the empty string immediately before a newline,
2348218Sconklinregardless of how @code{REG_NOTEOL} is set (@pxref{POSIX Matching},
2349218Sconklinfor an explanation of @code{REG_NOTEOL}).
2350218Sconklin
2351218Sconklin@end itemize
2352218Sconklin
2353218Sconklin@end table
2354218Sconklin
2355218SconklinIf @code{regcomp} successfully compiles the regular expression, it
returns zero and sets @code{*@var{preg}} to the compiled
2357218Sconklinpattern. Except for @code{syntax} (which it sets as explained above), it
2358218Sconklinalso sets the same fields the same way as does the @sc{gnu} compiling
2359218Sconklinfunction (@pxref{GNU Regular Expression Compiling}).
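
To see the compilation flags in action, here is a minimal sketch that
compiles a pattern, searches one string, and frees the pattern buffer.
The helper name @code{matches} is hypothetical; the calls themselves are
the @sc{posix} functions described in this section.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Compile regex with cflags, search string once, and free the pattern
   buffer.  Returns 1 on a match, 0 on no match, -1 if compilation
   failed.  (Hypothetical helper.)  */
static int
matches (const char *regex, int cflags, const char *string)
{
  regex_t preg;
  int status;

  assert (regex != NULL && string != NULL);

  if (regcomp (&preg, regex, cflags) != 0)
    return -1;

  /* No offsets are wanted, so pass zero for nmatch and null for pmatch.  */
  status = regexec (&preg, string, 0, NULL, 0);
  regfree (&preg);
  return status == 0;
}
```

For example, @samp{colou?r} compiled with
@code{REG_EXTENDED | REG_ICASE} matches @samp{COLOR}, but does not
without @code{REG_ICASE}.  Likewise, the basic regular expression
@samp{^b$} finds the middle line of a three-line string holding
@samp{a}, @samp{b} and @samp{c} only when you compile it with
@code{REG_NEWLINE}.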
2360218Sconklin
2361218SconklinIf @code{regcomp} can't compile the regular expression, it returns one
2362218Sconklinof the error codes listed here.  (Except when noted differently, the
syntax in all examples below is basic regular expression syntax.)
2364218Sconklin
2365218Sconklin@table @code
2366218Sconklin
2367218Sconklin@comment repetitions
2368218Sconklin@item REG_BADRPT
2369218SconklinFor example, the consecutive repetition operators @samp{**} in
2370218Sconklin@samp{a**} are invalid.  As another example, if the syntax is extended
2371218Sconklinregular expression syntax, then the repetition operator @samp{*} with
2372218Sconklinnothing on which to operate in @samp{*} is invalid.
2373218Sconklin
2374218Sconklin@item REG_BADBR
2375218SconklinFor example, the @var{count} @samp{-1} in @samp{a\@{-1} is invalid.
2376218Sconklin
2377218Sconklin@item REG_EBRACE
2378218SconklinFor example, @samp{a\@{1} is missing a close-interval operator.
2379218Sconklin
2380218Sconklin@comment lists
2381218Sconklin@item REG_EBRACK
2382218SconklinFor example, @samp{[a} is missing a close-list operator.
2383218Sconklin
2384218Sconklin@item REG_ERANGE
For example, the range ending point @samp{a} that collates lower than
does its starting point @samp{z} in @samp{[z-a]} is invalid.  Also, the
range with the character class @samp{[:alpha:]} as its starting point in
@samp{[[:alpha:]-|]} is invalid.
2389218Sconklin
2390218Sconklin@item REG_ECTYPE
For example, the character class name @samp{foo} in @samp{[[:foo:]]} is
invalid.
2393218Sconklin
2394218Sconklin@comment groups
2395218Sconklin@item REG_EPAREN
2396218SconklinFor example, @samp{a\)} is missing an open-group operator and @samp{\(a}
2397218Sconklinis missing a close-group operator.
2398218Sconklin
2399218Sconklin@item REG_ESUBREG
2400218SconklinFor example, the back reference @samp{\2} that refers to a nonexistent
2401218Sconklinsubexpression in @samp{\(a\)\2} is invalid.
2402218Sconklin
2403218Sconklin@comment unfinished business
2404218Sconklin
2405218Sconklin@item REG_EEND
2406218SconklinReturned when a regular expression causes no other more specific error.
2407218Sconklin
2408218Sconklin@item REG_EESCAPE
2409218SconklinFor example, the trailing backslash @samp{\} in @samp{a\} is invalid, as is the
2410218Sconklinone in @samp{\}.
2411218Sconklin
2412218Sconklin@comment kitchen sink
2413218Sconklin@item REG_BADPAT
2414218SconklinFor example, in the extended regular expression syntax, the empty group
2415218Sconklin@samp{()} in @samp{a()b} is invalid.
2416218Sconklin
2417218Sconklin@comment internal
2418218Sconklin@item REG_ESIZE
2419218SconklinReturned when a regular expression needs a pattern buffer larger than
2420218Sconklin65536 bytes.
2421218Sconklin
2422218Sconklin@item REG_ESPACE
Returned when a regular expression causes Regex to run out of memory.
2424218Sconklin
2425218Sconklin@end table
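
A quick way to explore these codes is to compile a few deliberately
invalid patterns and look at what @code{regcomp} returns.  The helper
below is a hypothetical sketch, and the exact code reported for a given
invalid pattern may differ between implementations of the @sc{posix}
interface.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Try to compile regex and return regcomp's result: zero on success,
   otherwise one of the REG_* error codes.  (Hypothetical helper.)  */
static int
compile_status (const char *regex, int cflags)
{
  regex_t preg;
  int status;

  assert (regex != NULL);

  status = regcomp (&preg, regex, cflags);
  if (status == 0)
    regfree (&preg);    /* Free only a successfully compiled buffer.  */
  return status;
}
```

With the basic syntax (zero for @var{cflags}), you would expect
@code{REG_EBRACK} for @samp{[a}, @code{REG_EPAREN} for @samp{\(a} and
@code{REG_EESCAPE} for @samp{a\}, as in the table above.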
2426218Sconklin
2427218Sconklin
2428218Sconklin@node POSIX Matching, Reporting Errors, POSIX Regular Expression Compiling, POSIX Regex Functions
2429218Sconklin@subsection POSIX Matching 
2430218Sconklin
2431218SconklinMatching the @sc{posix} way means trying to match a null-terminated
2432218Sconklinstring starting at its first character.  Once you've compiled a pattern
2433218Sconklininto a pattern buffer (@pxref{POSIX Regular Expression Compiling}), you
2434218Sconklincan ask the matcher to match that pattern against a string using:
2435218Sconklin
2436218Sconklin@findex regexec
2437218Sconklin@example
2438218Sconklinint
2439218Sconklinregexec (const regex_t *@var{preg}, const char *@var{string}, 
2440218Sconklin         size_t @var{nmatch}, regmatch_t @var{pmatch}[], int @var{eflags})
2441218Sconklin@end example
2442218Sconklin
2443218Sconklin@noindent
2444218Sconklin@var{preg} is the address of a pattern buffer for a compiled pattern.
2445218Sconklin@var{string} is the string you want to match.  
2446218Sconklin
2447218Sconklin@xref{Using Byte Offsets}, for an explanation of @var{pmatch}.  If you
2448218Sconklinpass zero for @var{nmatch} or you compiled @var{preg} with the
2449218Sconklincompilation flag @code{REG_NOSUB} set, then @code{regexec} will ignore
2450218Sconklin@var{pmatch}; otherwise, you must allocate it to have at least
@var{nmatch} elements.  @code{regexec} will record up to @var{nmatch}
pairs of byte offsets in @var{pmatch}, and set to @math{-1} any unused
elements up to @code{@var{pmatch}[@var{nmatch} - 1]}.
2454218Sconklin
2455218Sconklin@var{eflags} specifies @dfn{execution flags}---namely, the two bits
2456218Sconklin@code{REG_NOTBOL} and @code{REG_NOTEOL} (defined in @file{regex.h}).  If
2457218Sconklinyou set @code{REG_NOTBOL}, then the match-beginning-of-line operator
2458218Sconklin(@pxref{Match-beginning-of-line Operator}) always fails to match.
2459218SconklinThis lets you match against pieces of a line, as you would need to if,
2460218Sconklinsay, searching for repeated instances of a given pattern in a line; it
2461218Sconklinwould work correctly for patterns both with and without
2462218Sconklinmatch-beginning-of-line operators.  @code{REG_NOTEOL} works analogously
2463218Sconklinfor the match-end-of-line operator (@pxref{Match-end-of-line
2464218SconklinOperator}); it exists for symmetry.
2465218Sconklin
2466218Sconklin@code{regexec} tries to find a match for @var{preg} in @var{string}
2467218Sconklinaccording to the syntax in @var{preg}'s @code{syntax} field.
2468218Sconklin(@xref{POSIX Regular Expression Compiling}, for how to set it.)  The
2469218Sconklinfunction returns zero if the compiled pattern matches @var{string} and
2470218Sconklin@code{REG_NOMATCH} (defined in @file{regex.h}) if it doesn't.
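
The use of @code{REG_NOTBOL} for finding repeated instances of a pattern
in one line can be sketched as follows.  The helper
@code{count_matches} is hypothetical, and it assumes basic regular
expression syntax; every search after the first begins in the middle of
the line, so it passes @code{REG_NOTBOL}.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count the non-overlapping matches of regex (basic syntax) in line.
   Every search after the first starts in the middle of the line, so it
   passes REG_NOTBOL to keep a match-beginning-of-line operator from
   matching there.  Returns -1 if regex does not compile.
   (Hypothetical helper.)  */
static int
count_matches (const char *regex, const char *line)
{
  regex_t preg;
  regmatch_t pmatch[1];
  const char *p = line;
  int count = 0;

  assert (regex != NULL && line != NULL);

  if (regcomp (&preg, regex, 0) != 0)
    return -1;

  while (regexec (&preg, p, 1, pmatch, p == line ? 0 : REG_NOTBOL) == 0)
    {
      count++;
      p += pmatch[0].rm_eo;         /* Resume just past this match.  */
      if (pmatch[0].rm_so == pmatch[0].rm_eo)
        {
          /* Empty match: step over one character so the loop advances.  */
          if (*p == '\0')
            break;
          p++;
        }
    }

  regfree (&preg);
  return count;
}
```

Counting @samp{^ab} in @samp{abab} gives 1.  Without @code{REG_NOTBOL},
the second search would start at the second @samp{ab} and wrongly treat
it as the beginning of a line.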
2471218Sconklin
2472218Sconklin@node Reporting Errors, Using Byte Offsets, POSIX Matching, POSIX Regex Functions
2473218Sconklin@subsection Reporting Errors
2474218Sconklin
If either @code{regcomp} or @code{regexec} fails, it returns a nonzero
2476218Sconklinerror code, the possibilities for which are defined in @file{regex.h}.
2477218Sconklin@xref{POSIX Regular Expression Compiling}, and @ref{POSIX Matching}, for
2478218Sconklinwhat these codes mean.  To get an error string corresponding to these
2479218Sconklincodes, you can use:
2480218Sconklin
2481218Sconklin@findex regerror
2482218Sconklin@example
2483218Sconklinsize_t
2484218Sconklinregerror (int @var{errcode},
2485218Sconklin          const regex_t *@var{preg},
2486218Sconklin          char *@var{errbuf},
2487218Sconklin          size_t @var{errbuf_size})
2488218Sconklin@end example
2489218Sconklin
2490218Sconklin@noindent
2491218Sconklin@var{errcode} is an error code, @var{preg} is the address of the pattern
2492218Sconklinbuffer which provoked the error, @var{errbuf} is the error buffer, and
2493218Sconklin@var{errbuf_size} is @var{errbuf}'s size.
2494218Sconklin
2495218Sconklin@code{regerror} returns the size in bytes of the error string
2496218Sconklincorresponding to @var{errcode} (including its terminating null).  If
2497218Sconklin@var{errbuf} and @var{errbuf_size} are nonzero, it also returns in
2498218Sconklin@var{errbuf} the first @math{@var{errbuf_size} - 1} characters of the
2499218Sconklinerror string, followed by a null.  
2500218Sconklin@var{errbuf_size} must be a nonnegative number less than or equal to the
2501218Sconklinsize in bytes of @var{errbuf}.
2502218Sconklin
2503218SconklinYou can call @code{regerror} with a null @var{errbuf} and a zero
2504218Sconklin@var{errbuf_size} to determine how large @var{errbuf} need be to
2505218Sconklinaccommodate @code{regerror}'s error string.
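
The two-call approach just described might look like the following
sketch.  The helper name @code{error_string} is hypothetical; it returns
memory obtained from @code{malloc}, which the caller is expected to
free.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>

/* Return a newly allocated string describing errcode, or NULL if
   memory runs out.  The caller frees it.  (Hypothetical helper.)  */
static char *
error_string (int errcode, const regex_t *preg)
{
  size_t needed;
  char *errbuf;

  assert (errcode != 0);

  /* First call: null buffer and zero size, just to learn the length
     (which includes the terminating null).  */
  needed = regerror (errcode, preg, NULL, 0);

  errbuf = malloc (needed);
  if (errbuf == NULL)
    return NULL;

  /* Second call: the buffer is now large enough for the whole string.  */
  regerror (errcode, preg, errbuf, needed);
  return errbuf;
}
```

A caller would typically pass the nonzero value returned by
@code{regcomp} together with the pattern buffer it was given, print the
result, and then free it.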

@node Using Byte Offsets, Freeing POSIX Pattern Buffers, Reporting Errors, POSIX Regex Functions
@subsection Using Byte Offsets

In @sc{posix}, variables of type @code{regmatch_t} hold information
analogous, but not identical, to that in @sc{gnu}'s registers
(@pxref{Using Registers}).  To get information about registers in
@sc{posix}, pass to @code{regexec} a non-null @var{pmatch}, i.e., the
address of the first of one or more structures of type
@code{regmatch_t}, which is defined in @file{regex.h}:

@tindex regmatch_t
@example
typedef struct
@{
  regoff_t rm_so;
  regoff_t rm_eo;
@} regmatch_t;
@end example

When reading in @ref{Using Registers}, about how the matching function
stores the information into the registers, substitute @var{pmatch} for
@var{regs}, @code{@w{@var{pmatch}[@var{i}].}rm_so} for
@code{@w{@var{regs}->}start[@var{i}]}, and
@code{@w{@var{pmatch}[@var{i}].}rm_eo} for
@code{@w{@var{regs}->}end[@var{i}]}.
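
As an illustration, the sketch below compiles a pattern, runs
@code{regexec} with a caller-supplied @var{pmatch} array, and leaves
the byte offsets in that array.  The function name
@code{match_offsets} and the choice of @code{REG_EXTENDED} are
assumptions made for the example, not requirements of Regex.

@verbatim
```c
#include <regex.h>
#include <stddef.h>

/* Search STRING for PATTERN (POSIX extended syntax) and fill in up to
   NMATCH elements of PMATCH.  Return 0 on a match, otherwise the
   nonzero code from regcomp or regexec.
   (match_offsets is a hypothetical name used for illustration.)  */
int
match_offsets (const char *pattern, const char *string,
               size_t nmatch, regmatch_t pmatch[])
{
  regex_t re;
  int err = regcomp (&re, pattern, REG_EXTENDED);

  if (err != 0)
    return err;

  /* On success, pmatch[0] describes the whole match and pmatch[i]
     the i-th parenthesized group.  rm_so is the offset of the first
     matched byte; rm_eo is the offset just past the last one.  */
  err = regexec (&re, string, nmatch, pmatch, 0);
  regfree (&re);
  return err;
}
```
@end verbatim

For instance, matching @samp{b(c+)d} against @samp{abcccde} with
@var{nmatch} equal to 2 should leave 1 and 6 in element 0 (the whole
match) and 2 and 5 in element 1 (the parenthesized group).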

@node Freeing POSIX Pattern Buffers,  , Using Byte Offsets, POSIX Regex Functions
@subsection Freeing POSIX Pattern Buffers

To free any allocated fields of a pattern buffer, use:

@findex regfree
@example
void
regfree (regex_t *@var{preg})
@end example

@noindent
@var{preg} is the pattern buffer whose allocated fields you want freed.
@code{regfree} also sets @var{preg}'s @code{allocated} and @code{used}
fields to zero.  After freeing a pattern buffer, you must compile a
regular expression in it again (@pxref{POSIX Regular Expression
Compiling}) before passing it to the matching function (@pxref{POSIX
Matching}).
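
The compile, match, free cycle can be sketched as below, reusing one
pattern buffer for several patterns.  The function name
@code{count_matching} and the flags @code{REG_EXTENDED} and
@code{REG_NOSUB} are choices made for this example only.

@verbatim
```c
#include <regex.h>
#include <stddef.h>

/* Return how many of the N patterns match STRING, or -1 if one of
   them fails to compile.
   (count_matching is a hypothetical name used for illustration.)  */
int
count_matching (const char *const patterns[], size_t n, const char *string)
{
  regex_t re;     /* one pattern buffer, reused for every pattern */
  int count = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      /* The buffer has to be compiled afresh before each use.  */
      if (regcomp (&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
      if (regexec (&re, string, 0, NULL, 0) == 0)
        count++;
      /* Release what regcomp allocated.  The buffer must not be
         passed to regexec again until the next regcomp.  */
      regfree (&re);
    }
  return count;
}
```
@end verbatim

Calling @code{regfree} once per successful @code{regcomp}, as here,
keeps the loop from leaking the fields allocated for each pattern.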


@node BSD Regex Functions,  , POSIX Regex Functions, Programming with Regex
@section BSD Regex Functions

If you're writing code that has to be Berkeley @sc{unix} compatible,
you'll need to use these functions, whose interfaces are the same as
those in Berkeley @sc{unix}.

@menu
* BSD Regular Expression Compiling::	re_comp ()
* BSD Searching::			re_exec ()
@end menu

@node BSD Regular Expression Compiling, BSD Searching,  , BSD Regex Functions
@subsection BSD Regular Expression Compiling

With Berkeley @sc{unix}, you can only search for a given regular
expression; you can't match one.  To search for it, you must first
compile it.  Before you compile it, you must indicate which regular
expression syntax to compile it under by setting the variable
@code{re_syntax_options} (declared in @file{regex.h}) to some syntax
(@pxref{Regular Expression Syntax}).

To compile a regular expression, use:

@findex re_comp
@example
char *
re_comp (char *@var{regex})
@end example

@noindent
@var{regex} is the address of a null-terminated regular expression.
@code{re_comp} uses an internal pattern buffer, so you can use only the
most recently compiled pattern.  This means that if you want to
use a given regular expression that you've already compiled---but it
isn't the latest one you've compiled---you'll have to recompile it.  If
you call @code{re_comp} with a null pointer (@emph{not} the empty
string) as the argument, it doesn't change the contents of the pattern
buffer.

If @code{re_comp} successfully compiles the regular expression, it
returns zero.  If it can't compile the regular expression, it returns
an error string.  @code{re_comp}'s error messages are identical to those
of @code{re_compile_pattern} (@pxref{GNU Regular Expression
Compiling}).

@node BSD Searching,  , BSD Regular Expression Compiling, BSD Regex Functions
@subsection BSD Searching

Searching the Berkeley @sc{unix} way means searching in a string
starting at its first character and trying successive positions within
it to find a match.  Once you've compiled a pattern using @code{re_comp}
(@pxref{BSD Regular Expression Compiling}), you can ask Regex
to search for that pattern in a string using:

@findex re_exec
@example
int
re_exec (char *@var{string})
@end example

@noindent
@var{string} is the address of the null-terminated string in which you
want to search.

@code{re_exec} returns either 1 for success or 0 for failure.  It
automatically uses a @sc{gnu} fastmap (@pxref{Searching with Fastmaps}).
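
Putting @code{re_comp} and @code{re_exec} together might look like the
sketch below.  It rests on several assumptions: that the system's
@file{regex.h} is the @sc{gnu} one, that defining @code{_REGEX_RE_COMP}
makes it declare the two functions, and that defining
@code{_GNU_SOURCE} exposes @code{re_syntax_options} and the
@code{RE_SYNTAX_POSIX_BASIC} constant.  The function name
@code{bsd_search} is hypothetical, and the syntax chosen is arbitrary.

@verbatim
```c
#define _GNU_SOURCE      /* assumed: exposes re_syntax_options */
#define _REGEX_RE_COMP   /* assumed: makes regex.h declare re_comp, re_exec */
#include <regex.h>
#include <stddef.h>

/* Return 1 if REGEX is found somewhere in STRING, 0 if it is not,
   and -1 if REGEX does not compile; in that case *ERRMSG points to
   re_comp's error string.
   (bsd_search is a hypothetical name used for illustration.)  */
int
bsd_search (const char *regex, const char *string, const char **errmsg)
{
  const char *msg;

#ifdef RE_SYNTAX_POSIX_BASIC
  /* Choose the syntax before compiling.  The guard only keeps the
     sketch compiling where the header hides the syntax constants.  */
  re_syntax_options = RE_SYNTAX_POSIX_BASIC;
#endif

  msg = re_comp (regex);      /* a null pointer means success */
  if (msg != NULL)
    {
      *errmsg = msg;
      return -1;
    }
  return re_exec (string);    /* searches with the pattern just compiled */
}
```
@end verbatim

Because @code{re_comp} keeps a single internal pattern buffer, each
call to a helper like this one replaces whatever pattern an earlier
call compiled.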


@node Copying, Index, Programming with Regex, Top
@appendix GNU GENERAL PUBLIC LICENSE
@center Version 2, June 1991

@display
Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
675 Mass Ave, Cambridge, MA 02139, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
@end display

@unnumberedsec Preamble

  The licenses for most software are designed to take away your
freedom to share and change it.  By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software---to make sure the software is free for all its users.  This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it.  (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.)  You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

  To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have.  You must make sure that they, too, receive or can get the
source code.  And you must show them these terms so they know their
rights.

  We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

  Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software.  If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

  Finally, any free program is threatened constantly by software
patents.  We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary.  To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

  The precise terms and conditions for copying, distribution and
modification follow.

@iftex
@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
@end iftex
@ifinfo
@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
@end ifinfo

@enumerate
@item
This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License.  The ``Program'', below,
refers to any such program or work, and a ``work based on the Program''
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language.  (Hereinafter, translation is included without limitation in
the term ``modification''.)  Each licensee is addressed as ``you''.

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope.  The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

@item
You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

@item
You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

@enumerate a
@item
You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.

@item
You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.

@item
If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License.  (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
@end enumerate

These requirements apply to the modified work as a whole.  If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works.  But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

@item
You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

@enumerate a
@item
Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,

@item
Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,

@item
Accompany it with the information you received as to the offer
to distribute corresponding source code.  (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
@end enumerate

The source code for a work means the preferred form of the work for
making modifications to it.  For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable.  However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

@item
You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License.  Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

@item
You are not required to accept this License, since you have not
signed it.  However, nothing else grants you permission to modify or
distribute the Program or its derivative works.  These actions are
prohibited by law if you do not accept this License.  Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

@item
Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions.  You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

@item
If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all.  For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices.  Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

@item
If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded.  In such case, this License incorporates
the limitation as if written in the body of this License.

@item
The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time.  Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number.  If the Program
specifies a version number of this License which applies to it and ``any
later version'', you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation.  If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

@item
If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission.  For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this.  Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

@iftex
@heading NO WARRANTY
@end iftex
@ifinfo
@center NO WARRANTY
@end ifinfo

@item
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

@item
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
@end enumerate

@iftex
@heading END OF TERMS AND CONDITIONS
@end iftex
@ifinfo
@center END OF TERMS AND CONDITIONS
@end ifinfo

@page
@unnumberedsec Appendix: How to Apply These Terms to Your New Programs

  If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the ``copyright'' line and a pointer to where the full notice is found.

@smallexample
@var{one line to give the program's name and a brief idea of what it does.}
Copyright (C) 19@var{yy}  @var{name of author}

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
@end smallexample

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

@smallexample
Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
@end smallexample

The hypothetical commands @samp{show w} and @samp{show c} should show
the appropriate parts of the General Public License.  Of course, the
commands you use may be called something other than @samp{show w} and
@samp{show c}; they could even be mouse-clicks or menu items---whatever
suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a ``copyright disclaimer'' for the program, if
necessary.  Here is a sample; alter the names:

@example
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

@var{signature of Ty Coon}, 1 April 1989
Ty Coon, President of Vice
@end example

This General Public License does not permit incorporating your program into
proprietary programs.  If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library.  If this is what you want to do, use the GNU Library General
Public License instead of this License.


@node Index,  , Copying, Top
@unnumbered Index

@printindex cp

@contents

@bye