\input texinfo
@c %**start of header
@setfilename regex.info
@settitle Regex
@c %**end of header

@c \\{fill-paragraph} works better (for me, anyway) if the text in the
@c source file isn't indented.
@paragraphindent 2

@c Define a new index for our magic constants.
@defcodeindex cn

@c Put everything in one index (arbitrarily chosen to be the concept index).
@syncodeindex cn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp
@syncodeindex vr cp

@c Here is what we use in the Info `dir' file:
@c * Regex: (regex).	Regular expression library.


@ifinfo
This file documents the GNU regular expression library.

Copyright (C) 1992, 1993 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries a copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).
@end ignore

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@end ifinfo


@titlepage

@title Regex
@subtitle edition 0.12a
@subtitle 19 September 1992
@author Kathryn A. Hargreaves
@author Karl Berry

@page

@vskip 0pt plus 1filll
Copyright @copyright{} 1992 Free Software Foundation.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this
one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.

@end titlepage


@ifinfo
@node Top, Overview, (dir), (dir)
@top Regular Expression Library

This manual documents how to program with the GNU regular expression
library.  This is edition 0.12a of the manual, 19 September 1992.

The first part of this master menu lists the major nodes in this Info
document, including the index.  The rest of the menu lists all the
lower level nodes in the document.

@menu
* Overview::
* Regular Expression Syntax::
* Common Operators::
* GNU Operators::
* GNU Emacs Operators::
* What Gets Matched?::
* Programming with Regex::
* Copying::                     Copying and sharing Regex.
* Index::                       General index.
 --- The Detailed Node Listing ---

Regular Expression Syntax

* Syntax Bits::
* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::

Common Operators

* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                * + ? @{@}
* Alternation Operator::                |
* List Operators::                      [...] [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^ $

Repetition Operators

* Match-zero-or-more Operator::         *
* Match-one-or-more Operator::          +
* Match-zero-or-one Operator::          ?
* Interval Operators::                  @{@}

List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})

* Character Class Operators::   [:class:]
* Range Operator::              start-end

Anchoring Operators

* Match-beginning-of-line Operator::    ^
* Match-end-of-line Operator::          $

GNU Operators

* Word Operators::
* Buffer Operators::

Word Operators

* Non-Emacs Syntax Tables::
* Match-word-boundary Operator::        \b
* Match-within-word Operator::          \B
* Match-beginning-of-word Operator::    \<
* Match-end-of-word Operator::          \>
* Match-word-constituent Operator::     \w
* Match-non-word-constituent Operator:: \W

Buffer Operators

* Match-beginning-of-buffer Operator::  \`
* Match-end-of-buffer Operator::        \'

GNU Emacs Operators

* Syntactic Class Operators::

Syntactic Class Operators

* Emacs Syntax Tables::
* Match-syntactic-class Operator::      \sCLASS
* Match-not-syntactic-class Operator::  \SCLASS

Programming with Regex

* GNU Regex Functions::
* POSIX Regex Functions::
* BSD Regex Functions::

GNU Regex Functions

* GNU Pattern Buffers::                 The re_pattern_buffer type.
* GNU Regular Expression Compiling::    re_compile_pattern ()
* GNU Matching::                        re_match ()
* GNU Searching::                       re_search ()
* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
* Searching with Fastmaps::             re_compile_fastmap ()
* GNU Translate Tables::                The `translate' field.
* Using Registers::                     The re_registers type and related fns.
* Freeing GNU Pattern Buffers::         regfree ()

POSIX Regex Functions

* POSIX Pattern Buffers::               The regex_t type.
* POSIX Regular Expression Compiling::  regcomp ()
* POSIX Matching::                      regexec ()
* Reporting Errors::                    regerror ()
* Using Byte Offsets::                  The regmatch_t type.
* Freeing POSIX Pattern Buffers::       regfree ()

BSD Regex Functions

* BSD Regular Expression Compiling::    re_comp ()
* BSD Searching::                       re_exec ()
@end menu
@end ifinfo
@node Overview, Regular Expression Syntax, Top, Top
@chapter Overview

A @dfn{regular expression} (or @dfn{regexp}, or @dfn{pattern}) is a text
string that describes some (mathematical) set of strings.  A regexp
@var{r} @dfn{matches} a string @var{s} if @var{s} is in the set of
strings described by @var{r}.

Using the Regex library, you can:

@itemize @bullet

@item
see if a string matches a specified pattern as a whole, and

@item
search within a string for a substring matching a specified pattern.

@end itemize

Some regular expressions match only one string, i.e., the set they
describe has only one member.  For example, the regular expression
@samp{foo} matches the string @samp{foo} and no others.  Other regular
expressions match more than one string, i.e., the set they describe has
more than one member.  For example, the regular expression @samp{f*}
matches the set of strings made up of any number (including zero) of
@samp{f}s.  As you can see, some characters in regular expressions match
themselves (such as @samp{f}) and some don't (such as @samp{*}); the
ones that don't match themselves instead let you specify patterns that
describe many different strings.

To either match or search for a regular expression with the Regex
library functions, you must first compile it with a Regex pattern
compiling function.  A @dfn{compiled pattern} is a regular expression
converted to the internal format used by the library functions.  Once
you've compiled a pattern, you can use it for matching or searching any
number of times.

The Regex library consists of two source files: @file{regex.h} and
@file{regex.c}.
@pindex regex.h
@pindex regex.c
Regex provides three groups of functions with which you can operate on
regular expressions.  One group---the @sc{gnu} group---is more powerful
but not completely compatible with the other two, namely the @sc{posix}
and Berkeley @sc{unix} groups; its interface was designed specifically
for @sc{gnu}.
The other groups have the same interfaces as do the
regular expression functions in @sc{posix} and Berkeley
@sc{unix}.

We wrote this chapter with programmers in mind, not users of
programs---such as Emacs---that use Regex.  We describe the Regex
library in its entirety, not how to write regular expressions that a
particular program understands.


@node Regular Expression Syntax, Common Operators, Overview, Top
@chapter Regular Expression Syntax

@cindex regular expressions, syntax of
@cindex syntax of regular expressions

@dfn{Characters} are things you can type.  @dfn{Operators} are things in
a regular expression that match one or more characters.  You compose
regular expressions from operators, which in turn you specify using one
or more characters.

Most characters represent what we call the match-self operator, i.e.,
they match themselves; we call these characters @dfn{ordinary}.  Other
characters represent either all or parts of fancier operators; e.g.,
@samp{.} represents what we call the match-any-character operator
(which, no surprise, matches (almost) any character); we call these
characters @dfn{special}.  Two different things determine what
characters represent what operators:

@enumerate
@item
the regular expression syntax your program has told the Regex library to
recognize, and

@item
the context of the character in the regular expression.
@end enumerate

In the following sections, we describe these things in more detail.
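
As a rough illustration of both points, here is a minimal sketch that
uses only the @sc{posix} functions @code{regcomp}, @code{regexec}, and
@code{regfree}.  The helper name @code{found_in} is hypothetical and is
not part of Regex; it reports whether a pattern, compiled under either
the @sc{posix} basic syntax (flags of zero) or the extended syntax
(@code{REG_EXTENDED}), matches anywhere in a string.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of Regex.  Returns 1 if PATTERN,
   compiled with the regcomp flags CFLAGS, matches somewhere in TEXT,
   0 if it does not, and -1 if the pattern fails to compile.  */
int
found_in (const char *pattern, int cflags, const char *text)
{
  regex_t compiled;
  int status;

  /* REG_NOSUB: we only want a yes/no answer, not match offsets.  */
  if (regcomp (&compiled, pattern, cflags | REG_NOSUB) != 0)
    return -1;

  status = regexec (&compiled, text, 0, NULL, 0);
  regfree (&compiled);

  return status == 0;
}
```

With this helper, @code{found_in ("a+", REG_EXTENDED, "caaat")} is 1,
because @samp{+} is special in the extended syntax, while
@code{found_in ("a+", 0, "caaat")} is 0 and
@code{found_in ("a+", 0, "a+b")} is 1, because @samp{+} is ordinary in
the basic syntax.  Context matters as well:
@code{found_in ("*a", 0, "x*a")} is 1, since a leading @samp{*} in a
basic regular expression is ordinary.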

@menu
* Syntax Bits::
* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::
@end menu


@node Syntax Bits, Predefined Syntaxes, , Regular Expression Syntax
@section Syntax Bits

@cindex syntax bits

In any particular syntax for regular expressions, some characters are
always special, others are sometimes special, and others are never
special.  The particular syntax that Regex recognizes for a given
regular expression depends on the value in the @code{syntax} field of
the pattern buffer of that regular expression.

You get a pattern buffer by compiling a regular expression.  @xref{GNU
Pattern Buffers}, and @ref{POSIX Pattern Buffers}, for more information
on pattern buffers.  @xref{GNU Regular Expression Compiling}, @ref{POSIX
Regular Expression Compiling}, and @ref{BSD Regular Expression
Compiling}, for more information on compiling.

Regex considers the value of the @code{syntax} field to be a collection
of bits; we refer to these bits as @dfn{syntax bits}.  In most cases,
they affect what characters represent what operators.  We describe the
meanings of the operators to which we refer in @ref{Common Operators},
@ref{GNU Operators}, and @ref{GNU Emacs Operators}.
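
The following sketch shows one way syntax bits might change which
characters are operators.  It assumes the @sc{gnu} interface as shipped
in the GNU C Library, where @code{re_set_syntax},
@code{re_compile_pattern}, and @code{re_match} are declared in
@file{regex.h} once @code{_GNU_SOURCE} is defined; other installations
of Regex may differ.  The helper name @code{match_length} is
hypothetical.

```c
#define _GNU_SOURCE   /* Assumption: glibc exposes the GNU interface this way.  */
#include <regex.h>
#include <string.h>

/* Hypothetical helper, not part of Regex.  Compiles PATTERN under the
   syntax bits SYNTAX and returns how many characters of TEXT it matches
   starting at the first character, -1 if there is no match there, or
   -2 on a compile or internal error.  */
int
match_length (reg_syntax_t syntax, const char *pattern, const char *text)
{
  struct re_pattern_buffer buf;
  reg_syntax_t old_syntax;
  const char *error;
  int matched;

  /* Start with no compiled buffer, fastmap, or translate table.  */
  memset (&buf, 0, sizeof buf);

  /* The syntax in effect at compile time is the one that counts.  */
  old_syntax = re_set_syntax (syntax);
  error = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (old_syntax);
  if (error != NULL)
    return -2;

  matched = (int) re_match (&buf, text, (int) strlen (text), 0, NULL);
  regfree (&buf);
  return matched;
}
```

Under these assumptions,
@code{match_length (RE_NO_BK_PARENS | RE_NO_BK_VBAR, "(a|b)c", "bc")}
is 2, because both bits make the unbackslashed characters operators.
With no bits set, the same pattern matches only the literal text
@samp{(a|b)c}, and you would write @samp{\(a\|b\)c} to get grouping and
alternation.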

For reference, here is the complete list of syntax bits, in alphabetical
order:

@table @code

@cnindex RE_BACKSLASH_ESCAPE_IN_LISTS
@item RE_BACKSLASH_ESCAPE_IN_LISTS
If this bit is set, then @samp{\} inside a list (@pxref{List Operators})
quotes (makes ordinary, if it's special) the following character; if
this bit isn't set, then @samp{\} is an ordinary character inside lists.
(@xref{The Backslash Character}, for what @samp{\} does outside of lists.)

@cnindex RE_BK_PLUS_QM
@item RE_BK_PLUS_QM
If this bit is set, then @samp{\+} represents the match-one-or-more
operator and @samp{\?} represents the match-zero-or-one operator; if
this bit isn't set, then @samp{+} represents the match-one-or-more
operator and @samp{?} represents the match-zero-or-one operator.  This
bit is irrelevant if @code{RE_LIMITED_OPS} is set.

@cnindex RE_CHAR_CLASSES
@item RE_CHAR_CLASSES
If this bit is set, then you can use character classes in lists; if this
bit isn't set, then you can't.

@cnindex RE_CONTEXT_INDEP_ANCHORS
@item RE_CONTEXT_INDEP_ANCHORS
If this bit is set, then @samp{^} and @samp{$} are special anywhere outside
a list; if this bit isn't set, then these characters are special only in
certain contexts.  @xref{Match-beginning-of-line Operator}, and
@ref{Match-end-of-line Operator}.

@cnindex RE_CONTEXT_INDEP_OPS
@item RE_CONTEXT_INDEP_OPS
If this bit is set, then certain characters are special anywhere outside
a list; if this bit isn't set, then those characters are special only in
some contexts and are ordinary elsewhere.  Specifically, if this bit
isn't set then @samp{*}, and (if the syntax bit @code{RE_LIMITED_OPS}
isn't set) @samp{+} and @samp{?} (or @samp{\+} and @samp{\?}, depending
on the syntax bit @code{RE_BK_PLUS_QM}) represent repetition operators
only if they're not first in a regular expression or just after an
open-group or alternation operator.  The same holds for @samp{@{} (or
@samp{\@{}, depending on the syntax bit @code{RE_NO_BK_BRACES}) if
it is the beginning of a valid interval and the syntax bit
@code{RE_INTERVALS} is set.

@cnindex RE_CONTEXT_INVALID_OPS
@item RE_CONTEXT_INVALID_OPS
If this bit is set, then repetition and alternation operators can't be
in certain positions within a regular expression.  Specifically, the
regular expression is invalid if it has:

@itemize @bullet

@item
a repetition operator first in the regular expression or just after a
match-beginning-of-line, open-group, or alternation operator; or

@item
an alternation operator first or last in the regular expression, just
before a match-end-of-line operator, or just after an alternation or
open-group operator.

@end itemize

If this bit isn't set, then you can put the characters representing the
repetition and alternation operators anywhere in a regular expression.
Whether or not they will in fact be operators in certain positions
depends on other syntax bits.

@cnindex RE_DOT_NEWLINE
@item RE_DOT_NEWLINE
If this bit is set, then the match-any-character operator matches
a newline; if this bit isn't set, then it doesn't.

@cnindex RE_DOT_NOT_NULL
@item RE_DOT_NOT_NULL
If this bit is set, then the match-any-character operator doesn't match
a null character; if this bit isn't set, then it does.

@cnindex RE_INTERVALS
@item RE_INTERVALS
If this bit is set, then Regex recognizes interval operators; if this bit
isn't set, then it doesn't.

@cnindex RE_LIMITED_OPS
@item RE_LIMITED_OPS
If this bit is set, then Regex doesn't recognize the match-one-or-more,
match-zero-or-one or alternation operators; if this bit isn't set, then
it does.

@cnindex RE_NEWLINE_ALT
@item RE_NEWLINE_ALT
If this bit is set, then newline represents the alternation operator; if
this bit isn't set, then newline is ordinary.

@cnindex RE_NO_BK_BRACES
@item RE_NO_BK_BRACES
If this bit is set, then @samp{@{} represents the open-interval operator
and @samp{@}} represents the close-interval operator; if this bit isn't
set, then @samp{\@{} represents the open-interval operator and
@samp{\@}} represents the close-interval operator.
This bit is relevant
only if @code{RE_INTERVALS} is set.

@cnindex RE_NO_BK_PARENS
@item RE_NO_BK_PARENS
If this bit is set, then @samp{(} represents the open-group operator and
@samp{)} represents the close-group operator; if this bit isn't set, then
@samp{\(} represents the open-group operator and @samp{\)} represents
the close-group operator.

@cnindex RE_NO_BK_REFS
@item RE_NO_BK_REFS
If this bit is set, then Regex doesn't recognize @samp{\}@var{digit} as
the back reference operator; if this bit isn't set, then it does.

@cnindex RE_NO_BK_VBAR
@item RE_NO_BK_VBAR
If this bit is set, then @samp{|} represents the alternation operator;
if this bit isn't set, then @samp{\|} represents the alternation
operator.  This bit is irrelevant if @code{RE_LIMITED_OPS} is set.

@cnindex RE_NO_EMPTY_RANGES
@item RE_NO_EMPTY_RANGES
If this bit is set, then a regular expression with a range whose ending
point collates lower than its starting point is invalid; if this bit
isn't set, then Regex considers such a range to be empty.

@cnindex RE_UNMATCHED_RIGHT_PAREN_ORD
@item RE_UNMATCHED_RIGHT_PAREN_ORD
If this bit is set and the regular expression has no matching open-group
operator, then Regex considers what would otherwise be a close-group
operator (based on how @code{RE_NO_BK_PARENS} is set) to match @samp{)}.

@end table


@node Predefined Syntaxes, Collating Elements vs. Characters, Syntax Bits, Regular Expression Syntax
@section Predefined Syntaxes

If you're programming with Regex, you can set a pattern buffer's
(@pxref{GNU Pattern Buffers}, and @ref{POSIX Pattern Buffers})
@code{syntax} field either to an arbitrary combination of syntax bits
(@pxref{Syntax Bits}) or else to the configurations defined by Regex.
These configurations define the syntaxes used by certain
programs---@sc{gnu} Emacs,
@cindex Emacs
@sc{posix} Awk,
@cindex POSIX Awk
traditional Awk,
@cindex Awk
Grep,
@cindex Grep
@cindex Egrep
Egrep---in addition to syntaxes for @sc{posix} basic and extended
regular expressions.

The predefined syntaxes---taken directly from @file{regex.h}---are:

@example
[[[ syntaxes ]]]
@end example

@node Collating Elements vs. Characters, The Backslash Character, Predefined Syntaxes, Regular Expression Syntax
@section Collating Elements vs.@: Characters

@sc{posix} generalizes the notion of a character to that of a
collating element.  It defines a @dfn{collating element} to be ``a
sequence of one or more bytes defined in the current collating sequence
as a unit of collation.''

This generalizes the notion of a character in
two ways.  First, a single character can map into two or more collating
elements.
For example, the German
@tex
`\ss'
@end tex
@ifinfo
``es-zet''
@end ifinfo
collates as the collating element @samp{s} followed by another collating
element @samp{s}.  Second, two or more characters can map into one
collating element.  For example, the Spanish @samp{ll} collates after
@samp{l} and before @samp{m}.

Since @sc{posix}'s ``collating element'' preserves the essential idea of
a ``character,'' we use the latter, more familiar, term in this document.

@node The Backslash Character, , Collating Elements vs. Characters, Regular Expression Syntax
@section The Backslash Character

@cindex \
The @samp{\} character has one of four different meanings, depending on
the context in which you use it and what syntax bits are set
(@pxref{Syntax Bits}).  It can: 1) stand for itself, 2) quote the next
character, 3) introduce an operator, or 4) do nothing.

@enumerate
@item
It stands for itself inside a list
(@pxref{List Operators}) if the syntax bit
@code{RE_BACKSLASH_ESCAPE_IN_LISTS} is not set.  For example, @samp{[\]}
would match @samp{\}.

@item
It quotes (makes ordinary, if it's special) the next character when you
use it either:

@itemize @bullet
@item
outside a list,@footnote{Sometimes
you don't have to explicitly quote special characters to make
them ordinary.  For instance, most characters lose any special meaning
inside a list (@pxref{List Operators}).
In addition, if the syntax bits
@code{RE_CONTEXT_INVALID_OPS} and @code{RE_CONTEXT_INDEP_OPS}
aren't set, then (for historical reasons) the matcher considers special
characters ordinary if they are in contexts where the operations they
represent make no sense; for example, then the match-zero-or-more
operator (represented by @samp{*}) matches itself in the regular
expression @samp{*foo} because there is no preceding expression on which
it can operate.  It is poor practice, however, to depend on this
behavior; if you want a special character to be ordinary outside a list,
it's better to always quote it, regardless.} or

@item
inside a list and the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is set.

@end itemize

@item
It introduces an operator when followed by certain ordinary
characters---sometimes only when certain syntax bits are set.  See the
cases @code{RE_BK_PLUS_QM}, @code{RE_NO_BK_BRACES}, @code{RE_NO_BK_VBAR},
@code{RE_NO_BK_PARENS}, @code{RE_NO_BK_REFS} in @ref{Syntax Bits}.  Also:

@itemize @bullet
@item
@samp{\b} represents the match-word-boundary operator
(@pxref{Match-word-boundary Operator}).

@item
@samp{\B} represents the match-within-word operator
(@pxref{Match-within-word Operator}).

@item
@samp{\<} represents the match-beginning-of-word operator @*
(@pxref{Match-beginning-of-word Operator}).

@item
@samp{\>} represents the match-end-of-word operator
(@pxref{Match-end-of-word Operator}).

@item
@samp{\w} represents the match-word-constituent operator
(@pxref{Match-word-constituent Operator}).

@item
@samp{\W} represents the match-non-word-constituent operator
(@pxref{Match-non-word-constituent Operator}).

@item
@samp{\`} represents the match-beginning-of-buffer
operator and @samp{\'} represents the match-end-of-buffer operator
(@pxref{Buffer Operators}).

@item
If Regex was compiled with the C preprocessor symbol @code{emacs}
defined, then @samp{\s@var{class}} represents the match-syntactic-class
operator and @samp{\S@var{class}} represents the
match-not-syntactic-class operator (@pxref{Syntactic Class Operators}).

@end itemize

@item
In all other cases, Regex ignores @samp{\}.  For example,
@samp{\n} matches @samp{n}.

@end enumerate

@node Common Operators, GNU Operators, Regular Expression Syntax, Top
@chapter Common Operators

You compose regular expressions from operators.  In the following
sections, we describe the regular expression operators specified by
@sc{posix}; @sc{gnu} also uses these.  Most operators have more than one
representation as characters.  @xref{Regular Expression Syntax}, for
what characters represent what operators under what circumstances.

For most operators that can be represented in two ways, one
representation is a single character and the other is that character
preceded by @samp{\}.
For example, either @samp{(} or @samp{\(}
represents the open-group operator.  Which one does depends on the
setting of a syntax bit, in this case @code{RE_NO_BK_PARENS}.  Why is
this so?  Historical reasons dictate some of the varying
representations, while @sc{posix} dictates others.

Finally, almost all characters lose any special meaning inside a list
(@pxref{List Operators}).

@menu
* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                * + ? @{@}
* Alternation Operator::                |
* List Operators::                      [...] [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^ $
@end menu

@node Match-self Operator, Match-any-character Operator, , Common Operators
@section The Match-self Operator (@var{ordinary character})

This operator matches the character itself.  All ordinary characters
(@pxref{Regular Expression Syntax}) represent this operator.  For
example, @samp{f} is always an ordinary character, so the regular
expression @samp{f} matches only the string @samp{f}.  In
particular, it does @emph{not} match the string @samp{ff}.
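
Note that ``matches'' here means matching the string as a whole, not
finding the pattern somewhere inside it.  The sketch below shows one way
to check this with the @sc{posix} functions; the helper name
@code{matches_whole} is hypothetical.  Because @code{regexec} searches,
the helper compares the reported offsets against the ends of the string,
relying on the @sc{posix} rule that the leftmost, longest match is the
one reported.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper, not part of Regex.  Returns 1 if the POSIX basic
   regular expression PATTERN matches all of TEXT, 0 if it does not, and
   -1 if the pattern fails to compile.  */
int
matches_whole (const char *pattern, const char *text)
{
  regex_t compiled;
  regmatch_t span;
  int status;

  if (regcomp (&compiled, pattern, 0) != 0)
    return -1;

  /* Ask for the offsets of the overall match only.  */
  status = regexec (&compiled, text, 1, &span, 0);
  regfree (&compiled);

  /* A whole-string match starts at offset 0 and ends at the end.  */
  return status == 0
         && span.rm_so == 0
         && (size_t) span.rm_eo == strlen (text);
}
```

Here @code{matches_whole ("f", "f")} is 1 and
@code{matches_whole ("f", "ff")} is 0, even though a search for
@samp{f} within @samp{ff} would succeed.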

@node Match-any-character Operator, Concatenation Operator, Match-self Operator, Common Operators
@section The Match-any-character Operator (@code{.})

@cindex @samp{.}

This operator matches any single printing or nonprinting character
except it won't match a:

@table @asis
@item newline
if the syntax bit @code{RE_DOT_NEWLINE} isn't set.

@item null
if the syntax bit @code{RE_DOT_NOT_NULL} is set.

@end table

The @samp{.} (period) character represents this operator.  For example,
@samp{a.b} matches any three-character string beginning with @samp{a}
and ending with @samp{b}.

@node Concatenation Operator, Repetition Operators, Match-any-character Operator, Common Operators
@section The Concatenation Operator

This operator concatenates two regular expressions @var{a} and @var{b}.
No character represents this operator; you simply put @var{b} after
@var{a}.  The result is a regular expression that will match a string if
@var{a} matches its first part and @var{b} matches the rest.  For
example, @samp{xy} (two match-self operators) matches @samp{xy}.

@node Repetition Operators, Alternation Operator, Concatenation Operator, Common Operators
@section Repetition Operators

Repetition operators repeat the preceding regular expression a specified
number of times.

@menu
* Match-zero-or-more Operator::  *
* Match-one-or-more Operator::   +
* Match-zero-or-one Operator::   ?
* Interval Operators::           @{@}
@end menu

@node Match-zero-or-more Operator, Match-one-or-more Operator, , Repetition Operators
@subsection The Match-zero-or-more Operator (@code{*})

@cindex @samp{*}

This operator repeats the smallest possible preceding regular expression
as many times as necessary (including zero) to match the pattern.
@samp{*} represents this operator. For example, @samp{o*}
matches any string made up of zero or more @samp{o}s. Since this
operator operates on the smallest preceding regular expression,
@samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}. So,
@samp{fo*} matches @samp{f}, @samp{fo}, @samp{foo}, and so on.

Since the match-zero-or-more operator is a suffix operator, it may be
useless as such when no regular expression precedes it. This is the
case when it:

@itemize @bullet
@item
is first in a regular expression, or

@item
follows a match-beginning-of-line, open-group, or alternation
operator.

@end itemize

@noindent
Three different things can happen in these cases:

@enumerate
@item
If the syntax bit @code{RE_CONTEXT_INVALID_OPS} is set, then the
regular expression is invalid.

@item
If @code{RE_CONTEXT_INVALID_OPS} isn't set, but
@code{RE_CONTEXT_INDEP_OPS} is, then @samp{*} represents the
match-zero-or-more operator (which then operates on the empty string).

@item
Otherwise, @samp{*} is ordinary.

@end enumerate

@cindex backtracking
The matcher processes a match-zero-or-more operator by first matching as
many repetitions of the smallest preceding regular expression as it can.
Then it continues to match the rest of the pattern.

If it can't match the rest of the pattern, it backtracks (as many times
as necessary), each time discarding one of the matches until it can
either match the entire pattern or be certain that it cannot get a
match. For example, when matching @samp{ca*ar} against @samp{caaar},
the matcher first matches all three @samp{a}s of the string with the
@samp{a*} of the regular expression. However, it cannot then match the
final @samp{ar} of the regular expression against the final @samp{r} of
the string. So it backtracks, discarding the match of the last @samp{a}
in the string. It can then match the remaining @samp{ar}.


@node Match-one-or-more Operator, Match-zero-or-one Operator, Match-zero-or-more Operator, Repetition Operators
@subsection The Match-one-or-more Operator (@code{+} or @code{\+})

@cindex @samp{+}

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't recognize
this operator. Otherwise, if the syntax bit @code{RE_BK_PLUS_QM} isn't
set, then @samp{+} represents this operator; if it is, then @samp{\+}
does.

This operator is similar to the match-zero-or-more operator except that
it repeats the preceding regular expression at least once.
@xref{Match-zero-or-more Operator}, for what it operates on, how some
syntax bits affect it, and how Regex backtracks to match it.

For example, if @samp{+} represents the match-one-or-more
operator, then @samp{ca+r} matches, e.g., @samp{car} and
@samp{caaaar}, but not @samp{cr}.

@node Match-zero-or-one Operator, Interval Operators, Match-one-or-more Operator, Repetition Operators
@subsection The Match-zero-or-one Operator (@code{?} or @code{\?})
@cindex @samp{?}

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
recognize this operator. Otherwise, if the syntax bit
@code{RE_BK_PLUS_QM} isn't set, then @samp{?} represents this operator;
if it is, then @samp{\?} does.

This operator is similar to the match-zero-or-more operator except that
it repeats the preceding regular expression once or not at all.
@xref{Match-zero-or-more Operator}, to see what it operates on, how
some syntax bits affect it, and how Regex backtracks to match it.

For example, if @samp{?} represents the match-zero-or-one
operator, then @samp{ca?r} matches both @samp{car} and @samp{cr}, but
nothing else.
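
The examples given for @samp{*}, @samp{+}, and @samp{?} can be checked
with a short C sketch. It is an illustration only. It assumes the
@sc{posix} extended syntax (@code{REG_EXTENDED} passed to
@code{regcomp}), in which the unbackslashed characters represent these
operators. The helper name @code{ere_whole_match} is hypothetical and
not part of Regex.

@verbatim
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of Regex.  Returns 1 if PATTERN, taken
   as a POSIX extended regular expression, matches all of TEXT, 0 if it
   does not, and -1 if the pattern does not compile.  */
static int
ere_whole_match (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int result;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  /* The match must start at offset 0 and end at the terminating null.  */
  result = (regexec (&re, text, 1, &m, 0) == 0
            && m.rm_so == 0 && text[m.rm_eo] == '\0');
  regfree (&re);
  return result;
}
```
@end verbatim

For instance, @code{ere_whole_match ("ca*ar", "caaar")} is expected to
succeed, which requires the matcher to give back one @samp{a} as
described for backtracking, whereas
@code{ere_whole_match ("ca+r", "cr")} and
@code{ere_whole_match ("ca?r", "caar")} are expected to fail.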

@node Interval Operators, , Match-zero-or-one Operator, Repetition Operators
@subsection Interval Operators (@code{@{} @dots{} @code{@}} or @code{\@{} @dots{} @code{\@}})

@cindex interval expression
@cindex @samp{@{}
@cindex @samp{@}}
@cindex @samp{\@{}
@cindex @samp{\@}}

If the syntax bit @code{RE_INTERVALS} is set, then Regex recognizes
@dfn{interval expressions}. They repeat the smallest possible preceding
regular expression a specified number of times.

If the syntax bit @code{RE_NO_BK_BRACES} is set, @samp{@{} represents
the @dfn{open-interval operator} and @samp{@}} represents the
@dfn{close-interval operator}; otherwise, @samp{\@{} and @samp{\@}} do.

Specifically, if @samp{@{} and @samp{@}} represent the
open-interval and close-interval operators, then:

@table @code
@item @{@var{count}@}
matches exactly @var{count} occurrences of the preceding regular
expression.

@item @{@var{min},@}
matches @var{min} or more occurrences of the preceding regular
expression.

@item @{@var{min},@var{max}@}
matches at least @var{min} but no more than @var{max} occurrences of
the preceding regular expression.

@end table

The interval expression (but not necessarily the regular expression that
contains it) is invalid if:

@itemize @bullet
@item
@var{min} is greater than @var{max}, or

@item
any of @var{count}, @var{min}, or @var{max} is outside the range
zero to @code{RE_DUP_MAX} (a symbol that @file{regex.h}
defines).

@end itemize

If the interval expression is invalid and the syntax bit
@code{RE_NO_BK_BRACES} is set, then Regex considers all the
characters in the would-be interval to be ordinary. If that bit
isn't set, then the regular expression is invalid.

If the interval expression is valid but there is no preceding regular
expression on which to operate, then if the syntax bit
@code{RE_CONTEXT_INVALID_OPS} is set, the regular expression is invalid.
If that bit isn't set, then Regex considers all the characters---other
than backslashes, which it ignores---in the would-be interval to be
ordinary.


@node Alternation Operator, List Operators, Repetition Operators, Common Operators
@section The Alternation Operator (@code{|} or @code{\|})

@kindex |
@kindex \|
@cindex alternation operator
@cindex or operator

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
recognize this operator. Otherwise, if the syntax bit
@code{RE_NO_BK_VBAR} is set, then @samp{|} represents this operator;
otherwise, @samp{\|} does.

Alternatives match one of a choice of regular expressions:
if you put the character(s) representing the alternation operator between
any two regular expressions @var{a} and @var{b}, the result matches
the union of the strings that @var{a} and @var{b} match. For
example, supposing that @samp{|} is the alternation operator, then
@samp{foo|bar|quux} would match any of @samp{foo}, @samp{bar} or
@samp{quux}.

@ignore
@c Nobody needs to disallow empty alternatives any more.
If the syntax bit @code{RE_NO_EMPTY_ALTS} is set, then if either of the regular
expressions @var{a} or @var{b} is empty, the
regular expression is invalid. More precisely, if this syntax bit is
set, then the alternation operator can't:

@itemize @bullet
@item
be first or last in a regular expression;

@item
follow either another alternation operator or an open-group operator
(@pxref{Grouping Operators}); or

@item
precede a close-group operator.

@end itemize

@noindent
For example, supposing @samp{(} and @samp{)} represent the open and
close-group operators, then @samp{|foo}, @samp{foo|}, @samp{foo||bar},
@samp{foo(|bar)}, and @samp{(foo|)bar} would all be invalid.
@end ignore

The alternation operator operates on the @emph{largest} possible
surrounding regular expressions. (Put another way, it has the lowest
precedence of any regular expression operator.)
Thus, the only way you can
delimit its arguments is to use grouping.
For example, if @samp{(} and
@samp{)} are the open and close-group operators, then @samp{fo(o|b)ar}
would match either @samp{fooar} or @samp{fobar}. (@samp{foo|bar} would
match @samp{foo} or @samp{bar}.)

@cindex backtracking
The matcher usually tries all combinations of alternatives so as to
match the longest possible string. For example, when matching
@samp{(fooq|foo)*(qbarquux|bar)} against @samp{fooqbarquux}, it cannot
take, say, the first (``depth-first'') combination it could match, since
then it would be content to match just @samp{fooqbar}.

@comment xx something about leftmost-longest


@node List Operators, Grouping Operators, Alternation Operator, Common Operators
@section List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})

@cindex matching list
@cindex @samp{[}
@cindex @samp{]}
@cindex @samp{^}
@cindex @samp{-}
@cindex @samp{\}
@cindex @samp{[^}
@cindex nonmatching list
@cindex matching newline
@cindex bracket expression

A @dfn{list}, also called a @dfn{bracket expression}, is a set of one or
more items. An @dfn{item} is a character,
@ignore
(These get added when they get implemented.)
a collating symbol, an equivalence class expression,
@end ignore
a character class expression, or a range expression. The syntax bits
affect which kinds of items you can put in a list. We explain the last
two items in subsections below. Empty lists are invalid.

A @dfn{matching list} matches a single character represented by one of
the list items. You form a matching list by enclosing one or more items
within an @dfn{open-matching-list operator} (represented by @samp{[})
and a @dfn{close-list operator} (represented by @samp{]}).

For example, @samp{[ab]} matches either @samp{a} or @samp{b}.
@samp{[ad]*} matches the empty string and any string composed of just
@samp{a}s and @samp{d}s in any order. Regex considers invalid a regular
expression with a @samp{[} but no matching
@samp{]}.

@dfn{Nonmatching lists} are similar to matching lists except that they
match a single character @emph{not} represented by one of the list
items. You use an @dfn{open-nonmatching-list operator} (represented by
@samp{[^}@footnote{Regex therefore doesn't consider the @samp{^} to be
the first character in the list. If you put a @samp{^} character first
in (what you think is) a matching list, you'll turn it into a
nonmatching list.}) instead of an open-matching-list operator to start a
nonmatching list.

For example, @samp{[^ab]} matches any character except @samp{a} or
@samp{b}.

If the @code{posix_newline} field in the pattern buffer (@pxref{GNU
Pattern Buffers}) is set, then nonmatching lists do not match a newline.

Most characters lose any special meaning inside a list. The special
characters inside a list follow.

@table @samp
@item ]
ends the list if it's not the first list item. So, if you want to make
the @samp{]} character a list item, you must put it first.

@item \
quotes the next character if the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is
set.

@ignore
Put these in if they get implemented.

@item [.
represents the open-collating-symbol operator (@pxref{Collating Symbol
Operators}).

@item .]
represents the close-collating-symbol operator.

@item [=
represents the open-equivalence-class operator (@pxref{Equivalence Class
Operators}).

@item =]
represents the close-equivalence-class operator.

@end ignore

@item [:
represents the open-character-class operator (@pxref{Character Class
Operators}) if the syntax bit @code{RE_CHAR_CLASSES} is set and what
follows is a valid character class expression.

@item :]
represents the close-character-class operator if the syntax bit
@code{RE_CHAR_CLASSES} is set and what precedes it is an
open-character-class operator followed by a valid character class name.

@item -
represents the range operator (@pxref{Range Operator}) if it's
not first or last in a list or the ending point of a range.

@end table

@noindent
All other characters are ordinary. For example, @samp{[.*]} matches
@samp{.} and @samp{*}.
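
The list examples above can be tried with the following C sketch. It
is an illustration only. It uses the @sc{posix} functions
@code{regcomp} and @code{regexec} with @code{REG_EXTENDED} and assumes
that this syntax treats matching and nonmatching lists as described
here. The helper name @code{list_whole_match} is hypothetical and not
part of Regex.

@verbatim
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of Regex.  Returns 1 if PATTERN (POSIX
   extended syntax) matches all of TEXT, 0 if it does not, and -1 if
   the pattern does not compile.  */
static int
list_whole_match (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int result;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  /* Accept only a match that covers TEXT from start to end.  */
  result = (regexec (&re, text, 1, &m, 0) == 0
            && m.rm_so == 0 && text[m.rm_eo] == '\0');
  regfree (&re);
  return result;
}
```
@end verbatim

For instance, @code{list_whole_match ("[]a]", "]")} is expected to
succeed because the @samp{]} comes first in the list, and
@code{list_whole_match ("[.*]", "x")} is expected to fail because
@samp{.} and @samp{*} are ordinary inside a list.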

@menu
* Character Class Operators::   [:class:]
* Range Operator::              start-end
@end menu

@ignore
(If collating symbols and equivalence class expressions get implemented,
then add this.)

node Collating Symbol Operators
subsubsection Collating Symbol Operators (@code{[.} @dots{} @code{.]})

If the syntax bit @code{XX} is set, then you can represent
collating symbols inside lists. You form a @dfn{collating symbol} by
putting a collating element between an @dfn{open-collating-symbol
operator} and a @dfn{close-collating-symbol operator}. @samp{[.}
represents the open-collating-symbol operator and @samp{.]} represents
the close-collating-symbol operator. For example, if @samp{ll} is a
collating element, then @samp{[[.ll.]]} would match @samp{ll}.

node Equivalence Class Operators
subsubsection Equivalence Class Operators (@code{[=} @dots{} @code{=]})
@cindex equivalence class expression in regex
@cindex @samp{[=} in regex
@cindex @samp{=]} in regex

If the syntax bit @code{XX} is set, then Regex recognizes equivalence class
expressions inside lists. An @dfn{equivalence class expression} is a set
of collating elements which all belong to the same equivalence class.
You form an equivalence class expression by putting a collating
element between an @dfn{open-equivalence-class operator} and a
@dfn{close-equivalence-class operator}. @samp{[=} represents the
open-equivalence-class operator and @samp{=]} represents the
close-equivalence-class operator.
For example, if @samp{a} and @samp{A}
were an equivalence class, then both @samp{[[=a=]]} and @samp{[[=A=]]}
would match both @samp{a} and @samp{A}. If the collating element in an
equivalence class expression isn't part of an equivalence class, then
the matcher considers the equivalence class expression to be a collating
symbol.

@end ignore

@node Character Class Operators, Range Operator, , List Operators
@subsection Character Class Operators (@code{[:} @dots{} @code{:]})

@cindex character classes
@cindex @samp{[:} in regex
@cindex @samp{:]} in regex

If the syntax bit @code{RE_CHAR_CLASSES} is set, then Regex
recognizes character class expressions inside lists. A @dfn{character
class expression} matches one character from a given class. You form a
character class expression by putting a character class name between an
@dfn{open-character-class operator} (represented by @samp{[:}) and a
@dfn{close-character-class operator} (represented by @samp{:]}).
The
character class names and their meanings are:

@table @code

@item alnum
letters and digits

@item alpha
letters

@item blank
system-dependent; for @sc{gnu}, a space or tab

@item cntrl
control characters (in the @sc{ascii} encoding, code 0177 and codes
less than 040)

@item digit
digits

@item graph
same as @code{print} except omits space

@item lower
lowercase letters

@item print
printable characters (in the @sc{ascii} encoding, space through
tilde---codes 040 through 0176)

@item punct
neither control nor alphanumeric characters

@item space
space, horizontal tab, carriage return, newline, vertical tab, and form feed

@item upper
uppercase letters

@item xdigit
hexadecimal digits: @code{0}--@code{9}, @code{a}--@code{f}, @code{A}--@code{F}

@end table

@noindent
These correspond to the definitions in the C library's @file{<ctype.h>}
facility. For example, @samp{[:alpha:]} corresponds to the standard
facility @code{isalpha}. Regex recognizes character class expressions
only inside of lists; so @samp{[[:alpha:]]} matches any letter, but
@samp{[:alpha:]} outside of a bracket expression and not followed by a
repetition operator matches just itself.
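
A few of the character classes in the table can be exercised with the
following C sketch. It is an illustration only. It uses the
@sc{posix} functions @code{regcomp} and @code{regexec} with
@code{REG_EXTENDED} and assumes the default @samp{C} locale, so the
classes cover only @sc{ascii} characters. The helper name
@code{class_whole_match} is hypothetical and not part of Regex.

@verbatim
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of Regex.  Returns 1 if PATTERN (POSIX
   extended syntax) matches all of TEXT, 0 if it does not, and -1 if
   the pattern does not compile.  */
static int
class_whole_match (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int result;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  /* Accept only a match that covers TEXT from start to end.  */
  result = (regexec (&re, text, 1, &m, 0) == 0
            && m.rm_so == 0 && text[m.rm_eo] == '\0');
  regfree (&re);
  return result;
}
```
@end verbatim

For instance, @code{class_whole_match ("[[:alpha:]]", "x")} is expected
to succeed and @code{class_whole_match ("[[:alpha:]]", "7")} to fail.
Note that the class expression sits inside a list, hence the doubled
brackets.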

@node Range Operator, , Character Class Operators, List Operators
@subsection The Range Operator (@code{-})

Regex recognizes @dfn{range expressions} inside a list. They represent
those characters
that fall between two elements in the current collating sequence. You
form a range expression by putting a @dfn{range operator} between two
@ignore
(If these get implemented, then substitute this for ``characters.'')
of any of the following: characters, collating elements, collating symbols,
and equivalence class expressions. The starting point of the range and
the ending point of the range don't have to be the same kind of item,
e.g., the starting point could be a collating element and the ending
point could be an equivalence class expression. If a range's ending
point is an equivalence class, then all the collating elements in that
class will be in the range.
@end ignore
characters.@footnote{You can't use a character class for the starting
or ending point of a range, since a character class is not a single
character.} @samp{-} represents the range operator. For example,
@samp{a-f} within a list represents all the characters from @samp{a}
through @samp{f}
inclusively.

If the syntax bit @code{RE_NO_EMPTY_RANGES} is set, then if the range's
ending point collates less than its starting point, the range (and the
regular expression containing it) is invalid. For example, the regular
expression @samp{[z-a]} would be invalid. If this bit isn't set, then
Regex considers such a range to be empty.

Since @samp{-} represents the range operator, if you want to make a
@samp{-} character itself
a list item, you must do one of the following:

@itemize @bullet
@item
Put the @samp{-} either first or last in the list.

@item
Include a range whose starting point collates strictly lower than
@samp{-} and whose ending point collates equal or higher. Unless a
range is the first item in a list, a @samp{-} can't be its starting
point, but @emph{can} be its ending point. That is because Regex
considers @samp{-} to be the range operator unless it is preceded by
another @samp{-}. For example, in the @sc{ascii} encoding, @samp{)},
@samp{*}, @samp{+}, @samp{,}, @samp{-}, @samp{.}, and @samp{/} are
contiguous characters in the collating sequence. You might think that
@samp{[)-+--/]} has two ranges: @samp{)-+} and @samp{--/}. Rather, it
has the ranges @samp{)-+} and @samp{+--}, plus the character @samp{/}, so
it matches, e.g., @samp{,}, not @samp{.}.

@item
Put a range whose starting point is @samp{-} first in the list.

@end itemize

For example, @samp{[-a-z]} matches a lowercase letter or a hyphen (in
English, in @sc{ascii}).
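
The simpler range cases can be checked with the following C sketch.
It is an illustration only. It uses the @sc{posix} functions
@code{regcomp} and @code{regexec} with @code{REG_EXTENDED} in the
default @samp{C} locale, and it assumes that this syntax rejects a
reversed range, as happens when @code{RE_NO_EMPTY_RANGES} is set. The
@samp{[)-+--/]} case is left out on purpose, because other
implementations may treat it differently from what is described here.
The helper name @code{range_whole_match} is hypothetical and not part
of Regex.

@verbatim
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of Regex.  Returns 1 if PATTERN (POSIX
   extended syntax) matches all of TEXT, 0 if it does not, and -1 if
   the pattern does not compile (for example, a reversed range).  */
static int
range_whole_match (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int result;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  /* Accept only a match that covers TEXT from start to end.  */
  result = (regexec (&re, text, 1, &m, 0) == 0
            && m.rm_so == 0 && text[m.rm_eo] == '\0');
  regfree (&re);
  return result;
}
```
@end verbatim

For instance, @code{range_whole_match ("[-a-z]", "-")} and
@code{range_whole_match ("[a-z-]", "-")} are both expected to succeed,
since the hyphen is first or last in the list, while
@code{range_whole_match ("[z-a]", "m")} is expected to report a
compilation failure.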


@node Grouping Operators, Back-reference Operator, List Operators, Common Operators
@section Grouping Operators (@code{(} @dots{} @code{)} or @code{\(} @dots{} @code{\)})

@kindex (
@kindex )
@kindex \(
@kindex \)
@cindex grouping
@cindex subexpressions
@cindex parenthesizing

A @dfn{group}, also known as a @dfn{subexpression}, consists of an
@dfn{open-group operator}, any number of other operators, and a
@dfn{close-group operator}. Regex treats this sequence as a unit, just
as mathematics and programming languages treat a parenthesized
expression as a unit.

Therefore, using @dfn{groups}, you can:

@itemize @bullet
@item
delimit the argument(s) to an alternation operator (@pxref{Alternation
Operator}) or a repetition operator (@pxref{Repetition
Operators}).

@item
keep track of the indices of the substring that matched a given group.
@xref{Using Registers}, for a precise explanation.
This lets you:

@itemize @bullet
@item
use the back-reference operator (@pxref{Back-reference Operator}).

@item
use registers (@pxref{Using Registers}).

@end itemize

@end itemize

If the syntax bit @code{RE_NO_BK_PARENS} is set, then @samp{(} represents
the open-group operator and @samp{)} represents the
close-group operator; otherwise, @samp{\(} and @samp{\)} do.
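
The two uses of groups listed above, delimiting the arguments of an
alternation and recording where a group matched, can be sketched in C
as follows. This is an illustration only. It uses the @sc{posix}
functions @code{regcomp} and @code{regexec} with @code{REG_EXTENDED}
(where @samp{(} and @samp{)} are the grouping operators) and the
@code{regmatch_t} array instead of the @sc{gnu} registers. The names
@code{group_span} and @code{MAX_GROUPS} are hypothetical and not part
of Regex.

@verbatim
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#define MAX_GROUPS 10

/* Hypothetical helper, not part of Regex.  Searches TEXT for PATTERN
   (POSIX extended syntax) and stores the start and end offsets of
   group number GROUP (0 is the whole match) in *START and *END.
   Returns 1 on success, 0 if there is no match or the group did not
   participate in it, and -1 if the pattern does not compile.  */
static int
group_span (const char *pattern, const char *text, size_t group,
            long *start, long *end)
{
  regex_t re;
  regmatch_t m[MAX_GROUPS];
  int found;

  assert (group < MAX_GROUPS && start != NULL && end != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  /* Unused entries of M are set to -1 by regexec.  */
  found = (regexec (&re, text, MAX_GROUPS, m, 0) == 0
           && m[group].rm_so != -1);
  if (found)
    {
      *start = (long) m[group].rm_so;
      *end = (long) m[group].rm_eo;
    }
  regfree (&re);
  return found;
}
```
@end verbatim

For instance, searching @samp{fobar} for @samp{fo(o|b)ar} with
@var{group} set to 1 is expected to report the offsets 2 and 3, which
delimit the @samp{b} chosen by the alternation inside the group.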

If the syntax bit @code{RE_UNMATCHED_RIGHT_PAREN_ORD} is set and a
close-group operator has no matching open-group operator, then Regex
considers it to match @samp{)}.


@node Back-reference Operator, Anchoring Operators, Grouping Operators, Common Operators
@section The Back-reference Operator (@code{\}@var{digit})

@cindex back references

If the syntax bit @code{RE_NO_BK_REFS} isn't set, then Regex recognizes
back references. A back reference matches a specified preceding group.
The back reference operator is represented by @samp{\@var{digit}}
anywhere after the end of a regular expression's @w{@var{digit}-th}
group (@pxref{Grouping Operators}).

@var{digit} must be between @samp{1} and @samp{9}. The matcher assigns
numbers 1 through 9 to the first nine groups it encounters. By using
one of @samp{\1} through @samp{\9} after the corresponding group's
close-group operator, you can match a substring identical to the
one that the group does.

Back references match according to the following (in all examples below,
@samp{(} represents the open-group, @samp{)} the close-group, @samp{@{}
the open-interval and @samp{@}} the close-interval operator):

@itemize @bullet
@item
If the group matches a substring, the back reference matches an
identical substring. For example, @samp{(a)\1} matches @samp{aa} and
@samp{(bana)na\1bo\1} matches @samp{bananabanabobana}.
Likewise,
@samp{(.*)\1} matches any (newline-free if the syntax bit
@code{RE_DOT_NEWLINE} isn't set) string that is composed of two
identical halves; the @samp{(.*)} matches the first half and the
@samp{\1} matches the second half.

@item
If the group matches more than once (as it might if followed
by, e.g., a repetition operator), then the back reference matches the
substring the group @emph{last} matched. For example,
@samp{((a*)b)*\1\2} matches @samp{aabababa}; first @w{group 1} (the
outer one) matches @samp{aab} and @w{group 2} (the inner one) matches
@samp{aa}. Then @w{group 1} matches @samp{ab} and @w{group 2} matches
@samp{a}. So, @samp{\1} matches @samp{ab} and @samp{\2} matches
@samp{a}.

@item
If the group doesn't participate in a match, i.e., it is part of an
alternative not taken or a repetition operator allows zero repetitions
of it, then the back reference makes the whole match fail. For example,
@samp{(one()|two())-and-(three\2|four\3)} matches @samp{one-and-three}
and @samp{two-and-four}, but not @samp{one-and-four} or
@samp{two-and-three}. To see why, suppose the pattern matches
@samp{one-and-}; then its @w{group 2} matches the empty string and its
@w{group 3} doesn't participate in the match. So, if it then matches
@samp{four}, then when it tries to back reference @w{group 3}---which it
will attempt to do because @samp{\3} follows the @samp{four}---the match
will fail because @w{group 3} didn't participate in the match.

@end itemize

You can use a back reference as an argument to a repetition operator. For
example, @samp{(a(b))\2*} matches @samp{a} followed by one or more
@samp{b}s. Similarly, @samp{(a(b))\2@{3@}} matches @samp{abbbb}.

If there is no preceding @w{@var{digit}-th} subexpression, the regular
expression is invalid.


@node Anchoring Operators, , Back-reference Operator, Common Operators
@section Anchoring Operators

@cindex anchoring
@cindex regexp anchoring

These operators can constrain a pattern to match only at the beginning or
end of the entire string or at the beginning or end of a line.

@menu
* Match-beginning-of-line Operator::  ^
* Match-end-of-line Operator::        $
@end menu


@node Match-beginning-of-line Operator, Match-end-of-line Operator, , Anchoring Operators
@subsection The Match-beginning-of-line Operator (@code{^})

@kindex ^
@cindex beginning-of-line operator
@cindex anchors

This operator can match the empty string either at the beginning of the
string or after a newline character. Thus, it is said to @dfn{anchor}
the pattern to the beginning of a line.

In the cases following, @samp{^} represents this operator. (Otherwise,
@samp{^} is ordinary.)

@itemize @bullet

@item
It (the @samp{^}) is first in the pattern, as in @samp{^foo}.

@cnindex RE_CONTEXT_INDEP_ANCHORS @r{(and @samp{^})}
@item
The syntax bit @code{RE_CONTEXT_INDEP_ANCHORS} is set, and it is outside
a bracket expression.

@cindex open-group operator and @samp{^}
@cindex alternation operator and @samp{^}
@item
It follows an open-group or alternation operator, as in @samp{a\(^b\)}
and @samp{a\|^b}. @xref{Grouping Operators}, and @ref{Alternation
Operator}.

@end itemize

These rules imply that some valid patterns containing @samp{^} cannot be
matched; for example, @samp{foo^bar} if @code{RE_CONTEXT_INDEP_ANCHORS}
is set.

@vindex not_bol @r{field in pattern buffer}
If the @code{not_bol} field is set in the pattern buffer (@pxref{GNU
Pattern Buffers}), then @samp{^} fails to match at the beginning of the
string. @xref{POSIX Matching}, for when you might find this useful.

@vindex newline_anchor @r{field in pattern buffer}
If the @code{newline_anchor} field is not set in the pattern buffer, then
@samp{^} fails to match after a newline. This is useful when you do not
regard the string to be matched as broken into lines.


@node Match-end-of-line Operator, , Match-beginning-of-line Operator, Anchoring Operators
@subsection The Match-end-of-line Operator (@code{$})

@kindex $
@cindex end-of-line operator
@cindex anchors

This operator can match the empty string either at the end of
the string or before a newline character in the string.
Thus, it is
said to @dfn{anchor} the pattern to the end of a line.

It is always represented by @samp{$}. For example, @samp{foo$} usually
matches both @samp{foo} and the first three characters of
@samp{foo\nbar}.

Its interaction with the syntax bits and pattern buffer fields is
exactly the dual of @samp{^}'s; see the previous section. (That is,
``beginning'' becomes ``end'', ``next'' becomes ``previous'', and
``after'' becomes ``before''.)


@node GNU Operators, GNU Emacs Operators, Common Operators, Top
@chapter GNU Operators

Following are operators that @sc{gnu} defines (and @sc{posix} doesn't).

@menu
* Word Operators::
* Buffer Operators::
@end menu

@node Word Operators, Buffer Operators, , GNU Operators
@section Word Operators

The operators in this section require Regex to recognize parts of words.
Regex uses a syntax table to determine whether or not a character is
part of a word, i.e., whether or not it is @dfn{word-constituent}.

@menu
* Non-Emacs Syntax Tables::
* Match-word-boundary Operator::        \b
* Match-within-word Operator::          \B
* Match-beginning-of-word Operator::    \<
* Match-end-of-word Operator::          \>
* Match-word-constituent Operator::     \w
* Match-non-word-constituent Operator:: \W
@end menu

@node Non-Emacs Syntax Tables, Match-word-boundary Operator, , Word Operators
@subsection Non-Emacs Syntax Tables

A @dfn{syntax table} is an array indexed by the characters in your
character set. In the @sc{ascii} encoding, therefore, a syntax table
has 256 elements. Regex always uses a @code{char *} variable
@code{re_syntax_table} as its syntax table. In some cases, it
initializes this variable and in others it expects you to initialize it.

@itemize @bullet
@item
If Regex is compiled with the preprocessor symbols @code{emacs} and
@code{SYNTAX_TABLE} both undefined, then Regex allocates
@code{re_syntax_table} and initializes an element @var{i} either to
@code{Sword} (which it defines) if @var{i} is a letter, number, or
@samp{_}, or to zero if it's not.

@item
If Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
defined, then Regex expects you to define a @code{char *} variable
@code{re_syntax_table} to be a valid syntax table.

@item
@xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
the preprocessor symbol @code{emacs} defined.

@end itemize

@node Match-word-boundary Operator, Match-within-word Operator, Non-Emacs Syntax Tables, Word Operators
@subsection The Match-word-boundary Operator (@code{\b})

@cindex @samp{\b}
@cindex word boundaries, matching

This operator (represented by @samp{\b}) matches the empty string at
either the beginning or the end of a word. For example, @samp{\brat\b}
matches the separate word @samp{rat}.

@node Match-within-word Operator, Match-beginning-of-word Operator, Match-word-boundary Operator, Word Operators
@subsection The Match-within-word Operator (@code{\B})

@cindex @samp{\B}

This operator (represented by @samp{\B}) matches the empty string within
a word. For example, @samp{c\Brat\Be} matches @samp{crate}, but
@samp{dirty \Brat} doesn't match @samp{dirty rat}.

@node Match-beginning-of-word Operator, Match-end-of-word Operator, Match-within-word Operator, Word Operators
@subsection The Match-beginning-of-word Operator (@code{\<})

@cindex @samp{\<}

This operator (represented by @samp{\<}) matches the empty string at the
beginning of a word.

@node Match-end-of-word Operator, Match-word-constituent Operator, Match-beginning-of-word Operator, Word Operators
@subsection The Match-end-of-word Operator (@code{\>})

@cindex @samp{\>}

This operator (represented by @samp{\>}) matches the empty string at the
end of a word.
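
The four word-position operators described so far (@samp{\b}, @samp{\B},
@samp{\<} and @samp{\>}) can be tried out from C with the @sc{gnu}
functions (@pxref{GNU Regex Functions}). What follows is only a sketch
under stated assumptions: it assumes the @file{regex.h} of the GNU C
Library, which declares the @sc{gnu} entry points when
@code{_GNU_SOURCE} is defined, and the Emacs-style syntax with no syntax
bits set. @code{first_match} is a hypothetical helper written for this
illustration; it is not part of Regex.

```c
#define _GNU_SOURCE /* assumption: glibc exposes the GNU interface only with this */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper, not part of Regex: return the index in TEXT where
   PATTERN first matches, -1 if it matches nowhere, or -2 if PATTERN does
   not compile or the matcher reports an internal error.  */
int
first_match (const char *pattern, const char *text)
{
  struct re_pattern_buffer buf;
  int size, where;

  assert (pattern != NULL && text != NULL);
  size = (int) strlen (text);

  /* Zero `buffer', `allocated', `fastmap' and `translate': Regex then
     allocates the compiled pattern itself and uses neither a fastmap
     nor a translate table.  */
  memset (&buf, 0, sizeof buf);

  /* Assumption: the Emacs-style syntax (no syntax bits set), in which
     the word operators are recognized.  */
  re_syntax_options = RE_SYNTAX_EMACS;

  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -2;

  /* Try every starting index from 0 up to SIZE.  */
  where = (int) re_search (&buf, text, size, 0, size, NULL);

  regfree (&buf);
  return where;
}
```

With this helper, @code{first_match ("\\brat\\b", "crate rat")} finds the
separate word rather than the @samp{rat} inside @samp{crate}, and
@code{first_match ("dirty \\Brat", "dirty rat")} finds nothing, as
described above.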

@node Match-word-constituent Operator, Match-non-word-constituent Operator, Match-end-of-word Operator, Word Operators
@subsection The Match-word-constituent Operator (@code{\w})

@cindex @samp{\w}

This operator (represented by @samp{\w}) matches any word-constituent
character.

@node Match-non-word-constituent Operator, , Match-word-constituent Operator, Word Operators
@subsection The Match-non-word-constituent Operator (@code{\W})

@cindex @samp{\W}

This operator (represented by @samp{\W}) matches any character that is
not word-constituent.


@node Buffer Operators, , Word Operators, GNU Operators
@section Buffer Operators

Following are operators which work on buffers. In Emacs, a @dfn{buffer}
is, naturally, an Emacs buffer. For other programs, Regex considers the
entire string to be matched as the buffer.

@menu
* Match-beginning-of-buffer Operator::  \`
* Match-end-of-buffer Operator::        \'
@end menu


@node Match-beginning-of-buffer Operator, Match-end-of-buffer Operator, , Buffer Operators
@subsection The Match-beginning-of-buffer Operator (@code{\`})

@cindex @samp{\`}

This operator (represented by @samp{\`}) matches the empty string at the
beginning of the buffer.

@node Match-end-of-buffer Operator, , Match-beginning-of-buffer Operator, Buffer Operators
@subsection The Match-end-of-buffer Operator (@code{\'})

@cindex @samp{\'}

This operator (represented by @samp{\'}) matches the empty string at the
end of the buffer.


@node GNU Emacs Operators, What Gets Matched?, GNU Operators, Top
@chapter GNU Emacs Operators

Following are operators that @sc{gnu} defines (and @sc{posix} doesn't)
that you can use only when Regex is compiled with the preprocessor
symbol @code{emacs} defined.

@menu
* Syntactic Class Operators::
@end menu


@node Syntactic Class Operators, , , GNU Emacs Operators
@section Syntactic Class Operators

The operators in this section require Regex to recognize the syntactic
classes of characters. Regex uses a syntax table to determine this.

@menu
* Emacs Syntax Tables::
* Match-syntactic-class Operator::      \sCLASS
* Match-not-syntactic-class Operator::  \SCLASS
@end menu

@node Emacs Syntax Tables, Match-syntactic-class Operator, , Syntactic Class Operators
@subsection Emacs Syntax Tables

A @dfn{syntax table} is an array indexed by the characters in your
character set. In the @sc{ascii} encoding, therefore, a syntax table
has 256 elements.

If Regex is compiled with the preprocessor symbol @code{emacs} defined,
then Regex expects you to define and initialize the variable
@code{re_syntax_table} to be an Emacs syntax table. Emacs' syntax
tables are more complicated than Regex's own (@pxref{Non-Emacs Syntax
Tables}). @xref{Syntax, , Syntax, emacs, The GNU Emacs User's Manual},
for a description of Emacs' syntax tables.

@node Match-syntactic-class Operator, Match-not-syntactic-class Operator, Emacs Syntax Tables, Syntactic Class Operators
@subsection The Match-syntactic-class Operator (@code{\s}@var{class})

@cindex @samp{\s}

This operator matches any character whose syntactic class is represented
by a specified character. @samp{\s@var{class}} represents this operator
where @var{class} is the character representing the syntactic class you
want. For example, @samp{w} represents the syntactic
class of word-constituent characters, so @samp{\sw} matches any
word-constituent character.

@node Match-not-syntactic-class Operator, , Match-syntactic-class Operator, Syntactic Class Operators
@subsection The Match-not-syntactic-class Operator (@code{\S}@var{class})

@cindex @samp{\S}

This operator is similar to the match-syntactic-class operator except
that it matches any character whose syntactic class is @emph{not}
represented by the specified character. @samp{\S@var{class}} represents
this operator. For example, @samp{w} represents the syntactic class of
word-constituent characters, so @samp{\Sw} matches any character that is
not word-constituent.


@node What Gets Matched?, Programming with Regex, GNU Emacs Operators, Top
@chapter What Gets Matched?

Regex usually matches strings according to the ``leftmost longest''
rule; that is, it chooses the longest of the leftmost matches. This
does not mean that, for a regular expression containing subexpressions,
it simply chooses the longest match for each subexpression, left to
right; the overall match must also be the longest possible one.

For example, @samp{(ac*)(c*d[ac]*)\1} matches @samp{acdacaaa}, not
@samp{acdac}, as it would if it were to choose the longest match for the
first subexpression.


@node Programming with Regex, Copying, What Gets Matched?, Top
@chapter Programming with Regex

Here we describe how you use the Regex data structures and functions in
C programs. Regex has three interfaces: one designed for @sc{gnu}, one
compatible with @sc{posix} and one compatible with Berkeley @sc{unix}.

@menu
* GNU Regex Functions::
* POSIX Regex Functions::
* BSD Regex Functions::
@end menu


@node GNU Regex Functions, POSIX Regex Functions, , Programming with Regex
@section GNU Regex Functions

If you're writing code that doesn't need to be compatible with either
@sc{posix} or Berkeley @sc{unix}, you can use these functions. They
provide more options than the other interfaces.

@menu
* GNU Pattern Buffers::                 The re_pattern_buffer type.
* GNU Regular Expression Compiling::    re_compile_pattern ()
* GNU Matching::                        re_match ()
* GNU Searching::                       re_search ()
* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
* Searching with Fastmaps::             re_compile_fastmap ()
* GNU Translate Tables::                The `translate' field.
* Using Registers::                     The re_registers type and related fns.
* Freeing GNU Pattern Buffers::         regfree ()
@end menu


@node GNU Pattern Buffers, GNU Regular Expression Compiling, , GNU Regex Functions
@subsection GNU Pattern Buffers

@cindex pattern buffer, definition of
@tindex re_pattern_buffer @r{definition}
@tindex struct re_pattern_buffer @r{definition}

To compile, match, or search for a given regular expression, you must
supply a pattern buffer. A @dfn{pattern buffer} holds one compiled
regular expression.@footnote{Regular expressions are also referred to as
``patterns,'' hence the name ``pattern buffer.''}

You can have several different pattern buffers simultaneously, each
holding a compiled pattern for a different regular expression.

@file{regex.h} defines the pattern buffer @code{struct} as follows:

@example
[[[ pattern_buffer ]]]
@end example


@node GNU Regular Expression Compiling, GNU Matching, GNU Pattern Buffers, GNU Regex Functions
@subsection GNU Regular Expression Compiling

In @sc{gnu}, you can both match and search for a given regular
expression.
To do either, you must first compile it in a pattern buffer
(@pxref{GNU Pattern Buffers}).

@cindex syntax initialization
@vindex re_syntax_options @r{initialization}
Regular expressions match according to the syntax with which they were
compiled; with @sc{gnu}, you indicate what syntax you want by setting
the variable @code{re_syntax_options} (declared in @file{regex.h} and
defined in @file{regex.c}) before calling the compiling function,
@code{re_compile_pattern} (see below). @xref{Syntax Bits}, and
@ref{Predefined Syntaxes}.

You can change the value of @code{re_syntax_options} at any time.
Usually, however, you set its value once and then never change it.

@cindex pattern buffer initialization
@code{re_compile_pattern} takes a pattern buffer as an argument. You
must initialize the following fields:

@table @code

@item translate
@vindex translate @r{initialization}
Initialize this to point to a translate table if you want one, or to
zero if you don't. We explain translate tables in @ref{GNU Translate
Tables}.

@item fastmap
@vindex fastmap @r{initialization}
Initialize this to nonzero if you want a fastmap, or to zero if you
don't.

@item buffer
@itemx allocated
@vindex buffer @r{initialization}
@vindex allocated @r{initialization}
@findex malloc
If you want @code{re_compile_pattern} to allocate memory for the
compiled pattern, set both of these to zero. If you have an existing
block of memory (allocated with @code{malloc}) you want Regex to use,
set @code{buffer} to its address and @code{allocated} to its size (in
bytes).

@code{re_compile_pattern} uses @code{realloc} to extend the space for
the compiled pattern as necessary.

@end table

To compile a pattern buffer, use:

@findex re_compile_pattern
@example
char *
re_compile_pattern (const char *@var{regex}, const int @var{regex_size},
                    struct re_pattern_buffer *@var{pattern_buffer})
@end example

@noindent
@var{regex} is the regular expression's address, @var{regex_size} is its
length, and @var{pattern_buffer} is the pattern buffer's address.

If @code{re_compile_pattern} successfully compiles the regular
expression, it returns zero and sets @code{*@var{pattern_buffer}} to the
compiled pattern. It sets the pattern buffer's fields as follows:

@table @code
@item buffer
@vindex buffer @r{field, set by @code{re_compile_pattern}}
to the compiled pattern.

@item used
@vindex used @r{field, set by @code{re_compile_pattern}}
to the number of bytes the compiled pattern in @code{buffer} occupies.

@item syntax
@vindex syntax @r{field, set by @code{re_compile_pattern}}
to the current value of @code{re_syntax_options}.

@item re_nsub
@vindex re_nsub @r{field, set by @code{re_compile_pattern}}
to the number of subexpressions in @var{regex}.

@item fastmap_accurate
@vindex fastmap_accurate @r{field, set by @code{re_compile_pattern}}
to zero on the theory that the pattern you're compiling is different
than the one previously compiled into @code{buffer}; in that case (since
you can't make a fastmap without a compiled pattern),
@code{fastmap} would either contain an incompatible fastmap, or nothing
at all.

@c xx what else?
@end table

If @code{re_compile_pattern} can't compile @var{regex}, it returns an
error string corresponding to one of the errors listed in @ref{POSIX
Regular Expression Compiling}.


@node GNU Matching, GNU Searching, GNU Regular Expression Compiling, GNU Regex Functions
@subsection GNU Matching

@cindex matching with GNU functions

Matching the @sc{gnu} way means trying to match as much of a string as
possible starting at a position within it you specify.
Once you've compiled
a pattern into a pattern buffer (@pxref{GNU Regular Expression
Compiling}), you can ask the matcher to match that pattern against a
string using:

@findex re_match
@example
int
re_match (struct re_pattern_buffer *@var{pattern_buffer},
          const char *@var{string}, const int @var{size},
          const int @var{start}, struct re_registers *@var{regs})
@end example

@noindent
@var{pattern_buffer} is the address of a pattern buffer containing a
compiled pattern. @var{string} is the string you want to match; it can
contain newline and null characters. @var{size} is the length of that
string. @var{start} is the string index at which you want to
begin matching; the first character of @var{string} is at index zero.
@xref{Using Registers}, for an explanation of @var{regs}; you can safely
pass zero.

@code{re_match} matches the regular expression in @var{pattern_buffer}
against the string @var{string} according to the syntax in
@var{pattern_buffer}'s @code{syntax} field. (@xref{GNU Regular
Expression Compiling}, for how to set it.) The function returns
@math{-1} if the compiled pattern does not match any part of
@var{string} and @math{-2} if an internal error happens; otherwise, it
returns how many (possibly zero) characters of @var{string} the pattern
matched.

An example: suppose @var{pattern_buffer} points to a pattern buffer
containing the compiled pattern for @samp{a*}, and @var{string} points
to @samp{aaaaab} (whereupon @var{size} should be 6).
Then if @var{start}
is 2, @code{re_match} returns 3, i.e., @samp{a*} would have matched the
last three @samp{a}s in @var{string}. If @var{start} is 0,
@code{re_match} returns 5, i.e., @samp{a*} would have matched all the
@samp{a}s in @var{string}. If @var{start} is either 5 or 6, it returns
zero.

If @var{start} is not between zero and @var{size}, then
@code{re_match} returns @math{-1}.


@node GNU Searching, Matching/Searching with Split Data, GNU Matching, GNU Regex Functions
@subsection GNU Searching

@cindex searching with GNU functions

@dfn{Searching} means trying to match starting at successive positions
within a string. The function @code{re_search} does this.

Before calling @code{re_search}, you must compile your regular
expression. @xref{GNU Regular Expression Compiling}.

Here is the function declaration:

@findex re_search
@example
int
re_search (struct re_pattern_buffer *@var{pattern_buffer},
           const char *@var{string}, const int @var{size},
           const int @var{start}, const int @var{range},
           struct re_registers *@var{regs})
@end example

@noindent
@vindex start @r{argument to @code{re_search}}
@vindex range @r{argument to @code{re_search}}
whose arguments are the same as those to @code{re_match} (@pxref{GNU
Matching}) except that the two arguments @var{start} and @var{range}
replace @code{re_match}'s argument @var{start}.
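
The compile-then-match sequence, including the @samp{a*} example from
the description of @code{re_match}, can be sketched as a few C functions.
This is a minimal sketch, not a definitive implementation: it assumes
the @file{regex.h} of the GNU C Library (which declares the @sc{gnu}
entry points when @code{_GNU_SOURCE} is defined) and the Emacs-style
syntax with no syntax bits set. The helpers @code{compile_gnu},
@code{match_at} and @code{search_from} are hypothetical names chosen for
the illustration. The search helper passes a @var{range} that reaches
from @var{start} to the end of the string.

```c
#define _GNU_SOURCE /* assumption: glibc exposes the GNU interface only with this */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile REGEX into *BUF.  Returns 0 on success
   and -1 if re_compile_pattern returns an error string.  */
int
compile_gnu (struct re_pattern_buffer *buf, const char *regex)
{
  assert (buf != NULL && regex != NULL);

  /* `buffer' and `allocated' zero: let Regex allocate the pattern.
     `fastmap' and `translate' zero: use neither.  */
  memset (buf, 0, sizeof *buf);

  /* Assumption: Emacs-style syntax, i.e. no syntax bits set.  */
  re_syntax_options = RE_SYNTAX_EMACS;

  return re_compile_pattern (regex, strlen (regex), buf) == NULL ? 0 : -1;
}

/* Hypothetical helper: how many characters of STRING the pattern REGEX
   matches when matching begins at index START; -1 for no match, -2 if
   compilation fails or an internal error happens.  */
int
match_at (const char *regex, const char *string, int start)
{
  struct re_pattern_buffer buf;
  int matched;

  if (compile_gnu (&buf, regex) != 0)
    return -2;
  matched = (int) re_match (&buf, string, (int) strlen (string), start, NULL);
  regfree (&buf);
  return matched;
}

/* Hypothetical helper: index of the first match of REGEX in STRING at
   or after START; -1 for no match, -2 on failure.  */
int
search_from (const char *regex, const char *string, int start)
{
  struct re_pattern_buffer buf;
  int size = (int) strlen (string);
  int found;

  if (compile_gnu (&buf, regex) != 0)
    return -2;
  /* A positive RANGE of SIZE - START tries START, START + 1, ... up to
     the end of STRING.  */
  found = (int) re_search (&buf, string, size, start, size - start, NULL);
  regfree (&buf);
  return found;
}
```

Called as @code{match_at ("a*", "aaaaab", 2)}, the sketch corresponds to
the case described above in which @var{start} is 2.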

If @var{range} is positive, then @code{re_search} attempts a match
starting first at index @var{start}, then at @math{@var{start} + 1} if
that fails, and so on, up to @math{@var{start} + @var{range}}; if
@var{range} is negative, then it attempts a match starting first at
index @var{start}, then at @math{@var{start} - 1} if that fails, and so
on.

If @var{start} is not between zero and @var{size}, then @code{re_search}
returns @math{-1}. When @var{range} is positive, @code{re_search}
adjusts @var{range} so that @math{@var{start} + @var{range} - 1} is
between zero and @var{size}, if necessary; that way it won't search
outside of @var{string}. Similarly, when @var{range} is negative,
@code{re_search} adjusts @var{range} so that @math{@var{start} +
@var{range} + 1} is between zero and @var{size}, if necessary.

If the @code{fastmap} field of @var{pattern_buffer} is zero,
@code{re_search} matches starting at consecutive positions; otherwise,
it uses @code{fastmap} to make the search more efficient.
@xref{Searching with Fastmaps}.

If no match is found, @code{re_search} returns @math{-1}. If
a match is found, it returns the index where the match began. If an
internal error happens, it returns @math{-2}.


@node Matching/Searching with Split Data, Searching with Fastmaps, GNU Searching, GNU Regex Functions
@subsection Matching and Searching with Split Data

Using the functions @code{re_match_2} and @code{re_search_2}, you can
match or search in data that is divided into two strings.

The function:

@findex re_match_2
@example
int
re_match_2 (struct re_pattern_buffer *@var{buffer},
            const char *@var{string1}, const int @var{size1},
            const char *@var{string2}, const int @var{size2},
            const int @var{start},
            struct re_registers *@var{regs},
            const int @var{stop})
@end example

@noindent
is similar to @code{re_match} (@pxref{GNU Matching}) except that you
pass @emph{two} data strings and sizes, and an index @var{stop} beyond
which you don't want the matcher to try matching. As with
@code{re_match}, if it succeeds, @code{re_match_2} returns how many
characters of the data it matched. Regard @var{string1} and
@var{string2} as concatenated when you set the arguments @var{start} and
@var{stop} and use the contents of @var{regs}; @code{re_match_2} never
returns a value larger than @math{@var{size1} + @var{size2}}.

The function:

@findex re_search_2
@example
int
re_search_2 (struct re_pattern_buffer *@var{buffer},
             const char *@var{string1}, const int @var{size1},
             const char *@var{string2}, const int @var{size2},
             const int @var{start}, const int @var{range},
             struct re_registers *@var{regs},
             const int @var{stop})
@end example

@noindent
is similarly related to @code{re_search}.
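
To illustrate how @var{start}, @var{stop} and the return values refer to
the two strings as if they were concatenated, here is a hedged sketch in
C. It assumes the @file{regex.h} of the GNU C Library with
@code{_GNU_SOURCE} defined and the Emacs-style syntax;
@code{match_split} and @code{search_split} are hypothetical helpers, and
both simply pass the combined length as @var{stop}.

```c
#define _GNU_SOURCE /* assumption: glibc exposes the GNU interface only with this */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile REGEX into *BUF; 0 on success, -1 on error.  */
int
compile_split (struct re_pattern_buffer *buf, const char *regex)
{
  assert (buf != NULL && regex != NULL);
  memset (buf, 0, sizeof *buf);        /* no fastmap, no translate table */
  re_syntax_options = RE_SYNTAX_EMACS; /* assumption: no syntax bits set */
  return re_compile_pattern (regex, strlen (regex), buf) == NULL ? 0 : -1;
}

/* Hypothetical helper: length matched by REGEX at index START of the
   data formed by S1 followed by S2; -1 for no match, -2 on failure.  */
int
match_split (const char *regex, const char *s1, const char *s2, int start)
{
  struct re_pattern_buffer buf;
  int size1 = (int) strlen (s1), size2 = (int) strlen (s2);
  int matched;

  if (compile_split (&buf, regex) != 0)
    return -2;
  /* STOP is the combined length, so matching may run to the very end.  */
  matched = (int) re_match_2 (&buf, s1, size1, s2, size2,
                              start, NULL, size1 + size2);
  regfree (&buf);
  return matched;
}

/* Hypothetical helper: index, counted in the concatenated data, of the
   first match at or after START; -1 for no match, -2 on failure.  */
int
search_split (const char *regex, const char *s1, const char *s2, int start)
{
  struct re_pattern_buffer buf;
  int size1 = (int) strlen (s1), size2 = (int) strlen (s2);
  int total = size1 + size2;
  int found;

  if (compile_split (&buf, regex) != 0)
    return -2;
  found = (int) re_search_2 (&buf, s1, size1, s2, size2,
                             start, total - start, NULL, total);
  regfree (&buf);
  return found;
}
```

A match may straddle the boundary between the two strings: in
@code{search_split ("ab", "xxa", "bxab", 0)} the @samp{a} comes from the
first string and the @samp{b} from the second.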


@node Searching with Fastmaps, GNU Translate Tables, Matching/Searching with Split Data, GNU Regex Functions
@subsection Searching with Fastmaps

@cindex fastmaps
If you're searching through a long string, you should use a fastmap.
Without one, the searcher tries to match at consecutive positions in the
string. Generally, most of the characters in the string could not start
a match. It takes much longer to try matching at a given position in the
string than it does to check in a table whether or not the character at
that position could start a match. A @dfn{fastmap} is such a table.

More specifically, a fastmap is an array indexed by the characters in
your character set. Under the @sc{ascii} encoding, therefore, a fastmap
has 256 elements. If you want the searcher to use a fastmap with a
given pattern buffer, you must allocate the array and assign the array's
address to the pattern buffer's @code{fastmap} field. You can either
compile the fastmap yourself or have @code{re_search} do it for you;
when @code{fastmap} is nonzero, it automatically compiles a fastmap the
first time you search using a particular compiled pattern.

To compile a fastmap yourself, use:

@findex re_compile_fastmap
@example
int
re_compile_fastmap (struct re_pattern_buffer *@var{pattern_buffer})
@end example

@noindent
@var{pattern_buffer} is the address of a pattern buffer.
If the
character @var{c} could start a match for the pattern,
@code{re_compile_fastmap} makes
@code{@var{pattern_buffer}->fastmap[@var{c}]} nonzero. It returns
@math{0} if it can compile a fastmap and @math{-2} if there is an
internal error. For example, if @samp{|} is the alternation operator
and @var{pattern_buffer} holds the compiled pattern for @samp{a|b}, then
@code{re_compile_fastmap} sets @code{fastmap['a']} and
@code{fastmap['b']} (and no others).

@code{re_search} uses a fastmap as it moves along in the string: it
checks the string's characters until it finds one that's in the fastmap.
Then it tries matching at that character. If the match fails, it
repeats the process. So, by using a fastmap, @code{re_search} doesn't
waste time trying to match at positions in the string that couldn't
start a match.

If you don't want @code{re_search} to use a fastmap,
store zero in the @code{fastmap} field of the pattern buffer before
calling @code{re_search}.

Once you've initialized a pattern buffer's @code{fastmap} field, you
need never do so again---even if you compile a new pattern in
it---provided the way the field is set still reflects whether or not you
want a fastmap. @code{re_search} will still either do nothing if
@code{fastmap} is null or, if it isn't, compile a new fastmap for the
new pattern.
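
The fastmap steps described in this section (allocate the array, store
its address in the @code{fastmap} field, compile the pattern, compile
the fastmap, then search) can be sketched in C as follows. This is an
illustration under assumptions, not a definitive implementation: it
assumes the @file{regex.h} of the GNU C Library with @code{_GNU_SOURCE}
defined and the Emacs-style syntax, in which alternation is written
@samp{\|}. All function names are hypothetical. Because the fastmap here
is an ordinary array rather than memory from @code{malloc}, the sketch
clears the @code{fastmap} field before calling @code{regfree}, on the
assumption that @code{regfree} may otherwise try to free it.

```c
#define _GNU_SOURCE /* assumption: glibc exposes the GNU interface only with this */
#include <assert.h>
#include <limits.h>
#include <regex.h>
#include <string.h>

#define FASTMAP_SIZE (UCHAR_MAX + 1) /* one entry per possible character */

/* Hypothetical helper: compile REGEX into *BUF and build its fastmap in
   FASTMAP, an array of FASTMAP_SIZE bytes.  Returns 0 on success and -1
   otherwise.  */
int
compile_with_fastmap (struct re_pattern_buffer *buf, char *fastmap,
                      const char *regex)
{
  assert (buf != NULL && fastmap != NULL && regex != NULL);

  memset (buf, 0, sizeof *buf);
  buf->fastmap = fastmap;              /* nonzero: we want a fastmap */
  re_syntax_options = RE_SYNTAX_EMACS; /* assumption: no syntax bits set */

  if (re_compile_pattern (regex, strlen (regex), buf) != NULL)
    return -1;
  if (re_compile_fastmap (buf) != 0)
    {
      buf->fastmap = NULL;             /* not heap memory; detach it */
      regfree (buf);
      return -1;
    }
  return 0;
}

/* Hypothetical helper: release BUF without handing the caller's fastmap
   array to the deallocator.  */
void
release_pattern (struct re_pattern_buffer *buf)
{
  buf->fastmap = NULL;
  regfree (buf);
}

/* 1 if character C could start a match for REGEX, 0 if it could not,
   -2 if the pattern or its fastmap cannot be compiled.  */
int
can_start (const char *regex, unsigned char c)
{
  struct re_pattern_buffer buf;
  char fastmap[FASTMAP_SIZE];
  int result;

  if (compile_with_fastmap (&buf, fastmap, regex) != 0)
    return -2;
  result = fastmap[c] != 0;
  release_pattern (&buf);
  return result;
}

/* Index of the first match of REGEX in TEXT, searched with the help of
   a fastmap; -1 for no match, -2 on failure.  */
int
search_with_fastmap (const char *regex, const char *text)
{
  struct re_pattern_buffer buf;
  char fastmap[FASTMAP_SIZE];
  int size = (int) strlen (text);
  int found;

  if (compile_with_fastmap (&buf, fastmap, regex) != 0)
    return -2;
  found = (int) re_search (&buf, text, size, 0, size, NULL);
  release_pattern (&buf);
  return found;
}
```

For the pattern @samp{a\|b}, @code{can_start} reports that @samp{a} and
@samp{b} can start a match and that other characters cannot, which is the
behavior described above for @samp{a|b}.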

@node GNU Translate Tables, Using Registers, Searching with Fastmaps, GNU Regex Functions
@subsection GNU Translate Tables

If you set the @code{translate} field of a pattern buffer to a translate
table, then the @sc{gnu} Regex functions to which you've passed that
pattern buffer use it to apply a simple transformation
to all the regular expression and string characters at which they look.

A @dfn{translate table} is an array indexed by the characters in your
character set.  Under the @sc{ascii} encoding, therefore, a translate
table has 256 elements.  The array's elements are also characters in
your character set.  When the Regex functions see a character @var{c},
they use @code{translate[@var{c}]} in its place, with one exception: the
character after a @samp{\} is not translated.  (This ensures that the
operators, e.g., @samp{\B} and @samp{\b}, are always distinguishable.)

For example, a table that maps all lowercase letters to the
corresponding uppercase ones would cause the matcher to ignore
differences in case.@footnote{A table that maps all uppercase letters to
the corresponding lowercase ones would work just as well for this
purpose.}  Such a table would map all characters except lowercase letters
to themselves, and lowercase letters to the corresponding uppercase
ones.
Under the @sc{ascii} encoding, here's how you could initialize
such a table (we'll call it @code{case_fold}):

@example
char case_fold[256];
int i;

/* Start with the identity mapping.  */
for (i = 0; i < 256; i++)
  case_fold[i] = i;
/* Then fold each lowercase letter onto its uppercase counterpart.  */
for (i = 'a'; i <= 'z'; i++)
  case_fold[i] = i - ('a' - 'A');
@end example

You tell Regex to use a translate table on a given pattern buffer by
assigning that table's address to the @code{translate} field of that
buffer.  If you don't want Regex to do any translation, put zero into
this field.  You'll get weird results if you change the table's contents
anytime between compiling the pattern buffer, compiling its fastmap, and
matching or searching with the pattern buffer.

@node Using Registers, Freeing GNU Pattern Buffers, GNU Translate Tables, GNU Regex Functions
@subsection Using Registers

A group in a regular expression can match a (possibly empty) substring
of the string that the regular expression as a whole matched.  The matcher
remembers the beginning and end of the substring matched by
each group.

To find out what they matched, pass a nonzero @var{regs} argument to a
@sc{gnu} matching or searching function (@pxref{GNU Matching} and
@ref{GNU Searching}), i.e., the address of a structure of this type, as
defined in @file{regex.h}:

@c We don't bother to include this directly from regex.h,
@c since it changes so rarely.
@example
@tindex re_registers
@vindex num_regs @r{in @code{struct re_registers}}
@vindex start @r{in @code{struct re_registers}}
@vindex end @r{in @code{struct re_registers}}
struct re_registers
@{
  unsigned num_regs;
  regoff_t *start;
  regoff_t *end;
@};
@end example

Except for (possibly) the @var{num_regs}'th element (see below), the
@var{i}th element of the @code{start} and @code{end} arrays records
information about the @var{i}th group in the pattern.  (They're declared
as C pointers, but this is only because not all C compilers accept
zero-length arrays; conceptually, it is simplest to think of them as
arrays.)

The @code{start} and @code{end} arrays are allocated in various ways,
depending on the value of the @code{regs_allocated}
@vindex regs_allocated
field in the pattern buffer passed to the matcher.

The simplest and perhaps most useful is to let the matcher (re)allocate
enough space to record information for all the groups in the regular
expression.  If @code{regs_allocated} is @code{REGS_UNALLOCATED},
@vindex REGS_UNALLOCATED
the matcher allocates @math{1 + @var{re_nsub}} elements (@code{re_nsub}
is another field in the pattern buffer; @pxref{GNU Pattern Buffers}),
sets the extra element to @math{-1}, and sets @code{regs_allocated} to
@code{REGS_REALLOCATE}.
@vindex REGS_REALLOCATE
Then on subsequent calls with the same pattern buffer and @var{regs}
arguments, the matcher reallocates more space if necessary.
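
The following C sketch shows one way the default
(@code{REGS_UNALLOCATED}) behavior might be used: the caller passes a
zeroed @code{struct re_registers}, lets the matcher allocate the
arrays, reads one group's offsets, and frees the arrays afterwards.  It
assumes a C library (such as GNU libc) whose @file{regex.h} exposes the
@sc{gnu} interface when @code{_GNU_SOURCE} is defined, and it assumes
that the arrays are obtained with @code{malloc}.  The names
@code{struct span} and @code{group_span} are hypothetical and exist
only for this example.

```c
#define _GNU_SOURCE             /* assumption: exposes the GNU Regex interface */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for this sketch: one start/end pair.  */
struct span
{
  regoff_t start;
  regoff_t end;
};

/* Hypothetical helper, not part of Regex: search TEXT for PATTERN and
   return the registers for group GROUP (0 means the whole match).
   Both members are -2 if the pattern is invalid, nothing matched, or
   GROUP is out of range.  */
static struct span
group_span (const char *pattern, const char *text, size_t group)
{
  struct span result = { -2, -2 };
  struct re_pattern_buffer buf;
  struct re_registers regs;
  int len;

  assert (pattern != NULL && text != NULL);
  len = (int) strlen (text);
  memset (&buf, 0, sizeof buf);
  memset (&regs, 0, sizeof regs);

  /* Assumed syntax for this sketch: `(' and `)' are the group operators.  */
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return result;

  /* regs_allocated is REGS_UNALLOCATED here, so on a successful search
     the matcher allocates regs.start and regs.end for us.  */
  if (re_search (&buf, text, len, 0, len, &regs) >= 0
      && group <= buf.re_nsub)
    {
      result.start = regs.start[group];
      result.end = regs.end[group];
    }

  /* Assumption: the arrays came from malloc, and the caller owns them.
     free (NULL) is harmless if the matcher never allocated them.  */
  free (regs.start);
  free (regs.end);
  regfree (&buf);
  return result;
}
```

For instance, @code{group_span ("((a)(b))", "ab", 3)} should yield a
start of 1 and an end of 2.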

It would perhaps be more logical to make the @code{regs_allocated} field
part of the @code{re_registers} structure, instead of part of the
pattern buffer.  But in that case the caller would be forced to
initialize the structure before passing it.  Much existing code doesn't
do this initialization, and it's arguably better to avoid it anyway.

@code{re_compile_pattern} sets @code{regs_allocated} to
@code{REGS_UNALLOCATED},
so if you use the GNU regular expression
functions, you get this behavior by default.

@c xx document re_set_registers

@sc{posix}, on the other hand, requires a different interface: the
caller is supposed to pass in a fixed-length array which the matcher
fills.  Therefore, if @code{regs_allocated} is @code{REGS_FIXED},
@vindex REGS_FIXED
the matcher simply fills that array.

The following examples illustrate the information recorded in the
@code{re_registers} structure.  (In all of them, @samp{(} represents the
open-group and @samp{)} the close-group operator.  The first character
in the string @var{string} is at index 0.)

@c xx i'm not sure this is all true anymore.

@itemize @bullet

@item
If the regular expression has an @w{@var{i}-th}
group not contained within another group that matches a
substring of @var{string}, then the function sets
@code{@w{@var{regs}->}start[@var{i}]} to the index in @var{string} where
the substring matched by the @w{@var{i}-th} group begins, and
@code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
substring's end.  The function sets @code{@w{@var{regs}->}start[0]} and
@code{@w{@var{regs}->}end[0]} to analogous information about the entire
pattern.

For example, when you match @samp{((a)(b))} against @samp{ab}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}

@item
0 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}

@item
0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}

@item
1 in @code{@w{@var{regs}->}start[3]} and 2 in @code{@w{@var{regs}->}end[3]}
@end itemize

@item
If a group matches more than once (as it might if followed by,
e.g., a repetition operator), then the function reports the information
about what the group @emph{last} matched.

For example, when you match the pattern @samp{(a)*} against the string
@samp{aa}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}

@item
1 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}
@end itemize

@item
If the @w{@var{i}-th} group does not participate in a
successful match, e.g., it is an alternative not taken or a
repetition operator allows zero repetitions of it, then the function
sets @code{@w{@var{regs}->}start[@var{i}]} and
@code{@w{@var{regs}->}end[@var{i}]} to @math{-1}.

For example, when you match the pattern @samp{(a)*b} against
the string @samp{b}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}

@item
@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}
@end itemize

@item
If the @w{@var{i}-th} group matches a zero-length string, then the
function sets @code{@w{@var{regs}->}start[@var{i}]} and
@code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
zero-length string.

For example, when you match the pattern @samp{(a*)b} against the string
@samp{b}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}

@item
0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]}
@end itemize

@ignore
The function sets @code{@w{@var{regs}->}start[0]} and
@code{@w{@var{regs}->}end[0]} to analogous information about the entire
pattern.

For example, when you match the pattern @samp{(a*)} against the empty
string, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 0 in @code{@w{@var{regs}->}end[0]}

@item
0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]}
@end itemize
@end ignore

@item
If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
in turn not contained within any other group within group @var{i} and
the function reports a match of the @w{@var{i}-th} group, then it
records in @code{@w{@var{regs}->}start[@var{j}]} and
@code{@w{@var{regs}->}end[@var{j}]} the last match (if it matched) of
the @w{@var{j}-th} group.

For example, when you match the pattern @samp{((a*)b)*} against the
string @samp{abb}, @w{group 2} last matches the empty string, so that
empty match is what you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}

@item
2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}

@item
2 in @code{@w{@var{regs}->}start[2]} and 2 in @code{@w{@var{regs}->}end[2]}
@end itemize

When you match the pattern @samp{((a)*b)*} against the string
@samp{abb}, @w{group 2} doesn't participate in the last match, so you
get what it previously matched:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}

@item
2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}

@item
0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}
@end itemize

@item
If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
in turn not contained within any other group within group @var{i}
and the function sets
@code{@w{@var{regs}->}start[@var{i}]} and
@code{@w{@var{regs}->}end[@var{i}]} to @math{-1}, then it also sets
@code{@w{@var{regs}->}start[@var{j}]} and
@code{@w{@var{regs}->}end[@var{j}]} to @math{-1}.

For example, when you match the pattern @samp{((a)*b)*c} against the
string @samp{c}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}

@item
@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}

@item
@math{-1} in @code{@w{@var{regs}->}start[2]} and @math{-1} in @code{@w{@var{regs}->}end[2]}
@end itemize

@end itemize

@node Freeing GNU Pattern Buffers, , Using Registers, GNU Regex Functions
@subsection Freeing GNU Pattern Buffers

To free any allocated fields of a pattern buffer, you can use the
@sc{posix} function described in @ref{Freeing POSIX Pattern Buffers},
since the type @code{regex_t}---the type for @sc{posix} pattern
buffers---is equivalent to the type @code{re_pattern_buffer}.  After
freeing a pattern buffer, you need to again compile a regular expression
in it (@pxref{GNU Regular Expression Compiling}) before passing it to
a matching or searching function.


@node POSIX Regex Functions, BSD Regex Functions, GNU Regex Functions, Programming with Regex
@section POSIX Regex Functions

If you're writing code that has to be @sc{posix} compatible, you'll need
to use these functions.  Their interfaces are as specified by @sc{posix},
draft 1003.2/D11.2.

@menu
* POSIX Pattern Buffers::               The regex_t type.
* POSIX Regular Expression Compiling::  regcomp ()
* POSIX Matching::                      regexec ()
* Reporting Errors::                    regerror ()
* Using Byte Offsets::                  The regmatch_t type.
* Freeing POSIX Pattern Buffers::       regfree ()
@end menu


@node POSIX Pattern Buffers, POSIX Regular Expression Compiling, , POSIX Regex Functions
@subsection POSIX Pattern Buffers

To compile or match a given regular expression the @sc{posix} way, you
must supply a pattern buffer exactly the way you do for @sc{gnu}
(@pxref{GNU Pattern Buffers}).  @sc{posix} pattern buffers have type
@code{regex_t}, which is equivalent to the @sc{gnu} pattern buffer
type @code{re_pattern_buffer}.


@node POSIX Regular Expression Compiling, POSIX Matching, POSIX Pattern Buffers, POSIX Regex Functions
@subsection POSIX Regular Expression Compiling

With @sc{posix}, you can only search for a given regular expression; you
can't match it.  To do this, you must first compile it in a
pattern buffer, using @code{regcomp}.

@ignore
Before calling @code{regcomp}, you must initialize this pattern buffer
as you do for @sc{gnu} (@pxref{GNU Regular Expression Compiling}).  See
below, however, for how to choose a syntax with which to compile.
@end ignore

To compile a pattern buffer, use:

@findex regcomp
@example
int
regcomp (regex_t *@var{preg}, const char *@var{regex}, int @var{cflags})
@end example

@noindent
@var{preg} is the initialized pattern buffer's address, @var{regex} is
the regular expression's address, and @var{cflags} holds the compilation
flags, which Regex considers as a collection of bits.  Here are the
valid bits, as defined in @file{regex.h}:

@table @code

@item REG_EXTENDED
@vindex REG_EXTENDED
says to use @sc{posix} Extended Regular Expression syntax; if this isn't
set, @code{regcomp} uses @sc{posix} Basic Regular Expression syntax.
@code{regcomp} sets @var{preg}'s @code{syntax} field accordingly.

@item REG_ICASE
@vindex REG_ICASE
@cindex ignoring case
says to ignore case; @code{regcomp} sets @var{preg}'s @code{translate}
field to a translate table which ignores case, replacing anything you've
put there before.

@item REG_NOSUB
@vindex REG_NOSUB
says to set @var{preg}'s @code{no_sub} field; @pxref{POSIX Matching},
for what this means.

@item REG_NEWLINE
@vindex REG_NEWLINE
says that a:

@itemize @bullet

@item
match-any-character operator (@pxref{Match-any-character
Operator}) doesn't match a newline.

@item
nonmatching list not containing a newline (@pxref{List
Operators}) doesn't match a newline.

@item
match-beginning-of-line operator (@pxref{Match-beginning-of-line
Operator}) matches the empty string immediately after a newline,
regardless of how @code{REG_NOTBOL} is set (@pxref{POSIX Matching}, for
an explanation of @code{REG_NOTBOL}).

@item
match-end-of-line operator (@pxref{Match-end-of-line
Operator}) matches the empty string immediately before a newline,
regardless of how @code{REG_NOTEOL} is set (@pxref{POSIX Matching},
for an explanation of @code{REG_NOTEOL}).

@end itemize

@end table

If @code{regcomp} successfully compiles the regular expression, it
returns zero and sets @code{*@var{preg}} to the compiled
pattern.  Except for @code{syntax} (which it sets as explained above), it
also sets the same fields the same way as does the @sc{gnu} compiling
function (@pxref{GNU Regular Expression Compiling}).

If @code{regcomp} can't compile the regular expression, it returns one
of the error codes listed here.  (Except when noted differently, the
syntax in all examples below is basic regular expression syntax.)

@table @code

@comment repetitions
@item REG_BADRPT
For example, the consecutive repetition operators @samp{**} in
@samp{a**} are invalid.
As another example, if the syntax is extended
regular expression syntax, then the repetition operator @samp{*} with
nothing on which to operate in @samp{*} is invalid.

@item REG_BADBR
For example, the @var{count} @samp{-1} in @samp{a\@{-1} is invalid.

@item REG_EBRACE
For example, @samp{a\@{1} is missing a close-interval operator.

@comment lists
@item REG_EBRACK
For example, @samp{[a} is missing a close-list operator.

@item REG_ERANGE
For example, the range ending point @samp{a} that collates lower than
does its starting point @samp{z} in @samp{[z-a]} is invalid.  Also, the
range with the character class @samp{[:alpha:]} as its starting point in
@samp{[[:alpha:]-|]} is invalid.

@item REG_ECTYPE
For example, the character class name @samp{foo} in @samp{[[:foo:]]} is
invalid.

@comment groups
@item REG_EPAREN
For example, @samp{a\)} is missing an open-group operator and @samp{\(a}
is missing a close-group operator.

@item REG_ESUBREG
For example, the back reference @samp{\2} that refers to a nonexistent
subexpression in @samp{\(a\)\2} is invalid.

@comment unfinished business

@item REG_EEND
Returned when a regular expression causes no other more specific error.

@item REG_EESCAPE
For example, the trailing backslash @samp{\} in @samp{a\} is invalid, as is the
one in @samp{\}.

@comment kitchen sink
@item REG_BADPAT
For example, in the extended regular expression syntax, the empty group
@samp{()} in @samp{a()b} is invalid.

@comment internal
@item REG_ESIZE
Returned when a regular expression needs a pattern buffer larger than
65536 bytes.

@item REG_ESPACE
Returned when a regular expression makes Regex run out of memory.

@end table


@node POSIX Matching, Reporting Errors, POSIX Regular Expression Compiling, POSIX Regex Functions
@subsection POSIX Matching

Matching the @sc{posix} way means trying to match a null-terminated
string starting at its first character.  Once you've compiled a pattern
into a pattern buffer (@pxref{POSIX Regular Expression Compiling}), you
can ask the matcher to match that pattern against a string using:

@findex regexec
@example
int
regexec (const regex_t *@var{preg}, const char *@var{string},
         size_t @var{nmatch}, regmatch_t @var{pmatch}[], int @var{eflags})
@end example

@noindent
@var{preg} is the address of a pattern buffer for a compiled pattern.
@var{string} is the string you want to match.

@xref{Using Byte Offsets}, for an explanation of @var{pmatch}.  If you
pass zero for @var{nmatch} or you compiled @var{preg} with the
compilation flag @code{REG_NOSUB} set, then @code{regexec} will ignore
@var{pmatch}; otherwise, you must allocate it to have at least
@var{nmatch} elements.
@code{regexec} will record at most @var{nmatch} pairs of byte
offsets in @var{pmatch}, and set to @math{-1} any unused elements up to
@code{@var{pmatch}[@var{nmatch} - 1]}.

@var{eflags} specifies @dfn{execution flags}---namely, the two bits
@code{REG_NOTBOL} and @code{REG_NOTEOL} (defined in @file{regex.h}).  If
you set @code{REG_NOTBOL}, then the match-beginning-of-line operator
(@pxref{Match-beginning-of-line Operator}) always fails to match.
This lets you match against pieces of a line, as you would need to if,
say, searching for repeated instances of a given pattern in a line; it
would work correctly for patterns both with and without
match-beginning-of-line operators.  @code{REG_NOTEOL} works analogously
for the match-end-of-line operator (@pxref{Match-end-of-line
Operator}); it exists for symmetry.

@code{regexec} tries to find a match for @var{preg} in @var{string}
according to the syntax in @var{preg}'s @code{syntax} field.
(@xref{POSIX Regular Expression Compiling}, for how to set it.)  The
function returns zero if the compiled pattern matches @var{string} and
@code{REG_NOMATCH} (defined in @file{regex.h}) if it doesn't.

@node Reporting Errors, Using Byte Offsets, POSIX Matching, POSIX Regex Functions
@subsection Reporting Errors

If either @code{regcomp} or @code{regexec} fails, it returns a nonzero
error code, the possibilities for which are defined in @file{regex.h}.
@xref{POSIX Regular Expression Compiling}, and @ref{POSIX Matching}, for
what these codes mean.
To get an error string corresponding to these
codes, you can use:

@findex regerror
@example
size_t
regerror (int @var{errcode},
          const regex_t *@var{preg},
          char *@var{errbuf},
          size_t @var{errbuf_size})
@end example

@noindent
@var{errcode} is an error code, @var{preg} is the address of the pattern
buffer which provoked the error, @var{errbuf} is the error buffer, and
@var{errbuf_size} is @var{errbuf}'s size.

@code{regerror} returns the size in bytes of the error string
corresponding to @var{errcode} (including its terminating null).  If
@var{errbuf} and @var{errbuf_size} are nonzero, it also returns in
@var{errbuf} the first @math{@var{errbuf_size} - 1} characters of the
error string, followed by a null.
@var{errbuf_size} must be a nonnegative number less than or equal to the
size in bytes of @var{errbuf}.

You can call @code{regerror} with a null @var{errbuf} and a zero
@var{errbuf_size} to determine how large @var{errbuf} need be to
accommodate @code{regerror}'s error string.

@node Using Byte Offsets, Freeing POSIX Pattern Buffers, Reporting Errors, POSIX Regex Functions
@subsection Using Byte Offsets

In @sc{posix}, variables of type @code{regmatch_t} hold information
analogous to, but not identical to, @sc{gnu}'s registers (@pxref{Using
Registers}).
To get information about registers in @sc{posix}, pass to
@code{regexec} a nonzero @var{pmatch}, i.e.,
the address of an array of structures of this type, defined in
@file{regex.h}:

@tindex regmatch_t
@example
typedef struct
@{
  regoff_t rm_so;
  regoff_t rm_eo;
@} regmatch_t;
@end example

When reading in @ref{Using Registers}, about how the matching function
stores the information into the registers, substitute @var{pmatch} for
@var{regs}, @code{@var{pmatch}[@var{i}].rm_so} for
@code{@w{@var{regs}->}start[@var{i}]} and
@code{@var{pmatch}[@var{i}].rm_eo} for
@code{@w{@var{regs}->}end[@var{i}]}.

@node Freeing POSIX Pattern Buffers, , Using Byte Offsets, POSIX Regex Functions
@subsection Freeing POSIX Pattern Buffers

To free any allocated fields of a pattern buffer, use:

@findex regfree
@example
void
regfree (regex_t *@var{preg})
@end example

@noindent
@var{preg} is the address of the pattern buffer whose allocated fields
you want freed.
@code{regfree} also sets @var{preg}'s @code{allocated} and @code{used}
fields to zero.  After freeing a pattern buffer, you need to again
compile a regular expression in it (@pxref{POSIX Regular Expression
Compiling}) before passing it to the matching function (@pxref{POSIX
Matching}).
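
To tie the four @sc{posix} functions together, here is a minimal C
sketch that compiles a pattern with @code{regcomp}, reports a
compilation failure through the two-call @code{regerror} idiom, runs
@code{regexec}, and releases the pattern buffer with @code{regfree}.
The helper name @code{posix_group}, the limit @code{MAX_GROUPS}, and
the use of @math{-2} to signal failure are assumptions made for this
example only.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_GROUPS 10           /* arbitrary limit chosen for this sketch */

/* Hypothetical helper, not part of Regex: compile PATTERN as a POSIX
   extended regular expression, search TEXT, and return the byte
   offsets of group GROUP.  Both offsets are -1 if the group took no
   part in the match, and -2 on any failure.  */
static regmatch_t
posix_group (const char *pattern, const char *text, size_t group)
{
  regmatch_t result;
  regmatch_t pmatch[MAX_GROUPS];
  regex_t preg;
  int rc;

  assert (pattern != NULL && text != NULL);
  result.rm_so = result.rm_eo = -2;

  rc = regcomp (&preg, pattern, REG_EXTENDED);
  if (rc != 0)
    {
      /* Ask for the message size first, then fetch the message.  */
      size_t size = regerror (rc, &preg, NULL, 0);
      char *msg = malloc (size);
      if (msg != NULL)
        {
          regerror (rc, &preg, msg, size);
          fprintf (stderr, "regcomp: %s\n", msg);
          free (msg);
        }
      return result;
    }

  /* regexec fills pmatch[0] for the whole match and pmatch[i] for
     group i; entries for unused groups are set to -1.  */
  if (group < MAX_GROUPS && group <= preg.re_nsub
      && regexec (&preg, text, MAX_GROUPS, pmatch, 0) == 0)
    result = pmatch[group];

  regfree (&preg);
  return result;
}
```

For instance, @code{posix_group ("((a)(b))", "ab", 3)} should return
offsets 1 and 2, mirroring the register values listed in @ref{Using
Registers}.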


@node BSD Regex Functions, , POSIX Regex Functions, Programming with Regex
@section BSD Regex Functions

If you're writing code that has to be Berkeley @sc{unix} compatible,
you'll need to use these functions, whose interfaces are the same as those
in Berkeley @sc{unix}.

@menu
* BSD Regular Expression Compiling::    re_comp ()
* BSD Searching::                       re_exec ()
@end menu

@node BSD Regular Expression Compiling, BSD Searching, , BSD Regex Functions
@subsection BSD Regular Expression Compiling

With Berkeley @sc{unix}, you can only search for a given regular
expression; you can't match one.  To search for it, you must first
compile it.  Before you compile it, you must indicate the regular
expression syntax you want it compiled according to by setting the
variable @code{re_syntax_options} (declared in @file{regex.h}) to some
syntax (@pxref{Regular Expression Syntax}).

To compile a regular expression, use:

@findex re_comp
@example
char *
re_comp (char *@var{regex})
@end example

@noindent
@var{regex} is the address of a null-terminated regular expression.
@code{re_comp} uses an internal pattern buffer, so you can use only the
most recently compiled pattern buffer.  This means that if you want to
use a given regular expression that you've already compiled---but it
isn't the latest one you've compiled---you'll have to recompile it.
If 2589218Sconklinyou call @code{re_comp} with the null string (@emph{not} the empty 2590218Sconklinstring) as the argument, it doesn't change the contents of the pattern 2591218Sconklinbuffer. 2592218Sconklin 2593218SconklinIf @code{re_comp} successfully compiles the regular expression, it 2594218Sconklinreturns zero. If it can't compile the regular expression, it returns 2595218Sconklinan error string. @code{re_comp}'s error messages are identical to those 2596218Sconklinof @code{re_compile_pattern} (@pxref{GNU Regular Expression 2597218SconklinCompiling}). 2598218Sconklin 2599218Sconklin@node BSD Searching, , BSD Regular Expression Compiling, BSD Regex Functions 2600218Sconklin@subsection BSD Searching 2601218Sconklin 2602218SconklinSearching the Berkeley @sc{unix} way means searching in a string 2603218Sconklinstarting at its first character and trying successive positions within 2604218Sconklinit to find a match. Once you've compiled a pattern using @code{re_comp} 2605218Sconklin(@pxref{BSD Regular Expression Compiling}), you can ask Regex 2606218Sconklinto search for that pattern in a string using: 2607218Sconklin 2608218Sconklin@findex re_exec 2609218Sconklin@example 2610218Sconklinint 2611218Sconklinre_exec (char *@var{string}) 2612218Sconklin@end example 2613218Sconklin 2614218Sconklin@noindent 2615218Sconklin@var{string} is the address of the null-terminated string in which you 2616218Sconklinwant to search. 2617218Sconklin 2618218Sconklin@code{re_exec} returns either 1 for success or 0 for failure. It 2619218Sconklinautomatically uses a @sc{gnu} fastmap (@pxref{Searching with Fastmaps}). 2620218Sconklin 2621218Sconklin 2622218Sconklin@node Copying, Index, Programming with Regex, Top 2623218Sconklin@appendix GNU GENERAL PUBLIC LICENSE 2624218Sconklin@center Version 2, June 1991 2625218Sconklin 2626218Sconklin@display 2627218SconklinCopyright @copyright{} 1989, 1991 Free Software Foundation, Inc. 
675 Mass Ave, Cambridge, MA 02139, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
@end display

@unnumberedsec Preamble

  The licenses for most software are designed to take away your
freedom to share and change it.  By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software---to make sure the software is free for all its users.  This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it.  (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.)  You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

  To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have.  You must make sure that they, too, receive or can get the
source code.  And you must show them these terms so they know their
rights.

  We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

  Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software.  If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

  Finally, any free program is threatened constantly by software
patents.  We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary.  To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

  The precise terms and conditions for copying, distribution and
modification follow.

@iftex
@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
@end iftex
@ifinfo
@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
@end ifinfo

@enumerate
@item
This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License.  The ``Program'', below,
refers to any such program or work, and a ``work based on the Program''
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language.  (Hereinafter, translation is included without limitation in
the term ``modification''.)  Each licensee is addressed as ``you''.

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope.  The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

@item
You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

@item
You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

@enumerate a
@item
You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.

@item
You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.

@item
If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License.  (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
@end enumerate

These requirements apply to the modified work as a whole.  If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works.  But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

@item
You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

@enumerate a
@item
Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,

@item
Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,

@item
Accompany it with the information you received as to the offer
to distribute corresponding source code.  (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
@end enumerate

The source code for a work means the preferred form of the work for
making modifications to it.  For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable.  However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

@item
You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License.  Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

@item
You are not required to accept this License, since you have not
signed it.  However, nothing else grants you permission to modify or
distribute the Program or its derivative works.  These actions are
prohibited by law if you do not accept this License.  Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

@item
Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions.  You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

@item
If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all.  For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices.  Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

@item
If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded.  In such case, this License incorporates
the limitation as if written in the body of this License.

@item
The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time.  Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number.  If the Program
specifies a version number of this License which applies to it and ``any
later version'', you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation.  If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

@item
If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission.  For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this.  Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

@iftex
@heading NO WARRANTY
@end iftex
@ifinfo
@center NO WARRANTY
@end ifinfo

@item
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

@item
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
@end enumerate

@iftex
@heading END OF TERMS AND CONDITIONS
@end iftex
@ifinfo
@center END OF TERMS AND CONDITIONS
@end ifinfo

@page
@unnumberedsec Appendix: How to Apply These Terms to Your New Programs

  If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the ``copyright'' line and a pointer to where the full notice is found.

@smallexample
@var{one line to give the program's name and a brief idea of what it does.}
Copyright (C) 19@var{yy}  @var{name of author}

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
@end smallexample

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

@smallexample
Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
@end smallexample

The hypothetical commands @samp{show w} and @samp{show c} should show
the appropriate parts of the General Public License.  Of course, the
commands you use may be called something other than @samp{show w} and
@samp{show c}; they could even be mouse-clicks or menu items---whatever
suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a ``copyright disclaimer'' for the program, if
necessary.  Here is a sample; alter the names:

@example
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

@var{signature of Ty Coon}, 1 April 1989
Ty Coon, President of Vice
@end example

This General Public License does not permit incorporating your program into
proprietary programs.  If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library.  If this is what you want to do, use the GNU Library General
Public License instead of this License.


@node Index, , Copying, Top
@unnumbered Index

@printindex cp

@contents

@bye