xregex.texi revision 21643
\input texinfo
@c %**start of header
@setfilename regex.info
@settitle Regex
@c %**end of header

@c \\{fill-paragraph} works better (for me, anyway) if the text in the
@c source file isn't indented.
@paragraphindent 2

@c Define a new index for our magic constants.
@defcodeindex cn

@c Put everything in one index (arbitrarily chosen to be the concept index).
@syncodeindex cn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp
@syncodeindex vr cp

@c Here is what we use in the Info `dir' file:
@c * Regex: (regex). Regular expression library.


@ifinfo
This file documents the GNU regular expression library.

Copyright (C) 1992, 1993 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries a copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).
@end ignore

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@end ifinfo


@titlepage

@title Regex
@subtitle edition 0.12a
@subtitle 19 September 1992
@author Kathryn A. Hargreaves
@author Karl Berry

@page

@vskip 0pt plus 1filll
Copyright @copyright{} 1992 Free Software Foundation.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this
one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.

@end titlepage


@ifinfo
@node Top, Overview, (dir), (dir)
@top Regular Expression Library

This manual documents how to program with the GNU regular expression
library. This is edition 0.12a of the manual, 19 September 1992.

The first part of this master menu lists the major nodes in this Info
document, including the index. The rest of the menu lists all the
lower level nodes in the document.

@menu
* Overview::
* Regular Expression Syntax::
* Common Operators::
* GNU Operators::
* GNU Emacs Operators::
* What Gets Matched?::
* Programming with Regex::
* Copying:: Copying and sharing Regex.
* Index:: General index.
 --- The Detailed Node Listing ---

Regular Expression Syntax

* Syntax Bits::
* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::

Common Operators

* Match-self Operator:: Ordinary characters.
* Match-any-character Operator:: .
* Concatenation Operator:: Juxtaposition.
* Repetition Operators:: * + ? @{@}
* Alternation Operator:: |
* List Operators:: [...] [^...]
* Grouping Operators:: (...)
* Back-reference Operator:: \digit
* Anchoring Operators:: ^ $

Repetition Operators

* Match-zero-or-more Operator:: *
* Match-one-or-more Operator:: +
* Match-zero-or-one Operator:: ?
* Interval Operators:: @{@}

List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})

* Character Class Operators:: [:class:]
* Range Operator:: start-end

Anchoring Operators

* Match-beginning-of-line Operator:: ^
* Match-end-of-line Operator:: $

GNU Operators

* Word Operators::
* Buffer Operators::

Word Operators

* Non-Emacs Syntax Tables::
* Match-word-boundary Operator:: \b
* Match-within-word Operator:: \B
* Match-beginning-of-word Operator:: \<
* Match-end-of-word Operator:: \>
* Match-word-constituent Operator:: \w
* Match-non-word-constituent Operator:: \W

Buffer Operators

* Match-beginning-of-buffer Operator:: \`
* Match-end-of-buffer Operator:: \'

GNU Emacs Operators

* Syntactic Class Operators::

Syntactic Class Operators

* Emacs Syntax Tables::
* Match-syntactic-class Operator:: \sCLASS
* Match-not-syntactic-class Operator:: \SCLASS

Programming with Regex

* GNU Regex Functions::
* POSIX Regex Functions::
* BSD Regex Functions::

GNU Regex Functions

* GNU Pattern Buffers:: The re_pattern_buffer type.
* GNU Regular Expression Compiling:: re_compile_pattern ()
* GNU Matching:: re_match ()
* GNU Searching:: re_search ()
* Matching/Searching with Split Data:: re_match_2 (), re_search_2 ()
* Searching with Fastmaps:: re_compile_fastmap ()
* GNU Translate Tables:: The `translate' field.
* Using Registers:: The re_registers type and related fns.
* Freeing GNU Pattern Buffers:: regfree ()

POSIX Regex Functions

* POSIX Pattern Buffers:: The regex_t type.
* POSIX Regular Expression Compiling:: regcomp ()
* POSIX Matching:: regexec ()
* Reporting Errors:: regerror ()
* Using Byte Offsets:: The regmatch_t type.
* Freeing POSIX Pattern Buffers:: regfree ()

BSD Regex Functions

* BSD Regular Expression Compiling:: re_comp ()
* BSD Searching:: re_exec ()
@end menu
@end ifinfo
@node Overview, Regular Expression Syntax, Top, Top
@chapter Overview

A @dfn{regular expression} (or @dfn{regexp}, or @dfn{pattern}) is a text
string that describes some (mathematical) set of strings. A regexp
@var{r} @dfn{matches} a string @var{s} if @var{s} is in the set of
strings described by @var{r}.

Using the Regex library, you can:

@itemize @bullet

@item
see if a string matches a specified pattern as a whole, and

@item
search within a string for a substring matching a specified pattern.

@end itemize

Some regular expressions match only one string, i.e., the set they
describe has only one member. For example, the regular expression
@samp{foo} matches the string @samp{foo} and no others. Other regular
expressions match more than one string, i.e., the set they describe has
more than one member. For example, the regular expression @samp{f*}
matches the set of strings made up of any number (including zero) of
@samp{f}s.
As you can see, some characters in regular expressions match
themselves (such as @samp{f}) and some don't (such as @samp{*}); the
ones that don't match themselves instead let you specify patterns that
describe many different strings.

To either match or search for a regular expression with the Regex
library functions, you must first compile it with a Regex pattern
compiling function. A @dfn{compiled pattern} is a regular expression
converted to the internal format used by the library functions. Once
you've compiled a pattern, you can use it for matching or searching any
number of times.

The Regex library consists of two source files: @file{regex.h} and
@file{regex.c}.
@pindex regex.h
@pindex regex.c
Regex provides three groups of functions with which you can operate on
regular expressions. One group---the @sc{gnu} group---is more powerful
but not completely compatible with the other two, namely the @sc{posix}
and Berkeley @sc{unix} groups; its interface was designed specifically
for @sc{gnu}. The other groups have the same interfaces as do the
regular expression functions in @sc{posix} and Berkeley
@sc{unix}.

We wrote this chapter with programmers in mind, not users of
programs---such as Emacs---that use Regex. We describe the Regex
library in its entirety, not how to write regular expressions that a
particular program understands.


@node Regular Expression Syntax, Common Operators, Overview, Top
@chapter Regular Expression Syntax

@cindex regular expressions, syntax of
@cindex syntax of regular expressions

@dfn{Characters} are things you can type. @dfn{Operators} are things in
a regular expression that match one or more characters. You compose
regular expressions from operators, which in turn you specify using one
or more characters.
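
The difference between ordinary characters and operators can be seen
with a small C sketch. It assumes the @sc{posix}-style functions
@code{regcomp}, @code{regexec}, and @code{regfree} declared in
@file{regex.h}, and the default (basic) @sc{posix} syntax. The helper
name @code{matches_whole} is hypothetical and is not part of Regex; it
relies on @code{regexec} reporting the longest of the leftmost matches.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Regex.  Return 1 if PATTERN (POSIX
   basic syntax) matches STRING as a whole, 0 if it does not, and -1 if
   PATTERN does not compile.  */
int
matches_whole (const char *pattern, const char *string)
{
  regex_t compiled;
  regmatch_t whole;
  int result;

  /* A pattern must be compiled before it can be used.  */
  if (regcomp (&compiled, pattern, 0) != 0)
    return -1;

  /* The string matches as a whole only if the reported match starts at
     offset 0 and ends at the end of the string.  */
  result = (regexec (&compiled, string, 1, &whole, 0) == 0
            && whole.rm_so == 0
            && (size_t) whole.rm_eo == strlen (string));

  regfree (&compiled);
  return result;
}
```

With this helper, the pattern @samp{foo} is accepted only for the string
@samp{foo}, whereas @samp{f*} is accepted for the empty string,
@samp{f}, @samp{ff}, and so on, because @samp{*} is an operator rather
than an ordinary character.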

Most characters represent what we call the match-self operator, i.e.,
they match themselves; we call these characters @dfn{ordinary}. Other
characters represent either all or parts of fancier operators; e.g.,
@samp{.} represents what we call the match-any-character operator
(which, no surprise, matches (almost) any character); we call these
characters @dfn{special}. Two different things determine what
characters represent what operators:

@enumerate
@item
the regular expression syntax your program has told the Regex library to
recognize, and

@item
the context of the character in the regular expression.
@end enumerate

In the following sections, we describe these things in more detail.

@menu
* Syntax Bits::
* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::
@end menu


@node Syntax Bits, Predefined Syntaxes, , Regular Expression Syntax
@section Syntax Bits

@cindex syntax bits

In any particular syntax for regular expressions, some characters are
always special, others are sometimes special, and others are never
special. The particular syntax that Regex recognizes for a given
regular expression depends on the value in the @code{syntax} field of
the pattern buffer of that regular expression.

You get a pattern buffer by compiling a regular expression. @xref{GNU
Pattern Buffers}, and @ref{POSIX Pattern Buffers}, for more information
on pattern buffers. @xref{GNU Regular Expression Compiling}, @ref{POSIX
Regular Expression Compiling}, and @ref{BSD Regular Expression
Compiling}, for more information on compiling.

Regex considers the value of the @code{syntax} field to be a collection
of bits; we refer to these bits as @dfn{syntax bits}. In most cases,
they affect what characters represent what operators.
We describe the
meanings of the operators to which we refer in @ref{Common Operators},
@ref{GNU Operators}, and @ref{GNU Emacs Operators}.

For reference, here is the complete list of syntax bits, in alphabetical
order:

@table @code

@cnindex RE_BACKSLASH_ESCAPE_IN_LISTS
@item RE_BACKSLASH_ESCAPE_IN_LISTS
If this bit is set, then @samp{\} inside a list (@pxref{List Operators})
quotes (makes ordinary, if it's special) the following character; if
this bit isn't set, then @samp{\} is an ordinary character inside lists.
(@xref{The Backslash Character}, for what @samp{\} does outside of lists.)

@cnindex RE_BK_PLUS_QM
@item RE_BK_PLUS_QM
If this bit is set, then @samp{\+} represents the match-one-or-more
operator and @samp{\?} represents the match-zero-or-one operator; if
this bit isn't set, then @samp{+} represents the match-one-or-more
operator and @samp{?} represents the match-zero-or-one operator. This
bit is irrelevant if @code{RE_LIMITED_OPS} is set.

@cnindex RE_CHAR_CLASSES
@item RE_CHAR_CLASSES
If this bit is set, then you can use character classes in lists; if this
bit isn't set, then you can't.

@cnindex RE_CONTEXT_INDEP_ANCHORS
@item RE_CONTEXT_INDEP_ANCHORS
If this bit is set, then @samp{^} and @samp{$} are special anywhere outside
a list; if this bit isn't set, then these characters are special only in
certain contexts. @xref{Match-beginning-of-line Operator}, and
@ref{Match-end-of-line Operator}.

@cnindex RE_CONTEXT_INDEP_OPS
@item RE_CONTEXT_INDEP_OPS
If this bit is set, then certain characters are special anywhere outside
a list; if this bit isn't set, then those characters are special only in
some contexts and are ordinary elsewhere.
Specifically, if this bit
isn't set then @samp{*}, and (if the syntax bit @code{RE_LIMITED_OPS}
isn't set) @samp{+} and @samp{?} (or @samp{\+} and @samp{\?}, depending
on the syntax bit @code{RE_BK_PLUS_QM}) represent repetition operators
only if they're not first in a regular expression or just after an
open-group or alternation operator. The same holds for @samp{@{} (or
@samp{\@{}, depending on the syntax bit @code{RE_NO_BK_BRACES}) if
it is the beginning of a valid interval and the syntax bit
@code{RE_INTERVALS} is set.

@cnindex RE_CONTEXT_INVALID_OPS
@item RE_CONTEXT_INVALID_OPS
If this bit is set, then repetition and alternation operators can't be
in certain positions within a regular expression. Specifically, the
regular expression is invalid if it has:

@itemize @bullet

@item
a repetition operator first in the regular expression or just after a
match-beginning-of-line, open-group, or alternation operator; or

@item
an alternation operator first or last in the regular expression, just
before a match-end-of-line operator, or just after an alternation or
open-group operator.

@end itemize

If this bit isn't set, then you can put the characters representing the
repetition and alternation operators anywhere in a regular expression.
Whether or not they will in fact be operators in certain positions
depends on other syntax bits.

@cnindex RE_DOT_NEWLINE
@item RE_DOT_NEWLINE
If this bit is set, then the match-any-character operator matches
a newline; if this bit isn't set, then it doesn't.

@cnindex RE_DOT_NOT_NULL
@item RE_DOT_NOT_NULL
If this bit is set, then the match-any-character operator doesn't match
a null character; if this bit isn't set, then it does.

@cnindex RE_INTERVALS
@item RE_INTERVALS
If this bit is set, then Regex recognizes interval operators; if this bit
isn't set, then it doesn't.

@cnindex RE_LIMITED_OPS
@item RE_LIMITED_OPS
If this bit is set, then Regex doesn't recognize the match-one-or-more,
match-zero-or-one or alternation operators; if this bit isn't set, then
it does.

@cnindex RE_NEWLINE_ALT
@item RE_NEWLINE_ALT
If this bit is set, then newline represents the alternation operator; if
this bit isn't set, then newline is ordinary.

@cnindex RE_NO_BK_BRACES
@item RE_NO_BK_BRACES
If this bit is set, then @samp{@{} represents the open-interval operator
and @samp{@}} represents the close-interval operator; if this bit isn't
set, then @samp{\@{} represents the open-interval operator and
@samp{\@}} represents the close-interval operator. This bit is relevant
only if @code{RE_INTERVALS} is set.

@cnindex RE_NO_BK_PARENS
@item RE_NO_BK_PARENS
If this bit is set, then @samp{(} represents the open-group operator and
@samp{)} represents the close-group operator; if this bit isn't set, then
@samp{\(} represents the open-group operator and @samp{\)} represents
the close-group operator.

@cnindex RE_NO_BK_REFS
@item RE_NO_BK_REFS
If this bit is set, then Regex doesn't recognize @samp{\}@var{digit} as
the back reference operator; if this bit isn't set, then it does.

@cnindex RE_NO_BK_VBAR
@item RE_NO_BK_VBAR
If this bit is set, then @samp{|} represents the alternation operator;
if this bit isn't set, then @samp{\|} represents the alternation
operator. This bit is irrelevant if @code{RE_LIMITED_OPS} is set.

@cnindex RE_NO_EMPTY_RANGES
@item RE_NO_EMPTY_RANGES
If this bit is set, then a regular expression with a range whose ending
point collates lower than its starting point is invalid; if this bit
isn't set, then Regex considers such a range to be empty.

@cnindex RE_UNMATCHED_RIGHT_PAREN_ORD
@item RE_UNMATCHED_RIGHT_PAREN_ORD
If this bit is set and the regular expression has no matching open-group
operator, then Regex considers what would otherwise be a close-group
operator (based on how @code{RE_NO_BK_PARENS} is set) to match @samp{)}.

@end table


@node Predefined Syntaxes, Collating Elements vs. Characters, Syntax Bits, Regular Expression Syntax
@section Predefined Syntaxes

If you're programming with Regex, you can set a pattern buffer's
(@pxref{GNU Pattern Buffers}, and @ref{POSIX Pattern Buffers})
@code{syntax} field either to an arbitrary combination of syntax bits
(@pxref{Syntax Bits}) or else to the configurations defined by Regex.
These configurations define the syntaxes used by certain
programs---@sc{gnu} Emacs,
@cindex Emacs
@sc{posix} Awk,
@cindex POSIX Awk
traditional Awk,
@cindex Awk
Grep,
@cindex Grep
@cindex Egrep
Egrep---in addition to syntaxes for @sc{posix} basic and extended
regular expressions.

The predefined syntaxes---taken directly from @file{regex.h}---are:

@example
[[[ syntaxes ]]]
@end example

@node Collating Elements vs. Characters, The Backslash Character, Predefined Syntaxes, Regular Expression Syntax
@section Collating Elements vs.@: Characters

@sc{posix} generalizes the notion of a character to that of a
collating element. It defines a @dfn{collating element} to be ``a
sequence of one or more bytes defined in the current collating sequence
as a unit of collation.''

This generalizes the notion of a character in
two ways. First, a single character can map into two or more collating
elements. For example, the German
@tex
`\ss'
@end tex
@ifinfo
``es-zet''
@end ifinfo
collates as the collating element @samp{s} followed by another collating
element @samp{s}.
Second, two or more characters can map into one
collating element. For example, the Spanish @samp{ll} collates after
@samp{l} and before @samp{m}.

Since @sc{posix}'s ``collating element'' preserves the essential idea of
a ``character,'' we use the latter, more familiar, term in this document.

@node The Backslash Character, , Collating Elements vs. Characters, Regular Expression Syntax
@section The Backslash Character

@cindex \
The @samp{\} character has one of four different meanings, depending on
the context in which you use it and what syntax bits are set
(@pxref{Syntax Bits}). It can: 1) stand for itself, 2) quote the next
character, 3) introduce an operator, or 4) do nothing.

@enumerate
@item
It stands for itself inside a list
(@pxref{List Operators}) if the syntax bit
@code{RE_BACKSLASH_ESCAPE_IN_LISTS} is not set. For example, @samp{[\]}
would match @samp{\}.

@item
It quotes (makes ordinary, if it's special) the next character when you
use it either:

@itemize @bullet
@item
outside a list,@footnote{Sometimes
you don't have to explicitly quote special characters to make
them ordinary. For instance, most characters lose any special meaning
inside a list (@pxref{List Operators}). In addition, if the syntax bits
@code{RE_CONTEXT_INVALID_OPS} and @code{RE_CONTEXT_INDEP_OPS}
aren't set, then (for historical reasons) the matcher considers special
characters ordinary if they are in contexts where the operations they
represent make no sense; for example, the match-zero-or-more
operator (represented by @samp{*}) then matches itself in the regular
expression @samp{*foo} because there is no preceding expression on which
it can operate.
It is poor practice, however, to depend on this
behavior; if you want a special character to be ordinary outside a list,
it's better to always quote it, regardless.} or

@item
inside a list and the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is set.

@end itemize

@item
It introduces an operator when followed by certain ordinary
characters---sometimes only when certain syntax bits are set. See the
cases @code{RE_BK_PLUS_QM}, @code{RE_NO_BK_BRACES}, @code{RE_NO_BK_VBAR},
@code{RE_NO_BK_PARENS}, @code{RE_NO_BK_REFS} in @ref{Syntax Bits}. Also:

@itemize @bullet
@item
@samp{\b} represents the match-word-boundary operator
(@pxref{Match-word-boundary Operator}).

@item
@samp{\B} represents the match-within-word operator
(@pxref{Match-within-word Operator}).

@item
@samp{\<} represents the match-beginning-of-word operator @*
(@pxref{Match-beginning-of-word Operator}).

@item
@samp{\>} represents the match-end-of-word operator
(@pxref{Match-end-of-word Operator}).

@item
@samp{\w} represents the match-word-constituent operator
(@pxref{Match-word-constituent Operator}).

@item
@samp{\W} represents the match-non-word-constituent operator
(@pxref{Match-non-word-constituent Operator}).

@item
@samp{\`} represents the match-beginning-of-buffer
operator and @samp{\'} represents the match-end-of-buffer operator
(@pxref{Buffer Operators}).

@item
If Regex was compiled with the C preprocessor symbol @code{emacs}
defined, then @samp{\s@var{class}} represents the match-syntactic-class
operator and @samp{\S@var{class}} represents the
match-not-syntactic-class operator (@pxref{Syntactic Class Operators}).

@end itemize

@item
In all other cases, Regex ignores @samp{\}. For example,
@samp{\n} matches @samp{n}.

@end enumerate

@node Common Operators, GNU Operators, Regular Expression Syntax, Top
@chapter Common Operators

You compose regular expressions from operators. In the following
sections, we describe the regular expression operators specified by
@sc{posix}; @sc{gnu} also uses these. Most operators have more than one
representation as characters. @xref{Regular Expression Syntax}, for
what characters represent what operators under what circumstances.

For most operators that can be represented in two ways, one
representation is a single character and the other is that character
preceded by @samp{\}. For example, either @samp{(} or @samp{\(}
represents the open-group operator. Which one does depends on the
setting of a syntax bit, in this case @code{RE_NO_BK_PARENS}. Why is
this so? Historical reasons dictate some of the varying
representations, while @sc{posix} dictates others.

Finally, almost all characters lose any special meaning inside a list
(@pxref{List Operators}).

@menu
* Match-self Operator:: Ordinary characters.
* Match-any-character Operator:: .
* Concatenation Operator:: Juxtaposition.
* Repetition Operators:: * + ? @{@}
* Alternation Operator:: |
* List Operators:: [...] [^...]
* Grouping Operators:: (...)
* Back-reference Operator:: \digit
* Anchoring Operators:: ^ $
@end menu

@node Match-self Operator, Match-any-character Operator, , Common Operators
@section The Match-self Operator (@var{ordinary character})

This operator matches the character itself. All ordinary characters
(@pxref{Regular Expression Syntax}) represent this operator. For
example, @samp{f} is always an ordinary character, so the regular
expression @samp{f} matches only the string @samp{f}. In
particular, it does @emph{not} match the string @samp{ff}.
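
Although @samp{f} does not match @samp{ff} as a whole, a search still
finds @samp{f} as a substring of @samp{ff}. The following C sketch
illustrates the search case under the assumption that the
@sc{posix}-style functions @code{regcomp}, @code{regexec}, and
@code{regfree} are used; the helper name @code{find_first} is
hypothetical and is not part of Regex.

```c
#include <regex.h>

/* Hypothetical helper, not part of Regex.  Search STRING for the first
   substring matching PATTERN (POSIX basic syntax).  On success, store
   the byte offsets of the match in *START and *END and return 1.
   Return 0 if nothing matches and -1 if PATTERN does not compile.  */
int
find_first (const char *pattern, const char *string, int *start, int *end)
{
  regex_t compiled;
  regmatch_t match;
  int status;

  if (regcomp (&compiled, pattern, 0) != 0)
    return -1;

  /* regexec searches: the match may begin anywhere in STRING.  */
  status = regexec (&compiled, string, 1, &match, 0);
  regfree (&compiled);

  if (status != 0)
    return 0;

  *start = (int) match.rm_so;
  *end = (int) match.rm_eo;
  return 1;
}
```

Searching @samp{ff} for the pattern @samp{f} reports a match covering
only the first character; the reported match is one character long, which
is why the pattern does not match the two-character string as a whole.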

@node Match-any-character Operator, Concatenation Operator, Match-self Operator, Common Operators
@section The Match-any-character Operator (@code{.})

@cindex @samp{.}

This operator matches any single printing or nonprinting character
except it won't match a:

@table @asis
@item newline
if the syntax bit @code{RE_DOT_NEWLINE} isn't set.

@item null
if the syntax bit @code{RE_DOT_NOT_NULL} is set.

@end table

The @samp{.} (period) character represents this operator. For example,
@samp{a.b} matches any three-character string beginning with @samp{a}
and ending with @samp{b}.

@node Concatenation Operator, Repetition Operators, Match-any-character Operator, Common Operators
@section The Concatenation Operator

This operator concatenates two regular expressions @var{a} and @var{b}.
No character represents this operator; you simply put @var{b} after
@var{a}. The result is a regular expression that will match a string if
@var{a} matches its first part and @var{b} matches the rest. For
example, @samp{xy} (two match-self operators) matches @samp{xy}.

@node Repetition Operators, Alternation Operator, Concatenation Operator, Common Operators
@section Repetition Operators

Repetition operators repeat the preceding regular expression a specified
number of times.

@menu
* Match-zero-or-more Operator:: *
* Match-one-or-more Operator:: +
* Match-zero-or-one Operator:: ?
* Interval Operators:: @{@}
@end menu

@node Match-zero-or-more Operator, Match-one-or-more Operator, , Repetition Operators
@subsection The Match-zero-or-more Operator (@code{*})

@cindex @samp{*}

This operator repeats the smallest possible preceding regular expression
as many times as necessary (including zero) to match the pattern.
@samp{*} represents this operator. For example, @samp{o*}
matches any string made up of zero or more @samp{o}s.
Since this
operator operates on the smallest preceding regular expression,
@samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}. So,
@samp{fo*} matches @samp{f}, @samp{fo}, @samp{foo}, and so on.

Since the match-zero-or-more operator is a suffix operator, it may be
useless as such when no regular expression precedes it. This is the
case when it:

@itemize @bullet
@item
is first in a regular expression, or

@item
follows a match-beginning-of-line, open-group, or alternation
operator.

@end itemize

@noindent
Three different things can happen in these cases:

@enumerate
@item
If the syntax bit @code{RE_CONTEXT_INVALID_OPS} is set, then the
regular expression is invalid.

@item
If @code{RE_CONTEXT_INVALID_OPS} isn't set, but
@code{RE_CONTEXT_INDEP_OPS} is, then @samp{*} represents the
match-zero-or-more operator (which then operates on the empty string).

@item
Otherwise, @samp{*} is ordinary.

@end enumerate

@cindex backtracking
The matcher processes a match-zero-or-more operator by first matching as
many repetitions of the smallest preceding regular expression as it can.
Then it continues to match the rest of the pattern.

If it can't match the rest of the pattern, it backtracks (as many times
as necessary), each time discarding one of the matches until it can
either match the entire pattern or be certain that it cannot get a
match. For example, when matching @samp{ca*ar} against @samp{caaar},
the matcher first matches all three @samp{a}s of the string with the
@samp{a*} of the regular expression. However, it cannot then match the
final @samp{ar} of the regular expression against the final @samp{r} of
the string. So it backtracks, discarding the match of the last @samp{a}
in the string. It can then match the remaining @samp{ar}.
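
The outcome of the @samp{ca*ar} example can be observed from C. The
sketch below is only an illustration: it assumes the @sc{posix}-style
functions @code{regcomp}, @code{regexec}, and @code{regfree}, it wraps
@samp{a*} in the basic-syntax grouping operators @samp{\(} and @samp{\)}
(@pxref{Grouping Operators}) purely to record what the repetition ended
up matching, and the helper name @code{group_span} is hypothetical.
Whatever strategy a given matcher uses internally, the reported offsets
should agree with the description above.

```c
#include <regex.h>

/* Hypothetical helper, not part of Regex.  Match PATTERN (POSIX basic
   syntax, containing one \( ... \) group) against STRING and report the
   byte offsets of the text matched by that group.  Return 1 on a match,
   0 on no match, and -1 if PATTERN does not compile.  */
int
group_span (const char *pattern, const char *string, int *start, int *end)
{
  regex_t compiled;
  regmatch_t regs[2];   /* regs[0]: whole match; regs[1]: the group.  */
  int status;

  if (regcomp (&compiled, pattern, 0) != 0)
    return -1;

  status = regexec (&compiled, string, 2, regs, 0);
  regfree (&compiled);

  if (status != 0)
    return 0;

  *start = (int) regs[1].rm_so;
  *end = (int) regs[1].rm_eo;
  return 1;
}
```

Calling @code{group_span} with the pattern @samp{c\(a*\)ar} and the
string @samp{caaar} reports that the group covers only two of the three
@samp{a}s; the third is left for the trailing @samp{ar}.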


@node Match-one-or-more Operator, Match-zero-or-one Operator, Match-zero-or-more Operator, Repetition Operators
@subsection The Match-one-or-more Operator (@code{+} or @code{\+})

@cindex @samp{+}

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't recognize
this operator. Otherwise, if the syntax bit @code{RE_BK_PLUS_QM} isn't
set, then @samp{+} represents this operator; if it is, then @samp{\+}
does.

This operator is similar to the match-zero-or-more operator except that
it repeats the preceding regular expression at least once.
@xref{Match-zero-or-more Operator}, for what it operates on, how some
syntax bits affect it, and how Regex backtracks to match it.

For example, supposing that @samp{+} represents the match-one-or-more
operator, then @samp{ca+r} matches, e.g., @samp{car} and
@samp{caaaar}, but not @samp{cr}.

@node Match-zero-or-one Operator, Interval Operators, Match-one-or-more Operator, Repetition Operators
@subsection The Match-zero-or-one Operator (@code{?} or @code{\?})
@cindex @samp{?}

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
recognize this operator. Otherwise, if the syntax bit
@code{RE_BK_PLUS_QM} isn't set, then @samp{?} represents this operator;
if it is, then @samp{\?} does.

This operator is similar to the match-zero-or-more operator except that
it repeats the preceding regular expression once or not at all.
@xref{Match-zero-or-more Operator}, to see what it operates on, how
some syntax bits affect it, and how Regex backtracks to match it.

For example, supposing that @samp{?} represents the match-zero-or-one
operator, then @samp{ca?r} matches both @samp{car} and @samp{cr}, but
nothing else.
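
The @samp{ca+r} and @samp{ca?r} examples can be checked with a short C
sketch. It assumes the @sc{posix}-style functions with the
@code{REG_EXTENDED} flag, a syntax in which @samp{+} and @samp{?}
represent these operators without a backslash. To test a whole-string
match it wraps the pattern in @samp{^(} and @samp{)$}, which relies on
the anchoring and grouping operators described elsewhere in this manual.
The helper name @code{whole_match_ere} and its 256-byte buffer are
hypothetical choices made for illustration.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of Regex.  Return 1 if PATTERN (POSIX
   extended syntax) matches STRING as a whole, 0 if it does not, and -1
   if the anchored pattern is too long or does not compile.  */
int
whole_match_ere (const char *pattern, const char *string)
{
  char anchored[256];
  regex_t compiled;
  int n, status;

  /* Anchor the pattern at both ends so only whole-string matches count.  */
  n = snprintf (anchored, sizeof anchored, "^(%s)$", pattern);
  if (n < 0 || (size_t) n >= sizeof anchored)
    return -1;

  /* REG_NOSUB: only success or failure is wanted, not match offsets.  */
  if (regcomp (&compiled, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  status = regexec (&compiled, string, 0, NULL, 0);
  regfree (&compiled);
  return status == 0;
}
```

Under these assumptions, @samp{ca+r} is accepted for @samp{car} and
@samp{caaaar} but rejected for @samp{cr}, while @samp{ca?r} is accepted
for @samp{car} and @samp{cr} but rejected for @samp{caar}.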

@node Interval Operators, , Match-zero-or-one Operator, Repetition Operators
@subsection Interval Operators (@code{@{} @dots{} @code{@}} or @code{\@{} @dots{} @code{\@}})

@cindex interval expression
@cindex @samp{@{}
@cindex @samp{@}}
@cindex @samp{\@{}
@cindex @samp{\@}}

If the syntax bit @code{RE_INTERVALS} is set, then Regex recognizes
@dfn{interval expressions}. They repeat the smallest possible preceding
regular expression a specified number of times.

If the syntax bit @code{RE_NO_BK_BRACES} is set, @samp{@{} represents
the @dfn{open-interval operator} and @samp{@}} represents the
@dfn{close-interval operator}; otherwise, @samp{\@{} and @samp{\@}} do.

Specifically, supposing that @samp{@{} and @samp{@}} represent the
open-interval and close-interval operators, then:

@table @code
@item @{@var{count}@}
matches exactly @var{count} occurrences of the preceding regular
expression.

@item @{@var{min},@}
matches @var{min} or more occurrences of the preceding regular
expression.

@item @{@var{min},@var{max}@}
matches at least @var{min} but no more than @var{max} occurrences of
the preceding regular expression.

@end table

The interval expression (but not necessarily the regular expression that
contains it) is invalid if:

@itemize @bullet
@item
@var{min} is greater than @var{max}, or

@item
any of @var{count}, @var{min}, or @var{max} is outside the range
zero to @code{RE_DUP_MAX} (which symbol @file{regex.h}
defines).

@end itemize

If the interval expression is invalid and the syntax bit
@code{RE_NO_BK_BRACES} is set, then Regex considers all the
characters in the would-be interval to be ordinary. If that bit
isn't set, then the regular expression is invalid.
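
The three forms of a valid interval can be compared with a small C
sketch. It assumes the @sc{posix}-style functions, where the
@code{REG_EXTENDED} flag selects a syntax that writes intervals as
@samp{@{} @dots{} @samp{@}} and the default (basic) syntax writes them as
@samp{\@{} @dots{} @samp{\@}}. The helper name
@code{leading_match_length} is hypothetical; it reports how many
characters at the start of the string the pattern consumes.

```c
#include <regex.h>

/* Hypothetical helper, not part of Regex.  Compile PATTERN with the
   given regcomp flags and return the length of its match at the very
   start of STRING.  Return -1 if the leftmost match does not begin at
   offset 0 (or there is no match), and -2 if PATTERN does not compile.  */
int
leading_match_length (const char *pattern, const char *string, int cflags)
{
  regex_t compiled;
  regmatch_t match;
  int status;

  if (regcomp (&compiled, pattern, cflags) != 0)
    return -2;

  status = regexec (&compiled, string, 1, &match, 0);
  regfree (&compiled);

  if (status != 0 || match.rm_so != 0)
    return -1;

  return (int) match.rm_eo;
}
```

Against the string @samp{aaaaa}, the pattern @samp{a@{2@}} consumes two
characters, @samp{a@{2,@}} consumes all five, and @samp{a@{2,3@}}
consumes three, each with @code{REG_EXTENDED}; the basic-syntax spelling
@samp{a\@{2,3\@}} behaves like the last of these.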

If the interval expression is valid but there is no preceding regular
expression on which to operate, then if the syntax bit
@code{RE_CONTEXT_INVALID_OPS} is set, the regular expression is invalid.
If that bit isn't set, then Regex considers all the characters in the
would-be interval (other than backslashes, which it ignores) to be
ordinary.


@node Alternation Operator, List Operators, Repetition Operators, Common Operators
@section The Alternation Operator (@code{|} or @code{\|})

@kindex |
@kindex \|
@cindex alternation operator
@cindex or operator

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
recognize this operator.  Otherwise, if the syntax bit
@code{RE_NO_BK_VBAR} is set, then @samp{|} represents this operator;
otherwise, @samp{\|} does.

Alternatives match one of a choice of regular expressions:
if you put the character(s) representing the alternation operator between
any two regular expressions @var{a} and @var{b}, the result matches
the union of the strings that @var{a} and @var{b} match.  For
example, supposing that @samp{|} is the alternation operator, then
@samp{foo|bar|quux} would match any of @samp{foo}, @samp{bar} or
@samp{quux}.

@ignore
@c Nobody needs to disallow empty alternatives any more.
If the syntax bit @code{RE_NO_EMPTY_ALTS} is set, then if either of the regular
expressions @var{a} or @var{b} is empty, the
regular expression is invalid.  More precisely, if this syntax bit is
set, then the alternation operator can't:

@itemize @bullet
@item
be first or last in a regular expression;

@item
follow either another alternation operator or an open-group operator
(@pxref{Grouping Operators}); or

@item
precede a close-group operator.

@end itemize

@noindent
For example, supposing @samp{(} and @samp{)} represent the open and
close-group operators, then @samp{|foo}, @samp{foo|}, @samp{foo||bar},
@samp{foo(|bar)}, and @samp{(foo|)bar} would all be invalid.
@end ignore

The alternation operator operates on the @emph{largest} possible
surrounding regular expressions.  (Put another way, it has the lowest
precedence of any regular expression operator.)
Thus, the only way you can
delimit its arguments is to use grouping.  For example, if @samp{(} and
@samp{)} are the open and close-group operators, then @samp{fo(o|b)ar}
would match either @samp{fooar} or @samp{fobar}.  (@samp{foo|bar} would
match @samp{foo} or @samp{bar}.)

@cindex backtracking
The matcher usually tries all combinations of alternatives so as to
match the longest possible string.  For example, when matching
@samp{(fooq|foo)*(qbarquux|bar)} against @samp{fooqbarquux}, it cannot
take, say, the first (``depth-first'') combination it could match, since
then it would be content to match just @samp{fooqbar}.

@comment xx something about leftmost-longest


@node List Operators, Grouping Operators, Alternation Operator, Common Operators
@section List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})

@cindex matching list
@cindex @samp{[}
@cindex @samp{]}
@cindex @samp{^}
@cindex @samp{-}
@cindex @samp{\}
@cindex @samp{[^}
@cindex nonmatching list
@cindex matching newline
@cindex bracket expression

@dfn{Lists}, also called @dfn{bracket expressions}, are a set of one or
more items.  An @dfn{item} is a character,
@ignore
(These get added when they get implemented.)
a collating symbol, an equivalence class expression,
@end ignore
a character class expression, or a range expression.  The syntax bits
affect which kinds of items you can put in a list.
We explain the last
two items in subsections below.  Empty lists are invalid.

A @dfn{matching list} matches a single character represented by one of
the list items.  You form a matching list by enclosing one or more items
within an @dfn{open-matching-list operator} (represented by @samp{[})
and a @dfn{close-list operator} (represented by @samp{]}).

For example, @samp{[ab]} matches either @samp{a} or @samp{b}.
@samp{[ad]*} matches the empty string and any string composed of just
@samp{a}s and @samp{d}s in any order.  Regex considers invalid a regular
expression with a @samp{[} but no matching
@samp{]}.

@dfn{Nonmatching lists} are similar to matching lists except that they
match a single character @emph{not} represented by one of the list
items.  You use an @dfn{open-nonmatching-list operator} (represented by
@samp{[^}@footnote{Regex therefore doesn't consider the @samp{^} to be
the first character in the list.  If you put a @samp{^} character first
in (what you think is) a matching list, you'll turn it into a
nonmatching list.}) instead of an open-matching-list operator to start a
nonmatching list.

For example, @samp{[^ab]} matches any character except @samp{a} or
@samp{b}.

If the @code{posix_newline} field in the pattern buffer (@pxref{GNU
Pattern Buffers}) is set, then nonmatching lists do not match a newline.

Most characters lose any special meaning inside a list.  The characters
that remain special inside a list follow.

@table @samp
@item ]
ends the list if it's not the first list item.  So, if you want to make
the @samp{]} character a list item, you must put it first.

@item \
quotes the next character if the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is
set.

@ignore
Put these in if they get implemented.

@item [.
represents the open-collating-symbol operator (@pxref{Collating Symbol
Operators}).

@item .]
represents the close-collating-symbol operator.

@item [=
represents the open-equivalence-class operator (@pxref{Equivalence Class
Operators}).

@item =]
represents the close-equivalence-class operator.

@end ignore

@item [:
represents the open-character-class operator (@pxref{Character Class
Operators}) if the syntax bit @code{RE_CHAR_CLASSES} is set and what
follows is a valid character class expression.

@item :]
represents the close-character-class operator if the syntax bit
@code{RE_CHAR_CLASSES} is set and what precedes it is an
open-character-class operator followed by a valid character class name.

@item -
represents the range operator (@pxref{Range Operator}) if it's
not first or last in a list or the ending point of a range.

@end table

@noindent
All other characters are ordinary.  For example, @samp{[.*]} matches
@samp{.} and @samp{*}.

@menu
* Character Class Operators::   [:class:]
* Range Operator::              start-end
@end menu

@ignore
(If collating symbols and equivalence class expressions get implemented,
then add this.)

node Collating Symbol Operators
subsubsection Collating Symbol Operators (@code{[.} @dots{} @code{.]})

If the syntax bit @code{XX} is set, then you can represent
collating symbols inside lists.  You form a @dfn{collating symbol} by
putting a collating element between an @dfn{open-collating-symbol
operator} and a @dfn{close-collating-symbol operator}.  @samp{[.}
represents the open-collating-symbol operator and @samp{.]} represents
the close-collating-symbol operator.  For example, if @samp{ll} is a
collating element, then @samp{[[.ll.]]} would match @samp{ll}.

node Equivalence Class Operators
subsubsection Equivalence Class Operators (@code{[=} @dots{} @code{=]})
@cindex equivalence class expression in regex
@cindex @samp{[=} in regex
@cindex @samp{=]} in regex

If the syntax bit @code{XX} is set, then Regex recognizes equivalence class
expressions inside lists.  An @dfn{equivalence class expression} is a set
of collating elements which all belong to the same equivalence class.
You form an equivalence class expression by putting a collating
element between an @dfn{open-equivalence-class operator} and a
@dfn{close-equivalence-class operator}.  @samp{[=} represents the
open-equivalence-class operator and @samp{=]} represents the
close-equivalence-class operator.  For example, if @samp{a} and @samp{A}
were an equivalence class, then both @samp{[[=a=]]} and @samp{[[=A=]]}
would match both @samp{a} and @samp{A}.  If the collating element in an
equivalence class expression isn't part of an equivalence class, then
the matcher considers the equivalence class expression to be a collating
symbol.

@end ignore

@node Character Class Operators, Range Operator, , List Operators
@subsection Character Class Operators (@code{[:} @dots{} @code{:]})

@cindex character classes
@cindex @samp{[:} in regex
@cindex @samp{:]} in regex

If the syntax bit @code{RE_CHAR_CLASSES} is set, then Regex
recognizes character class expressions inside lists.  A @dfn{character
class expression} matches one character from a given class.  You form a
character class expression by putting a character class name between an
@dfn{open-character-class operator} (represented by @samp{[:}) and a
@dfn{close-character-class operator} (represented by @samp{:]}).
The
character class names and their meanings are:

@table @code

@item alnum
letters and digits

@item alpha
letters

@item blank
system-dependent; for @sc{gnu}, a space or tab

@item cntrl
control characters (in the @sc{ascii} encoding, code 0177 and codes
less than 040)

@item digit
digits

@item graph
same as @code{print} except omits space

@item lower
lowercase letters

@item print
printable characters (in the @sc{ascii} encoding, space through
tilde, i.e., codes 040 through 0176)

@item punct
neither control nor alphanumeric characters

@item space
space, horizontal tab, carriage return, newline, vertical tab, and
form feed

@item upper
uppercase letters

@item xdigit
hexadecimal digits: @code{0}--@code{9}, @code{a}--@code{f}, @code{A}--@code{F}

@end table

@noindent
These correspond to the definitions in the C library's @file{<ctype.h>}
facility.  For example, @samp{[:alpha:]} corresponds to the standard
facility @code{isalpha}.  Regex recognizes character class expressions
only inside of lists; so @samp{[[:alpha:]]} matches any letter, but
@samp{[:alpha:]} outside of a bracket expression and not followed by a
repetition operator matches just itself.

@node Range Operator, , Character Class Operators, List Operators
@subsection The Range Operator (@code{-})

Regex recognizes @dfn{range expressions} inside a list.  They represent
those characters
that fall between two elements in the current collating sequence.  You
form a range expression by putting a @dfn{range operator} between two
@ignore
(If these get implemented, then substitute this for ``characters.'')
of any of the following: characters, collating elements, collating symbols,
and equivalence class expressions.
The starting point of the range and
the ending point of the range don't have to be the same kind of item,
e.g., the starting point could be a collating element and the ending
point could be an equivalence class expression.  If a range's ending
point is an equivalence class, then all the collating elements in that
class will be in the range.
@end ignore
characters.@footnote{You can't use a character class for the starting
or ending point of a range, since a character class is not a single
character.} @samp{-} represents the range operator.  For example,
@samp{a-f} within a list represents all the characters from @samp{a}
through @samp{f}
inclusively.

If the syntax bit @code{RE_NO_EMPTY_RANGES} is set, then if the range's
ending point collates less than its starting point, the range (and the
regular expression containing it) is invalid.  For example, the regular
expression @samp{[z-a]} would be invalid.  If this bit isn't set, then
Regex considers such a range to be empty.

Since @samp{-} represents the range operator, if you want to make a
@samp{-} character itself
a list item, you must do one of the following:

@itemize @bullet
@item
Put the @samp{-} either first or last in the list.

@item
Include a range whose starting point collates strictly lower than
@samp{-} and whose ending point collates equal or higher.  Unless a
range is the first item in a list, a @samp{-} can't be its starting
point, but @emph{can} be its ending point.  That is because Regex
considers @samp{-} to be the range operator unless it is preceded by
another @samp{-}.  For example, in the @sc{ascii} encoding, @samp{)},
@samp{*}, @samp{+}, @samp{,}, @samp{-}, @samp{.}, and @samp{/} are
contiguous characters in the collating sequence.  You might think that
@samp{[)-+--/]} has two ranges: @samp{)-+} and @samp{--/}.
Rather, it
has the ranges @samp{)-+} and @samp{+--}, plus the character @samp{/}, so
it matches, e.g., @samp{,}, not @samp{.}.

@item
Put a range whose starting point is @samp{-} first in the list.

@end itemize

For example, @samp{[-a-z]} matches a lowercase letter or a hyphen (in
English, in @sc{ascii}).


@node Grouping Operators, Back-reference Operator, List Operators, Common Operators
@section Grouping Operators (@code{(} @dots{} @code{)} or @code{\(} @dots{} @code{\)})

@kindex (
@kindex )
@kindex \(
@kindex \)
@cindex grouping
@cindex subexpressions
@cindex parenthesizing

A @dfn{group}, also known as a @dfn{subexpression}, consists of an
@dfn{open-group operator}, any number of other operators, and a
@dfn{close-group operator}.  Regex treats this sequence as a unit, just
as mathematics and programming languages treat a parenthesized
expression as a unit.

Therefore, using groups, you can:

@itemize @bullet
@item
delimit the argument(s) to an alternation operator (@pxref{Alternation
Operator}) or a repetition operator (@pxref{Repetition
Operators}).

@item
keep track of the indices of the substring that matched a given group.
@xref{Using Registers}, for a precise explanation.
This lets you:

@itemize @bullet
@item
use the back-reference operator (@pxref{Back-reference Operator}).

@item
use registers (@pxref{Using Registers}).

@end itemize

@end itemize

If the syntax bit @code{RE_NO_BK_PARENS} is set, then @samp{(} represents
the open-group operator and @samp{)} represents the
close-group operator; otherwise, @samp{\(} and @samp{\)} do.

If the syntax bit @code{RE_UNMATCHED_RIGHT_PAREN_ORD} is set and a
close-group operator has no matching open-group operator, then Regex
considers it to match @samp{)}.
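
To illustrate how a group records the indices of the substring it
matched, here is a minimal C sketch.  It assumes the @sc{posix}
interface with @code{REG_EXTENDED}, a syntax in which @samp{(} and
@samp{)} represent the group operators, and it reads the indices from
the @code{regmatch_t} array that @code{regexec} fills in; the helper
name @code{group_span} is made up for this example.

@example
#include <regex.h>

/* Search TEXT for PATTERN (POSIX extended syntax) and store the
   start and end offsets of group number GROUP (1 to 9) in *START
   and *END.  Return 1 on success, 0 if nothing matched or the group
   did not participate, and -1 on a bad GROUP or pattern.  */
int
group_span (const char *pattern, const char *text, int group,
            int *start, int *end)
@{
  regex_t re;
  regmatch_t m[10];   /* m[0] is the whole match.  */
  int found;

  if (group < 1 || group > 9)
    return -1;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  found = (regexec (&re, text, 10, m, 0) == 0
           && m[group].rm_so != -1);
  if (found)
    @{
      *start = (int) m[group].rm_so;
      *end = (int) m[group].rm_eo;
    @}
  regfree (&re);
  return found;
@}
@end example

@noindent
For example, searching @samp{xfobar} for @samp{fo(o|b)ar} with
@var{group} equal to 1 yields offsets 3 and 4, the position of the
@samp{b} that the group matched.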


@node Back-reference Operator, Anchoring Operators, Grouping Operators, Common Operators
@section The Back-reference Operator (@code{\}@var{digit})

@cindex back references

If the syntax bit @code{RE_NO_BK_REFS} isn't set, then Regex recognizes
back references.  A back reference matches a specified preceding group.
The back reference operator is represented by @samp{\@var{digit}}
anywhere after the end of a regular expression's @w{@var{digit}-th}
group (@pxref{Grouping Operators}).

@var{digit} must be between @samp{1} and @samp{9}.  The matcher assigns
numbers 1 through 9 to the first nine groups it encounters.  By using
one of @samp{\1} through @samp{\9} after the corresponding group's
close-group operator, you can match a substring identical to the
one that the group does.

Back references match according to the following (in all examples below,
@samp{(} represents the open-group, @samp{)} the close-group, @samp{@{}
the open-interval and @samp{@}} the close-interval operator):

@itemize @bullet
@item
If the group matches a substring, the back reference matches an
identical substring.  For example, @samp{(a)\1} matches @samp{aa} and
@samp{(bana)na\1bo\1} matches @samp{bananabanabobana}.  Likewise,
@samp{(.*)\1} matches any (newline-free if the syntax bit
@code{RE_DOT_NEWLINE} isn't set) string that is composed of two
identical halves; the @samp{(.*)} matches the first half and the
@samp{\1} matches the second half.

@item
If the group matches more than once (as it might if followed
by, e.g., a repetition operator), then the back reference matches the
substring the group @emph{last} matched.  For example,
@samp{((a*)b)*\1\2} matches @samp{aabababa}; first @w{group 1} (the
outer one) matches @samp{aab} and @w{group 2} (the inner one) matches
@samp{aa}.
Then @w{group 1} matches @samp{ab} and @w{group 2} matches
@samp{a}.  So, @samp{\1} matches @samp{ab} and @samp{\2} matches
@samp{a}.

@item
If the group doesn't participate in a match, i.e., it is part of an
alternative not taken or a repetition operator allows zero repetitions
of it, then the back reference makes the whole match fail.  For example,
@samp{(one()|two())-and-(three\2|four\3)} matches @samp{one-and-three}
and @samp{two-and-four}, but not @samp{one-and-four} or
@samp{two-and-three}.  To see why, suppose the pattern matches
@samp{one-and-}; then its @w{group 2} matches the empty string and its
@w{group 3} doesn't participate in the match.  So, if it then matches
@samp{four}, then when it tries to back reference @w{group 3} (which it
will attempt to do because @samp{\3} follows the @samp{four}), the match
will fail because @w{group 3} didn't participate in the match.

@end itemize

You can use a back reference as an argument to a repetition operator.  For
example, @samp{(a(b))\2*} matches @samp{a} followed by one or more
@samp{b}s.  Similarly, @samp{(a(b))\2@{3@}} matches @samp{abbbb}.

If there is no preceding @w{@var{digit}-th} subexpression, the regular
expression is invalid.


@node Anchoring Operators, , Back-reference Operator, Common Operators
@section Anchoring Operators

@cindex anchoring
@cindex regexp anchoring

These operators can constrain a pattern to match only at the beginning or
end of the entire string or at the beginning or end of a line.

@menu
* Match-beginning-of-line Operator::  ^
* Match-end-of-line Operator::        $
@end menu


@node Match-beginning-of-line Operator, Match-end-of-line Operator, , Anchoring Operators
@subsection The Match-beginning-of-line Operator (@code{^})

@kindex ^
@cindex beginning-of-line operator
@cindex anchors

This operator can match the empty string either at the beginning of the
string or after a newline character.  Thus, it is said to @dfn{anchor}
the pattern to the beginning of a line.

In the cases following, @samp{^} represents this operator.  (Otherwise,
@samp{^} is ordinary.)

@itemize @bullet

@item
It (the @samp{^}) is first in the pattern, as in @samp{^foo}.

@cnindex RE_CONTEXT_INDEP_ANCHORS @r{(and @samp{^})}
@item
The syntax bit @code{RE_CONTEXT_INDEP_ANCHORS} is set, and it is outside
a bracket expression.

@cindex open-group operator and @samp{^}
@cindex alternation operator and @samp{^}
@item
It follows an open-group or alternation operator, as in @samp{a\(^b\)}
and @samp{a\|^b}.  @xref{Grouping Operators}, and @ref{Alternation
Operator}.

@end itemize

These rules imply that some valid patterns containing @samp{^} cannot be
matched; for example, @samp{foo^bar} if @code{RE_CONTEXT_INDEP_ANCHORS}
is set.

@vindex not_bol @r{field in pattern buffer}
If the @code{not_bol} field is set in the pattern buffer (@pxref{GNU
Pattern Buffers}), then @samp{^} fails to match at the beginning of the
string.  @xref{POSIX Matching}, for when you might find this useful.

@vindex newline_anchor @r{field in pattern buffer}
If the @code{newline_anchor} field isn't set in the pattern buffer, then
@samp{^} fails to match after a newline.  This is useful when you do not
regard the string to be matched as broken into lines.
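
The effect of these settings can be observed through the @sc{posix}
interface.  The following C sketch assumes that the @code{regcomp} flag
@code{REG_NEWLINE} and the @code{regexec} flag @code{REG_NOTBOL} are the
@sc{posix}-level controls over whether @samp{^} matches after a newline
and at the beginning of the string; the helper name @code{bol_search}
and the fixed pattern @samp{^bar} are chosen only for illustration.

@example
#include <regex.h>
#include <stddef.h>

/* Return 1 if "^bar" matches somewhere in TEXT, 0 if it does not,
   and -1 on a compile error.  CFLAGS is passed to regcomp and
   EFLAGS to regexec.  */
int
bol_search (const char *text, int cflags, int eflags)
@{
  regex_t re;
  int status;

  if (regcomp (&re, "^bar", cflags | REG_NOSUB) != 0)
    return -1;
  status = regexec (&re, text, 0, NULL, eflags);
  regfree (&re);
  return status == 0;
@}
@end example

@noindent
Here @code{bol_search ("foo\nbar", 0, 0)} returns 0 because the string
is not treated as broken into lines, while
@code{bol_search ("foo\nbar", REG_NEWLINE, 0)} returns 1.
@code{bol_search ("bar", 0, REG_NOTBOL)} returns 0, since @samp{^} is
then not allowed to match at the beginning of the string.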


@node Match-end-of-line Operator, , Match-beginning-of-line Operator, Anchoring Operators
@subsection The Match-end-of-line Operator (@code{$})

@kindex $
@cindex end-of-line operator
@cindex anchors

This operator can match the empty string either at the end of
the string or before a newline character in the string.  Thus, it is
said to @dfn{anchor} the pattern to the end of a line.

It is always represented by @samp{$}.  For example, @samp{foo$} usually
matches @samp{foo} and the first three characters of
@samp{foo\nbar}.

Its interaction with the syntax bits and pattern buffer fields is
exactly the dual of @samp{^}'s; see the previous section.  (That is,
``beginning'' becomes ``end'', ``next'' becomes ``previous'', and
``after'' becomes ``before''.)


@node GNU Operators, GNU Emacs Operators, Common Operators, Top
@chapter GNU Operators

Following are operators that @sc{gnu} defines (and @sc{posix} doesn't).

@menu
* Word Operators::
* Buffer Operators::
@end menu

@node Word Operators, Buffer Operators, , GNU Operators
@section Word Operators

The operators in this section require Regex to recognize parts of words.
Regex uses a syntax table to determine whether or not a character is
part of a word, i.e., whether or not it is @dfn{word-constituent}.

@menu
* Non-Emacs Syntax Tables::
* Match-word-boundary Operator::        \b
* Match-within-word Operator::          \B
* Match-beginning-of-word Operator::    \<
* Match-end-of-word Operator::          \>
* Match-word-constituent Operator::     \w
* Match-non-word-constituent Operator:: \W
@end menu

@node Non-Emacs Syntax Tables, Match-word-boundary Operator, , Word Operators
@subsection Non-Emacs Syntax Tables

A @dfn{syntax table} is an array indexed by the characters in your
character set.
In the @sc{ascii} encoding, therefore, a syntax table
has 256 elements.  Regex always uses a @code{char *} variable
@code{re_syntax_table} as its syntax table.  In some cases, it
initializes this variable and in others it expects you to initialize it.

@itemize @bullet
@item
If Regex is compiled with the preprocessor symbols @code{emacs} and
@code{SYNTAX_TABLE} both undefined, then Regex allocates
@code{re_syntax_table} and initializes an element @var{i} either to
@code{Sword} (which it defines) if @var{i} is a letter, number, or
@samp{_}, or to zero if it's not.

@item
If Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
defined, then Regex expects you to define a @code{char *} variable
@code{re_syntax_table} to be a valid syntax table.

@item
@xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
the preprocessor symbol @code{emacs} defined.

@end itemize

@node Match-word-boundary Operator, Match-within-word Operator, Non-Emacs Syntax Tables, Word Operators
@subsection The Match-word-boundary Operator (@code{\b})

@cindex @samp{\b}
@cindex word boundaries, matching

This operator (represented by @samp{\b}) matches the empty string at
either the beginning or the end of a word.  For example, @samp{\brat\b}
matches the separate word @samp{rat}.

@node Match-within-word Operator, Match-beginning-of-word Operator, Match-word-boundary Operator, Word Operators
@subsection The Match-within-word Operator (@code{\B})

@cindex @samp{\B}

This operator (represented by @samp{\B}) matches the empty string within
a word.  For example, @samp{c\Brat\Be} matches @samp{crate}, but
@samp{dirty \Brat} doesn't match @samp{dirty rat}.
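
The word-boundary and within-word examples can be checked with a short
C sketch.  It assumes that the library's @code{regcomp} accepts the
@sc{gnu} operators @samp{\b} and @samp{\B}, as the @sc{gnu} C library's
does; other @code{regcomp} implementations may not.  The helper name
@code{word_search} is hypothetical.

@example
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN (POSIX basic syntax plus the GNU word
   operators) matches somewhere in TEXT, 0 if it does not, and -1
   if PATTERN does not compile.  */
int
word_search (const char *pattern, const char *text)
@{
  regex_t re;
  int status;

  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;
  status = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return status == 0;
@}
@end example

@noindent
Under that assumption, @code{word_search ("\\brat\\b", "dirty rat")}
returns 1 and @code{word_search ("\\brat\\b", "crate")} returns 0.
Likewise, @code{word_search ("c\\Brat\\Be", "crate")} returns 1 and
@code{word_search ("dirty \\Brat", "dirty rat")} returns 0.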

@node Match-beginning-of-word Operator, Match-end-of-word Operator, Match-within-word Operator, Word Operators
@subsection The Match-beginning-of-word Operator (@code{\<})

@cindex @samp{\<}

This operator (represented by @samp{\<}) matches the empty string at the
beginning of a word.

@node Match-end-of-word Operator, Match-word-constituent Operator, Match-beginning-of-word Operator, Word Operators
@subsection The Match-end-of-word Operator (@code{\>})

@cindex @samp{\>}

This operator (represented by @samp{\>}) matches the empty string at the
end of a word.

@node Match-word-constituent Operator, Match-non-word-constituent Operator, Match-end-of-word Operator, Word Operators
@subsection The Match-word-constituent Operator (@code{\w})

@cindex @samp{\w}

This operator (represented by @samp{\w}) matches any word-constituent
character.

@node Match-non-word-constituent Operator, , Match-word-constituent Operator, Word Operators
@subsection The Match-non-word-constituent Operator (@code{\W})

@cindex @samp{\W}

This operator (represented by @samp{\W}) matches any character that is
not word-constituent.


@node Buffer Operators, , Word Operators, GNU Operators
@section Buffer Operators

Following are operators which work on buffers.  In Emacs, a @dfn{buffer}
is, naturally, an Emacs buffer.  For other programs, Regex considers the
entire string to be matched as the buffer.

@menu
* Match-beginning-of-buffer Operator::  \`
* Match-end-of-buffer Operator::        \'
@end menu


@node Match-beginning-of-buffer Operator, Match-end-of-buffer Operator, , Buffer Operators
@subsection The Match-beginning-of-buffer Operator (@code{\`})

@cindex @samp{\`}

This operator (represented by @samp{\`}) matches the empty string at the
beginning of the buffer.

@node Match-end-of-buffer Operator, , Match-beginning-of-buffer Operator, Buffer Operators
@subsection The Match-end-of-buffer Operator (@code{\'})

@cindex @samp{\'}

This operator (represented by @samp{\'}) matches the empty string at the
end of the buffer.


@node GNU Emacs Operators, What Gets Matched?, GNU Operators, Top
@chapter GNU Emacs Operators

Following are operators that @sc{gnu} defines (and @sc{posix} doesn't)
that you can use only when Regex is compiled with the preprocessor
symbol @code{emacs} defined.

@menu
* Syntactic Class Operators::
@end menu


@node Syntactic Class Operators, , , GNU Emacs Operators
@section Syntactic Class Operators

The operators in this section require Regex to recognize the syntactic
classes of characters.  Regex uses a syntax table to determine this.

@menu
* Emacs Syntax Tables::
* Match-syntactic-class Operator::      \sCLASS
* Match-not-syntactic-class Operator::  \SCLASS
@end menu

@node Emacs Syntax Tables, Match-syntactic-class Operator, , Syntactic Class Operators
@subsection Emacs Syntax Tables

A @dfn{syntax table} is an array indexed by the characters in your
character set.  In the @sc{ascii} encoding, therefore, a syntax table
has 256 elements.

If Regex is compiled with the preprocessor symbol @code{emacs} defined,
then Regex expects you to define and initialize the variable
@code{re_syntax_table} to be an Emacs syntax table.  Emacs' syntax
tables are more complicated than Regex's own (@pxref{Non-Emacs Syntax
Tables}).  @xref{Syntax, , Syntax, emacs, The GNU Emacs User's Manual},
for a description of Emacs' syntax tables.

@node Match-syntactic-class Operator, Match-not-syntactic-class Operator, Emacs Syntax Tables, Syntactic Class Operators
@subsection The Match-syntactic-class Operator (@code{\s}@var{class})

@cindex @samp{\s}

This operator matches any character whose syntactic class is represented
by a specified character.  @samp{\s@var{class}} represents this operator
where @var{class} is the character representing the syntactic class you
want.  For example, @samp{w} represents the syntactic
class of word-constituent characters, so @samp{\sw} matches any
word-constituent character.

@node Match-not-syntactic-class Operator, , Match-syntactic-class Operator, Syntactic Class Operators
@subsection The Match-not-syntactic-class Operator (@code{\S}@var{class})

@cindex @samp{\S}

This operator is similar to the match-syntactic-class operator except
that it matches any character whose syntactic class is @emph{not}
represented by the specified character.  @samp{\S@var{class}} represents
this operator.  For example, @samp{w} represents the syntactic class of
word-constituent characters, so @samp{\Sw} matches any character that is
not word-constituent.


@node What Gets Matched?, Programming with Regex, GNU Emacs Operators, Top
@chapter What Gets Matched?

Regex usually matches strings according to the ``leftmost longest''
rule; that is, it chooses the longest of the leftmost matches.  This
does not mean that, for a regular expression containing subexpressions,
it simply chooses the longest match for each subexpression, left to
right; the overall match must also be the longest possible one.

For example, @samp{(ac*)(c*d[ac]*)\1} matches @samp{acdacaaa}, not
@samp{acdac}, as it would if it were to choose the longest match for the
first subexpression.
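
The rule can be observed with a minimal C sketch that reports how long
the chosen match is.  It assumes the @sc{posix} interface with
@code{REG_EXTENDED} and an implementation that follows the
leftmost-longest rule; the helper name @code{match_length} is made up
for this example.

@example
#include <regex.h>

/* Return the length of the match that regexec reports for PATTERN
   (POSIX extended syntax) in TEXT, or -1 if there is no match or
   PATTERN does not compile.  */
int
match_length (const char *pattern, const char *text)
@{
  regex_t re;
  regmatch_t m;
  int length = -1;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, 0) == 0)
    length = (int) (m.rm_eo - m.rm_so);
  regfree (&re);
  return length;
@}
@end example

@noindent
Under those assumptions, @code{match_length ("a|ab", "xab")} returns 2
even though the first alternative alone would match only one character,
and @code{match_length ("(fooq|foo)*(qbarquux|bar)", "fooqbarquux")}
returns 11, the length of the whole string.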


@node Programming with Regex, Copying, What Gets Matched?, Top
@chapter Programming with Regex

Here we describe how you use the Regex data structures and functions in
C programs.  Regex has three interfaces: one designed for @sc{gnu}, one
compatible with @sc{posix} and one compatible with Berkeley @sc{unix}.

@menu
* GNU Regex Functions::
* POSIX Regex Functions::
* BSD Regex Functions::
@end menu


@node GNU Regex Functions, POSIX Regex Functions, , Programming with Regex
@section GNU Regex Functions

If you're writing code that doesn't need to be compatible with either
@sc{posix} or Berkeley @sc{unix}, you can use these functions.  They
provide more options than the other interfaces.

@menu
* GNU Pattern Buffers::                 The re_pattern_buffer type.
* GNU Regular Expression Compiling::    re_compile_pattern ()
* GNU Matching::                        re_match ()
* GNU Searching::                       re_search ()
* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
* Searching with Fastmaps::             re_compile_fastmap ()
* GNU Translate Tables::                The `translate' field.
* Using Registers::                     The re_registers type and related fns.
* Freeing GNU Pattern Buffers::         regfree ()
@end menu


@node GNU Pattern Buffers, GNU Regular Expression Compiling, , GNU Regex Functions
@subsection GNU Pattern Buffers

@cindex pattern buffer, definition of
@tindex re_pattern_buffer @r{definition}
@tindex struct re_pattern_buffer @r{definition}

To compile, match, or search for a given regular expression, you must
supply a pattern buffer.  A @dfn{pattern buffer} holds one compiled
regular expression.@footnote{Regular expressions are also referred to as
``patterns,'' hence the name ``pattern buffer.''}

You can have several different pattern buffers simultaneously, each
holding a compiled pattern for a different regular expression.

@file{regex.h} defines the pattern buffer @code{struct} as follows:

@example
[[[ pattern_buffer ]]]
@end example


@node GNU Regular Expression Compiling, GNU Matching, GNU Pattern Buffers, GNU Regex Functions
@subsection GNU Regular Expression Compiling

In @sc{gnu}, you can both match and search for a given regular
expression. To do either, you must first compile it in a pattern buffer
(@pxref{GNU Pattern Buffers}).

@cindex syntax initialization
@vindex re_syntax_options @r{initialization}
Regular expressions match according to the syntax with which they were
compiled; with @sc{gnu}, you indicate what syntax you want by setting
the variable @code{re_syntax_options} (declared in @file{regex.h} and
defined in @file{regex.c}) before calling the compiling function,
@code{re_compile_pattern} (see below). @xref{Syntax Bits}, and
@ref{Predefined Syntaxes}.

You can change the value of @code{re_syntax_options} at any time.
Usually, however, you set its value once and then never change it.

@cindex pattern buffer initialization
@code{re_compile_pattern} takes a pattern buffer as an argument. You
must initialize the following fields:

@table @code

@item translate
@vindex translate @r{initialization}
Initialize this to point to a translate table if you want one, or to
zero if you don't. We explain translate tables in @ref{GNU Translate
Tables}.

@item fastmap
@vindex fastmap @r{initialization}
Initialize this to nonzero if you want a fastmap, or to zero if you
don't.

@item buffer
@itemx allocated
@vindex buffer @r{initialization}
@vindex allocated @r{initialization}
@findex malloc
If you want @code{re_compile_pattern} to allocate memory for the
compiled pattern, set both of these to zero.
If you have an existing
block of memory (allocated with @code{malloc}) you want Regex to use,
set @code{buffer} to its address and @code{allocated} to its size (in
bytes).

@code{re_compile_pattern} uses @code{realloc} to extend the space for
the compiled pattern as necessary.

@end table

To compile a pattern buffer, use:

@findex re_compile_pattern
@example
char *
re_compile_pattern (const char *@var{regex}, const int @var{regex_size},
                    struct re_pattern_buffer *@var{pattern_buffer})
@end example

@noindent
@var{regex} is the regular expression's address, @var{regex_size} is its
length, and @var{pattern_buffer} is the pattern buffer's address.

If @code{re_compile_pattern} successfully compiles the regular
expression, it returns zero and sets @code{*@var{pattern_buffer}} to the
compiled pattern. It sets the pattern buffer's fields as follows:

@table @code
@item buffer
@vindex buffer @r{field, set by @code{re_compile_pattern}}
to the compiled pattern.

@item used
@vindex used @r{field, set by @code{re_compile_pattern}}
to the number of bytes the compiled pattern in @code{buffer} occupies.

@item syntax
@vindex syntax @r{field, set by @code{re_compile_pattern}}
to the current value of @code{re_syntax_options}.

@item re_nsub
@vindex re_nsub @r{field, set by @code{re_compile_pattern}}
to the number of subexpressions in @var{regex}.

@item fastmap_accurate
@vindex fastmap_accurate @r{field, set by @code{re_compile_pattern}}
to zero on the theory that the pattern you're compiling is different
from the one previously compiled into @code{buffer}; in that case (since
you can't make a fastmap without a compiled pattern),
@code{fastmap} would either contain an incompatible fastmap, or nothing
at all.

@c xx what else?
@end table

If @code{re_compile_pattern} can't compile @var{regex}, it returns an
error string corresponding to one of the errors listed in @ref{POSIX
Regular Expression Compiling}.


@node GNU Matching, GNU Searching, GNU Regular Expression Compiling, GNU Regex Functions
@subsection GNU Matching

@cindex matching with GNU functions

Matching the @sc{gnu} way means trying to match as much of a string as
possible starting at a position within it you specify. Once you've compiled
a pattern into a pattern buffer (@pxref{GNU Regular Expression
Compiling}), you can ask the matcher to match that pattern against a
string using:

@findex re_match
@example
int
re_match (struct re_pattern_buffer *@var{pattern_buffer},
          const char *@var{string}, const int @var{size},
          const int @var{start}, struct re_registers *@var{regs})
@end example

@noindent
@var{pattern_buffer} is the address of a pattern buffer containing a
compiled pattern. @var{string} is the string you want to match; it can
contain newline and null characters. @var{size} is the length of that
string. @var{start} is the string index at which you want to
begin matching; the first character of @var{string} is at index zero.
@xref{Using Registers}, for an explanation of @var{regs}; you can safely
pass zero.

@code{re_match} matches the regular expression in @var{pattern_buffer}
against the string @var{string} according to the syntax in
@var{pattern_buffer}'s @code{syntax} field. (@xref{GNU Regular
Expression Compiling}, for how to set it.) The function returns
@math{-1} if the compiled pattern does not match any part of
@var{string} and @math{-2} if an internal error happens; otherwise, it
returns how many (possibly zero) characters of @var{string} the pattern
matched.
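
The compile-then-match sequence can be sketched as follows. This is an
illustration under several assumptions, not a definitive recipe: it
assumes the GNU C Library's @file{regex.h} (which declares the
@sc{gnu} interface only when @code{_GNU_SOURCE} is defined), it selects
the syntax with that library's @code{re_set_syntax} function, which sets
@code{re_syntax_options}, and the helper name @code{gnu_match_at} and
its @math{-3} return value are invented for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: glibc, whose <regex.h> needs this */
#endif
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN, then match it against TEXT
   starting at index START.  Returns what re_match returns, or -3
   (a value chosen only for this sketch) if compilation fails.  */
int
gnu_match_at (const char *pattern, const char *text, int start)
{
  struct re_pattern_buffer buf;
  const char *error;
  int matched;

  /* Zero every field: no translate table, no fastmap, and
     buffer/allocated zero so the compiler allocates the space.  */
  memset (&buf, 0, sizeof buf);

  /* Choose the syntax before compiling.  */
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);

  error = re_compile_pattern (pattern, strlen (pattern), &buf);
  if (error != 0)
    return -3;                  /* ERROR points to a message string */

  /* Pass zero for REGS: we only want the match length.  */
  matched = re_match (&buf, text, (int) strlen (text), start, 0);

  regfree (&buf);
  return matched;
}
```

For instance, @code{gnu_match_at ("a*", "aaaaab", 2)} asks how many
characters @samp{a*} matches starting at index 2.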

An example: suppose @var{pattern_buffer} points to a pattern buffer
containing the compiled pattern for @samp{a*}, and @var{string} points
to @samp{aaaaab} (whereupon @var{size} should be 6). Then if @var{start}
is 2, @code{re_match} returns 3, i.e., @samp{a*} would have matched the
last three @samp{a}s in @var{string}. If @var{start} is 0,
@code{re_match} returns 5, i.e., @samp{a*} would have matched all the
@samp{a}s in @var{string}. If @var{start} is either 5 or 6, it returns
zero.

If @var{start} is not between zero and @var{size}, then
@code{re_match} returns @math{-1}.


@node GNU Searching, Matching/Searching with Split Data, GNU Matching, GNU Regex Functions
@subsection GNU Searching

@cindex searching with GNU functions

@dfn{Searching} means trying to match starting at successive positions
within a string. The function @code{re_search} does this.

Before calling @code{re_search}, you must compile your regular
expression. @xref{GNU Regular Expression Compiling}.

Here is the function declaration:

@findex re_search
@example
int
re_search (struct re_pattern_buffer *@var{pattern_buffer},
           const char *@var{string}, const int @var{size},
           const int @var{start}, const int @var{range},
           struct re_registers *@var{regs})
@end example

@noindent
@vindex start @r{argument to @code{re_search}}
@vindex range @r{argument to @code{re_search}}
whose arguments are the same as those to @code{re_match} (@pxref{GNU
Matching}) except that the two arguments @var{start} and @var{range}
replace @code{re_match}'s argument @var{start}.

If @var{range} is positive, then @code{re_search} attempts a match
starting first at index @var{start}, then at @math{@var{start} + 1} if
that fails, and so on, up to @math{@var{start} + @var{range}}; if
@var{range} is negative, then it attempts a match starting first at
index @var{start}, then at @math{@var{start} - 1} if that fails, and so
on.

If @var{start} is not between zero and @var{size}, then @code{re_search}
returns @math{-1}. When @var{range} is positive, @code{re_search}
adjusts @var{range} so that @math{@var{start} + @var{range} - 1} is
between zero and @var{size}, if necessary; that way it won't search
outside of @var{string}. Similarly, when @var{range} is negative,
@code{re_search} adjusts @var{range} so that @math{@var{start} +
@var{range} + 1} is between zero and @var{size}, if necessary.

If the @code{fastmap} field of @var{pattern_buffer} is zero,
@code{re_search} matches starting at consecutive positions; otherwise,
it uses @code{fastmap} to make the search more efficient.
@xref{Searching with Fastmaps}.

If no match is found, @code{re_search} returns @math{-1}. If
a match is found, it returns the index where the match began. If an
internal error happens, it returns @math{-2}.


@node Matching/Searching with Split Data, Searching with Fastmaps, GNU Searching, GNU Regex Functions
@subsection Matching and Searching with Split Data

Using the functions @code{re_match_2} and @code{re_search_2}, you can
match or search in data that is divided into two strings.
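
As a rough sketch of searching split data, the following helper looks
for a pattern in two strings treated as one. It rests on the same
assumptions as any use of the @sc{gnu} interface with the GNU C Library
(@code{_GNU_SOURCE} defined, syntax chosen with @code{re_set_syntax});
the name @code{gnu_search_split} and the @math{-3} return value are
hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: glibc, whose <regex.h> needs this */
#endif
#include <regex.h>
#include <string.h>

/* Hypothetical helper: search for PATTERN in the concatenation of
   FIRST and SECOND.  Returns the index (within the concatenation)
   where the match begins, -1 if there is none, -2 on an internal
   error, or -3 (chosen only for this sketch) if PATTERN is invalid.  */
int
gnu_search_split (const char *pattern, const char *first,
                  const char *second)
{
  struct re_pattern_buffer buf;
  int size1 = (int) strlen (first);
  int size2 = (int) strlen (second);
  int where;

  memset (&buf, 0, sizeof buf);
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != 0)
    return -3;

  /* Start at index 0, try every position in the concatenation, and
     let a match extend to its very end (stop = size1 + size2).
     Zero for REGS means we don't want group information.  */
  where = re_search_2 (&buf, first, size1, second, size2,
                       0, size1 + size2, 0, size1 + size2);

  regfree (&buf);
  return where;
}
```

A match may straddle the boundary: searching @samp{abc} and @samp{def}
for @samp{cd} finds it beginning in the first string and ending in the
second.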

The function:

@findex re_match_2
@example
int
re_match_2 (struct re_pattern_buffer *@var{buffer},
            const char *@var{string1}, const int @var{size1},
            const char *@var{string2}, const int @var{size2},
            const int @var{start},
            struct re_registers *@var{regs},
            const int @var{stop})
@end example

@noindent
is similar to @code{re_match} (@pxref{GNU Matching}) except that you
pass @emph{two} data strings and sizes, and an index @var{stop} beyond
which you don't want the matcher to try matching. As with
@code{re_match}, if it succeeds, @code{re_match_2} returns how many
characters of the two strings (taken together) it matched. Regard
@var{string1} and @var{string2} as concatenated when you set the
arguments @var{start} and @var{stop} and use the contents of @var{regs};
@code{re_match_2} never returns a value larger than
@math{@var{size1} + @var{size2}}.

The function:

@findex re_search_2
@example
int
re_search_2 (struct re_pattern_buffer *@var{buffer},
             const char *@var{string1}, const int @var{size1},
             const char *@var{string2}, const int @var{size2},
             const int @var{start}, const int @var{range},
             struct re_registers *@var{regs},
             const int @var{stop})
@end example

@noindent
is similarly related to @code{re_search}.


@node Searching with Fastmaps, GNU Translate Tables, Matching/Searching with Split Data, GNU Regex Functions
@subsection Searching with Fastmaps

@cindex fastmaps
If you're searching through a long string, you should use a fastmap.
Without one, the searcher tries to match at consecutive positions in the
string. Generally, most of the characters in the string could not start
a match. It takes much longer to try matching at a given position in the
string than it does to check in a table whether or not the character at
that position could start a match.
A @dfn{fastmap} is such a table.

More specifically, a fastmap is an array indexed by the characters in
your character set. Under the @sc{ascii} encoding, therefore, a fastmap
has 256 elements. If you want the searcher to use a fastmap with a
given pattern buffer, you must allocate the array and assign the array's
address to the pattern buffer's @code{fastmap} field. You can either
compile the fastmap yourself or have @code{re_search} do it for you;
when @code{fastmap} is nonzero, @code{re_search} automatically compiles
a fastmap the first time you search using a particular compiled pattern.

To compile a fastmap yourself, use:

@findex re_compile_fastmap
@example
int
re_compile_fastmap (struct re_pattern_buffer *@var{pattern_buffer})
@end example

@noindent
@var{pattern_buffer} is the address of a pattern buffer. If the
character @var{c} could start a match for the pattern,
@code{re_compile_fastmap} makes
@code{@var{pattern_buffer}->fastmap[@var{c}]} nonzero. It returns
@math{0} if it can compile a fastmap and @math{-2} if there is an
internal error. For example, if @samp{|} is the alternation operator
and @var{pattern_buffer} holds the compiled pattern for @samp{a|b}, then
@code{re_compile_fastmap} sets @code{fastmap['a']} and
@code{fastmap['b']} (and no others).

@code{re_search} uses a fastmap as it moves along in the string: it
checks the string's characters until it finds one that's in the fastmap.
Then it tries matching at that character. If the match fails, it
repeats the process. So, by using a fastmap, @code{re_search} doesn't
waste time trying to match at positions in the string that couldn't
start a match.

If you don't want @code{re_search} to use a fastmap,
store zero in the @code{fastmap} field of the pattern buffer before
calling @code{re_search}.
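
Here is a sketch of compiling a fastmap explicitly and consulting it.
It assumes the GNU C Library's version of the @sc{gnu} interface
(@code{_GNU_SOURCE} defined, syntax chosen with @code{re_set_syntax})
and relies on that library's @code{regfree} also freeing the
@code{fastmap} array, which is why the array is obtained with
@code{malloc}. The helper name @code{could_start_match} is made up.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: glibc, whose <regex.h> needs this */
#endif
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if character C could start a match
   for PATTERN according to its fastmap, 0 if it could not, and -1
   if anything goes wrong.  */
int
could_start_match (const char *pattern, unsigned char c)
{
  struct re_pattern_buffer buf;
  int answer;

  memset (&buf, 0, sizeof buf);

  /* One entry per character code; the caller supplies the array.  */
  buf.fastmap = malloc (256);
  if (buf.fastmap == 0)
    return -1;

  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);   /* `|' is alternation */
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != 0)
    {
      free (buf.fastmap);
      return -1;
    }

  if (re_compile_fastmap (&buf) != 0)
    answer = -1;
  else
    answer = buf.fastmap[c] != 0;

  /* Assumption (glibc behavior): regfree releases the compiled
     pattern and the malloc'ed fastmap, so we must not free it again.  */
  regfree (&buf);
  return answer;
}
```

For the pattern @samp{a|b}, the helper should answer 1 for @samp{a} and
@samp{b} and 0 for any other character.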

Once you've initialized a pattern buffer's @code{fastmap} field, you
need never do so again---even if you compile a new pattern in
it---provided the way the field is set still reflects whether or not you
want a fastmap. @code{re_search} will still either do nothing if
@code{fastmap} is null or, if it isn't, compile a new fastmap for the
new pattern.

@node GNU Translate Tables, Using Registers, Searching with Fastmaps, GNU Regex Functions
@subsection GNU Translate Tables

If you set the @code{translate} field of a pattern buffer to a translate
table, then the @sc{gnu} Regex functions to which you've passed that
pattern buffer use it to apply a simple transformation
to all the regular expression and string characters at which they look.

A @dfn{translate table} is an array indexed by the characters in your
character set. Under the @sc{ascii} encoding, therefore, a translate
table has 256 elements. The array's elements are also characters in
your character set. When the Regex functions see a character @var{c},
they use @code{translate[@var{c}]} in its place, with one exception: the
character after a @samp{\} is not translated. (This ensures that the
operators, e.g., @samp{\B} and @samp{\b}, are always distinguishable.)

For example, a table that maps all lowercase letters to the
corresponding uppercase ones would cause the matcher to ignore
differences in case.@footnote{A table that maps all uppercase letters to
the corresponding lowercase ones would work just as well for this
purpose.} Such a table would map all characters except lowercase letters
to themselves, and lowercase letters to the corresponding uppercase
ones.
Under the @sc{ascii} encoding, here's how you could initialize
such a table (we'll call it @code{case_fold}):

@example
for (i = 0; i < 256; i++)
  case_fold[i] = i;
for (i = 'a'; i <= 'z'; i++)
  case_fold[i] = i - ('a' - 'A');
@end example

You tell Regex to use a translate table on a given pattern buffer by
assigning that table's address to the @code{translate} field of that
buffer. If you don't want Regex to do any translation, put zero into
this field. You'll get weird results if you change the table's contents
anytime between compiling the pattern buffer, compiling its fastmap, and
matching or searching with the pattern buffer.

@node Using Registers, Freeing GNU Pattern Buffers, GNU Translate Tables, GNU Regex Functions
@subsection Using Registers

A group in a regular expression can match a (possibly empty) substring
of the string that the regular expression as a whole matched. The
matcher remembers the beginning and end of the substring matched by
each group.

To find out what they matched, pass a nonzero @var{regs} argument to a
@sc{gnu} matching or searching function (@pxref{GNU Matching} and
@ref{GNU Searching}), i.e., the address of a structure of this type, as
defined in @file{regex.h}:

@c We don't bother to include this directly from regex.h,
@c since it changes so rarely.
@example
@tindex re_registers
@vindex num_regs @r{in @code{struct re_registers}}
@vindex start @r{in @code{struct re_registers}}
@vindex end @r{in @code{struct re_registers}}
struct re_registers
@{
  unsigned num_regs;
  regoff_t *start;
  regoff_t *end;
@};
@end example

Except for (possibly) the @var{num_regs}'th element (see below), the
@var{i}th element of the @code{start} and @code{end} arrays records
information about the @var{i}th group in the pattern.
(They're declared
as C pointers, but this is only because not all C compilers accept
zero-length arrays; conceptually, it is simplest to think of them as
arrays.)

The @code{start} and @code{end} arrays are allocated in various ways,
depending on the value of the @code{regs_allocated}
@vindex regs_allocated
field in the pattern buffer passed to the matcher.

The simplest and perhaps most useful way is to let the matcher
(re)allocate enough space to record information for all the groups in
the regular expression. If @code{regs_allocated} is
@code{REGS_UNALLOCATED},
@vindex REGS_UNALLOCATED
the matcher allocates @math{1 + @var{re_nsub}} elements (@code{re_nsub}
is another field in the pattern buffer; @pxref{GNU Pattern Buffers}),
sets the extra element to @math{-1}, and sets @code{regs_allocated} to
@code{REGS_REALLOCATE}.
@vindex REGS_REALLOCATE
Then on subsequent calls with the same pattern buffer and @var{regs}
arguments, the matcher reallocates more space if necessary.

It would perhaps be more logical to make the @code{regs_allocated} field
part of the @code{re_registers} structure, instead of part of the
pattern buffer. But in that case the caller would be forced to
initialize the structure before passing it. Much existing code doesn't
do this initialization, and it's arguably better to avoid it anyway.

@code{re_compile_pattern} sets @code{regs_allocated} to
@code{REGS_UNALLOCATED},
so if you use the GNU regular expression
functions, you get this behavior by default.

@c xx document re_set_registers

@sc{posix}, on the other hand, requires a different interface: the
caller is supposed to pass in a fixed-length array which the matcher
fills. Therefore, if @code{regs_allocated} is @code{REGS_FIXED},
@vindex REGS_FIXED
the matcher simply fills that array.
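
The default (matcher-allocated) behavior can be sketched as follows.
The sketch assumes the GNU C Library's version of the @sc{gnu}
interface (@code{_GNU_SOURCE} defined, syntax chosen with
@code{re_set_syntax}) and assumes that the arrays the matcher allocates
may be released with @code{free}; the helper name @code{group_span} and
its @math{-3} return value are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: glibc, whose <regex.h> needs this */
#endif
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: match PATTERN against TEXT from index 0 and
   store in *START and *END what the registers recorded for GROUP.
   Returns re_match's result, or -3 (chosen only for this sketch) if
   PATTERN is invalid or GROUP has no register.  */
int
group_span (const char *pattern, const char *text, unsigned group,
            int *start, int *end)
{
  struct re_pattern_buffer buf;
  struct re_registers regs;
  int matched;

  memset (&buf, 0, sizeof buf);
  memset (&regs, 0, sizeof regs);

  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);   /* `(' and `)' group */
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != 0)
    return -3;

  /* After compiling, regs_allocated is REGS_UNALLOCATED, so the
     matcher allocates regs.start and regs.end for us.  */
  matched = re_match (&buf, text, (int) strlen (text), 0, &regs);

  if (matched >= 0)
    {
      if (group < regs.num_regs)
        {
          *start = (int) regs.start[group];
          *end = (int) regs.end[group];
        }
      else
        matched = -3;
    }

  /* Assumption: the matcher obtained these arrays with malloc.  */
  free (regs.start);
  free (regs.end);
  regfree (&buf);
  return matched;
}
```

Calling it with the pattern @samp{((a)(b))}, the string @samp{ab} and
group 2 asks where the group containing @samp{a} matched.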

The following examples illustrate the information recorded in the
@code{re_registers} structure. (In all of them, @samp{(} represents the
open-group and @samp{)} the close-group operator. The first character
in the string @var{string} is at index 0.)

@c xx i'm not sure this is all true anymore.

@itemize @bullet

@item
If the regular expression has an @w{@var{i}-th}
group not contained within another group that matches a
substring of @var{string}, then the function sets
@code{@w{@var{regs}->}start[@var{i}]} to the index in @var{string} where
the substring matched by the @w{@var{i}-th} group begins, and
@code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
substring's end. The function sets @code{@w{@var{regs}->}start[0]} and
@code{@w{@var{regs}->}end[0]} to analogous information about the entire
pattern.

For example, when you match @samp{((a)(b))} against @samp{ab}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}

@item
0 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}

@item
0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}

@item
1 in @code{@w{@var{regs}->}start[3]} and 2 in @code{@w{@var{regs}->}end[3]}
@end itemize

@item
If a group matches more than once (as it might if followed by,
e.g., a repetition operator), then the function reports the information
about what the group @emph{last} matched.

For example, when you match the pattern @samp{(a)*} against the string
@samp{aa}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}

@item
1 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}
@end itemize

@item
If the @w{@var{i}-th} group does not participate in a
successful match, e.g., it is an alternative not taken or a
repetition operator allows zero repetitions of it, then the function
sets @code{@w{@var{regs}->}start[@var{i}]} and
@code{@w{@var{regs}->}end[@var{i}]} to @math{-1}.

For example, when you match the pattern @samp{(a)*b} against
the string @samp{b}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}

@item
@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}
@end itemize

@item
If the @w{@var{i}-th} group matches a zero-length string, then the
function sets @code{@w{@var{regs}->}start[@var{i}]} and
@code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
zero-length string.

For example, when you match the pattern @samp{(a*)b} against the string
@samp{b}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}

@item
0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]}
@end itemize

@ignore
The function sets @code{@w{@var{regs}->}start[0]} and
@code{@w{@var{regs}->}end[0]} to analogous information about the entire
pattern.

For example, when you match the pattern @samp{(a*)} against the empty
string, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 0 in @code{@w{@var{regs}->}end[0]}

@item
0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]}
@end itemize
@end ignore

@item
If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
in turn not contained within any other group within group @var{i} and
the function reports a match of the @w{@var{i}-th} group, then it
records in @code{@w{@var{regs}->}start[@var{j}]} and
@code{@w{@var{regs}->}end[@var{j}]} the last match (if it matched) of
the @w{@var{j}-th} group.

For example, when you match the pattern @samp{((a*)b)*} against the
string @samp{abb}, @w{group 2} last matches the empty string, so you
get what it previously matched:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}

@item
2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}

@item
2 in @code{@w{@var{regs}->}start[2]} and 2 in @code{@w{@var{regs}->}end[2]}
@end itemize

When you match the pattern @samp{((a)*b)*} against the string
@samp{abb}, @w{group 2} doesn't participate in the last match, so you
get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}

@item
2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}

@item
0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}
@end itemize

@item
If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
in turn not contained within any other group within group @var{i}
and the function sets
@code{@w{@var{regs}->}start[@var{i}]} and
@code{@w{@var{regs}->}end[@var{i}]} to @math{-1}, then it also
sets
@code{@w{@var{regs}->}start[@var{j}]} and
@code{@w{@var{regs}->}end[@var{j}]} to @math{-1}.

For example, when you match the pattern @samp{((a)*b)*c} against the
string @samp{c}, you get:

@itemize @bullet
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}

@item
@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}

@item
@math{-1} in @code{@w{@var{regs}->}start[2]} and @math{-1} in @code{@w{@var{regs}->}end[2]}
@end itemize

@end itemize

@node Freeing GNU Pattern Buffers, , Using Registers, GNU Regex Functions
@subsection Freeing GNU Pattern Buffers

To free any allocated fields of a pattern buffer, you can use the
@sc{posix} function described in @ref{Freeing POSIX Pattern Buffers},
since the type @code{regex_t}---the type for @sc{posix} pattern
buffers---is equivalent to the type @code{re_pattern_buffer}. After
freeing a pattern buffer, you need to again compile a regular expression
in it (@pxref{GNU Regular Expression Compiling}) before passing it to
a matching or searching function.


@node POSIX Regex Functions, BSD Regex Functions, GNU Regex Functions, Programming with Regex
@section POSIX Regex Functions

If you're writing code that has to be @sc{posix} compatible, you'll need
to use these functions. Their interfaces are as specified by @sc{posix},
draft 1003.2/D11.2.

@menu
* POSIX Pattern Buffers::               The regex_t type.
* POSIX Regular Expression Compiling::  regcomp ()
* POSIX Matching::                      regexec ()
* Reporting Errors::                    regerror ()
* Using Byte Offsets::                  The regmatch_t type.
* Freeing POSIX Pattern Buffers::       regfree ()
@end menu


@node POSIX Pattern Buffers, POSIX Regular Expression Compiling, , POSIX Regex Functions
@subsection POSIX Pattern Buffers

To compile or match a given regular expression the @sc{posix} way, you
must supply a pattern buffer exactly the way you do for @sc{gnu}
(@pxref{GNU Pattern Buffers}). @sc{posix} pattern buffers have type
@code{regex_t}, which is equivalent to the @sc{gnu} pattern buffer
type @code{re_pattern_buffer}.


@node POSIX Regular Expression Compiling, POSIX Matching, POSIX Pattern Buffers, POSIX Regex Functions
@subsection POSIX Regular Expression Compiling

With @sc{posix}, you can only search for a given regular expression; you
can't match it. To do this, you must first compile it in a
pattern buffer, using @code{regcomp}.

@ignore
Before calling @code{regcomp}, you must initialize this pattern buffer
as you do for @sc{gnu} (@pxref{GNU Regular Expression Compiling}). See
below, however, for how to choose a syntax with which to compile.
@end ignore

To compile a pattern buffer, use:

@findex regcomp
@example
int
regcomp (regex_t *@var{preg}, const char *@var{regex}, int @var{cflags})
@end example

@noindent
@var{preg} is the initialized pattern buffer's address, @var{regex} is
the regular expression's address, and @var{cflags} is the compilation
flags, which Regex considers as a collection of bits. Here are the
valid bits, as defined in @file{regex.h}:

@table @code

@item REG_EXTENDED
@vindex REG_EXTENDED
says to use @sc{posix} Extended Regular Expression syntax; if this isn't
set, it says to use @sc{posix} Basic Regular Expression syntax.
@code{regcomp} sets @var{preg}'s @code{syntax} field accordingly.

@item REG_ICASE
@vindex REG_ICASE
@cindex ignoring case
says to ignore case; @code{regcomp} sets @var{preg}'s @code{translate}
field to a translate table which ignores case, replacing anything you've
put there before.

@item REG_NOSUB
@vindex REG_NOSUB
says to set @var{preg}'s @code{no_sub} field; @pxref{POSIX Matching},
for what this means.

@item REG_NEWLINE
@vindex REG_NEWLINE
says that a:

@itemize @bullet

@item
match-any-character operator (@pxref{Match-any-character
Operator}) doesn't match a newline.

@item
nonmatching list not containing a newline (@pxref{List
Operators}) matches a newline.

@item
match-beginning-of-line operator (@pxref{Match-beginning-of-line
Operator}) matches the empty string immediately after a newline,
regardless of how @code{REG_NOTBOL} is set (@pxref{POSIX Matching}, for
an explanation of @code{REG_NOTBOL}).

@item
match-end-of-line operator (@pxref{Match-end-of-line
Operator}) matches the empty string immediately before a newline,
regardless of how @code{REG_NOTEOL} is set (@pxref{POSIX Matching},
for an explanation of @code{REG_NOTEOL}).

@end itemize

@end table

If @code{regcomp} successfully compiles the regular expression, it
returns zero and sets @code{*@var{preg}} to the compiled
pattern. Except for @code{syntax} (which it sets as explained above), it
also sets the same fields the same way as does the @sc{gnu} compiling
function (@pxref{GNU Regular Expression Compiling}).

If @code{regcomp} can't compile the regular expression, it returns one
of the error codes listed here. (Except when noted differently, the
syntax in all examples below is basic regular expression syntax.)

@table @code

@comment repetitions
@item REG_BADRPT
For example, the consecutive repetition operators @samp{**} in
@samp{a**} are invalid. As another example, if the syntax is extended
regular expression syntax, then the repetition operator @samp{*} with
nothing on which to operate in @samp{*} is invalid.

@item REG_BADBR
For example, the @var{count} @samp{-1} in @samp{a\@{-1} is invalid.

@item REG_EBRACE
For example, @samp{a\@{1} is missing a close-interval operator.

@comment lists
@item REG_EBRACK
For example, @samp{[a} is missing a close-list operator.

@item REG_ERANGE
For example, the range ending point @samp{z} that collates lower than
does its starting point @samp{a} in @samp{[z-a]} is invalid. Also, the
range with the character class @samp{[:alpha:]} as its starting point in
@samp{[[:alpha:]-|]} is invalid.

@item REG_ECTYPE
For example, the character class name @samp{foo} in @samp{[[:foo:]} is
invalid.

@comment groups
@item REG_EPAREN
For example, @samp{a\)} is missing an open-group operator and @samp{\(a}
is missing a close-group operator.

@item REG_ESUBREG
For example, the back reference @samp{\2} that refers to a nonexistent
subexpression in @samp{\(a\)\2} is invalid.

@comment unfinished business

@item REG_EEND
Returned when a regular expression causes no other more specific error.

@item REG_EESCAPE
For example, the trailing backslash @samp{\} in @samp{a\} is invalid, as is the
one in @samp{\}.

@comment kitchen sink
@item REG_BADPAT
For example, in the extended regular expression syntax, the empty group
@samp{()} in @samp{a()b} is invalid.

@comment internal
@item REG_ESIZE
Returned when a regular expression needs a pattern buffer larger than
65536 bytes.

@item REG_ESPACE
Returned when a regular expression causes Regex to run out of memory.

@end table


@node POSIX Matching, Reporting Errors, POSIX Regular Expression Compiling, POSIX Regex Functions
@subsection POSIX Matching

Matching the @sc{posix} way means trying to match a null-terminated
string starting at its first character. Once you've compiled a pattern
into a pattern buffer (@pxref{POSIX Regular Expression Compiling}), you
can ask the matcher to match that pattern against a string using:

@findex regexec
@example
int
regexec (const regex_t *@var{preg}, const char *@var{string},
         size_t @var{nmatch}, regmatch_t @var{pmatch}[], int @var{eflags})
@end example

@noindent
@var{preg} is the address of a pattern buffer for a compiled pattern.
@var{string} is the string you want to match.

@xref{Using Byte Offsets}, for an explanation of @var{pmatch}. If you
pass zero for @var{nmatch} or you compiled @var{preg} with the
compilation flag @code{REG_NOSUB} set, then @code{regexec} will ignore
@var{pmatch}; otherwise, you must allocate it to have at least
@var{nmatch} elements. @code{regexec} will record @var{nmatch} byte
offsets in @var{pmatch}, and set to @math{-1} any unused elements up to
@code{@var{pmatch}[@var{nmatch} - 1]}.

@var{eflags} specifies @dfn{execution flags}---namely, the two bits
@code{REG_NOTBOL} and @code{REG_NOTEOL} (defined in @file{regex.h}). If
you set @code{REG_NOTBOL}, then the match-beginning-of-line operator
(@pxref{Match-beginning-of-line Operator}) always fails to match.
This lets you match against pieces of a line, as you would need to if,
say, searching for repeated instances of a given pattern in a line; it
would work correctly for patterns both with and without
match-beginning-of-line operators.
@code{REG_NOTEOL} works analogously
for the match-end-of-line operator (@pxref{Match-end-of-line
Operator}); it exists for symmetry.

@code{regexec} tries to find a match for @var{preg} in @var{string}
according to the syntax in @var{preg}'s @code{syntax} field.
(@xref{POSIX Regular Expression Compiling}, for how to set it.) The
function returns zero if the compiled pattern matches @var{string} and
@code{REG_NOMATCH} (defined in @file{regex.h}) if it doesn't.

@node Reporting Errors, Using Byte Offsets, POSIX Matching, POSIX Regex Functions
@subsection Reporting Errors

If either @code{regcomp} or @code{regexec} fails, it returns a nonzero
error code, the possibilities for which are defined in @file{regex.h}.
@xref{POSIX Regular Expression Compiling}, and @ref{POSIX Matching}, for
what these codes mean. To get an error string corresponding to these
codes, you can use:

@findex regerror
@example
size_t
regerror (int @var{errcode},
          const regex_t *@var{preg},
          char *@var{errbuf},
          size_t @var{errbuf_size})
@end example

@noindent
@var{errcode} is an error code, @var{preg} is the address of the pattern
buffer which provoked the error, @var{errbuf} is the error buffer, and
@var{errbuf_size} is @var{errbuf}'s size.

@code{regerror} returns the size in bytes of the error string
corresponding to @var{errcode} (including its terminating null). If
@var{errbuf} and @var{errbuf_size} are nonzero, it also returns in
@var{errbuf} the first @math{@var{errbuf_size} - 1} characters of the
error string, followed by a null.
@var{errbuf_size} must be a nonnegative number less than or equal to the
size in bytes of @var{errbuf}.

You can call @code{regerror} with a null @var{errbuf} and a zero
@var{errbuf_size} to determine how large @var{errbuf} need be to
accommodate @code{regerror}'s error string.
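
As a rough illustration of the two-call technique just described, the
following sketch first asks @code{regerror} how much space it needs and
then fills a buffer of exactly that size. The helper name
@code{regex_error_message} is made up for this example and is not part
of Regex; the sketch also assumes a hosted C environment that provides
@code{malloc}.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Regex: return a malloc'd copy of
   the error string for ERRCODE, or NULL if memory runs out.  The
   caller frees the result.  */
char *
regex_error_message (int errcode, const regex_t *preg)
{
  /* First call: null buffer and zero size, only to learn how many
     bytes the string needs, terminating null included.  */
  size_t needed = regerror (errcode, preg, NULL, 0);
  char *errbuf = malloc (needed);

  /* Second call: the buffer is large enough, so nothing is cut off.  */
  if (errbuf != NULL)
    regerror (errcode, preg, errbuf, needed);

  return errbuf;
}
```

A caller would pass the nonzero value returned by @code{regcomp} or
@code{regexec}, together with the pattern buffer involved, then print
the string and free it.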

@node Using Byte Offsets, Freeing POSIX Pattern Buffers, Reporting Errors, POSIX Regex Functions
@subsection Using Byte Offsets

In @sc{posix}, variables of type @code{regmatch_t} hold information
analogous, but not identical, to that in @sc{gnu}'s registers
(@pxref{Using Registers}). To get information about registers in
@sc{posix}, pass to @code{regexec} a nonzero @var{pmatch}, i.e., the
address of an array of structures of type @code{regmatch_t}, defined in
@file{regex.h}:

@tindex regmatch_t
@example
typedef struct
@{
  regoff_t rm_so;
  regoff_t rm_eo;
@} regmatch_t;
@end example

When reading in @ref{Using Registers}, about how the matching function
stores the information into the registers, substitute @var{pmatch} for
@var{regs}, @code{@var{pmatch}[@var{i}].rm_so} for
@code{@w{@var{regs}->}start[@var{i}]} and
@code{@var{pmatch}[@var{i}].rm_eo} for
@code{@w{@var{regs}->}end[@var{i}]}.

@node Freeing POSIX Pattern Buffers, , Using Byte Offsets, POSIX Regex Functions
@subsection Freeing POSIX Pattern Buffers

To free any allocated fields of a pattern buffer, use:

@findex regfree
@example
void
regfree (regex_t *@var{preg})
@end example

@noindent
@var{preg} is the pattern buffer whose allocated fields you want freed.
@code{regfree} also sets @var{preg}'s @code{allocated} and @code{used}
fields to zero. After freeing a pattern buffer, you must compile a
regular expression in it again (@pxref{POSIX Regular Expression
Compiling}) before passing it to the matching function (@pxref{POSIX
Matching}).
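
To tie the @sc{posix} functions together, here is a minimal sketch that
compiles a pattern, matches it, reads the byte offsets of the first
group out of @var{pmatch}, and frees the pattern buffer. The names
@code{first_group_offsets} and @code{NMATCH} are invented for this
example. The pattern is compiled with zero for @code{regcomp}'s flags,
on the assumption that this selects basic regular expression syntax
(@pxref{POSIX Regular Expression Compiling}).

```c
#include <regex.h>
#include <stddef.h>

/* Elements in pmatch: element 0 describes the whole match and
   element 1 the first group.  (NMATCH is a name made up here.)  */
#define NMATCH 2

/* Hypothetical helper, not part of Regex: compile PATTERN, match it
   against STRING, and store in *START and *END the byte offsets of
   the first group.  Return zero on a match; otherwise return the
   nonzero code from regcomp or regexec and leave the outputs alone.  */
int
first_group_offsets (const char *pattern, const char *string,
                     regoff_t *start, regoff_t *end)
{
  regex_t preg;
  regmatch_t pmatch[NMATCH];
  int status = regcomp (&preg, pattern, 0);

  if (status != 0)
    return status;

  status = regexec (&preg, string, NMATCH, pmatch, 0);
  if (status == 0)
    {
      /* pmatch[1] is a structure, not a pointer, hence `.'.  Both
         offsets are -1 if the pattern has no group.  */
      *start = pmatch[1].rm_so;
      *end = pmatch[1].rm_eo;
    }

  /* Free the buffer whether or not the match succeeded.  */
  regfree (&preg);
  return status;
}
```

For instance, with the pattern @samp{a\(b*\)c} and the string
@samp{abbc}, the group matches @samp{bb}, so the offsets should come
back as 1 and 3.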


@node BSD Regex Functions, , POSIX Regex Functions, Programming with Regex
@section BSD Regex Functions

If you're writing code that has to be Berkeley @sc{unix} compatible,
you'll need to use these functions, whose interfaces are the same as
those in Berkeley @sc{unix}.

@menu
* BSD Regular Expression Compiling::    re_comp ()
* BSD Searching::                       re_exec ()
@end menu

@node BSD Regular Expression Compiling, BSD Searching, , BSD Regex Functions
@subsection BSD Regular Expression Compiling

With Berkeley @sc{unix}, you can only search for a given regular
expression; you can't match one. To search for it, you must first
compile it. Before you compile it, you must indicate the regular
expression syntax according to which you want it compiled by setting the
variable @code{re_syntax_options} (declared in @file{regex.h}) to some
syntax (@pxref{Regular Expression Syntax}).

To compile a regular expression use:

@findex re_comp
@example
char *
re_comp (char *@var{regex})
@end example

@noindent
@var{regex} is the address of a null-terminated regular expression.
@code{re_comp} uses an internal pattern buffer, so you can use only the
most recently compiled pattern buffer. This means that if you want to
use a given regular expression that you've already compiled---but it
isn't the latest one you've compiled---you'll have to recompile it. If
you call @code{re_comp} with a null pointer (@emph{not} the empty
string) as the argument, it doesn't change the contents of the pattern
buffer.

If @code{re_comp} successfully compiles the regular expression, it
returns zero. If it can't compile the regular expression, it returns
an error string. @code{re_comp}'s error messages are identical to those
of @code{re_compile_pattern} (@pxref{GNU Regular Expression
Compiling}).

@node BSD Searching, , BSD Regular Expression Compiling, BSD Regex Functions
@subsection BSD Searching

Searching the Berkeley @sc{unix} way means searching in a string
starting at its first character and trying successive positions within
it to find a match. Once you've compiled a pattern using @code{re_comp}
(@pxref{BSD Regular Expression Compiling}), you can ask Regex
to search for that pattern in a string using:

@findex re_exec
@example
int
re_exec (char *@var{string})
@end example

@noindent
@var{string} is the address of the null-terminated string in which you
want to search.

@code{re_exec} returns either 1 for success or 0 for failure. It
automatically uses a @sc{gnu} fastmap (@pxref{Searching with Fastmaps}).


@node Copying, Index, Programming with Regex, Top
@appendix GNU GENERAL PUBLIC LICENSE
@center Version 2, June 1991

@display
Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
675 Mass Ave, Cambridge, MA 02139, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
@end display

@unnumberedsec Preamble

  The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software---to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price.
Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

  To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.

  We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

  Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

  Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

  The precise terms and conditions for copying, distribution and
modification follow.

@iftex
@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
@end iftex
@ifinfo
@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
@end ifinfo

@enumerate
@item
This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The ``Program'', below,
refers to any such program or work, and a ``work based on the Program''
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term ``modification''.) Each licensee is addressed as ``you''.

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

@item
You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

@item
You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

@enumerate a
@item
You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.

@item
You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.

@item
If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
@end enumerate

These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works.
But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

@item
You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

@enumerate a
@item
Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,

@item
Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,

@item
Accompany it with the information you received as to the offer
to distribute corresponding source code.
(This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
@end enumerate

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

@item
You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

@item
You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works.
These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

@item
Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

@item
If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

@item
If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.

@item
The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and ``any
later version'', you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation.
If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

@item
If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

@iftex
@heading NO WARRANTY
@end iftex
@ifinfo
@center NO WARRANTY
@end ifinfo

@item
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

@item
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
@end enumerate

@iftex
@heading END OF TERMS AND CONDITIONS
@end iftex
@ifinfo
@center END OF TERMS AND CONDITIONS
@end ifinfo

@page
@unnumberedsec Appendix: How to Apply These Terms to Your New Programs

  If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

  To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the ``copyright'' line and a pointer to where the full notice is found.

@smallexample
@var{one line to give the program's name and a brief idea of what it does.}
Copyright (C) 19@var{yy} @var{name of author}

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
@end smallexample

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

@smallexample
Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
@end smallexample

The hypothetical commands @samp{show w} and @samp{show c} should show
the appropriate parts of the General Public License. Of course, the
commands you use may be called something other than @samp{show w} and
@samp{show c}; they could even be mouse-clicks or menu items---whatever
suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a ``copyright disclaimer'' for the program, if
necessary. Here is a sample; alter the names:

@example
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

@var{signature of Ty Coon}, 1 April 1989
Ty Coon, President of Vice
@end example

This General Public License does not permit incorporating your program into
proprietary programs.
If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Library General
Public License instead of this License.


@node Index, , Copying, Top
@unnumbered Index

@printindex cp

@contents

@bye