<h1>TRE API reference manual</h1>

<h2>The <tt>regcomp()</tt> functions</h2>
<a name="regcomp"></a>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font>
<font class="func">regcomp</font>(<font
class="type">regex_t</font> *<font class="arg">preg</font>,
<font class="qual">const</font> <font class="type">char</font>
*<font class="arg">regex</font>, <font class="type">int</font>
<font class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regncomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">char</font> *<font class="arg">regex</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<font class="type">int</font> <font class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwcomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">wchar_t</font> *<font
class="arg">regex</font>, <font class="type">int</font> <font
class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwncomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">wchar_t</font> *<font
class="arg">regex</font>, <font class="type">size_t</font>
<font class="arg">len</font>, <font class="type">int</font>
<font class="arg">cflags</font>);
<br>
<font class="type">void</font> <font
class="func">regfree</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">regcomp</font>()</tt> function compiles
the regex string pointed to by <tt><font
class="arg">regex</font></tt> to an internal representation and
stores the result in the pattern buffer structure pointed to by
<tt><font class="arg">preg</font></tt>.  The <tt><font
class="func">regncomp</font>()</tt> function is like <tt><font
class="func">regcomp</font>()</tt>, but <tt><font
class="arg">regex</font></tt> is not terminated with the null
byte.  Instead, the <tt><font class="arg">len</font></tt> argument
is used to give the length of the string, and the string may contain
null bytes.  The <tt><font class="func">regwcomp</font>()</tt> and
<tt><font class="func">regwncomp</font>()</tt> functions work like
<tt><font class="func">regcomp</font>()</tt> and <tt><font
class="func">regncomp</font>()</tt>, respectively, but take a
wide-character (<tt><font class="type">wchar_t</font></tt>) string
instead of a byte string.
</p>

<p>
The <tt><font class="arg">cflags</font></tt> argument is the
bitwise inclusive OR of zero or more of the following flags (defined
in the header <tt>&lt;tre/regex.h&gt;</tt>):
</p>

<blockquote>
<dl>
<dt><tt>REG_EXTENDED</tt></dt>
<dd>Use POSIX Extended Regular Expression (ERE) compatible syntax when
compiling <tt><font class="arg">regex</font></tt>.  The default
syntax is the POSIX Basic Regular Expression (BRE) syntax, but it is
considered obsolete.</dd>

<dt><tt>REG_ICASE</tt></dt>
<dd>Ignore case.  Subsequent searches with the <a
href="#regexec"><tt>regexec</tt></a> family of functions using this
pattern buffer will be case insensitive.</dd>

<dt><tt>REG_NOSUB</tt></dt>
<dd>Do not report submatches.  Subsequent searches with the <a
href="#regexec"><tt>regexec</tt></a> family of functions will only
report whether a match was found or not and will not fill the submatch
array.</dd>

<dt><tt>REG_NEWLINE</tt></dt>
<dd>Normally the newline character is treated as an ordinary
character.  When this flag is used, the newline character
(<tt>'\n'</tt>, ASCII code 10) is treated specially as follows:
<ol>
<li>The match-any-character operator (dot <tt>"."</tt> outside a
bracket expression) does not match a newline.</li>
<li>A non-matching list (<tt>[^...]</tt>) not containing a newline
does not match a newline.</li>
<li>The match-beginning-of-line operator <tt>^</tt> matches the empty
string immediately after a newline as well as the empty string at the
beginning of the string (but see the <code>REG_NOTBOL</code>
<code>regexec()</code> flag below).</li>
<li>The match-end-of-line operator <tt>$</tt> matches the empty
string immediately before a newline as well as the empty string at the
end of the string (but see the <code>REG_NOTEOL</code>
<code>regexec()</code> flag below).</li>
</ol>
</dd>

<dt><tt>REG_LITERAL</tt></dt>
<dd>Interpret the entire <tt><font class="arg">regex</font></tt>
argument as a literal string, that is, all characters will be
considered ordinary.  This is a nonstandard extension, compatible with
but not specified by POSIX.</dd>

<dt><tt>REG_NOSPEC</tt></dt>
<dd>Same as <tt>REG_LITERAL</tt>.  This flag is provided for
compatibility with BSD.</dd>

<dt><tt>REG_RIGHT_ASSOC</tt></dt>
<dd>By default, concatenation is left associative in TRE, as per
the grammar given in the <a
href="http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html">base
specifications on regular expressions</a> of Std 1003.1-2001 (POSIX).
This flag flips associativity of concatenation to right associative.
Associativity can have an effect on how a match is divided into
submatches, but does not change what is matched by the entire regexp.
</dd>

<dt><tt>REG_UNGREEDY</tt></dt>
<dd>By default, repetition operators are greedy in TRE as per Std 1003.1-2001 (POSIX) and
can be forced to be non-greedy by appending a <tt>?</tt> character. This flag reverses this behavior
by making the operators non-greedy by default and greedy when a <tt>?</tt> is specified.</dd>
</dl>
</blockquote>
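<p>
The effect of <tt>REG_NEWLINE</tt> and <tt>REG_ICASE</tt> can be
sketched with a small helper.  This is only an illustration: it assumes
the system POSIX header <tt>&lt;regex.h&gt;</tt>, which provides the
same standard flags (with TRE the include would be
<tt>&lt;tre/regex.h&gt;</tt>), and the helper name
<tt>pattern_matches</tt> is hypothetical.
</p>

```c
#include <regex.h>   /* system POSIX header; with TRE use <tre/regex.h> */
#include <stddef.h>

/* Hypothetical helper: returns 1 if pattern matches text, 0 if it does
   not, and -1 if the pattern fails to compile. */
int pattern_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed here. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);   /* release the pattern buffer once it is no longer needed */
    return rc == 0 ? 1 : 0;
}
```

<p>
Under these assumptions,
<tt>pattern_matches("^bar$", REG_EXTENDED | REG_NEWLINE, "foo\nbar")</tt>
reports a match, while the same call without <tt>REG_NEWLINE</tt> does
not, because <tt>^</tt> then matches only at the start of the string.
</p>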

<p>
After a successful call to <tt><font class="func">regcomp</font></tt>, the
<tt><font class="arg">preg</font></tt> pattern buffer can be used to
search for matches in strings (see below).  Once the pattern buffer is no
longer needed, it should be released with <tt><font
class="func">regfree</font></tt>, which frees the memory allocated for it.
</p>


<p>
The <tt><font class="type">regex_t</font></tt> structure has the
following fields that the application can read:
</p>
<blockquote>
<dl>
<dt><tt><font class="type">size_t</font> <font
class="arg">re_nsub</font></tt></dt>
<dd>Number of parenthesized subexpressions in <tt><font
class="arg">regex</font></tt>.
</dd>
</dl>
</blockquote>

<p>
The <tt><font class="func">regcomp</font></tt> function returns
zero if the compilation was successful, or one of the following error
codes if there was an error:
</p>
<blockquote>
<dl>
<dt><tt>REG_BADPAT</tt></dt>
<dd>Invalid regexp.  TRE returns this only if a multibyte character
set is used in the current locale, and <tt><font
class="arg">regex</font></tt> contained an invalid multibyte
sequence.</dd>
<dt><tt>REG_ECOLLATE</tt></dt>
<dd>Invalid collating element referenced.  TRE returns this whenever
equivalence classes or multicharacter collating elements are used in
bracket expressions (they are not supported yet).</dd>
<dt><tt>REG_ECTYPE</tt></dt>
<dd>Unknown character class name in <tt>[[:<i>name</i>:]]</tt>.</dd>
<dt><tt>REG_EESCAPE</tt></dt>
<dd>The last character of <tt><font class="arg">regex</font></tt>
was a backslash (<tt>\</tt>).</dd>
<dt><tt>REG_ESUBREG</tt></dt>
<dd>Invalid back reference; number in <tt>\<i>digit</i></tt>
invalid.</dd>
<dt><tt>REG_EBRACK</tt></dt>
<dd><tt>[]</tt> imbalance.</dd>
<dt><tt>REG_EPAREN</tt></dt>
<dd><tt>\(\)</tt> or <tt>()</tt> imbalance.</dd>
<dt><tt>REG_EBRACE</tt></dt>
<dd><tt>\{\}</tt> or <tt>{}</tt> imbalance.</dd>
<dt><tt>REG_BADBR</tt></dt>
<dd><tt>{}</tt> content invalid: not a number, more than two numbers,
first larger than second, or number too large.</dd>
<dt><tt>REG_ERANGE</tt></dt>
<dd>Invalid character range, e.g. ending point is earlier in the
collating order than the starting point.</dd>
<dt><tt>REG_ESPACE</tt></dt>
<dd>Out of memory, or an internal limit exceeded.</dd>
<dt><tt>REG_BADRPT</tt></dt>
<dd>Invalid use of repetition operators: two or more repetition operators have
been chained in an undefined way.</dd>
</dl>
</blockquote>
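<p>
One way to turn these return codes into diagnostics is a plain
<tt>switch</tt> over the constants listed above.  The sketch below
assumes the system POSIX header <tt>&lt;regex.h&gt;</tt>, which defines
the same standard error codes; the function names
<tt>describe_regcomp_error</tt> and <tt>compile_status</tt> and the
message strings are hypothetical.
</p>

```c
#include <regex.h>   /* system POSIX header; with TRE use <tre/regex.h> */

/* Hypothetical helper: map a regcomp() return value to a short message. */
const char *describe_regcomp_error(int code)
{
    switch (code) {
    case 0:            return "success";
    case REG_BADPAT:   return "invalid regexp";
    case REG_ECOLLATE: return "invalid collating element";
    case REG_ECTYPE:   return "unknown character class name";
    case REG_EESCAPE:  return "trailing backslash";
    case REG_ESUBREG:  return "invalid back reference";
    case REG_EBRACK:   return "unbalanced brackets";
    case REG_EPAREN:   return "unbalanced parentheses";
    case REG_EBRACE:   return "unbalanced braces";
    case REG_BADBR:    return "invalid repetition count";
    case REG_ERANGE:   return "invalid character range";
    case REG_ESPACE:   return "out of memory";
    case REG_BADRPT:   return "invalid use of repetition operators";
    default:           return "unknown error code";
    }
}

/* Hypothetical helper: compile a pattern only to learn the status code.
   The pattern buffer is freed again if compilation succeeded. */
int compile_status(const char *pattern, int cflags)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);

    if (rc == 0)
        regfree(&re);
    return rc;
}
```

<p>
A caller could then report a failed compilation with
<tt>describe_regcomp_error(compile_status(pattern, REG_EXTENDED))</tt>.
</p>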


<h2>The <tt>regexec()</tt> functions</h2>
<a name="regexec"></a>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font> <font
class="func">regexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font
class="arg">nmatch</font>,
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regnexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<font class="type">size_t</font> <font
class="arg">nmatch</font>, <font class="type">regmatch_t</font>
<font class="arg">pmatch</font>[], <font
class="type">int</font> <font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font
class="arg">nmatch</font>,
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwnexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<font class="type">size_t</font> <font
class="arg">nmatch</font>, <font class="type">regmatch_t</font>
<font class="arg">pmatch</font>[], <font
class="type">int</font> <font class="arg">eflags</font>);
</code>
</div>

<p>
The <tt><font class="func">regexec</font>()</tt> function matches
the null-terminated string against the compiled regexp <tt><font
class="arg">preg</font></tt>, initialized by a previous call to
any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions.  The
<tt><font class="func">regnexec</font>()</tt> function is like
<tt><font class="func">regexec</font>()</tt>, but <tt><font
class="arg">string</font></tt> is not terminated with a null byte.
Instead, the <tt><font class="arg">len</font></tt> argument is used
to give the length of the string, and the string may contain null
bytes.  The <tt><font class="func">regwexec</font>()</tt> and
<tt><font class="func">regwnexec</font>()</tt> functions work like
<tt><font class="func">regexec</font>()</tt> and <tt><font
class="func">regnexec</font>()</tt>, respectively, but take a wide
character (<tt><font class="type">wchar_t</font></tt>) string
instead of a byte string. The <tt><font
class="arg">eflags</font></tt> argument is a bitwise OR of zero or
more of the following flags:
</p>
<blockquote>
<dl>
<dt><code>REG_NOTBOL</code></dt>
<dd>
<p>
When this flag is used, the match-beginning-of-line operator
<tt>^</tt> does not match the empty string at the beginning of
<tt><font class="arg">string</font></tt>.  If
<code>REG_NEWLINE</code> was used when compiling
<tt><font class="arg">preg</font></tt>, the empty string
immediately after a newline character will still be matched.
</p>
</dd>

<dt><code>REG_NOTEOL</code></dt>
<dd>
<p>
When this flag is used, the match-end-of-line operator
<tt>$</tt> does not match the empty string at the end of
<tt><font class="arg">string</font></tt>.  If
<code>REG_NEWLINE</code> was used when compiling
<tt><font class="arg">preg</font></tt>, the empty string
immediately before a newline character will still be matched.
</p>
</dd>

</dl>

<p>
These flags are useful when different portions of a string are passed
to <code>regexec</code> and the beginning or end of the partial string
should not be interpreted as the beginning or end of a line.
</p>

</blockquote>
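<p>
The partial-string case can be sketched as a loop that resumes the
search after each match.  This illustration assumes the system POSIX
header <tt>&lt;regex.h&gt;</tt>; the helper <tt>count_matches</tt> and
its <tt>use_notbol</tt> switch are hypothetical and exist only to show
the difference the flag makes.
</p>

```c
#include <regex.h>   /* system POSIX header; with TRE use <tre/regex.h> */

/* Hypothetical helper: count successive matches of pattern in text.
   When use_notbol is nonzero, every search after the first passes
   REG_NOTBOL so that the resume point is not taken for a line start.
   Returns -1 if the pattern fails to compile. */
int count_matches(const char *pattern, const char *text, int use_notbol)
{
    regex_t re;
    regmatch_t m[1];
    const char *p = text;
    int eflags = 0;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, m, eflags) == 0) {
        count++;
        if (m[0].rm_eo > 0)
            p += m[0].rm_eo;      /* resume right after this match */
        else if (*p != '\0')
            p++;                  /* empty match at p: step over one byte */
        else
            break;                /* empty match at end of input */
        if (use_notbol)
            eflags = REG_NOTBOL;  /* p no longer points at a real line start */
    }

    regfree(&re);
    return count;
}
```

<p>
With the pattern <tt>^ab</tt> and the text <tt>abab</tt>, the loop finds
two matches when <tt>use_notbol</tt> is zero, because the second search
starts at a position that looks like the beginning of a string, and only
one when <tt>REG_NOTBOL</tt> is passed.
</p>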

<p>
If <code>REG_NOSUB</code> was used when compiling <tt><font
class="arg">preg</font></tt>, <tt><font
class="arg">nmatch</font></tt> is zero, or <tt><font
class="arg">pmatch</font></tt> is <code>NULL</code>, then the
<tt><font class="arg">pmatch</font></tt> argument is ignored.
Otherwise, the submatches corresponding to the parenthesized
subexpressions are filled in the elements of <tt><font
class="arg">pmatch</font></tt>, which must be dimensioned to have
at least <tt><font class="arg">nmatch</font></tt> elements.
</p>

<p>
The <tt><font class="type">regmatch_t</font></tt> structure contains
at least the following fields:
</p>
<blockquote>
<dl>
<dt><tt><font class="type">regoff_t</font> <font
class="arg">rm_so</font></tt></dt>
<dd>Offset from start of <tt><font class="arg">string</font></tt> to start of
substring.  </dd>
<dt><tt><font class="type">regoff_t</font> <font
class="arg">rm_eo</font></tt></dt>
<dd>Offset from start of <tt><font class="arg">string</font></tt> to the first
character after the substring.  </dd>
</dl>
</blockquote>

<p>
The length of a submatch can be computed by subtracting <code>rm_so</code>
from <code>rm_eo</code>.  If a parenthesized subexpression did not participate
in a match, the <code>rm_so</code> and <code>rm_eo</code> fields for the
corresponding <code>pmatch</code> element are set to <code>-1</code>.  Note
that when a multibyte character set is in effect, the submatch offsets are
given as byte offsets, not character offsets.
</p>

<p>
The <code>regexec()</code> functions return zero if a match was found,
otherwise they return <code>REG_NOMATCH</code> to indicate no match,
or <code>REG_ESPACE</code> to indicate that enough temporary memory
could not be allocated to complete the matching operation.
</p>
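<p>
Reading a submatch out of <tt>pmatch</tt> might look roughly like the
following.  The sketch assumes the system POSIX header
<tt>&lt;regex.h&gt;</tt>; the helper <tt>extract_group</tt> and its
fixed limit of ten <tt>pmatch</tt> elements are hypothetical choices
made to keep the example short.
</p>

```c
#include <regex.h>   /* system POSIX header; with TRE use <tre/regex.h> */
#include <string.h>

/* Hypothetical helper: copy the text of submatch `group` (0 is the whole
   match) into out as a null-terminated string.  Returns the submatch
   length, or -1 on a compile error, no match, a group that did not
   participate in the match, or a buffer that is too small. */
int extract_group(const char *pattern, const char *text, size_t group,
                  char *out, size_t outsz)
{
    enum { MAX_GROUPS = 10 };
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int len = -1;

    if (group >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (group <= re.re_nsub
        && regexec(&re, text, MAX_GROUPS, pm, 0) == 0
        && pm[group].rm_so != -1) {
        /* Submatch length is rm_eo minus rm_so. */
        size_t n = (size_t)(pm[group].rm_eo - pm[group].rm_so);
        if (n < outsz) {
            memcpy(out, text + pm[group].rm_so, n);
            out[n] = '\0';
            len = (int)n;
        }
    }

    regfree(&re);
    return len;
}
```

<p>
For the pattern <tt>(a)|(b)</tt> matched against <tt>b</tt>, group 1
does not participate, so its offsets are <tt>-1</tt> and the helper
returns <tt>-1</tt>, while group 2 yields the one-character string
<tt>b</tt>.
</p>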



<h3>reguexec()</h3>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="qual">typedef struct</font> {
<br>
&nbsp;&nbsp;<font class="type">int</font> (*get_next_char)(<font
class="type">tre_char_t</font> *<font class="arg">c</font>, <font
class="type">unsigned int</font> *<font class="arg">pos_add</font>,
<font class="type">void</font> *<font class="arg">context</font>);
<br>
&nbsp;&nbsp;<font class="type">void</font> (*rewind)(<font
class="type">size_t</font> <font class="arg">pos</font>, <font
class="type">void</font> *<font class="arg">context</font>);
<br>
&nbsp;&nbsp;<font class="type">int</font> (*compare)(<font
class="type">size_t</font> <font class="arg">pos1</font>, <font
class="type">size_t</font> <font class="arg">pos2</font>, <font
class="type">size_t</font> <font class="arg">len</font>, <font
class="type">void</font> *<font class="arg">context</font>);
<br>
&nbsp;&nbsp;<font class="type">void</font> *<font
class="arg">context</font>;
<br>
} <font class="type">tre_str_source</font>;
<br>
<br>
<font class="type">int</font> <font
class="func">reguexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">tre_str_source</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">nmatch</font>,
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font class="arg">eflags</font>);
</code>
</div>

<p>
The <tt><font class="func">reguexec</font>()</tt> function works just
like the other <tt>regexec()</tt> functions, except that the input
string is read from user specified callback functions instead of a
character array.  This makes it possible, for example, to match
regexps over arbitrary user specified data structures.
</p>

<p>
The <tt><font class="type">tre_str_source</font></tt> structure
contains the following fields:
</p>
<blockquote>
<dl>
<dt><tt>get_next_char</tt></dt>
<dd>This function must retrieve the next available character.  If a
character is not available, the space pointed to by
<tt><font class="arg">c</font></tt> must be set to zero and it must return
a nonzero value.  If a character is available, it must be stored
to the space pointed to by
<tt><font class="arg">c</font></tt>, and the integer pointed to by
<tt><font class="arg">pos_add</font></tt> must be set to the
number of units advanced in the input (the value must be
<tt>&gt;=1</tt>), and zero must be returned.</dd>

<dt><tt>rewind</tt></dt>
<dd>This function must rewind the input stream to the position
specified by <tt><font class="arg">pos</font></tt>.  Unless the regexp
uses back references, <tt>rewind</tt> is not needed and can be set to
<tt>NULL</tt>.</dd>

<dt><tt>compare</tt></dt>
<dd>This function compares two substrings in the input stream
starting at the positions specified by <tt><font
class="arg">pos1</font></tt> and <tt><font
class="arg">pos2</font></tt> of length <tt><font
class="arg">len</font></tt>.  If the substrings are equal,
<tt>compare</tt> must return zero, otherwise a nonzero value must be
returned.  Unless the regexp uses back references, <tt>compare</tt> is
not needed and can be set to <tt>NULL</tt>.</dd>

<dt><tt>context</tt></dt>
<dd>This is a context variable, passed as the last argument to
all of the above functions for keeping track of the internal state of
the user's code.</dd>

</dl>
</blockquote>

<p>
The position in the input stream is measured in <tt><font
class="type">size_t</font></tt> units.  The current position is the
sum of the increments returned through <tt><font
class="arg">pos_add</font></tt> (plus the position of the last
<tt>rewind</tt>, if any).  The starting position is zero.  Submatch
positions filled in the <tt><font class="arg">pmatch</font>[]</tt>
array are given using positions computed in this way.
</p>

<p>
For an example of how to use <tt>reguexec()</tt>, see the
<tt>tests/test-str-source.c</tt> file in the TRE source code
distribution.
</p>
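<p>
The callback contract described above can also be sketched without the
library.  The code below is a stand-in, not TRE's own example: it
compiles without TRE installed, so <tt>demo_char_t</tt> takes the place
of <tt>tre_char_t</tt>, and the <tt>buf_ctx</tt> context and the
<tt>buf_*</tt> function names are hypothetical.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for tre_char_t so that the sketch compiles without TRE. */
typedef unsigned char demo_char_t;

/* Hypothetical context: a byte buffer plus a read cursor. */
typedef struct {
    const char *data;
    size_t len;
    size_t pos;
} buf_ctx;

/* get_next_char: store the next character and report one unit advanced;
   at end of input store zero and return nonzero. */
int buf_get_next_char(demo_char_t *c, unsigned int *pos_add, void *context)
{
    buf_ctx *ctx = context;

    if (ctx->pos >= ctx->len) {
        *c = 0;
        return 1;
    }
    *c = (demo_char_t)ctx->data[ctx->pos++];
    *pos_add = 1;
    return 0;
}

/* rewind: move the cursor back to an absolute position. */
void buf_rewind(size_t pos, void *context)
{
    buf_ctx *ctx = context;
    ctx->pos = pos;
}

/* compare: zero if the two substrings of length len are equal,
   nonzero otherwise (including ranges that fall outside the buffer). */
int buf_compare(size_t pos1, size_t pos2, size_t len, void *context)
{
    buf_ctx *ctx = context;

    if (pos1 > ctx->len || len > ctx->len - pos1
        || pos2 > ctx->len || len > ctx->len - pos2)
        return 1;
    return memcmp(ctx->data + pos1, ctx->data + pos2, len) != 0;
}
```

<p>
With TRE itself, functions of this shape (taking <tt>tre_char_t *</tt>
as the first argument of <tt>get_next_char</tt>) would be assigned to
the corresponding fields of a <tt>tre_str_source</tt>, with a pointer to
the context stored in <tt>context</tt>.
</p>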

<h2>The approximate matching functions</h2>
<a name="regaexec"></a>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="qual">typedef struct</font> {<br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">cost_ins</font>;<br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">cost_del</font>;<br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">cost_subst</font>;<br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">max_cost</font>;<br><br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">max_ins</font>;<br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">max_del</font>;<br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">max_subst</font>;<br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">max_err</font>;<br>
} <font class="type">regaparams_t</font>;<br>
<br>
<font class="qual">typedef struct</font> {<br>
&nbsp;&nbsp;<font class="type">size_t</font>
<font class="arg">nmatch</font>;<br>
&nbsp;&nbsp;<font class="type">regmatch_t</font>
*<font class="arg">pmatch</font>;<br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">cost</font>;<br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">num_ins</font>;<br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">num_del</font>;<br>
&nbsp;&nbsp;<font class="type">int</font>
<font class="arg">num_subst</font>;<br>
} <font class="type">regamatch_t</font>;<br>
<br>
<font class="type">int</font> <font
class="func">regaexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">reganexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font> <font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regawexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font>
<font class="func">regawnexec</font>(
<font class="qual">const</font>
<font class="type">regex_t</font>
*<font class="arg">preg</font>,
<font class="qual">const</font>
<font class="type">wchar_t</font>
*<font class="arg">string</font>,
<font class="type">size_t</font>
<font class="arg">len</font>,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">regaexec</font>()</tt> function searches for
the best match in <tt><font class="arg">string</font></tt>
against the compiled regexp <tt><font
class="arg">preg</font></tt>, initialized by a previous call to
any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions.
</p>

<p>
The <tt><font class="func">reganexec</font>()</tt> function is like
<tt><font class="func">regaexec</font>()</tt>, but <tt><font
class="arg">string</font></tt> is not terminated by a null byte.
Instead, the <tt><font class="arg">len</font></tt> argument is used to
tell the length of the string, and the string may contain null
bytes. The <tt><font class="func">regawexec</font>()</tt> and
<tt><font class="func">regawnexec</font>()</tt> functions work like
<tt><font class="func">regaexec</font>()</tt> and <tt><font
class="func">reganexec</font>()</tt>, respectively, but take a wide
character (<tt><font class="type">wchar_t</font></tt>) string instead
of a byte string.
</p>

<p>
The <tt><font class="arg">eflags</font></tt> argument has the same
meaning as for the <tt>regexec()</tt> functions.
</p>

<p>
The <tt><font class="arg">params</font></tt> struct controls the
approximate matching parameters:
</p>
<blockquote>
<dl>
  <dt><tt><font class="type">int</font></tt>
      <tt><font class="arg">cost_ins</font></tt></dt>
  <dd>The default cost of an inserted character, that is, an extra
      character in <tt><font class="arg">string</font></tt>.</dd>

  <dt><tt><font class="type">int</font></tt>
      <tt><font class="arg">cost_del</font></tt></dt>
  <dd>The default cost of a deleted character, that is, a character
      missing from <tt><font class="arg">string</font></tt>.</dd>

  <dt><tt><font class="type">int</font></tt>
      <tt><font class="arg">cost_subst</font></tt></dt>
  <dd>The default cost of a substituted character.</dd>

  <dt><tt><font class="type">int</font></tt>
      <tt><font class="arg">max_cost</font></tt></dt>
  <dd>The maximum allowed cost of a match.  If this is set to zero,
      an exact match is searched for, and the results are equivalent
      to those of the <tt>regexec()</tt> functions.</dd>

  <dt><tt><font class="type">int</font></tt>
      <tt><font class="arg">max_ins</font></tt></dt>
  <dd>Maximum allowed number of inserted characters.</dd>

  <dt><tt><font class="type">int</font></tt>
      <tt><font class="arg">max_del</font></tt></dt>
  <dd>Maximum allowed number of deleted characters.</dd>

  <dt><tt><font class="type">int</font></tt>
      <tt><font class="arg">max_subst</font></tt></dt>
  <dd>Maximum allowed number of substituted characters.</dd>

  <dt><tt><font class="type">int</font></tt>
      <tt><font class="arg">max_err</font></tt></dt>
  <dd>Maximum allowed number of errors (inserts + deletes +
      substitutes).</dd>
</dl>
</blockquote>

<p>
The <tt><font class="arg">match</font></tt> argument points to a
<tt><font class="type">regamatch_t</font></tt> structure.  The
<tt><font class="arg">nmatch</font></tt> and <tt><font
class="arg">pmatch</font></tt> fields must be filled in by the caller.  If
<code>REG_NOSUB</code> was used when compiling the regexp, or
<code>match-&gt;nmatch</code> is zero, or
<code>match-&gt;pmatch</code> is <code>NULL</code>, the
<code>match-&gt;pmatch</code> field is ignored.  Otherwise, the
submatches corresponding to the parenthesized subexpressions are
filled in the elements of <code>match-&gt;pmatch</code>, which must be
dimensioned to have at least <code>match-&gt;nmatch</code> elements.
The <code>match-&gt;cost</code> field is set to the cost of the match
found, and the <code>match-&gt;num_ins</code>,
<code>match-&gt;num_del</code>, and <code>match-&gt;num_subst</code>
fields are set to the number of inserts, deletes, and substitutes in
the match, respectively.
</p>

<p>
The <tt>regaexec()</tt> functions return zero if a match with cost
no greater than <code>params.max_cost</code> was found, otherwise
they return <code>REG_NOMATCH</code> to indicate no match, or
<code>REG_ESPACE</code> to indicate that enough temporary memory could
not be allocated to complete the matching operation.
</p>
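<p>
How the cost and limit fields might interact can be sketched in plain
C.  This is one reading of the field descriptions above, not TRE's
matcher: <tt>demo_aparams</tt> is a local stand-in that mirrors the
documented <tt>regaparams_t</tt> fields so the code compiles without
TRE, and the cost model (every edit charged its default cost) and the
function names are assumptions.
</p>

```c
/* Stand-in mirroring the regaparams_t fields documented above, so that
   the sketch compiles without TRE installed. */
typedef struct {
    int cost_ins;
    int cost_del;
    int cost_subst;
    int max_cost;
    int max_ins;
    int max_del;
    int max_subst;
    int max_err;
} demo_aparams;

/* Assumed cost model: each edit is charged its default cost. */
int demo_cost(const demo_aparams *p, int ins, int del, int subst)
{
    return ins * p->cost_ins + del * p->cost_del + subst * p->cost_subst;
}

/* Returns 1 if the edit counts satisfy every limit in p, 0 otherwise. */
int demo_within_limits(const demo_aparams *p, int ins, int del, int subst)
{
    return demo_cost(p, ins, del, subst) <= p->max_cost
        && ins <= p->max_ins
        && del <= p->max_del
        && subst <= p->max_subst
        && ins + del + subst <= p->max_err;
}
```

<p>
With costs of 1, 1, and 2 for inserts, deletes, and substitutes and a
<tt>max_cost</tt> of 2, one insert plus one delete stays within the
budget, while one insert plus one substitute (cost 3) does not.
</p>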

<h2>Miscellaneous</h2>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font> <font
class="func">tre_have_backrefs</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font class="arg">preg</font>);
<br>
<font class="type">int</font> <font
class="func">tre_have_approx</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font class="arg">preg</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">tre_have_backrefs</font>()</tt> and
<tt><font class="func">tre_have_approx</font>()</tt> functions return
1 if the compiled pattern has back references or uses approximate
matching, respectively, and 0 if not.
</p>


<h2>Checking build time options</h2>

<a name="tre_config"></a>
<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">char</font> *<font
class="func">tre_version</font>(<font class="type">void</font>);
<br>
<font class="type">int</font> <font
class="func">tre_config</font>(<font class="type">int</font> <font
class="arg">query</font>, <font class="type">void</font> *<font
class="arg">result</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">tre_config</font>()</tt> function can be
used to retrieve information about which optional features have been
compiled into the TRE library and about other parameters that
may change between releases.
</p>

<p>
The <tt><font class="arg">query</font></tt> argument is an integer
specifying which information is requested.  The <tt><font
class="arg">result</font></tt> argument is a pointer to a variable
where the information is returned.  The return value of a call to
<tt><font class="func">tre_config</font>()</tt> is zero if <tt><font
class="arg">query</font></tt> was recognized, <tt>REG_NOMATCH</tt> otherwise.
</p>

<p>
The following values are recognized for <tt><font
class="arg">query</font></tt>:
</p>

<blockquote>
<dl>
<dt><tt>TRE_CONFIG_APPROX</tt></dt>
<dd>The result is an integer that is set to one if approximate
matching support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_WCHAR</tt></dt>
<dd>The result is an integer that is set to one if wide character
support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_MULTIBYTE</tt></dt>
<dd>The result is an integer that is set to one if multibyte character
set support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_SYSTEM_ABI</tt></dt>
<dd>The result is an integer that is set to one if TRE has been
compiled to be compatible with the system regex ABI, zero if not.</dd>
<dt><tt>TRE_CONFIG_VERSION</tt></dt>
<dd>The result is a pointer to a static character string that gives
the version of the TRE library.</dd>
</dl>
</blockquote>


<p>
The <tt><font class="func">tre_version</font>()</tt> function returns
a short human readable character string which shows the software name,
version, and license.
</p>

<h2>Preprocessor definitions</h2>

<p>The header <tt>&lt;tre/regex.h&gt;</tt> defines certain
C preprocessor symbols.</p>

<h3>Version information</h3>

<p>The following definitions may be useful for checking whether a new
enough version is being used.  Note that it is recommended to use the
<tt>pkg-config</tt> tool for version and other checks in Autoconf
scripts.</p>

<blockquote>
<dl>
<dt><tt>TRE_VERSION</tt></dt>
<dd>The version string. </dd>

<dt><tt>TRE_VERSION_1</tt></dt>
<dd>The major version number (first part of version string).</dd>

<dt><tt>TRE_VERSION_2</tt></dt>
<dd>The minor version number (second part of version string).</dd>

<dt><tt>TRE_VERSION_3</tt></dt>
<dd>The micro version number (third part of version string).</dd>

</dl>
</blockquote>
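<p>
A version check built on the three numeric parts could be sketched as
an ordinary three-way comparison.  The helper
<tt>version_at_least</tt> is hypothetical, and the sketch assumes that
<tt>TRE_VERSION_1</tt>, <tt>TRE_VERSION_2</tt>, and
<tt>TRE_VERSION_3</tt> expand to integer constants.
</p>

```c
/* Hypothetical helper: returns 1 if version major.minor.micro is at
   least want_major.want_minor.want_micro, 0 otherwise. */
int version_at_least(int major, int minor, int micro,
                     int want_major, int want_minor, int want_micro)
{
    if (major != want_major)
        return major > want_major;
    if (minor != want_minor)
        return minor > want_minor;
    return micro >= want_micro;
}
```

<p>
With the TRE header included, a call such as
<tt>version_at_least(TRE_VERSION_1, TRE_VERSION_2, TRE_VERSION_3, 0, 7, 5)</tt>
would test the version the program was compiled against; the numbers
0, 7, 5 are only placeholders.
</p>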

<h3>Features</h3>

<p>The following definitions may be useful for checking whether all
necessary features are enabled.  Use these only if compile time
checking suffices (linking statically with TRE).  When linking
dynamically, <a href="#tre_config"><tt>tre_config()</tt></a> should be used
instead.</p>

<blockquote>
<dl>
<dt><tt>TRE_APPROX</tt></dt>
<dd>This is defined if approximate matching support is enabled.  The
prototypes for approximate matching functions are defined only if
<tt>TRE_APPROX</tt> is defined.</dd>

<dt><tt>TRE_WCHAR</tt></dt>
<dd>This is defined if wide character support is enabled.  The
prototypes for wide character matching functions are defined only if
<tt>TRE_WCHAR</tt> is defined.</dd>

<dt><tt>TRE_MULTIBYTE</tt></dt>
<dd>This is defined if multibyte character set support is enabled.
If this is not set, any locale settings are ignored, and the default
locale is used when parsing regexps and matching strings.</dd>

</dl>
</blockquote>
