<html>
<head>
<title>pcrebuild specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrebuild man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
<li><a name="TOC2" href="#SEC2">BUILDING 8-BIT and 16-BIT LIBRARIES</a>
<li><a name="TOC3" href="#SEC3">BUILDING SHARED AND STATIC LIBRARIES</a>
<li><a name="TOC4" href="#SEC4">C++ SUPPORT</a>
<li><a name="TOC5" href="#SEC5">UTF-8 and UTF-16 SUPPORT</a>
<li><a name="TOC6" href="#SEC6">UNICODE CHARACTER PROPERTY SUPPORT</a>
<li><a name="TOC7" href="#SEC7">JUST-IN-TIME COMPILER SUPPORT</a>
<li><a name="TOC8" href="#SEC8">CODE VALUE OF NEWLINE</a>
<li><a name="TOC9" href="#SEC9">WHAT \R MATCHES</a>
<li><a name="TOC10" href="#SEC10">POSIX MALLOC USAGE</a>
<li><a name="TOC11" href="#SEC11">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC12" href="#SEC12">AVOIDING EXCESSIVE STACK USAGE</a>
<li><a name="TOC13" href="#SEC13">LIMITING PCRE RESOURCE USAGE</a>
<li><a name="TOC14" href="#SEC14">CREATING CHARACTER TABLES AT BUILD TIME</a>
<li><a name="TOC15" href="#SEC15">USING EBCDIC CODE</a>
<li><a name="TOC16" href="#SEC16">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
<li><a name="TOC17" href="#SEC17">PCREGREP BUFFER SIZE</a>
<li><a name="TOC18" href="#SEC18">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
<li><a name="TOC19" href="#SEC19">SEE ALSO</a>
<li><a name="TOC20" href="#SEC20">AUTHOR</a>
<li><a name="TOC21" href="#SEC21">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
<P>
This document describes the optional features of PCRE that can be selected when
the library is compiled. It assumes use of the <b>configure</b> script, where
the optional features are selected or deselected by providing options to
<b>configure</b> before running the <b>make</b> command. However, the same
options can be selected in both Unix-like and non-Unix-like environments using
the GUI facility of <b>cmake-gui</b> if you are using <b>CMake</b> instead of
<b>configure</b> to build PCRE.
</P>
<P>
There is a lot more information about building PCRE in non-Unix-like
environments in the file called <i>NON_UNIX_USE</i>, which is part of the PCRE
distribution. You should consult this file as well as the <i>README</i> file if
you are building in a non-Unix-like environment.
</P>
<P>
The complete list of options for <b>configure</b> (which includes the standard
ones such as the selection of the installation directory) can be obtained by
running
<pre>
  ./configure --help
</pre>
The following sections include descriptions of options whose names begin with
--enable or --disable. These settings specify changes to the defaults for the
<b>configure</b> command. Because of the way that <b>configure</b> works,
--enable and --disable always come in pairs, so the complementary option always
exists as well, but as it specifies the default, it is not described.
</P>
<br><a name="SEC2" href="#TOC1">BUILDING 8-BIT and 16-BIT LIBRARIES</a><br>
<P>
By default, a library called <b>libpcre</b> is built, containing functions that
take string arguments contained in vectors of bytes, either as single-byte
characters, or interpreted as UTF-8 strings. You can also build a separate
library, called <b>libpcre16</b>, in which strings are contained in vectors of
16-bit data units and interpreted either as single-unit characters or UTF-16
strings, by adding
<pre>
  --enable-pcre16
</pre>
to the <b>configure</b> command.
If you do not want the 8-bit library, add
<pre>
  --disable-pcre8
</pre>
as well. At least one of the two libraries must be built. Note that the C++ and
POSIX wrappers are for the 8-bit library only, and that <b>pcregrep</b> is an
8-bit program. None of these are built if you select only the 16-bit library.
</P>
<br><a name="SEC3" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
<P>
The PCRE building process uses <b>libtool</b> to build both shared and static
Unix libraries by default. You can suppress one of these by adding one of
<pre>
  --disable-shared
  --disable-static
</pre>
to the <b>configure</b> command, as required.
</P>
<br><a name="SEC4" href="#TOC1">C++ SUPPORT</a><br>
<P>
By default, if the 8-bit library is being built, the <b>configure</b> script
will search for a C++ compiler and C++ header files. If it finds them, it
automatically builds the C++ wrapper library (which supports only 8-bit
strings). You can disable this by adding
<pre>
  --disable-cpp
</pre>
to the <b>configure</b> command.
</P>
<br><a name="SEC5" href="#TOC1">UTF-8 and UTF-16 SUPPORT</a><br>
<P>
To build PCRE with support for UTF Unicode character strings, add
<pre>
  --enable-utf
</pre>
to the <b>configure</b> command. This setting applies to both libraries, adding
support for UTF-8 to the 8-bit library and support for UTF-16 to the 16-bit
library. There are no separate options for enabling UTF-8 and UTF-16
independently because that would allow ridiculous settings such as requesting
UTF-16 support while building only the 8-bit library. It is not possible to
build one library with UTF support and the other without in the same
configuration. (For backwards compatibility, --enable-utf8 is a synonym of
--enable-utf.)
</P>
<P>
Of itself, this setting does not make PCRE treat strings as UTF-8 or UTF-16.
As well as compiling PCRE with this option, you also have to set the
PCRE_UTF8 or PCRE_UTF16 option when you call one of the pattern compiling
functions.
</P>
<P>
If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
its input to be either ASCII or UTF-8 (depending on the run-time option). It is
not possible to support both EBCDIC and UTF-8 codes in the same version of the
library. Consequently, --enable-utf and --enable-ebcdic are mutually
exclusive.
</P>
<br><a name="SEC6" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
<P>
UTF support allows the libraries to process character codepoints up to 0x10ffff
in the strings that they handle. On its own, however, it does not provide any
facilities for accessing the properties of such characters. If you want to be
able to use the pattern escapes \P, \p, and \X, which refer to Unicode
character properties, you must add
<pre>
  --enable-unicode-properties
</pre>
to the <b>configure</b> command. This implies UTF support, even if you have
not explicitly requested it.
</P>
<P>
Including Unicode property support adds around 30K of tables to the PCRE
library. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
supported. Details are given in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation.
</P>
<br><a name="SEC7" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
<P>
Just-in-time compiler support is included in the build by specifying
<pre>
  --enable-jit
</pre>
This support is available only for certain hardware architectures. If this
option is set for an unsupported architecture, a compile time error occurs.
See the
<a href="pcrejit.html"><b>pcrejit</b></a>
documentation for a discussion of JIT usage.
When JIT support is enabled,
<b>pcregrep</b> automatically makes use of it, unless you add
<pre>
  --disable-pcregrep-jit
</pre>
to the <b>configure</b> command.
</P>
<br><a name="SEC8" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
<P>
By default, PCRE interprets the linefeed (LF) character as indicating the end
of a line. This is the normal newline character on Unix-like systems. You can
compile PCRE to use carriage return (CR) instead, by adding
<pre>
  --enable-newline-is-cr
</pre>
to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
which explicitly specifies linefeed as the newline character.
<br>
<br>
Alternatively, you can specify that line endings are to be indicated by the two
character sequence CRLF. If you want this, add
<pre>
  --enable-newline-is-crlf
</pre>
to the <b>configure</b> command. There is a fourth option, specified by
<pre>
  --enable-newline-is-anycrlf
</pre>
which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
indicating a line ending. Finally, a fifth option, specified by
<pre>
  --enable-newline-is-any
</pre>
causes PCRE to recognize any Unicode newline sequence.
</P>
<P>
Whatever line ending convention is selected when PCRE is built can be
overridden when the library functions are called. At build time it is
conventional to use the standard for your operating system.
</P>
<br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
By default, the sequence \R in a pattern matches any Unicode newline sequence,
whatever has been selected as the line ending sequence. If you specify
<pre>
  --enable-bsr-anycrlf
</pre>
the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
selected when PCRE is built can be overridden when the library functions are
called.
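<br>
<br>
As an illustrative sketch only, the newline and \R settings described above can
be given together in one <b>configure</b> invocation. The particular
combination shown is an assumption chosen for the example, not a
recommendation:
<pre>
  ./configure --enable-newline-is-anycrlf --enable-bsr-anycrlf
</pre>
With these two options, CR, LF, or CRLF is recognized as a line ending by
default, and \R is restricted to the same three sequences by default. Both
remain build-time defaults that can be overridden when the library functions
are called.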
</P>
<br><a name="SEC10" href="#TOC1">POSIX MALLOC USAGE</a><br>
<P>
When the 8-bit library is called through the POSIX interface (see the
<a href="pcreposix.html"><b>pcreposix</b></a>
documentation), additional working storage is required for holding the pointers
to capturing substrings, because PCRE requires three integers per substring,
whereas the POSIX interface provides only two. If the number of expected
substrings is small, the wrapper function uses space on the stack, because this
is faster than using <b>malloc()</b> for each call. The default threshold above
which the stack is no longer used is 10; it can be changed by adding a setting
such as
<pre>
  --with-posix-malloc-threshold=20
</pre>
to the <b>configure</b> command.
</P>
<br><a name="SEC11" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
<P>
Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, two-byte values are used for these offsets, leading
to a maximum size for a compiled pattern of around 64K. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do want to
process truly enormous patterns, so it is possible to compile PCRE to use
three-byte or four-byte offsets by adding a setting such as
<pre>
  --with-link-size=3
</pre>
to the <b>configure</b> command. The value given must be 2, 3, or 4. For the
16-bit library, a value of 3 is rounded up to 4. Using longer offsets slows
down the operation of PCRE because it has to load additional data when handling
them.
</P>
<br><a name="SEC12" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
<P>
When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
by making recursive calls to an internal function called <b>match()</b>.
In environments where the size of the stack is limited, this can severely limit
PCRE's operation. (The Unix environment does not usually suffer from this
problem, but it may sometimes be necessary to increase the maximum stack size.
There is a discussion in the
<a href="pcrestack.html"><b>pcrestack</b></a>
documentation.) An alternative approach to recursion that uses memory from the
heap to remember data, instead of using recursive function calls, has been
implemented to work round the problem of limited stack size. If you want to
build a version of PCRE that works this way, add
<pre>
  --disable-stack-for-recursion
</pre>
to the <b>configure</b> command. With this configuration, PCRE will use the
<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
management functions. By default these point to <b>malloc()</b> and
<b>free()</b>, but you can replace the pointers so that your own functions are
used instead.
</P>
<P>
Separate functions are provided rather than using <b>pcre_malloc</b> and
<b>pcre_free</b> because the usage is very predictable: the block sizes
requested are always the same, and the blocks are always freed in reverse
order. A calling program might be able to implement optimized functions that
perform better than <b>malloc()</b> and <b>free()</b>. PCRE runs noticeably more
slowly when built in this way. This option affects only the <b>pcre_exec()</b>
function; it is not relevant for <b>pcre_dfa_exec()</b>.
</P>
<br><a name="SEC13" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
<P>
Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
(sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
function.
By controlling the maximum number of times this function may be
called during a single matching operation, a limit can be placed on the
resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
at run time, as described in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation. The default is 10 million, but this can be changed by adding a
setting such as
<pre>
  --with-match-limit=500000
</pre>
to the <b>configure</b> command. This setting has no effect on the
<b>pcre_dfa_exec()</b> matching function.
</P>
<P>
In some environments it is desirable to limit the depth of recursive calls of
<b>match()</b> more strictly than the total number of calls, in order to
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
is specified) that is used. A second limit controls this; it defaults to the
value that is set for --with-match-limit, which imposes no additional
constraints. However, you can set a lower limit by adding, for example,
<pre>
  --with-match-limit-recursion=10000
</pre>
to the <b>configure</b> command. This value can also be overridden at run time.
</P>
<br><a name="SEC14" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
<P>
PCRE uses fixed tables for processing characters whose code values are less
than 256. By default, PCRE is built with a set of tables that are distributed
in the file <i>pcre_chartables.c.dist</i>. These tables are for ASCII codes
only. If you add
<pre>
  --enable-rebuild-chartables
</pre>
to the <b>configure</b> command, the distributed tables are no longer used.
Instead, a program called <b>dftables</b> is compiled and run. This outputs the
source for a new set of tables, created in the default locale of your C run-time
system. (This method of replacing the tables does not work if you are cross
compiling, because <b>dftables</b> is run on the local host.
If you need to
create alternative tables when cross compiling, you will have to do so "by
hand".)
</P>
<br><a name="SEC15" href="#TOC1">USING EBCDIC CODE</a><br>
<P>
PCRE assumes by default that it will run in an environment where the character
code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
most computer operating systems. PCRE can, however, be compiled to run in an
EBCDIC environment by adding
<pre>
  --enable-ebcdic
</pre>
to the <b>configure</b> command. This setting implies
--enable-rebuild-chartables. You should only use it if you know that you are in
an EBCDIC environment (for example, an IBM mainframe operating system). The
--enable-ebcdic option is incompatible with --enable-utf.
</P>
<br><a name="SEC16" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
<P>
By default, <b>pcregrep</b> reads all files as plain text. You can build it so
that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
them with <b>libz</b> or <b>libbz2</b>, respectively, by adding one or both of
<pre>
  --enable-pcregrep-libz
  --enable-pcregrep-libbz2
</pre>
to the <b>configure</b> command. These options naturally require that the
relevant libraries are installed on your system. Configuration will fail if
they are not.
</P>
<br><a name="SEC17" href="#TOC1">PCREGREP BUFFER SIZE</a><br>
<P>
<b>pcregrep</b> uses an internal buffer to hold a "window" on the file it is
scanning, in order to be able to output "before" and "after" lines when it
finds a match. The size of the buffer is controlled by a parameter whose
default value is 20K. The buffer itself is three times this size, but because
of the way it is used for holding "before" lines, the longest line that is
guaranteed to be processable is the parameter size.
You can change the default
parameter value by adding, for example,
<pre>
  --with-pcregrep-bufsize=50K
</pre>
to the <b>configure</b> command. The caller of <b>pcregrep</b> can, however,
override this value by specifying a run-time option.
</P>
<br><a name="SEC18" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
<P>
If you add
<pre>
  --enable-pcretest-libreadline
</pre>
to the <b>configure</b> command, <b>pcretest</b> is linked with the
<b>libreadline</b> library, and when its input is from a terminal, it reads it
using the <b>readline()</b> function. This provides line-editing and history
facilities. Note that <b>libreadline</b> is GPL-licensed, so if you distribute a
binary of <b>pcretest</b> linked in this way, there may be licensing issues.
</P>
<P>
Setting this option causes the <b>-lreadline</b> option to be added to the
<b>pcretest</b> build. In many operating environments with a system-installed
<b>libreadline</b> this is sufficient. However, in some environments (e.g.
if an unmodified distribution version of readline is in use), some extra
configuration may be necessary. The INSTALL file for <b>libreadline</b> says
this:
<pre>
  "Readline uses the termcap functions, but does not link with the
  termcap or curses library itself, allowing applications which link
  with readline the to choose an appropriate library."
</pre>
If your environment has not been set up so that an appropriate library is
automatically included, you may need to add something like
<pre>
  LIBS="-lncurses"
</pre>
immediately before the <b>configure</b> command.
</P>
<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcreapi</b>(3), <b>pcre16</b>, <b>pcre_config</b>(3).
</P>
<br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 07 January 2012
<br>
Copyright © 1997-2012 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>