1<html>
2<head>
3<title>pcrebuild specification</title>
4</head>
5<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6<h1>pcrebuild man page</h1>
7<p>
8Return to the <a href="index.html">PCRE index page</a>.
9</p>
10<p>
11This page is part of the PCRE HTML documentation. It was generated automatically
12from the original man page. If there is any nonsense in it, please consult the
13man page, in case the conversion went wrong.
14<br>
15<ul>
16<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
17<li><a name="TOC2" href="#SEC2">BUILDING 8-BIT and 16-BIT LIBRARIES</a>
18<li><a name="TOC3" href="#SEC3">BUILDING SHARED AND STATIC LIBRARIES</a>
19<li><a name="TOC4" href="#SEC4">C++ SUPPORT</a>
20<li><a name="TOC5" href="#SEC5">UTF-8 and UTF-16 SUPPORT</a>
21<li><a name="TOC6" href="#SEC6">UNICODE CHARACTER PROPERTY SUPPORT</a>
22<li><a name="TOC7" href="#SEC7">JUST-IN-TIME COMPILER SUPPORT</a>
23<li><a name="TOC8" href="#SEC8">CODE VALUE OF NEWLINE</a>
24<li><a name="TOC9" href="#SEC9">WHAT \R MATCHES</a>
25<li><a name="TOC10" href="#SEC10">POSIX MALLOC USAGE</a>
26<li><a name="TOC11" href="#SEC11">HANDLING VERY LARGE PATTERNS</a>
27<li><a name="TOC12" href="#SEC12">AVOIDING EXCESSIVE STACK USAGE</a>
28<li><a name="TOC13" href="#SEC13">LIMITING PCRE RESOURCE USAGE</a>
29<li><a name="TOC14" href="#SEC14">CREATING CHARACTER TABLES AT BUILD TIME</a>
30<li><a name="TOC15" href="#SEC15">USING EBCDIC CODE</a>
31<li><a name="TOC16" href="#SEC16">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
32<li><a name="TOC17" href="#SEC17">PCREGREP BUFFER SIZE</a>
33<li><a name="TOC18" href="#SEC18">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
34<li><a name="TOC19" href="#SEC19">SEE ALSO</a>
35<li><a name="TOC20" href="#SEC20">AUTHOR</a>
36<li><a name="TOC21" href="#SEC21">REVISION</a>
37</ul>
38<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
39<P>
40This document describes the optional features of PCRE that can be selected when
41the library is compiled. It assumes use of the <b>configure</b> script, where
42the optional features are selected or deselected by providing options to
43<b>configure</b> before running the <b>make</b> command. However, the same
44options can be selected in both Unix-like and non-Unix-like environments using
45the GUI facility of <b>cmake-gui</b> if you are using <b>CMake</b> instead of
46<b>configure</b> to build PCRE.
47</P>
48<P>
49There is a lot more information about building PCRE in non-Unix-like
50environments in the file called <i>NON_UNIX_USE</i>, which is part of the PCRE
51distribution. You should consult this file as well as the <i>README</i> file if
52you are building in a non-Unix-like environment.
53</P>
54<P>
55The complete list of options for <b>configure</b> (which includes the standard
56ones such as the selection of the installation directory) can be obtained by
57running
58<pre>
59  ./configure --help
60</pre>
61The following sections include descriptions of options whose names begin with
62--enable or --disable. These settings specify changes to the defaults for the
63<b>configure</b> command. Because of the way that <b>configure</b> works,
64--enable and --disable always come in pairs, so the complementary option always
65exists as well, but as it specifies the default, it is not described.
66</P>
67<br><a name="SEC2" href="#TOC1">BUILDING 8-BIT and 16-BIT LIBRARIES</a><br>
68<P>
69By default, a library called <b>libpcre</b> is built, containing functions that
70take string arguments contained in vectors of bytes, either as single-byte
71characters, or interpreted as UTF-8 strings. You can also build a separate
72library, called <b>libpcre16</b>, in which strings are contained in vectors of
7316-bit data units and interpreted either as single-unit characters or UTF-16
74strings, by adding
75<pre>
76  --enable-pcre16
77</pre>
78to the <b>configure</b> command. If you do not want the 8-bit library, add
79<pre>
80  --disable-pcre8
81</pre>
82as well. At least one of the two libraries must be built. Note that the C++ and
83POSIX wrappers are for the 8-bit library only, and that <b>pcregrep</b> is an
848-bit program. None of these are built if you select only the 16-bit library.
85</P>
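<P>
As an illustration of the two options above, the following shell fragment
assembles the arguments for a build that produces only the 16-bit library. The
variable name <i>opts</i> is an assumption made for this sketch, and the
fragment prints the command instead of running it, so that it can be reviewed
first.
</P>

```shell
# Options for a 16-bit-only build: libpcre16 is built, libpcre is not.
# The C++ and POSIX wrappers and pcregrep are 8-bit only, so none of
# them is built with this selection.
opts="--enable-pcre16 --disable-pcre8"

# Print the command; run it from the top of the PCRE source tree.
echo "./configure $opts"
```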
86<br><a name="SEC3" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
87<P>
88The PCRE building process uses <b>libtool</b> to build both shared and static
89Unix libraries by default. You can suppress one of these by adding one of
90<pre>
91  --disable-shared
92  --disable-static
93</pre>
94to the <b>configure</b> command, as required.
95</P>
96<br><a name="SEC4" href="#TOC1">C++ SUPPORT</a><br>
97<P>
98By default, if the 8-bit library is being built, the <b>configure</b> script
99will search for a C++ compiler and C++ header files. If it finds them, it
100automatically builds the C++ wrapper library (which supports only 8-bit
101strings). You can disable this by adding
102<pre>
103  --disable-cpp
104</pre>
105to the <b>configure</b> command.
106</P>
107<br><a name="SEC5" href="#TOC1">UTF-8 and UTF-16 SUPPORT</a><br>
108<P>
109To build PCRE with support for UTF Unicode character strings, add
110<pre>
111  --enable-utf
112</pre>
113to the <b>configure</b> command. This setting applies to both libraries, adding
114support for UTF-8 to the 8-bit library and support for UTF-16 to the 16-bit
115library. There are no separate options for enabling UTF-8 and UTF-16
116independently because that would allow ridiculous settings such as requesting
117UTF-16 support while building only the 8-bit library. It is not possible to
118build one library with UTF support and the other without in the same
119configuration. (For backwards compatibility, --enable-utf8 is a synonym of
120--enable-utf.)
121</P>
122<P>
123Of itself, this setting does not make PCRE treat strings as UTF-8 or UTF-16. As
well as compiling PCRE with this option, you also have to set the
125PCRE_UTF8 or PCRE_UTF16 option when you call one of the pattern compiling
126functions.
127</P>
128<P>
129If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
130its input to be either ASCII or UTF-8 (depending on the run-time option). It is
131not possible to support both EBCDIC and UTF-8 codes in the same version of the
132library. Consequently, --enable-utf and --enable-ebcdic are mutually
133exclusive.
134</P>
135<br><a name="SEC6" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
136<P>
137UTF support allows the libraries to process character codepoints up to 0x10ffff
138in the strings that they handle. On its own, however, it does not provide any
139facilities for accessing the properties of such characters. If you want to be
140able to use the pattern escapes \P, \p, and \X, which refer to Unicode
141character properties, you must add
142<pre>
143  --enable-unicode-properties
144</pre>
145to the <b>configure</b> command. This implies UTF support, even if you have
146not explicitly requested it.
147</P>
148<P>
149Including Unicode property support adds around 30K of tables to the PCRE
150library. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
151supported. Details are given in the
152<a href="pcrepattern.html"><b>pcrepattern</b></a>
153documentation.
154</P>
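<P>
A minimal sketch of a build that has Unicode property support in both
libraries is shown here. Because property support implies UTF support, the UTF
option is not listed. The variable name <i>opts</i> is hypothetical, and the
command is only printed for review.
</P>

```shell
# Build both libpcre and libpcre16 with Unicode property support.
# UTF support is implied: UTF-8 in the 8-bit library and UTF-16 in the
# 16-bit library. The PCRE_UTF8 or PCRE_UTF16 option is still needed
# when a pattern is compiled.
opts="--enable-pcre16 --enable-unicode-properties"

# Print the command; run it from the top of the PCRE source tree.
echo "./configure $opts"
```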
155<br><a name="SEC7" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
156<P>
157Just-in-time compiler support is included in the build by specifying
158<pre>
159  --enable-jit
160</pre>
161This support is available only for certain hardware architectures. If this
162option is set for an unsupported architecture, a compile time error occurs.
163See the
164<a href="pcrejit.html"><b>pcrejit</b></a>
165documentation for a discussion of JIT usage. When JIT support is enabled,
<b>pcregrep</b> automatically makes use of it, unless you add
167<pre>
168  --disable-pcregrep-jit
169</pre>
to the <b>configure</b> command.
171</P>
172<br><a name="SEC8" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
173<P>
174By default, PCRE interprets the linefeed (LF) character as indicating the end
175of a line. This is the normal newline character on Unix-like systems. You can
176compile PCRE to use carriage return (CR) instead, by adding
177<pre>
178  --enable-newline-is-cr
179</pre>
180to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
181which explicitly specifies linefeed as the newline character.
182<br>
183<br>
Alternatively, you can specify that line endings are to be indicated by the
two-character sequence CRLF. If you want this, add
186<pre>
187  --enable-newline-is-crlf
188</pre>
189to the <b>configure</b> command. There is a fourth option, specified by
190<pre>
191  --enable-newline-is-anycrlf
192</pre>
193which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
194indicating a line ending. Finally, a fifth option, specified by
195<pre>
196  --enable-newline-is-any
197</pre>
198causes PCRE to recognize any Unicode newline sequence.
199</P>
200<P>
201Whatever line ending convention is selected when PCRE is built can be
202overridden when the library functions are called. At build time it is
203conventional to use the standard for your operating system.
204</P>
205<br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
206<P>
207By default, the sequence \R in a pattern matches any Unicode newline sequence,
208whatever has been selected as the line ending sequence. If you specify
209<pre>
210  --enable-bsr-anycrlf
211</pre>
212the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
213selected when PCRE is built can be overridden when the library functions are
214called.
215</P>
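<P>
The newline convention and the default meaning of \R are selected
independently. As a sketch, the fragment below combines the two settings so
that both recognize only CR, LF, and CRLF. The variable name <i>opts</i> is an
assumption of this example, and the command is printed, not run.
</P>

```shell
# Line endings: any of CR, LF, or CRLF.
# \R: restricted to the same three sequences, not every Unicode newline.
# Both defaults can still be overridden when the library is called.
opts="--enable-newline-is-anycrlf --enable-bsr-anycrlf"

# Print the command; run it from the top of the PCRE source tree.
echo "./configure $opts"
```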
216<br><a name="SEC10" href="#TOC1">POSIX MALLOC USAGE</a><br>
217<P>
218When the 8-bit library is called through the POSIX interface (see the
219<a href="pcreposix.html"><b>pcreposix</b></a>
220documentation), additional working storage is required for holding the pointers
221to capturing substrings, because PCRE requires three integers per substring,
222whereas the POSIX interface provides only two. If the number of expected
223substrings is small, the wrapper function uses space on the stack, because this
224is faster than using <b>malloc()</b> for each call. The default threshold above
225which the stack is no longer used is 10; it can be changed by adding a setting
226such as
227<pre>
228  --with-posix-malloc-threshold=20
229</pre>
230to the <b>configure</b> command.
231</P>
232<br><a name="SEC11" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
233<P>
234Within a compiled pattern, offset values are used to point from one part to
235another (for example, from an opening parenthesis to an alternation
236metacharacter). By default, two-byte values are used for these offsets, leading
237to a maximum size for a compiled pattern of around 64K. This is sufficient to
238handle all but the most gigantic patterns. Nevertheless, some people do want to
239process truly enormous patterns, so it is possible to compile PCRE to use
240three-byte or four-byte offsets by adding a setting such as
241<pre>
242  --with-link-size=3
243</pre>
244to the <b>configure</b> command. The value given must be 2, 3, or 4. For the
24516-bit library, a value of 3 is rounded up to 4. Using longer offsets slows
246down the operation of PCRE because it has to load additional data when handling
247them.
248</P>
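<P>
The arithmetic behind the size limit can be sketched in shell. The helper
names <i>effective_link_size</i> and <i>offset_range</i> are hypothetical and
are not part of PCRE. The figures are the number of distinct values that an
offset of the given width can hold, which is only an approximation of the
largest compiled pattern. The sketch assumes a shell with 64-bit arithmetic.
</P>

```shell
# Link size actually used. $1 is the library width (8 or 16) and $2 is
# the requested link size (2, 3, or 4). The 16-bit library rounds 3 up to 4.
effective_link_size() {
  size="$2"
  if [ "$1" -eq 16 ]; then
    if [ "$size" -eq 3 ]; then
      size=4
    fi
  fi
  echo "$size"
}

# Number of distinct offsets that fit in $1 bytes (256 to the power $1).
offset_range() {
  range=1
  n="$1"
  while [ "$n" -gt 0 ]; do
    range=$((range * 256))
    n=$((n - 1))
  done
  echo "$range"
}

for requested in 2 3 4; do
  echo "link size $requested: $(offset_range "$requested") offsets in the 8-bit library"
done
```

<P>
For link sizes 2, 3, and 4 the loop reports 65536, 16777216, and 4294967296
offsets respectively. The first figure corresponds to the default limit of
around 64K.
</P>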
249<br><a name="SEC12" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
250<P>
251When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
252by making recursive calls to an internal function called <b>match()</b>. In
253environments where the size of the stack is limited, this can severely limit
254PCRE's operation. (The Unix environment does not usually suffer from this
255problem, but it may sometimes be necessary to increase the maximum stack size.
256There is a discussion in the
257<a href="pcrestack.html"><b>pcrestack</b></a>
258documentation.) An alternative approach to recursion that uses memory from the
259heap to remember data, instead of using recursive function calls, has been
260implemented to work round the problem of limited stack size. If you want to
261build a version of PCRE that works this way, add
262<pre>
263  --disable-stack-for-recursion
264</pre>
265to the <b>configure</b> command. With this configuration, PCRE will use the
266<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
267management functions. By default these point to <b>malloc()</b> and
268<b>free()</b>, but you can replace the pointers so that your own functions are
269used instead.
270</P>
271<P>
272Separate functions are provided rather than using <b>pcre_malloc</b> and
273<b>pcre_free</b> because the usage is very predictable: the block sizes
274requested are always the same, and the blocks are always freed in reverse
275order. A calling program might be able to implement optimized functions that
276perform better than <b>malloc()</b> and <b>free()</b>. PCRE runs noticeably more
277slowly when built in this way. This option affects only the <b>pcre_exec()</b>
278function; it is not relevant for <b>pcre_dfa_exec()</b>.
279</P>
280<br><a name="SEC13" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
281<P>
282Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
283(sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
284function. By controlling the maximum number of times this function may be
285called during a single matching operation, a limit can be placed on the
286resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
287at run time, as described in the
288<a href="pcreapi.html"><b>pcreapi</b></a>
289documentation. The default is 10 million, but this can be changed by adding a
290setting such as
291<pre>
292  --with-match-limit=500000
293</pre>
294to the <b>configure</b> command. This setting has no effect on the
295<b>pcre_dfa_exec()</b> matching function.
296</P>
297<P>
298In some environments it is desirable to limit the depth of recursive calls of
299<b>match()</b> more strictly than the total number of calls, in order to
300restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
301is specified) that is used. A second limit controls this; it defaults to the
302value that is set for --with-match-limit, which imposes no additional
303constraints. However, you can set a lower limit by adding, for example,
304<pre>
305  --with-match-limit-recursion=10000
306</pre>
307to the <b>configure</b> command. This value can also be overridden at run time.
308</P>
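<P>
Since the recursion limit defaults to the match limit, setting it is only
useful when it is lower. The following sketch builds the option list
accordingly. The variable names are assumptions of this example, the values
are the ones used above, and the command is printed for review, not run.
</P>

```shell
match_limit=500000
recursion_limit=10000

# A recursion limit at or above the match limit adds no constraint,
# so it is passed to configure only when it is lower.
if [ "$recursion_limit" -lt "$match_limit" ]; then
  opts="--with-match-limit=$match_limit --with-match-limit-recursion=$recursion_limit"
else
  opts="--with-match-limit=$match_limit"
fi

# Print the command; run it from the top of the PCRE source tree.
echo "./configure $opts"
```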
309<br><a name="SEC14" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
310<P>
311PCRE uses fixed tables for processing characters whose code values are less
312than 256. By default, PCRE is built with a set of tables that are distributed
313in the file <i>pcre_chartables.c.dist</i>. These tables are for ASCII codes
314only. If you add
315<pre>
316  --enable-rebuild-chartables
317</pre>
318to the <b>configure</b> command, the distributed tables are no longer used.
319Instead, a program called <b>dftables</b> is compiled and run. This outputs the
source for a new set of tables, created in the default locale of your C run-time
321system. (This method of replacing the tables does not work if you are cross
322compiling, because <b>dftables</b> is run on the local host. If you need to
323create alternative tables when cross compiling, you will have to do so "by
324hand".)
325</P>
326<br><a name="SEC15" href="#TOC1">USING EBCDIC CODE</a><br>
327<P>
328PCRE assumes by default that it will run in an environment where the character
329code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
330most computer operating systems. PCRE can, however, be compiled to run in an
331EBCDIC environment by adding
332<pre>
333  --enable-ebcdic
334</pre>
335to the <b>configure</b> command. This setting implies
336--enable-rebuild-chartables. You should only use it if you know that you are in
337an EBCDIC environment (for example, an IBM mainframe operating system). The
338--enable-ebcdic option is incompatible with --enable-utf.
339</P>
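<P>
A build script can test for this conflict before calling <b>configure</b>.
The function below is a hypothetical helper, not part of the PCRE
distribution. It treats --enable-utf8 and --enable-unicode-properties as
requests for UTF support, because the first is a synonym for --enable-utf and
the second implies it.
</P>

```shell
# Return 0 if the option list is consistent, or 1 if it requests both
# EBCDIC and UTF support.
ebcdic_utf_ok() {
  ebcdic=no
  utf=no
  for opt in "$@"; do
    case "$opt" in
      --enable-ebcdic) ebcdic=yes ;;
      --enable-utf|--enable-utf8|--enable-unicode-properties) utf=yes ;;
    esac
  done
  if [ "$ebcdic" = yes ]; then
    if [ "$utf" = yes ]; then
      return 1
    fi
  fi
  return 0
}

# Example: property support implies UTF, so this combination is rejected.
if ebcdic_utf_ok --enable-ebcdic --enable-unicode-properties; then
  echo "options are consistent"
else
  echo "conflict: EBCDIC cannot be combined with UTF support"
fi
```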
340<br><a name="SEC16" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
341<P>
342By default, <b>pcregrep</b> reads all files as plain text. You can build it so
343that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
344them with <b>libz</b> or <b>libbz2</b>, respectively, by adding one or both of
345<pre>
346  --enable-pcregrep-libz
347  --enable-pcregrep-libbz2
348</pre>
349to the <b>configure</b> command. These options naturally require that the
350relevant libraries are installed on your system. Configuration will fail if
351they are not.
352</P>
353<br><a name="SEC17" href="#TOC1">PCREGREP BUFFER SIZE</a><br>
354<P>
355<b>pcregrep</b> uses an internal buffer to hold a "window" on the file it is
356scanning, in order to be able to output "before" and "after" lines when it
357finds a match. The size of the buffer is controlled by a parameter whose
358default value is 20K. The buffer itself is three times this size, but because
359of the way it is used for holding "before" lines, the longest line that is
360guaranteed to be processable is the parameter size. You can change the default
361parameter value by adding, for example,
362<pre>
363  --with-pcregrep-bufsize=50K
364</pre>
to the <b>configure</b> command. The caller of <b>pcregrep</b> can, however,
366override this value by specifying a run-time option.
367</P>
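<P>
The relationship between the parameter and the buffer can be worked through in
shell arithmetic. This sketch assumes that K means 1024 bytes. The variable
names are hypothetical.
</P>

```shell
# Parameter value in K, as it would be given to --with-pcregrep-bufsize.
# Assumption: 1K is 1024 bytes.
param_k=50
param_bytes=$((param_k * 1024))

# The buffer itself is three times the parameter size.
buffer_bytes=$((param_bytes * 3))

echo "--with-pcregrep-bufsize=${param_k}K"
echo "longest line guaranteed to be processable: $param_bytes bytes"
echo "total buffer: $buffer_bytes bytes"
```

<P>
With a parameter of 50K, the longest line that is guaranteed to be processable
is 51200 bytes and the buffer occupies 153600 bytes.
</P>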
368<br><a name="SEC18" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
369<P>
370If you add
371<pre>
372  --enable-pcretest-libreadline
373</pre>
374to the <b>configure</b> command, <b>pcretest</b> is linked with the
375<b>libreadline</b> library, and when its input is from a terminal, it reads it
376using the <b>readline()</b> function. This provides line-editing and history
377facilities. Note that <b>libreadline</b> is GPL-licensed, so if you distribute a
378binary of <b>pcretest</b> linked in this way, there may be licensing issues.
379</P>
380<P>
Setting this option causes the <b>-lreadline</b> option to be added to the
<b>pcretest</b> build. In many operating environments with a system-installed
383<b>libreadline</b> this is sufficient. However, in some environments (e.g.
384if an unmodified distribution version of readline is in use), some extra
385configuration may be necessary. The INSTALL file for <b>libreadline</b> says
386this:
387<pre>
388  "Readline uses the termcap functions, but does not link with the
389  termcap or curses library itself, allowing applications which link
390  with readline the to choose an appropriate library."
391</pre>
392If your environment has not been set up so that an appropriate library is
393automatically included, you may need to add something like
394<pre>
  LIBS="-lncurses"
396</pre>
397immediately before the <b>configure</b> command.
398</P>
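<P>
Put together, the full command line might look like the one printed by this
sketch. The choice of ncurses as the library that supplies the termcap
functions is an assumption; your system may need a different library. The
value uses the linker's <b>-l</b> prefix followed by the library name. The
variable names <i>extra_libs</i> and <i>opts</i> are hypothetical.
</P>

```shell
# Library that provides the termcap functions (assumed to be ncurses here).
extra_libs="-lncurses"
opts="--enable-pcretest-libreadline"

# Print the command; LIBS is set immediately before configure on the
# same line. Run the printed command from the top of the PCRE source tree.
echo "LIBS=\"$extra_libs\" ./configure $opts"
```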
399<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
400<P>
<b>pcreapi</b>(3), <b>pcre16</b>(3), <b>pcre_config</b>(3).
402</P>
403<br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
404<P>
405Philip Hazel
406<br>
407University Computing Service
408<br>
409Cambridge CB2 3QH, England.
410<br>
411</P>
412<br><a name="SEC21" href="#TOC1">REVISION</a><br>
413<P>
414Last updated: 07 January 2012
415<br>
416Copyright &copy; 1997-2012 University of Cambridge.
417<br>
418<p>
419Return to the <a href="index.html">PCRE index page</a>.
420</p>
421