<html>
<head>
<title>pcresyntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresyntax man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
<li><a name="TOC13" href="#SEC13">CAPTURING</a>
<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
<li><a name="TOC15" href="#SEC15">COMMENT</a>
<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
<li><a name="TOC17" href="#SEC17">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC18" href="#SEC18">BACKREFERENCES</a>
<li><a name="TOC19" href="#SEC19">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC20" href="#SEC20">CONDITIONAL PATTERNS</a>
<li><a name="TOC21" href="#SEC21">BACKTRACKING CONTROL</a>
<li><a name="TOC22" href="#SEC22">NEWLINE CONVENTIONS</a>
<li><a name="TOC23" href="#SEC23">WHAT \R MATCHES</a>
<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
<li><a name="TOC26" href="#SEC26">AUTHOR</a>
<li><a name="TOC27" href="#SEC27">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. This document contains a quick-reference summary of the syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         literal x, where x is non-alphanumeric
  \Q...\E    treat enclosed characters as literal
</PRE>
</P>
<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
<P>
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any ASCII character
  \e         escape (hex 1B)
  \f         form feed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \ddd       character with octal code ddd, or backreference
  \xhh       character with hex code hh
  \x{hhh..}  character with hex code hhh..
</PRE>
</P>
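<P>
As a rough illustration of the quoting and character escapes above, the sketch
below uses Python's <b>re</b> module. That module is a separate engine, not
PCRE, so this is an assumption-laden stand-in: only the escapes that both
engines are known to share (backslash quoting, \t, \xhh, and octal \ddd with a
leading zero) are exercised, and all variable names are hypothetical.
</P>

```python
import re

# A backslash before a non-alphanumeric character makes it literal.
literal_dot = re.search(r"3\.14", "3.14") is not None     # True
# An unquoted dot is a metacharacter and matches any character.
unquoted_dot = re.search(r"3.14", "3x14") is not None     # True
# The quoted dot no longer matches an arbitrary character.
quoted_rejects = re.search(r"3\.14", "3x14") is None      # True

# \t is a tab; \x21 is the character with hex code 21, that is, "!".
tab_hex = re.search(r"a\tb\x21", "a\tb!").group()

# \041 is the character with octal code 041, again "!".
octal = re.search(r"\041", "wow!").group()

print(literal_dot, unquoted_dot, quoted_rejects)
print(repr(tab_hex), repr(octal))
```

<P>
The \Q...\E, \e, \cx, and \x{hhh..} forms are not used here because Python's
<b>re</b> module does not accept them.
</P>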
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one data unit, even in UTF mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal white space character
  \H         a character that is not a horizontal white space character
  \N         a character that is not a newline
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a white space character
  \S         a character that is not a white space character
  \v         a vertical white space character
  \V         a character that is not a vertical white space character
  \w         a "word" character
  \W         a "non-word" character
  \X         an extended Unicode sequence
</pre>
In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
characters, even in a UTF mode. However, this can be changed by setting the
PCRE_UCP option.
</P>
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, FF, CR
  Xwd        Perl word: property Xan or underscore
</PRE>
</P>
<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Batak,
Bengali,
Bopomofo,
Brahmi,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Chakma,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Mandaic,
Meetei_Mayek,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Sinhala,
Sora_Sompeng,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
</P>
<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       white space
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\Q...\E inside a character class.
</P>
<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
</PRE>
</P>
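<P>
The difference between greedy and lazy quantifiers is easiest to see on a
concrete subject. The following is a minimal sketch using Python's <b>re</b>
module as a stand-in, on the assumption that it treats these particular
quantifiers the same way PCRE does; it is not PCRE itself, and the subject
strings and variable names are made up for illustration.
</P>

```python
import re

subject = "<b>bold</b> and <i>italic</i>"

# Greedy: + takes as much as it can, then gives back only what is needed.
greedy = re.search(r"<.+>", subject).group()

# Lazy: +? takes as little as it can, extending only when forced to.
lazy = re.search(r"<.+?>", subject).group()

# {n,m} bounds the repeat count; greedy prefers m, lazy prefers n.
bounded_greedy = re.search(r"\d{2,4}", "123456").group()
bounded_lazy = re.search(r"\d{2,4}?", "123456").group()

print(greedy)          # <b>bold</b> and <i>italic</i>
print(lazy)            # <b>
print(bounded_greedy)  # 1234
print(bounded_lazy)    # 12
```

<P>
Possessive forms are omitted from the sketch because older versions of
Python's <b>re</b> module do not accept them.
</P>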
<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \A          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \Z          end of subject
               also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
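<P>
A short sketch of how ^, $, and \b behave, written with Python's <b>re</b>
module on the assumption that it agrees with PCRE for these three assertions.
It is an illustration only: \Z is deliberately left out because Python gives
it a different meaning from the one listed above, and the subject text is
invented.
</P>

```python
import re

subject = "first line\nsecond line\n"

# Without multiline mode, ^ matches only at the start of the subject.
starts_default = re.findall(r"^\w+", subject)

# In multiline mode, ^ also matches after each internal newline.
starts_multiline = re.findall(r"(?m)^\w+", subject)

# $ matches at the very end and also before a newline that ends the subject,
# so the second "line" (offset 18) matches even though "\n" follows it.
end_offset = re.search(r"line$", subject).start()

# \b requires a word boundary on both sides, so "inline" and "lines" fail.
whole_words = re.findall(r"\bline\b", "line inline lines")

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
print(end_offset)        # 18
print(whole_words)       # ['line']
```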
<br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
<P>
<pre>
  \K          reset start of match
</PRE>
</P>
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
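<P>
Alternatives are tried from left to right, and the first one that allows the
overall match to succeed is used, so their order can matter. The sketch below
shows this with Python's <b>re</b> module, assumed here to behave like PCRE for
plain alternation; the words chosen are arbitrary.
</P>

```python
import re

# "cat" is tried first and succeeds, so the longer alternative is never used.
short_first = re.search(r"cat|category", "category").group()

# Listing the longer alternative first lets it win.
long_first = re.search(r"category|cat", "category").group()

# When the rest of the pattern fails after "cat", the engine backtracks
# and tries the next alternative.
anchored = re.search(r"^(?:cat|category)$", "category").group()

print(short_first)  # cat
print(long_first)   # category
print(anchored)     # category
```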
<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)           capturing group
  (?&#60;name&#62;...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P&#60;name&#62;...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                   capturing groups in each alternative
</PRE>
</P>
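<P>
To show how the group kinds affect numbering, here is a small sketch using the
Python-style named group, which PCRE also accepts. It runs on Python's
<b>re</b> module rather than PCRE, and the group name "year" and the date
string are hypothetical.
</P>

```python
import re

# Group 1 is named "year", the month is grouped without capturing,
# and the day becomes group 2 because (?:...) takes no number.
match = re.search(r"(?P<year>\d{4})-(?:\d{2})-(\d{2})", "updated 2012-01-10")

by_number = match.group(1)
by_name = match.group("year")
day = match.group(2)
all_groups = match.groups()

print(by_number)   # 2012
print(by_name)     # 2012
print(day)         # 10
print(all_groups)  # ('2012', '10')
```

<P>
The Perl-style named groups and the (?|...) form are not shown because Python's
<b>re</b> module does not accept them.
</P>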
<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&#62;...)         atomic, non-capturing group
</PRE>
</P>
<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)        comment (not nestable)
</PRE>
</P>
<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
<P>
<pre>
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
</pre>
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
<pre>
  (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UCP)          set PCRE_UCP (use Unicode properties for \d etc)
</PRE>
</P>
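<P>
The inline options (?i), (?s), and (?x) can be sketched as follows. The example
uses Python's <b>re</b> module, which is not PCRE, and sets each option only at
the very start of the pattern, a placement that both engines are assumed to
accept. The (*...) items above are PCRE-specific and are not covered.
</P>

```python
import re

# (?i) makes the rest of the pattern caseless.
caseless = re.search(r"(?i)pcre", "See the PCRE index").group()

# Without (?s), the dot does not match a newline.
dot_default = re.search(r"a.c", "a\nc")

# With (?s), the dot matches any character, including newline.
dot_all = re.search(r"(?s)a.c", "a\nc")

# (?x) ignores white space in the pattern, so it can be spaced out for reading.
extended = re.search(r"(?x) \d{4} - \d{2}", "in 2012-01").group()

print(caseless)                # PCRE
print(dot_default is None)     # True
print(dot_all is not None)     # True
print(extended)                # 2012-01
```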
<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?&#60;=...)        positive look behind
  (?&#60;!...)        negative look behind
</pre>
Each top-level branch of a look behind must be of a fixed length.
</P>
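<P>
The four assertions test the text around a position without consuming it. A
minimal sketch, assuming Python's <b>re</b> module behaves like PCRE for these
single-branch, fixed-length cases; the price string is invented for the
example.
</P>

```python
import re

prices = "cost: $40, discount 15%, total $34"

# Positive look behind: digits immediately preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", prices)

# Positive look ahead: digits immediately followed by a percent sign.
percents = re.findall(r"\d+(?=%)", prices)

# Negative look ahead: digit runs not followed by a further digit or "%".
not_percent = re.findall(r"\d+(?![\d%])", prices)

# Negative look behind: digit runs not preceded by "$" or another digit.
not_dollar = re.findall(r"(?<![$\d])\d+", prices)

print(dollars)      # ['40', '34']
print(percents)     # ['15']
print(not_percent)  # ['40', '34']
print(not_dollar)   # ['15']
```

<P>
The extra \d inside each negative assertion stops the engine from matching a
shorter part of a number after backtracking.
</P>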
<br><a name="SEC18" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n              reference by number (can be ambiguous)
  \gn             reference by number
  \g{n}           reference by number
  \g{-n}          relative reference by number
  \k&#60;name&#62;        reference by name (Perl)
  \k'name'        reference by name (Perl)
  \g{name}        reference by name (Perl)
  \k{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
</PRE>
</P>
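<P>
A backreference matches the same text that the referenced group captured, not
the group's pattern. The sketch below shows the numbered form and the
Python-style named form on Python's <b>re</b> module, assumed to agree with
PCRE for these two; the group name "quote" is hypothetical.
</P>

```python
import re

# \1 must repeat exactly what group 1 captured, which finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it was the the best").group(1)

# (?P=quote) requires the closing quote to be the same character as the
# opening one, so the double quote inside the text does not end the match.
quoted = re.search(r"(?P<quote>['\"]).*?(?P=quote)", "say 'a \"b\" c' now").group()

print(doubled)  # the
print(quoted)   # 'a "b" c'
```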
<br><a name="SEC19" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&name)        call subpattern by name (Perl)
  (?P&#62;name)       call subpattern by name (Python)
  \g&#60;name&#62;        call subpattern by name (Oniguruma)
  \g'name'        call subpattern by name (Oniguruma)
  \g&#60;n&#62;           call subpattern by absolute number (Oniguruma)
  \g'n'           call subpattern by absolute number (Oniguruma)
  \g&#60;+n&#62;          call subpattern by relative number (PCRE extension)
  \g'+n'          call subpattern by relative number (PCRE extension)
  \g&#60;-n&#62;          call subpattern by relative number (PCRE extension)
  \g'-n'          call subpattern by relative number (PCRE extension)
</PRE>
</P>
<br><a name="SEC20" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(&#60;name&#62;)...   named reference condition (Perl)
  (?(&#60;name&#62;)...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
</PRE>
</P>
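<P>
An absolute reference condition tests whether a numbered group has taken part
in the match so far. The following sketch uses Python's <b>re</b> module, which
is assumed to handle the (?(n)...) form as PCRE does; the other condition types
listed above are not covered, and the patterns are illustrative only.
</P>

```python
import re

# If group 1 matched an opening parenthesis, a closing one is required;
# with no no-pattern, nothing extra is required when group 1 did not match.
balanced = re.compile(r"^(\()?\d+(?(1)\))$")

# Two-branch form: ">" if group 1 matched "<", otherwise ";".
tagged = re.compile(r"^(<)?\w+(?(1)>|;)$")

results = {
    "(123)": balanced.search("(123)") is not None,
    "123": balanced.search("123") is not None,
    "(123": balanced.search("(123") is not None,
    "123)": balanced.search("123)") is not None,
    "<tag>": tagged.search("<tag>") is not None,
    "tag;": tagged.search("tag;") is not None,
    "<tag;": tagged.search("<tag;") is not None,
}

for text, matched in results.items():
    print(text, matched)
```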
<br><a name="SEC21" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
The following act as soon as they are reached:
<pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
</PRE>
</P>
<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16) or (*UCP) option.
<pre>
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
<pre>
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)      callout
  (?Cn)     callout with data n
</PRE>
</P>
<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
<b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
Last updated: 10 January 2012
<br>
Copyright &copy; 1997-2012 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>