<html>
<head>
<title>pcresyntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresyntax man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
<li><a name="TOC13" href="#SEC13">CAPTURING</a>
<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
<li><a name="TOC15" href="#SEC15">COMMENT</a>
<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
<li><a name="TOC17" href="#SEC17">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC18" href="#SEC18">BACKREFERENCES</a>
<li><a name="TOC19" href="#SEC19">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC20" href="#SEC20">CONDITIONAL PATTERNS</a>
<li><a name="TOC21" href="#SEC21">BACKTRACKING CONTROL</a>
<li><a name="TOC22" href="#SEC22">NEWLINE CONVENTIONS</a>
<li><a name="TOC23" href="#SEC23">WHAT \R MATCHES</a>
<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
<li><a name="TOC26" href="#SEC26">AUTHOR</a>
<li><a name="TOC27" href="#SEC27">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. This document contains a quick-reference summary of the syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</PRE>
</P>
<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
<P>
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any ASCII character
  \e         escape (hex 1B)
  \f         form feed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \ddd       character with octal code ddd, or backreference
  \xhh       character with hex code hh
  \x{hhh..}  character with hex code hhh..
</PRE>
</P>
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one data unit, even in UTF mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal white space character
  \H         a character that is not a horizontal white space character
  \N         a character that is not a newline
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a white space character
  \S         a character that is not a white space character
  \v         a vertical white space character
  \V         a character that is not a vertical white space character
  \w         a "word" character
  \W         a "non-word" character
  \X         an extended Unicode sequence
</pre>
In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
characters, even in a UTF mode.
However, this can be changed by setting the
PCRE_UCP option.
</P>
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&amp;         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, FF, CR
  Xwd        Perl word: property Xan or underscore
</PRE>
</P>
<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Batak,
Bengali,
Bopomofo,
Brahmi,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Chakma,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Mandaic,
Meetei_Mayek,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Sinhala,
Sora_Sompeng,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
</P>
<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       white space
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\Q...\E inside a character class.
</P>
<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
</PRE>
</P>
<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \A          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \Z          end of subject
               also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
<br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
<P>
<pre>
  \K          reset start of match
</PRE>
</P>
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)           capturing group
  (?&lt;name&gt;...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P&lt;name&gt;...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                   capturing groups in each alternative
</PRE>
</P>
<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&gt;...)         atomic, non-capturing group
</PRE>
</P>
<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)        comment (not nestable)
</PRE>
</P>
<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
<P>
<pre>
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
</pre>
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
<pre>
  (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UCP)          set PCRE_UCP (use Unicode properties for \d etc)
</PRE>
</P>
<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?&lt;=...)        positive look behind
  (?&lt;!...)        negative look behind
</pre>
Each top-level branch of a look behind must be of a fixed length.
</P>
<br><a name="SEC18" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n              reference by number (can be ambiguous)
  \gn             reference by number
  \g{n}           reference by number
  \g{-n}          relative reference by number
  \k&lt;name&gt;        reference by name (Perl)
  \k'name'        reference by name (Perl)
  \g{name}        reference by name (Perl)
  \k{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
</PRE>
</P>
<br><a name="SEC19" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&amp;name)        call subpattern by name (Perl)
  (?P&gt;name)       call subpattern by name (Python)
  \g&lt;name&gt;        call subpattern by name (Oniguruma)
  \g'name'        call subpattern by name (Oniguruma)
  \g&lt;n&gt;           call subpattern by absolute number (Oniguruma)
  \g'n'           call subpattern by absolute number (Oniguruma)
  \g&lt;+n&gt;          call subpattern by relative number (PCRE extension)
  \g'+n'          call subpattern by relative number (PCRE extension)
  \g&lt;-n&gt;          call subpattern by relative number (PCRE extension)
  \g'-n'          call subpattern by relative number (PCRE extension)
</PRE>
</P>
<br><a name="SEC20" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(&lt;name&gt;)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&amp;name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
</PRE>
</P>
<br><a name="SEC21" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
The following act immediately they are reached:
<pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
</PRE>
</P>
<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16) or (*UCP) option.
<pre>
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
<pre>
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)      callout
  (?Cn)     callout with data n
</PRE>
</P>
<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
<b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
Last updated: 10 January 2012
<br>
Copyright © 1997-2012 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>