PCREGREP(1)                                                        PCREGREP(1)


NAME
       pcregrep - a grep with Perl-compatible regular expressions.


SYNOPSIS
       pcregrep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcregrep  searches  files  for  character  patterns, in the same way as
       other grep commands do, but it uses the PCRE regular expression library
       to support patterns that are compatible with the regular expressions of
       Perl 5. See pcrepattern(3) for a full description of syntax and  seman-
       tics of the regular expressions that PCRE supports.

       Patterns,  whether  supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcregrep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with  slashes,  as  is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used to  delimit  patterns
       on  the  command  line  because  they are interpreted by the shell, and
       indeed they are required if a pattern contains  white  space  or  shell
       metacharacters.
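
       The following is a minimal sketch of the delimiter and quoting rules
       described above. The file contents and the workdir variable are
       hypothetical, chosen only for illustration.

```shell
# Hypothetical sample file, created only for this illustration.
workdir=$(mktemp -d)
printf 'Meeting on Thursday\nLunch at /Thursday/ cafe\nQuiet Friday\n' > "$workdir/motd"

# No delimiters: both lines that contain "Thursday" are printed.
pcregrep Thursday "$workdir/motd"

# The slashes are part of the pattern, so only the second line matches.
pcregrep /Thursday/ "$workdir/motd"

# The quotes are removed by the shell; they are needed because of the space.
pcregrep 'on Thursday' "$workdir/motd"
```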

       The  first  argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is  present.   Con-
       versely,  when  one  or  both of these options are used to specify pat-
       terns, all arguments are treated as path names. At least one of -e, -f,
       or an argument pattern must be provided.

       If no files are specified, pcregrep reads the standard input. The stan-
       dard input can also be referenced by a  name  consisting  of  a  single
       hyphen.  For example:

         pcregrep some-pattern /file1 - /file3
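
       As a small sketch of the hyphen convention, the commands below feed
       one line through a pipe while also naming two files. The file names
       and contents are made up for the example.

```shell
# Hypothetical input files, created only for this illustration.
workdir=$(mktemp -d)
printf 'alpha one\n' > "$workdir/file1"
printf 'alpha three\n' > "$workdir/file3"

# The lone hyphen makes pcregrep read standard input between the two files.
printf 'alpha from a pipe\n' | pcregrep alpha "$workdir/file1" - "$workdir/file3"
```

       Because more than one input is searched, each output line starts with
       the input's name; lines from the pipe are labelled "(standard input)"
       unless --label is given.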

       By  default, each line that matches a pattern is copied to the standard
       output, and if there is more than one file, the file name is output  at
       the start of each line, followed by a colon. However, there are options
       that can change how pcregrep behaves.  In  particular,  the  -M  option
       makes  it  possible  to  search for patterns that span line boundaries.
       What defines a line  boundary  is  controlled  by  the  -N  (--newline)
       option.

       Patterns  are  limited  to  8K  or  BUFSIZ characters, whichever is the
       greater.  BUFSIZ is defined in <stdio.h>. When there is more  than  one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to each line in the order in which they are defined,  except  that  all
       the -e patterns are tried before the -f patterns.

       By  default,  as soon as one pattern matches (or fails to match when -v
       is used), no further patterns are considered. However, if --colour  (or
       --color) is used to colour the matching substrings, or if --only-match-
       ing, --file-offsets, or --line-offsets is used to output only the  part
       of  the  line  that  matched (either shown literally, or as an offset),
       scanning resumes immediately  following  the  match,  so  that  further
       matches  on the same line can be found. If there are multiple patterns,
       they are all tried on the remainder of the line, but patterns that fol-
       low the one that matched are not tried on the earlier part of the line.

       This is the same behaviour as GNU grep, but it does mean that the order
       in which multiple patterns are specified can affect the output when one
       of the above options is used.

       Patterns  that can match an empty string are accepted, but empty string
       matches   are   never   recognized.   An   example   is   the   pattern
       "(super)?(man)?",  in  which  all components are optional. This pattern
       finds all occurrences of both "super" and  "man";  the  output  differs
       from  matching  with  "super|man" when only the matching substrings are
       being shown.

       If the LC_ALL or LC_CTYPE environment variable is  set,  pcregrep  uses
       the  value to set a locale when calling the PCRE library.  The --locale
       option can be used to override this.


SUPPORT FOR COMPRESSED FILES

       It is possible to compile pcregrep so that it uses libz  or  libbz2  to
       read  files  whose names end in .gz or .bz2, respectively. You can find
       out whether your binary has support for one or both of these file types
       by running it with the --help option. If the appropriate support is not
       present, files are treated as plain text. The standard input is  always
       so treated.


OPTIONS

       The  order  in  which some of the options appear can affect the output.
       For example, both the -h and -l options affect  the  printing  of  file
       names.  Whichever  comes later in the command line will be the one that
       takes effect.

       --        This terminates the list of options. It is useful if the next
                 item  on  the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns and  file-
                 names that start with hyphens.
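
                 A brief sketch of this, using a made-up file whose name
                 begins with a hyphen and a pattern that looks like an option:

```shell
# Hypothetical file whose name starts with a hyphen.
workdir=$(mktemp -d)
printf 'use -v to invert\nplain line\n' > "$workdir/-notes"

# After "--", "-v" is the pattern and "-notes" is a file name, not options.
( cd "$workdir" && pcregrep -- -v -notes )
```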

       -A number, --after-context=number
                 Output  number  lines of context after each matching line. If
                 filenames and/or line numbers are being output, a hyphen sep-
                 arator  is  used  instead of a colon for the context lines. A
                 line containing "--" is output between each group  of  lines,
                 unless  they  are  in  fact contiguous in the input file. The
                 value of number is expected to be relatively small.  However,
                 pcregrep guarantees to have up to 8K of following text avail-
                 able for context output.

       -B number, --before-context=number
                 Output number lines of context before each matching line.  If
                 filenames and/or line numbers are being output, a hyphen sep-
                 arator is used instead of a colon for the  context  lines.  A
                 line  containing  "--" is output between each group of lines,
                 unless they are in fact contiguous in  the  input  file.  The
                 value  of number is expected to be relatively small. However,
                 pcregrep guarantees to have up to 8K of preceding text avail-
                 able for context output.

       -C number, --context=number
                 Output  number  lines  of  context both before and after each
                 matching line.  This is equivalent to setting both -A and  -B
                 to the same value.
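
                 The separators are easiest to see together with -n. A
                 minimal sketch, using an invented five-line file:

```shell
# Hypothetical input: five numbered words, one per line.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$workdir/nums"

# One line of context on each side of the match. The matching line is
# marked with a colon and the context lines with a hyphen:
#   2-two
#   3:three
#   4-four
pcregrep -n -C 1 three "$workdir/nums"
```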

       -c, --count
                 Do  not output individual lines from the files that are being
                 scanned; instead output the number of lines that would other-
                 wise  have  been  shown. If no lines are selected, the number
                 zero is output. If several files are being scanned, a
                 count  is  output  for each of them. However, if the --files-
                 with-matches option is also  used,  only  those  files  whose
                 counts are greater than zero are listed. When -c is used, the
                 -A, -B, and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto".   If  data  is required, it must be given in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 By default, the output is not coloured. The value  (which  is
                 optional,  see above) may be "never", "always", or "auto". In
                 the last case, colouring happens only if the standard  out-
                 put  is connected to a terminal. More resources are used when
                 colouring is enabled, because pcregrep has to search for  all
                 possible  matches in a line, not just one, in order to colour
                 them all.

                 The colour that is used can be specified by setting the envi-
                 ronment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
                 of this variable should be a string of two numbers, separated
                 by  a  semicolon.  They  are copied directly into the control
                 string for setting colour  on  a  terminal,  so  it  is  your
                 responsibility  to ensure that they make sense. If neither of
                 the environment variables is  set,  the  default  is  "1;31",
                 which gives red.

       -D action, --devices=action
                 If  an  input  path  is  not  a  regular file or a directory,
                 "action" specifies how it is to be  processed.  Valid  values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed.  Valid  values  are  "read"  (the  default),
                 "recurse"  (equivalent to the -r option), or "skip" (silently
                 skip the path). In the default case, directories are read  as
                 if  they  were  ordinary files. In some operating systems the
                 effect of reading a directory like this is an immediate  end-
                 of-file.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used mul-
                 tiple times in order to specify several patterns. It can also
                 be  used  as a way of specifying a single pattern that starts
                 with a hyphen. When -e is used, no argument pattern is  taken
                 from  the  command  line;  all  arguments are treated as file
                 names. There is an overall maximum of 100 patterns. They  are
                 applied  to  each line in the order in which they are defined
                 until one matches (or fails to match if -v is used). If -f is
                 used  with  -e,  the command line patterns are matched first,
                 followed by the patterns from the file,  independent  of  the
                 order  in which these options are specified. Note that multi-
                 ple use of -e is not the same as a single pattern with alter-
                 natives. For example, X|Y finds the first character in a line
                 that is X or Y, whereas if the two patterns are  given  sepa-
                 rately, pcregrep finds X if it is present, even if it follows
                 Y in the line. It finds Y only if there is no X in the  line.
                 This  really  matters  only  if  you are using -o to show the
                 part(s) of the line that matched.
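
                 The X and Y case can be tried directly. This sketch assumes
                 the behaviour documented here; other PCRE releases may
                 report the second command differently.

```shell
# One pattern with alternatives: matches are reported left to right.
printf 'YX\n' | pcregrep -o 'X|Y'        # prints: Y, then X

# Two separate patterns: X is tried first and is found even though it
# follows Y. Per the description above, Y is then not reported, because
# nothing remains of the line after the X match.
printf 'YX\n' | pcregrep -o -e X -e Y
```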

       --exclude=pattern
                 When pcregrep is searching the files in a directory as a con-
                 sequence  of  the  -r  (recursive search) option, any regular
                 files whose names match the pattern are excluded. Subdirecto-
                 ries  are  not  excluded  by  this  option; they are searched
                 recursively, subject to the --exclude_dir  and  --include_dir
                 options.  The  pattern  is  a PCRE regular expression, and is
                 matched against the final component of the file name (not the
                 entire  path).  If  a  file  name  matches both --include and
                 --exclude, it is excluded.  There is no short form  for  this
                 option.

       --exclude_dir=pattern
                 When  pcregrep  is searching the contents of a directory as a
                 consequence of the -r (recursive search) option,  any  subdi-
                 rectories  whose  names match the pattern are excluded. (Note
                 that the --exclude option does  not  affect  subdirectories.)
                 The  pattern  is  a  PCRE  regular expression, and is matched
                 against the final component  of  the  name  (not  the  entire
                 path).  If a subdirectory name matches both --include_dir and
                 --exclude_dir, it is excluded. There is  no  short  form  for
                 this option.

       -F, --fixed-strings
                 Interpret  each pattern as a list of fixed strings, separated
                 by newlines, instead of  as  a  regular  expression.  The  -w
                 (match  as  a  word) and -x (match whole line) options can be
                 used with -F. They apply to each of the fixed strings. A line
                 is selected if any of the fixed strings are found in it (sub-
                 ject to -w or -x, if present).

       -f filename, --file=filename
                 Read a number of patterns from the file, one  per  line,  and
                 match  them against each line of input. A data line is output
                 if any of the patterns match it. The filename can be given as
                 "-" to refer to the standard input. When -f is used, patterns
                 specified on the command line using -e may also  be  present;
                 they are tested before the file's patterns. However, no other
                 pattern is taken from the command  line;  all  arguments  are
                 treated  as  file  names.  There is an overall maximum of 100
                 patterns. Trailing white space is removed from each line, and
                 blank  lines  are ignored. An empty file contains no patterns
                 and therefore matches nothing. See also  the  comments  about
                 multiple  patterns  versus a single pattern with alternatives
                 in the description of -e above.
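
                 A small sketch of a pattern file, with invented patterns
                 and log lines. The first pattern line has trailing spaces
                 and the second line is blank, to show how they are handled:

```shell
# Hypothetical pattern file and data file.
workdir=$(mktemp -d)
printf 'error   \n\nwarn(ing)?\n' > "$workdir/patterns"
printf 'error: disk full\nwarning: low battery\ninfo: ok\n' > "$workdir/log"

# The trailing spaces after "error" are removed and the blank line is
# ignored, so the first two lines of the log are selected.
pcregrep -f "$workdir/patterns" "$workdir/log"
```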

       --file-offsets
                 Instead of showing lines or parts of lines that  match,  show
                 each  match  as  an  offset  from the start of the file and a
                 length, separated by a comma. In this  mode,  no  context  is
                 shown.  That  is,  the -A, -B, and -C options are ignored. If
                 there is more than one match in a line, each of them is shown
                 separately.  This  option  is mutually exclusive with --line-
                 offsets and --only-matching.

       -H, --with-filename
                 Force the inclusion of the filename at the  start  of  output
                 lines  when searching a single file. By default, the filename
                 is not shown in this case. For matching lines,  the  filename
                 is followed by a colon; for context lines, a hyphen separator
                 is used. If a line number is also being  output,  it  follows
                 the file name.

       -h, --no-filename
                 Suppress the output of filenames when searching multiple files.
                 By default, filenames  are  shown  when  multiple  files  are
                 searched.  For  matching lines, the filename is followed by a
                 colon; for context lines, a hyphen separator is used.   If  a
                 line number is also being output, it follows the file name.

       --help    Output  a  help  message, giving brief details of the command
                 options and file type support, and then exit.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 When pcregrep is searching the files in a directory as a con-
                 sequence of the -r (recursive search) option, only those reg-
                 ular files whose names match the pattern are included. Subdi-
                 rectories  are always included and searched recursively, sub-
                 ject to the --include_dir and --exclude_dir options. The pat-
                 tern is a PCRE regular expression, and is matched against the
                 final component of the file name (not the entire path). If  a
                 file  name  matches  both  --include  and  --exclude,  it  is
                 excluded. There is no short form for this option.

       --include_dir=pattern
                 When pcregrep is searching the contents of a directory  as  a
                 consequence  of  the -r (recursive search) option, only those
                 subdirectories whose names match the  pattern  are  included.
                 (Note  that  the --include option does not affect subdirecto-
                 ries.) The pattern is  a  PCRE  regular  expression,  and  is
                 matched  against  the  final  component  of the name (not the
                 entire  path).  If   a   subdirectory   name   matches   both
                 --include_dir  and --exclude_dir, it is excluded. There is no
                 short form for this option.
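
                 Because these patterns are regular expressions rather than
                 shell globs, a recursive search restricted to C sources
                 might look like the sketch below. The directory layout is
                 hypothetical.

```shell
# Hypothetical source tree with one subdirectory.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/sub"
printf 'int main\n' > "$workdir/src/a.c"
printf 'main helper\n' > "$workdir/src/sub/b.c"
printf 'main idea\n' > "$workdir/src/notes.txt"

# '\.c$' selects names ending in ".c"; it is tested against the final
# path component only, so notes.txt is skipped in every directory.
pcregrep -r --include='\.c$' main "$workdir/src"

# A name that matches both options is excluded, so only sub/b.c remains.
pcregrep -r --include='\.c$' --exclude='^a\.' main "$workdir/src"
```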

       -L, --files-without-match
                 Instead of outputting lines from the files, just  output  the
                 names  of  the files that do not contain any lines that would
                 have been output. Each file name is output once, on  a  sepa-
                 rate line.

       -l, --files-with-matches
                 Instead  of  outputting lines from the files, just output the
                 names of the files containing lines that would have been out-
                 put.  Each  file  name  is  output  once, on a separate line.
                 Searching normally stops as soon as a matching line is  found
                 in  a  file.  However, if the -c (count) option is also used,
                 matching continues in order to obtain the correct count,  and
                 those  files  that  have  at least one match are listed along
                 with their counts. Using this option with -c is a way of sup-
                 pressing the listing of files with no matches.
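
                 The interaction with -c can be sketched as follows, using
                 two invented files, one of which contains no match:

```shell
# Hypothetical files: a.txt has two matching lines, b.txt has none.
workdir=$(mktemp -d)
printf 'foo\nfoo bar\n' > "$workdir/a.txt"
printf 'nothing here\n' > "$workdir/b.txt"

# -c alone prints a count for every file, including the zero for b.txt.
pcregrep -c foo "$workdir/a.txt" "$workdir/b.txt"

# With -l added, only files whose count is greater than zero are listed.
pcregrep -l -c foo "$workdir/a.txt" "$workdir/b.txt"
```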

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --line-offsets
                 Instead  of  showing lines or parts of lines that match, show
                 each match as a line number, the offset from the start of the
                 line,  and a length. The line number is terminated by a colon
                 (as usual; see the -n option), and the offset and length  are
                 separated  by  a  comma.  In  this mode, no context is shown.
                 That is, the -A, -B, and -C options are ignored. If there  is
                 more  than  one  match in a line, each of them is shown sepa-
                 rately. This option is mutually exclusive with --file-offsets
                 and --only-matching.
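
                 To compare this with --file-offsets, the sketch below runs
                 both options over the same invented two-line input. The
                 expected values in the comments follow from the byte
                 positions of "abc" in that input.

```shell
# Two lines of input: "abc abc" (8 bytes with its newline) and "xx abc".

# Line number, then offset within the line and length:
#   1:0,3
#   1:4,3
#   2:3,3
printf 'abc abc\nxx abc\n' | pcregrep --line-offsets abc

# Offset from the start of the input and length:
#   0,3
#   4,3
#   11,3
printf 'abc abc\nxx abc\n' | pcregrep --file-offsets abc
```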

       --locale=locale-name
                 This  option specifies a locale to be used for pattern match-
                 ing. It overrides the value in the LC_ALL or  LC_CTYPE  envi-
                 ronment  variables.  If  no  locale  is  specified,  the PCRE
                 library's default (usually the "C" locale) is used. There  is
                 no short form for this option.

       -M, --multiline
                 Allow  patterns to match more than one line. When this option
                 is given, patterns may usefully contain literal newline char-
                 acters  and  internal  occurrences of ^ and $ characters. The
                 output for any one match may consist of more than  one  line.
                 When  this option is set, the PCRE library is called in "mul-
                 tiline" mode.  There is a limit to the number of  lines  that
                 can  be matched, imposed by the way that pcregrep buffers the
                 input file as it scans it. However, pcregrep ensures that  at
                 least 8K characters or the rest of the document (whichever is
                 the shorter) are available for forward  matching,  and  simi-
                 larly the previous 8K characters (or all the previous charac-
                 ters, if fewer than 8K) are guaranteed to  be  available  for
                 lookbehind assertions.
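
                 A minimal sketch of a match that spans a line boundary,
                 using invented input:

```shell
# The \n in the pattern is a PCRE escape for a newline, so the match
# covers two input lines and both are printed: "start", then "middle".
printf 'start\nmiddle\nend\nother\n' | pcregrep -M 'start\nmiddle'
```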

       -N newline-type, --newline=newline-type
                 The  PCRE  library  supports  five  different conventions for
                 indicating the ends of lines. They are  the  single-character
                 sequences  CR  (carriage  return) and LF (linefeed), the two-
                 character sequence CRLF, an "anycrlf" convention, which  rec-
                 ognizes  any  of the preceding three types, and an "any" con-
                 vention, in which any Unicode line ending sequence is assumed
                 to  end a line. The Unicode sequences are the three just men-
                 tioned,  plus  VT  (vertical  tab,  U+000B),  FF   (formfeed,
                 U+000C),   NEL  (next  line,  U+0085),  LS  (line  separator,
                 U+2028), and PS (paragraph separator, U+2029).

                 When  the  PCRE  library  is  built,  a  default  line-ending
                 sequence   is  specified.   This  is  normally  the  standard
                 sequence for the operating system. Unless otherwise specified
                 by  this  option,  pcregrep  uses the library's default.  The
                 possible values for this option are CR, LF, CRLF, ANYCRLF, or
                 ANY.  This  makes  it  possible to use pcregrep on files that
                 have come from other environments without  having  to  modify
                 their  line  endings.  If the data that is being scanned does
                 not agree with the convention set by  this  option,  pcregrep
                 may behave in strange ways.
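
                 As an illustration, a file with CRLF line endings can be
                 searched with an end-of-line anchor once the convention is
                 stated. The file below is made up for the example.

```shell
# Hypothetical file that uses CRLF line endings.
workdir=$(mktemp -d)
printf 'first\r\nsecond\r\n' > "$workdir/dos.txt"

# With -N CRLF, the "\r\n" pair is the line ending, so "$" matches right
# after "first" and the count of matching lines is 1.
pcregrep -N CRLF -c 'first$' "$workdir/dos.txt"
```

                 With a library default of LF, the same pattern without -N
                 would not match, because the carriage return would count
                 as part of the line.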

       -n, --line-number
                 Precede each output line by its line number in the file, fol-
                 lowed by a colon for matching lines or a hyphen  for  context
                 lines.  If the filename is also being output, it precedes the
                 line number. This option is forced if --line-offsets is used.

       -o, --only-matching
                 Show only the part of the line that  matched  a  pattern.  In
                 this  mode,  no context is shown. That is, the -A, -B, and -C
                 options are ignored. If there is more than  one  match  in  a
                 line,  each  of  them  is shown separately. If -o is combined
                 with -v (invert the sense of the match to  find  non-matching
                 lines),  no  output  is generated, but the return code is set
                 appropriately. This option is mutually exclusive with --file-
                 offsets and --line-offsets.

       -q, --quiet
                 Work quietly, that is, display nothing except error messages.
                 The exit status indicates whether or  not  any  matches  were
                 found.

       -r, --recursive
                 If  any given path is a directory, recursively scan the files
                 it contains, taking note of any --include and --exclude  set-
                 tings.  By  default, a directory is read as a normal file; in
                 some operating systems this gives an  immediate  end-of-file.
                 This  option  is  a  shorthand  for  setting the -d option to
                 "recurse".

       -s, --no-messages
                 Suppress error  messages  about  non-existent  or  unreadable
                 files.  Such  files  are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if  PCRE
                 has  been compiled with UTF-8 support. Both patterns and sub-
                 ject lines must be valid strings of UTF-8 characters.

       -V, --version
                 Write the version numbers of pcregrep and  the  PCRE  library
                 that is being used to the standard error stream.

       -v, --invert-match
                 Invert  the  sense  of  the match, so that lines which do not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns to match only whole words. This is equiva-
                 lent to having \b at the start and end of the pattern.

       -x, --line-regex, --line-regexp
                 Force  the  patterns to be anchored (each must start matching
                 at the beginning of a line) and in addition, require them  to
                 match  entire  lines.  This  is  equivalent to having ^ and $
                 characters at the start and end of each alternative branch in
                 every pattern.
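
       The difference between -w and -x can be sketched with three invented
       input lines:

```shell
# -w: "cat" must be a whole word; prints "cat" and "the cat sat".
printf 'cat\nconcatenate\nthe cat sat\n' | pcregrep -w cat

# -x: the pattern must match the entire line; prints only "cat".
printf 'cat\nconcatenate\nthe cat sat\n' | pcregrep -x cat
```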


ENVIRONMENT VARIABLES

       The  environment  variables  LC_ALL  and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is  used.  This  can  be
       overridden  by  the  --locale  option.  If  no  locale is set, the PCRE
       library's default (usually the "C" locale) is used.


NEWLINES

       The -N (--newline) option allows pcregrep to scan files with  different
       newline  conventions  from  the  default.  However, the setting of this
       option does not affect the way in which pcregrep writes information  to
       the  standard  error  and  output streams. It uses the string "\n" in C
       printf() calls to indicate newlines, relying on the C  I/O  library  to
       convert  this  to  an  appropriate  sequence if the output is sent to a
       file.


OPTIONS COMPATIBILITY

       The majority of short and long forms of pcregrep's options are the same
       as  in  the  GNU grep program. Any long option of the form --xxx-regexp
       (GNU terminology) is also available as --xxx-regex (PCRE  terminology).
       However,  the  --locale,  -M,  --multiline, -u, and --utf-8 options are
       specific to pcregrep. If both the -c and -l options are given, GNU grep
       lists only file names, without counts, but pcregrep gives the counts.


OPTIONS WITH DATA

       There are four different ways in which an option with data can be spec-
       ified.  If a short form option is used, the  data  may  follow  immedi-
       ately, or in the next command line item. For example:

         -f/some/file
         -f /some/file

       If  a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with one exception) it
       may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note,  however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the  shell  expand  ~  to  a  home
       directory, you must separate the file name from the option, because the
       shell does not treat ~ specially unless it is at the start of an item.

       The exception to the above is the --colour  (or  --color)  option,  for
       which  the  data is optional. If this option does have data, it must be
       given in the first form, using an equals character. Otherwise  it  will
       be assumed that it has no data.
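
       The four forms can be tried side by side. In this sketch the pattern
       file and data file are hypothetical; each command selects the same
       line.

```shell
# Hypothetical pattern file and data file.
workdir=$(mktemp -d)
printf 'foo\n' > "$workdir/pats"
printf 'foo\nbar\n' > "$workdir/data"

# Short form: data attached, or given as the next item.
pcregrep -f"$workdir/pats" "$workdir/data"
pcregrep -f "$workdir/pats" "$workdir/data"

# Long form: data after an equals character, or given as the next item.
pcregrep --file="$workdir/pats" "$workdir/data"
pcregrep --file "$workdir/pats" "$workdir/data"
```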


MATCHING ERRORS

       It  is  possible  to supply a regular expression that takes a very long
       time to fail to match certain lines.  Such  patterns  normally  involve
       nested  indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit.  The  PCRE  matching  function  has  a
       resource  limit that causes it to abort in these circumstances. If this
       happens, pcregrep outputs an error message and the line that caused the
       problem  to  the  standard error stream. If there are more than 20 such
       errors, pcregrep gives up.


DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were found,
       and 2 for syntax errors and non-existent or inaccessible files (even if
       matches were found in other files) or too many matching  errors.  Using
       the  -s option to suppress error messages about inaccessible files does
       not affect the return code.
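
       A script can branch on these values. The sketch below uses a
       hypothetical helper named report and an invented file to show the
       three statuses.

```shell
# Hypothetical helper: run a command quietly and print its exit status.
report() {
  rc=0
  "$@" >/dev/null || rc=$?
  echo "exit status $rc"
}

workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/f"

report pcregrep hello "$workdir/f"                        # exit status 0
report pcregrep absent "$workdir/f"                       # exit status 1
# -s hides the message about the missing file, but the status is still 2.
report pcregrep -s hello "$workdir/f" "$workdir/missing"  # exit status 2
```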


SEE ALSO

       pcrepattern(3), pcretest(1).


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.


REVISION

       Last updated: 13 September 2009
       Copyright (c) 1997-2009 University of Cambridge.
