\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename grep.info
@include version.texi
@settitle GNU Grep @value{VERSION}

@c Combine indices.
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp
@defcodeindex op
@syncodeindex op cp
@syncodeindex vr cp
@c %**end of header

@documentencoding UTF-8
@c These two require Texinfo 5.0 or later, so use the older
@c equivalent @set variables supported in 4.11 and later.
@ignore
@codequotebacktick on
@codequoteundirected on
@end ignore
@set txicodequoteundirected
@set txicodequotebacktick
@iftex
@c TeX sometimes fails to hyphenate, so help it here.
@hyphenation{spec-i-fied}
@end iftex

@copying
This manual is for @command{grep}, a pattern matching engine.

Copyright @copyright{} 1999--2002, 2005, 2008--2021 Free Software Foundation,
Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled
``GNU Free Documentation License''.
@end quotation
@end copying

@dircategory Text creation and manipulation
@direntry
* grep: (grep). Print lines that match patterns.
@end direntry

@titlepage
@title GNU Grep: Print lines that match patterns
@subtitle version @value{VERSION}, @value{UPDATED}
@author Alain Magloire et al.
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents


@ifnottex
@node Top
@top grep

@command{grep} prints lines that contain a match for one or more patterns.

This manual is for version @value{VERSION} of GNU Grep.

@insertcopying
@end ifnottex

@menu
* Introduction:: Introduction.
* Invoking:: Command-line options, environment, exit status.
* Regular Expressions:: Regular Expressions.
* Usage:: Examples.
* Performance:: Performance tuning.
* Reporting Bugs:: Reporting Bugs.
* Copying:: License terms for this manual.
* Index:: Combined index.
@end menu


@node Introduction
@chapter Introduction

@cindex searching for patterns

Given one or more patterns, @command{grep} searches input files
for matches to the patterns.
When it finds a match in a line,
it copies the line to standard output (by default),
or produces whatever other sort of output you have requested with options.

Though @command{grep} expects to do the matching on text,
it has no limits on input line length other than available memory,
and it can match arbitrary characters within a line.
If the final byte of an input file is not a newline,
@command{grep} silently supplies one.
Since newline is also a separator for the list of patterns,
there is no way to match newline characters in text.
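
For instance, in the following sketch (which assumes a POSIX shell
and generates its input with @command{printf}), the single pattern
argument contains a newline, so @command{grep} treats it as the two
patterns @samp{apple} and @samp{cherry} and prints only the lines
@samp{apple} and @samp{cherry}:

@example
printf 'apple\nbanana\ncherry\n' | grep 'apple
cherry'
@end example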


@node Invoking
@chapter Invoking @command{grep}

The general synopsis of the @command{grep} command line is

@example
grep [@var{option}...] [@var{patterns}] [@var{file}...]
@end example

@noindent
There can be zero or more @var{option} arguments, and zero or more
@var{file} arguments. The @var{patterns} argument contains one or
more patterns separated by newlines, and is omitted when patterns are
given via the @samp{-e@ @var{patterns}} or @samp{-f@ @var{file}}
options. Typically @var{patterns} should be quoted when
@command{grep} is used in a shell command.
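
Here is a brief sketch, assuming a POSIX shell and a hypothetical file
@file{notes.txt}. Quoting keeps a pattern that contains a space
together as one shell argument, and @option{-e} lets a pattern that
begins with @samp{-} be passed without being taken for an option:

@example
grep 'hello world' notes.txt
grep -e '-v' notes.txt
@end example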

@menu
* Command-line Options:: Short and long names, grouped by category.
* Environment Variables:: POSIX, GNU generic, and GNU grep specific.
* Exit Status:: Exit status returned by @command{grep}.
* grep Programs:: @command{grep} programs.
@end menu

@node Command-line Options
@section Command-line Options

@command{grep} comes with a rich set of options:
some from POSIX and some GNU extensions.
Long option names are always a GNU extension,
even for options that are from POSIX specifications.
Options that are specified by POSIX,
under their short names,
are explicitly marked as such
to facilitate POSIX-portable programming.
A few option names are provided
for compatibility with older or more exotic implementations.

@menu
* Generic Program Information::
* Matching Control::
* General Output Control::
* Output Line Prefix Control::
* Context Line Control::
* File and Directory Selection::
* Other Options::
@end menu

Several additional options control
which variant of the @command{grep} matching engine is used.
@xref{grep Programs}.

@node Generic Program Information
@subsection Generic Program Information

@table @option

@item --help
@opindex --help
@cindex usage summary, printing
Print a usage message briefly summarizing the command-line options
and the bug-reporting address, then exit.

@item -V
@itemx --version
@opindex -V
@opindex --version
@cindex version, printing
Print the version number of @command{grep} to the standard output stream.
This version number should be included in all bug reports.

@end table

@node Matching Control
@subsection Matching Control

@table @option

@item -e @var{patterns}
@itemx --regexp=@var{patterns}
@opindex -e
@opindex --regexp=@var{patterns}
@cindex patterns option
Use @var{patterns} as one or more patterns; newlines within
@var{patterns} separate each pattern from the next.
If this option is used multiple times or is combined with the
@option{-f} (@option{--file}) option, search for all patterns given.
Typically @var{patterns} should be quoted when @command{grep} is used
in a shell command.
(@option{-e} is specified by POSIX.)

@item -f @var{file}
@itemx --file=@var{file}
@opindex -f
@opindex --file
@cindex patterns from file
Obtain patterns from @var{file}, one per line.
If this option is used multiple times or is combined with the
@option{-e} (@option{--regexp}) option, search for all patterns given.
The empty file contains zero patterns, and therefore matches nothing.
(@option{-f} is specified by POSIX.)

@item -i
@itemx -y
@itemx --ignore-case
@opindex -i
@opindex -y
@opindex --ignore-case
@cindex case insensitive search
Ignore case distinctions in patterns and input data,
so that characters that differ only in case
match each other. Although this is straightforward when letters
differ in case only via lowercase-uppercase pairs, the behavior is
unspecified in other situations. For example, uppercase ``S'' has an
unusual lowercase counterpart ``ſ'' (Unicode character U+017F, LATIN
SMALL LETTER LONG S) in many locales, and it is unspecified whether
this unusual character matches ``S'' or ``s'' even though uppercasing
it yields ``S''. Another example: the lowercase German letter ``ß''
(U+00DF, LATIN SMALL LETTER SHARP S) is normally capitalized as the
two-character string ``SS'' but it does not match ``SS'', and it might
not match the uppercase letter ``ẞ'' (U+1E9E, LATIN CAPITAL LETTER
SHARP S) even though lowercasing the latter yields the former.

@option{-y} is an obsolete synonym that is provided for compatibility.
(@option{-i} is specified by POSIX.)

@item --no-ignore-case
@opindex --no-ignore-case
Do not ignore case distinctions in patterns and input data. This is
the default. This option is useful for passing to shell scripts that
already use @option{-i}, to cancel its effect, because the two
options override each other.

@item -v
@itemx --invert-match
@opindex -v
@opindex --invert-match
@cindex invert matching
@cindex print non-matching lines
Invert the sense of matching, to select non-matching lines.
(@option{-v} is specified by POSIX.)

@item -w
@itemx --word-regexp
@opindex -w
@opindex --word-regexp
@cindex matching whole words
Select only those lines containing matches that form whole words.
The test is that the matching substring must either
be at the beginning of the line,
or preceded by a non-word constituent character.
Similarly,
it must be either at the end of the line
or followed by a non-word constituent character.
Word constituent characters are letters, digits, and the underscore.
This option has no effect if @option{-x} is also specified.

Because the @option{-w} option can match a substring that does not
begin and end with word constituents, it differs from surrounding a
regular expression with @samp{\<} and @samp{\>}. For example, although
@samp{grep -w @@} matches a line containing only @samp{@@}, @samp{grep
'\<@@\>'} cannot match any line because @samp{@@} is not a
word constituent. @xref{The Backslash Character and Special
Expressions}.

@item -x
@itemx --line-regexp
@opindex -x
@opindex --line-regexp
@cindex match the whole line
Select only those matches that exactly match the whole line.
For regular expression patterns, this is like parenthesizing each
pattern and then surrounding it with @samp{^} and @samp{$}.
(@option{-x} is specified by POSIX.)

@end table
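
The following sketch exercises several of these options together. It
assumes a POSIX shell; the files @file{sample.txt} and
@file{patterns.txt} are hypothetical, and each comment lists the lines
the command is expected to print.

@example
printf 'Foo bar\nfoobar\nbaz\n' > sample.txt
printf 'foo\nbaz\n' > patterns.txt
grep -e foo -e baz sample.txt    # foobar, baz
grep -f patterns.txt sample.txt  # foobar, baz
grep -i foo sample.txt           # Foo bar, foobar
grep -v -i foo sample.txt        # baz
grep -w -i foo sample.txt        # Foo bar
grep -x baz sample.txt           # baz
@end example

@noindent
Here @samp{foobar} is rejected by @option{-w} because the @samp{foo}
in it is followed by the word constituent @samp{b}.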

@node General Output Control
@subsection General Output Control

@table @option

@item -c
@itemx --count
@opindex -c
@opindex --count
@cindex counting lines
Suppress normal output;
instead print a count of matching lines for each input file.
With the @option{-v} (@option{--invert-match}) option,
count non-matching lines.
(@option{-c} is specified by POSIX.)

@item --color[=@var{WHEN}]
@itemx --colour[=@var{WHEN}]
@opindex --color
@opindex --colour
@cindex highlight, color, colour
Surround the matched (non-empty) strings, matching lines, context lines,
file names, line numbers, byte offsets, and separators (for fields and
groups of context lines) with escape sequences to display them in color
on the terminal.
The colors are defined by the environment variable @env{GREP_COLORS}
and default to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36}
for bold red matched text, magenta file names, green line numbers,
green byte offsets, cyan separators, and default terminal colors otherwise.
The deprecated environment variable @env{GREP_COLOR} is still supported,
but its setting does not take priority over @env{GREP_COLORS};
it defaults to @samp{01;31} (bold red)
and covers only the color for matched text.
@var{WHEN} is @samp{never}, @samp{always}, or @samp{auto}.

@item -L
@itemx --files-without-match
@opindex -L
@opindex --files-without-match
@cindex files which don't match
Suppress normal output;
instead print the name of each input file from which
no output would normally have been printed.

@item -l
@itemx --files-with-matches
@opindex -l
@opindex --files-with-matches
@cindex names of matching files
Suppress normal output;
instead print the name of each input file from which
output would normally have been printed.
Scanning each input file stops upon first match.
(@option{-l} is specified by POSIX.)

@item -m @var{num}
@itemx --max-count=@var{num}
@opindex -m
@opindex --max-count
@cindex max-count
Stop after the first @var{num} selected lines.
If the input is standard input from a regular file,
and @var{num} selected lines are output,
@command{grep} ensures that the standard input is positioned
just after the last selected line before exiting,
regardless of the presence of trailing context lines.
This enables a calling process to resume a search.
For example, the following shell script makes use of it:

@example
while grep -m 1 'PATTERN'
do
  echo xxxx
done < FILE
@end example

But the following probably will not work because a pipe is not a regular
file:

@example
# This probably will not work.
cat FILE |
while grep -m 1 'PATTERN'
do
  echo xxxx
done
@end example

@cindex context lines
When @command{grep} stops after @var{num} selected lines,
it outputs any trailing context lines.
When the @option{-c} or @option{--count} option is also used,
@command{grep} does not output a count greater than @var{num}.
When the @option{-v} or @option{--invert-match} option is also used,
@command{grep} stops after outputting @var{num} non-matching lines.

@item -o
@itemx --only-matching
@opindex -o
@opindex --only-matching
@cindex only matching
Print only the matched (non-empty) parts of matching lines,
with each such part on a separate output line.
Output lines use the same delimiters as input, and delimiters are null
bytes if @option{-z} (@option{--null-data}) is also used (@pxref{Other
Options}).

@item -q
@itemx --quiet
@itemx --silent
@opindex -q
@opindex --quiet
@opindex --silent
@cindex quiet, silent
Quiet; do not write anything to standard output.
Exit immediately with zero status if any match is found,
even if an error was detected.
Also see the @option{-s} or @option{--no-messages} option.
(@option{-q} is specified by POSIX.)

@item -s
@itemx --no-messages
@opindex -s
@opindex --no-messages
@cindex suppress error messages
Suppress error messages about nonexistent or unreadable files.
Portability note:
unlike GNU @command{grep},
7th Edition Unix @command{grep} did not conform to POSIX,
because it lacked @option{-q}
and its @option{-s} option behaved like
GNU @command{grep}'s @option{-q} option.@footnote{Of course, 7th Edition
Unix predated POSIX by several years!}
USG-style @command{grep} also lacked @option{-q}
but its @option{-s} option behaved like GNU @command{grep}'s.
Portable shell scripts should avoid both
@option{-q} and @option{-s} and should redirect
standard output and standard error to @file{/dev/null} instead.
(@option{-s} is specified by POSIX.)

@end table
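
As a rough illustration of these options, the following sketch
assumes a POSIX shell and a hypothetical file @file{digits.txt}; the
comments show the output each command is expected to produce.

@example
printf 'a1 b22\nno digits\nc333\n' > digits.txt
grep -c '[0-9]' digits.txt          # 2
grep -o '[0-9][0-9]*' digits.txt    # 1, 22 and 333, one per line
grep -m 1 '[0-9]' digits.txt        # a1 b22
grep -l '333' digits.txt            # digits.txt
if grep -q '[0-9]' digits.txt; then echo found; fi  # found
@end example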

@node Output Line Prefix Control
@subsection Output Line Prefix Control

When several prefix fields are to be output,
the order is always file name, line number, and byte offset,
regardless of the order in which these options were specified.

@table @option

@item -b
@itemx --byte-offset
@opindex -b
@opindex --byte-offset
@cindex byte offset
Print the 0-based byte offset within the input file
before each line of output.
If @option{-o} (@option{--only-matching}) is specified,
print the offset of the matching part itself.

@item -H
@itemx --with-filename
@opindex -H
@opindex --with-filename
@cindex with filename prefix
Print the file name for each match.
This is the default when there is more than one file to search.

@item -h
@itemx --no-filename
@opindex -h
@opindex --no-filename
@cindex no filename prefix
Suppress the prefixing of file names on output.
This is the default when there is only one file
(or only standard input) to search.

@item --label=@var{LABEL}
@opindex --label
@cindex changing name of standard input
Display input that actually comes from standard input
as if it came from file @var{LABEL}.
This can be useful for commands that transform a file's contents
before searching; e.g.:

@example
gzip -cd foo.gz | grep --label=foo -H 'some pattern'
@end example

@item -n
@itemx --line-number
@opindex -n
@opindex --line-number
@cindex line numbering
Prefix each line of output with the 1-based line number within its input file.
(@option{-n} is specified by POSIX.)

@item -T
@itemx --initial-tab
@opindex -T
@opindex --initial-tab
@cindex tab-aligned content lines
Make sure that the first character of actual line content lies on a tab stop,
so that the alignment of tabs looks normal.
This is useful with options that prefix their output to the actual content:
@option{-H}, @option{-n}, and @option{-b}.
This may also prepend spaces to output line numbers and byte offsets
so that lines from a single file all start at the same column.

@item -Z
@itemx --null
@opindex -Z
@opindex --null
@cindex zero-terminated file names
Output a zero byte (the ASCII NUL character)
instead of the character that normally follows a file name.
For example,
@samp{grep -lZ} outputs a zero byte after each file name
instead of the usual newline.
This option makes the output unambiguous,
even in the presence of file names containing unusual characters like newlines.
This option can be used with commands like
@samp{find -print0}, @samp{perl -0}, @samp{sort -z}, and @samp{xargs -0}
to process arbitrary file names,
even those that contain newline characters.

@end table
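
The following sketch, which assumes a POSIX shell and two hypothetical
files @file{a.txt} and @file{b.txt}, shows how the prefix fields
combine. In the third command the fields appear as file name, line
number, and byte offset even though the options are given in the
opposite order; the offset is 6 because @samp{alpha} and its newline
occupy bytes 0 through 5.

@example
printf 'alpha\nbeta\n' > a.txt
printf 'gamma\nbeta\n' > b.txt
grep -n beta a.txt b.txt       # a.txt:2:beta, b.txt:2:beta
grep -h -n beta a.txt b.txt    # 2:beta, 2:beta
grep -b -n -H beta a.txt       # a.txt:2:6:beta
printf 'beta\n' | grep --label=stdin -H beta  # stdin:beta
@end example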

@node Context Line Control
@subsection Context Line Control

@cindex context lines
@dfn{Context lines} are non-matching lines that are near a matching line.
They are output only if one of the following options is used.
Regardless of how these options are set,
@command{grep} never outputs any given line more than once.
If the @option{-o} (@option{--only-matching}) option is specified,
these options have no effect and a warning is given upon their use.

@table @option

@item -A @var{num}
@itemx --after-context=@var{num}
@opindex -A
@opindex --after-context
@cindex after context
@cindex context lines, after match
Print @var{num} lines of trailing context after matching lines.

@item -B @var{num}
@itemx --before-context=@var{num}
@opindex -B
@opindex --before-context
@cindex before context
@cindex context lines, before match
Print @var{num} lines of leading context before matching lines.

@item -C @var{num}
@itemx -@var{num}
@itemx --context=@var{num}
@opindex -C
@opindex --context
@opindex -@var{num}
@cindex context lines
Print @var{num} lines of leading and trailing output context.

@item --group-separator=@var{string}
@opindex --group-separator
@cindex group separator
When @option{-A}, @option{-B} or @option{-C} are in use,
print @var{string} instead of @option{--} between groups of lines.

@item --no-group-separator
@opindex --no-group-separator
@cindex group separator
When @option{-A}, @option{-B} or @option{-C} are in use,
do not print a separator between groups of lines.

@end table

Here are some points about how @command{grep} chooses
the separator to print between prefix fields and line content:

@itemize @bullet
@item
Matching lines normally use @samp{:} as a separator
between prefix fields and actual line content.

@item
Context (i.e., non-matching) lines use @samp{-} instead.

@item
When context is not specified,
matching lines are simply output one right after another.

@item
When context is specified,
lines that are adjacent in the input form a group
and are output one right after another, while
by default a separator appears between non-adjacent groups.

@item
The default separator
is a @samp{--} line; its presence and appearance
can be changed with the options above.

@item
Each group may contain
several matching lines when they are close enough to each other
that two adjacent groups connect and can merge into a single
contiguous one.
@end itemize
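
For example, the following sketch (assuming a POSIX shell and a
hypothetical file @file{nums.txt}) asks for one line of trailing
context. The matching lines use @samp{:}, the context line uses
@samp{-}, and the two non-adjacent groups are separated by a
@samp{--} line, as the comments show:

@example
printf 'one\ntwo\nthree\nfour\nfive\nsix\n' > nums.txt
grep -n -A 1 -e two -e six nums.txt
# 2:two
# 3-three
# --
# 6:six
@end example

@noindent
Adding @option{--no-group-separator} to the same command omits the
@samp{--} line.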

@node File and Directory Selection
@subsection File and Directory Selection

@table @option

@item -a
@itemx --text
@opindex -a
@opindex --text
@cindex suppress binary data
@cindex binary files
Process a binary file as if it were text;
this is equivalent to the @samp{--binary-files=text} option.

@item --binary-files=@var{type}
@opindex --binary-files
@cindex binary files
If a file's data or metadata
indicate that the file contains binary data,
assume that the file is of type @var{type}.
Non-text bytes indicate binary data; these are either output bytes that are
improperly encoded for the current locale (@pxref{Environment
Variables}), or null input bytes when the
@option{-z} (@option{--null-data}) option is not given (@pxref{Other
Options}).

By default, @var{type} is @samp{binary}, and @command{grep}
suppresses output after null input binary data is discovered,
and suppresses output lines that contain improperly encoded data.
When some output is suppressed, @command{grep} follows any output
with a one-line message saying that a binary file matches.

If @var{type} is @samp{without-match},
when @command{grep} discovers null input binary data
it assumes that the rest of the file does not match;
this is equivalent to the @option{-I} option.

If @var{type} is @samp{text},
@command{grep} processes binary data as if it were text;
this is equivalent to the @option{-a} option.

When @var{type} is @samp{binary}, @command{grep} may treat non-text
bytes as line terminators even without the @option{-z}
(@option{--null-data}) option. This means choosing @samp{binary}
versus @samp{text} can affect whether a pattern matches a file. For
example, when @var{type} is @samp{binary} the pattern @samp{q$} might
match @samp{q} immediately followed by a null byte, even though this
is not matched when @var{type} is @samp{text}. Conversely, when
@var{type} is @samp{binary} the pattern @samp{.} (period) might not
match a null byte.

@emph{Warning:} The @option{-a} (@option{--binary-files=text}) option
might output binary garbage, which can have nasty side effects if the
output is a terminal and if the terminal driver interprets some of it
as commands. On the other hand, when reading files whose text
encodings are unknown, it can be helpful to use @option{-a} or to set
@samp{LC_ALL='C'} in the environment, in order to find more matches
even if the matches are unsafe for direct display.

@item -D @var{action}
@itemx --devices=@var{action}
@opindex -D
@opindex --devices
@cindex device search
If an input file is a device, FIFO, or socket, use @var{action} to process it.
If @var{action} is @samp{read},
all devices are read just as if they were ordinary files.
If @var{action} is @samp{skip},
devices, FIFOs, and sockets are silently skipped.
By default, devices are read if they are on the command line or if the
@option{-R} (@option{--dereference-recursive}) option is used, and are
skipped if they are encountered recursively and the @option{-r}
(@option{--recursive}) option is used.
This option has no effect on a file that is read via standard input.

@item -d @var{action}
@itemx --directories=@var{action}
@opindex -d
@opindex --directories
@cindex directory search
@cindex symbolic links
If an input file is a directory, use @var{action} to process it.
By default, @var{action} is @samp{read},
which means that directories are read just as if they were ordinary files
(some operating systems and file systems disallow this,
and will cause @command{grep}
to print error messages for every directory or silently skip them).
If @var{action} is @samp{skip}, directories are silently skipped.
If @var{action} is @samp{recurse},
@command{grep} reads all files under each directory, recursively,
following command-line symbolic links and skipping other symlinks;
this is equivalent to the @option{-r} option.

@item --exclude=@var{glob}
@opindex --exclude
@cindex exclude files
@cindex searching directory trees
Skip any command-line file with a name suffix that matches the pattern
@var{glob}, using wildcard matching; a name suffix is either the whole
name, or a trailing part that starts with a non-slash character
immediately after a slash (@samp{/}) in the name.
When searching recursively, skip any subfile whose base
name matches @var{glob}; the base name is the part after the last
slash. A pattern can use
@samp{*}, @samp{?}, and @samp{[}...@samp{]} as wildcards,
and @code{\} to quote a wildcard or backslash character literally.

@item --exclude-from=@var{file}
@opindex --exclude-from
@cindex exclude files
@cindex searching directory trees
Skip files whose name matches any of the patterns
read from @var{file} (using wildcard matching as described
under @option{--exclude}).

@item --exclude-dir=@var{glob}
@opindex --exclude-dir
@cindex exclude directories
Skip any command-line directory with a name suffix that matches the
pattern @var{glob}. When searching recursively, skip any subdirectory
whose base name matches @var{glob}. Ignore any redundant trailing
slashes in @var{glob}.

|
---|
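As a minimal sketch of how these three options combine, the commands below build a scratch tree and then search it. Object files are skipped through a pattern file, and any directory named @file{build} is skipped as well. The sketch assumes a POSIX shell with GNU @command{grep} and @command{mktemp}. All file and directory names are hypothetical.

@example
# Build a small scratch tree; all names here are hypothetical.
dir=$(mktemp -d)
mkdir -p "$dir/tree/src" "$dir/tree/build"
printf 'TODO: fix\n' > "$dir/tree/src/main.c"
printf 'TODO: fix\n' > "$dir/tree/src/main.o"
printf 'TODO: fix\n' > "$dir/tree/build/gen.c"

# Wildcard patterns for --exclude-from, one per line.
printf '*.o\n' > "$dir/skip-list"

# main.o is skipped via skip-list and build/ via --exclude-dir,
# so only src/main.c is searched.
out=$(grep -r --exclude-from="$dir/skip-list" --exclude-dir=build \
  TODO "$dir/tree")
printf '%s\n' "$out"
@end example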
@item -I
@opindex -I
Process a binary file as if it did not contain matching data;
this is equivalent to the @samp{--binary-files=without-match} option.

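For example, a file containing a NUL byte is treated as binary, so with @option{-I} only the text file's match is reported. This sketch assumes a POSIX shell and GNU @command{grep}, and the file names are hypothetical.

@example
dir=$(mktemp -d)
printf 'match in text\n' > "$dir/notes.txt"
# The NUL byte (octal 000) makes grep classify this file as binary.
printf 'match\000in binary\n' > "$dir/blob.bin"

# -I treats blob.bin as having no match, so only notes.txt is reported.
out=$(grep -r -I match "$dir")
printf '%s\n' "$out"
@end example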
@item --include=@var{glob}
@opindex --include
@cindex include files
@cindex searching directory trees
Search only files whose name matches @var{glob},
using wildcard matching as described under @option{--exclude}.
If contradictory @option{--include} and @option{--exclude} options are
given, the last matching one wins. If no @option{--include} or
@option{--exclude} options match, a file is included unless the first
such option is @option{--include}.

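The sketch below searches a scratch directory with two @option{--include} options. @file{README} matches neither glob, and because the first such option is @option{--include}, @file{README} is skipped. The sketch assumes a POSIX shell and GNU @command{grep}, and the file names are hypothetical. @command{sort} is used only because the order in which files are visited may vary.

@example
dir=$(mktemp -d)
printf 'main\n' > "$dir/prog.c"
printf 'main\n' > "$dir/prog.h"
printf 'main\n' > "$dir/README"

# List (-l) files whose base name matches *.c or *.h; README is skipped.
out=$(grep -r -l --include='*.c' --include='*.h' main "$dir" | sort)
printf '%s\n' "$out"
@end example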
@item -r
@itemx --recursive
@opindex -r
@opindex --recursive
@cindex recursive search
@cindex searching directory trees
@cindex symbolic links
For each directory operand,
read and process all files in that directory, recursively.
Follow symbolic links on the command line, but skip symlinks
that are encountered recursively.
Note that if no file operand is given, @command{grep} searches the working directory.
This is the same as the @samp{--directories=recurse} option.

@item -R
@itemx --dereference-recursive
@opindex -R
@opindex --dereference-recursive
@cindex recursive search
@cindex searching directory trees
@cindex symbolic links
For each directory operand, read and process all files in that
directory, recursively, following all symbolic links.

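The difference between @option{-r} and @option{-R} appears when a symbolic link is met during recursion. This is a minimal sketch, assuming a POSIX shell, GNU @command{grep}, and hypothetical file names.

@example
dir=$(mktemp -d)
mkdir "$dir/tree" "$dir/elsewhere"
printf 'needle\n' > "$dir/elsewhere/hay.txt"
# A symbolic link inside the tree that points outside it.
ln -s ../elsewhere "$dir/tree/link"

# -r skips symlinks met during recursion, so nothing is selected;
# grep then exits with status 1, hence the "|| true".
r_out=$(grep -r needle "$dir/tree") || true
# -R follows all symbolic links, including tree/link.
R_out=$(grep -R needle "$dir/tree")
printf 'with -r: [%s]\nwith -R: [%s]\n' "$r_out" "$R_out"
@end example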
@end table

@node Other Options
@subsection Other Options

@table @option

@item --
@opindex --
@cindex option delimiter
Delimit the option list. Later arguments, if any, are treated as
operands even if they begin with @samp{-}. For example, @samp{grep PAT --
-file1 file2} searches for the pattern PAT in the files named @file{-file1}
and @file{file2}.

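The following is a runnable version of the @samp{grep PAT -- -file1 file2} example. It assumes a POSIX shell and GNU @command{grep}, and creates a scratch directory with @command{mktemp}.

@example
dir=$(mktemp -d)
printf 'PAT here\n' > "$dir/-file1"
printf 'nothing\n' > "$dir/file2"

# "--" ends option parsing, so -file1 is taken as a file name.
# Without it, grep would parse -file1 as options (-f ile1).
out=$(cd "$dir" && grep PAT -- -file1 file2)
printf '%s\n' "$out"
@end example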
@item --line-buffered
@opindex --line-buffered
@cindex line buffering
Use line buffering for standard output, regardless of output device.
By default, standard output is line buffered for interactive devices,
and is fully buffered otherwise. With full buffering, the output
buffer is flushed when full; with line buffering, the buffer is also
flushed after every output line. The buffer size is system dependent.

@item -U
@itemx --binary
@opindex -U
@opindex --binary
@cindex MS-Windows binary I/O
@cindex binary I/O
On platforms that distinguish between text and binary I/O,
use the latter when reading and writing files other
than the user's terminal, so that all input bytes are read and written
as-is. This overrides the default behavior where @command{grep}
follows the operating system's advice on whether to use text or binary
I/O@. On MS-Windows, when @command{grep} uses text I/O, it reads a
carriage return--newline pair as a newline and a Control-Z as
end-of-file, and it writes a newline as a carriage return--newline
pair.

When using text I/O, @option{--byte-offset} (@option{-b}) counts and
@option{--binary-files} heuristics apply to input data after text-I/O
processing. Also, the @option{--binary-files} heuristics need not agree
with the @option{--binary} option; that is, they may treat the data as
text even if @option{--binary} is given, or vice versa.
@xref{File and Directory Selection}.

This option has no effect on GNU and other POSIX-compatible platforms,
which do not distinguish text from binary I/O.

@item -z
@itemx --null-data
@opindex -z
@opindex --null-data
@cindex zero-terminated lines
Treat input and output data as sequences of lines, each terminated by
a zero byte (the ASCII NUL character) instead of a newline.
Like the @option{-Z} or @option{--null} option,
this option can be used with commands like
@samp{sort -z} to process arbitrary file names.

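A common use of this option is filtering NUL-terminated file names from @samp{find -print0}. The sketch below assumes GNU @command{find}, @command{sort}, and @command{tr} alongside GNU @command{grep}. It uses hypothetical file names, one of which contains a space.

@example
dir=$(mktemp -d)
# Hypothetical names; one contains a space.
: > "$dir/report 1.txt"
: > "$dir/notes.md"

# find emits NUL-terminated names, grep -z keeps them NUL-terminated,
# and sort -z sorts them; tr turns NULs into newlines only for display.
out=$(find "$dir" -type f -print0 | grep -z '\.txt$' | sort -z | tr '\0' '\n')
printf '%s\n' "$out"
@end example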
@end table

@node Environment Variables
@section Environment Variables

The behavior of @command{grep} is affected
by the following environment variables.

@vindex LANGUAGE @r{environment variable}
@vindex LC_ALL @r{environment variable}
@vindex LC_MESSAGES @r{environment variable}
@vindex LANG @r{environment variable}
The locale for category @w{@code{LC_@var{foo}}}
is specified by examining the three environment variables
@env{LC_ALL}, @w{@env{LC_@var{foo}}}, and @env{LANG},
in that order.
The first of these variables that is set specifies the locale.
For example, if @env{LC_ALL} is not set,
but @env{LC_COLLATE} is set to @samp{pt_BR},
then the Brazilian Portuguese locale is used
for the @env{LC_COLLATE} category.
As a special case for @env{LC_MESSAGES} only, the environment variable
@env{LANGUAGE} can contain a colon-separated list of languages that
overrides the three environment variables that ordinarily specify
the @env{LC_MESSAGES} category.
The @samp{C} locale is used if none of these environment variables are set,
if the locale catalog is not installed,
or if @command{grep} was not compiled
with national language support (NLS).
The shell command @code{locale -a} lists locales that are currently available.

Many of the environment variables in the following list let you
control highlighting using
Select Graphic Rendition (SGR)
commands interpreted by the terminal or terminal emulator.
(See the SGR section
in the documentation of your text terminal
for permitted values and their meanings as character attributes.)
These substring values are integers in decimal representation
and can be concatenated with semicolons.
@command{grep} takes care of assembling the result
into a complete SGR sequence (@samp{\33[}...@samp{m}).
Common values to concatenate include
@samp{1} for bold,
@samp{4} for underline,
@samp{5} for blink,
@samp{7} for inverse,
@samp{39} for default foreground color,
@samp{30} to @samp{37} for foreground colors,
@samp{90} to @samp{97} for 16-color mode foreground colors,
@samp{38;5;0} to @samp{38;5;255}
for 88-color and 256-color mode foreground colors,
@samp{49} for default background color,
@samp{40} to @samp{47} for background colors,
@samp{100} to @samp{107} for 16-color mode background colors,
and @samp{48;5;0} to @samp{48;5;255}
for 88-color and 256-color mode background colors.

The two-letter names used in the @env{GREP_COLORS} environment variable
(and some of the others) refer to terminal ``capabilities,'' the ability
of a terminal to highlight text, or change its color, and so on.
These capabilities are stored in an online database and accessed by
the @code{terminfo} library.

@cindex environment variables

@table @env

@item GREP_COLOR
@vindex GREP_COLOR @r{environment variable}
@cindex highlight markers
This variable specifies the color used to highlight matched (non-empty) text.
It is deprecated in favor of @env{GREP_COLORS}, but still supported.
The @samp{mt}, @samp{ms}, and @samp{mc} capabilities of @env{GREP_COLORS}
have priority over it.
It can only specify the color used to highlight
the matching non-empty text in any matching line
(a selected line when the @option{-v} command-line option is omitted,
or a context line when @option{-v} is specified).
The default is @samp{01;31},
which means bold red foreground text on the terminal's default background.

@item GREP_COLORS
@vindex GREP_COLORS @r{environment variable}
@cindex highlight markers
This variable specifies the colors and other attributes
used to highlight various parts of the output.
Its value is a colon-separated list of @code{terminfo} capabilities
that defaults to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36}
with the @samp{rv} and @samp{ne} boolean capabilities omitted (i.e., false).
Supported capabilities are as follows.

@table @code
@item sl=
@vindex sl GREP_COLORS @r{capability}
SGR substring for whole selected lines
(i.e.,
matching lines when the @option{-v} command-line option is omitted,
or non-matching lines when @option{-v} is specified).
If, however, the boolean @samp{rv} capability
and the @option{-v} command-line option are both specified,
it applies to context matching lines instead.
The default is empty (i.e., the terminal's default color pair).

@item cx=
@vindex cx GREP_COLORS @r{capability}
SGR substring for whole context lines
(i.e.,
non-matching lines when the @option{-v} command-line option is omitted,
or matching lines when @option{-v} is specified).
If, however, the boolean @samp{rv} capability
and the @option{-v} command-line option are both specified,
it applies to selected non-matching lines instead.
The default is empty (i.e., the terminal's default color pair).

@item rv
@vindex rv GREP_COLORS @r{capability}
Boolean value that reverses (swaps) the meanings of
the @samp{sl=} and @samp{cx=} capabilities
when the @option{-v} command-line option is specified.
The default is false (i.e., the capability is omitted).

@item mt=01;31
@vindex mt GREP_COLORS @r{capability}
SGR substring for matching non-empty text in any matching line
(i.e.,
a selected line when the @option{-v} command-line option is omitted,
or a context line when @option{-v} is specified).
Setting this is equivalent to setting both @samp{ms=} and @samp{mc=}
at once to the same value.
The default is a bold red text foreground over the current line background.

@item ms=01;31
@vindex ms GREP_COLORS @r{capability}
SGR substring for matching non-empty text in a selected line.
(This is used only when the @option{-v} command-line option is omitted.)
The effect of the @samp{sl=} (or @samp{cx=} if @samp{rv}) capability
remains active when this takes effect.
The default is a bold red text foreground over the current line background.

@item mc=01;31
@vindex mc GREP_COLORS @r{capability}
SGR substring for matching non-empty text in a context line.
(This is used only when the @option{-v} command-line option is specified.)
The effect of the @samp{cx=} (or @samp{sl=} if @samp{rv}) capability
remains active when this takes effect.
The default is a bold red text foreground over the current line background.

@item fn=35
@vindex fn GREP_COLORS @r{capability}
SGR substring for file names prefixing any content line.
The default is a magenta text foreground over the terminal's default background.

@item ln=32
@vindex ln GREP_COLORS @r{capability}
SGR substring for line numbers prefixing any content line.
The default is a green text foreground over the terminal's default background.

@item bn=32
@vindex bn GREP_COLORS @r{capability}
SGR substring for byte offsets prefixing any content line.
The default is a green text foreground over the terminal's default background.

@item se=36
@vindex se GREP_COLORS @r{capability}
SGR substring for separators that are inserted
between selected line fields (@samp{:}),
between context line fields (@samp{-}),
and between groups of adjacent lines
when nonzero context is specified (@samp{--}).
The default is a cyan text foreground over the terminal's default background.

@item ne
@vindex ne GREP_COLORS @r{capability}
Boolean value that prevents clearing to the end of line
using Erase in Line (EL) to Right (@samp{\33[K})
each time a colorized item ends.
This is needed on terminals on which EL is not supported.
It is otherwise useful on terminals
for which the @code{back_color_erase}
(@code{bce}) boolean @code{terminfo} capability does not apply,
when the chosen highlight colors do not affect the background,
or when EL is too slow or causes too much flicker.
The default is false (i.e., the capability is omitted).
@end table

Note that boolean capabilities have no @samp{=}... part.
They are omitted (i.e., false) by default and become true when specified.

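As an illustration, the command below colors matches bold green and sets the @samp{ne} capability. As a result, the output should contain only the SGR start sequence, the match, and the SGR reset, with no @samp{\33[K} sequences. This is a sketch assuming GNU @command{grep} and a POSIX shell. The exact bytes shown in the comment are what GNU @command{grep} is expected to emit, so verify them against your version.

@example
# mt=01;32 colors matches bold green; ne omits the EL (\33[K) sequences.
# --color=always emits escape sequences even when output is a pipe.
out=$(printf 'hello world\n' | GREP_COLORS='mt=01;32:ne' grep --color=always world)

# Show the raw bytes; ESC (octal 033) starts each SGR sequence.
printf '%s\n' "$out" | od -c
@end example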
@item LC_ALL
@itemx LC_COLLATE
@itemx LANG
@vindex LC_ALL @r{environment variable}
@vindex LC_COLLATE @r{environment variable}
@vindex LANG @r{environment variable}
@cindex character type
@cindex national language support
@cindex NLS
These variables specify the locale for the @env{LC_COLLATE} category,
which might affect how range expressions like @samp{[a-z]} are
interpreted.

@item LC_ALL
@itemx LC_CTYPE
@itemx LANG
@vindex LC_ALL @r{environment variable}
@vindex LC_CTYPE @r{environment variable}
@vindex LANG @r{environment variable}
@cindex encoding error
@cindex null character
These variables specify the locale for the @env{LC_CTYPE} category,
which determines the type of characters,
e.g., which characters are whitespace.
This category also determines the character encoding.
@xref{Character Encoding}.

@item LANGUAGE
@itemx LC_ALL
@itemx LC_MESSAGES
@itemx LANG
@vindex LANGUAGE @r{environment variable}
@vindex LC_ALL @r{environment variable}
@vindex LC_MESSAGES @r{environment variable}
@vindex LANG @r{environment variable}
@cindex language of messages
@cindex message language
@cindex national language support
@cindex translation of message language
These variables specify the locale for the @env{LC_MESSAGES} category,
which determines the language that @command{grep} uses for messages.
The default @samp{C} locale uses American English messages.

@item POSIXLY_CORRECT
@vindex POSIXLY_CORRECT @r{environment variable}
If set, @command{grep} behaves as POSIX requires; otherwise,
@command{grep} behaves more like other GNU programs.
POSIX requires that options that
follow file names be treated as file names;
by default,
such options are permuted to the front of the operand list
and are treated as options.
Also, @env{POSIXLY_CORRECT} disables special handling of an
invalid bracket expression. @xref{invalid-bracket-expr}.

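The following is a minimal sketch of the option-permutation difference. It assumes GNU @command{grep}, a POSIX shell, and a hypothetical file @file{f}.

@example
dir=$(mktemp -d)
printf 'hit\n' > "$dir/f"

# Default GNU behavior: the trailing -c is permuted and taken as an
# option, so grep prints a count for the single file f.
gnu=$(cd "$dir" && grep hit f -c)

# With POSIXLY_CORRECT, -c is an operand: grep searches f, reports
# that there is no file named -c, and exits with status 2.
posix=$(cd "$dir" && POSIXLY_CORRECT=1 grep hit f -c 2>/dev/null) || true
printf 'default: %s\nposix: %s\n' "$gnu" "$posix"
@end example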
@item _@var{N}_GNU_nonoption_argv_flags_
@vindex _@var{N}_GNU_nonoption_argv_flags_ @r{environment variable}
(Here @code{@var{N}} is @command{grep}'s numeric process ID.)
If the @var{i}th character of this environment variable's value is @samp{1},
do not consider the @var{i}th operand of @command{grep} to be an option,
even if it appears to be one.
A shell can put this variable in the environment for each command it runs,
specifying which operands are the results of file name wildcard expansion
and therefore should not be treated as options.
This behavior is available only with the GNU C library,
and only when @env{POSIXLY_CORRECT} is not set.

@end table

The @env{GREP_OPTIONS} environment variable of @command{grep} 2.20 and
earlier is no longer supported, as it caused problems when writing
portable scripts. To make arbitrary changes to how @command{grep}
works, you can use an alias or script instead. For example, if
@command{grep} is in the directory @file{/usr/bin}, you can prepend
@file{$HOME/bin} to your @env{PATH} and create an executable script
@file{$HOME/bin/grep} containing the following:

@example
#! /bin/sh
export PATH=/usr/bin
exec grep --color=auto --devices=skip "$@@"
@end example

@node Exit Status
@section Exit Status
@cindex exit status
@cindex return status

Normally the exit status is 0 if a line is selected, 1 if no lines
were selected, and 2 if an error occurred. However, if the
@option{-q} or @option{--quiet} or @option{--silent} option is used
and a line is selected, the exit status is 0 even if an error
occurred. Other @command{grep} implementations may exit with status
greater than 2 on error.

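The following sketch records each of these exit statuses, including the @option{-q} special case. It assumes a POSIX shell, GNU @command{grep}, and that @file{/nonexistent/file} does not exist.

@example
# Each "|| var=$?" records a nonzero status without stopping a
# script that runs under set -e.
found=0;   printf 'alpha\n' | grep -q alpha || found=$?
missing=0; printf 'alpha\n' | grep -q beta  || missing=$?
error=0;   grep alpha /nonexistent/file 2>/dev/null || error=$?

# With -q, a selected line yields 0 even though the first operand
# could not be read.
quiet=0;   printf 'alpha\n' | grep -q alpha /nonexistent/file - 2>/dev/null || quiet=$?

echo "$found $missing $error $quiet"   # prints 0 1 2 0
@end example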
@node grep Programs
@section @command{grep} Programs
@cindex @command{grep} programs
@cindex variants of @command{grep}

@command{grep} searches the named input files
for lines containing a match to the given patterns.
By default, @command{grep} prints the matching lines.
A file named @file{-} stands for standard input.
If no input is specified, @command{grep} searches the working
directory @file{.} if given a command-line option specifying
recursion; otherwise, @command{grep} searches standard input.
There are four major variants of @command{grep},
controlled by the following options.

@table @option

@item -G
@itemx --basic-regexp
@opindex -G
@opindex --basic-regexp
@cindex matching basic regular expressions
Interpret patterns as basic regular expressions (BREs).
This is the default.

@item -E
@itemx --extended-regexp
@opindex -E
@opindex --extended-regexp
@cindex matching extended regular expressions
Interpret patterns as extended regular expressions (EREs).
(@option{-E} is specified by POSIX.)

@item -F
@itemx --fixed-strings
@opindex -F
@opindex --fixed-strings
@cindex matching fixed strings
Interpret patterns as fixed strings, not regular expressions.
(@option{-F} is specified by POSIX.)

@item -P
@itemx --perl-regexp
@opindex -P
@opindex --perl-regexp
@cindex matching Perl-compatible regular expressions
Interpret patterns as Perl-compatible regular expressions (PCREs).
PCRE support is here to stay, but consider this option experimental when
combined with the @option{-z} (@option{--null-data}) option, and note that
@samp{grep@ -P} may warn of unimplemented features.
@xref{Other Options}.

@end table

In addition,
two variant programs @command{egrep} and @command{fgrep} are available.
@command{egrep} is the same as @samp{grep@ -E}.
@command{fgrep} is the same as @samp{grep@ -F}.
Direct invocation as either
@command{egrep} or @command{fgrep} is deprecated,
but is provided to allow historical applications
that rely on them to run unmodified.

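The three main variants can interpret the same pattern differently. For example, the pattern @samp{a+b} is matched as follows under each variant. This sketch assumes GNU @command{grep} and a POSIX shell.

@example
# The same pattern a+b under the three main variants.
bre=$(printf 'a+b\naab\n' | grep -G 'a+b')    # BRE: + is ordinary
ere=$(printf 'a+b\naab\n' | grep -E 'a+b')    # ERE: + means one or more
fixed=$(printf 'a+b\naab\n' | grep -F 'a+b')  # fixed string
printf 'G: %s\nE: %s\nF: %s\n' "$bre" "$ere" "$fixed"
@end example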
@node Regular Expressions
@chapter Regular Expressions
@cindex regular expressions

A @dfn{regular expression} is a pattern that describes a set of strings.
Regular expressions are constructed analogously to arithmetic expressions,
by using various operators to combine smaller expressions.
@command{grep} understands
three different versions of regular expression syntax:
basic (BRE), extended (ERE), and Perl-compatible (PCRE).
In GNU @command{grep},
there is no difference in available functionality between the basic and
extended syntaxes.
In other implementations, basic regular expressions are less powerful.
The following description applies to extended regular expressions;
differences for basic regular expressions are summarized afterwards.
Perl-compatible regular expressions provide additional functionality;
they are documented in the @i{pcresyntax}(3) and @i{pcrepattern}(3) manual
pages and work only if PCRE is available on the system.

@menu
* Fundamental Structure::
* Character Classes and Bracket Expressions::
* The Backslash Character and Special Expressions::
* Anchoring::
* Back-references and Subexpressions::
* Basic vs Extended::
* Character Encoding::
* Matching Non-ASCII::
@end menu

@node Fundamental Structure
@section Fundamental Structure

@cindex ordinary characters
@cindex special characters
In regular expressions, the characters @samp{.?*+@{|()[\^$} are
@dfn{special characters} and have uses described below. All other
characters are @dfn{ordinary characters}, and each ordinary character
is a regular expression that matches itself.

@opindex .
@cindex dot
@cindex period
The period @samp{.} matches any single character.
It is unspecified whether @samp{.} matches an encoding error.

@cindex interval expressions
A regular expression may be followed by one of several
repetition operators; the operators beginning with @samp{@{}
are called @dfn{interval expressions}.

@table @samp

@item ?
@opindex ?
@cindex question mark
@cindex match expression at most once
The preceding item is optional and is matched at most once.

@item *
@opindex *
@cindex asterisk
@cindex match expression zero or more times
The preceding item is matched zero or more times.

@item +
@opindex +
@cindex plus sign
@cindex match expression one or more times
The preceding item is matched one or more times.

@item @{@var{n}@}
@opindex @{@var{n}@}
@cindex braces, one argument
@cindex match expression @var{n} times
The preceding item is matched exactly @var{n} times.

@item @{@var{n},@}
@opindex @{@var{n},@}
@cindex braces, second argument omitted
@cindex match expression @var{n} or more times
The preceding item is matched @var{n} or more times.

@item @{,@var{m}@}
@opindex @{,@var{m}@}
@cindex braces, first argument omitted
@cindex match expression at most @var{m} times
The preceding item is matched at most @var{m} times.
This is a GNU extension.

@item @{@var{n},@var{m}@}
@opindex @{@var{n},@var{m}@}
@cindex braces, two arguments
@cindex match expression from @var{n} to @var{m} times
The preceding item is matched at least @var{n} times, but not more than
@var{m} times.

@end table

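The following is a minimal sketch of the single-character repetition operators. It assumes GNU @command{grep} and a POSIX shell. @option{-x} requires the whole line to match, and @option{-c} counts the selected lines.

@example
input=$(printf 'a\nab\nabb\nabbb')
# -x: the whole line must match; -c: count the selected lines.
opt=$(printf '%s\n' "$input" | grep -cxE 'ab?')    # a, ab
star=$(printf '%s\n' "$input" | grep -cxE 'ab*')   # all four lines
plus=$(printf '%s\n' "$input" | grep -cxE 'ab+')   # ab, abb, abbb
echo "$opt $star $plus"   # prints 2 4 3
@end example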
The empty regular expression matches the empty string.
Two regular expressions may be concatenated;
the resulting regular expression
matches any string formed by concatenating two substrings
that respectively match the concatenated expressions.

Two regular expressions may be joined by the infix operator @samp{|};
the resulting regular expression
matches any string matching either alternate expression.

Repetition takes precedence over concatenation,
which in turn takes precedence over alternation.
A whole expression may be enclosed in parentheses
to override these precedence rules and form a subexpression.
An unmatched @samp{)} matches just itself.

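The following sketch illustrates these precedence rules using whole-line matches (@option{-x}). It assumes GNU @command{grep} with @option{-E} and a POSIX shell.

@example
# Repetition binds tighter than concatenation: ab* means a(b*).
rep=$(printf 'abab\nabbb\n' | grep -xE 'ab*')      # selects abbb
# Parentheses apply the repetition to the subexpression ab.
grp=$(printf 'abab\nabbb\n' | grep -xE '(ab)*')    # selects abab
# Concatenation binds tighter than alternation: ab|cd is (ab)|(cd).
alt=$(printf 'ab\ncd\nabd\n' | grep -cxE 'ab|cd')      # 2
sub=$(printf 'ab\ncd\nabd\n' | grep -cxE 'a(b|c)d')    # 1 (abd)
echo "$rep $grp $alt $sub"   # prints abbb abab 2 1
@end example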
@node Character Classes and Bracket Expressions
@section Character Classes and Bracket Expressions

@cindex bracket expression
@cindex character class
A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
@samp{]}.
It matches any single character in that list.
If the first character of the list is the caret @samp{^},
then it matches any character @strong{not} in the list,
and it is unspecified whether it matches an encoding error.
For example, the regular expression
@samp{[0123456789]} matches any single digit,
whereas @samp{[^()]} matches any single character that is not
an opening or closing parenthesis, and might or might not match an
encoding error.

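For example, the two bracket expressions above behave as follows on plain ASCII input. This sketch assumes GNU @command{grep} and a POSIX shell.

@example
# [0123456789] matches any single digit.
digits=$(printf '42\n(x)\n()\n' | grep -c '[0123456789]')   # 1
# [^()] matches any character except a parenthesis: "()" has none.
other=$(printf '()\n(x)\n' | grep '[^()]')                   # (x)
echo "$digits $other"   # prints 1 (x)
@end example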
@cindex range expression
Within a bracket expression, a @dfn{range expression} consists of two
characters separated by a hyphen.
It matches any single character that
sorts between the two characters, inclusive.
In the default C locale, the sorting sequence is the native character
order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
In other locales, the sorting sequence is not specified, and
@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
characters that it matches might even be erratic.
To obtain the traditional interpretation
of bracket expressions, you can use the @samp{C} locale by setting the
@env{LC_ALL} environment variable to the value @samp{C}.

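The following is a minimal sketch showing that, in the @samp{C} locale, an upper-case letter falls outside @samp{[a-d]}. It assumes GNU @command{grep} and a POSIX shell.

@example
# In the C locale [a-d] is exactly [abcd], so B and e are rejected.
c_range=$(printf 'B\nb\ne\n' | LC_ALL=C grep -x '[a-d]')
echo "$c_range"   # prints b
@end example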
Finally, certain named classes of characters are predefined within
bracket expressions, as follows.
Their interpretation depends on the @env{LC_CTYPE} locale;
for example, @samp{[[:alnum:]]} means the character class of numbers and letters
in the current locale.

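Here are two of the classes listed below in use. This sketch assumes GNU @command{grep}, a POSIX shell, and the @samp{C} locale. Note that each class name appears inside an outer bracket expression.

@example
# -x with [[:alnum:]]* selects lines made only of letters and digits.
alnum=$(printf 'abc123\nabc-123\n' | LC_ALL=C grep -x '[[:alnum:]]*')
# [[:blank:]] matches a space or a tab; only the first line has one.
blank=$(printf 'a\tb\nab\n' | LC_ALL=C grep -c '[[:blank:]]')
echo "$alnum $blank"   # prints abc123 1
@end example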
@cindex classes of characters
@cindex character classes
@table @samp

@item [:alnum:]
@opindex alnum @r{character class}
@cindex alphanumeric characters
Alphanumeric characters:
@samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII
character encoding, this is the same as @samp{[0-9A-Za-z]}.

@item [:alpha:]
@opindex alpha @r{character class}
@cindex alphabetic characters
Alphabetic characters:
@samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII
character encoding, this is the same as @samp{[A-Za-z]}.

@item [:blank:]
@opindex blank @r{character class}
@cindex blank characters
Blank characters:
space and tab.

@item [:cntrl:]
@opindex cntrl @r{character class}
@cindex control characters
Control characters.
In ASCII, these characters have octal codes 000
through 037 and 177 (DEL).
In other character sets, these are
the equivalent characters, if any.

@item [:digit:]
@opindex digit @r{character class}
@cindex digit characters
@cindex numeric characters
Digits: @code{0 1 2 3 4 5 6 7 8 9}.

@item [:graph:]
@opindex graph @r{character class}
@cindex graphic characters
Graphical characters:
@samp{[:alnum:]} and @samp{[:punct:]}.

@item [:lower:]
@opindex lower @r{character class}
@cindex lower-case letters
Lower-case letters; in the @samp{C} locale and ASCII character
encoding, this is
@code{a b c d e f g h i j k l m n o p q r s t u v w x y z}.

@item [:print:]
@opindex print @r{character class}
@cindex printable characters
Printable characters:
@samp{[:alnum:]}, @samp{[:punct:]}, and space.

@item [:punct:]
@opindex punct @r{character class}
@cindex punctuation characters
Punctuation characters; in the @samp{C} locale and ASCII character
encoding, this is
@code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}.

@item [:space:]
@opindex space @r{character class}
@cindex space characters
@cindex whitespace characters
Space characters; in the @samp{C} locale, this is
tab, newline, vertical tab, form feed, carriage return, and space.
@xref{Usage}, for more discussion of matching newlines.

@item [:upper:]
@opindex upper @r{character class}
@cindex upper-case letters
Upper-case letters; in the @samp{C} locale and ASCII character
encoding, this is
@code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}.

@item [:xdigit:]
|
---|
1407 | @opindex xdigit @r{character class}
|
---|
1408 | @cindex xdigit class
|
---|
1409 | @cindex hexadecimal digits
|
---|
1410 | Hexadecimal digits:
|
---|
1411 | @code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}.
|
---|
1412 |
|
---|
1413 | @end table
|
---|
1414 | Note that the brackets in these class names are
|
---|
1415 | part of the symbolic names, and must be included in addition to
|
---|
1416 | the brackets delimiting the bracket expression.
|
---|
1417 |
|
---|
1418 | @anchor{invalid-bracket-expr}
|
---|
1419 | If you mistakenly omit the outer brackets, and search for say, @samp{[:upper:]},
|
---|
1420 | GNU @command{grep} prints a diagnostic and exits with status 2, on
|
---|
1421 | the assumption that you did not intend to search for the nominally
|
---|
1422 | equivalent regular expression: @samp{[:epru]}.
|
---|
1423 | Set the @env{POSIXLY_CORRECT} environment variable to disable this feature.
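
As a quick check of the rules above, the following commands first use a
class name correctly inside a bracket expression, then omit the outer
brackets so that @command{grep} rejects the pattern.  This is a sketch
that assumes GNU @command{grep} with @env{POSIXLY_CORRECT} unset; the
sample input is made up for illustration:

@example
# Keep only lines made entirely of hexadecimal digits: cafe, BEEF.
printf 'cafe\nBEEF\nzebra\n' | grep '^[[:xdigit:]]*$'

# Missing outer brackets: GNU grep prints a diagnostic and exits with 2.
echo 'ABC' | grep '[:upper:]' || echo "exit status: $?"
@end example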

Special characters lose their special meaning inside bracket expressions.

@table @samp
@item ]
ends the bracket expression if it's not the first list item.
So, if you want to make the @samp{]} character a list item,
you must put it first.

@item [.
represents the open collating symbol.

@item .]
represents the close collating symbol.

@item [=
represents the open equivalence class.

@item =]
represents the close equivalence class.

@item [:
represents the open character class symbol, and should be followed by a
valid character class name.

@item :]
represents the close character class symbol.

@item -
represents the range if it's not first or last in a list or the ending point
of a range.

@item ^
represents the characters not in the list.
If you want to make the @samp{^}
character a list item, place it anywhere but first.

@end table
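
For instance, the following sketch relies on the placement rules above.
In @samp{[]^-]} the @samp{]} comes first, the @samp{^} is not first, and
the @samp{-} is last, so all three are ordinary list items.  A leading
@samp{^} instead negates the list.  The sample input is invented for
illustration:

@example
# Matches a]b, a^b and a-b.
printf 'a]b\na^b\na-b\nacb\n' | grep 'a[]^-]b'

# Negated list: matches only acb.
printf 'a]b\na^b\na-b\nacb\n' | grep 'a[^]^-]b'
@end example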

@node The Backslash Character and Special Expressions
@section The Backslash Character and Special Expressions
@cindex backslash

The @samp{\} character followed by a special character is a regular
expression that matches the special character.
The @samp{\} character,
when followed by certain ordinary characters,
takes a special meaning:

@table @samp

@item \b
Match the empty string at the edge of a word.

@item \B
Match the empty string provided it's not at the edge of a word.

@item \<
Match the empty string at the beginning of a word.

@item \>
Match the empty string at the end of a word.

@item \w
Match a word constituent; it is a synonym for @samp{[_[:alnum:]]}.

@item \W
Match a non-word constituent; it is a synonym for @samp{[^_[:alnum:]]}.

@item \s
Match whitespace; it is a synonym for @samp{[[:space:]]}.

@item \S
Match non-whitespace; it is a synonym for @samp{[^[:space:]]}.

@end table

For example, @samp{\brat\b} matches the separate word @samp{rat},
while @samp{\Brat\B} matches @samp{crate} but not @samp{furry rat}.
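
The following commands sketch these operators on made-up input.  They
assume GNU @command{grep}: @samp{\w\+} relies on the GNU @samp{\+}
extension to basic regular expressions, and @option{-o} prints each
match on its own line:

@example
printf 'rat\ncrate\nfurry rat\n' | grep '\brat\b'   # rat, furry rat
printf 'rat\ncrate\nfurry rat\n' | grep '\Brat\B'   # crate
echo 'foo_bar-baz qux' | grep -o '\w\+'            # foo_bar, baz, qux
@end example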

@node Anchoring
@section Anchoring
@cindex anchoring

The caret @samp{^} and the dollar sign @samp{$} are special characters that
respectively match the empty string at the beginning and end of a line.
They are termed @dfn{anchors}, since they force the match to be ``anchored''
to the beginning or end of a line, respectively.
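
Here is a short illustration using made-up input whose third line is
empty.  The last command combines both anchors to count empty lines:

@example
printf 'log start\nstart of log\n\nend\n' | grep '^start'   # start of log
printf 'log start\nstart of log\n\nend\n' | grep 'start$'   # log start
printf 'log start\nstart of log\n\nend\n' | grep -c '^$'    # 1
@end example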

@node Back-references and Subexpressions
@section Back-references and Subexpressions
@cindex subexpression
@cindex back-reference

The back-reference @samp{\@var{n}},
where @var{n} is a single nonzero digit, matches
the substring previously matched by the @var{n}th parenthesized subexpression
of the regular expression.
For example, @samp{(a)\1} matches @samp{aa}.
If the parenthesized subexpression does not participate in the match,
the back-reference makes the whole match fail;
for example, @samp{(a)*\1} fails to match @samp{a}.
If the parenthesized subexpression matches more than one substring,
the back-reference refers to the last matched substring;
for example, @samp{^(ab*)*\1$} matches @samp{ababbabb} but not @samp{ababbab}.
When multiple regular expressions are given with
@option{-e} or from a file (@samp{-f @var{file}}),
back-references are local to each expression.
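
The following sketch shows back-references in both syntaxes on made-up
input.  The first two commands use extended syntax with @option{-E}.
The last uses the basic-syntax spelling @samp{\(@dots{}\)}:

@example
echo 'aa' | grep -E '(a)\1'                  # aa
echo 'abcabc abcab' | grep -o -E '(abc)\1'   # abcabc
echo 'abab' | grep '^\(ab\)\1$'              # abab
@end example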

@xref{Known Bugs}, for some known problems with back-references.

@node Basic vs Extended
@section Basic vs Extended Regular Expressions
@cindex basic regular expressions

In basic regular expressions the characters @samp{?}, @samp{+},
@samp{@{}, @samp{|}, @samp{(}, and @samp{)} lose their special meaning;
instead use the backslashed versions @samp{\?}, @samp{\+}, @samp{\@{},
@samp{\|}, @samp{\(}, and @samp{\)}. Also, a backslash is needed
before an interval expression's closing @samp{@}}, and an unmatched
@code{\)} is invalid.
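
The following sketch compares how the same characters behave in the two
syntaxes.  It assumes GNU @command{grep}, because @samp{\+} and
@samp{\|} in basic regular expressions are GNU extensions:

@example
echo 'a+b' | grep 'a+b'          # BRE: + is literal, prints a+b
echo 'aaab' | grep -E 'a+b'      # ERE: + repeats, prints aaab
echo 'aaab' | grep 'a\+b'        # GNU BRE spelling of the same, prints aaab
echo 'ab' | grep '\(a\|x\)b'     # GNU BRE grouping and alternation, prints ab
@end example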

Portable scripts should avoid the following constructs, as
POSIX says they produce undefined results:

@itemize @bullet
@item
Extended regular expressions that use back-references.
@item
Basic regular expressions that use @samp{\?}, @samp{\+}, or @samp{\|}.
@item
Empty parenthesized regular expressions like @samp{()}.
@item
Empty alternatives (as in, e.g., @samp{a|}).
@item
Repetition operators that immediately follow empty expressions,
unescaped @samp{$}, or other repetition operators.
@item
A backslash escaping an ordinary character (e.g., @samp{\S}),
unless it is a back-reference.
@item
An unescaped @samp{[} that is not part of a bracket expression.
@item
In extended regular expressions, an unescaped @samp{@{} that is not
part of an interval expression.
@end itemize

@cindex interval expressions
Traditional @command{egrep} did not support interval expressions and
some @command{egrep} implementations use @samp{\@{} and @samp{\@}} instead, so
portable scripts should avoid interval expressions in @samp{grep@ -E} patterns
and should use @samp{[@{]} to match a literal @samp{@{}.

GNU @command{grep@ -E} attempts to support traditional usage by
assuming that @samp{@{} is not special if it would be the start of an
invalid interval expression.
For example, the command
@samp{grep@ -E@ '@{1'} searches for the two-character string @samp{@{1}
instead of reporting a syntax error in the regular expression.
POSIX allows this behavior as an extension, but portable scripts
should avoid it.

@node Character Encoding
@section Character Encoding
@cindex character encoding

The @env{LC_CTYPE} locale specifies the encoding of characters in
patterns and data, that is, whether text is encoded in UTF-8, ASCII,
or some other encoding. @xref{Environment Variables}.

In the @samp{C} or @samp{POSIX} locale, every character is encoded as
a single byte and every byte is a valid character. In more-complex
encodings such as UTF-8, a sequence of multiple bytes may be needed to
represent a character, and some bytes may be encoding errors that do
not contribute to the representation of any character. POSIX does not
specify the behavior of @command{grep} when patterns or input data
contain encoding errors or null characters, so portable scripts should
avoid such usage. As an extension to POSIX, GNU @command{grep} treats
null characters like any other character. However, unless the
@option{-a} (@option{--binary-files=text}) option is used, the
presence of null characters in input or of encoding errors in output
causes GNU @command{grep} to treat the file as binary and suppress
details about matches. @xref{File and Directory Selection}.
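
Here is a minimal sketch of the binary-file behavior, assuming GNU
@command{grep}.  The wording and destination of the binary-file message
differ between releases, so only the @option{-a} output is described.
The @command{tr} command only makes the null character visible:

@example
# The null byte makes grep treat the input as binary,
# so the matching line itself is not printed.
printf 'abc\000def\n' | grep 'abc'

# With -a the line is printed (shown as abc#def here).
printf 'abc\000def\n' | grep -a 'abc' | tr '\000' '#'
@end example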

Regardless of locale, the 103 characters in the POSIX Portable
Character Set (a subset of ASCII) are always encoded as a single byte,
and the 128 ASCII characters have their usual single-byte encodings on
all but oddball platforms.

@node Matching Non-ASCII
@section Matching Non-ASCII and Non-printable Characters
@cindex non-ASCII matching
@cindex non-printable matching

In a regular expression, non-ASCII and non-printable characters other
than newline are not special, and represent themselves. For example,
in a locale using UTF-8 the command @samp{grep 'Λ@tie{}ω'} (where the
white space between @samp{Λ} and the @samp{ω} is a tab character)
searches for @samp{Λ} (Unicode character U+039B GREEK CAPITAL LETTER
LAMBDA), followed by a tab (U+0009 TAB), followed by @samp{ω} (U+03C9
GREEK SMALL LETTER OMEGA).

Suppose you want to limit your pattern to only printable characters
(or even only printable ASCII characters) to keep your script readable
or portable, but you also want to match specific non-ASCII or non-null
non-printable characters. If you are using the @option{-P}
(@option{--perl-regexp}) option, PCREs give you several ways to do
this. Otherwise, if you are using Bash, the GNU project's shell, you
can represent these characters via ANSI-C quoting. For example, the
Bash commands @samp{grep $'Λ\tω'} and @samp{grep $'\u039B\t\u03C9'}
both search for the same three-character string @samp{Λ@tie{}ω}
mentioned earlier. However, because Bash translates ANSI-C quoting
before @command{grep} sees the pattern, this technique should not be
used to match printable ASCII characters; for example, @samp{grep
$'\u005E'} is equivalent to @samp{grep '^'} and matches any line, not
just lines containing the character @samp{^} (U+005E CIRCUMFLEX
ACCENT).

Since PCREs and ANSI-C quoting are GNU extensions to POSIX, portable
shell scripts written in ASCII should use other methods to match
specific non-ASCII characters. For example, in a UTF-8 locale the
command @samp{grep "$(printf '\316\233\t\317\211\n')"} is a portable
albeit hard-to-read alternative to Bash's @samp{grep $'Λ\tω'}.
However, none of these techniques will let you put a null character
directly into a command-line pattern; null characters can appear only
in a pattern specified via the @option{-f} (@option{--file}) option.
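
The portable @command{printf} technique can be sketched as follows.  The
variable name @code{pat} and the sample input are illustrative.  The
trailing @samp{\n} of the earlier example is omitted because command
substitution strips trailing newlines anyway.  The count is the same in
a UTF-8 locale and in the @samp{C} locale, where the bytes simply match
themselves:

@example
# Build the three-character pattern Λ, tab, ω from octal byte escapes.
pat=$(printf '\316\233\t\317\211')
# Prints 1: only the first input line contains the sequence.
printf 'x\316\233\t\317\211y\nno match\n' | grep -c "$pat"
@end example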

@node Usage
@chapter Usage

@cindex usage, examples
Here is an example command that invokes GNU @command{grep}:

@example
grep -i 'hello.*world' menu.h main.c
@end example

@noindent
This lists all lines in the files @file{menu.h} and @file{main.c} that
contain the string @samp{hello} followed by the string @samp{world};
this is because @samp{.*} matches zero or more characters within a line.
@xref{Regular Expressions}.
The @option{-i} option causes @command{grep}
to ignore case, causing it to match the line @samp{Hello, world!}, which
it would not otherwise match.
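
To try this without creating @file{menu.h} or @file{main.c}, you can
feed sample lines on standard input instead.  The input below is made
up for illustration:

@example
# Without -i the capitalized line does not match.
printf 'Hello, world!\nhello there\n' | grep 'hello.*world' || echo 'no match'
# With -i it does: prints Hello, world!
printf 'Hello, world!\nhello there\n' | grep -i 'hello.*world'
@end example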

Here is a more complex example,
showing the location and contents of any line
containing @samp{f} and ending in @samp{.c},
within all files in the current directory whose names
start with non-@samp{.}, contain @samp{g}, and end in @samp{.h}.
The @option{-n} option outputs line numbers, the @option{--} argument
treats any later arguments as file names not options even if
@code{*g*.h} expands to a file name that starts with @samp{-},
and the empty file @file{/dev/null} causes file names to be output
even if only one file name happens to be of the form @samp{*g*.h}.

@example
grep -n -- 'f.*\.c$' *g*.h /dev/null
@end example

@noindent
Note that the regular expression syntax used in the pattern differs
from the globbing syntax that the shell uses to match file names.

@xref{Invoking}, for more details about
how to invoke @command{grep}.

@cindex using @command{grep}, Q&A
@cindex FAQ about @command{grep} usage
Here are some common questions and answers about @command{grep} usage.

@enumerate

@item
How can I list just the names of matching files?

@example
grep -l 'main' test-*.c
@end example

@noindent
lists names of @samp{test-*.c} files in the current directory whose contents
mention @samp{main}.

@item
How do I search directories recursively?

@example
grep -r 'hello' /home/gigi
@end example

@noindent
searches for @samp{hello} in all files
under the @file{/home/gigi} directory.
For more control over which files are searched,
use @command{find} and @command{grep}.
For example, the following command searches only C files:

@example
find /home/gigi -name '*.c' ! -type d \
  -exec grep -H 'hello' '@{@}' +
@end example

This differs from the command:

@example
grep -H 'hello' /home/gigi/*.c
@end example

which merely looks for @samp{hello} in non-hidden C files in
@file{/home/gigi} whose names end in @samp{.c}.
The @command{find} command line above is more similar to the command:

@example
grep -r --include='*.c' 'hello' /home/gigi
@end example

@item
What if a pattern or file has a leading @samp{-}?

@example
grep -- '--cut here--' *
@end example

@noindent
searches for all lines matching @samp{--cut here--}.
Without @option{--},
@command{grep} would attempt to parse @samp{--cut here--} as a list of
options, and there would be similar problems with any file names
beginning with @samp{-}.

Alternatively, you can prevent misinterpretation of leading @samp{-}
by using @option{-e} for patterns and leading @samp{./} for files:

@example
grep -e '--cut here--' ./*
@end example

@item
Suppose I want to search for a whole word, not a part of a word?

@example
grep -w 'hello' test*.log
@end example

@noindent
searches only for instances of @samp{hello} that are entire words;
it does not match @samp{Othello}.
For more control, use @samp{\<} and
@samp{\>} to match the start and end of words.
For example:

@example
grep 'hello\>' test*.log
@end example

@noindent
searches only for words ending in @samp{hello}, so it matches the word
@samp{Othello}.

@item
How do I output context around the matching lines?

@example
grep -C 2 'hello' test*.log
@end example

@noindent
prints two lines of context around each matching line.

@item
How do I force @command{grep} to print the name of the file?

Append @file{/dev/null}:

@example
grep 'eli' /etc/passwd /dev/null
@end example

gets you:

@example
/etc/passwd:eli:x:2098:1000:Eli Smith:/home/eli:/bin/bash
@end example

Alternatively, use @option{-H}, which is a GNU extension:

@example
grep -H 'eli' /etc/passwd
@end example

@item
Why do people use strange regular expressions on @command{ps} output?

@example
ps -ef | grep '[c]ron'
@end example

If the pattern had been written without the square brackets, it would
have matched not only the @command{ps} output line for @command{cron},
but also the @command{ps} output line for @command{grep}.
Note that on some platforms,
@command{ps} limits the output to the width of the screen;
@command{grep} does not have any limit on the length of a line
except the available memory.

@item
Why does @command{grep} report ``Binary file matches''?

If @command{grep} listed all matching ``lines'' from a binary file, it
would probably generate output that is not useful, and it might even
muck up your display.
So GNU @command{grep} suppresses output from
files that appear to be binary files.
To force GNU @command{grep}
to output lines even from files that appear to be binary, use the
@option{-a} or @samp{--binary-files=text} option.
To eliminate the
``Binary file matches'' messages, use the @option{-I} or
@samp{--binary-files=without-match} option,
or the @option{-s} or @option{--no-messages} option.

@item
Why doesn't @samp{grep -lv} print non-matching file names?

@samp{grep -lv} lists the names of all files containing one or more
lines that do not match.
To list the names of all files that contain no
matching lines, use the @option{-L} or @option{--files-without-match}
option.

@item
I can do ``OR'' with @samp{|}, but what about ``AND''?

@example
grep 'paul' /etc/motd | grep 'franc,ois'
@end example

@noindent
finds all lines that contain both @samp{paul} and @samp{franc,ois}.

@item
Why does the empty pattern match every input line?

The @command{grep} command searches for lines that contain strings
that match a pattern. Every line contains the empty string, so an
empty pattern causes @command{grep} to find a match on each line. It
is not the only such pattern: @samp{^}, @samp{$}, and many
other patterns cause @command{grep} to match every line.

To match empty lines, use the pattern @samp{^$}. To match blank
lines, use the pattern @samp{^[[:blank:]]*$}. To match no lines at
all, use the command @samp{grep -f /dev/null}.

@item
How can I search in both standard input and in files?

Use the special file name @samp{-}:

@example
cat /etc/passwd | grep 'alain' - /etc/motd
@end example

@item
Why is this back-reference failing?

@example
echo 'ba' | grep -E '(a)\1|b\1'
@end example

This gives no output, because the first alternate @samp{(a)\1} does not
match, as there is no @samp{aa} in the input, so the @samp{\1} in the
second alternate has nothing to refer back to, meaning the second
alternate can never match anything.

@item
How can I match across lines?

Standard @command{grep} cannot do this, as it is fundamentally line-based.
Therefore, merely using the @code{[:space:]} character class does not
match newlines in the way you might expect.

With the GNU @command{grep} option @option{-z} (@option{--null-data}), each
input and output ``line'' is null-terminated; @pxref{Other Options}. Thus,
you can match newlines in the input, but typically if there is a match
the entire input is output, so this usage is often combined with
output-suppressing options like @option{-q}, e.g.:

@example
printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'
@end example

If this does not suffice, you can transform the input
before giving it to @command{grep}, or turn to @command{awk},
@command{sed}, @command{perl}, or many other utilities that are
designed to operate across lines.

@item
What do @command{grep}, @command{fgrep}, and @command{egrep} stand for?

The name @command{grep} comes from the way line editing was done on Unix.
For example,
@command{ed} uses the following syntax
to print a list of matching lines on the screen:

@example
global/regular expression/print
g/re/p
@end example

@command{fgrep} stands for Fixed @command{grep};
@command{egrep} stands for Extended @command{grep}.

@end enumerate
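
The difference between @samp{grep -lv} and @option{-L}, described in the
questions above, can be seen with three small files.  This sketch
assumes a @command{mktemp} that supports @option{-d}, which is common
but not required by POSIX.  The file names are made up, and the
temporary directory is left in place for inspection:

@example
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'hello\n' > all.txt           # every line matches
printf 'hello\nworld\n' > mixed.txt  # one line matches, one does not
printf 'world\n' > none.txt          # no line matches

grep -lv 'hello' *.txt   # mixed.txt and none.txt
grep -L 'hello' *.txt    # none.txt
@end example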


@node Performance
@chapter Performance

@cindex performance
Typically @command{grep} is an efficient way to search text. However,
it can be quite slow in some cases, and it can search large files
where even minor performance tweaking can help significantly.
Although the algorithm used by @command{grep} is an implementation
detail that can change from release to release, understanding its
basic strengths and weaknesses can help you improve its performance.

The @command{grep} command operates partly via a set of automata that
are designed for efficiency, and partly via a slower matcher that
takes over when the fast matchers run into unusual features like
back-references. When feasible, the Boyer--Moore fast string
searching algorithm is used to match a single fixed pattern, and the
Aho--Corasick algorithm is used to match multiple fixed patterns.

@cindex locales
Generally speaking @command{grep} operates more efficiently in
single-byte locales, since it can avoid the special processing needed
for multi-byte characters. If your patterns will work just as well
that way, setting @env{LC_ALL} to a single-byte locale can help
performance considerably. Setting @samp{LC_ALL='C'} can be
particularly efficient, as @command{grep} is tuned for that locale.
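
For example, the following sketch sets the @samp{C} locale for a single
command and uses @option{-F} so that the patterns are treated as fixed
strings.  It prints the two matching lines.  Which internal algorithm
@command{grep} then chooses is an implementation detail, and no timing
claim is made here:

@example
printf 'error: disk full\nok\nwarning: low memory\n' |
  LC_ALL=C grep -F -e 'error' -e 'warning'
@end example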
|
---|
1964 |
|
---|
1965 | @cindex case insensitive search
|
---|
1966 | Outside the @samp{C} locale, case-insensitive search, and search for
|
---|
1967 | bracket expressions like @samp{[a-z]} and @samp{[[=a=]b]}, can be
|
---|
1968 | surprisingly inefficient due to difficulties in fast portable access to
|
---|
1969 | concepts like multi-character collating elements.
|
---|
1970 |
|
---|
1971 | @cindex back-references
|
---|
1972 | A back-reference such as @samp{\1} can hurt performance significantly
|
---|
1973 | in some cases, since back-references cannot in general be implemented
|
---|
1974 | via a finite state automaton, and instead trigger a backtracking
|
---|
1975 | algorithm that can be quite inefficient. For example, although the
|
---|
1976 | pattern @samp{^(.*)\1@{14@}(.*)\2@{13@}$} matches only lines whose
|
---|
1977 | lengths can be written as a sum @math{15x + 14y} for nonnegative
|
---|
1978 | integers @math{x} and @math{y}, the pattern matcher does not perform
|
---|
1979 | linear Diophantine analysis and instead backtracks through all
|
---|
1980 | possible matching strings, using an algorithm that is exponential in
|
---|
1981 | the worst case.
|
---|
1982 |
|
---|
1983 | @cindex holes in files
|
---|
1984 | On some operating systems that support files with holes---large
|
---|
1985 | regions of zeros that are not physically present on secondary
|
---|
1986 | storage---@command{grep} can skip over the holes efficiently without
|
---|
1987 | needing to read the zeros. This optimization is not available if the
|
---|
1988 | @option{-a} (@option{--binary-files=text}) option is used (@pxref{File and
|
---|
1989 | Directory Selection}), unless the @option{-z} (@option{--null-data})
|
---|
1990 | option is also used (@pxref{Other Options}).
|
---|
1991 |
|
---|
1992 | For more about the algorithms used by @command{grep} and about
|
---|
1993 | related string matching algorithms, see:
|
---|
1994 |
|
---|
1995 | @frenchspacing on
|
---|
1996 | @itemize @bullet
|
---|
1997 | @item
|
---|
1998 | Aho AV. Algorithms for finding patterns in strings.
|
---|
1999 | In: van Leeuwen J. @emph{Handbook of Theoretical Computer Science}, vol. A.
|
---|
2000 | New York: Elsevier; 1990. p. 255--300.
|
---|
2001 | This surveys classic string matching algorithms, some of which are
|
---|
2002 | used by @command{grep}.
|
---|
2003 |
|
---|
2004 | @item
|
---|
2005 | Aho AV, Corasick MJ. Efficient string matching: an aid to bibliographic search.
|
---|
2006 | @emph{CACM}. 1975;18(6):333--40.
|
---|
2007 | @url{https://dx.doi.org/10.1145/360825.360855}.
|
---|
2008 | This introduces the Aho--Corasick algorithm.
|
---|
2009 |
|
---|
2010 | @item
|
---|
2011 | Boyer RS, Moore JS. A fast string searching algorithm.
|
---|
2012 | @emph{CACM}. 1977;20(10):762--72.
|
---|
2013 | @url{https://dx.doi.org/10.1145/359842.359859}.
|
---|
2014 | This introduces the Boyer--Moore algorithm.
|
---|
2015 |
|
---|
2016 | @item
|
---|
2017 | Faro S, Lecroq T. The exact online string matching problem: a review
|
---|
2018 | of the most recent results.
|
---|
2019 | @emph{ACM Comput Surv}. 2013;45(2):13.
|
---|
2020 | @url{https://dx.doi.org/10.1145/2431211.2431212}.
|
---|
2021 | This surveys string matching algorithms that might help improve the
|
---|
2022 | performance of @command{grep} in the future.
|
---|
2023 | @end itemize
|
---|
2024 | @frenchspacing off
|
---|
2025 |
|
---|
@node Reporting Bugs
@chapter Reporting bugs

@cindex bugs, reporting
Bug reports can be found at the
@url{https://debbugs.gnu.org/cgi/pkgreport.cgi?package=grep,
GNU bug report logs for @command{grep}}.
If you find a bug not listed there, please email it to
@email{bug-grep@@gnu.org} to create a new bug report.

@menu
* Known Bugs::
@end menu

@node Known Bugs
@section Known Bugs
@cindex Bugs, known

Large repetition counts in the @samp{@{n,m@}} construct may cause
@command{grep} to use a great deal of memory.
In addition, certain other obscure regular expressions require
exponential time and space, and may cause @command{grep} to run out
of memory.

Back-references can greatly slow down matching, as they can generate
exponentially many matching possibilities that can consume both time
and memory to explore. Also, the POSIX specification for
back-references is at times unclear. Furthermore, many regular
expression implementations have back-reference bugs that can cause
programs to return incorrect answers or even crash, and fixing these
bugs has often been low-priority: for example, as of 2021 the
@url{https://sourceware.org/bugzilla/,GNU C library bug database}
contained back-reference bugs
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=52,,52},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=10844,,10844},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=11053,,11053},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=24269,,24269}
and @url{https://sourceware.org/bugzilla/show_bug.cgi?id=25322,,25322},
with little sign of forthcoming fixes. Luckily,
back-references are rarely useful and it should be little trouble to
avoid them in practical applications.


@node Copying
@chapter Copying
@cindex copying

GNU @command{grep} is licensed under the GNU GPL, which makes it @dfn{free
software}.

The ``free'' in ``free software'' refers to liberty, not price. As
some GNU project advocates like to point out, think of ``free speech''
rather than ``free beer''. In short, you have the right (freedom) to
run and change @command{grep} and distribute it to other people, and,
if you want, charge money for doing either. The important restriction
is that you have to grant your recipients the same rights and impose
the same restrictions.

This general method of licensing software is sometimes called
@dfn{open source}. The GNU project prefers the term ``free software''
for reasons outlined at
@url{https://www.gnu.org/philosophy/open-source-misses-the-point.html}.

This manual is free documentation in the same sense. The
documentation license is included below. The license for the program
is available with the source code, or at
@url{https://www.gnu.org/licenses/gpl.html}.

@menu
* GNU Free Documentation License::
@end menu

@node GNU Free Documentation License
@section GNU Free Documentation License

@include fdl.texi


@node Index
@unnumbered Index

@printindex cp

@bye