source: kBuild/vendor/grep/3.7/doc/grep.texi@ 3529

Last change on this file since 3529 was 3529, checked in by bird, 3 years ago

Imported grep 3.7 from grep-3.7.tar.gz (sha256: c22b0cf2d4f6bbe599c902387e8058990e1eee99aef333a203829e5fd3dbb342), applying minimal auto-props.

1\input texinfo @c -*-texinfo-*-
2@c %**start of header
3@setfilename grep.info
4@include version.texi
5@settitle GNU Grep @value{VERSION}
6
7@c Combine indices.
8@syncodeindex ky cp
9@syncodeindex pg cp
10@syncodeindex tp cp
11@defcodeindex op
12@syncodeindex op cp
13@syncodeindex vr cp
14@c %**end of header
15
16@documentencoding UTF-8
17@c These two require Texinfo 5.0 or later, so use the older
18@c equivalent @set variables supported in 4.11 and later.
19@ignore
20@codequotebacktick on
21@codequoteundirected on
22@end ignore
23@set txicodequoteundirected
24@set txicodequotebacktick
25@iftex
26@c TeX sometimes fails to hyphenate, so help it here.
27@hyphenation{spec-i-fied}
28@end iftex
29
30@copying
31This manual is for @command{grep}, a pattern matching engine.
32
33Copyright @copyright{} 1999--2002, 2005, 2008--2021 Free Software Foundation,
34Inc.
35
36@quotation
37Permission is granted to copy, distribute and/or modify this document
38under the terms of the GNU Free Documentation License, Version 1.3 or
39any later version published by the Free Software Foundation; with no
40Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
41Texts. A copy of the license is included in the section entitled
42``GNU Free Documentation License''.
43@end quotation
44@end copying
45
46@dircategory Text creation and manipulation
47@direntry
48* grep: (grep). Print lines that match patterns.
49@end direntry
50
51@titlepage
52@title GNU Grep: Print lines that match patterns
53@subtitle version @value{VERSION}, @value{UPDATED}
54@author Alain Magloire et al.
55@page
56@vskip 0pt plus 1filll
57@insertcopying
58@end titlepage
59
60@contents
61
62
63@ifnottex
64@node Top
65@top grep
66
67@command{grep} prints lines that contain a match for one or more patterns.
68
69This manual is for version @value{VERSION} of GNU Grep.
70
71@insertcopying
72@end ifnottex
73
74@menu
75* Introduction:: Introduction.
76* Invoking:: Command-line options, environment, exit status.
77* Regular Expressions:: Regular Expressions.
78* Usage:: Examples.
79* Performance:: Performance tuning.
80* Reporting Bugs:: Reporting Bugs.
81* Copying:: License terms for this manual.
82* Index:: Combined index.
83@end menu
84
85
86@node Introduction
87@chapter Introduction
88
89@cindex searching for patterns
90
91Given one or more patterns, @command{grep} searches input files
92for matches to the patterns.
93When it finds a match in a line,
94it copies the line to standard output (by default),
95or produces whatever other sort of output you have requested with options.
96
97Though @command{grep} expects to do the matching on text,
98it has no limits on input line length other than available memory,
99and it can match arbitrary characters within a line.
100If the final byte of an input file is not a newline,
101@command{grep} silently supplies one.
102Since newline is also a separator for the list of patterns,
103there is no way to match newline characters in a text.
104
105
106@node Invoking
107@chapter Invoking @command{grep}
108
109The general synopsis of the @command{grep} command line is
110
111@example
112grep [@var{option}...] [@var{patterns}] [@var{file}...]
113@end example
114
115@noindent
116There can be zero or more @var{option} arguments, and zero or more
117@var{file} arguments. The @var{patterns} argument contains one or
118more patterns separated by newlines, and is omitted when patterns are
119given via the @samp{-e@ @var{patterns}} or @samp{-f@ @var{file}}
120options. Typically @var{patterns} should be quoted when
121@command{grep} is used in a shell command.
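
As a minimal sketch of the newline-separated form of @var{patterns},
the following commands pass two patterns in one quoted argument.
The sample data and the temporary file are hypothetical, and a POSIX
shell is assumed.

```shell
# Hypothetical sample input in a temporary file.
tmp=$(mktemp)
printf 'apple\nbanana\ncherry\n' > "$tmp"

# One quoted argument that holds two patterns separated by a newline.
patterns='apple
cherry'
matches=$(grep "$patterns" "$tmp")
printf '%s\n' "$matches"    # prints "apple" and "cherry" on separate lines

rm -f "$tmp"
```

Quoting keeps the shell from splitting the argument at the newline,
so @command{grep} receives both patterns as a single operand.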
122
123@menu
124* Command-line Options:: Short and long names, grouped by category.
125* Environment Variables:: POSIX, GNU generic, and GNU grep specific.
126* Exit Status:: Exit status returned by @command{grep}.
127* grep Programs:: @command{grep} programs.
128@end menu
129
130@node Command-line Options
131@section Command-line Options
132
133@command{grep} comes with a rich set of options:
134some from POSIX and some being GNU extensions.
135Long option names are always a GNU extension,
136even for options that are from POSIX specifications.
137Options that are specified by POSIX,
138under their short names,
139are explicitly marked as such
140to facilitate POSIX-portable programming.
141A few option names are provided
142for compatibility with older or more exotic implementations.
143
144@menu
145* Generic Program Information::
146* Matching Control::
147* General Output Control::
148* Output Line Prefix Control::
149* Context Line Control::
150* File and Directory Selection::
151* Other Options::
152@end menu
153
154Several additional options control
155which variant of the @command{grep} matching engine is used.
156@xref{grep Programs}.
157
158@node Generic Program Information
159@subsection Generic Program Information
160
161@table @option
162
163@item --help
164@opindex --help
165@cindex usage summary, printing
166Print a usage message briefly summarizing the command-line options
167and the bug-reporting address, then exit.
168
169@item -V
170@itemx --version
171@opindex -V
172@opindex --version
173@cindex version, printing
174Print the version number of @command{grep} to the standard output stream.
175This version number should be included in all bug reports.
176
177@end table
178
179@node Matching Control
180@subsection Matching Control
181
182@table @option
183
184@item -e @var{patterns}
185@itemx --regexp=@var{patterns}
186@opindex -e
187@opindex --regexp=@var{patterns}
188@cindex patterns option
189Use @var{patterns} as one or more patterns; newlines within
190@var{patterns} separate each pattern from the next.
191If this option is used multiple times or is combined with the
192@option{-f} (@option{--file}) option, search for all patterns given.
193Typically @var{patterns} should be quoted when @command{grep} is used
194in a shell command.
195(@option{-e} is specified by POSIX.)
196
197@item -f @var{file}
198@itemx --file=@var{file}
199@opindex -f
200@opindex --file
201@cindex patterns from file
202Obtain patterns from @var{file}, one per line.
203If this option is used multiple times or is combined with the
204@option{-e} (@option{--regexp}) option, search for all patterns given.
205The empty file contains zero patterns, and therefore matches nothing.
206(@option{-f} is specified by POSIX.)
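
The following sketch combines @option{-e} and @option{-f} and also
checks the empty-file case described above.
The file names under the temporary directory are hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$tmpdir/input.txt"
printf 'beta\n' > "$tmpdir/patterns.txt"

# Patterns from two -e options and one -f file are all searched for.
combined=$(grep -e 'alpha' -e 'delta' -f "$tmpdir/patterns.txt" "$tmpdir/input.txt")
printf '%s\n' "$combined"    # alpha, beta, delta (in input order)

# An empty pattern file holds zero patterns, so no line is selected.
: > "$tmpdir/empty.txt"
empty_status=0
grep -f "$tmpdir/empty.txt" "$tmpdir/input.txt" || empty_status=$?
printf '%s\n' "$empty_status"    # 1

rm -rf "$tmpdir"
```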
207
208@item -i
209@itemx -y
210@itemx --ignore-case
211@opindex -i
212@opindex -y
213@opindex --ignore-case
214@cindex case insensitive search
215Ignore case distinctions in patterns and input data,
216so that characters that differ only in case
217match each other. Although this is straightforward when letters
218differ in case only via lowercase-uppercase pairs, the behavior is
219unspecified in other situations. For example, uppercase ``S'' has an
220unusual lowercase counterpart ``ſ'' (Unicode character U+017F, LATIN
221SMALL LETTER LONG S) in many locales, and it is unspecified whether
222this unusual character matches ``S'' or ``s'' even though uppercasing
223it yields ``S''. Another example: the lowercase German letter ``ß''
224(U+00DF, LATIN SMALL LETTER SHARP S) is normally capitalized as the
225two-character string ``SS'' but it does not match ``SS'', and it might
226not match the uppercase letter ``ẞ'' (U+1E9E, LATIN CAPITAL LETTER
227SHARP S) even though lowercasing the latter yields the former.
228
229@option{-y} is an obsolete synonym that is provided for compatibility.
230(@option{-i} is specified by POSIX.)
231
232@item --no-ignore-case
233@opindex --no-ignore-case
234Do not ignore case distinctions in patterns and input data. This is
235the default. This option is useful for passing to shell scripts that
236already use @option{-i}, in order to cancel its effects because the
237two options override each other.
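
A short sketch of the override behavior, assuming a GNU
@command{grep} recent enough to accept @option{--no-ignore-case};
the sample input is hypothetical.

```shell
# With -i alone, both spellings match.
insensitive=$(printf 'Foo\nfoo\n' | grep -c -i 'foo')
printf '%s\n' "$insensitive"    # 2

# A later --no-ignore-case cancels the earlier -i.
sensitive=$(printf 'Foo\nfoo\n' | grep -i --no-ignore-case 'foo')
printf '%s\n' "$sensitive"      # foo
```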
238
239@item -v
240@itemx --invert-match
241@opindex -v
242@opindex --invert-match
243@cindex invert matching
244@cindex print non-matching lines
245Invert the sense of matching, to select non-matching lines.
246(@option{-v} is specified by POSIX.)
247
248@item -w
249@itemx --word-regexp
250@opindex -w
251@opindex --word-regexp
252@cindex matching whole words
253Select only those lines containing matches that form whole words.
254The test is that the matching substring must either
255be at the beginning of the line,
256or preceded by a non-word constituent character.
257Similarly,
258it must be either at the end of the line
259or followed by a non-word constituent character.
260Word constituent characters are letters, digits, and the underscore.
261This option has no effect if @option{-x} is also specified.
262
263Because the @option{-w} option can match a substring that does not
264begin and end with word constituents, it differs from surrounding a
265regular expression with @samp{\<} and @samp{\>}. For example, although
266@samp{grep -w @@} matches a line containing only @samp{@@}, @samp{grep
267'\<@@\>'} cannot match any line because @samp{@@} is not a
268word constituent. @xref{The Backslash Character and Special
269Expressions}.
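
The word-boundary test can be illustrated with a small sketch.
The sample lines are hypothetical; note that the underscore counts as
a word constituent, so the last line is not selected by @option{-w}.

```shell
tmp=$(mktemp)
printf 'cat\nconcatenate\nmy cat sat\ncat_food\n' > "$tmp"

plain=$(grep -c 'cat' "$tmp")       # substring match on every line
words=$(grep -c -w 'cat' "$tmp")    # whole-word matches only
printf '%s %s\n' "$plain" "$words"  # 4 2

rm -f "$tmp"
```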
270
271@item -x
272@itemx --line-regexp
273@opindex -x
274@opindex --line-regexp
275@cindex match the whole line
276Select only those matches that exactly match the whole line.
277For regular expression patterns, this is like parenthesizing each
278pattern and then surrounding it with @samp{^} and @samp{$}.
279(@option{-x} is specified by POSIX.)
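
As a sketch of that equivalence, the two commands below select the
same line from a hypothetical three-line input.

```shell
exact=$(printf 'foo\nfoobar\nbar foo\n' | grep -x 'foo')
anchored=$(printf 'foo\nfoobar\nbar foo\n' | grep '^\(foo\)$')
printf '%s %s\n' "$exact" "$anchored"    # foo foo
```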
280
281@end table
282
283@node General Output Control
284@subsection General Output Control
285
286@table @option
287
288@item -c
289@itemx --count
290@opindex -c
291@opindex --count
292@cindex counting lines
293Suppress normal output;
294instead print a count of matching lines for each input file.
295With the @option{-v} (@option{--invert-match}) option,
296count non-matching lines.
297(@option{-c} is specified by POSIX.)
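
A minimal sketch of both counts, using hypothetical log-like input:

```shell
tmp=$(mktemp)
printf 'error: disk\nok\nerror: net\nok\nok\n' > "$tmp"

matching=$(grep -c 'error' "$tmp")          # lines that match
non_matching=$(grep -c -v 'error' "$tmp")   # lines that do not match
printf '%s %s\n' "$matching" "$non_matching"    # 2 3

rm -f "$tmp"
```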
298
299@item --color[=@var{WHEN}]
300@itemx --colour[=@var{WHEN}]
301@opindex --color
302@opindex --colour
303@cindex highlight, color, colour
304Surround the matched (non-empty) strings, matching lines, context lines,
305file names, line numbers, byte offsets, and separators (for fields and
306groups of context lines) with escape sequences to display them in color
307on the terminal.
308The colors are defined by the environment variable @env{GREP_COLORS}
309and default to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36}
310for bold red matched text, magenta file names, green line numbers,
311green byte offsets, cyan separators, and default terminal colors otherwise.
312The deprecated environment variable @env{GREP_COLOR} is still supported,
313but its setting does not have priority;
314it defaults to @samp{01;31} (bold red)
315which only covers the color for matched text.
316@var{WHEN} is @samp{never}, @samp{always}, or @samp{auto}.
317
318@item -L
319@itemx --files-without-match
320@opindex -L
321@opindex --files-without-match
322@cindex files which don't match
323Suppress normal output;
324instead print the name of each input file from which
325no output would normally have been printed.
326
327@item -l
328@itemx --files-with-matches
329@opindex -l
330@opindex --files-with-matches
331@cindex names of matching files
332Suppress normal output;
333instead print the name of each input file from which
334output would normally have been printed.
335Scanning each input file stops upon first match.
336(@option{-l} is specified by POSIX.)
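
The following sketch contrasts @option{-l} with @option{-L} on two
hypothetical files, only one of which contains the pattern.

```shell
tmpdir=$(mktemp -d)
printf 'needle\n' > "$tmpdir/a.txt"
printf 'hay\n' > "$tmpdir/b.txt"

with=$(cd "$tmpdir" && grep -l 'needle' a.txt b.txt)
without=$(cd "$tmpdir" && grep -L 'needle' a.txt b.txt)
printf '%s %s\n' "$with" "$without"    # a.txt b.txt

rm -rf "$tmpdir"
```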
337
338@item -m @var{num}
339@itemx --max-count=@var{num}
340@opindex -m
341@opindex --max-count
342@cindex max-count
343Stop after the first @var{num} selected lines.
344If the input is standard input from a regular file,
345and @var{num} selected lines are output,
346@command{grep} ensures that the standard input is positioned
347just after the last selected line before exiting,
348regardless of the presence of trailing context lines.
349This enables a calling process to resume a search.
350For example, the following shell script makes use of it:
351
352@example
353while grep -m 1 'PATTERN'
354do
355 echo xxxx
356done < FILE
357@end example
358
359But the following probably will not work because a pipe is not a regular
360file:
361
362@example
363# This probably will not work.
364cat FILE |
365while grep -m 1 'PATTERN'
366do
367 echo xxxx
368done
369@end example
370
371@cindex context lines
372When @command{grep} stops after @var{num} selected lines,
373it outputs any trailing context lines.
374When the @option{-c} or @option{--count} option is also used,
375@command{grep} does not output a count greater than @var{num}.
376When the @option{-v} or @option{--invert-match} option is also used,
377@command{grep} stops after outputting @var{num} non-matching lines.
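
These interactions can be sketched as follows; the input lines are
hypothetical.

```shell
tmp=$(mktemp)
printf 'hit 1\nmiss\nhit 2\nhit 3\n' > "$tmp"

first_two=$(grep -m 2 'hit' "$tmp")    # stops after two selected lines
capped=$(grep -c -m 2 'hit' "$tmp")    # the count never exceeds 2
inverted=$(grep -v -m 1 'hit' "$tmp")  # stops after one non-matching line
printf '%s\n' "$first_two" "$capped" "$inverted"

rm -f "$tmp"
```

The output is @samp{hit 1}, @samp{hit 2}, @samp{2}, and @samp{miss},
each on its own line.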
378
379@item -o
380@itemx --only-matching
381@opindex -o
382@opindex --only-matching
383@cindex only matching
384Print only the matched (non-empty) parts of matching lines,
385with each such part on a separate output line.
386Output lines use the same delimiters as input, and delimiters are null
387bytes if @option{-z} (@option{--null-data}) is also used (@pxref{Other
388Options}).
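
For instance, this sketch extracts every run of digits from a single
hypothetical input line:

```shell
parts=$(printf 'a1b22c333\n' | grep -o '[0-9][0-9]*')
printf '%s\n' "$parts"    # 1, 22, and 333 on separate lines
```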
389
390@item -q
391@itemx --quiet
392@itemx --silent
393@opindex -q
394@opindex --quiet
395@opindex --silent
396@cindex quiet, silent
397Quiet; do not write anything to standard output.
398Exit immediately with zero status if any match is found,
399even if an error was detected.
400Also see the @option{-s} or @option{--no-messages} option.
401(@option{-q} is specified by POSIX.)
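
A typical use is to test the exit status in a shell conditional, as
in this sketch; the file contents and variable names are hypothetical.

```shell
tmp=$(mktemp)
printf 'ready\n' > "$tmp"

# Only the exit status is used; nothing is written to standard output.
if grep -q 'ready' "$tmp"; then
  state=found
else
  state=missing
fi
quiet_output=$(grep -q 'ready' "$tmp")
printf '%s\n' "$state"    # found

rm -f "$tmp"
```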
402
403@item -s
404@itemx --no-messages
405@opindex -s
406@opindex --no-messages
407@cindex suppress error messages
408Suppress error messages about nonexistent or unreadable files.
409Portability note:
410unlike GNU @command{grep},
4117th Edition Unix @command{grep} did not conform to POSIX,
412because it lacked @option{-q}
413and its @option{-s} option behaved like
414GNU @command{grep}'s @option{-q} option.@footnote{Of course, 7th Edition
415Unix predated POSIX by several years!}
416USG-style @command{grep} also lacked @option{-q}
417but its @option{-s} option behaved like GNU @command{grep}'s.
418Portable shell scripts should avoid both
419@option{-q} and @option{-s} and should redirect
420standard and error output to @file{/dev/null} instead.
421(@option{-s} is specified by POSIX.)
422
423@end table
424
425@node Output Line Prefix Control
426@subsection Output Line Prefix Control
427
428When several prefix fields are to be output,
429the order is always file name, line number, and byte offset,
430regardless of the order in which these options were specified.
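
As a sketch, the options below are given in the reverse of the output
order, yet the prefix fields still appear as file name, line number,
and byte offset. The file name @file{f.txt} is hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'ab\ncd\n' > "$tmpdir/f.txt"

prefixed=$(cd "$tmpdir" && grep -b -n -H 'cd' f.txt)
printf '%s\n' "$prefixed"    # f.txt:2:3:cd

rm -rf "$tmpdir"
```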
431
432@table @option
433
434@item -b
435@itemx --byte-offset
436@opindex -b
437@opindex --byte-offset
438@cindex byte offset
439Print the 0-based byte offset within the input file
440before each line of output.
441If @option{-o} (@option{--only-matching}) is specified,
442print the offset of the matching part itself.
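
The difference between the two offsets can be sketched with
hypothetical input:

```shell
# Offset of the start of the matching line (the third line begins at byte 6).
line_offset=$(printf 'ab\ncd\nef\n' | grep -b 'e')
printf '%s\n' "$line_offset"     # 6:ef

# With -o, the offset of the matching part itself.
match_offset=$(printf 'xxab\n' | grep -b -o 'ab')
printf '%s\n' "$match_offset"    # 2:ab
```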
443
444@item -H
445@itemx --with-filename
446@opindex -H
447@opindex --with-filename
448@cindex with filename prefix
449Print the file name for each match.
450This is the default when there is more than one file to search.
451
452@item -h
453@itemx --no-filename
454@opindex -h
455@opindex --no-filename
456@cindex no filename prefix
457Suppress the prefixing of file names on output.
458This is the default when there is only one file
459(or only standard input) to search.
460
461@item --label=@var{LABEL}
462@opindex --label
463@cindex changing name of standard input
464Display input actually coming from standard input
465as input coming from file @var{LABEL}.
466This can be useful for commands that transform a file's contents
467before searching; e.g.:
468
469@example
470gzip -cd foo.gz | grep --label=foo -H 'some pattern'
471@end example
472
473@item -n
474@itemx --line-number
475@opindex -n
476@opindex --line-number
477@cindex line numbering
478Prefix each line of output with the 1-based line number within its input file.
479(@option{-n} is specified by POSIX.)
480
481@item -T
482@itemx --initial-tab
483@opindex -T
484@opindex --initial-tab
485@cindex tab-aligned content lines
486Make sure that the first character of actual line content lies on a tab stop,
487so that the alignment of tabs looks normal.
488This is useful with options that prefix their output to the actual content:
489@option{-H}, @option{-n}, and @option{-b}.
490This may also prepend spaces to output line numbers and byte offsets
491so that lines from a single file all start at the same column.
492
493@item -Z
494@itemx --null
495@opindex -Z
496@opindex --null
497@cindex zero-terminated file names
498Output a zero byte (the ASCII NUL character)
499instead of the character that normally follows a file name.
500For example,
501@samp{grep -lZ} outputs a zero byte after each file name
502instead of the usual newline.
503This option makes the output unambiguous,
504even in the presence of file names containing unusual characters like newlines.
505This option can be used with commands like
506@samp{find -print0}, @samp{perl -0}, @samp{sort -z}, and @samp{xargs -0}
507to process arbitrary file names,
508even those that contain newline characters.
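
The following sketch pairs @samp{grep -lZ} with @samp{xargs -0} so
that a file name containing a space is passed through intact.
The file names are hypothetical, and an @command{xargs} that supports
@option{-0} is assumed.

```shell
tmpdir=$(mktemp -d)
printf 'needle\n' > "$tmpdir/plain.txt"
printf 'needle\n' > "$tmpdir/with space.txt"
printf 'hay\n' > "$tmpdir/other.txt"

# Each matching file name is followed by a zero byte, which xargs -0
# uses as the only delimiter.
total=$(cd "$tmpdir" && grep -lZ 'needle' ./*.txt | xargs -0 cat | wc -l | tr -d ' ')
printf '%s\n' "$total"    # 2

rm -rf "$tmpdir"
```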
509
510@end table
511
512@node Context Line Control
513@subsection Context Line Control
514
515@cindex context lines
516@dfn{Context lines} are non-matching lines that are near a matching line.
They are output only if one of the following options is used.
518Regardless of how these options are set,
519@command{grep} never outputs any given line more than once.
520If the @option{-o} (@option{--only-matching}) option is specified,
521these options have no effect and a warning is given upon their use.
522
523@table @option
524
525@item -A @var{num}
526@itemx --after-context=@var{num}
527@opindex -A
528@opindex --after-context
529@cindex after context
530@cindex context lines, after match
531Print @var{num} lines of trailing context after matching lines.
532
533@item -B @var{num}
534@itemx --before-context=@var{num}
535@opindex -B
536@opindex --before-context
537@cindex before context
538@cindex context lines, before match
539Print @var{num} lines of leading context before matching lines.
540
541@item -C @var{num}
542@itemx -@var{num}
543@itemx --context=@var{num}
544@opindex -C
545@opindex --context
546@opindex -@var{num}
547@cindex context lines
548Print @var{num} lines of leading and trailing output context.
549
550@item --group-separator=@var{string}
551@opindex --group-separator
552@cindex group separator
553When @option{-A}, @option{-B} or @option{-C} are in use,
554print @var{string} instead of @option{--} between groups of lines.
555
556@item --no-group-separator
@opindex --no-group-separator
558@cindex group separator
559When @option{-A}, @option{-B} or @option{-C} are in use,
560do not print a separator between groups of lines.
561
562@end table
563
564Here are some points about how @command{grep} chooses
565the separator to print between prefix fields and line content:
566
567@itemize @bullet
568@item
569Matching lines normally use @samp{:} as a separator
570between prefix fields and actual line content.
571
572@item
573Context (i.e., non-matching) lines use @samp{-} instead.
574
575@item
576When context is not specified,
577matching lines are simply output one right after another.
578
579@item
580When context is specified,
581lines that are adjacent in the input form a group
582and are output one right after another, while
583by default a separator appears between non-adjacent groups.
584
585@item
586The default separator
587is a @samp{--} line; its presence and appearance
588can be changed with the options above.
589
590@item
591Each group may contain
592several matching lines when they are close enough to each other
593that two adjacent groups connect and can merge into a single
594contiguous one.
595@end itemize
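
The points above can be seen in a small sketch with hypothetical
input: matching lines carry a colon after the line number, context
lines carry a hyphen, and the default separator line appears between
the two non-adjacent groups.

```shell
context=$(printf 'one\ntwo\nthree\nfour\nfive\nsix\nseven\n' |
  grep -n -C 1 -e 'two' -e 'six')
printf '%s\n' "$context"
# 1-one
# 2:two
# 3-three
# --
# 5-five
# 6:six
# 7-seven
```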
596
597@node File and Directory Selection
598@subsection File and Directory Selection
599
600@table @option
601
602@item -a
603@itemx --text
604@opindex -a
605@opindex --text
606@cindex suppress binary data
607@cindex binary files
608Process a binary file as if it were text;
609this is equivalent to the @samp{--binary-files=text} option.
610
611@item --binary-files=@var{type}
612@opindex --binary-files
613@cindex binary files
614If a file's data or metadata
615indicate that the file contains binary data,
616assume that the file is of type @var{type}.
617Non-text bytes indicate binary data; these are either output bytes that are
618improperly encoded for the current locale (@pxref{Environment
619Variables}), or null input bytes when the
620@option{-z} (@option{--null-data}) option is not given (@pxref{Other
621Options}).
622
623By default, @var{type} is @samp{binary}, and @command{grep}
624suppresses output after null input binary data is discovered,
625and suppresses output lines that contain improperly encoded data.
626When some output is suppressed, @command{grep} follows any output
627with a one-line message saying that a binary file matches.
628
629If @var{type} is @samp{without-match},
630when @command{grep} discovers null input binary data
631it assumes that the rest of the file does not match;
632this is equivalent to the @option{-I} option.
633
634If @var{type} is @samp{text},
635@command{grep} processes binary data as if it were text;
636this is equivalent to the @option{-a} option.
637
638When @var{type} is @samp{binary}, @command{grep} may treat non-text
639bytes as line terminators even without the @option{-z}
640(@option{--null-data}) option. This means choosing @samp{binary}
641versus @samp{text} can affect whether a pattern matches a file. For
642example, when @var{type} is @samp{binary} the pattern @samp{q$} might
643match @samp{q} immediately followed by a null byte, even though this
644is not matched when @var{type} is @samp{text}. Conversely, when
645@var{type} is @samp{binary} the pattern @samp{.} (period) might not
646match a null byte.
647
648@emph{Warning:} The @option{-a} (@option{--binary-files=text}) option
649might output binary garbage, which can have nasty side effects if the
650output is a terminal and if the terminal driver interprets some of it
651as commands. On the other hand, when reading files whose text
652encodings are unknown, it can be helpful to use @option{-a} or to set
653@samp{LC_ALL='C'} in the environment, in order to find more matches
654even if the matches are unsafe for direct display.
655
656@item -D @var{action}
657@itemx --devices=@var{action}
658@opindex -D
659@opindex --devices
660@cindex device search
661If an input file is a device, FIFO, or socket, use @var{action} to process it.
662If @var{action} is @samp{read},
663all devices are read just as if they were ordinary files.
664If @var{action} is @samp{skip},
665devices, FIFOs, and sockets are silently skipped.
666By default, devices are read if they are on the command line or if the
667@option{-R} (@option{--dereference-recursive}) option is used, and are
668skipped if they are encountered recursively and the @option{-r}
669(@option{--recursive}) option is used.
670This option has no effect on a file that is read via standard input.
671
672@item -d @var{action}
673@itemx --directories=@var{action}
674@opindex -d
675@opindex --directories
676@cindex directory search
677@cindex symbolic links
678If an input file is a directory, use @var{action} to process it.
679By default, @var{action} is @samp{read},
680which means that directories are read just as if they were ordinary files
681(some operating systems and file systems disallow this,
682and will cause @command{grep}
683to print error messages for every directory or silently skip them).
684If @var{action} is @samp{skip}, directories are silently skipped.
685If @var{action} is @samp{recurse},
686@command{grep} reads all files under each directory, recursively,
687following command-line symbolic links and skipping other symlinks;
688this is equivalent to the @option{-r} option.
689
690@item --exclude=@var{glob}
691@opindex --exclude
692@cindex exclude files
693@cindex searching directory trees
694Skip any command-line file with a name suffix that matches the pattern
695@var{glob}, using wildcard matching; a name suffix is either the whole
696name, or a trailing part that starts with a non-slash character
697immediately after a slash (@samp{/}) in the name.
698When searching recursively, skip any subfile whose base
699name matches @var{glob}; the base name is the part after the last
700slash. A pattern can use
701@samp{*}, @samp{?}, and @samp{[}...@samp{]} as wildcards,
702and @code{\} to quote a wildcard or backslash character literally.
703
704@item --exclude-from=@var{file}
705@opindex --exclude-from
706@cindex exclude files
707@cindex searching directory trees
708Skip files whose name matches any of the patterns
709read from @var{file} (using wildcard matching as described
710under @option{--exclude}).
711
712@item --exclude-dir=@var{glob}
713@opindex --exclude-dir
714@cindex exclude directories
715Skip any command-line directory with a name suffix that matches the
716pattern @var{glob}. When searching recursively, skip any subdirectory
717whose base name matches @var{glob}. Ignore any redundant trailing
718slashes in @var{glob}.
719
720@item -I
721Process a binary file as if it did not contain matching data;
722this is equivalent to the @samp{--binary-files=without-match} option.
723
724@item --include=@var{glob}
725@opindex --include
726@cindex include files
727@cindex searching directory trees
728Search only files whose name matches @var{glob},
729using wildcard matching as described under @option{--exclude}.
730If contradictory @option{--include} and @option{--exclude} options are
731given, the last matching one wins. If no @option{--include} or
732@option{--exclude} options match, a file is included unless the first
733such option is @option{--include}.
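
A sketch of file selection during a recursive search follows.
The directory layout is hypothetical: only C sources are searched,
and the @file{build} directory is skipped.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/src" "$tmpdir/build"
printf 'TODO\n' > "$tmpdir/src/main.c"
printf 'TODO\n' > "$tmpdir/src/notes.txt"
printf 'TODO\n' > "$tmpdir/build/gen.c"

# Quote the globs so that the shell does not expand them first.
found=$(cd "$tmpdir" && grep -r -l --include='*.c' --exclude-dir='build' 'TODO' .)
printf '%s\n' "$found"    # ./src/main.c

rm -rf "$tmpdir"
```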
734
735@item -r
736@itemx --recursive
737@opindex -r
738@opindex --recursive
739@cindex recursive search
740@cindex searching directory trees
741@cindex symbolic links
742For each directory operand,
743read and process all files in that directory, recursively.
744Follow symbolic links on the command line, but skip symlinks
745that are encountered recursively.
If no file operand is given, @command{grep} searches the working directory.
747This is the same as the @samp{--directories=recurse} option.
748
749@item -R
750@itemx --dereference-recursive
751@opindex -R
752@opindex --dereference-recursive
753@cindex recursive search
754@cindex searching directory trees
755@cindex symbolic links
756For each directory operand, read and process all files in that
757directory, recursively, following all symbolic links.
758
759@end table
760
761@node Other Options
762@subsection Other Options
763
764@table @option
765
766@item --
767@opindex --
768@cindex option delimiter
769Delimit the option list. Later arguments, if any, are treated as
770operands even if they begin with @samp{-}. For example, @samp{grep PAT --
771-file1 file2} searches for the pattern PAT in the files named @file{-file1}
772and @file{file2}.
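
The example above can be tried with this sketch, which creates the
two files in a temporary directory; the file contents are
hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'PAT here\n' > "$tmpdir/-file1"
printf 'nothing\n' > "$tmpdir/file2"

# Without the delimiter, -file1 would be parsed as a group of options.
result=$(cd "$tmpdir" && grep PAT -- -file1 file2)
printf '%s\n' "$result"    # -file1:PAT here

rm -rf "$tmpdir"
```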
773
774@item --line-buffered
775@opindex --line-buffered
776@cindex line buffering
777Use line buffering for standard output, regardless of output device.
778By default, standard output is line buffered for interactive devices,
779and is fully buffered otherwise. With full buffering, the output
780buffer is flushed when full; with line buffering, the buffer is also
781flushed after every output line. The buffer size is system dependent.
782
783@item -U
784@itemx --binary
785@opindex -U
786@opindex --binary
787@cindex MS-Windows binary I/O
788@cindex binary I/O
789On platforms that distinguish between text and binary I/O,
790use the latter when reading and writing files other
791than the user's terminal, so that all input bytes are read and written
792as-is. This overrides the default behavior where @command{grep}
793follows the operating system's advice whether to use text or binary
794I/O@. On MS-Windows when @command{grep} uses text I/O it reads a
795carriage return--newline pair as a newline and a Control-Z as
796end-of-file, and it writes a newline as a carriage return--newline
797pair.
798
799When using text I/O @option{--byte-offset} (@option{-b}) counts and
800@option{--binary-files} heuristics apply to input data after text-I/O
801processing. Also, the @option{--binary-files} heuristics need not agree
802with the @option{--binary} option; that is, they may treat the data as
803text even if @option{--binary} is given, or vice versa.
804@xref{File and Directory Selection}.
805
806This option has no effect on GNU and other POSIX-compatible platforms,
807which do not distinguish text from binary I/O.
808
809@item -z
810@itemx --null-data
811@opindex -z
812@opindex --null-data
813@cindex zero-terminated lines
814Treat input and output data as sequences of lines, each terminated by
815a zero byte (the ASCII NUL character) instead of a newline.
816Like the @option{-Z} or @option{--null} option,
817this option can be used with commands like
818@samp{sort -z} to process arbitrary file names.
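
As a sketch, the command below filters a hypothetical zero-terminated
list of names; @command{tr} converts the result to newlines only so
that it can be displayed.

```shell
selected=$(printf 'a.txt\0b.log\0c.txt\0' | grep -z '\.txt$' | tr '\0' '\n')
printf '%s\n' "$selected"    # a.txt and c.txt on separate lines
```

With @option{-z}, the @samp{$} anchor matches at the end of each
zero-terminated record rather than at a newline.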
819
820@end table
821
822@node Environment Variables
823@section Environment Variables
824
825The behavior of @command{grep} is affected
826by the following environment variables.
827
828@vindex LANGUAGE @r{environment variable}
829@vindex LC_ALL @r{environment variable}
830@vindex LC_MESSAGES @r{environment variable}
831@vindex LANG @r{environment variable}
832The locale for category @w{@code{LC_@var{foo}}}
833is specified by examining the three environment variables
834@env{LC_ALL}, @w{@env{LC_@var{foo}}}, and @env{LANG},
835in that order.
836The first of these variables that is set specifies the locale.
837For example, if @env{LC_ALL} is not set,
838but @env{LC_COLLATE} is set to @samp{pt_BR},
839then the Brazilian Portuguese locale is used
840for the @env{LC_COLLATE} category.
841As a special case for @env{LC_MESSAGES} only, the environment variable
842@env{LANGUAGE} can contain a colon-separated list of languages that
843overrides the three environment variables that ordinarily specify
844the @env{LC_MESSAGES} category.
845The @samp{C} locale is used if none of these environment variables are set,
846if the locale catalog is not installed,
847or if @command{grep} was not compiled
848with national language support (NLS).
849The shell command @code{locale -a} lists locales that are currently available.
850
851Many of the environment variables in the following list let you
852control highlighting using
853Select Graphic Rendition (SGR)
854commands interpreted by the terminal or terminal emulator.
(See the Select Graphic Rendition (SGR) section
in the documentation of your text terminal
for permitted values and their meanings as character attributes.)
859These substring values are integers in decimal representation
860and can be concatenated with semicolons.
861@command{grep} takes care of assembling the result
862into a complete SGR sequence (@samp{\33[}...@samp{m}).
863Common values to concatenate include
864@samp{1} for bold,
865@samp{4} for underline,
866@samp{5} for blink,
867@samp{7} for inverse,
868@samp{39} for default foreground color,
869@samp{30} to @samp{37} for foreground colors,
870@samp{90} to @samp{97} for 16-color mode foreground colors,
871@samp{38;5;0} to @samp{38;5;255}
872for 88-color and 256-color modes foreground colors,
873@samp{49} for default background color,
874@samp{40} to @samp{47} for background colors,
875@samp{100} to @samp{107} for 16-color mode background colors,
876and @samp{48;5;0} to @samp{48;5;255}
877for 88-color and 256-color modes background colors.
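
A minimal sketch of how an SGR substring ends up in the output,
assuming GNU @command{grep}; the color value @samp{01;32} (bold
green) and the variable names are arbitrary choices for illustration.

```shell
esc=$(printf '\033')

# Force color even though the output is captured rather than shown on
# a terminal, and use bold green for the matched text.
colored=$(printf 'abc\n' | GREP_COLORS='ms=01;32' grep --color=always 'b')

# Removing the escape sequences recovers the plain line.
plain=$(printf '%s\n' "$colored" | sed "s/$esc\[[0-9;]*[mK]//g")
printf '%s\n' "$plain"    # abc
```

The captured text contains the substring wrapped in a complete
sequence, that is, an escape character followed by @samp{[01;32m}.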
878
879The two-letter names used in the @env{GREP_COLORS} environment variable
880(and some of the others) refer to terminal ``capabilities,'' the ability
881of a terminal to highlight text, or change its color, and so on.
882These capabilities are stored in an online database and accessed by
883the @code{terminfo} library.
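
As a minimal sketch of how an SGR substring reaches the terminal, the
following command sets the @samp{mt=} capability to bold, underlined
green and pipes the colorized output through @command{od} so that the
escape bytes are visible.  The sample text and the choice of
attributes are assumptions made only for this illustration.

@example
# Highlight the match with SGR values 1 (bold), 4 (underline), 32 (green).
printf 'alpha beta\n' |
  GREP_COLORS='mt=1;4;32' grep --color=always 'beta' |
  od -c
@end example

@noindent
The dump should show the assembled sequence @samp{\33[1;4;32m} ahead of
the matched text @samp{beta}.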

@cindex environment variables

@table @env

@item GREP_COLOR
@vindex GREP_COLOR @r{environment variable}
@cindex highlight markers
This variable specifies the color used to highlight matched (non-empty) text.
It is deprecated in favor of @env{GREP_COLORS}, but still supported.
The @samp{mt}, @samp{ms}, and @samp{mc} capabilities of @env{GREP_COLORS}
have priority over it.
It can only specify the color used to highlight
the matching non-empty text in any matching line
(a selected line when the @option{-v} command-line option is omitted,
or a context line when @option{-v} is specified).
The default is @samp{01;31},
which means a bold red foreground text on the terminal's default background.

@item GREP_COLORS
@vindex GREP_COLORS @r{environment variable}
@cindex highlight markers
This variable specifies the colors and other attributes
used to highlight various parts of the output.
Its value is a colon-separated list of @code{terminfo} capabilities
that defaults to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36}
with the @samp{rv} and @samp{ne} boolean capabilities omitted (i.e., false).
Supported capabilities are as follows.

@table @code
@item sl=
@vindex sl GREP_COLORS @r{capability}
SGR substring for whole selected lines
(i.e.,
matching lines when the @option{-v} command-line option is omitted,
or non-matching lines when @option{-v} is specified).
If however the boolean @samp{rv} capability
and the @option{-v} command-line option are both specified,
it applies to context matching lines instead.
The default is empty (i.e., the terminal's default color pair).

@item cx=
@vindex cx GREP_COLORS @r{capability}
SGR substring for whole context lines
(i.e.,
non-matching lines when the @option{-v} command-line option is omitted,
or matching lines when @option{-v} is specified).
If however the boolean @samp{rv} capability
and the @option{-v} command-line option are both specified,
it applies to selected non-matching lines instead.
The default is empty (i.e., the terminal's default color pair).

@item rv
@vindex rv GREP_COLORS @r{capability}
Boolean value that reverses (swaps) the meanings of
the @samp{sl=} and @samp{cx=} capabilities
when the @option{-v} command-line option is specified.
The default is false (i.e., the capability is omitted).

@item mt=01;31
@vindex mt GREP_COLORS @r{capability}
SGR substring for matching non-empty text in any matching line
(i.e.,
a selected line when the @option{-v} command-line option is omitted,
or a context line when @option{-v} is specified).
Setting this is equivalent to setting both @samp{ms=} and @samp{mc=}
at once to the same value.
The default is a bold red text foreground over the current line background.

@item ms=01;31
@vindex ms GREP_COLORS @r{capability}
SGR substring for matching non-empty text in a selected line.
(This is used only when the @option{-v} command-line option is omitted.)
The effect of the @samp{sl=} (or @samp{cx=} if @samp{rv}) capability
remains active when this takes effect.
The default is a bold red text foreground over the current line background.

@item mc=01;31
@vindex mc GREP_COLORS @r{capability}
SGR substring for matching non-empty text in a context line.
(This is used only when the @option{-v} command-line option is specified.)
The effect of the @samp{cx=} (or @samp{sl=} if @samp{rv}) capability
remains active when this takes effect.
The default is a bold red text foreground over the current line background.

@item fn=35
@vindex fn GREP_COLORS @r{capability}
SGR substring for file names prefixing any content line.
The default is a magenta text foreground over the terminal's default background.

@item ln=32
@vindex ln GREP_COLORS @r{capability}
SGR substring for line numbers prefixing any content line.
The default is a green text foreground over the terminal's default background.

@item bn=32
@vindex bn GREP_COLORS @r{capability}
SGR substring for byte offsets prefixing any content line.
The default is a green text foreground over the terminal's default background.

@item se=36
@vindex se GREP_COLORS @r{capability}
SGR substring for separators that are inserted
between selected line fields (@samp{:}),
between context line fields (@samp{-}),
and between groups of adjacent lines
when nonzero context is specified (@samp{--}).
The default is a cyan text foreground over the terminal's default background.

@item ne
@vindex ne GREP_COLORS @r{capability}
Boolean value that prevents clearing to the end of line
using Erase in Line (EL) to Right (@samp{\33[K})
each time a colorized item ends.
This is needed on terminals on which EL is not supported.
It is otherwise useful on terminals
for which the @code{back_color_erase}
(@code{bce}) boolean @code{terminfo} capability does not apply,
when the chosen highlight colors do not affect the background,
or when EL is too slow or causes too much flicker.
The default is false (i.e., the capability is omitted).
@end table

Note that boolean capabilities have no @samp{=}... part.
They are omitted (i.e., false) by default and become true when specified.


@item LC_ALL
@itemx LC_COLLATE
@itemx LANG
@vindex LC_ALL @r{environment variable}
@vindex LC_COLLATE @r{environment variable}
@vindex LANG @r{environment variable}
@cindex character type
@cindex national language support
@cindex NLS
These variables specify the locale for the @env{LC_COLLATE} category,
which might affect how range expressions like @samp{[a-z]} are
interpreted.

@item LC_ALL
@itemx LC_CTYPE
@itemx LANG
@vindex LC_ALL @r{environment variable}
@vindex LC_CTYPE @r{environment variable}
@vindex LANG @r{environment variable}
@cindex encoding error
@cindex null character
These variables specify the locale for the @env{LC_CTYPE} category,
which determines the type of characters,
e.g., which characters are whitespace.
This category also determines the character encoding.
@xref{Character Encoding}.

@item LANGUAGE
@itemx LC_ALL
@itemx LC_MESSAGES
@itemx LANG
@vindex LANGUAGE @r{environment variable}
@vindex LC_ALL @r{environment variable}
@vindex LC_MESSAGES @r{environment variable}
@vindex LANG @r{environment variable}
@cindex language of messages
@cindex message language
@cindex national language support
@cindex translation of message language
These variables specify the locale for the @env{LC_MESSAGES} category,
which determines the language that @command{grep} uses for messages.
The default @samp{C} locale uses American English messages.

@item POSIXLY_CORRECT
@vindex POSIXLY_CORRECT @r{environment variable}
If set, @command{grep} behaves as POSIX requires; otherwise,
@command{grep} behaves more like other GNU programs.
POSIX
requires that options that
follow file names must be treated as file names;
by default,
such options are permuted to the front of the operand list
and are treated as options.
Also, @env{POSIXLY_CORRECT} disables special handling of an
invalid bracket expression.  @xref{invalid-bracket-expr}.

@item _@var{N}_GNU_nonoption_argv_flags_
@vindex _@var{N}_GNU_nonoption_argv_flags_ @r{environment variable}
(Here @code{@var{N}} is @command{grep}'s numeric process ID.)
If the @var{i}th character of this environment variable's value is @samp{1},
do not consider the @var{i}th operand of @command{grep} to be an option,
even if it appears to be one.
A shell can put this variable in the environment for each command it runs,
specifying which operands are the results of file name wildcard expansion
and therefore should not be treated as options.
This behavior is available only with the GNU C library,
and only when @env{POSIXLY_CORRECT} is not set.

@end table
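
The effect of @env{POSIXLY_CORRECT} on trailing options can be
sketched as follows.  The scratch directory and the file name
@file{fruit.txt} are hypothetical and exist only for this illustration.

@example
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'apple\nbanana\n' > fruit.txt

# Default GNU behavior: the trailing -n is permuted to the front
# and treated as an option, so the line number is printed.
grep 'apple' fruit.txt -n

# With POSIXLY_CORRECT set, the trailing -n is treated as a file name,
# so grep searches two operands and complains that -n does not exist.
POSIXLY_CORRECT=1 grep 'apple' fruit.txt -n
@end example

@noindent
The first command should print @samp{1:apple}; the second should print
@samp{fruit.txt:apple} together with a diagnostic about the missing
file @file{-n}.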

The @env{GREP_OPTIONS} environment variable of @command{grep} 2.20 and
earlier is no longer supported, as it caused problems when writing
portable scripts.  To make arbitrary changes to how @command{grep}
works, you can use an alias or script instead.  For example, if
@command{grep} is in the directory @samp{/usr/bin} you can prepend
@file{$HOME/bin} to your @env{PATH} and create an executable script
@file{$HOME/bin/grep} containing the following:

@example
#! /bin/sh
export PATH=/usr/bin
exec grep --color=auto --devices=skip "$@@"
@end example


@node Exit Status
@section Exit Status
@cindex exit status
@cindex return status

Normally the exit status is 0 if a line is selected, 1 if no lines
were selected, and 2 if an error occurred.  However, if the
@option{-q} or @option{--quiet} or @option{--silent} option is used
and a line is selected, the exit status is 0 even if an error
occurred.  Other @command{grep} implementations may exit with status
greater than 2 on error.
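
A short sketch of the three statuses follows.  The file name
@file{./no-such-file} is hypothetical and is assumed not to exist.

@example
# A line is selected: status 0.
printf 'hello\n' | grep -q 'hello'; echo "selected: $?"

# No line is selected: status 1.
printf 'hello\n' | grep -q 'bye'; echo "not selected: $?"

# The input file cannot be read: status 2.
grep 'hello' ./no-such-file 2>/dev/null; echo "error: $?"
@end example

@noindent
A script can therefore distinguish ``no match'' from ``something went
wrong'' by testing for status 1 rather than for any nonzero status.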

@node grep Programs
@section @command{grep} Programs
@cindex @command{grep} programs
@cindex variants of @command{grep}

@command{grep} searches the named input files
for lines containing a match to the given patterns.
By default, @command{grep} prints the matching lines.
A file named @file{-} stands for standard input.
If no input is specified, @command{grep} searches the working
directory @file{.} if given a command-line option specifying
recursion; otherwise, @command{grep} searches standard input.
There are four major variants of @command{grep},
controlled by the following options.

@table @option

@item -G
@itemx --basic-regexp
@opindex -G
@opindex --basic-regexp
@cindex matching basic regular expressions
Interpret patterns as basic regular expressions (BREs).
This is the default.

@item -E
@itemx --extended-regexp
@opindex -E
@opindex --extended-regexp
@cindex matching extended regular expressions
Interpret patterns as extended regular expressions (EREs).
(@option{-E} is specified by POSIX.)

@item -F
@itemx --fixed-strings
@opindex -F
@opindex --fixed-strings
@cindex matching fixed strings
Interpret patterns as fixed strings, not regular expressions.
(@option{-F} is specified by POSIX.)

@item -P
@itemx --perl-regexp
@opindex -P
@opindex --perl-regexp
@cindex matching Perl-compatible regular expressions
Interpret patterns as Perl-compatible regular expressions (PCREs).
PCRE support is here to stay, but consider this option experimental when
combined with the @option{-z} (@option{--null-data}) option, and note that
@samp{grep@ -P} may warn of unimplemented features.
@xref{Other Options}.

@end table

In addition,
two variant programs @command{egrep} and @command{fgrep} are available.
@command{egrep} is the same as @samp{grep@ -E}.
@command{fgrep} is the same as @samp{grep@ -F}.
Direct invocation as either
@command{egrep} or @command{fgrep} is deprecated,
but is provided to allow historical applications
that rely on them to run unmodified.
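
To illustrate how the same pattern is read by three of these variants,
the following sketch searches a made-up two-line input.  The
@option{-P} variant is left out because PCRE support may not be
compiled in.

@example
# BRE (the default): + is an ordinary character, so this selects a+b.
printf 'a+b\naab\n' | grep -G 'a+b'

# ERE: + repeats the preceding a, so this selects aab.
printf 'a+b\naab\n' | grep -E 'a+b'

# Fixed string: the pattern is taken literally, so this selects a+b.
printf 'a+b\naab\n' | grep -F 'a+b'
@end example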


@node Regular Expressions
@chapter Regular Expressions
@cindex regular expressions

A @dfn{regular expression} is a pattern that describes a set of strings.
Regular expressions are constructed analogously to arithmetic expressions,
by using various operators to combine smaller expressions.
@command{grep} understands
three different versions of regular expression syntax:
basic (BRE), extended (ERE), and Perl-compatible (PCRE).
In GNU @command{grep},
there is no difference in available functionality between the basic and
extended syntaxes.
In other implementations, basic regular expressions are less powerful.
The following description applies to extended regular expressions;
differences for basic regular expressions are summarized afterwards.
Perl-compatible regular expressions give additional functionality, and
are documented in the @i{pcresyntax}(3) and @i{pcrepattern}(3) manual
pages, but work only if PCRE is available in the system.

@menu
* Fundamental Structure::
* Character Classes and Bracket Expressions::
* The Backslash Character and Special Expressions::
* Anchoring::
* Back-references and Subexpressions::
* Basic vs Extended::
* Character Encoding::
* Matching Non-ASCII::
@end menu

@node Fundamental Structure
@section Fundamental Structure

@cindex ordinary characters
@cindex special characters
In regular expressions, the characters @samp{.?*+@{|()[\^$} are
@dfn{special characters} and have uses described below.  All other
characters are @dfn{ordinary characters}, and each ordinary character
is a regular expression that matches itself.

@opindex .
@cindex dot
@cindex period
The period @samp{.} matches any single character.
It is unspecified whether @samp{.} matches an encoding error.

@cindex interval expressions
A regular expression may be followed by one of several
repetition operators; the operators beginning with @samp{@{}
are called @dfn{interval expressions}.

@table @samp

@item ?
@opindex ?
@cindex question mark
@cindex match expression at most once
The preceding item is optional and is matched at most once.

@item *
@opindex *
@cindex asterisk
@cindex match expression zero or more times
The preceding item is matched zero or more times.

@item +
@opindex +
@cindex plus sign
@cindex match expression one or more times
The preceding item is matched one or more times.

@item @{@var{n}@}
@opindex @{@var{n}@}
@cindex braces, one argument
@cindex match expression @var{n} times
The preceding item is matched exactly @var{n} times.

@item @{@var{n},@}
@opindex @{@var{n},@}
@cindex braces, second argument omitted
@cindex match expression @var{n} or more times
The preceding item is matched @var{n} or more times.

@item @{,@var{m}@}
@opindex @{,@var{m}@}
@cindex braces, first argument omitted
@cindex match expression at most @var{m} times
The preceding item is matched at most @var{m} times.
This is a GNU extension.

@item @{@var{n},@var{m}@}
@opindex @{@var{n},@var{m}@}
@cindex braces, two arguments
@cindex match expression from @var{n} to @var{m} times
The preceding item is matched at least @var{n} times, but not more than
@var{m} times.

@end table

The empty regular expression matches the empty string.
Two regular expressions may be concatenated;
the resulting regular expression
matches any string formed by concatenating two substrings
that respectively match the concatenated expressions.

Two regular expressions may be joined by the infix operator @samp{|};
the resulting regular expression
matches any string matching either alternate expression.

Repetition takes precedence over concatenation,
which in turn takes precedence over alternation.
A whole expression may be enclosed in parentheses
to override these precedence rules and form a subexpression.
An unmatched @samp{)} matches just itself.
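
The precedence rules can be seen in a small sketch; the four-line
sample input is invented for the purpose.

@example
# Repetition binds tighter than concatenation: + applies to b only,
# so this selects ab and abb.
printf 'ab\nabab\nabb\ncd\n' | grep -E '^ab+$'

# Parentheses make + apply to the whole subexpression ab,
# so this selects ab and abab.
printf 'ab\nabab\nabb\ncd\n' | grep -E '^(ab)+$'

# Alternation binds loosest: the pattern means ^ab or cd$,
# so this selects all four lines.
printf 'ab\nabab\nabb\ncd\n' | grep -E '^ab|cd$'
@end example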
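
@node Character Classes and Bracket Expressions
@section Character Classes and Bracket Expressions

@cindex bracket expression
@cindex character class
A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
@samp{]}.
It matches any single character in that list.
If the first character of the list is the caret @samp{^},
then it matches any character @strong{not} in the list,
and it is unspecified whether it matches an encoding error.
For example, the regular expression
@samp{[0123456789]} matches any single digit,
whereas @samp{[^()]} matches any single character that is not
an opening or closing parenthesis, and might or might not match an
encoding error.

@cindex range expression
Within a bracket expression, a @dfn{range expression} consists of two
characters separated by a hyphen.
It matches any single character that
sorts between the two characters, inclusive.
In the default C locale, the sorting sequence is the native character
order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
In other locales, the sorting sequence is not specified, and
@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
characters that it matches might even be erratic.
To obtain the traditional interpretation
of bracket expressions, you can use the @samp{C} locale by setting the
@env{LC_ALL} environment variable to the value @samp{C}.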

Finally, certain named classes of characters are predefined within
bracket expressions, as follows.
Their interpretation depends on the @env{LC_CTYPE} locale;
for example, @samp{[[:alnum:]]} means the character class of numbers and letters
in the current locale.

@cindex classes of characters
@cindex character classes
@table @samp

@item [:alnum:]
@opindex alnum @r{character class}
@cindex alphanumeric characters
Alphanumeric characters:
@samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII
character encoding, this is the same as @samp{[0-9A-Za-z]}.

@item [:alpha:]
@opindex alpha @r{character class}
@cindex alphabetic characters
Alphabetic characters:
@samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII
character encoding, this is the same as @samp{[A-Za-z]}.

@item [:blank:]
@opindex blank @r{character class}
@cindex blank characters
Blank characters:
space and tab.

@item [:cntrl:]
@opindex cntrl @r{character class}
@cindex control characters
Control characters.
In ASCII, these characters have octal codes 000
through 037, and 177 (DEL).
In other character sets, these are
the equivalent characters, if any.

@item [:digit:]
@opindex digit @r{character class}
@cindex digit characters
@cindex numeric characters
Digits: @code{0 1 2 3 4 5 6 7 8 9}.

@item [:graph:]
@opindex graph @r{character class}
@cindex graphic characters
Graphical characters:
@samp{[:alnum:]} and @samp{[:punct:]}.

@item [:lower:]
@opindex lower @r{character class}
@cindex lower-case letters
Lower-case letters; in the @samp{C} locale and ASCII character
encoding, this is
@code{a b c d e f g h i j k l m n o p q r s t u v w x y z}.

@item [:print:]
@opindex print @r{character class}
@cindex printable characters
Printable characters:
@samp{[:alnum:]}, @samp{[:punct:]}, and space.

@item [:punct:]
@opindex punct @r{character class}
@cindex punctuation characters
Punctuation characters; in the @samp{C} locale and ASCII character
encoding, this is
@code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}.

@item [:space:]
@opindex space @r{character class}
@cindex space characters
@cindex whitespace characters
Space characters: in the @samp{C} locale, this is
tab, newline, vertical tab, form feed, carriage return, and space.
@xref{Usage}, for more discussion of matching newlines.

@item [:upper:]
@opindex upper @r{character class}
@cindex upper-case letters
Upper-case letters: in the @samp{C} locale and ASCII character
encoding, this is
@code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}.

@item [:xdigit:]
@opindex xdigit @r{character class}
@cindex xdigit class
@cindex hexadecimal digits
Hexadecimal digits:
@code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}.

@end table
Note that the brackets in these class names are
part of the symbolic names, and must be included in addition to
the brackets delimiting the bracket expression.

@anchor{invalid-bracket-expr}
If you mistakenly omit the outer brackets, and search for, say,
@samp{[:upper:]},
GNU @command{grep} prints a diagnostic and exits with status 2, on
the assumption that you did not intend to search for the nominally
equivalent regular expression: @samp{[:epru]}.
Set the @env{POSIXLY_CORRECT} environment variable to disable this feature.
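
The difference between the two spellings can be sketched with made-up
input, assuming @env{POSIXLY_CORRECT} is not set in the environment:

@example
# Correct: class name inside a bracket expression; selects Abc.
printf 'abc\nAbc\n' | grep '[[:upper:]]'

# Outer brackets omitted: GNU grep prints a diagnostic on standard
# error and the echo reports exit status 2.
printf 'abc\nAbc\n' | grep '[:upper:]'; echo "status: $?"
@end example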

Special characters lose their special meaning inside bracket expressions.

@table @samp
@item ]
ends the bracket expression if it's not the first list item.
So, if you want to make the @samp{]} character a list item,
you must put it first.

@item [.
represents the open collating symbol.

@item .]
represents the close collating symbol.

@item [=
represents the open equivalence class.

@item =]
represents the close equivalence class.

@item [:
represents the open character class symbol, and should be followed by a
valid character class name.

@item :]
represents the close character class symbol.

@item -
represents the range if it's not first or last in a list or the ending point
of a range.

@item ^
represents the characters not in the list.
If you want to make the @samp{^}
character a list item, place it anywhere but first.

@end table
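
These placement rules combine in the following sketch, which matches a
literal @samp{]}, @samp{^}, or @samp{-}.  The sample input is invented
for the illustration.

@example
# ] comes first, ^ is not first, and - is last, so all three are
# ordinary list items.  This selects every line except "plain".
printf 'a]b\nx^y\np-q\nplain\n' | grep '[]^-]'
@end example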

@node The Backslash Character and Special Expressions
@section The Backslash Character and Special Expressions
@cindex backslash

The @samp{\} character followed by a special character is a regular
expression that matches the special character.
The @samp{\} character,
when followed by certain ordinary characters,
takes a special meaning:

@table @samp

@item \b
Match the empty string at the edge of a word.

@item \B
Match the empty string provided it's not at the edge of a word.

@item \<
Match the empty string at the beginning of a word.

@item \>
Match the empty string at the end of a word.

@item \w
Match a word constituent; it is a synonym for @samp{[_[:alnum:]]}.

@item \W
Match a non-word constituent; it is a synonym for @samp{[^_[:alnum:]]}.

@item \s
Match whitespace; it is a synonym for @samp{[[:space:]]}.

@item \S
Match non-whitespace; it is a synonym for @samp{[^[:space:]]}.

@end table

For example, @samp{\brat\b} matches the separate word @samp{rat},
whereas @samp{\Brat\B} matches @samp{crate} but not @samp{furry rat}.
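
The same example can be tried directly from the shell; the three input
lines below are only sample data.

@example
# Word edges on both sides: selects rat and furry rat.
printf 'rat\ncrate\nfurry rat\n' | grep '\brat\b'

# No word edge on either side: selects crate.
printf 'rat\ncrate\nfurry rat\n' | grep '\Brat\B'

# \< and \> name the beginning and end of a word explicitly;
# with -c this counts the two lines that contain the word rat.
printf 'rat\ncrate\nfurry rat\n' | grep -c '\<rat\>'
@end example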

@node Anchoring
@section Anchoring
@cindex anchoring

The caret @samp{^} and the dollar sign @samp{$} are special characters that
respectively match the empty string at the beginning and end of a line.
They are termed @dfn{anchors}, since they force the match to be ``anchored''
to the beginning or end of a line, respectively.
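
A small sketch with invented input shows the effect of each anchor:

@example
# Anchored at both ends: selects only the line that is exactly abc.
printf 'abc\nxabc\nabcx\n' | grep '^abc$'

# Anchored at the end only: selects abc and xabc.
printf 'abc\nxabc\nabcx\n' | grep 'abc$'
@end example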

@node Back-references and Subexpressions
@section Back-references and Subexpressions
@cindex subexpression
@cindex back-reference

The back-reference @samp{\@var{n}},
where @var{n} is a single nonzero digit, matches
the substring previously matched by the @var{n}th parenthesized subexpression
of the regular expression.
For example, @samp{(a)\1} matches @samp{aa}.
If the parenthesized subexpression does not participate in the match,
the back-reference makes the whole match fail;
for example, @samp{(a)*\1} fails to match @samp{a}.
If the parenthesized subexpression matches more than one substring,
the back-reference refers to the last matched substring;
for example, @samp{^(ab*)*\1$} matches @samp{ababbabb} but not @samp{ababbab}.
When multiple regular expressions are given with
@option{-e} or from a file (@samp{-f @var{file}}),
back-references are local to each expression.

@xref{Known Bugs}, for some known problems with back-references.
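
The first example above can be sketched as a command in both syntaxes;
the two-line input is made up for the illustration.

@example
# ERE form: (a) is the subexpression and \1 repeats it; selects aa.
printf 'aa\nab\n' | grep -E '(a)\1'

# BRE form of the same pattern; also selects aa.
printf 'aa\nab\n' | grep '\(a\)\1'
@end example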

@node Basic vs Extended
@section Basic vs Extended Regular Expressions
@cindex basic regular expressions

In basic regular expressions the characters @samp{?}, @samp{+},
@samp{@{}, @samp{|}, @samp{(}, and @samp{)} lose their special meaning;
instead use the backslashed versions @samp{\?}, @samp{\+}, @samp{\@{},
@samp{\|}, @samp{\(}, and @samp{\)}.  Also, a backslash is needed
before an interval expression's closing @samp{@}}, and an unmatched
@code{\)} is invalid.

Portable scripts should avoid the following constructs, as
POSIX says they produce undefined results:

@itemize @bullet
@item
Extended regular expressions that use back-references.
@item
Basic regular expressions that use @samp{\?}, @samp{\+}, or @samp{\|}.
@item
Empty parenthesized regular expressions like @samp{()}.
@item
Empty alternatives (as in, e.g., @samp{a|}).
@item
Repetition operators that immediately follow empty expressions,
unescaped @samp{$}, or other repetition operators.
@item
A backslash escaping an ordinary character (e.g., @samp{\S}),
unless it is a back-reference.
@item
An unescaped @samp{[} that is not part of a bracket expression.
@item
In extended regular expressions, an unescaped @samp{@{} that is not
part of an interval expression.
@end itemize

@cindex interval expressions
Traditional @command{egrep} did not support interval expressions and
some @command{egrep} implementations use @samp{\@{} and @samp{\@}} instead, so
portable scripts should avoid interval expressions in @samp{grep@ -E} patterns
and should use @samp{[@{]} to match a literal @samp{@{}.

GNU @command{grep@ -E} attempts to support traditional usage by
assuming that @samp{@{} is not special if it would be the start of an
invalid interval expression.
For example, the command
@samp{grep@ -E@ '@{1'} searches for the two-character string @samp{@{1}
instead of reporting a syntax error in the regular expression.
POSIX allows this behavior as an extension, but portable scripts
should avoid it.
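
As a sketch of these differences, the three commands below select the
same lines (@samp{cat} and @samp{dog}) from some made-up input.  The
last form is offered as one way to sidestep @samp{\|}, which POSIX
leaves undefined in basic regular expressions.

@example
# BRE: alternation needs the backslashed form (a GNU extension).
printf 'cat\ndog\nbird\n' | grep 'cat\|dog'

# ERE: alternation is written with a plain |.
printf 'cat\ndog\nbird\n' | grep -E 'cat|dog'

# Alternative that avoids alternation: give each pattern with -e.
printf 'cat\ndog\nbird\n' | grep -e 'cat' -e 'dog'
@end example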

@node Character Encoding
@section Character Encoding
@cindex character encoding

The @env{LC_CTYPE} locale specifies the encoding of characters in
patterns and data, that is, whether text is encoded in UTF-8, ASCII,
or some other encoding.  @xref{Environment Variables}.

In the @samp{C} or @samp{POSIX} locale, every character is encoded as
a single byte and every byte is a valid character.  In more-complex
encodings such as UTF-8, a sequence of multiple bytes may be needed to
represent a character, and some bytes may be encoding errors that do
not contribute to the representation of any character.  POSIX does not
specify the behavior of @command{grep} when patterns or input data
contain encoding errors or null characters, so portable scripts should
avoid such usage.  As an extension to POSIX, GNU @command{grep} treats
null characters like any other character.  However, unless the
@option{-a} (@option{--binary-files=text}) option is used, the
presence of null characters in input or of encoding errors in output
causes GNU @command{grep} to treat the file as binary and suppress
details about matches.  @xref{File and Directory Selection}.

Regardless of locale, the 103 characters in the POSIX Portable
Character Set (a subset of ASCII) are always encoded as a single byte,
and the 128 ASCII characters have their usual single-byte encodings on
all but oddball platforms.
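
The treatment of null characters can be sketched as follows.  The input
is a made-up line containing one null byte, and the exact wording of
the binary-file notice is not shown because it varies between
@command{grep} versions.

@example
# The null byte makes grep treat the input as binary, so only a
# notice that the input matches is reported, not the line itself.
printf 'a\0b\n' | grep 'a'

# With -a the input is processed as text and the matching line,
# null byte included, is written; od makes the byte visible as \0.
printf 'a\0b\n' | grep -a 'a' | od -c
@end example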

@node Matching Non-ASCII
@section Matching Non-ASCII and Non-printable Characters
@cindex non-ASCII matching
@cindex non-printable matching

In a regular expression, non-ASCII and non-printable characters other
than newline are not special, and represent themselves.  For example,
in a locale using UTF-8 the command @samp{grep 'Λ@tie{}ω'} (where the
white space between @samp{Λ} and the @samp{ω} is a tab character)
searches for @samp{Λ} (Unicode character U+039B GREEK CAPITAL LETTER
LAMBDA), followed by a tab (U+0009 TAB), followed by @samp{ω} (U+03C9
GREEK SMALL LETTER OMEGA).

Suppose you want to limit your pattern to only printable characters
(or even only printable ASCII characters) to keep your script readable
or portable, but you also want to match specific non-ASCII or non-null
non-printable characters.  If you are using the @option{-P}
(@option{--perl-regexp}) option, PCREs give you several ways to do
this.  Otherwise, if you are using Bash, the GNU project's shell, you
can represent these characters via ANSI-C quoting.  For example, the
Bash commands @samp{grep $'Λ\tω'} and @samp{grep $'\u039B\t\u03C9'}
both search for the same three-character string @samp{Λ@tie{}ω}
mentioned earlier.  However, because Bash translates ANSI-C quoting
before @command{grep} sees the pattern, this technique should not be
used to match printable ASCII characters; for example, @samp{grep
$'\u005E'} is equivalent to @samp{grep '^'} and matches any line, not
just lines containing the character @samp{^} (U+005E CIRCUMFLEX
ACCENT).

Since PCREs and ANSI-C quoting are GNU extensions to POSIX, portable
shell scripts written in ASCII should use other methods to match
specific non-ASCII characters.  For example, in a UTF-8 locale the
command @samp{grep "$(printf '\316\233\t\317\211\n')"} is a portable
albeit hard-to-read alternative to Bash's @samp{grep $'Λ\tω'}.
However, none of these techniques will let you put a null character
directly into a command-line pattern; null characters can appear only
in a pattern specified via the @option{-f} (@option{--file}) option.
1650
1651@node Usage
1652@chapter Usage
1653
1654@cindex usage, examples
1655Here is an example command that invokes GNU @command{grep}:
1656
1657@example
1658grep -i 'hello.*world' menu.h main.c
1659@end example
1660
1661@noindent
1662This lists all lines in the files @file{menu.h} and @file{main.c} that
1663contain the string @samp{hello} followed by the string @samp{world};
1664this is because @samp{.*} matches zero or more characters within a line.
1665@xref{Regular Expressions}.
1666The @option{-i} option causes @command{grep}
1667to ignore case, causing it to match the line @samp{Hello, world!}, which
1668it would not otherwise match.
1669
Here is a more complex example,
showing the location and contents of any line
containing @samp{f} and ending in @samp{.c},
within all files in the current directory whose names
do not start with @samp{.}, contain @samp{g}, and end in @samp{.h}.
The @option{-n} option outputs line numbers, the @option{--} argument
treats any later arguments as file names, not options, even if
@code{*g*.h} expands to a file name that starts with @samp{-},
and the empty file @file{/dev/null} causes file names to be output
even if only one file name happens to be of the form @samp{*g*.h}.

@example
grep -n -- 'f.*\.c$' *g*.h /dev/null
@end example

@noindent
Note that the regular expression syntax used in the pattern differs
from the globbing syntax that the shell uses to match file names.

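The role of @file{/dev/null} in that command can be seen by running it
with and without the extra operand. This is a sketch under invented
assumptions: the file names @file{config.h} and @file{other.h} and
their contents exist only for the demonstration.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
# Only config.h matches the glob *g*.h; other.h has no "g" in its name.
printf 'see foo.c\nint flag;\nfile.c is gone\nrefer to main.c\n' > config.h
printf 'ends in main.c\n' > other.h

# /dev/null is a second file operand, so each match is prefixed
# with the name of the file it came from.
with_null=$(grep -n -- 'f.*\.c$' *g*.h /dev/null)

# With a single file operand, the file name prefix is omitted.
without_null=$(grep -n -- 'f.*\.c$' *g*.h)

printf '%s\n---\n%s\n' "$with_null" "$without_null"

cd / && rm -rf "$dir"
```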
@xref{Invoking}, for more details about
how to invoke @command{grep}.

@cindex using @command{grep}, Q&A
@cindex FAQ about @command{grep} usage
Here are some common questions and answers about @command{grep} usage.

@enumerate

@item
How can I list just the names of matching files?

@example
grep -l 'main' test-*.c
@end example

@noindent
lists names of @samp{test-*.c} files in the current directory whose contents
mention @samp{main}.

@item
How do I search directories recursively?

@example
grep -r 'hello' /home/gigi
@end example

@noindent
searches for @samp{hello} in all files
under the @file{/home/gigi} directory.
For more control over which files are searched,
use @command{find} and @command{grep}.
For example, the following command searches only C files:

@example
find /home/gigi -name '*.c' ! -type d \
  -exec grep -H 'hello' '@{@}' +
@end example

This differs from the command:

@example
grep -H 'hello' /home/gigi/*.c
@end example

which merely looks for @samp{hello} in non-hidden C files in
@file{/home/gigi} whose names end in @samp{.c}.
The @command{find} command line above is more similar to the command:

@example
grep -r --include='*.c' 'hello' /home/gigi
@end example

@item
What if a pattern or file has a leading @samp{-}?

@example
grep -- '--cut here--' *
@end example

@noindent
searches for all lines matching @samp{--cut here--}.
Without @option{--},
@command{grep} would attempt to parse @samp{--cut here--} as a list of
options, and there would be similar problems with any file names
beginning with @samp{-}.

Alternatively, you can prevent misinterpretation of leading @samp{-}
by using @option{-e} for patterns and leading @samp{./} for files:

@example
grep -e '--cut here--' ./*
@end example

@item
Suppose I want to search for a whole word, not a part of a word?

@example
grep -w 'hello' test*.log
@end example

@noindent
searches only for instances of @samp{hello} that are entire words;
it does not match @samp{Othello}.
For more control, use @samp{\<} and
@samp{\>} to match the start and end of words.
For example:

@example
grep 'hello\>' test*.log
@end example

@noindent
searches only for words ending in @samp{hello}, so it matches the word
@samp{Othello}.

@item
How do I output context around the matching lines?

@example
grep -C 2 'hello' test*.log
@end example

@noindent
prints two lines of context around each matching line.

@item
How do I force @command{grep} to print the name of the file?

Append @file{/dev/null}:

@example
grep 'eli' /etc/passwd /dev/null
@end example

gets you:

@example
/etc/passwd:eli:x:2098:1000:Eli Smith:/home/eli:/bin/bash
@end example

Alternatively, use @option{-H}, which is a GNU extension:

@example
grep -H 'eli' /etc/passwd
@end example

@item
Why do people use strange regular expressions on @command{ps} output?

@example
ps -ef | grep '[c]ron'
@end example

If the pattern had been written without the square brackets, it would
have matched not only the @command{ps} output line for @command{cron},
but also the @command{ps} output line for @command{grep}.
Note that on some platforms,
@command{ps} limits the output to the width of the screen;
@command{grep} does not have any limit on the length of a line
except the available memory.

@item
Why does @command{grep} report ``Binary file matches''?

If @command{grep} listed all matching ``lines'' from a binary file, it
would probably generate output that is not useful, and it might even
muck up your display.
So GNU @command{grep} suppresses output from
files that appear to be binary files.
To force GNU @command{grep}
to output lines even from files that appear to be binary, use the
@option{-a} or @samp{--binary-files=text} option.
To eliminate the
``Binary file matches'' messages, use the @option{-I} or
@samp{--binary-files=without-match} option,
or the @option{-s} or @option{--no-messages} option.

@item
Why doesn't @samp{grep -lv} print non-matching file names?

@samp{grep -lv} lists the names of all files containing one or more
lines that do not match.
To list the names of all files that contain no
matching lines, use the @option{-L} or @option{--files-without-match}
option.

@item
I can do ``OR'' with @samp{|}, but what about ``AND''?

@example
grep 'paul' /etc/motd | grep 'françois'
@end example

@noindent
finds all lines that contain both @samp{paul} and @samp{françois}.

@item
Why does the empty pattern match every input line?

The @command{grep} command searches for lines that contain strings
that match a pattern. Every line contains the empty string, so an
empty pattern causes @command{grep} to find a match on each line. It
is not the only such pattern: @samp{^}, @samp{$}, and many
other patterns cause @command{grep} to match every line.

To match empty lines, use the pattern @samp{^$}. To match blank
lines, use the pattern @samp{^[[:blank:]]*$}. To match no lines at
all, use the command @samp{grep -f /dev/null}.

@item
How can I search in both standard input and in files?

Use the special file name @samp{-}:

@example
cat /etc/passwd | grep 'alain' - /etc/motd
@end example

@item
Why is this back-reference failing?

@example
echo 'ba' | grep -E '(a)\1|b\1'
@end example

This outputs an error message, because the second @samp{\1}
has nothing to refer back to, meaning it will never match anything.

@item
How can I match across lines?

Standard grep cannot do this, as it is fundamentally line-based.
Therefore, merely using the @code{[:space:]} character class does not
match newlines in the way you might expect.

With the GNU @command{grep} option @option{-z} (@option{--null-data}), each
input and output ``line'' is null-terminated; @pxref{Other Options}. Thus,
you can match newlines in the input, but typically if there is a match
the entire input is output, so this usage is often combined with
output-suppressing options like @option{-q}, e.g.:

@example
printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'
@end example

If this does not suffice, you can transform the input
before giving it to @command{grep}, or turn to @command{awk},
@command{sed}, @command{perl}, or many other utilities that are
designed to operate across lines.

@item
What do @command{grep}, @command{fgrep}, and @command{egrep} stand for?

The name @command{grep} comes from the way line editing was done on Unix.
For example,
@command{ed} uses the following syntax
to print a list of matching lines on the screen:

@example
global/regular expression/print
g/re/p
@end example

@command{fgrep} stands for Fixed @command{grep};
@command{egrep} stands for Extended @command{grep}.

@end enumerate


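Several of the answers above can be checked side by side with a small
script. The following is a sketch only: the files @file{a.log} and
@file{b.log} and their contents are invented, and the options
@option{-L}, @option{-w} and @option{-z} as well as the @samp{\>} and
@samp{\+} operators are used as GNU @command{grep} provides them.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
# Invented sample files.
printf 'hello world\nOthello\ngoodbye\n' > a.log
printf 'nothing here\n' > b.log

# -l lists files with at least one matching line; -L lists files with none.
files_with=$(grep -l 'hello' a.log b.log)
files_without=$(grep -L 'hello' a.log b.log)
# -lv lists files with at least one NON-matching line, so a.log
# ("goodbye") is listed here even though -L does not list it.
files_lv=$(grep -lv 'hello' a.log b.log)

# -w matches whole words only; 'hello\>' matches any word ending in "hello".
whole_word=$(grep -w 'hello' a.log)
word_end=$(grep 'hello\>' a.log)

# "AND" as a pipeline: filter on the first pattern, then on the second.
both=$(grep 'hello' a.log | grep 'world')

# The empty pattern selects every line; ^$ selects only empty lines.
all_lines=$(printf 'x\n\ny\n' | grep -c '')
empty_lines=$(printf 'x\n\ny\n' | grep -c '^$')

# With -z the whole input is one "line", so a pattern can span a newline.
if printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'; then
  across=match
else
  across=no-match
fi

printf '%s\n' "-l: $files_with" "-L: $files_without" "-w: $whole_word" \
  "AND: $both" "lines: $all_lines empty: $empty_lines" "-z: $across"

cd / && rm -rf "$dir"
```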
@node Performance
@chapter Performance

@cindex performance
Typically @command{grep} is an efficient way to search text. However,
it can be quite slow in some cases, and it can search large files
where even minor performance tweaking can help significantly.
Although the algorithm used by @command{grep} is an implementation
detail that can change from release to release, understanding its
basic strengths and weaknesses can help you improve its performance.

The @command{grep} command operates partly via a set of automata that
are designed for efficiency, and partly via a slower matcher that
takes over when the fast matchers run into unusual features like
back-references. When feasible, the Boyer--Moore fast string
searching algorithm is used to match a single fixed pattern, and the
Aho--Corasick algorithm is used to match multiple fixed patterns.

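The fixed patterns mentioned above are the kind given with
@option{-F}. The sketch below, using invented input, shows a single
fixed pattern and a set of fixed patterns; which algorithm
@command{grep} actually picks is an internal decision that this script
does not observe.

```shell
# Invented three-line input.
input=$(printf 'alpha beta\na.b\nepsilon\n')

# One fixed pattern: with -F the dot is an ordinary character.
fixed=$(printf '%s\n' "$input" | grep -F 'a.b')

# The same text as a regular expression: the dot matches any character,
# so "alpha beta" (which contains "a b") is selected as well.
regex=$(printf '%s\n' "$input" | grep 'a.b')

# Several fixed patterns: a line is selected if it contains any of them.
multiple=$(printf '%s\n' "$input" | grep -F -e 'beta' -e 'epsilon')

printf 'fixed:\n%s\nregex:\n%s\nmultiple:\n%s\n' "$fixed" "$regex" "$multiple"
```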
@cindex locales
Generally speaking @command{grep} operates more efficiently in
single-byte locales, since it can avoid the special processing needed
for multi-byte characters. If your patterns will work just as well
that way, setting @env{LC_ALL} to a single-byte locale can help
performance considerably. Setting @samp{LC_ALL='C'} can be
particularly efficient, as @command{grep} is tuned for that locale.

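As a sketch of this advice, the script below runs the same search in
the current locale and with @samp{LC_ALL=C} and confirms that the
results agree for ASCII-only data. The generated log lines are
invented. Any speed difference depends on the system and the data, so
the script does not measure it; prefixing each @command{grep} command
with @command{time} is one way to compare.

```shell
file=$(mktemp)
# Generate ASCII-only sample data so the result is locale-independent.
i=0
while [ "$i" -lt 1000 ]; do
  printf 'line %d: status=ok\nline %d: status=FAILED\n' "$i" "$i"
  i=$((i + 1))
done > "$file"

# The same search in the current locale and in the C locale.
count_default=$(grep -c 'status=FAILED' "$file")
count_c=$(LC_ALL=C grep -c 'status=FAILED' "$file")

printf 'current locale: %s, C locale: %s\n' "$count_default" "$count_c"
rm -f "$file"
```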
@cindex case insensitive search
Outside the @samp{C} locale, case-insensitive search, and search for
bracket expressions like @samp{[a-z]} and @samp{[[=a=]b]}, can be
surprisingly inefficient due to difficulties in fast portable access to
concepts like multi-character collating elements.

@cindex back-references
A back-reference such as @samp{\1} can hurt performance significantly
in some cases, since back-references cannot in general be implemented
via a finite state automaton, and instead trigger a backtracking
algorithm that can be quite inefficient. For example, although the
pattern @samp{^(.*)\1@{14@}(.*)\2@{13@}$} matches only lines whose
lengths can be written as a sum @math{15x + 14y} for nonnegative
integers @math{x} and @math{y}, the pattern matcher does not perform
linear Diophantine analysis and instead backtracks through all
possible matching strings, using an algorithm that is exponential in
the worst case.

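The length condition in that example can be probed on short lines,
where backtracking is still cheap. The following sketch assumes lines
made only of the letter @samp{a}; the helper name @code{matches_len}
is hypothetical. By the arithmetic, 29 = 15 + 14 and 44 = 30 + 14
should match, while 13 and 16 cannot be written as @math{15x + 14y}.

```shell
# Succeeds if a line of $1 copies of "a" matches the pattern.
matches_len() {
  line=''
  i=0
  while [ "$i" -lt "$1" ]; do
    line="${line}a"
    i=$((i + 1))
  done
  printf '%s\n' "$line" | grep -E -q '^(.*)\1{14}(.*)\2{13}$'
}

# Record the outcome for a few short lengths.
result=''
for n in 13 16 29 44; do
  if matches_len "$n"; then
    result="$result $n:yes"
  else
    result="$result $n:no"
  fi
done

printf 'lengths tested:%s\n' "$result"
```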
@cindex holes in files
On some operating systems that support files with holes---large
regions of zeros that are not physically present on secondary
storage---@command{grep} can skip over the holes efficiently without
needing to read the zeros. This optimization is not available if the
@option{-a} (@option{--binary-files=text}) option is used (@pxref{File and
Directory Selection}), unless the @option{-z} (@option{--null-data})
option is also used (@pxref{Other Options}).

For more about the algorithms used by @command{grep} and about
related string matching algorithms, see:

@frenchspacing on
@itemize @bullet
@item
Aho AV. Algorithms for finding patterns in strings.
In: van Leeuwen J. @emph{Handbook of Theoretical Computer Science}, vol. A.
New York: Elsevier; 1990. p. 255--300.
This surveys classic string matching algorithms, some of which are
used by @command{grep}.

@item
Aho AV, Corasick MJ. Efficient string matching: an aid to bibliographic search.
@emph{CACM}. 1975;18(6):333--40.
@url{https://dx.doi.org/10.1145/360825.360855}.
This introduces the Aho--Corasick algorithm.

@item
Boyer RS, Moore JS. A fast string searching algorithm.
@emph{CACM}. 1977;20(10):762--72.
@url{https://dx.doi.org/10.1145/359842.359859}.
This introduces the Boyer--Moore algorithm.

@item
Faro S, Lecroq T. The exact online string matching problem: a review
of the most recent results.
@emph{ACM Comput Surv}. 2013;45(2):13.
@url{https://dx.doi.org/10.1145/2431211.2431212}.
This surveys string matching algorithms that might help improve the
performance of @command{grep} in the future.
@end itemize
@frenchspacing off

@node Reporting Bugs
@chapter Reporting bugs

@cindex bugs, reporting
Bug reports can be found at the
@url{https://debbugs.gnu.org/cgi/pkgreport.cgi?package=grep,
GNU bug report logs for @command{grep}}.
If you find a bug not listed there, please email it to
@email{bug-grep@@gnu.org} to create a new bug report.

@menu
* Known Bugs::
@end menu

@node Known Bugs
@section Known Bugs
@cindex Bugs, known

Large repetition counts in the @samp{@{n,m@}} construct may cause
@command{grep} to use lots of memory.
In addition, certain other
obscure regular expressions require exponential time and
space, and may cause @command{grep} to run out of memory.

Back-references can greatly slow down matching, as they can generate
exponentially many matching possibilities that can consume both time
and memory to explore. Also, the POSIX specification for
back-references is at times unclear. Furthermore, many regular
expression implementations have back-reference bugs that can cause
programs to return incorrect answers or even crash, and fixing these
bugs has often been low-priority: for example, as of 2021 the
@url{https://sourceware.org/bugzilla/,GNU C library bug database}
contained back-reference bugs
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=52,,52},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=10844,,10844},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=11053,,11053},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=24269,,24269}
and @url{https://sourceware.org/bugzilla/show_bug.cgi?id=25322,,25322},
with little sign of forthcoming fixes. Luckily,
back-references are rarely useful and it should be little trouble to
avoid them in practical applications.


@node Copying
@chapter Copying
@cindex copying

GNU @command{grep} is licensed under the GNU GPL, which makes it @dfn{free
software}.

The ``free'' in ``free software'' refers to liberty, not price. As
some GNU project advocates like to point out, think of ``free speech''
rather than ``free beer''. In short, you have the right (freedom) to
run and change @command{grep} and distribute it to other people, and---if you
want---charge money for doing either. The important restriction is
that you have to grant your recipients the same rights and impose the
same restrictions.

This general method of licensing software is sometimes called
@dfn{open source}. The GNU project prefers the term ``free software''
for reasons outlined at
@url{https://www.gnu.org/philosophy/open-source-misses-the-point.html}.

This manual is free documentation in the same sense. The
documentation license is included below. The license for the program
is available with the source code, or at
@url{https://www.gnu.org/licenses/gpl.html}.

@menu
* GNU Free Documentation License::
@end menu

@node GNU Free Documentation License
@section GNU Free Documentation License

@include fdl.texi


@node Index
@unnumbered Index

@printindex cp

@bye