source: kBuild/trunk/src/sed/doc/sed.texi@ 3613

Last change on this file since 3613 was 3613, checked in by bird, 7 months ago

src/sed: Merged in changes between 4.1.5 and 4.9 from the vendor branch. (svn merge /vendor/sed/4.1.5 /vendor/sed/current .)

1\input texinfo @c -*-texinfo-*-
2@c
3@c -- Stuff that needs adding: ----------------------------------------------
4@c (nothing!)
5@c --------------------------------------------------------------------------
6@c Check for consistency: regexps in @code, text that they match in @samp.
7@c
8@c Tips:
9@c @command for command
10@c @samp for command fragments: @samp{cat -s}
11@c @code for sed commands and flags
12@c Use ``quote'' not `quote' or "quote".
13@c
14@c %**start of header
15@setfilename sed.info
16@settitle sed, a stream editor
17@c %**end of header
18
19@c @smallbook
20
21@include version.texi
22
23@c Combine indices.
24@syncodeindex ky cp
25@syncodeindex pg cp
26@syncodeindex tp cp
27
28@defcodeindex op
29@syncodeindex op fn
30
31@include config.texi
32
33@copying
34This file documents version @value{VERSION} of
35@value{SSED}, a stream editor.
36
37Copyright @copyright{} 1998--2022 Free Software Foundation, Inc.
38
39@quotation
40Permission is granted to copy, distribute and/or modify this document
41under the terms of the GNU Free Documentation License, Version 1.3
42or any later version published by the Free Software Foundation;
43with no Invariant Sections, no Front-Cover Texts, and no
44Back-Cover Texts. A copy of the license is included in the
45section entitled ``GNU Free Documentation License''.
46@end quotation
47@end copying
48
49@setchapternewpage off
50
51@titlepage
52@title @value{SSED}, a stream editor
53@subtitle version @value{VERSION}, @value{UPDATED}
54@author by Ken Pizzini, Paolo Bonzini, Jim Meyering, Assaf Gordon
55
56@page
57@vskip 0pt plus 1filll
58@insertcopying
59@end titlepage
60
61@contents
62
63@ifnottex
64@node Top
65@top @value{SSED}
66
67@insertcopying
68@end ifnottex
69
70@menu
71* Introduction:: Introduction
72* Invoking sed:: Invocation
73* sed scripts:: @command{sed} scripts
74* sed addresses:: Addresses: selecting lines
75* sed regular expressions:: Regular expressions: selecting text
76* advanced sed:: Advanced @command{sed}: cycles and buffers
77* Examples:: Some sample scripts
78* Limitations:: Limitations and (non-)limitations of @value{SSED}
79* Other Resources:: Other resources for learning about @command{sed}
80* Reporting Bugs:: Reporting bugs
81* GNU Free Documentation License:: Copying and sharing this manual
82* Concept Index:: A menu with all the topics in this manual.
83* Command and Option Index:: A menu with all @command{sed} commands and
84 command-line options.
85@end menu
86
87
88@node Introduction
89@chapter Introduction
90
91@cindex Stream editor
92@command{sed} is a stream editor.
93A stream editor is used to perform basic text
94transformations on an input stream
95(a file or input from a pipeline).
96While in some ways similar to an editor which
97permits scripted edits (such as @command{ed}),
98@command{sed} works by making only one pass over the
99input(s), and is consequently more efficient.
100But it is @command{sed}'s ability to filter text in a pipeline
101which particularly distinguishes it from other types of
102editors.
103
104
105@node Invoking sed
106@chapter Running sed
107
108This chapter covers how to run @command{sed}. Details of @command{sed}
109scripts and individual @command{sed} commands are discussed in the
110next chapter.
111
112@menu
113* Overview::
114* Command-Line Options::
115* Exit status::
116@end menu
117
118
119@node Overview
120@section Overview
121Normally @command{sed} is invoked like this:
122
123@example
124sed SCRIPT INPUTFILE...
125@end example
126
127For example, to change every @samp{hello} to @samp{world}
128in the file @file{input.txt}:
129
130@example
131sed 's/hello/world/g' input.txt > output.txt
132@end example
133
134Without the @samp{g} (global) modifier, @command{sed} affects
135only the first instance per line.
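
The difference is easiest to see on a line that contains the pattern
twice. The following is a minimal sketch; the sample line and the shell
variable names are assumptions chosen for illustration.

```shell
# Sample line with two matches (assumed input, for illustration only).
line='hello hello'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/hello/world/')

# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/hello/world/g')

printf '%s\n' "$first_only" "$all_matches"
```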
136
137@cindex stdin
138@cindex standard input
139If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-},
140@command{sed} filters the contents of the standard input. The following
141commands are equivalent:
142
143@example
144sed 's/hello/world/g' input.txt > output.txt
145sed 's/hello/world/g' < input.txt > output.txt
146cat input.txt | sed 's/hello/world/g' - > output.txt
147@end example
148
149@cindex stdout
150@cindex output
151@cindex standard output
152@cindex -i, example
153@command{sed} writes output to standard output. Use @option{-i} to edit
154files in-place instead of printing to standard output.
155See also the @code{W} and @code{s///w} commands for writing output to
156other files. The following command modifies @file{file.txt} and
157does not produce any output:
158
159@example
160sed -i 's/hello/world/' file.txt
161@end example
162
163@cindex -n, example
164@cindex p, example
165@cindex suppressing output
166@cindex output, suppressing
167By default @command{sed} prints all processed input (except input
168that has been modified/deleted by commands such as @command{d}).
169Use @option{-n} to suppress output, and the @code{p} command
170to print specific lines. The following command prints only line 45
171of the input file:
172
173@example
174sed -n '45p' file.txt
175@end example
176
177
178
179@cindex multiple files
180@cindex -s, example
181@command{sed} treats multiple input files as one long stream.
182The following example prints the first line of the first file
183(@file{one.txt}) and the last line of the last file (@file{three.txt}).
184Use @option{-s} to reverse this behavior.
185
186@example
187sed -n '1p ; $p' one.txt two.txt three.txt
188@end example
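
The effect of @option{-s} on the command above can be sketched as
follows. This assumes @value{SSED} (the option is an extension) and a
system that provides @command{mktemp}; the scratch directory and the
file contents are made up for illustration.

```shell
# Scratch directory with three small files (assumed contents).
tmpdir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmpdir/one.txt"
printf 'b1\nb2\nb3\n' > "$tmpdir/two.txt"
printf 'c1\nc2\nc3\n' > "$tmpdir/three.txt"

# Default: the files form one long stream, so the addresses 1 and $
# each match exactly once.
joined=$(sed -n '1p ; $p' \
  "$tmpdir/one.txt" "$tmpdir/two.txt" "$tmpdir/three.txt")

# With -s: the addresses 1 and $ are evaluated separately for each file.
separate=$(sed -s -n '1p ; $p' \
  "$tmpdir/one.txt" "$tmpdir/two.txt" "$tmpdir/three.txt")

printf 'joined:\n%s\nseparate:\n%s\n' "$joined" "$separate"
rm -r "$tmpdir"
```

In the first case only the first line of @file{one.txt} and the last
line of @file{three.txt} are printed; in the second, the first and last
line of every file are printed.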
189
190
191@cindex -e, example
192@cindex --expression, example
193@cindex -f, example
194@cindex --file, example
195@cindex script parameter
196@cindex parameters, script
197Without @option{-e} or @option{-f} options, @command{sed} uses
198the first non-option parameter as the @var{script}, and the following
199non-option parameters as input files.
200If @option{-e} or @option{-f} options are used to specify a @var{script},
201all non-option parameters are taken as input files.
202Options @option{-e} and @option{-f} can be combined, and can appear
multiple times (in which case the final effective @var{script} is the
concatenation of all the individual @var{script}s).
205
206The following examples are equivalent:
207
208@example
209sed 's/hello/world/' input.txt > output.txt
210
211sed -e 's/hello/world/' input.txt > output.txt
212sed --expression='s/hello/world/' input.txt > output.txt
213
214echo 's/hello/world/' > myscript.sed
215sed -f myscript.sed input.txt > output.txt
216sed --file=myscript.sed input.txt > output.txt
217@end example
218
219
220@node Command-Line Options
221@section Command-Line Options
222
223The full format for invoking @command{sed} is:
224
225@example
226sed OPTIONS... [SCRIPT] [INPUTFILE...]
227@end example
228
229@command{sed} may be invoked with the following command-line options:
230
231@table @code
232@item --version
233@opindex --version
234@cindex Version, printing
235Print out the version of @command{sed} that is being run and a copyright notice,
236then exit.
237
238@item --help
239@opindex --help
240@cindex Usage summary, printing
241Print a usage message briefly summarizing these command-line options
242and the bug-reporting address,
243then exit.
244
245@item -n
246@itemx --quiet
247@itemx --silent
248@opindex -n
249@opindex --quiet
250@opindex --silent
251@cindex Disabling autoprint, from command line
252By default, @command{sed} prints out the pattern space
253at the end of each cycle through the script (@pxref{Execution Cycle, ,
254How @code{sed} works}).
255These options disable this automatic printing,
256and @command{sed} only produces output when explicitly told to
257via the @code{p} command.
258
259@item --debug
260@opindex --debug
261@cindex @value{SSEDEXT}, debug
262Print the input sed program in canonical form,
263and annotate program execution.
264@codequotebacktick on
265@codequoteundirected on
266@example
267$ echo 1 | sed '\%1%s21232'
2683
269
270$ echo 1 | sed --debug '\%1%s21232'
271SED PROGRAM:
272 /1/ s/1/3/
273INPUT: 'STDIN' line 1
274PATTERN: 1
275COMMAND: /1/ s/1/3/
276PATTERN: 3
277END-OF-CYCLE:
2783
279@end example
280@codequotebacktick off
281@codequoteundirected off
282
283
284@item -e @var{script}
285@itemx --expression=@var{script}
286@opindex -e
287@opindex --expression
288@cindex Script, from command line
289Add the commands in @var{script} to the set of commands to be
290run while processing the input.
291
292@item -f @var{script-file}
293@itemx --file=@var{script-file}
294@opindex -f
295@opindex --file
296@cindex Script, from a file
297Add the commands contained in the file @var{script-file}
298to the set of commands to be run while processing the input.
299
300@item -i[@var{SUFFIX}]
301@itemx --in-place[=@var{SUFFIX}]
302@opindex -i
303@opindex --in-place
304@cindex In-place editing, activating
305@cindex @value{SSEDEXT}, in-place editing
306This option specifies that files are to be edited in-place.
307@value{SSED} does this by creating a temporary file and
308sending output to this file rather than to the standard
309output.@footnote{This applies to commands such as @code{=},
310@code{a}, @code{c}, @code{i}, @code{l}, @code{p}. You can
311still write to the standard output by using the @code{w}
312@cindex @value{SSEDEXT}, @file{/dev/stdout} file
313or @code{W} commands together with the @file{/dev/stdout}
special file.}
315
316This option implies @option{-s}.
317
318When the end of the file is reached, the temporary file is
319renamed to the output file's original name. The extension,
320if supplied, is used to modify the name of the old file
321before renaming the temporary file, thereby making a backup
copy@footnote{Note that @value{SSED} creates the backup
file whether or not any output is actually changed.}.
324
325@cindex In-place editing, Perl-style backup file names
326This rule is followed: if the extension doesn't contain a @code{*},
327then it is appended to the end of the current filename as a
328suffix; if the extension does contain one or more @code{*}
329characters, then @emph{each} asterisk is replaced with the
330current filename. This allows you to add a prefix to the
331backup file, instead of (or in addition to) a suffix, or
332even to place backup copies of the original files into another
333directory (provided the directory already exists).
334
335If no extension is supplied, the original file is
336overwritten without making a backup.
337
338Because @option{-i} takes an optional argument, it should
339not be followed by other short options:
340@table @code
341@item sed -Ei '...' FILE
342Same as @option{-E -i} with no backup suffix - @file{FILE} will be
343edited in-place without creating a backup.
344
345@item sed -iE '...' FILE
This is equivalent to @option{--in-place=E}, creating @file{FILEE} as a backup
of @file{FILE}.
348@end table
349
Be cautious when using @option{-n} with @option{-i}: the former disables
automatic printing of lines and the latter changes the file in-place
without a backup. If the two are combined carelessly (without an explicit
@code{p} command), the output file will be empty:
354@codequotebacktick on
355@codequoteundirected on
356@example
357# WRONG USAGE: 'FILE' will be truncated.
358sed -ni 's/foo/bar/' FILE
359@end example
360@codequotebacktick off
361@codequoteundirected off
362
363@item -l @var{N}
364@itemx --line-length=@var{N}
365@opindex -l
366@opindex --line-length
367@cindex Line length, setting
368Specify the default line-wrap length for the @code{l} command.
369A length of 0 (zero) means to never wrap long lines. If
370not specified, it is taken to be 70.
371
372@item --posix
373@opindex --posix
374@cindex @value{SSEDEXT}, disabling
375@value{SSED} includes several extensions to POSIX
376sed. In order to simplify writing portable scripts, this
377option disables all the extensions that this manual documents,
378including additional commands.
379@cindex @code{POSIXLY_CORRECT} behavior, enabling
380Most of the extensions accept @command{sed} programs that
381are outside the syntax mandated by POSIX, but some
382of them (such as the behavior of the @command{N} command
383described in @ref{Reporting Bugs}) actually violate the
384standard. If you want to disable only the latter kind of
385extension, you can set the @code{POSIXLY_CORRECT} variable
386to a non-empty value.
387
388@item -b
389@itemx --binary
390@opindex -b
391@opindex --binary
392This option is available on every platform, but is only effective where the
393operating system makes a distinction between text files and binary files.
394When such a distinction is made---as is the case for MS-DOS, Windows,
395Cygwin---text files are composed of lines separated by a carriage return
396@emph{and} a line feed character, and @command{sed} does not see the
397ending CR. When this option is specified, @command{sed} will open
398input files in binary mode, thus not requesting this special processing
399and considering lines to end at a line feed.
400
401@item --follow-symlinks
402@opindex --follow-symlinks
403This option is available only on platforms that support
404symbolic links and has an effect only if option @option{-i}
405is specified. In this case, if the file that is specified
406on the command line is a symbolic link, @command{sed} will
407follow the link and edit the ultimate destination of the
408link. The default behavior is to break the symbolic link,
409so that the link destination will not be modified.
410
411@item -E
412@itemx -r
413@itemx --regexp-extended
414@opindex -E
415@opindex -r
416@opindex --regexp-extended
417@cindex Extended regular expressions, choosing
418@cindex GNU extensions, extended regular expressions
419Use extended regular expressions rather than basic
420regular expressions. Extended regexps are those that
421@command{egrep} accepts; they can be clearer because they
422usually have fewer backslashes.
423Historically this was a GNU extension,
424but the @option{-E}
425extension has since been added to the POSIX standard
426(http://austingroupbugs.net/view.php?id=528),
427so use @option{-E} for portability.
428GNU sed has accepted @option{-E} as an undocumented option for years,
429and *BSD seds have accepted @option{-E} for years as well,
430but scripts that use @option{-E} might not port to other older systems.
431@xref{ERE syntax, , Extended regular expressions}.
432
433
434@item -s
435@itemx --separate
436@opindex -s
437@opindex --separate
438@cindex Working on separate files
439By default, @command{sed} will consider the files specified on the
440command line as a single continuous long stream. This @value{SSED}
441extension allows the user to consider them as separate files:
442range addresses (such as @samp{/abc/,/def/}) are not allowed
443to span several files, line numbers are relative to the start
444of each file, @code{$} refers to the last line of each file,
445and files invoked from the @code{R} commands are rewound at the
446start of each file.
447
448@item --sandbox
449@opindex --sandbox
450@cindex Sandbox mode
In sandbox mode, @code{e/w/r} commands are rejected: programs containing
452them will be aborted without being run. Sandbox mode ensures @command{sed}
453operates only on the input files designated on the command line, and
454cannot run external programs.
455
456
457@item -u
458@itemx --unbuffered
459@opindex -u
460@opindex --unbuffered
461@cindex Unbuffered I/O, choosing
462Buffer both input and output as minimally as practical.
463(This is particularly useful if the input is coming from
464the likes of @samp{tail -f}, and you wish to see the transformed
465output as soon as possible.)
466
467@item -z
468@itemx --null-data
469@itemx --zero-terminated
470@opindex -z
471@opindex --null-data
472@opindex --zero-terminated
473Treat the input as a set of lines, each terminated by a zero byte
474(the ASCII @samp{NUL} character) instead of a newline. This option can
475be used with commands like @samp{sort -z} and @samp{find -print0}
476to process arbitrary file names.
477@end table
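
The backup behavior of the @option{-i} option described above can be
tried safely in a scratch directory. This is a minimal sketch assuming
@value{SSED} and a system that provides @command{mktemp}; the directory,
the file name @file{FILE}, and its contents are made up for illustration.

```shell
# Scratch directory and sample file (assumed names and contents).
tmpdir=$(mktemp -d)
printf 'foo\nbaz\n' > "$tmpdir/FILE"

# Edit in place. The suffix is attached directly to -i (no space),
# so the original content is preserved in FILE.bak.
sed -i.bak 's/foo/bar/' "$tmpdir/FILE"

edited=$(cat "$tmpdir/FILE")
backup=$(cat "$tmpdir/FILE.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
rm -r "$tmpdir"
```

Keeping a suffix such as @samp{.bak} while a script is still being
developed makes an accidental truncation recoverable.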
478
479If no @option{-e}, @option{-f}, @option{--expression}, or @option{--file}
480options are given on the command-line,
481then the first non-option argument on the command line is
482taken to be the @var{script} to be executed.
483
484@cindex Files to be processed as input
485If any command-line parameters remain after processing the above,
486these parameters are interpreted as the names of input files to
487be processed.
488@cindex Standard input, processing as input
489A file name of @samp{-} refers to the standard input stream.
490The standard input will be processed if no file names are specified.
491
492@node Exit status
493@section Exit status
494@cindex exit status
495An exit status of zero indicates success, and a nonzero value
496indicates failure. @value{SSED} returns the following exit status
497error values:
498
499@table @asis
500@item 0
501Successful completion.
502
503@item 1
504Invalid command, invalid syntax, invalid regular expression or a
505@value{SSED} extension command used with @option{--posix}.
506
507@item 2
One or more of the input files specified on the command line could not be
509opened (e.g. if a file is not found, or read permission is denied).
510Processing continued with other files.
511
512@item 4
An I/O error or a serious processing error occurred during runtime;
@value{SSED} aborted immediately.
515@end table
516
517@cindex Q, example
518@cindex exit status, example
519Additionally, the commands @code{q} and @code{Q} can be used to terminate
520@command{sed} with a custom exit code value (this is a @value{SSED} extension):
521
522@example
523$ echo | sed 'Q42' ; echo $?
52442
525@end example
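
The statuses listed above can be observed from the shell. The following
is a sketch assuming a recent @value{SSED}; the unknown command @samp{k}
and the path of the missing file are hypothetical, chosen only to
provoke the corresponding errors.

```shell
# Each command is followed by "|| var=$?" so that the script keeps
# running even when the shell is in "set -e" mode.

# Successful run.
ok=0
printf 'x\n' | sed -n 'p' > /dev/null || ok=$?

# 'k' is not a sed command, so the script is rejected.
bad_syntax=0
sed 'k' < /dev/null > /dev/null 2>&1 || bad_syntax=$?

# The input file does not exist (hypothetical path).
missing_file=0
sed -n 'p' /nonexistent/hypothetical.txt > /dev/null 2>&1 || missing_file=$?

# Custom exit code requested with the Q command.
custom=0
echo | sed 'Q42' || custom=$?

echo "ok=$ok bad_syntax=$bad_syntax missing_file=$missing_file custom=$custom"
```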
526
527
528@node sed scripts
529@chapter @command{sed} scripts
530
531
532@menu
533* sed script overview:: @command{sed} script overview
534* sed commands list:: @command{sed} commands summary
535* The "s" Command:: @command{sed}'s Swiss Army Knife
536* Common Commands:: Often used commands
537* Other Commands:: Less frequently used commands
538* Programming Commands:: Commands for @command{sed} gurus
* Extended Commands:: Commands specific to @value{SSED}
540* Multiple commands syntax:: Extension for easier scripting
541@end menu
542
543@node sed script overview
544@section @command{sed} script overview
545
546@cindex @command{sed} script structure
547@cindex Script structure
548
549A @command{sed} program consists of one or more @command{sed} commands,
550passed in by one or more of the
551@option{-e}, @option{-f}, @option{--expression}, and @option{--file}
options, or the first non-option argument if none of these
options is used.
554This document will refer to ``the'' @command{sed} script;
555this is understood to mean the in-order concatenation
556of all of the @var{script}s and @var{script-file}s passed in.
557@xref{Overview}.
558
559
560@cindex @command{sed} commands syntax
561@cindex syntax, @command{sed} commands
562@cindex addresses, syntax
563@cindex syntax, addresses
564@command{sed} commands follow this syntax:
565
566@example
567[addr]@var{X}[options]
568@end example
569
570@var{X} is a single-letter @command{sed} command.
571@c TODO: add @pxref{commands} when there is a command-list section.
572@code{[addr]} is an optional line address. If @code{[addr]} is specified,
573the command @var{X} will be executed only on the matched lines.
574@code{[addr]} can be a single line number, a regular expression,
575or a range of lines (@pxref{sed addresses}).
576Additional @code{[options]} are used for some @command{sed} commands.
577
578@cindex @command{d}, example
579@cindex address range, example
580@cindex example, address range
581The following example deletes lines 30 to 35 in the input.
582@code{30,35} is an address range. @command{d} is the delete command:
583
584@example
585sed '30,35d' input.txt > output.txt
586@end example
587
588@cindex @command{q}, example
589@cindex regular expression, example
590@cindex example, regular expression
591The following example prints all input until a line
starting with the string @samp{foo} is found. If such a line is found,
@command{sed} will terminate with exit status 42.
If no such line is found (and no other error occurs), @command{sed}
will exit with status 0.
596@code{/^foo/} is a regular-expression address.
597@command{q} is the quit command. @code{42} is the command option.
598
599@example
600sed '/^foo/q42' input.txt > output.txt
601@end example
602
603
604@cindex multiple @command{sed} commands
605@cindex @command{sed} commands, multiple
606@cindex newline, command separator
607@cindex semicolons, command separator
608@cindex ;, command separator
609@cindex -e, example
610@cindex -f, example
611Commands within a @var{script} or @var{script-file} can be
612separated by semicolons (@code{;}) or newlines (ASCII 10).
613Multiple scripts can be specified with @option{-e} or @option{-f}
614options.
615
616The following examples are all equivalent. They perform two @command{sed}
617operations: deleting any lines matching the regular expression @code{/^foo/},
618and replacing all occurrences of the string @samp{hello} with @samp{world}:
619
620@example
621sed '/^foo/d ; s/hello/world/g' input.txt > output.txt
622
623sed -e '/^foo/d' -e 's/hello/world/g' input.txt > output.txt
624
625echo '/^foo/d' > script.sed
626echo 's/hello/world/g' >> script.sed
627sed -f script.sed input.txt > output.txt
628
629echo 's/hello/world/g' > script2.sed
630sed -e '/^foo/d' -f script2.sed input.txt > output.txt
631@end example
632
633
634@cindex @command{a}, and semicolons
635@cindex @command{c}, and semicolons
636@cindex @command{i}, and semicolons
637Commands @command{a}, @command{c}, @command{i}, due to their syntax,
638cannot be followed by semicolons working as command separators and
639thus should be terminated
640with newlines or be placed at the end of a @var{script} or @var{script-file}.
641Commands can also be preceded with optional non-significant
642whitespace characters.
643@xref{Multiple commands syntax}.
644
645
646
647@node sed commands list
648@section @command{sed} commands summary
649
650The following commands are supported in @value{SSED}.
Some are standard POSIX commands, while others are @value{SSEDEXT}.
652Details and examples for each command are in the following sections.
653(Mnemonics) are shown in parentheses.
654
655@table @code
656
657@item a\
658@itemx @var{text}
659Append @var{text} after a line.
660
661@item a @var{text}
662Append @var{text} after a line (alternative syntax).
663
664@item b @var{label}
665Branch unconditionally to @var{label}.
666The @var{label} may be omitted, in which case the next cycle is started.
667
668@item c\
669@itemx @var{text}
670Replace (change) lines with @var{text}.
671
672@item c @var{text}
673Replace (change) lines with @var{text} (alternative syntax).
674
675@item d
676Delete the pattern space;
677immediately start next cycle.
678
679@item D
680If pattern space contains newlines, delete text in the pattern
681space up to the first newline, and restart cycle with the resultant
682pattern space, without reading a new line of input.
683
684If pattern space contains no newline, start a normal new cycle as if
685the @code{d} command was issued.
686@c TODO: add a section about D+N and D+n commands
687
688@item e
689Executes the command that is found in pattern space and
690replaces the pattern space with the output; a trailing newline
691is suppressed.
692
693@item e @var{command}
694Executes @var{command} and sends its output to the output stream.
695The command can run across multiple lines, all but the last ending with
a backslash.
697
698@item F
699(filename) Print the file name of the current input file (with a trailing
700newline).
701
702@item g
703Replace the contents of the pattern space with the contents of the hold space.
704
705@item G
706Append a newline to the contents of the pattern space,
707and then append the contents of the hold space to that of the pattern space.
708
709@item h
710(hold) Replace the contents of the hold space with the contents of the
711pattern space.
712
713@item H
714Append a newline to the contents of the hold space,
715and then append the contents of the pattern space to that of the hold space.
716
717@item i\
718@itemx @var{text}
Insert @var{text} before a line.
720
721@item i @var{text}
Insert @var{text} before a line (alternative syntax).
723
724@item l
725Print the pattern space in an unambiguous form.
726
727@item n
728(next) If auto-print is not disabled, print the pattern space,
729then, regardless, replace the pattern space with the next line of input.
730If there is no more input then @command{sed} exits without processing
731any more commands.
732
733@item N
734Add a newline to the pattern space,
735then append the next line of input to the pattern space.
736If there is no more input then @command{sed} exits without processing
737any more commands.
738
739@item p
740Print the pattern space.
741@c useful with @option{-n}
742
743@item P
Print the pattern space, up to the first newline.
745
746@item q@var{[exit-code]}
747(quit) Exit @command{sed} without processing any more commands or input.
748
749@item Q@var{[exit-code]}
750(quit) This command is the same as @code{q}, but will not print the
751contents of pattern space. Like @code{q}, it provides the
752ability to return an exit code to the caller.
753@c useful to quit on a conditional without printing
754
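@item r @var{filename}
Queue the contents of @var{filename} to be read and inserted into the
output stream at the end of the current cycle, or when the next input
line is read.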
757
@item R @var{filename}
759Queue a line of @var{filename} to be read and
760inserted into the output stream at the end of the current cycle,
761or when the next input line is read.
762@c useful to interleave files
763
764@item s@var{/regexp/replacement/[flags]}
765(substitute) Match the regular-expression against the content of the
766pattern space. If found, replace matched string with
767@var{replacement}.
768
769@item t @var{label}
770(test) Branch to @var{label} only if there has been a successful
771@code{s}ubstitution since the last input line was read or conditional
772branch was taken. The @var{label} may be omitted, in which case the
773next cycle is started.
774
775@item T @var{label}
776(test) Branch to @var{label} only if there have been no successful
777@code{s}ubstitutions since the last input line was read or
778conditional branch was taken. The @var{label} may be omitted,
779in which case the next cycle is started.
780
781@item v @var{[version]}
782(version) This command does nothing, but makes @command{sed} fail if
783@value{SSED} extensions are not supported, or if the requested version
784is not available.
785
@item w @var{filename}
787Write the pattern space to @var{filename}.
788
@item W @var{filename}
Write to the given @var{filename} the portion of the pattern space up to
the first newline.
792
793@item x
794Exchange the contents of the hold and pattern spaces.
795
796
@item y/@var{src}/@var{dst}/
Transliterate any characters in the pattern space which match
any of the @var{src} characters with the corresponding character
in @var{dst}.
801
802
803@item z
804(zap) This command empties the content of pattern space.
805
806@item #
807A comment, until the next newline.
808
809
810@item @{ @var{cmd ; cmd ...} @}
811Group several commands together.
812@c useful for multiple commands on same address
813
814@item =
815Print the current input line number (with a trailing newline).
816
817@item : @var{label}
818Specify the location of @var{label} for branch commands (@code{b},
819@code{t}, @code{T}).
820
821@end table
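
As an illustration of how several of the commands summarized above
combine, the following sketch reverses the order of the input lines
using the hold space. The three-line input and the variable name are
assumptions made for illustration.

```shell
# 1!G : on every line except the first, append the hold space (the
#       lines seen so far, already in reverse order) to the pattern space.
# h   : copy the pattern space into the hold space.
# $p  : on the last line, print the accumulated pattern space.
# -n  : suppress automatic printing, so only $p produces output.
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

The output is the three input lines in reverse order: @samp{c},
@samp{b}, @samp{a}.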
822
823
824@node The "s" Command
825@section The @code{s} Command
826
827The @code{s} command (as in substitute) is probably the most important
828in @command{sed} and has a lot of different options. The syntax of
829the @code{s} command is
830@samp{s/@var{regexp}/@var{replacement}/@var{flags}}.
831
832Its basic concept is simple: the @code{s} command attempts to match
833the pattern space against the supplied regular expression @var{regexp};
834if the match is successful, then that portion of the
835pattern space which was matched is replaced with @var{replacement}.
836
For details about @var{regexp} syntax, @pxref{Regexp Addresses,,Regular
Expression Addresses}.
839
840@cindex Backreferences, in regular expressions
841@cindex Parenthesized substrings
842The @var{replacement} can contain @code{\@var{n}} (@var{n} being
843a number from 1 to 9, inclusive) references, which refer to
844the portion of the match which is contained between the @var{n}th
845@code{\(} and its matching @code{\)}.
846Also, the @var{replacement} can contain unescaped @code{&}
847characters which reference the whole matched portion
848of the pattern space.
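
A short sketch of both kinds of reference follows. The input strings and
the variable names are assumptions chosen for illustration.

```shell
# \1 and \2 refer to the first and second \( ... \) groups,
# so this swaps the two words.
swapped=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')

# & stands for the whole matched text, here the run of digits.
wrapped=$(printf 'abc 123\n' | sed 's/[0-9][0-9]*/(&)/')

printf '%s\n' "$swapped" "$wrapped"
```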
849
850@c TODO: xref to backreference section mention @var{\'}.
851
852The @code{/}
853characters may be uniformly replaced by any other single
854character within any given @code{s} command. The @code{/}
855character (or whatever other character is used in its stead)
856can appear in the @var{regexp} or @var{replacement}
857only if it is preceded by a @code{\} character.
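
Choosing a different delimiter is mostly useful when the text itself
contains slashes, as file names do. In the following sketch the path and
the variable names are assumptions chosen for illustration.

```shell
path='/usr/local/bin'

# With / as the delimiter, every literal slash must be escaped.
with_escapes=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# With | as the delimiter, the slashes need no escaping.
with_pipe=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

printf '%s\n' "$with_escapes" "$with_pipe"
```

Both commands produce the same result; the second is simply easier to
read.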
858
859
860
861@cindex @value{SSEDEXT}, case modifiers in @code{s} commands
862Finally, as a @value{SSED} extension, you can include a
863special sequence made of a backslash and one of the letters
864@code{L}, @code{l}, @code{U}, @code{u}, or @code{E}.
865The meaning is as follows:
866
867@table @code
868@item \L
869Turn the replacement
870to lowercase until a @code{\U} or @code{\E} is found,
871
872@item \l
873Turn the
874next character to lowercase,
875
876@item \U
877Turn the replacement to uppercase
878until a @code{\L} or @code{\E} is found,
879
880@item \u
881Turn the next character
882to uppercase,
883
884@item \E
885Stop case conversion started by @code{\L} or @code{\U}.
886@end table
887
888When the @code{g} flag is being used, case conversion does not
889propagate from one occurrence of the regular expression to
890another. For example, when the following command is executed
891with @samp{a-b-} in pattern space:
892@example
893s/\(b\?\)-/x\u\1/g
894@end example
895
896@noindent
897the output is @samp{axxB}. When replacing the first @samp{-},
898the @samp{\u} sequence only affects the empty replacement of
899@samp{\1}. It does not affect the @code{x} character that is
900added to pattern space when replacing @code{b-} with @code{xB}.
901
902On the other hand, @code{\l} and @code{\u} do affect the remainder
903of the replacement text if they are followed by an empty substitution.
904With @samp{a-b-} in pattern space, the following command:
905@example
906s/\(b\?\)-/\u\1x/g
907@end example
908
909@noindent
910will replace @samp{-} with @samp{X} (uppercase) and @samp{b-} with
911@samp{Bx}. If this behavior is undesirable, you can prevent it by
912adding a @samp{\E} sequence---after @samp{\1} in this case.
913
914To include a literal @code{\}, @code{&}, or newline in the final
915replacement, be sure to precede the desired @code{\}, @code{&},
916or newline in the @var{replacement} with a @code{\}.
917
918@findex s command, option flags
919@cindex Substitution of text, options
920The @code{s} command can be followed by zero or more of the
921following @var{flags}:
922
923@table @code
924@item g
925@cindex Global substitution
926@cindex Replacing all text matching regexp in a line
927Apply the replacement to @emph{all} matches to the @var{regexp},
928not just the first.
929
930@item @var{number}
931@cindex Replacing only @var{n}th match of regexp in a line
932Only replace the @var{number}th match of the @var{regexp}.
933
@cindex GNU extensions, @code{g} and @var{number} modifier interaction in @code{s} command
936@cindex Mixing @code{g} and @var{number} modifiers in the @code{s} command
937Note: the @sc{posix} standard does not specify what should happen
938when you mix the @code{g} and @var{number} modifiers,
939and currently there is no widely agreed upon meaning
940across @command{sed} implementations.
941For @value{SSED}, the interaction is defined to be:
942ignore matches before the @var{number}th,
943and then match and replace all matches from
944the @var{number}th on.
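
As an illustration of this rule, the shell sketch below compares
@code{2} on its own with @code{2g}. It assumes @value{SSED}; other
implementations may resolve the combination differently.

```shell
# Assumes GNU sed; the "2g" combination is not defined by POSIX.
line='a-a-a-a'

# Replace only the second match.
second_only=$(printf '%s\n' "$line" | sed 's/a/X/2')

# Skip the first match, then replace the second and every later one.
second_onward=$(printf '%s\n' "$line" | sed 's/a/X/2g')

printf '%s\n' "$second_only" "$second_onward"
```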
945
946@item p
947@cindex Text, printing after substitution
948If the substitution was made, then print the new pattern space.
949
950Note: when both the @code{p} and @code{e} options are specified,
951the relative ordering of the two produces very different results.
952In general, @code{ep} (evaluate then print) is what you want,
953but operating the other way round can be useful for debugging.
For this reason, the current version of @value{SSED} gives special
treatment to @code{p} options placed both before and after
@code{e}: the pattern space is printed before and after evaluation,
whereas flags for the @code{s} command normally take
effect just once. This behavior, although documented, might
change in future versions.
960
961@item w @var{filename}
962@cindex Text, writing to a file after substitution
963@cindex @value{SSEDEXT}, @file{/dev/stdout} file
964@cindex @value{SSEDEXT}, @file{/dev/stderr} file
965If the substitution was made, then write out the result to the named file.
966As a @value{SSED} extension, two special values of @var{filename} are
967supported: @file{/dev/stderr}, which writes the result to the standard
968error, and @file{/dev/stdout}, which writes to the standard
969output.@footnote{This is equivalent to @code{p} unless the @option{-i}
970option is being used.}
971
972@item e
973@cindex Evaluate Bourne-shell commands, after substitution
974@cindex Subprocesses
975@cindex @value{SSEDEXT}, evaluating Bourne-shell commands
976@cindex @value{SSEDEXT}, subprocesses
This flag allows one to pipe input from a shell command
978into pattern space. If a substitution was made, the command
979that is found in pattern space is executed and pattern space
980is replaced with its output. A trailing newline is suppressed;
981results are undefined if the command to be executed contains
982a @sc{nul} character. This is a @value{SSED} extension.
983
984@item I
985@itemx i
986@cindex GNU extensions, @code{I} modifier
987@cindex Case-insensitive matching
988The @code{I} modifier to regular-expression matching is a GNU
989extension which makes @command{sed} match @var{regexp} in a
990case-insensitive manner.
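
A small shell sketch of the difference, assuming @value{SSED} (the
sample text and variable names are made up for the illustration):

```shell
# Assumes GNU sed; the I flag is not available in POSIX sed.
text='Hello HELLO hello'

# Without I, only the exact lowercase spelling matches.
sensitive=$(printf '%s\n' "$text" | sed 's/hello/X/g')

# With I, every spelling of the word matches.
insensitive=$(printf '%s\n' "$text" | sed 's/hello/X/Ig')

printf '%s\n' "$sensitive" "$insensitive"
```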
991
992@item M
993@itemx m
994@cindex @value{SSEDEXT}, @code{M} modifier
995The @code{M} modifier to regular-expression matching is a @value{SSED}
996extension which directs @value{SSED} to match the regular expression
997in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to
998match respectively (in addition to the normal behavior) the empty string
999after a newline, and the empty string before a newline. There are
1000special character sequences
1001@ifclear PERL
1002(@code{\`} and @code{\'})
1003@end ifclear
1004which always match the beginning or the end of the buffer.
1005In addition,
1006the period character does not match a new-line character in
1007multi-line mode.
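
The effect on @code{^} can be sketched as follows. The example assumes
@value{SSED} and uses the @code{N} command only to get two input lines
into pattern space at once; the variable names are illustrative.

```shell
# Assumes GNU sed. N joins both input lines, so pattern space is "ab\ncd".

# Without M, ^ matches only at the very start of pattern space.
without_m=$(printf 'ab\ncd\n' | sed 'N;s/^./X/g')

# With M, ^ also matches right after the embedded newline.
with_m=$(printf 'ab\ncd\n' | sed 'N;s/^./X/Mg')

printf '%s\n' "$without_m" "$with_m"
```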
1008
1009
1010@end table
1011
1012@node Common Commands
1013@section Often-Used Commands
1014
1015If you use @command{sed} at all, you will quite likely want to know
1016these commands.
1017
1018@table @code
1019@item #
1020[No addresses allowed.]
1021
1022@findex # (comments)
1023@cindex Comments, in scripts
1024The @code{#} character begins a comment;
1025the comment continues until the next newline.
1026
1027@cindex Portability, comments
1028If you are concerned about portability, be aware that
1029some implementations of @command{sed} (which are not @sc{posix}
1030conforming) may only support a single one-line comment,
1031and then only when the very first character of the script is a @code{#}.
1032
1033@findex -n, forcing from within a script
1034@cindex Caveat --- #n on first line
1035Warning: if the first two characters of the @command{sed} script
1036are @code{#n}, then the @option{-n} (no-autoprint) option is forced.
1037If you want to put a comment in the first line of your script
1038and that comment begins with the letter @samp{n}
1039and you do not want this behavior,
1040then be sure to either use a capital @samp{N},
1041or place at least one space before the @samp{n}.
1042
1043@item q [@var{exit-code}]
1044@findex q (quit) command
1045@cindex @value{SSEDEXT}, returning an exit code
1046@cindex Quitting
1047Exit @command{sed} without processing any more commands or input.
1048
1049Example: stop after printing the second line:
1050@example
1051$ seq 3 | sed 2q
10521
10532
1054@end example
1055
1056This command accepts only one address.
Note that the current pattern space is printed if auto-print is
not disabled with the @option{-n} option. The ability to return
1059an exit code from the @command{sed} script is a @value{SSED} extension.
1060
1061See also the @value{SSED} extension @code{Q} command which quits silently
1062without printing the current pattern space.
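
The exit-code extension can be tried with a sketch like the following,
which assumes @value{SSED}; the exit code @samp{5} is an arbitrary value
chosen for the illustration.

```shell
# Assumes GNU sed; an exit code after q is a GNU extension.
status=0

# Quit on line 2 with exit code 5; lines 1 and 2 are still printed.
printed=$(printf '1\n2\n3\n' | sed '2q5') || status=$?

printf '%s\n' "$printed"
printf 'exit status: %s\n' "$status"
```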
1063
1064@item d
1065@findex d (delete) command
1066@cindex Text, deleting
1067Delete the pattern space;
1068immediately start next cycle.
1069
1070Example: delete the second input line:
1071@example
1072$ seq 3 | sed 2d
10731
10743
1075@end example
1076
1077@item p
1078@findex p (print) command
1079@cindex Text, printing
1080Print out the pattern space (to the standard output).
1081This command is usually only used in conjunction with the @option{-n}
1082command-line option.
1083
1084Example: print only the second input line:
1085@example
1086$ seq 3 | sed -n 2p
10872
1088@end example
1089
1090@item n
1091@findex n (next-line) command
1092@cindex Next input line, replace pattern space with
1093@cindex Read next input line
1094If auto-print is not disabled, print the pattern space,
1095then, regardless, replace the pattern space with the next line of input.
1096If there is no more input then @command{sed} exits without processing
1097any more commands.
1098
1099This command is useful to skip lines (e.g. process every Nth line).
1100
1101Example: perform substitution on every 3rd line (i.e. two @code{n} commands
1102skip two lines):
1103@codequoteundirected on
1104@codequotebacktick on
1105@example
1106$ seq 6 | sed 'n;n;s/./x/'
11071
11082
1109x
11104
11115
1112x
1113@end example
1114
1115@value{SSED} provides an extension address syntax of @var{first}~@var{step}
1116to achieve the same result:
1117
1118@example
1119$ seq 6 | sed '0~3s/./x/'
11201
11212
1122x
11234
11245
1125x
1126@end example
1127
1128@codequotebacktick off
1129@codequoteundirected off
1130
1131
1132@item @{ @var{commands} @}
1133@findex @{@} command grouping
1134@cindex Grouping commands
1135@cindex Command groups
1136A group of commands may be enclosed between
1137@code{@{} and @code{@}} characters.
1138This is particularly useful when you want a group of commands
1139to be triggered by a single address (or address-range) match.
1140
1141Example: perform substitution then print the second input line:
1142@codequoteundirected on
1143@codequotebacktick on
1144@example
1145$ seq 3 | sed -n '2@{s/2/X/ ; p@}'
1146X
1147@end example
1148@codequoteundirected off
1149@codequotebacktick off
1150
1151@end table
1152
1153
1154@node Other Commands
1155@section Less Frequently-Used Commands
1156
1157Though perhaps less frequently used than those in the previous
1158section, some very small yet useful @command{sed} scripts can be built with
1159these commands.
1160
1161@table @code
1162@item y/@var{source-chars}/@var{dest-chars}/
1163@findex y (transliterate) command
1164@cindex Transliteration
1165Transliterate any characters in the pattern space which match
1166any of the @var{source-chars} with the corresponding character
1167in @var{dest-chars}.
1168
1169Example: transliterate @samp{a-j} into @samp{0-9}:
1170@codequoteundirected on
1171@codequotebacktick on
1172@example
1173$ echo hello world | sed 'y/abcdefghij/0123456789/'
117474llo worl3
1175@end example
1176@codequoteundirected off
1177@codequotebacktick off
1178
1179(The @code{/} characters may be uniformly replaced by
1180any other single character within any given @code{y} command.)
1181
1182Instances of the @code{/} (or whatever other character is used in its stead),
1183@code{\}, or newlines can appear in the @var{source-chars} or @var{dest-chars}
lists, provided that each instance is escaped by a @code{\}.
1185The @var{source-chars} and @var{dest-chars} lists @emph{must}
1186contain the same number of characters (after de-escaping).
1187
1188See the @command{tr} command from GNU coreutils for similar functionality.
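
As a short sketch of the delimiter and escaping rules above, both
commands below turn every @samp{/} into @samp{:}. The sample path is
made up for the illustration.

```shell
path='usr/local/bin'

# Keep / as the delimiter and escape the / that is to be transliterated.
escaped=$(printf '%s\n' "$path" | sed 'y/\//:/')

# Use a comma as the delimiter instead, so no escaping is needed.
alt_delim=$(printf '%s\n' "$path" | sed 'y,/,:,')

printf '%s\n' "$escaped" "$alt_delim"
```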
1189
1190@item a @var{text}
Append @var{text} after a line. This is a GNU extension
to the standard @code{a} command; see below for details.
1193
1194Example: Add @samp{hello} after the second line:
1195@codequoteundirected on
1196@codequotebacktick on
1197@example
1198$ seq 3 | sed '2a hello'
11991
12002
1201hello
12023
1203@end example
1204@codequoteundirected off
1205@codequotebacktick off
1206
1207Leading whitespace after the @code{a} command is ignored.
1208The text to add is read until the end of the line.
1209
1210
1211@item a\
1212@itemx @var{text}
1213@findex a (append text lines) command
1214@cindex Appending text after a line
1215@cindex Text, appending
Append @var{text} after a line.
1217
1218Example: Add @samp{hello} after the second line
1219(@print{} indicates printed output lines):
1220@codequoteundirected on
1221@codequotebacktick on
1222@example
1223$ seq 3 | sed '2a\
1224hello'
1225@print{}1
1226@print{}2
1227@print{}hello
1228@print{}3
1229@end example
1230@codequoteundirected off
1231@codequotebacktick off
1232
1233The @code{a} command queues the lines of text which follow this command
1234(each but the last ending with a @code{\},
1235which are removed from the output)
1236to be output at the end of the current cycle,
1237or when the next input line is read.
1238
1239@cindex @value{SSEDEXT}, two addresses supported by most commands
1240As a GNU extension, this command accepts two addresses.
1241
1242Escape sequences in @var{text} are processed, so you should
1243use @code{\\} in @var{text} to print a single backslash.
1244
Script commands resume after the first line of @var{text} that does not
end with a backslash (@code{\}), which is @samp{world} in the following example:
1247@codequoteundirected on
1248@codequotebacktick on
1249@example
1250$ seq 3 | sed '2a\
1251hello\
1252world
12533s/./X/'
1254@print{}1
1255@print{}2
1256@print{}hello
1257@print{}world
1258@print{}X
1259@end example
1260@codequoteundirected off
1261@codequotebacktick off
1262
1263As a GNU extension, the @code{a} command and @var{text} can be
1264separated into two @code{-e} parameters, enabling easier scripting:
1265@codequoteundirected on
1266@codequotebacktick on
1267@example
1268$ seq 3 | sed -e '2a\' -e hello
12691
12702
1271hello
12723
1273
1274$ sed -e '2a\' -e "$VAR"
1275@end example
1276@codequoteundirected off
1277@codequotebacktick off
1278
1279@item i @var{text}
Insert @var{text} before a line. This is a GNU extension
to the standard @code{i} command; see below for details.
1282
1283Example: Insert @samp{hello} before the second line:
1284@codequoteundirected on
1285@codequotebacktick on
1286@example
1287$ seq 3 | sed '2i hello'
12881
1289hello
12902
12913
1292@end example
1293@codequoteundirected off
1294@codequotebacktick off
1295
1296Leading whitespace after the @code{i} command is ignored.
1297The text to add is read until the end of the line.
1298
1299@anchor{insert command}
1300@item i\
1301@itemx @var{text}
1302@findex i (insert text lines) command
1303@cindex Inserting text before a line
1304@cindex Text, insertion
1305Immediately output the lines of text which follow this command.
1306
1307Example: Insert @samp{hello} before the second line
1308(@print{} indicates printed output lines):
1309@codequoteundirected on
1310@codequotebacktick on
1311@example
1312$ seq 3 | sed '2i\
1313hello'
1314@print{}1
1315@print{}hello
1316@print{}2
1317@print{}3
1318@end example
1319@codequoteundirected off
1320@codequotebacktick off
1321
1322@cindex @value{SSEDEXT}, two addresses supported by most commands
1323As a GNU extension, this command accepts two addresses.
1324
1325Escape sequences in @var{text} are processed, so you should
1326use @code{\\} in @var{text} to print a single backslash.
1327
Script commands resume after the first line of @var{text} that does not
end with a backslash (@code{\}), which is @samp{world} in the following example:
1330@codequoteundirected on
1331@codequotebacktick on
1332@example
1333$ seq 3 | sed '2i\
1334hello\
1335world
1336s/./X/'
1337@print{}X
1338@print{}hello
1339@print{}world
1340@print{}X
1341@print{}X
1342@end example
1343@codequoteundirected off
1344@codequotebacktick off
1345
1346As a GNU extension, the @code{i} command and @var{text} can be
1347separated into two @code{-e} parameters, enabling easier scripting:
1348@codequoteundirected on
1349@codequotebacktick on
1350@example
1351$ seq 3 | sed -e '2i\' -e hello
13521
1353hello
13542
13553
1356
1357$ sed -e '2i\' -e "$VAR"
1358@end example
1359@codequoteundirected off
1360@codequotebacktick off
1361
1362@item c @var{text}
Replace the line(s) with @var{text}. This is a GNU extension
to the standard @code{c} command; see below for details.
1365
1366Example: Replace the 2nd to 9th lines with the word @samp{hello}:
1367@codequoteundirected on
1368@codequotebacktick on
1369@example
1370$ seq 10 | sed '2,9c hello'
13711
1372hello
137310
1374@end example
1375@codequoteundirected off
1376@codequotebacktick off
1377
1378Leading whitespace after the @code{c} command is ignored.
1379The text to add is read until the end of the line.
1380
1381@item c\
1382@itemx @var{text}
1383@findex c (change to text lines) command
1384@cindex Replacing selected lines with other text
1385Delete the lines matching the address or address-range,
1386and output the lines of text which follow this command.
1387
1388Example: Replace 2nd to 4th lines with the words @samp{hello} and
1389@samp{world} (@print{} indicates printed output lines):
1390@codequoteundirected on
1391@codequotebacktick on
1392@example
1393$ seq 5 | sed '2,4c\
1394hello\
1395world'
1396@print{}1
1397@print{}hello
1398@print{}world
1399@print{}5
1400@end example
1401@codequoteundirected off
1402@codequotebacktick off
1403
1404If no addresses are given, each line is replaced.
1405
1406A new cycle is started after this command is done,
1407since the pattern space will have been deleted.
1408In the following example, the @code{c} starts a
1409new cycle and the substitution command is not performed
1410on the replaced text:
1411
1412@codequoteundirected on
1413@codequotebacktick on
1414@example
1415$ seq 3 | sed '2c\
1416hello
1417s/./X/'
1418@print{}X
1419@print{}hello
1420@print{}X
1421@end example
1422@codequoteundirected off
1423@codequotebacktick off
1424
1425As a GNU extension, the @code{c} command and @var{text} can be
1426separated into two @code{-e} parameters, enabling easier scripting:
1427@codequoteundirected on
1428@codequotebacktick on
1429@example
1430$ seq 3 | sed -e '2c\' -e hello
14311
1432hello
14333
1434
1435$ sed -e '2c\' -e "$VAR"
1436@end example
1437@codequoteundirected off
1438@codequotebacktick off
1439
1440
1441@item =
1442@findex = (print line number) command
1443@cindex Printing line number
1444@cindex Line number, printing
1445Print out the current input line number (with a trailing newline).
1446
1447@codequoteundirected on
1448@codequotebacktick on
1449@example
1450$ printf '%s\n' aaa bbb ccc | sed =
14511
1452aaa
14532
1454bbb
14553
1456ccc
1457@end example
1458@codequoteundirected off
1459@codequotebacktick off
1460
1461@cindex @value{SSEDEXT}, two addresses supported by most commands
1462As a GNU extension, this command accepts two addresses.
1463
1464
1465
1466
1467@item l @var{n}
1468@findex l (list unambiguously) command
1469@cindex List pattern space
1470@cindex Printing text unambiguously
1471@cindex Line length, setting
1472@cindex @value{SSEDEXT}, setting line length
1473Print the pattern space in an unambiguous form:
1474non-printable characters (and the @code{\} character)
1475are printed in C-style escaped form; long lines are split,
1476with a trailing @code{\} character to indicate the split;
1477the end of each line is marked with a @code{$}.
1478
1479@var{n} specifies the desired line-wrap length;
1480a length of 0 (zero) means to never wrap long lines. If omitted,
1481the default as specified on the command line is used. The @var{n}
1482parameter is a @value{SSED} extension.
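
For instance, a line containing a tab and a backslash could be listed
as in this sketch (the input string is invented for the illustration):

```shell
# The input line is: a, TAB, b, backslash, c.
listed=$(printf 'a\tb\\c\n' | sed -n l)

# The tab appears as \t, the backslash as \\, and $ marks the end of the line.
printf '%s\n' "$listed"
```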
1483
1484@item r @var{filename}
1485
1486@findex r (read file) command
1487@cindex Read text from a file
1488Reads file @var{filename}. Example:
1489
1490@codequoteundirected on
1491@codequotebacktick on
1492@example
1493$ seq 3 | sed '2r/etc/hostname'
14941
14952
1496fencepost.gnu.org
14973
1498@end example
1499@codequoteundirected off
1500@codequotebacktick off
1501
1502@cindex @value{SSEDEXT}, @file{/dev/stdin} file
1503Queue the contents of @var{filename} to be read and
1504inserted into the output stream at the end of the current cycle,
1505or when the next input line is read.
1506Note that if @var{filename} cannot be read, it is treated as
1507if it were an empty file, without any error indication.
1508
1509As a @value{SSED} extension, the special value @file{/dev/stdin}
1510is supported for the file name, which reads the contents of the
1511standard input.
1512
1513@cindex @value{SSEDEXT}, two addresses supported by most commands
1514As a GNU extension, this command accepts two addresses. The
1515file will then be reread and inserted on each of the addressed lines.
1516
As a @value{SSED} extension, the @code{r} command accepts a zero address,
inserting a file @emph{before} the first line of the input
(@pxref{Adding a header to multiple files}).
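
The queuing behavior and the silent handling of unreadable files can be
tried with a sketch such as the following. The temporary file and the
@file{/nonexistent} path are placeholders chosen for the illustration.

```shell
# Hypothetical scratch file standing in for the file to be read.
snippet=$(mktemp)
printf 'inserted\n' > "$snippet"

# The file contents are emitted after line 2, at the end of that cycle.
merged=$(printf '1\n2\n3\n' | sed "2r $snippet")

# An unreadable file is treated as empty and no error is reported.
unchanged=$(printf '1\n2\n' | sed 'r /nonexistent/sed-demo-file')

rm -f "$snippet"
printf '%s\n' "$merged" "$unchanged"
```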
1520
1521@item w @var{filename}
1522@findex w (write file) command
1523@cindex Write to a file
1524@cindex @value{SSEDEXT}, @file{/dev/stdout} file
1525@cindex @value{SSEDEXT}, @file{/dev/stderr} file
1526Write the pattern space to @var{filename}.
1527As a @value{SSED} extension, two special values of @var{filename} are
1528supported: @file{/dev/stderr}, which writes the result to the standard
1529error, and @file{/dev/stdout}, which writes to the standard
1530output.@footnote{This is equivalent to @code{p} unless the @option{-i}
1531option is being used.}
1532
1533The file will be created (or truncated) before the first input line is
1534read; all @code{w} commands (including instances of the @code{w} flag
1535on successful @code{s} commands) which refer to the same @var{filename}
1536are output without closing and reopening the file.
1537
1538@item D
1539@findex D (delete first line) command
1540@cindex Delete first line from pattern space
1541If pattern space contains no newline, start a normal new cycle as if
1542the @code{d} command was issued. Otherwise, delete text in the pattern
1543space up to the first newline, and restart cycle with the resultant
1544pattern space, without reading a new line of input.
1545
1546@item N
1547@findex N (append Next line) command
1548@cindex Next input line, append to pattern space
1549@cindex Append next input line to pattern space
1550Add a newline to the pattern space,
1551then append the next line of input to the pattern space.
1552If there is no more input then @command{sed} exits without processing
1553any more commands.
1554
When @option{-z} is used, a zero byte (the ASCII @sc{nul} character) is
added between the lines (instead of a newline).
1557
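By default, when there is no next input line, @value{SSED} prints the
pattern space before exiting (unless @option{-n} was given), whereas most
other versions of @command{sed} exit without printing anything.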
1559This is a GNU extension which can be disabled with @option{--posix}.
1560@xref{N_command_last_line,,N command on the last line}.
1561
1562
1563@item P
1564@findex P (print first line) command
1565@cindex Print first line from pattern space
1566Print out the portion of the pattern space up to the first newline.
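
The @code{D}, @code{N} and @code{P} commands are often combined to work
on a two-line window. As one possible illustration (a sketch, not the
only way to do it), the following removes adjacent duplicate lines:

```shell
# $!N   append the next line, except on the last input line
# /.../!P  print the first line of the window unless both lines are equal
# D     drop the first line and restart the cycle with the remainder
deduped=$(printf 'a\na\nb\nb\nb\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n' "$deduped"
```

Because @code{D} restarts the cycle without reading new input, the
second line of each window becomes the first line of the next one.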
1567
1568@item h
1569@findex h (hold) command
1570@cindex Copy pattern space into hold space
1571@cindex Replace hold space with copy of pattern space
1572@cindex Hold space, copying pattern space into
1573Replace the contents of the hold space with the contents of the pattern space.
1574
1575@item H
1576@findex H (append Hold) command
1577@cindex Append pattern space to hold space
1578@cindex Hold space, appending from pattern space
1579Append a newline to the contents of the hold space,
1580and then append the contents of the pattern space to that of the hold space.
1581
1582@item g
1583@findex g (get) command
1584@cindex Copy hold space into pattern space
1585@cindex Replace pattern space with copy of hold space
1586@cindex Hold space, copy into pattern space
1587Replace the contents of the pattern space with the contents of the hold space.
1588
1589@item G
1590@findex G (appending Get) command
1591@cindex Append hold space to pattern space
1592@cindex Hold space, appending to pattern space
1593Append a newline to the contents of the pattern space,
1594and then append the contents of the hold space to that of the pattern space.
1595
1596@item x
1597@findex x (eXchange) command
1598@cindex Exchange hold space with pattern space
1599@cindex Hold space, exchange with pattern space
1600Exchange the contents of the hold and pattern spaces.
1601
1602@end table
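
The hold-space commands are usually combined. As a small sketch, the
following reverses the order of the input lines by accumulating them in
hold space:

```shell
# 1!G  on every line but the first, append hold space to pattern space
# h    copy the accumulated (reversed) text back into hold space
# $p   on the last line, print the result (-n suppresses auto-print)
reversed=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```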
1603
1604
1605@node Programming Commands
1606@section Commands for @command{sed} gurus
1607
1608In most cases, use of these commands indicates that you are
1609probably better off programming in something like @command{awk}
1610or Perl. But occasionally one is committed to sticking
1611with @command{sed}, and these commands can enable one to write
1612quite convoluted scripts.
1613
1614@cindex Flow of control in scripts
1615@table @code
1616@item : @var{label}
1617[No addresses allowed.]
1618
1619@findex : (label) command
1620@cindex Labels, in scripts
1621Specify the location of @var{label} for branch commands.
1622In all other respects, a no-op.
1623
1624@item b @var{label}
1625@findex b (branch) command
1626@cindex Branch to a label, unconditionally
1627@cindex Goto, in scripts
1628Unconditionally branch to @var{label}.
1629The @var{label} may be omitted, in which case the next cycle is started.
1630
1631@item t @var{label}
1632@findex t (test and branch if successful) command
1633@cindex Branch to a label, if @code{s///} succeeded
1634@cindex Conditional branch
1635Branch to @var{label} only if there has been a successful @code{s}ubstitution
1636since the last input line was read or conditional branch was taken.
1637The @var{label} may be omitted, in which case the next cycle is started.
1638
1639@end table
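
A common use of @code{t} is a loop that repeats a substitution until it
no longer matches. The sketch below squeezes runs of spaces into a
single space; the label name @samp{again} is arbitrary, and separate
@option{-e} options are used so that the label is not followed by a
semicolon.

```shell
# :again     mark the top of the loop
# s/  / /    replace one pair of spaces with a single space
# t again    if that substitution succeeded, go around again
squeezed=$(printf 'a    b  c\n' | sed -e ':again' -e 's/  / /' -e 't again')

printf '%s\n' "$squeezed"
```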
1640
1641@node Extended Commands
1642@section Commands Specific to @value{SSED}
1643
1644These commands are specific to @value{SSED}, so you
1645must use them with care and only when you are sure that
1646hindering portability is not evil. They allow you to check
1647for @value{SSED} extensions or to do tasks that are required
1648quite often, yet are unsupported by standard @command{sed}s.
1649
1650@table @code
1651@item e [@var{command}]
1652@findex e (evaluate) command
1653@cindex Evaluate Bourne-shell commands
1654@cindex Subprocesses
1655@cindex @value{SSEDEXT}, evaluating Bourne-shell commands
1656@cindex @value{SSEDEXT}, subprocesses
1657This command allows one to pipe input from a shell command
1658into pattern space. Without parameters, the @code{e} command
1659executes the command that is found in pattern space and
1660replaces the pattern space with the output; a trailing newline
1661is suppressed.
1662
1663If a parameter is specified, instead, the @code{e} command
1664interprets it as a command and sends its output to the output stream.
1665The command can run across multiple lines, all but the last ending with
a backslash.
1667
1668In both cases, the results are undefined if the command to be
1669executed contains a @sc{nul} character.
1670
1671Note that, unlike the @code{r} command, the output of the command will
1672be printed immediately; the @code{r} command instead delays the output
1673to the end of the current cycle.
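
Both forms can be sketched as follows, assuming @value{SSED} and a
shell that provides @command{echo}; the commands being run are
placeholders chosen for the illustration.

```shell
# Assumes GNU sed; the e command is a GNU extension.

# No parameter: the pattern space ("echo hi") is run as a command and
# replaced by that command's output.
from_pattern=$(printf 'echo hi\n' | sed e)

# With a parameter: the command's output is written immediately, ahead of
# the current line.
from_argument=$(printf 'x\n' | sed '1e echo header')

printf '%s\n' "$from_pattern" "$from_argument"
```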
1674
1675@item F
1676@findex F (File name) command
1677@cindex Printing file name
1678@cindex File name, printing
1679Print out the file name of the current input file (with a trailing
1680newline).
1681
1682@item Q [@var{exit-code}]
1683This command accepts only one address.
1684
1685@findex Q (silent Quit) command
1686@cindex @value{SSEDEXT}, quitting silently
1687@cindex @value{SSEDEXT}, returning an exit code
1688@cindex Quitting
1689This command is the same as @code{q}, but will not print the
1690contents of pattern space. Like @code{q}, it provides the
1691ability to return an exit code to the caller.
1692
1693This command can be useful because the only alternative ways
1694to accomplish this apparently trivial function are to use
1695the @option{-n} option (which can unnecessarily complicate
your script) or to resort to the following snippet, which
1697wastes time by reading the whole file without any visible effect:
1698
1699@example
1700:eat
1701$d @i{@r{Quit silently on the last line}}
1702N @i{@r{Read another line, silently}}
1703g @i{@r{Overwrite pattern space each time to save memory}}
1704b eat
1705@end example
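
By comparison, a sketch using @code{Q} directly (assuming
@value{SSED}; the exit code @samp{3} is arbitrary):

```shell
# Assumes GNU sed; Q is a GNU extension.
status=0

# Quit on line 2 without printing it, returning exit code 3.
printed=$(printf '1\n2\n3\n' | sed '2Q3') || status=$?

printf '%s\n' "$printed"
printf 'exit status: %s\n' "$status"
```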
1706
1707@item R @var{filename}
1708@findex R (read line) command
1709@cindex Read text from a file
1710@cindex @value{SSEDEXT}, reading a file a line at a time
1711@cindex @value{SSEDEXT}, @code{R} command
1712@cindex @value{SSEDEXT}, @file{/dev/stdin} file
1713Queue a line of @var{filename} to be read and
1714inserted into the output stream at the end of the current cycle,
1715or when the next input line is read.
1716Note that if @var{filename} cannot be read, or if its end is
1717reached, no line is appended, without any error indication.
1718
1719As with the @code{r} command, the special value @file{/dev/stdin}
1720is supported for the file name, which reads a line from the
1721standard input.
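
Since each execution of @code{R} consumes one line of the file, it can
interleave two inputs. A minimal sketch, assuming @value{SSED} and a
temporary file created only for the illustration:

```shell
# Hypothetical scratch file holding the lines to interleave.
numbers=$(mktemp)
printf '1\n2\n' > "$numbers"

# One line of the file is queued after each input line; once the file is
# exhausted, nothing more is appended.
interleaved=$(printf 'a\nb\nc\n' | sed "R $numbers")

rm -f "$numbers"
printf '%s\n' "$interleaved"
```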
1722
1723@item T @var{label}
1724@findex T (test and branch if failed) command
1725@cindex @value{SSEDEXT}, branch if @code{s///} failed
1726@cindex Branch to a label, if @code{s///} failed
1727@cindex Conditional branch
1728Branch to @var{label} only if there have been no successful
1729@code{s}ubstitutions since the last input line was read or
1730conditional branch was taken. The @var{label} may be omitted,
1731in which case the next cycle is started.
1732
1733@item v @var{version}
1734@findex v (version) command
1735@cindex @value{SSEDEXT}, checking for their presence
1736@cindex Requiring @value{SSED}
1737This command does nothing, but makes @command{sed} fail if
1738@value{SSED} extensions are not supported, simply because other
1739versions of @command{sed} do not implement it. In addition, you
1740can specify the version of @command{sed} that your script
1741requires, such as @code{4.0.5}. The default is @code{4.0}
1742because that is the first version that implemented this command.
1743
1744This command enables all @value{SSEDEXT} even if
1745@env{POSIXLY_CORRECT} is set in the environment.
1746
1747@item W @var{filename}
1748@findex W (write first line) command
1749@cindex Write first line to a file
1750@cindex @value{SSEDEXT}, writing first line to a file
1751Write to the given filename the portion of the pattern space up to
1752the first newline. Everything said under the @code{w} command about
1753file handling holds here too.
1754
1755@item z
1756@findex z (Zap) command
1757@cindex @value{SSEDEXT}, emptying pattern space
1758@cindex Emptying pattern space
1759This command empties the content of pattern space. It is
1760usually the same as @samp{s/.*//}, but is more efficient
1761and works in the presence of invalid multibyte sequences
1762in the input stream. @sc{posix} mandates that such sequences
1763are @emph{not} matched by @samp{.}, so that there is no portable
1764way to clear @command{sed}'s buffers in the middle of the
1765script in most multibyte locales (including UTF-8 locales).
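
A minimal sketch, assuming @value{SSED}: emptying pattern space on the
second line leaves an empty line in the output.

```shell
# Assumes GNU sed; z is a GNU extension.
emptied=$(printf 'a\nb\nc\n' | sed 2z)

# Line 2 is still printed by auto-print, but with no content.
printf '%s\n' "$emptied"
```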
1766@end table
1767
1768
1769@node Multiple commands syntax
1770@section Multiple commands syntax
1771
1772@c POSIX says:
1773@c Editing commands other than {...}, a, b, c, i, r, t, w, :, and #
1774@c can be followed by a <semicolon>, optional <blank> characters, and
1775@c another editing command. However, when an s editing command is used
1776@c with the w flag, following it with another command in this manner
1777@c produces undefined results.
1778
1779There are several methods to specify multiple commands in a @command{sed}
1780program.
1781
Using newlines is most natural when running a @command{sed} script from a file
1783(using the @option{-f} option).
1784
1785On the command line, all @command{sed} commands may be separated by newlines.
1786Alternatively, you may specify each command as an argument to an @option{-e}
1787option:
1788
1789@codequoteundirected on
1790@codequotebacktick on
1791@example
1792@group
1793$ seq 6 | sed '1d
17943d
17955d'
17962
17974
17986
1799
1800$ seq 6 | sed -e 1d -e 3d -e 5d
18012
18024
18036
1804@end group
1805@end example
1806@codequoteundirected off
1807@codequotebacktick off
1808
1809A semicolon (@samp{;}) may be used to separate most simple commands:
1810
1811@codequoteundirected on
1812@codequotebacktick on
1813@example
1814@group
1815$ seq 6 | sed '1d;3d;5d'
18162
18174
18186
1819@end group
1820@end example
1821@codequoteundirected off
1822@codequotebacktick off
1823
1824The @code{@{},@code{@}},@code{b},@code{t},@code{T},@code{:} commands can
1825be separated with a semicolon (this is a non-portable @value{SSED} extension).
1826
1827@codequoteundirected on
1828@codequotebacktick on
1829@example
1830@group
1831$ seq 4 | sed '@{1d;3d@}'
18322
18334
1834
1835$ seq 6 | sed '@{1d;3d@};5d'
18362
18374
18386
1839@end group
1840@end example
1841@codequoteundirected off
1842@codequotebacktick off
1843
Labels used in @code{b},@code{t},@code{T},@code{:} commands are read
until a semicolon or newline. Leading and trailing whitespace is ignored. In
the examples below the label is @samp{x}. The first example works
with @value{SSED}. The second is a portable equivalent. For more
information about branching and labels, see @ref{Branching and flow
control}.
1850
1851@codequoteundirected on
1852@codequotebacktick on
1853@example
1854@group
1855$ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d'
18561
1857=2
1858
1859$ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d'
18601
1861=2
1862@end group
1863@end example
1864@codequoteundirected off
1865@codequotebacktick off
1866
1867
1868
1869@subsection Commands Requiring a newline
1870
1871The following commands cannot be separated by a semicolon and
1872require a newline:
1873
1874@table @asis
1875
1876@item @code{a},@code{c},@code{i} (append/change/insert)
1877
1878All characters following @code{a},@code{c},@code{i} commands are taken
1879as the text to append/change/insert. Using a semicolon leads to
1880undesirable results:
1881
1882@codequoteundirected on
1883@codequotebacktick on
1884@example
1885@group
1886$ seq 2 | sed '1aHello ; 2d'
18871
1888Hello ; 2d
18892
1890@end group
1891@end example
1892@codequoteundirected off
1893@codequotebacktick off
1894
1895Separate the commands using @option{-e} or a newline:
1896
1897@codequoteundirected on
1898@codequotebacktick on
1899@example
1900@group
1901$ seq 2 | sed -e 1aHello -e 2d
19021
1903Hello
1904
1905$ seq 2 | sed '1aHello
19062d'
19071
1908Hello
1909@end group
1910@end example
1911@codequoteundirected off
1912@codequotebacktick off
1913
1914Note that specifying the text to add (@samp{Hello}) immediately
1915after @code{a},@code{c},@code{i} is itself a @value{SSED} extension.
1916A portable, POSIX-compliant alternative is:
1917
1918@codequoteundirected on
1919@codequotebacktick on
1920@example
1921@group
$ seq 2 | sed '1a\
Hello
2d'
1
Hello
1927@end group
1928@end example
1929@codequoteundirected off
1930@codequotebacktick off
1931
1932@item @code{#} (comment)
1933
1934All characters following @samp{#} until the next newline are ignored.
1935
1936@codequoteundirected on
1937@codequotebacktick on
1938@example
1939@group
$ seq 3 | sed '# this is a comment ; 2d'
1
2
3


$ seq 3 | sed '# this is a comment
2d'
1
3
1950@end group
1951@end example
1952@codequoteundirected off
1953@codequotebacktick off
1954
1955@item @code{r},@code{R},@code{w},@code{W} (reading and writing files)
1956
1957The @code{r},@code{R},@code{w},@code{W} commands parse the filename
1958until end of the line. If whitespace, comments or semicolons are found,
1959they will be included in the filename, leading to unexpected results:
1960
1961@codequoteundirected on
1962@codequotebacktick on
1963@example
1964@group
$ seq 2 | sed '1w hello.txt ; 2d'
1
2

$ ls -log
total 4
-rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d

$ cat 'hello.txt ; 2d'
1
1975@end group
1976@end example
1977@codequoteundirected off
1978@codequotebacktick off
1979
1980Note that @command{sed} silently ignores read/write errors in
1981@code{r},@code{R},@code{w},@code{W} commands (such as missing files).
1982In the following example, @command{sed} tries to read a file named
1983@samp{@file{hello.txt ; N}}. The file is missing, and the error is silently
1984ignored:
1985
1986@codequoteundirected on
1987@codequotebacktick on
1988@example
1989@group
1990$ echo x | sed '1rhello.txt ; N'
1991x
1992@end group
1993@end example
1994@codequoteundirected off
1995@codequotebacktick off
1996
1997@item @code{e} (command execution)
1998
1999Any characters following the @code{e} command until the end of the line
2000will be sent to the shell. If whitespace, comments or semicolons are found,
2001they will be included in the shell command, leading to unexpected results:
2002
2003@codequoteundirected on
2004@codequotebacktick on
2005@example
2006@group
2007$ echo a | sed '1e touch foo#bar'
2008a
2009
2010$ ls -1
2011foo#bar
2012
2013$ echo a | sed '1e touch foo ; s/a/b/'
2014sh: 1: s/a/b/: not found
2015a
2016@end group
2017@end example
2018@codequoteundirected off
2019@codequotebacktick off
2020
2021
2022@item @code{s///[we]} (substitute with @code{e} or @code{w} flags)
2023
2024In a substitution command, the @code{w} flag writes the substitution
2025result to a file, and the @code{e} flag executes the substitution result
2026as a shell command. As with the @code{r/R/w/W/e} commands, these
2027must be terminated with a newline. If whitespace, comments or semicolons
2028are found, they will be included in the shell command or filename, leading to
2029unexpected results:
2030
2031@codequoteundirected on
2032@codequotebacktick on
2033@example
2034@group
$ echo a | sed 's/a/b/w1.txt#foo'
b

$ ls -1
1.txt#foo
2040@end group
2041@end example
2042@codequoteundirected off
2043@codequotebacktick off
2044
2045@end table
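As a minimal sketch of the remedy for the @code{w} flag, the filename
can be ended by putting the next command in its own @option{-e}
fragment, since fragments are joined with newlines.  The scratch
directory and the file name @file{out.txt} are assumptions made for this
illustration only:

```shell
# Work in a scratch directory so the example leaves no stray files behind.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# The first -e fragment ends right after the filename, so the second
# substitution is parsed as a separate command, not as part of the name.
printed=$(echo a | sed -e 's/a/b/w out.txt' -e 's/b/c/')
saved=$(cat out.txt)

echo "printed: $printed"   # printed: c
echo "saved: $saved"       # saved: b
```

The file receives the pattern space as it was right after the first
substitution, while standard output shows the result of both commands.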
2046
2047
2048@node sed addresses
2049@chapter Addresses: selecting lines
2050
2051@menu
2052* Addresses overview:: Addresses overview
2053* Numeric Addresses:: selecting lines by numbers
2054* Regexp Addresses:: selecting lines by text matching
2055* Range Addresses:: selecting a range of lines
2056* Zero Address:: Using address @code{0}
2057@end menu
2058
2059@node Addresses overview
2060@section Addresses overview
2061
2062@cindex addresses, numeric
2063@cindex numeric addresses
Addresses determine on which line(s) the @command{sed} command will be
executed.  The following command replaces the first occurrence of
@samp{hello} with @samp{world} only on line 144:
2067
2068@codequoteundirected on
2069@codequotebacktick on
2070@example
2071sed '144s/hello/world/' input.txt > output.txt
2072@end example
2073@codequoteundirected off
2074@codequotebacktick off
2075
2076
2077
2078If no address is specified, the command is performed on all lines.
2079The following command replaces @samp{hello} with @samp{world},
2080targeting every line of the input file.
2081However, note that it modifies only the first instance of @samp{hello}
2082on each line.
2083Use the @samp{g} modifier to affect every instance on each affected line.
2084
2085@codequoteundirected on
2086@codequotebacktick on
2087@example
2088sed 's/hello/world/' input.txt > output.txt
2089@end example
2090@codequoteundirected off
2091@codequotebacktick off
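The difference between the two forms can be sketched on a single line of
input.  The sample text and the variable names are hypothetical and are
used only for illustration:

```shell
line='hello hello'

# Without a modifier only the first match on the line is replaced.
first_only=$(echo "$line" | sed 's/hello/world/')

# With the g modifier every match on the line is replaced.
every=$(echo "$line" | sed 's/hello/world/g')

echo "$first_only"   # world hello
echo "$every"        # world world
```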
2092
2093
2094
2095@cindex addresses, regular expression
2096@cindex regular expression addresses
2097Addresses can contain regular expressions to match lines based
2098on content instead of line numbers. The following command replaces
2099@samp{hello} with @samp{world} only on lines
2100containing the string @samp{apple}:
2101
2102@codequoteundirected on
2103@codequotebacktick on
2104@example
2105sed '/apple/s/hello/world/' input.txt > output.txt
2106@end example
2107@codequoteundirected off
2108@codequotebacktick off
2109
2110
2111
2112@cindex addresses, range
2113@cindex range addresses
2114An address range is specified with two addresses separated by a comma
2115(@code{,}). Addresses can be numeric, regular expressions, or a mix of
2116both.
2117The following command replaces @samp{hello} with @samp{world}
2118only on lines 4 to 17 (inclusive):
2119
2120@codequoteundirected on
2121@codequotebacktick on
2122@example
2123sed '4,17s/hello/world/' input.txt > output.txt
2124@end example
2125@codequoteundirected off
2126@codequotebacktick off
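Ranges built from regular expressions, or from a mix of both address
kinds, follow the same pattern.  The sketch below uses made-up input
lines (@samp{BEGIN}, @samp{END} and single letters) chosen only for
illustration:

```shell
input=$(printf '%s\n' a BEGIN b END c)

# Both ends are regular expressions: print from BEGIN through END.
both_regex=$(printf '%s\n' "$input" | sed -n '/BEGIN/,/END/p')

# Mixed range: from the line matching /b/ to the last line ($).
mixed=$(printf '%s\n' "$input" | sed -n '/b/,$p')

printf '%s\n' "$both_regex"   # BEGIN, b, END (one per line)
printf '%s\n' "$mixed"        # b, END, c (one per line)
```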
2127
2128
2129
2130@cindex Excluding lines
2131@cindex Selecting non-matching lines
2132@cindex addresses, negating
2133@cindex addresses, excluding
2134Appending the @code{!} character to the end of an address
2135specification (before the command letter) negates the sense of the
2136match. That is, if the @code{!} character follows an address or an
2137address range, then only lines which do @emph{not} match the addresses
2138will be selected. The following command replaces @samp{hello}
2139with @samp{world} only on lines @emph{not} containing the string
2140@samp{apple}:
2141
2142@example
2143sed '/apple/!s/hello/world/' input.txt > output.txt
2144@end example
2145
2146The following command replaces @samp{hello} with
2147@samp{world} only on lines 1 to 3 and from line 18 to the last line of the
2148input file (i.e. excluding lines 4 to 17):
2149
2150@example
2151sed '4,17!s/hello/world/' input.txt > output.txt
2152@end example
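A short, self-contained sketch of negation, using @command{seq} to
generate the input (the variable names are hypothetical):

```shell
# Print only the lines outside the range 2,4.
outside=$(seq 5 | sed -n '2,4!p')

# Print only the lines that do not contain the digit 3.
no_three=$(seq 5 | sed -n '/3/!p')

printf '%s\n' "$outside"    # 1 and 5 (one per line)
printf '%s\n' "$no_three"   # 1, 2, 4 and 5 (one per line)
```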
2153
2154
2155
2156
2157
2158@node Numeric Addresses
2159@section Selecting lines by numbers
2160@cindex Addresses, in @command{sed} scripts
2161@cindex Line selection
2162@cindex Selecting lines to process
2163
2164Addresses in a @command{sed} script can be in any of the following forms:
2165@table @code
2166@item @var{number}
2167@cindex Address, numeric
2168@cindex Line, selecting by number
2169Specifying a line number will match only that line in the input.
2170(Note that @command{sed} counts lines continuously across all input files
2171unless @option{-i} or @option{-s} options are specified.)
2172
2173@item $
2174@cindex Address, last line
2175@cindex Last line, selecting
2176@cindex Line, selecting last
2177This address matches the last line of the last file of input, or
2178the last line of each file when the @option{-i} or @option{-s} options
2179are specified.
2180
2181
2182@item @var{first}~@var{step}
2183@cindex GNU extensions, @samp{@var{n}~@var{m}} addresses
2184This GNU extension matches every @var{step}th line
2185starting with line @var{first}.
2186In particular, lines will be selected when there exists
2187a non-negative @var{n} such that the current line-number equals
2188@var{first} + (@var{n} * @var{step}).
2189Thus, one would use @code{1~2} to select the odd-numbered lines and
2190@code{0~2} for even-numbered lines;
2191to pick every third line starting with the second, @samp{2~3} would be used;
2192to pick every fifth line starting with the tenth, use @samp{10~5};
2193and @samp{50~0} is just an obscure way of saying @code{50}.
2194
2195The following commands demonstrate the step address usage:
2196
2197@example
$ seq 10 | sed -n '0~4p'
4
8

$ seq 10 | sed -n '1~3p'
1
4
7
10
2207@end example
2208
2209
2210@end table
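The effect of @option{-s} on line numbering and on the @code{$} address
can be sketched with two small files.  The scratch directory and the
file names @file{a.txt} and @file{b.txt} are assumptions made for this
illustration, and @option{-s} is a GNU option:

```shell
tmpdir=$(mktemp -d)
seq 1 3 > "$tmpdir/a.txt"   # lines 1..3
seq 4 6 > "$tmpdir/b.txt"   # lines 4..6

# Default: the files form one continuous stream of six lines.
last_overall=$(sed -n '$p' "$tmpdir/a.txt" "$tmpdir/b.txt")
fourth=$(sed -n '4p' "$tmpdir/a.txt" "$tmpdir/b.txt")

# With -s each file is numbered separately and has its own last line.
last_each=$(sed -s -n '$p' "$tmpdir/a.txt" "$tmpdir/b.txt")
fourth_each=$(sed -s -n '4p' "$tmpdir/a.txt" "$tmpdir/b.txt")

echo "$last_overall"             # 6
echo "$fourth"                   # 4
printf '%s\n' "$last_each"       # 3 and 6 (one per line)
echo "[$fourth_each]"            # [] since neither file has a fourth line
```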
2211
2212
2213
2214@node Regexp Addresses
@section Selecting lines by text matching
2216
2217@value{SSED} supports the following regular expression addresses.
2218The default regular expression is
2219@ref{BRE syntax, , Basic Regular Expression (BRE)}.
If the @option{-E} or @option{-r} option is used, the regular expression
should be in @ref{ERE syntax, , Extended Regular Expression (ERE)} syntax.
2222@xref{BRE vs ERE}.
2223
2224@table @code
2225@item /@var{regexp}/
2226@cindex Address, as a regular expression
2227@cindex Line, selecting by regular expression match
2228This will select any line which matches the regular expression @var{regexp}.
2229If @var{regexp} itself includes any @code{/} characters,
2230each must be escaped by a backslash (@code{\}).
2231
2232The following command prints lines in @file{/etc/passwd}
2233which end with @samp{bash}@footnote{
2234There are of course many other ways to do the same,
2235e.g.
2236@example
2237grep 'bash$' /etc/passwd
2238awk -F: '$7 == "/bin/bash"' /etc/passwd
2239@end example
2240}:
2241
2242@example
2243sed -n '/bash$/p' /etc/passwd
2244@end example
2245
2246@cindex empty regular expression
2247@cindex @value{SSEDEXT}, modifiers and the empty regular expression
2248The empty regular expression @samp{//} repeats the last regular
2249expression match (the same holds if the empty regular expression is
2250passed to the @code{s} command). Note that modifiers to regular expressions
2251are evaluated when the regular expression is compiled, thus it is invalid to
2252specify them together with the empty regular expression.
2253
2254@item \%@var{regexp}%
2255(The @code{%} may be replaced by any other single character.)
2256
2257@cindex Slash character, in regular expressions
2258This also matches the regular expression @var{regexp},
2259but allows one to use a different delimiter than @code{/}.
2260This is particularly useful if the @var{regexp} itself contains
2261a lot of slashes, since it avoids the tedious escaping of every @code{/}.
2262If @var{regexp} itself includes any delimiter characters,
2263each must be escaped by a backslash (@code{\}).
2264
2265The following commands are equivalent. They print lines
2266which start with @samp{/home/alice/documents/}:
2267
2268@example
2269sed -n '/^\/home\/alice\/documents\//p'
2270sed -n '\%^/home/alice/documents/%p'
2271sed -n '\;^/home/alice/documents/;p'
2272@end example
2273
2274
2275@item /@var{regexp}/I
2276@itemx \%@var{regexp}%I
2277@cindex GNU extensions, @code{I} modifier
2278@cindex case insensitive, regular expression
2279The @code{I} modifier to regular-expression matching is a GNU
2280extension which causes the @var{regexp} to be matched in
2281a case-insensitive manner.
2282
2283In many other programming languages, a lower case @code{i} is used
2284for case-insensitive regular expression matching. However, in @command{sed}
2285the @code{i} is used for the insert command (@pxref{insert command}).
2286
2287Observe the difference between the following examples.
2288
In this example, @code{/b/I} is the address: a regular expression with the
@code{I} modifier.  @code{d} is the delete command:
2291
2292@example
2293$ printf "%s\n" a b c | sed '/b/Id'
2294a
2295c
2296@end example
2297
2298Here, @code{/b/} is the address: a regular expression.
2299@code{i} is the insert command.
2300@code{d} is the value to insert.
2301A line with @samp{d} is then inserted above the matched line:
2302
2303@example
2304$ printf "%s\n" a b c | sed '/b/id'
2305a
2306d
2307b
2308c
2309@end example
2310
2311@item /@var{regexp}/M
2312@itemx \%@var{regexp}%M
2313@cindex @value{SSEDEXT}, @code{M} modifier
2314The @code{M} modifier to regular-expression matching is a @value{SSED}
2315extension which directs @value{SSED} to match the regular expression
2316in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to
2317match respectively (in addition to the normal behavior) the empty string
2318after a newline, and the empty string before a newline. There are
2319special character sequences
2320@ifclear PERL
2321(@code{\`} and @code{\'})
2322@end ifclear
2323which always match the beginning or the end of the buffer.
2324In addition,
2325the period character does not match a new-line character in
2326multi-line mode.
2327@end table
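The following sketch exercises three of the forms described above: the
empty regular expression, an alternate delimiter, and the @code{M}
modifier.  The input strings and variable names are made up for
illustration, and the @code{M} modifier assumes @value{SSED}:

```shell
# Empty regexp: s// reuses /an/ from the address.
reused=$(printf '%s\n' apple banana | sed -n '/an/s//AN/p')

# Alternate delimiter: no need to escape the slashes in the path.
path=$(echo /usr/local/bin | sed -n '\%^/usr/local%p')

# N joins two input lines into one pattern space ("a", newline, "b").
# With M, ^ and $ also match around the embedded newline.
multi=$(printf 'a\nb\n' | sed -n 'N;/^b$/Mp')
single=$(printf 'a\nb\n' | sed -n 'N;/^b$/p')

echo "$reused"           # bANana
echo "$path"             # /usr/local/bin
printf '%s\n' "$multi"   # a and b (one per line)
echo "[$single]"         # [] since nothing matches without M
```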
2328
2329
2330@cindex regex addresses and pattern space
2331@cindex regex addresses and input lines
2332Regex addresses operate on the content of the current
2333pattern space. If the pattern space is changed (for example with @code{s///}
2334command) the regular expression matching will operate on the changed text.
2335
2336In the following example, automatic printing is disabled with
2337@option{-n}. The @code{s/2/X/} command changes lines containing
2338@samp{2} to @samp{X}. The command @code{/[0-9]/p} matches
2339lines with digits and prints them.
2340Because the second line is changed before the @code{/[0-9]/} regex,
2341it will not match and will not be printed:
2342
2343@codequoteundirected on
2344@codequotebacktick on
2345@example
2346@group
$ seq 3 | sed -n 's/2/X/ ; /[0-9]/p'
1
3
2350@end group
2351@end example
2352@codequoteundirected off
2353@codequotebacktick off
2354
2355
2356@node Range Addresses
2357@section Range Addresses
2358
2359@cindex Range of lines
2360@cindex Several lines, selecting
An address range is specified with two addresses
separated by a comma (@code{,}).  An address range matches lines
starting from where the first address matches, and continues
until the second address matches (inclusively):
2365
2366@example
$ seq 10 | sed -n '4,6p'
4
5
6
2371@end example
2372
2373If the second address is a @var{regexp}, then checking for the
2374ending match will start with the line @emph{following} the
2375line which matched the first address: a range will always
2376span at least two lines (except of course if the input stream
2377ends).
2378
2379@example
$ seq 10 | sed -n '4,/[0-9]/p'
4
5
2383@end example
2384
2385If the second address is a @var{number} less than (or equal to)
2386the line matching the first address, then only the one line is
2387matched:
2388
2389@example
$ seq 10 | sed -n '4,1p'
4
2392@end example
2393
2394@anchor{Zero Address Regex Range}
2395@cindex Special addressing forms
2396@cindex Range with start address of zero
2397@cindex Zero, as range start address
2398@cindex @var{addr1},+N
2399@cindex @var{addr1},~N
2400@cindex GNU extensions, special two-address forms
2401@cindex GNU extensions, @code{0} address
2402@cindex GNU extensions, 0,@var{addr2} addressing
2403@cindex GNU extensions, @var{addr1},+@var{N} addressing
2404@cindex GNU extensions, @var{addr1},~@var{N} addressing
2405@value{SSED} also supports some special two-address forms; all these
2406are GNU extensions:
2407@table @code
2408@item 0,/@var{regexp}/
2409A line number of @code{0} can be used in an address specification like
2410@code{0,/@var{regexp}/} so that @command{sed} will try to match
2411@var{regexp} in the first input line too. In other words,
2412@code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/},
except that if @var{regexp} matches the very first line of input the
2414@code{0,/@var{regexp}/} form will consider it to end the range, whereas
2415the @code{1,/@var{regexp}/} form will match the beginning of its range and
2416hence make the range span up to the @emph{second} occurrence of the
2417regular expression.
2418
2419The following examples demonstrate the difference between starting
2420with address 1 and 0:
2421
2422@example
$ seq 10 | sed -n '1,/[0-9]/p'
1
2

$ seq 10 | sed -n '0,/[0-9]/p'
1
2429@end example
2430
2431
2432@item @var{addr1},+@var{N}
2433Matches @var{addr1} and the @var{N} lines following @var{addr1}.
2434
2435@example
$ seq 10 | sed -n '6,+2p'
6
7
8
2440@end example
2441
2442@var{addr1} can be a line number or a regular expression.
2443
2444@item @var{addr1},~@var{N}
2445Matches @var{addr1} and the lines following @var{addr1}
2446until the next line whose input line number is a multiple of @var{N}.
2447The following command prints starting at line 6, until the next line which
2448is a multiple of 4 (i.e. line 8):
2449
2450@example
$ seq 10 | sed -n '6,~4p'
6
7
8
2455@end example
2456
2457@var{addr1} can be a line number or a regular expression.
2458
2459@end table
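Both forms also accept a regular expression as @var{addr1}.  A minimal
sketch, assuming @value{SSED} (these are GNU extensions) and input
generated by @command{seq}:

```shell
# The line matching /5/ plus the one line that follows it.
plus=$(seq 10 | sed -n '/5/,+1p')

# From the line matching /5/ up to the next multiple of 4 (line 8).
mult=$(seq 10 | sed -n '/5/,~4p')

printf '%s\n' "$plus"   # 5 and 6 (one per line)
printf '%s\n' "$mult"   # 5, 6, 7 and 8 (one per line)
```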
2460
2461
2462
2463@node Zero Address
2464@section Zero Address
2465@cindex Zero Address
As a @value{SSED} extension, the @code{0} address can be used in two cases:
2467@enumerate
2468@item
In a regex range address such as @code{0,/@var{regexp}/}
2470(@pxref{Zero Address Regex Range}).
2471@item
2472With the @code{r} command, inserting a file before the first line
2473(@pxref{Adding a header to multiple files}).
2474@end enumerate
2475
Note that these are the only places where the @code{0} address makes
sense; commands that are given the @code{0} address in any
other way produce an error.
2479
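The contrast between the accepted and the rejected use of address
@code{0} can be sketched as follows, assuming @value{SSED}.  The
variable names are hypothetical, and the exact error message is
deliberately not shown:

```shell
# Accepted: 0 as the start of a regex range.
first_match=$(seq 3 | sed -n '0,/[0-9]/p')

# Rejected: 0 as a plain line address makes sed exit with an error.
if seq 3 | sed -n '0p' >/dev/null 2>&1; then
  plain_zero=accepted
else
  plain_zero=rejected
fi

echo "$first_match"   # 1
echo "$plain_zero"    # rejected
```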
2480
2481
2482@node sed regular expressions
2483@chapter Regular Expressions: selecting text
2484
2485@menu
* Regular Expressions Overview:: Overview of regular expressions in @command{sed}
2487* BRE vs ERE:: Basic (BRE) and extended (ERE) regular expression
2488 syntax
2489* BRE syntax:: Overview of basic regular expression syntax
2490* ERE syntax:: Overview of extended regular expression syntax
2491* Character Classes and Bracket Expressions::
2492* regexp extensions:: Additional regular expression commands
2493* Back-references and Subexpressions:: Back-references and Subexpressions
2494* Escapes:: Specifying special characters
2495* Locale Considerations:: Multibyte characters and locale considerations
2496@end menu
2497
2498@node Regular Expressions Overview
@section Overview of regular expressions in @command{sed}
2500
2501@c NOTE: Keep examples in the 'overview' section
2502@c neutral in regards to BRE/ERE - to ease understanding.
2503
2504
2505To know how to use @command{sed}, people should understand regular
2506expressions (@dfn{regexp} for short). A regular expression
2507is a pattern that is matched against a
2508subject string from left to right. Most characters are
2509@dfn{ordinary}: they stand for
2510themselves in a pattern, and match the corresponding characters.
2511Regular expressions in @command{sed} are specified between two
2512slashes.
2513
2514The following command prints lines containing the string @samp{hello}:
2515
2516@example
2517sed -n '/hello/p'
2518@end example
2519
2520The above example is equivalent to this @command{grep} command:
2521
2522@example
2523grep 'hello'
2524@end example
2525
2526The power of regular expressions comes from the ability to include
2527alternatives and repetitions in the pattern. These are encoded in the
2528pattern by the use of @dfn{special characters}, which do not stand for
2529themselves but instead are interpreted in some special way.
2530
2531The character @code{^} (caret) in a regular expression matches the
2532beginning of the line. The character @code{.} (dot) matches any single
2533character. The following @command{sed} command matches and prints
2534lines which start with the letter @samp{b}, followed by any single character,
2535followed by the letter @samp{d}:
2536
2537@example
2538$ printf "%s\n" abode bad bed bit bid byte body | sed -n '/^b.d/p'
2539bad
2540bed
2541bid
2542body
2543@end example
2544
2545The following sections explain the meaning and usage of special
2546characters in regular expressions.
2547
2548@node BRE vs ERE
@section Basic (BRE) and extended (ERE) regular expressions
2550
2551Basic and extended regular expressions are two variations on the
2552syntax of the specified pattern. Basic Regular Expression (BRE) syntax is the
2553default in @command{sed} (and similarly in @command{grep}).
2554Use the POSIX-specified @option{-E} option (@option{-r},
2555@option{--regexp-extended}) to enable Extended Regular Expression (ERE) syntax.
2556
2557In @value{SSED}, the only difference between basic and extended regular
2558expressions is in the behavior of a few special characters: @samp{?},
2559@samp{+}, parentheses, braces (@samp{@{@}}), and @samp{|}.
2560
With basic (BRE) syntax, these characters do not have special meaning
unless prefixed with a backslash (@samp{\}).  With extended (ERE) syntax
the situation is reversed: these characters are special unless they are
prefixed with a backslash (@samp{\}).
2565
2566@multitable @columnfractions .28 .36 .35
2567
2568@headitem Desired pattern
2569@tab Basic (BRE) Syntax
2570@tab Extended (ERE) Syntax
2571
2572@item literal @samp{+} (plus sign)
2573
2574@tab
2575@exampleindent 0
2576@codequoteundirected on
2577@codequotebacktick on
2578@example
2579$ echo 'a+b=c' > foo
2580$ sed -n '/a+b/p' foo
2581a+b=c
2582@end example
2583@codequotebacktick off
2584@codequoteundirected off
2585
2586@tab
2587@exampleindent 0
2588@codequoteundirected on
2589@codequotebacktick on
2590@example
2591$ echo 'a+b=c' > foo
2592$ sed -E -n '/a\+b/p' foo
2593a+b=c
2594@end example
2595@codequotebacktick off
2596@codequoteundirected off
2597
2598
2599@item One or more @samp{a} characters followed by @samp{b}
2600(plus sign as special meta-character)
2601
2602@tab
2603@exampleindent 0
2604@codequoteundirected on
2605@codequotebacktick on
2606@example
2607$ echo aab > foo
2608$ sed -n '/a\+b/p' foo
2609aab
2610@end example
2611@codequotebacktick off
2612@codequoteundirected off
2613
2614@tab
2615@exampleindent 0
2616@codequoteundirected on
2617@codequotebacktick on
2618@example
2619$ echo aab > foo
2620$ sed -E -n '/a+b/p' foo
2621aab
2622@end example
2623@codequotebacktick off
2624@codequoteundirected off
2625
2626@end multitable
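The same reversal applies to the other characters listed above.  The
sketch below covers @samp{?} and @samp{|}, assuming @value{SSED} for the
backslashed BRE forms (they are GNU extensions).  The sample words are
made up for illustration:

```shell
text='color colour'

# Optional character: \? in BRE, ? in ERE.
bre_opt=$(echo "$text" | sed 's/colou\?r/X/g')
ere_opt=$(echo "$text" | sed -E 's/colou?r/X/g')

# Literal question mark: ? in BRE, \? in ERE.
bre_lit=$(echo 'why?' | sed 's/y?/-/')
ere_lit=$(echo 'why?' | sed -E 's/y\?/-/')

# Alternation: \| in BRE, | in ERE.
bre_alt=$(echo 'cat dog' | sed 's/cat\|dog/pet/g')
ere_alt=$(echo 'cat dog' | sed -E 's/cat|dog/pet/g')

echo "$bre_opt / $ere_opt"   # X X / X X
echo "$bre_lit / $ere_lit"   # wh- / wh-
echo "$bre_alt / $ere_alt"   # pet pet / pet pet
```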
2627
2628
2629
2630
2631@node BRE syntax
2632@section Overview of basic regular expression syntax
2633
2634Here is a brief description
2635of regular expression syntax as used in @command{sed}.
2636
2637@table @code
2638@item @var{char}
2639A single ordinary character matches itself.
2640
2641@item *
2642@cindex GNU extensions, to basic regular expressions
2643Matches a sequence of zero or more instances of matches for the
2644preceding regular expression, which must be an ordinary character, a
2645special character preceded by @code{\}, a @code{.}, a grouped regexp
2646(see below), or a bracket expression. As a GNU extension, a
2647postfixed regular expression can also be followed by @code{*}; for
2648example, @code{a**} is equivalent to @code{a*}. POSIX
26491003.1-2001 says that @code{*} stands for itself when it appears at
2650the start of a regular expression or subexpression, but many
2651non-GNU implementations do not support this and portable
2652scripts should instead use @code{\*} in these contexts.
2653@item .
2654Matches any character, including newline.
2655
2656@item ^
2657Matches the null string at beginning of the pattern space, i.e. what
2658appears after the circumflex must appear at the beginning of the
2659pattern space.
2660
2661In most scripts, pattern space is initialized to the content of each
2662line (@pxref{Execution Cycle, , How @code{sed} works}). So, it is a
2663useful simplification to think of @code{^#include} as matching only
2664lines where @samp{#include} is the first thing on the line---if there is
2665any preceding space, for example, the match fails. This simplification is
2666valid as long as the original content of pattern space is not modified,
2667for example with an @code{s} command.
2668
2669@code{^} acts as a special character only at the beginning of the
2670regular expression or subexpression (that is, after @code{\(} or
2671@code{\|}). Portable scripts should avoid @code{^} at the beginning of
2672a subexpression, though, as POSIX allows implementations that
2673treat @code{^} as an ordinary character in that context.
2674
2675@item $
Same as @code{^}, but refers to the end of the pattern space.
2677@code{$} also acts as a special character only at the end
2678of the regular expression or subexpression (that is, before @code{\)}
2679or @code{\|}), and its use at the end of a subexpression is not
2680portable.
2681
2682
2683@item [@var{list}]
2684@itemx [^@var{list}]
2685Matches any single character in @var{list}: for example,
2686@code{[aeiou]} matches all vowels. A list may include
2687sequences like @code{@var{char1}-@var{char2}}, which
2688matches any character between (inclusive) @var{char1}
2689and @var{char2}.
2690@xref{Character Classes and Bracket Expressions}.
2691
2692@item \+
2693@cindex GNU extensions, to basic regular expressions
2694As @code{*}, but matches one or more. It is a GNU extension.
2695
2696@item \?
2697@cindex GNU extensions, to basic regular expressions
2698As @code{*}, but only matches zero or one. It is a GNU extension.
2699
2700@item \@{@var{i}\@}
2701As @code{*}, but matches exactly @var{i} sequences (@var{i} is a
2702decimal integer; for portability, keep it between 0 and 255
2703inclusive).
2704
@item \@{@var{i},@var{j}\@}
Matches between @var{i} and @var{j} sequences, inclusive.

@item \@{@var{i},\@}
Matches @var{i} or more sequences.
2710
2711@item \(@var{regexp}\)
2712Groups the inner @var{regexp} as a whole, this is used to:
2713
2714@itemize @bullet
2715@item
2716@cindex GNU extensions, to basic regular expressions
2717Apply postfix operators, like @code{\(abcd\)*}:
2718this will search for zero or more whole sequences
2719of @samp{abcd}, while @code{abcd*} would search
2720for @samp{abc} followed by zero or more occurrences
2721of @samp{d}. Note that support for @code{\(abcd\)*} is
2722required by POSIX 1003.1-2001, but many non-GNU
2723implementations do not support it and hence it is not universally
2724portable.
2725
2726@item
2727Use back references (see below).
2728@end itemize
2729
2730
2731@item @var{regexp1}\|@var{regexp2}
2732@cindex GNU extensions, to basic regular expressions
2733Matches either @var{regexp1} or @var{regexp2}. Use
2734parentheses to use complex alternative regular expressions.
2735The matching process tries each alternative in turn, from
2736left to right, and the first one that succeeds is used.
2737It is a GNU extension.
2738
2739@item @var{regexp1}@var{regexp2}
2740Matches the concatenation of @var{regexp1} and @var{regexp2}.
2741Concatenation binds more tightly than @code{\|}, @code{^}, and
2742@code{$}, but less tightly than the other regular expression
2743operators.
2744
2745@item \@var{digit}
2746Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized
2747subexpression in the regular expression. This is called a @dfn{back
2748reference}. Subexpressions are implicitly numbered by counting
2749occurrences of @code{\(} left-to-right.
2750
2751@item \n
2752Matches the newline character.
2753
2754@item \@var{char}
2755Matches @var{char}, where @var{char} is one of @code{$},
2756@code{*}, @code{.}, @code{[}, @code{\}, or @code{^}.
2757Note that the only C-like
2758backslash sequences that you can portably assume to be
2759interpreted are @code{\n} and @code{\\}; in particular
2760@code{\t} is not portable, and matches a @samp{t} under most
2761implementations of @command{sed}, rather than a tab character.
2762
2763@end table
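Grouping and back references from the table above can be sketched
with two substitutions.  The input strings and variable names are
hypothetical:

```shell
# \1 inside the regexp must repeat what the first group matched,
# so a doubled word collapses to a single copy.
dedup=$(echo 'hello hello world' | sed 's/\(hello\) \1/\1/')

# In the replacement, \1 and \2 refer to the two groups in order.
swapped=$(echo 'key=value' | sed 's/\(.*\)=\(.*\)/\2=\1/')

echo "$dedup"     # hello world
echo "$swapped"   # value=key
```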
2764
2765@cindex Greedy regular expression matching
2766Note that the regular expression matcher is greedy, i.e., matches
2767are attempted from left to right and, if two or more matches are
2768possible starting at the same character, it selects the longest.
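A small sketch of the consequence of greedy matching, and of one common
way to limit it with a negated bracket expression (the input string is
made up for illustration):

```shell
# .* takes as much as it can, so the match runs to the last X.
greedy=$(echo 'aXbXc' | sed 's/a.*X/-/')

# [^X]* cannot cross an X, so the match stops at the first X.
bounded=$(echo 'aXbXc' | sed 's/a[^X]*X/-/')

echo "$greedy"    # -c
echo "$bounded"   # -bXc
```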
2769
2770@noindent
2771Examples:
2772@table @samp
2773@item abcdef
2774Matches @samp{abcdef}.
2775
2776@item a*b
2777Matches zero or more @samp{a}s followed by a single
2778@samp{b}. For example, @samp{b} or @samp{aaaaab}.
2779
2780@item a\?b
2781Matches @samp{b} or @samp{ab}.
2782
2783@item a\+b\+
2784Matches one or more @samp{a}s followed by one or more
2785@samp{b}s: @samp{ab} is the shortest possible match, but
2786other examples are @samp{aaaab} or @samp{abbbbb} or
2787@samp{aaaaaabbbbbbb}.
2788
2789@item .*
2790@itemx .\+
2791These two both match all the characters in a string;
2792however, the first matches every string (including the empty
2793string), while the second matches only strings containing
2794at least one character.
2795
2796@item ^main.*(.*)
2797This matches a string starting with @samp{main},
2798followed by an opening and closing
2799parenthesis. The @samp{n}, @samp{(} and @samp{)} need not
2800be adjacent.
2801
2802@item ^#
2803This matches a string beginning with @samp{#}.
2804
2805@item \\$
2806This matches a string ending with a single backslash. The
2807regexp contains two backslashes for escaping.
2808
2809@item \$
2810Instead, this matches a string consisting of a single dollar sign,
2811because it is escaped.
2812
2813@item [a-zA-Z0-9]
2814In the C locale, this matches any ASCII letters or digits.
2815
2816@item [^ @kbd{@key{TAB}}]\+
2817(Here @kbd{@key{TAB}} stands for a single tab character.)
2818This matches a string of one or more
2819characters, none of which is a space or a tab.
2820Usually this means a word.
2821
2822@item ^\(.*\)\n\1$
2823This matches a string consisting of two equal substrings separated by
2824a newline.
2825
2826@item .\@{9\@}A$
2827This matches nine characters followed by an @samp{A} at the end of a line.
2828
2829@item ^.\@{15\@}A
2830This matches the start of a string that contains 16 characters,
2831the last of which is an @samp{A}.
2832
2833@end table
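Patterns that contain @code{\n} only match when the pattern space holds
more than one line.  The sketch below assumes the @code{N} command is
used to join two input lines first.  The input strings are hypothetical:

```shell
# N appends the next input line, giving "ab", newline, "ab".
same=$(printf 'ab\nab\n' | sed -n 'N;/^\(.*\)\n\1$/p')

# Two different lines do not satisfy the back reference.
differ=$(printf 'ab\ncd\n' | sed -n 'N;/^\(.*\)\n\1$/p')

printf '%s\n' "$same"   # ab and ab (one per line)
echo "[$differ]"        # [] since nothing is printed
```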
2834
2835
2836@node ERE syntax
2837@section Overview of extended regular expression syntax
2838@cindex Extended regular expressions, syntax
2839
2840The only difference between basic and extended regular expressions is in
2841the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
2842braces (@samp{@{@}}), and @samp{|}. While basic regular expressions
2843require these to be escaped if you want them to behave as special
2844characters, when using extended regular expressions you must escape
2845them if you want them @emph{to match a literal character}. @samp{|}
2846is special here because @samp{\|} is a GNU extension -- standard
2847basic regular expressions do not provide its functionality.
2848
2849@noindent
2850Examples:
2851@table @code
2852@item abc?
2853becomes @samp{abc\?} when using extended regular expressions. It matches
2854the literal string @samp{abc?}.
2855
2856@item c\+
2857becomes @samp{c+} when using extended regular expressions. It matches
2858one or more @samp{c}s.
2859
2860@item a\@{3,\@}
2861becomes @samp{a@{3,@}} when using extended regular expressions. It matches
2862three or more @samp{a}s.
2863
2864@item \(abc\)\@{2,3\@}
2865becomes @samp{(abc)@{2,3@}} when using extended regular expressions. It
2866matches either @samp{abcabc} or @samp{abcabcabc}.
2867
2868@item \(abc*\)\1
2869becomes @samp{(abc*)\1} when using extended regular expressions.
2870Backreferences must still be escaped when using extended regular
2871expressions.
2872
2873@item a\|b
2874becomes @samp{a|b} when using extended regular expressions. It matches
2875@samp{a} or @samp{b}.
2876@end table
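Some of the extended forms above can be tried directly with
@option{-E}.  This is only a sketch, and the input strings are made up
for illustration:

```shell
# One or more c characters.
plus=$(echo 'cccd' | sed -E 's/c+/X/')

# The back reference is still written \1 in extended syntax.
backref=$(echo 'abcabc' | sed -E 's/(abc*)\1/X/')

# Unescaped | is alternation.
alt=$(echo 'a-b-c' | sed -E 's/a|b/X/g')

echo "$plus"      # Xd
echo "$backref"   # X
echo "$alt"       # X-X-c
```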
2877
2878@node Character Classes and Bracket Expressions
2879@section Character Classes and Bracket Expressions
2880
2881@c The 'character class' section is shamelessly copied from grep's manual.
2882
2883@cindex bracket expression
2884@cindex character class
2885A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
2886@samp{]}.
2887It matches any single character in that list;
2888if the first character of the list is the caret @samp{^},
2889then it matches any character @strong{not} in the list.
2890For example, the following command replaces the strings
2891@samp{gray} or @samp{grey} with @samp{blue}:
2892
2893@example
2894sed 's/gr[ae]y/blue/'
2895@end example
2896
2897@c TODO: fix 'ref' to look good in both HTML and PDF
2898Bracket expressions can be used in both
2899@ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
2900regular expressions (that is, with or without the @option{-E}/@option{-r}
2901options).
2902
2903@cindex range expression
2904Within a bracket expression, a @dfn{range expression} consists of two
2905characters separated by a hyphen.
2906It matches any single character that
2907sorts between the two characters, inclusive.
2908In the default C locale, the sorting sequence is the native character
2909order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
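
@noindent
As a small sketch of a range and its negation (the sample input is
made up, and @env{LC_ALL} is set to @samp{C} so that the range follows
the native character order):

```shell
# LC_ALL=C pins the range to native (ASCII) character order.
echo 'abcdef' | LC_ALL=C sed 's/[a-d]/X/g'    # prints: XXXXef
# A leading '^' negates the list: everything except a, b, c, d.
echo 'abcdef' | LC_ALL=C sed 's/[^a-d]/X/g'   # prints: abcdXX
```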
2910
2911
2912Finally, certain named classes of characters are predefined within
2913bracket expressions, as follows.
2914
2915These named classes must be used @emph{inside} brackets
2916themselves. Correct usage:
2917@example
2918$ echo 1 | sed 's/[[:digit:]]/X/'
2919X
2920@end example
2921
2922Incorrect usage is rejected by newer @command{sed} versions.
2923Older versions accepted it but treated it as a single bracket expression
2924(which is equivalent to @samp{[dgit:]},
2925that is, only the characters @var{d/g/i/t/:}):
2926@example
2927# current GNU sed versions - incorrect usage rejected
2928$ echo 1 | sed 's/[:digit:]/X/'
2929sed: character class syntax is [[:space:]], not [:space:]
2930
2931# older GNU sed versions
2932$ echo 1 | sed 's/[:digit:]/X/'
29331
2934@end example
2935
2936
2937@cindex classes of characters
2938@cindex character classes
2939@cindex named character classes
2940@table @samp
2941
2942@item [:alnum:]
2943@opindex alnum @r{character class}
2944@cindex alphanumeric characters
2945Alphanumeric characters:
2946@samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII
2947character encoding, this is the same as @samp{[0-9A-Za-z]}.
2948
2949@item [:alpha:]
2950@opindex alpha @r{character class}
2951@cindex alphabetic characters
2952Alphabetic characters:
2953@samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII
2954character encoding, this is the same as @samp{[A-Za-z]}.
2955
2956@item [:blank:]
2957@opindex blank @r{character class}
2958@cindex blank characters
2959Blank characters:
2960space and tab.
2961
2962@item [:cntrl:]
2963@opindex cntrl @r{character class}
2964@cindex control characters
2965Control characters.
2966In ASCII, these characters have octal codes 000
2967through 037, and 177 (DEL).
2968In other character sets, these are
2969the equivalent characters, if any.
2970
2971@item [:digit:]
2972@opindex digit @r{character class}
2973@cindex digit characters
2974@cindex numeric characters
2975Digits: @code{0 1 2 3 4 5 6 7 8 9}.
2976
2977@item [:graph:]
2978@opindex graph @r{character class}
2979@cindex graphic characters
2980Graphical characters:
2981@samp{[:alnum:]} and @samp{[:punct:]}.
2982
2983@item [:lower:]
2984@opindex lower @r{character class}
2985@cindex lower-case letters
2986Lower-case letters; in the @samp{C} locale and ASCII character
2987encoding, this is
2988@code{a b c d e f g h i j k l m n o p q r s t u v w x y z}.
2989
2990@item [:print:]
2991@opindex print @r{character class}
2992@cindex printable characters
2993Printable characters:
2994@samp{[:alnum:]}, @samp{[:punct:]}, and space.
2995
2996@item [:punct:]
2997@opindex punct @r{character class}
2998@cindex punctuation characters
2999Punctuation characters; in the @samp{C} locale and ASCII character
3000encoding, this is
3001@code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}.
3002
3003@item [:space:]
3004@opindex space @r{character class}
3005@cindex space characters
3006@cindex whitespace characters
3007Space characters: in the @samp{C} locale, this is
3008tab, newline, vertical tab, form feed, carriage return, and space.
3009
3010
3011@item [:upper:]
3012@opindex upper @r{character class}
3013@cindex upper-case letters
3014Upper-case letters: in the @samp{C} locale and ASCII character
3015encoding, this is
3016@code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}.
3017
3018@item [:xdigit:]
3019@opindex xdigit @r{character class}
3020@cindex xdigit class
3021@cindex hexadecimal digits
3022Hexadecimal digits:
3023@code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}.
3024
3025@end table
3026Note that the brackets in these class names are
3027part of the symbolic names, and must be included in addition to
3028the brackets delimiting the bracket expression.
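
@noindent
Named classes can be combined with each other and with negation inside
a single bracket expression. The following sketch uses made-up input
and forces the @samp{C} locale so that the classes cover ASCII
characters only:

```shell
# One class inside the outer brackets.
echo 'Route 66, Exit 9' | LC_ALL=C sed 's/[[:digit:]]/#/g'
# prints: Route ##, Exit #

# Two classes in the same bracket expression.
echo 'Route 66, Exit 9' | LC_ALL=C sed 's/[[:upper:][:digit:]]/_/g'
# prints: _oute __, _xit _

# A negated class: anything that is not a letter or digit.
echo 'a b,c' | LC_ALL=C sed 's/[^[:alnum:]]/-/g'
# prints: a-b-c
```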
3029
3030Most meta-characters lose their special meaning inside bracket expressions:
3031
3032@table @samp
3033@item ]
3034ends the bracket expression if it's not the first list item.
3035So, if you want to make the @samp{]} character a list item,
3036you must put it first.
3037
3038@item -
3039represents the range if it's not first or last in a list or the ending point
3040of a range.
3041
3042@item ^
3043represents the characters not in the list.
3044If you want to make the @samp{^}
3045character a list item, place it anywhere but first.
3046@end table
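
@noindent
The placement rules above can be tried with a short sketch (the inputs
are made up for illustration):

```shell
# ']' placed first is a list item, not the closing bracket.
echo 'a]b' | sed 's/[]]/X/'      # prints: aXb
# '-' placed last is a list item, not a range operator.
echo 'a-b' | sed 's/[a-]/X/g'    # prints: XXb
# '^' placed anywhere but first is a list item, not a negation.
echo 'a^b' | sed 's/[b^]/X/g'    # prints: aXX
```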
3047
3048@c TODO: incorporate this paragraph (copied verbatim from BRE section).
3049
3050@cindex @code{POSIXLY_CORRECT} behavior, bracket expressions
3051The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\}
3052are normally not special within @var{list}. For example, @code{[\*]}
3053matches either @samp{\} or @samp{*}, because the @code{\} is not
3054special here. However, strings like @code{[.ch.]}, @code{[=a=]}, and
3055@code{[:space:]} are special within @var{list} and represent collating
3056symbols, equivalence classes, and character classes, respectively, and
3057@code{[} is therefore special within @var{list} when it is followed by
3058@code{.}, @code{=}, or @code{:}. Also, when not in
3059@env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and
3060@code{\t} are recognized within @var{list}. @xref{Escapes}.
3061@c ********
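
@noindent
A minimal sketch of both points, assuming GNU @command{sed} running
without @env{POSIXLY_CORRECT} set in the environment (the inputs are
made up):

```shell
# Backslash is literal inside a bracket expression: [\*] is '\' or '*'.
printf '%s\n' 'a\b*c' | sed 's/[\*]/X/g'      # prints: aXbXc
# GNU sed still expands \t inside the list (unless POSIXLY_CORRECT is set).
printf 'a\tb\n' | sed 's/[\t]/X/'             # prints: aXb
```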
3062
3063
3064@c TODO: improve explanation about collation classes and equivalence classes
3065@c perhaps dedicate a section to Locales ??
3066
3067@table @samp
3068@item [.
3069represents the open collating symbol.
3070
3071@item .]
3072represents the close collating symbol.
3073
3074@item [=
3075represents the open equivalence class.
3076
3077@item =]
3078represents the close equivalence class.
3079
3080@item [:
3081represents the open character class symbol, and should be followed by a
3082valid character class name.
3083
3084@item :]
3085represents the close character class symbol.
3086@end table
3087
3088
3089@node regexp extensions
3090@section regular expression extensions
3091
3092The following sequences have special meaning inside regular expressions
3093(used in @ref{Regexp Addresses,,addresses} and the @code{s} command).
3094
3095These can be used in both
3096@ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
3097regular expressions (that is, with or without the @option{-E}/@option{-r}
3098options).
3099
3100@table @code
3101@item \w
3102Matches any ``word'' character. A ``word'' character is any
3103letter or digit or the underscore character.
3104
3105@example
3106$ echo "abc %-= def." | sed 's/\w/X/g'
3107XXX %-= XXX.
3108@end example
3109
3110
3111@item \W
3112Matches any ``non-word'' character.
3113
3114@example
3115$ echo "abc %-= def." | sed 's/\W/X/g'
3116abcXXXXXdefX
3117@end example
3118
3119
3120@item \b
3121Matches a word boundary; that is, it matches if the character
3122to the left is a ``word'' character and the character to the
3123right is a ``non-word'' character, or vice-versa.
3124
3125@example
3126$ echo "abc %-= def." | sed 's/\b/X/g'
3127XabcX %-= XdefX.
3128@end example
3129
3130
3131@item \B
3132Matches everywhere but on a word boundary; that is, it matches
3133if the character to the left and the character to the right
3134are either both ``word'' characters or both ``non-word''
3135characters.
3136
3137@example
3138$ echo "abc %-= def." | sed 's/\B/X/g'
3139aXbXc X%X-X=X dXeXf.X
3140@end example
3141
3142
3143@item \s
3144Matches whitespace characters (spaces and tabs).
3145Newlines embedded in the pattern/hold spaces will also match:
3146
3147@example
3148$ echo "abc %-= def." | sed 's/\s/X/g'
3149abcX%-=Xdef.
3150@end example
3151
3152
3153@item \S
3154Matches non-whitespace characters.
3155
3156@example
3157$ echo "abc %-= def." | sed 's/\S/X/g'
3158XXX XXX XXXX
3159@end example
3160
3161
3162@item \<
3163Matches the beginning of a word.
3164
3165@example
3166$ echo "abc %-= def." | sed 's/\</X/g'
3167Xabc %-= Xdef.
3168@end example
3169
3170
3171@item \>
3172Matches the end of a word.
3173
3174@example
3175$ echo "abc %-= def." | sed 's/\>/X/g'
3176abcX %-= defX.
3177@end example
3178
3179
3180@item \`
3181Matches only at the start of pattern space. This is different
3182from @code{^} in multi-line mode.
3183
3184Compare the following two examples:
3185
3186@example
3187$ printf "a\nb\nc\n" | sed 'N;N;s/^/X/gm'
3188Xa
3189Xb
3190Xc
3191
3192$ printf "a\nb\nc\n" | sed 'N;N;s/\`/X/gm'
3193Xa
3194b
3195c
3196@end example
3197
3198@item \'
3199Matches only at the end of pattern space. This is different
3200from @code{$} in multi-line mode.
3201
3202
3203
3204@end table
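
@noindent
The difference between @code{\'} and @code{$} in multi-line mode can be
shown with a sketch that mirrors the @code{\`} example above. It
assumes GNU @command{sed} (for the @code{M} flag); the script is
double-quoted only so that the apostrophe can be written literally:

```shell
# With the M flag, $ matches at the end of every embedded line.
printf 'a\nb\nc\n' | sed 'N;N;s/$/X/gm'
# prints: aX
#         bX
#         cX

# \' matches only at the very end of the pattern space.
printf 'a\nb\nc\n' | sed "N;N;s/\\'/X/gm"
# prints: a
#         b
#         cX
```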
3205
3206
3207@node Back-references and Subexpressions
3208@section Back-references and Subexpressions
3209@cindex subexpression
3210@cindex back-reference
3211
3212@dfn{Back-references} are regular expression constructs that refer to a
3213previous part of the matched regular expression. Back-references are
3214specified with backslash and a single digit (e.g. @samp{\1}). The
3215part of the regular expression they refer to is called a
3216@dfn{subexpression}, and is designated with parentheses.
3217
3218Back-references and subexpressions are used in two cases: in the
3219regular expression search pattern, and in the @var{replacement} part
3220of the @command{s} command (@pxref{Regexp Addresses,,Regular
3221Expression Addresses} and @ref{The "s" Command}).
3222
3223In a regular expression pattern, back-references are used to match
3224the same content as a previously matched subexpression. In the
3225following example, the subexpression is @samp{.} - any single
3226character (being surrounded by parentheses makes it a
3227subexpression). The back-reference @samp{\1} asks to match the same
3228content (same character) as the subexpression.
3229
3230The command below matches words starting with any character,
3231followed by the letter @samp{o}, followed by the same character as the
3232first.
3233
3234@example
3235$ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words
3236bob
3237mom
3238non
3239pop
3240sos
3241tot
3242wow
3243@end example
3244
3245Multiple subexpressions are automatically numbered from
3246left-to-right. This command searches for 6-letter
3247palindromes (the first three letters are 3 subexpressions,
3248followed by 3 back-references in reverse order):
3249
3250@example
3251$ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words
3252redder
3253@end example
3254
3255In the @command{s} command, back-references can be
3256used in the @var{replacement} part to refer back to subexpressions in
3257the @var{regexp} part.
3258
3259The following example uses two subexpressions in the regular
3260expression to match two space-separated words. The back-references in
3261the @var{replacement} part print the words in a different order:
3262
3263@example
3264$ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
3265The name is Bond, James Bond.
3266@end example
3267
3268
3269When used with alternation, if the group does not participate in the
3270match then the back-reference makes the whole match fail. For
3271example, @samp{a(.)|b\1} will not match @samp{ba}. When multiple
3272regular expressions are given with @option{-e} or from a file
3273(@samp{-f @var{file}}), back-references are local to each expression.
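
@noindent
Two further sketches of back-references, one in the search pattern and
one in the replacement. They assume GNU @command{sed} (for @option{-E},
@code{\b} and @code{\w}); the sample inputs are made up:

```shell
# In the pattern: \1 must repeat the word captured by (\w+),
# so an accidentally doubled word is collapsed into one.
echo 'this is is a test' | sed -E 's/\b(\w+) \1\b/\1/'
# prints: this is a test

# In the replacement: swap the two captured fields.
echo 'key=value' | sed -E 's/([^=]+)=(.*)/\2=\1/'
# prints: value=key
```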
3274
3275
3276@node Escapes
3277@section Escape Sequences - specifying special characters
3278
3279@cindex GNU extensions, special escapes
3280Until this chapter, we have only encountered escapes of the form
3281@samp{\^}, which tell @command{sed} not to interpret the circumflex
3282as a special character, but rather to take it literally. For
3283example, @samp{\*} matches a single asterisk rather than zero
3284or more backslashes.
3285
3286@cindex @code{POSIXLY_CORRECT} behavior, escapes
3287This chapter introduces another kind of escape@footnote{All
3288the escapes introduced here are GNU
3289extensions, with the exception of @code{\n}. In basic regular
3290expression mode, setting @code{POSIXLY_CORRECT} disables them inside
3291bracket expressions.}---that
3292is, escapes that are applied to a character or sequence of characters
3293that ordinarily are taken literally, and that @command{sed} replaces
3294with a special character. This provides a way
3295of encoding non-printable characters in patterns in a visible manner.
3296There is no restriction on the appearance of non-printing characters
3297in a @command{sed} script but when a script is being prepared in the
3298shell or by text editing, it is usually easier to use one of
3299the following escape sequences than the binary character it
3300represents.
3301
3302The list of these escapes is:
3303
3304@table @code
3305@item \a
3306Produces or matches a @sc{bel} character, that is an ``alert'' (@sc{ascii} 7).
3307
3308@item \f
3309Produces or matches a form feed (@sc{ascii} 12).
3310
3311@item \n
3312Produces or matches a newline (@sc{ascii} 10).
3313
3314@item \r
3315Produces or matches a carriage return (@sc{ascii} 13).
3316
3317@item \t
3318Produces or matches a horizontal tab (@sc{ascii} 9).
3319
3320@item \v
3321Produces or matches a so-called ``vertical tab'' (@sc{ascii} 11).
3322
3323@item \c@var{x}
3324Produces or matches @kbd{@sc{Control}-@var{x}}, where @var{x} is
3325any character. The precise effect of @samp{\c@var{x}} is as follows:
3326if @var{x} is a lower case letter, it is converted to upper case.
3327Then bit 6 of the character (hex 40) is inverted. Thus @samp{\cz} becomes
3328hex 1A, but @samp{\c@{} becomes hex 3B, while @samp{\c;} becomes hex 7B.
3329
3330@item \d@var{xxx}
3331Produces or matches a character whose decimal @sc{ascii} value is @var{xxx}.
3332
3333@item \o@var{xxx}
3334Produces or matches a character whose octal @sc{ascii} value is @var{xxx}.
3335
3336@item \x@var{xx}
3337Produces or matches a character whose hexadecimal @sc{ascii} value is @var{xx}.
3338@end table
3339
3340@samp{\b} (backspace) was omitted because of the conflict with
3341the existing ``word boundary'' meaning.
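
@noindent
The escapes can be used both in the pattern and in the replacement.
The following sketch assumes GNU @command{sed}; it deliberately uses
only the tab and space characters, which have no special meaning in
regular expressions:

```shell
# \t in the pattern matches a tab.
printf 'a\tb\n' | sed 's/\t/<TAB>/'     # prints: a<TAB>b

# Three spellings of the space character (hex 20, octal 40, decimal 032).
echo 'a b' | sed 's/\x20/_/'            # prints: a_b
echo 'a b' | sed 's/\o40/_/'            # prints: a_b
echo 'a b' | sed 's/\d032/_/'           # prints: a_b

# \t in the replacement produces a tab; compare with printf's own tab.
[ "$(echo 'a,b' | sed 's/,/\t/')" = "$(printf 'a\tb')" ] && echo 'tab inserted'
```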
3342
3343@subsection Escaping Precedence
3344
3345@value{SSED} processes escape sequences @emph{before} passing
3346the text onto the regular-expression matching of the @command{s///} command
3347and Address matching. Thus the following two commands are equivalent
3348(@samp{0x5e} is the hexadecimal @sc{ascii} value of the character @samp{^}):
3349
3350@codequoteundirected on
3351@codequotebacktick on
3352@example
3353@group
3354$ echo 'a^c' | sed 's/^/b/'
3355ba^c
3356
3357$ echo 'a^c' | sed 's/\x5e/b/'
3358ba^c
3359@end group
3360@end example
3361@codequoteundirected off
3362@codequotebacktick off
3363
3364As are the following (@samp{0x5b},@samp{0x5d} are the hexadecimal
3365@sc{ascii} values of @samp{[},@samp{]}, respectively):
3366
3367@codequoteundirected on
3368@codequotebacktick on
3369@example
3370@group
3371$ echo abc | sed 's/[a]/x/'
3372xbc
3373$ echo abc | sed 's/\x5ba\x5d/x/'
3374xbc
3375@end group
3376@end example
3377@codequoteundirected off
3378@codequotebacktick off
3379
3380However it is recommended to avoid such special characters
3381due to unexpected edge-cases. For example, the following
3382are not equivalent:
3383
3384@codequoteundirected on
3385@codequotebacktick on
3386@example
3387@group
3388$ echo 'a^c' | sed 's/\^/b/'
3389abc
3390
3391$ echo 'a^c' | sed 's/\\\x5e/b/'
3392a^c
3393@end group
3394@end example
3395@codequoteundirected off
3396@codequotebacktick off
3397
3398@c also: this fails in different places:
3399@c $ sed 's/[//'
3400@c sed: -e expression #1, char 5: unterminated `s' command
3401@c $ sed 's/\x5b//'
3402@c sed: -e expression #1, char 8: Invalid regular expression
3403@c
3404@c which is OK but confusing to explain why (the first
3405@c fails in compile.c:snarf_char_class while the second
3406@c is passed to the regex engine and then fails).
3407
3408
3409@node Locale Considerations
3410@section Multibyte characters and Locale Considerations
3411
3412@value{SSED} processes valid multibyte characters in multibyte locales
3413(e.g. @code{UTF-8}). @footnote{Some regexp edge-cases depend on the
3414operating system and libc implementation. The examples shown are known
3415to work as expected on GNU/Linux systems using glibc.}
3416
3417@noindent The following example uses the Greek letter Capital Sigma
3418(@value{ucsigma},
3419Unicode code point @code{0x03A3}). In a @code{UTF-8} locale,
3420@command{sed} correctly processes the Sigma as one character despite
3421it being 2 octets (bytes):
3422
3423@codequoteundirected on
3424@codequotebacktick on
3425@example
3426@group
3427$ locale | grep LANG
3428LANG=en_US.UTF-8
3429
3430$ printf 'a\u03A3b'
3431a@value{ucsigma}b
3432
3433$ printf 'a\u03A3b' | sed 's/./X/g'
3434XXX
3435
3436$ printf 'a\u03A3b' | od -tx1 -An
3437 61 ce a3 62
3438@end group
3439@end example
3440@codequoteundirected off
3441@codequotebacktick off
3442
3443@noindent
3444To force @command{sed} to process octets separately, use the @code{C} locale
3445(also known as the @code{POSIX} locale):
3446
3447@codequoteundirected on
3448@codequotebacktick on
3449@example
3450$ printf 'a\u03A3b' | LC_ALL=C sed 's/./X/g'
3451XXXX
3452@end example
3453@codequoteundirected off
3454@codequotebacktick off
3455
3456@subsection Invalid multibyte characters
3457
3458@command{sed}'s regular expressions @emph{do not} match
3459invalid multibyte sequences in a multibyte locale.
3460
3461@noindent
3462In the following examples, the octet @code{0xCE} is
3463an incomplete multibyte character (shown here as @value{unicodeFFFD}).
3464The regular expression @samp{.} does not match it:
3465
3466@codequoteundirected on
3467@codequotebacktick on
3468@example
3469@group
3470$ printf 'a\xCEb\n'
3471a@value{unicodeFFFD}b
3472
3473$ printf 'a\xCEb\n' | sed 's/./X/g'
3474X@value{unicodeFFFD}X
3475
3476$ printf 'a\xCEc\n' | sed 's/./X/g' | od -tx1c -An
3477 58 ce 58 0a
3478 X X \n
3479@end group
3480@end example
3481@codequoteundirected off
3482@codequotebacktick off
3483
3484@noindent Similarly, the 'catch-all' regular expression @samp{.*} does not
3485match the entire line:
3486
3487@codequoteundirected on
3488@codequotebacktick on
3489@example
3490@group
3491$ printf 'a\xCEc\n' | sed 's/.*//' | od -tx1c -An
3492 ce 63 0a
3493 c \n
3494@end group
3495@end example
3496@codequoteundirected off
3497@codequotebacktick off
3498
3499@noindent
3500@value{SSED} offers the special @command{z} command to clear the
3501current pattern space regardless of invalid multibyte characters
3502(i.e. it works like @code{s/.*//} but also removes invalid multibyte
3503characters):
3504
3505@codequoteundirected on
3506@codequotebacktick on
3507@example
3508@group
3509$ printf 'a\xCEc\n' | sed 'z' | od -tx1c -An
3510 0a
3511 \n
3512@end group
3513@end example
3514@codequoteundirected off
3515@codequotebacktick off
3516
3517@noindent Alternatively, force the @code{C} locale to process
3518each octet separately (every octet is a valid character in the @code{C}
3519locale):
3520
3521@codequoteundirected on
3522@codequotebacktick on
3523@example
3524@group
3525$ printf 'a\xCEc\n' | LC_ALL=C sed 's/.*//' | od -tx1c -An
3526 0a
3527 \n
3528@end group
3529@end example
3530@codequoteundirected off
3531@codequotebacktick off
3532
3533
3534@command{sed}'s inability to process invalid multibyte characters
3535can be used to detect such invalid sequences in a file.
3536In the following examples, the @code{\xCE\xCE} is an invalid
3537multibyte sequence, while @code{\xCE\xA3} is a valid multibyte sequence
3538(of the Greek Sigma character).
3539
3540@noindent
3541The following @command{sed} program removes all valid
3542characters using @code{s/.//g}. Any content left in the pattern space
3543(the invalid characters) is added to the hold space using the
3544@code{H} command. On the last line (@code{$}), the hold space is retrieved
3545(@code{x}), newlines are removed (@code{s/\n//g}), and any remaining
3546octets are printed unambiguously (@code{l}). Thus, any invalid
3547multibyte sequences are printed as octal values:
3548
3549@codequoteundirected on
3550@codequotebacktick on
3551@example
3552@group
3553$ printf 'ab\nc\n\xCE\xCEde\n\xCE\xA3f\n' > invalid.txt
3554
3555$ cat invalid.txt
3556ab
3557c
3558@value{unicodeFFFD}@value{unicodeFFFD}de
3559@value{ucsigma}f
3560
3561$ sed -n 's/.//g ; H ; $@{x;s/\n//g;l@}' invalid.txt
3562\316\316$
3563@end group
3564@end example
3565@codequoteundirected off
3566@codequotebacktick off
3567
3568@noindent With a few more commands, @command{sed} can print
3569the exact line number corresponding to each invalid character (line 3 here).
3570These characters can then be removed by forcing the @code{C} locale
3571and using octal escape sequences:
3572
3573@codequoteundirected on
3574@codequotebacktick on
3575@example
3576$ sed -n 's/.//g;=;l' invalid.txt | paste - - | awk '$2!="$"'
35773 \316\316$
3578
3579$ LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt
3580@end example
3581@codequoteundirected off
3582@codequotebacktick off
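
@noindent
When the line numbers are not known in advance, one possible variation
is to drop every octet outside the ASCII range. This is only a sketch,
assuming GNU @command{sed} in the @code{C} locale; note that it also
removes valid non-ASCII characters, so it fits only when ASCII-only
output is wanted. The input is written with octal @command{printf}
escapes (@code{\316} is the octet @code{0xCE}):

```shell
# In the C locale every octet is a character, so a byte range is valid.
printf 'a\316\316b\n' | LC_ALL=C sed 's/[\x80-\xFF]//g'    # prints: ab
```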
3583
3584@subsection Upper/Lower case conversion
3585
3586
3587@value{SSED}'s substitute command (@code{s}) supports upper/lower
3588case conversions using @code{\U},@code{\L} codes.
3589These conversions support multibyte characters:
3590
3591@codequoteundirected on
3592@codequotebacktick on
3593@example
3594$ printf 'ABC\u03a3\n'
3595ABC@value{ucsigma}
3596
3597$ printf 'ABC\u03a3\n' | sed 's/.*/\L&/'
3598abc@value{lcsigma}
3599@end example
3600@codequoteundirected off
3601@codequotebacktick off
3602
3603@noindent
3604@xref{The "s" Command}.
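
@noindent
A short ASCII-only sketch of the case-conversion codes, assuming GNU
@command{sed}: @code{\U} and @code{\L} convert until @code{\E} (or the
end of the replacement), while @code{\u} affects only the next
character. The inputs are made up:

```shell
# Upper-case the first word, capitalize the second.
echo 'hello world' | sed -E 's/(\w+) (\w+)/\U\1\E \u\2/'   # prints: HELLO World
# Lower-case the whole line.
echo 'Hello World' | sed 's/.*/\L&/'                       # prints: hello world
```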
3605
3606
3607@subsection Multibyte regexp character classes
3608
3609@c TODO: fix following paragraphs (copied verbatim from 'bracket
3610@c expression' section).
3611
3612In other locales, the sorting sequence is not specified, and
3613@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
3614@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
3615characters that it matches might even be erratic.
3616To obtain the traditional interpretation
3617of bracket expressions, you can use the @samp{C} locale by setting the
3618@env{LC_ALL} environment variable to the value @samp{C}.
3619
3620@example
3621# TODO: is there any real-world system/locale where 'A'
3622# is replaced by '-' ?
3623$ echo A | sed 's/[a-z]/-/'
3624A
3625@end example
3626
3627The interpretation of named character classes depends on the @env{LC_CTYPE} locale;
3628for example, @samp{[[:alnum:]]} means the character class of numbers and letters
3629in the current locale.
3630
3631@c TODO: show example of collation
3632
3633@codequoteundirected on
3634@codequotebacktick on
3635@example
3636# TODO: this works on glibc systems, not on musl-libc/freebsd/macosx.
3637$ printf 'cliché\n' | LC_ALL=fr_FR.utf8 sed 's/[[=e=]]/X/g'
3638clichX
3639@end example
3640@codequoteundirected off
3641@codequotebacktick off
3642
3643
3644@node advanced sed
3645@chapter Advanced @command{sed}: cycles and buffers
3646
3647@menu
3648* Execution Cycle:: How @command{sed} works
3649* Hold and Pattern Buffers::
3650* Multiline techniques:: Using D,G,H,N,P to process multiple lines
3651* Branching and flow control::
3652@end menu
3653
3654@node Execution Cycle
3655@section How @command{sed} Works
3656
3657@cindex Buffer spaces, pattern and hold
3658@cindex Spaces, pattern and hold
3659@cindex Pattern space, definition
3660@cindex Hold space, definition
3661@command{sed} maintains two data buffers: the active @emph{pattern} space,
3662and the auxiliary @emph{hold} space. Both are initially empty.
3663
3664@command{sed} operates by performing the following cycle on each
3665line of input: first, @command{sed} reads one line from the input
3666stream, removes any trailing newline, and places it in the pattern space.
3667Then commands are executed; each command can have an address associated
3668to it: addresses are a kind of condition code, and a command is only
3669executed if the condition is verified before the command is to be
3670executed.
3671
3672When the end of the script is reached, unless the @option{-n} option
3673is in use, the contents of pattern space are printed out to the output
3674stream, adding back the trailing newline if it was removed.@footnote{Actually,
3675if @command{sed} prints a line without the terminating newline, it will
3676nevertheless print the missing newline as soon as more text is sent to
3677the same output stream, which gives the ``least expected surprise''
3678even though it does not make commands like @samp{sed -n p} exactly
3679identical to @command{cat}.} Then the next cycle starts for the next
3680input line.
3681
3682Unless special commands (like @samp{D}) are used, the pattern space is
3683deleted between two cycles. The hold space, on the other hand, keeps
3684its data between cycles (see commands @samp{h}, @samp{H}, @samp{x},
3685@samp{g}, @samp{G} to move data between both buffers).
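
@noindent
Because the hold space survives between cycles, it can accumulate data
across lines. The following two sketches (assuming GNU @command{sed}
and the @command{seq} utility) join all input lines with commas and
print the input in reverse order:

```shell
# Copy line 1 to the hold space, append every later line to it,
# then on the last line fetch everything back and replace newlines.
seq 3 | sed -n '1h;1!H;${x;s/\n/,/g;p}'    # prints: 1,2,3

# Append the hold space to every line but the first, save the result,
# and print only at the end: the lines come out in reverse order.
seq 3 | sed -n '1!G;h;$p'
# prints: 3
#         2
#         1
```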
3686
3687@node Hold and Pattern Buffers
3688@section Hold and Pattern Buffers
3689
3690TODO
3691
3692@node Multiline techniques
3693@section Multiline techniques - using D,G,H,N,P to process multiple lines
3694
3695Multiple lines can be processed as one buffer using the
3696@code{D}, @code{G}, @code{H}, @code{N}, and @code{P} commands. They are similar to
3697their lowercase counterparts (@code{d}, @code{g},
3698@code{h}, @code{n}, @code{p}), except that these commands append or
3699remove data while respecting embedded newlines, which allows adding and
3700removing lines from the pattern and hold spaces.
3701
3702They operate as follows:
3703@table @code
3704@item D
3705@emph{deletes} line from the pattern space until the first newline,
3706and restarts the cycle.
3707
3708@item G
3709@emph{appends} line from the hold space to the pattern space, with a
3710newline before it.
3711
3712@item H
3713@emph{appends} line from the pattern space to the hold space, with a
3714newline before it.
3715
3716@item N
3717@emph{appends} line from the input file to the pattern space.
3718
3719@item P
3720@emph{prints} line from the pattern space until the first newline.
3721
3722@end table
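
@noindent
Three small sketches of these commands used on their own, assuming GNU
@command{sed} and the @command{seq} utility:

```shell
# N joins each pair of lines; the embedded newline becomes a space.
seq 6 | sed 'N;s/\n/ /'
# prints: 1 2
#         3 4
#         5 6

# P prints only the first line of each two-line pattern space.
seq 4 | sed -n 'N;P'
# prints: 1
#         3

# G appends a newline and the (empty) hold space: double-spaced output.
seq 3 | sed G
```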
3723
3724
3725The following example illustrates the operation of @code{N} and
3726@code{D} commands:
3727
3728@codequoteundirected on
3729@codequotebacktick on
3730@example
3731@group
3732$ seq 6 | sed -n 'N;l;D'
37331\n2$
37342\n3$
37353\n4$
37364\n5$
37375\n6$
3738@end group
3739@end example
3740@codequoteundirected off
3741@codequotebacktick off
3742
3743@enumerate
3744@item
3745@command{sed} starts by reading the first line into the pattern space
3746(i.e. @samp{1}).
3747@item
3748At the beginning of every cycle, the @code{N}
3749command appends a newline and the next line to the pattern space
3750(i.e. @samp{1}, @samp{\n}, @samp{2} in the first cycle).
3751@item
3752The @code{l} command prints the content of the pattern space
3753unambiguously.
3754@item
3755The @code{D} command then removes the content of pattern
3756space up to the first newline (leaving @samp{2} at the end of
3757the first cycle).
3758@item
3759At the next cycle the @code{N} command appends a
3760newline and the next input line to the pattern space
3761(e.g. @samp{2}, @samp{\n}, @samp{3}).
3762@end enumerate
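
@noindent
The same sliding two-line window can be put to work. As a sketch
(assuming GNU @command{sed} and the @command{seq} utility), the
following prints only the last two lines of its input: @code{N} and
@code{D} are skipped on the last line, so the final window is printed
when the script ends:

```shell
# Keep a two-line window; print it only once the last line is reached.
seq 6 | sed '$!N;$!D'
# prints: 5
#         6
```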
3763
3764
3765@cindex processing paragraphs
3766@cindex paragraphs, processing
3767A common technique to process blocks of text such as paragraphs
3768(instead of line-by-line) is using the following construct:
3769
3770@codequoteundirected on
3771@codequotebacktick on
3772@example
3773sed '/./@{H;$!d@} ; x ; s/REGEXP/REPLACEMENT/'
3774@end example
3775@codequoteundirected off
3776@codequotebacktick off
3777
3778@enumerate
3779@item
3780The first expression, @code{/./@{H;$!d@}} operates on all non-empty lines,
3781and adds the current line (in the pattern space) to the hold space.
3782On all lines except the last, the pattern space is deleted and the cycle is
3783restarted.
3784
3785@item
3786The other expressions @code{x} and @code{s} are executed only on empty
3787lines (i.e. paragraph separators). The @code{x} command fetches the
3788accumulated lines from the hold space back to the pattern space. The
3789@code{s///} command then operates on all the text in the paragraph
3790(including the embedded newlines).
3791@end enumerate
3792
3793The following example demonstrates this technique:
3794@codequoteundirected on
3795@codequotebacktick on
3796@example
3797@group
3798$ cat input.txt
3799a a a aa aaa
3800aaaa aaaa aa
3801aaaa aaa aaa
3802
3803bbbb bbb bbb
3804bb bb bbb bb
3805bbbbbbbb bbb
3806
3807ccc ccc cccc
3808cccc ccccc c
3809cc cc cc cc
3810
3811$ sed '/./@{H;$!d@} ; x ; s/^/\nSTART-->/ ; s/$/\n<--END/' input.txt
3812
3813START-->
3814a a a aa aaa
3815aaaa aaaa aa
3816aaaa aaa aaa
3817<--END
3818
3819START-->
3820bbbb bbb bbb
3821bb bb bbb bb
3822bbbbbbbb bbb
3823<--END
3824
3825START-->
3826ccc ccc cccc
3827cccc ccccc c
3828cc cc cc cc
3829<--END
3830@end group
3831@end example
3832@codequoteundirected off
3833@codequotebacktick off
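
@noindent
The same construct can select whole paragraphs instead of editing
them. In the following sketch (assuming GNU @command{sed}; the input
and the word @samp{foo} are made up), a paragraph is printed only if
some line in it contains @samp{foo}. Each printed paragraph starts
with an empty line, because @code{H} puts a newline before every line
it appends:

```shell
# Collect each paragraph in the hold space, swap it back in at the
# separator (or at the last line), and delete it unless it has 'foo'.
printf 'a\nfoo\n\nb\nbar\n' | sed '/./{H;$!d} ; x ; /foo/!d'
# prints an empty line, then: a
#                             foo
```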
3834
3835For more annotated examples, @pxref{Text search across multiple lines}
3836and @ref{Line length adjustment}.
3837
3838@node Branching and flow control
3839@section Branching and Flow Control
3840
3841The branching commands @code{b}, @code{t}, and @code{T} enable
3842changing the flow of @command{sed} programs.
3843
3844By default, @command{sed} reads an input line into the pattern buffer,
3845then processes all commands in order.
3846Commands without addresses affect all lines.
3847Commands with addresses affect only matching lines.
3848@xref{Execution Cycle} and @ref{Addresses overview}.
3849
3850@command{sed} does not support a typical @code{if/then} construct.
3851Instead, some commands can be used as conditionals or to change the
3852default flow control:
3853
3854@table @code
3855
3856@item d
3857delete (clears) the current pattern space,
3858and restart the program cycle without processing the rest of the commands
3859and without printing the pattern space.
3860
3861@item D
3862delete the contents of the pattern space @emph{up to the first newline},
3863and restart the program cycle without processing the rest of
3864the commands and without printing the pattern space.
3865
3866@item [addr]X
3867@itemx [addr]@{ X ; X ; X @}
3868@itemx /regexp/X
3869@itemx /regexp/@{ X ; X ; X @}
3870Addresses and regular expressions can be used as an @code{if/then}
3871conditional: If @var{[addr]} matches the current pattern space,
3872execute the command(s).
3873For example: The command @code{/^#/d} means:
3874@emph{if} the current pattern space matches the regular expression @code{^#} (a line
3875starting with a hash), @emph{then} execute the @code{d} command:
3876delete the line without printing it, and restart the program cycle
3877immediately.
3878
3879@item b
3880branch unconditionally (that is: always jump to a label, skipping
3881or repeating other commands, without restarting a new cycle). Combined
3882with an address, the branch can be conditionally executed on matched
3883lines.
3884
3885@item t
3886branch conditionally (that is: jump to a label) @emph{only if} a
3887@code{s///} command has succeeded since the last input line was read
3888or another conditional branch was taken.
3889
3890@item T
3891similar but opposite to the @code{t} command: branch only if
3892there have been @emph{no} successful substitutions since the last
3893input line was read.
3894@end table
3895
3896
3897The following two @command{sed} programs are equivalent. The first
3898(contrived) example uses the @code{b} command to skip the @code{s///}
3899command on lines containing @samp{1}. The second example uses an
3900address with negation (@samp{!}) to perform substitution only on
3901desired lines. The @code{y///} command is still executed on all
3902lines:
3903
3904@codequoteundirected on
3905@codequotebacktick on
3906@example
3907@group
3908$ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
3909a4
3910z5
3911z6
3912
3913$ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
3914a4
3915z5
3916z6
3917@end group
3918@end example
3919@codequoteundirected off
3920@codequotebacktick off
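
The @code{t} command can play the role of an @code{if/else}. The
following is a minimal sketch, not part of the original examples: the
input words and the variable name @code{out} are made up for
illustration, and each command is given its own @option{-e} so that the
label-less @code{t} does not depend on GNU parsing of semicolons.

```shell
# If the first s/// succeeds, 't' (no label) skips the rest of the script;
# otherwise the second s/// runs on the unchanged line.
out=$(printf '%s\n' apple banana |
  sed -e 's/^a/A/' -e 't' -e 's/$/ (unchanged)/')
printf '%s\n' "$out"
```

The first line should come out as @samp{Apple}, the second as
@samp{banana (unchanged)}.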
3921
3922
3923
3924@subsection Branching and Cycles
3925@cindex labels
3926@cindex omitting labels
3927@cindex cycle, restarting
3928@cindex restarting a cycle
The @code{b}, @code{t} and @code{T} commands can be followed by a label
(typically a single letter). Labels are defined with a colon followed by
one or more letters (e.g. @samp{:x}). If the label is omitted, the
branch commands restart the cycle. Note the difference between
branching to a label and restarting the cycle: when a cycle is
restarted, @command{sed} first prints the current content of the
pattern space, then reads the next input line into the pattern space.
Jumping to a label (even if it is at the beginning of the program)
does not print the pattern space and does not read the next input line.
3938
3939The following program is a no-op. The @code{b} command (the only command
3940in the program) does not have a label, and thus simply restarts the cycle.
3941On each cycle, the pattern space is printed and the next input line is read:
3942
3943@example
3944@group
3945$ seq 3 | sed b
39461
39472
39483
3949@end group
3950@end example
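
As a further sketch (the variable names @code{all} and @code{some} are
only illustrative), a label-less @code{b} also skips every command that
follows it in the program. Combined with an address, only the matching
lines skip the remaining commands:

```shell
# 'b' without a label ends the cycle early, so s/.*/X/ never runs.
all=$(printf '%s\n' 1 2 3 | sed -e 'b' -e 's/.*/X/')
# With an address, only line 2 takes the branch and skips s/.*/X/.
some=$(printf '%s\n' 1 2 3 | sed -e '2b' -e 's/.*/X/')
printf '%s\n' "$all" "$some"
```

The first command leaves the three input lines untouched; the second
replaces lines 1 and 3 with @samp{X} and keeps line 2.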
3951
3952@cindex infinite loop, branching
3953@cindex branching, infinite loop
The following example is an infinite loop: it doesn't terminate and
doesn't print anything. The @code{b} command jumps to the @samp{x}
label, and a new cycle is never started:
3957
3958@codequoteundirected on
3959@codequotebacktick on
3960@example
3961@group
3962$ seq 3 | sed ':x ; bx'
3963
# The above command requires GNU sed (which supports additional
3965# commands following a label, without a newline). A portable equivalent:
3966# sed -e ':x' -e bx
3967@end group
3968@end example
3969@codequoteundirected off
3970@codequotebacktick off
3971
3972@cindex branching and n, N
3973@cindex n, and branching
3974@cindex N, and branching
3975Branching is often complemented with the @code{n} or @code{N} commands:
3976both commands read the next input line into the pattern space without waiting
3977for the cycle to restart. Before reading the next input line, @code{n}
3978prints the current pattern space then empties it, while @code{N}
3979appends a newline and the next input line to the pattern space.
3980
3981Consider the following two examples:
3982
3983@codequoteundirected on
3984@codequotebacktick on
3985@example
3986@group
3987$ seq 3 | sed ':x ; n ; bx'
39881
39892
39903
3991
3992$ seq 3 | sed ':x ; N ; bx'
39931
39942
39953
3996@end group
3997@end example
3998@codequoteundirected off
3999@codequotebacktick off
4000
4001@itemize
4002@item
Neither example loops forever, despite never starting a new cycle.

@item
In the first example, the @code{n} command first prints the content
of the pattern space, empties the pattern space, then reads the next
input line.

@item
In the second example, the @code{N} command appends the next input
line to the pattern space (with a newline). Lines are accumulated in
the pattern space until there are no more input lines to read, then
the @code{N} command terminates the @command{sed} program. When the
program terminates, the end-of-cycle actions are performed, and the
entire pattern space is printed.
4017
4018@item
4019The second example requires @value{SSED},
4020because it uses the non-POSIX-standard behavior of @code{N}.
4021See the ``@code{N} command on the last line'' paragraph
4022in @ref{Reporting Bugs}.
4023
4024@item
4025To further examine the difference between the two examples,
4026try the following commands:
4027@codequoteundirected on
4028@codequotebacktick on
4029@example
4030@group
4031printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
4032printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
4033printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
4034printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
4035@end group
4036@end example
4037@codequoteundirected off
4038@codequotebacktick off
4039
4040@end itemize
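
The four commands listed in the last item can be captured and compared
as follows. This is a sketch that assumes GNU @command{sed} with
@env{POSIXLY_CORRECT} unset; the variable names are only illustrative.

```shell
# n prints each line before '=' reports the number of the line just read.
n_eq=$(printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx')
# N accumulates lines, so all line numbers appear before any text.
N_eq=$(printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx')
# With n the pattern space never holds a newline, so s/\n/***/ never matches.
n_sub=$(printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx')
# With N every appended newline is replaced, joining all lines.
N_sub=$(printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx')
printf '%s\n' "$n_eq" "$N_eq" "$n_sub" "$N_sub"
```

With @code{n} the line numbers are interleaved with the text
(@samp{aa}, @samp{2}, @samp{bb}, @samp{3}, and so on), while with
@code{N} the numbers 2, 3 and 4 are printed first and the accumulated
lines follow. The last command prints the single line
@samp{aa***bb***cc***dd}.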
4041
4042
4043
4044@subsection Branching example: joining lines
4045
4046@cindex joining lines with branching
4047@cindex branching, joining lines
4048@cindex quoted-printable lines, joining
4049@cindex joining quoted-printable lines
4050@cindex t, joining lines with
4051@cindex b, joining lines with
4052@cindex b, versus t
4053@cindex t, versus b
4054As a real-world example of using branching, consider the case of
4055@uref{https://en.wikipedia.org/wiki/Quoted-printable,quoted-printable} files,
4056typically used to encode email messages.
4057In these files long lines are split and marked with a @dfn{soft line break}
4058consisting of a single @samp{=} character at the end of the line:
4059
4060@example
4061@group
4062$ cat jaques.txt
4063All the wor=
4064ld's a stag=
4065e,
4066And all the=
4067 men and wo=
4068men merely =
4069players:
4070They have t=
4071heir exits =
4072and their e=
4073ntrances;
4074And one man=
4075 in his tim=
4076e plays man=
4077y parts.
4078@end group
4079@end example
4080
4081
The following program uses an address match @samp{/=$/} as a
conditional: If the current pattern space ends with @samp{=}, it
reads the next input line using @code{N}, replaces all @samp{=}
characters which are followed by a newline, and unconditionally
branches (@code{b}) to the beginning of the program without restarting
a new cycle. If the pattern space does not end with @samp{=}, the
default action is performed: the pattern space is printed and a new
cycle is started:
4090
4091@codequoteundirected on
4092@codequotebacktick on
4093@example
4094@group
4095$ sed ':x ; /=$/ @{ N ; s/=\n//g ; bx @}' jaques.txt
4096All the world's a stage,
4097And all the men and women merely players:
4098They have their exits and their entrances;
4099And one man in his time plays many parts.
4100@end group
4101@end example
4102@codequoteundirected off
4103@codequotebacktick off
4104
Here's an alternative program with a slightly different approach: On
all lines except the last, @code{N} appends the next input line to the
pattern space. A substitution command then removes soft line breaks
(@samp{=} at the end of a line, i.e. followed by a newline) by replacing
them with an empty string.
@emph{If} the substitution was successful (meaning the pattern space contained
a line which should be joined), the conditional branch command @code{t} jumps
to the beginning of the program without completing or restarting the cycle.
If the substitution failed (meaning there were no soft line breaks),
the @code{t} command will @emph{not} branch. Then, @code{P} will
print the pattern space content until the first newline, and @code{D}
will delete the pattern space content until the first newline.
(To learn more about the @code{N}, @code{P} and @code{D} commands
@pxref{Multiline techniques}).
4119
4120
4121@codequoteundirected on
4122@codequotebacktick on
4123@example
4124@group
4125$ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
4126All the world's a stage,
4127And all the men and women merely players:
4128They have their exits and their entrances;
4129And one man in his time plays many parts.
4130@end group
4131@end example
4132@codequoteundirected off
4133@codequotebacktick off
4134
4135
4136For more line-joining examples @pxref{Joining lines}.
4137
4138
4139@node Examples
4140@chapter Some Sample Scripts
4141
4142Here are some @command{sed} scripts to guide you in the art of mastering
4143@command{sed}.
4144
4145@menu
4146
4147Useful one-liners:
4148* Joining lines::
4149
4150Some exotic examples:
4151* Centering lines::
4152* Increment a number::
4153* Rename files to lower case::
4154* Print bash environment::
4155* Reverse chars of lines::
4156* Text search across multiple lines::
4157* Line length adjustment::
4158* Adding a header to multiple files::
4159
4160Emulating standard utilities:
4161* tac:: Reverse lines of files
4162* cat -n:: Numbering lines
4163* cat -b:: Numbering non-blank lines
4164* wc -c:: Counting chars
4165* wc -w:: Counting words
4166* wc -l:: Counting lines
4167* head:: Printing the first lines
4168* tail:: Printing the last lines
4169* uniq:: Make duplicate lines unique
4170* uniq -d:: Print duplicated lines of input
4171* uniq -u:: Remove all duplicated lines
4172* cat -s:: Squeezing blank lines
4173@end menu
4174
4175@node Joining lines
4176@section Joining lines
4177
4178This section uses @code{N}, @code{D} and @code{P} commands to process
4179multiple lines, and the @code{b} and @code{t} commands for branching.
4180@xref{Multiline techniques} and @ref{Branching and flow control}.
4181
4182Join specific lines (e.g. if lines 2 and 3 need to be joined):
4183
4184@codequoteundirected on
4185@codequotebacktick on
4186@example
4187$ cat lines.txt
4188hello
4189hel
4190lo
4191hello
4192
4193$ sed '2@{N;s/\n//;@}' lines.txt
4194hello
4195hello
4196hello
4197@end example
4198@codequoteundirected off
4199@codequotebacktick off
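
The same idea can be applied to every line instead of a single address.
The following sketch (input and the variable name @code{pairs} are made
up for illustration) joins lines two by two; the @samp{$!} address keeps
@code{N} from running on the last line, so an unpaired final line is
printed as is:

```shell
# Append the next line (except on the last line), then turn the embedded
# newline into a space.
pairs=$(printf '%s\n' 1 2 3 4 5 | sed -e '$!N' -e 's/\n/ /')
printf '%s\n' "$pairs"
```

This should print @samp{1 2}, @samp{3 4} and then @samp{5} on its own
line.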
4200
4201Join backslash-continued lines:
4202
4203@codequoteundirected on
4204@codequotebacktick on
4205@example
4206$ cat 1.txt
4207this \
4208is \
4209a \
4210long \
4211line
4212and another \
4213line
4214
4215$ sed -e ':x /\\$/ @{ N; s/\\\n//g ; bx @}' 1.txt
4216this is a long line
4217and another line
4218
4219
# Note: the above requires GNU sed.
# Non-GNU seds need newlines after ':' and 'b'
4222@end example
4223@codequoteundirected off
4224@codequotebacktick off
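
A variation that avoids both the grouping braces and the GNU-specific
label syntax can be sketched as follows. It is an assumption-laden
rewrite rather than the canonical form: each command goes in its own
@option{-e}, and the negated address @samp{/\\$/!b} ends the cycle as
soon as the pattern space no longer ends with a backslash. The sample
input reproduces @file{1.txt} inline.

```shell
# Lines not ending in a backslash take the label-less branch and are
# printed; otherwise append the next line and drop the backslash-newline.
joined=$(printf '%s\n' 'this \' 'is \' 'a \' 'long \' 'line' \
                       'and another \' 'line' |
  sed -e ':x' -e '/\\$/!b' -e 'N' -e 's/\\\n//' -e 'bx')
printf '%s\n' "$joined"
```

The output should match the two joined lines shown in the example above.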
4225
Join lines that start with whitespace (e.g. SMTP headers):
4227
4228@codequoteundirected on
4229@codequotebacktick on
4230@example
4231@group
4232$ cat 2.txt
4233Subject: Hello
4234 World
4235Content-Type: multipart/alternative;
4236 boundary=94eb2c190cc6370f06054535da6a
4237Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
4238Authentication-Results: mx.gnu.org;
4239 dkim=pass header.i=@@gnu.org;
4240 spf=pass
4241Message-ID: <abcdef@@gnu.org>
4242From: John Doe <jdoe@@gnu.org>
4243To: Jane Smith <jsmith@@gnu.org>
4244
4245$ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt
4246Subject: Hello World
4247Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a
4248Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
4249Authentication-Results: mx.gnu.org; dkim=pass header.i=@@gnu.org; spf=pass
4250Message-ID: <abcdef@@gnu.org>
4251From: John Doe <jdoe@@gnu.org>
4252To: Jane Smith <jsmith@@gnu.org>
4253
4254# A portable (non-gnu) variation:
4255# sed -e :a -e '$!N;s/\n */ /;ta' -e 'P;D'
4256@end group
4257@end example
4258@codequoteundirected off
4259@codequotebacktick off
4260
4261
4262@node Centering lines
4263@section Centering Lines
4264
This script centers all lines of a file on an 80-column width.
To change that width, the number in @code{\@{@dots{}\@}} must be
replaced, and the number of added spaces must also be changed.
4268
4269Note how the buffer commands are used to separate parts in
4270the regular expressions to be matched---this is a common
4271technique.
4272
4273@c start-------------------------------------------
4274@example
4275#!/usr/bin/sed -f
4276
4277@group
4278# Put 80 spaces in the buffer
42791 @{
4280 x
  s/^$/          /
4282 s/^.*$/&&&&&&&&/
4283 x
4284@}
4285@end group
4286
4287@group
4288# delete leading and trailing spaces
4289y/@kbd{@key{TAB}}/ /
4290s/^ *//
4291s/ *$//
4292@end group
4293
4294@group
4295# add a newline and 80 spaces to end of line
4296G
4297@end group
4298
4299@group
4300# keep first 81 chars (80 + a newline)
4301s/^\(.\@{81\@}\).*$/\1/
4302@end group
4303
4304@group
4305# \2 matches half of the spaces, which are moved to the beginning
4306s/^\(.*\)\n\(.*\)\2/\2\1/
4307@end group
4308@end example
4309@c end---------------------------------------------
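
To illustrate what changing the width involves, here is a sketch of the
same technique for a 10-column width. It is not the script above
verbatim: the ten spaces are generated with @command{printf} into a
hypothetical variable @code{pad}, the interval expression is written out
as eleven dots (ten columns plus the newline), and addresses replace the
grouping braces. It relies on GNU @command{sed}, where @samp{.} also
matches a newline. Tab handling is omitted.

```shell
pad=$(printf '%10s' '')            # ten spaces for the hold space
centered=$(printf '%s\n' 'ab' 'abcdefghij' | sed \
  -e '1x' -e "1s/^\$/$pad/" -e '1x' \
  -e 's/^ *//' -e 's/ *$//' \
  -e 'G' \
  -e 's/^\(...........\).*$/\1/' \
  -e 's/^\(.*\)\n\(.*\)\2/\2\1/')
printf '%s\n' "$centered"
```

The two-character line should be printed with four leading spaces, while
the line that already fills ten columns is left unchanged.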
4310
4311@node Increment a number
4312@section Increment a Number
4313
4314This script is one of a few that demonstrate how to do arithmetic
4315in @command{sed}. This is indeed possible,@footnote{@command{sed} guru Greg
4316Ubben wrote an implementation of the @command{dc} @sc{rpn} calculator!
4317It is distributed together with sed.} but must be done manually.
4318
To increment a number, add 1 to its last digit, replacing
it with the following digit. There is one exception: when the digit
is a nine, the preceding digits must also be incremented until a
digit other than nine is reached.

This solution by Bruno Haible is very clever because
it uses a single buffer; if you don't have this limitation, the
algorithm used in @ref{cat -n, Numbering lines}, is faster.
It works by replacing trailing nines with an underscore, then
using multiple @code{s} commands to increment the last digit,
and then again substituting underscores with zeros.
4330
4331@c start-------------------------------------------
4332@example
4333#!/usr/bin/sed -f
4334
4335/[^0-9]/ d
4336
4337@group
4338# replace all trailing 9s by _ (any other character except digits, could
4339# be used)
4340:d
4341s/9\(_*\)$/_\1/
4342td
4343@end group
4344
4345@group
4346# incr last digit only. The first line adds a most-significant
4347# digit of 1 if we have to add a digit.
4348@end group
4349
4350@group
4351s/^\(_*\)$/1\1/; tn
4352s/8\(_*\)$/9\1/; tn
4353s/7\(_*\)$/8\1/; tn
4354s/6\(_*\)$/7\1/; tn
4355s/5\(_*\)$/6\1/; tn
4356s/4\(_*\)$/5\1/; tn
4357s/3\(_*\)$/4\1/; tn
4358s/2\(_*\)$/3\1/; tn
4359s/1\(_*\)$/2\1/; tn
4360s/0\(_*\)$/1\1/; tn
4361@end group
4362
4363@group
4364:n
4365y/_/0/
4366@end group
4367@end example
4368@c end---------------------------------------------
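
The stages of the script can be observed separately. The following
sketch runs only the first loop on the sample number 1299, then applies
the one digit substitution that is relevant for that value followed by
the final @code{y} command; the variable names @code{stage1} and
@code{final} are only illustrative.

```shell
# Stage 1 only: turn the trailing nines of 1299 into underscores.
stage1=$(echo 1299 | sed -e ':d' -e 's/9\(_*\)$/_\1/' -e 'td')
# Stages 2 and 3 for that intermediate value: bump the last digit, then
# turn the underscores into zeros.
final=$(echo "$stage1" | sed -e 's/2\(_*\)$/3\1/' -e 'y/_/0/')
printf '%s -> %s\n' "$stage1" "$final"
```

The intermediate value should be @samp{12__} and the final result
@samp{1300}.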
4369
4370@node Rename files to lower case
4371@section Rename Files to Lower Case
4372
This is a pretty strange use of @command{sed}. We transform text into
shell commands, then just feed them to the shell.
Don't worry, even worse hacks are done when using @command{sed}; I have
seen a script converting the output of @command{date} into a @command{bc}
program!

The main body of this is the @command{sed} script, which remaps the name
from lower to upper (or vice-versa) and even checks
whether the remapped name is the same as the original name.
Note how the script is parameterized using shell
variables and proper quoting.
4384
4385@c start-------------------------------------------
4386@example
4387@group
4388#! /bin/sh
4389# rename files to lower/upper case...
4390#
4391# usage:
4392# move-to-lower *
4393# move-to-upper *
4394# or
4395# move-to-lower -R .
4396# move-to-upper -R .
4397#
4398@end group
4399
4400@group
4401help()
4402@{
4403 cat << eof
4404Usage: $0 [-n] [-r] [-h] files...
4405@end group
4406
4407@group
4408-n do nothing, only see what would be done
4409-R recursive (use find)
4410-h this message
4411files files to remap to lower case
4412@end group
4413
4414@group
4415Examples:
4416 $0 -n * (see if everything is ok, then...)
4417 $0 *
4418@end group
4419
4420 $0 -R .
4421
4422@group
4423eof
4424@}
4425@end group
4426
4427@group
4428apply_cmd='sh'
4429finder='echo "$@@" | tr " " "\n"'
4430files_only=
4431@end group
4432
4433@group
4434while :
4435do
4436 case "$1" in
4437 -n) apply_cmd='cat' ;;
4438 -R) finder='find "$@@" -type f';;
4439 -h) help ; exit 1 ;;
4440 *) break ;;
4441 esac
4442 shift
4443done
4444@end group
4445
4446@group
4447if [ -z "$1" ]; then
4448 echo Usage: $0 [-h] [-n] [-r] files...
4449 exit 1
4450fi
4451@end group
4452
4453@group
4454LOWER='abcdefghijklmnopqrstuvwxyz'
4455UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
4456@end group
4457
4458@group
4459case `basename $0` in
4460 *upper*) TO=$UPPER; FROM=$LOWER ;;
4461 *) FROM=$UPPER; TO=$LOWER ;;
4462esac
4463@end group
4464
4465eval $finder | sed -n '
4466
4467@group
4468# remove all trailing slashes
4469s/\/*$//
4470@end group
4471
4472@group
4473# add ./ if there is no path, only a filename
4474/\//! s/^/.\//
4475@end group
4476
4477@group
4478# save path+filename
4479h
4480@end group
4481
4482@group
4483# remove path
4484s/.*\///
4485@end group
4486
4487@group
4488# do conversion only on filename
4489y/'$FROM'/'$TO'/
4490@end group
4491
4492@group
4493# now line contains original path+file, while
4494# hold space contains the new filename
4495x
4496@end group
4497
4498@group
4499# add converted file name to line, which now contains
4500# path/file-name\nconverted-file-name
4501G
4502@end group
4503
4504@group
4505# check if converted file name is equal to original file name,
4506# if it is, do not print anything
4507/^.*\/\(.*\)\n\1/b
4508@end group
4509
4510@group
4511# escape special characters for the shell
4512s/["$`\\]/\\&/g
4513@end group
4514
4515@group
4516# now, transform path/fromfile\n, into
4517# mv path/fromfile path/tofile and print it
4518s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
4519@end group
4520
4521' | $apply_cmd
4522@end example
4523@c end---------------------------------------------
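
To see what the embedded @command{sed} program generates without
renaming anything, its core can be run on a few sample names. This is a
sketch only: the file names are made up, the shell-escaping step is left
out because the samples contain no special characters, and the commands
are passed with separate @option{-e} options instead of one quoted
script.

```shell
UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LOWER='abcdefghijklmnopqrstuvwxyz'
cmds=$(printf '%s\n' 'Foo.TXT' 'dir/Bar' 'already-lower' | sed -n \
  -e 's/\/*$//' \
  -e '/\//! s/^/.\//' \
  -e 'h' \
  -e 's/.*\///' \
  -e "y/$UPPER/$LOWER/" \
  -e 'x' \
  -e 'G' \
  -e '/^.*\/\(.*\)\n\1/b' \
  -e 's/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p')
printf '%s\n' "$cmds"
```

Only the first two names produce an @command{mv} command; the name that
is already lower case takes the @code{b} branch and prints nothing.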
4524
4525@node Print bash environment
4526@section Print @command{bash} Environment
4527
4528This script strips the definition of the shell functions
4529from the output of the @command{set} Bourne-shell command.
4530
4531@c start-------------------------------------------
4532@example
4533#!/bin/sh
4534
4535@group
4536set | sed -n '
4537:x
4538@end group
4539
4540@group
4541@ifinfo
4542# if no occurrence of "=()" print and load next line
4543@end ifinfo
4544@ifnotinfo
4545# if no occurrence of @samp{=()} print and load next line
4546@end ifnotinfo
4547/=()/! @{ p; b; @}
4548/ () $/! @{ p; b; @}
4549@end group
4550
4551@group
4552# possible start of functions section
4553# save the line in case this is a var like FOO="() "
4554h
4555@end group
4556
4557@group
4558# if the next line has a brace, we quit because
4559# nothing comes after functions
4560n
4561/^@{/ q
4562@end group
4563
4564@group
4565# print the old line
4566x; p
4567@end group
4568
4569@group
4570# work on the new line now
4571x; bx
4572'
4573@end group
4574@end example
4575@c end---------------------------------------------
4576
4577@node Reverse chars of lines
4578@section Reverse Characters of Lines
4579
4580This script can be used to reverse the position of characters
4581in lines. The technique moves two characters at a time, hence
4582it is faster than more intuitive implementations.
4583
4584Note the @code{tx} command before the definition of the label.
4585This is often needed to reset the flag that is tested by
4586the @code{t} command.
4587
4588Imaginative readers will find uses for this script. An example
4589is reversing the output of @command{banner}.@footnote{This requires
4590another script to pad the output of banner; for example
4591
4592@example
4593#! /bin/sh
4594
4595banner -w $1 $2 $3 $4 |
4596 sed -e :a -e '/^.\@{0,'$1'\@}$/ @{ s/$/ /; ba; @}' |
4597 ~/sedscripts/reverseline.sed
4598@end example
4599}
4600
4601@c start-------------------------------------------
4602@example
4603#!/usr/bin/sed -f
4604
4605/../! b
4606
4607@group
4608# Reverse a line. Begin embedding the line between two newlines
4609s/^.*$/\
4610&\
4611/
4612@end group
4613
4614@group
4615# Move first character at the end. The regexp matches until
4616# there are zero or one characters between the markers
4617tx
4618:x
4619s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
4620tx
4621@end group
4622
4623@group
4624# Remove the newline markers
4625s/\n//g
4626@end group
4627@end example
4628@c end---------------------------------------------
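
A quick way to try the technique is to pass the same commands on the
command line. The sketch below assumes GNU @command{sed}, because it
writes the newline markers as @samp{\n} in the replacement instead of a
backslash followed by a literal newline; the sample words and the
variable name @code{rev} are only illustrative.

```shell
# Lines shorter than two characters are printed unchanged by '/../! b'.
rev=$(printf '%s\n' hello ab x | sed \
  -e '/../! b' \
  -e 's/^.*$/\n&\n/' \
  -e 'tx' -e ':x' \
  -e 's/\(\n.\)\(.*\)\(.\n\)/\3\2\1/' \
  -e 'tx' \
  -e 's/\n//g')
printf '%s\n' "$rev"
```

This should print @samp{olleh}, @samp{ba} and @samp{x}.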
4629
4630
4631@node Text search across multiple lines
4632@section Text search across multiple lines
4633
4634This section uses @code{N} and @code{D} commands to search for
4635consecutive words spanning multiple lines. @xref{Multiline techniques}.
4636
4637These examples deal with finding doubled occurrences of words in a document.
4638
4639Finding doubled words in a single line is easy using GNU @command{grep}
4640and similarly with @value{SSED}:
4641
4642@c NOTE: in all examples, 'the@ the' is used to prevent
4643@c 'make syntax-check' from complaining about double words.
4644@codequoteundirected on
4645@codequotebacktick on
4646@example
4647@group
4648$ cat two-cities-dup1.txt
4649It was the best of times,
4650it was the worst of times,
4651it was the@ the age of wisdom,
4652it was the age of foolishness,
4653
4654$ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
4655it was the@ the age of wisdom,
4656
4657$ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
46583:it was the@ the age of wisdom,
4659
4660$ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt
4661it was the@ the age of wisdom,
4662
4663$ sed -En '/\b(\w+)\s+\1\b/@{=;p@}' two-cities-dup1.txt
46643
4665it was the@ the age of wisdom,
4666@end group
4667@end example
4668@codequoteundirected off
4669@codequotebacktick off
4670
4671@itemize @bullet
4672@item
4673The regular expression @samp{\b\w+\s+} searches for word-boundary (@samp{\b}),
4674followed by one-or-more word-characters (@samp{\w+}), followed by whitespace
4675(@samp{\s+}). @xref{regexp extensions}.
4676
4677@item
4678Adding parentheses around the @samp{(\w+)} expression creates a subexpression.
4679The regular expression pattern @samp{(PATTERN)\s+\1} defines a subexpression
4680(in the parentheses) followed by a back-reference, separated by whitespace.
4681A successful match means the @var{PATTERN} was repeated twice in succession.
4682@xref{Back-references and Subexpressions}.
4683
4684@item
4685The word-boundery expression (@samp{\b}) at both ends ensures partial
4686words are not matched (e.g. @samp{the then} is not a desired match).
4687@c Thanks to Jim for pointing this out in
4688@c https://lists.gnu.org/archive/html/sed-devel/2016-12/msg00041.html
4689
4690@item
4691The @option{-E} option enables extended regular expression syntax, alleviating
4692the need to add backslashes before the parenthesis. @xref{ERE syntax}.
4693
4694@end itemize
4695
4696When the doubled word span two lines the above regular expression
4697will not find them as @command{grep} and @command{sed} operate line-by-line.
4698
4699By using @command{N} and @command{D} commands, @command{sed} can apply
4700regular expressions on multiple lines (that is, multiple lines are stored
4701in the pattern space, and the regular expression works on it):
4702
4703@c NOTE: use 'the@*the' instead of a real new line to prevent
4704@c 'make syntax-check' to complain about doubled-words.
4705@codequoteundirected on
4706@codequotebacktick on
4707@example
4708$ cat two-cities-dup2.txt
4709It was the best of times, it was the
4710worst of times, it was the@*the age of wisdom,
4711it was the age of foolishness,
4712
4713$ sed -En '@{N; /\b(\w+)\s+\1\b/@{=;p@} ; D@}' two-cities-dup2.txt
47143
4715worst of times, it was the@*the age of wisdom,
4716@end example
4717@codequoteundirected off
4718@codequotebacktick off
4719
4720@itemize @bullet
4721@item
4722The @command{N} command appends the next line to the pattern space
4723(thus ensuring it contains two consecutive lines in every cycle).
4724
4725@item
4726The regular expression uses @samp{\s+} for word separator which matches
4727both spaces and newlines.
4728
4729@item
4730The regular expression matches, the entire pattern space is printed
4731with @command{p}. No lines are printed by default due to the @option{-n} option.
4732
4733@item
4734The @command{D} removes the first line from the pattern space (up until the
4735first newline), readying it for the next cycle.
4736@end itemize
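
The sliding two-line window described in the list above can be tried
without an input file. The following sketch assumes GNU @command{sed}
(for @samp{\b}, @samp{\w} and @samp{\s}); the three input lines and the
variable name @code{hits} are made up, and the line-number command is
left out so that no grouping braces are needed.

```shell
# N keeps two consecutive lines in the pattern space, p prints them when
# a word is repeated across the newline, and D drops the older line.
hits=$(printf '%s\n' 'it was the' 'the age of wisdom' 'and more' |
  sed -En -e 'N' -e '/\b(\w+)\s+\1\b/p' -e 'D')
printf '%s\n' "$hits"
```

Only the first two input lines, which share the repeated word across the
line break, should be printed.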
4737
4738See the GNU @command{coreutils} manual for an alternative solution using
4739@command{tr -s} and @command{uniq} at
4740@c NOTE: cheating and keeping the URL line shorter than 80 characters
4741@c by using 'gnu.org' and '/s/'.
4742@url{https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html}.
4743
4744@node Line length adjustment
4745@section Line length adjustment
4746
4747This section uses @code{N} and @code{P} commands to read and write
4748lines, and the @code{b} command for branching.
4749@xref{Multiline techniques} and @ref{Branching and flow control}.
4750
4751This (somewhat contrived) example deal with formatting and wrapping
4752lines of text of the following input file:
4753
4754@example
4755@group
4756$ cat two-cities-mix.txt
4757It was the best of times, it was
4758the worst of times, it
4759was the age of
4760wisdom,
4761it
4762was
4763the age
4764of foolishness,
4765@end group
4766@end example
4767
4768@exdent The following sed program wraps lines at 40 characters:
4769@codequoteundirected on
4770@codequotebacktick on
4771@example
4772@group
4773$ cat wrap40.sed
4774# outer loop
4775:x
4776
4777# Append a newline followed by the next input line to the pattern buffer
4778N
4779
4780# Remove all newlines from the pattern buffer
4781s/\n/ /g
4782
4783
4784# Inner loop
4785:y
4786
4787# Add a newline after the first 40 characters
4788s/(.@{40,40@})/\1\n/
4789
4790# If there is a newline in the pattern buffer
4791# (i.e. the previous substitution added a newline)
4792/\n/ @{
4793 # There are newlines in the pattern buffer -
4794 # print the content until the first newline.
4795 P
4796
4797 # Remove the printed characters and the first newline
4798 s/.*\n//
4799
4800 # branch to label 'y' - repeat inner loop
4801 by
4802 @}
4803
4804# No newlines in the pattern buffer - Branch to label 'x' (outer loop)
4805# and read the next input line
4806bx
4807@end group
4808@end example
4809@codequoteundirected off
4810@codequotebacktick off
4811
4812
4813
4814@exdent The wrapped output:
4815@codequoteundirected on
4816@codequotebacktick on
4817@example
4818@group
4819$ sed -E -f wrap40.sed two-cities-mix.txt
4820It was the best of times, it was the wor
4821st of times, it was the age of wisdom, i
4822t was the age of foolishness,
4823@end group
4824@end example
4825@codequoteundirected off
4826@codequotebacktick off
4827
4828
4829
4830
4831@node Adding a header to multiple files
4832@section Adding a header to multiple files
4833
4834@value{SSED} can be used to safely modify multiple files at once.
4835
4836@exdent Add a single line to the beginning of source code files:
4837
4838@codequoteundirected on
4839@codequotebacktick on
4840@example
4841sed -i '1i/* Copyright (C) FOO BAR */' *.c
4842@end example
4843@codequoteundirected off
4844@codequotebacktick off
4845
4846@exdent Adding a few lines is possible using @samp{\n} in the text:
4847
4848@codequoteundirected on
4849@codequotebacktick on
4850@example
4851sed -i '1i/*\n * Copyright (C) FOO BAR\n * Created by Jane Doe\n */' *.c
4852@end example
4853@codequoteundirected off
4854@codequotebacktick off
4855
4856To add multiple lines from another file, use @code{0rFILE}.
4857A typical use case is adding a license notice header to all files:
4858
4859@codequoteundirected on
4860@codequotebacktick on
4861@example
4862## Create the header file:
4863$ cat<<'EOF'>LIC.TXT
4864/*
4865 Copyright (C) 1989-2021 FOO BAR
4866
4867 This program is free software; you can redistribute it and/or modify
4868 it under the terms of the GNU General Public License as published by
4869 the Free Software Foundation; either version 3, or (at your option)
4870 any later version.
4871
4872 This program is distributed in the hope that it will be useful,
4873 but WITHOUT ANY WARRANTY; without even the implied warranty of
4874 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
4875 GNU General Public License for more details.
4876
4877 You should have received a copy of the GNU General Public License
4878 along with this program; If not, see <https://www.gnu.org/licenses/>.
4879*/
4880EOF
4881
4882## Add the file at the beginning of all source code files:
4883$ sed -i '0rLIC.TXT' *.cpp *.h
4884@end example
4885@codequoteundirected off
4886@codequotebacktick off
4887
4888
With script files (e.g. @file{.sh}, @file{.py}, @file{.pl} files)
the license notice typically appears @emph{after} the first line (the
``shebang'' @samp{#!} line). The @code{1rFILE} command will add @file{FILE}
@emph{after} the first line:
4893
4894@codequoteundirected on
4895@codequotebacktick on
4896@example
4897## Create the header file:
4898$ cat<<'EOF'>LIC.TXT
4899##
4900## Copyright (C) 1989-2021 FOO BAR
4901##
4902## This program is free software; you can redistribute it and/or modify
4903## it under the terms of the GNU General Public License as published by
4904## the Free Software Foundation; either version 3, or (at your option)
4905## any later version.
4906##
4907## This program is distributed in the hope that it will be useful,
4908## but WITHOUT ANY WARRANTY; without even the implied warranty of
4909## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
4910## GNU General Public License for more details.
4911##
4912## You should have received a copy of the GNU General Public License
4913## along with this program; If not, see <https://www.gnu.org/licenses/>.
4914##
4915##
4916EOF
4917
## Add the file after the first line of all script files:
4919$ sed -i '1rLIC.TXT' *.py *.sh
4920@end example
4921@codequoteundirected off
4922@codequotebacktick off
4923
4924The above @command{sed} commands can be combined with @command{find}
4925to locate files in all subdirectories, @command{xargs} to run additional
4926commands on selected files and @command{grep} to filter out files that already
4927contain a copyright notice:
4928
4929@codequoteundirected on
4930@codequotebacktick on
4931@example
4932find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) \
4933 | xargs grep -Li copyright \
4934 | xargs -r sed -i '0rLIC.TXT'
4935@end example
4936@codequoteundirected off
4937@codequotebacktick off
4938
@exdent Or a slightly safer version (handling files with spaces and newlines):
4940
4941@codequoteundirected on
4942@codequotebacktick on
4943@example
4944find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) -print0 \
4945 | xargs -0 grep -Z -Li copyright \
4946 | xargs -0 -r sed -i '0rLIC.TXT'
4947@end example
4948@codequoteundirected off
4949@codequotebacktick off
4950
4951Note: using the @code{0} address with @code{r} command requires @value{SSED}
4952version 4.9 or later. @xref{Zero Address}.
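
Where the @code{0} address is not available, the single-line
@code{1i} form shown earlier can still be combined with a
@command{grep} guard so that running it twice does not duplicate the
notice. The following is only a sketch under stated assumptions: it
works in a throwaway directory created with @command{mktemp}, and the
file names @file{new.c} and @file{old.c} are hypothetical.

```shell
dir=$(mktemp -d)
printf 'int main(void);\n' > "$dir/new.c"
printf '/* Copyright (C) FOO BAR */\nint x;\n' > "$dir/old.c"
for f in "$dir"/*.c; do
  # grep -q succeeds when a notice is already present; sed runs otherwise.
  grep -qi copyright "$f" || sed -i '1i/* Copyright (C) FOO BAR */' "$f"
done
first=$(sed -n 1p "$dir/new.c")          # first line of the updated file
count=$(grep -ci copyright "$dir/old.c") # notices in the untouched file
printf '%s\n%s\n' "$first" "$count"
rm -r "$dir"
```

The file without a notice gains the header as its first line, and the
file that already had one still contains exactly one matching line.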
4953
4954
4955
4956@node tac
4957@section Reverse Lines of Files
4958
4959This one begins a series of totally useless (yet interesting)
4960scripts emulating various Unix commands. This, in particular,
4961is a @command{tac} workalike.
4962
4963Note that on implementations other than GNU @command{sed}
4964this script might easily overflow internal buffers.
4965
4966@c start-------------------------------------------
4967@example
4968#!/usr/bin/sed -nf
4969
4970# reverse all lines of input, i.e. first line became last, ...
4971
4972@group
4973# from the second line, the buffer (which contains all previous lines)
4974# is *appended* to current line, so, the order will be reversed
49751! G
4976@end group
4977
4978@group
4979# on the last line we're done -- print everything
4980$ p
4981@end group
4982
4983@group
4984# store everything on the buffer again
4985h
4986@end group
4987@end example
4988@c end---------------------------------------------
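
The same three commands can be tried directly on the command line. In
this sketch the input words and the variable name @code{reversed} are
only illustrative:

```shell
# 1!G appends the saved lines after the current one, $p prints the result
# on the last line, and h saves the accumulated text for the next cycle.
reversed=$(printf '%s\n' one two three | sed -n -e '1!G' -e '$p' -e 'h')
printf '%s\n' "$reversed"
```

This should print @samp{three}, @samp{two} and @samp{one}, in that
order.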
4989
4990@node cat -n
4991@section Numbering Lines
4992
4993This script replaces @samp{cat -n}; in fact it formats its output
4994exactly like GNU @command{cat} does.
4995
Of course this is completely useless, for two reasons: first,
because somebody else did it in C; second, because the following
Bourne-shell script could be used for the same purpose and would
be much faster:
5000
5001@c start-------------------------------------------
5002@example
5003@group
5004#! /bin/sh
5005sed -e "=" $@@ | sed -e '
5006 s/^/ /
5007 N
5008 s/^ *\(......\)\n/\1 /
5009'
5010@end group
5011@end example
5012@c end---------------------------------------------

It uses @command{sed} to print the line number, then groups lines two
by two using @code{N}. Of course, this script does not teach as much as
the one presented below.

The algorithm used for incrementing uses both buffers, so the line
is printed as soon as possible and then discarded. The number
is split so that the changing digits go in one buffer and the unchanged
ones go in the other; the changing digits are modified in a single step
(using a @code{y} command). The line number for the next line
is then composed and stored in the hold space, to be used in the
next iteration.

@c start-------------------------------------------
@example
#!/usr/bin/sed -nf

@group
# Prime the pump on the first line
x
/^$/ s/^.*$/1/
@end group

@group
# Add the correct line number before the pattern
G
h
@end group

@group
# Format it and print it
s/^/      /
s/^ *\(......\)\n/\1  /p
@end group

@group
# Get the line number from hold space; add a zero
# if we're going to add a digit on the next line
g
s/\n.*$//
/^9*$/ s/^/0/
@end group

@group
# separate changing/unchanged digits with an x
s/.9*$/x&/
@end group

@group
# keep changing digits in hold space
h
s/^.*x//
y/0123456789/1234567890/
x
@end group

@group
# keep unchanged digits in pattern space
s/x.*$//
@end group

@group
# compose the new number, remove the newline implicitly added by G
G
s/\n//
h
@end group
@end example
@c end---------------------------------------------
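
The increment step is easier to follow in isolation. The sketch below
applies only the digit-splitting commands of the script above to a single
number read from standard input; it assumes a POSIX shell with GNU
@command{sed}, runs without @option{-n} so that the result is printed
automatically, and the function name @code{increment} is hypothetical.

```shell
# Increment a decimal number the way the cat -n script does:
# mark the changing digits with an x, rotate them with y, then
# glue the unchanged prefix and the rotated suffix back together.
increment() (
  sed '
    /^9*$/ s/^/0/
    s/.9*$/x&/
    h
    s/^.*x//
    y/0123456789/1234567890/
    x
    s/x.*$//
    G
    s/\n//
  '
)

echo 1299 | increment   # prints 1300
echo 999 | increment    # prints 1000
```

For @samp{1299} the marker is placed as @samp{1x299}: only the last
non-nine digit and the nines after it change, which is why a single
@code{y} command is enough.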

@node cat -b
@section Numbering Non-blank Lines

Emulating @samp{cat -b} is almost the same as @samp{cat -n}: we only
have to select which lines are to be numbered and which are not.

The part that is common to this script and the previous one is
not commented to show how important it is to comment @command{sed}
scripts properly...

@c start-------------------------------------------
@example
#!/usr/bin/sed -nf

@group
/^$/ @{
  p
  b
@}
@end group

@group
# Same as cat -n from now
x
/^$/ s/^.*$/1/
G
h
s/^/      /
s/^ *\(......\)\n/\1  /p
x
s/\n.*$//
/^9*$/ s/^/0/
s/.9*$/x&/
h
s/^.*x//
y/0123456789/1234567890/
x
s/x.*$//
G
s/\n//
h
@end group
@end example
@c end---------------------------------------------

@node wc -c
@section Counting Characters

This script shows another way to do arithmetic with @command{sed}.
In this case we have to add possibly large numbers, so implementing
this by successive increments would not be feasible (and possibly
even more complicated to contrive than this script).

The approach is to map numbers to letters, kind of an abacus
implemented with @command{sed}. @samp{a}s are units, @samp{b}s are
tens and so on: we simply add the number of characters
on the current line as units, and then propagate the carry
to tens, hundreds, and so on.

As usual, running totals are kept in hold space.

On the last line, we convert the abacus form back to decimal.
For the sake of variety, this is done with a loop rather than
with some 80 @code{s} commands@footnote{Some implementations
have a limit of 199 commands per script}: first we
convert units, removing @samp{a}s from the number; then we
rotate letters so that tens become @samp{a}s, and so on
until no more letters remain.

@c start-------------------------------------------
@example
#!/usr/bin/sed -nf

@group
# Add n+1 a's to hold space (+1 is for the newline)
s/./a/g
H
x
s/\n/a/
@end group

@group
# Do the carry. The t's and b's are not necessary,
# but they do speed up the thing
t a
: a; s/aaaaaaaaaa/b/g; t b; b done
: b; s/bbbbbbbbbb/c/g; t c; b done
: c; s/cccccccccc/d/g; t d; b done
: d; s/dddddddddd/e/g; t e; b done
: e; s/eeeeeeeeee/f/g; t f; b done
: f; s/ffffffffff/g/g; t g; b done
: g; s/gggggggggg/h/g; t h; b done
: h; s/hhhhhhhhhh//g
@end group

@group
: done
$! @{
  h
  b
@}
@end group

# On the last line, convert back to decimal

@group
: loop
/a/! s/[b-h]*/&0/
s/aaaaaaaaa/9/
s/aaaaaaaa/8/
s/aaaaaaa/7/
s/aaaaaa/6/
s/aaaaa/5/
s/aaaa/4/
s/aaa/3/
s/aa/2/
s/a/1/
@end group

@group
: next
y/bcdefgh/abcdefg/
/[a-h]/ b loop
p
@end group
@end example
@c end---------------------------------------------
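
To see the abacus at work on a small number, the carry and the final
conversion can be run separately. This is a sketch under a few
assumptions: a POSIX shell with GNU @command{sed}, hypothetical helper
names @code{carry} and @code{to_decimal}, and only the first carry stage
(units to tens), so it is meant for totals below 100.

```shell
# First carry stage only: every ten a's (units) become one b (a ten).
carry() (
  sed 's/aaaaaaaaaa/b/g'
)

# Convert the abacus form back to decimal, one digit per iteration,
# using the same loop as the script above.
to_decimal() (
  sed '
    : loop
    /a/! s/[b-h]*/&0/
    s/aaaaaaaaa/9/
    s/aaaaaaaa/8/
    s/aaaaaaa/7/
    s/aaaaaa/6/
    s/aaaaa/5/
    s/aaaa/4/
    s/aaa/3/
    s/aa/2/
    s/a/1/
    y/bcdefgh/abcdefg/
    /[a-h]/ b loop
  '
)

ten=aaaaaaaaaa
# 23 units: carry gives bbaaa, and the loop turns that into 23
printf '%s\n' "$ten$ten"aaa | carry | to_decimal
```

The loop first replaces the three remaining units by @samp{3}, then the
@code{y} command rotates the two @samp{b}s into @samp{a}s, which the next
iteration turns into @samp{2}.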

@node wc -w
@section Counting Words

This script is almost the same as the previous one, once each
of the words on the line is converted to a single @samp{a}
(in the previous script each letter was changed to an @samp{a}).

It is interesting that real @command{wc} programs have optimized
loops for @samp{wc -c}, so they are much slower at counting
words than at counting characters. This script's bottleneck,
instead, is arithmetic, and hence the word-counting one
is faster (it has to manage smaller numbers).

Again, the common parts are not commented to show the importance
of commenting @command{sed} scripts.

@c start-------------------------------------------
@example
#!/usr/bin/sed -nf

@group
# Convert words to a's
s/[ @kbd{@key{TAB}}][ @kbd{@key{TAB}}]*/ /g
s/^/ /
s/ [^ ][^ ]*/a /g
s/ //g
@end group

@group
# Append them to hold space
H
x
s/\n//
@end group

@group
# From here on it is the same as in wc -c.
/aaaaaaaaaa/! bx; s/aaaaaaaaaa/b/g
/bbbbbbbbbb/! bx; s/bbbbbbbbbb/c/g
/cccccccccc/! bx; s/cccccccccc/d/g
/dddddddddd/! bx; s/dddddddddd/e/g
/eeeeeeeeee/! bx; s/eeeeeeeeee/f/g
/ffffffffff/! bx; s/ffffffffff/g/g
/gggggggggg/! bx; s/gggggggggg/h/g
s/hhhhhhhhhh//g
:x
$! @{ h; b; @}
:y
/a/! s/[b-h]*/&0/
s/aaaaaaaaa/9/
s/aaaaaaaa/8/
s/aaaaaaa/7/
s/aaaaaa/6/
s/aaaaa/5/
s/aaaa/4/
s/aaa/3/
s/aa/2/
s/a/1/
y/bcdefgh/abcdefg/
/[a-h]/ by
p
@end group
@end example
@c end---------------------------------------------
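
The only new part, turning each word into one @samp{a}, can be tried on
its own. The sketch below assumes a POSIX shell with GNU @command{sed};
it writes the tab as @code{\t}, a GNU extension, where the script above
uses a literal tab, and the function name @code{words_to_marks} is
hypothetical.

```shell
# Squeeze runs of blanks, put a space in front of the first word,
# turn every " word" into "a ", then drop the remaining spaces.
words_to_marks() (
  sed 's/[ \t][ \t]*/ /g; s/^/ /; s/ [^ ][^ ]*/a /g; s/ //g'
)

printf 'one  two\tthree\n' | words_to_marks   # prints aaa
```

A line with no words at all yields an empty line, so it contributes
nothing to the running total.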

@node wc -l
@section Counting Lines

No strange things are done now, because @command{sed} gives us
@samp{wc -l} functionality for free!!! Look:

@c start-------------------------------------------
@example
@group
#!/usr/bin/sed -nf
$=
@end group
@end example
@c end---------------------------------------------

@node head
@section Printing the First Lines

This script is probably the simplest useful @command{sed} script.
It displays the first 10 lines of input; the number of displayed
lines is right before the @code{q} command.

@c start-------------------------------------------
@example
@group
#!/usr/bin/sed -f
10q
@end group
@end example
@c end---------------------------------------------

@node tail
@section Printing the Last Lines

Printing the last @var{n} lines rather than the first is more complex
but indeed possible. @var{n} is encoded in the second line, before
the bang character.

This script is similar to the @command{tac} script in that it keeps the
final output in the hold space and prints it at the end:

@c start-------------------------------------------
@example
#!/usr/bin/sed -nf

@group
1! @{; H; g; @}
1,10 !s/[^\n]*\n//
$p
h
@end group
@end example
@c end---------------------------------------------

Mainly, the script keeps a window of 10 lines and slides it
by adding a line and deleting the oldest (the substitution command
on the second line works like a @code{D} command but does not
restart the loop).

The ``sliding window'' technique is a very powerful way to write
efficient and complex @command{sed} scripts, because commands like
@code{P} would require a lot of work if implemented manually.

To introduce the technique, which is fully demonstrated in the
rest of this chapter and is based on the @code{N}, @code{P}
and @code{D} commands, here is an implementation of @command{tail}
using a simple ``sliding window.''

This looks complicated but in fact it works the same way as
the last script: after we have kicked in the appropriate number
of lines, however, we stop using the hold space to keep inter-line
state, and instead use @code{N} and @code{D} to slide pattern
space by one line:

@c start-------------------------------------------
@example
#!/usr/bin/sed -f

@group
1h
2,10 @{; H; g; @}
$q
1,9d
N
D
@end group
@end example
@c end---------------------------------------------

Note how the first, second and fourth lines are inactive after
the first ten lines of input. After that, all the script does
is exit on the last line of input, append the next input
line to pattern space, and remove the first line.
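
The window size is easy to change. As a sketch, here is the same script
for the last three lines; it assumes a POSIX shell with GNU
@command{sed} and @command{seq}, the function name @code{last_three} is
hypothetical, and the grouped @code{H} and @code{g} commands are written
as two separately addressed commands, which has the same effect.

```shell
# Sliding-window tail for n = 3: fill the window through the hold
# space on lines 1-3, then slide it with N and D.
last_three() (
  sed '
    1h
    2,3H
    2,3g
    $q
    1,2d
    N
    D
  '
)

seq 10 | last_three
# prints 8, 9 and 10 (one number per line)
```

With fewer than three input lines the @code{$q} command fires while the
window is still being filled, so the whole input is printed.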

@node uniq
@section Make Duplicate Lines Unique

This is an example of the art of using the @code{N}, @code{P}
and @code{D} commands, probably the most difficult to master.

@c start-------------------------------------------
@example
@group
#!/usr/bin/sed -f
h
@end group

@group
:b
# On the last line, print and exit
$b
N
/^\(.*\)\n\1$/ @{
  # The two lines are identical. Undo the effect of
  # the N command.
  g
  bb
@}
@end group

@group
# If the @code{N} command had added the last line, print and exit
$b
@end group

@group
# The lines are different; print the first and go
# back working on the second.
P
D
@end group
@end example
@c end---------------------------------------------

As you can see, we maintain a 2-line window using @code{P} and @code{D}.
This technique is often used in advanced @command{sed} scripts.

@node uniq -d
@section Print Duplicated Lines of Input

This script prints only duplicated lines, like @samp{uniq -d}.

@c start-------------------------------------------
@example
#!/usr/bin/sed -nf

@group
$b
N
/^\(.*\)\n\1$/ @{
  # Print the first of the duplicated lines
  s/.*\n//
  p
@end group

@group
  # Loop until we get a different line
  :b
  $b
  N
  /^\(.*\)\n\1$/ @{
    s/.*\n//
    bb
  @}
@}
@end group

@group
# The last line cannot be followed by duplicates
$b
@end group

@group
# Found a different one. Leave it alone in the pattern space
# and go back to the top, hunting its duplicates
D
@end group
@end example
@c end---------------------------------------------

@node uniq -u
@section Remove All Duplicated Lines

This script prints only unique lines, like @samp{uniq -u}.

@c start-------------------------------------------
@example
#!/usr/bin/sed -f

@group
# Search for a duplicate line; until then, print what you find.
$b
N
/^\(.*\)\n\1$/ ! @{
  P
  D
@}
@end group

@group
:c
# Got two equal lines in pattern space. At the
# end of the file we simply exit
$d
@end group

@group
# Else, we keep reading lines with @code{N} until we
# find a different one
s/.*\n//
N
/^\(.*\)\n\1$/ @{
  bc
@}
@end group

@group
# Remove the last instance of the duplicate line
# and go back to the top
D
@end group
@end example
@c end---------------------------------------------

@node cat -s
@section Squeezing Blank Lines

As a final example, here are three scripts, of increasing complexity
and speed, that implement the same function as @samp{cat -s}, that is,
squeezing blank lines.

The first leaves a blank line at the beginning and end if there are
some already.

@c start-------------------------------------------
@example
#!/usr/bin/sed -f

@group
# on empty lines, join with next
# Note there is a star in the regexp
:x
/^\n*$/ @{
N
bx
@}
@end group

@group
# now, squeeze all '\n'; this can also be done by:
# s/^\(\n\)*/\1/
s/\n*/\
/
@end group
@end example
@c end---------------------------------------------

This one is a bit more complex and removes all empty lines
at the beginning. It does leave a single blank line at the end
if one was there.

@c start-------------------------------------------
@example
#!/usr/bin/sed -f

@group
# delete all leading empty lines
1,/^./@{
/./!d
@}
@end group

@group
# on an empty line we remove it and all the following
# empty lines, but one
:x
/./!@{
N
s/^\n$//
tx
@}
@end group
@end example
@c end---------------------------------------------

This removes leading and trailing blank lines. It is also the
fastest. Note that loops are completely done with @code{n} and
@code{b}, without relying on @command{sed} to restart the
script automatically at the end of a line.

@c start-------------------------------------------
@example
#!/usr/bin/sed -nf

@group
# delete all (leading) blanks
/./!d
@end group

@group
# we get here, so there is a non-empty line
:x
# print it
p
# get next
n
# got chars? print it again, etc...
/./bx
@end group

@group
# no, don't have chars: got an empty line
:z
# get next, if last line we finish here so no trailing
# empty lines are written
n
# also empty? then ignore it, and get next... this will
# remove ALL empty lines
/./!bz
@end group

@group
# all empty lines were deleted/ignored, but we have a non-empty line. As
# what we want to do is to squeeze, insert a blank line artificially
i\
@end group

bx
@end example
@c end---------------------------------------------

@node Limitations
@chapter @value{SSED}'s Limitations and Non-limitations

@cindex GNU extensions, unlimited line length
@cindex Portability, line length limitations
For those who want to write portable @command{sed} scripts,
be aware that some implementations have been known to
limit line lengths (for the pattern and hold spaces)
to be no more than 4000 bytes.
The @sc{posix} standard specifies that conforming @command{sed}
implementations shall support at least 8192 byte line lengths.
@value{SSED} has no built-in limit on line length;
as long as it can @code{malloc()} more (virtual) memory,
you can feed or construct lines as long as you like.

However, recursion is used to handle subpatterns and indefinite
repetition. This means that the available stack space may limit
the size of the buffer that can be processed by certain patterns.


@node Other Resources
@chapter Other Resources for Learning About @command{sed}

For up-to-date information about @value{SSED} please
visit @uref{https://www.gnu.org/software/sed/}.

Send general questions and suggestions to @email{sed-devel@@gnu.org}.
Visit the mailing list archives for past discussions at
@uref{https://lists.gnu.org/archive/html/sed-devel/}.

@cindex Additional reading about @command{sed}
The following resources provide information about @command{sed}
(both @value{SSED} and other variations). Note that these are not
maintained by @value{SSED} developers.

@itemize @bullet

@item
sed @code{$HOME}: @uref{http://sed.sf.net}

@item
sed FAQ: @uref{http://sed.sf.net/sedfaq.html}

@item
seder's grabbag: @uref{http://sed.sf.net/grabbag}

@item
The @code{sed-users} mailing list maintained by Sven Guckes:
@uref{http://groups.yahoo.com/group/sed-users/}
(note this is @emph{not} the @value{SSED} mailing list).

@end itemize

@node Reporting Bugs
@chapter Reporting Bugs

@cindex Bugs, reporting
Email bug reports to @email{bug-sed@@gnu.org}.
Also, please include the output of @samp{sed --version} in the body
of your report if at all possible.

Please do not send a bug report like this:

@example
@i{@i{@r{while building frobme-1.3.4}}}
$ configure
@error{} sed: file sedscr line 1: Unknown option to 's'
@end example

If @value{SSED} doesn't configure your favorite package, take a
few extra minutes to identify the specific problem and make a stand-alone
test case. Unlike other programs such as C compilers, making such test
cases for @command{sed} is quite simple.

A stand-alone test case includes all the data necessary to perform the
test, and the specific invocation of @command{sed} that causes the problem.
The smaller a stand-alone test case is, the better. A test case should
not involve something as far removed from @command{sed} as ``try to configure
frobme-1.3.4''. Yes, that is in principle enough information to look
for the bug, but that is not a very practical prospect.
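
As a sketch of what such a test case might look like (the input and the
substitution are made up for illustration, and a POSIX shell with GNU
@command{sed} is assumed), the whole report can fit in two commands:

```shell
# Version line to quote in the report.
sed --version | head -n 1

# The complete input and the exact invocation, in one pipeline.
printf 'foo\nbar\n' | sed 's/o\+/0/'
# with GNU sed this prints f0 and bar (one per line)
```

Everything needed to reproduce the behavior is on the command line, so
nobody has to unpack or configure another package to see it.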

Here are a few commonly reported bugs that are not bugs.

@table @asis
@anchor{N_command_last_line}
@item @code{N} command on the last line
@cindex Portability, @code{N} command on the last line
@cindex Non-bugs, @code{N} command on the last line

Most versions of @command{sed} exit without printing anything when
the @code{N} command is issued on the last line of a file.
@value{SSED} prints pattern space before exiting unless of course
the @option{-n} command-line switch has been specified. This choice is
by design.

Default behavior (GNU extension, not POSIX-conforming):
@example
$ seq 3 | sed N
1
2
3
@end example
@noindent
To force POSIX-conforming behavior:
@example
$ seq 3 | sed --posix N
1
2
@end example

With the traditional behavior, for example, the result of
@example
sed N foo bar
@end example
@noindent
would depend on whether foo has an even or an odd number of
lines@footnote{which is the actual ``bug'' that prompted the
change in behavior}. Or, when writing a script to read the
next few lines following a pattern match, traditional
implementations of @command{sed} would force you to write
something like
@example
/foo/@{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N @}
@end example
@noindent
instead of just
@example
/foo/@{ N;N;N;N;N;N;N;N;N; @}
@end example

@cindex @code{POSIXLY_CORRECT} behavior, @code{N} command
In any case, the simplest workaround is to use @code{$d;N} in
scripts that rely on the traditional behavior, or to set
the @code{POSIXLY_CORRECT} variable to a non-empty value.

@item Regex syntax clashes (problems with backslashes)
@cindex GNU extensions, to basic regular expressions
@cindex Non-bugs, regex syntax clashes
@command{sed} uses the @sc{posix} basic regular expression syntax. According to
the standard, the meaning of some escape sequences is undefined in
this syntax; notable in the case of @command{sed} are @code{\|},
@code{\+}, @code{\?}, @code{\`}, @code{\'}, @code{\<},
@code{\>}, @code{\b}, @code{\B}, @code{\w}, and @code{\W}.

As in all GNU programs that use @sc{posix} basic regular
expressions, @command{sed} interprets these escape sequences as special
characters. So, @code{x\+} matches one or more occurrences of @samp{x}.
@code{abc\|def} matches either @samp{abc} or @samp{def}.

This syntax may cause problems when running scripts written for other
@command{sed}s. Some @command{sed} programs have been written with the
assumption that @code{\|} and @code{\+} match the literal characters
@code{|} and @code{+}. Such scripts must be modified by removing the
spurious backslashes if they are to be used with modern implementations
of @command{sed}, like
GNU @command{sed}.

On the other hand, some scripts use @code{s|abc\|def||g} to remove occurrences
of @emph{either} @code{abc} or @code{def}. While this worked until
@command{sed} 4.0.x, newer versions interpret this as removing the
string @code{abc|def}. This is again undefined behavior according to
POSIX, and this interpretation is arguably more robust: older
@command{sed}s, for example, required that the regex matcher parsed
@code{\/} as @code{/} in the common case of escaping a slash, which is
again undefined behavior; the new behavior avoids this, and this is good
because the regex matcher is only partially under our control.

@cindex GNU extensions, special escapes
In addition, this version of @command{sed} supports several escape characters
(some of which are multi-character) to insert non-printable characters
in scripts (@code{\a}, @code{\c}, @code{\d}, @code{\o}, @code{\r},
@code{\t}, @code{\v}, @code{\x}). These can cause similar problems
with scripts written for other @command{sed}s.

@item @option{-i} clobbers read-only files
@cindex In-place editing
@cindex @value{SSEDEXT}, in-place editing
@cindex Non-bugs, in-place editing

In short, @samp{sed -i} will let you delete the contents of
a read-only file, and in general the @option{-i} option
(@pxref{Invoking sed, , Invocation}) lets you clobber
protected files. This is not a bug, but rather a consequence
of how the Unix file system works.

The permissions on a file say what can happen to the data
in that file, while the permissions on a directory say what can
happen to the list of files in that directory. @samp{sed -i}
never opens for writing a file that is already on disk.
Rather, it works on a temporary file that is finally renamed
to the original name: if you rename or delete files, you're actually
modifying the contents of the directory, so the operation depends on
the permissions of the directory, not of the file. For this same
reason, @command{sed} does not let you use @option{-i} on a writable file
in a read-only directory, and will break hard or symbolic links when
@option{-i} is used on such a file.

@item @code{0a} does not work (gives an error)
@cindex @code{0} address
@cindex GNU extensions, @code{0} address
@cindex Non-bugs, @code{0} address

There is no line 0. @code{0} is a special address that is only used to treat
addresses like @code{0,/@var{RE}/} as active when the script starts: if
you write @code{1,/abc/d} and the first line includes the string @samp{abc},
then that match would be ignored because address ranges must span at least
two lines (barring the end of the file); but what you probably wanted is
to delete every line up to the first one including @samp{abc}, and this
is obtained with @code{0,/abc/d}.

@ifclear PERL
@item @code{[a-z]} is case insensitive
@cindex Non-bugs, localization-related

You are encountering problems with locales. POSIX mandates that @code{[a-z]}
uses the current locale's collation order; in C parlance, that means using
@code{strcoll(3)} instead of @code{strcmp(3)}. Some locales have a
case-insensitive collation order, others don't.

Another problem is that @code{[a-z]} tries to use collation symbols.
This only happens if you are on the GNU system, using
GNU libc's regular expression matcher instead of compiling the
one supplied with GNU @command{sed}. In a Danish locale, for example,
the regular expression @code{^[a-z]$} matches the string @samp{aa},
because this is a single collating symbol that comes after @samp{a}
and before @samp{b}; @samp{ll} behaves similarly in Spanish
locales, or @samp{ij} in Dutch locales.

To work around these problems, which may cause bugs in shell scripts, set
the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.

@item @code{s/.*//} does not clear pattern space
@cindex Non-bugs, localization-related
@cindex @value{SSEDEXT}, emptying pattern space
@cindex Emptying pattern space

This happens if your input stream includes invalid multibyte
sequences. @sc{posix} mandates that such sequences
are @emph{not} matched by @samp{.}, so that @samp{s/.*//} will not clear
pattern space as you would expect. In fact, there is no way to clear
@command{sed}'s buffers in the middle of the script in most multibyte locales
(including UTF-8 locales). For this reason, @value{SSED} provides a @code{z}
command (for ``zap'') as an extension.

To work around these problems, which may cause bugs in shell scripts, set
the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
@end ifclear
@end table
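
The @code{0,/@var{RE}/} entry in the table above is easy to check on a
four-line input whose first line already matches. This is a sketch that
assumes a POSIX shell with GNU @command{sed}; the sample input is made up.

```shell
# With 1,/abc/ the end of the range is searched from line 2 on, so
# the range runs through the second abc and only y is printed.
printf 'abc\nx\nabc\ny\n' | sed '1,/abc/d'

# With 0,/abc/ the first line may end the range, so only that line
# is deleted and x, abc, y are printed.
printf 'abc\nx\nabc\ny\n' | sed '0,/abc/d'
```

The two commands differ only when the first input line matches the
regular expression; otherwise they delete the same lines.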




@page
@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include fdl.texi


@page
@node Concept Index
@unnumbered Concept Index

This is a general index of all issues discussed in this manual, with the
exception of the @command{sed} commands and command-line options.

@printindex cp

@page
@node Command and Option Index
@unnumbered Command and Option Index

This is an alphabetical list of all @command{sed} commands and command-line
options.

@printindex fn

@contents
@bye

@c XXX FIXME: the term "cycle" is never defined...