1 | \input texinfo @c -*-texinfo-*-
|
---|
2 | @c
|
---|
3 | @c -- Stuff that needs adding: ----------------------------------------------
|
---|
4 | @c (nothing!)
|
---|
5 | @c --------------------------------------------------------------------------
|
---|
6 | @c Check for consistency: regexps in @code, text that they match in @samp.
|
---|
7 | @c
|
---|
8 | @c Tips:
|
---|
9 | @c @command for command
|
---|
10 | @c @samp for command fragments: @samp{cat -s}
|
---|
11 | @c @code for sed commands and flags
|
---|
12 | @c Use ``quote'' not `quote' or "quote".
|
---|
13 | @c
|
---|
14 | @c %**start of header
|
---|
15 | @setfilename sed.info
|
---|
16 | @settitle sed, a stream editor
|
---|
17 | @c %**end of header
|
---|
18 |
|
---|
19 | @c @smallbook
|
---|
20 |
|
---|
21 | @include version.texi
|
---|
22 |
|
---|
23 | @c Combine indices.
|
---|
24 | @syncodeindex ky cp
|
---|
25 | @syncodeindex pg cp
|
---|
26 | @syncodeindex tp cp
|
---|
27 |
|
---|
28 | @defcodeindex op
|
---|
29 | @syncodeindex op fn
|
---|
30 |
|
---|
31 | @include config.texi
|
---|
32 |
|
---|
33 | @copying
|
---|
34 | This file documents version @value{VERSION} of
|
---|
35 | @value{SSED}, a stream editor.
|
---|
36 |
|
---|
37 | Copyright @copyright{} 1998--2022 Free Software Foundation, Inc.
|
---|
38 |
|
---|
39 | @quotation
|
---|
40 | Permission is granted to copy, distribute and/or modify this document
|
---|
41 | under the terms of the GNU Free Documentation License, Version 1.3
|
---|
42 | or any later version published by the Free Software Foundation;
|
---|
43 | with no Invariant Sections, no Front-Cover Texts, and no
|
---|
44 | Back-Cover Texts. A copy of the license is included in the
|
---|
45 | section entitled ``GNU Free Documentation License''.
|
---|
46 | @end quotation
|
---|
47 | @end copying
|
---|
48 |
|
---|
49 | @setchapternewpage off
|
---|
50 |
|
---|
51 | @titlepage
|
---|
52 | @title @value{SSED}, a stream editor
|
---|
53 | @subtitle version @value{VERSION}, @value{UPDATED}
|
---|
54 | @author by Ken Pizzini, Paolo Bonzini, Jim Meyering, Assaf Gordon
|
---|
55 |
|
---|
56 | @page
|
---|
57 | @vskip 0pt plus 1filll
|
---|
58 | @insertcopying
|
---|
59 | @end titlepage
|
---|
60 |
|
---|
61 | @contents
|
---|
62 |
|
---|
63 | @ifnottex
|
---|
64 | @node Top
|
---|
65 | @top @value{SSED}
|
---|
66 |
|
---|
67 | @insertcopying
|
---|
68 | @end ifnottex
|
---|
69 |
|
---|
70 | @menu
|
---|
71 | * Introduction:: Introduction
|
---|
72 | * Invoking sed:: Invocation
|
---|
73 | * sed scripts:: @command{sed} scripts
|
---|
74 | * sed addresses:: Addresses: selecting lines
|
---|
75 | * sed regular expressions:: Regular expressions: selecting text
|
---|
76 | * advanced sed:: Advanced @command{sed}: cycles and buffers
|
---|
77 | * Examples:: Some sample scripts
|
---|
78 | * Limitations:: Limitations and (non-)limitations of @value{SSED}
|
---|
79 | * Other Resources:: Other resources for learning about @command{sed}
|
---|
80 | * Reporting Bugs:: Reporting bugs
|
---|
81 | * GNU Free Documentation License:: Copying and sharing this manual
|
---|
82 | * Concept Index:: A menu with all the topics in this manual.
|
---|
83 | * Command and Option Index:: A menu with all @command{sed} commands and
|
---|
84 | command-line options.
|
---|
85 | @end menu
|
---|
86 |
|
---|
87 |
|
---|
88 | @node Introduction
|
---|
89 | @chapter Introduction
|
---|
90 |
|
---|
91 | @cindex Stream editor
|
---|
92 | @command{sed} is a stream editor.
|
---|
93 | A stream editor is used to perform basic text
|
---|
94 | transformations on an input stream
|
---|
95 | (a file or input from a pipeline).
|
---|
96 | While in some ways similar to an editor which
|
---|
97 | permits scripted edits (such as @command{ed}),
|
---|
98 | @command{sed} works by making only one pass over the
|
---|
99 | input(s), and is consequently more efficient.
|
---|
100 | But it is @command{sed}'s ability to filter text in a pipeline
|
---|
101 | which particularly distinguishes it from other types of
|
---|
102 | editors.
|
---|
103 |
|
---|
104 |
|
---|
105 | @node Invoking sed
|
---|
106 | @chapter Running sed
|
---|
107 |
|
---|
108 | This chapter covers how to run @command{sed}. Details of @command{sed}
|
---|
109 | scripts and individual @command{sed} commands are discussed in the
|
---|
110 | next chapter.
|
---|
111 |
|
---|
112 | @menu
|
---|
113 | * Overview::
|
---|
114 | * Command-Line Options::
|
---|
115 | * Exit status::
|
---|
116 | @end menu
|
---|
117 |
|
---|
118 |
|
---|
119 | @node Overview
|
---|
120 | @section Overview
|
---|
121 | Normally @command{sed} is invoked like this:
|
---|
122 |
|
---|
123 | @example
|
---|
124 | sed SCRIPT INPUTFILE...
|
---|
125 | @end example
|
---|
126 |
|
---|
127 | For example, to change every @samp{hello} to @samp{world}
|
---|
128 | in the file @file{input.txt}:
|
---|
129 |
|
---|
130 | @example
|
---|
131 | sed 's/hello/world/g' input.txt > output.txt
|
---|
132 | @end example
|
---|
133 |
|
---|
134 | Without the @samp{g} (global) modifier, @command{sed} affects
|
---|
135 | only the first instance per line.
|
---|
136 |
|
---|
137 | @cindex stdin
|
---|
138 | @cindex standard input
|
---|
139 | If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-},
|
---|
140 | @command{sed} filters the contents of the standard input. The following
|
---|
141 | commands are equivalent:
|
---|
142 |
|
---|
143 | @example
|
---|
144 | sed 's/hello/world/g' input.txt > output.txt
|
---|
145 | sed 's/hello/world/g' < input.txt > output.txt
|
---|
146 | cat input.txt | sed 's/hello/world/g' - > output.txt
|
---|
147 | @end example
|
---|
148 |
|
---|
149 | @cindex stdout
|
---|
150 | @cindex output
|
---|
151 | @cindex standard output
|
---|
152 | @cindex -i, example
|
---|
153 | @command{sed} writes output to standard output. Use @option{-i} to edit
|
---|
154 | files in-place instead of printing to standard output.
|
---|
155 | See also the @code{W} and @code{s///w} commands for writing output to
|
---|
156 | other files. The following command modifies @file{file.txt} and
|
---|
157 | does not produce any output:
|
---|
158 |
|
---|
159 | @example
|
---|
160 | sed -i 's/hello/world/' file.txt
|
---|
161 | @end example
|
---|
162 |
|
---|
163 | @cindex -n, example
|
---|
164 | @cindex p, example
|
---|
165 | @cindex suppressing output
|
---|
166 | @cindex output, suppressing
|
---|
167 | By default @command{sed} prints all processed input (except input
|
---|
168 | that has been modified/deleted by commands such as @command{d}).
|
---|
169 | Use @option{-n} to suppress output, and the @code{p} command
|
---|
170 | to print specific lines. The following command prints only line 45
|
---|
171 | of the input file:
|
---|
172 |
|
---|
173 | @example
|
---|
174 | sed -n '45p' file.txt
|
---|
175 | @end example
|
---|
176 |
|
---|
177 |
|
---|
178 |
|
---|
179 | @cindex multiple files
|
---|
180 | @cindex -s, example
|
---|
181 | @command{sed} treats multiple input files as one long stream.
|
---|
182 | The following example prints the first line of the first file
|
---|
183 | (@file{one.txt}) and the last line of the last file (@file{three.txt}).
|
---|
184 | Use @option{-s} to reverse this behavior.
|
---|
185 |
|
---|
186 | @example
|
---|
187 | sed -n '1p ; $p' one.txt two.txt three.txt
|
---|
188 | @end example
|
---|
189 |
|
---|
190 |
|
---|
191 | @cindex -e, example
|
---|
192 | @cindex --expression, example
|
---|
193 | @cindex -f, example
|
---|
194 | @cindex --file, example
|
---|
195 | @cindex script parameter
|
---|
196 | @cindex parameters, script
|
---|
197 | Without @option{-e} or @option{-f} options, @command{sed} uses
|
---|
198 | the first non-option parameter as the @var{script}, and the following
|
---|
199 | non-option parameters as input files.
|
---|
200 | If @option{-e} or @option{-f} options are used to specify a @var{script},
|
---|
201 | all non-option parameters are taken as input files.
|
---|
202 | Options @option{-e} and @option{-f} can be combined, and can appear
|
---|
203 | multiple times (in which case the final effective @var{script} will be
|
---|
204 | concatenation of all the individual @var{script}s).
|
---|
205 |
|
---|
206 | The following examples are equivalent:
|
---|
207 |
|
---|
208 | @example
|
---|
209 | sed 's/hello/world/' input.txt > output.txt
|
---|
210 |
|
---|
211 | sed -e 's/hello/world/' input.txt > output.txt
|
---|
212 | sed --expression='s/hello/world/' input.txt > output.txt
|
---|
213 |
|
---|
214 | echo 's/hello/world/' > myscript.sed
|
---|
215 | sed -f myscript.sed input.txt > output.txt
|
---|
216 | sed --file=myscript.sed input.txt > output.txt
|
---|
217 | @end example
|
---|
218 |
|
---|
219 |
|
---|
220 | @node Command-Line Options
|
---|
221 | @section Command-Line Options
|
---|
222 |
|
---|
223 | The full format for invoking @command{sed} is:
|
---|
224 |
|
---|
225 | @example
|
---|
226 | sed OPTIONS... [SCRIPT] [INPUTFILE...]
|
---|
227 | @end example
|
---|
228 |
|
---|
229 | @command{sed} may be invoked with the following command-line options:
|
---|
230 |
|
---|
231 | @table @code
|
---|
232 | @item --version
|
---|
233 | @opindex --version
|
---|
234 | @cindex Version, printing
|
---|
235 | Print out the version of @command{sed} that is being run and a copyright notice,
|
---|
236 | then exit.
|
---|
237 |
|
---|
238 | @item --help
|
---|
239 | @opindex --help
|
---|
240 | @cindex Usage summary, printing
|
---|
241 | Print a usage message briefly summarizing these command-line options
|
---|
242 | and the bug-reporting address,
|
---|
243 | then exit.
|
---|
244 |
|
---|
245 | @item -n
|
---|
246 | @itemx --quiet
|
---|
247 | @itemx --silent
|
---|
248 | @opindex -n
|
---|
249 | @opindex --quiet
|
---|
250 | @opindex --silent
|
---|
251 | @cindex Disabling autoprint, from command line
|
---|
252 | By default, @command{sed} prints out the pattern space
|
---|
253 | at the end of each cycle through the script (@pxref{Execution Cycle, ,
|
---|
254 | How @code{sed} works}).
|
---|
255 | These options disable this automatic printing,
|
---|
256 | and @command{sed} only produces output when explicitly told to
|
---|
257 | via the @code{p} command.
|
---|
258 |
|
---|
259 | @item --debug
|
---|
260 | @opindex --debug
|
---|
261 | @cindex @value{SSEDEXT}, debug
|
---|
262 | Print the input sed program in canonical form,
|
---|
263 | and annotate program execution.
|
---|
264 | @codequotebacktick on
|
---|
265 | @codequoteundirected on
|
---|
266 | @example
|
---|
267 | $ echo 1 | sed '\%1%s21232'
|
---|
268 | 3
|
---|
269 |
|
---|
270 | $ echo 1 | sed --debug '\%1%s21232'
|
---|
271 | SED PROGRAM:
|
---|
272 | /1/ s/1/3/
|
---|
273 | INPUT: 'STDIN' line 1
|
---|
274 | PATTERN: 1
|
---|
275 | COMMAND: /1/ s/1/3/
|
---|
276 | PATTERN: 3
|
---|
277 | END-OF-CYCLE:
|
---|
278 | 3
|
---|
279 | @end example
|
---|
280 | @codequotebacktick off
|
---|
281 | @codequoteundirected off
|
---|
282 |
|
---|
283 |
|
---|
284 | @item -e @var{script}
|
---|
285 | @itemx --expression=@var{script}
|
---|
286 | @opindex -e
|
---|
287 | @opindex --expression
|
---|
288 | @cindex Script, from command line
|
---|
289 | Add the commands in @var{script} to the set of commands to be
|
---|
290 | run while processing the input.
|
---|
291 |
|
---|
292 | @item -f @var{script-file}
|
---|
293 | @itemx --file=@var{script-file}
|
---|
294 | @opindex -f
|
---|
295 | @opindex --file
|
---|
296 | @cindex Script, from a file
|
---|
297 | Add the commands contained in the file @var{script-file}
|
---|
298 | to the set of commands to be run while processing the input.
|
---|
299 |
|
---|
300 | @item -i[@var{SUFFIX}]
|
---|
301 | @itemx --in-place[=@var{SUFFIX}]
|
---|
302 | @opindex -i
|
---|
303 | @opindex --in-place
|
---|
304 | @cindex In-place editing, activating
|
---|
305 | @cindex @value{SSEDEXT}, in-place editing
|
---|
306 | This option specifies that files are to be edited in-place.
|
---|
307 | @value{SSED} does this by creating a temporary file and
|
---|
308 | sending output to this file rather than to the standard
|
---|
309 | output.@footnote{This applies to commands such as @code{=},
|
---|
310 | @code{a}, @code{c}, @code{i}, @code{l}, @code{p}. You can
|
---|
311 | still write to the standard output by using the @code{w}
|
---|
312 | @cindex @value{SSEDEXT}, @file{/dev/stdout} file
|
---|
313 | or @code{W} commands together with the @file{/dev/stdout}
|
---|
314 | special file}.
|
---|
315 |
|
---|
316 | This option implies @option{-s}.
|
---|
317 |
|
---|
318 | When the end of the file is reached, the temporary file is
|
---|
319 | renamed to the output file's original name. The extension,
|
---|
320 | if supplied, is used to modify the name of the old file
|
---|
321 | before renaming the temporary file, thereby making a backup
|
---|
322 | copy@footnote{Note that @value{SSED} creates the backup
|
---|
323 | file whether or not any output is actually changed.}).
|
---|
324 |
|
---|
325 | @cindex In-place editing, Perl-style backup file names
|
---|
326 | This rule is followed: if the extension doesn't contain a @code{*},
|
---|
327 | then it is appended to the end of the current filename as a
|
---|
328 | suffix; if the extension does contain one or more @code{*}
|
---|
329 | characters, then @emph{each} asterisk is replaced with the
|
---|
330 | current filename. This allows you to add a prefix to the
|
---|
331 | backup file, instead of (or in addition to) a suffix, or
|
---|
332 | even to place backup copies of the original files into another
|
---|
333 | directory (provided the directory already exists).
|
---|
334 |
|
---|
335 | If no extension is supplied, the original file is
|
---|
336 | overwritten without making a backup.
|
---|
337 |
|
---|
338 | Because @option{-i} takes an optional argument, it should
|
---|
339 | not be followed by other short options:
|
---|
340 | @table @code
|
---|
341 | @item sed -Ei '...' FILE
|
---|
342 | Same as @option{-E -i} with no backup suffix - @file{FILE} will be
|
---|
343 | edited in-place without creating a backup.
|
---|
344 |
|
---|
345 | @item sed -iE '...' FILE
|
---|
346 | This is equivalent to @option{--in-place=E}, creating @file{FILEE} as backup
|
---|
347 | of @file{FILE}
|
---|
348 | @end table
|
---|
349 |
|
---|
350 | Be cautious of using @option{-n} with @option{-i}: the former disables
|
---|
351 | automatic printing of lines and the latter changes the file in-place
|
---|
352 | without a backup. Used carelessly (and without an explicit @code{p} command),
|
---|
353 | the output file will be empty:
|
---|
354 | @codequotebacktick on
|
---|
355 | @codequoteundirected on
|
---|
356 | @example
|
---|
357 | # WRONG USAGE: 'FILE' will be truncated.
|
---|
358 | sed -ni 's/foo/bar/' FILE
|
---|
359 | @end example
|
---|
360 | @codequotebacktick off
|
---|
361 | @codequoteundirected off
|
---|
362 |
|
---|
363 | @item -l @var{N}
|
---|
364 | @itemx --line-length=@var{N}
|
---|
365 | @opindex -l
|
---|
366 | @opindex --line-length
|
---|
367 | @cindex Line length, setting
|
---|
368 | Specify the default line-wrap length for the @code{l} command.
|
---|
369 | A length of 0 (zero) means to never wrap long lines. If
|
---|
370 | not specified, it is taken to be 70.
|
---|
371 |
|
---|
372 | @item --posix
|
---|
373 | @opindex --posix
|
---|
374 | @cindex @value{SSEDEXT}, disabling
|
---|
375 | @value{SSED} includes several extensions to POSIX
|
---|
376 | sed. In order to simplify writing portable scripts, this
|
---|
377 | option disables all the extensions that this manual documents,
|
---|
378 | including additional commands.
|
---|
379 | @cindex @code{POSIXLY_CORRECT} behavior, enabling
|
---|
380 | Most of the extensions accept @command{sed} programs that
|
---|
381 | are outside the syntax mandated by POSIX, but some
|
---|
382 | of them (such as the behavior of the @command{N} command
|
---|
383 | described in @ref{Reporting Bugs}) actually violate the
|
---|
384 | standard. If you want to disable only the latter kind of
|
---|
385 | extension, you can set the @code{POSIXLY_CORRECT} variable
|
---|
386 | to a non-empty value.
|
---|
387 |
|
---|
388 | @item -b
|
---|
389 | @itemx --binary
|
---|
390 | @opindex -b
|
---|
391 | @opindex --binary
|
---|
392 | This option is available on every platform, but is only effective where the
|
---|
393 | operating system makes a distinction between text files and binary files.
|
---|
394 | When such a distinction is made---as is the case for MS-DOS, Windows,
|
---|
395 | Cygwin---text files are composed of lines separated by a carriage return
|
---|
396 | @emph{and} a line feed character, and @command{sed} does not see the
|
---|
397 | ending CR. When this option is specified, @command{sed} will open
|
---|
398 | input files in binary mode, thus not requesting this special processing
|
---|
399 | and considering lines to end at a line feed.
|
---|
400 |
|
---|
401 | @item --follow-symlinks
|
---|
402 | @opindex --follow-symlinks
|
---|
403 | This option is available only on platforms that support
|
---|
404 | symbolic links and has an effect only if option @option{-i}
|
---|
405 | is specified. In this case, if the file that is specified
|
---|
406 | on the command line is a symbolic link, @command{sed} will
|
---|
407 | follow the link and edit the ultimate destination of the
|
---|
408 | link. The default behavior is to break the symbolic link,
|
---|
409 | so that the link destination will not be modified.
|
---|
410 |
|
---|
411 | @item -E
|
---|
412 | @itemx -r
|
---|
413 | @itemx --regexp-extended
|
---|
414 | @opindex -E
|
---|
415 | @opindex -r
|
---|
416 | @opindex --regexp-extended
|
---|
417 | @cindex Extended regular expressions, choosing
|
---|
418 | @cindex GNU extensions, extended regular expressions
|
---|
419 | Use extended regular expressions rather than basic
|
---|
420 | regular expressions. Extended regexps are those that
|
---|
421 | @command{egrep} accepts; they can be clearer because they
|
---|
422 | usually have fewer backslashes.
|
---|
423 | Historically this was a GNU extension,
|
---|
424 | but the @option{-E}
|
---|
425 | extension has since been added to the POSIX standard
|
---|
426 | (http://austingroupbugs.net/view.php?id=528),
|
---|
427 | so use @option{-E} for portability.
|
---|
428 | GNU sed has accepted @option{-E} as an undocumented option for years,
|
---|
429 | and *BSD seds have accepted @option{-E} for years as well,
|
---|
430 | but scripts that use @option{-E} might not port to other older systems.
|
---|
431 | @xref{ERE syntax, , Extended regular expressions}.
|
---|
432 |
|
---|
433 |
|
---|
434 | @item -s
|
---|
435 | @itemx --separate
|
---|
436 | @opindex -s
|
---|
437 | @opindex --separate
|
---|
438 | @cindex Working on separate files
|
---|
439 | By default, @command{sed} will consider the files specified on the
|
---|
440 | command line as a single continuous long stream. This @value{SSED}
|
---|
441 | extension allows the user to consider them as separate files:
|
---|
442 | range addresses (such as @samp{/abc/,/def/}) are not allowed
|
---|
443 | to span several files, line numbers are relative to the start
|
---|
444 | of each file, @code{$} refers to the last line of each file,
|
---|
445 | and files invoked from the @code{R} commands are rewound at the
|
---|
446 | start of each file.
|
---|
447 |
|
---|
448 | @item --sandbox
|
---|
449 | @opindex --sandbox
|
---|
450 | @cindex Sandbox mode
|
---|
451 | In sandbox mode, @code{e/w/r} commands are rejected - programs containing
|
---|
452 | them will be aborted without being run. Sandbox mode ensures @command{sed}
|
---|
453 | operates only on the input files designated on the command line, and
|
---|
454 | cannot run external programs.
|
---|
455 |
|
---|
456 |
|
---|
457 | @item -u
|
---|
458 | @itemx --unbuffered
|
---|
459 | @opindex -u
|
---|
460 | @opindex --unbuffered
|
---|
461 | @cindex Unbuffered I/O, choosing
|
---|
462 | Buffer both input and output as minimally as practical.
|
---|
463 | (This is particularly useful if the input is coming from
|
---|
464 | the likes of @samp{tail -f}, and you wish to see the transformed
|
---|
465 | output as soon as possible.)
|
---|
466 |
|
---|
467 | @item -z
|
---|
468 | @itemx --null-data
|
---|
469 | @itemx --zero-terminated
|
---|
470 | @opindex -z
|
---|
471 | @opindex --null-data
|
---|
472 | @opindex --zero-terminated
|
---|
473 | Treat the input as a set of lines, each terminated by a zero byte
|
---|
474 | (the ASCII @samp{NUL} character) instead of a newline. This option can
|
---|
475 | be used with commands like @samp{sort -z} and @samp{find -print0}
|
---|
476 | to process arbitrary file names.
|
---|
477 | @end table
|
---|
478 |
|
---|
479 | If no @option{-e}, @option{-f}, @option{--expression}, or @option{--file}
|
---|
480 | options are given on the command-line,
|
---|
481 | then the first non-option argument on the command line is
|
---|
482 | taken to be the @var{script} to be executed.
|
---|
483 |
|
---|
484 | @cindex Files to be processed as input
|
---|
485 | If any command-line parameters remain after processing the above,
|
---|
486 | these parameters are interpreted as the names of input files to
|
---|
487 | be processed.
|
---|
488 | @cindex Standard input, processing as input
|
---|
489 | A file name of @samp{-} refers to the standard input stream.
|
---|
490 | The standard input will be processed if no file names are specified.
|
---|
491 |
|
---|
492 | @node Exit status
|
---|
493 | @section Exit status
|
---|
494 | @cindex exit status
|
---|
495 | An exit status of zero indicates success, and a nonzero value
|
---|
496 | indicates failure. @value{SSED} returns the following exit status
|
---|
497 | error values:
|
---|
498 |
|
---|
499 | @table @asis
|
---|
500 | @item 0
|
---|
501 | Successful completion.
|
---|
502 |
|
---|
503 | @item 1
|
---|
504 | Invalid command, invalid syntax, invalid regular expression or a
|
---|
505 | @value{SSED} extension command used with @option{--posix}.
|
---|
506 |
|
---|
507 | @item 2
|
---|
508 | One or more of the input file specified on the command line could not be
|
---|
509 | opened (e.g. if a file is not found, or read permission is denied).
|
---|
510 | Processing continued with other files.
|
---|
511 |
|
---|
512 | @item 4
|
---|
513 | An I/O error, or a serious processing error during runtime,
|
---|
514 | @value{SSED} aborted immediately.
|
---|
515 | @end table
|
---|
516 |
|
---|
517 | @cindex Q, example
|
---|
518 | @cindex exit status, example
|
---|
519 | Additionally, the commands @code{q} and @code{Q} can be used to terminate
|
---|
520 | @command{sed} with a custom exit code value (this is a @value{SSED} extension):
|
---|
521 |
|
---|
522 | @example
|
---|
523 | $ echo | sed 'Q42' ; echo $?
|
---|
524 | 42
|
---|
525 | @end example
|
---|
526 |
|
---|
527 |
|
---|
528 | @node sed scripts
|
---|
529 | @chapter @command{sed} scripts
|
---|
530 |
|
---|
531 |
|
---|
532 | @menu
|
---|
533 | * sed script overview:: @command{sed} script overview
|
---|
534 | * sed commands list:: @command{sed} commands summary
|
---|
535 | * The "s" Command:: @command{sed}'s Swiss Army Knife
|
---|
536 | * Common Commands:: Often used commands
|
---|
537 | * Other Commands:: Less frequently used commands
|
---|
538 | * Programming Commands:: Commands for @command{sed} gurus
|
---|
539 | * Extended Commands:: Commands specific of @value{SSED}
|
---|
540 | * Multiple commands syntax:: Extension for easier scripting
|
---|
541 | @end menu
|
---|
542 |
|
---|
543 | @node sed script overview
|
---|
544 | @section @command{sed} script overview
|
---|
545 |
|
---|
546 | @cindex @command{sed} script structure
|
---|
547 | @cindex Script structure
|
---|
548 |
|
---|
549 | A @command{sed} program consists of one or more @command{sed} commands,
|
---|
550 | passed in by one or more of the
|
---|
551 | @option{-e}, @option{-f}, @option{--expression}, and @option{--file}
|
---|
552 | options, or the first non-option argument if zero of these
|
---|
553 | options are used.
|
---|
554 | This document will refer to ``the'' @command{sed} script;
|
---|
555 | this is understood to mean the in-order concatenation
|
---|
556 | of all of the @var{script}s and @var{script-file}s passed in.
|
---|
557 | @xref{Overview}.
|
---|
558 |
|
---|
559 |
|
---|
560 | @cindex @command{sed} commands syntax
|
---|
561 | @cindex syntax, @command{sed} commands
|
---|
562 | @cindex addresses, syntax
|
---|
563 | @cindex syntax, addresses
|
---|
564 | @command{sed} commands follow this syntax:
|
---|
565 |
|
---|
566 | @example
|
---|
567 | [addr]@var{X}[options]
|
---|
568 | @end example
|
---|
569 |
|
---|
570 | @var{X} is a single-letter @command{sed} command.
|
---|
571 | @c TODO: add @pxref{commands} when there is a command-list section.
|
---|
572 | @code{[addr]} is an optional line address. If @code{[addr]} is specified,
|
---|
573 | the command @var{X} will be executed only on the matched lines.
|
---|
574 | @code{[addr]} can be a single line number, a regular expression,
|
---|
575 | or a range of lines (@pxref{sed addresses}).
|
---|
576 | Additional @code{[options]} are used for some @command{sed} commands.
|
---|
577 |
|
---|
578 | @cindex @command{d}, example
|
---|
579 | @cindex address range, example
|
---|
580 | @cindex example, address range
|
---|
581 | The following example deletes lines 30 to 35 in the input.
|
---|
582 | @code{30,35} is an address range. @command{d} is the delete command:
|
---|
583 |
|
---|
584 | @example
|
---|
585 | sed '30,35d' input.txt > output.txt
|
---|
586 | @end example
|
---|
587 |
|
---|
588 | @cindex @command{q}, example
|
---|
589 | @cindex regular expression, example
|
---|
590 | @cindex example, regular expression
|
---|
591 | The following example prints all input until a line
|
---|
592 | starting with the string @samp{foo} is found. If such line is found,
|
---|
593 | @command{sed} will terminate with exit status 42.
|
---|
594 | If such line was not found (and no other error occurred), @command{sed}
|
---|
595 | will exit with status 0.
|
---|
596 | @code{/^foo/} is a regular-expression address.
|
---|
597 | @command{q} is the quit command. @code{42} is the command option.
|
---|
598 |
|
---|
599 | @example
|
---|
600 | sed '/^foo/q42' input.txt > output.txt
|
---|
601 | @end example
|
---|
602 |
|
---|
603 |
|
---|
604 | @cindex multiple @command{sed} commands
|
---|
605 | @cindex @command{sed} commands, multiple
|
---|
606 | @cindex newline, command separator
|
---|
607 | @cindex semicolons, command separator
|
---|
608 | @cindex ;, command separator
|
---|
609 | @cindex -e, example
|
---|
610 | @cindex -f, example
|
---|
611 | Commands within a @var{script} or @var{script-file} can be
|
---|
612 | separated by semicolons (@code{;}) or newlines (ASCII 10).
|
---|
613 | Multiple scripts can be specified with @option{-e} or @option{-f}
|
---|
614 | options.
|
---|
615 |
|
---|
616 | The following examples are all equivalent. They perform two @command{sed}
|
---|
617 | operations: deleting any lines matching the regular expression @code{/^foo/},
|
---|
618 | and replacing all occurrences of the string @samp{hello} with @samp{world}:
|
---|
619 |
|
---|
620 | @example
|
---|
621 | sed '/^foo/d ; s/hello/world/g' input.txt > output.txt
|
---|
622 |
|
---|
623 | sed -e '/^foo/d' -e 's/hello/world/g' input.txt > output.txt
|
---|
624 |
|
---|
625 | echo '/^foo/d' > script.sed
|
---|
626 | echo 's/hello/world/g' >> script.sed
|
---|
627 | sed -f script.sed input.txt > output.txt
|
---|
628 |
|
---|
629 | echo 's/hello/world/g' > script2.sed
|
---|
630 | sed -e '/^foo/d' -f script2.sed input.txt > output.txt
|
---|
631 | @end example
|
---|
632 |
|
---|
633 |
|
---|
634 | @cindex @command{a}, and semicolons
|
---|
635 | @cindex @command{c}, and semicolons
|
---|
636 | @cindex @command{i}, and semicolons
|
---|
637 | Commands @command{a}, @command{c}, @command{i}, due to their syntax,
|
---|
638 | cannot be followed by semicolons working as command separators and
|
---|
639 | thus should be terminated
|
---|
640 | with newlines or be placed at the end of a @var{script} or @var{script-file}.
|
---|
641 | Commands can also be preceded with optional non-significant
|
---|
642 | whitespace characters.
|
---|
643 | @xref{Multiple commands syntax}.
|
---|
644 |
|
---|
645 |
|
---|
646 |
|
---|
647 | @node sed commands list
|
---|
648 | @section @command{sed} commands summary
|
---|
649 |
|
---|
650 | The following commands are supported in @value{SSED}.
|
---|
651 | Some are standard POSIX commands, while other are @value{SSEDEXT}.
|
---|
652 | Details and examples for each command are in the following sections.
|
---|
653 | (Mnemonics) are shown in parentheses.
|
---|
654 |
|
---|
655 | @table @code
|
---|
656 |
|
---|
657 | @item a\
|
---|
658 | @itemx @var{text}
|
---|
659 | Append @var{text} after a line.
|
---|
660 |
|
---|
661 | @item a @var{text}
|
---|
662 | Append @var{text} after a line (alternative syntax).
|
---|
663 |
|
---|
664 | @item b @var{label}
|
---|
665 | Branch unconditionally to @var{label}.
|
---|
666 | The @var{label} may be omitted, in which case the next cycle is started.
|
---|
667 |
|
---|
668 | @item c\
|
---|
669 | @itemx @var{text}
|
---|
670 | Replace (change) lines with @var{text}.
|
---|
671 |
|
---|
672 | @item c @var{text}
|
---|
673 | Replace (change) lines with @var{text} (alternative syntax).
|
---|
674 |
|
---|
675 | @item d
|
---|
676 | Delete the pattern space;
|
---|
677 | immediately start next cycle.
|
---|
678 |
|
---|
679 | @item D
|
---|
680 | If pattern space contains newlines, delete text in the pattern
|
---|
681 | space up to the first newline, and restart cycle with the resultant
|
---|
682 | pattern space, without reading a new line of input.
|
---|
683 |
|
---|
684 | If pattern space contains no newline, start a normal new cycle as if
|
---|
685 | the @code{d} command was issued.
|
---|
686 | @c TODO: add a section about D+N and D+n commands
|
---|
687 |
|
---|
688 | @item e
|
---|
689 | Executes the command that is found in pattern space and
|
---|
690 | replaces the pattern space with the output; a trailing newline
|
---|
691 | is suppressed.
|
---|
692 |
|
---|
693 | @item e @var{command}
|
---|
694 | Executes @var{command} and sends its output to the output stream.
|
---|
695 | The command can run across multiple lines, all but the last ending with
|
---|
696 | a back-slash.
|
---|
697 |
|
---|
698 | @item F
|
---|
699 | (filename) Print the file name of the current input file (with a trailing
|
---|
700 | newline).
|
---|
701 |
|
---|
702 | @item g
|
---|
703 | Replace the contents of the pattern space with the contents of the hold space.
|
---|
704 |
|
---|
705 | @item G
|
---|
706 | Append a newline to the contents of the pattern space,
|
---|
707 | and then append the contents of the hold space to that of the pattern space.
|
---|
708 |
|
---|
709 | @item h
|
---|
710 | (hold) Replace the contents of the hold space with the contents of the
|
---|
711 | pattern space.
|
---|
712 |
|
---|
713 | @item H
|
---|
714 | Append a newline to the contents of the hold space,
|
---|
715 | and then append the contents of the pattern space to that of the hold space.
|
---|
716 |
|
---|
717 | @item i\
|
---|
718 | @itemx @var{text}
|
---|
719 | insert @var{text} before a line.
|
---|
720 |
|
---|
721 | @item i @var{text}
|
---|
722 | insert @var{text} before a line (alternative syntax).
|
---|
723 |
|
---|
724 | @item l
|
---|
725 | Print the pattern space in an unambiguous form.
|
---|
726 |
|
---|
727 | @item n
|
---|
728 | (next) If auto-print is not disabled, print the pattern space,
|
---|
729 | then, regardless, replace the pattern space with the next line of input.
|
---|
730 | If there is no more input then @command{sed} exits without processing
|
---|
731 | any more commands.
|
---|
732 |
|
---|
733 | @item N
|
---|
734 | Add a newline to the pattern space,
|
---|
735 | then append the next line of input to the pattern space.
|
---|
736 | If there is no more input then @command{sed} exits without processing
|
---|
737 | any more commands.
|
---|
738 |
|
---|
739 | @item p
|
---|
740 | Print the pattern space.
|
---|
741 | @c useful with @option{-n}
|
---|
742 |
|
---|
743 | @item P
|
---|
744 | Print the pattern space, up to the first <newline>.
|
---|
745 |
|
---|
746 | @item q@var{[exit-code]}
|
---|
747 | (quit) Exit @command{sed} without processing any more commands or input.
|
---|
748 |
|
---|
749 | @item Q@var{[exit-code]}
|
---|
750 | (quit) This command is the same as @code{q}, but will not print the
|
---|
751 | contents of pattern space. Like @code{q}, it provides the
|
---|
752 | ability to return an exit code to the caller.
|
---|
753 | @c useful to quit on a conditional without printing
|
---|
754 |
|
---|
755 | @item r filename
|
---|
756 | Reads file @var{filename}.
|
---|
757 |
|
---|
758 | @item R filename
|
---|
759 | Queue a line of @var{filename} to be read and
|
---|
760 | inserted into the output stream at the end of the current cycle,
|
---|
761 | or when the next input line is read.
|
---|
762 | @c useful to interleave files
|
---|
763 |
|
---|
764 | @item s@var{/regexp/replacement/[flags]}
|
---|
765 | (substitute) Match the regular-expression against the content of the
|
---|
766 | pattern space. If found, replace matched string with
|
---|
767 | @var{replacement}.
|
---|
768 |
|
---|
769 | @item t @var{label}
|
---|
770 | (test) Branch to @var{label} only if there has been a successful
|
---|
771 | @code{s}ubstitution since the last input line was read or conditional
|
---|
772 | branch was taken. The @var{label} may be omitted, in which case the
|
---|
773 | next cycle is started.
|
---|
774 |
|
---|
775 | @item T @var{label}
|
---|
776 | (test) Branch to @var{label} only if there have been no successful
|
---|
777 | @code{s}ubstitutions since the last input line was read or
|
---|
778 | conditional branch was taken. The @var{label} may be omitted,
|
---|
779 | in which case the next cycle is started.
|
---|
780 |
|
---|
781 | @item v @var{[version]}
|
---|
782 | (version) This command does nothing, but makes @command{sed} fail if
|
---|
783 | @value{SSED} extensions are not supported, or if the requested version
|
---|
784 | is not available.
|
---|
785 |
|
---|
786 | @item w filename
|
---|
787 | Write the pattern space to @var{filename}.
|
---|
788 |
|
---|
789 | @item W filename
|
---|
790 | Write to the given filename the portion of the pattern space up to
|
---|
791 | the first newline
|
---|
792 |
|
---|
793 | @item x
|
---|
794 | Exchange the contents of the hold and pattern spaces.
|
---|
795 |
|
---|
796 |
|
---|
797 | @item y/src/dst/
|
---|
798 | Transliterate any characters in the pattern space which match
|
---|
799 | any of the @var{source-chars} with the corresponding character
|
---|
800 | in @var{dest-chars}.
|
---|
801 |
|
---|
802 |
|
---|
803 | @item z
|
---|
804 | (zap) This command empties the content of pattern space.
|
---|
805 |
|
---|
806 | @item #
|
---|
807 | A comment, until the next newline.
|
---|
808 |
|
---|
809 |
|
---|
810 | @item @{ @var{cmd ; cmd ...} @}
|
---|
811 | Group several commands together.
|
---|
812 | @c useful for multiple commands on same address
|
---|
813 |
|
---|
814 | @item =
|
---|
815 | Print the current input line number (with a trailing newline).
|
---|
816 |
|
---|
817 | @item : @var{label}
|
---|
818 | Specify the location of @var{label} for branch commands (@code{b},
|
---|
819 | @code{t}, @code{T}).
|
---|
820 |
|
---|
821 | @end table
|
---|
822 |
|
---|
823 |
|
---|
824 | @node The "s" Command
|
---|
825 | @section The @code{s} Command
|
---|
826 |
|
---|
827 | The @code{s} command (as in substitute) is probably the most important
|
---|
828 | in @command{sed} and has a lot of different options. The syntax of
|
---|
829 | the @code{s} command is
|
---|
830 | @samp{s/@var{regexp}/@var{replacement}/@var{flags}}.
|
---|
831 |
|
---|
832 | Its basic concept is simple: the @code{s} command attempts to match
|
---|
833 | the pattern space against the supplied regular expression @var{regexp};
|
---|
834 | if the match is successful, then that portion of the
|
---|
835 | pattern space which was matched is replaced with @var{replacement}.
|
---|
836 |
|
---|
837 | For details about @var{regexp} syntax @pxref{Regexp Addresses,,Regular
|
---|
838 | Expression Addresses}.
|
---|
839 |
|
---|
840 | @cindex Backreferences, in regular expressions
|
---|
841 | @cindex Parenthesized substrings
|
---|
842 | The @var{replacement} can contain @code{\@var{n}} (@var{n} being
|
---|
843 | a number from 1 to 9, inclusive) references, which refer to
|
---|
844 | the portion of the match which is contained between the @var{n}th
|
---|
845 | @code{\(} and its matching @code{\)}.
|
---|
846 | Also, the @var{replacement} can contain unescaped @code{&}
|
---|
847 | characters which reference the whole matched portion
|
---|
848 | of the pattern space.
|
---|
849 |
|
---|
850 | @c TODO: xref to backreference section mention @var{\'}.
|
---|
851 |
|
---|
852 | The @code{/}
|
---|
853 | characters may be uniformly replaced by any other single
|
---|
854 | character within any given @code{s} command. The @code{/}
|
---|
855 | character (or whatever other character is used in its stead)
|
---|
856 | can appear in the @var{regexp} or @var{replacement}
|
---|
857 | only if it is preceded by a @code{\} character.
|
---|
858 |
|
---|
859 |
|
---|
860 |
|
---|
861 | @cindex @value{SSEDEXT}, case modifiers in @code{s} commands
|
---|
862 | Finally, as a @value{SSED} extension, you can include a
|
---|
863 | special sequence made of a backslash and one of the letters
|
---|
864 | @code{L}, @code{l}, @code{U}, @code{u}, or @code{E}.
|
---|
865 | The meaning is as follows:
|
---|
866 |
|
---|
867 | @table @code
|
---|
868 | @item \L
|
---|
869 | Turn the replacement
|
---|
870 | to lowercase until a @code{\U} or @code{\E} is found,
|
---|
871 |
|
---|
872 | @item \l
|
---|
873 | Turn the
|
---|
874 | next character to lowercase,
|
---|
875 |
|
---|
876 | @item \U
|
---|
877 | Turn the replacement to uppercase
|
---|
878 | until a @code{\L} or @code{\E} is found,
|
---|
879 |
|
---|
880 | @item \u
|
---|
881 | Turn the next character
|
---|
882 | to uppercase,
|
---|
883 |
|
---|
884 | @item \E
|
---|
885 | Stop case conversion started by @code{\L} or @code{\U}.
|
---|
886 | @end table
|
---|
887 |
|
---|
888 | When the @code{g} flag is being used, case conversion does not
|
---|
889 | propagate from one occurrence of the regular expression to
|
---|
890 | another. For example, when the following command is executed
|
---|
891 | with @samp{a-b-} in pattern space:
|
---|
892 | @example
|
---|
893 | s/\(b\?\)-/x\u\1/g
|
---|
894 | @end example
|
---|
895 |
|
---|
896 | @noindent
|
---|
897 | the output is @samp{axxB}. When replacing the first @samp{-},
|
---|
898 | the @samp{\u} sequence only affects the empty replacement of
|
---|
899 | @samp{\1}. It does not affect the @code{x} character that is
|
---|
900 | added to pattern space when replacing @code{b-} with @code{xB}.
|
---|
901 |
|
---|
902 | On the other hand, @code{\l} and @code{\u} do affect the remainder
|
---|
903 | of the replacement text if they are followed by an empty substitution.
|
---|
904 | With @samp{a-b-} in pattern space, the following command:
|
---|
905 | @example
|
---|
906 | s/\(b\?\)-/\u\1x/g
|
---|
907 | @end example
|
---|
908 |
|
---|
909 | @noindent
|
---|
910 | will replace @samp{-} with @samp{X} (uppercase) and @samp{b-} with
|
---|
911 | @samp{Bx}. If this behavior is undesirable, you can prevent it by
|
---|
912 | adding a @samp{\E} sequence---after @samp{\1} in this case.
|
---|
913 |
|
---|
914 | To include a literal @code{\}, @code{&}, or newline in the final
|
---|
915 | replacement, be sure to precede the desired @code{\}, @code{&},
|
---|
916 | or newline in the @var{replacement} with a @code{\}.
|
---|
917 |
|
---|
918 | @findex s command, option flags
|
---|
919 | @cindex Substitution of text, options
|
---|
920 | The @code{s} command can be followed by zero or more of the
|
---|
921 | following @var{flags}:
|
---|
922 |
|
---|
923 | @table @code
|
---|
924 | @item g
|
---|
925 | @cindex Global substitution
|
---|
926 | @cindex Replacing all text matching regexp in a line
|
---|
927 | Apply the replacement to @emph{all} matches to the @var{regexp},
|
---|
928 | not just the first.
|
---|
929 |
|
---|
930 | @item @var{number}
|
---|
931 | @cindex Replacing only @var{n}th match of regexp in a line
|
---|
932 | Only replace the @var{number}th match of the @var{regexp}.
|
---|
933 |
|
---|
934 | @cindex GNU extensions, @code{g} and @var{number} modifier
|
---|
935 | interaction in @code{s} command
|
---|
936 | @cindex Mixing @code{g} and @var{number} modifiers in the @code{s} command
|
---|
937 | Note: the @sc{posix} standard does not specify what should happen
|
---|
938 | when you mix the @code{g} and @var{number} modifiers,
|
---|
939 | and currently there is no widely agreed upon meaning
|
---|
940 | across @command{sed} implementations.
|
---|
941 | For @value{SSED}, the interaction is defined to be:
|
---|
942 | ignore matches before the @var{number}th,
|
---|
943 | and then match and replace all matches from
|
---|
944 | the @var{number}th on.
|
---|
945 |
|
---|
946 | @item p
|
---|
947 | @cindex Text, printing after substitution
|
---|
948 | If the substitution was made, then print the new pattern space.
|
---|
949 |
|
---|
950 | Note: when both the @code{p} and @code{e} options are specified,
|
---|
951 | the relative ordering of the two produces very different results.
|
---|
952 | In general, @code{ep} (evaluate then print) is what you want,
|
---|
953 | but operating the other way round can be useful for debugging.
|
---|
954 | For this reason, the current version of @value{SSED} interprets
|
---|
955 | specially the presence of @code{p} options both before and after
|
---|
956 | @code{e}, printing the pattern space before and after evaluation,
|
---|
957 | while in general flags for the @code{s} command show their
|
---|
958 | effect just once. This behavior, although documented, might
|
---|
959 | change in future versions.
|
---|
960 |
|
---|
961 | @item w @var{filename}
|
---|
962 | @cindex Text, writing to a file after substitution
|
---|
963 | @cindex @value{SSEDEXT}, @file{/dev/stdout} file
|
---|
964 | @cindex @value{SSEDEXT}, @file{/dev/stderr} file
|
---|
965 | If the substitution was made, then write out the result to the named file.
|
---|
966 | As a @value{SSED} extension, two special values of @var{filename} are
|
---|
967 | supported: @file{/dev/stderr}, which writes the result to the standard
|
---|
968 | error, and @file{/dev/stdout}, which writes to the standard
|
---|
969 | output.@footnote{This is equivalent to @code{p} unless the @option{-i}
|
---|
970 | option is being used.}
|
---|
971 |
|
---|
972 | @item e
|
---|
973 | @cindex Evaluate Bourne-shell commands, after substitution
|
---|
974 | @cindex Subprocesses
|
---|
975 | @cindex @value{SSEDEXT}, evaluating Bourne-shell commands
|
---|
976 | @cindex @value{SSEDEXT}, subprocesses
|
---|
977 | This command allows one to pipe input from a shell command
|
---|
978 | into pattern space. If a substitution was made, the command
|
---|
979 | that is found in pattern space is executed and pattern space
|
---|
980 | is replaced with its output. A trailing newline is suppressed;
|
---|
981 | results are undefined if the command to be executed contains
|
---|
982 | a @sc{nul} character. This is a @value{SSED} extension.
|
---|
983 |
|
---|
984 | @item I
|
---|
985 | @itemx i
|
---|
986 | @cindex GNU extensions, @code{I} modifier
|
---|
987 | @cindex Case-insensitive matching
|
---|
988 | The @code{I} modifier to regular-expression matching is a GNU
|
---|
989 | extension which makes @command{sed} match @var{regexp} in a
|
---|
990 | case-insensitive manner.
|
---|
991 |
|
---|
992 | @item M
|
---|
993 | @itemx m
|
---|
994 | @cindex @value{SSEDEXT}, @code{M} modifier
|
---|
995 | The @code{M} modifier to regular-expression matching is a @value{SSED}
|
---|
996 | extension which directs @value{SSED} to match the regular expression
|
---|
997 | in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to
|
---|
998 | match respectively (in addition to the normal behavior) the empty string
|
---|
999 | after a newline, and the empty string before a newline. There are
|
---|
1000 | special character sequences
|
---|
1001 | @ifclear PERL
|
---|
1002 | (@code{\`} and @code{\'})
|
---|
1003 | @end ifclear
|
---|
1004 | which always match the beginning or the end of the buffer.
|
---|
1005 | In addition,
|
---|
1006 | the period character does not match a new-line character in
|
---|
1007 | multi-line mode.
|
---|
1008 |
|
---|
1009 |
|
---|
1010 | @end table
|
---|
1011 |
|
---|
1012 | @node Common Commands
|
---|
1013 | @section Often-Used Commands
|
---|
1014 |
|
---|
1015 | If you use @command{sed} at all, you will quite likely want to know
|
---|
1016 | these commands.
|
---|
1017 |
|
---|
1018 | @table @code
|
---|
1019 | @item #
|
---|
1020 | [No addresses allowed.]
|
---|
1021 |
|
---|
1022 | @findex # (comments)
|
---|
1023 | @cindex Comments, in scripts
|
---|
1024 | The @code{#} character begins a comment;
|
---|
1025 | the comment continues until the next newline.
|
---|
1026 |
|
---|
1027 | @cindex Portability, comments
|
---|
1028 | If you are concerned about portability, be aware that
|
---|
1029 | some implementations of @command{sed} (which are not @sc{posix}
|
---|
1030 | conforming) may only support a single one-line comment,
|
---|
1031 | and then only when the very first character of the script is a @code{#}.
|
---|
1032 |
|
---|
1033 | @findex -n, forcing from within a script
|
---|
1034 | @cindex Caveat --- #n on first line
|
---|
1035 | Warning: if the first two characters of the @command{sed} script
|
---|
1036 | are @code{#n}, then the @option{-n} (no-autoprint) option is forced.
|
---|
1037 | If you want to put a comment in the first line of your script
|
---|
1038 | and that comment begins with the letter @samp{n}
|
---|
1039 | and you do not want this behavior,
|
---|
1040 | then be sure to either use a capital @samp{N},
|
---|
1041 | or place at least one space before the @samp{n}.
|
---|
1042 |
|
---|
1043 | @item q [@var{exit-code}]
|
---|
1044 | @findex q (quit) command
|
---|
1045 | @cindex @value{SSEDEXT}, returning an exit code
|
---|
1046 | @cindex Quitting
|
---|
1047 | Exit @command{sed} without processing any more commands or input.
|
---|
1048 |
|
---|
1049 | Example: stop after printing the second line:
|
---|
1050 | @example
|
---|
1051 | $ seq 3 | sed 2q
|
---|
1052 | 1
|
---|
1053 | 2
|
---|
1054 | @end example
|
---|
1055 |
|
---|
1056 | This command accepts only one address.
|
---|
1057 | Note that the current pattern space is printed if auto-print is
|
---|
1058 | not disabled with the @option{-n} options. The ability to return
|
---|
1059 | an exit code from the @command{sed} script is a @value{SSED} extension.
|
---|
1060 |
|
---|
1061 | See also the @value{SSED} extension @code{Q} command which quits silently
|
---|
1062 | without printing the current pattern space.
|
---|
1063 |
|
---|
1064 | @item d
|
---|
1065 | @findex d (delete) command
|
---|
1066 | @cindex Text, deleting
|
---|
1067 | Delete the pattern space;
|
---|
1068 | immediately start next cycle.
|
---|
1069 |
|
---|
1070 | Example: delete the second input line:
|
---|
1071 | @example
|
---|
1072 | $ seq 3 | sed 2d
|
---|
1073 | 1
|
---|
1074 | 3
|
---|
1075 | @end example
|
---|
1076 |
|
---|
1077 | @item p
|
---|
1078 | @findex p (print) command
|
---|
1079 | @cindex Text, printing
|
---|
1080 | Print out the pattern space (to the standard output).
|
---|
1081 | This command is usually only used in conjunction with the @option{-n}
|
---|
1082 | command-line option.
|
---|
1083 |
|
---|
1084 | Example: print only the second input line:
|
---|
1085 | @example
|
---|
1086 | $ seq 3 | sed -n 2p
|
---|
1087 | 2
|
---|
1088 | @end example
|
---|
1089 |
|
---|
1090 | @item n
|
---|
1091 | @findex n (next-line) command
|
---|
1092 | @cindex Next input line, replace pattern space with
|
---|
1093 | @cindex Read next input line
|
---|
1094 | If auto-print is not disabled, print the pattern space,
|
---|
1095 | then, regardless, replace the pattern space with the next line of input.
|
---|
1096 | If there is no more input then @command{sed} exits without processing
|
---|
1097 | any more commands.
|
---|
1098 |
|
---|
1099 | This command is useful to skip lines (e.g. process every Nth line).
|
---|
1100 |
|
---|
1101 | Example: perform substitution on every 3rd line (i.e. two @code{n} commands
|
---|
1102 | skip two lines):
|
---|
1103 | @codequoteundirected on
|
---|
1104 | @codequotebacktick on
|
---|
1105 | @example
|
---|
1106 | $ seq 6 | sed 'n;n;s/./x/'
|
---|
1107 | 1
|
---|
1108 | 2
|
---|
1109 | x
|
---|
1110 | 4
|
---|
1111 | 5
|
---|
1112 | x
|
---|
1113 | @end example
|
---|
1114 |
|
---|
1115 | @value{SSED} provides an extension address syntax of @var{first}~@var{step}
|
---|
1116 | to achieve the same result:
|
---|
1117 |
|
---|
1118 | @example
|
---|
1119 | $ seq 6 | sed '0~3s/./x/'
|
---|
1120 | 1
|
---|
1121 | 2
|
---|
1122 | x
|
---|
1123 | 4
|
---|
1124 | 5
|
---|
1125 | x
|
---|
1126 | @end example
|
---|
1127 |
|
---|
1128 | @codequotebacktick off
|
---|
1129 | @codequoteundirected off
|
---|
1130 |
|
---|
1131 |
|
---|
1132 | @item @{ @var{commands} @}
|
---|
1133 | @findex @{@} command grouping
|
---|
1134 | @cindex Grouping commands
|
---|
1135 | @cindex Command groups
|
---|
1136 | A group of commands may be enclosed between
|
---|
1137 | @code{@{} and @code{@}} characters.
|
---|
1138 | This is particularly useful when you want a group of commands
|
---|
1139 | to be triggered by a single address (or address-range) match.
|
---|
1140 |
|
---|
1141 | Example: perform substitution then print the second input line:
|
---|
1142 | @codequoteundirected on
|
---|
1143 | @codequotebacktick on
|
---|
1144 | @example
|
---|
1145 | $ seq 3 | sed -n '2@{s/2/X/ ; p@}'
|
---|
1146 | X
|
---|
1147 | @end example
|
---|
1148 | @codequoteundirected off
|
---|
1149 | @codequotebacktick off
|
---|
1150 |
|
---|
1151 | @end table
|
---|
1152 |
|
---|
1153 |
|
---|
1154 | @node Other Commands
|
---|
1155 | @section Less Frequently-Used Commands
|
---|
1156 |
|
---|
1157 | Though perhaps less frequently used than those in the previous
|
---|
1158 | section, some very small yet useful @command{sed} scripts can be built with
|
---|
1159 | these commands.
|
---|
1160 |
|
---|
1161 | @table @code
|
---|
1162 | @item y/@var{source-chars}/@var{dest-chars}/
|
---|
1163 | @findex y (transliterate) command
|
---|
1164 | @cindex Transliteration
|
---|
1165 | Transliterate any characters in the pattern space which match
|
---|
1166 | any of the @var{source-chars} with the corresponding character
|
---|
1167 | in @var{dest-chars}.
|
---|
1168 |
|
---|
1169 | Example: transliterate @samp{a-j} into @samp{0-9}:
|
---|
1170 | @codequoteundirected on
|
---|
1171 | @codequotebacktick on
|
---|
1172 | @example
|
---|
1173 | $ echo hello world | sed 'y/abcdefghij/0123456789/'
|
---|
1174 | 74llo worl3
|
---|
1175 | @end example
|
---|
1176 | @codequoteundirected off
|
---|
1177 | @codequotebacktick off
|
---|
1178 |
|
---|
1179 | (The @code{/} characters may be uniformly replaced by
|
---|
1180 | any other single character within any given @code{y} command.)
|
---|
1181 |
|
---|
1182 | Instances of the @code{/} (or whatever other character is used in its stead),
|
---|
1183 | @code{\}, or newlines can appear in the @var{source-chars} or @var{dest-chars}
|
---|
1184 | lists, provide that each instance is escaped by a @code{\}.
|
---|
1185 | The @var{source-chars} and @var{dest-chars} lists @emph{must}
|
---|
1186 | contain the same number of characters (after de-escaping).
|
---|
1187 |
|
---|
1188 | See the @command{tr} command from GNU coreutils for similar functionality.
|
---|
1189 |
|
---|
1190 | @item a @var{text}
|
---|
1191 | Appending @var{text} after a line. This is a GNU extension
|
---|
1192 | to the standard @code{a} command - see below for details.
|
---|
1193 |
|
---|
1194 | Example: Add @samp{hello} after the second line:
|
---|
1195 | @codequoteundirected on
|
---|
1196 | @codequotebacktick on
|
---|
1197 | @example
|
---|
1198 | $ seq 3 | sed '2a hello'
|
---|
1199 | 1
|
---|
1200 | 2
|
---|
1201 | hello
|
---|
1202 | 3
|
---|
1203 | @end example
|
---|
1204 | @codequoteundirected off
|
---|
1205 | @codequotebacktick off
|
---|
1206 |
|
---|
1207 | Leading whitespace after the @code{a} command is ignored.
|
---|
1208 | The text to add is read until the end of the line.
|
---|
1209 |
|
---|
1210 |
|
---|
1211 | @item a\
|
---|
1212 | @itemx @var{text}
|
---|
1213 | @findex a (append text lines) command
|
---|
1214 | @cindex Appending text after a line
|
---|
1215 | @cindex Text, appending
|
---|
1216 | Appending @var{text} after a line.
|
---|
1217 |
|
---|
1218 | Example: Add @samp{hello} after the second line
|
---|
1219 | (@print{} indicates printed output lines):
|
---|
1220 | @codequoteundirected on
|
---|
1221 | @codequotebacktick on
|
---|
1222 | @example
|
---|
1223 | $ seq 3 | sed '2a\
|
---|
1224 | hello'
|
---|
1225 | @print{}1
|
---|
1226 | @print{}2
|
---|
1227 | @print{}hello
|
---|
1228 | @print{}3
|
---|
1229 | @end example
|
---|
1230 | @codequoteundirected off
|
---|
1231 | @codequotebacktick off
|
---|
1232 |
|
---|
1233 | The @code{a} command queues the lines of text which follow this command
|
---|
1234 | (each but the last ending with a @code{\},
|
---|
1235 | which are removed from the output)
|
---|
1236 | to be output at the end of the current cycle,
|
---|
1237 | or when the next input line is read.
|
---|
1238 |
|
---|
1239 | @cindex @value{SSEDEXT}, two addresses supported by most commands
|
---|
1240 | As a GNU extension, this command accepts two addresses.
|
---|
1241 |
|
---|
1242 | Escape sequences in @var{text} are processed, so you should
|
---|
1243 | use @code{\\} in @var{text} to print a single backslash.
|
---|
1244 |
|
---|
1245 | The commands resume after the last line without a backslash (@code{\}) -
|
---|
1246 | @samp{world} in the following example:
|
---|
1247 | @codequoteundirected on
|
---|
1248 | @codequotebacktick on
|
---|
1249 | @example
|
---|
1250 | $ seq 3 | sed '2a\
|
---|
1251 | hello\
|
---|
1252 | world
|
---|
1253 | 3s/./X/'
|
---|
1254 | @print{}1
|
---|
1255 | @print{}2
|
---|
1256 | @print{}hello
|
---|
1257 | @print{}world
|
---|
1258 | @print{}X
|
---|
1259 | @end example
|
---|
1260 | @codequoteundirected off
|
---|
1261 | @codequotebacktick off
|
---|
1262 |
|
---|
1263 | As a GNU extension, the @code{a} command and @var{text} can be
|
---|
1264 | separated into two @code{-e} parameters, enabling easier scripting:
|
---|
1265 | @codequoteundirected on
|
---|
1266 | @codequotebacktick on
|
---|
1267 | @example
|
---|
1268 | $ seq 3 | sed -e '2a\' -e hello
|
---|
1269 | 1
|
---|
1270 | 2
|
---|
1271 | hello
|
---|
1272 | 3
|
---|
1273 |
|
---|
1274 | $ sed -e '2a\' -e "$VAR"
|
---|
1275 | @end example
|
---|
1276 | @codequoteundirected off
|
---|
1277 | @codequotebacktick off
|
---|
1278 |
|
---|
1279 | @item i @var{text}
|
---|
1280 | insert @var{text} before a line. This is a GNU extension
|
---|
1281 | to the standard @code{i} command - see below for details.
|
---|
1282 |
|
---|
1283 | Example: Insert @samp{hello} before the second line:
|
---|
1284 | @codequoteundirected on
|
---|
1285 | @codequotebacktick on
|
---|
1286 | @example
|
---|
1287 | $ seq 3 | sed '2i hello'
|
---|
1288 | 1
|
---|
1289 | hello
|
---|
1290 | 2
|
---|
1291 | 3
|
---|
1292 | @end example
|
---|
1293 | @codequoteundirected off
|
---|
1294 | @codequotebacktick off
|
---|
1295 |
|
---|
1296 | Leading whitespace after the @code{i} command is ignored.
|
---|
1297 | The text to add is read until the end of the line.
|
---|
1298 |
|
---|
1299 | @anchor{insert command}
|
---|
1300 | @item i\
|
---|
1301 | @itemx @var{text}
|
---|
1302 | @findex i (insert text lines) command
|
---|
1303 | @cindex Inserting text before a line
|
---|
1304 | @cindex Text, insertion
|
---|
1305 | Immediately output the lines of text which follow this command.
|
---|
1306 |
|
---|
1307 | Example: Insert @samp{hello} before the second line
|
---|
1308 | (@print{} indicates printed output lines):
|
---|
1309 | @codequoteundirected on
|
---|
1310 | @codequotebacktick on
|
---|
1311 | @example
|
---|
1312 | $ seq 3 | sed '2i\
|
---|
1313 | hello'
|
---|
1314 | @print{}1
|
---|
1315 | @print{}hello
|
---|
1316 | @print{}2
|
---|
1317 | @print{}3
|
---|
1318 | @end example
|
---|
1319 | @codequoteundirected off
|
---|
1320 | @codequotebacktick off
|
---|
1321 |
|
---|
1322 | @cindex @value{SSEDEXT}, two addresses supported by most commands
|
---|
1323 | As a GNU extension, this command accepts two addresses.
|
---|
1324 |
|
---|
1325 | Escape sequences in @var{text} are processed, so you should
|
---|
1326 | use @code{\\} in @var{text} to print a single backslash.
|
---|
1327 |
|
---|
1328 | The commands resume after the last line without a backslash (@code{\}) -
|
---|
1329 | @samp{world} in the following example:
|
---|
1330 | @codequoteundirected on
|
---|
1331 | @codequotebacktick on
|
---|
1332 | @example
|
---|
1333 | $ seq 3 | sed '2i\
|
---|
1334 | hello\
|
---|
1335 | world
|
---|
1336 | s/./X/'
|
---|
1337 | @print{}X
|
---|
1338 | @print{}hello
|
---|
1339 | @print{}world
|
---|
1340 | @print{}X
|
---|
1341 | @print{}X
|
---|
1342 | @end example
|
---|
1343 | @codequoteundirected off
|
---|
1344 | @codequotebacktick off
|
---|
1345 |
|
---|
1346 | As a GNU extension, the @code{i} command and @var{text} can be
|
---|
1347 | separated into two @code{-e} parameters, enabling easier scripting:
|
---|
1348 | @codequoteundirected on
|
---|
1349 | @codequotebacktick on
|
---|
1350 | @example
|
---|
1351 | $ seq 3 | sed -e '2i\' -e hello
|
---|
1352 | 1
|
---|
1353 | hello
|
---|
1354 | 2
|
---|
1355 | 3
|
---|
1356 |
|
---|
1357 | $ sed -e '2i\' -e "$VAR"
|
---|
1358 | @end example
|
---|
1359 | @codequoteundirected off
|
---|
1360 | @codequotebacktick off
|
---|
1361 |
|
---|
1362 | @item c @var{text}
|
---|
1363 | Replaces the line(s) with @var{text}. This is a GNU extension
|
---|
1364 | to the standard @code{c} command - see below for details.
|
---|
1365 |
|
---|
1366 | Example: Replace the 2nd to 9th lines with the word @samp{hello}:
|
---|
1367 | @codequoteundirected on
|
---|
1368 | @codequotebacktick on
|
---|
1369 | @example
|
---|
1370 | $ seq 10 | sed '2,9c hello'
|
---|
1371 | 1
|
---|
1372 | hello
|
---|
1373 | 10
|
---|
1374 | @end example
|
---|
1375 | @codequoteundirected off
|
---|
1376 | @codequotebacktick off
|
---|
1377 |
|
---|
1378 | Leading whitespace after the @code{c} command is ignored.
|
---|
1379 | The text to add is read until the end of the line.
|
---|
1380 |
|
---|
1381 | @item c\
|
---|
1382 | @itemx @var{text}
|
---|
1383 | @findex c (change to text lines) command
|
---|
1384 | @cindex Replacing selected lines with other text
|
---|
1385 | Delete the lines matching the address or address-range,
|
---|
1386 | and output the lines of text which follow this command.
|
---|
1387 |
|
---|
1388 | Example: Replace 2nd to 4th lines with the words @samp{hello} and
|
---|
1389 | @samp{world} (@print{} indicates printed output lines):
|
---|
1390 | @codequoteundirected on
|
---|
1391 | @codequotebacktick on
|
---|
1392 | @example
|
---|
1393 | $ seq 5 | sed '2,4c\
|
---|
1394 | hello\
|
---|
1395 | world'
|
---|
1396 | @print{}1
|
---|
1397 | @print{}hello
|
---|
1398 | @print{}world
|
---|
1399 | @print{}5
|
---|
1400 | @end example
|
---|
1401 | @codequoteundirected off
|
---|
1402 | @codequotebacktick off
|
---|
1403 |
|
---|
1404 | If no addresses are given, each line is replaced.
|
---|
1405 |
|
---|
1406 | A new cycle is started after this command is done,
|
---|
1407 | since the pattern space will have been deleted.
|
---|
1408 | In the following example, the @code{c} starts a
|
---|
1409 | new cycle and the substitution command is not performed
|
---|
1410 | on the replaced text:
|
---|
1411 |
|
---|
1412 | @codequoteundirected on
|
---|
1413 | @codequotebacktick on
|
---|
1414 | @example
|
---|
1415 | $ seq 3 | sed '2c\
|
---|
1416 | hello
|
---|
1417 | s/./X/'
|
---|
1418 | @print{}X
|
---|
1419 | @print{}hello
|
---|
1420 | @print{}X
|
---|
1421 | @end example
|
---|
1422 | @codequoteundirected off
|
---|
1423 | @codequotebacktick off
|
---|
1424 |
|
---|
1425 | As a GNU extension, the @code{c} command and @var{text} can be
|
---|
1426 | separated into two @code{-e} parameters, enabling easier scripting:
|
---|
1427 | @codequoteundirected on
|
---|
1428 | @codequotebacktick on
|
---|
1429 | @example
|
---|
1430 | $ seq 3 | sed -e '2c\' -e hello
|
---|
1431 | 1
|
---|
1432 | hello
|
---|
1433 | 3
|
---|
1434 |
|
---|
1435 | $ sed -e '2c\' -e "$VAR"
|
---|
1436 | @end example
|
---|
1437 | @codequoteundirected off
|
---|
1438 | @codequotebacktick off
|
---|
1439 |
|
---|
1440 |
|
---|
1441 | @item =
|
---|
1442 | @findex = (print line number) command
|
---|
1443 | @cindex Printing line number
|
---|
1444 | @cindex Line number, printing
|
---|
1445 | Print out the current input line number (with a trailing newline).
|
---|
1446 |
|
---|
1447 | @codequoteundirected on
|
---|
1448 | @codequotebacktick on
|
---|
1449 | @example
|
---|
1450 | $ printf '%s\n' aaa bbb ccc | sed =
|
---|
1451 | 1
|
---|
1452 | aaa
|
---|
1453 | 2
|
---|
1454 | bbb
|
---|
1455 | 3
|
---|
1456 | ccc
|
---|
1457 | @end example
|
---|
1458 | @codequoteundirected off
|
---|
1459 | @codequotebacktick off
|
---|
1460 |
|
---|
1461 | @cindex @value{SSEDEXT}, two addresses supported by most commands
|
---|
1462 | As a GNU extension, this command accepts two addresses.
|
---|
1463 |
|
---|
1464 |
|
---|
1465 |
|
---|
1466 |
|
---|
1467 | @item l @var{n}
|
---|
1468 | @findex l (list unambiguously) command
|
---|
1469 | @cindex List pattern space
|
---|
1470 | @cindex Printing text unambiguously
|
---|
1471 | @cindex Line length, setting
|
---|
1472 | @cindex @value{SSEDEXT}, setting line length
|
---|
1473 | Print the pattern space in an unambiguous form:
|
---|
1474 | non-printable characters (and the @code{\} character)
|
---|
1475 | are printed in C-style escaped form; long lines are split,
|
---|
1476 | with a trailing @code{\} character to indicate the split;
|
---|
1477 | the end of each line is marked with a @code{$}.
|
---|
1478 |
|
---|
1479 | @var{n} specifies the desired line-wrap length;
|
---|
1480 | a length of 0 (zero) means to never wrap long lines. If omitted,
|
---|
1481 | the default as specified on the command line is used. The @var{n}
|
---|
1482 | parameter is a @value{SSED} extension.
|
---|
1483 |
|
---|
1484 | @item r @var{filename}
|
---|
1485 |
|
---|
1486 | @findex r (read file) command
|
---|
1487 | @cindex Read text from a file
|
---|
1488 | Reads file @var{filename}. Example:
|
---|
1489 |
|
---|
1490 | @codequoteundirected on
|
---|
1491 | @codequotebacktick on
|
---|
1492 | @example
|
---|
1493 | $ seq 3 | sed '2r/etc/hostname'
|
---|
1494 | 1
|
---|
1495 | 2
|
---|
1496 | fencepost.gnu.org
|
---|
1497 | 3
|
---|
1498 | @end example
|
---|
1499 | @codequoteundirected off
|
---|
1500 | @codequotebacktick off
|
---|
1501 |
|
---|
1502 | @cindex @value{SSEDEXT}, @file{/dev/stdin} file
|
---|
1503 | Queue the contents of @var{filename} to be read and
|
---|
1504 | inserted into the output stream at the end of the current cycle,
|
---|
1505 | or when the next input line is read.
|
---|
1506 | Note that if @var{filename} cannot be read, it is treated as
|
---|
1507 | if it were an empty file, without any error indication.
|
---|
1508 |
|
---|
1509 | As a @value{SSED} extension, the special value @file{/dev/stdin}
|
---|
1510 | is supported for the file name, which reads the contents of the
|
---|
1511 | standard input.
|
---|
1512 |
|
---|
1513 | @cindex @value{SSEDEXT}, two addresses supported by most commands
|
---|
1514 | As a GNU extension, this command accepts two addresses. The
|
---|
1515 | file will then be reread and inserted on each of the addressed lines.
|
---|
1516 |
|
---|
1517 | As a @value{SSED} extension, the @code{r} command accepts a zero address,
|
---|
1518 | inserting a file @emph{before} the first line of the input
|
---|
1519 | @pxref{Adding a header to multiple files}.
|
---|
1520 |
|
---|
1521 | @item w @var{filename}
|
---|
1522 | @findex w (write file) command
|
---|
1523 | @cindex Write to a file
|
---|
1524 | @cindex @value{SSEDEXT}, @file{/dev/stdout} file
|
---|
1525 | @cindex @value{SSEDEXT}, @file{/dev/stderr} file
|
---|
1526 | Write the pattern space to @var{filename}.
|
---|
1527 | As a @value{SSED} extension, two special values of @var{filename} are
|
---|
1528 | supported: @file{/dev/stderr}, which writes the result to the standard
|
---|
1529 | error, and @file{/dev/stdout}, which writes to the standard
|
---|
1530 | output.@footnote{This is equivalent to @code{p} unless the @option{-i}
|
---|
1531 | option is being used.}
|
---|
1532 |
|
---|
1533 | The file will be created (or truncated) before the first input line is
|
---|
1534 | read; all @code{w} commands (including instances of the @code{w} flag
|
---|
1535 | on successful @code{s} commands) which refer to the same @var{filename}
|
---|
1536 | are output without closing and reopening the file.
|
---|
1537 |
|
---|
1538 | @item D
|
---|
1539 | @findex D (delete first line) command
|
---|
1540 | @cindex Delete first line from pattern space
|
---|
1541 | If pattern space contains no newline, start a normal new cycle as if
|
---|
1542 | the @code{d} command was issued. Otherwise, delete text in the pattern
|
---|
1543 | space up to the first newline, and restart cycle with the resultant
|
---|
1544 | pattern space, without reading a new line of input.
|
---|
1545 |
|
---|
1546 | @item N
|
---|
1547 | @findex N (append Next line) command
|
---|
1548 | @cindex Next input line, append to pattern space
|
---|
1549 | @cindex Append next input line to pattern space
|
---|
1550 | Add a newline to the pattern space,
|
---|
1551 | then append the next line of input to the pattern space.
|
---|
1552 | If there is no more input then @command{sed} exits without processing
|
---|
1553 | any more commands.
|
---|
1554 |
|
---|
1555 | When @option{-z} is used, a zero byte (the ascii @samp{NUL} character) is
|
---|
1556 | added between the lines (instead of a new line).
|
---|
1557 |
|
---|
1558 | By default @command{sed} does not terminate if there is no 'next' input line.
|
---|
1559 | This is a GNU extension which can be disabled with @option{--posix}.
|
---|
1560 | @xref{N_command_last_line,,N command on the last line}.
|
---|
1561 |
|
---|
1562 |
|
---|
1563 | @item P
|
---|
1564 | @findex P (print first line) command
|
---|
1565 | @cindex Print first line from pattern space
|
---|
1566 | Print out the portion of the pattern space up to the first newline.
|
---|
1567 |
|
---|
1568 | @item h
|
---|
1569 | @findex h (hold) command
|
---|
1570 | @cindex Copy pattern space into hold space
|
---|
1571 | @cindex Replace hold space with copy of pattern space
|
---|
1572 | @cindex Hold space, copying pattern space into
|
---|
1573 | Replace the contents of the hold space with the contents of the pattern space.
|
---|
1574 |
|
---|
1575 | @item H
|
---|
1576 | @findex H (append Hold) command
|
---|
1577 | @cindex Append pattern space to hold space
|
---|
1578 | @cindex Hold space, appending from pattern space
|
---|
1579 | Append a newline to the contents of the hold space,
|
---|
1580 | and then append the contents of the pattern space to that of the hold space.
|
---|
1581 |
|
---|
1582 | @item g
|
---|
1583 | @findex g (get) command
|
---|
1584 | @cindex Copy hold space into pattern space
|
---|
1585 | @cindex Replace pattern space with copy of hold space
|
---|
1586 | @cindex Hold space, copy into pattern space
|
---|
1587 | Replace the contents of the pattern space with the contents of the hold space.
|
---|
1588 |
|
---|
1589 | @item G
|
---|
1590 | @findex G (appending Get) command
|
---|
1591 | @cindex Append hold space to pattern space
|
---|
1592 | @cindex Hold space, appending to pattern space
|
---|
1593 | Append a newline to the contents of the pattern space,
|
---|
1594 | and then append the contents of the hold space to that of the pattern space.
|
---|
1595 |
|
---|
1596 | @item x
|
---|
1597 | @findex x (eXchange) command
|
---|
1598 | @cindex Exchange hold space with pattern space
|
---|
1599 | @cindex Hold space, exchange with pattern space
|
---|
1600 | Exchange the contents of the hold and pattern spaces.
|
---|
1601 |
|
---|
1602 | @end table
|
---|
1603 |
|
---|
1604 |
|
---|
1605 | @node Programming Commands
|
---|
1606 | @section Commands for @command{sed} gurus
|
---|
1607 |
|
---|
1608 | In most cases, use of these commands indicates that you are
|
---|
1609 | probably better off programming in something like @command{awk}
|
---|
1610 | or Perl. But occasionally one is committed to sticking
|
---|
1611 | with @command{sed}, and these commands can enable one to write
|
---|
1612 | quite convoluted scripts.
|
---|
1613 |
|
---|
1614 | @cindex Flow of control in scripts
|
---|
1615 | @table @code
|
---|
1616 | @item : @var{label}
|
---|
1617 | [No addresses allowed.]
|
---|
1618 |
|
---|
1619 | @findex : (label) command
|
---|
1620 | @cindex Labels, in scripts
|
---|
1621 | Specify the location of @var{label} for branch commands.
|
---|
1622 | In all other respects, a no-op.
|
---|
1623 |
|
---|
1624 | @item b @var{label}
|
---|
1625 | @findex b (branch) command
|
---|
1626 | @cindex Branch to a label, unconditionally
|
---|
1627 | @cindex Goto, in scripts
|
---|
1628 | Unconditionally branch to @var{label}.
|
---|
1629 | The @var{label} may be omitted, in which case the next cycle is started.
|
---|
1630 |
|
---|
1631 | @item t @var{label}
|
---|
1632 | @findex t (test and branch if successful) command
|
---|
1633 | @cindex Branch to a label, if @code{s///} succeeded
|
---|
1634 | @cindex Conditional branch
|
---|
1635 | Branch to @var{label} only if there has been a successful @code{s}ubstitution
|
---|
1636 | since the last input line was read or conditional branch was taken.
|
---|
1637 | The @var{label} may be omitted, in which case the next cycle is started.
|
---|
1638 |
|
---|
1639 | @end table
|
---|
1640 |
|
---|
1641 | @node Extended Commands
|
---|
1642 | @section Commands Specific to @value{SSED}
|
---|
1643 |
|
---|
1644 | These commands are specific to @value{SSED}, so you
|
---|
1645 | must use them with care and only when you are sure that
|
---|
1646 | hindering portability is not evil. They allow you to check
|
---|
1647 | for @value{SSED} extensions or to do tasks that are required
|
---|
1648 | quite often, yet are unsupported by standard @command{sed}s.
|
---|
1649 |
|
---|
1650 | @table @code
|
---|
1651 | @item e [@var{command}]
|
---|
1652 | @findex e (evaluate) command
|
---|
1653 | @cindex Evaluate Bourne-shell commands
|
---|
1654 | @cindex Subprocesses
|
---|
1655 | @cindex @value{SSEDEXT}, evaluating Bourne-shell commands
|
---|
1656 | @cindex @value{SSEDEXT}, subprocesses
|
---|
1657 | This command allows one to pipe input from a shell command
|
---|
1658 | into pattern space. Without parameters, the @code{e} command
|
---|
1659 | executes the command that is found in pattern space and
|
---|
1660 | replaces the pattern space with the output; a trailing newline
|
---|
1661 | is suppressed.
|
---|
1662 |
|
---|
1663 | If a parameter is specified, instead, the @code{e} command
|
---|
1664 | interprets it as a command and sends its output to the output stream.
|
---|
1665 | The command can run across multiple lines, all but the last ending with
|
---|
1666 | a back-slash.
|
---|
1667 |
|
---|
1668 | In both cases, the results are undefined if the command to be
|
---|
1669 | executed contains a @sc{nul} character.
|
---|
1670 |
|
---|
1671 | Note that, unlike the @code{r} command, the output of the command will
|
---|
1672 | be printed immediately; the @code{r} command instead delays the output
|
---|
1673 | to the end of the current cycle.
|
---|
1674 |
|
---|
1675 | @item F
|
---|
1676 | @findex F (File name) command
|
---|
1677 | @cindex Printing file name
|
---|
1678 | @cindex File name, printing
|
---|
1679 | Print out the file name of the current input file (with a trailing
|
---|
1680 | newline).
|
---|
1681 |
|
---|
1682 | @item Q [@var{exit-code}]
|
---|
1683 | This command accepts only one address.
|
---|
1684 |
|
---|
1685 | @findex Q (silent Quit) command
|
---|
1686 | @cindex @value{SSEDEXT}, quitting silently
|
---|
1687 | @cindex @value{SSEDEXT}, returning an exit code
|
---|
1688 | @cindex Quitting
|
---|
1689 | This command is the same as @code{q}, but will not print the
|
---|
1690 | contents of pattern space. Like @code{q}, it provides the
|
---|
1691 | ability to return an exit code to the caller.
|
---|
1692 |
|
---|
1693 | This command can be useful because the only alternative ways
|
---|
1694 | to accomplish this apparently trivial function are to use
|
---|
1695 | the @option{-n} option (which can unnecessarily complicate
|
---|
1696 | your script) or resorting to the following snippet, which
|
---|
1697 | wastes time by reading the whole file without any visible effect:
|
---|
1698 |
|
---|
1699 | @example
|
---|
1700 | :eat
|
---|
1701 | $d @i{@r{Quit silently on the last line}}
|
---|
1702 | N @i{@r{Read another line, silently}}
|
---|
1703 | g @i{@r{Overwrite pattern space each time to save memory}}
|
---|
1704 | b eat
|
---|
1705 | @end example
|
---|
1706 |
|
---|
1707 | @item R @var{filename}
|
---|
1708 | @findex R (read line) command
|
---|
1709 | @cindex Read text from a file
|
---|
1710 | @cindex @value{SSEDEXT}, reading a file a line at a time
|
---|
1711 | @cindex @value{SSEDEXT}, @code{R} command
|
---|
1712 | @cindex @value{SSEDEXT}, @file{/dev/stdin} file
|
---|
1713 | Queue a line of @var{filename} to be read and
|
---|
1714 | inserted into the output stream at the end of the current cycle,
|
---|
1715 | or when the next input line is read.
|
---|
1716 | Note that if @var{filename} cannot be read, or if its end is
|
---|
1717 | reached, no line is appended, without any error indication.
|
---|
1718 |
|
---|
1719 | As with the @code{r} command, the special value @file{/dev/stdin}
|
---|
1720 | is supported for the file name, which reads a line from the
|
---|
1721 | standard input.
|
---|
1722 |
|
---|
1723 | @item T @var{label}
|
---|
1724 | @findex T (test and branch if failed) command
|
---|
1725 | @cindex @value{SSEDEXT}, branch if @code{s///} failed
|
---|
1726 | @cindex Branch to a label, if @code{s///} failed
|
---|
1727 | @cindex Conditional branch
|
---|
1728 | Branch to @var{label} only if there have been no successful
|
---|
1729 | @code{s}ubstitutions since the last input line was read or
|
---|
1730 | conditional branch was taken. The @var{label} may be omitted,
|
---|
1731 | in which case the next cycle is started.
|
---|
1732 |
|
---|
1733 | @item v @var{version}
|
---|
1734 | @findex v (version) command
|
---|
1735 | @cindex @value{SSEDEXT}, checking for their presence
|
---|
1736 | @cindex Requiring @value{SSED}
|
---|
1737 | This command does nothing, but makes @command{sed} fail if
|
---|
1738 | @value{SSED} extensions are not supported, simply because other
|
---|
1739 | versions of @command{sed} do not implement it. In addition, you
|
---|
1740 | can specify the version of @command{sed} that your script
|
---|
1741 | requires, such as @code{4.0.5}. The default is @code{4.0}
|
---|
1742 | because that is the first version that implemented this command.
|
---|
1743 |
|
---|
1744 | This command enables all @value{SSEDEXT} even if
|
---|
1745 | @env{POSIXLY_CORRECT} is set in the environment.
|
---|
1746 |
|
---|
1747 | @item W @var{filename}
|
---|
1748 | @findex W (write first line) command
|
---|
1749 | @cindex Write first line to a file
|
---|
1750 | @cindex @value{SSEDEXT}, writing first line to a file
|
---|
1751 | Write to the given filename the portion of the pattern space up to
|
---|
1752 | the first newline. Everything said under the @code{w} command about
|
---|
1753 | file handling holds here too.
|
---|
1754 |
|
---|
1755 | @item z
|
---|
1756 | @findex z (Zap) command
|
---|
1757 | @cindex @value{SSEDEXT}, emptying pattern space
|
---|
1758 | @cindex Emptying pattern space
|
---|
1759 | This command empties the content of pattern space. It is
|
---|
1760 | usually the same as @samp{s/.*//}, but is more efficient
|
---|
1761 | and works in the presence of invalid multibyte sequences
|
---|
1762 | in the input stream. @sc{posix} mandates that such sequences
|
---|
1763 | are @emph{not} matched by @samp{.}, so that there is no portable
|
---|
1764 | way to clear @command{sed}'s buffers in the middle of the
|
---|
1765 | script in most multibyte locales (including UTF-8 locales).
|
---|
1766 | @end table
|
---|
1767 |
|
---|
1768 |
|
---|
1769 | @node Multiple commands syntax
|
---|
1770 | @section Multiple commands syntax
|
---|
1771 |
|
---|
1772 | @c POSIX says:
|
---|
1773 | @c Editing commands other than {...}, a, b, c, i, r, t, w, :, and #
|
---|
1774 | @c can be followed by a <semicolon>, optional <blank> characters, and
|
---|
1775 | @c another editing command. However, when an s editing command is used
|
---|
1776 | @c with the w flag, following it with another command in this manner
|
---|
1777 | @c produces undefined results.
|
---|
1778 |
|
---|
1779 | There are several methods to specify multiple commands in a @command{sed}
|
---|
1780 | program.
|
---|
1781 |
|
---|
1782 | Using newlines is most natural when running a sed script from a file
|
---|
1783 | (using the @option{-f} option).
|
---|
1784 |
|
---|
1785 | On the command line, all @command{sed} commands may be separated by newlines.
|
---|
1786 | Alternatively, you may specify each command as an argument to an @option{-e}
|
---|
1787 | option:
|
---|
1788 |
|
---|
1789 | @codequoteundirected on
|
---|
1790 | @codequotebacktick on
|
---|
1791 | @example
|
---|
1792 | @group
|
---|
1793 | $ seq 6 | sed '1d
|
---|
1794 | 3d
|
---|
1795 | 5d'
|
---|
1796 | 2
|
---|
1797 | 4
|
---|
1798 | 6
|
---|
1799 |
|
---|
1800 | $ seq 6 | sed -e 1d -e 3d -e 5d
|
---|
1801 | 2
|
---|
1802 | 4
|
---|
1803 | 6
|
---|
1804 | @end group
|
---|
1805 | @end example
|
---|
1806 | @codequoteundirected off
|
---|
1807 | @codequotebacktick off
|
---|
1808 |
|
---|
1809 | A semicolon (@samp{;}) may be used to separate most simple commands:
|
---|
1810 |
|
---|
1811 | @codequoteundirected on
|
---|
1812 | @codequotebacktick on
|
---|
1813 | @example
|
---|
1814 | @group
|
---|
1815 | $ seq 6 | sed '1d;3d;5d'
|
---|
1816 | 2
|
---|
1817 | 4
|
---|
1818 | 6
|
---|
1819 | @end group
|
---|
1820 | @end example
|
---|
1821 | @codequoteundirected off
|
---|
1822 | @codequotebacktick off
|
---|
1823 |
|
---|
1824 | The @code{@{},@code{@}},@code{b},@code{t},@code{T},@code{:} commands can
|
---|
1825 | be separated with a semicolon (this is a non-portable @value{SSED} extension).
|
---|
1826 |
|
---|
1827 | @codequoteundirected on
|
---|
1828 | @codequotebacktick on
|
---|
1829 | @example
|
---|
1830 | @group
|
---|
1831 | $ seq 4 | sed '@{1d;3d@}'
|
---|
1832 | 2
|
---|
1833 | 4
|
---|
1834 |
|
---|
1835 | $ seq 6 | sed '@{1d;3d@};5d'
|
---|
1836 | 2
|
---|
1837 | 4
|
---|
1838 | 6
|
---|
1839 | @end group
|
---|
1840 | @end example
|
---|
1841 | @codequoteundirected off
|
---|
1842 | @codequotebacktick off
|
---|
1843 |
|
---|
1844 | Labels used in @code{b},@code{t},@code{T},@code{:} commands are read
|
---|
1845 | until a semicolon. Leading and trailing whitespace is ignored. In
|
---|
1846 | the examples below the label is @samp{x}. The first example works
|
---|
1847 | with @value{SSED}. The second is a portable equivalent. For more
|
---|
1848 | information about branching and labels @pxref{Branching and flow
|
---|
1849 | control}.
|
---|
1850 |
|
---|
1851 | @codequoteundirected on
|
---|
1852 | @codequotebacktick on
|
---|
1853 | @example
|
---|
1854 | @group
|
---|
1855 | $ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d'
|
---|
1856 | 1
|
---|
1857 | =2
|
---|
1858 |
|
---|
1859 | $ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d'
|
---|
1860 | 1
|
---|
1861 | =2
|
---|
1862 | @end group
|
---|
1863 | @end example
|
---|
1864 | @codequoteundirected off
|
---|
1865 | @codequotebacktick off
|
---|
1866 |
|
---|
1867 |
|
---|
1868 |
|
---|
1869 | @subsection Commands Requiring a newline
|
---|
1870 |
|
---|
1871 | The following commands cannot be separated by a semicolon and
|
---|
1872 | require a newline:
|
---|
1873 |
|
---|
1874 | @table @asis
|
---|
1875 |
|
---|
1876 | @item @code{a},@code{c},@code{i} (append/change/insert)
|
---|
1877 |
|
---|
1878 | All characters following @code{a},@code{c},@code{i} commands are taken
|
---|
1879 | as the text to append/change/insert. Using a semicolon leads to
|
---|
1880 | undesirable results:
|
---|
1881 |
|
---|
1882 | @codequoteundirected on
|
---|
1883 | @codequotebacktick on
|
---|
1884 | @example
|
---|
1885 | @group
|
---|
1886 | $ seq 2 | sed '1aHello ; 2d'
|
---|
1887 | 1
|
---|
1888 | Hello ; 2d
|
---|
1889 | 2
|
---|
1890 | @end group
|
---|
1891 | @end example
|
---|
1892 | @codequoteundirected off
|
---|
1893 | @codequotebacktick off
|
---|
1894 |
|
---|
1895 | Separate the commands using @option{-e} or a newline:
|
---|
1896 |
|
---|
1897 | @codequoteundirected on
|
---|
1898 | @codequotebacktick on
|
---|
1899 | @example
|
---|
1900 | @group
|
---|
1901 | $ seq 2 | sed -e 1aHello -e 2d
|
---|
1902 | 1
|
---|
1903 | Hello
|
---|
1904 |
|
---|
1905 | $ seq 2 | sed '1aHello
|
---|
1906 | 2d'
|
---|
1907 | 1
|
---|
1908 | Hello
|
---|
1909 | @end group
|
---|
1910 | @end example
|
---|
1911 | @codequoteundirected off
|
---|
1912 | @codequotebacktick off
|
---|
1913 |
|
---|
1914 | Note that specifying the text to add (@samp{Hello}) immediately
|
---|
1915 | after @code{a},@code{c},@code{i} is itself a @value{SSED} extension.
|
---|
1916 | A portable, POSIX-compliant alternative is:
|
---|
1917 |
|
---|
1918 | @codequoteundirected on
|
---|
1919 | @codequotebacktick on
|
---|
1920 | @example
|
---|
1921 | @group
|
---|
1922 | $ seq 2 | sed '1a\
|
---|
1923 | Hello
|
---|
1924 | 2d'
|
---|
1925 | 1
|
---|
1926 | Hello
|
---|
1927 | @end group
|
---|
1928 | @end example
|
---|
1929 | @codequoteundirected off
|
---|
1930 | @codequotebacktick off
|
---|
1931 |
|
---|
1932 | @item @code{#} (comment)
|
---|
1933 |
|
---|
1934 | All characters following @samp{#} until the next newline are ignored.
|
---|
1935 |
|
---|
1936 | @codequoteundirected on
|
---|
1937 | @codequotebacktick on
|
---|
1938 | @example
|
---|
1939 | @group
|
---|
1940 | $ seq 3 | sed '# this is a comment ; 2d'
|
---|
1941 | 1
|
---|
1942 | 2
|
---|
1943 | 3
|
---|
1944 |
|
---|
1945 |
|
---|
1946 | $ seq 3 | sed '# this is a comment
|
---|
1947 | 2d'
|
---|
1948 | 1
|
---|
1949 | 3
|
---|
1950 | @end group
|
---|
1951 | @end example
|
---|
1952 | @codequoteundirected off
|
---|
1953 | @codequotebacktick off
|
---|
1954 |
|
---|
1955 | @item @code{r},@code{R},@code{w},@code{W} (reading and writing files)
|
---|
1956 |
|
---|
1957 | The @code{r},@code{R},@code{w},@code{W} commands parse the filename
|
---|
1958 | until end of the line. If whitespace, comments or semicolons are found,
|
---|
1959 | they will be included in the filename, leading to unexpected results:
|
---|
1960 |
|
---|
1961 | @codequoteundirected on
|
---|
1962 | @codequotebacktick on
|
---|
1963 | @example
|
---|
1964 | @group
|
---|
1965 | $ seq 2 | sed '1w hello.txt ; 2d'
|
---|
1966 | 1
|
---|
1967 | 2
|
---|
1968 |
|
---|
1969 | $ ls -log
|
---|
1970 | total 4
|
---|
1971 | -rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d
|
---|
1972 |
|
---|
1973 | $ cat 'hello.txt ; 2d'
|
---|
1974 | 1
|
---|
1975 | @end group
|
---|
1976 | @end example
|
---|
1977 | @codequoteundirected off
|
---|
1978 | @codequotebacktick off
|
---|
1979 |
|
---|
1980 | Note that @command{sed} silently ignores read/write errors in
|
---|
1981 | @code{r},@code{R},@code{w},@code{W} commands (such as missing files).
|
---|
1982 | In the following example, @command{sed} tries to read a file named
|
---|
1983 | @samp{@file{hello.txt ; N}}. The file is missing, and the error is silently
|
---|
1984 | ignored:
|
---|
1985 |
|
---|
1986 | @codequoteundirected on
|
---|
1987 | @codequotebacktick on
|
---|
1988 | @example
|
---|
1989 | @group
|
---|
1990 | $ echo x | sed '1rhello.txt ; N'
|
---|
1991 | x
|
---|
1992 | @end group
|
---|
1993 | @end example
|
---|
1994 | @codequoteundirected off
|
---|
1995 | @codequotebacktick off
|
---|
1996 |
|
---|
1997 | @item @code{e} (command execution)
|
---|
1998 |
|
---|
1999 | Any characters following the @code{e} command until the end of the line
|
---|
2000 | will be sent to the shell. If whitespace, comments or semicolons are found,
|
---|
2001 | they will be included in the shell command, leading to unexpected results:
|
---|
2002 |
|
---|
2003 | @codequoteundirected on
|
---|
2004 | @codequotebacktick on
|
---|
2005 | @example
|
---|
2006 | @group
|
---|
2007 | $ echo a | sed '1e touch foo#bar'
|
---|
2008 | a
|
---|
2009 |
|
---|
2010 | $ ls -1
|
---|
2011 | foo#bar
|
---|
2012 |
|
---|
2013 | $ echo a | sed '1e touch foo ; s/a/b/'
|
---|
2014 | sh: 1: s/a/b/: not found
|
---|
2015 | a
|
---|
2016 | @end group
|
---|
2017 | @end example
|
---|
2018 | @codequoteundirected off
|
---|
2019 | @codequotebacktick off
|
---|
2020 |
|
---|
2021 |
|
---|
2022 | @item @code{s///[we]} (substitute with @code{e} or @code{w} flags)
|
---|
2023 |
|
---|
2024 | In a substitution command, the @code{w} flag writes the substitution
|
---|
2025 | result to a file, and the @code{e} flag executes the substitution result
|
---|
2026 | as a shell command. As with the @code{r/R/w/W/e} commands, these
|
---|
2027 | must be terminated with a newline. If whitespace, comments or semicolons
|
---|
2028 | are found, they will be included in the shell command or filename, leading to
|
---|
2029 | unexpected results:
|
---|
2030 |
|
---|
2031 | @codequoteundirected on
|
---|
2032 | @codequotebacktick on
|
---|
2033 | @example
|
---|
2034 | @group
|
---|
2035 | $ echo a | sed 's/a/b/w1.txt#foo'
|
---|
2036 | b
|
---|
2037 |
|
---|
2038 | $ ls -1
|
---|
2039 | 1.txt#foo
|
---|
2040 | @end group
|
---|
2041 | @end example
|
---|
2042 | @codequoteundirected off
|
---|
2043 | @codequotebacktick off
|
---|
2044 |
|
---|
2045 | @end table
|
---|
2046 |
|
---|
2047 |
|
---|
2048 | @node sed addresses
|
---|
2049 | @chapter Addresses: selecting lines
|
---|
2050 |
|
---|
2051 | @menu
|
---|
2052 | * Addresses overview:: Addresses overview
|
---|
2053 | * Numeric Addresses:: selecting lines by numbers
|
---|
2054 | * Regexp Addresses:: selecting lines by text matching
|
---|
2055 | * Range Addresses:: selecting a range of lines
|
---|
2056 | * Zero Address:: Using address @code{0}
|
---|
2057 | @end menu
|
---|
2058 |
|
---|
2059 | @node Addresses overview
|
---|
2060 | @section Addresses overview
|
---|
2061 |
|
---|
2062 | @cindex addresses, numeric
|
---|
2063 | @cindex numeric addresses
|
---|
2064 | Addresses determine on which line(s) the @command{sed} command will be
|
---|
2065 | executed. The following command replaces any first occurrence of @samp{hello}
|
---|
2066 | with @samp{world} only on line 144:
|
---|
2067 |
|
---|
2068 | @codequoteundirected on
|
---|
2069 | @codequotebacktick on
|
---|
2070 | @example
|
---|
2071 | sed '144s/hello/world/' input.txt > output.txt
|
---|
2072 | @end example
|
---|
2073 | @codequoteundirected off
|
---|
2074 | @codequotebacktick off
|
---|
2075 |
|
---|
2076 |
|
---|
2077 |
|
---|
2078 | If no address is specified, the command is performed on all lines.
|
---|
2079 | The following command replaces @samp{hello} with @samp{world},
|
---|
2080 | targeting every line of the input file.
|
---|
2081 | However, note that it modifies only the first instance of @samp{hello}
|
---|
2082 | on each line.
|
---|
2083 | Use the @samp{g} modifier to affect every instance on each affected line.
|
---|
2084 |
|
---|
2085 | @codequoteundirected on
|
---|
2086 | @codequotebacktick on
|
---|
2087 | @example
|
---|
2088 | sed 's/hello/world/' input.txt > output.txt
|
---|
2089 | @end example
|
---|
2090 | @codequoteundirected off
|
---|
2091 | @codequotebacktick off
|
---|
2092 |
|
---|
2093 |
|
---|
2094 |
|
---|
2095 | @cindex addresses, regular expression
|
---|
2096 | @cindex regular expression addresses
|
---|
2097 | Addresses can contain regular expressions to match lines based
|
---|
2098 | on content instead of line numbers. The following command replaces
|
---|
2099 | @samp{hello} with @samp{world} only on lines
|
---|
2100 | containing the string @samp{apple}:
|
---|
2101 |
|
---|
2102 | @codequoteundirected on
|
---|
2103 | @codequotebacktick on
|
---|
2104 | @example
|
---|
2105 | sed '/apple/s/hello/world/' input.txt > output.txt
|
---|
2106 | @end example
|
---|
2107 | @codequoteundirected off
|
---|
2108 | @codequotebacktick off
|
---|
2109 |
|
---|
2110 |
|
---|
2111 |
|
---|
2112 | @cindex addresses, range
|
---|
2113 | @cindex range addresses
|
---|
2114 | An address range is specified with two addresses separated by a comma
|
---|
2115 | (@code{,}). Addresses can be numeric, regular expressions, or a mix of
|
---|
2116 | both.
|
---|
2117 | The following command replaces @samp{hello} with @samp{world}
|
---|
2118 | only on lines 4 to 17 (inclusive):
|
---|
2119 |
|
---|
2120 | @codequoteundirected on
|
---|
2121 | @codequotebacktick on
|
---|
2122 | @example
|
---|
2123 | sed '4,17s/hello/world/' input.txt > output.txt
|
---|
2124 | @end example
|
---|
2125 | @codequoteundirected off
|
---|
2126 | @codequotebacktick off
|
---|
2127 |
|
---|
2128 |
|
---|
2129 |
|
---|
2130 | @cindex Excluding lines
|
---|
2131 | @cindex Selecting non-matching lines
|
---|
2132 | @cindex addresses, negating
|
---|
2133 | @cindex addresses, excluding
|
---|
2134 | Appending the @code{!} character to the end of an address
|
---|
2135 | specification (before the command letter) negates the sense of the
|
---|
2136 | match. That is, if the @code{!} character follows an address or an
|
---|
2137 | address range, then only lines which do @emph{not} match the addresses
|
---|
2138 | will be selected. The following command replaces @samp{hello}
|
---|
2139 | with @samp{world} only on lines @emph{not} containing the string
|
---|
2140 | @samp{apple}:
|
---|
2141 |
|
---|
2142 | @example
|
---|
2143 | sed '/apple/!s/hello/world/' input.txt > output.txt
|
---|
2144 | @end example
|
---|
2145 |
|
---|
2146 | The following command replaces @samp{hello} with
|
---|
2147 | @samp{world} only on lines 1 to 3 and from line 18 to the last line of the
|
---|
2148 | input file (i.e. excluding lines 4 to 17):
|
---|
2149 |
|
---|
2150 | @example
|
---|
2151 | sed '4,17!s/hello/world/' input.txt > output.txt
|
---|
2152 | @end example
|
---|
2153 |
|
---|
2154 |
|
---|
2155 |
|
---|
2156 |
|
---|
2157 |
|
---|
2158 | @node Numeric Addresses
|
---|
2159 | @section Selecting lines by numbers
|
---|
2160 | @cindex Addresses, in @command{sed} scripts
|
---|
2161 | @cindex Line selection
|
---|
2162 | @cindex Selecting lines to process
|
---|
2163 |
|
---|
2164 | Addresses in a @command{sed} script can be in any of the following forms:
|
---|
2165 | @table @code
|
---|
2166 | @item @var{number}
|
---|
2167 | @cindex Address, numeric
|
---|
2168 | @cindex Line, selecting by number
|
---|
2169 | Specifying a line number will match only that line in the input.
|
---|
2170 | (Note that @command{sed} counts lines continuously across all input files
|
---|
2171 | unless @option{-i} or @option{-s} options are specified.)
|
---|
2172 |
|
---|
2173 | @item $
|
---|
2174 | @cindex Address, last line
|
---|
2175 | @cindex Last line, selecting
|
---|
2176 | @cindex Line, selecting last
|
---|
2177 | This address matches the last line of the last file of input, or
|
---|
2178 | the last line of each file when the @option{-i} or @option{-s} options
|
---|
2179 | are specified.
|
---|
2180 |
|
---|
2181 |
|
---|
2182 | @item @var{first}~@var{step}
|
---|
2183 | @cindex GNU extensions, @samp{@var{n}~@var{m}} addresses
|
---|
2184 | This GNU extension matches every @var{step}th line
|
---|
2185 | starting with line @var{first}.
|
---|
2186 | In particular, lines will be selected when there exists
|
---|
2187 | a non-negative @var{n} such that the current line-number equals
|
---|
2188 | @var{first} + (@var{n} * @var{step}).
|
---|
2189 | Thus, one would use @code{1~2} to select the odd-numbered lines and
|
---|
2190 | @code{0~2} for even-numbered lines;
|
---|
2191 | to pick every third line starting with the second, @samp{2~3} would be used;
|
---|
2192 | to pick every fifth line starting with the tenth, use @samp{10~5};
|
---|
2193 | and @samp{50~0} is just an obscure way of saying @code{50}.
|
---|
2194 |
|
---|
2195 | The following commands demonstrate the step address usage:
|
---|
2196 |
|
---|
2197 | @example
|
---|
2198 | $ seq 10 | sed -n '0~4p'
|
---|
2199 | 4
|
---|
2200 | 8
|
---|
2201 |
|
---|
2202 | $ seq 10 | sed -n '1~3p'
|
---|
2203 | 1
|
---|
2204 | 4
|
---|
2205 | 7
|
---|
2206 | 10
|
---|
2207 | @end example
|
---|
2208 |
|
---|
2209 |
|
---|
2210 | @end table
|
---|
2211 |
|
---|
2212 |
|
---|
2213 |
|
---|
2214 | @node Regexp Addresses
|
---|
2215 | @section selecting lines by text matching
|
---|
2216 |
|
---|
2217 | @value{SSED} supports the following regular expression addresses.
|
---|
2218 | The default regular expression is
|
---|
2219 | @ref{BRE syntax, , Basic Regular Expression (BRE)}.
|
---|
2220 | If @option{-E} or @option{-r} options are used, The regular expression should be
|
---|
2221 | in @ref{ERE syntax, , Extended Regular Expression (ERE)} syntax.
|
---|
2222 | @xref{BRE vs ERE}.
|
---|
2223 |
|
---|
2224 | @table @code
|
---|
2225 | @item /@var{regexp}/
|
---|
2226 | @cindex Address, as a regular expression
|
---|
2227 | @cindex Line, selecting by regular expression match
|
---|
2228 | This will select any line which matches the regular expression @var{regexp}.
|
---|
2229 | If @var{regexp} itself includes any @code{/} characters,
|
---|
2230 | each must be escaped by a backslash (@code{\}).
|
---|
2231 |
|
---|
2232 | The following command prints lines in @file{/etc/passwd}
|
---|
2233 | which end with @samp{bash}@footnote{
|
---|
2234 | There are of course many other ways to do the same,
|
---|
2235 | e.g.
|
---|
2236 | @example
|
---|
2237 | grep 'bash$' /etc/passwd
|
---|
2238 | awk -F: '$7 == "/bin/bash"' /etc/passwd
|
---|
2239 | @end example
|
---|
2240 | }:
|
---|
2241 |
|
---|
2242 | @example
|
---|
2243 | sed -n '/bash$/p' /etc/passwd
|
---|
2244 | @end example
|
---|
2245 |
|
---|
2246 | @cindex empty regular expression
|
---|
2247 | @cindex @value{SSEDEXT}, modifiers and the empty regular expression
|
---|
2248 | The empty regular expression @samp{//} repeats the last regular
|
---|
2249 | expression match (the same holds if the empty regular expression is
|
---|
2250 | passed to the @code{s} command). Note that modifiers to regular expressions
|
---|
2251 | are evaluated when the regular expression is compiled, thus it is invalid to
|
---|
2252 | specify them together with the empty regular expression.
|
---|
2253 |
|
---|
2254 | @item \%@var{regexp}%
|
---|
2255 | (The @code{%} may be replaced by any other single character.)
|
---|
2256 |
|
---|
2257 | @cindex Slash character, in regular expressions
|
---|
2258 | This also matches the regular expression @var{regexp},
|
---|
2259 | but allows one to use a different delimiter than @code{/}.
|
---|
2260 | This is particularly useful if the @var{regexp} itself contains
|
---|
2261 | a lot of slashes, since it avoids the tedious escaping of every @code{/}.
|
---|
2262 | If @var{regexp} itself includes any delimiter characters,
|
---|
2263 | each must be escaped by a backslash (@code{\}).
|
---|
2264 |
|
---|
2265 | The following commands are equivalent. They print lines
|
---|
2266 | which start with @samp{/home/alice/documents/}:
|
---|
2267 |
|
---|
2268 | @example
|
---|
2269 | sed -n '/^\/home\/alice\/documents\//p'
|
---|
2270 | sed -n '\%^/home/alice/documents/%p'
|
---|
2271 | sed -n '\;^/home/alice/documents/;p'
|
---|
2272 | @end example
|
---|
2273 |
|
---|
2274 |
|
---|
2275 | @item /@var{regexp}/I
|
---|
2276 | @itemx \%@var{regexp}%I
|
---|
2277 | @cindex GNU extensions, @code{I} modifier
|
---|
2278 | @cindex case insensitive, regular expression
|
---|
2279 | The @code{I} modifier to regular-expression matching is a GNU
|
---|
2280 | extension which causes the @var{regexp} to be matched in
|
---|
2281 | a case-insensitive manner.
|
---|
2282 |
|
---|
2283 | In many other programming languages, a lower case @code{i} is used
|
---|
2284 | for case-insensitive regular expression matching. However, in @command{sed}
|
---|
2285 | the @code{i} is used for the insert command (@pxref{insert command}).
|
---|
2286 |
|
---|
2287 | Observe the difference between the following examples.
|
---|
2288 |
|
---|
2289 | In this example, @code{/b/I} is the address: regular expression with @code{I}
|
---|
2290 | modifier. @code{d} is the delete command:
|
---|
2291 |
|
---|
2292 | @example
|
---|
2293 | $ printf "%s\n" a b c | sed '/b/Id'
|
---|
2294 | a
|
---|
2295 | c
|
---|
2296 | @end example
|
---|
2297 |
|
---|
2298 | Here, @code{/b/} is the address: a regular expression.
|
---|
2299 | @code{i} is the insert command.
|
---|
2300 | @code{d} is the value to insert.
|
---|
2301 | A line with @samp{d} is then inserted above the matched line:
|
---|
2302 |
|
---|
2303 | @example
|
---|
2304 | $ printf "%s\n" a b c | sed '/b/id'
|
---|
2305 | a
|
---|
2306 | d
|
---|
2307 | b
|
---|
2308 | c
|
---|
2309 | @end example
|
---|
2310 |
|
---|
2311 | @item /@var{regexp}/M
|
---|
2312 | @itemx \%@var{regexp}%M
|
---|
2313 | @cindex @value{SSEDEXT}, @code{M} modifier
|
---|
2314 | The @code{M} modifier to regular-expression matching is a @value{SSED}
|
---|
2315 | extension which directs @value{SSED} to match the regular expression
|
---|
2316 | in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to
|
---|
2317 | match respectively (in addition to the normal behavior) the empty string
|
---|
2318 | after a newline, and the empty string before a newline. There are
|
---|
2319 | special character sequences
|
---|
2320 | @ifclear PERL
|
---|
2321 | (@code{\`} and @code{\'})
|
---|
2322 | @end ifclear
|
---|
2323 | which always match the beginning or the end of the buffer.
|
---|
2324 | In addition,
|
---|
2325 | the period character does not match a new-line character in
|
---|
2326 | multi-line mode.
|
---|
2327 | @end table
|
---|
2328 |
|
---|
2329 |
|
---|
2330 | @cindex regex addresses and pattern space
|
---|
2331 | @cindex regex addresses and input lines
|
---|
2332 | Regex addresses operate on the content of the current
|
---|
2333 | pattern space. If the pattern space is changed (for example with @code{s///}
|
---|
2334 | command) the regular expression matching will operate on the changed text.
|
---|
2335 |
|
---|
2336 | In the following example, automatic printing is disabled with
|
---|
2337 | @option{-n}. The @code{s/2/X/} command changes lines containing
|
---|
2338 | @samp{2} to @samp{X}. The command @code{/[0-9]/p} matches
|
---|
2339 | lines with digits and prints them.
|
---|
2340 | Because the second line is changed before the @code{/[0-9]/} regex,
|
---|
2341 | it will not match and will not be printed:
|
---|
2342 |
|
---|
2343 | @codequoteundirected on
|
---|
2344 | @codequotebacktick on
|
---|
2345 | @example
|
---|
2346 | @group
|
---|
2347 | $ seq 3 | sed -n 's/2/X/ ; /[0-9]/p'
|
---|
2348 | 1
|
---|
2349 | 3
|
---|
2350 | @end group
|
---|
2351 | @end example
|
---|
2352 | @codequoteundirected off
|
---|
2353 | @codequotebacktick off
|
---|
2354 |
|
---|
2355 |
|
---|
2356 | @node Range Addresses
|
---|
2357 | @section Range Addresses
|
---|
2358 |
|
---|
2359 | @cindex Range of lines
|
---|
2360 | @cindex Several lines, selecting
|
---|
2361 | An address range can be specified by specifying two addresses
|
---|
2362 | separated by a comma (@code{,}). An address range matches lines
|
---|
2363 | starting from where the first address matches, and continues
|
---|
2364 | until the second address matches (inclusively):
|
---|
2365 |
|
---|
2366 | @example
|
---|
2367 | $ seq 10 | sed -n '4,6p'
|
---|
2368 | 4
|
---|
2369 | 5
|
---|
2370 | 6
|
---|
2371 | @end example
|
---|
2372 |
|
---|
2373 | If the second address is a @var{regexp}, then checking for the
|
---|
2374 | ending match will start with the line @emph{following} the
|
---|
2375 | line which matched the first address: a range will always
|
---|
2376 | span at least two lines (except of course if the input stream
|
---|
2377 | ends).
|
---|
2378 |
|
---|
2379 | @example
|
---|
2380 | $ seq 10 | sed -n '4,/[0-9]/p'
|
---|
2381 | 4
|
---|
2382 | 5
|
---|
2383 | @end example
|
---|
2384 |
|
---|
2385 | If the second address is a @var{number} less than (or equal to)
|
---|
2386 | the line matching the first address, then only the one line is
|
---|
2387 | matched:
|
---|
2388 |
|
---|
2389 | @example
|
---|
2390 | $ seq 10 | sed -n '4,1p'
|
---|
2391 | 4
|
---|
2392 | @end example
|
---|
2393 |
|
---|
2394 | @anchor{Zero Address Regex Range}
|
---|
2395 | @cindex Special addressing forms
|
---|
2396 | @cindex Range with start address of zero
|
---|
2397 | @cindex Zero, as range start address
|
---|
2398 | @cindex @var{addr1},+N
|
---|
2399 | @cindex @var{addr1},~N
|
---|
2400 | @cindex GNU extensions, special two-address forms
|
---|
2401 | @cindex GNU extensions, @code{0} address
|
---|
2402 | @cindex GNU extensions, 0,@var{addr2} addressing
|
---|
2403 | @cindex GNU extensions, @var{addr1},+@var{N} addressing
|
---|
2404 | @cindex GNU extensions, @var{addr1},~@var{N} addressing
|
---|
2405 | @value{SSED} also supports some special two-address forms; all these
|
---|
2406 | are GNU extensions:
|
---|
2407 | @table @code
|
---|
2408 | @item 0,/@var{regexp}/
|
---|
2409 | A line number of @code{0} can be used in an address specification like
|
---|
2410 | @code{0,/@var{regexp}/} so that @command{sed} will try to match
|
---|
2411 | @var{regexp} in the first input line too. In other words,
|
---|
2412 | @code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/},
|
---|
2413 | except that if @var{addr2} matches the very first line of input the
|
---|
2414 | @code{0,/@var{regexp}/} form will consider it to end the range, whereas
|
---|
2415 | the @code{1,/@var{regexp}/} form will match the beginning of its range and
|
---|
2416 | hence make the range span up to the @emph{second} occurrence of the
|
---|
2417 | regular expression.
|
---|
2418 |
|
---|
2419 | The following examples demonstrate the difference between starting
|
---|
2420 | with address 1 and 0:
|
---|
2421 |
|
---|
2422 | @example
|
---|
2423 | $ seq 10 | sed -n '1,/[0-9]/p'
|
---|
2424 | 1
|
---|
2425 | 2
|
---|
2426 |
|
---|
2427 | $ seq 10 | sed -n '0,/[0-9]/p'
|
---|
2428 | 1
|
---|
2429 | @end example
|
---|
2430 |
|
---|
2431 |
|
---|
2432 | @item @var{addr1},+@var{N}
|
---|
2433 | Matches @var{addr1} and the @var{N} lines following @var{addr1}.
|
---|
2434 |
|
---|
2435 | @example
|
---|
2436 | $ seq 10 | sed -n '6,+2p'
|
---|
2437 | 6
|
---|
2438 | 7
|
---|
2439 | 8
|
---|
2440 | @end example
|
---|
2441 |
|
---|
2442 | @var{addr1} can be a line number or a regular expression.
|
---|
2443 |
|
---|
2444 | @item @var{addr1},~@var{N}
|
---|
2445 | Matches @var{addr1} and the lines following @var{addr1}
|
---|
2446 | until the next line whose input line number is a multiple of @var{N}.
|
---|
2447 | The following command prints starting at line 6, until the next line which
|
---|
2448 | is a multiple of 4 (i.e. line 8):
|
---|
2449 |
|
---|
2450 | @example
|
---|
2451 | $ seq 10 | sed -n '6,~4p'
|
---|
2452 | 6
|
---|
2453 | 7
|
---|
2454 | 8
|
---|
2455 | @end example
|
---|
2456 |
|
---|
2457 | @var{addr1} can be a line number or a regular expression.
|
---|
2458 |
|
---|
2459 | @end table
|
---|
2460 |
|
---|
2461 |
|
---|
2462 |
|
---|
2463 | @node Zero Address
|
---|
2464 | @section Zero Address
|
---|
2465 | @cindex Zero Address
|
---|
2466 | As a @value{SSED} extension, @code{0} address can be used in two cases:
|
---|
2467 | @enumerate
|
---|
2468 | @item
|
---|
2469 | In a regex range addresses as @code{0,/@var{regexp}/}
|
---|
2470 | (@pxref{Zero Address Regex Range}).
|
---|
2471 | @item
|
---|
2472 | With the @code{r} command, inserting a file before the first line
|
---|
2473 | (@pxref{Adding a header to multiple files}).
|
---|
2474 | @end enumerate
|
---|
2475 |
|
---|
2476 | Note that these are the only places where the @code{0} address makes
|
---|
2477 | sense; Commands which are given the @code{0} address in any
|
---|
2478 | other way will give an error.
|
---|
2479 |
|
---|
2480 |
|
---|
2481 |
|
---|
2482 | @node sed regular expressions
|
---|
2483 | @chapter Regular Expressions: selecting text
|
---|
2484 |
|
---|
2485 | @menu
|
---|
2486 | * Regular Expressions Overview:: Overview of Regular expression in @command{sed}
|
---|
2487 | * BRE vs ERE:: Basic (BRE) and extended (ERE) regular expression
|
---|
2488 | syntax
|
---|
2489 | * BRE syntax:: Overview of basic regular expression syntax
|
---|
2490 | * ERE syntax:: Overview of extended regular expression syntax
|
---|
2491 | * Character Classes and Bracket Expressions::
|
---|
2492 | * regexp extensions:: Additional regular expression commands
|
---|
2493 | * Back-references and Subexpressions:: Back-references and Subexpressions
|
---|
2494 | * Escapes:: Specifying special characters
|
---|
2495 | * Locale Considerations:: Multibyte characters and locale considerations
|
---|
2496 | @end menu
|
---|
2497 |
|
---|
2498 | @node Regular Expressions Overview
|
---|
2499 | @section Overview of regular expression in @command{sed}
|
---|
2500 |
|
---|
2501 | @c NOTE: Keep examples in the 'overview' section
|
---|
2502 | @c neutral in regards to BRE/ERE - to ease understanding.
|
---|
2503 |
|
---|
2504 |
|
---|
2505 | To know how to use @command{sed}, people should understand regular
|
---|
2506 | expressions (@dfn{regexp} for short). A regular expression
|
---|
2507 | is a pattern that is matched against a
|
---|
2508 | subject string from left to right. Most characters are
|
---|
2509 | @dfn{ordinary}: they stand for
|
---|
2510 | themselves in a pattern, and match the corresponding characters.
|
---|
2511 | Regular expressions in @command{sed} are specified between two
|
---|
2512 | slashes.
|
---|
2513 |
|
---|
2514 | The following command prints lines containing the string @samp{hello}:
|
---|
2515 |
|
---|
2516 | @example
|
---|
2517 | sed -n '/hello/p'
|
---|
2518 | @end example
|
---|
2519 |
|
---|
2520 | The above example is equivalent to this @command{grep} command:
|
---|
2521 |
|
---|
2522 | @example
|
---|
2523 | grep 'hello'
|
---|
2524 | @end example
|
---|
2525 |
|
---|
2526 | The power of regular expressions comes from the ability to include
|
---|
2527 | alternatives and repetitions in the pattern. These are encoded in the
|
---|
2528 | pattern by the use of @dfn{special characters}, which do not stand for
|
---|
2529 | themselves but instead are interpreted in some special way.
|
---|
2530 |
|
---|
2531 | The character @code{^} (caret) in a regular expression matches the
|
---|
2532 | beginning of the line. The character @code{.} (dot) matches any single
|
---|
2533 | character. The following @command{sed} command matches and prints
|
---|
2534 | lines which start with the letter @samp{b}, followed by any single character,
|
---|
2535 | followed by the letter @samp{d}:
|
---|
2536 |
|
---|
2537 | @example
|
---|
2538 | $ printf "%s\n" abode bad bed bit bid byte body | sed -n '/^b.d/p'
|
---|
2539 | bad
|
---|
2540 | bed
|
---|
2541 | bid
|
---|
2542 | body
|
---|
2543 | @end example
|
---|
2544 |
|
---|
2545 | The following sections explain the meaning and usage of special
|
---|
2546 | characters in regular expressions.
|
---|
2547 |
|
---|
2548 | @node BRE vs ERE
|
---|
2549 | @section Basic (BRE) and extended (ERE) regular expression
|
---|
2550 |
|
---|
2551 | Basic and extended regular expressions are two variations on the
|
---|
2552 | syntax of the specified pattern. Basic Regular Expression (BRE) syntax is the
|
---|
2553 | default in @command{sed} (and similarly in @command{grep}).
|
---|
2554 | Use the POSIX-specified @option{-E} option (@option{-r},
|
---|
2555 | @option{--regexp-extended}) to enable Extended Regular Expression (ERE) syntax.
|
---|
2556 |
|
---|
2557 | In @value{SSED}, the only difference between basic and extended regular
|
---|
2558 | expressions is in the behavior of a few special characters: @samp{?},
|
---|
2559 | @samp{+}, parentheses, braces (@samp{@{@}}), and @samp{|}.
|
---|
2560 |
|
---|
2561 | With basic (BRE) syntax, these characters do not have special meaning
|
---|
2562 | unless prefixed with a backslash (@samp{\}); While with extended (ERE) syntax
|
---|
2563 | it is reversed: these characters are special unless they are prefixed
|
---|
2564 | with backslash (@samp{\}).
|
---|
2565 |
|
---|
2566 | @multitable @columnfractions .28 .36 .35
|
---|
2567 |
|
---|
2568 | @headitem Desired pattern
|
---|
2569 | @tab Basic (BRE) Syntax
|
---|
2570 | @tab Extended (ERE) Syntax
|
---|
2571 |
|
---|
2572 | @item literal @samp{+} (plus sign)
|
---|
2573 |
|
---|
2574 | @tab
|
---|
2575 | @exampleindent 0
|
---|
2576 | @codequoteundirected on
|
---|
2577 | @codequotebacktick on
|
---|
2578 | @example
|
---|
2579 | $ echo 'a+b=c' > foo
|
---|
2580 | $ sed -n '/a+b/p' foo
|
---|
2581 | a+b=c
|
---|
2582 | @end example
|
---|
2583 | @codequotebacktick off
|
---|
2584 | @codequoteundirected off
|
---|
2585 |
|
---|
2586 | @tab
|
---|
2587 | @exampleindent 0
|
---|
2588 | @codequoteundirected on
|
---|
2589 | @codequotebacktick on
|
---|
2590 | @example
|
---|
2591 | $ echo 'a+b=c' > foo
|
---|
2592 | $ sed -E -n '/a\+b/p' foo
|
---|
2593 | a+b=c
|
---|
2594 | @end example
|
---|
2595 | @codequotebacktick off
|
---|
2596 | @codequoteundirected off
|
---|
2597 |
|
---|
2598 |
|
---|
2599 | @item One or more @samp{a} characters followed by @samp{b}
|
---|
2600 | (plus sign as special meta-character)
|
---|
2601 |
|
---|
2602 | @tab
|
---|
2603 | @exampleindent 0
|
---|
2604 | @codequoteundirected on
|
---|
2605 | @codequotebacktick on
|
---|
2606 | @example
|
---|
2607 | $ echo aab > foo
|
---|
2608 | $ sed -n '/a\+b/p' foo
|
---|
2609 | aab
|
---|
2610 | @end example
|
---|
2611 | @codequotebacktick off
|
---|
2612 | @codequoteundirected off
|
---|
2613 |
|
---|
2614 | @tab
|
---|
2615 | @exampleindent 0
|
---|
2616 | @codequoteundirected on
|
---|
2617 | @codequotebacktick on
|
---|
2618 | @example
|
---|
2619 | $ echo aab > foo
|
---|
2620 | $ sed -E -n '/a+b/p' foo
|
---|
2621 | aab
|
---|
2622 | @end example
|
---|
2623 | @codequotebacktick off
|
---|
2624 | @codequoteundirected off
|
---|
2625 |
|
---|
2626 | @end multitable
|
---|
2627 |
|
---|
2628 |
|
---|
2629 |
|
---|
2630 |
|
---|
2631 | @node BRE syntax
|
---|
2632 | @section Overview of basic regular expression syntax
|
---|
2633 |
|
---|
2634 | Here is a brief description
|
---|
2635 | of regular expression syntax as used in @command{sed}.
|
---|
2636 |
|
---|
2637 | @table @code
|
---|
2638 | @item @var{char}
|
---|
2639 | A single ordinary character matches itself.
|
---|
2640 |
|
---|
2641 | @item *
|
---|
2642 | @cindex GNU extensions, to basic regular expressions
|
---|
2643 | Matches a sequence of zero or more instances of matches for the
|
---|
2644 | preceding regular expression, which must be an ordinary character, a
|
---|
2645 | special character preceded by @code{\}, a @code{.}, a grouped regexp
|
---|
2646 | (see below), or a bracket expression. As a GNU extension, a
|
---|
2647 | postfixed regular expression can also be followed by @code{*}; for
|
---|
2648 | example, @code{a**} is equivalent to @code{a*}. POSIX
|
---|
2649 | 1003.1-2001 says that @code{*} stands for itself when it appears at
|
---|
2650 | the start of a regular expression or subexpression, but many
|
---|
2651 | non-GNU implementations do not support this and portable
|
---|
2652 | scripts should instead use @code{\*} in these contexts.
|
---|
2653 | @item .
|
---|
2654 | Matches any character, including newline.
|
---|
2655 |
|
---|
2656 | @item ^
|
---|
2657 | Matches the null string at beginning of the pattern space, i.e. what
|
---|
2658 | appears after the circumflex must appear at the beginning of the
|
---|
2659 | pattern space.
|
---|
2660 |
|
---|
2661 | In most scripts, pattern space is initialized to the content of each
|
---|
2662 | line (@pxref{Execution Cycle, , How @code{sed} works}). So, it is a
|
---|
2663 | useful simplification to think of @code{^#include} as matching only
|
---|
2664 | lines where @samp{#include} is the first thing on the line---if there is
|
---|
2665 | any preceding space, for example, the match fails. This simplification is
|
---|
2666 | valid as long as the original content of pattern space is not modified,
|
---|
2667 | for example with an @code{s} command.
|
---|
2668 |
|
---|
2669 | @code{^} acts as a special character only at the beginning of the
|
---|
2670 | regular expression or subexpression (that is, after @code{\(} or
|
---|
2671 | @code{\|}). Portable scripts should avoid @code{^} at the beginning of
|
---|
2672 | a subexpression, though, as POSIX allows implementations that
|
---|
2673 | treat @code{^} as an ordinary character in that context.
|
---|
2674 |
|
---|
2675 | @item $
|
---|
2676 | It is the same as @code{^}, but refers to end of pattern space.
|
---|
2677 | @code{$} also acts as a special character only at the end
|
---|
2678 | of the regular expression or subexpression (that is, before @code{\)}
|
---|
2679 | or @code{\|}), and its use at the end of a subexpression is not
|
---|
2680 | portable.
|
---|
2681 |
|
---|
2682 |
|
---|
2683 | @item [@var{list}]
|
---|
2684 | @itemx [^@var{list}]
|
---|
2685 | Matches any single character in @var{list}: for example,
|
---|
2686 | @code{[aeiou]} matches all vowels. A list may include
|
---|
2687 | sequences like @code{@var{char1}-@var{char2}}, which
|
---|
2688 | matches any character between (inclusive) @var{char1}
|
---|
2689 | and @var{char2}.
|
---|
2690 | @xref{Character Classes and Bracket Expressions}.
|
---|
2691 |
|
---|
2692 | @item \+
|
---|
2693 | @cindex GNU extensions, to basic regular expressions
|
---|
2694 | As @code{*}, but matches one or more. It is a GNU extension.
|
---|
2695 |
|
---|
2696 | @item \?
|
---|
2697 | @cindex GNU extensions, to basic regular expressions
|
---|
2698 | As @code{*}, but only matches zero or one. It is a GNU extension.
|
---|
2699 |
|
---|
2700 | @item \@{@var{i}\@}
|
---|
2701 | As @code{*}, but matches exactly @var{i} sequences (@var{i} is a
|
---|
2702 | decimal integer; for portability, keep it between 0 and 255
|
---|
2703 | inclusive).
|
---|
2704 |
|
---|
2705 | @item \@{@var{i},@var{j}\@}
|
---|
2706 | Matches between @var{i} and @var{j}, inclusive, sequences.
|
---|
2707 |
|
---|
2708 | @item \@{@var{i},\@}
|
---|
2709 | Matches more than or equal to @var{i} sequences.
|
---|
2710 |
|
---|
2711 | @item \(@var{regexp}\)
|
---|
2712 | Groups the inner @var{regexp} as a whole, this is used to:
|
---|
2713 |
|
---|
2714 | @itemize @bullet
|
---|
2715 | @item
|
---|
2716 | @cindex GNU extensions, to basic regular expressions
|
---|
2717 | Apply postfix operators, like @code{\(abcd\)*}:
|
---|
2718 | this will search for zero or more whole sequences
|
---|
2719 | of @samp{abcd}, while @code{abcd*} would search
|
---|
2720 | for @samp{abc} followed by zero or more occurrences
|
---|
2721 | of @samp{d}. Note that support for @code{\(abcd\)*} is
|
---|
2722 | required by POSIX 1003.1-2001, but many non-GNU
|
---|
2723 | implementations do not support it and hence it is not universally
|
---|
2724 | portable.
|
---|
2725 |
|
---|
2726 | @item
|
---|
2727 | Use back references (see below).
|
---|
2728 | @end itemize
|
---|
2729 |
|
---|
2730 |
|
---|
2731 | @item @var{regexp1}\|@var{regexp2}
|
---|
2732 | @cindex GNU extensions, to basic regular expressions
|
---|
2733 | Matches either @var{regexp1} or @var{regexp2}. Use
|
---|
2734 | parentheses to use complex alternative regular expressions.
|
---|
2735 | The matching process tries each alternative in turn, from
|
---|
2736 | left to right, and the first one that succeeds is used.
|
---|
2737 | It is a GNU extension.
|
---|
2738 |
|
---|
2739 | @item @var{regexp1}@var{regexp2}
|
---|
2740 | Matches the concatenation of @var{regexp1} and @var{regexp2}.
|
---|
2741 | Concatenation binds more tightly than @code{\|}, @code{^}, and
|
---|
2742 | @code{$}, but less tightly than the other regular expression
|
---|
2743 | operators.
|
---|
2744 |
|
---|
2745 | @item \@var{digit}
|
---|
2746 | Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized
|
---|
2747 | subexpression in the regular expression. This is called a @dfn{back
|
---|
2748 | reference}. Subexpressions are implicitly numbered by counting
|
---|
2749 | occurrences of @code{\(} left-to-right.
|
---|
2750 |
|
---|
2751 | @item \n
|
---|
2752 | Matches the newline character.
|
---|
2753 |
|
---|
2754 | @item \@var{char}
|
---|
2755 | Matches @var{char}, where @var{char} is one of @code{$},
|
---|
2756 | @code{*}, @code{.}, @code{[}, @code{\}, or @code{^}.
|
---|
2757 | Note that the only C-like
|
---|
2758 | backslash sequences that you can portably assume to be
|
---|
2759 | interpreted are @code{\n} and @code{\\}; in particular
|
---|
2760 | @code{\t} is not portable, and matches a @samp{t} under most
|
---|
2761 | implementations of @command{sed}, rather than a tab character.
|
---|
2762 |
|
---|
2763 | @end table
|
---|
2764 |
|
---|
2765 | @cindex Greedy regular expression matching
|
---|
2766 | Note that the regular expression matcher is greedy, i.e., matches
|
---|
2767 | are attempted from left to right and, if two or more matches are
|
---|
2768 | possible starting at the same character, it selects the longest.
|
---|
2769 |
|
---|
2770 | @noindent
|
---|
2771 | Examples:
|
---|
2772 | @table @samp
|
---|
2773 | @item abcdef
|
---|
2774 | Matches @samp{abcdef}.
|
---|
2775 |
|
---|
2776 | @item a*b
|
---|
2777 | Matches zero or more @samp{a}s followed by a single
|
---|
2778 | @samp{b}. For example, @samp{b} or @samp{aaaaab}.
|
---|
2779 |
|
---|
2780 | @item a\?b
|
---|
2781 | Matches @samp{b} or @samp{ab}.
|
---|
2782 |
|
---|
2783 | @item a\+b\+
|
---|
2784 | Matches one or more @samp{a}s followed by one or more
|
---|
2785 | @samp{b}s: @samp{ab} is the shortest possible match, but
|
---|
2786 | other examples are @samp{aaaab} or @samp{abbbbb} or
|
---|
2787 | @samp{aaaaaabbbbbbb}.
|
---|
2788 |
|
---|
2789 | @item .*
|
---|
2790 | @itemx .\+
|
---|
2791 | These two both match all the characters in a string;
|
---|
2792 | however, the first matches every string (including the empty
|
---|
2793 | string), while the second matches only strings containing
|
---|
2794 | at least one character.
|
---|
2795 |
|
---|
2796 | @item ^main.*(.*)
|
---|
2797 | This matches a string starting with @samp{main},
|
---|
2798 | followed by an opening and closing
|
---|
2799 | parenthesis. The @samp{n}, @samp{(} and @samp{)} need not
|
---|
2800 | be adjacent.
|
---|
2801 |
|
---|
2802 | @item ^#
|
---|
2803 | This matches a string beginning with @samp{#}.
|
---|
2804 |
|
---|
2805 | @item \\$
|
---|
2806 | This matches a string ending with a single backslash. The
|
---|
2807 | regexp contains two backslashes for escaping.
|
---|
2808 |
|
---|
2809 | @item \$
|
---|
2810 | Instead, this matches a string consisting of a single dollar sign,
|
---|
2811 | because it is escaped.
|
---|
2812 |
|
---|
2813 | @item [a-zA-Z0-9]
|
---|
2814 | In the C locale, this matches any ASCII letters or digits.
|
---|
2815 |
|
---|
2816 | @item [^ @kbd{@key{TAB}}]\+
|
---|
2817 | (Here @kbd{@key{TAB}} stands for a single tab character.)
|
---|
2818 | This matches a string of one or more
|
---|
2819 | characters, none of which is a space or a tab.
|
---|
2820 | Usually this means a word.
|
---|
2821 |
|
---|
2822 | @item ^\(.*\)\n\1$
|
---|
2823 | This matches a string consisting of two equal substrings separated by
|
---|
2824 | a newline.
|
---|
2825 |
|
---|
2826 | @item .\@{9\@}A$
|
---|
2827 | This matches nine characters followed by an @samp{A} at the end of a line.
|
---|
2828 |
|
---|
2829 | @item ^.\@{15\@}A
|
---|
2830 | This matches the start of a string that contains 16 characters,
|
---|
2831 | the last of which is an @samp{A}.
|
---|
2832 |
|
---|
2833 | @end table
|
---|
2834 |
|
---|
2835 |
|
---|
2836 | @node ERE syntax
|
---|
2837 | @section Overview of extended regular expression syntax
|
---|
2838 | @cindex Extended regular expressions, syntax
|
---|
2839 |
|
---|
2840 | The only difference between basic and extended regular expressions is in
|
---|
2841 | the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
|
---|
2842 | braces (@samp{@{@}}), and @samp{|}. While basic regular expressions
|
---|
2843 | require these to be escaped if you want them to behave as special
|
---|
2844 | characters, when using extended regular expressions you must escape
|
---|
2845 | them if you want them @emph{to match a literal character}. @samp{|}
|
---|
2846 | is special here because @samp{\|} is a GNU extension -- standard
|
---|
2847 | basic regular expressions do not provide its functionality.
|
---|
2848 |
|
---|
2849 | @noindent
|
---|
2850 | Examples:
|
---|
2851 | @table @code
|
---|
2852 | @item abc?
|
---|
2853 | becomes @samp{abc\?} when using extended regular expressions. It matches
|
---|
2854 | the literal string @samp{abc?}.
|
---|
2855 |
|
---|
2856 | @item c\+
|
---|
2857 | becomes @samp{c+} when using extended regular expressions. It matches
|
---|
2858 | one or more @samp{c}s.
|
---|
2859 |
|
---|
2860 | @item a\@{3,\@}
|
---|
2861 | becomes @samp{a@{3,@}} when using extended regular expressions. It matches
|
---|
2862 | three or more @samp{a}s.
|
---|
2863 |
|
---|
2864 | @item \(abc\)\@{2,3\@}
|
---|
2865 | becomes @samp{(abc)@{2,3@}} when using extended regular expressions. It
|
---|
2866 | matches either @samp{abcabc} or @samp{abcabcabc}.
|
---|
2867 |
|
---|
2868 | @item \(abc*\)\1
|
---|
2869 | becomes @samp{(abc*)\1} when using extended regular expressions.
|
---|
2870 | Backreferences must still be escaped when using extended regular
|
---|
2871 | expressions.
|
---|
2872 |
|
---|
2873 | @item a\|b
|
---|
2874 | becomes @samp{a|b} when using extended regular expressions. It matches
|
---|
2875 | @samp{a} or @samp{b}.
|
---|
2876 | @end table
|
---|
2877 |
|
---|
2878 | @node Character Classes and Bracket Expressions
|
---|
2879 | @section Character Classes and Bracket Expressions
|
---|
2880 |
|
---|
2881 | @c The 'character class' section is shamelessly copied from grep's manual.
|
---|
2882 |
|
---|
2883 | @cindex bracket expression
|
---|
2884 | @cindex character class
|
---|
2885 | A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
|
---|
2886 | @samp{]}.
|
---|
2887 | It matches any single character in that list;
|
---|
2888 | if the first character of the list is the caret @samp{^},
|
---|
2889 | then it matches any character @strong{not} in the list.
|
---|
2890 | For example, the following command replaces the strings
|
---|
2891 | @samp{gray} or @samp{grey} with @samp{blue}:
|
---|
2892 |
|
---|
2893 | @example
|
---|
2894 | sed 's/gr[ae]y/blue/'
|
---|
2895 | @end example
|
---|
2896 |
|
---|
2897 | @c TODO: fix 'ref' to look good in both HTML and PDF
|
---|
2898 | Bracket expressions can be used in both
|
---|
2899 | @ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
|
---|
2900 | regular expressions (that is, with or without the @option{-E}/@option{-r}
|
---|
2901 | options).
|
---|
2902 |
|
---|
2903 | @cindex range expression
|
---|
2904 | Within a bracket expression, a @dfn{range expression} consists of two
|
---|
2905 | characters separated by a hyphen.
|
---|
2906 | It matches any single character that
|
---|
2907 | sorts between the two characters, inclusive.
|
---|
2908 | In the default C locale, the sorting sequence is the native character
|
---|
2909 | order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
|
---|
2910 |
|
---|
2911 |
|
---|
2912 | Finally, certain named classes of characters are predefined within
|
---|
2913 | bracket expressions, as follows.
|
---|
2914 |
|
---|
2915 | These named classes must be used @emph{inside} brackets
|
---|
2916 | themselves. Correct usage:
|
---|
2917 | @example
|
---|
2918 | $ echo 1 | sed 's/[[:digit:]]/X/'
|
---|
2919 | X
|
---|
2920 | @end example
|
---|
2921 |
|
---|
2922 | Incorrect usage is rejected by newer @command{sed} versions.
|
---|
2923 | Older versions accepted it but treated it as a single bracket expression
|
---|
2924 | (which is equivalent to @samp{[dgit:]},
|
---|
2925 | that is, only the characters @var{d/g/i/t/:}):
|
---|
2926 | @example
|
---|
2927 | # current GNU sed versions - incorrect usage rejected
|
---|
2928 | $ echo 1 | sed 's/[:digit:]/X/'
|
---|
2929 | sed: character class syntax is [[:space:]], not [:space:]
|
---|
2930 |
|
---|
2931 | # older GNU sed versions
|
---|
2932 | $ echo 1 | sed 's/[:digit:]/X/'
|
---|
2933 | 1
|
---|
2934 | @end example
|
---|
2935 |
|
---|
2936 |
|
---|
2937 | @cindex classes of characters
|
---|
2938 | @cindex character classes
|
---|
2939 | @cindex named character classes
|
---|
2940 | @table @samp
|
---|
2941 |
|
---|
2942 | @item [:alnum:]
|
---|
2943 | @opindex alnum @r{character class}
|
---|
2944 | @cindex alphanumeric characters
|
---|
2945 | Alphanumeric characters:
|
---|
2946 | @samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII
|
---|
2947 | character encoding, this is the same as @samp{[0-9A-Za-z]}.
|
---|
2948 |
|
---|
2949 | @item [:alpha:]
|
---|
2950 | @opindex alpha @r{character class}
|
---|
2951 | @cindex alphabetic characters
|
---|
2952 | Alphabetic characters:
|
---|
2953 | @samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII
|
---|
2954 | character encoding, this is the same as @samp{[A-Za-z]}.
|
---|
2955 |
|
---|
2956 | @item [:blank:]
|
---|
2957 | @opindex blank @r{character class}
|
---|
2958 | @cindex blank characters
|
---|
2959 | Blank characters:
|
---|
2960 | space and tab.
|
---|
2961 |
|
---|
2962 | @item [:cntrl:]
|
---|
2963 | @opindex cntrl @r{character class}
|
---|
2964 | @cindex control characters
|
---|
2965 | Control characters.
|
---|
2966 | In ASCII, these characters have octal codes 000
|
---|
2967 | through 037, and 177 (DEL).
|
---|
2968 | In other character sets, these are
|
---|
2969 | the equivalent characters, if any.
|
---|
2970 |
|
---|
2971 | @item [:digit:]
|
---|
2972 | @opindex digit @r{character class}
|
---|
2973 | @cindex digit characters
|
---|
2974 | @cindex numeric characters
|
---|
2975 | Digits: @code{0 1 2 3 4 5 6 7 8 9}.
|
---|
2976 |
|
---|
2977 | @item [:graph:]
|
---|
2978 | @opindex graph @r{character class}
|
---|
2979 | @cindex graphic characters
|
---|
2980 | Graphical characters:
|
---|
2981 | @samp{[:alnum:]} and @samp{[:punct:]}.
|
---|
2982 |
|
---|
2983 | @item [:lower:]
|
---|
2984 | @opindex lower @r{character class}
|
---|
2985 | @cindex lower-case letters
|
---|
2986 | Lower-case letters; in the @samp{C} locale and ASCII character
|
---|
2987 | encoding, this is
|
---|
2988 | @code{a b c d e f g h i j k l m n o p q r s t u v w x y z}.
|
---|
2989 |
|
---|
2990 | @item [:print:]
|
---|
2991 | @opindex print @r{character class}
|
---|
2992 | @cindex printable characters
|
---|
2993 | Printable characters:
|
---|
2994 | @samp{[:alnum:]}, @samp{[:punct:]}, and space.
|
---|
2995 |
|
---|
2996 | @item [:punct:]
|
---|
2997 | @opindex punct @r{character class}
|
---|
2998 | @cindex punctuation characters
|
---|
2999 | Punctuation characters; in the @samp{C} locale and ASCII character
|
---|
3000 | encoding, this is
|
---|
3001 | @code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}.
|
---|
3002 |
|
---|
3003 | @item [:space:]
|
---|
3004 | @opindex space @r{character class}
|
---|
3005 | @cindex space characters
|
---|
3006 | @cindex whitespace characters
|
---|
3007 | Space characters: in the @samp{C} locale, this is
|
---|
3008 | tab, newline, vertical tab, form feed, carriage return, and space.
|
---|
3009 |
|
---|
3010 |
|
---|
3011 | @item [:upper:]
|
---|
3012 | @opindex upper @r{character class}
|
---|
3013 | @cindex upper-case letters
|
---|
3014 | Upper-case letters: in the @samp{C} locale and ASCII character
|
---|
3015 | encoding, this is
|
---|
3016 | @code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}.
|
---|
3017 |
|
---|
3018 | @item [:xdigit:]
|
---|
3019 | @opindex xdigit @r{character class}
|
---|
3020 | @cindex xdigit class
|
---|
3021 | @cindex hexadecimal digits
|
---|
3022 | Hexadecimal digits:
|
---|
3023 | @code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}.
|
---|
3024 |
|
---|
3025 | @end table
|
---|
3026 | Note that the brackets in these class names are
|
---|
3027 | part of the symbolic names, and must be included in addition to
|
---|
3028 | the brackets delimiting the bracket expression.
|
---|
3029 |
|
---|
3030 | Most meta-characters lose their special meaning inside bracket expressions:
|
---|
3031 |
|
---|
3032 | @table @samp
|
---|
3033 | @item ]
|
---|
3034 | ends the bracket expression if it's not the first list item.
|
---|
3035 | So, if you want to make the @samp{]} character a list item,
|
---|
3036 | you must put it first.
|
---|
3037 |
|
---|
3038 | @item -
|
---|
3039 | represents the range if it's not first or last in a list or the ending point
|
---|
3040 | of a range.
|
---|
3041 |
|
---|
3042 | @item ^
|
---|
3043 | represents the characters not in the list.
|
---|
3044 | If you want to make the @samp{^}
|
---|
3045 | character a list item, place it anywhere but first.
|
---|
3046 | @end table
|
---|
3047 |
|
---|
3048 | TODO: incorporate this paragraph (copied verbatim from BRE section).
|
---|
3049 |
|
---|
3050 | @cindex @code{POSIXLY_CORRECT} behavior, bracket expressions
|
---|
3051 | The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\}
|
---|
3052 | are normally not special within @var{list}. For example, @code{[\*]}
|
---|
3053 | matches either @samp{\} or @samp{*}, because the @code{\} is not
|
---|
3054 | special here. However, strings like @code{[.ch.]}, @code{[=a=]}, and
|
---|
3055 | @code{[:space:]} are special within @var{list} and represent collating
|
---|
3056 | symbols, equivalence classes, and character classes, respectively, and
|
---|
3057 | @code{[} is therefore special within @var{list} when it is followed by
|
---|
3058 | @code{.}, @code{=}, or @code{:}. Also, when not in
|
---|
3059 | @env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and
|
---|
3060 | @code{\t} are recognized within @var{list}. @xref{Escapes}.
|
---|
3061 | @c ********
|
---|
3062 |
|
---|
3063 |
|
---|
3064 | @c TODO: improve explanation about collation classes and equivalence classes
|
---|
3065 | @c perhaps dedicate a section to Locales ??
|
---|
3066 |
|
---|
3067 | @table @samp
|
---|
3068 | @item [.
|
---|
3069 | represents the open collating symbol.
|
---|
3070 |
|
---|
3071 | @item .]
|
---|
3072 | represents the close collating symbol.
|
---|
3073 |
|
---|
3074 | @item [=
|
---|
3075 | represents the open equivalence class.
|
---|
3076 |
|
---|
3077 | @item =]
|
---|
3078 | represents the close equivalence class.
|
---|
3079 |
|
---|
3080 | @item [:
|
---|
3081 | represents the open character class symbol, and should be followed by a
|
---|
3082 | valid character class name.
|
---|
3083 |
|
---|
3084 | @item :]
|
---|
3085 | represents the close character class symbol.
|
---|
3086 | @end table
|
---|
3087 |
|
---|
3088 |
|
---|
3089 | @node regexp extensions
|
---|
3090 | @section regular expression extensions
|
---|
3091 |
|
---|
3092 | The following sequences have special meaning inside regular expressions
|
---|
3093 | (used in @ref{Regexp Addresses,,addresses} and the @code{s} command).
|
---|
3094 |
|
---|
3095 | These can be used in both
|
---|
3096 | @ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
|
---|
3097 | regular expressions (that is, with or without the @option{-E}/@option{-r}
|
---|
3098 | options).
|
---|
3099 |
|
---|
3100 | @table @code
|
---|
3101 | @item \w
|
---|
3102 | Matches any ``word'' character. A ``word'' character is any
|
---|
3103 | letter or digit or the underscore character.
|
---|
3104 |
|
---|
3105 | @example
|
---|
3106 | $ echo "abc %-= def." | sed 's/\w/X/g'
|
---|
3107 | XXX %-= XXX.
|
---|
3108 | @end example
|
---|
3109 |
|
---|
3110 |
|
---|
3111 | @item \W
|
---|
3112 | Matches any ``non-word'' character.
|
---|
3113 |
|
---|
3114 | @example
|
---|
3115 | $ echo "abc %-= def." | sed 's/\W/X/g'
|
---|
3116 | abcXXXXXdefX
|
---|
3117 | @end example
|
---|
3118 |
|
---|
3119 |
|
---|
3120 | @item \b
|
---|
3121 | Matches a word boundary; that is it matches if the character
|
---|
3122 | to the left is a ``word'' character and the character to the
|
---|
3123 | right is a ``non-word'' character, or vice-versa.
|
---|
3124 |
|
---|
3125 | @example
|
---|
3126 | $ echo "abc %-= def." | sed 's/\b/X/g'
|
---|
3127 | XabcX %-= XdefX.
|
---|
3128 | @end example
|
---|
3129 |
|
---|
3130 |
|
---|
3131 | @item \B
|
---|
3132 | Matches everywhere but on a word boundary; that is it matches
|
---|
3133 | if the character to the left and the character to the right
|
---|
3134 | are either both ``word'' characters or both ``non-word''
|
---|
3135 | characters.
|
---|
3136 |
|
---|
3137 | @example
|
---|
3138 | $ echo "abc %-= def." | sed 's/\B/X/g'
|
---|
3139 | aXbXc X%X-X=X dXeXf.X
|
---|
3140 | @end example
|
---|
3141 |
|
---|
3142 |
|
---|
3143 | @item \s
|
---|
3144 | Matches whitespace characters (spaces and tabs).
|
---|
3145 | Newlines embedded in the pattern/hold spaces will also match:
|
---|
3146 |
|
---|
3147 | @example
|
---|
3148 | $ echo "abc %-= def." | sed 's/\s/X/g'
|
---|
3149 | abcX%-=Xdef.
|
---|
3150 | @end example
|
---|
3151 |
|
---|
3152 |
|
---|
3153 | @item \S
|
---|
3154 | Matches non-whitespace characters.
|
---|
3155 |
|
---|
3156 | @example
|
---|
3157 | $ echo "abc %-= def." | sed 's/\S/X/g'
|
---|
3158 | XXX XXX XXXX
|
---|
3159 | @end example
|
---|
3160 |
|
---|
3161 |
|
---|
3162 | @item \<
|
---|
3163 | Matches the beginning of a word.
|
---|
3164 |
|
---|
3165 | @example
|
---|
3166 | $ echo "abc %-= def." | sed 's/\</X/g'
|
---|
3167 | Xabc %-= Xdef.
|
---|
3168 | @end example
|
---|
3169 |
|
---|
3170 |
|
---|
3171 | @item \>
|
---|
3172 | Matches the end of a word.
|
---|
3173 |
|
---|
3174 | @example
|
---|
3175 | $ echo "abc %-= def." | sed 's/\>/X/g'
|
---|
3176 | abcX %-= defX.
|
---|
3177 | @end example
|
---|
3178 |
|
---|
3179 |
|
---|
3180 | @item \`
|
---|
3181 | Matches only at the start of pattern space. This is different
|
---|
3182 | from @code{^} in multi-line mode.
|
---|
3183 |
|
---|
3184 | Compare the following two examples:
|
---|
3185 |
|
---|
3186 | @example
|
---|
3187 | $ printf "a\nb\nc\n" | sed 'N;N;s/^/X/gm'
|
---|
3188 | Xa
|
---|
3189 | Xb
|
---|
3190 | Xc
|
---|
3191 |
|
---|
3192 | $ printf "a\nb\nc\n" | sed 'N;N;s/\`/X/gm'
|
---|
3193 | Xa
|
---|
3194 | b
|
---|
3195 | c
|
---|
3196 | @end example
|
---|
3197 |
|
---|
3198 | @item \'
|
---|
3199 | Matches only at the end of pattern space. This is different
|
---|
3200 | from @code{$} in multi-line mode.
|
---|
3201 |
|
---|
3202 |
|
---|
3203 |
|
---|
3204 | @end table
|
---|
3205 |
|
---|
3206 |
|
---|
3207 | @node Back-references and Subexpressions
|
---|
3208 | @section Back-references and Subexpressions
|
---|
3209 | @cindex subexpression
|
---|
3210 | @cindex back-reference
|
---|
3211 |
|
---|
3212 | @dfn{back-references} are regular expression commands which refer to a
|
---|
3213 | previous part of the matched regular expression. Back-references are
|
---|
3214 | specified with backslash and a single digit (e.g. @samp{\1}). The
|
---|
3215 | part of the regular expression they refer to is called a
|
---|
3216 | @dfn{subexpression}, and is designated with parentheses.
|
---|
3217 |
|
---|
3218 | Back-references and subexpressions are used in two cases: in the
|
---|
3219 | regular expression search pattern, and in the @var{replacement} part
|
---|
3220 | of the @command{s} command (@pxref{Regexp Addresses,,Regular
|
---|
3221 | Expression Addresses} and @ref{The "s" Command}).
|
---|
3222 |
|
---|
3223 | In a regular expression pattern, back-references are used to match
|
---|
3224 | the same content as a previously matched subexpression. In the
|
---|
3225 | following example, the subexpression is @samp{.} - any single
|
---|
3226 | character (being surrounded by parentheses makes it a
|
---|
3227 | subexpression). The back-reference @samp{\1} asks to match the same
|
---|
3228 | content (same character) as the sub-expression.
|
---|
3229 |
|
---|
3230 | The command below matches words starting with any character,
|
---|
3231 | followed by the letter @samp{o}, followed by the same character as the
|
---|
3232 | first.
|
---|
3233 |
|
---|
3234 | @example
|
---|
3235 | $ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words
|
---|
3236 | bob
|
---|
3237 | mom
|
---|
3238 | non
|
---|
3239 | pop
|
---|
3240 | sos
|
---|
3241 | tot
|
---|
3242 | wow
|
---|
3243 | @end example
|
---|
3244 |
|
---|
3245 | Multiple subexpressions are automatically numbered from
|
---|
3246 | left-to-right. This command searches for 6-letter
|
---|
3247 | palindromes (the first three letters are 3 subexpressions,
|
---|
3248 | followed by 3 back-references in reverse order):
|
---|
3249 |
|
---|
3250 | @example
|
---|
3251 | $ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words
|
---|
3252 | redder
|
---|
3253 | @end example
|
---|
3254 |
|
---|
3255 | In the @command{s} command, back-references can be
|
---|
3256 | used in the @var{replacement} part to refer back to subexpressions in
|
---|
3257 | the @var{regexp} part.
|
---|
3258 |
|
---|
3259 | The following example uses two subexpressions in the regular
|
---|
3260 | expression to match two space-separated words. The back-references in
|
---|
3261 | the @var{replacement} part prints the words in a different order:
|
---|
3262 |
|
---|
3263 | @example
|
---|
3264 | $ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
|
---|
3265 | The name is Bond, James Bond.
|
---|
3266 | @end example
|
---|
3267 |
|
---|
3268 |
|
---|
3269 | When used with alternation, if the group does not participate in the
|
---|
3270 | match then the back-reference makes the whole match fail. For
|
---|
3271 | example, @samp{a(.)|b\1} will not match @samp{ba}. When multiple
|
---|
3272 | regular expressions are given with @option{-e} or from a file
|
---|
3273 | (@samp{-f @var{file}}), back-references are local to each expression.
|
---|
3274 |
|
---|
3275 |
|
---|
3276 | @node Escapes
|
---|
3277 | @section Escape Sequences - specifying special characters
|
---|
3278 |
|
---|
3279 | @cindex GNU extensions, special escapes
|
---|
3280 | Until this chapter, we have only encountered escapes of the form
|
---|
3281 | @samp{\^}, which tell @command{sed} not to interpret the circumflex
|
---|
3282 | as a special character, but rather to take it literally. For
|
---|
3283 | example, @samp{\*} matches a single asterisk rather than zero
|
---|
3284 | or more backslashes.
|
---|
3285 |
|
---|
3286 | @cindex @code{POSIXLY_CORRECT} behavior, escapes
|
---|
3287 | This chapter introduces another kind of escape@footnote{All
|
---|
3288 | the escapes introduced here are GNU
|
---|
3289 | extensions, with the exception of @code{\n}. In basic regular
|
---|
3290 | expression mode, setting @code{POSIXLY_CORRECT} disables them inside
|
---|
3291 | bracket expressions.}---that
|
---|
3292 | is, escapes that are applied to a character or sequence of characters
|
---|
3293 | that ordinarily are taken literally, and that @command{sed} replaces
|
---|
3294 | with a special character. This provides a way
|
---|
3295 | of encoding non-printable characters in patterns in a visible manner.
|
---|
3296 | There is no restriction on the appearance of non-printing characters
|
---|
3297 | in a @command{sed} script but when a script is being prepared in the
|
---|
3298 | shell or by text editing, it is usually easier to use one of
|
---|
3299 | the following escape sequences than the binary character it
|
---|
3300 | represents:
|
---|
3301 |
|
---|
3302 | The list of these escapes is:
|
---|
3303 |
|
---|
3304 | @table @code
|
---|
3305 | @item \a
|
---|
3306 | Produces or matches a @sc{bel} character, that is an ``alert'' (@sc{ascii} 7).
|
---|
3307 |
|
---|
3308 | @item \f
|
---|
3309 | Produces or matches a form feed (@sc{ascii} 12).
|
---|
3310 |
|
---|
3311 | @item \n
|
---|
3312 | Produces or matches a newline (@sc{ascii} 10).
|
---|
3313 |
|
---|
3314 | @item \r
|
---|
3315 | Produces or matches a carriage return (@sc{ascii} 13).
|
---|
3316 |
|
---|
3317 | @item \t
|
---|
3318 | Produces or matches a horizontal tab (@sc{ascii} 9).
|
---|
3319 |
|
---|
3320 | @item \v
|
---|
3321 | Produces or matches a so called ``vertical tab'' (@sc{ascii} 11).
|
---|
3322 |
|
---|
3323 | @item \c@var{x}
|
---|
3324 | Produces or matches @kbd{@sc{Control}-@var{x}}, where @var{x} is
|
---|
3325 | any character. The precise effect of @samp{\c@var{x}} is as follows:
|
---|
3326 | if @var{x} is a lower case letter, it is converted to upper case.
|
---|
3327 | Then bit 6 of the character (hex 40) is inverted. Thus @samp{\cz} becomes
|
---|
3328 | hex 1A, but @samp{\c@{} becomes hex 3B, while @samp{\c;} becomes hex 7B.
|
---|
3329 |
|
---|
3330 | @item \d@var{xxx}
|
---|
3331 | Produces or matches a character whose decimal @sc{ascii} value is @var{xxx}.
|
---|
3332 |
|
---|
3333 | @item \o@var{xxx}
|
---|
3334 | Produces or matches a character whose octal @sc{ascii} value is @var{xxx}.
|
---|
3335 |
|
---|
3336 | @item \x@var{xx}
|
---|
3337 | Produces or matches a character whose hexadecimal @sc{ascii} value is @var{xx}.
|
---|
3338 | @end table
|
---|
3339 |
|
---|
3340 | @samp{\b} (backspace) was omitted because of the conflict with
|
---|
3341 | the existing ``word boundary'' meaning.
|
---|
3342 |
|
---|
3343 | @subsection Escaping Precedence
|
---|
3344 |
|
---|
3345 | @value{SSED} processes escape sequences @emph{before} passing
|
---|
3346 | the text onto the regular-expression matching of the @command{s///} command
|
---|
3347 | and Address matching. Thus the following two commands are equivalent
|
---|
3348 | (@samp{0x5e} is the hexadecimal @sc{ascii} value of the character @samp{^}):
|
---|
3349 |
|
---|
3350 | @codequoteundirected on
|
---|
3351 | @codequotebacktick on
|
---|
3352 | @example
|
---|
3353 | @group
|
---|
3354 | $ echo 'a^c' | sed 's/^/b/'
|
---|
3355 | ba^c
|
---|
3356 |
|
---|
3357 | $ echo 'a^c' | sed 's/\x5e/b/'
|
---|
3358 | ba^c
|
---|
3359 | @end group
|
---|
3360 | @end example
|
---|
3361 | @codequoteundirected off
|
---|
3362 | @codequotebacktick off
|
---|
3363 |
|
---|
3364 | As are the following (@samp{0x5b},@samp{0x5d} are the hexadecimal
|
---|
3365 | @sc{ascii} values of @samp{[},@samp{]}, respectively):
|
---|
3366 |
|
---|
3367 | @codequoteundirected on
|
---|
3368 | @codequotebacktick on
|
---|
3369 | @example
|
---|
3370 | @group
|
---|
3371 | $ echo abc | sed 's/[a]/x/'
|
---|
3372 | Xbc
|
---|
3373 | $ echo abc | sed 's/\x5ba\x5d/x/'
|
---|
3374 | Xbc
|
---|
3375 | @end group
|
---|
3376 | @end example
|
---|
3377 | @codequoteundirected off
|
---|
3378 | @codequotebacktick off
|
---|
3379 |
|
---|
3380 | However it is recommended to avoid such special characters
|
---|
3381 | due to unexpected edge-cases. For example, the following
|
---|
3382 | are not equivalent:
|
---|
3383 |
|
---|
3384 | @codequoteundirected on
|
---|
3385 | @codequotebacktick on
|
---|
3386 | @example
|
---|
3387 | @group
|
---|
3388 | $ echo 'a^c' | sed 's/\^/b/'
|
---|
3389 | abc
|
---|
3390 |
|
---|
3391 | $ echo 'a^c' | sed 's/\\\x5e/b/'
|
---|
3392 | a^c
|
---|
3393 | @end group
|
---|
3394 | @end example
|
---|
3395 | @codequoteundirected off
|
---|
3396 | @codequotebacktick off
|
---|
3397 |
|
---|
3398 | @c also: this fails in different places:
|
---|
3399 | @c $ sed 's/[//'
|
---|
3400 | @c sed: -e expression #1, char 5: unterminated `s' command
|
---|
3401 | @c $ sed 's/\x5b//'
|
---|
3402 | @c sed: -e expression #1, char 8: Invalid regular expression
|
---|
3403 | @c
|
---|
3404 | @c which is OK but confusing to explain why (the first
|
---|
3405 | @c fails in compile.c:snarf_char_class while the second
|
---|
3406 | @c is passed to the regex engine and then fails).
|
---|
3407 |
|
---|
3408 |
|
---|
3409 | @node Locale Considerations
|
---|
3410 | @section Multibyte characters and Locale Considerations
|
---|
3411 |
|
---|
3412 | @value{SSED} processes valid multibyte characters in multibyte locales
|
---|
3413 | (e.g. @code{UTF-8}). @footnote{Some regexp edge-cases depends on the
|
---|
3414 | operating system and libc implementation. The examples shown are known
|
---|
3415 | to work as-expected on GNU/Linux systems using glibc.}
|
---|
3416 |
|
---|
3417 | @noindent The following example uses the Greek letter Capital Sigma
|
---|
3418 | (@value{ucsigma},
|
---|
3419 | Unicode code point @code{0x03A3}). In a @code{UTF-8} locale,
|
---|
3420 | @command{sed} correctly processes the Sigma as one character despite
|
---|
3421 | it being 2 octets (bytes):
|
---|
3422 |
|
---|
3423 | @codequoteundirected on
|
---|
3424 | @codequotebacktick on
|
---|
3425 | @example
|
---|
3426 | @group
|
---|
3427 | $ locale | grep LANG
|
---|
3428 | LANG=en_US.UTF-8
|
---|
3429 |
|
---|
3430 | $ printf 'a\u03A3b'
|
---|
3431 | a@value{ucsigma}b
|
---|
3432 |
|
---|
3433 | $ printf 'a\u03A3b' | sed 's/./X/g'
|
---|
3434 | XXX
|
---|
3435 |
|
---|
3436 | $ printf 'a\u03A3b' | od -tx1 -An
|
---|
3437 | 61 ce a3 62
|
---|
3438 | @end group
|
---|
3439 | @end example
|
---|
3440 | @codequoteundirected off
|
---|
3441 | @codequotebacktick off
|
---|
3442 |
|
---|
3443 | @noindent
|
---|
3444 | To force @command{sed} to process octets separately, use the @code{C} locale
|
---|
3445 | (also known as the @code{POSIX} locale):
|
---|
3446 |
|
---|
3447 | @codequoteundirected on
|
---|
3448 | @codequotebacktick on
|
---|
3449 | @example
|
---|
3450 | $ printf 'a\u03A3b' | LC_ALL=C sed 's/./X/g'
|
---|
3451 | XXXX
|
---|
3452 | @end example
|
---|
3453 | @codequoteundirected off
|
---|
3454 | @codequotebacktick off
|
---|
3455 |
|
---|
3456 | @subsection Invalid multibyte characters
|
---|
3457 |
|
---|
3458 | @command{sed}'s regular expressions @emph{do not} match
|
---|
3459 | invalid multibyte sequences in a multibyte locale.
|
---|
3460 |
|
---|
3461 | @noindent
|
---|
3462 | In the following examples, the ascii value @code{0xCE} is
|
---|
3463 | an incomplete multibyte character (shown here as @value{unicodeFFFD}).
|
---|
3464 | The regular expression @samp{.} does not match it:
|
---|
3465 |
|
---|
3466 | @codequoteundirected on
|
---|
3467 | @codequotebacktick on
|
---|
3468 | @example
|
---|
3469 | @group
|
---|
3470 | $ printf 'a\xCEb\n'
|
---|
3471 | a@value{unicodeFFFD}e
|
---|
3472 |
|
---|
3473 | $ printf 'a\xCEb\n' | sed 's/./X/g'
|
---|
3474 | X@value{unicodeFFFD}X
|
---|
3475 |
|
---|
3476 | $ printf 'a\xCEc\n' | sed 's/./X/g' | od -tx1c -An
|
---|
3477 | 58 ce 58 0a
|
---|
3478 | X X \n
|
---|
3479 | @end group
|
---|
3480 | @end example
|
---|
3481 | @codequoteundirected off
|
---|
3482 | @codequotebacktick off
|
---|
3483 |
|
---|
3484 | @noindent Similarly, the 'catch-all' regular expression @samp{.*} does not
|
---|
3485 | match the entire line:
|
---|
3486 |
|
---|
3487 | @codequoteundirected on
|
---|
3488 | @codequotebacktick on
|
---|
3489 | @example
|
---|
3490 | @group
|
---|
3491 | $ printf 'a\xCEc\n' | sed 's/.*//' | od -tx1c -An
|
---|
3492 | ce 63 0a
|
---|
3493 | c \n
|
---|
3494 | @end group
|
---|
3495 | @end example
|
---|
3496 | @codequoteundirected off
|
---|
3497 | @codequotebacktick off
|
---|
3498 |
|
---|
3499 | @noindent
|
---|
3500 | @value{SSED} offers the special @command{z} command to clear the
|
---|
3501 | current pattern space regardless of invalid multibyte characters
|
---|
3502 | (i.e. it works like @code{s/.*//} but also removes invalid multibyte
|
---|
3503 | characters):
|
---|
3504 |
|
---|
3505 | @codequoteundirected on
|
---|
3506 | @codequotebacktick on
|
---|
3507 | @example
|
---|
3508 | @group
|
---|
3509 | $ printf 'a\xCEc\n' | sed 'z' | od -tx1c -An
|
---|
3510 | 0a
|
---|
3511 | \n
|
---|
3512 | @end group
|
---|
3513 | @end example
|
---|
3514 | @codequoteundirected off
|
---|
3515 | @codequotebacktick off
|
---|
3516 |
|
---|
3517 | @noindent Alternatively, force the @code{C} locale to process
|
---|
3518 | each octet separately (every octet is a valid character in the @code{C}
|
---|
3519 | locale):
|
---|
3520 |
|
---|
3521 | @codequoteundirected on
|
---|
3522 | @codequotebacktick on
|
---|
3523 | @example
|
---|
3524 | @group
|
---|
3525 | $ printf 'a\xCEc\n' | LC_ALL=C sed 's/.*//' | od -tx1c -An
|
---|
3526 | 0a
|
---|
3527 | \n
|
---|
3528 | @end group
|
---|
3529 | @end example
|
---|
3530 | @codequoteundirected off
|
---|
3531 | @codequotebacktick off
|
---|
3532 |
|
---|
3533 |
|
---|
3534 | @command{sed}'s inability to process invalid multibyte characters
|
---|
3535 | can be used to detect such invalid sequences in a file.
|
---|
3536 | In the following examples, the @code{\xCE\xCE} is an invalid
|
---|
3537 | multibyte sequence, while @code{\xCE\A3} is a valid multibyte sequence
|
---|
3538 | (of the Greek Sigma character).
|
---|
3539 |
|
---|
3540 | @noindent
|
---|
3541 | The following @command{sed} program removes all valid
|
---|
3542 | characters using @code{s/.//g}. Any content left in the pattern space
|
---|
3543 | (the invalid characters) are added to the hold space using the
|
---|
3544 | @code{H} command. On the last line (@code{$}), the hold space is retrieved
|
---|
3545 | (@code{x}), newlines are removed (@code{s/\n//g}), and any remaining
|
---|
3546 | octets are printed unambiguously (@code{l}). Thus, any invalid
|
---|
3547 | multibyte sequences are printed as octal values:
|
---|
3548 |
|
---|
3549 | @codequoteundirected on
|
---|
3550 | @codequotebacktick on
|
---|
3551 | @example
|
---|
3552 | @group
|
---|
3553 | $ printf 'ab\nc\n\xCE\xCEde\n\xCE\xA3f\n' > invalid.txt
|
---|
3554 |
|
---|
3555 | $ cat invalid.txt
|
---|
3556 | ab
|
---|
3557 | c
|
---|
3558 | @value{unicodeFFFD}@value{unicodeFFFD}de
|
---|
3559 | @value{ucsigma}f
|
---|
3560 |
|
---|
3561 | $ sed -n 's/.//g ; H ; $@{x;s/\n//g;l@}' invalid.txt
|
---|
3562 | \316\316$
|
---|
3563 | @end group
|
---|
3564 | @end example
|
---|
3565 | @codequoteundirected off
|
---|
3566 | @codequotebacktick off
|
---|
3567 |
|
---|
3568 | @noindent With a few more commands, @command{sed} can print
|
---|
3569 | the exact line number corresponding to each invalid characters (line 3).
|
---|
3570 | These characters can then be removed by forcing the @code{C} locale
|
---|
3571 | and using octal escape sequences:
|
---|
3572 |
|
---|
3573 | @codequoteundirected on
|
---|
3574 | @codequotebacktick on
|
---|
3575 | @example
|
---|
3576 | $ sed -n 's/.//g;=;l' invalid.txt | paste - - | awk '$2!="$"'
|
---|
3577 | 3 \316\316$
|
---|
3578 |
|
---|
3579 | $ LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt
|
---|
3580 | @end example
|
---|
3581 | @codequoteundirected off
|
---|
3582 | @codequotebacktick off
|
---|
3583 |
|
---|
3584 | @subsection Upper/Lower case conversion
|
---|
3585 |
|
---|
3586 |
|
---|
3587 | @value{SSED}'s substitute command (@code{s}) supports upper/lower
|
---|
3588 | case conversions using @code{\U},@code{\L} codes.
|
---|
3589 | These conversions support multibyte characters:
|
---|
3590 |
|
---|
3591 | @codequoteundirected on
|
---|
3592 | @codequotebacktick on
|
---|
3593 | @example
|
---|
3594 | $ printf 'ABC\u03a3\n'
|
---|
3595 | ABC@value{ucsigma}
|
---|
3596 |
|
---|
3597 | $ printf 'ABC\u03a3\n' | sed 's/.*/\L&/'
|
---|
3598 | abc@value{lcsigma}
|
---|
3599 | @end example
|
---|
3600 | @codequoteundirected off
|
---|
3601 | @codequotebacktick off
|
---|
3602 |
|
---|
3603 | @noindent
|
---|
3604 | @xref{The "s" Command}.
|
---|
3605 |
|
---|
3606 |
|
---|
3607 | @subsection Multibyte regexp character classes
|
---|
3608 |
|
---|
3609 | @c TODO: fix following paragraphs (copied verbatim from 'bracket
|
---|
3610 | @c expression' section).
|
---|
3611 |
|
---|
3612 | In other locales, the sorting sequence is not specified, and
|
---|
3613 | @samp{[a-d]} might be equivalent to @samp{[abcd]} or to
|
---|
3614 | @samp{[aBbCcDd]}, or it might fail to match any character, or the set of
|
---|
3615 | characters that it matches might even be erratic.
|
---|
3616 | To obtain the traditional interpretation
|
---|
3617 | of bracket expressions, you can use the @samp{C} locale by setting the
|
---|
3618 | @env{LC_ALL} environment variable to the value @samp{C}.
|
---|
3619 |
|
---|
3620 | @example
|
---|
3621 | # TODO: is there any real-world system/locale where 'A'
|
---|
3622 | # is replaced by '-' ?
|
---|
3623 | $ echo A | sed 's/[a-z]/-/'
|
---|
3624 | A
|
---|
3625 | @end example
|
---|
3626 |
|
---|
3627 | Their interpretation depends on the @env{LC_CTYPE} locale;
|
---|
3628 | for example, @samp{[[:alnum:]]} means the character class of numbers and letters
|
---|
3629 | in the current locale.
|
---|
3630 |
|
---|
3631 | TODO: show example of collation
|
---|
3632 |
|
---|
3633 | @codequoteundirected on
|
---|
3634 | @codequotebacktick on
|
---|
3635 | @example
|
---|
3636 | # TODO: this works on glibc systems, not on musl-libc/freebsd/macosx.
|
---|
3637 | $ printf 'cliché\n' | LC_ALL=fr_FR.utf8 sed 's/[[=e=]]/X/g'
|
---|
3638 | clichX
|
---|
3639 | @end example
|
---|
3640 | @codequoteundirected off
|
---|
3641 | @codequotebacktick off
|
---|
3642 |
|
---|
3643 |
|
---|
3644 | @node advanced sed
|
---|
3645 | @chapter Advanced @command{sed}: cycles and buffers
|
---|
3646 |
|
---|
3647 | @menu
|
---|
3648 | * Execution Cycle:: How @command{sed} works
|
---|
3649 | * Hold and Pattern Buffers::
|
---|
3650 | * Multiline techniques:: Using D,G,H,N,P to process multiple lines
|
---|
3651 | * Branching and flow control::
|
---|
3652 | @end menu
|
---|
3653 |
|
---|
3654 | @node Execution Cycle
|
---|
3655 | @section How @command{sed} Works
|
---|
3656 |
|
---|
3657 | @cindex Buffer spaces, pattern and hold
|
---|
3658 | @cindex Spaces, pattern and hold
|
---|
3659 | @cindex Pattern space, definition
|
---|
3660 | @cindex Hold space, definition
|
---|
3661 | @command{sed} maintains two data buffers: the active @emph{pattern} space,
|
---|
3662 | and the auxiliary @emph{hold} space. Both are initially empty.
|
---|
3663 |
|
---|
3664 | @command{sed} operates by performing the following cycle on each
|
---|
3665 | line of input: first, @command{sed} reads one line from the input
|
---|
3666 | stream, removes any trailing newline, and places it in the pattern space.
|
---|
3667 | Then commands are executed; each command can have an address associated
|
---|
3668 | to it: addresses are a kind of condition code, and a command is only
|
---|
3669 | executed if the condition is verified before the command is to be
|
---|
3670 | executed.
|
---|
3671 |
|
---|
3672 | When the end of the script is reached, unless the @option{-n} option
|
---|
3673 | is in use, the contents of pattern space are printed out to the output
|
---|
3674 | stream, adding back the trailing newline if it was removed.@footnote{Actually,
|
---|
3675 | if @command{sed} prints a line without the terminating newline, it will
|
---|
3676 | nevertheless print the missing newline as soon as more text is sent to
|
---|
3677 | the same output stream, which gives the ``least expected surprise''
|
---|
3678 | even though it does not make commands like @samp{sed -n p} exactly
|
---|
3679 | identical to @command{cat}.} Then the next cycle starts for the next
|
---|
3680 | input line.
|
---|
3681 |
|
---|
3682 | Unless special commands (like @samp{D}) are used, the pattern space is
|
---|
3683 | deleted between two cycles. The hold space, on the other hand, keeps
|
---|
3684 | its data between cycles (see commands @samp{h}, @samp{H}, @samp{x},
|
---|
3685 | @samp{g}, @samp{G} to move data between both buffers).
|
---|
3686 |
|
---|
3687 | @node Hold and Pattern Buffers
|
---|
3688 | @section Hold and Pattern Buffers
|
---|
3689 |
|
---|
3690 | TODO
|
---|
3691 |
|
---|
3692 | @node Multiline techniques
|
---|
3693 | @section Multiline techniques - using D,G,H,N,P to process multiple lines
|
---|
3694 |
|
---|
3695 | Multiple lines can be processed as one buffer using the
|
---|
3696 | @code{D},@code{G},@code{H},@code{N},@code{P}. They are similar to
|
---|
3697 | their lowercase counterparts (@code{d},@code{g},
|
---|
3698 | @code{h},@code{n},@code{p}), except that these commands append or
|
---|
3699 | subtract data while respecting embedded newlines - allowing adding and
|
---|
3700 | removing lines from the pattern and hold spaces.
|
---|
3701 |
|
---|
3702 | They operate as follows:
|
---|
3703 | @table @code
|
---|
3704 | @item D
|
---|
3705 | @emph{deletes} line from the pattern space until the first newline,
|
---|
3706 | and restarts the cycle.
|
---|
3707 |
|
---|
3708 | @item G
|
---|
3709 | @emph{appends} line from the hold space to the pattern space, with a
|
---|
3710 | newline before it.
|
---|
3711 |
|
---|
3712 | @item H
|
---|
3713 | @emph{appends} line from the pattern space to the hold space, with a
|
---|
3714 | newline before it.
|
---|
3715 |
|
---|
3716 | @item N
|
---|
3717 | @emph{appends} line from the input file to the pattern space.
|
---|
3718 |
|
---|
3719 | @item P
|
---|
3720 | @emph{prints} line from the pattern space until the first newline.
|
---|
3721 |
|
---|
3722 | @end table
|
---|
3723 |
|
---|
3724 |
|
---|
3725 | The following example illustrates the operation of @code{N} and
|
---|
3726 | @code{D} commands:
|
---|
3727 |
|
---|
3728 | @codequoteundirected on
|
---|
3729 | @codequotebacktick on
|
---|
3730 | @example
|
---|
3731 | @group
|
---|
3732 | $ seq 6 | sed -n 'N;l;D'
|
---|
3733 | 1\n2$
|
---|
3734 | 2\n3$
|
---|
3735 | 3\n4$
|
---|
3736 | 4\n5$
|
---|
3737 | 5\n6$
|
---|
3738 | @end group
|
---|
3739 | @end example
|
---|
3740 | @codequoteundirected off
|
---|
3741 | @codequotebacktick off
|
---|
3742 |
|
---|
3743 | @enumerate
|
---|
3744 | @item
|
---|
3745 | @command{sed} starts by reading the first line into the pattern space
|
---|
3746 | (i.e. @samp{1}).
|
---|
3747 | @item
|
---|
3748 | At the beginning of every cycle, the @code{N}
|
---|
3749 | command appends a newline and the next line to the pattern space
|
---|
3750 | (i.e. @samp{1}, @samp{\n}, @samp{2} in the first cycle).
|
---|
3751 | @item
|
---|
3752 | The @code{l} command prints the content of the pattern space
|
---|
3753 | unambiguously.
|
---|
3754 | @item
|
---|
3755 | The @code{D} command then removes the content of pattern
|
---|
3756 | space up to the first newline (leaving @samp{2} at the end of
|
---|
3757 | the first cycle).
|
---|
3758 | @item
|
---|
3759 | At the next cycle the @code{N} command appends a
|
---|
3760 | newline and the next input line to the pattern space
|
---|
3761 | (e.g. @samp{2}, @samp{\n}, @samp{3}).
|
---|
3762 | @end enumerate
|
---|
3763 |
|
---|
3764 |
|
---|
3765 | @cindex processing paragraphs
|
---|
3766 | @cindex paragraphs, processing
|
---|
3767 | A common technique to process blocks of text such as paragraphs
|
---|
3768 | (instead of line-by-line) is using the following construct:
|
---|
3769 |
|
---|
3770 | @codequoteundirected on
|
---|
3771 | @codequotebacktick on
|
---|
3772 | @example
|
---|
3773 | sed '/./@{H;$!d@} ; x ; s/REGEXP/REPLACEMENT/'
|
---|
3774 | @end example
|
---|
3775 | @codequoteundirected off
|
---|
3776 | @codequotebacktick off
|
---|
3777 |
|
---|
3778 | @enumerate
|
---|
3779 | @item
|
---|
3780 | The first expression, @code{/./@{H;$!d@}} operates on all non-empty lines,
|
---|
3781 | and adds the current line (in the pattern space) to the hold space.
|
---|
3782 | On all lines except the last, the pattern space is deleted and the cycle is
|
---|
3783 | restarted.
|
---|
3784 |
|
---|
3785 | @item
|
---|
3786 | The other expressions @code{x} and @code{s} are executed only on empty
|
---|
3787 | lines (i.e. paragraph separators). The @code{x} command fetches the
|
---|
3788 | accumulated lines from the hold space back to the pattern space. The
|
---|
3789 | @code{s///} command then operates on all the text in the paragraph
|
---|
3790 | (including the embedded newlines).
|
---|
3791 | @end enumerate
|
---|
3792 |
|
---|
3793 | The following example demonstrates this technique:
|
---|
3794 | @codequoteundirected on
|
---|
3795 | @codequotebacktick on
|
---|
3796 | @example
|
---|
3797 | @group
|
---|
3798 | $ cat input.txt
|
---|
3799 | a a a aa aaa
|
---|
3800 | aaaa aaaa aa
|
---|
3801 | aaaa aaa aaa
|
---|
3802 |
|
---|
3803 | bbbb bbb bbb
|
---|
3804 | bb bb bbb bb
|
---|
3805 | bbbbbbbb bbb
|
---|
3806 |
|
---|
3807 | ccc ccc cccc
|
---|
3808 | cccc ccccc c
|
---|
3809 | cc cc cc cc
|
---|
3810 |
|
---|
3811 | $ sed '/./@{H;$!d@} ; x ; s/^/\nSTART-->/ ; s/$/\n<--END/' input.txt
|
---|
3812 |
|
---|
3813 | START-->
|
---|
3814 | a a a aa aaa
|
---|
3815 | aaaa aaaa aa
|
---|
3816 | aaaa aaa aaa
|
---|
3817 | <--END
|
---|
3818 |
|
---|
3819 | START-->
|
---|
3820 | bbbb bbb bbb
|
---|
3821 | bb bb bbb bb
|
---|
3822 | bbbbbbbb bbb
|
---|
3823 | <--END
|
---|
3824 |
|
---|
3825 | START-->
|
---|
3826 | ccc ccc cccc
|
---|
3827 | cccc ccccc c
|
---|
3828 | cc cc cc cc
|
---|
3829 | <--END
|
---|
3830 | @end group
|
---|
3831 | @end example
|
---|
3832 | @codequoteundirected off
|
---|
3833 | @codequotebacktick off
|
---|
3834 |
|
---|
3835 | For more annotated examples, @pxref{Text search across multiple lines}
|
---|
3836 | and @ref{Line length adjustment}.
|
---|
3837 |
|
---|
3838 | @node Branching and flow control
|
---|
3839 | @section Branching and Flow Control
|
---|
3840 |
|
---|
3841 | The branching commands @code{b}, @code{t}, and @code{T} enable
|
---|
3842 | changing the flow of @command{sed} programs.
|
---|
3843 |
|
---|
3844 | By default, @command{sed} reads an input line into the pattern buffer,
|
---|
3845 | then continues to processes all commands in order.
|
---|
3846 | Commands without addresses affect all lines.
|
---|
3847 | Commands with addresses affect only matching lines.
|
---|
3848 | @xref{Execution Cycle} and @ref{Addresses overview}.
|
---|
3849 |
|
---|
3850 | @command{sed} does not support a typical @code{if/then} construct.
|
---|
3851 | Instead, some commands can be used as conditionals or to change the
|
---|
3852 | default flow control:
|
---|
3853 |
|
---|
3854 | @table @code
|
---|
3855 |
|
---|
3856 | @item d
|
---|
3857 | delete (clears) the current pattern space,
|
---|
3858 | and restart the program cycle without processing the rest of the commands
|
---|
3859 | and without printing the pattern space.
|
---|
3860 |
|
---|
3861 | @item D
|
---|
3862 | delete the contents of the pattern space @emph{up to the first newline},
|
---|
3863 | and restart the program cycle without processing the rest of
|
---|
3864 | the commands and without printing the pattern space.
|
---|
3865 |
|
---|
3866 | @item [addr]X
|
---|
3867 | @itemx [addr]@{ X ; X ; X @}
|
---|
3868 | @item /regexp/X
|
---|
3869 | @item /regexp/@{ X ; X ; X @}
|
---|
3870 | Addresses and regular expressions can be used as an @code{if/then}
|
---|
3871 | conditional: If @var{[addr]} matches the current pattern space,
|
---|
3872 | execute the command(s).
|
---|
3873 | For example: The command @code{/^#/d} means:
|
---|
3874 | @emph{if} the current pattern matches the regular expression @code{^#} (a line
|
---|
3875 | starting with a hash), @emph{then} execute the @code{d} command:
|
---|
3876 | delete the line without printing it, and restart the program cycle
|
---|
3877 | immediately.
|
---|
3878 |
|
---|
3879 | @item b
|
---|
3880 | branch unconditionally (that is: always jump to a label, skipping
|
---|
3881 | or repeating other commands, without restarting a new cycle). Combined
|
---|
3882 | with an address, the branch can be conditionally executed on matched
|
---|
3883 | lines.
|
---|
3884 |
|
---|
3885 | @item t
|
---|
3886 | branch conditionally (that is: jump to a label) @emph{only if} a
|
---|
3887 | @code{s///} command has succeeded since the last input line was read
|
---|
3888 | or another conditional branch was taken.
|
---|
3889 |
|
---|
3890 | @item T
|
---|
3891 | similar but opposite to the @code{t} command: branch only if
|
---|
3892 | there has been @emph{no} successful substitutions since the last
|
---|
3893 | input line was read.
|
---|
3894 | @end table
|
---|
3895 |
|
---|
3896 |
|
---|
3897 | The following two @command{sed} programs are equivalent. The first
|
---|
3898 | (contrived) example uses the @code{b} command to skip the @code{s///}
|
---|
3899 | command on lines containing @samp{1}. The second example uses an
|
---|
3900 | address with negation (@samp{!}) to perform substitution only on
|
---|
3901 | desired lines. The @code{y///} command is still executed on all
|
---|
3902 | lines:
|
---|
3903 |
|
---|
3904 | @codequoteundirected on
|
---|
3905 | @codequotebacktick on
|
---|
3906 | @example
|
---|
3907 | @group
|
---|
3908 | $ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
|
---|
3909 | a4
|
---|
3910 | z5
|
---|
3911 | z6
|
---|
3912 |
|
---|
3913 | $ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
|
---|
3914 | a4
|
---|
3915 | z5
|
---|
3916 | z6
|
---|
3917 | @end group
|
---|
3918 | @end example
|
---|
3919 | @codequoteundirected off
|
---|
3920 | @codequotebacktick off
|
---|
3921 |
|
---|
3922 |
|
---|
3923 |
|
---|
3924 | @subsection Branching and Cycles
|
---|
3925 | @cindex labels
|
---|
3926 | @cindex omitting labels
|
---|
3927 | @cindex cycle, restarting
|
---|
3928 | @cindex restarting a cycle
|
---|
3929 | The @code{b},@code{t} and @code{T} commands can be followed by a label
|
---|
3930 | (typically a single letter). Labels are defined with a colon followed by
|
---|
3931 | one or more letters (e.g. @samp{:x}). If the label is omitted the
|
---|
3932 | branch commands restart the cycle. Note the difference between
|
---|
3933 | branching to a label and restarting the cycle: when a cycle is
|
---|
3934 | restarted, @command{sed} first prints the current content of the
|
---|
3935 | pattern space, then reads the next input line into the pattern space;
|
---|
3936 | Jumping to a label (even if it is at the beginning of the program)
|
---|
3937 | does not print the pattern space and does not read the next input line.
|
---|
3938 |
|
---|
3939 | The following program is a no-op. The @code{b} command (the only command
|
---|
3940 | in the program) does not have a label, and thus simply restarts the cycle.
|
---|
3941 | On each cycle, the pattern space is printed and the next input line is read:
|
---|
3942 |
|
---|
3943 | @example
|
---|
3944 | @group
|
---|
3945 | $ seq 3 | sed b
|
---|
3946 | 1
|
---|
3947 | 2
|
---|
3948 | 3
|
---|
3949 | @end group
|
---|
3950 | @end example
|
---|
3951 |
|
---|
3952 | @cindex infinite loop, branching
|
---|
3953 | @cindex branching, infinite loop
|
---|
3954 | The following example is an infinite-loop - it doesn't terminate and
|
---|
3955 | doesn't print anything. The @code{b} command jumps to the @samp{x}
|
---|
3956 | label, and a new cycle is never started:
|
---|
3957 |
|
---|
3958 | @codequoteundirected on
|
---|
3959 | @codequotebacktick on
|
---|
3960 | @example
|
---|
3961 | @group
|
---|
3962 | $ seq 3 | sed ':x ; bx'
|
---|
3963 |
|
---|
3964 | # The above command requires gnu sed (which supports additional
|
---|
3965 | # commands following a label, without a newline). A portable equivalent:
|
---|
3966 | # sed -e ':x' -e bx
|
---|
3967 | @end group
|
---|
3968 | @end example
|
---|
3969 | @codequoteundirected off
|
---|
3970 | @codequotebacktick off
|
---|
3971 |
|
---|
3972 | @cindex branching and n, N
|
---|
3973 | @cindex n, and branching
|
---|
3974 | @cindex N, and branching
|
---|
3975 | Branching is often complemented with the @code{n} or @code{N} commands:
|
---|
3976 | both commands read the next input line into the pattern space without waiting
|
---|
3977 | for the cycle to restart. Before reading the next input line, @code{n}
|
---|
3978 | prints the current pattern space then empties it, while @code{N}
|
---|
3979 | appends a newline and the next input line to the pattern space.
|
---|
3980 |
|
---|
3981 | Consider the following two examples:
|
---|
3982 |
|
---|
3983 | @codequoteundirected on
|
---|
3984 | @codequotebacktick on
|
---|
3985 | @example
|
---|
3986 | @group
|
---|
3987 | $ seq 3 | sed ':x ; n ; bx'
|
---|
3988 | 1
|
---|
3989 | 2
|
---|
3990 | 3
|
---|
3991 |
|
---|
3992 | $ seq 3 | sed ':x ; N ; bx'
|
---|
3993 | 1
|
---|
3994 | 2
|
---|
3995 | 3
|
---|
3996 | @end group
|
---|
3997 | @end example
|
---|
3998 | @codequoteundirected off
|
---|
3999 | @codequotebacktick off
|
---|
4000 |
|
---|
4001 | @itemize
|
---|
4002 | @item
|
---|
4003 | Both examples do not inf-loop, despite never starting a new cycle.
|
---|
4004 |
|
---|
4005 | @item
|
---|
4006 | In the first example, the @code{n} commands first prints the content
|
---|
4007 | of the pattern space, empties the pattern space then reads the next
|
---|
4008 | input line.
|
---|
4009 |
|
---|
4010 | @item
|
---|
4011 | In the second example, the @code{N} commands appends the next input
|
---|
4012 | line to the pattern space (with a newline). Lines are accumulated in
|
---|
4013 | the pattern space until there are no more input lines to read, then
|
---|
4014 | the @code{N} command terminates the @command{sed} program. When the
|
---|
4015 | program terminates, the end-of-cycle actions are performed, and the
|
---|
4016 | entire pattern space is printed.
|
---|
4017 |
|
---|
4018 | @item
|
---|
4019 | The second example requires @value{SSED},
|
---|
4020 | because it uses the non-POSIX-standard behavior of @code{N}.
|
---|
4021 | See the ``@code{N} command on the last line'' paragraph
|
---|
4022 | in @ref{Reporting Bugs}.
|
---|
4023 |
|
---|
4024 | @item
|
---|
4025 | To further examine the difference between the two examples,
|
---|
4026 | try the following commands:
|
---|
4027 | @codequoteundirected on
|
---|
4028 | @codequotebacktick on
|
---|
4029 | @example
|
---|
4030 | @group
|
---|
4031 | printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
|
---|
4032 | printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
|
---|
4033 | printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
|
---|
4034 | printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
|
---|
4035 | @end group
|
---|
4036 | @end example
|
---|
4037 | @codequoteundirected off
|
---|
4038 | @codequotebacktick off
|
---|
4039 |
|
---|
4040 | @end itemize
|
---|
4041 |
|
---|
4042 |
|
---|
4043 |
|
---|
4044 | @subsection Branching example: joining lines
|
---|
4045 |
|
---|
4046 | @cindex joining lines with branching
|
---|
4047 | @cindex branching, joining lines
|
---|
4048 | @cindex quoted-printable lines, joining
|
---|
4049 | @cindex joining quoted-printable lines
|
---|
4050 | @cindex t, joining lines with
|
---|
4051 | @cindex b, joining lines with
|
---|
4052 | @cindex b, versus t
|
---|
4053 | @cindex t, versus b
|
---|
4054 | As a real-world example of using branching, consider the case of
|
---|
4055 | @uref{https://en.wikipedia.org/wiki/Quoted-printable,quoted-printable} files,
|
---|
4056 | typically used to encode email messages.
|
---|
4057 | In these files long lines are split and marked with a @dfn{soft line break}
|
---|
4058 | consisting of a single @samp{=} character at the end of the line:
|
---|
4059 |
|
---|
4060 | @example
|
---|
4061 | @group
|
---|
4062 | $ cat jaques.txt
|
---|
4063 | All the wor=
|
---|
4064 | ld's a stag=
|
---|
4065 | e,
|
---|
4066 | And all the=
|
---|
4067 | men and wo=
|
---|
4068 | men merely =
|
---|
4069 | players:
|
---|
4070 | They have t=
|
---|
4071 | heir exits =
|
---|
4072 | and their e=
|
---|
4073 | ntrances;
|
---|
4074 | And one man=
|
---|
4075 | in his tim=
|
---|
4076 | e plays man=
|
---|
4077 | y parts.
|
---|
4078 | @end group
|
---|
4079 | @end example
|
---|
4080 |
|
---|
4081 |
|
---|
4082 | The following program uses an address match @samp{/=$/} as a
|
---|
4083 | conditional: If the current pattern space ends with a @samp{=}, it
|
---|
4084 | reads the next input line using @code{N}, replaces all @samp{=}
|
---|
4085 | characters which are followed by a newline, and unconditionally
|
---|
4086 | branches (@code{b}) to the beginning of the program without restarting
|
---|
4087 | a new cycle. If the pattern space does not ends with @samp{=}, the
|
---|
4088 | default action is performed: the pattern space is printed and a new
|
---|
4089 | cycle is started:
|
---|
4090 |
|
---|
4091 | @codequoteundirected on
|
---|
4092 | @codequotebacktick on
|
---|
4093 | @example
|
---|
4094 | @group
|
---|
4095 | $ sed ':x ; /=$/ @{ N ; s/=\n//g ; bx @}' jaques.txt
|
---|
4096 | All the world's a stage,
|
---|
4097 | And all the men and women merely players:
|
---|
4098 | They have their exits and their entrances;
|
---|
4099 | And one man in his time plays many parts.
|
---|
4100 | @end group
|
---|
4101 | @end example
|
---|
4102 | @codequoteundirected off
|
---|
4103 | @codequotebacktick off
|
---|
4104 |
|
---|
4105 | Here's an alternative program with a slightly different approach: On
|
---|
4106 | all lines except the last, @code{N} appends the line to the pattern
|
---|
4107 | space. A substitution command then removes soft line breaks
|
---|
4108 | (@samp{=} at the end of a line, i.e. followed by a newline) by replacing
|
---|
4109 | them with an empty string.
|
---|
4110 | @emph{if} the substitution was successful (meaning the pattern space contained
|
---|
4111 | a line which should be joined), The conditional branch command @code{t} jumps
|
---|
4112 | to the beginning of the program without completing or restarting the cycle.
|
---|
4113 | If the substitution failed (meaning there were no soft line breaks),
|
---|
4114 | The @code{t} command will @emph{not} branch. Then, @code{P} will
|
---|
4115 | print the pattern space content until the first newline, and @code{D}
|
---|
4116 | will delete the pattern space content until the first new line.
|
---|
4117 | (To learn more about @code{N}, @code{P} and @code{D} commands
|
---|
4118 | @pxref{Multiline techniques}).
|
---|
4119 |
|
---|
4120 |
|
---|
4121 | @codequoteundirected on
|
---|
4122 | @codequotebacktick on
|
---|
4123 | @example
|
---|
4124 | @group
|
---|
4125 | $ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
|
---|
4126 | All the world's a stage,
|
---|
4127 | And all the men and women merely players:
|
---|
4128 | They have their exits and their entrances;
|
---|
4129 | And one man in his time plays many parts.
|
---|
4130 | @end group
|
---|
4131 | @end example
|
---|
4132 | @codequoteundirected off
|
---|
4133 | @codequotebacktick off
|
---|
4134 |
|
---|
4135 |
|
---|
4136 | For more line-joining examples @pxref{Joining lines}.
|
---|
4137 |
|
---|
4138 |
|
---|
4139 | @node Examples
|
---|
4140 | @chapter Some Sample Scripts
|
---|
4141 |
|
---|
4142 | Here are some @command{sed} scripts to guide you in the art of mastering
|
---|
4143 | @command{sed}.
|
---|
4144 |
|
---|
4145 | @menu
|
---|
4146 |
|
---|
4147 | Useful one-liners:
|
---|
4148 | * Joining lines::
|
---|
4149 |
|
---|
4150 | Some exotic examples:
|
---|
4151 | * Centering lines::
|
---|
4152 | * Increment a number::
|
---|
4153 | * Rename files to lower case::
|
---|
4154 | * Print bash environment::
|
---|
4155 | * Reverse chars of lines::
|
---|
4156 | * Text search across multiple lines::
|
---|
4157 | * Line length adjustment::
|
---|
4158 | * Adding a header to multiple files::
|
---|
4159 |
|
---|
4160 | Emulating standard utilities:
|
---|
4161 | * tac:: Reverse lines of files
|
---|
4162 | * cat -n:: Numbering lines
|
---|
4163 | * cat -b:: Numbering non-blank lines
|
---|
4164 | * wc -c:: Counting chars
|
---|
4165 | * wc -w:: Counting words
|
---|
4166 | * wc -l:: Counting lines
|
---|
4167 | * head:: Printing the first lines
|
---|
4168 | * tail:: Printing the last lines
|
---|
4169 | * uniq:: Make duplicate lines unique
|
---|
4170 | * uniq -d:: Print duplicated lines of input
|
---|
4171 | * uniq -u:: Remove all duplicated lines
|
---|
4172 | * cat -s:: Squeezing blank lines
|
---|
4173 | @end menu
|
---|
4174 |
|
---|
4175 | @node Joining lines
|
---|
4176 | @section Joining lines
|
---|
4177 |
|
---|
4178 | This section uses @code{N}, @code{D} and @code{P} commands to process
|
---|
4179 | multiple lines, and the @code{b} and @code{t} commands for branching.
|
---|
4180 | @xref{Multiline techniques} and @ref{Branching and flow control}.
|
---|
4181 |
|
---|
4182 | Join specific lines (e.g. if lines 2 and 3 need to be joined):
|
---|
4183 |
|
---|
4184 | @codequoteundirected on
|
---|
4185 | @codequotebacktick on
|
---|
4186 | @example
|
---|
4187 | $ cat lines.txt
|
---|
4188 | hello
|
---|
4189 | hel
|
---|
4190 | lo
|
---|
4191 | hello
|
---|
4192 |
|
---|
4193 | $ sed '2@{N;s/\n//;@}' lines.txt
|
---|
4194 | hello
|
---|
4195 | hello
|
---|
4196 | hello
|
---|
4197 | @end example
|
---|
4198 | @codequoteundirected off
|
---|
4199 | @codequotebacktick off
|
---|
4200 |
|
---|
4201 | Join backslash-continued lines:
|
---|
4202 |
|
---|
4203 | @codequoteundirected on
|
---|
4204 | @codequotebacktick on
|
---|
4205 | @example
|
---|
4206 | $ cat 1.txt
|
---|
4207 | this \
|
---|
4208 | is \
|
---|
4209 | a \
|
---|
4210 | long \
|
---|
4211 | line
|
---|
4212 | and another \
|
---|
4213 | line
|
---|
4214 |
|
---|
4215 | $ sed -e ':x /\\$/ @{ N; s/\\\n//g ; bx @}' 1.txt
|
---|
4216 | this is a long line
|
---|
4217 | and another line
|
---|
4218 |
|
---|
4219 |
|
---|
4220 | #TODO: The above requires gnu sed.
|
---|
4221 | # non-gnu seds need newlines after ':' and 'b'
|
---|
4222 | @end example
|
---|
4223 | @codequoteundirected off
|
---|
4224 | @codequotebacktick off
|
---|
4225 |
|
---|
4226 | Join lines that start with whitespace (e.g SMTP headers):
|
---|
4227 |
|
---|
4228 | @codequoteundirected on
|
---|
4229 | @codequotebacktick on
|
---|
4230 | @example
|
---|
4231 | @group
|
---|
4232 | $ cat 2.txt
|
---|
4233 | Subject: Hello
|
---|
4234 | World
|
---|
4235 | Content-Type: multipart/alternative;
|
---|
4236 | boundary=94eb2c190cc6370f06054535da6a
|
---|
4237 | Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
|
---|
4238 | Authentication-Results: mx.gnu.org;
|
---|
4239 | dkim=pass header.i=@@gnu.org;
|
---|
4240 | spf=pass
|
---|
4241 | Message-ID: <abcdef@@gnu.org>
|
---|
4242 | From: John Doe <jdoe@@gnu.org>
|
---|
4243 | To: Jane Smith <jsmith@@gnu.org>
|
---|
4244 |
|
---|
4245 | $ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt
|
---|
4246 | Subject: Hello World
|
---|
4247 | Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a
|
---|
4248 | Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
|
---|
4249 | Authentication-Results: mx.gnu.org; dkim=pass header.i=@@gnu.org; spf=pass
|
---|
4250 | Message-ID: <abcdef@@gnu.org>
|
---|
4251 | From: John Doe <jdoe@@gnu.org>
|
---|
4252 | To: Jane Smith <jsmith@@gnu.org>
|
---|
4253 |
|
---|
4254 | # A portable (non-gnu) variation:
|
---|
4255 | # sed -e :a -e '$!N;s/\n */ /;ta' -e 'P;D'
|
---|
4256 | @end group
|
---|
4257 | @end example
|
---|
4258 | @codequoteundirected off
|
---|
4259 | @codequotebacktick off
|
---|
4260 |
|
---|
4261 |
|
---|
4262 | @node Centering lines
|
---|
4263 | @section Centering Lines
|
---|
4264 |
|
---|
4265 | This script centers all lines of a file on a 80 columns width.
|
---|
4266 | To change that width, the number in @code{\@{@dots{}\@}} must be
|
---|
4267 | replaced, and the number of added spaces also must be changed.
|
---|
4268 |
|
---|
4269 | Note how the buffer commands are used to separate parts in
|
---|
4270 | the regular expressions to be matched---this is a common
|
---|
4271 | technique.
|
---|
4272 |
|
---|
4273 | @c start-------------------------------------------
|
---|
4274 | @example
|
---|
4275 | #!/usr/bin/sed -f
|
---|
4276 |
|
---|
4277 | @group
|
---|
4278 | # Put 80 spaces in the buffer
|
---|
4279 | 1 @{
|
---|
4280 | x
|
---|
4281 | s/^$/ /
|
---|
4282 | s/^.*$/&&&&&&&&/
|
---|
4283 | x
|
---|
4284 | @}
|
---|
4285 | @end group
|
---|
4286 |
|
---|
4287 | @group
|
---|
4288 | # delete leading and trailing spaces
|
---|
4289 | y/@kbd{@key{TAB}}/ /
|
---|
4290 | s/^ *//
|
---|
4291 | s/ *$//
|
---|
4292 | @end group
|
---|
4293 |
|
---|
4294 | @group
|
---|
4295 | # add a newline and 80 spaces to end of line
|
---|
4296 | G
|
---|
4297 | @end group
|
---|
4298 |
|
---|
4299 | @group
|
---|
4300 | # keep first 81 chars (80 + a newline)
|
---|
4301 | s/^\(.\@{81\@}\).*$/\1/
|
---|
4302 | @end group
|
---|
4303 |
|
---|
4304 | @group
|
---|
4305 | # \2 matches half of the spaces, which are moved to the beginning
|
---|
4306 | s/^\(.*\)\n\(.*\)\2/\2\1/
|
---|
4307 | @end group
|
---|
4308 | @end example
|
---|
4309 | @c end---------------------------------------------
|
---|
4310 |
|
---|
4311 | @node Increment a number
|
---|
4312 | @section Increment a Number
|
---|
4313 |
|
---|
4314 | This script is one of a few that demonstrate how to do arithmetic
|
---|
4315 | in @command{sed}. This is indeed possible,@footnote{@command{sed} guru Greg
|
---|
4316 | Ubben wrote an implementation of the @command{dc} @sc{rpn} calculator!
|
---|
4317 | It is distributed together with sed.} but must be done manually.
|
---|
4318 |
|
---|
4319 | To increment one number you just add 1 to last digit, replacing
|
---|
4320 | it by the following digit. There is one exception: when the digit
|
---|
4321 | is a nine the previous digits must be also incremented until you
|
---|
4322 | don't have a nine.
|
---|
4323 |
|
---|
4324 | This solution by Bruno Haible is very clever and smart because
|
---|
4325 | it uses a single buffer; if you don't have this limitation, the
|
---|
4326 | algorithm used in @ref{cat -n, Numbering lines}, is faster.
|
---|
4327 | It works by replacing trailing nines with an underscore, then
|
---|
4328 | using multiple @code{s} commands to increment the last digit,
|
---|
4329 | and then again substituting underscores with zeros.
|
---|
4330 |
|
---|
4331 | @c start-------------------------------------------
|
---|
4332 | @example
|
---|
4333 | #!/usr/bin/sed -f
|
---|
4334 |
|
---|
4335 | /[^0-9]/ d
|
---|
4336 |
|
---|
4337 | @group
|
---|
4338 | # replace all trailing 9s by _ (any other character except digits, could
|
---|
4339 | # be used)
|
---|
4340 | :d
|
---|
4341 | s/9\(_*\)$/_\1/
|
---|
4342 | td
|
---|
4343 | @end group
|
---|
4344 |
|
---|
4345 | @group
|
---|
4346 | # incr last digit only. The first line adds a most-significant
|
---|
4347 | # digit of 1 if we have to add a digit.
|
---|
4348 | @end group
|
---|
4349 |
|
---|
4350 | @group
|
---|
4351 | s/^\(_*\)$/1\1/; tn
|
---|
4352 | s/8\(_*\)$/9\1/; tn
|
---|
4353 | s/7\(_*\)$/8\1/; tn
|
---|
4354 | s/6\(_*\)$/7\1/; tn
|
---|
4355 | s/5\(_*\)$/6\1/; tn
|
---|
4356 | s/4\(_*\)$/5\1/; tn
|
---|
4357 | s/3\(_*\)$/4\1/; tn
|
---|
4358 | s/2\(_*\)$/3\1/; tn
|
---|
4359 | s/1\(_*\)$/2\1/; tn
|
---|
4360 | s/0\(_*\)$/1\1/; tn
|
---|
4361 | @end group
|
---|
4362 |
|
---|
4363 | @group
|
---|
4364 | :n
|
---|
4365 | y/_/0/
|
---|
4366 | @end group
|
---|
4367 | @end example
|
---|
4368 | @c end---------------------------------------------
|
---|
4369 |
|
---|
4370 | @node Rename files to lower case
|
---|
4371 | @section Rename Files to Lower Case
|
---|
4372 |
|
---|
4373 | This is a pretty strange use of @command{sed}. We transform text, and
|
---|
4374 | transform it to be shell commands, then just feed them to shell.
|
---|
4375 | Don't worry, even worse hacks are done when using @command{sed}; I have
|
---|
4376 | seen a script converting the output of @command{date} into a @command{bc}
|
---|
4377 | program!
|
---|
4378 |
|
---|
4379 | The main body of this is the @command{sed} script, which remaps the name
|
---|
4380 | from lower to upper (or vice-versa) and even checks out
|
---|
4381 | if the remapped name is the same as the original name.
|
---|
4382 | Note how the script is parameterized using shell
|
---|
4383 | variables and proper quoting.
|
---|
4384 |
|
---|
4385 | @c start-------------------------------------------
|
---|
4386 | @example
|
---|
4387 | @group
|
---|
4388 | #! /bin/sh
|
---|
4389 | # rename files to lower/upper case...
|
---|
4390 | #
|
---|
4391 | # usage:
|
---|
4392 | # move-to-lower *
|
---|
4393 | # move-to-upper *
|
---|
4394 | # or
|
---|
4395 | # move-to-lower -R .
|
---|
4396 | # move-to-upper -R .
|
---|
4397 | #
|
---|
4398 | @end group
|
---|
4399 |
|
---|
4400 | @group
|
---|
4401 | help()
|
---|
4402 | @{
|
---|
4403 | cat << eof
|
---|
4404 | Usage: $0 [-n] [-r] [-h] files...
|
---|
4405 | @end group
|
---|
4406 |
|
---|
4407 | @group
|
---|
4408 | -n do nothing, only see what would be done
|
---|
4409 | -R recursive (use find)
|
---|
4410 | -h this message
|
---|
4411 | files files to remap to lower case
|
---|
4412 | @end group
|
---|
4413 |
|
---|
4414 | @group
|
---|
4415 | Examples:
|
---|
4416 | $0 -n * (see if everything is ok, then...)
|
---|
4417 | $0 *
|
---|
4418 | @end group
|
---|
4419 |
|
---|
4420 | $0 -R .
|
---|
4421 |
|
---|
4422 | @group
|
---|
4423 | eof
|
---|
4424 | @}
|
---|
4425 | @end group
|
---|
4426 |
|
---|
4427 | @group
|
---|
4428 | apply_cmd='sh'
|
---|
4429 | finder='echo "$@@" | tr " " "\n"'
|
---|
4430 | files_only=
|
---|
4431 | @end group
|
---|
4432 |
|
---|
4433 | @group
|
---|
4434 | while :
|
---|
4435 | do
|
---|
4436 | case "$1" in
|
---|
4437 | -n) apply_cmd='cat' ;;
|
---|
4438 | -R) finder='find "$@@" -type f';;
|
---|
4439 | -h) help ; exit 1 ;;
|
---|
4440 | *) break ;;
|
---|
4441 | esac
|
---|
4442 | shift
|
---|
4443 | done
|
---|
4444 | @end group
|
---|
4445 |
|
---|
4446 | @group
|
---|
4447 | if [ -z "$1" ]; then
|
---|
4448 | echo Usage: $0 [-h] [-n] [-r] files...
|
---|
4449 | exit 1
|
---|
4450 | fi
|
---|
4451 | @end group
|
---|
4452 |
|
---|
4453 | @group
|
---|
4454 | LOWER='abcdefghijklmnopqrstuvwxyz'
|
---|
4455 | UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
|
---|
4456 | @end group
|
---|
4457 |
|
---|
4458 | @group
|
---|
4459 | case `basename $0` in
|
---|
4460 | *upper*) TO=$UPPER; FROM=$LOWER ;;
|
---|
4461 | *) FROM=$UPPER; TO=$LOWER ;;
|
---|
4462 | esac
|
---|
4463 | @end group
|
---|
4464 |
|
---|
4465 | eval $finder | sed -n '
|
---|
4466 |
|
---|
4467 | @group
|
---|
4468 | # remove all trailing slashes
|
---|
4469 | s/\/*$//
|
---|
4470 | @end group
|
---|
4471 |
|
---|
4472 | @group
|
---|
4473 | # add ./ if there is no path, only a filename
|
---|
4474 | /\//! s/^/.\//
|
---|
4475 | @end group
|
---|
4476 |
|
---|
4477 | @group
|
---|
4478 | # save path+filename
|
---|
4479 | h
|
---|
4480 | @end group
|
---|
4481 |
|
---|
4482 | @group
|
---|
4483 | # remove path
|
---|
4484 | s/.*\///
|
---|
4485 | @end group
|
---|
4486 |
|
---|
4487 | @group
|
---|
4488 | # do conversion only on filename
|
---|
4489 | y/'$FROM'/'$TO'/
|
---|
4490 | @end group
|
---|
4491 |
|
---|
4492 | @group
|
---|
4493 | # now line contains original path+file, while
|
---|
4494 | # hold space contains the new filename
|
---|
4495 | x
|
---|
4496 | @end group
|
---|
4497 |
|
---|
4498 | @group
|
---|
4499 | # add converted file name to line, which now contains
|
---|
4500 | # path/file-name\nconverted-file-name
|
---|
4501 | G
|
---|
4502 | @end group
|
---|
4503 |
|
---|
4504 | @group
|
---|
4505 | # check if converted file name is equal to original file name,
|
---|
4506 | # if it is, do not print anything
|
---|
4507 | /^.*\/\(.*\)\n\1/b
|
---|
4508 | @end group
|
---|
4509 |
|
---|
4510 | @group
|
---|
4511 | # escape special characters for the shell
|
---|
4512 | s/["$`\\]/\\&/g
|
---|
4513 | @end group
|
---|
4514 |
|
---|
4515 | @group
|
---|
4516 | # now, transform path/fromfile\n, into
|
---|
4517 | # mv path/fromfile path/tofile and print it
|
---|
4518 | s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
|
---|
4519 | @end group
|
---|
4520 |
|
---|
4521 | ' | $apply_cmd
|
---|
4522 | @end example
|
---|
4523 | @c end---------------------------------------------
|
---|
4524 |
|
---|
4525 | @node Print bash environment
|
---|
4526 | @section Print @command{bash} Environment
|
---|
4527 |
|
---|
4528 | This script strips the definition of the shell functions
|
---|
4529 | from the output of the @command{set} Bourne-shell command.
|
---|
4530 |
|
---|
4531 | @c start-------------------------------------------
|
---|
4532 | @example
|
---|
4533 | #!/bin/sh
|
---|
4534 |
|
---|
4535 | @group
|
---|
4536 | set | sed -n '
|
---|
4537 | :x
|
---|
4538 | @end group
|
---|
4539 |
|
---|
4540 | @group
|
---|
4541 | @ifinfo
|
---|
4542 | # if no occurrence of "=()" print and load next line
|
---|
4543 | @end ifinfo
|
---|
4544 | @ifnotinfo
|
---|
4545 | # if no occurrence of @samp{=()} print and load next line
|
---|
4546 | @end ifnotinfo
|
---|
4547 | /=()/! @{ p; b; @}
|
---|
4548 | / () $/! @{ p; b; @}
|
---|
4549 | @end group
|
---|
4550 |
|
---|
4551 | @group
|
---|
4552 | # possible start of functions section
|
---|
4553 | # save the line in case this is a var like FOO="() "
|
---|
4554 | h
|
---|
4555 | @end group
|
---|
4556 |
|
---|
4557 | @group
|
---|
4558 | # if the next line has a brace, we quit because
|
---|
4559 | # nothing comes after functions
|
---|
4560 | n
|
---|
4561 | /^@{/ q
|
---|
4562 | @end group
|
---|
4563 |
|
---|
4564 | @group
|
---|
4565 | # print the old line
|
---|
4566 | x; p
|
---|
4567 | @end group
|
---|
4568 |
|
---|
4569 | @group
|
---|
4570 | # work on the new line now
|
---|
4571 | x; bx
|
---|
4572 | '
|
---|
4573 | @end group
|
---|
4574 | @end example
|
---|
4575 | @c end---------------------------------------------
|
---|
4576 |
|
---|
4577 | @node Reverse chars of lines
|
---|
4578 | @section Reverse Characters of Lines
|
---|
4579 |
|
---|
4580 | This script can be used to reverse the position of characters
|
---|
4581 | in lines. The technique moves two characters at a time, hence
|
---|
4582 | it is faster than more intuitive implementations.
|
---|
4583 |
|
---|
4584 | Note the @code{tx} command before the definition of the label.
|
---|
4585 | This is often needed to reset the flag that is tested by
|
---|
4586 | the @code{t} command.
|
---|
4587 |
|
---|
4588 | Imaginative readers will find uses for this script. An example
|
---|
4589 | is reversing the output of @command{banner}.@footnote{This requires
|
---|
4590 | another script to pad the output of banner; for example
|
---|
4591 |
|
---|
4592 | @example
|
---|
4593 | #! /bin/sh
|
---|
4594 |
|
---|
4595 | banner -w $1 $2 $3 $4 |
|
---|
4596 | sed -e :a -e '/^.\@{0,'$1'\@}$/ @{ s/$/ /; ba; @}' |
|
---|
4597 | ~/sedscripts/reverseline.sed
|
---|
4598 | @end example
|
---|
4599 | }
|
---|
4600 |
|
---|
4601 | @c start-------------------------------------------
|
---|
4602 | @example
|
---|
4603 | #!/usr/bin/sed -f
|
---|
4604 |
|
---|
4605 | /../! b
|
---|
4606 |
|
---|
4607 | @group
|
---|
4608 | # Reverse a line. Begin embedding the line between two newlines
|
---|
4609 | s/^.*$/\
|
---|
4610 | &\
|
---|
4611 | /
|
---|
4612 | @end group
|
---|
4613 |
|
---|
4614 | @group
|
---|
4615 | # Move first character at the end. The regexp matches until
|
---|
4616 | # there are zero or one characters between the markers
|
---|
4617 | tx
|
---|
4618 | :x
|
---|
4619 | s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
|
---|
4620 | tx
|
---|
4621 | @end group
|
---|
4622 |
|
---|
4623 | @group
|
---|
4624 | # Remove the newline markers
|
---|
4625 | s/\n//g
|
---|
4626 | @end group
|
---|
4627 | @end example
|
---|
4628 | @c end---------------------------------------------
|
---|
4629 |
|
---|
4630 |
|
---|
4631 | @node Text search across multiple lines
|
---|
4632 | @section Text search across multiple lines
|
---|
4633 |
|
---|
4634 | This section uses @code{N} and @code{D} commands to search for
|
---|
4635 | consecutive words spanning multiple lines. @xref{Multiline techniques}.
|
---|
4636 |
|
---|
4637 | These examples deal with finding doubled occurrences of words in a document.
|
---|
4638 |
|
---|
4639 | Finding doubled words in a single line is easy using GNU @command{grep}
|
---|
4640 | and similarly with @value{SSED}:
|
---|
4641 |
|
---|
4642 | @c NOTE: in all examples, 'the@ the' is used to prevent
|
---|
4643 | @c 'make syntax-check' from complaining about double words.
|
---|
4644 | @codequoteundirected on
|
---|
4645 | @codequotebacktick on
|
---|
4646 | @example
|
---|
4647 | @group
|
---|
4648 | $ cat two-cities-dup1.txt
|
---|
4649 | It was the best of times,
|
---|
4650 | it was the worst of times,
|
---|
4651 | it was the@ the age of wisdom,
|
---|
4652 | it was the age of foolishness,
|
---|
4653 |
|
---|
4654 | $ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
|
---|
4655 | it was the@ the age of wisdom,
|
---|
4656 |
|
---|
4657 | $ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
|
---|
4658 | 3:it was the@ the age of wisdom,
|
---|
4659 |
|
---|
4660 | $ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt
|
---|
4661 | it was the@ the age of wisdom,
|
---|
4662 |
|
---|
4663 | $ sed -En '/\b(\w+)\s+\1\b/@{=;p@}' two-cities-dup1.txt
|
---|
4664 | 3
|
---|
4665 | it was the@ the age of wisdom,
|
---|
4666 | @end group
|
---|
4667 | @end example
|
---|
4668 | @codequoteundirected off
|
---|
4669 | @codequotebacktick off
|
---|
4670 |
|
---|
4671 | @itemize @bullet
|
---|
4672 | @item
|
---|
4673 | The regular expression @samp{\b\w+\s+} searches for word-boundary (@samp{\b}),
|
---|
4674 | followed by one-or-more word-characters (@samp{\w+}), followed by whitespace
|
---|
4675 | (@samp{\s+}). @xref{regexp extensions}.
|
---|
4676 |
|
---|
4677 | @item
|
---|
4678 | Adding parentheses around the @samp{(\w+)} expression creates a subexpression.
|
---|
4679 | The regular expression pattern @samp{(PATTERN)\s+\1} defines a subexpression
|
---|
4680 | (in the parentheses) followed by a back-reference, separated by whitespace.
|
---|
4681 | A successful match means the @var{PATTERN} was repeated twice in succession.
|
---|
4682 | @xref{Back-references and Subexpressions}.
|
---|
4683 |
|
---|
4684 | @item
|
---|
4685 | The word-boundery expression (@samp{\b}) at both ends ensures partial
|
---|
4686 | words are not matched (e.g. @samp{the then} is not a desired match).
|
---|
4687 | @c Thanks to Jim for pointing this out in
|
---|
4688 | @c https://lists.gnu.org/archive/html/sed-devel/2016-12/msg00041.html
|
---|
4689 |
|
---|
4690 | @item
|
---|
4691 | The @option{-E} option enables extended regular expression syntax, alleviating
|
---|
4692 | the need to add backslashes before the parenthesis. @xref{ERE syntax}.
|
---|
4693 |
|
---|
4694 | @end itemize
|
---|
4695 |
|
---|
4696 | When the doubled word span two lines the above regular expression
|
---|
4697 | will not find them as @command{grep} and @command{sed} operate line-by-line.
|
---|
4698 |
|
---|
4699 | By using @command{N} and @command{D} commands, @command{sed} can apply
|
---|
4700 | regular expressions on multiple lines (that is, multiple lines are stored
|
---|
4701 | in the pattern space, and the regular expression works on it):
|
---|
4702 |
|
---|
4703 | @c NOTE: use 'the@*the' instead of a real new line to prevent
|
---|
4704 | @c 'make syntax-check' to complain about doubled-words.
|
---|
4705 | @codequoteundirected on
|
---|
4706 | @codequotebacktick on
|
---|
4707 | @example
|
---|
4708 | $ cat two-cities-dup2.txt
|
---|
4709 | It was the best of times, it was the
|
---|
4710 | worst of times, it was the@*the age of wisdom,
|
---|
4711 | it was the age of foolishness,
|
---|
4712 |
|
---|
4713 | $ sed -En '@{N; /\b(\w+)\s+\1\b/@{=;p@} ; D@}' two-cities-dup2.txt
|
---|
4714 | 3
|
---|
4715 | worst of times, it was the@*the age of wisdom,
|
---|
4716 | @end example
|
---|
4717 | @codequoteundirected off
|
---|
4718 | @codequotebacktick off
|
---|
4719 |
|
---|
4720 | @itemize @bullet
|
---|
4721 | @item
|
---|
4722 | The @command{N} command appends the next line to the pattern space
|
---|
4723 | (thus ensuring it contains two consecutive lines in every cycle).
|
---|
4724 |
|
---|
4725 | @item
|
---|
4726 | The regular expression uses @samp{\s+} for word separator which matches
|
---|
4727 | both spaces and newlines.
|
---|
4728 |
|
---|
4729 | @item
|
---|
4730 | The regular expression matches, the entire pattern space is printed
|
---|
4731 | with @command{p}. No lines are printed by default due to the @option{-n} option.
|
---|
4732 |
|
---|
4733 | @item
|
---|
4734 | The @command{D} removes the first line from the pattern space (up until the
|
---|
4735 | first newline), readying it for the next cycle.
|
---|
4736 | @end itemize
|
---|
4737 |
|
---|
4738 | See the GNU @command{coreutils} manual for an alternative solution using
|
---|
4739 | @command{tr -s} and @command{uniq} at
|
---|
4740 | @c NOTE: cheating and keeping the URL line shorter than 80 characters
|
---|
4741 | @c by using 'gnu.org' and '/s/'.
|
---|
4742 | @url{https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html}.
|
---|
4743 |
|
---|
4744 | @node Line length adjustment
|
---|
4745 | @section Line length adjustment
|
---|
4746 |
|
---|
4747 | This section uses @code{N} and @code{P} commands to read and write
|
---|
4748 | lines, and the @code{b} command for branching.
|
---|
4749 | @xref{Multiline techniques} and @ref{Branching and flow control}.
|
---|
4750 |
|
---|
4751 | This (somewhat contrived) example deal with formatting and wrapping
|
---|
4752 | lines of text of the following input file:
|
---|
4753 |
|
---|
4754 | @example
|
---|
4755 | @group
|
---|
4756 | $ cat two-cities-mix.txt
|
---|
4757 | It was the best of times, it was
|
---|
4758 | the worst of times, it
|
---|
4759 | was the age of
|
---|
4760 | wisdom,
|
---|
4761 | it
|
---|
4762 | was
|
---|
4763 | the age
|
---|
4764 | of foolishness,
|
---|
4765 | @end group
|
---|
4766 | @end example
|
---|
4767 |
|
---|
4768 | @exdent The following sed program wraps lines at 40 characters:
|
---|
4769 | @codequoteundirected on
|
---|
4770 | @codequotebacktick on
|
---|
4771 | @example
|
---|
4772 | @group
|
---|
4773 | $ cat wrap40.sed
|
---|
4774 | # outer loop
|
---|
4775 | :x
|
---|
4776 |
|
---|
4777 | # Append a newline followed by the next input line to the pattern buffer
|
---|
4778 | N
|
---|
4779 |
|
---|
4780 | # Remove all newlines from the pattern buffer
|
---|
4781 | s/\n/ /g
|
---|
4782 |
|
---|
4783 |
|
---|
4784 | # Inner loop
|
---|
4785 | :y
|
---|
4786 |
|
---|
4787 | # Add a newline after the first 40 characters
|
---|
4788 | s/(.@{40,40@})/\1\n/
|
---|
4789 |
|
---|
4790 | # If there is a newline in the pattern buffer
|
---|
4791 | # (i.e. the previous substitution added a newline)
|
---|
4792 | /\n/ @{
|
---|
4793 | # There are newlines in the pattern buffer -
|
---|
4794 | # print the content until the first newline.
|
---|
4795 | P
|
---|
4796 |
|
---|
4797 | # Remove the printed characters and the first newline
|
---|
4798 | s/.*\n//
|
---|
4799 |
|
---|
4800 | # branch to label 'y' - repeat inner loop
|
---|
4801 | by
|
---|
4802 | @}
|
---|
4803 |
|
---|
4804 | # No newlines in the pattern buffer - Branch to label 'x' (outer loop)
|
---|
4805 | # and read the next input line
|
---|
4806 | bx
|
---|
4807 | @end group
|
---|
4808 | @end example
|
---|
4809 | @codequoteundirected off
|
---|
4810 | @codequotebacktick off
|
---|
4811 |
|
---|
4812 |
|
---|
4813 |
|
---|
4814 | @exdent The wrapped output:
|
---|
4815 | @codequoteundirected on
|
---|
4816 | @codequotebacktick on
|
---|
4817 | @example
|
---|
4818 | @group
|
---|
4819 | $ sed -E -f wrap40.sed two-cities-mix.txt
|
---|
4820 | It was the best of times, it was the wor
|
---|
4821 | st of times, it was the age of wisdom, i
|
---|
4822 | t was the age of foolishness,
|
---|
4823 | @end group
|
---|
4824 | @end example
|
---|
4825 | @codequoteundirected off
|
---|
4826 | @codequotebacktick off
|
---|
4827 |
|
---|
4828 |
|
---|
4829 |
|
---|
4830 |
|
---|
4831 | @node Adding a header to multiple files
|
---|
4832 | @section Adding a header to multiple files
|
---|
4833 |
|
---|
4834 | @value{SSED} can be used to safely modify multiple files at once.
|
---|
4835 |
|
---|
4836 | @exdent Add a single line to the beginning of source code files:
|
---|
4837 |
|
---|
4838 | @codequoteundirected on
|
---|
4839 | @codequotebacktick on
|
---|
4840 | @example
|
---|
4841 | sed -i '1i/* Copyright (C) FOO BAR */' *.c
|
---|
4842 | @end example
|
---|
4843 | @codequoteundirected off
|
---|
4844 | @codequotebacktick off
|
---|
4845 |
|
---|
4846 | @exdent Adding a few lines is possible using @samp{\n} in the text:
|
---|
4847 |
|
---|
4848 | @codequoteundirected on
|
---|
4849 | @codequotebacktick on
|
---|
4850 | @example
|
---|
4851 | sed -i '1i/*\n * Copyright (C) FOO BAR\n * Created by Jane Doe\n */' *.c
|
---|
4852 | @end example
|
---|
4853 | @codequoteundirected off
|
---|
4854 | @codequotebacktick off
|
---|
4855 |
|
---|
4856 | To add multiple lines from another file, use @code{0rFILE}.
|
---|
4857 | A typical use case is adding a license notice header to all files:
|
---|
4858 |
|
---|
4859 | @codequoteundirected on
|
---|
4860 | @codequotebacktick on
|
---|
4861 | @example
|
---|
4862 | ## Create the header file:
|
---|
4863 | $ cat<<'EOF'>LIC.TXT
|
---|
4864 | /*
|
---|
4865 | Copyright (C) 1989-2021 FOO BAR
|
---|
4866 |
|
---|
4867 | This program is free software; you can redistribute it and/or modify
|
---|
4868 | it under the terms of the GNU General Public License as published by
|
---|
4869 | the Free Software Foundation; either version 3, or (at your option)
|
---|
4870 | any later version.
|
---|
4871 |
|
---|
4872 | This program is distributed in the hope that it will be useful,
|
---|
4873 | but WITHOUT ANY WARRANTY; without even the implied warranty of
|
---|
4874 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
---|
4875 | GNU General Public License for more details.
|
---|
4876 |
|
---|
4877 | You should have received a copy of the GNU General Public License
|
---|
4878 | along with this program; If not, see <https://www.gnu.org/licenses/>.
|
---|
4879 | */
|
---|
4880 | EOF
|
---|
4881 |
|
---|
4882 | ## Add the file at the beginning of all source code files:
|
---|
4883 | $ sed -i '0rLIC.TXT' *.cpp *.h
|
---|
4884 | @end example
|
---|
4885 | @codequoteundirected off
|
---|
4886 | @codequotebacktick off
|
---|
4887 |
|
---|
4888 |
|
---|
4889 | With script files (e.g. @file{.sh},@file{.py},@file{.pl} files)
|
---|
4890 | the license notice typically appears @emph{after} the first line (the
|
---|
4891 | 'shebang' @samp{#!} line). The @code{1rFILE} command will add @file{FILE}
|
---|
4892 | @emph{after} the first line:
|
---|
4893 |
|
---|
4894 | @codequoteundirected on
|
---|
4895 | @codequotebacktick on
|
---|
4896 | @example
|
---|
4897 | ## Create the header file:
|
---|
4898 | $ cat<<'EOF'>LIC.TXT
|
---|
4899 | ##
|
---|
4900 | ## Copyright (C) 1989-2021 FOO BAR
|
---|
4901 | ##
|
---|
4902 | ## This program is free software; you can redistribute it and/or modify
|
---|
4903 | ## it under the terms of the GNU General Public License as published by
|
---|
4904 | ## the Free Software Foundation; either version 3, or (at your option)
|
---|
4905 | ## any later version.
|
---|
4906 | ##
|
---|
4907 | ## This program is distributed in the hope that it will be useful,
|
---|
4908 | ## but WITHOUT ANY WARRANTY; without even the implied warranty of
|
---|
4909 | ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
---|
4910 | ## GNU General Public License for more details.
|
---|
4911 | ##
|
---|
4912 | ## You should have received a copy of the GNU General Public License
|
---|
4913 | ## along with this program; If not, see <https://www.gnu.org/licenses/>.
|
---|
4914 | ##
|
---|
4915 | ##
|
---|
4916 | EOF
|
---|
4917 |
|
---|
4918 | ## Add the file at the beginning of all source code files:
|
---|
4919 | $ sed -i '1rLIC.TXT' *.py *.sh
|
---|
4920 | @end example
|
---|
4921 | @codequoteundirected off
|
---|
4922 | @codequotebacktick off
|
---|
4923 |
|
---|
4924 | The above @command{sed} commands can be combined with @command{find}
|
---|
4925 | to locate files in all subdirectories, @command{xargs} to run additional
|
---|
4926 | commands on selected files and @command{grep} to filter out files that already
|
---|
4927 | contain a copyright notice:
|
---|
4928 |
|
---|
4929 | @codequoteundirected on
|
---|
4930 | @codequotebacktick on
|
---|
4931 | @example
|
---|
4932 | find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) \
|
---|
4933 | | xargs grep -Li copyright \
|
---|
4934 | | xargs -r sed -i '0rLIC.TXT'
|
---|
4935 | @end example
|
---|
4936 | @codequoteundirected off
|
---|
4937 | @codequotebacktick off
|
---|
4938 |
|
---|
4939 | @exdent Or a slightly safe version (handling files with spaces and newlines):
|
---|
4940 |
|
---|
4941 | @codequoteundirected on
|
---|
4942 | @codequotebacktick on
|
---|
4943 | @example
|
---|
4944 | find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) -print0 \
|
---|
4945 | | xargs -0 grep -Z -Li copyright \
|
---|
4946 | | xargs -0 -r sed -i '0rLIC.TXT'
|
---|
4947 | @end example
|
---|
4948 | @codequoteundirected off
|
---|
4949 | @codequotebacktick off
|
---|
4950 |
|
---|
4951 | Note: using the @code{0} address with @code{r} command requires @value{SSED}
|
---|
4952 | version 4.9 or later. @xref{Zero Address}.
|
---|
4953 |
|
---|
4954 |
|
---|
4955 |
|
---|
4956 | @node tac
|
---|
4957 | @section Reverse Lines of Files
|
---|
4958 |
|
---|
4959 | This one begins a series of totally useless (yet interesting)
|
---|
4960 | scripts emulating various Unix commands. This, in particular,
|
---|
4961 | is a @command{tac} workalike.
|
---|
4962 |
|
---|
4963 | Note that on implementations other than GNU @command{sed}
|
---|
4964 | this script might easily overflow internal buffers.
|
---|
4965 |
|
---|
4966 | @c start-------------------------------------------
|
---|
4967 | @example
|
---|
4968 | #!/usr/bin/sed -nf
|
---|
4969 |
|
---|
4970 | # reverse all lines of input, i.e. first line became last, ...
|
---|
4971 |
|
---|
4972 | @group
|
---|
4973 | # from the second line, the buffer (which contains all previous lines)
|
---|
4974 | # is *appended* to current line, so, the order will be reversed
|
---|
4975 | 1! G
|
---|
4976 | @end group
|
---|
4977 |
|
---|
4978 | @group
|
---|
4979 | # on the last line we're done -- print everything
|
---|
4980 | $ p
|
---|
4981 | @end group
|
---|
4982 |
|
---|
4983 | @group
|
---|
4984 | # store everything on the buffer again
|
---|
4985 | h
|
---|
4986 | @end group
|
---|
4987 | @end example
|
---|
4988 | @c end---------------------------------------------
|
---|
4989 |
|
---|
4990 | @node cat -n
|
---|
4991 | @section Numbering Lines
|
---|
4992 |
|
---|
4993 | This script replaces @samp{cat -n}; in fact it formats its output
|
---|
4994 | exactly like GNU @command{cat} does.
|
---|
4995 |
|
---|
4996 | Of course this is completely useless and for two reasons: first,
|
---|
4997 | because somebody else did it in C, second, because the following
|
---|
4998 | Bourne-shell script could be used for the same purpose and would
|
---|
4999 | be much faster:
|
---|
5000 |
|
---|
5001 | @c start-------------------------------------------
|
---|
5002 | @example
|
---|
5003 | @group
|
---|
5004 | #! /bin/sh
|
---|
5005 | sed -e "=" $@@ | sed -e '
|
---|
5006 | s/^/ /
|
---|
5007 | N
|
---|
5008 | s/^ *\(......\)\n/\1 /
|
---|
5009 | '
|
---|
5010 | @end group
|
---|
5011 | @end example
|
---|
5012 | @c end---------------------------------------------
|
---|
5013 |
|
---|
5014 | It uses @command{sed} to print the line number, then groups lines two
|
---|
5015 | by two using @code{N}. Of course, this script does not teach as much as
|
---|
5016 | the one presented below.
|
---|
5017 |
|
---|
5018 | The algorithm used for incrementing uses both buffers, so the line
|
---|
5019 | is printed as soon as possible and then discarded. The number
|
---|
5020 | is split so that changing digits go in a buffer and unchanged ones go
|
---|
5021 | in the other; the changed digits are modified in a single step
|
---|
5022 | (using a @code{y} command). The line number for the next line
|
---|
5023 | is then composed and stored in the hold space, to be used in the
|
---|
5024 | next iteration.
|
---|
5025 |
|
---|
5026 | @c start-------------------------------------------
|
---|
5027 | @example
|
---|
5028 | #!/usr/bin/sed -nf
|
---|
5029 |
|
---|
5030 | @group
|
---|
5031 | # Prime the pump on the first line
|
---|
5032 | x
|
---|
5033 | /^$/ s/^.*$/1/
|
---|
5034 | @end group
|
---|
5035 |
|
---|
5036 | @group
|
---|
5037 | # Add the correct line number before the pattern
|
---|
5038 | G
|
---|
5039 | h
|
---|
5040 | @end group
|
---|
5041 |
|
---|
5042 | @group
|
---|
5043 | # Format it and print it
|
---|
5044 | s/^/ /
|
---|
5045 | s/^ *\(......\)\n/\1 /p
|
---|
5046 | @end group
|
---|
5047 |
|
---|
5048 | @group
|
---|
5049 | # Get the line number from hold space; add a zero
|
---|
5050 | # if we're going to add a digit on the next line
|
---|
5051 | g
|
---|
5052 | s/\n.*$//
|
---|
5053 | /^9*$/ s/^/0/
|
---|
5054 | @end group
|
---|
5055 |
|
---|
5056 | @group
|
---|
5057 | # separate changing/unchanged digits with an x
|
---|
5058 | s/.9*$/x&/
|
---|
5059 | @end group
|
---|
5060 |
|
---|
5061 | @group
|
---|
5062 | # keep changing digits in hold space
|
---|
5063 | h
|
---|
5064 | s/^.*x//
|
---|
5065 | y/0123456789/1234567890/
|
---|
5066 | x
|
---|
5067 | @end group
|
---|
5068 |
|
---|
5069 | @group
|
---|
5070 | # keep unchanged digits in pattern space
|
---|
5071 | s/x.*$//
|
---|
5072 | @end group
|
---|
5073 |
|
---|
5074 | @group
|
---|
5075 | # compose the new number, remove the newline implicitly added by G
|
---|
5076 | G
|
---|
5077 | s/\n//
|
---|
5078 | h
|
---|
5079 | @end group
|
---|
5080 | @end example
|
---|
5081 | @c end---------------------------------------------
|
---|
5082 |
|
---|
5083 | @node cat -b
|
---|
5084 | @section Numbering Non-blank Lines
|
---|
5085 |
|
---|
5086 | Emulating @samp{cat -b} is almost the same as @samp{cat -n}---we only
|
---|
5087 | have to select which lines are to be numbered and which are not.
|
---|
5088 |
|
---|
5089 | The part that is common to this script and the previous one is
|
---|
5090 | not commented to show how important it is to comment @command{sed}
|
---|
5091 | scripts properly...
|
---|
5092 |
|
---|
5093 | @c start-------------------------------------------
|
---|
5094 | @example
|
---|
5095 | #!/usr/bin/sed -nf
|
---|
5096 |
|
---|
5097 | @group
|
---|
5098 | /^$/ @{
|
---|
5099 | p
|
---|
5100 | b
|
---|
5101 | @}
|
---|
5102 | @end group
|
---|
5103 |
|
---|
5104 | @group
|
---|
5105 | # Same as cat -n from now
|
---|
5106 | x
|
---|
5107 | /^$/ s/^.*$/1/
|
---|
5108 | G
|
---|
5109 | h
|
---|
5110 | s/^/ /
|
---|
5111 | s/^ *\(......\)\n/\1 /p
|
---|
5112 | x
|
---|
5113 | s/\n.*$//
|
---|
5114 | /^9*$/ s/^/0/
|
---|
5115 | s/.9*$/x&/
|
---|
5116 | h
|
---|
5117 | s/^.*x//
|
---|
5118 | y/0123456789/1234567890/
|
---|
5119 | x
|
---|
5120 | s/x.*$//
|
---|
5121 | G
|
---|
5122 | s/\n//
|
---|
5123 | h
|
---|
5124 | @end group
|
---|
5125 | @end example
|
---|
5126 | @c end---------------------------------------------
|
---|
5127 |
|
---|
5128 | @node wc -c
|
---|
5129 | @section Counting Characters
|
---|
5130 |
|
---|
5131 | This script shows another way to do arithmetic with @command{sed}.
|
---|
5132 | In this case we have to add possibly large numbers, so implementing
|
---|
5133 | this by successive increments would not be feasible (and possibly
|
---|
5134 | even more complicated to contrive than this script).
|
---|
5135 |
|
---|
5136 | The approach is to map numbers to letters, kind of an abacus
|
---|
5137 | implemented with @command{sed}. @samp{a}s are units, @samp{b}s are
|
---|
5138 | tens and so on: we simply add the number of characters
|
---|
5139 | on the current line as units, and then propagate the carry
|
---|
5140 | to tens, hundreds, and so on.
|
---|
5141 |
|
---|
5142 | As usual, running totals are kept in hold space.
|
---|
5143 |
|
---|
5144 | On the last line, we convert the abacus form back to decimal.
|
---|
5145 | For the sake of variety, this is done with a loop rather than
|
---|
5146 | with some 80 @code{s} commands@footnote{Some implementations
|
---|
5147 | have a limit of 199 commands per script}: first we
|
---|
5148 | convert units, removing @samp{a}s from the number; then we
|
---|
5149 | rotate letters so that tens become @samp{a}s, and so on
|
---|
5150 | until no more letters remain.
|
---|
5151 |
|
---|
5152 | @c start-------------------------------------------
|
---|
5153 | @example
|
---|
5154 | #!/usr/bin/sed -nf
|
---|
5155 |
|
---|
5156 | @group
|
---|
5157 | # Add n+1 a's to hold space (+1 is for the newline)
|
---|
5158 | s/./a/g
|
---|
5159 | H
|
---|
5160 | x
|
---|
5161 | s/\n/a/
|
---|
5162 | @end group
|
---|
5163 |
|
---|
5164 | @group
|
---|
5165 | # Do the carry. The t's and b's are not necessary,
|
---|
5166 | # but they do speed up the thing
|
---|
5167 | t a
|
---|
5168 | : a; s/aaaaaaaaaa/b/g; t b; b done
|
---|
5169 | : b; s/bbbbbbbbbb/c/g; t c; b done
|
---|
5170 | : c; s/cccccccccc/d/g; t d; b done
|
---|
5171 | : d; s/dddddddddd/e/g; t e; b done
|
---|
5172 | : e; s/eeeeeeeeee/f/g; t f; b done
|
---|
5173 | : f; s/ffffffffff/g/g; t g; b done
|
---|
5174 | : g; s/gggggggggg/h/g; t h; b done
|
---|
5175 | : h; s/hhhhhhhhhh//g
|
---|
5176 | @end group
|
---|
5177 |
|
---|
5178 | @group
|
---|
5179 | : done
|
---|
5180 | $! @{
|
---|
5181 | h
|
---|
5182 | b
|
---|
5183 | @}
|
---|
5184 | @end group
|
---|
5185 |
|
---|
5186 | # On the last line, convert back to decimal
|
---|
5187 |
|
---|
5188 | @group
|
---|
5189 | : loop
|
---|
5190 | /a/! s/[b-h]*/&0/
|
---|
5191 | s/aaaaaaaaa/9/
|
---|
5192 | s/aaaaaaaa/8/
|
---|
5193 | s/aaaaaaa/7/
|
---|
5194 | s/aaaaaa/6/
|
---|
5195 | s/aaaaa/5/
|
---|
5196 | s/aaaa/4/
|
---|
5197 | s/aaa/3/
|
---|
5198 | s/aa/2/
|
---|
5199 | s/a/1/
|
---|
5200 | @end group
|
---|
5201 |
|
---|
5202 | @group
|
---|
5203 | : next
|
---|
5204 | y/bcdefgh/abcdefg/
|
---|
5205 | /[a-h]/ b loop
|
---|
5206 | p
|
---|
5207 | @end group
|
---|
5208 | @end example
|
---|
5209 | @c end---------------------------------------------
|
---|
5210 |
|
---|
5211 | @node wc -w
|
---|
5212 | @section Counting Words
|
---|
5213 |
|
---|
5214 | This script is almost the same as the previous one, once each
|
---|
5215 | of the words on the line is converted to a single @samp{a}
|
---|
5216 | (in the previous script each letter was changed to an @samp{a}).
|
---|
5217 |
|
---|
5218 | It is interesting that real @command{wc} programs have optimized
|
---|
5219 | loops for @samp{wc -c}, so they are much slower at counting
|
---|
5220 | words rather than characters. This script's bottleneck,
|
---|
5221 | instead, is arithmetic, and hence the word-counting one
|
---|
5222 | is faster (it has to manage smaller numbers).
|
---|
5223 |
|
---|
5224 | Again, the common parts are not commented to show the importance
|
---|
5225 | of commenting @command{sed} scripts.
|
---|
5226 |
|
---|
5227 | @c start-------------------------------------------
|
---|
5228 | @example
|
---|
5229 | #!/usr/bin/sed -nf
|
---|
5230 |
|
---|
5231 | @group
|
---|
5232 | # Convert words to a's
|
---|
5233 | s/[ @kbd{@key{TAB}}][ @kbd{@key{TAB}}]*/ /g
|
---|
5234 | s/^/ /
|
---|
5235 | s/ [^ ][^ ]*/a /g
|
---|
5236 | s/ //g
|
---|
5237 | @end group
|
---|
5238 |
|
---|
5239 | @group
|
---|
5240 | # Append them to hold space
|
---|
5241 | H
|
---|
5242 | x
|
---|
5243 | s/\n//
|
---|
5244 | @end group
|
---|
5245 |
|
---|
5246 | @group
|
---|
5247 | # From here on it is the same as in wc -c.
|
---|
5248 | /aaaaaaaaaa/! bx; s/aaaaaaaaaa/b/g
|
---|
5249 | /bbbbbbbbbb/! bx; s/bbbbbbbbbb/c/g
|
---|
5250 | /cccccccccc/! bx; s/cccccccccc/d/g
|
---|
5251 | /dddddddddd/! bx; s/dddddddddd/e/g
|
---|
5252 | /eeeeeeeeee/! bx; s/eeeeeeeeee/f/g
|
---|
5253 | /ffffffffff/! bx; s/ffffffffff/g/g
|
---|
5254 | /gggggggggg/! bx; s/gggggggggg/h/g
|
---|
5255 | s/hhhhhhhhhh//g
|
---|
5256 | :x
|
---|
5257 | $! @{ h; b; @}
|
---|
5258 | :y
|
---|
5259 | /a/! s/[b-h]*/&0/
|
---|
5260 | s/aaaaaaaaa/9/
|
---|
5261 | s/aaaaaaaa/8/
|
---|
5262 | s/aaaaaaa/7/
|
---|
5263 | s/aaaaaa/6/
|
---|
5264 | s/aaaaa/5/
|
---|
5265 | s/aaaa/4/
|
---|
5266 | s/aaa/3/
|
---|
5267 | s/aa/2/
|
---|
5268 | s/a/1/
|
---|
5269 | y/bcdefgh/abcdefg/
|
---|
5270 | /[a-h]/ by
|
---|
5271 | p
|
---|
5272 | @end group
|
---|
5273 | @end example
|
---|
5274 | @c end---------------------------------------------
|
---|
5275 |
|
---|
5276 | @node wc -l
|
---|
5277 | @section Counting Lines
|
---|
5278 |
|
---|
5279 | No strange things are done now, because @command{sed} gives us
|
---|
5280 | @samp{wc -l} functionality for free!!! Look:
|
---|
5281 |
|
---|
5282 | @c start-------------------------------------------
|
---|
5283 | @example
|
---|
5284 | @group
|
---|
5285 | #!/usr/bin/sed -nf
|
---|
5286 | $=
|
---|
5287 | @end group
|
---|
5288 | @end example
|
---|
5289 | @c end---------------------------------------------
|
---|
5290 |
|
---|
5291 | @node head
|
---|
5292 | @section Printing the First Lines
|
---|
5293 |
|
---|
5294 | This script is probably the simplest useful @command{sed} script.
|
---|
5295 | It displays the first 10 lines of input; the number of displayed
|
---|
5296 | lines is right before the @code{q} command.
|
---|
5297 |
|
---|
5298 | @c start-------------------------------------------
|
---|
5299 | @example
|
---|
5300 | @group
|
---|
5301 | #!/usr/bin/sed -f
|
---|
5302 | 10q
|
---|
5303 | @end group
|
---|
5304 | @end example
|
---|
5305 | @c end---------------------------------------------
|
---|
5306 |
|
---|
5307 | @node tail
|
---|
5308 | @section Printing the Last Lines
|
---|
5309 |
|
---|
5310 | Printing the last @var{n} lines rather than the first is more complex
|
---|
5311 | but indeed possible. @var{n} is encoded in the second line, before
|
---|
5312 | the bang character.
|
---|
5313 |
|
---|
5314 | This script is similar to the @command{tac} script in that it keeps the
|
---|
5315 | final output in the hold space and prints it at the end:
|
---|
5316 |
|
---|
5317 | @c start-------------------------------------------
|
---|
5318 | @example
|
---|
5319 | #!/usr/bin/sed -nf
|
---|
5320 |
|
---|
5321 | @group
|
---|
5322 | 1! @{; H; g; @}
|
---|
5323 | 1,10 !s/[^\n]*\n//
|
---|
5324 | $p
|
---|
5325 | h
|
---|
5326 | @end group
|
---|
5327 | @end example
|
---|
5328 | @c end---------------------------------------------
|
---|
5329 |
|
---|
5330 | Mainly, the scripts keeps a window of 10 lines and slides it
|
---|
5331 | by adding a line and deleting the oldest (the substitution command
|
---|
5332 | on the second line works like a @code{D} command but does not
|
---|
5333 | restart the loop).
|
---|
5334 |
|
---|
5335 | The ``sliding window'' technique is a very powerful way to write
|
---|
5336 | efficient and complex @command{sed} scripts, because commands like
|
---|
5337 | @code{P} would require a lot of work if implemented manually.
|
---|
5338 |
|
---|
5339 | To introduce the technique, which is fully demonstrated in the
|
---|
5340 | rest of this chapter and is based on the @code{N}, @code{P}
|
---|
5341 | and @code{D} commands, here is an implementation of @command{tail}
|
---|
5342 | using a simple ``sliding window.''
|
---|
5343 |
|
---|
5344 | This looks complicated but in fact the working is the same as
|
---|
5345 | the last script: after we have kicked in the appropriate number
|
---|
5346 | of lines, however, we stop using the hold space to keep inter-line
|
---|
5347 | state, and instead use @code{N} and @code{D} to slide pattern
|
---|
5348 | space by one line:
|
---|
5349 |
|
---|
5350 | @c start-------------------------------------------
|
---|
5351 | @example
|
---|
5352 | #!/usr/bin/sed -f
|
---|
5353 |
|
---|
5354 | @group
|
---|
5355 | 1h
|
---|
5356 | 2,10 @{; H; g; @}
|
---|
5357 | $q
|
---|
5358 | 1,9d
|
---|
5359 | N
|
---|
5360 | D
|
---|
5361 | @end group
|
---|
5362 | @end example
|
---|
5363 | @c end---------------------------------------------
|
---|
5364 |
|
---|
5365 | Note how the first, second and fourth line are inactive after
|
---|
5366 | the first ten lines of input. After that, all the script does
|
---|
5367 | is: exiting on the last line of input, appending the next input
|
---|
5368 | line to pattern space, and removing the first line.
|
---|
5369 |
|
---|
5370 | @node uniq
|
---|
5371 | @section Make Duplicate Lines Unique
|
---|
5372 |
|
---|
5373 | This is an example of the art of using the @code{N}, @code{P}
|
---|
5374 | and @code{D} commands, probably the most difficult to master.
|
---|
5375 |
|
---|
5376 | @c start-------------------------------------------
|
---|
5377 | @example
|
---|
5378 | @group
|
---|
5379 | #!/usr/bin/sed -f
|
---|
5380 | h
|
---|
5381 | @end group
|
---|
5382 |
|
---|
5383 | @group
|
---|
5384 | :b
|
---|
5385 | # On the last line, print and exit
|
---|
5386 | $b
|
---|
5387 | N
|
---|
5388 | /^\(.*\)\n\1$/ @{
|
---|
5389 | # The two lines are identical. Undo the effect of
|
---|
5390 | # the n command.
|
---|
5391 | g
|
---|
5392 | bb
|
---|
5393 | @}
|
---|
5394 | @end group
|
---|
5395 |
|
---|
5396 | @group
|
---|
5397 | # If the @code{N} command had added the last line, print and exit
|
---|
5398 | $b
|
---|
5399 | @end group
|
---|
5400 |
|
---|
5401 | @group
|
---|
5402 | # The lines are different; print the first and go
|
---|
5403 | # back working on the second.
|
---|
5404 | P
|
---|
5405 | D
|
---|
5406 | @end group
|
---|
5407 | @end example
|
---|
5408 | @c end---------------------------------------------
|
---|
5409 |
|
---|
5410 | As you can see, we maintain a 2-line window using @code{P} and @code{D}.
|
---|
5411 | This technique is often used in advanced @command{sed} scripts.
|
---|
5412 |
|
---|
5413 | @node uniq -d
|
---|
5414 | @section Print Duplicated Lines of Input
|
---|
5415 |
|
---|
5416 | This script prints only duplicated lines, like @samp{uniq -d}.
|
---|
5417 |
|
---|
5418 | @c start-------------------------------------------
|
---|
5419 | @example
|
---|
5420 | #!/usr/bin/sed -nf
|
---|
5421 |
|
---|
5422 | @group
|
---|
5423 | $b
|
---|
5424 | N
|
---|
5425 | /^\(.*\)\n\1$/ @{
|
---|
5426 | # Print the first of the duplicated lines
|
---|
5427 | s/.*\n//
|
---|
5428 | p
|
---|
5429 | @end group
|
---|
5430 |
|
---|
5431 | @group
|
---|
5432 | # Loop until we get a different line
|
---|
5433 | :b
|
---|
5434 | $b
|
---|
5435 | N
|
---|
5436 | /^\(.*\)\n\1$/ @{
|
---|
5437 | s/.*\n//
|
---|
5438 | bb
|
---|
5439 | @}
|
---|
5440 | @}
|
---|
5441 | @end group
|
---|
5442 |
|
---|
5443 | @group
|
---|
5444 | # The last line cannot be followed by duplicates
|
---|
5445 | $b
|
---|
5446 | @end group
|
---|
5447 |
|
---|
5448 | @group
|
---|
5449 | # Found a different one. Leave it alone in the pattern space
|
---|
5450 | # and go back to the top, hunting its duplicates
|
---|
5451 | D
|
---|
5452 | @end group
|
---|
5453 | @end example
|
---|
5454 | @c end---------------------------------------------
|
---|
5455 |
|
---|
5456 | @node uniq -u
|
---|
5457 | @section Remove All Duplicated Lines
|
---|
5458 |
|
---|
5459 | This script prints only unique lines, like @samp{uniq -u}.
|
---|
5460 |
|
---|
5461 | @c start-------------------------------------------
|
---|
5462 | @example
|
---|
5463 | #!/usr/bin/sed -f
|
---|
5464 |
|
---|
5465 | @group
|
---|
5466 | # Search for a duplicate line --- until that, print what you find.
|
---|
5467 | $b
|
---|
5468 | N
|
---|
5469 | /^\(.*\)\n\1$/ ! @{
|
---|
5470 | P
|
---|
5471 | D
|
---|
5472 | @}
|
---|
5473 | @end group
|
---|
5474 |
|
---|
5475 | @group
|
---|
5476 | :c
|
---|
5477 | # Got two equal lines in pattern space. At the
|
---|
5478 | # end of the file we simply exit
|
---|
5479 | $d
|
---|
5480 | @end group
|
---|
5481 |
|
---|
5482 | @group
|
---|
5483 | # Else, we keep reading lines with @code{N} until we
|
---|
5484 | # find a different one
|
---|
5485 | s/.*\n//
|
---|
5486 | N
|
---|
5487 | /^\(.*\)\n\1$/ @{
|
---|
5488 | bc
|
---|
5489 | @}
|
---|
5490 | @end group
|
---|
5491 |
|
---|
5492 | @group
|
---|
5493 | # Remove the last instance of the duplicate line
|
---|
5494 | # and go back to the top
|
---|
5495 | D
|
---|
5496 | @end group
|
---|
5497 | @end example
|
---|
5498 | @c end---------------------------------------------
|
---|
5499 |
|
---|
5500 | @node cat -s
|
---|
5501 | @section Squeezing Blank Lines
|
---|
5502 |
|
---|
5503 | As a final example, here are three scripts, of increasing complexity
|
---|
5504 | and speed, that implement the same function as @samp{cat -s}, that is
|
---|
5505 | squeezing blank lines.
|
---|
5506 |
|
---|
5507 | The first leaves a blank line at the beginning and end if there are
|
---|
5508 | some already.
|
---|
5509 |
|
---|
5510 | @c start-------------------------------------------
|
---|
5511 | @example
|
---|
5512 | #!/usr/bin/sed -f
|
---|
5513 |
|
---|
5514 | @group
|
---|
5515 | # on empty lines, join with next
|
---|
5516 | # Note there is a star in the regexp
|
---|
5517 | :x
|
---|
5518 | /^\n*$/ @{
|
---|
5519 | N
|
---|
5520 | bx
|
---|
5521 | @}
|
---|
5522 | @end group
|
---|
5523 |
|
---|
5524 | @group
|
---|
5525 | # now, squeeze all '\n', this can be also done by:
|
---|
5526 | # s/^\(\n\)*/\1/
|
---|
5527 | s/\n*/\
|
---|
5528 | /
|
---|
5529 | @end group
|
---|
5530 | @end example
|
---|
5531 | @c end---------------------------------------------
|
---|
5532 |
|
---|
5533 | This one is a bit more complex and removes all empty lines
|
---|
5534 | at the beginning. It does leave a single blank line at end
|
---|
5535 | if one was there.
|
---|
5536 |
|
---|
5537 | @c start-------------------------------------------
|
---|
5538 | @example
|
---|
5539 | #!/usr/bin/sed -f
|
---|
5540 |
|
---|
5541 | @group
|
---|
5542 | # delete all leading empty lines
|
---|
5543 | 1,/^./@{
|
---|
5544 | /./!d
|
---|
5545 | @}
|
---|
5546 | @end group
|
---|
5547 |
|
---|
5548 | @group
|
---|
5549 | # on an empty line we remove it and all the following
|
---|
5550 | # empty lines, but one
|
---|
5551 | :x
|
---|
5552 | /./!@{
|
---|
5553 | N
|
---|
5554 | s/^\n$//
|
---|
5555 | tx
|
---|
5556 | @}
|
---|
5557 | @end group
|
---|
5558 | @end example
|
---|
5559 | @c end---------------------------------------------
|
---|
5560 |
|
---|
5561 | This removes leading and trailing blank lines. It is also the
|
---|
5562 | fastest. Note that loops are completely done with @code{n} and
|
---|
5563 | @code{b}, without relying on @command{sed} to restart the
|
---|
5564 | script automatically at the end of a line.
|
---|
5565 |
|
---|
5566 | @c start-------------------------------------------
|
---|
5567 | @example
|
---|
5568 | #!/usr/bin/sed -nf
|
---|
5569 |
|
---|
5570 | @group
|
---|
5571 | # delete all (leading) blanks
|
---|
5572 | /./!d
|
---|
5573 | @end group
|
---|
5574 |
|
---|
5575 | @group
|
---|
5576 | # get here: so there is a non empty
|
---|
5577 | :x
|
---|
5578 | # print it
|
---|
5579 | p
|
---|
5580 | # get next
|
---|
5581 | n
|
---|
5582 | # got chars? print it again, etc...
|
---|
5583 | /./bx
|
---|
5584 | @end group
|
---|
5585 |
|
---|
5586 | @group
|
---|
5587 | # no, don't have chars: got an empty line
|
---|
5588 | :z
|
---|
5589 | # get next, if last line we finish here so no trailing
|
---|
5590 | # empty lines are written
|
---|
5591 | n
|
---|
5592 | # also empty? then ignore it, and get next... this will
|
---|
5593 | # remove ALL empty lines
|
---|
5594 | /./!bz
|
---|
5595 | @end group
|
---|
5596 |
|
---|
5597 | @group
|
---|
5598 | # all empty lines were deleted/ignored, but we have a non empty. As
|
---|
5599 | # what we want to do is to squeeze, insert a blank line artificially
|
---|
5600 | i\
|
---|
5601 | @end group
|
---|
5602 |
|
---|
5603 | bx
|
---|
5604 | @end example
|
---|
5605 | @c end---------------------------------------------
|
---|
5606 |
|
---|
5607 | @node Limitations
|
---|
5608 | @chapter @value{SSED}'s Limitations and Non-limitations
|
---|
5609 |
|
---|
5610 | @cindex GNU extensions, unlimited line length
|
---|
5611 | @cindex Portability, line length limitations
|
---|
5612 | For those who want to write portable @command{sed} scripts,
|
---|
5613 | be aware that some implementations have been known to
|
---|
5614 | limit line lengths (for the pattern and hold spaces)
|
---|
5615 | to be no more than 4000 bytes.
|
---|
5616 | The @sc{posix} standard specifies that conforming @command{sed}
|
---|
5617 | implementations shall support at least 8192 byte line lengths.
|
---|
5618 | @value{SSED} has no built-in limit on line length;
|
---|
5619 | as long as it can @code{malloc()} more (virtual) memory,
|
---|
5620 | you can feed or construct lines as long as you like.
|
---|
5621 |
|
---|
5622 | However, recursion is used to handle subpatterns and indefinite
|
---|
5623 | repetition. This means that the available stack space may limit
|
---|
5624 | the size of the buffer that can be processed by certain patterns.
|
---|
5625 |
|
---|
5626 |
|
---|
5627 | @node Other Resources
|
---|
5628 | @chapter Other Resources for Learning About @command{sed}
|
---|
5629 |
|
---|
5630 | For up to date information about @value{SSED} please
|
---|
5631 | visit @uref{https://www.gnu.org/software/sed/}.
|
---|
5632 |
|
---|
5633 | Send general questions and suggestions to @email{sed-devel@@gnu.org}.
|
---|
5634 | Visit the mailing list archives for past discussions at
|
---|
5635 | @uref{https://lists.gnu.org/archive/html/sed-devel/}.
|
---|
5636 |
|
---|
5637 | @cindex Additional reading about @command{sed}
|
---|
5638 | The following resources provide information about @command{sed}
|
---|
5639 | (both @value{SSED} and other variations). Note these not maintained by
|
---|
5640 | @value{SSED} developers.
|
---|
5641 |
|
---|
5642 | @itemize @bullet
|
---|
5643 |
|
---|
5644 | @item
|
---|
5645 | sed @code{$HOME}: @uref{http://sed.sf.net}
|
---|
5646 |
|
---|
5647 | @item
|
---|
5648 | sed FAQ: @uref{http://sed.sf.net/sedfaq.html}
|
---|
5649 |
|
---|
5650 | @item
|
---|
5651 | seder's grabbag: @uref{http://sed.sf.net/grabbag}
|
---|
5652 |
|
---|
5653 | @item
|
---|
5654 | The @code{sed-users} mailing list maintained by Sven Guckes:
|
---|
5655 | @uref{http://groups.yahoo.com/group/sed-users/}
|
---|
5656 | (note this is @emph{not} the @value{SSED} mailing list).
|
---|
5657 |
|
---|
5658 | @end itemize
|
---|
5659 |
|
---|
5660 | @node Reporting Bugs
|
---|
5661 | @chapter Reporting Bugs
|
---|
5662 |
|
---|
5663 | @cindex Bugs, reporting
|
---|
5664 | Email bug reports to @email{bug-sed@@gnu.org}.
|
---|
5665 | Also, please include the output of @samp{sed --version} in the body
|
---|
5666 | of your report if at all possible.
|
---|
5667 |
|
---|
5668 | Please do not send a bug report like this:
|
---|
5669 |
|
---|
5670 | @example
|
---|
5671 | @i{@i{@r{while building frobme-1.3.4}}}
|
---|
5672 | $ configure
|
---|
5673 | @error{} sed: file sedscr line 1: Unknown option to 's'
|
---|
5674 | @end example
|
---|
5675 |
|
---|
5676 | If @value{SSED} doesn't configure your favorite package, take a
|
---|
5677 | few extra minutes to identify the specific problem and make a stand-alone
|
---|
5678 | test case. Unlike other programs such as C compilers, making such test
|
---|
5679 | cases for @command{sed} is quite simple.
|
---|
5680 |
|
---|
5681 | A stand-alone test case includes all the data necessary to perform the
|
---|
5682 | test, and the specific invocation of @command{sed} that causes the problem.
|
---|
5683 | The smaller a stand-alone test case is, the better. A test case should
|
---|
5684 | not involve something as far removed from @command{sed} as ``try to configure
|
---|
5685 | frobme-1.3.4''. Yes, that is in principle enough information to look
|
---|
5686 | for the bug, but that is not a very practical prospect.
|
---|
5687 |
|
---|
5688 | Here are a few commonly reported bugs that are not bugs.
|
---|
5689 |
|
---|
5690 | @table @asis
|
---|
5691 | @anchor{N_command_last_line}
|
---|
5692 | @item @code{N} command on the last line
|
---|
5693 | @cindex Portability, @code{N} command on the last line
|
---|
5694 | @cindex Non-bugs, @code{N} command on the last line
|
---|
5695 |
|
---|
5696 | Most versions of @command{sed} exit without printing anything when
|
---|
5697 | the @command{N} command is issued on the last line of a file.
|
---|
5698 | @value{SSED} prints pattern space before exiting unless of course
|
---|
5699 | the @command{-n} command switch has been specified. This choice is
|
---|
5700 | by design.
|
---|
5701 |
|
---|
5702 | Default behavior (gnu extension, non-POSIX conforming):
|
---|
5703 | @example
|
---|
5704 | $ seq 3 | sed N
|
---|
5705 | 1
|
---|
5706 | 2
|
---|
5707 | 3
|
---|
5708 | @end example
|
---|
5709 | @noindent
|
---|
5710 | To force POSIX-conforming behavior:
|
---|
5711 | @example
|
---|
5712 | $ seq 3 | sed --posix N
|
---|
5713 | 1
|
---|
5714 | 2
|
---|
5715 | @end example
|
---|
5716 |
|
---|
5717 | For example, the behavior of
|
---|
5718 | @example
|
---|
5719 | sed N foo bar
|
---|
5720 | @end example
|
---|
5721 | @noindent
|
---|
5722 | would depend on whether foo has an even or an odd number of
|
---|
5723 | lines@footnote{which is the actual ``bug'' that prompted the
|
---|
5724 | change in behavior}. Or, when writing a script to read the
|
---|
5725 | next few lines following a pattern match, traditional
|
---|
5726 | implementations of @code{sed} would force you to write
|
---|
5727 | something like
|
---|
5728 | @example
|
---|
5729 | /foo/@{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N @}
|
---|
5730 | @end example
|
---|
5731 | @noindent
|
---|
5732 | instead of just
|
---|
5733 | @example
|
---|
5734 | /foo/@{ N;N;N;N;N;N;N;N;N; @}
|
---|
5735 | @end example
|
---|
5736 |
|
---|
5737 | @cindex @code{POSIXLY_CORRECT} behavior, @code{N} command
|
---|
5738 | In any case, the simplest workaround is to use @code{$d;N} in
|
---|
5739 | scripts that rely on the traditional behavior, or to set
|
---|
5740 | the @code{POSIXLY_CORRECT} variable to a non-empty value.
|
---|
5741 |
|
---|
5742 | @item Regex syntax clashes (problems with backslashes)
|
---|
5743 | @cindex GNU extensions, to basic regular expressions
|
---|
5744 | @cindex Non-bugs, regex syntax clashes
|
---|
5745 | @command{sed} uses the @sc{posix} basic regular expression syntax. According to
|
---|
5746 | the standard, the meaning of some escape sequences is undefined in
|
---|
5747 | this syntax; notable in the case of @command{sed} are @code{\|},
|
---|
5748 | @code{\+}, @code{\?}, @code{\`}, @code{\'}, @code{\<},
|
---|
5749 | @code{\>}, @code{\b}, @code{\B}, @code{\w}, and @code{\W}.
|
---|
5750 |
|
---|
5751 | As in all GNU programs that use @sc{posix} basic regular
|
---|
5752 | expressions, @command{sed} interprets these escape sequences as special
|
---|
5753 | characters. So, @code{x\+} matches one or more occurrences of @samp{x}.
|
---|
5754 | @code{abc\|def} matches either @samp{abc} or @samp{def}.
|
---|
5755 |
|
---|
5756 | This syntax may cause problems when running scripts written for other
|
---|
5757 | @command{sed}s. Some @command{sed} programs have been written with the
|
---|
5758 | assumption that @code{\|} and @code{\+} match the literal characters
|
---|
5759 | @code{|} and @code{+}. Such scripts must be modified by removing the
|
---|
5760 | spurious backslashes if they are to be used with modern implementations
|
---|
5761 | of @command{sed}, like
|
---|
5762 | GNU @command{sed}.
|
---|
5763 |
|
---|
5764 | On the other hand, some scripts use s|abc\|def||g to remove occurrences
|
---|
5765 | of @emph{either} @code{abc} or @code{def}. While this worked until
|
---|
5766 | @command{sed} 4.0.x, newer versions interpret this as removing the
|
---|
5767 | string @code{abc|def}. This is again undefined behavior according to
|
---|
5768 | POSIX, and this interpretation is arguably more robust: older
|
---|
5769 | @command{sed}s, for example, required that the regex matcher parsed
|
---|
5770 | @code{\/} as @code{/} in the common case of escaping a slash, which is
|
---|
5771 | again undefined behavior; the new behavior avoids this, and this is good
|
---|
5772 | because the regex matcher is only partially under our control.
|
---|
5773 |
|
---|
5774 | @cindex GNU extensions, special escapes
|
---|
5775 | In addition, this version of @command{sed} supports several escape characters
|
---|
5776 | (some of which are multi-character) to insert non-printable characters
|
---|
5777 | in scripts (@code{\a}, @code{\c}, @code{\d}, @code{\o}, @code{\r},
|
---|
5778 | @code{\t}, @code{\v}, @code{\x}). These can cause similar problems
|
---|
5779 | with scripts written for other @command{sed}s.
|
---|
5780 |
|
---|
5781 | @item @option{-i} clobbers read-only files
|
---|
5782 | @cindex In-place editing
|
---|
5783 | @cindex @value{SSEDEXT}, in-place editing
|
---|
5784 | @cindex Non-bugs, in-place editing
|
---|
5785 |
|
---|
5786 | In short, @samp{sed -i} will let you delete the contents of
|
---|
5787 | a read-only file, and in general the @option{-i} option
|
---|
5788 | (@pxref{Invoking sed, , Invocation}) lets you clobber
|
---|
5789 | protected files. This is not a bug, but rather a consequence
|
---|
5790 | of how the Unix file system works.
|
---|
5791 |
|
---|
5792 | The permissions on a file say what can happen to the data
|
---|
5793 | in that file, while the permissions on a directory say what can
|
---|
5794 | happen to the list of files in that directory. @samp{sed -i}
|
---|
5795 | will not ever open for writing a file that is already on disk.
|
---|
5796 | Rather, it will work on a temporary file that is finally renamed
|
---|
5797 | to the original name: if you rename or delete files, you're actually
|
---|
5798 | modifying the contents of the directory, so the operation depends on
|
---|
5799 | the permissions of the directory, not of the file. For this same
|
---|
5800 | reason, @command{sed} does not let you use @option{-i} on a writable file
|
---|
5801 | in a read-only directory, and will break hard or symbolic links when
|
---|
5802 | @option{-i} is used on such a file.
|
---|
5803 |
|
---|
5804 | @item @code{0a} does not work (gives an error)
|
---|
5805 | @cindex @code{0} address
|
---|
5806 | @cindex GNU extensions, @code{0} address
|
---|
5807 | @cindex Non-bugs, @code{0} address
|
---|
5808 |
|
---|
5809 | There is no line 0. 0 is a special address that is only used to treat
|
---|
5810 | addresses like @code{0,/@var{RE}/} as active when the script starts: if
|
---|
5811 | you write @code{1,/abc/d} and the first line includes the string @samp{abc},
|
---|
5812 | then that match would be ignored because address ranges must span at least
|
---|
5813 | two lines (barring the end of the file); but what you probably wanted is
|
---|
5814 | to delete every line up to the first one including @samp{abc}, and this
|
---|
5815 | is obtained with @code{0,/abc/d}.
|
---|
5816 |
|
---|
5817 | @ifclear PERL
|
---|
5818 | @item @code{[a-z]} is case insensitive
|
---|
5819 | @cindex Non-bugs, localization-related
|
---|
5820 |
|
---|
5821 | You are encountering problems with locales. POSIX mandates that @code{[a-z]}
|
---|
5822 | uses the current locale's collation order -- in C parlance, that means using
|
---|
5823 | @code{strcoll(3)} instead of @code{strcmp(3)}. Some locales have a
|
---|
5824 | case-insensitive collation order, others don't.
|
---|
5825 |
|
---|
5826 | Another problem is that @code{[a-z]} tries to use collation symbols.
|
---|
5827 | This only happens if you are on the GNU system, using
|
---|
5828 | GNU libc's regular expression matcher instead of compiling the
|
---|
5829 | one supplied with GNU sed. In a Danish locale, for example,
|
---|
5830 | the regular expression @code{^[a-z]$} matches the string @samp{aa},
|
---|
5831 | because this is a single collating symbol that comes after @samp{a}
|
---|
5832 | and before @samp{b}; @samp{ll} behaves similarly in Spanish
|
---|
5833 | locales, or @samp{ij} in Dutch locales.
|
---|
5834 |
|
---|
5835 | To work around these problems, which may cause bugs in shell scripts, set
|
---|
5836 | the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
|
---|
5837 |
|
---|
5838 | @item @code{s/.*//} does not clear pattern space
|
---|
5839 | @cindex Non-bugs, localization-related
|
---|
5840 | @cindex @value{SSEDEXT}, emptying pattern space
|
---|
5841 | @cindex Emptying pattern space
|
---|
5842 |
|
---|
5843 | This happens if your input stream includes invalid multibyte
|
---|
5844 | sequences. @sc{posix} mandates that such sequences
|
---|
5845 | are @emph{not} matched by @samp{.}, so that @samp{s/.*//} will not clear
|
---|
5846 | pattern space as you would expect. In fact, there is no way to clear
|
---|
5847 | sed's buffers in the middle of the script in most multibyte locales
|
---|
5848 | (including UTF-8 locales). For this reason, @value{SSED} provides a `z'
|
---|
5849 | command (for `zap') as an extension.
|
---|
5850 |
|
---|
5851 | To work around these problems, which may cause bugs in shell scripts, set
|
---|
5852 | the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
|
---|
5853 | @end ifclear
|
---|
5854 | @end table
|
---|
5855 |
|
---|
5856 |
|
---|
5857 |
|
---|
5858 |
|
---|
5859 | @page
|
---|
5860 | @node GNU Free Documentation License
|
---|
5861 | @appendix GNU Free Documentation License
|
---|
5862 |
|
---|
5863 | @include fdl.texi
|
---|
5864 |
|
---|
5865 |
|
---|
5866 | @page
|
---|
5867 | @node Concept Index
|
---|
5868 | @unnumbered Concept Index
|
---|
5869 |
|
---|
5870 | This is a general index of all issues discussed in this manual, with the
|
---|
5871 | exception of the @command{sed} commands and command-line options.
|
---|
5872 |
|
---|
5873 | @printindex cp
|
---|
5874 |
|
---|
5875 | @page
|
---|
5876 | @node Command and Option Index
|
---|
5877 | @unnumbered Command and Option Index
|
---|
5878 |
|
---|
5879 | This is an alphabetical list of all @command{sed} commands and command-line
|
---|
5880 | options.
|
---|
5881 |
|
---|
5882 | @printindex fn
|
---|
5883 |
|
---|
5884 | @contents
|
---|
5885 | @bye
|
---|
5886 |
|
---|
5887 | @c XXX FIXME: the term "cycle" is never defined...
|
---|