root@u1804:~# sed --version
sed (GNU sed) 4.5
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jay Fenlason, Tom Lord, Ken Pizzini,
and Paolo Bonzini.
GNU sed home page: <https://www.gnu.org/software/sed/>.
General help using GNU software: <https://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-sed@gnu.org>.
root@u1804:~#

I'm new to sed, and I drew the diagram below of sed's workflow based on my understanding (correct me if you find anything wrong).

[Diagram: the asker's sketch of sed's workflow]

So it seems that auto-printing the pattern space always adds a newline at the end. My question is: does p add a newline, too? Consider the examples below.

root@u1804:~# seq 3 | sed -rn 'p'
1
2
3
root@u1804:~#

Here the newline at the end of each number is added by sed itself (see "adds back newline to pattern space" in the diagram). So it seems p does not append a newline. However, consider the next example.

root@u1804:~# seq 3 | sed -rn 'x;p;x;p'

1

2

3
root@u1804:~#

Here x exchanges the pattern space with the hold space, which leaves the pattern space empty. Now p, applied to a pattern space with nothing in it, should print nothing. But judging by the output, p prints a newline here. To me this looks like inconsistent behavior. Can anyone explain?

Rui F Ribeiro

3 Answers


To answer your main question:

GNU sed will append a <newline> character when executing the p command unless the input line was missing its terminating <newline> character (see the clarifications about lines below).

As far as I can tell, sed's p command and its auto-print feature implement the same logic to output the pattern space: if the trailing <newline> character was removed on input, they add it back; otherwise they don't.

Examples:

$ printf '%s\n%s' '4' '5' | sed ';' | hexdump -C      # auto-print
00000000  34 0a 35                                          |4.5|
00000003
$ printf '%s\n%s' '4' '5' | sed -n 'p;' | hexdump -C  # no auto-print; p command
00000000  34 0a 35                                          |4.5|
00000003

In both cases there is no <newline> character (0a) in the output for the input lines that don't have one.
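
For contrast, here is a minimal sketch of the same check with and without a terminating <newline> on the last input line. It assumes GNU sed and a POSIX od; the hex helper is a name introduced here purely for illustration.

```shell
# hex: hypothetical helper that prints stdin as one run of hex bytes
hex() { od -An -tx1 | tr -d ' \n'; }

# Last line terminated: p writes a <newline> (0a) after each line.
terminated=$(printf '%s\n%s\n' '4' '5' | sed -n 'p' | hex)

# Last line unterminated: GNU sed's p leaves it unterminated as well.
unterminated=$(printf '%s\n%s' '4' '5' | sed -n 'p' | hex)

echo "terminated:   $terminated"
echo "unterminated: $unterminated"
```

Other sed implementations may terminate the last line in the second case, so the two results can match there.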


About your diagrams:

"Adds back newline to pattern space" is probably inaccurate, because the <newline> character is not put in the pattern space [1]. Also, that step is not related to the -n option. This does not make the diagram wrong, but the step should probably be merged into "Print pattern space".
Still, I agree with you about the documentation's lack of clarity.

[1] The sentence you quote in your own answer, "the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed", means that the <newline> is appended to the stream, not to pattern space. Of course, since pattern space is cleared shortly afterwards, this is a really minor point.


About your tests involving the x command:

Internally, pattern space and hold space are structures, and "was my trailing <newline> character dropped?" is one of their members. We will call it chomped (which is also its name in sed's source code).
Pattern space is filled with a line read from the input, and its chomped attribute depends on how that line is terminated: true if it ends with a <newline> character, false otherwise. Hold space, on the other hand, is initialized as empty, with its chomped attribute simply set to true.
Therefore, when you swap pattern space and hold space and print what started out as hold space and is now pattern space, a <newline> character is printed.

Examples - these commands have the same output:

$ printf '\n' | sed -n 'p;' | hexdump -C        # input is only a <newline>
00000000  0a                                                |.|
00000001
$ printf '%s' '5' | sed -n 'x;p;' | hexdump -C  # input has no <newline>
00000000  0a                                                |.|
00000001

(I only had a brief look at sed's code, so this may well be inaccurate.)
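
If that reading of the source is right, the chomped flag should travel with the buffer through each x. A small sketch to check this, assuming GNU sed (hex is again just an illustrative helper name):

```shell
# hex: hypothetical helper that prints stdin as one run of hex bytes
hex() { od -An -tx1 | tr -d ' \n'; }

# The input "5" has no trailing <newline>, so its chomped flag would be false.
# First p:  prints the former hold space (empty, chomped true), i.e. only 0a.
# Second p: prints "5", back in pattern space with its own flag, i.e. 35 and no 0a.
swapped=$(printf '%s' '5' | sed -n 'x;p;x;p' | hex)

echo "$swapped"
```

Under the explanation above, the result would be 0a followed by 35, with no final 0a.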


About lines (clarification started with comments to your answer):

It goes without saying that a line without a terminating <newline> character is a problematic concept. Quoting POSIX:

3.206 Line
A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

Furthermore, POSIX defines a text file:

3.403 Text File
A file that contains characters organized into zero or more lines. ...

Finally, POSIX on sed (bold mine):

DESCRIPTION
The sed utility is a stream editor that shall read one or more text files, make editing changes according to a script of editing commands, and write the results to standard output. ...

GNU sed, though, seems to be less strict when defining its input:

sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). ...

So, going back to my first sentence, we should take into account that, for GNU sed, what is read into the pattern space doesn't necessarily have to be a well-formed line of text.

fra-san
  • Thanks for your answer! I'll do some testing later. One quick question, is sed ';' the same as sed ''? – Just a learner Jan 09 '19 at 16:33
    I think so. I chose ; as a short "do nothing" program, with no particular reason. I could (and probably should, for clarity) have chosen '' as well. – fra-san Jan 09 '19 at 18:29
    This is the right answer. As to the POSIX docs... I wonder why people are so religious about them. Often, they are lacking and this is no exception, the DESCRIPTION from the sed manual is rather nonsensical. "sed is a stream editor that shall read... files". Really ? Then why is it called stream editor ? Because that's what it does, it edits streams - it just happens that it treats files as a stream of data... otherwise it would be called fed not sed. – don_crissti Jan 09 '19 at 18:43

I edited my answer to only include an updated diagram based on fra-san's answer. The sole purpose is for new sed users to reference.

[Diagram: the asker's updated sketch of sed's workflow]

  • sed outputs lines. A line is not a line if it's not terminated by a newline character. So the step "Adds newline", should really be incorporated into "Print pattern space" since it makes no sense to print a non-terminated line. Likewise for reading. There is no newline character to remove since a line is terminated by it. sed reads a line and puts it in the pattern space. An analogy would be having to mentally remove the dots from a text while reading it to make sense of the individual sentences. – Kusalananda Jan 08 '19 at 16:41
  • @Kusalananda Interesting. I didn't know that before. Thanks for providing the info. – Just a learner Jan 08 '19 at 16:44
  • @don_crissti Looking at POSIX, it clearly says that the output from sed is a text file. A text file is a collection of lines. A line is terminated by newline. It would be difficult to output a non-terminated line with standard sed (easy with GNU sed, sure). – Kusalananda Jan 08 '19 at 16:56
  • @don_crissti You are right. This is my testing VM, so I use root. It seems I should make it a habit of using a normal user and only use root when necessary, as several people told me this. Thanks for pointing it out and looking forward to your answer. – Just a learner Jan 08 '19 at 17:00
  • @Kusalananda - sed's output is never a text file (unless one uses the command w or the flag with the same name - in which case yes, sed will output to text files). – don_crissti Jan 08 '19 at 17:06
  • @don_crissti "The output files shall be text files whose formats are dependent on the editing commands given." Also, the section on "STDOUT" says that sed writes lines. – Kusalananda Jan 08 '19 at 17:14
  • POSIX seems indeed to be clearer: "Whenever the pattern space is written to standard output or a named file, sed shall immediately follow it with a <newline>". GNU sed's documentation, on the other hand, includes footnote number 8: "Actually, if sed prints a line without the terminating newline, it will nevertheless print the missing newline as soon as more text is sent to the same output stream, ...". – fra-san Jan 08 '19 at 19:21
    @fra-san - exactly, the keyword here is shall... In theory, yes, in practice not quite... – don_crissti Jan 08 '19 at 19:40

In GNU sed, the p command adds a trailing newline only if there was one in the source text (that is, one was removed from the input when the line was placed in the pattern space). If a line was printed without its newline, sed also emits the missing newline first whenever additional text is printed to the same stream.

Only the last line of the input can lack a trailing newline.

 $ printf 'abc' | od -An -c
    a   b   c                                 # no newline.

 $ printf 'abc' | sed '' | od -An -c
    a   b   c                                 # also no newline.

 $ printf 'abc' | sed -n 'p' | od -An -c
    a   b   c                                 # still no newline.

 $ printf 'abc' | sed -n 'p;p' | od -An -c
    a   b   c  \n   a   b   c                 # leading newline added.

Printing only the last line, which ends with a newline only if the last line of the source already had one:

 $ printf 'abc\ndef' | sed -n '$p' | od -An -c
    d   e   f

From info sed:

---------- Footnotes ----------

(1) Actually, if 'sed' prints a line without the terminating newline, it will nevertheless print the missing newline as soon as more text is sent to the same output stream, which gives the "least expected surprise" even though it does not make commands like 'sed -n p' exactly identical to 'cat'.
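
The difference from cat that the footnote mentions can be seen with two input files, the first of which lacks its final newline. This is only a sketch: the file names first and second and the hex helper are made up for the example, and mktemp is assumed to be available.

```shell
# hex: hypothetical helper that prints stdin as one run of hex bytes
hex() { od -An -tx1 | tr -d ' \n'; }

dir=$(mktemp -d)
printf 'abc'   > "$dir/first"     # no trailing newline
printf 'def\n' > "$dir/second"

# cat copies the bytes as they are, so "abc" and "def" run together.
with_cat=$(cat "$dir/first" "$dir/second" | hex)

# sed supplies the missing newline (0a) before it prints "def".
with_sed=$(sed -n 'p' "$dir/first" "$dir/second" | hex)

rm -r "$dir"

echo "cat: $with_cat"
echo "sed: $with_sed"
```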

Some other sed versions might add a trailing newline and/or emit a warning.