To answer your main question:
GNU sed will append a <newline> character when executing the p command unless the input line was missing its terminating <newline> character (see the clarifications about lines below).
As far as I can tell, sed's p command and its auto-print feature implement the same logic to output the pattern space: if the trailing <newline> character was removed, they add it back; otherwise they don't.
Examples:
$ printf '%s\n%s' '4' '5' | sed ';' | hexdump -C # auto-print
00000000 34 0a 35 |4.5|
00000003
$ printf '%s\n%s' '4' '5' | sed -n 'p;' | hexdump -C # no auto-print; p flag
00000000 34 0a 35 |4.5|
00000003
In both cases there is no <newline> character (0a) in the output for the input lines that don't have one.
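For contrast, here is a small sketch of the opposite case, where every input line is properly terminated. It assumes GNU sed; the dump helper is a name I made up for this example, and it uses od rather than hexdump only because od is more widely available.

```shell
# Hypothetical helper: print the bytes of stdin as space-separated hex.
dump() { od -An -tx1 | xargs; }

# Every input line is terminated, so every output line is terminated too.
printf '%s\n%s\n' '4' '5' | sed -n 'p' | dump    # 34 0a 35 0a

# The last input line is unterminated, and so is the last output line.
printf '%s\n%s' '4' '5' | sed -n 'p' | dump      # 34 0a 35
```

The only difference between the two outputs is the final 0a, which mirrors the difference between the two inputs.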
About your diagrams:
"Adds back newline to pattern space" is probably inaccurate because the <newline> character is not put in the pattern space [1]. Also, that step is not related to the -n option. This does not make the diagram wrong, but the step should probably be merged into "Print pattern space".
Still, I agree with you about the documentation's lack of clarity.
[1] The sentence you quote in your own answer, "the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed", means that the <newline> is appended to the stream, not to pattern space. Of course, since pattern space is cleared shortly afterwards, this is a really minor point.
About your tests involving the x command:
Internally, pattern space and hold space are structures, and "was my trailing <newline> character dropped?" is a member of them. We will call it chomped (as it is named in sed's source code, by the way).
Pattern space is filled with a read line and its chomped attribute depends on how that line is terminated: true if it ends with a <newline> character, false otherwise. On the other hand, hold space is initialized as empty and its chomped attribute is just set to true.
Therefore, when you swap pattern space and hold space and print what was born as hold and is now pattern, a <newline> character is printed.
Examples - these commands have the same output:
$ printf '\n' | sed -n 'p;' | hexdump -C # input is only a <newline>
00000000 0a |.|
00000001
$ printf '%s' '5' | sed -n 'x;p;' | hexdump -C # input has no <newline>
00000000 0a |.|
00000001
(I took only a very brief look at sed's code, so this may well be inaccurate.)
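If the chomped attribute really travels with the buffer when x swaps the two spaces, then swapping twice should give the original input back its "no trailing <newline>" status. The sketch below checks that under GNU sed; dump is again a helper name of my own, and the expected bytes follow from my reading of the behaviour described above rather than from any documented guarantee.

```shell
# Hypothetical helper: print the bytes of stdin as space-separated hex.
dump() { od -An -tx1 | xargs; }

# First p: prints the former hold space (empty, chomped true), a lone <newline>.
# Second p: prints the original input (chomped false), "5" with no <newline>.
printf '%s' '5' | sed -n 'x;p;x;p' | dump    # 0a 35
```

The <newline> comes from the buffer that started out as hold space, while the 5 keeps its unterminated state across both swaps.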
About lines (clarification started with comments to your answer):
It goes without saying that a line without a terminating <newline> character is a problematic concept. Quoting POSIX:
3.206 Line
A sequence of zero or more non-<newline> characters plus a terminating <newline> character.
Furthermore, POSIX defines a text file:
3.403 Text File
A file that contains characters organized into zero or more lines. ...
Finally, POSIX on sed (bold mine):
DESCRIPTION
The sed utility is a stream editor that shall read one or more text files, make editing changes according to a script of editing commands, and write the results to standard output. ...
GNU sed, though, seems to be less strict when defining its input:
sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). ...
So, relating to my first sentence, we should take into account that, for GNU sed, what is read into the pattern space doesn't necessarily have to be a well-formed line of text.
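A quick way to see the gap between the POSIX definition of a line and what sed is willing to process is to count lines with two different tools. This is only an illustrative sketch: wc -l counts <newline> characters, while sed -n '$=' prints the number of the last line sed read.

```shell
# Two pieces of text, but only one terminating <newline>.
printf '%s\n%s' '4' '5' | wc -l          # 1: one complete POSIX line
printf '%s\n%s' '4' '5' | sed -n '$='    # 2: sed still processes the unterminated "5"
```

Some wc implementations pad the count with leading blanks, so the first output may appear indented.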
d and D commands (and Q with GNU sed) which skip the printing. – Stéphane Chazelas Jan 09 '19 at 15:46