Can sed replace new line characters?

Question

Is there an issue with sed and the newline character?
I have a file test.txt with the following contents:

aaaaa
bbbbb
ccccc
ddddd

The following does not work:
sed -r -i 's/\n/,/g' test.txt

I know that I can use tr for this, but my question is why it seems not to be possible with sed.

If this is a side effect of processing the file line by line, I would be interested in why this happens. I think grep removes newlines. Does sed do the same?

In this case sed might not be the best tool to use; other tools (e.g. tr) are more intuitive, easier to read and maintain, and faster (especially on big data). Don't use your hammer to drive in screws (even if it works). You can find a comparison at: http://slash4.de/blog/python/sed-replace-newline-or-python-awk-tr-perl-xargs.html — omoser, Feb 26 '15 at 09:51
tr would add a trailing , and would output an unterminated line. It is better to use paste instead: paste -sd , test.txt — Stéphane Chazelas, Jan 10 '17 at 14:09
Updated link in comment by @omoser: https://web.archive.org/web/20151102021030/https://slash4.de/blog/python/sed-replace-newline-or-python-awk-tr-perl-xargs.html — PunctuallyChallenged, Feb 14 '20 at 17:54
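
To make the difference described in the paste comment visible, here is a small sketch. It assumes a POSIX shell with mktemp and od available, and uses a temporary copy of the sample input rather than test.txt itself. od -c shows the trailing comma and the missing final newline that tr produces.

```shell
# Sample input from the question, written to a temporary file.
f=$(mktemp)
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' > "$f"

# tr converts every newline, including the last one, into a comma:
# the result is "aaaaa,bbbbb,ccccc,ddddd," with no terminating newline.
tr '\n' ',' < "$f" | od -c

# paste -s joins all lines of the file into one line, separated by the
# -d delimiter, and terminates it with a newline: aaaaa,bbbbb,ccccc,ddddd
paste -sd , "$f"

rm -f "$f"
```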

score 97 · Answer 1 · edited Sep 18 '19 at 11:19

This works with GNU sed:

sed -z 's/\n/,/g'

-z has been available since GNU sed 4.2.2.

NB: -z changes the line delimiter to the NUL character (\0). If your input does not contain any NUL characters, the whole input is treated as a single line. This has its own limitations (for example, the whole input has to fit in the pattern space).

To avoid having the newline of the last line replaced, you can change it back:

sed -z 's/\n/,/g;s/,$/\n/'

(The \n in the replacement is GNU sed syntax again, but that doesn't matter, as the whole thing is GNU-only.)
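
A quick demonstration of both commands, assuming GNU sed 4.2.2 or later (for -z) and the question's sample input fed through printf. The first pipeline shows why the second substitution is needed: the file's final newline also becomes a comma.

```shell
# With -z, records are NUL-terminated. This input contains no NUL bytes,
# so the whole stream, newlines included, is one "line" in pattern space.
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' | sed -z 's/\n/,/g' | od -c
# The final newline became a trailing comma, and nothing follows it.

# The second substitution turns that trailing comma back into a newline,
# giving one properly terminated line: aaaaa,bbbbb,ccccc,ddddd
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' | sed -z 's/\n/,/g;s/,$/\n/'
```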

score 72 · Accepted Answer · edited May 23 '17 at 11:33

With GNU sed and provided POSIXLY_CORRECT is not in the environment (for single-line input):

sed -i ':a;N;$!ba;s/\n/,/g' test.txt

From https://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n :

create a label via :a
append a newline and the next input line to the pattern space via N
if we are before the last line, branch to the created label with $!ba ($! means do not do it on the last line, as there should be one final newline).
finally the substitution replaces every newline with a comma on the pattern space (which is the whole file).
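
The same command run on a throwaway copy of the question's file, with each step annotated. This is a sketch assuming GNU sed (for -i and for ; after the label) and that POSIXLY_CORRECT is not set; the temporary file stands in for test.txt.

```shell
f=$(mktemp)
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' > "$f"

# :a     define label "a"
# N      append a newline plus the next input line to the pattern space
# $!ba   unless that was the last line, jump back to label "a"
# s/...  pattern space now holds the whole file; replace every \n
sed -i ':a;N;$!ba;s/\n/,/g' "$f"

cat "$f"   # aaaaa,bbbbb,ccccc,ddddd
rm -f "$f"
```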

answered Feb 12 '14 at 20:26 by Anthon

This seems to indicate that the problem is that sed reads line by line. But I can't understand why this is an issue. It could just read the line and replace the newline character (or last character) with a , – Jim Feb 12 '14 at 20:27
@jim It looks like it is not in the buffer to be matched, but I am not fluent with sed; maybe someone else can shed some light on that. I think you should extend your Q with that specific info, so people are more likely to read it and, hopefully, answer. – Anthon Feb 12 '14 at 20:30
This results in ba: Event not found – krb686 May 21 '15 at 14:07
@krb686 What is the "This" you are referring to? Did you run the above sed command with those exact options? On what test.txt file? With which version of sed (try sed --version)? – Anthon May 21 '15 at 14:39
@Anthon Sorry, I think I meant to say "the". I read another SO post that informed me that csh requires me to escape the !. Interestingly, that still did not work for me and I ended up having to double-escape the ! in my .csh script. So I don't really have a problem at the moment, but do you know why that might be? What worked for me was sed ':a;N;$\\!ba;s/\n/ /g' – krb686 May 21 '15 at 17:58
Note that that syntax is GNU specific, and even with GNU sed, if POSIXLY_CORRECT is in the environment and the input has only one line, there will be no output. – Stéphane Chazelas Feb 12 '16 at 15:16
this doesn't answer any of OP's three questions. – phil294 Mar 06 '16 at 17:30
@StéphaneChazelas When you say GNU specific you mean the \n in the substitute command? – dev Jan 10 '17 at 12:44
@dev, no. That \n is standard. It's the -i, the :a;..., ba;... (in many sed implementation including the original one, ; is part of the label), and the assumption that N doesn't exit sed on the last line. Also, many non-GNU sed implementations have a limit on the size of their pattern space (that solution loads the whole file in it before substituting) – Stéphane Chazelas Jan 10 '17 at 13:03
Thanks @StéphaneChazelas. While echo 'hello\nworld' | gsed ':a;N;$!ba;s/\n/, /g' worked on my macOS, I tried BSD sed just for the fun of it. Like you said, it was using ; after a label that caused the command to fail. The working equivalent for non-GNU would be echo 'hello\nworld' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/, /g'. And of course, as you point out in a separate comment, non-GNU will produce no output if input is echo 'hello'. – dev Jan 10 '17 at 13:46

score 15 · Answer 3 · answered Nov 24 '14 at 00:14

sed always removes the trailing \newline just before populating pattern space, and then appends one before writing out the results of its script. A \newline can get into pattern space by various means, but never except as the result of an edit. This is important: \newlines in sed's pattern space always reflect a change; they never arrive there straight from the input stream. \newlines are the only delimiter a sedder can count on with unknown input.

If you want to replace all \newlines with commas and your file is not very large, then you can do:

sed 'H;1h;$!d;x;y/\n/,/'

That appends every input line to hold space following a \newline character, except the first, which instead overwrites hold space. It then deletes every line that is not the $!last from output. On the last line, hold and pattern spaces are exchanged and all \newline characters are y///translated to commas.
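
The same script, spread out one command per line with comments, may make the walkthrough easier to follow. This is a sketch assuming a POSIX sed (lines starting with # are comments in POSIX sed scripts) and the question's sample input.

```shell
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' | sed '
# H: append a newline plus the current line to hold space
H
# 1h: on the first line, overwrite hold space instead (no leading newline)
1h
# $!d: on every line but the last, delete pattern space and start the next cycle
$!d
# x: on the last line, swap the accumulated text into pattern space
x
# y: translate every newline into a comma
y/\n/,/
'
# prints: aaaaa,bbbbb,ccccc,ddddd
```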

For large files this sort of thing is bound to cause problems: sed buffers on line boundaries, and that buffer can easily be overflowed by actions of this sort.

The fact that d deletes the pattern space and starts the next cycle is crucially important here (that is, for all but the last line, x;y/\n/,/ is ignored). — leo, Jul 26 '23 at 03:27

score 11 · Answer 4 · edited Sep 18 '19 at 11:22

From Oracle's web site:

The sed utility works by sequentially reading a file, line by line, into memory. It then performs all actions specified for the line and places the line back in memory to dump to the terminal with the requested changes made. After all actions have taken place to this one line, it reads the next line of the file and repeats the process until it is finished with the file.

Basically, this means that because sed reads the input line by line and strips each line's newline, the newline character is never in the pattern space to be matched.

The solution from https://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n is:

sed ':a;N;$!ba;s/\n/,/g'

or, in a more portable version (without a ; following the branch labels):

sed -e ':a' -e 'N;$!ba' -e 's/\n/,/g'

An explanation into how that works is provided on that page.

Note that that syntax is GNU specific, and even with GNU sed, if POSIXLY_CORRECT is in the environment and the input has only one line, there will be no output. — Stéphane Chazelas, Feb 12 '16 at 15:16
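
A short sketch of the caveat in that comment, assuming GNU sed. On multi-line input both spellings behave the same. On single-line input N runs on the last line and finds nothing to append: GNU sed then prints the pattern space before exiting, but not when POSIXLY_CORRECT is set. The "prints: only" line assumes POSIXLY_CORRECT is not already in your environment.

```shell
# Multi-line input: both spellings print aaaaa,bbbbb,ccccc,ddddd
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' | sed ':a;N;$!ba;s/\n/,/g'
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' | sed -e ':a' -e 'N;$!ba' -e 's/\n/,/g'

# Single-line input, default GNU behaviour: N hits end of input,
# GNU sed prints the pattern space and exits.
printf 'only\n' | sed ':a;N;$!ba;s/\n/,/g'                    # prints: only

# Same input with POSIXLY_CORRECT set: sed exits without printing.
printf 'only\n' | POSIXLY_CORRECT=1 sed ':a;N;$!ba;s/\n/,/g'  # prints nothing
```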

score 5 · Answer 5 · edited Jan 22 '20 at 08:50

There are actually two questions in your post:

Can sed replace new line characters?

Yes. Absolutely yes. Any sed can do:

s/\n/,/g

or

y/\n/,/

That will transform any newline (that got into the pattern space) into commas.

Is there an issue with sed and new line character?

Yes, there are several issues with the newline character in sed:

By default, sed places one line of input in the pattern space. Some seds have limits on the length of a line and on accepting NUL bytes. A line ends at a newline: as soon as a newline is found in the input, the input is split there, sed removes the newline, and what is left is placed in the pattern space. So, most of the time, no newline gets into the pattern space.
Only an edit of the pattern space can add or insert a newline into it.
sed almost always appends a newline to each piece of output it writes.
GNU sed is able to avoid printing a trailing newline if the last line of the input is missing its newline.
Only GNU sed is able to use another delimiter instead of newline (namely NUL bytes, with the -z option).
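
A minimal demonstration of the first two points, assuming any POSIX sed. The question's command fails because the newline is stripped before the line reaches the pattern space; once N puts a newline there, s can replace it.

```shell
# The newline is never in the pattern space, so this substitution
# matches nothing and the input comes out unchanged:
printf 'aaaaa\nbbbbb\n' | sed 's/\n/,/g'
# aaaaa
# bbbbb

# N appends a newline plus the next line to the pattern space, so now
# s can act on it. Here each pair of lines is joined with a comma:
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' | sed 'N;s/\n/,/'
# aaaaa,bbbbb
# ccccc,ddddd
```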

All of the above makes it difficult to "convert newlines" into anything.
And if newlines are to be replaced with another character, sed must hold the whole text file in memory (whatever method is used to get it there).

A couple of solutions that capture the whole file in memory in sed are:

sed 'H;1h;$!d;x;y/\n/,/'   file      # most seds. [1]
sed ':a;N;$!ba;s/\n/,/g'   file      # GNU sed.   
sed -z 's/\n/,/g;s/,$/\n/' file      # GNU sed.

A couple of fast solutions that don't use much memory are:

tr '\n' ',' < file ; echo
awk '{printf("%s%s",NR==1?"":",",$0)}END{print ""}' file

[1] How the first sed solution works: H appends each line to the hold space, preceded by a newline; on the first line, 1h then overwrites the hold space completely, which avoids a leading newline. $!d deletes the pattern space on every line except the last. On the last line, which is not deleted, the remaining commands run: x brings all the lines captured in the hold space into the pattern space, and y/\n/,/ replaces every newline with a comma.
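
The two low-memory alternatives side by side, as a sketch assuming a POSIX shell, tr and awk, with a temporary file standing in for file. Note that tr only reads standard input, hence the redirection, and that it leaves a trailing comma, which the awk version avoids.

```shell
f=$(mktemp)
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' > "$f"

# Every newline, including the last, becomes a comma; echo then ends
# the output line:   aaaaa,bbbbb,ccccc,ddddd,
tr '\n' ',' < "$f" ; echo

# awk prints a comma before every record except the first, then one
# final newline:     aaaaa,bbbbb,ccccc,ddddd
awk '{printf("%s%s",NR==1?"":",",$0)}END{print ""}' "$f"

rm -f "$f"
```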

score 3 · Answer 6 · edited Nov 22 '14 at 13:24

Alternatively, you can use a slightly simpler syntax:

sed ':a;N;s/\n/,/g;ba'

This just changes the order of the commands.

answered Nov 22 '14 at 12:57 by Rodec; edited Nov 22 '14 at 13:24 by terdon

But runs the s command for each input line on a pattern space that is increasingly big. – Stéphane Chazelas Feb 12 '16 at 15:08
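
A sketch comparing the two orderings, assuming GNU sed with POSIXLY_CORRECT unset. In this variant there is no $! test, so the loop only ends when N finds no next line; it therefore relies on GNU sed printing the pattern space at that point (with POSIXLY_CORRECT set, it would print nothing). It also re-scans the growing pattern space on every iteration, which is the cost the comment above points out.

```shell
# s runs after every N, on an ever larger pattern space:
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' | sed ':a;N;s/\n/,/g;ba'
# aaaaa,bbbbb,ccccc,ddddd

# The accepted answer's order substitutes only once, at the end:
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' | sed ':a;N;$!ba;s/\n/,/g'
# aaaaa,bbbbb,ccccc,ddddd
```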

score 1 · Answer 7 · answered Feb 12 '16 at 10:32

Let's say you want to replace newlines with a literal \n. I wanted to do that, so here's what I did:

(echo foo; echo bar; echo baz) | sed -r '$!s/$/\\n/' | tr -d '\n' 
# Output: foo\nbar\nbaz

Here's what it does: for every line except the last, append a literal \n; then delete the newlines with tr.

answered Feb 12 '16 at 10:32 by Camilo Martin

-r is only available in GNU sed, not BSD. – kenorb Sep 11 '19 at 11:23
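
The regex in that answer is a basic one, so the -r flag is not actually needed. A minimal sketch of the same pipeline without it, assuming any POSIX sed and tr:

```shell
# $! skips the last line; in the replacement, \\ is a literal backslash,
# so each other line gets a literal "\n" appended. tr then deletes the
# real newlines, so the result has no terminating newline.
printf 'foo\nbar\nbaz\n' | sed '$!s/$/\\n/' | tr -d '\n'
# foo\nbar\nbaz
```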

score 1 · Answer 8 · answered Feb 12 '16 at 14:44

There's some very nice sed magic here, and some good points raised about pattern space overflow. I love to use sed even when it's not the simplest way, because it's so compact and powerful. However, it has its limitations, and for large amounts of data the pattern space would have to be mahoosive.

The GNU sed manual says this:

For those who want to write portable sed scripts, be aware that some implementations have been known to limit line lengths (for the pattern and hold spaces) to be no more than 4000 bytes. The posix standard specifies that conforming sed implementations shall support at least 8192 byte line lengths. GNU sed has no built-in limit on line length; as long as it can malloc() more (virtual) memory, you can feed or construct lines as long as you like.
However, recursion is used to handle subpatterns and indefinite repetition. This means that the available stack space may limit the size of the buffer that can be processed by certain patterns.

I don't have much to add, but I would like to point you towards my go-to guide for sed. It's excellent. http://www.grymoire.com/Unix/Sed.html

And here is my solution:

{ for i in $(cat test.txt); do echo -n "$i,"; done; echo ''; } >> somewhere

Well, it works.

You might want to read Why is using a shell loop to process text considered bad practice? and Security implications of forgetting to quote a variable in bash/POSIX shells and maybe Why is printf better than echo? — Stéphane Chazelas, Feb 12 '16 at 15:18

score 0 · Answer 9 · edited Dec 13 '19 at 14:05

To replace newlines with commas, you can try this (csh):

foreach i (`cat test.txt`)
  echo -n "$i,"
end

answered Dec 13 '19 at 06:54 by Manish; edited Dec 13 '19 at 14:05 by schrodingerscatcuriosity

The user knows they can do this task with tr, but asks whether it can be done using sed. The question is about sed rather than about some other way of performing the same task. – Kusalananda Dec 13 '19 at 07:43
