12

I have a CSV delimited by commas and I want to delimit it by newlines instead.

Input:

a, b, c

Output:

a
b
c

I've written Java parsers that do this stuff, but couldn't this be done with vim or some other tool?

sed isn't working for me:

#!/bin/sh

# Start
cat > infile.csv << __EOF__
a, b, c
__EOF__
cat infile.csv
sed 's/, /\n/g' infile.csv > outfile.csv

cat outfile.csv
manatwork
simpatico

5 Answers

21

Seems like the other answers achieve what you want, and a scriptable tool seems the most appropriate choice.

But you asked about vim, so here's how you do it there:

%s/, /\r/g

That is, replace every comma+space with a carriage return. This will then be interpreted as the appropriate line ending character for the file. (You can check this by searching for \r -- it won't be found).

Edd Steel
8

Similar to Iain's answer, you can also use tr:

$ echo a,b,c | tr ',' '\n'
a
b
c

Both answers assume that the CSV is simple (that is, all commas are field separators). If you have something like a,"b,c",d where b,c is a single field, then things get more difficult.
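To illustrate the quoted-field case, here is a minimal sketch that walks each line character by character in awk and only splits on commas outside double quotes. The function name split_quoted_csv is made up for this example, and it assumes one record per line with no doubled quotes ("") or embedded newlines; for anything beyond that a real CSV parser is the safer choice.

```shell
#!/bin/sh
# Hypothetical helper: split one-line CSV records on commas outside double quotes.
# Assumes no doubled quotes and no newlines inside fields.
split_quoted_csv() {
    awk '{
        field = ""; in_quotes = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\"") {
                in_quotes = !in_quotes      # toggle state and drop the quote character
            } else if (c == "," && !in_quotes) {
                print field                 # separator outside quotes: emit the field
                field = ""
            } else {
                field = field c             # ordinary character, or a comma inside quotes
            }
        }
        print field                         # last field on the line
    }'
}

printf 'a,"b,c",d\n' | split_quoted_csv
```

Here b,c stays together on one line because the comma between them is seen while in_quotes is set.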

Michael Mrozek
  • @Mrozek: how do I use tr with the input and output file? – simpatico Mar 19 '11 at 17:55
  • @simpatico cat input | tr ',' '\n' > output – Michael Mrozek Mar 19 '11 at 18:40
  • tr ',' '\n' < input > output avoids the UUOC. – kojiro Mar 19 '11 at 23:33
  • @kojiro Yes, but I do not and never will care about that :). I even do cat foo | grep bar; I suspect the UUOC fanatics burn me in effigy every night – Michael Mrozek Mar 19 '11 at 23:51
  • As a relative newcomer to Linux/bash, I've been a bit puzzled by the UUOC issue. So, for general info: to cat or not to cat, that is the question... I just time tested 1000 (random) ASCII text files (total 307666 lines)... cat file* | grep sometext vs <file* grep sometext ... 'cat' is slower for real, user, and sys times: real(0m2.396s/0m2.121s); user(0m6.012s/0m3.172s); sys(0m2.264s/0m1.000s) ... (does time matter?.. sometimes yes, sometimes no)... but the times are certainly different – Peter.O Mar 20 '11 at 16:32
  • @fred Nobody's arguing that avoiding the extra command isn't beneficial, it's whether or not you actually care. If I'm writing a script that's going to be used by lots of other people, I'll make an effort to do it right. If I'm running a command that's going to cat a 50 line file one time and pass it to another process, I really don't care how many cats are involved. The 2 seconds I could've saved by skipping the cat are voided by the 10 seconds it takes me to check that command's man page to remember if it takes a filename argument and what switch it is – Michael Mrozek Mar 20 '11 at 18:17
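For the file-to-file case asked about in the comments, a small sketch of the redirection form follows. Note that tr only maps single characters, so with the question's input (a, b, c) the blank after each comma would survive; the extra sed step here strips it. The function name split_commas and the temporary directory are assumptions for the example, and mktemp -d is widely available but not part of POSIX.

```shell
#!/bin/sh
# Hypothetical helper: turn comma-separated stdin into one field per line.
split_commas() {
    # tr maps each comma to a newline; sed then removes the single blank
    # that followed the comma in input such as "a, b, c".
    tr ',' '\n' | sed 's/^ //'
}

workdir=$(mktemp -d)                                  # scratch directory (assumption)
printf 'a, b, c\n' > "$workdir/input"                 # sample data from the question
split_commas < "$workdir/input" > "$workdir/output"   # redirection, no cat needed
cat "$workdir/output"
rm -r "$workdir"
```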
2

If your file is delimited by ', ' (commas followed by space) then

sed 's/, /\n/g' filename.csv >newfile

will do the job. If it's delimited by ',' (commas without spaces) then

sed 's/,/\n/g' filename.csv >newfile

will work.

Or change the \n to \o12 if your flavour of sed doesn't like it.

  • this replaced it with space and not new lines. Output: a b c – simpatico Mar 19 '11 at 14:52
  • @simpatico: Ok - I 'see' the problem now, your example output wasn't formatted correctly. –  Mar 19 '11 at 15:01
  • @Iain: this doesn't work with \n. Did you try it? The \ disappears while the n is appended. a, b --> an b – simpatico Mar 19 '11 at 17:57
  • @simpatico: I did test it and it works ok. –  Mar 19 '11 at 18:04
  • @Iain GNU sed and sed (some of which are strictly POSIX-compliant) need not necessarily be the same...And I am pretty sure the functionality you tested is guaranteed on GNU sed only. With the others, it may - or may not - work. Like playing poker. :-) – syntaxerror May 09 '15 at 02:26
2

The use of \n in a s replacement text in sed is allowed, but not mandated, by POSIX. GNU sed does it, but there are implementations that output \n literally.
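If you want to stay with sed, one form that should work across implementations is a backslash followed by an actual line break inside the replacement, which POSIX does define. A minimal sketch (the function name split_comma_space is made up for the example):

```shell
#!/bin/sh
# Hypothetical helper: replace every ", " with a newline in a portable way.
split_comma_space() {
    # The replacement is a backslash immediately followed by a real newline.
    # The continuation line must start at column 0, otherwise the leading
    # blanks would become part of the replacement text.
    sed 's/, /\
/g'
}

printf 'a, b, c\n' | split_comma_space
```

This avoids depending on how a particular sed treats the two-character sequence \n in the replacement.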

You can use any POSIX-compliant awk. Set the input field separator FS to a regular expression and the output field separator OFS to a string (with the usual backslash escapes). The assignment $1=$1 is needed to rebuild the line so that it uses the new field separator.

awk -vFS=', *' -vOFS='\n' '{$1=$1; print}'

(This assumes that your input contains plain comma-and-whitespace-separated values, without any quoting. If there is quoting, you need to move to a real CSV parser in a language such as Perl or Python.)
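To see why the assignment matters, the sketch below runs the same awk settings with and without it on the sample input from the question. The function name rebuild_fields is only for illustration.

```shell
#!/bin/sh
# Hypothetical helper wrapping the awk command from the answer above.
rebuild_fields() {
    # FS is a regex (a comma plus optional spaces); OFS is a newline.
    # Assigning a field to itself makes awk rebuild the record with OFS.
    awk -v FS=', *' -v OFS='\n' '{ $1 = $1; print }'
}

# Without the assignment, the record is printed unchanged: a, b, c
printf 'a, b, c\n' | awk -v FS=', *' -v OFS='\n' '{ print }'

# With it, each field lands on its own line.
printf 'a, b, c\n' | rebuild_fields
```

To work on files, redirect as usual, for example rebuild_fields < infile.csv > outfile.csv.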

0

Using Raku (formerly known as Perl_6)

perl6 -pe 's:g/ "," \s /\n/;'

#OR

perl6 -ne '.split(", ").join("\n").put;'

Also, if you're concerned about embedded newlines, commas-within-quotes, etc., use Raku's Text::CSV module:

raku -MText::CSV -e '.join("\n").put for csv(in => lines, sep => ", ");'

Sample Input:

a, b, c

Sample Output (all code above):

a
b
c

https://unix.stackexchange.com/a/701805/227738
https://raku.org

jubilatious1