Separate the element after every 2nd ',' and push into next row in bash

Question

An example of my file looks like this:

201012,720,201011,119,201710,16

Output I want:

201012,720
201011,119
201710,16

Please specify whether or not the lines of the file always have an even number of fields. — agc, May 18 '19 at 18:19

score 5 · Accepted Answer · edited May 19 '19 at 18:01

Using a Sed loop:

sed -e 's/,/\n/2' -e 'P;D' file

Ex.

$ echo '201012,720,201011,119,201710,16' | sed -e 's/,/\n/2' -e 'P;D'
201012,720
201011,119
201710,16

This replaces the second , with \n, then prints (P) and deletes (D) the pattern space up to and including that \n. Because D restarts the cycle on whatever is left, this repeats until the substitution no longer succeeds.

BSD sed doesn't understand \n as a newline on the right-hand side of s commands. This is a workaround for the ksh, bash and zsh shells:

sed -e :a -e $'s/,/\\\n/2' -e 'P;D' file

Or, a general solution for (old) seds:

sed '
:a
s/,/\
/2
P;D
' file
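
As a quick check of how the loop behaves on several lines, including one with an odd number of fields, here is a minimal sketch using the backslash-newline form from above. The function name split_pairs and the sample input are only illustrative assumptions, not part of the original answer.

```shell
# split_pairs is a hypothetical wrapper name used only for this sketch.
split_pairs() {
  # Replace the 2nd comma with a newline, print up to that newline (P),
  # then delete that part (D) and restart the cycle on what is left.
  sed '
s/,/\
/2
P;D
' "$@"
}

# Made-up sample input: the first line has an odd number of fields.
printf '1,2,3\n4,5,6,7\n' | split_pairs
```

The leftover field of the first line (3) should come out on a line of its own rather than being merged with the next input line.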

answered May 18 '19 at 14:29

steeldriver

81,074

Able to process multiline files even if the number of fields on a line is odd (not-even): +1 – May 18 '19 at 15:42
Doesn't work with BSD sed – jesse_b May 18 '19 at 15:47
@Isaac: It just prints it all on one line like: 201012,720n201011,119,201710,16 – jesse_b May 18 '19 at 17:38
@Jesse_b Ah, when is bsd going to accept \n as a new line? sigh ... – May 18 '19 at 17:44
: unescaped newline inside substitute pattern – jesse_b May 18 '19 at 17:45
@Jesse_b Since it is bash based: Please try this: sed -e :a -e $'s/,/\\\n/2' -e 'P;D;ta' file – May 18 '19 at 17:51
@steeldriver the ta is never executed since the pattern space never survives the D. We can add an extra command after the ta to verify. So, this will suffice : sed -e 's/,/\n/2' -e 'P;D' – Rakesh Sharma May 18 '19 at 19:28
@Jesse_b use this on bsd: sed -e 'y/,/\n/' -e 's/\n/,/' -e 'P;D' to work around the newline limitation. – Rakesh Sharma May 18 '19 at 19:40
@RakeshSharma thanks - I always forget that D starts a new cycle – steeldriver May 18 '19 at 21:08

score 5 · Answer 2 · answered May 18 '19 at 14:31

$ paste -d, - - < <( tr ',' '\n' <file )
201012,720
201011,119
201710,16

or, without the process substitution,

$ tr ',' '\n' <file | paste -d, - -
201012,720
201011,119
201710,16

This replaces all commas in the file with newlines using tr, then uses paste to create two columns separated by a comma from that.

If tr feels a bit too simple, you may replace it with sed 'y/,/\n/', which does the same thing.
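
To see what the tr and paste pipeline does when the input has more than one line, here is a small sketch. The function name pair_fields and the sample data are assumptions made for illustration only.

```shell
# pair_fields is a hypothetical name for the tr | paste pipeline shown above.
pair_fields() {
  # Turn every comma into a newline, then join the lines two at a time.
  tr ',' '\n' | paste -d, - -
}

# Made-up input: two lines with three fields each.
printf '1,2,3\n4,5,6\n' | pair_fields
```

Since tr erases the difference between field separators and line separators, pairs are formed across the original line boundaries: in this input, 3 and 4 end up on the same output line.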

Kusalananda

333,661

Not a multi-line solution. If the file contains a line with non-even number of elements, it will be joined with next line. – May 18 '19 at 15:40
@Isaac That, I presume, would indicate an issue with the data. – Kusalananda May 18 '19 at 15:57
Which a robust solution should expose clearly in the output. Me thinks. – May 18 '19 at 17:41
Nice. Alternately pipe tr into paste -sd ',\n' – iruvar May 18 '19 at 20:04

score 4 · Answer 3 · answered May 18 '19 at 14:33

I was able to accomplish this with the following awk command:

awk -F, -v OFS=, '{for (i=1;i<=NF;i=i+2) {j=i+1; print $i,$j}}' input

This will loop through each column in the input (incrementing by 2 each iteration) and print that column plus the next adjacent column on a line before moving to the next.

$ cat input
201012,720,201011,119,201710,16
$ awk -F, -v OFS=, '{for (i=1;i<=NF;i=i+2) {j=i+1; print $i,$j}}' input
201012,720
201011,119
201710,16
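
If a trailing comma is not wanted for a line with an odd number of fields, one possible variation is sketched below. The function name pairs_awk and the extra branch are assumptions added for illustration; they are not part of the command above.

```shell
# pairs_awk is a hypothetical name; the else branch is an addition to the
# original one-liner, which always prints two fields.
pairs_awk() {
  awk -F, -v OFS=, '{
    for (i = 1; i <= NF; i += 2) {
      if (i < NF) print $i, $(i + 1)   # a complete pair
      else        print $i             # last field of an odd-length line
    }
  }' "$@"
}

# Made-up input with three fields.
printf '1,2,3\n' | pairs_awk
```

Whether the lone field should be printed bare or with a trailing comma depends on whether the output is meant to be a strict two-column CSV.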

jesse_b

37,005

If the number of fields (comma separated) is not-even: a line with a trailing comma will be printed (which should be "the best we can do"). – May 18 '19 at 15:38
@Isaac: A line with a trailing comma should be what you want to indicate there is an empty column in a csv file. – jesse_b May 18 '19 at 15:48
Maybe, if you think of a missing field as equal to an empty field. But I think that 1,2,3,"" is not the same as 1,2,3. ... And the sed solution does not generate that trailing comma. – May 18 '19 at 16:07
In a csv an empty field is a missing field. You are right though. 1 is not the same as 1,"". One is a csv and the other is not. – jesse_b May 18 '19 at 16:10
This is especially important when an earlier column is empty. For example ,,value1,value2 would be necessary to ensure the values end up in the proper column – jesse_b May 18 '19 at 16:14
If a "proper csv" is required then your answer is probably correct. I cannot find a mention of "csv", nor of "proper csv", in the question though. I may be confused. – May 18 '19 at 16:19
OP is dealing with comma separated values (CSV), and is trying to create a file with two columns – jesse_b May 18 '19 at 16:20

agc · Answer 4 · 2019-05-18T23:56:02.253

Using xargs and printf:

xargs -d, printf '%s,%s\n' < file

Output:

201012,720
201011,119
201710,16

The above code assumes each line has an even number of fields. If not, xargs will print lone numbers and dangling commas. But this somewhat slower code should plow through most anything:

tr , '\n' < file | xargs -n2 printf '%s,%s\n' | sed '$s/,$//'

Which can be sped up by increasing -n2 to some reasonable maximum even number, e.g. suppose no number in the input is longer than 15 digits:

m=$(getconf ARG_MAX); m=$(( m / 16 )); m=$(( m - m % 2 ))  # round down to an even count
tr , '\n' < file | xargs -n"${m}" printf '%s,%s\n' | sed '$s/,$//'
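
The slower -n2 pipeline above can be checked against an odd number of fields with a short sketch like this one. The function name pairs_xargs and the sample input are illustrative assumptions.

```shell
# pairs_xargs is a hypothetical name wrapping the -n2 pipeline from above.
pairs_xargs() {
  # One field per line, two fields per printf call, then strip a
  # dangling comma from the last output line only.
  tr , '\n' | xargs -n2 printf '%s,%s\n' | sed '$s/,$//'
}

# Made-up input with an odd number of fields.
printf '1,2,3\n' | pairs_xargs
```

Note that sed only cleans the very last output line, and that fields are paired across input lines, because tr has already turned the whole file into one field per line.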

score 1 · Answer 5 · answered May 18 '19 at 17:53

Another sed solution:

sed 's/\([^,]*,[^,]*\),/\1\n/g' file

This replaces each second comma with a newline.

Freddy

25,565

score 0 · Answer 6 · 2019-05-18T19:05:09.133

awk -F'\n' -vRS=, '{l=$1; $0=""; getline; print l RS $1}'

or

awk -F'\n' -vRS=, '{print $1 RS (getline > 0 ? $1 : "")}'

You can omit the -F'\n' if the fields don't contain spaces. Or set it to the same value as the record separator (eg. with -F,) if your fields may also contain newlines (if eg. in the output of echo 1,2,3,4 the last field should be 4\n, not 4).

edited May 18 '19 at 19:05

answered May 18 '19 at 14:43

Only prints even field count. Why are not-even fields changed? – May 18 '19 at 17:26
Try echo $'201012,720,201011,119,201710,16,201705\n201709,115,201708,23\n201707' | awk -F'\n' -vRS=, '{l=$1; $0=""; getline; print l","$1}'. – May 18 '19 at 18:07

iruvar · Answer 7 · 2019-05-18T22:57:39.723

You could use grep. Beware that not all versions support -o, and that this will not work for an odd number of fields:

grep -E -o '[^,]+,[^,]+' file

Or some overkill

gawk 'BEGIN{FPAT="[^,]+,[^,]+";OFS="\n"}; {$1=$1; print}' file
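
The odd-field caveat for the grep command can be seen with a short sketch. The function name pairs_grep and the sample input are assumptions for illustration, and a grep that supports -o is required.

```shell
# pairs_grep is a hypothetical name for the grep command shown above.
pairs_grep() {
  # -o prints each match on its own line; a match is two non-empty
  # fields separated by one comma.
  grep -E -o '[^,]+,[^,]+' "$@"
}

# Made-up input: three fields, so the last one has no partner.
printf '1,2,3\n' | pairs_grep
```

The unpaired 3 is silently dropped here, which is why this approach only suits input where every line has an even number of fields.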

edited May 18 '19 at 22:57

answered May 18 '19 at 22:50

iruvar

16,725
