
An example of my file looks like this:

201012,720,201011,119,201710,16

Output I want:

201012,720
201011,119
201710,16
Jeff Schaller
  • Please specify whether or not the lines of the file always have an even number of fields. – agc May 18 '19 at 18:19

7 Answers


Using a Sed loop:

sed -e 's/,/\n/2' -e 'P;D' file

Example:

$ echo '201012,720,201011,119,201710,16' | sed -e 's/,/\n/2' -e 'P;D'
201012,720
201011,119
201710,16

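This replaces the second , with \n, then prints and deletes up to the \n (P prints the pattern space up to the first newline, D deletes that part and restarts the cycle on the remainder). This repeats until the substitution is no longer successful.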
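To watch the loop work on several lines, including one with an odd number of fields, the same command can be wrapped in a small function. This is only a sketch: the name split_pairs is made up for illustration, and the newline is written as a backslash followed by a literal line break so that it does not rely on sed understanding \n in the replacement.

```shell
# split_pairs: hypothetical helper name, not part of the original answer.
# Reads comma-separated lines on stdin and prints two fields per line.
split_pairs() {
  # s/,/<newline>/2 : turn the second comma into a newline
  # P               : print the pattern space up to the first newline
  # D               : delete up to that newline and restart the cycle
  #                   on what is left, without reading new input
  sed -e 's/,/\
/2' -e 'P;D'
}

printf '%s\n' '201012,720,201011,119,201710,16' 'a,b,c' | split_pairs
```

The line with three fields ends up as a,b followed by a lone c, with no trailing comma added.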

BSD sed doesn't understand \n as a newline on the right-hand side of an s command. This workaround relies on the $'...' quoting available in ksh, bash and zsh:

sed -e :a -e $'s/,/\\\n/2' -e 'P;D' file

Or, a general solution for (old) seds:

sed '
:a
s/,/\
/2
P;D
' file
steeldriver
$ paste -d, - - < <( tr ',' '\n' <file )
201012,720
201011,119
201710,16

or, without the process substitution,

$ tr ',' '\n' <file | paste -d, - -
201012,720
201011,119
201710,16

This replaces all commas in the file with newlines using tr, then uses paste to create two columns separated by a comma from that.

If tr feels a bit too simple, you may replace it with sed 'y/,/\n/', which does the same thing.
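One thing worth checking with this approach is that it treats the whole file as a single stream of fields, so pairs can run across input lines. A small sketch of that (pairs_paste is a hypothetical name used only here):

```shell
# pairs_paste: hypothetical helper name used only for this illustration.
pairs_paste() {
  # tr puts every field on its own line; paste then joins lines two at a time.
  tr ',' '\n' | paste -d, - -
}

# Two input lines with three fields each: the pairs run across the line break.
printf '%s\n' '1,2,3' '4,5,6' | pairs_paste
```

Here the 3 from the first line is paired with the 4 from the second. If the total number of fields is odd, paste pads the missing value, so the last line ends with a comma.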

Kusalananda

I was able to accomplish this with the following awk command:

awk -F, -v OFS=, '{for (i=1;i<=NF;i=i+2) {j=i+1; print $i,$j}}' input

This will loop through each column in the input (incrementing by 2 each iteration) and print that column plus the next adjacent column on a line before moving to the next.

$ cat input
201012,720,201011,119,201710,16
$ awk -F, -v OFS=, '{for (i=1;i<=NF;i=i+2) {j=i+1; print $i,$j}}' input
201012,720
201011,119
201710,16
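With an odd number of fields, the command above prints the last field followed by a comma, because $j is empty. If that is not wanted, a possible variant is sketched below; pairs_awk is a hypothetical name, and this is a modification of the answer's command rather than the command itself.

```shell
# pairs_awk: hypothetical helper name; a variant, not the answer's exact command.
pairs_awk() {
  awk -F, -v OFS=, '{
    for (i = 1; i <= NF; i += 2) {
      if (i < NF) print $i, $(i + 1)   # a full pair
      else        print $i             # lone last field, no trailing comma
    }
  }'
}

printf '1,2,3\n' | pairs_awk
```

Whether the trailing comma should be kept depends on whether a missing field is meant to count as an empty one.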
jesse_b
  • If the number of comma-separated fields is odd, a line with a trailing comma will be printed (which should be "the best we can do"). –  May 18 '19 at 15:38
  • @Isaac: A line with a trailing comma should be what you want to indicate there is an empty column in a csv file. – jesse_b May 18 '19 at 15:48
  • Maybe, if you think of a missing field as equal to an empty field. But I think that 1,2,3,"" is not the same as 1,2,3. ... And the sed solution does not generate that trailing comma. –  May 18 '19 at 16:07
  • In a csv an empty field is a missing field. You are right though. 1 is not the same as 1,"". One is a csv and the other is not. – jesse_b May 18 '19 at 16:10
  • This is especially important when an earlier column is empty. For example ,,value1,value2 would be necessary to ensure the values end up in the proper column – jesse_b May 18 '19 at 16:14
  • If a "proper csv" is required then your answer is probably correct. I cannot find a mention of "csv", nor of "proper csv", in the question though. I may be confused. –  May 18 '19 at 16:19
  • OP is dealing with comma separated values (CSV), and is trying to create a file with two columns – jesse_b May 18 '19 at 16:20

Using xargs and printf:

xargs -d, printf '%s,%s\n' < file

Output:

201012,720
201011,119
201710,16

The above code assumes each line has an even number of fields. If not, xargs will print lone numbers and dangling commas. But this somewhat slower code should plow through most anything:

tr , '\n' < file | xargs -n2 printf '%s,%s\n' | sed '$s/,$//'

This can be sped up by increasing -n2 to some reasonably large even number. For example, suppose no number in the input is longer than 15 digits:

m=$(getconf ARG_MAX)
m=$(( m / 16 ))
m=$(( m - m % 2 ))   # round down to an even number so no pair is split
tr , '\n' < file | xargs -n"${m}" printf '%s,%s\n' | sed '$s/,$//'
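To check the odd-field behaviour of the slower -n2 pipeline, it can be wrapped in a function and fed three fields. This is a sketch; pairs_xargs is a made-up name.

```shell
# pairs_xargs: hypothetical helper name for illustration.
pairs_xargs() {
  # One field per line, two arguments per printf call, then strip the
  # comma left on the last line when the field count is odd.
  tr , '\n' | xargs -n2 printf '%s,%s\n' | sed '$s/,$//'
}

printf '1,2,3\n' | pairs_xargs
```

Note that without -d, xargs gives quotes and backslashes in its input a special meaning, so this is best kept to plain data such as the numbers in the question.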
agc

Another sed solution:

sed 's/\([^,]*,[^,]*\),/\1\n/g' file

This replaces each second comma with a newline.

Freddy
awk -F'\n' -vRS=, '{l=$1; $0=""; getline; print l RS $1}'

or

awk -F'\n' -vRS=, '{print $1 RS (getline > 0 ? $1 : "")}'

You can omit the -F'\n' if the fields don't contain spaces. Or set it to the same value as the record separator (e.g. with -F,) if your fields may also contain newlines (for example, if in the output of echo 1,2,3,4 the last field should be 4\n, not 4).

  • Only prints even field count. Why are not-even fields changed? –  May 18 '19 at 17:26
  • Try echo $'201012,720,201011,119,201710,16,201705\n201709,115,201708,23\n201707' | awk -F'\n' -vRS=, '{l=$1; $0=""; getline; print l","$1}'. –  May 18 '19 at 18:07

You could use grep. Beware that not all versions support -o, and that this will not work for an odd number of fields (the last, unpaired field is dropped):

grep -E -o '[^,]+,[^,]+' file

Or some overkill

gawk 'BEGIN{FPAT="[^,]+,[^,]+";OFS="\n"}; {$1=$1; print}' file
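For the grep command, one possible way to keep a lone last field is to make the second half of the pattern optional. This is a sketch rather than part of the answer: pairs_grep is a hypothetical name, and it still needs a grep that supports -o.

```shell
# pairs_grep: hypothetical helper name for illustration.
pairs_grep() {
  # A field, optionally followed by a comma and a second field.
  grep -E -o '[^,]+(,[^,]+)?'
}

printf '1,2,3\n' | pairs_grep
```

Like the original pattern, this skips empty fields, so it only suits input where every field has at least one character.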
iruvar