1

I'm trying to expand on this question but can't figure out this issue:

Let's say I've got a file roll.txt:

echo "'123456789','987651234','129873645','213456789','987612345','543216789','432156789','876543291','213465789','542637819','123456','23456','22234','3456','7890543','34567891,'2345','567'" >> roll.txt

I can place a newline after every sixth comma with the following sed command:

sed 's/,/,\n/6; P; D' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789',
'432156789','876543291','213465789','542637819','123456','23456',
'22234','3456','7890543','34567891,'2345','567'

However, when I try to place two newlines after every sixth comma:

sed 's/,/,\n\n/6; P; D' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789',

'432156789','876543291','213465789','542637819','123456','23456',

'22234','3456','7890543','34567891,'2345','567'

I instead get two newlines after the sixth comma, and four newlines after the 12th comma. Why? And how can I get two newlines after every sixth comma?

jophuh
  • 2
    The P; D sequence only prints and deletes up to the first \n - leaving the second \n at the beginning of the pattern space. If you have a recent version of GNU sed, try running it with the --debug option to see what happens. – steeldriver Oct 13 '23 at 01:52

5 Answers

2

As steeldriver's comment explains, each cycle adds two newlines but prints and removes only one line. This gets worse for longer sequences, with 3, then 7, then 15 empty lines ...

So skip the replacement whenever the pattern space starts with a newline (that is, when its first line is empty):

sed '/^\n/!s/,/,\n\n/6; P; D'
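
To see the difference on something easier to read, here is a minimal sketch that breaks after every 2nd comma instead of every 6th. It assumes GNU sed (the \n in the replacement is a GNU extension); the input string and the variable names are made up for illustration only.

```shell
# Assumes GNU sed: \n in the replacement text is a GNU extension.
# Illustrative input; break after every 2nd comma instead of every 6th.
input='a,b,c,d,e,f,g'

# Unguarded: after D, a leftover newline stays at the start of the pattern
# space, the substitution fires again on the same text, and blank lines pile up.
unguarded=$(printf '%s\n' "$input" | sed 's/,/,\n\n/2; P; D')

# Guarded: skip the substitution while the pattern space starts with a newline.
guarded=$(printf '%s\n' "$input" | sed '/^\n/!s/,/,\n\n/2; P; D')

printf 'unguarded:\n%s\n\nguarded:\n%s\n' "$unguarded" "$guarded"
```

The guarded run should print exactly one empty line between groups, while the unguarded run prints a growing number of empty lines after each group.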
Philippos
2

With GNU awk for multi-char RS you could just define each record as being 6 non-commas-then-comma fields:

$ echo "'123456789','987651234','129873645','213456789','987612345','543216789','432156789','876543291','213465789','542637819','123456','23456','22234','3456','7890543','34567891,'2345','567'" |
awk -v RS='([^,]*,){0,6}' 'RT{print RT}'
'123456789','987651234','129873645','213456789','987612345','543216789',
'432156789','876543291','213465789','542637819','123456','23456',
'22234','3456','7890543','34567891,'2345',

If you want every output line to have exactly 6 fields, ending with a comma only when the last field is empty (so the output is valid CSV), you could do:

$ echo "'123456789','987651234','129873645','213456789','987612345','543216789','432156789','876543291','213465789','542637819','123456','23456','22234','3456','7890543','34567891,'2345','567'" |
awk -v n=6 'BEGIN{RS="([^,]*,){0,"n"}"; FS=OFS=","} RT{$0=gensub(/,$/,"",1,RT); $n=$n; print}'
'123456789','987651234','129873645','213456789','987612345','543216789'
'432156789','876543291','213465789','542637819','123456','23456'
'22234','3456','7890543','34567891,'2345',
Ed Morton
1

Using Raku (formerly known as Perl_6)

If you want to combine elements in Raku you can batch them together:

~$  raku -ne 'put join "\n", .split(",").batch(6).map: *.join(",");' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789'
'432156789','876543291','213465789','542637819','123456','23456'
'22234','3456','7890543','34567891,'2345','567'

So to get two newlines between each batch, just join on \n\n:

~$  raku -ne 'put join "\n\n", .split(",").batch(6).map: *.join(",");' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789'

'432156789','876543291','213465789','542637819','123456','23456'

'22234','3456','7890543','34567891,'2345','567'

Raku's batch function is equivalent to Raku's rotor(..., :partial) call. If you want to drop incomplete sets of 6 elements at the end just call rotor().

Finally, splitting doesn't always give you the answer you're looking for. In that case you can try combing through the data to extract the elements of interest. The code below produces the same layout as the answers above (except that it drops the malformed '34567891, element, which lacks a closing quote) and may be conceptually simpler. The only difficulty is that the ' apostrophe can mess up one-liner quoting, so the character can be declared using its Unicode name \c[APOSTROPHE]:

~$ raku -ne 'put join "\n\n", .comb(/ \c[APOSTROPHE] \d+ \c[APOSTROPHE] /).batch(6).map: *.join(",");'  roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789'

'432156789','876543291','213465789','542637819','123456','23456'

'22234','3456','7890543','2345','567'

https://unix.stackexchange.com/a/611077/227738
https://docs.raku.org/language/regexes
https://raku.org

jubilatious1
0

Try this:

echo "'123456789','987651234','129873645','213456789','987612345','543216789','432156789','876543291','213465789','542637819','123456','23456','22234','3456','7890543','34567891','2345','567'" > roll.txt

cat roll.txt | sed -E "s/([^,]*,){6}/\0\n\n/g"

You can change the number of items in each group by replacing the 6 with another number.

  • 1
    What is \0? Don't you simply mean s/([^,]*,){6}/&\n\n/g? – Philippos Oct 13 '23 at 09:23
  • 1
    \0 in the replacement is the whole expression that is matched, same thing as & – Fadi Chamieh Oct 13 '23 at 09:53
  • It doesn't say so in the POSIX specification. The \n in the replacement is a GNU extension, so you are probably safe to use other GNU extensions as well, but I see no reason for using some cryptic non-standard code instead of the well-known standard. – Philippos Oct 16 '23 at 05:20
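
For reference, the & form suggested in the comments can be sketched as follows. This assumes GNU sed, since \n in the replacement is a GNU extension; the short input and the variable name grouped are illustrative only, and the group size is 2 instead of 6 to keep the output small.

```shell
# Assumes GNU sed: \n in the replacement text is a GNU extension.
# & is the standard way to refer to the whole match in the replacement.
grouped=$(printf '%s\n' 'a,b,c,d,e' | sed -E 's/([^,]*,){2}/&\n\n/g')

printf '%s\n' "$grouped"
```

Each complete group of two comma-terminated items is followed by two newlines, and the trailing item that does not complete a group is left as-is.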
0

Using awk:

$ awk -F, '{for (i=1;i<NF;i++) printf "%s", $i FS ((i%6==0) ? ORS ORS: "") }END{print $NF; print ""}' file
'123456789','987651234','129873645','213456789','987612345','543216789',

'432156789','876543291','213465789','542637819','123456','23456',

'22234','3456','7890543','34567891,'2345','567'