In Bash, how to print row into a single line using a delimiter?

Question

I've collected data with three fields, but the third field sometimes wraps onto the next line. I want each record, including its whole third field, printed on a single line. This is the data I'm getting:

$ cat file
1234  1234  dei_1/3,dei_2/3,dei_9/0,
dei_10/0,dei_8/4
2345  2345  dei_8/9,dei_5/6,dei_4/9
4244  4244  dei_0/9,dei_4/6,dei_4/1
4235  4235  dei_0/9,dei_4/6,dei_4/,de
i_9/7,dei_1/3,dei_2/3,dei_9/0

Expected Result:

1234  1234  dei_1/3,dei_2/3,dei_9/0,dei_10/0,dei_8/4
2345  2345  dei_8/9,dei_5/6,dei_4/9
4244  4244  dei_0/9,dei_4/6,dei_4/1
4235  4235  dei_0/9,dei_4/6,dei_4/,dei_9/7,dei_1/3,dei_2/3,dei_9/0

Code I have so far:

while read file; do if [[ $file == 1 ]]; then echo -n; fi; done

Can you show how you collect and print the data? That's what needs to get fixed. — choroba, May 29 '18 at 07:25
Similar, but not quite the same: https://unix.stackexchange.com/questions/429314/how-to-merge-lines-broken-by-newlines-inside-a-double-quoted-field — Kusalananda, May 29 '18 at 09:29

score 1 · Answer 1 · answered May 29 '18 at 07:37

The following script joins any line that doesn't start with two numeric fields onto the previous line:

$ awk -v ORS="" '$1~/^[0-9]+$/ && $2~/^[0-9]+$/ && NR>1{printf "\n"}1' file
1234  1234  dei_1/3,dei_2/3,dei_9/0,dei_10/0,dei_8/4
2345  2345  dei_8/9,dei_5/6,dei_4/9
4244  4244  dei_0/9,dei_4/6,dei_4/1
4235  4235  dei_0/9,dei_4/6,dei_4/,dei_9/7,dei_1/3,dei_2/3,dei_9/0

This relies on ORS (the output record separator) being set to an empty string, so awk does not emit a newline after each line on its own. A newline is printed explicitly before any line whose first two fields are numbers, except on the first line.

You might want to add END {print ""} to ensure the output ends with a newline. — glenn jackman, May 29 '18 at 14:45
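A minimal sketch of that suggestion, assuming the sample data from the question. One caveat: because ORS is empty here, `print ""` in the END block would emit nothing, so this sketch uses `printf "\n"` instead. The variable names `input` and `joined` are only for illustration.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
joined=$(mktemp)
cat > "$input" <<'EOF'
1234  1234  dei_1/3,dei_2/3,dei_9/0,
dei_10/0,dei_8/4
2345  2345  dei_8/9,dei_5/6,dei_4/9
4244  4244  dei_0/9,dei_4/6,dei_4/1
4235  4235  dei_0/9,dei_4/6,dei_4/,de
i_9/7,dei_1/3,dei_2/3,dei_9/0
EOF

awk -v ORS="" '
    # A line whose first two fields are numbers starts a new record,
    # so close the previous record with a newline first.
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && NR > 1 { printf "\n" }
    # Print every line without a trailing newline (ORS is empty).
    1
    # Terminate the last record, unless the input was empty.
    END { if (NR) printf "\n" }
' "$input" > "$joined"

cat "$joined"
```

With the trailing newline in place, tools such as `wc -l` count the last record as a full line.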

score 1 · Answer 2 · answered May 29 '18 at 08:13


Short sed approach:

sed -E 'N; s/\n([^[:space:]]*,[^[:space:]]+)/\1/' file

The output:

1234  1234  dei_1/3,dei_2/3,dei_9/0,dei_10/0,dei_8/4
2345  2345  dei_8/9,dei_5/6,dei_4/9
4244  4244  dei_0/9,dei_4/6,dei_4/1
4235  4235  dei_0/9,dei_4/6,dei_4/,dei_9/7,dei_1/3,dei_2/3,dei_9/0

RomanPerekhrest

Edge case: will not join lines if the trailing line does not contain a comma. – glenn jackman May 29 '18 at 14:46
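One possible way around that edge case, offered as a sketch rather than as the answerer's command: loop inside sed and join every line that does not begin with a digit, whether or not it contains a comma. This assumes that a continuation line never starts with a digit, which holds for the sample data but is not guaranteed in general. The two-record input below is made up to exercise the comma-free case.

```shell
# Hypothetical input: the continuation line "dei_8/4" contains no comma.
input=$(mktemp)
cat > "$input" <<'EOF'
1234  1234  dei_1/3,dei_2/3,
dei_8/4
2345  2345  dei_8/9
EOF

# :a     label for the loop
# $!N    append the next input line unless we are already on the last one
# s///   drop the newline when the appended line starts with a non-digit
# ta     if a join happened, loop and try to pull in one more line
# P; D   print the finished record, keep the remainder for the next cycle
result=$(sed -E -e ':a' -e '$!N' -e 's/\n([^0-9])/\1/' -e 'ta' -e 'P' -e 'D' "$input")

printf '%s\n' "$result"
```

Because the loop keeps appending until it sees a line that starts with a digit, it also handles records that are split across more than two lines.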

score 0 · Answer 3 · answered May 29 '18 at 14:43

A couple of awk approaches:

Store the most recent line that starts with a digit, and append to it whenever the current line does not start with a digit:

awk '
    /^[[:digit:]]/ {if (prev) print prev; prev=$0; next} 
    {prev = prev $0} 
    END {if (prev) print prev}
' file

Reverse the file. If a line starts with a non-digit, save it, read the next line, and append the saved line to it. Then reverse the result. This assumes a record is split at most once:

tac file | awk '/^[^[:digit:]]/ {this = $0; getline; $0 = $0 this} 1' | tac
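Since the question's own attempt was a `while read` loop, the first approach (remember the pending record, append continuation lines to it) can also be sketched as a plain shell loop that runs in Bash. The function name `join_records` and the variable names are hypothetical, and like the awk version this assumes continuation lines never start with a digit.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
1234  1234  dei_1/3,dei_2/3,dei_9/0,
dei_10/0,dei_8/4
2345  2345  dei_8/9,dei_5/6,dei_4/9
4244  4244  dei_0/9,dei_4/6,dei_4/1
4235  4235  dei_0/9,dei_4/6,dei_4/,de
i_9/7,dei_1/3,dei_2/3,dei_9/0
EOF

join_records() {
    prev=""
    # "|| [ -n "$line" ]" also handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            [0-9]*)
                # A new record starts: flush the one collected so far.
                if [ -n "$prev" ]; then printf '%s\n' "$prev"; fi
                prev=$line
                ;;
            *)
                # Continuation line: glue it onto the pending record.
                prev=$prev$line
                ;;
        esac
    done
    # Flush the last pending record.
    if [ -n "$prev" ]; then printf '%s\n' "$prev"; fi
}

result=$(join_records < "$input")
printf '%s\n' "$result"
```

For large files the awk or sed versions are likely to be faster, since a shell loop reads and processes the input one line at a time.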
