Text processing - join every two lines with commas

Question

I have more than 1000 lines in a file. The file starts as follows (line numbers added):

Station Name
Station Code
A N DEV NAGAR
ACND
ABHAIPUR
AHA
ABOHAR
ABS
ABU ROAD
ABR

I need to convert this into a file of comma-separated entries by joining every two lines. The final data should look like:

Station Name,Station Code
A N DEV NAGAR,ACND
ABHAIPUR,AHA
ABOHAR,ABS
ABU ROAD,ABR
...

What I tried was writing a shell script that echoes each pair of lines with a comma in between. But I guess a simpler, more effective one-liner would do the job here, maybe in sed or awk.

Any ideas?

@l0b0 You edited out the OP's remark that the line numbers are "only there for explanation"... — jasonwryan, Oct 18 '12 at 09:15
@jasonwryan Sorry, I thought the lines were there for explanation. Parse error at line 0. — l0b0, Oct 18 '12 at 09:27
http://stackoverflow.com/questions/9605232/merge-two-lines-into-one — Ciro Santilli OurBigBook.com, Aug 15 '16 at 10:50

score 56 · Accepted Answer · edited Oct 19 '12 at 06:57

Simply use cat (if you like cats ;-)) and paste:

cat file.in | paste -d, - - > file.out

Explanation: paste reads from a number of files and pastes together the corresponding lines (line 1 from first file with line 1 from second file etc):

paste file1 file2 ...

Instead of a file name, we can use - (dash). paste takes first line from file1 (which is stdin). Then, it wants to read the first line from file2 (which is also stdin). However, since the first line of stdin was already read and processed, what now waits on the input stream is the second line of stdin, which paste happily glues to the first one. The -d option sets the delimiter to be a comma rather than a tab.
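
A small sketch of that behaviour, using sample lines generated by printf rather than a real file. The function name join_pairs is only an illustrative assumption, not something from the answer:

```shell
# Hypothetical helper: join every two lines of stdin with a comma.
join_pairs() {
    # Both "-" operands refer to the same stdin, so paste alternates:
    # odd lines fill the first column, even lines fill the second.
    paste -d, - -
}

printf '%s\n' 'Station Name' 'Station Code' 'ABOHAR' 'ABS' | join_pairs
# prints:
# Station Name,Station Code
# ABOHAR,ABS
```

Adding a third dash (paste -d, - - -) would join every three lines in the same way.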

Alternatively, do

cat file.in | sed "N;s/\n/,/" > file.out

P.S. Yes, one can simplify the above to

< file.in sed "N;s/\n/,/" > file.out

or

< file.in paste -d, - - > file.out

which has the advantage of not using cat.

However, I did not use this idiom on purpose, for clarity reasons -- it is less verbose and I like cat (CATS ARE NICE). So please do not edit.

Alternatively, if you prefer paste to cats (paste is the command to concatenate files horizontally, while cat concatenates them vertically), you may use:

paste file.in | paste -d, - -

edited Oct 19 '12 at 06:57

Stéphane Chazelas

544,893

answered Oct 17 '12 at 17:54

January

1,937

Just to mention it again. Line numbers are not a part of file :) – mtk Oct 17 '12 at 18:06
The paste command perfectly works, can you please give a little more explanation about it. The hyphens ??? – mtk Oct 17 '12 at 18:08
The hyphens mean "read from stdin". If the same input source is repeated, paste knows to read from it several times per row of output. – dubiousjim Oct 17 '12 at 18:13
@sch: cool edit, I won't touch it :-) – January Oct 18 '12 at 10:35
With respect to your cat argument. Does sed "N;s/\n/,/" file.in > file.out not work? – Bernhard Oct 18 '12 at 11:13
Of course it does. As I wrote, I use that particular form for clarity. – January Oct 18 '12 at 11:14
This type of construction is why I absolutely love the UNIX, "do one thing and do it well" mentality. This just about defines elegance and simplicity. – 0xACE Jul 25 '19 at 20:39

score 15 · Answer 2 · answered Jan 22 '14 at 21:15

In case anyone landing here is looking to combine all lines into a CSV one liner, try

cat file | tr '\n' ','
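
One thing to be aware of: tr also converts the final newline, so the result ends with a trailing comma and no newline. A possible alternative, sketched here on inline sample data (the sample function is just a stand-in for a real file), is paste in serial mode:

```shell
# Stand-in for a real input file.
sample() { printf '%s\n' ABS ABR AHA; }

# tr turns every newline, including the last one, into a comma.
with_tr=$(sample | tr '\n' ',')

# paste -s joins all lines of its input serially, using the -d delimiter
# between them and ending the output with a single newline.
with_paste=$(sample | paste -sd, -)

printf '%s\n' "$with_tr" "$with_paste"
# prints:
# ABS,ABR,AHA,
# ABS,ABR,AHA
```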

answered Jan 22 '14 at 21:15

Darren Weber

251

score 5 · Answer 3 · answered Oct 18 '12 at 02:17

sed 'N;s/\n/,/' file

Using sed, join(N) every 2 lines, and replace the newline(\n) with ",".
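
If the file has an odd number of lines, N has nothing to append on the last line, and sed implementations differ in what they do then (GNU sed prints the unpaired line, while POSIX specifies quitting without printing it). A hedged variant that guards N with $! sidesteps the difference. The function name below is only illustrative:

```shell
# Hypothetical helper: like 'N;s/\n/,/' but safe for an unpaired last line.
join_pairs_sed() {
    # $!N: append the next line unless we are already on the last line.
    # s/\n/,/: replace the embedded newline (if any) with a comma.
    sed '$!N;s/\n/,/'
}

# Three input lines: the last one has no partner.
printf '%s\n' 'ABOHAR' 'ABS' 'ABU ROAD' | join_pairs_sed
# prints:
# ABOHAR,ABS
# ABU ROAD
```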

answered Oct 18 '12 at 02:17

Guru

5,905

This is exactly what I was looking for! – RonJohn Jun 22 '22 at 19:06

score 4 · Answer 4 · edited Oct 19 '12 at 09:03

For the complete set of answers, a possible awk solution may be:

awk 'NR%2==1 {printf $0","} NR%2==0 { print $0}' file

edited Oct 19 '12 at 09:03

lurker

327

answered Oct 18 '12 at 11:11

Bernhard

12,272

@downvoter: What is wrong with my answer to deserve a downvote? How can it be improved? – Bernhard Oct 18 '12 at 20:38
Maybe because the lazy printf? Will fail in the rare case when a station name contains a format specifier. (See http://pastebin.com/wgxFttrJ for an example.) But this is just a guess, the downvote is not from me. – manatwork Oct 19 '12 at 09:54
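
A sketch of how the printf concern could be addressed: pass the line as an argument to a fixed "%s," format instead of using it as the format string. The function name and the sample station name are made up for illustration:

```shell
# Hypothetical helper: awk pair-joiner that never treats data as a format.
join_pairs_awk() {
    awk '
        NR % 2 == 1 { printf "%s,", $0 }  # odd line: emit it plus a comma
        NR % 2 == 0 { print }             # even line: emit it and end the row
    '
}

# The first line contains "%s", which would be read as a format specifier
# if the line itself were used as the printf format.
printf '%s\n' '100%sure' 'SURE' | join_pairs_awk
# prints:
# 100%sure,SURE
```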

score 3 · Answer 5 · edited Aug 29 '15 at 01:45

paste -sd ',\n' file.in > file.out

Also note that because we're merely replacing one character with another (every other newline with a comma), we can work on the input file in place:

paste -sd ',\n' file.in 1<> file.in

(but beware it might not work on non-Unix systems that have CRLF terminators (like Microsoft ones) that some emulated POSIX paste might treat in a non-Unix way)
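
A minimal sketch of the in-place variant, run against a throwaway temporary file so that nothing real is overwritten. It assumes Unix (LF) line endings and an even number of lines; the variable names are illustrative:

```shell
# Work on a temporary copy rather than real data.
tmp=$(mktemp)
printf '%s\n' 'ABOHAR' 'ABS' 'ABU ROAD' 'ABR' > "$tmp"

# -s joins the lines serially; the delimiter list ",\n" is cycled, so the
# separators used are comma, newline, comma, newline, ...
# 1<> opens stdout on the file for reading and writing without truncating
# it, so paste overwrites bytes it has already read. The output has exactly
# the same length as the input.
paste -sd ',\n' "$tmp" 1<> "$tmp"

result=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
# prints:
# ABOHAR,ABS
# ABU ROAD,ABR
```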

edited Aug 29 '15 at 01:45

cuonglm

153,898

answered Oct 17 '12 at 18:11

Stéphane Chazelas

544,893

What is that 1 doing here in 1<>? Is that a typo? – αғsнιη May 19 '18 at 02:41
@αғsнιη, see this – iruvar Apr 09 '19 at 04:24

score 2 · Answer 6 · edited Oct 18 '12 at 06:31

Here is a one-liner (though potentially millions-of-commands-run-er) using pure Bash:

(IFS=; while read -r name; do read -r code; printf '%s\n' "$name,$code"; done < file.in) > file.out

I use a subshell (the parentheses) so that I don't have to store and restore IFS, which one otherwise should do so as not to mess up the user's environment in case the script is sourced. The alternative would be to pass the new IFS only to read, as in IFS= read -r name and IFS= read -r code.

The fact that all the commands in the loop are built in the shell makes its performance acceptable and is even faster than the other solutions for small files. But many people would consider it bad practice and one should be careful when generalising it to anything else.
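
The alternative mentioned above, with IFS= passed only to each read, might look like the following sketch. The function name join_pairs_sh is an assumption made for illustration:

```shell
# Hypothetical helper: pure-shell pair joiner reading from stdin.
join_pairs_sh() {
    # IFS= applies only to each read, so the caller's IFS is untouched and
    # no subshell is needed; -r keeps backslashes in the data literal.
    while IFS= read -r name; do
        IFS= read -r code
        printf '%s,%s\n' "$name" "$code"
    done
}

printf '%s\n' 'A N DEV NAGAR' 'ACND' 'ABHAIPUR' 'AHA' | join_pairs_sh
# prints:
# A N DEV NAGAR,ACND
# ABHAIPUR,AHA
```

With an odd number of input lines, the second read fails on the last iteration and the final row is printed with an empty code field.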

edited Oct 18 '12 at 06:31

Stéphane Chazelas

544,893

answered Oct 17 '12 at 18:25

Deleted

409

In general, yay for using subshells to localize environment changes. But in this case it's not needed: you can instead do while IFS='\n' read -r name; do IFS='\n' read -r code ... done < file.in, which is an idiom I often see in shell scripts. The -r flag to read means "interpret the character '\' followed by the character 'n' in the stdin stream as two characters, rather than as a newline." Arguably, it may be more aesthetic to create the subshell as you do than to repeat the IFS='\n'. – dubiousjim Oct 17 '12 at 22:10
@dubiousjim: The -r improved the solution technically. Great! I'm not a fan of the idea of passing a changed IFS twice. If I had used one read, super nice, but not twice. Of course that's a matter of opinion. Using a subshell is a bit over the general Bash knowledge I would say, so a lot of folks will have trouble understanding its purpose. That's a bad thing. – Deleted Oct 17 '12 at 22:41

score 1 · Answer 7 · answered Aug 29 '15 at 00:38

Hoary old chestnut of an awk idiom:

awk '{ORS=NR%2?",":"\n";print}' file

Output:

Station Name,Station Code
A N DEV NAGAR,ACND
ABHAIPUR,AHA
ABOHAR,ABS
ABU ROAD,ABR

answered Aug 29 '15 at 00:38

iruvar

16,725

awk '{ORS=NR%2?",":"\n"};1' is shorter and more idiomatic – cuonglm Aug 29 '15 at 01:48
@cuonglm, i doubt it. In this instance it's still a one-liner despite the print and the intent is clear. 1 is just as clear to old awk hands such as myself but I prefer print – iruvar Aug 29 '15 at 02:09
This was the first simple solution that I found that was easily configurable to more than 2 lines. I fought with sed for a while before searching, but awk made combining every 4 lines easier. Saved me a trip to the $EDITOR! – opello Feb 26 '16 at 17:37
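
As a sketch of that generalisation, the group size can be passed in as an awk variable. The function name join_every and the variable n are assumptions made for this example:

```shell
# Hypothetical helper: join every n lines of stdin with commas.
join_every() {
    # $1 is the group size. ORS stays "," until every n-th record,
    # where it becomes a newline and ends the row.
    awk -v n="$1" '{ ORS = (NR % n) ? "," : "\n"; print }'
}

seq 1 8 | join_every 4
# prints:
# 1,2,3,4
# 5,6,7,8
```

If the number of input lines is not a multiple of n, the last row ends with a comma and no newline.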

score 1 · Answer 8 · answered Oct 25 '16 at 17:40

For example:

seq 0 70 | xargs -L 2 | sed 's/ /,/g'

Note: xargs -L number_of_columns works nicely with almost any number of columns, not just two lines per row.
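
One caveat for the station data in the question: xargs joins the lines with spaces, so replacing every space afterwards also splits names that contain spaces. A quick comparison on one sample pair (the helper name row is illustrative only):

```shell
# One name/code pair where the name contains a space.
row() { printf '%s\n' 'ABU ROAD' 'ABR'; }

# xargs -L 2 echoes both lines as space-separated words; sed then turns
# every space into a comma, including the one inside the name.
via_xargs=$(row | xargs -L 2 | sed 's/ /,/g')

# paste only inserts a delimiter where a newline used to be.
via_paste=$(row | paste -d, - -)

printf '%s\n' "$via_xargs" "$via_paste"
# prints:
# ABU,ROAD,ABR
# ABU ROAD,ABR
```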

score 0 · Answer 9 · answered May 19 '18 at 01:57

POSIX solution with pr:

pr -2 -a -t -s, file

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pr.html

answered May 19 '18 at 01:57

Zombo


score 0 · Answer 10 · answered Oct 18 '12 at 00:40

Possible with perl too,

perl -pe 's/^\d+\.\s+//;$.&1?chomp:print","' file

answered Oct 18 '12 at 00:40

daisy

54,555
