47

I have more than 1000 lines in a file. The file starts as follows (line numbers added):

Station Name
Station Code
A N DEV NAGAR
ACND
ABHAIPUR
AHA
ABOHAR
ABS
ABU ROAD
ABR

I need to convert this into a file of comma-separated entries by joining every two lines. The final data should look like

Station Name,Station Code
A N DEV NAGAR,ACND
ABHAIPUR,AHA
ABOHAR,ABS
ABU ROAD,ABR
...

I was trying to write a shell script that echoes each pair of lines with a comma in between, but I guess a simpler, effective one-liner in sed or awk would do the job here.

Any ideas?

don_crissti
mtk

10 Answers

56

Simply use cat (if you like cats ;-)) and paste:

cat file.in | paste -d, - - > file.out

Explanation: paste reads from a number of files and pastes together the corresponding lines (line 1 from first file with line 1 from second file etc):

paste file1 file2 ...

Instead of a file name, we can use - (dash). paste takes first line from file1 (which is stdin). Then, it wants to read the first line from file2 (which is also stdin). However, since the first line of stdin was already read and processed, what now waits on the input stream is the second line of stdin, which paste happily glues to the first one. The -d option sets the delimiter to be a comma rather than a tab.
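To make the role of the dashes concrete, here is a minimal sketch. The function names join_pairs and join_triples are hypothetical, chosen only for this illustration; each "-" operand consumes one line of stdin per output row, so adding a third dash joins three lines instead of two.

```shell
# join_pairs: read lines on stdin and join each consecutive pair with a comma.
# (Hypothetical helper name, used only for this illustration.)
join_pairs() {
    paste -d, - -
}

# join_triples: same idea with three "-" operands, so three lines per row.
join_triples() {
    paste -d, - - -
}

# Sample input taken from the question.
printf '%s\n' 'Station Name' 'Station Code' 'ABOHAR' 'ABS' | join_pairs
printf '%s\n' 1 2 3 4 5 6 | join_triples
```

The first call should print the two comma-joined station rows, and the second one row per three input lines.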

Alternatively, do

cat file.in | sed "N;s/\n/,/" > file.out

P.S. Yes, one can simplify the above to

< file.in sed "N;s/\n/,/" > file.out

or

< file.in paste -d, - - > file.out

which has the advantage of not using cat.

However, I deliberately avoided this idiom for the sake of clarity, even though it is less verbose, and because I like cat (CATS ARE NICE). So please do not edit.

Alternatively, if you prefer paste to cats (paste is the command to concatenate files horizontally, while cat concatenates them vertically), you may use:

paste file.in | paste -d, - -
January
  • Just to mention it again. Line numbers are not a part of file :) – mtk Oct 17 '12 at 18:06
  • The paste command perfectly works, can you please give a little more explanation about it. The hyphens ??? – mtk Oct 17 '12 at 18:08
  • 3
    The hyphens mean "read from stdin". If the same input source is repeated, paste knows to read from it several times per row of output. – dubiousjim Oct 17 '12 at 18:13
  • @sch: cool edit, I won't touch it :-) – January Oct 18 '12 at 10:35
  • 1
    With respect to your cat argument. Does sed "N;s/\n/,/" file.in > file.out not work? – Bernhard Oct 18 '12 at 11:13
  • Of course it does. As I wrote, I use that particular form for clarity. – January Oct 18 '12 at 11:14
  • 1
    This type of construction is why I absolutely love the UNIX, "do one thing and do it well" mentality. This just about defines elegance and simplicity. – 0xACE Jul 25 '19 at 20:39
15

In case anyone landing here is looking to combine all lines into a CSV one liner, try

cat file | tr '\n' ','
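
One thing to keep in mind with tr: it also turns the final newline into a comma, so the output ends with a trailing comma and no newline. A possible alternative, sketched here with a hypothetical helper name join_all, is paste in serial mode, which only puts the delimiter between lines:

```shell
# join_all: join every line of stdin into one comma-separated line.
# (Hypothetical helper name.) paste -s places the delimiter between lines
# only, so the result has no trailing comma and keeps its final newline.
join_all() {
    paste -sd, -
}

printf '%s\n' a b c | join_all
```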
5
sed 'N;s/\n/,/' file

Using sed, join(N) every 2 lines, and replace the newline(\n) with ",".
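
A caveat, in case the file has an odd number of lines: when N runs on the last line, POSIX sed may exit without printing that line (GNU sed prints it unless POSIXLY_CORRECT is set). A common guard is to run N only when not on the last line. The sketch below wraps that in a hypothetical function join_pairs_sed:

```shell
# join_pairs_sed: join every two lines of stdin with a comma.
# (Hypothetical helper name.) "$!N" appends the next line only when the
# current line is not the last one, so a leftover odd line is still printed.
join_pairs_sed() {
    sed '$!N;s/\n/,/'
}

# Three input lines: the last one has no partner and is printed on its own.
printf '%s\n' ABHAIPUR AHA ABOHAR | join_pairs_sed
```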

Guru
4

For the complete set of answers, a possible awk solution may be:

awk 'NR%2==1 {printf $0","} NR%2==0 { print $0}' file
lurker
Bernhard
  • @downvoter: What is wrong with my answer to deserve a downvote? How can it be improved? – Bernhard Oct 18 '12 at 20:38
  • 1
    Maybe because the lazy printf? Will fail in the rare case when a station name contains a format specifier. (See http://pastebin.com/wgxFttrJ for an example.) But this is just a guess, the downvote is not from me. – manatwork Oct 19 '12 at 09:54
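
For anyone worried about the format-specifier issue raised in the comment above, a minimal sketch of a variant that passes the line as data rather than as the format string. The function name join_pairs_awk and the sample station name are made up for this illustration:

```shell
# join_pairs_awk: join every two lines of stdin with a comma.
# (Hypothetical helper name.) The line is passed as an argument to a fixed
# "%s," format, so a "%" inside the data is printed literally.
join_pairs_awk() {
    awk 'NR % 2 == 1 { printf "%s,", $0 } NR % 2 == 0 { print }'
}

# Made-up station name containing a format specifier.
printf '%s\n' '100%s HALT' 'HLT' | join_pairs_awk
```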
3
paste -sd ',\n' file.in > file.out

Also note that because we're merely replacing one character with another (every other newline with a comma), we can work on the input file in place:

paste -sd ',\n' file.in 1<> file.in

(But beware: this might not work on non-Unix systems that use CRLF line terminators, such as Microsoft ones, where an emulated POSIX paste might treat them in a non-Unix way.)
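
As an illustration of how the delimiter list works: with -s, paste cycles through the characters given to -d, so the list ',\n' alternates comma and newline. Extending the list should generalise this to more lines per row. A small sketch, with a hypothetical function name join_every_three:

```shell
# join_every_three: join every three lines of stdin with commas.
# (Hypothetical helper name.) The delimiter list is comma, comma, newline;
# paste -s cycles through it, so every third separator is a newline.
join_every_three() {
    paste -sd ',,\n' -
}

printf '%s\n' 1 2 3 4 5 6 | join_every_three
```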

cuonglm
2

Here is a one-liner (though potentially millions-of-commands-run-er) using pure Bash:

(IFS=; while read -r name; do read -r code; printf '%s\n' "$name,$code"; done < file.in) > file.out

I use a subshell (the parentheses) so that I don't have to save and restore IFS, which one should otherwise do so as not to mess up the user's environment in case the script is sourced. The alternative would be to pass the new IFS only to read, as in IFS= read -r name and IFS= read -r code.

The fact that all the commands in the loop are shell built-ins keeps its performance acceptable, and it is even faster than the other solutions for small files. But many people would consider it bad practice, and one should be careful when generalising it to anything else.
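
The per-read IFS alternative mentioned above could look roughly like this. It is only a sketch; the function name join_pairs_sh is hypothetical, and it reads stdin rather than file.in so it can be tried on any input:

```shell
# join_pairs_sh: join every two lines of stdin with a comma, using only
# shell built-ins. (Hypothetical helper name.)
# "IFS= read -r" sets IFS for that one read only, so no subshell is needed
# and leading/trailing blanks and backslashes in a line are preserved.
# Note: a final unpaired line is silently dropped by this loop.
join_pairs_sh() {
    while IFS= read -r name && IFS= read -r code; do
        printf '%s,%s\n' "$name" "$code"
    done
}

printf '%s\n' 'A N DEV NAGAR' 'ACND' 'ABU ROAD' 'ABR' | join_pairs_sh
```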

Deleted
  • In general, yay for using subshells to localize environment changes. But in this case it's not needed: you can instead do while IFS='\n' read -r name; do IFS='\n' read -r code ... done < file.in, which is an idiom I often see in shell scripts. The -r flag to read means "interpret the character '\' followed by the character 'n' in the stdin stream as two characters, rather than as a newline." Arguably, it may be more aesthetic to create the subshell as you do than to repeat the IFS='\n'. – dubiousjim Oct 17 '12 at 22:10
  • @dubiousjim: The -r improved the solution technically. Great! I'm not a fan of the idea of passing a changed IFS twice. If I had used one read, super nice, but not twice. Of course that's a matter of opinion. Using a subshell is a bit over the general Bash knowledge I would say, so a lot of folks will have trouble understanding its purpose. That's a bad thing. – Deleted Oct 17 '12 at 22:41
1

Hoary old chestnut of an awk idiom

awk '{ORS=NR%2?",":"\n";print}' file
Station Name,Station Code
A N DEV NAGAR,ACND
ABHAIPUR,AHA
ABOHAR,ABS
ABU ROAD,ABR
iruvar
  • awk '{ORS=NR%2?",":"\n"};1' is shorter and more idiomatic – cuonglm Aug 29 '15 at 01:48
  • @cuonglm, i doubt it. In this instance it's still a one-liner despite the print and the intent is clear. 1 is just as clear to old awk hands such as myself but I prefer print – iruvar Aug 29 '15 at 02:09
  • This was the first simple solution that I found that was easily configurable to more than 2 lines. I fought with sed for a while before searching, but awk made combining every 4 lines easier. Saved me a trip to the $EDITOR! – opello Feb 26 '16 at 17:37
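
Following up on the "more than 2 lines" comment, a sketch of how the ORS idiom might be parameterised. The function name join_every_n and the awk variable n are made up for this example:

```shell
# join_every_n N: join every N lines of stdin with commas.
# (Hypothetical helper name; "n" is an awk variable introduced here.)
# ORS is a comma except after every Nth record, where it is a newline.
# If the line count is not a multiple of N, the output ends with a comma
# instead of a newline.
join_every_n() {
    awk -v n="$1" '{ ORS = (NR % n) ? "," : "\n"; print }'
}

printf '%s\n' 1 2 3 4 5 6 7 8 | join_every_n 4
```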
1

For example:

seq 0 70 | xargs -L 2 | sed 's/ /,/g'

Output (note: xargs -L number_of_columns works nicely with almost any number of columns, not just every two lines):

0,1
2,3
4,5
6,7
8,9
10,11
12,13
14,15
16,17
18,19
20,21
22,23
24,25
26,27
28,29
30,31
32,33
34,35
36,37
38,39
40,41
42,43
44,45
46,47
48,49
50,51
52,53
54,55
56,57
58,59
60,61
62,63
64,65
66,67
68,69
70
jmunsch
0

POSIX solution with pr:

pr -2 -a -t -s, file

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pr.html

Zombo
0

This is possible with perl too:

perl -pe 's/^\d+\.\s+//;$.&1?chomp:print","' file

daisy