16

I have two parallel files with the same number of lines in two languages and plan to merge these two files line by line with the delimiter |||. E.g., the two files are as follows:

File A:

1Mo 1,1 I love you.
1Mo 1,2 I like you.
Hi 1,3 I am hungry.
Hi 1,4 I am foolish.

File B:

1Mo 1,1 Ich liebe dich.
1Mo 1,2 Ich mag dich.
Hi 1,3 Ich habe Durst.
Hi 1,4 Ich bin neu.

The expected output is like this:

1Mo 1,1 I love you. ||| 1Mo 1,1 Ich liebe dich.
1Mo 1,2 I like you. ||| 1Mo 1,2 Ich mag dich.
Hi 1,3 I am hungry. ||| Hi 1,3 Ich habe Durst.
Hi 1,4 I am foolish. ||| Hi 1,4 Ich bin neu.

I tried the paste command:

paste -d "|||" fileA fileB

But the output contains only one pipe:

1Mo 1,1 I love you. |1Mo 1,1 Ich liebe dich.
1Mo 1,2 I like you. |1Mo 1,2 Ich mag dich.

Is there any way to separate each pair of lines with a triple pipe (|||)?

cuonglm
  • 153,898
Frown
  • 197

5 Answers

21

With POSIX paste:

:|paste -d ' ||| ' fileA - - - - fileB

paste concatenates the corresponding lines of all its input files. Here there are six files: fileA, four dummy files read from standard input (-), and fileB.

The delimiter list consists of a space, three pipes, and a space, in that order, and paste cycles through it.

For the first line of the six files, fileA is joined to the first dummy file (which is empty, thanks to the no-op : command) by a space, producing line1-fileA<space>.

The first dummy file is then joined to the second by a pipe, producing line1-fileA |; the second to the third, producing line1-fileA ||; and the third to the fourth, producing line1-fileA |||.

Finally, the fourth dummy file is joined to fileB by a space, producing line1-fileA ||| line1-fileB.

These steps are repeated for every line, giving the expected result.
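To make the cycling through the delimiter list visible, it can help to swap in five distinct characters. This is only a small sketch: the function name paste_with_list and the letters abcde are illustrative and not part of the answer above.

```shell
# Hypothetical helper: paste two files with four empty stdin "columns"
# in between, using the delimiter list passed as the first argument.
paste_with_list() {
    # $1 = delimiter list (five characters), $2 and $3 = the two input files
    paste -d "$1" "$2" - - - - "$3" </dev/null
}

# paste_with_list 'abcde' fileA fileB
#   each output line is <line of fileA>abcde<line of fileB>: one delimiter
#   character is consumed between each pair of adjacent columns.
# paste_with_list ' ||| ' fileA fileB
#   the same mechanism, now yielding the requested " ||| " separator.
```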


The :| form saves typing and is mainly useful in an interactive shell. In a script, you should use:

</dev/null paste -d ' ||| ' fileA - - - - fileB

to prevent a subshell from being spawned.

cuonglm
  • 153,898
  • 1
    +1 for the :|. clever alternative to </dev/null – cas Nov 23 '15 at 10:59
  • 4
    ...and +1 for the smart use of 4 dummy files from standard input with - - - -, but next time you can even write a couple of lines for explanation :) – Hastur Nov 23 '15 at 11:57
  • Thx, but I still get the output with one pipe... – Frown Nov 23 '15 at 13:10
  • @hui, did you run the command exactly as given including all the dashes and space characters? What's your operating system? – Stéphane Chazelas Nov 23 '15 at 13:25
  • :|paste -d '|' fileA - - fileB gives the more correct version without the space delimiter. – Pål GD Nov 23 '15 at 14:54
  • @Hastur: Add more details explanation. I don't have time to explain at the time when I posted the answer. – cuonglm Nov 23 '15 at 15:54
  • @PålGD Why do you assert that it is more correct without spaces? In the OP expected output there are, in the OP input files there are not. So it was proper to add. ||| To cuongin : It's a pity I cannot vote for this answer again. ||| – Hastur Nov 23 '15 at 16:14
  • @PålGD: If the lines contain trailing space, then you only need |||, but in the OP's question, they do not. – cuonglm Nov 23 '15 at 16:48
  • @PålGD I disagree the firmness of your assertion(s), and I like the strength of your belief. :-) Indeed I admit the OP should be more clear. He asserts desired output with _|||_ (with spaces) and that he tried ||| (without) obtaining only one |... then asked how to separate with three. BTW the focus of the question is on the ||| symbols and not on the spaces. – Hastur Nov 23 '15 at 16:49
  • @StéphaneChazelas Now it works. Sorry for the last comment. – Frown Nov 23 '15 at 17:46
  • @cuonglm Now it works. Thank you very much! – Frown Nov 23 '15 at 17:47
  • @PålGD: It worked for the OP now. – cuonglm Nov 23 '15 at 17:56
  • @PålGD You're right. And it has not to be (the space). Maybe he writes those lines by hands, or he added the space by hands to evidence the | symbol... hmm... (some second later) Tested: you're right even on the 2 systems on which I just tried. It should be not. – Hastur Nov 23 '15 at 21:00
7

Well, this doesn't use sed, awk, or grep, but you can do it pretty easily in bash. The command is:

(while IFS= read -r a <&3 && IFS= read -r b <&4; do echo "$a ||| $b"; done) 3<fileA 4<fileB

The problem with paste is that each delimiter is a single character. You could also insert a single character and then use sed to transform it, but that would be error-prone if the character already appeared in the input files.
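As a possible variation on the loop above, printf can replace echo so that backslashes and leading dashes in the data are printed literally regardless of which shell runs the script. The function name merge_lines is made up for this sketch; like the original loop, it stops as soon as either file runs out of lines.

```shell
# Hypothetical helper: merge two files line by line with " ||| ".
# $1 and $2 are the two input files.
merge_lines() {
    # IFS= and -r keep leading whitespace and backslashes intact;
    # file descriptors 3 and 4 let both files be read in lockstep.
    while IFS= read -r a <&3 && IFS= read -r b <&4; do
        # %s prints each line verbatim, unlike echo in some shells
        printf '%s ||| %s\n' "$a" "$b"
    done 3<"$1" 4<"$2"
}

# Usage: merge_lines fileA fileB
```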

user3188445
  • 5,257
  • 2
    Your solution won't work if line contain any backslash character, or start with dash. You want to use IFS= before each read. You can easily do it with paste. See my answer, and also this one to see why should avoid using while loop in shell script. – cuonglm Nov 23 '15 at 10:40
  • It works for my file. Many Thx!!! – Frown Nov 23 '15 at 17:47
5

An awk (GNU) version

awk '{printf ("%s ||| ", $0); getline < "fileB"; print $0 }' fileA

With the getline command, awk can set $0 (and with it all the field variables) from the next input record; with getline < "filename", the next $0 is read from the specified file instead.

getline < "file" Set $0 from next record of file; set NF.
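The one-liner above does not check whether getline actually read a line. A slightly more defensive sketch is shown here, assuming you would rather stop with an error when the second file is shorter than the first. The function name awk_merge and the awk variable names other and line are illustrative only; note that awk -v interprets backslash escapes, so this assumes file names without backslashes.

```shell
# Hypothetical helper wrapping the awk approach.
# $1 = first file (read as awk's main input), $2 = second file.
awk_merge() {
    awk -v other="$2" '{
        # getline returns 1 on success, 0 at end of file, -1 on error
        if ((getline line < other) > 0)
            print $0 " ||| " line
        else
            exit 1    # second file ran out of lines or could not be read
    }' "$1"
}

# Usage: awk_merge fileA fileB
```

Reading into a separate variable (line) leaves $0 untouched, so the line from the first file is still available when the pair is printed.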


Why didn't your attempt work as you expected? From man paste:

-d, --delimiters=LIST
     reuse characters from LIST instead of TABs

but it uses the delimiters one at a time, one between each pair of columns.

So the command
paste -d '|*|*' fileA fileB fileA fileB gives me lines such as

Hi 1,3 I am hungry.|Hi 1,3 Ich habe Durst.*Hi 1,3 I am hungry.|Hi 1,3 Ich...
Hi 1,4 I am foolish.|Hi 1,4 Ich bin neu.*Hi 1,4 I am foolish.|Hi 1,4 Ich...


A sed solution, which I suggest avoiding even though it is close to your original attempt, because it patches the output after the fact instead of producing the right delimiter directly:

 paste -d '|' fileA fileB | sed 's/|/|||/g'

Avoid it because it replaces every | with |||, so you have to assume that the pipe symbol (|) does not occur in your data; otherwise you have to handle special cases and write more complex code to avoid side effects.
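If you still want to use the paste-and-sed route, one way to make the assumption explicit is to check the input for pipes first and refuse to continue if any are found. This is just a sketch under that assumption: the function name merge_if_no_pipe is hypothetical, and the replacement here inserts spaces around ||| to match the output requested in the question.

```shell
# Hypothetical helper: only use paste + sed when neither file contains "|".
# $1 and $2 are the two input files.
merge_if_no_pipe() {
    # In a basic regular expression "|" is a literal character
    if grep -q '|' "$1" "$2"; then
        echo "input already contains '|'; refusing to merge" >&2
        return 1
    fi
    # Exactly one "|" per line now exists: the one inserted by paste
    paste -d '|' "$1" "$2" | sed 's/|/ ||| /'
}

# Usage: merge_if_no_pipe fileA fileB
```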


A variant with the Here String [1] construct <<<

 paste -d ' ||| ' fileA - - - - fileB  <<< ''

You set 5 delimiters with -d ' ||| ' (space,|,|,|,space) and 4 dummy files (- - - -) that will take data from the empty string ''.


Tested on GNU Awk 4.0.1, paste (GNU coreutils) 8.21 and sed (GNU sed) 4.2.2

Hastur
  • 2,355
4

If you want to avoid the magic and drama of circular delimiters and dummy files, you could just append your delimiter to one file before pasting them:

paste -d ' ' <(sed 's/$/ |||/' filea) fileb

gives

1Mo 1,1 I love you. ||| 1Mo 1,1 Ich liebe dich.
1Mo 1,2 I like you. ||| 1Mo 1,2 Ich mag dich.
Hi 1,3 I am hungry. ||| Hi 1,3 Ich habe Durst.
Hi 1,4 I am foolish. ||| Hi 1,4 Ich bin neu.
snth
  • 311
  • I like this for simplicity. I believe you mean "prepend", not "append" though. Checkout Hastur's awk answer for the awk version of this. – Wildcard Nov 27 '15 at 08:39
  • You should change the process substitution to a pipe, so you won't have the limit for number of shells support it. – cuonglm Nov 27 '15 at 10:33
  • @Wildcard yes, prepend, but I'll rewrite it to append to filea. I think awk is a bit overkill for this. – snth Nov 28 '15 at 05:23
  • @cuonglm true, but I wanted to avoid pipes for clarity. I felt a pipe would make it start to look like the dummy files, but you are correct – snth Nov 28 '15 at 05:23
0

You can also do it in Python:

with open("file1") as f1, open("file2") as f2:
    for a, b in zip(f1, f2):
        print(a.rstrip() + " ||| " + b.rstrip())
... 
1Mo 1,1 I love you. ||| 1Mo 1,1 Ich liebe dich.
1Mo 1,2 I like you. ||| 1Mo 1,2 Ich mag dich.
Hi 1,3 I am hungry. ||| Hi 1,3 Ich habe Durst.
Hi 1,4 I am foolish. ||| Hi 1,4 Ich bin neu.
c4f4t0r
  • 649