
I have a file with 2,000,000 lines, and I run some commands for each line. I was trying to achieve some parallelism using GNU Parallel and Swift, as discussed here. However, a friend gave me an interesting idea.

He suggested spawning multiple processes on the server, since the server is pretty powerful. My thought was that if I index each line of the file, I could assign lines to processes based on line_number mod number_of_processes.

For example, with 10 processes, lines 1, 11 and 21 would be sent to the first process, lines 2, 12 and 22 to the second process, and so on.

To achieve this, I have been reading about background processes in shell scripting. Most tutorials and links append an & to the command and say that the shell will then spawn a background process. I am finding this concept a little difficult to understand.

Ramesh

1 Answer


How does your idea differ from GNU Parallel's --pipe --round-robin?

seq 100 | parallel --pipe --round-robin -j10 -N 1 'echo Start;cat'

Doing it line by line is somewhat inefficient for GNU Parallel. Doing it block by block is more efficient:

seq 1000000 | parallel --pipe --round-robin -j10 'echo Start;cat'

Adjust --block to suit your needs.
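For comparison, the manual modulo split described in the question could look roughly like the sketch below, using plain background jobs. It is only an illustration under assumptions: the file names (input.txt, out.N), the worker count, and the awk action standing in for "some commands" are all hypothetical, and seq 100 replaces the real 2,000,000-line file.

```shell
#!/bin/sh
# Sketch only: input.txt, out.N and the per-line "work" are hypothetical placeholders.
workers=10                      # number_of_processes from the question
dir=$(mktemp -d)                # scratch directory for this sketch
seq 100 > "$dir/input.txt"      # stand-in for the real input file

i=0
while [ "$i" -lt "$workers" ]; do
    # The trailing & runs this awk as a background job, so the loop moves on
    # immediately and all workers run at the same time.
    # Worker i handles the lines where (line_number - 1) mod workers == i,
    # so worker 0 gets lines 1, 11, 21, ... as in the question.
    awk -v n="$workers" -v i="$i" \
        '(NR - 1) % n == i { print "processed", $0 }' \
        "$dir/input.txt" > "$dir/out.$i" &
    i=$((i + 1))
done

wait    # block until every background worker has exited

# Count the lines handled across all workers.
total=$(cat "$dir"/out.* | wc -l)
echo "lines handled: $((total))"
```

Note that in this sketch every worker reads the whole input file and discards the lines that are not its own, which is part of why splitting the stream once with --pipe is usually the simpler choice.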

Ole Tange