5

When I am shell scripting, most of what I do is wrap the I/O of other modules written in Python, MATLAB, etc. To do so, I usually use text files, or something similar, that list the input/output paths. I know that to read a file one line at a time I can use:

for file in $(cat $1);
do
    code using $file
done

But what if I wanted to do something using corresponding lines from two files? Something like the equivalent Java:

while((line1 = file1.readLine()) != null) {
    line2 = file2.readLine();
    //do something with both lines...
}

What is the standard method for doing this in bash?

Eric
  • 153

2 Answers

6
exec 3<file1
exec 4<file2
while read line1 <&3 && read line2 <&4
do
        echo "line1=$line1 and line2=$line2"
done
exec 3<&-
exec 4<&-

Discussion

  • In the above, leading and trailing white space is stripped from the input lines. If you want to preserve this whitespace, replace read … with IFS= read …

  • In the above, backslashes in the input will be interpreted as escape characters. If you don't want that, replace read … with read -r …

  • read line1 <&3 reads line1 from file descriptor 3. This can also be written equivalently as read -u3 line1.

  • Statements such as for file in $(cat $1); have some issues that you should know about. The shell applies both word splitting and pathname expansion to the contents of the file and, unless you were expecting this, it can lead to various errors.
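
As a minimal sketch of the points above, the loop below combines IFS= and read -r so that lines come through unchanged, and uses the -u form (a bash feature) to name the descriptors. The helper name read_pairs and the sample files are made up for this illustration.

```shell
#!/bin/bash
# read_pairs FILE1 FILE2: print corresponding lines unchanged.
# (read_pairs is a hypothetical helper name used only for this sketch.)
read_pairs() {
    local line1 line2
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    # -u3 and -u4 are the bash spelling of <&3 and <&4.
    while IFS= read -r -u3 line1 && IFS= read -r -u4 line2
    do
        printf 'line1=[%s] line2=[%s]\n' "$line1" "$line2"
    done 3<"$1" 4<"$2"
}

# Sample input, created only for this demonstration.
dir=$(mktemp -d)
printf '  indented\\tline\n' > "$dir/file1"
printf 'plain line  \n'      > "$dir/file2"
read_pairs "$dir/file1" "$dir/file2"
rm -r "$dir"
```

With these sample files the output should be line1=[  indented\tline] line2=[plain line  ], with the leading spaces, the trailing spaces and the backslash all intact.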

Alternative

while read line1 <&3 && read line2 <&4
do
        echo "line1=$line1 and line2=$line2"
done 3<file1 4<file2
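
One possible reason to prefer the explicit exec form: the descriptors stay open between loops, so a second loop resumes where the first one stopped. The sketch below assumes a hypothetical helper process_in_two_passes and made-up sample files; it treats the first N line pairs differently from the rest.

```shell
#!/bin/bash
# process_in_two_passes FILE1 FILE2 N (hypothetical helper for this sketch):
# handle the first N line pairs in one loop and the remainder in another.
process_in_two_passes() {
    local line1 line2 n=0
    exec 3<"$1"
    exec 4<"$2"

    # First loop: stop after N pairs; the counter is tested before reading,
    # so no extra line is consumed when the limit is reached.
    while [ "$n" -lt "$3" ] && read -r line1 <&3 && read -r line2 <&4
    do
        echo "head: $line1 / $line2"
        n=$((n + 1))
    done

    # Second loop: descriptors 3 and 4 are still open, so reading resumes
    # at the next unread line of each file.
    while read -r line1 <&3 && read -r line2 <&4
    do
        echo "rest: $line1 / $line2"
    done

    exec 3<&-
    exec 4<&-
}

# Sample input, created only for this demonstration.
dir=$(mktemp -d)
printf '%s\n' a1 a2 a3 > "$dir/file1"
printf '%s\n' b1 b2 b3 > "$dir/file2"
process_in_two_passes "$dir/file1" "$dir/file2" 1
rm -r "$dir"
```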
John1024
  • 74,655
  • excellent explanation, +1. Any specific reason the descriptor redirects 3<file1 and 4<file2 are not inlined into the while loop? – iruvar Jun 14 '15 at 23:28
  • @1_CR Thanks. Inlining on the loop works too. I thought this added flexibility (such as one loop could process the first N lines of the files and a second loop process the rest) and also the explicit open and close might look more familiar to the OP who appears to have a python background. I will add the inline version to the answer. – John1024 Jun 14 '15 at 23:46
  • Works awesome, and a great explanation. Thanks! – Eric Jun 15 '15 at 00:38
  • As you said, there are various issues with ... $(cat ...). But since for is a shell construct there's (at least in bash, zsh, or ksh) no "command line" size problem with the argument list in the for-loop. (There would be a problem if some exec'ed command, like ls, would be involved.) – Janis Jun 15 '15 at 07:30
  • @Janis Very interesting: the command line length problem exists only for external commands. Answer corrected. – John1024 Jun 15 '15 at 17:20
5

To iterate over the lines of a file:

while IFS= read -r line; do
  echo "read $line"
done <input-file

To iterate over multiple files, open them on different file descriptors (see When would you use an additional file descriptor?).

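while
  IFS= read -r line1 <&8; status1=$?
  IFS= read -r line2 <&9 || [ "$status1" -eq 0 ]
do
  echo "read '$line1' from file 1 and '$line2' from file 2"
done 8<input-file1 9<input-file2

Both reads run on every iteration, and the loop continues as long as at least one of them succeeds, so the shorter file is completed with empty lines to match the longer one. (A plain read <&8 || read <&9 does not do this: || short-circuits, so the second read is skipped whenever the first one succeeds.) To exit as soon as the end of either file is reached, use IFS= read -r line1 <&8 && IFS= read -r line2 <&9 as the loop condition instead. If you want to detect all cases, check the return code separately.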
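
To see the two stopping rules side by side, here is a small sketch with files of different lengths. The helper names zip_shortest and zip_padded and the sample files are assumptions made for this illustration.

```shell
#!/bin/bash
# zip_shortest and zip_padded are hypothetical helper names for this sketch.

# Stop as soon as either file runs out.
zip_shortest() {
    local line1 line2
    while IFS= read -r line1 <&8 && IFS= read -r line2 <&9; do
        echo "read '$line1' from file 1 and '$line2' from file 2"
    done 8<"$1" 9<"$2"
}

# Keep going until both files run out; the exhausted side yields ''.
zip_padded() {
    local line1 line2 status1
    while
        IFS= read -r line1 <&8; status1=$?
        IFS= read -r line2 <&9 || [ "$status1" -eq 0 ]
    do
        echo "read '$line1' from file 1 and '$line2' from file 2"
    done 8<"$1" 9<"$2"
}

# Sample input, created only for this demonstration.
dir=$(mktemp -d)
printf '%s\n' a1 a2 > "$dir/file1"
printf '%s\n' b1    > "$dir/file2"
echo "shortest:"; zip_shortest "$dir/file1" "$dir/file2"
echo "padded:";   zip_padded   "$dir/file1" "$dir/file2"
rm -r "$dir"
```

With a two-line file 1 and a one-line file 2, zip_shortest prints one pair and zip_padded prints two, the second with an empty string for file 2.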

{
  while
    IFS= read -r line1 <&8; empty1=$?
    IFS= read -r line2 <&9; empty2=$?
    # keep looping only while both reads succeeded (status 0)
    [ "$empty1" -eq 0 ] && [ "$empty2" -eq 0 ]
  do
    echo "read '$line1' from file 1 and '$line2' from file 2"
  done
  if [ "$empty1" -ne 0 ]; then
    echo "Finishing processing file 1"
    …
  fi
  if [ "$empty2" -ne 0 ]; then
    echo "Finishing processing file 2"
    …
  fi
} 8<input-file1 9<input-file2

Alternatively, you can join the two files together. The paste command is convenient for that. By default, it separates the lines by tabs (pass -d to select different delimiters) and completes files with empty lines. If the files don't contain tabs, this unambiguously delimits input lines.

tab=$(printf \\t)
paste input-file1 input-file2 |
while IFS=$tab read -r line1 line2; do … done
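
One caveat worth a sketch: tab counts as IFS whitespace, so read drops a leading empty field, and a line that paste padded on the left would end up in line1 rather than line2. The variant below sidesteps this with a non-whitespace delimiter. The helper name zip_with_paste, the choice of '|' and the sample files are assumptions for this illustration; it only works if '|' never occurs in the first file.

```shell
#!/bin/bash
# zip_with_paste FILE1 FILE2 (hypothetical helper for this sketch).
# Assumption: the delimiter '|' never occurs in FILE1.
zip_with_paste() {
    local line1 line2
    paste -d '|' "$1" "$2" |
    while IFS='|' read -r line1 line2; do
        # '|' is not whitespace, so an empty first field is preserved.
        echo "read '$line1' from file 1 and '$line2' from file 2"
    done
}

# Sample input, created only for this demonstration: file 1 is shorter.
dir=$(mktemp -d)
printf '%s\n' a1    > "$dir/file1"
printf '%s\n' b1 b2 > "$dir/file2"
zip_with_paste "$dir/file1" "$dir/file2"
rm -r "$dir"
```

Here the second output line should report '' for file 1 and 'b2' for file 2.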

Note that shells are not very fast at doing text processing. More specialized tools are best for medium to large inputs. Preprocessing with paste is convenient to zip two files together for any post-treatment. If you need more control over when lines are read, awk can do that with its getline command (similar to the shell's read).
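
A rough sketch of the awk approach, called from a shell function: awk walks the first file as its normal input and pulls one line from the second file with getline for each record. The helper name zip_with_awk, the bracketed output format and the sample files are assumptions for this illustration.

```shell
#!/bin/bash
# zip_with_awk FILE1 FILE2 (hypothetical helper for this sketch).
# Iterates over FILE1; a missing line in FILE2 becomes an empty string.
zip_with_awk() {
    awk -v file2="$2" '
        {
            # getline returns 1 on success, 0 at end of file, -1 on error
            if ((getline line2 < file2) <= 0) line2 = ""
            print "[" $0 "] [" line2 "]"
        }
    ' "$1"
}

# Sample input, created only for this demonstration.
dir=$(mktemp -d)
printf '%s\n' a1 a2 > "$dir/file1"
printf '%s\n' b1    > "$dir/file2"
zip_with_awk "$dir/file1" "$dir/file2"
rm -r "$dir"
```

Note that this version stops when the first file ends, so extra lines in the second file are ignored.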

  • 1
    Good one! +1 - If you keep giving such comprehensive answers, I'll eventually learn something :) && vs || was most interesting ... btw, a minor point; IFS=$tab read -r line1 line2 – Peter.O Jun 15 '15 at 01:36
  • Algol-style, nice. – Janis Jun 15 '15 at 07:20