
I am learning bash shell scripting and am looking for pure bash ways rather than awk, etc.

I am trying to use a for loop to go through the contents of two files that are submitted as arguments in the terminal so that I can write expressions from the information.

The file contents are separated with tabs. The files have no extensions. Here is an example of the type of the file information I want to evaluate:

$ cat file1
1     2     3
40    50    60

$ cat file2
10     20     30
40     50     60

Here is the code that I have written:

read line1 < "file1"
read line2 < "file2"

difference=0

#I can see the contents of file1 with the below code and by changing the code
#slightly I can see the contents of file2 as well using the following code:

for index1 in $line1
do
echo "The contents of index1 are: $index1"
done

#But I am trying to do something like this which isn't working:
for index1 in $line1, index2 in $line2
do
difference=$(expr $index1 - $index2)
echo $difference
done

1 Answer

This is definitely possible, but it's not pleasant, and in practice you'd want to go about it a different way.

For these particular files and this task, this script is probably the simplest approach, using a single while loop rather than any for loop:

exec 3<file2
while read a1 a2 a3 && read b1 b2 b3 <&3
do
    echo $((a1 - b1))
    echo $((a2 - b2))
    echo $((a3 - b3))
done < file1

This assumes a fixed three-column structure for each file and reads a whole line from each at once. file2 is opened on file descriptor 3 (exec 3<file2) so that it can be read independently of file1: what you'd written (read line2 < "file2") would reopen the file and read only its first line every time. Each read puts the first word in a1/b1, the second in a2/b2, and the rest of the line in a3/b3.
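
For the two sample files in the question, that prints:

-9
-18
-27
0
0
0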

There's no built-in or reasonably straightforward way to "zip" two lists together, or to write a parallel for loop. For loops over anything other than simple arrays or literal word lists are difficult to impossible, and constructing the right array is more work than performing the task another way.
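
If you really wanted something like a parallel for loop, the closest approximation is to build the arrays yourself first. A minimal sketch, which is only safe here because the data is purely numeric (so unquoted word splitting can't surprise you):

# Flatten each whole file into one word-split array, then index both in lockstep.
xs=($(<file1))
ys=($(<file2))
for ((i=0; i<${#xs[@]}; i++))
do
    echo $((xs[i] - ys[i]))
done

Note that this throws the line structure away entirely, which happens not to matter for element-wise subtraction.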


If you have a variable number of columns, this is trickier, but we can use mapfile to create arrays from the lines, and then regular array processing with a C-style for loop for each line:

while read line1 && read line2 <&3
do
    mapfile -d $'\t' a1 <<<"$line1"
    mapfile -d $'\t' a2 <<<"$line2"
    for ((i=0; i<${#a1[@]}; i++))
    do
        echo $((a1[i] - a2[i]))
    done
done <file1 3<file2

This creates two arrays, a1 and a2, containing the tab-separated elements of each line, and loops over indices up to the length of the line from file1 (any extra items on the file2 line are ignored).
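
mapfile -d needs bash 4.4 or newer; on older shells a roughly equivalent sketch splits each line on tabs with read -a instead:

while IFS=$'\t' read -r -a a1 && IFS=$'\t' read -r -a a2 <&3
do
    for ((i=0; i<${#a1[@]}; i++))
    do
        echo $((a1[i] - a2[i]))
    done
done <file1 3<file2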

The best generic approximation of something like zip, assuming the files are well-formed and column counts match on corresponding lines, would be something like while read a b ... done < <(paste <(printf '%s\n' $(<file1)) <(printf '%s\n' $(<file2))), which is an absolute abomination.
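
Spelled out over multiple lines, that same construction looks like this (still assuming matching word counts and purely numeric data):

while read -r a b
do
    echo $((a - b))
done < <(paste <(printf '%s\n' $(<file1)) <(printf '%s\n' $(<file2)))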


That said, shell scripts are not a good mechanism for this sort of processing, and awk - or even better, a proper language - would be much more suitable and less fragile than this.
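
For comparison, an awk sketch of the same job (assuming both files have matching numeric columns) is a single short pipeline:

paste file1 file2 | awk '{ n = NF / 2; for (i = 1; i <= n; i++) print $i - $(i + n) }'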

I think you can see from the above scripts that this is overly complicated to achieve, pretty fragile, and difficult to follow, because the language just isn't made for it. Those scripts are more readable than the other ways of achieving the same thing, and that isn't saying much.

Michael Homer
  • 76,565