Loop over lines in file and subtract previous line from current line

Question

I have a file that contains some numbers

$ cat file.dat
0.092593
0.048631
0.027957
0.030699
0.026250
0.038156
0.011823
0.013284
0.024529
0.022498
0.013217
0.007105
0.018916
0.014079

I want to make a new file that contains the difference between each line and the previous line. The expected output is

$ cat newfile.dat
-0.043962
-0.020674
0.002742
-0.004449
0.011906
-0.026333
0.001461
0.011245
-0.002031
-0.009281
-0.006112
0.011811
-0.004837

Thinking this was trivial, I started with this piece of code

f="myfile.dat"    
while read line; do
    curr=$line
    prev=

    bc <<< "$line - $prev" >> newfile.dat
done < $f

but I quickly realized that I have no idea how to access the previous line in the file. I guess I also need to make sure that no subtraction takes place when reading the first line. Any guidance on how to proceed is appreciated!

Kusalananda · Accepted Answer · 2018-09-19T19:20:22.887

$ awk 'NR > 1 { print $0 - prev } { prev = $0 }' <file.dat
-0.043962
-0.020674
0.002742
-0.004449
0.011906
-0.026333
0.001461
0.011245
-0.002031
-0.009281
-0.006112
0.011811
-0.004837

Doing this in a shell loop calling bc is cumbersome. The above uses a simple awk script that reads the values off of the file one by one and for any line past the first one, it prints the difference as you describe.

The first block, NR > 1 { print $0 - prev }, conditionally prints the difference between this and the previous line if we've reached line two or further (NR is the number of records read so far, and a "record" is by default a line).

The second block, { prev = $0 }, unconditionally sets prev to the value on the current line.
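
To make the order of the two blocks visible, here is a small sketch that prints the operands of each subtraction. The three sample values are the first three lines of file.dat; the function name show_steps and the trace format are made up for illustration and are not part of the answer.

```shell
#!/bin/sh
# show_steps is a hypothetical helper name, not part of the original answer.
# It reads numbers on standard input and, for every line after the first,
# prints the line number, both operands, and their difference.
show_steps() {
    awk '
        NR > 1 { printf "line %d: %s - %s = %.6f\n", NR, $0, prev, $0 - prev }
               { prev = $0 }   # runs for every line, after the block above
    '
}

# First three values from file.dat.
printf '%s\n' 0.092593 0.048631 0.027957 | show_steps
```

Nothing is printed for line 1 because the first block is skipped. For line 2, prev still holds the value of line 1 at the moment the subtraction runs, so the trace reads "line 2: 0.048631 - 0.092593 = -0.043962".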

Redirect the output to newfile.dat to save the result there:

$ awk 'NR > 1 { print $0 - prev } { prev = $0 }' <file.dat >newfile.dat

See also: Why is using a shell loop to process text considered bad practice?

There was some mention of the slowness of calling bc in a loop. The following uses a single invocation of bc to do the arithmetic while still reading the data in a shell loop (I would not actually recommend solving this problem this way; I'm only showing it here for people interested in co-processes in bash):

#!/bin/bash

coproc bc

{
    read prev

    while read number; do
        printf '%f - %f\n' "$number" "$prev" >&"${COPROC[1]}"
        prev=$number

        read -u "${COPROC[0]}" result
        printf '%f\n' "$result"
    done
} <file.dat >newfile.dat

kill "$COPROC_PID"

The value in ${COPROC[1]} is the standard input file descriptor of bc while ${COPROC[0]} is the standard output file descriptor of bc.

If prev is the value on the current line, how does this actually work? Because if I substitute prev = $0 into the first block, we get {print $0 - $0}. — Yoda, Sep 19 '18 at 14:41
@Yoda prev is the value on the previous line. For the first line, the first block is not executed (due to the NR > 1 condition), so prev only gets the value of the first line (and nothing else happens). For the second line, the difference between the second and first line is printed before prev is set to the value of the second line. The blocks are executed in order, and for each line. — Kusalananda, Sep 19 '18 at 14:42
@Yoda So prev is the value on the current line only at the very end of processing that line, before continuing with the next line. When we actually use prev, it is the value of the previous line. — Kusalananda, Sep 19 '18 at 14:57

Digital Trauma · Answer 2 · 2018-09-19T17:52:41.437

Using some straightforward GNU utilities, and no shell loops:

paste -d- <(tail -n+2 file.dat) <(head -n-1 file.dat) | bc

The idea here is to duplicate the input file into two columns, offset one column against the other by one line, and paste the columns together with - as a separator. tail and head are used to trim off the first line of the 1st column and the last line of the 2nd column respectively, so that each current value is paired with the previous one. The resulting list of arithmetic differences (current minus previous) is piped to bc for evaluation.
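
To check which way round each subtraction goes, the pasted expressions can be printed before they reach bc. This is a minimal sketch, assuming bash (for process substitution) and GNU head (for the negative line count); show_expressions is a made-up helper name, and the sample file holds only the first three values from file.dat.

```shell
#!/bin/bash
# show_expressions is a hypothetical helper; it prints the text that would
# be piped to bc, with the current line first and the previous line second.
show_expressions() {
    paste -d- <(tail -n +2 "$1") <(head -n -1 "$1")
}

# Temporary sample file with the first three values from file.dat.
sample=$(mktemp)
printf '%s\n' 0.092593 0.048631 0.027957 > "$sample"
show_expressions "$sample"
rm -f "$sample"
```

Each printed line has the form current-previous (for example 0.048631-0.092593), which is the order the question asks for. Putting the head column first would give previous-current, and with it the opposite sign.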

Alternatively, if you like sed, you can do this:

sed '1{s/.*/-&+\\/;p;d};${p;d};s/.*/&\n-&+\\/' file.dat | bc

This duplicates each line, negating the second copy and appending +\ to it. The first and last lines are treated differently to generate the necessary expressions. The sed output ends up something like this:

-a+\
b
-b+\
c
-c+\
d

These again are valid arithmetic expressions that bc can evaluate, each one giving the current value minus the previous one. Note that bc understands the line-continuation backslashes at the ends of every other line.

score 1 · Answer 3 · answered Sep 19 '18 at 14:40

If you wanted to try and force the shell script into working, you were just missing some initialization:

f=myfile.dat
prev=0
while read line; do
    bc <<< "$line - $prev"
    prev=$line
done < $f > newfile.dat

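... where I also moved the redirection outside of the loop, just to save some I/O. Note that because prev starts at 0, the first line written is the first value itself, which is one line more than the expected output in the question.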

The bc solution does not print leading zeroes, while the awk solution does.
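
If the leading zero matters, the bc output can be passed through printf to get a fixed format. This is only a sketch, assuming six decimal places as in file.dat; add_leading_zero is a made-up name, and the two input values stand in for typical bc output rather than being produced by bc here.

```shell
#!/bin/sh
# add_leading_zero is a hypothetical helper: it reformats each number read
# from standard input with six decimals, so ".002742" becomes "0.002742".
add_leading_zero() {
    while IFS= read -r value; do
        printf '%.6f\n' "$value"
    done
}

# Two values written the way bc prints them (no zero before the point).
printf '%s\n' -.043962 .002742 | add_leading_zero
```

Piping the loop through the helper, as in done < "$f" | add_leading_zero > newfile.dat, would apply the same formatting to every difference.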

Jeff Schaller


Certainly shell loops are generally slow, but even slower is spawning a bc process for every iteration of the loop. You can simply echo the "$line - $prev" expression into a pipe which is evaluated by bc outside of the loop: https://tio.run/##TZBNDoIwEIX3PcWEsJX0l7YR3OnWMwAtgQTBoJGYeHdsB4129eZ7byYzratbtzbVHQ5webb94DMXiqKA4/lEaEYtV1ZEIU0uWBRcW6WjEDS3FknOFUVimMqjYMxw7GKCG4kZqfgW5tKaj8VwDtWMKiTGsq1dUm1J2GBty99W5Dr7R0nJ0gUAs68cDP3o9@AmAuH5ppsgSSODHaQxnaCBfciJm4JZQNrCC@omHD365Tsfv@G/fgM – Digital Trauma Sep 19 '18 at 17:02

score 1 · Answer 4 · answered Sep 19 '18 at 14:44

You could use an exec redirection to read successive lines of the input file from multiple points in the script - once before the loop (to set up the initial value), then repeatedly during it (for each new value to subtract):

exec 3<file.dat
read prev<&3
while read curr ; do
        bc <<< "$curr - $prev" >> newfile.dat
        prev=$curr
done <&3

JigglyNaga


Using { read prev; while ...; do ...; done; } <file.dat, you avoid having to juggle file descriptors. – Kusalananda Sep 19 '18 at 15:00

score 0 · Answer 5 · answered Sep 19 '18 at 18:40

I use arrays. I use them for everything. I cannot remember how awk and sed work without extensive study of the man pages. Here is the way I would do it.

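# Read all values into an array (one element per whitespace-separated word).
f=( $(< file.dat) )
# Indices run from 0 to ${#f[@]}-1, so stop before the array length;
# going one further would subtract from an empty element.
for ((num=1; num<${#f[@]}; num++))
do
    bc <<< "${f[num]} - ${f[num-1]}"
done >> differences.dat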

This is the way I understand it. It has the objectionable features of some of the other answers: looping and calling bc over and over. However, it only reads the file once, like the answers using sed and awk.

score -1 · Answer 6 · edited Sep 20 '18 at 05:42

You could try this

num <- as.data.frame(num)
num$sub_num <- num[c(2:14, c("0")), ]
num$diff <- num$num - num$sub_num

answered Sep 20 '18 at 05:16 by Harini · edited Sep 20 '18 at 05:42 by RalfFriedl

This is not just poorly formatted, it doesn't even specify what programming language is used. – RalfFriedl Sep 20 '18 at 05:43
