
From a bash script, I am downloading a file from a server using curl. I want to check whether the file was fully downloaded, so I compare the size of the downloaded file with the Content-Length header. Although the two are equal, I get the error below:

")syntax error: invalid arithmetic operator (error token is "

Part of bash script:

remote_size=$(curl -kI "${HEADERS[@]}" "$url" | grep -i content-length | awk '{print $2}')
local_size=$(stat --format=%s "$dest/$file")
if (( remote_size == local_size )); then
        echo "File is complete" >&2
elif (( remote_size > local_size )); then
        echo "Download is incomplete" >&2
elif (( remote_size < local_size )); then
        echo "Remote file shrunk -- probably should delete local and start over" >&2
fi

Inspired by: https://stackoverflow.com/questions/37885503/check-if-curl-incremental-continue-at-download-is-successful/37885681?

Can anyone please tell me what the issue is here and how to resolve it?

Thanks in advance

P.S.: I am testing on Ubuntu (on WSL), but the script will eventually run on Linux on an embedded platform. Please let me know if any information is missing.

Preeti

1 Answer

$ curl -sI https://google.com | sed -n '/content-length/l'
content-length: 220\r$

See the carriage-return (aka CR, \r, ^M) at the end of the line ($ is sed's way to represent the end of the line). HTTP headers are delimited with CRLF, while the Unix line delimiter is LF.
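To make the effect of that trailing CR concrete, here is a minimal sketch that needs no network access. The value 220 is just the one from the output above, and the variable names raw and clean are made up for illustration; one possible fix is to delete every CR with tr before the value is used.

```shell
# A value as it would arrive from an HTTP header: digits followed by CR.
# Command substitution strips trailing newlines only, so the CR survives.
raw=$(printf '220\r')

# Delete every carriage return, leaving only the digits.
clean=$(printf '%s' "$raw" | tr -d '\r')

# The raw value is one character longer than it looks on screen.
printf 'raw:   %s characters\n' "${#raw}"
printf 'clean: %s characters\n' "${#clean}"
```

In the asker's pipeline, the same tr -d '\r' stage could go between curl and grep. This only explains and removes the CR; it does not address the sanitisation concern.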

Also, using unsanitised data in arithmetic expressions in bash and other Korn-like shells is a command injection vulnerability. That is all the more of a problem here because you used the -k (aka --insecure) option, which allows MitM attackers to inject arbitrary headers into responses.

On a GNU system, you can use:

local_size=$(stat -Lc %s -- "$dest/$file") || die
remote_size=$(curl -sI -- "$url" | LC_ALL=C grep -Piom1 '^content-length:\s*\K\d+') ||
  die "No content-length"
case $((local_size - remote_size)) in
  (0) echo same;;
  (-*) echo remote bigger;;
  (*) echo local bigger;;
esac

By only returning what \d+ matches in the C locale, we make sure remote_size only contains decimal ASCII digits, which removes the arbitrary code execution (ACE) vulnerability.

A standard equivalent of that GNU grep command could be:

LC_ALL=C sed '/^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]:[[:space:]]*\([0-9]\{1,\}\).*/!d;s//\1/;q'

With the caveat that, if the header is not found, it wouldn't return a false exit status like grep does, so you'd need an additional check such as [ -n "$remote_size" ].
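As a rough sketch of how that sed command and the extra check could fit together, the following feeds it a canned response instead of a live curl call. The function fake_headers and its contents are invented for the example; in a real script the input would come from curl -sI.

```shell
# fake_headers is a hypothetical stand-in for: curl -sI -- "$url"
# It prints CRLF-terminated header lines, as an HTTP server would.
fake_headers() {
  printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 220\r\n\r\n'
}

# Same sed command as above: delete non-matching lines, reduce the first
# matching line to its digits, print it and quit.
remote_size=$(fake_headers | LC_ALL=C sed '/^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]:[[:space:]]*\([0-9]\{1,\}\).*/!d;s//\1/;q')

# sed exits successfully even when nothing matched, so test the value itself.
if [ -n "$remote_size" ]; then
  printf 'remote size: %s\n' "$remote_size"
else
  echo "No content-length" >&2
fi

# With no Content-Length header in the input, the result is simply empty.
missing=$(printf 'HTTP/1.1 200 OK\r\n\r\n' | LC_ALL=C sed '/^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]:[[:space:]]*\([0-9]\{1,\}\).*/!d;s//\1/;q')
```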

die above could be:

die() {
  [ "$#" -eq 0 ] || printf>&2 '%s\n' "$@"
  exit 1
}

(adapt to whatever logging mechanism you want to use).

Also note that, though it would be very unlikely in practice, it's possible for headers to be folded. For instance, the content-length header could be returned as:

Content-Length:<CR>
 123456<CR>

One way to extract the header value is to use formail, which is a tool designed to work with RFC 822 headers:

remote_size=$(curl... | formail -zcx content-length -U content-length)

With -U content-length, if there's more than one Content-Length header, it's the last one that is returned. Change -U to -u to return the first like with grep -m1 above.

You'll still want to sanitise the result or use ['s (not [[...]]'s!) -lt/-eq/-gt operators instead of ((...)) to avoid the ACE vulnerabilities.
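One possible shape for that sanitisation, combined with the [ operators, is sketched below. The helper names is_uint and compare_sizes are hypothetical, and the digits are listed explicitly in the pattern so that the check does not depend on how the locale treats a 0-9 range.

```shell
# is_uint (hypothetical helper): succeeds only for a non-empty string
# made entirely of ASCII decimal digits.
is_uint() {
  case $1 in
    ('' | *[!0123456789]*) return 1 ;;
    (*) return 0 ;;
  esac
}

# compare_sizes LOCAL REMOTE (hypothetical helper): validates both values,
# then compares them with [ rather than ((...)).
compare_sizes() {
  is_uint "$1" && is_uint "$2" || {
    echo "invalid size" >&2
    return 2
  }
  if [ "$1" -eq "$2" ]; then
    echo same
  elif [ "$1" -lt "$2" ]; then
    echo "remote bigger"
  else
    echo "local bigger"
  fi
}
```

It could then be called as compare_sizes "$local_size" "$remote_size". A value that still carries a trailing CR, or anything else that is not a digit, is rejected before it reaches any numeric comparison.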

With curl 7.84.0 or newer, you can also get curl to give you the value of that header directly with:

remote_size=$(curl -w '%header{content-length}' -sIo /dev/null -- "$url") || die

Through testing, I find that

  • if there are several occurrences of the header, it will return the value for the first one only
  • it will complain if the value doesn't start with a digit optionally preceded by a +, but the value still needs to be sanitised, as whatever characters come after that are passed along.
  • it does support folded headers, but rejects a content-length whose first line has an empty value.