$ curl -sI https://google.com | sed -n '/content-length/l'
content-length: 220\r$
See the carriage return (aka CR, \r, ^M) at the end of the line ($ is sed's way to represent the end of the line). HTTP headers are delimited with CRLF, while the Unix line delimiter is LF.
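To see the problem without a network connection, here's a minimal POSIX sh sketch. It feeds a simulated, CRLF-terminated response through printf instead of a real curl call (the header values are made up), makes the stray CR visible with sed's l command, and strips it with a ${var%pattern} expansion:

```shell
# Simulated HEAD response (assumption: made-up values); HTTP ends each header line with CR LF.
headers=$(printf 'HTTP/1.1 301 Moved Permanently\r\ncontent-length: 220\r\n\r\n')

# sed's l command shows the CR as \r and marks the end of each line with $
printf '%s\n' "$headers" | sed -n '/content-length/l'

# A naive extraction keeps the trailing CR: the value is "220<CR>", not "220"
raw=$(printf '%s\n' "$headers" | sed -n 's/^content-length: //p')

# Get a literal CR into a variable portably, then remove it from the end of the value
cr=$(printf '\r')
clean=${raw%"$cr"}
printf '%s\n' "$clean"
```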
Also, using unsanitised data in arithmetic expressions in bash and other Korn-like shells is a command injection vulnerability, all the more a problem here since you used the -k (aka --insecure) option, which allows MitM attackers to inject arbitrary headers into responses.
On a GNU system, you can use:
local_size=$(stat -Lc %s -- "$dest/$file") || die
remote_size=$(curl -sI -- "$url" | LC_ALL=C grep -Piom1 '^content-length:\s*\K\d+') ||
die "No content-length"
case $((local_size - remote_size)) in
(0) echo same;;
(-*) echo remote bigger;;
(*) echo local bigger;;
esac
By only returning what \d+ matches in the C locale, we make sure remote_size only contains decimal ASCII digits, removing the ACE (arbitrary command execution) vulnerability.
A standard equivalent of that GNU grep command could be:
LC_ALL=C sed '/^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]:[[:space:]]*\([0-9]\{1,\}\).*/!d;s//\1/;q'
The caveat is that if the header is not found, it doesn't return a false exit status as grep does, so you'd need an additional check for [ -n "$remote_size" ].
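Putting that sed command and the extra emptiness check together, a minimal sketch could look like this. The function name get_content_length is hypothetical, and the headers in the usage lines are simulated with printf rather than fetched with curl:

```shell
# Hypothetical helper: print the first Content-Length value found in the
# headers read on stdin, and fail (like grep -m1 would) when there is none.
get_content_length() {
  value=$(
    LC_ALL=C sed '/^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]:[[:space:]]*\([0-9]\{1,\}\).*/!d;s//\1/;q'
  ) || return
  # sed exits successfully even when no line matched, hence the explicit check
  [ -n "$value" ] || return 1
  printf '%s\n' "$value"
}

# Usage on simulated CRLF-terminated headers (no network needed):
printf 'HTTP/1.1 200 OK\r\nContent-Length: 220\r\n\r\n' | get_content_length
```

Since the capture group only matches [0-9], the trailing CR is left out of the value without any separate stripping step.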
die above could be defined as:
die() {
[ "$#" -eq 0 ] || printf>&2 '%s\n' "$@"
exit 1
}
(adapt to whatever logging mechanism you want to use).
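As a quick illustration of how that die behaves, here's a small sketch: each argument goes to stderr on its own line, and the exit status is 1. The messages are made up, and the subshells are only there so the demo itself keeps running; in a real script, die would end the script:

```shell
die() {
  [ "$#" -eq 0 ] || printf >&2 '%s\n' "$@"
  exit 1
}

# Typical use: abort when a command fails (the messages here are made up)
msg=$( (false || die "download failed" "giving up") 2>&1 )
status=$?
printf 'status=%s\n%s\n' "$status" "$msg"

# With no arguments, die prints nothing and just exits with status 1
silent=$( (die) 2>&1 )
```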
Also note that, though very unlikely in practice, headers can be folded, that is, continued on a following line that starts with whitespace. For instance, the content-length header could be returned as:
Content-Length:<CR>
 123456<CR>
One way to extract the header value is to use formail, a tool designed to work with RFC 822 headers:
remote_size=$(curl... | formail -zcx content-length -U content-length)
With -U content-length, if there's more than one Content-Length header, the last one is returned. Change -U to -u to return the first one instead, like grep -m1 above.
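If formail isn't installed, the unfolding step can be approximated with POSIX tr and awk. This is a swapped-in technique, not what formail does internally, and the function name unfold_headers is hypothetical. It also simplifies RFC 822 unfolding by collapsing the leading whitespace of each continuation line into a single space:

```shell
# Hypothetical helper: read raw headers on stdin, drop the CRs and join each
# continuation line (one starting with a space or tab) onto the previous line.
unfold_headers() {
  tr -d '\r' | awk '
    /^[ \t]/ { sub(/^[ \t]+/, " "); buf = buf $0; next }  # continuation line
    { if (NR > 1) print buf; buf = $0 }                   # start of a new header
    END { if (NR > 0) print buf }'
}

# Example: the folded header from above comes out on a single line
printf 'HTTP/1.1 200 OK\r\nContent-Length:\r\n 123456\r\n\r\n' | unfold_headers
```

Its output can then be fed to the grep or sed commands above in place of curl's raw output; the result still needs the same sanitising.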
You'll still want to sanitise the result, or use ['s (not [[...]]'s!) -lt/-eq/-gt operators instead of ((...)), to avoid the ACE vulnerabilities.
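Here's a sketch of the same three-way comparison as the case $((...)) above, rewritten with [ ... -lt/-gt ... ]. The function name compare_sizes is hypothetical. In bash and dash at least, [ treats its operands as plain decimal integers and fails with an exit status greater than 1 on anything else, rather than evaluating them as arithmetic. The function checks that status explicitly, so that an invalid value is refused instead of falling through to "same":

```shell
# Hypothetical helper: $1 = local size, $2 = remote size.
compare_sizes() {
  [ "$1" -lt "$2" ]
  case $? in
    (0) echo remote bigger; return 0;;
    (1) ;;              # not less than: fall through to the next test
    (*) return 2;;      # [ reported an error (non-integer operand): refuse to compare
  esac
  if [ "$1" -gt "$2" ]; then
    echo local bigger
  else
    echo same
  fi
}

compare_sizes 1024 220
```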
With curl 7.84.0 or newer, you can also get curl to give you the value of that header directly with:
remote_size=$(curl -w '%header{content-length}' -sIo /dev/null -- "$url") || die
Through testing, I find that:
- if there are several occurrences of the header, it returns the value of the first one only;
- it complains if the value doesn't start with a digit optionally preceded by a +, but the value still needs to be sanitised, as whatever characters come after that are passed along;
- it does support folded headers, but rejects a content-length whose first line has an empty value.
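Since whatever follows the leading digits is passed along, the value printed by -w '%header{content-length}' could be sanitised with something like the sketch below. The function name sanitize_length is hypothetical, and v is a plain global variable because local isn't POSIX. It drops an optional leading + and accepts the value only if what remains is a non-empty run of ASCII decimal digits. The digits are listed explicitly rather than as a 0-9 range, whose meaning in bracket expressions can depend on the locale:

```shell
# Hypothetical helper: print the sanitised value, or fail if it isn't a plain number.
sanitize_length() {
  v=${1#+}                              # drop an optional leading "+"
  case $v in
    ('' | *[!0123456789]*) return 1;;   # empty, or contains a non-digit: reject
  esac
  printf '%s\n' "$v"
}

sanitize_length +220

# Example use (needs network access, so only shown as a comment; die as defined above):
# remote_size=$(curl -w '%header{content-length}' -sIo /dev/null -- "$url") &&
#   remote_size=$(sanitize_length "$remote_size") || die "bad Content-Length"
```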
Seems like a CR/LF issue. Remove the $'\r' from the value. – choroba Apr 05 '23 at 10:56