while IFS= read -r line
    do
      LOCATION=$(echo "$line" | awk 'BEGIN { FS = "," } ; { print $1 }')
      USER=$(echo "$line" | awk 'BEGIN { FS = "," } ; { print $2 }')
      MD5=$(echo "$line" | awk 'BEGIN { FS = "," } ; { print $3 }')
      FILE=$(echo "$line" | awk 'BEGIN { FS = "," } ; { print $4 }')
      CHECK=$(md5sum "$FILE" | awk '{ print $1 }')
      FILENAME="${FILE##*/}"
      echo "$FILENAME"
      REMOTECHECK=$(ssh server md5sum filelocation/"${FILENAME}" < /dev/null | awk '{ print $1 }')

      if [[ "$CHECK" == "$MD5" ]]; then

        echo "Local File MD5: "
        echo "$CHECK"
        echo "Remote File MD5: "
        echo "$REMOTECHECK"
      fi
done < _path to file_

The script works great with filenames without spaces, but I run into an issue when there is a file name with spaces.

Output when there is a filename with spaces:

md5sum: path_to_file/File: No such file or directory
md5sum: Name: No such file or directory
md5sum: With: No such file or directory
md5sum: Spaces.mp4: No such file or directory

From what I have been able to tell, the issue is within this line of code:

      REMOTECHECK=$(ssh server md5sum filelocation/"${FILENAME}" < /dev/null | awk '{ print $1 }')

The script above works great with a file name without spaces, the issue only occurs on a file name with spaces.

If you can offer any advice it will be very helpful.

chriss

3 Answers


ssh doesn't run a command on the remote host directly; it sends code for the remote user's login shell to interpret. So if you want that remote shell to execute a given command with a given list of arguments, you need to construct a command line, in that shell's syntax, that will cause it to execute that command with those arguments.

A shell is a command line interpreter. Its primary purpose is to execute commands based on the command lines you give it (a command line being another way of saying code in the shell syntax). In a Korn-like shell like yours, with $FILENAME set to File Name With Spaces.mp4 and a command line like:

 ssh server md5sum filelocation/"${FILENAME}"

The shell's job is to execute a file found in $PATH whose name is ssh (something like /usr/bin/ssh) with these arguments:

  • argv[0]: ssh
  • argv[1]: server
  • argv[2]: md5sum
  • argv[3]: filelocation/File Name With Spaces.mp4

In the shell language syntax, whitespace separates command arguments, $xxx triggers parameter expansion, and the quotes here are used to prevent split+glob upon that expansion.
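The effect of those quotes on the local side can be checked without involving ssh at all. A minimal sketch, assuming a POSIX sh or bash (zsh does not split unquoted expansions by default); count_args is a hypothetical helper that only reports how many arguments it received:

```shell
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() {
  echo "$#"
}

FILENAME='File Name With Spaces.mp4'

# Quoted expansion: the file name stays one argument (ssh, server, md5sum, path).
quoted=$(count_args ssh server md5sum filelocation/"${FILENAME}")

# Unquoted expansion: split+glob breaks the name into four words.
unquoted=$(count_args ssh server md5sum filelocation/${FILENAME})

echo "quoted: $quoted, unquoted: $unquoted"
```

So the local quoting in the original script is already correct; the splitting seen in the error output happens later, on the remote side.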

Then ssh's job is, from the list of arguments it receives, to connect to server, join the remaining arguments with spaces, and pass the result to the login shell of the remote user (their preferred shell, which they can change with chsh; zsh for me, but it could be tcsh, fish, yash, bash, rc...) by executing that shell with these arguments:

  • argv[0]: that shell's name
  • argv[1]: -c
  • argv[2]: that-result, so here: md5sum filelocation/File Name With Spaces.mp4

Here, while all shells have different syntaxes, that command line is simple enough that it will be interpreted the same by most. That is, it will execute a /path/to/md5sum command with these arguments:

  • argv[0]: md5sum
  • argv[1]: filelocation/File
  • argv[2]: Name
  • argv[3]: With
  • argv[4]: Spaces.mp4

For the md5sum command to be run with one filelocation/File Name With Spaces.mp4 argument instead, we'd need to tell that remote shell that those spaces are not to be taken as argument separators. And that's done via quoting/escaping. And the quoting syntax varies significantly between shells.

In any case, space is not the only character that would cause a problem. Any character that is special in the remote shell syntax would be a problem as well. For instance, if the filename was $(reboot).mp4 or blah;rm -rf ~;blah.mp4, you'd have bigger problems.
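Both effects can be reproduced locally. The sketch below uses fake_ssh, a hypothetical stand-in for ssh that drops the host name, joins the remaining arguments with spaces and hands the result to sh -c. It assumes the remote login shell is a POSIX sh and uses a harmless echo in place of something like reboot:

```shell
# fake_ssh is a hypothetical stand-in for ssh (no network involved).
fake_ssh() {
  shift          # drop the host name
  sh -c "$*"     # join the remaining arguments with spaces; a new shell parses them
}

# 1. Spaces: the "remote" shell re-splits the joined command line.
FILENAME='File Name With Spaces.mp4'
split=$(fake_ssh server "printf '<%s>\n'" filelocation/"${FILENAME}")
printf '%s\n' "$split"

# 2. Special characters: the "remote" shell runs the command substitution.
FILENAME='$(echo injected).mp4'
injected=$(fake_ssh server echo filelocation/"${FILENAME}")
printf '%s\n' "$injected"
```

The first call prints four bracketed words instead of one path, and the second prints filelocation/injected.mp4, showing that code embedded in the file name was executed.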

If you know that remote shell is Bourne-like, you could do:

#! /bin/zsh -
while IFS=, read -ru3 location user md5 file rest; do
  md5sum -- $file | read check rest
  filename=$file:t
  print -r -- $filename
  ssh -n server "md5sum filelocation/${(qq)filename}" | read remotecheck rest
  if [[ $md5 = $check ]]; then
    printf '%s File MD5: %s\n' Local "$check" Remote "$remotecheck"
  fi
done 3< $path_to_file

That ${(qq)filename} quotes with single quotes, which is the safest way to quote things in Bourne-like shells. So in your case, File Name With Spaces.mp4 would be passed as 'File Name With Spaces.mp4'. If it was File Name With Quote's.mp4, it would be 'File Name With Quote'\''s.mp4', where everything is quoted with '...' except for the ' itself, which is quoted with \.
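If the local script has to stay in sh or bash rather than zsh, the same single-quote quoting can be approximated with sed. A minimal sketch; sq_quote is a hypothetical helper name, and it assumes the file name does not end in a newline (command substitution would strip it) and that the remote shell is Bourne-like:

```shell
# sq_quote is a hypothetical helper: wrap the argument in single quotes and
# rewrite every embedded ' as '\'' so a Bourne-like shell reads it back verbatim.
sq_quote() {
  printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

FILENAME="File Name With Quote's.mp4"
quoted=$(sq_quote "$FILENAME")
printf '%s\n' "$quoted"

# The quoted form could then be embedded in the remote command line, e.g.:
#   ssh -n server "md5sum filelocation/$quoted"
```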

If you cannot guarantee that the remote shell be Bourne-like, see How to execute an arbitrary simple command over ssh without knowing the login shell of the remote user? for more options.

Here, for your particular use-case, to compare the local and remote checksums, another option is to use md5sum's check mode (with -c):

#! /bin/zsh -
while IFS=, read -ru3 location user md5 file rest; do
  (cd -P -- $file:h && md5sum -- $file:t) |
    ssh -n server 'cd ./filelocation && md5sum -c'
done 3< $path_to_file

This time, the name of the file is written by the local md5sum and read by the remote one on its stdin, so we don't need to quote it for the remote shell. And that cd ./filelocation && md5sum -c command line is understood by most shells (the ./ prefix is to avoid the effect of $cdpath/$CDPATH in csh/tcsh/bash, the shells that do or can read their rc file when invoked non-interactively or over ssh).
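The check-mode round trip can be tried locally before putting ssh in the middle. A minimal sketch, assuming GNU md5sum; the two temporary directories are hypothetical stand-ins for the local and remote sides:

```shell
# Two temporary directories play the roles of the local and remote hosts.
local_dir=$(mktemp -d)
remote_dir=$(mktemp -d)
printf 'some data\n' > "$local_dir/File Name With Spaces.mp4"
cp "$local_dir/File Name With Spaces.mp4" "$remote_dir/"

# One side writes "checksum  name"; the other reads it on stdin and verifies it.
# The file name never appears on a command line parsed by the verifying side.
result=$( (cd "$local_dir" && md5sum -- 'File Name With Spaces.mp4') |
          (cd "$remote_dir" && LC_ALL=C md5sum -c) )
printf '%s\n' "$result"

rm -rf "$local_dir" "$remote_dir"
```

md5sum -c exits with a non-zero status when a checksum does not match, so the exit status of the ssh command can be tested as well as its output.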

  • Again, I do not think this is the solution, if it was there would be issues with the entire script running. As of right now, there are only issues running this command with spaces. There is no list of arguments that are being run against the server, there is only one command that is being run. The command that is being run is md5sum filelocation/"${FILENAME}" – chriss Dec 28 '20 at 20:36
  • @chriss exactly why you need not only to quote the filename to protect it from the remote shell, but also to protect those quotes from the local shell – Chris Davies Dec 28 '20 at 23:25
  • @chriss, see if the edit to my answer clears your confusion. – Stéphane Chazelas Dec 29 '20 at 07:52

The simplest approach, assuming you know the remote system is using a shell that supports POSIX sh syntax, is something like this:

#!/bin/sh

while IFS= read -r line
do
  LOCATION=$(echo "$line" | awk 'BEGIN { FS = "," } ; { print $1 }')
  USER=$(echo "$line" | awk 'BEGIN { FS = "," } ; { print $2 }')
  HASH=$(echo "$line" | awk 'BEGIN { FS = "," } ; { print $3 }')
  FILE=$(echo "$line" | awk 'BEGIN { FS = "," } ; { print $4 }')
  CHECK=$(b2sum "$FILE" | awk '{ print $1 }')
  FILENAME="${FILE##*/}"
  REMOTECHECK=$(printf '%s\0' "$FILE" | ssh castro xargs -0 -I{} b2sum "remotefile/{}" | cut -d' ' -f1)
  echo "Local File hash: "
  echo "$CHECK"
  echo "Remote File hash: "
  echo "$REMOTECHECK"
done

There are several things to note:

First, we use xargs -I to specify a single specific path name on the remote system. This is the simplest way to get our path name to the remote side as a single argument without having to quote it for the remote shell. Quoting it ourselves leads to fun edge cases when the file name contains quotation marks. These can be worked around using git rev-parse --sq-quote if you have Git available on your local side, but this is simpler and just as robust. Our xargs usage is not strictly portable (since some systems require the {} to be a separate argument), but Linux and most other common systems implement this behavior.

Second, we minimize the amount of processing that needs to be done on the remote system so that we don't have to make too many assumptions about it. This syntax would probably even work if the remote side were not using a POSIX sh, although I make no guarantees. It does of course rely on the remote side not producing any additional output, but that's practically unavoidable.

Third, we don't use MD5. To quote CERT CC, “Software developers…should avoid using the MD5 algorithm in any capacity.” It's not even suitable as a quick check because I have files on my system that collide with MD5. I've used BLAKE2b here (via b2sum), which is both secure and faster than MD5, or you can use SHA-256 (sha256sum or shasum -a 256) if that's not available.
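The xargs part from the first note can be tested locally, without ssh. A minimal sketch, assuming an xargs that supports -0 (GNU and BSD do, but not every implementation); printf stands in for b2sum so that the argument xargs builds is visible:

```shell
FILENAME="File Name With Quote's.mp4"

# The name travels on stdin, NUL-terminated, so xargs does no quote processing;
# -I{} substitutes it into a single argument of the command.
result=$(printf '%s\0' "$FILENAME" |
         xargs -0 -I{} printf '<%s>\n' "filelocation/{}")
printf '%s\n' "$result"
```

The apostrophe and the spaces come through unchanged inside one pair of brackets, which means the command received the whole path as one argument.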

bk2204
  • By using xargs -I{} you're moving the problem to all the problematic characters associated with xargs (single quotes, double quotes, backslashes, newline, leading whitespace, the EOF character with some implementations, or bytes not forming valid characters in the locale). Also, one can't use echo for arbitrary data. With GNU xargs or compatible, you could use printf '%s\0' "$filename" | ssh ... 'xargs -0 -I{} ...'. – Stéphane Chazelas Dec 29 '20 at 07:46
  • xargs -I doesn't have problems with single and double quotes or other characters other than newline. It is of course the case that my code doesn't handle newlines, but considering that the original is a newline-delimited CSV file, I didn't feel the need to zero-terminate the data because it isn't relevant here. – bk2204 Dec 29 '20 at 17:22
  • try echo 'test"qwe' | xargs -I{} echo {}. Or echo ' test' | xargs -I{} echo "<{}>" with GNU xargs for instance. – Stéphane Chazelas Dec 29 '20 at 17:23
  • Fixed. The manual page appears to need some clarification, it seems. – bk2204 Dec 29 '20 at 17:28
  • Yes, reading the POSIX spec is usually best. For GNU utilities, the full info/html/pdf manual is preferable to the man pages. – Stéphane Chazelas Dec 29 '20 at 17:32
  • Note that that -I{} wouldn't work with csh, tcsh, rc, es and older versions of fish. -I'{}' should be OK with all (don't use double quotes which are not quoting operators in rc/es). – Stéphane Chazelas Dec 29 '20 at 17:37

The problem lies with this line and the way the two shells interpret it.

REMOTECHECK=$(ssh server md5sum filelocation/"${FILENAME}" < /dev/null | awk '{ print $1 }')

Let's assume the filename is "happy monday", and look at the ssh command specifically.

After evaluating the variable value, the local shell gets to see this:

ssh server md5sum 'filelocation/happy monday' < /dev/null

In particular, the quotes are removed and the shell treats the content as a single word, filelocation/happy monday.

The result is now executed by ssh and the command line arguments passed to the remote shell (whatever that might be). Remember, the quotes have been removed, so this is what's executed remotely:

md5sum filelocation/happy monday

At this point md5sum is looking for two files, filelocation/happy and monday.

To prevent the loss of the quotes, it's necessary to wrap the entire command in another set:

ssh server "md5sum 'filelocation/happy monday'"

Reapplying this to your original code:

REMOTECHECK=$(ssh -n server "md5sum 'filelocation/$FILENAME'" | awk '{ print $1 }')
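The nested quoting can be checked locally with a stand-in for ssh. A minimal sketch; fake_ssh is a hypothetical function that joins its arguments with spaces and runs them through sh -c, assuming a POSIX sh on the remote side. It also shows the limit of this form: a file name containing an apostrophe closes the inner quotes early.

```shell
# fake_ssh is a hypothetical stand-in for ssh (no network involved).
fake_ssh() {
  shift          # drop the host name
  sh -c "$*"     # the "remote" shell parses the joined command line
}

# Inner single quotes survive the local shell and protect the space remotely.
FILENAME='happy monday'
ok=$(fake_ssh server "printf '<%s>\n' 'filelocation/$FILENAME'")
printf '%s\n' "$ok"

# An apostrophe in the name unbalances the inner quotes: the remote shell
# reports a syntax error and exits with a non-zero status.
FILENAME="it's monday"
if broken=$(fake_ssh server "printf '<%s>\n' 'filelocation/$FILENAME'" 2>/dev/null); then
  status=0
else
  status=$?
fi
echo "exit status with an apostrophe: $status"
```

For names that may contain quotes, the inner quoting has to escape them as well, or the name has to be passed on stdin instead.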
Chris Davies