
I need to run a remote script using ssh via Ruby (net/ssh) to recursively copy a folder and exclude a subfolder. I am looking for the fastest way to do it, so rsync is not a good option. Also, I understand that ssh uses sh and not bash.

In bash I do:

cp -r srcdir/!(subdir) dstdir

and it works fine. However, when I launch the script via ssh, I receive the error

sh: 1: Syntax error: "(" unexpected

because it is using sh.

I have checked the sh man page, but there is no option to exclude files.

Is my assumption that ssh uses sh correct? Any alternative suggestions?

EDIT 1: In case it is useful, the output of sudo cat /etc/shells is the following:

# /etc/shells: valid login shells
/bin/sh
/bin/dash
/bin/bash
/bin/rbash
/usr/bin/tmux
/usr/bin/screen

EDIT 2: OK. So bash is available, and that does not seem to be the problem. I have verified that ssh is actually using bash. The issue seems to be related to the escaping of the parentheses or the exclamation mark. I have tried to run the command from the shell (macOS), and this is the actual command:

ssh -i .ssh/key.pem ubuntu@X.X.X.X 'mkdir /home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/N; cp -r /home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/mesh/!\(constant\) /home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/N; ln -s /home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/mesh/constant /home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/N/constant'

This way, I receive a different error:

cp: cannot stat '/home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/mesh/!(constant)': No such file or directory

EDIT 3: Based on the comments, I have changed my command to enable extglob.

If I use

ssh -i .ssh/key.pem ubuntu@X.X.X.X 'shopt -s extglob; mkdir /home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/N; cp -r /home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/mesh/!\(constant\) /home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/N; ln -s /home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/mesh/constant /home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/N/constant'

I receive the following error:

cp: cannot stat '/home/ubuntu/OpenFOAM/ubuntu-4.1/run/LES_New-Area_residuals2/mesh/!(constant)': No such file or directory

If I do not escape the parentheses, I get

bash: -c: line 0: syntax error near unexpected token `('
Rojj
  • ssh (well sshd) uses the login shell of the remote user. Could be anything. – Stéphane Chazelas Aug 19 '18 at 17:09
  • Unix doesn't have folders, only directories. :) – tchrist Aug 19 '18 at 18:49
  • In situations like this I often like to just develop the script on the remote host, then either 1) leave it there, ssh in (programmatically if need be) and execute it or 2) if it changes every time, scp it over, execute it via ssh, and then delete it. An extra step maybe, but you don't end up with escaping nightmares and globs expanding locally instead of remotely and all that. Otherwise I would always use heredoc format like @StéphaneChazelas uses below. – Josh Rumbut Aug 20 '18 at 18:00

6 Answers

I don't know why you think that rsync would be slow. The speed of a copy is mostly determined by the speed of the disk. Rsync has many options to specify what you want included and excluded, so it gives you much better control than shell globbing.

As the bash manual states, bash only recognizes the !(pattern) operator if extglob is set. In your example, you didn't set extglob. Furthermore, a bash started as sh is still bash, but it disables some extensions for compatibility.

The SSH server will start the user's login shell, as specified in /etc/passwd. You can either change the shell, or use that shell to start another shell that fits your needs better.
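To see which login shell sshd would start for an account, read the last field of its passwd entry. The sketch below does this locally. The helper name login_shell_of is hypothetical. When getent is unavailable, the awk fallback assumes the account is listed in /etc/passwd.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the login shell (7th passwd field) of a user.
# getent also resolves LDAP/NIS accounts; the awk fallback only reads /etc/passwd.
login_shell_of() {
  local user=$1
  if command -v getent >/dev/null 2>&1; then
    getent passwd "$user" | cut -d: -f7
  else
    awk -F: -v u="$user" '$1 == u { print $7 }' /etc/passwd
  fi
}

login_shell_of "$(id -un)"   # e.g. /bin/bash or /bin/sh, depending on the system
```

Run the same lookup on the remote host to check the account that ssh logs into. If that shell is not bash, chsh -s /bin/bash changes it for future logins. Alternatively, start bash explicitly from whatever shell is there.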

RalfFriedl
  • I tested with time. time cp -r mesh/!(constant) N -> real 1.04s and time rsync -a mesh/ N --exclude=constant -> real 1.8s – Rojj Aug 19 '18 at 12:49
  • @Rojj that’s apples to oranges comparison. For one thing, you’re using -a for rsync but not for cp. That involves preservation of permissions and other attributes, so you’re not actually doing the same thing. – Wildcard Aug 19 '18 at 19:17

SSH runs your login shell on the remote system, whatever that is. But !(foo) requires shopt -s extglob, which you might not have set on the remote.

Try this to see if SSH runs Bash on the remote side:

ssh me@somehost 'echo "$BASH_VERSION"'

If that prints anything, but your startup scripts don't set extglob, you can do it by hand on the command passed to ssh:

ssh me@somehost 'shopt -s extglob
    echo srcdir/!(subdir)'                                 
 # or
ssh me@somehost $'shopt -s extglob\n echo srcdir/!(subdir)'   

extglob affects the parsing of the command line and only takes effect after a newline. So we have to put a literal newline there, because a semicolon isn't enough. This won't work:

ssh me@somehost 'shopt -s extglob; echo srcdir/!(subdir)'   # fails: the whole line is parsed before shopt runs
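You can reproduce the parse-before-run behaviour without a remote host, because bash -c receives the command string the same way the remote login shell does. Here is a minimal local sketch. It assumes bash is installed and uses a scratch directory with two hypothetical files, foo and bar:

```bash
#!/usr/bin/env bash
# Local reproduction (no ssh): "bash -c" plays the role of the remote
# login shell receiving the command string.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch foo bar

# Semicolon: the whole line is parsed before shopt runs, so "!(" is a syntax error.
if ! bash -c 'shopt -s extglob; echo !(foo)' 2>/dev/null; then
  echo "semicolon version: syntax error"
fi

# Literal newline: shopt runs before the second line is parsed.
bash -c $'shopt -s extglob\necho !(foo)'   # prints: bar

# Alternative: enable extglob on the command line, before any parsing happens.
bash -O extglob -c 'echo !(foo)'           # prints: bar
```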

Also note that if you escape the parentheses with backslashes, they lose their special meaning, like any other glob characters. That is not what you want here.

$ touch foo bar; shopt -s extglob; set +o histexpand
$ echo *
bar foo
$ echo !(foo)
bar
$ echo \*
*
$ echo !\(foo\)
!(foo)
ilkkachu

A few notes first:

  • the ssh server doesn't start sh to interpret the command line sent by the client, it runs the login shell of the user on the remote host, as that-shell -c <the-string-provided-by-the-client>. The login shell of the remote user could be anything. Bear in mind that some shells like tcsh, fish or rc have very different syntax from that of sh.
  • it is really a command line, or more exactly a string (that can contain newline characters, so several lines). Even if you do ssh host cmd arg1 'arg 2' where cmd, arg1 and arg 2 are three arguments passed to ssh, ssh concatenates those arguments with spaces and actually sends the cmd arg1 arg 2 string to sshd, and the remote shell would split that into cmd, arg1, arg and 2.
  • !(subdir) is a glob operator (a ksh glob operator, also supported by zsh -o kshglob and bash -O extglob). Like all globs, it excludes hidden files, so beware: subdir may not be the only thing it excludes.
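The second point is easy to check locally. In this sketch, a hypothetical simulate_ssh function joins its arguments with spaces, as ssh does. It then hands the result to bash -c, which stands in for the remote login shell. No network is involved:

```bash
#!/usr/bin/env bash
# Local simulation of how ssh treats extra arguments. Assumption: no real ssh;
# "bash -c" stands in for the remote "login-shell -c string".
simulate_ssh() {
  local remote_string="$*"   # ssh joins its command arguments with spaces
  bash -c "$remote_string"   # the remote shell then re-splits that string
}

printf '%s,' arg1 'arg 2'; echo                  # local:  arg1,arg 2,
simulate_ssh printf '%s,' arg1 'arg 2'; echo     # remote: arg1,arg,2,
simulate_ssh "printf '%s,' arg1 'arg 2'"; echo   # quoted for the remote side: arg1,arg 2,
```

The last line shows the usual fix: pass a single argument that already contains the quoting the remote shell needs.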

Here, to avoid the problem with finding out the right syntax for the remote shell, you can actually tell that other shell to start the shell you want and feed it the code via stdin (one of the options listed at How to execute an arbitrary simple command over ssh without knowing the login shell of the remote user?)

ssh host 'bash -O extglob -O dotglob' << 'EOF'
cp -r srcdir/!(subdir) dstdir/
EOF

bash -O extglob -O dotglob is a command line that is understood the same by all major shells, including Bourne-like ones, csh, rc, fish... The above would work as long as bash is installed and is in the user's $PATH (default $PATH, possibly modified by the user's login shell like with ~/.zshenv for zsh, ~/.cshrc for csh, ~/.bashrc for bash).
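Here is a local sketch of the same heredoc approach, with plain bash standing in for ssh host bash. It assumes a single machine and uses a hypothetical demo tree containing a regular file, a hidden file and the subdirectory to skip:

```bash
#!/usr/bin/env bash
# Local stand-in for: ssh host 'bash -O extglob -O dotglob' << 'EOF'
# Hypothetical tree: srcdir/{keep.txt,.hidden,subdir/skip.txt}
work=$(mktemp -d)
mkdir -p "$work/srcdir/subdir" "$work/dstdir"
touch "$work/srcdir/keep.txt" "$work/srcdir/.hidden" "$work/srcdir/subdir/skip.txt"
cd "$work" || exit 1

# The quoted 'EOF' stops the local shell from expanding anything in the body,
# and -O extglob / -O dotglob are set before the script on stdin is parsed.
bash -O extglob -O dotglob << 'EOF'
cp -r srcdir/!(subdir) dstdir/
EOF

ls -A dstdir   # lists .hidden and keep.txt; subdir is not copied
```

Because of -O dotglob, .hidden is copied as well. Without it, the glob would silently skip hidden files, as noted above.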

POSIXly (though in practice, you may find that more systems have a bash command than a pax command), you could do:

ssh host sh << 'EOF'
cd srcdir && pax -rw -'s|^\.//\./subdir\(/.*\)\{0,1\}$||' .//. /path/to/destdir/
EOF

-s applies substitutions to the paths being transferred. When a substitution expands to nothing, the file is excluded. The problem is that substitutions also apply to the targets of symlinks. That's why we use .//. above: it makes it less likely that a symlink is affected.
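To see what the -s expression actually removes, you can try the same basic regular expression with grep -v on a few sample paths. The paths are shaped the way pax names files when given .//. as the starting point. This is a stand-in for pax, and the file names are made up:

```bash
#!/usr/bin/env bash
# Same BRE as in the pax -s expression; grep -v drops the lines that pax
# would substitute to nothing (i.e. exclude). Sample paths are hypothetical.
pattern='^\.//\./subdir\(/.*\)\{0,1\}$'

printf '%s\n' \
  './/./file.txt' \
  './/./subdir' \
  './/./subdir/inner.txt' \
  './/./subdir2/other.txt' |
  grep -v -e "$pattern"
# keeps .//./file.txt and .//./subdir2/other.txt
```

The \(/.*\)\{0,1\}$ tail is what stops a sibling such as subdir2 from being excluded along with subdir.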

I don't think ssh is limited to using sh. It rather depends on what is installed on the target system, how the user is set up, and what shells are allowed in /etc/shells.

Did you consider the chsh command?

ephsmith
RudiC

If you want to do it in a fast way, you can look at rsync with a different encryption algorithm. This lets you easily exclude files etc. without sacrificing much speed.

rsync -aHAXxv --numeric-ids --progress -e "ssh -T -c arcfour -o Compression=no -x" user@<source>:<source_dir> <dest_dir>

Together with adding arcfour to the line starting with Ciphers in /etc/ssh/ssh_config (if it is not already enabled), this gives you acceptable speed.

WARNING: The arcfour cipher is insecure. Do NOT use it over insecure channels. If you are concerned about arcfour being used to reach the server over insecure channels, make the setting host-specific instead. Create a Host section for your source host in /etc/ssh/ssh_config and put Ciphers arcfour there to mirror the -c switch above. This restricts arcfour encryption to that host only.

For details, refer to the ssh_config man page.

However, if your CPUs support the AES-NI instruction set, try switching to aes128-gcm@openssh.com (yes, that's the cipher name, including the @ stuff), which will use the blazingly fast (with AES-NI) AES128-GCM.

So, with a CPU supporting AES-NI, change "ssh -T -c arcfour -o Compression=no -x" to "ssh -T -c aes128-gcm@openssh.com -o Compression=no -x" for more secure results.
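A host-specific section like the one suggested in the warning could look as follows. This sketch makes several assumptions:

- It uses a placeholder alias, source-host, and a placeholder address, 192.0.2.10.
- It uses the AES-GCM cipher recommended for AES-NI CPUs rather than arcfour.
- It writes to a scratch file instead of /etc/ssh/ssh_config.

RequestTTY, Compression and ForwardX11 are the ssh_config equivalents of the -T, -o Compression=no and -x switches.

```bash
#!/usr/bin/env bash
# Write a hypothetical per-host ssh_config section to a scratch file,
# leaving the system-wide /etc/ssh/ssh_config untouched.
cfg=$(mktemp)
cat > "$cfg" << 'EOF'
Host source-host
    HostName 192.0.2.10
    Ciphers aes128-gcm@openssh.com
    Compression no
    RequestTTY no
    ForwardX11 no
EOF

# rsync would then only need to point ssh at this file, e.g. (paths are placeholders):
#   rsync -aHAXxv --numeric-ids --progress -e "ssh -F $cfg" source-host:/src/ /dest/
```

Keeping the cipher choice in a Host block means the weaker or faster settings apply only to that one host, not to every ssh connection.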

Explanation

rsync

  • (Don't use -z, it is much slower)
  • a: archive mode - recursive, preserves owner, preserves permissions, preserves modification times, preserves group, copies symlinks as symlinks, preserves device files.
  • H: preserves hard-links
  • A: preserves ACLs
  • X: preserves extended attributes
  • x: don't cross file-system boundaries
  • v: increase verbosity
  • --numeric-ids: don't map uid/gid values by user/group name
  • if you need to sync, add --delete: delete extraneous files from dest dirs (differential clean-up during sync)
  • --progress: show progress during transfer

ssh

  • T: turn off pseudo-tty allocation to decrease CPU load on the destination.
  • c arcfour: use the weakest but fastest SSH encryption. Must specify "Ciphers arcfour" in sshd_config on destination.
  • o Compression=no: Turn off SSH compression.
  • x: turn off X forwarding if it is on by default.

The beef is in the ssh options: even if you just use rsync -av with the -e "ssh -T -c arcfour -o Compression=no -x" part, you can get these speeds as well.


Comparison:

  • 13.6 MB/s rsync -az
  • 16.7 MB/s scp -Cr
  • 44.8 MB/s rsync -a
  • 59.8 MB/s sftp
  • 61.2 MB/s scp -r
  • 61.4 MB/s sftp -R 128 -B 65536
  • 62.4 MB/s rsync -a -P -e "ssh -T -c arcfour -o Compression=no -x"
  • 143.5 MB/s scp -r -c arcfour
  • 144.2 MB/s sftp -oCiphers=arcfour

Sources:

https://gist.github.com/KartikTalwar/4393116

http://nz2nz.blogspot.com/2018/05/rsync-scp-sftp-speed-test.html

emk2203

As per my calculations, the fastest full copy is always using 'tar' (here assuming GNU tar or compatible).

mkdir -p photos2 &&
  tar -C photos -cf - --exclude=./.thumbcache . |
  tar -C photos2 -xpf -

And tar has a ton of options to manipulate attributes, permissions and file selection/exclusion. For example, the above command excludes the top-level subfolder called .thumbcache while copying.

Lam Das
  • Note that --exclude=.thumbcache excludes all the .thumbcache files, not only the one at the top-level. With GNU tar (not bsdtar), you can use --exclude=./.thumbcache to only exclude the top-level .thumbcache file. – Stéphane Chazelas Aug 20 '18 at 14:39
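You can check the difference pointed out in that comment on a throwaway tree. This sketch assumes GNU tar, because bsdtar matches exclude patterns differently. The photos layout and the copy_excluding helper are hypothetical:

```bash
#!/usr/bin/env bash
# Compare the two --exclude spellings (assumption: GNU tar).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p photos/.thumbcache photos/album/.thumbcache
touch photos/a.jpg photos/.thumbcache/t1 photos/album/b.jpg photos/album/.thumbcache/t2

# Hypothetical helper: $1 = exclude pattern, $2 = destination directory.
# Same tar pipe as in the answer; --exclude must come before the "." operand.
copy_excluding() {
  mkdir -p "$2" &&
    tar -C photos -cf - --exclude="$1" . |
    tar -C "$2" -xpf -
}

copy_excluding ./.thumbcache top_only     # drops only photos/.thumbcache
copy_excluding .thumbcache   everywhere   # drops every .thumbcache directory
```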