
I want to use rsync to remove certain files (see "Efficiently delete large directory containing thousands of files"). In this case the files are selected by patterns passed to a shell script on the command line.

So far, this is what I have in my shell script rsync_del.sh:

#!/bin/bash

TARGET_DIR=${1}
shift
PATTERNS="${@}"
for patt in ${PATTERNS} ; do
    # Both do the same
    #INCLUDE_PATTERNS="${INCLUDE_PATTERNS}"' --include='\'"${patt}"\'
    INCLUDE_PATTERNS="${INCLUDE_PATTERNS} --include=\"${patt}\""
done

EMPTYDIR=$(mktemp -d)
echo "Created empty dir ${EMPTYDIR}"
comm="rsync -a --progress --delete ${INCLUDE_PATTERNS} ${EMPTYDIR}/ ${TARGET_DIR}"
echo ${comm}
eval ${comm}

Example patterns that I would like to use are *[1-9].txt, *000??9.txt. The problem is that, when executing

rsync_del.sh trg_dir '*[1-9].txt'

the command line generated is

rsync -a --progress --delete --include='*[1-9].txt' /tmp/tmp.51R9hPgkfG/ trg_dir/

(which seems ok to me), but it is matching, e.g., files like input.dat (and I don't want that).

What is the correct way of implementing/using this? I suspect it is a matter of properly escaping patterns, but I could not make this work.

Note: I need to define the command to be executed in a variable, to echo it prior to executing.

2 Answers


I'm not looking too closely at the actual use of rsync here; instead I'm concentrating on how the rsync command line is created.

#!/bin/bash

target=$1
shift

empty=$( mktemp -d )

trap 'rmdir "$empty"' EXIT

for pattern do
   incl+=( --include="$pattern" )
done

rsync --archive --progress --delete "${incl[@]}" --exclude='*' "$empty"/ "$target"

By using an array (the incl array above), you store the arguments to rsync as separate elements rather than as a single string. The expansion of "${incl[@]}" yields the elements of the array as individually quoted words, one argument per element. Quoting the arguments becomes trivial and there is no need to call eval. Note also that all parameter expansions need to be properly double quoted.
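Since the question asks for the command to be echoed before it is run, here is a minimal sketch of one way to do that with an array and without eval. It assumes bash (printf '%q' is not POSIX), and the values of target, empty and patterns are hypothetical stand-ins for the script's real arguments and temporary directory.

```shell
#!/bin/bash
# Hypothetical stand-ins for the script's arguments and temporary directory.
target='trg_dir'
empty='/tmp/empty.example'
patterns=( '*[1-9].txt' '* *' )

incl=()
for pattern in "${patterns[@]}"; do
    incl+=( --include="$pattern" )
done

# Assemble the whole command as an array: one element per argument.
cmd=( rsync --archive --progress --delete "${incl[@]}" --exclude='*' "$empty"/ "$target" )

# %q prints each element escaped so that it can be pasted back into bash.
printed=$(printf '%q ' "${cmd[@]}")
printf '%s\n' "$printed"

# Run exactly what was printed (commented out here because it deletes files):
# "${cmd[@]}"
```

The printed line can be copied and reused with minor variations, while the array itself is what gets executed, so nothing is parsed a second time.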

The issue with your code is that most of its parameter expansions are unquoted. This makes the shell perform word splitting and filename generation (globbing) on the expanded values. This in turn means that you cannot use patterns containing spaces, or patterns that happen to match the names of existing files.
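The effect of leaving expansions unquoted can be seen in a small sandbox. This is only an illustrative sketch: the file names, the variable names and the count_args helper are made up for the demonstration, and it runs in a temporary directory so nothing real is touched.

```shell
#!/bin/sh
# Work in a throwaway directory holding two files that the patterns can match.
start=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch file1.txt 'a b'

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

pattern='*[1-9].txt'
spaced='* *'

# Unquoted: '* *' is split into two words and each word is globbed.
unquoted=$(count_args $spaced)
# Quoted: the pattern reaches the command as one untouched argument.
quoted=$(count_args "$spaced")
# Unquoted: the pattern is replaced by the name of the file it matches.
expanded=$(printf '%s' $pattern)

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
echo "unquoted pattern became: $expanded"

cd "$start" || exit 1
rm -rf "$demo"
```

With the two files present, the unquoted '* *' turns into four arguments and the unquoted '*[1-9].txt' turns into file1.txt, so rsync would never see the pattern itself.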


For /bin/sh, which has no arrays, the list of positional parameters can be used instead, and the syntax becomes even less verbose:

#!/bin/sh

target=$1
shift

empty=$( mktemp -d )

trap 'rmdir "$empty"' EXIT

for pattern do
   set -- "$@" --include="$pattern"
   shift
done

rsync --archive --progress --delete "$@" --exclude='*' "$empty"/ "$target"
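To see what the set/shift loop does to the positional parameters, the same loop can be wrapped in a function and its result printed. This is just a sketch for inspection; the function name to_includes is hypothetical and not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper that rewrites its arguments and prints them one per line.
to_includes() {
    # The list to loop over is fixed when the loop starts, so changing
    # "$@" inside the body does not disturb the iteration.
    for pattern do
        set -- "$@" --include="$pattern"   # append the converted pattern
        shift                              # drop the original from the front
    done
    printf '%s\n' "$@"
}

result=$(to_includes '*[1-9].txt' '* *')
printf '%s\n' "$result"
```

Each pass appends one converted pattern and removes one original, so after the loop "$@" holds only the --include options, in the original order and with spaces preserved.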
Kusalananda
  • Thanks. Use of eval was only to be able to echo the command line to be used, as a reference and to possibly copy and reuse it with minor variations. – sancho.s ReinstateMonicaCellio Sep 27 '18 at 06:48
  • Perhaps it should be for pattern in ... ; do – sancho.s ReinstateMonicaCellio Sep 27 '18 at 06:49
  • @sancho.s No, the loop, when written as I have done above, will loop over the positional parameters correctly. – Kusalananda Sep 27 '18 at 06:54
  • Is there any advantage with using the array? It seems to produce exactly the same result. +1 Even if it was not the cause of the problem of the OP, you identified the need for escaping, so I changed my line to enclose ${INCLUDE_PATTERNS} in the double quotes to avoid local expansion. – sancho.s ReinstateMonicaCellio Oct 02 '18 at 07:14
  • @sancho.s You should do some standard tests with that, such as using a pattern with spaces (maybe '* *' to match filenames that contain spaces) and a pattern that matches a file in the current directory (maybe '*.txt' or just '*'). – Kusalananda Oct 02 '18 at 07:28

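I forgot that I should close the pattern list by excluding anything not explicitly included. Without a final exclude rule, every file in the target is implicitly included, so --delete removes all of them, which is why input.dat was being matched. I used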

comm="rsync -a --progress --delete ${INCLUDE_PATTERNS} --exclude=\"*\" ${EMPTYDIR}/ ${TARGET_DIR}"
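A small sandbox run illustrates the effect of the closing exclude rule. This is only a sketch: the file names are made up, both directories are temporary ones created by mktemp, and it assumes rsync is installed (the rsync step is skipped otherwise).

```shell
#!/bin/sh
# Sandbox demo: every path here is a throwaway directory created by mktemp.
target=$(mktemp -d)
empty=$(mktemp -d)
touch "$target/a1.txt" "$target/b0.txt" "$target/input.dat"

if command -v rsync >/dev/null 2>&1; then
    # Names matching the include rule are not protected, so --delete removes
    # them; everything else falls through to --exclude='*' and is left alone.
    rsync --archive --delete --include='*[1-9].txt' --exclude='*' "$empty"/ "$target"
fi

# List what survived in the target directory.
remaining=$(ls "$target")
printf '%s\n' "$remaining"

rm -rf "$target" "$empty"
```

Only a1.txt matches '*[1-9].txt', so it is the only file that should be removed; b0.txt and input.dat are caught by the exclude rule and kept.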