In Linux, only two characters are forbidden in filenames: the slash and the null character. So every character with a special meaning in any scripting language has to be escaped, BUT every escape sequence is itself allowed in file names! Even worse, in bash for example, each escaping method covers only some characters, so to escape a wide range of characters you have to combine several escaping methods, BUT they interfere with each other! Even worse, some commands use certain characters for their own purposes and other commands use different ones, so for every simple operation on files you have to escape the file name differently! Even worse, only the null character can safely separate filenames, but most commands cannot work with it. Even worse, in Linux basically everything is a file... So this seems to be not just a nuisance but a matter of security and stability, because a large part of Linux is script-based and therefore badly flawed!

So show me where I'm wrong... Is it even possible to correctly handle all possible file names?

Clarification. Originally I wanted to:

  1. list the files and folders under a given path

  2. search the list for entries that match given criteria (age, file pattern, or size)

  3. move the matched files and folders into categories, e.g. movies

Because of the complexity of the tests it was not possible (or practical) to do this in one command, so I had to pass file names between different commands. Bash globbing was the first thing I threw out, because of spaces in filenames. Globbing always split a filename with spaces into two list elements. Then I tried to use "find". This was better, but much slower and difficult to use.

I cannot use any special character to escape a file name, because I don't know which characters might be in the file name. After some testing I discovered that it is only a matter of time before any given character occurs.

I've tried defining filters like: audio_ext=(*.mp3 *.wav *.ogg *.mid *.mod *.stm *.s3m *.it *.wma *.669 *.ac3) I soon realized that this way I cannot define filters for repeated use, because globbing kicks in right away. So I disabled globbing and history with set -fH. Without globbing I had to do the expansion by hand:

while IFS= read -r -d $'\0'; do
    list+=("$REPLY")
done < <( find . -maxdepth 1 -mindepth 1 ${params[@]} -print0 2>/dev/null )

Where params is an array like "-iname" "*.mp3" "-o" "-iname" "*.wav" etc. This worked until a file had "(" in its name. Then find returned an error about wrong usage.

To tell the truth... Until recently I had used a batch script for this task for 15 years. Writing it took about one or two afternoons. It had drawbacks, and an issue with ! in filenames, but generally it worked. Now I have been trying for almost two months to write it in bash. It's ugly, complicated, very buggy, and it seems it will never work well.

harvald
  • 19
  • 3
    What do you mean by handle? How are you enumerating the files? How are you calling the files? Are things being properly quoted? Is there a particular issue you are encountering? – jesse_b Apr 12 '18 at 16:25
  • 6
    This question is too broad. Please give an explicit example of an issue that you are having. Filenames are handled in many different ways depending on the problem at hand. This can be done in a secure way, but is done differently depending on the tools involved. – Kusalananda Apr 12 '18 at 16:27
  • I have voted to reopen this question as it now contains a reasonable explicit question to answer. – Kusalananda Apr 12 '18 at 17:45
  • 1
    I'm voting to close this as "unclear what you're asking" again. Based on the commentary on the answers and the claims that they don't work, or don't address what the asker is trying to accomplish, it seems that the question is not being properly defined. The question needs to contain an explicit example of what is trying to be accomplished, and demonstration of how it is not working. – phemmer Apr 12 '18 at 20:26
  • 2
    Regarding your last addition to the question: You need to quote ${audio_ext[@]} as "${audio_ext[@]}". If you use unquoted variable expansions, then yes, you are definitely going to have problems. – Kusalananda Apr 13 '18 at 07:48
  • Also, you probably left out -name as well. – Kusalananda Apr 13 '18 at 08:04
  • I cannot quote it, because then find treats the entire array as one name, and that is simply wrong. Before that I tried running find once for every filter, so once each for "*.mp3", "*.wav", etc., but even in a simple case, instead of one or two seconds it takes minutes! – harvald Apr 13 '18 at 10:12
  • 1
    The @ variable expansion is special. You must quote it for it to work properly. Try n=('ten' 'forty two' 'one hundred'); for a in "${n[@]}"; do echo "> $a <"; done and then compare that without the double quotes around ${n[@]}. For bonus points repeat both attempts with * substituted for @. – Chris Davies Apr 13 '18 at 11:16
  • What you were doing initially with audio_ext (or, to be precise, what you say you were doing) would never have worked, even with the quotes.  It isn’t conforming to find syntax.  What you have now, with params, still needs to be quoted (to allow things like params=(-iname '*star wars*')). And even then, it’s a disaster waiting to happen, because the -o isn’t going to do what you want unless you enclose it in parentheses (again, as per find syntax). – Scott - Слава Україні Apr 13 '18 at 21:04
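
The points made in the comments above (quote "${params[@]}", and group the -o alternatives in parentheses) can be put together in a short sketch. It assumes bash and a find that supports -iname and -print0 (GNU and BSD find both do). The scratch directory and its file names are invented for the demonstration; params and list follow the naming used in the question.

```bash
# Scratch directory with awkward names (hypothetical test data).
tmp=$(mktemp -d)
touch "$tmp/star wars (1977).mp3" "$tmp/b.WAV" "$tmp/notes.txt"

# Each array element becomes exactly one argument to find.
# The parentheses group the -o alternatives so -print0 applies to all of them.
params=( '(' -iname '*.mp3' -o -iname '*.wav' ')' )

list=()
while IFS= read -r -d '' name; do
    list+=("$name")
done < <(find "$tmp" -mindepth 1 -maxdepth 1 "${params[@]}" -print0)

printf 'matched: %s\n' "${list[@]}"
rm -rf "$tmp"
```

Because the array is expanded inside double quotes, a pattern such as '*star wars*' would also stay a single argument, and find runs only once for all filters.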

2 Answers

9

Simple. Use globbing to select the files you want, and quote the variable that holds the filename:

shopt -s nullglob
for file in ./*.txt; do
    do_something_with "$file"
done

That's really all there is to it.

Update: globbing is not responsible for the word splitting effect you're seeing. Failing to quote the variable is.
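
A small experiment may make the difference visible. This is only a sketch, assuming bash; the scratch directory, the file names, and the variable names are made up for illustration.

```bash
tmp=$(mktemp -d)
touch "$tmp/two words.txt" "$tmp/plain.txt"

shopt -s nullglob
files=( "$tmp"/*.txt )          # globbing: one array element per file
echo "glob produced ${#files[@]} names"

name='two words.txt'
unquoted=( $name )              # unquoted expansion: split at the space
quoted=( "$name" )              # quoted expansion: stays one word
echo "unquoted: ${#unquoted[@]} words, quoted: ${#quoted[@]} word"
rm -rf "$tmp"
```

The glob yields one element for the name containing a space; only the unquoted expansion breaks it apart.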

You can get file info for your conditions with stat (the -c option shown here is the GNU form):

read size mtime < <(stat -c "%s %Y" "$file")
[[ $size -gt 1000 ]] && echo "too big"
[[ $mtime -lt $(date -d yesterday +%s) ]] && echo "too old"
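
For the file-pattern criterion from the question, one possible approach is to store the extensions as plain words instead of glob patterns, so there is nothing for the shell to expand. The sketch below assumes bash 4 or later (for the ${var,,} lower-casing); the function name category_for, the extension lists, and the category names are hypothetical.

```bash
audio_ext=(mp3 wav ogg wma)     # plain words, nothing for the shell to expand
video_ext=(mov avi mkv)

# Print the category for one filename, based on its extension.
category_for() {
    local ext=${1##*.} e
    ext=${ext,,}                # lower-case the extension (bash 4+)
    for e in "${audio_ext[@]}"; do [[ $ext == "$e" ]] && { echo music; return; }; done
    for e in "${video_ext[@]}"; do [[ $ext == "$e" ]] && { echo movies; return; }; done
    echo other
}

category_for 'Some Song (live).MP3'
category_for './clips/holiday [2018].mov'
category_for 'README'
```

Inside a glob loop this could be called as category_for "$file", with the result deciding the destination directory.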

Update 2: creating a filename with many special characters in it requires mixing various quoting mechanisms, but it's still possible to do anything with that file.

$ filename='~ASDFzxcv!@#$%^&*()_+[]\{}|;:",.<>?`'"'"$' \a\t\n\r\f'".txt"
#          ^^ single quoted part ^^^^^^^^^^^^^^^^   
#                             double quoted part ^^^
#                                ANSI-C quoted part ^^^^^^^^^^^^^^

$ echo "$filename"
~ASDFzxcv!@#$%^&*()_+[]\{}|;:",.<>?`'   

.txt

$ printf "%q\n" "$filename"
$'~ASDFzxcv!@#$%^&*()_+[]\\{}|;:",.<>?`\' \a\t\n\r\f.txt'

$ date > "$filename"

$ cat "$filename"
Thu Apr 12 15:14:29 EDT 2018

$ ls -lt
total 3836
-rw-rw-r-- 1 jackman jackman      29 Apr 12 15:14 ~ASDFzxcv!@#$%^&*()_+[]\{}|;:",.<>?`' ?????.txt
                ︙

$ ls -lt --show-control-chars
total 3836
-rw-rw-r-- 1 jackman jackman      29 Apr 12 15:14 ~ASDFzxcv!@#$%^&*()_+[]\{}|;:",.<>?`'     

.txt
                ︙

If the output of ls is redirected to anything other than a terminal (e.g., a file or a pipe), it will use the --show-control-chars style by default.  You can see this by running ls -lt | cat

ls has other display options; e.g., --quoting-style=WORD.
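
As a side note on the printf "%q" output shown earlier: that form is meant to be reusable as shell input. The sketch below, assuming bash, checks this with a round trip through eval. The eval is there only to demonstrate the point and the variable names are made up; normal scripts should just keep using "$filename".

```bash
filename='~ASDF!@#$%^&*()[]\{}|;:",.<>?`'"'"$' \t\n'".txt"

printf -v quoted '%q' "$filename"   # shell-quoted form of the name
eval "copy=$quoted"                 # let the shell parse that form back
[[ $copy == "$filename" ]] && echo "round trip ok"
```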

glenn jackman
  • 85,964
4

Filenames can use any character except the nul character (\0) and slash, which is a path separator. Variables may hold any data (except nul characters in most shells). If properly quoted, filenames may be safely stored in variables and used with utilities.

Regarding your points:

To iterate over a set of files (regular files or directories), you may use a simple shell loop like

for name in ./*; do
    # some code that uses "$name"
done

To iterate over files while selecting particular files using specific criteria, find is a better choice. For example, to select all regular files in the current directory (or below) that are older than N days (have a modification date at least N days in the past):

find . -type f -mtime +N

Similarly, -size is used to select files based on size, and -name to match the filename against a globbing pattern.
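
A minimal sketch of combining -name with -size, under the assumption of a scratch directory with invented file names. The c suffix makes -size count bytes, so +1000c means "more than 1000 bytes".

```bash
tmp=$(mktemp -d)
printf '%2000s' '' > "$tmp/big clip.mov"    # 2000 bytes of spaces
printf 'x'         > "$tmp/tiny.mov"        # 1 byte
printf '%2000s' '' > "$tmp/big notes.txt"   # large, but wrong extension

# The pattern is quoted so that find, not the shell, does the matching.
# Capturing the output in a variable is acceptable here only because
# these test names contain no newlines.
found=$(find "$tmp" -type f -name '*.mov' -size +1000c)
printf '%s\n' "$found"
rm -rf "$tmp"
```

Only the large .mov file satisfies both tests, since find combines adjacent tests with a logical AND.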

For example, to select the regular files that have filenames that match *.mov and that have been modified in the last week:

find . -type f -name '*.mov' -mtime -7

Then, to actually do something with these files, like moving them to the $HOME/Movies directory:

find . -type f -name '*.mov' -mtime -7 -exec mv {} "$HOME/Movies" ';'

The {} will be replaced by the pathname of the file in the invocation of mv. You do not need to quote the {} (it won't change anything if you do), as find will not invoke the shell's word splitting or filename expansion on the pathname.

A further improvement to this would be to detect file name collisions in the destination directory. For this we use a short helper script that will take a number of filenames on its command line:

destdir="$HOME/Movies"
for name do
    if [ -f "$destdir/${name##*/}" ]; then
        printf "%s already exists in %s, not overwriting it!\n" "${name##*/}" "$destdir" >&2
    else
        mv "$name" "$destdir"
    fi
done

or, in shortcut form:

destdir="$HOME/Movies"
for name do
    [ -f "$destdir/${name##*/}" ] && printf "skipping %s\n" "$name" >&2 && continue
    mv "$name" "$destdir"
done

Plugging that into our find command:

find . -type f -name '*.mov' -mtime -7 -exec sh -c '
    destdir="$HOME/Movies"
    for name do
        [ -f "$destdir/${name##*/}" ] && printf "skipping %s\n" "$name" >&2 && continue
        mv "$name" "$destdir"
    done' sh {} +

Nowhere along the way do we allow the shell to do word splitting or filename globbing on the pathname or filename that we are currently processing.
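
One way to check that claim is to run the same command against a deliberately hostile name. This is a sketch under a few assumptions: the scratch directories and file names are invented, and destdir is passed through the environment (a change from the command above) so the test does not touch $HOME.

```bash
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/Movies"
weird=$'new\nline (1) $HOME *.mov'          # newline, parentheses, $ and *
touch "$tmp/src/$weird" "$tmp/src/dup.mov" "$tmp/Movies/dup.mov"

# destdir reaches the inner sh through the environment of find.
destdir="$tmp/Movies" find "$tmp/src" -type f -name '*.mov' -exec sh -c '
    for name do
        [ -f "$destdir/${name##*/}" ] && printf "skipping %s\n" "$name" >&2 && continue
        mv "$name" "$destdir"
    done' sh {} +

moved=no;   [ -f "$tmp/Movies/$weird" ] && moved=yes
skipped=no; [ -f "$tmp/src/dup.mov" ]   && skipped=yes
echo "moved=$moved skipped=$skipped"
rm -rf "$tmp"
```

The oddly named file arrives under exactly the same name, and the file that already exists in the destination is left where it was.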

Kusalananda
  • 333,661
  • @JeffSchaller That's absolutely true. – Kusalananda Apr 12 '18 at 20:18
  • Of course your shortcut-form script (with && continue) will go awry if printf ever returns a failing exit status. – Scott - Слава Україні Apr 13 '18 at 20:23
  • @Scott Yes, definitely, which it would do if standard error was closed. If that's an issue, the printf should be removed. – Kusalananda Apr 13 '18 at 20:27
  • @Scott Thanks for the edit. I will back out your edit about {} though because I think Stéphane might have a point here: https://unix.stackexchange.com/a/156010/116858 – Kusalananda Apr 13 '18 at 20:35
  • 1
    Stéphane definitely has a point (when does he not?), but I believe that he’s talking specifically about things like find (whatever) -exec sh -c 'mv {} {}.bak' ';', where you’re asking the shell to parse the pathname that gets substituted for {} — and, therefore, his point is irrelevant here. AFAIK, things like -exec mv {} target-dir ';' are safe, as are -exec sh -c '(code using $1)' {} ';' (or … {} +), unless you do something really stupid.  And how can quoting the {} (when it’s a free-standing argument, and not embedded in something like 'mv {} {}.bak') possibly affect anything? – Scott - Слава Україні Apr 13 '18 at 20:54
  • @Scott Ah, I see what you mean, yes. Feel free to edit again if you wish (I'm not by a computer ATM, so editing is awkward). – Kusalananda Apr 13 '18 at 20:59