
I've written a shell script that handles some "regular" filenames, but I've read "Why does my shell script choke on whitespace or other special characters?" and "Why you shouldn't parse the output of ls", and I'd like it to be more robust and handle any valid filenames (and/or directory names). How can I create a test-bed of files and directories to run my script against?

Jeff Schaller
  • 67,283
  • 35
  • 116
  • 255

1 Answer


Create a separate directory to play in (for ease of cleaning up later, mainly); this uses the value of $TMPDIR if it's set, otherwise /tmp:

mkdir "${TMPDIR-/tmp}/testing"
cd "${TMPDIR-/tmp}/testing"
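
If you'd rather not reuse a fixed directory name between runs, a variation is sketched below. It assumes your system has mktemp with the -d option (common, but not something every system guarantees); the variable name testbed is just illustrative.

```shell
# Sketch: a throwaway test-bed directory with a unique name.
# Assumes mktemp -d is available; "testbed" is a made-up variable name.
testbed=$(mktemp -d "${TMPDIR:-/tmp}/testing.XXXXXX") || exit 1

# Remove the whole tree when the script exits, however it exits.
trap 'cd / && rm -rf -- "$testbed"' EXIT

cd -- "$testbed" || exit 1
printf 'working in %s\n' "$PWD"
```

The trap steps out of the directory before deleting it, so the cleanup does not depend on being able to remove the current working directory.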

Create files that are distinct but look alike, because their names differ only in whitespace or non-printing characters (space, tab, newline, carriage return, backspace, and two Unicode spaces):

touch -- a b 'a ' 'b ' 'a b' 'a  b' $'a\bb'
touch -- a$'\xe2\x80\x82'b a$'\xe2\x80\x83'b a$'\t'b a$'\n'b a$'\r'b

Credit for the above to Patrick. The two names with hex escapes contain the UTF-8 encodings of U+2002 EN SPACE and U+2003 EM SPACE, space separators also known as nut and mutton; "in bidirectional context it acts as White Space and (are) not mirrored. The glyph(s) can, under circumstances, be confused with 20 other glyphs."
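
To see what is really in such look-alike names, dumping the bytes helps. The following is only a sketch with a reduced set of names, run in its own scratch directory (mktemp -d is assumed to exist); od -c is standard, though its exact column layout varies.

```shell
# Sketch: tell look-alike names apart by dumping their bytes.
d=$(mktemp -d) && cd -- "$d" || exit 1
trap 'cd / && rm -rf -- "$d"' EXIT

tab=$(printf '\t')
touch -- 'a b' "a${tab}b" 'a  b'

for f in ./*; do
  # od -c spells out every byte, so a tab appears as \t instead of blank space.
  printf '%s' "${f#./}" | od -An -c
done

# Count the names through the glob rather than by parsing ls output.
set -- ./*
count=$#
printf '%s distinct names\n' "$count"
```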

Create a plain file and one that would expand to the first if it was treated as a glob:

touch -- x '[x]' 

Credit for the above to Wumpus Q. Wumbley.

In a similar vein:

touch -- 'a?b' 'a*b'

Credit for the above to dave_thompson_085 in the comments here.
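
What these glob-like names are meant to catch can be sketched as follows: an unquoted expansion is treated as a pattern, a quoted one is not. The scratch directory (via mktemp -d, assumed available) and the variable names are only for illustration.

```shell
# Sketch: an unquoted variable holding '[x]' is expanded as a glob.
d=$(mktemp -d) && cd -- "$d" || exit 1
trap 'cd / && rm -rf -- "$d"' EXIT

touch -- x '[x]' 'a?b' 'a*b'
name='[x]'

# Unquoted: the value is used as a pattern, which matches the file "x".
set -- $name
unquoted=$1

# Quoted: the value is passed through untouched.
set -- "$name"
quoted=$1

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```

A script that does rm $name here would remove x and leave [x] behind.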

touch -- foo\`echo\ malicious\`bar

Credit for the above to godlygeek.

Likewise, a filename that will expand to something different (and potentially run arbitrary commands!) if it is evaluated in a shell context:

touch '$( echo boom )'
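
A small sketch of why this name is dangerous: anything that re-parses the name as shell code (eval here, but sh -c with the name pasted in behaves similarly) runs the embedded command, while a plain quoted expansion does not. No files are needed to show it.

```shell
# Sketch: the same string, once re-parsed as code and once used as data.
name='$( echo boom )'

# Unsafe: eval re-parses the name, so the command substitution runs.
unsafe=$(eval "printf '%s' $name")

# Safe: a quoted expansion is never re-parsed.
safe=$(printf '%s' "$name")

printf 'eval:   %s\nquoted: %s\n' "$unsafe" "$safe"
```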

Use:

touch -- single\'quote double\"quote back\\slash

to catch attempts to put a file name in quotes without escaping quotes.
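
As an illustration of the failure these names provoke, the sketch below pastes a name between single quotes in an inner script, then passes it as a positional parameter instead. The variable names are made up for the example.

```shell
# Sketch: splicing a name into quoted code versus passing it as an argument.
name="single'quote"

# Fragile: the quote inside the name unbalances the inner script's quoting,
# so the inner shell fails with a syntax error.
if sh -c "printf '%s\n' '$name'" 2>/dev/null; then
  spliced=ok
else
  spliced=failed
fi

# Sturdier: hand the name over as "$1"; it is never parsed as code.
passed=$(sh -c 'printf "%s\n" "$1"' sh "$name")

printf 'spliced: %s\npassed:  %s\n' "$spliced" "$passed"
```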

touch -- -a -b -c -r -R - a=x

Credit for the above to Stéphane Chazelas.
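
Names like these can be mistaken for options, or, in the case of a=x, for a variable assignment when given to awk as an operand. One common defence, sketched here with a reduced set of names in a scratch directory (mktemp -d assumed available), is to make every name start with ./ so it can never begin with a dash.

```shell
# Sketch: a "./" prefix keeps names from being read as options.
d=$(mktemp -d) && cd -- "$d" || exit 1
trap 'cd / && rm -rf -- "$d"' EXIT

touch -- -a -r a=x

removed=0
for f in ./*; do
  # "$f" is ./-a, ./-r or ./a=x here, so rm sees plain operands.
  [ -f "$f" ] && rm "$f" && removed=$((removed + 1))
done

printf 'removed %s files\n' "$removed"
```

Using -- before the operands works for most utilities as well, but the ./ prefix also covers the lone - name, which some commands treat as standard input.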

Create a named pipe and symlink (to create files that aren't "regular"):

mkfifo fifo
ln -s a alink

Create subdirectories that have various whitespace included in their names, along with token files inside of them:

mkdir subdir "subdir 1" "subdir 2" "subdir 3 " subdir$'\n'4
touch subdir/file0 "subdir 1"/file1 "subdir 2"/file2 "subdir 3 "/file3 subdir$'\n'4/file4
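
The newline in the last directory name is what breaks line-oriented processing. The sketch below contrasts counting lines of find output with letting find pass the names as arguments; it uses a reduced set of directories and assumes mktemp -d is available.

```shell
# Sketch: line-based counting miscounts a path that contains a newline.
d=$(mktemp -d) && cd -- "$d" || exit 1
trap 'cd / && rm -rf -- "$d"' EXIT

nl='
'
mkdir -- 'subdir 1' 'subdir 3 ' "subdir${nl}4"
touch -- 'subdir 1/file1' 'subdir 3 /file3' "subdir${nl}4/file4"

# One path spans two lines, so wc -l reports one file too many.
by_lines=$(find . -type f | wc -l | tr -d ' ')

# -exec ... {} + hands each path over as a whole argument.
by_args=$(find . -type f -exec sh -c 'printf "%s\n" "$#"' sh {} +)

printf 'by lines: %s, by arguments: %s\n' "$by_lines" "$by_args"
```

With only three files, find runs the inner sh once; for very large trees it may run it several times, so a real script would do its work inside that inner shell rather than rely on a single count.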

Create filenames only containing * (possibly problematic to remove), a filename consisting of only a (regular!) space, a dead symbolic link, a symbolic link that loops onto itself, and a sub-directory with a link back to the parent directory:

touch -- '*' '**' '***' ' '

ln -s /does/not/exist dead

ln -s loop loop

mkdir subdir_with_link
(cd subdir_with_link && ln -s .. parent)
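
A script under test usually needs to tell these links apart from ordinary files. A rough sketch, assuming /does/not/exist really does not exist and using a hypothetical helper called classify:

```shell
# Sketch: detect symlinks that do not resolve (dead or looping).
d=$(mktemp -d) && cd -- "$d" || exit 1
trap 'cd / && rm -rf -- "$d"' EXIT

ln -s /does/not/exist dead
ln -s loop loop
touch real

# classify is a made-up helper for this example.
classify() {
  if [ -L "$1" ] && [ ! -e "$1" ]; then
    echo broken      # a symlink whose target cannot be resolved
  elif [ -L "$1" ]; then
    echo symlink
  else
    echo other
  fi
}

for f in dead loop real; do
  printf '%s: %s\n' "$f" "$(classify "$f")"
done
```

The -e test follows symlinks, so it fails for both the dead link and the loop, while -L inspects the link itself.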

More miscellaneous filenames. The last two are the Unicode characters U+2044 FRACTION SLASH and U+2215 DIVISION SLASH.

touch -- '(' '!' '!!'  $'\xe2\x81\x84' $'\xe2\x88\x95'

Ideas from Scott:

touch -- '-' '--' ';' '&' '|' '<' '>' '$' ')' '{' '}' = \\ '!' '#' '{a,b}'

Characters that are harmless in some locales but dangerous in others:

touch $'X\xa0Y' # non-breaking space in iso8859-1 which is considered
                # "blank" and "space" in some locales

touch $'\xa3\x5c' $'\xa3\x60' # α and ε in BIG5 or BIG5-HKSCS charset, but
                              # �\ and �` in ASCII

Characters that sort the same in some locales:

touch ① ② # sorts the same in GNU locales, order non-deterministic.

Files that escape the .[!.]* * glob (sometimes used to expand both hidden and non-hidden files):

touch ..foo ...
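
To make the gap concrete, here is a sketch comparing that glob with the commonly used three-pattern form that adds ..?*. The extra names .hidden and plain are only there for the example, and mktemp -d is assumed to be available.

```shell
# Sketch: which names each glob combination picks up.
d=$(mktemp -d) && cd -- "$d" || exit 1
trap 'cd / && rm -rf -- "$d"' EXIT

touch -- ..foo ... .hidden plain

# .[!.]* needs a non-dot second character, so ..foo and ... slip through.
set -- .[!.]* *
two_patterns=$#

# ..?* matches names that start with two dots plus at least one more character.
set -- .[!.]* ..?* *
three_patterns=$#

printf 'two patterns: %s names, three patterns: %s names\n' \
  "$two_patterns" "$three_patterns"
```

Keep in mind that a pattern matching nothing is left as a literal word in POSIX shells, so real code should still check that each result exists.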
Jeff Schaller
  • Yes, please describe the test cases you're creating. Most are obvious; some, like the one that looks like it has a Unicode character in it, aren't. – muru Feb 25 '17 at 04:39
  • I'd add a?b and a*b (quoted of course). @muru: byte sequences E2 80 82/83 are the UTF-8 encoding of U+2002 EN SPACE and U+2003 EM SPACE – dave_thompson_085 Feb 25 '17 at 13:16
  • Some evil geniuses at work there :-c – Chindraba Feb 26 '17 at 05:06
  • It might be interesting to play with - and --, although, depending on the script's requirements, it should maybe be impossible to access them without a leading ./.  And I'm surprised that there are so few with non-glob shell special characters, like ;, &, |, <, >, $, (, ), {, }, =, \, !, and # — for example, {a,b}. – Scott - Слава Україні Feb 26 '17 at 05:50
  • Feel free to add in your ideas, @Scott ! That's why I made it community wiki. – Jeff Schaller Feb 26 '17 at 10:50