I've written a shell script that handles some "regular" filenames, but I've read "Why does my shell script choke on whitespace or other special characters?" and "Why you shouldn't parse the output of ls", and I'd like it to be more robust and handle any valid filenames (and/or directory names). How can I create a test-bed of files and directories to run my script against?
1 Answer
Create a separate directory to play in (for ease of cleaning up later, mainly); this uses the value of $TMPDIR if it's set, otherwise /tmp:
mkdir "${TMPDIR-/tmp}/testing"
cd "${TMPDIR-/tmp}/testing"
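If the fixed name testing might collide with a leftover from an earlier run, a variation along these lines may help. It is only a sketch: it assumes mktemp with -d is available (common on GNU and BSD systems, but not required by older POSIX), and the names testbed and cleanup_testbed are made up for illustration.

```shell
# Sketch: create a uniquely named playground instead of a fixed "testing" directory.
# Assumption: mktemp(1) supports -d and a template ending in XXXXXX.
testbed=$(mktemp -d "${TMPDIR-/tmp}/testing.XXXXXX") || exit 1
cd "$testbed" || exit 1

# Hypothetical helper: leave the playground, then delete it.
# rm -rf is tolerable here only because mktemp just created $testbed.
cleanup_testbed() {
    cd / && rm -rf -- "$testbed"
}
```

Calling cleanup_testbed when finished (or registering it with trap cleanup_testbed EXIT) removes everything created below.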
Create files that are separate, but appear similar to each other because of whitespace (space, tab, newline, carriage return, backspace):
touch -- a b 'a ' 'b ' 'a b' 'a  b' $'a\bb'
touch -- a$'\xe2\x80\x82'b a$'\xe2\x80\x83'b a$'\t'b a$'\n'b a$'\r'b
Credit for the above to Patrick. The two hex-coded names contain the UTF-8 encodings of U+2002 EN SPACE and U+2003 EM SPACE, space separators known as nut and mutton; "in bidirectional context it acts as White Space and (are) not mirrored. The glyph(s) can, under circumstances, be confused with 20 other glyphs."
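To see why names like these matter, here is a small sketch (not part of the original answer) that compares counting files by parsing ls with counting them through a glob. It works in its own scratch directory; the variable names demo, ls_count and glob_count are illustrative, and mktemp -d is assumed to exist.

```shell
# Sketch: a newline in a filename breaks line-based counting.
demo=$(mktemp -d "${TMPDIR-/tmp}/demo.XXXXXX") || exit 1
cd "$demo" || exit 1
nl='
'
touch -- 'a b' "a${nl}b"          # two files, one with an embedded newline

# Fragile: wc -l counts lines, and the second name occupies two of them.
ls_count=$(ls | wc -l)

# More robust: the glob produces exactly one word per directory entry.
glob_count=0
for f in ./*; do
    # Skip the literal pattern if the directory was empty.
    [ -e "$f" ] || [ -L "$f" ] || continue
    glob_count=$((glob_count + 1))
done

printf 'ls | wc -l says %s, glob loop says %s\n' "$ls_count" "$glob_count"
cd / && rm -rf -- "$demo"
```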
Create a plain file and one whose name would expand to the first if it were treated as a glob:
touch -- x '[x]'
Credit for the above to Wumpus Q. Wumbley.
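The following sketch illustrates what this pair is meant to catch: an unquoted variable holding [x] is treated as a pattern and silently resolves to the file x. The scratch directory and the variable names (demo, name, unquoted, quoted) are assumptions made for the example.

```shell
# Sketch: an unquoted expansion of '[x]' is globbed and picks the wrong file.
demo=$(mktemp -d "${TMPDIR-/tmp}/demo.XXXXXX") || exit 1
cd "$demo" || exit 1
echo plain   > x
echo bracket > '[x]'

name='[x]'
unquoted=$(cat $name)      # deliberately unquoted: [x] matches the file named x
quoted=$(cat "$name")      # quoted: reads the file literally named [x]

printf 'unquoted read: %s\nquoted read:   %s\n' "$unquoted" "$quoted"
cd / && rm -rf -- "$demo"
```

A script that reads the wrong content here without any error message is a good sign that some expansion is missing its double quotes.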
In a similar vein:
touch -- 'a?b' 'a*b'
Credit for the above to dave_thompson_085 in the comments here.
touch -- foo\`echo\ malicious\`bar
Credit for the above to godlygeek.
A filename that will expand to something different (potentially arbitrary code execution!) if evaluated in a shell context:
touch '$( echo boom )'
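As a hedged illustration of the risk, the sketch below pastes such a name into a command string that a second shell re-parses, then shows the alternative of passing the name as a positional parameter. No file is needed for the demonstration; the variable names are invented for the example.

```shell
# Sketch: a filename treated as code versus a filename treated as data.
name='$( echo boom )'

# Unsafe: the name becomes part of the script text given to sh -c,
# so the command substitution inside it is executed.
unsafe=$(sh -c "printf '%s\n' $name")

# Safer: the script text is constant and the name arrives as "$1".
safe=$(sh -c 'printf "%s\n" "$1"' sh "$name")

printf 'unsafe result: %s\nsafe result:   %s\n' "$unsafe" "$safe"
```

The same reasoning applies to eval, to ssh command lines, and to any other place where a filename is interpolated into text that a shell will parse again.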
Use:
touch -- single\'quote double\"quote back\\slash
to catch attempts to put a file name in quotes without escaping quotes.
touch -- -a -b -c -r -R - a=x
Credit for the above to Stéphane Chazelas.
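A sketch of what these names exercise, under the assumption that the script calls ordinary utilities such as rm and awk: names starting with - can be mistaken for options, and a=x can be mistaken by awk for a variable assignment. The scratch directory and variable names are illustrative.

```shell
# Sketch: keeping option-like and assignment-like names from being misread.
demo=$(mktemp -d "${TMPDIR-/tmp}/demo.XXXXXX") || exit 1
cd "$demo" || exit 1
touch -- -a -b
echo data > ./a=x

rm -- -a        # "--" marks the end of options
rm ./-b         # a ./ prefix also works with tools that do not accept "--"

# awk treats the operand a=x as a variable assignment and falls back to stdin;
# with the ./ prefix it is an ordinary file operand.
as_assignment=$(awk '{ print }' a=x < /dev/null)
as_file=$(awk '{ print }' ./a=x)

if [ -e ./-a ] || [ -e ./-b ]; then dash_files_removed=no; else dash_files_removed=yes; fi
printf 'removed: %s, awk a=x: "%s", awk ./a=x: "%s"\n' \
    "$dash_files_removed" "$as_assignment" "$as_file"
cd / && rm -rf -- "$demo"
```

Iterating with for f in ./* rather than for f in * gives every name the ./ prefix up front, which covers both cases at once.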
Create a named pipe and symlink (to create files that aren't "regular"):
mkfifo fifo
ln -s a alink
Create subdirectories that have various whitespace included in their names, along with token files inside of them:
mkdir subdir "subdir 1" "subdir 2" "subdir 3 " subdir$'\n'4
touch subdir/file0 "subdir 1"/file1 "subdir 2"/file2 "subdir 3 "/file3 subdir$'\n'4/file4
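For recursive processing of directories like these, one commonly used approach is to let find hand the names to a child shell as arguments instead of printing them line by line. The sketch below assumes a find that supports -exec ... {} + and uses a throwaway directory; the variable names are invented for the example.

```shell
# Sketch: passing names from find as arguments instead of as lines of text.
demo=$(mktemp -d "${TMPDIR-/tmp}/demo.XXXXXX") || exit 1
cd "$demo" || exit 1
nl='
'
mkdir subdir "subdir 1" "subdir${nl}4"
touch subdir/file0 "subdir 1/file1" "subdir${nl}4/file4"

# Fragile: counts output lines, so the path containing a newline counts twice.
fragile=$(find . -type f | wc -l)

# More robust: each path is one argument to the inner sh, whatever it contains.
# With only a few files, find runs the inner shell once.
robust=$(find . -type f -exec sh -c 'echo "$#"' sh {} +)

printf 'line count: %s, argument count: %s\n' "$fragile" "$robust"
cd / && rm -rf -- "$demo"
```

The inner script can loop with for f do ... done to process each path; a script under test could be invoked in the same position.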
Create filenames only containing * (possibly problematic to remove), a filename consisting of only a (regular!) space, a dead symbolic link, a symbolic link that loops onto itself, and a sub-directory with a link back to the parent directory:
touch -- '*' '**' '***' ' '
ln -s /does/not/exist dead
ln -s loop loop
mkdir subdir_with_link
(cd subdir_with_link && ln -s .. parent)
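A script can tell dead and looping symlinks apart from healthy ones with two standard tests: -L succeeds for any symlink, while -e follows the link and fails if the target cannot be resolved. The sketch below is one way to apply that; the file names real and good and the variable broken are made up for the example.

```shell
# Sketch: detecting symlinks whose target cannot be resolved.
demo=$(mktemp -d "${TMPDIR-/tmp}/demo.XXXXXX") || exit 1
cd "$demo" || exit 1
: > real
ln -s real good                 # healthy link
ln -s ./no-such-target dead     # dangling link
ln -s loop loop                 # link that points at itself

broken=
for f in ./*; do
    # -L: is a symlink; ! -e: following it does not reach an existing file.
    if [ -L "$f" ] && [ ! -e "$f" ]; then
        broken="${broken:+$broken }${f#./}"
    fi
done

printf 'unresolvable links: %s\n' "$broken"
cd / && rm -rf -- "$demo"
```

The link back to the parent directory is a different hazard: it resolves fine, but any traversal that follows symlinks can recurse through it indefinitely.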
More miscellaneous filenames. The last two are the Unicode characters "fraction slash" (U+2044) and "division slash" (U+2215).
touch -- '(' '!' '!!' $'\xe2\x81\x84' $'\xe2\x88\x95'
Ideas from Scott:
touch -- '-' '--' ';' '&' '|' '<' '>' '$' ')' '{' '}' = \\ '!' '#' '{a,b}'
Characters that are harmless in some locales but dangerous in others:
touch $'X\xa0Y' # non-breaking space in iso8859-1 which is considered
# "blank" and "space" in some locales
touch $'\xa3\x5c' $'\xa3\x60' # α and ε in BIG5 or BIG5-HKSCS charset, but
# �\ and �` in ASCII
Characters that sort the same in some locales:
touch ① ② # sorts the same in GNU locales, order non-deterministic.
Files that escape the .[!.]* * glob (sometimes used to expand both hidden and non-hidden files):
touch ..foo ...
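To make the gap concrete, this sketch counts what the two-pattern glob matches and compares it with a three-pattern variant that adds ..?* for names beginning with two dots. The helper count_args and the extra files .hidden and visible are assumptions added for the demonstration.

```shell
# Sketch: which names ".[!.]* *" misses, and one way to cover them.
demo=$(mktemp -d "${TMPDIR-/tmp}/demo.XXXXXX") || exit 1
cd "$demo" || exit 1
touch -- .hidden ..foo ... visible

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# .[!.]* requires the second character not to be a dot, so ..foo and ... are skipped.
naive=$(count_args .[!.]* *)

# ..?* matches names starting with two dots plus at least one more character,
# which covers ..foo and ... without matching the .. directory entry.
full=$(count_args ..?* .[!.]* *)

printf 'two patterns: %s names, three patterns: %s names\n' "$naive" "$full"
cd / && rm -rf -- "$demo"
```

Note that a pattern with no match stays in the list as a literal string, so real code still needs an existence check on each word.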

a?b and a*b (quoted of course). @muru: byte sequences E2 80 82/83 are the UTF-8 encoding of U+2002 EN SPACE and U+2003 EM SPACE – dave_thompson_085 Feb 25 '17 at 13:16
- and --, although, depending on the script's requirements, it should maybe be impossible to access them without a leading ./. And I'm surprised that there are so few with non-glob shell special characters, like ;, &, |, <, >, $, (, ), {, }, =, \, !, and # (for example, {a,b}). – Scott - Слава Україні Feb 26 '17 at 05:50