Shell valid function name characters

Question

Using extended Unicode characters in function names is (no doubt) useful for many users.

Simpler shells (ash (busybox), dash) and ksh do fail with:

tést() { echo 34; }

tést

But bash, mksh, lksh, and zsh seem to allow it.

I am aware that POSIX defines valid function names using its definition of a Name. That amounts to this regex:

[a-zA-Z_][a-zA-Z0-9_]*
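For anyone who wants to test a candidate name against that pattern from a script, here is a minimal sketch in POSIX sh. The helper name is_posix_name is made up for illustration. The characters are spelled out rather than written as ranges, on the assumption that ranges like a-z may behave differently depending on the locale.

```shell
# is_posix_name NAME  (hypothetical helper, not part of any shell)
# Succeeds only if NAME matches [a-zA-Z_][a-zA-Z0-9_]*.
is_posix_name() {
    case $1 in
        # The empty string is not a Name.
        '') return 1 ;;
        # First character must be an ASCII letter or underscore.
        [!abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]*) return 1 ;;
        # No later character may fall outside letters, digits, underscore.
        *[!abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_0123456789]*) return 1 ;;
    esac
    return 0
}
```

Usage: is_posix_name 'tést' || echo 'not a portable function name'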

However, in the first link it is also said:

An implementation may allow other characters in a function name as an extension.

The questions are:

Is this accepted and documented?
Where?
For which shells (if any)?

Related questions:
Its possible use special characters in a shell function name?
I am not interested in using meta-characters (>) in function names.

Upstart and bash function names containing “-”
I do not believe that an operator (subtraction "-") should be part of a name.

you might find alias to be a tad more lenient. and so you can write the function with some proper, buttoned-down name, and then just define a more stylishly named alias to call the function. in dash there is also some stuff you can do with $PATH and %func. – mikeserv, Nov 25 '15 at 08:20

cuonglm · Accepted Answer · 2015-11-25T10:24:43.247

Since POSIX allows it as an extension, nothing prevents an implementation from behaving that way.

A simple check (run in zsh):

$ for shell in /bin/*sh 'busybox sh'; do
    printf '[%s]\n' $shell
    $=shell -c 'á() { :; }'
  done
[/bin/ash]
/bin/ash: 1: Syntax error: Bad function name
[/bin/bash]
[/bin/dash]
/bin/dash: 1: Syntax error: Bad function name
[/bin/ksh]
[/bin/lksh]
[/bin/mksh]
[/bin/pdksh]
[/bin/posh]
/bin/posh: á: invalid function name
[/bin/yash]
[/bin/zsh]
[busybox sh]
sh: syntax error: bad function name

shows that bash, zsh, yash, ksh93 (which ksh is linked to on my system), pdksh and its derivatives allow multibyte characters in function names.
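The loop above relies on zsh's $=shell splitting. A rough POSIX sh equivalent could look like the sketch below. The function name probe_function_name is invented here, and it only handles shells that are a single executable (so 'busybox sh' would need a small wrapper script). Shells that are not installed are skipped silently.

```shell
# probe_function_name NAME SHELL...  (hypothetical helper)
# For each installed SHELL, report whether it accepts a function called NAME.
probe_function_name() {
    name=$1
    shift
    for sh in "$@"; do
        # Skip shells that are not available on this system.
        command -v "$sh" >/dev/null 2>&1 || continue
        # NAME is pasted into the script unquoted on purpose, so that the
        # target shell parses it exactly as it would in a real definition.
        if "$sh" -c "$name() { :; }" 2>/dev/null; then
            printf '%s: accepted\n' "$sh"
        else
            printf '%s: rejected\n' "$sh"
        fi
    done
}
```

Usage: probe_function_name 'á' sh bash dash ksh mksh yash zsh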

yash was designed to support multibyte characters from the beginning, so it's no surprise that it worked.

Other documentation you can refer to is the ksh93 manual:

A blank is a tab or a space. An identifier is a sequence of letters, digits, or underscores starting with a letter or underscore. Identifiers are used as components of variable names. A vname is a sequence of one or more identifiers separated by a . and optionally preceded by a .. Vnames are used as function and variable names. A word is a sequence of characters from the character set defined by the current locale, excluding non-quoted metacharacters.

So setting the locale to C:

$ export LC_ALL=C
$ á() { echo 1; }
ksh: á: invalid function name

makes it fail.
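To compare locales without changing the current session, the locale can be set for a single child shell. This is only a sketch: probe_in_locale is a made-up name, and the locale names you pass must be ones actually installed on your system (check with locale -a). The result will also depend on which ksh build you have, as the comments below point out.

```shell
# probe_in_locale SHELL LOCALE NAME  (hypothetical helper)
# Run SHELL with LC_ALL=LOCALE and report whether it accepts a function
# definition called NAME.
probe_in_locale() {
    # LC_ALL is set only in the environment of the child shell.
    if LC_ALL=$2 "$1" -c "$3() { :; }" 2>/dev/null; then
        printf '%s [%s]: accepted\n' "$1" "$2"
    else
        printf '%s [%s]: rejected\n' "$1" "$2"
    fi
}
```

Usage: probe_in_locale ksh C 'á'; probe_in_locale ksh en_US.UTF-8 'á'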

posh isn't worth listing in such a list. It depends on Linux-specific bugs in libc and will not work on other platforms. – schily, Jun 01 '18 at 08:27
I cannot reproduce your claims about ksh93 using a self-compiled ksh93 from the original sources. While ksh88 seems to accept non-7-bit-ASCII letters in function names, only the ksh93 binary from Ubuntu seems to accept them. – schily, Jun 01 '18 at 08:30
@schily The ksh I used in this test is the binary from Debian (so it may be the same as the one on Ubuntu) – cuonglm, Jun 01 '18 at 08:59

Stéphane Chazelas · Answer 2 · 2015-11-29T11:33:05.823

Note that functions share the same namespace as other commands including commands in the file system, which on most systems have no limitation on the characters or even bytes they may contain in their path.

So while most shells restrict the characters allowed in function names, there's no really good reason for them to do so. It means that in those shells, there are commands you can't replace with a function.
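As a small illustration of that shared namespace, here is a sketch in POSIX sh that shadows an external command with a function of the same name. The wrapper demo_shadowing is a made-up name. It uses uname only because it is harmless, and relies on the standard command utility, which skips function lookup.

```shell
# demo_shadowing  (hypothetical demo function)
# Shows that a function takes precedence over an external command with
# the same name, and that `command` bypasses the function.
demo_shadowing() {
    # Define a function named like an existing utility.
    uname() { echo "function uname"; }
    uname            # the function wins the lookup
    command uname    # `command` skips functions and runs the real utility
    unset -f uname   # remove the function again
}
```

Calling demo_shadowing prints "function uname" first, then whatever the real uname reports on your system. A command whose name the shell rejects as a function name cannot be wrapped this way.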

zsh and rc allow anything for their function names including some with / and the empty string. zsh even allows NUL bytes.

$ zsh
$ $'\0'() echo nul
$ ^@
nul
$ ""() uname
$ ''
Linux
$ /bin/ls() echo test
$ /bin/ls
test

A simple command in the shell is a list of arguments, and the first argument is used to derive the command to execute. So it's only logical that those arguments and function names share the same possible values, and in zsh, arguments to builtins and functions can be any byte sequence.

There's no security issue here, as the functions you (the script author) define are the ones you invoke.

Where there may be security issues is when the parsing is affected by the environment, for instance with shells where the set of valid function names is affected by the locale.

edited Nov 29 '15 at 11:33

answered Nov 27 '15 at 17:11


One may play games in bash too, starting with function /bin/sh { echo "$0: $FUNCNAME: Permission denied"; return 126; }, and potentially useful things too with functions named --, //, @ or % etc. – mr.spuratic Nov 27 '15 at 17:44
but dont shells tend to bypass a hash-table lookup when / is found in a name? and a function isnt just an executable name - its code. i would think a simple implementation could encounter a lot of parse problems if its stored function names included metacharacters. – mikeserv Nov 27 '15 at 17:46
Yes, I am aware of the inability of bash to contain nulls in vars, which could reasonably be extended to function names. I do not have a specific example, but I do feel that these games of allowing almost anything for names are more of a potential security breach than an "easy way to work". I hope I am wrong. – Nov 27 '15 at 19:01
