32

Is there any objective reason to prefer one form to the other? Performance, reliability, portability?

filename=/some/long/path/to/a_file

parentdir_v1="${filename%/*}"
parentdir_v2="$(dirname "$filename")"

basename_v1="${filename##*/}"
basename_v2="$(basename "$filename")"

echo "$parentdir_v1"
echo "$parentdir_v2"
echo "$basename_v1"
echo "$basename_v2"

Produces:

/some/long/path/to
/some/long/path/to
a_file
a_file

(v1 uses shell parameter expansion, v2 uses external binaries.)

Wildcard
  • 36,499

4 Answers

39

Both have their quirks, unfortunately.

Both are required by POSIX, so the difference between them isn't a portability concern¹.

The plain way to use the utilities is

base=$(basename -- "$filename")
dir=$(dirname -- "$filename")

Note the double quotes around variable substitutions, as always, and also the -- after the command, in case the file name begins with a dash (otherwise the commands would interpret the file name as an option). This still fails in one edge case, which is rare but might be forced by a malicious user²: command substitution removes trailing newlines. So if a file is called foo/bar␤ (where ␤ stands for a newline character), then base will be set to bar instead of bar␤. A workaround is to print a non-newline character after the command's output, then strip it, together with the newline that the utility itself appends, after the command substitution:

base=$(basename -- "$filename"; echo .); base=${base%??}
dir=$(dirname -- "$filename"; echo .); dir=${dir%??}
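
To see the difference, here is a minimal sketch with a hypothetical path whose last component ends in a newline (the variable names naive and safe are illustrative only). Because the utility terminates its output with a newline of its own, the sketch strips two characters after the command substitution: that newline and the added dot.

```sh
# Hypothetical path: the last component is "bar" followed by a newline.
filename='foo/bar
'

# Plain command substitution: every trailing newline is removed.
naive=$(basename -- "$filename")

# Sentinel version: the "." protects the newlines from being stripped;
# afterwards remove the utility's own terminating newline and the ".".
safe=$(basename -- "$filename"; echo .); safe=${safe%??}

printf 'naive has %s characters, safe has %s\n' "${#naive}" "${#safe}"
```

Here naive ends up as the three characters bar, while safe keeps the newline that is part of the file name and is four characters long.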

With parameter substitution, you don't run into edge cases related to expansion of weird characters, but there are a number of difficulties with the slash character. One thing that is not an edge case at all is that computing the directory part requires different code for the case where there is no /.

base="${filename##*/}"
case "$filename" in
  */*) dirname="${filename%/*}";;
  *) dirname=".";;
esac

The edge case is when there's a trailing slash (including the case of the root directory, which is all slashes). The basename and dirname commands strip off trailing slashes before they do their job. There's no way to strip the trailing slashes in one go if you stick to POSIX constructs, but you can do it in two steps. You need to take care of the case when the input consists of nothing but slashes.

case "$filename" in
  */*[!/]*)
    trail=${filename##*[!/]}; filename=${filename%%"$trail"}
    base=${filename##*/}
    dir=${filename%/*}; dir=${dir:-/};;  # the parent of /usr is /, not the empty string
  *[!/]*)
    trail=${filename##*[!/]}
    base=${filename%%"$trail"}
    dir=".";;
  *) base="/"; dir="/";;
esac
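
As a rough check of the case statement above, the following sketch wraps the same logic in a function and compares it with the utilities on a few sample paths. The name split_path, the variable p and the sample list are assumptions made for illustration; the function includes a dir=${dir:-/} guard so that a path such as /usr reports / as its parent rather than an empty string.

```sh
# split_path PATH: set the variables base and dir using parameter expansion only.
# (split_path is a hypothetical helper name.)
split_path() {
  p=$1
  case "$p" in
    */*[!/]*)
      trail=${p##*[!/]}; p=${p%%"$trail"}   # drop trailing slashes
      base=${p##*/}
      dir=${p%/*}; dir=${dir:-/};;          # guard: parent of /usr is /
    *[!/]*)
      trail=${p##*[!/]}
      base=${p%%"$trail"}
      dir=.;;
    *) base=/; dir=/;;
  esac
}

# Compare with the utilities on sample paths and count disagreements.
mismatches=0
for sample in /usr/lib /usr/ /usr usr foo/bar/ foo/ / /// . ..; do
  split_path "$sample"
  if [ "$base" != "$(basename -- "$sample")" ] ||
     [ "$dir" != "$(dirname -- "$sample")" ]; then
    mismatches=$((mismatches + 1))
    printf 'differs for %s: dir=%s base=%s\n' "$sample" "$dir" "$base"
  fi
done
echo "mismatches: $mismatches"
```

Paths with repeated slashes in the middle, such as a//b, are deliberately left out of the sample list: the case statement yields a/ for the directory part, whereas dirname prints a.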

If you happen to know that you aren't in an edge case (e.g. a find result other than the starting point always contains a directory part and has no trailing /) then parameter expansion string manipulation is straightforward. If you need to cope with all the edge cases, the utilities are easier to use (but slower).

Sometimes, you may want to treat foo/ like foo/. rather than like foo. If you're acting on a directory entry then foo/ is supposed to be equivalent to foo/., not foo; this makes a difference when foo is a symbolic link to a directory: foo means the symbolic link, foo/ means the target directory. In that case, the basename of a path with a trailing slash is advantageously ., and the path can be its own dirname.

case "$filename" in
  */) base="."; dir="$filename";;
  */*) base="${filename##*/}"; dir="${filename%"$base"}";;
  *) base="$filename"; dir=".";;
esac
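
The symlink distinction described above can be observed directly. This is only a sketch: it assumes mktemp -d is available (it is not required by POSIX), and the names target and link are made up for the demonstration.

```sh
# Scratch directory containing a real directory and a symlink to it.
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/target"
ln -s target "$tmp/link"

# Without a trailing slash, the path names the symbolic link itself...
if [ -L "$tmp/link" ]; then without_slash=symlink; else without_slash=directory; fi
# ...with a trailing slash, it resolves to the directory the link points to.
if [ -L "$tmp/link/" ]; then with_slash=symlink; else with_slash=directory; fi

printf 'link  is seen as a %s\nlink/ is seen as a %s\n' "$without_slash" "$with_slash"
rm -rf -- "$tmp"
```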

The fast and reliable method is to use zsh with its history modifiers (this first strips trailing slashes, like the utilities):

dir=$filename:h base=$filename:t

¹ Unless you're using pre-POSIX shells like Solaris 10 and older's /bin/sh (which lacked parameter expansion string manipulation features on machines still in production — but there's always a POSIX shell called sh in the installation, only it's /usr/xpg4/bin/sh, not /bin/sh).
² For example: submit a file called foo␤ (foo followed by a newline) to a file upload service that doesn't protect against this, then delete it and cause foo to be deleted instead.

  • Wow. So it sounds like (in any POSIX shell) the most robust way is the second one you mention? base=$(basename -- "$filename"; echo .); base=${base%??}; dir=$(dirname -- "$filename"; echo .); dir=${dir%??}? I was reading carefully and I didn't notice you mentioning any drawbacks. – Wildcard Jan 12 '16 at 22:10
  • 1
    @Wildcard A drawback is that it treats foo/ like foo, not like foo/., which isn't consistent with POSIX-compliant utilities. – Gilles 'SO- stop being evil' Jan 12 '16 at 22:11
  • Got it, thanks. I think I still prefer that method because I would know if I'm trying to deal with directories and I could just tack on (or "tack back on") a trailing / if I need it. – Wildcard Jan 12 '16 at 22:17
  • "e.g. a find result, which always contains a directory part and has no trailing /" Not quite true, find ./ will output ./ as the first result. – Tavian Barnes Mar 19 '19 at 04:10
  • 2
    @Gilles The newline character example just blew my mind. Thanks for the answer – Sam Thomas Mar 29 '19 at 15:46
  • You can remove trailing / in one go POSIXly with ${p%"${p##*[!/]}"} – Stéphane Chazelas Jun 26 '20 at 05:51
  • 1
    "Required by POSIX" doesn't guarantee real-world portability, sadly. I've seen systems without basename and dirname commands. Routers or phones, mostly. – mtraceur Jun 17 '21 at 06:23
11

Both are in POSIX, so portability "should" be of no concern. The shell substitutions should be presumed to run faster.
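
One way to check the speed claim on your own system is to time a loop of each form. The sketch below only sets up the two loops; the function names loop_expansion and loop_utility and the iteration count are assumptions, and no timings are quoted here because they depend on the machine and the shell.

```sh
filename=/some/long/path/to/a_file

# Run the parameter expansion N times; no new process is created.
loop_expansion() {
  i=0
  while [ "$i" -lt "$1" ]; do
    base=${filename##*/}
    i=$((i + 1))
  done
}

# Run the external utility N times; each iteration starts a new process.
loop_utility() {
  i=0
  while [ "$i" -lt "$1" ]; do
    base=$(basename -- "$filename")
    i=$((i + 1))
  done
}

# In a shell where `time` is a keyword (bash, ksh, zsh), compare with:
#   time loop_expansion 1000
#   time loop_utility 1000
loop_expansion 200; from_expansion=$base
loop_utility 200;   from_utility=$base
echo "expansion: $from_expansion, utility: $from_utility"
```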

However - it depends on what you mean by portable. Some (not necessarily) old systems did not implement those features in their /bin/sh (Solaris 10 and older come to mind), while on the other hand, a while back, developers were cautioned that dirname was not as portable as basename.

For reference:

In considering portability, I would have to take into account all of the systems where I maintain programs. Not all are POSIX, so there are tradeoffs. Your tradeoffs may differ.

Thomas Dickey
  • 76,765
8

There is also:

mkdir '
';    dir=$(basename ./'
');   echo "${#dir}"

0

Weird stuff like that happens because there's a lot of interpreting and parsing and the rest that needs to happen when two processes talk. Command substitutions will strip trailing newlines. And NULs (though that's obviously not relevant here). basename and dirname will also strip trailing newlines in any case because how else do you talk to them? I know, trailing newlines in a filename are kind of anathema anyway, but you never know. And it doesn't make sense to go the possibly flawed way when you could do otherwise.

Still... ${pathname##*/} != basename and likewise ${pathname%/*} != dirname. Those commands are specified to carry out a mostly well-defined sequence of steps to arrive at their specified results.

The spec is below, but first here's a terser version:

basename()
    case   $1   in
    (*[!/]*/)     basename         "${1%"${1##*[!/]}"}"   ${2+"$2"}  ;;
    (*/[!/]*)     basename         "${1##*/}"             ${2+"$2"}  ;;
  (${2:+?*}"$2")  printf  %s%b\\n  "${1%"$2"}"       "${1:+\n\c}."   ;;
    (*)           printf  %s%c\\n  "${1##///*}"      "${1#${1#///}}" ;;
    esac

That's a fully POSIX compliant basename in simple sh. It's not difficult to do. I merged a couple branches I use below there because I could without affecting results.

Here's the spec:

basename()
    case   $1 in
    ("")            #  1. If  string  is  a null string, it is 
                    #     unspecified whether the resulting string
                    #     is '.' or a null string. In either case,
                    #     skip steps 2 through 6.
                  echo .
     ;;             #     I feel like I should flip a coin or something.
    (//)            #  2. If string is "//", it is implementation-
                    #     defined whether steps 3 to 6 are skipped
                    #     or processed.
                    #     Great. What should I do then?
                  echo //
     ;;             #     I guess it's *my* implementation after all.
    (*[!/]*/)       #  3. If string consists entirely of <slash> 
                    #     characters, string shall be set to a
                    #     single <slash> character. In this case,
                    #     skip steps 4 to 6.
                    #  4. If there are any trailing <slash> characters
                    #     in string, they shall be removed.
                  basename "${1%"${1##*[!/]}"}" ${2+"$2"}  
      ;;            #     Fair enough, I guess.
     (*/)         echo /
      ;;            #     For step three.
     (*/*)          #  5. If there are any <slash> characters remaining
                    #     in string, the prefix of string up to and 
                    #     including the last <slash> character in
                    #     string shall be removed.
                  basename "${1##*/}" ${2+"$2"}
      ;;            #      == ${pathname##*/}
     ("$2"|\
      "${1%"$2"}")  #  6. If  the  suffix operand is present, is not
                    #     identical to the characters remaining
                    #     in string, and is identical to a suffix of
                    #     the characters remaining  in  string, the
                    #     suffix suffix shall be removed from
                    #     string.  Otherwise, string is not
                    #     modified by this step. It shall not be
                    #     considered an error if suffix is not 
                    #     found in string.
                  printf  %s\\n "$1"
     ;;             #     So far so good for parameter substitution.
     (*)          printf  %s\\n "${1%"$2"}"
     esac           #     I probably won't do dirname.

...maybe the comments are distracting....

mikeserv
  • 58,310
  • 1
    Wow, good point about trailing newlines in filenames. What a can of worms. I don't think I really understand your script, though. I've never seen [!/] before, is that like [^/]? But your comment alongside that doesn't seem to match it.... – Wildcard Jan 07 '16 at 00:42
  • 1
    @Wildcard - well.. it's not my comment. That's the standard. The POSIX spec for basename is a set of instructions on how to do it with your shell. But [!charclass] is the portable way to do that with globs; [^class] is for regex, and shells aren't spec'd for regex. About matching the comment... case filters, so if I match a string which contains a trailing slash / and a !/, then if the next case pattern below matches any trailing / slashes at all, they can only be all slashes. And the one below that can't have any trailing /. – mikeserv Jan 07 '16 at 00:48
2

You can get a boost from in-process basename and dirname (I don't understand why these aren't builtins -- if these aren't candidates, I don't know what is) but the implementation needs to handle things like:

path         dirname    basename
"/usr/lib"    "/usr"    "lib"
"/usr/"       "/"       "usr"
"usr"         "."       "usr"
"/"           "/"       "/"
"."           "."       "."
".."          "."       ".."

^From basename(3)

and other edge cases.
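
To see what the system utilities print for the paths in the table, a short loop is enough. This is a minimal sketch; show_split is a hypothetical helper name, and the | separator is chosen only to make empty fields visible.

```sh
# show_split PATH: print PATH, its dirname and its basename, separated by "|".
show_split() {
  printf '%s|%s|%s\n' "$1" "$(dirname -- "$1")" "$(basename -- "$1")"
}

for path in /usr/lib /usr/ usr / . ..; do
  show_split "$path"
done
```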

I've been using:

basename(){ 
  test -n "$1" || return 0
  local x="$1"; while :; do case "$x" in */) x="${x%?}";; *) break;; esac; done
  [ -n "$x" ] || { echo /; return; }
  printf '%s\n' "${x##*/}"; 
}

dirname(){ 
  test -n "$1" || return 0
  local x="$1"; while :; do case "$x" in */) x="${x%?}";; *) break;; esac; done
  [ -n "$x" ] || { echo /; return; }
  set -- "$x"; x="${1%/*}"
  case "$x" in "$1") x=.;; "") x=/;; esac
  printf '%s\n' "$x"
}

( My latest implementation of GNU basename and dirname adds some special fancy command line switches for stuff such as handling multiple arguments or suffix stripping, but that's super easy to add in the shell. )

It's not that difficult to make these into bash builtins either (by making use of the underlying system implementation), but the above functions need not be compiled, and they provide some boost too.
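
A comparison of the shell functions against the executables could be scripted along these lines. This is a sketch under a few assumptions: the functions are copied under hypothetical sh_ names so that they do not shadow the executables, local is dropped so the script runs in a plain POSIX sh, and the sample list is the table above plus x//.

```sh
# Copies of the functions above under hypothetical sh_ names.
sh_basename() {
  [ -n "$1" ] || return 0
  x=$1
  # Strip trailing slashes one at a time.
  while :; do case "$x" in */) x=${x%?};; *) break;; esac; done
  [ -n "$x" ] || { echo /; return; }
  printf '%s\n' "${x##*/}"
}

sh_dirname() {
  [ -n "$1" ] || return 0
  x=$1
  while :; do case "$x" in */) x=${x%?};; *) break;; esac; done
  [ -n "$x" ] || { echo /; return; }
  set -- "$x"; x=${1%/*}
  # No slash at all means "."; an empty result means the root directory.
  case "$x" in "$1") x=.;; "") x=/;; esac
  printf '%s\n' "$x"
}

# Count the sample paths on which a function and its executable disagree.
mismatches=0
for p in /usr/lib /usr/ usr / . .. x//; do
  if [ "$(sh_basename "$p")" != "$(basename -- "$p")" ] ||
     [ "$(sh_dirname "$p")" != "$(dirname -- "$p")" ]; then
    mismatches=$((mismatches + 1))
    printf 'differs for %s\n' "$p"
  fi
done
echo "mismatches: $mismatches"
```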

Petr Skocik
  • 28,816
  • The list of edge cases is actually very helpful. Those are all very good points. The list actually seems fairly complete; are there really any other edge cases? – Wildcard Jan 06 '16 at 01:44
  • My former implementation didn't handle things like x// correctly, but I've fixed it for you before answering. I hope that's it. – Petr Skocik Jan 06 '16 at 01:48
  • You can run a script to compare what the functions and the executables do on these examples. I'm getting a 100% match. – Petr Skocik Jan 06 '16 at 01:50
  • 1
    Your dirname function doesn't seem to strip repeated occurrences of slashes. For example: dirname a///b//c//d////e yields a///b//c//d///. – codeforester Oct 18 '17 at 00:48