7

I want to check if a shell variable contains an absolute path.

I don't care if the path exists or not—if it doesn't I'm going to create it—but I do want to ensure that I'm dealing with an absolute pathname.

My code looks something like the following:

myfunction() {
  [ magic test to see if "$1" is an absolute path ] || return 1
  mkdir -p "$(dirname "$1")" || return 1
  commands >> "$1"
}

Or, the use case where the absolute path to be verified is intended to be a directory:

anotherfunction() {
  [ same magic test ] || return 1
  mkdir -p "$1"
  dostuff >> "$1/somefile"
}

If this were awk I would do the check like so: myvar ~ /^\//

There must be a clean way to do this with the shell's string handling, but I'm having trouble coming up with it.

(Mentioning a bash-specific solution would be fine but I'd like to know how to do this portably, also. POSIX string handling seems like it should be sufficient for this.)

Wildcard
  • 36,499

7 Answers

11

You can just do:

case $1 in (/*) pathchk -- "$1";; (*) ! : ;; esac

That should be enough. It writes diagnostics to stderr and returns failure for inaccessible or uncreatable components. pathchk isn't about existing pathnames; it's about usable pathnames.

The pathchk utility shall check that one or more pathnames are valid (that is, they could be used to access or create a file without causing syntax errors) and portable (that is, no filename truncation results). More extensive portability checks are provided by the -p option.

By default, the pathchk utility shall check each component of each pathname operand based on the underlying file system. A diagnostic shall be written for each pathname operand that:

  • Is longer than {PATH_MAX} bytes (see Pathname Variable Values in <limits.h>)

  • Contains any component longer than {NAME_MAX} bytes in its containing directory

  • Contains any component in a directory that is not searchable

  • Contains any character in any component that is not valid in its containing directory

The format of the diagnostic message is not specified, but shall indicate the error detected and the corresponding pathname operand.

It shall not be considered an error if one or more components of a pathname operand do not exist as long as a file matching the pathname specified by the missing components could be created that does not violate any of the checks specified above.
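
As a rough sketch of how that one-liner could sit inside the asker's function, assuming pathchk is installed: the names is_usable_abs_path and append_to are hypothetical and only for illustration.

```shell
# Hypothetical helper: succeed only for a pathname that starts with "/"
# and that pathchk considers usable on this system.
is_usable_abs_path() {
  case $1 in
    (/*) pathchk -- "$1" ;;   # absolute: let pathchk check the components
    (*)  return 1 ;;          # relative or empty: reject
  esac
}

# Hypothetical caller modelled on the question's function: append stdin
# to the file named by "$1", creating parent directories as needed.
append_to() {
  is_usable_abs_path "$1" || return 1
  mkdir -p "$(dirname "$1")" || return 1
  cat >> "$1"
}
```

The function's exit status is whatever the matching case branch returns, so a caller can write is_usable_abs_path "$dir" || return 1 exactly as in the question.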

mikeserv
  • 58,310
  • 2
    I wasn't aware of the pathchk command. Perfect. I'll leave open for a while longer to see if anything better shows up, but I think you nailed it—case switch plus a command actually designed to check a pathname. :) – Wildcard Jan 20 '16 at 01:34
  • @Wildcard - it's pretty handy. Adding -pP can also be used to single out paths with weird characters and other riffraff. – mikeserv Jan 20 '16 at 01:41
  • @cuonglm - excellent point - I kind of already ruled out the - dash, huh...? – mikeserv Jan 20 '16 at 02:54
  • @mikeserv: -P seems to be useless here, since any path like /path/to/-_start_with_dash is fine. – cuonglm Jan 20 '16 at 03:12
  • @cuonglm - here, yes. but not useless. for example - usernames can be validated with pathchk -Pp. And you can do fn(){ pathchk -P "$@"; } to get a list of the arguments that start w/ - printed to stderr. – mikeserv Jan 20 '16 at 03:14
  • @mikeserv: Fair point, I also realized that my answer needs -P for the empty path. – cuonglm Jan 20 '16 at 03:16
  • 2
    @cuonglm - true. or [ ${1:+"!"} "${1%%/*}" ] – mikeserv Jan 20 '16 at 03:18
  • 1
    @mikeserv: tricky, as always! – cuonglm Jan 20 '16 at 03:32
  • @StéphaneChazelas - haha! I had that at first, too, but edited it out because, as cuonglm pointed out, the leading / seemed to obviate it. I'm good either way, though. – mikeserv Jan 20 '16 at 11:41
  • @mikeserv, d'oh sorry. – Stéphane Chazelas Jan 20 '16 at 11:42
  • @StéphaneChazelas - that's what I said, too. ^ it's up there somewhere. – mikeserv Jan 20 '16 at 11:43
  • Thank you! May I ask why is it ...in (/*) pathchk... and not ...in /*) pathchk..., and is it possible to access the current element anyhow inside the "case" blocks? – Artfaith Jan 30 '24 at 12:43
  • @Artfaith just for personal preference; either is permitted. im not sure i understand your second question – mikeserv Jan 31 '24 at 13:37
8
[ "$1" != "${1#/}" ] || return 1

There may be a better way (that's why I asked). This code strips off any leading / in $1 and checks that the result is not the same as $1 itself.
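
To make the behaviour of that expansion concrete, here is a minimal sketch that wraps the test in a function (the name starts_with_slash is hypothetical) and runs it over a few sample strings:

```shell
# ${1#/} removes one leading "/" if there is one. The result therefore
# differs from "$1" only when "$1" began with "/".
starts_with_slash() {
  [ "$1" != "${1#/}" ]
}

for p in /usr/local usr/local ./x ''; do
  if starts_with_slash "$p"; then
    printf 'absolute:     [%s]\n' "$p"
  else
    printf 'not absolute: [%s]\n' "$p"
  fi
done
```

Only the first sample is reported as absolute. The empty string is rejected because stripping a slash from nothing still leaves nothing.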

Wildcard
  • 36,499
  • yeah... that's one way. I wouldn't call it better - and the return is unnecessary. The thing is with this - and the other - there is still a possibility for inaccessible/unreadable/unwritable intermediate components that would preclude path creation for all trailing components. – mikeserv Jan 20 '16 at 01:26
  • @mikeserv, right—but this is just a sanity check. The creation of the path components is handled by mkdir -p, so it just remains to check exit status of that command. Point is that I don't want to run mkdir at all on a relative path name, where the command could succeed, but the accessibility of the var be dependent on my current working directory. – Wildcard Jan 20 '16 at 01:28
  • @mikeserv then I misunderstood your comment. Unreadable/unwritable intermediate components...I assumed you meant because of the filesystem, or permissions issues, etc. I'm just trying to validate the contents of the variable as being a pathname I would want to create. (i.e. a string check.) – Wildcard Jan 20 '16 at 01:31
  • 1
    Right - but there's a command for that. – mikeserv Jan 20 '16 at 01:33
  • well, i did kind of mean because of permissions, yes. if the path string branches out to an inaccessible component - its still no good to you. and if parts of are too long to be valid the same is true. but you really don't need return there: false || return 1 is just redundant, you know? the [ test ] already returns true or false... – mikeserv Jan 20 '16 at 01:55
  • 1
    @mikeserv—but this is in a function body. return in this context skips the remaining commands of the function. – Wildcard Jan 20 '16 at 02:16
  • oh. much more necessary in that case, then. – mikeserv Jan 20 '16 at 02:52
4

Pattern matching is done with case statements in all Bourne-like shells.

is_absolute() {
  case "$1" in
    ///* | //) true;;
          //*) false;; # on some systems, //foo is special and is
                       # not an absolute path. // alone is /
           /*) true;;
            *) false
  esac
}

Remove the first two entries on systems that don't treat //foo specially.
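
With those two entries removed, the function reduces to a single leading-slash match. A sketch of that simplified form (the name is_absolute_simple is hypothetical):

```shell
# Simplified variant for systems where //foo has no special meaning:
# anything that begins with "/" counts as absolute.
is_absolute_simple() {
  case $1 in
    /*) true ;;
     *) false ;;
  esac
}

for p in /etc //host/share etc ''; do
  if is_absolute_simple "$p"; then
    printf 'absolute:     [%s]\n' "$p"
  else
    printf 'not absolute: [%s]\n' "$p"
  fi
done
```

Note that this variant accepts //host/share, which the full version above deliberately rejects.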

  • i always thought that was for windows machines. – mikeserv Jan 20 '16 at 09:39
  • what a coincidence - ive just upvoted it! i always thought it was used for the win32 POSIX layer and the \\.?Volume stuff - or however that's supposed to go - though with forward slashes of course. – mikeserv Jan 20 '16 at 10:25
  • a note about that other question, though? its your question of course, but it seems a damned shame to mention Cygwin and skip UWIN. Were it me, for preference, I would do the reverse... – mikeserv Jan 20 '16 at 10:39
  • [OT] @mikeserv, I can't tell, I've never used UWIN. Cygwin has been good enough for me for the rare times I've had to use a Windows system. Can you easily get a X server or sshd with UWIN? – Stéphane Chazelas Jan 20 '16 at 11:34
  • Well, it comes with them and installs them as services. You then just enable them. What little I use MS for usually involves family and is usually in a VM. I nested an install of UWIN - and, unlike Cygwin, its familiar. Its also got all of the rest of the ksh93 advanced POSIX style things going on. Ummm ... I think sshd comes with it - maybe I'm misremembering - doesn't jive with ksh's cosh deal though, huh? Definitely X does though. – mikeserv Jan 20 '16 at 11:38
  • 2
    WOW! you can put links in CODE BLOCKS!?!? that's awesome. i had no idea... – mikeserv Jan 20 '16 at 12:15
  • @StéphaneChazelas, does word splitting get skipped for a case switch or should that be case "$1" in? – Wildcard Jan 20 '16 at 17:31
  • 1
    @Wildcard, there can't be word splitting as it's not a list context. Quotes won't harm though. – Stéphane Chazelas Jan 20 '16 at 17:59
3

An absolute path would

  • begin with /
  • not contain any /../ or /./
  • not begin with ../ or ./
  • not end with /.. or /.

so you could do this (portably) with a case statement:

    case "x$1" in
    (x*/..|x*/../*|x../*|x*/.|x*/./*|x./*)
        rc=1
        ;;
    (x/*)
        rc=0
        ;;
    (*)
        rc=1
        ;;
    esac
    return $rc

This intentionally excludes things such as

/../../../foo/../../../bar

which a naive "leading slash" interpretation permits.

For a concise definition of absolute path, refer to realpath in POSIX.
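
Wrapped in a function, the same check could look like the sketch below. The name is_canonical_abs is hypothetical, and the x prefix is dropped here on the assumption that the shell has no trouble with a leading character in the case word.

```shell
# Accept only pathnames that begin with "/" and contain no "." or ".."
# components. This is a pure string check: symlinks are not resolved.
is_canonical_abs() {
  case $1 in
    (*/..|*/../*|../*|*/.|*/./*|./*) return 1 ;;  # has a . or .. component
    (/*) return 0 ;;                              # absolute and clean
    (*)  return 1 ;;                              # relative or empty
  esac
}

for p in /usr/bin /usr/../bin /usr/. /.hidden usr/bin; do
  if is_canonical_abs "$p"; then
    printf 'accepted: %s\n' "$p"
  else
    printf 'rejected: %s\n' "$p"
  fi
done
```

A name such as /.hidden is accepted, because the patterns only match . and .. as whole components.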

Thomas Dickey
  • 76,765
  • What's the x for? – Wildcard Jan 20 '16 at 01:26
  • 11
    An absolute path starts at /, period. You can navigate up (..) if you want, still absolute. – vonbrand Jan 20 '16 at 01:27
  • I put the "x" first, in case the shell does not like the first character. – Thomas Dickey Jan 20 '16 at 01:29
  • 4
    @ThomasDickey: It's too complicated. POSIX defines an absolute path as a pathname beginning with a single slash or with more than two slashes. – cuonglm Jan 20 '16 at 01:50
  • POSIX says more than one thing about pathnames, you can use whatever interpretation you prefer. – Thomas Dickey Jan 20 '16 at 01:56
  • Shouldn't /./ be rejected (based on the same premises of /./././) ? –  Jan 20 '16 at 02:35
  • sure - added that. I agree that pathchk tests for the overall pathname length, but embedded relative-pathname syntax has far more importance to my work. – Thomas Dickey Jan 20 '16 at 09:09
  • 2
    What you're checking for is a canonical (though not checking for symlinks) absolute path. Anything that starts with / (with the exception of //foo on some systems) is an absolute path. An absolute path is a path that is not relative. – Stéphane Chazelas Jan 20 '16 at 09:26
  • I'm not aware of any shell that would require that x. Note that the (*) syntax (as opposed to *)) though POSIX (and I prefer it as well) is not understood by the Bourne shell. – Stéphane Chazelas Jan 20 '16 at 09:28
  • It would be nice if someone actually pointed to a page in POSIX which gave a precise definition. I've been using the sense in realpath for quite a while. (I have encountered problems with pathnames containing spaces - perhaps you overlooked that). I'll stick with the (*) syntax - no reason to argue about that. – Thomas Dickey Jan 20 '16 at 09:29
  • That link goes only to the top-level page... Perhaps you meant this. – Thomas Dickey Jan 20 '16 at 09:43
  • oh. im sorry. i sometimes do that with the frames on that page. here. oh. yes. that is what i meant. – mikeserv Jan 20 '16 at 09:44
  • In any case, OP already answered his own question, and got the most votes. Meanwhile pathchk is interesting - but not helpful to me. – Thomas Dickey Jan 20 '16 at 09:46
  • yeah. i like it for the way it can be used on shell globs to return a code and write the names to stderr if any files matched would turn up unexpected characters. and it doesn't have to be used on pathnames - arbitrary strings are fine if you expect them to be standard word types. – mikeserv Jan 20 '16 at 09:50
2

If by absolute path you mean that it starts with /, and we are talking about bash (as the tag suggests):

$ var1='/tmp/foo'
$ var2='tmp/foo'

$ [[ "$var1" =~ ^/ ]] && echo yes || echo no
yes
$ [[ "$var2" =~ ^/ ]] && echo yes || echo no
no
jimmij
  • 47,140
  • That's simple enough. bash only, right? – Wildcard Jan 20 '16 at 01:29
  • @Wildcard yes [[ is bash syntax, may work in other shells as well like zsh. – jimmij Jan 20 '16 at 01:32
  • If you prefer to use glob matching over regular expressions: [[ $var == /* ]] -- quotes are not strictly required within double brackets. – glenn jackman Jan 20 '16 at 03:32
  • 1
    @Wildcard, no, [[...]] comes from ksh. =~ was first added by bash IIRC but later copied by ksh93 and zsh (different syntaxes though). [[ $var1 = /* ]] would work in all ksh variants and versions, bash and zsh. case $var in /*) is the Bourne/POSIX standard one. – Stéphane Chazelas Jan 20 '16 at 16:08
1

POSIX defines an absolute path as a pathname beginning with a single slash or with more than two slashes.

There's a utility called pathchk for checking pathnames, so you can do:

[ -z "${1%%/*}" ] && pathchk -pP "$1"

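-p tells pathchk to check the pathname against the portable POSIX limits and the portable filename character set, instead of against the underlying file system.

-P guards you against any path component that starts with - and against an empty path.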
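
A minimal sketch of the one-liner above as a function, assuming pathchk is installed and supports -P (the name is_portable_abs_path is hypothetical):

```shell
is_portable_abs_path() {
  # ${1%%/*} deletes everything from the first "/" onward, so the result
  # is empty when "$1" begins with "/" (and also when "$1" is empty,
  # which is the case that -P then rejects).
  [ -z "${1%%/*}" ] && pathchk -pP "$1"
}
```

Because -p applies the portable limits rather than the local ones, a path that is fine on the local file system may still be rejected, for example one with a long component name.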

cuonglm
  • 153,898
1

Just check the first character of the string using substring syntax:

[[ ${var:0:1} = / ]] || return 1
gardenhead
  • 2,017
  • @Wildcard: No. Even without the double brackets and with ${var:0:1} double-quoted, it's not POSIX: ${var:0:1} isn't in POSIX. – cuonglm Jan 20 '16 at 06:29
  • 1
    @Wildcard: [ "${1%"${1#/}"}" ] is the POSIX way. – mikeserv Jan 20 '16 at 06:40
  • 1
    Yeah this wasn't meant to be POSIX compliant, there are already good answers for that. This is just the clearest way in Bash IMO. – gardenhead Jan 20 '16 at 06:45
  • Whoops! Removed inaccurate comment; thanks. Not POSIX but very clear, yes. – Wildcard Jan 20 '16 at 06:50
  • i dont consider that it is more clear than the ${1%"${1#/}"} substitution. It simply substitutes away the results of substituting away the first character if it is a /. i never know what the numbers do with ${var:num:num}. personally. you could do [ / = "${1%"${1#?}"}" ] if you liked, but it doesn't add anything useful. With ${1%"${1#/}"} if the first char is not a slash the expansion is null, but if it is a slash it expands only to the slash. It's pretty straightforward. For that matter case $1 in /*) ;; esac also works in bash and is a damn sight clearer than ${1:0:1}. – mikeserv Jan 20 '16 at 07:03
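
The nested expansion discussed in the comments above can be unpacked step by step. This is only an illustrative sketch, and the function name begins_with_slash is hypothetical:

```shell
# Inner:  ${1#/}         is "$1" with one leading "/" removed, or "$1"
#                        unchanged if it has no leading "/".
# Outer:  ${1%"${1#/}"}  removes that string from the end of "$1", leaving
#                        "/" if "$1" began with "/", and nothing otherwise.
# [ string ] is true when the string is non-empty.
begins_with_slash() {
  [ "${1%"${1#/}"}" ]
}

for p in /var/log var/log / ''; do
  if begins_with_slash "$p"; then
    printf 'absolute:     [%s]\n' "$p"
  else
    printf 'not absolute: [%s]\n' "$p"
  fi
done
```

The inner expansion is quoted so that any glob characters in "$1" are taken literally when it is used as the suffix pattern.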