
I am parsing options with getopts but would like to handle long options as well.

print-args ()
{
  local i=0
  title="$1"; shift
  printf "\n%s\n" "${title}: \$@:"
  for arg in "$@"; do
    (( i = i + 1 ))
    printf "%s |%s|\n" "${i}." "$arg"
  done
}

getopts_test ()
{
  aggr=()
  for arg in "$@"; do
    case $arg in
      ("--colour"|"--color")     aggr+=( "-c" ) ;;
      ("--colour="*|"--color="*) aggr+=( "-c" "${arg#*=}" ) ;;
      (*)                        aggr+=( "$arg" ) ;;
    esac
  done

  print-args "print" "$@"

  eval set -- "${aggr[@]}"
  print-args "eval" "$@"

  set -- "${aggr[@]}"
  print-args "set" "$@"

  local OPTIND OPTARG
  local shortopts="c:"
  while getopts "$shortopts" arg; do
    case $arg in
      ("c") context="$OPTARG" ;;
      (*)   break ;;
    esac
  done
  shift $(( OPTIND - 1 ))
}

But I wonder whether the use of set -- "${aggr[@]}" is correct.

Or is the following (using eval) more appropriate?

eval set -- "${aggr[@]}"

I have performed a test, shown below. With eval, the strings "170 20" and "Gunga Din" are split up at the spaces, whereas with set -- "${aggr[@]}" they are kept intact as single parameters.

getopts_test -f -g 130 --colour="170 20" "Gunga Din"

print: $@:

  1. |-f|
  2. |-g|
  3. |130|
  4. |--colour=170 20|
  5. |Gunga Din|

eval: $@:

  1. |-f|
  2. |-g|
  3. |130|
  4. |-c|
  5. |170|
  6. |20|
  7. |Gunga|
  8. |Din|

set: $@:

  1. |-f|
  2. |-g|
  3. |130|
  4. |-c|
  5. |170 20|
  6. |Gunga Din|

Then I ran another function that uses the external getopt utility (here the enhanced util-linux implementation, which is not a GNU tool but does handle long options).

getopt_test ()
{
 shortopts="Vuhv::H::w::e::n::l::C:"
 shortopts="${shortopts}bgcrmo"
 longopts="version,usage,help,verbosity::"
 longopts="${longopts},heading::,warning::,error::"
 longopts="${longopts},blu,grn,cyn,red,mgn,org"

opts=$( getopt -o "$shortopts" -l "$longopts" -n "${0##*/}" -- "$@" )

print-args "$@:" "$@" print-args "opts:" "$opts"

set -- "$opts" print-args "set -- "$opts"" "$@"

eval set -- "$opts" print-args "eval set -- "$opts"" "$@"

}

This resulted in the following:

getopt_test --warning=3 "foo'bar" "Gunga Din"

$@:

  1. |--warning=3|
  2. |foo'bar|
  3. |Gunga Din|

opts:

  1. | --warning '3' -- 'foo'''bar' 'Gunga Din'|

set -- "$opts"

  1. | --warning '3' -- 'foo'''bar' 'Gunga Din'|

eval set -- "$opts"

  1. |--warning|
  2. |3|
  3. |--|
  4. |foo'bar|
  5. |Gunga Din|

As shown, the result of getopt is a single string with the positional arguments re-arranged and shell-quoted. This shows the need to use eval set -- "$opts" to split the opts string back into five separate positional parameters for option parsing and processing.

  • Do you have the GNU getopt tool? It'll handle quite a lot of this for you. (Here, getopt --version reports getopt from util-linux 2.33.1.) – Chris Davies Oct 30 '21 at 10:42
  • @roaima, note that util-linux is not part of the GNU project. – Stéphane Chazelas Oct 30 '21 at 14:04
  • @StéphaneChazelas I'd had the impression that the newer getopt was GNU – Chris Davies Oct 30 '21 at 17:33
  • @roaima, no, it appears to be associated with the Linux kernel developers rather than GNU, at least as far as we believe wikipedia https://en.wikipedia.org/wiki/Util-linux and e.g. the Debian package page also links to www.kernel.org: https://packages.debian.org/bullseye/util-linux – ilkkachu Oct 30 '21 at 22:40

2 Answers


The idea there is to preprocess the arguments and change each --context to -C, which getopts can then process? I suppose that would work, but note that GNU-style long options can also take arguments in the format --context=foobar, and your construct here doesn't support that. The user would need to know that this particular tool requires --context foobar as two distinct arguments. Or you'd need to make the preprocessing more complex.

You might also want to check all arguments that start with --, as otherwise e.g. a mistyped --cotnext would go to getopts as-is, and you'd get complaints about unknown options. (Or worse, wrong options would be enabled.)
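
For illustration, a stricter preprocessing loop along those lines might look like the sketch below (the --context/-C mapping follows the example above; the error handling is my own assumption, not something from the question):

aggr=()
for arg in "$@"; do
  case $arg in
    (--context)   aggr+=( -C ) ;;               # long form, value expected as the next argument
    (--context=*) aggr+=( -C "${arg#*=}" ) ;;   # long form with =value attached
    (--)          aggr+=( "$arg" ) ;;           # end-of-options marker, passed through untouched
    (--*)         printf >&2 'unknown option: %s\n' "$arg"
                  return 1 ;;                   # assumes this runs inside a function, as in the question
    (*)           aggr+=( "$arg" ) ;;
  esac
done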

But I wonder whether the use of set -- "${aggr[@]}" is correct.

Or is the following (using eval) more appropriate?

set -- "${aggr[@]}" expands the elements of the array, to distinct words, and then assigns those words to the positional parameters. Each array element will become exactly one positional parameter, without changes.

eval set -- "${aggr[@]}" would expand all the elements of the array, then join them together with spaces, prepend the set -- and evaluate the result as a shell command. That is, if you have the array elements abc def, $(date >&2), ghi'jkl, the command would be

set -- abc def $(date >&2) ghi'jkl 

which would end up with abc and def as two distinct parameters, and it would print the date to stderr, except that the lone single quote will cause a syntax error.

Using eval would be appropriate if you have something that's designed to produce output that's quoted for shell input.
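
For instance, in bash you could produce such shell-quoted output yourself with printf '%q ' (just an illustration of when eval is the right tool, not something the function above needs):

aggr=( -c "170 20" "Gunga Din" )
quoted=$(printf '%q ' "${aggr[@]}")   # e.g.  -c 170\ 20 Gunga\ Din
eval set -- "$quoted"                 # safe here, because the string was quoted for shell input
printf '|%s|\n' "$@"                  # |-c|, |170 20|, |Gunga Din|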


If you're on Linux (and don't care about portability), you could do what roaima suggested in the comments, and use the util-linux version of getopt (without the s). It supports long options too; there are answers showing how to use it in getopt, getopts or manual parsing - what to use when I want to support both short and long options? and in this SO answer and also my answer here.

Incidentally, with that getopt, you would use eval, since, as a command, it's limited to producing just a single string as output, not a list like an array, so it uses shell quoting to work around the issue.
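
The usual pattern with that getopt is roughly the following (a sketch, assuming the util-linux getopt is available; the option names here are just examples):

parsed=$(getopt -o c: -l colour:,color: -n "${0##*/}" -- "$@") || exit 1
eval set -- "$parsed"          # getopt's output is shell-quoted, so eval is appropriate here
while true; do
  case $1 in
    (-c|--colour|--color) context=$2; shift 2 ;;
    (--)                  shift; break ;;
    (*)                   break ;;
  esac
done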

ilkkachu
  • Yes, I change each --context to -C. – Vera Oct 30 '21 at 11:01
  • The code is intended to address the portability problem. I have used getopt before. – Vera Oct 30 '21 at 13:30
  • The biggest question is whether to use eval or not, because for the case of getopt, the use of eval seems necessary. – Vera Oct 30 '21 at 13:43
  • @khin, if you have something that's explicitly made to be a shell command, then you use eval. If not, then you don't. Your users probably don't want to enter args with spaces as getopts_test "'foo bar'", instead of the normal getopts_test "foo bar", and they'll probably also expect getopts_test * to work, even if some of the filenames contain whitespace (or shell special characters). – ilkkachu Oct 30 '21 at 17:18
  • For my getopts_test, it looks like using an array is a neat idea. What do you think? Customarily, I pass filenames as non-option arguments by using the break command. – Vera Oct 31 '21 at 04:39
  • Would you be so kind as to explain which commands would be explicitly shell commands? I am getting baffled by the terminology. Could there also be some tangible code examples? – Vera Oct 31 '21 at 04:50
  • I agree that users would not want to use getopts_test "'foo bar'". They would want to run the command with getopts_test "foo bar". This directs me to use set -- "${aggr[@]}". When users pass getopts_test -f -g 130 --colour="170 20" "foo bar" * and one of the files has spaces (e.g. "un tit.pdf"), using eval does split "un" from "tit.pdf". I can see that using eval is not the right way to parse arguments when the developer decides to handle user arguments directly without using getopts or getopt. Are we agreed on these points @ilkkachu? – Vera Oct 31 '21 at 05:08
  • You are correct. Using getopts_test -f -g 130 --colour="170 20" "foo'bar" yields bash: unexpected EOF while looking for matching '. bash: syntax error: unexpected end of file when using eval set -- "${aggr[@]}". Whereas the problem does not occur with set -- "${aggr[@]}". – Vera Oct 31 '21 at 05:28
  • @khin, one way to look at it, is that a shell command is a single string, while what results from it after quote processing is conceptually an array (or list, if you will). The command ls 'foo bar' blah gets converted to the array (ls, foo bar, blah). What the user initially gives as command, e.g. getopts_test ... is a string, but even before the function runs, that gets converted to an array. The elements of that array being ($1, $2, ...) inside the function. Then, you already have an array, and there's no need for the string->array conversion any more. – ilkkachu Oct 31 '21 at 11:31
  • But getopt explicitly does an array->string conversion, just for the purpose of it being converted back to an array, and only because it's not possible to directly return an array from a command substitution. (The output of a command is a stream of bytes, so just a single string, no structure inherently. That structure needs to be built with delimiter characters, e.g. the colons in /etc/passwd delimiting the fields. Just those aren't general, you can't have a colon in a value if the colon is the delimiter. Shell quoting is general and easily available with eval for use with getopt) – ilkkachu Oct 31 '21 at 11:35
  • And of course in general we could use eval in something like eval "$(date +"hour=%H min=%M sec=%S")" which would ask date to print a string that looks like hour=13 min=39 sec=55, which we'd then know would be useful as a command to the shell if we want those values assigned to the three variables. But we wouldn't use eval "$(date +"%H:%M:%S")" because the output 13:39:55 wouldn't be useful as a command. So we just do time="$(date +"%H:%M:%S")" instead. The point is that we'd only use eval if the output is designed to look like a command line, like one we'd write on the prompt. – ilkkachu Oct 31 '21 at 11:49
  • And yes, the single quotes and spaces are what the extra eval would mess up – ilkkachu Oct 31 '21 at 11:51

You can parse --foo-style long options with the getopts builtin by adding - as a short option taking an argument to the optstring, and retrieving the actual long option from $OPTARG. Simple example:

while getopts :sc:-: o; do
    case $o in
    :) echo >&2 "option -$OPTARG needs an argument"; continue;;
    '?') echo >&2 "unknown option -$OPTARG"; continue;;
    -) o=${OPTARG%%=*}; OPTARG=${OPTARG#"$o"}; OPTARG=${OPTARG#=};;
    esac
    echo "OPT $o=$OPTARG"
done
shift "$((OPTIND - 1))"
echo "ARGS $*"

which you can then use as either script -c foo or script --context=foo.
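
For example, running it should print something like this (hypothetical transcript, based on the code above):

$ ./script -s --context=foo bar
OPT s=
OPT context=foo
ARGS bar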

If you also want to have the long options validated just like the short ones, and also accept abbreviated forms, you need something more complex. There is not much wisdom in over-engineering a poor shell script like that, but if you want an example, here it is:

short_opts=sc:
long_opts=silent/ch/context:/check/co   # those who take an arg END with :

# override via command line for testing purposes
if [ "$#" -ge 2 ]; then
    short_opts=$1; long_opts=$2; shift 2
fi

while getopts ":$short_opts-:" o; do
    case $o in
    :) echo >&2 "option -$OPTARG needs an argument"; continue;;
    '?') echo >&2 "bad option -$OPTARG"; continue;;
    -) o=${OPTARG%%=*}; OPTARG=${OPTARG#"$o"}
       lo=/$long_opts/
       case $lo in
       *"/$o"[!/:]*"/$o"[!/:]*) echo >&2 "ambiguous option --$o"; continue;;
       *"/$o"[:/]*) ;;
       *) o=$o${lo#*"/$o"}; o=${o%%[/:]*};;
       esac
       case $lo in
       *"/$o/"*) OPTARG= ;;
       *"/$o:/"*)
           case $OPTARG in
           '='*) OPTARG=${OPTARG#=};;
           *) eval "OPTARG=\$$OPTIND"
              if [ "$OPTIND" -le "$#" ] && [ "$OPTARG" != -- ]; then
                  OPTIND=$((OPTIND + 1))
              else
                  echo >&2 "option --$o needs an argument"; continue
              fi;;
           esac;;
       *) echo >&2 "unknown option --$o"; continue;;
       esac
    esac
    echo "OPT $o=$OPTARG"
done
shift "$((OPTIND - 1))"
echo "ARGS $*"

then

$ ./script --context=33
OPT context=33
$ ./script --con=33
OPT context=33
$ ./script --co
OPT co=
$ ./script --context
option --context needs an argument
  • Does getopt also take a leading colon? Could I check for : and ? with getopt as well? – Vera Oct 31 '21 at 11:34
  • Might want to use ${OPTARG%%=*} in the first one too (with double %%). The second one has something wrong with recognizing invalid options, e.g. --xyz comes up as xyzsilent (and --sil comes up as silsilent too). But you did say it wasn't properly debugged anyway. Might be easier to just drop support for abbreviated long options. – ilkkachu Oct 31 '21 at 12:02
  • What is the purpose of a leading : with getopt(1)? Why does it shut up error messages? – Vera Oct 31 '21 at 16:41