Context-Free Grammar parser with a shell script

Question

I wrote a script the other day for a nontechnical team and I thought I'd replace the cryptic command-line flags with a command that reads more like a sentence.

I came up with a command line along the lines of:
<script.sh> run tests for <module> ... in <language> ...

The keyword "for" introduces a list of modules, whereas "in" introduces a list of languages. ("in" may itself become a language code in the future, as an aside.)

The way I parse this is by treating the first two words as the high-level structure of the command. If together they are "run tests", I kick in specific parsing looking for the "for" and "in" keywords.

Next, I wanted to change the syntax to:
<script.sh> run <module> ... tests in <language> ...

In my opinion, this reads better.

I have the skill to make this work with handcrafted Bash argument parsing, but the complexity of that would be overkill IMO. An alternative could be Lex/Yacc: the script could write temporary Lex+Yacc scripts and use them to parse its command line. That also seems overkill.

Is there a good, straightforward way to define a simple context-free grammar that can be used to parse command lines like the above into variables? I'm happy to hear starting points rather than full-fledged solutions.

score 0 · Answer 1 · answered Jul 19 '23 at 22:25

It's fairly unwieldy but it should point you in the right direction.

Parsing is only one stage of the process of interpreting (or compiling and executing) a specified program. It's convenient to use the shell arguments as lexed tokens. As for interpretation, I highly recommend keeping the parsing and interpreter logic separate from the implementation of the work: use clearly named functions for the work pieces. Maybe even write them with no-op gating, so you can trace and debug without doing the actual work. bash -x is your friend: bash -x ./script.bash will let you see a trace of what's executed.
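The no-op gating mentioned above could look something like the following. This is only a sketch: DRY_RUN, run_step, and run_tests are hypothetical names, and the echo stands in for whatever real work the script does.

```shell
#!/usr/bin/env bash
# Hypothetical names throughout: DRY_RUN, run_step, run_tests are illustrations only.
: "${DRY_RUN:=}"

# run_step CMD [ARGS...]: print the command when DRY_RUN is set, otherwise execute it.
run_step() {
  if [[ $DRY_RUN ]]; then
    printf 'would run:'
    printf ' %q' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# The "work" function knows nothing about parsing; the echo is a stand-in for real work.
run_tests() {
  local module="$1" language="$2"
  run_step echo "testing $module in $language"
}

run_tests core en
```

Running it as DRY_RUN=1 ./script.bash prints the command instead of executing it, so the parsing path can be exercised without side effects.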

I would encourage you to call named functions from the parsing logic, as in the script below. Parsing code is finicky to write or modify, and it's easier when you can swap out functions or have the called functions echo their names instead of executing.

#!/usr/bin/env bash
: "${DEBUG:=}"; shopt -s extglob; [[ $DEBUG ]] && shopt -p extglob
# acceptArg SPEC CALLBACK ARG: call CALLBACK if ARG matches the extglob alternation SPEC.
acceptArg () {
  local spec="$1" cb="$2" arg="$3" matcher
  # -A makes status an associative array (Bash 4+); without it both keys collapse to index 0.
  local -A status=([args]=0 [matched]=0); shift 3
  # https://unix.stackexchange.com/a/234415/61350  # how-can-i-use-a-variable-as-a-case-condition
  matcher="@($spec)"
  case "$arg" in
    $matcher) status[matched]=1; "$cb" ;;
    *) [[ $DEBUG ]] && echo "spec: '$spec' failed to match arg: '$arg'" ;;
  esac
  [[ $DEBUG ]] && declare -p status
  [[ ${status[matched]} == 1 ]]  # exit status 0 on a match, 1 otherwise
}
count=0
incr(){ echo $((++count)); }
handleFirst(){ incr; }
handleSecond(){ incr; }
acceptArg "f|first" handleFirst "$1"
acceptArg "s|second|*2" handleSecond "$2"
[[ $? == 0 ]] && echo done || echo failed
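
As a rough sketch of how the same idea might extend to the second syntax in the question (run <module> ... tests in <language> ...), a small state machine over the arguments can collect the two lists into arrays. This is one possible approach under stated assumptions, not a general grammar engine: parse_command, modules, and languages are hypothetical names, and a module literally named "tests" cannot be expressed.

```shell
#!/usr/bin/env bash
# Sketch only: parse_command, modules, and languages are hypothetical names.
# Grammar assumed: run <module>... tests in <language>...
modules=() languages=()

parse_command() {
  local state=start word
  modules=() languages=()
  for word in "$@"; do
    # Dispatch on the pair "current state : next token".
    case "$state:$word" in
      start:run)     state=modules ;;
      modules:tests) (( ${#modules[@]} )) || return 1; state=expect_in ;;
      modules:*)     modules+=("$word") ;;
      expect_in:in)  state=languages ;;
      languages:*)   languages+=("$word") ;;
      *) echo "unexpected word '$word' in state '$state'" >&2; return 1 ;;
    esac
  done
  # Accept only if we ended in the languages state with at least one language.
  [[ $state == languages ]] && (( ${#languages[@]} ))
}

if parse_command run core ui tests in en fr; then
  echo "modules: ${modules[*]}"
  echo "languages: ${languages[*]}"
fi
```

Because the keyword "in" is only special in the expect_in state, a later "in" is collected as an ordinary language code, which covers the aside in the question.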

I'm interested in the same sort of thing, and I've been meaning to write something for months or years. Thanks for crystallizing a problem statement. I'm working on updates at https://gist.github.com/mcint/8a589500c44d4dc08dcb09b80882c2fd
