0

I'm running a series of experiments with bash and want to store the log files in a directory whose name is based on the experiment configuration. Some items in the configuration are boolean (true/false). Take the following configuration as an example:

batch_size=16
fp16=false
bf16=true
checkpoint_activations=true

I'd like to store the log file from the experiment with the above configuration in a directory with the following name:

output_dir="experiment_bs${batch_size}_dt${fp16 if fp16=true else bf16}_${cp if checkpoint_activations=true else empty}"

Of course, I could declare helper variables:

data_type=""
"${fp16}" && data_type=fp16
"${bf16}" && data_type=bf16
"${checkpoint_activations}" && cp="_cp" || cp=""
output_dir="experiment_bs${batch_size}_dt${data_type}${cp}"

But I find this somewhat clunky and hope that parameter substitutions could be useful here. "${bf16:+bf16}" won't help in my case because this will always print "bf16" regardless of its boolean value as long as it's defined.

Are there any parameter substitutions that could be applied to this use-case? Or is there an even better in-line solution to this problem?

Note: there's an application-specific reason why I don't directly use data_type in my configuration.

3 Answers

2

You can put any bash commands you want inside $(...), so you could write:

output_dir="experiment_bs${batch_size}_dt$([[ $fp16 = true ]] && echo fp16 || echo bf16)$([[ $checkpoint_activations = true ]] && echo _cp || true)"

Although for legibility I might write instead:

printf -v output_dir "experiment_bs%s_dt%s%s" \
  "$batch_size" \
  "$([[ $fp16 = true ]] && echo fp16 || echo bf16)" \
  "$([[ $checkpoint_activations = true ]] && echo _cp)"

Given your sample inputs...

batch_size=16
fp16=false
bf16=true
checkpoint_activations=true

...both of the above produce the value:

experiment_bs16_dtbf16_cp
larsks
  • Thanks for the suggestion. However, it's still quite wordy and not as elegant as a parameter substitution would have been. – Green绿色 Feb 06 '24 at 02:00
  • 6
    @Green绿色 if you stuff too many conditionals into a line of code because it's less wordy and seems more elegant, you risk creating write-only code. Consider first the maintainability of the code; elegance and conciseness come after. – muru Feb 06 '24 at 10:06
  • 1
    @muru Thank you for your very relevant suggestion. You're totally right. Inlining multi-line code isn't really what I hoped for when I asked for a concise solution; I was looking for something along the lines of ${VAR:-default}, i.e., a short atomic expression with clear semantics. – Green绿色 Feb 07 '24 at 02:06
2

In zsh, you could define a ? function (and an alias to prevent it being treated as a glob pattern) that implements a form of ternary ? condition if-yes if-no operator, reminiscent of C's condition ? if-yes : if-no:

alias "?='?'"
'?'() if eval $1; then print -r -- $2; else print -r -- $3; fi

output_dir=experiment_bs${batch_size}dt$(? $fp16 fp16 bf16)$(? $cp cp)

With zsh 6.0+ (not released yet as of 2024-02-06), you can change it to:

alias "?='?'"
'?'() if eval $1; then REPLY=$2; else REPLY=$3; fi

output_dir=experiment_bs${batch_size}dt${|? $fp16 fp16 bf16}${|? $cp cp}

to avoid the forking of a process to get the result and allow values to end in newline characters (a feature called valsub (value substitution) copied from mksh).

Note that this ternary operator evaluates the code in its first argument to decide whether to return $2 or $3, so $fp16 and $cp are expected to contain either true or false. Change it to $(? '[[ $fp16 = true ]]' fp16 bf16) to check whether $fp16 contains true or anything else.

See also this discussion on the zsh mailing list for some builtin approaches at a ternary operator. And this Q&A about valsubs with details about it and alternatives.

1

If fp16 is a configuration variable, then I wouldn't do "${fp16}" && data_type=fp16 since it turns that configuration variable into a command. Even if we discount the possibility of someone putting something like reboot there, even a typo would cause some odd-looking error messages (like "tru: command not found", or whatever).

Then again, perhaps that just works as a reminder to validate the values your script gets, e.g. with a checker function like:

checkbool() {
    case $1 in
        true|false) return 0;; 
        *) echo >&2 "'$1' is an invalid boolean (must be 'true' or 'false')";
           exit 1;;
    esac
}
checkbool "$fp16"
checkbool "$bf16"
# ...

Also consider whether fp16 and bf16 even make sense as independent variables.

In:

"${fp16}" && data_type=fp16
"${bf16}" && data_type=bf16

if both fp16 and bf16 are true, the latter takes precedence. And if neither is set, data_type is left empty, which may or may not be valid. I'm not sure of your exact scenario, but I am left wondering if it'd be better to just have data_type as the configuration variable directly. Okay, the post says there's a reason to not use data_type directly, but thinking about what happens if both or none of the settings are enabled might still make sense.

Anyway, if you want the parameter expansions like "${bf16:+bf16}" to work, you need to use an empty value as falsy, and any non-empty string as truthy. Then you could do e.g. data_type="${enable_fp16:+fp16}", but even that seems hard to use since I don't think there's a good way to get the empty string to turn into the default value, without leaking some other value there. E.g. the opposite "${enable_fp16:-bf16}" would turn the empty string into bf16, but it'd also return the string yes as-is.
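To make that concrete, here is a minimal sketch of how the :+ and :- expansions behave under the empty-means-false convention. The variable enable_fp16 is the hypothetical name used above, and the result variables (on, leak, off, default, surprise) exist only for this illustration.

```shell
# Convention: empty string is falsy, any non-empty string is truthy.
enable_fp16=yes
on="${enable_fp16:+fp16}"        # non-empty, so the alternate value is used
leak="${enable_fp16:-bf16}"      # non-empty, so the variable's own value comes through

enable_fp16=
off="${enable_fp16:+fp16}"       # empty, so this expands to nothing
default="${enable_fp16:-bf16}"   # empty, so the default is used

# The literal string "false" is non-empty, so it still counts as truthy here.
enable_fp16=false
surprise="${enable_fp16:+fp16}"

printf '%s\n' "on=$on leak=$leak off=$off default=$default surprise=$surprise"
```

The leak and surprise cases are the reason these expansions fit poorly with true/false configuration values.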

And if you were to use empty/non-empty values within the script, would you want to expose that bit of internal detail in the configuration to the user? Or would it just be better for usability to write out the conditionals to turn the config values into whatever values the script actually needs, clunky or not?

I would go with something like this, which perhaps feels verbose but didn't take too long to write, really:

# config
batch_size=16
fp16=false
bf16=true
checkpoint_activations=true
## code
# this treats anything that's not 'true' as falsy
if   [[ $fp16  = true && $bf16 != true ]]; then
    data_type=fp16
elif [[ $fp16 != true && $bf16  = true ]]; then
    data_type=bf16
else
    echo >&2 "exactly one of fp16 and bf16 must be 'true'"
    exit 1
fi
cp=
if [[ $checkpoint_activations = true ]]; then
    cp=_cp
fi
# (maybe the value of $batch_size should also be checked, whatever)

output_dir="experiment_bs${batch_size}_dt${data_type}${cp}"

Of course one could also check on each assignment to data_type if it was already set, instead of checking the value of each input variable on each condition. (As done above, adding a third variable would require changes to both existing conditions too.)
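As a rough sketch of that alternative, the check can live in a small helper that refuses to overwrite an earlier choice. The name set_data_type is made up for this example, and the error messages are only illustrative.

```shell
# config (same sample values as above)
fp16=false
bf16=true

# Hypothetical helper: assigns data_type, but fails if it was already set.
set_data_type() {
    if [[ -n $data_type ]]; then
        echo >&2 "both '$data_type' and '$1' are enabled, pick one"
        return 1
    fi
    data_type=$1
}

data_type=
if [[ $fp16 = true ]]; then set_data_type fp16 || exit 1; fi
if [[ $bf16 = true ]]; then set_data_type bf16 || exit 1; fi

# Nothing was enabled at all.
if [[ -z $data_type ]]; then
    echo >&2 "one of fp16 and bf16 must be 'true'"
    exit 1
fi
echo "$data_type"
```

With this layout, supporting a third data type should only need one more line of the same form, without touching the existing conditions.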

If you want to go the concise way, the two-way choice function from Stéphane's answer would also work in Bash with minor modifications. Though I'd still rather explicitly check the value, so something like this maybe:

choose() if [[ $1 = true ]]; then printf "%s\n" "$2"
         else printf "%s\n" "$3"
         fi
data_type=$(choose "$fp16" fp16 bf16)
# etc.
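Filling in the "etc." part, a full sketch with the sample configuration could look like the following. It assumes the directory name should use the _cp suffix from the question's helper-variable version, and it relies on the third argument being optional.

```shell
# config (same sample values as in the question)
batch_size=16
fp16=false
bf16=true
checkpoint_activations=true

# Same function as above, repeated so this sketch runs on its own.
choose() if [[ $1 = true ]]; then printf "%s\n" "$2"
         else printf "%s\n" "$3"
         fi

# With no third argument, the "else" branch prints an empty line,
# and the command substitution strips the trailing newline.
output_dir="experiment_bs${batch_size}_dt$(choose "$fp16" fp16 bf16)$(choose "$checkpoint_activations" _cp)"
echo "$output_dir"
```

Each $(...) forks a subshell, which is unlikely to matter for a name that is built once per experiment.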

Of course, the decision between verbose and explicit vs. compact and concise code is always up to the programmer.

ilkkachu
  • Thank you. I learned some new things from your answer. As a side question: I'm used to using == in bash; is there a difference from =? – Green绿色 Feb 07 '24 at 02:13
  • 1
    @Green绿色 = is the standard, == is a synonym in bash and some other shells: https://unix.stackexchange.com/a/382012/70524 – muru Feb 07 '24 at 03:57