If fp16 is a configuration variable, then I wouldn't do "${fp16}" && data_type=fp16 since it turns that configuration variable into a command. Even if we discount the possibility of someone putting something like reboot there, even a typo would cause some odd-looking error messages (like "tru: command not found", or whatever).
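To make that failure mode concrete, here is a small sketch. The helper name run_as_command is made up for illustration; it runs the value as a command in a subshell, the same way "${fp16}" && ... would, silences the error message, and prints only the exit status:

```shell
# Hypothetical helper: run the value as a command, as "${fp16}" && ... does,
# but in a subshell with stderr silenced, then print the exit status.
run_as_command() {
    ( "$1" ) 2>/dev/null
    echo "$?"
}

run_as_command true     # 0: 'true' is a real command that succeeds
run_as_command false    # 1: 'false' is a real command that fails
run_as_command tru      # 127: no command called 'tru' was found
```

Anything other than true or false is looked up as a command name, so whatever ends up in the variable decides what gets executed.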
Then again, perhaps that just works as a reminder to validate the values your script gets, e.g. with a checker function like:
checkbool() {
    case $1 in
        true|false) return 0;;
        *) echo >&2 "'$1' is an invalid boolean (must be 'true' or 'false')";
           exit 1;;
    esac
}
checkbool "$fp16"
checkbool "$bf16"
# ...
Also consider whether fp16 and bf16 even make sense as independent variables.
In:
"${fp16}" && data_type=fp16
"${bf16}" && data_type=bf16
if both fp16 and bf16 are true, the latter takes precedence. And if neither is true, data_type is left empty, which may or may not be valid.
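Both corner cases are easy to try out. In this sketch, pick_data_type is a hypothetical wrapper around those two lines, so that they can be run with different inputs without touching the real variables (the body is a subshell, so nothing leaks out):

```shell
# Hypothetical wrapper around the two "${...}" && lines.
# $1 = value of fp16, $2 = value of bf16 (each 'true' or 'false').
pick_data_type() (
    data_type=
    "$1" && data_type=fp16
    "$2" && data_type=bf16
    echo "data_type='$data_type'"
)

pick_data_type true  false   # data_type='fp16'
pick_data_type true  true    # data_type='bf16' (the later line wins)
pick_data_type false false   # data_type='' (nothing was assigned)
```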
I'm not sure of your exact scenario, but I am left wondering if it'd be better to just have data_type as the configuration variable directly. Okay, the post says there's a reason to not use data_type directly, but thinking about what happens if both or none of the settings are enabled might still make sense.
Anyway, if you want the parameter expansions like "${bf16:+bf16}" to work, you need to use an empty value as falsy, and any non-empty string as truthy. Then you could do e.g. data_type="${enable_fp16:+fp16}", but even that seems hard to use since I don't think there's a good way to get the empty string to turn into the default value without leaking some other value there. E.g. the opposite "${enable_fp16:-bf16}" would turn the empty string into bf16, but it'd also return a non-empty value such as yes as-is.
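A quick way to see both expansions side by side is something like the following sketch. The show function is only there for the demonstration, and enable_fp16 is the empty/non-empty style flag discussed above:

```shell
# Demonstration only: set enable_fp16 to the given value and print what
# the ':+' and ':-' expansions produce for it.
show() (
    enable_fp16=$1
    echo "plus='${enable_fp16:+fp16}' minus='${enable_fp16:-bf16}'"
)

show ''     # plus='' minus='bf16'
show yes    # plus='fp16' minus='yes'
```

So ':+' gives the wanted string only in the non-empty case, and ':-' gives the default only in the empty case; neither one alone maps both cases to a fixed value.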
And if you were to use empty/non-empty values within the script, would you want to expose that bit of internal detail in the configuration to the user? Or would it just be better for usability to write out the conditionals to turn the config values into whatever values the script actually needs, clunky or not?
I would go with something like this, which perhaps feels verbose but didn't take that long to write, really:
# config
batch_size=16
fp16=false
bf16=true
checkpoint_activations=true
## code
# this treats anything that's not 'true' as falsy
if [[ $fp16 = true && $bf16 != true ]]; then
    data_type=fp16
elif [[ $fp16 != true && $bf16 = true ]]; then
    data_type=bf16
else
    echo >&2 "exactly one of fp16 and bf16 must be 'true'"
    exit 1
fi
cp=
if [[ $checkpoint_activations = true ]]; then
    cp=_cp
fi
# (maybe the value of $batch_size should also be checked)
output_dir="experiment_bs${batch_size}_dt${data_type}${cp}"
Of course one could also check on each assignment to data_type if it was already set, instead of checking the value of each input variable on each condition. (As done above, adding a third variable would require changes to both existing conditions too.)
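A minimal sketch of that check-on-assignment variant could look like this. The names set_data_type and resolve_data_type are made up for the example, and the logic is wrapped in a function with a subshell body only so that it can be tried with different inputs:

```shell
# Hypothetical helper: assign data_type once, complain if it was already set.
set_data_type() {
    if [ -n "$data_type" ]; then
        echo >&2 "data_type is already '$data_type', can't also set '$1'"
        return 1
    fi
    data_type=$1
}

# $1 = value of fp16, $2 = value of bf16; anything but 'true' is falsy.
resolve_data_type() (
    data_type=
    if [ "$1" = true ]; then set_data_type fp16 || exit 1; fi
    if [ "$2" = true ]; then set_data_type bf16 || exit 1; fi
    if [ -z "$data_type" ]; then
        echo >&2 "exactly one data type must be 'true'"
        exit 1
    fi
    echo "$data_type"
)

resolve_data_type false true                    # bf16
resolve_data_type true  true || echo rejected   # error on stderr, then: rejected
```

With this layout, a third data type would only need one more "if ...; then set_data_type ...; fi" line, and the existing conditions stay as they are.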
If you want to go the concise way, the two-way choice function from Stéphane's answer would also work in Bash with minor modifications. Though I'd still rather explicitly check the value, so something like this maybe:
choose() if [[ $1 = true ]]; then printf "%s\n" "$2"
else printf "%s\n" "$3"
fi
data_type=$(choose "$fp16" fp16 bf16)
# etc.
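For completeness, here is one way the "etc." might be filled in to arrive at the same output_dir as the longer version. This is only a sketch: it reuses the config values from the verbose example, writes choose with braces and a plain [ ] test so it also runs in sh, and assumes an empty third argument is acceptable for the checkpoint suffix:

```shell
# config (same values as in the verbose example)
batch_size=16
fp16=false
bf16=true
checkpoint_activations=true

# Two-way choice: print $2 if $1 is exactly 'true', otherwise print $3.
choose() {
    if [ "$1" = true ]; then printf '%s\n' "$2"
    else printf '%s\n' "$3"
    fi
}

data_type=$(choose "$fp16" fp16 bf16)
cp=$(choose "$checkpoint_activations" _cp '')
output_dir="experiment_bs${batch_size}_dt${data_type}${cp}"
echo "$output_dir"    # experiment_bs16_dtbf16_cp
```

Note that this version never looks at bf16 at all: anything other than fp16=true ends up as bf16, so the "both" and "neither" cases are silently accepted rather than rejected.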
Of course, the decision between verbose and explicit vs. compact and concise code is always up to the programmer.
${VAR:-default}, i.e., a short atomic expression with clear semantics. – Green绿色 Feb 07 '24 at 02:06