Proper way to code dry run option without having to repeat myself?

Question

I'm coding a script that goes and searches for files on a remote server and transfers them back to my local computer. I want to be able to do a dry run first, so I know which files I'm bringing back.

I'm currently using a mix of getopts and output redirection from some code I found here.

From my research, it seems impractical to return arrays from zsh or bash functions. That makes it hard for me to see how to write this script without repeating myself a lot.
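For what it's worth, the usual workaround is for a function to print one item per line on stdout and for the caller to read those lines back into an array. That is enough to build the file list once and reuse it. Here is a minimal bash sketch that assumes no path contains a newline. list_remote_files is a hypothetical stand-in for the ssh/find pipeline in the script below, and it simply prints two of the paths from the example output:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the script's ssh/find pipeline:
# the function "returns" its list by printing one path per line on stdout.
list_remote_files() {
    printf '%s\n' \
        "/projects/bison/git/bison_20190827/assessment/LWR/validation/HBEP/analysis/BK370/HBEP_BK370_out.csv" \
        "/projects/bison/git/bison_20190827/assessment/LWR/validation/HBEP/analysis/BK363/HBEP_BK363_out.csv"
}

# The caller turns those lines back into an array (mapfile needs bash 4+).
# -t drops the trailing newline from each element.
mapfile -t remote_files < <(list_remote_files)

printf 'found %d files\n' "${#remote_files[@]}"
for f in "${remote_files[@]}"; do
    printf '  %s\n' "$f"
done
```

In zsh, the equivalent capture is remote_files=("${(@f)$(list_remote_files)}"), which splits the command's output on newlines.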

Here is my current script:

EDIT: Please forgive my mixing some bashisms with zsh features. I started writing this script with #!/bin/bash but switched to zsh.

#!/usr/local/bin/zsh
RED='\033[0;31m'
NC='\033[0m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'

dry_run=0
yesterday=1

# Establish -n flag means to do a dry run.
while getopts "ny:" flag; do
    case "$flag" in
        n) dry_run=1 ;;
        y) yesterday=${OPTARG} ;;
        *) echo 'error in command line parsing' >&2
           exit 1
    esac
done
shift $(($OPTIND-1))

# This is the folder I'm interested in getting files from
folder=${1:?"You must define a folder of interest"}

# Check to see if dry-run, if not proceed with copying the files over. 
if [ "$dry_run" -eq 1 ]; then
    print -Pn "\n%S%11F%{Initiating Dry-Run%}%s%f"

    # SSH onto server and find the most recently updated folder.
    # Then place that newest folder and folder of interest into the absolute file path.
    # Then SSH again and use find with that file-path
    # Return array of file paths
    # TODO: **THIS IS THE SECTION I NEED TO REFACTOR INTO A FUNCTION**
    bison_remote_files=($(
        {
            {
                bison_latest_run=$(ssh -qn falcon1 'find /projects/bison/git/* -mindepth 0 -maxdepth 0 -type d -printf "%T@\t%f\n"' |
                    sort -t$'\t' -r -nk1,5 |
                    sed -n "$yesterday"p |
                    cut -f2-)

                bison_remote_path=$(
                    echo $bison_latest_run |
                        awk -v folder="$folder" '{print "/projects/bison/git/"$1"/assessment/LWR/validation/"folder}')

                ssh -qn falcon1 \
                    "find $bison_remote_path -type f -name '*_out.csv' -not -path '*/doc/*' 2>/dev/null" >&3 3>&-; echo "$?"

                print -Pn "\n\n%U%B%13F%{Fetching data from:%}%u %B%12F%{ /projects/bison/git/${bison_latest_run}%}%b%f\n" >&2

            } | {
                until read -t1 ret; do
                    print -Pn "%S%11F%{.%}%s%f" >&2
                done
                exit "$ret"
            }
        } 3>&1))


    # Manipulate remote file paths to match the local machine directory
    local_file_path=($(for i in "${bison_remote_files[@]}"; do
                           echo $i |
                               gsed -E "s|/projects/bison/git/bison_[0-9]{8}|$HOME/Documents/projects/bison|g"
                       done
                     ))

    # Loop through remote and local and show where they will be placed
    for ((i=1; i<=${#bison_remote_files[@]}; i++)); do
        print -P "\u251C\U2500%B%1F%{Remote File ->%}%b%f ${bison_remote_files[i]}"
        print -P "\u251C\U2500%B%10F%{Local File  ->%}%b%f ${local_file_path[i]}"

        if [[ $i -lt ${#bison_remote_files[@]} ]]; then
            print -Pn "\U2502\n"
        else
            print -Pn "\U2514\U2500\U2504\U27E2\n"
        fi
    done

# If it's not a dry run, grab all the files using scp.
# This is the part where I'm stuck:
# all the variables I need are only set inside the dry-run branch above.
# How do I write a function (or something else) so I don't have to do all of the above again?
else
    printf "${YELLOW}Fetching Data from ${NC}(${GREEN}${bison_latest_run}${NC})${YELLOW}...${NC}\n"

    # zsh arrays are 1-indexed, as in the dry-run loop above
    for ((i=1; i<=${#NEW_RFILEP[@]}; i++)); do

        scp -qp mcdodyla@falcon1:"${NEW_RFILEP[i]}" "${LOCAL_FILEP[i]}"

        # Check if scp was successful, if it was show green.
        if [ $? -eq 0 ]; then
            printf "${GREEN}File Created/Updated at:${NC} ${LOCAL_FILEP[i]}\n"
        else
            printf "${RED}Error Fetching File:${NC} ${NEW_RFILEP[i]}\n"
        fi
    done
    printf "${YELLOW}Bison Remote Fetch Complete!${NC}\n"
fi

As you can see, all my data gets stuck inside the first if branch, so if I don't want to do a dry run, I have to run all of that code again. Since bash/zsh functions can't really return arrays, how do I refactor this code?

EDIT: Here is an example use-case:

> bfetch -n "HBEP"

Initiating Dry-Run...

Fetching data from:  /projects/bison/git/bison_20190827
├─Remote File -> /projects/bison/git/bison_20190827/assessment/LWR/validation/HBEP/analysis/BK370/HBEP_BK370_out.csv
├─Local File  -> /Users/mcdodj/Documents/projects/bison/assessment/LWR/validation/HBEP/analysis/BK370/HBEP_BK370_out.csv
│
├─Remote File -> /projects/bison/git/bison_20190827/assessment/LWR/validation/HBEP/analysis/BK363/HBEP_BK363_out.csv
├─Local File  -> /Users/mcdodj/Documents/projects/bison/assessment/LWR/validation/HBEP/analysis/BK363/HBEP_BK363_out.csv
│
├─Remote File -> /projects/bison/git/bison_20190827/assessment/LWR/validation/HBEP/analysis/BK365/HBEP_BK365_out.csv
├─Local File  -> /Users/mcdodj/Documents/projects/bison/assessment/LWR/validation/HBEP/analysis/BK365/HBEP_BK365_out.csv

One way is to do something like cmd='scp', then set cmd='echo scp' on a dry run, and then run $cmd instead of scp (note: this is one of the very few cases where you shouldn't double-quote the variable as "$cmd", because you need the shell to word-split it). Alternatively, set precmd='' and change it to precmd='echo' for a dry run, then run commands like $precmd scp .... — cas, Aug 30 '19 at 02:44
This will fail horribly if any of the commands you're prefixing with echo do any shell redirections, or if they rely on the effects of previous commands (e.g. changing a file that is input to another command, and many others). This is a very simple method that works only for very simple situations. — cas, Aug 30 '19 at 02:46
Alternatively, you can put the shell code that is common to both cases (dryrun=1 and dryrun=0) in one or more functions, and call them where needed from both cases (with the appropriate args, of course). — cas, Aug 30 '19 at 02:49
see also: What's the idiomatic way of returning an array in a zsh function? — cas, Aug 30 '19 at 02:51
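Here is a minimal bash sketch of the prefix idea from these comments. It uses a small wrapper function instead of an unquoted $precmd variable. The wrapper is called run here, which is a hypothetical name. In a dry run it prints the command, and otherwise it executes it. Because arguments are passed through "$@", paths containing spaces are not word-split. The user and host come from the question, but the file lists are made-up examples. The caveat above still applies: redirections and commands that depend on earlier side effects are not handled.

```shell
#!/usr/bin/env bash
# Sketch: one wrapper decides whether a filesystem-changing command is run
# or only shown. dry_run would normally be set by the -n option (getopts).
dry_run=1

# run CMD [ARGS...]: print the command (shell-quoted) in a dry run,
# otherwise execute it with its arguments intact.
run() {
    if (( dry_run )); then
        printf 'would run:'
        printf ' %q' "$@"
        printf '\n'
    else
        "$@"
    fi
}

# Hypothetical file lists; in the real script these would be built once,
# before the dry-run/real-run decision.
remote=("/projects/bison/git/bison_20190827/assessment/LWR/validation/HBEP/analysis/BK370/HBEP_BK370_out.csv")
local_paths=("$HOME/Documents/projects/bison/assessment/LWR/validation/HBEP/analysis/BK370/HBEP_BK370_out.csv")

for i in "${!remote[@]}"; do
    run scp -qp "mcdodyla@falcon1:${remote[i]}" "${local_paths[i]}"
done
```

With dry_run=0, the same loop really calls scp, so the file list is built once and only the final action changes. Like the $precmd version, this wrapper cannot cope with a redirection such as run cmd > file. In a dry run, that redirection would capture the printed text instead.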

Jim L. · Accepted Answer · 2019-08-30T22:59:53.863

I don't know zsh, but:

1) First, ensure that all your "conversational" print statements go to stderr, NOT to stdout, for example:

print -Pn "\n%S%11F%{Initiating Dry-Run%}%s%f" >&2

and many others.
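A small helper makes step 1 easier to enforce. If every informational message goes through one function that writes to stderr, nothing conversational can slip onto stdout by accident. This is a bash sketch, and the name info is hypothetical. In the zsh script, the same effect comes from appending >&2 to each print -P call, as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: every informational message is written to stderr (fd 2),
# leaving stdout free for the commands the script would run.
info() {
    printf '%s\n' "$*" >&2
}

info "Initiating dry run..."        # stderr: still visible on the terminal
info "Fetching data from: /projects/bison/git/bison_20190827"
echo "this is the only line on stdout"
```

If you ran a script built this way as ./myscript 2>/dev/null, it would print only that last line.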

2) Instead of executing your scp statements, printf them to stdout, such as:

printf 'scp -qp mcdodyla@falcon1:"%s" "%s"\n' "${NEW_RFILEP[i]}"  "${LOCAL_FILEP[i]}"

This pertains to all statements that modify the filesystem, such as cp, rm, rsync, mkdir, touch, whatever. From a brief inspection of your script, scp was the only one that jumped out at me, but you know your code better than I do.

Inspect your code again, and triple-check that all fs-modifying ("irreversible") commands are converted to printf's. You don't want to miss any.
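Here is one refinement to step 2, offered as a bash sketch. The printf above wraps each path in double quotes. If a path contained a double quote, a $ or a backtick, the printed command would be broken or unsafe. The %q format of bash's printf builtin (a non-POSIX extension) quotes each argument so that the printed line can be read back by a shell. The function name emit_fetch_commands and the paths, one of which deliberately contains a space and an apostrophe, are hypothetical. The user and host come from the question.

```shell
#!/usr/bin/env bash
# Hypothetical file lists, including awkward characters on purpose.
remote=("/projects/bison/git/bison_20190827/HBEP/it's here/a_out.csv")
local_paths=("/tmp/bison/HBEP/it's here/a_out.csv")

# Print (do not run) one scp command per file; %q makes every argument
# safe to feed back into a shell.
emit_fetch_commands() {
    local i
    for i in "${!remote[@]}"; do
        printf 'scp -qp %q %q\n' "mcdodyla@falcon1:${remote[i]}" "${local_paths[i]}"
    done
}

emit_fetch_commands
```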

Now, just to test that you've converted your script correctly, run it, and throw away stderr:

./myscript 2>/dev/null

That should display only the stdout from your script.

You must ensure that ALL of that output is valid shell syntax. All of the informational messages should have gone to stderr, and all of the "action" statements should be printf'ed to stdout. If you've still got some informational messages leaking into stdout, go back and edit your script again and ensure that the print statements are redirected >&2.

Once you've definitively proven that you've got the info messages going to stderr, and the actual work going to stdout, your conversion is done.
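The check described above can be partly automated. This bash sketch defines a hypothetical check_commands function and uses generate_commands as a stand-in for ./myscript. It runs the generator with stderr discarded and asks bash -n to parse the output without executing any of it. It then flags any line that is not an scp command. That second test is specific to this script, where scp is the only filesystem-modifying command. It is needed because bash -n alone would not catch a leaked message such as "Fetching data...", which is syntactically a valid command. (zsh -n can be used for the syntax check instead.)

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ./myscript: info to stderr, commands to stdout.
generate_commands() {
    echo "Fetching data..." >&2
    printf 'scp -qp %q %q\n' "mcdodyla@falcon1:/remote/a_out.csv" "/tmp/a_out.csv"
}

# check_commands CMD [ARGS...]: succeed only if CMD's stdout parses as shell
# and every non-empty line is an scp command.
check_commands() {
    local out
    out=$("$@" 2>/dev/null) || return 1
    # 1. Syntax: bash -n parses the text without running any of it.
    bash -n <<<"$out" || return 1
    # 2. Content: any non-empty line not starting with "scp " is probably
    #    an informational message leaking onto stdout.
    if grep -v '^scp ' <<<"$out" | grep -q .; then
        return 1
    fi
}

if check_commands generate_commands; then
    echo "stdout contains only scp commands"
else
    echo "fix the script before piping it to a shell" >&2
fi
```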

To dry-run, simply run the script:

./myscript

To actually perform the work, run the script again and pipe stdout to a shell:

./myscript | zsh -v

score 0 · Answer 2 · answered Aug 30 '19 at 21:39

I'm not good enough with scripting to know what different shells are capable of.

However, if the end result is that you have to repeat all that code, you could write your source for a macro processor like m4 and let m4 expand it into the full script.

For example, in an assembly language without arrays, where you have to cycle through addresses one at a time, you would write the routine once using macro variables for the fixed addresses and define the 'array' and a for loop in the same macro file. After m4 processed it, you would have the full repetitive source.

Maybe you could do something like that here? Or maybe it's a worthless thought. Just an idea.
