Read paths on stdin and spawn a new interactive shell for each line

Question

Consider a command that searches the entire home directory for a file or directory with the wrong permissions:

$ find $HOME -perm 777

This is just an example; the command might be listing broken symlinks:

$ find $HOME -xtype l

or listing lengthy symbolic links:

$ symlinks -s -r $HOME

or any number of other expensive commands that send newline-delimited paths to stdout.

Now, I could gather the results in a pager like this:

$ find $HOME -perm 777 | less

and then cd to the relevant directories in a different virtual terminal. But I'd rather have a script that opens a new interactive shell for each line of output, like this:

$ find $HOME -perm 777 | visit-paths.sh

This way I can e.g. inspect each file or directory, check the timestamp, decide whether I need to change the permissions or delete files, etc.

It's doable with a bash script that reads paths either from a file or from stdin, like so:

#! /usr/bin/env bash

set -e

declare -A ALREADY_SEEN
while IFS='' read -u 10 -r line || test -n "$line"
do
    if test -d "$line"
    then
        VISIT_DIR="$line"
    elif test -f "$line"
    then
        VISIT_DIR="$(dirname "$line")"
    else
        printf "Warning: path does not exist: '%s'\n" "$line" >&2
        continue
    fi
    if test "${ALREADY_SEEN[$VISIT_DIR]}" != '1'
    then
        ( cd "$VISIT_DIR" && $SHELL -i </dev/tty )
        ALREADY_SEEN[${VISIT_DIR}]=1
        continue
    else
        # Already visited, skip it.
        continue
    fi
done 10< "${*:-/dev/stdin}"

This has some good points, such as:

The script opens a new shell as soon as a new line of output appears on stdin. This means I don't have to wait for the slow command to finish entirely before I start doing things.
The slow command keeps running in the background while I am doing things in the newly spawned shell, so the next path is potentially ready to visit by the time I am done.
I can break out of the loop early if necessary with e.g. false; exit or just Ctrl-C Ctrl-D.
The script handles both filenames and directories.
The script avoids visiting the same directory twice. (Thanks to @MichaelHomer for explaining how to do this with associative arrays.)

However, there is a problem with this script:

The whole pipeline exits if the last command run in the spawned shell returns a non-zero status (because of set -e). This is useful for exiting early, but in general it means checking $? before leaving each shell to prevent an accidental early exit.

To try addressing this issue, I wrote a Python script:

#! /usr/bin/env python3

import argparse
import logging
import os
import subprocess
import sys

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Visit files from file or stdin.'
    )
    parser.add_argument(
        '-v',
        '--verbose',
        help='More verbose logging',
        dest="loglevel",
        default=logging.WARNING,
        action="store_const",
        const=logging.INFO,
    )
    parser.add_argument(
        '-d',
        '--debug',
        help='Enable debugging logs',
        action="store_const",
        dest="loglevel",
        const=logging.DEBUG,
    )
    parser.add_argument(
        'infile',
        nargs='?',
        type=argparse.FileType('r'),
        default=sys.stdin,
        help='Input file (or stdin)',
    )
    args = parser.parse_args()
    logging.basicConfig(level=args.loglevel)
    shell_bin = os.environ['SHELL']
    logging.debug("SHELL = '{}'".format(shell_bin))
    already_visited = set()
    n_visits = 0
    n_skipped = 0
    for i, line in enumerate(args.infile):
        visit_dir = None
        candidate = line.rstrip()
        logging.debug("candidate = '{}'".format(candidate))
        if os.path.isdir(candidate):
            visit_dir = candidate
        elif os.path.isfile(candidate):
            visit_dir = os.path.dirname(candidate)
        else:
            logging.warning("does not exist: '{}'".format(candidate))
            n_skipped +=1
            continue
        if visit_dir is not None:
            real_dir = os.path.realpath(visit_dir)
        else:
            # Should not happen.
            logging.warning("could not determine directory for path: '{}'".format(candidate))
            n_skipped +=1
            continue
        if visit_dir in already_visited:
            logging.info("already visited: '{}'".format(visit_dir))
            n_skipped +=1
            continue
        elif real_dir in already_visited:
            logging.info("already visited: '{}' -> '{}'".format(visit_dir, real_dir))
            n_skipped +=1
            continue
        if i != 0:
            try :
                response = input("#{}. Continue? (y/n) ".format(n_visits + 1))
            except EOFError:
                sys.stdout.write('\n')
                break
            if response in ["n", "no"]:
                break
        logging.info("spawning '{}' in '{}'".format(shell_bin, visit_dir))
        run_args = [shell_bin, "-i"]
        subprocess.call(run_args, cwd=visit_dir, stdin=open('/dev/tty'))
        already_visited.add(visit_dir)
        already_visited.add(real_dir)
        n_visits +=1

    logging.info("# paths received: {}".format(i + 1))
    logging.info("distinct directories visited: {}".format(n_visits))
    logging.info("paths skipped: {}".format(n_skipped))

However, I'm having some issues with the replies to the Continue? (y/n) prompt being passed to the shell that is spawned, causing errors like y: command not found. I suspect the problem is on this line:

subprocess.call(run_args, cwd=visit_dir, stdin=open('/dev/tty'))

Do I need to do something different with the stdin when using subprocess.call?

Alternatively, is there a widely available tool that makes both scripts redundant that I just haven't heard of?

Bash 4.0 and later has associative arrays; would that meet your requirements? — Michael Homer, Mar 30 '19 at 07:02
@MichaelHomer For avoiding duplicates, I would need to check for array membership, which as far as I know isn't any easier with associative arrays than it is with regular Bash arrays. — Nathaniel M. Beaver, May 22 '19 at 02:17
Of course it’s easier, you can just check directly whether the key is set. Sets are just associative arrays of keys to irrelevant singleton values. — Michael Homer, May 22 '19 at 02:21
@MichaelHomer Gosh, you're right! I never thought of it that way, thanks. — Nathaniel M. Beaver, May 22 '19 at 02:44

LL3 · Accepted Answer · 2019-05-22T18:47:29.987

Your Bash script seems to do everything as intended; it only needs a || break after the subshell that spawns the interactive shell. That way, when you leave that interactive shell with an induced error, such as Ctrl+C immediately followed by Ctrl+D, or an exit 1 command, you exit early from the whole pipeline.

As you noted, that will of course also make it exit when the last command you ran in the interactive shell fails with an (unwanted) error. You can easily circumvent that either by issuing a simple : as the last command before any normal exit or, perhaps better, by accepting Ctrl+C as the only way to quit the entire pipeline, that is, by using || { [ $? -eq 130 ] && break; } (instead of just || break) after the subshell that spawns the interactive shell.
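The same status check carries over to the Python version of the script. The following is only a minimal sketch, assuming the spawned shell reports Ctrl+C followed by Ctrl+D as exit status 130 (128 plus the SIGINT signal number), as described above. The helper names wants_to_quit and visit are hypothetical and not part of the original script, and the stdin handling of the spawned shell is left out for brevity:

```python
import signal
import subprocess

# Shell convention: 128 + signal number; SIGINT is 2 on Linux, giving 130.
SIGINT_STATUS = 128 + signal.SIGINT


def wants_to_quit(returncode):
    """Return True only for the status that Ctrl+C, Ctrl+D leaves behind."""
    return returncode == SIGINT_STATUS


def visit(directory, shell_cmd):
    """Run shell_cmd in directory; return False if the loop should stop."""
    status = subprocess.call(shell_cmd, cwd=directory)
    return not wants_to_quit(status)
```

In the loop over paths this might be used as: if not visit(visit_dir, [shell_bin, "-i"]): break. Any other non-zero status, such as a failed last command, is ignored and the loop continues.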

As a much simpler approach that doesn't require associative arrays at all, you might just pipe the output of find through uniq, as in:

find . -perm 777 -printf '%h\n' | uniq | \
(
while IFS= read -r path ; do
    (cd "${path}" && PS1="[*** REVISE \\w]: " bash --norc -i </dev/tty) || \
        { [ $? -eq 130 ] && break; }
done
)

Of course, that requires a source of names that emits duplicates consecutively (when there are any), as find does. Alternatively, you might reorder them with sort -u instead of uniq, but then you would have to wait for sort to finish before the first interactive shell spawns, which you seem not to want.
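If the source does not emit duplicates consecutively, a small filter that remembers every path it has already passed on can drop all duplicates without waiting for the input to end, at the cost of keeping the seen paths in memory. A sketch in Python, with the hypothetical function name first_occurrences:

```python
def first_occurrences(lines):
    """Yield each distinct line the first time it appears, in input order."""
    seen = set()
    for line in lines:
        path = line.rstrip("\n")
        if path not in seen:
            seen.add(path)
            yield path
```

Driven by a loop such as for path in first_occurrences(sys.stdin): print(path, flush=True), this could take the place of uniq in the pipeline above. The flush matters, since otherwise the output may be buffered and the next stage would not see each path as soon as it is found.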

Let's then see the Python script approach.

You don't say how you are invoking it, but if you are using it through a pipe as in:

names-source-cmd | visit-paths.py

then you're using stdin for two conflicting purposes: input for names, and input for your Python's input() function.

You might instead invoke your Python script like this:

names-source-cmd | visit-paths.py /dev/fd/3 3<&0 < /dev/tty

Note the redirections done in the above example: we first redirect the just-created pipe (which will be stdin in that part of the pipeline) to the arbitrary file-descriptor 3 and then reopen stdin onto the tty so that the Python script can use it for its input() function. File-descriptor 3 is then used as source of names via your Python script's argument.
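For readers more at home in Python than in shell redirections, the same two steps can be sketched with os.dup and os.dup2. The function name reopen_fd is hypothetical. The shell redirections above correspond roughly to calling it with descriptor 0 and '/dev/tty', before anything has been read from stdin:

```python
import os


def reopen_fd(fd, replacement_path):
    """Point fd at replacement_path; return a new descriptor for what fd was.

    Mirrors the shell redirections `3<&0 < /dev/tty`, where fd is 0 and the
    returned descriptor plays the role of 3.
    """
    saved_fd = os.dup(fd)  # like 3<&0: a second descriptor for the pipe
    new_fd = os.open(replacement_path, os.O_RDONLY)
    os.dup2(new_fd, fd)  # like < /dev/tty: fd now reads from the new file
    os.close(new_fd)  # fd keeps its own reference; the extra one can go
    return saved_fd
```

Unlike the shell version, the saved descriptor is whichever number the system hands out, not necessarily 3. The piped names stay readable through it, for instance via os.fdopen(saved_fd), while the original descriptor now refers to the terminal.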

You might also consider the following proof-of-concept:

find | \
(
while IFS= read -ru 3 name; do
    echo "name is ${name}"
    read -p "Continue ? " && [ "$REPLY" = y ] || break
done 3<&0 < /dev/tty
)

The above example uses the same redirection trick. You might therefore use it for your own Bash script, the one that caches seen paths in associative arrays and spawns an interactive shell on each newly seen path.

The main problem with this approach is that exiting early is difficult; I think I'd have to kill the process from another shell somewhere. It's also not in the form of a standalone script and only eliminates consecutive duplicates, but those are more minor issues. — Nathaniel M. Beaver, May 22 '19 at 02:22
@NathanielM.Beaver It's easy to exit early: just add a || break after the cd - bit. Then you can use false; exit or Ctrl+C Ctrl+D or exit 1 for exiting with an error from the spawned interactive shell. Anyway, I've updated my answer also as per your updated question and about everything you pointed out — LL3, May 22 '19 at 16:20
Thanks, that does the job. Can you recommend a resource to understand how the /dev/fd/3 3<&0 < /dev/tty part works? My knowledge of file descriptors and redirection is rudimentary. — Nathaniel M. Beaver, May 23 '19 at 22:03
@NathanielM.Beaver Not really, sorry. You might try delving into the forest of Q&A about [file-descriptors] first, as they are a fundamental Unix concept, and [io-redirection] afterwards, as redirections are operations that shells perform on file descriptors. Perhaps after having done so, you might try asking yet another question to see if someone comes up with a good answer that says it all in one. I expect a good and comprehensive answer would not be short. In this specific case of yours there's also a pipeline involved, which builds on top of those concepts and may expand the whole topic quite a bit. — LL3, May 23 '19 at 23:08

Nathaniel M. Beaver · Answer 2 · 2020-05-28T13:49:38.977

Just as a follow-up, the Python script can be fixed like this:

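# Place after parser.parse_args(), before the loop over args.infile.
if args.infile is sys.stdin:
    # Keep reading paths from the pipe, but let input() read from the terminal.
    old_stdin = sys.stdin
    sys.stdin = open('/dev/tty')
    args.infile = old_stdin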

https://stackoverflow.com/questions/7141331/pipe-input-to-python-program-and-later-get-input-from-user

https://stackoverflow.com/questions/8034595/python-raw-input-following-sys-stdin-read-throws-eoferror

https://stackoverflow.com/questions/40270252/eoferror-when-using-input-after-using-sys-stdin-buffer-read

https://bugs.python.org/issue512981

https://bugs.python.org/issue29396
