1

Shell Script in Question

Let me explain what I am trying to do with an example. Let's say I have 100 .torrent files in a directory. Two of them will download xxx.epub and yyy.epub respectively if added to a BitTorrent client, but I don't know which two out of the 100.

So what my script does is: use find to go through all .torrent files in pwd and pass each one, as it comes by, to transmission-show, which parses the .torrent file and outputs its metadata in human-readable format. We then use awk to get the name of the file the torrent will download and check it against list.txt, which holds the file names we are looking for, i.e. xxx.epub and yyy.epub.

File: findtor-array.sh

#! /bin/bash
#
# Search .torrent file based on 'Name' field.
#
# USAGE:
# cd ~/myspace # location of .torrent files
# Run `findtor ~/list.txt` (if `findtor.sh` is placed in `~/bin` or `~/.local/bin`)

# Turn the list of file names from ~/list.txt (or any file passed as argument) into an array
readarray -t FILE_NAMES_TO_SEARCH < "$1"

# For each file name from the list...
for FILE_NAME in "${FILE_NAMES_TO_SEARCH[@]}"
do
    # In `pwd` and 1 directory-level under, look for .torrent files and search them for the file name
    find . -maxdepth 2 -name '*.torrent' -type f -exec bash -c "transmission-show \"\$1\" | awk '/^Name\: / || /^File\: /' | awk -F ': ' '\$2 ~ \"$FILE_NAME\" {getline; print}'" _ {} \; >> ~/torrents.txt

    # The `transmission-show` command included in `find`, on its own, for clarity:
    # transmission-show xxx.torrent | awk '/^Name: / || /^File: /' | awk -F ': ' '$2 ~ "SEARCH STRING" {getline; print}'
done
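For reference, the two awk stages can be exercised on their own with canned input. This is only a sketch: show_sample is a hypothetical stand-in for transmission-show, and its lines are assumed from the "Name: " and "File: " prefixes the script matches, not captured from a real run.

```shell
# Assumed sample, shaped after the "Name: " / "File: " prefixes that the
# script matches; not captured from a real transmission-show run.
show_sample() {
    printf '%s\n' 'Name: xxx.epub' 'File: ./books/xxx.torrent' 'Hash: (elided)'
}

# Stage 1 keeps the Name/File lines; stage 2 prints the line that follows
# a matching Name line, i.e. the File line.
filter_sample() {
    show_sample |
        awk '/^Name: / || /^File: /' |
        awk -F ': ' '$2 ~ "xxx.epub" {getline; print}'
}

filter_sample    # prints: File: ./books/xxx.torrent
```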

I think the process is simple and I am doing it right (except that there are no checks, I know). But somehow the whole task seems to be too much for the script, because some time after I start it, it begins throwing these errors continuously until I Ctrl + C it:

_: -c: line 0: unexpected EOF while looking for matching `"'
_: -c: line 1: syntax error: unexpected end of file

Are these "scaling" issues? What am I missing and what can I do to fix it?

its_me
  • 13,959
  • Do you have filenames in the file that you give to your script that contains "? Are they regular expressions? You are currently injecting the contents of the file as code into your awk program. It would be better to pass the values properly, as described in https://unix.stackexchange.com/questions/120788/pass-shell-variable-as-a-pattern-to-awk (I would additionally switch the order of the find and the for loop so that find calls a script that loops, rather than the other way around). – Kusalananda Apr 04 '20 at 07:34
  • @Kusalananda No; only some of the file names contain single quotes. Thanks for your suggestion, I will look into it. – its_me Apr 04 '20 at 09:46
  • @its_me Ok, if you have a single quote in a filename, then I'm not surprised that your code breaks, as the name is being inserted into the awk code and will end the current single-quoted string where it occurs. Consider passing the string properly with awk -v instead. – Kusalananda Apr 04 '20 at 09:56

3 Answers

2

FILE_NAME is being interpolated directly into the command string that the -exec option of your find command gives to bash -c. This breaks if FILE_NAME contains quotes or shell code, and it even allows arbitrary code to be executed. For example, in this particular case the input file could contain the line '; echo "run commands";'

Instead, pass the loop var to bash -c as a positional parameter. e.g.:

find . -maxdepth 2 -name '*.torrent' -type f -exec sh -c '
transmission-show "$2" |
awk -v search="$1" '\''/^Name: / {name = substr($0,7)} /^File: / && name ~ search {print; exit}'\' \
_ "$FILE_NAME" {} \;
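The difference can be reproduced without any torrent files. The following is a minimal sketch, not the script itself: term, unsafe_match and safe_match are hypothetical names, and printf stands in for transmission-show. The first function pastes the term into the code string, so the single quote should end the awk program early and bash should report an "unexpected EOF" syntax error like the one in the question. The second keeps the code string constant and hands the term over as data.

```shell
# Hypothetical search term containing a single quote.
term="it's.epub"

# Unsafe: $term is expanded by the outer shell and becomes part of the code
# that the inner bash has to parse.
unsafe_match() {
    bash -c "printf 'Name: %s\n' \"\$1\" | awk -F ': ' '\$2 ~ \"$term\" {print}'" _ "$term"
}

# Safer: the code string is constant; $term arrives as data (the inner $2)
# and reaches awk through -v.
safe_match() {
    bash -c 'printf "Name: %s\n" "$1" | awk -F ": " -v s="$2" "\$2 ~ s {print}"' _ "$term" "$term"
}

unsafe_match || echo "unsafe_match failed with status $?"
safe_match
```

Note that awk still treats the value as a regular expression here, and -v processes backslash escapes, so a term with regex metacharacters or backslashes may match differently than expected.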

Also, it seems inefficient to loop over all search terms for each file. Consider looping over files and searching with grep -f file:

find . -maxdepth 2 -name '*.torrent' -type f -exec sh -c '
file=$1
shift
if transmission-show "$file" | head -n 1 | cut -d" " -f2- | grep -q "$@"; then
    printf "%s\n" "$file"
fi' _ {} "$@" \;

or without find:

for file in *.torrent */*.torrent; do
    if transmission-show "$file" | head -n 1 | cut -d' ' -f2- | grep -q "$@"; then
        printf '%s\n' "$file"
    fi
done
  • The above simply passes all arguments to grep, so usage would be findtor -f ~/list.txt to take patterns from list, -F for fixed strings, -e expression, etc.
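Put together as a function, the single-pass idea might look like the sketch below. It differs from the snippets above in that it hardcodes grep -F -f (fixed strings read from the list) instead of forwarding "$@". SHOW_CMD, find_torrents and the ft_ variables are hypothetical names; SHOW_CMD exists only so the loop can be tried with a stub in place of transmission-show.

```shell
# SHOW_CMD is a hypothetical hook for trying the loop without real torrents;
# it defaults to transmission-show.
SHOW_CMD=${SHOW_CMD:-transmission-show}

# find_torrents LIST: print each .torrent in the current directory and one
# level below whose "Name:" value contains a line of LIST as a fixed string.
find_torrents() {
    for ft_file in ./*.torrent ./*/*.torrent; do
        [ -f "$ft_file" ] || continue          # an unmatched glob stays literal
        # Keep only the text after "Name: " on the first such line.
        ft_name=$("$SHOW_CMD" "$ft_file" 2>/dev/null | sed -n 's/^Name: //p' | head -n 1)
        [ -n "$ft_name" ] || continue
        # -F fixed strings, -f patterns from LIST, -q exit status only.
        if printf '%s\n' "$ft_name" | grep -qF -f "$1"; then
            printf '%s\n' "$ft_file"
        fi
    done
}
```

Usage would be along the lines of find_torrents ~/list.txt > ~/torrents-found.txt from the directory holding the .torrent files. Each torrent is parsed once regardless of how many names are in the list. An empty line in the list matches every name, so it is worth stripping blank lines first.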
guest
  • 2,134
  • What do you think of my answer? I don't know much bash or shell-scripting, so I had to stick with what I know or find. – its_me Apr 04 '20 at 12:20
  • @its_me: .... well, what do you think of the gracious help that you were just offered as a response to your request ? – Cbhihe Apr 04 '20 at 17:41
2

Based on suggestions from @Kusalananda, the answers (by @guest and @Jetchisel), and this detailed answer by Kevin, I came up with this:

#! /bin/bash
#
# Search for 'Name' field match in torrent metadata for all .torrent files in
# current directory and directories 1-level below.
#
# USAGE e.g.:
# cd ~/torrent-files # location of .torrent files
# Run `~/findtor.sh ~/list.txt`

# Get one file name at a time ($FILE_NAME_TO_SEARCH) to search for from list.txt
# provided as argument to this script.
while IFS= read -r FILE_NAME_TO_SEARCH; do

    # `find` .torrent files in current directory and directories 1-level under
    # it. `-print0` to print the full file name on the standard output, followed
    # by a null character (instead of the newline character that `-print` uses).
    #
    # While that's happening, we'll again use read, this time to pass one
    # .torrent file at a time (from output of `find`) to `transmission-show`
    # for the latter to output the metadata of the torrent file, followed by
    # `awk` commands to look for the file name match ($FILE_NAME_TO_SEARCH) from
    # list.txt.
    find . -maxdepth 2 -name '*.torrent' -type f -print0 |
        while IFS= read -r -d '' TORRENT_NAME; do
            transmission-show "$TORRENT_NAME" | awk '/^Name: / || /^File: /' | awk -F ': ' -v search_string="$FILE_NAME_TO_SEARCH" '$2 ~ search_string {getline; print}';
        done >> ~/torrents-found.txt

done < "$1"

I just ran this and so far it seems to be working great. So a big thank you to everyone involved!

While I did my best, any fixes and further suggestions are welcome.

its_me
  • 13,959
  • Upvoted this, glad you got it working. – Jetchisel Apr 04 '20 at 22:19
  • Hi, this is looking OK. +1 for finding your own solution. I think it could be made more efficient by not looping over the search terms. At the moment the script queries each torrent file against each search term (i.e. 100 search terms would mean that each file is queried 100 times). Instead, as you are looping over each torrent file, it should be possible to extract the "Name" information from transmission-show, then check that against all queries with grep -f ~/list – if that matches, print the filename – guest Apr 04 '20 at 23:30
  • @guest Yes, I am actually working on that part suggested in your answer. Will update my answer once I am done. Thank you! – its_me Apr 05 '20 at 00:00
0

I would write it like this.

#!/usr/bin/env bash

pattern_file="$1"

while IFS= read -r -d '' file; do
    transmission-show "$file" | awk .... "$pattern_file"   ##: Figure out how to do the awk with a file rather than looping through an array.
done < <(find . -maxdepth 2 -name '*.torrent' -type f -print0)

That should avoid the quoting hell :-)
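One way the awk step could read the list as a file is the two-input idiom sketched below. This is an assumption about what the placeholder might become, not a tested part of the script: match_name is a hypothetical helper, and it does literal substring matching with index() rather than the regex match used elsewhere on this page.

```shell
# match_name LIST: read transmission-show-style text on stdin and succeed
# if the "Name:" value contains any non-empty line of LIST as a substring.
match_name() {
    awk '
        NR == FNR { if ($0 != "") terms[$0]; next }    # first input: the list
        /^Name: / {
            name = substr($0, 7)                       # text after "Name: "
            for (t in terms)
                if (index(name, t)) { found = 1; exit }
        }
        END { exit !found }                            # status 0 only on a match
    ' "$1" -
}
```

Inside the loop it could then be used as transmission-show "$file" | match_name "$pattern_file" && printf '%s\n' "$file". The NR == FNR test assumes the list is not empty; with an empty list the metadata lines are read as terms and nothing matches.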

Ok maybe the nullglob is not needed.

EDIT:

Try this find command in your original script.

find . -maxdepth 2 -name '*.torrent' -type f -exec bash -c 'transmission-show "$2" | awk "/^Name: / || /^File: /" | awk -F ": " -v search="$1" "\$2 ~ search {getline; print}"' _ "$FILE_NAME" {} \; >> ~/torrents.txt
Jetchisel
  • 1,264
  • Thank you, I will test it out right away. Meanwhile, isn't the code supposed to end with shopt -u extglob nullglob in this case? – its_me Apr 04 '20 at 05:26
  • help shopt should tell you which option does what. Well you can disable it but if the script ends then the option ends too. – Jetchisel Apr 04 '20 at 05:29
  • shopt --help is where I am coming from with that comment. Just wanted to be sure. Also ShellCheck says if [[ $file == "$pattern" ]]; then would be ideal, what do you think? ($pattern in quotes: https://github.com/koalaman/shellcheck/wiki/SC2053) Unnecessary? – its_me Apr 04 '20 at 05:40
  • the pattern expands to @(foo|bar|baz|more|....) since it is an extglob feature , well if shellcheck says to quote then sure. Basically I will quote everything until it does not work, not the other way around. – Jetchisel Apr 04 '20 at 05:44
  • I don't think the code does what I am doing. transmission-show is supposed to get one .torrent file at a time from find. You are instead giving it one file name at a time from the list.txt which is meant to be searched for in the output of transmission-show, not the other way. Can you please take another look. – its_me Apr 04 '20 at 05:51
  • If you add set -x after the shebang and put an echo before the transmission command, that should tell you what it does. You're looping over the elements of the array which came from list.txt; that is what my code is doing also. Only it does a while loop while you're doing a for loop. – Jetchisel Apr 04 '20 at 05:54
  • And I have no idea what the awk code is doing in that script. – Jetchisel Apr 04 '20 at 05:57
  • (1) transmission-show parses each .torrent file passed to it by find and outputs all the metadata it can read from the file in a human-readable format. (2) Then we have a list.txt in which we have a list of file names that some of the torrent files in the directory will download when added to a client; but we don't know which. We use awk to search for each file name in list.txt against the output of transmission-show for all .torrent files in the directory. (3) I set the checks you suggested; the set -x and echo. Confirms my suspicion. Let me know if you still don't understand. – its_me Apr 04 '20 at 06:04
  • E.g. Let's say we have 100 .torrent files in a directory. One of the .torrent files downloads a file named xxx.epub. Now, I don't know which one of those 100 .torrent files will get me the xxx.epub which is what I am out to find out. – its_me Apr 04 '20 at 06:06
  • transmission-show command without the quoting hell: transmission-show xxx.torrent | awk '/^Name: / || /^File: /' | awk -F ': ' '$2 ~ "SEARCH STRING" {getline; print}' -- What find does is provide the xxx.torrent (1000s of them in directories, one at a time); list.txt provides the SEARCH STRING (100s of them, provided one at a time from the array we created). – its_me Apr 04 '20 at 06:14