Bash check if item in list not behaving as expected

Question

I am trying to write a script which has a condition based on a variable appearing in a list:

#!/bin/bash
LIST=`ls`
function listcontains() {
    [[ $1 =~ (^|[[:space:]])$2($|[[:space:]]) ]] && return 0 || return 1
}
if [ $(listcontains "${LIST}" "multi.sh") ] ; then
     echo "Found!"
else
    echo "Failed :("
fi
$(listcontains "${LIST}" "multi.sh")
echo returned $?

There is a file named "multi.sh" in the list, so I was expecting "Found!", but the script above reports "Failed :(". The subsequent invocation returns 0.

I tried

if [ 0 -eq $(listcontains "${LIST}" "multi.sh") ] ; then

But then I get an error

./script.sh: line 9: [: 0: unary operator expected

Failed :(

What am I missing here?

This seems like a very roundabout way to implement [[ -e multi.sh ]] - what is your actual goal? — steeldriver, Feb 14 '23 at 13:12
Given an argument on the command line, compare this with a list extracted (using awk) from a small text database to identify matching records and the column in which the search value appeared, then do additional processing on a dataset based on each matching record. The script will not be simply looking for files - I just wrote the above as an easy way to replicate the issue without recreating the entire database. — symcbean, Feb 14 '23 at 13:15
Well, if you're trying to test the return value of a function, that would be just if listcontains "${LIST}" "multi.sh"; then ... - what you're doing now is testing whether the function outputs anything to stdout — steeldriver, Feb 14 '23 at 13:32
Thank you @steeldriver, if listcontains ... gives the desired result (if you add an answer below I'll accept it) although I still don't know why [ $(listcontains ... does not behave as I expected. — symcbean, Feb 14 '23 at 16:02
@symcbean tbh I think you would be better advised to use an array-based approach like that suggested in Gilles Quénot's answer — steeldriver, Feb 14 '23 at 17:16
DO NOT USE ls output for anything. ls is a tool for interactively looking at directory metadata. Any attempts at parsing ls output with code are broken. Globs are much more simple AND correct: for file in *.txt. Read http://mywiki.wooledge.org/ParsingLs — Gilles Quénot, Feb 14 '23 at 18:19
"The script will not be simply looking for files - I just wrote the above as an easy way to replicate the issue without recreating the entire database" — symcbean, Feb 15 '23 at 13:47

ilkkachu · Answer 1 · 2023-02-15T10:28:59.400

if [ $(listcontains "${LIST}" "multi.sh") ] ; then

You're not printing anything from the function, so the command substitution results in no fields after going through word splitting. That's the same as running if [ ]; then ..., and without any arguments between the [ and ], [ returns a falsy status.

Doing it with quotes, if [ "$(listcontains...)" ]; then, wouldn't help, as it'd just pass an empty string to [, like [ "" ], and that also returns a falsy status. (With a single argument between the brackets, it checks if that argument is non-empty.)

The exit status of the command substitution itself is only visible if there's no other command to run, which is the case later in the script, where $(listcontains ...) stands alone on a line.
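The three cases described above, plus the `0 -eq` attempt from the question, can be reproduced in a few lines. This is only a sketch: quiet_ok is a hypothetical stand-in for listcontains that succeeds without printing anything.

```shell
#!/bin/bash
# quiet_ok is a hypothetical stand-in for listcontains:
# it succeeds (status 0) but writes nothing to stdout.
quiet_ok() { return 0; }

# Unquoted: the empty result disappears, so this runs `[ ]`, which is false.
if [ $(quiet_ok) ]; then unquoted=true; else unquoted=false; fi

# Quoted: this runs `[ "" ]`, a non-empty test on an empty string, also false.
if [ "$(quiet_ok)" ]; then quoted=true; else quoted=false; fi

# The second attempt from the question collapses to `[ 0 -eq ]`,
# so [ reports an error instead of comparing anything.
if [ 0 -eq $(quiet_ok) ] 2>/dev/null; then second=true; else second=error; fi

# Alone on a line, the substitution's own exit status ends up in $?.
$(quiet_ok)
alone_status=$?

echo "unquoted=$unquoted quoted=$quoted second=$second alone_status=$alone_status"
```

Only the last case ever sees the function's return value; the others test the (empty) output.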

If you wanted to test it like that, with the command substitution, you'd need to print something. E.g.

listcontains() {
    if ...; then
        echo yes
    fi # else print nothing
}
if [ "$(listcontains ...)" ]; then
    echo ok
fi

(or with if [ "$(listcontains ...)" = yes ]; then ...)

But that's not necessary, as you can look at the exit status directly:

listcontains() {
    if [[ ... ]]; then
        return 0
    fi
    return 1
}
if listcontains ...; then
    echo ok
fi

And since the exit status of a function is the exit status of the last command, we can reduce the function to just:

listcontains() {
    [[ $1 =~ (^|[[:space:]])$2($|[[:space:]]) ]]
}

But you probably want to quote the $2, so that the contents will be taken literally, even if there are regex special characters in there.

E.g. the way it's above, listcontains 'foo matchxsh bar' match.sh would find a match, since the . matches any single character. Something like unmatched parentheses would likely give errors.
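To make the difference concrete, here is a small side-by-side sketch. The names contains_regex and contains_literal are made up for this comparison; the only difference between them is the quoting of $2.

```shell
#!/bin/bash
# Unquoted $2: its contents are interpreted as regex syntax.
contains_regex() {
    [[ $1 =~ (^|[[:space:]])$2($|[[:space:]]) ]]
}

# Quoted "$2": its contents are matched literally.
contains_literal() {
    [[ $1 =~ (^|[[:space:]])"$2"($|[[:space:]]) ]]
}

list='foo matchxsh bar'

# The unquoted version lets "." match the "x" in "matchxsh".
if contains_regex "$list" match.sh; then echo "regex: match"; else echo "regex: no match"; fi

# The quoted version requires a literal "." and so rejects "matchxsh".
if contains_literal "$list" match.sh; then echo "literal: match"; else echo "literal: no match"; fi

# A real "match.sh" entry is still found by the quoted version.
if contains_literal 'foo match.sh bar' match.sh; then echo "literal: match"; else echo "literal: no match"; fi
```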

You can also shorten it a bit by putting spaces at the start and end of the string before matching so you don't need to care about hitting BOL/EOL:

listcontains() {
    [[ " $1 " =~ [[:space:]]"$2"[[:space:]] ]]
}

Or more POSIXly, so that it works in e.g. Dash too:

listcontains() {
    case " $1 " in
        *[[:space:]]"$2"[[:space:]]*) return 0;;
        *) return 1;;
    esac
}
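As a quick check of the edge cases, the case-based version can be run against a newline-separated list (which is what ls or awk output typically looks like). The list contents here are invented for illustration.

```shell
#!/bin/sh
listcontains() {
    case " $1 " in
        *[[:space:]]"$2"[[:space:]]*) return 0;;
        *) return 1;;
    esac
}

# Example list, one entry per line; [[:space:]] matches newlines too.
list='alpha.sh
multi.sh
omega.sh'

# First, middle and last entries should be found;
# partial names ("multi", "mega.sh") should not.
for item in alpha.sh multi.sh omega.sh multi mega.sh; do
    if listcontains "$list" "$item"; then
        echo "$item: found"
    else
        echo "$item: missing"
    fi
done
```

The padding spaces around "$1" are what let the first and last entries match without separate start-of-string and end-of-string cases.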

score 1 · Answer 2 · answered Feb 16 '23 at 12:33

One of the easiest ways to check if an item is in a list is to convert that list into an associative array AKA "hash" (with the items being the keys, and any arbitrary value) and then test whether the item you want is an index of the array.

I typically use "0" or "1" as the value for each key. Sometimes I just test for an empty vs non-empty string. It mostly depends on what language I'm using and what it considers to be true or false.

Effectively, this is using an associative array as a simple set and testing for set membership (if a key has a value it's a member, if it doesn't, it isn't), so the value doesn't matter as long as you know what to test for and how to test for it.

No need for a regex match, just a simple test: Is the item I'm looking for a key in the associative array?

Testing for set membership is also fast. Performance doesn't matter much for a one-off test, but it matters a lot if you're testing a large number of potential set members. This is especially true in an excruciatingly slow language like shell.

Here's an example using a list contained in an indexed array.

$ items=(item1 item2 item3 item4)
$ declare -A itemhash
$ for i in "${items[@]}" ; do itemhash[$i]=1 ; done

This is what the indexed array and associative array currently contain:

$ declare -p items itemhash
declare -a items=([0]="item1" [1]="item2" [2]="item3" [3]="item4")
declare -A itemhash=([item1]="1" [item2]="1" [item3]="1" [item4]="1" )

OK, the hash (associative array) is populated, now we can test if it contains a particular item:

$ if [ "${itemhash[item1]}" == 1 ] ; then echo in array ; else echo not in array ; fi
in array
$ if [ "${itemhash[item5]}" == 1 ] ; then echo in array ; else echo not in array ; fi
not in array

This method works with pretty much any list, no matter the origin of the items (an indexed array, a list of filenames, the output of a database query, whatever), and it doesn't really matter how you populate the hash - the important thing is that the keys to the hash should be the names of your items, and the values for each key should be something you can easily test for.
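For a list that comes from command output rather than an array, the same idea might look like the sketch below. It assumes bash 4.0 or later and one item per line; produce_items is a hypothetical stand-in for the real command (such as an awk query), and is_member is a made-up helper that tests whether a key exists regardless of the value stored under it.

```shell
#!/bin/bash
# Requires bash 4.0+ for associative arrays.
declare -A members=()

# Hypothetical stand-in for real command output; one item per line.
produce_items() { printf '%s\n' item1 item2 'item 3' item4; }

# Read one item per line; IFS= and -r keep each line exactly as written.
while IFS= read -r item; do
    [[ -n $item ]] || continue   # an empty string is not a valid key
    members[$item]=1
done < <(produce_items)

# Hypothetical helper: succeeds if the key exists, whatever value it holds.
is_member() { [[ -n ${members[$1]+x} ]]; }

for candidate in item2 'item 3' item5; do
    if is_member "$candidate"; then
        echo "$candidate: in set"
    else
        echo "$candidate: not in set"
    fi
done
```

Because each line becomes one key, items containing spaces are handled without any regex or word-splitting concerns; items containing newlines would need a NUL-separated variant instead.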

You shouldn't have used an example that involved filenames and ls, because that has been a huge distraction. But here's another example, testing for the filename 'multi.sh' in a list of filenames in the current directory.

First, when multi.sh doesn't exist in the current directory:

$ declare -A foo
$ while IFS= read -d '' -r f; do foo[$f]=1 ; done < <(printf '%s\0' *)
$ if [ "${foo[multi.sh]}" == 1 ] ; then echo in array ; else echo not in array ; fi
not in array

Then create multi.sh and try again:

$ unset foo ; declare -A foo
$ touch multi.sh
$ while IFS= read -d '' -r f; do foo[$f]=1 ; done < <(printf '%s\0' *)
$ if [ "${foo[multi.sh]}" == 1 ] ; then echo in array ; else echo not in array ; fi
in array

NOTE: I've used printf '%s\0' * rather than ls because parsing the output of ls is a bad idea. Reading a NUL-separated list of filenames will work with any valid filenames, even those containing annoying characters like newlines.

BTW, recent versions of GNU ls have a --zero option for NUL-separated output - I'm still not inclined to use it, I'd rather use printf ... * as above or find.
