
I have two variables, passed as command-line arguments to a bash function, that hold comma-separated lists of filename extensions to exclude and include.

excl="el"
excl="el,htm"
incl="texi,org"

I want to use excl and incl to build the --exclude and --include options for grep, as in these examples:

excl="el,htm"
incl="texi,org"
grep -hir --exclude=\*.{el,htm} --include=\*.{org,texi} "$@"

excl="el"
incl="texi,org"
grep -hir --exclude=*.el --include=*.{org,texi} "$@"

Pietru
  • You can pass what you want to exclude and include by specifying it in excl and incl. To reflect excl="el,htm", we would then want --exclude=\*.{el,htm}. – Pietru Jul 24 '21 at 12:25
  • Specifying both include and exclude doesn't make sense here, as it appears both are being limited to filename "extensions" by prepending *. (e.g. a filename excluded by matching *.el would never be included by matching *.org). – rowboat Jul 24 '21 at 14:14
  • I can see your reasoning that it has to be one or the other. – Pietru Jul 24 '21 at 14:19

2 Answers


You really need to use arrays to keep track of strings that are separate. Using a single string to hold multiple values makes it impossible to use the code with suffixes that have the delimiter embedded (for example the suffix ,v, which CVS and RCS files have).
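As a small illustration of that problem (a sketch, assuming the comma-separated form from the question and a user who wants to exclude both el and RCS's ,v), splitting the string "el,,v" on commas cannot give back the ,v suffix. It yields an empty field and a bare v instead:

```shell
#!/usr/bin/env bash
# Hypothetical input: the user wants to exclude "el" and ",v".
excl='el,,v'

# Split on commas, the way a comma-delimited interface would have to.
IFS=, read -r -a parts <<< "$excl"

# Show each resulting field between angle brackets: <el>, <>, <v>
printf '<%s>\n' "${parts[@]}"
```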

exclude=( .el .htm )
include=( .texi .org )

opts=( -h -i -r )

# grep's --exclude/--include take a glob, so prepend * to each suffix
for ext in "${exclude[@]}"; do
    opts+=( --exclude="*$ext" )
done
for ext in "${include[@]}"; do
    opts+=( --include="*$ext" )
done

grep "${opts[@]}" "$@"

This stores your filename suffixes in two arrays, exclude and include. It then iterates over the elements of both arrays, adding the appropriate options to a third array called opts. This third array is then used in the call to grep.

The double quotes around an array expansion such as "${opts[@]}" make each array element expand to exactly one word, so no element is further split on whitespace or globbed by the shell.
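A quick way to see the difference is to count how many arguments a function receives. This is a minimal sketch; arr and count_args are illustrative names, not part of the answer. Globbing is switched off for the unquoted case only so that the result does not depend on which files happen to exist in the current directory:

```shell
#!/usr/bin/env bash
# Example array: one element contains a space, the other a glob character.
arr=( 'my file' '*.el' )

count_args () { echo "$#"; }   # prints how many arguments it was given

count_args "${arr[@]}"   # quoted: one argument per element, prints 2

set -f                   # disable globbing so the unquoted result is deterministic
count_args ${arr[@]}     # unquoted: 'my file' is split in two, prints 3
set +f
```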


As a function that takes the names of the arrays of included and excluded filename suffixes as two separate arguments:

call_grep () {
    local -n include="$1"   # nameref to the caller's array of suffixes to include (bash 4.3+)
    local -n exclude="$2"   # nameref to the caller's array of suffixes to exclude
    shift 2

    local opts=( -h -i -r )
    local ext

    for ext in "${exclude[@]}"; do
        opts+=( --exclude="*$ext" )
    done
    for ext in "${include[@]}"; do
        opts+=( --include="*$ext" )
    done

    grep "${opts[@]}" "$@"
}

The main part of the script:

excl=( .el .htm )
incl=( .texi .org )

call_grep incl excl more arguments here

This sets up the function call_grep to take the names of two arrays. The first array holds the filename suffixes to include and the second holds the suffixes to exclude. The function receives the names of these arrays and uses them to set up two local name-reference variables. The third and subsequent arguments are passed to grep unchanged.
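Because namerefs make it hard to see what actually reaches grep, a dry-run variant can help when debugging. build_opts below is a hypothetical helper, not part of the answer: it uses the same local -n technique as call_grep (bash 4.3 or later) but prints the option list instead of running grep. One caveat: the caller's array must not have the same name as the nameref inside the function (here inc_ref and exc_ref), otherwise the reference points back at itself rather than at the caller's array.

```shell
#!/usr/bin/env bash
# build_opts (hypothetical): same option building as call_grep, printed
# one option per line instead of being passed to grep.
build_opts () {
    local -n inc_ref="$1"   # inc_ref aliases the caller's array named by $1
    local -n exc_ref="$2"   # exc_ref aliases the caller's array named by $2
    local ext
    local opts=( -h -i -r )

    for ext in "${exc_ref[@]}"; do
        opts+=( --exclude="*$ext" )
    done
    for ext in "${inc_ref[@]}"; do
        opts+=( --include="*$ext" )
    done

    printf '%s\n' "${opts[@]}"
}

excl=( .el .htm )
incl=( .texi .org )
build_opts incl excl
```

With the arrays above, the last line prints -h, -i, -r, --exclude=*.el, --exclude=*.htm, --include=*.texi and --include=*.org, one per line.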


Again, but using real command line parsing in call_grep:

call_grep () {
    local OPTIND=1          # reset getopts state for every call
    local ext opt
    local opts=( -h -i -r )

    while getopts 'i:e:' opt; do
        case $opt in
            i)
                local -n include="$OPTARG"
                for ext in "${include[@]}"; do
                    opts+=( --include="*$ext" )
                done
                ;;
            e)
                local -n exclude="$OPTARG"
                for ext in "${exclude[@]}"; do
                    opts+=( --exclude="*$ext" )
                done
                ;;
            *)
                echo 'Error in option parsing' >&2
                return 1    # return, not exit: exit would terminate the calling shell
                ;;
        esac
    done

    shift "$(( OPTIND - 1 ))"

    grep "${opts[@]}" "$@"
}

The function now takes a -i and a -e argument (both are optional). The option argument to each should be the name of an array containing filename suffixes to include or exclude.

You would use this as

excl=( .el .htm )
incl=( .texi .org )

call_grep -i incl -e excl -- more arguments here

Use -- to mark where the function's own options end and the arguments meant for grep begin. It is required whenever those grep arguments start with a dash, such as further grep options.
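To see why the -- matters, here is a dry-run sketch of the same getopts loop. parse_demo is a hypothetical name; for brevity it takes suffixes directly rather than array names, and it prints the collected options and the leftover arguments instead of calling grep. getopts consumes -- and stops there, so everything after it is left for grep; without it, a grep option such as -w is read as an unknown option of the function.

```shell
#!/usr/bin/env bash
# parse_demo (hypothetical): collect -i/-e suffixes, then show what would
# be passed on to grep.
parse_demo () {
    local OPTIND=1 opt
    local opts=()
    while getopts 'i:e:' opt; do
        case $opt in
            i) opts+=( --include="*$OPTARG" ) ;;
            e) opts+=( --exclude="*$OPTARG" ) ;;
            *) return 1 ;;   # unknown option
        esac
    done
    shift "$(( OPTIND - 1 ))"
    printf 'opt: %s\n' "${opts[@]}"
    printf 'arg: %s\n' "$@"
}

# With --: -w, pattern and dir are left for grep.
parse_demo -i .texi -e .el -- -w pattern dir

# Without --: getopts reports -w as an illegal option and parse_demo fails.
parse_demo -i .texi -w pattern dir || echo 'parse_demo rejected -w' >&2
```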


If all you want is a simplified way of calling grep, which does not mention shell patterns or long options:

call_grep () {
    local OPTIND=1 opt
    local opts=( -h -i -r )   # local, so options do not pile up across calls

    while getopts 'i:e:' opt; do
        case $opt in
            i)  opts+=( --include="*$OPTARG" ) ;;
            e)  opts+=( --exclude="*$OPTARG" ) ;;
            *)  echo 'error' >&2; return 1 ;;
        esac
    done

    shift "$(( OPTIND - 1 ))"

    grep "${opts[@]}" "$@"
}

You'd use -i suffix repeatedly to include multiple suffixes, and similarly for excluding suffixes. For example,

call_grep -i .texi -e .el -e .htm -i .org -- other arguments for grep here

or

call_grep -i{.texi,.org} -e{.htm,.el} -- more here for grep
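The brace form works because brace expansion happens before call_grep runs, and getopts accepts an option-argument attached to its option letter (-i.texi). A quick way to check which words the function actually receives is to expand the same braces into the positional parameters and print them. The quoted ",v" element is included as an assumed example of the RCS suffix from the comments; the comma inside the quotes is not treated as a brace separator.

```shell
#!/usr/bin/env bash
# Expand the braces as they would be for a call_grep invocation,
# but store the resulting words in "$@" so they can be inspected.
set -- -i{.texi,.org} -e{.htm,.el,",v"} --

# Prints: -i.texi, -i.org, -e.htm, -e.el, -e,v and --, one per line
printf '%s\n' "$@"
```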
Kusalananda
  • Consider adding a -- after opts and before $@? – Jeff Schaller Jul 24 '21 at 14:17
  • Using arrays is fair enough. But incl and excl are being passed as bash function arguments. – Pietru Jul 24 '21 at 14:22
  • @JeffSchaller What if $@ contains further options? – Kusalananda Jul 24 '21 at 14:25
  • @Pietru Uh? You might want to update your question, because it does not mention anything about that. – Kusalananda Jul 24 '21 at 14:26
  • @Pietru Could the array be built when parsing the arguments? Your function would then require multiple options to be given for multiple suffixes. – rowboat Jul 24 '21 at 14:30
  • @rowboat The user just puts them together in the excl and incl options. I could then build the arrays after reading the comma-delimited strings: myfunc --incl="texi,org". I had not thought about CVS and RCS files, but we can try to support that as well. I would need some ideas for that though. – Pietru Jul 24 '21 at 14:40
  • Perhaps incl and excl could use \ as a form of escape: --incl=texi,org,\,v. – Pietru Jul 24 '21 at 14:49
  • @Pietru And how would you specify a filename suffix that contained \? – Kusalananda Jul 24 '21 at 14:53
  • Do you have experience with filename suffixes containing \? Which filename suffixes would you suggest allowing? – Pietru Jul 24 '21 at 14:56
  • @Pietru Why would you want to disqualify them? Or, putting it in other words, why would you want to implement a buggy function that can't handle certain filenames when it's fairly easy to implement one that can handle any type of filename? – Kusalananda Jul 24 '21 at 14:58
  • I am not ruling out those possibilities, but my original intention was not as demanding as the expectations you have put forward. I could do that for users if we figure out an acceptable way to do it. – Pietru Jul 24 '21 at 15:02
  • @Pietru The other way to solve this is to make the user actually type in --include="*"{.texi,.org} or --include='*.texi' --include='*.org'. Or even better, teach them how to use grep directly. – Kusalananda Jul 24 '21 at 15:05
  • Naturally they could use grep directly, but I am also trying to default some parameters so they can adapt it for what they search most. I suggest an incl list with the various possibilities together. What would you have in mind? I can try both ways if you like. How can we construct --incl to have ,v etc. – Pietru Jul 24 '21 at 15:09
  • @Pietru You could do that with alias grep='grep -hir' or alias grep='grep -hir --include="*.texi"'. If you want them to type --incl to define suffixes to include, then you might as well have them type --include. Sorry, I'm failing to see your issue. – Kusalananda Jul 24 '21 at 15:12
  • Definitely simpler. I actually have a function that captures text within a begin and end region (line number p to line number q), or uses either head or tail to capture from the beginning or end. I was thinking of incorporating the pattern search into it as well. At least that was the original motivation. – Pietru Jul 24 '21 at 15:17
  • Yes, can get them to type include. It was just for things to be rapid in typing and running the command. – Pietru Jul 24 '21 at 15:20
  • @Pietru Maybe you need to talk to your users before deciding what they need? – Kusalananda Jul 24 '21 at 15:21
  • The plan was to have completion for a specific utility on the terminal, e.g. indus-<tab> followed by indus-list -h, where the help is not as extensive as info grep. Constrain things a little bit. I do not have to worry so much about relatively experienced users. – Pietru Jul 24 '21 at 15:32
  • @Pietru Now you're talking about a separate script and about completion. I'm noticing that this is not mentioned at all in your question, and I'm not sure how it's relevant to the question about passing two separate lists of filename suffixes to a shell function. – Kusalananda Jul 24 '21 at 15:36
  • I am only discussing passing two separate lists of filename suffixes, not the other things. But since some asked, I explained. I was asked about having the users simply use grep. – Pietru Jul 24 '21 at 15:38
  • @Kusalananda You mentioned handling any type of filename. What exactly is the implementation idea? – Pietru Jul 24 '21 at 15:57
  • @Pietru See my latest variant at the end of my answer now. I'd still argue that your question is lacking a lot of details that I've had to work around in the various incarnations of my code in my answer. You may want to update the question with what you are expecting. A call signature for the function would be nice, i.e. how you expect the function to be called. – Kusalananda Jul 24 '21 at 16:58
  • How would you use call_grep for ,v, etc ? – Pietru Jul 24 '21 at 17:04
  • @Pietru That last variation of the function? call_grep -e ,v followed by whatever other options. If you really want to use brace expansions, call_grep -e{",v",.othersuffix} etc. – Kusalananda Jul 24 '21 at 17:25
  • There is not a particular need for braces as long as things are well defined. – Pietru Jul 24 '21 at 17:31

If you want to take the list of extensions in the form el,htm, you might be tempted to do something like this to use brace expansion:

eval 'echo grep -hir --exclude=*.{'"$excl"'} "$@"'

but apart from the usual caveats with eval, like the fact that an unquoted semicolon in $excl would mess up the syntax by terminating the grep command, it also wouldn't work if $excl contained only one extension, or none at all, since {foo} and {} are not processed as brace expansions. So, let's forget about the brace expansion.
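The single-extension problem is easy to reproduce. This sketch turns off pathname expansion so that only brace expansion is visible: a brace pair with a comma expands into several words, while {el} is left untouched, and going through eval does not change that.

```shell
#!/usr/bin/env bash
set -f   # turn off pathname expansion so only brace expansion is visible

printf '%s\n' --exclude=*.{el,htm}   # two words: --exclude=*.el and --exclude=*.htm
printf '%s\n' --exclude=*.{el}       # one word, braces kept: --exclude=*.{el}

excl=el
eval 'printf "%s\n" --exclude=*.{'"$excl"'}'   # same result via eval: --exclude=*.{el}

set +f
```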


We're really going to end up building the list of arguments in an array, as in Kusalananda's answer above and in the question "Conditionally pass params to a script".

Keeping the comma as the separator, the simplest way to split on it is with read -a. That produces one array, and we need a second one to build the list of options in.

excl="el,htm"
IFS=, read -r -a exts <<< "$excl"
opts=()
for ext in "${exts[@]}"; do 
     opts+=(--exclude="*.$ext")
done
grep -hir "${opts[@]}" "$@"
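Wrapping those lines in a function makes it easy to check the cases the brace-expansion attempt got wrong: a single extension, or an empty string, needs no special handling. excl_opts is a hypothetical name used only for this sketch; it prints the options instead of running grep so the result can be inspected.

```shell
#!/usr/bin/env bash
# excl_opts (hypothetical): turn a comma-separated list such as "el,htm"
# into --exclude options and print them one per line.
excl_opts () {
    local -a exts opts=()
    local ext
    IFS=, read -r -a exts <<< "$1"   # split on commas into the array exts
    for ext in "${exts[@]}"; do
        opts+=( --exclude="*.$ext" )
    done
    # print only if there is something to print (an empty list gives no output)
    if (( ${#opts[@]} > 0 )); then
        printf '%s\n' "${opts[@]}"
    fi
}

excl_opts 'el,htm'   # --exclude=*.el and --exclude=*.htm
excl_opts 'el'       # --exclude=*.el
excl_opts ''         # no output
```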

As you noticed in the comments, extensions like RCS's ,v would be a problem here, and so would anything else that doesn't start with a dot. If you still want to give the extensions as one string, you could switch to e.g. a colon, semicolon or space as the delimiter, and require the user to enter the dot explicitly if they want it. For example, with : as the delimiter:

excl=".html:,v"
IFS=: read -r -a exts <<< "$excl"
opts=()
for ext in "${exts[@]}"; do 
     opts+=(--exclude="*$ext")
done
grep -hir "${opts[@]}" "$@"

Of course, whatever character you choose as the separator can't be part of an extension, but semicolons and colons are probably even rarer in extensions than commas.
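Putting this together with the call style the asker describes in the comments (myfunc --incl="texi,org"), a wrapper could parse --incl= and --excl= itself. This is only a sketch under stated assumptions: search_files is a hypothetical name, the lists are colon-separated, each entry is a literal suffix including its dot (so ,v works), and parsing stops at -- or at the first argument it does not recognise, with everything after that handed to grep unchanged.

```shell
#!/usr/bin/env bash
# search_files (hypothetical): --incl=LIST / --excl=LIST, where LIST is
# colon-separated suffixes such as ".texi:.org" or ".html:,v".
search_files () {
    local opts=( -h -i -r )
    local -a exts
    local ext
    while (( $# > 0 )); do
        case $1 in
            --incl=*)
                IFS=: read -r -a exts <<< "${1#--incl=}"
                for ext in "${exts[@]}"; do opts+=( --include="*$ext" ); done
                ;;
            --excl=*)
                IFS=: read -r -a exts <<< "${1#--excl=}"
                for ext in "${exts[@]}"; do opts+=( --exclude="*$ext" ); done
                ;;
            --) shift; break ;;   # explicit end of the wrapper's options
            *)  break ;;          # first other argument: the rest goes to grep
        esac
        shift
    done
    grep "${opts[@]}" "$@"
}
```

For example, search_files --incl=.texi:.org -- pattern dir runs grep -h -i -r --include=*.texi --include=*.org pattern dir.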

ilkkachu