2

Very often I need to apply a simple function to a list (more precisely, to a string in which the items I want to treat separately are delimited by newlines). Say I need to extract certain numbers from the names of files that contain a certain other string, say stringToBeSearched. A simple way to get the appropriate list of file names would be

grep -l "stringToBeSearched" *

I then simply want to feed this to another function that takes the substring I want. To try to do this I define for example

 f () { echo $(sed 's/begin-\([0-9]*\).end/\1/' <<<$1) ;}

which should extract the digits from a file name of, for example, the form begin-123.end. I would prefer to avoid defining such a function at all since it won't be reused, but I can't seem to find the equivalent of what Mathematica calls Pure Functions, i.e. something of the form #1 +#2 & for an anonymous function that adds two arguments together.

Applied to a string this function does what I want so the only step remaining is to apply it to the correct list of strings. I thought I would be able to do this using

  grep -l "stringToBeSearched" * | xargs -n1 f

Only this does not seem to work, because xargs does not know the function f. The wrong scope, I guess. The suggested solution is to export f (https://stackoverflow.com/a/11003457/7238575), but that does not seem to help. Others (https://stackoverflow.com/questions/11003418/calling-shell-functions-with-xargs) suggest that we also need to start a new instance of bash.

However, if I try

grep -l "stringToBeSearched" * | xargs -n1 bash -c f

it only prints a list of white lines.

Clearly, there must be a much simpler way to do something as simple as applying a function f to a list.


Example input and output: There are some files containing the text stringToBeSearched, say one named begin-1.end and one named begin-2.end. These files are hidden among files that do not contain stringToBeSearched. I want to obtain a list of the numbers in the filenames of those files that do contain stringToBeSearched, so in this case a list containing 1 and 2. Ideally I would also have an easy way to apply a function not mentioned above, say f2, to these numbers, so that in the end I am able to run f2 1 and f2 2.


If this is an XY problem I would appreciate an answer explaining why this is not the method at all more than the answer to the technical problem. The main point of the question is not how to find these numbers I am looking for (although I would like an answer to that too). It is to ask what the general method of applying a function to a list is. The specific problem explained above is just one instance of the kind of problem where I require the operation of applying a function to a list. It is meant to illustrate the problem of not being able to apply a function to the list. It is not the main problem itself.

Kvothe
  • Are you looking for: grep -l "Updating" * | xargs sed 's/begin-\([0-9]*\).end/\1/', or grep -l "Updating" * | sed 's/begin-\([0-9]*\).end/\1/'? It looks like you want one of these, but provide example input and output to clarify. – muru Sep 23 '19 at 16:11
  • In bash, a list (or array) is a different entity to a string containing newline-delimited items. – Chris Davies Sep 23 '19 at 16:18
  • Moving away from your abstract, can you explain, using example input and output, what it is that you want to do? – Chris Davies Sep 23 '19 at 16:18
  • roaima, that's fine but I imagine it can't be hard to go from a list to the delimited string. Indeed it sounds like doing that might be a part of a solution. – Kvothe Sep 23 '19 at 16:19
  • It's not hard, but I wouldn't give you a solution that involved iterating across a list if you wanted a solution that worked with parts of a string. – Chris Davies Sep 23 '19 at 16:22
  • You seem to be focused on an XY Problem. Please state the actual goal instead of what you imagine to be the solution – muru Sep 23 '19 at 16:22
  • @muru, if this is an XY problem I would appreciate an answer explaining why this is not the method at all more than the answer to the technical problem. The main point of the question is not how to find these numbers I am looking for. It is to ask what the general method of applying a function to a list is. The specific problem explained above is just one instance of the kind of problem where I require the operation of applying a function to a list. – Kvothe Sep 23 '19 at 16:26
  • I am still trying to understand your example. – Chris Davies Sep 23 '19 at 16:27
  • @Kvothe: couldn't you simply export your function like this export -f f and use it with xargs? – Arkadiusz Drabczyk Sep 23 '19 at 16:29
  • @ArkadiuszDrabczyk, I literally tried that, but maybe I did it wrong. I get the same error message as before exporting "xargs: f: No such file or directory". – Kvothe Sep 23 '19 at 16:31
  • xargs -n1 bash -c 'f "$@"' {} – Arkadiusz Drabczyk Sep 23 '19 at 16:32
  • @ArkadiuszDrabczyk, nice, thanks! That seems to be it. Can you explain why it should be like that? – Kvothe Sep 23 '19 at 16:38
  • Ah, I see. You just want grep -l "Updating" * | sed 's/begin-\([0-9]*\).end/\1/' – muru Sep 23 '19 at 16:43
  • Getting a list of filenames from grep -l is no better than parsing ls unless you're using GNU grep with the -Z option for NUL-separated output (note freebsd grep's -z doesn't work with -l). Use grep -l -Z ... – cas Sep 24 '19 at 04:37

5 Answers

4

To apply a function to a list you simply iterate over it:

list=(one two 'twenty one' banana)

f() {
    echo "This is f applied to '$1'"
}

for item in "${list[@]}"
do
    f "$item"
done

If you have a (space) delimited list you can either convert this to an array (a list) or step across it. Note that here any item in the unquoted list that contains a wildcard (*, ?, [...]) will be evaluated as usual in the context of the current directory, so we need to disable that action first (this alone is a good reason for using arrays/lists rather than a string of space-separated items):

text='one two twenty-one banana'

OIFS="$IFS" IFS=' ' OSHELLOPTS="$SHELLOPTS"
set -o noglob

for item in $text
do
    f "$item"
done

IFS="$OIFS"
# Re-enable globbing only if noglob was not already set beforehand
[[ ":$OSHELLOPTS:" != *:noglob:* ]] && set +o noglob
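
The other option mentioned above, converting the delimited string to an array first, might look like the following sketch. It assumes bash (mapfile needs version 4 or later), and the sample strings are made up for illustration. Because read and mapfile do not perform filename expansion on what they read, no noglob handling is needed here.

```bash
# f is the same illustrative function as above.
f() {
    echo "This is f applied to '$1'"
}

text='one two twenty-one banana'

# Split on spaces into an array. IFS is changed only for this one read
# command, so there is nothing to restore afterwards.
IFS=' ' read -r -a list <<<"$text"

for item in "${list[@]}"
do
    f "$item"
done

# Newline-delimited text (for example command output) can be loaded
# into an array with mapfile, one element per line.
mapfile -t lines < <(printf '%s\n' 'first line' 'second line')

for item in "${lines[@]}"
do
    f "$item"
done
```

Once the items are in an array, the quoted "${list[@]}" expansion keeps each element intact, including elements that contain spaces.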

Variations abound; here's one with a colon-separated list:

text='one:two:twenty one:banana'

OIFS="$IFS" IFS=':' OSHELLOPTS="$SHELLOPTS"
set -o noglob
...
Chris Davies
1

All that about pure functions and currying and what not is good for a language with first class functions, but with shell scripting, pipelines are what you should be looking for.

When it comes to the point that you explicitly have to say: take line from input, do X, output, you need to step back and re-examine what you're doing. Most standard tools automatically take lines from input, do X and output, so usually you just need to get the right tool and the right X. So if you end up in a situation where you take a line from input, use that as input to a tool that can already take lines from input, and then capture that command's output and reuse it for output... something's off.

In this case, that's just sed, with X being 's/begin-\([0-9]*\).end/\1/'.

Also, side note: echo $(sed ...) is pointless, just do sed ... directly. You're capturing the output using command substitution and then ... just using it again as the output.
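
As a rough sketch of that pipeline, the following creates a throwaway directory with made-up files and runs grep and sed over it. The directory, file names and contents are assumptions for illustration only; the dot before "end" is escaped here so that it matches a literal dot.

```bash
# Throwaway demo directory; the files and their contents are made up.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo 'stringToBeSearched' > begin-1.end
echo 'stringToBeSearched' > begin-2.end
echo 'something else'     > begin-3.end

# grep prints the matching file names, one per line;
# sed then rewrites each of those lines.
numbers=$(grep -l 'stringToBeSearched' * | sed 's/begin-\([0-9]*\)\.end/\1/')
printf '%s\n' "$numbers"
```

Like any line-based pipeline over grep -l output, this assumes the file names contain no newlines.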

muru
1

It looks as if you'd like to get the integer N in the filename begin-N.end for each such filename that contains the string stringToBeSearched.

You can do that in a simple loop:

for filename in begin-*.end; do
    if grep -qF 'stringToBeSearched' "$filename"; then
        N=${filename%.end}
        N=${N#begin-}
        printf '%s\n' "$N"
    fi
done

The point of this is that we're not iterating over text. Text containing filenames (which is what the output of grep -l is) is very bad at encoding all possible filenames on a Unix system, especially filenames containing newlines.

Instead, we let the glob pattern begin-*.end expand to a proper list and iterate over that, testing each element of the list with grep and then extracting the integer when we find a match.

You could obviously wrap this up in functions if you wish:

test_files () {
    local func="$1";   shift
    local string="$1"; shift

    # Looks for the string "$string" in all given files.
    # Calls "$func" with each pathname that contains the string.

    for pathname do
        if grep -qF "$string" "$pathname"; then
            "$func" "$pathname"
        fi
    done
}

foo () {
    # Takes a string of the form "begin-N.end" and
    # extracts and prints "N".

    local tmp="${1%.end}"
    printf '%s\n' "${tmp#begin-}"
}

test_files foo stringToBeSearched begin-*.end

This is more or less using a simple form of "callback" by which foo is called by test_files for each file that contains a particular string.
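
The same callback idea can be stripped down to a generic helper that applies a named function to each of its arguments. The sketch below is only an illustration: map and f2 are hypothetical names (f2 stands in for the second function mentioned in the question), and literal file names are used so that it does not depend on any files existing.

```bash
# map is a hypothetical helper, not a bash builtin: it calls the function
# named by its first argument once for each remaining argument.
map () {
    local func="$1"; shift
    local item
    for item do
        "$func" "$item"
    done
}

# Same body as foo above: prints the N in "begin-N.end".
foo () {
    local tmp="${1%.end}"
    printf '%s\n' "${tmp#begin-}"
}

# Stand-in for the second function mentioned in the question.
f2 () {
    printf 'f2 got %s\n' "$1"
}

# Collect the output of the first pass in an array, then map f2 over it.
mapfile -t numbers < <(map foo begin-1.end begin-2.end)
result=$(map f2 "${numbers[@]}")
printf '%s\n' "$result"
```

In real use the literal names would typically be replaced by a glob such as begin-*.end, which expands to a proper list in the same way as in the loop above.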

Kusalananda
0

A generic solution would be to export f() and use it with xargs. For example:

$ f()
> {
>     echo param is: "$1"
> }
$ export -f f
$ grep -l string2 * | xargs -n1 bash -c 'f "$@"'  {}
param is: FILE

You need to use bash -c ... because xargs does not know about f(). As explained here:

Normally, xargs will exec the command you specified directly, without invoking a shell.

Also consider using -Z with grep together with -0 with xargs so that this works correctly with files that have whitespace in their names.
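
A sketch of that NUL-delimited variant follows, assuming GNU grep (for -Z) and bash; the demo directory and file names are made up. It also shows what the extra word after the script is for: bash -c assigns the first argument after the script to $0, so a placeholder (here "_", playing the same role as the {} above) has to occupy that slot for the file name to arrive as $1. With plain bash -c f, the file name would land in $0 and f would be called with no arguments at all.

```bash
f() {
    echo "param is: $1"
}
export -f f   # make f visible to the bash started by xargs

# Throwaway demo directory; the files and their contents are made up.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo 'string2' > 'file with spaces'
echo 'string2' > plain
echo 'other'   > unrelated

# -Z makes GNU grep end each file name with a NUL byte;
# -0 makes xargs split its input on NUL instead of whitespace.
# "_" fills $0 of the inner shell, so the file name becomes $1.
out=$(grep -lZ 'string2' * | xargs -0 -n1 bash -c 'f "$1"' _)
printf '%s\n' "$out"
```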

  • Thanks, and this is yet another syntax of xargs (compared to those explained in https://superuser.com/questions/526352/what-is-diff-bentween-xargs-with-braces-and-without-in-linux)? What is referring to what here and why? "$@" is referring to (one line of) the piped input? What does the {} do? – Kvothe Sep 24 '19 at 08:45
  • It's a safe invocation: https://stackoverflow.com/a/11003457/3691891 – Arkadiusz Drabczyk Sep 24 '19 at 10:31
0

If you have a list, you can often benefit from having the commands run in parallel:

env_parallel --session
f () { echo $(sed 's/begin-\([0-9]*\).end/\1/' <<<$1) ;}
grep -l "stringToBeSearched" * | env_parallel f

Or:

f () { echo $(sed 's/begin-\([0-9]*\).end/\1/' <<<$1) ;}
export -f f
grep -l "stringToBeSearched" * | parallel f
Ole Tange