I am looping through the files in a directory that match given criteria. One of the things I want to do with each file is extract its six-digit date and store it in a variable. My script currently looks like this:

for i in $(ls $INPUT_DIR | egrep -i '^'$INPUT_FILE_PREFIX'[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01])'$INPUT_FILE_SUFFIX); do   
 MYDATE=$("$i" | grep -oP '\d{6,6}')
 echo $MYDATE
done

The above leads to the error "somefile": command not found.

What seems strange to me is that if I replace MYDATE=$("$i" | grep -oP '\d{6,6}') with echo "$i" | grep -oP '\d{6,6}' all works fine.

How do I get my script to pass "$i" as a string rather than a command?

ilkkachu

5 Answers

It looks like you are trying to parse out a six-digit number from file names that start with $INPUT_FILE_PREFIX and end with $INPUT_FILE_SUFFIX.

This will do that:

for name in "$INPUT_DIR/$INPUT_FILE_PREFIX"??????"$INPUT_FILE_SUFFIX"; do
    test -f "$name" || continue

    # Quote the expansions so that any glob characters in them are taken literally
    number=${name#"$INPUT_DIR/$INPUT_FILE_PREFIX"}
    number=${number%"$INPUT_FILE_SUFFIX"}

    printf "Number = %s\n" "$number"
done

Change every ? to [0-9] if you want to be sure to only match digits (? matches a single character regardless of what that character is).

The parameter substitutions in the loop remove the first part of the value of $name and then the last part of the remaining string, leaving the number (the six characters between prefix and suffix) as the only thing left in the variable $number.
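The two expansions can be tried on a single string without touching the filesystem. This is a minimal sketch; the directory, prefix, suffix, and file name are made-up values for illustration only.

```bash
# Hypothetical values, chosen only for illustration.
INPUT_DIR='/data/incoming'
INPUT_FILE_PREFIX='report_'
INPUT_FILE_SUFFIX='.csv'
name="$INPUT_DIR/${INPUT_FILE_PREFIX}180214$INPUT_FILE_SUFFIX"

# ${var#pattern} removes the shortest match of pattern from the start.
number=${name#"$INPUT_DIR/$INPUT_FILE_PREFIX"}   # leaves 180214.csv

# ${var%pattern} removes the shortest match of pattern from the end.
number=${number%"$INPUT_FILE_SUFFIX"}            # leaves 180214

printf 'Number = %s\n' "$number"
```

Both expansions are done by the shell itself, so no external process is started per file.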


The command

MYDATE=$("$i" | grep -oP '\d{6,6}')

will, as you discovered, be interpreted as invoking whatever is in $i as a command. At the same time you said that putting echo in front of "$i" would make it work, which it does:

MYDATE=$(echo "$i" | grep -oP '\d{6,6}')
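As a small sketch of the same idea, the string can be handed to grep as data in more than one way. The file name below is hypothetical, and `-oE '[0-9]{6}'` is used instead of `-oP '\d{6,6}'` only because `-P` is not available in every grep; that substitution is an assumption of this example, not part of the question.

```bash
i='somefile_180214.txt'   # hypothetical file name

# printf writes the string into the pipe; "$i" is data here, not a command.
MYDATE=$(printf '%s\n' "$i" | grep -oE '[0-9]{6}')

# bash-only alternative: a here-string feeds the string to grep's stdin.
MYDATE2=$(grep -oE '[0-9]{6}' <<< "$i")

echo "$MYDATE $MYDATE2"
```

`printf '%s\n'` is used rather than `echo` because it also behaves predictably for names that start with a dash or contain backslashes.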

Related to your code: Why *not* parse `ls`?

Kusalananda
  • I have chosen this as the answer, as it directly addresses my particular query, though I appreciate the others pointing out other methods of looping through the files in the directory. – paul frith Feb 14 '18 at 15:49

I would recommend a slightly different way of looping through the filenames -- using bash's extended globbing to gather the filenames:

shopt -s extglob
for d in "${INPUT_DIR}"/"${INPUT_FILE_PREFIX}"[0-9][0-9]@(0[1-9]|1[0-2])@(0[1-9]|[12][0-9]|3[01])"${INPUT_FILE_SUFFIX}"
do 
  [[ $d =~ ${INPUT_FILE_PREFIX}([[:digit:]]+)${INPUT_FILE_SUFFIX} ]]
  MYDATE=${BASH_REMATCH[1]}
done

The globbing syntax is nearly the same as the grep statement you had. Each set of @(...) introduces a request to match any of the given patterns, which are separated by |. I noticed that the (presumed day) pattern of [3] was a single-character class, so I removed its surrounding brackets.

Once we have the filenames in the for loop, you can use the =~ regular-expression operator of bash's [[ ... ]] conditional expression to extract the digits into MYDATE.
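A minimal sketch of the =~ step on one hypothetical path follows. It differs slightly from the loop above: the prefix and suffix are quoted so that bash matches them literally (a dot in the suffix would otherwise match any character), the capture is limited to exactly six digits, and the match result is checked before the capture group is read. Those choices are assumptions of this example.

```bash
# Hypothetical values, chosen only for illustration.
INPUT_FILE_PREFIX='report_'
INPUT_FILE_SUFFIX='.csv'
d='/data/incoming/report_180214.csv'

# Quoted parts of the regex are matched literally; the unquoted
# parentheses form capture group 1, exposed as BASH_REMATCH[1].
if [[ $d =~ "$INPUT_FILE_PREFIX"([[:digit:]]{6})"$INPUT_FILE_SUFFIX"$ ]]; then
    MYDATE=${BASH_REMATCH[1]}
else
    MYDATE=''
fi

echo "$MYDATE"
```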

Jeff Schaller
for i in $(ls $INPUT_DIR | egrep -i '^'$INPUT_FILE_PREFIX'[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01])'$INPUT_FILE_SUFFIX);

In general, there's no reason to use ls in a structure like this: it makes the command more awkward to read, and you run into issues with some corner cases (see ParsingLs in BashGuide). However, the regex you have cannot be represented as a standard shell glob, so there is some point in using it. Still, since this was tagged with bash, we can do it in the shell, either with extglob or with a regex match in the [[ .. ]] construct after a wider glob.

shopt -s extglob
for i in "$INPUT_DIR/$INPUT_FILE_PREFIX"[0-9][0-9]@(0[1-9]|1[0-2])@(0[1-9]|[12][0-9]|3[01])"$INPUT_FILE_SUFFIX" ; do

If you don't really need such a strict pattern, you could just use [0-9][0-9][0-9][0-9][0-9][0-9] instead.
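To see what the stricter pattern buys you, the two patterns can be compared on plain strings. This is only a sketch: `check` is a hypothetical helper, and the two sample values are invented.

```bash
shopt -s extglob   # enables @(...) in patterns

strict='[0-9][0-9]@(0[1-9]|1[0-2])@(0[1-9]|[12][0-9]|3[01])'
loose='[0-9][0-9][0-9][0-9][0-9][0-9]'

# check is a hypothetical helper: it reports which patterns a string matches.
# The pattern variables are deliberately unquoted on the right-hand side of ==
# so that they are treated as patterns rather than literal text.
check() {
    local s=$1 r_strict=no r_loose=no
    [[ $s == $strict ]] && r_strict=yes
    [[ $s == $loose ]] && r_loose=yes
    printf '%s strict=%s loose=%s\n' "$s" "$r_strict" "$r_loose"
}

check 180214   # month 02, day 14
check 181345   # "month" 13 is rejected by the strict pattern only
```

Neither pattern validates real calendar dates (the strict one still accepts 180231), so treat it as a filter on the name's shape rather than a date check.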

In the assignment to MYDATE, I assume you just want to remove the prefix and the suffix. (Note that if your prefix or suffix contains a six-digit string, the grep would match that, too.)

MYDATE=${i#"$INPUT_DIR/"}              # remove the directory
MYDATE=${MYDATE#"$INPUT_FILE_PREFIX"}  # remove the prefix
MYDATE=${MYDATE%"$INPUT_FILE_SUFFIX"}  # and the suffix
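The difference between the two approaches shows up when the prefix itself contains six digits. The sketch below uses an invented prefix for that purpose, and `grep -oE '[0-9]{6}'` stands in for the question's `grep -oP '\d{6,6}'` as a portability assumption.

```bash
INPUT_FILE_PREFIX='batch201801_'   # hypothetical prefix that contains six digits
INPUT_FILE_SUFFIX='.csv'
i="${INPUT_FILE_PREFIX}180214${INPUT_FILE_SUFFIX}"

# grep -o prints every non-overlapping six-digit run, one per line,
# so the digits in the prefix are reported as well.
from_grep=$(printf '%s\n' "$i" | grep -oE '[0-9]{6}')

# Stripping the known prefix and suffix leaves only the part in between.
from_expansion=${i#"$INPUT_FILE_PREFIX"}
from_expansion=${from_expansion%"$INPUT_FILE_SUFFIX"}

printf 'grep:\n%s\nexpansion:\n%s\n' "$from_grep" "$from_expansion"
```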

In full:

shopt -s extglob
for i in "$INPUT_DIR/$INPUT_FILE_PREFIX"[0-9][0-9]@(0[1-9]|1[0-2])@(0[1-9]|[12][0-9]|3[01])"$INPUT_FILE_SUFFIX" ; do
    MYDATE=${i#"$INPUT_DIR/"}              # remove the directory
    MYDATE=${MYDATE#"$INPUT_FILE_PREFIX"}  # remove the prefix
    MYDATE=${MYDATE%"$INPUT_FILE_SUFFIX"}  # and the suffix
    echo "$MYDATE"
done
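The loop can be tried end to end in a throwaway directory. The sketch below makes two assumptions that are not in the code above: it enables nullglob, so that the loop body is skipped when nothing matches (without it, an unmatched pattern is passed through literally), and it collects the dates in an array named `dates` instead of echoing them. The file names are invented.

```bash
shopt -s extglob nullglob   # nullglob: an unmatched pattern expands to nothing

INPUT_DIR=$(mktemp -d)      # throwaway directory for the demonstration
INPUT_FILE_PREFIX='report_'
INPUT_FILE_SUFFIX='.csv'
touch "$INPUT_DIR/report_180214.csv" \
      "$INPUT_DIR/report_181345.csv" \
      "$INPUT_DIR/notes.txt"

dates=()
for i in "$INPUT_DIR/$INPUT_FILE_PREFIX"[0-9][0-9]@(0[1-9]|1[0-2])@(0[1-9]|[12][0-9]|3[01])"$INPUT_FILE_SUFFIX"; do
    MYDATE=${i#"$INPUT_DIR/"}              # remove the directory
    MYDATE=${MYDATE#"$INPUT_FILE_PREFIX"}  # remove the prefix
    MYDATE=${MYDATE%"$INPUT_FILE_SUFFIX"}  # and the suffix
    dates+=("$MYDATE")
done

rm -r "$INPUT_DIR"
printf '%s\n' "${dates[@]}"
```

Only report_180214.csv is picked up: report_181345.csv fails the month part of the pattern and notes.txt has neither the prefix nor the suffix.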
ilkkachu

Your script is passing "$i" as a string, but you are telling the shell to use it as a command. The line that says

MYDATE=$("$i" | grep -oP '\d{6,6}')

is telling the shell: "Expand the variable i, run the result as a command, feed its output through this grep command, and assign the result to the MYDATE variable". The $(...) is command substitution, the same thing as the older backtick construct. What makes your other version work is the echo in front of "$i" in echo "$i" | grep -oP '\d{6,6}'. You simply need that echo inside the $(...) as well to get the result you want.
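Both points can be checked in a few lines. This is a sketch with a hypothetical file name that is assumed not to exist as a command on your PATH; `grep -oE '[0-9]{6}'` replaces `-oP '\d{6,6}'` only for portability.

```bash
i='somefile_180214.txt'   # hypothetical name, assumed not to be a command on PATH

# "$i" in command position: the shell tries to execute it and fails.
status=0
"$i" 2>/dev/null || status=$?   # 127 is the shell's "command not found" status

# $(...) and backticks are both command substitution and capture the same output.
new_style=$(echo "$i" | grep -oE '[0-9]{6}')
old_style=`echo "$i" | grep -oE '[0-9]{6}'`

echo "status=$status new=$new_style old=$old_style"
```

The $(...) form is usually preferred because it nests without extra escaping.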

ilkkachu
John

Have you tried bash process substitution? It looks like it would make your command much simpler, without the need for loops or variables.

Basically process substitution is not well known but can be extremely powerful.

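Process substitution runs a command and replaces <(...) with a file name (typically something like /dev/fd/63) from which another process can read that command's output.

So, your command could be reduced to something like this (note the space before each <(, without which the file name would be glued onto the preceding argument):

grep -oP '\d{6,6}' <(egrep -i '^'"$INPUT_FILE_PREFIX"'[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01])'"$INPUT_FILE_SUFFIX" <(ls "$INPUT_DIR"))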

The logic behind it is better understood with a simpler version:

user@yrmv-191108:~/nums$ ls .
1  10  2  3  4  5  6  7  8  9
user@yrmv-191108:~/nums$ grep 1 <(ls .)
1
10
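The same idea can be sketched without depending on the contents of a directory, and combined with a loop for cases where each match needs further processing. The input values below mirror the listing above; the variable names are only illustrative.

```bash
# <(...) is replaced by a path (often /dev/fd/N) that grep opens like any file.
matches=$(grep 1 <(printf '%s\n' 1 10 2 3))

# Reading a loop from a process substitution keeps the loop body in the
# current shell, so variables set inside it are still visible afterwards.
count=0
while IFS= read -r line; do
    count=$((count + 1))
done < <(printf '%s\n' 1 10 2 3 | grep 1)

printf '%s\n' "$matches"
echo "count=$count"
```

Had the loop been fed with a plain pipe instead, bash would by default run it in a subshell and `count` would still be 0 afterwards.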
Tux
  • Thanks Jordi - I am actually running a number of processes in my loop; for the sake of brevity I have shown just the one in my question above, so I think a loop is more practical for my purposes. – paul frith Feb 14 '18 at 16:58