
I'm aware I can use awk to parse on multiple delimiters, but that spawns subprocesses. I wanted to know whether compound (nested) bash parameter expansion is possible.

I have PDFs in a directory named like "Px_MM-DD-YY_SSSSSSSSSS.pdf" where:

  • "Px" means "Page x", and x has no leading zeroes.
  • "MM" corresponds to the two-digit month, with a leading zero if applicable.
  • "DD" corresponds to the two-digit day, with a leading zero if applicable.
  • "YY" corresponds to the two-digit year, with a leading zero if applicable.
  • "SSSSSSSSSS" corresponds to the ten-digit epoch time at which the PDF was created, which lets me keep PDF page revisions.

I have a for loop (I'll drop "-mtime" when I'm ready to operate on all the PDFs):

for file in $(find -type f -iname '*_??????????.pdf' -mtime -1)
do
    echo $file
done

where I want to echo only the epoch time.

I can use this for loop

for file in $(find -type f -iname '*_??????????.pdf' -mtime -1)
do
    echo ${file##*_}
done

and for the file named like "./P14_07-21-18_4X_1532144458.pdf", "1532144458.pdf" is echoed to the screen.

I can use this for loop

for file in $(find -type f -iname '*_??????????.pdf' -mtime -1)
do
    echo ${file%.*}
done

and for the file named like "./P14_07-21-18_4X_1532144458.pdf", "./P14_07-21-18_4X_1532144458" is echoed to the screen.

If I replace the echo ... line with any of the formats below

echo ${${file##*_}:0:10}
echo ${(${file##*_}):0:10}
echo ${${file##*_}%.*}
echo ${{file%.*}##_}
echo ${${file%.*}##_}

I get -bash: ... : bad substitution. Am I not getting the syntax right, or is nested/compound bash expansion not possible?

user208145

3 Answers


You cannot perform nested substitution on the parameter itself, i.e. the leftmost part of the expansion. Expansions are allowed in the pattern part, so ${foo#$bar} works, but the forms you show do not.

Put the result of the substitution in a variable if you want to use it in further substitutions.
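A minimal sketch of that two-step approach, assuming bash; epoch_from_path is a hypothetical helper name, and the intermediate variable stem holds the result of the first expansion so that the second one can act on it:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the epoch part of a name like
# ./P14_07-21-18_4X_1532144458.pdf using two plain expansions.
epoch_from_path() {
    local stem="${1%.*}"          # step 1: drop the extension -> ./P14_07-21-18_4X_1532144458
    printf '%s\n' "${stem##*_}"   # step 2: drop everything up to the last "_" -> 1532144458
}

epoch_from_path ./P14_07-21-18_4X_1532144458.pdf
```

In the loop from the question it would be called as epoch_from_path "$file". Both steps are shell builtins, so no extra process is started per file.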

  • I substituted front=${file%.*}; echo ${front##*_} into the do portion of the loop and it worked. Thanks. time says it's as fast (225ms) as the single-part substitution. Piping the file names into awk with multiple delimiters took almost 5 seconds. – user208145 Jul 21 '18 at 04:19

You can't have a parameter substitution act on the result of another parameter substitution without first saving the initial result to a variable and applying the second substitution to it.
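Because an expansion cannot be nested, another subprocess-free option (a different technique from the two-step expansion, using bash's own regex matching) is a single [[ =~ ]] test, which splits the whole name on its several delimiters at once. This is a sketch assuming bash 3.2 or later; parse_pdf_name and the page, month, day, year and epoch variables are hypothetical names, and the optional group allows for an extra part like the "4X_" in the example file name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a name like P14_07-21-18_4X_1532144458.pdf
# with one regex match; results go into the globals page/month/day/year/epoch.
parse_pdf_name() {
    local name="${1##*/}"   # strip leading directories such as "./"
    # Groups: 1 page, 2 month, 3 day, 4 year, 5 optional extra part (e.g. "4X_"),
    # 6 the ten-digit epoch. Matching is case-sensitive, so ".PDF" is rejected.
    local re='^P([0-9]+)_([0-9]{2})-([0-9]{2})-([0-9]{2})_(.*_)?([0-9]{10})\.pdf$'
    [[ $name =~ $re ]] || return 1
    page=${BASH_REMATCH[1]}
    month=${BASH_REMATCH[2]}
    day=${BASH_REMATCH[3]}
    year=${BASH_REMATCH[4]}
    epoch=${BASH_REMATCH[6]}
}

if parse_pdf_name ./P14_07-21-18_4X_1532144458.pdf; then
    printf 'page=%s date=%s-%s-%s epoch=%s\n' "$page" "$month" "$day" "$year" "$epoch"
fi
```

Keeping the regex in a variable avoids quoting problems inside [[ ]]; a quoted regex on the right of =~ would be matched as a literal string.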

You also loop over the output of find, which is not recommended.

A safe way to supply a loop with the result of find is to have find call a child shell with -exec and do the loop in there:

find . -type f -iname '*_??????????.pdf' -mtime -1 -exec sh -c '
    for pathname do
        timestamp=${pathname##*_}   # remove up to last _
        timestamp=${timestamp%.pdf} # remove .pdf
        printf "pathname=%s\ttimestamp=%s\n" "$pathname" "$timestamp"
    done' sh {} +

This way, you don't have to worry about what the actual pathnames are. Filenames (i.e. names of files, directories and other file types) in Unix may contain any character other than / and \0, for example spaces and newlines. By using a command substitution on find, you force the shell first to perform word splitting on its output (by default on spaces, tabs and newlines) and then to perform filename generation (globbing) on any patterns found in the resulting words. Your original loop may therefore end up looping over quite different words than you'd expect.
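A small demonstration of the splitting problem, assuming bash and a mktemp that supports -d (GNU and BSD mktemp both do). The "my scans" directory is a made-up example: a directory name with a space turns one pathname into two loop words, while -exec hands the same pathname over intact:

```bash
#!/usr/bin/env bash
# Demonstration only: a throwaway directory whose name contains a space.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/my scans"
touch "$demo_dir/my scans/P1_07-21-18_1532144458.pdf"

# Unsafe: $(find ...) is word-split on the space, so one file yields two words.
unsafe_count=0
for file in $(find "$demo_dir" -type f -iname '*_??????????.pdf'); do
    printf 'unsafe word: %s\n' "$file"
    unsafe_count=$((unsafe_count + 1))
done

# Safe: -exec passes each pathname to the child shell as one argument,
# so "$#" counts pathnames rather than words.
safe_count=$(find "$demo_dir" -type f -iname '*_??????????.pdf' \
    -exec sh -c 'printf "%s\n" "$#"' sh {} +)

printf 'unsafe: %s words, safe: %s pathname(s)\n' "$unsafe_count" "$safe_count"
rm -rf -- "$demo_dir"
```

Running it should print two "unsafe word" lines for the single file.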


Kusalananda
  • I avoided the -exec option because the \{\} \; placeholders, on top of the other ${MyVar%%...} parts in the command, made it visually too busy. – user208145 Jul 25 '18 at 19:42

If I understand correctly, the problem with using Awk is that you are invoking one Awk process for each PDF file (I presume you have a HUGE number of those).

You could run something along the lines of

find . ...... -print0 | perl -0nE '/.*_(\d{10})\.pdf/ and say "$1.pdf"'

Or, if you want to keep your structure:

for file in $(find .....| perl ....)
do 
  ...
done

(and of course, replace the Perl command with any Awk, sed, or Python equivalent)
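If the goal is simply to keep a loop without handing the names to an external filter, one alternative (a different technique from the Perl pipeline) is a NUL-delimited while-read loop in bash. This sketch assumes bash (for read -d '' and process substitution) and a find that supports -print0, as GNU and BSD find do; list_epochs is a hypothetical helper name. The names are never word-split or globbed, no per-file process is started, and the loop body stays free of -exec placeholders:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the epoch part of every matching PDF under a directory.
list_epochs() {
    local dir="${1:-.}" pathname stem
    # find emits NUL-terminated names; read -d '' consumes one name per iteration.
    while IFS= read -r -d '' pathname; do
        stem="${pathname%.*}"          # drop the extension
        printf '%s\n' "${stem##*_}"    # keep what follows the last "_"
    done < <(find "$dir" -type f -iname '*_??????????.pdf' -print0)
}

list_epochs .
```

IFS= and -r keep leading blanks and backslashes in a name from being altered by read.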

(If you get the chance to try this approach, tell us the time you obtained.)

JJoao
  • time for the single parameter expansion was around 200ms, and piping the output into awk to retrieve the same single expansion was near 500ms. – user208145 Jul 25 '18 at 19:40