3

I have the following JPEG files:

$ ls -l
-rw-r--r-- 1 user group 384065 janv. 21 12:10 CamScanner 01-10-2022 14.54.jpg
-rw-r--r-- 1 user group 200892 janv. 10 14:55 CamScanner 01-10-2022 14.55.jpg
-rw-r--r-- 1 user group 283821 janv. 21 12:10 CamScanner 01-10-2022 14.56.jpg

I use img2pdf to convert each image into a PDF file, like this:

$ find . -type f -name "*.jpg" -exec img2pdf "{}" --output $(basename {} .jpg).pdf \;

Result:

$ ls -l *.pdf
-rw-r--r-- 1 user group 385060 janv. 21 13:06 CamScanner 01-10-2022 14.54.jpg.pdf
-rw-r--r-- 1 user group 201887 janv. 21 13:06 CamScanner 01-10-2022 14.55.jpg.pdf
-rw-r--r-- 1 user group 284816 janv. 21 13:06 CamScanner 01-10-2022 14.56.jpg.pdf

How can I remove the .jpg part from the PDF filenames? That is, I want CamScanner 01-10-2022 14.54.pdf, not CamScanner 01-10-2022 14.54.jpg.pdf.

Used on its own, basename filename .extension prints the filename without the extension, e.g.:

$ basename CamScanner\ 01-10-2022\ 14.54.jpg .jpg
CamScanner 01-10-2022 14.54

But that syntax doesn't seem to work in my find command. Any idea why?

Note: replacing img2pdf with echo gives the same result; basename still doesn't strip the .jpg part:

$ find . -type f -name "*.jpg" -exec echo $(basename {} .jpg).pdf \;
./CamScanner 01-10-2022 14.56.jpg.pdf
./CamScanner 01-10-2022 14.55.jpg.pdf
./CamScanner 01-10-2022 14.54.jpg.pdf
ChennyStar
  • 1
    Crystal ball guess: $() is shell expansion, that's done before find even gets called. At that point, basename tries to work on the literal {}. – Ulrich Schwarz Jan 21 '22 at 07:12

3 Answers

5

The issue with your find command is that the command substitution around basename is executed by the shell before it even starts running find (as a step in evaluating what the arguments to find should be).
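To see this for yourself, you can evaluate the substitution on its own, outside of find. This is only a small sketch; the variable name expanded is made up for the illustration.

```shell
# basename receives the literal string "{}", which has no .jpg suffix
# to strip, so it comes back unchanged.
expanded="$(basename {} .jpg).pdf"
printf '%s\n' "$expanded"    # prints: {}.pdf
```

So find is actually given {}.pdf as an argument. GNU find then replaces the {} inside that argument with the found pathname, which is where ./CamScanner 01-10-2022 14.54.jpg.pdf comes from.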

Whenever you need to run anything other than a simple utility with optional arguments for a pathname found by find, for example if you need to do any piping, redirections or expansions (as in your question), you will need to employ a shell to do those things:

find . -type f -name '*.jpg' \
    -exec sh -c 'img2pdf --output "$(basename "$1" .jpg).pdf" "$1"' sh {} \;

Or, more efficiently (each call to sh -c would handle a batch of found pathnames),

find . -type f -name '*.jpg' -exec sh -c '
    for pathname do
        img2pdf --output "$(basename "$pathname" .jpg).pdf" "$pathname"
    done' sh {} +
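If the trailing sh {} + looks odd, the following sketch may help. It calls sh -c by hand with two made-up pathnames in place of what find would supply, to show where the arguments end up.

```shell
# The first word after the script ("sh") becomes $0; the remaining words
# become $1, $2, ... and "for pathname do" loops over exactly those.
result=$(sh -c '
    printf "name=%s\n" "$0"
    for pathname do
        printf "arg=%s\n" "$pathname"
    done' sh './a b.jpg' './c.jpg')
printf '%s\n' "$result"
```

This should print name=sh followed by one arg= line per pathname, with the space in ./a b.jpg preserved.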

Or, with zsh,

for pathname in ./**/*.jpg(.DN); do
    img2pdf --output $pathname:t:r.pdf $pathname
done

This uses the glob qualifier (.DN) to match only regular files (.), to also match hidden names (D), and to remove the pattern if nothing matches (N). It then uses the :t modifier to extract the "tail" (filename component) of $pathname, :r to extract the "root" (the name without its filename suffix) of the resulting base name, and then adds .pdf to the end.
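For readers without zsh, roughly the same two steps can be sketched with standard parameter expansions. The sample pathname and the variable names name and root are assumptions for the illustration, and the expansions are not exact equivalents of the zsh modifiers in every corner case (for example, pathnames with a trailing slash).

```shell
pathname='./scans/CamScanner 01-10-2022 14.54.jpg'
name=${pathname##*/}    # similar to :t, removes everything up to the last /
root=${name%.*}         # similar to :r, removes only the last .suffix
output="$root.pdf"
printf '%s\n' "$output"
```

Note that %.* removes the shortest matching suffix, so the dot in 14.54 is left alone.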

Note that all of the above variations would write the output to the current directory, regardless of where the JPEG file was found. If all your JPEG files are in the current directory, there is absolutely no need to use find, and you could use a simple loop over the expansion of the *.jpg globbing pattern:

for pathname in ./*.jpg; do
    img2pdf --output "${pathname%.jpg}.pdf" "$pathname"
done

The parameter substitution ${pathname%.jpg} removes .jpg from the end of the value of $pathname. You may possibly want to use this substitution in place of basename if you want to write the output to the original directories where the JPEG files were found, in the case that you use find over multiple directories, e.g., something like

find . -type f -name '*.jpg' -exec sh -c '
    for pathname do
        img2pdf --output "${pathname%.jpg}.pdf" "$pathname"
    done' sh {} +
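The difference between the two approaches is easiest to see side by side. In this sketch the nested pathname and the variable names same_dir and current_dir are invented for the example.

```shell
pathname='./scans/2022/CamScanner 01-10-2022 14.54.jpg'

# Parameter substitution keeps the directory part of the pathname.
same_dir="${pathname%.jpg}.pdf"

# basename drops the directory, so the output lands in the current directory.
current_dir="$(basename "$pathname" .jpg).pdf"

printf '%s\n' "$same_dir" "$current_dir"
```

The first line printed is ./scans/2022/CamScanner 01-10-2022 14.54.pdf and the second is CamScanner 01-10-2022 14.54.pdf.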

Kusalananda
  • Great and thorough answer. Thanks! – ChennyStar Jan 22 '22 at 04:34
  • To write the output to the original directories where the JPEG files were found, the for loop is not necessary: find . -type f -name '*.jpg' -exec sh -c 'img2pdf --output "${1%.jpg}.pdf" "$1"' sh {} \; – ChennyStar Jan 30 '22 at 05:13
  • @ChennyStar No, the for loop is not necessary, but it makes it much more efficient, as only a single shell is spawned for many found images. Without the loop, you run not only img2pdf for each found file, but also sh -c, which makes it slower. This would be significant if you have many images. – Kusalananda Jan 30 '22 at 06:56
  • OK, understood! Thanks! – ChennyStar Jan 30 '22 at 06:58
  • While I agree in theory, in practice the spawning of a shell for each file doesn't seem to have that much of an impact. I ran img2pdf on each .jpg file found in /usr (890 in my case; I made a copy of /usr in /tmp for these tests). Roughly 4'58" in both cases (with or without a for loop). You can try it yourself (make a copy of /usr in /tmp first): time sudo find /tmp/usr -name '*.jpg' -exec sh -c 'img2pdf --output "${1%.jpg}.pdf" "$1"' sh {} \; vs time sudo find /tmp/usr -name '*.jpg' -exec sh -c 'for f do img2pdf --output "${f%.jpg}.pdf" "$f"; done' sh {} \+ – ChennyStar Jan 30 '22 at 10:39
  • @ChennyStar I'm wondering what you have against the loop. Here's another example: Try time find / -type f -exec sh -c 'echo .' \; 2>/dev/null | wc -l to count all files on the system that your user can see, in a very inefficient way (starts sh for every file). Then try with time find / -type f -exec sh -c 'for p do echo .; done' sh {} + 2>/dev/null | wc -l. It obviously depends on what you're doing and on how many files you're doing it, but you will always find that spawning as few processes as possible will be faster. – Kusalananda Jan 30 '22 at 11:24
  • Thanks. I have nothing against the loop, I was just surprised to find the same result with and without the loop; I expected better performance with the loop. But indeed I tested your examples and the difference is huge (almost 6 minutes vs 7 seconds with the loop, for around 300,000 files). I think in my examples with img2pdf the results were the same because it was only 890 files, and img2pdf itself takes about 400ms per file, so the impact of the spawning was very low. Anyway, thanks again, I understood a few things thanks to your answer. – ChennyStar Jan 30 '22 at 11:48
  • @ChennyStar Yeah, the overhead of starting and running img2pdf drowned out the time used for starting sh. – Kusalananda Jan 30 '22 at 11:52
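
The point made in these comments can be checked on a small scale by counting how many shells get started rather than timing them. This is only a sketch: the scratch directory, the three empty .jpg files, and the variable names per_file and batched are made up for the demonstration.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a.jpg" "$tmpdir/b.jpg" "$tmpdir/c.jpg"

# Each sh -c invocation prints one line, so the line count is the shell count.
per_file=$(find "$tmpdir" -type f -name '*.jpg' -exec sh -c 'echo started' sh {} \; | wc -l)
batched=$(find "$tmpdir" -type f -name '*.jpg' -exec sh -c 'echo started' sh {} + | wc -l)

echo "shells with \\; : $per_file"
echo "shells with +  : $batched"
rm -rf "$tmpdir"
```

With \; one shell is started per file (three here), while + hands all three pathnames to a single shell.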
1

From a pragmatic point of view, once you have reached the state you describe at the end of your question:

./CamScanner 01-10-2022 14.56.jpg.pdf
./CamScanner 01-10-2022 14.55.jpg.pdf
./CamScanner 01-10-2022 14.54.jpg.pdf

You have the option of using rename to get the final file names you want:

$ rename 's/\.jpg\.pdf$/.pdf/' ./*.jpg.pdf
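
If rename is not installed, or is the util-linux variant (which does not take a Perl expression), a plain loop with mv is one possible alternative. The sketch below runs in a scratch directory with two empty stand-in files so that nothing real is touched; the variable names demo and renamed are assumptions for the example.

```shell
demo=$(mktemp -d)
touch "$demo/CamScanner 01-10-2022 14.54.jpg.pdf" \
      "$demo/CamScanner 01-10-2022 14.55.jpg.pdf"

for f in "$demo"/*.jpg.pdf; do
    [ -e "$f" ] || continue             # skip the unexpanded pattern if nothing matched
    mv -- "$f" "${f%.jpg.pdf}.pdf"      # strip .jpg.pdf from the end, then add .pdf
done

renamed=$(ls "$demo")
printf '%s\n' "$renamed"
rm -rf "$demo"
```

In a real run you would drop the scratch directory and loop over ./*.jpg.pdf in the directory holding the PDF files.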
0

@Ulrich Schwarz's comment is apt. To bring it full circle, let's assume that you don't have any filenames with quotes or single-quotes in them.

Adapt your find command to simply output the basename without the .jpg, and then use something like awk to reconstruct the img2pdf command line, adding the .jpg and .pdf extensions where appropriate.

This find command will output the bare basename:

$ find . -type f -name "*.jpg" -exec basename {} .jpg \;
CamScanner 01-10-2022 14.56
CamScanner 01-10-2022 14.55
CamScanner 01-10-2022 14.54

Now pass those basenames to awk and let awk construct the correct syntax for img2pdf:

$ find . -type f -name "*.jpg" -exec basename {} .jpg \; |
    awk '{print "img2pdf '\''" $0 ".jpg'\'' --output '\''" $0 ".pdf'\''"}'
img2pdf 'CamScanner 01-10-2022 14.56.jpg' --output 'CamScanner 01-10-2022 14.56.pdf'
img2pdf 'CamScanner 01-10-2022 14.55.jpg' --output 'CamScanner 01-10-2022 14.55.pdf'
img2pdf 'CamScanner 01-10-2022 14.54.jpg' --output 'CamScanner 01-10-2022 14.54.pdf'

If that syntax looks okay, then pipe it to your favorite shell.

Jim L.
  • 2
    Double quotes in filenames wouldn't be a problem but newline characters or leading dashes would be. With some shells, backslash or exclamation marks would be. With yash, so would sequences of bytes not forming valid characters in the locale. – Stéphane Chazelas Jan 21 '22 at 08:57