0

I am writing a one-line script that should recurse through subdirectories, find .txt files containing hyperlinks, and use wget to download the content of each link into the same directory as the text file.

Assume all text files found only contain valid hyperlinks.

To test this:
Create a subdirectory ./s1
Create a text file ./s1/s1.txt
Contents of ./s1/s1.txt: www.google.com

This is the one-liner:

find . -type f -name "*.txt" -exec bash -cx "wget -i \"{}\" -P  $(dirname \"{}\") " \;

The problem is that $(dirname \"{}\") does not expand correctly. The bash command being executed is:

+ wget -i ./s1/s1.txt -P .

So $(dirname \"{}\") returns .
The effect is that a new directory ./s1/s1.txt is created. So the downloaded file is stored as ./s1/s1.txt/index.html

When I replace $(dirname \"{}\") with $(echo \"{}\") the output becomes:

+ wget -i ./s1/s1.txt -P ./s1/s1.txt

So parameter passing as such is correct. I therefore assume that the result of dirname is not properly returned to the calling bash shell, or that dirname is not evaluated at all.

When I execute only the bash command

bash -cx "wget -i ./s1/s1.txt -P  $(dirname ./s1/s1.txt)" 

(so outside the find command) the command is executed as expected:

+ wget -i ./s1/s1.txt -P ./s1

What is the correct way to make this one-liner work?

2 Answers

2

As said in the comments, don't try to use the find placeholder {} in the bash part. It is not reliable and is a possible security issue (shell injection).

It is better to do it this way:

 find . -type f -name '*.txt' -exec sh -c '
     for file; do
         wget -i "$file" -P "$(dirname "$file")"
     done
 ' sh {} +

or using a standard parameter expansion (which besides being more efficient has the advantage of still working if the directory name ends in newline characters):

 find . -type f -name '*.txt' -exec sh -c '
     for file; do
         wget -i "$file" -P "${file%/*}"
     done
 ' sh {} +

$ tree
.
└── s1
    ├── index.html
    └── s1.txt

1 directory, 2 files
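
The difference between the two variants can be checked without running wget. A minimal sketch, where the sample paths are made up for illustration:

```shell
# Hypothetical sample path, matching the test layout from the question.
file='./s1/s1.txt'
dir_cmd=$(dirname "$file")   # runs the dirname command
dir_exp=${file%/*}           # strips the shortest trailing "/*" in the shell itself
printf '%s | %s\n' "$dir_cmd" "$dir_exp"

# A directory name ending in a newline: command substitution drops
# trailing newlines, so only the parameter expansion keeps it.
nl_file='./dir
/s1.txt'
nl_cmd=$(dirname "$nl_file")
nl_exp=${nl_file%/*}
printf '%s\n' "${#nl_cmd} vs ${#nl_exp} characters"
```

Note that ${file%/*} only matches dirname when the path contains a slash, which is always the case for paths produced by find . since they start with ./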

  • Thank you all. I accepted the solution by Stéphane Chazelas for the reasons mentioned in the comment on that answer. I am however guilty of setting up an XY problem. I posted a bash expansion problem, but actually it is a wget problem. I should not have used bash in the first place. – Johannes Linkels Apr 16 '23 at 14:44
  • @JohannesLinkels, it's not a wget problem at all. – Stéphane Chazelas Apr 16 '23 at 14:45
2

Here you could just do:

find . -name '*.txt' -type f -execdir wget -i {} -P . ';'

This uses the non-standard though very common¹ -execdir predicate of find instead of -exec, to run the command from within the directory of the found file (with {} expanding to the file's name instead of its full path, possibly prefixed with ./ in some find implementations, including GNU find).
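
To see what -execdir does without downloading anything, here is a dry-run sketch. It assumes a find that supports -execdir; the scratch directory and the printf stand-in for wget are illustrative only:

```shell
# Scratch tree mirroring the question's test case (names are illustrative).
tmp=$(mktemp -d)
mkdir "$tmp/s1"
echo 'www.google.com' > "$tmp/s1/s1.txt"

# printf stands in for wget, so nothing is downloaded.
# ${1#./} drops the ./ prefix that some find implementations add to {}.
dry_run=$(cd "$tmp" && find . -name '*.txt' -type f -execdir sh -c '
    d=$(pwd)
    printf "%s: wget -i %s -P .\n" "${d##*/}" "${1#./}"
' sh {} ';')
printf '%s\n' "$dry_run"
rm -rf "$tmp"
```

The line printed starts with the name of the directory the command ran in, which shows why -P . is enough to store the download next to the text file.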

With GNU find and xargs, you can run a few in parallel with:

xargs -r0 -n4 -P10 -a <(
  find . -name '*.txt' -type f -printf '-i\0%p\0-P\0%h\0'
  ) wget

Here find constructs the list of arguments for wget and outputs them NUL-delimited (0 being the only byte value that can't occur in an external command-line argument or file path); xargs takes 4 of them at a time and runs wget instances, up to 10 in Parallel.
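
The grouping into 4 arguments per wget call can be previewed with a dry run. This sketch assumes GNU find and xargs; the scratch directories are hypothetical, echo stands in for wget, and a pipe replaces -a <(...) so that it also runs in a plain sh:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/s1" "$tmp/s2"
: > "$tmp/s1/s1.txt"
: > "$tmp/s2/s2.txt"

# Each found file contributes exactly 4 NUL-delimited arguments:
# -i <path> -P <directory>. -P10 is left out here so that the sorted
# output is stable.
grouped=$(cd "$tmp" &&
    find . -name '*.txt' -type f -printf '-i\0%p\0-P\0%h\0' |
    xargs -r0 -n4 echo wget | sort)
printf '%s\n' "$grouped"
rm -rf "$tmp"
```

Each output line is one wget command line as xargs would have run it.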

In zsh:

for file (**/*.txt(N.)) wget -i $file -P $file:h

(add the D glob qualifier if you also want to process hidden files like in the find approach).


In your

find . -type f -name "*.txt" -exec bash -cx "wget -i \"{}\" -P  $(dirname \"{}\") " \;

The $(...) is within double quotes, so it is expanded to the output of dirname \"{}\" by the shell you're entering that command in, before the result is passed to find.

dirname \"{}\" is, in sh/bash, the same as dirname '"{}"' and, like dirname applied to any argument that contains no slash and does not start with a dash, it outputs . (a path to the current working directory).
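
The early expansion can be reproduced without find at all. A small sketch (the variable names are only for illustration) that compares what the calling shell makes of the double-quoted and single-quoted strings:

```shell
# Double quotes: the calling shell runs dirname on the literal string "{}"
# (quotes included) before find ever starts.
early="wget -i \"{}\" -P $(dirname \"{}\")"
# Single quotes: the text is passed on untouched.
late='wget -i "{}" -P $(dirname "{}")'
printf '%s\n' "$early" "$late"
```

The second string only illustrates when the expansion happens; it is not a recommendation to embed {} in shell code.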

So find is called with these arguments:

  1. find
  2. .
  3. -type
  4. f
  5. -name
  6. *.txt
  7. -exec
  8. bash
  9. -cx
  10. wget -i "{}" -P .
  11. ;

And find would run bash with these arguments:

  1. bash
  2. -cx
  3. wget -i "./path/to/the/file.txt" -P .

For each found file, and bash would in turn run wget with:

  1. wget
  2. -i
  3. ./path/to/the/file.txt
  4. -P
  5. .

But only if the path to the file doesn't contain \, ", ` or $ characters; if it did, the consequences could be disastrous (for example, if there was a file called $(rm -rf ~).txt).
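
The injection risk can be demonstrated with a harmless payload. This sketch assumes a find that substitutes {} inside a larger argument (GNU find does; POSIX leaves it unspecified), and the file name is made up for the test:

```shell
tmp=$(mktemp -d)
# Harmless stand-in for a hostile name: the file is literally called
# $(echo INJECTED).txt
: > "$tmp/\$(echo INJECTED).txt"

# {} embedded in the code: the inner shell executes the command substitution.
unsafe=$(cd "$tmp" && find . -type f -name '*.txt' \
    -exec sh -c 'printf "%s\n" "{}"' \;)
# File name passed as an argument: it is only ever treated as data.
safe=$(cd "$tmp" && find . -type f -name '*.txt' \
    -exec sh -c 'printf "%s\n" "$1"' sh {} \;)
printf 'unsafe: %s\nsafe:   %s\n' "$unsafe" "$safe"
rm -rf "$tmp"
```

In the unsafe variant the file name is run as code, whereas the safe variant prints the name unchanged.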

While using single quotes instead of double quotes:

find . -type f -name "*.txt" -exec bash -cx 'wget -i "{}" -P  "$(dirname "{}")"' \;

would have fixed it, it would still have been very wrong for the reason mentioned above. {} should never be embedded in an argument that is evaluated as code. See @Gilles' answer for how to do it properly.


¹ -execdir AFAIK is from OpenBSD added there in 1996, in FreeBSD in 1997, NetBSD in 2002, GNU find in 2005, sfind in 2010, toybox in 2014 at least.

  • I marked this as the accepted solution because it is straightforward, elegant and completely does away with executing a bash shell which was a workaround in the first place. – Johannes Linkels Apr 16 '23 at 14:34
  • In my version of wget it is not mentioned that -execdir is not standard. GNU Wget 1.21 built on linux-gnu. But maybe that is because it IS the GNU version. – Johannes Linkels Apr 16 '23 at 14:37
  • @JohannesLinkels, ITYM in my version of find? See there for the standard specification of the find utility. IIRC -execdir is from BSD, also added to GNU find a long time ago. – Stéphane Chazelas Apr 16 '23 at 14:43
  • You said it was non standard. Since I only use GNU I assumed you knew other versions. – Johannes Linkels Apr 16 '23 at 14:47
  • @JohannesLinkels, -execdir is a non-standard find option, not wget option. – Stéphane Chazelas Apr 16 '23 at 14:56
  • My bad for saying it is a wget option. It is described in the find man page. But why non-standard? – Johannes Linkels Apr 16 '23 at 18:44
  • @JohannesLinkels it's non-standard because it's not defined in the POSIX spec for the find command, which means that it's not guaranteed to be in every version of find. In practice, though, unless you're using an ancient or proprietary version of unix with their own version of find, it'll probably have -execdir. – cas Apr 16 '23 at 23:23
  • BTW, the fact that an option is non-standard also means that different implementations of that option may behave differently, because there's no defined way that it MUST behave - this is more likely to be a problem with other non-standard options in other programs than in find & -execdir. e.g. POSIX stat doesn't have a standard printf option defined, but both GNU and *BSD have implemented different, incompatible options to provide printf output formatting. Compare https://man.freebsd.org/cgi/man.cgi?stat with https://www.man7.org/linux/man-pages/man1/stat.1.html – cas Apr 16 '23 at 23:32