I'd like to process multiple files downloaded by wget -i as soon as each one finishes, instead of waiting for all files in the list (i.e. for the entire wget process to exit). The trouble is that wget downloads each file in place, so I cannot tell when a file is safe to process (fully downloaded). Ideally, the principled approach would (I believe) be for wget to download files into a temporary directory first and then mv each one into the actual destination directory on completion. Because the mv is atomic*, I could guarantee that any file present in the destination directory is completely downloaded and ready for processing.
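The staging-then-rename convention can be sketched in shell. Everything here is illustrative (directory names, the publish helper, the example URL are mine, not from wget); the staging directory is a hidden subdirectory of the destination so it sits on the same filesystem and stays out of consumers' globs. A local stand-in replaces the actual wget call so the sketch runs without a network.

```shell
#!/bin/sh
# Sketch of the staging-then-rename convention. STAGING must live on the
# same filesystem as READY so mv is a pure rename(2). A dot-directory
# inside READY guarantees that, and keeps it out of READY/*.html globs.
READY=./ready
STAGING=./ready/.staging
mkdir -p "$READY" "$STAGING"

publish() {
    # $1 = file name; the rest = a command expected to create "$STAGING/$1"
    name=$1; shift
    "$@" && mv "$STAGING/$name" "$READY/$name"
}

# Real use would look something like (hypothetical URL):
#   publish page.html wget -q -O "$STAGING/page.html" https://example.com/
# Local stand-in so the sketch is self-contained:
publish demo.html sh -c 'echo "<html></html>" > "$1"' sh "$STAGING/demo.html"
```

Any consumer that only looks at ready/*.html then sees nothing but complete files. (Note that driving one wget invocation per URL this way runs into the multiple-commands drawback mentioned in the edit at the end of the question.)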
I've been through the manpage but can't seem to find anything to this end. My current hacky approach is to use fuser to check whether wget still has the file open. But this is very fragile (what if wget opens a file multiple times?) and I'd like to avoid it.
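For reference, the fuser hack reads roughly like this (the paths and the stand-in "finished download" are mine; it inherits all the fragility described above, and needs fuser from psmisc):

```shell
#!/bin/sh
# The hack described above: consider a file complete once no process has
# it open. Guarded so the sketch degrades gracefully where fuser is absent.
mkdir -p dl
echo '<html></html>' > dl/page.html    # stand-in for a finished download
if command -v fuser >/dev/null 2>&1; then
    # fuser -s is silent and exits 0 iff some process has the file open
    fuser -s dl/page.html || echo "dl/page.html appears complete"
fi
```

The failure mode is exactly the one noted above: "no process has it open" is not the same as "wget is done with it".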
If there isn't a way to achieve this exactly, is there a workaround that can achieve the same effect? The files are HTML pages if that's at all relevant.
*Addendum: Apparently mv may not be atomic in general (although it is in my environment), but I don't think strict atomicity is needed. The only requirement is that once a file is renamed into the destination directory, it is completely downloaded (and its complete contents are immediately available at the new path).
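On the atomicity point: mv is a single rename(2) only when source and destination are on the same filesystem; across filesystems it falls back to copy-then-unlink, during which a reader could observe a partial file. A same-filesystem sketch (directory names are mine) showing the guarantee the question actually needs:

```shell
#!/bin/sh
# When staging/ and final/ share a filesystem, mv is one rename(2):
# final/f either does not exist yet or holds the complete contents.
mkdir -p staging final
echo 'payload' > staging/f
mv staging/f final/f
cat final/f    # prints "payload"
```

Keeping the temporary directory inside (or alongside) the destination directory is the usual way to ensure both paths share a filesystem.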
edit: Splitting the process up into multiple wget commands is also not ideal, because it precludes some of wget's core features (rate limiting, HTTP keepalive, DNS caching, etc.).
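One workaround that keeps a single wget -i process: let wget download in place, and have the consumer wait for close_write events, which inotify raises when wget closes a finished file. This assumes inotify-tools is installed; the background writer below is a stand-in for the real wget so the sketch runs without a network, and all names are illustrative.

```shell
#!/bin/sh
# Watch the download dir for close_write: wget closes each file only
# once it is fully written. Stand-in producer instead of:
#   wget -q -P watched -i urls.txt &
mkdir -p watched
( sleep 1; echo '<html></html>' > watched/a.html ) &
if command -v inotifywait >/dev/null 2>&1; then
    # exits after the first event; -t 10 so the sketch always terminates
    inotifywait -q -t 10 -e close_write --format '%w%f' watched
fi
wait
```

In real use you would run inotifywait -m (monitor mode) and pipe its output into a while read loop that processes each completed file, which sidesteps the fuser polling entirely while leaving wget's keepalive and DNS caching intact.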
Comments:

…fuser just for the last is less fragile, but bad). I guess I could add a sentinel dummy file URL, but it feels like there should be a better way! – Bailey Parker May 15 '19 at 14:33

wget [url] -O /dev/stdout | [next step of your process] – Httqm May 15 '19 at 14:39

…wget -i to download multiple URLs here. – Bailey Parker May 15 '19 at 14:41

…wget process -- hopefully only the one you're interested in; or the wget sequence could touch a file at the end, which your monitoring process then knows to skip; etc. – Jeff Schaller May 15 '19 at 14:46

…wget's PID to the monitoring process to get the right one. This removes the ability to have a dependency between wget and the process, though. For example, you can't do wget -i $(./monitoring-process) and have it emit new files to download. I'm probably pushing against the limits of what I should be doing here (before I should just throw everything into a script). I'm trying to lean on wget as much as possible, because it does its job well! – Bailey Parker May 15 '19 at 14:55
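The sentinel idea raised in the comments (touch a marker file once the whole wget sequence finishes) can be sketched like this, again with a stand-in producer instead of the real wget run, and with directory and file names of my choosing:

```shell
#!/bin/sh
# Run the whole wget sequence, then touch a marker the monitor keys on.
# Stand-in for:  ( wget -q -P batch -i urls.txt; touch batch/.done ) &
mkdir -p batch
( echo '<html></html>' > batch/a.html; touch batch/.done ) &
# The monitor polls for the sentinel; files are only trusted once it exists.
until [ -e batch/.done ]; do sleep 1; done
echo "batch complete: $(ls batch/*.html)"
```

As the comments note, this only marks the end of the whole batch (or needs a sentinel after each URL), so by itself it does not make individual in-progress files safe to process early.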