4

TL;DR:

Q: How to keep a counter in find -exec loop ?


My use-case:

I need to move a lot of directories which are scattered around the place, so I do

find . -type d -name "prefix_*" \
    -exec sh -c '
        new_path="/new/path/$(basename "$1")";
        [ -d "$new_path" ] || mv "$1" "$new_path";
    ' find_sh {} \;

(The real command is more complex, as I read some metadata to construct /new/path. Anyway, I do not want to argue about the command itself; it's not part of the question, just the use-case.)

It works just fine, but it takes quite a while and I want to keep track of the progress.

So I added a counter writing to a file:

i=$(cat ~/find_increment || echo 0);
echo $((i+1)) | tee ~/find_increment;

That also works just fine, but some 100,000 disk read and write operations feel like a really bad idea.

I thought about writing to ramdisk instead of disk, but I don't have that option in the environment I need to perform that task.

Is there a better way to keep a counter between -exec runs ?

pLumo
  • 22,565
  • What is find_sh in the first command? So basically you want to know the number of new paths created? – Inian May 10 '19 at 12:20
  • Your 100,000 disk operations won’t result in 100,000 reads from or writes to the hardware; most operations will take place in the cache. – Stephen Kitt May 10 '19 at 12:22
  • No, I want to track how much has been done while it is running, e.g. to print a percentage. find_sh is just the placeholder for $0, which is how to use sh -c safely. – pLumo May 10 '19 at 12:22
  • hmmmm .... I could probably use /dev/shm ... ?! – pLumo May 10 '19 at 12:23
  • @RoVo that would be an option, but what I’m saying is that there’s no need to worry about it, the kernel will take care of reducing the number of real disk operations. You can try it yourself by issuing 100,000 writes and watching the real block device activity (using blktrace). – Stephen Kitt May 10 '19 at 12:38

2 Answers

3

Instead of using a pure find command, you could combine find with a while read loop or GNU parallel. Both give you a job counter without a counter file, and the while read loop is likely to be faster than find's -exec sh -c since it doesn't start a new shell for every path found by find.

Solution Using GNU Parallel

GNU parallel has the following benefits compared to while read:

  • Easier to get right. No IFS= and -r needed.
  • Built-in job number variable {#}.
    For more handy replacement strings have a look at the tutorial.
  • Easy to parallelize if needed.
    Remove the -j1 and you have as many workers as cores by default.
script='
    echo Processing job number {#}
    new_path="/new/path/$(basename {})"
    [ -d "$new_path" ] || mv {} "$new_path"
'
find … -print0 | parallel -0 -j1 "$script"

parallel replaces {} with the correctly quoted entry read from stdin, so do not quote {} again.

parallel executes the script with the same shell from which you started it. If you started parallel in bash you can use bash features in the script.

Solution Using While Read

find … -print0 |
while IFS= read -r -d '' old_path; do
    echo Processing job number "$((++job))"
    new_path="/new/path/$(basename "$old_path")"
    [ -d "$new_path" ] || mv "$old_path" "$new_path"
done
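
Since the goal in the question is to print a percentage, the loop above could be extended roughly as follows. This is only a sketch under assumptions: the temporary directories `src` and `dst` and the three sample directories are hypothetical stand-ins for the real tree, and it requires bash (for `read -d ''` and process substitution). It counts the matches in a first pass, then feeds the loop through process substitution so that `job` is still set after the loop ends.

```shell
#!/usr/bin/env bash
# Hypothetical demo tree; replace src/dst with the real search root and target.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/a/prefix_one" "$src/b/prefix_two" "$src/c/prefix_three x"

# First pass: count the matches by counting the NUL terminators find prints.
# The arithmetic expansion strips any padding that wc may add.
total=$(( $(find "$src" -type d -name 'prefix_*' -print0 | tr -cd '\0' | wc -c) ))

job=0
# Process substitution keeps the loop in the current shell,
# so the final value of $job is still available afterwards.
while IFS= read -r -d '' old_path; do
    job=$((job + 1))
    printf 'Processing %d of %d (%d%%)\n' "$job" "$total" "$((100 * job / total))"
    new_path="$dst/$(basename "$old_path")"
    [ -d "$new_path" ] || mv "$old_path" "$new_path"
done < <(find "$src" -type d -name 'prefix_*' -print0)
```

The extra find pass costs some time up front, but it is usually cheap compared to the moves themselves.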
Socowi
  • 625
  • This works very well and will be the way to go for me most of the time. Unfortunately, parallel is not available in the environment I need to use it in right now :-(. – pLumo May 10 '19 at 13:07
  • Glad I could help. I added the while read version for use in your environment without parallel. – Socowi May 10 '19 at 13:14
  • @RoVo can you elaborate on why GNU Parallel is not available? Is it covered in: https://oletange.wordpress.com/2018/03/28/excuses-for-not-installing-gnu-parallel/ – Ole Tange May 11 '19 at 15:29
  • @Socowi Instead of putting the script in a variable (which can result in quoting madness), consider using a bash function instead. Just remember to export -f the function before using it in GNU Parallel. – Ole Tange May 11 '19 at 15:31
  • 1
    To see progress, also check out --bar. In this case I think it might be more useful than the echo. – Ole Tange May 11 '19 at 15:32
0

If available, store the counter in /dev/shm/ to prevent disk writes.

=> Use /dev/shm/find_increment instead of ~/find_increment.
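
A minimal sketch of how that might look with the original find -exec approach. The demo tree under `src`, the `mktemp` file name, and the fallback to a regular temporary file when /dev/shm is missing are assumptions for illustration; the counter path is passed to the inner shell as `$1` and the found directory as `$2`.

```shell
#!/usr/bin/env bash
# Hypothetical demo tree; swap in the real search root.
src=$(mktemp -d)
mkdir -p "$src/a/prefix_one" "$src/b/prefix_two"

# Prefer the RAM-backed /dev/shm when it exists and is writable;
# otherwise fall back to a regular temporary file (an assumption).
if [ -d /dev/shm ] && [ -w /dev/shm ]; then
    counter=$(mktemp /dev/shm/find_increment.XXXXXX)
else
    counter=$(mktemp)
fi
echo 0 > "$counter"

# $1 is the counter file, $2 is the directory found by find.
find "$src" -type d -name 'prefix_*' -exec sh -c '
    i=$(cat "$1")
    echo $((i + 1)) > "$1"
    # ... the real work on "$2" goes here ...
' find_sh "$counter" {} \;

count=$(cat "$counter")
rm -f "$counter"
echo "Processed $count directories"
```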

pLumo
  • 22,565