tl;dr I would like to reserve (or "claim"?) some amount of disk space before an rsync occurs so other rsync instances will only run if the disk space needed will certainly be available.
background
A job (a shell script that runs rsync) will:
- use rsync to copy a large amount of data from a source disk to a different destination disk
- do some work using the copied data
- remove the copied data
Multiple instances of the job script may run simultaneously.
In my case, once in a while, many job scripts run rsync simultaneously and use up all available disk space. All of the rsync instances then fail (and so the jobs fail).
pseudo-code
Here is the algorithm I'm imagining:
$job = get_next_incoming_job()
$disk_dst = $job.disk_dst() # destination disk for rsync
$space_need = $job.calculate_space_needed()
_check_space: # jump label
if $space_need > space_available($disk_dst) then
    sleep $RANDOM
    goto _check_space
$handle = reserve_space($disk_dst, $space_need) # How??
# rsync will "fill-in" the reserved space - How??
rsync $job.source_data_path() $disk_dst/$job.ID/
do work using $disk_dst/$job.ID/
remove $disk_dst/$job.ID/
release_reserved_space($handle) # How??
The magic function reserve_space would instantly change the free space reported for $disk_dst (the value returned by space_available). Other rsync job instances would see space_available() return less space right away (and thus delay their work until later).
Currently, space_available() (via the actual program df) returns a declining number while rsync instances run. The problem is that multiple rsync instances can run out of space while running. I'd like the rsync instances to run only when it is certain they can complete (i.e. not run out of disk space while running).
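
For concreteness, here is a minimal sketch of what the current check amounts to, assuming space_available() is a thin wrapper around df. The helper wait_for_space and the POLL_SECONDS variable are hypothetical names used only for illustration, and the units (1024-byte blocks) are an assumption. Note that this only checks; it does not reserve anything, so two jobs can both pass the check at the same moment and then run out of space together, which is exactly the race described above.

```shell
#!/bin/sh
# space_available DIR
# Print the free space, in 1024-byte blocks, of the filesystem holding DIR.
# Uses POSIX output format (-P) so each filesystem is reported on one line;
# the fourth column of that line is the "Available" figure.
space_available() {
    df -Pk -- "$1" | awk 'NR == 2 { print $4 }'
}

# wait_for_space NEED_KIB DIR   (hypothetical helper, not a real reservation)
# Poll until DIR's filesystem reports at least NEED_KIB blocks free.
# Nothing is claimed here: another job may consume the space right after
# this function returns.
wait_for_space() {
    while [ "$(space_available "$2")" -lt "$1" ]; do
        sleep "${POLL_SECONDS:-30}"
    done
}
```

A job script would call something like `wait_for_space "$space_need" "$disk_dst"` just before its rsync; the missing piece is still a reserve_space step that makes the claimed space invisible to the other jobs.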