
I have a directory with thousands of files.

I need to sort the files in descending order of creation date (to make sure that the newest files won't be deleted) and then sum the sizes of these files until the total reaches a certain limit (for example, 10GB).

Once that limit is reached, I need to delete all the remaining files (already sorted in descending order) that come after those 10GB of files.

So, after the operation, the contents of my directory should not exceed 10GB in total size, but the newest files must remain.

I need to accomplish this without using GAWK, since I don't have a GNU system.

Is this doable with the find command only?

  • Rather than noting you don't have a GNU system, note what you do have, otherwise there's no point in asking the question. OSX? NetBSD? OpenIndiana? OpenBSD? Irix? Perl can do this easily, among others. Most modern BSDs at least support installing GNU tools. Specify shell too while you're at it. find and bash can do it too, but it will be much slower. – Lizardx Sep 15 '21 at 21:27
  • Hi @Lizardx, I am totally new to this.

    So I need to write a shell script that allows me to accomplish the above.

    I am trying to write some code with the awk command, but I am not able to achieve what I need.

    – Charbel Sep 17 '21 at 13:08
  • 1
    You have to provide the requested information. I believe the answer with zsh below is probably assuming correctly that you are using OSX, since otherwise you'd know what you are using and would have posted it. That's a good answer, it is pretty much exactly how I would have done it in Perl as well. – Lizardx Sep 17 '21 at 21:35
  • 1
    awk is really difficult to use, I don't recommend it for a new user, particularly not for this scenario, and you'd almost certainly need to be running subshells and other complicated situations as well, I'd put your odds of success with awk as close to zero. If you're going to copy paste code, copy paste what the person below supplied, be sure you test it with the print -r in place!!! otherwise you could be VERY sad. – Lizardx Sep 17 '21 at 21:42
  • Thank you!! What you provided is good. – Charbel Sep 18 '21 at 02:46
  • 1
    @AdminBee that is correct, the newest files should NOT be deleted – Charbel Sep 24 '21 at 12:06

1 Answer


With zsh, and on systems and filesystems where the st_blocks attribute returned by the lstat() system call is expressed in units of 512 bytes (most are):

#! /usr/bin/env zsh
zmodload zsh/stat || exit
zmodload zsh/files || exit # for a builtin rm as well.

disk_usage=0 threshold=$(( 10 * 2**30 ))

set -- *(ND.om)
for f do
  stat -LA blocks +block -- $f &&
    (( (disk_usage += blocks * 512) > threshold )) &&
    break
  shift
done
(( $# == 0 )) || print -r rm -f -- "$@"

(remove the print -r to actually do it).
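
For instance, a hypothetical dry run (the script name, directory path, and file names below are made up for illustration) would print the rm command it is about to run rather than executing it:

$ cd /path/to/directory && zsh trim-to-10g.zsh
rm -f -- old-video-07.mp4 old-video-06.mp4 archive-2019.tar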

Note that the cumulative disk usage only counts regular files, and that if there are several hard links to the same file, its size will be counted once per link.
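
If double-counting hard links is a concern, one possible variant (an untested sketch, not part of the original answer) is to remember each device:inode pair in an associative array and only add a file's blocks to the running total the first time that pair is seen:

#! /usr/bin/env zsh
zmodload zsh/stat || exit
zmodload zsh/files || exit # for a builtin rm as well.

typeset -A seen    # device:inode pairs whose size has already been counted
disk_usage=0 threshold=$(( 10 * 2**30 ))

set -- *(ND.om)
for f do
  if stat -LH s -- $f; then
    key=$s[device]:$s[inode]
    if (( ! ${+seen[$key]} )); then
      seen[$key]=1
      # count this inode once, then stop as soon as the threshold is crossed
      (( (disk_usage += $s[block] * 512) > threshold )) && break
    fi
  fi
  shift
done
(( $# == 0 )) || print -r rm -f -- "$@"

Here additional links to an already-counted inode are simply shifted past (kept), since deleting them would not free any space anyway.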

  • This is a pretty good answer assuming the user is running OSX, which they probably are (my guess), since from what I gather OSX switched to zsh as the default a while back. The logic is also fairly trivial to map to Perl, assuming a basic understanding of modules etc. Also, I upvoted it because no attempt was made to construct a clever one-liner or something to do it. – Lizardx Sep 17 '21 at 21:39
  • @Lizardx, not quite, AFAIK: while macOS recently switched to zsh for the default user login shell, it has always shipped with zsh. /bin/sh even used to be zsh in earlier versions. – Stéphane Chazelas Sep 17 '21 at 21:45
  • Oh, that's good to know; I don't keep up on this, so that's probably the item I ran across. Assuming Apple OSX, this is a good answer, and your odds of having assumed the right thing are very high in this case. For other systems, not as good, since they often will not ship with zsh by default. /bin/sh being zsh, ouch, lol. – Lizardx Sep 17 '21 at 22:57
  • 1
    There is a slight risk there of assuming block size is 512B, it doesn't have to be, even though it often and usually is. I'd rather just get the size in whatever the OS gives me in KiB or B or MiB or whatever. Perl -s gives it bits which is pretty unambiguous, though of course without the block size data, you can't actually know how much room a file is truly occupying. – Lizardx Sep 17 '21 at 23:02
  • @Lizardx, you're right, I always assumed it was in 512 byte units, but it looks like it's not always true. I've asked a question about it. – Stéphane Chazelas Sep 18 '21 at 07:06
  • I have to deal with this issue in a tool I make, in fact, I had to shut off some output for size data for a recent issue because the tool could get the block count but not the block size in certain arcane situations. Unit size is an annoyingly erratic measurement which you'd think everyone would get consistent but they don't. Even finding authoritative statements about the unit size used in Linux /sys data, /proc data, etc, is a challenge, though there it's almost always in KiB, but it's hard to find the actual core documentation for that. – Lizardx Sep 18 '21 at 07:51
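
Following up on the block-size discussion above: if relying on st_blocks being in 512-byte units is a concern, a possible variant (an untested sketch, not from the original answer) is to sum the apparent file size, st_size, instead, at the cost of no longer measuring actual disk usage:

#! /usr/bin/env zsh
zmodload zsh/stat || exit
zmodload zsh/files || exit # for a builtin rm as well.

disk_usage=0 threshold=$(( 10 * 2**30 ))

set -- *(ND.om)
for f do
  # +size asks the zsh stat builtin for st_size in bytes,
  # so no assumption about the block unit is needed
  stat -LA bytes +size -- $f &&
    (( (disk_usage += bytes) > threshold )) &&
    break
  shift
done
(( $# == 0 )) || print -r rm -f -- "$@"

The trade-off is that st_size is the apparent size, so sparse files and filesystem overhead are not reflected, and the directory may end up occupying somewhat more or less than 10GB on disk.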