How can I execute a command only if a certain file exceeds a defined size? In the end, both should run as a one-liner in crontab.
Pseudocode:
* * * * * find /cache/myfile.csv -size +5G && echo "file is > 5GB"
If you have GNU stat, you can use its --printf option to get the file's size, e.g.:
size=$(stat --printf '%s' /cache/myfile.csv)
if [ "$size" -gt 5368709120 ] ; then # 5 GiB = 5 * 1024 * 1024 * 1024
echo "file is > 5GB"
fi
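Since the question asks for a crontab one-liner, the check above can be collapsed onto a single line. This is only a minimal sketch, assuming GNU stat; exceeds is a hypothetical helper name, introduced so the test can be tried interactively. One thing to watch for: cron treats an unescaped % in the command field as a newline, so the format string has to be written as \%s inside a crontab.

```shell
# Hypothetical helper: succeeds when FILE ($1) is larger than LIMIT ($2) bytes.
# Assumes GNU stat; a missing or unreadable file is treated as size 0.
exceeds() {
    [ "$(stat --printf '%s' -- "$1" 2>/dev/null || echo 0)" -gt "$2" ]
}

# One-line form of the same test:
exceeds /cache/myfile.csv 5368709120 && echo "file is > 5GB"

# Equivalent crontab entry (illustration only; note the escaped percent sign):
# * * * * * [ "$(stat --printf '\%s' /cache/myfile.csv)" -gt 5368709120 ] && echo "file is > 5GB"
```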
See man stat for details.
BSD's stat (e.g. on FreeBSD and on Mac) has a similar formatting option, -f:
size=$(stat -f '%z' /cache/myfile.csv)
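If the same script has to run on both kinds of system, the two invocations can be tried in turn. A rough sketch, assuming that BSD stat rejects the -c option with a non-zero exit status; file_size is a hypothetical name, not part of either tool:

```shell
# Hypothetical helper: print FILE's size in bytes using whichever stat is installed.
file_size() {
    # Try the GNU/busybox form first (-c FORMAT, %s = size in bytes).
    if stat -c '%s' "$1" 2>/dev/null; then
        return 0
    fi
    # Fall back to the BSD/macOS form (-f FORMAT, %z = size in bytes).
    stat -f '%z' "$1" 2>/dev/null
}

size=$(file_size /cache/myfile.csv)
```

The GNU form is tried first because GNU stat also accepts -f, but with a different meaning (file system status), so the order matters.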
Alternatively, you could use perl's built-in stat function, or its -s file test operator (which is similar to bash's -s file test, but returns the file's size rather than just true if the file exists and is non-empty).
perl's stat function returns a 13-element list of metadata about a file, containing the following fields (copied from perldoc -f stat):
[...] Not all fields are supported on all filesystem types. Here are the meanings of the fields:
     0 dev      device number of filesystem
     1 ino      inode number
     2 mode     file mode (type and permissions)
     3 nlink    number of (hard) links to the file
     4 uid      numeric user ID of file's owner
     5 gid      numeric group ID of file's owner
     6 rdev     the device identifier (special files only)
     7 size     total size of file, in bytes
     8 atime    last access time in seconds since the epoch
     9 mtime    last modify time in seconds since the epoch
    10 ctime    inode change time in seconds since the epoch (*)
    11 blksize  preferred I/O size in bytes for interacting with the file (may vary from file to file)
    12 blocks   actual number of system-specific blocks allocated on disk (often, but not always, 512 bytes each)
(The epoch was at 00:00 January 1, 1970 GMT.)
Field 7 is the one we need.
To return the file's size (for later use in a shell command or script):
# using stat
perl -e 'print scalar((stat(shift))[7])' /cache/myfile.csv
# using -s
perl -e 'print -s shift' /cache/myfile.csv
Or to do it all in perl:
# using stat
perl -e 'print "File is > 5 GiB\n" if (stat(shift))[7] > 5*1024*1024*1024' /cache/myfile.csv
# using -s
perl -e 'print "File is > 5 GiB\n" if -s shift > 5*1024*1024*1024' /cache/myfile.csv
See perldoc -f stat and perldoc -f -X (as well as help test in bash).
BTW, perl's shift function removes the first element of an array (by default @ARGV, the array of command-line args, if not specified) and returns its value. It's often used in a loop to process all elements of an array, but here we're only interested in the first arg (the filename). See perldoc -f shift for details, including notes on lexical scope and use in a subroutine.
stat, not find (and not ls either). Part of our job when answering a question is to tell people when they're using the wrong tool or asking the wrong question, to find the underlying task hidden beneath the XY Problem.
– cas
May 16 '23 at 13:45
find is the correct tool (albeit not with "+5G" as the argument to -size).
– Kusalananda
May 17 '23 at 13:08
linux, and linux means GNU tools on everything but tiny distros with only busybox available (and even busybox stat has a -c formatting option with %s meaning size in bytes, just like GNU stat). More to the point, find is the wrong tool for getting metadata about a file such as the file's size. That's stat's job; it's what it's for. If stat didn't have formatting options, the next best option would not be find but perl with its built-in stat() function, because that's a trivial one-liner compared to a dozen or so lines in C.
– cas
May 17 '23 at 15:48
perl -e 'print scalar((stat(shift))[7])' /cache/myfile.csv
– cas
May 17 '23 at 15:57
-s file test returns the size of the file (bash's -s test only returns true if the file exists and is not empty, false otherwise), so extracting the size from the list returned by stat() isn't necessary. e.g. perl -e 'print -s shift' filename to output the size for use in shell, or do it all in perl with perl -e 'print "File is > 5GB\n" if -s shift > 5*1024*1024*1024' filename. See perldoc -f -X for docs on perl's file tests (and help test in bash for bash's file tests).
– cas
May 17 '23 at 23:03
To use the file size as a precondition you can use stat or find:
[ -n "$(find /cache/myfile.csv -prune -size +5G 2>/dev/null)" ] && echo "file is > 5GB"
Or, if the target command (echo, here) is short, put it into the -exec part of find:
find /cache/myfile.csv -prune -size +5G -exec echo "file is > 5GB" \;
The -prune is in case myfile.csv is a directory, to prevent find from descending into it.
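The G suffix for -size is not something every find implementation accepts; the c suffix (size in bytes) is the unit POSIX specifies. A minimal sketch of the same precondition written with a byte count, wrapped in a hypothetical function named run_if_larger so the command to run can be passed in:

```shell
# Hypothetical helper: run the remaining arguments as a command
# when FILE ($1) is larger than BYTES ($2).
run_if_larger() {
    file=$1 bytes=$2
    shift 2
    # -size +Nc matches when the size in bytes is greater than N.
    # find prints the path only on a match, so non-empty output means "larger".
    [ -n "$(find "$file" -prune -size "+${bytes}c" 2>/dev/null)" ] && "$@"
}

run_if_larger /cache/myfile.csv 5368709120 echo "file is > 5GiB"
```

A missing file makes find print nothing on standard output, so the command is simply not run.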
If you need to process the files in a shell, both versions below run the shell code only if all conditions are met: the entry is a regular file, is named myfile.csv, and is larger than 5G:
find /cache -name 'myfile.csv' -type f -size +5G -exec bash -c '
echo "$1 is > 5GB"
' bash {} \;
or
find /cache -name 'myfile.csv' -type f -size +5G -exec bash -c '
for file; do echo "$file is > 5GB"; done
' bash {} +
find .... +5G && start.sh. So, only start the 2nd command if the find command found the file which was above a certain size.
– membersound
May 16 '23 at 13:15
stat instead.
– cas
May 16 '23 at 13:23
/cache/myfile isn't a directory, neither command in the answer will do much iterating. Using find is about the only portable way of conditionally executing a command based on the size of a file.
– Kusalananda
May 16 '23 at 13:25
wc -c can get the size of a file portably (though not always as efficiently in the wc implementations that don't do optimisations when the size of the file can be obtained other than by reading it).
– Stéphane Chazelas
May 17 '23 at 16:43
Note that some shells have the feature built-in.
SHELL=/bin/tcsh
* * * * * if (-Z /cache/myfile.csv > 5*1024*1024*1024) echo 'file is > 5GiB'
Or with zsh, here using glob qualifiers and an anonymous function, though zsh also has a stat builtin that predates both GNU and BSD stat:
SHELL=/bin/zsh
* * * * * (){ if (($#)) echo 'file is > 5GiB'; } /cache/myfile.csv(NLG+5)
(Note that, as with find -size +5G, we're talking about gibibytes (1 GiB = 1,073,741,824 bytes) here, not gigabytes (1 GB = 1,000,000,000 bytes).)
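To make the difference concrete, the two thresholds can be computed with shell arithmetic. This assumes a shell with 64-bit integer arithmetic, which is the usual case on current systems:

```shell
gib=$((5 * 1024 * 1024 * 1024))   # 5 GiB in bytes
gb=$((5 * 1000 * 1000 * 1000))    # 5 GB in bytes
echo "$gib $gb $((gib - gb))"     # prints: 5368709120 5000000000 368709120
```

So a file between 5 GB and 5 GiB in size (a window of roughly 369 million bytes) would not trigger the checks above.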
For symlinks, tcsh will get the size of the file the symlink eventually resolves to, while zsh's LG+5 qualifier, like find's -size, will check the size of the symlink itself. Change to -LG+5 to check the size after symlink resolution. zsh's stat builtin gives you information after symlink resolution by default; use -L to change that. In GNU and BSD stat, that's reversed. Same with find, where -L tells it to follow symlinks.
For more ways to get the size of a file, see How can I get the size of a file in a bash script?
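One of those ways is wc -c, which is available everywhere but may have to read the whole file if the implementation does not take a shortcut, so it can be slow for a multi-gigabyte file. A hedged sketch for a plain POSIX sh; larger_than is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds when FILE ($1) is larger than LIMIT ($2) bytes.
# Returns 2 if the file cannot be read.
larger_than() {
    size=$( { wc -c < "$1"; } 2>/dev/null ) || return 2
    # $((size)) normalises any leading blanks some wc implementations print.
    [ "$((size))" -gt "$2" ]
}

larger_than /cache/myfile.csv 5368709120 && echo 'file is > 5GiB'
```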