On systems (and file systems) supporting the SEEK_HOLE lseek flag (as your Ubuntu 12.04 on ext4 would), and assuming the value for SEEK_HOLE is 4 as it is on Linux:
if perl -le 'seek STDIN,0,4;$p=tell STDIN;
seek STDIN,0,2; exit 1 if $p == tell STDIN'< the-file; then
echo the-file is sparse
else
echo the-file is not sparse
fi
That shell syntax is POSIX. The non-portable parts are perl and that SEEK_HOLE.
lseek(SEEK_HOLE) seeks to the start of the first hole in the file, or to the end of the file if no hole is found. Above, we know the file is not sparse when lseek(SEEK_HOLE) takes us to the end of the file (to the same place as lseek(SEEK_END)).
If you want to list the sparse files:
find . -type f ! -size 0 -exec perl -le 'for(@ARGV){open(A,"<",$_)or
next;seek A,0,4;$p=tell A;seek A,0,2;print if$p!=tell A;close A}' {} +
GNU find (since version 4.3.3) has -printf %S to report the sparseness of a file. It takes the same approach as frostschutz' answer in that it takes the ratio of disk usage to file size, so it is not guaranteed to report all sparse files (for instance when there's compression at the file system level, or where the space saved by the holes doesn't compensate for the file system infrastructure overhead or large extended attributes), but it would work on systems that don't have SEEK_HOLE or on file systems where SEEK_HOLE is not implemented. Here with GNU tools:
LC_ALL=C find . -type f ! -size 0 -printf '%S:%p\0' |
LC_ALL=C awk -v RS='\0' -F : '$1 < 1 {sub(/^[^:]*:/, ""); print}'
(Note that an earlier version of this answer didn't work properly when find expressed the sparseness as, for instance, 3.2e-05. Thanks to @flashydave's answer for bringing it to my attention. LC_ALL=C is needed for the decimal radix to be . instead of the locale's one; not all awk implementations honour the locale's setting.)
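For a single file, the comparison behind -printf %S can be approximated by hand. What follows is a rough sketch, assuming GNU stat (for -c and its %b, %B and %s formats); disk_usage_below_size is a hypothetical name. It uses integer arithmetic only, so neither the decimal radix nor the exponent notation comes into play.

```shell
# Hypothetical helper: succeeds if the file given as $1 takes less disk
# space than its size, which is roughly what a %S value below 1 means.
# The body runs in a subshell so the positional parameters and $out
# do not leak into the caller.
disk_usage_below_size() (
  # %b: allocated blocks, %B: size in bytes of each of those blocks,
  # %s: file size in bytes
  out=$(stat -c '%b %B %s' -- "$1") || exit 2
  set -- $out
  # skip empty files, as ! -size 0 does in the find command
  [ "$3" -gt 0 ] && [ "$(($1 * $2))" -lt "$3" ]
)
```

It could be used as disk_usage_below_size the-file && echo the-file looks sparse. It shares the limitations of the ratio approach described above.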