Edit 2015: as of util-linux 2.25, the fallocate utility on Linux has a -d/--dig-holes option for that.
fallocate -d the-file
This digs a hole for every block full of zeros in the file.
On older systems, you can do it by hand:
Linux has a FALLOC_FL_PUNCH_HOLE flag for the fallocate system call that can do this. I found a script on GitHub with an example:
Using FALLOC_FL_PUNCH_HOLE from Python
I modified it a bit to do what you asked -- punch holes in regions of files that are filled with zeros. Here it is:
Using FALLOC_FL_PUNCH_HOLE from Python to punch holes in files
usage: punch.py [-h] [-v VERBOSE] FILE [FILE ...]

Punch out the empty areas in a file, making it sparse

positional arguments:
  FILE                  file(s) to modify in-place

optional arguments:
  -h, --help            show this help message and exit
  -v VERBOSE, --verbose VERBOSE
                        be verbose
Example:
# create a file with some data, a hole, and some more data
$ dd if=/dev/urandom of=test1 bs=4096 count=1 seek=0
$ dd if=/dev/urandom of=test1 bs=4096 count=1 seek=2
# see that it has holes
$ du --block-size=1 --apparent-size test1
12288 test1
$ du --block-size=1 test1
8192 test1
# copy it with cat, which writes the hole out as real zeros
$ cat test1 > test2
$ du --block-size=1 --apparent-size test2
12288 test2
$ du --block-size=1 test2
12288 test2
# punch holes again
$ ./punch.py test2
$ du --block-size=1 --apparent-size test2
12288 test2
$ du --block-size=1 test2
8192 test2
# verify
$ cmp test1 test2 && echo "files are the same"
files are the same
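The du comparison above can also be done from Python. This is a minimal sketch, assuming Linux, where st_blocks counts 512-byte units. The helper names sizes and is_sparse are mine, not part of punch.py:

```python
import os


def sizes(path):
    """Return (apparent_size, allocated_size) in bytes for path."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units, as it is on Linux.
    return st.st_size, st.st_blocks * 512


def is_sparse(path):
    """True if the file occupies less disk space than its apparent size."""
    apparent, allocated = sizes(path)
    return allocated < apparent
```

For the example above, sizes("test1") should report the same two numbers as the two du invocations, provided the filesystem supports holes.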
Note that punch.py only finds blocks of 4096 bytes to punch out, so it might not make a file exactly as sparse as it was when you started. It could be made smarter, of course. Also, it's only lightly tested, so be careful and make backups before trusting it!
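In case the linked script is out of reach, here is a rough sketch of the same technique. It is not the script itself, just an illustration under some assumptions: 64-bit Linux, a libc that exports fallocate, and a filesystem that supports hole punching. The names zero_blocks, punch_hole and make_sparse are made up for this sketch. Like punch.py, it only looks at whole 4096-byte blocks.

```python
import ctypes
import os

# Flag values from <linux/falloc.h>
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02
BLOCK_SIZE = 4096


def zero_blocks(path, block_size=BLOCK_SIZE):
    """Yield (offset, length) for every full block containing only zero bytes."""
    zeros = bytes(block_size)
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            # A short trailing block never compares equal, so it is skipped.
            if block == zeros:
                yield offset, block_size
            offset += len(block)


def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) of fd without changing its size."""
    # Assumption: dlopen(NULL) exposes libc's fallocate, and off_t is 64 bits.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    # PUNCH_HOLE must always be combined with KEEP_SIZE.
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def make_sparse(path, block_size=BLOCK_SIZE):
    """Punch a hole over every all-zero block; return how many were punched."""
    holes = list(zero_blocks(path, block_size))
    fd = os.open(path, os.O_RDWR)
    try:
        for offset, length in holes:
            punch_hole(fd, offset, length)
    finally:
        os.close(fd)
    return len(holes)
```

Calling make_sparse("test2") would be the rough equivalent of ./punch.py test2. Since a punched range reads back as zeros, the file contents stay the same and only the allocation changes. On a filesystem without hole-punching support the call raises OSError instead.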
rsync -aS. – Gilles 'SO- stop being evil' Oct 16 '12 at 22:05