What are functions to manipulate sparse files under Linux?

Question

What functions are available for manipulating sparse files under Linux (let's say in C; notes about other systems are highly welcome)? For example, to:

make a hole inside a file by removing part of its contents
investigate its structure, e.g. generate a sequence of pairs marking the beginnings and ends of separate contiguous blocks of data
split a file into two at some point by reassigning a range of blocks (i.e. without moving the actual data)
investigate inodes and other relevant aspects (is it perhaps possible to assign some blocks to multiple files in a copy-on-write manner?)

Context:

The original question that came to my mind, and that brought me here, arose from reading about the --sparse option in man rsync:

Why does rsync's --sparse option conflict with --inplace?

Is it a limitation of the file system call API?

From a data-structure point of view, if the source sparse file is seen as a sequence of non-contiguous blocks of data, then I would expect "r"syncing to deallocate on the destination those ranges that do not exist at the source, allocate the missing ones, and update the rest accordingly (possibly even with the standard rsync rolling-hash algorithm, either treating all remaining sequences as one or running it separately on each).

Reference:

man rsync

   -S, --sparse
          Try to handle sparse files efficiently so they take up less space on the destination.  Conflicts with --inplace because it's not possible to overwrite data in a sparse fashion.

For your first two points, have a look at fallocate, punchhole, fiemap. — frostschutz, May 21 '17 at 13:27
Thanks to the suggestions, I've found a fiemap-related answer: https://unix.stackexchange.com/a/47450/9689 — Grzegorz Wierzowiecki, May 21 '17 at 13:29

Stephen Kitt · Accepted Answer · 2017-05-22T06:32:39.507

Sparse files are designed to be transparent to userspace: holes are created by seeking past unused areas, and are read as blocks of zeroes. They can’t be detected using standard userspace APIs, at least not yet. However, as pointed out by Stéphane Chazelas, at least Solaris and Linux support the SEEK_DATA and SEEK_HOLE lseek(2) flags, which allow userspace programs to find holes, and these flags might be added to POSIX at some point.
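A minimal sketch of how a C program on Linux might use these flags to produce the list of data ranges the question asks for. It assumes glibc, where defining _GNU_SOURCE exposes SEEK_DATA and SEEK_HOLE; struct data_range, list_data_ranges() and print_data_ranges() are hypothetical names chosen for illustration.

```c
#define _GNU_SOURCE               /* exposes SEEK_DATA and SEEK_HOLE in glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: one contiguous run of data, [start, end). */
struct data_range {
    off_t start;
    off_t end;
};

/* Fill `out` with up to `max` data ranges of the open file `fd`.
 * Returns the number of ranges stored, or -1 on error (errno set).
 * Ranges beyond `max` are silently dropped in this sketch. */
int list_data_ranges(int fd, struct data_range *out, int max)
{
    off_t size = lseek(fd, 0, SEEK_END);
    if (size < 0)
        return -1;

    int n = 0;
    off_t pos = 0;
    while (pos < size && n < max) {
        off_t data = lseek(fd, pos, SEEK_DATA);   /* next offset holding data */
        if (data < 0) {
            if (errno == ENXIO)                   /* only a hole remains up to EOF */
                break;
            return -1;
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);  /* EOF counts as an implicit hole */
        if (hole < 0)
            return -1;
        out[n].start = data;
        out[n].end = hole;
        n++;
        pos = hole;
    }
    return n;
}

/* Convenience wrapper: print the data ranges of the file at `path`. */
int print_data_ranges(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct data_range r[64];
    int n = list_data_ranges(fd, r, 64);
    for (int i = 0; i < n; i++)
        printf("data: %lld-%lld\n", (long long)r[i].start, (long long)r[i].end);
    close(fd);
    return n < 0 ? -1 : 0;
}
```

Alternating SEEK_DATA and SEEK_HOLE visits each run of data exactly once; ENXIO from SEEK_DATA simply means the rest of the file is a hole. On file systems without native hole reporting, Linux treats the whole file as data, so the sketch then returns a single range covering the file.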

This explains the incompatibility between rsync’s --sparse and --inplace options: when writing to an existing file portably, holes can’t be created in existing data. --sparse works by rewriting the whole file, skipping over (long) sequences of zeroes, which results in sparse files on OSs and file systems that support them.
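The following is a simplified illustration of the idea behind --sparse, not rsync's actual code: blocks that are entirely zero are skipped with lseek() instead of being written, and the file size is fixed up with ftruncate() at the end. It only produces a correct copy because the destination starts out empty (O_TRUNC); in an existing file the skipped region would still contain whatever data was there before, which is why this technique cannot be combined with in-place updates. The 4096-byte block size and the function names sparse_copy() and sparse_copy_path() are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

enum { COPY_BLOCK = 4096 };   /* assumed block size; real tools may use st_blksize */

static bool all_zero(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}

/* Copy in_fd to out_fd, seeking over all-zero blocks instead of writing them.
 * out_fd must refer to an empty file, otherwise skipped areas keep old data. */
int sparse_copy(int in_fd, int out_fd)
{
    char buf[COPY_BLOCK];
    off_t total = 0;

    for (;;) {
        ssize_t got = read(in_fd, buf, sizeof buf);
        if (got < 0)
            return -1;
        if (got == 0)
            break;                                    /* end of input */

        if (all_zero(buf, (size_t)got)) {
            if (lseek(out_fd, got, SEEK_CUR) < 0)     /* leave a gap (future hole) */
                return -1;
        } else {
            for (ssize_t done = 0; done < got; ) {    /* handle short writes */
                ssize_t w = write(out_fd, buf + done, (size_t)(got - done));
                if (w < 0)
                    return -1;
                done += w;
            }
        }
        total += got;
    }
    /* A gap at the very end is not part of the file until the size is set. */
    return ftruncate(out_fd, total);
}

/* Hypothetical path-based wrapper: O_TRUNC guarantees an empty destination. */
int sparse_copy_path(const char *src, const char *dst)
{
    int in_fd = open(src, O_RDONLY);
    if (in_fd < 0)
        return -1;
    int out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0) {
        close(in_fd);
        return -1;
    }
    int rc = sparse_copy(in_fd, out_fd);
    if (close(out_fd) < 0)
        rc = -1;
    close(in_fd);
    return rc;
}
```

Calling sparse_copy_path("in.img", "out.img") writes only the non-zero blocks; whether the skipped blocks actually become holes depends on the destination file system.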

On Linux, you can retrieve details of files’ sparseness using the fiemap ioctl, and e2fsprogs’ filefrag(8); see Detailed sparse file information on Linux. On the writing side, you can use fallocate(2) (and the handy fallocate(1) utility) to punch holes in an existing file, making it sparse if the holes cover entire blocks. Support is file system dependent: only XFS, btrfs, ext4, and tmpfs currently support these operations. Recent kernels (since 4.1) and very recent versions of util-linux support inserting holes in files, shifting the content after the hole (fallocate -i, introduced in util-linux 2.30, which should be released soon).
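A minimal sketch of punching a hole from C, assuming Linux with glibc (defining _GNU_SOURCE makes <fcntl.h> declare fallocate() and the FALLOC_FL_* flags). FALLOC_FL_PUNCH_HOLE has to be combined with FALLOC_FL_KEEP_SIZE, so the file size is unchanged; whole blocks inside the range are freed and partial blocks at its edges are zeroed. punch_hole() and allocated_bytes() are hypothetical helper names.

```c
#define _GNU_SOURCE              /* fallocate() and FALLOC_FL_* in glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Deallocate [offset, offset + len) of fd while keeping the file size.
 * The range reads back as zeroes afterwards.
 * Returns 0 on success or -errno (e.g. -EOPNOTSUPP if the file system
 * cannot punch holes). */
int punch_hole(int fd, off_t offset, off_t len)
{
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len) == 0)
        return 0;
    return -errno;
}

/* Bytes actually allocated on disk (st_blocks counts 512-byte units),
 * or -1 on error; compare with st_size to see how sparse a file is. */
long long allocated_bytes(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

Comparing allocated_bytes() before and after the call shows whether space was actually released. Inserting space instead, as fallocate -i does, uses the FALLOC_FL_INSERT_RANGE mode, which requires a block-aligned offset and length and is supported by fewer file systems.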

Your last two questions are file system surgery, and I’m not sure there’s any generic system call or ioctl available to perform such operations. reflink-compatible file systems allow files to share their contents; this can be achieved using the FICLONE and FICLONERANGE ioctls.
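As a hedged sketch of how these ioctls could approximate the "split a file without moving data" operation from the question: clone the tail of the file into a new file with FICLONERANGE, then truncate the original. This assumes Linux 4.5 or later (where <linux/fs.h> defines FICLONERANGE and struct file_clone_range) and a reflink-capable file system. The split point must be aligned to the file system block size, and a src_length of 0 means "up to the end of the source file". On other file systems the ioctl fails (typically with EOPNOTSUPP) and the original file is left alone. split_by_reflink() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>       /* FICLONERANGE, struct file_clone_range (Linux >= 4.5) */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Move bytes [at, EOF) of fd into a new file at tail_path by sharing the
 * underlying blocks (copy-on-write), then cut fd down to `at` bytes.
 * Returns 0 on success or -errno; fd is only modified by the final
 * ftruncate(), after the clone has succeeded. */
int split_by_reflink(int fd, off_t at, const char *tail_path)
{
    int tail = open(tail_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (tail < 0)
        return -errno;

    struct file_clone_range range = {
        .src_fd = fd,
        .src_offset = (__u64)at,
        .src_length = 0,            /* 0 = clone up to the end of the source */
        .dest_offset = 0,
    };
    if (ioctl(tail, FICLONERANGE, &range) < 0) {
        int err = errno;
        close(tail);
        unlink(tail_path);          /* don't leave an empty tail file behind */
        return -err;
    }
    if (close(tail) < 0)
        return -errno;

    /* The tail now shares the blocks; drop them from the original file. */
    if (ftruncate(fd, at) < 0)
        return -errno;
    return 0;
}
```

No data is copied on success: both files reference the same blocks until one of them is written to. Whole-file sharing works the same way with ioctl(dest_fd, FICLONE, src_fd).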

I hadn't realised the -i was not released yet. I happened to spot it in the man page after pulling the latest code for something else. — Stéphane Chazelas, May 22 '17 at 07:39
I'm looking forward to the day rsync supports SEEK_HOLE/SEEK_DATA/FIEMAP. I've been hit a few times by it hanging on a several TB sparse file. — Stéphane Chazelas, May 22 '17 at 07:40
