
I have a sparse file, in which only some blocks are allocated:

~% du -h --apparent-size example
100K    example
~% du -h example
52K     example
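Both numbers come from stat(2). The apparent size is st_size, and the allocated size is st_blocks, which Linux counts in 512-byte units. Here is a minimal Python sketch that assumes Linux. The helper name apparent_and_allocated is hypothetical.

```python
import os


def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for path.

    apparent_bytes corresponds to `du --apparent-size`, allocated_bytes to
    plain `du`. On Linux, st_blocks counts 512-byte units regardless of the
    file system's own block size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

For the example file, the two values should match the two du figures, apart from du's rounding. This still only gives totals, not the positions of the allocated blocks.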

I would like to know which blocks of the file are actually allocated. Is there a system call or kernel interface that could be used to get a list of either the allocations, or the holes of file?

Simply checking for a long enough string of zeros (the approach used by GNU cp, rsync, etc.) does not work correctly:

~% cp example example1  
~% du -h example1 
32K     example1

It also treated runs of zeros that were actually allocated in the original as holes, so the copy ends up with fewer allocated blocks.
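The following minimal Python sketch shows why a zero-scanning copy reports too many holes. It assumes a fixed 4096-byte block size. It is a hypothetical stand-in for the heuristic, not GNU cp's or rsync's actual code. Any block that reads back as all zeros is skipped with a seek, so an allocated block of zeros and a genuine hole both become holes in the copy.

```python
import os

BLOCK = 4096  # assumed block size for this sketch


def naive_sparse_copy(src, dst, block=BLOCK):
    """Copy src to dst, turning every all-zero block into a hole.

    The data is read back through read(), so an allocated zero block
    and a hole look identical here; both are skipped.
    """
    zero = bytes(block)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block)
            if not chunk:
                break
            if chunk == zero[:len(chunk)]:
                fout.seek(len(chunk), os.SEEK_CUR)  # leave a hole instead of writing
            else:
                fout.write(chunk)
        fout.truncate(fout.tell())  # fix the length if the file ends in a hole
```

The copy has the same contents as the original, but its allocation pattern follows the data rather than the original file's layout.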

Juliano

3 Answers


There is a similar question on SO. The currently accepted answer by @ephemient suggests using an ioctl called fiemap which is documented in linux/Documentation/filesystems/fiemap.txt. Quoting from that file:

The fiemap ioctl is an efficient method for userspace to get file extent mappings. Instead of block-by-block mapping (such as bmap), fiemap returns a list of extents.

Sounds like this is the kind of information you're looking for. Support by filesystems is again optional:

File systems wishing to support fiemap must implement a ->fiemap callback on their inode_operations structure.
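Here is a rough sketch of calling fiemap from userspace in Python. It assumes Linux and the common ioctl encoding FS_IOC_FIEMAP = 0xC020660B (that is, _IOWR('f', 11, struct fiemap)). It also hand-packs struct fiemap and struct fiemap_extent with the struct module, following the layout in <linux/fiemap.h>. The function name fiemap_extents and the batch size are hypothetical.

```python
import fcntl
import struct

# Assumed values from <linux/fs.h> and <linux/fiemap.h>.
FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x00000001     # flush dirty data before mapping
FIEMAP_EXTENT_LAST = 0x00000001   # set on the file's final extent
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF

# struct fiemap header: fm_start, fm_length, fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved (32 bytes)
_HDR = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
# fe_reserved64[2], fe_flags, fe_reserved[3] (56 bytes)
_EXT = struct.Struct("=QQQQQIIII")


def fiemap_extents(path, batch=32):
    """Return a list of (logical_offset, length, flags) extents for path."""
    extents = []
    start = 0
    with open(path, "rb") as f:
        while True:
            buf = bytearray(_HDR.size + batch * _EXT.size)
            _HDR.pack_into(buf, 0, start, FIEMAP_MAX_OFFSET - start,
                           FIEMAP_FLAG_SYNC, 0, batch, 0)
            # With a mutable buffer, the kernel's reply is written back into buf.
            fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
            mapped = _HDR.unpack_from(buf, 0)[3]
            if mapped == 0:
                break  # no (more) extents: the rest of the file is a hole
            done = False
            for i in range(mapped):
                fields = _EXT.unpack_from(buf, _HDR.size + i * _EXT.size)
                logical, length, flags = fields[0], fields[2], fields[5]
                extents.append((logical, length, flags))
                done = done or bool(flags & FIEMAP_EXTENT_LAST)
            if done:
                break
            start = logical + length  # continue after the last extent seen
    return extents
```

Gaps between the returned extents are holes. On a file system without a ->fiemap callback, the ioctl fails with an OSError (typically EOPNOTSUPP).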

Support for the SEEK_DATA and SEEK_HOLE arguments to lseek, which you mentioned from Solaris, was added in Linux 3.1 according to the man page, so you could use that as well. The fiemap ioctl appears to be older, so for now it may be more portable across Linux versions, whereas lseek may be more portable across operating systems if Solaris implements the same interface.
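Here is a minimal sketch of the lseek approach in Python. On Linux with Python 3.3 or later, the two whence values are available as os.SEEK_DATA and os.SEEK_HOLE. The sketch alternates between them to collect the data ranges. The function name data_regions is hypothetical. On a file system that does not report holes, the whole file should come back as a single data region.

```python
import errno
import os


def data_regions(path):
    """Return (start, end) byte ranges that the kernel reports as data.

    Anything between the returned ranges is a hole.
    """
    regions = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    break  # only a hole remains between offset and EOF
                raise
            # EOF counts as a hole, so this always finds an end for the region.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            regions.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return regions
```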

MvG
  • You can get this FIEMAP information with the --fibmap option of the hdparm utility. See the manual. – Totor Mar 03 '13 at 20:51

There is a collection of Python programs called sparseutils that use SEEK_HOLE and SEEK_DATA to determine which sections of a file are represented as holes and which are data. Usage is quite straightforward: mksparse can be used to generate a sparse file according to a given layout.

 $ echo hole,data,hole | mksparse --hole-size 4096 --data-size 4096 example
 $ du -sh example
 4.0K   example

The sparsemap program can be used to print the layout to stdout:

 $ sparsemap example
 HOLE 4096
 DATA 4096
 HOLE 4096
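The same interface can reproduce the layout output shown above. This Python sketch is not sparseutils' implementation. make_sparse and sparse_layout are hypothetical helpers, and the sketch assumes Linux with os.SEEK_DATA and os.SEEK_HOLE. make_sparse builds a file such as hole,data,hole by seeking past unwritten regions. sparse_layout walks the file and emits HOLE and DATA segments with their lengths.

```python
import errno
import os


def make_sparse(path, layout, hole_size=4096, data_size=4096):
    """Hypothetical mksparse-like helper; layout is e.g. "hole,data,hole"."""
    with open(path, "wb") as f:
        for kind in layout.split(","):
            if kind == "hole":
                f.seek(hole_size, os.SEEK_CUR)  # skip ahead without writing
            elif kind == "data":
                f.write(b"\xff" * data_size)    # arbitrary non-zero payload
            else:
                raise ValueError(f"unknown segment: {kind!r}")
        f.truncate(f.tell())  # give the file its full length if it ends in a hole


def sparse_layout(path):
    """Return [(kind, length), ...] as reported by SEEK_DATA/SEEK_HOLE."""
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                data = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno != errno.ENXIO:
                    raise
                data = size  # no data left: the rest is a hole
            if data > offset:
                segments.append(("HOLE", data - offset))
            if data >= size:
                break
            hole = os.lseek(fd, data, os.SEEK_HOLE)
            segments.append(("DATA", hole - data))
            offset = hole
    finally:
        os.close(fd)
    return segments
```

For example, call make_sparse("example", "hole,data,hole") and print each (kind, length) pair from sparse_layout("example"). On a file system that tracks holes at 4 KiB granularity, this should give the same layout as sparsemap above. The exact boundaries depend on the file system's block size.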
richard

It depends on the file system. I don't believe there is a call for this, which may be why many tools don't handle copying sparse files well. The GNU tools instead search for large blocks of zeros, which also lets them drop allocated blocks that contain only zeros. Many copy tools will convert a sparse file into a file with all blocks allocated.

You will likely have to read the inode and parse it yourself. The inode format is file-system dependent, and some file systems may store part of your data in the inode itself.

BillThor
  • There has to be some FS-agnostic way to get this information. Reading directly from the inode is definitely not an option. I was looking for something like the SEEK_DATA and SEEK_HOLE parameters for lseek() that Solaris has: http://www.opensolarisforum.org/man/man2/lseek.html – Juliano Feb 06 '11 at 22:04
  • @Juliano The Linux lseek doesn't have these options. Solaris supports very few file systems, so it would be relatively easy to support there. Linux supports a wide variety of file systems, some of which do not support sparse files, so supporting SEEK_DATA / SEEK_HOLE would require code in every file system. These methods may also not do what you expect. See http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data for more details from the Sun side. – BillThor Feb 06 '11 at 22:20
  • Filesystems don't need to support anything for the lseek() interface; the kernel whitelists the filesystem modules that support SEEK_DATA/SEEK_HOLE through a module property. This is in the man page itself and in the linked blog: "For filesystems that do not supply information about holes, the file will be represented as one entire data region." – Juliano Feb 06 '11 at 22:38
  • @Juliano That still requires kernel modifications as well as changes to lseek. As per the blog entry, this is fairly new functionality at Sun. For it to work, the file system code needs to be modified as well. It would certainly require changes to every file system that supports sparse files to provide the kernel hooks. – BillThor Feb 06 '11 at 22:46