This is usual (but non-obvious) behaviour on hierarchical storage systems such as Lustre. `ncdu` and other sparseness-measuring tools generally rely on the value given in `st_blocks` in response to a `stat` call. On historical, non-copy-on-write file systems, the obvious behaviour is the one we’ve grown to expect: each file occupies at least as much space on disk as the non-zero data it contains, so `st_blocks` indicates at least the amount of actual non-zero data stored in the file. In Lustre, `st_blocks` represents the storage used on the frontend system, not the overall amount of data stored in the file.
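
As a rough illustration of the check such tools rely on, here is a minimal sketch (the path is hypothetical) comparing the apparent size with the allocated size derived from `st_blocks`:

```python
import os

# Hypothetical path, for illustration only.
st = os.stat("/lustre/project/bigfile.dat")

apparent = st.st_size           # logical length of the file, in bytes
allocated = st.st_blocks * 512  # st_blocks is always counted in 512-byte units

# On Lustre, "allocated" only covers what is resident on the frontend storage,
# so it can be far smaller than the amount of data the file really holds.
print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```
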
There is just one slight exception to this: “released” files (whose contents have been entirely removed from frontend storage) report that they occupy 512 bytes, not 0. This was implemented as a workaround for an issue with tools such as `tar`, which skip reading files with 0 `st_blocks` entirely, resulting in data loss on Lustre-based file systems (archiving a released file with sparse-file detection, which is common in backup scenarios, would end up storing no data at all).
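
To see why that matters, here is a hedged sketch of the kind of heuristic an archiver might apply (not GNU `tar`’s actual implementation): a non-empty file reporting zero allocated blocks looks like it is nothing but holes, so its contents are never read:

```python
import os

def looks_entirely_sparse(path):
    """Illustrative heuristic (not GNU tar's actual code): a non-empty file
    that reports zero allocated blocks appears to consist only of holes, so
    an archiver trusting this check would never read it.  A released Lustre
    file reporting 0 blocks would therefore be archived as pure holes, i.e.
    its data would be silently lost; reporting 512 bytes defeats the check."""
    st = os.stat(path)
    return st.st_size > 0 and st.st_blocks == 0
```
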
When a file reports that it occupies 512 bytes, tools have to read it (or use `fiemap` ioctls, etc.) to determine exactly what it contains; on Lustre, such actions prompt the file’s data to be retrieved from wherever it is stored in the hierarchy, as in the sketch below. With huge files, it’s unusual for the entire file to be restored to frontend storage, which is why you end up with only a partial “occupied block count” in some scenarios.
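
Here is a sketch of what such a probe can look like, using `SEEK_DATA` as a simpler stand-in for a `fiemap` ioctl (the helper name is made up):

```python
import errno
import os

def has_data_on_frontend(path):
    """Illustrative probe (helper name made up) using SEEK_DATA as a simpler
    stand-in for a FIEMAP ioctl: find out whether the file exposes any data
    extents at all without reading its contents.  Per the behaviour described
    above, probing or reading a released file on Lustre can cause its data to
    be pulled back from wherever it lives in the storage hierarchy."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            os.lseek(fd, 0, os.SEEK_DATA)   # offset of the first data extent
            return True
        except OSError as e:
            if e.errno == errno.ENXIO:      # no data at or after offset 0
                return False
            raise
    finally:
        os.close(fd)
```
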