13

Of course, the standard way of testing whether a file is empty is test -s FILE, but one of our clients has received a script containing tests like this:

RETVAL=`ls -s ./log/cr_trig.log | awk '{print $1}'`
if test $RETVAL -ne 0
then
    echo "Badness: Log not empty"
    exit 25
fi

with claims from the supplier that it works in the two environments they have tested it in. Needless to say, it failed badly in both of the places where I tested it.

So, I got curious. When does ls -s print 0 for empty files?

These are my findings so far:

  • GFS on Linux: 4
  • ext4 on Linux: 0
  • ZFS on Solaris: 1
  • UFS on Solaris: 0
  • JFS on AIX: 0
  • VxFS on HP-UX: 0
  • HFS on HP-UX: 0
  • HFS on Mac OS X: 0

I haven't examined networked file systems yet.
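
To reproduce the check on another filesystem, something like the following is enough (/mnt/under-test is just a placeholder for a directory on the filesystem being probed):

touch /mnt/under-test/empty.probe
ls -s /mnt/under-test/empty.probe     # first column = blocks allocated for the empty file
rm -f /mnt/under-test/empty.probe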

Question: How can I elegantly explain to the others that their scripts are wrong?

In my opinion, the "correct" version would be:

if test -s ./log/cr_trig.log
then
    echo "Badness: Log not empty"
    exit 25
fi
MattBianco
    Just show them your tests. You have hard data that proves that their test is not portable, what more do you need? – Mat Nov 01 '11 at 08:37
  • One of the most interesting questions I've seen so far on this server. Too bad one can only spend one point. – ktf Nov 01 '11 at 09:48
  • @ktf You can always award a bounty. – Joe Nov 05 '11 at 10:59

3 Answers

6

Very interesting finding. Although I've never used ls -s to check whether a file is empty, I would have assumed that it reports 0 for empty files, too.

To your question: as Mat already commented, show them your test results. To explain those results, point out that ls -s reports the number of blocks allocated in the filesystem, not the actual size in bytes. Evidently some filesystem implementations allocate blocks even when they have no data to store, instead of just keeping a NULL pointer in the inode.
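
One quick way to demonstrate the difference is to put the byte size and the block count side by side (the last line assumes GNU stat; the flags differ on BSD and Solaris):

$ : > empty.log                              # create a zero-byte file
$ ls -l empty.log                            # the size column is 0 bytes
$ ls -s empty.log                            # the block count may still be non-zero
$ stat -c '%s bytes, %b blocks' empty.log    # GNU stat: size and allocated blocks together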

The explanation for this may be performance-related. Creating empty files that will stay empty is an exception in normal processing (the most common use I've seen is status files, where the mere existence of a file represents a certain state of the software).

But normally a newly created file will receive some data soon, so the designers of a particular FS may have assumed that it pays off to allocate a data block immediately upon file creation, so that when the first data arrives this task is already done.

A second reason could be that the file contained data in the past that has since been erased. Instead of freeing the last data block, it may be worthwhile to keep that block for reuse by the same file.

EDIT:

One more reason comes to mind: the filesystems where you found values > 0 are ZFS, the combined RAID+LVM+FS implementation, and GFS, a cluster filesystem. Both may have to store metadata for file integrity that is not kept in inodes. It could be that ls -s counts the data blocks allocated for this metadata.

ktf
4

Unlike most (if not all) other file systems, ZFS doesn't preallocate a static array of inodes. Creating an empty file on ZFS therefore uses a new block of data, which is the one reported by ls -s.

I suspect GFS has to store synchronization/lock data, which leads to the other non-zero result.

jlliagre
3

ls -s reports the number of blocks that are allocated for the file, not including whatever is stored directly in the directory entry.

In most cases, the number of blocks is the number of bytes divided by the block size in bytes, rounded up.
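
For example (the exact numbers depend on the filesystem's block size and on the unit ls -s uses: 1024-byte units for GNU ls, 512-byte units in POSIX mode):

$ printf '%5000s' '' > f    # a 5000-byte file (5000 spaces)
$ ls -l f                   # 5000 bytes
$ ls -s f                   # e.g. 8 on ext4: ceil(5000/4096) = 2 blocks of 4096 bytes, shown in 1K units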

The number of blocks can be less than that for a sparse file. For example, on most filesystems, this will create an 8192-byte file occupying 0 blocks:

$ perl -e 'truncate STDOUT, 8192' >a
$ ls -l a
-rw-r--r-- 1 gilles gilles 8192 Nov  1 21:32 a
$ ls -s a
0 a

Conversely, the number of blocks can be greater if the filesystem preallocates blocks for files or uses blocks to store metadata. I'm not surprised that ZFS has a non-obvious correspondence between file size and number of blocks, given the large number of features it offers and its orientation towards large filesystems; I don't know the details, but the number of blocks depends not only on the size of the file but also on its history (you can have more than one block in an empty file if it's the result of truncating a larger file).
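
A sketch of that history effect; whether the block really stays allocated depends on the filesystem, so the last command may well print 0 on, say, ext4:

$ head -c 1048576 /dev/urandom > f    # write 1 MiB of data
$ : > f                               # truncate the file back to zero bytes
$ ls -l f                             # size is now 0
$ ls -s f                             # some filesystems may still report allocated blocks here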

To explain why ls -s is wrong: it doesn't report the size of the file, but a filesystem-dependent quantity. It's also a very indirect way to determine whether a file is empty in the first place, requiring an external tool (ls) and some parsing; instead, they should use test -s, which requires no parsing and does exactly what is requested. If they think that ls -s is a good way to test whether a file is empty, the onus is on them to justify that it works.
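
For completeness, the supplier's check reduces to a single line using test's [ form, equivalent to the test -s version already shown in the question:

[ -s ./log/cr_trig.log ] && { echo "Badness: Log not empty"; exit 25; }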