14

I have a question concerning the find command in Linux.

In all the articles I've found online it says that the attribute -size -10M, for example, returns files that are less than 10 MB in size. But when I tried to test this, it seems that -size -10M returns files that are less than or equal to 9 MB in size.

If I do

find . -type f -size -1M

the find command returns only empty files (the unit is irrelevant, it can be -1G, -1k...).

find . -type f -size -2M

returns files <= 1M in size, etc.

The man page says:

Bear in mind that the size is rounded up to the next unit. Therefore -size -1M is not equivalent to -size -1048576c. The former only matches empty files, the latter matches files from 0 to 1,048,575 bytes.

Ok, so I guess -1M is rounded to 0M, -2M to -1M and so on... ?

But then

find . -type f -size 1M

returns files <= 1M (i.e. 100K and 512K files, but not empty files), while I would expect it to return files that are exactly 1M in size.

find . -type f -size 2M

returns files > 1M and <= 2M, etc.

Is this all normal or am I doing something wrong and what's the exact behavior of the -size parameter?

Greenonline
golder3

2 Answers

21

The GNU find man page says the following (this appears to be specific to GNU find; other implementations may differ, see below):

The + and - prefixes signify greater than and less than, as usual; i.e., an exact size of n units does not match. Bear in mind that the size is rounded up to the next unit. Therefore -size -1M is not equivalent to -size -1048576c. The former only matches empty files, the latter matches files from 0 to 1,048,575 bytes.

Question:

Ok, so I guess -1M is rounded to 0M, -2M to -1M and so on... ?

No. It's not the limit in the -size condition that's rounded, but the file size itself.

Take a file of 1234 bytes and a -size -1M directive. The file size is rounded up to the next whole unit named in the directive, here megabytes: 1234 bytes -> 1 MB. That doesn't match the condition, since -size -1M demands less than 1 MB (after this rounding). So, indeed, -size -1x, for any unit x, returns only empty files.

Similarly, -size 1M would match the above file, since after rounding, it's exactly 1 MB in size. On the other hand, -size 1k would not, since it rounds to 2 kB.

Note that the - or + in front of the number in the condition is irrelevant for the rounding behaviour.
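The rounding described above can be sketched with plain shell arithmetic. This is only a model of the documented GNU behaviour, not find's actual code; the helper names units_rounded_up and size_matches are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of find): how many whole units a file of
# $1 bytes counts as when its size is rounded up to units of $2 bytes.
units_rounded_up() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical helper: would a file of $1 bytes match "-size $2<unit>",
# where $2 is the number with its optional sign (-1, 1, +2, ...) and
# $3 is the unit size in bytes?
size_matches() {
    bytes=$1 arg=$2 unit=$3
    have=$(units_rounded_up "$bytes" "$unit")
    case $arg in
        -*) [ "$have" -lt "${arg#-}" ] ;;   # strictly fewer units
        +*) [ "$have" -gt "${arg#+}" ] ;;   # strictly more units
        *)  [ "$have" -eq "$arg" ] ;;       # exactly that many units
    esac
}

K=1024
M=$((1024 * 1024))

# The cases discussed above: an empty file and a 1234-byte file.
for case in "0 -1 $M" "1234 -1 $M" "1234 1 $M" "1234 1 $K"; do
    set -- $case
    if size_matches "$1" "$2" "$3"; then
        echo "$1 bytes, arg $2, unit $3 bytes: match"
    else
        echo "$1 bytes, arg $2, unit $3 bytes: no match"
    fi
done
```

In this model only the empty file matches -1M, while the 1234-byte file matches 1M but not 1k, because it counts as 2 kB after rounding.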

It may be useful to just always specify the sizes in bytes, since that way there's no rounding to stumble on. -size -$((1024*1024))c will reliably find files that are strictly less than 1 MB (or 1 MiB, if you will) in size. If you want a range, you can use e.g. \( -size +$((512*1024-1))c -size -$((1024*1024+1))c \) for files within [512 kB, 1024 kB]. (The parentheses have to be escaped or quoted so that the shell passes them on to find.)
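A quick way to check the byte-based range is to create files right at the boundaries and see which ones find reports. The following is a sketch that assumes mktemp -d and head -c are available; the function name demo_byte_range and the file names are made up for the test.

```shell
#!/bin/sh
# Hypothetical demo: create four files sized just around the range limits
# and list the ones whose size lies within [512 kB, 1024 kB].
demo_byte_range() {
    dir=$(mktemp -d) || return 1

    # One file per boundary case; the name records the size in bytes.
    for n in 524287 524288 1048576 1048577; do
        head -c "$n" /dev/zero > "$dir/f$n"
    done

    lo=$((512 * 1024 - 1))    # +lo: strictly more than lo bytes
    hi=$((1024 * 1024 + 1))   # -hi: strictly fewer than hi bytes

    # Sizes given with the c suffix are compared in bytes, without rounding.
    ( cd "$dir" && find . -type f \( -size "+${lo}c" -size "-${hi}c" \) | LC_ALL=C sort )

    rm -rf "$dir"
}

demo_byte_range
```

Only the 524288-byte and 1048576-byte files should be listed; the two files that are one byte outside the range should not.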

Another question on this: Why does `find -size -1G` not find any files?


Gilles mentions in that linked question the fact that POSIX only specifies -size N as meaning size in 512-byte blocks (rounded as above: "the file size in bytes, divided by 512 and rounded up to the next integer"), and -size Nc as meaning the size in bytes. Both take the optional plus or minus. The other unit suffixes are left unspecified, and not all find implementations recognize them, or round like GNU find does.

I tested with Busybox and the *BSD find on my Mac, and it seems they treat conditions with size specifiers in a way that feels more sensible, i.e. -size -1k matches files from 0 to 1023 bytes, the same as -size -1024c, and similarly for -size -1M == -size -1024k (Busybox only has c, b and k). Then again, Busybox doesn't seem to do the rounding even for sizes specified in blocks, against what the POSIX text seems to say it should.
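To make the difference concrete, here is a small arithmetic sketch of the two interpretations of -size -Nk. It only models what is described above (GNU rounding the file size up, versus the limit being scaled to bytes as observed with Busybox and BSD find); it is not taken from any implementation, and all function names are made up.

```shell
#!/bin/sh
# Model 1 (GNU, as documented): round the file size ($1 bytes) up to whole
# kilobytes, then require strictly fewer than $2 of them.
gnu_less_than_k() {
    [ $(( ($1 + 1023) / 1024 )) -lt "$2" ]
}

# Model 2 (as observed above with Busybox and BSD find): scale the limit
# to bytes and compare the unrounded file size against it.
scaled_less_than_k() {
    [ "$1" -lt $(( $2 * 1024 )) ]
}

# Print how both models judge a file of $1 bytes against "-size -$2k".
compare() {
    if gnu_less_than_k "$1" "$2"; then a=yes; else a=no; fi
    if scaled_less_than_k "$1" "$2"; then b=yes; else b=no; fi
    echo "$1 bytes: rounded-size=$a scaled-limit=$b"
}

for size in 0 1 1023 1024; do
    compare "$size" 1
done
```

With a limit of 1k, the two models agree only for the empty file and for files of 1024 bytes or more; everything from 1 to 1023 bytes matches under the scaled-limit model but not under the rounded-size one.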

So, YMMV and again, maybe better to stick with sizes in bytes.


Note that there's a similar issue with the -atime, -mtime and -ctime conditions:

-atime n
File was last accessed n*24 hours ago. When find figures out how many 24-hour periods ago the file was last accessed, any fractional part is ignored, so to match -atime +1, a file has to have been accessed at least two days ago.

And similarly, it may be easier to just use -amin +$((24*60-1)) to find files that have been last accessed at least a full 24 h ago. (Up to rounding to a minute, which you can't get rid of.)
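The truncation in the man page excerpt above can be modelled the same way. This is a sketch of the documented rule for -atime only (fractional days are dropped before comparing); the helper names are hypothetical and ages are given in seconds.

```shell
#!/bin/sh
# Hypothetical helper: number of whole 24-hour periods in an age of
# $1 seconds; any fractional part is discarded.
age_in_days() {
    echo $(( $1 / 86400 ))
}

# Hypothetical helper: would an age of $1 seconds match "-atime +$2"?
# The truncated day count must be strictly greater than $2.
atime_plus_matches() {
    [ "$(age_in_days "$1")" -gt "$2" ]
}

# 36 hours, one second short of 48 hours, and exactly 48 hours.
for age in 129600 172799 172800; do
    if atime_plus_matches "$age" 1; then
        echo "$age s ago ($(age_in_days "$age") whole days): matches -atime +1"
    else
        echo "$age s ago ($(age_in_days "$age") whole days): does not match -atime +1"
    fi
done
```

In this model an access 36 hours ago counts as 1 day and does not match -atime +1; only ages of two full days or more do.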

See also: Why does find -mtime +1 only return files older than 2 days?


Is this all normal or am I doing something wrong and what's the exact behavior of the -size parameter?

It's "normal" as far as the behaviour of GNU find is concerned, but I wouldn't call it exactly sensible. You're not wrong to be confused; it's find that is confusing.

ilkkachu
  • Thanks a lot! The key for me was understanding that the file size is being rounded, not the limit set in the -size condition, which is not very clear from the man page... – golder3 Mar 09 '21 at 11:09
  • Could you say something about the implementations of find that differ in this behavior (how they differ)? – Kusalananda Mar 09 '21 at 11:18
  • @Kusalananda, added something. Do edit if you want to double-check with e.g. OpenBSD. – ilkkachu Mar 09 '21 at 11:42
  • OpenBSD find has a -size that only follows the POSIX spec. No suffixes other than c are allowed. I haven't looked at exactly how -ctime etc. work, but I know there is a difference there too. – Kusalananda Mar 09 '21 at 11:54
  • The same thing happens with -mtime (and other time options). -mtime -2 means up to 1 day old, and doesn't include 1 day + 1 second old. – Barmar Mar 10 '21 at 17:17
  • It's analogous to the way people describe their age: you're N years old until your birthday, then you're N+1 years old. – Barmar Mar 10 '21 at 17:19
  • @Barmar, well, I would say ages are usually rounded down, not up. Say, if a person needs to be 18 to be allowed to buy beer, being 17 and some days doesn't cut it. – ilkkachu Mar 10 '21 at 17:38
  • @ilkkachu Good point. – Barmar Mar 10 '21 at 17:41
2

Answer from find manual, -size section:

The + and - prefixes signify greater than and less than, as usual; i.e., an exact size of n units does not match. Bear in mind that the size is rounded up to the next unit. Therefore -size -1M is not equivalent to -size -1048576c. The former only matches empty files, the latter matches files from 0 to 1,048,575 bytes.

So in each situation mentioned in the question, the file size is rounded up to the next unit BEFORE it is compared with the -size argument. If -size uses "M" as the unit, then every file size is rounded up to whole megabytes.

DevilaN
  • Isn't that the part of the manual the question already quoted? – ilkkachu Mar 09 '21 at 10:39
  • @ilkkachu The emphasis on "size rounded up" explains the observed behaviour consistently. It seemed better to quote the entire context from the manual. – DevilaN Mar 09 '21 at 10:48