According to the Open Group specs, POSIX du doesn't have the -b option to display the size in bytes. So what is the POSIX-compliant way to get the size of a file or folder in bytes?

2 Answers
As an approximation of what GNU du does with -sb, you could do:
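cumulative_size() (
  # The output of ls -n is only specified in the C/POSIX locale.
  export LC_ALL=C
  ret=0
  # Default to the current directory when no argument is given.
  [ "$#" -gt 0 ] || set .
  for file do
    # Prefix relative paths with ./ so find cannot mistake them for predicates.
    case $file in
      (/*) sanitized=$file;;
      (*) sanitized=./$file;;
    esac
    size=$(
      # Field 1 of ls -ni is the inode number, field 6 the size in bytes.
      find "$sanitized" ! -type b ! -type c -exec ls -niqd {} + |
        awk '! seen[$1]++ {sum += $6}
             END {print sum}'
    )
    if [ -n "$size" ]; then
      printf '%s\t%s\n' "$size" "$file"
    else
      ret=1
    fi
  done
  exit "$ret"
)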
Like GNU du does, we try to count files only once, by looking at their inode number (as reported in the first field of ls -ni), but since we don't have the device number, which ls cannot report, that assumes the directory hierarchies don't span several filesystems.
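To see what the inode-based deduplication buys, here is a minimal sketch that sums the sizes of two hard links to the same 5-byte file, once naively and once with the seen[] array. The function name link_demo and the scratch directory name are made up for the illustration, and it assumes the temporary directory lives on a filesystem that supports hard links.

```shell
# link_demo is a hypothetical name; the body runs in a subshell so that
# LC_ALL and the scratch variables do not leak into the calling shell.
link_demo() (
  export LC_ALL=C
  # Scratch directory (assumed name); $$ keeps it unique enough for a demo.
  demo=${TMPDIR:-/tmp}/inode_demo.$$
  mkdir "$demo" || exit

  printf %s 12345 > "$demo/a"   # a 5-byte regular file
  ln "$demo/a" "$demo/b"        # a second name (hard link) for the same inode

  # Without deduplication, both names are counted.
  naive=$(find "$demo" -type f -exec ls -niqd {} + |
    awk '{sum += $6} END {print sum}')

  # Keyed on the inode number (field 1), the file is counted once.
  dedup=$(find "$demo" -type f -exec ls -niqd {} + |
    awk '! seen[$1]++ {sum += $6} END {print sum}')

  printf 'naive=%s dedup=%s\n' "$naive" "$dedup"
  rm -rf "$demo"
)

link_demo   # prints: naive=10 dedup=5
```

Only regular files are summed here (-type f) so that the size of the directory itself, which varies between filesystems, doesn't blur the comparison.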
Contrary to du, we also only do the deduplication within each file argument. For instance in:
  cumulative_size dir dir
the cumulative disk usage of dir and its contents is reported twice, with files within each counted only once, while GNU du -bs would only report the disk usage of dir once.
We exclude device files because ls -n doesn't report their size. On Linux at least, that won't make a difference as their size is otherwise always reported as 0.
find can't be given file paths that start with - or whose name matches one of its predicates (including !, ( and so on). Here we work around that by prefixing the file paths with ./ if they don't start with /, so find ! or find -print becomes find ./! or find ./-print for instance. That assumes no find implementation has a predicate that starts with /. The prefix also means we don't need to pass a -- to ls to mark the end of its options.
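As a small illustration of that prefixing, the case statement from the function can be pulled out on its own. sanitize_path is a hypothetical helper name used only for this sketch; it is not part of the function above.

```shell
# sanitize_path (hypothetical name) mirrors the case statement in
# cumulative_size: absolute paths pass through, anything else gets ./
sanitize_path() {
  case $1 in
    (/*) printf '%s\n' "$1";;
    (*) printf '%s\n' "./$1";;
  esac
}

sanitize_path -print   # prints ./-print
sanitize_path '!'      # prints ./!
sanitize_path /etc     # prints /etc
```

Since the result never starts with - and never consists of a bare predicate name, it is safe to hand to both find and ls.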
We use ls -n instead of ls -l to avoid decoding uids/gids into user or group names (which would be expensive and would also cause problems here for names with spaces). POSIX specifies the -o/-g options to remove those fields altogether, but they are optional there.
The output of ls -n is only specified in the C/POSIX locale. Also, file paths being arbitrary sequences of non-null bytes, you can only process them as text in the C locale, hence the LC_ALL=C.
We also use -q to make sure newline characters in file names or symlink targets don't put a spanner in the works.
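A rough way to check the effect of -q is to list a single file whose name contains a newline and count how many lines of output ls produces. count_ls_records and the scratch directory name are hypothetical, introduced only for this sketch.

```shell
# count_ls_records (hypothetical helper): create one file whose name
# contains a newline, list it with the given ls options, and print the
# number of output lines. Runs in a subshell, like cumulative_size.
count_ls_records() (
  export LC_ALL=C
  demo=${TMPDIR:-/tmp}/q_demo.$$
  mkdir "$demo" || exit
  # The file name is "a", a newline, then "b".
  : > "$demo/a
b"
  find "$demo" -type f -exec ls "$1" {} + | awk 'END {print NR}'
  rm -rf "$demo"
)

count_ls_records -niqd   # one line: the newline is written as ?
count_ls_records -nid    # without -q the record is usually split in two
```

With one line per file guaranteed, awk can rely on field 1 and field 6 always belonging to the same file.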
Also note that since the full paths are passed to ls, we can't process directory structures of arbitrary depth, as it will stop working once path lengths exceed PATH_MAX.
The error reporting is rather crude. We only return a non-zero exit status if any of the computed sizes comes back empty. So a zero exit status is not a guarantee that all file sizes have been counted.

- Your answers are always very helpful and a great source to learn from. I am still trying to understand each line. "The output of ls -n is only specified in the C/POSIX locale." Where are you getting this information from? I had a look at the opengroup utilities/ls page – finefoot May 29 '23 at 23:34
- @finefoot, the date field is locale-dependent as well as what blank characters may be used to separate fields (though in practice, that date field appears after the field we're interested in so unless it contains newline characters, it's likely not going to be a problem, and I've not seen any ls implementation that uses blank characters other than space to separate fields). C locale in any case removes complex processing in both printing and parsing and reduces the risk of bad surprises. – Stéphane Chazelas May 30 '23 at 05:06
- Ahh, great. That explains it, thank you. :) I've seen LC_ALL=C quite a few times in scripts and always wanted to read about the reason why it's used. And you opted for ls over wc -c (see below) due to much better performance of reading the size instead of counting the length, right? – finefoot May 30 '23 at 12:27
- @finefoot, See also What does "LC_ALL=C" do?. wc -c only works for non-directory files you have permission to read and has possible side effects for non-regular files. See also How can I get the size of a file in a bash script? – Stéphane Chazelas May 30 '23 at 12:53
Unfortunately, the output format of ls is apparently not standardized. So it might not be the best idea to parse its output.
An alternative POSIX-compliant way to find out the size in bytes of a single file is to use wc -c:
-c
Write to the standard output the number of bytes in each input file.
Source: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/wc.html
$ printf %s 0123456789ABCDEF >sixteenbytestestfile # example file of 16 bytes length
$ wc -c sixteenbytestestfile
16 sixteenbytestestfile
If we don't pass the file as an argument, but via standard input, the filename will be omitted from the output:
$ wc -c <sixteenbytestestfile
16
Apparently, some systems add some whitespace around the number output. We can remove it by using Arithmetic Expansion without any arithmetic operations:
$ filesize=" 123 " # possible wc -c output
$ printf %s\\n "-$filesize-"
- 123 -
$ printf %s\\n "-$((filesize))-"
-123-
In conclusion, here is a definition of a simple function to get the size of a file:
$ filesize() { printf %s\\n "$(($(wc -c <"$1")))"; }
$ filesize sixteenbytestestfile
16
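As pointed out in a comment under the other answer, wc -c only works for non-directory files you have permission to read. A minimal sketch of a more defensive variant could check for that first. filesize_checked and the temporary file name are hypothetical, and the error message wording is just a placeholder.

```shell
# filesize_checked (hypothetical variant of filesize): refuse anything
# that is not a readable regular file instead of letting wc try.
filesize_checked() {
  if [ -f "$1" ] && [ -r "$1" ]; then
    # Arithmetic expansion strips any whitespace around the wc output.
    printf '%s\n' "$(($(wc -c <"$1")))"
  else
    printf '%s: not a readable regular file\n' "$1" >&2
    return 1
  fi
}

# Usage on a 16-byte scratch file (assumed path) and on a directory.
testfile=${TMPDIR:-/tmp}/sixteenbytes.$$
printf %s 0123456789ABCDEF > "$testfile"
filesize_checked "$testfile"                     # prints 16
filesize_checked / || echo "no size reported"    # a directory is rejected
rm -f "$testfile"
```

Note that [ -f ] follows symbolic links, so a symlink to a regular file is measured through the link, as wc itself would do.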

- ls -ld somefile | awk '{print $5}' – edo1 Dec 21 '22 at 00:15
- [...] --apparent-size option as well (e.g. on fs with compression support disk usage could be less than apparent size). – edo1 Dec 21 '22 at 00:25
- [...] du -b does? – Stéphane Chazelas Dec 21 '22 at 08:21