7

If I have files a, b and c in a directory on a Linux machine, how can I get the total number of bytes of these 3 files in a way that does not depend on how, e.g., ls shows the information? I mean I am interested in a way that is not error prone.

Update
1) I am interested in binary files, not ASCII files
2) Ideally, it would be a portable solution, e.g. one that works on both GNU/Linux and Mac

Jim
  • What are the errors that you're trying to avoid? Are you OK with double-counting hard links? How about symlinks? And, since it's unclear from your post, are you looking for the size of the file's contents, or the amount of disk space they consume (i.e., "test" is 4 bytes but might consume 4k or more depending on disk format)? – kdgregory Sep 30 '17 at 16:40
  • @kdgregory: I only need the number of bytes in those specific files. – Jim Sep 30 '17 at 21:02
  • You changed the question, adding a restriction about "binary files". Is this a relevant restriction really since you are picking explicit file names? If so, what's your definition of a "binary file"? – Kusalananda Oct 02 '17 at 10:05
  • @Kusalananda: My bad, I didn't post it properly, I am sorry. A binary file contains binary data. I'm not sure if it is relevant, since e.g. cat on all the files won't work. – Jim Oct 02 '17 at 11:53
  • @Jim cat works on binary data, no problem. Utilities that interpret the data as text won't work though. – Kusalananda Oct 02 '17 at 11:58

7 Answers

12

Use GNU du with the -c (print a grand total) and -b (apparent size in bytes) options:

$ ls -l
total 12
-rw-r--r-- 1 terdon terdon  6 Sep 29 17:36 a.txt
-rw-r--r-- 1 terdon terdon 12 Sep 29 17:38 b.txt
-rw-r--r-- 1 terdon terdon 17 Sep 29 17:38 c.txt

Now, run du:

$ du -bc a.txt b.txt c.txt
6   a.txt
12  b.txt
17  c.txt
35  total

And if you just want the total size in a variable:

$ var=$( du -bc a.txt b.txt c.txt | tail -n1 | cut -f1)
$ echo "$var"
35
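The variable assignment above can be wrapped in a small reusable function. This is only a sketch: the name du_total is hypothetical, and it assumes GNU du, since -b is not available in the macOS/BSD du. The -- guards against file names that start with a dash.

```shell
# du_total is a hypothetical helper name, not a standard command.
# Prints the combined apparent size, in bytes, of the files given as
# arguments. Assumes GNU du: -b is not available in macOS/BSD du.
du_total() {
    # -b: apparent size in bytes; -c: append a grand-total line.
    # "--" stops option parsing, so names starting with "-" are safe.
    # The total is on the last line; its first tab-separated field is the number.
    du -bc -- "$@" | tail -n 1 | cut -f 1
}
```

For the three files above, du_total a.txt b.txt c.txt would print 35. Like any du-based approach, it counts a file reachable through several hard links only once by default.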
terdon
  • If I run without the b what is the number I get? – Jim Oct 02 '17 at 08:31
  • @Jim it's the space the file(s) use on the disk, which depends on the file system block size. For example, consider printf '1234' > file. That creates a file with 4 bytes (wc -c file). On a system with a 4 KiB block size (which is probably what you have), that will use one 4 KiB block on the file system. Now, look at printf '123' >file. wc -c file reports 3, and du -b file also shows 3, but du file shows 4: that is the size of the file on disk, expressed in GNU du's default unit of 1024 bytes, because the smallest unit of allocation on the file system is a 4 KiB block. But this really should be another question. – terdon Oct 02 '17 at 08:57
  • I did the test and indeed I see 4 printed, but what is the 4? Bytes? Also, how do I see the one 4 KiB block being used? ls also shows 3. – Jim Oct 03 '17 at 08:21
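The comparison discussed in these comments can be reproduced with a short sketch. The function name show_sizes is hypothetical, it assumes GNU du (for -b), and the third number depends on the file system, so no fixed value is claimed for it.

```shell
# show_sizes is a hypothetical helper name. For one file it prints three
# numbers: the content size (wc -c), the apparent size (GNU du -b) and
# the disk usage (plain du, in 1024-byte units by default on GNU du).
# tr strips the leading padding that some wc implementations add.
show_sizes() {
    printf '%s %s %s\n' \
        "$(wc -c < "$1" | tr -d ' ')" \
        "$(du -b -- "$1" | cut -f 1)" \
        "$(du -- "$1" | cut -f 1)"
}
```

After printf '123' > file, show_sizes file should print 3 for the first two values; the third is typically 4 on a file system that allocates 4 KiB blocks.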
9

Using stat and awk:

$ stat --printf '%s\n' some individual files here | awk '{ s += $1 } END { print s }'

With the given --printf format, GNU stat (as found on Linux) outputs the size of each given file on its own line. The awk code then sums these up and reports the grand total.

For macOS:

$ stat -f '%z' some individual files here | awk '{ s += $1 } END { print s }'

The stat utility is non-portable, but you may wrap it in a portability shell script (or shell function):

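#!/bin/sh

# Pick the stat syntax for this system first, so that on an unknown
# system the script itself exits with a non-zero status.
case $(uname) in
    Linux)       set -- stat --printf '%s\n' -- "$@" ;;
    Darwin|*BSD) set -- stat -f '%z' -- "$@" ;;
    *) echo 'Unknown system. I do not know how stat works here' >&2
       exit 1 ;;
esac

# Run stat and sum the sizes (print 0 rather than an empty line if no size was read).
"$@" | awk '{ s += $1 } END { print s + 0 }'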

This would be called as

$ ./script a b c

where a, b and c are the files whose size in bytes you'd like to add up.

Another solution would be to install GNU coreutils on the macOS system to get access to the same stat implementation as on Linux.


On Linux, you would also be able to do

$ du -bcl some individual files here | awk 'END { print $1 }'

but there's no equivalent to this on macOS or the BSD systems (the -b flag is not implemented) unless GNU coreutils is installed.
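The -l flag matters when hard links are involved, one of the cases raised in the comments under the question. A small sketch, assuming GNU du; hardlink_demo is a hypothetical name and the files live in a throwaway directory created with mktemp:

```shell
# hardlink_demo is a hypothetical name; assumes GNU du and a writable
# temporary directory that supports hard links.
hardlink_demo() {
    demo_dir=$(mktemp -d) || return 1
    printf '0123456789' > "$demo_dir/a"   # 10 bytes
    ln "$demo_dir/a" "$demo_dir/b"        # b is a second name for the same inode

    # Default: the shared inode is counted once, so the total is 10.
    du -bc "$demo_dir/a" "$demo_dir/b" | tail -n 1 | cut -f 1
    # -l (--count-links): every name is counted, so the total is 20.
    du -bcl "$demo_dir/a" "$demo_dir/b" | tail -n 1 | cut -f 1

    rm -rf -- "$demo_dir"
}
```

A stat-based sum counts each name separately, so for hard links it behaves like du -bcl rather than plain du -bc.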

Kusalananda
9

With GNU find, you can do:

find a.txt b.txt c.txt -prune -printf '%s\n' | paste -sd + - | bc

That gives the size as reported by ls -l or the lstat() system call. For non-regular file types (like fifo, device, symlink), depending on the system, that may not necessarily give you the number of bytes that would be read from them if they were. See there for more options for those.

You could do:

cat a.txt b.txt c.txt | wc -c

for that, but that's not something you'd want to do for fifos or some device files like /dev/zero or /dev/random.

You can add the -L option to the find command to resolve symlinks and get the size of the target instead.
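To see what -L changes, here is a sketch assuming GNU find (for -printf) on Linux, where the size of a symlink itself is the length of the path it contains. symlink_demo is a hypothetical name and the files are created in a throwaway mktemp directory:

```shell
# symlink_demo is a hypothetical name; assumes GNU find (for -printf).
symlink_demo() {
    demo_dir=$(mktemp -d) || return 1
    printf '%0100d' 0 > "$demo_dir/target.bin"   # 100 bytes, all "0"
    ln -s target.bin "$demo_dir/link"            # link text "target.bin" is 10 bytes

    # Without -L: the size of the symlink itself, i.e. the length of its target path.
    find "$demo_dir/link" -prune -printf '%s\n'
    # With -L: the link is followed and the target's size is reported.
    find -L "$demo_dir/link" -prune -printf '%s\n'

    rm -rf -- "$demo_dir"
}
```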

POSIXly, the only command that can get you the file size as returned by the lstat() system call is ls unfortunately.

ls -l doesn't return the size for device files (block or character); it shows their major and minor device numbers instead. It is very difficult to parse its output reliably, and can only be done in a foolproof way (for compliant implementations and for non-device files) for one file at a time:

getsize() {
  LC_ALL=C ls -nd -- "$1" | awk '
   {
     if (/^[cb]/) print 0
     else print $5
     exit
   }
   END {exit (!NR)}'
}

(here assuming a size of 0 for device files which is always true on Linux, but not on all systems).

Then you can do:

sum=0
for file in a b c; do
  sum=$((sum + $(getsize "$file")))
done
echo "$sum"
7

how can I get the total number of bytes of these 3 files

wc + sed approach:

wc -c a.txt b.txt c.txt | sed '$!d;s/total//;'

  • wc -c [FILE]... prints the byte count for each specified file. When given multiple files, it also prints a line with the total number of bytes (as the last line).
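One caveat with this approach is that the word "total" can be translated depending on the locale, and no total line is printed when only one file is given. A sketch that avoids relying on that line, reading each file's count from stdin so that wc prints only the number; sum_bytes is a hypothetical name:

```shell
# sum_bytes is a hypothetical helper name. It sums byte counts without
# relying on wc's "total" line, which may be localized and is not
# printed at all for a single file.
sum_bytes() {
    for f in "$@"; do
        wc -c < "$f"     # prints only the number (some wc versions pad it with spaces)
    done | awk '{ s += $1 } END { print s + 0 }'
}
```

Like wc -c itself, this may read the data of non-regular files, so the same caveats about fifos and devices apply.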
4

Concatenate all the files and use wc to count the bytes.

cat a.txt b.txt c.txt | wc -c

Note that this will be slow for very large files, since it has to read them. Solutions that use commands like stat and find to get the byte counts from the metadata and sum them will probably be faster.
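For regular files the two approaches should agree; only the first one has to read every byte. A sketch that computes both, assuming GNU find for the metadata side (its %s directive); compare_counts is a hypothetical name:

```shell
# compare_counts is a hypothetical helper name. It prints two totals for
# the given files: the bytes actually read (cat | wc -c) and the sum of
# the sizes stored in the metadata (GNU find's %s). Assumes GNU find.
compare_counts() {
    read_total=$(cat -- "$@" | wc -c | tr -d ' ')
    meta_total=$(find "$@" -prune -printf '%s\n' | awk '{ s += $1 } END { print s + 0 }')
    printf '%s %s\n' "$read_total" "$meta_total"
}
```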

Barmar
2

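du can solve your problem: it reports the space used by the files in a directory, summed up per directory (add -a to also list each file individually). Note that this is disk usage rather than the exact number of bytes, and -h rounds it to a human-readable unit.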

du -h /path/to/dir

du - estimate file space usage

  • But I want to actually sum the size in a variable. Not just see them – Jim Sep 29 '17 at 13:41
  • @Jim in that case, please edit your question and explain that. You don't mention anywhere that you need to save this in a variable. – terdon Sep 29 '17 at 14:36
1

Let's say you have a directory named files containing a.txt, b.txt and c.txt. Try this:

du -sb files

A sample output might be:

du -sb files
492777810   files

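492777810 is the number of bytes. Note that this also counts the directory's own size and any other files in it, so it is not exactly the sum of a.txt, b.txt and c.txt.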
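To see the difference the directory itself makes, here is a sketch comparing du -sb on the directory with the total for just the files in it. dir_vs_files is a hypothetical name, it assumes GNU du, and the "$1"/* glob skips hidden files:

```shell
# dir_vs_files is a hypothetical helper name; assumes GNU du.
# Prints two numbers: du -sb for the directory (which includes the
# directory's own apparent size) and the total for the files in it.
dir_vs_files() {
    dir_total=$(du -sb -- "$1" | cut -f 1)
    files_total=$(du -bc -- "$1"/* | tail -n 1 | cut -f 1)
    printf '%s %s\n' "$dir_total" "$files_total"
}
```

The first number is usually the larger one; by how much depends on how the file system reports directory sizes.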

ss_iwe