Sum total bytes of files

Question

If I have files a, b and c in a directory on a Linux machine, how can I get the total number of bytes of these 3 files in a way that does not depend on how, e.g., ls displays the information? I am interested in a way that is not error-prone.

Update
1) I am interested in binary files, not ASCII files.
2) Ideally the solution should be portable, e.g. working on both GNU/Linux and Mac.

What are the errors that you're trying to avoid? Are you OK with double-counting hard links? How about symlinks? And, since it's unclear from your post, are you looking for the size of the file's contents, or the amount of disk space they consume (e.g., "test" is 4 bytes but might consume 4k or more depending on disk format)? – kdgregory, Sep 30 '17 at 16:40
@kdgregory: I only need the number of bytes in specific files. – Jim, Sep 30 '17 at 21:02
You changed the question, adding a restriction about "binary files". Is this really a relevant restriction, since you are picking explicit file names? If so, what's your definition of a "binary file"? – Kusalananda, Oct 02 '17 at 10:05
@Kusalananda: My bad, I didn't post it properly, I'm sorry. A binary file contains binary data. I'm not sure if it is relevant, since e.g. running cat on all the files won't work. – Jim, Oct 02 '17 at 11:53
@Jim cat works on binary data, no problem. Utilities that interpret the data as text won't work though. – Kusalananda, Oct 02 '17 at 11:58

score 12 · Answer 1 · answered Sep 29 '17 at 14:41

Use du with the -c (print total) and -b (bytes) options:

$ ls -l
total 12
-rw-r--r-- 1 terdon terdon  6 Sep 29 17:36 a.txt
-rw-r--r-- 1 terdon terdon 12 Sep 29 17:38 b.txt
-rw-r--r-- 1 terdon terdon 17 Sep 29 17:38 c.txt

Now, run du:

$ du -bc a.txt b.txt c.txt
6   a.txt
12  b.txt
17  c.txt
35  total

And if you just want the total size in a variable:

$ var=$( du -bc a.txt b.txt c.txt | tail -n1 | cut -f1)
$ echo $var
35
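A minimal sketch wrapping that pipeline in a reusable function, assuming GNU du (-b is not available in the BSD/macOS du). The name total_bytes is hypothetical. It passes -- so that file names starting with - are not read as options, and it adds -l, because by default du counts a hard-linked file only once even when several of its names are given.

```shell
# total_bytes (hypothetical helper): print the summed apparent size, in bytes,
# of the given files. Assumes GNU du:
#   -b  apparent size, counted in bytes
#   -c  append a grand total line
#   -l  count a hard-linked file again for each of its names given
total_bytes() {
  # The grand total is on the last line, in the first tab-separated field.
  du -bcl -- "$@" | tail -n 1 | cut -f 1
}
```

For example: var=$(total_bytes a.txt b.txt c.txt)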

terdon

If I run it without -b, what is the number I get? – Jim Oct 02 '17 at 08:31
@Jim it's the space the file(s) use on disk, which depends on the file system's block size. For example, consider printf '1234' > file. That creates a file with 4 bytes (wc -c file). On a system with a 4 KiB block size (which is probably what you have), that file will use one 4 KiB block on the file system. Now, look at printf '123' > file. wc -c file reports 3, and du -b file also shows 3, but du file shows 4: du counts in units of 1 KiB by default, and the file occupies one 4 KiB block, the smallest unit of allocation on that file system. But this really should be another question. – terdon Oct 02 '17 at 08:57
I did the test and indeed I see 4 printed, but what is the 4? Bytes? Also, how do I see the one 4 KiB block being used? ls also shows 3. – Jim Oct 03 '17 at 08:21
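To see the numbers discussed in these comments side by side, a small sketch assuming GNU coreutils (GNU du -b and GNU stat -c, where %b is the number of blocks allocated and %B the size in bytes of each of those blocks). The helper name show_sizes is hypothetical, and the block counts depend on the file system, so no output is shown here.

```shell
# show_sizes (hypothetical helper): print several size measures per file.
# Assumes GNU coreutils (GNU du -b and GNU stat -c).
show_sizes() {
  for f in "$@"; do
    printf '== %s\n' "$f"
    # Number of bytes of content.
    printf 'wc -c : %s\n' "$(wc -c < "$f")"
    # Apparent size in bytes.
    printf 'du -b : %s\n' "$(du -b -- "$f" | cut -f 1)"
    # Disk usage; GNU du counts in 1 KiB units by default.
    printf 'du    : %s\n' "$(du -- "$f" | cut -f 1)"
    # Content size, blocks allocated, and the size of each such block.
    stat -c 'stat  : %s bytes, %b blocks of %B bytes allocated' -- "$f"
  done
}
```

Multiplying the %b and %B values gives the bytes actually allocated on disk, which is what plain du reports (in KiB).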

Kusalananda · Answer 2 · 2017-10-02T09:55:42.983

Using stat and awk:

$ stat --printf '%s\n' some individual files here | awk '{ s += $1 } END { print s }'

stat with the given --printf format (on Linux) will output the file sizes of the given files. The awk code then sums these up and reports the grand total.

For macOS:

$ stat -f '%z' some individual files here | awk '{ s += $1 } END { print s }'

The stat utility is non-portable, but you may wrap it in a portability shell script (or shell function):

#!/bin/sh

case $(uname) in
    Linux)       stat --printf '%s\n' "$@" ;;
    Darwin|*BSD) stat -f '%z' "$@" ;;
    *) echo 'Unknown system. I do not know how stat works here' >&2
       exit 1 ;;
esac | awk '{ s += $1 } END { print s }'

This would be called as

$ ./script a b c

where a, b and c are the files whose size in bytes you'd like to add up.

Another solution would be to install GNU coreutils on the macOS system to get access to the same stat implementation as on Linux.

On Linux, you'd also be able to do

$ du -bcl some individual files here | awk 'END { print $1 }'

but there's no equivalent to this on macOS or the BSD systems (the -b flag is not implemented) unless GNU coreutils is installed.

edited Oct 02 '17 at 09:55

answered Sep 29 '17 at 13:36

Kusalananda

Also note that if any of those files are of type directory, the size of all files in the directory tree underneath will be added. – Stéphane Chazelas Sep 29 '17 at 14:12
When you say --printf (on Linux), do you mean it behaves differently on, e.g., Mac? – Jim Oct 02 '17 at 08:36
@Jim I mean that it's implemented by the stat utility on Linux. The stat utility on macOS (or BSD) does not have this flag. It's a Linux-specific command line flag. But you said you were running Linux, so I did not give a macOS solution. – Kusalananda Oct 02 '17 at 08:38
@Jim See updated answer. – Kusalananda Oct 02 '17 at 09:56
Most systems where uname returns Linux will have the busybox implementation of stat (or the Android equivalent), not the GNU one. stat -c %s works with both busybox and GNU stat. It may be better to identify the stat implementation rather than the OS. – Stéphane Chazelas Oct 02 '17 at 10:12
@Jim, See How can I get the size of a file in a bash script? or Full file date (without GNU utilities) / Convert ls -l output format to chmod format for more information about the various stat implementations out there. – Stéphane Chazelas Oct 02 '17 at 10:22
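A sketch of that suggestion, probing which stat syntax is available instead of checking uname. It assumes that any stat rejecting -c %s is a BSD/macOS stat that understands -f %z (on GNU stat, -f means something else entirely, which is why the probe is needed); the name sum_sizes is hypothetical.

```shell
# sum_sizes (hypothetical helper): print the total size in bytes of the given
# files, choosing the stat syntax by probing the implementation, not uname.
sum_sizes() (
  if stat -c %s / >/dev/null 2>&1; then
    fmt=-c spec=%s    # GNU coreutils or busybox stat
  else
    fmt=-f spec=%z    # assumed to be BSD/macOS stat
  fi
  # One size per line; "s + 0" prints 0 rather than an empty line if nothing was summed.
  stat "$fmt" "$spec" -- "$@" | awk '{ s += $1 } END { print s + 0 }'
)
```

Usage: sum_sizes a b c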

Stéphane Chazelas · Answer 3 · 2017-10-02T12:21:53.107

With GNU find, you can do:

find a.txt b.txt c.txt -prune -printf '%s\n' | paste -sd + - | bc

That gives the size as reported by ls -l or the stat() system call. For non-regular file types (like fifo, device, symlink), depending on the system, that is not necessarily the number of bytes you would get by reading them. See there for more options for those.

You could do:

cat a.txt b.txt c.txt | wc -c

for that, but that's not something you'd want to do for fifos or some device files like /dev/zero or /dev/random.
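Along those lines, a POSIX sketch that only reads regular files and skips everything else (fifos, devices, directories), so nothing like /dev/zero is read forever. Skipping non-regular files and the name regular_bytes are choices made for this example, not part of the answer. Note that [ -f ] follows symlinks, so a symlink to a regular file counts as its target, and some wc implementations (GNU wc among them) may take the size of a regular file from its metadata instead of reading it.

```shell
# regular_bytes (hypothetical helper): total bytes of the given readable
# regular files (symlinks are followed); anything else is reported and skipped.
regular_bytes() (
  total=0
  for f in "$@"; do
    if [ -f "$f" ] && [ -r "$f" ]; then
      # wc -c may pad its output with blanks; $(( )) ignores them.
      total=$((total + $(wc -c < "$f")))
    else
      printf 'skipping %s: not a readable regular file\n' "$f" >&2
    fi
  done
  echo "$total"
)
```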

You can add the -L option to the find command to resolve symlinks and get the size of the target instead.
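If bc is not installed (it is missing on some minimal systems), awk can do the addition instead. A sketch, still assuming GNU find for -printf; the name find_total is hypothetical. As with any find command line, names that start with - or look like find operators such as ! would be misparsed, so prefix such names with ./.

```shell
# find_total (hypothetical helper): sum the sizes reported by GNU find -printf.
find_total() {
  # -prune stops find from descending into any directory given as an argument.
  find "$@" -prune -printf '%s\n' | awk '{ s += $1 } END { print s + 0 }'
}
```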

POSIXly, the only command that can get you the file size as returned by the lstat() system call is ls unfortunately.

ls -l doesn't report the size of device files (block or character); it shows their major and minor numbers instead. It is very difficult to parse its output reliably, and can only be done in a foolproof way (for compliant implementations and for non-device files) for one file at a time:

getsize() {
  LC_ALL=C ls -nd -- "$1" | awk '
   {
     if (/^[cb]/) print 0
     else print $5
     exit
   }
   END {exit (!NR)}'
}

(here assuming a size of 0 for device files which is always true on Linux, but not on all systems).

Then you can do:

sum=0
for file in a b c; do
  sum=$((sum + $(getsize "$file")))
done
echo "$sum"
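The loop above fails with an arithmetic syntax error if getsize prints nothing, for instance when a file does not exist. A sketch that stops with a message instead; getsize is repeated so the block runs on its own, and the name sum_getsize is hypothetical.

```shell
getsize() {
  LC_ALL=C ls -nd -- "$1" | awk '
   {
     if (/^[cb]/) print 0
     else print $5
     exit
   }
   END {exit (!NR)}'
}

# sum_getsize (hypothetical helper): add up getsize over all arguments,
# failing on the first file whose size cannot be determined.
sum_getsize() (
  sum=0
  for file in "$@"; do
    # The status of the assignment is awk's: non-zero if ls printed nothing.
    size=$(getsize "$file") || {
      printf 'cannot get size of %s\n' "$file" >&2
      exit 1
    }
    sum=$((sum + size))
  done
  echo "$sum"
)
```

Usage: total=$(sum_getsize a b c) || exit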

RomanPerekhrest · Answer 4 · 2017-09-29T14:24:05.287

how can I get the total number of bytes of these 3 files

wc + sed approach:

wc -c a.txt b.txt c.txt | sed '$!d;s/total//;'

wc -c [FILE]... - print the byte count for each specified file. For multiple files, it also prints a line with the total number of bytes (as the last line).
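Two caveats about the sed expression: GNU wc translates the word total according to the locale, so s/total// only strips it in English locales, and the blanks around the number are left in the output. A sketch that takes the first field of the last line instead, which also works for a single file (where wc prints no total line). The name wc_total is hypothetical, and it assumes file names contain no newline characters.

```shell
# wc_total (hypothetical helper): print the total byte count of the given files.
# With several files, wc's last line is the total; with one file it is that
# file's own count. awk's $1 drops the padding and the (localized) label.
wc_total() {
  wc -c -- "$@" | awk 'END { print $1 }'
}
```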

edited Sep 29 '17 at 14:24

answered Sep 29 '17 at 14:17

RomanPerekhrest

score 4 · Answer 5 · answered Sep 30 '17 at 07:05

Concatenate all the files and use wc to count the bytes.

cat a.txt b.txt c.txt | wc -c

Note that this will be slow for very large files, since it has to read them. Solutions that use commands like stat and find to get the byte counts from the metadata and sum them will probably be faster.

Barmar

+1 for a useful use of cat ;) – Chris Davies Sep 30 '17 at 08:49
This duplicates @Stéphane Chazelas's answer (or at least part of it). https://unix.stackexchange.com/questions/395156/sum-total-bytes-of-files/395165#395165 – RomanPerekhrest Oct 01 '17 at 06:06
Oops, I didn't read all the way through, I just saw the find part. – Barmar Oct 01 '17 at 13:02

score 2 · Answer 6 · answered Sep 29 '17 at 13:34

du can solve your problem: it reports the disk space used by the directory and by each of its subdirectories, with the total for the whole directory on the last line.

du -h /path/to/dir

du - estimate file space usage

Hunter.S.Thompson

But I actually want the summed size in a variable, not just to see it. – Jim Sep 29 '17 at 13:41
@Jim in that case, please edit your question and explain that. You don't mention anywhere that you need to save this in a variable. – terdon Sep 29 '17 at 14:36

score 1 · Answer 7 · answered Sep 29 '17 at 13:37

Let's say you have a directory files containing a.txt, b.txt and c.txt. Try this:

du -sb files

Sample output:

du -sb files
492777810   files

492777810 is the number of bytes.
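Note that du -sb files also counts the apparent size of the directory itself (typically 4096 bytes on ext4) as well as any hidden files and subdirectories, so it is generally larger than the sum of a.txt, b.txt and c.txt. A sketch to compare the two, assuming GNU du; the name compare_dir is hypothetical, and it only sums non-hidden regular files directly inside the directory.

```shell
# compare_dir (hypothetical helper): print du's byte total for a directory next
# to the summed sizes of the non-hidden regular files directly inside it.
# Assumes GNU du (-b).
compare_dir() (
  dir=$1
  du_total=$(du -sb -- "$dir" | cut -f 1)
  files_total=0
  for f in "$dir"/*; do
    # An unmatched glob stays literal and fails the -f test, so it is skipped.
    if [ -f "$f" ]; then
      files_total=$((files_total + $(wc -c < "$f")))
    fi
  done
  echo "du -sb: $du_total"
  echo "files:  $files_total"
)
```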

ss_iwe

I don't want to depend on a directory. I want to sum only specific files – Jim Sep 29 '17 at 13:43
