410

How can I get the size of a file in a bash script?

How do I assign this to a bash variable so I can use it later?

aknuds1
haunted85

13 Answers

409

Your best bet if on a GNU system:

stat --printf="%s" file.any

From man stat:

%s total size, in bytes

In a bash script:

#!/bin/bash
FILENAME=/home/heiko/dummy/packages.txt
FILESIZE=$(stat -c%s "$FILENAME")
echo "Size of $FILENAME = $FILESIZE bytes."

NOTE: see @chbrown's answer for how to use stat on BSD or macOS systems.

Kusalananda
b01
141
file_size_kb=`du -k "$filename" | cut -f1`

The problem with using stat is that it is a GNU (Linux) extension. du -k and cut -f1 are specified by POSIX and are therefore portable to any Unix system.

Solaris, for example, ships with bash but not with stat. So this is not entirely hypothetical.

ls has a similar problem in that the exact format of the output is not specified, so parsing its output cannot be done portably. du -h is also a GNU extension.

Stick to portable constructs where possible, and you will make somebody's life easier in the future. Maybe your own.
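
For instance, a minimal POSIX sh sketch (reusing the example path from the first answer):

#!/bin/sh
# Portable: du -k reports disk usage in 1024-byte blocks,
# and cut -f1 keeps only the first (tab-separated) field.
filename=/home/heiko/dummy/packages.txt
file_size_kb=$(du -k "$filename" | cut -f1)
echo "$filename uses $file_size_kb KiB on disk"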

Nemo
  • 59
    du doesn't give the size of the file, it gives an indication of how much space the file uses, which is subtly different (usually the size reported by du is the size of the file rounded up to the nearest number of blocks, where a block is typically 512B or 1kB or 4kB). – Gilles 'SO- stop being evil' Jul 14 '11 at 10:00
  • 7
    @Gilles, sparse files (i.e., ones with holes in them) report less than the length. – vonbrand Jan 09 '16 at 22:03
  • 22
    This, with --bytes or -b instead of -k, should be the accepted answer. – Amedee Van Gasse Jan 08 '19 at 12:56
  • 2
    The -h ("human") option of du will produce the most appropriate answer for general cases: file_size=`du -h "$filename" | cut -f1 , as it will display K (kilobytes), M (Megabytes) or G (Gigabytes) as appropriate. – fralau Apr 01 '19 at 08:58
  • 3
    @fralau: The OP wants to "assign this to a bash variable so they can use it later", so it is much more likely they want an actual numeric value, not a human-readable approximation. Also, -h is a GNU extension; it is not standard – Nemo Apr 01 '19 at 16:19
  • 3
    Using du with --apparent-size flag will return a more precise size (as stated on man : print apparent sizes, rather than disk usage; although the apparent size is usually smaller, it may be larger due to holes in ('sparse') files, internal fragmentation, indirect blocks, and the like) – Hugo H Aug 06 '19 at 09:42
  • 2
    @AmedeeVanGasse - --bytes is unfortunately not available on BSD Unix (e.g. macos). Important since this answer is talking about POSIX compatibility. – Brian Mar 09 '21 at 21:12
101

You could also use the "word count" command (wc):

wc -c "$filename" | awk '{print $1}'

The problem with wc is that it'll add the filename and indent the output. For example:

$ wc -c somefile.txt
    1160 somefile.txt

If you would like to avoid piping through a full interpreted language or stream editor just to get a file size, redirect the input from the file so that wc never sees the filename:

wc -c < "$filename"

This last form can be used with command substitution to easily grab the value you were seeking as a shell variable, as mentioned by Gilles below.

size="$(wc -c <"$filename")"
Eugéne
75

BSD's (macOS's) stat has a different format argument flag, and different field specifiers. From man stat(1):

  • -f format: Display information using the specified format. See the FORMATS section for a description of valid formats.
  • ... the FORMATS section ...
  • z: The size of file in bytes.

So all together now:

stat -f%z myfile1.txt
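
And in a script, mirroring the GNU example above (the filename is just an illustration):

#!/bin/bash
FILENAME=myfile1.txt
FILESIZE=$(stat -f%z "$FILENAME")    # BSD/macOS stat: %z is st_size in bytes
echo "Size of $FILENAME = $FILESIZE bytes."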

NOTE: see @b01's answer for how to use the stat command on GNU/Linux systems. :)

chbrown
49

It depends on what you mean by size.

size=$(wc -c < "$file")

will give you the number of bytes that can be read from the file. In other words, it's the size of the contents of the file. It will, however, read the contents of the file (except that, as an optimisation, most wc implementations skip the read when the file is a regular file or a symlink to a regular file). That may have side effects. For instance, for a named pipe, what has been read can no longer be read again, and for things like /dev/zero or /dev/random, which are of infinite size, it's going to take a while. It also means you need read permission on the file, and the last access timestamp of the file may be updated.

That's standard and portable; however, note that some wc implementations may include leading blanks in that output. One way to get rid of them is to use:

size=$(($(wc -c < "$file")))

or to avoid an error about an empty arithmetic expression in dash or yash when wc produces no output (like when the file can't be opened):

size=$(($(wc -c < "$file") +0))

ksh93 has wc as a builtin (provided you enable it; you can also invoke it as command /opt/ast/bin/wc), which makes it the most efficient for regular files in that shell.

Various systems have a command called stat that's an interface to the stat() or lstat() system calls.

Those report information found in the inode. One piece of that information is the st_size attribute. For regular files, that's the size of the content (how much data could be read from it in the absence of error; that's what most wc -c implementations use in their optimisation). For symlinks, that's the size in bytes of the target path. For named pipes, depending on the system, it's either 0 or the number of bytes currently in the pipe buffer. Same for block devices, where depending on the system you get 0 or the size in bytes of the underlying storage.

You don't need read permission to the file to get that information, only search permission to the directory it is linked to.
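
As a quick illustration of the symlink case (using GNU stat, covered below; the link name is arbitrary):

ln -s /etc/hosts mylink
stat -c %s mylink     # lstat(): prints 10, the length of the target path "/etc/hosts"
stat -Lc %s mylink    # stat(): prints the size of /etc/hosts itself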

In chronological¹ order, there is:

  • IRIX stat (90's):

    stat -qLs -- "$file"
    

    returns the st_size attribute of $file (lstat()) or:

    stat -s -- "$file"
    

    same except when $file is a symlink in which case it's the st_size of the file after symlink resolution.

  • zsh stat builtin (now also known as zstat) in the zsh/stat module (loaded with zmodload zsh/stat) (1997):

    stat -L +size -- $file # st_size of file
    stat +size -- $file    # after symlink resolution
    

    or to store in a variable:

    stat -L -A size +size -- $file
    

    obviously, that's the most efficient in that shell.

  • GNU stat (2001); also in BusyBox stat since 2005 and Toybox stat since 2013 (both copying the GNU stat interface):

    stat -c %s -- "$file"  # st_size of file
    stat -Lc %s -- "$file" # after symlink resolution
    

    (note the meaning of -L is reversed compared to IRIX or zsh stat).

  • BSDs stat (2002):

    stat -f %z -- "$file"  # st_size of file
    stat -Lf %z -- "$file" # after symlink resolution
    

Or you can use the stat()/lstat() function of some scripting language like perl:

perl -le 'print((lstat shift)[7])' -- "$file"

AIX also has an istat command which will dump all the stat() (not lstat(), so won't work on symlinks) information and which you could post-process with, for example:

LC_ALL=C istat "$file" | awk 'NR == 4 {print $5}'

(thanks @JeffSchaller for the help figuring out the details).

In tcsh:

@ size = -Z $file:q

(size after symlink resolution)

Long before GNU introduced its stat command, the same could be achieved with the GNU find command and its -printf predicate (already there in 1991):

find -- "$file" -prune -printf '%s\n'    # st_size of file
find -L -- "$file" -prune -printf '%s\n' # after symlink resolution

One issue though is that this doesn't work if $file starts with - or is itself a find predicate (like ! or ().

Since version 4.9, that can be worked around by passing the file path through its stdin rather than as an argument with:

printf '%s\0' "$file" |
  find -files0-from - -prune -printf '%s\n'

The standard command to get the stat()/lstat() information is ls.

POSIXly, you can do:

LC_ALL=C ls -dln -- "$file" | awk '{print $5; exit}'

(-n is required to imply -l so the latter should not be necessary, but you'll find that on some BSDs, it is).

and add -L for the same after symlink resolution. That doesn't work for device files though where the 5th field is the device major number instead of the size.
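
So a portable sketch for capturing that into a variable might look like (the function name is arbitrary):

filesize() {
    # st_size of a file, POSIXly, via ls (not for device files)
    LC_ALL=C ls -dln -- "$1" | awk '{print $5; exit}'
}
size=$(filesize "$file")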

For block devices, systems where stat() returns 0 for st_size, usually have other APIs to report the size of the block device. For instance, Linux has the BLKGETSIZE64 ioctl(), and most Linux distributions now ship with a blockdev command that can make use of it:

blockdev --getsize64 -- "$device_file"

However, you need read permission to the device file for that. It's usually possible to derive the size by other means. For instance (still on Linux):

lsblk -bdno size -- "$device_file"

It should work except for empty devices.

An approach that works for all seekable files (so includes regular files, most block devices and some character devices) is to open the file and seek to the end:

  • With zsh (after loading the zsh/system module):

    {sysseek -w end 0 && size=$((systell(0)))} < $file
    
  • With ksh93:

    < "$file" <#((size=EOF))
    

    or

    { size=$(<#((EOF))); } < "$file"
    
  • with perl:

    perl -le 'seek STDIN, 0, 2 or die "seek: $!"; print tell STDIN' < "$file"
    

For named pipes, we've seen that some systems (AIX, Solaris, HP/UX at least) make the amount of data in the pipe buffer available in stat()'s st_size. Some (like Linux or FreeBSD) don't.

On Linux at least, you can use the FIONREAD ioctl() after having opened the pipe (in read+write mode to avoid it hanging):

fuser -s -- "$fifo_file" && 
  perl -le 'require "sys/ioctl.ph";
            ioctl(STDIN, &FIONREAD, $n) or die$!;
            print unpack "L", $n' <> "$fifo_file"

However note that while it doesn't read the content of the pipe, the mere opening of the named pipe here can still have side effects. We're using fuser to check first that some process already has the pipe open to alleviate that but that's not foolproof as fuser may not be able to check all processes.

Now, so far we've only been considering the size of the primary data associated with the files. That doesn't take into account the size of the metadata and all the supporting infrastructure needed to store that file.

Another inode attribute returned by stat() is st_blocks. That's the number of 512-byte blocks (1024 on HP/UX) that are used to store the file's data (and sometimes some of its metadata, like the extended attributes on ext4 filesystems on Linux). That doesn't include the inode itself, or the entries in the directories the file is linked to.

Size and disk usage are not necessarily tightly related, as compression, sparseness (and sometimes some metadata), and extra infrastructure such as indirect blocks in some filesystems all have an influence on the latter.

That's typically what du uses to report disk usage. Most of the commands listed above will be able to get you that information.

  • POSIXLY_CORRECT=1 ls -sd -- "$file" | awk '{print $1; exit}'
  • POSIXLY_CORRECT=1 du -s -- "$file" (not for directories where that would include the disk usage of the files within).
  • GNU find -- "$file" -printf '%b\n'
  • zstat -L +block -- $file
  • GNU stat -c %b -- "$file"
  • BSD stat -f %b -- "$file"
  • perl -le 'print((lstat shift)[12])' -- "$file"
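
To see how far apart the two figures can get, a sparse file makes a handy illustration (assuming GNU truncate, stat and du):

truncate -s 1G sparse.img                      # 1 GiB apparent size, no data written
stat -c 'st_size=%s st_blocks=%b' sparse.img   # large st_size, (near) zero blocks
du -k sparse.img                               # disk usage in KiB: close to 0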

¹ Strictly speaking, early versions of UNIX in the 70s, from v1 to v4 had a stat command. It was just dumping information from the inode and didn't take options. It apparently disappeared in v5 (1974) presumably because it was redundant with ls -l.

  • clearly the most comprehensive and informational answer. thank you. i can use this to create cross platform bash scripts using the BSD and GNU stats info – oligofren Jan 11 '17 at 12:50
  • 1
    Fun fact: GNU coreutils wc -c uses fstat, but then reads the last up-to st_blksize bytes. Apparently this is because files in Linux's /proc and /sys for example have stat sizes that are only approximate. This is good for correctness, but bad if the end of the file is on disk and not in memory (esp. if used on many files in a loop). And very bad if the file is migrated to near-line tape storage, or e.g. a FUSE transparent-decompression filesystem. – Peter Cordes Apr 12 '17 at 05:48
  • Wouldn't this work too: ls -go file | awk '{print $3}'? – Zombo Feb 08 '18 at 13:00
  • @StevenPenny those -go would be the SysV ones, they wouldn't work on BSDs (optional (XSI) in POSIX). You'd also need ls -god file | awk '{print $3; exit}' (-d for it to work on directories, exit for symlinks with newlines in the target). The problems with device files also remain. – Stéphane Chazelas Feb 08 '18 at 22:31
  • @StéphaneChazelas - can wc -c < binary_file return the size of a binary file correctly, in bytes? I read in this post on SO that it works, but can you please also confirm? Thank you – αғsнιη Feb 17 '19 at 05:48
  • 1
    @αғsнιη the Unix API makes no distinction between text and binary files. It's all sequences of bytes. Some applications may want to interpret those bytes as text but obviously not wc -c which reports the number of bytes. – Stéphane Chazelas Feb 17 '19 at 08:41
  • @haunted85, I've rejected your edit. wc -c < file in most wc implementations will not read the contents of file if its regular but do fstat() (and lseek() to retrieve the initial position in the file) to determine its size. You can run strace wc -c < file or the equivalent command on your system to check. – Stéphane Chazelas Jun 14 '21 at 07:26
24

This script combines many ways to calculate the file size:

(
  du --apparent-size --block-size=1 "$file" 2>/dev/null ||
  gdu --apparent-size --block-size=1 "$file" 2>/dev/null ||
  find "$file" -printf "%s" 2>/dev/null ||
  gfind "$file" -printf "%s" 2>/dev/null ||
  stat --printf="%s" "$file" 2>/dev/null ||
  stat -f%z "$file" 2>/dev/null ||
  wc -c <"$file" 2>/dev/null
) | awk '{print $1}'

The script works on many Unix systems including Linux, BSD, OSX, Solaris, SunOS, etc.

The file size is reported in bytes. It is the apparent size (the length of the file's contents), not the space the file actually occupies on disk, which can differ because of compression, sparse areas, unallocated blocks, and so on.

This script has a production version with more help and more options here: https://github.com/SixArm/file-size

12

stat appears to do this with the fewest system calls:

$ set debian-live-8.2.0-amd64-xfce-desktop.iso

$ strace stat --format %s $1 | wc
    282    2795   27364

$ strace wc --bytes $1 | wc
    307    3063   29091

$ strace du --bytes $1 | wc
    437    4376   41955

$ strace find $1 -printf %s | wc
    604    6061   64793
8

ls -l filename will give you lots of information about a file, including its file size, permissions and owner.

The file size is in the fifth column and is displayed in bytes. In the example below, the file size is just under 2 KB:

-rw-r--r-- 1 user owner 1985 2011-07-12 16:48 index.php
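
For illustration only (see the caveat below), the fifth field can be pulled out with awk:

FILESIZE=$(ls -l index.php | awk '{print $5}')
echo "$FILESIZE"    # 1985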

Edit: This is apparently not as reliable as the stat command.

Druckles
  • I think both ls -l and stat command give reliable size information. I did not find any reference to the contrary. ls -s will give size in number of blocks. – dabest1 Dec 31 '12 at 22:23
  • 3
    @dabest1 it's not reliable in a sense that in another unix, their output can be different (and in some unixes it is). – Eugene Bujak Oct 02 '14 at 14:39
  • Yes, IIRC, Solaris didn't display the group name by default, leading to fewer columns in the output. – Edward Falk Apr 04 '16 at 16:31
  • Since the size is pure numeric, surrounded by whitespace, and the date year is pure numeric, in a defined format, it would be possible to use a regexp to treat user+owner as one field, whether or not the group was present. (an exercise for the reader !) – MikeW Feb 21 '17 at 15:31
8

du filename will tell you the disk usage of the file, in blocks rather than bytes (1024-byte units by default with GNU du).

I prefer du -h filename, which gives you the size in a human-readable format.
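
For example, to capture the human-readable figure in a variable (remember this is disk usage, not an exact byte count):

filesize=$(du -h "$filename" | cut -f1)
echo "$filename uses about $filesize on disk"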

Teddy
5

Create small utility functions in your shell scripts that you can delegate to.

Example

#! /bin/sh -
# vim: set ft=sh

# size utility that works on GNU and BSD systems
size(){
    case $(uname) in
        (Darwin | *BSD*)
            stat -Lf %z -- "$1";;
        (*) stat -c %s -- "$1"
    esac
}

for f do
    printf '%s\n' "$f : $(gzip < "$f" | wc -c) bytes (versus $(size "$f") bytes)"
done
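
Saved as, say, sizes.sh (the name is arbitrary), it prints the gzipped versus actual size of each file given as an argument:

sh sizes.sh *.txt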

Based on info from @Stéphane Chazelas' answer.

oligofren
  • 1
    See also gzip -v < file > /dev/null to check the compressibility of a file. – Stéphane Chazelas Jan 11 '17 at 14:36
  • @StéphaneChazelas not sure if i think it was an improvement. those case statements can easily put noobs off; I certainly never remember how to get them right :-) are case statements inherently more portable since you did it? i see the point when there are more than two cases, but otherwise ...+ – oligofren Jan 11 '17 at 16:48
  • 1
    I suppose it's also a matter of taste, but here it's the typical case where you'd want to use a case statement. case is the Bourne/POSIX construct to do pattern matching. [[...]] is ksh/bash/zsh only (with variations). – Stéphane Chazelas Jan 11 '17 at 16:55
4

I found an AWK one-liner that had a bug, which I fixed. I also added petabytes after terabytes.

FILE_SIZE=234234 # FILESIZE IN BYTES
FILE_SIZE=$(echo "${FILE_SIZE}" | awk '{ split( "B KB MB GB TB PB" , v ); s=1; while( $1>1024 ){ $1/=1024; s++ } printf "%.2f %s", $1, v[s] }')

Considering stat is not on every single system, you can almost always use the AWK solution. For example, the Raspberry Pi does not have stat but it does have awk.
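
Wrapped as a small helper, it could look like this (the function name and the use of wc for the input are just illustrations):

human_size() {
    echo "$1" | awk '{ split( "B KB MB GB TB PB" , v ); s=1; while( $1>1024 ){ $1/=1024; s++ } printf "%.2f %s\n", $1, v[s] }'
}
human_size "$(wc -c < /etc/hosts)"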

dragon788
0

The fastest and simplest (IMO) method is:

bash_var=$(stat -c %s /path/to/filename)
  • 2
    Then upvote one or more of the existing answers that mention stat; no need to repeat it again... – Jeff Schaller Nov 21 '18 at 01:16
  • 1
    @JeffSchaller I just upvoted Stephane's answer on your instructions. I think it is too complicated for my purposes. Which is why I posted this simple answer for like minded souls. – WinEunuuchs2Unix Nov 21 '18 at 01:21
  • 1
    Thank you; it's just that a sixth instance of a "stat" answer doesn't simplify this Q & A, but would rather make a new reader ask themselves "how is this answer different from the other ones?" and lead to more confusion instead of less. – Jeff Schaller Nov 21 '18 at 01:32
  • @JeffSchaller I guess. But I could complain about the many du and wc answers that should have a disclaimer NEVER DO THIS in real life. I just used my answer in a real life application tonight and thought it was worthwhile sharing. I guess we all have our opinions shrugs. – WinEunuuchs2Unix Nov 21 '18 at 01:36
-1

I like the wc option myself. Paired with 'bc,' you can get decimals to as many places as you please.

I was looking to improve a script I had that awk'ed out the 'file size' column of an 'ls -alh' command. I didn't want just integer file sizes, and two decimals seemed to suit, so after reading this discussion, I came up with the code below.

I suggest breaking the line at the semicolons if you include this in a script.

file=$1; string=$(wc -c "$file"); bite=${string% *}; okay=$(echo "scale=2; $bite/1024" | bc); friend=$(echo -e "$file $okay" "kb"); echo -e "$friend"
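
Broken at the semicolons as suggested, the same steps read:

#!/bin/bash
file=$1
string=$(wc -c "$file")                    # e.g. "1160 somefile.txt"
bite=${string% *}                          # strip the trailing filename, keep the byte count
okay=$(echo "scale=2; $bite/1024" | bc)    # convert to KB with two decimals
friend=$(echo -e "$file $okay" "kb")
echo -e "$friend"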

My script is called gpfl, for "get picture file length." I use it after doing a mogrify on a file in imagemagick, before opening or re-loading a picture in a GUI jpeg viewer.

I don't know how this rates as an "answer," as it borrows much from what's already been offered and discussed. So I'll leave it there.

BZT
  • 1
    I would prefer using "stat" or "ls". Typically I don't like using "wc" to get file sizes because it physically reads the entire file. If you have a lot of files, or particularly large files, this can take a lot of time. But your solution is creative...+1. – Kevin Fegan Dec 09 '13 at 19:18
  • 2
    I agree with notion of using "stat" over "wc" for filesize, however if you use "wc -c", no data will be read; instead lseek will be used to figure out the number of bytes in a file. http://lingrok.org/xref/coreutils/src/wc.c#228 – bbaja42 Dec 14 '14 at 14:38
  • 1
    @bbaja42: note that GNU Coreutils wc does read the last block of the file, in case stat.st_size was only an approximation (like for Linux /proc and /sys files). I guess they decided not to make the main comment more complicated when they added that logic a couple lines down: http://lingrok.org/xref/coreutils/src/wc.c#246 – Peter Cordes Apr 12 '17 at 05:53