I need to find the largest files in a folder.
How do I scan a folder recursively and sort the contents by size?
I have tried using ls -R -S, but this lists the directories as well.
I also tried using find.
You can also do this with just du. To be on the safe side, this is the version of du I'm using:
$ du --version
du (GNU coreutils) 8.5
The approach:
$ du -ah <some DIR> | grep -v "/$" | sort -rh
The command du -ah DIR produces a list of all the files and directories in a given directory DIR. The -h switch produces human-readable sizes, which I prefer; drop it if you don't want them. I'm using head -6 here just to limit the amount of output!
$ du -ah ~/Downloads/ | head -6
4.4M /home/saml/Downloads/kodak_W820_wireless_frame/W820_W1020_WirelessFrames_exUG_GLB_en.pdf
624K /home/saml/Downloads/kodak_W820_wireless_frame/easyshare_w820.pdf
4.9M /home/saml/Downloads/kodak_W820_wireless_frame/W820_W1020WirelessFrameExUG_GLB_en.pdf
9.8M /home/saml/Downloads/kodak_W820_wireless_frame
8.0K /home/saml/Downloads/bugs.xls
604K /home/saml/Downloads/netgear_gs724t/GS7xxT_HIG_5Jan10.pdf
Easy enough to sort it smallest to biggest:
$ du -ah ~/Downloads/ | sort -h | head -6
0 /home/saml/Downloads/apps_archive/monitoring/nagios/nagios-check_sip-1.3/usr/lib64/nagios/plugins/check_ldaps
0 /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/0/index/write.lock
0 /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/0/translog/translog-1365292480753
0 /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/1/index/write.lock
0 /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/1/translog/translog-1365292480946
0 /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/2/index/write.lock
Reverse it, biggest to smallest:
$ du -ah ~/Downloads/ | sort -rh | head -6
10G /home/saml/Downloads/
3.8G /home/saml/Downloads/audible/audio_books
3.8G /home/saml/Downloads/audible
2.3G /home/saml/Downloads/apps_archive
1.5G /home/saml/Downloads/digital_blasphemy/db1440ppng.zip
1.5G /home/saml/Downloads/digital_blasphemy
Don't show me the top-level directory itself (the ~/Downloads/ argument, which is the only line ending in a slash), just what is inside it:
$ du -ah ~/Downloads/ | grep -v "/$" | sort -rh | head -6
3.8G /home/saml/Downloads/audible/audio_books
3.8G /home/saml/Downloads/audible
2.3G /home/saml/Downloads/apps_archive
1.5G /home/saml/Downloads/digital_blasphemy/db1440ppng.zip
1.5G /home/saml/Downloads/digital_blasphemy
835M /home/saml/Downloads/apps_archive/cad_cam_cae/salome/Salome-V6_5_0-LGPL-x86_64.run
If you want to exclude all directories from the output, you can use a trick with the presence of a dot character. This assumes that your directory names do not contain dots, and that the files you are looking for do. Then you can filter out the directories with grep -v '\s/[^.]*$':
$ du -ah ~/Downloads/ | grep -v '\s/[^.]*$' | sort -rh | head -2
1.5G /home/saml/Downloads/digital_blasphemy/db1440ppng.zip
835M /home/saml/Downloads/apps_archive/cad_cam_cae/salome/Salome-V6_5_0-LGPL-x86_64.run
If you want the list ordered smallest to biggest, but still showing only the top 6 offending files, drop the -r switch from sort and use tail -6 instead of head -6.
$ du -ah ~/Downloads/ | grep -v "/$" | sort -h | tail -6
835M /home/saml/Downloads/apps_archive/cad_cam_cae/salome/Salome-V6_5_0-LGPL-x86_64.run
1.5G /home/saml/Downloads/digital_blasphemy
1.5G /home/saml/Downloads/digital_blasphemy/db1440ppng.zip
2.3G /home/saml/Downloads/apps_archive
3.8G /home/saml/Downloads/audible
3.8G /home/saml/Downloads/audible/audio_books
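Because du prints files and directories in the same format, a text filter such as grep can only guess which lines are directories. As a rough sketch of the files-only idea in Python, one could check each entry's type directly. The function name files_by_disk_usage is made up for this illustration, and st_blocks is assumed to count 512-byte units, as it does on Linux and macOS:

```python
import os
import stat


def files_by_disk_usage(root):
    """Return (bytes_on_disk, path) pairs for regular files under root, largest first."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # keep regular files only, never directories
            # Assumption: st_blocks is in 512-byte units; fall back to st_size
            # on platforms that do not provide it.
            blocks = getattr(st, "st_blocks", None)
            size = blocks * 512 if blocks is not None else st.st_size
            results.append((size, path))
    results.sort(reverse=True)
    return results
```

Something like files_by_disk_usage(os.path.expanduser("~/Downloads"))[:6] would then play the role of the sort -rh | head -6 part of the pipeline, with sizes in plain bytes rather than human-readable units.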
grep -v "/$" part doesn't seem to be doing what you expected, as the directories don't have a slash appended. Does anyone know how to exclude directories from results?
– Jan Warchoł
Feb 16 '15 at 10:14
/s either - for example /home/saml/Downloads/audible seems to be a directory, but it doesn't have a slash. Only /home/saml/Downloads/ has a slash, but that's probably because you wrote it with a slash when specifying the argument for the initial du.
– Jan Warchoł
Feb 16 '15 at 13:55
~/Downloads/ bit out. As you've stated, it's just to filter out the argument of ~/Downloads when du processes it. I changed the word directories to directory since I think that's ultimately what was causing the confusion. Thanks for the feedback!
– slm
Feb 16 '15 at 14:46
find to generate a list of files only and then have du tally them up.
– slm
Feb 16 '15 at 14:50
find . -type f -exec du -ah {} + | grep -v "/$" | sort -rh
– flochtililoch
Apr 16 '20 at 01:12
du --apparent-size to view the length of the file in bytes (not the size taken on disk).
– nyanpasu64
Jul 05 '20 at 05:21
du to provide a different output for directory (-S changes their size, not the output format). So I resorted to a workaround that assumes that your directory names do not have any dots in them, and that the files you care about do. In that case you can replace the non-working grep -v '/$' by grep -v '\s/[^.]*$'. And you have to make sure you specify DIR as an absolute path, so there is no dot in there.
– odony
Sep 11 '20 at 08:35
If you want to find all files in the current directory and its subdirectories and list them according to their size (without considering their path), and assuming none of the file names contain newline characters, with GNU find, you can do this:
find . -type f -printf "%s\t%p\n" | sort -n
From man find on a GNU system:
-printf format
True; print format on the standard output,
interpreting `\' escapes and `%' directives.
Field widths and precisions can be specified
as with the `printf' C function. Please note
that many of the fields are printed as %s
rather than %d, and this may mean that flags
don't work as you might expect. This also
means that the `-' flag does work (it forces
fields to be left-aligned). Unlike -print,
-printf does not add a newline at the end of
the string. The escapes and directives are:
%p File's name.
%s File's size in bytes.
From man sort:
-n, --numeric-sort
compare according to string numerical value
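The -n flag matters because sizes printed as text are otherwise compared character by character. A small Python illustration of the difference, using made-up sample lines in the same size<TAB>path layout that the find command above prints (plain sorted() only approximates what sort does without -n, since sort also honours the locale):

```python
# Made-up sample lines in the "size<TAB>path" layout
lines = ["9\t./a.txt", "100\t./c.iso", "10\t./b.log"]

# Plain string comparison, roughly what sort does without -n
lexical = sorted(lines)


def size_key(line):
    """Return the leading size field as an integer, like sort -n would read it."""
    return int(line.split("\t", 1)[0])


# Numeric comparison on the size field
numeric = sorted(lines, key=size_key)

print(lexical)  # the 9-byte file ends up last
print(numeric)  # 9, 10, 100: smallest to biggest
```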
Try the following command:
ls -1Rhs | sed -e "s/^ *//" | grep "^[0-9]" | sort -hr | head -n20
It lists the 20 biggest files in the current directory, recursively.
Note: The -h option for sort is not available on OSX/BSD, so you have to install sort from coreutils (e.g. via brew) and add its bin path to PATH, e.g.
export PATH="/usr/local/opt/coreutils/libexec/gnubin:$PATH" # Add a "gnubin" for coreutils.
Alternatively use:
ls -1Rs | sed -e "s/^ *//" | grep "^[0-9]" | sort -nr | head -n20
For the biggest directories use du, e.g.:
du -ah . | sort -rh | head -20
or:
du -a . | sort -rn | head -20
This finds all files recursively and sorts them by size. It prints all file sizes in KB, rounded down, so you may see 0 KB files, but it was close enough for my uses, and it works on OSX.
find . -type f -print0 | xargs -0 ls -la | awk '{print int($5/1000) " KB\t" $9}' | sort -n -r -k1
find . -type f finds files... it works recursively, you're right, but it lists all the files it finds, not the directories themselves
– Brad Parks
Feb 27 '18 at 12:44
$9.
I tried to improve on this by printing ranges $9-end. I ended up with: find "$path" -type f -print0 | xargs -0 ls -la | awk '{for (i=5; i<NF; i++) { if (i == 5) { printf int($i/1024) " KiB\t" } else if (i >= 9 ) { printf $i " " } }; if (NF >= 5) print $NF; }' | sort -n -r -k1. It seems to work, but feedback is very welcome to make this a bit more concise.
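The awk versions above split every line on whitespace, which is why a name containing spaces ends up spread over several fields. A sketch of the same parsing step in Python, assuming the usual nine-column ls -l layout (permissions, links, owner, group, size, month, day, time or year, name) and owner and group names without spaces. parse_ls_line is a hypothetical helper and the sample line is made up:

```python
def parse_ls_line(line):
    """Split one ls -l style line into (size_in_bytes, name)."""
    # maxsplit=8 stops after the eighth column, so the ninth keeps its spaces
    fields = line.split(None, 8)
    if len(fields) < 9:
        raise ValueError(f"unexpected ls line: {line!r}")
    return int(fields[4]), fields[8].rstrip("\n")


# Made-up sample line with spaces in both the directory and the file name
sample = "-rw-r--r--  1 saml users 2048 Feb 16  2015 ./my notes/todo list.txt"
size, name = parse_ls_line(sample)
print(f"{size // 1024} KiB\t{name}")
```

Parsing ls output stays fragile either way (header lines such as "total 8" are rejected here, and symlink lines carry their target in the name), so reading the size straight from the file system is usually the safer route.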
Simple solution for Mac/Linux which skips directories:
find . -type f -exec du -h {} \; | sort -h
With zsh, you'd find the largest file (in terms of apparent size, like the size column in ls -l output, not disk usage) with:
ls -ld -- **/*(DOL[1])
For the 6 largest ones:
ls -ld -- **/*(DOL[1,6])
To sort those by file size, you can use ls's -S option. Some ls implementations also have a -U option for ls not to sort the list (as it's already sorted by size by zsh here).
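For comparison, a rough Python counterpart of the (DOL[1,6]) selection: walk the tree, read the apparent size of each file, and keep only the N largest. Unlike the glob, this sketch is restricted to regular files, and largest_files is an illustrative name rather than an existing API:

```python
import heapq
import os


def largest_files(root, count=6):
    """Return up to count (size, path) pairs, largest apparent size first."""

    def sized_files():
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symlinks and anything that is not a regular file
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                try:
                    yield os.path.getsize(path), path
                except OSError:
                    continue  # disappeared between listing and stat

    # nlargest keeps only count items in memory instead of sorting everything
    return heapq.nlargest(count, sized_files())
```

Using heapq.nlargest avoids sorting the full list when only a handful of entries is wanted, which is the same saving the [1,6] subscript gives over piping everything through sort and head.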
This is an incredibly common need for a variety of reasons (I like finding the most recent backup in a directory), and is a surprisingly simple task.
I'm going to provide a Linux solution that uses the find, xargs, stat, tail, awk, and sort utilities.
Most people have provided some unique answers, but I prefer mine because it properly handles filenames, and the use case can easily be changed (modify the stat and sort arguments).
I'll also provide a Python solution that should let you use this functionality even on Windows.
find . -type f -print0 | xargs -0 -I{} stat -c '%s %n' {} | sort -n
# Each utility is split on a new line to help
# visualize the concept of transforming our data in a stream
find . -type f -print0 |
xargs -0 -I{} stat -c '%s %n' {} |
sort -n |
tail -n 1 |
cut -d' ' -f2-   # unlike awk '{print $2}', this keeps names that contain spaces
# (Notice only the first argument of stat changed for new functionality!)
find . -type f -print0 | xargs -0 -I{} stat -c '%Y %n' {} |
sort -n | tail -n 1 | cut -d' ' -f2-
Explanation:
#!/usr/bin/env python
import os, sys

# Collect the path of every file below the directory given as first argument
files = list()
for dirpath, dirnames, filenames in os.walk(sys.argv[1]):
    for filename in filenames:
        realpath = os.path.join(dirpath, filename)
        files.append(realpath)

# Sort by size in bytes; the largest file ends up last
files_sorted_by_size = sorted(files, key = lambda x: os.stat(x).st_size)
largest_file = files_sorted_by_size[-1]
print(largest_file)
This script takes a little bit longer to explain, but essentially if you save that as a script, it will search through the directory given as the first command-line argument and print the largest file in it. The script does no error checking, but it should give you an idea of how to approach this in Python, which gives you a nice platform-independent way of solving this problem.
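Since the script above does no error checking, here is one hedged way the same idea could be hardened. find_extreme_file, its attr parameter, and main are names invented for this sketch; switching attr from st_size to st_mtime mirrors changing %s to %Y in the stat pipeline:

```python
import os
import sys


def find_extreme_file(root, attr="st_size"):
    """Return the path under root whose stat field attr is largest, or None."""
    if not os.path.isdir(root):
        raise NotADirectoryError(root)
    best_path, best_value = None, None
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                value = getattr(os.stat(path), attr)
            except OSError:
                continue  # broken symlink, permission problem, or file removed
            if best_value is None or value > best_value:
                best_path, best_value = path, value
    return best_path


def main(argv):
    """Command-line wrapper; returns a process exit code."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} DIRECTORY", file=sys.stderr)
        return 2
    try:
        result = find_extreme_file(argv[1])
    except NotADirectoryError:
        print(f"not a directory: {argv[1]}", file=sys.stderr)
        return 1
    print(result if result is not None else "no files found")
    return 0
```

Ending the file with sys.exit(main(sys.argv)) would turn it into a script; find_extreme_file(path, attr="st_mtime") returns the most recently modified file instead of the largest one.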
Try the command below, which uses sort to list folders by size in ascending order:
du -sh * | sort -sh
Variant of this answer from a similar question
find . -type f -exec du -ah {} + | sort -rh | more
find -type f -printf '%s %p\n' | numfmt --to=iec | sort -hr | head
find -type f finds all files under the current directory, recursively
-printf '%s %p\n': for each file it prints the file size in bytes and the file name, separated by a space, followed by a newline
numfmt --to=iec formats the first field (file size) in human-readable format (with K, M, G suffixes) and keeps the file name unchanged
sort -hr sorts all lines in reverse numerical order, understanding human-readable suffixes
head prints only the first 10 lines
Something that works on any platform except AIX and HP-UX is:
find . -ls | sort +6 | tail
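On systems without numfmt, the human-readable step of the numfmt --to=iec pipeline further up can be approximated in a few lines of Python. This is only a sketch: format_iec is a made-up helper, it stops at the P suffix, and its rounding is not guaranteed to match numfmt's:

```python
def format_iec(num_bytes):
    """Format a byte count with binary (1024-based) K/M/G/T/P suffixes."""
    size = float(num_bytes)
    for suffix in ("", "K", "M", "G", "T", "P"):
        if size < 1024 or suffix == "P":
            if suffix == "":
                return str(int(size))  # plain bytes: no suffix, no decimals
            return f"{size:.1f}{suffix}"
        size /= 1024


print(format_iec(1536))
```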