19

I'm basically looking for files and then sorting them by size. The script works if I don't make the sizes human-readable, but I want them to be human-readable. How can I sort sizes that are human-readable?

For example:

 ls -l | sort -k 5 -n | awk '{print $9 " " $5}'

This works as expected, I got the size of my files in bytes ascending:

1.txt 1
test.txt 3
bash.sh* 573
DocGeneration.txt 1131
andres_stuff.txt 1465
Branches.xlsx 15087
foo 23735
bar 60566
2016_stuff.pdf 996850

Now, I want the size to be human readable, so I added an -h parameter to ls, and now some files are out of order:

 ls -lh | sort -k 5 -n | awk '{print $9 " " $5}'
1.txt 1
DocGeneration.txt 1.2K
andres_stuff.txt 1.5K
test.txt 3
Branches.xlsx 15K
foo 24K
bar 60K
bash.sh* 573
2016_stuff.pdf 974K

tvo000
  • 193

5 Answers

42

Try sort -h -k2

       -h, --human-numeric-sort
              compare human readable numbers (e.g., 2K 1G)

It is part of GNU sort, BSD sort, and others.
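
For example, substituting -h for -n in the pipeline from the question (a sketch, assuming a sort that supports -h, such as GNU or BSD sort):

ls -lh | sort -k 5 -h | awk '{print $9 " " $5}'

This sorts on the human-readable size field (column 5 of ls -lh) before awk trims the output, so the K/M/G suffixes are compared correctly.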

  • 5
    Shouldn't parsing the output of ls be avoided? –  Jun 13 '19 at 19:26
  • Thanks. This actually solved my problem. I had to move the sort command to the end because I was having some errors. Now my solution looks like this: find . -type f -size -1024k -exec ls -ahl {} \; | awk '{print $9 " " $5}' | cut -c 3- | column -t | sort -k 2 -h. – tvo000 Jun 13 '19 at 21:07
  • 3
    @Tomasz Not always. If it provides the output you need, piping it to another formatting operation is not particularly dangerous. What you should not do is loop over the output of ls, and instead use file globbing directly. Globbing alone won't work here. That said, I would probably prefer du for this. – Bloodgain Jun 14 '19 at 22:33
  • 1
    @Bloodgain the ls format is not guaranteed to be the same across systems/ls binaries, so parsing it portably is considered impossible. – D. Ben Knoble Jun 17 '19 at 03:00
  • 1
    Also, filenames with whitespace will mangle things – D. Ben Knoble Jun 17 '19 at 03:06
  • @D. Ben Knoble OK, then, just tell me another good way to get the same results as ls -l | grep -v ^l | wc -l (a count of all non-symlink files in the current directory). This is even the way the TLDP Bash How-To recommends counting files. – Bloodgain Jun 21 '19 at 13:45
  • 1
    @Bloodgain : files=(); for f in *; do [[ -L "$f" ]] && files+=("$f"); done; echo ${#files[@]} (I might have the is a symlink test switch wrong). If you don’t care about symlinks, files=(*); echo ${#files[@]}, which becomes portable if you use set and not arrays. – D. Ben Knoble Jun 21 '19 at 14:15
  • 1
    @D.BenKnoble A bit complex for something that's more or less guaranteed to work with ls on any system, but I approve of your excellent bash-fu. As a function/alias, I would want to avoid creating the files variable, but the approach should be similar to just create a counter. Now that I'm thinking about it, I bet you could do something clever with the piped output of echo * or find . -maxdepth 1, but you'd need to handle some special cases. – Bloodgain Jun 25 '19 at 05:47
29

ls has this functionality built in; use the -S option and sort in reverse order: ls -lShr

       -r, --reverse
              reverse order while sorting

       -S     sort by file size, largest first
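
As a usage sketch, the same options also work when you pass an explicit glob, which avoids any post-processing of ls output (the *.txt pattern here is just an illustration):

ls -lShr *.txt
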
  • 1
    -h is not a standard ls option, but it must be available since OP is already using it. The rest are standard, and it's certainly the answer I would have written. – Toby Speight Jun 14 '19 at 10:44
  • 6
    +1 Don't mess around parsing the output of ls. – David Richerby Jun 14 '19 at 10:59
  • This is the best answer, but it should include the info in @Toby's comment: -S might not be available for your ls. FWIW, -S is supported even with Emacs's library ls-lisp.el, which is used when the OS has no ls. It works in Emacs on MS Windows, for example. – Drew Jun 14 '19 at 16:37
  • This should be the accepted answer. – Christian Legge Jun 14 '19 at 17:32
  • @scatter, ls isn't the only program that can output human-readable sizes. Knowing how to sort the output of something like du is useful. – Mark Jun 14 '19 at 22:24
  • This is the only correct answer here – Gaius Jun 15 '19 at 09:07
  • 1
    @Drew: Toby's comment says that -h may not be universally available, but OP is already using it anyway. -S really should be universally available, because it's in the POSIX link that Toby provides. However, quite a few non-POSIX toolkits do exist out there. – Kevin Jun 16 '19 at 18:58
  • @Drew: That's what affects Emacs users on Windows. I daresay WSL is a bigger audience. – Kevin Jun 16 '19 at 20:44
  • @Kevin: ls-lisp.el is used by default on Windows. But any user, on any platform can use it. And it does support -h and -S. – Drew Jun 16 '19 at 22:55
  • @Drew: No, it isn't, by default, used on Windows (I just tried it). cmd.exe thinks there's no such thing as ls ("'ls' is not recognized as an internal or external command, operable program or batch file."). PowerShell has ls as an alias for Get-ChildItem, which is written in .Net and has nothing to do with Lisp or Emacs. WSL, depending on distro, most likely has some variation of GNU coreutils installed, or perhaps BusyBox. The only way you get ls-lisp.el is if you choose to install Emacs, which is decidedly not the normal meaning of "used by default on Windows." – Kevin Jun 16 '19 at 23:39
  • @kevin: By "ls-lisp.el is used by default on Windows" I of course meant that it is used by default in Emacs on Windows. If you do not use Emacs then I cannot imagine that you will use or care about Emacs-Lisp library ls-lisp.el. Using vanilla Emacs (i.e., emacs -Q) on Windows uses ls-lisp.el for Dired. But perhaps I'm misunderstanding you. – Drew Jun 17 '19 at 16:14
  • @Drew: Neither the question nor this answer have anything to do with Emacs, so I don't see why you keep insisting it's relevant. – Kevin Jun 17 '19 at 16:58
  • @kevin: I said "FWIW" in my original comment, and specifically said that ls-lisp.el is for use with Emacs. But I retract my second comment, which said that it would be good here to mention ls-lisp.el. You're right that the Q is not about use in Emacs. I've deleted that comment. (Nevertheless, you can use Emacs on any platform to get -S, if -S is not available on that platform.) Thx. – Drew Jun 17 '19 at 17:37
  • ls -S doesn't help if you also need directory sizes. There you need something like du -hs *|sort -h – Patrick Cornelissen Mar 18 '21 at 11:30
5

Since no specific shell was mentioned, here's how to do the whole thing in the zsh shell:

ls -lhf **/*(.Lk-1024oL)

The ** glob pattern matches like * but across / in pathnames, i.e. like a recursive search would do.

The ls command enables human-readable sizes with -h and the long listing format with -l. The -f option disables sorting, so ls just lists the files in the order they are given.

This order is arranged by the **/*(.Lk-1024oL) filename globbing pattern so that the smaller files are listed first. The **/* bit matches every file and directory in this directory and below, but the (...) modifies the glob's behaviour (it's a "glob qualifier").

It's the oL at the end that orders (o) the names by file size (L, "length").

The . at the start makes the glob only match regular files (no directories).

The Lk-1024 bit selects files whose size is less than 1024 KB ("length in KB less than 1024").
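
To preview which files the glob selects, and in what order, you can expand it with zsh's builtin print before involving ls (a sketch):

print -rl -- **/*(.Lk-1024oL)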

If zsh is not your primary interactive shell, then you could use

zsh -c 'ls -lhf **/*(.Lk-1024oL)'

Use setopt GLOB_DOTS (or zsh -o GLOB_DOTS -c ...) to also match hidden names, or just add D to the glob qualifier string.
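
For instance, the hidden-files variant of the same glob would be (a sketch; the D qualifier simply joins the others inside the parentheses):

ls -lhf **/*(.DLk-1024oL)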


Expanding on the above, assuming that you'd want a 2-column output with pathnames and human readable sizes, and also assuming that you have numfmt from GNU coreutils,

zmodload -F zsh/stat b:zstat

for pathname in **/*(.Lk-1024oL); do
    printf '%s\t%s\n' "$pathname" "$(zstat +size "$pathname" | numfmt --to=iec)"
done

or, quicker,

paste <( printf '%s\n' **/*(.Lk-1024oL) ) \
      <( zstat -N +size **/*(.Lk-1024oL) | numfmt --to=iec )
Kusalananda
  • 333,661
4

If your sort does not have the -h option, you could use an (albeit very long) awk command like the following:

find . -type f -size -1024k -exec ls -al {} \; | sort -k 5 -n | awk '{if ($5 > 1099511627776) {print $9,$5/1024/1024/1024/1024"T"} else if ($5 > 1073741824) {print $9,$5/1024/1024/1024"G"} else if ($5 > 1048576) {print $9,$5/1024/1024"M"} else if ($5 > 1024) {print $9,$5/1024"K"} else {print $9,$5"B"}}' | column -t

This sorts your output by size in bytes and then converts the sizes to human-readable form afterward.
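
For readability, the same conversion can be split over several lines (a sketch using the same thresholds as above; the %.1f precision is an arbitrary choice, not part of the original one-liner):

find . -type f -size -1024k -exec ls -al {} \; | sort -k 5 -n | awk '{
    if      ($5 > 1099511627776) printf "%s %.1fT\n", $9, $5/1024/1024/1024/1024
    else if ($5 > 1073741824)    printf "%s %.1fG\n", $9, $5/1024/1024/1024
    else if ($5 > 1048576)       printf "%s %.1fM\n", $9, $5/1024/1024
    else if ($5 > 1024)          printf "%s %.1fK\n", $9, $5/1024
    else                         print  $9, $5 "B"
}' | column -t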

jesse_b
  • 37,005
-1

Would this work?

ls -l | awk '{if ($5<=1024) {print}}' | sort -k 5 -n | awk '{print $9"\t"substr($5/1024,1,3)"k"} '| column -t

The first awk expression looks for files smaller than 1M, and the second takes the byte size from the result, converts it to KB, and prints the first 3 characters to give a human-readable size.

  • That does not really solve the OP's question: it only looks in the current directory and will only print regular files. It also compares against 1 KB instead of 1 MB. Finally, we are after answers with some explanation of why the code works. – grochmal Jun 13 '19 at 21:53
  • My bad, added what it does. – Vignesh SP Jun 13 '19 at 22:16