
I have a directory (mount point) of size 9T, and I would like to get the size of each directory in it, especially the ones which have consumed the most space. For this I am using the command below and pushing the result into a txt file from a bash script.

du -hsx * | sort -rh | head -10

This is taking more time than I expected; even after several hours I have not been able to get the result, on multiple occasions.

I am trying this over a network, using a Bomgar VPN connection. Is there anything that can be improved here?

Sachin
  • Are you trying to do this over a network or is this drive physically attached to the machine you are running the du command on? This will always be slow (you need to run all the du first and only then will you be able to sort and take the top 10, so the head won't make any difference to the time taken), but it shouldn't be that slow. I just ran du -sch on a directory of 121T and it finished in a few minutes. What can you tell us about this directory? Does it also contain mount points? – terdon Feb 01 '21 at 17:53
  • @terdon It may be slow on a local drive too, if it's a physically spinning disk. I've got a local backup drive that is 6 TB, and really slow. Doing du on the files on it would take hours (even though I'm only using 2 TB of it). – Kusalananda Feb 01 '21 at 21:01
  • @terdon I am trying this over a network, using a Bomgar VPN connection, and I am actually running du over a mount point. Tried with du -sch but still no result after almost 20 min. – Sachin Feb 02 '21 at 20:29
  • Please [edit] your question and include this information. Especially the fact that you are trying to do this on a remotely mounted directory. – terdon Feb 03 '21 at 09:21
  • You might want to use some specialized tool, e.g. ncdu or baobab. But it will also be slow. You need to get the size of each file on the drive. Over the network. Over some VPN connection. What do you expect? Run it directly on the target server and it will be much faster, but probably still slow (but a lot faster), depending on the speed of the disk. – pLumo Feb 03 '21 at 10:59
  • @pLumo I am not trying to get the size of each file but the size of each directory, so that I can prevent the mounted directory from getting full. I have been using a VPN for some time, but not this specific client (Bomgar). Is there any alternative? – Sachin Feb 03 '21 at 17:09
  • Sure, but how do you think du will know the directory sizes? By going inside and scanning each file for its size. To find out whether the mount is full you might use df -h /path/to/mountpoint, which is considerably faster. – pLumo Feb 03 '21 at 17:15
  • If a directory /mnt/some/dir contains only a single directory, huge, which is one of the largest directories on the mount point /mnt/some. What should be reported, /mnt/some/dir/huge, or /mnt/some/dir or both? – Kusalananda Feb 04 '21 at 07:57

4 Answers


If you want exact disk space then it will take time, since your directories are huge. If you have to do this frequently, it is a better idea to set up a cron job that runs du and creates this file once a day (or at whatever frequency matches how often these directories get updated).

This way you can quickly do sort operations on this file, working from a slightly old snapshot.
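A minimal sketch of that setup (the mount path, schedule, and snapshot filename below are assumptions, not from the question):

```shell
# Crontab entry (edit with `crontab -e`); runs the slow scan at 02:00 nightly.
# /mnt/bigdisk and /var/tmp/du-snapshot.txt are example names:
#   0 2 * * * du -xk --max-depth=1 /mnt/bigdisk > /var/tmp/du-snapshot.txt 2>/dev/null

# During the day, querying the snapshot is near-instant (demonstrated here
# against /etc so the sketch runs anywhere):
snapshot=$(mktemp)
du -xk --max-depth=1 /etc 2>/dev/null > "$snapshot"   # stand-in for the cron run
sort -rn "$snapshot" | head -10                       # top 10 entries, largest first
rm -f "$snapshot"
```

Using -k (kilobyte units) instead of -h keeps the snapshot sortable with a plain numeric sort.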

amisax

Updated version.

First, read speed will differ between SSD and HDD, and with buffering and cache size.

So I am assuming that you have permission to access the folders (you mention not getting the result on multiple occasions; for some folders you may need elevated privileges, otherwise you may get a message like "du: cannot read directory '/direc/tori/ss': Permission denied").

Based on your

du -hsx * | sort -rh | head -10

-h: print sizes in human-readable format
-s: --summarize; you want the total size of each directory (you didn't ask for sub-directories, did you?)
-x: skip directories on different file systems
sort -r: reverse the sort; sort -h: human-numeric sort (you need h here because you used it with du, to distinguish K, M, G)
head -10: get the top 10 of the sorted list

Is there another way besides du? Yes, one of them is ncdu (a curses-based version of du).

Explanation:

Examining the time (locally), based on my system's performance:

GNU bash, version 5.1.4

For a 7200 rpm HDD that holds 132G of data, assuming the same directory:

1- time du -hsx * | sort -rh | head -10

real    0m44.978s
user    0m2.432s
sys     0m13.183s

2- time du -B 1 --max-depth=1 | sort -rh | head -10

real    0m43.823s
user    0m2.269s
sys     0m12.879s

real: elapsed (wall-clock) time; user and sys: CPU time.

If you have a big file (say an ISO file), it will show up with the first du, but the second will show you only directories.

In my case, without using head, these are the sizes using your command (files and folders): 38G, 22G, 20G, 11G, 9.6G, 6.9G, 5.9G, 3.2G, 3.2G, 2.7G, 781M, 590M, 301M, 132M, 12M, 6.9M, 6.7M, 3.6M, 3.5M, 276K, 224K, 25K, 4.0K, 4.0K, 4.0K, 512, 0

When I use mine (after converting blocks to K, M, G): 123GiB, 38GiB, 22GiB, 20GiB, 11GiB, 9.6GiB, 6.9GiB, 3.2GiB, 3.2GiB, 781MiB, 590MiB, 301MiB, 132MiB, 12MiB, 6.9MiB, 6.7MiB, 3.6MiB, 3.5MiB, 276KiB, 224KiB, 100KiB, 25KiB, 512B

Files didn't show up.

I don't think I need to prove that an SSD is way faster than an HDD.

ncdu 1.15.1 with no sort

real    0m43.550s
user    0m2.742s
sys     0m13.604s

ncdu can also be used over an ssh connection for remote scanning; check it out.
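One common pattern (hostname and paths below are placeholders; this assumes ncdu's export/import options, -o and -f) is to run the scan on the remote machine and only ship the compact result over the VPN:

```shell
# Scan remotely, browse locally; ssh -C compresses the stream.
# user@server and /mnt/point are placeholders:
#   ssh -C user@server ncdu -o- /mnt/point | ncdu -f-

# The same export/import split, locally (skipped if ncdu is not installed):
command -v ncdu >/dev/null 2>&1 || { echo "ncdu not installed"; exit 0; }
ncdu -0 -o /tmp/scan.ncdu /etc    # -0: no UI; scan once and save the result
# Browse later, without rescanning:
#   ncdu -f /tmp/scan.ncdu
rm -f /tmp/scan.ncdu
```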

If you are using a different shell, update your question and give more information based on the answers and comments you have read.

Finally, I don't think the time difference between the two commands is significant, so you can save the result to a text file overnight and sort it later.
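For example, a sketch of the overnight approach (the scanned path and filename are placeholders; /tmp is used here so the commands run anywhere):

```shell
# Start the slow scan detached, so it keeps running after logout;
# replace /tmp with the real mount point, e.g. /mnt/point.
nohup du -B 1 --max-depth=1 /tmp > sizes.txt 2>/dev/null &
wait    # in practice: log out and come back the next day

# Sorting the saved snapshot is then quick, regardless of tree size:
sort -rn sizes.txt | head -10
rm -f sizes.txt
```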

kasa
  • I removed the sudo from your answer since that was not needed and should not be used unless required. Using $PWD unquoted is not a good idea (and isn't needed; du defaults to the current directory when you don't give it a target) because that will break if the path contains spaces. But most importantly, this will not do what the question asks for. You will just get the size of the parent directory, not every subdirectory as requested in the question. – terdon Feb 03 '21 at 09:24
  • I agree using sudo here is not necessary, but using it with du will not affect the answer nor harm the system. Using $PWD, $(pwd), or "$PWD" depends on your shell and usage; I should have pointed out it is just your directory. But most importantly, the question asked to "get each directory size"; it didn't ask for sub-directories. I tried the command on a disk that has around 12 folders and it worked as an answer. You can make comments, but changing others' answers is not good practice; leave it and make comments. – kasa Feb 03 '21 at 19:56
  • @kasa Please see https://unix.stackexchange.com/help/editing – Kusalananda Feb 03 '21 at 22:16
  • In this case, adding sudo to the mix is not necessary. Implying that sudo is needed is not correct. For all we know, the user may not even have sudo on their system. Using $PWD unquoted is wrong unless you at the same time make it clear that you are assuming a shell that does not split nor glob unquoted variables' values. You also seem to assume GNU tools, even though the user has not mentioned what Unix they are using. – Kusalananda Feb 03 '21 at 22:19
  • @Kusalananda I am not sure what you mean by GNU tools; the question mentioned it is a bash script. Show me how you will get the size of a folder dr-------- that you don't have access to. I explained that $PWD is your current directory; you can change it. I dropped the link to man du if more clarification is needed. If the size of subdirectories is required, change the depth. This command can't be used exactly with zsh; some changes would be needed. If the answer is wrong, point it out. – kasa Feb 04 '21 at 04:51
  • The user in the question is using standard shell syntax that would work in any number of shells, and bash is not mentioned. If bash had been used, that would not necessarily mean that their du implementation had the non-standard --max-depth option. You need to write down assumptions like these. – Kusalananda Feb 04 '21 at 06:44
  • My issue with $PWD is not its meaning but that you leave it unquoted. Doing so would, in most shells, necessitate setting $IFS to an empty string to avoid word splitting (in case the current directory's pathname contains a space, tab or newline), and using set -f to turn off filename globbing (in case the current directory's pathname contains a globbing character, like *). You could use $PWD unquoted with zsh, and the command may work (I don't see why not, assuming GNU du), but you don't mention zsh in your answer. – Kusalananda Feb 04 '21 at 06:44
  • See also https://unix.stackexchange.com/questions/68694 (about quoting). – Kusalananda Feb 04 '21 at 06:46
  • About directory permissions: There is no indication that the user in the question has issues relating to directory permissions. – Kusalananda Feb 04 '21 at 06:46
  • You reverted my changes to the options to cut. Options are introduced on the command line with dashes. Therefore, the option to get the first and second column of a tab-delimited text is either -f 1,2 or -f 1-2. If the data uses another delimiter, specify that with -d delim, where delim is a single character. Note that each instance of the delimiter in the data is counted by cut (as opposed to awk by default). – Kusalananda Feb 04 '21 at 06:57
  • Also describe why you are using --max-depth=1 (which would give you the sizes of the top-level and first-level subdirectories) and not e.g. --max-depth=0 (which would give you the size of all top-level directories, much like the standard -s option would do). It seems rather arbitrary. – Kusalananda Feb 04 '21 at 07:59
  • @Kusalananda Thank you. I am not here to argue; we can go back to the manual and our experience for what works where. I should have asked for more clarification on the question. I will re-edit my answer later. I made the assumption people will not just copy and paste, especially with the link I presented for du, but.... – kasa Feb 04 '21 at 14:57
  • For cut, yes, -f low-high; you can write it 1-2 or 1,2 or even 1-4,6. – kasa Feb 04 '21 at 15:29
  • Yes, --max-depth=0 is the same as --summarize, yet you can use --max-depth to indicate the levels if the sizes of sub-directories are required, for those who don't know. – kasa Feb 04 '21 at 15:45

You can use:
ls -ldpShR

-l is for the long listing format
-d shows only directories
-p appends / to directory names
-S sorts by size
-h means human-readable sizes
-R means recursive

guntbert
lima
  • You know that -d removes the recursiveness of -R, right? And how do you get the 10 largest directories out of this? And how is it quicker than the command given in the question? – Kusalananda Feb 01 '21 at 20:57
  • More to the point, this does not give you the size of the directories and it doesn't sort them by size. The number shown by ls is not the size of the directory on disk. Compare the output of ls -ldpSh /usr/* and du -sch /usr/*/ | sort -rh on your machine to see what I mean. – terdon Feb 03 '21 at 09:29
du --max-depth=1 * | sort -nr | head -n 10

This command will create a list with the sizes of the folders, ordered from largest to smallest, and will print the first 10 lines, assuming du from GNU coreutils is used.

Kusalananda
  • Is there a reason you use --max-depth=1 here and not e.g. --max-depth=0 or the more standard -s? Can you explain the benefit of your pipeline over the one in the question? Note that the issue here is the speed at which the operation is performed. – Kusalananda Feb 04 '21 at 07:18