
We have more than 4 years of data in our system, and we need to move files and directories older than 2 years into a new repository. Our requirements: 1. Know how many TB of data there is from Jan 2017 up to now. 2. Exclude the personal folder. I tried to find a command but couldn't work it out.

find . -type f -mtime +1010 ! -path "./home/01_Personal Folder*" -printf '%s\n' \
| awk '{a+=$1;} END {printf "%.1f GB\n", a/2**30;}'
Jeff Schaller
    Possibly duplicated with: https://unix.stackexchange.com/questions/41550/find-the-total-size-of-certain-files-within-a-directory-branch – Paulo Tomé Nov 20 '19 at 16:21
  • I think you should present your commands separately: the find and the awk. Possibly duplicated? Yes, but not in the sense of "already answered". I think `find -type f ... | awk` and `find -type d ... du {}` both make sense –  Nov 20 '19 at 16:43
  • Oh, from Jan 2017 to now: that would rather be `-mtime -1000`. Good trick with the 1000 and TEN! (besides the usual extra confusion and lack of info or clear language) –  Nov 20 '19 at 19:28
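Taking the comments' point about the date range: a minimal sketch that selects files modified since Jan 2017 with `-newermt` (a GNU find extension), excludes the personal folder, and reports the total in TB. The paths are taken from the question; treat the exact exclusion pattern as an assumption.

```shell
# Sketch: sum apparent sizes of files modified since 2017-01-01
# (GNU find's -newermt and -printf), excluding the personal folder,
# and report the grand total in TB.
find . ! -path "./home/01_Personal Folder*" -type f -newermt "2017-01-01" -printf '%s\n' \
  | awk '{a += $1} END {printf "%.2f TB\n", a / 2^40}'
```

Note `2^40` instead of `2**30`: `^` is the portable awk exponent operator, and dividing by 2^40 yields TiB rather than GiB, matching the question's "how many TB" requirement.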

2 Answers


Could be something similar to:

find . ! -path "./home/01_Personal Folder*" -type f -mtime +1000 -exec du -ch {} + | grep total

Answer based on: Find the total size of certain files within a directory branch

Explain shell command.

Christopher
Paulo Tomé
...
37M     total
29M     total
42M     total
43M     total
36M     total

real    0m1.271s
user    0m0.561s
sys     0m1.278s

Is what I get with:

time find ~/sda1 -type f -exec du -ch {} +|grep total

So now I need a tool to sum up the totals! (This is the `-exec ... +` argument list overflowing: find splits the file list across several `du` invocations, so each one prints its own `total` line.)
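Those repeated `total` lines can be summed with awk. A minimal sketch, assuming GNU du: `-b` makes du report plain byte counts (apparent size) instead of human-readable sizes, so the per-batch totals are numbers awk can add:

```shell
# Sum the multiple "total" lines that du -c emits when find's -exec ...+
# splits the arguments into several du runs. -b (bytes) replaces -h so
# each total is a plain number.
find ~/sda1 -type f -exec du -cb {} + \
  | awk '$2 == "total" {a += $1} END {print a " bytes"}'
```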

But with:

time find ~/sda1 -type f -printf "%s\n" | awk '{a+=$1;} END {print a;}'

10483650002

real    0m0.550s
user    0m0.251s
sys     0m0.349s

And:

]# time du ~/sda1 -sh
11G     /.../sda1

real    0m0.458s
user    0m0.116s
sys     0m0.340s

I get it nice and fast.


It seems inefficient to du each file, when find is stating them anyway, and can give the size for free. With find ... -exec du {} +, du is degraded to a calculator of -c "grand total".

There is of course some difference between file size (in bytes) and disk usage (in blocks).
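That difference is easy to demonstrate with a sparse file, on filesystems that support them (GNU `truncate`, `stat`, and `du` assumed):

```shell
# Apparent size vs disk usage: a sparse file has a large size but
# allocates (almost) no blocks on sparse-aware filesystems.
tmp=$(mktemp)
truncate -s 1G "$tmp"        # apparent size: 1 GiB
stat -c '%s bytes' "$tmp"    # reports 1073741824
du -h "$tmp"                 # disk usage: typically 0 or a few KB
rm -f "$tmp"
```

The opposite skew also exists: many tiny files each occupy at least one filesystem block, so `du` can report noticeably more than the summed `%s` sizes.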


Here, just to show that the original `find ... -printf "%s\n" | awk '{...} END {...}'` works:

]# find ~ -maxdepth 1 -printf "%s\n" | awk '{a+=$1;} END {print a;}'
1093990
]# find ~ -maxdepth 1 -printf "%s\n" | awk '{a+=$1;} END {printf "%x\n",a;}'
10b166

This is my first awk ever.

I tested this on ~ with -maxdepth 1, and the round number struck me (along with that "GB" conversion at the end of the OP's command), so I played around until I recognised that 10**6 is roughly 16x64K, i.e. 2**20.
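That round-number hunch checks out with plain shell arithmetic: 16 x 64 KiB is exactly 2^20, which prints as 100000 in hex, so the hex total above (10b166) sits just past that round mark:

```shell
# 16 * 64 KiB = 2^20 bytes = 1048576, or 0x100000 in hex.
printf '%d\n' $((16 * 64 * 1024))   # 1048576
printf '%x\n' $((16 * 64 * 1024))   # 100000
```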