0

As part of a disk clean-up, I need to delete all folders, and their contents, that were created in the last 3 days (not counting the current business date):

Example - today's date is 20191121, then:

/opt/png/wsm/data/workdir/batch/*20191120*
/opt/png/wsm/data/workdir/batch/*20191119*
/opt/png/wsm/data/workdir/batch/*20191118*

All I can do now is get the list:

ls -d */ | grep 20191118 (and then 19 and 20)

How can I check how much space I would save by deleting the folder(s) for a particular date and their contents?

bash version on the server is:

bash-3.2$ bash --version
GNU bash, version 3.2.57(1)-release (i386-pc-solaris2.10)
Copyright (C) 2007 Free Software Foundation, Inc.
Loke12k
  • 11
  • What Solaris version are you running, and do you have GNU tools available? (Solaris 11.4 has them, but I'm not sure what version you have; Solaris 11.4 also has bash 4.4.19.) – Kusalananda Nov 23 '19 at 14:39

5 Answers

0

When cleaning files and folders, you might want to use the du command.

du reports disk space usage. GNU du has a --time option that also prints the last modification time, and when given directory arguments (*/) it reports on directories only. Combining them:

du -h */ --time | grep -E '2019-11-(18|19|20)'

This will grab any subdirectory whose contents were last created or modified on November 18th, 19th or 20th. It outputs the size (-h for human readable), the modification date, and finally the path.
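
Since the question asks how much space would be freed, a single total may be more useful than a per-directory listing. A minimal sketch, assuming the directory names contain the date stamp as in the question; total_kb is a hypothetical helper name, and du -sk with awk is used because both should be available without GNU tools:

```shell
# total_kb BASE STAMP (hypothetical helper, not part of the original answer)
# Prints the combined size, in kilobytes, of all directories directly under
# BASE whose names contain STAMP (for example 20191120).
total_kb() {
    base=$1
    stamp=$2
    # The trailing slash makes the glob match directories only.
    # du -s prints one summary line per directory; awk adds up the first column.
    du -sk "$base"/*"$stamp"*/ 2>/dev/null |
        awk '{ sum += $1 } END { print sum + 0 }'
}
```

For example, total_kb /opt/png/wsm/data/workdir/batch 20191120 would print the kilobytes used by that day's folders, or 0 if nothing matches.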

Pudding
  • 161
  • bash-3.2$ du -h */ --time | grep -E '2019-09-(18|19|20)' grep: illegal option -- E Usage: grep -hblcnsviw pattern file . . . – Loke12k Nov 22 '19 at 09:29
  • Apparently we have different versions of grep.. Can you use egrep instead? or just grep? I need to specify the -E option to get regular expression matching working (to match on either 18, 19 or 20) – Pudding Nov 22 '19 at 09:34
  • removing -E from grep. but not getting consolidated result : du -h / --time | grep '2019-09-(18|19|20)' bash-3.2$ du -h / --time | grep '2019-11-(18|19|20)' bash-3.2$ du -h */ --time | grep '20191117' 44M BLOOMBERG.DIF.equity_option_open_us.20191117.001 76M BLOOMBERG.OUT.gics.20191117.001 0K FUNDRUN.ADHOC1.20191117_220401/acbl_out 0K FUNDRUN.ADHOC1.20191117_220401/amdpr_out/partial 1K FUNDRUN.ADHOC1.20191117_220401/amdpr_out/done 3K FUNDRUN.ADHOC1.20191117_220401/amdpr_out 5.4M FUNDRUN.ADHOC1.20191117_220401 0K sc_run_bbps/20191117 – Loke12k Nov 22 '19 at 09:58
  • this command is working du -ah */ --time | grep '2019111[7-9]' but the it produces a huge list of directories rather than giving total size. – Loke12k Nov 22 '19 at 10:04
  • rm command will fail if number of directories and files is over 5000. Use find instead (refer to my answer below) – Nathael Pajani Nov 22 '19 at 10:05
  • @Loke12k Since the fairly commonly supported command grep -E does not work for you, could you please update the text in the question with information about what type of Unix environment you are using? It is obviously not a recent Linux, nor macOS or any of the BSDs... – Kusalananda Nov 22 '19 at 11:28
  • @Kusalananda bash-3.2$ bash --version GNU bash, version 3.2.57(1)-release (i386-pc-solaris2.10) Copyright (C) 2007 Free Software Foundation, Inc. – Loke12k Nov 23 '19 at 12:04
0

On the 22nd of November 2019, the following code creates a string in $pattern that will be 20191121|20191120|20191119. It does this using one call to GNU date to get the dates, and then concatenates the results with | as a delimiter. Note that the default date utility on Solaris cannot be used like this, which is why we use gdate (GNU date's name on Solaris, by default). Also note that readarray requires bash 4.0 or later.

readarray -t dates < <(
cat <<END_DATE_INPUT | gdate -f - +'%Y%m%d'
1 day ago
2 days ago
3 days ago
END_DATE_INPUT
)

pattern=$( IFS='|';  printf '%s' "${dates[*]}" )

This can be used as an extended shell globbing pattern in bash to delete the wanted directories based on their names, and also to show their sizes (with du):

shopt -s extglob
du -s -h /opt/png/wsm/data/workdir/batch/*@($pattern)*/
#rm -r -f /opt/png/wsm/data/workdir/batch/*@($pattern)*/

If you have many thousands of these directories, use a loop:

shopt -s extglob
for dirpath in /opt/png/wsm/data/workdir/batch/*@($pattern)*/
do
    du -s -h "$dirpath"
    #rm -r -f "$dirpath"
done

You may well want to test this with rm replaced by echo before running it "live".
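
readarray was added in bash 4.0, so it is not available in the bash 3.2 mentioned in the question. A hedged sketch of building the same pattern string without it; build_pattern is a hypothetical name, the GNU date command is passed in as an argument (gdate on Solaris, usually date on Linux), and paste is assumed to be available:

```shell
# build_pattern DATE_CMD (hypothetical helper)
# Prints the three days before today as YYYYMMDD stamps joined with "|".
# DATE_CMD must be GNU date, since -f is used to read date strings from stdin.
build_pattern() {
    printf '%s\n' '1 day ago' '2 days ago' '3 days ago' |
        "$1" -f - +'%Y%m%d' |
        paste -s -d '|' -
}

# Example (left commented out because the command name depends on the system):
# pattern=$(build_pattern gdate)
```

The result can then be used in the extended globbing pattern in the same way as above.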


If you have access to the zsh shell (which you do have on Solaris), and you're happy with working by the last-modified timestamp on the directories (i.e. the time at which something was most recently added or deleted in the directory):

du -s -h /opt/png/wsm/data/workdir/batch/*(/m-3m+0)

(and similarly for rm).

The glob qualifier (/m-3m+0) makes the preceding pattern only match directories that were modified within the last three days, but more than a day ago. Note that this does not use the filenames of the directories.

Kusalananda
  • 333,661
  • rm command will fail if number of directories and files is over 5000. Use find instead (refer to my answer below) – Nathael Pajani Nov 22 '19 at 10:05
  • @NathaelPajani No find call necessary. See update. – Kusalananda Nov 22 '19 at 10:18
  • You're right if the goal is to remove the whole directory, which seems to be the case. But why trying to avoid find ? – Nathael Pajani Nov 23 '19 at 14:23
  • 1
    @NathaelPajani I'm not avoiding find. It's just not necessary to use it. We know exactly where the directories are, and how to match them with a globbing pattern. We just need to figure out what the dates should be that goes into the pattern. – Kusalananda Nov 23 '19 at 14:25
0

The previous answer will fail if the directories you try to rm contain more than 5000 files. This will leave you with old undeleted directories.

Use find instead with "-exec rm -r {} +", which calls rm one or more times with a maximum of 5000 args for each call.

To select directories by timestamp (access time in this example), use find:

find . -type d -atime n  # for exact number of days
find . -type d -atime +n  # for greater than n days

You can add the -daystart option (GNU find) to measure times from the beginning of today rather than from 24 hours ago.

Then use either "-exec rm -r {} +" to remove the directories or "-exec du -sh {} \;" to get the disk usage of each one.

Also consider the -maxdepth option for the find + du calls, to limit find to directories at the top level.

Refer to "man find" for more information (from a terminal (best), or from man7.org to get an up-to-date man page: http://man7.org/linux/man-pages/man1/find.1.html, but not from Die.net (they even strip the page timestamps, so you can't tell that their man pages are years old and outdated)).
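
As a hedged illustration of the find and du combination described above: -maxdepth is not available in every find, so this sketch uses the more portable "! -name . -prune" idiom to stay at the top level, and it selects directories by name rather than by timestamp. dated_dir_usage is a hypothetical helper name:

```shell
# dated_dir_usage BASE STAMP (hypothetical helper)
# Runs du -sk on each directory directly under BASE whose name contains STAMP.
dated_dir_usage() {
    # The starting point "BASE/." is named ".", so it is the only entry that
    # find descends into; every entry below it is pruned, which keeps the
    # search to a single level.
    find "$1/." ! -name . -prune -type d -name "*$2*" -exec du -sk {} +
}
```

Replacing du -sk with rm -r would delete the same set of directories, so it is worth checking the du output first.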

  • The two commands below produce no results: bash-3.2$ find . -type d -atime 3 bash-3.2$ find . -type d -atime +3 ; however, when I try -3 it gives me a list of all folders, including today's – Loke12k Nov 22 '19 at 10:08
  • 1
    Note that if you base the deletion on the access timestamp, just doing an ls in a directory may update this. You also have to be in the correct directory to run your given code, or you will delete the wrong things. If you delete directories with find and rm, you may want to add -depth to do a depth-first traversal, or use -prune to not enter the found directories. You may also want to restrict the deletion to avoid deleting directories accessed more than three days ago. – Kusalananda Nov 22 '19 at 10:20
  • @Loke12k : -3 is "less than three days", +3 is "more than three days" and 3 is "exactly three days". What you are looking for seems to be +1 (more than one day). – Nathael Pajani Nov 23 '19 at 14:07
  • @Kusalananda : you're right about access time. A solution would be to use a filesystem mounted with noatime option then access time will not be a problem. – Nathael Pajani Nov 23 '19 at 14:09
  • @Loke12k : I did not provide an exact solution as I do not have access to your exact configuration. I only pointed to some tracks so you can find a solution and learn at the same time. If you need exact solution, I can provide paid support :) – Nathael Pajani Nov 23 '19 at 14:15
  • 1
    On Solaris (which the user is using), the GNU find command is called gfind and is documented in man gfind. The benefit of looking at a manual on your own system with man is that you get the manual for the actual utility that is available on your system, not some random variant that may be available on some other person's machine. – Kusalananda Nov 24 '19 at 19:40
0

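Here is a much shorter and (from my point of view) more readable syntax for @Kusalananda's solution, using date instead of gdate (this assumes that date is GNU date; on Solaris, GNU date is installed as gdate):

# One YYYYMMDD stamp per line for each of the last three days (needs GNU date for -f)
days=$(printf '%s\n' '1 day ago' '2 days ago' '3 days ago' | date -f - +'%Y%m%d')
dpath="/opt/png/wsm/data/workdir/batch"
for day in $days ; do
    # Match directories whose names contain the stamp, as in the question
    du -s -h "$dpath"/*"$day"*/
    rm -r -f "$dpath"/*"$day"*/
done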
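
Before running a loop like this for real, it may help to print what would be removed. A small sketch, assuming the target directories have the date stamp somewhere in their names; dry_run is a hypothetical helper and deletes nothing:

```shell
# dry_run BASE STAMP... (hypothetical helper)
# Prints the directories directly under BASE whose names contain any of the
# given date stamps. Nothing is removed.
dry_run() {
    base=$1
    shift
    for day in "$@"; do
        for dirpath in "$base"/*"$day"*/; do
            # When nothing matches, the pattern is left unexpanded; skip it.
            [ -d "$dirpath" ] || continue
            echo "would remove: $dirpath"
        done
    done
}
```

For example, dry_run "$dpath" $days lists the candidates for the three dates computed above.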
0

If your Solaris system has bash, then it will typically have zsh as well.

Since that date format happens to sort like numbers, you could use zsh's <start-end> decimal number range glob operator:

zsh -c '
  zmodload zsh/datetime
  now=$EPOCHSECONDS day=$(( 24 * 60 * 60 ))
  for var t (start $(( now - 3*day )) end $(( now - 1*day )) )
    strftime -s $var %Y%m%d $t
  range="<$start-$end>"
  rm -rf -- *$~range*(/)
'

Beware that if run on the wrong day of the year in the middle of the night when switching from/to summer time, subtracting 24*60*60 seconds from the current time might land you on the same day or 2 days ago.

To get the cumulative disk usage of those directories, replace rm -rf with du -c. Whether that's the amount of space that will be reclaimed after you remove them depends on whether any files in those directories have hard links elsewhere.