
I need help deleting files from a Unix directory that are older than 90 days, while retaining files that belong to an end-of-month date (e.g. 28 Feb 2022, 31 March, 30 April). For example:

I have these files in the directory /usr/home:

  1. ABC.txt.20220529 2022-05-30
  2. ABC.txt.20220530 2022-05-31
  3. ABC.txt.20220531 2022-06-01
  4. ABC.txt.20220601 2022-06-02

If I run my script on the 91st day from 1st June, it should delete 1 and should not delete 2/3/4. I need a script in either shell or Python.

AK143
  • 2
    How are you working with the date of each file? Are you considering the access date, change date or birth/created date? Or are you working directly with the filenames? – Edgar Magallon Sep 15 '22 at 06:01
  • 2
    In case you are working with the filenames, which is the correct format for them: ABC.txt.20220529 2022-05-30 or ABC.txt.20220529? – Edgar Magallon Sep 15 '22 at 06:03

3 Answers

3

Assuming that's the date at the end of the file name you want to consider (the 20220531 in ABC.txt.20220531), in zsh, you could do:

#! /bin/zsh -
zmodload zsh/datetime
day=86400
strftime -s range '<19700101-%Y%m%d>' $(( EPOCHSECONDS - 91 * day ))
not_last() {
  local t
  TZ=UTC0 strftime -rs t %Y%m%d $REPLY:e &&
    TZ=UTC0 strftime -s t %d $(( t + day )) &&
    (( t != 1 ))
}
echo rm -f -- **/*.txt.$~range(-.+not_last)

If it's the last modification time:

#! /bin/zsh -
zmodload zsh/datetime
zmodload zsh/stat
day=86400
not_last() {
  local t
  stat -A t +mtime -- $REPLY &&
    strftime -s t %d $(( t + day )) &&
    (( t != 1 ))
}
echo rm -f -- **/*.txt.*(-.m+90+not_last)

Bear in mind that the next day is computed by adding 86400 seconds, so if there was a DST change on the first or last day of the month, there's a tiny chance it could throw off the next-day calculation.

The type and mtime of the file are considered after symlink resolution. If you want to ignore symlinks, remove the - glob qualifier. Add the D qualifier to also consider hidden files. Remove the **/ if you don't want to consider files in subdirectories.

Remove the echo (dry-run) if happy with the result.

Note that m+90, like find's -mtime +90, selects files that are 91 days old or older; change it to m+89 for files that are 90 days old or older.
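
The off-by-one above comes from the age being counted in whole 24-hour periods, with the fractional part discarded before the comparison. A minimal sketch of that arithmetic in plain sh, assuming this rounding rule; `whole_days_old` and `matches_plus` are hypothetical helper names used only for illustration, not part of find or zsh:

```shell
#!/bin/sh
# Hypothetical helpers illustrating how "+N" age tests round.

# whole_days_old SECONDS: age in whole days, fractional part discarded.
whole_days_old() {
  echo $(( $1 / 86400 ))
}

# matches_plus N SECONDS: succeeds if an age of SECONDS would satisfy "+N",
# i.e. the number of whole days is strictly greater than N.
matches_plus() {
  [ "$(whole_days_old "$2")" -gt "$1" ]
}

# A file that is 90 days and 23 hours old counts as 90 whole days.
age=$(( 90 * 86400 + 23 * 3600 ))
if matches_plus 90 "$age"; then
  echo "+90 selects it"
else
  echo "+90 skips it"
fi
```

With these assumptions, a file has to be at least 91 full days old before +90 selects it, which is why +89 is the value to use for "90 days old or older".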

  • Liked this answer! It's very useful for zsh shells. I would like if you can provide an example by using GNU date according to your answer here: https://unix.stackexchange.com/a/223546/414186. Or it's harder implementing in that way perhaps? – Edgar Magallon Sep 15 '22 at 06:51
  • @EdgarMagallon, there's nothing stopping you calling zsh from other shells. Actually, those in my answer are written as standalone scripts which you can call from any shell or non-shell. – Stéphane Chazelas Sep 15 '22 at 07:53
  • That's right, if the user has zsh installed then they won't have problems. macOS and some Linux distros have zsh as the default shell. – Edgar Magallon Sep 15 '22 at 18:44
2

The last days of the month are the dates ending in 0131, 0331, 0430... 1231, plus 0229 in leap years and 0228 in other years.
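
The rule above can also be checked directly with shell arithmetic, without generating a list of dates. This is only a sketch under stated assumptions: the input is a well-formed YYYYMMDD string, the Gregorian leap-year rule applies, and `is_last_day_of_month` is a hypothetical helper name, not something from the commands below.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the YYYYMMDD date in $1 is the last day
# of its month. Assumes a well-formed 8-digit date. Variables are global
# because POSIX sh has no "local".
is_last_day_of_month() {
  ymd=$1
  year=${ymd%????}       # first four digits
  monthday=${ymd#????}   # last four digits, MMDD
  case $monthday in
    0131|0331|0531|0731|0831|1031|1231) return 0 ;;
    0430|0630|0930|1130) return 0 ;;
    0229) return 0 ;;    # only exists in leap years, where it is the last day
    0228)
      # Gregorian rule: leap if divisible by 4, except centuries not
      # divisible by 400. In a leap year the 28th is not the last day.
      if [ $(( year % 4 )) -eq 0 ] &&
         { [ $(( year % 100 )) -ne 0 ] || [ $(( year % 400 )) -eq 0 ]; }
      then
        return 1
      fi
      return 0 ;;
  esac
  return 1
}

for d in 20220228 20240228 20240229 20220430 20220530; do
  if is_last_day_of_month "$d"; then
    echo "$d: end of month, keep"
  else
    echo "$d: not end of month"
  fi
done
```

A check like this could be combined with a cutoff comparison on the same YYYYMMDD suffix to decide which files are candidates for removal.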

With GNU date and a shell with support for zsh-style {x..y} brace expansion, you can get the list of the last day in February from 1970 to 2099 for instance with:

printf '%s\n' {1970..2099}'-03-01 -1 day' | date -uf- +%Y%m%d

So you could construct an extended regexp that matches those dates with:

regexp=$(
  {
    printf '%s\n' {0{1,3,5,7,8},10,12}31 {04,06,09,11}30
    printf '%s\n' {1970..2099}'-03-01 -1 day' | date -uf- '+%Y%m%d'
  } | paste -sd '|' -
)

So with GNU date and an awk implementation that supports NUL as Record Separator:

LC_ALL=C find . -name '*.txt.[0-9]*' -print0 |
  LC_ALL=C awk -F. -v 'RS=\0' \
                   -v 'ORS=\0' \
                   -v regexp="($regexp)\$" \
                   -v cutoff="$(date -d '90 days ago' +%Y%m%d)" '
    /txt\.[0-9]{8}$/ && $NF < cutoff && $NF !~ regexp' |
  xargs -r0 echo rm -f

Or to match on the last modification time of the files as opposed to the date at the end of their name, with the GNU implementation of find:

LC_ALL=C find . -regextype posix-extended \
                -regex '.*\.txt\.[0-9]{8}' \
                -mtime +90 \
                -printf '%TY%Tm%Td-%p\0' |
  LC_ALL=C awk -v 'RS=\0' \
               -v 'ORS=\0' \
               -v regexp="^[0-9]*($regexp)-" '
    $0 !~ regexp {print substr($0, 10)}' |
  xargs -r0 echo rm -f

You could use the same approach to construct a zsh glob pattern that matches those.

#! /bin/zsh -
zmodload zsh/datetime
set -o extendedglob
day=86400

start=19700101
strftime -s end %Y%m%d $(( EPOCHSECONDS - 91 * day ))
range="<$start-$end>"

list=({0{1,3,5,7,8},10,12}31 {04,06,09,11}30)

for ((y = $start[1,4]; y <= $end[1,4]; y++)) {
  TZ=UTC0 strftime -rs t %Y%m%d ${y}0301 &&
    TZ=UTC0 strftime -s d %Y%m%d $(( t - day )) &&
    list+=($d)
}
endofmonth="*(${(j[|])list})"

echo rm -f -- **/*.txt.($~range~$~endofmonth)

2

I like the other answers, but I have a simpler solution. Also, the original question assumes that the last-of-month (date) file is always there. But we all know that you don't always have the snapshot from the last of the month.

I'm modifying the problem a bit and answering that instead:

  1. keep last file for each month that is there, not necessarily the 31st, 30th, 28th
  2. keep files that are 0-90 days old
  3. remove files older than 90 days, but not if they are the last for a month.

I don't care about the list of files in the example, because the approach is important. You can adjust the input if you understand the approach.

Given a random list of days:

function dates() { echo 2022-12-{06..12}  2022-{01,03,05}-{00..31} 2022-02-{00..28} 2022-{04,06}-{01..30}   2022-12-{01..06} 2022-10-{01..03}| tr ' ' \\n;  }

In this list all days are present in months 1-6, only 12 days are listed in December and only 3 days in October.

To find the last day present in this list for each month, we sort the list in ascending order and, for each month, remember the latest date seen (each later date overwrites the earlier one). This gives the last of month.

$ dates | sort \
  | awk -F- '{ lom[$1$2]=$1"-"$2"-"$3 } END { for (i in lom) { print lom[i]} }' \
  | tee /tmp/lom  
2022-01-31
2022-02-28
2022-03-31
2022-04-30
2022-05-31
2022-06-30
2022-10-03
2022-12-12

I don't care about the calendar dates in a real-life IT problem. I care about the files that are actually there. If the 12th is the last snapshot for December and there is no 31st, because the system was broken that day, then I want to keep the 12th.

So now we know what not to delete. The other part is older than 90 days:

dates | awk -v cutoff=$(date +%Y-%m-%d -d 'today -90 days') \
            '{ if ($1 < cutoff) { print $1 } }'  \
      | grep -v -f /tmp/lom

This will print dates that are over 90 days old and exclude the last-of-month entries. Short and sweet. Perfect.

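As the guru pointed out, the whole thing above can be done in one line. Wow! The seen counter is tested first here so that it is updated for every date, not only for those older than the cutoff; otherwise the newest date older than the cutoff would be kept even when later dates exist for that month.

dates | sort -r | awk -v cutoff="$(date +%F -d '-90 days')" -F- 'seen[$1$2]++ && $0 < cutoff'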

The generated list of dates to purge contains this:

2022-01-00  2022-03-11  2022-05-22  2022-04-05
2022-01-01  2022-03-12  2022-05-23  2022-04-06
2022-01-02  2022-03-13  2022-05-24  2022-04-07
2022-01-03  2022-03-14  2022-05-25  2022-04-08
2022-01-04  2022-03-15  2022-05-26  2022-04-09
2022-01-05  2022-03-16  2022-05-27  2022-04-10
2022-01-06  2022-03-17  2022-05-28  2022-04-11
2022-01-07  2022-03-18  2022-05-29  2022-04-12
2022-01-08  2022-03-19  2022-05-30  2022-04-13
2022-01-09  2022-03-20  2022-02-00  2022-04-14
2022-01-10  2022-03-21  2022-02-01  2022-04-15
2022-01-11  2022-03-22  2022-02-02  2022-04-16
2022-01-12  2022-03-23  2022-02-03  2022-04-17
2022-01-13  2022-03-24  2022-02-04  2022-04-18
2022-01-14  2022-03-25  2022-02-05  2022-04-19
2022-01-15  2022-03-26  2022-02-06  2022-04-20
2022-01-16  2022-03-27  2022-02-07  2022-04-21
2022-01-17  2022-03-28  2022-02-08  2022-04-22
2022-01-18  2022-03-29  2022-02-09  2022-04-23
2022-01-19  2022-03-30  2022-02-10  2022-04-24
2022-01-20  2022-05-00  2022-02-11  2022-04-25
2022-01-21  2022-05-01  2022-02-12  2022-04-26
2022-01-22  2022-05-02  2022-02-13  2022-04-27
2022-01-23  2022-05-03  2022-02-14  2022-04-28
2022-01-24  2022-05-04  2022-02-15  2022-04-29
2022-01-25  2022-05-05  2022-02-16  2022-06-00
2022-01-26  2022-05-06  2022-02-17  2022-06-01
2022-01-27  2022-05-07  2022-02-18  2022-06-02
2022-01-28  2022-05-08  2022-02-19  2022-06-03
2022-01-29  2022-05-09  2022-02-20  2022-06-04
2022-01-30  2022-05-10  2022-02-21  2022-06-05
2022-03-00  2022-05-11  2022-02-22  2022-06-06
2022-03-01  2022-05-12  2022-02-23  2022-06-07
2022-03-02  2022-05-13  2022-02-24  2022-06-08
2022-03-03  2022-05-14  2022-02-25  2022-06-09
2022-03-04  2022-05-15  2022-02-26  2022-06-10
2022-03-05  2022-05-16  2022-02-27  2022-06-11
2022-03-06  2022-05-17  2022-04-00  2022-06-12
2022-03-07  2022-05-18  2022-04-01  2022-06-13
2022-03-08  2022-05-19  2022-04-02  2022-06-14
2022-03-09  2022-05-20  2022-04-03  2022-06-15
2022-03-10  2022-05-21  2022-04-04  2022-06-16
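
The same idea can be applied to file names rather than bare dates. The following is a rough sketch, assuming names end in .YYYYMMDD (as in ABC.txt.20220529), contain no newlines, and all belong to one series; `purge_candidates` is a hypothetical function name and the cutoff is passed in as a YYYYMMDD argument.

```shell
#!/bin/sh
# purge_candidates CUTOFF
# Hypothetical helper: reads file names ending in .YYYYMMDD on stdin and
# prints those dated before CUTOFF (YYYYMMDD), except the latest one
# present for each month.
purge_candidates() {
  # Prefix each name with its date suffix so the lines sort by date.
  awk -F. '$NF ~ /^[0-9]+$/ && length($NF) == 8 { print $NF, $0 }' |
    LC_ALL=C sort -r |
    awk -v cutoff="$1" '
      {
        month = substr($1, 1, 6)   # YYYYMM
        name = substr($0, 10)      # everything after "YYYYMMDD "
        # The first line seen for a month is its latest date: keep it.
        if (seen[month]++ && $1 < cutoff) print name
      }'
}

printf '%s\n' snap.20221001 snap.20221003 snap.20221002 snap.20221115 |
  purge_candidates 20221110
```

In this example only snap.20221002 and snap.20221001 are printed: snap.20221003 is the last file present for October and snap.20221115 is newer than the cutoff. With GNU date, a cutoff could be obtained from date -d '90 days ago' +%Y%m%d; review the printed list before passing it to rm.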
  • 1
    See dates | sort -r | sort -t- -uk1,2 to get the latest date for each month and comm to do list subtractions (though the whole thing could be done with dates | sort -r | awk -F- '$0 < cutoff && seen[$1$2]++') – Stéphane Chazelas Sep 15 '22 at 18:51
  • 1
    Note that %Y-%m-%d can also be written %F (see also date --iso-8601) – Stéphane Chazelas Sep 15 '22 at 18:55