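#!/bin/bash
# epoch timestamp of 30 days ago
cutoff=$( date -d "30 days ago" "+%s" )
while read -r line ; do
    # the first three colon separated fields hold the date and time
    timestamp=$( date -d "$( printf '%s\n' "$line" | cut -d: -f1,2,3 )" "+%s" )
    if [ "$timestamp" -gt "$cutoff" ] ; then
        printf -- '%s\n' "$line"
    fi
done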
use like this
$ purge.sh < data > newdata
explanation:
first get the timestamp of 30 days ago in epoch format. then parse the timestamp from each input line and convert it to epoch format as well. then compare each parsed timestamp against the 30 days ago timestamp and print only those lines that are newer.
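the steps above can be tried out on two generated lines. this is only a sketch: it assumes GNU date, and the line layout ("YYYY-MM-DD HH:MM:SS:message") and the message texts are made up for illustration, not taken from your data.

```shell
# hypothetical sample lines: one 45 days old, one 5 days old
old=$( date -d "45 days ago" "+%Y-%m-%d %H:%M:%S" )
new=$( date -d "5 days ago" "+%Y-%m-%d %H:%M:%S" )

cutoff=$( date -d "30 days ago" "+%s" )

# run the same loop as the purge script and collect what it prints
kept=$(
    printf '%s\n' "$old:old entry" "$new:new entry" |
    while read -r line ; do
        timestamp=$( date -d "$( printf '%s\n' "$line" | cut -d: -f1,2,3 )" "+%s" )
        if [ "$timestamp" -gt "$cutoff" ] ; then
            printf -- '%s\n' "$line"
        fi
    done
)

# only the line that is 5 days old is left
printf '%s\n' "$kept"
```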
the epoch format is the entire timestamp in one number: the number of seconds passed since 1970-01-01 00:00:00 UTC. there is nothing special about that date, it is just the convention everyone agreed upon. the number is typically an integer but can have a fractional part if more precision than seconds is needed. the fact that it is just a number makes time comparison easy.
see here for more info on epoch: https://en.wikipedia.org/wiki/Unix_time
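a small sketch of the conversion in both directions, assuming GNU date. the -u switch pins everything to UTC so the numbers do not depend on your timezone.

```shell
# one day after the epoch is 24 * 60 * 60 seconds
one_day=$( date -u -d "1970-01-02 00:00:00" "+%s" )
echo "$one_day"    # 86400

# and back: @NUMBER tells GNU date that the input is an epoch timestamp
back=$( date -u -d "@86400" "+%Y-%m-%d %H:%M:%S" )
echo "$back"       # 1970-01-02 00:00:00
```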
some details
date -d "30 days ago" "+%s"
date is cool like that: it can parse human readable expressions. the "+%s" is the argument that makes date output the epoch format.
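to see that "30 days ago" really ends up about 30 days before now, the two epoch values can be subtracted. a sketch assuming GNU date. the variable names now and diff_days are only for illustration.

```shell
now=$( date "+%s" )
cutoff=$( date -d "30 days ago" "+%s" )

# difference in seconds, rounded to whole days
# (the rounding absorbs a possible one hour daylight saving shift)
diff_days=$(( (now - cutoff + 43200) / 86400 ))
echo "$diff_days"    # 30
```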
cut -d: -f1,2,3
the cut command cuts the first three columns from the input using colon as separator. this is necessary because the time format you used contains spaces and colons AND you reused colon as a column separator. this can be drastically simplified by using a better date time format. more on that later.
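here is what cut does to a single line. the sample line is hypothetical (your real format may differ), it only needs a timestamp with two colons followed by a colon and the rest.

```shell
# hypothetical input line: timestamp, colon, message
line='2022-05-04 21:30:24:disk usage at 91%'

# fields 1 to 3 are "2022-05-04 21", "30" and "24";
# cut glues them back together with the colons in between
stamp=$( printf '%s\n' "$line" | cut -d: -f1,2,3 )
echo "$stamp"    # 2022-05-04 21:30:24
```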
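[ "$timestamp" -gt "$cutoff" ]
this is shell speak for "timestamp is greater than cutoff". -gt compares the two values as integers, which is exactly what we want for epoch timestamps.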
printf -- '%s\n' "$line"
this is just a convoluted but robust way to say echo $line. unlike echo, it does not trip over lines that start with a dash or contain backslashes.
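a tiny sketch of why printf is the robust choice. the line content '-n' is a made-up worst case. what echo does with it depends on the shell, so only the printf result is shown as output.

```shell
line='-n'

# printf prints the line as is; the -- ends option parsing for printf itself
via_printf=$( printf -- '%s\n' "$line" )
echo "[$via_printf]"    # [-n]

# echo may treat the content as an option instead of text
# (in bash this prints nothing at all)
via_echo=$( echo $line )
```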
about the better time format
to make your life (and the life of your colleagues) easier i suggest you write your timestamps using the iso 8601 format
date -Iseconds
the seconds means you want precision up to seconds, which is usually fine enough.
compare
$ date -Iseconds
2022-05-04T21:30:23+02:00
$ date
Mi 4. Mai 21:30:24 CEST 2022
advantages in short: it has no spaces so it is one "word" for most text parsing tools. it is easily sortable. it is still human readable. it has no locale dependent strings (name of day and month).
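the "easily sortable" part can be sketched like this: a plain text sort puts iso timestamps in chronological order, as long as they all carry the same utc offset. the three sample lines are made up.

```shell
# hypothetical log lines in random order, all with the same +02:00 offset
sorted=$(
    printf '%s\n' \
        '2022-05-04T21:30:23+02:00 c' \
        '2021-12-31T08:00:00+02:00 a' \
        '2022-01-15T12:00:00+02:00 b' |
    LC_ALL=C sort
)

# oldest first: the lines ending in a, b, c
printf '%s\n' "$sorted"
```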
if you write your timestamps using iso format the purge code can be simplified to this
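cutoff=$( date -d "30 days ago" "+%s" )
while read -r isotimestamp rest ; do
    # the first word is the iso timestamp, everything else ends up in rest
    timestamp=$( date -d "$isotimestamp" "+%s" )
    if [ "$timestamp" -gt "$cutoff" ] ; then
        printf -- '%s %s\n' "$isotimestamp" "$rest"
    fi
done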
now instead of the extra cut we can use read, which splits the first "word" from the rest of the line.
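how read splits a line can be seen on a single hypothetical line fed in through a here document:

```shell
# read puts the first word into isotimestamp and all that follows into rest
read -r isotimestamp rest <<EOF
2022-05-04T21:30:23+02:00 disk usage at 91%
EOF

echo "$isotimestamp"    # 2022-05-04T21:30:23+02:00
echo "$rest"            # disk usage at 91%
```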
another approach
it would be easier and faster to just keep the last X lines of the file. for example if your system regularly produces at most two lines per day then just keep the last 60 lines.
tail -n 60 data > newdata
of course this only works if you get about the same number of lines per day. if you sometimes have over 9000 and sometimes just two lines per day then this approach will not work.
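if you want the trimmed result to end up under the original name, go through a temporary file. redirecting tail straight back into the file it reads from would empty the file before tail gets to see it. a sketch with a generated 100 line file standing in for your data (mktemp and seq are assumed to be available, as on most linux systems):

```shell
# stand-in for the real data file: 100 numbered lines
data=$( mktemp )
seq 1 100 > "$data"

# write the last 60 lines to a temporary file, then swap it in
tmp=$( mktemp )
tail -n 60 "$data" > "$tmp" && mv "$tmp" "$data"

count=$( wc -l < "$data" | tr -d ' ' )
first=$( head -n 1 "$data" )
echo "$count lines, first is $first"    # 60 lines, first is 41

rm -f "$data"
```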