Purge lines in a file based on age in shell script

Question

I wanted to record the slave delay in mysql server every minute,so prepared a simple script.

#!/bin/bash
echo "`date` : `mysql -uroot -pXXXXXX -e"show slave status\G" | grep -i seconds_behind_master | awk '{$1=$1};1'`"

Sample output would be like:

Wed Jul  1 11:40:17 +0530 2020 : Seconds_Behind_Master: NULL
Wed Jul  1 11:41:17 +0530 2020 : Seconds_Behind_Master: NULL
Thu Jul  2 11:42:17 +0530 2020 : Seconds_Behind_Master: 0
Fri Jul  3 11:40:17 +0530 2020 : Seconds_Behind_Master: 1
.
.
.
Fri Jul  31 4:40:17 +0530 2020 : Seconds_Behind_Master: 10

Now in that same script , is there a way i could include a purging logic to purge records older than 30 days based on the date field. I am using centos 8

Thanks in advance.

Welcome to the site. The script line you use to generate the entries seems rather involved; if you could paste the output of the mysql call, maybe contributors could help you make it more efficient? Also, may I recommend using the $( ... )-style notation for command substitutions instead of the backticks, as these are considered deprecated? — AdminBee, Jul 01 '20 at 07:47
@AdminBee Thanks .. will follow the formatting guidelines in my next post.As far the mysql output, i am only interested in seconds_behind_master which i greped out. I dont want my file to grow huge which will be not be of any use. Looking forward to help on how to approach purging — udhayan dharmalingam, Jul 01 '20 at 10:51

Lesmana · Answer 1 · 2022-05-05T07:21:21.117

cutoff=$( date -d "30 days ago" "+%s" )

while read -r line ; do
  timestamp=$( date -d"$( echo $line | cut -d: -f1,2,3 )" "+%s" )
  if [ $timestamp -gt $cutoff ] ; then
    printf -- '%s\n' "$line"
  fi
done

use like this

$ purge.sh < data > newdata

explanation:

first get the timestamp of 30 days ago in epoch format. then parse the timestamps from the input and print them in the epoch format. then compare the parsed timestamps against the 30 days ago timestamp and print only those lines that are newer.

the epoch format is the entire timestamp in one number. the number represents the amount of seconds passed since 1.1.1970. nothing special about that date it is just the convention everyone agreed upon. typically integer but can have a fractional part if more precision than seconds is needed. the fact that it is just a number makes it easy to do time comparison.

see here for more info on epoch: https://en.wikipedia.org/wiki/Unix_time

some details

date -d "30 days ago" "+%s"

date is cool like that it can parse human readable expressions.

the "+%s" is the argument for date so that it outputs the epoch format.

cut -d: -f1,2,3

the cut command cuts the first three columns from input using colon as separator. this is necessary because the time format you used contains spaces and colons AND you reused colon as a column separator. this can be drastically simplified when using a better date time format. more on that later.

[ $timestamp -gt $cutoff ]

this is bash speak for timestamp greater than cutoff

printf -- '%s\n' "$line"

this is just a convoluted but robust way to say echo $line

about the better timeformat

to make your life (and the life of your colleagues) easier i suggest you write your timestamps using the iso format

date -Iseconds

the seconds means you want precision up to seconds. which is usually fine enough.

compare

$ date -Iseconds
2022-05-04T21:30:23+02:00
$ date
Mi 4. Mai 21:30:24 CEST 2022

advantages in short: it has no spaces so it is one "word" for most text parsing tools. it is easily sortable. it is still human readable. it has no locale dependent strings (name of day and month).

if you write your timestamps using iso format the purge code can be simplifed to this

while read -r isotimestamp rest ; do
  timestamp=$( date -d"$isotimestamp" "+%s" )
  if [ $timestamp -gt $cutoff ] ; then
    printf -- '%s %s\n' "$isotimestamp" "$rest"
  fi
done

now instead of extra cut we can use read that splits the first "word" from the rest of the line.

another approach

it would be easier and faster to just keep the last X lines of the file. for example if your system regularly produces max two line per day then just keep the last 60 lines.

tail -n 60 data > newdata

of course this only works if you get about the same amount of lines per day. if you sometimes have over 9000 and sometimes just two lines per day then this approach will not work.

Purge lines in a file based on age in shell script

1 Answers1