9

I'm stuck on how to delete lines that are newer than a given date. Here is a snippet of the file's contents.

buildsave.txt

647919 2013/11/30
647946 2013/11/30
647955 2013/12/01
648266 2013/12/03
648267 2013/12/03
648674 2013/12/04

I would like to remove the lines dated 2013/12/03 or later, leaving only

647919 2013/11/30
647946 2013/11/30
647955 2013/12/01

How can this be done through bash?

Braiam
  • 35,991
Jason G
  • 93

4 Answers

9

Those dates sort the same lexicographically and chronologically, so it's only a matter of doing a lexical comparison:

awk '$2 < "2013/12/03"'
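As a rough sketch of how this could be used with a variable cutoff: the file name and contents below are just the sample from the question, and `cutoff` and `kept` are names made up for illustration.

```shell
# Hypothetical sample file reproducing the data from the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/buildsave.txt" <<'EOF'
647919 2013/11/30
647946 2013/11/30
647955 2013/12/01
648266 2013/12/03
648267 2013/12/03
648674 2013/12/04
EOF

# Pass the cutoff in as an awk variable. Both values contain slashes,
# so awk treats them as strings and compares them lexically.
cutoff="2013/12/03"
kept=$(awk -v cutoff="$cutoff" '$2 < cutoff' "$tmpdir/buildsave.txt")
printf '%s\n' "$kept"
```

This only works because the dates are zero-padded and ordered year, month, day; a format such as `3/12/2013` would not sort correctly as text.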
5

If your system includes the GNU version of the date command, you could use that to convert the date field (after stripping the trailing <br>, if present) to seconds-since-epoch and compare directly to the cutoff date in the same format, e.g. in bash

testsecs=$(date +%s --date="2013/12/03")
while IFS= read -r line; do
  read -r x d <<< "$line" 
  if (( $(date +%s --date="${d%<br>}") < $testsecs )); then
    printf '%s\n' "$line"
  fi
done < buildsave.txt

[Note that this doesn't perform an in-place deletion - you'd need to save the results to a temporary file and rename.]
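A minimal sketch of the temporary-file-and-rename step mentioned in the note, assuming `mktemp` is available. The working directory and sample data are hypothetical, and the filter here is a short lexical awk comparison purely for brevity; any command that writes the kept lines to stdout would fit in its place.

```shell
# Hypothetical working copy so the sketch is self-contained.
workdir=$(mktemp -d)
file="$workdir/buildsave.txt"
printf '%s\n' '647919 2013/11/30' '648266 2013/12/03' '648674 2013/12/04' > "$file"

# Create the temp file in the same directory so the rename stays on one filesystem.
tmp=$(mktemp "$workdir/buildsave.XXXXXX")

# Replace the original only if the filter succeeded; otherwise clean up.
if awk '$2 < "2013/12/03"' "$file" > "$tmp"; then
  mv -- "$tmp" "$file"
else
  rm -f -- "$tmp"
fi
```

Note that the replaced file takes on the permissions of the temporary file, which `mktemp` creates readable and writable by the owner only.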

steeldriver
  • 81,074
  • You sir saved me a headache. This is exactly what I was looking for! – Jason G Jun 05 '14 at 00:22
  • yuck! Those dates sort the same lexicographically and chronologically, there's no need converting them to integer and run 5 commands, create one temp file and two pipes per line! – Stéphane Chazelas Jun 05 '14 at 06:50
2

I assume the <br> at the end of the date column in your question is unwanted. In any case, it can be removed easily if it is present. Coming to the main part, you can achieve what you want by first sorting on the date column:

sort -k 2,2 filename.txt

This outputs the file sorted by date. The command below then gives what you are looking for.

sort -k 2,2 filename.txt | awk '/2013\/12\/03/ {exit} {print}'

Explanation

The sort command sorts the file on the second column, which is the date. Since your input file is already sorted, I modified it to test that the command works. After that, the awk command prints every line until it encounters the matching date.

Testing

cat filename.txt

647919 2014/01/01
647946 2012/11/30
647955 2011/01/04
648266 2013/12/03
648267 2013/12/03
648674 2013/12/04

Now, the output of sort -k 2,2 filename.txt is,

647955 2011/01/04
647946 2012/11/30
648266 2013/12/03
648267 2013/12/03
648674 2013/12/04
647919 2014/01/01

The file is now sorted on the second column. To select the lines before a particular date,

sort -k 2,2 filename.txt | awk '/2013\/12\/03/ {exit} {print}'

In the above example, I get all the lines before 2013/12/03. The output is,

647955 2011/01/04
647946 2012/11/30
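One caveat: the `/2013\/12\/03/ {exit}` pattern only stops when a line with exactly that date exists. A possible variant, sketched under the assumption that dates are always zero-padded `YYYY/MM/DD`, stops at the first date on or after the cutoff instead. It restricts the sort key to the second field with `-k 2,2` so the whole date string is compared. The sample file below is hypothetical and deliberately has no 2013/12/03 line.

```shell
# Hypothetical test file with no line dated exactly 2013/12/03.
tmpdir=$(mktemp -d)
cat > "$tmpdir/filename.txt" <<'EOF'
647919 2014/01/01
647946 2012/11/30
647955 2011/01/04
648674 2013/12/04
EOF

# Sort on the date field only, then stop at the first date that is
# not strictly before the cutoff (string comparison in awk).
result=$(sort -k 2,2 "$tmpdir/filename.txt" |
  awk '$2 >= "2013/12/03" {exit} {print}')
printf '%s\n' "$result"
```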

No, the <br> is part of my file

If this is the case, we can tweak the command slightly as below.

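awk '{print $1, substr($2, 1, length($2)-4)}' filename.txt |
sort -k 2,2 | awk '/2013\/12\/03/ {exit} {print}'

So I am just removing the <br> tags from the second column and then piping the result into the command mentioned above. Note that sort must read from the pipe here rather than from filename.txt, otherwise the stripped output is ignored.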

References

https://unix.stackexchange.com/a/11323/47538

https://unix.stackexchange.com/a/83069/47538

Ramesh
  • 39,297
-1

A quick and dirty solution for the one date you have given is to delete, with sed, all lines that match later dates:

sed -i "" "#[0-9]* 2013/12/0[4-9]#d" testfile.txt
sed -i "" "#[0-9]* 2013/12/[123][0-9]#d" testfile.txt
sed -i "" "#[0-9]* 2014/[0-9][0-9]/[0-3][0-9]#d" testfile.txt

The -i "" edits the file in place without creating a backup, but you could also pipe testfile.txt through all three sed commands without the -i "".

Depending on your system (linux or mac) you can omit the "" after -i, and sometimes you need the -e parameter for the regular expressions. You will have to try what works for you.

Related question with further info on sed: https://stackoverflow.com/questions/5410757/

toppy
  • 1
  • # is the comment command in sed, so those won't do anything. Use sed '\#pattern#d' if you want a different RE delimiter than /. The [0-9]* part is redundant without a ^ anchor. -e is only needed when you want to pass several expressions. linux is a kernel, mac is a computer brand, none have anything to do with sed. The distinction is between GNU sed and FreeBSD sed (which OS/X (as found on some macs) inherited). – Stéphane Chazelas Jun 05 '14 at 10:32
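Following that comment, here is a sketch of what the sed approach might look like with the `\#...#` delimiter form. It reads the file and prints the result rather than using `-i`, which sidesteps the GNU/BSD difference. The first pattern uses `0[3-9]` on the assumption that the 2013/12/03 lines should go too, as in the question's desired output; the file name and contents are just the question's sample.

```shell
# Hypothetical sample file reproducing the data from the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/testfile.txt" <<'EOF'
647919 2013/11/30
647946 2013/11/30
647955 2013/12/01
648266 2013/12/03
648267 2013/12/03
648674 2013/12/04
EOF

# \#regex# selects '#' as the regex delimiter so the slashes in the
# dates need no escaping; each -e deletes one range of later dates.
filtered=$(sed -e '\# 2013/12/0[3-9]#d' \
               -e '\# 2013/12/[123][0-9]#d' \
               -e '\# 2014/[0-9][0-9]/[0-3][0-9]#d' "$tmpdir/testfile.txt")
printf '%s\n' "$filtered"
```

As in the original answer, the patterns are hard-coded for this one cutoff and only cover dates through 2014.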