Remove lines that are newer than a given date in a file

Question

I'm stuck on how to delete lines that are newer than a given date. Here is a snippet of the file's contents.

buildsave.txt

647919 2013/11/30
647946 2013/11/30
647955 2013/12/01
648266 2013/12/03
648267 2013/12/03
648674 2013/12/04

I would like to remove the lines dated 2013/12/03 or newer, leaving only:

647919 2013/11/30
647946 2013/11/30
647955 2013/12/01

How can this be done through bash?

score 9 · Answer 1 · answered Jun 05 '14 at 06:50

Those dates sort the same lexicographically and chronologically, so it's only a matter of doing a lexical comparison:

awk '$2 < "2013/12/03"'
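
As a minimal sketch of the same idea with the cutoff passed in as a parameter: the function name lines_before and its argument order are hypothetical, and the approach assumes zero-padded YYYY/MM/DD dates in the second field, as in the question.

```shell
# Hypothetical helper: print only the lines whose date (field 2) is
# earlier than the cutoff. Zero-padded YYYY/MM/DD dates compare
# correctly as plain strings.
lines_before() {
  cutoff=$1
  file=$2
  # Concatenating "" forces a string comparison in awk.
  awk -v cutoff="$cutoff" '($2 "") < (cutoff "")' "$file"
}
```

It could be called as lines_before 2013/12/03 buildsave.txt; the file itself is left untouched.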

Stéphane Chazelas

544,893

steeldriver · Accepted Answer · 2014-06-05T12:03:30.803

5

If your system includes the GNU version of the date command, you could use that to convert the date field (after stripping the trailing <br>, if present) to seconds-since-epoch and compare directly to the cutoff date in the same format, e.g. in bash:

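# Cutoff date as seconds since the epoch (requires GNU date)
testsecs=$(date +%s --date="2013/12/03")
while IFS= read -r line; do
  # Split the line into the build number (x) and the date (d)
  read -r x d <<< "$line"
  # Strip any trailing <br>, then keep lines dated before the cutoff
  if (( $(date +%s --date="${d%<br>}") < $testsecs )); then
    printf '%s\n' "$line"
  fi
done < buildsave.txt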

[Note that this doesn't perform an in-place deletion - you'd need to save the results to a temporary file and rename.]
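
The temporary-file-and-rename step could look like the sketch below. The function name prune_from is hypothetical, and for brevity the filter shown is a plain string comparison on the date field rather than the date-based loop above; any filter that writes the kept lines to standard output could be substituted.

```shell
# Hypothetical helper: rewrite FILE so that only lines dated before
# CUTOFF remain. The original is replaced only if the filter succeeds.
prune_from() {
  cutoff=$1
  file=$2
  # Create the temporary file next to the original so the rename
  # stays on the same filesystem.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if awk -v cutoff="$cutoff" '($2 "") < (cutoff "")' "$file" > "$tmp"; then
    mv -- "$tmp" "$file"
  else
    rm -f -- "$tmp"
    return 1
  fi
}
```

Note that the replaced file takes on the permissions of the temporary file, which may differ from those of the original.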

edited Jun 05 '14 at 12:03

answered Jun 04 '14 at 23:38

steeldriver

81,074

You sir saved me a headache. This is exactly what I was looking for! – Jason G Jun 05 '14 at 00:22
yuck! Those dates sort the same lexicographically and chronologically, there's no need to convert them to integers and run 5 commands, create one temp file and two pipes per line! – Stéphane Chazelas Jun 05 '14 at 06:50

score 2 · Answer 3 · edited Apr 13 '17 at 12:36

I assume the <br> at the end of the date column in your question is unwanted. In any case, it can be removed easily if it is present. As for the main part, you can achieve what you are trying to do using:

sort -k 2,2 filename.txt

The above command sorts the output by date. The command below should then give what you are looking for.

sort -k 2,2 filename.txt | awk '/2013\/12\/03/ {exit} {print}'

Explanation

The sort command sorts the file on the second column, which is the date. Because your input file is already sorted, I modified it to test that the command works. After that, the awk command prints every line until it encounters the first match.

Testing

cat filename.txt

647919 2014/01/01
647946 2012/11/30
647955 2011/01/04
648266 2013/12/03
648267 2013/12/03
648674 2013/12/04

Now, the output of sort -k 2,2 filename.txt is,

647955 2011/01/04
647946 2012/11/30
648266 2013/12/03
648267 2013/12/03
648674 2013/12/04
647919 2014/01/01

Now we are satisfied that the file is sorted on the second column. Now, to select values up to a particular date,

sort -k 2,2 filename.txt | awk '/2013\/12\/03/ {exit} {print}'

In the above example, I get all the values before 2013/12/03. The output is,

647955 2011/01/04
647946 2012/11/30

No, the <br> is part of my file

If this is the case, we can tweak the command slightly as below.

awk '{print $1, substr($2, 1, length($2)-4)}' filename.txt |
sort -k 2,2 | awk '/2013\/12\/03/ {exit} {print}'

So I am just removing the <br> tags from the second column and then piping the result to the above-mentioned command.
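
If the exact cutoff date might not occur in the file, the exit condition can compare the date field instead of matching it. The following is only a sketch: the function name lines_until is hypothetical, and it assumes zero-padded YYYY/MM/DD dates in the second field with no trailing <br>.

```shell
# Hypothetical helper: sort by the date field, then stop at the first
# line dated on or after the cutoff, whether or not that exact date
# appears in the file.
lines_until() {
  cutoff=$1
  file=$2
  # LC_ALL=C gives a plain byte-wise ordering for the date strings.
  LC_ALL=C sort -k 2,2 "$file" |
    awk -v cutoff="$cutoff" '($2 "") >= (cutoff "") { exit } { print }'
}
```

It could be called as lines_until 2013/12/03 filename.txt.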

References

https://unix.stackexchange.com/a/11323/47538

https://unix.stackexchange.com/a/83069/47538

Thank you for your input. This does work well; however, the exit condition doesn't work when the specific date doesn't exist in the file. — Jason G, Jun 05 '14 at 00:25
no, the br tags seems to be added just to make the thing readable. They can't be seen in the first revision — Braiam, Jun 05 '14 at 11:12

score -1 · Answer 4 · edited May 23 '17 at 12:40

Quick and dirty solution for the one date you have given: just use sed to delete all lines that match dates later than this date:

sed -i "" "#[0-9]* 2013/12/0[4-9]#d" testfile.txt
sed -i "" "#[0-9]* 2013/12/[123][0-9]#d" testfile.txt
sed -i "" "#[0-9]* 2014/[0-9][0-9]/[0-3][0-9]#d" testfile.txt

The -i "" edits the file in place without creating a backup, but you could also pipe testfile through all 3 sed commands without the -i "".

Depending on your system (linux or mac) you can omit the "" after -i and sometimes you need the -e parameter for the regular expressions. Gotta try what works for you.

Related question with further info on sed: https://stackoverflow.com/questions/5410757/

edited May 23 '17 at 12:40

Community

1

answered Jun 05 '14 at 02:03

toppy

1

# is the comment command in sed, so those won't do anything. Use sed '\#pattern#d' if you want a different RE delimiter than /. The [0-9]* part is redundant without a ^ anchor. -e is only needed when you want to pass several expressions. linux is a kernel, mac is a computer brand, none have anything to do with sed. The distinction is between GNU sed and FreeBSD sed (which OS/X (as found on some macs) inherited). – Stéphane Chazelas Jun 05 '14 at 10:32
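
Following that comment, the three commands might be corrected along the lines of the sketch below. The function name drop_after_20131203 is hypothetical; the patterns are the same three as in the answer with the redundant [0-9]* dropped, and the result goes to standard output because the -i syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: print FILE without the lines dated 2013/12/04 or
# later (through 2014), using \# to introduce # as the regex delimiter.
drop_after_20131203() {
  sed -e '\# 2013/12/0[4-9]#d' \
      -e '\# 2013/12/[123][0-9]#d' \
      -e '\# 2014/[0-9][0-9]/[0-3][0-9]#d' "$1"
}
```

Like the original commands, this keeps lines dated exactly 2013/12/03 and only covers the hard-coded date ranges.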
