0

I have come across several similar questions, but the solutions seem to be failing me, and I believe it is because I have some junk data in my file that is breaking my attempt to filter by time stamp.

Here is an example file:

2020-12-12 12:10:00,3,4,2
2020-12-12 12:11:00,4,3,2
2020-12-12 12:12:00,4,3,2
2020-12-12 12:13:00,4,3,2
20-12-12 12:14:00,4,3,2
4,3,2
2020-12-12 12:16:00,4,3,1
-12-12 12:17:00,4,3,2
2020-12-12 12:18:00,5,3,2

I'd like to filter this file, creating a new file only containing valid time stamps (each line should start with a valid time stamp).

BegDate="2020-12-12 12:11:00"
EndDate="2020-12-12 12:16:00"

I've tried using sed

TimeWindow=$(sed -rne '/'$BegDate'/,/'$EndDate'/p' $MyFile)
echo $TimeWindow > NewFile.csv

and awk

awk -v from=$BegTime -v to=$EndTime '$1>=from && $1<=to' "$MyFile" > "NewFile.csv"

But both are failing

The desired result for "NewFile.csv" is

2020-12-12 12:11:00,4,3,2
2020-12-12 12:12:00,4,3,2
2020-12-12 12:13:00,4,3,2
2020-12-12 12:16:00,4,3,1
Vint
  • 107
  • 1
    This gives the output in your example, but I would look into the refined answers below. awk -F, -v from="$BegDate" -v to="$EndDate" '$1>=from && $1<=to' "$MyFile" – Kaffe Myers Mar 03 '22 at 19:34
  • I can't understand how you can compare dates with <,>,<= or >= with out translating it to an epoch timestamp. For doing that you need an OO programming languages. – DanieleGrassini Mar 03 '22 at 19:49
  • 2
    @DanieleGrassini, here <=/>= do lexicographical (string) comparisons, not number comparisons. And with that YYYY-MM-DD HH:MM:SS timestamp format, lexicographical order is the same as chronological order. – Stéphane Chazelas Mar 03 '22 at 21:10

1 Answers1

3

You can improve you awk approach by also checking that the first field follows a valid date format.

LC_ALL=C awk -F, -v from="$BegTime" -v to="$EndTime" '
  BEGIN {valid = "^" from "$"; gsub("[0-9]", "[0-9]", valid)}
  $1 ~ valid && $1 >= from && $1 <= to'

Above the regexp to validate dates is built from $BegTime by replacing digits in it with [0-9] (which in the C locale is the same as [0123456789]) and adding ^ at the start and $ at the end for anchoring at start and end.

You could refine the regexp if you also wanted to exclude invalid dates such as 9999-99-99 99:99:99 or 2022-02-29 00:00:00.

Make sure you quote all shell parameter expansions (see how your code matches one of the examples at Security implications of forgetting to quote a variable in bash/POSIX shells). Also note the discrepancy between BegTime and BegDate in your question.