I need some help or advice regarding awk and its use of regular expressions. I have a data input file with an irregular structure. To parse this file correctly, I need to recognize a line of the following form:
@ 8/1/17, 10:04 PM
A line with this pattern marks the end of a complete transaction. It's simply a date and time stamp preceded by a space and the @ character.
I've cobbled together a regular expression that seems to match in "most" usage:
\W\@\W[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}\,\W[0-9]{1,2}\:[0-9]{2}\W[AP]M
However, it does not seem to match when used in the following awk statement:
$ awk 'match($0, /\W\@\W[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}\,\W[0-9]{1,2}\:[0-9]{2}\W[AP]M/) {print $0}' testfile2.txt
My system (macOS Mojave) has an old version of awk: awk version 20070501.
I've also found:
grep -e fails to match this pattern to any line in testfile2.txt, but egrep and grep -E do match the lines I expected them to match.
awk 'match($0, /\@/) {print $0}' testfile2.txt does match (& print) the expected lines, but I can't rely on a single character!
Here's testfile2.txt:
+13054261988: Forwarding data to primary repository
@ 1/7/18, 4:21 PM
+16744774911: Use this URL: https://www.repo-prime.ga/
@ 1/7/18, 4:22 PM
+13054261988: Will do. Passwords OK?
@ 1/7/18, 6:12 PM
+16744774911: No, use 2FA for all transactions
@ 1/7/18, 8:56 PM
+13054261988: Using Google's authenticator?If so, will need additional information.
@ 1/7/18, 9:36 PM
+13054261988: RSVP ASAP, I have transactions that need to be uploaded.
@ 1/7/18, 9:46 PM
Is my regular expression failing to match in awk usage due to an error I can't see in my awk statement, or is it due to the regex itself, a combination of both, etc?
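One possible explanation, offered as an assumption rather than a confirmed diagnosis: \W is a GNU extension, and awk version 20070501 may not support {n,m} interval expressions, so neither would behave as it does in egrep. A minimal sketch that sidesteps both, using literal spaces instead of \W and [0-9][0-9]? instead of [0-9]{1,2} (the function name match_stamps is hypothetical, chosen only for this illustration):

```shell
# Print only lines containing a stamp of the form "@ M/D/YY, H:MM AM|PM".
# Assumption: the separators are plain single spaces, as in the sample file.
# No \W and no {n,m}: only constructs found in basic POSIX-style EREs.
match_stamps() {
  awk '/@ [0-9][0-9]?\/[0-9][0-9]?\/[0-9][0-9]?, [0-9][0-9]?:[0-9][0-9] [AP]M/' "$@"
}

# Usage: pass a file name, or pipe data in on stdin.
printf '%s\n' \
  '+13054261988: Will do. Passwords OK?' \
  '@ 1/7/18, 6:12 PM' | match_stamps
# prints: @ 1/7/18, 6:12 PM
```

With the file from the question, this would be called as match_stamps testfile2.txt. The pattern is left unanchored so that it matches whether or not the @ is preceded by a space.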
egrep and grep -E, but not grep -e on macos. Also works in 'BBEdit v 13'. – Seamus Oct 31 '19 at 18:54
macos has a different set of expressions than my Linux distro (where something similar worked). Actually, it seems that macos may be inconsistent between (for example) awk and egrep/grep -E. Making progress now! – Seamus Oct 31 '19 at 19:03
awk '$1=="@" && $4 ~/^[AP]M$/' or awk '/@.*\<[AP]M\>/' could probably be enough... – JJoao Oct 31 '19 at 19:50
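The field-based test from the last comment can be tried on its own. This is a small sketch, assuming awk's default whitespace splitting so that a stamp line breaks into the fields @, date, time, and AM/PM (the function name stamp_lines_by_field is hypothetical):

```shell
# Select lines whose first field is "@" and whose fourth field is AM or PM.
# Uses only field comparison and a simple anchored match, no \W or {n,m}.
stamp_lines_by_field() {
  awk '$1 == "@" && $4 ~ /^[AP]M$/' "$@"
}

printf '%s\n' \
  '+16744774911: No, use 2FA for all transactions' \
  '@ 1/7/18, 8:56 PM' | stamp_lines_by_field
# prints: @ 1/7/18, 8:56 PM
```

This check is looser than a full date and time pattern: any line with @ as its first field and AM or PM as its fourth would pass.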