I need some help or advice wrt awk and its use of regular expressions. I have a data input file with an irregular structure. To parse this file correctly I need to recognize a line of the following form:

@ 8/1/17, 10:04 PM  

A line with this pattern marks the end of a complete transaction. It's simply a date & time stamp preceded by a space and the @ character.

I've cobbled together a regular expression that seems to match in "most" usages:

\W\@\W[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}\,\W[0-9]{1,2}\:[0-9]{2}\W[AP]M  

However, it does not seem to match when used in the following awk statement:

$ awk 'match($0, /\W\@\W[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}\,\W[0-9]{1,2}\:[0-9]{2}\W[AP]M/) {print $0}' testfile2.txt

My system (macOS Mojave) has an old version of awk: awk version 20070501.

I've also found:

  • grep -e fails to match this pattern to any line in testfile2.txt, but egrep and grep -E do match the lines I expected them to match.

  • awk 'match($0, /\@/) {print $0}' testfile2.txt does match (& print) the expected lines, but I can't rely on a single character!

Here's testfile2.txt:

+13054261988: Forwarding data to primary repository
@ 1/7/18, 4:21 PM
+16744774911: Use this URL: https://www.repo-prime.ga/
@ 1/7/18, 4:22 PM
+13054261988: Will do. Passwords OK?
@ 1/7/18, 6:12 PM
+16744774911: No, use 2FA for all transactions
@ 1/7/18, 8:56 PM
+13054261988: Using Google's authenticator?

If so, will need additional information.
@ 1/7/18, 9:36 PM
+13054261988: RSVP ASAP, I have transactions that need to be uploaded.
@ 1/7/18, 9:46 PM

Is my regular expression failing to match in awk usage due to an error I can't see in my awk statement, or is it due to the regex itself, a combination of both, etc?

Chris Davies
Seamus

2 Answers

1
  • Why require \W (a non-word character) before the @? In your text file the @ is at the start of the line, so nothing precedes it.
  • There is no need to escape characters such as \@, \, and \: (they are not special characters in a regex).
  • Calling match() is redundant if you only need to test whether a line matches a pattern.

$ awk '/^@ [0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}, [0-9]{1,2}:[0-9]{2} [AP]M/' file
@ 1/7/18, 4:21 PM
@ 1/7/18, 4:22 PM
@ 1/7/18, 6:12 PM
@ 1/7/18, 8:56 PM
@ 1/7/18, 9:36 PM
@ 1/7/18, 9:46 PM
  • I need this to work with the match stmt as it's part of a larger script. – Seamus Oct 31 '19 at 18:41
  • Doesn't work on macos: $ awk 'match($0, //^@ [0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2}, [0-9]{1,2}:[0-9]{2} [AP]M/'/) {print $0}' testfile2.txt -bash: syntax error near unexpected token `)' – Seamus Nov 01 '19 at 01:21
  • Also doesn't work on macos: $ awk '/^@ [0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2}, [0-9]{1,2}:[0-9]{2} [AP]M/' testfile2.txt gives no output at all – Seamus Nov 01 '19 at 01:21
1

It seems that very old versions of awk do not support the {…} repetition syntax.
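One quick way to check this on a given machine is to feed awk a string that can only match if the braces are treated as a repetition count. This is a minimal sketch; the variable name interval_support is made up for illustration, and it assumes an awk without this feature either takes the braces literally or rejects the pattern, both of which are reported as "no":

```shell
# Probe whether the local awk treats {n} as a repetition count.
# "aa" matches /^a{2}$/ only when interval expressions are supported.
if printf 'aa\n' | awk '/^a{2}$/ { found = 1 } END { exit !found }'; then
    interval_support="yes"
else
    interval_support="no"
fi
echo "interval expressions supported: $interval_support"
```

If this reports "no", patterns such as [0-9]{1,2} will not behave as intended and the longer forms below are the safer choice.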

This older regex syntax should match in any awk:

awk '/@ [0-9][0-9]?\/[0-9][0-9]?\/[0-9][0-9]?, [1-2]?[0-9]:[0-6][0-9] [AP]M/' file

If your awk supports POSIX character classes such as [[:blank:]], the regex can be made a little more flexible:

awk '/@[[:blank:]][0-9][0-9]?\/[0-9][0-9]?\/[0-9][0-9]?,[[:blank:]][1-2]?[0-9]:[0-6][0-9][[:blank:]][AP]M/' file

If matching one (or more) digits is good enough (I can't see why not), you can use the shorter regex:

awk '/@ [0-9]+\/[0-9]+\/[0-9]+, [1-2]?[0-9]:[0-6][0-9] [AP]M/' file

You can also add the start (^) and end ($) anchors to make the regex more restrictive, if needed.
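As a rough illustration of what anchoring changes, the sketch below runs an anchored variant of the shorter regex over three lines. The first two come from testfile2.txt; the third is an invented line (not from the question) that contains a timestamp in the middle of other text. The " *" before $ is an assumption, added to tolerate trailing spaces after AM/PM:

```shell
# ^ pins the @ to the start of the line; " *$" allows only trailing spaces after AM/PM.
# The third input line is made up to show what the anchors reject.
anchored=$(printf '%s\n' \
  '+13054261988: Will do. Passwords OK?' \
  '@ 1/7/18, 6:12 PM' \
  'see @ 1/7/18, 6:12 PM for details' |
awk '/^@ [0-9]+\/[0-9]+\/[0-9]+, [1-2]?[0-9]:[0-6][0-9] [AP]M *$/')
echo "$anchored"
```

Without the anchors, the invented third line would be printed as well, since the unanchored regex can match anywhere in a line.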

I am not using match() here because the task is a simple line match, but the same regex works just as well as the second argument to that function.
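For use inside a larger script, a minimal sketch with match() might look like the following. It relies only on the standard RSTART and RLENGTH variables that match() sets; printing the line number and stripping the leading "@ " are illustrative choices, not something the question asked for:

```shell
# Two lines from the question's testfile2.txt, fed on stdin.
stamps=$(printf '%s\n' \
  '+16744774911: No, use 2FA for all transactions' \
  '@ 1/7/18, 8:56 PM' |
awk 'match($0, /@ [0-9][0-9]?\/[0-9][0-9]?\/[0-9][0-9]?, [1-2]?[0-9]:[0-6][0-9] [AP]M/) {
    # RSTART and RLENGTH are set by match(); skip the leading "@ " (2 characters).
    print NR ": " substr($0, RSTART + 2, RLENGTH - 2)
}')
echo "$stamps"
```

Because the pattern avoids {…}, \W and other extensions, it should behave the same way in old and new awk implementations.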