I have a text file as such:

Attribute 1.............. : attribute value
Encode Date............................. : JUL 2007
Attribute 22076.......... : attribute value

I want to extract the JUL 2007 segment, but only when it is preceded by Encode Date, since JUL 2007 might appear elsewhere in the file.

The regex below works when tested at regexr.com using the PHP interpreter with global and multi-line modes enabled:

(?<=Encode Date............................. : ).*$

But running this command gives me no output. What am I missing?

cat file.txt | awk '/(?<=Encode Date............................. : ).*$/{print $0}'

Mr. T

1 Answer

awk supports POSIX extended regular expressions (ERE). What you are trying to use is a Perl-compatible regular expression (PCRE). There is no (?<=...) ("look-behind assertion") in EREs.

To get the encoding date from the input, consider

awk -F ':' '$1 ~ /^Encode Date/ { sub("^ ", "", $2); print $2 }' file

This treats each line as :-delimited fields. It picks out the line whose first such field starts with the string Encode Date and removes the space at the start of the second field on that line before printing it.
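To make the field splitting concrete, here is a small sketch that runs the command against the three sample lines from the question. The temporary file and the variable names (sample, fields, encode_date) are only for illustration; the bracketed printf is there to make the leading space in the second field visible.

```shell
# Sample data copied from the question, written to a throwaway temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Attribute 1.............. : attribute value
Encode Date............................. : JUL 2007
Attribute 22076.......... : attribute value
EOF

# Show how awk splits the matching line on ':'.
# The brackets expose the trailing space in $1 and the leading space in $2.
fields=$(awk -F ':' '$1 ~ /^Encode Date/ { printf "[%s] [%s]\n", $1, $2 }' "$sample")
printf '%s\n' "$fields"

# The command from the answer: strip that leading space, then print $2.
encode_date=$(awk -F ':' '$1 ~ /^Encode Date/ { sub("^ ", "", $2); print $2 }' "$sample")
printf '%s\n' "$encode_date"

rm -f "$sample"
```

One thing to keep in mind with this approach: because : is the field separator, a value that itself contains a colon would be cut off at that colon in $2.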

With sed, it would be slightly shorter:

sed -n '/^Encode Date/s/.*: //p' file

This locates the correct line, then deletes everything up to and including the : and the immediately following space, and outputs the modified line.

Or, with an equivalent sed operation that tries to modify every line and prints the ones that it modifies successfully,

sed -n 's/^Encode Date.*: //p' file
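As a quick check that the two sed forms agree, the sketch below runs both against the sample lines from the question. The temp file and the variable names (by_address, by_subst) are just illustrative.

```shell
# Sample data copied from the question, written to a throwaway temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Attribute 1.............. : attribute value
Encode Date............................. : JUL 2007
Attribute 22076.......... : attribute value
EOF

# Address form: select the line first, then strip everything through ": ".
by_address=$(sed -n '/^Encode Date/s/.*: //p' "$sample")

# Substitution-only form: the p flag prints only lines where s/// succeeded.
by_subst=$(sed -n 's/^Encode Date.*: //p' "$sample")

printf '%s\n%s\n' "$by_address" "$by_subst"
rm -f "$sample"
```

Both print JUL 2007 for this input. Note that .* is greedy in both forms, so if the value itself contained a colon followed by a space, everything up to the last such pair on the line would be removed.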

If you want to print the whole line (as your last command seems to be trying to do), then use

awk '/^Encode Date/' file

or,

sed -n '/^Encode Date/p' file

or,

grep '^Encode Date' file

Kusalananda
  • I guess one could also synthesize a simple lookbehind using the match function as match($0,/PATTERN/) > 0 {print substr($0,RSTART+RLENGTH)} – steeldriver Mar 16 '19 at 23:19
  • @steeldriver Yes, but I wanted to avoid matching each individual dot in the data. They would need escaping, unless you do a string match with index() and use the search string's length of course... But it's much too fiddly for what they appear to want to do. – Kusalananda Mar 16 '19 at 23:27
  • @Kusalananda fair enough - I meant it more as a general comment than specifically for the OP's requirement – steeldriver Mar 16 '19 at 23:28
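
For completeness, the two ideas from the comments could be sketched roughly as follows. This is an illustration, not something either commenter posted in full: the regex /^Encode Date\.+ : /, the variable names, and the extra "Tagged Date" line are assumptions added here to show that a JUL 2007 under a different label is not picked up.

```shell
# The literal prefix to search for; reused below so data and search string agree.
prefix='Encode Date............................. : '

sample=$(mktemp)
{
    printf '%s\n' 'Attribute 1.............. : attribute value'
    printf '%s\n' "${prefix}JUL 2007"
    # Hypothetical extra line: same value, different label, must not be printed.
    printf '%s\n' 'Tagged Date.............. : JUL 2007'
} > "$sample"

# Regex variant: match() sets RSTART and RLENGTH, so substr() can print
# whatever follows the match. The dots are escaped to match literally.
via_match=$(awk 'match($0, /^Encode Date\.+ : /) { print substr($0, RSTART + RLENGTH) }' "$sample")

# String variant: index() compares literally, so the dots need no escaping.
# index(...) == 1 means the prefix sits at the very start of the line.
via_index=$(awk -v p="$prefix" 'index($0, p) == 1 { print substr($0, length(p) + 1) }' "$sample")

printf '%s\n%s\n' "$via_match" "$via_index"
rm -f "$sample"
```

The match() variant tolerates any number of dots after the label, while the index() variant requires the prefix to be exactly as given. Either way, this is more machinery than the -F ':' or sed one-liners need for this particular file.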