2

I am trying to use awk to replace the last occurrence of a period in the first field with a semicolon. The field separator is also a semicolon.

I tested (\.)(?!.*\1) as a regex on regex101.com and it correctly highlights the last occurrence of a period when I supply "a.b.c.mp3" as input.

I have tried the following in awk:

awk 'BEGIN{FS=OFS=";"} {gsub(/(\.)(?!.*\1)/, ";", $1)} 1'

It does not replace anything.

I would appreciate any help with this.

Alex
  • Share your regex101 link / give sample input, expected output – Gilles Quénot Mar 05 '23 at 20:54
  • It looks like you are trying to use a Perl-style (PCRE) lookahead assertion. AFAIK awk only supports Extended Regular Expressions (ERE). See "Why does my regular expression work in X but not in Y?" – steeldriver Mar 05 '23 at 21:05
  • Verifying a regexp on regex101.com or any other online site just proves it does what you expect on that site; you can't assume it'll work with any given command-line tool. – Ed Morton Mar 05 '23 at 21:34
  • Welcome to Unix & Linux Stack Exchange! The site that you link to does not support POSIX Regular Expressions but a few other dialects of regular expressions used in specific programming languages. This means that expressions that the site says do the right thing may not work as expected (or at all) with standard Unix tools like sed, awk, or grep. – Kusalananda Mar 06 '23 at 06:47

2 Answers

3

AFAIK, no implementation of awk supports PCRE lookarounds like (?!re).

In GNU awk (aka gawk), using the gensub function, you could greedily capture everything before a period, and backsubstitute it in the replacement:

$ echo 'foo.bar.baz;something;else' | 
    gawk 'BEGIN{OFS=FS=";"} {$1 = gensub(/(.*)\./,"\\1;","1",$1)} 1'
foo.bar;baz;something;else

Portably, you could instead use the match function, again with a greedy match, then pick out the substrings before and after the period:

$ echo 'foo.bar.baz;something;else' | 
    mawk 'BEGIN{OFS=FS=";"} match($1,/.*\./){$1 = substr($1,1,RLENGTH-1) ";" substr($1,RLENGTH+1)} 1'
foo.bar;baz;something;else
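
A variant of the same portable idea, as a minimal sketch: instead of relying on greediness, anchor the match at the end of the field with /\.[^.]*$/, which is roughly the ERE way of saying "a period with no further period after it". RSTART then points straight at the last period. The function name replace_last_dot is only a hypothetical wrapper for illustration, and I am assuming a POSIX awk here:

```shell
# Hypothetical helper name; reads semicolon-separated records on stdin.
replace_last_dot() {
  awk 'BEGIN { FS = OFS = ";" }
       # /\.[^.]*$/ matches the final period plus the period-free tail,
       # so RSTART is the position of that final period in field 1.
       match($1, /\.[^.]*$/) {
         $1 = substr($1, 1, RSTART - 1) ";" substr($1, RSTART + 1)
       }
       1'
}

printf '%s\n' 'foo.bar.baz;something;else' 'nodots;a.b;c' | replace_last_dot
# foo.bar;baz;something;else
# nodots;a.b;c
```

The second input line has no period in its first field, so the match fails and the record is printed unchanged.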

With GNU awk you could (again non-portably) use match with capture and backsubstitution via its optional array argument:

$ echo 'foo.bar.baz;something;else' | 
    gawk 'BEGIN{OFS=FS=";"} match($1,/(.*)\.(.*)/,a){$1 = a[1] ";" a[2]} 1'
foo.bar;baz;something;else

Since lookahead is a Perl feature, you could of course use perl itself (though without capturing \. and backreferencing it, which seems overkill in any case):

$ echo 'foo.bar.baz;something;else' | 
    perl -F';' -pe '$_ = join ";", $F[0] =~ s/\.(?!.*\.)/;/r, @F[1..$#F]'
foo.bar;baz;something;else

Miller has awk-like sub and gsub that, like GNU awk's gensub, support captures and backreferences:

$ echo 'foo.bar.baz;something;else' | 
    mlr --nidx --fs ';' put '$1 = sub($1,"(.*)\.","\1;")'
foo.bar;baz;something;else

Miller doesn't currently support lookarounds, so far as I know.

steeldriver
2

How about sed? You are in luck here, since you only need to work on the first field:

sed 's/\.\([^.]*;\)/;\1/'
FelixJN
  • That assumes there's always at least one . in the first field. If not, something like sed 's/^\([^;]*\)\.\([^.]*;\)/\1;\2/' would be needed. – Sundeep Mar 06 '23 at 14:02
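
To make the difference between the two sed commands concrete, here is a small side-by-side sketch. The function names unanchored and anchored are just hypothetical labels for the command in the answer and the one in the comment. Both still assume the line contains at least one semicolon after the first field:

```shell
# Unanchored form from the answer: edits the first period that is followed
# by period-free text and a semicolon, wherever in the line that is.
unanchored() { sed 's/\.\([^.]*;\)/;\1/'; }

# Anchored form from the comment: the period must lie in the first field.
anchored() { sed 's/^\([^;]*\)\.\([^.]*;\)/\1;\2/'; }

printf '%s\n' 'foo;a.b;c'   | unanchored   # foo;a;b;c    (second field altered)
printf '%s\n' 'foo;a.b;c'   | anchored     # foo;a.b;c    (unchanged)
printf '%s\n' 'a.b.c;x.y;z' | anchored     # a.b;c;x.y;z
```

In the anchored form the greedy \([^;]*\) takes as much of the first field as it can, which is why the period it leaves for \. is the last one in that field.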