awk gensub regex matching group between square brackets

Question

Input from record $0:

```
-0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ
```

Desired output, captured as \1 with gensub:
```
(T)-8.5(o)-3.2(p)-15.3(ik)
```

Please don't use those online testers for awk, as the syntax and features vary a lot (see Why does my regular expression work in X but not in Y?). Can you add the exact output you require? Also, does /\[([^]]+)]TJ/ solve your issue? — Sundeep, Sep 21 '20 at 15:18

Ed Morton · Answer 1 · 2020-09-21T16:09:39.453


```
$ echo '-0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ' |
    awk '{print gensub(/.*\[([^]]+)]TJ/,"\\1",1)}'
(T)-8.5(o)-3.2(p)-15.3(ik)
```

Web sites like regex101 are practically useless for figuring out regexps to use in command-line tools, because they don't adequately account for: the regexp flavor (BRE, ERE, or PCRE) and delimiters a given tool uses; whether the tool supports backreferences in the regexp and/or the matching text; whether the given version of the given tool has any private extensions; and any options the tool has that affect its behavior with respect to regexps.
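If GNU awk is not available (gensub() is a gawk extension), a similar extraction can be sketched with only POSIX awk features: the two-argument match() sets RSTART and RLENGTH, and substr() trims off the delimiters. This is an alternative technique rather than the answer's own; it assumes the wanted text always sits between `[` and `]TJ` with no `]` inside it.

```shell
# Portable sketch: POSIX awk only (no gensub, no 3-argument match).
# Assumes the wanted text sits between "[" and "]TJ" with no "]" inside.
s='-0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ'

inner=$(printf '%s\n' "$s" | awk '
match($0, /\[[^]]+]TJ/) {
    # RSTART/RLENGTH cover the whole "[...]TJ" match:
    # skip the leading "[" (1 char) and drop the trailing "]TJ" (3 chars).
    print substr($0, RSTART + 1, RLENGTH - 4)
}')
printf '%s\n' "$inner"
# prints: (T)-8.5(o)-3.2(p)-15.3(ik)
```

The fixed offsets (+1 and -4) only hold because the delimiters are literal strings of known length; if they varied, a capture-group approach would be easier to maintain.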

answered Sep 21 '20 at 15:04 by Ed Morton (edited Sep 21 '20 at 16:09)

I don't want: -0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]
I only want: [(T)-8.5(o)-3.2(p)-15.3(ik)]
– andtoe Sep 21 '20 at 15:09
That's not what you show in your question under Operated string of str. If that's not your expected output then edit your question to clearly show the output you expect given the input you provided. – Ed Morton Sep 21 '20 at 15:11
"Operated string of str" shows the actual output of the operation, but that is not what I want. If you would read my question thoroughly, you would understand what I am asking for. No offence. Please read my question thoroughly. – andtoe Sep 21 '20 at 15:13

Sundeep · Answer 2 · 2020-09-21T15:46:39.510


```
$ s='-0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ'

$ # if you want to delete []TJ
$ echo "$s" | awk '{print gensub(/\[([^]]+)]TJ/, "\\1", "g")}'
-0.005 Tc 0.005 Tw (T)-8.5(o)-3.2(p)-15.3(ik)

$ # if you just want the portion inside []TJ
$ echo "$s" | awk 'match($0, /\[([^]]+)]TJ/, a){print a[1]}'
(T)-8.5(o)-3.2(p)-15.3(ik)
```

GNU awk supports an optional third argument to the match() function, an array that makes it easy to extract capture groups. Element a[0] holds the text matched by the entire regexp, a[1] holds the portion matched by the first parenthesized group, a[2] the portion matched by the second group, and so on. This three-argument form is a gawk extension; POSIX awk's match() accepts only two arguments.

answered Sep 21 '20 at 15:31 by Sundeep (edited Sep 21 '20 at 15:46)

Thank you! It works with a[1].
For reference: I tried a[0], and it showed the match including the left bracket [ and the trailing ]TJ. Why is that? Intuitively, I expected the achieved match to be stored in a[0].
– andtoe Sep 21 '20 at 15:43

@andtoe the most common behavior I've seen across different regex implementations is that 0 has entire match, 1 has first capture portion, 2 has second capture portion and so on – Sundeep Sep 21 '20 at 15:47
Last question: can awk be given an option (or something else) to select which regular expression standard it uses?
Or, the other way around: which regular expression standard does awk use by default?
– andtoe Sep 21 '20 at 15:47
From GNU awk manual: "The regular expressions in awk are a superset of the POSIX specification for Extended Regular Expressions (EREs). POSIX EREs are based on the regular expressions accepted by the traditional egrep utility." – Sundeep Sep 21 '20 at 15:49
@andtoe If you found the answer useful, please consider accepting it so that others facing a similar issue may find it more easily. – AdminBee Sep 21 '20 at 16:12
