- Input from record $0: -0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ
- Output into /1 with gensub please: (T)-8.5(o)-3.2(p)-15.3(ik)

2 Answers
$ echo '-0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ' |
awk '{print gensub(/.*\[([^]]+)]TJ/,"\\1",1)}'
(T)-8.5(o)-3.2(p)-15.3(ik)
Web sites like regex101 are practically useless for figuring out regexps to use in command-line tools, as they don't adequately account for the regexp version (BRE, ERE, or PCRE) and/or delimiters a given tool uses, and/or whether the tool supports backreferences in the regexp and/or matching text, and/or whether the given version of the given tool has any private extensions, and/or any options the tool might have that affect its behavior wrt regexps, etc.
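To illustrate the point about regexp versions, here is a minimal sketch that does the same extraction with sed in BRE and in ERE syntax. It assumes a sed that accepts -E (GNU and BSD sed do); the variable names bre_out and ere_out are only for illustration.

```shell
s='-0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ'

# BRE (default sed): capture groups are written \( ... \)
bre_out=$(printf '%s\n' "$s" | sed 's/.*\[\([^]]*\)]TJ.*/\1/')

# ERE (sed -E): capture groups are written ( ... )
ere_out=$(printf '%s\n' "$s" | sed -E 's/.*\[([^]]*)]TJ.*/\1/')

printf '%s\n' "$bre_out" "$ere_out"
```

Both commands print (T)-8.5(o)-3.2(p)-15.3(ik), but swapping the two patterns between the two invocations would not work, which is the kind of difference an online tester will not tell you about.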

I don't want: -0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]
I only want: [(T)-8.5(o)-3.2(p)-15.3(ik)]
– andtoe Sep 21 '20 at 15:09
That's not what you show in your question under "Operated string of str". If that's not your expected output then edit your question to clearly show the output you expect given the input you provided. – Ed Morton Sep 21 '20 at 15:11
"Operated string of str" shows the actual output of the operation, but that is not what I want. If you would read my question thoroughly, you would understand what I am asking for. No offence. Please read my question thoroughly. – andtoe Sep 21 '20 at 15:13
$ s='-0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ'
$ # if you want to delete []TJ
$ echo "$s" | awk '{print gensub(/\[([^]]+)]TJ/, "\\1", "g")}'
-0.005 Tc 0.005 Tw (T)-8.5(o)-3.2(p)-15.3(ik)
$ # if you just want the portion inside []TJ
$ echo "$s" | awk 'match($0, /\[([^]]+)]TJ/, a){s = a[1]; print s}'
(T)-8.5(o)-3.2(p)-15.3(ik)
GNU awk supports a third argument for the match function, which makes it easy to extract capture groups. The element at index 0 of the array holds the entire match, index 1 holds the portion matched by the first group, index 2 holds the portion matched by the second group, and so on.
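If GNU awk is not available, the array form of match will not work. A rough portable sketch, assuming only the POSIX behavior that match sets RSTART and RLENGTH, and relying on the known lengths of the "[" and "]TJ" delimiters (the variable names whole and inner are made up for this example):

```shell
s='-0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ'

inner=$(printf '%s\n' "$s" | awk '
  # match() sets RSTART (start of match) and RLENGTH (its length) in any POSIX awk
  match($0, /\[[^]]+]TJ/) {
    whole = substr($0, RSTART, RLENGTH)   # like a[0]: still includes "[" and "]TJ"
    print substr(whole, 2, RLENGTH - 4)   # like a[1]: drop "[" (1 char) and "]TJ" (3 chars)
  }')

printf '%s\n' "$inner"
```

This is less general than capture groups, since the offsets have to be adjusted by hand whenever the delimiters change.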

Thank you! It works with a[1]. Just for information: I tried a[0] and it showed the match with the left bracket [ and the right bracket ]TJ included. Why is that? From intuition the achieved match should be stored in a[0]? – andtoe Sep 21 '20 at 15:43
@andtoe the most common behavior I've seen across different regex implementations is that 0 has the entire match, 1 has the first capture portion, 2 has the second capture portion and so on – Sundeep Sep 21 '20 at 15:47
Last question. Can awk be given an option or something else to specify a "mode" of a regular expression standard to be used? Or the other way around: what regular expression standard is used by awk by default? – andtoe Sep 21 '20 at 15:47
From the GNU awk manual: "The regular expressions in awk are a superset of the POSIX specification for Extended Regular Expressions (EREs). POSIX EREs are based on the regular expressions accepted by the traditional egrep utility." – Sundeep Sep 21 '20 at 15:49
@andtoe If you found the answer useful, please consider accepting it so that others facing a similar issue may find it more easily. – AdminBee Sep 21 '20 at 16:12
awk as the syntax and features vary a lot (see "Why does my regular expression work in X but not in Y?") ... can you add what your exact required output is? Also, does /\[([^]]+)]TJ/ solve your issue? – Sundeep Sep 21 '20 at 15:18