Using any awk in any shell on every Unix box:
$ echo "some comment char '\;' embedded in strings ; along with inline comments" |
awk -F';' '{gsub(/\\\\/,RS); gsub(/\\;/,"\\\\"); gsub(/\\\\/,";",$1); gsub(RS,"\\",$1); print $1}'
some comment char ';' embedded in strings
and borrowing @Stéphane's sample input file:
$ cat file
foo\;bar;baz
foo\\;bar;baz
$ awk -F';' '{gsub(/\\\\/,RS); gsub(/\\;/,"\\\\"); gsub(/\\\\/,";",$1); gsub(RS,"\\",$1); print $1}' file
foo;bar
foo\
and extending that to include a line with more fields:
$ cat file
foo\;bar;baz
foo\\;bar;baz
foo\\;bar\;this\;that\\;baz;here\;and\;there
we can print any or all of the fields as we like (here also outputting the original line first and the field number at the start of each output line that contains a single field):
$ awk -F';' '{print; gsub(/\\\\/,RS); gsub(/\\;/,"\\\\"); for (i=1; i<=NF; i++) { gsub(/\\\\/,";",$i); gsub(RS,"\\",$i); print " " i, $i }; print "---" }' file
foo\;bar;baz
 1 foo;bar
 2 baz
---
foo\\;bar;baz
 1 foo\
 2 bar
 3 baz
---
foo\\;bar\;this\;that\\;baz;here\;and\;there
 1 foo\
 2 bar;this;that\
 3 baz
 4 here;and;there
---
The above:
- converts every \\ in the current input line ($0) into a newline (the default value of RS), which is a string that cannot exist within a newline-separated record, so we can handle \\; in the input as an escaped backslash rather than an escaped semicolon, then
- converts every \; in $0 into \\, which is now also a string that cannot exist in $0 since we just converted them all to RSs, to get rid of the troublesome ; in it, then
- the act of modifying $0 causes awk to resplit $0 into fields at every remaining ;, which puts our desired target string in $1, then
- we convert every \\ (created at step 2 above) in $1 to ;, then
- convert every RS (created at step 1 above) in $1 to \ (a single backslash, the unescaped form of the original \\), then
- we print that field, $1.
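As a rough sketch, the steps above can be wrapped in a small shell function. The name first_field is hypothetical, and using SUBSEP instead of \\ as the stand-in for an escaped semicolon is my substitution rather than part of the commands above: it assumes the input never contains the SUBSEP character (\034 by default), and it sidesteps doubled backslashes in a gsub() replacement string, which awk implementations do not all treat identically.

```shell
# Hypothetical helper: print the first field of each line, splitting on
# unescaped ';' only. Assumes the input never contains SUBSEP (\034).
first_field() {
    awk -F';' '{
        gsub(/\\\\/, RS)        # 1. hide each escaped backslash behind a newline
        gsub(/\\;/, SUBSEP)     # 2. hide each escaped semicolon behind SUBSEP
                                #    (changing $0 makes awk re-split the fields)
        gsub(SUBSEP, ";", $1)   # 3. turn the hidden semicolons in $1 into plain ;
        gsub(RS, "\\", $1)      # 4. turn the hidden backslashes in $1 into one backslash
        print $1
    }' "$@"
}

printf '%s\n' 'foo\;bar;baz' 'foo\\;bar;baz' | first_field
```

For those two sample lines this prints foo;bar and then foo\, the same results as shown earlier.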
That approach will work for any RS that is a literal string, as defined by POSIX. If your RS is a regexp, as supported by some awks (e.g. GNU awk), then come up with a string without regexp metachars that matches that regexp and use it as the replacement instead of RS.
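A minimal sketch of that last point, assuming an awk that accepts a regexp RS (GNU awk, for example) and input with either LF or CRLF line endings. With RS set to \r?\n, a plain newline is a metachar-free string that matches the regexp, so it cannot occur inside a record. The names first_field_crlf and ph are hypothetical, and SUBSEP is again my own stand-in for the escaped semicolon, assumed never to occur in the input.

```shell
# Hypothetical helper for input whose lines may end in LF or CRLF.
# Requires an awk that treats a multi-character RS as a regexp.
first_field_crlf() {
    awk -F';' -v RS='\r?\n' -v ph='\n' '{
        gsub(/\\\\/, ph)        # hide escaped backslashes behind ph, which matches RS
        gsub(/\\;/, SUBSEP)     # hide escaped semicolons; $0 is re-split on ;
        gsub(SUBSEP, ";", $1)   # restore the hidden semicolons in $1
        gsub(ph, "\\", $1)      # restore each hidden backslash as one backslash
        print $1
    }' "$@"
}

printf '%s\r\n' 'foo\;bar;baz' 'foo\\;bar;baz' | first_field_crlf
```

The only change from the literal-RS case is that the placeholder is passed in separately as ph instead of reusing RS itself.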
cut as opposed to other more versatile tools like sed, awk, perl, python, etc. Tools like tr or at most grep are fine. – Chris Jul 21 '23 at 20:30

cut simply isn't that fancy. If you tell it that ; is your delimiter, then every ; counts; there is no escaping. – larsks Jul 21 '23 at 20:36

awk would've been more readable but you can use grep's PCRE as follows: .... |grep -oP '.*(?=(;.*?))' and get the result you want – Valentin Bajrami Jul 21 '23 at 21:13

grep -Po '.*?(?=(?<!\\);)' although I think perhaps plain perl perl -F'(?<!\\);' -lne 'print $F[0]' is clearer – steeldriver Jul 21 '23 at 21:16

\\; ? Can you be sure there will be no quoted backslash? If not then you have to actually parse the string which would be ugly. – Hauke Laging Jul 21 '23 at 21:32

-f 1 to -f 2. Let cut count all the semicolons, adjust your expectations. – waltinator Jul 21 '23 at 23:11

cut -f1-2, which will fetch the first two fields, and results in some comment char '\;' embedded in strings. – larsks Jul 21 '23 at 23:41