Using any awk in any shell on every Unix box:
$ echo "some comment char '\;' embedded in strings ; along with inline comments" |
awk -F';' '{gsub(/\\\\/,RS); gsub(/\\;/,"\\\\"); gsub(/\\\\/,";",$1); gsub(RS,"\\",$1); print $1}'
some comment char ';' embedded in strings
and borrowing @Stéphane's sample input file:
$ cat file
foo\;bar;baz
foo\\;bar;baz
$ awk -F';' '{gsub(/\\\\/,RS); gsub(/\\;/,"\\\\"); gsub(/\\\\/,";",$1); gsub(RS,"\\",$1); print $1}' file
foo;bar
foo\
and extending that to include a line with more fields:
$ cat file
foo\;bar;baz
foo\\;bar;baz
foo\\;bar\;this\;that\\;baz;here\;and\;there
we can print any or all of the fields as we like (here also outputting the original line first and the field number at the start of each output line that contains a single field):
$ awk -F';' '{print; gsub(/\\\\/,RS); gsub(/\\;/,"\\\\"); for (i=1; i<=NF; i++) { gsub(/\\\\/,";",$i); gsub(RS,"\\",$i); print " " i, $i }; print "---" }' file
foo\;bar;baz
1 foo;bar
2 baz
---
foo\\;bar;baz
1 foo\
2 bar
3 baz
---
foo\\;bar\;this\;that\\;baz;here\;and\;there
1 foo\
2 bar;this;that\
3 baz
4 here;and;there
---
The above:
- converts every \\ in the current input line ($0) into a newline (the default value of RS), which is a string that cannot exist within newline-separated records, so we can handle \\; in the input as an escaped backslash rather than an escaped semi-colon, then
- converts every \; in $0 into \\, which is now also a string that cannot exist in $0 since we just converted all of them to RSs, to get rid of the troublesome ; in it, then
- the act of modifying $0 causes awk to resplit $0 into fields at every remaining ;, which puts our desired target string in $1, then
- converts every \\ (created at step 2 above) in $1 to ;, then
- converts every RS (created at step 1 above) in $1 to \, the single backslash that the original \\ stood for, then
- prints that field, $1.
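To make those steps easier to follow, here is a minimal sketch of the same placeholder idea as a commented shell function. The name first_field is hypothetical, and as an assumption of mine it uses SUBSEP (a control character assumed never to appear in the input) as the second placeholder instead of \\, so the sketch does not depend on how backslashes are interpreted in the gsub() replacement string:

```shell
# Hypothetical helper: print the first ;-separated field of each line,
# treating \; as a literal semicolon and \\ as a literal backslash.
first_field() {
    awk -F';' '{
        gsub(/\\\\/, RS)        # step 1: every \\ becomes a newline (RS)
        gsub(/\\;/, SUBSEP)     # step 2: every \; becomes SUBSEP (assumed absent from input)
        # step 3: $0 was modified, so awk has resplit it at the remaining ;
        gsub(SUBSEP, ";", $1)   # step 4: restore the escaped semicolons in field 1
        gsub(RS, "\\", $1)      # step 5: turn each placeholder newline into one backslash
        print $1                # step 6: print the field
    }' "$@"
}

# Example calls:
printf '%s\n' 'foo\;bar;baz'  | first_field    # prints foo;bar
printf '%s\n' 'foo\\;bar;baz' | first_field    # prints foo\
```

Run as first_field file against the sample file above, it should give the same two lines of output as the one-liner.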
Using RS as the placeholder will work for every RS that is a literal string, as defined by POSIX. If your RS is a regexp, as supported by some awks (e.g. GNU awk), then come up with a string without regexp metachars that matches that regexp and use it as the replacement instead of RS.
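As a rough illustration of that last point, here is a sketch that assumes an awk accepting a regexp RS (e.g. GNU awk) and records that may end in either CRLF or LF. The function name first_field_any_eol and the variable ph are hypothetical; ph holds a plain newline, a literal string that matches the RS regexp and so cannot occur inside a record. SUBSEP as the second placeholder is again my assumption (it must not appear in the input):

```shell
# Hypothetical helper: like the one-liner, but for RS set to a regexp.
# Assumes an awk that supports a regexp RS, such as GNU awk.
first_field_any_eol() {
    awk -F';' -v RS='\r?\n' -v ph='\n' '{
        gsub(/\\\\/, ph)        # \\ becomes ph, a literal string that matches RS
        gsub(/\\;/, SUBSEP)     # \; becomes SUBSEP (assumed absent from input)
        gsub(SUBSEP, ";", $1)   # restore the escaped semicolons in field 1
        gsub(ph, "\\", $1)      # turn each placeholder into one backslash
        print $1
    }' "$@"
}

# Example call with a CRLF-terminated record:
printf 'foo\\;bar;baz\r\n' | first_field_any_eol    # prints foo;bar
```

The only change from the literal-RS case is that the placeholder is passed in explicitly rather than taken from RS itself.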
cut as opposed to other more versatile tools like sed, awk, perl, python, etc. Tools like tr or at most grep are fine. – Chris Jul 21 '23 at 20:30
cut simply isn't that fancy. If you tell it that ; is your delimiter, then every ; counts; there is no escaping. – larsks Jul 21 '23 at 20:36
awk would've been more readable but you can use grep's PCRE as follows: .... | grep -oP '.*(?=(;.*?))' and get the result you want – Valentin Bajrami Jul 21 '23 at 21:13
grep -Po '.*?(?=(?<!\\);)' although I think perhaps plain perl perl -F'(?<!\\);' -lne 'print $F[0]' is clearer – steeldriver Jul 21 '23 at 21:16
\\;? Can you be sure there will be no quoted backslash? If not then you have to actually parse the string which would be ugly. – Hauke Laging Jul 21 '23 at 21:32
-f 1 to -f 2. Let cut count all the semicolons, adjust your expectations. – waltinator Jul 21 '23 at 23:11
cut -f1-2, which will fetch the first two fields, and results in some comment char '\;' embedded in strings. – larsks Jul 21 '23 at 23:41