Print everything before nth delimiter

Question

I'm looking for a sed solution like the one here: Print everything after nth delimiter

sed -E 's/^([^\]*[\]){3}//' infile

but to extract the text before the nth delimiter, instead of after it as in that example. It should work with all sed variants and operate on all lines, like the example.

The delimiter in this example is \ but could be for any other. Should work with any version of sed.

Please [edit] your question and i) tell us your operating system or at least which sed implementation you are using, ii) show us an example input file and iii) the output you expect. We cannot help you parse data you don't show. You mention a delimiter, what is that delimiter? Should we guess you have \ as the delimiter? — terdon, Nov 15 '23 at 13:02
@terdon the delimiter is clearly the backslash as the link shows, but could use any other. Unless stated, should work with any version of sed. — Smeterlink, Nov 15 '23 at 13:31
Regarding "The delimiter in this example is \ but could be for any other. Should work with any version of sed." - There are no possible solutions that would work for any delimiter in any version of sed. Reduce the scope of your question and provide sample input and expected output if you'd like a solution that works portably for your actual needs. — Ed Morton, Nov 15 '23 at 21:55
@EdMorton the solution provided by "don_crissti" works perfectly with both GNU sed and BSD sed: sed -E 's/(^([^\]*[\]){3}[^\]*).*/\1/' infile. You can replace "\" with a letter like "e" or a number and it keeps working: sed -E 's/(^([^e]*[e]){3}[^e]*).*/\1/' infile. So you are not willing to put the effort to elaborate your answer dude. — Smeterlink, Nov 16 '23 at 06:53
I haven't posted an answer dude because there is no answer to the question you asked and so far you haven't done what I suggested in my comment which is to reduce the scope of your question. GNU and BSD sed aren't "any version of sed" and "a letter like e or a number" aren't "any other" delimiter. If you want a solution that'll work in GNU or BSD sed with any single letter or number as a delimiter then say THAT in your question as there obviously is a solution for THAT scope, but not for "The delimiter in this example is \ but could be for any other. Should work with any version of sed.". — Ed Morton, Nov 16 '23 at 12:15
Try any sed answer you get if the delimiter was ^, abc (or any other multi-char string), /, (or whatever is used to delimit the regexp in the script), or ' (or whatever is used to delimit the whole script), for example - they'll all fail in different ways given some of those "any other" delimiter values, no matter which sed version you use. — Ed Morton, Nov 16 '23 at 12:33

score 5 · Accepted Answer · answered Nov 14 '23 at 19:41

Why don't you use cut?

cut -d '\' -f 1-3 infile

With sed, instead of deleting the match, capture it and use a back-reference to replace the whole line with the captured group:

sed -E 's/(^([^\]*[\]){3}).*/\1/' infile

Though that would also print the trailing backslash... To avoid that you could run

sed -E 's/(^([^\]*[\]){2}[^\]*).*/\1/' infile
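As a quick check of how the three commands differ, here is a minimal sketch run against a made-up sample line. The input and the variable names are illustrative assumptions, not taken from the question.

```shell
# Hypothetical sample line with four backslash delimiters.
line='a\b\c\d\e'

# Fields 1-3, i.e. everything before the 3rd delimiter.
with_cut=$(printf '%s\n' "$line" | cut -d '\' -f 1-3)

# Capture up to and including the 3rd delimiter.
with_delim=$(printf '%s\n' "$line" | sed -E 's/(^([^\]*[\]){3}).*/\1/')

# Capture two delimited fields plus one bare field, so no trailing delimiter.
without_delim=$(printf '%s\n' "$line" | sed -E 's/(^([^\]*[\]){2}[^\]*).*/\1/')

printf '%s\n' "$with_cut" "$with_delim" "$without_delim"
```

With that input, the first and third commands should print a\b\c and the second a\b\c\ (with the trailing backslash). Lines with fewer delimiters than the pattern requires do not match, so sed passes them through unchanged.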

Prabhjot Singh · Answer 2 · 2023-11-17T13:49:27.817

Using awk:

$ awk -v var=3 'BEGIN{FS=OFS="\\"}
(NF>=var){ split($0,arr,OFS); 
$0=""; 
for (i=1; i<=var; ++i) $(NF+1)=arr[i];
print}'

To keep n^th delimiter the following command may be used.

$ awk -v var=3 'BEGIN{FS=OFS="\\"} 
(NF>=var){ for (i=1; i<=var; ++i) printf "%s%s", $i, OFS; print ""}'

$ nawk 'match($0, /^([^\\]*[\\]){3}/) { print substr($0, RSTART, RLENGTH) }'
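To see what the two field-based variants produce, here is a small sketch on made-up input. The sample lines and the shell variable names are assumptions for illustration; the awk logic follows the loops shown above.

```shell
# Hypothetical input: one line with four delimiters, one with only one.
input=$(printf '%s\n' 'a\b\c\d\e' 'x\y')

# Variant 1: rebuild the record from the first var fields (nth delimiter dropped).
first=$(printf '%s\n' "$input" | awk -v var=3 'BEGIN { FS = OFS = "\\" }
NF >= var { split($0, arr, FS); $0 = ""
            for (i = 1; i <= var; ++i) $(NF + 1) = arr[i]
            print }')

# Variant 2: print each of the first var fields followed by the delimiter.
second=$(printf '%s\n' "$input" | awk -v var=3 'BEGIN { FS = OFS = "\\" }
NF >= var { for (i = 1; i <= var; ++i) printf "%s%s", $i, OFS; print "" }')

printf '%s\n' "$first" "$second"
```

Both variants skip the second line because it has fewer than three fields. Note that the NF >= var test counts fields, not delimiters, so a line with exactly three fields (two delimiters) is still printed, and the second variant then appends a delimiter that was not in the input.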

With GNU awk:

The following command uses a back-reference to a captured group. It is an awk command taken from this answer. Thanks to @don_crissti

$ awk -F '\\' -v col=3 '(NF>=col){print gensub(/(^([^\\]*[\\]){3}).*/, "\\1", "g")}'

Stéphane Chazelas · Answer 3 · 2023-11-15T13:29:42.817

2

You could replace the n^th delimiter with newline (which cannot otherwise occur in the pattern space) and then delete everything starting with that newline. Here for n == 3:

sed 's/delim/\
/3; P; d'

Or if the n^th delimiter must be retained in the output:

sed 's/delim/&\
/3; P; d'

To skip the lines that don't have n delimiters:

sed -n 's/delim/\
/3; t1
d; :1
P'
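A short sketch of the newline trick in action, using "::" as a hypothetical multi-character delimiter. The sample input and variable names are assumptions for illustration only.

```shell
# Hypothetical input: one line with four "::" delimiters, one with only one.
input=$(printf '%s\n' 'a::b::c::d::e' 'x::y')

# Replace the 3rd "::" with a newline, print up to that newline, delete the rest.
# Lines without a 3rd delimiter are printed unchanged.
all=$(printf '%s\n' "$input" | sed 's/::/\
/3; P; d')

# Same idea, but branch to label 1 only when the substitution succeeded,
# so lines without a 3rd delimiter are dropped.
only=$(printf '%s\n' "$input" | sed -n 's/::/\
/3; t1
d; :1
P')

printf '%s\n' "$all"
printf '%s\n' "$only"
```

The delimiter here is still a regular expression, so characters such as . or * would need escaping, as would the / that delimits the s command.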

edited Nov 15 '23 at 13:29

answered Nov 14 '23 at 20:10


This one has the advantage (over mine) that it works with multi-char delimiters... I think the 1st and 2nd could be golfed shorter with P;d instead of s/\n.*// – don_crissti Nov 15 '23 at 11:16
@don_crissti, thanks. I'll add that in as it would address the potential problems with .* choking on non-characters. – Stéphane Chazelas Nov 15 '23 at 13:26

jhnc · Answer 4 · 2023-11-15T00:04:17.147

2

a shorter awk:

awk NF=3 FS='\\' OFS='\\'

define input and output field separators
set number of fields to keep

edited Nov 15 '23 at 00:04

answered Nov 14 '23 at 23:52


What happens when you change NF is undefined behavior per POSIX so don't count on it doing any one thing across all awk variants. In most awks increasing NF will create empty fields at the end (same as $NF=$7 to increase NF to 7 which IS portable to all awks so there's no point doing NF=7), in some awks decreasing it will remove fields from the end, in other awks decreasing it will do nothing. There's no portable way to decrease NF other than sub() or match() to remove some number of fields. – Ed Morton Nov 15 '23 at 21:49
@EdMorton good point. freebsd awk is such an example of doing nothing, but even there awk 'NF=3,$3=$3' ... seems to work. do you know any other implementations that don't truncate? – jhnc Nov 16 '23 at 05:12
no, sorry, I'd want to check /usr/xpg[46]/bin/awk and nawk on Solaris, tawk, mawk1 (I expect mawk2 behaves like gawk as they share a lot of code), and busybox awk as a start and there's also mks awk and the awk variants written in Go (goawk) and Rust (frawk) but I don't have access to any of those. – Ed Morton Nov 16 '23 at 12:00

Rather than having to know the behavior (which could change by release) of all the awk variants in my answers I try to always say either a) "using any awk" if my script uses all POSIX constructs except RE intervals and character classes, or b) "any POSIX awk" if it also uses those, or c) "using GNU awk or other that supports ..." and state the extensions/POSIX-undefined-behavior otherwise and then it's up to the user to determine if it'd also work in their awk if it doesn't fit that description. For your script I'd say "using GNU awk or other that supports truncating fields by assigning NF". – Ed Morton Nov 16 '23 at 12:03

I exclude RE intervals and character classes from my "any awk" category as when I used to use Solaris we had nawk and /usr/xpg4/bin/awk and one of them supported RE intervals while the other supported character classes but neither supported both, I think I've heard of other awks that don't support both (maybe mawk1?), and older versions of gawk didn't support RE intervals unless you added the --posix or --re-intervals options. Oh, and I also exclude a script that does gsub(/^"|"$/,"") from "any awk" as that fails in mawk1 and tawk. – Ed Morton Nov 16 '23 at 12:08
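One way to sidestep both the NF assignment and the regexp escaping concerns raised in this thread is to treat the delimiter as a literal string and walk the line with index() and substr(). The following is only a sketch under assumptions: before_nth is a hypothetical helper name, it uses awk rather than sed, and it relies on ENVIRON, which POSIX awk provides but which has not been checked against the awk variants listed in the comments.

```shell
# before_nth N DELIM: print the part of each stdin line before the Nth DELIM.
# Hypothetical helper. DELIM is a literal string, passed through the environment
# so that awk does not interpret backslash escapes in it.
before_nth() {
    delim=$2 awk -v n="$1" '
    {
        d = ENVIRON["delim"]; rest = $0; out = ""
        for (i = 1; i <= n; i++) {
            p = index(rest, d)                 # position of the next literal delimiter
            if (p == 0) { out = $0; break }    # fewer than n delimiters: keep the whole line
            if (i > 1) out = out d             # re-insert the delimiter between kept fields
            out = out substr(rest, 1, p - 1)
            rest = substr(rest, p + length(d))
        }
        print out
    }'
}

printf '%s\n' 'a\b\c\d\e' | before_nth 3 '\'
printf '%s\n' 'a::b::c::d' | before_nth 2 '::'
```

Because no regular expression is built from the delimiter, values such as ^, / or a multi-character string are handled the same way as a backslash. Lines with fewer than N delimiters are printed unchanged here, which is a design choice of this sketch rather than something the question specifies.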
