Selection of line from pattern, replacement with new line and incrementing number for each change

Question

Hi, I'm trying to work out how to select a line in a text file by matching a pattern/word string, and then replace that line with new text containing a number that increments for each line changed.

I was planning on using sed, but have no idea how to implement the increasing number for every line changed.

The aim, before and after:

Text before:

">text_1_lots of other bits of text"   
other lines of text   
">text_2_lots of other bits of text"    
other lines of text   
">text_3_lots of other bits of text"   
other lines of text

Text after:

">text_1"   
other lines of text   
">text_2"    
other lines of text   
">text_3"   
other lines of text

This is what I have so far:

sed -i "s/>text_*/>text_[0-n]/g"

I realise this doesn't work: the [0-n] and the wildcard * only represent what I'm trying to do. They are the closest things I can think of to achieve it (I understand this is essentially pseudocode at the moment).

Yes, this is because I figured it would be easier to remove the line and replace it than to delete the rest of that specific line after the >text_n. However, if there is simpler code to carry out this deletion, I would much prefer to use that, although I would still like to know how to do the above out of curiosity. — Giles, Jul 08 '16 at 21:05
Are you trying to remove whatever follows >text_[number_here] (so keep the number but remove the rest)? — don_crissti, Jul 08 '16 at 21:10
You can easily "trim" the lines while preserving the original numbering e.g. sed -E 's/(>text_[0-9]{1,}).*/\1/' file — steeldriver, Jul 08 '16 at 21:13
Ideally yes, but only remove the rest of the text on each specific line. I had a quick look into it, and the best I could find was a find-and-replace type method with sed. — Giles, Jul 08 '16 at 21:14
What should be the output in case you have lines beginning: text 2, text 1, text 3, text 3, text 5? The behavio[u]r in this case is different for all the answers. — Law29, Jul 08 '16 at 22:10
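To make Law29's question concrete, here is a minimal sketch that runs the two main strategies from the answers on input whose numbers are out of order and repeated: trimming (which keeps each line's existing number) and renumbering (which counts the matching lines). The function names and sample lines are hypothetical, and the awk filter uses sub() where the accepted answer uses gsub(); since _.*$ can match at most once per line, the result is the same.

```shell
#!/bin/sh
# Compare two strategies on out-of-order, repeated numbers.
# Function names and sample lines are hypothetical, for illustration only.

# Strategy 1: keep each line's existing number, drop the rest,
# and re-add the closing quote.
trim_keep_numbers() {
    sed -E 's/(>text_[0-9]+).*/\1"/'
}

# Strategy 2: ignore existing numbers; replace everything from the
# first "_" with a running count and a closing quote.
renumber_sequential() {
    awk 'BEGIN { cnt = 1 } /^">text_/ { sub(/_.*$/, "_" cnt++ "\"") } { print }'
}

# Law29's case: text 2, text 1, text 3, text 3, text 5.
law29_input() {
    printf '%s\n' \
        '">text_2_lots of other bits of text"' \
        '">text_1_lots of other bits of text"' \
        '">text_3_lots of other bits of text"' \
        '">text_3_lots of other bits of text"' \
        '">text_5_lots of other bits of text"'
}

law29_input | trim_keep_numbers     # numbers 2 1 3 3 5, one line each
law29_input | renumber_sequential   # numbers 1 2 3 4 5, one line each
```

The bash loop in Law29's own answer behaves differently again: it rewrites only the line carrying the next expected number and prints the others unchanged.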

score 4 · Answer 1 · edited Jan 23 '24 at 07:09

Perl can do this with a syntax similar to sed's, but its /e modifier evaluates the replacement as Perl code, which makes a running index straightforward, e.g.

perl -pe 's/>text_.*/sprintf(">text_%d\"", ++$n)/e' file

See also Replace string with sequential index.

However, since in your case the text is already numbered, it's easier just to trim the unwanted portion by capturing the part you want to keep and substituting it back (followed by the closing quote), for example

sed -E 's/(>text_[0-9]+).*/\1"/' file

score 3 · Accepted Answer · answered Jul 08 '16 at 21:05

Based on your attempt with sed, it looks like the pattern you're trying to match is ">text_ and you want to append a number and a " after that.

This is doable with awk.

awk 'BEGIN {cnt=1} /^">text_/ { gsub("_.*$","_"cnt++"\"",$0) } { print}'

e.g.

$ cat x
">text_lots of other bits of text"
other lines of text
">text_lots of other bits of text"
other lines of text
">text_lots of other bits of text"
other lines of text

$ awk 'BEGIN {cnt=1} /^">text_/ { gsub("_.*$","_"cnt++"\"",$0) } { print}' x
">text_1"
other lines of text
">text_2"
other lines of text
">text_3"
other lines of text

You can change the search pattern ^">text_ to identify the lines you want to change, and the gsub() call does the replacement: in this case, everything from the first _ to the end of the line is replaced by an underscore, the count and a closing double quote.
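Because the replacement starts at the first _, the same awk approach also works on the question's original input, where the lines already carry a number. A slightly generalised sketch follows, assuming you might want the count to start somewhere other than 1; the start parameter and the function name are hypothetical additions, and sub() is used instead of gsub() (the result is the same, since _.*$ matches at most once per line).

```shell
#!/bin/sh
# Renumber lines that start with ">text_ read from stdin. The first such
# line gets the number given as $1 (default 1). The start parameter is a
# hypothetical extension of the awk answer.
renumber_from() {
    awk -v start="${1:-1}" '
        BEGIN { cnt = start }
        # Everything from the first "_" to the end of the line becomes
        # an underscore, the current count and a closing double quote.
        /^">text_/ { sub(/_.*$/, "_" cnt++ "\"") }
        { print }
    '
}

# The question's own input, which already contains numbers:
printf '%s\n' \
    '">text_1_lots of other bits of text"' \
    'other lines of text' \
    '">text_2_lots of other bits of text"' \
    'other lines of text' |
    renumber_from 1
```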

This works nicely. I realise now from the comments above that there definitely is a simpler method to achieve my aim. However, this does answer the question I asked, and it has helped my understanding of how to do more complex text editing in the future (and of course it worked). Much appreciated, Stephen. — Giles, Jul 08 '16 at 21:17

score 2 · Answer 3

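I think the easiest would be to use bash or perl. Here is a simple bash example that should get you going with your (probably more complicated) problem. It rewrites a line only when that line contains the next expected number (1, then 2, and so on); any other line is printed unchanged: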

$ cat script 
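#!/bin/bash
# Rewrite a line only if it contains ">text_<i>, where i is the next
# expected number; print every other line unchanged.
i=1
while IFS= read -r a; do   # IFS= and -r keep each line exactly as read
    if [[ "$a" =~ "\">text_${i}".* ]]
    then printf '%s\n' "\">text_${i}\""; i=$((i+1))
    else printf '%s\n' "$a"
    fi
done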
$ cat input 
">text_1_lots of other bits of text"
other lines of text
">text_2_lots of other bits of text"
other lines of text
">text_3_lots of other bits of text"
other lines of text
$ cat input | bash script 
">text_1"
other lines of text
">text_2"
other lines of text
">text_3"
other lines of text

edited Jul 09 '16 at 23:04 · answered Jul 08 '16 at 21:20 · Law29

If you're using while..read you're doing it wrong. – don_crissti Jul 08 '16 at 21:24

@don_crissti I'll use any tool that does the job. – Law29 Jul 08 '16 at 21:26
See also Why is using a shell loop to process text considered bad practice? – Stéphane Chazelas Jan 23 '24 at 07:11
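Along the lines of that linked advice, the same rule as the bash loop (rewrite a line only when it contains ">text_ followed by the next expected number) can be sketched in awk, which avoids a shell read per line. The function name is hypothetical.

```shell
#!/bin/sh
# Awk sketch of the bash loop's logic: a line is rewritten only when it
# contains ">text_<i>, where <i> is the next expected number; all other
# lines pass through unchanged. Like the bash [[ =~ ]] test, the match is
# unanchored, so ">text_1 would also match inside ">text_10.
expected_only() {
    awk 'BEGIN { i = 1 }
         index($0, "\">text_" i) { print "\">text_" i "\""; i++; next }
         { print }'
}
```

Usage: `expected_only < input`.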

score 0 · Answer 4 · answered Jan 23 '24 at 06:57

Using Raku (formerly known as Perl 6)

~$ raku -pe 'state $i; s/^ \" \> text_ .* /"text_{++$i}"/;'  file

OR:

~$ raku -pe 'state $i; s/^ \" \> text_ .* /{sprintf "\"text_%d\"", ++$i}/;'  file

Raku is a programming language in the Perl family. The second command above is basically a translation of @steeldriver's excellent Perl 5 code.

Briefly, Raku's sed-like autoprinting -pe command-line flags are used. Non-<alnum> characters in Raku regexes must be escaped (or quoted) to match literally; however, regex atoms can be spread out because Raku regexes are whitespace-tolerant (equivalent to Perl's /x modifier). A counter variable $i is declared with state, meaning it is only initialized once (a BEGIN block could be used here instead, e.g. BEGIN my $i;). The replacement should be self-explanatory: Raku code is interpolated within {…} curly braces.

Sample Input:

">text_A_lots of other bits of text"   
other lines of text   
">text_B_lots of other bits of text"    
other lines of text   
">text_C_lots of other bits of text"   
other lines of text

Sample Output:

"text_1"
other lines of text   
"text_2"
other lines of text   
"text_3"
other lines of text

https://docs.raku.org/language/regexes
https://docs.raku.org
https://raku.org
