2

Hi, I'm trying to work out how to select lines in a text file that match a pattern or word, and then replace each matching line with new text containing a number that increments for every line changed.

I was planning on using sed, but have no idea how to implement the increasing number for every line changed.

The aim, before and after:

Text before:

">text_1_lots of other bits of text"   
other lines of text   
">text_2_lots of other bits of text"    
other lines of text   
">text_3_lots of other bits of text"   
other lines of text  

Text after:

">text_1"   
other lines of text   
">text_2"    
other lines of text   
">text_3"   
other lines of text  

This is what I have so far:

sed -i "s/>text_*/>text_[0-n]/g"   

I realise this doesn't work: the [0-n] and the wildcard * just represent what I'm trying to do, and they are the closest things I could think of (I understand this is essentially pseudocode at the moment).

Kusalananda
Giles
  • Yes, this is because I figured it would be easier to remove the line and replace it than to delete the rest of that specific line after the >text_n. However, if there is simpler code to carry out this deletion then I would much prefer to use that, although I would still like to know how to do the above out of curiosity. – Giles Jul 08 '16 at 21:05
  • Are you trying to remove whatever follows >text_[number_here] (so keep the number but remove the rest)? – don_crissti Jul 08 '16 at 21:10
  • You can easily "trim" the lines while preserving the original numbering e.g. sed -E 's/(>text_[0-9]{1,}).*/\1/' file – steeldriver Jul 08 '16 at 21:13
  • Ideally yes, but only remove the rest of the text on each specific line. I had a quick look into it, and the best I could find was a find-and-replace method with sed. – Giles Jul 08 '16 at 21:14
  • What should be the output in case you have lines beginning: text 2, text 1, text 3, text 3, text 5? The behavio[u]r in this case is different for all the answers. – Law29 Jul 08 '16 at 22:10

4 Answers

4

Perl can do this with syntax similar to sed's, while allowing the replacement to be evaluated as code (the e flag), which makes an incrementing index straightforward, e.g.

perl -pe 's/>text_.*/sprintf ">text_%d\"", ++$n/e' file

See also Replace string with sequential index.

However, since in your case the text is already numbered, it's easier just to trim the unwanted portion by capturing the part you want to keep and substituting it back (followed by the closing quote), for example

sed -E 's/(>text_[0-9]+).*/\1"/' file
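The two approaches above (renumbering versus trimming) only give the same result when the existing numbers already run 1, 2, 3 in order. A minimal sketch of the difference on a made-up, out-of-order sample follows. The sample text and variable names are hypothetical, awk stands in for the perl counter purely to keep the sketch to POSIX tools, and the closing quote is re-added on the assumption that the quotes are literal characters in the file.

```shell
# Hypothetical sample with out-of-order and repeated numbers.
input='">text_2_foo"
other
">text_1_bar"
">text_3_baz"
">text_3_qux"'

# Trimming: keep whatever number is already there, drop the rest of the line.
trimmed=$(printf '%s\n' "$input" | sed -E 's/(>text_[0-9]+).*/\1"/')

# Renumbering: ignore the existing number and count matching lines instead.
renumbered=$(printf '%s\n' "$input" |
    awk '/^">text_/ { sub(/_.*$/, "_" (++n) "\"") } { print }')

printf 'trimmed:\n%s\n\nrenumbered:\n%s\n' "$trimmed" "$renumbered"
```

Trimming leaves the headers as 2, 1, 3, 3, while renumbering produces 1, 2, 3, 4, so which one is "right" depends on whether the original numbers carry meaning.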
steeldriver
3

Based on your attempt with sed, it looks like the pattern you're trying to match is ">text_ and you want to append a number and a " after that.

This is doable with awk:

awk 'BEGIN {cnt=1} /^">text_/ { gsub("_.*$","_"cnt++"\"",$0) } { print}'

e.g.

$ cat x
">text_lots of other bits of text"
other lines of text
">text_lots of other bits of text"
other lines of text
">text_lots of other bits of text"
other lines of text

$ awk 'BEGIN {cnt=1} /^">text_/ { gsub("_.*$","_"cnt++"\"",$0) } { print}' x
">text_1"
other lines of text
">text_2"
other lines of text
">text_3"
other lines of text

You can change the search pattern ^">text_ to identify the lines you want to change; the gsub() call then does the replacement, in this case replacing everything from the first _ to the end of the line with _, then the count, then a "
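Unlike sed -i, plain awk writes its result to standard output rather than changing the file. One common way around that is a temporary file, sketched below. The file is created with mktemp only so the example is self-contained, and sub() is used instead of gsub() because the pattern can match at most once per line; neither choice comes from the answer itself.

```shell
# Hypothetical input file, created here only to make the sketch runnable.
file=$(mktemp)
printf '%s\n' '">text_lots of other bits of text"' \
              'other lines of text' \
              '">text_lots of other bits of text"' > "$file"

# Write the renumbered text to a temporary file, then replace the
# original only if awk exited successfully.
tmp=$(mktemp)
awk 'BEGIN {cnt=1} /^">text_/ { sub(/_.*$/, "_" (cnt++) "\"") } { print }' "$file" > "$tmp" &&
    mv "$tmp" "$file"

cat "$file"
```

Redirecting straight back onto the input file (awk ... x > x) would truncate it before awk reads it, which is why the intermediate file is needed.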

  • This works nicely. I realise now from the comments above that there definitely is a simpler method to achieve my aim. However, this does answer the question I asked, and has helped my understanding of how to do more complex text editing in the future. Much appreciated, Stephen. – Giles Jul 08 '16 at 21:17
2

I think the easiest would be to use bash or perl. Here is a simple bash example that should get you going with your (probably more complicated) real problem:

$ cat script 
#!/bin/bash
i=1
while IFS= read -r a ; do
    if [[ "$a" =~ "\">text_${i}".* ]]
    then echo "\">text_${i}\"" ; i=$((i+1))
    else echo "$a"
    fi
done
$ cat input 
">text_1_lots of other bits of text"
other lines of text
">text_2_lots of other bits of text"
other lines of text
">text_3_lots of other bits of text"
other lines of text
$ cat input | bash script 
">text_1"
other lines of text
">text_2"
other lines of text
">text_3"
other lines of text
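Note that the script above only rewrites a line when its existing number equals the current counter, and because the regex is unanchored at the end, >text_1 would also match a line starting >text_10. If the goal is to renumber every ">text_ line regardless of what number it already has, a variant along these lines might do. It is a sketch only: the function name renumber and the sample lines are made up, and it uses POSIX case instead of [[ =~ ]] so that it also runs under plain sh.

```shell
renumber() {
    i=1
    # IFS= and -r keep leading whitespace and backslashes intact.
    while IFS= read -r line; do
        case $line in
            '">text_'*)
                # Any header line gets the next counter value.
                printf '">text_%d"\n' "$i"
                i=$((i + 1))
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}

result=$(printf '%s\n' '">text_2_a"' 'other lines of text' '">text_10_b"' | renumber)
printf '%s\n' "$result"
```

It reads standard input in the same way as the original script, so renumber < input would work for a file.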
Law29
0

Using Raku (formerly known as Perl_6)

~$ raku -pe 'state $i; s/^ \" \> text_ .* /"text_{++$i}"/;'  file

OR:

~$ raku -pe 'state $i; s/^ \" \> text_ .* /{sprintf "\"text_%d\"", ++$i}/;'  file

Raku is a programming language in the Perl family. The second example above is basically a translation of @steeldriver's excellent Perl 5 code.

Briefly, Raku's sed-like autoprinting command-line flags -pe are used. Non-<alnum> characters in Raku regexes must be escaped; however, regex atoms can be spaced out because whitespace in a regex is not significant by default (like Perl's /x modifier). The counter variable $i is declared with state, meaning it is initialized only once (a BEGIN block can be used here instead, e.g. BEGIN my $i;). The replacement should be self-explanatory: Raku code is interpolated within {} curly braces.

Sample Input:

">text_A_lots of other bits of text"   
other lines of text   
">text_B_lots of other bits of text"    
other lines of text   
">text_C_lots of other bits of text"   
other lines of text  

Sample Output:

"text_1"
other lines of text   
"text_2"
other lines of text   
"text_3"
other lines of text 

https://docs.raku.org/language/regexes
https://docs.raku.org
https://raku.org

jubilatious1