I have been trying, without much success, the answers to "capture a word between two words" and "Get Word Between two underscores", among many others.

I want to find the newline before a "##". This "##" comes after "## baba", but not right after: there is some text in between. There are many "##" in the file, each always preceded by \n. See the schema below:

Desired output

##

## baba {could also be "foo" or "bar"}

rosa rosa rosam rosae ipsum

{append or replace the '\n' before '\n##' with -> helloworld here}


##

##

Once it is found, insert "helloworld", which is given as an argument to the script.

My current script:

awk -i inplace -v foo=$2 -v new=$1'\n\n' 'f&&/^##/{print new; f=0} {print} /^## baba/{f=1}' a.md

I want two things: 1/ to replace the hard-coded baba with argument $2 (variable foo); 2/ to include the preceding \n in the ^## match, so that the text is inserted one line above.

Thank you very much for any help


Edit: Thanks to RudiC, I came up with:

a.sh

sed -re "/## $1/,/^\n\n##/ {s/^## *$/$2\n\n\n&/}" a.md

a.md

##


## baba

rosa rosa rosam rosae ipsum



##



##

command line

cat a.md && echo "---------------" && ./a.sh baba remember140416sewol

But the output has two flaws: 1/ it writes the text for every match, whereas I want only the first match; 2/ it does not replace the newline before the other newline:

##


## baba

rosa rosa rosam rosae ipsum

{\n <-extra new line}
remember140416sewol


##


remember140416sewol {<-- extra occurrence}


##

3 Answers

Mayhap something along this line:

sed '/## *baba/,/^##/ {s/^## *$/helloworld\n&/}' file

or, if given as parameters,

sed "/## *$2/,/^##/ {s/^## *$/$1\n&/}" file
RudiC

The following would take a pattern from the command line as well as a replacement text, and assign these to the awk variables pattern and text.

In the BEGIN block, I modify the pattern to include the regular expression ^##  at the start.

I then use a range expression to trigger a block of code that will execute for the given section in the document (the section starting with ##  followed by the thing that matches the original pattern, until the line that matches the expression ^##$).

If, within that block, the current line happens to match the expression ^##$, I print the hello world string given by the user, with two extra newlines added.

All lines of input are printed by the final { print } block.

If you want to use the positional parameters $1 (for the replacement text) and $2 (for the pattern), replace the baba below with $2 and hello world with $1. The same applies if you have any other two variables that hold the replacement text and pattern.

awk -v pattern="baba" -v text="hello world" '
    BEGIN { pattern = "^## " pattern }
    $0 ~ pattern,/^##$/ { if (/^##$/) print text "\n\n" }
    { print }' a.md

An alternative implementation that takes the pattern and text from two environment variables:

PATTERN="baba" TEXT="hello world" awk '
    BEGIN { pattern = "^## " ENVIRON["PATTERN"] }
    $0 ~ pattern,/^##$/ { if (/^##$/) print ENVIRON["TEXT"] "\n\n" }
    { print }' a.md

Given the document at the end of your question, this would generate

##

## baba

rosa rosa rosam rosae ipsum


hello world


##

##
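
For readers who find the awk range expression terse, the same logic can be sketched as a small state machine in Python. This is only an illustration, not part of the answer's code: the function name insert_before_section_end and the in-memory list of lines are assumptions, and it simplifies awk's range semantics by assuming the start line never also matches ^##$.

```python
import re


def insert_before_section_end(lines, pattern, text):
    """Emit `text` plus two empty lines before the bare '##' line that
    closes the section opened by a line matching '^## <pattern>'.

    Hypothetical helper mirroring the awk range expression above.
    """
    start = re.compile("^## " + pattern)
    end = re.compile(r"^##$")
    in_section = False
    out = []
    for line in lines:
        if not in_section and start.search(line):
            in_section = True            # range opens on the '## <pattern>' line
        elif in_section and end.search(line):
            out.extend([text, "", ""])   # like: print text "\n\n"
            in_section = False           # range closes on the bare '##'
        out.append(line)                 # like the final { print }
    return out


doc = ["##", "", "## baba", "", "rosa rosa rosam rosae ipsum", "", "", "##", "", "##"]
result = insert_before_section_end(doc, "baba", "hello world")
print("\n".join(result))
```

Because the flag is cleared as soon as the closing ## is seen, the text is inserted once per matching section, not before every ## in the file.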

Related to passing data by variables into awk:


As requested in comments, a script that would take two arguments, a pattern and a replacement string, or the two environment variables PATTERN and STRING:

#!/bin/sh

if [ "$#" -eq 0 ]; then
    # No arguments given.
    # Take pattern and string from environment.

    pattern=${PATTERN:?missing}
    string=${STRING:?missing}
else
    # Arguments given.
    # Take pattern and string from 1st and 2nd argument.

    pattern=${1:?argument 1 (pattern) missing}
    string=${2:?argument 2 (string) missing}
fi

# Either of the two `awk` commands from above would work,
# with $pattern and $string inserted in the appropriate
# command line arguments to awk:

awk -v pattern="$pattern" -v text="$string" '
    BEGIN { pattern = "^## " pattern }
    $0 ~ pattern,/^##$/ { if (/^##$/) print text "\n\n" }
    { print }' a.md

You would run this as either

./script.sh 'baba' 'hello world'

or as

export PATTERN='baba' STRING='hello world'
./script.sh

Failing to provide either two command line arguments, or the two environment variables would result in error messages and the awk code would not run at all.
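
The fallback rule in the script (arguments if any are given, otherwise the environment, and an error if either value is missing or empty) can also be sketched in Python. The function name resolve_inputs is hypothetical, and raising ValueError is an assumption chosen for the sketch; the shell script instead aborts through the ${var:?message} expansion.

```python
def resolve_inputs(args, environ):
    """Return (pattern, string) from args if any were given, else from environ.

    Hypothetical helper; an empty or missing value is treated as an error,
    similar to the ${var:?message} expansions in the shell script.
    """
    if args:
        # Arguments given: take pattern and string from the 1st and 2nd one.
        candidates = {
            "argument 1 (pattern)": args[0],
            "argument 2 (string)": args[1] if len(args) > 1 else "",
        }
    else:
        # No arguments given: take pattern and string from the environment.
        candidates = {
            "PATTERN": environ.get("PATTERN", ""),
            "STRING": environ.get("STRING", ""),
        }
    for name, value in candidates.items():
        if not value:
            raise ValueError(name + " missing")
    return tuple(candidates.values())


print(resolve_inputs(["baba", "hello world"], {}))
print(resolve_inputs([], {"PATTERN": "baba", "STRING": "hello world"}))
```

In a real script the two parameters would presumably be sys.argv[1:] and os.environ; they are passed explicitly here so the rule can be exercised without touching the process environment.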

Kusalananda
  • Well, that is nice of you, but if you had fully read my question you would have noticed that I have already done more than what you propose (I even made it generic for one of the two variables and added the line returns) – Antonin GAVREL Apr 17 '20 at 10:39
  • @AntoninGAVREL Do consider reading my answer again. The main difference between your code and mine seems to be the incorporation of the pattern given by the user. My code does that, your code does not. If this is not what the question is about, then please clarify so that I may improve on whatever it is my answer is lacking. – Kusalananda Apr 17 '20 at 10:49
  • @AntoninGAVREL I rejected your edit because you added text to the output that my code does not produce (and shouldn't produce). Instead, I added a paragraph about using $1 and $2. – Kusalananda Apr 17 '20 at 11:02
  • Could you rewrite the example to make awk work from a .sh file, with the variables either provided as input or set in the environment before the call to the .sh file? – Antonin GAVREL Apr 17 '20 at 20:54
  • @AntoninGAVREL Sure, hold on... just a moment. – Kusalananda Apr 17 '20 at 20:59
  • @AntoninGAVREL See updated answer. – Kusalananda Apr 17 '20 at 21:06
  • It is almost this (actually it achieves the same thing as the sed answer, but perhaps with more portability), but I still have the extra line; see {\n <-extra new line}. By the way, thank you very much; I was so exasperated that I was in the middle of rewriting it in Python ;) – Antonin GAVREL Apr 17 '20 at 21:15
  • it is related to the s flag – Antonin GAVREL Apr 17 '20 at 21:28

I finally solved it myself with a Python script that does exactly what I wanted:

import sys
import os
import re

topic = sys.argv[1]
pattern = "## " + topic

# Line number of the first "## <topic>" header (grep -n prints "N:line").
s = "grep -n '" + pattern + "' a.md | head -n 1 | cut -d: -f1"
#print(s)
pattern = re.compile("##")

stream = os.popen(s)
lineNb = int(stream.read().rstrip())

filename="a.md"
with open(filename, "r") as f:
    for _ in range(lineNb):
        next(f)
    for line_i, line in enumerate(f, 1):
        if re.search(pattern, line):
            index = line_i + lineNb - 1
            #print( "%d\n" % index )
            break

with open(filename, "r") as f:
    contents = f.readlines()
    contents.insert(index - 1, sys.argv[2] + "\n\n")

with open(filename, "w") as f:
    contents = "".join(contents)
    f.write(contents)


It could probably be optimized further; any advice is welcome.
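
One possible simplification, offered only as a sketch: the header lookup can be done in Python itself, which avoids the grep pipeline and the second pass over the file. The function name insert_before_next_header is hypothetical, and the sketch assumes the lines keep their trailing newlines, as readlines() returns them. It keeps the same insertion point as the script above, one line above the next "##".

```python
def insert_before_next_header(lines, topic, text):
    """Return a copy of `lines` with `text` inserted one line above the first
    '##' that follows the '## <topic>' header (hypothetical helper)."""
    header = "## " + topic
    # Index of the first line containing the topic header, if any.
    start = next((i for i, line in enumerate(lines) if header in line), None)
    if start is None:
        return list(lines)
    # Index of the next line containing '##' after that header, if any.
    end = next((i for i in range(start + 1, len(lines)) if "##" in lines[i]), None)
    if end is None:
        return list(lines)
    # One line above the next '##', but never above the topic header itself.
    position = max(end - 1, start + 1)
    return lines[:position] + [text + "\n\n"] + lines[position:]


doc = ["##\n", "\n", "## baba\n", "\n", "rosa\n", "\n", "##\n"]
updated = insert_before_next_header(doc, "baba", "hello")
print("".join(updated), end="")
```

To apply it to a.md, the list would come from readlines() and the joined result would be written back, as in the last two with blocks of the script above.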