Sed to discover and replace text BETWEEN two patterns

Question

Dear Stack Exchange Community,

I see other solutions for this but I'm struggling with the regex I need to adapt them to my situation.

I have software-generated files in which a lib object member property has a name that I need to replace. I need to use sed to find whatever that property name is and replace it with the file's base name.

Starting with a .js file named bobby.js that contains:

// stage content:
(lib.Scenario2IntroFigure0 = function(mode,startPosition,loop) {
    stuff
}

Ending with the same bobby.js file but it now has:

// stage content:
(lib.bobby = function(mode,startPosition,loop) {
    stuff
}

NOTE: Scenario2IntroFigure0 is different for every file, unfortunately.

Pseudocode describing what I think I should do:

A. Isolate the old name by looking for whatever is between this pattern:

// stage content:
(lib.

B. And this ending pattern:

= function(mode,startPosition,loop) {

C. Get the file base name itself with:

FILENAME=$(basename "$1" '.js')

D. Replace old name with file base name and overwrite the file like:

sed -i "s/Scenario2IntroFigure0/$FILENAME/g" "$1"

BUT where "Scenario2IntroFigure0" is whatever sed found between those two patterns.

score 0 · Accepted Answer · answered Jun 14 '19 at 03:39


Replace $file with your $1:

file="bobby.js"
filename=$(basename "$file" '.js')
sed -i 's/\((lib\.\).*\( = function(mode,startPosition,loop) {\)/\1'"$filename"'\2/' "$file"
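As a minimal sketch of how this could be wired into a script, the command can be wrapped in a function and tried on a scratch copy of the sample from the question. The function name rename_lib_member and the demo_dir variable are made up for this illustration, and GNU sed is assumed (for -i without a backup suffix).

```shell
#!/bin/sh
# rename_lib_member is a hypothetical helper name, not part of the answer.
# Assumes GNU sed, where -i takes no backup-suffix argument.
rename_lib_member() {
    file=$1
    name=$(basename "$file" .js)
    # Same substitution as above: keep "(lib." and the function signature,
    # replace whatever sits between them with the base name.
    sed -i 's/\((lib\.\).*\( = function(mode,startPosition,loop) {\)/\1'"$name"'\2/' "$file"
}

# Build a scratch copy of the sample file from the question.
demo_dir=$(mktemp -d)
printf '%s\n' \
    '// stage content:' \
    '(lib.Scenario2IntroFigure0 = function(mode,startPosition,loop) {' \
    '    stuff' \
    '}' > "$demo_dir/bobby.js"

rename_lib_member "$demo_dir/bobby.js"
cat "$demo_dir/bobby.js"
```

Note that the base name is pasted straight into the replacement text, so a file name containing & or a backslash would need escaping first.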

answered Jun 14 '19 at 03:39

Freddy

25,565

Thank you very much Freddy. I apologize for the delay in accepting this solution. – mishawagon Jul 01 '19 at 15:25

bu5hman · Answer 2 · 2019-06-15T18:48:31.020

0

Building on @Freddy's answer, but honoring the OP's requirement that the match spans two lines. This is done by converting every \n to NUL before the sed step using

tr '\n' '\0'

and then switching them back after the sed.

f="bobby.js"
b=$(basename "$f" '.js')
pre="// stage content:\x00\(lib\."            # pattern includes NUL (\x00); the dot is escaped
post=" = function\(mode,startPosition,loop\)"
tr '\n' '\0' < "$f" | sed -E "s|($pre)[[:alnum:]]+($post)|\1$b\2|g" | tr '\0' '\n'

EDIT

A pure sed solution not involving any messing around with tr

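f="bobby.js"
b=$(basename "$f" '.js')
pre='\/\/ stage content:'
mid='\(lib\.'
post=' = function\(mode,startPosition,loop\)'
# '$!{' is single-quoted so the shell does not expand $! (the PID of the last background job)
sed -E -e "/^$pre\$/{" -e '$!{' -e 'N' \
    -e "s|($pre\n$mid)[[:alnum:]]+($post)|\1$b\2|" \
    -e 'ty' -e 'P;D' -e ':y' -e '}' -e '}' "$f"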

This solution courtesy of a close study of this post and this one. I hope your brain does not ache as much as mine after reading them, but I learned a lot in the process. All respect to the posters, including @Peter.O

We are not worthy!

post script

The original bobby.js is malformed: the opening parenthesis before lib. is never closed.
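To illustrate why anchoring on the comment line matters, here is a small sketch on a made-up input with two lib. definitions, where only the one directly after // stage content: should change. The extra lib.Other line and the demo_dir and result names are invented for this example, and the script is simplified (it drops the ty;P;D handling, so it assumes two marker lines never appear back to back).

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
f="$demo_dir/bobby.js"

# Hypothetical input: one definition before the marker, one after it.
printf '%s\n' \
    '(lib.Other = function(mode,startPosition,loop) {' \
    '}' \
    '// stage content:' \
    '(lib.Scenario2IntroFigure0 = function(mode,startPosition,loop) {' \
    '    stuff' \
    '}' > "$f"

b=$(basename "$f" .js)

# On the marker line (unless it is the last line) append the next line with N,
# then substitute across the embedded newline.
result=$(sed -E -e '/^\/\/ stage content:$/{' -e '$!{' -e 'N' \
    -e 's|(// stage content:\n\(lib\.)[[:alnum:]]+( = function\(mode,startPosition,loop\))|\1'"$b"'\2|' \
    -e '}' -e '}' "$f")

printf '%s\n' "$result"
```

The first definition is left alone because sed only joins and rewrites the pair of lines that starts with the marker comment.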

edited Jun 15 '19 at 18:48

answered Jun 14 '19 at 10:00

bu5hman

4,756

POSIX text processing utilities are only required to operate on POSIX text files and POSIX text files are comprised of POSIX text lines which by definition all end in \n so by converting the \ns to \0s you're relying on undefined behavior for the subsequent tools. – Ed Morton Jun 15 '19 at 15:33
Which is why the last part of the code puts the \n back. – bu5hman Jun 15 '19 at 15:39
That's too late though, as you've already tried to process it with sed and tr by then, so if, for example, you're running a version of either of those that internally stores its input as C strings, then it'll truncate the input at the first \0 or otherwise misbehave and still be POSIX compliant. – Ed Morton Jun 15 '19 at 16:10
You've lost me.... I am no POSIX expert, if you want to educate me in chat then I am game. Always keen to be educated. As I see it I am just creating a single stream of text characters (just one line) from the original file to process in sed and then reverting back. I will read up on POSIX and see if I can catch your drift. – bu5hman Jun 15 '19 at 16:19
We can chat if you like but it's very simple - POSIX requires text processing tools to handle lines that end in \n so when you ask a tool to handle text that doesn't end in \n then YMMV with what that does. Beyond that POSIX requires text files to not contain \0s so when you ask a tool to handle text containing \0s then YMMV. A specific example of a problem with lines that contain \0 is that many tools are written in C and in C strings are terminated by \0 so such tools simply cannot store a "line" as a string that contains \0s since the first \0 terminates the string. – Ed Morton Jun 15 '19 at 16:29
In your code you're requiring anything in the pipeline after tr '\n' '\0' (i.e. sed '...' | tr '\0' '\n') to process lines that a) don't end in \n and b) do contain \0s and so YMMV for both reasons. Some versions of some tools will be fine with it (e.g. GNU tools would be OK I expect as AFAIK they don't store data internally as C strings) but others won't. All I'm saying is if/when you write code that relies on that you should give the readers a heads-up and if you know specific tools that WILL work then it'd be good to state that too. – Ed Morton Jun 15 '19 at 16:31
All that has been done is that for the purpose of executing sed the entire file has been converted into a single line. After the sed the single line is broken at the temporary token \0 back to its original form. Any downstream process will receive the output from the second tr pipe (thanks @kulsananda) with the \n. If the issue is the choice of \0 as the token, then any character that does not occur in the input stream will do instead. – bu5hman Jun 15 '19 at 17:01
You keep talking about what will happen after the 2nd tr but I'm talking about what happens after the 1st tr. What has been done is that for the purpose of executing sed the entire file has been converted into a single stream of characters some of which are \0s and which doesn't end in \n. That stream of characters is not a POSIX text line since it both contains \0s and does not end in \n and so what sed does with it will depend on the version of sed you are running and then what tr does with it will depend on that plus the version of tr you are running. – Ed Morton Jun 15 '19 at 17:16
wrt If the issue is the choice of \0 as the token, then any character that does not occur in the input stream will do instead - yes that is one of the 2 problems and in general there is no character that can't exist in the input and which a text processing tool is guaranteed to be able to handle. – Ed Morton Jun 15 '19 at 17:19
FWIW if you really wanted to adopt this approach then here's how to do it in a robust, POSIX-compliant way: awk '{gsub(/@/,"@A"); gsub(/~/,"@B"); printf "%s%s", (NR>1?"~":""), $0} END{print ""}' file | sed 'script' | awk '{gsub(/~/,RS); gsub(/@B/,"~"); gsub(/@A/,"@")}1' Just make sure your sed script isn't using @ or ~ in the search or replacement text and if it is then just pick any other 2 chars to use in the awk scripts. – Ed Morton Jun 15 '19 at 18:12
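A sketch of how the awk | sed | awk sandwich from the last comment might look when filled in for this question. The two awk stages are copied from the comment; the sed script in the middle is an assumption written for this example (plain BRE, with ~ standing in for the line break), and demo_dir and out are illustrative names.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
f="$demo_dir/bobby.js"
printf '%s\n' \
    '// stage content:' \
    '(lib.Scenario2IntroFigure0 = function(mode,startPosition,loop) {' \
    '    stuff' \
    '}' > "$f"

b=$(basename "$f" .js)

out=$(
    # Stage 1: escape @ and ~, then join all lines with ~ into one real text line.
    awk '{gsub(/@/,"@A"); gsub(/~/,"@B"); printf "%s%s", (NR>1?"~":""), $0} END{print ""}' "$f" |
    # Stage 2: the substitution, matching ~ where the line break used to be.
    sed 's|\(// stage content:~(lib\.\)[^ ~]*\( = function(mode,startPosition,loop)\)|\1'"$b"'\2|' |
    # Stage 3: turn ~ back into newlines and undo the escaping.
    awk '{gsub(/~/,RS); gsub(/@B/,"~"); gsub(/@A/,"@")}1'
)

printf '%s\n' "$out"
```

This assumes the base name itself contains no @ or ~, since the replacement text passes through the decoding stage as well.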

score 0 · Answer 3 · answered Jun 15 '19 at 15:27

With GNU awk for gensub():

$ awk -v RS= '{ $0=gensub(/(.*\/\/\s+stage content:\s+\(lib\.)\S+(\s+=\s+function\(mode,startPosition,loop\)\s+\{.*)/,"\\1" gensub(/\.js$/,"",1,FILENAME) "\\2",1) } 1' bobby.js
// stage content:
(lib.bobby = function(mode,startPosition,loop) {
    stuff
}

Make it awk -i inplace -v RS=... if you want "inplace" editing.
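If GNU awk is not available, a different technique (not the one in this answer) is to stay line-based and remember the previous line. The following is only a sketch under that assumption; the variable names prev, name, demo_dir and out are made up for the example.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
f="$demo_dir/bobby.js"
printf '%s\n' \
    '// stage content:' \
    '(lib.Scenario2IntroFigure0 = function(mode,startPosition,loop) {' \
    '    stuff' \
    '}' > "$f"

b=$(basename "$f" .js)

out=$(awk -v name="$b" '
    # Rewrite only the line that directly follows the marker comment.
    prev == "// stage content:" && /^\(lib\.[^ ]+ = function\(/ {
        sub(/^\(lib\.[^ ]+/, "(lib." name)
    }
    { prev = $0; print }
' "$f")

printf '%s\n' "$out"
```

This prints to standard output rather than editing in place, and a base name containing & or a backslash would need escaping before being passed to sub().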
