
I am learning sed. Everything seemed to be going fine until I came across the N (multi-line next) command. I created this file (guide.txt) for practice/understanding/context purposes. Here are the contents of said file...

This guide is meant to walk you through a day as a Network
Administrator. By the end, hopefully you will be better
equipped to perform your duties as a Network Administrator
and maybe even enjoy being a Network Administrator that much more.
Network Administrator
Network Administrator
I'm a Network Administrator

So my goal is to substitute ALL instances of "Network Administrator" with "System User". Because the first instance of "Network Administrator" is split by a newline (\n), I need the multi-line next command (N) to append the line that starts with "Administrator" to the previous line ending with "Network". No problem. But I also want to catch all the other single-line instances of "Network Administrator".

From my research, I've learned that I will need two substitution commands; one for the newline separated string and one for the others. Also, there is some jive happening because of the last line containing the substitution match and the multi-line next. So I craft this...

$ sed '
> s/Network Administrator/System User/
> N
> s/Network\nAdministrator/System\nUser/
> ' guide.txt

This returns these results...

This guide is meant to walk you through a day as a System
User. By the end, hopefully you will be better
equipped to perform your duties as a System User
and maybe even enjoy being a Network Administrator that much more.
System User
Network Administrator
I'm a System User

I thought that the single-line substitution would catch all the "normal" instances of "Network Administrator" and swap them for "System User", while the multi-line substitution would work its magic on the newline-separated instance. As you can see, though, it returned what I consider unexpected results: lines 4 and 6 were left unchanged.

After some fiddling, I landed on this...

$ sed '
> s/Network Administrator/System User/
> N
> s/Network\nAdministrator/System\nUser/
> s/Network Administrator/System User/
> ' guide.txt

And voilà, I get the desired output of...

This guide is meant to walk you through a day as a System
User. By the end, hopefully you will be better
equipped to perform your duties as a System User
and maybe even enjoy being a System User that much more.
System User
System User
I'm a System User

Why does this work and the original sed script doesn't? I really want to understand this.

Thanks in advance for any help.

John1024
  • 74,655

2 Answers


First, note that your solution doesn't really work. Consider this test file:

$ cat test1
Network
Administrator Network
Administrator

And then run the command:

$ sed '
 s/Network Administrator/System User/
 N
 s/Network\nAdministrator/System\nUser/
 s/Network Administrator/System User/
 ' test1
System
User Network
Administrator

The problem is that the code does not make the substitution for the last Network\nAdministrator.
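To see why that last occurrence is missed, it can help to look at how N groups the input. The following is only a small sketch: the temporary directory and the " | " marker are made up for illustration, and $!N is used instead of a bare N so that the unpaired last line behaves the same in GNU and BSD sed.

```shell
# Hypothetical demo: recreate test1 in a throwaway directory.
tmp=$(mktemp -d)
printf 'Network\nAdministrator Network\nAdministrator\n' > "$tmp/test1"

# Join the two lines of each pattern space with " | " so the pairing is visible.
pairs=$(sed '$!N
s/\n/ | /' "$tmp/test1")
printf '%s\n' "$pairs"
# Network | Administrator Network
# Administrator

rm -rf "$tmp"
```

Lines 1 and 2 share a pattern space, but lines 2 and 3 never do, so the Network at the end of line 2 and the Administrator on line 3 are never seen together by any s command.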

This solution does work:

$ sed ':a; /Network$/{$!{N;ba}}; s/Network\nAdministrator/System\nUser/g; s/Network Administrator/System User/g' test1
System
User System
User

We can also apply this to your guide.txt:

$ sed ':a; /Network$/{$!{N;ba}}; s/Network\nAdministrator/System\nUser/g; s/Network Administrator/System User/g' guide.txt 
This guide is meant to walk you through a day as a System
User. By the end, hopefully you will be better
equipped to perform your duties as a System User
and maybe even enjoy being a System User that much more.
System User
System User
I'm a System User

The key is to keep reading in lines until you find one that does not end with Network. When that is accomplished, the substitutions can be done.

Compatibility Note: All the above use \n in the replacement text. This requires GNU sed. It will not work on BSD/OSX sed.

[Hat tip to Philippos.]

Multiline version

If it helps clarify, here is the same command split over multiple lines:

$ sed ':a
    /Network$/{
       $!{
           N
           ba
       }
    }
    s/Network\nAdministrator/System\nUser/g
    s/Network Administrator/System User/g
    ' filename

How it works

  1. :a

    This creates a label a.

  2. /Network$/{ $!{N;ba} }

    If this line ends with Network, then, if this is not the last line ($!) read and append the next line (N) and branch back to label a (ba).

  3. s/Network\nAdministrator/System\nUser/g

    Make the substitution with the intermediate newline.

  4. s/Network Administrator/System User/g

    Make the substitution with the intermediate blank.
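If you want to watch steps 1 and 2 at work, one option is to replace the two s commands with l, which prints the pattern space unambiguously (newlines show up as \n and the end is marked with $). This is just a sketch for inspection; the temporary directory is an assumption of the example.

```shell
# Hypothetical demo: recreate test1 in a throwaway directory.
tmp=$(mktemp -d)
printf 'Network\nAdministrator Network\nAdministrator\n' > "$tmp/test1"

# Same loop as above, but print the collected pattern space instead of editing it.
# -n suppresses the normal output so only the l output is shown.
trace=$(sed -n ':a
/Network$/{
$!{
N
ba
}
}
l' "$tmp/test1")
printf '%s\n' "$trace"
# Network\nAdministrator Network\nAdministrator$

rm -rf "$tmp"
```

Because every line of test1 except the last ends with Network, the loop keeps appending until all three lines sit in one pattern space, which is why both split occurrences can then be replaced.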

Simpler solution (GNU only)

With GNU sed (not BSD/OSX), we only need one substitute command:

$ sed -zE 's/Network([[:space:]]+)Administrator/System\1User/g' test1
System
User System
User

And on the guide.txt file:

$ sed -zE 's/Network([[:space:]]+)Administrator/System\1User/g' guide.txt 
This guide is meant to walk you through a day as a System
User. By the end, hopefully you will be better
equipped to perform your duties as a System User
and maybe even enjoy being a System User that much more.
System User
System User
I'm a System User

In this case, -z tells sed to treat the NUL character, rather than the newline, as the line separator. Since text files normally contain no NUL characters, this has the effect of reading the whole file in at once. We can then make the substitution without worrying about missing a line.

This method is not good if the file is huge (usually meaning gigabytes). If it is that large, then reading it all in at once might strain the system RAM.
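A quick way to convince yourself that -z really turns the whole file into a single "line" is to count the lines sed sees with and without it. This is a GNU-only sketch (the -z option is not available in BSD/OSX sed), and the sample input is made up for illustration.

```shell
# $= prints the number of the last line, i.e. how many "lines" sed read.
# Without -z, the three newline-terminated lines are counted separately.
plain=$(printf 'a\nb\nc\n' | sed -n '$=')

# With -z (GNU sed only), lines end at NUL, so the whole input is one line.
whole=$(printf 'a\nb\nc\n' | sed -z -n '$=')

printf 'without -z: %s\nwith -z: %s\n' "$plain" "$whole"
# without -z: 3
# with -z: 1
```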

Solution that works on both GNU and BSD sed

As suggested by Philippos, the following is a portable solution:

sed 'H;1h;$!d;x;s/Network\([[:space:]]\)Administrator/System\1User/g'
John1024
  • 74,655
  • 1
    Excellent information, John! Thanks for shedding some light on this and your alternative solution is very nice. That being said, I still don't understand why my solution isn't a solution. It appears to work, but with your test.txt file it doesn't. Why does my solution appear to work, but doesn't really? Thanks so much for the help. – dlowrie290 Oct 11 '17 at 03:09
  • 1
    @dlowrie290 Your solution reads in lines in pairs. If Network Administrator is split between the first and second line of that pair, your solution successfully makes the substitution. It then prints those two lines and reads in the next pair. If, however, the second line of the first pair ends with Network and the first line of the second pair begins with Administrator, the code misses it. My code avoids this by reading in lines until it finds one that doesn't end with Network. – John1024 Oct 11 '17 at 03:24
  • 2
    Please note that your first multiline solution also depends on GNU extensions to sed: The \n in the replacement is not defined in the standard. sed 'H;1h;$!d;x;s/Network\([[:space:]]\)Administrator/System\1User/g' is a portable way to do it. – Philippos Oct 11 '17 at 05:53
  • @Philippos Excellent points. Answer updated to include the portable solution. – John1024 Oct 11 '17 at 06:10
  • Okay, I added another answer to explain how it works and added an introduction to the N;P;D cycle. – Philippos Oct 11 '17 at 06:32
  • @Philippos Very good and +1. – John1024 Oct 11 '17 at 07:00
  • @Philippos Thanks again: After reading your answer, I added $! to my answer. If I could give you another +1, I would. – John1024 Oct 11 '17 at 07:14
  • 1
    Thanks for the clarification, John! Again, Great stuff and your time/efforts are much appreciated! – dlowrie290 Oct 11 '17 at 12:24

As you are learning sed, I'll take the time to add to @John1024's answer:

1) Please note that you are using \n in the replacement string. This works in GNU sed, but is not part of POSIX, so it will insert a backslash and an n in many other seds (using \n in the pattern is portable, btw).

Instead of this I suggest using s/Network\([[:space:]]\)Administrator/System\1User/g: the [[:space:]] will match a newline or any other whitespace character, so you don't need two s commands and can combine them into one. By surrounding it with \(...\) you can refer to it in the replacement: the \1 will get replaced by whatever was matched in the first pair of \(\).
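As a small illustration of the \1 back-reference, here is the same substitution applied to two made-up one-line inputs, one with a space and one with a tab between the words. Whatever separator was matched is put back unchanged.

```shell
# Separator is a space: \1 re-inserts the space.
spaced=$(printf 'Network Administrator\n' |
    sed 's/Network\([[:space:]]\)Administrator/System\1User/g')

# Separator is a tab: \1 re-inserts the tab.
tabbed=$(printf 'Network\tAdministrator\n' |
    sed 's/Network\([[:space:]]\)Administrator/System\1User/g')

printf '%s\n%s\n' "$spaced" "$tabbed"
```

On a single line this cannot show the newline case, because a newline only appears inside the pattern space after N, G or similar commands have put one there.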

2) To properly match patterns over two lines, you should know the N;P;D pattern:

 sed '$!N;s/Network\([[:space:]]\)Administrator/System\1User/g;P;D'

The N always appends the next line, except on the last line; that's why it is "addressed" with $! (= if not the last line). You should always consider preceding N with $! to avoid accidentally ending the script. Then, after the replacement, the P prints only the first line in the pattern space, and the D deletes this line and starts the next cycle with the remainder of the pattern space (without reading the next line). This is probably what you originally intended.

Remember this pattern, you will often need it.
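To make the sliding two-line window visible, here is a sketch that first prints each pattern space with l (dropping the s and P commands), and then runs the real command on the test1 file from John's answer. The temporary directory is an assumption of the example.

```shell
# Hypothetical demo: recreate test1 in a throwaway directory.
tmp=$(mktemp -d)
printf 'Network\nAdministrator Network\nAdministrator\n' > "$tmp/test1"

# Show the window that each cycle works on: every line (except the first)
# appears once as the second line of a window and once as the first.
windows=$(sed -n '$!N
l
D' "$tmp/test1")
printf '%s\n' "$windows"
# Network\nAdministrator Network$
# Administrator Network\nAdministrator$
# Administrator$

# The actual N;P;D substitution.
result=$(sed '$!N;s/Network\([[:space:]]\)Administrator/System\1User/g;P;D' "$tmp/test1")
printf '%s\n' "$result"
# System
# User System
# User

rm -rf "$tmp"
```

Unlike the plain N version, consecutive windows overlap by one line, so a match split across any two neighbouring lines is always seen.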

3) Another useful pattern for multiline editing, especially when more than two lines are involved: Hold space collecting, as I suggested to John:

sed 'H;1h;$!d;g;s/Network\([[:space:]]\)Administrator/System\1User/g'

I repeat it here to explain it: H appends each line to the hold space. As this would result in an extra newline before the first line, on the first line 1h overwrites the hold space with that line instead. The following $!d means "for all lines except the last one, delete the pattern space and start over". Thus, the rest of the script is only executed for the last line. At this point, the whole file is collected in the hold space (so don't use this for very large files!) and the g copies it to the pattern space, so you can do all replacements at once, like you can with the -z option of GNU sed.

This is another useful pattern I suggest keeping in mind.
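Here is a sketch of the hold space collecting in action, again on the test1 file from John's answer (recreated in a temporary directory, which is an assumption of the example). The first command only inspects what ends up in the pattern space on the last line; the second does the real replacement.

```shell
# Hypothetical demo: recreate test1 in a throwaway directory.
tmp=$(mktemp -d)
printf 'Network\nAdministrator Network\nAdministrator\n' > "$tmp/test1"

# Collect everything, then print the pattern space unambiguously with l.
collected=$(sed -n 'H;1h;$!d;g;l' "$tmp/test1")
printf '%s\n' "$collected"
# Network\nAdministrator Network\nAdministrator$

# Collect everything, then substitute across the whole file at once.
result=$(sed 'H;1h;$!d;g;s/Network\([[:space:]]\)Administrator/System\1User/g' "$tmp/test1")
printf '%s\n' "$result"
# System
# User System
# User

rm -rf "$tmp"
```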

Philippos
  • 13,453
  • Wow! Great explanation! This coupled with John's answer really gave me a better insight to this problem and sed in general. Looks like I've got much more to learn. I wish I could check both of your solutions as answers. Thanks so much for both of your efforts. They are much appreciated. – dlowrie290 Oct 11 '17 at 12:23