3

For some problems, such as matching a pattern over an unknown number of lines or "replace the last occurrence of ...", the -z option of GNU sed is really helpful. How can I achieve the same thing portably?

Example: I have a file

yellow, green,
blue, black, purple,
orange,
white, red, brown
are some colours

and I want to replace the last comma in the file with " and". Note that it is not known in advance which line contains that comma, or where in the line it is. With GNU sed I can do

sed -z 's/\(.*\),/\1 and/'

to get the desired output

yellow, green,
blue, black, purple,
orange,
white, red and brown
are some colours

How can I do this portably, so that it runs with any POSIX sed?

muru
  • 72,889
Philippos
  • 13,453
  • 1
    see also: https://unix.stackexchange.com/questions/182153/sed-read-whole-file-into-pattern-space-without-failing-on-single-line-input and https://unix.stackexchange.com/questions/26284/how-can-i-use-sed-to-replace-a-multi-line-string ... I'd also consider perl -0777 as an option for portable solution – Sundeep Aug 01 '19 at 08:11
  • 1
    and another vote for perl being a portable solution. it's worth considering whenever you need to do something that sed can't do or can't do easily. same goes for awk. – cas Aug 01 '19 at 13:31
  • I am a big fan of "using the right tool for the task", but this Q&A is about things that can be done easily with sed. Usually users already have the right approach; they just need a hint about the -z option or this portable pattern. But they need a compact explanation, which is not found in the answers linked by @Sundeep. Please note that using awk or perl or python usually involves programming, while sed is a different approach without programming, preferred by a number of people. – Philippos Aug 01 '19 at 13:57
  • 1
    Defining what is and isn't "programming" is tricky, at best, but I get what you mean. You can write simple transformations "without programming" in perl, same as you can in sed: e.g. perl -p -e 's/foo/bar/'. Or perl -p -0777 -e 's/foo/bar/g' to process the entire input as one string. Or use -00 to process the input one paragraph at a time (paragraphs are separated by one or more blank lines). So if you have perl installed, but not a sed that understands -z, perl is a good substitute. – cas Aug 03 '19 at 01:19

2 Answers

2

In pure POSIX sed you have to join all the lines yourself. While some people do this with N inside a loop, the easiest approach is to append to the hold space with the H;1h;$!d;x pattern:

  • H appends each line to the hold space. Unfortunately, appending the first line puts a newline at the beginning of the buffer, so
  • 1h overwrites the hold space with the first line to avoid that unwanted leading newline.
  • $!d ends processing for every line except the last one. Those lines don't need to be printed, because they are stored in the hold space.
  • x is executed only on the last line (for all other lines the d has already stopped further command processing). It exchanges hold space and pattern space, so after this command the whole file that was collected in the hold space is in the pattern space, just as it would be with the -z option of GNU sed. Of course you could also use g instead of x, but that copies the whole buffer, so x is faster.
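As a minimal sketch of the four commands above, the following joins a made-up three-line input with + signs. The sample input and the variable name result are assumptions for illustration only; the point is that s can see the embedded newlines because the whole input sits in the pattern space when it runs.

```shell
# Hypothetical three-line input; any text stream would do.
# H;1h;$!d;x collects everything in the hold space and swaps it
# into the pattern space on the last line, so \n can be matched.
result=$(printf 'a\nb\nc\n' | sed 'H;1h;$!d;x;s/\n/+/g')
printf '%s\n' "$result"
```

With a line-by-line sed script the newlines would never be visible to s, so nothing would be replaced.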

So the script for the example will look like:

sed 'H;1h;$!d;x;s/\(.*\),/\1 and/'
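For comparison, the N-in-a-loop approach mentioned at the start could be sketched roughly as follows. This is one common way to write it, not part of the original answer: the variable names input, expected and result are hypothetical, and the $! guard is an assumption chosen so that N is never run on the last line, where POSIX allows sed to quit without printing the pattern space.

```shell
# Sample input from the question.
input='yellow, green,
blue, black, purple,
orange,
white, red, brown
are some colours'

# :a      defines a label
# $!{...} runs only while the current line is not the last one:
#         N appends the next input line, ba jumps back to the label
# Separate -e options are used because POSIX does not allow ':', 'b'
# or '{' to be followed by a semicolon.
result=$(printf '%s\n' "$input" |
  sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/\(.*\),/\1 and/')

printf '%s\n' "$result"
```

Both variants end up with the whole file in the pattern space before s runs; the hold-space version avoids the label and branch.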

Please note that processing a file like this is not a good idea for very large files, because this will use lots of RAM.

muru
  • 72,889
Philippos
  • 13,453
0

sed is for doing simple s/old/new on individual strings, that is all. Almost any time you find yourself using constructs other than s, g, and p (with -n) and certainly any time you find yourself talking about "hold space" you are using the wrong tool. For anything more complicated than s/old/new, like this task, you should just use awk instead. The following will work using any awk in any shell on any UNIX box, doesn't store the whole file in memory, and is trivial to tweak if/when you want to additionally do anything else to the text:

$ cat tst.awk
/,/ { printf "%s", prev; prev="" }
{ prev = prev $0 ORS }
END {
    if ( match(prev,/.*,/) ) {
        prev = substr(prev,1,RLENGTH-1) " and" substr(prev,RLENGTH+1)
    }
    printf "%s", prev
}

$ awk -f tst.awk file
yellow, green,
blue, black, purple,
orange,
white, red and brown
are some colours

You COULD do the job more briefly in awk by slurping the whole file into memory and writing this cryptic rune:

$ awk '{r=r$0 ORS} END{h=r;sub(/,[^,]+$/,"",h);sub(/.*,/,"",r);printf "%s and%s",h,r}' file
yellow, green,
blue, black, purple,
orange,
white, red and brown
are some colours

but the point is that, unlike with sed, you don't have to.

Ed Morton
  • 31,617
  • (1) That tries to answer the given example, but not the question that was asked. (2) This is a simple s/old/new/ task, at least with -z. And even the POSIX version is so much easier to read and to understand than each of your awk attempts. (3) The reason for sed to have more commands than just s is that it is designed to do more. And it can. Your "that is all" claim is just your personal opinion, a common misunderstanding disproved by most text-processing questions here that can be done more easily with sed. Your clumsy scripts illustrate that. Why not leave this as a matter of taste? – Philippos Aug 08 '19 at 10:48
  • Many people ask on this forum how to do something in some specific tool and get an answer using some more appropriate tool - VERY often they're asking for a sed solution and get a better awk one instead. Every time you post an answer it's your personal opinion of how to do something. I have about 40 years' experience using sed, including about a quarter century of that also using awk, so my opinion on when to use the two tools is not uninformed. "clumsy scripts" - hilarious given the horrendous, cryptic, unmaintainable rune you threw up in your answer. – Ed Morton Aug 08 '19 at 13:39
  • Whether or not to use a given tool for a given task is not a matter of taste, it's a matter of clarity, portability, efficiency, robustness, etc. and an awk script will almost always be demonstrably better than a sed script in almost all of those attributes which we strive to achieve when writing software. The problem is that people scribble down some brief rune that may "work" given the current input but may summon Cthulhu given some other input, and people are dazzled by its complex brevity and throw it into their code until 6 months later it chokes cryptically or can't be enhanced, etc. – Ed Morton Aug 08 '19 at 13:39
  • @Philippos Here's a test for you - try to update your sed script to do, well, anything at all. I can trivially change the awk script to clearly and simply exit with success/fail if it makes the substitution or not (like grep does if it finds the string or not), or print how many input lines were read to stderr, or replace "green" with "grey" on lines that don't contain a comma, or do just about anything else anyone might want to do in future. The sed script would require a complete re-write to do just about anything and would be an even more cryptic mess. That is not a matter of taste. – Ed Morton Aug 08 '19 at 13:52
  • I could give you tons of links to real-world questions, much easier to maintain in sed, but why? I once used to do everything in awk. Recently, I had to learn Python for more complex tasks and found out that the gap between tasks better done in sed and those better done in Python is so narrow for me that I finally dropped awk. awk is so extensive that it begins to fade out of my memory, while sed's approach is so different and so simple. I suspect you tried to use sed as a programming tool, which is a bad idea. Anyhow, go fight your holy war, but please do it elsewhere. – Philippos Aug 08 '19 at 14:04
  • wrt "I could give you tons of links to real-world questions, much easier to maintain in sed" - no, you can't. You think you can because you don't know any better yet. "awk is so extensive" - no, it's not, that's the point of awk: nothing gets added to the language unless it's hard to do without a specific language construct, which is why it's a far smaller language than, say, perl's text processing constructs. I absolutely did not ever try to use sed as a programming tool, that's absurd. Wrt a "holy war" - vi vs emacs is a holy war, using awk instead of sed for non-trivial tasks is practical. – Ed Morton Aug 08 '19 at 14:12
  • btw, take a look at the top sed answerers on StackOverflow where I usually contribute - https://stackoverflow.com/tags/sed/topusers. I'm not just blowing smoke here. – Ed Morton Aug 08 '19 at 14:16
  • Sorry, but I will not argue about that with someone so stubborn and black-and-white. awk is good for you and easier for you, okay, go with it, just stop claiming that's true for everyone. Read the comments of people why they preferred a different answer over yours. Maybe you can understand that what looks easy for you is complicated for others and the other way around. People are different, tasks are different and tools are different. You can still believe in your golden rule, but I'm out of it. – Philippos Aug 08 '19 at 14:20
  • Good, then stop arguing about it and stop telling me I don't have the right to post my opinion in my answers. We're all trying to help people here by posting the best advice we know how to give. It's not up to you to try to censor what advice we think is valid and appropriate in our posts given our experience. You post what you like in your answers, I'll post what I like in mine, and the people reading our answers can make their own decisions. Telling me or anyone else not to advise people which tool in their experience is appropriate for which tasks is inappropriate at best. – Ed Morton Aug 08 '19 at 14:27
  • Now we agree: your opinion and your experience. That's all I ever wrote. Just add "In my opinion ..." to your answer and everything is fine. (-: – Philippos Aug 08 '19 at 14:34
  • That doesn't need to be added to any answer because it's implicit for every single answer to every single question. Every answer is the opinion of the person posting the answer given their experience, which is why we get multiple answers to most questions: everyone has a different opinion of the right approach to solving it. When someone says "how do I do X" we don't all need to say "In my opinion you should use sed like this...", "In my opinion you should use sed like that...", "In my opinion you should use awk...", "In my opinion you should use perl...", etc. Glad we agree :-)! – Ed Morton Aug 08 '19 at 14:48