I have a file which includes consecutive lines like this

macroa{abc def 123 ghi}
macrob{abc 123 xyz}

I want to check whether the first string in macrob is the same as the first string in macroa and, if it is, delete it from macrob, so that the result is

macroa{abc def 123 ghi}
macrob{123 xyz}

I'm using the whole file approach from here and my command is

sed -e '1h;2,$H;$!d;g' -e 's/\(macroa{\([a-z]*\) [^\n]*\)\n\(macrob{\)\2 /\1\n\3/g' in > out

However, this doesn't work. What am I doing wrong? Thank you.

tom
  • If you're using GNU sed, this should work. What result are you actually getting? – Kusalananda Feb 20 '18 at 06:40
  • @Kusalananda Nothing gets changed. I'm using GNU sed. – tom Feb 20 '18 at 06:56
  • Try with the examples you posted: It works. Could it be that your code has tabs instead of the spaces as separator after abc? This would get converted to spaces when pasting it to your question here. – Philippos Feb 20 '18 at 07:22

2 Answers

I tested your script with GNU sed and it produced the expected result. However, this is not portable to other sed versions, as you use \n inside [] and in the replacement, which is undefined by the standard.

Using it in the replacement can easily be avoided:

sed -e '1h;2,$H;$!d;g' -e 's/\(macroa{\([a-z]*\) [^\n]*\)\(\nmacrob{\)\2 /\1\3/g'

Avoiding it inside the [] expression takes a trick: use the y command to swap the newline with an ordinary character before the substitution and swap it back afterwards; in this case I use |:

sed -e '1h;2,$H;$!d;g' -e 'y/\n|/|\n/;s/\(macroa{\([a-z]*\) [^|]*\)\(|macrob{\)\2 /\1\3/g;y/\n|/|\n/'
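As a quick check of the y trick, here is a minimal sketch that pipes the two sample lines from the question through the command above. The variable name result is only for illustration, and the sketch assumes that | does not occur anywhere in the data; if it does, another placeholder character would be needed.

```shell
# Two sample lines from the question, piped in instead of read from a file.
# Step 1: slurp the whole input into the pattern space.
# Step 2: swap newline and |, run the substitution, swap them back.
result=$(printf '%s\n' 'macroa{abc def 123 ghi}' 'macrob{abc 123 xyz}' |
  sed -e 'H;1h;$!d;g' \
      -e 'y/\n|/|\n/;s/\(macroa{\([a-z]*\) [^|]*\)\(|macrob{\)\2 /\1\3/g;y/\n|/|\n/')

printf '%s\n' "$result"
# macroa{abc def 123 ghi}
# macrob{123 xyz}
```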

The y trick is the universal solution, but I think it's ugly. In most cases you can write [[:print:]] instead of [^\n], because typically all code except for the newlines consists of printable characters, so it's:

sed 'H;1h;$!d;g;s/\(macroa{\([a-z]*\) [[:print:]]*\)\n\(macrob{\)\2 /\1\n\3/g'

(I also simplified your initial 1h;2,$H to H;1h.)
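One caveat worth checking against your own data: a tab is not a member of [[:print:]], so a tab anywhere in the macroa line stops the match. The sketch below illustrates this with two assumed inputs, one with spaces only and one with a tab between def and 123. It keeps the newline inside the third group so that \n is not needed in the replacement; the variable names are illustrative.

```shell
# Same regex for both runs: [[:print:]]* in place of [^\n]*,
# with the newline captured in group 3 so the replacement needs no \n.
script='H;1h;$!d;g;s/\(macroa{\([a-z]*\) [[:print:]]*\)\(\nmacrob{\)\2 /\1\3/g'

# Only spaces in the first line: the leading abc is removed from macrob.
with_spaces=$(printf 'macroa{abc def 123 ghi}\nmacrob{abc 123 xyz}\n' | sed "$script")

# A tab between def and 123: [[:print:]]* stops at the tab, so nothing changes.
with_tab=$(printf 'macroa{abc def\t123 ghi}\nmacrob{abc 123 xyz}\n' | sed "$script")

printf '%s\n' "$with_spaces" "$with_tab"
```

If your file may contain tabs, [[:print:][:blank:]] or the y trick would be the safer choice.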

Considering don_crissti's comment, I add that the typical approach to this kind of problem is the N;P;D cycle: always append the Next line, process both lines together, Print the first line, and Delete it from the pattern space to continue with the second:

sed 'N;s/\(macroa{\)\([a-z]* \)\(.*\nmacrob{\)\2/\1\2\3/;P;D'
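To see the N;P;D cycle handle more than one pair, here is a small sketch on assumed test data (the foo/bar/baz lines are made up, not from the question). It uses $!N rather than plain N, which is a variation on the command above: POSIX allows sed to quit without printing when N finds no next line, while GNU sed prints the pattern space anyway, and $!N avoids depending on either behaviour.

```shell
# Two macroa/macrob pairs: the first shares its leading word, the second does not.
result=$(printf '%s\n' \
    'macroa{abc def 123 ghi}' \
    'macrob{abc 123 xyz}' \
    'macroa{foo bar}' \
    'macrob{baz foo}' |
  sed '$!N;s/\(macroa{\)\([a-z]* \)\(.*\nmacrob{\)\2/\1\2\3/;P;D')

printf '%s\n' "$result"
# macroa{abc def 123 ghi}
# macrob{123 xyz}
# macroa{foo bar}
# macrob{baz foo}
```

Because D restarts the cycle with the second line still in the pattern space, every line is examined both as the second line of one pair and as the first line of the next.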
Philippos
  • Thank you for the detailed response. Your versions work fine. – tom Feb 20 '18 at 07:39
  • You are right, @don_crissti, so I added the approach you would usually use to solve this with sed. When I help children with math, I usually help them solve the problem their way before I show a different (better) approach. I should not skip the second part. – Philippos Feb 22 '18 at 07:21

If you are okay with using awk instead of sed:

$ awk -F'[{ ]' 'c && c-- && $1=="macrob" && $2==s{sub(s" ", "")}
                $1=="macroa"{c=1; s=$2} 1' ip.txt
macroa{abc def 123 ghi}
macrob{123 xyz}
  • -F'[{ ]' use { or space character as field separator
  • $1=="macroa"{c=1; s=$2} if the first field is macroa, set the counter to 1 and save the second field in a variable. The counter determines how many of the following lines have to be checked
  • c && c-- is true only while the counter is non-zero. Since c=1 here, it is true exactly once, and the counter drops to zero regardless of the remaining conditions. So only the line immediately after macroa can match
  • $1=="macrob" && $2==s required condition
    • sub(s" ", "") remove the string and a space character
  • Further reading: Printing with sed or awk a line following a matching pattern
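The effect of the counter can be checked with a minimal sketch on assumed input, where the first macrob is separated from its macroa by an unrelated line (other{abc} is made up for the test) and the second one follows directly:

```shell
# Only the macrob that immediately follows a macroa line should be changed.
result=$(printf '%s\n' \
    'macroa{abc def 123 ghi}' \
    'other{abc}' \
    'macrob{abc 123 xyz}' \
    'macroa{abc def}' \
    'macrob{abc 123 xyz}' |
  awk -F'[{ ]' 'c && c-- && $1=="macrob" && $2==s{sub(s" ", "")}
                $1=="macroa"{c=1; s=$2} 1')

printf '%s\n' "$result"
# macroa{abc def 123 ghi}
# other{abc}
# macrob{abc 123 xyz}
# macroa{abc def}
# macrob{123 xyz}
```

The other{abc} line uses up the counter, so the macrob on the third line is left alone, while the last macrob loses its leading abc.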
Sundeep