How to remove a multi-line string/block of text from a file?

Question

I have a text file that contains a multi-line string. I'd like to scan the file and remove every instance of that multi-line string, which may occasionally be duplicated.

example string:

recursive-test yes;
test-limit{
tests 10;
};
location "testLoc" {
type test;
};
location "testLoc2"{
type test;
file "/etc/var/test.sql";
};
include "/etc/var/test.conf";
};



recursive-test yes;
test-limit{
tests 10;
};
location "testLoc" {
type test;
};
location "testLoc2"{
type test;
file "/etc/var/test.sql";
};
include "/etc/var/test.conf";
};

otherTestTextHere
123
321

recursive-test yes;
test-limit{
tests 10;
};
location "testLoc" {
type test;
};
location "testLoc2"{
type test;
file "/etc/var/test.sql";
};
include "/etc/var/test.conf";
};

As you can see, the repeated block of text is always identical, from its first line to its last:

recursive-test yes;
test-limit{
tests 10;
};
location "testLoc" {
type test;
};
location "testLoc2"{
type test;
file "/etc/var/test.sql";
};
include "/etc/var/test.conf";
};

The multi-line string should not normally be duplicated, but as a failsafe I'm also looking for a method that scans for all instances and removes them entirely, in case another application that writes to the text file ever duplicates the string.

Using sed, I can only figure out how to delete one line at a time. That won't work for me, because some of the words on some lines of the multi-line string also appear in other, similar multi-line strings that I want to keep. I'm really just trying to find exact duplicates of this multi-line string, from start to finish.

I'd like to keep it to a one-line command, as efficient as possible.

Yes I've seen that post and have tried to see if I can use a similar syntax but could not get it to function. The cheatsheet is not very user friendly. — RCG, Feb 25 '15 at 06:08

Costas · Accepted Answer · 2015-02-25T07:08:02.107

3

As I understand it, the file consists of blocks of text separated by empty lines, and the OP wants to remove every duplicate block (keeping the first occurrence of each):

awk -v RS='\n\n' -v ORS="\n\n" '!seen[$0]++' file
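
For readers who find the awk idiom terse, here is a rough Python sketch of the same idea: split the text on blank lines, keep the first occurrence of each block, and drop later repeats. The function name dedupe_blocks is made up for illustration, and the handling of leading/trailing blank lines is an assumption that differs slightly from what awk prints.

```python
import re


def dedupe_blocks(text):
    """Keep only the first occurrence of each blank-line-separated block."""
    # Assumption: blocks are separated by one or more empty lines.
    blocks = re.split(r"\n{2,}", text.strip("\n"))
    seen = set()
    kept = []
    for block in blocks:
        if block not in seen:  # same test as awk's !seen[$0]++
            seen.add(block)
            kept.append(block)
    # Rejoin with a single empty line between blocks.
    return "\n\n".join(kept) + "\n"
```

Like the awk command, this leaves one copy of the block in place; it does not remove the block altogether.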

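If the OP just wants to remove every occurrence of the block, try GNU sed (-z makes sed treat the whole file as a single record, so \n can be matched in the pattern):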

sed -z 's~recursive-test yes;\ntest-limit{\ntests 10;\n};\nlocation "testLoc" {\ntype test;\n};\nlocation "testLoc2"{\ntype test;\nfile "/etc/var/test\.sql";\n};\ninclude "/etc/var/test\.conf";\n};~~g' file
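
Since the sed pattern is a regular expression, metacharacters in the block (such as .) are not matched literally unless escaped, and the pattern can also match in the middle of a longer line. A possible way to address both, sketched in Python rather than sed: escape the block automatically and anchor it to whole lines. The function name remove_exact_block is hypothetical, and consuming the newline that follows each removed block is a design choice of this sketch, not something the sed command does.

```python
import re


def remove_exact_block(text, block):
    """Remove every copy of `block` that starts and ends on line boundaries."""
    # re.escape makes every character of the block literal.
    # With re.MULTILINE, ^ and $ match at the start and end of any line.
    pattern = re.compile(r"^" + re.escape(block) + r"$\n?", re.MULTILINE)
    return pattern.sub("", text)
```

Usage would look something like remove_exact_block(open("file").read(), block), with block holding the multi-line string without a trailing newline. Similar blocks with extra text on the first or last line are left alone.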

edited Feb 25 '15 at 07:08

answered Feb 25 '15 at 07:01

Costas

14,916

This is exactly what I needed thank you. I'm going to play around with your sed statement and some examples I'll create to try to get a good understanding of how it's all working. This answer works. – RCG Feb 25 '15 at 20:17

score 1 · Answer 2 · answered Feb 25 '15 at 07:01

1

< input python -c 'import sys; sys.stdout.write(sys.stdin.read().replace("""recursive-test yes;\ntest-limit{\ntests 10;\n};\nlocation "testLoc" {\ntype test;\n};\nlocation "testLoc2"{\ntype test;\nfile "/etc/var/test.sql";\n};\ninclude "/etc/var/test.conf";\n};""", ""))'

Python's triple quotes (""") help here: the double quotes inside the string to match don't have to be escaped.
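
If the one-liner gets unwieldy, the same str.replace approach could be put in a small script that reads the block from a separate file, so nothing needs quoting or escaping at all. This is only a sketch: the function name and both file arguments are hypothetical, and it assumes the pattern file contains the block exactly as it appears in the target, including the final newline.

```python
from pathlib import Path


def remove_block_in_file(target, pattern_file):
    """Delete every literal copy of the pattern file's contents from target.

    Returns the number of copies that were removed.
    """
    block = Path(pattern_file).read_text()
    text = Path(target).read_text()
    if not block:
        return 0  # an empty pattern would match everywhere; do nothing
    count = text.count(block)
    if count:
        # Plain substring replacement: no regex, so no metacharacters.
        Path(target).write_text(text.replace(block, ""))
    return count
```

Because this is a plain substring match, it shares the one-liner's behaviour of also matching a copy that begins in the middle of a line.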

answered Feb 25 '15 at 07:01

Anthon

79,293

I'll keep this in mind as well. I never really use Python and try to stick to sed, since I know it is very powerful and comes built in on my distribution; however, Python is also great for all these tasks. – RCG Feb 25 '15 at 20:19
