Using sed to remove string or paragraph between delimiter

Question

I would like to know how to remove a string or paragraph that is between ((( string )))

Lorem ipsum (((dolor sit amet))), consectetur adipiscing elit. Vestibulum aliquet fringilla est, dictum tempor nunc venenatis at. Sed nec velit sit amet velit cursus imperdiet. Vivamus tincidunt ut nunc quis euismod. Quisque sit amet lorem rhoncus, malesuada justo at, ullamcorper erat.

So "dolor sit amet" should not appear in the output.

Here's the command I have for now, which detects the first ((( but then stops...

sed -e "/(((/,/)))/d" file.txt

How complex can this get? Can there be nested parentheses? Things like ((( foo ((( bar))) baz)))? How would you like those to be dealt with? How about ((( foo \n bar (((baz))) )))? — terdon, Mar 10 '15 at 15:04
No, actually I need to remove what's inside those. And if there's a line break, it needs to be aware of that too. — Warface, Mar 10 '15 at 15:05
@terdon No, there are no nested ((( ))). But it has line breaks, so it should be able to search the following lines to find the closing parentheses. — Warface, Mar 10 '15 at 15:14

mikeserv · Answer 1 · 2015-03-12T12:41:31.907

sed -e :p -e '/(((/!b
'   -e :n -e 's/)))/\
/;            s/(((.*\n//; tp
$d;N;         s//(((/;     tn'

This should do it. It will branch away (and consequently autoprint) any lines not matching (((, but once one is found it attempts to remove everything between the first occurring ((( sequence and the first occurring ))). If it cannot, because the trailing ))) is not found on the current line, then it pulls in the Next line, removes everything between the ((( and the head of the next line, and searches again. If it makes it to the end of the $last line while still searching for ))), it gives up and deletes what is left in pattern space. In this way it never buffers more than a line at a time, because it removes all that follows ((( each time it has to pull in a new line.

It should handle as many ((( ))) pairs as might occur on a line, and it does not matter if any ( or ) occur between the two ends: it will seek past 2 or fewer ) and any number of (.

After finding ))) it resets to a search for ((( and so it doesn't fail to handle the next pair even after crossing new-line boundaries.

:p - declare the p branch label. The script branches here if it can replace a ))) sequence with a newline then subsequently remove everything between ((( and \n.
/(((/!b - branch away - and autoprint pattern space - if there are no remaining ((( sequences in pattern space.
:n - declare the branch :label n. The script branches here if a ((( is found but a ))) cannot be found on the same line.
s/)))/\n/ - replace the first occurrence of ))) with a newline. This only happens if a ((( has already been matched.
s/(((.*\n// - substitute away everything between first ((( and the only \newline in pattern space.
tp - test for a successful substitution; if true, branch to label :p.
$d;N - the last substitution was not successful; if current line is the $last delete it, else append the Next to pattern space.
s//(((/;tn - repeat the last regexp (the empty pattern reuses (((.*\n) and replace everything from the first occurring ((( through the newline just added with a bare (((, then branch to label :n.
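
As a quick check of the walkthrough above, the script can be wrapped in a shell function and fed a few lines containing both a same-line pair and a pair that spans three lines. This is only a sketch: the function name strip_triple_parens and the sample text are assumptions made for the demonstration, and the output noted in the comments comes from tracing the steps listed above.

```shell
#!/bin/sh
# Hypothetical wrapper name; the sed script inside is the one from this answer.
strip_triple_parens() {
    sed -e :p -e '/(((/!b
'   -e :n -e 's/)))/\
/;            s/(((.*\n//; tp
$d;N;         s//(((/;     tn'
}

# Invented sample: one pair on a single line, one pair spanning three
# lines, and a second pair on the line where the first one closes.
sample='Lorem ipsum (((dolor sit amet))), consectetur
one (((two
three
four))) five (((six))) seven
eight'

result=$(printf '%s\n' "$sample" | strip_triple_parens)
printf '%s\n' "$result"
# Traced output (the spaces on either side of each removed block remain):
#   Lorem ipsum , consectetur
#   one  five  seven
#   eight
```

Note that the spaces surrounding each removed block are kept, which is why "one" and "five" end up separated by two spaces.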

score 1 · Answer 2 · answered Mar 10 '15 at 14:52

Try

sed 's/((([^)]*)))//' file

or probably even better in your sentence

sed 's/ ((([^)]*)))//' file

jimmij

47,140

Oh damn, it only works if there's only 1 word inside the delimiters. What about a paragraph? – Warface Mar 10 '15 at 14:55
If the string/paragraph has line breaks in it, it seems that it doesn't work. – Warface Mar 10 '15 at 15:01

score 1 · Accepted Answer · answered Mar 10 '15 at 15:19

Doing this for single line strings is very simple:

sed 's/((([^)]*)))//g' file
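
To see where this one-liner stops working, here is a small sketch run against made-up sample strings (the helper name, variable names and inputs are assumptions for illustration, not taken from the asker's file). It shows the single-line case succeeding, while a block that spans a line break, or one that contains a lone ), is left untouched.

```shell
#!/bin/sh
# Sample inputs are invented for this demonstration.
one_line='Lorem ipsum (((dolor sit amet))), consectetur'
two_lines='one (((two
three))) four'
inner_paren='a (((b) c))) d'

# Hypothetical helper wrapping the one-liner so it can be reused below.
strip_single_line() {
    sed 's/((([^)]*)))//g'
}

single_out=$(printf '%s\n' "$one_line" | strip_single_line)
multi_out=$(printf '%s\n' "$two_lines" | strip_single_line)
paren_out=$(printf '%s\n' "$inner_paren" | strip_single_line)

printf '%s\n' "$single_out"   # Lorem ipsum , consectetur
printf '%s\n' "$multi_out"    # unchanged: ((( and ))) never share a line
printf '%s\n' "$paren_out"    # unchanged: [^)]* cannot cross the lone )
```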

If you need it to deal with multiline strings, it gets more complex. One approach would be to use tr to replace all newlines with the null character (\0), make the substitution and translate back again:

tr '\n' '\0' < file | sed 's/((([^)]*)))//g' | tr '\0' '\n'

Alternatively, you could just use perl:

perl -0pe 's/\(\(\([^)]+\)\)\)//g;' file

The -0 sets perl's input record separator to the null character, so a text file containing no null bytes is read into memory as a single record (this might be a problem for huge files). The -p means "print each line", but because of the -0, the "line" is actually the entire file. The s/// is the same idea as for sed.

I think I have no choice but to use the perl command, since sed seems to only work with lines. Thanks — Warface, Mar 10 '15 at 15:26
@Warface not sure what you mean. Sed can actually deal with multiline operations, it's just difficult. — terdon, Mar 10 '15 at 15:47
