
I know that similar questions have been asked on this forum, but as far as I can see none of them addresses the problem of the patterns being on different lines. Namely, given a text file

( one ) ( two ) (

three

)

four

How may I delete everything that is between each '(' and ')' pair, even when the elements of the pair are on different lines? The desired result is

() () ()

four
James

4 Answers


You can use perl: -0777 slurps the whole input as a single string, and the s flag on the s/// command makes . match newlines too, so they are treated as plain characters. The non-greedy .*? makes each match stop at the nearest closing parenthesis:

perl -0777 -pe 's/\(.*?\)/()/sg' <<END
( one ) ( two ) (

three

)

four
END
() () ()

four
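For comparison, here is a minimal Python sketch of the same approach, assuming the input fits in memory: read it as one string, then apply a non-greedy `\(.*?\)` with `re.DOTALL`, which plays the role of Perl's s flag. The helper name `blank_parens` is only illustrative.

```python
import re

# Non-greedy match from each "(" to the nearest following ")".
# re.DOTALL lets "." match newlines, like Perl's /s modifier.
PAREN_RE = re.compile(r"\(.*?\)", re.DOTALL)


def blank_parens(text: str) -> str:
    """Replace every (...) group, even one spanning lines, with ()."""
    return PAREN_RE.sub("()", text)


# Demo on the input from the question.
sample = "( one ) ( two ) (\n\nthree\n\n)\n\nfour\n"
print(blank_parens(sample), end="")
```

To filter a file instead, pass it the whole input at once, e.g. `sys.stdout.write(blank_parens(sys.stdin.read()))`, which mirrors perl -0777.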
glenn jackman

This can be solved with a simple state machine in Python.

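#!/usr/bin/env python
# -*- coding: utf-8 -*-

import fileinput
import sys

# active is True while we are outside a "(...)" pair, so characters are
# copied through. Inside a pair they are dropped, but the parentheses
# themselves are always printed. The state survives from one line to the
# next, which is what lets a pair span several lines.
active = True
for line in fileinput.input():
    for ch in line:
        if ch == '(':
            sys.stdout.write(ch)
            active = False  # start dropping characters
        elif ch == ')':
            sys.stdout.write(ch)
            active = True   # resume copying
        elif active:
            sys.stdout.write(ch)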


Usage:

$ echo '( one ) ( two ) (

three

)

four' | python /tmp/statemachine.py

Output:

() () ()

four
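The state machine above has only two states, so nested input is not handled: on ((a) b) the first ) switches copying back on and the output is (() b). A minimal variation, assuming you want everything inside the outermost pair removed, replaces the flag with a nesting depth. `strip_parens` is a hypothetical helper name, and it works on a whole string rather than line by line.

```python
def strip_parens(text: str) -> str:
    """Keep only the outermost "(" and ")" of each group, drop the rest inside."""
    out = []
    depth = 0  # number of "(" currently open
    for ch in text:
        if ch == "(":
            if depth == 0:
                out.append(ch)  # outermost "(" is kept
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(ch)  # ")" that closes the outermost group
        elif depth == 0:
            out.append(ch)  # outside any group (including a stray ")")
    return "".join(out)


print(strip_parens("( one ) ( two ) (\n\nthree\n\n)\n\nfour"))
print(strip_parens("((a) b) c"))
```

As a filter it can be used as `sys.stdout.write(strip_parens(sys.stdin.read()))`. Like the original script, an unclosed "(" drops everything after it.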

Python alternative:

python -c 'import sys,re; print(re.sub(r"\([^()]+\)","()",sys.stdin.read().strip()))' <file

The output:

() () ()

four
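This works without any flag because a negated character class such as [^()] matches every character except the listed ones, newline included. Since [^()]+ cannot contain a parenthesis, only innermost groups are emptied. A quick check (the helper name `blank_inner` and the sample strings are illustrative):

```python
import re

# [^()]+ : one or more characters that are neither "(" nor ")".
# A negated class also matches "\n", so no re.DOTALL is needed.
INNER_RE = re.compile(r"\([^()]+\)")


def blank_inner(text: str) -> str:
    return INNER_RE.sub("()", text)


print(blank_inner("( one ) ( two ) (\n\nthree\n\n)\n\nfour"))
print(blank_inner("((a) b)"))  # only the inner (a) is emptied
```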

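Using GNU sed; this also copes with nested parentheses. The -z option splits the input on NUL bytes instead of newlines, so an ordinary text file is read as one string and a match can span lines: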

sed -z 's/[^()]*)/)/g' infile

Input:

( (zero) one ) ( two ) (

three

)

((((nested))here)end) last
four

Output:

( ()) () ()

(((()))) last
four
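The expression deletes every run of characters other than ( and ) that ends at a ), keeping the ) itself, which is why opening parentheses and any text just before them (such as the space in "( (zero)") survive. A Python sketch of the same substitution, handy for checking the output above where GNU sed is not available (`drop_before_close` is an illustrative name):

```python
import re

# Equivalent of sed 's/[^()]*)/)/g': drop any run of characters other
# than "(" or ")" that is immediately followed by ")", keeping the ")".
CLOSE_RE = re.compile(r"[^()]*\)")


def drop_before_close(text: str) -> str:
    return CLOSE_RE.sub(")", text)


sample = (
    "( (zero) one ) ( two ) (\n\nthree\n\n)\n\n"
    "((((nested))here)end) last\nfour\n"
)
print(drop_before_close(sample), end="")
```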
αғsнιη