How to remove all white spaces just between brackets [] using Unix tools?

Question

Replace text between brackets

Input

testing on Linux [Remove white space] testing on Linux

Output

testing on Linux [Removewhitespace] testing on Linux

So, how can we remove only the white space between the brackets and get the output shown above?

"Remove comma between quotes only" might help. – mtk Feb 01 '13 at 12:50
"Replace text between brackets" might also help. – manatwork Feb 01 '13 at 15:14

Stéphane Chazelas · Accepted Answer · 2017-10-31T18:07:46.517

If the [, ] are balanced and not nested, you could use GNU awk as in:

gawk -v RS='[][]' '
   NR % 2 == 0 {gsub(/\s/,"")}
   {printf "%s", $0 RT}'

That is, use [ and ] as the record separators instead of the newline character, and remove whitespace from every other record only (the even-numbered records, which are the text between brackets).

With sed, with the additional requirement that there be no newline character inside [...]:

sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'
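
A comment further down asks what each part of that expression does. As a rough annotated sketch: the function names strip_bracket_ws and strip_gt_lt_ws are made up for illustration, and the > < variant is an assumed adaptation for that comment, not something taken from the answer.

```shell
# Same sed program as above, split into one -e per command so each part can be
# commented. The function names are hypothetical.
#
#   :1             define a label named "1"
#   \( ... \)      capture group 1, which holds:
#     \[             a literal "["
#     [^]]*          any run of characters that are not "]"
#                    (a "]" placed first inside [^...] is taken literally)
#   [[:space:]]    one whitespace character, left outside the group
#   \1             put back group 1 only, so that whitespace character is dropped
#   g              do the same for every other [...] on the line
#   t1             if a substitution happened, jump back to label 1 and retry
strip_bracket_ws() {
  sed -e ':1' \
      -e 's/\(\[[^]]*\)[[:space:]]/\1/g' \
      -e 't1'
}

# Assumed adaptation for text between ">" and "<": swap the delimiters.
strip_gt_lt_ws() {
  sed -e ':1' \
      -e 's/\(>[^<]*\)[[:space:]]/\1/g' \
      -e 't1'
}

printf '%s\n' 'testing on Linux [Remove white space] testing on Linux' |
  strip_bracket_ws
```

Each pass removes only the last whitespace character before each closing delimiter, which is why the loop is needed. Note that after an opening delimiter with no closing one on the same line, whitespace is removed up to the end of the line.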

If they are balanced but may be nested, as in blah [blih [1] bluh] asd, then you could use Perl's recursive regexp operators, like:

perl -0777 -pe 's{(\[((?:(?>[^][]+)|(?1))*)\])}{$&=~s/\s//rsg}gse'

Another approach, which would scale to very large files, would be to use the (?{...}) perl regexp operator to keep track of the bracket depth, as in:

perl -pe 'BEGIN{$/=\8192}s{((?:\[(?{$l++})|\](?{$l--})|[^][\s]+)*)(\s+)}
  {"$1".($l>0?"":$2)}gse'

Actually, you can also process the input one character at a time like:

perl -pe 'BEGIN{$/=\1}if($l>0&&/\s/){$_=""}elsif($_ eq"["){$l++}elsif($_ eq"]"){$l--}'

That approach can be implemented with POSIX tools:

od -A n -vt u1 |
  tr -cs 0-9 '[\n*]' |
  awk 'BEGIN{b[32]=""; b[10]=""; b[12]=""} # add more for every blank
       !NF{next}; l>0 && $0 in b {next}
       $0 == "91" {l++}; $0 == "93" {l--}
       {printf "%c", $0}'
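
As a sketch of how that pipeline might be wrapped and tried out: the function name strip_nested_bracket_ws is hypothetical, the whitespace table is filled in with the six ASCII codes (TAB, NL, VT, FF, CR, SPC) as the "add more" comment suggests, and `$0 + 0` is an extra precaution to force a numeric argument for %c.

```shell
# Hypothetical wrapper around the od | tr | awk pipeline above.
strip_nested_bracket_ws() {
  od -A n -v -t u1 |            # dump every byte as an unsigned decimal number
    tr -cs 0-9 '[\n*]' |        # one number per line
    awk 'BEGIN {
           # byte values treated as whitespace: TAB NL VT FF CR SPC
           n = split("9 10 11 12 13 32", ws, " ")
           for (i = 1; i <= n; i++) b[ws[i]] = ""
         }
         !NF { next }                    # skip empty lines left by tr
         l > 0 && ($0 in b) { next }     # inside brackets: drop whitespace
         $0 == 91 { l++ }                # "[" opens one level
         $0 == 93 { l-- }                # "]" closes one level
         { printf "%c", $0 + 0 }'        # emit the byte
}

printf 'blah [blih [1] bluh] asd\n' | strip_nested_bracket_ws
```

Because the depth counter l is kept across bytes rather than lines, a newline inside brackets is dropped as well. An unmatched ] pushes the counter below zero, so this assumes balanced input.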

With sed, for the nested case (assuming no newline inside the [...]):

sed -e 's/_/_u/g;:1' -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g;t1' \
    -e :2 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t2' \
    -e 's/_c/]/g;s/_o/[/g;s/_u/_/g'
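
To make the three groups of that command easier to follow, here is a sketch that runs them as separate sed processes so the intermediate text can be inspected. The stage_* function names are hypothetical; the substitutions themselves are the ones from the command above.

```shell
# Stage 1: escape "_" as "_u", then repeatedly rewrite every inner [ ... ]
# that sits inside an outer "[" as _o ... _c, so only outer brackets remain.
stage_hide_inner() {
  sed -e 's/_/_u/g' -e ':1' \
      -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g' -e 't1'
}

# Stage 2: the non-nested loop, removing whitespace after "[" up to "]".
stage_strip_ws() {
  sed -e ':2' -e 's/\(\[[^]]*\)[[:space:]]/\1/g' -e 't2'
}

# Stage 3: turn the placeholders back into "]", "[" and "_".
stage_restore() {
  sed -e 's/_c/]/g' -e 's/_o/[/g' -e 's/_u/_/g'
}

sample='blah [blih [1] bluh] asd'
printf '%s\n' "$sample" | stage_hide_inner
printf '%s\n' "$sample" | stage_hide_inner | stage_strip_ws
printf '%s\n' "$sample" | stage_hide_inner | stage_strip_ws | stage_restore
```

For the sample line, the first stage should give blah [blih _o1_c bluh] asd, and the full chain blah [blih[1]bluh] asd. Escaping _ first is what keeps a literal _o or _c in the input from being mistaken for a placeholder.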

White space above means any horizontal (SPC, TAB) or vertical (NL, CR, VT, FF...) spacing character in the ASCII charset. Depending on your locale, others might get included.

@ekassis, not sure about your sed solution as it seems to have been mangled, but I don't expect it to remove more than one space inside [..] or handle more than one [...] per line. I've added a sed solution to my answer. — Stéphane Chazelas, Feb 01 '13 at 15:44
You are right: the sed -e 's/([[^]]*)( )/\1/' replaces only the last space between the brackets. The sed you provided is the right one. Thank you. — ekassis, Feb 01 '13 at 17:20
I found this question when I wanted to remove spaces between double quotes. The first sed solution only works when the opening and closing characters of a pair are different (this issue may affect some of the other solutions provided as well). — curious_prism, Sep 24 '14 at 14:30
Can you explain what each bracket is doing in sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'? I want to remove white space between > and <. — gary69, Feb 09 '19 at 20:34

derobert · Answer 2 · score 2 · 2013-02-01T13:47:48.140

Perl 5.14 solution (shorter and, IMO, easier to read, especially if you format it over multiple lines in a file instead of as a one-liner):

perl -pE 's{(\[ .*? \])}{$1 =~ y/ //dr}gex'

That works because in 5.14, the regular expression engine is re-entrant. Here it is, expanded out and commented:

s{
    (\[ .*? \])         # search for [ ... ] block, capture (as $1)
}{
    $1 =~ y/ //dr       # delete spaces. you could add in other whitespace here, too
                        # d = delete; r = return result instead of modifying $1
}gex; # g = global (all [ ... ] blocks), e = replacement is perl code, x = allow extended regex
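
The comment in the expanded version says other whitespace could be handled too. One possible way to do that, shown here only as a sketch, is to swap the y/ //dr step for s/\s//gr, mirroring the s/\s//rsg used in the accepted answer. The function names are hypothetical, and perl 5.14 or later is assumed for the r flag.

```shell
# The one-liner as posted: deletes plain spaces only.
strip_spaces_only() {
  perl -pE 's{(\[ .*? \])}{$1 =~ y/ //dr}gex'
}

# Assumed variant: delete any whitespace matched by \s (space, tab, ...).
strip_all_ws() {
  perl -pE 's{(\[ .*? \])}{$1 =~ s/\s//gr}gex'
}

printf 'a [b\tc d] e\n' | strip_spaces_only   # the tab inside [...] is kept
printf 'a [b\tc d] e\n' | strip_all_ws        # the tab is removed as well
```

Both versions still work line by line and do not handle nested brackets.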

edited Feb 01 '13 at 13:47

answered Feb 01 '13 at 12:17

derobert

@StephaneChazelas Well, it doesn't assume anything about balance. It does assume no nesting, though. It works just fine on empty brackets (it leaves them alone, which is fine, as they contain no whitespace). It does assume one line (which would be easy to change). And indeed it's only space, but that is trivial to change, as noted in the answer... – derobert Feb 01 '13 at 12:42
@StephaneChazelas Indeed, I suppose it does. Will fix. FYI, I don't mind if you edit my answers directly to fix things like that. – derobert Feb 01 '13 at 13:47

score 0 · Answer 3 · answered Feb 01 '13 at 11:50

Perl solution:

perl -pe 's/(\[[^]]*?)\s([^][]*\])/$1$2/ while /\[[^]]*?\s[^][]*\]/'

choroba
