Input
testing on Linux [Remove white space] testing on Linux
Output
testing on Linux [Removewhitespace] testing on Linux
So, how can we remove all the white space between the brackets to get the output shown above?
If the [ and ] are balanced and not nested, you could use GNU awk, as in:
gawk -v RS='[][]' '
NR % 2 == 0 {gsub(/\s/,"")}
{printf "%s", $0 RT}'
That is, use [ and ] as the record separators instead of the newline character, and remove blanks only on every other record.
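Where GNU awk is not available, the same odd/even idea can be sketched in POSIX awk by splitting each line on the brackets. This is an adaptation rather than the command above: the function name strip_in_brackets is made up for illustration, it works line by line (so no newline inside [...]), it only removes spaces and tabs, and it relies on the brackets being balanced and not nested so that the separators alternate between [ and ].

```shell
# Hypothetical helper: remove spaces and tabs inside non-nested [...] pairs.
strip_in_brackets() {
  awk -F'[][]' '{
    out = $1                      # text before the first bracket
    for (i = 2; i <= NF; i++) {
      if (i % 2 == 0) {           # even fields sit between [ and ]
        gsub(/[ \t]/, "", $i)
        out = out "[" $i
      } else {                    # odd fields sit outside the brackets
        out = out "]" $i
      }
    }
    print out
  }'
}

printf '%s\n' 'testing on Linux [Remove white space] testing on Linux' |
  strip_in_brackets
# prints: testing on Linux [Removewhitespace] testing on Linux
```

Because the separators themselves are discarded by field splitting, the sketch puts them back by position, which is only valid under the balanced, non-nested assumption.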
With sed, with the additional requirement that there be no newline character inside [...]:
sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'
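To see why the loop is needed: in each pass the s command removes only the last blank before the closing ] of each bracket pair, and t1 jumps back to the :1 label for as long as a substitution was made. A small sketch wrapping the command above in a function (the name strip_sed is only for illustration) and running it on two sample lines:

```shell
# Illustrative wrapper around the sed command above.
strip_sed() {
  sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'
}

printf '%s\n' 'testing on Linux [Remove white space] testing on Linux' | strip_sed
# prints: testing on Linux [Removewhitespace] testing on Linux

printf '%s\n' '[a b] x y [c d e]' | strip_sed
# prints: [ab] x y [cde]
```

Note that the pattern never checks for the closing ], so it relies on the brackets being balanced: after an unmatched [, blanks would be removed up to the end of the line.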
If they are balanced but may be nested, as in blah [blih [1] bluh] asd, then you could use perl's recursion regexp operators, like:
perl -0777 -pe 's{(\[((?:(?>[^][]+)|(?1))*)\])}{$&=~s/\s//rsg}gse'
Another approach, which would scale to very large files, would be to use the (?{...}) perl regexp operator to keep track of the bracket depth, like in:
perl -pe 'BEGIN{$/=\8192}s{((?:\[(?{$l++})|\](?{$l--})|[^][\s]+)*)(\s+)}
{"$1".($l>0?"":$2)}gse'
Actually, you can also process the input one character at a time like:
perl -pe 'BEGIN{$/=\1}if($l>0&&/\s/){$_=""}elsif($_ eq"["){$l++}elsif($_ eq"]"){$l--}'
That approach can be implemented with POSIX tools:
od -A n -vt u1 |
tr -cs 0-9 '[\n*]' |
awk 'BEGIN{b[32]=""; b[10]=""; b[12]=""} # add more for every blank
!NF{next}; l>0 && $0 in b {next}
$0 == "91" {l++}; $0 == "93" {l--}
{printf "%c", $0}'
With sed (assuming no newline inside the [...]):
sed -e 's/_/_u/g;:1' -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g;t1' \
-e :2 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t2' \
-e 's/_c/]/g;s/_o/[/g;s/_u/_/g'
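The idea in that script is to temporarily rewrite inner brackets as the two-character codes _o and _c (after escaping literal underscores as _u), so that the non-nested loop can do the removal, and then to undo the encoding. A sketch on the nested example, with the wrapper name strip_sed_nested chosen only for illustration:

```shell
# Illustrative wrapper around the sed script above.
strip_sed_nested() {
  sed -e 's/_/_u/g;:1' -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g;t1' \
      -e :2 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t2' \
      -e 's/_c/]/g;s/_o/[/g;s/_u/_/g'
}

printf '%s\n' 'blah [blih [1] bluh] asd' | strip_sed_nested
# prints: blah [blih[1]bluh] asd

# Literal _o and _c in the input survive thanks to the _u escaping.
printf '%s\n' 'a_c [x_o [y z] w] b' | strip_sed_nested
# prints: a_c [x_o[yz]w] b
```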
In the commands above, white space means any horizontal (SPC, TAB) or vertical (NL, CR, VT, FF...) spacing character in the ASCII charset. Depending on your locale, others might be included.
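As a quick check of which ASCII characters count as white space, the sketch below feeds the six characters just listed, plus two letters, through tr's [:space:] class in the C locale and counts the bytes that survive. The helper name count_non_space is made up for this illustration.

```shell
# Hypothetical helper: count the bytes that are not white space in the C locale.
count_non_space() {
  LC_ALL=C tr -d '[:space:]' | wc -c | tr -d '[:blank:]'
}

# SPC, TAB, NL, CR, VT and FF between the letters a and b
printf 'a \t\n\r\v\fb' | count_non_space
# prints: 2
```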
[..] or handle more than one [...] per line. I've added a sed solution to my answer.
– Stéphane Chazelas Feb 01 '13 at 15:44
sed solution will only work when you're using a different char for opening and closing a pair (this issue may affect some of the other solutions provided also).
– curious_prism Sep 24 '14 at 14:30
sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1' ? I want to remove white space between > <
– gary69 Feb 09 '19 at 20:34
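For the case raised in the comment above (white space between > and <), one possible adaptation of the sed loop is sketched below. It is an assumption about what is wanted, not something from the answer: it swaps [ for > and ] for <, and additionally requires the closing < to be present on the same line so that text after the last > is left alone. The name strip_between_tags is hypothetical.

```shell
# Hypothetical adaptation: delete white space in text that follows a >
# and runs up to the next < on the same line.
strip_between_tags() {
  sed -e :1 -e 's/\(>[^<]*\)[[:space:]]\([^<]*<\)/\1\2/;t1'
}

printf '%s\n' '<a> b c </a> d e' | strip_between_tags
# prints: <a>bc</a> d e
```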
Perl 5.14 solution (which is shorter and IMO easier to read, especially if you format it over multiple lines in a file instead of as a one-liner):
perl -pE 's{(\[ .*? \])}{$1 =~ y/ //dr}gex'
That works because in 5.14, the regular expression engine is re-entrant. Here it is, expanded out and commented:
s{
(\[ .*? \]) # search for [ ... ] block, capture (as $1)
}{
$1 =~ y/ //dr # delete spaces. you could add in other whitespace here, too
# d = delete; r = return result instead of modifying $1
}gex; # g = global (all [ ... ] blocks), e = replacement is perl code, x = allow extended regex
Perl solution:
perl -pe 's/(\[[^]]*?)\s([^][]*\])/$1$2/ while /\[[^]]*?\s[^][]*\]/'