How to strip multiple spaces to one using sed?

Question

sed on AIX is not doing what I think it should. I'm trying to replace multiple spaces with a single space in the output of iostat:

# iostat
System configuration: lcpu=4 drives=8 paths=2 vdisks=0

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait
          0.2         31.8                9.7   4.9   82.9      2.5

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk9           0.2      54.2       1.1   1073456960  436765896
hdisk7           0.2      54.1       1.1   1070600212  435678280
hdisk8           0.0       0.0       0.0          0         0
hdisk6           0.0       0.0       0.0          0         0
hdisk1           0.1       6.3       0.5   63344916  112429672
hdisk0           0.1       5.0       0.2   40967838  98574444
cd0              0.0       0.0       0.0          0         0
hdiskpower1      0.2     108.3       2.3   2144057172  872444176

# iostat | grep hdisk1
hdisk1           0.1       6.3       0.5   63345700  112431123

#iostat|grep "hdisk1"|sed -e"s/[ ]*/ /g"
 h d i s k 1 0 . 1 6 . 3 0 . 5 6 3 3 4 5 8 8 0 1 1 2 4 3 2 3 5 4

sed should search and replace (s) multiple spaces (/[ ]*/) with a single space (/ /) across the entire line (/g)... but it's not only doing that, it's putting a space between every character.

What am I doing wrong? I know it's got to be something simple... AIX 5300-06

edit: I have another computer that has 10+ hard drives. I'm using this as a parameter to another program for monitoring purposes.

The problem I ran into was that awk '{print $5}' didn't work, because I'm using $1, etc. in the secondary stage and it gave errors with the print command. I was looking for a grep/sed/cut version. What seems to work is:

iostat | grep "hdisk1 " | sed -e's/  */ /g' | cut -d" " -f 5

The []s were "0 or more" when I thought they meant "just one". Removing the brackets got it working. Three very good answers really quickly make it hard to choose the "answer".

Another simple option without sed: iostat | grep "hdisk1 " | xargs echo — Paul, Sep 18 '22 at 12:55

score 130 · Answer 1 · answered Aug 19 '11 at 15:06

/[ ]*/ matches zero or more spaces, so the empty string between characters matches.

If you're trying to match "one or more spaces", use one of these:

... | sed 's/  */ /g'
... | sed 's/ \{1,\}/ /g'
... | tr -s ' '
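
As a quick check of the three variants above, here is a minimal sketch that runs each of them on one sample line copied from the question. The variable names line, a, b and c are made up for this illustration; nothing beyond sed and tr is assumed.

```shell
# Sample line shaped like the iostat output in the question.
line='hdisk1           0.1       6.3       0.5   63345700  112431123'

# "One or more spaces" written three ways.
a=$(printf '%s\n' "$line" | sed 's/  */ /g')       # a space, then zero or more spaces
b=$(printf '%s\n' "$line" | sed 's/ \{1,\}/ /g')   # interval expression: one or more
c=$(printf '%s\n' "$line" | tr -s ' ')             # squeeze repeated spaces, no regex

# All three should print the same squeezed line.
printf '%s\n' "$a" "$b" "$c"
```

Each variant should print hdisk1 0.1 6.3 0.5 63345700 112431123.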

answered Aug 19 '11 at 15:06

glenn jackman

85,964

Ahh... [] makes it "optional". That explains it. – WernerCD Aug 19 '11 at 15:28
@WernerCD, no, it is * that makes it "optional". [ ] just makes a list of characters with only one character in it (a space). It is the quantifier * that means "zero or more of the previous thing" – glenn jackman Aug 19 '11 at 15:33
Ahh... so to be more accurate, changing it from a single space (/ */) to a double space (/  */) is what did it then. I gotcha. – WernerCD Aug 19 '11 at 15:50
I was trying to search for a pattern that matches double spaces only, and it worked well – minhas23 Jan 19 '15 at 08:18
+1 for the simplest tr -s ' ' solution – Andrejs Oct 09 '16 at 12:32
For some reason I got an "unterminated s command" error with sed, while tr -s " " worked. I'll never forget it :-B because it is SO handy. The simpler the better. – m3nda Mar 31 '18 at 15:01
Wish the OP asked for tr -s ' ' solution. Such a simple and powerful command. – Rakib Fiha Sep 11 '20 at 13:19
for sed to match N spaces do [ ]\{N\}. @glennjackman why doesn't \s\{N\} or [:space:]\{N\} work? – Brian Wiley Jul 22 '22 at 05:39
\s is an extension that some seds support. [:space:] is a POSIX character class that must appear inside a bracket expression -> [[:space:]]. [:space:] by itself is a bracket expression that matches a colon or an s or a p or an a or a c or an e. – glenn jackman Jul 22 '22 at 14:18
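
To make the last two comments concrete, here is a small sketch, assuming only a POSIX sed and printf. The variable names mixed and exact are invented for the example. It uses the [[:space:]] form inside a bracket expression and the [ ]\{N\} interval with N = 3.

```shell
# printf expands \t in its format string, so this input mixes spaces and a tab.
mixed=$(printf 'a \t  b\n' | sed 's/[[:space:]]\{1,\}/ /g')

# Exactly three spaces, matched with an interval expression.
exact=$(printf 'a   b\n' | sed 's/[ ]\{3\}/_/')

printf '%s\n' "$mixed" "$exact"
```

The first command should print a b and the second a_b.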

score 70 · Accepted Answer · edited May 10 '21 at 18:16

The use of grep is redundant; sed can do the same. The problem is the use of *, which also matches zero spaces. With GNU sed, you may use \+ instead:

iostat | sed -n '/hdisk1/s/ \+/ /gp'

Or, with standard sed:

iostat | sed -e '/hdisk/!d' -e 's/ \{2,\}/ /g'

to delete all lines that do not contain the substring hdisk, and to replace all runs of two or more spaces with single spaces, or

iostat | sed -e '/hdisk1/!d' -e 's/   */ /g'
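
On a machine with ten or more disks, a pattern such as /hdisk1/ also matches hdisk10, hdisk11 and so on. One possible way around that, sketched here under assumptions, is to anchor the address and include the trailing space. The sample_iostat function is a hypothetical stand-in for the real iostat, its hdisk10 line is made up, and this has not been checked on AIX.

```shell
# Hypothetical stand-in for iostat, so the sketch runs anywhere.
sample_iostat() {
  printf '%s\n' \
    'Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn' \
    'hdisk1           0.1       6.3       0.5   63344916  112429672' \
    'hdisk10          0.2      54.2       1.1   1073456960  436765896'
}

# Keep only the line that starts with "hdisk1 " (note the space),
# then squeeze every run of spaces to a single space.
only_hdisk1=$(sample_iostat | sed -e '/^hdisk1 /!d' -e 's/  */ /g')
printf '%s\n' "$only_hdisk1"
```

With the sample data this should print hdisk1 0.1 6.3 0.5 63344916 112429672 and nothing for hdisk10.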

edited May 10 '21 at 18:16

Kusalananda

333,661

answered Aug 19 '11 at 15:06

enzotib

51,661

AIX doesn't seem to support +, but removal of the []'s seems to have done the trick. – WernerCD Aug 19 '11 at 15:26
I tried using the sed -n version... what happens is I have another computer that has 10+ drives, so it starts matching 1, 10, 11, etc. I tried to add a space, /hdisk1 /, and it gave me a "not recognized function". What seems to work is >> iostat | grep "hdisk1 " | sed -e's/  */ /g' – WernerCD Aug 19 '11 at 15:33
You can also use \s for whitespace which then is iostat | sed -n '/hdisk1/s/\s\+/ /gp' – Timo Nov 13 '20 at 09:58
iostat | sed -n '/hdisk1/s/  */ /gp' Can you explain what this command does and why you use exactly two spaces together with the asterisk? I know 's/search/replace/', that g means global and that p is print. n suppresses everything except the lines explicitly printed. – Timo Nov 13 '20 at 10:01

Caleb · Answer 3 · 2011-08-19T15:47:36.090

22

Change your * operator to a + (written \+ in GNU sed). You are matching zero or more of the previous character, which succeeds at every position, because the empty string in front of every character that isn't a space is ... um ... zero instances of space. You need to match ONE or more. Actually, it would be better to match two or more.

The bracketed character class is also unnecessary for matching one character. You can just use:

s/  \+/ /g

...unless you want to match tabs or other kinds of spaces too, then the character class is a good idea.
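
The difference between "one or more" and "two or more" is easier to see when the replacement is a visible character. This is only an illustrative sketch: the sample text and the variable names are invented, and it uses the star forms rather than \+ so that it does not depend on GNU sed.

```shell
# Runs of one, two and three spaces.
text='a b  c   d'

# Two spaces and a star: matches one or more spaces.
one_or_more=$(printf '%s\n' "$text" | sed 's/  */_/g')

# Three spaces and a star: matches two or more, so single spaces are left alone.
two_or_more=$(printf '%s\n' "$text" | sed 's/   */_/g')

printf '%s\n' "$one_or_more" "$two_or_more"
```

The first result should be a_b_c_d and the second a b_c_d, which shows that the two-or-more form never rewrites a lone space.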

edited Aug 19 '11 at 15:47

answered Aug 19 '11 at 15:14

Caleb

70,105

AIX doesn't seem to support +. – WernerCD Aug 19 '11 at 15:26
@WernerCD: Then try s/   */ /g (that's with three spaces; the comment formatting tends to collapse them). The star operator makes the previous character optional, so if you want to match two or more with it, you need to match the first two yourself (two spaces), then add a third space and a star to make the third and following spaces optional. – Caleb Aug 19 '11 at 15:42
@userunknown: Actually I'm not mixing two things at all, everybody else is :) Replacing a single space with a single space is pointless, you only need to do this action on matches that have at least two sequential spaces. Two blanks and a plus or three blanks and a star are exactly what is needed. – Caleb Aug 19 '11 at 15:46
@userunknown: It's not that big a deal it's just a waste of a little bit of processing time and it throws off things like match counters. – Caleb Aug 19 '11 at 17:07

mikeserv · Answer 4 · 2015-06-02T19:30:55.733

You can always match the last occurrence in a sequence of anything like:

s/\(sequence\)*/\1/

And so you're on the right track, but rather than replacing the sequence with a space - replace it with its last occurrence - a single space. That way if a sequence of spaces is matched then the sequence is reduced to a single space, but if the null string is matched then the null string is replaced with itself - and no harm, no foul. So, for example:

sed 's/\( \)*/\1/g' <<\IN
# iostat
System configuration: lcpu=4 drives=8 paths=2 vdisks=0

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait
          0.2         31.8                9.7   4.9   82.9      2.5

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk9           0.2      54.2       1.1   1073456960  436765896
hdisk7           0.2      54.1       1.1   1070600212  435678280
hdisk8           0.0       0.0       0.0          0         0
hdisk6           0.0       0.0       0.0          0         0
hdisk1           0.1       6.3       0.5   63344916  112429672
hdisk0           0.1       5.0       0.2   40967838  98574444
cd0              0.0       0.0       0.0          0         0
hdiskpower1      0.2     108.3       2.3   2144057172  872444176

# iostat | grep hdisk1
hdisk1           0.1       6.3       0.5   63345700  112431123

IN

OUTPUT

# iostat
System configuration: lcpu=4 drives=8 paths=2 vdisks=0

tty: tin tout avg-cpu: % user % sys % idle % iowait
 0.2 31.8 9.7 4.9 82.9 2.5

Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk9 0.2 54.2 1.1 1073456960 436765896
hdisk7 0.2 54.1 1.1 1070600212 435678280
hdisk8 0.0 0.0 0.0 0 0
hdisk6 0.0 0.0 0.0 0 0
hdisk1 0.1 6.3 0.5 63344916 112429672
hdisk0 0.1 5.0 0.2 40967838 98574444
cd0 0.0 0.0 0.0 0 0
hdiskpower1 0.2 108.3 2.3 2144057172 872444176

# iostat | grep hdisk1
hdisk1 0.1 6.3 0.5 63345700 112431123

All that said, it is probably far better to avoid regexps completely in this situation and do instead:

tr -s \  <infile
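
Because tr -s leaves exactly one space between columns, cut can then pick a field by number. A minimal sketch, assuming a sample line copied from the question; the variable names line and kb_read are made up here.

```shell
# One line of iostat output from the question.
line='hdisk1           0.1       6.3       0.5   63345700  112431123'

# Squeeze runs of spaces, then take the fifth space-separated field (Kb_read).
kb_read=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 5)
printf '%s\n' "$kb_read"
```

For this line the fifth field is 63345700.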

+1 for the simplicity of the real answer, iostat | tr -s \ — Wildcard, Oct 23 '15 at 14:50
'tr -s \ ' is the same as 'tr -s " "'. Made me realise that a space can be passed as an argument by escaping it with \. I see that it can be used in shell scripts as well. Cool application. — randominstanceOfLivingThing, Jan 09 '18 at 16:52

rozcietrzewiacz · Answer 5 · 2011-08-19T19:46:39.683

5

Notice that you can also do what you attempt, that is

iostat | grep "hdisk1 " | sed -e's/  */ /g' | cut -d" " -f 5

by

iostat | while read disk tma kbps tps re wr; do [ "$disk" = "hdisk1" ] && echo "$re"; done

which might be especially useful if you later attempt to access other fields as well and/or calculate something - like this:

iostat | while read disk tma kbps tps re wr; do [ "$disk" = "hdisk1" ] && echo "$(( re/1024 )) Mb"; done
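
The same loop can be tried without an AIX box. In this sketch, sample_iostat is a hypothetical stand-in for iostat and its hdisk10 line is made up. An if statement replaces the && shortcut, so the loop does not end with a failing status when the last line is not the disk being looked for.

```shell
# Hypothetical stand-in for iostat.
sample_iostat() {
  printf '%s\n' \
    'Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn' \
    'hdisk1           0.1       6.3       0.5   63344916  112429672' \
    'hdisk10          0.0       0.0       0.0          0         0'
}

# read splits each line on whitespace, so runs of spaces need no squeezing.
read_mb=$(sample_iostat | while read -r disk tma kbps tps re wr; do
  if [ "$disk" = "hdisk1" ]; then
    # Kb_read divided by 1024, using POSIX integer arithmetic.
    echo "$(( re / 1024 )) MB"
  fi
done)
printf '%s\n' "$read_mb"
```

With the sample numbers, 63344916 / 1024 is truncated to 61860, so the output should be 61860 MB.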

edited Aug 19 '11 at 19:46

answered Aug 19 '11 at 16:22

rozcietrzewiacz

39,269

Very nice. First version works. My AIX boxes don't seem to like the second one. All three boxes output: "$[ re/1024 ] Mb". The monitoring tool I'm using has conversions for reports so it isn't a "needed" thing for me, but I like it. – WernerCD Aug 19 '11 at 16:59
@enzotib Thanks for correcting the while. – rozcietrzewiacz Aug 19 '11 at 19:42
@WernerCD Ah, this $[ .. ] is probably available in recent versions of bash (maybe zsh too). I updated the answer to a more portable $(( .. )) instead. – rozcietrzewiacz Aug 19 '11 at 19:47
That did the trick. I'll have to look that up. Snazzy. – WernerCD Aug 19 '11 at 22:14

masukomi · Answer 6 · 2021-05-10T16:00:02.243

Assuming the problem is:

I deleted a bunch of things without using git rm. How do I delete them all from git without manually typing git rm <file path> again and again?

git status | grep "deleted: " | tr "\t" " " | sed -e "s/ *deleted: */git rm /" | bash

Note: the tr in the pipe is used because sed on macOS doesn't recognize the \t escape, so you need to get rid of the tab first, or install GNU sed and use that. The line output by git (at the time of writing) starts with \tdeleted: (a tab, then "deleted:"). Sadly, tz doesn't appear to come with macOS either. You'll want to install that with Homebrew because it really helps out in a lot of situations.

Brad Parks · Answer 7 · 2021-02-06T02:50:55.033

0

You can use the following script to convert multiple spaces to a single space, a TAB or any other string:

$ ls | compress_spaces.sh       # converts multiple spaces to one
$ ls | compress_spaces.sh TAB   # converts multiple spaces to a single tab character
$ ls | compress_spaces.sh TEST  # converts multiple spaces to the phrase TEST
$ compress_spaces.sh help       # show the help for this command

compress_spaces.sh

#!/usr/bin/env bash
function show_help()
{
  ME=$(basename "$0")
  IT=$(cat <<EOF
usage: $ME {REPLACE_WITH}
NOTE: If you pass in TAB, then multiple spaces are replaced with a TAB character
no args -> multiple spaces replaced with a single space
  TAB     -> multiple spaces replaced with a single tab character
  TEST    -> multiple spaces replaced with the phrase "TEST"
$ME 
EOF
)
  echo "$IT"
  echo
  exit
}
if [ "$1" == "help" ]
then
  show_help
fi
# Show help if we're not getting data from stdin
if [ -t 0 ]; then
  show_help
fi
REPLACE_WITH=${1:-' '}
if [ "$REPLACE_WITH" == "tab" ]
then
  REPLACE_WITH=$'\t'
fi
if [ "$REPLACE_WITH" == "TAB" ]
then
  REPLACE_WITH=$'\t'
fi
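# \{1,\} is the POSIX interval for "one or more"; the p flag is dropped because, without -n, it would print every changed line twice
sed "s/ \{1,\}/$REPLACE_WITH/g"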
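
One caveat with interpolating the replacement straight into the sed expression is that a value containing /, & or \ would be interpreted by sed rather than inserted literally. A possible workaround, sketched here as a hypothetical helper named squeeze_to (not part of the script above), is to escape those characters first.

```shell
# Hypothetical helper: squeeze runs of spaces on stdin into the given string.
squeeze_to() {
  # Default to a single space when no argument is given.
  replacement=${1:-' '}
  # Backslash-escape \, / and & so sed treats them as literal text.
  escaped=$(printf '%s' "$replacement" | sed 's|[\\/&]|\\&|g')
  sed "s/ \{1,\}/$escaped/g"
}

printf 'a   b  c\n' | squeeze_to        # prints: a b c
printf 'a   b  c\n' | squeeze_to '/'    # prints: a/b/c
```

The first sed call uses | as its delimiter so that / can appear inside the bracket expression without ambiguity.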

edited Feb 06 '21 at 02:50

answered Oct 28 '16 at 19:00

Brad Parks

1,669

Is this for a shell other than Bash? compress_spaces.sh: line 3: unexpected EOF while looking for matching `)'; compress_spaces.sh: line 40: syntax error: unexpected end of file – RobbieTheK Feb 05 '21 at 17:35
updated it to hopefully fix it! – Brad Parks Feb 06 '21 at 02:47
how do you pass a file name to this, @brad-parks? – RobbieTheK Feb 10 '21 at 15:22
@RobbieTheK - you'd have to do something like, cat FILENAME | compress_spaces.sh – Brad Parks Feb 10 '21 at 18:00
