How to count spaces in text?

Question

In the following example, there are 4 spaces before inet.

wolf@linux:~$ ip address show eth0 | grep 'inet '
    inet 10.10.10.10/24 brd 10.10.10.255 scope global dynamic eth0
wolf@linux:~$

How do I count the number of spaces, as in this example?

This sample is easy, as it only has 4 spaces.

What if it has more than that? Hundreds, thousands?

Is there an easy way to do this?

"spaces" where? At the beginning of the line, as you noticed previously, or in a particular line, or in all of the output? Because there are more than 4 spaces in the line of output that you showed. — Jeff Schaller, Aug 27 '20 at 15:10

score 8 · Accepted Answer · edited Aug 27 '20 at 14:48

You can use tr to delete everything that’s not the character you’re interested in, then wc to count the remaining characters:

ip address show eth0 | grep 'inet ' | tr -d -c ' ' | wc -m

This scales well to large amounts of text; tr is very efficient.

Note however that with some implementations of tr including GNU tr, that only works properly for single-byte characters (such as the space character).
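A minimal sketch for checking the tr/wc pipeline on input whose count is known in advance, including generated input with a thousand spaces (the "hundreds, thousands?" case). The helper name count_spaces is hypothetical; the arithmetic expansion only strips the padding that some wc implementations put around the number.

```shell
# count_spaces: hypothetical helper wrapping the tr | wc pipeline above.
count_spaces() {
  # tr -d -c ' ' deletes every character that is not a space (newlines too),
  # so wc -m counts only the spaces; $(( )) strips any padding around the number.
  echo $(( $(tr -d -c ' ' | wc -m) ))
}

# The line from the question: 4 leading spaces plus 7 between its 8 words.
printf '    inet 10.10.10.10/24 brd 10.10.10.255 scope global dynamic eth0\n' | count_spaces

# Generated input: 1000 lines of "a b", so 1000 spaces in total.
awk 'BEGIN { for (i = 0; i < 1000; i++) print "a b" }' | count_spaces
```

This prints 11 and then 1000. Note that the whole line from the question contains 11 spaces, not 4, because every space is counted, not just the leading ones.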

If you only want to count leading spaces, you’ll need something a little more powerful than tr:

ip address show eth0 | grep 'inet ' | sed 's/[^ ].*$//' | tr -d '\n' | wc -m

This deletes every part of each line which is not leading space, then deletes newlines and counts.
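A quick way to see what the leading-space pipeline counts, on a made-up three-line input (count_leading_spaces is a hypothetical name). Only the spaces before the first non-space character of each line are included, and the amounts for all lines are added together.

```shell
# count_leading_spaces: hypothetical wrapper around the sed | tr | wc pipeline.
count_leading_spaces() {
  # sed drops everything from the first non-space character to the end of
  # the line, tr removes the newlines, and wc -m counts what remains.
  echo $(( $(sed 's/[^ ].*$//' | tr -d '\n' | wc -m) ))
}

# Leading spaces: 4 + 2 + 0 = 6; the spaces after each first word are ignored.
printf '    inet a b\n  inet6 c d\nlo e\n' | count_leading_spaces
```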

See How to count the number of a specific character in each line? if you’re interested in counts per line.

edited Aug 27 '20 at 14:48 by Stéphane Chazelas

answered Aug 27 '20 at 14:41 by Stephen Kitt

Thanks @Stephen, it works. I notice that tr can also be used to visualize the empty space, something like tr ' ' '#'. Is this the best way to visualize white space? – Wolf Aug 27 '20 at 14:47

That’s a good way, whether it’s the best depends on circumstances (in particular, on whether # is already present). – Stephen Kitt Aug 27 '20 at 14:50
Thanks for that link. Interesting topic – Wolf Aug 27 '20 at 14:52
Would grep -c ... not work, with a suitable space pattern? The tr/wc is nice though – D. Ben Knoble Aug 28 '20 at 17:30
@D.BenKnoble grep -c counts matching lines, so you’d have to transform the input first. – Stephen Kitt Aug 28 '20 at 18:50

Stéphane Chazelas · Answer 2 · 2020-08-27T15:12:23.860

To count the number of space characters at the start of each line, you could do:

awk -F '[^ ].*' '{print length($1)}'

This prints the length (in number of characters) of the first field, where fields are separated by any sequence of characters starting with a non-space.
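The same awk command on a few made-up lines, including a line made only of spaces and an empty line. When the separator regex never matches, $1 is the whole line, so an all-space line reports its full length; an empty line has no fields and reports 0.

```shell
# '[^ ].*' matches from the first non-space character to the end of the
# line, so the first field is exactly the run of leading spaces.
# Sample lines: 4 leading spaces, 2, none, 3 spaces only, and an empty line.
printf '    inet\n  inet6\nlo\n   \n\n' | awk -F '[^ ].*' '{print length($1)}'
```

This prints 4, 2, 0, 3 and 0, one per line.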

To report the maximum amount of whitespace found at the start of any line of the input (the maximum indentation), with GNU wc:

sed 's/[^[:blank:]].*//' | wc -L

This reports the amount of whitespace in terms of display width, on a display device where tab stops are 8 columns apart:

$ printf '\tfoo\n' | sed 's/[^[:blank:]].*//' | wc -L
8

$ printf '\u3000foo\n' | sed 's/[^[:blank:]].*//' | wc -L
2

The U+3000 character (the ideographic space character, classified as blank in my locale) is a double-width character encoded on 3 bytes in UTF-8.

If you'd rather have that maximum length reported as a number of characters:

sed 's/[^[:blank:]].*//;s/./x/g' | wc -L

(s/./x/g converts every character on each line to x, which we know has a display width of 1.)

Or in terms of number of bytes:

sed 's/[^[:blank:]].*//' |
  LC_ALL=C tr -c '\n' '[x*]' | # convert each byte other than newline to x
  wc -L
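Where GNU wc is not available, a rough substitute (not part of this answer, and measuring characters rather than display width) is to let awk's match() measure the leading run of spaces and tabs on each line and keep the maximum. Here a tab counts as 1, whereas wc -L counts it up to the next 8-column tab stop; max_indent is a hypothetical name.

```shell
# max_indent: hypothetical helper; counts leading spaces/tabs as characters.
max_indent() {
  awk '
    {
      match($0, /^[ \t]*/)           # RLENGTH = length of the leading blank run
      if (RLENGTH > max) max = RLENGTH
    }
    END { print max + 0 }            # "+ 0" makes empty input print 0
  '
}

# Leading runs: 2 spaces, 1 tab, 6 spaces, so the maximum is 6.
printf '  a\n\tb\n      c\n' | max_indent
```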

It's cool to find out so many things can be accomplished with awk. Thanks @Stéphane Chazelas — Wolf, Aug 27 '20 at 14:56

Quasímodo · Answer 3 · 2020-08-27T16:06:50.233

Print the number of leading spaces:
```
awk '{print match($0,/[^ ]|$/)-1}' file
```
match($0,/[^ ]|$/) matches the first non-space ([^ ]) or the end-of-line ($) and returns its position.
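Running that one-liner on a few sample lines shows why the |$ alternative is there: on a line containing only spaces, [^ ] never matches, so match() falls back to the end of the line instead of returning 0 (which would print -1). The second command drops |$ for comparison.

```shell
# Sample lines: 4 leading spaces, none, and a line of 3 spaces only.
printf '    inet\nlo\n   \n' | awk '{print match($0,/[^ ]|$/)-1}'

# Without "|$", the all-space line gives match() == 0 and prints -1.
printf '    inet\nlo\n   \n' | awk '{print match($0,/[^ ]/)-1}'
```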
Print the number of spaces:
```
awk -F '[ ]' '{print (NF?NF-1:0)}' file
```
-F '[ ]' sets the field separator to space. NF is the number of fields. The ternary expression means: "If NF is not 0, print NF-1, else print 0". This is because NF is 0 if the line is empty.
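On made-up input, the NF-based command counts every space on each line, leading or not, and reports 0 for an empty line. The bracket expression matters: a plain -F ' ' is special in awk and means "split on runs of blanks, ignoring leading ones", which would undercount.

```shell
# -F '[ ]' makes every single space a separator, so n spaces give n+1 fields.
# Sample lines: 5 spaces in total, an empty line, and 3 spaces only.
printf '    inet 10.10.10.10/24\n\n   \n' | awk -F '[ ]' '{print (NF?NF-1:0)}'
```

This prints 5, 0 and 3.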

edited Aug 27 '20 at 16:06

answered Aug 27 '20 at 14:54 by Quasímodo

Or print NF-!!NF. See also print gsub(" "," "). – Stéphane Chazelas Aug 27 '20 at 16:12
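The two variants from this comment, run on a small made-up input, should print the same per-line counts as the ternary version: !!NF is 1 when the line has fields and 0 otherwise, and gsub() returns how many substitutions it made, so replacing each space with itself counts them.

```shell
# NF-!!NF: NF-1 for non-empty lines, 0 for empty ones (same as NF?NF-1:0).
printf '    inet 10.10.10.10/24\n\n   \n' | awk -F '[ ]' '{print NF-!!NF}'

# gsub(" ", " ") replaces every space with a space and returns the count.
printf '    inet 10.10.10.10/24\n\n   \n' | awk '{print gsub(" ", " ")}'
```

Both commands print 5, 0 and 3.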

score 2 · Answer 4 · answered Aug 27 '20 at 14:50

It reads like what you really want is how to delete leading white space.

There are many ways to do that. Assuming you want to do it in bash, I found this at

https://www.cyberciti.biz/tips/delete-leading-spaces-from-front-of-each-word.html

echo "     This is a test"

To remove the leading white space from the output:

echo "     This is a test" | sed -e 's/^[ \t]*//'

So in your case you could do:

ip address show eth0 | grep 'inet ' | sed -e 's/^[ \t]*//'

Also check out How do I trim leading and trailing whitespace from each line of some output?

answered Aug 27 '20 at 14:50 by ron

Thanks @ron. I appreciate the answer and tips given. – Wolf Aug 27 '20 at 14:55

In standard sed implementations, [ \t] matches on either SPC, backslash or t. GNU sed however only does it when POSIXLY_CORRECT is in the environment. sed 's/^[[:blank:]]*//' would be standard. It would remove spaces and tabs and all other characters classified as blank in the locale. – Stéphane Chazelas Aug 27 '20 at 14:59
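A sketch of the standard form suggested in this comment, applied to a line that mixes spaces and a tab. Comparing the lengths before and after trimming also gives the number of leading blanks, which ties back to the original question; the variable names line and trimmed are just for illustration.

```shell
# [[:blank:]] matches spaces, tabs and any other character the locale
# classifies as blank, in any POSIX sed.
printf '  \t  This is a test\n' | sed 's/^[[:blank:]]*//'

# The length difference before/after trimming is the number of leading blanks.
line='    inet 10.10.10.10/24'
trimmed=$(printf '%s\n' "$line" | sed 's/^[[:blank:]]*//')
echo $(( ${#line} - ${#trimmed} ))
```

This prints "This is a test" and then 4.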

score 0 · Answer 5 · answered Aug 27 '20 at 18:26

I have taken the example below:

echo "      praveen" | grep -o "^ *" | awk '{print length($0)}'

Output:

6
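One caveat, assuming GNU grep: its -o option prints only non-empty matches, so a line with no leading spaces produces no output line at all rather than 0. On this made-up input the grep pipeline prints only 6 and 2, while the awk match() approach from another answer prints a count for every line.

```shell
# "kumar" has no leading spaces, so grep -o outputs nothing for it.
printf '      praveen\nkumar\n  bs\n' | grep -o '^ *' | awk '{print length($0)}'

# awk alone reports every line, including the 0 for "kumar".
printf '      praveen\nkumar\n  bs\n' | awk '{print match($0,/[^ ]|$/)-1}'
```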

Python

>>> a = "      praveen"
>>> import re
>>> k = re.compile(r'^ *')
>>> m = re.search(k, a)
>>> print(len(m.group()))
6

answered Aug 27 '20 at 18:26 by Praveen Kumar BS

Rakesh Sharma · Answer 6 · 2020-08-28T08:46:45.287

$ ip address show eth0 \
| grep -oP '^\h*(?=inet\h)' \
| tr -d '\n' \
| wc -m;

This uses GNU grep in PCRE mode (-P) to look for any leading horizontal whitespace (\h, akin to [[:blank:]]) followed by inet and another blank. Since grep -o prints each match followed by a newline, tr deletes the newline before we feed the result to wc to get a character count.

Using GNU awk with the FPAT variable set to a run of whitespace:

$ ... |
 gawk -v FPAT='\\s*' '$0=length($1)""'

Using a Python list comprehension, we can also process grep's output to get the count:

$ ... | python3 -c 'import sys;print(*[len(l)-len(l.lstrip()) for l in sys.stdin],sep="\n")'

We can also feed the grep output to perl.

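$ ..... |
  perl -l -F'\H' -pE '$_=$F[0]=~y///c'

Here we split each record on non-horizontal-whitespace characters (\H), so the first field $F[0] holds the leading horizontal whitespace. y///c on that field counts its characters (with an empty replacement list it leaves the string unchanged) and returns the count. We assign the count to the record and let the -p option print it; -l chomps each input line and adds the trailing newline on output.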