array1=($(find /etc -mindepth 1 -maxdepth 1 -type d))
is wrong, as it performs split+glob on the output of find to get the list (and the output of find without -print0 is not post-processable anyway). The correct syntax in bash (4.4+) would be:
readarray -td '' array1 < <(find /etc -mindepth 1 -maxdepth 1 -type d -print0)
Or in zsh:
array1=(/etc/*(ND/))
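To see the difference without touching /etc, here is a minimal sketch that assumes bash 4.4+ and a find that supports -print0. The scratch directory and the directory names in it are made up for the demonstration.

```shell
# Hypothetical scratch directory; the names below are invented for the demo.
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/plain" "$tmp/with space" "$tmp/with"$'\n'"newline"

# Unquoted command substitution: the output is split on $IFS characters
# and each resulting word is subject to globbing.
broken=($(find "$tmp" -mindepth 1 -maxdepth 1 -type d))

# NUL-delimited records: one array element per directory, whatever its name.
readarray -td '' fixed < <(find "$tmp" -mindepth 1 -maxdepth 1 -type d -print0)

printf 'split+glob: %d elements\n' "${#broken[@]}"
printf 'readarray:  %d elements\n' "${#fixed[@]}"

rm -rf -- "$tmp"
```

The readarray variant should report 3 elements, one per directory, while the split+glob variant reports more because the names containing a space or a newline are broken into pieces.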
In echo $var | wc -c, you're counting the number of bytes in the output of echo. That's not the number of bytes in $var for several reasons:
- you forgot to quote $var, so it's subject to split+glob
- echo does some transformations: some implementations expand \x escape sequences, some treat values like -n as options
- finally, echo appends a newline character to the output (-n can skip that with some echo implementations).
Here, to use wc to count the bytes, you'd do:
printf %s "$var" | wc -c
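A small sketch comparing the two approaches on a few sample values (the values are arbitrary, and the behaviour shown for -n assumes bash's builtin echo in its default configuration):

```shell
# Compare the byte counts reported via echo and via printf %s.
for var in '' 'a' '-n'; do
  with_echo=$(echo "$var" | wc -c)        # includes echo's trailing newline
  with_printf=$(printf %s "$var" | wc -c) # only the bytes of $var
  printf 'var=<%s> echo: %d printf: %d\n' "$var" "$with_echo" "$with_printf"
done
```

For the empty string and for a, echo reports one byte more than printf because of the newline. For -n, bash's echo treats the value as an option and outputs nothing, so it reports 0 where printf reports 2.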
In bash, ${#var} expands to the number of characters in the variable¹. For it to be the number of bytes, you can fix the locale to C:
LC_ALL=C
echo "${#var}"
To get the sum of the lengths in bytes of all the elements of an array, you could concatenate them and then get the length of the resulting string:
printf %s "${array[@]}" | wc -c
Or:
IFS=
concat="${array[*]}"
LC_ALL=C
echo "${#concat}"
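Since the snippet above changes IFS and LC_ALL for the rest of the shell session, one possible way to contain that is to wrap it in a function whose body runs in a subshell. This is only a sketch: array_bytes is a hypothetical helper name, and the sample array reuses byte sequences that are valid in some encodings and invalid in others.

```shell
# array_bytes is a hypothetical helper name, not a bash builtin.
# The body is in parentheses, so it runs in a subshell and the
# IFS and LC_ALL assignments do not leak into the caller.
array_bytes() (
  IFS=          # join "$*" with no separator
  LC_ALL=C      # make ${#...} count bytes rather than characters
  concat="$*"
  printf '%s\n' "${#concat}"
)

array=($'\xe2\x82\xac20' '$25' $'\xa420')
array_bytes "${array[@]}"
```

With that sample array, the function should print 11, the same total that printf %s "${array[@]}" | wc -c gives, regardless of the caller's locale.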
With zsh, you could do:
() { set -o localoptions +o multibyte
echo ${#${(j[])array}}
}
Where the j[sep] parameter expansion flag is used to join the elements of the array instead of using "${array[*]}", which uses the global $IFS. Instead of fixing the locale to C, we can just disable the multibyte option to get character ≍ byte (which we do here locally in an anonymous function).
Note that to see the difference between byte and character, you need a locale that uses a multibyte encoding as its charmap (such as UTF-8, GB18030, BIG5...) and characters encoded on more than one byte. a is typically encoded on one byte, so you won't see a difference. € is encoded on 3 bytes in UTF-8 and one byte in ISO8859-15, for instance.
An example (here from zsh):
$ a=($'\xe2\x82\xac20' '$25' $'\xa420')
$ locale charmap
UTF-8
$ typeset -p a
typeset -a a=( €20 '$25' $'\M-$20' )
$ printf %s "${a[@]}" | wc -c
11
$ printf %s "${a[@]}" | wc -m
8
$ echo ${#${(j[])a}}
9
$ (){set -o localoptions +o multibyte; echo ${#${(j[])a}}}
11
And if I switch to a locale where the charmap is ISO8859-15:
$ locale charmap
ISO-8859-15
$ a=($'\xe2\x82\xac20' '$25' $'\xa420')
$ typeset -p a
typeset -a a=( â¬20 '$25' €20 )
$ printf %s "${a[@]}" | wc -c
11
$ printf %s "${a[@]}" | wc -m
11
$ echo ${#${(j[])a}}
11
$ (){set -o localoptions +o multibyte; echo ${#${(j[])a}}}
11
ISO8859-15 is a single byte character encoding, so character ≍ byte there.
¹ similar to what wc -m does, except that bash (or zsh) will also count bytes that can't be decoded into a character as one character each.
With echo, you're adding a newline. If you want the actual size in bytes, use echo -n to avoid adding a newline. This is why an "empty" variable gives 1 when you use echo and a single character gives 2. – frabjous May 09 '22 at 22:51