array1=($(find /etc -mindepth 1 -maxdepth 1 -type d))
This is wrong, as it performs split+glob on the output of find to get the list (and the output of find without -print0 can't be reliably post-processed anyway, since file names may themselves contain newline characters). The correct syntax in bash (4.4+) would be:
readarray -td '' array1 < <(find /etc -mindepth 1 -maxdepth 1 -type d -print0)
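To see why the NUL-delimited form matters, you can try both forms on a scratch directory whose subdirectory names contain a space and a newline. This is a minimal sketch, assuming bash 4.4+ (for readarray -d) and a find that supports -print0 (GNU and BSD find do); the temporary directory and its names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Scratch directory with awkward (hypothetical) subdirectory names.
tmp=$(mktemp -d)
mkdir "$tmp/plain" "$tmp/with space" "$tmp/with
newline"

# Wrong: split+glob on find's output breaks names on space and newline.
wrong=($(find "$tmp" -mindepth 1 -maxdepth 1 -type d))

# Right: NUL-delimited records; -t strips the trailing NUL from each element.
readarray -td '' right < <(find "$tmp" -mindepth 1 -maxdepth 1 -type d -print0)

echo "split+glob: ${#wrong[@]} elements"
echo "readarray:  ${#right[@]} elements"

rm -rf -- "$tmp"
```

The readarray version yields one element per directory, while the split+glob version yields more, because "with space" and "with<newline>newline" are each split in two.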
Or in zsh:
array1=(/etc/*(ND/))
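For comparison, here is a rough bash equivalent of that zsh glob, sketched under the assumption that you want the same set as find's -type d: real directories only (not symlinks to directories), hidden ones included, and an empty array when nothing matches. nullglob and dotglob stand in for the N and D qualifiers, and an explicit test replaces the / qualifier, since a */ glob in bash would also match symlinks to directories.

```shell
#!/usr/bin/env bash
shopt -s nullglob dotglob   # like zsh's N (nullglob) and D (dotglob) qualifiers
array1=()
for f in /etc/*; do
  # like zsh's / qualifier: directories, but not symlinks pointing to them
  [[ -d $f && ! -L $f ]] && array1+=("$f")
done
shopt -u nullglob dotglob   # assumes both options were off beforehand

printf '%s\n' "${array1[@]}"
```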
In echo $var | wc -c, you're counting the number of bytes in the output of echo. That's not the number of bytes in $var, for several reasons:
- you forgot to quote $var, so it's subject to split+glob;
- echo does some transformations: some implementations expand \x escape sequences, some treat values like -n as options;
- finally, echo appends a newline character to the output (-n can skip that with some echo implementations).
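To make those pitfalls concrete, this small bash sketch measures a few sample values both with echo $var | wc -c and with printf %s "$var" | wc -c. The values are arbitrary examples, and the echo column reflects bash's builtin echo, which by default leaves \t alone; another shell's echo may give yet other numbers. The $(( )) wrapper just strips the leading blanks some wc implementations print.

```shell
#!/usr/bin/env bash
vals=('' 'a   b' '-n' 'a\tb')
echo_counts=() printf_counts=()
for var in "${vals[@]}"; do
  # Unquoted $var: split+glob, echo may take it as an option, and adds "\n".
  echo_counts+=( $(( $(echo $var | wc -c) )) )
  # printf %s writes the value verbatim, with no trailing newline.
  printf_counts+=( $(( $(printf %s "$var" | wc -c) )) )
  printf '%q -> echo: %d, printf: %d\n' "$var" "${echo_counts[-1]}" "${printf_counts[-1]}"
done
```

The empty value gives 1 with echo (just the newline), 'a   b' loses two spaces to word splitting, and -n is swallowed as an option, giving 0.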
Here, to use wc to count the bytes, you'd do:
printf %s "$var" | wc -c
In bash, ${#var} expands to the number of characters in the variable¹. For it to be the number of bytes, you can fix the locale to C:
LC_ALL=C
echo "${#var}"
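Setting LC_ALL=C on its own line changes the locale for the rest of the script. One way to confine it is a function whose body is a ( ... ) subshell. byte_length below is a hypothetical helper, not a bash builtin; this is a minimal sketch assuming bash, cross-checked against the printf %s | wc -c approach.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the length of $1 in bytes.
# The ( ... ) body runs in a subshell, so LC_ALL=C doesn't leak to the caller.
byte_length() (
  LC_ALL=C
  printf '%s\n' "${#1}"
)

var=$(printf '\342\202\25420')   # "€20" encoded in UTF-8: 3 + 2 bytes
echo "bytes via \${#var}: $(byte_length "$var")"
echo "bytes via wc -c:    $(( $(printf %s "$var" | wc -c) ))"
```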
To get the sum of the lengths in bytes of all the elements of an array, you could concatenate them and then get the length of the resulting string:
printf %s "${array[@]}" | wc -c
Or:
IFS=
concat="${array[*]}"
LC_ALL=C
echo "${#concat}"
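Both IFS= and LC_ALL=C above are global settings. A minimal sketch that keeps them local to a subshell function, assuming bash; total_bytes is a hypothetical name, and the sample array mirrors the zsh example further down (€20 in UTF-8, $25, and ¤20 as a raw 0xA4 byte), built with printf octal escapes so it doesn't depend on the current locale.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total size in bytes of all arguments, concatenated.
# With IFS empty, "$*" joins the arguments with no separator.
# Both settings live only in the ( ... ) subshell.
total_bytes() (
  IFS=
  LC_ALL=C
  concat="$*"
  printf '%s\n' "${#concat}"
)

array=( "$(printf '\342\202\25420')" '$25' "$(printf '\24420')" )
echo "via \${#concat}: $(total_bytes "${array[@]}")"
echo "via wc -c:      $(( $(printf %s "${array[@]}" | wc -c) ))"
```

Both lines should report 11 bytes: 5 + 3 + 3.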
With zsh, you could do:
() { set -o localoptions +o multibyte
echo ${#${(j[])array}}
}
Here, the j[sep] parameter expansion flag is used to join the elements of the array instead of using "${array[*]}", which uses the global $IFS. Instead of fixing the locale to C, we can just disable the multibyte option to get character ≍ byte (which we do here locally, in an anonymous function).
Note that to see the difference between byte and character, you need a locale that uses a multibyte encoding as its charmap (such as UTF-8, GB18030, BIG5...) and characters encoded on more than one byte. The letter a is typically encoded on one byte, so you won't see a difference with it. € is encoded on 3 bytes in UTF-8 and on one byte in ISO8859-15, for instance.
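You can check those byte counts without switching locales by feeding the raw byte sequences to wc -c. This sketch hard-codes the encodings as printf octal escapes: a is 0x61, € is E2 82 AC in UTF-8 and the single byte A4 in ISO8859-15; od -An -tx1 then dumps the UTF-8 sequence in hex.

```shell
#!/usr/bin/env bash
# Raw bytes, so the counts don't depend on the current locale.
a_bytes=$(( $(printf 'a' | wc -c) ))
utf8_bytes=$(( $(printf '\342\202\254' | wc -c) ))   # € in UTF-8
latin9_bytes=$(( $(printf '\244' | wc -c) ))         # € in ISO8859-15

echo "a:               $a_bytes byte"
echo "€ in UTF-8:      $utf8_bytes bytes"
echo "€ in ISO8859-15: $latin9_bytes byte"

# One hex pair per byte of the UTF-8 encoding of €.
printf '\342\202\254' | od -An -tx1
```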
An example (here from zsh):
$ a=($'\xe2\x82\xac20' '$25' $'\xa420')
$ locale charmap
UTF-8
$ typeset -p a
typeset -a a=( €20 '$25' $'\M-$20' )
$ printf %s "${a[@]}" | wc -c
11
$ printf %s "${a[@]}" | wc -m
8
$ echo ${#${(j[])a}}
9
$ (){set -o localoptions +o multibyte; echo ${#${(j[])a}}}
11
And if I switch to a locale where the charmap is ISO8859-15:
$ locale charmap
ISO-8859-15
$ a=($'\xe2\x82\xac20' '$25' $'\xa420')
$ typeset -p a
typeset -a a=( â¬20 '$25' €20 )
$ printf %s "${a[@]}" | wc -c
11
$ printf %s "${a[@]}" | wc -m
11
$ echo ${#${(j[])a}}
11
$ (){set -o localoptions +o multibyte; echo ${#${(j[])a}}}
11
ISO8859-15 is a single-byte character encoding, so character ≍ byte there.
¹ Similar to what wc -m does, except that bash (or zsh) will also count bytes that can't be decoded into a character as one character each.
With echo, you're adding a newline. If you want the actual size in bytes, use echo -n to avoid adding a newline. This is why an "empty" variable gives 1 when you use echo and a single character gives 2. – frabjous May 09 '22 at 22:51