Bourne/POSIX-like shells have a split+glob operator, and it's invoked every time you leave a parameter expansion ($var, $-...), command substitution ($(...)), or arithmetic expansion ($((...))) unquoted in list context.
In fact, you invoked it by mistake when you did for name in ${array[@]} instead of for name in "${array[@]}". (Beware that invoking that operator by mistake like that is the source of many bugs and security vulnerabilities.)
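To illustrate, here is a minimal POSIX sh sketch that uses the positional parameters in place of an array (the show_unquoted and show_quoted function names are made up for this example). Left unquoted, $@ goes through split+glob, so an argument containing a space is broken in two:

```shell
# Hypothetical helpers: print each loop item between <...>.
# Unquoted $@: every argument is split on $IFS and globbed.
show_unquoted() {
  for name in $@; do printf '<%s>' "$name"; done
  echo
}
# Quoted "$@": every argument is passed through as-is.
show_quoted() {
  for name in "$@"; do printf '<%s>' "$name"; done
  echo
}
show_unquoted 'a b' c   # <a><b><c>
show_quoted 'a b' c     # <a b><c>
```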
That operator is configured with the $IFS special parameter, which tells it what characters to split on (though beware that space, tab and newline receive a special treatment there), and with the -f option, which disables (set -f) or enables (set +f) the glob part.
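A small sketch of the effect of the glob part, assuming a system with the (non-POSIX but widely available) mktemp utility; glob_demo is a hypothetical name. The function body is a subshell, so the IFS, set -f and cd changes stay local to it:

```shell
# Hypothetical demo: the same unquoted expansion with globbing on, then off.
glob_demo() (
  dir=$(mktemp -d) || exit 1   # assumption: mktemp -d is available
  cd "$dir" || exit 1
  : > f1; : > f2               # two files for * to match
  string='x_*'
  IFS=_                        # split on _
  set +f                       # glob part enabled (the default)
  printf '<%s>' $string; echo  # <x><f1><f2>
  set -f                       # glob part disabled
  printf '<%s>' $string; echo  # <x><*>
  cd / && rm -rf "$dir"        # clean up the temporary directory
)
glob_demo
```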
Also note that while the S in $IFS originally stood for Separator (in the Bourne shell, where $IFS comes from), in POSIX shells the characters in $IFS should rather be seen as delimiters or terminators (see below for an example).
So to split on _:
string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
array=($string) # invoke the split+glob operator
for i in "${array[@]}"; do # loop over the array elements.
  printf '%s\n' "$i" # e.g. print each element on its own line
done
To see the distinction between separator and delimiter, try on:
string='var1_var2_'
That will split it into var1 and var2 only (no extra empty element).
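A hypothetical ifs_split helper (POSIX sh, with the body run in a subshell so IFS and set -f don't leak to the caller) makes the delimiter behavior visible by printing each field between <...>. A trailing _ only terminates the last field, while two consecutive _ still delimit an empty field:

```shell
# Hypothetical helper: split $2 on the characters of $1 with the
# split+glob operator and show each resulting field as <...>.
ifs_split() (
  IFS=$1              # split on the characters of $1 (local: subshell)
  set -f              # disable the glob part
  set -- $2           # invoke split+glob on the unquoted $2
  [ "$#" -eq 0 ] || printf '<%s>' "$@"
  echo
)
ifs_split _ 'var1_var2_'   # <var1><var2>
ifs_split _ 'var1__var2'   # <var1><><var2>
```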
So, to make it similar to JavaScript's split(), you'd need an extra step:
string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
temp=${string}_ # add an extra delimiter
array=($temp) # invoke the split+glob operator
(Note that it would split an empty $string into 1 (not 0) element, like JavaScript's split().)
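The same extra-delimiter trick can be wrapped in a function. This is a sketch, assuming a single, non-whitespace separator character (whitespace separators get a special treatment, and multi-character separators are not supported); js_split is a made-up name:

```shell
# Hypothetical helper mimicking JavaScript's str.split(sep) for a single
# non-whitespace separator character: append one extra delimiter, then
# let the split+glob operator do the work.
js_split() (
  sep=$1 string=$2
  IFS=$sep               # local to this subshell
  set -f                 # disable the glob part
  set -- $string$sep     # the extra delimiter keeps a trailing empty field
  printf '<%s>' "$@"
  echo
)
js_split _ 'var1_var2_var3'   # <var1><var2><var3>
js_split _ 'var1_var2_'       # <var1><var2><>
js_split _ ''                 # <> (one empty element)
```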
To see the special treatment that tab, space and newline receive, compare:
IFS=' '; string=' var1 var2 '
(where you get var1 and var2) with
IFS='_'; string='_var1__var2__'
where you get: '', var1, '', var2, ''.
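A quick way to run that comparison in POSIX sh, with a hypothetical fields helper that prints each argument between <...>:

```shell
# Hypothetical helper: show each argument as <...>.
fields() { printf '<%s>' "$@"; echo; }

set -f                      # disable the glob part for the demo

IFS=' '
string=' var1 var2 '
fields $string              # <var1><var2>

IFS='_'
string='_var1__var2__'
fields $string              # <><var1><><var2><>

unset IFS                   # back to the default splitting behavior
set +f
```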
Note that the zsh shell doesn't invoke that split+glob operator implicitly like that unless in sh or ksh emulation. Outside of those emulations, you have to invoke it explicitly: $=string for the split part, $~string for the glob part ($=~string for both). It also has a split operator where you can specify the separator:
array=(${(s:_:)string})
or to preserve the empty elements:
array=("${(@s:_:)string}")
Note that there, s is for splitting, not delimiting (the same applies with $IFS, a known POSIX non-conformance of zsh). It's different from JavaScript's split() in that an empty string is split into 0 (not 1) elements.
A notable difference from $IFS-splitting is that ${(s:abc:)string} splits on the abc string, while with IFS=abc, that would split on a, b or c.
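Standard sh has no direct equivalent of ${(s:abc:)string}. As a sketch, a hypothetical split_str function can split on a whole literal string with the ${var%%pattern} and ${var#pattern} expansions instead of $IFS. Unlike the unquoted zsh form, it keeps empty fields:

```shell
# split_str SEP STRING: hypothetical helper; shows each field of STRING,
# split on the literal string SEP, as <...>.
split_str() (
  sep=$1 rest=$2
  [ -n "$sep" ] || exit 1                  # an empty SEP would loop forever
  while :; do
    case $rest in
      *"$sep"*)                            # quoted SEP is matched literally
        printf '<%s>' "${rest%%"$sep"*}"   # text before the first SEP
        rest=${rest#*"$sep"}               # text after the first SEP
        ;;
      *)
        printf '<%s>\n' "$rest"            # last field
        break
        ;;
    esac
  done
)
split_str abc 'xabcyabcz'   # <x><y><z>
split_str abc 'xaybz'       # <xaybz> (IFS=abc would give x, y and z)
```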
With zsh and ksh93, the special treatment that space, tab or newline receive can be removed by doubling them in $IFS.
As a historic note, the Bourne shell (the ancestor of modern POSIX shells) always stripped the empty elements. It also had a number of bugs related to splitting and expansion of $@ with non-default values of $IFS. For instance, IFS=_; set -f; set -- $@ would not be equivalent to IFS=_; set -f; set -- $1 $2 $3...
Splitting on regexps
Now, for something closer to JavaScript's split() that can split on regular expressions, you'd need to rely on external utilities.
In the POSIX tool-chest, awk has a split() function that can split on extended regular expressions (those are more or less a subset of the Perl-like regular expressions supported by JavaScript).
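split() {
  # print the fields of $1, split on the extended regexp $2, as
  # single-quoted words suitable for eval
  awk -v q="'" '
    function quote(s) {
      # escape each embedded quote as: quote, backslash, quote, quote
      gsub(q, q "\\" q q, s)
      return q s q
    }
    BEGIN {
      n = split(ARGV[1], a, ARGV[2])
      for (i = 1; i <= n; i++) printf " %s", quote(a[i])
      exit
    }' "$@"
}
string=a__b_+c
eval "array=($(split "$string" '[_+]+'))" # array=(a b c)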
The zsh shell has builtin support for Perl-compatible regular expressions (in its zsh/pcre module), but using it to split a string, though possible, is relatively cumbersome.