[TL;DR: use the urlencode_grouped_case function below.]
Awk can do most of the job, except that it annoyingly lacks a way to convert a character into its numeric code. If od is present on your device, you can use it to convert every character (more precisely, every byte) into the corresponding number (written in decimal, so that awk can read it), then use awk to print the valid characters literally and the rest in percent-encoded form.
urlencode_od_awk () {
  # od prints each byte as an unsigned decimal number (-t u1 rather than
  # -t d1, so bytes >= 128 don't come out negative); awk then turns each
  # number back into either the literal character or a %XX escape.
  echo -n "$1" | od -t u1 | awk '{
    for (i = 2; i <= NF; i++) {    # field 1 is the offset, skip it
      printf(($i>=48 && $i<=57) || ($i>=65 && $i<=90) || ($i>=97 && $i<=122) ||
             $i==45 || $i==46 || $i==95 || $i==126 ?
             "%c" : "%%%02x", $i)
    }
  }'
}
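To see why the awk loop starts at field 2, it helps to look at od's output; the transcript below is only an illustration (exact spacing varies between od implementations):
# Illustration only, not part of the function above.
# od prints an offset in the first column, one number per byte after it,
# and a final line containing just the offset (NF=1, so the loop skips it):
#   $ echo -n 'a b' | od -t u1
#   0000000  97  32  98
#   0000003
# Putting it together:
#   $ urlencode_od_awk 'a b'
#   a%20b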
If your device doesn't have od, you can do everything inside the shell; this significantly helps performance (fewer calls to external programs, and none at all if printf is a builtin) and is easier to write correctly. I believe all BusyBox shells support the ${VAR#PREFIX} construct to trim a prefix from a string; use it to strip the first character of the string repeatedly.
urlencode_many_printf () {
  string=$1
  while [ -n "$string" ]; do
    tail=${string#?}          # everything after the first character
    head=${string%"$tail"}    # the first character ($tail is quoted so that
                              # glob characters in it are matched literally)
    case $head in
      [-._~0-9A-Za-z]) printf %c "$head";;
      *) printf %%%02x "'$head";;   # "'X" makes printf use X's character code
    esac
    string=$tail
  done
  echo
}
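The "'$head" argument deserves a word: POSIX printf interprets a numeric argument that starts with a single or double quote as the character code of the character that follows it. A quick standalone check (not part of the function):
# Prints "61 20": the hexadecimal codes of "a" and of a space.
printf '%02x %02x\n' "'a" "' "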
If printf is not a builtin but an external utility, you will again gain performance by invoking it only once for the whole function instead of once per character. Build up the format string and the parameter list, then make a single call to printf.
urlencode_single_printf () {
  string=$1; format=; set --
  while [ -n "$string" ]; do
    tail=${string#?}
    head=${string%"$tail"}    # first character, with $tail quoted as above
    case $head in
      [-._~0-9A-Za-z]) format=$format%c; set -- "$@" "$head";;
      *) format=$format%%%02x; set -- "$@" "'$head";;
    esac
    string=$tail
  done
  printf "$format\\n" "$@"
}
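To get a feel for what that single printf call receives, here is a hypothetical trace for the input a b (illustration only):
# After the loop, for the input "a b":
#   format is  %c%%%02x%c
#   "$@"   is  a  "' "  b
# so the final call is equivalent to:
printf '%c%%%02x%c\n' a "' " b    # prints: a%20b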
This is optimal in terms of external calls: there is a single one, and you can't eliminate it with pure shell constructs unless you're willing to enumerate every character that needs escaping. If most of the characters in the argument are passed through unchanged, you can process them in a batch.
urlencode_grouped_literals () {
  string=$1; format=; set --
  while
    # Take the longest leading run of characters that need no escaping
    # and pass it to printf as a single %s argument.
    literal=${string%%[!-._~0-9A-Za-z]*}
    if [ -n "$literal" ]; then
      format=$format%s
      set -- "$@" "$literal"
      string=${string#$literal}
    fi
    [ -n "$string" ]
  do
    tail=${string#?}
    head=${string%"$tail"}
    format=$format%%%02x
    set -- "$@" "'$head"
    string=$tail
  done
  printf "$format\\n" "$@"
}
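For illustration (a made-up input, not from the original), batching means a string like foo bar/baz contributes one %s per literal run and one %02x per escaped character:
# The loop builds, for the input "foo bar/baz":
#   format is  %s%%%02x%s%%%02x%s
#   "$@"   is  foo  "' "  bar  "'/"  baz
printf '%s%%%02x%s%%%02x%s\n' foo "' " bar "'/" baz   # prints: foo%20bar%2fbaz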
Depending on compilation options, [ (a.k.a. test) may be an external utility. We're only using it to test whether a string is empty, which can also be done within the shell with the case construct. Here are the last two approaches rewritten to avoid test altogether, first going character by character:
urlencode_single_fork () {
  string=$1; format=; set --
  # Loop while $string is non-empty, without calling [ (see the note below).
  while case "$string" in "") false;; esac do
    tail=${string#?}
    head=${string%"$tail"}
    case $head in
      [-._~0-9A-Za-z]) format=$format%c; set -- "$@" "$head";;
      *) format=$format%%%02x; set -- "$@" "'$head";;
    esac
    string=$tail
  done
  printf "$format\\n" "$@"
}
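The while case "$string" in "") false;; esac do header is just a [-free emptiness test: case exits with the status of the branch it runs, and with status 0 if no pattern matches, so the loop keeps going while the string is non-empty. As a standalone sketch (hypothetical variable name):
# Loop until $s becomes empty, without ever running [ or test.
s=abc
while case "$s" in "") false;; esac do
  s=${s%?}    # chop off the last character on each iteration
done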
Here is the same approach again, this time copying each literal segment in a batch:
urlencode_grouped_case () {
  string=$1; format=; set --
  while
    # Collect the longest leading run of characters that need no escaping...
    literal=${string%%[!-._~0-9A-Za-z]*}
    case "$literal" in
      ?*)
        format=$format%s
        set -- "$@" "$literal"
        string=${string#$literal};;
    esac
    # ...and stop once the whole string has been consumed.
    case "$string" in
      "") false;;
    esac
  do
    tail=${string#?}
    head=${string%"$tail"}
    format=$format%%%02x
    set -- "$@" "'$head"
    string=$tail
  done
  printf "$format\\n" "$@"
}
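A hypothetical invocation, to show the kind of output to expect (the functions emit lowercase hex digits, which RFC 3986 treats as equivalent to the uppercase form):
$ urlencode_grouped_case 'foo bar/baz?qux=1&quux=2'
foo%20bar%2fbaz%3fqux%3d1%26quux%3d2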
I tested on my router (MIPS processor, DD-WRT-based distribution, BusyBox with ash, external printf and [). Each version is a noticeable speed improvement over the previous one. Moving to a single fork is the most significant improvement; it's the one that makes the function respond almost instantly (in human terms), as opposed to after a few seconds for a realistically long URL parameter.
Note that the code above may fail in fancy locales (not likely on a router). You may need export LC_ALL=C if you use a non-default locale.
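The reason is that range expressions such as 0-9A-Za-z follow the locale's collation order, and the ? pattern matches one character, which may be a multibyte sequence; in the C locale both mean plain ASCII bytes, which is what the functions assume. A one-line guard at the top of the script (a sketch, not from the original answer) is enough:
# Force the C locale so character ranges mean ASCII and ? matches one byte.
export LC_ALL=C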