
I was trying to determine the best performance for string filling, as in:
str+="A" #one per loop

I came up with this script for bash:

#!/bin/bash
bReport=false
nLimit=${1-3000}; #up to 25000

echo "nLimit='$nLimit'"
shopt -s expand_aliases
nStop=100000;fMaxWorkTime=1.0;
alias GetTime='date +"%s.%N"';
nTimeBegin="`GetTime`";
nDelayPart="`GetTime`";
strFinal="";
str="";
fPartWorkSleep="`bc <<< "scale=10;($fMaxWorkTime/$nStop)*$nLimit"`"
echo "fPartWorkSleep='$fPartWorkSleep'"
nCount=0;
while true;do 
    str+="A";
    ((nCount++))&&:;
    if(((nCount%nLimit)==0)) || ((nCount==nStop));then 
        strFinal+="$str";
        str="";
        if $bReport;then
            echo "`bc <<< "$(GetTime)-$nDelayPart"` #${#strFinal} #`bc <<< "$(GetTime)-$nTimeBegin"`";
            nDelayPart="`GetTime`";
        fi
        sleep $fPartWorkSleep # like doing some weighty thing based on the amount of data processed
    fi;
    if((nCount==nStop));then 
        break;
    fi;
done;
echo "strFinal size ${#strFinal}"
echo "took `bc <<< "$(GetTime)-$nTimeBegin"`"

And in bash the best performance/size is when str is limited to between 3000 and 25000 characters (on my machine). After each part is filled, it must be emptied, and some weighty action can be performed with the str value (the weight being relative to its size).

So my question is: based on what I described, which shell has the best string-fill performance? I am willing to use a shell other than bash, just for this kind of algorithm, if it proves to be faster.

PS: I had to use nCount because checking the string size instead degraded performance.
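
For reference, here is a minimal way to time the two kinds of check in isolation (illustrative only; neither loop does the per-part work, it just fills the string):

time bash -c 'str=; n=0; while ((n<100000)); do str+=A; ((++n)); done'
time bash -c 'str=;      while ((${#str}<100000)); do str+=A; done'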

  • Depends on your system, I suppose. In my case, a value of 5000 worked best. Have you tried timing just str+="A"? http://paste.ubuntu.com/14124037/ – muru Dec 21 '15 at 02:27
  • i don't understand - what does this do? you mean you want a string which is exactly 40000 chars long, and any shorter string should have the difference made up at the tail end? – mikeserv Dec 21 '15 at 02:34
  • @mikeserv the final string will have 100000 (nStop), but each part (nLimit) has best performance if filled up to 40000 (before performing actions on that amount) on bash on my machine. The action can be performed in any amount, but if I use like 1000 per part, the performance is severely degraded (will take much more time than if I use 40000 per part), and also, if I use 45000 per part, the performance is worse than 40000 (so 40000 became like a magic number for bash string fill) – Aquarius Power Dec 21 '15 at 02:56
  • @AquariusPower your script also runs bc each time you empty the string. How do you compensate for that? – muru Dec 21 '15 at 03:07
  • well... if instead of tacking it onto your value, if you just kept it separate then you could do "$value$fill$fill$fill" and so on. it's what i do with set and shift - i just get as many empty positional parameters as i need to make it to 40k and then change $IFS to whatever the filler char should be. and if $IFS were changed again the whole 30 some thousand char string would just change each time $* expanded. – mikeserv Dec 21 '15 at 03:07
  • @muru not only bc but also date, well pointed out, let me check.. – Aquarius Power Dec 21 '15 at 03:12
  • yeah - it's way faster when you just concatenate. – mikeserv Dec 21 '15 at 03:20
  • @muru (I fixed a bug in the script report part. it also can be disabled with bReport to not interfere in the timing) on my script, using 40000 completes in 2.4s. using the maximum 100000 (that is equivalent to your script), it takes 3.1s. So basically instead of fully filling the string, if I use smaller blocks to perform the action in parts (not in full), the performance is better. – Aquarius Power Dec 21 '15 at 03:34
  • I had to rework the script to make it more robust. My timing reports on these comments changed... – Aquarius Power Dec 21 '15 at 04:13

1 Answer

for sh  in bash zsh yash dash mksh ksh
do      printf  "\n%s:\t" "$sh"
        time    "$sh" -c '
                        str="some string"
                        set     "" ""
                        while   ${20001+"break"}
                        do      set "$@$@";done
                        IFS=A;  printf %.100000s\\n "$str$*$*$*$*$*"'|
                wc -c
done

bash:   100001
"$sh" -c   0.15s user 0.01s system 94% cpu 0.176 total
wc -c  0.00s user 0.00s system 1% cpu 0.175 total

zsh:    100001
"$sh" -c   0.03s user 0.01s system 97% cpu 0.034 total
wc -c  0.00s user 0.00s system 9% cpu 0.034 total

yash:   100001
"$sh" -c   0.06s user 0.01s system 94% cpu 0.067 total
wc -c  0.00s user 0.00s system 5% cpu 0.067 total

dash:   100001
"$sh" -c   0.02s user 0.01s system 92% cpu 0.029 total
wc -c  0.00s user 0.00s system 11% cpu 0.028 total

ksh:    100001
"$sh" -c   0.02s user 0.00s system 96% cpu 0.021 total
wc -c  0.00s user 0.00s system 16% cpu 0.021 total

So this benches the various shells set to $sh in the for loop on how quickly each can generate a string of 100,000 characters. The first 11 of those 100,000 chars are some string, as first assigned to $str, while the tail fill is 99,989 A chars (wc -c reports 100001 because printf also writes a trailing newline).
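
A quick way to eyeball the head of what actually gets written (a standalone check, not part of the timing):

sh -c 'str="some string"
       set "" ""
       while ${20001+"break"}; do set "$@$@"; done
       IFS=A; printf %.100000s\\n "$str$*$*$*$*$*"' | cut -c1-20

some stringAAAAAAAAA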

The shells get the A chars from $*, which joins every positional parameter in the shell's argument array using the first character of the value of the special shell parameter $IFS as the delimiter. Because all of the arguments are "" null, the only chars in $* are the delimiter chars.
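
As a small standalone illustration of that joining behaviour (four null arguments end up as three delimiter characters):

set -- "" "" "" ""
IFS=A
printf '%s\n' "$*"

AAA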

The arguments are accrued at an exponential rate with each iteration of the while loop, which only breaks once the 20001st positional parameter has finally been set, at which point ${20001+"break"} expands to break. Until then, the while loop basically does:

### first iteration
while $unset_param; do set "" """" ""; done
### second iteration
while $unset_param; do set "" "" """" "" ""; done
### third iteration
while $unset_param; do set "" "" "" "" """" "" "" "" ""; done

...and so on.
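
Printing $# after every pass makes that growth visible; a small sketch, with an arbitrary cutoff of 33 parameters instead of 20001:

set -- "" ""
while ${33+"break"}
do    set -- "$@$@"
      printf '%d ' "$#"
done; echo

3 5 9 17 33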

After the while loop completes, $IFS is set to A and the special shell parameter $* is concatenated five times onto the tail of $str. printf trims the resulting string argument to a maximum of 100000 bytes, per the .100000 precision in the %.100000s format, before writing it out to its stdout.
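
The trimming itself is just printf's precision field at work; for instance:

printf '%.10s\n' 'some string that is longer'

some strin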

One might use the same strategy like:

str='some string'
set "" ""
while ${51+"break"}; do set "$@$@"; done
shift "$((${#}-(51-${#str}))"

...which results in a total argument count of 40 - and so 39 delimiters...

IFS=.; printf %s\\n "$str$*"

some string.......................................

And you can reuse the same arguments you've already set w/ a different $IFS for a different fill:

for IFS in a b c; do printf %s\\n "$str$*"; done

some stringaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
some stringbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
some stringccccccccccccccccccccccccccccccccccccccc

You can also fill in the null arguments with a printf format string rather than using $IFS:

printf "%s m%sy%1ss%st%sr%si%sn%sg" "$str$@"

some string my string my string my string my string my string
– mikeserv