Why is using && 75 times faster than if...fi and how to make code clearer

Question

I have the following working code:

largest_prime=1
for number_under_test in {1..100}
do
  is_prime=true
  factors=''
  for ((divider = 2; divider < number_under_test-1; divider++));
  do
    remainder=$(($number_under_test % $divider))
    [ $remainder == 0 ] && [ is_prime ] && is_prime=false && factors+=$divider' '
  done
  [ $is_prime == true ] && echo "${number_under_test} is prime!" || echo "${number_under_test} is NOT prime (factors= $factors)"  [ $is_prime == true ] && largest_prime=$number_under_test
done
printf "\nLargest Prime= $largest_prime\n"

This code runs quickly is 0.194 seconds. However I found the && is_prime= false a bit hard to read and it could look (to the untrained eye) as if it was being tested rather than being set which is what it does. So I tried changed the && into an if...then and this works - but is 75 times slower at 14.48 seconds. It's most noticeable on the higher numbers.

largest_prime=1
for number_under_test in {1..100}
do
  is_prime=true
  factors=''
  for ((divider = 2; divider < number_under_test-1; divider++));
  do  
    remainder=$(($number_under_test % $divider))
    if ([ $remainder == 0 ] && [ $is_prime == true ]); then
      is_prime=false
      factors+=$divider' '
    fi  
  done
  [ $is_prime == true ] && echo "${number_under_test} is prime!" || echo "${number_under_test} is NOT prime (factors= $factors)"  [ $is_prime == true ] && largest_prime=$number_under_test
done  
printf "\nLargest Prime= $largest_prime\n"

Is there any was to have the clarity of the block without the slowness?

Update (1/4/2015 10:40am EST)

Great feedback! I am now using the following. Any other feedback ?

largest_prime=1
separator=' '
for number_under_test in {1..100}; {
  is_prime=true
  factors=''
  for ((divider = 2; divider < (number_under_test/2)+1; divider++)) {
    remainder=$(($number_under_test % $divider))
    if [ $remainder == 0 ]; then
      is_prime=false
      factors+=$divider' '
    fi
  } 
  if $is_prime; then
    printf "\n${number_under_test} IS prime\n\n"
    largest_prime=$number_under_test
  else
    printf "${number_under_test} is NOT prime, factors are: "
    printf "$factors\n"
  fi
}
printf "\nLargest Prime= $largest_prime\n"

On a sidenote, running your script prints that Largest Prime= 100 on my computer. — Giulio Muscarello, Jan 03 '15 at 18:09
Also on a sidenote, if you're interested in efficiency, one trivial way to improve this would be to only iterate up to number_under_test/2 instead of up to number_under_test-1: No factor of a number n is greater than n/2, so you will still find all factors for non-prime numbers by doing this. (Also if you were only interested in testing for primeness, it would be sufficient to iterate up to sqrt(n), but Bash doesn't have a built-in function to compute square roots anyway.) — Malte Skoruppa, Jan 04 '15 at 00:42
Matte, good point (+1). The only change was that didn't work for the number 4 so I had to make it (number_under_test/2)+1 to allow for that — Michael Durrant, Jan 04 '15 at 14:36
In your updated version, the braces {} are not really needed after the then clause because the then already serves as a grouping operator (along with elif, else, or fi). In fact, in some shells, you could write, for example, for i in 1 2 3; { echo $i; } with no do or done. — Jonathan Leffler, Jan 04 '15 at 15:27
you should unset IFS. if $is_prime will fail unless $is_prime evals to a word. If it is split for any reason it will fail. You also have the problem of needing to generate your entire parameter set before you can do even a single calculation. It is very inefficient and the whole {1..100} block must reside in memory for the duration. — mikeserv, Jan 04 '15 at 20:26
if [ $remainder == 0 ] -- this is an arithmetic test, so use an arithmetic expression: if (( $remainder == 0 )) -- or remove the temp variable altogether: if (( $number_under_test % $divider == 0 )). You don't actually use the $separator variable anywhere — glenn jackman, Jan 05 '15 at 14:35
Don't use printf without a format specifier. If one of your variables happens to contain a %, you'll get an error. — glenn jackman, Jan 05 '15 at 14:43
@glennjackman - if the content of the variables is questionable then (( $var )) is definitely not a safe thing to do. Rather (( var )) should be preferred. But ((...)) is probably not a good way to go, anyway. — mikeserv, Jan 05 '15 at 17:10
@glennjackman - just because it offers no advantage over [ "$((...))" -ne 0 ] or even math() { return "$((!($*)))"; } but is far less portable. You might as well use syntax that will work everywhere rather limiting the application of what you do if the practical difference amounts only to typing a few characters. — mikeserv, Jan 05 '15 at 17:44
@glennjackman - actually, I think, in bash and zsh at least even (( var )) is still an unsafe thing to do. I think both of those shells will still wind up evaling the contents as a math statement rather than an integer regardless of whether you $expand it first or not. You could explicitly typeset it first, I suppose, to guard against that. But if you do so, then its contents are no longer questionable. — mikeserv, Jan 05 '15 at 18:04

glenn jackman · Accepted Answer · 2015-01-03T16:02:07.840

67

That's because you're spawning a sub-shell every time:

if ([ $remainder == 0 ] && [ $is_prime == true ]); then

Just remove the parentheses

if [ $remainder == 0 ] && [ $is_prime == true ]; then

If you want to group commands, there's syntax to do that in the current shell:

if { [ $remainder == 0 ] && [ $is_prime == true ]; }; then

(the trailing semicolon is required, see the manual)

Note that [ is_prime ] is not the same as [ $is_prime == true ]: you could write that as simply $is_prime (with no brackets) which would invoke the bash built-in true or false command.
[ is_prime ] is a test with one argument, the string "is_prime" -- when [ is given a single argument, the result is success if the argument is non-empty, and that literal string is always non-empty, hence always "true".

For readability, I would change the very long line

[ $is_prime == true ] && echo "${number_under_test} is prime!" || echo "${number_under_test} is NOT prime (factors= $factors)"  [ $is_prime == true ] && largest_prime=$number_under_test

to

if [ $is_prime == true ]; then
  echo "${number_under_test} is prime!"
else 
  echo "${number_under_test} is NOT prime (factors= $factors)"
  # removed extraneous [ $is_prime == true ] test that you probably
  # didn't notice off the edge of the screen
  largest_prime=$number_under_test
fi

Don't underestimate whitespace to improve clarity.

edited Jan 03 '15 at 16:02

answered Jan 03 '15 at 15:54

glenn jackman

85,964

1

there is a Typo - largest_prime=$number_under_test should be in the then branch (the same error is in the original) – JJoao Jan 03 '15 at 19:31
1

It's also worth noting that in bash, zsh, et al, [ is invoking a program literally called [, whereas [[ is implemented in the shell - hence it'll be faster. Try time for ((i = 0; $i < 1000; i++)); do [ 1 ]; done and compare to [[. See this SO question for further info. – kirb Jan 04 '15 at 11:55
2

bash implements [, it's a builtin. From a shell prompt, type type -a [ and help [ – glenn jackman Jan 04 '15 at 14:03
@glennjackman Wow; wasn't aware of that. I assumed it was still the case because which [ still returns /usr/bin/[. I also just realised that I implied zsh was the same; for me that does tell me it's a builtin. But then... why is [[ faster? – kirb Jan 04 '15 at 17:27
which only looks at executable files in the PATH. Try less /bin/which. type -a is preferable for bash (IMO) – glenn jackman Jan 04 '15 at 19:45
2

@glennjackman command -v is another good which alternative; see also here. – Abbafei Jan 05 '15 at 07:40

mikeserv · Answer 2 · 2015-01-04T00:31:59.360

I think you're working way too hard on that function of yours. Consider:

unset num div lprime; set -- "$((lprime=(num=(div=1))))"
while [     "$((     num += ! ( div *= ( div <= num   ) ) ))" -eq \
            "$((     num *=   ( div += 1 )   <= 101   ))" ]    && {
      set   "$(( ! ( num %      div )         * div   ))"     "$@"
      shift "$(( !    $1 +    ( $1 ==  1 )    *  $#   ))"
}; do [ "$div" -gt "$num" ] && echo "$*"      
done

Shell arithmetic is pretty capable of evaluating integer conditions on its own. It rarely needs too many tests and/or outside assignments. This one while loop duplicates your nested loops fairly well:

It doesn't print as much, of course, I didn't write all that much, but, for example setting the ceiling to 16 rather than 101 as is written above and...

It's definitely doing the work. And it requires very little else to approximate your output:

...
do [ "$div" -eq "$num" ] && shift &&
   printf "$num ${1+!}= prime.${1+\t%s\t%s}\n" \
          "factors= $*"                        \
          "lprime=$(( lprime = $# ? lprime : num ))"
done

Just doing that rather than the echo and...

1 = prime.
2 = prime.
3 = prime.
4 != prime.     factors= 2      lprime=3
5 = prime.
6 != prime.     factors= 3 2    lprime=5
7 = prime.
8 != prime.     factors= 4 2    lprime=7
9 != prime.     factors= 3      lprime=7
10 != prime.    factors= 5 2    lprime=7
11 = prime.
12 != prime.    factors= 6 4 3 2        lprime=11
13 = prime.
14 != prime.    factors= 7 2    lprime=13
15 != prime.    factors= 5 3    lprime=13

This works in busybox. It is very portable, fast, and easy to use.

Your subshell issue is going to occur in most shells, but it is, by far, most acute in a bash shell. I alternated between doing

( [ "$div" -gt "$num" ] ) && ...

...and the way I wrote it above in several shells for a ceiling of 101 and dash did it without the subshell in .017 seconds and with the subshell in 1.8 seconds. busybox .149 and 2, zsh .149 and 4, bash .35 and 6, and ksh93 in .149 and .160. ksh93 does not fork for subshells as the other shells must. So maybe the problem is not so much the subshell as it is the shell.

What's the advantage of [ "$((...))" -eq "$((...))" ] over (( (...) == (...) ))? Is the latter less portable? — ruakh, Jan 04 '15 at 17:33
@ruakh - portability, speed, reliability. [ "$((...))" -eq "$((...)) ] works in shells that don't take 15 seconds to run the program, and the other doesn't. If the advantage of one over another is questionable at all, then that can only give the edge to the former, which means there is never a good reason to use (( (...) == (...) )). — mikeserv, Jan 04 '15 at 19:09
Sorry, but your reply seems to assume that I already have a detailed knowledge of shell support for (( ... )). I'm flattered, but I do not have that detailed knowledge. (Remember, I'm the one who just asked if (( ... )) is less portable.) So I really can't make sense of your reply. :-/ Could you be a bit more explicit? — ruakh, Jan 04 '15 at 19:27
@ruakh - I'm sorry... I didn't see that you were asking if it was more portable, just how it was advantageous - which is why I replied about portability. Anyway, the "$((...))" is POSIX-specified and the other is a shell extension. POSIX shells are pretty capable. Even dash and posh will correctly handle branch tests like "$((if_true ? (var=10) : (var=5) ))" and always assign $var correctly. busybox breaks there - it always evals both sides regardless of $if_true's value. — mikeserv, Jan 04 '15 at 19:37
@ruakh - oh man. I must be a little off today... it says right there... is the latter less portable? I didn't see that before, I guess...? — mikeserv, Jan 04 '15 at 19:44

Why is using && 75 times faster than if...fi and how to make code clearer

Update (1/4/2015 10:40am EST)

2 Answers2