I want to print the floating point number with exactly two significant digits in bash (maybe using a common tool like awk, bc, dc, perl etc.).

Examples:

  • 76543 should be printed as 76000
  • 0.0076543 should be printed as 0.0076

In both cases the significant digits are 7 and 6. I have read some answers for similar problems like:

How to round floating point numbers in shell?

Bash limiting precision of floating point variables

but the answers focus on limiting the number of decimal places (eg. bc command with scale=2 or printf command with %.2f) instead of significant digits.

Is there an easy way to format the number with exactly 2 significant digits or do I have to write my own function?

tafit3

3 Answers

16

This answer to the first linked question has the almost-throwaway line at the end:

See also %g for rounding to a specified number of significant digits.

So you can simply write

printf "%.2g" "$n"

(but see the section below on decimal separator and locale, and note that non-Bash printf need not support %f and %g).

Examples:

$ printf "%.2g\n" 76543 0.0076543
7.7e+04
0.0077

Of course, you now have mantissa-exponent representation rather than pure decimal, so you'll want to convert back:

$ printf "%0.f\n" 7.7e+06
7700000

$ printf "%0.7f\n" 7.7e-06
0.0000077

Putting all this together, and wrapping it in a function:

# Function round(precision, number)
round() {
    n=$(printf "%.${1}g" "$2")
    if [ "$n" != "${n#*e}" ]
    then
        f="${n##*e-}"
        test "$n" = "$f" && f= || f=$(( ${f#0}+$1-1 ))
        printf "%0.${f}f" "$n"
    else
        printf "%s" "$n"
    fi
}

(Note - this function is written in portable (POSIX) shell, but assumes that printf handles the floating-point conversions. Bash has a built-in printf that does, so you're okay here, and the GNU implementation also works, so most GNU/Linux systems can safely use Dash).

Test cases

radix=$(printf %.1f 0)
for i in $(seq 12 | sed -e 's/.*/dc -e "12k 1.234 10 & 6 -^*p"/e' -e "y/_._/$radix/")
do
    echo $i "->" $(round 2 $i)
done

Test results

.000012340000 -> 0.000012
.000123400000 -> 0.00012
.001234000000 -> 0.0012
.012340000000 -> 0.012
.123400000000 -> 0.12
1.234 -> 1.2
12.340 -> 12
123.400 -> 120
1234.000 -> 1200
12340.000 -> 12000
123400.000 -> 120000
1234000.000 -> 1200000

A note on decimal separator and locale

All the working above assumes that the radix character (also known as the decimal separator) is ., as in most English locales. Other locales use , instead, and some shells have a built-in printf that respects locale. In these shells, you may need to set LC_NUMERIC=C to force the use of . as radix character, or write /usr/bin/printf to prevent use of the built-in version. This latter is complicated by the fact that (at least some versions) seem to always parse arguments using ., but print using the current locale settings.
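
One way to apply the first option is to scope the locale override to a single printf call. The following is a minimal sketch, assuming a shell whose built-in printf honours a per-command locale assignment (Bash does). The helper name fmt_sig is made up for illustration, and LC_ALL is used instead of LC_NUMERIC because a value of LC_ALL already set in the environment would otherwise take precedence.

```shell
# fmt_sig DIGITS NUMBER   (hypothetical helper name, not from the answer)
# Formats NUMBER with DIGITS significant digits using "." as the radix
# character, whatever the caller's locale is.
fmt_sig() {
    LC_ALL=C printf "%.${1}g\n" "$2"
}

fmt_sig 2 0.0076543   # 0.0077
fmt_sig 2 76543       # 7.7e+04
```

The override lasts only for that one command, so the rest of the script keeps the user's locale.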

Toby Speight
  • @Stéphane Chazelas, why did you change my carefully tested POSIX shell shebang back to Bash after I removed the bashism? Your comment mentions %f/%g, but that's the printf argument, and one doesn't need a POSIX printf to have a POSIX shell. I think you should have commented instead of editing there. – Toby Speight Mar 09 '18 at 11:02
  • printf %g cannot be used in a POSIX script. It's true it's down to the printf utility, but that utility is builtin in most shells. The OP tagged as bash, so using a bash shebang is one easy way to get a printf that supports %g. Otherwise, you'd need to add a note such as "assuming your printf (or the printf builtin of your sh if printf is builtin there) supports the non-standard (but quite common) %g"... – Stéphane Chazelas Mar 09 '18 at 11:30
  • dash has a builtin printf (which supports %g). On GNU systems, mksh is probably the only shell these days that won't have a builtin printf. – Stéphane Chazelas Mar 09 '18 at 11:41
  • Thanks for your improvements - I've edited to just remove the shebang (since question is tagged bash) and relegate some of this to notes - does it look correct now? – Toby Speight Mar 09 '18 at 11:48
  • Sadly this doesn't print the correct number of digits if the trailing digits are zeros. For example printf "%.3g\n" 0.400 gives 0.4 not 0.400 – phiresky Jan 13 '20 at 16:36
4

TL;DR

Just copy and use the function sigf from the section A reasonably good "significant numbers" function. It is written (like all the code in this answer) to work with dash.

It will give the printf approximation to the integer part of N with $sig digits.

About the decimal separator.

The first problem to solve with printf is the effect and use of the "decimal mark", which in US is a point, and in DE is a comma (for example). It is a problem because what works for some locale (or shell) will fail with some other locale. Example:

$ dash -c 'printf "%2.3f\n" 12.3045'
12.305
$  ksh -c 'printf "%2.3f\n" 12.3045'
ksh: printf: 12.3045: arithmetic syntax error
ksh: printf: 12.3045: arithmetic syntax error
ksh: printf: warning: invalid argument of type f
12,000
$ ksh -c 'printf "%2.3f\n" 12,3045'
12,304

One common (and incorrect) solution is to set LC_ALL=C for the printf command. But that forces the decimal mark to be a point, which is a problem in locales where a comma (or some other character) is the commonly used mark.

The solution is to have the script find out which decimal separator the shell running it actually uses. That is quite simple:

$ printf '%1.1f' 0
0,0                            # for a comma locale (or shell).

Removing zeros:

$ dec="$(IFS=0; printf '%s' $(printf '%.1f'))"; echo "$dec"
,                              # for a comma locale (or shell).

That value is used to change the file with the list of tests:

sed -i 's/[,.]/'"$dec"'/g' infile

That makes the runs on any shell or locale automatically valid.
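
The same detection can also be done with parameter expansion alone, without touching IFS. This is only a sketch: radix_char is a hypothetical helper name, the variable r is not local to the function, and the shell's printf is assumed to support %f.

```shell
# radix_char: print the decimal separator that this shell's printf emits.
# (hypothetical helper; the variable r leaks into the caller's scope)
radix_char() {
    r=$(printf '%.1f' 0)   # "0.0" or "0,0", depending on locale and shell
    r=${r#0}               # drop the leading zero
    r=${r%0}               # drop the trailing zero
    printf '%s\n' "$r"
}

dec=$(radix_char)
echo "decimal separator: $dec"
```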


Some basics.

It should be intuitive to cut the number to be formatted with the printf format %.*e or even %.*g. The main difference between them is how they count digits: %.*g takes the full count of significant digits, while %.*e counts only the digits after the decimal mark and so needs the count less 1:

$ printf '%.*e  %.*g' $((4-1)) 1,23456e0 4 1,23456e0
1,235e+00  1,235

That worked well for 4 significant digits.

After the number of digits has been cut from the number, we need an additional step to format numbers with exponents different than 0 (as it was above).

$ N=$(printf '%.*e' $((4-1)) 1,23456e3); echo "$N"
1,235e+03
$ printf '%4.0f' "$N"
1235

This works correctly. The count of the integer part (at the left of the decimal mark) is just the value of the exponent ($exp). The count of decimals needed is the number of significant digits ($sig) less the amount of digits already used on the left part of the decimal separator:

a=$((exp<0?0:exp))                      ### count of integer characters.
b=$((exp<sig?sig-exp:0))                ### count of decimal characters.
printf '%*.*f' "$a" "$b" "$N"

As the integral part for the f format has no limit, there is in fact no need to explicitly declare it and this (simpler) code works:

a=$((exp<sig?sig-exp:0))                ### count of decimal characters.
printf '%0.*f' "$a" "$N"
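
To make the arithmetic concrete, the following sketch evaluates the decimal count for a few cases. It follows the convention used in this answer, where exp is the exponent printed by %e plus one. The function name decimals_for is only illustrative.

```shell
# decimals_for SIG EXP   (hypothetical helper)
# EXP is the exponent printed by %e plus one, that is, the number of
# digits to the left of the decimal mark.
decimals_for() {
    echo "$(( ($2) < ($1) ? ($1) - ($2) : 0 ))"
}

decimals_for 4 4     # 1.235e+03 at 4 digits: 0 decimals (1235)
decimals_for 4 1     # 1.235e+00 at 4 digits: 3 decimals (1.235)
decimals_for 2 -2    # 7.7e-03   at 2 digits: 4 decimals (0.0077)
```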

First trial.

A first function that could do this in a more automated way:

# Function significant (number, precision)
sig1(){
    sig=$(($2>0?$2:1))                      ### significant digits (>0)
    N=$(printf "%0.*e" "$(($sig-1))" "$1")  ### N in sci (cut to $sig digits).
    exp=$(echo "${N##*[eE+]}+1"|bc)         ### get the exponent.
    a="$((exp<sig?sig-exp:0))"              ### calc number of decimals.
    printf "%0.*f" "$a" "$N"                ### re-format number.
}

This first attempt works with many numbers but will fail with numbers for which the amount of available digits is less than the significant count requested and the exponent is less than -4:

   Number       sig                       Result        Correct?
   123456789 --> 4<                       123500000 >--| yes
       23455 --> 4<                           23460 >--| yes
       23465 --> 4<                           23460 >--| yes
      1,2e-5 --> 6<                    0,0000120000 >--| no
     1,2e-15 -->15< 0,00000000000000120000000000000 >--| no
          12 --> 6<                         12,0000 >--| no  

It will add many zeros which are not needed.

Second trial.

To solve that we need to clean N of the exponent and any trailing zeros. Then we can get the effective length of digits available and work with that:

# Function significant (number, precision)
sig2(){ local sig N exp n len a
    sig=$(($2>0?$2:1))                      ### significant digits (>0)
    N=$(printf "%+0.*e" "$(($sig-1))" "$1") ### N in sci (cut to $sig digits).
    exp=$(echo "${N##*[eE+]}+1"|bc)         ### get the exponent.
    n=${N%%[Ee]*}                           ### remove the exponent part.
    n=${n%"${n##*[!0]}"}                    ### remove all trailing zeros
    len=$(( ${#n}-2 ))                      ### len of N (less sign and dec).
    len=$((len<sig?len:sig))                ### select the minimum.
    a="$((exp<len?len-exp:0))"              ### use $len to count decimals.
    printf "%0.*f" "$a" "$N"                ### re-format the number.
}
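
The trailing-zero removal in sig2 packs two expansions into one line, which is easy to misread. Spelled out as a standalone sketch (strip_trailing_zeros and zeros_at_end are hypothetical names), the inner expansion isolates the run of zeros at the end of the string and the outer one cuts it off:

```shell
# strip_trailing_zeros STRING   (hypothetical helper)
strip_trailing_zeros() {
    s=$1
    # Remove the longest prefix that ends in a non-zero character;
    # what is left is exactly the run of zeros at the end (possibly empty).
    zeros_at_end=${s##*[!0]}
    # Remove that run from the end of the original string.
    printf '%s\n' "${s%"$zeros_at_end"}"
}

strip_trailing_zeros 1.200000   # 1.2
strip_trailing_zeros 1.000      # 1.   (the separator itself is kept)
strip_trailing_zeros 12         # 12
```

Note that a string made only of zeros is reduced to the empty string.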

However, that uses floating point math, and "nothing is simple in floating point" (see: Why don’t my numbers add up?). For example:

printf "%.2g  " 76500,00001 76500
7,7e+04  7,6e+04

However:

 printf "%.2g  " 75500,00001 75500
 7,6e+04  7,6e+04

Why?:

printf "%.32g\n" 76500,00001e30 76500e30
7,6500000010000000001207515928855e+34
7,6499999999999999997831226199114e+34

In addition, printf is a builtin in many shells, so what it prints may change with the shell:

$ dash -c 'printf "%.*f" 4 123456e+25'
1234560000000000020450486779904.0000
$  ksh -c 'printf "%.*f" 4 123456e+25'
1234559999999999999886313162278,3840

With sig2, the test script gives:

$  dash ./script.sh
   123456789 --> 4<                       123500000 >--| yes
       23455 --> 4<                           23460 >--| yes
       23465 --> 4<                           23460 >--| yes
      1.2e-5 --> 6<                        0.000012 >--| yes
     1.2e-15 -->15<              0.0000000000000012 >--| yes
          12 --> 6<                              12 >--| yes
  123456e+25 --> 4< 1234999999999999958410892148736 >--| no

A reasonably good "significant numbers" function:

dec=$(IFS=0; printf '%s' $(printf '%.1f'))   ### What is the decimal separator?.
sed -i 's/[,.]/'"$dec"'/g' infile

zeros(){ # create a string of $1 zeros (for $1 positive or zero).
         printf '%.*d' $(( $1>0?$1:0 )) 0
       }

# Function significant (number, precision)
sigf(){ local sig sci exp N sgn len z1 z2 b c
    sig=$(($2>0?$2:1))                      ### significant digits (>0)
    N=$(printf '%+e\n' "$1")                ### use scientific format.
    exp=$(echo "${N##*[eE+]}+1"|bc)         ### find ceiling{log(N)}.
    N=${N%%[eE]*}                           ### cut after `e` or `E`.
    sgn=${N%%"${N#-}"}                      ### keep the sign (if any).
    N=${N#[+-]}                             ### remove the sign
    N=${N%[!0-9]*}${N#??}                   ### remove the $dec
    N=${N#"${N%%[!0]*}"}                    ### remove all leading zeros
    N=${N%"${N##*[!0]}"}                    ### remove all trailing zeros
    len=$((${#N}<sig?${#N}:sig))            ### count of selected characters.
    N=$(printf '%0.*s' "$len" "$N")         ### use the first $len characters.

    result="$N"

    # add the decimal separator or lead zeros or trail zeros.
    if   [ "$exp" -gt 0 ] && [ "$exp" -lt "$len" ]; then
            b=$(printf '%0.*s' "$exp" "$result")
            c=${result#"$b"}
            result="$b$dec$c"
    elif [ "$exp" -le 0 ]; then
            # fill front with leading zeros ($exp length).
            z1="$(zeros "$((-exp))")"
            result="0$dec$z1$result"
    elif [ "$exp" -ge "$len" ]; then
            # fill back with trailing zeros.
            z2=$(zeros "$((exp-len))")
            result="$result$z2"
    fi
    # place the sign back.
    printf '%s' "$sgn$result"
}

And the results are:

$ dash ./script.sh
       123456789 --> 4<                       123400000 >--| yes
           23455 --> 4<                           23450 >--| yes
           23465 --> 4<                           23460 >--| yes
          1.2e-5 --> 6<                        0.000012 >--| yes
         1.2e-15 -->15<              0.0000000000000012 >--| yes
              12 --> 6<                              12 >--| yes
      123456e+25 --> 4< 1234000000000000000000000000000 >--| yes
      123456e-25 --> 4<       0.00000000000000000001234 >--| yes
 -12345.61234e-3 --> 4<                          -12.34 >--| yes
 -1.234561234e-3 --> 4<                       -0.001234 >--| yes
           76543 --> 2<                           76000 >--| yes
          -76543 --> 2<                          -76000 >--| yes
          123456 --> 4<                          123400 >--| yes
           12345 --> 4<                           12340 >--| yes
            1234 --> 4<                            1234 >--| yes
           123.4 --> 4<                           123.4 >--| yes
       12.345678 --> 4<                           12.34 >--| yes
      1.23456789 --> 4<                           1.234 >--| yes
    0.1234555646 --> 4<                          0.1234 >--| yes
       0.0076543 --> 2<                          0.0076 >--| yes
   .000000123400 --> 2<                      0.00000012 >--| yes
   .000001234000 --> 2<                       0.0000012 >--| yes
   .000012340000 --> 2<                        0.000012 >--| yes
   .000123400000 --> 2<                         0.00012 >--| yes
   .001234000000 --> 2<                          0.0012 >--| yes
   .012340000000 --> 2<                           0.012 >--| yes
   .123400000000 --> 2<                            0.12 >--| yes
           1.234 --> 2<                             1.2 >--| yes
          12.340 --> 2<                              12 >--| yes
         123.400 --> 2<                             120 >--| yes
        1234.000 --> 2<                            1200 >--| yes
       12340.000 --> 2<                           12000 >--| yes
      123400.000 --> 2<                          120000 >--| yes
0

If you have the number already as a string, that is, as "3456" or "0.003756", then you could potentially do it only using string manipulation. The following is off the top of my head, and not thoroughly tested, and uses sed, but consider:

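f() {
    local A="$1"
    # Strip an optional sign, one leading zero, the decimal point and the zeros after it.
    local B="$(echo "$A" | sed -E "s/^-?0?\.?0*//")"
    # C is the prefix that was stripped (for example "-0.00"); no eval is needed.
    local C="${A%"$B"}"
    local D
    if ((${#B} > 2)); then
        D="${B:0:2}"    # keep the first two significant digits
    else
        D="$B"
    fi
    echo "$C$D"
}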

Where basically you strip off and save any "-0.000" stuff at the start, then use a simple substring operation on the rest. One caveat about the above is that multiple leading 0's are not removed. I'll leave that as an exercise.
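
A character-by-character variant of the same idea can also pad the integer part with zeros and cope with an embedded decimal point. The following is only a sketch under stated assumptions: the name sig_trunc is hypothetical, the input must be a plain decimal string (optional sign, digits, "." as separator, no exponent), and the result is truncated, not rounded.

```shell
# sig_trunc NUMBER [DIGITS]   (hypothetical name; DIGITS defaults to 2)
# The body runs in a subshell, so its variables do not leak to the caller.
sig_trunc() (
    rest=$1 want=${2:-2} out= seen=0 frac=0
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}       # first remaining character
        rest=${rest#?}
        case $c in
            [0-9])
                if [ "$seen" -eq 0 ] && [ "$c" = 0 ]; then
                    out=$out$c              # leading zero: copy as is
                elif [ "$seen" -lt "$want" ]; then
                    out=$out$c              # significant digit: keep it
                    seen=$((seen + 1))
                elif [ "$frac" -eq 0 ]; then
                    out=${out}0             # integer part: pad with a zero
                else
                    break                   # fraction part: drop the rest
                fi ;;
            .)
                [ "$seen" -ge "$want" ] && break   # enough digits already
                frac=1
                out=$out$c ;;
            *)
                out=$out$c ;;               # sign (or anything else): copy
        esac
    done
    printf '%s\n' "$out"
)

sig_trunc 76543        # 76000
sig_trunc 0.0076543    # 0.0076
sig_trunc -76543       # -76000
```

Leading zeros in the input are copied through unchanged, so this sketch does not normalise something like 000123.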

  • More than an exercise: it does not pad the integer with zeroes, nor does it account for an embedded decimal point. But yes, it's doable using this approach (although achieving that may be beyond OP's skills). – Thomas Dickey Feb 23 '16 at 09:26