
I have a file with 40,000 rows:

head flower_all

    0.992957746478873 0.00704225352112677
    0.646410833917366 0.353589166082634
    0.992957746478873 0.00704225352112677
    0.992957746478873 0.00704225352112677
    0.992957746478873 0.00704225352112677
    0.992957746478873 0.00704225352112677
    0.992957746478873 0.00704225352112677
    0.992957746478873 0.00704225352112677
    0.5 0.5

I want to keep only 3 significant digits. My desired output:

0.992 0.007
0.646 0.353
0.992 0.007
0.992 0.007
0.992 0.007
0.992 0.007
0.992 0.007
0.992 0.007
0.5 0.5

How can I do it?

Jeff Schaller
Anna1364

2 Answers


With awk:

awk '{ printf("%.3g %.3g\n", $1, $2) }' file

With the given data, this produces

0.993 0.00704
0.646 0.354
0.993 0.00704
0.993 0.00704
0.993 0.00704
0.993 0.00704
0.993 0.00704
0.993 0.00704
0.5 0.5

Note that 0.00704 has five decimals, but three significant digits.

If you want exactly three decimals, use %.3f instead of %.3g and get

0.993 0.007
0.646 0.354
0.993 0.007
0.993 0.007
0.993 0.007
0.993 0.007
0.993 0.007
0.993 0.007
0.500 0.500

The two variations above may be generalised for a variable number of columns, using GNU awk:

awk -v CONVFMT='%.3g' '{ for (i=1; i<=NF; ++i) $i+=0; print }' file

The loop with $i+=0 forces awk to re-format the value of every field as a floating point number, which it will do while taking CONVFMT into account (it will more or less do the equivalent of $i=sprintf(CONVFMT, $i)).
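The "more or less equivalent" form mentioned above can also be written out explicitly. The following is only a sketch: `fmt_fields` is a hypothetical helper name, not part of awk, and it reads from standard input. One difference to be aware of is that an explicit `sprintf` also reformats integer values (with `%.3f`, `5` becomes `5.000`), whereas conversion through CONVFMT leaves integer values alone.

```shell
# fmt_fields FORMAT: reformat every whitespace-separated field of stdin
# with the given printf-style floating-point format (e.g. %.3g or %.3f).
# The function name is made up for this illustration.
fmt_fields() {
    awk -v fmt="$1" '{
        for (i = 1; i <= NF; ++i)
            $i = sprintf(fmt, $i)   # explicit per-field conversion
        print
    }'
}

line='0.992957746478873 0.00704225352112677 0.5'
printf '%s\n' "$line" | fmt_fields '%.3g'   # prints: 0.993 0.00704 0.5
printf '%s\n' "$line" | fmt_fields '%.3f'   # prints: 0.993 0.007 0.500
```

With the file from the question, this would be used as `fmt_fields '%.3g' <file`.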


If you want to cut the numbers:

awk '{ for (i=1; i<=NF; ++i) $i=sprintf("%.5s", $i); print }' file

This treats the numbers as strings and cuts them off after five characters, which assumes that all numbers are non-negative and less than 10. It generates

0.992 0.007
0.646 0.353
0.992 0.007
0.992 0.007
0.992 0.007
0.992 0.007
0.992 0.007
0.992 0.007
0.5 0.5

For a slightly more general cutting of the numbers:

awk '{ for (i=1; i<=NF; ++i) if (match($i,".*\\.[0-9]?[0-9]?[0-9]?")) $i=substr($i,RSTART,RLENGTH); print }' file

The operation in the loop cuts the numbers at the point at which the given regular expression match ends (if it matches).

Kusalananda

Your data has no numbers above 1. I extended the source file to include some values with more digits before the dot:

$ cat infile
0.992957746478873 0.00704225352112677
0.646410833917366 0.353589166082634
0.992957746478873 0.00704225352112677
0.5 0.5
16.258137489137 333444.277775666
16.233399999999 333777.277111111

printf

One possible solution is to use the C-compatible printf function (awk has one):

the f format (3 (rounded) decimal places)

An exact count of 3 (rounded) decimals:

$ awk '{ printf("%11.3f %11.3f\n", $1,$2) }' infile
      0.993       0.007
      0.646       0.354
      0.993       0.007
      0.500       0.500
     16.258  333444.278
     16.233  333777.277

Note that 0.992957746478873 is rounded up to 0.993.

the g format (significant (rounded))

An exact count of 3 (significant) digits:

$ awk '{ printf("%9.3g %9.3g\n", $1,$2) }' infile
    0.993   0.00704
    0.646     0.354
    0.993   0.00704
      0.5       0.5
     16.3  3.33e+05
     16.2  3.34e+05

Note the rounding on the fourth digit (for example 3.34e+05).
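The switch to exponent notation seen above follows the C printf rule for `%g`: with a precision of 3, values whose decimal exponent is 3 or more (that is, 1000 and above after rounding) are printed in exponent form. A minimal sketch, where `three_sig` is a hypothetical helper name and 999 and 1000 are made-up sample values:

```shell
# three_sig NUMBER...: print each argument with three significant digits,
# one per line, using awk's printf with the %.3g format.
three_sig() {
    for n in "$@"; do
        awk -v n="$n" 'BEGIN { printf("%.3g\n", n) }'
    done
}

three_sig 999 1000 333777.277111111
# prints:
# 999
# 1e+03
# 3.34e+05
```

If exponent notation is not wanted for large values, the f format shown earlier avoids it, at the cost of counting decimals instead of significant digits.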

String (not rounded)

Exactly 3 (not rounded) digits after the decimal dot.

Using GNU awk:

$ gawk '{for(i=1;i<=NF;i++){
         printf( "%12s ",gensub(/([0-9]+\.[0-9]{0,3}).*/, "\\1", "g", $i))};print""}
       ' infile
       0.992        0.007
       0.646        0.353
       0.992        0.007
         0.5          0.5
      16.258   333444.277
      16.233   333777.277

Using sed (probably faster):

$ sed -E 's/([0-9]+\.[0-9]{1,3})[^ ]*/\1/g' infile
0.992 0.007
0.646 0.353
0.992 0.007
0.5 0.5
16.258 333444.277
16.233 333777.277