Internal: Common Precision Mistakes and Misconceptions In IDL

Anonym Tuesday, June 30, 1998

Topic:
Inherent inaccuracy in floating point representation/arithmetic
1 - Roundoff error.
2 - Truncation error.
3 - Converting between single-precision and double precision: IDL and C differences.
4 - Significant digits.
Discussion:
While in most cases it is safe to assume that the result of an arithmetical operation done on your computer is correct, it is important to remember that this finite-precision representation leads to unavoidable errors, especially when floating-point numbers, which are digital approximations to real numbers, are involved.

To understand why floating-point numbers are inherently inaccurate, consider the following:

Floating-point numbers must be made to fit in a space (a string of binary digits in a computer's memory register) that can only hold an integer and a scaling factor.

Floating-point numbers are represented by strings of a limited number of bits, but represent numbers much larger or smaller than that number of digits can be made to express.

In other words, floating-point values are finite-precision approximations of infinitely precise numbers.

ROUNDOFF ERROR

This inherent impreciseness of floating point number representation produces "roundoff errors", which are related to the number of bits in the mantissa of the binary representation of the number for a particular machine (different from the numerical value that can be represented by the machine). Arithmetic operations on floating point numbers introduces cumulative roundoff errors. Depending on the algorithm you are using, a calculation involving n arithmetic operations might have a total roundoff error between SQRT(n) times the machine accuracy and n times the machine accuracy.

TRUNCATION ERROR

Another type of representation error is also present in some numerical algorithms. Truncation error is the error introduced by the process of numerically approximating a continuous function by evaluating it at a finite number of discrete points. Often, accuracy can be increased (again at some cost of computation time) by increasing the number of discrete points evaluated.

For example, consider the process of calculating

    e^x = 1 + x + (x^2/2)! + (x^3/3!) + ... + (x^n/n!)

Obviously, the answer becomes more accurate as n approaches infinity. When performing the actual computation, however, a cutoff value must be specified for n. Increasing n reduces truncation error at the expense of computational effort.

======================
3. Converting between single-precision and double-precision values -- IDL and C differences.

IDL's math is identical to the math used in C, which is IEEE standard floating point math. The math is implemented in the hardware, in the CPU.

o Typecasting a float to a double is not equivalent to initializing a double.
Since the float contains nothing past 7 decimal places, casting it to a
double (via DOUBLE(myFloat)) will give no more accuracy.

o IEEE standard (32-bit) doubles are only have up to 16 decimal places
of accuracy. Therefore, the "corrupted" numbers are really just a result
of formating output into the "junk" range.

The default
C treats floating point constants as double floats by default unless otherwise indicated (see page 37, section 2.3 ("Constants"), second paragraph of Kernighan & Ritchie's "The C Programming language", 2nd ed.). For example, for example, in C the constant "83.02" is already a double precision value. When cast to double, the value does not change. However, in IDL, "83.02" is seen as single precision see also Table 8-1 in the "Using IDL" manual, page 126). A double precision constant representation for the value 83.02 in IDL is "83.02D".

In C, for the statement:
    output = (double) 83.02;

The equivalent statement in IDL would be:
    output = double (83.02D)

%%%%%%%%%%%%%%
IDL example:
%%%%%%%%%%%%%%

PRO CAST_TEST

PRINT, 'The following values are printed with f30.25 format'
PRINT, '===================================================='
PRINT, 83.02, FORMAT='("Constant 83.02:    ",f30.25)'
PRINT, 83.02D, FORMAT='("Constant 83.02D:    ", f30.25)'
PRINT, DOUBLE(83.02), FORMAT='("DOUBLE(83.02):    ", f30.25)'
PRINT, DOUBLE(83.02D), FORMAT='("DOUBLE(83.02D)    ", f30.25)'
F = 83.02
PRINT, F, FORMAT='("F = 83.02:    ", f30.25)'
D = 83.02D
PRINT, D, FORMAT='("D = 83.02D:    ", f30.25)'
PRINT, DOUBLE(F), FORMAT='("DOUBLE(F):    ", f30.25)'
PRINT, DOUBLE(D), FORMAT='("DOUBLE(D):    ", f30.25)'

END

%%%%%%%%%%%%%%%%%%%
IDL example result:
%%%%%%%%%%%%%%%%%%%

The following values are printed with f30.25 format
====================================================
Constant 83.02: 83.0199966430664062500000000
Constant 83.02D: 83.0199999999999960209606797
DOUBLE(83.02): 83.0199966430664062500000000
DOUBLE(83.02D) 83.0199999999999960209606797
F = 83.02: 83.0199966430664062500000000
D = 83.02D: 83.0199999999999960209606797
DOUBLE(F): 83.0199966430664062500000000
DOUBLE(D): 83.0199999999999960209606797

%%%%%%%%%%%%%%
C equivalent:
%%%%%%%%%%%%%%

main()
{
float f1=83.02;
double d1=83.02;

float f2;
double d2;

double df, dd;

f2 = (float)83.02;
d2 = (double)83.02;

df = (double)f1;
dd = (double)d1;

printf("The following values are printed with %%30.25f format ");
printf("==================================================== ");
printf("constant 83.02: %30.25f ", 83.02);
printf("float f1 = 83.02: %30.25f ", f1);
printf("double d1 = 83.02: %30.25f ", d1);
printf("f2 = (float)83.02: %30.25f ", f2);
printf("d2 =(double)83.02: %30.25f ", d2);
printf("df = (double)f: %30.25f ", df);
printf("dd = (double)d: %30.25f ", dd);
}

%%%%%%%%%%%%%%%%%
C example result:
%%%%%%%%%%%%%%%%%

The following values are printed with %30.25f format
====================================================
constant 83.02: 83.0199999999999960209606797
float f1 = 83.02: 83.0199966430664062500000000
double d1 = 83.02: 83.0199999999999960209606797
f2 = (float)83.02: 83.0199966430664062500000000
d2 =(double)83.02: 83.0199999999999960209606797
df = (double)f: 83.0199966430664062500000000
dd = (double)d: 83.0199999999999960209606797Solution:
[Edit this field in the IDL-based Tech Tip Editor, v3]