libc/softfloat/softfloat.txt

$NetBSD: softfloat.txt,v 1.2 2006/11/24 19:46:58 christos Exp $
$FreeBSD$

SoftFloat Release 2a General Documentation

John R. Hauser
1998 December 13


-------------------------------------------------------------------------------
Introduction

SoftFloat is a software implementation of floating-point that conforms to
the IEC/IEEE Standard for Binary Floating-Point Arithmetic.  As many as four
formats are supported:  single precision, double precision, extended double
precision, and quadruple precision.  All operations required by the standard
are implemented, except for conversions to and from decimal.

This document gives information about the types defined and the routines
implemented by SoftFloat.  It does not attempt to define or explain the
IEC/IEEE Floating-Point Standard.  Details about the standard are available
elsewhere.


-------------------------------------------------------------------------------
Limitations

SoftFloat is written in C and is designed to work with other C code.  The
SoftFloat header files assume an ISO/ANSI-style C compiler.  No attempt
has been made to accommodate compilers that are not ISO-conformant.  In
particular, the distributed header files will not be acceptable to any
compiler that does not recognize function prototypes.

Support for the extended double-precision and quadruple-precision formats
depends on a C compiler that implements 64-bit integer arithmetic.  If the
largest integer format supported by the C compiler is 32 bits, SoftFloat is
limited to only single and double precisions.  When that is the case, all
references in this document to the extended double precision, quadruple
precision, and 64-bit integers should be ignored.


-------------------------------------------------------------------------------
Contents

    Introduction
    Limitations
    Contents
    Legal Notice
    Types and Functions
    Rounding Modes
    Extended Double-Precision Rounding Precision
    Exceptions and Exception Flags
    Function Details
        Conversion Functions
        Standard Arithmetic Functions
        Remainder Functions
        Round-to-Integer Functions
        Comparison Functions
        Signaling NaN Test Functions
        Raise-Exception Function
    Contact Information



-------------------------------------------------------------------------------
Legal Notice

SoftFloat was written by John R. Hauser.  This work was made possible in
part by the International Computer Science Institute, located at Suite 600,
1947 Center Street, Berkeley, California 94704.  Funding was partially
provided by the National Science Foundation under grant MIP-9311980.  The
original version of this code was written as part of a project to build
a fixed-point vector processor in collaboration with the University of
California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek.

THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE.  Although reasonable effort
has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
TIMES RESULT IN INCORRECT BEHAVIOR.  USE OF THIS SOFTWARE IS RESTRICTED TO
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.


-------------------------------------------------------------------------------
Types and Functions

When 64-bit integers are supported by the compiler, the `softfloat.h' header
file defines four types:  `float32' (single precision), `float64' (double
precision), `floatx80' (extended double precision), and `float128'
(quadruple precision).  The `float32' and `float64' types are defined in
terms of 32-bit and 64-bit integer types, respectively, while the `float128'
type is defined as a structure of two 64-bit integers, taking into account
the byte order of the particular machine being used.  The `floatx80' type
is defined as a structure containing one 16-bit and one 64-bit integer, with
the machine's byte order again determining the order of the `high' and `low'
fields.

When 64-bit integers are _not_ supported by the compiler, the `softfloat.h'
header file defines only two types:  `float32' and `float64'.  Because
ISO/ANSI C guarantees at least one built-in integer type of 32 bits,
the `float32' type is identified with an appropriate integer type.  The
`float64' type is defined as a structure of two 32-bit integers, with the
machine's byte order determining the order of the fields.

In either case, the types in `softfloat.h' are defined such that if a system
implements the usual C `float' and `double' types according to the IEC/IEEE
Standard, then the `float32' and `float64' types should be indistinguishable
in memory from the native `float' and `double' types.  (On the other hand,
when `float32' or `float64' values are placed in processor registers by
the compiler, the type of registers used may differ from those used for the
native `float' and `double' types.)
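
As an illustration of the in-memory claim above, the following sketch assumes
an IEC/IEEE system whose native `float' is 32 bits, and uses a stand-in
`float32' typedef of `uint32_t' (an assumption; the real definition comes
from `softfloat.h').  Copying bytes with `memcpy' shows that the bit pattern
of a native `float' can be used directly as a `float32' value.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for illustration: softfloat.h chooses its own 32-bit integer type. */
typedef uint32_t float32;

/* Assumes the native float is IEC/IEEE single precision occupying 32 bits. */
_Static_assert(sizeof(float) == sizeof(float32), "float must be 32 bits");

/* Copy the bytes of a native float into a float32 (memcpy avoids aliasing problems). */
float32 float32_from_native(float f)
{
    float32 bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Copy the bytes of a float32 back into a native float. */
float float32_to_native(float32 bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On such a system, `float32_from_native(1.0f)' gives 0x3F800000, the IEC/IEEE
single-precision encoding of 1.0.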

SoftFloat implements the following arithmetic operations:

-- Conversions among all the floating-point formats, and also between
   integers (32-bit and 64-bit) and any of the floating-point formats.

-- The usual add, subtract, multiply, divide, and square root operations
   for all floating-point formats.

-- For each format, the floating-point remainder operation defined by the
   IEC/IEEE Standard.

-- For each floating-point format, a ``round to integer'' operation that
   rounds to an integer value in the same format, according to the current
   rounding mode.  (The floating-point formats can hold integer values, of
   course.)

-- Comparisons between two values in the same floating-point format.

The only functions required by the IEC/IEEE Standard that are not provided
are conversions to and from decimal.


-------------------------------------------------------------------------------
Rounding Modes

All four rounding modes prescribed by the IEC/IEEE Standard are implemented
for all operations that require rounding.  The rounding mode is selected
by the global variable `float_rounding_mode'.  This variable may be set
to one of the values `float_round_nearest_even', `float_round_to_zero',
`float_round_down', or `float_round_up'.  The rounding mode is initialized
to nearest/even.
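
To make the four modes concrete, here is a small self-contained sketch, not
taken from SoftFloat, that rounds a signed fixed-point value with `frac_bits'
fraction bits to an integer.  The `RM_*' names are hypothetical stand-ins for
the four `float_round_*' values, and their numeric values are not those of
`softfloat.h'.

```c
#include <stdint.h>

/* Hypothetical stand-ins mirroring the four float_rounding_mode settings. */
enum round_mode { RM_NEAREST_EVEN, RM_TO_ZERO, RM_DOWN, RM_UP };

/* Round v / 2^frac_bits to an integer using the given mode.
   Assumes 1 <= frac_bits <= 62 and v != INT64_MIN. */
int64_t round_fixed(int64_t v, unsigned frac_bits, enum round_mode mode)
{
    int neg = v < 0;
    uint64_t mag = neg ? (uint64_t)(-v) : (uint64_t)v;      /* work on the magnitude */
    uint64_t q = mag >> frac_bits;                           /* truncated integer part */
    uint64_t rem = mag & (((uint64_t)1 << frac_bits) - 1);   /* discarded fraction bits */
    uint64_t half = (uint64_t)1 << (frac_bits - 1);

    switch (mode) {
    case RM_NEAREST_EVEN:   /* nearest; exact ties go to the even neighbor */
        if (rem > half || (rem == half && (q & 1))) q++;
        break;
    case RM_TO_ZERO:        /* simply truncate the magnitude */
        break;
    case RM_DOWN:           /* toward -infinity: negative values grow in magnitude */
        if (neg && rem) q++;
        break;
    case RM_UP:             /* toward +infinity: positive values grow in magnitude */
        if (!neg && rem) q++;
        break;
    }
    return neg ? -(int64_t)q : (int64_t)q;
}
```

With `frac_bits' equal to 1, the argument 5 represents 2.5: it rounds to 2
under nearest/even and toward zero, and to 3 when rounding up, while -5 (that
is, -2.5) rounds to -3 when rounding down.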


-------------------------------------------------------------------------------
Extended Double-Precision Rounding Precision

For extended double precision (`floatx80') only, the rounding precision
of the standard arithmetic operations is controlled by the global variable
`floatx80_rounding_precision'.  The operations affected are:

   floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt

When `floatx80_rounding_precision' is set to its default value of 80, these
operations are rounded (as usual) to the full precision of the extended
double-precision format.  Setting `floatx80_rounding_precision' to 32
or to 64 causes the operations listed to be rounded to reduced precision
equivalent to single precision (`float32') or to double precision
(`float64'), respectively.  When rounding to reduced precision, additional
bits in the result significand beyond the rounding point are set to zero.
The consequences of setting `floatx80_rounding_precision' to a value other
than 32, 64, or 80 are not specified.  Operations other than the ones listed
above are not affected by `floatx80_rounding_precision'.
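
The reduced-precision rounding can be pictured on the 64-bit significand of a
`floatx80' value alone.  The sketch below uses hypothetical helper names,
handles only nearest/even rounding, and leaves exponent adjustment and
exception flags to the caller.  It keeps 24, 53, or 64 leading significand
bits for precisions 32, 64, and 80 and clears the bits beyond the rounding
point, as described above.

```c
#include <stdint.h>

/* Significand bits kept for a floatx80_rounding_precision value;
   returns 0 for values whose behavior the text leaves unspecified. */
unsigned kept_bits(int rounding_precision)
{
    switch (rounding_precision) {
    case 32: return 24;   /* single-precision significand */
    case 64: return 53;   /* double-precision significand */
    case 80: return 64;   /* full extended significand */
    default: return 0;
    }
}

/* Round a 64-bit significand to its leading 'keep' bits (nearest/even) and
   zero the bits past the rounding point.  If rounding carries out of the top
   bit, *carry is set and the result is renormalized to 0x8000000000000000;
   the caller would then add 1 to the exponent.  Assumes 1 <= keep <= 64. */
uint64_t round_significand(uint64_t sig, unsigned keep, int *carry)
{
    *carry = 0;
    if (keep >= 64) return sig;                  /* nothing to discard */
    unsigned drop = 64 - keep;
    uint64_t lsb  = (uint64_t)1 << drop;         /* weight of the last kept bit */
    uint64_t half = lsb >> 1;
    uint64_t rem  = sig & (lsb - 1);             /* bits beyond the rounding point */
    uint64_t kept = sig & ~(lsb - 1);            /* same value with those bits zeroed */
    if (rem > half || (rem == half && (kept & lsb))) {
        kept += lsb;                             /* round up in magnitude */
        if (kept == 0) {                         /* wrapped past 2^64 */
            *carry = 1;
            kept = (uint64_t)1 << 63;
        }
    }
    return kept;
}
```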


-------------------------------------------------------------------------------
Exceptions and Exception Flags

All five exception flags required by the IEC/IEEE Standard are
implemented.  Each flag is stored as a unique bit in the global variable
`float_exception_flags'.  The positions of the exception flag bits within
this variable are determined by the bit masks `float_flag_inexact',
`float_flag_underflow', `float_flag_overflow', `float_flag_divbyzero', and
`float_flag_invalid'.  The exception flags variable is initialized to all 0,
meaning no exceptions.

An individual exception flag can be cleared with the statement

    float_exception_flags &= ~ float_flag_<exception>;

where `<exception>' is the appropriate name.  To raise a floating-point
exception, the SoftFloat function `float_raise' should be used (see below).
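
A minimal sketch of the flag-clearing idiom, using stand-in definitions so
that it compiles on its own.  The bit values below are assumptions; a real
program would take `float_exception_flags' and the `float_flag_*' masks from
`softfloat.h' rather than defining them.  `clear_flag' and `any_raised' are
hypothetical helpers.

```c
/* Stand-in definitions for illustration only; the real masks and the
   variable's type come from softfloat.h. */
enum {
    float_flag_inexact   = 1,
    float_flag_underflow = 2,
    float_flag_overflow  = 4,
    float_flag_divbyzero = 8,
    float_flag_invalid   = 16
};
int float_exception_flags = 0;   /* all 0: no exceptions */

/* Clear one flag, exactly as in the statement shown in the text. */
void clear_flag(int flag)
{
    float_exception_flags &= ~ flag;
}

/* Return nonzero if any flag in 'mask' is currently set. */
int any_raised(int mask)
{
    return (float_exception_flags & mask) != 0;
}
```

A typical pattern is to clear the flags of interest, call one or more
SoftFloat functions, and then test the flags to see which exceptions occurred.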

In the terminology of the IEC/IEEE Standard, SoftFloat can detect tininess
for underflow either before or after rounding.  The choice is made by
the global variable `float_detect_tininess', which can be set to either
`float_tininess_before_rounding' or `float_tininess_after_rounding'.
Detecting tininess after rounding is better because it results in fewer
spurious underflow signals.  The other option is provided for compatibility
with some systems.  Like most systems, SoftFloat always detects loss of
accuracy for underflow as an inexact result.


-------------------------------------------------------------------------------
Function Details

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Conversion Functions

All conversions among the floating-point formats are supported, as are all
conversions between a floating-point format and 32-bit and 64-bit signed
integers.  The complete set of conversion functions is:

   int32_to_float32      int64_to_float32
   int32_to_float64      int64_to_float64
   int32_to_floatx80     int64_to_floatx80
   int32_to_float128     int64_to_float128

   float32_to_int32      float32_to_int64
   float64_to_int32      float64_to_int64
   floatx80_to_int32     floatx80_to_int64
   float128_to_int32     float128_to_int64

   float32_to_float64    float32_to_floatx80   float32_to_float128
   float64_to_float32    float64_to_floatx80   float64_to_float128
   floatx80_to_float32   floatx80_to_float64   floatx80_to_float128
   float128_to_float32   float128_to_float64   float128_to_floatx80

Each conversion function takes one operand of the appropriate type and
returns one result.  Conversions from a smaller to a larger floating-point
format are always exact and so require no rounding.  Conversions from 32-bit
integers to double precision and larger formats are also exact, and likewise
for conversions from 64-bit integers to extended double and quadruple
precisions.
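
The exactness rules follow from significand widths: 24 bits for single
precision and 53 for double precision.  The sketch below uses the native
`float' and `double' types as stand-ins for `float32' and `float64'
(assuming an IEC/IEEE system) to check whether a given 32-bit integer
survives each conversion unchanged; the function names are hypothetical.

```c
#include <stdint.h>

/* Nonzero if v converts to single precision without rounding.
   The comparison is done in double, which holds every int32_t exactly. */
int int32_exact_in_single(int32_t v)
{
    float f = (float)v;               /* may round: only 24 significand bits */
    return (double)f == (double)v;
}

/* Always nonzero on IEC/IEEE systems: 53 significand bits cover every int32_t. */
int int32_exact_in_double(int32_t v)
{
    double d = (double)v;
    return (int64_t)d == (int64_t)v;
}
```

Since 16777217 (2^24 + 1) is the smallest positive integer that single
precision cannot represent, it is rounded on conversion to single precision
but not on conversion to double precision.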

Conversions from floating-point to integer raise the invalid exception if
the source value cannot be rounded to a representable integer of the desired
size (32 or 64 bits).  If the floating-point operand is a NaN, the largest
positive integer is returned.  Otherwise, if the conversion overflows, the
largest integer with the same sign as the operand is returned.

On conversions to integer, if the floating-point operand is not already an
integer value, the operand is rounded according to the current rounding
mode as specified by `float_rounding_mode'.  Because C (and perhaps other
languages) requires that conversions to integers be rounded toward zero, the
following functions are provided for improved speed and convenience:

   float32_to_int32_round_to_zero    float32_to_int64_round_to_zero
   float64_to_int32_round_to_zero    float64_to_int64_round_to_zero
   floatx80_to_int32_round_to_zero   floatx80_to_int64_round_to_zero
   float128_to_int32_round_to_zero   float128_to_int64_round_to_zero

These variant functions ignore `float_rounding_mode' and always round toward
zero.
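
The NaN and overflow rules, combined with rounding toward zero, can be
sketched on a native `double' standing in for `float64'.  This is an
illustration under that assumption, not SoftFloat's
`float64_to_int32_round_to_zero': it reports the invalid exception through a
hypothetical `invalid' out-parameter instead of `float_exception_flags', and
it does not model the inexact flag.

```c
#include <stdint.h>

/* Convert x to int32_t, rounding toward zero, following the documented
   special cases: a NaN gives the largest positive integer, overflow gives
   the largest integer of the operand's sign, and both set *invalid. */
int32_t to_int32_toward_zero(double x, int *invalid)
{
    *invalid = 0;
    if (x != x) {                      /* a NaN is the only value unequal to itself */
        *invalid = 1;
        return INT32_MAX;
    }
    if (x >= 2147483648.0) {           /* truncates to 2^31 or more */
        *invalid = 1;
        return INT32_MAX;
    }
    if (x <= -2147483649.0) {          /* truncates to -2^31 - 1 or less */
        *invalid = 1;
        return INT32_MIN;
    }
    return (int32_t)x;                 /* C's conversion truncates toward zero */
}
```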

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Standard Arithmetic Functions

The following standard arithmetic functions are provided:

   float32_add    float32_sub    float32_mul    float32_div    float32_sqrt
   float64_add    float64_sub    float64_mul    float64_div    float64_sqrt
   floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
   float128_add   float128_sub   float128_mul   float128_div   float128_sqrt

Each function takes two operands, except for `sqrt', which takes only one.
The operands and result are all of the same type.

Rounding of the extended double-precision (`floatx80') functions is affected
by the `floatx80_rounding_precision' variable, as explained above in the
section _Extended_Double-Precision_Rounding_Precision_.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Remainder Functions

For each format, SoftFloat implements the remainder function according to
the IEC/IEEE Standard.  The remainder functions are:

   float32_rem
   float64_rem
   floatx80_rem
   float128_rem

Each remainder function takes two operands.  The operands and result are all
of the same type.  Given operands x and y, the remainder functions return
the value x - n*y, where n is the integer closest to x/y.  If x/y is exactly
halfway between two integers, n is the even integer closest to x/y.  The
remainder functions are always exact and so require no rounding.
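
The definition of the remainder can be checked with whole-number operands.
The sketch below applies x - n*y, with n the integer nearest to x/y and ties
going to the even integer, to 64-bit integers.  It illustrates the definition
only (the function name is hypothetical); it is not the SoftFloat algorithm,
which operates on significands and exponents of arbitrary magnitude.

```c
#include <stdint.h>
#include <stdlib.h>

/* x - n*y with n the integer nearest x/y, ties to even.
   Assumes y != 0, neither operand is INT64_MIN, and n*y does not overflow. */
int64_t ieee_remainder_int(int64_t x, int64_t y)
{
    int64_t n = x / y;                 /* C division truncates toward zero */
    int64_t r = x - n * y;             /* |r| < |y|, same sign as x */
    int64_t twice = 2 * llabs(r), ay = llabs(y);

    /* Move n one step away from zero when the truncated quotient is not
       the nearest integer, or on an exact tie when it is odd. */
    if (twice > ay || (twice == ay && n % 2 != 0)) {
        n += ((x < 0) == (y < 0)) ? 1 : -1;   /* direction of the true quotient */
        r = x - n * y;
    }
    return r;
}
```

Unlike C's `%' operator, the result can be negative for positive operands:
for x = 7 and y = 2, x/y = 3.5, n = 4, and the remainder is -1.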

Depending on the relative magnitudes of the operands, the remainder
functions can take considerably longer to execute than the other SoftFloat
functions.  This is inherent in the remainder operation itself and is not a
flaw in the SoftFloat implementation.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Round-to-Integer Functions

For each format, SoftFloat implements the round-to-integer function
specified by the IEC/IEEE Standard.  The functions are:

   float32_round_to_int
   float64_round_to_int
   floatx80_round_to_int
   float128_round_to_int

Each function takes a single floating-point operand and returns a result of
the same type.  (Note that the result is not an integer type.)  The operand
is rounded to an exact integer according to the current rounding mode, and
the resulting integer value is returned in the same floating-point format.
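
To show that the result stays in floating-point format, the sketch below
rounds a single-precision bit pattern to an integral value using nearest/even
only.  It assumes `float32' is a 32-bit unsigned integer holding IEC/IEEE
bits, and the function name is hypothetical.  The technique, adding half a
unit at the units position and then clearing the bits below it, is a common
bit-level approach; this is not SoftFloat's code, and it returns NaNs
unchanged rather than quieting signaling NaNs.

```c
#include <stdint.h>

typedef uint32_t float32;   /* assumption: raw IEC/IEEE single-precision bits */

/* Round to an integral value, ties to even; the result is still a float32. */
float32 f32_round_to_int_nearest_even(float32 a)
{
    uint32_t sign = a & 0x80000000u;
    uint32_t exp  = (a >> 23) & 0xFF;            /* biased exponent */

    if (exp >= 0x96) return a;                   /* |a| >= 2^23, Inf, NaN: already integral */
    if (exp < 0x7E) return sign;                 /* |a| < 0.5: signed zero */
    if (exp == 0x7E) {                           /* 0.5 <= |a| < 1 */
        if ((a & 0x007FFFFFu) == 0) return sign; /* exactly 0.5: tie goes to even 0 */
        return sign | 0x3F800000u;               /* otherwise +/-1.0 */
    }
    /* 1 <= |a| < 2^23: 'unit' marks the units position in the encoding and
       'frac' masks the bits below it.  A carry out of the fraction field
       correctly bumps the exponent. */
    uint32_t unit = (uint32_t)1 << (0x96 - exp);
    uint32_t frac = unit - 1;
    uint32_t z = a + (unit >> 1);                /* add one half */
    if ((a & frac) == (unit >> 1)) z &= ~unit;   /* exact tie: force the even integer */
    return z & ~frac;
}
```

For example, 2.5 (0x40200000) rounds to 2.0 (0x40000000) and 3.5
(0x40600000) rounds to 4.0 (0x40800000).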

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Comparison Functions

The following floating-point comparison functions are provided:

   float32_eq    float32_le    float32_lt
   float64_eq    float64_le    float64_lt
   floatx80_eq   floatx80_le   floatx80_lt
   float128_eq   float128_le   float128_lt

Each function takes two operands of the same type and returns a 1 or 0
representing either _true_ or _false_.  The abbreviation `eq' stands for
``equal'' (=); `le' stands for ``less than or equal'' (<=); and `lt' stands
for ``less than'' (<).

The standard greater-than (>), greater-than-or-equal (>=), and not-equal
(!=) functions are easily obtained using the functions provided.  The
not-equal function is just the logical complement of the equal function.
The greater-than-or-equal function is identical to the less-than-or-equal
function with the operands reversed; and the greater-than function can be
obtained from the less-than function in the same way.
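
The derivations above can be sketched against stand-in single-precision
predicates that work directly on IEC/IEEE bit patterns.  All names here are
hypothetical (`f32_*', and `float32' as `uint32_t'), and exception flags are
not modeled; the point is that the greater-than, greater-than-or-equal, and
not-equal tests need nothing beyond reversing operands or complementing a
result.

```c
#include <stdint.h>

typedef uint32_t float32;   /* assumption: raw IEC/IEEE single-precision bits */

/* Nonzero if a is any NaN: exponent all ones and a nonzero fraction. */
static int f32_is_nan(float32 a)
{
    return ((a >> 23) & 0xFF) == 0xFF && (a & 0x007FFFFFu) != 0;
}

/* Nonzero if a and b are both zeros, of either sign. */
static int f32_both_zero(float32 a, float32 b)
{
    return ((a | b) & 0x7FFFFFFFu) == 0;
}

/* Stand-ins for the provided predicates. */
int f32_eq(float32 a, float32 b)
{
    if (f32_is_nan(a) || f32_is_nan(b)) return 0;
    return a == b || f32_both_zero(a, b);        /* +0 equals -0 */
}

int f32_lt(float32 a, float32 b)
{
    if (f32_is_nan(a) || f32_is_nan(b)) return 0;
    int sign_a = (int)(a >> 31), sign_b = (int)(b >> 31);
    if (sign_a != sign_b)                         /* negative < positive, except -0 vs +0 */
        return sign_a && !f32_both_zero(a, b);
    /* Same sign: the bit patterns order like integers, reversed for negatives. */
    return sign_a ? a > b : a < b;
}

int f32_le(float32 a, float32 b) { return f32_lt(a, b) || f32_eq(a, b); }

/* The derived predicates described in the text. */
int f32_gt(float32 a, float32 b) { return f32_lt(b, a); }   /* operands reversed */
int f32_ge(float32 a, float32 b) { return f32_le(b, a); }   /* operands reversed */
int f32_ne(float32 a, float32 b) { return !f32_eq(a, b); }  /* logical complement */
```

Because not-equal is the complement of equal, it returns 1 when either
operand is a NaN, while greater-than-or-equal returns 0 for NaN operands.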

The IEC/IEEE Standard specifies that the less-than-or-equal and less-than
functions raise the invalid exception if either input is any kind of NaN.
The equal functions, on the other hand, are defined not to raise the invalid
exception on quiet NaNs.  For completeness, SoftFloat provides the following
additional functions:

   float32_eq_signaling    float32_le_quiet    float32_lt_quiet
   float64_eq_signaling    float64_le_quiet    float64_lt_quiet
   floatx80_eq_signaling   floatx80_le_quiet   floatx80_lt_quiet
   float128_eq_signaling   float128_le_quiet   float128_lt_quiet

The `signaling' equal functions are identical to the standard functions
except that the invalid exception is raised for any NaN input.  Likewise,
the `quiet' comparison functions are identical to their counterparts except
that the invalid exception is not raised for quiet NaNs.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Signaling NaN Test Functions

The following functions test whether a floating-point value is a signaling
NaN:

   float32_is_signaling_nan
   float64_is_signaling_nan
   floatx80_is_signaling_nan
   float128_is_signaling_nan

The functions take one operand and return 1 if the operand is a signaling
NaN and 0 otherwise.
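
For single precision, such a test can be written directly on the bit
pattern.  The sketch below assumes `float32' is a 32-bit unsigned integer
holding IEC/IEEE bits and uses the common convention in which a NaN whose
leading fraction bit is 0 is signaling; some architectures use the opposite
convention, so this is an illustration rather than SoftFloat's own
definition.  The function name is hypothetical.

```c
#include <stdint.h>

typedef uint32_t float32;   /* assumption: raw IEC/IEEE single-precision bits */

/* Nonzero if a is a signaling NaN under the common convention: exponent all
   ones, leading fraction bit (bit 22) clear, and at least one other fraction
   bit set (otherwise the value would be an infinity). */
int f32_is_signaling_nan(float32 a)
{
    return ((a >> 22) & 0x1FF) == 0x1FE && (a & 0x003FFFFFu) != 0;
}
```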

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Raise-Exception Function

SoftFloat provides a function for raising floating-point exceptions:

    float_raise

The function takes a mask indicating the set of exceptions to raise.  No
result is returned.  In addition to setting the specified exception flags,
this function may cause a trap or abort appropriate for the current system.
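
A stand-in with the behavior described above might look like the following.
The flag values and the `int' parameter type are assumptions, and
`trap_enable_mask' together with the call to `raise(SIGFPE)' is a
hypothetical illustration of one way a system could trap; it is not part of
the interface documented here.

```c
#include <signal.h>

/* Stand-in definitions for illustration; real ones come from softfloat.h. */
enum {
    float_flag_inexact   = 1,
    float_flag_underflow = 2,
    float_flag_overflow  = 4,
    float_flag_divbyzero = 8,
    float_flag_invalid   = 16
};
int float_exception_flags = 0;

/* Hypothetical: which exceptions should trap.  Zero means never trap. */
int trap_enable_mask = 0;

/* Set the requested exception flags; no result is returned. */
void float_raise(int flags)
{
    float_exception_flags |= flags;
    if (flags & trap_enable_mask)
        raise(SIGFPE);                 /* one possible system-specific trap */
}
```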

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
Contact Information

At the time of this writing, the most up-to-date information about
SoftFloat and the latest release can be found at the Web page
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.

129203Scognet