softfloat.txt revision 230363
$NetBSD: softfloat.txt,v 1.2 2006/11/24 19:46:58 christos Exp $
$FreeBSD: head/lib/libc/softfloat/softfloat.txt 230363 2012-01-20 06:16:14Z das $

SoftFloat Release 2a General Documentation

John R. Hauser
1998 December 13


-------------------------------------------------------------------------------
Introduction

SoftFloat is a software implementation of floating-point that conforms to
the IEC/IEEE Standard for Binary Floating-Point Arithmetic. As many as four
formats are supported: single precision, double precision, extended double
precision, and quadruple precision. All operations required by the standard
are implemented, except for conversions to and from decimal.

This document gives information about the types defined and the routines
implemented by SoftFloat. It does not attempt to define or explain the
IEC/IEEE Floating-Point Standard. Details about the standard are available
elsewhere.


-------------------------------------------------------------------------------
Limitations

SoftFloat is written in C and is designed to work with other C code. The
SoftFloat header files assume an ISO/ANSI-style C compiler. No attempt
has been made to accommodate compilers that are not ISO-conformant. In
particular, the distributed header files will not be acceptable to any
compiler that does not recognize function prototypes.

Support for the extended double-precision and quadruple-precision formats
depends on a C compiler that implements 64-bit integer arithmetic.
If the largest integer format supported by the C compiler is 32 bits,
SoftFloat is limited to only single and double precisions. When that is the
case, all references in this document to the extended double precision,
quadruple precision, and 64-bit integers should be ignored.


-------------------------------------------------------------------------------
Contents

    Introduction
    Limitations
    Contents
    Legal Notice
    Types and Functions
    Rounding Modes
    Extended Double-Precision Rounding Precision
    Exceptions and Exception Flags
    Function Details
        Conversion Functions
        Standard Arithmetic Functions
        Remainder Functions
        Round-to-Integer Functions
        Comparison Functions
        Signaling NaN Test Functions
        Raise-Exception Function
    Contact Information



-------------------------------------------------------------------------------
Legal Notice

SoftFloat was written by John R. Hauser. This work was made possible in
part by the International Computer Science Institute, located at Suite 600,
1947 Center Street, Berkeley, California 94704. Funding was partially
provided by the National Science Foundation under grant MIP-9311980. The
original version of this code was written as part of a project to build
a fixed-point vector processor in collaboration with the University of
California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek.

THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.


-------------------------------------------------------------------------------
Types and Functions

When 64-bit integers are supported by the compiler, the `softfloat.h' header
file defines four types: `float32' (single precision), `float64' (double
precision), `floatx80' (extended double precision), and `float128'
(quadruple precision). The `float32' and `float64' types are defined in
terms of 32-bit and 64-bit integer types, respectively, while the `float128'
type is defined as a structure of two 64-bit integers, taking into account
the byte order of the particular machine being used. The `floatx80' type
is defined as a structure containing one 16-bit and one 64-bit integer, with
the machine's byte order again determining the order of the `high' and `low'
fields.

When 64-bit integers are _not_ supported by the compiler, the `softfloat.h'
header file defines only two types: `float32' and `float64'. Because
ISO/ANSI C guarantees at least one built-in integer type of 32 bits,
the `float32' type is identified with an appropriate integer type. The
`float64' type is defined as a structure of two 32-bit integers, with the
machine's byte order determining the order of the fields.

In either case, the types in `softfloat.h' are defined such that if a system
implements the usual C `float' and `double' types according to the IEC/IEEE
Standard, then the `float32' and `float64' types should be indistinguishable
in memory from the native `float' and `double' types. (On the other hand,
when `float32' or `float64' values are placed in processor registers by
the compiler, the type of registers used may differ from those used for the
native `float' and `double' types.)

SoftFloat implements the following arithmetic operations:

-- Conversions among all the floating-point formats, and also between
   integers (32-bit and 64-bit) and any of the floating-point formats.

-- The usual add, subtract, multiply, divide, and square root operations
   for all floating-point formats.

-- For each format, the floating-point remainder operation defined by the
   IEC/IEEE Standard.

-- For each floating-point format, a ``round to integer'' operation that
   rounds to the nearest integer value in the same format. (The
   floating-point formats can hold integer values, of course.)

-- Comparisons between two values in the same floating-point format.

The only functions required by the IEC/IEEE Standard that are not provided
are conversions to and from decimal.
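
The memory-layout statement above can be tried out on a host whose native
`float' follows the IEC/IEEE Standard. The following is only a sketch under
that assumption. The type `float32_bits' is a stand-in for a 32-bit integer
type of the kind `float32' is described as, and the helpers `float_to_bits'
and `bits_to_float' are hypothetical names that are not part of SoftFloat.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a 32-bit integer type like the one behind `float32'.
   This typedef is an assumption for illustration, not SoftFloat's own. */
typedef uint32_t float32_bits;

/* Hypothetical helper: copy the bytes of a native float into the integer. */
static float32_bits float_to_bits(float f)
{
    float32_bits bits;
    memcpy(&bits, &f, sizeof bits);   /* byte copy, no value conversion */
    return bits;
}

/* Hypothetical helper: the reverse copy, from the integer to a float. */
static float bits_to_float(float32_bits bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On such a host, `float_to_bits(1.0f)' gives the single-precision encoding
of 1.0, and `bits_to_float' recovers the original value unchanged.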


-------------------------------------------------------------------------------
Rounding Modes

All four rounding modes prescribed by the IEC/IEEE Standard are implemented
for all operations that require rounding. The rounding mode is selected
by the global variable `float_rounding_mode'. This variable may be set
to one of the values `float_round_nearest_even', `float_round_to_zero',
`float_round_down', or `float_round_up'. The rounding mode is initialized
to nearest/even.


-------------------------------------------------------------------------------
Extended Double-Precision Rounding Precision

For extended double precision (`floatx80') only, the rounding precision
of the standard arithmetic operations is controlled by the global variable
`floatx80_rounding_precision'. The operations affected are:

    floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt

When `floatx80_rounding_precision' is set to its default value of 80, these
operations are rounded (as usual) to the full precision of the extended
double-precision format. Setting `floatx80_rounding_precision' to 32
or to 64 causes the operations listed to be rounded to reduced precision
equivalent to single precision (`float32') or to double precision
(`float64'), respectively. When rounding to reduced precision, additional
bits in the result significand beyond the rounding point are set to zero.
The consequences of setting `floatx80_rounding_precision' to a value other
than 32, 64, or 80 are not specified.
Operations other than the ones listed above are not affected by
`floatx80_rounding_precision'.


-------------------------------------------------------------------------------
Exceptions and Exception Flags

All five exception flags required by the IEC/IEEE Standard are
implemented. Each flag is stored as a unique bit in the global variable
`float_exception_flags'. The positions of the exception flag bits within
this variable are determined by the bit masks `float_flag_inexact',
`float_flag_underflow', `float_flag_overflow', `float_flag_divbyzero', and
`float_flag_invalid'. The exception flags variable is initialized to all 0,
meaning no exceptions.

An individual exception flag can be cleared with the statement

    float_exception_flags &= ~ float_flag_<exception>;

where `<exception>' is the appropriate name. To raise a floating-point
exception, the SoftFloat function `float_raise' should be used (see below).

In the terminology of the IEC/IEEE Standard, SoftFloat can detect tininess
for underflow either before or after rounding. The choice is made by
the global variable `float_detect_tininess', which can be set to either
`float_tininess_before_rounding' or `float_tininess_after_rounding'.
Detecting tininess after rounding is better because it results in fewer
spurious underflow signals. The other option is provided for compatibility
with some systems. Like most systems, SoftFloat always detects loss of
accuracy for underflow as an inexact result.
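
The flag handling described in this section can be sketched as follows.
This is an illustration that compiles without SoftFloat, so the variable
and the masks are redeclared locally. The mask values shown are assumptions
chosen for the example (the real ones come from `softfloat.h'), and
`test_and_clear_flag' is a hypothetical helper, not a SoftFloat function.

```c
/* Stand-in declarations so the sketch builds on its own.  The numeric
   values are assumptions; use the definitions from `softfloat.h' in
   real code. */
enum {
    float_flag_inexact   = 1,
    float_flag_underflow = 2,
    float_flag_overflow  = 4,
    float_flag_divbyzero = 8,
    float_flag_invalid   = 16
};
static int float_exception_flags = 0;   /* all 0 means no exceptions */

/* Hypothetical helper: report whether any flag in `mask' is set, then
   clear those flags with the statement form given in the text. */
static int test_and_clear_flag(int mask)
{
    int was_set = (float_exception_flags & mask) != 0;
    float_exception_flags &= ~mask;
    return was_set;
}
```

Because each flag occupies its own bit, clearing one flag this way leaves
the others untouched. A typical pattern is to clear the flags of interest,
run a computation, and then test them.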


-------------------------------------------------------------------------------
Function Details

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Conversion Functions

All conversions among the floating-point formats are supported, as are all
conversions between a floating-point format and 32-bit and 64-bit signed
integers. The complete set of conversion functions is:

    int32_to_float32         int64_to_float32
    int32_to_float64         int64_to_float64
    int32_to_floatx80        int64_to_floatx80
    int32_to_float128        int64_to_float128

    float32_to_int32         float32_to_int64
    float64_to_int32         float64_to_int64
    floatx80_to_int32        floatx80_to_int64
    float128_to_int32        float128_to_int64

    float32_to_float64       float32_to_floatx80      float32_to_float128
    float64_to_float32       float64_to_floatx80      float64_to_float128
    floatx80_to_float32      floatx80_to_float64      floatx80_to_float128
    float128_to_float32      float128_to_float64      float128_to_floatx80

Each conversion function takes one operand of the appropriate type and
returns one result. Conversions from a smaller to a larger floating-point
format are always exact and so require no rounding. Conversions from 32-bit
integers to double precision and larger formats are also exact, and likewise
for conversions from 64-bit integers to extended double and quadruple
precisions.
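
The exactness statement follows from significand widths: single precision
carries 24 significand bits, which cannot hold every 32-bit integer, while
double precision carries 53. A small sketch using the host's native `float'
and `double' (assumed here to follow the IEC/IEEE Standard) shows the
difference. The helper names are hypothetical and are not SoftFloat
functions.

```c
#include <stdint.h>

/* Hypothetical helper: 1 if `v' survives conversion to native single
   precision unchanged, 0 if the conversion had to round. */
static int exact_in_single(int32_t v)
{
    volatile float f = (float)v;      /* volatile forces a real float store */
    return (int64_t)f == (int64_t)v;  /* compare in 64 bits to avoid overflow */
}

/* Hypothetical helper: the same check for native double precision. */
static int exact_in_double(int32_t v)
{
    volatile double d = (double)v;
    return (int64_t)d == (int64_t)v;
}
```

For example, 16777217 (2^24 + 1) needs 25 significand bits, so it is rounded
on conversion to single precision but is represented exactly in double
precision.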

Conversions from floating-point to integer raise the invalid exception if
the source value cannot be rounded to a representable integer of the desired
size (32 or 64 bits). If the floating-point operand is a NaN, the largest
positive integer is returned. Otherwise, if the conversion overflows, the
largest-magnitude integer with the same sign as the operand is returned.

On conversions to integer, if the floating-point operand is not already an
integer value, the operand is rounded according to the current rounding
mode as specified by `float_rounding_mode'. Because C (and perhaps other
languages) requires that conversions to integers be rounded toward zero, the
following functions are provided for improved speed and convenience:

    float32_to_int32_round_to_zero    float32_to_int64_round_to_zero
    float64_to_int32_round_to_zero    float64_to_int64_round_to_zero
    floatx80_to_int32_round_to_zero   floatx80_to_int64_round_to_zero
    float128_to_int32_round_to_zero   float128_to_int64_round_to_zero

These variant functions ignore `float_rounding_mode' and always round toward
zero.
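
The result rules for out-of-range and NaN operands can be modelled with a
short sketch. It is written for a native `double' operand on a host assumed
to follow the IEC/IEEE Standard, and it is not SoftFloat's implementation:
`model_to_int32_round_to_zero' is a hypothetical name, and the raising of
the invalid exception is not modelled.

```c
#include <stdint.h>

/* Hypothetical model of the documented results of a round-toward-zero
   conversion to a 32-bit integer.  Exception flags are not modelled. */
static int32_t model_to_int32_round_to_zero(double a)
{
    if (a != a) {
        return INT32_MAX;          /* NaN: largest positive integer */
    }
    if (a >= 2147483648.0) {
        return INT32_MAX;          /* positive overflow */
    }
    if (a <= -2147483649.0) {
        return INT32_MIN;          /* negative overflow */
    }
    return (int32_t)a;             /* in range: C truncates toward zero */
}
```

The bounds are chosen so that any value whose truncation fits in 32 bits
takes the last branch; for instance, -2147483648.5 truncates to the most
negative 32-bit integer and is not treated as an overflow.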

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Standard Arithmetic Functions

The following standard arithmetic functions are provided:

    float32_add    float32_sub    float32_mul    float32_div    float32_sqrt
    float64_add    float64_sub    float64_mul    float64_div    float64_sqrt
    floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
    float128_add   float128_sub   float128_mul   float128_div   float128_sqrt

Each function takes two operands, except for `sqrt' which takes only one.
The operands and result are all of the same type.

Rounding of the extended double-precision (`floatx80') functions is affected
by the `floatx80_rounding_precision' variable, as explained above in the
section _Extended_Double-Precision_Rounding_Precision_.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Remainder Functions

For each format, SoftFloat implements the remainder function according to
the IEC/IEEE Standard. The remainder functions are:

    float32_rem
    float64_rem
    floatx80_rem
    float128_rem

Each remainder function takes two operands. The operands and result are all
of the same type. Given operands x and y, the remainder functions return
the value x - n*y, where n is the integer closest to x/y. If x/y is exactly
halfway between two integers, n is the even integer closest to x/y. The
remainder functions are always exact and so require no rounding.
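
The choice of n in the definition above can be illustrated with integer
operands, where every step is easy to check by hand. The sketch below is
not SoftFloat code: `rem_nearest_even' is a hypothetical helper, it assumes
y is nonzero, and it does not guard against integer overflow.

```c
/* Hypothetical helper: x - n*y with n the integer nearest to x/y, and
   ties going to the even n.  Assumes y != 0 and no integer overflow. */
static long rem_nearest_even(long x, long y)
{
    long n = x / y;                  /* quotient truncated toward zero */
    long r = x - n * y;              /* remainder with the sign of x */
    long abs_r = r < 0 ? -r : r;
    long abs_y = y < 0 ? -y : y;

    /* Step n one further from zero when that lands closer to x/y, or on
       an exact tie when the truncated n is odd. */
    if (2 * abs_r > abs_y || (2 * abs_r == abs_y && n % 2 != 0)) {
        if ((x < 0) == (y < 0)) {
            r -= y;                  /* quotient positive: n becomes n + 1 */
        } else {
            r += y;                  /* quotient negative: n becomes n - 1 */
        }
    }
    return r;
}
```

With x = 7 and y = 2, x/y is 3.5, the even neighbour is n = 4, and the
result is 7 - 8 = -1. With x = 5 and y = 2, x/y is 2.5, n = 2, and the
result is 1. Unlike the C `%' operator, the result can have the opposite
sign from x.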

Depending on the relative magnitudes of the operands, the remainder
functions can take considerably longer to execute than the other SoftFloat
functions. This is inherent in the remainder operation itself and is not a
flaw in the SoftFloat implementation.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Round-to-Integer Functions

For each format, SoftFloat implements the round-to-integer function
specified by the IEC/IEEE Standard. The functions are:

    float32_round_to_int
    float64_round_to_int
    floatx80_round_to_int
    float128_round_to_int

Each function takes a single floating-point operand and returns a result of
the same type. (Note that the result is not an integer type.) The operand
is rounded to an exact integer according to the current rounding mode, and
the resulting integer value is returned in the same floating-point format.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Comparison Functions

The following floating-point comparison functions are provided:

    float32_eq    float32_le    float32_lt
    float64_eq    float64_le    float64_lt
    floatx80_eq   floatx80_le   floatx80_lt
    float128_eq   float128_le   float128_lt

Each function takes two operands of the same type and returns a 1 or 0
representing either _true_ or _false_.
The abbreviation `eq' stands for ``equal'' (=); `le' stands for ``less than
or equal'' (<=); and `lt' stands for ``less than'' (<).

The standard greater-than (>), greater-than-or-equal (>=), and not-equal
(!=) functions are easily obtained using the functions provided. The
not-equal function is just the logical complement of the equal function.
The greater-than-or-equal function is identical to the less-than-or-equal
function with the operands reversed; and the greater-than function can be
obtained from the less-than function in the same way.

The IEC/IEEE Standard specifies that the less-than-or-equal and less-than
functions raise the invalid exception if either input is any kind of NaN.
The equal functions, on the other hand, are defined not to raise the invalid
exception on quiet NaNs. For completeness, SoftFloat provides the following
additional functions:

    float32_eq_signaling    float32_le_quiet    float32_lt_quiet
    float64_eq_signaling    float64_le_quiet    float64_lt_quiet
    floatx80_eq_signaling   floatx80_le_quiet   floatx80_lt_quiet
    float128_eq_signaling   float128_le_quiet   float128_lt_quiet

The `signaling' equal functions are identical to the standard functions
except that the invalid exception is raised for any NaN input. Likewise,
the `quiet' comparison functions are identical to their counterparts except
that the invalid exception is not raised for quiet NaNs.
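
The derivation of the remaining comparisons from `eq', `le', and `lt' can
be sketched as below. To keep the example buildable without SoftFloat, the
three base functions are stand-ins written with native `float' operators
(assumed to follow the IEC/IEEE Standard); all `sketch_' names are
hypothetical, and exception flags are not modelled.

```c
#include <math.h>   /* only for the NAN macro, handy when trying these out */

/* Stand-ins for the provided comparisons, using native float operators. */
static int sketch_eq(float a, float b) { return a == b; }
static int sketch_le(float a, float b) { return a <= b; }
static int sketch_lt(float a, float b) { return a <  b; }

/* Not-equal is the logical complement of equal. */
static int sketch_ne(float a, float b) { return !sketch_eq(a, b); }

/* Greater-than-or-equal is less-than-or-equal with operands reversed. */
static int sketch_ge(float a, float b) { return sketch_le(b, a); }

/* Greater-than is less-than with operands reversed. */
static int sketch_gt(float a, float b) { return sketch_lt(b, a); }
```

Reversing the operands, rather than complementing the result, matters when
a NaN is involved: with a NaN operand both `sketch_le' and `sketch_gt'
return 0, so greater-than is not simply the complement of
less-than-or-equal.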

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Signaling NaN Test Functions

The following functions test whether a floating-point value is a signaling
NaN:

    float32_is_signaling_nan
    float64_is_signaling_nan
    floatx80_is_signaling_nan
    float128_is_signaling_nan

The functions take one operand and return 1 if the operand is a signaling
NaN and 0 otherwise.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Raise-Exception Function

SoftFloat provides a function for raising floating-point exceptions:

    float_raise

The function takes a mask indicating the set of exceptions to raise. No
result is returned. In addition to setting the specified exception flags,
this function may cause a trap or abort appropriate for the current system.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
Contact Information

At the time of this writing, the most up-to-date information about
SoftFloat and the latest release can be found at the Web page
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.