testfloat.txt revision 206917

TestFloat Release 2a General Documentation

John R. Hauser
1998 December 16


-------------------------------------------------------------------------------
Introduction

TestFloat is a program for testing that a floating-point implementation
conforms to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
All standard operations supported by the system can be tested, except for
conversions to and from decimal. Any of the following machine formats can
be tested: single precision, double precision, extended double precision,
and/or quadruple precision.

TestFloat actually comes in two variants: one is a program for testing
a machine's floating-point, and the other is a program for testing
the SoftFloat software implementation of floating-point. (Information
about SoftFloat can be found at the SoftFloat Web page,
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.) The
version that tests SoftFloat is expected to be of interest only to people
compiling the SoftFloat sources. However, because the two versions share
much in common, they are discussed together in all the TestFloat
documentation.

This document explains how to use the TestFloat programs. It does not
attempt to define or explain the IEC/IEEE Standard for floating-point.
Details about the standard are available elsewhere.

The first release of TestFloat (Release 1) was called _FloatTest_. The old
name has been obsolete for some time.


-------------------------------------------------------------------------------
Limitations

TestFloat's output is not always easily interpreted. Detailed knowledge
of the IEC/IEEE Standard and its vagaries is needed to use TestFloat
responsibly.

TestFloat performs relatively simple tests designed to check the fundamental
soundness of the floating-point under test. TestFloat may also at times
manage to find rarer and more subtle bugs, but it will probably only find
such bugs by accident. Software that purposefully seeks out various kinds
of subtle floating-point bugs can be found through links posted on the
TestFloat Web page
(`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html').


-------------------------------------------------------------------------------
Contents

    Introduction
    Limitations
    Contents
    Legal Notice
    What TestFloat Does
    Executing TestFloat
    Functions Tested by TestFloat
        Conversion Functions
        Standard Arithmetic Functions
        Remainder and Round-to-Integer Functions
        Comparison Functions
    Interpreting TestFloat Output
    Variations Allowed by the IEC/IEEE Standard
        Underflow
        NaNs
        Conversions to Integer
    TestFloat Options
        -help
        -list
        -level <num>
        -errors <num>
        -errorstop
        -forever
        -checkNaNs
        -precision32, -precision64, -precision80
        -nearesteven, -tozero, -down, -up
        -tininessbefore, -tininessafter
    Function Sets
    Contact Information



-------------------------------------------------------------------------------
Legal Notice

TestFloat was written by John R. Hauser.

THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.


-------------------------------------------------------------------------------
What TestFloat Does

TestFloat tests a system's floating-point by comparing its behavior with
that of TestFloat's own internal floating-point implemented in software.
For each operation tested, TestFloat generates a large number of test cases,
made up of simple pattern tests intermixed with weighted random inputs.
The cases generated should be adequate for testing carry chain propagations,
plus the rounding of adds, subtracts, multiplies, and simple operations like
conversions. TestFloat makes a point of checking all boundary cases of the
arithmetic, including underflows, overflows, invalid operations, subnormal
inputs, zeros (positive and negative), infinities, and NaNs. For the
interesting operations like adds and multiplies, literally millions of test
cases can be checked.

TestFloat is not remarkably good at testing difficult rounding cases for
divisions and square roots. It also makes no attempt to find bugs specific
to SRT divisions and the like (such as the infamous Pentium divide bug).
Software that tests for such failures can be found through links on the
TestFloat Web page,
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html'.

NOTE!
It is the responsibility of the user to verify that the discrepancies
TestFloat finds actually represent faults in the system being tested.
Advice to help with this task is provided later in this document.
Furthermore, even if TestFloat finds no fault with a floating-point
implementation, that in no way guarantees that the implementation is
bug-free.

For each operation, TestFloat can test all four rounding modes required
by the IEC/IEEE Standard. TestFloat verifies not only that the numeric
results of an operation are correct, but also that the proper floating-point
exception flags are raised. All five exception flags are tested, including
the inexact flag. TestFloat does not attempt to verify that the
floating-point exception flags are actually implemented as sticky flags.

For machines that implement extended double precision with rounding
precision control (such as Intel's 80x86), TestFloat can test the add,
subtract, multiply, divide, and square root functions at all the standard
rounding precisions. The rounding precision can be set equivalent to single
precision, to double precision, or to the full extended double precision.
Rounding precision control can only be applied to the extended
double-precision format and only for the five standard arithmetic
operations: add, subtract, multiply, divide, and square root. Other
functions can be tested only at full precision.

As a rule, TestFloat is not particular about the bit patterns of NaNs that
appear as function results. Any NaN is considered as good a result as
another. This laxness can be overridden so that TestFloat checks for
particular bit patterns within NaN results. See the sections
_Variations_Allowed_by_the_IEC/IEEE_Standard_ and _TestFloat_Options_ for
details.

Not all IEC/IEEE Standard functions are supported by all machines.
TestFloat can only test functions that exist on the machine. But even if
a function is supported by the machine, TestFloat may still not be able
to test the function if it is not accessible through standard ISO C (the
programming language in which TestFloat is written) and if the person who
compiled TestFloat did not provide an alternate means for TestFloat to
invoke the machine function.

TestFloat compares a machine's floating-point against the SoftFloat software
implementation of floating-point, also written by me. SoftFloat is built
into the TestFloat executable and does not need to be supplied by the user.
If SoftFloat is wanted for some other reason (to compile a new version
of TestFloat, for instance), it can be found separately at the Web page
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.

For testing SoftFloat itself, the TestFloat package includes a program that
compares SoftFloat's floating-point against _another_ software
floating-point implementation. The second software floating-point is
simpler and slower than SoftFloat, and is completely independent of
SoftFloat. Although the second software floating-point cannot be guaranteed
to be bug-free, the chance that it would mimic any of SoftFloat's bugs is
remote. Consequently, an error in one or the other floating-point version
should appear as an unexpected discrepancy between the two implementations.
Note that testing SoftFloat should only be necessary when compiling a new
TestFloat executable or when compiling SoftFloat for some other reason.


-------------------------------------------------------------------------------
Executing TestFloat

TestFloat is intended to be executed from a command line interpreter. The
`testfloat' program is invoked as follows:

    testfloat [<option>...] <function>

Here square brackets ([]) indicate optional items, while angled brackets
(<>) denote parameters to be filled in.

The `<function>' argument is a name like `float32_add' or `float64_to_int32'.
The complete list of function names is given in the next section,
_Functions_Tested_by_TestFloat_. It is also possible to test all machine
functions in a single invocation. The various options to TestFloat are
detailed in the section _TestFloat_Options_ later in this document.
If `testfloat' is executed without any arguments, a summary of TestFloat
usage is written.

TestFloat will ordinarily test a function for all four rounding modes, one
after the other. If the rounding mode is not supposed to have any effect
on the results--for instance, some operations do not require rounding--only
the nearest/even rounding mode is checked. For extended double-precision
operations affected by rounding precision control, TestFloat also tests all
three rounding precision modes, one after the other. Testing can be limited
to a single rounding mode and/or rounding precision with appropriate options
(see _TestFloat_Options_).

As it executes, TestFloat writes status information to the standard error
output, which should be the screen by default. In order for this status to
be displayed properly, the standard error stream should not be redirected
to a file. The discrepancies TestFloat finds are written to the standard
output stream, which is easily redirected to a file if desired. Ordinarily,
the errors TestFloat reports and the ongoing status information appear
intermixed on the same screen.

The version of TestFloat for testing SoftFloat is called `testsoftfloat'.
It is invoked the same as `testfloat',

    testsoftfloat [<option>...] <function>

and operates similarly.
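
As a rough illustration of the invocation and stream conventions described
in this section, the following Python sketch assembles a command line and
collects only the standard output, leaving the standard error stream
attached to the screen. The helper names `build_command' and
`run_and_collect' are hypothetical and are not part of TestFloat. The sketch
assumes the program is on the search path, and it makes no assumption about
the program's exit status.

```python
import subprocess


def build_command(function, options=(), program="testfloat"):
    """Assemble the argument list: <program> [<option>...] <function>."""
    return [program, *options, function]


def run_and_collect(function, options=(), program="testfloat"):
    """Run one test and return the lines written to standard output.

    Standard error is deliberately not captured, so the ongoing status
    display still reaches the screen.
    """
    completed = subprocess.run(
        build_command(function, options, program),
        stdout=subprocess.PIPE,  # discrepancies are written here
        text=True,
        check=False,             # exit status is not documented here
    )
    return completed.stdout.splitlines()
```

For example, `run_and_collect("float32_add", ["-nearesteven"])' would run
the single-precision add test in one rounding mode only. Passing
`program="testsoftfloat"' selects the other variant, which takes the same
form of arguments.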


-------------------------------------------------------------------------------
Functions Tested by TestFloat

TestFloat tests all operations required by the IEC/IEEE Standard except for
conversions to and from decimal. The operations are

-- Conversions among the supported floating-point formats, and also between
   integers (32-bit and 64-bit) and any of the floating-point formats.

-- The usual add, subtract, multiply, divide, and square root operations
   for all supported floating-point formats.

-- For each format, the floating-point remainder operation defined by the
   IEC/IEEE Standard.

-- For each floating-point format, a ``round to integer'' operation that
   rounds to the nearest integer value in the same format. (The
   floating-point formats can hold integer values, of course.)

-- Comparisons between two values in the same floating-point format.

Detailed information about these functions is given below. In the function
names used by TestFloat, single precision is called `float32', double
precision is `float64', extended double precision is `floatx80', and
quadruple precision is `float128'. TestFloat uses the same names for
functions as SoftFloat.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Conversion Functions

All conversions among the floating-point formats and all conversions between
a floating-point format and 32-bit and 64-bit signed integers can be tested.
The conversion functions are:

    int32_to_float32     int64_to_float32
    int32_to_float64     int64_to_float64
    int32_to_floatx80    int64_to_floatx80
    int32_to_float128    int64_to_float128

    float32_to_int32     float32_to_int64
    float64_to_int32     float64_to_int64
    floatx80_to_int32    floatx80_to_int64
    float128_to_int32    float128_to_int64

    float32_to_float64     float32_to_floatx80    float32_to_float128
    float64_to_float32     float64_to_floatx80    float64_to_float128
    floatx80_to_float32    floatx80_to_float64    floatx80_to_float128
    float128_to_float32    float128_to_float64    float128_to_floatx80

These conversions all round according to the current rounding mode as
necessary. Conversions from a smaller to a larger floating-point format are
always exact and so require no rounding. Conversions from 32-bit integers
to double precision or to any larger floating-point format are also exact,
and likewise for conversions from 64-bit integers to extended double and
quadruple precisions.

ISO/ANSI C requires that conversions to integers be rounded toward zero.
Such conversions can be tested with the following functions that ignore any
rounding mode:

    float32_to_int32_round_to_zero     float32_to_int64_round_to_zero
    float64_to_int32_round_to_zero     float64_to_int64_round_to_zero
    floatx80_to_int32_round_to_zero    floatx80_to_int64_round_to_zero
    float128_to_int32_round_to_zero    float128_to_int64_round_to_zero

TestFloat assumes that conversions from floating-point to integer should
raise the invalid exception if the source value cannot be rounded to a
representable integer of the desired size (32 or 64 bits). If such a
conversion overflows, TestFloat expects the largest integer with the same
sign as the operand to be returned. If the floating-point operand is a NaN,
TestFloat allows either the largest positive or largest negative integer to
be returned.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Standard Arithmetic Functions

The following standard arithmetic functions can be tested:

    float32_add    float32_sub    float32_mul    float32_div    float32_sqrt
    float64_add    float64_sub    float64_mul    float64_div    float64_sqrt
    floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
    float128_add   float128_sub   float128_mul   float128_div   float128_sqrt

The extended double-precision (`floatx80') functions can be rounded to
reduced precision under rounding precision control.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Remainder and Round-to-Integer Functions

For each format, TestFloat can test the IEC/IEEE Standard remainder and
round-to-integer functions. The remainder functions are:

    float32_rem
    float64_rem
    floatx80_rem
    float128_rem

The round-to-integer functions are:

    float32_round_to_int
    float64_round_to_int
    floatx80_round_to_int
    float128_round_to_int

The remainder functions are always exact and so do not require rounding.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Comparison Functions

The following floating-point comparison functions can be tested:

    float32_eq     float32_le     float32_lt
    float64_eq     float64_le     float64_lt
    floatx80_eq    floatx80_le    floatx80_lt
    float128_eq    float128_le    float128_lt

The abbreviation `eq' stands for ``equal'' (=); `le' stands for ``less than
or equal'' (<=); and `lt' stands for ``less than'' (<).

The IEC/IEEE Standard specifies that the less-than-or-equal and less-than
functions raise the invalid exception if either input is any kind of NaN.
The equal functions, for their part, are defined not to raise the invalid
exception on quiet NaNs.
For completeness, the following additional functions can be tested if
supported:

    float32_eq_signaling     float32_le_quiet     float32_lt_quiet
    float64_eq_signaling     float64_le_quiet     float64_lt_quiet
    floatx80_eq_signaling    floatx80_le_quiet    floatx80_lt_quiet
    float128_eq_signaling    float128_le_quiet    float128_lt_quiet

The `signaling' equal functions are identical to the standard functions
except that the invalid exception should be raised for any NaN input.
Likewise, the `quiet' comparison functions should be identical to their
counterparts except that the invalid exception is not raised for quiet NaNs.

Obviously, no comparison functions ever require rounding. Any rounding mode
is ignored.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
Interpreting TestFloat Output

The ``errors'' reported by TestFloat may or may not really represent errors
in the system being tested. For each test case tried, TestFloat performs
the same floating-point operation for the two implementations being compared
and reports any unexpected difference in the results. The two results could
differ for several reasons:

-- The IEC/IEEE Standard allows for some variation in how conforming
   floating-point behaves. Two implementations can occasionally give
   different results without either being incorrect.

-- The trusted floating-point emulation could be faulty.
   This could be because there is a bug in the way the emulation is coded,
   or because a mistake was made when the code was compiled for the current
   system.

-- TestFloat may not work properly, reporting discrepancies that do not
   exist.

-- Lastly, the floating-point being tested could actually be faulty.

It is the responsibility of the user to determine the causes for the
discrepancies TestFloat reports. Making this determination can require
detailed knowledge about the IEC/IEEE Standard. Assuming TestFloat is
working properly, any differences found will be due to either the first or
last of these reasons. Variations in the IEC/IEEE Standard that could lead
to false error reports are discussed in the section
_Variations_Allowed_by_the_IEC/IEEE_Standard_.

For each error (or apparent error) TestFloat reports, a line of text
is written to the default output. If a line would be longer than 79
characters, it is divided. The first part of each error line begins in the
leftmost column, and any subsequent ``continuation'' lines are indented with
a tab.

Each error reported by `testfloat' is of the form:

    <inputs>  soft: <output-from-emulation>  syst: <output-from-system>

The `<inputs>' are the inputs to the operation. Each output is shown as a
pair: the result value first, followed by the exception flags. The `soft'
label stands for ``software'' (or ``SoftFloat''), while `syst' stands for
``system,'' the machine's floating-point.
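
The line format just described can be split into its fields mechanically.
The Python sketch below is only an illustration: the names
`join_continuations' and `parse_error_line' are hypothetical, and joining a
continuation line to its predecessor with a single space is an assumption
about how divided lines should be reassembled.

```python
import re

# <inputs>  soft: <value> <flags>  syst: <value> <flags>
_ERROR_LINE = re.compile(
    r"^(?P<inputs>.*?)\s+soft:\s+(?P<soft_value>\S+)\s+(?P<soft_flags>\S+)"
    r"\s+syst:\s+(?P<syst_value>\S+)\s+(?P<syst_flags>\S+)\s*$"
)


def join_continuations(lines):
    """Merge tab-indented continuation lines into the line they continue."""
    joined = []
    for line in lines:
        if line.startswith("\t") and joined:
            # Assumption: a single space is a safe separator when rejoining.
            joined[-1] += " " + line.strip()
        else:
            joined.append(line.rstrip("\n"))
    return joined


def parse_error_line(line):
    """Split one complete `testfloat' error line into a dictionary."""
    match = _ERROR_LINE.match(line)
    if match is None:
        raise ValueError("not a testfloat error line: %r" % line)
    fields = match.groupdict()
    fields["inputs"] = fields["inputs"].split()
    return fields
```

Applied to a line with two inputs, `parse_error_line' returns the inputs as
a list together with the value and flag strings of each implementation, so
that the two outputs can be compared field by field.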

For example, two typical error lines could be

    800.7FFF00  87F.000100  soft: 001.000000 ....x  syst: 001.000000 ...ux
    081.000004  000.1FFFFF  soft: 001.000000 ....x  syst: 001.000000 ...ux

In the first line, the inputs are `800.7FFF00' and `87F.000100'. The
internal emulation result is `001.000000' with flags `....x', and the
system result is the same but with flags `...ux'. All the items composed of
hexadecimal digits and a single period represent floating-point values (here
single precision). These cases were reported as errors because the flag
results differ.

In addition to the exception flags, there are seven data types that may
be represented. Four are floating-point types: single precision, double
precision, extended double precision, and quadruple precision. The
remaining three types are 32-bit and 64-bit two's-complement integers and
Boolean values (the results of comparison operations). Boolean values are
represented as a single character, either a `0' or a `1'. 32-bit integers
are written as 8 hexadecimal digits in two's-complement form. Thus,
`FFFFFFFF' is -1, and `7FFFFFFF' is the largest positive 32-bit integer.
64-bit integers are the same except with 16 hexadecimal digits.

Floating-point values are written in a correspondingly primitive form.
Double-precision values are represented by 16 hexadecimal digits that give
the raw bits of the floating-point encoding. A period separates the 3rd and
4th hexadecimal digits to mark the division between the exponent bits and
fraction bits.
Some notable double-precision values include:

    000.0000000000000    +0
    3FF.0000000000000     1
    400.0000000000000     2
    7FF.0000000000000    +infinity

    800.0000000000000    -0
    BFF.0000000000000    -1
    C00.0000000000000    -2
    FFF.0000000000000    -infinity

    3FE.FFFFFFFFFFFFF    largest representable number preceding +1

The following categories are easily distinguished (assuming the `x's are not
all 0):

    000.xxxxxxxxxxxxx    positive subnormal (denormalized) numbers
    7FF.xxxxxxxxxxxxx    positive NaNs
    800.xxxxxxxxxxxxx    negative subnormal numbers
    FFF.xxxxxxxxxxxxx    negative NaNs

Quadruple-precision values are written the same except with 4 hexadecimal
digits for the sign and exponent and 28 for the fraction. Notable values
include:

    0000.0000000000000000000000000000    +0
    3FFF.0000000000000000000000000000     1
    4000.0000000000000000000000000000     2
    7FFF.0000000000000000000000000000    +infinity

    8000.0000000000000000000000000000    -0
    BFFF.0000000000000000000000000000    -1
    C000.0000000000000000000000000000    -2
    FFFF.0000000000000000000000000000    -infinity

    3FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF    largest representable number
                                             preceding +1

Extended double-precision values are a little unusual in that the leading
significand bit is not hidden as with other formats.
When correctly encoded, the leading significand bit of an extended
double-precision value will be 0 if the value is zero or subnormal, and will
be 1 otherwise. Hence, the same values listed above appear in extended
double-precision as follows (note the leading `8' digit in the
significands):

    0000.0000000000000000    +0
    3FFF.8000000000000000     1
    4000.8000000000000000     2
    7FFF.8000000000000000    +infinity

    8000.0000000000000000    -0
    BFFF.8000000000000000    -1
    C000.8000000000000000    -2
    FFFF.8000000000000000    -infinity

    3FFE.FFFFFFFFFFFFFFFF    largest representable number preceding +1

The representation of single-precision values is unusual for a different
reason. Because the subfields of standard single-precision do not fall
on neat 4-bit boundaries, single-precision outputs are slightly perturbed.
These are written as 9 hexadecimal digits, with a period separating the 3rd
and 4th hexadecimal digits. Broken out into bits, the 9 hexadecimal digits
cover the single-precision subfields as follows:

    x000 .... .... . .... .... .... .... .... ....    sign       (1 bit)
    .... xxxx xxxx . .... .... .... .... .... ....    exponent   (8 bits)
    .... .... .... . 0xxx xxxx xxxx xxxx xxxx xxxx    fraction  (23 bits)

As shown in this schematic, the first hexadecimal digit contains only
the sign, and will be either `0' or `8'. The next two digits give the
biased exponent as an 8-bit integer. This is followed by a period and
6 hexadecimal digits of fraction.
The most significant hexadecimal digit of the fraction can be at most a `7'.

Notable single-precision values include:

    000.000000    +0
    07F.000000     1
    080.000000     2
    0FF.000000    +infinity

    800.000000    -0
    87F.000000    -1
    880.000000    -2
    8FF.000000    -infinity

    07E.7FFFFF    largest representable number preceding +1

Again, certain categories are easily distinguished (assuming the `x's are
not all 0):

    000.xxxxxx    positive subnormal (denormalized) numbers
    0FF.xxxxxx    positive NaNs
    800.xxxxxx    negative subnormal numbers
    8FF.xxxxxx    negative NaNs

Lastly, exception flag values are represented by five characters, one
character per flag. Each flag is written as either a letter or a period
(`.') according to whether the flag was set or not by the operation. A
period indicates the flag was not set. The letter used to indicate a set
flag depends on the flag:

    v    invalid flag
    z    division-by-zero flag
    o    overflow flag
    u    underflow flag
    x    inexact flag

For example, the notation `...ux' indicates that the underflow and inexact
exception flags were set and that the other three flags (invalid,
division-by-zero, and overflow) were not set. The exception flags are
always shown following the value returned as the result of the operation.
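
To make the single-precision notation and the flag notation concrete, the
following Python sketch converts both back into ordinary values. The
function names `decode_float32' and `decode_flags' are hypothetical and are
not part of TestFloat. The sketch also assumes that each flag letter always
occupies the position shown in the table above (`v', `z', `o', `u', `x' from
left to right), which the `...ux' example suggests but the text does not
state outright.

```python
import struct

# Flag letters in their assumed left-to-right positions.
FLAGS = (
    ("v", "invalid"),
    ("z", "division-by-zero"),
    ("o", "overflow"),
    ("u", "underflow"),
    ("x", "inexact"),
)


def decode_flags(text):
    """Return the names of the flags set in a string such as '...ux'."""
    if len(text) != len(FLAGS):
        raise ValueError("expected five flag characters: %r" % text)
    names = []
    for char, (letter, name) in zip(text, FLAGS):
        if char == letter:
            names.append(name)
        elif char != ".":
            raise ValueError("unexpected flag character: %r" % char)
    return names


def decode_float32(text):
    """Convert the 9-digit single-precision notation to a Python float."""
    head, fraction_hex = text.split(".")
    if len(head) != 3 or len(fraction_hex) != 6:
        raise ValueError("expected the form SEE.FFFFFF: %r" % text)
    head_bits = int(head, 16)          # sign bit, three zero bits, exponent
    fraction = int(fraction_hex, 16)   # 23 fraction bits, top bit zero
    if head_bits & 0x700 or fraction >> 23:
        raise ValueError("padding bits are not zero: %r" % text)
    sign = head_bits >> 11
    exponent = head_bits & 0xFF
    raw = (sign << 31) | (exponent << 23) | fraction
    # Reinterpret the 32 raw bits as an IEEE single-precision value.
    return struct.unpack(">f", struct.pack(">I", raw))[0]
```

With these definitions, `decode_float32("07F.000000")' gives 1.0,
`decode_float32("880.000000")' gives -2.0, and `decode_flags("...ux")'
gives the underflow and inexact flags, in agreement with the tables above.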

The output from `testsoftfloat' is of the same form, except that the results
are labeled `true' and `soft':

    <inputs>  true: <simple-software-result>  soft: <SoftFloat-result>

The ``true'' result is from the simpler, slower software floating-point,
which, although not necessarily correct, is more likely to be right than
the SoftFloat (`soft') result.


-------------------------------------------------------------------------------
Variations Allowed by the IEC/IEEE Standard

The IEC/IEEE Standard admits some variation among conforming
implementations. Because TestFloat expects the two implementations being
compared to deliver bit-for-bit identical results under most circumstances,
this leeway in the standard can result in false errors being reported if
the two implementations do not make the same choices everywhere the standard
provides an option.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Underflow

The standard specifies that the underflow exception flag is to be raised
when two conditions are met simultaneously: (1) _tininess_ and
(2) _loss_of_accuracy_. A result is tiny when its magnitude is nonzero yet
smaller than any normalized floating-point number. The standard allows
tininess to be determined either before or after a result is rounded to the
destination precision.
If tininess is detected before rounding, some borderline cases
will be flagged as underflows even though the result after rounding actually
lies within the normal floating-point range. By detecting tininess after
rounding, a system can avoid some unnecessary signaling of underflow.

Loss of accuracy occurs when the subnormal format is not sufficient
to represent an underflowed result accurately. The standard allows
loss of accuracy to be detected either as an _inexact_result_ or as a
_denormalization_loss_. If loss of accuracy is detected as an inexact
result, the underflow flag is raised whenever an underflowed quantity
cannot be exactly represented in the subnormal format (that is, whenever the
inexact flag is also raised). A denormalization loss, on the other hand,
occurs only when the subnormal format is not able to represent the result
that would have been returned if the destination format had infinite range.
Some underflowed results are inexact but do not suffer a denormalization
loss. By detecting loss of accuracy as a denormalization loss, a system can
once again avoid some unnecessary signaling of underflow.

The `-tininessbefore' and `-tininessafter' options can be used to control
whether TestFloat expects tininess on underflow to be detected before or
after rounding. (See _TestFloat_Options_ below.) One or the other is
selected as the default when TestFloat is compiled, but these command
options allow the default to be overridden.

Most (possibly all) systems detect loss of accuracy as an inexact result.
The current version of TestFloat can only test for this case.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
NaNs

The IEC/IEEE Standard gives the floating-point formats a large number of
NaN encodings and specifies that NaNs are to be returned as results under
certain conditions. However, the standard allows an implementation almost
complete freedom over _which_ NaN to return in each situation.

By default, TestFloat does not check the bit patterns of NaN results. When
the result of an operation should be a NaN, any NaN is considered as good
as another. This laxness can be overridden with the `-checkNaNs' option.
(See _TestFloat_Options_ below.) In order for this option to be sensible,
TestFloat must have been compiled so that its internal floating-point
implementation (SoftFloat) generates the proper NaN results for the system
being tested.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Conversions to Integer

Conversion of a floating-point value to an integer format will fail if the
source value is a NaN or if it is too large. The IEC/IEEE Standard does not
specify what value should be returned as the integer result in these cases.
Moreover, according to the standard, the invalid exception can be raised or
an unspecified alternative mechanism may be used to signal such cases.

TestFloat assumes that conversions to integer will raise the invalid
exception if the source value cannot be rounded to a representable integer.
When the conversion overflows, TestFloat expects the largest integer with
the same sign as the operand to be returned. If the floating-point operand
is a NaN, TestFloat allows either the largest positive or largest negative
integer to be returned. The current version of TestFloat provides no means
to alter these conventions.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
TestFloat Options

The `testfloat' (and `testsoftfloat') program accepts several command
options. If mutually contradictory options are given, the last one has
priority.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-help

The `-help' option causes a summary of program usage to be written, after
which the program exits.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-list

The `-list' option causes a list of testable functions to be written,
after which the program exits. Some machines do not implement all of the
functions TestFloat can test, and it may not be possible to test functions
that are inaccessible from the C language.

The `testsoftfloat' program does not have this option. All SoftFloat
functions can be tested by `testsoftfloat'.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-level <num>

The `-level' option sets the level of testing. The argument to `-level' can
be either 1 or 2. The default is level 1. Level 2 performs many more tests
than level 1. Testing at level 2 can take as much as a day (even longer for
`testsoftfloat'), but can reveal bugs not found by level 1.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-errors <num>

The `-errors' option instructs TestFloat to report no more than the
specified number of errors for any combination of function, rounding mode,
etc. The argument to `-errors' must be a nonnegative decimal number. Once
the specified number of error reports has been generated, TestFloat ends the
current test and begins the next one, if any. The default is `-errors 20'.

Against intuition, `-errors 0' causes TestFloat to report every error it
finds.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-errorstop

The `-errorstop' option causes the program to exit after the first function
for which any errors are reported.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-forever

The `-forever' option causes a single operation to be repeatedly tested.
Only one rounding mode and/or rounding precision can be tested in a single
invocation. If not specified, the rounding mode defaults to nearest/even.
For extended double-precision operations, the rounding precision defaults
to full extended double precision. The testing level is set to 2 by this
option.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-checkNaNs

The `-checkNaNs' option causes TestFloat to verify the bitwise correctness
of NaN results. In order for this option to be sensible, TestFloat must
have been compiled so that its internal floating-point implementation
(SoftFloat) generates the proper NaN results for the system being tested.

This option is not available to `testsoftfloat'.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-precision32, -precision64, -precision80

For extended double-precision functions affected by rounding precision
control, the `-precision32' option restricts testing to only the cases
in which rounding precision is equivalent to single precision. The other
rounding precision options are not tested. Likewise, the `-precision64'
and `-precision80' options fix the rounding precision equivalent to double
precision or extended double precision, respectively. These options are
ignored for functions not affected by rounding precision control.

These options are not available if extended double precision is not
supported by the machine or if extended double precision functions cannot be
tested.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-nearesteven, -tozero, -down, -up

The `-nearesteven' option restricts testing to only the cases in which the
rounding mode is nearest/even. The other rounding mode options are not
tested. Likewise, `-tozero' forces rounding to zero; `-down' forces
rounding down; and `-up' forces rounding up. These options are ignored for
functions that are exact and thus do not round.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-tininessbefore, -tininessafter

The `-tininessbefore' option indicates that the system detects tininess
on underflow before rounding. The `-tininessafter' option indicates that
tininess is detected after rounding. TestFloat alters its expectations
accordingly. These options override the default selected when TestFloat was
compiled. Choosing the wrong one of these two options should cause error
reports for some (not all) functions.

For `testsoftfloat', these options operate more like the rounding precision
and rounding mode options, in that they restrict the tests performed by
`testsoftfloat'. By default, `testsoftfloat' tests both cases for any
function for which there is a difference.
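
The borderline cases on which the two tininess conventions disagree can be
illustrated with exact rational arithmetic. The following is a minimal
sketch for single precision under round-to-nearest/even. It is not how
TestFloat or SoftFloat is implemented; every name in it is hypothetical, and
it classifies tininess only (the underflow flag additionally requires loss
of accuracy).

```python
from fractions import Fraction

PRECISION = 24                      # significand bits of single precision
MIN_NORMAL = Fraction(1, 2**126)    # smallest positive normalized single

def round_unbounded(x, precision=PRECISION):
    """Round a positive Fraction to `precision' significant bits
    (nearest/even), pretending the exponent range is unbounded."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1                      # now 2**e <= x < 2**(e + 1)
    ulp = Fraction(2) ** (e - precision + 1)
    return round(x / ulp) * ulp     # round() on a Fraction is half-to-even

def tiny_before_rounding(exact):
    """Tininess judged on the exact, unrounded result."""
    return 0 < abs(exact) < MIN_NORMAL

def tiny_after_rounding(exact):
    """Tininess judged on the result rounded with unbounded exponent."""
    return exact != 0 and round_unbounded(abs(exact)) < MIN_NORMAL

# A result just below the smallest normal number that rounds up to it.
borderline = MIN_NORMAL * (1 - Fraction(1, 2**26))
print(tiny_before_rounding(borderline))   # True
print(tiny_after_rounding(borderline))    # False
```

An inexact result like `borderline' is reported as an underflow by a system
that detects tininess before rounding but not by one that detects it after
rounding, which is the kind of mismatch that the wrong choice of
`-tininessbefore' or `-tininessafter' would be expected to expose.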

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
Function Sets

Just as TestFloat can test an operation for all four rounding modes in
sequence, multiple operations can be tested with a single invocation of
TestFloat. Three sets are recognized: `-all1', `-all2', and `-all'. The
set `-all1' comprises all one-operand functions; `-all2' is all two-operand
functions; and `-all' is all functions. A function set can be used in place
of a function name in the TestFloat command line, such as

    testfloat [<option>...] -all


-------------------------------------------------------------------------------
Contact Information

At the time of this writing, the most up-to-date information about
TestFloat and the latest release can be found at the Web page
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html'.