TestFloat Release 2a General Documentation

John R. Hauser
1998 December 16


-------------------------------------------------------------------------------
Introduction

TestFloat is a program for testing that a floating-point implementation
conforms to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
All standard operations supported by the system can be tested, except for
conversions to and from decimal.  Any of the following machine formats can
be tested:  single precision, double precision, extended double precision,
and/or quadruple precision.

TestFloat actually comes in two variants:  one is a program for testing
a machine's floating-point, and the other is a program for testing
the SoftFloat software implementation of floating-point.  (Information
about SoftFloat can be found at the SoftFloat Web page,
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.)  The
version that tests SoftFloat is expected to be of interest only to people
compiling the SoftFloat sources.  However, because the two versions share
much in common, they are discussed together in all the TestFloat
documentation.

This document explains how to use the TestFloat programs.  It does not
attempt to define or explain the IEC/IEEE Standard for floating-point.
Details about the standard are available elsewhere.

The first release of TestFloat (Release 1) was called _FloatTest_.  The old
name has been obsolete for some time.


-------------------------------------------------------------------------------
Limitations

TestFloat's output is not always easily interpreted.  Detailed knowledge
of the IEC/IEEE Standard and its vagaries is needed to use TestFloat
responsibly.

TestFloat performs relatively simple tests designed to check the fundamental
soundness of the floating-point under test.  TestFloat may also at times
manage to find rarer and more subtle bugs, but it will probably only find
such bugs by accident.  Software that purposefully seeks out various kinds
of subtle floating-point bugs can be found through links posted on the
TestFloat Web page
(`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html').


-------------------------------------------------------------------------------
Contents

    Introduction
    Limitations
    Contents
    Legal Notice
    What TestFloat Does
    Executing TestFloat
    Functions Tested by TestFloat
        Conversion Functions
        Standard Arithmetic Functions
        Remainder and Round-to-Integer Functions
        Comparison Functions
    Interpreting TestFloat Output
    Variations Allowed by the IEC/IEEE Standard
        Underflow
        NaNs
        Conversions to Integer
    TestFloat Options
        -help
        -list
        -level <num>
        -errors <num>
        -errorstop
        -forever
        -checkNaNs
        -precision32, -precision64, -precision80
        -nearesteven, -tozero, -down, -up
        -tininessbefore, -tininessafter
    Function Sets
    Contact Information



-------------------------------------------------------------------------------
Legal Notice

TestFloat was written by John R. Hauser.

THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE.  Although reasonable effort
has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
TIMES RESULT IN INCORRECT BEHAVIOR.  USE OF THIS SOFTWARE IS RESTRICTED TO
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.


-------------------------------------------------------------------------------
What TestFloat Does

TestFloat tests a system's floating-point by comparing its behavior with
that of TestFloat's own internal floating-point implemented in software.
For each operation tested, TestFloat generates a large number of test cases,
made up of simple pattern tests intermixed with weighted random inputs.
The cases generated should be adequate for testing carry chain propagations,
plus the rounding of adds, subtracts, multiplies, and simple operations like
conversions.  TestFloat makes a point of checking all boundary cases of the
arithmetic, including underflows, overflows, invalid operations, subnormal
inputs, zeros (positive and negative), infinities, and NaNs.  For the
interesting operations like adds and multiplies, literally millions of test
cases can be checked.
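
The comparison described above can be sketched in a few lines.  This is
only an illustration, not TestFloat's actual generator or reference
arithmetic:  it is written in Python rather than ISO C, it checks only
double-precision addition in the nearest/even rounding mode, it ignores the
exception flags, and the names `reference_add', `random_double', and
`find_discrepancies' (like the weighting of the inputs) are invented for
the example.  Exact rational arithmetic plays the part of the trusted
software implementation.

```python
import math
import random
import struct
from fractions import Fraction


def bits_to_double(bits):
    """Reinterpret a 64-bit integer as a double-precision value."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def double_to_bits(x):
    """Return the raw 64-bit encoding of a double-precision value."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def reference_add(a, b):
    """Correctly rounded (nearest/even) sum of two finite doubles."""
    exact = Fraction(a) + Fraction(b)
    if exact == 0:
        # An exact zero sum is -0 only when both operands are -0.
        both_negative = math.copysign(1.0, a) < 0 and math.copysign(1.0, b) < 0
        return -0.0 if both_negative else 0.0
    return float(exact)  # integer true division is correctly rounded


def random_double(rng):
    """Mix simple fraction patterns with random ones (hypothetical weights)."""
    sign = rng.getrandbits(1)
    exponent = rng.randrange(0, 2046)  # finite inputs whose sums stay finite
    fraction = rng.choice([0, 1, 2**52 - 1, rng.getrandbits(52)])
    return bits_to_double((sign << 63) | (exponent << 52) | fraction)


def find_discrepancies(count, seed=0):
    """Compare the machine's addition with the reference, bit for bit."""
    rng = random.Random(seed)
    found = []
    for _ in range(count):
        a, b = random_double(rng), random_double(rng)
        system, reference = a + b, reference_add(a, b)
        if double_to_bits(system) != double_to_bits(reference):
            found.append((a, b, reference, system))
    return found
```

On a machine whose double-precision addition conforms to the standard,
`find_discrepancies(2000)' should return an empty list.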

TestFloat is not remarkably good at testing difficult rounding cases for
divisions and square roots.  It also makes no attempt to find bugs specific
to SRT divisions and the like (such as the infamous Pentium divide bug).
Software that tests for such failures can be found through links on the
TestFloat Web page,
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html'.

NOTE!
It is the responsibility of the user to verify that the discrepancies
TestFloat finds actually represent faults in the system being tested.
Advice to help with this task is provided later in this document.
Furthermore, even if TestFloat finds no fault with a floating-point
implementation, that in no way guarantees that the implementation is
bug-free.

For each operation, TestFloat can test all four rounding modes required
by the IEC/IEEE Standard.  TestFloat verifies not only that the numeric
results of an operation are correct, but also that the proper floating-point
exception flags are raised.  All five exception flags are tested, including
the inexact flag.  TestFloat does not attempt to verify that the
floating-point exception flags are actually implemented as sticky flags.

For machines that implement extended double precision with rounding
precision control (such as Intel's 80x86), TestFloat can test the add,
subtract, multiply, divide, and square root functions at all the standard
rounding precisions.  The rounding precision can be set equivalent to single
precision, to double precision, or to the full extended double precision.
Rounding precision control can only be applied to the extended
double-precision format and only for the five standard arithmetic
operations:  add, subtract, multiply, divide, and square root.  Other
functions can be tested only at full precision.

As a rule, TestFloat is not particular about the bit patterns of NaNs that
appear as function results.  Any NaN is considered as good a result as
another.  This laxness can be overridden so that TestFloat checks for
particular bit patterns within NaN results.  See the sections _Variations_
_Allowed_by_the_IEC/IEEE_Standard_ and _TestFloat_Options_ for details.

Not all IEC/IEEE Standard functions are supported by all machines.
TestFloat can only test functions that exist on the machine.  But even if
a function is supported by the machine, TestFloat may still not be able
to test the function if it is not accessible through standard ISO C (the
programming language in which TestFloat is written) and if the person who
compiled TestFloat did not provide an alternate means for TestFloat to
invoke the machine function.

TestFloat compares a machine's floating-point against the SoftFloat software
implementation of floating-point, also written by me.  SoftFloat is built
into the TestFloat executable and does not need to be supplied by the user.
If SoftFloat is wanted for some other reason (to compile a new version
of TestFloat, for instance), it can be found separately at the Web page
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.

For testing SoftFloat itself, the TestFloat package includes a program that
compares SoftFloat's floating-point against _another_ software
floating-point implementation.  The second software floating-point is
simpler and slower than SoftFloat, and is completely independent of
SoftFloat.  Although the second software floating-point cannot be guaranteed
to be bug-free, the chance that it would mimic any of SoftFloat's bugs is
remote.  Consequently, an error in one or the other floating-point version
should appear as an unexpected discrepancy between the two implementations.
Note that testing SoftFloat should only be necessary when compiling a new
TestFloat executable or when compiling SoftFloat for some other reason.


-------------------------------------------------------------------------------
Executing TestFloat

TestFloat is intended to be executed from a command line interpreter.  The
`testfloat' program is invoked as follows:

    testfloat [<option>...] <function>

Here square brackets ([]) indicate optional items, while angle brackets
(<>) denote parameters to be filled in.

The `<function>' argument is a name like `float32_add' or `float64_to_int32'.
The complete list of function names is given in the next section,
_Functions_Tested_by_TestFloat_.  It is also possible to test all machine
functions in a single invocation.  The various options to TestFloat are
detailed in the section _TestFloat_Options_ later in this document.  If
`testfloat' is executed without any arguments, a summary of TestFloat usage
is written.

TestFloat will ordinarily test a function for all four rounding modes, one
after the other.  If the rounding mode is not supposed to have any effect
on the results--for instance, some operations do not require rounding--only
the nearest/even rounding mode is checked.  For extended double-precision
operations affected by rounding precision control, TestFloat also tests all
three rounding precision modes, one after the other.  Testing can be limited
to a single rounding mode and/or rounding precision with appropriate options
(see _TestFloat_Options_).

As it executes, TestFloat writes status information to the standard error
output, which should be the screen by default.  In order for this status to
be displayed properly, the standard error stream should not be redirected
to a file.  The discrepancies TestFloat finds are written to the standard
output stream, which is easily redirected to a file if desired.  Ordinarily,
the errors TestFloat reports and the ongoing status information appear
intermixed on the same screen.

The version of TestFloat for testing SoftFloat is called `testsoftfloat'.
It is invoked the same as `testfloat',

    testsoftfloat [<option>...] <function>

and operates similarly.


-------------------------------------------------------------------------------
Functions Tested by TestFloat

TestFloat tests all operations required by the IEC/IEEE Standard except for
conversions to and from decimal.  The operations are:

-- Conversions among the supported floating-point formats, and also between
   integers (32-bit and 64-bit) and any of the floating-point formats.

-- The usual add, subtract, multiply, divide, and square root operations
   for all supported floating-point formats.

-- For each format, the floating-point remainder operation defined by the
   IEC/IEEE Standard.

-- For each floating-point format, a ``round to integer'' operation that
   rounds to the nearest integer value in the same format.  (The
   floating-point formats can hold integer values, of course.)

-- Comparisons between two values in the same floating-point format.

Detailed information about these functions is given below.  In the function
names used by TestFloat, single precision is called `float32', double
precision is `float64', extended double precision is `floatx80', and
quadruple precision is `float128'.  TestFloat uses the same names for
functions as SoftFloat.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Conversion Functions

All conversions among the floating-point formats and all conversions between
a floating-point format and 32-bit and 64-bit signed integers can be tested.
The conversion functions are:

   int32_to_float32      int64_to_float32
   int32_to_float64      int64_to_float64
   int32_to_floatx80     int64_to_floatx80
   int32_to_float128     int64_to_float128

   float32_to_int32      float32_to_int64
   float64_to_int32      float64_to_int64
   floatx80_to_int32     floatx80_to_int64
   float128_to_int32     float128_to_int64

   float32_to_float64    float32_to_floatx80   float32_to_float128
   float64_to_float32    float64_to_floatx80   float64_to_float128
   floatx80_to_float32   floatx80_to_float64   floatx80_to_float128
   float128_to_float32   float128_to_float64   float128_to_floatx80

These conversions all round according to the current rounding mode as
necessary.  Conversions from a smaller to a larger floating-point format are
always exact and so require no rounding.  Conversions from 32-bit integers
to double precision or to any larger floating-point format are also exact,
and likewise for conversions from 64-bit integers to extended double and
quadruple precisions.
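
The exactness claims above are easy to check by experiment.  The following
Python sketch is an illustration only (Python floats are double precision,
and the helper names are invented here):  it rounds through single precision
with the `struct' module and tests whether an integer survives conversion
unchanged.

```python
import struct


def to_float32(x):
    """Round a double to single precision (nearest/even) and widen it again."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def int_to_double_is_exact(n):
    """True if converting integer n to double precision loses nothing."""
    return int(float(n)) == n


def int_to_single_is_exact(n):
    """True if converting integer n to single precision loses nothing."""
    return int(to_float32(float(n))) == n
```

Every 32-bit integer passes `int_to_double_is_exact', but 2**53 + 1 (a
64-bit integer) does not, and 2**24 + 1 already fails
`int_to_single_is_exact'.  These are the cases in which the current rounding
mode matters.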

ISO/ANSI C requires that conversions to integers be rounded toward zero.
Such conversions can be tested with the following functions that ignore any
rounding mode:

   float32_to_int32_round_to_zero    float32_to_int64_round_to_zero
   float64_to_int32_round_to_zero    float64_to_int64_round_to_zero
   floatx80_to_int32_round_to_zero   floatx80_to_int64_round_to_zero
   float128_to_int32_round_to_zero   float128_to_int64_round_to_zero

TestFloat assumes that conversions from floating-point to integer should
raise the invalid exception if the source value cannot be rounded to a
representable integer of the desired size (32 or 64 bits).  If such a
conversion overflows, TestFloat expects the largest integer with the same
sign as the operand to be returned.  If the floating-point operand is a NaN,
TestFloat allows either the largest positive or largest negative integer to
be returned.
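
As an illustration of these expectations, the sketch below models a
conversion to a 32-bit integer in the nearest/even rounding mode.  It is
not TestFloat code:  the function name is invented, the operand is a Python
(double-precision) float, and ``largest'' is read as largest in magnitude,
so a negative overflow yields the most negative integer.

```python
import math

INT32_MAX = 2**31 - 1
INT32_MIN = -2**31


def expected_float_to_int32(x):
    """Return (allowed_results, invalid) for converting x to a 32-bit integer.

    allowed_results is a set because a NaN operand may yield either extreme.
    """
    if math.isnan(x):
        return {INT32_MAX, INT32_MIN}, True
    if math.isinf(x):
        return {INT32_MAX if x > 0 else INT32_MIN}, True
    rounded = round(x)  # Python's round() sends halfway cases to even
    if rounded > INT32_MAX:
        return {INT32_MAX}, True
    if rounded < INT32_MIN:
        return {INT32_MIN}, True
    return {rounded}, False
```

Note that the test is made on the rounded value:  2147483647.5 rounds to
2147483648 and so overflows, while -2147483648.5 rounds to -2147483648 and
is still representable.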

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Standard Arithmetic Functions

The following standard arithmetic functions can be tested:

   float32_add    float32_sub    float32_mul    float32_div    float32_sqrt
   float64_add    float64_sub    float64_mul    float64_div    float64_sqrt
   floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
   float128_add   float128_sub   float128_mul   float128_div   float128_sqrt

The extended double-precision (`floatx80') functions can be rounded to
reduced precision under rounding precision control.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Remainder and Round-to-Integer Functions

For each format, TestFloat can test the IEC/IEEE Standard remainder and
round-to-integer functions.  The remainder functions are:

   float32_rem
   float64_rem
   floatx80_rem
   float128_rem

The round-to-integer functions are:

   float32_round_to_int
   float64_round_to_int
   floatx80_round_to_int
   float128_round_to_int

The remainder functions are always exact and so do not require rounding.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Comparison Functions

The following floating-point comparison functions can be tested:

   float32_eq    float32_le    float32_lt
   float64_eq    float64_le    float64_lt
   floatx80_eq   floatx80_le   floatx80_lt
   float128_eq   float128_le   float128_lt

The abbreviation `eq' stands for ``equal'' (=); `le' stands for ``less than
or equal'' (<=); and `lt' stands for ``less than'' (<).

The IEC/IEEE Standard specifies that the less-than-or-equal and less-than
functions raise the invalid exception if either input is any kind of NaN.
The equal functions, for their part, are defined not to raise the invalid
exception on quiet NaNs.  For completeness, the following additional
functions can be tested if supported:

   float32_eq_signaling    float32_le_quiet    float32_lt_quiet
   float64_eq_signaling    float64_le_quiet    float64_lt_quiet
   floatx80_eq_signaling   floatx80_le_quiet   floatx80_lt_quiet
   float128_eq_signaling   float128_le_quiet   float128_lt_quiet

The `signaling' equal functions are identical to the standard functions
except that the invalid exception should be raised for any NaN input.
Likewise, the `quiet' comparison functions should be identical to their
counterparts except that the invalid exception is not raised for quiet NaNs.

Obviously, no comparison functions ever require rounding.  Any rounding mode
is ignored.
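
The rules for the invalid exception can be summarized in a small model.
This Python sketch is illustrative only and does not reproduce SoftFloat's
code.  Because Python floats do not expose signaling NaNs, the caller states
whether a NaN operand is signaling, and the function and parameter names are
invented for the example.

```python
import math

# Comparison names (without the format prefix) that raise the invalid
# exception even when a NaN operand is quiet.
RAISES_ON_QUIET_NAN = {"eq_signaling", "le", "lt"}


def compare(name, a, b, a_signaling=False, b_signaling=False):
    """Return (result, invalid) for a comparison such as 'lt' or 'le_quiet'."""
    relation = name.split("_")[0]
    a_nan, b_nan = math.isnan(a), math.isnan(b)
    if a_nan or b_nan:
        # A signaling NaN operand raises invalid for every comparison.
        signaling = (a_nan and a_signaling) or (b_nan and b_signaling)
        return False, signaling or name in RAISES_ON_QUIET_NAN
    result = {"eq": a == b, "le": a <= b, "lt": a < b}[relation]
    return result, False
```

For instance, `compare("lt", float("nan"), 1.0)' reports the invalid
exception, whereas `compare("lt_quiet", float("nan"), 1.0)' does not unless
the NaN is marked as signaling.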

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
Interpreting TestFloat Output

The ``errors'' reported by TestFloat may or may not really represent errors
in the system being tested.  For each test case tried, TestFloat performs
the same floating-point operation for the two implementations being compared
and reports any unexpected difference in the results.  The two results could
differ for several reasons:

-- The IEC/IEEE Standard allows for some variation in how conforming
   floating-point behaves.  Two implementations can occasionally give
   different results without either being incorrect.

-- The trusted floating-point emulation could be faulty.  This could be
   because there is a bug in the way the emulation is coded, or because a
   mistake was made when the code was compiled for the current system.

-- TestFloat may not work properly, reporting discrepancies that do not
   exist.

-- Lastly, the floating-point being tested could actually be faulty.

It is the responsibility of the user to determine the causes for the
discrepancies TestFloat reports.  Making this determination can require
detailed knowledge about the IEC/IEEE Standard.  Assuming TestFloat is
working properly, any differences found will be due to either the first or
last of these reasons.  Variations in the IEC/IEEE Standard that could lead
to false error reports are discussed in the section _Variations_Allowed_by_
_the_IEC/IEEE_Standard_.

For each error (or apparent error) TestFloat reports, a line of text
is written to the standard output.  If a line would be longer than 79
characters, it is divided.  The first part of each error line begins in the
leftmost column, and any subsequent ``continuation'' lines are indented with
a tab.

Each error reported by `testfloat' is of the form:

    <inputs>  soft: <output-from-emulation>  syst: <output-from-system>

The `<inputs>' are the inputs to the operation.  Each output is shown as a
pair:  the result value first, followed by the exception flags.  The `soft'
label stands for ``software'' (or ``SoftFloat''), while `syst' stands for
``system,'' the machine's floating-point.

For example, two typical error lines could be

    800.7FFF00  87F.000100  soft: 001.000000 ....x  syst: 001.000000 ...ux
    081.000004  000.1FFFFF  soft: 001.000000 ....x  syst: 001.000000 ...ux

In the first line, the inputs are `800.7FFF00' and `87F.000100'.  The
internal emulation result is `001.000000' with flags `....x', and the
system result is the same but with flags `...ux'.  All the items composed of
hexadecimal digits and a single period represent floating-point values (here
single precision).  These cases were reported as errors because the flag
results differ.
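
When many such lines have been redirected to a file, it can help to split
them mechanically.  The following Python sketch is one possible way to do
so and is not part of TestFloat; the function names are invented.  It
assumes that any tab-indented continuation lines have already been joined
to the line they continue.

```python
def parse_error_line(line):
    """Split one `testfloat' error line into inputs and (value, flags) pairs."""
    tokens = line.split()
    soft_at = tokens.index("soft:")
    syst_at = tokens.index("syst:")
    return {
        "inputs": tokens[:soft_at],
        "soft": tuple(tokens[soft_at + 1:syst_at]),
        "syst": tuple(tokens[syst_at + 1:]),
    }


def differing_parts(parsed):
    """List which parts of the two outputs disagree: 'value', 'flags', or both."""
    names = ("value", "flags")
    pairs = zip(names, parsed["soft"], parsed["syst"])
    return [name for name, soft, syst in pairs if soft != syst]
```

Applied to the first sample line above, `differing_parts' returns
`["flags"]', matching the observation that only the flag results differ.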

In addition to the exception flags, there are seven data types that may
be represented.  Four are floating-point types:  single precision, double
precision, extended double precision, and quadruple precision.  The
remaining three types are 32-bit and 64-bit two's-complement integers and
Boolean values (the results of comparison operations).  Boolean values are
represented as a single character, either a `0' or a `1'.  32-bit integers
are written as 8 hexadecimal digits in two's-complement form.  Thus,
`FFFFFFFF' is -1, and `7FFFFFFF' is the largest positive 32-bit integer.
64-bit integers are the same except with 16 hexadecimal digits.
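
A short Python sketch (illustrative; the function name is invented) shows
how the integer notation can be converted back to a signed value.  The
width is inferred from the number of hexadecimal digits.

```python
def decode_int(hex_digits):
    """Decode 8 or 16 hexadecimal digits as a two's-complement integer."""
    width = 4 * len(hex_digits)  # 8 digits -> 32 bits, 16 digits -> 64 bits
    value = int(hex_digits, 16)
    if value >> (width - 1):     # top bit set: the value is negative
        value -= 1 << width
    return value
```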

Floating-point values are written in a correspondingly primitive form.
Double-precision values are represented by 16 hexadecimal digits that give
the raw bits of the floating-point encoding.  A period separates the 3rd and
4th hexadecimal digits to mark the division between the exponent bits and
fraction bits.  Some notable double-precision values include:

    000.0000000000000    +0
    3FF.0000000000000     1
    400.0000000000000     2
    7FF.0000000000000    +infinity

    800.0000000000000    -0
    BFF.0000000000000    -1
    C00.0000000000000    -2
    FFF.0000000000000    -infinity

    3FE.FFFFFFFFFFFFF    largest representable number preceding +1

The following categories are easily distinguished (assuming the `x's are not
all 0):

    000.xxxxxxxxxxxxx    positive subnormal (denormalized) numbers
    7FF.xxxxxxxxxxxxx    positive NaNs
    800.xxxxxxxxxxxxx    negative subnormal numbers
    FFF.xxxxxxxxxxxxx    negative NaNs
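
Because the digits are simply the raw encoding with a period inserted, a
double-precision value can be recovered by removing the period and
reinterpreting the bits.  The Python sketch below is an illustration with
invented function names, not part of TestFloat.

```python
import struct


def decode_double(text):
    """Convert text such as '3FF.0000000000000' to a Python float."""
    bits = int(text.replace(".", ""), 16)
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def classify_double(text):
    """Return (sign, kind) using only the exponent and fraction fields."""
    head, fraction = text.split(".")
    sign = "negative" if int(head, 16) & 0x800 else "positive"
    exponent = int(head, 16) & 0x7FF
    nonzero_fraction = int(fraction, 16) != 0
    if exponent == 0:
        kind = "subnormal" if nonzero_fraction else "zero"
    elif exponent == 0x7FF:
        kind = "NaN" if nonzero_fraction else "infinity"
    else:
        kind = "normal"
    return sign, kind
```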

Quadruple-precision values are written the same except with 4 hexadecimal
digits for the sign and exponent and 28 for the fraction.  Notable values
include:

    0000.0000000000000000000000000000    +0
    3FFF.0000000000000000000000000000     1
    4000.0000000000000000000000000000     2
    7FFF.0000000000000000000000000000    +infinity

    8000.0000000000000000000000000000    -0
    BFFF.0000000000000000000000000000    -1
    C000.0000000000000000000000000000    -2
    FFFF.0000000000000000000000000000    -infinity

    3FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF    largest representable number
                                             preceding +1
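
Most languages have no native quadruple-precision type, so one way to
inspect such values is to decode them into exact rationals.  The Python
sketch below is illustrative only (the function name is invented).  It
returns a string for infinities and NaNs, and it does not preserve the sign
of zero.

```python
from fractions import Fraction


def decode_quad(text):
    """Decode 4 + 28 hexadecimal digits to an exact Fraction."""
    head, fraction = text.split(".")
    sign = -1 if int(head, 16) & 0x8000 else 1
    exponent = int(head, 16) & 0x7FFF       # 15-bit biased exponent
    frac = Fraction(int(fraction, 16), 2**112)  # 112 fraction bits
    if exponent == 0x7FFF:
        if frac:
            return "NaN"
        return "-infinity" if sign < 0 else "+infinity"
    if exponent == 0:                        # zero or subnormal
        return sign * frac * Fraction(2) ** -16382
    return sign * (1 + frac) * Fraction(2) ** (exponent - 16383)
```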

Extended double-precision values are a little unusual in that the leading
significand bit is not hidden as with other formats.  When correctly
encoded, the leading significand bit of an extended double-precision value
will be 0 if the value is zero or subnormal, and will be 1 otherwise.
Hence, the same values listed above appear in extended double-precision as
follows (note the leading `8' digit in the significands):

    0000.0000000000000000    +0
    3FFF.8000000000000000     1
    4000.8000000000000000     2
    7FFF.8000000000000000    +infinity

    8000.0000000000000000    -0
    BFFF.8000000000000000    -1
    C000.8000000000000000    -2
    FFFF.8000000000000000    -infinity

    3FFE.FFFFFFFFFFFFFFFF    largest representable number preceding +1
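
The explicit leading bit changes the decoding slightly:  the whole 64-bit
significand is scaled, with no hidden 1 added.  The following Python sketch
is illustrative only, and its function names are invented.
`is_correctly_encoded' applies the rule stated above to values with a
finite exponent field.

```python
from fractions import Fraction


def decode_extended(text):
    """Decode 4 + 16 hexadecimal digits to an exact Fraction."""
    head, significand = text.split(".")
    sign = -1 if int(head, 16) & 0x8000 else 1
    exponent = int(head, 16) & 0x7FFF
    sig = int(significand, 16)               # 64 bits, leading bit explicit
    if exponent == 0x7FFF:
        if sig & 0x7FFFFFFFFFFFFFFF:
            return "NaN"
        return "-infinity" if sign < 0 else "+infinity"
    # Zero and subnormals (exponent field 0) share the scale of exponent 1.
    scale = max(exponent, 1) - 16383 - 63
    return sign * sig * Fraction(2) ** scale


def is_correctly_encoded(text):
    """Leading significand bit must be 0 exactly when the exponent field is 0."""
    head, significand = text.split(".")
    exponent = int(head, 16) & 0x7FFF
    leading_bit = int(significand, 16) >> 63
    return leading_bit == (1 if exponent != 0 else 0)
```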

The representation of single-precision values is unusual for a different
reason.  Because the subfields of standard single-precision do not fall
on neat 4-bit boundaries, single-precision outputs are slightly perturbed.
These are written as 9 hexadecimal digits, with a period separating the 3rd
and 4th hexadecimal digits.  Broken out into bits, the 9 hexadecimal digits
cover the single-precision subfields as follows:

    x000 .... ....  .  .... .... .... .... .... ....    sign       (1 bit)
    .... xxxx xxxx  .  .... .... .... .... .... ....    exponent   (8 bits)
    .... .... ....  .  0xxx xxxx xxxx xxxx xxxx xxxx    fraction  (23 bits)

As shown in this schematic, the first hexadecimal digit contains only
the sign, and will be either `0' or `8'.  The next two digits give the
biased exponent as an 8-bit integer.  This is followed by a period and
6 hexadecimal digits of fraction.  The most significant hexadecimal digit
of the fraction can be at most a `7'.

Notable single-precision values include:

    000.000000    +0
    07F.000000     1
    080.000000     2
    0FF.000000    +infinity

    800.000000    -0
    87F.000000    -1
    880.000000    -2
    8FF.000000    -infinity

    07E.7FFFFF    largest representable number preceding +1

Again, certain categories are easily distinguished (assuming the `x's are
not all 0):

    000.xxxxxx    positive subnormal (denormalized) numbers
    0FF.xxxxxx    positive NaNs
    800.xxxxxx    negative subnormal numbers
    8FF.xxxxxx    negative NaNs
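
With the single-precision layout in hand, the two sample error lines shown
earlier can be examined exactly.  The Python sketch below is illustrative
only, and its names are invented.  It also ASSUMES that the function under
test was `float32_mul', which the sample lines themselves do not state.
Under that assumption it checks that the exact product lies just below the
smallest normal number yet rounds (nearest/even) up to it.

```python
from fractions import Fraction

MIN_NORMAL = Fraction(1, 2**126)   # encoded as 001.000000
HALF_ULP = Fraction(1, 2**150)     # half the spacing just below MIN_NORMAL


def decode_single(text):
    """Decode the 9-digit single-precision notation to an exact Fraction."""
    head, fraction = text.split(".")
    sign = -1 if int(head, 16) & 0x800 else 1
    exponent = int(head, 16) & 0xFF
    frac = int(fraction, 16)                 # 23 bits, at most 0x7FFFFF
    if exponent == 0xFF:
        if frac:
            return "NaN"
        return "-infinity" if sign < 0 else "+infinity"
    if exponent == 0:                        # zero or subnormal
        return sign * Fraction(frac, 2**149)
    return sign * Fraction(2**23 + frac, 2**23) * Fraction(2) ** (exponent - 127)


def rounds_up_to_min_normal(a_text, b_text):
    """True if the exact product is below MIN_NORMAL but rounds to it."""
    product = decode_single(a_text) * decode_single(b_text)
    return 0 < MIN_NORMAL - product < HALF_ULP
```

Both sample input pairs satisfy `rounds_up_to_min_normal'.  The result is
therefore tiny before rounding but not after, and the two implementations
agree on the value while disagreeing only on the underflow flag.  That
pattern is what one would expect if they detect tininess at different
points (compare the `-tininessbefore' and `-tininessafter' options listed
in the contents).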
531206917SmariusLastly, exception flag values are represented by five characters, one
532206917Smariuscharacter per flag.  Each flag is written as either a letter or a period
533206917Smarius(`.') according to whether the flag was set or not by the operation.  A
534206917Smariusperiod indicates the flag was not set.  The letter used to indicate a set
535206917Smariusflag depends on the flag:
536206917Smarius
537206917Smarius    v    invalid flag
538206917Smarius    z    division-by-zero flag
539206917Smarius    o    overflow flag
540206917Smarius    u    underflow flag
541206917Smarius    x    inexact flag
542206917Smarius
543206917SmariusFor example, the notation `...ux' indicates that the underflow and inexact
544206917Smariusexception flags were set and that the other three flags (invalid, division-
545206917Smariusby-zero, and overflow) were not set.  The exception flags are always shown
546206917Smariusfollowing the value returned as the result of the operation.
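
A minimal Python sketch of reading and writing this five-character field
follows.  It is an illustration, not TestFloat code; the function names are
hypothetical, and it assumes the flags always appear in the fixed order
listed above (as the `...ux' example suggests):

```python
FLAG_LETTERS = "vzoux"
FLAG_NAMES = ("invalid", "division-by-zero", "overflow", "underflow", "inexact")

def decode_flags(text: str) -> set:
    """Return the names of the flags shown as set in a five-character field."""
    if len(text) != len(FLAG_LETTERS):
        raise ValueError("expected exactly five characters")
    raised = set()
    for char, letter, name in zip(text, FLAG_LETTERS, FLAG_NAMES):
        if char == letter:
            raised.add(name)
        elif char != ".":
            raise ValueError("unexpected character %r" % char)
    return raised

def encode_flags(raised) -> str:
    """Write a collection of flag names back in the five-character form."""
    return "".join(letter if name in raised else "."
                   for letter, name in zip(FLAG_LETTERS, FLAG_NAMES))

print(sorted(decode_flags("...ux")))   # ['inexact', 'underflow']
```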

The output from `testsoftfloat' is of the same form, except that the results
are labeled `true' and `soft':

    <inputs>  true: <simple-software-result>  soft: <SoftFloat-result>

The ``true'' result is from the simpler, slower software floating-point,
which, although not necessarily correct, is more likely to be right than
the SoftFloat (`soft') result.


-------------------------------------------------------------------------------
Variations Allowed by the IEC/IEEE Standard

The IEC/IEEE Standard admits some variation among conforming
implementations.  Because TestFloat expects the two implementations being
compared to deliver bit-for-bit identical results under most circumstances,
this leeway in the standard can result in false errors being reported if
the two implementations do not make the same choices everywhere the standard
provides an option.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Underflow

The standard specifies that the underflow exception flag is to be raised
when two conditions are met simultaneously:  (1) _tininess_ and (2) _loss_
_of_accuracy_.  A result is tiny when its magnitude is nonzero yet smaller
than any normalized floating-point number.  The standard allows tininess to
be determined either before or after a result is rounded to the destination
precision.  If tininess is detected before rounding, some borderline cases
will be flagged as underflows even though the result after rounding actually
lies within the normal floating-point range.  By detecting tininess after
rounding, a system can avoid some unnecessary signaling of underflow.

Loss of accuracy occurs when the subnormal format is not sufficient
to represent an underflowed result accurately.  The standard allows
loss of accuracy to be detected either as an _inexact_result_ or as a
_denormalization_loss_.  If loss of accuracy is detected as an inexact
result, the underflow flag is raised whenever an underflowed quantity
cannot be exactly represented in the subnormal format (that is, whenever the
inexact flag is also raised).  A denormalization loss, on the other hand,
occurs only when the subnormal format is not able to represent the result
that would have been returned if the destination format had infinite range.
Some underflowed results are inexact but do not suffer a denormalization
loss.  By detecting loss of accuracy as a denormalization loss, a system can
once again avoid some unnecessary signaling of underflow.

The `-tininessbefore' and `-tininessafter' options can be used to control
whether TestFloat expects tininess on underflow to be detected before or
after rounding.  (See _TestFloat_Options_ below.)  One or the other is
selected as the default when TestFloat is compiled, but these command
options allow the default to be overridden.

Most (possibly all) systems detect loss of accuracy as an inexact result.
The current version of TestFloat can only test for this case.
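
The detection choices described in this section can be made concrete with
exact rational arithmetic.  The following Python sketch is an illustration
only, not TestFloat's own code.  It assumes single precision, the
nearest/even rounding mode, and a positive exact result smaller than 2^-125;
every name in it is hypothetical.

```python
from fractions import Fraction

PRECISION = 24                          # significand bits in single precision
MIN_NORMAL = Fraction(1, 2**126)        # smallest positive normalized value
SUBNORMAL_STEP = Fraction(1, 2**149)    # spacing of single-precision subnormals

def round_unbounded(x: Fraction) -> Fraction:
    """Round positive x to 24 significant bits as if the exponent range
    were unlimited (nearest/even)."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:            # the bit-length estimate can be 1 high
        e -= 1
    ulp = Fraction(2) ** (e - (PRECISION - 1))
    return round(x / ulp) * ulp         # round() on a Fraction is half-to-even

def underflow_conditions(x: Fraction) -> dict:
    """Evaluate each way of detecting tininess and loss of accuracy."""
    wide = round_unbounded(x)
    # Near zero the format delivers a multiple of the subnormal spacing.
    delivered = round(x / SUBNORMAL_STEP) * SUBNORMAL_STEP
    return {
        "tiny_before_rounding": x < MIN_NORMAL,
        "tiny_after_rounding": wide < MIN_NORMAL,
        "inexact_result": delivered != x,
        "denormalization_loss": delivered != wide,
    }

almost_normal = MIN_NORMAL * (1 - Fraction(1, 2**26))
print(underflow_conditions(almost_normal))
```

For the exact result 2^-126 * (1 - 2^-26), the sketch reports tininess before
rounding but not after, because rounding carries the result up to the
smallest normalized number.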

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
NaNs

The IEC/IEEE Standard gives the floating-point formats a large number of
NaN encodings and specifies that NaNs are to be returned as results under
certain conditions.  However, the standard allows an implementation almost
complete freedom over _which_ NaN to return in each situation.

By default, TestFloat does not check the bit patterns of NaN results.  When
the result of an operation should be a NaN, any NaN is considered as good
as another.  This laxness can be overridden with the `-checkNaNs' option.
(See _TestFloat_Options_ below.)  In order for this option to be sensible,
TestFloat must have been compiled so that its internal floating-point
implementation (SoftFloat) generates the proper NaN results for the system
being tested.
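
The two comparison rules can be sketched as follows for single-precision bit
patterns.  This Python fragment is an illustration under stated assumptions,
not TestFloat's implementation; `is_nan32' and `results_agree' are
hypothetical names, and the `check_nans' parameter merely stands in for the
effect of `-checkNaNs'.

```python
def is_nan32(bits: int) -> bool:
    """True if the 32-bit pattern has an all-ones exponent and a nonzero fraction."""
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0

def results_agree(expected: int, actual: int, check_nans: bool = False) -> bool:
    """Compare two single-precision results given as 32-bit patterns."""
    if not check_nans and is_nan32(expected) and is_nan32(actual):
        return True                 # by default any NaN is as good as another
    return expected == actual       # otherwise require bit-for-bit equality

print(results_agree(0x7FC00000, 0xFFC00001))                    # True
print(results_agree(0x7FC00000, 0xFFC00001, check_nans=True))   # False
```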

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Conversions to Integer

Conversion of a floating-point value to an integer format will fail if the
source value is a NaN or if it is too large.  The IEC/IEEE Standard does not
specify what value should be returned as the integer result in these cases.
Moreover, according to the standard, the invalid exception can be raised or
an unspecified alternative mechanism may be used to signal such cases.

TestFloat assumes that conversions to integer will raise the invalid
exception if the source value cannot be rounded to a representable integer.
When the conversion overflows, TestFloat expects the largest integer with
the same sign as the operand to be returned.  If the floating-point operand
is a NaN, TestFloat allows either the largest positive or largest negative
integer to be returned.  The current version of TestFloat provides no means
to alter these conventions.
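
These conventions can be summarized in a short Python sketch.  It is an
illustration only and makes several assumptions not stated above:  a 32-bit
signed destination, the nearest/even rounding mode, and a reading of
``largest negative integer'' as the most negative representable value.  The
function name is hypothetical, and the inexact flag is ignored.

```python
import math

INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def acceptable_int32_results(x: float) -> set:
    """Return the (integer, invalid_raised) pairs accepted for converting x."""
    if math.isnan(x):
        # Either extreme is allowed for a NaN operand.
        return {(INT32_MAX, True), (INT32_MIN, True)}
    if not math.isinf(x):
        rounded = round(x)          # Python rounds halfway cases to even
        if INT32_MIN <= rounded <= INT32_MAX:
            return {(rounded, False)}
    # Overflow: the extreme integer with the same sign as the operand.
    return {(INT32_MAX if x > 0 else INT32_MIN, True)}

print(acceptable_int32_results(2.5))    # {(2, False)}
```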

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
TestFloat Options

The `testfloat' (and `testsoftfloat') program accepts several command
options.  If mutually contradictory options are given, the last one has
priority.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-help

The `-help' option causes a summary of program usage to be written, after
which the program exits.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-list

The `-list' option causes a list of testable functions to be written,
after which the program exits.  Some machines do not implement all of the
functions TestFloat can test, and functions that are inaccessible from the
C language may not be testable.

The `testsoftfloat' program does not have this option.  All SoftFloat
functions can be tested by `testsoftfloat'.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-level <num>

The `-level' option sets the level of testing.  The argument to `-level' can
be either 1 or 2.  The default is level 1.  Level 2 performs many more tests
than level 1.  Testing at level 2 can take as much as a day (even longer for
`testsoftfloat'), but can reveal bugs not found by level 1.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-errors <num>

The `-errors' option instructs TestFloat to report no more than the
specified number of errors for any combination of function, rounding mode,
etc.  The argument to `-errors' must be a nonnegative decimal number.  Once
the specified number of error reports has been generated, TestFloat ends the
current test and begins the next one, if any.  The default is `-errors 20'.

Counterintuitively, `-errors 0' causes TestFloat to report every error it
finds.
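
The reporting limit, including the special meaning of 0, can be sketched in
Python as follows.  This is an illustration of the rule as described, not
TestFloat's code; `run_test' and its arguments are hypothetical.

```python
def run_test(outcomes, max_errors: int = 20) -> list:
    """Return the failing cases that would be reported for one test.

    `outcomes` is an iterable of (case, passed) pairs; a limit of 0 means
    that there is no limit.
    """
    if max_errors < 0:
        raise ValueError("the limit must be nonnegative")
    reported = []
    for case, passed in outcomes:
        if passed:
            continue
        reported.append(case)
        if max_errors != 0 and len(reported) >= max_errors:
            break                   # end this test; the next one may begin
    return reported

failures = [(n, False) for n in range(30)]
print(len(run_test(failures)))                  # 20
print(len(run_test(failures, max_errors=0)))    # 30
```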

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-errorstop

The `-errorstop' option causes the program to exit after the first function
for which any errors are reported.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-forever

The `-forever' option causes a single operation to be repeatedly tested.
Only one rounding mode and/or rounding precision can be tested in a single
invocation.  If not specified, the rounding mode defaults to nearest/even.
For extended double-precision operations, the rounding precision defaults
to full extended double precision.  The testing level is set to 2 by this
option.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-checkNaNs

The `-checkNaNs' option causes TestFloat to verify the bitwise correctness
of NaN results.  In order for this option to be sensible, TestFloat must
have been compiled so that its internal floating-point implementation
(SoftFloat) generates the proper NaN results for the system being tested.

This option is not available to `testsoftfloat'.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-precision32, -precision64, -precision80

For extended double-precision functions affected by rounding precision
control, the `-precision32' option restricts testing to only the cases
in which rounding precision is equivalent to single precision.  The other
rounding precisions are not tested.  Likewise, the `-precision64' and
`-precision80' options fix the rounding precision to be equivalent to double
precision or extended double precision, respectively.  These options are
ignored for functions not affected by rounding precision control.

These options are not available if extended double precision is not
supported by the machine or if extended double precision functions cannot be
tested.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-nearesteven, -tozero, -down, -up

The `-nearesteven' option restricts testing to only the cases in which the
rounding mode is nearest/even.  The other rounding modes are not tested.
Likewise, `-tozero' forces rounding to zero; `-down' forces rounding down;
and `-up' forces rounding up.  These options are ignored for functions that
are exact and thus do not round.
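
Combined with the rule that the last of several contradictory options has
priority, the selection of rounding modes can be sketched in Python as
follows.  This is an illustration, not TestFloat's argument parser; the
function name is hypothetical, and the sketch assumes that all four modes are
tested when none of these options is given.

```python
ROUNDING_OPTIONS = ("-nearesteven", "-tozero", "-down", "-up")

def selected_rounding_modes(args) -> list:
    """Return the rounding-mode options that remain in effect for `args`."""
    chosen = None
    for arg in args:
        if arg in ROUNDING_OPTIONS:
            chosen = arg            # a later option overrides an earlier one
    return [chosen] if chosen is not None else list(ROUNDING_OPTIONS)

print(selected_rounding_modes(["-nearesteven", "-tozero"]))   # ['-tozero']
```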

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-tininessbefore, -tininessafter

The `-tininessbefore' option indicates that the system detects tininess
on underflow before rounding.  The `-tininessafter' option indicates that
tininess is detected after rounding.  TestFloat alters its expectations
accordingly.  These options override the default selected when TestFloat was
compiled.  Choosing the wrong one of these two options should cause error
reports for some (not all) functions.

For `testsoftfloat', these options operate more like the rounding precision
and rounding mode options, in that they restrict the tests performed by
`testsoftfloat'.  By default, `testsoftfloat' tests both cases for any
function for which there is a difference.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
Function Sets

Just as TestFloat can test an operation for all four rounding modes in
sequence, multiple operations can be tested with a single invocation of
TestFloat.  Three sets are recognized:  `-all1', `-all2', and `-all'.  The
set `-all1' comprises all one-operand functions; `-all2' is all two-operand
functions; and `-all' is all functions.  A function set can be used in place
of a function name in the TestFloat command line, such as

    testfloat [<option>...] -all
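
The expansion of a function set can be sketched in Python as follows.  The
table below is a small illustrative subset chosen for the example, not the
list that TestFloat actually tests, and `expand_function_set' is a
hypothetical name.

```python
# Illustrative subset only: function name -> number of operands.
FUNCTIONS = {
    "float32_to_int32": 1,
    "float32_sqrt": 1,
    "float32_add": 2,
    "float64_mul": 2,
}

def expand_function_set(name: str) -> list:
    """Expand a function-set name, or a single function name, into a list."""
    if name == "-all":
        return list(FUNCTIONS)
    if name == "-all1":
        return [f for f, operands in FUNCTIONS.items() if operands == 1]
    if name == "-all2":
        return [f for f, operands in FUNCTIONS.items() if operands == 2]
    if name in FUNCTIONS:
        return [name]
    raise ValueError("unknown function or function set: %s" % name)

print(expand_function_set("-all2"))   # ['float32_add', 'float64_mul']
```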


-------------------------------------------------------------------------------
Contact Information

At the time of this writing, the most up-to-date information about
TestFloat and the latest release can be found at the Web page
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html'.