darwin-ldouble-format revision 169689
Long double format
==================

Each long double is made up of two IEEE doubles.  The value of the
long double is the sum of the values of the two parts (except for
-0.0).  The most significant part is required to be the value of the
long double rounded to the nearest double, as specified by IEEE.  For
Inf values, the least significant part is required to be one of +0.0
or -0.0.  No other requirements are made; so, for example, 1.0 may be
represented as (1.0, +0.0) or (1.0, -0.0), and the low part of a NaN
is don't-care.
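
As an illustration of the rule above, the following C sketch models a
long double as a pair of doubles and checks that the high part equals
the sum of the parts rounded to the nearest double.  The type 'ldpair'
and the function 'ldpair_is_canonical' are hypothetical names used only
for this example; they are not part of any library.  The sketch assumes
finite inputs, IEEE round-to-nearest, and double expressions evaluated
in double precision (FLT_EVAL_METHOD == 0).

```c
#include <stdbool.h>

/* Hypothetical model of the format; not a statement about ABI layout. */
typedef struct {
    double hi;  /* most significant part */
    double lo;  /* least significant part */
} ldpair;

/* Returns true if hi is the double nearest to hi + lo.
   Assumes both parts are finite.  The double addition rounds the
   exact sum to the nearest double, so for a valid pair the rounded
   sum compares equal to hi. */
bool ldpair_is_canonical(ldpair x)
{
    volatile double sum = x.hi + x.lo;  /* volatile forces a store as double */
    return sum == x.hi;
}
```

With this check, (1.0, +0.0) and (1.0, -0.0) are both accepted, while a
pair such as (1.0, 0.5) is rejected because its sum rounds to 1.5.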

Classification
--------------

A long double can represent any value of the form
  s * 2^e * sum(k=0...105: f_k * 2^(-k))
where 's' is +1 or -1, 'e' is between 1022 and -968 inclusive, f_0 is
1, and f_k for k>0 is 0 or 1.  These are the 'normal' long doubles.

A long double can also represent any value of the form
  s * 2^-968 * sum(k=0...105: f_k * 2^(-k))
where 's' is +1 or -1, f_0 is 0, and f_k for k>0 is 0 or 1.  These are
the 'subnormal' long doubles.

There are four long doubles that represent zero, two that represent
+0.0 and two that represent -0.0.  The sign of the high part is the
sign of the long double, and the sign of the low part is ignored.

Likewise, there are four long doubles that represent infinities, two
for +Inf and two for -Inf.

Each NaN, quiet or signalling, that can be represented as a 'double'
can be represented as a 'long double'.  In fact, there are 2^64
equivalent representations for each one.
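
The zero, infinity and NaN rules above all depend on the high part
only, which the following C sketch tries to make concrete.  The names
'ldpair', 'ld_class', 'ldpair_classify' and 'ldpair_signbit' are
hypothetical and exist only for this example.  The sketch assumes its
input is already a valid long double and does not attempt to detect
invalid bit patterns.

```c
#include <math.h>

/* Hypothetical model of the format; not a statement about ABI layout. */
typedef struct {
    double hi;  /* most significant part */
    double lo;  /* least significant part */
} ldpair;

typedef enum { LD_ZERO, LD_INF, LD_NAN, LD_OTHER } ld_class;

/* Classify using the high part alone (input assumed valid). */
ld_class ldpair_classify(ldpair x)
{
    if (isnan(x.hi)) return LD_NAN;   /* low part is don't-care */
    if (isinf(x.hi)) return LD_INF;   /* low part is +0.0 or -0.0 */
    if (x.hi == 0.0) return LD_ZERO;  /* sign of low part is ignored */
    return LD_OTHER;                  /* normal or denormal */
}

/* Returns 1 if the long double is negative, 0 otherwise:
   the sign of the long double is the sign of the high part. */
int ldpair_signbit(ldpair x)
{
    return signbit(x.hi) != 0;
}
```

For example, (-0.0, +0.0) classifies as zero with the sign bit set,
while (+0.0, -0.0) classifies as zero with the sign bit clear.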

There are certain other valid long doubles where both parts are
nonzero but the low part represents a value which has a bit set below
2^(e-105).  These, together with the subnormal long doubles, make up
the denormal long doubles.

Many possible long double bit patterns are not valid long doubles.
These do not represent any value.

Limits
------

The maximum representable long double is 2^1024-2^918.  The smallest
*normal* positive long double is 2^-968.  The smallest denormalised
positive long double is 2^-1074 (this is the same as for 'double').

Conversions
-----------

A double can be converted to a long double by adding a zero low part.

A long double can be converted to a double by removing the low part;
the high part is already the value rounded to the nearest double.
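
The two conversions above can be sketched in C as follows.  The type
'ldpair' and both function names are hypothetical, chosen only for
this illustration.

```c
/* Hypothetical model of the format; not a statement about ABI layout. */
typedef struct {
    double hi;  /* most significant part */
    double lo;  /* least significant part */
} ldpair;

/* double -> long double: add a zero low part. */
ldpair ldpair_from_double(double d)
{
    ldpair x = { d, 0.0 };
    return x;
}

/* long double -> double: remove the low part. */
double ldpair_to_double(ldpair x)
{
    return x.hi;
}
```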

Comparisons
-----------

Two long doubles can be compared by comparing the high parts, and if
those compare equal, comparing the low parts.
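
A minimal C sketch of this comparison is shown here.  The type
'ldpair' and the function 'ldpair_compare' are hypothetical names.  The
sketch assumes neither operand is a NaN; unordered comparisons are not
handled.

```c
/* Hypothetical model of the format; not a statement about ABI layout. */
typedef struct {
    double hi;  /* most significant part */
    double lo;  /* least significant part */
} ldpair;

/* Returns -1 if a < b, +1 if a > b, and 0 if they compare equal.
   NaN operands are not handled in this sketch. */
int ldpair_compare(ldpair a, ldpair b)
{
    if (a.hi < b.hi) return -1;   /* high parts decide first */
    if (a.hi > b.hi) return 1;
    if (a.lo < b.lo) return -1;   /* high parts equal: use low parts */
    if (a.lo > b.lo) return 1;
    return 0;
}
```

Because +0.0 and -0.0 compare equal as doubles, the two
representations of 1.0, (1.0, +0.0) and (1.0, -0.0), compare equal
here as well.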

Arithmetic
----------

The unary negate operation negates both the low and high parts.

An absolute or absolute-negate operation must be done by comparing
against zero and negating if necessary.
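
The low part may have the opposite sign to the high part, so clearing
the sign bit of each part separately would change the value; this is
why the absolute value goes through a sign test and a full negate.
The C sketch below illustrates this.  The names 'ldpair', 'ldpair_neg'
and 'ldpair_abs' are hypothetical.  As an assumption of this sketch,
the "compare against zero" step is written as a sign-bit test on the
high part so that -0.0 is also mapped to +0.0.

```c
#include <math.h>

/* Hypothetical model of the format; not a statement about ABI layout. */
typedef struct {
    double hi;  /* most significant part */
    double lo;  /* least significant part */
} ldpair;

/* Unary negate: negate both parts. */
ldpair ldpair_neg(ldpair x)
{
    ldpair r = { -x.hi, -x.lo };
    return r;
}

/* Absolute value: test the sign of the long double (the sign of the
   high part) and negate the whole pair if it is negative. */
ldpair ldpair_abs(ldpair x)
{
    return signbit(x.hi) ? ldpair_neg(x) : x;
}
```

For example, the absolute value of (-1.0, 2^-60) is (1.0, -2^-60),
and (1.0, -2^-60) is returned unchanged.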

Addition and subtraction are performed using library routines.  They
are not at present performed perfectly accurately; the result produced
will be within 1ulp of the range generated by adding or subtracting
1ulp from the input values, where a 'ulp' is 2^(e-106) given the
exponent 'e'.  In the presence of cancellation, this may be
arbitrarily inaccurate.  Subtraction is done by negation and addition.
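
The library routine itself is not reproduced here.  As a rough
illustration of how a pair-of-doubles addition can work, the C sketch
below uses the textbook error-free "two-sum" technique; it is not
claimed to match the library's algorithm or its error bounds.  All
names ('ldpair', 'two_sum', 'ldpair_add', 'ldpair_sub') are
hypothetical.  The sketch assumes finite inputs, IEEE round-to-nearest,
no fast-math style optimisation, and double expressions evaluated in
double precision (FLT_EVAL_METHOD == 0).

```c
/* Hypothetical model of the format; not a statement about ABI layout. */
typedef struct {
    double hi;  /* most significant part */
    double lo;  /* least significant part */
} ldpair;

/* Two-sum: returns s = round(a + b) in hi and the rounding error
   (a + b) - s in lo, so that hi + lo equals a + b exactly. */
ldpair two_sum(double a, double b)
{
    double s = a + b;
    double bb = s - a;                      /* part of b absorbed into s */
    double err = (a - (s - bb)) + (b - bb); /* what rounding discarded */
    ldpair r = { s, err };
    return r;
}

/* Simple (not perfectly accurate) addition: add the high parts
   exactly, fold in the low parts, then renormalise the pair. */
ldpair ldpair_add(ldpair a, ldpair b)
{
    ldpair s = two_sum(a.hi, b.hi);
    double lo = s.lo + (a.lo + b.lo);
    return two_sum(s.hi, lo);
}

/* Subtraction is done by negation and addition. */
ldpair ldpair_sub(ldpair a, ldpair b)
{
    ldpair nb = { -b.hi, -b.lo };
    return ldpair_add(a, nb);
}
```

For example, adding (1.0, 0.0) and (2^-60, 0.0) gives (1.0, 2^-60), a
value that a single double cannot hold.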

Multiplication is also performed using a library routine.  Its result
will be within 2ulp of the correct result.

Division is also performed using a library routine.  Its result will
be within 3ulp of the correct result.
85284345Ssjg