1Copyright 1999-2023 Free Software Foundation, Inc.
2Contributed by the AriC and Caramba projects, INRIA.
3
4This file is part of the GNU MPFR Library.
5
6The GNU MPFR Library is free software; you can redistribute it and/or modify
7it under the terms of the GNU Lesser General Public License as published by
8the Free Software Foundation; either version 3 of the License, or (at your
9option) any later version.
10
11The GNU MPFR Library is distributed in the hope that it will be useful, but
12WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
13or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
14License for more details.
15
16You should have received a copy of the GNU Lesser General Public License
17along with the GNU MPFR Library; see the file COPYING.LESSER.  If not, see
18https://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
1951 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
20
21Table of contents:
221. Documentation
232. Compiler/library detection
243. Changes in existing functions
254. New functions to implement
265. Efficiency
276. Miscellaneous
287. Portability
29
30
31##############################################################################
321. Documentation
33##############################################################################
34
35- add a description of the algorithms used and a proof of correctness
36
37- when mp*_srcptr is used in a prototype, add "const" in the prototype
38  in the manual, like what is now done in the GMP manual, i.e.
39    mpfr_srcptr -> const mpfr_t
40    mpz_srcptr  -> const mpz_t
41    mpq_srcptr  -> const mpq_t
42    mpf_srcptr  -> const mpf_t
43
44
45##############################################################################
462. Compiler/library detection
47##############################################################################
48
49- update ICC detection.
50  * Use only __INTEL_COMPILER instead of the obsolete macro __ICC?
51
52
53##############################################################################
543. Changes in existing functions
55##############################################################################
56
57- export mpfr_overflow and mpfr_underflow as public functions
58
59- many functions currently taking into account the precision of the *input*
60  variable to set the initial working precision (acosh, asinh, cosh, ...).
61  This is nonsense since the "average" working precision should only depend
62  on the precision of the *output* variable (and maybe on the *value* of
63  the input in case of cancellation).
64  -> remove those dependencies from the input precision.
65
66- mpfr_can_round:
67   change the meaning of the 2nd argument (err). Currently the error is
68   at most 2^(MPFR_EXP(b)-err), i.e. err is the relative shift wrt the
69   most significant bit of the approximation. I propose that the error
70   is now at most 2^err ulps of the approximation, i.e.
71   2^(MPFR_EXP(b)-MPFR_PREC(b)+err).
72
73- mpfr_set_q first tries to convert the numerator and the denominator
74  to mpfr_t. But this conversion may fail even if the correctly rounded
75  result is representable. New way to implement:
76  Function q = a/b.  nq = PREC(q) na = PREC(a) nb = PREC(b)
77    If na < nb
78       a <- a*2^(nb-na)
79    n <- na-nb+ (HIGH(a,nb) >= b)
80    if (n >= nq)
81       bb <- b*2^(n-nq)
82       a  = q*bb+r     --> q has exactly n bits.
83    else
84       aa <- a*2^(nq-n)
85       aa = q*b+r      --> q has exactly n bits.
86  If RNDN, takes nq+1 bits. (See also the new division function).
87
88- revisit the conversion functions between a MPFR number and a native
89  floating-point value.
90  * Consequences if some exception is trapped?
91  * Specify under which conditions (current rounding direction and
92    precision of the FPU, whether a format has been recognized...),
93    correct rounding is guaranteed. Fix the code if need be. Do not
94    forget subnormals.
95  * Provide mpfr_buildopt_* functions to tell whether the format of a
96    native type (float / double / long double) has been recognized and
97    which format it is?
98  * For functions that return a native floating-point value (mpfr_get_flt,
99    mpfr_get_d, mpfr_get_ld, mpfr_get_decimal64), in case of underflow or
100    overflow, follow the convention used for the functions in <math.h>?
101    See ��7.12.1 "Treatment of error conditions" of ISO C11, which provides
102    two ways of handling error conditions, depending on math_errhandling:
103    errno (to be set to ERANGE here) and floating-point exceptions.
104    If floating-point exceptions need to be generated, do not use
105    feraiseexcept(), as this function may require the math library (-lm);
106    use a floating-point expression instead, such as DBL_MIN * DBL_MIN
107    (underflow) or DBL_MAX * DBL_MAX (overflow), which are probably safe
108    as used in the GNU libc implementation.
109  * For testing the lack of subnormal support:
110    see the -mfpu GCC option for ARM and
111    https://en.wikipedia.org/wiki/Denormal_number#Disabling_denormal_floats_at_the_code_level
112
113
114##############################################################################
1154. New functions to implement
116##############################################################################
117
118- cr_xxx functions from https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf
119  (section 7.31.8, page 392):
120  rsqrt (differs from mpfr_rec_sqrt in -0: mpfr_rec_sqrt gives +Inf whereas
121         rsqrt gives -Inf)
122- a function to compute the hash of a floating-point number
123  (suggested by Patrick Pelissier)
124- implement new functions from the C++17 standard:
125  https://en.cppreference.com/w/cpp/numeric/special_functions
126  assoc_laguerre, assoc_legendre, comp_ellint_1, comp_ellint_2, comp_ellint_3,
127  cyl_bessel_i, cyl_bessel_j, cyl_bessel_k, cyl_neumann, ellint_1, ellint_2,
128  ellint_3, hermite, legendre, laguerre, sph_bessel, sph_legendre,
129  sph_neumann.
130  Already in mpfr4: beta and riemann_zeta.
131  See also https://isocpp.org/files/papers/P0226R1.pdf and ��29.9.5 in the
132  C++17 draft:
133  https://github.com/cplusplus/draft/blob/master/source/numerics.tex
134- implement mpfr_q_sub, mpfr_z_div, mpfr_q_div?
135- implement mpfr_pow_q and variants with two integers (native or mpz)
136  instead of a rational? See IEEE P1788.
137- implement functions for random distributions, see for example
138  https://sympa.inria.fr/sympa/arc/mpfr/2010-01/msg00034.html
139  (suggested by Charles Karney <ckarney@Sarnoff.com>, 18 Jan 2010):
140   * a Bernoulli distribution with prob p/q (exact)
141   * a general discrete distribution (i with prob w[i]/sum(w[i]) (Walker
142     algorithm, but make it exact)
143   * a uniform distribution in (a,b)
144   * exponential distribution (mean lambda) (von Neumann's method?)
145   * normal distribution (mean m, s.d. sigma) (ratio method?)
146- wanted for Magma [John Cannon <john@maths.usyd.edu.au>, Tue, 19 Apr 2005]:
147  HypergeometricU(a,b,s) = 1/gamma(a)*int(exp(-su)*u^(a-1)*(1+u)^(b-a-1),
148                                    u=0..infinity)
149  JacobiThetaNullK
150  PolylogP, PolylogD, PolylogDold: see https://arxiv.org/abs/math/0702243
151    and the references herein.
152  JBessel(n, x) = BesselJ(n+1/2, x)
153  KBessel, KBessel2 [2nd kind]
154  JacobiTheta
155  (see http://www.ams.org/journals/mcom/0000-000-00/S0025-5718-2017-03245-2/home.html)
156  LogIntegral
157  ExponentialIntegralEn (formula 5.1.4 of Abramowitz and Stegun)
158  DawsonIntegral
159  GammaD(x) = Gamma(x+1/2)
160- new functions of IEEE 754-2008, and more generally functions of the
161  C binding draft TS 18661-4:
162    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1946.pdf
163  More rootn functions: mpfr_rootn_uj, mpfr_rootn_sj, mpfr_rootn_z.
164- reduction functions and augmented arithmetic functions from IEEE 754-2019.
165  Note: the interface for the augmented arithmetic functions will need to be
166  discussed (suggestion: for augmentedAddition and augmentedSubtraction,
167  where x+y-roundTiesTowardZero(x+y) may be inexact in MPFR, use a spec
168  similar to augmentedMultiplication).
169- functions defined in the LIA-2 standard
170  <https://standards.iso.org/ittf/PubliclyAvailableStandards/c024427_ISO_IEC_10967-2_2001(E).zip>
171  + minimum and maximum (5.2.2): max, min, max_seq, min_seq, mmax_seq
172    and mmin_seq (mpfr_min and mpfr_max correspond to mmin and mmax);
173  + rounding_rest, floor_rest, ceiling_rest (5.2.4);
174  + remr (5.2.5): x - round(x/y) y;
175  + error functions from 5.2.7 (if useful in MPFR);
176  + power1pm1 (5.3.6.7): (1 + x)^y - 1;
177  + logbase (5.3.6.12): \log_x(y);
178  + logbase1p1p (5.3.6.13): \log_{1+x}(1+y);
179  + rad (5.3.9.1): x - round(x / (2 pi)) 2 pi = remr(x, 2 pi);
180  + axis_rad (5.3.9.1) if useful in MPFR;
181  + cycle (5.3.10.1): rad(2 pi x / u) u / (2 pi) = remr(x, u);
182  + axis_cycle (5.3.10.1) if useful in MPFR;
183  + cotu, secu, cscu, cossinu, arccotu, arcsecu, arccscu (5.3.10.{5..14}):
184  + arcu (5.3.10.15): arctan2(y,x) u / (2 pi);
185  + rad_to_cycle, cycle_to_rad, cycle_to_cycle (5.3.11.{1..3}).
186- From GSL, missing special functions (if useful in MPFR):
187  (cf https://www.gnu.org/software/gsl/manual/gsl-ref.html#Special-Functions)
188  + The Airy functions Ai(x) and Bi(x) defined by the integral representations:
189   * Ai(x) = (1/\pi) \int_0^\infty \cos((1/3) t^3 + xt) dt
190   * Bi(x) = (1/\pi) \int_0^\infty (e^(-(1/3) t^3) + \sin((1/3) t^3 + xt)) dt
191   * Derivatives of Airy Functions
192  + The Bessel functions for n integer and n fractional:
193   * Regular Modified Cylindrical Bessel Functions I_n
194   * Irregular Modified Cylindrical Bessel Functions K_n
195   * Regular Spherical Bessel Functions j_n: j_0(x) = \sin(x)/x,
196     j_1(x)= (\sin(x)/x-\cos(x))/x & j_2(x)= ((3/x^2-1)\sin(x)-3\cos(x)/x)/x
197     Note: the "spherical" Bessel functions are solutions of
198     x^2 y'' + 2 x y' + [x^2 - n (n+1)] y = 0 and satisfy
199     j_n(x) = sqrt(Pi/(2x)) J_{n+1/2}(x). They should not be mixed with the
200     classical Bessel Functions, also noted j0, j1, jn, y0, y1, yn in C99
201     and mpfr.
202     Cf https://en.wikipedia.org/wiki/Bessel_function#Spherical_Bessel_functions
203   *Irregular Spherical Bessel Functions y_n: y_0(x) = -\cos(x)/x,
204     y_1(x)= -(\cos(x)/x+\sin(x))/x &
205     y_2(x)= (-3/x^3+1/x)\cos(x)-(3/x^2)\sin(x)
206   * Regular Modified Spherical Bessel Functions i_n:
207     i_l(x) = \sqrt{\pi/(2x)} I_{l+1/2}(x)
208   * Irregular Modified Spherical Bessel Functions:
209     k_l(x) = \sqrt{\pi/(2x)} K_{l+1/2}(x).
210  + Clausen Function:
211     Cl_2(x) = - \int_0^x dt \log(2 \sin(t/2))
212     Cl_2(\theta) = \Im Li_2(\exp(i \theta)) (dilogarithm).
213  + Dawson Function: \exp(-x^2) \int_0^x dt \exp(t^2).
214  + Debye Functions: D_n(x) = n/x^n \int_0^x dt (t^n/(e^t - 1))
215  + Elliptic Integrals:
216   * Definition of Legendre Forms:
217    F(\phi,k) = \int_0^\phi dt 1/\sqrt((1 - k^2 \sin^2(t)))
218    E(\phi,k) = \int_0^\phi dt   \sqrt((1 - k^2 \sin^2(t)))
219    P(\phi,k,n) = \int_0^\phi dt 1/((1 + n \sin^2(t))\sqrt(1 - k^2 \sin^2(t)))
220   * Complete Legendre forms are denoted by
221    K(k) = F(\pi/2, k)
222    E(k) = E(\pi/2, k)
223   * Definition of Carlson Forms
224    RC(x,y) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1)
225    RD(x,y,z) = 3/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-3/2)
226    RF(x,y,z) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2)
227    RJ(x,y,z,p) = 3/2 \int_0^\infty dt
228                          (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2) (t+p)^(-1)
229  + Elliptic Functions (Jacobi)
230  + N-relative exponential:
231     exprel_N(x) = N!/x^N (\exp(x) - \sum_{k=0}^{N-1} x^k/k!)
232  + exponential integral:
233     E_2(x) := \Re \int_1^\infty dt \exp(-xt)/t^2.
234     Ei_3(x) = \int_0^x dt \exp(-t^3) for x >= 0.
235     Ei(x) := - PV(\int_{-x}^\infty dt \exp(-t)/t)
236  + Hyperbolic/Trigonometric Integrals
237     Shi(x) = \int_0^x dt \sinh(t)/t
238     Chi(x) := Re[ \gamma_E + \log(x) + \int_0^x dt (\cosh[t]-1)/t]
239     Si(x) = \int_0^x dt \sin(t)/t
240     Ci(x) = -\int_x^\infty dt \cos(t)/t for x > 0
241     AtanInt(x) = \int_0^x dt \arctan(t)/t
242     [ \gamma_E is the Euler constant ]
243  + Fermi-Dirac Function:
244     F_j(x)   := (1/r\Gamma(j+1)) \int_0^\infty dt (t^j / (\exp(t-x) + 1))
245  + Pochhammer symbol (a)_x := \Gamma(a + x)/\Gamma(a) : see [Smith01] in
246          algorithms.bib
247    logarithm of the Pochhammer symbol
248  + Gegenbauer Functions
249  + Laguerre Functions
250  + Eta Function: \eta(s) = (1-2^{1-s}) \zeta(s)
251    Hurwitz zeta function: \zeta(s,q) = \sum_0^\infty (k+q)^{-s}.
252  + Lambert W Functions, W(x) are defined to be solutions of the equation:
253     W(x) \exp(W(x)) = x.
254    This function has multiple branches for x < 0 (2 funcs W0(x) and Wm1(x))
255    From Fredrik Johansson:
256    See https://cs.uwaterloo.ca/research/tr/1993/03/W.pdf, in particular
257    formulas 5.2 and 5.3 for the error bound: one first computes an
258    approximation w, and then evaluates the residual w e^w - x. There is an
259    expression for the error in terms of the residual and the derivative W'(t),
260    where the derivative can be bounded by piecewise simple functions,
261    something like min(1, 1/t) when t >= 0.
262    See https://arxiv.org/abs/1705.03266 for rigorous error bounds.
263  + Trigamma Function psi'(x).
264    and Polygamma Function: psi^{(m)}(x) for m >= 0, x > 0.
265- functions from ISO/IEC 24747:2009 (Extensions to the C Library,
266  to Support Mathematical Special Functions).
267  Standard: https://www.iso.org/standard/38857.html
268  Draft: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1292.pdf
269  Rationale: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1244.pdf
270  See also: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3060.pdf
271  (similar, for C++).
272  Also check whether the functions that are already implemented in MPFR
273  match this standard.
274
275- from gnumeric (www.gnome.org/projects/gnumeric/doc/function-reference.html):
276  - incomplete beta function, see message from Martin Maechler
277    <maechler@stat.math.ethz.ch> on 18 Jan 2016, and Section 6.6 in
278    Abramowitz & Stegun
279  - betaln
280  - degrees
281  - radians
282  - sqrtpi
283
284- mpfr_inp_raw, mpfr_out_raw (cf mail "Serialization of mpfr_t" from Alexey
285  and answer from Granlund on mpfr list, May 2007)
286- [maybe useful for SAGE] implement companion frac_* functions to the rint_*
287  functions. For example mpfr_frac_floor(x) = x - floor(x). (The current
288  mpfr_frac function corresponds to mpfr_rint_trunc.)
289- scaled erfc (https://sympa.inria.fr/sympa/arc/mpfr/2009-05/msg00054.html)
290- asec, acsc, acot, asech, acsch and acoth (mail from Bj��rn Terelius on mpfr
291  list, 18 June 2009)
292
293- function to reduce the precision of a variable, with a ternary value in
294  input, i.e. taking care of double rounding. Two possible forms: like
295  mpfr_set (i.e. with input and output) or like mpfr_prec_round (i.e. with
296  a single variable). mpfr_subnormalize and mpfr_round_nearest_away_end
297  could use it.
298
299- UBF functions for +, -, *, fmma, /, sqrt.
300  Support UBF in mpfr_check_range or add mpfr_ubf_check_range?
301  Make this available in the API, e.g. for MPC.
302
303- mpfr_cmp_uj, mpfr_cmp_sj, mpfr_mul_uj, mpfr_mul_sj. They would be useful
304  to test MPFR with _MPFR_EXP_FORMAT=4.
305
306- base conversion with the round-trip property using a minimal precision,
307  such as the to_chars functions from the C++ standard:
308
309    The functions [...] ensure that the string representation consists
310    of the smallest number of characters such that there is at least
311    one digit before the radix point (if present) and parsing the
312    representation using the corresponding from_chars function
313    recovers value exactly. [Note: This guarantee applies only if
314    to_chars and from_chars are executed on the same implementation.
315    ��� end note] If there are several such representations, the
316    representation with the smallest difference from the
317    floating-point argument value is chosen, resolving any remaining
318    ties using rounding according to round_to_nearest.
319
320  Text from: https://www.zsh.org/mla/workers/2019/msg01138.html
321
322  Idea of implementation: The get_str.c code is based on mpn_get_str, so
323  that we would like to do something similar, but mpn_get_str doesn't give
324  any information about the error. So we should request additional digits
325  (a bit like what is done in get_str.c) and compute approximations to the
326  distance (as an mpz_t) between the input and the straddling midpoints,
327  the unit being the weight of the last requested digit. Then deduce the
328  closest string to the input with the fewest digits in the interval (data
329  are approximate, so that it may be impossible to answer, i.e. this needs
330  to be done in a Ziv loop).
331
332- Serialization / Deserialization. Suggested by Fr��d��ric P��trot:
333  https://sympa.inria.fr/sympa/arc/mpfr/2020-02/msg00006.html
334  like mpfr_fpif_{import,export}, but with memory instead of file.
335
336  Idea of implementation to reuse most of the code and change very little:
337
338  Instead of passing a FILE *fh, pass a struct ext_data *h, and instead of
339  using fread and fwrite, use
340    h->read (h, buffer, size)
341    h->write (h, buffer, size)
342  respectively.
343
344  The struct ext_data structure could contain the following fields:
345    * read: pointer to a wrapper function for the read method.
346    * write: pointer to a wrapper function for the write method.
347    * FILE *fh: to be used for operations with files.
348    * unsigned char *arena: to be used for operations with memory.
349
350  The wrapper functions for the read method could be:
351
352  static int
353  read_from_file (struct ext_data *h, unsigned char *buffer, size_t size)
354  {
355    return fread (buffer, size, 1, h->fh) != 1;
356  }
357
358  static int
359  read_from_memory (struct ext_data *h, unsigned char *buffer, size_t size)
360  {
361    if (h->arena == NULL)
362      return 1;
363    memcpy (buffer, h->arena, size);
364    h->arena += size;
365    return 0;
366  }
367
368  So I expect very few changes in the existing code:
369    * Write a few wrapper functions.
370    * Rename mpfr_fpif_export to mpfr_fpif_export_aux and
371      mpfr_fpif_import to mpfr_fpif_import_aux.
372    * In the existing functions, replace FILE *fh, and fread/fwrite
373      calls as mentioned above.
374    * Add new mpfr_fpif_export, mpfr_fpif_import, mpfr_fpif_export_mem,
375      mpfr_fpif_import_mem.
376
377- Fused divide-add and fused divide-subtract (mpfr_fda, mpfr_fds).
378  Reference: K. Pande, A. Parkhi, S. Jaykar and A. Peshattiwar,
379  "Design and Implementation of Floating Point Divide-Add Fused Architecture",
380  2015 Fifth International Conference on Communication Systems and Network
381  Technologies, 2015, pp. 797-800, doi: 10.1109/CSNT.2015.179.
382  However, this article does not provide a specification of special values
383  (but the same rules as FMA/FMS could be chosen, based on 1/��0 = ��Inf and
384  1/��Inf = ��0).
385
386
387##############################################################################
3885. Efficiency
389##############################################################################
390
391- Fredrik Johansson wrote a new preprint about computing the gamma function,
392  where he claims a speedup of 10 over MPFR at machine precision, and a
393  speedup of 1000 for a first call to gamma at 10000 digits:
394  https://hal.inria.fr/hal-03346642
395- Fredrik Johansson reports that mpfr_ai is slow for large arguments: an
396  asymptotic expansion should be used (once done, remove REDUCE_EMAX from
397  tests/tai.c and update the description in mpfr.texi).
398- for exp(x), Fredrik Johansson reports a 20% speed improvement starting from
399  4000 bits, and up to a 75% memory improvement in his Arb implementation, by
400  using recursive instead of iterative binary splitting:
401  https://github.com/fredrik-johansson/arb/blob/master/elefun/exp_sum_bs_powtab.c
402- improve mpfr_grandom using the algorithm in https://arxiv.org/abs/1303.6257
403- implement a mpfr_sqrthigh algorithm based on Mulders' algorithm, with a
404  basecase variant
405- use mpn_div_q to speed up mpfr_div. However, mpn_div_q, which is new in
406  GMP 5, is not documented in the GMP manual, thus we are not sure it
407  guarantees to return the same quotient as mpn_tdiv_qr.
408  Also mpfr_div uses the remainder computed by mpn_divrem. A workaround would
409  be to first try with mpn_div_q, and if we cannot (easily) compute the
410  rounding, then use the current code with mpn_divrem.
411- improve atanh(x) for small x by using atanh(x) = log1p(2x/(1-x)),
412  and log1p should also be improved for small arguments.
413- compute exp by using the series for cosh or sinh, which has half the terms
414  (see Exercise 4.11 from Modern Computer Arithmetic, version 0.3)
415  The same method can be used for log, using the series for atanh, i.e.,
416  atanh(x) = 1/2*log((1+x)/(1-x)).
417- improve mpfr_gamma (see https://code.google.com/p/fastfunlib/). A possible
418  idea is to implement a fast algorithm for the argument reconstruction
419  gamma(x+k): instead of performing k products by x+i, we could precompute
420  x^2, ..., x^m for m ~ sqrt(k), and perform only sqrt(k) products.
421  One could also use the series for 1/gamma(x), see for example
422  https://dlmf.nist.gov/5/7/ or formula (36) from
423  https://mathworld.wolfram.com/GammaFunction.html
424- improve the computation of Bernoulli numbers: instead of computing just one
425  B[2n] at a time in mpfr_bernoulli_internal, we could compute several at a
426  time, sharing the expensive computation of the 1/p^(2n) series.
427- fix regression with mpfr_mpz_root (from Keith Briggs, 5 July 2006), for
428   example on 3Ghz P4 with gmp-4.2, x=12.345:
429   prec=50000    k=2   k=3   k=10  k=100
430   mpz_root      0.036 0.072 0.476 7.628
431   mpfr_mpz_root 0.004 0.004 0.036 12.20
432   See also mail from Carl Witty on mpfr list, 09 Oct 2007.
433- for sparse input (say x=1 with 2 bits), mpfr_exp is not faster than for
434        full precision when precision <= MPFR_EXP_THRESHOLD. The reason is
435        that argument reduction kills sparsity. Maybe avoid argument reduction
436        for sparse input?
437- speed up mpfr_atan for large arguments (to speed up mpc_log) see FR #6198
438- improve mpfr_sin on values like ~pi (do not compute sin from cos, because
439  of the cancellation). For instance, reduce the input modulo pi/2 in
440  [-pi/4,pi/4], and define auxiliary functions for which the argument is
441  assumed to be already reduced (so that the sin function can avoid
442  unnecessary computations by calling the auxiliary cos function instead of
443  the full cos function). This will require a native code for sin, for
444  example using the reduction sin(3x)=3sin(x)-4sin(x)^3.
445  See https://sympa.inria.fr/sympa/arc/mpfr/2007-08/msg00001.html and
446  the following messages.
447- improve generic.c to work for number of terms <> 2^k
448- rewrite mpfr_greater_p... as native code.
449
450- mpf_t uses a scheme where the number of limbs actually present can
451  be less than the selected precision, thereby allowing low precision
452  values (for instance small integers) to be stored and manipulated in
453  an mpf_t efficiently.
454
455  Perhaps mpfr should get something similar, especially if looking to
456  replace mpf with mpfr, though it'd be a major change.  Alternately
457  perhaps those mpfr routines like mpfr_mul where optimizations are
458  possible through stripping low zero bits or limbs could check for
459  that (this would be less efficient but easier).
460
461- try the idea of the paper "Reduced Cancellation in the Evaluation of Entire
462  Functions and Applications to the Error Function" by W. Gawronski, J. Mueller
463  and M. Reinhard, to be published in SIAM Journal on Numerical Analysis: to
464  avoid cancellation in say erfc(x) for x large, they compute the Taylor
465  expansion of erfc(x)*exp(x^2/2) instead (which has less cancellation),
466  and then divide by exp(x^2/2) (which is simpler to compute).
467
468- replace the *_THRESHOLD macros by global (TLS) variables that can be
469  changed at run time (via a function, like other variables)? One benefit
470  is that users could use a single MPFR binary on several machines (e.g.,
471  a library provided by binary packages or shared via NFS) with different
472  thresholds. On the default values, this would be a bit less efficient
473  than the current code, but this isn't probably noticeable (this should
474  be tested). Something like:
475    long *mpfr_tune_get(void) to get the current values (the first value
476      is the size of the array).
477    int mpfr_tune_set(long *array) to set the tune values.
478    int mpfr_tune_run(long level) to find the best values (the support
479      for this feature is optional, this can also be done with an
480      external function).
481
482- better distinguish different processors (for example Opteron and Core 2)
483  and use corresponding default tuning parameters (as in GMP). This could be
484  done in configure.ac to avoid hacking config.guess, for example define
485  MPFR_HAVE_CORE2.
486  Note (VL): the effect on cross-compilation (that can be a processor
487  with the same architecture, e.g. compilation on a Core 2 for an
488  Opteron) is not clear. The choice should be consistent with the
489  build target (e.g. -march or -mtune value with gcc).
490  Also choose better default values. For instance, the default value of
491  MPFR_MUL_THRESHOLD is 40, while the best values that have been found
492  are between 11 and 19 for 32 bits and between 4 and 10 for 64 bits!
493
494- during the Many Digits competition, we noticed that (our implantation of)
495  Mulders short product was slower than a full product for large sizes.
496  This should be precisely analyzed and fixed if needed.
497
498- for various functions, check the timings as a function of the magnitude
499  of the input (and the input and/or output precisions?), and use better
500  thresholds for asymptotic expansions.
501
502- improve the special case of mpfr_{add,sub} (x, x, y, ...) when |x| > |y|
503  to do the addition in-place and have a complexity of O(prec(y)) in most
504  cases. The mpfr_{add,sub}_{d,ui} functions should automatically benefit
505  from this change.
506
507- in gmp_op.c, for functions with mpz_srcptr, check whether mpz_fits_slong_p
508  is really useful in all cases (see TODO in this file).
509
510- optimize code that uses a test based on the fact that x >> s is
511  undefined in C for s == width of x but the result is expected to
512  be 0. ARM and PowerPC could benefit from such an optimization,
513  but not x86. This needs support from the compiler.
514  For PowerPC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79233
515
516- deal with MPFR_RNDF in mpfr_round_near_x (replaced by MPFR_RNDZ).
517
518- instead of a fixed mparam.h, optionally use function multiversioning
519  (FMV), currently only available with the GNU C++ front end:
520    https://gcc.gnu.org/wiki/FunctionMultiVersioning
521  According to https://lwn.net/Articles/691932/ the dispatch resolution
522  is now done by the dynamic loader, so that this should be fast enough
523  (the cost would be the reading of a static variable, initialized at
524  load time, instead of a constant).
525  In particular, binary package distributions would benefit from FMV as
526  only one binary is generated for different processor families.
527
528- use intrinsics such as _addcarry_u64 (Intel specific) and
529  __builtin_addcll (clang specific) when available
530  (needs a configure test).
531  References:
532    https://software.intel.com/sites/landingpage/IntrinsicsGuide/
533    https://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins
534    https://stackoverflow.com/q/11228855/3782797
535    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173
536    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97387
537    https://stackoverflow.com/questions/71600474/difference-between-builtin-addcll-and-addcarry-u64
538  (would be useful for add1sp.c and sub1sp.c).
539
540- reduce the size of the MPFR library generated by GCC by avoiding
541  operations on _Decimal128 in get_d128.c when its encoding is BID
542  (i.e., software implementation). See FIXME in this file.
543
544
545##############################################################################
5466. Miscellaneous
547##############################################################################
548
549- [suggested by Tobias Burnus <burnus(at)net-b.de> and
550   Asher Langton <langton(at)gcc.gnu.org>, Wed, 01 Aug 2007]
551  support quiet and signaling NaNs in mpfr:
552  * functions to set/test a quiet/signaling NaN: mpfr_set_snan, mpfr_snan_p,
553    mpfr_set_qnan, mpfr_qnan_p
554  * correctly convert to/from double (if encoding of s/qNaN is fixed in 754R)
555  Note: Signaling NaNs are not specified by the ISO C standard and may
556  not be supported by the implementation. GCC needs the -fsignaling-nans
557  option (but this does not affect the C library, which may or may not
558  accept signaling NaNs).
559
560- check the constants mpfr_set_emin (-16382-63) and mpfr_set_emax (16383) in
561  get_ld.c and the other constants, and provide a testcase for large and
562  small numbers.
563
564- from Kevin Ryde <user42@zip.com.au>:
565   Also for pi.c, a pre-calculated compiled-in pi to a few thousand
566   digits would be good value I think.  After all, say 10000 bits using
567   1250 bytes would still be small compared to the code size!
568   Store pi in round to zero mode (to recover other modes).
569
570- add other prototypes for round to nearest-away (mpfr_round_nearest_away
571  only deals with the prototypes of say mpfr_sin) or implement it as a native
572  rounding mode
573- add a new roundind mode: round to odd. If the result is not exactly
574        representable, then round to the odd mantissa. This rounding
575        has the nice property that for k > 1, if:
576        y = round(x, p+k, TO_ODD)
577        z = round(y, p, TO_NEAREST_EVEN), then
578        z = round(x, p, TO_NEAREST_EVEN)
579  so it avoids the double-rounding problem.
580  VL: I prefer the (original?) term "sticky rounding", as used in
581    J Strother Moore, Tom Lynch, Matt Kaufmann. A Mechanically Checked
582    Proof of the Correctness of the Kernel of the AMD5K86 Floating-Point
583    Division Algorithm. IEEE Transactions on Computers, 1996.
584  and
585    http://www.russinoff.com/libman/text/node26.html
586
587- new rounding mode MPFR_RNDE when the result is known to be exact?
588  * In normal mode, this would allow MPFR to optimize using
589    this information.
590  * In debug mode, MPFR would check that the result is exact
591    (i.e. that the ternary value is 0).
592
593- add tests of the ternary value for constants
594
595- When doing Extensive Check (--enable-assert=full), since all the
596  functions use a similar use of MACROS (ZivLoop, ROUND_P), it should
597  be possible to do such a scheme:
598    For the first call to ROUND_P when we can round.
599    Mark it as such and save the approximated rounding value in
600    a temporary variable.
601    Then after, if the mark is set, check if:
602      - we still can round.
603      - The rounded value is the same.
604  It should be a complement to tgeneric tests.
605
606- in div.c, try to find a case for which cy != 0 after the line
607        cy = mpn_sub_1 (sp + k, sp + k, qsize, cy);
608  (which should be added to the tests), e.g. by having {vp, k} = 0, or
609  prove that this cannot happen.
610
611- add a configure test for --enable-logging to ignore the option if
612  it cannot be supported. Modify the "configure --help" description
613  to say "on systems that support it".
614
615- add generic bad cases for functions that don't have an inverse
616  function that is implemented (use a single Newton iteration).
617
618- add bad cases for the internal error bound (by using a dichotomy
619  between a bad case for the correct rounding and some input value
620  with fewer Ziv iterations?).
621
622- add an option to use a 32-bit exponent type (int) on LP64 machines,
623  mainly for developers, in order to be able to test the case where the
624  extended exponent range is the same as the default exponent range, on
625  such platforms.
626  Tests can be done with the exp-int branch (added on 2010-12-17, and
627  many tests fail at this time).
628
629- test underflow/overflow detection of various functions (in particular
630  mpfr_exp) in reduced exponent ranges, including ranges that do not
631  contain 0.
632
633- add an internal macro that does the equivalent of the following?
634    MPFR_IS_ZERO(x) || MPFR_GET_EXP(x) <= value
635
636- check whether __gmpfr_emin and __gmpfr_emax could be replaced by
637  a constant (see README.dev). Also check the use of MPFR_EMIN_MIN
638  and MPFR_EMAX_MAX.
639
640- add a test checking that no mpfr.h macros depend on mpfr-impl.h
641  (the current tests cannot check that since mpfr-impl.h is always
642  included).
643
644- move some macro definitions from acinclude.m4 to the m4 directory
645  as suggested by the Automake manual? The reason is that the
646  acinclude.m4 file is big and a bit difficult to read.
647
648- use symbol versioning.
649
650- check whether mpz_t caching (pool) is necessary. Timings with -static
651  with details about the C / C library implementation should be put
652  somewhere as a comment in the source or in the doc. Using -static
653  is important because otherwise the cache saves the dynamic call to
654  mpz_init and mpz_clear; so, what we're measuring is not clear.
655  See thread:
656    https://gmplib.org/list-archives/gmp-devel/2015-September/004147.html
657  Summary: It will not be integrated in GMP because 1) This yields
658  problems with threading (in MPFR, we have TLS variables, but this is
659  not the case of GMP). 2) The gain (if confirmed with -static) would
660  be due to a poor malloc implementation (timings would depend on the
661  platform). 3) Applications would use more RAM.
662  Additional notes [VL]: the major differences in the timings given
663  by Patrick in 2014-01 under Linux were:
664    Before:
665    arccos(x)  took 0.054689 ms (32767 eval in 1792 ms)
666    arctan(x)  took 0.042116 ms (32767 eval in 1380 ms)
667    After:
668    arccos(x)  took 0.043580 ms (32767 eval in 1428 ms)
669    arctan(x)  took 0.035401 ms (32767 eval in 1160 ms)
670  mpfr_acos doesn't use mpz, but calls mpfr_atan, so that the issue comes
671  from mpfr_atan, which uses mpz a lot. The problem mainly comes from the
672  reallocations in GMP because mpz_init is used instead of mpz_init2 with
673  the estimated maximum size. Other places in the code that uses mpz_init
674  may be concerned.
675  Issues with mpz_t caching:
676    * The pool can take much memory, which may no longer be useful.
677      For instance:
678        mpfr_init2 (x, 10000000);
679        mpfr_log_ui (x, 17, MPFR_RNDN);
680        /* ... */
681        mpfr_clear (x);
682        /* followed by code using only small precision */
683      while contrary to real caches, they contain no data. This is not
684      valuable memory: freeing/allocating a large block of memory is
685      much faster than the actual computations, so that mpz_t caching
686      has no impact on the performance in such cases. A pool with large
687      blocks also potentially destroys the data locality.
688    * It assumes that the real GMP functions are __gmpz_init and
689      __gmpz_clear, which are not part of the official GMP API, thus
690      is based on GMP internals, which may change in the future or
691      may be different in forks / compatible libraries / etc. This
692      can be solved if MPFR code calls mpfr_mpz_init / mpfr_mpz_clear
693      directly, avoiding the #define's.
694  Questions that need to be answered:
695    * What about the comparisons with other memory allocators?
696    * Shouldn't the pool be part of the memory allocator?
697      For the default memory allocator (malloc): RFE?
698  If it is decided to keep some form of mpz_t caching, a possible solution
699  for both issues: define mpfr_mpz_init2 and mpfr_mpz_clear2, which both
700  take 2 arguments like mpz_init2, where mpfr_mpz_init2 behaves in a way
701  similar to mpz_init2, and mpfr_mpz_clear2 behaves in a way similar to
702  mpz_clear but where the size argument is a hint for the pool; if it is
703  too large, then the mpz_t should not be pushed back to the pool. The
704  size argument of mpfr_mpz_init2 could also be a hint to decide which
705  element to pull from the pool.
706
707- in tsum, add testcases for mpfr_sum triggering the bug fixed in r9722,
708  that is, with a large error during the computation of the secondary term
709  (when the TMD occurs).
710
711- use the keyword "static" in array indices of parameter declarations with
712  C99 compilers (6.7.5.3p7) when the pointer is expected not to be null?
713  For instance, if mpfr.h is changed to have:
714    __MPFR_DECLSPEC void mpfr_dump (const __mpfr_struct [static 1]);
715  and one calls
716    mpfr_dump (NULL);
717  one gets a warning with Clang. This is just an example; this needs to be
718  done in a clean way.
719  See:
720    https://stackoverflow.com/a/3430353/3782797
721    https://hamberg.no/erlend/posts/2013-02-18-static-array-indices.html
722
723- change most mpfr_urandomb occurrences to mpfr_urandom in the tests?
724  (The one done in r10573 allowed us to find a bug even without
725  assertion checking.)
726
727- tzeta has been much slower since r9848 (which increases the precision
728  of the input for the low output precisions), at least with the x86
729  32-bit ABI. This seems to come from the fact that the working precision
730  in the mpfr_zeta implementation depends on the precision of the input.
731  Once mpfr_zeta has improved, change the last argument of test_generic
732  in tzeta.c back to 5 (as it was before r10667).
733
734- check the small-precision tables in the tests?
735  This may require to export some pointer to the tables, but this could
736  be done only if some debug macro is defined.
737
738- optionally use malloc() for the caches? See mpfr_mp_memory_cleanup.
739  Note: This can be implemented by adding a TLS flag saying whether we
740  are under cache generation or not, and by making the MPFR allocation
741  functions consider this flag. Moreover, this can only work for mpfr_t
742  caching (floating-point constants), not for mpz_t caching (Bernoulli
743  constants) because we do not have the control of memory allocation for
744  mpz_init.
745
746- use GCC's nonnull attribute (available since GCC 4.0) where applicable.
747
748- avoid the use of MPFR_MANT(x) as an lvalue; use other (more high level)
749  internal macros if possible, such as MPFR_TMP_INIT1, MPFR_TMP_INIT and
750  MPFR_ALIAS.
751
752- add some option to use pkg-config to get the compiler flags associated
753  with GMP ("pkg-config --cflags gmp" / "pkg-config --libs gmp"). It
754  could be one of
755    --with-pkg-config[=<PATH>]
756    --with-pkg-config-path[=<PATH>]
757  See also the autoconf macros.
758  The non-standard CPATH and C_INCLUDE_PATH environment variables need to
759  be unset (only when invoking pkg-config) as a workaround to
760    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=984504
761
762
763##############################################################################
7647. Portability
765##############################################################################
766
767- [Kevin about texp.c long strings]
768  For strings longer than c99 guarantees, it might be cleaner to
769  introduce a "tests_strdupcat" or something to concatenate literal
770  strings into newly allocated memory.  I thought I'd done that in a
771  couple of places already.  Arrays of chars are not much fun.
772
773- use https://gcc.gnu.org/viewcvs/gcc/trunk/config/stdint.m4 for mpfr-gmp.h
774
775- By default, GNU Automake adds -I options to local directories, with
776  the side effect that these directories have the precedence to search
777  for system headers (#include <...>). This may make the build fail if
778  a C implementation includes a file that has the same name as one used
779  in such a directory.
780  For instance, if one adds an empty file "src/bits/types.h", then the
781  MPFR build fails under Linux because /usr/include/stdio.h has
782    #include <bits/types.h>
783  Possible workaround:
784    * disable the default -I options with nostdinc as documented in
785      the Automake manual;
786    * have a rule that copies the needed files ("mpfr.h" or they should
787      be prefixed with "mpfr-") to $(top_builddir)/include;
788    * use "-I$(top_builddir)/include".
789
790- _Noreturn (from ISO C11) may be replaced by [[noreturn]] in the future:
791  https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2764.pdf
792  Once this is implemented in compilers, consider it as an alternative
793  (concerned files: acinclude.m4, doc/README.dev and src/mpfr-impl.h).
794