1Copyright 1999-2023 Free Software Foundation, Inc. 2Contributed by the AriC and Caramba projects, INRIA. 3 4This file is part of the GNU MPFR Library. 5 6The GNU MPFR Library is free software; you can redistribute it and/or modify 7it under the terms of the GNU Lesser General Public License as published by 8the Free Software Foundation; either version 3 of the License, or (at your 9option) any later version. 10 11The GNU MPFR Library is distributed in the hope that it will be useful, but 12WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY 13or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public 14License for more details. 15 16You should have received a copy of the GNU Lesser General Public License 17along with the GNU MPFR Library; see the file COPYING.LESSER. If not, see 18https://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc., 1951 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. 20 21Table of contents: 221. Documentation 232. Compiler/library detection 243. Changes in existing functions 254. New functions to implement 265. Efficiency 276. Miscellaneous 287. Portability 29 30 31############################################################################## 321. Documentation 33############################################################################## 34 35- add a description of the algorithms used and a proof of correctness 36 37- when mp*_srcptr is used in a prototype, add "const" in the prototype 38 in the manual, like what is now done in the GMP manual, i.e. 39 mpfr_srcptr -> const mpfr_t 40 mpz_srcptr -> const mpz_t 41 mpq_srcptr -> const mpq_t 42 mpf_srcptr -> const mpf_t 43 44 45############################################################################## 462. Compiler/library detection 47############################################################################## 48 49- update ICC detection. 50 * Use only __INTEL_COMPILER instead of the obsolete macro __ICC? 51 52 53############################################################################## 543. Changes in existing functions 55############################################################################## 56 57- export mpfr_overflow and mpfr_underflow as public functions 58 59- many functions currently taking into account the precision of the *input* 60 variable to set the initial working precision (acosh, asinh, cosh, ...). 61 This is nonsense since the "average" working precision should only depend 62 on the precision of the *output* variable (and maybe on the *value* of 63 the input in case of cancellation). 64 -> remove those dependencies from the input precision. 65 66- mpfr_can_round: 67 change the meaning of the 2nd argument (err). Currently the error is 68 at most 2^(MPFR_EXP(b)-err), i.e. err is the relative shift wrt the 69 most significant bit of the approximation. I propose that the error 70 is now at most 2^err ulps of the approximation, i.e. 71 2^(MPFR_EXP(b)-MPFR_PREC(b)+err). 72 73- mpfr_set_q first tries to convert the numerator and the denominator 74 to mpfr_t. But this conversion may fail even if the correctly rounded 75 result is representable. New way to implement: 76 Function q = a/b. nq = PREC(q) na = PREC(a) nb = PREC(b) 77 If na < nb 78 a <- a*2^(nb-na) 79 n <- na-nb+ (HIGH(a,nb) >= b) 80 if (n >= nq) 81 bb <- b*2^(n-nq) 82 a = q*bb+r --> q has exactly n bits. 83 else 84 aa <- a*2^(nq-n) 85 aa = q*b+r --> q has exactly n bits. 86 If RNDN, takes nq+1 bits. (See also the new division function). 87 88- revisit the conversion functions between a MPFR number and a native 89 floating-point value. 90 * Consequences if some exception is trapped? 91 * Specify under which conditions (current rounding direction and 92 precision of the FPU, whether a format has been recognized...), 93 correct rounding is guaranteed. Fix the code if need be. Do not 94 forget subnormals. 95 * Provide mpfr_buildopt_* functions to tell whether the format of a 96 native type (float / double / long double) has been recognized and 97 which format it is? 98 * For functions that return a native floating-point value (mpfr_get_flt, 99 mpfr_get_d, mpfr_get_ld, mpfr_get_decimal64), in case of underflow or 100 overflow, follow the convention used for the functions in <math.h>? 101 See ��7.12.1 "Treatment of error conditions" of ISO C11, which provides 102 two ways of handling error conditions, depending on math_errhandling: 103 errno (to be set to ERANGE here) and floating-point exceptions. 104 If floating-point exceptions need to be generated, do not use 105 feraiseexcept(), as this function may require the math library (-lm); 106 use a floating-point expression instead, such as DBL_MIN * DBL_MIN 107 (underflow) or DBL_MAX * DBL_MAX (overflow), which are probably safe 108 as used in the GNU libc implementation. 109 * For testing the lack of subnormal support: 110 see the -mfpu GCC option for ARM and 111 https://en.wikipedia.org/wiki/Denormal_number#Disabling_denormal_floats_at_the_code_level 112 113 114############################################################################## 1154. New functions to implement 116############################################################################## 117 118- cr_xxx functions from https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf 119 (section 7.31.8, page 392): 120 rsqrt (differs from mpfr_rec_sqrt in -0: mpfr_rec_sqrt gives +Inf whereas 121 rsqrt gives -Inf) 122- a function to compute the hash of a floating-point number 123 (suggested by Patrick Pelissier) 124- implement new functions from the C++17 standard: 125 https://en.cppreference.com/w/cpp/numeric/special_functions 126 assoc_laguerre, assoc_legendre, comp_ellint_1, comp_ellint_2, comp_ellint_3, 127 cyl_bessel_i, cyl_bessel_j, cyl_bessel_k, cyl_neumann, ellint_1, ellint_2, 128 ellint_3, hermite, legendre, laguerre, sph_bessel, sph_legendre, 129 sph_neumann. 130 Already in mpfr4: beta and riemann_zeta. 131 See also https://isocpp.org/files/papers/P0226R1.pdf and ��29.9.5 in the 132 C++17 draft: 133 https://github.com/cplusplus/draft/blob/master/source/numerics.tex 134- implement mpfr_q_sub, mpfr_z_div, mpfr_q_div? 135- implement mpfr_pow_q and variants with two integers (native or mpz) 136 instead of a rational? See IEEE P1788. 137- implement functions for random distributions, see for example 138 https://sympa.inria.fr/sympa/arc/mpfr/2010-01/msg00034.html 139 (suggested by Charles Karney <ckarney@Sarnoff.com>, 18 Jan 2010): 140 * a Bernoulli distribution with prob p/q (exact) 141 * a general discrete distribution (i with prob w[i]/sum(w[i]) (Walker 142 algorithm, but make it exact) 143 * a uniform distribution in (a,b) 144 * exponential distribution (mean lambda) (von Neumann's method?) 145 * normal distribution (mean m, s.d. sigma) (ratio method?) 146- wanted for Magma [John Cannon <john@maths.usyd.edu.au>, Tue, 19 Apr 2005]: 147 HypergeometricU(a,b,s) = 1/gamma(a)*int(exp(-su)*u^(a-1)*(1+u)^(b-a-1), 148 u=0..infinity) 149 JacobiThetaNullK 150 PolylogP, PolylogD, PolylogDold: see https://arxiv.org/abs/math/0702243 151 and the references herein. 152 JBessel(n, x) = BesselJ(n+1/2, x) 153 KBessel, KBessel2 [2nd kind] 154 JacobiTheta 155 (see http://www.ams.org/journals/mcom/0000-000-00/S0025-5718-2017-03245-2/home.html) 156 LogIntegral 157 ExponentialIntegralEn (formula 5.1.4 of Abramowitz and Stegun) 158 DawsonIntegral 159 GammaD(x) = Gamma(x+1/2) 160- new functions of IEEE 754-2008, and more generally functions of the 161 C binding draft TS 18661-4: 162 https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1946.pdf 163 More rootn functions: mpfr_rootn_uj, mpfr_rootn_sj, mpfr_rootn_z. 164- reduction functions and augmented arithmetic functions from IEEE 754-2019. 165 Note: the interface for the augmented arithmetic functions will need to be 166 discussed (suggestion: for augmentedAddition and augmentedSubtraction, 167 where x+y-roundTiesTowardZero(x+y) may be inexact in MPFR, use a spec 168 similar to augmentedMultiplication). 169- functions defined in the LIA-2 standard 170 <https://standards.iso.org/ittf/PubliclyAvailableStandards/c024427_ISO_IEC_10967-2_2001(E).zip> 171 + minimum and maximum (5.2.2): max, min, max_seq, min_seq, mmax_seq 172 and mmin_seq (mpfr_min and mpfr_max correspond to mmin and mmax); 173 + rounding_rest, floor_rest, ceiling_rest (5.2.4); 174 + remr (5.2.5): x - round(x/y) y; 175 + error functions from 5.2.7 (if useful in MPFR); 176 + power1pm1 (5.3.6.7): (1 + x)^y - 1; 177 + logbase (5.3.6.12): \log_x(y); 178 + logbase1p1p (5.3.6.13): \log_{1+x}(1+y); 179 + rad (5.3.9.1): x - round(x / (2 pi)) 2 pi = remr(x, 2 pi); 180 + axis_rad (5.3.9.1) if useful in MPFR; 181 + cycle (5.3.10.1): rad(2 pi x / u) u / (2 pi) = remr(x, u); 182 + axis_cycle (5.3.10.1) if useful in MPFR; 183 + cotu, secu, cscu, cossinu, arccotu, arcsecu, arccscu (5.3.10.{5..14}): 184 + arcu (5.3.10.15): arctan2(y,x) u / (2 pi); 185 + rad_to_cycle, cycle_to_rad, cycle_to_cycle (5.3.11.{1..3}). 186- From GSL, missing special functions (if useful in MPFR): 187 (cf https://www.gnu.org/software/gsl/manual/gsl-ref.html#Special-Functions) 188 + The Airy functions Ai(x) and Bi(x) defined by the integral representations: 189 * Ai(x) = (1/\pi) \int_0^\infty \cos((1/3) t^3 + xt) dt 190 * Bi(x) = (1/\pi) \int_0^\infty (e^(-(1/3) t^3) + \sin((1/3) t^3 + xt)) dt 191 * Derivatives of Airy Functions 192 + The Bessel functions for n integer and n fractional: 193 * Regular Modified Cylindrical Bessel Functions I_n 194 * Irregular Modified Cylindrical Bessel Functions K_n 195 * Regular Spherical Bessel Functions j_n: j_0(x) = \sin(x)/x, 196 j_1(x)= (\sin(x)/x-\cos(x))/x & j_2(x)= ((3/x^2-1)\sin(x)-3\cos(x)/x)/x 197 Note: the "spherical" Bessel functions are solutions of 198 x^2 y'' + 2 x y' + [x^2 - n (n+1)] y = 0 and satisfy 199 j_n(x) = sqrt(Pi/(2x)) J_{n+1/2}(x). They should not be mixed with the 200 classical Bessel Functions, also noted j0, j1, jn, y0, y1, yn in C99 201 and mpfr. 202 Cf https://en.wikipedia.org/wiki/Bessel_function#Spherical_Bessel_functions 203 *Irregular Spherical Bessel Functions y_n: y_0(x) = -\cos(x)/x, 204 y_1(x)= -(\cos(x)/x+\sin(x))/x & 205 y_2(x)= (-3/x^3+1/x)\cos(x)-(3/x^2)\sin(x) 206 * Regular Modified Spherical Bessel Functions i_n: 207 i_l(x) = \sqrt{\pi/(2x)} I_{l+1/2}(x) 208 * Irregular Modified Spherical Bessel Functions: 209 k_l(x) = \sqrt{\pi/(2x)} K_{l+1/2}(x). 210 + Clausen Function: 211 Cl_2(x) = - \int_0^x dt \log(2 \sin(t/2)) 212 Cl_2(\theta) = \Im Li_2(\exp(i \theta)) (dilogarithm). 213 + Dawson Function: \exp(-x^2) \int_0^x dt \exp(t^2). 214 + Debye Functions: D_n(x) = n/x^n \int_0^x dt (t^n/(e^t - 1)) 215 + Elliptic Integrals: 216 * Definition of Legendre Forms: 217 F(\phi,k) = \int_0^\phi dt 1/\sqrt((1 - k^2 \sin^2(t))) 218 E(\phi,k) = \int_0^\phi dt \sqrt((1 - k^2 \sin^2(t))) 219 P(\phi,k,n) = \int_0^\phi dt 1/((1 + n \sin^2(t))\sqrt(1 - k^2 \sin^2(t))) 220 * Complete Legendre forms are denoted by 221 K(k) = F(\pi/2, k) 222 E(k) = E(\pi/2, k) 223 * Definition of Carlson Forms 224 RC(x,y) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1) 225 RD(x,y,z) = 3/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-3/2) 226 RF(x,y,z) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2) 227 RJ(x,y,z,p) = 3/2 \int_0^\infty dt 228 (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2) (t+p)^(-1) 229 + Elliptic Functions (Jacobi) 230 + N-relative exponential: 231 exprel_N(x) = N!/x^N (\exp(x) - \sum_{k=0}^{N-1} x^k/k!) 232 + exponential integral: 233 E_2(x) := \Re \int_1^\infty dt \exp(-xt)/t^2. 234 Ei_3(x) = \int_0^x dt \exp(-t^3) for x >= 0. 235 Ei(x) := - PV(\int_{-x}^\infty dt \exp(-t)/t) 236 + Hyperbolic/Trigonometric Integrals 237 Shi(x) = \int_0^x dt \sinh(t)/t 238 Chi(x) := Re[ \gamma_E + \log(x) + \int_0^x dt (\cosh[t]-1)/t] 239 Si(x) = \int_0^x dt \sin(t)/t 240 Ci(x) = -\int_x^\infty dt \cos(t)/t for x > 0 241 AtanInt(x) = \int_0^x dt \arctan(t)/t 242 [ \gamma_E is the Euler constant ] 243 + Fermi-Dirac Function: 244 F_j(x) := (1/r\Gamma(j+1)) \int_0^\infty dt (t^j / (\exp(t-x) + 1)) 245 + Pochhammer symbol (a)_x := \Gamma(a + x)/\Gamma(a) : see [Smith01] in 246 algorithms.bib 247 logarithm of the Pochhammer symbol 248 + Gegenbauer Functions 249 + Laguerre Functions 250 + Eta Function: \eta(s) = (1-2^{1-s}) \zeta(s) 251 Hurwitz zeta function: \zeta(s,q) = \sum_0^\infty (k+q)^{-s}. 252 + Lambert W Functions, W(x) are defined to be solutions of the equation: 253 W(x) \exp(W(x)) = x. 254 This function has multiple branches for x < 0 (2 funcs W0(x) and Wm1(x)) 255 From Fredrik Johansson: 256 See https://cs.uwaterloo.ca/research/tr/1993/03/W.pdf, in particular 257 formulas 5.2 and 5.3 for the error bound: one first computes an 258 approximation w, and then evaluates the residual w e^w - x. There is an 259 expression for the error in terms of the residual and the derivative W'(t), 260 where the derivative can be bounded by piecewise simple functions, 261 something like min(1, 1/t) when t >= 0. 262 See https://arxiv.org/abs/1705.03266 for rigorous error bounds. 263 + Trigamma Function psi'(x). 264 and Polygamma Function: psi^{(m)}(x) for m >= 0, x > 0. 265- functions from ISO/IEC 24747:2009 (Extensions to the C Library, 266 to Support Mathematical Special Functions). 267 Standard: https://www.iso.org/standard/38857.html 268 Draft: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1292.pdf 269 Rationale: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1244.pdf 270 See also: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3060.pdf 271 (similar, for C++). 272 Also check whether the functions that are already implemented in MPFR 273 match this standard. 274 275- from gnumeric (www.gnome.org/projects/gnumeric/doc/function-reference.html): 276 - incomplete beta function, see message from Martin Maechler 277 <maechler@stat.math.ethz.ch> on 18 Jan 2016, and Section 6.6 in 278 Abramowitz & Stegun 279 - betaln 280 - degrees 281 - radians 282 - sqrtpi 283 284- mpfr_inp_raw, mpfr_out_raw (cf mail "Serialization of mpfr_t" from Alexey 285 and answer from Granlund on mpfr list, May 2007) 286- [maybe useful for SAGE] implement companion frac_* functions to the rint_* 287 functions. For example mpfr_frac_floor(x) = x - floor(x). (The current 288 mpfr_frac function corresponds to mpfr_rint_trunc.) 289- scaled erfc (https://sympa.inria.fr/sympa/arc/mpfr/2009-05/msg00054.html) 290- asec, acsc, acot, asech, acsch and acoth (mail from Bj��rn Terelius on mpfr 291 list, 18 June 2009) 292 293- function to reduce the precision of a variable, with a ternary value in 294 input, i.e. taking care of double rounding. Two possible forms: like 295 mpfr_set (i.e. with input and output) or like mpfr_prec_round (i.e. with 296 a single variable). mpfr_subnormalize and mpfr_round_nearest_away_end 297 could use it. 298 299- UBF functions for +, -, *, fmma, /, sqrt. 300 Support UBF in mpfr_check_range or add mpfr_ubf_check_range? 301 Make this available in the API, e.g. for MPC. 302 303- mpfr_cmp_uj, mpfr_cmp_sj, mpfr_mul_uj, mpfr_mul_sj. They would be useful 304 to test MPFR with _MPFR_EXP_FORMAT=4. 305 306- base conversion with the round-trip property using a minimal precision, 307 such as the to_chars functions from the C++ standard: 308 309 The functions [...] ensure that the string representation consists 310 of the smallest number of characters such that there is at least 311 one digit before the radix point (if present) and parsing the 312 representation using the corresponding from_chars function 313 recovers value exactly. [Note: This guarantee applies only if 314 to_chars and from_chars are executed on the same implementation. 315 ��� end note] If there are several such representations, the 316 representation with the smallest difference from the 317 floating-point argument value is chosen, resolving any remaining 318 ties using rounding according to round_to_nearest. 319 320 Text from: https://www.zsh.org/mla/workers/2019/msg01138.html 321 322 Idea of implementation: The get_str.c code is based on mpn_get_str, so 323 that we would like to do something similar, but mpn_get_str doesn't give 324 any information about the error. So we should request additional digits 325 (a bit like what is done in get_str.c) and compute approximations to the 326 distance (as an mpz_t) between the input and the straddling midpoints, 327 the unit being the weight of the last requested digit. Then deduce the 328 closest string to the input with the fewest digits in the interval (data 329 are approximate, so that it may be impossible to answer, i.e. this needs 330 to be done in a Ziv loop). 331 332- Serialization / Deserialization. Suggested by Fr��d��ric P��trot: 333 https://sympa.inria.fr/sympa/arc/mpfr/2020-02/msg00006.html 334 like mpfr_fpif_{import,export}, but with memory instead of file. 335 336 Idea of implementation to reuse most of the code and change very little: 337 338 Instead of passing a FILE *fh, pass a struct ext_data *h, and instead of 339 using fread and fwrite, use 340 h->read (h, buffer, size) 341 h->write (h, buffer, size) 342 respectively. 343 344 The struct ext_data structure could contain the following fields: 345 * read: pointer to a wrapper function for the read method. 346 * write: pointer to a wrapper function for the write method. 347 * FILE *fh: to be used for operations with files. 348 * unsigned char *arena: to be used for operations with memory. 349 350 The wrapper functions for the read method could be: 351 352 static int 353 read_from_file (struct ext_data *h, unsigned char *buffer, size_t size) 354 { 355 return fread (buffer, size, 1, h->fh) != 1; 356 } 357 358 static int 359 read_from_memory (struct ext_data *h, unsigned char *buffer, size_t size) 360 { 361 if (h->arena == NULL) 362 return 1; 363 memcpy (buffer, h->arena, size); 364 h->arena += size; 365 return 0; 366 } 367 368 So I expect very few changes in the existing code: 369 * Write a few wrapper functions. 370 * Rename mpfr_fpif_export to mpfr_fpif_export_aux and 371 mpfr_fpif_import to mpfr_fpif_import_aux. 372 * In the existing functions, replace FILE *fh, and fread/fwrite 373 calls as mentioned above. 374 * Add new mpfr_fpif_export, mpfr_fpif_import, mpfr_fpif_export_mem, 375 mpfr_fpif_import_mem. 376 377- Fused divide-add and fused divide-subtract (mpfr_fda, mpfr_fds). 378 Reference: K. Pande, A. Parkhi, S. Jaykar and A. Peshattiwar, 379 "Design and Implementation of Floating Point Divide-Add Fused Architecture", 380 2015 Fifth International Conference on Communication Systems and Network 381 Technologies, 2015, pp. 797-800, doi: 10.1109/CSNT.2015.179. 382 However, this article does not provide a specification of special values 383 (but the same rules as FMA/FMS could be chosen, based on 1/��0 = ��Inf and 384 1/��Inf = ��0). 385 386 387############################################################################## 3885. Efficiency 389############################################################################## 390 391- Fredrik Johansson wrote a new preprint about computing the gamma function, 392 where he claims a speedup of 10 over MPFR at machine precision, and a 393 speedup of 1000 for a first call to gamma at 10000 digits: 394 https://hal.inria.fr/hal-03346642 395- Fredrik Johansson reports that mpfr_ai is slow for large arguments: an 396 asymptotic expansion should be used (once done, remove REDUCE_EMAX from 397 tests/tai.c and update the description in mpfr.texi). 398- for exp(x), Fredrik Johansson reports a 20% speed improvement starting from 399 4000 bits, and up to a 75% memory improvement in his Arb implementation, by 400 using recursive instead of iterative binary splitting: 401 https://github.com/fredrik-johansson/arb/blob/master/elefun/exp_sum_bs_powtab.c 402- improve mpfr_grandom using the algorithm in https://arxiv.org/abs/1303.6257 403- implement a mpfr_sqrthigh algorithm based on Mulders' algorithm, with a 404 basecase variant 405- use mpn_div_q to speed up mpfr_div. However, mpn_div_q, which is new in 406 GMP 5, is not documented in the GMP manual, thus we are not sure it 407 guarantees to return the same quotient as mpn_tdiv_qr. 408 Also mpfr_div uses the remainder computed by mpn_divrem. A workaround would 409 be to first try with mpn_div_q, and if we cannot (easily) compute the 410 rounding, then use the current code with mpn_divrem. 411- improve atanh(x) for small x by using atanh(x) = log1p(2x/(1-x)), 412 and log1p should also be improved for small arguments. 413- compute exp by using the series for cosh or sinh, which has half the terms 414 (see Exercise 4.11 from Modern Computer Arithmetic, version 0.3) 415 The same method can be used for log, using the series for atanh, i.e., 416 atanh(x) = 1/2*log((1+x)/(1-x)). 417- improve mpfr_gamma (see https://code.google.com/p/fastfunlib/). A possible 418 idea is to implement a fast algorithm for the argument reconstruction 419 gamma(x+k): instead of performing k products by x+i, we could precompute 420 x^2, ..., x^m for m ~ sqrt(k), and perform only sqrt(k) products. 421 One could also use the series for 1/gamma(x), see for example 422 https://dlmf.nist.gov/5/7/ or formula (36) from 423 https://mathworld.wolfram.com/GammaFunction.html 424- improve the computation of Bernoulli numbers: instead of computing just one 425 B[2n] at a time in mpfr_bernoulli_internal, we could compute several at a 426 time, sharing the expensive computation of the 1/p^(2n) series. 427- fix regression with mpfr_mpz_root (from Keith Briggs, 5 July 2006), for 428 example on 3Ghz P4 with gmp-4.2, x=12.345: 429 prec=50000 k=2 k=3 k=10 k=100 430 mpz_root 0.036 0.072 0.476 7.628 431 mpfr_mpz_root 0.004 0.004 0.036 12.20 432 See also mail from Carl Witty on mpfr list, 09 Oct 2007. 433- for sparse input (say x=1 with 2 bits), mpfr_exp is not faster than for 434 full precision when precision <= MPFR_EXP_THRESHOLD. The reason is 435 that argument reduction kills sparsity. Maybe avoid argument reduction 436 for sparse input? 437- speed up mpfr_atan for large arguments (to speed up mpc_log) see FR #6198 438- improve mpfr_sin on values like ~pi (do not compute sin from cos, because 439 of the cancellation). For instance, reduce the input modulo pi/2 in 440 [-pi/4,pi/4], and define auxiliary functions for which the argument is 441 assumed to be already reduced (so that the sin function can avoid 442 unnecessary computations by calling the auxiliary cos function instead of 443 the full cos function). This will require a native code for sin, for 444 example using the reduction sin(3x)=3sin(x)-4sin(x)^3. 445 See https://sympa.inria.fr/sympa/arc/mpfr/2007-08/msg00001.html and 446 the following messages. 447- improve generic.c to work for number of terms <> 2^k 448- rewrite mpfr_greater_p... as native code. 449 450- mpf_t uses a scheme where the number of limbs actually present can 451 be less than the selected precision, thereby allowing low precision 452 values (for instance small integers) to be stored and manipulated in 453 an mpf_t efficiently. 454 455 Perhaps mpfr should get something similar, especially if looking to 456 replace mpf with mpfr, though it'd be a major change. Alternately 457 perhaps those mpfr routines like mpfr_mul where optimizations are 458 possible through stripping low zero bits or limbs could check for 459 that (this would be less efficient but easier). 460 461- try the idea of the paper "Reduced Cancellation in the Evaluation of Entire 462 Functions and Applications to the Error Function" by W. Gawronski, J. Mueller 463 and M. Reinhard, to be published in SIAM Journal on Numerical Analysis: to 464 avoid cancellation in say erfc(x) for x large, they compute the Taylor 465 expansion of erfc(x)*exp(x^2/2) instead (which has less cancellation), 466 and then divide by exp(x^2/2) (which is simpler to compute). 467 468- replace the *_THRESHOLD macros by global (TLS) variables that can be 469 changed at run time (via a function, like other variables)? One benefit 470 is that users could use a single MPFR binary on several machines (e.g., 471 a library provided by binary packages or shared via NFS) with different 472 thresholds. On the default values, this would be a bit less efficient 473 than the current code, but this isn't probably noticeable (this should 474 be tested). Something like: 475 long *mpfr_tune_get(void) to get the current values (the first value 476 is the size of the array). 477 int mpfr_tune_set(long *array) to set the tune values. 478 int mpfr_tune_run(long level) to find the best values (the support 479 for this feature is optional, this can also be done with an 480 external function). 481 482- better distinguish different processors (for example Opteron and Core 2) 483 and use corresponding default tuning parameters (as in GMP). This could be 484 done in configure.ac to avoid hacking config.guess, for example define 485 MPFR_HAVE_CORE2. 486 Note (VL): the effect on cross-compilation (that can be a processor 487 with the same architecture, e.g. compilation on a Core 2 for an 488 Opteron) is not clear. The choice should be consistent with the 489 build target (e.g. -march or -mtune value with gcc). 490 Also choose better default values. For instance, the default value of 491 MPFR_MUL_THRESHOLD is 40, while the best values that have been found 492 are between 11 and 19 for 32 bits and between 4 and 10 for 64 bits! 493 494- during the Many Digits competition, we noticed that (our implantation of) 495 Mulders short product was slower than a full product for large sizes. 496 This should be precisely analyzed and fixed if needed. 497 498- for various functions, check the timings as a function of the magnitude 499 of the input (and the input and/or output precisions?), and use better 500 thresholds for asymptotic expansions. 501 502- improve the special case of mpfr_{add,sub} (x, x, y, ...) when |x| > |y| 503 to do the addition in-place and have a complexity of O(prec(y)) in most 504 cases. The mpfr_{add,sub}_{d,ui} functions should automatically benefit 505 from this change. 506 507- in gmp_op.c, for functions with mpz_srcptr, check whether mpz_fits_slong_p 508 is really useful in all cases (see TODO in this file). 509 510- optimize code that uses a test based on the fact that x >> s is 511 undefined in C for s == width of x but the result is expected to 512 be 0. ARM and PowerPC could benefit from such an optimization, 513 but not x86. This needs support from the compiler. 514 For PowerPC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79233 515 516- deal with MPFR_RNDF in mpfr_round_near_x (replaced by MPFR_RNDZ). 517 518- instead of a fixed mparam.h, optionally use function multiversioning 519 (FMV), currently only available with the GNU C++ front end: 520 https://gcc.gnu.org/wiki/FunctionMultiVersioning 521 According to https://lwn.net/Articles/691932/ the dispatch resolution 522 is now done by the dynamic loader, so that this should be fast enough 523 (the cost would be the reading of a static variable, initialized at 524 load time, instead of a constant). 525 In particular, binary package distributions would benefit from FMV as 526 only one binary is generated for different processor families. 527 528- use intrinsics such as _addcarry_u64 (Intel specific) and 529 __builtin_addcll (clang specific) when available 530 (needs a configure test). 531 References: 532 https://software.intel.com/sites/landingpage/IntrinsicsGuide/ 533 https://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins 534 https://stackoverflow.com/q/11228855/3782797 535 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 536 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97387 537 https://stackoverflow.com/questions/71600474/difference-between-builtin-addcll-and-addcarry-u64 538 (would be useful for add1sp.c and sub1sp.c). 539 540- reduce the size of the MPFR library generated by GCC by avoiding 541 operations on _Decimal128 in get_d128.c when its encoding is BID 542 (i.e., software implementation). See FIXME in this file. 543 544 545############################################################################## 5466. Miscellaneous 547############################################################################## 548 549- [suggested by Tobias Burnus <burnus(at)net-b.de> and 550 Asher Langton <langton(at)gcc.gnu.org>, Wed, 01 Aug 2007] 551 support quiet and signaling NaNs in mpfr: 552 * functions to set/test a quiet/signaling NaN: mpfr_set_snan, mpfr_snan_p, 553 mpfr_set_qnan, mpfr_qnan_p 554 * correctly convert to/from double (if encoding of s/qNaN is fixed in 754R) 555 Note: Signaling NaNs are not specified by the ISO C standard and may 556 not be supported by the implementation. GCC needs the -fsignaling-nans 557 option (but this does not affect the C library, which may or may not 558 accept signaling NaNs). 559 560- check the constants mpfr_set_emin (-16382-63) and mpfr_set_emax (16383) in 561 get_ld.c and the other constants, and provide a testcase for large and 562 small numbers. 563 564- from Kevin Ryde <user42@zip.com.au>: 565 Also for pi.c, a pre-calculated compiled-in pi to a few thousand 566 digits would be good value I think. After all, say 10000 bits using 567 1250 bytes would still be small compared to the code size! 568 Store pi in round to zero mode (to recover other modes). 569 570- add other prototypes for round to nearest-away (mpfr_round_nearest_away 571 only deals with the prototypes of say mpfr_sin) or implement it as a native 572 rounding mode 573- add a new roundind mode: round to odd. If the result is not exactly 574 representable, then round to the odd mantissa. This rounding 575 has the nice property that for k > 1, if: 576 y = round(x, p+k, TO_ODD) 577 z = round(y, p, TO_NEAREST_EVEN), then 578 z = round(x, p, TO_NEAREST_EVEN) 579 so it avoids the double-rounding problem. 580 VL: I prefer the (original?) term "sticky rounding", as used in 581 J Strother Moore, Tom Lynch, Matt Kaufmann. A Mechanically Checked 582 Proof of the Correctness of the Kernel of the AMD5K86 Floating-Point 583 Division Algorithm. IEEE Transactions on Computers, 1996. 584 and 585 http://www.russinoff.com/libman/text/node26.html 586 587- new rounding mode MPFR_RNDE when the result is known to be exact? 588 * In normal mode, this would allow MPFR to optimize using 589 this information. 590 * In debug mode, MPFR would check that the result is exact 591 (i.e. that the ternary value is 0). 592 593- add tests of the ternary value for constants 594 595- When doing Extensive Check (--enable-assert=full), since all the 596 functions use a similar use of MACROS (ZivLoop, ROUND_P), it should 597 be possible to do such a scheme: 598 For the first call to ROUND_P when we can round. 599 Mark it as such and save the approximated rounding value in 600 a temporary variable. 601 Then after, if the mark is set, check if: 602 - we still can round. 603 - The rounded value is the same. 604 It should be a complement to tgeneric tests. 605 606- in div.c, try to find a case for which cy != 0 after the line 607 cy = mpn_sub_1 (sp + k, sp + k, qsize, cy); 608 (which should be added to the tests), e.g. by having {vp, k} = 0, or 609 prove that this cannot happen. 610 611- add a configure test for --enable-logging to ignore the option if 612 it cannot be supported. Modify the "configure --help" description 613 to say "on systems that support it". 614 615- add generic bad cases for functions that don't have an inverse 616 function that is implemented (use a single Newton iteration). 617 618- add bad cases for the internal error bound (by using a dichotomy 619 between a bad case for the correct rounding and some input value 620 with fewer Ziv iterations?). 621 622- add an option to use a 32-bit exponent type (int) on LP64 machines, 623 mainly for developers, in order to be able to test the case where the 624 extended exponent range is the same as the default exponent range, on 625 such platforms. 626 Tests can be done with the exp-int branch (added on 2010-12-17, and 627 many tests fail at this time). 628 629- test underflow/overflow detection of various functions (in particular 630 mpfr_exp) in reduced exponent ranges, including ranges that do not 631 contain 0. 632 633- add an internal macro that does the equivalent of the following? 634 MPFR_IS_ZERO(x) || MPFR_GET_EXP(x) <= value 635 636- check whether __gmpfr_emin and __gmpfr_emax could be replaced by 637 a constant (see README.dev). Also check the use of MPFR_EMIN_MIN 638 and MPFR_EMAX_MAX. 639 640- add a test checking that no mpfr.h macros depend on mpfr-impl.h 641 (the current tests cannot check that since mpfr-impl.h is always 642 included). 643 644- move some macro definitions from acinclude.m4 to the m4 directory 645 as suggested by the Automake manual? The reason is that the 646 acinclude.m4 file is big and a bit difficult to read. 647 648- use symbol versioning. 649 650- check whether mpz_t caching (pool) is necessary. Timings with -static 651 with details about the C / C library implementation should be put 652 somewhere as a comment in the source or in the doc. Using -static 653 is important because otherwise the cache saves the dynamic call to 654 mpz_init and mpz_clear; so, what we're measuring is not clear. 655 See thread: 656 https://gmplib.org/list-archives/gmp-devel/2015-September/004147.html 657 Summary: It will not be integrated in GMP because 1) This yields 658 problems with threading (in MPFR, we have TLS variables, but this is 659 not the case of GMP). 2) The gain (if confirmed with -static) would 660 be due to a poor malloc implementation (timings would depend on the 661 platform). 3) Applications would use more RAM. 662 Additional notes [VL]: the major differences in the timings given 663 by Patrick in 2014-01 under Linux were: 664 Before: 665 arccos(x) took 0.054689 ms (32767 eval in 1792 ms) 666 arctan(x) took 0.042116 ms (32767 eval in 1380 ms) 667 After: 668 arccos(x) took 0.043580 ms (32767 eval in 1428 ms) 669 arctan(x) took 0.035401 ms (32767 eval in 1160 ms) 670 mpfr_acos doesn't use mpz, but calls mpfr_atan, so that the issue comes 671 from mpfr_atan, which uses mpz a lot. The problem mainly comes from the 672 reallocations in GMP because mpz_init is used instead of mpz_init2 with 673 the estimated maximum size. Other places in the code that uses mpz_init 674 may be concerned. 675 Issues with mpz_t caching: 676 * The pool can take much memory, which may no longer be useful. 677 For instance: 678 mpfr_init2 (x, 10000000); 679 mpfr_log_ui (x, 17, MPFR_RNDN); 680 /* ... */ 681 mpfr_clear (x); 682 /* followed by code using only small precision */ 683 while contrary to real caches, they contain no data. This is not 684 valuable memory: freeing/allocating a large block of memory is 685 much faster than the actual computations, so that mpz_t caching 686 has no impact on the performance in such cases. A pool with large 687 blocks also potentially destroys the data locality. 688 * It assumes that the real GMP functions are __gmpz_init and 689 __gmpz_clear, which are not part of the official GMP API, thus 690 is based on GMP internals, which may change in the future or 691 may be different in forks / compatible libraries / etc. This 692 can be solved if MPFR code calls mpfr_mpz_init / mpfr_mpz_clear 693 directly, avoiding the #define's. 694 Questions that need to be answered: 695 * What about the comparisons with other memory allocators? 696 * Shouldn't the pool be part of the memory allocator? 697 For the default memory allocator (malloc): RFE? 698 If it is decided to keep some form of mpz_t caching, a possible solution 699 for both issues: define mpfr_mpz_init2 and mpfr_mpz_clear2, which both 700 take 2 arguments like mpz_init2, where mpfr_mpz_init2 behaves in a way 701 similar to mpz_init2, and mpfr_mpz_clear2 behaves in a way similar to 702 mpz_clear but where the size argument is a hint for the pool; if it is 703 too large, then the mpz_t should not be pushed back to the pool. The 704 size argument of mpfr_mpz_init2 could also be a hint to decide which 705 element to pull from the pool. 706 707- in tsum, add testcases for mpfr_sum triggering the bug fixed in r9722, 708 that is, with a large error during the computation of the secondary term 709 (when the TMD occurs). 710 711- use the keyword "static" in array indices of parameter declarations with 712 C99 compilers (6.7.5.3p7) when the pointer is expected not to be null? 713 For instance, if mpfr.h is changed to have: 714 __MPFR_DECLSPEC void mpfr_dump (const __mpfr_struct [static 1]); 715 and one calls 716 mpfr_dump (NULL); 717 one gets a warning with Clang. This is just an example; this needs to be 718 done in a clean way. 719 See: 720 https://stackoverflow.com/a/3430353/3782797 721 https://hamberg.no/erlend/posts/2013-02-18-static-array-indices.html 722 723- change most mpfr_urandomb occurrences to mpfr_urandom in the tests? 724 (The one done in r10573 allowed us to find a bug even without 725 assertion checking.) 726 727- tzeta has been much slower since r9848 (which increases the precision 728 of the input for the low output precisions), at least with the x86 729 32-bit ABI. This seems to come from the fact that the working precision 730 in the mpfr_zeta implementation depends on the precision of the input. 731 Once mpfr_zeta has improved, change the last argument of test_generic 732 in tzeta.c back to 5 (as it was before r10667). 733 734- check the small-precision tables in the tests? 735 This may require to export some pointer to the tables, but this could 736 be done only if some debug macro is defined. 737 738- optionally use malloc() for the caches? See mpfr_mp_memory_cleanup. 739 Note: This can be implemented by adding a TLS flag saying whether we 740 are under cache generation or not, and by making the MPFR allocation 741 functions consider this flag. Moreover, this can only work for mpfr_t 742 caching (floating-point constants), not for mpz_t caching (Bernoulli 743 constants) because we do not have the control of memory allocation for 744 mpz_init. 745 746- use GCC's nonnull attribute (available since GCC 4.0) where applicable. 747 748- avoid the use of MPFR_MANT(x) as an lvalue; use other (more high level) 749 internal macros if possible, such as MPFR_TMP_INIT1, MPFR_TMP_INIT and 750 MPFR_ALIAS. 751 752- add some option to use pkg-config to get the compiler flags associated 753 with GMP ("pkg-config --cflags gmp" / "pkg-config --libs gmp"). It 754 could be one of 755 --with-pkg-config[=<PATH>] 756 --with-pkg-config-path[=<PATH>] 757 See also the autoconf macros. 758 The non-standard CPATH and C_INCLUDE_PATH environment variables need to 759 be unset (only when invoking pkg-config) as a workaround to 760 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=984504 761 762 763############################################################################## 7647. Portability 765############################################################################## 766 767- [Kevin about texp.c long strings] 768 For strings longer than c99 guarantees, it might be cleaner to 769 introduce a "tests_strdupcat" or something to concatenate literal 770 strings into newly allocated memory. I thought I'd done that in a 771 couple of places already. Arrays of chars are not much fun. 772 773- use https://gcc.gnu.org/viewcvs/gcc/trunk/config/stdint.m4 for mpfr-gmp.h 774 775- By default, GNU Automake adds -I options to local directories, with 776 the side effect that these directories have the precedence to search 777 for system headers (#include <...>). This may make the build fail if 778 a C implementation includes a file that has the same name as one used 779 in such a directory. 780 For instance, if one adds an empty file "src/bits/types.h", then the 781 MPFR build fails under Linux because /usr/include/stdio.h has 782 #include <bits/types.h> 783 Possible workaround: 784 * disable the default -I options with nostdinc as documented in 785 the Automake manual; 786 * have a rule that copies the needed files ("mpfr.h" or they should 787 be prefixed with "mpfr-") to $(top_builddir)/include; 788 * use "-I$(top_builddir)/include". 789 790- _Noreturn (from ISO C11) may be replaced by [[noreturn]] in the future: 791 https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2764.pdf 792 Once this is implemented in compilers, consider it as an alternative 793 (concerned files: acinclude.m4, doc/README.dev and src/mpfr-impl.h). 794