History log of /freebsd-11-stable/lib/msun/src/k_tanf.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 239192 11-Aug-2012 dim

Change a few extern inline functions in libm to static inline, since
they need to refer to static constants, which C99 does not allow for
extern inline functions.

While here, change a comment in e_rem_pio2f.c to mention the correct
number of bits.

Reviewed by: bde
MFC after: 1 week


# 193368 03-Jun-2009 ed

Use ISO C99 style inline semantics in msun.

Because we use ISO C99 nowadays, we can just get rid of enforcing
GNU89-style inlining.


# 187128 13-Jan-2009 das

Use __gnu89_inline so that these files will compile with newer versions
of gcc, where the meaning of 'inline' was changed to match C99.

Noticed by: rdivacky


# 176451 22-Feb-2008 das

s/rcsid/__FBSDID/


# 152881 28-Nov-2005 bde

Rearranged the polynomial evaluation some more to reduce dependencies.
Instead of echoing the code in a comment, try to describe why we split
up the evaluation in a special way.

The new optimization is mostly to move the evaluation of w = z*z later
so that everything else (except z = x*x) doesn't have to wait for w.
On Athlons, FP multiplication has a latency of 4 cycles so this
optimization saves 4 cycles per call provided no new dependencies are
introduced. Tweaking the other terms in to reduce dependencies saves
a couple more cycles in some cases (more on AXP than on A64; up to 8
cycles out of 56 altogether in some cases). The previous version had
a similar optimization for s = z*x. Special optimizations like these
probably have a larger effect than the simple 2-way vectorization
permitted (but not activated by gcc) in the old version, since 2-way
vectorization is not enough and the polynomial's degree is so small
in the float case that non-vectorizable dependencies dominate.

On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
takes 34-55 cycles (was 39-59 cycles).


# 152870 28-Nov-2005 bde

Changed spelling of the request-to-inline macro name to match the change
of the function name.

Added my (non-)copyright.

In k_tanf.c, added the first set of redundant parentheses to control
grouping which was claimed to be added in the previous commit.


# 152766 24-Nov-2005 bde

Minor cleanups and optimizations:

- Remove dead code that I forgot to remove in the previous commit.

- Calculate the sum of the lower terms of the polynomial (divided by
x**5) in a single expression (sum of odd terms) + (sum of even terms)
with parentheses to control grouping. This is clearer and happens to
give better instruction scheduling for a tiny optimization (an
average of about ~0.5 cycles/call on Athlons).

- Calculate the final sum in a single expression with parentheses to
control grouping too. Change the grouping from
first_term + (second_term + sum_of_lower_terms) to
(first_term + second_term) + sum_of_lower_terms. Normally the first
grouping must be used for accuracy, but extra precision makes any
grouping give a correct result so we can group for efficiency. This
is a larger optimization (average 3-4 cycles/call or 5%).

- Use parentheses to indicate that the C order of left to right evaluation
is what is wanted (for efficiency) in a multiplication too.

The old fdlibm code has several optimizations related to these. 2
involve doing an extra operation that can be done almost in parallel
on some superscalar machines but are pessimizations on sequential
machines. Others involve statement ordering or expression grouping.
All of these except the ordering for the combining the sums of the odd
and even terms seem to be ideal for Athlons, but parallelism is still
limited so all of these optimizations combined together with the ones
in this commit save only ~6-8 cycles (~10%).

On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
takes 39-59 cycles. I don't know of any more optimizations for tanf()
short of writing it all in asm with very MD instruction scheduling.
Hardware fsin takes 122-138 cycles. Most of the optimizations for
tanf() don't work very well for tan[l](). fdlibm tan() now takes
145-365 cycles.


# 152741 24-Nov-2005 bde

Optimized by eliminating the special case for 0.67434 <= |x| < pi/4.

A single polynomial approximation for tan(x) works in infinite precision
up to |x| < pi/2, but in finite precision, to restrict the accumulated
roundoff error to < 1 ulp, |x| must be restricted to less than about
sqrt(0.5/((1.5+1.5)/3)) ~= 0.707. We restricted it a bit more to
give a safety margin including some slop for optimizations. Now that
we use double precision for the calculations, the accumulated roundoff
error is in double-precision ulps so it can easily be made almost 2**29
times smaller than a single-precision ulp. Near x = pi/4 its maximum
is about 0.5+(1.5+1.5)*x**2/3 ~= 1.117 double-precision ulps.

The minimax polynomial needs to be different to work for the larger
interval. I didn't increase its degree the old degree is just large
enough to keep the final error less than 1 ulp and increasing the
degree would be a pessimization. The maximum error is now ~0.80
ulps instead of ~0.53 ulps.

The speedup from this optimization for uniformly distributed args in
[-2pi, 2pi] is 28-43% on athlons, depending on how badly gcc selected
and scheduled the instructions in the old version. The old version
has some int-to-float conversions that are apparently difficult to schedule
well, but gcc-3.3 somehow did everything ~10 cycles or ~10% faster than
gcc-3.4, with the difference especially large on AXPs. On A64s, the
problem seems to be related to documented penalties for moving single
precision data to undead xmm registers. With this version, the speed
is cycles is almost independent of the athlon and gcc version despite
the large differences in instruction selection to use the FPU on AXPs
and SSE on A64s.


# 152713 23-Nov-2005 bde

Use only double precision for "kernel" tanf (except for returning float).
This is a minor interface change. The function is renamed from
__kernel_tanf() to __kernel_tandf() so that misues of it will cause
link errors and not crashes.

This version is a routine translation with no special optimizations
for accuracy or efficiency. It gives an unimportant increase in
accuracy, from ~0.9 ulps to 0.5285 ulps. Almost all of the error is
from the minimax polynomial (~0.03 ulps and the final rounding step
(< 0.5 ulps). It gives strange differences in efficiency in the -5
to +10% range, with -O1 fairly consistently becoming faster and -O2
slower on AXP and A64 with gcc-3.3 and gcc-3.4.


# 152647 21-Nov-2005 bde

Mess up the "kernel" float trig function .c files with ifdefs so that
they can be #included in other .c files to give inline functions, and
use them to inline the functions in most callers (not in e_lgammaf_r.c).
__kernel_tanf() is too large and complicated for gcc to inline very well.

An athlons, this gives a speed increase under favourable pipeline
conditions of about 10% overall (larger for AXP, smaller for A64).
E.g., on AXP, sinf() on uniformly distributed args in [-2Pi, 2Pi]
now takes 30-56 cycles; it used to take 45-61 cycles; hardware fsin
takes 65-129.


# 152641 20-Nov-2005 bde

Use double precision to simplify and optimize a long division.

On athlons, this gives a speedup of 10-20% for tanf() on uniformly
distributed args in [-2Pi, 2Pi]. (It only directly applies for 43%
of the args and gives a 16-20% speedup for these (more for AXP than
A64) and this gives an overall speedup of 10-12% which is all that it
should; however, it gives an overall speedup of 17-20% with gcc-3.3
on AXP-A64 by mysteriously effected cases where it isn't executed.)

I originally intended to use double precision for all internals of
float trig functions and will probably still do this, but benchmarking
showed that converting to double precision and back is a pessimization
in cases where a simple float precision calculation works, so it may
be optimal to switch precisions only when using extra precision is
much simpler.


# 152343 12-Nov-2005 bde

Imoproved comments for the minimax polynomial.

Removed an unused variable.

Fixed some wrong comments and some nearby misformatting.


# 152284 10-Nov-2005 bde

As for __kernel_cosf() and __kernel_sinf(), use a fairly optimal minimax
polynomial for __kernel_tanf(). The old one was the double-precision
polynomial with coefficients truncated to float. Truncation is not
a good way to convert minimax polynomials to lower precision. Optimize
for efficiency and use the lowest-degree polynomial that gives a
relative error of less than 1 ulp. It has degree 13 instead of 27,
and happens to be 2.5 times more accurate (in infinite precision) than
the old polynomial (the maximum error is 0.017 ulps instead of 0.041
ulps).

Unlike for cosf and sinf, the old accuracy was close to being inadequate
-- the polynomial for double precision has a max error of 0.014 ulps
and nearly this small an error is needed. The new accuracy is also a
bit small, but exhaustive checking shows that even the old accuracy
was enough. The increased accuracy reduces the maximum relative error
in the final result on amd64 -O1 from 0.9588 ulps to 0.9044 ulps.


# 151969 02-Nov-2005 bde

Moved the optimization for tiny x from __kernel_tan[f](x) to tan[f](x)
so that it can be faster for tiny x and avoided for reduced x.

This improves things a little differently than for cosine and sine.
We still need to reclassify x in the "kernel" functions, but we get
an extra optimization for tiny x, and an overall optimization since
tiny reduced x rarely happens. We also get optimizations for space
and style. A large block of poorly duplicated code to fix a special
case is no longer needed. This supersedes the fixes in k_sin.c revs
1.9 and 1.11 and k_sinf.c 1.8 and 1.10.

Fixed wrong constant for the cutoff for "tiny" in tanf(). It was
2**-28, but should be almost the same as the cutoff in sinf() (2**-12).
The incorrect cutoff protected us from the bugs fixed in k_sinf.c 1.8
and 1.10, except 4 cases of reduced args passed the cutoff and needed
special handling in theory although not in practice. Now we essentially
use a cutoff of 0 for the case of reduced args, so we now have 0 special
args instead of 4.

This change makes no difference to the results for sinf() (since it
only changes the algorithm for the 4 special args and the results for
those happen not to change), but it changes lots of results for sin().
Exhaustive testing is impossible for sin(), but exhaustive testing
for sinf() (relative to a version with the old algorithm and a fixed
cutoff) shows that the changes in the error are either reductions or
from 0.5-epsilon ulps to 0.5+epsilon ulps. The new method just uses
some extra terms in approximations so it tends to give more accurate
results, and there are apparently no problems from having extra
accuracy. On amd64 with -O1, on all float args the error range in ulps
is reduced from (0.500, 0.665] to [0.335, 0.500) in 24168 cases and
increased from 0.500-epsilon to 0.500+epsilon in 24 cases. Non-
exhaustive testing by ucbtest shows no differences.


# 151963 02-Nov-2005 bde

Removed dead code for handling tan[f]() on odd multiples of pi/2. This
case never occurs since pi/2 is irrational so no multiple of it can
be represented as a float and we have precise arg reduction so we never
end up with a remainder of 0 in the "kernel" function unless the
original arg is 0.

If this case occurs, then we would now fall through to general code
that returns +-Inf (depending on the sign of the reduced arg) instead
of forcing +Inf. The correct handling would be to return NaN since
we would have lost so much precision that the correct result can be
anything _except_ +-Inf.

Don't reindent the else clause left over from this, although it was already
bogusly indented ("if (foo) return; else ..." just marches the indentation
to the right), since it will be removed too.

Index: k_tan.c
===================================================================
RCS file: /home/ncvs/src/lib/msun/src/k_tan.c,v
retrieving revision 1.10
diff -r1.10 k_tan.c
88,90c88
< if (((ix | low) | (iy + 1)) == 0)
< return one / fabs(x);
< else {
---
> {


# 151961 02-Nov-2005 bde

Fixed some of the silliness related to rev.1.8. In 1.8, "double" in
a declaration was not translated to "float" although bit fiddling on
double variables was translated. This resulted in garbage being put
into the low word of one of the doubles instead of non-garbage being
put into the only word of the intended float. This had no effect on
any result because:
- with doubles, the algorithm for calculating -1/(x+y) is unnecessarily
complicated. Just returning -1/((double)x+y) would work, and the
misdeclaration gave something like that except for messing up some
low bits with the bit fiddling.
- doubles have plenty of bits to spare so messing up some of the low
bits is unlikely to matter.
- due to other bugs, the buggy code is reached for a whole 4 args out
of all 2**32 float args. The bug fixed by 1.8 only affects a small
percentage of cases and a small percentage of 4 is 0. The 4 args
happen to cause no problems without 1.8, so they are even less likely
to be affected by the bug in 1.8 than average args; in fact, neither
1.8 nor this commit makes any difference to the result for these 4
args (and thus for all args).

Corrections to the log message in 1.8: the bug only applies to tan()
and not tanf(), not because the float type can't represent numbers
large enough to trigger the problem (e.g., the example in the fdlibm-5.3
readme which is > 1.0e269), but because:
- the float type can't represent small enough numbers. For there to be
a possible problem, the original arg for tanf() must lie very near an
odd multiple of pi/2. Doubles can get nearer in absolute units. In
ulps there should be little difference, but ...
- ... the cutoff for "small" numbers is bogus in k_tanf.c. It is still
the double value (2**-28). Since this is 32 times smaller than
FLT_EPSILON and large float values are not very uniformly distributed,
only 6 args other than ones that are initially below the cutoff give
a reduced arg that passes the cutoff (the 4 problem cases mentioned
above and 2 non-problem cases).

Fixing the cutoff makes the bug affect tanf() and much easier to detect
than for tan(). With a cutoff of 2**-12 on amd64 with -O1, 670102
args pass the cutoff; of these, there are 337604 cases where there
might be an error of >= 1 ulp and 5826 cases where there is such an
error; the maximum error is 1.5382 ulps.

The fix in 1.8 works with the reduced cutoff in all cases despite the
bug in it. It changes the result in 84492 cases altogether to fix the
5826 broken cases. Fixing the fix by translating "double" to "float"
changes the result in 42 cases relative to 1.8. In 24 cases the
(absolute) error is increased and in 18 cases it is reduced, but it
remains less than 1 ulp in all cases.


# 129981 02-Jun-2004 das

Port a bugfix from FDLIBM 5.3. The bug really only applies to tan()
and not tanf() because float type can't represent numbers large enough
to trigger the problem. However, there seems to be a precedent that
the float versions of the fdlibm routines should mirror their double
counterparts.

Also update to the FDLIBM 5.3 license.

Obtained from: FDLIBM
Reviewed by: exhaustive comparison


# 97413 28-May-2002 alfred

Fix formatting, this is hard to explain, so I'll show one example.

- float ynf(int n, float x) /* wrapper ynf */
+float
+ynf(int n, float x) /* wrapper ynf */

This is because the __STDC__ stuff was indented.

Reviewed by: md5


# 97409 28-May-2002 alfred

Assume __STDC__, remove non-__STDC__ code.

Reviewed by: md5


# 50476 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 22993 22-Feb-1997 peter

Revert $FreeBSD$ to $Id$


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 8870 30-May-1995 rgrimes

Remove trailing whitespace.


# 2117 19-Aug-1994 jkh

This commit was generated by cvs2svn to compensate for changes in r2116,
which included commits to RCS files with non-trunk default branches.


# 2116 19-Aug-1994 jkh

J.T. Conklin's latest version of the Sun math library.

-- Begin comments from J.T. Conklin:
The most significant improvement is the addition of "float" versions
of the math functions that take float arguments, return floats, and do
all operations in floating point. This doesn't help (performance)
much on the i386, but they are still nice to have.

The float versions were orginally done by Cygnus' Ian Taylor when
fdlibm was integrated into the libm we support for embedded systems.
I gave Ian a copy of my libm as a starting point since I had already
fixed a lot of bugs & problems in Sun's original code. After he was
done, I cleaned it up a bit and integrated the changes back into my
libm.
-- End comments

Reviewed by: jkh
Submitted by: jtc