Cross Reference: /freebsd-11-stable/lib/msun/src/s

History log of /freebsd-11-stable/lib/msun/src/s_cosf.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 302408	07-Jul-2016	gjb	Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here. Additional commits post-branch will follow. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-11-stable/MAINTAINERS /freebsd-11-stable/cddl /freebsd-11-stable/cddl/contrib/opensolaris /freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print /freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs /freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs /freebsd-11-stable/contrib/amd /freebsd-11-stable/contrib/apr /freebsd-11-stable/contrib/apr-util /freebsd-11-stable/contrib/atf /freebsd-11-stable/contrib/binutils /freebsd-11-stable/contrib/bmake /freebsd-11-stable/contrib/byacc /freebsd-11-stable/contrib/bzip2 /freebsd-11-stable/contrib/com_err /freebsd-11-stable/contrib/compiler-rt /freebsd-11-stable/contrib/dialog /freebsd-11-stable/contrib/dma /freebsd-11-stable/contrib/dtc /freebsd-11-stable/contrib/ee /freebsd-11-stable/contrib/elftoolchain /freebsd-11-stable/contrib/elftoolchain/ar /freebsd-11-stable/contrib/elftoolchain/brandelf /freebsd-11-stable/contrib/elftoolchain/elfdump /freebsd-11-stable/contrib/expat /freebsd-11-stable/contrib/file /freebsd-11-stable/contrib/gcc /freebsd-11-stable/contrib/gcclibs/libgomp /freebsd-11-stable/contrib/gdb /freebsd-11-stable/contrib/gdtoa /freebsd-11-stable/contrib/groff /freebsd-11-stable/contrib/ipfilter /freebsd-11-stable/contrib/ldns /freebsd-11-stable/contrib/ldns-host /freebsd-11-stable/contrib/less /freebsd-11-stable/contrib/libarchive /freebsd-11-stable/contrib/libarchive/cpio /freebsd-11-stable/contrib/libarchive/libarchive /freebsd-11-stable/contrib/libarchive/libarchive_fe /freebsd-11-stable/contrib/libarchive/tar /freebsd-11-stable/contrib/libc++ /freebsd-11-stable/contrib/libc-vis /freebsd-11-stable/contrib/libcxxrt /freebsd-11-stable/contrib/libexecinfo /freebsd-11-stable/contrib/libpcap /freebsd-11-stable/contrib/libstdc++ /freebsd-11-stable/contrib/libucl /freebsd-11-stable/contrib/libxo /freebsd-11-stable/contrib/llvm /freebsd-11-stable/contrib/llvm/projects/libunwind /freebsd-11-stable/contrib/llvm/tools/clang /freebsd-11-stable/contrib/llvm/tools/lldb /freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump /freebsd-11-stable/contrib/llvm/tools/llvm-lto /freebsd-11-stable/contrib/mdocml /freebsd-11-stable/contrib/mtree /freebsd-11-stable/contrib/ncurses /freebsd-11-stable/contrib/netcat /freebsd-11-stable/contrib/ntp /freebsd-11-stable/contrib/nvi /freebsd-11-stable/contrib/one-true-awk /freebsd-11-stable/contrib/openbsm /freebsd-11-stable/contrib/openpam /freebsd-11-stable/contrib/openresolv /freebsd-11-stable/contrib/pf /freebsd-11-stable/contrib/sendmail /freebsd-11-stable/contrib/serf /freebsd-11-stable/contrib/sqlite3 /freebsd-11-stable/contrib/subversion /freebsd-11-stable/contrib/tcpdump /freebsd-11-stable/contrib/tcsh /freebsd-11-stable/contrib/tnftp /freebsd-11-stable/contrib/top /freebsd-11-stable/contrib/top/install-sh /freebsd-11-stable/contrib/tzcode/stdtime /freebsd-11-stable/contrib/tzcode/zic /freebsd-11-stable/contrib/tzdata /freebsd-11-stable/contrib/unbound /freebsd-11-stable/contrib/vis /freebsd-11-stable/contrib/wpa /freebsd-11-stable/contrib/xz /freebsd-11-stable/crypto/heimdal /freebsd-11-stable/crypto/openssh /freebsd-11-stable/crypto/openssl /freebsd-11-stable/gnu/lib /freebsd-11-stable/gnu/usr.bin/binutils /freebsd-11-stable/gnu/usr.bin/cc/cc_tools /freebsd-11-stable/gnu/usr.bin/gdb /freebsd-11-stable/lib/libc/locale/ascii.c /freebsd-11-stable/sys/cddl/contrib/opensolaris /freebsd-11-stable/sys/contrib/dev/acpica /freebsd-11-stable/sys/contrib/ipfilter /freebsd-11-stable/sys/contrib/libfdt /freebsd-11-stable/sys/contrib/octeon-sdk /freebsd-11-stable/sys/contrib/x86emu /freebsd-11-stable/sys/contrib/xz-embedded /freebsd-11-stable/usr.sbin/bhyve/atkbdc.h /freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c /freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h /freebsd-11-stable/usr.sbin/bhyve/console.c /freebsd-11-stable/usr.sbin/bhyve/console.h /freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c /freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c /freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h /freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c /freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h /freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c /freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h /freebsd-11-stable/usr.sbin/bhyve/rfb.c /freebsd-11-stable/usr.sbin/bhyve/rfb.h /freebsd-11-stable/usr.sbin/bhyve/sockstream.c /freebsd-11-stable/usr.sbin/bhyve/sockstream.h /freebsd-11-stable/usr.sbin/bhyve/usb_emul.c /freebsd-11-stable/usr.sbin/bhyve/usb_emul.h /freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c /freebsd-11-stable/usr.sbin/bhyve/vga.c /freebsd-11-stable/usr.sbin/bhyve/vga.h
# 176569	25-Feb-2008	bde	Inline __ieee754__rem_pio2f(). On amd64 (A64) and i386 (A64), this gives an average speedup of about 12 cycles or 17% for 9pi/4 < \|x\| <= 2**19pi/2 and a smaller speedup for larger x, and a small speeddown for \|x\| <= 9pi/4 (only 1-2 cycles average, but that is 4%). Inlining this is less likely to bust caches than inlining the float version since it is much smaller (about 220 bytes text and rodata) and has many fewer branches. However, the float version was already large due to its manual inlining of the branches and also the polynomial evaluations.
# 176552	25-Feb-2008	bde	Change __ieee754_rem_pio2f() to return double instead of float so that this function and its callers cosf(), sinf() and tanf() don't waste time converting values from doubles to floats and back for \|x\| > 9pi/4. All these functions were optimized a few years ago to mostly use doubles internally and across the __kernel() interfaces but not across the __ieee754_rem_pio2f() interface. This saves about 40 cycles in cosf(), sinf() and tanf() for \|x\| > 9pi/4 on amd64 (A64), and about 20 cycles on i386 (A64) (except for cosf() and sinf() in the upper range). 40 cycles is about 35% for \|x\| < 9pi/4 <= 219pi/2 and about 5% for \|x\| > 2*19pi/2. The saving is much larger on amd64 than on i386 since the conversions are not easy to optimize except on i386 where some of them are automatic and others are optimized invalidly. amd64 is still about 10% slower in cosf() and tanf() in the lower range due to conversion overhead. This also gives a tiny speedup for \|x\| <= 9pi/4 on amd64 (by simplifying the code). It also avoids compiler bugs and/or additional slowness in the conversions on (not yet supported) machines where double_t != double.
# 176451	22-Feb-2008	das	s/rcsid/__FBSDID/
# 152949	30-Nov-2005	bde	Fixed cosf(x) when x is a "negative" NaNs. I broke this in rev.1.10. cosf(x) is supposed to return something like x when x is a NaN, and we actually fairly consistently return x-x which is normally very like x (on i386 and and it is x if x is a quiet NaN and x with the quiet bit set if x is a signaling NaN. Rev.1.10 broke this by normalising x to fabsf(x). It's not clear if fabsf(x) is should preserve x if x is a NaN, but it actually clears the sign bit, and other parts of the code depended on this. The bugs can be fixed by saving x before normalizing it, and using the saved x only for NaNs, and using uint32_t instead of int32_t for ix so that negative NaNs are not misclassified even if fabsf() doesn't clear their sign bit, but gcc pessimizes the saving very well, especially on Athlon XPs (it generates extra loads and stores, and mixes use of the SSE and i387, and this somehow messes up pipelines). Normalizing x is not a very good optimization anyway, so stop doing it. (It adds latency to the FPU pipelines, but in previous versions it helped except for \|x\| <= 3pi/4 by simplifying the integer pipelines.) Use the same organization as in s_sinf.c and s_tanf.c with some branches reordered. These changes combined recover most of the performance of the unfixed version on A64 but still lose 10% on AXP with gcc-3.4 -O1 but not with gcc-3.3 -O1.
# 152872	28-Nov-2005	bde	Exploit skew-symmetry to avoid an operation: -sin(x-A) = sin(A-x). This gives a tiny but hopefully always free optimization in the 2 quadrants to which it applies. On Athlons, it reduces maximum latency by 4 cycles in these quadrants but has usually has a smaller effect on total time (typically ~2 cycles (~5%), but sometimes 8 cycles when the compiler generates poor code).
# 152871	28-Nov-2005	bde	Try to use the "proximity" (~) operator consistently in comments (x ~<= a, not x <= ~a). This got messed up in some places when the comments were moved from e_rem_pio2f.c. Added my (non-)copyright.
# 152869	28-Nov-2005	bde	Use only double precision for "kernel" cosf and sinf (except for returning float). The functions are renamed from __kernel_{cos,sin}f() to __kernel_{cos,sin}df() so that misuses of them will cause link errors and not crashes. This version is an almost-routine translation with no special optimizations for accuracy or efficiency. The not-quite-routine part is that in __kernel_cosf(), regenerating the minimax polynomial with double precision coefficients gives a coefficient for the x2 term that is not quite -0.5, so the literal 0.5 in the code and the related `hz' variable need to be modified; also, the special code for reducing the error in 1.0-x20.5 is no longer needed, so it is convenient to adjust all the logic for the x2 term a little. Note that without extra precision, it would be very bad to use a coefficient of other than -0.5 for the x2 term -- the old version depends on multiplication by -0.5 being infinitely precise so as not to need even more special code for reducing the error in 1-x20.5. This gives an unimportant increase in accuracy, from ~0.8 to ~0.501 ulps. Almost all of the error is from the final rounding step, since the choice of the minimax polynomials so that their contribution to the error is a bit less than 0.5 ulps just happens to give contributions that are significantly less (~.001 ulps). An Athlons, for uniformly distributed args in [-2pi, 2pi], this gives overall speed increases in the 10-20% range, despite giving a speed decrease of typically 19% (from 31 cycles up to 37) for sinf() on args in [-pi/4, pi/4].
# 152647	21-Nov-2005	bde	Mess up the "kernel" float trig function .c files with ifdefs so that they can be #included in other .c files to give inline functions, and use them to inline the functions in most callers (not in e_lgammaf_r.c). __kernel_tanf() is too large and complicated for gcc to inline very well. An athlons, this gives a speed increase under favourable pipeline conditions of about 10% overall (larger for AXP, smaller for A64). E.g., on AXP, sinf() on uniformly distributed args in [-2Pi, 2Pi] now takes 30-56 cycles; it used to take 45-61 cycles; hardware fsin takes 65-129.
# 152596	19-Nov-2005	bde	Moved all the optimizations for \|x\| <= 9pi/2 from __ieee754_rem_pio2f() to its 3 callers and manually inline them. On Athlons, with favourable compiler flags and optimizations and favourable pipeline conditions, this gives a speedup of 30-40 cycles for cosf(), sinf() and tanf() on the range pi/4 < \|x\| <= 9pi/4, so thes functions are now signifcantly faster than the hardware trig functions in many cases. E.g., in a benchmark with uniformly distributed x in [-2pi, 2pi], A64 hardware fcos took 72-129 cycles and cosf() took 37-55 cycles. Out-of-order execution is needed to get both of these times. The optimizations in this commit apparently work more by removing 1 serialization point than by reducing latency.
# 152540	17-Nov-2005	bde	Minor cleanups: s_cosf.c and s_sinf.c: Use a non-bogus magic constant for the threshold of pi/4. It was 2 ulps smaller than pi/4 rounded down, but its value is not critical so it should be the result of natural rounding. s_cosf.c and s_tanf.c: Use a literal 0.0 instead of an unnecessary variable initialized to [(float)]0.0. Let the function prototype convert to 0.0F. Improved wording in some comments. Attempted to improve indentation of comments.
# 151620	24-Oct-2005	bde	Moved the optimization for tiny x from __kernel_{cos,sin}[f](x) to {cos_sin}[f](x) so that x doesn't need to be reclassified in the "kernel" functions to determine if it is tiny (it still needs to be reclassified in the cosine case for other reasons that will go away). This optimization is quite large for exponentially distributed x, since x is tiny for almost half of the domain, but it is a pessimization for uniformally distributed x since it takes a little time for all cases but rarely applies. Arg reduction on exponentially distributed x rarely gives a tiny x unless the reduction is null, so it is best to only do the optimization if the initial x is tiny, which is what this commit arranges. The imediate result is an average optimization of 1.4% relative to the previous version in a case that doesn't favour the optimization (double cos(x) on all float x) and a large pessimization for the relatively unimportant cases of lgamma[f][_r](x) on tiny, negative, exponentially distributed x. The optimization should be recovered for lgamma() as part of fixing lgamma()'s low-quality arg reduction. Fixed various wrong constants for the cutoff for "tiny". For cosine, the cutoff is when x2/2! == {FLT or DBL}_EPSILON/2. We round down to an integral power of 2 (and for cos() reduce the power by another 1) because the exact cutoff doesn't matter and would take more work to determine. For sine, the exact cutoff is larger due to the ration of terms being x2/3! instead of x2/2!, but we use the same cutoff as for cosine. We now use a cutoff of 2-27 for double precision and 2-12 for single precision. 2-27 was used in all cases but was misspelled 2**27 in comments. Wrong and sloppy cutoffs just cause missed optimizations (provided the rounding mode is to nearest -- other modes just aren't supported).
# 97413	28-May-2002	alfred	Fix formatting, this is hard to explain, so I'll show one example. - float ynf(int n, float x) /* wrapper ynf / +float +ynf(int n, float x) / wrapper ynf */ This is because the __STDC__ stuff was indented. Reviewed by: md5
# 97409	28-May-2002	alfred	Assume __STDC__, remove non-__STDC__ code. Reviewed by: md5
# 50476	27-Aug-1999	peter	$Id$ -> $FreeBSD$
# 22993	22-Feb-1997	peter	Revert $FreeBSD$ to $Id$
# 21673	14-Jan-1997	jkh	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 8870	30-May-1995	rgrimes	Remove trailing whitespace.
# 2117	19-Aug-1994	jkh	This commit was generated by cvs2svn to compensate for changes in r2116, which included commits to RCS files with non-trunk default branches.
# 2116	19-Aug-1994	jkh	J.T. Conklin's latest version of the Sun math library. -- Begin comments from J.T. Conklin: The most significant improvement is the addition of "float" versions of the math functions that take float arguments, return floats, and do all operations in floating point. This doesn't help (performance) much on the i386, but they are still nice to have. The float versions were orginally done by Cygnus' Ian Taylor when fdlibm was integrated into the libm we support for embedded systems. I gave Ian a copy of my libm as a starting point since I had already fixed a lot of bugs & problems in Sun's original code. After he was done, I cleaned it up a bit and integrated the changes back into my libm. -- End comments Reviewed by: jkh Submitted by: jtc