musl/src/math/__cosdf.c
Szabolcs Nagy e216951f50 math: use double_t for temporaries to avoid stores on i386
When FLT_EVAL_METHOD!=0 (only i386 with x87 fp) the excess
precision of an expression must be removed in an assignment.
(gcc needs -fexcess-precision=standard or -std=c99 for this)

This is done by extra load/store instructions, which add code
bloat when a lot of temporaries are used and make the result
less precise in many cases.
Using double_t and float_t avoids these issues on i386 and
it makes no difference on other archs.
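
The difference between a `double` and a `double_t` temporary can be sketched in a few lines. The helper names below (`eval_method`, `mul_add_double`, `mul_add_double_t`) are hypothetical and not part of musl; the sketch only assumes a C99 compiler that provides `FLT_EVAL_METHOD` in `<float.h>` and `double_t` in `<math.h>`.

```c
#include <assert.h>
#include <float.h>
#include <math.h>   /* double_t */

/* Hypothetical helper: report how the compiler evaluates floating-point
   expressions. 0 = in the operand type, 1 = float and double as double,
   2 = everything as long double (x87), -1 = indeterminate. */
static int eval_method(void)
{
	return (int)FLT_EVAL_METHOD;
}

/* Hypothetical helper: a + b*c with a plain double temporary.
   When FLT_EVAL_METHOD != 0 and the compiler follows the standard
   excess-precision rules, the assignment to t must narrow the product
   to double, which on x87 typically costs a store and a reload. */
static double mul_add_double(double a, double b, double c)
{
	double t = b * c;
	return a + t;
}

/* Same computation with a double_t temporary. double_t is at least as
   wide as double and matches the evaluation format, so the temporary
   itself does not have to be narrowed. */
static double mul_add_double_t(double a, double b, double c)
{
	double_t t = b * c;
	assert(sizeof(double_t) >= sizeof(double));
	return a + t;
}
```

When `FLT_EVAL_METHOD` is 0, `double_t` is simply `double` and the two functions are equivalent, which is consistent with the note that the change makes no difference on other archs.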

For now only a few functions are modified where the excess
precision is clearly beneficial (mostly polynomial evaluations
with temporaries).

object size differences on i386, gcc-4.8:
             old   new
__cosdf.o    123    95
__cos.o      199   169
__sindf.o    131    95
__sin.o      225   203
__tandf.o    207   151
__tan.o      605   499
erff.o      1470  1416
erf.o       1703  1649
j0f.o       1779  1745
j0.o        2308  2274
j1f.o       1602  1568
j1.o        2286  2252
tgamma.o    1431  1424
math/*.o   64164 63635
2013-05-15 23:08:52 +00:00

/* origin: FreeBSD /usr/src/lib/msun/src/k_cosf.c */
/*
 * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com.
 * Debugged and optimized by Bruce D. Evans.
 */
/*
 * ====================================================
 * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
 *
 * Developed at SunPro, a Sun Microsystems, Inc. business.
 * Permission to use, copy, modify, and distribute this
 * software is freely granted, provided that this notice
 * is preserved.
 * ====================================================
 */

#include "libm.h"

/* |cos(x) - c(x)| < 2**-34.1 (~[-5.37e-11, 5.295e-11]). */
static const double
C0  = -0x1ffffffd0c5e81.0p-54, /* -0.499999997251031003120 */
C1  =  0x155553e1053a42.0p-57, /*  0.0416666233237390631894 */
C2  = -0x16c087e80f1e27.0p-62, /* -0.00138867637746099294692 */
C3  =  0x199342e0ee5069.0p-68; /*  0.0000243904487962774090654 */

float __cosdf(double x)
{
	double_t r, w, z;

	/* Try to optimize for parallel evaluation as in __tandf.c. */
	z = x*x;
	w = z*z;
	r = C2+z*C3;
	return ((1.0+z*C0) + w*C1) + (w*z)*r;
}
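
For experimenting outside the musl tree, the kernel can be reproduced as a standalone sketch. The names `cosdf_sketch` and `cosdf_sketch_selfcheck` are hypothetical; the coefficients are copied from the file above, and `<math.h>` stands in for `libm.h` only to obtain `double_t`. The sample point 0.5 assumes the kernel is meant for small arguments, since the file itself does not state the interval.

```c
#include <assert.h>
#include <math.h>   /* double_t; replaces the musl-internal libm.h */

/* Coefficients copied verbatim from __cosdf.c. */
static const double
C0 = -0x1ffffffd0c5e81.0p-54,
C1 =  0x155553e1053a42.0p-57,
C2 = -0x16c087e80f1e27.0p-62,
C3 =  0x199342e0ee5069.0p-68;

/* Hypothetical standalone copy of the kernel: evaluates
   1 + C0*z + C1*z^2 + C2*z^3 + C3*z^4 with z = x*x, grouped so that
   the two halves of the sum can be computed independently. */
static float cosdf_sketch(double x)
{
	double_t r, w, z;

	z = x*x;          /* x^2 */
	w = z*z;          /* x^4 */
	r = C2 + z*C3;    /* high-order part, scaled by x^6 below */
	return ((1.0 + z*C0) + w*C1) + (w*z)*r;
}

/* Hypothetical self-check against known values of cos. */
static void cosdf_sketch_selfcheck(void)
{
	double d;

	/* At x = 0 every term except the leading 1.0 vanishes. */
	assert(cosdf_sketch(0.0) == 1.0f);

	/* cos(0.5) is about 0.8775825619; the tolerance allows for the
	   final rounding to float. */
	d = (double)cosdf_sketch(0.5) - 0.8775825618903728;
	assert(d < 1e-7 && d > -1e-7);
}
```

Calling `cosdf_sketch_selfcheck()` from a test program is one way to confirm that a modified grouping of the terms still agrees with the reference values to float precision.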