/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/
  copyi.asm
    61 C above 50 limbs or so.
  sqr_basecase.asm
    183 C ecx counter, limbs, negative
    334 C ebp scratch (fetched dst limbs)
  dive_1.asm
    144 C ebp counter, limbs, negative
/netbsd-current/external/lgpl3/gmp/dist/mpn/ia64/
  mode1o.asm
    86 C limbs it's the other way around, ie. we have the inverse and are waiting
    99 C And the load for the low limbs src[0] and src[1] can be initiated long
    165 C Process two source limbs per iteration using a two-limb inverse and a
    184 C or 18 cycles for 3 limbs, giving 5.66 or 6.0 cycles/limb.
/netbsd-current/external/lgpl3/gmp/dist/mpn/pa32/hppa1_1/pa7100/
  sub_n.asm
    47 sub %r20,%r19,%r28 C subtract first limbs ignoring cy
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/k7/
  mul_basecase.asm
    34 C K7: approx 4.42 cycles per cross product at around 20x20 limbs (16
    35 C limbs/loop unrolling).
    117 C two limbs by one limb
    393 C VAR_ADJUST is the negative of how many limbs the leals in the inner loop
  sqr_basecase.asm
    35 C (measured on the speed difference between 25 and 50 limbs, which is
    158 C Three limbs
    400 C edx VAR_COUNTER, limbs, negative
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/p6/
  sqr_basecase.asm
    35 C product (measured on the speed difference between 20 and 40 limbs,
    181 C three limbs
    330 C ecx counter, limbs, negative, -(size-1) to -1
    423 C edx VAR_COUNTER, limbs, negative
  mul_basecase.asm
    34 C P6: approx 6.5 cycles per cross product (16 limbs/loop unrolling).
    114 C two limbs by one limb
    396 C VAR_ADJUST is the negative of how many limbs the leals in the inner loop
  mode1o.asm
    134 C ebx counter, limbs, negative
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/pentium4/sse2/
  addlsh1_n.asm
    82 C ecx counter, limbs, negative
  add_n.asm
    80 C ecx counter, limbs, negative
  sub_n.asm
    80 C ecx counter, limbs, negative
  divrem_1.asm
    99 dnl At 4 limbs the div is a touch faster than the mul (and of course
    100 dnl simpler), so start the mul from 5 limbs.
    420 movd -4(%esi), %mm1 C next src limbs
/netbsd-current/external/lgpl3/gmp/dist/mpn/arm64/
  copyd.asm
    80 C Copy last 0-3 limbs. Note that rp is aligned after loop, but not when we
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/pentium/
  dive_1.asm
    175 C ebx counter, limbs, negative
    211 C ebx counter, limbs, negative
  bdiv_q_1.asm
    180 C ebx counter, limbs, negative
    214 C ebx counter, limbs, negative
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/k6/
  mul_1.asm
    145 C limiting factor. At 4 limbs/loop and 1 cycle/loop of overhead it's 6.25
    227 C ecx 0 to 3 representing respectively 4 to 1 further limbs
  mul_basecase.asm
    34 C K6: approx 9.0 cycles per cross product on 30x30 limbs (with 16 limbs/loop
    129 C two limbs by one limb
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86_64/fastsse/
  copyd-palignr.asm
    57 C limbs. We use the SSSE3 palignr instruction when rp - up = 8 (mod 16).
    59 C For operands of < COPYD_SSE_THRESHOLD limbs, we use a plain 64-bit loop,
  copyi-palignr.asm
    59 C limbs. We use the SSSE3 palignr instruction when rp - up = 8 (mod 16). That
    63 C For operands of < COPYI_SSE_THRESHOLD limbs, we use a plain 64-bit loop,
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/k6/mmx/
  logops_n.asm
    112 C able to get 1.0 with just a 4 limb loop, being 3 instructions per 2 limbs.
    113 C The others are 4 instructions per 2 limbs, and so can only approach 1.0
/netbsd-current/external/lgpl3/gmp/dist/mpn/sparc64/ultrasparct3/
  aorslsh_n.asm
    39 C For sublsh_n we combine the two shifted limbs using xnor, using the identity
/netbsd-current/external/lgpl3/gmp/dist/mpn/alpha/ev67/
  hamdist.asm
    43 C The main loop processes two limbs from each operand on each iteration. An
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/k7/mmx/
  popham.asm
    47 C execution units, possibly leading to 3.25 c/l (13 cycles for 4 limbs).