Searched refs:limbs (Results 76 - 100 of 148) sorted by relevance


/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/
copyi.asm
61 C above 50 limbs or so.
sqr_basecase.asm
183 C ecx counter, limbs, negative
334 C ebp scratch (fetched dst limbs)
dive_1.asm
144 C ebp counter, limbs, negative
/netbsd-current/external/lgpl3/gmp/dist/mpn/ia64/
mode1o.asm
86 C limbs it's the other way around, ie. we have the inverse and are waiting
99 C And the load for the low limbs src[0] and src[1] can be initiated long
165 C Process two source limbs per iteration using a two-limb inverse and a
184 C or 18 cycles for 3 limbs, giving 5.66 or 6.0 cycles/limb.
/netbsd-current/external/lgpl3/gmp/dist/mpn/pa32/hppa1_1/pa7100/
sub_n.asm
47 sub %r20,%r19,%r28 C subtract first limbs ignoring cy
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/k7/
mul_basecase.asm
34 C K7: approx 4.42 cycles per cross product at around 20x20 limbs (16
35 C limbs/loop unrolling).
117 C two limbs by one limb
393 C VAR_ADJUST is the negative of how many limbs the leals in the inner loop
sqr_basecase.asm
35 C (measured on the speed difference between 25 and 50 limbs, which is
158 C Three limbs
400 C edx VAR_COUNTER, limbs, negative
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/p6/
sqr_basecase.asm
35 C product (measured on the speed difference between 20 and 40 limbs,
181 C three limbs
330 C ecx counter, limbs, negative, -(size-1) to -1
423 C edx VAR_COUNTER, limbs, negative
mul_basecase.asm
34 C P6: approx 6.5 cycles per cross product (16 limbs/loop unrolling).
114 C two limbs by one limb
396 C VAR_ADJUST is the negative of how many limbs the leals in the inner loop
mode1o.asm
134 C ebx counter, limbs, negative
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/pentium4/sse2/
addlsh1_n.asm
82 C ecx counter, limbs, negative
add_n.asm
80 C ecx counter, limbs, negative
sub_n.asm
80 C ecx counter, limbs, negative
divrem_1.asm
99 dnl At 4 limbs the div is a touch faster than the mul (and of course
100 dnl simpler), so start the mul from 5 limbs.
420 movd -4(%esi), %mm1 C next src limbs
/netbsd-current/external/lgpl3/gmp/dist/mpn/arm64/
copyd.asm
80 C Copy last 0-3 limbs. Note that rp is aligned after loop, but not when we
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/pentium/
dive_1.asm
175 C ebx counter, limbs, negative
211 C ebx counter, limbs, negative
bdiv_q_1.asm
180 C ebx counter, limbs, negative
214 C ebx counter, limbs, negative
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/k6/
mul_1.asm
145 C limiting factor. At 4 limbs/loop and 1 cycle/loop of overhead it's 6.25
227 C ecx 0 to 3 representing respectively 4 to 1 further limbs
mul_basecase.asm
34 C K6: approx 9.0 cycles per cross product on 30x30 limbs (with 16 limbs/loop
129 C two limbs by one limb
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86_64/fastsse/
copyd-palignr.asm
57 C limbs. We use the SSSE3 palignr instruction when rp - up = 8 (mod 16).
59 C For operands of < COPYD_SSE_THRESHOLD limbs, we use a plain 64-bit loop,
copyi-palignr.asm
59 C limbs. We use the SSSE3 palignr instruction when rp - up = 8 (mod 16). That
63 C For operands of < COPYI_SSE_THRESHOLD limbs, we use a plain 64-bit loop,
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/k6/mmx/
logops_n.asm
112 C able to get 1.0 with just a 4 limb loop, being 3 instructions per 2 limbs.
113 C The others are 4 instructions per 2 limbs, and so can only approach 1.0
/netbsd-current/external/lgpl3/gmp/dist/mpn/sparc64/ultrasparct3/
aorslsh_n.asm
39 C For sublsh_n we combine the two shifted limbs using xnor, using the identity
/netbsd-current/external/lgpl3/gmp/dist/mpn/alpha/ev67/
hamdist.asm
43 C The main loop processes two limbs from each operand on each iteration. An
/netbsd-current/external/lgpl3/gmp/dist/mpn/x86/k7/mmx/
popham.asm
47 C execution units, possibly leading to 3.25 c/l (13 cycles for 4 limbs).

