dnl  Alpha ev67 mpn_popcount -- mpn bit population count.

dnl  Copyright 2003, 2005 Free Software Foundation, Inc.

dnl  This file is part of the GNU MP Library.
dnl
dnl  The GNU MP Library is free software; you can redistribute it and/or modify
dnl  it under the terms of either:
dnl
dnl    * the GNU Lesser General Public License as published by the Free
dnl      Software Foundation; either version 3 of the License, or (at your
dnl      option) any later version.
dnl
dnl  or
dnl
dnl    * the GNU General Public License as published by the Free Software
dnl      Foundation; either version 2 of the License, or (at your option) any
dnl      later version.
dnl
dnl  or both in parallel, as here.
dnl
dnl  The GNU MP Library is distributed in the hope that it will be useful, but
dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
dnl  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
dnl  for more details.
dnl
dnl  You should have received copies of the GNU General Public License and the
dnl  GNU Lesser General Public License along with the GNU MP Library.  If not,
dnl  see https://www.gnu.org/licenses/.

include(`../config.m4')


C ev67: 1.5 cycles/limb


C unsigned long mpn_popcount (mp_srcptr src, mp_size_t size);
C
C This schedule seems necessary to get the full 1.5 c/l: the integer issue
C queue (IQ) can't quite hide all the latencies, so the addq's must be
C deferred to the next iteration.
C
C Since only 3 instructions are needed per limb (ldq, ctpop, addq), further
C unrolling could approach 1.0 c/l.
C
C The main loop processes two limbs at a time.  An odd size is handled by
C processing src[0] at the start.  If the size is even, that result is
C discarded and src[0] is processed again by the main loop.
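C
C For illustration only, a portable C sketch of the same odd/even trick and
C deferred accumulation.  It is not part of GMP: the name ref_popcount is
C hypothetical, it assumes 64-bit limbs (uint64_t from <stdint.h>) and
C size >= 1, and it uses the GCC/Clang builtin __builtin_popcountll in place
C of ctpop.
C
C	unsigned long ref_popcount (const uint64_t *src, long size)
C	{
C	  unsigned long total = __builtin_popcountll (src[0]); /* ctpop r0 */
C	  long pairs = size >> 1;          /* srl: number of limb pairs */
C	  unsigned long p0 = 0, p1 = 0;    /* r3, r4: deferred pops */
C	  if (pairs == 0)                  /* size == 1 */
C	    return total;
C	  if ((size & 1) == 0)
C	    total = 0;                     /* even: discard src[0] result */
C	  else
C	    src++;                         /* odd: src[0] already counted */
C	  do
C	    {
C	      total += p0 + p1;            /* add last iteration's pops */
C	      p0 = __builtin_popcountll (src[0]);
C	      p1 = __builtin_popcountll (src[1]);
C	      src += 2;
C	    }
C	  while (--pairs != 0);
C	  return total + p0 + p1;          /* add the final pair */
C	}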
C

ASM_START()
PROLOGUE(mpn_popcount)

	C r16	src
	C r17	size

	ldq	r0, 0(r16)		C L0 src[0]
	and	r17, 1, r8		C U1 1 if size odd
	srl	r17, 1, r17		C U0 size/2, number of limb pairs

	s8addq	r8, r16, r16		C L1 src++ if size odd
	ctpop	r0, r0			C U0
	beq	r17, L(one)		C U1 if size==1

	cmoveq	r8, r31, r0		C L  discard first limb if size even
	clr	r3			C L

	clr	r4			C L
	unop				C U
	unop				C L
	unop				C U


	ALIGN(16)
L(top):
	C r0	total accumulating
	C r3	pop 0
	C r4	pop 1
	C r16	src, incrementing
	C r17	limb pairs remaining, decrementing

	ldq	r1, 0(r16)		C L
	ldq	r2, 8(r16)		C L
	lda	r16, 16(r16)		C U
	lda	r17, -1(r17)		C U

	addq	r0, r3, r0		C L
	addq	r0, r4, r0		C L
	ctpop	r1, r3			C U0
	ctpop	r2, r4			C U0

	ldl	r31, 512(r16)		C L prefetch
	bne	r17, L(top)		C U


	addq	r0, r3, r0		C L
	addq	r0, r4, r0		C U
L(one):
	ret	r31, (r26), 1		C L0

EPILOGUE()
ASM_END()