dnl  IBM POWER mpn_addmul_1 -- Multiply a limb vector with a limb and add the
dnl  result to a second limb vector.

dnl  Copyright 1992, 1994, 1999-2001 Free Software Foundation, Inc.

dnl  This file is part of the GNU MP Library.
dnl
dnl  The GNU MP Library is free software; you can redistribute it and/or modify
dnl  it under the terms of either:
dnl
dnl    * the GNU Lesser General Public License as published by the Free
dnl      Software Foundation; either version 3 of the License, or (at your
dnl      option) any later version.
dnl
dnl  or
dnl
dnl    * the GNU General Public License as published by the Free Software
dnl      Foundation; either version 2 of the License, or (at your option) any
dnl      later version.
dnl
dnl  or both in parallel, as here.
dnl
dnl  The GNU MP Library is distributed in the hope that it will be useful, but
dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
dnl  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
dnl  for more details.
dnl
dnl  You should have received copies of the GNU General Public License and the
dnl  GNU Lesser General Public License along with the GNU MP Library.  If not,
dnl  see https://www.gnu.org/licenses/.


dnl  INPUT PARAMETERS
dnl  res_ptr	r3
dnl  s1_ptr	r4
dnl  size	r5
dnl  s2_limb	r6

dnl  The POWER architecture has no unsigned 32x32->64 bit multiplication
dnl  instruction.  To obtain that operation, we have to use the 32x32->64
dnl  signed multiplication instruction, and add the appropriate compensation to
dnl  the high limb of the result.  We add the multiplicand if the multiplier
dnl  has its most significant bit set, and we add the multiplier if the
dnl  multiplicand has its most significant bit set.  We need to preserve the
dnl  carry flag between each iteration, so we have to compute the compensation
dnl  carefully (the natural, srai+and doesn't work).  Since all POWER can
dnl  branch in zero cycles, we use conditional branches for the compensation.

include(`../config.m4')

ASM_START()
PROLOGUE(mpn_addmul_1)
	cal	3,-4(3)
	l	0,0(4)
	cmpi	0,6,0
	mtctr	5
	mul	9,0,6
	srai	7,0,31
	and	7,7,6
	mfmq	8
	cax	9,9,7
	l	7,4(3)
	a	8,8,7		C add res_limb
	blt	Lneg
Lpos:	bdz	Lend

Lploop:	lu	0,4(4)
	stu	8,4(3)
	cmpi	0,0,0
	mul	10,0,6
	mfmq	0
	ae	8,0,9		C low limb + old_cy_limb + old cy
	l	7,4(3)
	aze	10,10		C propagate cy to new cy_limb
	a	8,8,7		C add res_limb
	bge	Lp0
	cax	10,10,6		C adjust high limb for negative limb from s1
Lp0:	bdz	Lend0
	lu	0,4(4)
	stu	8,4(3)
	cmpi	0,0,0
	mul	9,0,6
	mfmq	0
	ae	8,0,10
	l	7,4(3)
	aze	9,9
	a	8,8,7
	bge	Lp1
	cax	9,9,6		C adjust high limb for negative limb from s1
Lp1:	bdn	Lploop

	b	Lend

Lneg:	cax	9,9,0
	bdz	Lend
Lnloop:	lu	0,4(4)
	stu	8,4(3)
	cmpi	0,0,0
	mul	10,0,6
	mfmq	7
	ae	8,7,9
	l	7,4(3)
	ae	10,10,0		C propagate cy to new cy_limb
	a	8,8,7		C add res_limb
	bge	Ln0
	cax	10,10,6		C adjust high limb for negative limb from s1
Ln0:	bdz	Lend0
	lu	0,4(4)
	stu	8,4(3)
	cmpi	0,0,0
	mul	9,0,6
	mfmq	7
	ae	8,7,10
	l	7,4(3)
	ae	9,9,0		C propagate cy to new cy_limb
	a	8,8,7		C add res_limb
	bge	Ln1
	cax	9,9,6		C adjust high limb for negative limb from s1
Ln1:	bdn	Lnloop
	b	Lend

Lend0:	cal	9,0(10)
Lend:	st	8,4(3)
	aze	3,9
	br
EPILOGUE(mpn_addmul_1)
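
dnl  NOTE (illustrative sketch, not part of the original routine): a worked
dnl  derivation of the high-limb compensation described in the header comment.
dnl  The names u, v, su, sv are introduced here for illustration only.
dnl
dnl  Let u and v be the two 32-bit limbs read as unsigned values, and su, sv
dnl  the same bit patterns read as signed values.  Then
dnl
dnl    u = su + 2^32 if the top bit of u is set, else u = su
dnl    v = sv + 2^32 if the top bit of v is set, else v = sv
dnl
dnl  Multiplying out and reducing modulo 2^64 gives
dnl
dnl    u*v = su*sv + 2^32 * (v if top bit of u set) + 2^32 * (u if top bit of v set)
dnl
dnl  so the low limb of the signed product is already correct, and the high
dnl  limb needs v added when u is negative as a signed value and u added when
dnl  v is negative, both modulo 2^32.
dnl
dnl  Example: u = 0xFFFFFFFF (su = -1), v = 0x00000002 (sv = 2).
dnl    signed product     = -2 = 0xFFFFFFFF FFFFFFFE  (high, low)
dnl    u has top bit set  -> high = 0xFFFFFFFF + 2 = 0x00000001 (mod 2^32)
dnl    unsigned product   = 0x00000001 FFFFFFFE, which matches.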