This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

  Load and Store timing

On the PA7000, no memory instruction can issue during the two cycles after a
store.  For the PA7100, this is reduced to one cycle.

The PA7100 has a lockup-free cache, so it helps to schedule loads and their
dependent instructions as far apart as possible.

STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some software pipelining is needed to avoid the
   xmpyu-fstds delay):

        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)

        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib   Loop

2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   (asymptotically) on the PA7100, using the instructions below.  With proper
   software pipelining and the unrolling level below, the speed becomes 8
   cycles/limb.

        fldds   s1_ptr
        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        addc
        addc
        addc
        addc
        addc    %r0,%r0,cy-limb

        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        add
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib
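
REFERENCE SKETCH

For orientation, the arithmetic that the two loops under STATUS are meant to
compute can be sketched in portable C.  This is only an illustration, not the
code in this directory.  It assumes 32-bit limbs, and the names ref_mul_1 and
ref_addmul_1 are hypothetical stand-ins for mpn_mul_1 and mpn_addmul_1.  Each
64-bit product plays the role of one xmpyu result, and its two halves
correspond to the pair of ldws loads from the stack temporary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference for mpn_mul_1, assuming 32-bit limbs:
   {rp,n} = {s1,n} * s2_limb; returns the carry-out limb. */
uint32_t ref_mul_1(uint32_t *rp, const uint32_t *s1, size_t n,
                   uint32_t s2_limb)
{
    uint32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32 -> 64 bit unsigned multiply (what xmpyu produces). */
        uint64_t prod = (uint64_t)s1[i] * s2_limb;
        uint32_t lo = (uint32_t)prod;          /* low product word  */
        uint32_t hi = (uint32_t)(prod >> 32);  /* high product word */

        lo += cy;              /* add carry limb from previous step */
        cy = hi + (lo < cy);   /* cannot overflow: hi <= 0xFFFFFFFE */
        rp[i] = lo;
    }
    return cy;
}

/* Hypothetical reference for mpn_addmul_1, assuming 32-bit limbs:
   {rp,n} += {s1,n} * s2_limb; returns the carry-out limb. */
uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *s1, size_t n,
                      uint32_t s2_limb)
{
    uint32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t prod = (uint64_t)s1[i] * s2_limb;
        uint32_t lo = (uint32_t)prod;
        uint32_t hi = (uint32_t)(prod >> 32);

        lo += cy;              /* first carry chain: product words */
        hi += (lo < cy);

        uint32_t r = rp[i] + lo;   /* second chain: add into result */
        hi += (r < lo);            /* still cannot overflow         */

        rp[i] = r;
        cy = hi;
    }
    return cy;
}
```

The two separate additions in ref_addmul_1 mirror the two carry chains in the
listing for item 2: one that folds the product words into a carry limb, and
one that adds the result into the words loaded from res_ptr.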