README

Copyright 1996, 1999, 2001, 2002, 2004 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 3 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.






This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

  Load and Store timing

On the PA7000, no memory instruction can issue during the two cycles after
a store.  For the PA7100, this is reduced to one cycle.

The PA7100 has a lockup-free cache, so it helps to schedule each load as
far as possible from the instruction that depends on it.
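
As a rough illustration of that scheduling advice, the C sketch below issues
the load for the next iteration before the add that consumes the previously
loaded word.  The name sum_words_pipelined is hypothetical and not part of
GMP.  A C compiler is free to reorder these statements, so this only shows
the shape of the loop; the actual scheduling has to be done in the assembly
code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GMP function): sums n 32-bit words, wrapping
   modulo 2^32.  The load of p[i] is placed ahead of the add that uses the
   word loaded one iteration earlier, so each load has a full iteration of
   distance from its dependent instruction. */
uint32_t sum_words_pipelined(const uint32_t *p, size_t n)
{
    uint32_t sum = 0;
    uint32_t cur;
    size_t i;

    if (n == 0)
        return 0;

    cur = p[0];                     /* prologue: first load */
    for (i = 1; i < n; i++) {
        uint32_t next = p[i];       /* load for the next iteration */
        sum += cur;                 /* consume the earlier load */
        cur = next;
    }
    sum += cur;                     /* epilogue: consume the last load */
    return sum;
}
```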

STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some sw pipelining is needed to avoid the
   xmpyu-fstds delay):

	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)

	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib	Loop

2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   (asymptotically) on the PA7100, using the instructions below.  With proper
   sw pipelining and the unrolling level below, the speed becomes 8
   cycles/limb.

	fldds	s1_ptr
	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	addc
	addc
	addc
	addc
	addc	%r0,%r0,cy-limb

	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	add
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib

3. For the PA8000 we have to stick to using 32-bit limbs until compiler
   support emerges.  But we want to use 64-bit operations whenever possible,
   in particular for loads and stores.  It is possible to handle mpn_add_n
   efficiently by rotating (when s1/s2 are aligned) or by masking+bit field
   inserting (when they are not).  The speed should double compared to the
   code used today.
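
One possible reading of the aligned case in item 3, sketched in C.  The
assumptions here are not stated in this file: limbs are stored least
significant first, the machine is big-endian, and so a 64-bit load of two
adjacent limbs puts the less significant limb in the upper half, which a
32-bit rotate corrects.  The helpers load64_be, store64_be and rot32 and the
function add_n_pairs are hypothetical names; the load is modelled explicitly
so the sketch behaves the same on any host.  Only an even limb count is
handled.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;            /* 32-bit limb, as in item 3 */

/* Model of a big-endian 64-bit load covering p[0] and p[1]: the limb at
   the lower address ends up in the upper half of the register. */
static uint64_t load64_be(const limb_t *p)
{
    return ((uint64_t)p[0] << 32) | p[1];
}

/* Model of the matching big-endian 64-bit store. */
static void store64_be(limb_t *p, uint64_t v)
{
    p[0] = (limb_t)(v >> 32);
    p[1] = (limb_t)v;
}

/* Swap the two 32-bit halves. */
static uint64_t rot32(uint64_t v)
{
    return (v << 32) | (v >> 32);
}

/* Hypothetical sketch: rp[] = s1[] + s2[] over n limbs (n even), two limbs
   per 64-bit operation.  Returns the carry out (0 or 1). */
limb_t add_n_pairs(limb_t *rp, const limb_t *s1, const limb_t *s2, size_t n)
{
    uint64_t cy = 0;
    size_t i;

    for (i = 0; i < n; i += 2) {
        uint64_t a = rot32(load64_be(s1 + i));  /* numeric order */
        uint64_t b = rot32(load64_be(s2 + i));
        uint64_t t = a + cy;
        uint64_t c1 = t < cy;       /* carry from adding the carry-in */
        uint64_t r = t + b;
        uint64_t c2 = r < b;        /* carry from adding b */
        cy = c1 | c2;               /* at most one of the two can be set */
        store64_be(rp + i, rot32(r));
    }
    return (limb_t)cy;
}
```

The unaligned case (masking and bit-field inserting) is not covered by this
sketch.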
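
For reference, a portable C sketch of what the mpn_mul_1 and mpn_addmul_1
loops in items 1 and 2 compute, assuming 32-bit limbs.  The names ref_mul_1
and ref_addmul_1 are hypothetical and this is not the GMP implementation.
The 64-bit multiply stands in for xmpyu, which forms a 32x32->64 bit product
in the floating-point unit; the skeletons above then pass the product halves
through the stack (fstds/ldws) before the addc carry chain.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;            /* 32-bit limb */

/* Hypothetical reference: rp[] = s1[] * v, returning the carry-out limb. */
limb_t ref_mul_1(limb_t *rp, const limb_t *s1, size_t n, limb_t v)
{
    limb_t cy = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* 32x32->64 product plus carry-in; cannot overflow 64 bits */
        uint64_t p = (uint64_t)s1[i] * v + cy;
        rp[i] = (limb_t)p;          /* low half is the result limb */
        cy = (limb_t)(p >> 32);     /* high half carries into the next limb */
    }
    return cy;
}

/* Hypothetical reference: rp[] += s1[] * v, returning the carry-out limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *s1, size_t n, limb_t v)
{
    limb_t cy = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* product + old result limb + carry-in still fits in 64 bits */
        uint64_t p = (uint64_t)s1[i] * v + rp[i] + cy;
        rp[i] = (limb_t)p;
        cy = (limb_t)(p >> 32);
    }
    return cy;
}
```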

LABEL SYNTAX

The HP-UX assembler takes labels starting in column 0 with no colon,

	L$loop  ldws,mb -4(0,%r25),%r22

Gas on hppa GNU/Linux, however, requires a colon,

	L$loop: ldws,mb -4(0,%r25),%r22

This is covered by using LDEF() from asm-defs.m4.  An alternative would be
to use ".label" which is accepted by both,

		.label  L$loop
		ldws,mb -4(0,%r25),%r22

but that's not as nice to look at, at least not if you're used to assembler
code having labels in column 0.



REFERENCES

Hewlett Packard, "HP Assembler Reference Manual", 9th edition, June 1998,
part number 92432-90012.



----------------
Local variables:
mode: text
fill-column: 76
End: