This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

  Load and Store timing

On the PA7000, no memory instruction can issue in the two cycles after a
store.  On the PA7100, this is reduced to one cycle.
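
As a rough illustration of the store penalty described above, the C sketch
below counts issue cycles for a straight-line instruction sequence.  It
assumes a simplified single-issue model in which the store shadow is the only
source of stalls.  The names op_kind, issue_cycles and shadow are
hypothetical and do not come from this directory, and real PA7000/PA7100
timing depends on more than this one rule.

```c
#include <stddef.h>

/* Hypothetical instruction classes for this toy model. */
enum op_kind { OP_ALU, OP_LOAD, OP_STORE };

/* Count issue cycles for a straight-line sequence, assuming one instruction
   issues per cycle and no memory instruction may issue in the `shadow`
   cycles that follow a store (2 for the PA7000, 1 for the PA7100 according
   to the text above).  Everything else about the pipeline is ignored. */
static unsigned issue_cycles(const enum op_kind *ops, size_t n, unsigned shadow)
{
    unsigned cycle = 0;      /* cycle in which the next instruction issues */
    unsigned mem_ready = 0;  /* earliest cycle a memory instruction may issue */

    for (size_t i = 0; i < n; i++) {
        int is_mem = (ops[i] != OP_ALU);

        if (is_mem && cycle < mem_ready)
            cycle = mem_ready;              /* stall until the shadow ends */
        if (ops[i] == OP_STORE)
            mem_ready = cycle + 1 + shadow; /* block later memory ops */
        cycle++;
    }
    return cycle;
}
```

In this model two back-to-back stores take 4 cycles with a two-cycle shadow
and 3 cycles with a one-cycle shadow.  A store followed by two ALU
instructions and then a load takes 4 cycles with no stall, which is why
independent non-memory work is worth placing directly after each store.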

The PA7100 has a lockup-free cache, so it helps to schedule each load as far
as possible from the instruction that depends on it.

STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some software pipelining is needed to avoid the
   xmpyu-fstds delay):

	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)

	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib	Loop
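
For reference, the arithmetic that mpn_mul_1 performs can be written portably
in C as below.  This is only a sketch, not the code in this directory: it
assumes 32-bit limbs, and the names limb_t and ref_mul_1 are hypothetical.
The outline above appears to form the same 32x32->64 bit products with xmpyu,
pass them through the stack to the integer registers, and chain the carries
with addc, handling two limbs per iteration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* assumption: 32-bit limbs */

/* Multiply the n-limb number at s1p by the single limb s2, store the low
   n limbs of the product at rp, and return the most significant limb. */
static limb_t ref_mul_1(limb_t *rp, const limb_t *s1p, size_t n, limb_t s2)
{
    limb_t cy = 0;  /* carry limb passed from one position to the next */

    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + (2^32-1) < 2^64, so this sum cannot overflow. */
        uint64_t p = (uint64_t)s1p[i] * s2 + cy;
        rp[i] = (limb_t)p;          /* low half is the result limb */
        cy = (limb_t)(p >> 32);     /* high half carries into the next limb */
    }
    return cy;
}
```

A portable version like this is mainly useful for checking the output of a
hand-scheduled loop on random operands.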

2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   (asymptotically) on the PA7100, using the instructions below.  With proper
   software pipelining and the unrolling level below, the speed becomes 8
   cycles/limb.

	fldds	s1_ptr
	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	addc
	addc
	addc
	addc
	addc	%r0,%r0,cy-limb

	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	add
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib
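
The arithmetic behind mpn_addmul_1 can be sketched in portable C in the same
way.  Again this is an illustration under assumptions, not the code in this
directory: limbs are taken to be 32 bits wide, and the names limb_t and
ref_addmul_1 are hypothetical.  The outline above appears to split the work
into two carry chains per four limbs, one that assembles the product limbs
and one that adds them into the destination, whereas the sketch folds both
into a single 64-bit sum.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* assumption: 32-bit limbs */

/* Multiply the n-limb number at s1p by the single limb s2, add the product
   to the n-limb number at rp in place, and return the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *s1p, size_t n, limb_t s2)
{
    limb_t cy = 0;  /* carry limb passed from one position to the next */

    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so this sum cannot overflow. */
        uint64_t p = (uint64_t)s1p[i] * s2 + rp[i] + cy;
        rp[i] = (limb_t)p;          /* low half is the updated result limb */
        cy = (limb_t)(p >> 32);     /* high half carries into the next limb */
    }
    return cy;
}
```

The overflow bound in the comment is what allows the product, the existing
destination limb and the incoming carry to be combined in one step.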