Name                 Date         Size
addaddmul_1msb0.asm  22-Nov-2012  2.7 KiB
addmul_2.asm         22-Nov-2012  3.4 KiB
aorrlsh1_n.asm       22-Nov-2012  3.3 KiB
aorrlsh2_n.asm       22-Nov-2012  3.3 KiB
aorrlsh_n.asm        13-May-2013  3.2 KiB
aors_n.asm           13-May-2013  2.9 KiB
aorsmul_1.asm        22-Nov-2012  3 KiB
atom/                13-May-2013  4
bd1/                 13-May-2013  3
bdiv_dbm1c.asm       22-Nov-2012  1.9 KiB
bdiv_q_1.asm         22-Nov-2012  3.2 KiB
bobcat/              13-May-2013  3
com.asm              22-Nov-2012  1.6 KiB
copyd.asm            22-Nov-2012  1.6 KiB
copyi.asm            22-Nov-2012  1.6 KiB
core2/               13-May-2013  10
coreinhm/            13-May-2013  3
coreisbr/            13-May-2013  3
darwin.m4            22-Nov-2012  1 KiB
dive_1.asm           22-Nov-2012  3.1 KiB
divrem_1.asm         22-Nov-2012  5.5 KiB
divrem_2.asm         13-May-2013  4.8 KiB
fat/                 13-May-2013  9
gcd_1.asm            13-May-2013  2.7 KiB
gmp-mparam.h         13-May-2013  9.8 KiB
invert_limb.asm      22-Nov-2012  4.3 KiB
logops_n.asm         22-Nov-2012  4.3 KiB
lshift.asm           22-Nov-2012  4.5 KiB
lshiftc.asm          22-Nov-2012  3.5 KiB
lshsub_n.asm         13-May-2013  2.8 KiB
mod_1_4.asm          22-Nov-2012  4.1 KiB
mod_34lsub1.asm      22-Nov-2012  3.4 KiB
mode1o.asm           22-Nov-2012  4.4 KiB
mul_1.asm            22-Nov-2012  2.9 KiB
mul_2.asm            22-Nov-2012  3.2 KiB
mul_basecase.asm     22-Nov-2012  8 KiB
nano/                13-May-2013  3
pentium4/            13-May-2013  8
popham.asm           22-Nov-2012  3.3 KiB
README               22-Nov-2012  2.4 KiB
redc_1.asm           22-Nov-2012  6.3 KiB
rsh1aors_n.asm       22-Nov-2012  3.5 KiB
rshift.asm           22-Nov-2012  3.3 KiB
sqr_basecase.asm     22-Nov-2012  13.6 KiB
sublsh1_n.asm        22-Nov-2012  3 KiB
x86_64-defs.m4       22-Nov-2012  3.9 KiB

README

Copyright 2003, 2004, 2006, 2008 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 3 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.


			AMD64 MPN SUBROUTINES


This directory contains mpn functions for AMD64 chips.  It is also useful
for 64-bit Pentiums and "Core 2".


		     RELEVANT OPTIMIZATION ISSUES

The Opteron and Athlon64 can sustain up to 3 instructions per cycle, but in
practice that is only possible for integer instructions.  Almost any three
integer instructions can issue simultaneously, including any three ALU
operations (shifts included).  Up to two memory operations can issue each
cycle.
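As a rough illustration of how these two limits interact, the hypothetical C helper below (`issue_bound` is an illustrative name, not part of GMP) computes a throughput-only lower bound on cycles per loop iteration: instructions divided by 3 or memory operations divided by 2, whichever is larger.  It assumes issue width is the only bottleneck and ignores latency chains, decode and branch effects, so it is a back-of-the-envelope sketch, not a model of the Opteron pipeline.

```c
#include <assert.h>

/* Hypothetical helper (not part of GMP): throughput-only lower bound on
   cycles per loop iteration, derived from the two issue limits quoted
   in the text.  Assumes issue width is the only bottleneck; latency,
   decode and branch effects are ignored. */
static double issue_bound(unsigned insns, unsigned mem_ops)
{
    const double max_insns_per_cycle = 3.0; /* up to 3 integer instructions */
    const double max_mem_per_cycle = 2.0;   /* up to 2 memory operations    */

    double by_insns = insns / max_insns_per_cycle;
    double by_mem = mem_ops / max_mem_per_cycle;
    return by_insns > by_mem ? by_insns : by_mem;
}
```

For an mpn_add_n-style limb step written as a load, an ADC with a memory operand and a store (3 instructions, 3 memory operations, with loop overhead assumed to be amortized away by unrolling), `issue_bound(3, 3)` gives 1.5 cycles per limb: under this model the memory operations, not the instruction count, set the limit.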

Scheduling typically requires that load-use instructions be split into
separate load and use instructions.  That requires more decode resources,
and it is rarely a win.  Opteron/Athlon64 have a deep out-of-order core.


Optimizing for 64-bit Pentium4 is probably a waste of time, as the most
critical instructions are very poorly implemented there.  Perhaps we could
save a cycle or two, but the most common loops now run at between 10 and 22
cycles, so a saved cycle isn't too exciting.


The new spin of the venerable P6 core, the "Core 2", is much better than the
Pentium4 for the GMP loops.  Its integer pipeline is somewhat similar to the
Opteron/Athlon64 pipeline, except that the GMP favourites ADC/SBB and MUL are
slower.  Furthermore, an INC/DEC followed by ADC/SBB incurs a pipeline stall
of around 10 cycles (INC and DEC write every arithmetic flag except CF, so
the flags are only partially updated when ADC/SBB reads them).  The default
mpn_add_n and mpn_sub_n code suffers badly from the stall.  The code in the
core2 subdirectory avoids it by using the almost forgotten instruction JRCXZ
for loop control and updating the induction variable using LEA; neither
instruction writes the flags.
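The loop-control idea can be sketched in C with GCC/Clang extended inline assembly.  This is a hypothetical, deliberately simple routine (`add_n_sketch` is an illustrative name, and 64-bit `uint64_t` limbs are an assumption standing in for GMP's `mp_limb_t`) with the same contract as mpn_add_n: add two n-limb numbers and return the carry out.  It is not the core2 code, which is unrolled and scheduled; it only shows a loop in which nothing between consecutive ADCs touches the flags.  Off x86-64, or without a GNU-compatible compiler, it falls back to portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not GMP code: rp[] = up[] + vp[] over n limbs,
   returning the carry out (0 or 1), like mpn_add_n. */
static uint64_t add_n_sketch(uint64_t *rp, const uint64_t *up,
                             const uint64_t *vp, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t carry, tmp;
    /* The carry lives in CF between iterations, so loop control must not
       change CF (ADD/SUB/CMP would).  INC/DEC preserve CF but trigger the
       Core 2 partial-flags stall; MOV, LEA, JMP and JRCXZ leave the flags
       untouched. */
    __asm__ volatile(
        "clc\n\t"                     /* CF = 0: no incoming carry     */
        "1:\n\t"
        "jrcxz 2f\n\t"                /* leave when the count is zero  */
        "movq (%[up]), %[tmp]\n\t"
        "adcq (%[vp]), %[tmp]\n\t"    /* tmp = up[i] + vp[i] + CF      */
        "movq %[tmp], (%[rp])\n\t"
        "leaq 8(%[up]), %[up]\n\t"    /* advance pointers flag-free    */
        "leaq 8(%[vp]), %[vp]\n\t"
        "leaq 8(%[rp]), %[rp]\n\t"
        "leaq -1(%[n]), %[n]\n\t"     /* n--, also flag-free           */
        "jmp 1b\n\t"
        "2:\n\t"
        "sbbq %[carry], %[carry]\n\t" /* carry = -CF (0 or all ones)   */
        "negq %[carry]"               /* carry = CF (0 or 1)           */
        : [carry] "=&r"(carry), [tmp] "=&r"(tmp),
          [rp] "+r"(rp), [up] "+r"(up), [vp] "+r"(vp), [n] "+c"(n)
        :
        : "cc", "memory");
    return carry;
#else
    /* Portable fallback computing the same result. */
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = up[i] + carry;
        uint64_t c1 = s < carry;     /* carry out of up[i] + carry   */
        rp[i] = s + vp[i];
        carry = c1 + (rp[i] < s);    /* at most one of the two is 1  */
    }
    return carry;
#endif
}
```

Because DEC preserves CF, a DEC-based counter would give the same results; per the text above, the reason to prefer LEA and JRCXZ on Core 2 is speed, not correctness.  The single-limb, non-unrolled loop here says nothing about how fast GMP's own code runs.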


REFERENCES

"System V Application Binary Interface AMD64 Architecture Processor
Supplement", draft version 0.99, December 2007.
http://www.x86-64.org/documentation/abi.pdf