;; ppro.md revision 169689
;; Scheduling for the Intel P6 family of processors
;; Copyright (C) 2004, 2005 Free Software Foundation, Inc.
;;
;; This file is part of GCC.
;;
;; GCC is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.
;;
;; GCC is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GCC; see the file COPYING.  If not, write to
;; the Free Software Foundation, 51 Franklin Street, Fifth Floor,
;; Boston, MA 02110-1301, USA.

;; The P6 family includes the Pentium Pro, Pentium II, Pentium III, Celeron
;; and Xeon lines of CPUs.  The DFA scheduler description in this file is
;; based on information that can be found in the following three documents:
;;
;;    "P6 Family of Processors Hardware Developer's Manual",
;;    Intel, September 1999.
;;
;;    "Intel Architecture Optimization Manual",
;;    Intel, 1999 (Order Number: 245127-001).
;;
;;    "How to optimize for the Pentium family of microprocessors",
;;    by Agner Fog, PhD.
;;
;; The P6 pipeline has three major components:
;;   1) the FETCH/DECODE unit, an in-order issue front-end
;;   2) the DISPATCH/EXECUTE unit, which is the out-of-order core
;;   3) the RETIRE unit, an in-order retirement unit
;;
;; So, the P6 CPUs have out-of-order cores, but the instruction decoder and
;; retirement unit are naturally in-order.
;;
;;                       BUS INTERFACE UNIT
;;                        /            \
;;                  L1 ICACHE        L1 DCACHE
;;                 /     |     \       |      \
;;        DECODER0  DECODER1  DECODER2  DISP/EXEC  RETIRE
;;                 \     |     /       |         |
;;               INSTRUCTION POOL  ____|_________/
;;             (inc. reorder buffer)
;;
;; Since the P6 CPUs execute instructions out-of-order, the most important
;; consideration in performance tuning is making sure enough micro-ops are
;; ready for execution in the out-of-order core, while not stalling the
;; decoder.
;;
;; TODO:
;; - Find a less crude way to model complex instructions, in
;;   particular how many cycles they take to be decoded.
;; - Include decoder latencies in the total reservation latencies.
;;   This isn't necessary right now because we assume for every
;;   instruction that it never blocks a decoder.
;; - Figure out where the p0 and p1 reservations come from.  These
;;   appear not to be in the manual (e.g. why is cld "(p0+p1)*2"
;;   better than "(p0|p1)*4" ???)
;; - Lots more because I'm sure this is still far from optimal :-)

;; The ppro_idiv and ppro_fdiv automata are used to model issue
;; latencies of idiv and fdiv type insns.
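;; To make the cld question in the TODO above concrete, the two candidate
;; reservations can be expanded cycle by cycle.  The Python sketch below is
;; a hypothetical helper, not part of GCC: it understands only the subset
;; of the reservation syntax used in this file (',' between cycles, '+' for
;; units taken in the same cycle, '|' for alternatives, and a trailing '*N'
;; repeat on a whole cycle group), and it treats names such as decodern as
;; opaque units.

```python
import re


def split_top(text, sep):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, depth, cur = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(cur).strip())
            cur = []
        else:
            cur.append(ch)
    parts.append("".join(cur).strip())
    return parts


def cycle_reqs(group):
    """'p2+(p0|p1)' -> [('p2',), ('p0', 'p1')]: one tuple of alternatives per unit."""
    reqs = []
    for term in split_top(group, "+"):
        if term.startswith("(") and term.endswith(")"):
            term = term[1:-1]
        reqs.append(tuple(u.strip() for u in term.split("|")))
    return reqs


def expand(reservation):
    """Expand a reservation string into a list with one entry per cycle."""
    cycles = []
    for comp in split_top(reservation, ","):
        # A trailing '*N' repeats the whole cycle group N times.
        m = re.fullmatch(r"\((.*)\)\*(\d+)|(\w+)\*(\d+)", comp)
        if m:
            group = m.group(1) if m.group(1) is not None else m.group(3)
            count = int(m.group(2) if m.group(2) is not None else m.group(4))
        else:
            group, count = comp, 1
        cycles.extend(cycle_reqs(group) for _ in range(count))
    return cycles


def port_usage(reservation, ports=("p0", "p1")):
    """Return (cycles spanned, port-cycles requested on the given ports)."""
    cycles = expand(reservation)
    used = sum(1 for reqs in cycles for alts in reqs if set(alts) <= set(ports))
    return len(cycles), used


print(port_usage("decoder0,(p0+p1)*2"))  # (3, 4)
print(port_usage("decoder0,(p0|p1)*4"))  # (5, 4)
```

;; Both forms request four p0/p1 port-cycles, but "(p0+p1)*2" packs them
;; into two cycles and leaves neither port free for other uops in those
;; cycles, while "(p0|p1)*4" holds only one of the two ports per cycle for
;; four cycles.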
(define_automaton "ppro_decoder,ppro_core,ppro_idiv,ppro_fdiv,ppro_load,ppro_store")

;; Simple instructions of the register-register form have only one uop.
;; Load instructions are also only one uop.  Store instructions decode to
;; two uops, and simple read-modify instructions also take two uops.
;; Simple instructions of the register-memory form have two to three uops.
;; Simple read-modify-write instructions have four uops.  The rules for
;; the decoder are simple:
;;  - an instruction with 1 uop can be decoded by any of the three
;;    decoders in one cycle.
;;  - an instruction with 2 to 4 uops can be decoded only by decoder 0,
;;    but still in only one cycle.
;;  - a complex (microcode) instruction can also only be decoded by
;;    decoder 0, and this takes an unspecified number of cycles.
;;
;; The goal is to schedule such that each cycle sees a few-one-one uop
;; pattern (up to four uops on decoder 0, then one uop each on decoders 1
;; and 2), so that as many instructions as possible are decoded per cycle.
(define_cpu_unit "decoder0" "ppro_decoder")
(define_cpu_unit "decoder1" "ppro_decoder")
(define_cpu_unit "decoder2" "ppro_decoder")

;; We first wish to find an instruction for decoder0, so exclude
;; decoder1 and decoder2 from being reserved until decoder 0 is
;; reserved.
(presence_set "decoder1" "decoder0")
(presence_set "decoder2" "decoder0")

;; Most instructions can be decoded on any of the three decoders.
(define_reservation "decodern" "(decoder0|decoder1|decoder2)")

;; The out-of-order core has five pipelines.
;; During each cycle, the core may dispatch zero or one uop on the port of
;; any of the five pipelines, so the maximum number of dispatched uops per
;; cycle is 5.  In practice, 3 uops per cycle is more realistic.
;;
;; Two of the five pipelines contain several execution units:
;;
;;   Port 0       Port 1       Port 2       Port 3       Port 4
;;   ALU          ALU          LOAD         SAC          SDA
;;   FPU          JUE
;;   AGU          MMX
;;   MMX          P3FPU
;;   P3FPU
;;
;; (SAC = Store Address Calculation, SDA = Store Data Unit, P3FPU = SSE unit,
;;  JUE = Jump Execution Unit, AGU = Address Generation Unit)
;;
(define_cpu_unit "p0,p1" "ppro_core")
(define_cpu_unit "p2" "ppro_load")
(define_cpu_unit "p3,p4" "ppro_store")
(define_cpu_unit "idiv" "ppro_idiv")
(define_cpu_unit "fdiv" "ppro_fdiv")

;; Only the irregular instructions have to be modeled here.  A load
;; increases the latency by 2 or 3, or by nothing if the manual gives
;; a latency already.  Store latencies are not accounted for.
;;
;; The simple instructions follow a very regular pattern of 1 uop per
;; reg-reg operation, 1 uop per load on port 2, and 2 uops per store
;; on port 4 and port 3.  These instructions are modelled at the bottom
;; of this file.
;;
;; For microcoded instructions we don't know how many uops are produced.
;; These instructions are the "complex" ones in the Intel manuals.  All
;; we _do_ know is that they typically produce four or more uops, so
;; they can only be decoded on decoder0.  Modelling their latencies
Modelling their latencies 135169689Skan;; doesn't make sense because we don't know how these instructions are 136169689Skan;; executed in the core. So we just model that they can only be decoded 137169689Skan;; on decoder 0, and say that it takes a little while before the result 138169689Skan;; is available. 139169689Skan(define_insn_reservation "ppro_complex_insn" 6 140169689Skan (and (eq_attr "cpu" "pentiumpro,generic32") 141169689Skan (eq_attr "type" "other,multi,call,callv,str")) 142169689Skan "decoder0") 143117395Skan 144169689Skan;; imov with memory operands does not use the integer units. 145169689Skan(define_insn_reservation "ppro_imov" 1 146169689Skan (and (eq_attr "cpu" "pentiumpro,generic32") 147169689Skan (and (eq_attr "memory" "none") 148169689Skan (eq_attr "type" "imov"))) 149169689Skan "decodern,(p0|p1)") 150117395Skan 151169689Skan(define_insn_reservation "ppro_imov_load" 4 152169689Skan (and (eq_attr "cpu" "pentiumpro,generic32") 153169689Skan (and (eq_attr "memory" "load") 154169689Skan (eq_attr "type" "imov"))) 155169689Skan "decodern,p2") 156117395Skan 157169689Skan(define_insn_reservation "ppro_imov_store" 1 158169689Skan (and (eq_attr "cpu" "pentiumpro,generic32") 159169689Skan (and (eq_attr "memory" "store") 160169689Skan (eq_attr "type" "imov"))) 161169689Skan "decoder0,p4+p3") 162117395Skan 163169689Skan;; imovx always decodes to one uop, and also doesn't use the integer 164169689Skan;; units if it has memory operands. 
(define_insn_reservation "ppro_imovx" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (eq_attr "type" "imovx")))
  "decodern,(p0|p1)")

(define_insn_reservation "ppro_imovx_load" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (eq_attr "type" "imovx")))
  "decodern,p2")

;; lea executes on port 0 with latency 1 and throughput 1.
(define_insn_reservation "ppro_lea" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (eq_attr "type" "lea")))
  "decodern,p0")

;; Shift and rotate execute on port 0 with latency and throughput 1.
;; The load and store units need to be reserved when memory operands
;; are involved.
(define_insn_reservation "ppro_shift_rotate" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (eq_attr "type" "ishift,ishift1,rotate,rotate1")))
  "decodern,p0")

(define_insn_reservation "ppro_shift_rotate_mem" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "!none")
            (eq_attr "type" "ishift,ishift1,rotate,rotate1")))
  "decoder0,p2+p0,p4+p3")

(define_insn_reservation "ppro_cld" 2
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (eq_attr "type" "cld"))
  "decoder0,(p0+p1)*2")

;; The P6 has a sophisticated branch prediction mechanism to minimize
;; latencies due to branching.  In particular, it has a fast way to
;; execute branches that are taken multiple times (such as in loops).
;; Branches not taken suffer no penalty, and correctly predicted
;; branches cost only one fetch cycle.  Mispredicted branches are very
;; costly: typically 15 cycles and possibly as many as 26 cycles.
;;
;; Unfortunately all this makes it quite difficult to properly model
;; the latencies for the compiler.  Here I've made the choice to be
;; optimistic and assume branches are often predicted correctly, so
;; they have latency 1, and the decoders are not blocked.
;;
;; In addition, the model assumes a branch always decodes to only 1 uop,
;; which is not exactly true because there are a few instructions that
;; decode to 2 uops or microcode.  But this probably gives the best
;; results because we can assume these instructions can decode on all
;; decoders.
(define_insn_reservation "ppro_branch" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (eq_attr "type" "ibr")))
  "decodern,p1")

;; ??? Indirect branches probably have worse latency than this.
(define_insn_reservation "ppro_indirect_branch" 6
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "!none")
            (eq_attr "type" "ibr")))
  "decoder0,p2+p1")

(define_insn_reservation "ppro_leave" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (eq_attr "type" "leave"))
  "decoder0,p2+(p0|p1),(p0|p1)")

;; imul has throughput one, but latency 4, and can only execute on port 0.
(define_insn_reservation "ppro_imul" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (eq_attr "type" "imul")))
  "decodern,p0")

(define_insn_reservation "ppro_imul_mem" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "!none")
            (eq_attr "type" "imul")))
  "decoder0,p2+p0")

;; div and idiv are very similar, so we model them the same.
;; QI, HI, and SI have issue latency 12, 21, and 37, respectively.
;; These issue latencies are modelled via the ppro_idiv automaton.
(define_insn_reservation "ppro_idiv_QI" 19
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "QI")
                 (eq_attr "type" "idiv"))))
  "decoder0,(p0+idiv)*2,(p0|p1)+idiv,idiv*9")

(define_insn_reservation "ppro_idiv_QI_load" 19
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "QI")
                 (eq_attr "type" "idiv"))))
  "decoder0,p2+p0+idiv,p0+idiv,(p0|p1)+idiv,idiv*9")

(define_insn_reservation "ppro_idiv_HI" 23
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "HI")
                 (eq_attr "type" "idiv"))))
  "decoder0,(p0+idiv)*3,(p0|p1)+idiv,idiv*17")

(define_insn_reservation "ppro_idiv_HI_load" 23
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "HI")
                 (eq_attr "type" "idiv"))))
  "decoder0,p2+p0+idiv,p0+idiv,(p0|p1)+idiv,idiv*18")

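;; The issue latencies quoted above can be checked against the reservation
;; strings: the idiv unit is held for one cycle per cycle group that names
;; it, multiplied by any '*N' repeat.  The Python sketch below counts those
;; cycles; unit_cycles is a hypothetical helper that assumes no commas
;; occur inside parentheses, and the SI strings are copied from
;; ppro_idiv_SI and ppro_idiv_SI_load below.

```python
import re


def unit_cycles(reservation, unit):
    """Count the cycles in which a reservation string holds the given unit.

    Handles only ','-separated cycle groups with an optional trailing '*N'.
    """
    total = 0
    for comp in reservation.split(","):  # no commas occur inside parentheses here
        m = re.fullmatch(r"(.*)\*(\d+)", comp.strip())
        group, count = (m.group(1), int(m.group(2))) if m else (comp, 1)
        if unit in re.findall(r"\w+", group):
            total += count
    return total


idiv = {
    "QI":      "decoder0,(p0+idiv)*2,(p0|p1)+idiv,idiv*9",
    "QI_load": "decoder0,p2+p0+idiv,p0+idiv,(p0|p1)+idiv,idiv*9",
    "HI":      "decoder0,(p0+idiv)*3,(p0|p1)+idiv,idiv*17",
    "HI_load": "decoder0,p2+p0+idiv,p0+idiv,(p0|p1)+idiv,idiv*18",
    "SI":      "decoder0,(p0+idiv)*3,(p0|p1)+idiv,idiv*33",
    "SI_load": "decoder0,p2+p0+idiv,p0+idiv,(p0|p1)+idiv,idiv*34",
}
for name, res in idiv.items():
    print(name, unit_cycles(res, "idiv"))
```

;; The counts come out as 12, 21 and 37 for QI, HI and SI, for both the
;; register and the load forms, matching the figures in the comment.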
(define_insn_reservation "ppro_idiv_SI" 39
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "SI")
                 (eq_attr "type" "idiv"))))
  "decoder0,(p0+idiv)*3,(p0|p1)+idiv,idiv*33")

(define_insn_reservation "ppro_idiv_SI_load" 39
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "SI")
                 (eq_attr "type" "idiv"))))
  "decoder0,p2+p0+idiv,p0+idiv,(p0|p1)+idiv,idiv*34")

;; Floating point operations always execute on port 0.
;; ??? Where do these latencies come from?  fadd has latency 3 and
;; has throughput "1/cycle (align with FADD)".  What do they
;; mean and how can we model that?
(define_insn_reservation "ppro_fop" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none,unknown")
            (eq_attr "type" "fop")))
  "decodern,p0")

(define_insn_reservation "ppro_fop_load" 5
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (eq_attr "type" "fop")))
  "decoder0,p2+p0,p0")

(define_insn_reservation "ppro_fop_store" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "store")
            (eq_attr "type" "fop")))
  "decoder0,p0,p0,p0+p4+p3")

(define_insn_reservation "ppro_fop_both" 5
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "both")
            (eq_attr "type" "fop")))
  "decoder0,p2+p0,p0+p4+p3")

(define_insn_reservation "ppro_fsgn" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (eq_attr "type" "fsgn"))
  "decodern,p0")

(define_insn_reservation "ppro_fistp" 5
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (eq_attr "type" "fistp"))
  "decoder0,p0*2,p4+p3")

(define_insn_reservation "ppro_fcmov" 2
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (eq_attr "type" "fcmov"))
  "decoder0,p0*2")

(define_insn_reservation "ppro_fcmp" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (eq_attr "type" "fcmp")))
  "decodern,p0")

(define_insn_reservation "ppro_fcmp_load" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (eq_attr "type" "fcmp")))
  "decoder0,p2+p0")

(define_insn_reservation "ppro_fmov" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (eq_attr "type" "fmov")))
  "decodern,p0")

(define_insn_reservation "ppro_fmov_load" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "!XF")
                 (eq_attr "type" "fmov"))))
  "decodern,p2")

(define_insn_reservation "ppro_fmov_XF_load" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "XF")
                 (eq_attr "type" "fmov"))))
  "decoder0,(p2+p0)*2")

(define_insn_reservation "ppro_fmov_store" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "store")
            (and (eq_attr "mode" "!XF")
                 (eq_attr "type" "fmov"))))
  "decodern,p0")

(define_insn_reservation "ppro_fmov_XF_store" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "store")
            (and (eq_attr "mode" "XF")
                 (eq_attr "type" "fmov"))))
  "decoder0,(p0+p4),(p0+p3)")

;; fmul executes on port 0 with latency 5.  It has issue latency 2,
;; but we don't model this.
(define_insn_reservation "ppro_fmul" 5
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (eq_attr "type" "fmul")))
  "decoder0,p0*2")

(define_insn_reservation "ppro_fmul_load" 6
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (eq_attr "type" "fmul")))
  "decoder0,p2+p0,p0")

;; fdiv latencies depend on the mode of the operands.  XFmode gives
;; a latency of 38 cycles, DFmode gives 32, and SFmode gives latency 18.
;; Division by a power of 2 takes only 9 cycles, but we cannot model
;; that.  Throughput is one fdiv per (latency - 1) cycles, which we
;; model using the ppro_fdiv automaton.
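;; The "latency - 1" throughput can be checked by asking how soon a second,
;; independent fdiv of the same mode may start.  The Python sketch below
;; looks only at the fdiv unit (the decoder and p0 parts are ignored, an
;; assumption of the sketch) and searches for the smallest start distance
;; at which the two fdiv occupancies no longer overlap.  The strings are
;; copied from the register-operand ppro_fdiv_* reservations below; the
;; helper names are hypothetical.

```python
import re


def busy_cycles(reservation, unit):
    """Return the set of cycle offsets at which the reservation holds unit."""
    busy, cycle = set(), 0
    for comp in reservation.split(","):
        m = re.fullmatch(r"(.*)\*(\d+)", comp.strip())
        group, count = (m.group(1), int(m.group(2))) if m else (comp, 1)
        for _ in range(count):
            if unit in re.findall(r"\w+", group):
                busy.add(cycle)
            cycle += 1
    return busy


def min_issue_distance(reservation, unit):
    """Smallest d such that a copy started d cycles later never collides on unit."""
    busy = busy_cycles(reservation, unit)
    d = 1
    while busy & {c + d for c in busy}:
        d += 1
    return d


fdiv = {"SF": ("decodern,p0+fdiv,fdiv*16", 18),
        "DF": ("decodern,p0+fdiv,fdiv*30", 32),
        "XF": ("decodern,p0+fdiv,fdiv*36", 38)}
for mode, (res, latency) in fdiv.items():
    print(mode, min_issue_distance(res, "fdiv"), latency - 1)  # the two numbers agree
```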
(define_insn_reservation "ppro_fdiv_SF" 18
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "fdiv,fpspc"))))
  "decodern,p0+fdiv,fdiv*16")

(define_insn_reservation "ppro_fdiv_SF_load" 19
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "fdiv,fpspc"))))
  "decoder0,p2+p0+fdiv,fdiv*16")

(define_insn_reservation "ppro_fdiv_DF" 32
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "DF")
                 (eq_attr "type" "fdiv,fpspc"))))
  "decodern,p0+fdiv,fdiv*30")

(define_insn_reservation "ppro_fdiv_DF_load" 33
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "DF")
                 (eq_attr "type" "fdiv,fpspc"))))
  "decoder0,p2+p0+fdiv,fdiv*30")

(define_insn_reservation "ppro_fdiv_XF" 38
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "XF")
                 (eq_attr "type" "fdiv,fpspc"))))
  "decodern,p0+fdiv,fdiv*36")

(define_insn_reservation "ppro_fdiv_XF_load" 39
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "XF")
                 (eq_attr "type" "fdiv,fpspc"))))
  "decoder0,p2+p0+fdiv,fdiv*36")

;; MMX instructions can execute on either port 0 or port 1 with a
;; throughput of 1/cycle.
;;   on port 0: - ALU (latency 1)
;;              - Multiplier Unit (latency 3)
;;   on port 1: - ALU (latency 1)
;;              - Shift Unit (latency 1)
;;
;; MMX instructions are either of the type reg-reg, or read-modify, and
;; except for mmxshft and mmxmul they can execute on port 0 or port 1,
;; so they behave as "simple" instructions that need no special modelling.
;; We only have to model mmxshft and mmxmul.
(define_insn_reservation "ppro_mmx_shft" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (eq_attr "type" "mmxshft")))
  "decodern,p1")

(define_insn_reservation "ppro_mmx_shft_load" 2
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (eq_attr "type" "mmxshft")))
  "decoder0,p2+p1")

(define_insn_reservation "ppro_mmx_mul" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (eq_attr "type" "mmxmul")))
  "decodern,p0")

(define_insn_reservation "ppro_mmx_mul_load" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (eq_attr "type" "mmxmul")))
  "decoder0,p2+p0")

(define_insn_reservation "ppro_sse_mmxcvt" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "mode" "DI")
            (eq_attr "type" "mmxcvt")))
  "decodern,p1")

;; FIXME: These are Pentium III only, but we cannot tell here if
;; we're generating code for PentiumPro/Pentium II or Pentium III
;; (define_insn_reservation "ppro_sse_mmxshft" 2
;;   (and (eq_attr "cpu" "pentiumpro,generic32")
;;        (and (eq_attr "mode" "DI")
;;             (eq_attr "type" "mmxshft")))
;;   "decodern,p0")

;; SSE is very complicated, and takes a bit more effort.
;; ??? I assumed that all SSE instructions decode on decoder0,
;; but is this correct?

;; The sfence instruction.
(define_insn_reservation "ppro_sse_sfence" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "unknown")
            (eq_attr "type" "sse")))
  "decoder0,p4+p3")

;; FIXME: This reservation is all wrong when we're scheduling sqrtss.
(define_insn_reservation "ppro_sse_SF" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "mode" "SF")
            (eq_attr "type" "sse")))
  "decodern,p0")

(define_insn_reservation "ppro_sse_add_SF" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "sseadd"))))
  "decodern,p1")

(define_insn_reservation "ppro_sse_add_SF_load" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "sseadd"))))
  "decoder0,p2+p1")

(define_insn_reservation "ppro_sse_cmp_SF" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "ssecmp"))))
  "decoder0,p1")

(define_insn_reservation "ppro_sse_cmp_SF_load" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "ssecmp"))))
  "decoder0,p2+p1")

(define_insn_reservation "ppro_sse_comi_SF" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "ssecomi"))))
  "decodern,p0")

(define_insn_reservation "ppro_sse_comi_SF_load" 1
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "ssecomi"))))
  "decoder0,p2+p0")

(define_insn_reservation "ppro_sse_mul_SF" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "ssemul"))))
  "decodern,p0")

(define_insn_reservation "ppro_sse_mul_SF_load" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "ssemul"))))
  "decoder0,p2+p0")

;; FIXME: ssediv doesn't close p0 for 17 cycles, surely???
(define_insn_reservation "ppro_sse_div_SF" 18
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "ssediv"))))
  "decoder0,p0*17")

(define_insn_reservation "ppro_sse_div_SF_load" 18
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "ssediv"))))
  "decoder0,(p2+p0),p0*16")

(define_insn_reservation "ppro_sse_icvt_SF" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "mode" "SF")
            (eq_attr "type" "sseicvt")))
  "decoder0,(p2+p1)*2")

(define_insn_reservation "ppro_sse_icvt_SI" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "mode" "SI")
            (eq_attr "type" "sseicvt")))
  "decoder0,(p2+p1)")

(define_insn_reservation "ppro_sse_mov_SF" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "ssemov"))))
  "decoder0,(p0|p1)")

(define_insn_reservation "ppro_sse_mov_SF_load" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "ssemov"))))
  "decoder0,p2+(p0|p1)")

(define_insn_reservation "ppro_sse_mov_SF_store" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "store")
            (and (eq_attr "mode" "SF")
                 (eq_attr "type" "ssemov"))))
  "decoder0,p4+p3")

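;; The first number in each define_insn_reservation is the latency that
;; GCC uses by default as the cost of a true dependence on the insn's
;; result.  As a worked illustration, the Python sketch below computes when
;; each result of a small SFmode block becomes available, using the
;; latencies declared above for ppro_sse_mov_SF_load (3), ppro_sse_mul_SF
;; (4) and ppro_sse_add_SF (3).  Unit conflicts are ignored entirely, and
;; the insn labels and kind names are hypothetical.

```python
# Latencies copied from the SFmode reservations above.
LATENCY = {"movss_load": 3, "mulss": 4, "addss": 3}


def earliest_ready(insns):
    """insns: list of (name, kind, [names of producers]) in program order.

    Returns {name: cycle at which its result is available}, assuming
    unlimited execution resources (only true dependences are modelled).
    """
    ready = {}
    for name, kind, srcs in insns:
        start = max((ready[s] for s in srcs), default=0)
        ready[name] = start + LATENCY[kind]
    return ready


block = [
    ("a", "movss_load", []),     # load x
    ("b", "movss_load", []),     # load y, independent of a
    ("c", "mulss", ["a", "b"]),  # x * y
    ("d", "addss", ["c", "a"]),  # x * y + x
]
print(earliest_ready(block))  # {'a': 3, 'b': 3, 'c': 7, 'd': 10}
```

;; The critical path load -> mulss -> addss sets the 10-cycle result time;
;; the second load is off that path and can be placed freely.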
(define_insn_reservation "ppro_sse_V4SF" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "mode" "V4SF")
            (eq_attr "type" "sse")))
  "decoder0,p1*2")

(define_insn_reservation "ppro_sse_add_V4SF" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "sseadd"))))
  "decoder0,p1*2")

(define_insn_reservation "ppro_sse_add_V4SF_load" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "sseadd"))))
  "decoder0,(p2+p1)*2")

(define_insn_reservation "ppro_sse_cmp_V4SF" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "ssecmp"))))
  "decoder0,p1*2")

(define_insn_reservation "ppro_sse_cmp_V4SF_load" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "ssecmp"))))
  "decoder0,(p2+p1)*2")

(define_insn_reservation "ppro_sse_cvt_V4SF" 3
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none,unknown")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "ssecvt"))))
  "decoder0,p1*2")

(define_insn_reservation "ppro_sse_cvt_V4SF_other" 4
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "!none,unknown")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "ssecvt"))))
  "decoder0,p1,p4+p3")

(define_insn_reservation "ppro_sse_mul_V4SF" 5
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "ssemul"))))
  "decoder0,p0*2")

(define_insn_reservation "ppro_sse_mul_V4SF_load" 5
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "ssemul"))))
  "decoder0,(p2+p0)*2")

;; FIXME: p0 really closed this long???
(define_insn_reservation "ppro_sse_div_V4SF" 48
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "ssediv"))))
  "decoder0,p0*34")

(define_insn_reservation "ppro_sse_div_V4SF_load" 48
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "ssediv"))))
  "decoder0,(p2+p0)*2,p0*32")

(define_insn_reservation "ppro_sse_log_V4SF" 2
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "none")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "sselog,sselog1"))))
  "decodern,p1")

(define_insn_reservation "ppro_sse_log_V4SF_load" 2
  (and (eq_attr "cpu" "pentiumpro,generic32")
       (and (eq_attr "memory" "load")
            (and (eq_attr "mode" "V4SF")
                 (eq_attr "type" "sselog,sselog1"))))
  "decoder0,(p2+p1)")

(define_insn_reservation "ppro_sse_mov_V4SF" 1
710169689Skan (and (eq_attr "cpu" "pentiumpro,generic32") 711169689Skan (and (eq_attr "memory" "none") 712169689Skan (and (eq_attr "mode" "V4SF") 713169689Skan (eq_attr "type" "ssemov")))) 714169689Skan "decoder0,(p0|p1)*2") 715169689Skan 716169689Skan(define_insn_reservation "ppro_sse_mov_V4SF_load" 2 717169689Skan (and (eq_attr "cpu" "pentiumpro,generic32") 718169689Skan (and (eq_attr "memory" "load") 719169689Skan (and (eq_attr "mode" "V4SF") 720169689Skan (eq_attr "type" "ssemov")))) 721169689Skan "decoder0,p2*2") 722169689Skan 723169689Skan(define_insn_reservation "ppro_sse_mov_V4SF_store" 3 724169689Skan (and (eq_attr "cpu" "pentiumpro,generic32") 725169689Skan (and (eq_attr "memory" "store") 726169689Skan (and (eq_attr "mode" "V4SF") 727169689Skan (eq_attr "type" "ssemov")))) 728169689Skan "decoder0,(p4+p3)*2") 729169689Skan 730169689Skan;; All other instructions are modelled as simple instructions. 731169689Skan;; We have already modelled all i387 floating point instructions, so all 732169689Skan;; other instructions execute on either port 0 or port 1. This includes 733169689Skan;; the ALU units, and the MMX units. 734169689Skan;; 735169689Skan;; reg-reg instructions produce 1 uop so they can be decoded on any of 736169689Skan;; the three decoders. 737169689Skan(define_insn_reservation "ppro_insn" 1 738169689Skan (and (eq_attr "cpu" "pentiumpro,generic32") 739169689Skan (and (eq_attr "memory" "none,unknown") 740169689Skan (eq_attr "type" "alu,alu1,negnot,incdec,icmp,test,setcc,icmov,push,pop,fxch,sseiadd,sseishft,sseimul,mmx,mmxadd,mmxcmp"))) 741169689Skan "decodern,(p0|p1)") 742169689Skan 743169689Skan;; read-modify and register-memory instructions have 2 or three uops, 744169689Skan;; so they have to be decoded on decoder0. 
(define_insn_reservation "ppro_insn_load" 3
                         (and (eq_attr "cpu" "pentiumpro,generic32")
                              (and (eq_attr "memory" "load")
                                   (eq_attr "type" "alu,alu1,negnot,incdec,icmp,test,setcc,icmov,push,pop,fxch,sseiadd,sseishft,sseimul,mmx,mmxadd,mmxcmp")))
                         "decoder0,p2+(p0|p1)")

(define_insn_reservation "ppro_insn_store" 1
                         (and (eq_attr "cpu" "pentiumpro,generic32")
                              (and (eq_attr "memory" "store")
                                   (eq_attr "type" "alu,alu1,negnot,incdec,icmp,test,setcc,icmov,push,pop,fxch,sseiadd,sseishft,sseimul,mmx,mmxadd,mmxcmp")))
                         "decoder0,(p0|p1),p4+p3")

;; read-modify-store instructions produce 4 uops so they have to be
;; decoded on decoder0 as well.
(define_insn_reservation "ppro_insn_both" 4
                         (and (eq_attr "cpu" "pentiumpro,generic32")
                              (and (eq_attr "memory" "both")
                                   (eq_attr "type" "alu,alu1,negnot,incdec,icmp,test,setcc,icmov,push,pop,fxch,sseiadd,sseishft,sseimul,mmx,mmxadd,mmxcmp")))
                         "decoder0,p2+(p0|p1),p4+p3")