@c c-i386.texi revision 78828
@c Copyright 1991, 1992, 1993, 1994, 1995, 1997, 1998, 1999, 2000, 2001
@c Free Software Foundation, Inc.
@c This is part of the GAS manual.
@c For copying conditions, see the file as.texinfo.
@ifset GENERIC
@page
@node i386-Dependent
@chapter 80386 Dependent Features
@end ifset
@ifclear GENERIC
@node Machine Dependencies
@chapter 80386 Dependent Features
@end ifclear

@cindex i386 support
@cindex i80386 support
@cindex x86-64 support

The i386 version of @code{@value{AS}} supports the original Intel 386
architecture in both 16-bit and 32-bit modes, as well as the AMD x86-64
architecture, which extends the Intel architecture to 64 bits.

@menu
* i386-Options::           Options
* i386-Syntax::            AT&T Syntax versus Intel Syntax
* i386-Mnemonics::         Instruction Naming
* i386-Regs::              Register Naming
* i386-Prefixes::          Instruction Prefixes
* i386-Memory::            Memory References
* i386-Jumps::             Handling of Jump Instructions
* i386-Float::             Floating Point
* i386-SIMD::              Intel's MMX and AMD's 3DNow! SIMD Operations
* i386-16bit::             Writing 16-bit Code
* i386-Arch::              Specifying an x86 CPU architecture
* i386-Bugs::              AT&T Syntax bugs
* i386-Notes::             Notes
@end menu

@node i386-Options
@section Options

@cindex options for i386
@cindex options for x86-64
@cindex i386 options
@cindex x86-64 options

The i386 version of @code{@value{AS}} has a few machine
dependent options:

@table @code
@cindex @samp{--32} option, i386
@cindex @samp{--32} option, x86-64
@cindex @samp{--64} option, i386
@cindex @samp{--64} option, x86-64
@item --32 | --64
Select the word size, either 32 bits or 64 bits.  Selecting 32-bit
implies Intel i386 architecture, while 64-bit implies AMD x86-64
architecture.

These options are only available with the ELF object file format, and
require that the necessary BFD support has been included (on a 32-bit
platform you have to pass @samp{--enable-64-bit-bfd} to @code{configure}
to enable 64-bit usage, and use x86-64 as the target platform).
@end table

@node i386-Syntax
@section AT&T Syntax versus Intel Syntax

@cindex i386 intel_syntax pseudo op
@cindex intel_syntax pseudo op, i386
@cindex i386 att_syntax pseudo op
@cindex att_syntax pseudo op, i386
@cindex i386 syntax compatibility
@cindex syntax compatibility, i386
@cindex x86-64 intel_syntax pseudo op
@cindex intel_syntax pseudo op, x86-64
@cindex x86-64 att_syntax pseudo op
@cindex att_syntax pseudo op, x86-64
@cindex x86-64 syntax compatibility
@cindex syntax compatibility, x86-64

@code{@value{AS}} now supports assembly using Intel assembler syntax.
@code{.intel_syntax} selects Intel mode, and @code{.att_syntax} switches
back to the usual AT&T mode for compatibility with the output of
@code{@value{GCC}}.  Either of these directives may have an optional
argument, @code{prefix} or @code{noprefix}, specifying whether registers
require a @samp{%} prefix.  AT&T System V/386 assembler syntax is quite
different from Intel syntax.  We mention these differences because
almost all 80386 documents use Intel syntax.
Notable differences
between the two syntaxes are:

@cindex immediate operands, i386
@cindex i386 immediate operands
@cindex register operands, i386
@cindex i386 register operands
@cindex jump/call operands, i386
@cindex i386 jump/call operands
@cindex operand delimiters, i386

@cindex immediate operands, x86-64
@cindex x86-64 immediate operands
@cindex register operands, x86-64
@cindex x86-64 register operands
@cindex jump/call operands, x86-64
@cindex x86-64 jump/call operands
@cindex operand delimiters, x86-64
@itemize @bullet
@item
AT&T immediate operands are preceded by @samp{$}; Intel immediate
operands are undelimited (Intel @samp{push 4} is AT&T @samp{pushl $4}).
AT&T register operands are preceded by @samp{%}; Intel register operands
are undelimited.  AT&T absolute (as opposed to PC relative) jump/call
operands are prefixed by @samp{*}; they are undelimited in Intel syntax.

@cindex i386 source, destination operands
@cindex source, destination operands; i386
@cindex x86-64 source, destination operands
@cindex source, destination operands; x86-64
@item
AT&T and Intel syntax use the opposite order for source and destination
operands.  Intel @samp{add eax, 4} is AT&T @samp{addl $4, %eax}.  The
@samp{source, dest} convention is maintained for compatibility with
previous Unix assemblers.  Note that instructions with more than one
source operand, such as the @samp{enter} instruction, do @emph{not} have
reversed order.  @xref{i386-Bugs}.

@cindex mnemonic suffixes, i386
@cindex sizes operands, i386
@cindex i386 size suffixes
@cindex mnemonic suffixes, x86-64
@cindex sizes operands, x86-64
@cindex x86-64 size suffixes
@item
In AT&T syntax the size of memory operands is determined from the last
character of the instruction mnemonic.
Mnemonic suffixes of @samp{b},
@samp{w}, @samp{l} and @samp{q} specify byte (8-bit), word (16-bit), long
(32-bit) and quadruple word (64-bit) memory references.  Intel syntax accomplishes
this by prefixing memory operands (@emph{not} the instruction mnemonics) with
@samp{byte ptr}, @samp{word ptr}, @samp{dword ptr} and @samp{qword ptr}.  Thus,
Intel @samp{mov al, byte ptr @var{foo}} is @samp{movb @var{foo}, %al} in AT&T
syntax.

@cindex return instructions, i386
@cindex i386 jump, call, return
@cindex return instructions, x86-64
@cindex x86-64 jump, call, return
@item
Immediate form long jumps and calls are
@samp{lcall/ljmp $@var{section}, $@var{offset}} in AT&T syntax; the
Intel syntax is
@samp{call/jmp far @var{section}:@var{offset}}.  Also, the far return
instruction
is @samp{lret $@var{stack-adjust}} in AT&T syntax; Intel syntax is
@samp{ret far @var{stack-adjust}}.

@cindex sections, i386
@cindex i386 sections
@cindex sections, x86-64
@cindex x86-64 sections
@item
The AT&T assembler does not provide support for multiple section
programs.  Unix style systems expect all programs to be single sections.
@end itemize

@node i386-Mnemonics
@section Instruction Naming

@cindex i386 instruction naming
@cindex instruction naming, i386
@cindex x86-64 instruction naming
@cindex instruction naming, x86-64

Instruction mnemonics are suffixed with one character modifiers which
specify the size of operands.  The letters @samp{b}, @samp{w}, @samp{l}
and @samp{q} specify byte, word, long and quadruple word operands.  If
no suffix is specified by an instruction then @code{@value{AS}} tries to
fill in the missing suffix based on the destination register operand
(the last one by convention).  Thus, @samp{mov %ax, %bx} is equivalent
to @samp{movw %ax, %bx}; also, @samp{mov $1, %bx} is equivalent to
@samp{movw $1, %bx}.
Note that this is incompatible with the AT&T Unix
assembler which assumes that a missing mnemonic suffix implies long
operand size.  (This incompatibility does not affect compiler output
since compilers always explicitly specify the mnemonic suffix.)

Almost all instructions have the same names in AT&T and Intel format.
There are a few exceptions.  The sign extend and zero extend
instructions need two sizes to specify them.  They need a size to
sign/zero extend @emph{from} and a size to sign/zero extend @emph{to}.  This
is accomplished by using two instruction mnemonic suffixes in AT&T
syntax.  Base names for sign extend and zero extend are
@samp{movs@dots{}} and @samp{movz@dots{}} in AT&T syntax (@samp{movsx}
and @samp{movzx} in Intel syntax).  The instruction mnemonic suffixes
are tacked on to this base name, the @emph{from} suffix before the
@emph{to} suffix.  Thus, @samp{movsbl %al, %edx} is AT&T syntax for
``move sign extend @emph{from} %al @emph{to} %edx.''  Possible suffixes,
thus, are @samp{bl} (from byte to long), @samp{bw} (from byte to word),
@samp{wl} (from word to long), @samp{bq} (from byte to quadruple word),
@samp{wq} (from word to quadruple word), and @samp{lq} (from long to
quadruple word).
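
As a minimal sketch of the suffix rules above (the register choices are
arbitrary and not taken from any particular program), the following
32-bit fragment shows an inferred suffix next to its explicit
equivalent, followed by three of the two-suffix extend instructions:

@smallexample
        mov     %ax, %bx        # no suffix: size taken from %bx, same as movw
        movw    %ax, %bx        # explicit word suffix
        movsbl  %al, %edx       # sign extend from byte (%al) to long (%edx)
        movzbw  %al, %dx        # zero extend from byte (%al) to word (%dx)
        movzwl  %ax, %ecx       # zero extend from word (%ax) to long (%ecx)
@end smallexample

The @samp{bq}, @samp{wq} and @samp{lq} forms follow the same pattern,
but they need a 64-bit destination register and so apply only to x86-64
code.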

@cindex conversion instructions, i386
@cindex i386 conversion instructions
@cindex conversion instructions, x86-64
@cindex x86-64 conversion instructions
The Intel-syntax conversion instructions

@itemize @bullet
@item
@samp{cbw} --- sign-extend byte in @samp{%al} to word in @samp{%ax},

@item
@samp{cwde} --- sign-extend word in @samp{%ax} to long in @samp{%eax},

@item
@samp{cwd} --- sign-extend word in @samp{%ax} to long in @samp{%dx:%ax},

@item
@samp{cdq} --- sign-extend dword in @samp{%eax} to quad in @samp{%edx:%eax},

@item
@samp{cdqe} --- sign-extend dword in @samp{%eax} to quad in @samp{%rax}
(x86-64 only),

@item
@samp{cqo} --- sign-extend quad in @samp{%rax} to octuple in
@samp{%rdx:%rax} (x86-64 only),
@end itemize

@noindent
are called @samp{cbtw}, @samp{cwtl}, @samp{cwtd}, @samp{cltd}, @samp{cltq}, and
@samp{cqto} in AT&T naming.  @code{@value{AS}} accepts either naming for these
instructions.

@cindex jump instructions, i386
@cindex call instructions, i386
@cindex jump instructions, x86-64
@cindex call instructions, x86-64
Far call/jump instructions are @samp{lcall} and @samp{ljmp} in
AT&T syntax, but are @samp{call far} and @samp{jmp far} in Intel
convention.

@node i386-Regs
@section Register Naming

@cindex i386 registers
@cindex registers, i386
@cindex x86-64 registers
@cindex registers, x86-64
Register operands are always prefixed with @samp{%}.  The 80386 registers
consist of

@itemize @bullet
@item
the 8 32-bit registers @samp{%eax} (the accumulator), @samp{%ebx},
@samp{%ecx}, @samp{%edx}, @samp{%edi}, @samp{%esi}, @samp{%ebp} (the
frame pointer), and @samp{%esp} (the stack pointer).

@item
the 8 16-bit low-ends of these: @samp{%ax}, @samp{%bx}, @samp{%cx},
@samp{%dx}, @samp{%di}, @samp{%si}, @samp{%bp}, and @samp{%sp}.

@item
the 8 8-bit registers: @samp{%ah}, @samp{%al}, @samp{%bh},
@samp{%bl}, @samp{%ch}, @samp{%cl}, @samp{%dh}, and @samp{%dl} (these
are the high-bytes and low-bytes of @samp{%ax}, @samp{%bx},
@samp{%cx}, and @samp{%dx}).

@item
the 6 section registers @samp{%cs} (code section), @samp{%ds}
(data section), @samp{%ss} (stack section), @samp{%es}, @samp{%fs},
and @samp{%gs}.

@item
the 3 processor control registers @samp{%cr0}, @samp{%cr2}, and
@samp{%cr3}.

@item
the 6 debug registers @samp{%db0}, @samp{%db1}, @samp{%db2},
@samp{%db3}, @samp{%db6}, and @samp{%db7}.

@item
the 2 test registers @samp{%tr6} and @samp{%tr7}.

@item
the 8 floating point stack registers @samp{%st} or equivalently
@samp{%st(0)}, @samp{%st(1)}, @samp{%st(2)}, @samp{%st(3)},
@samp{%st(4)}, @samp{%st(5)}, @samp{%st(6)}, and @samp{%st(7)}.
These registers are overloaded by 8 MMX registers @samp{%mm0},
@samp{%mm1}, @samp{%mm2}, @samp{%mm3}, @samp{%mm4}, @samp{%mm5},
@samp{%mm6} and @samp{%mm7}.

@item
the 8 SSE registers @samp{%xmm0}, @samp{%xmm1}, @samp{%xmm2},
@samp{%xmm3}, @samp{%xmm4}, @samp{%xmm5}, @samp{%xmm6} and @samp{%xmm7}.
@end itemize

The AMD x86-64 architecture extends the register set by:

@itemize @bullet
@item
enhancing the 8 32-bit registers to 64-bit: @samp{%rax} (the
accumulator), @samp{%rbx}, @samp{%rcx}, @samp{%rdx}, @samp{%rdi},
@samp{%rsi}, @samp{%rbp} (the frame pointer), @samp{%rsp} (the stack
pointer).

@item
the 8 extended registers @samp{%r8}--@samp{%r15}.

@item
the 8 32-bit low ends of the extended registers: @samp{%r8d}--@samp{%r15d}.

@item
the 8 16-bit low ends of the extended registers: @samp{%r8w}--@samp{%r15w}.

@item
the 8 8-bit low ends of the extended registers: @samp{%r8b}--@samp{%r15b}.

@item
the 4 8-bit registers: @samp{%sil}, @samp{%dil}, @samp{%bpl}, @samp{%spl}.

@item
the 8 debug registers: @samp{%db8}--@samp{%db15}.

@item
the 8 SSE registers: @samp{%xmm8}--@samp{%xmm15}.
@end itemize

@node i386-Prefixes
@section Instruction Prefixes

@cindex i386 instruction prefixes
@cindex instruction prefixes, i386
@cindex prefixes, i386
Instruction prefixes are used to modify the following instruction.  They
are used to repeat string instructions, to provide section overrides, to
perform bus lock operations, and to change operand and address sizes.
(Most instructions that normally operate on 32-bit operands will use
16-bit operands if the instruction has an ``operand size'' prefix.)
Instruction prefixes are best written on the same line as the instruction
they act upon.  For example, the @samp{scas} (scan string) instruction is
repeated with:

@smallexample
        repne scas %es:(%edi),%al
@end smallexample

You may also place prefixes on the lines immediately preceding the
instruction, but this circumvents checks that @code{@value{AS}} does
with prefixes, and will not work with all prefixes.

Here is a list of instruction prefixes:

@cindex section override prefixes, i386
@itemize @bullet
@item
Section override prefixes @samp{cs}, @samp{ds}, @samp{ss}, @samp{es},
@samp{fs}, @samp{gs}.  These are automatically added when you use the
@var{section}:@var{memory-operand} form for memory references.

@cindex size prefixes, i386
@item
Operand/Address size prefixes @samp{data16} and @samp{addr16}
change 32-bit operands/addresses into 16-bit operands/addresses,
while @samp{data32} and @samp{addr32} change 16-bit ones (in a
@code{.code16} section) into 32-bit operands/addresses.  These prefixes
@emph{must} appear on the same line of code as the instruction they
modify.
For example, in a 16-bit @code{.code16} section, you might
write:

@smallexample
        addr32 jmpl *(%ebx)
@end smallexample

@cindex bus lock prefixes, i386
@cindex inhibiting interrupts, i386
@item
The bus lock prefix @samp{lock} inhibits interrupts during execution of
the instruction it precedes.  (This is only valid with certain
instructions; see an 80386 manual for details.)

@cindex coprocessor wait, i386
@item
The wait for coprocessor prefix @samp{wait} waits for the coprocessor to
complete the current instruction.  This should never be needed for the
80386/80387 combination.

@cindex repeat prefixes, i386
@item
The @samp{rep}, @samp{repe}, and @samp{repne} prefixes are added
to string instructions to make them repeat @samp{%ecx} times (@samp{%cx}
times if the current address size is 16 bits).
@cindex REX prefixes, i386
@item
The @samp{rex} family of prefixes is used by x86-64 to encode
extensions to the i386 instruction set.  The @samp{rex} prefix has four
bits --- an operand size override (@code{64}) used to change operand size
from 32-bit to 64-bit, and X, Y and Z extension bits used to extend the
register set.

You may write the @samp{rex} prefixes directly.  The @samp{rex64xyz}
instruction emits a @samp{rex} prefix with all the bits set.  By omitting
the @code{64}, @code{x}, @code{y} or @code{z} you may write other
prefixes as well.  Normally, there is no need to write the prefixes
explicitly, since gas will automatically generate them based on the
instruction operands.
@end itemize

@node i386-Memory
@section Memory References

@cindex i386 memory references
@cindex memory references, i386
@cindex x86-64 memory references
@cindex memory references, x86-64
An Intel syntax indirect memory reference of the form

@smallexample
@var{section}:[@var{base} + @var{index}*@var{scale} + @var{disp}]
@end smallexample

@noindent
is translated into the AT&T syntax

@smallexample
@var{section}:@var{disp}(@var{base}, @var{index}, @var{scale})
@end smallexample

@noindent
where @var{base} and @var{index} are the optional 32-bit base and
index registers, @var{disp} is the optional displacement, and
@var{scale}, taking the values 1, 2, 4, and 8, multiplies @var{index}
to calculate the address of the operand.  If no @var{scale} is
specified, @var{scale} is taken to be 1.  @var{section} specifies the
optional section register for the memory operand, and may override the
default section register (see an 80386 manual for section register
defaults).  Note that section overrides in AT&T syntax @emph{must}
be preceded by a @samp{%}.  If you specify a section override which
coincides with the default section register, @code{@value{AS}} does @emph{not}
output any section register override prefixes to assemble the given
instruction.  Thus, section overrides can be specified to emphasize which
section register is used for a given memory operand.

Here are some examples of Intel and AT&T style memory references:

@table @asis
@item AT&T: @samp{-4(%ebp)}, Intel: @samp{[ebp - 4]}
@var{base} is @samp{%ebp}; @var{disp} is @samp{-4}.  @var{section} is
missing, and the default section is used (@samp{%ss} for addressing with
@samp{%ebp} as the base register).  @var{index} and @var{scale} are both missing.

@item AT&T: @samp{foo(,%eax,4)}, Intel: @samp{[foo + eax*4]}
@var{index} is @samp{%eax} (scaled by a @var{scale} of 4); @var{disp} is
@samp{foo}.  All other fields are missing.  The section register here
defaults to @samp{%ds}.

@item AT&T: @samp{foo(,1)}, Intel: @samp{[foo]}
This uses the value pointed to by @samp{foo} as a memory operand.
Note that @var{base} and @var{index} are both missing, but there is only
@emph{one} @samp{,}.  This is a syntactic exception.

@item AT&T: @samp{%gs:foo}, Intel: @samp{gs:foo}
This selects the contents of the variable @samp{foo} with section
register @var{section} being @samp{%gs}.
@end table

Absolute (as opposed to PC relative) call and jump operands must be
prefixed with @samp{*}.  If no @samp{*} is specified, @code{@value{AS}}
always chooses PC relative addressing for jump/call labels.

Any instruction that has a memory operand, but no register operand,
@emph{must} specify its size (byte, word, long, or quadruple) with an
instruction mnemonic suffix (@samp{b}, @samp{w}, @samp{l} or @samp{q},
respectively).

The x86-64 architecture adds RIP (instruction pointer relative)
addressing.  This addressing mode is specified by using @samp{rip} as a
base register.  Only constant offsets are valid.  For example:

@table @asis
@item AT&T: @samp{1234(%rip)}, Intel: @samp{[rip + 1234]}
Points to the address 1234 bytes past the end of the current
instruction.

@item AT&T: @samp{symbol(%rip)}, Intel: @samp{[rip + symbol]}
Points to @code{symbol} in an RIP relative way; this encoding is shorter
than the default absolute addressing.
@end table

Other addressing modes remain unchanged in the x86-64 architecture, except
that the registers used are 64-bit instead of 32-bit.
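
The 32-bit memory reference forms described in this section can be
sketched in a short fragment.  The symbol @code{foo} is a hypothetical
data label introduced only for this illustration, and the comments give
the Intel syntax equivalents where one exists:

@smallexample
        .data
foo:    .long   10, 20, 30, 40

        .text
        movl    -4(%ebp), %eax          # [ebp - 4]
        movl    foo(,%eax,4), %edx      # [foo + eax*4]
        movl    8(%ebx,%esi,2), %ecx    # [ebx + esi*2 + 8]
        movl    %gs:foo, %eax           # gs:foo
        incl    (%ebx)                  # memory operand only: suffix required
        jmp     *%eax                   # absolute jump to the address in %eax
@end smallexample

Without the @samp{l} suffix the @samp{inc} line would leave the operand
size unspecified, and without the @samp{*} the @samp{jmp} operand would
not be treated as an absolute target.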

@node i386-Jumps
@section Handling of Jump Instructions

@cindex jump optimization, i386
@cindex i386 jump optimization
@cindex jump optimization, x86-64
@cindex x86-64 jump optimization
Jump instructions are always optimized to use the smallest possible
displacements.  This is accomplished by using byte (8-bit) displacement
jumps whenever the target is sufficiently close.  If a byte displacement
is insufficient a long displacement is used.  We do not support
word (16-bit) displacement jumps in 32-bit mode (i.e., prefixing the jump
instruction with the @samp{data16} instruction prefix), since the 80386
insists upon masking @samp{%eip} to 16 bits after the word displacement
is added.  (@xref{i386-Arch}.)

Note that the @samp{jcxz}, @samp{jecxz}, @samp{loop}, @samp{loopz},
@samp{loope}, @samp{loopnz} and @samp{loopne} instructions only come in byte
displacements, so that if you use these instructions (@code{@value{GCC}} does
not use them) you may get an error message (and incorrect code).  The AT&T
80386 assembler tries to get around this problem by expanding @samp{jcxz foo}
to

@smallexample
         jcxz cx_zero
         jmp cx_nonzero
cx_zero: jmp foo
cx_nonzero:
@end smallexample

@node i386-Float
@section Floating Point

@cindex i386 floating point
@cindex floating point, i386
@cindex x86-64 floating point
@cindex floating point, x86-64
All 80387 floating point types except packed BCD are supported.
(BCD support may be added without much difficulty.)  These data
types are 16-, 32-, and 64-bit integers, and single (32-bit),
double (64-bit), and extended (80-bit) precision floating point.
Each supported type has an instruction mnemonic suffix and a constructor
associated with it.  Instruction mnemonic suffixes specify the operand's
data type.  Constructors build these data types into memory.

@cindex @code{float} directive, i386
@cindex @code{single} directive, i386
@cindex @code{double} directive, i386
@cindex @code{tfloat} directive, i386
@cindex @code{float} directive, x86-64
@cindex @code{single} directive, x86-64
@cindex @code{double} directive, x86-64
@cindex @code{tfloat} directive, x86-64
@itemize @bullet
@item
Floating point constructors are @samp{.float} or @samp{.single},
@samp{.double}, and @samp{.tfloat} for 32-, 64-, and 80-bit formats.
These correspond to instruction mnemonic suffixes @samp{s}, @samp{l},
and @samp{t}.  @samp{t} stands for 80-bit (ten byte) real.  The 80387
only supports this format via the @samp{fldt} (load 80-bit real to stack
top) and @samp{fstpt} (store 80-bit real and pop stack) instructions.

@cindex @code{word} directive, i386
@cindex @code{long} directive, i386
@cindex @code{int} directive, i386
@cindex @code{quad} directive, i386
@cindex @code{word} directive, x86-64
@cindex @code{long} directive, x86-64
@cindex @code{int} directive, x86-64
@cindex @code{quad} directive, x86-64
@item
Integer constructors are @samp{.word}, @samp{.long} or @samp{.int}, and
@samp{.quad} for the 16-, 32-, and 64-bit integer formats.  The
corresponding instruction mnemonic suffixes are @samp{s} (single),
@samp{l} (long), and @samp{q} (quad).  As with the 80-bit real format,
the 64-bit @samp{q} format is only present in the @samp{fildq} (load
quad integer to stack top) and @samp{fistpq} (store quad integer and pop
stack) instructions.
@end itemize

Register to register operations should not use instruction mnemonic suffixes.
@samp{fstl %st, %st(1)} will give a warning, and be assembled as if you
wrote @samp{fst %st, %st(1)}, since all register to register operations
use 80-bit floating point operands.
(Contrast this with @samp{fstl %st, mem},
which converts @samp{%st} from 80-bit to 64-bit floating point format,
then stores the result in the 8 byte location @samp{mem}.)

@node i386-SIMD
@section Intel's MMX and AMD's 3DNow! SIMD Operations

@cindex MMX, i386
@cindex 3DNow!, i386
@cindex SIMD, i386
@cindex MMX, x86-64
@cindex 3DNow!, x86-64
@cindex SIMD, x86-64

@code{@value{AS}} supports Intel's MMX instruction set (SIMD
instructions for integer data), available on Intel's Pentium MMX
processors and Pentium II processors, AMD's K6 and K6-2 processors,
Cyrix' M2 processor, and probably others.  It also supports AMD's 3DNow!
instruction set (SIMD instructions for 32-bit floating point data)
available on AMD's K6-2 processor and possibly others in the future.

Currently, @code{@value{AS}} does not support Intel's floating point
SIMD, Katmai (KNI).

The eight 64-bit MMX operands, also used by 3DNow!, are called @samp{%mm0},
@samp{%mm1}, @dots{} @samp{%mm7}.  They contain eight 8-bit integers, four
16-bit integers, two 32-bit integers, one 64-bit integer, or two 32-bit
floating point values.  The MMX registers cannot be used at the same time
as the floating point stack.

See Intel and AMD documentation, keeping in mind that the operand order in
instructions is reversed from the Intel syntax.

@node i386-16bit
@section Writing 16-bit Code

@cindex i386 16-bit code
@cindex 16-bit code, i386
@cindex real-mode code, i386
@cindex @code{code16gcc} directive, i386
@cindex @code{code16} directive, i386
@cindex @code{code32} directive, i386
@cindex @code{code64} directive, i386
@cindex @code{code64} directive, x86-64
While @code{@value{AS}} normally writes only ``pure'' 32-bit i386 code
or 64-bit x86-64 code depending on the default configuration,
it also supports writing code to run in real mode or in 16-bit protected
mode code segments.
To do this, put a @samp{.code16} or
@samp{.code16gcc} directive before the assembly language instructions to
be run in 16-bit mode.  You can switch @code{@value{AS}} back to writing
normal 32-bit code with the @samp{.code32} directive.

@samp{.code16gcc} provides experimental support for generating 16-bit
code from gcc, and differs from @samp{.code16} in that @samp{call},
@samp{ret}, @samp{enter}, @samp{leave}, @samp{push}, @samp{pop},
@samp{pusha}, @samp{popa}, @samp{pushf}, and @samp{popf} instructions
default to 32-bit size.  This is so that the stack pointer is
manipulated in the same way over function calls, allowing access to
function parameters at the same stack offsets as in 32-bit mode.
@samp{.code16gcc} also automatically adds address size prefixes where
necessary to use the 32-bit addressing modes that gcc generates.

The code which @code{@value{AS}} generates in 16-bit mode will not
necessarily run on a 16-bit pre-80386 processor.  To write code that
runs on such a processor, you must refrain from using @emph{any} 32-bit
constructs which require @code{@value{AS}} to output address or operand
size prefixes.

Note that writing 16-bit code instructions by explicitly specifying a
prefix or an instruction mnemonic suffix within a 32-bit code section
generates different machine instructions than those generated for a
16-bit code segment.  In a 32-bit code section, the following code
generates the machine opcode bytes @samp{66 6a 04}, which pushes the
value @samp{4} onto the stack, decrementing @samp{%esp} by 2.

@smallexample
        pushw $4
@end smallexample

The same code in a 16-bit code section would generate the machine
opcode bytes @samp{6a 04} (i.e., without the operand size prefix), which
is correct since the processor default operand size is assumed to be 16
bits in a 16-bit code section.
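
As a rough illustration of the mode directives (the labels
@code{entry16} and @code{entry32} are hypothetical, and the byte values
in the comments are the encodings expected from the instruction set
definition rather than captured assembler output), the same source line
assembles differently on either side of a mode switch:

@smallexample
        .code16
entry16:
        pushw   $4                  # 6a 04: 16 bits is the default size
        movl    $0x12345678, %eax   # 66 b8 78 56 34 12: prefix added

        .code32
entry32:
        pushw   $4                  # 66 6a 04: prefix selects 16-bit operand
        pushl   $4                  # 6a 04: 32 bits is the default size
@end smallexample

The @samp{movl} line in the 16-bit part is the kind of 32-bit construct
that would not run on a pre-80386 processor.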

@node i386-Bugs
@section AT&T Syntax bugs

The UnixWare assembler, and probably other AT&T derived ix86 Unix
assemblers, generate floating point instructions with reversed source
and destination registers in certain cases.  Unfortunately, gcc and
possibly many other programs use this reversed syntax, so we're stuck
with it.

For example

@smallexample
        fsub %st,%st(3)
@end smallexample
@noindent
results in @samp{%st(3)} being updated to @samp{%st - %st(3)} rather
than the expected @samp{%st(3) - %st}.  This happens with all the
non-commutative arithmetic floating point operations with two register
operands where the source register is @samp{%st} and the destination
register is @samp{%st(i)}.

@node i386-Arch
@section Specifying CPU Architecture

@cindex arch directive, i386
@cindex i386 arch directive
@cindex arch directive, x86-64
@cindex x86-64 arch directive

@code{@value{AS}} may be told to assemble for a particular CPU
architecture with the @code{.arch @var{cpu_type}} directive.  This
directive enables a warning when gas detects an instruction that is not
supported on the CPU specified.  The choices for @var{cpu_type} are:

@multitable @columnfractions .20 .20 .20 .20
@item @samp{i8086} @tab @samp{i186} @tab @samp{i286} @tab @samp{i386}
@item @samp{i486} @tab @samp{i586} @tab @samp{i686} @tab @samp{pentium}
@item @samp{pentiumpro} @tab @samp{pentium4} @tab @samp{k6} @tab @samp{athlon}
@item @samp{sledgehammer}
@end multitable

Apart from the warning, there are only two other effects on
@code{@value{AS}} operation.  Firstly, if you specify a CPU other than
@samp{i486}, then shift by one instructions such as @samp{sarl $1, %eax}
will automatically use a two byte opcode sequence.  The larger three
byte opcode sequence is used on the 486 (and when no architecture is
specified) because it executes faster on the 486.
Note that you can
explicitly request the two byte opcode by writing @samp{sarl %eax}.
Secondly, if you specify @samp{i8086}, @samp{i186}, or @samp{i286},
@emph{and} @samp{.code16} or @samp{.code16gcc} then byte offset
conditional jumps will be promoted when necessary to a two instruction
sequence consisting of a conditional jump of the opposite sense around
an unconditional jump to the target.

Following the CPU architecture, you may specify @samp{jumps} or
@samp{nojumps} to control automatic promotion of conditional jumps.
@samp{jumps} is the default, and enables jump promotion.  All external
jumps will be of the long variety, and file-local jumps will be promoted
as necessary.  (@pxref{i386-Jumps})  @samp{nojumps} leaves external
conditional jumps as byte offset jumps, and warns about file-local
conditional jumps that @code{@value{AS}} promotes.
Unconditional jumps are treated as for @samp{jumps}.

For example

@smallexample
        .arch i8086,nojumps
@end smallexample

@node i386-Notes
@section Notes

@cindex i386 @code{mul}, @code{imul} instructions
@cindex @code{mul} instruction, i386
@cindex @code{imul} instruction, i386
@cindex @code{mul} instruction, x86-64
@cindex @code{imul} instruction, x86-64
There is some trickery concerning the @samp{mul} and @samp{imul}
instructions that deserves mention.  The 16-, 32-, 64- and 128-bit expanding
multiplies (base opcode @samp{0xf6}; extension 4 for @samp{mul} and 5
for @samp{imul}) can be output only in the one operand form.  Thus,
@samp{imul %ebx, %eax} does @emph{not} select the expanding multiply;
the expanding multiply would clobber the @samp{%edx} register, and this
would confuse @code{@value{GCC}} output.  Use @samp{imul %ebx} to get the
64-bit product in @samp{%edx:%eax}.
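
A minimal sketch of the distinction between the two forms, with operand
values chosen arbitrarily so that the product exceeds 32 bits:

@smallexample
        movl    $100000, %eax
        movl    $300000, %ebx
        imull   %ebx            # one operand: 64-bit product in %edx:%eax
        movl    $100000, %eax
        imull   %ebx, %eax      # two operand: low 32 bits in %eax only
@end smallexample

Only the one operand form keeps the full product here; the two operand
form leaves @samp{%edx} untouched and truncates the result.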

We have added a two operand form of @samp{imul} when the first operand
is an immediate mode expression and the second operand is a register.
This is just a shorthand, so that multiplying @samp{%eax} by 69, for
example, can be done with @samp{imul $69, %eax} rather than @samp{imul
$69, %eax, %eax}.

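
As a small sketch of this shorthand (the choice of @samp{%ebx} in the
last line is an arbitrary illustration), the first two lines below are
intended to describe the same instruction, while the third shows a case
that still needs the three operand form:

@smallexample
        imul    $69, %eax           # shorthand two operand form
        imul    $69, %eax, %eax     # explicit three operand form
        imul    $69, %ebx, %eax     # %eax = %ebx * 69: operands differ
@end smallexample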