DisassemblerEmitter.cpp revision 206124
//===- DisassemblerEmitter.cpp - Generate a disassembler ------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#include "DisassemblerEmitter.h"
#include "CodeGenTarget.h"
#include "Record.h"
#include "X86DisassemblerTables.h"
#include "X86RecognizableInstr.h"
#include "ARMDecoderEmitter.h"

using namespace llvm;
using namespace llvm::X86Disassembler;

/// DisassemblerEmitter - Contains disassembler table emitters for various
/// architectures.

/// X86 Disassembler Emitter
///
/// *** IF YOU'RE HERE TO RESOLVE A "Primary decode conflict", LOOK DOWN NEAR
///     THE END OF THIS COMMENT!
///
/// The X86 disassembler emitter is part of the X86 Disassembler, which is
/// documented in lib/Target/X86/X86Disassembler.h.
///
/// The emitter produces the tables that the disassembler uses to translate
/// instructions.  The emitter generates the following tables:
///
/// - One table (CONTEXTS_SYM) that contains a mapping of attribute masks to
///   instruction contexts.  Although for each attribute there are cases where
///   that attribute determines decoding, in the majority of cases decoding is
///   the same whether or not an attribute is present.  For example, a 64-bit
///   instruction with an OPSIZE prefix and an XS prefix decodes the same way in
///   all cases as a 64-bit instruction with only OPSIZE set.  (The XS prefix
///   may have effects on its execution, but does not change the instruction
///   returned.)  This allows considerable space savings in other tables.
/// - Four tables (ONEBYTE_SYM, TWOBYTE_SYM, THREEBYTE38_SYM, and
///   THREEBYTE3A_SYM) contain the hierarchy that the decoder traverses while
///   decoding an instruction.  At the lowest level of this hierarchy are
///   instruction UIDs, 16-bit integers that can be used to uniquely identify
///   the instruction and correspond exactly to its position in the list of
///   CodeGenInstructions for the target.
/// - One table (INSTRUCTIONS_SYM) contains information about the operands of
///   each instruction and how to decode them.
///
/// During table generation, there may be conflicts between instructions that
/// occupy the same space in the decode tables.  These conflicts are resolved as
/// follows in setTableFields() (X86DisassemblerTables.cpp)
///
/// - If the current context is the native context for one of the instructions
///   (that is, the attributes specified for it in the LLVM tables specify
///   precisely the current context), then it has priority.
/// - If the current context isn't native for either of the instructions, then
///   the higher-priority context wins (that is, the one that is more specific).
///   That hierarchy is determined by outranks() (X86DisassemblerTables.cpp)
/// - If the current context is native for both instructions, then the table
///   emitter reports a conflict and dies.
///
/// *** RESOLUTION FOR "Primary decode conflict"S
///
/// If two instructions collide, typically the solution is (in order of
/// likelihood):
///
/// (1) to filter out one of the instructions by editing filter()
///     (X86RecognizableInstr.cpp).  This is the most common resolution, but
///     check the Intel manuals first to make sure that (2) and (3) are not the
///     problem.
/// (2) to fix the tables (X86.td and its subsidiaries) so the opcodes are
///     accurate.  Sometimes they are not.
/// (3) to fix the tables to reflect the actual context (for example, required
///     prefixes), and possibly to add a new context by editing
///     lib/Target/X86/X86DisassemblerDecoderCommon.h.  This is unlikely to be
///     the cause.
///
/// DisassemblerEmitter.cpp contains the implementation for the emitter,
///   which simply pulls out instructions from the CodeGenTarget and pushes them
///   into X86DisassemblerTables.
/// X86DisassemblerTables.h contains the interface for the instruction tables,
///   which manage and emit the structures discussed above.
/// X86DisassemblerTables.cpp contains the implementation for the instruction
///   tables.
/// X86ModRMFilters.h contains filters that can be used to determine which
///   ModR/M values are valid for a particular instruction.  These are used to
///   populate ModRMDecisions.
/// X86RecognizableInstr.h contains the interface for a single instruction,
///   which knows how to translate itself from a CodeGenInstruction and provide
///   the information necessary for integration into the tables.
/// X86RecognizableInstr.cpp contains the implementation for a single
///   instruction.

void DisassemblerEmitter::run(raw_ostream &OS) {
  CodeGenTarget Target;

  OS << "/*===- TableGen'erated file "
     << "---------------------------------------*- C -*-===*\n"
     << " *\n"
     << " * " << Target.getName() << " Disassembler\n"
     << " *\n"
     << " * Automatically generated file, do not edit!\n"
     << " *\n"
     << " *===---------------------------------------------------------------"
     << "-------===*/\n";

  // X86 uses a custom disassembler.
  if (Target.getName() == "X86") {
    DisassemblerTables Tables;

    const std::vector<const CodeGenInstruction*> &numberedInstructions =
      Target.getInstructionsByEnumValue();

    for (unsigned i = 0, e = numberedInstructions.size(); i != e; ++i)
      RecognizableInstr::processInstr(Tables, *numberedInstructions[i], i);

    // FIXME: As long as we are using exceptions, might as well drop this to the
    // actual conflict site.
    if (Tables.hasConflicts())
      throw TGError(Target.getTargetRecord()->getLoc(),
                    "Primary decode conflict");

    Tables.emit(OS);
    return;
  }

  // Fixed-instruction-length targets use a common disassembler.
  if (Target.getName() == "ARM") {
    ARMDecoderEmitter(Records).run(OS);
    return;
  }

  throw TGError(Target.getTargetRecord()->getLoc(),
                "Unable to generate disassembler for this target");
}
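
The three conflict rules described in the header comment can be sketched in isolation as follows. This is only an illustration, not the code in X86DisassemblerTables.cpp: `Candidate`, `Outcome`, `resolveSketch()`, `outranksSketch()` and the `ATTR_*` constants are hypothetical names. Two assumptions are made here: "more specific" is modelled as "has more attribute bits set", and a tie between two non-native contexts keeps the existing table entry.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical attribute bits. The real contexts are defined in
// lib/Target/X86/X86DisassemblerDecoderCommon.h and are not reproduced here.
const unsigned ATTR_64BIT  = 1u;
const unsigned ATTR_OPSIZE = 2u;
const unsigned ATTR_XS     = 4u;

// Hypothetical stand-in for one instruction competing for a table slot.
struct Candidate {
  unsigned uid;            // position in the list of instructions
  unsigned nativeContext;  // attribute mask the instruction was declared with
};

enum Outcome { KeepExisting, TakeIncoming, Conflict };

// Assumption: a context is "more specific" when it sets more attribute bits.
// The real ordering is whatever outranks() implements.
inline bool outranksSketch(unsigned a, unsigned b) {
  return std::bitset<32>(a).count() > std::bitset<32>(b).count();
}

// Decide which of two candidates owns a slot in the given context.
inline Outcome resolveSketch(unsigned currentContext,
                             const Candidate &existing,
                             const Candidate &incoming) {
  const bool existingNative = existing.nativeContext == currentContext;
  const bool incomingNative = incoming.nativeContext == currentContext;

  // Native for both: nothing can break the tie, so report a conflict.
  if (existingNative && incomingNative)
    return Conflict;

  // Native for exactly one: that instruction has priority.
  if (existingNative)
    return KeepExisting;
  if (incomingNative)
    return TakeIncoming;

  // Native for neither: the more specific context wins.
  // Assumption: on a tie the existing entry stays.
  return outranksSketch(incoming.nativeContext, existing.nativeContext)
             ? TakeIncoming
             : KeepExisting;
}
```

In this sketch the `Conflict` outcome corresponds to the situation that `run()` reports as "Primary decode conflict" through `Tables.hasConflicts()`.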