Various moves come in load and store forms, and just like on the GPR
and FPU sides there would better be only one pattern. In some cases this
is not feasible because the opcodes are too different, but quite a few
cases follow a similar standard scheme. Introduce Opcode_SIMD_FloatD and
Opcode_SIMD_IntD, generalize handling in operand_size_match() (reverse
operand handling there simply needs to match "straight" operand one),
and fix a long standing, but so far only latent bug with when to zap
found_reverse_match.
Also once again drop IgnoreSize where pointlessly applied to templates
touched anyway as well as *word when redundant with Reg*.
There are separate CPUID feature bits for fxsave/fxrstor and cmovCC
instructions. This patch adds CpuCMOV and CpuFXSR to replace Cpu686
on corresponding instructions.
gas/
* config/tc-i386.c (cpu_arch): Add .cmov and .fxsr.
(cpu_noarch): Add nocmov and nofxsr.
* doc/c-i386.texi: Document cmov and fxsr.
opcodes/
* i386-gen.c (cpu_flag_init): Add CpuCMOV and CpuFXSR to
CPU_I686_FLAGS. Add CPU_CMOV_FLAGS, CPU_FXSR_FLAGS,
CPU_ANY_CMOV_FLAGS and CPU_ANY_FXSR_FLAGS.
(cpu_flags): Add CpuCMOV and CpuFXSR.
* i386-opc.tbl: Replace Cpu686 with CpuFXSR on fxsave, fxsave64,
fxrstor and fxrstor64. Replace Cpu686 with CpuCMOV on cmovCC.
* i386-init.h: Regenerated.
* i386-tbl.h: Likewise.
There's no insn allowing ZEROING_MASKING alone. Re-purpose its value for
handling the not uncommon case of insns allowing either form of masking
with register operands, but only merging masking with a memory operand.
Expand Broadcast to 3 bits so that the number of bytes to broadcast
can be computed as 1 << (Broadcast - 1). Use it to simplify x86
assembler.
gas/
* config/tc-i386.c (Broadcast_Operation): Add bytes.
(build_evex_prefix): Use i.broadcast->bytes.
(match_broadcast_size): New function.
(check_VecOperands): Use the broadcast field to compute the
number of bytes to broadcast directly. Set i.broadcast->bytes.
Use match_broadcast_size.
opcodes/
* i386-gen.c (adjust_broadcast_modifier): New function.
(process_i386_opcode_modifier): Add an argument for operands.
Adjust the Broadcast value based on operands.
(output_i386_opcode): Pass operand_types to
process_i386_opcode_modifier.
(process_i386_opcodes): Pass NULL as operands to
process_i386_opcode_modifier.
* i386-opc.h (BYTE_BROADCAST): New.
(WORD_BROADCAST): Likewise.
(DWORD_BROADCAST): Likewise.
(QWORD_BROADCAST): Likewise.
(i386_opcode_modifier): Expand broadcast to 3 bits.
* i386-tbl.h: Regenerated.
Just like for their AVX counterparts and CVTSI2S{D,S}, a memory source
here is ambiguous and hence
- in source files should be qualified with a suitable suffix or operand
size specifier (not doing so is an error in Intel mode, and will gain
a diagnostic in AT&T mode in the future),
- in disassembly should be properly suffixed (the Intel operand size
specifiers were emitted correctly already).
When multiple (here: two) forms of an insn take different width inputs
but produce identical size outputs (here: RegXMM), the templates can be
combined.
Also drop IgnoreSize (and the now redundant size specifiers) wherever
applicable.
These are special because they may not have a register operand to derive
the vector length from, which requires to also deal with the braodcast
case when determining vector length in build_evex_prefix().
Also drop IgnoreSize (and the now redundant size specifiers) from their
suffixed counterparts.
After
commit 1b54b8d7e4
Author: Jan Beulich <jbeulich@novell.com>
Date: Mon Dec 18 09:36:14 2017 +0100
x86: fold RegXMM/RegYMM/RegZMM into RegSIMD
... qualified by their respective sizes, allowing to drop FirstXmm0 at
the same time.
folded RegXMM, RegYMM and RegZMM into RegSIMD, it's no longer impossible
to distinguish if Xmmword can represent a memory reference when operand
specification contains SIMD register. For example, template operands
specification like these
RegXMM|...|Xmmword|...
and
RegXMM|...
The Xmmword bitfield is always set by RegXMM which is represented by
"RegSIMD|Xmmword". This patch splits each of vcvtps2qq, vcvtps2uqq,
vcvttps2qq and vcvttps2uqq into 2 templates: one template only has
RegXMM source operand and the other only has mempry source operand.
gas/
PR gas/23418
* testsuite/gas/i386/xmmword.s: Add tests for vcvtps2qq,
vcvtps2uqq, vcvttps2qq and vcvttps2uqq.
* testsuite/gas/i386/xmmword.l: Updated.
opcodes/
PR gas/23418
* i386-opc.h (Byte): Update comments.
(Word): Likewise.
(Dword): Likewise.
(Fword): Likewise.
(Qword): Likewise.
(Tbyte): Likewise.
(Xmmword): Likewise.
(Ymmword): Likewise.
(Zmmword): Likewise.
* i386-opc.tbl: Split vcvtps2qq, vcvtps2uqq, vcvttps2qq and
vcvttps2uqq.
* i386-tbl.h: Regenerated.
Architecturally, MONITOR's and MONITORX'es memory operand is a 16- or
32-bit register outside of 64-bit mode, and a 64- or 32-bit register
inside 64-bit mode. The other register operands, including all of them
for MWAIT and MWAITX, are uniformly 32-bit, irrespective of mode. Retain
the original 64-bit MONITOR{,X} templates for compatibility only, and
fold the MWAIT{,X} ones.
First of all there's no point in having separate Cpu386 templates - the
respective SReg3 registers can't be specified for pre-386 anyway; see
parse_real_register().
And then we can also make use of D here for the memory forms of the
insn. This cannot be done for the non-64bit GPR forms because of the
IgnoreSize that cannot be dropped from the to-SREG variant.
There's little point carrying up to three templates per insn flavor
when the sole difference is operand size and the dependency on AVX512VL
being enabled. Instead the need for AVX512VL can be derived from an
operand allowing for ZMMword as well as one or both or XMMword and
YMMword (irrespective of whether this is a register or memory operand).
Without further abstraction to deal with the different Disp8MemShift
values between the templates, only a limited set (mostly ones only
allowing for non-memory operands) can be folded, which is being done
here.
Also drop IgnoreSize wherever possible from anything that's being
touched anyway.
It's not clear to me why they had been introduced - the respective
comments in opcodes/i386-gen.c are certainly wrong: ymm<N> registers
are very well supported (and necessary) with just AVX512F.
Since only the first 32 bits of input operand are used for tpause and
umwait, the REX.W bit is skipped. Both 32-bit registers and 64-bit
registers are allowed.
gas/
* testsuite/gas/i386/x86-64-waitpkg.s: Add 32-bit registers
tests for tpause and umwait.
* testsuite/gas/i386/x86-64-waitpkg-intel.d: Updated.
* testsuite/gas/i386/x86-64-waitpkg.d: Likewise.
opcodes/
* i386-dis.c (prefix_table): Replace Em with Edq on tpause and
umwait.
* i386-opc.tbl: Allow 32-bit registers for tpause and umwait in
64-bit mode.
* i386-tbl.h: Regenerated.
It again can be inferred from other information.
The vpopcntd templates all need to have Dword added to their memory
operands; the lack thereof was actually a bug preventing certain Intel
syntax code to assemble, so test cases get extended.
The wrong placement of the Load attribute in the templates prevented
this from working. The disassembler also didn't handle this consistently
with other similar dual-encoding insns.
While many templates allowing multiple suitably matching XMM/YMM/ZMM
operand sizes can be folded, a few need to be split in order to not
wrongly accept "xmmword ptr" operands when only XMM registers are
permitted (and memory operands are more narrow). Add a test case
validating this.
"clr reg" is an alias of "xor reg, reg". We can encode "clr reg64" as
"xor reg32, reg32".
gas/
* config/tc-i386.c (optimize_encoding): Also encode "clr reg64"
as "xor reg32, reg32".
* testsuite/gas/i386/x86-64-optimize-1.s: Add "clr reg64" tests.
* testsuite/gas/i386/x86-64-optimize-1.d: Updated.
opcodes/
* i386-opc.tbl: Add Optimize to clr.
* i386-tbl.h: Regenerated.
The differences between some of the register and memory forms of the
same insn often don't really require the templates to be separate. For
example, Disp8MemShift is simply irrelevant to register forms. Fold
these as far as possible, and also fold register-only forms. Further
folding is possible, but needs other prereq work done first.
A note regarding EVEXDYN: This is intended to be used only when no other
properties of the template would make is_evex_encoding() return true. In
all "normal" cases I think it is preferable to omit this indicator, to
keep the table half way readable.
Their memory forms were bogusly using VexLWP instead of VexNDD. Adjust
VexNDD handling to cope with these, allowing their register and memory
forms to be folded.
The differences between some of the register and memory forms of the
same insn often don't really require the templates to be separate. For
example, Disp8MemShift is simply irrelevant to register forms. Fold them
as far as possible. Further folding is possible, but needs other prereq
work done first.
They aren't really useful (anymore?): The conflicting operand size check
isn't applicable to any insn validly using respective memory operand
sizes (and if they're used wrongly, another error would result), and the
logic in process_suffix() can be easily changed to work without them.
While re-structuring conditionals in process_suffix() also drop the
CMPXCHG8B special case in favor of a NoRex64 attribute in the opcode
table.
Neither touches any XMM register, so the check is pointless. It is imo
even questionable whether in SSE2AVX mode the two should be converted to
their AVX counterparts.
This requires a change to ModR/M handling: Recording of displacement
types must not discard operand size information. Change the respective
code to alter only .disp<N>.
On x86, some instructions have alternate shorter encodings:
1. When the upper 32 bits of destination registers of
andq $imm31, %r64
testq $imm31, %r64
xorq %r64, %r64
subq %r64, %r64
known to be zero, we can encode them without the REX_W bit:
andl $imm31, %r32
testl $imm31, %r32
xorl %r32, %r32
subl %r32, %r32
This optimization is enabled with -O, -O2 and -Os.
2. Since 0xb0 mov with 32-bit destination registers zero-extends 32-bit
immediate to 64-bit destination register, we can use it to encode 64-bit
mov with 32-bit immediates. This optimization is enabled with -O, -O2
and -Os.
3. Since the upper bits of destination registers of VEX128 and EVEX128
instructions are extended to zero, if all bits of destination registers
of AVX256 or AVX512 instructions are zero, we can use VEX128 or EVEX128
encoding to encode AVX256 or AVX512 instructions. When 2 source
registers are identical, AVX256 and AVX512 andn and xor instructions:
VOP %reg, %reg, %dest_reg
can be encoded with
VOP128 %reg, %reg, %dest_reg
This optimization is enabled with -O2 and -Os.
4. 16-bit, 32-bit and 64-bit register tests with immediate may be
encoded as 8-bit register test with immediate. This optimization is
enabled with -Os.
This patch does:
1. Add {nooptimize} pseudo prefix to disable instruction size
optimization.
2. Add optimize to i386_opcode_modifier to tell assembler that encoding
of an instruction may be optimized.
gas/
PR gas/22871
* NEWS: Mention -O[2|s].
* config/tc-i386.c (_i386_insn): Add no_optimize.
(optimize): New.
(optimize_for_space): Likewise.
(fits_in_imm7): New function.
(fits_in_imm31): Likewise.
(optimize_encoding): Likewise.
(md_assemble): Call optimize_encoding to optimize encoding.
(parse_insn): Handle {nooptimize}.
(md_shortopts): Append "O::".
(md_parse_option): Handle -On.
* doc/c-i386.texi: Document -O0, -O, -O1, -O2 and -Os as well
as {nooptimize}.
* testsuite/gas/cfi/cfi-x86_64.d: Pass -O0 to assembler.
* testsuite/gas/i386/ilp32/cfi/cfi-x86_64.d: Likewise.
* testsuite/gas/i386/i386.exp: Run optimize-1, optimize-2,
optimize-3, x86-64-optimize-1, x86-64-optimize-2,
x86-64-optimize-3 and x86-64-optimize-4.
* testsuite/gas/i386/optimize-1.d: New file.
* testsuite/gas/i386/optimize-1.s: Likewise.
* testsuite/gas/i386/optimize-2.d: Likewise.
* testsuite/gas/i386/optimize-2.s: Likewise.
* testsuite/gas/i386/optimize-3.d: Likewise.
* testsuite/gas/i386/optimize-3.s: Likewise.
* testsuite/gas/i386/x86-64-optimize-1.s: Likewise.
* testsuite/gas/i386/x86-64-optimize-1.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-2.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-2.s: Likewise.
* testsuite/gas/i386/x86-64-optimize-3.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-3.s: Likewise.
* testsuite/gas/i386/x86-64-optimize-4.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-4.s: Likewise.
opcodes/
PR gas/22871
* i386-gen.c (opcode_modifiers): Add Optimize.
* i386-opc.h (Optimize): New enum.
(i386_opcode_modifier): Add optimize.
* i386-opc.tbl: Add "Optimize" to "mov $imm, reg",
"sub reg, reg/mem", "test $imm, acc", "test $imm, reg/mem",
"and $imm, acc", "and $imm, reg/mem", "xor reg, reg/mem",
"movq $imm, reg" and AVX256 and AVX512 versions of vandnps,
vandnpd, vpandn, vpandnd, vpandnq, vxorps, vxorpd, vpxor,
vpxord and vpxorq.
* i386-tbl.h: Regenerated.
Add {rex} pseudo prefix to generate a REX byte for integer and legacy
vector instructions if possible. Note that this differs from the rex
prefix which generates REX prefix unconditionally.
gas/
* config/tc-i386.c (_i386_insn): Add rex_encoding.
(md_assemble): When i.rex_encoding is true, generate a REX byte
if possible.
(parse_insn): Set i.rex_encoding for {rex}.
* doc/c-i386.texi: Document {rex}.
* testsuite/gas/i386/x86-64-pseudos.s: Add {rex} tests.
* testsuite/gas/i386/x86-64-pseudos.d: Updated.
opcodes/
* i386-opc.tbl: Add {rex},
* i386-tbl.h: Regenerated.
Just like their packed counterparts the memory operand is always 16
bytes wide, and the Disp8 scaling is the same for all of them. (As a
side note: I'm also surprised by there being AVX512VL variants of
these as well as the AVX512_4VNNIW ones - the SDM doesn't define any
such.)
Adjust the test cases also for the packed forms to actually live up to
their promise of testing correct Disp8 encoding.
For historical reason, we allow movd/vmovd with 64-bit register and
memeory operands. But for vmovd, we failed to handle 64-bit memeory
operand. This has been gone unnoticed since AT&T syntax always treats
memory operand as 32-bit memory. This patch properly encodes vmovd
with 64-bit memeory operands. It also removes AVX512 vmovd with 64-bit
operands since GCC has
case TYPE_SSEMOV:
switch (get_attr_mode (insn))
{
case MODE_DI:
/* Handle broken assemblers that require movd instead of movq. */
if (!HAVE_AS_IX86_INTERUNIT_MOVQ
&& (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
return "%vmovd\t{%1, %0|%0, %1}";
return "%vmovq\t{%1, %0|%0, %1}";
and all AVX512 GNU assemblers set HAVE_AS_IX86_INTERUNIT_MOVQ, GCC won't
generate AVX512 vmovd with 64-bit operand.
gas/
PR gas/22681
* testsuite/gas/i386/i386.exp: Run x86-64-movd and
x86-64-movd-intel.
* testsuite/gas/i386/x86-64-movd-intel.d: New file.
* testsuite/gas/i386/x86-64-movd.d: Likewise.
* testsuite/gas/i386/x86-64-movd.s: Likewise.
opcodes/
PR gas/22681
* i386-opc.tbl: Properly encode vmovd with Qword memeory operand.
Remove AVX512 vmovd with 64-bit operands.
* i386-tbl.h: Regenerated.
Just like for instructions in GPRs, there's no need to have separate
templates for otherwise identical insns acting on XMM or YMM registers
(or memory of the same size).
Use a combination of a single new Reg bit and Byte, Word, Dword, or
Qword instead.
Besides shrinking the number of operand type bits this has the benefit
of making register handling more similar to accumulator handling (a
generic flag is being accompanied by a "size qualifier"). It requires,
however, to split a few insn templates, as it is no longer correct to
have combinations like Reg32|Reg64|Byte. This slight growth in size will
hopefully be outweighed by this change paving the road for folding a
presumably much larger number of templates later on.
They are relevant only when multiple operands permit registers:
operand_type_register_match() returns true if either operand is not a
register one. IOW
grep -i CheckRegSize i386-opc.tbl | grep -Ev "(Reg[8136]|Acc).*,.*(Reg|Acc)"
should produce no output.
BaseIndex implies - with the exception of string instructions the
optional presence of a displacement. This is almost completely uniform
for all instructions (the sole exception being MPX ones, which don't
allow 16-bit addressing and hence Disp16), so there's no point in
explicitly stating this in the main opcode table. Drop those explict
specifications in favor of adding logic to i386-gen, shrinking the
table size quite a bit and hence making it more readable.
The opcodes/i386-tbl.h changes are due to a few cases where pointless
Disp* still hadn't been removed from their insns.
Make the assembler recognize UD0, supporting only the newer form
expecting a ModR/M byte.
Make assembler and disassembler properly emit / expect a ModR/M byte for
UD1.
For the testsuite, as arch-4 already tests all UDn, avoid producing a
huge delta for other tests using UD2B by making them use UD2 instead.
Just like %cxl can't be used as shift count register. Otherwise for
consistency %cxl would need to gain "ShiftCount" and use of both ought
to properly cause REX prefixes to be emitted.
Just like is the case for xsave{s,c}64 and xrstors64 already. I wonder
though why xsave{s,c} and xrstors don't allow for the q suffix, other
than the other insns without the "64" suffix do.
Update x86 assembler and disassembler for CET v2.0:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
1. incsspd and incsspq are changed to take a register opeand with a
different opcode.
2. setssbsy is changed to take no opeand with a different opcode.
gas/
* testsuite/gas/i386/cet-intel.d: Updated.
* testsuite/gas/i386/cet.d: Likewise.
* testsuite/gas/i386/x86-64-cet-intel.d: Likewise.
* testsuite/gas/i386/x86-64-cet.d: Likewise.
* testsuite/gas/i386/cet.s: Update incsspd and setssbsy tests.
* testsuite/gas/i386/x86-64-cet.s: Likewise.
opcodes/
* i386-dis.c (RM_0FAE_REG_5): Removed.
(PREFIX_MOD_3_0F01_REG_5_RM_1): Likewise.
(PREFIX_MOD_3_0F01_REG_5_RM_0): New.
(PREFIX_MOD_3_0FAE_REG_5): Likewise.
(prefix_table): Remove PREFIX_MOD_3_0F01_REG_5_RM_1. Add
PREFIX_MOD_3_0F01_REG_5_RM_0.
(prefix_table): Update PREFIX_MOD_0_0FAE_REG_5. Add
PREFIX_MOD_3_0FAE_REG_5.
(mod_table): Update MOD_0FAE_REG_5.
(rm_table): Update RM_0F01_REG_5. Remove RM_0FAE_REG_5.
* i386-opc.tbl: Update incsspd, incsspq and setssbsy.
* i386-tbl.h: Regenerated.
Many x86 instructions have more than one encodings. Assembler picks
the default one, usually the shortest one. Although the ".s", ".d8"
and ".d32" suffixes can be used to swap register operands or specify
displacement size, they aren't very flexible. This patch adds pseudo
prefixes, {xxx}, to control instruction encoding. The available
pseudo prefixes are {disp8}, {disp32}, {load}, {store}, {vex2}, {vex3}
and {evex}. Pseudo prefixes are preferred over the ".s", ".d8" and
".d32" suffixes, which are deprecated.
gas/
* config/tc-i386.c (_i386_insn): Add dir_encoding and
vec_encoding. Remove swap_operand and need_vrex.
(extra_symbol_chars): Add '}'.
(md_begin): Mark '}' with LEX_BEGIN_NAME. Allow '}' in
mnemonic.
(build_vex_prefix): Don't use 2-byte VEX encoding with
{vex3}. Check dir_encoding and load.
(parse_insn): Check pseudo prefixes. Set dir_encoding.
(VEX_check_operands): Likewise.
(match_template): Check dir_encoding and load.
(parse_real_register): Set vec_encoding instead of need_vrex.
(parse_register): Likewise.
* doc/c-i386.texi: Document {disp8}, {disp32}, {load}, {store},
{vex2}, {vex3} and {evex}. Remove ".s", ".d8" and ".d32"
* testsuite/gas/i386/i386.exp: Run pseudos and x86-64-pseudos.
* testsuite/gas/i386/pseudos.d: New file.
* testsuite/gas/i386/pseudos.s: Likewise.
* testsuite/gas/i386/x86-64-pseudos.d: Likewise.
* testsuite/gas/i386/x86-64-pseudos.s: Likewise.
opcodes/
* i386-gen.c (opcode_modifiers): Replace S with Load.
* i386-opc.h (S): Removed.
(Load): New.
(i386_opcode_modifier): Replace s with load.
* i386-opc.tbl: Add {disp8}, {disp32}, {swap}, {vex2}, {vex3}
and {evex}. Replace S with Load.
* i386-tbl.h: Regenerated.
Just like REX.W affects operand size of the implicit rAX/rDX inputs to
PCMPESTR{I,M}, VEX.W does for VPCMPESTR{I,M}. Allow Q or L suffixes on
the instructions.
Similarly the disassembler needs to be adjusted to no longer require
VEX.W to be zero for the instructions to be valid, and to emit proper
suffixes.
Note, however, that this doesn't address the problem of there being no
way to control (at least) {,E}VEX.W for 32- or 16-bit code. Nor does it
address the problem of the many WIG instructions not getting properly
disassembled when VEX.W=1.
AVX512F vmovq doesn't support masking. We can't swap register operand
in AVX512F vmovq with Reg64 since Reg64 != RegXMM. This patch merges
AVX512F vmovq.
* i386-opc.tbl: Merge AVX512F vmovq.
... just like is already the case for 16- and 32-bit movzb: I can't see
why omitting suffixes on this (and movs{b,w,l}) is not allowed, when it
is allowed for all other instructions where the suffix is redundant
with (one of) the operands.
The dual purpose mnemonic (string move vs scalar double move) breaks
the assumption that the isstring flag would be set on both the first
and last entry in the current set of templates, which results in bogus
or missing diagnostics for the string move variant of the mnemonic.
Short of mostly rewriting i386_index_check() and its interaction with
the rest of the code, simply shrink the template set to just string
instructions when encountering the second memory operand, and run
i386_index_check() a second time for the first memory operand after
that reduction.
AMD64 spec and Intel64 spec differ in indirect branches in 64-bit mode.
AMD64 supports indirect branches with 16-bit address via the data size
prefix while the data size prefix is ignored by Intel64.
gas/
PR binutis/18386
* testsuite/gas/i386/i386.exp: Run x86-64-branch-4.
* testsuite/gas/i386/x86-64-branch.d: Updated.
* testsuite/gas/i386/ilp32/x86-64-branch.d: Likewise.
* testsuite/gas/i386/x86-64-branch-4.l: New file.
* testsuite/gas/i386/x86-64-branch-4.s: Likewise.
opcodes/
PR binutis/18386
* i386-dis.c (indirEv): Replace stack_v_mode with indir_v_mode.
(indir_v_mode): New.
Add comments for '&'.
(reg_table): Replace "{T|}" with "{&|}" on call and jmp.
(putop): Handle '&'.
(intel_operand_size): Handle indir_v_mode.
(OP_E_register): Likewise.
* i386-opc.tbl: Mark 64-bit indirect call/jmp as AMD64. Add
64-bit indirect call/jmp for AMD64.
* i386-tbl.h: Regenerated
AMD64 vs CpuIntel64 ISA should be handled similar as AT&T vs Intel
syntax. Since cpu_flags isn't sorted by position, we need to check
the whole cpu_flags array for the maximum position when verifying
CpuMax.
gas/
PR gas/20154
* config/tc-i386.c (cpu_flags_match): Don't set cpuamd64 nor
cpuintel64.
(match_template): Check Intel64/AMD64 ISA.
opcodes/
PR gas/20154
* i386-gen.c (cpu_flags): Remove CpuAMD64 and CpuIntel64.
(opcode_modifiers): Add AMD64 and Intel64.
(main): Properly verify CpuMax.
* i386-opc.h (CpuAMD64): Removed.
(CpuIntel64): Likewise.
(CpuMax): Set to CpuNo64.
(i386_cpu_flags): Remove cpuamd64 and cpuintel64.
(AMD64): New.
(Intel64): Likewise.
(i386_opcode_modifier): Add amd64 and intel64.
(i386-opc.tbl): Replace CpuAMD64/CpuIntel64 with AMD64/Intel64
on call and jmp.
* i386-init.h: Regenerated.
* i386-tbl.h: Likewise.
CpuMax should be CpuIntel64, not CpuNo64. i386-gen.c is updated to
verify that CpuMax is correct. X86 assembler is updated to properly
set cpuamd64 and cpuintel64.
gas/
PR gas/20154
* config/tc-i386.c (intel64): New.
(cpu_flags_match): Set cpuamd64 and cpuintel64.
(md_parse_option): Set intel64 instead of cpuamd64 and
cpuintel64.
opcodes/
PR gas/20154
* i386-gen.c (main): Fail if CpuMax is incorrect.
* i386-opc.h (CpuMax): Set to CpuIntel64.
* i386-tbl.h: Regenerated.
As pointed out before, the documentation mandates the rounding mode to
follow the GPR, so gas should accept such input. As the brojen code got
released already we sadly will need to continue to also accept the
badly ordered operands.
gas/testsuite/
2015-06-01 Jan Beulich <jbeulich@suse.com>
* gas/i386/avx512f-intel.d: Adjust expectations on operand order.
* gas/i386/evex-lig256-intel.d: Likewise.
* gas/i386/evex-lig512-intel.d: Likewise.
* gas/i386/x86-64-avx512f-intel.d: Likewise.
* gas/i386/x86-64-evex-lig256-intel.d: Likewise.
* gas/i386/x86-64-evex-lig512-intel.d: Likewise.
opcodes/
2015-06-01 Jan Beulich <jbeulich@suse.com>
* i386-opc.tbl: New IntelSyntax entries for vcvt{,u}si2s{d,s}.
* i386-tbl.h: Regenerate.
AMD64 spec and Intel64 spec differ in direct unconditional branches in
64-bit mode. AMD64 supports direct unconditional branches with 16-bit
offset via the data size prefix, which truncates RIP to 16 bits, while
the data size prefix is ignored by Intel64.
This patch adds -mamd64/-mintel64 option to x86-64 assembler and
-Mamd64/-Mintel64 option to x86-64 disassembler. The most permissive
ISA, which is AMD64, is the default.
GDB can add an option, similar to
(gdb) help set disassembly-flavor
Set the disassembly flavor.
The valid values are "att" and "intel", and the default value is "att".
to select which ISA to disassemble.
binutils/
PR binutis/18386
* doc/binutils.texi: Document -Mamd64 and -Mintel64.
gas/
PR binutis/18386
* config/tc-i386.c (OPTION_MAMD64): New.
(OPTION_MINTEL64): Likewise.
(md_longopts): Add -mamd64 and -mintel64.
(md_parse_option): Handle OPTION_MAMD64 and OPTION_MINTEL64.
(md_show_usage): Add -mamd64 and -mintel64.
* doc/c-i386.texi: Document -mamd64 and -mintel64.
gas/testsuite/
PR binutis/18386
* gas/i386/i386.exp: Run x86-64-branch-2 and x86-64-branch-3.
* gas/i386/x86-64-branch.d: Also pass -Mintel64 to objdump.
* gas/i386/ilp32/x86-64-branch.d: Likewise.
* gas/i386/x86-64-branch-2.d: New file.
* gas/i386/x86-64-branch-2.s: Likewise.
* gas/i386/x86-64-branch-3.l: Likewise.
* gas/i386/x86-64-branch-3.s: Likewise.
ld/testsuite/
PR binutis/18386
* ld-x86-64/tlsgdesc.dd: Also pass -Mintel64 to objdump.
* ld-x86-64/tlspic.dd: Likewise.
* ld-x86-64/x86-64.exp (x86_64tests): Also pass -Mintel64 to
objdump for tlspic.dd and tlsgdesc.dd.
opcodes/
PR binutis/18386
* i386-dis.c: Add comments for '@'.
(x86_64_table): Use '@' on call/jmp for X86_64_E8/X86_64_E9.
(enum x86_64_isa): New.
(isa64): Likewise.
(print_i386_disassembler_options): Add amd64 and intel64.
(print_insn): Handle amd64 and intel64.
(putop): Handle '@'.
(OP_J): Don't ignore the operand size prefix for AMD64 in 64-bit.
* i386-gen.c (cpu_flags): Add CpuAMD64 and CpuIntel64.
* i386-opc.h (AMD64): New.
(CpuIntel64): Likewise.
(i386_cpu_flags): Add cpuamd64 and cpuintel64.
* i386-opc.tbl: Add direct call/jmp with Disp16|Disp32 for AMD64.
Mark direct call/jmp without Disp16|Disp32 as Intel64.
* i386-init.h: Regenerated.
* i386-tbl.h: Likewise.
Disp16 and Disp32 aren't supported by direct branches in 64-bit mode.
This patch removes them from 64-bit direct branches.
* opcodes/i386-opc.tbl (call): Remove Disp16|Disp32 from 64-bit
direct branch.
(jmp): Likewise.
* i386-tbl.h: Regenerated.
For gathers with indices larger than elements (e. g.)
vpgatherqd ymm6{k1}, ZMMWORD PTR [ebp+zmm7*8-123]
We currently treat memory size as a size of index register, while it is
actually should be size of destination register:
vpgatherqd ymm6{k1}, YMMWORD PTR [ebp+zmm7*8-123]
This patch fixes it.
opcodes/
* i386-opc.tbl: Change memory size for vgatherpf0qps, vgatherpf1qps,
vscatterpf0qps, vscatterpf1qps, vgatherqps, vpgatherqd, vpscatterqd,
vscatterqps.
* i386-tbl.h: Regenerate.
gas/testsuite/
* gas/i386/avx512pf-intel.d: Change memory size for vgatherpf0qps,
vgatherpf1qps, vscatterpf0qps, vscatterpf1qps.
* gas/i386/avx512pf.s: Ditto.
* gas/i386/x86-64-avx512pf-intel.d: Ditto.
* gas/i386/x86-64-avx512pf.s: Ditto.
* gas/i386/avx512f-intel.d: Change memory size for vgatherqps,
vpgatherqd, vpscatterqd, vscatterqps.
* gas/i386/avx512f.s: Ditto.
* gas/i386/x86-64-avx512f-intel.d: Ditto.
* gas/i386/x86-64-avx512f.s: Ditto.
We currently support version of vcvtps2ph with sae and only 1 register operand.
This version is encoded as if missing operand was equal to ymm0.
I didn't found any references to this variant in
http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
This patch removes it.
opcodes/
* i386-opc.tbl: Remove wrong variant of vcvtps2ph
* i386-tbl.h: Regenerate.