Neither touches any XMM register, so the check is pointless. It is imo
even questionable whether in SSE2AVX mode the two should be converted to
their AVX counterparts.
This requires a change to ModR/M handling: Recording of displacement
types must not discard operand size information. Change the respective
code to alter only .disp<N>.
On x86, some instructions have alternate shorter encodings:
1. When the upper 32 bits of destination registers of
andq $imm31, %r64
testq $imm31, %r64
xorq %r64, %r64
subq %r64, %r64
known to be zero, we can encode them without the REX_W bit:
andl $imm31, %r32
testl $imm31, %r32
xorl %r32, %r32
subl %r32, %r32
This optimization is enabled with -O, -O2 and -Os.
2. Since 0xb0 mov with 32-bit destination registers zero-extends 32-bit
immediate to 64-bit destination register, we can use it to encode 64-bit
mov with 32-bit immediates. This optimization is enabled with -O, -O2
and -Os.
3. Since the upper bits of destination registers of VEX128 and EVEX128
instructions are extended to zero, if all bits of destination registers
of AVX256 or AVX512 instructions are zero, we can use VEX128 or EVEX128
encoding to encode AVX256 or AVX512 instructions. When 2 source
registers are identical, AVX256 and AVX512 andn and xor instructions:
VOP %reg, %reg, %dest_reg
can be encoded with
VOP128 %reg, %reg, %dest_reg
This optimization is enabled with -O2 and -Os.
4. 16-bit, 32-bit and 64-bit register tests with immediate may be
encoded as 8-bit register test with immediate. This optimization is
enabled with -Os.
This patch does:
1. Add {nooptimize} pseudo prefix to disable instruction size
optimization.
2. Add optimize to i386_opcode_modifier to tell assembler that encoding
of an instruction may be optimized.
gas/
PR gas/22871
* NEWS: Mention -O[2|s].
* config/tc-i386.c (_i386_insn): Add no_optimize.
(optimize): New.
(optimize_for_space): Likewise.
(fits_in_imm7): New function.
(fits_in_imm31): Likewise.
(optimize_encoding): Likewise.
(md_assemble): Call optimize_encoding to optimize encoding.
(parse_insn): Handle {nooptimize}.
(md_shortopts): Append "O::".
(md_parse_option): Handle -On.
* doc/c-i386.texi: Document -O0, -O, -O1, -O2 and -Os as well
as {nooptimize}.
* testsuite/gas/cfi/cfi-x86_64.d: Pass -O0 to assembler.
* testsuite/gas/i386/ilp32/cfi/cfi-x86_64.d: Likewise.
* testsuite/gas/i386/i386.exp: Run optimize-1, optimize-2,
optimize-3, x86-64-optimize-1, x86-64-optimize-2,
x86-64-optimize-3 and x86-64-optimize-4.
* testsuite/gas/i386/optimize-1.d: New file.
* testsuite/gas/i386/optimize-1.s: Likewise.
* testsuite/gas/i386/optimize-2.d: Likewise.
* testsuite/gas/i386/optimize-2.s: Likewise.
* testsuite/gas/i386/optimize-3.d: Likewise.
* testsuite/gas/i386/optimize-3.s: Likewise.
* testsuite/gas/i386/x86-64-optimize-1.s: Likewise.
* testsuite/gas/i386/x86-64-optimize-1.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-2.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-2.s: Likewise.
* testsuite/gas/i386/x86-64-optimize-3.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-3.s: Likewise.
* testsuite/gas/i386/x86-64-optimize-4.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-4.s: Likewise.
opcodes/
PR gas/22871
* i386-gen.c (opcode_modifiers): Add Optimize.
* i386-opc.h (Optimize): New enum.
(i386_opcode_modifier): Add optimize.
* i386-opc.tbl: Add "Optimize" to "mov $imm, reg",
"sub reg, reg/mem", "test $imm, acc", "test $imm, reg/mem",
"and $imm, acc", "and $imm, reg/mem", "xor reg, reg/mem",
"movq $imm, reg" and AVX256 and AVX512 versions of vandnps,
vandnpd, vpandn, vpandnd, vpandnq, vxorps, vxorpd, vpxor,
vpxord and vpxorq.
* i386-tbl.h: Regenerated.
Add {rex} pseudo prefix to generate a REX byte for integer and legacy
vector instructions if possible. Note that this differs from the rex
prefix which generates REX prefix unconditionally.
gas/
* config/tc-i386.c (_i386_insn): Add rex_encoding.
(md_assemble): When i.rex_encoding is true, generate a REX byte
if possible.
(parse_insn): Set i.rex_encoding for {rex}.
* doc/c-i386.texi: Document {rex}.
* testsuite/gas/i386/x86-64-pseudos.s: Add {rex} tests.
* testsuite/gas/i386/x86-64-pseudos.d: Updated.
opcodes/
* i386-opc.tbl: Add {rex},
* i386-tbl.h: Regenerated.
Just like their packed counterparts the memory operand is always 16
bytes wide, and the Disp8 scaling is the same for all of them. (As a
side note: I'm also surprised by there being AVX512VL variants of
these as well as the AVX512_4VNNIW ones - the SDM doesn't define any
such.)
Adjust the test cases also for the packed forms to actually live up to
their promise of testing correct Disp8 encoding.
For historical reason, we allow movd/vmovd with 64-bit register and
memeory operands. But for vmovd, we failed to handle 64-bit memeory
operand. This has been gone unnoticed since AT&T syntax always treats
memory operand as 32-bit memory. This patch properly encodes vmovd
with 64-bit memeory operands. It also removes AVX512 vmovd with 64-bit
operands since GCC has
case TYPE_SSEMOV:
switch (get_attr_mode (insn))
{
case MODE_DI:
/* Handle broken assemblers that require movd instead of movq. */
if (!HAVE_AS_IX86_INTERUNIT_MOVQ
&& (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
return "%vmovd\t{%1, %0|%0, %1}";
return "%vmovq\t{%1, %0|%0, %1}";
and all AVX512 GNU assemblers set HAVE_AS_IX86_INTERUNIT_MOVQ, GCC won't
generate AVX512 vmovd with 64-bit operand.
gas/
PR gas/22681
* testsuite/gas/i386/i386.exp: Run x86-64-movd and
x86-64-movd-intel.
* testsuite/gas/i386/x86-64-movd-intel.d: New file.
* testsuite/gas/i386/x86-64-movd.d: Likewise.
* testsuite/gas/i386/x86-64-movd.s: Likewise.
opcodes/
PR gas/22681
* i386-opc.tbl: Properly encode vmovd with Qword memeory operand.
Remove AVX512 vmovd with 64-bit operands.
* i386-tbl.h: Regenerated.
Just like for instructions in GPRs, there's no need to have separate
templates for otherwise identical insns acting on XMM or YMM registers
(or memory of the same size).
Use a combination of a single new Reg bit and Byte, Word, Dword, or
Qword instead.
Besides shrinking the number of operand type bits this has the benefit
of making register handling more similar to accumulator handling (a
generic flag is being accompanied by a "size qualifier"). It requires,
however, to split a few insn templates, as it is no longer correct to
have combinations like Reg32|Reg64|Byte. This slight growth in size will
hopefully be outweighed by this change paving the road for folding a
presumably much larger number of templates later on.
They are relevant only when multiple operands permit registers:
operand_type_register_match() returns true if either operand is not a
register one. IOW
grep -i CheckRegSize i386-opc.tbl | grep -Ev "(Reg[8136]|Acc).*,.*(Reg|Acc)"
should produce no output.
BaseIndex implies - with the exception of string instructions the
optional presence of a displacement. This is almost completely uniform
for all instructions (the sole exception being MPX ones, which don't
allow 16-bit addressing and hence Disp16), so there's no point in
explicitly stating this in the main opcode table. Drop those explict
specifications in favor of adding logic to i386-gen, shrinking the
table size quite a bit and hence making it more readable.
The opcodes/i386-tbl.h changes are due to a few cases where pointless
Disp* still hadn't been removed from their insns.
Make the assembler recognize UD0, supporting only the newer form
expecting a ModR/M byte.
Make assembler and disassembler properly emit / expect a ModR/M byte for
UD1.
For the testsuite, as arch-4 already tests all UDn, avoid producing a
huge delta for other tests using UD2B by making them use UD2 instead.
Just like %cxl can't be used as shift count register. Otherwise for
consistency %cxl would need to gain "ShiftCount" and use of both ought
to properly cause REX prefixes to be emitted.
Just like is the case for xsave{s,c}64 and xrstors64 already. I wonder
though why xsave{s,c} and xrstors don't allow for the q suffix, other
than the other insns without the "64" suffix do.
Update x86 assembler and disassembler for CET v2.0:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
1. incsspd and incsspq are changed to take a register opeand with a
different opcode.
2. setssbsy is changed to take no opeand with a different opcode.
gas/
* testsuite/gas/i386/cet-intel.d: Updated.
* testsuite/gas/i386/cet.d: Likewise.
* testsuite/gas/i386/x86-64-cet-intel.d: Likewise.
* testsuite/gas/i386/x86-64-cet.d: Likewise.
* testsuite/gas/i386/cet.s: Update incsspd and setssbsy tests.
* testsuite/gas/i386/x86-64-cet.s: Likewise.
opcodes/
* i386-dis.c (RM_0FAE_REG_5): Removed.
(PREFIX_MOD_3_0F01_REG_5_RM_1): Likewise.
(PREFIX_MOD_3_0F01_REG_5_RM_0): New.
(PREFIX_MOD_3_0FAE_REG_5): Likewise.
(prefix_table): Remove PREFIX_MOD_3_0F01_REG_5_RM_1. Add
PREFIX_MOD_3_0F01_REG_5_RM_0.
(prefix_table): Update PREFIX_MOD_0_0FAE_REG_5. Add
PREFIX_MOD_3_0FAE_REG_5.
(mod_table): Update MOD_0FAE_REG_5.
(rm_table): Update RM_0F01_REG_5. Remove RM_0FAE_REG_5.
* i386-opc.tbl: Update incsspd, incsspq and setssbsy.
* i386-tbl.h: Regenerated.
Many x86 instructions have more than one encodings. Assembler picks
the default one, usually the shortest one. Although the ".s", ".d8"
and ".d32" suffixes can be used to swap register operands or specify
displacement size, they aren't very flexible. This patch adds pseudo
prefixes, {xxx}, to control instruction encoding. The available
pseudo prefixes are {disp8}, {disp32}, {load}, {store}, {vex2}, {vex3}
and {evex}. Pseudo prefixes are preferred over the ".s", ".d8" and
".d32" suffixes, which are deprecated.
gas/
* config/tc-i386.c (_i386_insn): Add dir_encoding and
vec_encoding. Remove swap_operand and need_vrex.
(extra_symbol_chars): Add '}'.
(md_begin): Mark '}' with LEX_BEGIN_NAME. Allow '}' in
mnemonic.
(build_vex_prefix): Don't use 2-byte VEX encoding with
{vex3}. Check dir_encoding and load.
(parse_insn): Check pseudo prefixes. Set dir_encoding.
(VEX_check_operands): Likewise.
(match_template): Check dir_encoding and load.
(parse_real_register): Set vec_encoding instead of need_vrex.
(parse_register): Likewise.
* doc/c-i386.texi: Document {disp8}, {disp32}, {load}, {store},
{vex2}, {vex3} and {evex}. Remove ".s", ".d8" and ".d32"
* testsuite/gas/i386/i386.exp: Run pseudos and x86-64-pseudos.
* testsuite/gas/i386/pseudos.d: New file.
* testsuite/gas/i386/pseudos.s: Likewise.
* testsuite/gas/i386/x86-64-pseudos.d: Likewise.
* testsuite/gas/i386/x86-64-pseudos.s: Likewise.
opcodes/
* i386-gen.c (opcode_modifiers): Replace S with Load.
* i386-opc.h (S): Removed.
(Load): New.
(i386_opcode_modifier): Replace s with load.
* i386-opc.tbl: Add {disp8}, {disp32}, {swap}, {vex2}, {vex3}
and {evex}. Replace S with Load.
* i386-tbl.h: Regenerated.
Just like REX.W affects operand size of the implicit rAX/rDX inputs to
PCMPESTR{I,M}, VEX.W does for VPCMPESTR{I,M}. Allow Q or L suffixes on
the instructions.
Similarly the disassembler needs to be adjusted to no longer require
VEX.W to be zero for the instructions to be valid, and to emit proper
suffixes.
Note, however, that this doesn't address the problem of there being no
way to control (at least) {,E}VEX.W for 32- or 16-bit code. Nor does it
address the problem of the many WIG instructions not getting properly
disassembled when VEX.W=1.
AVX512F vmovq doesn't support masking. We can't swap register operand
in AVX512F vmovq with Reg64 since Reg64 != RegXMM. This patch merges
AVX512F vmovq.
* i386-opc.tbl: Merge AVX512F vmovq.