1. Use single entry for vcvtne2ps2bf16 and vdpbf16ps with Disp8ShiftVL.
2. Use 5 entries, instead of 8, for vcvtneps2bf16.
* i386-opc.tbl: Consolidate AVX512 BF16 entries.
* i386-init.h: Regenerated.
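For illustration only, the vector-length dependent Disp8 scaling that Disp8ShiftVL refers to can be sketched as below. This assumes a full-vector memory operand without broadcast; the function names are hypothetical and not taken from gas.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not from gas: Disp8 scaling shift for a
   full-vector memory operand of the given vector length in bits
   (128 -> 4, 256 -> 5, 512 -> 6).  Returns -1 for other lengths.  */
static int
disp8_shift_for_vl (unsigned int vl_bits)
{
  switch (vl_bits)
    {
    case 128: return 4;
    case 256: return 5;
    case 512: return 6;
    default:  return -1;
    }
}

/* Return true if DISP can be encoded as a compressed 8-bit
   displacement, storing the scaled value in *DISP8.  */
static bool
fits_compressed_disp8 (int32_t disp, unsigned int vl_bits, int8_t *disp8)
{
  int shift = disp8_shift_for_vl (vl_bits);
  if (shift < 0)
    return false;

  int32_t n = (int32_t) 1 << shift;
  if (disp % n != 0)
    return false;		/* Not a multiple of the scale factor.  */

  int32_t scaled = disp / n;
  if (scaled < -128 || scaled > 127)
    return false;		/* Scaled value does not fit in 8 bits.  */

  *disp8 = (int8_t) scaled;
  return true;
}
```

Because the shift follows from the vector length, one template can cover all three lengths instead of one template per length.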
PEXTR{B,W} and PINSR{B,W}, just like for AVX512BW, are WIG, no matter
that the SDM uses a nonstandard description of that fact.
PEXTRD, even with EVEX.W set, ignores that bit outside of 64-bit mode,
just like its AVX counterpart.
Many VEX-/EVEX-encoded instructions accessing GPRs become WIG outside of
64-bit mode. The respective templates should specify neither VexWIG nor
VexW0, but instead the setting of the bit should be determined from
- REX.W in 64-bit mode,
- the setting established through -mvexwig= / -mevexwig= otherwise.
This implies that the evex-wig2 testcase needs to go away, as it is
wrong altogether.
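The rule above for GPR-accessing insns can be sketched roughly as follows. This is a simplified model, not the actual gas code; select_w_bit and its parameters are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model: choose the W bit for a VEX-/EVEX-encoded insn
   accessing GPRs whose template specifies neither VexW0 nor VexWIG.
   MODE64 is whether we assemble for 64-bit mode, REX_W whether a
   64-bit operand size was requested, and WIG_OPTION the value given
   via -mvexwig= / -mevexwig= (0 or 1).  */
static unsigned int
select_w_bit (bool mode64, bool rex_w, unsigned int wig_option)
{
  if (mode64)
    return rex_w ? 1 : 0;	/* W is meaningful: follow REX.W.  */
  return wig_option & 1;	/* W is ignored: follow the option.  */
}
```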
A few test additions desirable here will only happen in later patches,
as the disassembler needs adjustments first.
Once again SSE2AVX templates are left alone, as it is unclear what
the behavior there should be.
Quite a few templates were marked LIG while really the insns aren't.
Introduce descriptive shorthands once again, instead of continuing to
use the less legible original forms.
The 0F C5 encoding is indeed a load type one (just that memory operands
are not permitted), while the 0F 3A 15 encoding is obviously a store.
Allow the pseudo prefixes to be used to select between them.
Also move (without any change) the secondary AVX512BW templates next to
the primary one.
Commits 6865c0435a ("x86: Support VEX/EVEX WIG encoding") and 6fa52824c3
("x86: Replace VexW=3 with VexWIG") omitted quite a few templates, oddly
enough in some cases despite testcases getting added (which then were
recorded with wrong expected output).
Also adjust VPMAXUB's attributes in the AVX512BW case to match the
attribute ordering of neighboring templates.
For the moment SSE2AVX templates are left alone, as it isn't clear
whether they were intentionally left untouched by the original commits
(the descriptions don't say either way).
In this context I question the decision in commit 0375113302 ("x86: Add
-mvexwig=[0|1] option to assembler") to move the logic to determine the
value of the W bit ahead of the decision whether to use 2-byte VEX:
While I can see this as one possible interpretation of -mvexwig=, the
other alternative (setting the value of the bit only if it actually
exists in the encoding) looks as reasonable to me, and perhaps even more
in line with us generally trying to pick the shortest encoding.
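To make the two interpretations concrete, here is a minimal sketch. It is not the gas implementation; the struct and function names are hypothetical. It relies on the 2-byte VEX form being usable only when VEX.X and VEX.B are not needed, the opcode is in the 0F map, and W is 0, because that form has no W bit.

```c
#include <stdbool.h>

/* Hypothetical summary of what the encoder knows about an insn.  */
struct vex_fields
{
  bool need_x;		/* VEX.X needed (index register extension).  */
  bool need_b;		/* VEX.B needed (base/rm register extension).  */
  bool map_0f;		/* Opcode lives in the 0F opcode map.  */
  unsigned int w;	/* Value wanted for VEX.W, if the bit exists.  */
};

static bool
two_byte_vex_ok (const struct vex_fields *f)
{
  return !f->need_x && !f->need_b && f->map_0f && f->w == 0;
}

/* Interpretation 1: apply -mvexwig= first, then pick the prefix
   length.  WIG_OPTION == 1 therefore forces the 3-byte form.  */
static unsigned int
vex_len_w_first (struct vex_fields f, bool wig, unsigned int wig_option)
{
  if (wig)
    f.w = wig_option & 1;
  return two_byte_vex_ok (&f) ? 2 : 3;
}

/* Interpretation 2: pick the shortest prefix first and apply the
   option only if the W bit actually exists in the encoding.  */
static unsigned int
vex_len_shortest_first (struct vex_fields f, bool wig,
			unsigned int wig_option, unsigned int *w_out)
{
  unsigned int wanted_w = f.w;

  if (wig)
    f.w = 0;			/* W is a don't-care for the length check.  */
  if (two_byte_vex_ok (&f))
    {
      *w_out = 0;		/* No W bit in the 2-byte form.  */
      return 2;
    }
  *w_out = wig ? (wig_option & 1) : wanted_w;
  return 3;
}
```

With -mvexwig=1 and a WIG insn that needs neither VEX.X nor VEX.B, the first function yields a 3-byte prefix while the second still yields the 2-byte one.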
AVX "VMOVQ xmm1, xmm2/m64" and "VMOVQ xmm1/m64, xmm2" can only be
encoded with VEX.128. Set Vex=1 on the VEX.128-only vmovq templates and
update the assembler tests.
gas/
PR gas/23665
* testsuite/gas/i386/avx-scalar-intel.d: Updated.
* testsuite/gas/i386/avx-scalar.d: Likewise.
* testsuite/gas/i386/x86-64-avx-scalar-intel.d: Likewise.
* testsuite/gas/i386/x86-64-avx-scalar.d: Likewise.
opcodes/
PR gas/23665
* i386-dis.c (vex_len_table): Update VEX_LEN_0F7E_P_1 and
VEX_LEN_0FD6_P_2 entries.
* i386-opc.tbl: Set Vex=1 on VEX.128 only vmovq.
* i386-tbl.h: Regenerated.
Add VEXWIG, defined as 3, to indicate that the VEX.W/EVEX.W bit is
ignored by such VEX/EVEX instructions, aka WIG instructions. Set
VexW=3 on VEX/EVEX WIG instructions. Update assembler to check
VEXWIG when setting the VEX.W bit.
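A minimal sketch of such a check follows. Only VEXWIG = 3 is stated above; the values used for VEXW0 and VEXW1, the function name, and the choice to let a WIG insn follow REX.W are assumptions made for illustration.

```c
#include <stdbool.h>

/* VexW template attribute values.  VEXWIG = 3 is from the description
   above; VEXW0 = 1 and VEXW1 = 2 are assumed here.  */
#define VEXW0  1
#define VEXW1  2
#define VEXWIG 3

/* Hypothetical stand-in for the W handling in build_vex_prefix ():
   return the VEX.W bit for a template with attribute VEXW, given
   whether REX.W was requested.  */
static unsigned int
vex_w_bit (unsigned int vexw, bool rex_w)
{
  if (vexw == VEXWIG)
    return rex_w ? 1 : 0;	/* W is ignored by the CPU (assumed policy).  */
  if (vexw != 0)
    return vexw == VEXW1 ? 1 : 0;	/* Template forces the bit.  */
  return rex_w ? 1 : 0;		/* No attribute: follow REX.W.  */
}
```

The point is that VexW=3 must not be mistaken for VexW1 by code that merely tests whether the attribute is non-zero.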
gas/
PR gas/23642
* config/tc-i386.c (build_vex_prefix): Check VEXWIG when setting
the VEX.W bit.
(build_evex_prefix): Check VEXWIG when setting the EVEX.W bit.
opcodes/
PR gas/23642
* i386-opc.h (VEXWIG): New.
* i386-opc.tbl: Set VexW=3 on VEX/EVEX WIG instructions.
* i386-tbl.h: Regenerated.
Just like other insns having byte and word forms, these can also make
use of the W modifier, which at the same time allows simplifying some
other code a little bit.
Various moves come in load and store forms, and just like on the GPR
and FPU sides there should ideally be only one pattern. In some cases this
is not feasible because the opcodes are too different, but quite a few
cases follow a similar standard scheme. Introduce Opcode_SIMD_FloatD and
Opcode_SIMD_IntD, generalize handling in operand_size_match() (reverse
operand handling there simply needs to match "straight" operand one),
and fix a long-standing, but so far only latent, bug in when to zap
found_reverse_match.
Also, once again, drop IgnoreSize where it is pointlessly applied to
templates being touched anyway, as well as *word where it is redundant
with Reg*.
There are separate CPUID feature bits for fxsave/fxrstor and cmovCC
instructions. This patch adds CpuCMOV and CpuFXSR to replace Cpu686
on corresponding instructions.
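For reference, the two feature bits in question are CPUID.01H:EDX bit 15 (CMOV) and bit 24 (FXSR). The sketch below merely decodes an EDX value that the caller has already obtained; the macro and function names are illustrative and not part of binutils.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in EDX as returned by CPUID leaf 1.  */
#define CPUID_1_EDX_CMOV (UINT32_C (1) << 15)
#define CPUID_1_EDX_FXSR (UINT32_C (1) << 24)

/* Return true if the CMOVcc instructions are available.  */
static bool
has_cmov (uint32_t cpuid_1_edx)
{
  return (cpuid_1_edx & CPUID_1_EDX_CMOV) != 0;
}

/* Return true if FXSAVE / FXRSTOR are available.  */
static bool
has_fxsr (uint32_t cpuid_1_edx)
{
  return (cpuid_1_edx & CPUID_1_EDX_FXSR) != 0;
}
```

Since the bits are independent, a CPU may report one without the other, which is why a single Cpu686 flag cannot model both.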
gas/
* config/tc-i386.c (cpu_arch): Add .cmov and .fxsr.
(cpu_noarch): Add nocmov and nofxsr.
* doc/c-i386.texi: Document cmov and fxsr.
opcodes/
* i386-gen.c (cpu_flag_init): Add CpuCMOV and CpuFXSR to
CPU_I686_FLAGS. Add CPU_CMOV_FLAGS, CPU_FXSR_FLAGS,
CPU_ANY_CMOV_FLAGS and CPU_ANY_FXSR_FLAGS.
(cpu_flags): Add CpuCMOV and CpuFXSR.
* i386-opc.tbl: Replace Cpu686 with CpuFXSR on fxsave, fxsave64,
fxrstor and fxrstor64. Replace Cpu686 with CpuCMOV on cmovCC.
* i386-init.h: Regenerated.
* i386-tbl.h: Likewise.
There's no insn allowing ZEROING_MASKING alone. Re-purpose its value for
handling the not uncommon case of insns allowing either form of masking
with register operands, but only merging masking with a memory operand.
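The resulting set of masking kinds can be modelled as below. This is a sketch only; the enumerator and function names are hypothetical and do not reflect the identifiers used in i386-opc.h.

```c
#include <stdbool.h>

/* Hypothetical masking kinds of a template.  The third value is the
   re-purposed one: either form of masking with register operands,
   but only merging masking with a memory operand.  */
enum masking_kind
{
  MASK_NONE = 0,
  MASK_MERGING_ONLY = 1,
  MASK_BOTH_REG_MERGING_MEM = 2,
  MASK_BOTH = 3
};

/* Return true if the requested masking is acceptable.  ZEROING is
   true for {z}, false for merging masking; HAS_MEM is whether the
   insn is used with a memory operand.  */
static bool
masking_allowed (enum masking_kind kind, bool zeroing, bool has_mem)
{
  switch (kind)
    {
    case MASK_NONE:
      return false;
    case MASK_MERGING_ONLY:
      return !zeroing;
    case MASK_BOTH_REG_MERGING_MEM:
      return !zeroing || !has_mem;
    case MASK_BOTH:
      return true;
    }
  return false;
}
```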
Just like for their AVX counterparts and CVTSI2S{D,S}, a memory source
here is ambiguous and hence
- in source files should be qualified with a suitable suffix or operand
size specifier (not doing so is an error in Intel mode, and will gain
a diagnostic in AT&T mode in the future),
- in disassembly should be properly suffixed (the Intel operand size
specifiers were emitted correctly already).