Commit Graph

842 Commits

Author SHA1 Message Date
H.J. Lu a7e12755d5 x86: Mark cvtpi2ps and cvtpi2pd as MMX
* config/tc-i386.c (output_insn): Mark cvtpi2ps and cvtpi2pd
	with GNU_PROPERTY_X86_FEATURE_2_MMX.
	* testsuite/gas/i386/i386.exp: Run property-3 and
	x86-64-property-3.
	* testsuite/gas/i386/property-3.d: New file.
	* testsuite/gas/i386/property-3.s: Likewise.
	* testsuite/gas/i386/x86-64-property-3.d: Likewise.
2020-02-19 04:54:45 -08:00
H.J. Lu 272a84b120 x86: Remove CpuABM and add CpuPOPCNT
AMD ABM has 2 instructions: popcnt and lzcnt.  The ABM CPUID feature bit has
been reused for lzcnt, and a POPCNT CPUID feature bit has been added for
popcnt, which used to be part of SSE4.2.  This patch removes CpuABM and adds
CpuPOPCNT.  It changes ABM to enable both lzcnt and popcnt, and changes SSE4.2
to also enable popcnt.

gas/

	* config/tc-i386.c (cpu_arch): Add .popcnt.
	* doc/c-i386.texi: Remove abm and .abm.  Add popcnt and .popcnt.
	Add a tab before @samp{.sse4a}.

opcodes/

	* i386-gen.c (cpu_flag_init): Replace CpuABM with
	CpuLZCNT|CpuPOPCNT.  Add CpuPOPCNT to CPU_SSE4_2_FLAGS.  Add
	CPU_POPCNT_FLAGS.
	(cpu_flags): Remove CpuABM.  Add CpuPOPCNT.
	* i386-opc.h (CpuABM): Removed.
	(CpuPOPCNT): New.
	(i386_cpu_flags): Remove cpuabm.  Add cpupopcnt.
	* i386-opc.tbl: Replace CpuABM|CpuSSE4_2 with CpuPOPCNT on
	popcnt.  Remove CpuABM from lzcnt.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Likewise.
2020-02-17 07:31:28 -08:00
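
The flag relationships described in the CpuABM/CpuPOPCNT commit above can be modelled roughly as follows. This is only a sketch: the dictionary and function names are hypothetical and do not come from i386-gen.c, and the SSE4.2 entry is reduced to the two flags relevant here. Only the relationships themselves (ABM enables lzcnt and popcnt, SSE4.2 also enables popcnt) are taken from the commit message.

```python
# Hypothetical model of the CPU flag sets described in the commit message.
# Each extension name maps to the CPU flags it turns on (simplified).
EXTENSION_FLAGS = {
    "abm": {"CpuLZCNT", "CpuPOPCNT"},      # ABM now stands for both flags
    "popcnt": {"CpuPOPCNT"},               # new standalone extension
    "sse4.2": {"CpuSSE4_2", "CpuPOPCNT"},  # SSE4.2 also enables popcnt
}

# CPU flag each instruction requires after the change.
INSN_REQUIRES = {"popcnt": "CpuPOPCNT", "lzcnt": "CpuLZCNT"}


def is_enabled(insn, extensions):
    """Return True if any of the given extensions enables the instruction."""
    enabled = set()
    for ext in extensions:
        enabled |= EXTENSION_FLAGS[ext]
    return INSN_REQUIRES[insn] in enabled
```

With this model, popcnt is accepted under either "popcnt", "abm" or "sse4.2", while lzcnt is accepted under "abm" but not under "sse4.2".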
Jan Beulich c8f8eebc3f x86: fold AddrPrefixOpReg templates
There's no need to have separate Cpu64 and CpuNo64 templates: There
already is special logic handling the attribute, and all that's needed
is rejecting 16-bit address registers in 64-bit mode. Suppress suffix
guessing and group all involved logic together, outside of suffix
processing (arguably it doesn't even belong in process_suffix()).

Also, since no AddrPrefixOpReg template permits any suffixes, move the
No_*Suf specifiers for them to a central place. Along with this drop
the no longer relevant NoRex64 from there.
2020-02-17 08:59:07 +01:00
Jan Beulich eedb0f2cfd x86/Intel: don't swap operands of MONITOR{,X} and MWAIT{,X}
Generally, the documentation doesn't allow for any explicit operands
to be specified with MONITOR/MWAIT. To permit the more legible
overriding of the address size via specifying operands, the option is
being retained even in Intel mode, but operand swapping is being
suppressed by this patch. This is both because it makes no sense here
(all of the operands are inputs) and because, as a result, old gcc
(prior to 4.8) actually expects it this way with -mintel-syntax (and
hence gets fixed by this change rather than, as claimed by a reply in
the bug report, broken).
2020-02-17 08:57:54 +01:00
Jan Beulich b9915cbc7d x86/Intel: improve diagnostics for ambiguous VCVT* operands
Conversions which shrink element size and which have a memory source
can't be disambiguated between their 128- and 256-bit variants by
looking at the register operand. "operand size mismatch", however, is a
pretty misleading diagnostic. Generalize the logic introduced for
VFPCLASSP{S,D} such that, with suitable similar adjustments to the
respective templates, it'll cover these cases too.

For VCVTNEPS2BF16 also fold the two previously separate AVX512VL
templates to achieve the intended effect. This is then also accompanied
by a respective addition to the inval-avx512f testcase.
2020-02-17 08:56:18 +01:00
H.J. Lu af5c13b01e x86: Don't disable SSE4a when disabling SSE4
commit 7deea9aad8 changed nosse4 to include CpuSSE4a.  But AMD SSE4a is
a superset of SSE3 and Intel SSE4 is a superset of SSSE3.  Disabling Intel
SSE4 shouldn't disable AMD SSE4a.  This patch restores nosse4.  It also
adds .sse4a and nosse4a.

gas/

	* config/tc-i386.c (cpu_arch): Add .sse4a and nosse4a.  Restore
	nosse4.
	* doc/c-i386.texi: Document sse4a and nosse4a.

opcodes/

	* i386-gen.c (cpu_flag_init): Add CPU_ANY_SSE4A_FLAGS.  Remove
	CPU_ANY_SSE4_FLAGS.
2020-02-16 08:45:34 -08:00
Jan Beulich 65fca0597f x86: replace adhoc (partly wrong) ambiguous operand checking for MOVSX/MOVZX
For these to get treatment consistent with other operand size checking
the special logic shouldn't live in md_assemble(), but process_suffix().
And there's more logic involved than simply zapping the suffix.

Note however that MOVS[BW]* and MOVZ[BW]* still won't be fully
consistent, due to the objection to fold MOVS* templates just like was
done for MOVZ* in c07315e0c6 ("x86: allow suffix-less movzw and 64-bit
movzb").

Note further that it is against my own intentions to have MOVSX/MOVZX
silently default to a byte source in AT&T mode. This should happen only
when the destination register is a 16-bit one. In all other cases there
is an ambiguity, and the user should be warned. But it was explicitly
requested for this to be done in a way inconsistent with everything
else.

Note finally that the assembler change points out (and this patch fixes)
a wrong Intel syntax test introduced by bc31405ebb ("x86-64: Properly
encode and decode movsxd"): When source code specifies a 16-bit
destination register, disassembly expectations shouldn't have been to
find a 32-bit one.
2020-02-14 14:27:28 +01:00
Jan Beulich b677388436 x86: adjust segment override prefix emission
Since we already suppress the prefix altogether when it's the default
one for the chosen addressing mode, let's do so also when instruction
prefix and override specified with the memory operand match. (Note that
insn prefix specified segment overrides never get discarded.)
2020-02-14 14:04:23 +01:00
Jan Beulich 92334ad2c6 x86: optimize away pointless segment overrides
When optimizing there's no point keeping the segment overrides when we
warn about their presence in the first place.
2020-02-14 14:03:19 +01:00
Jan Beulich 514a8bb031 x86: extend LEA's segment override warning
For one, both possible forms should be warned about. And then, to guard
against future surprises, qualify the original opcode check by excluding
VEX/EVEX-like templates.
2020-02-14 14:02:05 +01:00
H.J. Lu 292676c15a x86: Resolve PLT32 reloc against local symbol to section
Since PLT entry isn't needed for branch to local symbol, we can resolve
R_386_PLT32/R_X86_64_PLT32 relocation against local symbol to section,
similar to R_386_PC32/R_X86_64_PC32.

2020-02-13  Fangrui Song   <maskray@google.com>
	    H.J. Lu  <hongjiu.lu@intel.com>

	PR gas/25551
	* config/tc-i386.c (tc_i386_fix_adjustable): Don't check
	BFD_RELOC_386_PLT32 nor BFD_RELOC_X86_64_PLT32.
	* testsuite/gas/i386/i386.exp: Run relax-5 and x86-64-relax-4.
	* testsuite/gas/i386/relax-5.d: New file.
	* testsuite/gas/i386/relax-5.s: Likewise.
	* testsuite/gas/i386/x86-64-relax-4.d: Likewise.
	* testsuite/gas/i386/x86-64-relax-4.s: Likewise.
2020-02-13 13:44:29 -08:00
Jan Beulich 7deea9aad8 x86: fix SSE4a dependencies of ".arch .nosse*"
Since ".arch .sse4a" enables SSE3 and earlier, disabling SSE3 should
also disable SSE4a. And as per its name, ".arch .nosse4" should also do
so.
2020-02-13 10:19:28 +01:00
Jan Beulich 6c0946d0d2 x86: correct VFPCLASSP{S,D} operand size handling
With AVX512VL disabled (e.g. when writing code for the Knights family
of processors) these insns aren't ambiguous when used with a memory
source, and hence should be accepted without suffix or operand size
specifier. When AVX512VL is enabled, to be consistent with this as
well as other ambiguous operand size handling it would seem better to
just warn about the ambiguity in AT&T mode, and still default to 512-bit
operands (on the assumption that the code may have been written without
AVX512VL in mind yet), but it was requested to leave AT&T syntax mode
alone here.
2020-02-12 16:20:56 +01:00
Jan Beulich 5990e377e5 x86-64: Intel64 adjustments for insns dealing with far pointers
AMD and Intel differ in their handling of far indirect branches as well
as LFS/LGS/LSS: AMD CPUs ignore REX.W while Intel ones honor it. (Note
how the latter three were hybrids so far, while far branches were fully
AMD-like.)
2020-02-12 16:19:03 +01:00
Jan Beulich 9706160abd x86: also disallow non-byte/-word registers with byte/word suffix
Along the lines of be4c5e58bd ("x86: Always disallow double word suffix
with word general register") also adjust check_{byte,word}_reg(), to make
overall behavior consistent again in this regard.
2020-02-12 10:59:32 +01:00
Jan Beulich 5de4d9ef71 x86/Intel: improve diagnostics
The diagnostics issued by check_*_reg() are pretty AT&T-centric. Re-use
logic already used for SIMD memory operand size checking also for ones
where GPRs would alternatively also be allowed. (There's certainly room
for further improvement here.)
2020-02-12 10:58:42 +01:00
Jan Beulich 50128d0cab x86: drop ShortForm attribute
It is very simple to derive from other template properties, and hence
there's little point wasting storage for it.
2020-02-11 11:20:55 +01:00
H.J. Lu 4b5aaf5f69 x86: Accept Intel64 only instruction by default
Commit d835a58baa disabled sysenter/sysexit in 64-bit mode by
default.  By default, the assembler should accept common, Intel64 only
and AMD64 ISAs since there are no conflicts.

gas/

	PR gas/25516
	* config/tc-i386.c (intel64): Renamed to ...
	(isa64): This.
	(match_template): Accept Intel64 only instruction by default.
	(i386_displacement): Updated.
	(md_parse_option): Updated.
	* c-i386.texi: Update -mamd64/-mintel64 documentation.
	* testsuite/gas/i386/i386.exp: Run x86-64-sysenter.  Pass
	-mamd64 to x86-64-sysenter-amd.
	* testsuite/gas/i386/x86-64-sysenter.d: New file.

opcodes/

	PR gas/25516
	* i386-gen.c (opcode_modifiers): Replace AMD64 and Intel64
	with ISA64.
	* i386-opc.h (AMD64): Removed.
	(Intel64): Likewise.
	(AMD64): New.
	(INTEL64): Likewise.
	(INTEL64ONLY): Likewise.
	(i386_opcode_modifier): Replace amd64 and intel64 with isa64.
	* i386-opc.tbl (Amd64): New.
	(Intel64): Likewise.
	(Intel64Only): Likewise.
	Replace AMD64 with Amd64.  Update sysenter/sysexit with
	Cpu64 and Intel64Only.  Remove AMD64 from sysenter/sysexit.
	* i386-tbl.h: Regenerated.
2020-02-10 08:37:36 -08:00
Jan Beulich 2ae4c7035c x86: prevent undue use of GOT32X and alike relocations
Comparison of i.tm.base_opcode against particular but not sufficiently
specific values needs to be accompanied by other qualification. Exclude
VEX and alike encodings here, and also exclude all forms of prefixes
explicitly specified in the opcodes table. While using @GOT with such
insns may not be very useful, it also isn't with e.g. ADC and SBB, yet
these get explicitly listed in comments as supported.
2020-01-30 17:03:22 +01:00
Jan Beulich 873494c89f x86-64: also diagnose far returns / IRET with ambiguous operand size
Other than near returns these default to 32-bit operand size, and hence
it isn't really unlikely that 64-bit forms are meant. Hence these should
have disambiguating suffixes. In Intel mode, however, don't error in
these cases unconditionally - MASM accepts these without suffix _and_
without warning.
2020-01-30 11:35:20 +01:00
Jan Beulich 62b3f54810 x86: drop further pointless/bogus DefaultSize
- 64-bit CALL permitting just a single operand size doesn't need it.
- FLDENV et al should never have had it.

It remains suspicious that a number of 64-bit only insns continue to
have the attribute, despite this being intended for .code16gcc handling
only.
2020-01-30 11:33:53 +01:00
H.J. Lu bc31405ebb x86-64: Properly encode and decode movsxd
movsxd is a 64-bit only instruction.  It supports both 16-bit and 32-bit
destination registers.  Its AT&T mnemonic is movslq which only supports
64-bit destination register.  There is also a discrepancy between AMD64
and Intel64 on movsxd with 16-bit destination register.  AMD64 supports
32-bit source operand and Intel64 supports 16-bit source operand.

This patch updates movsxd encoding and decoding to allow 16-bit and 32-bit
destination registers.  It also handles movsxd with 16-bit destination
register for AMD64 and Intel 64.

gas/

	PR binutils/25445
	* config/tc-i386.c (check_long_reg): Also convert to QWORD for
	movsxd.
	* doc/c-i386.texi: Add a node for AMD64 vs. Intel64 ISA
	differences.  Document movslq and movsxd.
	* testsuite/gas/i386/i386.exp: Run PR binutils/25445 tests.
	* testsuite/gas/i386/x86-64-movsxd-intel.d: New file.
	* testsuite/gas/i386/x86-64-movsxd-intel64-intel.d: Likewise.
	* testsuite/gas/i386/x86-64-movsxd-intel64-inval.l: Likewise.
	* testsuite/gas/i386/x86-64-movsxd-intel64-inval.s: Likewise.
	* testsuite/gas/i386/x86-64-movsxd-intel64.d: Likewise.
	* testsuite/gas/i386/x86-64-movsxd-intel64.s: Likewise.
	* testsuite/gas/i386/x86-64-movsxd-inval.l: Likewise.
	* testsuite/gas/i386/x86-64-movsxd-inval.s: Likewise.
	* testsuite/gas/i386/x86-64-movsxd.d: Likewise.
	* testsuite/gas/i386/x86-64-movsxd.s: Likewise.

opcodes/

	PR binutils/25445
	* i386-dis.c (MOVSXD_Fixup): New function.
	(movsxd_mode): New enum.
	(x86_64_table): Use MOVSXD_Fixup and movsxd_mode on movsxd.
	(intel_operand_size): Handle movsxd_mode.
	(OP_E_register): Likewise.
	(OP_G): Likewise.
	* i386-opc.tbl: Remove Rex64 and allow 32-bit destination
	register on movsxd.  Add movsxd with 16-bit destination register
	for AMD64 and Intel64 ISAs.
	* i386-tbl.h: Regenerated.
2020-01-27 04:38:29 -08:00
H.J. Lu be4c5e58bd x86: Always disallow double word suffix with word general register
In 64-bit mode, a double word suffix in the mnemonic combined with a word
general register is disallowed.  In other modes, the assembler only gives a
warning:

$ cat /tmp/x.s
	movl	%ax, %bx
	movl	%ds, %ax
	movl	%ax, %cs
$ gcc -c /tmp/x.s
/tmp/x.s: Assembler messages:
/tmp/x.s:1: Error: incorrect register `%bx' used with `l' suffix
/tmp/x.s:2: Error: incorrect register `%ax' used with `l' suffix
/tmp/x.s:3: Error: incorrect register `%ax' used with `l' suffix
$ gcc -c /tmp/x.s -m32
/tmp/x.s: Assembler messages:
/tmp/x.s:1: Warning: using `%ebx' instead of `%bx' due to `l' suffix
/tmp/x.s:1: Warning: using `%eax' instead of `%ax' due to `l' suffix
/tmp/x.s:2: Warning: using `%eax' instead of `%ax' due to `l' suffix
/tmp/x.s:3: Warning: using `%eax' instead of `%ax' due to `l' suffix

This patch makes it a hard error in all modes.  Now we get:

$ gcc -c /tmp/x.s -m32
/tmp/x.s: Assembler messages:
/tmp/x.s:1: Error: incorrect register `%bx' used with `l' suffix
/tmp/x.s:2: Error: incorrect register `%ax' used with `l' suffix
/tmp/x.s:3: Error: incorrect register `%ax' used with `l' suffix

	PR gas/25438
	* config/tc-i386.c (check_long_reg): Always disallow double word
	suffix in mnemonic with word general register.
	* testsuite/gas/i386/general.s: Replace word general register
	with double word general register for movl.
	* testsuite/gas/i386/inval.s: Add tests for movl with word general
	register.
	* testsuite/gas/i386/general.l: Updated.
	* testsuite/gas/i386/inval.l: Likewise.
2020-01-22 09:24:14 -08:00
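
The behavior change in the commit above can be summarized in a small model. This is a sketch only: the function below is a hypothetical Python stand-in for the C logic in check_long_reg(), it covers only word general registers used with an `l' suffix, and the `patched` parameter is an illustration device for comparing old and new behavior. The message texts are copied from the transcripts in the commit message.

```python
def check_long_reg(reg_name, reg_bits, mode64, patched=True):
    """Model the diagnostic for a general register used with an `l' suffix.

    reg_name is the register without the % sign (e.g. "ax"), reg_bits its width.
    Returns a (severity, message) tuple; severity is "ok", "warning" or "error".
    """
    if reg_bits != 16:
        return ("ok", "")
    if patched or mode64:
        # After the patch this is a hard error in all modes.
        return ("error",
                f"incorrect register `%{reg_name}' used with `l' suffix")
    # Old behavior outside 64-bit mode: warn and use the 32-bit register.
    return ("warning",
            f"using `%e{reg_name}' instead of `%{reg_name}' due to `l' suffix")
```

For example, check_long_reg("bx", 16, mode64=False, patched=False) yields the old warning, while the same call with patched=True yields the error now reported with -m32.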
Jan Beulich 1a0351246a x86: replace adhoc ambiguous operand checking for CRC32
There's no need (anymore?) to heavily special case this - just make
generic logic consider only its first operand, and deal with the case
of an 'l' suffix not being allowed in a pattern.
2020-01-21 08:30:05 +01:00
Jan Beulich c006a730e9 x86: improve handling of insns with ambiguous operand sizes
Commit b76bc5d54e ("x86: don't default variable shift count insns to
8-bit operand size") pointed out a very bad case, but the underlying
problem is, as mentioned on various occasions, much larger: Silently
selecting a (nowhere documented afaict) certain default operand size
when there's no "sizing" suffix and no suitable register operand(s) is
simply dangerous (for the programmer to make mistakes).

While in Intel syntax mode such mistakes already lead to an error (which
is going to remain that way), AT&T syntax mode now gains warnings in
such cases by default, which can be suppressed or promoted to an error
if so desired by the programmer. Furthermore at least general purpose
insns now consistently have a default applied (alongside the warning
emission), rather than accepting some and refusing others.

No warnings are (as before) to be generated for "DefaultSize" insns as
well as ones acting on selector and other fixed-width values. For
SYSRET, however, the DefaultSize needs to be dropped - it had been
wrongly put there in the first place, as it's unrelated to .code16gcc
(no stack accesses involved).

As set forth as a prereq when I first mentioned this intended change a
few years back, Linux as well as gcc have meanwhile been patched to
avoid (emission of) ambiguous operands (and hence triggering of the new
warning).

Note that I think that in 64-bit mode IRET and far RET would better get
a diagnostic too, as it's reasonably likely that a suffix-less instance
really is meant to be a 64-bit one. But I guess I better make this a
separate follow-on patch.

Note further that floating point operations with integer operands are an
exception for now: They continue to use short (16-bit) operands by
default even in 32- and 64-bit modes.

Finally note that while {,V}PCMPESTR{I,M} would, strictly speaking, also
need to be diagnosed, with their 64-bit forms not being very useful I
think it is better to continue to avoid warning about them (by way of
them carrying IgnoreSize attributes).
2020-01-21 08:28:25 +01:00
H.J. Lu 14470f0755 x86-64: Fix TLSDESC relaxation for x32
For x32, we must encode "lea x@TLSDESC(%rip), %reg" with a REX prefix
even if it isn't required.  Otherwise linker can’t safely perform
GDesc -> IE/LE optimization.  X32 TLSDESC sequences can be:

40 8d 05 00 00 00 00	rex lea	x@TLSDESC(%rip), %reg
...
67 ff 10		call	*x@TLSCALL(%eax)

or the same sequence as LP64:

48 8d 05 00 00 00 00	lea	foo@TLSDESC(%rip), %reg
...
ff 10			call	*foo@TLSCALL(%rax)

We need to support both sequences for x32.  For both GDesc -> IE/LE
transitions,

67 ff 10		call	*x@TLSCALL(%eax)

should be relaxed to

0f 1f 00		nopl	(%rax)

For GDesc -> LE transition,

40 8d 05 00 00 00 00	rex lea	x@TLSDESC(%rip), %reg

should be relaxed to

40 c7 c0 fc ff ff ff	rex movl $x@tpoff, %reg

For GDesc -> IE transition,

40 8d 05 00 00 00 00	rex lea	x@TLSDESC(%rip), %reg

should be relaxed to

40 8b 05 00 00 00 00	rex movl x@gottpoff(%rip), %eax

bfd/

	PR ld/25416
	* elf64-x86-64.c (elf_x86_64_check_tls_transition): Support
	"rex leal x@tlsdesc(%rip), %reg" and "call *x@tlsdesc(%eax)" in
	X32 mode.
	(elf_x86_64_relocate_section): In x32 mode, for GDesc -> LE
	transition, relax "rex leal x@tlsdesc(%rip), %reg" to
	"rex movl $x@tpoff, %reg", for GDesc -> IE transition, relax
	"rex leal x@tlsdesc(%rip), %reg" to
	"rex movl x@gottpoff(%rip), %eax".  For both transitions, relax
	"call *(%eax)" to "nopl (%rax)".

gas/

	PR ld/25416
	* config/tc-i386.c (output_insn): Add a dummy REX_OPCODE prefix
	for lea with R_X86_64_GOTPC32_TLSDESC relocation when generating
	x32 object.
	* testsuite/gas/i386/ilp32/x32-tls.d: Updated.
	* testsuite/gas/i386/ilp32/x32-tls.s: Add tests for lea with
	R_X86_64_GOTPC32_TLSDESC relocation.

ld/

	PR ld/25416
	* testsuite/ld-x86-64/pr25416-1.s: New file
	* testsuite/ld-x86-64/pr25416-1a.d: Likewise.
	* testsuite/ld-x86-64/pr25416-1b.d: Likewise.
	* testsuite/ld-x86-64/pr25416-1.s: Likewise.
	* testsuite/ld-x86-64/pr25416-2.s: Likewise.
	* testsuite/ld-x86-64/pr25416-2a.d: Likewise.
	* testsuite/ld-x86-64/pr25416-2b.d: Likewise.
	* testsuite/ld-x86-64/pr25416-3.d: Likewise.
	* testsuite/ld-x86-64/pr25416-3.s: Likewise.
	* testsuite/ld-x86-64/pr25416-4.d: Likewise.
	* testsuite/ld-x86-64/pr25416-4.s: Likewise.
	* testsuite/ld-x86-64/pr25416-5a.c: Likewise.
	* testsuite/ld-x86-64/pr25416-5b.s: Likewise.
	* testsuite/ld-x86-64/pr25416-5c.s: Likewise.
	* testsuite/ld-x86-64/pr25416-5d.s: Likewise.
	* testsuite/ld-x86-64/pr25416-5e.s: Likewise.
	* testsuite/ld-x86-64/x86-64.exp: Run PR ld/25416 tests.
2020-01-20 07:01:07 -08:00
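
The byte-level rewrites listed in the x32 TLSDESC commit above can be illustrated with a short sketch. It is not the linker's implementation: the function names are hypothetical, only the plain 0x40 REX form shown in the commit message is handled, and relocation processing is left out (the IE displacement is simply carried over, and the thread pointer offset is passed in as a number).

```python
import struct

CALL_TLSCALL = bytes([0x67, 0xFF, 0x10])  # call *x@TLSCALL(%eax)
NOPL_RAX = bytes([0x0F, 0x1F, 0x00])      # nopl (%rax)


def _check_rex_lea(insn):
    """Validate a 7-byte `rex lea x@TLSDESC(%rip), %reg' and return ModRM.reg."""
    if len(insn) != 7 or insn[0] != 0x40 or insn[1] != 0x8D:
        raise ValueError("not a rex lea")
    if insn[2] & 0xC7 != 0x05:  # mod=00, rm=101 means RIP-relative
        raise ValueError("not RIP-relative")
    return (insn[2] >> 3) & 7


def relax_lea_to_le(insn, tpoff):
    """GDesc -> LE: rex lea x@TLSDESC(%rip), %reg -> rex movl $x@tpoff, %reg."""
    reg = _check_rex_lea(insn)
    # Opcode c7 /0 with a register-direct ModRM byte (mod=11, rm=reg),
    # followed by the 32-bit little-endian immediate.
    return bytes([insn[0], 0xC7, 0xC0 | reg]) + struct.pack("<i", tpoff)


def relax_lea_to_ie(insn):
    """GDesc -> IE: only the opcode changes from lea (8d) to mov (8b)."""
    _check_rex_lea(insn)
    return bytes([insn[0], 0x8B]) + insn[2:]


def relax_call(insn):
    """Both transitions: call *x@TLSCALL(%eax) -> nopl (%rax)."""
    if insn != CALL_TLSCALL:
        raise ValueError("not the expected call sequence")
    return NOPL_RAX
```

Passing tpoff=-4 to relax_lea_to_le() reproduces the fc ff ff ff immediate bytes shown in the example sequence.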
H.J. Lu 42e04b3601 x86: Add {vex} pseudo prefix
There are two forms of VEX prefix: 2-byte and 3-byte.  The 2-byte VEX prefix
can't encode all operands.  By default, the assembler tries the 2-byte VEX
prefix first.  {vex3} can be used to force the 3-byte VEX prefix.  This patch
adds the {vex} pseudo prefix and keeps {vex2} for backward compatibility.

gas/

	* config/tc-i386.c (_i386_insn): Replace vex_encoding_vex2
	with vex_encoding_vex.
	(parse_insn): Likewise.
	* doc/c-i386.texi: Replace {vex2} with {vex}.  Update {vex}
	and {vex3} documentation.
	* testsuite/gas/i386/pseudos.s: Replace 3 {vex2} tests with
	{vex}.
	* testsuite/gas/i386/x86-64-pseudos.s: Likewise.

opcodes/

	* i386-opc.tbl: Add {vex} pseudo prefix.
	* i386-tbl.h: Regenerated.
2020-01-17 07:07:55 -08:00
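
To illustrate why the 2-byte VEX prefix mentioned in the commit above can't encode everything, here is a minimal encoder sketch. The function and parameter names are hypothetical and this is not the assembler's code; the bit layout follows the architectural VEX format, in which the 2-byte form has no room for the X, B, W and map-select fields.

```python
def vex_prefix(r, x, b, w, mmmmm, vvvv, l, pp, force_vex3=False):
    """Encode a VEX prefix from logical (non-inverted) field values.

    r, x, b: register extension bits; w: VEX.W; mmmmm: opcode map (1 = 0F);
    vvvv: extra source register number; l: vector length bit; pp: SIMD prefix.
    """
    inv_vvvv = (~vvvv) & 0xF  # vvvv is stored inverted
    # The 2-byte form implies X=0, B=0, W=0 and the 0F opcode map.
    two_byte_ok = x == 0 and b == 0 and w == 0 and mmmmm == 1
    if two_byte_ok and not force_vex3:
        return bytes([0xC5, ((r ^ 1) << 7) | (inv_vvvv << 3) | (l << 2) | pp])
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | mmmmm
    byte2 = (w << 7) | (inv_vvvv << 3) | (l << 2) | pp
    return bytes([0xC4, byte1, byte2])
```

For a register-only 0F-map instruction with %xmm1 as the extra source, the default call produces c5 f0, and force_vex3=True (the analogue of {vex3}) produces c4 e1 70.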
Jan Beulich 45a4bb2010 x86: drop found_cpu_match local variable
50aecf8c5f could have done so right away; perhaps the variable shouldn't
have been introduced in the first place.
2020-01-16 10:07:36 +01:00
Jan Beulich 72aea32839 x86: refine when to trigger optimizations
Checking just the base opcode without also checking this isn't a VEX
encoding, and without there being other insn properties avoiding a match
once respective VEX/XOP/EVEX-encoded insns would appear, is at least
dangerous. Add respective checks. At the same time there's no real need
to check the extension opcode to be None for the 0xA8 form - there's
nothing it can be confused with, and non-VEX-and-alike forms also can't
appear.
2020-01-09 11:40:04 +01:00
Jan Beulich 3f93af6141 x86-64: assert sane internal state for REX conversions
For the comments about "hi" registers to be really applicable, RegRex
may not be set on the respective registers. Assert this is the case.
2020-01-09 11:39:33 +01:00
Jan Beulich 7697afb662 x86: consistently convert to byte registers for TEST w/ imm optimization
Commit ac0ab1842d ("i386: Also check R12-R15 registers when optimizing
testq to testb") didn't go quite far enough: In order to avoid confusing
other code registers would better be converted to byte ones uniformly.
2020-01-09 11:38:59 +01:00
Alan Modra b3adc24a07 Update year range in copyright notice of binutils files
2020-01-01 18:42:54 +10:30
Jan Beulich f2810fe00a x86: adjust ignored prefix warning for branches
There's no reason to not also issue them in Intel syntax mode, and it
can be quite helpful to mention the actual insn (after all there can be
multiple on a single line).
2019-12-27 09:39:58 +01:00
Jan Beulich 6cb0a70ef3 x86-64: correct / adjust prefix emission
First and foremost REX must come last. Next JumpInterSegment branches
can't possibly have a REX prefix, as they're consistently CpuNo64. And
finally make BND prefix handling in output_branch() consistent with that
of other prefixes in the same function, and make its placement among
prefixes consistent with output_jump() (which, oddly enough, still isn't
the supposedly canonical order specified by the *_PREFIX definitions).
2019-12-27 09:39:17 +01:00
Jan Beulich 376cd05610 x86-64: fix Intel64 handling of branch with data16 prefix
The expectation of x86-64-branch-3 for "call" / "jmp" with an obvious
direct destination to translate to an indirect _far_ branch is plain
wrong. The operand size prefix should have no effect at all on the
interpretation of the operand. The main underlying issue here is that
the Intel64 templates of the direct branches don't include Disp16, yet
various assumptions exist that it would always be there when there's
also Disp32/Disp32S, toggled by the operand size prefix (which is
being ignored by direct branches in Intel64 mode).

Along these lines it was also wrong to base the displacement width
decision solely on the operand size prefix: REX.W cancels this effect
and hence needs taking into consideration, too.

A disassembler change is needed here as well: XBEGIN was wrongly treated
the same as direct CALL/JMP, which isn't the case - the operand size
prefix does affect displacement size there, it's merely ignored when it
comes to updating [ER]IP.
2019-12-27 09:38:34 +01:00
Jan Beulich 48bcea9f48 x86: consolidate Disp<NN> handling a little
In memory operand addressing, which forms of displacement are permitted
besides Disp8 is pretty clearly limited
- outside of 64-bit mode, Disp16 or Disp32 only, depending on address
  size (MPX being special in not allowing Disp16),
- in 64-bit mode, Disp32S or Disp64 without address size override, and
  solely Disp32 with one.
Adjust assembler and i386-gen to match this, observing that templates
already get adjusted before trying to match them against input depending
on the presence of an address size prefix.

This adjustment logic gets extended to all cases, as certain DispNN
values should also be dropped when there's no such prefix. In fact
behavior of the assembler, perhaps besides the exact diagnostics wording,
should not differ between there being templates applicable to 64-bit and
non-64-bit at the same time, or there being fully separate sets of
templates, with their DispNN settings already reduced accordingly.

This adjustment logic further gets guarded such that there wouldn't be
any Disp<N> conversion based on address size prefix when this prefix
doesn't control the width of the displacement (on branches other than
absolute ones).

These adjustments then also allow folding two MOV templates, which had
been split between 64-bit and non-64-bits variants so far.

Once in this area also
- drop the bogus DispNN from JumpByte templates, leaving just the
  correct Disp8 there (compensated by i386_finalize_displacement()
  now setting Disp8 on their operands),
- add the missing Disp32S to XBEGIN.

Note that the changes make it necessary to temporarily mark a test as
XFAIL; this will get taken care of by a subsequent patch. The failing
parts are entirely bogus and will get replaced.
2019-12-27 09:22:03 +01:00
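
The displacement rules for memory operands listed in the Disp<NN> commit above can be written down as a small lookup. This is a sketch with a hypothetical function name, not code from the assembler or i386-gen; it ignores the MPX special case and branch displacements.

```python
def allowed_displacements(code_bits, addr_size_prefix):
    """Return the displacement kinds permitted for a memory operand.

    code_bits is the current mode (16, 32 or 64); addr_size_prefix says
    whether an address size override prefix is present.
    """
    allowed = {"Disp8"}  # Disp8 is always permitted
    if code_bits == 64:
        if addr_size_prefix:
            allowed.add("Disp32")
        else:
            allowed |= {"Disp32S", "Disp64"}
    else:
        addr_bits = code_bits
        if addr_size_prefix:
            addr_bits = 48 - code_bits  # the prefix toggles 16 <-> 32
        allowed.add("Disp16" if addr_bits == 16 else "Disp32")
    return allowed
```

For instance, 32-bit code with an address size prefix ends up with Disp8 and Disp16 only.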
H.J. Lu ac0ab1842d i386: Also check R12-R15 registers when optimizing testq to testb
Similar to SP, BP, SI and DI registers, R12-R15 registers must use REX
prefix for the low byte register when optimizing

test $imm7, %r64/%r32/%r16 -> test $imm7, %r8

	PR gas/25274
	* config/tc-i386.c (optimize_encoding): Also check R12-R15
	registers for "test $imm7, %r64/%r32/%r16 -> test $imm7, %r8"
	optimization.
	* testsuite/gas/i386/x86-64-optimize-3.s: Add tests for test
	with r12.
	* testsuite/gas/i386/x86-64-optimize-3.d: Updated.
	* testsuite/gas/i386/x86-64-optimize-3b.d: Likewise.
2019-12-12 12:31:26 -08:00
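
The condition behind the "test $imm7, %r64/%r32/%r16 -> test $imm7, %r8" optimization in the commit above can be sketched as below. The function name and the flat 0..15 register numbering (0-3 = AX/CX/DX/BX, 4-7 = SP/BP/SI/DI, 8-15 = R8-R15) are assumptions made for illustration; the actual check lives in optimize_encoding() and is structured differently.

```python
def narrow_test_to_byte(imm, regnum, mode64):
    """Decide whether `test $imm, %reg' can use the byte register form.

    Returns (ok, needs_rex). The immediate must fit in 7 bits so that the
    result does not depend on the bits dropped by narrowing the operand.
    """
    if not 0 <= imm <= 0x7F:
        return (False, False)
    if not mode64:
        # Outside 64-bit mode only AL/CL/DL/BL exist as low byte registers.
        return (regnum < 4, False)
    # In 64-bit mode SPL/BPL/SIL/DIL and R8B-R15B all require a REX prefix.
    return (True, regnum >= 4)
```

So a test against %r12 with a small immediate can still be narrowed in 64-bit mode, but the encoding has to keep a REX prefix.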
H.J. Lu 76cf450b4c i386: Add -mbranches-within-32B-boundaries
Add -mbranches-within-32B-boundaries to enable

-malign-branch-boundary=32
-malign-branch=jcc+fused+jmp
-malign-branch-prefix-size=5

	* config/tc-i386.c (OPTION_MBRANCHES_WITH_32B_BOUNDARIES): New.
	(md_longopts): Add -mbranches-within-32B-boundaries.
	(md_parse_option): Handle -mbranches-within-32B-boundaries.
	(md_show_usage): Add -mbranches-within-32B-boundaries.
2019-12-12 12:03:45 -08:00
H.J. Lu e379e5f385 i386: Align branches within a fixed boundary
Add 3 command-line options to align branches within a fixed boundary
with segment prefixes or NOPs:

1. -malign-branch-boundary=NUM aligns branches within NUM byte boundary.
2. -malign-branch=TYPE[+TYPE...] specifies types of branches to align.
The supported branches are:
  a. Conditional jump.
  b. Fused conditional jump.
  c. Unconditional jump.
  d. Call.
  e. Ret.
  f. Indirect jump and call.
3. -malign-branch-prefix-size=NUM aligns branches with NUM segment
prefixes per instruction.

3 new rs_machine_dependent frag types are added:

1. BRANCH_PADDING.  The variable size frag to insert NOP before branch.
2. BRANCH_PREFIX.  The variable size frag to insert segment prefixes to
an instruction.  The choices of prefixes are:
   a. Use the existing segment prefix if there is one.
   b. Use CS segment prefix in 64-bit mode.
   c. In 32-bit mode, use SS segment prefix with ESP/EBP base register
   and use DS segment prefix without ESP/EBP base register.
3. FUSED_JCC_PADDING.  The variable size frag to insert NOP before fused
conditional jump.

The new rs_machine_dependent frags aren't inserted if the previous item
is a prefix or a constant directive, which may be used to hardcode an
instruction, since there is no clear instruction boundary.  Segment
prefixes and NOP padding are disabled before relaxable TLS relocations
and tls_get_addr calls to keep TLS instruction sequence unchanged.

md_estimate_size_before_relax() and i386_generic_table_relax_frag() are
used to handle BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING frags.
i386_generic_table_relax_frag() grows or shrinks sizes of segment prefix
and NOP to align the next branch frag:

1. First try to add segment prefixes to instructions before a branch.
2. If there is not sufficient room to add segment prefixes, NOP will be
inserted before a branch.

	* config/tc-i386.c (_i386_insn): Add has_gotpc_tls_reloc.
	(tls_get_addr): New.
	(last_insn): New.
	(align_branch_power): New.
	(align_branch_kind): New.
	(align_branch_bit): New.
	(align_branch): New.
	(MAX_FUSED_JCC_PADDING_SIZE): New.
	(align_branch_prefix_size): New.
	(BRANCH_PADDING): New.
	(BRANCH_PREFIX): New.
	(FUSED_JCC_PADDING): New.
	(i386_generate_nops): Support BRANCH_PADDING and FUSED_JCC_PADDING.
	(md_begin): Abort if align_branch_prefix_size <
	MAX_FUSED_JCC_PADDING_SIZE.
	(md_assemble): Set last_insn.
	(maybe_fused_with_jcc_p): New.
	(add_fused_jcc_padding_frag_p): New.
	(add_branch_prefix_frag_p): New.
	(add_branch_padding_frag_p): New.
	(output_insn): Generate a BRANCH_PADDING, FUSED_JCC_PADDING or
	BRANCH_PREFIX frag and terminate each frag to align branches.
	(output_disp): Set i.has_gotpc_tls_reloc to TRUE for GOTPC and
	relaxable TLS relocations.
	(output_imm): Likewise.
	(i386_next_non_empty_frag): New.
	(i386_next_jcc_frag): New.
	(i386_classify_machine_dependent_frag): New.
	(i386_branch_padding_size): New.
	(i386_generic_table_relax_frag): New.
	(md_estimate_size_before_relax): Handle COND_JUMP_PADDING,
	FUSED_JCC_PADDING and COND_JUMP_PREFIX frags.
	(md_convert_frag): Handle BRANCH_PADDING, BRANCH_PREFIX and
	FUSED_JCC_PADDING frags.
	(OPTION_MALIGN_BRANCH_BOUNDARY): New.
	(OPTION_MALIGN_BRANCH_PREFIX_SIZE): New.
	(OPTION_MALIGN_BRANCH): New.
	(md_longopts): Add -malign-branch-boundary=,
	-malign-branch-prefix-size= and -malign-branch=.
	(md_parse_option): Handle -malign-branch-boundary=,
	-malign-branch-prefix-size= and -malign-branch=.
	(md_show_usage): Display -malign-branch-boundary=,
	-malign-branch-prefix-size= and -malign-branch=.
	(i386_target_format): Set tls_get_addr.
	(i386_cons_align): New.
	* config/tc-i386.h (i386_cons_align): New.
	(md_cons_align): New.
	(i386_generic_table_relax_frag): New.
	(md_generic_table_relax_frag): New.
	(i386_tc_frag_data): Add u, padding_address, length,
	max_prefix_length, prefix_length, default_prefix, cmp_size,
	classified and branch_type.
	(TC_FRAG_INIT): Initialize u, padding_address, length,
	max_prefix_length, prefix_length, default_prefix, cmp_size,
	classified and branch_type.
	* doc/c-i386.texi: Document -malign-branch-boundary=,
	-malign-branch= and -malign-branch-prefix-size=.
2019-12-12 12:03:45 -08:00
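
The amount of padding needed by the branch alignment commit above can be estimated with a few lines of arithmetic. This is a sketch under an assumption not spelled out in the commit message: a branch is taken to need padding when it would cross or end on a boundary. The function name is hypothetical and is not claimed to match i386_branch_padding_size().

```python
def branch_padding_size(address, size, align_power=5):
    """Return how many bytes to insert before a branch at `address'.

    align_power=5 corresponds to -malign-branch-boundary=32.
    Assumption: a branch must neither cross nor end on the boundary.
    """
    align = 1 << align_power
    if not 0 < size < align:
        raise ValueError("branch size must be smaller than the boundary")
    offset = address & (align - 1)
    if offset + size >= align:
        # Push the branch to the start of the next aligned block.
        return align - offset
    return 0
```

Whether the returned byte count is then filled with segment prefixes on preceding instructions or with NOPs is the choice described in steps 1 and 2 of the commit message.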
Jan Beulich 569d50f1c6 x86: further refine SSE check (SSE4a, SHA, GFNI)
In  ("x86: extend SSE check to PCLMULQDQ, AES, and GFNI insns") I went
both a little too far and not quite far enough:
- GFNI insns also have AVX512 variants, which also shouldn't get
  diagnosed,
- SSE4a insns should get diagnosed just like SSE4.x ones,
- SHA insns should get diagnosed just like PCLMULQDQ or AES ones.
2019-12-11 09:42:29 +01:00
Jan Beulich 319ff62c8a x86: consolidate tracking of MMX register use
Just like for XMM/YMM/ZMM don't key this to any Cpu* flags. Instead
include the two special insns (not having register operands) explicitly.
2019-12-04 10:43:50 +01:00
Jan Beulich 13e600d0f5 x86: make sure all PUSH/POP honor DefaultSize
While segment registers are registers, their use doesn't allow sizing
of insns without suffix / explicit operand size specifier. Prevent
PUSH and POP of segment registers from entering that path, instead
allowing them to observe the stackop_size setting just like other
PUSH/POP and alike do.
2019-12-04 10:40:40 +01:00
Jan Beulich 3036c89919 x86: drop some stray/bogus DefaultSize
Insns permitting only GPR operands (and hence implicit sizing when
there's no suffix) don't ever have their DefaultSize attribute
inspected, so it shouldn't be there in the first place.

Additionally XBEGIN is like JMP, not CALL, and hence shouldn't be
converted to 32-bit operand size in .code16gcc mode. While the same is
true for SYSRET, it permitting more than one suffix makes it FLDENV-
like, and hence rather than dropping the attribute, for now add it to
the exclusion list to avoid it getting an operand size prefix emitted
in .code16gcc mode. (This will be dealt with later, perhaps together
with FLDENV and friends.)
2019-12-04 10:40:02 +01:00
Jan Beulich 0cfa3eb352 x86: fold individual Jump* attributes into a single Jump one
..., taking just 3 bits instead of 5. No two of them are used together.
2019-11-14 08:47:44 +01:00
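
The "3 bits instead of 5" saving in the Jump* commit above follows from the attributes being mutually exclusive: one enumerated field only has to distinguish each kind plus "none". A tiny sketch of that arithmetic, with a hypothetical helper name:

```python
def bits_for_exclusive_flags(n_flags):
    """Bits needed for one field that encodes n_flags exclusive kinds plus 'none'.

    The field must hold the values 0..n_flags, so the answer is the bit
    length of the largest value.
    """
    return n_flags.bit_length()
```

For five exclusive attributes this gives 3, compared with 5 bits when each attribute has its own flag.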
Jan Beulich 6f2f06bea8 x86: make JumpAbsolute an insn attribute
... instead of an operand one: There's only ever one operand here
anyway.
2019-11-14 08:47:03 +01:00
Jan Beulich 601e856422 x86: make AnySize an insn attribute
... instead of an operand one. Which operand it applies to can be
determined from other operand properties, but as it turns out the only
place it is actually used at doesn't even need further qualification.
2019-11-14 08:46:19 +01:00
Jan Beulich 51c8edf68b x86: fold EsSeg into IsString
EsSeg (a per-operand bit) is used with IsString (a per-insn attribute)
only. Extend the attribute to 2 bits, thus allowing to encode
- not a string insn,
- string insn with neither operand requiring use of %es:,
- string insn with 1st operand requiring use of %es:,
- string insn with 2nd operand requiring use of %es:,
which covers all possible cases, allowing to drop EsSeg.

The (transient) need to comment out the OTUnused #define did uncover an
oversight in the earlier OTMax -> OTNum conversion, which is being taken
care of here.
2019-11-12 09:09:31 +01:00
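
The four cases enumerated in the EsSeg/IsString commit above fit exactly into a 2-bit field. The following sketch shows one possible encoding; the member names and numeric values are hypothetical and are not taken from i386-opc.h.

```python
from enum import IntEnum


class IsString(IntEnum):
    """Hypothetical 2-bit encoding of the widened IsString attribute."""
    NOT_STRING = 0      # not a string insn
    STRING = 1          # string insn, neither operand requires %es:
    STRING_ES_OP0 = 2   # string insn, 1st operand requires %es:
    STRING_ES_OP1 = 3   # string insn, 2nd operand requires %es:


def es_operand(attr):
    """Return the 0-based index of the operand that must use %es:, or None."""
    if attr == IsString.STRING_ES_OP0:
        return 0
    if attr == IsString.STRING_ES_OP1:
        return 1
    return None
```

Because every value is at most 3, the attribute needs no more than two bits and the separate per-operand EsSeg bit becomes redundant.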
Jan Beulich 474da251bf x86: eliminate ImmExt abuse
Drop the remaining instances left in place by commit c3949f432f ("x86:
limit ImmExt abuse"), now that we have a way to specify specific GPRs.

Take the opportunity and also introduce proper 16-bit forms of
applicable SVME insns as well as 1-operand forms of CLZERO.
2019-11-12 09:08:32 +01:00
Jan Beulich 75e5731b8f x86: introduce operand type "instance"
Special register "class" instances can't be combined with one another
(neither in templates nor in register entries), and hence it is not a
good use of resources (memory as well as execution time) to represent
them as individual bits of a bit field.

Furthermore the generalization becoming possible will allow
improvements to the handling of insns accepting only individual
registers as their operands.
2019-11-12 09:07:34 +01:00
H.J. Lu dc2be329b9 i386: Only check suffix in instruction mnemonic
We should check suffix in instruction mnemonic when matching instruction.
In Intel syntax, normally we check for memory operand size.  But the same
mnemonic with 2 different encodings can have the same memory operand
size and i.suffix is set to LONG_DOUBLE_MNEM_SUFFIX from memory operand
size in Intel syntax to distinguish them.  When there is no suffix in
mnemonic, we check LONG_DOUBLE_MNEM_SUFFIX in i.suffix for mnemonic
suffix.

gas/

	PR gas/25167
	* config/tc-i386.c (match_template): Don't check instruction
	suffix set from operand.
	* testsuite/gas/i386/code16.d: New file.
	* testsuite/gas/i386/code16.s: Likewise.
	* testsuite/gas/i386/i386.exp: Run code16.
	* testsuite/gas/i386/x86-64-branch-4.l: Updated.

opcodes/

	PR gas/25167
	* i386-opc.tbl: Remove IgnoreSize from cmpsd and movsd.
	* i386-tbl.h: Regenerated.
2019-11-08 09:31:17 -08:00