Since only the first 32 bits of input operand are used for tpause and
umwait, the REX.W bit is skipped. Both 32-bit registers and 64-bit
registers are allowed.
gas/
* testsuite/gas/i386/x86-64-waitpkg.s: Add 32-bit registers
tests for tpause and umwait.
* testsuite/gas/i386/x86-64-waitpkg-intel.d: Updated.
* testsuite/gas/i386/x86-64-waitpkg.d: Likewise.
opcodes/
* i386-dis.c (prefix_table): Replace Em with Edq on tpause and
umwait.
* i386-opc.tbl: Allow 32-bit registers for tpause and umwait in
64-bit mode.
* i386-tbl.h: Regenerated.
config/plugins.m4 has
if test "$plugins" = "yes"; then
AC_SEARCH_LIBS([dlopen], [dl])
fi
Plugin uses dlsym, but libasan.so only intercepts dlopen, not dlsym:
[hjl@gnu-tools-1 binutils-text]$ nm -D /lib64/libasan.so.4| grep " dl"
0000000000038580 W dlclose
U dl_iterate_phdr
000000000004dc50 W dlopen
U dlsym
U dlvsym
[hjl@gnu-tools-1 binutils-text]$
Testing dlopen for libdl leads to false negative when -fsanitize=address
is used. It results in link failure:
../bfd/.libs/libbfd.a(plugin.o): undefined reference to symbol 'dlsym@@GLIBC_2.16'
dlsym should be used to check if libdl is needed for plugin.
bfd/
PR gas/22318
* configure: Regenerated.
binutils/
PR gas/22318
* configure: Regenerated.
gas/
PR gas/22318
* configure: Regenerated.
gprof/
PR gas/22318
* configure: Regenerated.
ld/
PR gas/22318
* configure: Regenerated.
"vex" has many fields to control how to decode an instruction. Clear
all fields in "vex" before decoding an instruction to avoid using values
left from the previous instruction.
gas/
PR binutils/23025
* testsuite/gas/i386/prefix.s: Add tests for vcvtpd2dq with
VEX and EVEX prefixes.
* testsuite/gas/i386/prefix.d: Updated.
opcodes/
PR binutils/23025
* i386-dis.c (get_valid_dis386): Don't set vex.prefix nor vex.w
to 0.
(print_insn): Clear vex instead of vex.evex.
This patch adds the following relocation support into binutils gas.
BFD_RELOC_AARCH64_TLSLE_LDST16_TPREL_LO12,
BFD_RELOC_AARCH64_TLSLE_LDST16_TPREL_LO12_NC,
BFD_RELOC_AARCH64_TLSLE_LDST32_TPREL_LO12,
BFD_RELOC_AARCH64_TLSLE_LDST32_TPREL_LO12_NC,
BFD_RELOC_AARCH64_TLSLE_LDST64_TPREL_LO12,
BFD_RELOC_AARCH64_TLSLE_LDST64_TPREL_LO12_NC,
BFD_RELOC_AARCH64_TLSLE_LDST8_TPREL_LO12,
BFD_RELOC_AARCH64_TLSLE_LDST8_TPREL_LO12_NC.
Those relocations includes both ip64 and ilp32 variant.
It again can be inferred from other information.
The vpopcntd templates all need to have Dword added to their memory
operands; the lack thereof was actually a bug preventing certain Intel
syntax code to assemble, so test cases get extended.
In the course of folding their patterns (possible now that the pointless
and partly even bogus VecESize are no longer in the way) I've noticed
that vcvt*2usi, other than their vcvt*2si counterparts, don't allow for
any suffixes. As that is supposedly intentional, make the disassembler
consistently omit suffixes for all to-scalar-int conversion insns.
Since they're shorter to encode, the 0xa0...0xa3 encodings are preferred
for moves between accumulator and absolute address outside of 64-bit
mode. With HLE release semantics this encoding is unsupported though,
with the assembler raising an error. The operation is valid though, we
merely need to pick the longer encoding in that case.
The wrong placement of the Load attribute in the templates prevented
this from working. The disassembler also didn't handle this consistently
with other similar dual-encoding insns.
While many templates allowing multiple suitably matching XMM/YMM/ZMM
operand sizes can be folded, a few need to be split in order to not
wrongly accept "xmmword ptr" operands when only XMM registers are
permitted (and memory operands are more narrow). Add a test case
validating this.
Since the addition of pseudo prefixes changed how the scrubber treats
'{', we need to explicitly strip whitespace in check_VecOperations ().
* config/tc-i386.c (check_VecOperations): Strip whitespace.
* testsuite/gas/i386/optimize-1.s: Add whitespaces before
{%k7} and {z},
* testsuite/gas/i386/x86-64-optimize-2.s: Likewise.
We can optimize AVX512 instructions with EVEX128 only if AVX512VL is
enabled:
1. Instruction is an AVX512VL instruction. Or
2. AVX512VL is enabled explicitly by -march=+avx512vl/".arch .avx512vl".
We should optimize EVEX instructions with EVEX128 encoding when pseudo
{evex} prefix is used.
* config/tc-i386.c (set_cpu_arch): Set cpu_arch_isa_flags.
(md_parse_option): Likewise.
(optimize_encoding): Check i.tm.cpu_flags and cpu_arch_isa_flags
for cpuavx512vl instead of cpu_arch_flags. Optimize EVEX with
EVEX128 when EVEX encoding is required.
* testsuite/gas/i386/i386.exp: Run optimize-4, optimize-5,
x86-64-optimize-5 and x86-64-optimize-6.
* testsuite/gas/i386/optimize-1.d: Updated.
* testsuite/gas/i386/x86-64-optimize-2.d: Likewise.
* testsuite/gas/i386/optimize-4.d: New file.
* testsuite/gas/i386/optimize-4.s: Likewise.
* testsuite/gas/i386/optimize-5.d: Likewise.
* testsuite/gas/i386/optimize-5.s: Likewise.
* testsuite/gas/i386/x86-64-optimize-5.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-5.s: Likewise.
* testsuite/gas/i386/x86-64-optimize-6.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-6.s: Likewise.
"clr reg" is an alias of "xor reg, reg". We can encode "clr reg64" as
"xor reg32, reg32".
gas/
* config/tc-i386.c (optimize_encoding): Also encode "clr reg64"
as "xor reg32, reg32".
* testsuite/gas/i386/x86-64-optimize-1.s: Add "clr reg64" tests.
* testsuite/gas/i386/x86-64-optimize-1.d: Updated.
opcodes/
* i386-opc.tbl: Add Optimize to clr.
* i386-tbl.h: Regenerated.
The differences between some of the register and memory forms of the
same insn often don't really require the templates to be separate. For
example, Disp8MemShift is simply irrelevant to register forms. Fold
these as far as possible, and also fold register-only forms. Further
folding is possible, but needs other prereq work done first.
A note regarding EVEXDYN: This is intended to be used only when no other
properties of the template would make is_evex_encoding() return true. In
all "normal" cases I think it is preferable to omit this indicator, to
keep the table half way readable.
Their memory forms were bogusly using VexLWP instead of VexNDD. Adjust
VexNDD handling to cope with these, allowing their register and memory
forms to be folded.
They aren't really useful (anymore?): The conflicting operand size check
isn't applicable to any insn validly using respective memory operand
sizes (and if they're used wrongly, another error would result), and the
logic in process_suffix() can be easily changed to work without them.
While re-structuring conditionals in process_suffix() also drop the
CMPXCHG8B special case in favor of a NoRex64 attribute in the opcode
table.
Some BMI/BMI2 insns allow their middle operands to be a memory one. In
such a case, matching register types between operands 0 and 1 as well as
1 and 2 won't help - operands 0 and 2 also need to be checked.
Make more obvious what the success and failure paths are, and in
particular that what used to be at the "skip" label can't be reached
by what used to be straight line code.
Just like for the AVX/AES and AVX/PCLMUL combinations, AVX/GFN,
AVX512F/GFNI, AVX512F/VAES, and AVX512F/PCLMUL need special handling to
deal with the pair of required checks specified in the templates.
fsub/fsubr/fsubp/fsubrp as well as fdiv/fdivr/fdivp/fdivrp disassembly
should match (a) the Intel SDM and (b) respective input fed to gas (both
of course with the exception of when we intentionally convert bogus
insns, accompanied by a warning).
Drop "second": For one there's no other source register (the other
source operand is in memory), and in Intel syntax such numbering would
also be wrong.
Take the opportunity and also
- properly place declarations ahead of statements
- use %u format for unsigned int arguments
- fix indentation
This requires a change to ModR/M handling: Recording of displacement
types must not discard operand size information. Change the respective
code to alter only .disp<N>.
Oops, not tested well enough. -mpower9 sets all the PPC_OPCODE_POWERn
for n <= 9.
* config/tc-ppc.c (ppc_handle_align): Correct last patch. Really
don't emit a group terminating nop for power9. Simplify cpu
tests.
Power9 doesn't have a group terminating nop, so we may as well emit a
normal nop for power9. Not that it matters a great deal, I believe
ori 2,2,0 will be treated exactly as ori 0,0,0 by the hardware.
* config/tc-ppc.c (ppc_handle_align): Don't emit a group
terminating nop for power9.