There is a recurring pattern in assembly files generated by a compiler
where a lot of jumps in a function are going to the same place. When
these jumps are relaxed with trampolines the assembler generates a
separate jump thread from each source.
Create an index of trampoline jump targets for each segment and check
whether a jump being relaxed goes to a location from that index. If it
does, replace its target with the location of the existing trampoline
jump that results in the shortest path to the original target.
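The selection step can be sketched roughly as follows. This is a simplified model, not the code in tc-xtensa.c: struct chain_entry, its hops field and best_chain_entry are hypothetical names, the path length is assumed to be a count of remaining jumps, and a linear scan stands in for the sorted index.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model: one existing trampoline jump that
   eventually reaches a given final target.  */
struct chain_entry
{
  long addr;   /* Address of the existing trampoline jump.  */
  int hops;    /* Jumps still needed from here to the final target.  */
};

/* Return the entry reachable from SOURCE (within RANGE bytes) that
   needs the fewest further hops, or NULL if none is reachable.  */
const struct chain_entry *
best_chain_entry (const struct chain_entry *chain, size_t n,
                  long source, long range)
{
  const struct chain_entry *best = NULL;
  size_t i;

  for (i = 0; i < n; i++)
    {
      /* Skip entries the jump at SOURCE cannot reach.  */
      if (labs (chain[i].addr - source) > range)
        continue;
      if (best == NULL || chain[i].hops < best->hops)
        best = &chain[i];
    }
  return best;
}
```

A NULL result corresponds to the case where no existing trampoline jump can be reused and a new one has to be created.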
gas/
2017-11-27 Max Filippov <jcmvbkbc@gmail.com>
* config/tc-xtensa.c (trampoline_chain_entry, trampoline_chain)
(trampoline_chain_index): New structures.
(trampoline_index): Add chain_index field.
(xg_order_trampoline_chain_entry, xg_sort_trampoline_chain)
(xg_find_chain_entry, xg_get_best_chain_entry)
(xg_order_trampoline_chain, xg_get_trampoline_chain)
(xg_find_best_eq_target, xg_add_location_to_chain)
(xg_create_trampoline_chain, xg_get_single_symbol_slot): New
functions.
(xg_relax_fixups): Call xg_find_best_eq_target to adjust jump
target to point to an existing jump. Call
xg_create_trampoline_chain to create new jump target. Call
xg_add_location_to_chain to add newly created trampoline jump
to the corresponding chain.
(add_jump_to_trampoline): Extract loop searching for a single
slot with a symbol into a separate function, replace that code
with a call to that function.
(relax_frag_immed): Call xg_find_best_eq_target to adjust jump
target to point to an existing jump.
* testsuite/gas/xtensa/all.exp: Add trampoline-2 test.
* testsuite/gas/xtensa/trampoline.d: Adjust absolute addresses
as many duplicate trampoline chains are now coalesced.
* testsuite/gas/xtensa/trampoline.s: Add _nop so that objdump
stays in sync with instruction stream.
* testsuite/gas/xtensa/trampoline-2.l: New test result file.
* testsuite/gas/xtensa/trampoline-2.s: New test source file.
There's an almost exact copy of the trampoline placement code in the
search_trampolines function that is used for jumps generated for relaxed
branch instructions. Get rid of the duplication and reuse the
xg_find_best_trampoline function for that.
gas/
2017-11-27 Max Filippov <jcmvbkbc@gmail.com>
* config/tc-xtensa.c (search_trampolines, get_best_trampoline):
Remove definitions.
(xg_find_best_trampoline_for_tinsn): New function.
(relax_frag_immed): Replace call to get_best_trampoline with a
call to xg_find_best_trampoline_for_tinsn.
* testsuite/gas/xtensa/trampoline.d: Adjust absolute addresses
as the placement of trampolines for relaxed branches has been
changed.
Replace linked list of trampoline frags with an ordered array, so that
trampolines can be indexed instead of fixups. Keep each array
in the trampoline_seg structure, so there's no need to rebuild it for
every new processed segment. Don't run relaxation for each trampoline
frag, instead run it for each fixup in the current segment that needs
relaxation at the beginning of each relaxation pass. This way the
complexity of this process drops from about O(n^2 * m) to about
O(log n * m), where n is the number of trampoline frags and m is the
number of fixups that need relaxation in the segment.
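The O(log n) factor comes from being able to binary-search an ordered array. A minimal sketch of such a lookup, assuming trampolines are represented only by their addresses (lower_bound_addr and nearest_addr are illustrative names, not functions from tc-xtensa.c):

```c
#include <stddef.h>

/* Return the index of the first element of the sorted array ADDRS (N
   elements) that is not less than ADDR, or N if there is none.  */
size_t
lower_bound_addr (const long *addrs, size_t n, long addr)
{
  size_t lo = 0, hi = n;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (addrs[mid] < addr)
        lo = mid + 1;
      else
        hi = mid;
    }
  return lo;
}

/* Return the index of the element closest to ADDR.  N must be nonzero.
   Ties are resolved in favour of the lower address.  */
size_t
nearest_addr (const long *addrs, size_t n, long addr)
{
  size_t i = lower_bound_addr (addrs, n, addr);

  if (i == n)
    return n - 1;
  if (i > 0 && addr - addrs[i - 1] <= addrs[i] - addr)
    return i - 1;
  return i;
}
```

Each fixup then costs one such search instead of a walk over the whole list of trampoline frags.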
gas/
2017-11-27 Max Filippov <jcmvbkbc@gmail.com>
* config/tc-xtensa.c (trampoline_index): New structure.
(trampoline_seg): Replace trampoline list with trampoline index.
(xg_find_trampoline, xg_add_trampoline_to_index)
(xg_remove_trampoline_from_index, xg_add_trampoline_to_seg)
(xg_is_trampoline_frag_full, xg_get_fulcrum)
(xg_find_best_trampoline, xg_relax_fixup, xg_relax_fixups)
(xg_is_relaxable_fixup): New functions.
(J_MARGIN): New macro.
(xtensa_create_trampoline_frag): Use xg_add_trampoline_to_seg
instead of open-coded addition to the linked list.
(dump_trampolines): Iterate through the trampoline_seg::index.
(cached_fixupS, cached_fixup, fixup_cacheS, fixup_cache)
(fixup_order, xtensa_make_cached_fixup)
(xtensa_realloc_fixup_cache, xtensa_cache_relaxable_fixups)
(xtensa_find_first_cached_fixup, xtensa_delete_cached_fixup)
(xtensa_add_cached_fixup, check_and_update_trampolines): Remove
definitions.
(xg_relax_trampoline): Extract logic into separate functions,
replace body with a call to xg_relax_fixups.
(search_trampolines): Replace search in linked list with search
in index. Change data type of address-tracking variables from
int to offsetT. Replace abs with labs.
(xg_append_jump): Finish the trampoline frag if it's full.
(add_jump_to_trampoline): Remove trampoline frag from the index
if the frag is full.
* config/tc-xtensa.h (xtensa_frag_type): Remove next_trampoline.
* testsuite/gas/xtensa/trampoline.d: Adjust absolute addresses
as the placement of trampolines has slightly changed.
* testsuite/gas/xtensa/trampoline.s: Add _nop so that objdump
stays in sync with instruction stream.
The split between fragS and trampoline_frag doesn't save much space, but
makes trampoline management much more awkward. Merge trampoline_frag
data into the xtensa_frag_type, which is a part of fragS. No functional
changes.
gas/
2017-11-27 Max Filippov <jcmvbkbc@gmail.com>
* config/tc-xtensa.c (init_trampoline_frag): Replace pointer to
struct trampoline_frag parameter with pointer to fragS.
(xg_append_jump): Remove jump_around parameter.
(struct trampoline_frag): Remove.
(struct trampoline_seg): Change type of trampoline_list from
struct trampoline_frag to fragS.
(xtensa_create_trampoline_frag): Don't allocate struct
trampoline_frag. Initialize new fragS::tc_frag_data fields.
(dump_trampolines, xg_relax_trampoline, search_trampolines)
(get_best_trampoline, init_trampoline_frag)
(add_jump_to_trampoline, relax_frag_immed): Replace pointer to
struct trampoline_frag with a pointer to fragS.
(xg_append_jump): Remove jump_around parameter, use
fragS::tc_frag_data.jump_around_fix instead.
(xg_relax_trampoline, init_trampoline_frag)
(add_jump_to_trampoline): Don't pass jump_around parameter to
xg_append_jump.
* config/tc-xtensa.h (struct xtensa_frag_type): Add new fields:
needs_jump_around, next_trampoline and jump_around_fix.
xtensa_create_trampoline_frag has an open-coded fragment equivalent to
find_trampoline_seg. Drop the fragment and use find_trampoline_seg
instead. No functional changes.
gas/
2017-11-27 Max Filippov <jcmvbkbc@gmail.com>
* config/tc-xtensa.c (find_trampoline_seg): Move above the first
use.
(xtensa_create_trampoline_frag): Replace trampoline seg search
code with a call to find_trampoline_seg.
init_trampoline_frag, add_jump_to_trampoline and xg_relax_trampoline add
a jump to the end of a trampoline frag. Extract it into a separate
function and use it in all these places. No functional changes.
gas/
2017-11-27 Max Filippov <jcmvbkbc@gmail.com>
* config/tc-xtensa.c (xg_append_jump): New function.
(xg_relax_trampoline, init_trampoline_frag)
(add_jump_to_trampoline): Replace trampoline jump assembling
code with a call to xg_append_jump.
To make measurement and changes easier, extract the trampoline
relaxation code into a separate function. No functional changes.
gas/
2017-11-27 Max Filippov <jcmvbkbc@gmail.com>
* config/tc-xtensa.c (xg_relax_trampoline): New function.
(xtensa_relax_frag): Replace trampoline relaxation code with a
call to xg_relax_trampoline.
For one, the register type used for masking should be validated. Beyond
that, we shouldn't accept input producing encodings which will #UD when
executed, as is the case when EVEX.Z is set while EVEX.AAA is zero.
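For illustration, the invalid combination can be detected from the last EVEX payload byte, where z is bit 7 and aaa occupies bits 2:0. This is a standalone sketch; the macro and function names are made up here and are not those used by the assembler or disassembler.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVEX_P2_Z    0x80u  /* Bit 7: zeroing-masking.  */
#define EVEX_P2_AAA  0x07u  /* Bits 2:0: opmask register selector.  */

/* Return true if the last EVEX payload byte P2 requests zeroing-masking
   without selecting a mask register, i.e. an encoding that would #UD.  */
bool
evex_zeroing_without_mask (uint8_t p2)
{
  return (p2 & EVEX_P2_Z) != 0 && (p2 & EVEX_P2_AAA) == 0;
}
```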
Despite EVEX encodings not being available in real and VM86 modes,
16-bit addressing still needs to be handled properly for 16-bit
protected mode as well as 16-bit addressing in 32-bit mode. Neither
should displacements be dropped silently by the assembler, nor should
the disassembler fail to correctly scale 8-bit displacements.
Except for %eip-relative addressing, where we don't have a suitable
relocation type silently wrapping at the 4G boundary, consistently
force use of R_X86_64_32 (in ELF terms) instead of its sign-extending
counterpart. This wasn't right in case there was no base register in
the addressing expression.
Make the assembler recognize UD0, supporting only the newer form
expecting a ModR/M byte.
Make assembler and disassembler properly emit / expect a ModR/M byte for
UD1.
For the testsuite, as arch-4 already tests all UDn, avoid producing a
huge delta for other tests using UD2B by making them use UD2 instead.
Multiple errors are more confusing than helpful, as the more generic
one often implies a sufficiently different adjustment than would
actually be needed to fix the code. Additionally it makes it more
cumbersome to add missing error checks, as the testsuite then needs
extra updating.
* as.c: Include write.h.
(common_emul_init): Use FAKE_LABEL_NAME.
* ecoff.c (add_file, ecoff_directive_end, ecoff_directive_loc):
Likewise.
(ecoff_build_symbols): Use FAKE_LABEL_CHAR.
* expr.c (get_symbol_name): Use FAKE_LABEL_CHAR. Accept only if
input_from_string is TRUE.
* read.c (input_from_string): New.
(read_symbol_name): Use FAKE_LABEL_CHAR. Accept only if
input_from_string is TRUE.
(temp_ilp): Set input_from_string to TRUE.
(restore_ilp): Set input_from_string to FALSE.
* read.h (input_from_string): Declare.
* symbols.c: Include write.h.
(S_IS_LOCAL): Check for FAKE_LABEL_CHAR.
(symbol_relc_make_sym): Fix comment referring to default fake label
string.
* write.h (FAKE_LABEL_CHAR): New.
* config/tc-riscv.h (FAKE_LABEL_CHAR): Define.
* testsuite/gas/all/err-fakelabel.s: New.
Uses of reg_expected_msgs rely on each arm_reg_type enumerator to have a
message entry in the same order as the enumerator declaration. This is
not clearly stated in the definition of both the arm_reg_type enum and
the reg_expected_msgs. Worse, there is nothing to ensure both are kept
in sync.
As an attempt towards this, this patch uses C99 array designators to
ensure that each message is associated with the right arm_reg_type. A
comment is also added near the definition of arm_reg_type to point to
the reg_expected_msgs array. Finally, the array is synced with
arm_reg_type by adding the missing error message for REG_TYPE_RNB.
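A reduced example of the technique, with a made-up three-entry enum and made-up message strings rather than the real arm_reg_type and reg_expected_msgs contents:

```c
#include <stddef.h>

/* Hypothetical register classes; the real arm_reg_type has many more.  */
enum reg_type
{
  REG_RN,
  REG_CP,
  REG_VFS,
  REG_TYPE_COUNT
};

/* Each message is tied to its enumerator by index, so reordering the
   enum cannot silently mismatch the table.  */
static const char *const reg_msgs[REG_TYPE_COUNT] =
{
  [REG_VFS] = "VFP single precision register expected",
  [REG_RN]  = "ARM register expected",
  /* REG_CP has no entry here and is therefore NULL.  */
};

/* Return the message for TYPE, or a generic one if the table has no
   entry for it.  */
const char *
reg_expected_msg (enum reg_type type)
{
  return reg_msgs[type] != NULL ? reg_msgs[type] : "register expected";
}
```

Designators do not force every enumerator to have an entry, so a missing message still shows up only as a NULL slot. That is why the array had to be synced by hand for REG_TYPE_RNB.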
2017-11-22 Thomas Preud'homme <thomas.preudhomme@arm.com>
gas/
* config/tc-arm.c (arm_reg_type): Comment on the link with
reg_expected_msgs.
(reg_expected_msgs): Initialize using array designators with
arm_reg_type index.
The -n command-line option of the x86 assembler disables optimization of alignment
directives, like ".balign 8, 0x90", with multi-byte nop instructions
such as leal 0(%esi),%esi.
PR gas/22464
* testsuite/gas/i386/align-1.s: New file.
* testsuite/gas/i386/align-1a.d: Likewise.
* testsuite/gas/i386/align-1b.d: Likewise.
* testsuite/gas/i386/i386.exp: Run align-1a and align-1b.
This patch separates the new FP16 instructions backported from Armv8.4-a to Armv8.2-a
into a new flag in order to distinguish them from the rest of the already existing optional
FP16 instructions in Armv8.2-a.
The new flag "+fp16fml" is available from Armv8.2-a and implies +fp16 and is mandatory on
Armv8.4-a.
gas/
* config/tc-aarch64.c (fp16fml): New.
* doc/c-aarch64.texi (fp16fml): New.
* testsuite/gas/aarch64/armv8_2-a-crypto-fp16.d (fp16): Make fp16fml.
* testsuite/gas/aarch64/armv8_3-a-crypto-fp16.d (fp16): Make fp16fml.
include/
* opcode/aarch64.h: (AARCH64_FEATURE_F16_FML): New.
(AARCH64_ARCH_V8_4): Enable AARCH64_FEATURE_F16_FML by default.
opcodes/
* aarch64-tbl.h (aarch64_feature_fp_16_v8_2): Require AARCH64_FEATURE_F16_FML
and AARCH64_FEATURE_F16.
The crypto options depend on SIMD and FP. The documentation states so, but the dependency is not there in the code.
We have mostly gotten away with this due to the default flags
for the architectures (e.g. Armv8.2-a implies +simd) but this
discrepancy needs to be addressed.
gas/
2017-11-16 Tamar Christina <tamar.christina@arm.com>
* opcodes/aarch64-tbl.h
(aarch64_feature_crypto): Add AARCH64_FEATURE_SIMD and AARCH64_FEATURE_FP.
(aarch64_feature_crypto_v8_2, aarch64_feature_sm4): Likewise.
(aarch64_feature_sha3): Likewise.
While commits 9889cbb14e ("Check invalid mask registers") and
abfcb414b9 ("X86: Ignore REX_B bit for 32-bit XOP instructions") went a
bit in the right direction, this wasn't quite enough:
- VEX.vvvv has its high bit ignored
- EVEX.vvvv has its high bit ignored together with EVEX.v'
- the high bits of {,E}VEX.vvvv should not be prematurely zapped, to
allow proper checking of them when the fields have to hold all ones
- when the high bits of an immediate specify a register, bit 7 is
ignored
Since the .code64 directive isn't available for 32-bit BFD and ELF directives
aren't available for non-ELF targets, we should avoid them.
* testsuite/gas/i386/noextreg.s: Replace .code64/.code32 and
64-bit instructions with .byte. Remove ELF directive.
The new flag "+fp16fml" is available from Armv8.2-a and implies +fp16 and is mandatory
from Armv8.4-a.
gas/
* config/tc-arm.c (arm_ext_fp16_fml, fp16fml): New.
(do_neon_fmac_maybe_scalar_long): Use arm_ext_fp16_fml.
* doc/c-arm.texi (fp16, fp16fml): New.
* testsuite/gas/arm/armv8_2-a-fp16.d (fp16): Make fp16fml.
* testsuite/gas/arm/armv8_3-a-fp16.d (fp16): Make fp16fml.
* testsuite/gas/arm/armv8_2-a-fp16-illegal.d (fp16): Make fp16fml.
* testsuite/gas/arm/armv8_2-a-fp16-thumb2.d (fp16): Make fp16fml.
include/
* opcode/arm.h: (ARM_EXT2_FP16_FML): New.
(ARM_AEXT2_V8_4A): Add ARM_EXT2_FP16_FML.
Hi Guys,
I am applying the rather large patch attached to this email to enhance
the readelf and objdump programs so that they now have the ability to
follow links to separate debug info files. (As requested by PR
15152). So for example whereas before we had this output:
$ readelf -wi main.exe
Contents of the .debug_info section:
[...]
<15> DW_AT_comp_dir : (alt indirect string, offset: 0x30c)
[...]
With the new option enabled we get:
$ readelf -wiK main.exe
main.exe: Found separate debug info file: dwz.debug
Contents of the .debug_info section (loaded from main.exe):
[...]
<15> DW_AT_comp_dir : (alt indirect string, offset: 0x30c) /home/nickc/Downloads/dwzm
[...]
The link following feature also means that we can get two lots of
output if the same section exists in both the main file and the
separate debug info file:
$ readelf -wiK main.exe
main.exe: Found separate debug info file: dwz.debug
Contents of the .debug_info section (loaded from main.exe):
[...]
Contents of the .debug_info section (loaded from dwz.debug):
[...]
The patch also adds the ability to display the contents of debuglink
sections:
$ readelf -wk main.exe
Contents of the .gnu_debugaltlink section:
Separate debug info file: dwz.debug
Build-ID (0x14 bytes):
c4 a8 89 8d 64 cf 70 8a 35 68 21 f2 ed 24 45 3e 18 7a 7a 93
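As a rough illustration of what is being decoded here, the sketch below splits raw section contents into the file name and the build-id. It assumes the section holds a NUL-terminated file name immediately followed by the build-id bytes, and split_debugaltlink is a hypothetical helper, not the function added by the patch.

```c
#include <stddef.h>
#include <string.h>

/* Split the raw contents DATA (SIZE bytes) of a .gnu_debugaltlink
   section into the file name and the build-id that follows it.
   Return 0 on success, -1 if the name is empty or unterminated.  */
int
split_debugaltlink (const unsigned char *data, size_t size,
                    const char **name,
                    const unsigned char **build_id, size_t *build_id_len)
{
  const unsigned char *nul = memchr (data, '\0', size);

  if (nul == NULL || nul == data)
    return -1;
  *name = (const char *) data;
  *build_id = nul + 1;
  *build_id_len = size - (size_t) (nul + 1 - data);
  return 0;
}
```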
Naturally there are long versions of these options (=follow-links and
=links). The documentation has been updated as well, and since both
readelf and objdump use the same set of debug display options, I have
moved the text into a separate file. There are also a couple of new
binutils tests to exercise the new behaviour.
There are a couple of missing features in the current patch however,
although I do intend to address them in follow up submissions:
Firstly the code does not check the build-id inside separate debug
info files when it is searching for a file specified by a
.gnu_debugaltlink section. It just assumes that if the file is there,
then it contains the information being sought.
Secondly I have not checked the DWARF-5 version of these link
features, so there will probably be code to add there.
Thirdly I have only implemented link following for the
DW_FORM_GNU_strp_alt format. Other alternate formats (eg
DW_FORM_GNU_ref_alt) have yet to be implemented.
Lastly, whilst implementing this feature I found it necessary to move
some of the global variables used by readelf (eg section_headers) into
a structure that can be passed around. I have moved all of the global
variables that were necessary to get the patch working, but I need to
complete the operation and move the remaining, file-specific variables
(eg dynamic_strings).
Cheers
Nick
binutils PR 15152
* dwarf.h (enum dwarf_section_display_enum): Add gnu_debuglink,
gnu_debugaltlink and separate_debug_str.
(struct dwarf_section): Add filename field.
Add prototypes for load_separate_debug_file, close_debug_file and
open_debug_file.
* dwarf.c (do_debug_links): New.
(do_follow_links): New.
(separate_debug_file, separate_debug_filename): New.
(fetch_alt_indirect_string): New function. Retrieves a string
from the debug string table in the separate debug info file.
(read_and_display_attr_value): Use it with DW_FORM_GNU_strp_alt.
(load_debug_section_with_follow): New function. Like
load_debug_section, but if the first attempt fails, then tries
again in the separate debug info file.
(introduce): New function.
(process_debug_info): Use load_debug_section_with_follow and
introduce.
(load_debug_info): Likewise.
(display_debug_lines_raw): Likewise.
(display_debug_lines_decoded): Likewise.
(display_debug_macinfo): Likewise.
(display_debug_macro): Likewise.
(display_debug_abbrev): Likewise.
(display_debug_loc): Likewise.
(display_debug_str): Likewise.
(display_debug_aranges): Likewise.
(display_debug_addr): Likewise.
(display_debug_frames): Likewise.
(display_gdb_index): Likewise.
(process_cu_tu_index): Likewise.
(load_cu_tu_indexes): Likewise.
(display_debug_links): New function. Displays the contents of a
.gnu_debuglink or .gnu_debugaltlink section.
(calc_gnu_debuglink_crc32): New function. Calculates a CRC32
value.
(check_gnu_debuglink): New function. Checks the CRC of a
potential separate debug info file.
(parse_gnu_debuglink): New function. Reads a CRC value out of a
.gnu_debuglink section.
(check_gnu_debugaltlink): New function.
(parse_gnu_debugaltlink): New function. Reads the build-id value
out of a .gnu_debugaltlink section.
(load_separate_debug_info): New function. Finds and loads a
separate debug info file.
(load_separate_debug_file): New function. Attempts to find and
follow a link to a separate debug info file.
(free_debug_memory): Free the separate debug info file
information.
(opts_table): Add "follow-links" and "links".
(dwarf_select_sections_by_letters): Add "k" and "K".
(debug_displays): Reformat. Add .gnu_debuglink and
.gnu_debugaltlink.
Add an extra entry for .debug_str in a separate debug info file.
* doc/binutils.texi: Move description of debug dump features
common to both readelf and objdump into...
* objdump.c (usage): Add -Wk and -WK.
(load_specific_debug_section): Initialise the filename field in
the dwarf_section structure.
(close_debug_file): New function.
(open_debug_file): New function.
(dump_dwarf): Load and dump the separate debug info sections.
* readelf.c (struct filedata): New structure. Contains various
variables that used to be global:
(current_file_size, string_table, string_table_length, elf_header)
(section_headers, program_headers, dump_sects, num_dump_sects):
Move into filedata structure.
(cmdline): New global variable. Contains list of sections to dump
by number, as specified on the command line.
Add filedata parameter to most functions.
(load_debug_section): Load the string table if it has not already
been retrieved.
(close_file): New function.
(close_debug_file): New function.
(open_file): New function.
(open_debug_file): New function.
(process_object): Process sections in any separate debug info files.
* doc/debug.options.texi: New file. Add description of =links and
=follow-links options.
* NEWS: Mention the new feature.
* elfcomm.c: Have the byte get functions take a const pointer.
* elfcomm.h: Update prototypes.
* testsuite/binutils-all/dw5.W: Update expected output.
* testsuite/binutils-all/objdump.WL: Update expected output.
* testsuite/binutils-all/objdump.exp: Add test of -WK and -Wk.
* testsuite/binutils-all/readelf.exp: Add test of -wK and -wk.
* testsuite/binutils-all/readelf.k: New file.
* testsuite/binutils-all/objdump.Wk: New file.
* testsuite/binutils-all/objdump.WK2: New file.
* testsuite/binutils-all/linkdebug.s: New file.
* testsuite/binutils-all/debuglink.s: New file.
gas * testsuite/gas/avr/large-debug-line-table.d: Update expected
output.
* testsuite/gas/elf/dwarf2-11.d: Likewise.
* testsuite/gas/elf/dwarf2-12.d: Likewise.
* testsuite/gas/elf/dwarf2-13.d: Likewise.
* testsuite/gas/elf/dwarf2-14.d: Likewise.
* testsuite/gas/elf/dwarf2-15.d: Likewise.
* testsuite/gas/elf/dwarf2-16.d: Likewise.
* testsuite/gas/elf/dwarf2-17.d: Likewise.
* testsuite/gas/elf/dwarf2-18.d: Likewise.
* testsuite/gas/elf/dwarf2-5.d: Likewise.
* testsuite/gas/elf/dwarf2-6.d: Likewise.
* testsuite/gas/elf/dwarf2-7.d: Likewise.
ld * testsuite/ld-avr/gc-section-debugline.d: Update expected
output.
VEX.W may be legitimately set (and is then ignored by the CPU) for
non-64-bit code. Don't print 64-bit register names in such a case, by
utilizing that REX_W would never be set for non-64-bit code, and that
it is being set from VEX.W by generic decoding.
A test for this is going to be introduced in the next patch of this
series.
The low four bits of an immediate being set when the high bits specify a
fourth register operand is not a problem: CPUs ignore these bits rather
than raising #UD. Take care of incrementing codep in OP_EX_VexW()
instead.
Just like %cxl can't be used as shift count register. Otherwise for
consistency %cxl would need to gain "ShiftCount" and use of both ought
to properly cause REX prefixes to be emitted.
Commit dd90581873 ("Place .shstrtab section after .symtab and .strtab,
thus restoring monotonically incre... ") adjusted section numbers, but
forgot to adjust sh_link references from relocation and group section
table entries.
Additionally some other (perhaps subsequent) change appears to have
added .rel.* and .rela.* sections to their respective groups, which
requires some further adjustments to group-2.d. I assume this additional
breakage wasn't noticed because the test was already failing at that
time.
This makes the gas testsuite complete successfully again for me in a
cross build on ix86-linux; there continue to be quite a few ld failures.
... rather than silently dropping it altogether.
i386_finalize_displacement() expects baseindex to already be set, so
the respective statement needs to be moved up. This then also allows a
subsequent conditional to be simplified.
For this to not regress on 32-bit addressing, break out address size
guessing from i386_index_check(), invoking the new function earlier so
that i386_finalize_displacement() has i.prefix[ADDR_PREFIX] available.
i386_addressing_mode () in turn needs i.base_reg / i.index_reg set
earlier.
The new options are:
+aes: Enables the AES instructions of Armv8-a,
enabled by default with +crypto.
+sha2: Enables the SHA1 and SHA2 instructions of Armv8-a,
enabled by default with +crypto.
These options have been turned on by default when +crypto
is used, as such no breakage is expected.
The reason for the split is that with the introduction of Armv8.4-a
the implementation of AES has explicitly been made independent of the
implementation of the other crypto extensions. Backporting the split does
not break any of the previous requirements and so is safe to do.
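The relationship between the options can be pictured as a small table in which +crypto simply carries the bits of the two new options. The bit values and names below are hypothetical and do not match opcode/aarch64.h.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bit assignments, not the values in opcode/aarch64.h.  */
#define FEAT_CRYPTO  (1u << 0)
#define FEAT_AES     (1u << 1)
#define FEAT_SHA2    (1u << 2)

struct ext_option
{
  const char *name;
  uint32_t bits;  /* Features turned on by "+name".  */
};

static const struct ext_option ext_options[] =
{
  { "crypto", FEAT_CRYPTO | FEAT_AES | FEAT_SHA2 },
  { "aes",    FEAT_AES },
  { "sha2",   FEAT_SHA2 },
};

/* Return the feature bits enabled by extension NAME, or 0 if unknown.  */
uint32_t
ext_feature_bits (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof ext_options / sizeof ext_options[0]; i++)
    if (strcmp (ext_options[i].name, name) == 0)
      return ext_options[i].bits;
  return 0;
}
```

With such a table, existing users of +crypto keep getting both instruction groups, while +aes or +sha2 alone enables only one of them.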
gas * config/tc-aarch64.c
(aarch64_features): Include AES and SHA2 in CRYPTO.
Add SHA2 and AES.
include * opcode/aarch64.h:
(AARCH64_FEATURE_SHA2, AARCH64_FEATURE_AES): New.
opcodes * aarch64-tbl.h (aarch64_feature_crypto): Add AES and SHA2.
(aarch64_feature_sha2, aarch64_feature_aes): New.
(SHA2, AES): New.
(AES_INSN, SHA2_INSN): New.
(pmull, pmull2, aese, aesd, aesmc, aesimc): Change to AES_INSN.
(sha1h, sha1su1, sha256su0, sha1c, sha1p,
sha1m, sha1su0, sha256h, sha256h2, sha256su1):
Change to SHA2_INSN.
gas * config/tc-arm.c (arm_extensions):
(arm_archs): New entry for "armv8.4-a".
Add FPU_ARCH_DOTPROD_NEON_VFP_ARMV8.
(arm_ext_v8_2): New variable.
(enum arm_reg_type): New enumeration REG_TYPE_NSD.
(reg_expected_msgs): New entry for REG_TYPE_NSD.
(parse_typed_reg_or_scalar): Handle REG_TYPE_NSD.
(parse_scalar): Support REG_TYPE_VFS.
(enum operand_parse_code): New enumerations OP_RNSD and OP_RNSD_RNSC.
(parse_operands): Handle OP_RNSD and OP_RNSD_RNSC.
(NEON_SHAPE_DEF): New entries for DHH and DHS.
(neon_scalar_for_fmac_fp16_long): New function to generate Rm encoding
for new FP16 instructions in ARMv8.2-A.
(do_neon_fmac_maybe_scalar_long): New function to encode new FP16
instructions in ARMv8.2-A.
(do_neon_vfmal): Wrapper function for vfmal.
(do_neon_vfmsl): Wrapper function for vfmsl.
(insns): New entries for vfmal and vfmsl.
* doc/c-arm.texi (-march): Document "armv8.4-a".
* testsuite/gas/arm/dotprod-mandatory.d: New test.
* testsuite/gas/arm/armv8_2-a-fp16.s: New test source.
* testsuite/gas/arm/armv8_2-a-fp16-illegal.s: New test source.
* testsuite/gas/arm/armv8_2-a-fp16.d: New test.
* testsuite/gas/arm/armv8_3-a-fp16.d: New test.
* testsuite/gas/arm/armv8_4-a-fp16.d: New test.
* testsuite/gas/arm/armv8_2-a-fp16-thumb2.d: New test.
* testsuite/gas/arm/armv8_2-a-fp16-illegal.d: New test.
* testsuite/gas/arm/armv8_2-a-fp16-illegal.l: New error file.
opcodes * arm-dis.c (coprocessor_opcodes): New entries for ARMv8.2-A new
FP16 instructions, including vfmal.f16 and vfmsl.f16.
include * opcode/arm.h (ARM_AEXT2_V8_4A): Include Dot Product feature.
(ARM_EXT2_V8_4A): New macro.
(ARM_AEXT2_V8_4A): Likewise.
(ARM_ARCH_V8_4A): Likewise.
This fixes some EH failures for the medany code model in the g++ testsuite.
The problem is that the assembler is computing some values in the eh_frame
section as constants, that instead should have had relocs to be resolved by
the linker. This happens in output_cfi_insn in the DW_CFA_advance_loc case
where it compares label frags and immediately simplifies if they are the
same. We can fix that by forcing a new frag after every instruction
that the linker can reduce in size. I've also added a testcase to verify
the fix. This was tested with binutils make check, and gcc/g++ make checks on
qemu for medlow and medany code models.
gas/
* config/tc-riscv.c (append_insn): Call frag_wane and frag_new at
end for linker optimizable relocs.
* testsuite/gas/riscv/eh-relocs.d: New.
* testsuite/gas/riscv/eh-relocs.s: New.
* testsuite/gas/riscv/riscv.exp: Run eh-relocs test.
The RISC-V privileged ISA changed the name of sptbr (Supervisor Page
Table Base Register) to satp (Supervisor Address Translation and
Protection) to reflect the fact it could be used for more than just
paging. This patch adds an alias, as they're the same register.
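Conceptually an alias is just a second name mapping to the same CSR number (0x180 for satp). A minimal sketch with a hypothetical two-entry table and lookup function, unrelated to how md_begin actually registers CSR names:

```c
#include <stddef.h>
#include <string.h>

struct csr_name
{
  const char *name;
  unsigned int number;
};

/* Both names resolve to the same CSR number.  */
static const struct csr_name csr_names[] =
{
  { "satp",  0x180 },
  { "sptbr", 0x180 },  /* Old name, kept as an alias.  */
};

/* Return the CSR number for NAME, or -1 if NAME is not known.  */
int
csr_lookup (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof csr_names / sizeof csr_names[0]; i++)
    if (strcmp (csr_names[i].name, name) == 0)
      return (int) csr_names[i].number;
  return -1;
}
```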
include/ChangeLog
2017-11-06 Palmer Dabbelt <palmer@dabbelt.com>
* opcode/riscv-opc.h (sptbr): Rename to satp.
(CSR_SPTBR): Rename to CSR_SATP.
(sptbr): Alias to CSR_SATP.
gas/ChangeLog
2017-11-06 Palmer Dabbelt <palmer@dabbelt.com>
* testsuite/gas/riscv/satp.d: New test.
* testsuite/gas/riscv/satp.s: Likewise.
* testsuite/gas/riscv/riscv.exp: Likewise.
* config/tc-riscv.c (md_begin): Handle CSR aliases.
gas * config/tc-arm.c (arm_cpus):
Change FPU_ARCH_CRYPTO_NEON_VFP_ARMV8
into FPU_ARCH_CRYPTO_NEON_VFP_ARMV8_DOTPROD.
include * opcode/arm.h (FPU_ARCH_CRYPTO_NEON_VFP_ARMV8_DOTPROD):
New macro.
I'd edited these thinking that there might be cases where the counts
were one, but on further investigation it appears not. What's left
here are some minor tweaks.
* read.c (assemble_one, s_bundle_unlock): Formatting.
Consistently add comma and "bytes" to error message.
* testsuite/gas/i386/bundle-bad.l: Adjust to suit.
binutils has lacked proper pluralization of output messages for a long
time, for example, readelf will display information about a section
that "contains 1 entries" or "There are 1 section headers". Fixing
this properly requires us to use ngettext, because other languages
have different rules to English.
This patch defines macros for ngettext and friends to handle builds
with --disable-nls, and tidies the existing nls support. I've
redefined gettext rather than just defining "_" as dgettext in bfd and
opcodes in case someone wants to use gettext there (which might
conceivably happen with generated code).
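As an illustration of the intended usage, the sketch below formats one of the messages mentioned above. The fallback macro approximates what a --disable-nls build could use and is an assumption, as is the helper name format_header_count. With NLS enabled, ngettext from <libintl.h> picks the plural form for the current locale.

```c
#include <stdio.h>

/* Fallback in the spirit of a --disable-nls build: choose between the
   English singular and plural forms.  With NLS enabled, ngettext from
   <libintl.h> is used instead.  */
#ifndef ENABLE_NLS
# define ngettext(Msgid1, Msgid2, n) ((n) == 1 ? (Msgid1) : (Msgid2))
#else
# include <libintl.h>
#endif

/* Format "There is 1 section header" or "There are N section headers"
   into BUF and return the snprintf result.  */
int
format_header_count (char *buf, size_t size, unsigned long count)
{
  return snprintf (buf, size,
                   ngettext ("There is %lu section header",
                             "There are %lu section headers", count),
                   count);
}
```

Passing both forms and the count to a single call is what lets translators supply as many plural forms as their language needs.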
bfd/
* sysdep.h: Formatting, comment fixes.
(gettext, ngettext): Redefine when ENABLE_NLS.
(ngettext, dngettext, dcngettext): Define when !ENABLE_NLS.
(_): Define using gettext.
(textdomain, bindtextdomain): Use safer "do nothing".
* hosts/alphavms.h (textdomain, bindtextdomain): Likewise.
(ngettext, dngettext, dcngettext): Define when !ENABLE_NLS.
opcodes/
* opintl.h: Formatting, comment fixes.
(gettext, ngettext): Redefine when ENABLE_NLS.
(ngettext, dngettext, dcngettext): Define when !ENABLE_NLS.
(_): Define using gettext.
(textdomain, bindtextdomain): Use safer "do nothing".
binutils/
* sysdep.h (textdomain, bindtextdomain): Use safer "do nothing".
(ngettext, dngettext, dcngettext): Define when !ENABLE_NLS.
gas/
* asintl.h (textdomain, bindtextdomain): Use safer "do nothing".
(ngettext, dngettext, dcngettext): Define when !ENABLE_NLS.
gold/
* system.h (textdomain, bindtextdomain): Use safer "do nothing".
(ngettext, dngettext, dcngettext): Define when !ENABLE_NLS.
ld/
* ld.h (textdomain, bindtextdomain): Use safer "do nothing".
(ngettext, dngettext, dcngettext): Define when !ENABLE_NLS.
This adds an option for the Qualcomm saphira core. The corresponding
gcc patch is here:
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg02055.html
This was tested with an aarch64 build and make check and also by
building and running SPEC2006.
gas/
* config/tc-aarch64.c (aarch64_cpus): Add saphira.
* doc/c-aarch64.texi: Likewise.
Object files other than ELF do not have mapping symbols to indicate the
type of data for objdump to work reliably. This is why the following
tests FAIL on arm-wince-pe targets:
ARMv6T2 Thumb CoProcessor Instructions (1)
ARMv6T2 Thumb CoProcessor Instructions (2)
This patch adds the force-thumb disassembler option to objdump for this
test to PASS on these targets as well.
2017-11-02 Thomas Preud'homme <thomas.preudhomme@arm.com>
gas/
* testsuite/gas/arm/copro-thumb_v6t2plus-thumb_v6t2-1.d: Add
--disassembler-options=force-thumb to objdump options.
* testsuite/gas/arm/copro-thumb_v6t2plus-thumb_v6t2-2.d: Likewise.
A few coprocessor instructions introduced in ARMv2 are currently
accepted by GAS when targeting ARMv1 due to a typo in the code. This
patch fixes the issue and introduces more fine-grained testing of
coprocessor instruction availability. Coprocessor instructions are
grouped as follows:
* ARM coprocessor instructions introduced in ARMv2
Includes: ldc, stc, mcr, mrc, cdp, ldcl, stcl
Guarded by: ARM_EXT_V2
Tests: copro-arm_v2plus-arm_v*.d
* ARM coprocessor instructions introduced in ARMv5
Includes: ldc2, ldc2l, stc2, stc2l, cdp2, mcr2, mrc2
Guarded by: ARM_EXT_V5
Tests: copro-arm_v5plus-arm_v*.d
* ARM coprocessor instructions introduced in ARMv5TE
Includes: mcrr, mrrc
Guarded by: ARM_EXT_V5E
Tests: copro-arm_v5teplus-arm_v*.d
* ARM coprocessor instructions introduced in ARMv6
Includes: mcrr2, mrrc2
Guarded by: ARM_EXT_V6
Tests: copro-arm_v6plus-arm_v*.d
* Thumb coprocessor instructions introduced in ARMv6T2
Includes: ldc, ldcl, stc, stcl, mcr, mrc, mcrr, mrrc, cdp, ldc2,
ldc2l, stc2, stc2l, cdp2, mcr2, mrc2, mcrr2, mrrc2
Guarded by: ARM_EXT_V6T2
Tests: copro-thumb_v6t2plus-thumb_v*.d
For each of these groups, at least 2 tests are performed:
* instructions are not available in earlier architectures
* instructions are available in the architecture where they were introduced
More tests need to be performed when instructions in a group span
several assembly files.
Note that an instruction in the original coprocessor testcase is
changed to unified syntax to allow the testcase to be assembled for ARM
and Thumb state. Correct processing of legacy syntax is covered in other
testcases.
2017-11-01 Thomas Preud'homme <thomas.preudhomme@arm.com>
gas/
* config/tc-arm.c (arm_ext_v2): Define to ARM_EXT_V2 feature bit.
* testsuite/gas/arm/copro.s: Split into ...
* testsuite/gas/arm/copro-arm_v2plus-thumb_v6t2plus.s: This while
changing it to unified syntax and ...
* testsuite/gas/arm/copro-arm_v5plus-thumb_v6t2plus.s: This and ...
* testsuite/gas/arm/copro-arm_v5teplus-thumb_v6t2plus.s: This and ...
* testsuite/gas/arm/copro-arm_v6plus-thumb_v6t2plus.s: This.
* testsuite/gas/arm/copro.d: Split into ...
* testsuite/gas/arm/copro-arm_v2plus-arm_v2.d: This but target ARMv2
and ...
* testsuite/gas/arm/copro-arm_v5plus-arm_v5.d: this but target ARMv5
and ...
* testsuite/gas/arm/copro-arm_v5teplus-arm_v5te.d: This but target
ARMv5TE and ...
* testsuite/gas/arm/copro-arm_v6plus-arm_v6.d: This but target ARMv6.
* testsuite/gas/arm/copro-arm_v2plus-arm_v1.d: New testcase.
* testsuite/gas/arm/copro-thumb_v6t2plus-thumb_v4t-1.d: New testcase.
* testsuite/gas/arm/copro-arm_v2plus-thumb_v6t2plus-unavail.l: Expected
errors for the above two testcases.
* testsuite/gas/arm/copro-thumb_v6t2plus-thumb_v6t2-1.d: New testcase.
* testsuite/gas/arm/copro-arm_v5plus-arm_v4.d: New testcase.
* testsuite/gas/arm/copro-thumb_v6t2plus-thumb_v4t-2.d: New testcase.
* testsuite/gas/arm/copro-arm_v5plus-thumb_v6t2plus-unavail.l:
Expected errors for the above two testcases.
* testsuite/gas/arm/copro-thumb_v6t2plus-thumb_v6t2-2.d: New testcase.
* testsuite/gas/arm/copro-arm_v5teplus-arm_v5.d: New testcase.
* testsuite/gas/arm/copro-thumb_v6t2plus-thumb_v4t-3.d: New testcase.
* testsuite/gas/arm/copro-arm_v5teplus-thumb_v6t2plus-unavail.l:
Expected errors for the above two testcases.
* testsuite/gas/arm/copro-thumb_v6t2plus-thumb_v6t2-3.d: New testcase.
* testsuite/gas/arm/copro-arm_v6plus-arm_v5te.d: New testcase.
* testsuite/gas/arm/copro-thumb_v6t2plus-thumb_v4t-4.d: New testcase.
* testsuite/gas/arm/copro-arm_v6plus-thumb_v6t2plus-unavail.l:
Expected errors for the above two testcases.
* testsuite/gas/arm/copro-thumb_v6t2plus-thumb_v6t2-4.d: New testcase.
tic4x fails due to being a 4 octets per byte target, while tic54x is 2
octets per byte.
mmix still fails with
fill-1.s:4: Error: unknown pseudo-op: `.l1:'
fill-1.s:6: Error: unknown pseudo-op: `.l2:'
fill-1.s:3: Error: .space specifies non-absolute value
and if the labels are changed to L1 and L2 then mep-elf fails with
fill-1.s:3: Error: .space specifies non-absolute value
Since both of those look like they ought to be investigated by the
target maintainers, I'm tweaking the test to fail on both targets.
* testsuite/gas/all/fill-1.d: Exclude tic4x and tic54x.
* testsuite/gas/all/fill-1.s: Use L1 rather than .L1.
These are all invalid instructions, so they should not disassemble.
opcodes/ChangeLog
2017-10-24 Andrew Waterman <andrew@sifive.com>
* riscv-opc.c (match_c_addi16sp): New function.
(match_c_addi4spn): New function.
(match_c_lui): Don't allow 0-immediate encodings.
(riscv_opcodes) <addi>: Use the above functions.
<add>: Likewise.
<c.addi4spn>: Likewise.
<c.addi16sp>: Likewise.
gas/ChangeLog
2017-10-24 Andrew Waterman <andrew@sifive.com>
* testsuite/gas/riscv/c-addi16sp-fail.d: New test.
* testsuite/gas/riscv/c-addi16sp-fail.l: Likewise.
* testsuite/gas/riscv/c-addi16sp-fail.s: Likewise.
* testsuite/gas/riscv/c-addi4spn-fail.d: Likewise.
* testsuite/gas/riscv/c-addi4spn-fail.l: Likewise.
* testsuite/gas/riscv/c-addi4spn-fail.s: Likewise.
* testsuite/gas/riscv/riscv.exp: Add new tests.
This matches the ISA specification. This also adds two tests: one to
make sure the assembler rejects invalid 'c.lui's, and one to make sure
we only relax valid 'c.lui's.
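As a rough illustration of the constraint described above, the check can
be sketched in C as follows. This is a minimal sketch, not the binutils
code: can_use_c_lui is a hypothetical helper, and imm is assumed to be
the sign-extended 20-bit value that lui would place in bits 31:12.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the binutils implementation.  IMM is the
   sign-extended 20-bit value that LUI would place in bits 31:12.  */
static bool
can_use_c_lui (unsigned rd, int32_t imm)
{
  /* c.lui cannot target x0, and rd == x2 encodes c.addi16sp instead.  */
  if (rd == 0 || rd == 2)
    return false;

  /* The immediate is a 6-bit signed field and must not be zero.  */
  return imm != 0 && imm >= -32 && imm <= 31;
}
```

With this sketch, a lui with rd x0 or with a zero immediate keeps its
4-byte form instead of being shortened to c.lui.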
bfd/ChangeLog
2017-10-24 Andrew Waterman <andrew@sifive.com>
* elfnn-riscv.c (_bfd_riscv_relax_lui): Don't relax to c.lui
when rd is x0.
include/ChangeLog
2017-10-24 Andrew Waterman <andrew@sifive.com>
* opcode/riscv.h (VALID_RVC_LUI_IMM): c.lui can't load the
immediate 0.
gas/ChangeLog
2017-10-24 Andrew Waterman <andrew@sifive.com>
* testsuite/gas/riscv/c-lui-fail.d: New testcase.
* gas/testsuite/gas/riscv/c-lui-fail.l: Likewise.
* gas/testsuite/gas/riscv/c-lui-fail.s: Likewise.
* gas/testsuite/gas/riscv/riscv.exp: Likewise.
ld/ChangeLog
2017-10-24 Andrew Waterman <andrew@sifive.com>
* ld/testsuite/ld-riscv-elf/c-lui.d: New testcase.
* ld/testsuite/ld-riscv-elf/c-lui.s: Likewise.
* ld/testsuite/ld-riscv-elf/ld-riscv-elf.exp: New test suite.
Without 64-bit bfd, we can't properly support .code64 directive in
32-bit mode.
* config/tc-i386.c (md_pseudo_table): Add .code64 directive
only if BFD64 is defined.
* testsuite/gas/i386/code64-inval.l: New file.
* gas/testsuite/gas/i386/code64-inval.s: Likewise.
* gas/testsuite/gas/i386/code64.d: Likewise.
* gas/testsuite/gas/i386/code64.s: Likewise.
* testsuite/gas/i386/i386.exp: Run mixed-mode-reloc32,
att-regs, intel-regs, intel-expr and string-ok tests only if
assembler supports x86-64. Run code64 and code64-inval.
Systems without the C extension mandate 4-byte alignment for
instructions, so there is no reason to allow for 2-byte alignment. This
change avoids emitting lots of unimplemented instructions into object
files on non-C targets, which users keep reporting as a bug. While this
isn't actually a bug (as none of the offsets in object files are
relevant until RISC-V), it is ugly.
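The alignment choice can be pictured with a small C sketch. Both helpers
are hypothetical and only show the arithmetic; they are not the code in
riscv_frag_align_code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: minimum instruction alignment in bytes.
   Targets with the C extension may place instructions on 2-byte
   boundaries; all others need 4-byte boundaries.  */
static size_t
insn_align (bool rvc)
{
  return rvc ? 2 : 4;
}

/* Hypothetical helper: bytes of padding needed to move OFFSET up to
   the next multiple of ALIGN, which must be a power of two.  */
static size_t
pad_bytes (size_t offset, size_t align)
{
  return (align - (offset & (align - 1))) & (align - 1);
}
```

For example, pad_bytes (6, insn_align (false)) is 2 while
pad_bytes (6, insn_align (true)) is 0.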
gas/ChangeLog
2017-10-23 Palmer Dabbelt <palmer@dabbelt.com>
* config/tc-riscv.c (riscv_frag_align_code): Align code by 4
bytes on non-RVC systems.
Fix a bug in MIPS n32 ELF object file generation and make such objects
consistent with the n32 BFD requested, by presetting the EF_MIPS_ABI2
flag in the `e_flags' member of the newly created ELF file header, as it
is this flag that tells n32 objects apart from o32 objects.
This flag will then stay set through to output file generation for
writers such as GAS or GDB's `generate-core-file' command. Readers will
overwrite the whole of `e_flags' along with the rest of the ELF file
header in `elf_swap_ehdr_in' and then verify in `mips_elf_n32_object_p'
that the flag is still set before accepting an input file as an n32
object.
The issue was discovered with GDB's `generate-core-file' command making
o32 core files out of n32 debuggees.
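The flag handling described above can be sketched with a toy header
structure. The toy_* names are hypothetical and stand in for the real
BFD data structures; only the EF_MIPS_ABI2 value (0x20) comes from the
MIPS ELF definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define EF_MIPS_ABI2 0x00000020u	/* Marks an n32 object.  */

/* Hypothetical stand-in for the ELF file header.  */
struct toy_ehdr
{
  uint32_t e_flags;
};

/* Writer side: preset the flag when the header is created, so that it
   is still set when the output file is generated.  */
static void
toy_n32_mkobject (struct toy_ehdr *hdr)
{
  hdr->e_flags |= EF_MIPS_ABI2;
}

/* Reader side: e_flags has been overwritten from the input file by
   now, so accept the file as n32 only if the flag is set.  */
static bool
toy_n32_object_p (const struct toy_ehdr *hdr)
{
  return (hdr->e_flags & EF_MIPS_ABI2) != 0;
}
```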
bfd/
* elfn32-mips.c (mips_elf_n32_mkobject): New prototype and
function.
(bfd_elf32_mkobject): Use `mips_elf_n32_mkobject' rather than
`_bfd_mips_elf_mkobject'.
gas/
* config/tc-mips.c (mips_elf_final_processing): Don't set
EF_MIPS_ABI2 in `e_flags'.
With a 32-bit bfd (default on an ILP32 system) the previous markings
on tests *were* correct. There, the results have been consistent
since they were added. The tests would appear to "spuriously" xpass
"only" on LP64 hosts, which were not the norm in 2000. (But, now CRIS
requires a 64-bit BFD.)
PR 22324
* read.c (s_rept): Use size_t type for count parameter.
(do_repeat): Change type of count parameter to size_t.
Issue an error if the count parameter is negative.
(do_repeat_with_expression): Likewise.
* read.h: Update prototypes for do_repeat and
do_repeat_with_expression.
* doc/as.texinfo (Rept): Document that a zero count is allowed but
negative counts are not.
* config/tc-rx.c (rx_rept): Use size_t type for count parameter.
* config/tc-tic54x.c (tic54x_loop): Cast count parameter to size_t
type.
* testsuite/gas/macros/end.s: Add a test using a negative repeat
count.
* testsuite/gas/macros/end.l: Add expected error message.
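A minimal sketch of the kind of check this entry describes, assuming the
count arrives as a signed value before being stored as a size_t. The
helper name checked_repeat_count is hypothetical and is not part of
read.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reject a negative repeat count before it is
   converted to size_t, where it would wrap to a huge value.  A count
   of zero is allowed.  Returns false and leaves *COUNT untouched if
   VALUE is negative.  */
static bool
checked_repeat_count (long value, size_t *count)
{
  if (value < 0)
    return false;

  *count = (size_t) value;
  return true;
}
```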
In the medany code model the compiler generates PCREL_HI20+PCREL_LO12
relocation pairs against local symbols because HI20+LO12 relocations
can't reach high addresses. We relax HI20+LO12 pairs to GPREL
relocations when possible, which is an important optimization for
Dhrystone. Without this commit we are unable to relax
PCREL_HI20+PCREL_LO12 pairs to GPREL when possible, causing a 10%
performance hit on Dhrystone on Rocket.
Note that we'll now relax
la gp, __global_pointer$
to
mv gp, gp
which probably isn't what you want in your entry code. Users who want
gp-relative symbols to continue to resolve should add ".option norelax"
accordingly. Due to this, the assembler now pairs PCREL relocations
with RELAX relocations when they're expected to be relaxed just like
every other relaxable relocation.
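The reachability test behind a GPREL relaxation can be sketched as
below. This is a simplified illustration, assuming a plain signed 12-bit
offset from gp; fits_gprel is a hypothetical name, and the real linker
may apply more conservative limits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if SYM can be addressed as a signed
   12-bit offset from the global pointer GP.  */
static bool
fits_gprel (uint64_t sym, uint64_t gp)
{
  int64_t delta = (int64_t) (sym - gp);

  return delta >= -2048 && delta <= 2047;
}
```

Note that fits_gprel (gp, gp) is trivially true, which is why loading
__global_pointer$ itself is a candidate for relaxation unless
".option norelax" is in effect.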
bfd/ChangeLog
2017-10-19 Palmer Dabbelt <palmer@dabbelt.com>
* elfnn-riscv.c (riscv_pcgp_hi_reloc): New structure.
(riscv_pcgp_lo_reloc): Likewise.
(riscv_pcgp_relocs): Likewise.
(riscv_init_pcgp_relocs): New function.
(riscv_free_pcgp_relocs): Likewise.
(riscv_record_pcgp_hi_reloc): Likewise.
(riscv_record_pcgp_lo_reloc): Likewise.
(riscv_delete_pcgp_hi_reloc): Likewise.
(riscv_use_pcgp_hi_reloc): Likewise.
(riscv_record_pcgp_lo_reloc): Likewise.
(riscv_find_pcgp_lo_reloc): Likewise.
(riscv_delete_pcgp_lo_reloc): Likewise.
(_bfd_riscv_relax_pc): Likewise.
(_bfd_riscv_relax_section): Handle R_RISCV_PCREL_* relocations
via the new functions above.
gas/ChangeLog
2017-10-19 Palmer Dabbelt <palmer@dabbelt.com>
* config/tc-riscv.c (md_apply_fix): Mark
BFD_RELOC_RISCV_PCREL_HI20 as relaxable when relaxations are
enabled.
PR 21621
* config/tc-avr.h (struct avr_frag_data): Add prev_opcode field.
(TC_FRAG_INIT): Define.
(avr_frag_init): Add prototype.
* config/tc-avr.c (avr_frag_init): New function.
(avr_operands): Replace static local 'prev' variable with
prev_opcode field in current frag.
* testsuite/gas/avr/pr21621.s: New test source file.
* testsuite/gas/avr/pr21621.d: New test driver file.
* testsuite/gas/avr/pr21621.s: New test error output file.
This fixes various issues with the fill-1 testcase causing fails on a
couple of targets.
gas/ChangeLog:
2017-10-19 Andreas Krebbel <krebbel@linux.vnet.ibm.com>
* testsuite/gas/all/fill-1.s: Use normal labels. Change .text to
.data. Pick different values. Use .dc.w instead of .word.
* testsuite/gas/all/fill-1.d: New objdump output check.
* testsuite/gas/all/gas.exp: Use run_dump_test to execute fill-1
testcase.
There are individual comments that explain why each test isn't
supported, but the vast majority of them are due to RISC-V's aggressive
linker relaxation. The SLEB test cases should eventually be supported,
but the remaining ones probably won't ever be.
2017-10-18 Palmer Dabbelt <palmer@dabbelt.com>
* testsuite/gas/all/align.d: Mark as unsupported on RISC-V.
* testsuite/gas/all/relax.d: Likewise.
* testsuite/gas/all/sleb128-2.d: Likewise.
* testsuite/gas/all/sleb128-4.d: Likewise.
* testsuite/gas/all/sleb128-5.d: Likewise.
* testsuite/gas/all/sleb128-7.d: Likewise.
* testsuite/gas/elf/section11.d: Likewise.
* testsuite/gas/all/gas.exp (diff1.s): Likewise.
2017-10-16 Sandra Loosemore <sandra@codesourcery.com>
Henry Wong <henry@stuffedcow.net>
gas/
* config/tc-nios2.c (nios2_translate_pseudo_insn): Check for
correct number of arguments.
(md_assemble): Handle failure of nios2_translate_pseudo_insn.
* testsuite/gas/nios2/illegal_pseudoinst.l: New file.
* testsuite/gas/nios2/illegal_pseudoinst.s: New file.
* testsuite/gas/nios2/nios2.exp: Add illegal_pseudoinst test.
FT32B is a new FT32 family member. It has a code
compression scheme, which requires the use of linker
relaxations. The change is quite large, so submission
is in several parts.
Part 1 adds a 15-bit instruction field, and CPU-specific functions for
the code compression that are used in binutils and GDB.
bfd/ChangeLog:
2017-10-12 James Bowman <james.bowman@ftdichip.com>
* bfd-in2.h: Regenerate.
* libbfd.h: Regenerate.
* elf32-ft32.c: Add HOWTO R_FT32_15.
* reloc.c: Add BFD_RELOC_FT32_15.
gas/ChangeLog:
2017-10-12 James Bowman <james.bowman@ftdichip.com>
* config/tc-ft32.c (md_assemble): Replace FT32_FLD_K8 with
K15.
(md_apply_fix, tc_gen_reloc): Add BFD_RELOC_FT32_15.
include/ChangeLog:
2017-10-12 James Bowman <james.bowman@ftdichip.com>
* elf/ft32.h: Add R_FT32_15.
* opcode/ft32.h: Replace FT32_FLD_K8 with K15.
(ft32_shortcode, sc_compar, ft32_split_shortcode,
ft32_merge_shortcode, ft32_merge_shortcode): New functions.
opcodes/ChangeLog:
2017-10-12 James Bowman <james.bowman@ftdichip.com>
* opcodes/ft32-dis.c (print_insn_ft32): Replace FT32_FLD_K8 with K15.
* opcodes/ft32-opc.c (ft32_opc_info): Replace FT32_FLD_K8 with
K15. Add jmpix pattern.
sim/ChangeLog:
2017-10-12 James Bowman <james.bowman@ftdichip.com>
* sim/ft32/interp.c (step_once): Replace FT32_FLD_K8 with K15.
PR 21977
* listing.c (listing_newline): Use the name of the current
physical input file, rather than the current logical input file,
unless including high level source in the listing.
* input-scrub.c (as_where_physical): New function. Returns the
name of the current physical input file.
* as.h: Add prototype for as_where_physical.
prno, tpei, and irbm are missing in the optable.
gas/ChangeLog:
2017-10-09 Andreas Krebbel <krebbel@linux.vnet.ibm.com>
* testsuite/gas/s390/zarch-arch12.d (prno, tpei, irbm): New
instructions added.
* testsuite/gas/s390/zarch-arch12.s: Likewise.
* testsuite/gas/s390/zarch-z13.d: Rename ppno to prno.
opcodes/ChangeLog:
2017-10-09 Andreas Krebbel <krebbel@linux.vnet.ibm.com>
* s390-opc.txt (prno, tpei, irbm): New instructions added.