In the TLS GD/LD to LE optimization, ld replaces a sequence like
addi 3,2,x@got@tlsgd R_PPC64_GOT_TLSGD16 x
bl __tls_get_addr(x@tlsgd) R_PPC64_TLSGD x
R_PPC64_REL24 __tls_get_addr
nop
with
addis 3,13,x@tprel@ha R_PPC64_TPREL16_HA x
addi 3,3,x@tprel@l R_PPC64_TPREL16_LO x
nop
When the tprel offset is small, this can be further optimized to
nop
addi 3,13,x@tprel
nop
bfd/
* elf64-ppc.c (struct ppc_link_hash_table): Add do_tls_opt.
(ppc64_elf_tls_optimize): Set it.
(ppc64_elf_relocate_section): Nop addis on TPREL16_HA, and convert
insn on TPREL16_LO and TPREL16_LO_DS relocs to use r13 when
addis would add zero.
* elf32-ppc.c (struct ppc_elf_link_hash_table): Add do_tls_opt.
(ppc_elf_tls_optimize): Set it.
(ppc_elf_relocate_section): Nop addis on TPREL16_HA, and convert
insn on TPREL16_LO relocs to use r2 when addis would add zero.
gold/
* powerpc.cc (Target_powerpc::Relocate::relocate): Nop addis on
TPREL16_HA, and convert insn on TPREL16_LO and TPREL16_LO_DS
relocs to use r2/r13 when addis would add zero.
ld/
* testsuite/ld-powerpc/tls.s: Add calls with tls markers.
* testsuite/ld-powerpc/tls32.s: Likewise.
* testsuite/ld-powerpc/powerpc.exp: Run tls marker tests.
* testsuite/ld-powerpc/tls.d: Adjust for TPREL16_HA/LO optimization.
* testsuite/ld-powerpc/tlsexe.d: Likewise.
* testsuite/ld-powerpc/tlsexetoc.d: Likewise.
* testsuite/ld-powerpc/tlsld.d: Likewise.
* testsuite/ld-powerpc/tlsmark.d: Likewise.
* testsuite/ld-powerpc/tlsopt4.d: Likewise.
* testsuite/ld-powerpc/tlstoc.d: Likewise.
This implements the special __tls_get_addr_opt call stub for powerpc
gold that returns __thread variable addresses without actually making
a call to __tls_get_addr in most cases. Shared libraries that are
loaded at program load time (ie. dlopen is not used) have a known
layout for their __thread variables, and thus DTPMOD64/DPTREL64 pairs
describing those variables can be set up by ld.so for the
__tls_get_addr_opt call stub fast exit.
Ref https://sourceware.org/ml/libc-alpha/2015-03/msg00626.html
I really, really wish I'd used a differently versioned __tls_get_addr
symbol than the base symbol to indicate glibc support for the
optimized call, rather than having glibc export __tls_get_addr_opt. A
lot of the messing around here, flipping symbols from __tls_get_addr
to __tls_get_addr_opt, is caused by that decision. About the only
benefit is that a user can see at a glance that their disassembled
code is calling __tls_get_addr via the fancy call stub.. Anyway, we
need references to __tls_get_addr to seem like they were to
__tls_get_addr_opt, and in cases like the tsan interceptor, a
definition of __tls_get_addr to seem like one of __tls_get_addr_opt
as well. That's the reason for Symbol::clear_in_reg and
Symbol_table::clone, and why symbols are substituted in Scan::global
and other places dealing with dynamic linking.
elfcpp/
* elfcpp.h (DT_PPC_OPT): Define.
* powerpc.h (PPC_OPT_TLS): Define.
gold/
* options.h (tls_get_addr_optimize): New option.
* symtab.h (Symbol::clear_in_reg, clone): New functions.
(Sized_symbol::clone): New function.
(Symbol_table::clone): New function.
* resolve.cc (Symbol::clone, Sized_symbol::clone): New functions.
* powerpc.cc (Target_powerpc::has_tls_get_addr_opt_,
tls_get_addr_, tls_get_addr_opt_): New vars.
(Target_powerpc::tls_get_addr_opt, tls_get_addr,
is_tls_get_addr_opt, replace_tls_get_addr,
set_has_tls_get_addr_opt, stk_linker): New functions.
(Target_powerpc::Track_tls::maybe_skip_tls_get_addr_call): Add
target param. Update callers. Compare symbols rather than names.
(Target_powerpc::do_define_standard_symbols): Init tls_get_addr_
and tls_get_addr_opt_.
(Target_powerpc::Branch_info::mark_pltcall): Translate tls_get_addr
sym to tls_get_addr_opt.
(Target_powerpc::Branch_info::make_stub): Likewise.
(Stub_table::define_stub_syms): Likewise.
(Target_powerpc::Scan::global): Likewise.
(Target_powerpc::Relocate::relocate): Likewise.
(add_3_12_2, add_3_12_13, bctrl, beqlr, cmpdi_11_0, cmpwi_11_0,
ld_11_1, ld_11_3, ld_12_3, lwz_11_3, lwz_12_3, mr_0_3, mr_3_0,
mtlr_11, std_11_1): New constants.
(Stub_table::eh_frame_added_): Delete.
(Stub_table::tls_get_addr_opt_bctrl_, plt_fde_len_, plt_fde_): New vars.
(Stub_table::init_plt_fde): New functions.
(Stub_table::add_eh_frame, replace_eh_frame): Move definition out
of line. Init and use plt_fde_.
(Stub_table::plt_call_size): Return size for tls_get_addr stub.
Extract alignment code to..
(Stub_table::plt_call_align): ..this new function. Adjust all callers.
(Stub_table::add_plt_call_entry): Set has_tls_get_addr_opt and
tls_get_addr_opt_bctrl, and align after that.
(Stub_table::do_write): Write out tls_get_addr stub.
(Target_powerpc::do_finalize_sections): Emit DT_PPC_OPT
PPC_OPT_TLS/PPC64_OPT_TLS bit.
(Target_powerpc::Relocate::relocate): Don't check for or modify
nop following bl for tls_get_addr stub.
On 64-bit targets there is a 32-bit hole in symbol->u_, and another
due to symbol flags exceeding 32 bits. By splitting the union,
the total size of the class reduces by one 64-bit word.
* symtab.h (Symbol): Split u_ into u1_ and u2_. Adjust accessors
to suit. Move plt_offset_ before got_offsets_.
* symtab.cc (Symbol::init_fields): Adjust for union change.
(Symbol::init_base_output_data): Likewise.
(Symbol::init_base_output_segment): Likewise.
(Symbol::allocate_base_common): Likewise.
(Symbol::output_section): Likewise.
(Symbol::set_output_section): Likewise.
(Symbol::set_output_segment): Likewise.
* resolve.cc (Symbol::override_base): Likewise.
(Symbol::override_base_with_special): Likewise.
gold/ChangeLog:
PR gold/21868
* aarch64.cc (AArch64_relobj::try_fix_erratum_843419_optimized):
Add extra view offset argument to function.
(AArch64_relobj::fix_errata_and_relocate_erratum_stubs): Add
extra view offset set to the output offset when the view has
is_input_output_view set, since it has not already been
included. Pass this to try_fix_erratum_843419_optimized.
If a custom linker script with an unexpected relative layout of .got
and .got.plt sections was used, gold might produce a wrong offset
when applying R_AARCH64_TLSDESC_* relocations.
This patch fixes the issue by calculating "got_tlsdesc_offset"
in a more direct way.
gold/
* aarch64.cc (Target_aarch64::Relocate::relocate_tls):
Make got_tlsdesc_offset signed and fix its calculation.
* testsuite/Makefile.am (aarch64_tlsdesc): New test.
* testsuite/Makefile.in: Regenerate.
* testsuite/aarch64_tlsdesc.s: New test source file.
* testsuite/aarch64_tlsdesc.sh: New test script.
* testsuite/aarch64_tlsdesc.t: New test linker script.
This patch provides a flag for PowerPC64 ELFv2 use in class Symbol,
and modifies Sized_target::resolve to return whether the symbol has
been resolved. If not, normal processing continues. I use this for
PowerPC64 ELFv2 to keep track of whether a symbol has any definition
with non-zero localentry, in order to disable --plt-localentry for
that symbol.
PR 21847
* powerpc.cc (Target_powerpc::is_elfv2_localentry0): Test
non_zero_localentry.
(Target_powerpc::resolve): New function.
(powerpc_info): Set has_resolve for 64-bit.
* target.h (Sized_target::resolve): Return bool.
* resolve.cc (Symbol_table::resolve): Continue with normal
processing when target resolve returns false.
* symtab.h (Symbol::non_zero_localentry, set_non_zero_localentry):
New accessors.
(Symbol::non_zero_localentry_): New flag bit.
* symtab.cc (Symbol::init_fields): Init non_zero_localentry_.
There is a very small but non-zero probability that a stub group
contains stubs on one relax pass, but does not on the next. In that
case we would get an FDE covering a zero length address range.
(Actually, it's even worse. Alignment padding for stubs can mean the
address for the non-existent stubs is past the end of the original
section to which stubs are attached, and due to the way
do_plt_fde_location calculates the length we can get a negative
length.) Fixing this properly requires removing the FDE.
Also, I have been implementing the __tls_get_addr_opt support for
gold, and that stub needs something other than the default FDE. The
necessary FDE will depend on the offset to the __tls_get_addr_opt
stub, which of course can change during relaxation. That means at the
very least, rewriting the FDE on each pass, possibly changing the FDE
size. I think that is better done by completely recreating PLT
eh_frame FDEs.
* ehframe.cc (Fde::operator==): New.
(Cie::remove_fde, Eh_frame::remove_ehframe_for_plt): New.
* ehframe.h (Fde::operator==): Declare.
(Cie::remove_fde, Eh_frame::remove_ehframe_for_plt): Likewise.
* layout.cc (Layout::remove_eh_frame_for_plt): New.
* layout.h (Layout::remove_eh_frame_for_plt): Declare.
* powerpc.cc (Target_powerpc::do_relax): Remove old eh_frame FDEs.
(Stub_table::add_eh_frame): Delete eh_frame_added_ condition.
Don't add eh_frame for empty stub section.
(Stub_table::remove_eh_frame): New.
This adds a --no-tls-optimize option for people who want to keep
__tls_get_addr calls in an executable rather than optimizing such code
sequences to IE/LE.
Also tidy some formatting errors, rename a variable to better reflect
its use, and tweak two functions that create pairs of GOT entries to
first check whether the GOT entry already exists before potentially
inserting the header via reserve(2). Without the check it is possible
to waste one GOT entry.
* options.h (no_tls_optimize): New powerpc option.
* powerpc.cc (Target_powerpc::abiversion, set_abiversion): Formatting.
(Target_powerpc::stk_toc): Formatting, fix comment.
(Target_powerpc::Track_tls::tls_get_addr_state): Rename from
tls_get_addr.
(Target_powerpc::optimize_tls_gd, optimize_tls_ld, optimize_tls_ie):
Return TLSOPT_NONE when !tls_optimize.
(Target_powerpc::add_global_pair_with_rel): Check
for existing reloc before reserving.
(Target_powerpc::add_local_tls_pair): Likewise.
This makes ld warn about --plt-localentry if a version of glibc
without the necessary ld.so checks is detected, and revises the
documentation.
bfd/
* elf64-ppc.c (ppc64_elf_tls_setup): Warn on --plt-localentry
without ld.so checks.
gold/
* powerpc.cc (Target_powerpc::scan_relocs): Warn on --plt-localentry
without ld.so checks.
ld/
* ld.texinfo (plt-localentry): Revise.
The big comment in ppc64_elf_tls_setup says why. I've also added some
code to the bfd linker that catches the -lpthread -lc symbol
differences and disable generation of optimized call stubs even when
--plt-localentry is activated. Gold doesn't yet have that.
PR 21847
bfd/
* elf64-ppc.c (struct ppc_link_hash_entry): Add non_zero_localentry.
(ppc64_elf_merge_symbol): Set non_zero_localentry.
(is_elfv2_localentry0): Test non_zero_localentry.
(ppc64_elf_tls_setup): Default to --no-plt-localentry.
gold/
* powerpc.cc (Target_powerpc::scan_relocs): Default to
--no-plt-localentry.
ld/
* ld.texinfo (plt-localentry): Document.
The 64-bit ELF compression header has a reserved field. It should be
cleared to avoid random bits in it.
elfcpp/
PR gold/21857
* elfcpp.h (Chdr_write): Add put_ch_reserved.
(Chdr_write<64, true>::put_ch_reserved): New.
(Chdr_write<64, false>::put_ch_reserved): Likewise.
gold/
PR gold/21857
* compressed_output.cc (Output_compressed_section::set_final_data_size):
Call put_ch_reserved to clear the reserved field for 64-bit ELF.
GCC 4.2 fails to compile "(uint64_t) 0x800080008000" with
error: integer constant is too large for ‘long’ type
This patch adds "llu" suffix to 0x800080008000 for GCC 4.2.
* mips.cc (Mips_relocate_functions): Add "llu" suffix to
0x800080008000.
My PPC64_OPT_LOCALENTRY patch of June 1, git commit f378ab099d, and
the later gold change, git commit 7ee7ff7015, added an insn in
__glink_PLTresolve which needs a corresponding adjustment in the
eh_frame info for asynchronous exceptions to unwind correctly.
It would have been OK for both ABIs to use +5 for the advance before
restore of LR, since we can put the DW_CFA_restore_extended on any
insn after the actual restore and before the r12/r0 copy is clobbered,
but it's slightly better to delay as much as possible. There are
then more addresses where fewer CFA program insns are executed.
bfd/
* elf64-ppc.c (ppc64_elf_size_stubs): Correct advance to
restore of LR.
gold/
* powerpc.cc (glink_eh_frame_fde_64v2): Correct advance to
restore of LR.
(glink_eh_frame_fde_64v1): Advance to restore of LR at latest
possible insn.
The problem is caused by the fact that gold is relocating the stubs
for an entire output section when it processes the relocations for a
particular input section that happened to be designated as the stub
table "owner". The Relocate_task for that input section may or may not
run before the Relocate_task for another input section that contains
the code that needs the erratum fix, but doesn't "own" the stub
table. If it runs before (or might even race with) that other task, it
ends up with a copy of the unrelocated original instruction.
In other words - when calling fix_errata() from
do_relocate_sections(), gold is going through the list of errata stubs
that are associated only with that object. This routine updates the
stored original instruction and replaces it in the output view with a
branch to the stub. Later, as gold is going through the object file's
input sections, it then checks for stub tables "owned" by each input
section, and writes out all the stubs from that stub table, regardless
of what object file each stub is associated with.
Fixed by relocating the erratum stub only after the corresponding
errata spot is fixed. That is to have fix_errata() call
Stub_table::relocate_erratum_stub() for each stub.
gold/ChangeLog
2017-07-06 Han Shen <shenhan@google.com>
PR gold/21491
* aarch64.cc (Erratum_stub::invalidate_erratum_stub): New method.
(Erratum_stub::is_invalidated_erratum_stub): New method.
(Stub_table::relocate_reloc_stub): Renamed from "relocate_stub".
(Stub_table::relocate_reloc_stubs): Renamed from "relocate_stubs".
(Stub_table::relocate_erratum_stub): New method.
(AArch64_relobj::fix_errata_and_relocate_erratum_stubs): Renamed from
"fix_errata".
(Target_aarch64::relocate_reloc_stub): Renamed from "relocate_stub".
elfcpp/
* elfcpp.h (DT_PPC64_OPT): Define.
* powerpc.h (PPC64_OPT_TLS, PPC64_OPT_MULTI_TOC,
PPC64_OPT_LOCALENTRY): Define.
gold/
* options.h (General_options): Add plt_localentry.
* powerpc.cc (Target_powerpc::st_other): New function.
(Target_powerpc::plt_localentry0_, plt_localentry0_init_,
has_localentry0_): New vars.
(Target_powerpc::plt_localentry0, set_has_localentry0,
is_elfv2_localentry0): New functions.
(Target_powerpc::Branch_info::mark_pltcall): Don't set tocsave or
return true for localentry:0 calls.
(Stub_table::Plt_stub_ent::localentry0_): New var.
(Stub_table::add_plt_call_entry): Set localentry0_ and has_localentry0_.
Don't set r2save_ for localentry:0 calls.
(Output_data_glink::do_write): Save r2 in __glink_PLTresolve for elfv2.
(Target_powerpc::scan_relocs): Default plt_localentry0_.
(Target_powerpc::do_finalize_sections): Set DT_PPC64_OPT.
(Target_powerpc::Relocate::relocate): Don't require nop following
calls for localentry:0 plt calls, and don't change nop.
This adds support to gold for the tocsave relocs already supported by
ld.bfd. R_PPC64_TOCSAVE relocs are part of a scheme to move r2 saves
to the prologue of a function rather than in each plt call stub. We
don't want a compiler to always emit the r2 save, as this would be
wasted if the calls turned out to be local. See the tocsave*.s in
ld/testsuite/ld-powerpc/.
* powerpc.cc (Target_powerpc::tocsave_loc_): New var.
(Target_powerpc::mark_pltcall, add_tocsave, tocsave_loc): New functions.
(Target_powerpc::Branch_info::tocsave_): New var.
(Target_powerpc::Branch_info::mark_pltcall): New function.
(Target_powerpc::Branch_info::make_stub): Pass tocsave_ to
add_plt_call_entry.
(Stub_table::Plt_stub_ent): Make public. Add r2save_.
(Stub_table::add_plt_call_entry): Add bool tocsave_ param. Set
r2save_.
(Stub_table::find_plt_call_entry): Return Plt_stub_ent*. Adjust
use throughout.
(Stub_table::do_write): Conditionally output r2 save in plt stubs.
(Target_powerpc::Scan::local): Handle R_PPC64_TOCSAVE.
(Target_powerpc::Scan::global): Likewise.
(Target_powerpc::Relocate::relocate): Skip r2 save in plt call stub
with tocsave reloc. Replace header tocsave nop with r2 save.
* symtab.h (struct Symbol_location_hash): Make public.
I was lazy when adding indx_ to Plt_stub_ent. The field isn't part of
the key, so ought to be part of the mapped type. Make it so.
* powerpc.cc (Plt_stub_key): Rename from Plt_stub_ent. Remove indx_.
(Plt_stub_key_hash): Rename from Plt_stub_ent_hash.
(struct Plt_stub_ent): New.
(Plt_stub_entries): Map from Plt_stub_key to Plt_stub_ent. Adjust
use throughout file.
* aarch64.cc (scan_reloc_for_stub): Use plt_address_for_global to
calculate the symbol value.
(scan_reloc_section_for_stubs): Allow stubs to be created for
section symbols.
(maybe_apply_stub): Handle creating stubs for weak symbols to
match the code in scan_reloc_for_stub.
If two objects are compiled with -fPIC or -fPIE and call the same
function, two different PLT entries are created, one for each object,
but the same stub symbol name is used for both.
* powerpc.cc (Stub_table::define_stub_syms): Always include object's
uniq_ value.
TLS relaxation may change erratum 843419 sequences that those offending ADRP
instructions actually transformed into other instructions in which case there
is erratum 843419 risk anymore that we should avoid installing unnecessary
branch-to-stub.
gold/
* aarch64.cc (Insn_utilities::is_mrs_tpidr_el0): New method.
(AArch64_relobj<size, big_endian>::try_fix_erratum_843419_optimized):
Return ture for some TLS relaxed sequences.
* aarch64.cc (maybe_apply_stub): Add debug logging for looking
up stubs to undefined symbols and early return rather than
fail to look them up.
(scan_reloc_for_stub): Add debug logging for no stub creation
for undefined symbols.
gold/
PR gold/21444
* gold.cc (Target_sparc::Relocate::relocate_tls): Local
variables are final for position-independent executables. This
has to be consistent with Target_sparc::Scan::local otherwise
they will disagree as to whether local-exec is used.
gold/ChangeLog
PR gold/21430
* aarch64.cc
(AArch64_relobj::convert_input_section_to_relaxed_section):
Set the section offset to -1ULL.
(Target_aarch64::relocate_section): Adjust the view in case
of a relaxed input section.
* testsuite/Makefile.am (pr21430): New test.
* testsuite/Makefile.in: Regenerate
* testsuite/pr21430.s: New test source file.
* testsuite/pr21430.sh: New test script.
gold/
* mips.cc (Mips_got_entry::hash()): Shift addend to reduce
possibility of collisions.
(Mips_got_entry::equals): Fix case for GOT_TLS_LDM
entries.
gold/
* mips.cc (Mips_relobj::merge_processor_specific_data_): New data
member.
(Mips_relobj::merge_processor_specific_data): New method.
(Mips_relobj::do_read_symbols): Set merge_processor_specific_data_
to false, only if the input file is a binary or if object has no
contents except the section name string table and an empty symbol
table with the undefined symbol.
(Target_mips::do_finalize_sections): Refactor. Skip empty object files
for merging processor-specific data.
gold/
* mips.cc (Target_mips::Relocate::calculated_value_): New data
member.
(Target_mips::Relocate::calculate_only_): Likewise.
(Target_mips::Relocate::relocate): Handle multiple consecutive
relocations with the same offset.
gold/
* mips.cc (symbol_refs_local): Return false if a symbol
is from a dynamic object.
(Target_mips::got_section): Make _GLOBAL_OFFSET_TABLE_ STV_HIDDEN.
(Target_mips::set_gp): Refactor. Make _gp STT_NOTYPE and
STB_LOCAL.
(Target_mips::do_finalize_sections): Set _gp after all the checks
for creating .got are done.
(Target_mips::Scan::global): Remove unused code.
2017-02-15 Vladimir Radosavljevic <Vladimir.Radosavljevic@imgtec.com>
PR gold/21111
* mips.cc (Mips_relocate_functions::relhigher): New method.
(Mips_relocate_functions::relhighest): Likewise.
(mips_get_size_for_reloc): Add support for relocs: R_MIPS_HIGHER and
R_MIPS_HIGHEST.
(Target_mips::Scan::local): Add support for relocs: R_MIPS_HIGHER,
R_MIPS_HIGHEST, R_MICROMIPS_HIGHER and R_MICROMIPS_HIGHEST.
(Target_mips::Scan::global): Likewise.
(Target_mips::Scan::get_reference_flags): Likewise.
(Target_mips::Relocate::relocate): Call static methods for resolving
HIGHER and HIGHEST relocations.
gold/
* x86_64.cc (Target_x86_64::do_can_check_for_function_pointers):
Return true even when building pie binaries.
(Target_x86_64::possible_function_pointer_reloc): Check opcode
for R_X86_64_PC32 relocations.
(Target_x86_64::local_reloc_may_be_function_pointer): Pass
extra arguments to local_reloc_may_be_function_pointer.
(Target_x86_64::global_reloc_may_be_function_pointer): Likewise.
* gc.h (gc_process_relocs): Add check for STT_FUNC.
* testsuite/Makefile.am (icf_safe_pie_test): New test case.
* testsuite/Makefile.in: Regenerate.
* testsuite/icf_safe_pie_test.sh: New shell script.
gold/
* mips.cc (Mips_output_data_plt::rel_plt): Remove const from return
type.
(Target_mips::make_plt_entry): Make the sh_info field of .rel.plt
point to .plt.
gold/
PR gold/21054
* mips.cc (Mips_got_info::record_global_got_symbol): Don't add symbol
to the dynamic symbol table if it is forced to local visibility.
(Target_mips::do_finalize_sections): Don't add __RLD_MAP symbol to the
dynamic symbol table if it is forced to local visibility.