ELFv2 functions with localentry:0 are those with a single entry point,
ie. global entry == local entry, and that have no requirement on r2 or
r12, and guarantee r2 is unchanged on return. Such an external
function can be called via the PLT without saving r2 or restoring it
on return, avoiding a common load-hit-store for small functions. The
optimization is attractive. The TOC pointer load-hit-store is a major
reason why calls to small functions that need no register saves, or
with shrink-wrap, no register saves on a fast path, are slow on
powerpc64le.
To be safe, this optimization needs ld.so support to check that the
run-time matches link-time function implementation. If a function
in a shared library with st_other localentry non-zero is called
without saving and restoring r2, r2 will be trashed on return, leading
to segfaults. For that reason the optimization does not happen for
weak functions since a weak definition is a fairly solid hint that the
function will likely be overridden. I'm also not enabling the
optimization by default unless glibc-2.26 is detected, which should
have the ld.so checks implemented.
bfd/
* elf64-ppc.c (struct ppc_link_hash_table): Add has_plt_localentry0.
(ppc64_elf_merge_symbol_attribute): Merge localentry bits from
dynamic objects.
(is_elfv2_localentry0): New function.
(ppc64_elf_tls_setup): Default params->plt_localentry0.
(plt_stub_size): Adjust size for tls_get_addr_opt stub.
(build_tls_get_addr_stub): Use a simpler stub when r2 is not saved.
(ppc64_elf_size_stubs): Leave stub_type as ppc_stub_plt_call for
optimized localentry:0 stubs.
(ppc64_elf_build_stubs): Save r2 in ELFv2 __glink_PLTresolve.
(ppc64_elf_relocate_section): Leave nop unchanged for optimized
localentry:0 stubs.
(ppc64_elf_finish_dynamic_sections): Set PPC64_OPT_LOCALENTRY in
DT_PPC64_OPT.
* elf64-ppc.h (struct ppc64_elf_params): Add plt_localentry0.
include/
* elf/ppc64.h (PPC64_OPT_LOCALENTRY): Define.
ld/
* emultempl/ppc64elf.em (params): Init plt_localentry0 field.
(enum ppc64_opt): New, replacing OPTION_* defines. Add
OPTION_PLT_LOCALENTRY, and OPTION_NO_PLT_LOCALENTRY.
(PARSE_AND_LIST_*): Support --plt-localentry and --no-plt-localentry.
* testsuite/ld-powerpc/elfv2so.d: Update.
* testsuite/ld-powerpc/powerpc.exp (TLS opt 5): Use --no-plt-localentry.
* testsuite/ld-powerpc/tlsopt5.d: Update.
This came up because I was looking at ld/tmpdir/addpcis.o and noticed
the odd addends on REL16DX_HA. They ought to both be -4. The error
crept in due REL16DX_HA howto being pc-relative (as indeed it should
be), and code at gas/write.c:1001 after this comment
/* Make it pc-relative. If the back-end code has not
selected a pc-relative reloc, cancel the adjustment
we do later on all pc-relative relocs. */
*not* cancelling the pc-relative adjustment. So I've made a dummy
non-relative split reloc so that the generic code handles this, rather
than attempting to add hacks later in md_apply_fix which would not be
very robust. Having the new internal reloc also makes it easy to
support
addpcis rx,sym@ha
as an equivalent to
addpcis rx,(sym-0f)@ha
0:
The patch also fixes overflow checking, which must test whether the
addi will overflow too since @l relocs don't have any overflow check.
Lastly, since I was poking at md_apply_fix, I arranged to have the
generic gas/write.c code emit errors for subtraction expressions where
we lack reloc support.
include/
* elf/ppc64.h (R_PPC64_16DX_HA): New. Expand fake reloc comment.
* elf/ppc.h (R_PPC_16DX_HA): Likewise.
bfd/
* reloc.c (BFD_RELOC_PPC_16DX_HA): New.
* elf64-ppc.c (ppc64_elf_howto_raw <R_PPC64_16DX_HA>): New howto.
(ppc64_elf_reloc_type_lookup): Translate new bfd reloc.
(ppc64_elf_ha_reloc): Correct overflow test on REL16DX_HA.
(ppc64_elf_relocate_section): Likewise.
* elf32-ppc.c (ppc_elf_howto_raw <R_PPC_16DX_HA>): New howto.
(ppc_elf_reloc_type_lookup): Translate new bfd reloc.
(ppc_elf_check_relocs): Handle R_PPC_16DX_HA to pacify gcc.
* libbfd.h: Regenerate.
* bfd-in2.h: Regenerate.
gas/
* config/tc-ppc.c (md_assemble): Use BFD_RELOC_PPC_16DX_HA for addpcis.
(md_apply_fix): Remove fx_subsy check. Move code converting to
pcrel reloc earlier and handle BFD_RELOC_PPC_16DX_HA. Remove code
emiiting errors on seeing fx_pcrel set on unexpected relocs, as
that is done now by the generic code via..
* config/tc-ppc.h (TC_FORCE_RELOCATION_SUB_LOCAL): ..this. Define.
(TC_VALIDATE_FIX_SUB): Define.
ld/
* testsuite/ld-powerpc/addpcis.d: Define ext1 and ext2 at
limits of addpcis range.
Add a new relocation that marks large-model entry code, for edit back
to medium-model.
include/elf/
* ppc64.h (R_PPC64_ENTRY): Define.
bfd/
* reloc.c (BFD_RELOC_PPC64_ENTRY): New.
* elf64-ppc.c (reloc_howto_type ppc64_elf_howto_raw): Add
entry for R_PPC64_ENTRY.
(LD_R2_0R12, ADD_R2_R2_R12, LIS_R2, ADDIS_R2_R12): Define.
(ppc64_elf_reloc_type_lookup): Handle R_PPC64_ENTRY.
(ppc64_elf_relocate_section): Edit code at R_PPC64_ENTTY. Use
new insn defines.
* libbfd.h: Regenerate.
* bfd-in2.h: Regenerate.
This adds support for "func@localentry", an expression that returns the
ELFv2 local entry point address of function "func". I've excluded
dynamic relocation support because that obviously would require glibc
changes.
include/elf/
* ppc64.h (R_PPC64_REL24_NOTOC, R_PPC64_ADDR64_LOCAL): Define.
bfd/
* elf64-ppc.c (ppc64_elf_howto_raw): Add R_PPC64_ADDR64_LOCAL entry.
(ppc64_elf_reloc_type_lookup): Support R_PPC64_ADDR64_LOCAL.
(ppc64_elf_check_relocs): Likewise.
(ppc64_elf_relocate_section): Likewise.
* Add BFD_RELOC_PPC64_ADDR64_LOCAL.
* bfd-in2.h: Regenerate.
* libbfd.h: Regenerate.
gas/
* config/tc-ppc.c (ppc_elf_suffix): Support @localentry.
(md_apply_fix): Support R_PPC64_ADDR64_LOCAL.
ld/testsuite/
* ld-powerpc/elfv2-2a.s, ld-powerpc/elfv2-2b.s: New files.
* ld-powerpc/elfv2-2exe.d, ld-powerpc/elfv2-2so.d: New files.
* ld-powerpc/powerpc.exp: Run new test.
elfcpp/
* powerpc.h (R_PPC64_REL24_NOTOC, R_PPC64_ADDR64_LOCAL): Define.
gold/
* powerpc.cc (Target_powerpc::Scan::local, global): Support
R_PPC64_ADDR64_LOCAL.
(Target_powerpc::Relocate::relocate): Likewise.
This removes the DT_PPC_TLSOPT/DT_PPC64_TLSOPT dynamic tag and replaces
it with DT_PPC_OPT/DT_PPC64_OPT tag to provide the same functionality
and more. This isn't backwards compatible, but the TLSOPT tag hasn't
been used since the tls optimisation support was never submitted to
glibc.
/include/elf/
* ppc.h (DT_PPC_TLSOPT): Delete.
(DT_PPC_OPT, PPC_OPT_TLS): Define.
* ppc64.h (DT_PPC64_TLSOPT): Delete.
(DT_PPC64_OPT, PPC64_OPT_TLS, PPC64_OPT_MULTI_TOC): Define.
bfd/
* elf32-ppc.c (ppc_elf_size_dynamic_sections): Use new DT_PPC_OPT
tag to specify tls optimisation.
* elf64-ppc.c (ppc64_elf_size_dynamic_sections): Likewise.
(ppc64_elf_finish_dynamic_sections): Specify whether multiple
toc pointers are used via DT_PPC64_OPT.
binutils/
* readelf.c (get_ppc_dynamic_type): Replace PPC_TLSOPT with PPC_OPT.
(get_ppc64_dynamic_type): Replace PPC64_TLSOPT with PPC64_OPT.
This defines the ELF symbol st_other field used to encode the number
of instructions between a function "global entry" and its "local entry",
and adds support related to the local entry offset.
include/elf/
* ppc64.h (STO_PPC64_LOCAL_BIT, STO_PPC64_LOCAL_MASK): Define.
(ppc64_decode_local_entry, ppc64_encode_local_entry): New functions.
(PPC64_LOCAL_ENTRY_OFFSET, PPC64_SET_LOCAL_ENTRY_OFFSET): Define.
bfd/
* elf64-ppc.c (struct ppc_stub_hash_entry): Add "other".
(stub_hash_newfunc): Init new ppc_stub_hash_entry field, and one
we forgot, "plt_ent".
(ppc64_elf_add_symbol_hook): Check ELFv1 objects don't have
st_other bits only valid in ELFv2.
(ppc64_elf_merge_symbol_attribute): New function.
(ppc_type_of_stub): Add local_off param to test branch range.
(ppc_build_one_stub): Adjust destinations for ELFv2 locals.
(ppc_size_one_stub, toc_adjusting_stub_needed): Similarly.
(ppc64_elf_size_stubs): Pass local_off to ppc_type_of_stub.
Set "other" field.
(ppc64_elf_relocate_section): Adjust destination for ELFv2 local
calls.
gas/
* config/tc-ppc.c (md_pseudo_table): Add .localentry.
(ppc_elf_localentry): New function.
(ppc_force_relocation): Force relocs on all branches to localenty
symbols.
(ppc_fix_adjustable): Don't reduce such symbols to section+offset.
binutils/
* readelf.c (get_ppc64_symbol_other): New function.
(get_symbol_other): Use it for EM_PPC64.
Defines bits in ELF e_flags to differentiate ELFv2 objects from ELFv2,
adds .abiversion directive to explicitly choose the ABI, and code to
check and automatically select ABI.
include/elf/
* ppc64.h (EF_PPC64_ABI): Define.
bfd/
* elf64-ppc.c (abiversion, set_abiversion): New functions.
(ppc64_elf_get_synthetic_symtab): Handle ELFv2 objects without .opd.
(struct ppc_link_hash_table): Add opd_abi.
(ppc64_elf_check_relocs): Check no .opd with ELFv2.
(ppc64_elf_merge_private_bfd_data): New function.
(ppc64_elf_print_private_bfd_data): New function.
(ppc64_elf_tls_setup): Set htab->opd_abi.
(ppc64_elf_size_dynamic_sections): Don't emit OPD related dynamic
tags for ELFv2.
(ppc_build_one_stub): Use R_PPC64_IRELATIVE for ELFv2 ifunc.
(ppc64_elf_finish_dynamic_symbol): Likewise
binutils/
* readelf.c (get_machine_flags): Display ABI version for EM_PPC64.
gas/
* config/tc-ppc.c: Include elf/ppc64.h.
(ppc_abiversion): New variable.
(md_pseudo_table): Add .abiversion.
(ppc_elf_abiversion, ppc_elf_end): New functions.
* config/tc-ppc.h (md_end): Define.
This changes the behaviour of @h and @ha on PowerPC64 to report errors
on 32-bit overflow. The motivation for this change is that on
PowerPC64, most uses of @h and @ha modifiers and their corresponding
relocations are to build up 32-bit offsets. We'd like to know when
such offsets overflow. Only rarely do people use @h or @ha with the
high 32-bit modifiers to build a 64-bit constant. Those uses will now
need to use two new modifiers, @high and @higha, if the constant isn't
known at assembly time. For now, we won't report overflow at assembly
time..
This also fixes an error when applying some of the HIGHER and HIGHEST
relocations.
include/elf/
* ppc64.h (R_PPC64_ADDR16_HIGH, R_PPC64_ADDR16_HIGHA,
R_PPC64_TPREL16_HIGH, R_PPC64_TPREL16_HIGHA,
R_PPC64_DTPREL16_HIGH, R_PPC64_DTPREL16_HIGHA): New.
(IS_PPC64_TLS_RELOC): Match new tls relocs.
bfd/
* reloc.c (BFD_RELOC_PPC64_ADDR16_HIGH, BFD_RELOC_PPC64_ADDR16_HIGHA,
BFD_RELOC_PPC64_TPREL16_HIGH, BFD_RELOC_PPC64_TPREL16_HIGHA,
BFD_RELOC_PPC64_DTPREL16_HIGH, BFD_RELOC_PPC64_DTPREL16_HIGHA): New.
* elf64-ppc.c (ppc64_elf_howto_raw): Add entries for new relocs.
Make all _HA and _HI relocs report signed overflow.
(ppc64_elf_reloc_type_lookup): Handle new relocs.
(must_be_dyn_reloc, ppc64_elf_check_relocs): Likewise.
(dec_dynrel_count, ppc64_elf_relocate_section): Likewise.
(ppc64_elf_relocate_section): Don't apply 0x8000 adjust to
R_PPC64_TPREL16_HIGHER, R_PPC64_TPREL16_HIGHEST,
R_PPC64_DTPREL16_HIGHER, and R_PPC64_DTPREL16_HIGHEST.
* libbfd.h: Regenerate.
* bfd-in2.h: Regenerate.
gas/
* config/tc-ppc.c (SEX16): Don't mask.
(REPORT_OVERFLOW_HI): Define as zero.
(ppc_elf_suffix): Support @high, @higha, @dtprel@high, @dtprel@higha,
@tprel@high, and @tprel@higha modifiers.
(md_assemble): Ignore X_unsigned when applying 16-bit insn fields.
Add (disabled) code to check @h and @ha reloc overflow for powerpc64.
Handle new relocs.
(md_apply_fix): Similarly.
elfcpp/
* powerpc.h (R_PPC64_ADDR16_HIGH, R_PPC64_ADDR16_HIGHA,
R_PPC64_TPREL16_HIGH, R_PPC64_TPREL16_HIGHA,
R_PPC64_DTPREL16_HIGH, R_PPC64_DTPREL16_HIGHA): Define.
gold/
* powerpc.cc (Target_powerpc::Scan::check_non_pic): Handle new relocs.
(Target_powerpc::Scan::global, local): Likewise.
(Target_powerpc::Relocate::relocate): Likewise. Check for overflow
on all ppc64 @h and @ha relocs.
* ppc.h (R_PPC_TLSGD, R_PPC_TLSLD): Add new relocs.
* ppc64.h (R_PPC64_TLSGD, R_PPC64_TLSLD): Add new relocs.
bfd/
* reloc.c (BFD_RELOC_PPC_TLSGD, BFD_RELOC_PPC_TLSLD): New.
* section.c (struct bfd_section): Add has_tls_get_addr_call.
(BFD_FAKE_SECTION): Init new flag.
* ecoff.c (bfd_debug_section): Likewise.
* bfd-in2.h: Regenerate.
* libbfd.h: Regenerate.
* elf32-ppc.c (ppc_elf_howto_raw): Add R_PPC_TLSGD and R_PPC_TLSLD.
(ppc_elf_reloc_type_lookup): Handle new relocs.
(ppc_elf_check_relocs): Set has_tls_get_addr_call on finding such
without marker relocs.
(ppc_elf_tls_optimize): Allow out-of-order __tls_get_addr relocs
if section has no old-style calls.
(ppc_elf_relocate_section): Set tls_mask for non-tls relocs too.
Don't try to optimize new-style __tls_get_addr call when handling
arg setup relocs. Instead do so for R_PPC_TLSGD and R_PPC_TLSLD
relocs.
* elf64-ppc.c (ppc64_elf_howto_raw): Add R_PPC64_TLSGD, R_PPC64_TLSLD.
(ppc64_elf_reloc_type_lookup): Handle new relocs.
(ppc64_elf_check_relocs): Set has_tls_get_addr_call on finding such
without marker relocs.
(ppc64_elf_tls_optimize): Allow out-of-order __tls_get_addr relocs
if section has no old-style calls. Set toc_ref for new relocs as
appropriate.
(ppc64_elf_relocate_section): Set tls_mask for non-tls relocs too.
Don't try to optimize new-style __tls_get_addr call when handling
arg setup relocs. Instead do so for R_PPC_TLSGD and R_PPC_TLSLD
relocs.
gas/
* config/tc-ppc.c (ppc_elf_suffix): Error if ppc32 tls got relocs
have non-zero addend.
(md_assemble): Parse args of __tls_get_addr calls.
(md_apply_fix): Handle BFD_RELOC_PPC_TLSGD and BFD_RELOC_PPC_TLSLD.
ld/testsuite/
* ld-powerpc/tlsmark.s, * ld-powerpc/tlsmark.d: New test.
* ld-powerpc/tlsmark32.s, * ld-powerpc/tlsmark32.d: New test.
* ld-powerpc/powerpc.exp: Run them.