Now that ISA3.1 is out we can finish with the powerxx silliness.
bfd/
* elf64-ppc.c: Rename powerxx to power10 throughout.
gas/
* config/tc-ppc.c (md_assemble): Update for PPC_OPCODE_POWER10
renaming.
* testsuite/gas/ppc/prefix-align.d: Use -mpower10/-Mpower10 in
place of -mfuture/-Mfuture.
* testsuite/gas/ppc/prefix-pcrel.d: Likewise.
* testsuite/gas/ppc/prefix-reloc.d: Likewise.
gold/
* powerpc.cc: Rename powerxx to power10 throughout.
include/
* elf/ppc64.h: Update comment.
* opcode/ppc.h (PPC_OPCODE_POWER10): Rename from PPC_OPCODE_POWERXX.
ld/
* testsuite/ld-powerpc/callstub-1.d: Use -mpower10/-Mpower10 in
place of -mfuture/-Mfuture.
* testsuite/ld-powerpc/notoc2.d: Likewise.
* testsuite/ld-powerpc/powerpc.exp: Likewise.
* testsuite/ld-powerpc/tlsgd.d: Likewise.
* testsuite/ld-powerpc/tlsie.d: Likewise.
* testsuite/ld-powerpc/tlsld.d: Likewise.
opcodes/
* ppc-dis.c (ppc_opts): Add "power10" entry.
(print_insn_powerpc): Update for PPC_OPCODE_POWER10 renaming.
* ppc-opc.c (POWER10): Rename from POWERXX. Update all uses.
This provides a linker generated __tls_get_addr_desc wrapper function
preserving registers around a __tls_get_addr call. The idea being to
support __tls_get_addr_desc without requiring a glibc update.
bfd/
* elf64-ppc.c (struct ppc_link_hash_table): Add tga_group.
(ppc64_elf_archive_symbol_lookup): Extract __tls_get_addr_opt for
__tls_get_addr_desc.
(ppc64_elf_size_stubs): Add section for linker generated
__tls_get_addr_desc wrapper function. Loop at least once if
generating this function.
(emit_tga_desc, emit_tga_desc_eh_frame): New functions.
(ppc64_elf_build_stubs): Generate __tls_get_addr_desc.
ld/
* testsuite/ld-powerpc/tlsdesc3.d,
* testsuite/ld-powerpc/tlsdesc3.wf,
* testsuite/ld-powerpc/tlsdesc4.d,
* testsuite/ld-powerpc/tlsdesc4.s,
* testsuite/ld-powerpc/tlsdesc4.wf: New tests.
* testsuite/ld-powerpc/powerpc.exp: Run them.
This implements register saving and restoring in the __tls_get_addr
call stub, so that when glibc supports the optimized tls call stub gcc
can generate code that assumes only r0, r12 and of course r3 are
changed on a __tls_get_addr call. When gcc expects __tls_get_addr
calls to preserve registers the call will be to __tls_get_addr_desc,
which will be translated by the linker to a call to __tls_get_addr_opt.
bfd/
* elf64-ppc.h (struct ppc64_elf_params): Add no_tls_get_addr_regsave.
* elf64-ppc.c (struct ppc_link_hash_table): Add tga_desc and
tga_desc_fd.
(is_tls_get_addr): Match tga_desc and tga_desc_df too.
(STDU_R1_0R1, ADDI_R1_R1): Define.
(tls_get_addr_prologue, tls_get_addr_epilogue): New functions.
(ppc64_elf_tls_setup): Set up tga_desc and tga_desc_fd. Indirect
tga_desc_fd to opt_fd, and tga_desc to opt. Set
no_tls_get_addr_regsave.
(branch_reloc_hash_match): Add hash3 and hash4.
(ppc64_elf_tls_optimize): Handle tga_desc_fd and tga_desc too.
(ppc64_elf_size_dynamic_sections): Likewise.
(ppc64_elf_relocate_section): Likewise.
(plt_stub_size, build_plt_stub): Likewise. Size regsave
__tls_get_addr stub.
(build_tls_get_addr_stub): Build regsave __tls_get_addr stub and
eh_frame.
(ppc_size_one_stub): Handle tga_desc_fd and tga_desc too. Size
eh_frame for regsave __tls_get_addr.
gas/
* config/tc-ppc.c (parse_tls_arg): Handle tls arg for
__tls_get_addr_desc and __tls_get_addr_opt.
ld/
* emultempl/ppc64elf.em (ppc64_opt, PARSE_AND_LIST_LONGOPTS),
(PARSE_AND_LIST_OPTIONS, PARSE_AND_LIST_ARGS_CASES): Support
--tls-get-addr-regsave and --no-tls-get-addr-regsave.
(params): Init new field.
* ld.texi (--tls-get-addr-regsave, --no-tls-get-addr-regsave):
Document.
* testsuite/ld-powerpc/tlsdesc.s,
* testsuite/ld-powerpc/tlsdesc.d,
* testsuite/ld-powerpc/tlsdesc.wf,
* testsuite/ld-powerpc/tlsdesc2.d,
* testsuite/ld-powerpc/tlsdesc2.wf,
* testsuite/ld-powerpc/tlsexenors.d,
* testsuite/ld-powerpc/tlsexenors.r,
* testsuite/ld-powerpc/tlsexers.d,
* testsuite/ld-powerpc/tlsexers.r,
* testsuite/ld-powerpc/tlsexetocnors.d,
* testsuite/ld-powerpc/tlsexetocrs.d,
* testsuite/ld-powerpc/tlsexetocrs.r,
* testsuite/ld-powerpc/tlsopt6.d,
* testsuite/ld-powerpc/tlsopt6.wf: New.
* testsuite/ld-powerpc/powerpc.exp: Run new tests.
This is the one that causes ld segfaults between 2019-10-04 and
2019-10-07. Bug introduced with f749f26eea, fixed by 93370e8e7b.
* testsuite/ld-powerpc/localgot.s,
* testsuite/ld-powerpc/localgot.d: New test.
* testsuite/ld-powerpc/powerpc.exp: Run it.
This patch adds some --no-tls-optimize tests and performs some of the
existing dynamic tests with tls markers in order to catch any
regression in PLT counting.
* testsuite/ld-powerpc/tlsexe.r: Adjust for added TLSMARK symbol.
* testsuite/ld-powerpc/tlsexe32.r: Likewise.
* testsuite/ld-powerpc/tlsso.r: Likewise.
* testsuite/ld-powerpc/tlsso32.r: Likewise.
* testsuite/ld-powerpc/tls32no.d,
* testsuite/ld-powerpc/tls32no.g: New test files.
* testsuite/ld-powerpc/tlsexe32no.d,
* testsuite/ld-powerpc/tlsexe32no.g,
* testsuite/ld-powerpc/tlsexe32no.r: New test files.
* testsuite/ld-powerpc/tlsexeno.d,
* testsuite/ld-powerpc/tlsexeno.g,
* testsuite/ld-powerpc/tlsexeno.r: New test files.
* testsuite/ld-powerpc/tlsexetocno.d,
* testsuite/ld-powerpc/tlsexetocno.g: New test files.
* testsuite/ld-powerpc/tlsno.d,
* testsuite/ld-powerpc/tlsno.g: New test files.
* testsuite/ld-powerpc/tlstocno.d,
* testsuite/ld-powerpc/tlstocno.g: New test files.
* testsuite/ld-powerpc/powerpc.exp: Run new tests.
This patch supports using pcrel instructions in TLS code sequences. A
number of new relocations are needed, gas operand modifiers to
generate those relocations, and new TLS optimisation. For
optimisation it turns out that the new pcrel GD and LD sequences can
be distinguished from the non-pcrel GD and LD sequences by there being
different relocations on the new sequence. The final "add ra,rb,13"
on IE sequences similarly needs a new relocation, or as I chose, a
modification of R_PPC64_TLS. On pcrel IE code, the R_PPC64_TLS points
one byte into the "add" instruction rather than being on the
instruction boundary.
GD:
pla 3,z@got@tlsgd@pcrel # R_PPC64_GOT_TLSGD34
bl __tls_get_addr@notoc(z@tlsgd) # R_PPC64_TLSGD and R_PPC64_REL24_NOTOC
edited to IE
pld 3,z@got@tprel@pcrel
add 3,3,13
edited to LE
paddi 3,13,z@tprel
nop
LD:
pla 3,z@got@tlsld@pcrel # R_PPC64_GOT_TLSLD34
bl __tls_get_addr@notoc(z@tlsld) # R_PPC64_TLSLD and R_PPC64_REL24_NOTOC
..
paddi 9,3,z2@dtprel
pld 10,z3@got@dtprel@pcrel
add 10,10,3
edited to LE
paddi 3,13,0x1000
nop
IE:
pld 9,z@got@tprel@pcrel # R_PPC64_GOT_TPREL34
add 3,9,z@tls@pcrel # R_PPC64_TLS at insn+1
ldx 4,9,z@tls@pcrel
lwax 5,9,z@tls@pcrel
stdx 5,9,z@tls@pcrel
edited to LE
paddi 9,13,z@tprel
nop
ld 4,0(9)
lwa 5,0(9)
std 5,0(9)
LE:
paddi 10,13,z@tprel
include/
* elf/ppc64.h (R_PPC64_TPREL34, R_PPC64_DTPREL34),
(R_PPC64_GOT_TLSGD34, R_PPC64_GOT_TLSLD34),
(R_PPC64_GOT_TPREL34, R_PPC64_GOT_DTPREL34): Define.
(IS_PPC64_TLS_RELOC): Include new tls relocs.
bfd/
* reloc.c (BFD_RELOC_PPC64_TPREL34, BFD_RELOC_PPC64_DTPREL34),
(BFD_RELOC_PPC64_GOT_TLSGD34, BFD_RELOC_PPC64_GOT_TLSLD34),
(BFD_RELOC_PPC64_GOT_TPREL34, BFD_RELOC_PPC64_GOT_DTPREL34),
(BFD_RELOC_PPC64_TLS_PCREL): New pcrel tls relocs.
* elf64-ppc.c (ppc64_elf_howto_raw): Add howtos for pcrel tls relocs.
(ppc64_elf_reloc_type_lookup): Translate pcrel tls relocs.
(must_be_dyn_reloc, dec_dynrel_count): Add R_PPC64_TPREL64.
(ppc64_elf_check_relocs): Support pcrel tls relocs.
(ppc64_elf_tls_optimize, ppc64_elf_relocate_section): Likewise.
* bfd-in2.h: Regenerate.
* libbfd.h: Regenerate.
gas/
* config/tc-ppc.c (ppc_elf_suffix): Map "tls@pcrel", "got@tlsgd@pcrel",
"got@tlsld@pcrel", "got@tprel@pcrel", and "got@dtprel@pcrel".
(fixup_size, md_assemble): Handle pcrel tls relocs.
(ppc_force_relocation, ppc_fix_adjustable): Likewise.
(md_apply_fix, tc_gen_reloc): Likewise.
ld/
* testsuite/ld-powerpc/tlsgd.d,
* testsuite/ld-powerpc/tlsgd.s,
* testsuite/ld-powerpc/tlsie.d,
* testsuite/ld-powerpc/tlsie.s,
* testsuite/ld-powerpc/tlsld.d,
* testsuite/ld-powerpc/tlsld.s: New tests.
* testsuite/ld-powerpc/powerpc.exp: Run them.
Just making room for a new tlsld test.
* testsuite/ld-powerpc/tlsldopt.d: Rename from tlsld.d.
* testsuite/ld-powerpc/tlsldopt.s: Rename from tlsld.s.
* testsuite/ld-powerpc/tlsldopt32.d: Rename from tlsld32.d.
* testsuite/ld-powerpc/tlsldopt32.s: Rename from tlsld32.s.
* testsuite/ld-powerpc/powerpc.exp: Update.
bfd/
* elf64-ppc.c (ppc64_elf_check_relocs): Set has_gotrel for
R_PPC64_GOT_PCREL34.
(xlate_pcrel_opt): New function.
(ppc64_elf_edit_toc): Handle R_PPC64_GOT_PCREL34.
(ppc64_elf_relocate_section): Edit GOT indirect to GOT relative
for R_PPC64_GOT_PCREL34. Implement R_PPC64_PCREL_OPT optimisation.
ld/
* testsuite/ld-powerpc/pcrelopt.s,
* testsuite/ld-powerpc/pcrelopt.d,
* testsuite/ld-powerpc/pcrelopt.sec: New test.
* testsuite/ld-powerpc/powerpc.exp: Run it.
IFUNC resolvers must always be called via their global entry point.
They will be called from ld.so rather than from the local executable.
PR 23937
bfd/
* elf64-ppc.c (write_plt_relocs_for_local_syms): Don't add local
entry offset for ifuncs.
ld/
* testsuite/ld-powerpc/pr23937.d,
* testsuite/ld-powerpc/pr23937.s: New test.
* testsuite/ld-powerpc/powerpc.exp: Run it.
This patch generates EH info for the new _notoc linkage stubs, to
support unwinding from asynchronous signal handlers. Unwinding
through the __tls_get_addr_opt stub was already supported, but that
was just a single stub. With multiple stubs the EH opcodes need to be
emitted and sized when iterating over stubs, so this is done when
emitting and sizing the stub code. Emitting the CIEs and FDEs is done
when sizing the stubs, as we did before in order to have the linker
generated FDEs indexed in .eh_frame_hdr. I moved the final tweaks to
FDEs from ppc64_elf_finish_dynamic_sections to ppc64_elf_build_stubs
simply because it's tidier to be done with them at that point.
bfd/
* elf64-ppc.c (struct map_stub): Delete tls_get_addr_opt_bctrl.
Add lr_restore, eh_size and eh_base.
(eh_advance, eh_advance_size): New functions.
(build_tls_get_addr_stub): Emit EH info for stub.
(ppc_build_one_stub): Likewise for _notoc stubs.
(ppc_size_one_stub): Size EH info for stub.
(group_sections): Init new map_stub fields.
(stub_eh_frame_size): Delete.
(ppc64_elf_size_stubs): Size EH info for stubs. Set up dummy EH
program for stubs.
(ppc64_elf_build_stubs): Reinit new map_stub fields. Set FDE
offset to stub section here..
(ppc64_elf_finish_dynamic_sections): ..rather than here.
ld/
* testsuite/ld-powerpc/notoc.s: Generate some cfi.
* testsuite/ld-powerpc/notoc.d: Adjust.
* testsuite/ld-powerpc/notoc.wf: New file.
* testsuite/ld-powerpc/powerpc.exp: Run "ext" and "notoc" tests
as run_ld_link_tests rather than run_dump_test.
R_PPC64_REL24_NOTOC is used on calls like "bl foo@notoc" to tell the
linker that linkage stubs for PLT calls or long branches can't use r2
for pic addressing. Instead, new stubs that generate pc-relative
addresses are used. One complication is that pc-relative offsets to
the PLT may need to be 64-bit in large programs, in contrast to the
toc-relative addressing used by older PLT linkage stubs where a 32-bit
offset is sufficient until the PLT itself exceeds 2G in size.
.eh_frame info to cover the _notoc stubs is yet to be implemented.
bfd/
* elf64-ppc.c (ADDI_R12_R11, ADDI_R12_R12, LIS_R12),
(ADDIS_R12_R11, ORIS_R12_R12_0, ORI_R12_R12_0),
(SLDI_R12_R12_32, LDX_R12_R11_R12, ADD_R12_R11_R12): Define.
(ppc64_elf_howto_raw): Add R_PPC64_REL24_NOTOC entry.
(ppc64_elf_reloc_type_lookup): Support R_PPC64_REL24_NOTOC.
(ppc_stub_type): Add ppc_stub_long_branch_notoc,
ppc_stub_long_branch_both, ppc_stub_plt_branch_notoc,
ppc_stub_plt_branch_both, ppc_stub_plt_call_notoc, and
ppc_stub_plt_call_both.
(is_branch_reloc): Add R_PPC64_REL24_NOTOC.
(build_offset, size_offset): New functions.
(plt_stub_size): Support plt_call_notoc and plt_call_both.
(ppc_build_one_stub, ppc_size_one_stub): Support new stubs.
(toc_adjusting_stub_needed): Handle R_PPC64_REL24_NOTOC.
(ppc64_elf_size_stubs): Likewise, and new stubs.
(ppc64_elf_build_stubs, ppc64_elf_relocate_section): Likewise.
* reloc.c: Add BFD_RELOC_PPC64_REL24_NOTOC.
* bfd-in2.h: Regenerate.
* libbfd.h: Regenerate.
gas/
* config/tc-ppc.c (ppc_elf_suffix): Support @notoc.
(ppc_force_relocation, ppc_fix_adjustable): Handle REL24_NOTOC.
ld/
* testsuite/ld-powerpc/ext.d,
* testsuite/ld-powerpc/ext.s,
* testsuite/ld-powerpc/ext.lnk,
* testsuite/ld-powerpc/notoc.d,
* testsuite/ld-powerpc/notoc.s: New tests.
* testsuite/ld-powerpc/powerpc.exp: Run them.
The modified test failed on some powerpc targets due to differences in
default hash style. If the default hash style is both, then more
sections are created, bumping section ids. Section id is used in stub
symbols and although the test is careful to not depend on id in
labels, the stub hash traversal order changes when stub names change.
That lead to the stubs being emitted in a different order and thus not
matching expected output.
* testsuite/ld-powerpc/powerpc.exp: Run tlsopt5 with hash-style
specified.
This patch sets stub_offset in ppc_size_one_stub rather than in
ppc_build_one_stub. That allows the plt stub alignment to be done in
just ppc_size_one_stub rather than both functions. The patch also
corrects the place where the alignment was done, fixing a possible
error in .eh_frame data, and tidies some offset calculations.
bfd/
* elf64-ppc.c (plt_stub_pad): Delay plt_stub_size call until needed.
(ppc_build_one_stub): Don't set stub_offset, instead assert that
it is sane. Don't adjust stub_offset for alignment. Adjust size
calculation. Use "targ" temp when calculating offsets.
(ppc_size_one_stub): Set stub_offset here. Use "targ" temp when
calculating offsets. Adjust for alignment before setting
tls_get_addr_opt_bctrl.
ld/
* testsuite/ld-powerpc/powerpc.exp: Run tlsopt5 with plt alignment.
* testsuite/ld-powerpc/tlsopt5.s: Add extra call.
* testsuite/ld-powerpc/tlsopt5.wf: Adjust expected output.
* testsuite/ld-powerpc/tlsopt5.d: Likewise.
One of the ill effects of ld -r is to mash together sections. That
can result in reduced icache performance at runtime due to unexpected
movement of code. Another problem is that sections can become too
large to link on targets that have limited relative addressing. ld -r
--relax attempts to overcome the large section problem for branches by
inserting trampolines, but the powerpc support added lots of
unnecessary trampolines. This patch trims them somewhat.
bfd/
* elf32-ppc.c (ppc_elf_relax_section): Ignore common or undef locals.
Avoid trashing toff with added when used as a symbol index.
Ignore R_PPC_PLTREL24 addends in unused example code. Avoid
creating unnecessary fixups when relocatable.
ld/
* testsuite/ld-powerpc/big.s: New file.
* testsuite/ld-powerpc/relaxrl.d: New test.
* testsuite/ld-powerpc/powerpc.exp: Run new test.
* testsuite/ld-powerpc/relaxr.d: Adjust.
This reverts most of commit 1be5d8d3bb.
Left in place are addition of --no-plt-align to some ppc32 ld tests
and the ld.texinfo --no-plt-thread-safe fix.
This changes the PowerPC64 --plt-align option to perform the usual
alignment of code as suggested by its name, as well as the previous
behaviour of padding so as to reduce boundary crossing. The old
behaviour is had by using a negative parameter.
The default is also changed to align plt stub code by default to 32
byte boundaries, the point being to get better bctr branch prediction
on power8 and power9 hardware.
bfd/
* elf64-ppp.c (plt_stub_pad): Handle positive and negative
plt_stub_align.
ld/
* ld.texinfo (--plt-align): Describe new behaviour of option.
* emultempl/ppc64elf.em (params): Default plt_stub_align to 5.
* testsuite/ld-powerpc/powerpc.exp: Pass --no-plt-align for
selected tests.
* testsuite/ld-powerpc/relbrlt.d: Pass --no-plt-align.
* testsuite/ld-powerpc/elfv2so.d: Adjust expected output.
In the TLS GD/LD to LE optimization, ld replaces a sequence like
addi 3,2,x@got@tlsgd R_PPC64_GOT_TLSGD16 x
bl __tls_get_addr(x@tlsgd) R_PPC64_TLSGD x
R_PPC64_REL24 __tls_get_addr
nop
with
addis 3,13,x@tprel@ha R_PPC64_TPREL16_HA x
addi 3,3,x@tprel@l R_PPC64_TPREL16_LO x
nop
When the tprel offset is small, this can be further optimized to
nop
addi 3,13,x@tprel
nop
bfd/
* elf64-ppc.c (struct ppc_link_hash_table): Add do_tls_opt.
(ppc64_elf_tls_optimize): Set it.
(ppc64_elf_relocate_section): Nop addis on TPREL16_HA, and convert
insn on TPREL16_LO and TPREL16_LO_DS relocs to use r13 when
addis would add zero.
* elf32-ppc.c (struct ppc_elf_link_hash_table): Add do_tls_opt.
(ppc_elf_tls_optimize): Set it.
(ppc_elf_relocate_section): Nop addis on TPREL16_HA, and convert
insn on TPREL16_LO relocs to use r2 when addis would add zero.
gold/
* powerpc.cc (Target_powerpc::Relocate::relocate): Nop addis on
TPREL16_HA, and convert insn on TPREL16_LO and TPREL16_LO_DS
relocs to use r2/r13 when addis would add zero.
ld/
* testsuite/ld-powerpc/tls.s: Add calls with tls markers.
* testsuite/ld-powerpc/tls32.s: Likewise.
* testsuite/ld-powerpc/powerpc.exp: Run tls marker tests.
* testsuite/ld-powerpc/tls.d: Adjust for TPREL16_HA/LO optimization.
* testsuite/ld-powerpc/tlsexe.d: Likewise.
* testsuite/ld-powerpc/tlsexetoc.d: Likewise.
* testsuite/ld-powerpc/tlsld.d: Likewise.
* testsuite/ld-powerpc/tlsmark.d: Likewise.
* testsuite/ld-powerpc/tlsopt4.d: Likewise.
* testsuite/ld-powerpc/tlstoc.d: Likewise.
Since the __tls_get_addr_opt stub saves LR and makes a call, eh_frame
info should be generated to describe how to unwind through the stub.
The patch also changes the way the backend iterates over stubs, from
looking at all sections in stub_bfd to which all dynamic sections are
attached as well, to iterating over the group list, which gets just
the stub sections. Most binaries will have just one or two stub
groups, so this is a little faster.
bfd/
* elf64-ppc.c (struct map_stub): Add tls_get_addr_opt_bctrl.
(stub_eh_frame_size): New function.
(ppc_size_one_stub): Set group tls_get_addr_opt_bctrl.
(group_sections): Init group tls_get_addr_opt_bctrl.
(ppc64_elf_size_stubs): Update sizing and initialization of
.eh_frame. Iteration over stubs via group list.
(ppc64_elf_build_stubs): Iterate over stubs via group list.
(ppc64_elf_finish_dynamic_sections): Update finalization of
.eh_frame.
ld/
* testsuite/ld-powerpc/tlsopt5.s: Add cfi.
* testsuite/ld-powerpc/tlsopt5.d: Update.
* testsuite/ld-powerpc/tlsopt5.wf: New file.
* testsuite/ld-powerpc/powerpc.exp: Perform new tlsopt5 test.
These all were odd in that they used r13 as the GOT pointer. That
didn't matter for the purpose of testing, but would never occur in
practice. Also, the tlsopt5 tests could have their global dynamic
sequences optimized to initial exec, so link with -shared.
* testsuite/ld-powerpc/powerpc.exp: Add -shared to tlsop5 tests.
* testsuite/ld-powerpc/tlsopt5.d: Adjust.
* testsuite/ld-powerpc/tlsopt1_32.s: Use r30 as GOT pointer.
* testsuite/ld-powerpc/tlsopt2_32.s: Likewise.
* testsuite/ld-powerpc/tlsopt3_32.s: Likewise.
* testsuite/ld-powerpc/tlsopt4_32.s: Likewise.
* testsuite/ld-powerpc/tlsopt5_32.s: Rewrite.
* testsuite/ld-powerpc/tlsopt1_32.d: Adjust.
* testsuite/ld-powerpc/tlsopt2_32.d: Adjust.
* testsuite/ld-powerpc/tlsopt3_32.d: Adjust.
* testsuite/ld-powerpc/tlsopt5_32.d: Adjust.
ELFv2 functions with localentry:0 are those with a single entry point,
ie. global entry == local entry, and that have no requirement on r2 or
r12, and guarantee r2 is unchanged on return. Such an external
function can be called via the PLT without saving r2 or restoring it
on return, avoiding a common load-hit-store for small functions. The
optimization is attractive. The TOC pointer load-hit-store is a major
reason why calls to small functions that need no register saves, or
with shrink-wrap, no register saves on a fast path, are slow on
powerpc64le.
To be safe, this optimization needs ld.so support to check that the
run-time matches link-time function implementation. If a function
in a shared library with st_other localentry non-zero is called
without saving and restoring r2, r2 will be trashed on return, leading
to segfaults. For that reason the optimization does not happen for
weak functions since a weak definition is a fairly solid hint that the
function will likely be overridden. I'm also not enabling the
optimization by default unless glibc-2.26 is detected, which should
have the ld.so checks implemented.
bfd/
* elf64-ppc.c (struct ppc_link_hash_table): Add has_plt_localentry0.
(ppc64_elf_merge_symbol_attribute): Merge localentry bits from
dynamic objects.
(is_elfv2_localentry0): New function.
(ppc64_elf_tls_setup): Default params->plt_localentry0.
(plt_stub_size): Adjust size for tls_get_addr_opt stub.
(build_tls_get_addr_stub): Use a simpler stub when r2 is not saved.
(ppc64_elf_size_stubs): Leave stub_type as ppc_stub_plt_call for
optimized localentry:0 stubs.
(ppc64_elf_build_stubs): Save r2 in ELFv2 __glink_PLTresolve.
(ppc64_elf_relocate_section): Leave nop unchanged for optimized
localentry:0 stubs.
(ppc64_elf_finish_dynamic_sections): Set PPC64_OPT_LOCALENTRY in
DT_PPC64_OPT.
* elf64-ppc.h (struct ppc64_elf_params): Add plt_localentry0.
include/
* elf/ppc64.h (PPC64_OPT_LOCALENTRY): Define.
ld/
* emultempl/ppc64elf.em (params): Init plt_localentry0 field.
(enum ppc64_opt): New, replacing OPTION_* defines. Add
OPTION_PLT_LOCALENTRY, and OPTION_NO_PLT_LOCALENTRY.
(PARSE_AND_LIST_*): Support --plt-localentry and --no-plt-localentry.
* testsuite/ld-powerpc/elfv2so.d: Update.
* testsuite/ld-powerpc/powerpc.exp (TLS opt 5): Use --no-plt-localentry.
* testsuite/ld-powerpc/tlsopt5.d: Update.
Recognize power9 and a few other insns from older machines. Fixes
linker complaints like "toc optimization is not supported for
0xf4090002 instruction". 0xf4090002 is stxsd v0,0(r9)
bfd/
* elf64-ppc.c (ok_lo_toc_insn): Add r_type param. Recognize
lq,lfq,lxv,lxsd,lxssp,lfdp,stq,stfq,stxv,stxsd,stxssp,stfdp.
Don't match lmd and stmd.
ld/
* testsuite/ld-powerpc/tocopt7.s,
* testsuite/ld-powerpc/tocopt7.out,
* testsuite/ld-powerpc/tocopt7.d: New test.
* testsuite/ld-powerpc/tocopt8.s,
* testsuite/ld-powerpc/tocopt8.d: New test.
* testsuite/ld-powerpc/powerpc.exp: Run them.
Lots of fixes for the compatibility code that handles linking of
-mcall-aixdesc code (or that generated by 12 year old gcc) with
current ELFv1 ABI code.
1) A reference to a dot-symbol in an object file wasn't satisfied by a
function descriptor in later object files.
2) The as-needed code had bit-rotted; Shared libs now need a strong
reference to be counted as needed.
3) --gc-sections involving dot-symbols was broken, needing
func_desc_adjust to be run early and lots of other fixes.
bfd/
* elf64-ppc.c (struct ppc_link_hash_entry): Delete "was_undefined".
(struct ppc_link_hash_table): Delete "twiddled_syms". Add
"need_func_desc_adj".
(lookup_fdh): Link direct fdh sym via oh field and set flags.
(make_fdh): Make strong and weak undefined function descriptor
symbols.
(ppc64_elf_merge_symbol): New function.
(elf_backend_merge_symbol): Define.
(ppc64_elf_archive_symbol_lookup): Don't test undefweak for fake
function descriptors.
(add_symbol_adjust): Don't twiddle symbols to undefweak.
Propagate more ref flags to function descriptor symbol. Make
some function descriptor symbols dynamic.
(ppc64_elf_before_check_relocs): Only run add_symbol_adjust for
ELFv1. Set need_func_desc_adj. Don't fix undefs list.
(ppc64_elf_check_relocs): Set non_ir_ref for descriptors.
Don't call lookup_fdh here.
(ppc64_elf_gc_sections): New function.
(bfd_elf64_bfd_gc_sections): Define.
(ppc64_elf_gc_mark_hook): Mark descriptor.
(func_desc_adjust): Don't make fake function descriptor syms strong
here. Exit earlier on non-dotsyms. Take note of elf.dynamic
flag when deciding whether a dynamic function descriptor might
be needed. Transfer elf.dynamic and set elf.needs_plt. Move
plt regardless of visibility. Make descriptor dynamic if
entry sym is dynamic, not for other cases.
(ppc64_elf_func_desc_adjust): Don't run func_desc_adjust if
already done.
(ppc64_elf_edit_opd): Use oh field rather than lookup_fdh.
(ppc64_elf_size_stubs): Likewise.
(ppc_build_one_stub): Don't clear was_undefined. Only set sym
undefweak if stub symbol is defined.
(undo_symbol_twiddle, ppc64_elf_restore_symbols): Delete.
* elf64-ppc.h (ppc64_elf_restore_symbols): Don't declare.
ld/
* emultempl/ppc64elf.em (gld${EMULATION_NAME}_finish): Don't call
ppc64_elf_restore_symbols.
* testsuite/ld-powerpc/dotsym1.d: New.
* testsuite/ld-powerpc/dotsym2.d: New.
* testsuite/ld-powerpc/dotsym3.d: New.
* testsuite/ld-powerpc/dotsym4.d: New.
* testsuite/ld-powerpc/dotsymref.s: New.
* testsuite/ld-powerpc/nodotsym.s: New.
* testsuite/ld-powerpc/powerpc.exp: Run new tests.
This patch extends Tag_GNU_Power_ABI_FP to cover long double ABIs,
makes the assembler warn about undefined tag values, and removes
similar warnings from the linker. I think it is better to not
warn in the linker about undefined tag values as future extensions to
the tags then won't result in likely bogus warnings. This is
consistent with the fact that an older linker won't warn on an
entirely new tag.
include/
* elf/ppc.h (Tag_GNU_Power_ABI_FP): Comment.
bfd/
* elf-bfd.h (_bfd_elf_ppc_merge_fp_attributes): Declare.
* elf32-ppc.c (_bfd_elf_ppc_merge_fp_attributes): New function.
(ppc_elf_merge_obj_attributes): Use it. Don't copy first file
attributes, merge them. Don't warn about undefined tag bits,
or copy unknown values to output.
* elf64-ppc.c (ppc64_elf_merge_private_bfd_data): Call
_bfd_elf_ppc_merge_fp_attributes.
binutils/
* readelf.c (display_power_gnu_attribute): Catch truncated section
for all powerpc attributes. Display long double ABI. Don't
capitalize words, except for names. Show known bits of tag values
when some unknown bits are present. Whitespace fixes.
gas/
* config/tc-ppc.c (ppc_elf_gnu_attribute): New function.
(md_pseudo_table <ELF>): Handle "gnu_attribute".
ld/
* testsuite/ld-powerpc/attr-gnu-4-4.s: Delete.
* testsuite/ld-powerpc/attr-gnu-4-14.d: Delete.
* testsuite/ld-powerpc/attr-gnu-4-24.d: Delete.
* testsuite/ld-powerpc/attr-gnu-4-34.d: Delete.
* testsuite/ld-powerpc/attr-gnu-4-41.d: Delete.
* testsuite/ld-powerpc/attr-gnu-4-32.d: Adjust expected warning.
* testsuite/ld-powerpc/attr-gnu-8-23.d: Likewise.
* testsuite/ld-powerpc/attr-gnu-4-01.d: Adjust expected output.
* testsuite/ld-powerpc/attr-gnu-4-02.d: Likewise.
* testsuite/ld-powerpc/attr-gnu-4-03.d: Likewise.
* testsuite/ld-powerpc/attr-gnu-4-10.d: Likewise.
* testsuite/ld-powerpc/attr-gnu-4-11.d: Likewise.
* testsuite/ld-powerpc/attr-gnu-4-20.d: Likewise.
* testsuite/ld-powerpc/attr-gnu-4-22.d: Likewise.
* testsuite/ld-powerpc/attr-gnu-4-33.d: Likewise.
* testsuite/ld-powerpc/attr-gnu-8-11.d: Likewise.
* testsuite/ld-powerpc/powerpc.exp: Don't run deleted tests.
VLE is an encoding, not a particular processor architecture, so it
isn't really proper to select insns based on PPC_OPCODE_VLE. For
example
{"evaddw", VX (4, 512), VX_MASK, PPCSPE|PPCVLE, PPCNONE, {RS, RA, RB}},
{"vaddubs", VX (4, 512), VX_MASK, PPCVEC|PPCVLE, PPCNONE, {VD, VA, VB}},
shows two insns that have the same encoding, both available with VLE.
Enabling both with VLE means we can't disassemble the second variant
even if -Maltivec is given rather than -Mspe. Also, we don't check
user assembly against the processor type as well as we could.
Another problem is that when using the VLE encoding, insns from the
main ppc opcode table are not available, except those using opcode 4
and 31. Correcting this revealed two errors in the ld testsuite,
use of "nop" and "rfmci" when -mvle.
This patch fixes those problems in the opcode table, and removes
PPCNONE. I find a plain 0 distracts less from other values.
In addition, I've implemented code to recognize some machine values
from the apuinfo note present in ppc32 objects. It's not a complete
disambiguation since we're lacking info to detect newer chips, but
what we have should help with disassembly.
include/
* elf/ppc.h (APUINFO_SECTION_NAME, APUINFO_LABEL, PPC_APUINFO_ISEL,
PPC_APUINFO_PMR, PPC_APUINFO_RFMCI, PPC_APUINFO_CACHELCK,
PPC_APUINFO_SPE, PPC_APUINFO_EFS, PPC_APUINFO_BRLOCK,
PPC_APUINFO_VLE: Define.
opcodes/
* ppc-dis.c (ppc_opts): Delete extraneous parentheses. Default
cpu for "vle" to e500.
* ppc-opc.c (ALLOW8_SPRG): Remove PPC_OPCODE_VLE.
(NO371, PPCSPE, PPCISEL, PPCEFS, MULHW, DCBT_EO): Likewise.
(PPCNONE): Delete, substitute throughout.
(powerpc_opcodes): Remove PPCVLE from "flags". Add to "deprecated"
except for major opcode 4 and 31.
(vle_opcodes <se_rfmci>): Add PPCRFMCI to flags.
bfd/
* cpu-powerpc.c (powerpc_compatible): Allow bfd_mach_ppc_vle entry
to match other 32-bit archs.
* elf32-ppc.c (_bfd_elf_ppc_set_arch): New function.
(ppc_elf_object_p): Call it.
(ppc_elf_special_sections): Use APUINFO_SECTION_NAME. Fix
overlong line.
(APUINFO_SECTION_NAME, APUINFO_LABEL): Don't define here.
* elf64-ppc.c (ppc64_elf_object_p): Call _bfd_elf_ppc_set_arch.
* bfd-in.h (_bfd_elf_ppc_at_tls_transform,
_bfd_elf_ppc_at_tprel_transform): Move to..
* elf-bfd.h: ..here.
(_bfd_elf_ppc_set_arch): Declare.
* bfd-in2.h: Regenerate.
gas/
* config/tc-ppc.c (PPC_APUINFO_ISEL, PPC_APUINFO_PMR,
PPC_APUINFO_RFMCI, PPC_APUINFO_CACHELCK, PPC_APUINFO_SPE,
PPC_APUINFO_EFS, PPC_APUINFO_BRLOCK, PPC_APUINFO_VLE): Don't define.
(ppc_setup_opcodes): Check vle disables powerpc_opcodes overridden
by vle_opcodes, and that vle flag doesn't enable opcodes. Don't
add vle_opcodes twice.
(ppc_cleanup): Use APUINFO_SECTION_NAME and APUINFO_LABEL.
ld/
* testsuite/ld-powerpc/apuinfo1.s: Delete nop.
* testsuite/ld-powerpc/apuinfo-vle2.s: New.
* testsuite/ld-powerpc/powerpc.exp: Use apuinfo-vle2.s.
bfd/
* elf64-ppc.c (toc_adjusting_stub_needed): Use the symbol value
plus addend rather than the original st_value when looking up
entries in opd->adjust.
ld/testsuite/
* ld-powerpc/tocopt6-inc.s, ld-powerpc/tocopt6a.s,
ld-powerpc/tocopt6b.s, ld-powerpc/tocopt6c.s,
ld-powerpc/tocopt6.d: New test.
* ld-powerpc/powerpc.exp (ppc64elftests): Add it.
When building a shared lib from non-PIC objects, we'll get dynamic
text relocations. These need to move with any insns we move.
Otherwise the dynamic reloc will modify the branch, resulting in
crashes and other unpleasant behaviour.
Also, ld -r --ppc476-workaround used with sufficiently aligned PIC
objects needs a fix for emitted REL16 relocs.
bfd/
* elf64-ppc.c (ppc_elf_relocate_section): Move dynamic text
relocs with insns moved by --ppc476-workaround. Correct
output of REL16 relocs.
ld/testsuite/
* ld-powerpc/ppc476-shared.s,
* ld-powerpc/ppc476-shared.lnk,
* ld-powerpc/ppc476-shared.d,
* ld-powerpc/ppc476-shared2.d: New tests.
* ld-powerpc/powerpc.exp: Run them.
The linker hardcoded r3 into a local-dynamic to local-exec TLS
optimization sequence. This is normally the case since r3 is required
as a parameter to (the optimized out) __tls_get_addr call. However,
it is possible for a compiler, LLVM in this case, to set up the
parameter value in another register then copy it to r3 before the
call.
When fixing this problem, I noticed that ppc32 had another bug when
optimizing away one of the TLS insns to a nop.
The patch also tidies a mask used by global-dynamic to initial-exec
TLS optimization, to just select the fields needed. Leaving the
offset in the instruction wasn't a bug since it will be overwritten
anyway.
bfd/
* elf64-ppc.c (ppc64_elf_relocate_section): Correct GOT_TLSLD
optimization. Tidy mask for GOT_TLSGD optimization.
* elf32-ppc.c (ppc_elf_relocate_section): Likewise. Correct
location of nop zapping high insn too.
ld/testsuite/
* ld-powerpc/tlsld.d, * ld-powerpc/tlsld.s: New test.
* ld-powerpc/tlsld32.d, * ld-powerpc/tlsld32.s: New test.
* ld-powerpc/powerpc.exp: Run them. Move tocvar and tocnovar.
The changes to reorder sections for better relro protection on powerpc64,
3e2b0f31, 23283c1b, and 5ad18f16, run into a problem with xlc.
xlc -qdatalocal puts global variables into .toc, which means that .toc
must be writable. The simplest way to accomplish this is to edit the
linker script to remove .toc sections from .got on detecting xlc object
files.
bfd/
* elf64-ppc.h (struct ppc64_elf_params): Add "object_in_toc".
* elf64-ppc.c (ppc64_elf_add_symbol_hook): Assume that global symbols
in .toc indicate xlc compiled code that might require a rw .toc.
ld/
* emulparams/elf64ppc.sh (INITIAL_READWRITE_SECTIONS): Define.
* emultempl/ppc64elf.em (params): Init new field.
(ppc_after_open): New function.
(LDEMUL_AFTER_OPEN): Define.
* ldlang.c (lang_final): Whitespace fix.
ld/testsuite/
* ld-powerpc/tocvar.d, * ld-powerpc/tocvar.s: New test.
* ld-powerpc/tocnovar.d, * ld-powerpc/tocnovar.s: New test.
* ld-powerpc/powerpc.exp: Run tocvar and tocnovar.
Trying to use the SEC_LINKER_CREATED section flag to determine whether
a symbol is linker defined fails to work on targets like alpha that
define special SEC_COMMON sections. These might contain symbols that
originated in an object file.
include/
* bfdlink.h (struct bfd_link_hash_entry): Comment non_ir_ref. Add
linker_def.
bfd/
* elflink.c (_bfd_elf_define_linkage_sym): Set linker_def.
* linker.c (_bfd_generic_link_add_one_symbol): Clear linker_def
for CDEF, DEF, DEFW, COM.
ld/
* ldexp.c (exp_fold_tree_1 <etree_provide>): Test linker_def.
ld/testsuite/
* ld-powerpc/sdabase.s,
* ld-powerpc/sdabase.t,
* ld-powerpc/sdabase.d: New test.
* ld-powerpc/sdabase2.t,
* ld-powerpc/sdabase2.d: New test.
* ld-powerpc/powerpc.exp: Run them.
Only set the VLE flag if the instruction has been pulled via the VLE
instruction set. This way the flag is guaranteed to be set for VLE-only
instructions or for VLE-only processors, however it'll remain clear for
dual-mode instructions on dual-mode and, more importantly, standard-mode
processors.
gas/
* config/tc-ppc.c (md_assemble): Only set the PPC_APUINFO_VLE
flag if both the processor and opcode flags match.
ld/testsuite/
* ld-powerpc/apuinfo-vle.rd: New test.
* ld-powerpc/apuinfo-vle.s: New test source.
* ld-powerpc/apuinfo.rd: Adjust according to GAS PPC_APUINFO_VLE
handling change.
* ld-powerpc/powerpc.exp: Run the new test.
This fixes a problem seen on powerpc64le ELFv2 when creating a
function symbol alias with ld --defsym. st_other needs to be copied
from the source symbol to the alias in order to set up the local entry
offset for the alias. I decided to make this change in the generic
ELF code rather than in elf64-ppc.c since it looks like other targets
that use st_other bits might benefit too.
bfd/
* elflink.c (_bfd_elf_copy_link_hash_symbol_type): Copy st_other
bits from source to dest.
* linker.c (_bfd_generic_copy_link_hash_symbol_type): Update comment.
* targets.c (struct bfd_target <_bfd_copy_link_hash_symbol_type>):
Likewise.
* bfd-in2.h: Regenerate.
ld/testsuite/
* ld-powerpc/defsym.s, * ld-powerpc/defsym.d: New test.
* ld-powerpc/powerpc.exp: Run it.
doesn't always mean you need to define a function symbol on plt code.
If all references are in read-write sections, then using dynamic relocs
is OK.
bfd/
* elf32-ppc.c (ppc_elf_adjust_dynamic_symbol): Clear
pointer_equality_needed when !readonly_dynrelocs.
* elf64-ppc.c (ppc64_elf_adjust_dynamic_symbol): Likewise.
ld/testsuite/
* ld-powerpc/ambiguousv1.d: Match symbol table too.
* ld-powerpc/ambiguousv2.d: Likewise.
* ld-powerpc/ambiguousv1b.d: New.
* ld-powerpc/ambiguousv2b.d: New.
* ld-powerpc/powerpc.exp: Run new tests.
ELFv2 needs to create plt entries in a non-PIC executable for an
address reference to a function defined in a shared object. It's
possible that an object file has no features that distinguish it as
ELFv1 or ELFv2, eg. an object only containing data. Such files need
to be handled like those that are known to be ELFv2.
However, this unnecessarily creates plt entries for the analogous
ELFv1 case, so arrange to set output abi version earlier, and use the
output abi version to further distinguish ambiguous input files.
bfd/
* elf64-ppc.c (ppc64_elf_check_relocs): Account for possibly
needed plt entries when taking the address of functions for
abiversion == 0 (ie. unknown) as well as abiversion == 2.
Move opd setup and abiversion checks to..
(ppc64_elf_before_check_relocs): ..here. Renamed from
ppc64_elf_process_dot_syms. Set output abiversion from input and
input abiversion from output, if either is not set.
(ppc64_elf_merge_private_bfd_data): Don't merge flags here.
(elf_backend_check_directives): Update.
ld/testsuite/
* ld-powerpc/startv1.s, * ld-powerpc/startv2.s, * ld-powerpc/funref.s,
* ld-powerpc/funv1.s, * ld-powerpc/funv2.s,
* ld-powerpc/ambiguousv1.d, * ld-powerpc/ambiguousv2.d: New test files.
* ld-powerpc/powerpc.exp: Run new tests.