Intel MPX introduces 4 bound registers, which will be used for parameter
passing in x86-64. Bound registers are cleared by branch instructions.
Branch instructions with BND prefix will keep bound register contents.
This leads to 2 requirements to 64-bit MPX run-time:
1. Dynamic linker (ld.so) should save and restore bound registers during
symbol lookup.
2. Change the current 16-byte PLT0:
ff 35 08 00 00 00 pushq GOT+8(%rip)
ff 25 00 10 00 jmpq *GOT+16(%rip)
0f 1f 40 00 nopl 0x0(%rax)
and 16-byte PLT1:
ff 25 00 00 00 00 jmpq *name@GOTPCREL(%rip)
68 00 00 00 00 pushq $index
e9 00 00 00 00 jmpq PLT0
which clear bound registers, to preserve bound registers.
We use 2 new relocations:
to mark branch instructions with BND prefix.
When linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations,
it switches to a different PLT0:
ff 35 08 00 00 00 pushq GOT+8(%rip)
f2 ff 25 00 10 00 bnd jmpq *GOT+16(%rip)
0f 1f 00 nopl (%rax)
to preserve bound registers for symbol lookup and it also creates an
external PLT section, .pl.bnd. Linker will create a BND PLT1 entry
in .plt:
68 00 00 00 00 pushq $index
f2 e9 00 00 00 00 bnd jmpq PLT0
0f 1f 44 00 00 nopl 0(%rax,%rax,1)
and a 8-byte BND PLT entry in .plt.bnd:
f2 ff 25 00 00 00 00 bnd jmpq *name@GOTPCREL(%rip)
90 nop
Otherwise, linker will create a legacy PLT1 entry in .plt:
68 00 00 00 00 pushq $index
e9 00 00 00 00 jmpq PLT0
66 0f 1f 44 00 00 nopw 0(%rax,%rax,1)
and a 8-byte legacy PLT in .plt.bnd:
ff 25 00 00 00 00 jmpq *name@GOTPCREL(%rip)
66 90 xchg %ax,%ax
The initial value of the GOT entry for "name" will be set to the the
"pushq" instruction in the corresponding entry in .plt. Linker will
resolve reference of symbol "name" to the entry in the second PLT,
.plt.bnd.
Prelink stores the offset of pushq of PLT1 (plt_base + 0x10) in GOT[1]
and GOT[1] is stored in GOT[3]. We can undo prelink in GOT by computing
the corresponding the pushq offset with
GOT[1] + (GOT offset - &GOT[3]) * 2
Since for each entry in .plt except for PLT0 we create a 8-byte entry in
.plt.bnd, there is extra 8-byte per PLT symbol.
We also investigated the 16-byte entry for .plt.bnd. We compared the
8-byte entry vs the the 16-byte entry for .plt.bnd on Sandy Bridge.
There are no performance differences in SPEC CPU 2000/2006 as well as
micro benchmarks.
Pros:
No change to undo prelink in dynamic linker.
Only 8-byte memory overhead for each PLT symbol.
Cons:
Extra .plt.bnd section is needed.
Extra 8 byte for legacy branches to PLT.
GDB is unware of the new layout of .plt and .plt.bnd.
bfd/
* elf64-x86-64.c (elf_x86_64_bnd_plt0_entry): New.
(elf_x86_64_legacy_plt_entry): Likewise.
(elf_x86_64_bnd_plt_entry): Likewise.
(elf_x86_64_legacy_plt2_entry): Likewise.
(elf_x86_64_bnd_plt2_entry): Likewise.
(elf_x86_64_bnd_arch_bed): Likewise.
(elf_x86_64_link_hash_entry): Add has_bnd_reloc and plt_bnd.
(elf_x86_64_link_hash_table): Add plt_bnd.
(elf_x86_64_link_hash_newfunc): Initialize has_bnd_reloc and
plt_bnd.
(elf_x86_64_copy_indirect_symbol): Also copy has_bnd_reloc.
(elf_x86_64_check_relocs): Create the second PLT for Intel MPX
in 64-bit mode.
(elf_x86_64_allocate_dynrelocs): Handle the second PLT for IFUNC
symbols. Resolve call to the second PLT if it is created.
(elf_x86_64_size_dynamic_sections): Keep the second PLT section.
(elf_x86_64_relocate_section): Resolve PLT references to the
second PLT if it is created.
(elf_x86_64_finish_dynamic_symbol): Use BND PLT0 and fill the
second PLT entry for BND relocation.
(elf_x86_64_finish_dynamic_sections): Use MPX backend data if
the second PLT is created.
(elf_x86_64_get_synthetic_symtab): New.
(bfd_elf64_get_synthetic_symtab): Likewise. Undefine for NaCl.
ld/
* emulparams/elf_x86_64.sh (TINY_READONLY_SECTION): New.
ld/testsuite/
* ld-x86-64/mpx.exp: Run bnd-ifunc-1 and bnd-plt-1.
* ld-x86-64/bnd-ifunc-1.d: New file.
* ld-x86-64/bnd-ifunc-1.s: Likewise.
* ld-x86-64/bnd-plt-1.d: Likewise.
It's best that we standardize on process_stratum targets using the
ptid.lwp field to store thread ids. The idea being leave the ptid.tid
field free for any thread_stratum target that might want to sit on
top. This patch adds a comment in that direction to struct ptid's
definition.
gdb/
2014-02-19 Pedro Alves <palves@redhat.com>
* common/ptid.h (struct ptid): Mention that process_stratum
targets should prefer ptid.lwp.
From GDB's perspective, independently of how the target really
implements threads, gdb/remote sees all threads as if kernel/system
threads. A rationale along theses lines led to gdbserver storing
thread ids in ptid.lwp in all ports.
Because remote.c is currently using ptid.tid, we can't make gdbserver
and gdb share bits of remote-specific code that manipulates ptids
(e.g., write_ptid/read_ptid).
This patch thus makes remote.c use ptid.lwp instead of ptid.tid.
I believe that on the GDB side too, it's best that we standardize on
process_stratum targets using the ptid.lwp field to store thread ids
anyway. The idea being leave the ptid.tid field free for any
thread_stratum target that might want to sit on top.
Tested on x86_64 Fedora 17, w/ local gdbserver.
gdb/
2014-02-19 Pedro Alves <palves@redhat.com>
* remote.c (remote_thread_alive, write_ptid, read_ptid)
(read_ptid, remote_newthread_step, remote_threads_extra_info)
(remote_get_ada_task_ptid, append_resumption, remote_stop_ns)
(threadalive_test, remote_pid_to_str): Use the ptid.lwp field to
store remote thread ids rather than ptid.tid.
(_initialize_remote): Adjust.
This converts to_get_unwinder and to_get_tailcall_unwinder to methods
and arranges for them to use the new delegation scheme.
This just lets us avoid having a differing style (neither new-style
nor INHERIT) of delegation in the tree.
2014-02-19 Tom Tromey <tromey@redhat.com>
* target.c (target_get_unwinder): Rewrite.
(target_get_tailcall_unwinder): Rewrite.
* record-btrace.c (record_btrace_to_get_unwinder): New function.
(record_btrace_to_get_tailcall_unwinder): New function.
(init_record_btrace_ops): Update.
* target.h (struct target_ops) <to_get_unwinder,
to_get_tailcall_unwinder>: Now function pointers. Use
TARGET_DEFAULT_RETURN.
I happened to notice that nto-procfs.c defines
procfs_remove_hw_breakpoint but never uses it. This caused it not to
be updated by my target-method-updating script. This patch fixes the
function and installs it properly. I have no way to test this,
however.
2014-02-19 Tom Tromey <tromey@redhat.com>
* nto-procfs.c (procfs_remove_hw_breakpoint): Add 'self'
argument.
(init_procfs_ops): Correctly set to_remove_hw_breakpoint.
This converts to_decr_pc_after_break to the new style of delegation,
removing forward_target_decr_pc_after_break.
2014-02-19 Tom Tromey <tromey@redhat.com>
* record-btrace.c (record_btrace_decr_pc_after_break): Delegate
directly.
* target-delegates.c: Rebuild.
* target.h (struct target_ops) <to_decr_pc_after_break>: Use
TARGET_DEFAULT_FUNC.
* target.c (default_target_decr_pc_after_break): Rename from
forward_target_decr_pc_after_break. Simplify.
(target_decr_pc_after_break): Rely on delegation.
This removes a few unnecessary calls to INHERIT and de_fault:
* to_doc is only used when a target is registered
* to_magic is only used when a target is pushed and not useful for
current_target.
* to_open and to_close are only ever called using a specific
target_ops object; there is no need to de_fault them.
2014-02-19 Tom Tromey <tromey@redhat.com>
* target.c (update_current_target): Do not INHERIT to_doc or
to_magic. Do not de_fault to_open or to_close.
exec_set_find_memory_regions is used to modify the exec target.
However, it only has a single caller, and so it is much clearer to
simply set the appropriate field directly. It's also better for the
coming multi-target world to avoid this kind of global state change
anyway.
2014-02-19 Tom Tromey <tromey@redhat.com>
* gcore.h (objfile_find_memory_regions): Declare.
* gcore.c (objfile_find_memory_regions): No longer static. Add
"self" argument.
(_initialize_gcore): Don't call exec_set_find_memory_regions.
* exec.c: Include gcore.h.
(exec_set_find_memory_regions): Remove.
(exec_find_memory_regions): Remove.
(exec_do_find_memory_regions): Remove.
(init_exec_ops): Update.
* defs.h (exec_set_find_memory_regions): Remove.
This changes instances of TARGET_DEFAULT_RETURN(0) to
TARGET_DEFAULT_RETURN(NULL) when appropriate. The use of "0" was a
relic from an earlier implementation of make-target-delegates; and I
didn't want to go back through the long patch series, fixing up
conflicts, just to change this small detail.
2014-02-19 Tom Tromey <tromey@redhat.com>
* target-delegates.c: Rebuild.
* target.h (struct target_ops) <to_extra_thread_info,
to_thread_name, to_pid_to_exec_file, to_get_section_table,
to_memory_map, to_read_description, to_traceframe_info>: Use NULL,
not 0, in TARGET_DEFAULT_RETURN.
This cleans up target.c to avoid function casts.
2014-02-19 Tom Tromey <tromey@redhat.com>
* target.c (complete_target_initialization): Remove casts. Use
return_zero_has_execution.
(return_zero): Add "ignore" argument.
(return_zero_has_execution): New function.
(init_dummy_target): Remove casts. Use
return_zero_has_execution.
During the conversion I kept all the "do not inherit" comments in
update_current_target. However, now they are not needed. This patch
updates the comments for INHERIT and de_fault, and removes the
somewhat odd INHERIT of to_stratum.
2014-02-19 Tom Tromey <tromey@redhat.com>
* target.c (update_current_target): Update comments. Do not
INHERIT to_stratum.
This switches to_read_description to the "new normal" delegation
scheme. This one was a bit trickier than the other changes due to the
way that target_read_description handled delegation. I examined all
the target implementations of to_read_description and changed the ones
returning NULL to instead delegate.
2014-02-19 Tom Tromey <tromey@redhat.com>
* arm-linux-nat.c (arm_linux_read_description): Delegate when
needed.
* corelow.c (core_read_description): Delegate when needed.
* remote.c (remote_read_description): Delegate when needed.
* target-delegates.c: Rebuild.
* target.c (target_read_description): Rewrite.
* target.h (struct target_ops) <to_read_description>: Update
comment. Use TARGET_DEFAULT_RETURN.