glibc/sysdeps
Szabolcs Nagy 08325735c2 [BZ 18034][AArch64] Lazy TLSDESC relocation data race fix
Lazy TLSDESC initialization needs to be synchronized with concurrent TLS
accesses.  The TLS descriptor contains a function pointer (entry) and an
argument that is accessed from the entry function.  With lazy initialization
the first call to the entry function updates the entry and the argument to
their final value.  A final entry function must make sure that it accesses an
initialized argument, this needs synchronization on systems with weak memory
ordering otherwise the writes of the first call can be observed out of order.

There are at least two issues with the current code:

tlsdesc.c (i386, x86_64, arm, aarch64) uses volatile memory accesses on the
write side (in the initial entry function) instead of C11 atomics.

And on systems with weak memory ordering (arm, aarch64) the read side
synchronization is missing from the final entry functions (dl-tlsdesc.S).

This patch only deals with aarch64.

* Write side:

Volatile accesses were replaced with C11 relaxed atomics, and a release
store was used for the initialization of entry so the read side can
synchronize with it.

* Read side:

TLS access generated by the compiler and an entry function code is roughly

  ldr x1, [x0]    // load the entry
  blr x1          // call it

entryfunc:
  ldr x0, [x0,#8] // load the arg
  ret

Various alternatives were considered to force the ordering in the entry
function between the two loads:

(1) barrier

entryfunc:
  dmb ishld
  ldr x0, [x0,#8]

(2) address dependency (if the address of the second load depends on the
result of the first one the ordering is guaranteed):

entryfunc:
  ldr x1,[x0]
  and x1,x1,#8
  orr x1,x1,#8
  ldr x0,[x0,x1]

(3) load-acquire (ARMv8 instruction that is ordered before subsequent
loads and stores)

entryfunc:
  ldar xzr,[x0]
  ldr x0,[x0,#8]

Option (1) is the simplest but slowest (note: this runs at every TLS
access), options (2) and (3) do one extra load from [x0] (same address
loads are ordered so it happens-after the load on the call site),
option (2) clobbers x1 which is problematic because existing gcc does
not expect that, so approach (3) was chosen.

A new _dl_tlsdesc_return_lazy entry function was introduced for lazily
relocated static TLS, so non-lazy static TLS can avoid the synchronization
cost.

	[BZ #18034]
	* sysdeps/aarch64/dl-tlsdesc.h (_dl_tlsdesc_return_lazy): Declare.
	* sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_return_lazy): Define.
	(_dl_tlsdesc_undefweak): Guarantee TLSDESC entry and argument load-load
	ordering using ldar.
	(_dl_tlsdesc_dynamic): Likewise.
	(_dl_tlsdesc_return_lazy): Likewise.
	* sysdeps/aarch64/tlsdesc.c (_dl_tlsdesc_resolve_rela_fixup): Use
	relaxed atomics instead of volatile and synchronize with release store.
	(_dl_tlsdesc_resolve_hold_fixup): Use relaxed atomics instead of
	volatile.
	* elf/tlsdeschtab.h (_dl_tlsdesc_resolve_early_return_p): Likewise.
2015-06-17 12:41:01 +01:00
..
aarch64 [BZ 18034][AArch64] Lazy TLSDESC relocation data race fix 2015-06-17 12:41:01 +01:00
alpha alpha: Update libm-test-ulps 2015-05-19 09:43:54 -07:00
arm NaCl: Provide non-default values for uname. 2015-05-12 10:54:47 -07:00
generic Refactoring of START for conditions in individual tests 2015-05-14 18:07:06 +03:00
gnu
hppa hppa: Fix feupdateenv and fesetexceptflag (Bug 18111). 2015-03-11 02:48:59 -04:00
i386 Fix regcomp wcscoll, wcscmp namespace (bug 18497). 2015-06-09 21:07:30 +00:00
ia64 Set errno for log1p on pole/domain error. 2015-04-13 21:19:27 +02:00
ieee754 Replace finite with isfinite. 2015-06-03 16:35:44 +01:00
init_array NPTL: Initializer for .init_array-only configurations. 2015-02-13 13:19:11 -08:00
m68k Set errno for log1p on pole/domain error. 2015-04-13 21:19:27 +02:00
mach Fix getlogin_r namespace (bug 18527). 2015-06-12 20:02:30 +00:00
microblaze Replace ELF_RTYPE_CLASS_NOCOPY with ELF_RTYPE_CLASS_COPY 2015-03-05 08:40:41 -08:00
mips Fix mips16 __fpu_control static linking (bug 18397). 2015-05-11 22:58:10 +00:00
nacl NaCl: Implement nacl_interface_ext_supply entry point. 2015-06-03 13:51:11 -07:00
nios2 Replace ELF_RTYPE_CLASS_NOCOPY with ELF_RTYPE_CLASS_COPY 2015-03-05 08:40:41 -08:00
nptl NPTL: Remove duplicate definition of PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP 2015-03-28 01:50:12 -04:00
posix posix_fallocate: Emulation fixes and documentation [BZ #15661] 2015-06-05 10:50:38 +02:00
powerpc Use libc_hidden_proto / libc_hidden_def with __strnlen. 2015-06-02 20:24:25 +00:00
pthread Fix aio_* pread namespace (bug 18519). 2015-06-12 17:34:11 +00:00
s390 S/390: Regenerate ULPs 2015-04-24 13:37:48 +02:00
sh Replace ELF_RTYPE_CLASS_NOCOPY with ELF_RTYPE_CLASS_COPY 2015-03-05 08:40:41 -08:00
sparc Split timed-wait functions out of nptl/lowlevellock.c. 2015-05-26 14:49:13 -07:00
tile Use libc_hidden_proto / libc_hidden_def with __strnlen. 2015-06-02 20:24:25 +00:00
unix Vector sinf for x86_64 and tests. 2015-06-15 15:06:53 +03:00
wordsize-32
wordsize-64
x86 Vector sinf for x86_64 and tests. 2015-06-15 15:06:53 +03:00
x86_64 Vector sinf for x86_64 and tests. 2015-06-15 15:06:53 +03:00