linux/arch/x86
Matt Fleming df13086490 x86/asm/64: Align start of __clear_user() loop to 16-bytes
commit bb5570ad3b upstream.

x86 CPUs can suffer severe performance drops if a tight loop, such as
the ones in __clear_user(), straddles a 16-byte instruction fetch
window, or worse, a 64-byte cacheline. This issues was discovered in the
SUSE kernel with the following commit,

  1153933703 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")

which increased the code object size from 10 bytes to 15 bytes and
caused the 8-byte copy loop in __clear_user() to be split across a
64-byte cacheline.

Aligning the start of the loop to 16-bytes makes this fit neatly inside
a single instruction fetch window again and restores the performance of
__clear_user() which is used heavily when reading from /dev/zero.

Here are some numbers from running libmicro's read_z* and pread_z*
microbenchmarks which read from /dev/zero:

  Zen 1 (Naples)

  libmicro-file
                                        5.7.0-rc6              5.7.0-rc6              5.7.0-rc6
                                                    revert-1153933703d9+               align16+
  Time mean95-pread_z100k       9.9195 (   0.00%)      5.9856 (  39.66%)      5.9938 (  39.58%)
  Time mean95-pread_z10k        1.1378 (   0.00%)      0.7450 (  34.52%)      0.7467 (  34.38%)
  Time mean95-pread_z1k         0.2623 (   0.00%)      0.2251 (  14.18%)      0.2252 (  14.15%)
  Time mean95-pread_zw100k      9.9974 (   0.00%)      6.0648 (  39.34%)      6.0756 (  39.23%)
  Time mean95-read_z100k        9.8940 (   0.00%)      5.9885 (  39.47%)      5.9994 (  39.36%)
  Time mean95-read_z10k         1.1394 (   0.00%)      0.7483 (  34.33%)      0.7482 (  34.33%)

Note that this doesn't affect Haswell or Broadwell microarchitectures
which seem to avoid the alignment issue by executing the loop straight
out of the Loop Stream Detector (verified using perf events).

Fixes: 1153933703 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v4.19+
Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30 15:37:08 -04:00
..
boot x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld 2020-06-24 17:50:49 +02:00
configs x86/defconfigs: Remove useless UEVENT_HELPER_PATH 2019-06-21 19:22:08 +02:00
crypto crypto: arch/nhpoly1305 - process in explicit 4k chunks 2020-05-14 07:58:25 +02:00
entry x86/unwind/orc: Fix premature unwind stoppage due to IRET frames 2020-05-14 07:58:29 +02:00
events perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont 2020-06-17 16:40:25 +02:00
hyperv x86/Hyper-V: Report crash data in die() when panic_on_oops is set 2020-04-23 10:36:24 +02:00
ia32 syscalls/x86: Use COMPAT_SYSCALL_DEFINE0 for IA32 (rt_)sigreturn 2020-01-17 19:48:30 +01:00
include KVM: nVMX: Plumb L2 GPA through to PML emulation 2020-06-30 15:37:07 -04:00
kernel x86/cpu: Use pinning mask for CR4 bits needing to be 0 2020-06-30 15:37:08 -04:00
kvm KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL 2020-06-30 15:37:07 -04:00
lib x86/asm/64: Align start of __clear_user() loop to 16-bytes 2020-06-30 15:37:08 -04:00
math-emu x86/math-emu: Check __copy_from_user() result 2019-12-31 16:43:32 +01:00
mm x86/mm: Stop printing BRK addresses 2020-06-22 09:31:08 +02:00
net bpf, x86: Fix encoding for lower 8-bit registers in BPF_STX BPF_B 2020-05-02 08:48:55 +02:00
oprofile
pci x86/PCI: Mark Intel C620 MROMs as having non-compliant BARs 2020-06-17 16:40:24 +02:00
platform efi/x86: Fix the deletion of variables in mixed mode 2020-04-17 10:50:25 +02:00
power Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-17 12:04:39 -07:00
purgatory x86/purgatory: Disable various profiling and sanitizing options 2020-06-24 17:50:20 +02:00
ras RAS/CEC: Add CONFIG_RAS_CEC_DEBUG and move CEC debug features there 2019-06-08 17:39:24 +02:00
realmode x86/realmode: Remove trampoline_status 2019-07-22 11:30:18 +02:00
tools x86/insn: Fix awk regexp warnings 2019-11-29 10:09:45 +01:00
um um: Implement copy_thread_tls 2020-01-14 20:08:35 +01:00
video treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
xen x86: Fix early boot crash on gcc-10, third try 2020-05-20 08:20:34 +02:00
.gitignore
Kbuild treewide: Add SPDX license identifier - Kbuild 2019-05-30 11:32:33 -07:00
Kconfig x86/tsx: Add config options to set tsx=on|off|auto 2019-10-28 09:12:18 +01:00
Kconfig.cpu x86/cpu: Create Zhaoxin processors architecture support file 2019-06-22 11:45:57 +02:00
Kconfig.debug x86, perf: Fix the dependency of the x86 insn decoder selftest 2019-09-02 20:05:58 +02:00
Makefile x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning 2019-08-28 17:31:31 +02:00
Makefile.um
Makefile_32.cpu