Commit Graph

6 Commits

Author SHA1 Message Date
Borislav Petkov d61931d89b x86: Add optimized popcnt variants
Add support for the hardware version of the Hamming weight function,
popcnt, present in CPUs which advertise it under CPUID, Function
0x0000_0001_ECX[23]. On CPUs which don't support it, we fall back to the
default lib/hweight.c software versions.

A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost
a 3x speedup on an F10h machine.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100318112015.GC11152@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-06 15:52:11 -07:00
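
As a rough illustration of the idea in the commit above (not the kernel's
actual implementation, which selects between the two paths differently):
check the POPCNT CPUID bit (leaf 1, ECX bit 23) and use the hardware
instruction when available, otherwise fall back to a software Hamming
weight. A minimal user-space sketch assuming x86-64 and GCC/Clang's
<cpuid.h>; hweight64_sketch and sw_hweight64 are illustrative names:

    /* Sketch only: runtime CPUID check plus popcnt, with a naive
     * software fallback. */
    #include <stdint.h>
    #include <cpuid.h>

    static int cpu_has_popcnt(void)
    {
            unsigned int eax, ebx, ecx, edx;

            if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
                    return 0;
            return (ecx >> 23) & 1;         /* CPUID.01H:ECX.POPCNT[bit 23] */
    }

    static unsigned int sw_hweight64(uint64_t w)
    {
            unsigned int res = 0;

            while (w) {                     /* naive software fallback */
                    res += (unsigned int)(w & 1);
                    w >>= 1;
            }
            return res;
    }

    static unsigned int hweight64_sketch(uint64_t w)
    {
            if (cpu_has_popcnt()) {
                    uint64_t res;

                    /* x86-64 hardware Hamming weight */
                    asm("popcnt %1, %0" : "=r" (res) : "r" (w));
                    return (unsigned int)res;
            }
            return sw_hweight64(w);
    }
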
Andi Kleen c8399943bd x86, generic: mark complex bitops.h inlines as __always_inline
Impact: reduce kernel image size

Hugh Dickins noticed that, when the kernel is built for code size, older
gcc versions don't inline some of the bitops.

Mark all complex x86 bitops that have more than an asm statement or two
as __always_inline to avoid this problem.

Probably should be done for other architectures too.

Ingo then found a better fix that only requires a single-line change,
but unfortunately it only works on gcc 4.3.

On older gccs the original patch still makes a ~0.3% defconfig
difference with CONFIG_OPTIMIZE_INLINING=y.

With gcc 4.1 and a defconfig-like build:

       text    data     bss     dec     hex filename
    6116998 1138540  883788 8139326  7c323e vmlinux-oi-with-patch
    6137043 1138540  883788 8159371  7c808b vmlinux-optimize-inlining

~20k / 0.3% difference.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 18:56:30 +01:00
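
A minimal sketch of the pattern described in the commit above, assuming
GCC-style attributes: a bitop helper containing more than a statement or
two is forced inline so that size-optimized builds on older gcc do not
emit an out-of-line copy. The helper name is illustrative, not one of the
commit's exact hunks:

    /* Force inlining even when gcc optimizes for size. */
    #define __always_inline inline __attribute__((always_inline))

    static __always_inline int test_and_set_bit_sketch(long nr,
                                                       volatile unsigned long *addr)
    {
            int oldbit;

            /* Atomically set bit nr and return its previous value. */
            asm volatile("lock; bts %2, %1\n\t"
                         "sbb %0, %0"
                         : "=r" (oldbit), "+m" (*addr)
                         : "r" (nr)
                         : "memory");
            return oldbit;
    }
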
Linus Torvalds c4295fbb60 x86: make 'constant_test_bit()' take an unsigned bit number
Ingo noticed that using signed arithmetic seems to confuse the gcc
inliner, potentially making it decide that it's all too complicated.

(Yeah, yeah, it's a constant. It's always positive. Still..)

Based-on: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-09 12:49:50 -08:00
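
Roughly the shape of constant_test_bit() after this change, with the bit
number taken as unsigned int so the inliner only ever sees unsigned
arithmetic; BITS_PER_LONG is defined locally here to keep the sketch
self-contained:

    #define BITS_PER_LONG (8 * (unsigned int)sizeof(unsigned long))

    static inline int constant_test_bit(unsigned int nr,
                                        const volatile unsigned long *addr)
    {
            /* Unsigned nr keeps the % and / below trivially analyzable. */
            return ((1UL << (nr % BITS_PER_LONG)) &
                    (((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
    }
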
Uros Bizjak 838e8bb71d x86: Implement change_bit with immediate operand as "lock xorb"
Impact: Minor optimization.

Implement change_bit with an immediate bit count as "lock xorb". This is
similar to "lock orb" and "lock andb" for the set_bit and clear_bit
functions.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-05 09:51:02 -08:00
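
A sketch of the optimization, assuming x86 little-endian byte addressing:
when the bit number is a compile-time constant, toggle the containing
byte with a byte-wide "lock xorb" instead of "lock btc". The macro and
function names are illustrative, not the commit's exact code:

    #define CONST_MASK(nr)          (1 << ((nr) & 7))
    #define CONST_BYTE(addr, nr)    (((volatile unsigned char *)(addr))[(nr) >> 3])

    static inline void change_bit_sketch(long nr, volatile unsigned long *addr)
    {
            if (__builtin_constant_p(nr)) {
                    /* Constant bit: one-byte immediate XOR on the containing byte. */
                    asm volatile("lock; xorb %1, %0"
                                 : "+m" (CONST_BYTE(addr, nr))
                                 : "iq" ((unsigned char)CONST_MASK(nr))
                                 : "memory");
            } else {
                    /* Variable bit: fall back to the complement-bit instruction. */
                    asm volatile("lock; btc %1, %0"
                                 : "+m" (*addr)
                                 : "r" (nr)
                                 : "memory");
            }
    }
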
H. Peter Anvin 1965aae3c9 x86: Fix ASM_X86__ header guards
Change header guards named "ASM_X86__*" to "_ASM_X86_*" since:

a. the double underscore is ugly and pointless.
b. the lack of a leading underscore violates namespace constraints.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-22 22:55:23 -07:00
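
For illustration, the before/after shape of one such guard (file name
assumed for the example):

    /* before */
    #ifndef ASM_X86__BITOPS_H
    #define ASM_X86__BITOPS_H

    /* after */
    #ifndef _ASM_X86_BITOPS_H
    #define _ASM_X86_BITOPS_H
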
Al Viro bb8985586b x86, um: ... and asm-x86 move
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-22 22:55:20 -07:00