17968fbbd1
I noticed __clear_user high up in a profile of one of my RAID stress tests. The testcase was doing a dd from /dev/zero which ends up calling __clear_user. __clear_user is basically a loop with a single 4 byte store which is horribly slow. We can do much better by aligning the desination and doing 32 bytes of 8 byte stores in a loop. The following testcase was used to verify the patch: http://ozlabs.org/~anton/junkcode/stress_clear_user.c To show the improvement in performance I ran a dd from /dev/zero to /dev/null on a POWER7 box: Before: # dd if=/dev/zero of=/dev/null bs=1M count=10000 10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s After: # time dd if=/dev/zero of=/dev/null bs=1M count=10000 10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s Over 5x faster. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> |
||
---|---|---|
.. | ||
alloc.c | ||
checksum_32.S | ||
checksum_64.S | ||
checksum_wrappers_64.c | ||
code-patching.c | ||
copy_32.S | ||
copypage_64.S | ||
copyuser_64.S | ||
copyuser_power7_vmx.c | ||
copyuser_power7.S | ||
crtsavres.S | ||
devres.c | ||
div64.S | ||
feature-fixups-test.S | ||
feature-fixups.c | ||
hweight_64.S | ||
ldstfp.S | ||
locks.c | ||
Makefile | ||
mem_64.S | ||
memcpy_64.S | ||
rheap.c | ||
sstep.c | ||
string_64.S | ||
string.S | ||
usercopy_64.c |