linux

Commit Graph

Author	SHA1	Message	Date
Andy Shevchenko	d50ba3687b	x86/lib: Fix spelling, put space between a numeral and its units As suggested by Peter Anvin. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: H . Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-15 11:40:32 +02:00
Andy Shevchenko	bb916ff7cd	x86/lib: Fix spelling in the comments Apparently 'byts' should be 'bytes'. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: H . Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-15 11:40:31 +02:00
Ma Ling	3b4b682bec	x86, mem: Optimize memmove for small size and unaligned cases The movs instruction combines data to accelerate moves, but two cases need care. 1. movs has a long startup latency, so for small sizes we use general mov instructions to copy data. 2. movs performs poorly in unaligned cases, even when the src offset is 0x10 and the dest offset is 0x0, so we avoid it there and handle those cases with general mov instructions. Signed-off-by: Ma Ling <ling.ma@intel.com> LKML-Reference: <1284664360-6138-1-git-send-email-ling.ma@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-09-24 18:57:11 -07:00
Ma Ling	59daa706fb	x86, mem: Optimize memcpy by avoiding memory false dependece All read operations after the allocation stage can run speculatively, while all write operations run in program order. If the addresses differ, a read may run before an older write; otherwise it waits until the write commits. However, the CPU doesn't check every address bit, so a read can fail to recognize that two addresses differ even when they are in different pages. For example, if rsi is 0xf004 and rdi is 0xe008, the following sequence incurs a large performance penalty: 1. movq (%rsi), %rax 2. movq %rax, (%rdi) 3. movq 8(%rsi), %rax 4. movq %rax, 8(%rdi) If %rsi and %rdi really were in the same memory page, there would be a TRUE read-after-write dependence, because instruction 2 writes 0x008 and instruction 3 reads 0x00c, so the two accesses partially overlap. In fact they are in different pages and there is no issue, but without checking every address bit the CPU may think they are in the same page. Instruction 3 then has to wait for instruction 2 to write its data from the write buffer into the cache before loading it from the cache, and the time the read costs is equal to that of an mfence instruction. We can avoid this by reordering the operations as follows: 1. movq 8(%rsi), %rax 2. movq %rax, 8(%rdi) 3. movq (%rsi), %rax 4. movq %rax, (%rdi) Instruction 3 reads 0x004 and instruction 2 writes address 0x010, so there is no dependence. On Core2 we gain a 1.83x speedup compared with the original instruction sequence. In this patch we first handle small sizes (less than 20 bytes), then jump to the appropriate copy mode. Based on our micro-benchmark, for small sizes from 1 to 127 bytes we got up to a 2X improvement, and up to a 1.5X improvement for 1024 bytes on Corei7. (We used our micro-benchmark and will do further testing according to your requirements.) Signed-off-by: Ma Ling <ling.ma@intel.com> LKML-Reference: <1277753065-18610-1-git-send-email-ling.ma@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-23 14:56:41 -07:00
Ma, Ling	fdf4289679	x86, mem: Don't implement forward memmove() as memcpy() memmove() allows the source and destination addresses to overlap, but memcpy() has no such requirement. Therefore, explicitly implement memmove() in both the forward and backward directions, to give us the ability to optimize memcpy(). Signed-off-by: Ma Ling <ling.ma@intel.com> LKML-Reference: <C10D3FB0CD45994C8A51FEC1227CE22F0E483AD86A@shsmsx502.ccr.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-23 14:14:27 -07:00
Paolo Ciarrocchi	93d8bd3d4f	x86: coding style fixes to arch/x86/lib/memcpy_32.c Before: total: 2 errors, 0 warnings, 43 lines checked After: total: 0 errors, 0 warnings, 43 lines checked No code changed: arch/x86/lib/memcpy_32.o: text data bss dec hex filename 164 0 0 164 a4 memcpy_32.o.before 164 0 0 164 a4 memcpy_32.o.after md5: d759f55621af27f51720b59c8ca96a4d memcpy_32.o.before.asm d759f55621af27f51720b59c8ca96a4d memcpy_32.o.after.asm Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-17 17:40:49 +02:00
Jan Engelhardt	ade1af7712	x86: remove unneded casts x86: remove unneeded casts Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:23 +01:00
Thomas Gleixner	44f0257fc3	i386: move lib Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-11 11:16:33 +02:00

8 Commits
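
Commit fdf4289679 above describes implementing memmove() in both the forward and backward directions so that it no longer relies on memcpy(). The idea can be sketched in plain C as follows. This is a byte-wise illustration only, not the kernel's implementation; the name `sketch_memmove` is hypothetical, and comparing the addresses as integers is an assumption made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical byte-wise sketch of a two-direction memmove().
 * Not the kernel's code: the real routine copies in larger units.
 */
void *sketch_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (d == s || n == 0)
        return dest;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts below source: copy forward so each
         * source byte is read before it can be overwritten. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination starts above source: copy backward, from the
         * last byte to the first, for the same reason. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dest;
}
```

With `char buf[] = "abcdefgh";`, calling `sketch_memmove(buf + 2, buf, 4)` takes the backward branch, and `sketch_memmove(buf, buf + 2, 4)` takes the forward branch; both handle the overlapping regions correctly.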
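
The address arithmetic in commit 59daa706fb above can be checked with a small C helper. This is a rough model only: it assumes the CPU compares just the low 12 bits of each address (the page offset), which the commit message implies but does not state. The helper name `low_bits_overlap` and the mask constant are hypothetical, and accesses that wrap across a page boundary are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: only the low 12 address bits (page offset) are compared. */
#define LOW_BITS_MASK 0xfffu

/*
 * Hypothetical helper: returns true when a store and a later load of
 * `size` bytes each overlap once both addresses are truncated to their
 * low 12 bits. Wrap-around at the page boundary is not modelled.
 */
bool low_bits_overlap(uint64_t store_addr, uint64_t load_addr, unsigned size)
{
    uint64_t st = store_addr & LOW_BITS_MASK;
    uint64_t ld = load_addr & LOW_BITS_MASK;

    /* Half-open ranges [st, st+size) and [ld, ld+size) intersect. */
    return st < ld + size && ld < st + size;
}
```

Using the commit's values (rsi = 0xf004, rdi = 0xe008) and 8-byte accesses: in the original order the store to 0xe008 and the following load from 0xf00c overlap under this model, while in the reordered sequence the store to 0xe010 and the following load from 0xf004 do not.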