Optimize memcpy: sysdeps/i386/i686/multiarch/memcpy-ssse3.S

I've improved the implementation of memcpy in "sysdeps/i386/i686/multiarch/memcpy-ssse3.S". The patch includes some minor style fixes, but the important change is the use of prefetch loops when DATA_CACHE_SIZE_HALF <= len < SHARED_CACHE_SIZE_HALF and the src and dst pointers have unequal 16-byte alignments. This gives a 6% to 50% performance boost on an Atom machine, about 24.73% in geometric mean.
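
For readers who do not want to trace the assembly, the condition described above can be sketched in C. This is only an illustration, not the code in the patch. The threshold values, the prefetch distance `PREFETCH_AHEAD`, and the helper names `use_prefetch_loop` and `copy_with_prefetch` are all assumptions made for this sketch, and `__builtin_prefetch` is the GCC/Clang builtin rather than the instruction sequence the patch uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for illustration only; the real thresholds are
   not taken from the patch. */
#define DATA_CACHE_SIZE_HALF   (16u * 1024u)
#define SHARED_CACHE_SIZE_HALF (512u * 1024u)
#define PREFETCH_AHEAD         128u   /* assumed prefetch distance in bytes */

/* Returns 1 when the copy falls into the case the patch targets:
   DATA_CACHE_SIZE_HALF <= len < SHARED_CACHE_SIZE_HALF and src, dst
   have unequal 16-byte alignments. */
int use_prefetch_loop(const void *dst, const void *src, size_t len)
{
    /* The low four address bits differ exactly when the XOR of the two
       addresses has a nonzero low nibble. */
    int unequal_align = (((uintptr_t)dst ^ (uintptr_t)src) & 15u) != 0;
    return unequal_align
           && len >= DATA_CACHE_SIZE_HALF
           && len < SHARED_CACHE_SIZE_HALF;
}

/* Copies 64 bytes per iteration and hints the cache about source data
   PREFETCH_AHEAD bytes further on, as long as that address is still
   inside the source buffer. */
void *copy_with_prefetch(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len >= 64) {
        if (len > PREFETCH_AHEAD)
            __builtin_prefetch(s + PREFETCH_AHEAD, 0, 0);
        memcpy(d, s, 64);
        d += 64;
        s += 64;
        len -= 64;
    }
    memcpy(d, s, len);   /* remaining tail, fewer than 64 bytes */
    return dst;
}
```

A caller would check `use_prefetch_loop(dst, src, len)` first and fall back to its ordinary copy path otherwise. The real routine makes this choice inside the assembly dispatch.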
2012-03-30 16:45:27 -04:00
commit 4b43400f6a
parent 48c41d04ee
2 changed files with 1484 additions and 564 deletions
--- a/7
+++ b/7
@@ -1,3 +1,10 @@
+2012-03-22  Liubov Dmitrieva  <liubov.dmitrieva@gmail.com>
+
+	* sysdeps/i386/i686/multiarch/memcpy-ssse3.S: Update.
+	Optimize memcpy with prefetch if
+	DATA_CACHE_SIZE_HALF <= len <  SHARED_CACHE_SIZE_HALF and
+	src, dst pointers have unequal 16 byte alignments.
+
 2012-03-30  Siddhesh Poyarekar  <siddhesh@redhat.com>

 	[BZ #13928]
--- a/sysdeps/i386/i686/multiarch/memcpy-ssse3.S
+++ b/sysdeps/i386/i686/multiarch/memcpy-ssse3.S