optimize the following memcpy: sysdeps/i386/i686/multiarch/memcpy-ssse3.S
I've improved the following implementation of memcpy: "sysdeps/i386/i686/multiarch/memcpy-ssse3.S". The patch includes some minor style fixes, but the important part is the use of prefetch loops for the case where DATA_CACHE_SIZE_HALF <= len < SHARED_CACHE_SIZE_HALF and the src and dst pointers have unequal 16-byte alignments. This gives a 6% to 50% performance boost on an Atom machine, about 24.73% in geometric mean.
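The condition and the loop shape described above can be sketched in C. This is only an illustration, not the patched routine: the real code is SSSE3 assembly, and the threshold values, the 128-byte prefetch distance, and the helper names `use_prefetch_path` and `copy_with_prefetch` are assumptions made for this sketch. `__builtin_prefetch` is the GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical thresholds for illustration only; the real values depend
   on the cache sizes of the target CPU. */
#define DATA_CACHE_SIZE_HALF   (16 * 1024)
#define SHARED_CACHE_SIZE_HALF (256 * 1024)

/* Assumed prefetch distance in bytes; a tuning parameter, not taken
   from the patch. */
#define PREFETCH_AHEAD 128

/* Returns nonzero when the copy falls into the case the patch targets:
   a medium-sized length and src/dst with different offsets within a
   16-byte block. */
static int use_prefetch_path(const void *dst, const void *src, size_t len)
{
    unsigned dst_off = (unsigned)((uintptr_t)dst & 15);
    unsigned src_off = (unsigned)((uintptr_t)src & 15);

    return len >= DATA_CACHE_SIZE_HALF
        && len < SHARED_CACHE_SIZE_HALF
        && dst_off != src_off;
}

/* Copies len bytes in 64-byte steps, issuing a read prefetch for data
   PREFETCH_AHEAD bytes ahead of the current source position. */
static void *copy_with_prefetch(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len >= 64) {
        /* Only prefetch addresses that are still inside the source. */
        if (len > PREFETCH_AHEAD)
            __builtin_prefetch(s + PREFETCH_AHEAD);
        memcpy(d, s, 64);
        d += 64;
        s += 64;
        len -= 64;
    }
    memcpy(d, s, len);   /* remaining tail, fewer than 64 bytes */
    return dst;
}
```

A caller would check `use_prefetch_path(dst, src, len)` first and fall back to the ordinary copy loop otherwise. The prefetch only hints at upcoming loads, so the copied result is the same with or without it.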
parent 48c41d04ee
commit 4b43400f6a
@@ -1,3 +1,10 @@
+2012-03-22 Liubov Dmitrieva <liubov.dmitrieva@gmail.com>
+
+* sysdeps/i386/i686/multiarch/memcpy-ssse3.S: Update.
+Optimize memcpy with prefetch if
+DATA_CACHE_SIZE_HALF <= len < SHARED_CACHE_SIZE_HALF and
+src, dst pointers have unequal 16 byte alignments.
+
 2012-03-30 Siddhesh Poyarekar <siddhesh@redhat.com>
 
 [BZ #13928]
File diff suppressed because it is too large