Faster memset on x64

This implementation speeds up memset in several ways. The first is
avoiding an expensive computed jump. The second is exploiting the fact
that the arguments of memset are, most of the time, aligned to 8 bytes.
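
The message does not say how the computed jump is avoided. One common
way, sketched here in C as an assumption and not taken from the
commit's assembly, is to replace a jump table indexed by size with a
few range checks and overlapping stores. The helper name small_fill
and the 16-byte limit are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the commit: fill n bytes (n <= 16)
   with the byte value c, using range checks and overlapping stores
   instead of a jump table indexed by n. */
static void small_fill(unsigned char *dst, int c, size_t n)
{
    /* Broadcast the fill byte into all 8 bytes of a 64-bit word. */
    uint64_t v = 0x0101010101010101ULL * (unsigned char)c;

    if (n >= 8) {
        /* Two 8-byte stores cover 8..16 bytes; they overlap when n < 16. */
        memcpy(dst, &v, 8);
        memcpy(dst + n - 8, &v, 8);
    } else if (n >= 4) {
        /* Two 4-byte stores cover 4..7 bytes. */
        uint32_t w = (uint32_t)v;
        memcpy(dst, &w, 4);
        memcpy(dst + n - 4, &w, 4);
    } else if (n >= 2) {
        /* Two 2-byte stores cover 2..3 bytes. */
        uint16_t h = (uint16_t)v;
        memcpy(dst, &h, 2);
        memcpy(dst + n - 2, &h, 2);
    } else if (n == 1) {
        dst[0] = (unsigned char)c;
    }
    /* n == 0: nothing to do. */
}
```

Each size class costs a couple of predictable compares instead of an
indirect branch, and the fixed-size memcpy calls are typically
compiled to single store instructions.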

Benchmark results are available at:
kam.mff.cuni.cz/~ondra/benchmark_string/memset_profile_result27_04_13.tar.bz2
Ondrej Bilka 2013-05-20 08:26:00 +02:00
parent 2d48b41c8f
commit b2b671b677
2 changed files with 96 additions and 1315 deletions


@@ -1,3 +1,9 @@
2013-05-20 Ondřej Bílka <neleai@seznam.cz>
* sysdeps/x86_64/memset.S (memset): New implementation.
(__bzero): Likewise.
(__memset_tail): New function.
2013-05-20 Ondřej Bílka <neleai@seznam.cz>
* sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: New file.

File diff suppressed because it is too large