Siddhesh Poyarekar dd5bc7f1b3 aarch64: Optimized implementation of memmove for Qualcomm Falkor
This is an optimized memmove implementation for the Qualcomm Falkor
processor core.  Because of the way the falkor memcpy needs to be
written, code cannot easily be shared between memmove and memcpy as it
is in the other aarch64 memcpy implementations, so this routine is
separate.  The underlying principle is the same as that of memcpy: it
tries to use registers with the same lower 4 bits for fetching the
same stream, thus optimizing hardware prefetcher performance.

The memcpy copy loop copies 64 bytes at a time using the same register
pair since that's the way to train the hardware prefetcher on the
falkor core.  memmove cannot quite do that since it needs to avoid
overlaps, so it does the next best thing, i.e. has a 32 byte loop with
a 32 byte end (prefetch a loop ahead to account for overlapping
locations) with register pairs that alias so that they hit the same
prefetcher.  Because of this difference in loop size, they currently
have to be separate implementations, but work is under way to get
memmove to fall back to memcpy whenever it can without simply
duplicating all of the code.
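
As a rough illustration of why memmove has to care about overlaps, here
is a minimal portable C sketch.  It is not the Falkor assembly and says
nothing about register aliasing or prefetcher behaviour; the names
sketch_memmove and copy32 are hypothetical.  It only shows a 32-byte
block loop that picks its direction from the relative position of the
buffers and loads each block in full before storing it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load all 32 bytes into a temporary before
   storing any of them, so one block may overlap its own destination.  */
static void
copy32 (unsigned char *dst, const unsigned char *src)
{
  unsigned char tmp[32];
  memcpy (tmp, src, 32);
  memcpy (dst, tmp, 32);
}

/* Illustrative only; not the routine added by this commit.  */
void *
sketch_memmove (void *dstv, const void *srcv, size_t n)
{
  unsigned char *dst = dstv;
  const unsigned char *src = srcv;

  if (dst == src || n == 0)
    return dstv;

  /* Unsigned difference: true when dst is below src (wraps around) or
     at least n bytes above it.  A forward copy is safe in both cases.  */
  if ((uintptr_t) dst - (uintptr_t) src >= n)
    {
      size_t i = 0;
      for (; i + 32 <= n; i += 32)
        copy32 (dst + i, src + i);
      for (; i < n; i++)
        dst[i] = src[i];
    }
  else
    {
      /* dst lies inside [src, src + n): copy from the end backwards so
         source bytes are read before they are overwritten.  */
      size_t i = n;
      for (; i >= 32; i -= 32)
        copy32 (dst + i - 32, src + i - 32);
      while (i > 0)
        {
          i--;
          dst[i] = src[i];
        }
    }
  return dstv;
}
```

Calling sketch_memmove (buf + 7, buf, 100) on a 128-byte buffer takes
the backward path, while sketch_memmove (buf, buf + 7, 100) takes the
forward path.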

Performance:

The routine fares around 20-25% better than the generic memmove for
most medium to large sizes (i.e. > 128 bytes) for the new walking
memmove benchmark (memmove-walk) with an unexplained regression
between 1K and 2K.  The minor regression is worth looking into, but
the remaining gains are significant enough that we would like this
included upstream while we look into the cause of the regression.
Here is a snippet of the numbers as generated from the
microbenchmark by the compare_strings script.  Comparisons are against
__memmove_generic:

Function: memmove
Variant: walk
                                    __memmove_thunderx	__memmove_falkor	__memmove_generic
========================================================================================================================
<snip>
                        length=16384:  12508800.00 (  6.09%)	 11486800.00 ( 13.76%)	 13319600.00
                        length=16400:  13614200.00 ( -0.67%)	 11585000.00 ( 14.33%)	 13523600.00
                        length=16385:  13448400.00 (  0.10%)	 11732700.00 ( 12.84%)	 13461200.00
                        length=16399:  13594100.00 ( -0.22%)	 11859600.00 ( 12.57%)	 13564400.00
                        length=16386:  13211600.00 (  1.13%)	 11503800.00 ( 13.91%)	 13362400.00
                        length=16398:  13218600.00 (  2.12%)	 11573200.00 ( 14.30%)	 13504700.00
                        length=16387:  13510900.00 ( -0.37%)	 11744200.00 ( 12.76%)	 13461300.00
                        length=16397:  13603700.00 ( -0.15%)	 11878200.00 ( 12.55%)	 13583200.00
                        length=16388:  13461700.00 ( -0.13%)	 11558000.00 ( 14.03%)	 13444100.00
                        length=16396:  13517500.00 ( -0.03%)	 11561300.00 ( 14.45%)	 13513900.00
                        length=16389:  13534100.00 (  0.17%)	 11756800.00 ( 13.28%)	 13556900.00
                        length=16395:  13585600.00 (  0.11%)	 11791800.00 ( 13.30%)	 13601200.00
                        length=16390:  13480100.00 ( -0.13%)	 11685500.00 ( 13.20%)	 13462100.00
                        length=16394:  13529900.00 ( -0.23%)	 11549800.00 ( 14.43%)	 13498200.00
                        length=16391:  13595400.00 ( -0.26%)	 11768200.00 ( 13.22%)	 13560600.00
                        length=16393:  13567000.00 (  0.20%)	 11779700.00 ( 13.35%)	 13594700.00
                        length=32768:  71308800.00 ( -6.53%)	 50220800.00 ( 24.98%)	 66939200.00
                        length=32784:  72100800.00 (-11.55%)	 50114100.00 ( 22.47%)	 64636300.00
                        length=32769:  71767000.00 ( -7.10%)	 51238400.00 ( 23.54%)	 67010000.00
                        length=32783:  70113700.00 (-40.95%)	 51129000.00 ( -2.78%)	 49744400.00
                        length=32770:  71367600.00 ( -6.52%)	 50244700.00 ( 25.01%)	 67000900.00
                        length=32782:  64366700.00 (  4.71%)	 50101400.00 ( 25.83%)	 67545600.00
                        length=32771:  71440100.00 ( -6.51%)	 51263900.00 ( 23.57%)	 67074900.00
                        length=32781:  66993000.00 (  0.34%)	 51108300.00 ( 23.97%)	 67220300.00
                        length=32772:  71443900.00 (-60.50%)	 50062100.00 (-12.47%)	 44512600.00
                        length=32780:  71759100.00 ( -6.58%)	 50263200.00 ( 25.35%)	 67328600.00
                        length=32773:  71714900.00 (-33.21%)	 51076600.00 (  5.12%)	 53835400.00
                        length=32779:  71756900.00 ( -6.56%)	 51290800.00 ( 23.83%)	 67337800.00
                        length=32774:  59689300.00 (-34.55%)	 50068400.00 (-12.86%)	 44363300.00
                        length=32778:  71847500.00 (-18.20%)	 50084100.00 ( 17.61%)	 60786500.00
                        length=32775:  71599300.00 ( -6.54%)	 51278200.00 ( 23.70%)	 67204800.00
                        length=32777:  71862900.00 (-60.85%)	 51094000.00 (-14.36%)	 44677900.00
                        length=65536: 282848000.00 ( -6.60%)	199187000.00 ( 24.93%)	265325000.00
                        length=65552: 243285000.00 (-41.61%)	198512000.00 (-15.54%)	171805000.00
                        length=65537: 255415000.00 (-23.47%)	202499000.00 (  2.11%)	206858000.00
                        length=65551: 280122000.00 (-62.95%)	203349000.00 (-18.29%)	171911000.00
                        length=65538: 283676000.00 (-14.46%)	198368000.00 ( 19.96%)	247848000.00
                        length=65550: 275566000.00 (-51.76%)	198494000.00 ( -9.31%)	181581000.00
                        length=65539: 283699000.00 ( -6.58%)	203453000.00 ( 23.57%)	266195000.00
                        length=65549: 286572000.00 ( -6.65%)	202607000.00 ( 24.60%)	268712000.00
                        length=65540: 283710000.00 ( -6.59%)	199161000.00 ( 25.17%)	266160000.00
                        length=65548: 237573000.00 ( 11.48%)	198462000.00 ( 26.06%)	268395000.00
                        length=65541: 284150000.00 ( -6.58%)	203273000.00 ( 23.75%)	266600000.00
                        length=65547: 286250000.00 ( -6.70%)	202594000.00 ( 24.48%)	268263000.00
                        length=65542: 284167000.00 ( -6.60%)	199122000.00 ( 25.31%)	266584000.00
                        length=65546: 285656000.00 ( -6.59%)	198443000.00 ( 25.95%)	268002000.00
                        length=65543: 284600000.00 ( -6.58%)	203247000.00 ( 23.89%)	267030000.00
                        length=65545: 285665000.00 ( -6.40%)	202575000.00 ( 24.55%)	268472000.00
<snip>
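
The percentages in parentheses above are consistent with the relative
reduction in measured time against the __memmove_generic column, so a
positive value means faster than generic.  The following C sketch
reproduces that arithmetic; improvement_pct is a hypothetical name, and
the formula is inferred from the numbers above rather than taken from
the compare_strings script.

```c
/* Relative improvement of TIMING over BASELINE, in percent.
   Positive means TIMING is lower (faster) than BASELINE.
   Formula inferred from the table; an assumption, not the script.  */
double
improvement_pct (double baseline, double timing)
{
  return (baseline - timing) / baseline * 100.0;
}
```

For the length=16384 row, improvement_pct (13319600.0, 11486800.0)
gives about 13.76, matching the __memmove_falkor column.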

	* sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add
	memmove_falkor.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Likewise.
	* sysdeps/aarch64/multiarch/memmove.c: Likewise.
	* sysdeps/aarch64/multiarch/memmove_falkor.S: New file.
2017-10-05 22:20:23 +05:30
ChangeLog.old Move all old ChangeLogs to a top-level ChangeLog.old directory. 2017-09-01 09:31:43 -04:00
argp Mark internal argp functions with attribute_hidden [BZ #18822] 2017-10-01 15:10:27 -07:00
assert Fix position of tests-unsupported definition in assert/Makefile. 2017-08-22 00:30:51 +00:00
benchtests benchtests: Memory walking benchmark for memmove 2017-10-05 22:20:23 +05:30
bits hurd: Fix bits/socket.h conformity 2017-09-24 22:21:41 +02:00
catgets Don't compile non-lib modules as lib modules [BZ #21864] 2017-08-21 05:34:54 -07:00
conform Fix mcontext_t sigcontext namespace (bug 21457). 2017-08-30 22:02:04 +00:00
crypt crypt: Use NSPR header files in addition to NSS header files [BZ #17956] 2017-10-04 15:02:35 +02:00
csu Hide internal __libc_print_version function [BZ #18822] 2017-10-01 17:55:30 -07:00
ctype Use locale_t, not __locale_t, throughout glibc 2017-06-20 20:30:06 -04:00
debug Enable unwind info in libc-start.c and backtrace.c 2017-09-19 15:07:58 +01:00
dev Rename xlocale.h to bits/types/__locale_t.h. 2017-06-20 20:28:11 -04:00
dirent hurd: Fix dirfd symbol exposition from ftw 2017-09-28 00:49:05 +02:00
dlfcn Mark __dso_handle as hidden [BZ #18822] 2017-09-26 16:53:44 -07:00
elf Use $(DEFAULT-LDFLAGS-$(@F)) in +link-static-before-libc 2017-10-04 17:16:04 -07:00
gmon tst-gmon: Build with -fno-omit-frame-pointer 2017-10-05 14:34:45 +02:00
gnulib Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
grp Remove compat from DEFAULT_CONFIG lookup strings 2017-09-12 10:21:48 -07:00
gshadow Remove __need macros from stdio.h and wchar.h. 2017-06-08 13:58:17 -04:00
hesiod Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
hurd hurd: fix gethostname(NULL, 0) 2017-09-07 00:51:17 +02:00
iconv Mark internal functions with attribute_hidden [BZ #18822] 2017-10-01 15:07:23 -07:00
iconvdata Add new codepage charmaps/IBM858 [BZ #21084] 2017-09-14 15:50:57 +02:00
include Don't use hidden visibility in libc.a with PIE on i386 2017-10-04 17:18:42 -07:00
inet Hide internal idna functions [BZ #18822] 2017-10-01 17:33:22 -07:00
intl Hide internal __hash_string function [BZ #18822] 2017-10-01 17:41:34 -07:00
io hurd: Fix dirfd symbol exposition from ftw 2017-09-28 00:49:05 +02:00
libidn Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
libio Always do locking when iterating over list of streams (bug 15142) 2017-10-05 17:26:05 +02:00
locale Mark internal functions with attribute_hidden [BZ #18822] 2017-10-01 15:07:23 -07:00
localedata Add new codepage charmaps/IBM858 [BZ #21084] 2017-09-14 15:50:57 +02:00
login Mark internal utmp functions with attribute_hidden [BZ #18822] 2017-10-01 15:51:56 -07:00
mach hurd: Remove duplicate symbol version 2017-08-28 14:19:55 +02:00
malloc Mark __dso_handle as hidden [BZ #18822] 2017-09-26 16:53:44 -07:00
manual Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
math test-math-iscanonical.cc: Return errors != 0 2017-10-04 14:31:16 -07:00
mathvec Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
misc Hide internal __hasmntopt function [BZ #18822] 2017-10-01 17:37:42 -07:00
nis Move nss_compat from nis to nss subdir and install it unconditionally 2017-10-04 14:20:48 +02:00
nptl Mark __dso_handle as hidden [BZ #18822] 2017-09-26 16:53:44 -07:00
nptl_db Move all old ChangeLogs to a top-level ChangeLog.old directory. 2017-09-01 09:31:43 -04:00
nscd nscd: Eliminate compilation time dependency in the build output 2017-10-05 18:14:57 +02:00
nss Move nss_compat from nis to nss subdir and install it unconditionally 2017-10-04 14:20:48 +02:00
po Update translations 2017-09-11 05:50:49 +05:30
posix Hide internal __sched_setparam function [BZ #18822] 2017-10-01 17:43:25 -07:00
pwd Remove __need macros from stdio.h and wchar.h. 2017-06-08 13:58:17 -04:00
resolv Mark internal functions with attribute_hidden [BZ #18822] 2017-10-01 15:07:23 -07:00
resource Hide internal __setrlimit function [BZ #18822] 2017-10-01 17:46:54 -07:00
rt aio: Remove internal_function function attribute 2017-08-31 15:59:06 +02:00
scripts Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
setjmp Remove __need macros from signal.h. 2017-05-20 19:04:43 -04:00
shadow Remove __need macros from stdio.h and wchar.h. 2017-06-08 13:58:17 -04:00
signal Hide internal signal functions [BZ #18822] 2017-10-01 16:04:41 -07:00
socket __opensock: Remove internal_function attribute 2017-08-17 10:18:15 +02:00
soft-fp Remove non-add-on Banner files. 2017-09-21 17:49:51 +00:00
stdio-common linux: Implement tmpfile with O_TMPFILE (BZ#21530) 2017-09-01 09:52:47 -03:00
stdlib abort: Do not flush stdio streams [BZ #15436] 2017-10-05 14:48:16 +02:00
streams Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
string Hide internal __strsep function [BZ #18822] 2017-10-01 16:03:41 -07:00
sunrpc sunrpc/tst-udp-nonblocking: Fix timeout value 2017-09-10 21:09:28 +02:00
support support_format_hostent: Add more error information for NETDB_INTERNAL 2017-10-05 12:20:19 +02:00
sysdeps aarch64: Optimized implementation of memmove for Qualcomm Falkor 2017-10-05 22:20:23 +05:30
sysvipc Fix test-sysvsem on some platforms 2017-01-02 18:53:50 -02:00
termios Hide internal __tcgetattr function [BZ #18822] 2017-10-01 17:48:24 -07:00
time time: Remove the internal_function attribute 2017-08-31 15:59:07 +02:00
timezone zic: Use PRIdMAX to print line numbers 2017-07-25 12:34:14 +05:30
wcsmbs Mark ____wcsto*_l_internal functions with attribute_hidden [BZ #18822] 2017-10-01 15:09:28 -07:00
wctype Use locale_t, not __locale_t, throughout glibc 2017-06-20 20:30:06 -04:00
.gitattributes Assume __NR_openat is always defined 2016-03-23 23:35:08 +01:00
.gitignore Add *.pyc to .gitignore 2015-05-18 15:26:26 +05:30
COPYING
COPYING.LIB
ChangeLog aarch64: Optimized implementation of memmove for Qualcomm Falkor 2017-10-05 22:20:23 +05:30
INSTALL Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
LICENSES
MAINTAINERS Add MAINTAINERS 2017-05-11 13:38:30 -04:00
Makeconfig Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
Makefile Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
Makefile.in New make target to only build benchmark binaries 2016-04-20 10:23:28 +05:30
Makerules Place $(elf-objpfx)sofini.os last [BZ #22051] 2017-08-31 06:28:46 -07:00
NEWS Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
README Require Linux kernel 3.2 or later on x86 / x86_64. 2017-05-08 10:45:20 +00:00
Rules Suppress internal declarations for most of the testsuite. 2017-05-11 19:27:59 -04:00
abi-tags Remove the bulk of the NaCl port. 2017-05-20 08:09:10 -04:00
aclocal.m4 gmon: Add test for basic mcount/gprof functionality 2017-08-15 15:49:45 +02:00
config.h.in Don't use hidden visibility in libc.a with PIE on i386 2017-10-04 17:18:42 -07:00
config.make.in Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
configure Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
configure.ac Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
extra-lib.mk Rename cppflags-iterator.mk to libof-iterator.mk, remove extra-modules.mk. 2017-05-09 07:06:29 -04:00
gen-locales.mk
libc-abis
libof-iterator.mk Rename cppflags-iterator.mk to libof-iterator.mk, remove extra-modules.mk. 2017-05-09 07:06:29 -04:00
o-iterator.mk
shlib-versions Extend NSS test suite 2017-07-17 15:52:44 -04:00
test-skeleton.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
version.h Open master for development 2017-08-02 19:23:16 +05:30

README

This directory contains the sources of the GNU C Library.
See the file "version.h" for what release version you have.

The GNU C Library is the standard system C library for all GNU systems,
and is an important part of what makes up a GNU system.  It provides the
system API for all programs written in C and C-compatible languages such
as C++ and Objective C; the runtime facilities of other programming
languages use the C library to access the underlying operating system.

In GNU/Linux systems, the C library works with the Linux kernel to
implement the operating system behavior seen by user applications.
In GNU/Hurd systems, it works with a microkernel and Hurd servers.

The GNU C Library implements much of the POSIX.1 functionality in the
GNU/Hurd system, using configurations i[4567]86-*-gnu.  The current
GNU/Hurd support requires out-of-tree patches that will eventually be
incorporated into an official GNU C Library release.

When working with Linux kernels, this version of the GNU C Library
requires Linux kernel version 3.2 or later.

Also note that the shared version of the libgcc_s library must be
installed for the pthread library to work correctly.

The GNU C Library supports these configurations for using Linux kernels:

	aarch64*-*-linux-gnu
	alpha*-*-linux-gnu
	arm-*-linux-gnueabi
	hppa-*-linux-gnu	Not currently functional without patches.
	i[4567]86-*-linux-gnu
	x86_64-*-linux-gnu	Can build either x86_64 or x32
	ia64-*-linux-gnu
	m68k-*-linux-gnu
	microblaze*-*-linux-gnu
	mips-*-linux-gnu
	mips64-*-linux-gnu
	powerpc-*-linux-gnu	Hardware or software floating point, BE only.
	powerpc64*-*-linux-gnu	Big-endian and little-endian.
	s390-*-linux-gnu
	s390x-*-linux-gnu
	sh[34]-*-linux-gnu
	sparc*-*-linux-gnu
	sparc64*-*-linux-gnu
	tilegx-*-linux-gnu
	tilepro-*-linux-gnu

If you are interested in doing a port, please contact the glibc
maintainers; see http://www.gnu.org/software/libc/ for more
information.

See the file INSTALL to find out how to configure, build, and install
the GNU C Library.  You might also consider reading the WWW pages for
the C library at http://www.gnu.org/software/libc/.

The GNU C Library is (almost) completely documented by the Texinfo manual
found in the `manual/' subdirectory.  The manual is still being updated
and contains some known errors and omissions; we regret that we do not
have the resources to work on the manual as much as we would like.  For
corrections to the manual, please file a bug in the `manual' component,
following the bug-reporting instructions below.  Please be sure to check
the manual in the current development sources to see if your problem has
already been corrected.

Please see http://www.gnu.org/software/libc/bugs.html for bug reporting
information.  We are now using the Bugzilla system to track all bug reports.
This web page gives detailed information on how to report bugs properly.

The GNU C Library is free software.  See the file COPYING.LIB for copying
conditions, and LICENSES for notices about a few contributions that require
these additional notices to be distributed.  License copyright years may be
listed using range notation, e.g., 1996-2015, indicating that every year in
the range, inclusive, is a copyrightable year that would otherwise be listed
individually.