linux/arch
Eric Biggers 91a2abb78f crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS
Add a NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
for ARM64.  This is ported from the 32-bit version.  It may be useful on
devices with 64-bit ARM CPUs that don't have the Cryptography
Extensions, so cannot do AES efficiently -- e.g. the Cortex-A53
processor on the Raspberry Pi 3.

It generally works the same way as the 32-bit version, but there are
some slight differences due to the different instructions, registers,
and syntax available in ARM64 vs. in ARM32.  For example, in the 64-bit
version there are enough registers to hold the XTS tweaks for each
128-byte chunk, so they don't need to be saved on the stack.

Benchmarks on a Raspberry Pi 3 running a 64-bit kernel:

   Algorithm                              Encryption     Decryption
   ---------                              ----------     ----------
   Speck64/128-XTS (NEON)                 92.2 MB/s      92.2 MB/s
   Speck128/256-XTS (NEON)                75.0 MB/s      75.0 MB/s
   Speck128/256-XTS (generic)             47.4 MB/s      35.6 MB/s
   AES-128-XTS (NEON bit-sliced)          33.4 MB/s      29.6 MB/s
   AES-256-XTS (NEON bit-sliced)          24.6 MB/s      21.7 MB/s

The code performs well on higher-end ARM64 processors as well, though
such processors tend to have the Crypto Extensions which make AES
preferred.  For example, here are the same benchmarks run on a HiKey960
(with CPU affinity set for the A73 cores), with the Crypto Extensions
implementation of AES-256-XTS added:

   Algorithm                              Encryption     Decryption
   ---------                              -----------    -----------
   AES-256-XTS (Crypto Extensions)        1273.3 MB/s    1274.7 MB/s
   Speck64/128-XTS (NEON)                  359.8 MB/s     348.0 MB/s
   Speck128/256-XTS (NEON)                 292.5 MB/s     286.1 MB/s
   Speck128/256-XTS (generic)              186.3 MB/s     181.8 MB/s
   AES-128-XTS (NEON bit-sliced)           142.0 MB/s     124.3 MB/s
   AES-256-XTS (NEON bit-sliced)           104.7 MB/s      91.1 MB/s

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-16 23:35:41 +08:00
..
alpha
arc
arm crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS 2018-02-22 22:16:55 +08:00
arm64 crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS 2018-03-16 23:35:41 +08:00
blackfin unify {de,}mangle_poll(), get rid of kernel-side POLL... 2018-02-11 14:37:22 -08:00
c6x
cris vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
frv unify {de,}mangle_poll(), get rid of kernel-side POLL... 2018-02-11 14:37:22 -08:00
h8300
hexagon
ia64 vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
m32r
m68k unify {de,}mangle_poll(), get rid of kernel-side POLL... 2018-02-11 14:37:22 -08:00
metag
microblaze
mips unify {de,}mangle_poll(), get rid of kernel-side POLL... 2018-02-11 14:37:22 -08:00
mn10300
nios2 nios2 update for v4.16-rc1 2018-02-11 13:52:32 -08:00
openrisc
parisc
powerpc vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
riscv RISC-V changes for 4.16 2018-02-07 11:33:08 -08:00
s390 KVM changes for 4.16 2018-02-10 13:16:35 -08:00
score arch/score/kernel/setup.c: combine two seq_printf() calls into one call in show_cpuinfo() 2018-02-06 18:32:47 -08:00
sh
sparc unify {de,}mangle_poll(), get rid of kernel-side POLL... 2018-02-11 14:37:22 -08:00
tile
um mconsole_proc(): don't mess with file->f_pos 2018-02-09 19:28:01 -08:00
unicore32 lib: optimize cpumask_next_and() 2018-02-06 18:32:44 -08:00
x86 crypto: x86/des3_ede - des3_ede_skciphers[] can be static 2018-03-09 22:45:53 +08:00
xtensa unify {de,}mangle_poll(), get rid of kernel-side POLL... 2018-02-11 14:37:22 -08:00
.gitignore
Kconfig Makefile: introduce CONFIG_CC_STACKPROTECTOR_AUTO 2018-02-06 18:32:44 -08:00