Commit Graph

555 Commits

Author SHA1 Message Date
Martin Willi cee7a36ecb crypto: x86/chacha20 - Add a 8-block AVX-512VL variant
This variant is similar to the AVX2 version, but benefits from the AVX-512
rotate instructions and the additional registers, so it can operate without
any data on the stack. It uses ymm registers only to avoid the massive core
throttling on Skylake-X platforms. Nontheless does it bring a ~30% speed
improvement compared to the AVX2 variant for random encryption lengths.

The AVX2 version uses "rep movsb" for partial block XORing via the stack.
With AVX-512, the new "vmovdqu8" can do this much more efficiently. The
associated "kmov" instructions to work with dynamic masks is not part of
the AVX-512VL instruction set, hence we depend on AVX-512BW as well. Given
that the major AVX-512VL architectures provide AVX-512BW and this extension
does not affect core clocking, this seems to be no problem at least for
now.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-29 16:27:04 +08:00
Eric Biggers 878afc35cd crypto: poly1305 - use structures for key and accumulator
In preparation for exposing a low-level Poly1305 API which implements
the ε-almost-∆-universal (εA∆U) hash function underlying the Poly1305
MAC and supports block-aligned inputs only, create structures
poly1305_key and poly1305_state which hold the limbs of the Poly1305
"r" key and accumulator, respectively.

These structures could actually have the same type (e.g. poly1305_val),
but different types are preferable, to prevent misuse.

Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-20 14:26:56 +08:00
Eric Biggers 1ca1b91794 crypto: chacha20-generic - refactor to allow varying number of rounds
In preparation for adding XChaCha12 support, rename/refactor
chacha20-generic to support different numbers of rounds.  The
justification for needing XChaCha12 support is explained in more detail
in the patch "crypto: chacha - add XChaCha12 support".

The only difference between ChaCha{8,12,20} are the number of rounds
itself; all other parts of the algorithm are the same.  Therefore,
remove the "20" from all definitions, structures, functions, files, etc.
that will be shared by all ChaCha versions.

Also make ->setkey() store the round count in the chacha_ctx (previously
chacha20_ctx).  The generic code then passes the round count through to
chacha_block().  There will be a ->setkey() function for each explicitly
allowed round count; the encrypt/decrypt functions will be the same.  I
decided not to do it the opposite way (same ->setkey() function for all
round counts, with different encrypt/decrypt functions) because that
would have required more boilerplate code in architecture-specific
implementations of ChaCha and XChaCha.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-20 14:26:55 +08:00
Martin Willi 8a5a79d555 crypto: x86/chacha20 - Add a 4-block AVX2 variant
This variant builds upon the idea of the 2-block AVX2 variant that
shuffles words after each round. The shuffling has a rather high latency,
so the arithmetic units are not optimally used.

Given that we have plenty of registers in AVX, this version parallelizes
the 2-block variant to do four blocks. While the first two blocks are
shuffling, the CPU can do the XORing on the second two blocks and
vice-versa, which makes this version much faster than the SSSE3 variant
for four blocks. The latter is now mostly for systems that do not have
AVX2, but there it is the work-horse, so we keep it in place.

The partial XORing function trailer is very similar to the AVX2 2-block
variant. While it could be shared, that code segment is rather short;
profiling is also easier with the trailer integrated, so we keep it per
function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-16 14:11:04 +08:00
Martin Willi a5dd97f862 crypto: x86/chacha20 - Add a 2-block AVX2 variant
This variant uses the same principle as the single block SSSE3 variant
by shuffling the state matrix after each round. With the wider AVX
registers, we can do two blocks in parallel, though.

This function can increase performance and efficiency significantly for
lengths that would otherwise require a 4-block function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-16 14:11:04 +08:00
Martin Willi 9b17608f15 crypto: x86/chacha20 - Use larger block functions more aggressively
Now that all block functions support partial lengths, engage the wider
block sizes more aggressively. This prevents using smaller block
functions multiple times, where the next larger block function would
have been faster.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-16 14:11:04 +08:00
Martin Willi c3b734dd32 crypto: x86/chacha20 - Support partial lengths in 8-block AVX2 variant
Add a length argument to the eight block function for AVX2, so the
block function may XOR only a partial length of eight blocks.

To avoid unnecessary operations, we integrate XORing of the first four
blocks in the final lane interleaving; this also avoids some work in
the partial lengths path.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-16 14:11:04 +08:00
Martin Willi db8e15a249 crypto: x86/chacha20 - Support partial lengths in 4-block SSSE3 variant
Add a length argument to the quad block function for SSSE3, so the
block function may XOR only a partial length of four blocks.

As we already have the stack set up, the partial XORing does not need
to. This gives a slightly different function trailer, so we keep that
separate from the 1-block function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-16 14:11:04 +08:00
Martin Willi e4e72063d3 crypto: x86/chacha20 - Support partial lengths in 1-block SSSE3 variant
Add a length argument to the single block function for SSSE3, so the
block function may XOR only a partial length of the full block. Given
that the setup code is rather cheap, the function does not process more
than one block; this allows us to keep the block function selection in
the C glue code.

The required branching does not negatively affect performance for full
block sizes. The partial XORing uses simple "rep movsb" to copy the
data before and after doing XOR in SSE. This is rather efficient on
modern processors; movsw can be slightly faster, but the additional
complexity is probably not worth it.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-16 14:11:04 +08:00
Eric Biggers e0db9c48f1 crypto: x86/aes-ni - fix build error following fpu template removal
aesni-intel_glue.c still calls crypto_fpu_init() and crypto_fpu_exit()
to register/unregister the "fpu" template.  But these functions don't
exist anymore, causing a build error.  Remove the calls to them.

Fixes: 944585a64f ("crypto: x86/aes-ni - remove special handling of AES in PCBC mode")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-10-08 13:47:02 +08:00
Ard Biesheuvel 944585a64f crypto: x86/aes-ni - remove special handling of AES in PCBC mode
For historical reasons, the AES-NI based implementation of the PCBC
chaining mode uses a special FPU chaining mode wrapper template to
amortize the FPU start/stop overhead over multiple blocks.

When this FPU wrapper was introduced, it supported widely used
chaining modes such as XTS and CTR (as well as LRW), but currently,
PCBC is the only remaining user.

Since there are no known users of pcbc(aes) in the kernel, let's remove
this special driver, and rely on the generic pcbc driver to encapsulate
the AES-NI core cipher.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-10-05 10:16:56 +08:00
Kees Cook 88fe0b957f x86/fpu: Remove VLA usage of skcipher
In the quest to remove all stack VLA usage from the kernel[1], this
replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage
with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(),
which uses a fixed stack size.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Cc: x86@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-28 12:46:07 +08:00
Herbert Xu 910e3ca10b Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Merge crypto-2.6 to resolve caam conflict with skcipher conversion.
2018-09-21 13:22:37 +08:00
Mikulas Patocka a788848116 crypto: aesni - don't use GFP_ATOMIC allocation if the request doesn't cross a page in gcm
This patch fixes gcmaes_crypt_by_sg so that it won't use memory
allocation if the data doesn't cross a page boundary.

Authenticated encryption may be used by dm-crypt. If the encryption or
decryption fails, it would result in I/O error and filesystem corruption.
The function gcmaes_crypt_by_sg is using GFP_ATOMIC allocation that can
fail anytime. This patch fixes the logic so that it won't attempt the
failing allocation if the data doesn't cross a page boundary.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-14 14:08:53 +08:00
Ondrej Mosnacek 24568b47d4 crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2
It turns out OSXSAVE needs to be checked only for AVX, not for SSE.
Without this patch the affected modules refuse to load on CPUs with SSE2
but without AVX support.

Fixes: 877ccce7cb ("crypto: x86/aegis,morus - Fix and simplify CPUID checks")
Cc: <stable@vger.kernel.org> # 4.18
Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-14 14:08:27 +08:00
Ard Biesheuvel ab8085c130 crypto: x86 - remove SHA multibuffer routines and mcryptd
As it turns out, the AVX2 multibuffer SHA routines are currently
broken [0], in a way that would have likely been noticed if this
code were in wide use. Since the code is too complicated to be
maintained by anyone except the original authors, and since the
performance benefits for real-world use cases are debatable to
begin with, it is better to drop it entirely for the moment.

[0] https://marc.info/?l=linux-crypto-vger&m=153476243825350&w=2

Suggested-by: Eric Biggers <ebiggers@google.com>
Cc: Megha Dey <megha.dey@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-04 11:37:04 +08:00
Linus Torvalds b4df50de6a Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:

 - Check for the right CPU feature bit in sm4-ce on arm64.

 - Fix scatterwalk WARN_ON in aes-gcm-ce on arm64.

 - Fix unaligned fault in aesni on x86.

 - Fix potential NULL pointer dereference on exit in chtls.

 - Fix DMA mapping direction for RSA in caam.

 - Fix error path return value for xts setkey in caam.

 - Fix address endianness when DMA unmapping in caam.

 - Fix sleep-in-atomic in vmx.

 - Fix command corruption when queue is full in cavium/nitrox.

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: cavium/nitrox - fix for command corruption in queue full case with backlog submissions.
  crypto: vmx - Fix sleep-in-atomic bugs
  crypto: arm64/aes-gcm-ce - fix scatterwalk API violation
  crypto: aesni - Use unaligned loads from gcm_context_data
  crypto: chtls - fix null dereference chtls_free_uld()
  crypto: arm64/sm4-ce - check for the right CPU feature bit
  crypto: caam - fix DMA mapping direction for RSA forms 2 & 3
  crypto: caam/qi - fix error path in xts setkey
  crypto: caam/jr - fix descriptor DMA unmapping
2018-08-29 13:38:39 -07:00
Dave Watson e5b954e8d1 crypto: aesni - Use unaligned loads from gcm_context_data
A regression was reported bisecting to 1476db2d12
"Move HashKey computation from stack to gcm_context".  That diff
moved HashKey computation from the stack, which was explicitly aligned
in the asm, to a struct provided from the C code, depending on
AESNI_ALIGN_ATTR for alignment.   It appears some compilers may not
align this struct correctly, resulting in a crash on the movdqa
instruction when attempting to encrypt or decrypt data.

Fix by using unaligned loads for the HashKeys.  On modern
hardware there is no perf difference between the unaligned and
aligned loads.  All other accesses to gcm_context_data already use
unaligned loads.

Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Fixes: 1476db2d12 ("Move HashKey computation from stack to gcm_context")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-25 19:50:42 +08:00
Linus Torvalds dafa5f6577 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Fix dcache flushing crash in skcipher.
   - Add hash finup self-tests.
   - Reschedule during speed tests.

  Algorithms:
   - Remove insecure vmac and replace it with vmac64.
   - Add public key verification for DH/ECDH.

  Drivers:
   - Decrease priority of sha-mb on x86.
   - Improve NEON latency/throughput on ARM64.
   - Add md5/sha384/sha512/des/3des to inside-secure.
   - Support eip197d in inside-secure.
   - Only register algorithms supported by the host in virtio.
   - Add cts and remove incompatible cts1 from ccree.
   - Add hisilicon SEC security accelerator driver.
   - Replace msm hwrng driver with qcom pseudo rng driver.

  Misc:
   - Centralize CRC polynomials"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (121 commits)
  crypto: arm64/ghash-ce - implement 4-way aggregation
  crypto: arm64/ghash-ce - replace NEON yield check with block limit
  crypto: hisilicon - sec_send_request() can be static
  lib/mpi: remove redundant variable esign
  crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable
  crypto: arm64/aes-ce-gcm - implement 2-way aggregation
  crypto: arm64/aes-ce-gcm - operate on two input blocks at a time
  crypto: dh - make crypto_dh_encode_key() make robust
  crypto: dh - fix calculating encoded key size
  crypto: ccp - Check for NULL PSP pointer at module unload
  crypto: arm/chacha20 - always use vrev for 16-bit rotates
  crypto: ccree - allow bigger than sector XTS op
  crypto: ccree - zero all of request ctx before use
  crypto: ccree - remove cipher ivgen left overs
  crypto: ccree - drop useless type flag during reg
  crypto: ablkcipher - fix crash flushing dcache in error path
  crypto: blkcipher - fix crash flushing dcache in error path
  crypto: skcipher - fix crash flushing dcache in error path
  crypto: skcipher - remove unnecessary setting of walk->nbytes
  crypto: scatterwalk - remove scatterwalk_samebuf()
  ...
2018-08-15 16:01:47 -07:00
Linus Torvalds f24d6f2654 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Thomas Gleixner:
 "The lowlevel and ASM code updates for x86:

   - Make stack trace unwinding more reliable

   - ASM instruction updates for better code generation

   - Various cleanups"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64: Add two more instruction suffixes
  x86/asm/64: Use 32-bit XOR to zero registers
  x86/build/vdso: Simplify 'cmd_vdso2c'
  x86/build/vdso: Remove unused vdso-syms.lds
  x86/stacktrace: Enable HAVE_RELIABLE_STACKTRACE for the ORC unwinder
  x86/unwind/orc: Detect the end of the stack
  x86/stacktrace: Do not fail for ORC with regs on stack
  x86/stacktrace: Clarify the reliable success paths
  x86/stacktrace: Remove STACKTRACE_DUMP_ONCE
  x86/stacktrace: Do not unwind after user regs
  x86/asm: Use CC_SET/CC_OUT in percpu_cmpxchg8b_double() to micro-optimize code generation
2018-08-13 13:35:26 -07:00
Ondrej Mosnacek 877ccce7cb crypto: x86/aegis,morus - Fix and simplify CPUID checks
It turns out I had misunderstood how the x86_match_cpu() function works.
It evaluates a logical OR of the matching conditions, not logical AND.
This caused the CPU feature checks for AEGIS to pass even if only SSE2
(but not AES-NI) was supported (or vice versa), leading to potential
crashes if something tried to use the registered algs.

This patch switches the checks to a simpler method that is used e.g. in
the Camellia x86 code.

The patch also removes the MODULE_DEVICE_TABLE declarations which
actually seem to cause the modules to be auto-loaded at boot, which is
not desired. The crypto API on-demand module loading is sufficient.

Fixes: 1d373d4e8e ("crypto: x86 - Add optimized AEGIS implementations")
Fixes: 6ecc9d9ff9 ("crypto: x86 - Add optimized MORUS implementations")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:51:15 +08:00
Herbert Xu c5f5aeef9b Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Merge mainline to pick up c7513c2a27 ("crypto/arm64: aes-ce-gcm -
add missing kernel_neon_begin/end pair").
2018-08-03 17:55:12 +08:00
Eric Biggers c87a405e3b crypto: ahash - remove useless setting of cra_type
Some ahash algorithms set .cra_type = &crypto_ahash_type.  But this is
redundant with the C structure type ('struct ahash_alg'), and
crypto_register_ahash() already sets the .cra_type automatically.
Apparently the useless assignment has just been copy+pasted around.

So, remove the useless assignment from all the ahash algorithms.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-07-09 00:30:26 +08:00
Eric Biggers 6a38f62245 crypto: ahash - remove useless setting of type flags
Many ahash algorithms set .cra_flags = CRYPTO_ALG_TYPE_AHASH.  But this
is redundant with the C structure type ('struct ahash_alg'), and
crypto_register_ahash() already sets the type flag automatically,
clearing any type flag that was already there.  Apparently the useless
assignment has just been copy+pasted around.

So, remove the useless assignment from all the ahash algorithms.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-07-09 00:30:25 +08:00
Eric Biggers e50944e219 crypto: shash - remove useless setting of type flags
Many shash algorithms set .cra_flags = CRYPTO_ALG_TYPE_SHASH.  But this
is redundant with the C structure type ('struct shash_alg'), and
crypto_register_shash() already sets the type flag automatically,
clearing any type flag that was already there.  Apparently the useless
assignment has just been copy+pasted around.

So, remove the useless assignment from all the shash algorithms.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-07-09 00:30:24 +08:00
Eric Biggers 8aeef492fe crypto: x86/sha-mb - decrease priority of multibuffer algorithms
With all the crypto modules enabled on x86, and with a CPU that supports
AVX-2 but not SHA-NI instructions (e.g. Haswell, Broadwell, Skylake),
the "multibuffer" implementations of SHA-1, SHA-256, and SHA-512 are the
highest priority.  However, these implementations only perform well when
many hash requests are being submitted concurrently, filling all 8 AVX-2
lanes.  Otherwise, they are incredibly slow, as they waste time waiting
for more requests to arrive before proceeding to execute each request.

For example, here are the speeds I see hashing 4096-byte buffers with a
single thread on a Haswell-based processor:

            generic            avx2          mb (multibuffer)
            -------            --------      ----------------
sha1        602 MB/s           997 MB/s      0.61 MB/s
sha256      228 MB/s           412 MB/s      0.61 MB/s
sha512      312 MB/s           559 MB/s      0.61 MB/s

So, the multibuffer implementation is 500 to 1000 times slower than the
other implementations.  Note that with smaller buffers or more update()s
per digest, the difference would be even greater.

I believe the vast majority of people are in the boat where the
multibuffer code is much slower, and only a small minority are doing the
highly parallel, hashing-intensive, latency-flexible workloads (maybe
IPsec on servers?) where the multibuffer code may be beneficial.  Yet,
people often aren't familiar with all the crypto config options and so
the multibuffer code may inadvertently be built into the kernel.

Also the multibuffer code apparently hasn't been very well tested,
seeing as it was sometimes computing the wrong SHA-256 digest.

So, let's make the multibuffer algorithms low priority.  Users who want
to use them can either request them explicitly by driver name, or use
NETLINK_CRYPTO (crypto_user) to increase their priority at runtime.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-07-09 00:30:21 +08:00
Eric Biggers af839b4e54 crypto: x86/sha256-mb - fix digest copy in sha256_mb_mgr_get_comp_job_avx2()
There is a copy-paste error where sha256_mb_mgr_get_comp_job_avx2()
copies the SHA-256 digest state from sha256_mb_mgr::args::digest to
job_sha256::result_digest.  Consequently, the sha256_mb algorithm
sometimes calculates the wrong digest.  Fix it.

Reproducer using AF_ALG:

    #include <assert.h>
    #include <linux/if_alg.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static const __u8 expected[32] =
        "\xad\x7f\xac\xb2\x58\x6f\xc6\xe9\x66\xc0\x04\xd7\xd1\xd1\x6b\x02"
        "\x4f\x58\x05\xff\x7c\xb4\x7c\x7a\x85\xda\xbd\x8b\x48\x89\x2c\xa7";

    int main()
    {
        int fd;
        struct sockaddr_alg addr = {
            .salg_type = "hash",
            .salg_name = "sha256_mb",
        };
        __u8 data[4096] = { 0 };
        __u8 digest[32];
        int ret;
        int i;

        fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(fd, (void *)&addr, sizeof(addr));
        fork();
        fd = accept(fd, 0, 0);
        do {
            ret = write(fd, data, 4096);
            assert(ret == 4096);
            ret = read(fd, digest, 32);
            assert(ret == 32);
        } while (memcmp(digest, expected, 32) == 0);

        printf("wrong digest: ");
        for (i = 0; i < 32; i++)
            printf("%02x", digest[i]);
        printf("\n");
    }

Output was:

    wrong digest: ad7facb2000000000000000000000000ffffffef7cb47c7a85dabd8b48892ca7

Fixes: 172b1d6b5a ("crypto: sha256-mb - fix ctx pointer and digest copy")
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-07-09 00:30:19 +08:00
Jan Beulich a7bea83089 x86/asm/64: Use 32-bit XOR to zero registers
Some Intel CPUs don't recognize 64-bit XORs as zeroing idioms. Zeroing
idioms don't require execution bandwidth, as they're being taken care
of in the frontend (through register renaming). Use 32-bit XORs instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: herbert@gondor.apana.org.au
Cc: pavel@ucw.cz
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/5B39FF1A02000078001CFB54@prv1-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:59:29 +02:00
Borislav Petkov 221e00d1fc crypto: x86 - Add missing RETs
Add explicit RETs to the tail calls of AEGIS and MORUS crypto algorithms
otherwise they run into INT3 padding due to

  ("x86/asm: Pad assembly functions with INT3 instructions")

leading to spurious debug exceptions.

Mike Galbraith <efault@gmx.de> took care of all the remaining callsites.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-07-01 23:33:20 +08:00
Eric Biggers b7b73cd5d7 crypto: x86/salsa20 - remove x86 salsa20 implementations
The x86 assembly implementations of Salsa20 use the frame base pointer
register (%ebp or %rbp), which breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Recent (v4.10+) kernels will warn about this, e.g.

WARNING: kernel stack regs at 00000000a8291e69 in syzkaller047086:4677 has bad 'bp' value 000000001077994c
[...]

But after looking into it, I believe there's very little reason to still
retain the x86 Salsa20 code.  First, these are *not* vectorized
(SSE2/SSSE3/AVX2) implementations, which would be needed to get anywhere
close to the best Salsa20 performance on any remotely modern x86
processor; they're just regular x86 assembly.  Second, it's still
unclear that anyone is actually using the kernel's Salsa20 at all,
especially given that now ChaCha20 is supported too, and with much more
efficient SSSE3 and AVX2 implementations.  Finally, in benchmarks I did
on both Intel and AMD processors with both gcc 8.1.0 and gcc 4.9.4, the
x86_64 salsa20-asm is actually slightly *slower* than salsa20-generic
(~3% slower on Skylake, ~10% slower on Zen), while the i686 salsa20-asm
is only slightly faster than salsa20-generic (~15% faster on Skylake,
~20% faster on Zen).  The gcc version made little difference.

So, the x86_64 salsa20-asm is pretty clearly useless.  That leaves just
the i686 salsa20-asm, which based on my tests provides a 15-20% speed
boost.  But that's without updating the code to not use %ebp.  And given
the maintenance cost, the small speed difference vs. salsa20-generic,
the fact that few people still use i686 kernels, the doubt that anyone
is even using the kernel's Salsa20 at all, and the fact that a SSE2
implementation would almost certainly be much faster on any remotely
modern x86 processor yet no one has cared enough to add one yet, I don't
think it's worthwhile to keep.

Thus, just remove both the x86_64 and i686 salsa20-asm implementations.

Reported-by: syzbot+ffa3a158337bbc01ff09@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-31 00:13:57 +08:00
Ondrej Mosnacek 2808f17319 crypto: morus - Mark MORUS SIMD glue as x86-specific
Commit 56e8e57fc3 ("crypto: morus - Add common SIMD glue code for
MORUS") accidetally consiedered the glue code to be usable by different
architectures, but it seems to be only usable on x86.

This patch moves it under arch/x86/crypto and adds 'depends on X86' to
the Kconfig options and also removes the prompt to hide these internal
options from the user.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-31 00:13:41 +08:00
Ondrej Mosnacek dd09f58ce0 crypto: x86/aegis256 - Fix wrong key buffer size
AEGIS-256 key is two blocks, not one.

Fixes: 1d373d4e8e ("crypto: x86 - Add optimized AEGIS implementations")
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-27 00:12:12 +08:00
Ondrej Mosnacek 6ecc9d9ff9 crypto: x86 - Add optimized MORUS implementations
This patch adds optimized implementations of MORUS-640 and MORUS-1280,
utilizing the SSE2 and AVX2 x86 extensions.

For MORUS-1280 (which operates on 256-bit blocks) we provide both AVX2
and SSE2 implementation. Although SSE2 MORUS-1280 is slower than AVX2
MORUS-1280, it is comparable in speed to the SSE2 MORUS-640.

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-19 00:15:35 +08:00
Ondrej Mosnacek 1d373d4e8e crypto: x86 - Add optimized AEGIS implementations
This patch adds optimized implementations of AEGIS-128, AEGIS-128L,
and AEGIS-256, utilizing the AES-NI and SSE2 x86 extensions.

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-19 00:14:00 +08:00
Colin Ian King 158b52ff14 crypto: ghash-clmulni - fix spelling mistake: "acclerated" -> "accelerated"
Trivial fix to spelling mistake in module description text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-05 14:52:58 +08:00
Wu Fengguang 9cc16b4d32 crypto: x86/des3_ede - des3_ede_skciphers[] can be static
Fixes: 09c0f03bf8 ("crypto: x86/des3_ede - convert to skcipher interface")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-09 22:45:53 +08:00
Eric Biggers 75d8a5532f crypto: x86/glue_helper - rename glue_skwalk_fpu_begin()
There are no users of the original glue_fpu_begin() anymore, so rename
glue_skwalk_fpu_begin() to glue_fpu_begin() so that it matches
glue_fpu_end() again.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:35 +08:00
Eric Biggers 0d87d0f425 crypto: x86/glue_helper - remove blkcipher_walk functions
Now that all glue_helper users have been switched from the blkcipher
interface over to the skcipher interface, remove the versions of the
glue_helper functions that handled the blkcipher interface.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:34 +08:00
Eric Biggers 217afccf65 crypto: lrw - remove lrw_crypt()
Now that all users of lrw_crypt() have been removed in favor of the LRW
template wrapping an ECB mode algorithm, remove lrw_crypt().  Also
remove crypto/lrw.h as that is no longer needed either; and fold
'struct lrw_table_ctx' into 'struct priv', lrw_init_table() into
setkey(), and lrw_free_table() into exit_tfm().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:34 +08:00
Eric Biggers 44893bc296 crypto: x86/camellia-aesni-avx, avx2 - convert to skcipher interface
Convert the AESNI AVX and AESNI AVX2 implementations of Camellia from
the (deprecated) ablkcipher and blkcipher interfaces over to the
skcipher interface.  Note that this includes replacing the use of
ablk_helper with crypto_simd.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:32 +08:00
Eric Biggers 1af6d03710 crypto: x86/camellia - convert to skcipher interface
Convert the x86 asm implementation of Camellia from the (deprecated)
blkcipher interface over to the skcipher interface.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:32 +08:00
Eric Biggers 451cc49324 crypto: x86/camellia - remove XTS algorithm
The XTS template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic XTS code themselves via xts_crypt().

Remove the xts-camellia-asm algorithm which did this.  Users who request
xts(camellia) and previously would have gotten xts-camellia-asm will now
get xts(ecb-camellia-asm) instead, which is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:32 +08:00
Eric Biggers 6043d341f0 crypto: x86/camellia - remove LRW algorithm
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().

Remove the lrw-camellia-asm algorithm which did this.  Users who request
lrw(camellia) and previously would have gotten lrw-camellia-asm will now
get lrw(ecb-camellia-asm) instead, which is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:31 +08:00
Eric Biggers 44c9b75409 crypto: x86/camellia-aesni-avx2 - remove LRW algorithm
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().

Remove the lrw-camellia-aesni-avx2 algorithm which did this.  Users who
request lrw(camellia) and previously would have gotten
lrw-camellia-aesni-avx2 will now get lrw(ecb-camellia-aesni-avx2)
instead, which is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:31 +08:00
Eric Biggers 6fcb81b562 crypto: x86/camellia-aesni-avx - remove LRW algorithm
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().

Remove the lrw-camellia-aesni algorithm which did this.  Users who
request lrw(camellia) and previously would have gotten
lrw-camellia-aesni will now get lrw(ecb-camellia-aesni) instead, which
is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:30 +08:00
Eric Biggers 09c0f03bf8 crypto: x86/des3_ede - convert to skcipher interface
Convert the x86 asm implementation of Triple DES from the (deprecated)
blkcipher interface over to the skcipher interface.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:30 +08:00
Eric Biggers c1679171c6 crypto: x86/blowfish: convert to skcipher interface
Convert the x86 asm implementation of Blowfish from the (deprecated)
blkcipher interface over to the skcipher interface.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:29 +08:00
Eric Biggers 4bd9692431 crypto: x86/cast6-avx - convert to skcipher interface
Convert the AVX implementation of CAST6 from the (deprecated) ablkcipher
and blkcipher interfaces over to the skcipher interface.  Note that this
includes replacing the use of ablk_helper with crypto_simd.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:28 +08:00
Eric Biggers f51a1fa439 crypto: x86/cast6-avx - remove LRW algorithm
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().

Remove the lrw-cast6-avx algorithm which did this.  Users who request
lrw(cast6) and previously would have gotten lrw-cast6-avx will now get
lrw(ecb-cast6-avx) instead, which is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:27 +08:00
Eric Biggers 1e63183a20 crypto: x86/cast5-avx - convert to skcipher interface
Convert the AVX implementation of CAST5 from the (deprecated) ablkcipher
and blkcipher interfaces over to the skcipher interface.  Note that this
includes replacing the use of ablk_helper with crypto_simd.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:26 +08:00
Eric Biggers 8f461b1e02 crypto: x86/cast5-avx - fix ECB encryption when long sg follows short one
With ecb-cast5-avx, if a 128+ byte scatterlist element followed a
shorter one, then the algorithm accidentally encrypted/decrypted only 8
bytes instead of the expected 128 bytes.  Fix it by setting the
encryption/decryption 'fn' correctly.

Fixes: c12ab20b16 ("crypto: cast5/avx - avoid using temporary stack buffers")
Cc: <stable@vger.kernel.org> # v3.8+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:25 +08:00
Eric Biggers 0e6ab46dad crypto: x86/twofish-avx - convert to skcipher interface
Convert the AVX implementation of Twofish from the (deprecated)
ablkcipher and blkcipher interfaces over to the skcipher interface.
Note that this includes replacing the use of ablk_helper with
crypto_simd.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:25 +08:00
Eric Biggers 876e9f0c12 crypto: x86/twofish-avx - remove LRW algorithm
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().

Remove the lrw-twofish-avx algorithm which did this.  Users who request
lrw(twofish) and previously would have gotten lrw-twofish-avx will now
get lrw(ecb-twofish-avx) instead, which is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:25 +08:00
Eric Biggers 37992fa47f crypto: x86/twofish-3way - convert to skcipher interface
Convert the 3-way implementation of Twofish from the (deprecated)
blkcipher interface over to the skcipher interface.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:24 +08:00
Eric Biggers ebeea983dd crypto: x86/twofish-3way - remove XTS algorithm
The XTS template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic XTS code themselves via xts_crypt().

Remove the xts-twofish-3way algorithm which did this.  Users who request
xts(twofish) and previously would have gotten xts-twofish-3way will now
get xts(ecb-twofish-3way) instead, which is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:24 +08:00
Eric Biggers 68bfc4924b crypto: x86/twofish-3way - remove LRW algorithm
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().

Remove the lrw-twofish-3way algorithm which did this.  Users who request
lrw(twofish) and previously would have gotten lrw-twofish-3way will now
get lrw(ecb-twofish-3way) instead, which is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:23 +08:00
Eric Biggers e16bf974b3 crypto: x86/serpent-avx,avx2 - convert to skcipher interface
Convert the AVX and AVX2 implementations of Serpent from the
(deprecated) ablkcipher and blkcipher interfaces over to the skcipher
interface.  Note that this includes replacing the use of ablk_helper
with crypto_simd.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:22 +08:00
Eric Biggers 340b830326 crypto: x86/serpent-avx - remove LRW algorithm
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().

Remove the lrw-serpent-avx algorithm which did this.  Users who request
lrw(serpent) and previously would have gotten lrw-serpent-avx will now
get lrw(ecb-serpent-avx) instead, which is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:21 +08:00
Eric Biggers e5f382e643 crypto: x86/serpent-avx2 - remove LRW algorithm
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().

Remove the lrw-serpent-avx2 algorithm which did this.  Users who request
lrw(serpent) and previously would have gotten lrw-serpent-avx2 will now
get lrw(ecb-serpent-avx2) instead, which is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:20 +08:00
Eric Biggers e0f409dcb8 crypto: x86/serpent-sse2 - convert to skcipher interface
Convert the SSE2 implementation of Serpent from the (deprecated)
ablkcipher and blkcipher interfaces over to the skcipher interface.
Note that this includes replacing the use of ablk_helper with
crypto_simd.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:20 +08:00
Eric Biggers 8bab4e3cd5 crypto: x86/serpent-sse2 - remove XTS algorithm
The XTS template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic XTS code themselves via xts_crypt().

Remove the xts-serpent-sse2 algorithm which did this.  Users who request
xts(serpent) and previously would have gotten xts-serpent-sse2 will now
get xts(ecb-serpent-sse2) instead, which is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:19 +08:00
Eric Biggers 2a05cfc35f crypto: x86/serpent-sse2 - remove LRW algorithm
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly.  Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().

Remove the lrw-serpent-sse2 algorithm which did this.  Users who request
lrw(serpent) and previously would have gotten lrw-serpent-sse2 will now
get lrw(ecb-serpent-sse2) instead, which is just as fast.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:19 +08:00
Eric Biggers f15f2a2542 crypto: x86/glue_helper - add skcipher_walk functions
Add ECB, CBC, and CTR functions to glue_helper which use skcipher_walk
rather than blkcipher_walk.  This will allow converting the remaining
x86 algorithms from the blkcipher interface over to the skcipher
interface, after which we'll be able to remove the blkcipher_walk
versions of these functions.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03 00:03:18 +08:00
Dave Watson e845520707 crypto: aesni - Update aesni-intel_glue to use scatter/gather
Add gcmaes_crypt_by_sg routine, that will do scatter/gather
by sg. Either src or dst may contain multiple buffers, so
iterate over both at the same time if they are different.
If the input is the same as the output, iterate only over one.

Currently both the AAD and TAG must be linear, so copy them out
with scatterlist_map_and_copy.  If first buffer contains the
entire AAD, we can optimize and not copy.   Since the AAD
can be any size, if copied it must be on the heap.  TAG can
be on the stack since it is always < 16 bytes.

Only the SSE routines are updated so far, so leave the previous
gcmaes_en/decrypt routines, and branch to the sg ones if the
keysize is inappropriate for avx, or we are SSE only.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:52 +08:00
Dave Watson fb8986e643 crypto: aesni - Introduce scatter/gather asm function stubs
The asm macros are all set up now, introduce entry points.

GCM_INIT and GCM_COMPLETE have arguments supplied, so that
the new scatter/gather entry points don't have to take all the
arguments, and only the ones they need.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:51 +08:00
Dave Watson 933d6aefd5 crypto: aesni - Add fast path for > 16 byte update
We can fast-path any < 16 byte read if the full message is > 16 bytes,
and shift over by the appropriate amount.  Usually we are
reading > 16 bytes, so this should be faster than the READ_PARTIAL
macro introduced in b20209c91e for the average case.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:50 +08:00
Dave Watson ae952c5ec6 crypto: aesni - Introduce partial block macro
Before this diff, multiple calls to GCM_ENC_DEC will
succeed, but only if all calls are a multiple of 16 bytes.

Handle partial blocks at the start of GCM_ENC_DEC, and update
aadhash as appropriate.

The data offset %r11 is also updated after the partial block.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:49 +08:00
Dave Watson 1476db2d12 crypto: aesni - Move HashKey computation from stack to gcm_context
HashKey computation only needs to happen once per scatter/gather operation,
save it between calls in gcm_context struct instead of on the stack.
Since the asm no longer stores anything on the stack, we can use
%rsp directly, and clean up the frame save/restore macros a bit.

Hashkeys actually only need to be calculated once per key and could
be moved to when set_key is called, however, the current glue code
falls back to generic aes code if fpu is disabled.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:49 +08:00
Dave Watson e2e34b0856 crypto: aesni - Move ghash_mul to GCM_COMPLETE
Prepare to handle partial blocks between scatter/gather calls.
For the last partial block, we only want to calculate the aadhash
in GCM_COMPLETE, and a new partial block macro will handle both
aadhash update and encrypting partial blocks between calls.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:48 +08:00
Dave Watson 9660474b0e crypto: aesni - Fill in new context data structures
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values.
pblocklen, aadhash, and pblockenckey are also updated at the end
of each scatter/gather operation, to be carried over to the next
operation.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:48 +08:00
Dave Watson c594c54085 crypto: aesni - Split AAD hash calculation to separate macro
AAD hash only needs to be calculated once for each scatter/gather operation.
Move it to its own macro, and call it from GCM_INIT instead of
INITIAL_BLOCKS.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:47 +08:00
Dave Watson 9ee4a5df22 crypto: aesni - Introduce gcm_context_data
Introduce a gcm_context_data struct that will be used to pass
context data between scatter/gather update calls.  It is passed
as the second argument (after crypto keys), other args are
renumbered.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:47 +08:00
Dave Watson ba45833e3e crypto: aesni - Merge encode and decode to GCM_ENC_DEC macro
Make a macro for the main encode/decode routine.  Only a small handful
of lines differ for enc and dec.   This will also become the main
scatter/gather update routine.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:45 +08:00
Dave Watson adcadab369 crypto: aesni - Add GCM_COMPLETE macro
Merge encode and decode tag calculations in GCM_COMPLETE macro.
Scatter/gather routines will call this once at the end of encryption
or decryption.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:44 +08:00
Dave Watson 7af964c2fc crypto: aesni - Add GCM_INIT macro
Reduce code duplication by introducting GCM_INIT macro.  This macro
will also be exposed as a function for implementing scatter/gather
support, since INIT only needs to be called once for the full
operation.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:44 +08:00
Dave Watson 6c2c86b3e0 crypto: aesni - Macro-ify func save/restore
Macro-ify function save and restore.  These will be used in new functions
added for scatter/gather update operations.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:43 +08:00
Dave Watson e1fd316fba crypto: aesni - Merge INITIAL_BLOCKS_ENC/DEC
Use macro operations to merge implemetations of INITIAL_BLOCKS,
since they differ by only a small handful of lines.

Use macro counter \@ to simplify implementation.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22 22:16:41 +08:00
Eric Biggers c7f582f5de crypto: sha512-mb - remove HASH_FIRST flag
The HASH_FIRST flag is never set.  Remove it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-15 23:26:46 +08:00
Eric Biggers ee689d164a crypto: sha256-mb - remove HASH_FIRST flag
The HASH_FIRST flag is never set.  Remove it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-15 23:26:46 +08:00
Eric Biggers 7a95cbd9a9 crypto: sha1-mb - remove HASH_FIRST flag
The HASH_FIRST flag is never set.  Remove it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-15 23:26:45 +08:00
Linus Torvalds 178e834c47 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:

   - oversize stack frames on mn10300 in sha3-generic

   - warning on old compilers in sha3-generic

   - API error in sun4i_ss_prng

   - potential dead-lock in sun4i_ss_prng

   - null-pointer dereference in sha512-mb

   - endless loop when DECO acquire fails in caam

   - kernel oops when hashing empty message in talitos"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: sun4i_ss_prng - convert lock to _bh in sun4i_ss_prng_generate
  crypto: sun4i_ss_prng - fix return value of sun4i_ss_prng_generate
  crypto: caam - fix endless loop when DECO acquire fails
  crypto: sha3-generic - Use __optimize to support old compilers
  compiler-gcc.h: __nostackprotector needs gcc-4.4 and up
  compiler-gcc.h: Introduce __optimize function attribute
  crypto: sha3-generic - deal with oversize stack frames
  crypto: talitos - fix Kernel Oops on hashing an empty file
  crypto: sha512-mb - initialize pending lengths correctly
2018-02-12 08:57:21 -08:00
Eric Biggers eff84b3790 crypto: sha512-mb - initialize pending lengths correctly
The SHA-512 multibuffer code keeps track of the number of blocks pending
in each lane.  The minimum of these values is used to identify the next
lane that will be completed.  Unused lanes are set to a large number
(0xFFFFFFFF) so that they don't affect this calculation.

However, it was forgotten to set the lengths to this value in the
initial state, where all lanes are unused.  As a result it was possible
for sha512_mb_mgr_get_comp_job_avx2() to select an unused lane, causing
a NULL pointer dereference.  Specifically this could happen in the case
where ->update() was passed fewer than SHA512_BLOCK_SIZE bytes of data,
so it then called sha_complete_job() without having actually submitted
any blocks to the multi-buffer code.  This hit a NULL pointer
dereference if another task happened to have submitted blocks
concurrently to the same CPU and the flush timer had not yet expired.

Fix this by initializing sha512_mb_mgr->lens correctly.

As usual, this bug was found by syzkaller.

Fixes: 45691e2d9b ("crypto: sha512-mb - submit/flush routines for AVX2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-08 22:37:05 +11:00
Linus Torvalds a103950e0d Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Enforce the setting of keys for keyed aead/hash/skcipher
     algorithms.
   - Add multibuf speed tests in tcrypt.

  Algorithms:
   - Improve performance of sha3-generic.
   - Add native sha512 support on arm64.
   - Add v8.2 Crypto Extentions version of sha3/sm3 on arm64.
   - Avoid hmac nesting by requiring underlying algorithm to be unkeyed.
   - Add cryptd_max_cpu_qlen module parameter to cryptd.

  Drivers:
   - Add support for EIP97 engine in inside-secure.
   - Add inline IPsec support to chelsio.
   - Add RevB core support to crypto4xx.
   - Fix AEAD ICV check in crypto4xx.
   - Add stm32 crypto driver.
   - Add support for BCM63xx platforms in bcm2835 and remove bcm63xx.
   - Add Derived Key Protocol (DKP) support in caam.
   - Add Samsung Exynos True RNG driver.
   - Add support for Exynos5250+ SoCs in exynos PRNG driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (166 commits)
  crypto: picoxcell - Fix error handling in spacc_probe()
  crypto: arm64/sha512 - fix/improve new v8.2 Crypto Extensions code
  crypto: arm64/sm3 - new v8.2 Crypto Extensions implementation
  crypto: arm64/sha3 - new v8.2 Crypto Extensions implementation
  crypto: testmgr - add new testcases for sha3
  crypto: sha3-generic - export init/update/final routines
  crypto: sha3-generic - simplify code
  crypto: sha3-generic - rewrite KECCAK transform to help the compiler optimize
  crypto: sha3-generic - fixes for alignment and big endian operation
  crypto: aesni - handle zero length dst buffer
  crypto: artpec6 - remove select on non-existing CRYPTO_SHA384
  hwrng: bcm2835 - Remove redundant dev_err call in bcm2835_rng_probe()
  crypto: stm32 - remove redundant dev_err call in stm32_cryp_probe()
  crypto: axis - remove unnecessary platform_get_resource() error check
  crypto: testmgr - test misuse of result in ahash
  crypto: inside-secure - make function safexcel_try_push_requests static
  crypto: aes-generic - fix aes-generic regression on powerpc
  crypto: chelsio - Fix indentation warning
  crypto: arm64/sha1-ce - get rid of literal pool
  crypto: arm64/sha2-ce - move the round constant table to .rodata section
  ...
2018-01-31 14:22:45 -08:00
Stephan Mueller 9c674e1e2f crypto: aesni - handle zero length dst buffer
GCM can be invoked with a zero destination buffer. This is possible if
the AAD and the ciphertext have zero lengths and only the tag exists in
the source buffer (i.e. a source buffer cannot be zero). In this case,
the GCM cipher only performs the authentication and no decryption
operation.

When the destination buffer has zero length, it is possible that no page
is mapped to the SG pointing to the destination. In this case,
sg_page(req->dst) is an invalid access. Therefore, page accesses should
only be allowed if the req->dst->length is non-zero which is the
indicator that a page must exist.

This fixes a crash that can be triggered by user space via AF_ALG.

CC: <stable@vger.kernel.org>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-26 01:10:32 +11:00
Linus Torvalds 40548c6b6c Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
 "This contains:

   - a PTI bugfix to avoid setting reserved CR3 bits when PCID is
     disabled. This seems to cause issues on a virtual machine at least
     and is incorrect according to the AMD manual.

   - a PTI bugfix which disables the perf BTS facility if PTI is
     enabled. The BTS AUX buffer is not globally visible and causes the
     CPU to fault when the mapping disappears on switching CR3 to user
     space. A full fix which restores BTS on PTI is non trivial and will
     be worked on.

   - PTI bugfixes for EFI and trusted boot which make sure that the user
     space visible page table entries have the NX bit cleared

   - removal of dead code in the PTI pagetable setup functions

   - add PTI documentation

   - add a selftest for vsyscall to verify that the kernel actually
     implements what it advertises.

   - a sysfs interface to expose vulnerability and mitigation
     information so there is a coherent way for users to retrieve the
     status.

   - the initial spectre_v2 mitigations, aka retpoline:

      + The necessary ASM thunk and compiler support

      + The ASM variants of retpoline and the conversion of affected ASM
        code

      + Make LFENCE serializing on AMD so it can be used as speculation
        trap

      + The RSB fill after vmexit

   - initial objtool support for retpoline

  As I said in the status mail this is the most of the set of patches
  which should go into 4.15 except two straight forward patches still on
  hold:

   - the retpoline add on of LFENCE which waits for ACKs

   - the RSB fill after context switch

  Both should be ready to go early next week and with that we'll have
  covered the major holes of spectre_v2 and go back to normality"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
  x86,perf: Disable intel_bts when PTI
  security/Kconfig: Correct the Documentation reference for PTI
  x86/pti: Fix !PCID and sanitize defines
  selftests/x86: Add test_vsyscall
  x86/retpoline: Fill return stack buffer on vmexit
  x86/retpoline/irq32: Convert assembler indirect jumps
  x86/retpoline/checksum32: Convert assembler indirect jumps
  x86/retpoline/xen: Convert Xen hypercall indirect jumps
  x86/retpoline/hyperv: Convert assembler indirect jumps
  x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
  x86/retpoline/entry: Convert entry assembler indirect jumps
  x86/retpoline/crypto: Convert crypto assembler indirect jumps
  x86/spectre: Add boot time option to select Spectre v2 mitigation
  x86/retpoline: Add initial retpoline support
  objtool: Allow alternatives to be ignored
  objtool: Detect jumps to retpoline thunks
  x86/pti: Make unpoison of pgd for trusted boot work for real
  x86/alternatives: Fix optimize_nops() checking
  sysfs/cpu: Fix typos in vulnerability documentation
  x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
  ...
2018-01-14 09:51:25 -08:00
Eric Biggers c9a3ff8f22 crypto: x86/salsa20 - cleanup and convert to skcipher API
Convert salsa20-asm from the deprecated "blkcipher" API to the
"skcipher" API, in the process fixing it up to use the generic helpers.
This allows removing the salsa20_keysetup() and salsa20_ivsetup()
assembly functions, which aren't performance critical; the C versions do
just fine.

This also fixes the same bug that salsa20-generic had, where the state
array was being maintained directly in the transform context rather than
on the stack or in the request context.  Thus, if multiple threads used
the same Salsa20 transform concurrently they produced the wrong results.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-12 23:03:43 +11:00
Eric Biggers a208fa8f33 crypto: hash - annotate algorithms taking optional key
We need to consistently enforce that keyed hashes cannot be used without
setting the key.  To do this we need a reliable way to determine whether
a given hash algorithm is keyed or not.  AF_ALG currently does this by
checking for the presence of a ->setkey() method.  However, this is
actually slightly broken because the CRC-32 algorithms implement
->setkey() but can also be used without a key.  (The CRC-32 "key" is not
actually a cryptographic key but rather represents the initial state.
If not overridden, then a default initial state is used.)

Prepare to fix this by introducing a flag CRYPTO_ALG_OPTIONAL_KEY which
indicates that the algorithm has a ->setkey() method, but it is not
required to be called.  Then set it on all the CRC-32 algorithms.

The same also applies to the Adler-32 implementation in Lustre.

Also, the cryptd and mcryptd templates have to pass through the flag
from their underlying algorithm.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-12 23:03:35 +11:00
Eric Biggers a16e772e66 crypto: poly1305 - remove ->setkey() method
Since Poly1305 requires a nonce per invocation, the Linux kernel
implementations of Poly1305 don't use the crypto API's keying mechanism
and instead expect the key and nonce as the first 32 bytes of the data.
But ->setkey() is still defined as a stub returning an error code.  This
prevents Poly1305 from being used through AF_ALG and will also break it
completely once we start enforcing that all crypto API users (not just
AF_ALG) call ->setkey() if present.

Fix it by removing crypto_poly1305_setkey(), leaving ->setkey as NULL.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-12 23:03:14 +11:00
David Woodhouse 9697fa39ef x86/retpoline/crypto: Convert crypto assembler indirect jumps
Convert all indirect jumps in crypto assembler code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-6-git-send-email-dwmw@amazon.co.uk
2018-01-12 00:14:29 +01:00
Eric Biggers b7dac37318 crypto: x86/poly1305 - remove cra_alignmask
crypto_poly1305_final() no longer requires a cra_alignmask, and nothing
else in the x86 poly1305-simd implementation does either.  So remove the
cra_alignmask so that the crypto API does not have to unnecessarily
align the buffers.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-05 18:43:12 +11:00
Junaid Shahid 1ecdd37e30 crypto: aesni - Fix out-of-bounds access of the AAD buffer in generic-gcm-aesni
The aesni_gcm_enc/dec functions can access memory after the end of
the AAD buffer if the AAD length is not a multiple of 4 bytes.
It didn't matter with rfc4106-gcm-aesni as in that case the AAD was
always followed by the 8 byte IV, but that is no longer the case with
generic-gcm-aesni. This can potentially result in accessing a page that
is not mapped and thus causing the machine to crash. This patch fixes
that by reading the last <16 byte block of the AAD byte-by-byte and
optionally via an 8-byte load if the block was at least 8 bytes.

Fixes: 0487ccac ("crypto: aesni - make non-AVX AES-GCM work with any aadlen")
Cc: <stable@vger.kernel.org>
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-12-28 17:56:51 +11:00
Junaid Shahid b20209c91e crypto: aesni - Fix out-of-bounds access of the data buffer in generic-gcm-aesni
The aesni_gcm_enc/dec functions can access memory before the start of
the data buffer if the length of the data buffer is less than 16 bytes.
This is because they perform the read via a single 16-byte load. This
can potentially result in accessing a page that is not mapped and thus
causing the machine to crash. This patch fixes that by reading the
partial block byte-by-byte and optionally an via 8-byte load if the block
was at least 8 bytes.

Fixes: 0487ccac ("crypto: aesni - make non-AVX AES-GCM work with any aadlen")
Cc: <stable@vger.kernel.org>
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-12-28 17:56:51 +11:00
Eric Biggers d8c7fe9f2a crypto: x86/twofish-3way - Fix %rbp usage
Using %rbp as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.

In twofish-3way, we can't simply replace %rbp with another register
because there are none available.  Instead, we use the stack to hold the
values that %rbp, %r11, and %r12 were holding previously.  Each of these
values represents the half of the output from the previous Feistel round
that is being passed on unchanged to the following round.  They are only
used once per round, when they are exchanged with %rax, %rbx, and %rcx.

As a result, we free up 3 registers (one per block) and can reassign
them so that %rbp is not used, and additionally %r14 and %r15 are not
used so they do not need to be saved/restored.

There may be a small overhead caused by replacing 'xchg REG, REG' with
the needed sequence 'mov MEM, REG; mov REG, MEM; mov REG, REG' once per
round.  But, counterintuitively, when I tested "ctr-twofish-3way" on a
Haswell processor, the new version was actually about 2% faster.
(Perhaps 'xchg' is not as well optimized as plain moves.)

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-12-28 17:56:44 +11:00
Sabrina Dubroca fc8517bf62 crypto: aesni - add wrapper for generic gcm(aes)
When I added generic-gcm-aes I didn't add a wrapper like the one
provided for rfc4106(gcm(aes)). We need to add a cryptd wrapper to fall
back on in case the FPU is not available, otherwise we might corrupt the
FPU state.

Fixes: cce2ea8d90 ("crypto: aesni - add generic gcm(aes)")
Cc: <stable@vger.kernel.org>
Reported-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-12-22 19:52:46 +11:00
Sabrina Dubroca 106840c410 crypto: aesni - fix typo in generic_gcmaes_decrypt
generic_gcmaes_decrypt needs to use generic_gcmaes_ctx, not
aesni_rfc4106_gcm_ctx. This is actually harmless because the fields in
struct generic_gcmaes_ctx share the layout of the same fields in
aesni_rfc4106_gcm_ctx.

Fixes: cce2ea8d90 ("crypto: aesni - add generic gcm(aes)")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-12-22 19:52:46 +11:00
Eric Biggers 796c99fbd7 crypto: x86/chacha20 - Remove cra_alignmask
Now that the generic ChaCha20 implementation no longer needs a
cra_alignmask, the x86 one doesn't either -- given that the x86
implementation doesn't need the alignment itself.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-11-29 17:33:33 +11:00
Eric Biggers ecaaab5649 crypto: salsa20 - fix blkcipher_walk API usage
When asked to encrypt or decrypt 0 bytes, both the generic and x86
implementations of Salsa20 crash in blkcipher_walk_done(), either when
doing 'kfree(walk->buffer)' or 'free_page((unsigned long)walk->page)',
because walk->buffer and walk->page have not been initialized.

The bug is that Salsa20 is calling blkcipher_walk_done() even when
nothing is in 'walk.nbytes'.  But blkcipher_walk_done() is only meant to
be called when a nonzero number of bytes have been provided.

The broken code is part of an optimization that tries to make only one
call to salsa20_encrypt_bytes() to process inputs that are not evenly
divisible by 64 bytes.  To fix the bug, just remove this "optimization"
and use the blkcipher_walk API the same way all the other users do.

Reproducer:

    #include <linux/if_alg.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main()
    {
            int algfd, reqfd;
            struct sockaddr_alg addr = {
                    .salg_type = "skcipher",
                    .salg_name = "salsa20",
            };
            char key[16] = { 0 };

            algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
            bind(algfd, (void *)&addr, sizeof(addr));
            reqfd = accept(algfd, 0, 0);
            setsockopt(algfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
            read(reqfd, key, sizeof(key));
    }

Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: eb6f13eb9f ("[CRYPTO] salsa20_generic: Fix multi-page processing")
Cc: <stable@vger.kernel.org> # v2.6.25+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-11-29 16:25:58 +11:00
Linus Torvalds 37dc79565c Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.15:

  API:

   - Disambiguate EBUSY when queueing crypto request by adding ENOSPC.
     This change touches code outside the crypto API.
   - Reset settings when empty string is written to rng_current.

  Algorithms:

   - Add OSCCA SM3 secure hash.

  Drivers:

   - Remove old mv_cesa driver (replaced by marvell/cesa).
   - Enable rfc3686/ecb/cfb/ofb AES in crypto4xx.
   - Add ccm/gcm AES in crypto4xx.
   - Add support for BCM7278 in iproc-rng200.
   - Add hash support on Exynos in s5p-sss.
   - Fix fallback-induced error in vmx.
   - Fix output IV in atmel-aes.
   - Fix empty GCM hash in mediatek.

  Others:

   - Fix DoS potential in lib/mpi.
   - Fix potential out-of-order issues with padata"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (162 commits)
  lib/mpi: call cond_resched() from mpi_powm() loop
  crypto: stm32/hash - Fix return issue on update
  crypto: dh - Remove pointless checks for NULL 'p' and 'g'
  crypto: qat - Clean up error handling in qat_dh_set_secret()
  crypto: dh - Don't permit 'key' or 'g' size longer than 'p'
  crypto: dh - Don't permit 'p' to be 0
  crypto: dh - Fix double free of ctx->p
  hwrng: iproc-rng200 - Add support for BCM7278
  dt-bindings: rng: Document BCM7278 RNG200 compatible
  crypto: chcr - Replace _manual_ swap with swap macro
  crypto: marvell - Add a NULL entry at the end of mv_cesa_plat_id_table[]
  hwrng: virtio - Virtio RNG devices need to be re-registered after suspend/resume
  crypto: atmel - remove empty functions
  crypto: ecdh - remove empty exit()
  MAINTAINERS: update maintainer for qat
  crypto: caam - remove unused param of ctx_map_to_sec4_sg()
  crypto: caam - remove unneeded edesc zeroization
  crypto: atmel-aes - Reset the controller before each use
  crypto: atmel-aes - properly set IV after {en,de}crypt
  hwrng: core - Reset user selected rng by writing "" to rng_current
  ...
2017-11-14 10:52:09 -08:00
Linus Torvalds af903dcd31 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes an unaligned panic in x86/sha-mb and a bug in ccm that
  triggers with certain underlying implementations"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ccm - preserve the IV buffer
  crypto: x86/sha1-mb - fix panic due to unaligned access
  crypto: x86/sha256-mb - fix panic due to unaligned access
2017-11-06 09:05:03 -08:00
Andrey Ryabinin d041b55779 crypto: x86/sha1-mb - fix panic due to unaligned access
struct sha1_ctx_mgr allocated in sha1_mb_mod_init() via kzalloc()
and later passed in sha1_mb_flusher_mgr_flush_avx2() function where
instructions vmovdqa used to access the struct. vmovdqa requires
16-bytes aligned argument, but nothing guarantees that struct
sha1_ctx_mgr will have that alignment. Unaligned vmovdqa will
generate GP fault.

Fix this by replacing vmovdqa with vmovdqu which doesn't have alignment
requirements.

Fixes: 2249cbb53e ("crypto: sha-mb - SHA1 multibuffer submit and flush routines for AVX2")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-11-03 21:35:35 +08:00