Written from scratch rather than copied from GMP, because of the license difference (LGPL 2.1 vs GPL 3), but tested with the GMP testsuite. As measured on Cortex-A15, this is 50% faster than the generic code and 25% slower than the current GMP routine on the same core.