Faster small code and fast code.
Allow fixed 4096-bit FFDHE parameters in benchmark.
Convert [u]int[32|64|128]*_t types to sp_[u]int[32|64|128].
Add a div for when top bits are all 1
WOLFSSL_SP_FAST_LARGE_CODE added to make mul_add function faster on
non-embedded platforms.
Change mod_exp window sizes for same performance but less memory.
P256 with c32 now 9 words instead of 10.