Add x86_64 assembly code:
- BMI2
- AVX2 (using ymm, slower than BMI2)
- AVX2 of 4 similtaneous hashes
Add SHAKE128 functions and tests.
Add Absorb and Squeeze functions for SHAKE128 and SHAK256 and tests.
Add doxygen for SHA-3 and SHAKE functions.
Update other generated x86_64 assembly files to include settings.h.