For Montgomery reduction of P256: don't load the words of mu << 32 into x10 and x11. x11 is needed later, and plenty of other registers are available.