The delay before the TTS audio plays is most likely because your second command stops dictation mode, and the "Say Something with Text-To-Speech" action only runs once that has completed.
This issue has been reported before (here and here), though it is quite rare.
Unfortunately, as no diagnostic information is available to indicate why the Microsoft speech recognition engine introduces this delay, there is no targeted solution for it.
One user reinstalled Windows and still experienced the issue on one machine, while never encountering it at all on another, so there may even be a hardware or driver component to it.