Are you saying the text-to-speech output is recognized as a spoken command? If so, you have some type of crosstalk between your recording device and your playback device.
Are you using a (USB) headset, as is recommended? Speakers and a desktop microphone, for example, would be a near worst-case scenario, as the microphone will obviously pick up what comes out of the speakers.
If you are using a configuration like the latter and have no alternatives, you can try enabling the "Wait until speech completes before continuing command" option for the "Say Something with Text-To-Speech" action. As the name suggests, this waits until TTS playback is complete before moving on to the next action in the command.