As currently configured, the command uses the "Wait For Spoken Response" action, which by its very nature runs synchronously: while it is running, it is the only active action, and no other action within the command can run until it has stopped.
That action is specifically designed to do what its name suggests: wait for a spoken response. If you want to wait for other events simultaneously, it is not necessarily the best choice.
Having a separate "check" command that can be triggered by all three methods (voice, keyboard, joystick) would likely be the most feasible option.
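To illustrate the difference in general terms, here is a minimal sketch in plain Python. It is not VoiceAttack configuration or its API, and every name in it (`blocking_wait`, `check`, `fire`, `TRIGGERS`) is hypothetical. It only shows the shape of the idea: a blocking wait holds everything up until it returns, whereas a single shared "check" routine can be called by any input method independently.

```python
import queue


def blocking_wait(events, timeout):
    """Stand-in for a synchronous wait: the caller does nothing else until this returns."""
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        return None  # timed out without receiving anything


def check(source, log):
    """Hypothetical shared 'check' routine; records which input triggered it."""
    log.append(source)
    return f"check triggered by {source}"


# Assumption for illustration: each input method is bound to the same routine,
# so none of them has to wait on the others.
TRIGGERS = {"voice": check, "keyboard": check, "joystick": check}


def fire(source, log):
    """Dispatch an input event to the shared routine."""
    return TRIGGERS[source](source, log)
```

In this sketch, `fire("keyboard", log)` and `fire("joystick", log)` reach the same logic as `fire("voice", log)`, which is roughly what binding a separate "check" command to all three trigger types would give you.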