As far as I'm aware, Speech Platform 11 doesn't recognize whole words directly, but rather phonemes (essentially, the individual sounds that make up words), which it then attempts to match against the phoneme sequences of the predefined words or phrases in its command list.
Matching against a known set of possibilities is a lot simpler (though still complex, in this context) than matching against any possible word in a language.
Perhaps think of it like being given a list of a few words in a language you don't speak (Chinese, for example) along with recordings of how they sound.
Then, if you were to listen to a native speaker, you might be able to pick out those few words you know, but anything else might as well be gibberish, since you'd have no idea what the sounds you're hearing signify.
So in practice, if Speech Platform 11 doesn't find a match in its predefined list of commands, it discards the input, as it has no capability of processing that input further; i.e., it can only compare against the commands it has been provided with.
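To make that idea concrete, here's a minimal, purely illustrative sketch in Python. This is not how Speech Platform 11 is actually implemented; the phoneme transcriptions, the command list, and the edit-distance threshold are all my own invented assumptions. It just demonstrates the principle: compare an input phoneme sequence against a fixed command list, and discard anything that doesn't match closely enough.

```python
# Illustrative sketch of constrained, command-list recognition.
# The phoneme strings and threshold below are invented for demonstration;
# real recognizers use far more sophisticated acoustic/statistical models.

def edit_distance(a, b):
    """Classic Levenshtein distance between two phoneme sequences."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

# Hypothetical command list, each mapped to a rough phoneme sequence.
COMMANDS = {
    "open file": ["OW", "P", "AH", "N", "F", "AY", "L"],
    "save file": ["S", "EY", "V", "F", "AY", "L"],
    "close":     ["K", "L", "OW", "Z"],
}

def recognize(heard_phonemes, max_distance=2):
    """Return the best-matching command, or None if nothing is close enough.

    Mirrors the behaviour described above: input that doesn't match any
    predefined command is simply discarded."""
    best_command, best_score = None, max_distance + 1
    for command, phonemes in COMMANDS.items():
        score = edit_distance(heard_phonemes, phonemes)
        if score < best_score:
            best_command, best_score = command, score
    return best_command if best_score <= max_distance else None

# A slightly garbled "save file" still matches; unrelated sounds do not.
print(recognize(["S", "EY", "F", "F", "AY", "L"]))   # -> "save file"
print(recognize(["B", "AA", "N", "AE", "N", "AH"]))  # -> None (discarded)
```

The key point the sketch captures is the second example: the input isn't "misunderstood" as some other word, it's simply rejected, because the recognizer has nothing outside its command list to map it to.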