VoiceAttack is intended to run as a single instance.
The SAPI speech recognition system that is normally used is, as mentioned, intended for a single user, and requires training for that user.
However, there is also Speech Platform 11, which is intended to be user-agnostic, and neither needs nor supports training.
If you don't intend to use dictation-based features (like the dictation mode or wildcard commands), the latter could be a solution.
For the actual sound input, assuming mixing the inputs in hardware isn't an option for you, perhaps something like VoiceMeeter (unrelated to VoiceAttack) could do it in software, by combining two recording devices into a single virtual recording device, which the speech recognition system would then listen to.
VoiceMeeter does support hotkeys, if I recall correctly, so setting up a push-to-talk key for each input may be possible.