There are plans to add more options for speech recognition. However, the availability and timing of these additions will depend on the complexity and space requirements of those engines. For instance, if adding a local model requires gigabytes and gigabytes of space and the (esoteric) configuration of several tools to make it work with Windows, it will not be something that the average VA user is going to be able to implement. It's not quite to the point of 'If you make it, they will come' just yet, as there are not a lot of folks out there willing to take this step.
So again, the short answer is yes, this will happen (it is something that I would like to have). However, the research is still ongoing.