Most users do not have a system capable of such AI processing in an acceptable timeframe. The long return times you see from AI text-to-speech generation come from the limited resources the service assigns to each task, limits it imposes so it can serve many people using the tool at once.
For that reason, local AI processing will remain a far-off technology for most people. For now, accessing AI tools through APIs lets us get responses in mere seconds, rather than the much longer waits local AI processing would require, depending on the local hardware.
That being said, such local hardware would need to be extremely high-end for a local implementation of Llama 2:
A high-end consumer GPU, such as the NVIDIA RTX 3090 or 4090, has 24 GB of VRAM. If we quantize Llama 2 70B to 4-bit precision, we still need 35 GB of memory (70 billion * 0.5 bytes). The model could fit into 2 consumer GPUs.
With GPTQ quantization, we can further reduce the precision to 3-bit without losing much in the performance of the model. A 3-bit parameter weighs 0.375 bytes in memory. Llama 2 70B quantized to 3-bit would still weigh 26.25 GB. It doesn’t fit into one consumer GPU.
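If you want to reproduce that arithmetic yourself, here is a tiny sketch in plain Python (no libraries needed) that estimates the raw weight memory for Llama 2 70B at a few quantization levels and how many 24 GB consumer GPUs that would take. It only counts the weights; activations, KV cache, and framework overhead are ignored for simplicity.

```python
# Rough VRAM estimate for Llama 2 70B weights at different quantization levels.
# Activations, KV cache, and framework overhead are ignored for simplicity.

PARAMS = 70e9  # 70 billion parameters

for bits in (16, 8, 4, 3):
    bytes_per_param = bits / 8          # e.g. 4-bit -> 0.5 bytes, 3-bit -> 0.375 bytes
    total_gb = PARAMS * bytes_per_param / 1e9
    gpus_needed = -(-total_gb // 24)    # ceiling divide by 24 GB (RTX 3090/4090 VRAM)
    print(f"{bits}-bit: {total_gb:.2f} GB of weights, "
          f"~{int(gpus_needed)} x 24 GB consumer GPU(s)")
```

Running it gives 35 GB at 4-bit and 26.25 GB at 3-bit, which is exactly why even the most aggressive quantization still spills past a single 24 GB card.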
For now, the best way to leverage AI systems in VoiceAttack is through API access to existing online tools, such as OpenAI and others. Just recently, OpenAI released a new Text-to-Speech tool in open beta, offering several extremely human-like voices at a fairly affordable price for the quality on offer.
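To give you an idea of how simple that API is to call, here is a minimal sketch using the openai Python package (v1.x). The model and voice names reflect the beta as of this writing and may change, and the sample phrase is just a placeholder.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in your environment

# Generate speech with one of the built-in voices and save it as an MP3.
response = client.audio.speech.create(
    model="tts-1",       # "tts-1-hd" trades speed for higher quality
    voice="alloy",       # other beta voices: echo, fable, onyx, nova, shimmer
    input="Shields are at forty percent, Commander.",
)
response.stream_to_file("reply.mp3")
```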
In addition to this, the Microsoft Azure Cognitive Voice services can also be accessed through VoiceAttack using this plugin by jamescl604.
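That plugin handles all of this for you inside VoiceAttack, but for anyone curious what the underlying Azure call looks like, here is a rough Python sketch using the azure-cognitiveservices-speech package. The key, region, and voice name are placeholders you would replace with values from your own Azure portal.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder key and region - substitute your own Azure Speech resource values.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="eastus")
speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"

# With no audio config supplied, output goes to the default speaker.
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("Docking clamps released.").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized to the default speaker.")
```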
Unfortunately, program development is not as simple as cobbling together a bunch of parts like assembling a Lego creation. Local AI processing is presently beyond the reach of all but the highest-end PCs, so implementing these new technologies into VoiceAttack as native functions is not as simple as 'insert tab A into slot A'.
For function-driven voice communications with the computer, where input is first evaluated to either perform an action, provide a pre-programmed response, or be fed to ChatGPT to formulate a reply, I have already developed and released the OpenAI API Plugin for VoiceAttack. It provides access to nearly all OpenAI functions, including Whisper speech-to-text, ChatGPT, Dall-E image generation/editing, and Completions, as well as Fine-Tuning and Embeddings (which can be used for extremely powerful customizations and database interactions).
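To illustrate that "route the input" idea in plain terms: recognized text is first checked against known commands, and only unmatched input is handed to ChatGPT. The actual plugin is a compiled VoiceAttack plugin, so treat this Python sketch as conceptual only; the command table and handle_input helper are made up for the example.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in your environment

# Hypothetical command table - in VoiceAttack these would be real profile commands.
COMMANDS = {
    "lower landing gear": lambda: print("[action] landing gear down"),
    "open star map": lambda: print("[action] star map opened"),
}

def handle_input(spoken_text: str) -> None:
    """Perform a known action, or fall back to ChatGPT for a reply."""
    action = COMMANDS.get(spoken_text.lower().strip())
    if action:
        action()
        return
    # No matching command: let ChatGPT formulate a response instead.
    chat = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": spoken_text}],
    )
    print(chat.choices[0].message.content)

handle_input("lower landing gear")        # handled locally
handle_input("what is a neutron star?")   # sent to ChatGPT
```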
The new Assistants API (also in beta) from OpenAI is a very promising feature, and it will take a lot of the workload off developers like myself who are currently stringing together existing OpenAI tools to approximate such features. It can be customized to execute pre-defined functions such as API calls (e.g. OpenWeather) based on user input while it provides customized responses to those inputs, and it can handle other specific tasks such as file handling. It's so new that I'm still wrapping my head around all it can offer and how it can be used. Once it is out of beta, I will include access to this API through my OpenAI Plugin for VoiceAttack, along with any new models not presently supported by the plugin.
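Since it is still in beta, the exact shape of the API may change, but here is a rough sketch of what defining such an assistant looks like today with the openai Python package. The "get_current_weather" function is a hypothetical OpenWeather wrapper for illustration only; the API does not run it for you, it just tells your code when to call it and with what arguments.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in your environment

# Define an assistant that can request a call to a hypothetical weather function.
assistant = client.beta.assistants.create(
    name="Ship Computer",
    instructions="You are a concise onboard assistant for a space sim.",
    model="gpt-4-1106-preview",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",   # hypothetical OpenWeather wrapper
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
print("Assistant created:", assistant.id)
```

A full exchange also involves creating a thread, starting a run, and submitting the tool outputs back to the run, which is exactly the kind of plumbing the plugin will wrap up for you once this is out of beta.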
Right now, you can check out that awesome plugin by jamescl604 to get very realistic text-to-speech in VoiceAttack today, or you could develop a plugin that takes advantage of the Llama 2 API (if it has one) or OpenAI's latest text-to-speech beta API. Definitely check out my OpenAI Plugin for VoiceAttack, and my AVCS CHAT profile, which allows for verbal conversations with ChatGPT where responses are spoken back to you using your local system's text-to-speech voice, with any code blocks opened in a notepad for manual review, and more.
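The AVCS CHAT profile does all of that inside VoiceAttack, but if you want to see the shape of that round trip on its own, here is a rough Python sketch that asks ChatGPT a question and reads the answer aloud with the local system voice using the pyttsx3 package. The question text is just an example.

```python
import pyttsx3
from openai import OpenAI

client = OpenAI()        # expects OPENAI_API_KEY in your environment
engine = pyttsx3.init()  # uses the local system voices (SAPI5 on Windows)

# Ask ChatGPT a question and speak the answer with the local text-to-speech voice.
reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Give me a one-sentence fun fact about Saturn."}],
).choices[0].message.content

print(reply)
engine.say(reply)
engine.runAndWait()
```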