Topic: WhisperAttack - OpenAI Whisper Integration with VoiceAttack (Read 419 times)

DirtyPaws · « **on:** January 02, 2025, 12:38:14 AM »

Hey Folks

Just wanted to let you know that, together with SeaTechNerd83 and with the help of existing projects by hradec and BojoteX, we have created a script that uses OpenAI's Whisper for voice recognition instead of the Windows 95-equivalent defaults. Using one's GPU via CUDA, it takes on average 2.5 seconds to transcribe a sentence, and the smaller 'base' and 'tiny' models can run even faster.

So far it has understood every command I've given it without a single mistake. This is an early build, and I'm hoping to make it a bit more robust in the future.

At the moment, it's injecting transcribed words through the following function:

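    import logging
    import os
    import subprocess


    def send_to_voiceattack(recognized_text, voiceattack_exe_path):
        # Wrapper added so the excerpt runs on its own; in the script this is step 5 of a larger routine
        # 5) Send recognized text to VoiceAttack
        if recognized_text:
            if not os.path.isfile(voiceattack_exe_path):
                logging.error(f"VoiceAttack.exe not found at {voiceattack_exe_path}")
                return
            try:
                # Launch VoiceAttack.exe with the -command switch, passing the recognized phrase as the command to run
                subprocess.call([voiceattack_exe_path, '-command', recognized_text])
                logging.info(f"Sent recognized text to VoiceAttack: {recognized_text}")
            except Exception as e:
                logging.error(f"Error calling VoiceAttack: {e}")
        else:
            logging.info("No recognized text to send to VoiceAttack.")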

This means that users running VoiceAttack from Program Files, or running VoiceAttack as administrator, will be hassled by a UAC prompt. There has to be a more elegant way of injecting commands. If someone knows one, please let me know (or better still, fork the repo).

Many Thanks,

https://github.com/nikoelt/WhisperAttack

DirtyPaws · « **Reply #1 on:** January 03, 2025, 12:17:35 AM »

Quick update: it now runs a server in the background, and speech recognition takes less than a second for a paragraph of speech. Like Usain Bolt in the Olympics, this engine now demolishes Windows speech recognition in every aspect; it's not even close.
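One plausible reason a persistent server cuts latency is that the Whisper model is loaded into GPU memory once, instead of on every request. The sketch below illustrates that idea; it is not WhisperAttack's actual server. The newline-delimited "audio file path in, transcript out" TCP protocol and the names `load_whisper_transcriber`, `make_server` and `request_transcript` are all hypothetical. It assumes the `openai-whisper` package for the real model (`whisper.load_model` and `model.transcribe`), imported lazily so that the server logic can be exercised with any stand-in function.

```python
import socket
import socketserver


def load_whisper_transcriber(model_name="small.en", device="cuda"):
    """Load a Whisper model once and return a function mapping an audio path to text.

    Assumes the openai-whisper package is installed; it is imported lazily so
    the server code below can be exercised without it.
    """
    import whisper

    model = whisper.load_model(model_name, device=device)

    def transcribe(audio_path):
        # transcribe() returns a dict; the full transcript is under "text"
        return model.transcribe(audio_path)["text"].strip()

    return transcribe


def make_server(transcribe, host="127.0.0.1", port=0):
    """Build a TCP server that replies to each audio-path line with its transcript.

    port=0 lets the OS pick a free port; read it back from server.server_address.
    """

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            for raw in self.rfile:  # one request per newline-terminated line
                audio_path = raw.decode("utf-8").strip()
                if audio_path:
                    reply = transcribe(audio_path) + "\n"
                    self.wfile.write(reply.encode("utf-8"))

    server = socketserver.ThreadingTCPServer((host, port), Handler)
    server.daemon_threads = True
    return server


def request_transcript(address, audio_path):
    """Hypothetical client: send one audio path and wait for one transcript line."""
    with socket.create_connection(address) as sock, sock.makefile("rb") as reader:
        sock.sendall((audio_path + "\n").encode("utf-8"))
        return reader.readline().decode("utf-8").rstrip("\n")
```

In use, the model would be loaded once with `make_server(load_whisper_transcriber(), port=...)`, `serve_forever()` would run on a background thread, and the audio-capture script would call `request_transcript(...)` for each recorded clip, so only the first start-up pays the model-loading cost.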

The way VoiceAttack handles UAC when run as administrator is problematic. For those of you needing admin privileges, I have provided a workaround using Task Scheduler. I would still like to know if there is a better way to inject commands.

It would be good if VA had a port for injecting commands; it would be even better if you simply included Whisper as part of voice recognition. It boggles my mind that you haven't.

Gary · « **Reply #2 on:** January 03, 2025, 01:06:21 AM »

The command-line approach to executing commands was added for instances where maybe you have a desktop shortcut; it was never intended to fully control VoiceAttack. Most would simply create a plugin for VoiceAttack in order to control it directly, hence the reason there is no "port injection". See the help documentation section titled, 'VoiceAttack Plugins (for the truly mad)' (press F1 while VoiceAttack has focus).

In the coming year, I will be exploring alternatives to the Windows speech engine(s). Latency is the biggest hurdle, as a whole second for any amount of recognition isn't something an everyday VA user would tolerate. If you are looking for a bit more speed from Windows' speech engines, take a look at the 'Restricted Continuous Speech' and 'Continuous Speech' options for commands.

Regarding UAC, there really is no good reason (except in very few cases, and especially not for games) for anybody to be running applications as an administrator. VA suggests this because many people inadvertently do run their applications as an administrator and promptly write in asking, 'Why does VA not work with my game?'. The suggestion is then always to not run their game/app as an administrator, but running VA as an admin will get them by in a pinch.

Congrats on getting Whisper going; that's a nice skill to have. That said, I suggest that you rephrase your post and remove the reference to the Special Olympics, as that is highly inappropriate. Thanks!

DirtyPaws · « **Reply #3 on:** January 03, 2025, 06:08:43 AM »

Thank you for the info. I won't bother you anymore with any questions.

Over the years, many people have come onto your forums (some with disabilities, others who speak English well enough for services like Alexa and Siri to understand them) who are unable to get Windows Speech Recognition to work. The replies on those threads speak for themselves.

Regarding what you have said about latency, I'm going to be blunt: you are wrong. Here is a real-time transcription with a 140 MB model. I recommend you play it back with the audio on, just to really let it sink in:

https://private-user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzU5MDg4MTEsIm5iZiI6MTczNTkwODUxMSwicGF0aCI6Ii8xOTkxMjk2LzE5NDkzNTc5My03NmFmZWRlNy1jZmE4LTQ4ZDgtYTgwZi0yOGJhODNiZTdkMDkubXA0P1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDEwMyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAxMDNUMTI0ODMxWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MTM2OWRiMjNlYjA4ZDk4OTkzYzgzY2EzNTYyMTkyZmQzNThkMWY0NTRjMjM1NGIzYjIzNTVjOTg2NDYyZGEwMiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.iLIlrJ112scIQHtM9yoDeM__eluzE0ypqudaIHyECeY

Using the current implementation I linked above, I get an average transcription time of 0.2 seconds. This is with the small.en model, listed at about 4x relative speed; there is also a base model at about 7x and a tiny model at about 10x.
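A figure like "0.2 seconds on average" depends on what is timed. As a minimal sketch, assuming `transcribe` is any function that takes an audio file path (for example, one wrapping a model loaded with `whisper.load_model`), an average could be measured like this. The function name `average_latency`, the warm-up count and the clip list are hypothetical, and this times transcription only, not audio capture or sending the result to VoiceAttack.

```python
import statistics
import time


def average_latency(transcribe, audio_paths, warmup=1):
    """Mean wall-clock seconds per transcribe(path) call, skipping warm-up calls.

    The first call(s) are excluded because they can include one-off costs
    such as CUDA initialization.
    """
    if len(audio_paths) <= warmup:
        raise ValueError("need more clips than warm-up calls")
    timings = []
    for i, path in enumerate(audio_paths):
        start = time.perf_counter()
        transcribe(path)
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed)
    return statistics.mean(timings)
```

Excluding warm-up calls is a design choice: including them would fold model and GPU start-up into the per-sentence number, which is exactly the cost a background server is meant to avoid.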

see: https://cdn.discordapp.com/attachments/809527129422430218/1324713709506924605/image.png?ex=67792748&is=6777d5c8&hm=03f7df9473d2bb6fa92bc8f43e57435e618cfb18833b387f267cfe6be8bacafa&

I have huge respect for the software you've written, but its dependency on outdated voice recognition is its Achilles’ heel, and a prime example of how software is only as good as its weakest link. Your users deserve better, and I hope you will take this to heart.

Before I go, here's something that might set you on the right path: https://github.com/ggerganov/whisper.cpp

Gary · « **Reply #4 on:** January 03, 2025, 09:05:28 AM »

If you look a little closer at what I actually wrote, alternate engines are not off the table. I have allocated my available resources to work on other accessibility features that can be used now, by people who use the software as it stands now. If you have the availability to create a plugin that will help the multitude of individuals you describe NOW, then I implore you to direct it, write it, document it and support it (including the technical hurdles that come with a large Windows installation). Trust me when I say that simply tucking a speech engine under this thing will not fly, and the resources needed to make that happen significantly outweigh the immediate benefits of current efforts. Otherwise, this would have already been done.

I see you also didn't read my part about the Special Olympics. I will fix that for you now.

PS - your first link is broken.
