Author Topic: VoiceAttack and AI  (Read 2519 times)

Mike308

  • Newbie
  • *
  • Posts: 48
VoiceAttack and AI
« on: March 27, 2023, 05:34:55 PM »
Hey Gary,
Have you looked at the feasibility of tapping AI-generated text to feed into a TTS voice profile?  I have been fascinated with the evolving GPT/OpenAI capabilities, and it struck me that there might be a nice synergy between a system that writes text and a system that combines speaking text with carrying out all sorts of actions.  Just curious!

SemlerPDX

  • Global Moderator
  • Sr. Member
  • *****
  • Posts: 280
  • Upstanding Lunatic
    • My AVCS Homepage
Re: VoiceAttack and AI
« Reply #1 on: March 27, 2023, 08:20:01 PM »
VoiceAttack is mostly used to combine predefined speech phrases with a set of predefined actions to carry out when one of those phrases is recognized by the speech engine.

AI, such as the products offered by OpenAI through their API or ChatGPT, takes free-form user input text, evaluates it, and produces a text response.  These responses may even include formatted text, such as code examples in a code block.

Free-form speech-to-text is only as good as the dictation capabilities of the engine used, and I'm sure many here will agree that Windows Speech Recognition dictation is spotty at best.  Given that the quality of a return from an AI is only as good as the quality of the input(s), the concept of "garbage in, garbage out" comes to mind.

With good dictation software, it would not be difficult to create a voice interaction alternative to something like ChatGPT, possibly using the OpenAI API, but out of the box, using only WSR and VoiceAttack, this may be difficult to achieve (if even possible).  Just my 2 cents - I've been working with the OpenAI toys since December; if it were a Steam game, I'd easily have over a hundred hours logged.  There are plenty of curveballs that would need to be accounted for, even so far as limiting the length of any given response or giving users some way to control the flow of these replies.
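
To give a rough idea of what the API side of that would involve, here is a minimal C# sketch of a single send/return round trip against OpenAI's chat completions endpoint.  The endpoint URL, model name, and JSON shape are the standard documented chat completions call; the class and method names, the 256-token cap, and the Json.NET (Newtonsoft.Json) dependency are my own placeholder choices for illustration, not code from any existing plugin or profile.
Code: [Select]
// Minimal sketch: send one prompt to the chat completions endpoint and return
// the text of the reply.  Placeholder names throughout - not production code.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class OpenAiSketch
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> AskAsync(string systemPrompt, string userPrompt)
    {
        // Read the key from an environment variable rather than hard-coding it.
        string apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
        Client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", apiKey);

        // Standard chat completions body: one system message plus one user message.
        string body = JsonConvert.SerializeObject(new
        {
            model = "gpt-3.5-turbo",
            max_tokens = 256,
            messages = new[]
            {
                new { role = "system", content = systemPrompt },
                new { role = "user",   content = userPrompt }
            }
        });

        HttpResponseMessage response = await Client.PostAsync(
            "https://api.openai.com/v1/chat/completions",
            new StringContent(body, Encoding.UTF8, "application/json"));
        response.EnsureSuccessStatusCode();

        // The reply text lives at choices[0].message.content in the returned JSON.
        JObject parsed = JObject.Parse(await response.Content.ReadAsStringAsync());
        return (string)parsed["choices"][0]["message"]["content"];
    }
}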

llazzllo

  • Newbie
  • *
  • Posts: 2
Re: VoiceAttack and AI
« Reply #2 on: April 05, 2023, 10:22:03 PM »
I came here to ask this very thing, because I saw that you can get GPT-4 All to download and run on your computer.

Could something like this be a viable workflow?

use a command to open dictation mode, then say your prompt, then a command that runs a sequence that goes:

end dictation; paste dictation buffer into GPT window; read the response with TTS;

with an optional command to cut off the TTS if it's too long, or wrong.

I'm pretty sure I could at least get to the point where it pasted into the GPT window, but I'm not sure about the rest...

Bear in mind, I'm not looking to have an intellectual conversation or anything (and GPT-4 All isn't that bright anyway); I just want to give my virtual assistant some personality.

SemlerPDX

  • Global Moderator
  • Sr. Member
  • *****
  • Posts: 280
  • Upstanding Lunatic
    • My AVCS Homepage
Re: VoiceAttack and AI
« Reply #3 on: April 07, 2023, 01:39:07 PM »
Quote from: llazzllo on April 05, 2023, 10:22:03 PM
I came here to ask this very thing, because I saw that you can get GPT-4 All to download and run on your computer.

Could something like this be a viable workflow?

use a command to open dictation mode, then say your prompt, then a command that runs a sequence that goes:

end dictation; paste dictation buffer into GPT window; read the response with TTS;

with an optional command to cut off the TTS if it's too long, or wrong.

I'm pretty sure I could at least get to the point where it pasted into the GPT window, but I'm not sure about the rest...

Bear in mind, I'm not looking to have an intellectual conversation or anything (and GPT-4 All isn't that bright anyway); I just want to give my virtual assistant some personality.

What is "GPT-4 All"?  I googled for it and came up empty - possibly due to the generic nature of the search terms, or because whatever "GPT-4 All" is doesn't have well-established SEO (yet).  AFAIK, the GPT-4 model is only available to paid customers but will become public and free eventually, replacing the GPT-3.5 model which currently powers OpenAI's ChatGPT application.

The flow you propose is certainly viable - but as I stated above in my previous reply to this thread, the quality of the dictation would dictate the quality of the input prompt you send to the OpenAI API, and thus the quality of the return from the AI.  On top of that, the return time is dependent on a number of factors, first and foremost the number of tokens provided in the input prompt.

Since the return could come back quickly or only after several seconds, it might be smart to have TTS say something at the start, such as "Checking now..." or whatever is most appropriate - to let the user know the input has been sent and the system is now waiting for a return.
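
A bare-bones sketch of that pattern is below.  It uses the built-in System.Speech synthesizer so it stands alone (in an actual command you would more likely use VoiceAttack's own text-to-speech action), and the askOpenAi delegate stands in for something like the AskAsync sketch in my earlier reply - all names here are placeholders of my own.
Code: [Select]
// Sketch: speak a short acknowledgement right away, then speak the AI reply
// whenever it arrives.  Requires a reference to System.Speech.dll.
using System;
using System.Speech.Synthesis;
using System.Threading.Tasks;

public static class AcknowledgeThenAnswer
{
    public static async Task RunAsync(Func<Task<string>> askOpenAi)
    {
        using (var tts = new SpeechSynthesizer())
        {
            // Fire the request first, then talk while it is in flight.
            Task<string> pending = askOpenAi();
            tts.Speak("Checking now");       // immediate feedback that the prompt was sent
            string reply = await pending;    // could return fast, or after several seconds
            tts.Speak(reply);
        }
    }
}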

As an example of how to leverage custom system prompt(s), I provided the following example on the VA Discord, which I will share here, using the iRacing game as an example:
Quote
One way you could get into using OpenAI for iRacing would be to create a specialization for this prompt, and then you can send prompts and get a response that would work well as TTS during the race.  @Genide brings up a very good point that the time of a return can vary, and so it might be smart to design a system to gather all the TTS that would be used prior to the race, and then use some other system to trigger the responses - for example, following the image below, turn by turn directions for each upcoming turn.  This would eliminate the delay that could happen on a per-input/response basis.

Using the example below, one might use the OpenAI API to send the specialization and the prompt (per turn) and gather the (per turn) text response to save as a text variable in VoiceAttack for use as needed, as the racer approaches each relative turn in the race.  Using Road America as an example, I produced this very accurate response from the gpt-3.5-turbo model in 256 tokens:

[image: OpenAI Playground example - Road America turn by turn response from gpt-3.5-turbo]

In the above example for iRacing, one would first create an appropriate custom system prompt, to be used when assembling the contents of the dictation into an input prompt to send to the API while waiting for a return.  On a per-call basis, the predefined system prompt would be combined with the new dictation input prompt, so you'd only need to vary the actual dictation sent to the same system for its return.
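
In code terms, the system prompt is just a constant that gets paired with whatever the dictation produced on each call - something like the sketch below, where the prompt wording, the class name, and the AskAsync helper (from the sketch in my first reply) are all placeholders of my own:
Code: [Select]
// Sketch: one fixed system prompt reused on every call; only the user content
// (the dictation) changes per request.  The prompt wording here is invented.
using System.Threading.Tasks;

public static class PerCallPrompt
{
    private const string SystemPrompt =
        "You are a concise iRacing spotter.  Answer in one or two short sentences " +
        "suitable for text-to-speech.  If you do not know the answer, say so.";

    public static Task<string> AskWithDictation(string dictationText)
    {
        // Same system prompt every time; only dictationText varies per call.
        return OpenAiSketch.AskAsync(SystemPrompt, dictationText);
    }
}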

Post-processing of the return would be the only way to check the length of the return, aside from limiting the Maximum Length (in tokens), though I am not at all sure how anyone would evaluate whether a return was wrong before hearing/reading it.  With that being said, while AI does not seem to know the concept of "I don't understand your question" or even "I do not know the answer", you can design the custom system prompt which precedes your input prompt to allow for such things.  You can obviously attempt to limit the response length with specific directions in your system prompt, as I have done in my example above, too.
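
For the length question specifically, a crude post-processing pass before the TTS step can at least keep the spoken reply short - for example, cutting at the last sentence break inside a word budget.  The sketch below is just one way to do that; the 60-word default is an arbitrary number I picked:
Code: [Select]
// Sketch: crude length guard before TTS - keep at most maxWords words,
// preferably ending at a sentence boundary.  The 60-word default is arbitrary.
using System;

public static class ReplyTrimmer
{
    public static string TrimForTts(string reply, int maxWords = 60)
    {
        string[] words = reply.Split(new[] { ' ', '\t', '\r', '\n' },
                                     StringSplitOptions.RemoveEmptyEntries);
        if (words.Length <= maxWords)
            return reply.Trim();

        string clipped = string.Join(" ", words, 0, maxWords);

        // Back up to the last sentence-ending punctuation, if there is one.
        int lastStop = clipped.LastIndexOfAny(new[] { '.', '!', '?' });
        return lastStop > 0 ? clipped.Substring(0, lastStop + 1) : clipped + "...";
    }
}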

Check out the OpenAI Playground and toy with the settings on the right.  The Playground is a way to test out the send/return cycle without programmatically sending these prompts and settings.  Once you find something that works well, you'd jot down the details in order to assemble an actual API call performed programmatically through an inline function in VoiceAttack, or a plugin for VoiceAttack.  There are details on how this can be done in the documentation at OpenAI - those code examples can be converted easily to C# (or VB.net) using AI, too, though you'd want to test as you go, since coding examples from AI can sometimes be incorrect or assume libraries you may not have access to or that may not be compatible.
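
To show where such a call would sit inside VoiceAttack, here is a rough skeleton of an inline C# function, based on the standard VAInline template (see the VoiceAttack help documentation for the specifics).  The variable names "dictationText" and "aiResponse", the system prompt wording, and the AskAsync helper from my first reply are all placeholders - adjust them to match your own command:
Code: [Select]
// Sketch of a VoiceAttack inline function (C#) wiring the pieces together.
// Assumes the AskAsync helper class is pasted into the same inline function.
public class VAInline
{
    public void main()
    {
        // Read the dictation your command saved into a VoiceAttack text variable.
        string dictation = VA.GetText("dictationText");
        if (string.IsNullOrEmpty(dictation))
        {
            VA.WriteToLog("No dictation text found", "red");
            return;
        }

        // Send the fixed system prompt plus the new dictation, and wait for the return.
        string reply = OpenAiSketch.AskAsync(
            "You are a helpful in-game assistant.  Keep replies brief.", dictation)
            .GetAwaiter().GetResult();

        // Hand the reply back to the profile so a TTS action can speak it.
        VA.SetText("aiResponse", reply);
    }
}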

If you really wanted to, you could leverage existing libraries for OpenAI to simplify your coding efforts - though, to be fair, it is a very rudimentary thing to send a call to an API and wait for a return, and a library is not required.

https://platform.openai.com/playground?mode=chat
« Last Edit: April 07, 2023, 02:41:02 PM by SemlerPDX »

llazzllo

  • Newbie
  • *
  • Posts: 2
Re: VoiceAttack and AI
« Reply #4 on: April 09, 2023, 04:51:29 PM »
Thanks for the advice! I'm probably going to make an attempt at this this week (once I learn how to clone a GitHub repository, lol).

GPT4All is a free, open-source, and small (400k prompt:response pairs, about 4 GB of data) generative text AI. It's available here:
https://github.com/nomic-ai/gpt4all

My main interest in it is having stupid conversations with my virtual assistant while I stream. I think there's probably an opportunity for comedy there.

SemlerPDX

  • Global Moderator
  • Sr. Member
  • *****
  • Posts: 280
  • Upstanding Lunatic
    • My AVCS Homepage
Re: VoiceAttack and AI
« Reply #5 on: May 08, 2023, 09:54:06 AM »
Just came back to drop a couple of links - I created a public OpenAI API Plugin for VoiceAttack so that any profile or command builders out there can start taking advantage of this powerful technology in VoiceAttack right now!
https://forum.voiceattack.com/smf/index.php?topic=4519.0


...and the first ready-to-use profile using the OpenAI Plugin for VoiceAttack, my own AVCS CHAT - a Voice ChatGPT Conversational profile:
https://forum.voiceattack.com/smf/index.php?topic=4520.0

 8)