I came here to ask this very thing, because I saw that you can download GPT-4 All and run it on your own computer.
Could something like this be a viable workflow?
use a command to open dictation mode, then say your prompt, then a command that runs a sequence:
end dictation; paste the dictation buffer into the GPT window; read the response with TTS;
with an optional command to cut off the TTS if it's too long or wrong.
I'm pretty sure I could at least get to the point where it pastes into the GPT window, but I'm not sure about the rest...
Bear in mind, I'm not looking to have an intellectual conversation or anything (and GPT-4 All isn't that bright anyway); I just want to give my virtual assistant some personality.
What is "GPT-4 All"? I googled for it, and came up empty - possibly due to the generic nature of the search terms, or that whatever "GPT-4 All" is doesn't have well established SEO (yet). AFAIK, the GPT-4 model is only available to paid customers but will become public and free eventually, replacing the GPT-3.5 model which currently powers OpenAI's ChatGPT application.
The flow you propose is certainly viable - but as I stated in my previous reply to this thread, the quality of the dictation dictates the quality of the input prompt you send to the OpenAI API, and thus the quality of the return from the AI. On top of that, the return time depends on a number of factors, first and foremost the number of tokens in the input prompt.
Since the return could come back quickly or after several seconds, it might be smart to add a short TTS notice at the start, such as "Checking now..." or whatever is most appropriate, to let the user know the input has been sent and a response is on the way.
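To illustrate the idea, here's a rough C# sketch (not VoiceAttack-specific): the SendPromptAsync helper is a hypothetical stand-in for the actual API call, and System.Speech is assumed to be available for the TTS side.

```csharp
// Sketch: kick off the (potentially slow) API call, speak a short notice
// immediately, then wait for the actual return. SendPromptAsync is a
// hypothetical placeholder for the real OpenAI call; System.Speech
// (.NET Framework assembly reference) is assumed for TTS.
using System.Speech.Synthesis;
using System.Threading.Tasks;

public static class PromptNotifier
{
    public static async Task<string> AskWithNotice(string prompt)
    {
        Task<string> pending = SendPromptAsync(prompt); // starts the call right away

        using (var tts = new SpeechSynthesizer())
            tts.Speak("Checking now...");               // plays while the call is in flight

        return await pending;                           // the actual AI return
    }

    static Task<string> SendPromptAsync(string prompt)
    {
        // Placeholder only; see the bare-bones HttpClient sketch further down.
        return Task.FromResult("(response)");
    }
}
```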
As an example of how to leverage custom system prompts, I'll share here what I posted on the VA Discord, using the iRacing game:
One way you could get into using OpenAI for iRacing would be to create a specialization for this purpose; you could then send prompts and get responses that would work well as TTS during the race.

@Genide brings up a very good point that return times can vary, so it might be smart to gather all the TTS that will be used prior to the race, and then use some other system to trigger the responses - for example, turn-by-turn directions for each upcoming turn, following the image below. This would eliminate the delay that could otherwise occur on a per-input/response basis. One might use the OpenAI API to send the specialization plus a per-turn prompt, then save each per-turn text response as a text variable in VoiceAttack, to be played back as the racer approaches each turn.

Using Road America as an example, I produced this very accurate response from the gpt-3.5-turbo model in 256 tokens:
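As a rough sketch of that gather-before-the-race idea, written as a VoiceAttack inline C# function: the turn names and the GetTurnAdviceFromOpenAI helper are hypothetical placeholders, and VA.SetText is assumed per VoiceAttack's inline-function proxy.

```csharp
// Sketch: gather all per-turn TTS text before the race starts and stash
// each response in a VoiceAttack text variable for later playback.
// Turn names and GetTurnAdviceFromOpenAI are hypothetical; VA.SetText
// is assumed per VoiceAttack's inline-function proxy object.
public class VAInline
{
    public void main()
    {
        // A few Road America corners as placeholders.
        string[] turns = { "Turn 1", "Turn 3", "Turn 5", "The Kink", "Canada Corner" };

        foreach (string turn in turns)
        {
            // One API call per turn, done up front rather than mid-race.
            string advice = GetTurnAdviceFromOpenAI(turn);

            // Saved for later, e.g. referenced as {TXT:RoadAmerica Turn 1}.
            VA.SetText("RoadAmerica " + turn, advice);
        }
    }

    string GetTurnAdviceFromOpenAI(string turn)
    {
        // Placeholder: combine the specialization (system prompt) with a
        // per-turn user prompt and send it via the API call sketched below.
        return "(advice for " + turn + ")";
    }
}
```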
In the above example for iRacing, one would first create an appropriate custom system prompt, to be used when assembling the contents of the dictation into an input prompt to send to the API. On a per-call basis, the predefined system prompt would be combined with the new dictation input prompt, so you'd only need to vary the actual dictation while sending against the same system prompt each time.
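A minimal sketch of that per-call assembly (assuming System.Text.Json for serialization; the model and token values simply mirror the example above):

```csharp
// Sketch: the system prompt is predefined and reused; only the dictation
// varies per call. Assumes System.Text.Json (on .NET Framework you might
// use Newtonsoft.Json instead).
using System.Text.Json;

public static class PromptBuilder
{
    public static string BuildRequestBody(string systemPrompt, string dictation)
    {
        return JsonSerializer.Serialize(new
        {
            model = "gpt-3.5-turbo",
            max_tokens = 256,
            messages = new object[]
            {
                new { role = "system", content = systemPrompt }, // fixed specialization
                new { role = "user",   content = dictation }     // varies per call
            }
        });
    }
}
```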
Post-processing of the return would be the only way to evaluate the length of the return, aside from limiting the Maximum Length (in tokens), though I am not at all sure how anyone would evaluate whether a return was wrong before hearing/reading it. That said, while the AI does not naturally volunteer "I don't understand your question" or even "I do not know the answer", you can design the custom system prompt that precedes your input prompt to allow for such things. You can also attempt to limit the response length with specific directions in your system prompt, as I have done in my example above.
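By way of illustration, here is a naive length check one might run on the return before handing it to TTS; the 60-word budget is an arbitrary placeholder.

```csharp
// Naive post-processing sketch: if the return exceeds a word budget,
// clip it at the last sentence boundary inside the budget before TTS.
// The 60-word default is an arbitrary placeholder, not a recommendation.
public static class ReturnFilter
{
    public static string ClampForTts(string reply, int maxWords = 60)
    {
        string[] words = reply.Split(' ');
        if (words.Length <= maxWords)
            return reply;

        string clipped = string.Join(" ", words, 0, maxWords);
        int lastStop = clipped.LastIndexOf('.');

        // Prefer ending on a full sentence; otherwise just hard-clip.
        return lastStop > 0 ? clipped.Substring(0, lastStop + 1) : clipped;
    }
}
```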
Check out the OpenAI Playground and toy with the settings on the right. The Playground is a way to test the send/return cycle without programmatically sending these prompts and settings. Once you find something that works well, you'd jot down the details in order to assemble an actual API call, performed programmatically through an inline function in VoiceAttack or a plugin for VoiceAttack. There are details on how this can be done in the documentation at OpenAI - those code examples can be converted easily to C# (or VB.net) using AI, too, though you'd want to test as you go, as coding examples from AI can be incorrect or assume libraries that are inaccessible or incompatible.
If you really wanted to, you could leverage an existing OpenAI library to simplify your coding efforts - though, to be fair, sending a call to an API and waiting for a return is a very rudimentary thing, and a library is not required (see the bare-bones sketch after the link below).
https://platform.openai.com/playground?mode=chat
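To show how rudimentary the raw call really is, here is a bare-bones sketch against the documented chat completions endpoint, with no OpenAI library involved. Error handling and parsing of the returned JSON are left deliberately minimal; the request body would come from something like the PromptBuilder sketch above.

```csharp
// Bare-bones chat completion call with HttpClient and no OpenAI library.
// POSTs to the documented /v1/chat/completions endpoint and returns the
// raw JSON; the reply text lives at choices[0].message.content.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class OpenAiClient
{
    static readonly HttpClient Http = new HttpClient();

    public static async Task<string> SendChatAsync(string requestBodyJson, string apiKey)
    {
        var request = new HttpRequestMessage(HttpMethod.Post,
            "https://api.openai.com/v1/chat/completions");
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        request.Content = new StringContent(requestBodyJson, Encoding.UTF8, "application/json");

        HttpResponseMessage response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // Raw JSON back to the caller; parse choices[0].message.content
        // before handing the text to TTS.
        return await response.Content.ReadAsStringAsync();
    }
}
```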