Author Topic: OpenAI API Plugin for Voiceattack (ChatGPT)  (Read 20573 times)

SemlerPDX

  • Global Moderator
  • Sr. Member
  • *****
  • Posts: 291
  • Upstanding Lunatic
    • My AVCS Homepage
OpenAI API Plugin for Voiceattack (ChatGPT)
« on: May 07, 2023, 08:23:43 PM »
OpenAI API Plugin for Voiceattack
by SemlerPDX



The OpenAI VoiceAttack Plugin provides a powerful interface between VoiceAttack and the OpenAI API, allowing us to seamlessly incorporate state-of-the-art artificial intelligence capabilities into our VoiceAttack profiles and commands.



I'm so excited to bring the power of true artificial intelligence to VoiceAttack through this plugin, for all profile and command builders out there interested in working with OpenAI technologies in VoiceAttack! I know everyone assumes that now that this technology is available, it will be easy to incorporate into existing programs or workflows. The reality is that this is a brand new technology, and until some aspects of it become more accessible, working with the OpenAI API itself is a great way to get our foot in the door and start taking advantage of this awesome power right now.

All of the known limitations of these AI models apply here: ChatGPT will boldly state incorrect facts with high confidence at times, and we should always double-check or test responses. The only difference is that now we can berate it verbally and ask for a correction, which it can speak back to us!





We can use raw text input, dictation text, or captured audio from VoiceAttack as input prompts for ChatGPT, and we can receive responses as a text variable to use as we wish, or set them to be spoken directly and specifically tailored for text-to-speech in VoiceAttack. We can also perform completion tasks on provided input with options for selecting the GPT model (and more), process audio via transcription or translation into (English) text using OpenAI Whisper, and generate or work with images using OpenAI DALL-E.


- Comprehensive Wiki and Samples for Profile Builders -



This plugin also features OpenAI Moderation to review provided input and return a list of any flagged categories. Lastly, we can use the plugin to upload, list, or delete files for fine-tuning the OpenAI GPT models, or make use of OpenAI Embedding, which returns a string of metadata that can be parsed and used as desired. With this plugin, we can access a wide range of OpenAI functionality with ease, directly from within VoiceAttack.



Find complete details, download link, and documentation on GitHub:
OpenAI Plugin for VoiceAttack




If you enjoy this plugin, Click this Pic to check out my AVCS Profiles:

(AVCS CHAT is the first ready-to-use public profile powered by this OpenAI Plugin for VoiceAttack!)
« Last Edit: August 13, 2023, 01:30:46 PM by SemlerPDX »

SemlerPDX

Re: OpenAI API Plugin for Voiceattack
« Reply #1 on: May 08, 2023, 12:39:44 PM »
I have had to release a day-one update. Apparently, the "Microsoft.Bcl.AsyncInterfaces.dll" assembly must also be placed into the VoiceAttack shared assemblies folder along with the other 4 .dll files already put there by this plugin.

Please update when able - this will resolve any earlier issues with this assembly.

Code: [Select]
MINOR UPDATE - v1.1.0.0 Changelog May-8-2023

Fixes/Improvements:
 - Missing "Microsoft.Bcl.AsyncInterfaces.dll" resolved
 - Configuration.CheckSharedAssemblies() added this assembly to VoiceAttack shared assemblies list

SemlerPDX

Re: OpenAI API Plugin for Voiceattack
« Reply #2 on: May 10, 2023, 12:22:52 PM »
An occasional error may occur when contacting the OpenAI API through use of any application, including any VoiceAttack profiles using the OpenAI Plugin such as my own AVCS CHAT profile. The GetResponse phase (and the 'thinking' sound, in AVCS CHAT) may seem like it is looping infinitely, but in fact, it is waiting for a return from the OpenAI API. Eventually, it may result in no response (and any sounds just ending), or users being so confused that they restart VoiceAttack.

I wanted users to know that they can just press the "Stop" button in VoiceAttack. Closing and restarting is not required to end any looping sounds or a seemingly endless wait for a response from OpenAI API requests. Users can then immediately try their same input again - though note that any 'continuing session' will have ended due to the use of a "Stop" command, and the contextual memory of recent input/response pairs will be cleared, starting fresh again.

The error which would appear in the "openai_errors.log" may look like this:
Code: [Select]
==========================================================================
OpenAI Plugin Error at 2023-05-10 9:30:17 AM:
System.Exception: OpenAI Plugin Error: Error at chat/completions
(https://api.openai.com/v1/chat/completions) with HTTP status code: 429. Content: {
  "error": {
    "message": "That model is currently overloaded with other requests. You can retry
 your request, or contact us through our help center at help.openai.com if the error
 persists. (Please include the request ID a123456b7cdef890a12b3c456d789e0f in your
 message.)",
    "type": "server_error",
    "param": null,
    "code": null
  }
}

   at OpenAI_VoiceAttack_Plugin.ChatGPT.<Chat>d19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at OpenAI_VoiceAttack_Plugin.OpenAIplugin.<PluginContext>d40.MoveNext()
==========================================================================


Note on this 'overload' error from OpenAI API:
In the whole of development and testing, and all of my calls to the OpenAI API since I got beta access in December, I have never seen these 'overload' messages resulting in an API call failure. There is no way I could have anticipated it beyond the current exception handling which already occurs. However, I also read the OpenAI Discord channels often, and we are NOT the only ones surprised and bothered by this seemingly brand new issue with the OpenAI API.  This company has had to scale up faster than any new website in recent history - they went from a hundred thousand users to over ten million in less than two months, and I imagine each month of 2023 that goes by sees more and more tools such as this allowing more and more users to access the OpenAI API, so they will need to scale it up accordingly.

We just have to wait and deal with the 'overloads' that happen now and then. Just know it is not a fault of the plugin systems, the libraries I am using to access the OpenAI API, or anything to do with individual user accounts, and there is nothing we as users can change or do better. This is as good as it gets for now, and will only get better in time.
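
For anyone calling the OpenAI API from their own code rather than through this plugin, the "try the same input again" advice can be automated. What follows is only a rough Python sketch of retrying with exponential backoff. `OverloadedError`, `call_with_retry`, and the delay values are made-up names and numbers for illustration, not part of the plugin or of any OpenAI library, and retrying does not make an overload clear any sooner:

```python
import random
import time


class OverloadedError(Exception):
    """Hypothetical stand-in for an HTTP 429 'model overloaded' failure."""


def call_with_retry(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request(); on OverloadedError wait 1s, 2s, 4s... (plus jitter) and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except OverloadedError:
            if attempt == max_attempts:
                raise  # give up and let the caller log the error
            delay = base_delay * 2 ** (attempt - 1)
            # A little random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


# Demo with a fake request that fails twice before succeeding.
# The sleep function is swapped for a recorder so the demo does not really wait.
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OverloadedError("That model is currently overloaded")
    return "ok"


recorded_delays = []
result = call_with_retry(flaky_request, sleep=recorded_delays.append)
```

Capping the attempts matters here, since a VoiceAttack command waiting on a response should eventually give up rather than appear to loop forever.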


edit:  I'm honored and humbled that someone pinned this topic to this forum section! thanks! :D
« Last Edit: May 18, 2023, 09:46:39 AM by SemlerPDX »

SemlerPDX

Re: OpenAI API Plugin for Voiceattack
« Reply #3 on: May 30, 2023, 03:41:53 PM »
BIG things on the horizon! Now that the plugin has been released, I've had time to circle back to the most advanced feature of the OpenAI Plugin for VoiceAttack: Embeddings


What are embeddings?
Embeddings are a way to represent a body of text as an array of numeric values which capture the meaning and context of the text, allowing for comparisons between different texts.  The OpenAI API offers a very fast and very affordable means to get this array for a block of text, which is called an 'embedding vector'. The length of the vector is always the same: OpenAI Embeddings returns 1,536 float values for any text content.  With such a high dimensionality, comparisons can have an increased degree of accuracy.


How do we use embeddings?
Consider a database that contains many entries. Each entry has various data fields, most importantly the text content and the 1,536 float values of the embedding vector generated for that content. These are "float" number types, similar to a decimal.  If the user asks a question, a system can first get a new embedding vector for that question, and then compare it to all entries on file through something called "cosine similarity". Through that, we can discover the blocks of text in the database which are most similar to the question which was asked. Then, we can take any number of those most similar text entries and present THAT to ChatGPT (along with the original question), instead of only the question, and tell it to use the data along with the question to produce an appropriate response.  By doing this, we can "feed" information to ChatGPT on which to base its response, for situations where it would NOT know this information (such as a help document, a wiki page, a book of short stories, etc.).  Rather than using its own knowledge base, it can formulate an organic response using the provided data.

A personalized, local "brain" for our AI chat bots!
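
To make the idea a little more concrete, here is a minimal Python sketch of the compare-and-feed step described above. It is an illustration under assumptions, not the plugin's code (the plugin itself is a .NET assembly): the three-number vectors and the sample texts are made up, real vectors would come from the Embeddings API and hold 1,536 floats, and `top_matches` and `build_prompt` are hypothetical helper names:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(question_vector, entries, k=2):
    """Return the k (score, text) pairs most similar to the question."""
    scored = [(cosine_similarity(question_vector, e["vector"]), e["text"]) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, matches):
    """Combine the most similar texts with the original question."""
    context = "\n".join(f"- {text}" for _, text in matches)
    return (
        "Use the following data if it is relevant to the question.\n"
        f"{context}\n\nQuestion: {question}"
    )


# Toy database: made-up texts with made-up 3-value vectors.
entries = [
    {"text": "The landing gear lever is on the left console.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Short stories are stored in the library folder.", "vector": [0.0, 0.2, 0.9]},
    {"text": "Lower the gear below 300 knots.", "vector": [0.8, 0.3, 0.1]},
]
question_vector = [1.0, 0.2, 0.0]  # pretend embedding of the question below
matches = top_matches(question_vector, entries, k=2)
prompt = build_prompt("How do I lower the landing gear?", matches)
```

Only the two gear-related texts end up in `prompt`; the unrelated entry scores far lower and is left out.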


What is this? TL;DNR
  • Introduction of local database processing for new user inputs before sending them to ChatGPT.
  • Ability to ignore irrelevant questions and respond using existing knowledge base.
  • Database selection, topic specification, and subject refinement options for ChatGPT context plugin calls.
  • Support for adding, viewing, editing, and removing documents and individual entries in the database, including PDF format.
  • Command Action system to execute commands directly in VoiceAttack when a user input matches a specific entry.
  • Contiguous subject system for reading entries using text-to-speech, with the ability to pause and resume reading from where it left off.

How will this work in OpenAI Plugin for VoiceAttack?
A few new plugin contexts will be added to expand the currently lacking Embeddings context, as well as a handful of additional option VoiceAttack variables for the ChatGPT context(s). These will allow us to specify that new inputs should be processed against a local database, which would occur just after getting new user input and before sending that input to ChatGPT. When the question is NOT relevant to the similar text content provided to ChatGPT to help formulate a response, it will ignore that content and just answer as it would normally using its existing knowledge base; otherwise, it will use the data to respond to the user input.

Before beginning a ChatGPT context plugin call, we can indicate the database to use, optionally a particular topic contained in that database referring only to a certain set of entries, and also optionally a particular subject of that topic to further refine the specifics of a particular call. By default, when not specified, the entire database would be queried.

Users will be able to add new documents in whole (even in .PDF format), or add individual entries. There will be a system to view, edit, or remove entries as well - individually, all entries, or by topic name, or topic + subject name.
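
As a rough picture of what such a database layer could look like, here is a small Python sketch using the standard `sqlite3` module. The table layout, column names, and helper functions are all assumptions made up for illustration; nothing here is the actual schema planned for the plugin:

```python
import json
import sqlite3


def open_database():
    """Create an in-memory database with one hypothetical 'entries' table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE entries ("
        " id INTEGER PRIMARY KEY,"
        " topic TEXT NOT NULL,"
        " subject TEXT NOT NULL,"
        " content TEXT NOT NULL,"
        " vector TEXT NOT NULL)"  # embedding stored as a JSON list of floats
    )
    return db


def add_entry(db, topic, subject, content, vector):
    cursor = db.execute(
        "INSERT INTO entries (topic, subject, content, vector) VALUES (?, ?, ?, ?)",
        (topic, subject, content, json.dumps(vector)),
    )
    return cursor.lastrowid


def select_entries(db, topic=None, subject=None):
    """Whole database by default; narrowed by topic, or topic + subject, when given."""
    sql = "SELECT id, topic, subject, content, vector FROM entries"
    clauses, params = [], []
    if topic is not None:
        clauses.append("topic = ?")
        params.append(topic)
    if subject is not None:
        clauses.append("subject = ?")
        params.append(subject)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    return [
        {"id": r[0], "topic": r[1], "subject": r[2], "content": r[3], "vector": json.loads(r[4])}
        for r in db.execute(sql, params)
    ]


def remove_entries(db, topic=None, subject=None):
    """Delete the entries selected by the same filters; returns how many were removed."""
    ids = [entry["id"] for entry in select_entries(db, topic, subject)]
    db.executemany("DELETE FROM entries WHERE id = ?", [(i,) for i in ids])
    return len(ids)


db = open_database()
add_entry(db, "manual", "gear", "The gear lever is on the left console.", [0.9, 0.1, 0.0])
add_entry(db, "manual", "lights", "The light switch is overhead.", [0.1, 0.9, 0.0])
add_entry(db, "stories", "fox", "Once upon a time there was a fox.", [0.0, 0.1, 0.9])
```

Keeping the filters optional mirrors the default described above: with no topic or subject given, every entry is considered.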

An additional system will allow setting a Command Action value for an entry, and a way to indicate that when a user input matches an entry with such an action set, to execute the command directly in VoiceAttack rather than provide the user input to ChatGPT as a question to be responded to.

Another interesting new system will have a VoiceAttack variable we can set, indicating that Embeddings should treat all entries in a subject as contiguous. Once identified through contextual similarity to the user input, they can be read using text-to-speech entry by entry until paused/stopped, where it will save the index of where it left off in that topic + subject. This could allow us to feed a document to the database such as a book of short stories, where we could ask it to read one of them to us, or continue reading from where it last left off - all without contacting OpenAI API beyond the initial embeddings for the user input to match it to and identify the contents of the database to be read.
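
The pause-and-resume part of that idea can be sketched in a few lines of Python. This is only an illustration under assumptions: `ContiguousReader`, its bookmark dictionary, and the choice to reset the bookmark once the last entry has been read are all made up here, and `speak` stands in for whatever text-to-speech action would really be used:

```python
class ContiguousReader:
    """Reads a subject's entries in order and remembers where it stopped."""

    def __init__(self):
        # (topic, subject) -> index of the next entry to read
        self._bookmarks = {}

    def bookmark(self, topic, subject):
        return self._bookmarks.get((topic, subject), 0)

    def read(self, topic, subject, entries, speak, should_stop=lambda: False):
        index = self.bookmark(topic, subject)
        while index < len(entries):
            if should_stop():  # e.g. the user said "pause"
                break
            speak(entries[index])
            index += 1
        # Assumption: after the last entry, start over from the top next time.
        self._bookmarks[(topic, subject)] = index if index < len(entries) else 0


story = ["Part one.", "Part two.", "Part three.", "Part four."]
spoken = []
reader = ContiguousReader()

# First session: pretend the user pauses after two entries have been spoken.
reader.read("stories", "fox", story, spoken.append, should_stop=lambda: len(spoken) >= 2)
paused_at = reader.bookmark("stories", "fox")

# Second session: continue from where it left off, with no pause this time.
reader.read("stories", "fox", story, spoken.append)
```

Note that nothing in the reading loop needs the OpenAI API; only the initial match of the user input to a topic + subject would.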


When will this be available?
Because this system will be introducing an SQL database layer to the codebase of the OpenAI Plugin for VoiceAttack, it will be a while before I can feel secure adding this to the public branch of this repo on GitHub. I expect it may be late June or into July before everything is ready for prime time, so I intend to introduce an early Beta branch to the GitHub repository. This will allow interested users to begin testing and trying out this new system, and help out by giving me the feedback I need to ensure performance and functions are consistent across all systems. If all goes well, I should have this Beta branch available in a few weeks, but again, due to the complexity of this refactor, public testing and some feedback will be required before I merge it with the Main branch and push this update to everyone.


Pics or it Didn't Happen
So far, I have been testing the loading phase of the database, which occurs once when VoiceAttack is loaded, and the cosine calculations of the new embedding vectors against a test set of 25,000 entries, with a goal of optimizing the speed of these functions. For reference, a database of such size would contain about ten 300+ page documents. I have gotten the loading of the database down to just over 8 seconds for such a massive test database (from 24 seconds!), and the calculations down to just 0.189 seconds, all achieved by parallelizing these tasks across all CPU* cores on the PC:

*(I should note that my CPU is an AMD R9 3900X with 12 Cores and 24 Threads which this 'Malcom' VM above has full access to, so I will be keen to discover how optimized this will be on systems with fewer cores/threads)


Thanks for all the feedback and support so far - hope you all are enjoying the concept of real AI tools in our VoiceAttack profiles & commands as much as I am!!
« Last Edit: March 21, 2024, 01:26:08 PM by SemlerPDX »

SemlerPDX

Re: OpenAI API Plugin for Voiceattack
« Reply #4 on: July 13, 2023, 01:38:37 PM »
Well, I certainly had high hopes when I wrote the Embeddings announcement above! HAHA!

As it turns out, this Embeddings project is a lot deeper than I had initially anticipated. Due to the depth and scope of converting documents/PDFs into a truly useful Embeddings Database, I have spun that aspect of this project off into its own application outside this OpenAI API Plugin for VoiceAttack. This will help me keep my head straight as I develop a proper app with an intuitive GUI, completely separate from this codebase. I will still need to refactor this OpenAI Plugin to allow an optional, alternative flow path for embedding vector processing in a ChatGPT context, along with other minor changes, but there will be no point in updating this Plugin until I have completed the application for users to create their own local Embeddings Databases.

It may be some time before I complete this GUI app on the side, but I am whittling away at it day by day - will be a long wait, but it will be worth it. If a thing is worth doing, it's worth doing well, and I don't do anything if it isn't worth doing. Again, thanks for all the support and coffees, I really appreciate feedback!

Icacus010

  • Newbie
  • *
  • Posts: 2
Re: OpenAI API Plugin for Voiceattack (ChatGPT)
« Reply #5 on: September 05, 2023, 09:35:26 AM »
Is it possible to bring the versatile interaction an AI can give to the commands themselves?

For example, instead of having to configure the trigger word and its variations, you could teach the AI the key and the association. For example, "L" is a shortcut to turn the lights on/off in the game, and with an AI's capacity for understanding, you could say it in many different ways and it would understand the command and execute it. Depending on a pre-configured behavior, it would respond in a role, like a ship responding to its captain or an automated suit to its user in a sci-fi game.

English is not my first language, so I apologize for any confusion.

SemlerPDX

Re: OpenAI API Plugin for Voiceattack (ChatGPT)
« Reply #6 on: September 05, 2023, 11:06:24 AM »
Yes, this is possible.  It is highly advanced and would require dedication, learning, testing, and coding - but what you describe is absolutely possible.  You would also need to know how to (or learn how to) work with databases, such as using SQL or Entity Framework, so that you can build it with entries of commands, edit as needed, and add new commands when desired down the road.  Programming skills in C# or Visual Basic would be a requirement for such a complex system, but the tools are present and available in the OpenAI API Plugin for VoiceAttack. 

This is called the OpenAI "Embeddings" API, and it gets deep very fast.

First of all, you would still need to have a command with the appropriate phrase(s) such as "Turn On Lights", with the action inside to press the "L" key, for example.  Next, you would build a simple database where each entry contains the actual command to execute (if this entry is a match) as well as a sentence which embodies the function - in this case, literally "Turn on the lights" - and then provide that sentence to Embeddings to gather the 1,536 float values for that entry.  Doing this for each command would build a database where each entry has its own float values, a command to execute, and a plain English sentence describing the function/command.

Finally, you would create a command which could execute anytime a command phrase is unrecognized, and inside that command, you would gather the last spoken phrase (which was unrecognized) and provide that phrase to Embeddings to gather the 1,536 float values for it, and then use cosine similarity to compare them to the set of float values in every entry in the database, using a similarity threshold high enough to be accurate (in my tests, I have found 0.815F to work well).

You would take the first (most similar) result and access the 'actual command' field for that entry, and then use that as the value of a new VoiceAttack text variable, such as "~myCommand" - and use the action to "Execute another command (by name)", and in the field where the command goes, use the text variable in a token, {TXT:~myCommand} to execute the command.
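
Here is a minimal Python sketch of that matching step, purely as an illustration. The three-number vectors are made up (real ones would come from OpenAI Embeddings), "Lower Landing Gear" is an invented second command, and `find_command` is a hypothetical helper name; only the 0.815 threshold comes from my own tests mentioned above:

```python
import math

SIMILARITY_THRESHOLD = 0.815  # the value reported above as working well


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_command(phrase_vector, entries, threshold=SIMILARITY_THRESHOLD):
    """Return the command of the most similar entry, or None if nothing is close enough."""
    best_score, best_command = 0.0, None
    for entry in entries:
        score = cosine_similarity(phrase_vector, entry["vector"])
        if score > best_score:
            best_score, best_command = score, entry["command"]
    return best_command if best_score >= threshold else None


# Toy database: command name, descriptive sentence, made-up vector.
entries = [
    {"command": "Turn On Lights", "sentence": "Turn on the lights", "vector": [0.9, 0.1, 0.1]},
    {"command": "Lower Landing Gear", "sentence": "Lower the landing gear", "vector": [0.1, 0.9, 0.2]},
]

# Pretend embeddings of a garbled "...turn on the lights" and of an unrelated phrase.
garbled_lights = find_command([0.8, 0.2, 0.1], entries)
unrelated = find_command([0.5, 0.5, 0.7], entries)
```

The returned name is what would be placed in a text variable such as `~myCommand`; a result of `None` means no entry cleared the threshold and no command should be executed.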

This is a very complex concept, and while it may seem easy for me to write out, this is because I have actually tested Embeddings and database searching via float vector cosine similarity, and it works well (and fast!).  One thing which may not work well is the concept of using the 'unrecognized command' trigger and accessing what was previously spoken, because raw unrecognized speech can have serious flaws due to homophones and other factors.  That being said, even if you mean to say, "It's a bit dark in here, we should turn on the lights" and Windows heard "Franks a lit bard in here we show turn on the lights", so long as there is enough of the "turn on lights" preserved, the cosine similarity should still accurately discover this entry in a database full of such phrases when each phrase has its own 1,536 float values generated by OpenAI Embeddings, and where one of the entries is "Turn on Lights".

If you're code savvy, you could make this sort of thing right now using my OpenAI API Plugin for VoiceAttack - the tools are there for the AI processing part, but you would have some heavy lifting to do as far as setting up the database and programming systems to add/edit/delete entries - OpenAI API Plugin will merely provide access for you to get new Embeddings float vectors for entries.

As far as a dedicated application which can do this (create/manipulate an embeddings database), I have one cooking in the oven but it will be a very, very long time - maybe Summer 2024 if all goes well, God willing and the creeks don't rise.
« Last Edit: September 05, 2023, 12:12:42 PM by SemlerPDX »

SemlerPDX

Re: OpenAI API Plugin for Voiceattack (ChatGPT)
« Reply #7 on: September 05, 2023, 12:35:09 PM »
As an additional note on this concept: since it is so advanced and requires so much work, you may find that dynamic phrases and Wildcards are much more approachable ways of allowing more natural speech for command triggers.  I make frequent use of homophones and dynamic command phrases, paying specific attention to the many ways a human might possibly speak a particular command.

For example, I have a command to "Open the Save File Folder" in my AVCS CORE profile.  In order to expand the ways this can be spoken, I have added additional keywords, accounted for some homophones which I noticed can occur at times for the word "the", and made its use optional as well.  In the end, the dynamic command phrase I created for this in my public profile is this:
Code: [Select]
[View;Show;Display;Open] [me;] [the;a;uh;] Save [File;Files;] Folder
While this results in an additional 95 command variations for this one command, that is no trouble at all for VoiceAttack profiles (or Windows), and the value it adds by allowing such conceptual speech without enforcing a rigid restrictive command phrase is priceless.
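
For the curious, the count can be checked with a few lines of Python. This is just a sketch that mimics the behavior described above (semicolons separate alternatives, and an empty alternative makes a section optional); `expand_phrase` is a made-up helper and not how VoiceAttack itself parses phrases:

```python
import itertools
import re


def expand_phrase(phrase):
    """Expand a dynamic command phrase into every spoken variation."""
    # Each match is either a [bracketed;section] or the literal text between sections.
    parts = re.findall(r"\[([^\]]*)\]|([^\[\]]+)", phrase)
    choices = []
    for section, literal in parts:
        if literal:
            choices.append([literal.strip()])
        else:
            # An empty option (from a trailing ';') makes the whole section optional.
            choices.append([option.strip() for option in section.split(";")])
    variations = []
    for combo in itertools.product(*choices):
        variations.append(" ".join(word for word in combo if word))
    # Drop any duplicates while keeping the original order.
    return list(dict.fromkeys(variations))


PHRASE = "[View;Show;Display;Open] [me;] [the;a;uh;] Save [File;Files;] Folder"
variations = expand_phrase(PHRASE)
print(len(variations))  # 96 in total: one phrase plus the additional 95
```

The arithmetic is simply 4 x 2 x 4 x 3 = 96, one factor per bracketed section.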

At other times, we can make use of the experimental/somewhat-unsupported Wildcards feature of VoiceAttack.  In my AVCS4 BMS profile, I want virtual F-16 pilots flying in that simulator to be able to talk to the Tower, Air Traffic Control, etc. in very natural ways, yet the command actions are still rigid & restrictive because they have to be - a request to land will always need to have those words "request landing" of course.

My solution was to allow Wildcards, and they work well (for the most part).  When they don't work well for some commands or some people, we just recommend they put a pause in their "natural" speech before the "actual" command.  It looks like this, for example:
Code: [Select]
*Request Landing
When executed, this command would press the keys for this Radio command in the game.

This type of use of Wildcard (asterisk before) allows users to say ANYTHING before the actual command phrase, such as,
"Osan Tower, this is Goblin 1-1, Request Landing"

...on top of this, I encourage users to add their own additional natural variations as needed, also as Wildcard commands, with the only action to Execute their "true" command (by name), for even more natural speech for commands.  A user might create a command like this:
Code: [Select]
*Requesting Landing Clearance
...with the only action to 'Execute another command (by name)' of "Request Landing".

This would allow that user to say something along the lines of,
"Osan Tower, this is Goblin 1-1, Requesting Landing Clearance"
...where the command "Request Landing" would ultimately execute.
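
The matching idea behind these two Wildcard commands can be sketched in Python with the standard `fnmatch` module. This is only an illustration: `COMMANDS` and `resolve` are made-up names, and real Wildcard handling inside VoiceAttack works on recognized speech and may behave differently:

```python
from fnmatch import fnmatchcase

# Wildcard phrase -> name of the "true" command to execute (taken from the examples above).
COMMANDS = {
    "*Request Landing": "Request Landing",
    "*Requesting Landing Clearance": "Request Landing",
}


def resolve(spoken, commands=COMMANDS):
    """Return the command to execute for a spoken phrase, or None if nothing matches."""
    heard = spoken.lower()
    for pattern, target in commands.items():
        # A leading asterisk lets anything come before the actual command phrase.
        if fnmatchcase(heard, pattern.lower()):
            return target
    return None


first = resolve("Osan Tower, this is Goblin 1-1, Request Landing")
second = resolve("Osan Tower, this is Goblin 1-1, Requesting Landing Clearance")
nothing = resolve("Osan Tower, good morning")
```

Both spoken variations end up at the same "Request Landing" command, which is the whole point of the alias.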


When developing personal or public profiles for VoiceAttack, there are many existing tools and means to allow for more relaxed and natural speech input for command phrases.  While AI could be used (through Embeddings), it should be noted that latency, the time between saying a phrase and a command taking action, is a key factor and needs to be minimized, favoring the fastest system overall.  Returns from OpenAI Embeddings are MUCH faster than returns from the ChatGPT or Completions APIs, for example: a short second or two for Embeddings compared to 3-5 seconds (or longer) for a response from ChatGPT/Completions.  Still, 1-2 seconds is markedly longer than the fraction-of-a-second range in which direct command recognition and execution in VoiceAttack typically operates.

What I have shown above using homophones and dynamic command phrases, as well as experimenting with Wildcards, will always be much faster than ANY system powered by AI such as the Embeddings example I detailed in my previous reply - not to mention easier and more approachable for most people.
« Last Edit: September 06, 2023, 10:37:06 PM by SemlerPDX »

Icacus010

Re: OpenAI API Plugin for Voiceattack (ChatGPT)
« Reply #8 on: September 05, 2023, 05:13:11 PM »
OK, I understand. I am fascinated by this AI and voice commands, but I'm not a programmer, very far from it actually; I just like to play video games. But I get it: using an AI in this way needs a lot of work, and the coding and the latency are indeed huge problems, because some games or tasks need to be at least as fast as pressing the keys yourself, and an AI tends to answer slowly because of the processing it needs to understand a request and generate a response.

About the homophones, dynamic command phrases, and wildcards, I actually didn't know about them; it seems I need to understand VA better before wanting an AI to talk to and make things easier. In any case, I am very eager to see AI implemented in VA. Being able to hold a conversation while giving commands in a space battle situation would be awesome.

I appreciate your explanation and will follow the development of this plugin. Thank you.

PS: If you use a trained AI (yours or another's) and focus its access on files that contain its "role in the game" and the "profile" of the game, so it has a focused database to get data from while giving it the power to press keys, would that help with answer latency and make it easier to configure?

SemlerPDX

Re: OpenAI API Plugin for Voiceattack (ChatGPT)
« Reply #9 on: September 06, 2023, 01:11:57 PM »
PS: If you use a trained AI (yours or another's) and focus its access on files that contain its "role in the game" and the "profile" of the game, so it has a focused database to get data from while giving it the power to press keys, would that help with answer latency and make it easier to configure?

No matter what, anytime we need to contact the OpenAI API and wait for a return, it will take upwards of 10 times as long as the direct command recognition triggers in VoiceAttack, which immediately execute our actions for those commands as I described above.

In fact, adding any steps to a question/answer style system only increases the time before a command is executed.  For example, in my AVCS CHAT profile using my OpenAI Plugin, the integrated Whisper processing (on by default) first must capture the audio segments of the user input and then stitch them together into a single audio file, send that off for transcription via the Whisper API call, which takes a moment to return, and THEN use that transcription as the input for a subsequent API call to ChatGPT, which then takes several moments to return.  The tradeoff is worth it, and Whisper doesn't take that long - but then again, I'm not expecting it to answer fast, unlike pressing keys in games as you mentioned.

It is best to think outside the box, but also not to immediately think of AI as the answer to a complex problem or a goal for VoiceAttack which you want to accomplish.  First it would be best to see what is possible by default in VoiceAttack, which is why I posted such an in-depth example and use cases for dynamic command phrase and experimenting with Wildcards.

As far as pressing keys in games, favor the fastest system overall, unless it is something non-crucial or not time sensitive.

samreynolds28

  • Newbie
  • *
  • Posts: 2
Re: OpenAI API Plugin for Voiceattack (ChatGPT)
« Reply #10 on: September 27, 2023, 08:24:36 AM »
So interestingly enough, I am trying to find a way to question the SC Wiki and other such material so that whilst I am in Star Citizen, I can ask questions and learn, whether that be about the ship I'm in, lore about the planet I'm on, etc.

This doesn't need a quick answer, as it's conversational and technically not linked to any game input. Would you be able to build this into VoiceAttack? It would feel like having a proper conversational AI on your ship's computer!

To break this down to an application outside of Star Citizen and other such games: my struggle is to find an LLM that I can run in the background that has voice activation/always listening.

My assumption is that VoiceAttack negates all these issues I'm having.

SemlerPDX

Re: OpenAI API Plugin for Voiceattack (ChatGPT)
« Reply #11 on: September 27, 2023, 08:52:15 AM »
Of course!  In addition to this plugin, I have also released the first public profile using it.  AVCS CHAT is a profile which focuses on the ChatGPT context specifically, and uses the Whisper context modifier to supply the user input through direct speech transcription.

If you'd like to have a vocal conversation with ChatGPT which runs perpetually in the background, remembers the last thing you asked (or it answered), and have its replies spoken through Text-to-Speech, then this profile is what you are looking for.  It includes a System Prompt menu where you can Add/Create/Edit/Select various System Prompts which are provided to ChatGPT at the start of a new conversation in order to refine and alter its behavior throughout that conversation.

Check out AVCS CHAT here:
https://forum.voiceattack.com/smf/index.php?topic=4520.0

samreynolds28

Re: OpenAI API Plugin for Voiceattack (ChatGPT)
« Reply #12 on: September 27, 2023, 10:04:01 AM »
Thank you so much for working on this. I'll test this out this evening and, if all goes well, will recommend it to my org and others.

Such a good platform to utilise AI. Kudos to you for the innovative thinking.

joney0210

  • Newbie
  • *
  • Posts: 2
Re: OpenAI API Plugin for Voiceattack (ChatGPT)
« Reply #13 on: December 05, 2023, 01:12:04 PM »
Thanks for your work. It is good.

VA uses the Windows system voice engine to recognize voice commands. Is there a way to use ChatGPT as the voice recognition engine?

SemlerPDX

Re: OpenAI API Plugin for Voiceattack (ChatGPT)
« Reply #14 on: December 05, 2023, 01:39:50 PM »
ChatGPT is not capable of audio processing or speech recognition.  The speech-to-text capabilities of this OpenAI Plugin for VoiceAttack are powered by access to the Whisper API at OpenAI, and this involves a turn-around time between speaking and uploading that audio clip to the Whisper API and then waiting for the return, and then any subsequent post-processing on that text by some means (which would also add to the turn-around time from speech to action).

I have detailed the possibilities and challenges for such things in a previous reply, just click the link below. TLDNR is that regardless of anything, the most important limiting factor for voice control (such as pressing keys in a game) is the time between what we say and the computer taking action such as pressing that appropriate keyboard key.

We don't want to introduce any system which could potentially increase that time, so we can keep it as low as possible.  For this reason, and others I detail in the reply linked below, while it is certainly possible with some work and experience in coding, it may not be ideal unless it is for information or tasks which are not time sensitive the way keyboard keypress commands are.

https://forum.voiceattack.com/smf/index.php?topic=4519.msg21067#msg21067

joney0210

Re: OpenAI API Plugin for Voiceattack (ChatGPT)
« Reply #15 on: December 05, 2023, 02:49:18 PM »
Thanks for the reply.

I have used VA for more than 2 years.
But English is not my mother language, so VA cannot fully understand what I am saying.
For that reason, if VA could use ChatGPT as its voice recognition engine, it would greatly increase the accuracy of recognition.

As far as I know (maybe I'm wrong), VA uses the voice engine for speech-to-text and checks that text against the command list. If there is a match, VA executes that command.

OpenAI's Whisper can do the same thing the Windows voice engine is doing.

I just want to know if there is an easy way to do this.
It might be difficult, especially since I don't understand programming at all.

Someone used ChatGPT to make an assistant program similar to VA, but it is still in its early stages and does not have the powerful functions of VA. Maybe you are interested to know.

https://github.com/ShipBit/wingman-ai#installing-wingman-ai