BIG things on the horizon! Now that the plugin has been released, I've had time to circle back to the most advanced feature of the OpenAI Plugin for VoiceAttack: Embeddings

What are embeddings?
Embeddings are a way to represent a body of text as an array of numeric values which capture the meaning and context of the text, allowing for comparisons between different texts. The OpenAI API offers a very fast and very affordable means to get these numeric values for a block of text, which together are called an 'embedding vector'. The size of the embedding vector generated for a text is always the same: the OpenAI Embeddings endpoint returns a single vector of 1,536 floating-point values for any text content. With such a high dimensionality, comparisons can have an increased degree of accuracy.
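Just to make that concrete, here is a minimal sketch (in C#, the language VoiceAttack plugins are written in) of fetching an embedding vector from the OpenAI embeddings endpoint - it assumes the text-embedding-ada-002 model, which produces those 1,536 values, and is not the plugin's actual code:

```csharp
// Minimal sketch: request an embedding vector from the OpenAI API.
// Assumes the text-embedding-ada-002 model (1,536 dimensions); the
// plugin's real implementation may differ.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class EmbeddingsClient
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task<float[]> GetEmbeddingAsync(string text, string apiKey)
    {
        // Build the JSON request body for the embeddings endpoint.
        string payload = JsonSerializer.Serialize(new
        {
            model = "text-embedding-ada-002",
            input = text
        });

        using var request = new HttpRequestMessage(HttpMethod.Post,
            "https://api.openai.com/v1/embeddings");
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        request.Content = new StringContent(payload, Encoding.UTF8, "application/json");

        using var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // The response contains data[0].embedding: an array of 1,536 floats.
        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        JsonElement embedding = doc.RootElement.GetProperty("data")[0].GetProperty("embedding");

        var vector = new float[embedding.GetArrayLength()];
        int i = 0;
        foreach (JsonElement value in embedding.EnumerateArray())
            vector[i++] = value.GetSingle();
        return vector;
    }
}
```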
How do we use embeddings?
Consider a database that contains many entries; each entry has various data fields, most importantly the text content and the 1,536-dimension embedding vector generated for that content. The values are "float" number types, similar to a decimal. If the user asks a question, a system can first get the embedding vector for that question, and then compare it to all entries on file through something called "cosine similarity". Through that, we can discover which blocks of text in the database are most similar to the question that was asked. Then, we can take any number of those most similar text entries and present THAT to ChatGPT (along with the original question), instead of only the question, and tell it to use the data along with the question to produce an appropriate response. By doing this, we can "feed" information to ChatGPT on which to base its response, for situations where it would NOT know this information (such as a help document, a wiki page, a book of short stories, etc.). Rather than using its own knowledge base, it can formulate an organic response using the provided data.
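"Cosine similarity" itself is a small calculation: the dot product of the two vectors divided by the product of their magnitudes, giving a score near 1.0 for very similar texts. A minimal sketch (not the plugin's exact code):

```csharp
// Cosine similarity between two embedding vectors:
// dot(a, b) / (|a| * |b|). Scores near 1.0 mean very similar texts.
public static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimensionality.");

    double dot = 0.0, magA = 0.0, magB = 0.0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}
```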
A personalized, local "brain" for our AI chat bots!
What is this? TL;DR
- Introduction of local database processing for new user inputs before sending them to ChatGPT.
- Ability to ignore irrelevant questions and respond using existing knowledge base.
- Database selection, topic specification, and subject refinement options for ChatGPT context plugin calls.
- Support for adding, viewing, editing, and removing documents and individual entries in the database, including PDF format.
- Command Action system to execute commands directly in VoiceAttack when a user input matches a specific entry.
- Contiguous subject system for reading entries using text-to-speech, with the ability to pause and resume reading from where it left off.
How will this work in OpenAI Plugin for VoiceAttack?
A few new plugin contexts will be added to expand the currently lacking Embeddings context, as well as a handful of additional optional VoiceAttack variables for the ChatGPT context(s). These will allow us to specify that new inputs should be processed against a local database, which would occur just after getting new user input and before sending that input to ChatGPT. When the question is NOT relevant to the similar text content provided to help formulate a response, ChatGPT will ignore that content and just answer as it normally would using its existing knowledge base; otherwise, it will use the data to respond to the user input.
Before beginning a ChatGPT context plugin call, we can indicate the database to use, optionally a particular topic contained in that database referring only to a certain set of entries, and also optionally a particular subject of that topic to further refine the specifics of a particular call. By default, when not specified, the entire database would be queried.
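As a rough illustration only (the variable names below are hypothetical placeholders, not the final plugin API), a context call might read its scoping options through the standard vaProxy.GetText() method of the VoiceAttack plugin interface:

```csharp
// Hypothetical sketch of scoping an embeddings query before a ChatGPT
// context call. The variable names are placeholders, NOT the plugin's
// final API; vaProxy.GetText() is the standard VoiceAttack plugin
// method for reading text variables.
public static void ScopeEmbeddingsQuery(dynamic vaProxy)
{
    string database = vaProxy.GetText("OpenAI_Embeddings_Database") ?? "default";
    string topic    = vaProxy.GetText("OpenAI_Embeddings_Topic");    // optional
    string subject  = vaProxy.GetText("OpenAI_Embeddings_Subject");  // optional

    // When topic/subject are not set, the entire database would be queried.
    // ...query entries filtered by (database, topic, subject) here...
}
```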
Users will be able to add new documents in whole (even in .PDF format), or add individual entries. There will be a system to view, edit, or remove entries as well - individually, all entries at once, or by topic name or topic + subject name.
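Just to visualize what such a database could hold (this is a working sketch for the examples here, not the final schema), an entries table might look something like this:

```csharp
// Rough guess at an entries table; the actual schema is not final.
// Embeddings stored as a binary blob of 1,536 floats, with optional
// topic/subject fields and a Command Action field (described below).
using Microsoft.Data.Sqlite;

public static void CreateEntriesTable(string dbPath)
{
    using var connection = new SqliteConnection($"Data Source={dbPath}");
    connection.Open();

    using var command = connection.CreateCommand();
    command.CommandText =
    @"CREATE TABLE IF NOT EXISTS entries (
          id             INTEGER PRIMARY KEY AUTOINCREMENT,
          topic          TEXT,
          subject        TEXT,
          content        TEXT NOT NULL,
          embedding      BLOB NOT NULL,   -- 1,536 floats, 4 bytes each
          command_action TEXT             -- optional VoiceAttack command
      );";
    command.ExecuteNonQuery();
}
```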
An additional system will allow setting a Command Action value for an entry, along with a way to indicate that when a user input matches an entry with such an action set, the command should be executed directly in VoiceAttack rather than providing the user input to ChatGPT as a question to be responded to.
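In sketch form, the decision could look something like this (the Entry type matches the table sketched above, and the 0.85 threshold is an arbitrary placeholder that would need tuning):

```csharp
// Hypothetical Command Action decision: if the best-matching entry has
// a command attached and the match is strong enough, run the command
// instead of forwarding the input to ChatGPT.
public class Entry
{
    public string Content;
    public float[] Embedding;
    public string CommandAction;  // null when no command is attached
}

public static bool TryGetCommandAction(Entry bestMatch, float[] inputEmbedding,
                                       out string commandName)
{
    const double threshold = 0.85; // placeholder value, would need tuning

    if (bestMatch.CommandAction != null &&
        CosineSimilarity(inputEmbedding, bestMatch.Embedding) >= threshold)
    {
        // Caller would execute this command directly in VoiceAttack.
        commandName = bestMatch.CommandAction;
        return true;
    }

    commandName = null;
    return false;
}
```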
Another interesting new system will have a VoiceAttack variable we can set, indicating that Embeddings should treat all entries in a subject as contiguous. Once identified through contextual similarity to the user input, these entries can be read using text-to-speech one by one until paused/stopped, at which point it will save the index of where it left off in that topic + subject. This could allow us to feed a document to the database such as a book of short stories, where we could ask it to read one of them to us, or continue reading from where it last left off - all without contacting the OpenAI API beyond the initial embeddings call for the user input, used to match it against and identify the contents of the database to be read.
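The bookmark idea boils down to remembering an index per topic + subject, something like this sketch (the names here are illustrative only):

```csharp
// Sketch of the contiguous-reading bookmark: remember the last entry
// index per topic + subject so reading can resume later. Names are
// illustrative, not the plugin's actual types.
using System.Collections.Generic;

public class ContiguousReader
{
    private readonly Dictionary<(string Topic, string Subject), int> _bookmarks
        = new Dictionary<(string, string), int>();

    // Returns the next entry to read aloud, advancing the saved index.
    public string NextEntry(string topic, string subject, IList<string> entries)
    {
        var key = (topic, subject);
        _bookmarks.TryGetValue(key, out int index); // defaults to 0 (the start)

        if (index >= entries.Count)
            return null; // finished this subject

        _bookmarks[key] = index + 1; // save where we left off
        return entries[index];       // hand this to text-to-speech
    }
}
```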
When will this be available?
Because this system will be introducing an SQL database layer to the codebase of the OpenAI Plugin for VoiceAttack, it will be a while before I can feel secure adding this to the public branch of this repo on GitHub. I expect it may be late June or into July before everything is ready for prime time, so I intend to introduce an early Beta branch to the GitHub repository. This will allow interested users to begin testing and trying out this new system, and to help out by giving me the feedback I need to ensure performance and functions are consistent across all systems. If all goes well, I should have this Beta branch available in a few weeks, but again, due to the complexity of this refactor, public testing and some feedback will be required before I merge it with the Main branch and push this update to everyone.
Pics or it Didn't Happen
So far, I have been testing the loading phase of the database which occurs once when VoiceAttack is loaded, and the cosine calculations of the new embedding vectors against a test set of 25,000 entries, with a goal of optimizing the speed of these functions. For reference, a database of such size would contain about ten 300+ page documents. I have gotten the loading of the database down to just over 8 seconds for such a massive test database (down from 24 seconds!), and the calculations return in just 0.189 seconds, all achieved by parallelizing these tasks across all CPU* cores on the PC:
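For a conceptual picture, that parallel scan just scores every entry against the input embedding across all of those cores and takes the best match, roughly like this sketch (reusing the CosineSimilarity function and Entry type sketched above; the real code is considerably more involved):

```csharp
// Sketch of the parallel scan described above: score every entry
// against the input embedding across all CPU cores, then take the
// best match. PLINQ is used purely as an illustration.
using System.Linq;

public static Entry FindBestMatch(Entry[] entries, float[] inputEmbedding)
{
    return entries
        .AsParallel() // spread the scoring work across all cores
        .OrderByDescending(e => CosineSimilarity(inputEmbedding, e.Embedding))
        .First();
}
```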
*(I should note that my CPU is an AMD R9 3900X with 12 Cores and 24 Threads, which this 'Malcom' VM above has full access to, so I will be keen to discover how optimized this will be on systems with fewer cores/threads)

Thanks for all the feedback and support so far - hope you all are enjoying the concept of real AI tools in our VoiceAttack profiles & commands as much as I am!!