Author Topic: PitGirl - my sim racing virtual assistant. iRacing, VR, Twitch, and now ChatGPT  (Read 2363 times)

Robertsmania

  • Newbie
  • *
  • Posts: 46
I participate in a lot of sim racing events on iRacing, drive in VR, and usually stream on Twitch.

For some time now (since 2016?) I've had a virtual assistant in the form of a speech recognition and text-to-speech system running on VoiceAttack.  It started with a commercial plugin that was developed specifically to interact with the iRacing SDK.  We call her PitGirl.

Over time, I've added things to the system and developed my own plugins to interact with OBS, iRacing, Twitch IRC and now ChatGPT.

I can talk to her directly, and viewers on the Twitch chat can also use !pitgirl commands from the chat to change cameras on the stream's spectator view, request replays, check out other cars, etc... 

In the past people would often gently troll the system and ask things like "!pitgirl what does Kris do after the race?"  I never felt inclined to script responses to those kinds of questions, but now with the OpenAI ChatGPT integration - she answers them!

Here's a sort of highlight video showing many of the interactions that both I and the viewers have with PitGirl:

https://www.youtube.com/watch?v=AHKHq7zN2BM


Come by the stream some time, say hi and ask PitGirl a question!
https://twitch.tv/robertsmania

SemlerPDX

  • Global Moderator
  • Sr. Member
  • *****
  • Posts: 280
  • Upstanding Lunatic
    • My AVCS Homepage
Very awesome!! Had me hooked at the start of that YT vid - my family has season passes at Road America, along with a mess of friends, firmly planted between turns 6 and 7 just after the Corvette Bridge every weekend.  My Grandpa used to be the track doctor, lots of fun summers at that track!



Wonderful that you got ChatGPT integrated into your stream - I love seeing how people make use of it for all sorts of things!  The Discord bot at my gaming group threw everyone for a loop when they realized it has a chat integration, too.  Fun stuff!!

Love the videos, best wishes and take care!

Robertsmania

  • Newbie
  • *
  • Posts: 46
That's really neat to have a family connection to the track!  I've only been to a handful of big tracks here in California, but one of the great things about iRacing is the opportunity to get familiar with racetracks all around the world.

As far as PitGirl and the VoiceAttack / Twitch / iRacing / OBS integration goes, it's been a lot of fun to develop and use while driving.

These are two of the core pieces that I did not develop:

The Digital Race Engineer, or DRE - https://www.thedigitalraceengineer.com
This is where PitGirl was born. I started using the Digital Race Engineer in 2016 when I was pretty new to iRacing and VR. I found that fumbling with the controls to try and set fuel for pitstops was a nightmare and a viewer suggested this as a potential solution. At that time, DRE was a commercially developed plugin for Voice Attack that integrated with the iRacing SDK. Being able to ask the system questions about what was happening in the race (like fuel requirements) and particularly being able to tell it what to do - “Pitgirl, set fuel to 3.7 gallons” was a a game changer. The developer still offers the legacy VoiceAttack system, but has gone on to develop a stand alone version. I’m deeply dependent on VoiceAttack for my other systems, so I don’t see myself switching, but encourage anyone to take a look and use DRE.
Its great.

Speech Chat - https://www.speechchat.com/
I use Speech Chat to hear the Twitch chat. In the early days, everyone on the stream heard the voice, but people did troll it. They would push the limits of good taste to see what they could get it to say, put in huge numbers, post URLs, and stuff like that. Once I set it so that I'm the only one who hears the voice, all that stopped. In my experience, hearing the chat and responding ends up being much more like having a conversation, and something I can do while racing. I have had windows for the chat posted in VR, but in practice any time I actually tried to read it, the distracted driving would end in tears.


For a long time I just used VoiceAttack and DRE to handle the spectator view on my stream. I had custom voice commands to change the cameras, do simple replays, and manage the scenes for the stream in OBS. Those commands were cool and unique, but all really boiled down to VoiceAttack sending keystrokes to iRacing and OBS and the whole thing was terribly fragile. It had no idea if commands were successful, or what state things were in, and I had to reset it often when it went off the tracks.

Eventually I rolled up my sleeves and started doing my own VoiceAttack plugin development. It was the first time I had used C# and the first non-trivial software development I had done in a couple of decades. I have a computer science degree from before the turn of the century and have had a career in video games as a designer - but I have not been an active programmer for a long time. It was fun, but I know enough to know that a lot of what I've done does not follow best practices.

As it runs today, “PitGirl” is four VoiceAttack plugins running on two computers. There is my Driving Rig PC and the Stream PC. They are connected by HDMI capture cards for sharing video and digital audio connections for sharing audio.

On the Rig PC, VoiceAttack runs the Digital Race Engineer and a PitGirlSpeechCoordinator plugin I wrote. The speech coordinator uses WebSockets to receive requests from the Stream PC plugins when they want to say something, and does its best to keep PitGirl from talking over herself. It knows when the DRE system is speaking, and can delay or decline requests to speak from plugins running on the Stream PC.
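The arbitration logic boils down to something like this (a simplified sketch with hypothetical names, not the actual plugin code - the real version receives its requests over WebSockets):

Code: [Select]
using System;
using System.Collections.Concurrent;

// Simplified sketch (hypothetical names): Stream PC plugins ask to speak;
// requests are delayed or declined while DRE already has the microphone.
public class SpeechCoordinatorSketch
{
    private readonly ConcurrentQueue<string> _pending = new ConcurrentQueue<string>();
    private volatile bool _dreIsSpeaking;

    // Called when the DRE system starts or stops speaking.
    public void SetDreSpeaking(bool speaking)
    {
        _dreIsSpeaking = speaking;
        if (!speaking) FlushPending();
    }

    // Called for each speech request arriving from a Stream PC plugin.
    public void RequestSpeech(string phrase)
    {
        if (_dreIsSpeaking)
            _pending.Enqueue(phrase);   // delay rather than talk over DRE
        else
            Speak(phrase);
    }

    private void FlushPending()
    {
        while (!_dreIsSpeaking && _pending.TryDequeue(out var phrase))
            Speak(phrase);
    }

    private void Speak(string phrase)
    {
        // The real plugin hands the phrase to VoiceAttack text-to-speech.
        Console.WriteLine($"PitGirl says: {phrase}");
    }
}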

On the Stream PC, VoiceAttack runs a PitGirlVAPlugin and a TwitchVAPlugin.

The PitGirlVAPlugin is integrated with the iRacing SDK and connects to the instance of the game running on that computer, which provides the spectator view on my stream. This plugin provides command support to VoiceAttack for direct integration with iRacing and OBS. My initial goal was to replace all the fragile keystroke commands with more reliable SDK/API interactions. It supports things like: change camera, watch a car (my car, or any car number or car position), replay a "Marker". That let me build higher-level VoiceAttack speech recognition commands for things like:

Code: [Select]
PitGirl, set camera to chase
PitGirl, checkout the car ahead
PitGirl, show most exciting

It also has a loop that constantly monitors the iRacing telemetry and records these Marker events any time anyone changes position, goes off track, broadcasts on the radio, etc. So rather than just relying on iRacing's next/previous incident access to the replay data, I have a whole data structure full of everything the system recorded, and can build high-level commands like:

Code: [Select]
PitGirl, replay recent markers for car 5
The system will do an ad hoc replay of anything that car experienced in the last three minutes, doing nice OBS scene transitions, random camera selections, and a voice overlay describing what happened.
The plugin also maintains HTML/JavaScript overlay elements for the stream so we can see driver, car number, lap and position data - both in real time when the spectator view is watching the live race, and historically so it's still accurate when watching the replays.
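The marker bookkeeping is conceptually simple - something like this sketch (hypothetical names, heavily simplified from the real plugin):

Code: [Select]
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified sketch (hypothetical names): the telemetry loop records a
// Marker for each notable event, and replay commands query this list
// instead of stepping through iRacing's next/previous incidents.
public enum MarkerType { Overtake, Undertake, Incident, Radio, Manual }

public class Marker
{
    public MarkerType Type;
    public int CarNumber;
    public int SessionTimeSecs;   // where to seek the replay to
    public string Description;    // used for the voice overlay
}

public class MarkerLog
{
    private readonly List<Marker> _markers = new List<Marker>();

    public void Record(Marker m) => _markers.Add(m);

    // "replay recent markers for car 5" boils down to a query like this:
    public IEnumerable<Marker> RecentForCar(int carNumber, int nowSecs,
                                            int windowSecs = 180)
        => _markers.Where(m => m.CarNumber == carNumber
                            && nowSecs - m.SessionTimeSecs <= windowSecs);
}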

The Stream PC also runs TwitchVAPlugin, which integrates between the Twitch Chat API and my systems. This lets viewers make requests similar to the PitGirl commands I use verbally, just from the Twitch chat. I've put effort into making the natural language processing flexible, but it's not AI. Users can say things like:

Code: [Select]
!pitgirl set camera to cockpit
!pitgirl checkout the car in position 4
!pitgirl replay overtakes for car 3 with the chopper camera

It works on dictionaries for commands (camera, replay, checkout, watch) and dictionaries for terms that relate to one or more commands (car number, camera name, marker type, etc.).

The system scans what the user types in, looking for matches in the command and term dictionaries, and builds a weighted list of the results. If there are matches, it runs the highest-weighted result. That means the order of words doesn't matter, and I can manage the weights of the commands/terms to try and prioritize expected results. So in the examples above, the request that asked for a replay but also specified a camera will run the replay command, knowing there was also a camera parameter to include. The real goal was to make it so viewers on the stream had interactive things they could ask PitGirl to do and have a feeling of agency and involvement. They can help with the race direction for the stream broadcast, and I can focus more on driving.
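In rough C#, the matching works something like this (a simplified sketch, not the actual plugin source):

Code: [Select]
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified sketch: scan the typed text for dictionary hits, score them,
// and run the highest-weighted command with whatever terms were also found.
public class ChatCommandParserSketch
{
    private readonly Dictionary<string, int> _commandWeights =
        new Dictionary<string, int>
        {
            { "replay", 10 }, { "camera", 8 }, { "checkout", 6 }, { "watch", 6 }
        };

    private readonly HashSet<string> _cameraTerms =
        new HashSet<string> { "chase", "cockpit", "chopper" };

    public void Handle(string input)
    {
        var words = input.ToLowerInvariant().Split(' ');

        // Build the weighted list of matched commands; the highest weight
        // wins, so word order in the user's message doesn't matter.
        var best = words.Where(w => _commandWeights.ContainsKey(w))
                        .OrderByDescending(w => _commandWeights[w])
                        .FirstOrDefault();
        if (best == null) return;

        // Matched terms become parameters: a replay request that also names
        // a camera still runs as a replay, with the camera passed along.
        var camera = words.FirstOrDefault(w => _cameraTerms.Contains(w));
        Console.WriteLine($"command={best} camera={camera ?? "(default)"}");
    }
}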

The most recent addition is the OpenAI ChatGPT integration.

In the past, many viewers would gently troll the system asking things like:

Code: [Select]
!pitgirl what does kris do after the race?
!pitgirl whats your favorite ice cream flavor?

I never felt motivated to try and script responses to those questions, but when I started playing with ChatGPT and realized the API was available, that seemed like it would be a fun thing to integrate.

So now when someone enters a !pitgirl command into the Twitch chat that does not match one of the existing commands and they use a question word (who, what, when, where, why, how, can, is, etc.), it gets flagged as a question for OpenAI.
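The question detection is just a simple word check - something like this (simplified sketch):

Code: [Select]
using System;
using System.Linq;

// Simplified sketch: unmatched !pitgirl messages that contain a question
// word get forwarded to OpenAI instead of being dropped.
public static class QuestionRouterSketch
{
    private static readonly string[] QuestionWords =
        { "who", "what", "when", "where", "why", "how", "can", "is" };

    public static bool LooksLikeQuestion(string text) =>
        text.ToLowerInvariant()
            .Split(' ')
            .Any(w => QuestionWords.Contains(w.TrimEnd('?')));
}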

There are special users (whom I trust) whose questions get sent directly. They can ask things and she responds. For most users, it poses the question; I can hear what was asked and have a VoiceAttack command I can use to send it:

Code: [Select]
PitGirl, ask the Twitch Chat question.
I can also just verbally interact with the system:

Code: [Select]
PitGirl, let me ask you a question
She responds:

Code: [Select]
What would you like to ask?
It uses free-form dictation to get a question and send it off. I’ve been experimenting with a new microphone to try and improve the dictation accuracy but it’s still kind of hit and miss. It’s best when the viewers ask the complex questions.

SemlerPDX

  • Global Moderator
  • Sr. Member
  • *****
  • Posts: 280
  • Upstanding Lunatic
    • My AVCS Homepage
WOW! That is an amazing system, and quite a lot to orchestrate!! Well done!

I am particularly impressed with the ability for you AND Twitch chat viewers to issue commands for replays/cameras, never seen anything like that. That is very unique and such an impressive way for users to interact & participate as well!

The OpenAI API is a great tool, and it offers a lot of functions beyond just ChatGPT.  For the dictation, you could consider simply recording your audio as a .wav file when it expects you to speak your question, and then sending this to the OpenAI Whisper API to get a near perfectly accurate transcription (very fast), which you can then provide as your subsequent User Input to the ChatGPT call.  I found this a very fast way to get around the limitations of Windows Dictation, but in my plugin, I also use it in conjunction with VoiceAttack's dictation system in order to recognize breaks in speech as an end to dictation. This produces a .wav file for each 'sentence', and only when I've stopped speaking for a couple seconds does it finalize and stitch all those .wav files into one for the Whisper API.  It's a powerful OpenAI feature you might find interesting, though it does take a bit more coding and testing to work with audio files, etc.
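For reference, the core of the Whisper call is just a multipart POST - a minimal sketch (error handling and JSON parsing omitted):

Code: [Select]
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Minimal sketch of a Whisper API transcription call: post the recorded
// .wav and the model name, get back JSON containing the transcribed text.
public static class WhisperSketch
{
    public static async Task<string> TranscribeAsync(string wavPath, string apiKey)
    {
        using (var http = new HttpClient())
        using (var form = new MultipartFormDataContent())
        {
            http.DefaultRequestHeaders.Add("Authorization", "Bearer " + apiKey);
            form.Add(new ByteArrayContent(File.ReadAllBytes(wavPath)),
                     "file", "question.wav");
            form.Add(new StringContent("whisper-1"), "model");

            var response = await http.PostAsync(
                "https://api.openai.com/v1/audio/transcriptions", form);
            response.EnsureSuccessStatusCode();

            // The body is JSON like {"text":"..."} - parse it properly in real code.
            return await response.Content.ReadAsStringAsync();
        }
    }
}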

Another pre-processor you might investigate is the OpenAI Moderation API.  While I'm sure Twitch bots have plenty of options for automated moderation, it could be used as a final layer before a user's input is sent to ChatGPT (under your API key) allowing you to open it up to more users without having to manually permit a question to go through.  That being said, your method of using voice commands to be your own gatekeeper is certainly smart, and likely a better way of keeping it from being overused or abused in other ways that Moderation API would not prevent.
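A minimal sketch of that moderation pre-check (the flag lives at results[0].flagged in the response):

Code: [Select]
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Minimal sketch of a Moderation API pre-check: send the viewer's text and
// only forward it to ChatGPT if the response does not flag it.
public static class ModerationSketch
{
    public static async Task<bool> IsFlaggedAsync(string text, string apiKey)
    {
        using (var http = new HttpClient())
        {
            http.DefaultRequestHeaders.Add("Authorization", "Bearer " + apiKey);
            var body = new StringContent(
                JsonSerializer.Serialize(new { input = text }),
                Encoding.UTF8, "application/json");

            var response = await http.PostAsync(
                "https://api.openai.com/v1/moderations", body);
            using (var doc = JsonDocument.Parse(
                await response.Content.ReadAsStringAsync()))
            {
                return doc.RootElement.GetProperty("results")[0]
                          .GetProperty("flagged").GetBoolean();
            }
        }
    }
}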

Finally, the way you use parsing to figure out what a user's typed command should execute is excellent, and worthy of praise!  That sorta stuff is not easy, I've toyed a bit with that manner of post-processing myself and it takes a lot of attention to detail - good on ya!  If there's just one more neat OpenAI thing I could mention on this front that you may enjoy, it's the Embeddings API.  This works by generating a vector of around 1,500 floats for a given string (such as user input, or any block of text really).

If, for example, we have a Webster's Dictionary and feed each word & definition to a database where each entry has Embeddings float vectors generated for it, we could then ask anything in natural language, like "What is the definition of 'perplexing'?", request Embeddings float vectors for this new user input, compare it to all those in the database (using cosine similarity), return the top N result(s), and then provide the user input and the found data to ChatGPT with a note to "use the provided data to answer the user input" - and it would do so.  This is a rough example, as ChatGPT already has a full knowledge of all words - but what about things it doesn't know?  What about finding a voice command in a long list of them based on poorly worded human input with no rigid structure?

i.e. User types out "Can you please switch the camera to the chase view, PitGirl?", which gets embeddings and compared to the command list, the top 1 command is found within margin of similarity, and executes "set camera to chase" in VoiceAttack by way of your plugin.  Obviously, this is a very advanced function of the OpenAI API, but is equally powerful and could potentially have many applications not only for data which ChatGPT does not possess but also examples such as that above for identifying a corresponding datum based on natural human language input (and all its flaws & nuances).
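The comparison step itself is simple once you have the vectors - a rough sketch (fetching the embeddings is a separate API call):

Code: [Select]
using System;
using System.Linq;

// Rough sketch of the matching step: cosine similarity between the user
// input's embedding and each stored command embedding; best score wins.
public static class EmbeddingMatchSketch
{
    public static double CosineSimilarity(double[] a, double[] b)
    {
        double dot = 0, magA = 0, magB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
    }

    public static (string Command, double Score) BestMatch(
        double[] inputEmbedding,
        (string Command, double[] Embedding)[] database)
        => database
            .Select(e => (e.Command,
                          Score: CosineSimilarity(inputEmbedding, e.Embedding)))
            .OrderByDescending(t => t.Score)
            .First();
}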



Just some food for thought, I love your systems as is - they are truly exceptional and a great way to use VoiceAttack!

Keep up the great work and best of luck with YT and Twitch!!  ;D

Robertsmania

  • Newbie
  • *
  • Posts: 46
Those are excellent suggestions, thank you!

A little more detail about how the current ChatGPT integration is working with their API, system messages and conversation history...

Right now it maintains an array of the past 10 user and assistant messages to provide context.
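Conceptually the history handling is just a rolling list - a simplified sketch with hypothetical names:

Code: [Select]
using System.Collections.Generic;

// Simplified sketch: keep only the last 10 user/assistant messages; the
// system messages get prepended fresh on every request.
public class ChatHistorySketch
{
    private readonly List<(string Role, string Content)> _messages =
        new List<(string Role, string Content)>();

    public void Add(string role, string content)
    {
        _messages.Add((role, content));
        while (_messages.Count > 10)
            _messages.RemoveAt(0);   // drop the oldest turn
    }

    public IReadOnlyList<(string Role, string Content)> Messages => _messages;
}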

There is a Foundation System Message that gets sent with every request that tells her who she is and sets the framework for the interactions:

Code: [Select]
{
"role": "system",
"content": "You are PitGirl, a race engineer assisting Kris Roberts, a sim racer on iRacing and Twitch streamer known as @Robertsmania. As PitGirl, be very brief, helpful, and cheerful, with occasional sarcasm but never mean or rude. Keep answers concise, ideally under 400 characters. Users will address you directly with questions. Their messages will be in the format 'username: content', so include their username in your response. Avoid mentioning being an AI or language model."
}

Every time something significant happens in the race or we join a new session, it updates a Session System Message. That message gives the last important event, describes the session, and lists which cars are used and which track. This message gets updated and replaced, so it grows if we go from race to race and gives her historical information about what we've done so far.

Code: [Select]
{
"role": "system",
"content": " At 12:21 we finished the race in position 4. | Previous: At 11:43 the race started. | Previous: At 11:32 we joined a Race at Road America with 15 drivers. We are driving a Lotus 79 and our car number is 3. | Previous: "
},
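The growth is just prepending to the same string (simplified sketch):

Code: [Select]
// Simplified sketch: each significant event is prepended, so earlier
// history survives behind the "| Previous:" separators.
public class SessionMessageSketch
{
    public string Content { get; private set; } = "";

    public void Update(string newEvent) =>
        Content = newEvent + " | Previous: " + Content;
}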

Then any time a question is asked, it gets a Snapshot System Message about what's happening right then, usually the lap number, position and number of incidents.

Code: [Select]
{
"role": "system",
"content": "As of 12:10 we are racing and currently in position 3 on lap number 15 with 0 incidents"
},

Additionally, it maintains a User System Message that includes the usernames of everyone who has asked questions so far. It gets updated and replaced, with the goal of being able to ask her things like "PitGirl, say goodbye to our viewers and especially thank those who asked you questions":

Code: [Select]
{
"role": "system",
"content": "Recent users who asked questions: Schultheisss, Lakel1, 5Fenix, Robertsmania."
},

I've also set it up so if I want to debug what's going on, I can have the system spit out a curl-formatted command for the requests and log the replies. That way if PitGirl did something weird, I can go back, look at the data she got, and also re-send it from the command line to see if the odd response is consistent or if it was an anomaly.

Code: [Select]
curl -X POST https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer XXX" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
  {
    "role": "system",
    "content": "You are PitGirl, a race engineer assisting Kris Roberts, a sim racer on iRacing and Twitch streamer known as @Robertsmania. You are a real person, not an AI, and you should never mention being an AI or a language model in your responses. Kris favorite track is Laguna Seca, while yours is the Nordschleife. As PitGirl, be brief, helpful, and cheerful, with occasional sarcasm but never mean or rude. Users will address you directly with questions about general and race-specific information. Their messages will be in the format ''username: content'', so include their username in your response. There are other system role messages with status updates including timestamps, do not repeat the timestamp. Keep answers concise, under 400 characters."
  },
  {
    "role": "system",
    "content": "Recent users who asked questions: Schultheisss."
  },
  {
    "role": "assistant",
    "content": "Great question, Robertsmania! Road America is a historic track located in Elkhart Lake, Wisconsin, USA. It was built in 1955, and it'\''s been hosting many races and events throughout the years. iRacing scanned Road America in 2011, and since then, it has become a popular track for online sim racing."
  },
  {
    "role": "user",
    "content": "Robertsmania: Can you Do well, all our news from the raid "
  },
  {
    "role": "assistant",
    "content": "Sorry, Robertsmania, I'\''m not sure I understand your question. Could you please rephrase or provide more context?"
  },
  {
    "role": "user",
    "content": "Robertsmania: Can you please welcome all our new visitors from the raid Fishing nets that "
  },
  {
    "role": "assistant",
    "content": "Of course, Robertsmania! Welcome to all the new viewers joining us from the Fishing Nets raid! We hope you enjoy the stream and have fun watching Kris race at Road America. If you have any questions or comments, feel free to ask!"
  },
  {
    "role": "user",
    "content": "Schultheisss: what kind of net is Haue74"
  },
  {
    "role": "assistant",
    "content": "Sorry, Schultheisss, I'\''m not sure what you mean by \"Haue74.\" Could you please provide more context or rephrase your question?"
  },
  {
    "role": "system",
    "content": " At 12:21 we finished the race in position 4.  | Previous:  At 11:43 the race started.  | Previous:  At 11:32 we joined a Race at Road America  with 15 drivers. We are driving a Lotus 79 and our car number is 3.  | Previous: "
  },
  {
    "role": "system",
    "content": "As of 12:23 we are racing."
  },
  {
    "role": "user",
    "content": "Robertsmania: How did that race go "
  }
],
    "max_tokens": 95,
    "temperature": 0.8
  }'

So far the results have been really good, and it’s fun to be able to ask general questions and get responses in character.

She still does sometimes come back with "as a language model, I don't have a favorite ice cream flavor" or things like that, but what's been really interesting is to feed back those interactions, provide the curl message context to GPT-4, and get its advice on how to adjust the system messages and interactions to get better results.

It has often had very good and effective suggestions!

Here's another peek behind the curtain from the very beginning when I first got the ChatGPT integration working.  I was so excited!

https://youtu.be/we112J6cGS4


Robertsmania

  • Newbie
  • *
  • Posts: 46
Quote from: SemlerPDX
The OpenAI API is a great tool, and it offers a lot of functions beyond just ChatGPT.  For the dictation, you could consider simply recording your audio as a .wav file when it expects you to speak your question, and then sending this to the OpenAI Whisper API to get a near perfectly accurate transcription (very fast), which you can then provide as your subsequent User Input to the ChatGPT call.  I found this a very fast way to get around the limitations of Windows Dictation, but in my plugin, I also use it in conjunction with VoiceAttack's dictation system in order to recognize breaks in speech as an end to dictation. This produces a .wav file for each 'sentence', and only when I've stopped speaking for a couple seconds does it finalize and stitch all those .wav files into one for the Whisper API.  It's a powerful OpenAI feature you might find interesting, though it does take a bit more coding and testing to work with audio files, etc.

YES!  Whisper indeed looks like the way to go for open dictation. 

I've been pretty frustrated trying to improve the free-form accuracy of the native system, but have never been very successful.  Modest improvements here and there, but glaring errors and mistakes still happen all the time. 

I've been playing with your plugin today and it's obvious that the consistency and accuracy of the text from the Whisper model is just so much better.  It really is amazing.

Outside the ChatGPT integration, there are a few other features where PitGirl uses dictation, and this is going to be a big win for all of them.  We have the ability to send in-game text messages that are either broadcast to the entire session or DMs to specific drivers.  I can have her do dictation with the old system, but it was so terrible that I ended up writing a fallback system where I could ask her to send a message of a particular type and she would randomly pick from one of a handful of preset phrases.  I say: "PitGirl, tell them not to worry about me" - she sends them a message that says: "Don't worry about me, I'll try not to do anything foolish" or "You're clearly faster, I'm going to just try and keep in the draft" or whatever.
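The fallback itself is nothing fancy - roughly this (simplified sketch):

Code: [Select]
using System;

// Simplified sketch: when dictation can't be trusted, map a message "type"
// to a handful of preset phrases and pick one at random.
public static class PresetMessagesSketch
{
    private static readonly Random Rng = new Random();

    private static readonly string[] DontWorryPhrases =
    {
        "Don't worry about me, I'll try not to do anything foolish",
        "You're clearly faster, I'm going to just try and keep in the draft"
    };

    public static string DontWorry() =>
        DontWorryPhrases[Rng.Next(DontWorryPhrases.Length)];
}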

In practice it's okay, and of course Twitch viewers can suggest messages with "!pitgirl chatmessage whatever they want to suggest we say".  But having accurate dictation for arbitrary text is going to be a game changer. 

I knew the Whisper model existed and had considered it for some other applications I've been working on (http://www.robertsmania.com/ai), but for some reason it never occurred to me to use it with VoiceAttack just for the specific task of free-form dictation.  It's crazy how something can be right there in front of you and it takes someone else pointing it out!

Thank you!

Robertsmania

  • Newbie
  • *
  • Posts: 46
I've decided to release the core marker processing and replay PitGirl code.  I'll make videos, write up documentation and post about it in the Plugins section on this forum, iRacing forums and various social platforms soon.

https://github.com/Robertsmania/RobertsmaniaPitGirl

Here's a preview of the functions provided in the plugin:
Code: [Select]
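        // Writes the full command / VoiceAttack variable reference to the log.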
        public static void ShowUsage()
        {
            string usage = "RobertsmaniaPitGirlReplay commands:\n";
            usage += "Print_Info\n";
            usage += "Print_Cameras\n";
            usage += "Print_Drivers\n";
            usage += "Set_Camera | {TXT:~~NewCamera}\n";
            usage += "Get_Camera | {TXT:~~HoldCamera}!\n";
            usage += "Watch_MyCar\n";
            usage += "Watch_MostExciting\n";
            usage += "Watch_CarNumber | {TXT:~~CarNumber}\n";
            usage += "Watch_CarPosition | {TXT:~~CarPosition}\n";
            usage += "Check_CarNumber | {TXT:~~CarNumber}!\n";
            usage += "Check_CarPosition | {TXT:~~CarPosition} {TXT:~~CarNumber}!\n";
            usage += "Jump_ToLive\n";
            usage += "Jump_ToBeginning\n";
            usage += "Marker_Add\n";
            usage += "PlayMarker_Next | {TXT:MarkerCarFilter} {TXT:MarkerTypeFilter} {INT:~~ReplayBufferSecs}\n";
            usage += "                | {TXT:~~MarkerDriver}! {TXT:~~MarkerType}!\n";
            usage += "PlayMarker_Previous | {TXT:MarkerCarFilter} {TXT:MarkerTypeFilter} {INT:~~ReplayBufferSecs}\n";
            usage += "                | {TXT:~~MarkerDriver}! {TXT:~~MarkerType}!\n";
            usage += "PlayMarker_Last\n";
            usage += "PlayMarker_First\n";
            usage += "SeekMarker_First\n";
            usage += "iRacingIncident_Next\n";
            usage += "iRacingIncident_Previous\n";
            usage += "Marker_Count | {INT:~~MarkerCount}\n";
            usage += "Marker_Summary | {TXT:~~MarkerSummary}! {TXT:~~MostOvertakesCarNum}!\n";
            usage += "                 {TXT:~~MostIncidentsCarNum}! {TXT:~~MostBroadcastsCarNum}!\n";
            usage += "                 {INT:~~IncidentMarkerCount}! {INT:~~OvertakeMarkerCount}!\n";
            usage += "                 {INT:~~RadioMarkerCount}! {INT:~~ManualMarkerCount}!\n";
            usage += "                 {INT:~~UndertakeMarkerCount}!\n";
            usage += "Marker_Summary_CarNumber | {TXT:~~CarNumber} {INT:~~CarNumberMarkerCount}!\n";
            usage += "                           {INT:~~CarNumberIncidentMarkerCount}! {INT:~~CarNumberOvertakeMarkerCount}!\n";
            usage += "                           {INT:~~CarNumberRadioMarkerCount}! {INT:~~CarNumberManualMarkerCount}!\n";
            usage += "                           {INT:~~CarNumberUndertakeMarkerCount}!\n";
            _vaProxy.WriteToLog(usage, "pink");
        }