Author Topic: Real-Time Voice Transcription for DCS Kneeboard using Whisper and VoiceAttack (Read 2380 times)

bojote · « **on:** October 02, 2024, 04:33:49 PM »

Hello everyone,

I’m excited to share a tool I’ve been working on that integrates real-time voice transcription directly into your DCS kneeboard using Whisper AI and VoiceAttack. As you know, flying in DCS can require managing tons of information on the fly — coordinates, radio frequencies, headings, or even just reminders. This tool is designed to make that easier by letting you speak and then automatically transcribe that information into your kneeboard in real time.

How it Works:
Two Python scripts, combined with VoiceAttack, let you record up to 10 seconds of audio (or more if you edit the script) by pressing and holding a button on your HOTAS or joystick. Once you release the button, the recording is processed using OpenAI’s Whisper model, transcribed into text, and pasted directly into your DCS kneeboard, with no manual typing required.

Voice Recording with a Safety Mechanism:
When you press a designated button in VoiceAttack, the first script (recorder) starts capturing audio. It records up to 10 seconds (configurable), ensuring that even if you forget to stop the recording, it won’t continue indefinitely.
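As a rough illustration of that safety cap (not the actual recorder script), here is a minimal sketch of a capture loop that ends either when a stop signal arrives or when the time limit runs out. The names `record_until_stopped`, `read_chunk`, and `MAX_SECONDS` are made up for this example, and `read_chunk` stands in for whatever blocking read your audio library provides.

```python
import threading
import time
from typing import Callable, List

MAX_SECONDS = 10.0  # assumed default, matching the 10-second cap described above


def record_until_stopped(
    read_chunk: Callable[[], bytes],
    stop_event: threading.Event,
    max_seconds: float = MAX_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> List[bytes]:
    """Collect audio chunks until stop_event is set or max_seconds elapse."""
    chunks: List[bytes] = []
    deadline = clock() + max_seconds
    # Whichever condition fails first ends the recording, so a forgotten
    # button release can never make the loop run forever.
    while not stop_event.is_set() and clock() < deadline:
        chunks.append(read_chunk())
    return chunks
```

The clock is passed in as a parameter only so the timeout can be exercised without waiting ten real seconds; in normal use the default `time.monotonic` is fine.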

Automatic Transcription and Paste to Kneeboard:
Once you release the button, VoiceAttack triggers the second script (transcriber). This script sends a signal to stop the recording, processes the recorded audio using OpenAI's Whisper, and transcribes it into text. The transcription is then copied to the clipboard and pasted into your DCS kneeboard using a simulated keyboard shortcut (Ctrl + Alt + P), assuming you have that shortcut assigned on the UI Controls screen of your DCS controls settings.
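The transcribe, copy, paste sequence could be sketched roughly like this. It is only an outline under assumptions: `transcribe_and_paste` and its three callables are hypothetical names, and the stop-signal step is left out because its mechanism is not described here. In practice the callables might be a Whisper wrapper, `pyperclip.copy`, and `keyboard.send`, but that choice is an assumption too.

```python
from pathlib import Path
from typing import Callable


def transcribe_and_paste(
    audio_path: Path,
    transcribe: Callable[[Path], str],
    copy_to_clipboard: Callable[[str], None],
    send_hotkey: Callable[[str], None],
    hotkey: str = "ctrl+alt+p",
) -> str:
    """Transcribe the audio, copy the text, then fire the kneeboard paste shortcut."""
    text = transcribe(audio_path).strip()
    # Design choice for this sketch: skip the paste when nothing was
    # recognized, so an empty recording does not add a blank kneeboard entry.
    if text:
        copy_to_clipboard(text)
        send_hotkey(hotkey)
    return text
```

Passing the three steps in as functions keeps the sequence easy to try out with stand-ins before wiring it to real audio, clipboard, and keyboard libraries.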

Whisper AI for Accurate Transcription:
Whisper is one of the best AI models out there for speech recognition, handling aviation terms and numbers with remarkable accuracy. Whether you're calling out coordinates, frequencies, or instructions, it catches most of it pretty well. Even if you speak a different language, it will translate what you say and put it on the kneeboard in plain English.

Why Use This Tool?

Hands-Free: You don’t need to pause the game or type anything. Simply speak into your mic while flying and it will AUTOMATICALLY paste the transcribed version of your speech to your DCS Kneeboard on the fly! Coordinates, instructions, 9-Lines, reminders, anything.

Customizable: You can adjust the recording length, transcription settings, and trigger buttons based on your needs.

Robust Performance: Both scripts are designed to be reliable under various conditions, with built-in safety mechanisms to avoid race conditions or interruptions.

What do you need? (Requirements)

VoiceAttack: To trigger the recording and transcription scripts with a button press.

Python Installed: The scripts are written in Python, so you’ll need to have Python installed on your system. Detailed instructions will be provided to set this up.

Whisper (OpenAI): Whisper is the AI model used to process and transcribe the audio. The model can run on both CPU and GPU (CUDA-supported), but a GPU will significantly speed up transcription.
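For anyone curious what the CPU/GPU choice looks like in code, here is a minimal sketch using the open-source `whisper` package and PyTorch. The helper names and the "base" model size are assumptions for illustration, not necessarily what the scripts use. The `translate` task is what produces English output from speech in another language.

```python
def pick_device(cuda_available: bool) -> str:
    """Use the GPU when CUDA is available, otherwise fall back to the CPU."""
    return "cuda" if cuda_available else "cpu"


def transcribe_options(device: str, translate: bool = True) -> dict:
    """Build keyword arguments for Whisper's transcribe() call."""
    return {
        # "translate" outputs English; "transcribe" keeps the spoken language.
        "task": "translate" if translate else "transcribe",
        # Half precision is only useful on the GPU.
        "fp16": device == "cuda",
    }


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Load a Whisper model on the best available device and return the text."""
    # Imported here so the helpers above work without the heavy dependencies.
    import torch
    import whisper

    device = pick_device(torch.cuda.is_available())
    model = whisper.load_model(model_name, device=device)
    result = model.transcribe(path, **transcribe_options(device))
    return result["text"].strip()
```

Larger models are generally more accurate but slower, so the model size is the first thing to try changing if transcription feels sluggish on CPU.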

How to Set It Up:

Download it from https://github.com/BojoteX/KneeboardWhisper and check the included README file.

Feel free to reach out if you have questions, or if you need help getting it set up!

Fly safe,
"Bojote"

SemlerPDX · « **Reply #1 on:** October 03, 2024, 03:05:31 PM »

Well done!! Works very nicely, faster than expected on my aging Nvidia 2070 Super (c. 2018/2019)

The steps to get this going are a bit cumbersome, which makes me wonder if they could be streamlined in any way. Also, a bit of constructive criticism: the .wav recording could perhaps be done in other ways, using a form of the "Dictation until silence" approach (at least conceptually) which Pfeil describes in this forum post here at VA Forums. I tested a .wav file created by my own form of this system, albeit written in C# as an actual VoiceAttack plugin that uses the vaProxy object to get data about active speech. Your "transcribe" Python script had no issues with the .wav file it created, which was merged from several .wav files of individual sentences, broken up by natural pauses in speech, until a pause large enough to count as the "end" of speech was detected.

A note on the audio file path produced by the recording script: I feel you could do better to have it write to a new folder under AppData Roaming by default, rather than System32, which is a protected location. This would eliminate issues for folks who are not running VoiceAttack "As Admin" when it subsequently executes `recorder.py`. People could still change this path as needed, but at least by default you'd be writing to a place designed for applications to write to.

You'd modify the `recorder.py` to add these lines replacing the existing file name variable line:

```python
import os

# Get the AppData/Roaming path for the current user
appdata_path = os.getenv('APPDATA')

# Define the target folder and file name
whisper_folder = os.path.join(appdata_path, 'KneeboardWhisper')
audio_file = os.path.join(whisper_folder, 'sample.wav')

# Create the folder structure if it doesn't exist
os.makedirs(whisper_folder, exist_ok=True)

# Should be: audio_file == C:\Users\<username>\AppData\Roaming\KneeboardWhisper\sample.wav
```

You'd want to add these to the `transcriber.py` as well, so it knows where this file is too.
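One way to avoid keeping two copies of those lines in sync would be a small shared helper that both scripts import. This is just a sketch: the function name `get_audio_path` and the idea of a shared module are my own suggestion, and the fallback to the home folder when `APPDATA` is not set is an assumption added so the function doesn't fail outside Windows.

```python
import os
from pathlib import Path

APP_FOLDER = "KneeboardWhisper"
AUDIO_NAME = "sample.wav"


def get_audio_path() -> Path:
    """Return the shared .wav path, creating the folder if it is missing."""
    # APPDATA is normally set on Windows; falling back to the home folder
    # is an assumption of this sketch.
    base = os.getenv("APPDATA") or str(Path.home())
    folder = Path(base) / APP_FOLDER
    folder.mkdir(parents=True, exist_ok=True)
    return folder / AUDIO_NAME
```

Both `recorder.py` and `transcriber.py` could then call `get_audio_path()` instead of hard-coding the file name, so a path change only has to happen in one place.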

If there could be a way to simply provide users with a Python application or VoiceAttack plugin that installs the required Python modules, or even Python itself along with those modules, I think that should be pursued. A secondary, redundant means of ending the speech capture could also let you do away with the "Maximum recording duration" altogether and allow more flexibility: either using your push-to-talk button as a (singular) programmatic means to end the recording (beyond merely the trigger in `transcriber.py`), or a form of "Dictation until silence".

Regardless, very cool tool, very functional, and I'm a big fan! Cheers!!

bojote · « **Reply #2 on:** October 05, 2024, 06:17:04 AM »

Quote from: SemlerPDX on October 03, 2024, 03:05:31 PM

Regardless, very cool tool, very functional, and I'm a big fan! Cheers!!

First, let me start by thanking you for your useful suggestions. All of your points are valid, and I don’t take them as criticism. On the contrary, I always welcome anything that can improve what I do.

I did try an initial version using VoiceAttack’s dictation feature (which also saves the .wav file), but it was so inconsistent and unreliable that I eventually abandoned the effort and did it on my own. It seems VoiceAttack (or the speech engine) breaks the dictation into chunks due to its very short buffer. Imagine trying and sometimes getting the 'previous' speech, other times the current one—it was random and frustrating.

Originally, I made this for myself and wasn’t planning to share it. But since it’s been so useful (at least for me), I decided to make it available. My scripts work completely standalone; in fact, VoiceAttack is only needed for the triggers.

I did consider creating a plugin, as I'm comfortable doing this (if you check my GitHub page, you’ll see what I mean). But Torch and Whisper can be a pain to compile, and Python's performance is remarkably good—as you’ve probably noticed as well. CPU/GPU usage is practically non-existent with my scripts, which is great!

What really inspired me to build this was the surprisingly good, but little-known, feature in DCS that allows pasting clipboard contents into the in-game Kneeboard.

I don’t plan to develop this much further, except for a few of the suggestions you mentioned, as it’s working exactly as I intended. I’m currently working on another feature (again, just for myself, not for release) that uses the recorder with a different trigger. Instead of calling the transcriber upon release, it calls my personal AI Assistant, which I’ve trained specifically for my needs. I tried both OpenAI’s API and Perplexity’s, and I ended up using OpenAI’s GPT-4 and ElevenLabs’ speech generator. So far, it’s been working incredibly well.

The assistant is used for specific, complex math tasks during flight, like calculating perfect circular orbits around a target given a bank angle and distance, or determining the descent rate needed to go from one altitude to another. It can also calculate reciprocals of a given heading, and so on—really useful stuff.
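To give an idea of the math involved, here is a minimal sketch of those three calculations in plain Python. These are the standard textbook formulas, not what the assistant actually runs. The function names are made up for the example, "distance" is assumed to mean the orbit radius, and the turn is assumed to be level and coordinated with no wind.

```python
import math

G = 9.80665          # standard gravity, m/s^2
KT_TO_MS = 0.514444  # knots to metres per second
NM_TO_M = 1852.0     # nautical miles to metres


def reciprocal_heading(heading_deg: float) -> float:
    """Opposite heading in degrees, reported as 360 rather than 0 for north."""
    recip = (heading_deg + 180.0) % 360.0
    return 360.0 if recip == 0.0 else recip


def orbit_speed_knots(radius_nm: float, bank_deg: float) -> float:
    """True airspeed that holds a circle of the given radius at the given bank.

    Uses the level coordinated turn relation r = v^2 / (g * tan(bank)).
    """
    v_ms = math.sqrt(radius_nm * NM_TO_M * G * math.tan(math.radians(bank_deg)))
    return v_ms / KT_TO_MS


def descent_rate_fpm(alt_from_ft: float, alt_to_ft: float,
                     distance_nm: float, groundspeed_kt: float) -> float:
    """Feet per minute needed to lose the altitude over the given distance."""
    minutes = distance_nm / groundspeed_kt * 60.0
    return (alt_from_ft - alt_to_ft) / minutes
```

For example, `reciprocal_heading(90)` gives 270, and descending from 20,000 ft to 5,000 ft over 30 nm at 300 knots groundspeed works out to about 2,500 ft/min.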

It seems you already have this functionality in your plugin, so I propose the following: You’re clearly a more capable programmer than I am. If you find anything useful in my transcriber or recorder, feel free to adopt it, incorporate it into your plugin, and I’ll gladly switch to using yours instead.

Again, thanks for your kind comments, and let’s keep sharing ideas!
