Topic: Is there any way to turn voice into text?

kellanphil
Is there any way to turn voice into text?
« on: April 24, 2025, 07:45:17 PM »
Is this possible with VoiceAttack? Is there any template I might be able to import as a test profile?

Basically I want to press a key that engages my microphone, and then everything I say until I release the key is transcribed and added to my clipboard, or even pasted into the active window if that's possible too.

Any workflows out there you can share? Thanks!

Pfeil
Re: Is there any way to turn voice into text?
« Reply #1 on: April 24, 2025, 07:48:52 PM »
Technically that is possible; there are some examples of how to use dictation in this topic.

Practically speaking, you may find that the recognition accuracy of the Microsoft SAPI speech recognition system for freeform speech (i.e. not predefined, and as such without context) does not necessarily lend itself to the purpose you are describing.

SemlerPDX
Re: Is there any way to turn voice into text?
« Reply #2 on: April 25, 2025, 02:09:58 PM »
Using the "Dictation until silence" method from the topic Pfeil linked above, once the dictation has completed you can use the Captured Audio action (under the "Sounds" action menu in a command) to save the dictation audio to a file at a path you set.

You can then use my OpenAI Plugin for VoiceAttack and the "Whisper" plugin context to transcribe that audio into proper English, something that simple Dictation is unable to properly do.  You'll then have a text variable containing the transcription of your Dictation that is far more accurate (>99%) than relying solely upon Windows Dictation.
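
For anyone curious what the transcription step described above amounts to outside of VoiceAttack, here is a minimal Python sketch. It is not the plugin's code and says nothing about how the plugin works internally; it only assumes you have an audio file saved by the Captured Audio action and an OpenAI API key. It posts the file to OpenAI's audio transcription endpoint with the "whisper-1" model and returns the "text" field of the JSON response. The function names and the example path are made up for illustration.

```python
import json
import os
import urllib.request
import uuid

# OpenAI's audio transcription endpoint.
API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(audio_bytes, filename, model="whisper-1", boundary=None):
    """Build a multipart/form-data body with a 'model' field and a 'file' field.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + audio_bytes + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path, api_key, model="whisper-1"):
    """Upload the audio file at 'path' and return the transcribed text."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(audio, os.path.basename(path), model)
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The default JSON response carries the transcription under "text".
        return json.load(response)["text"]
```

You would call it with something like transcribe(r"C:\temp\dictation.wav", os.environ["OPENAI_API_KEY"]), where the path is a hypothetical example of wherever you told Captured Audio to save the file. Inside VoiceAttack the plugin handles all of this for you and hands the result back as a text variable.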

It is not free, but it is very cheap - I spend maybe $5 every 8 months, and that covers both the Whisper transcriptions of what I say AND the subsequent ChatGPT API calls that I use them in (every day, many times each day).  OpenAI may also offer a free $5 credit to new users (they used to, anyway). To get your API key, you'd create an account at the OpenAI API here:
https://platform.openai.com/settings/organization/api-keys
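
If you want a rough idea of how far a given credit stretches for the transcription part alone, the arithmetic is simple. The sketch below is only an estimate under stated assumptions: the per-minute rate is something you pass in yourself (the $0.006 per minute in the example is an assumed figure, so check OpenAI's current pricing page), and it ignores any ChatGPT API calls you make afterwards. The function name is made up for illustration.

```python
def months_of_credit(credit_usd, minutes_per_day, usd_per_minute, days_per_month=30):
    """Estimate how many months a credit lasts for audio transcription only.

    usd_per_minute is an assumption supplied by the caller, not a quoted price.
    """
    monthly_cost = minutes_per_day * days_per_month * usd_per_minute
    return credit_usd / monthly_cost


# Example: $5 of credit, one minute of dictation per day,
# at an ASSUMED rate of $0.006 per minute of audio.
estimate = months_of_credit(5.0, 1.0, 0.006)
print(f"Roughly {estimate:.1f} months of transcription")
```

Short push-to-talk style dictations are usually only a few seconds each, so even many of them per day add up to very little audio time.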

You can check out my plugin for VoiceAttack here - it includes a sample profile you can use to learn how to work with the plugin and create your own profiles/commands following those examples:
OpenAI API Plugin for Voiceattack (ChatGPT)

Extensive documentation is available on the Wiki I created for my plugin on the GitHub page:
https://github.com/SemlerPDX/OpenAI-VoiceAttack-Plugin/wiki