Author Topic: Can't seem to improve speech recognition  (Read 12477 times)

rangoon

  • Guest
Can't seem to improve speech recognition
« on: March 05, 2017, 08:21:53 PM »
I am using the latest Voice Attack on Windows 10. I have run through the speech recognition improvement tool several times. I have also manually added many words and recorded myself saying them. Still, I cannot seem to get Voice Attack to decipher between certain things, such as "stop" and "stopped". I have recorded both words, as well as phrases that use them and the individual words in those phrases. Voice Attack continues to hear "stop" as "stopped", even though it seems like the consonant at the end of "stopped" should distinguish it (I can clearly hear it in the dictionary recording, so I'm sure it's showing up in the waveform analysis). I have good microphone levels and clarity.

The only workaround I have found is to add "stopped" to the profile when I mean "stop". But this is not a great solution, because this is only one example of the behavior. Is there something else I could do to try to fix this?

Another example is "driver" and "ever" or "never". Again, I entered all these words into the dictionary and recorded their pronunciation. No improvement. It makes me think the two aren't communicating (the dictionary or speech profile and Voice Attack). Is this possible?

Pfeil

  • Global Moderator
  • Hero Member
  • Posts: 4782
  • RTFM
Re: Can't seem to improve speech recognition
« Reply #1 on: March 06, 2017, 04:10:17 AM »
Quote from: rangoon
I cannot seem to get Voice Attack to decipher between certain things, such as "stop" and "stopped".
"driver" and "ever" or "never"

Any speech recognition system will have trouble differentiating between similar sounding words, especially when lacking the context of a complete sentence. The state of the technology is such that recognition accuracy is just good enough to be usable, in some situations.
If it were more like science fiction, it'd play a much bigger part in our lives, but right now you're just as likely to receive "Bees, all angry" as "Tea, earl grey".


Are you issuing commands that can be matched to phrases, or using dictation?

If the former, you can see the confidence level with which the speech engine matches your command phrase by clicking the wrench icon on the main window, clicking the "Recognition" tab, and checking the "Show Confidence Level" option.
« Last Edit: March 06, 2017, 04:14:35 AM by Pfeil »

GreyArea

  • Guest
Re: Can't seem to improve speech recognition
« Reply #2 on: March 20, 2017, 02:44:11 AM »
Having similar problems:

When I say "Yes", VoiceAttack insists I said "GS" or "BS". It's inaccurate in a lot of other ways...but I don't understand why it would "decide" that it heard something unpronounceable over a very common word.

I don't have any speech impediments or a strong accent...I'd be very interested in any ideas to improve this...

...as an aside, the recognition seems far worse in dictation mode...is there any reason for that?

Pfeil

  • Global Moderator
  • Hero Member
  • Posts: 4782
  • RTFM
Re: Can't seem to improve speech recognition
« Reply #3 on: March 20, 2017, 07:06:27 AM »
Quote from: GreyArea
the recognition seems far worse in dictation mode...is there any reason for that?

In normal operation, the speech engine will attempt to match what it hears to a list of predefined sentences, so it has some idea of what you may be trying to say.

In dictation mode, it has to match what it hears to a dictionary the size of the English language, which offers vastly more possibilities.
You'll find recognition, especially in dictation mode, works best with longer sentences. As with any system looking up information, the more data the better.

On top of that, if the speech engine hasn't been "trained" very well, it will have difficulty understanding which sounds belong to which phonemes, if any.
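
A toy illustration of the point about predefined phrases versus dictation. This is not how the Windows speech engine works internally: plain string similarity (Python's `difflib`) stands in for acoustic similarity, and the word lists, the `best_matches` helper, and the `heard` value are all made up for the example. With a short command list, a slightly garbled input has only one plausible match; with a larger vocabulary, similar words start competing with it.

```python
import difflib

def best_matches(heard, vocabulary, n=3, cutoff=0.5):
    """Return up to n vocabulary entries most similar to what was 'heard'."""
    return difflib.get_close_matches(heard, vocabulary, n=n, cutoff=cutoff)

# Hypothetical profile: only a handful of phrases are possible.
commands = ["driver", "stop", "yes", "no"]

# Hypothetical dictation vocabulary: the same words plus near neighbours.
dictation = commands + ["diver", "drive", "river", "ever", "never", "stopped"]

# Stand-in for a slightly garbled recognition of "driver".
heard = "drivr"

print("command list:", best_matches(heard, commands))
print("dictation:   ", best_matches(heard, dictation))
```

In the command-list case only "driver" clears the cutoff, while the larger vocabulary returns several close candidates. A real engine compares sounds rather than spellings, so treat this purely as an analogy.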

rangoon

  • Guest
Re: Can't seem to improve speech recognition
« Reply #4 on: March 20, 2017, 08:36:03 AM »
Quote from: Pfeil
Are you issuing commands that can be matched to phrases, or using dictation?

If the former, you can see the confidence level with which the speech engine matches your command phrase by clicking the wrench icon on the main window, clicking the "Recognition" tab, and checking the "Show Confidence Level" option.

I am using a profile with phrases, not dictation, so I would expect better results. I mean, if the word "driver" is in the profile, why does it keep wanting to hear "ever" or "never"? And in the past, I have had good results from adding words to my dictionary. However, this time, adding "driver" (recording myself saying it) didn't help. That's why it almost seems like a disconnect between the speech recognition engine and my Voice Attack profile. I mean, I'm sure it's not actually disconnected, but in the past I've had better success with the process.

So again, I don't understand: it hears similar (?) words like driver and never, but if "driver" is specifically in the profile, how is it not figuring this out? Sometimes it hears "driver", but less than half the time. If it never heard "driver" then I'd blame myself, but since it hears "driver" sometimes, I blame the software. When I test my microphone, I hear myself clearly. When I listen to my dictionary addition of "driver", it sounds okay...no worse than other cases where I've seen improvement.

In the end, my work-around is to add the words "never" and "ever" as options where I'm wanting to use "driver" and now it works (to the extent that this is considered working). Hopefully I never need to use "ever" or "never" as distinct results in the profile, I guess, right?

As for the confidence level, I've had that turned on since I started using Voice Attack a couple years ago, and with these, the confidence is always low (I can't recall the exact range, but usually around 50%). I'll go check and update this....

Any other ideas how to improve this? Or whether this indicates a possible problem in the system?

EDIT: I just tested this again -

If I remove "never" and "ever" as options, whether it hears "driver" depends on the context. When it hears "driver" the confidence is around 50%-55%. When I have "ever" and "never" as options, it always hears "ever" now. Confidence is 80%. Again, I'm saying "driver" and "driver" is in my dictionary, custom recorded through the Windows speech recognition dictionary.
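
One way to picture the result above, as a rough sketch rather than VoiceAttack's actual logic: if the engine scores each phrase the profile allows and reports the best one, then adding "ever" as an accepted alternative gives "driver" a competitor that can outscore it. The `pick_phrase` function, the `min_confidence` threshold, and the score table are hypothetical; the 0.55 and 0.80 values loosely follow the figures in this post, and the value for "never" is invented.

```python
def pick_phrase(scores, allowed, min_confidence=0.0):
    """Return (phrase, confidence) for the best-scoring allowed phrase,
    or None if no allowed phrase reaches min_confidence."""
    candidates = {p: s for p, s in scores.items() if p in allowed}
    if not candidates:
        return None
    phrase = max(candidates, key=candidates.get)
    if candidates[phrase] < min_confidence:
        return None
    return phrase, candidates[phrase]

# Hypothetical per-phrase confidence for one utterance of "driver".
scores = {"driver": 0.55, "ever": 0.80, "never": 0.60}

# Only "driver" is in the profile: it wins, but with low confidence.
print(pick_phrase(scores, allowed={"driver"}))

# "ever" and "never" added as alternatives: "ever" now outscores "driver".
print(pick_phrase(scores, allowed={"driver", "ever", "never"}))

# With a hypothetical minimum confidence of 70%, the lone "driver" is rejected.
print(pick_phrase(scores, allowed={"driver"}, min_confidence=0.70))
```

Under this model, adding sound-alike words as alternatives makes the command fire more reliably, but it does so by accepting the misrecognition rather than fixing it.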
« Last Edit: March 20, 2017, 08:47:53 AM by rangoon »

GreyArea

  • Guest
Re: Can't seem to improve speech recognition
« Reply #5 on: March 20, 2017, 04:22:57 PM »
Same here...I have "yes" as a command...but when I say "yes" it insists I said "GS"...I mean phonetically Gee-Ess could be seen as "Gee-Yes"...except I'm not saying "Gee"...and GS isn't in my commands...

Have a look at the fun I had getting VA to recognise a simple "no" in one of my other posts (I posted the log)...VA obviously hasn't heard that "no means no" and on occasion thinks it means "bone". I fear VA may potentially be the world's first AI date-rapist...

Gary

  • Administrator
  • Hero Member
  • Posts: 2832
Re: Can't seem to improve speech recognition
« Reply #6 on: March 20, 2017, 04:33:00 PM »
Have you guys tried making sure your mics are not too 'hot'? That can mess things up quick. I've been bitten lots of times by having a mic that was left turned up (with boost on). Also, any third-party software that alters audio will definitely make things seem like they're not working. Just thought I'd throw that in there...and these for good measure:

http://voiceattack.com/smf/index.php?topic=63.0
and
http://voiceattack.com/smf/index.php?topic=64.0
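
For anyone who wants to check for a 'hot' mic with numbers rather than by ear, here is a minimal sketch. It is not a VoiceAttack feature; it assumes you have recorded a short 16-bit PCM WAV with your usual mic settings, and the function names and the 99% margin are arbitrary choices for the example. It counts how many samples sit at or near full scale, which is what clipping looks like.

```python
import array
import sys
import wave

def read_pcm16(path):
    """Read all samples from a 16-bit PCM WAV file as signed integers."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples

def clipping_ratio(samples, full_scale=32767, margin=0.99):
    """Fraction of samples whose magnitude is at or near full scale."""
    if len(samples) == 0:
        return 0.0
    limit = int(full_scale * margin)
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)

# Synthetic data for illustration; with a real recording you would call
# clipping_ratio(read_pcm16("mic_test.wav")), where the file name is hypothetical.
quiet = [1000, -2000, 1500, -800]
hot = [32767, -32768, 32767, 1200]
print(f"quiet: {clipping_ratio(quiet):.0%} of samples near full scale")
print(f"hot:   {clipping_ratio(hot):.0%} of samples near full scale")
```

If more than a tiny fraction of samples in normal speech are near full scale, the input level or boost is probably too high.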

GreyArea

  • Guest
Re: Can't seem to improve speech recognition
« Reply #7 on: March 21, 2017, 09:16:17 AM »
Not sure if related; I used text-to-speech to say "It's not difficult, just say yes or no"

...she actually SAYS "BS or no"...how do I fix that?

Pfeil

  • Global Moderator
  • Hero Member
  • Posts: 4782
  • RTFM
Re: Can't seem to improve speech recognition
« Reply #8 on: March 21, 2017, 09:35:12 AM »
Quote from: GreyArea
I used text-to-speech to say "It's not difficult, just say yes or no"

...she actually SAYS "BS or no"...how do I fix that?

Which TTS voice are you using? "Microsoft Anna - English(United States)" seems to pronounce it fairly well.