Topic: Assistance wanted - Gary is willing but not able!

Durham

  • Guest
« on: October 18, 2016, 09:59:36 PM »
Gary has been very kind, but is unable to help me owing to real-life pressures.

He suggested I reach out to the community in his stead.

I am trying to put a speech-activated front end onto a C# VoIP system: the recognition engine will handle switching channels on the client application, which will then send the changes to the server application.

I thought initially I would do this as a VoiceAttack plugin to get the benefit of the interface.  I couldn't get to first base with the available documentation.

I then went native with C#'s SpeechRecognitionEngine class, switching grammars according to the input, like a menu system where the top level is [File], [Edit], [View] and, depending on what the user says, we load the appropriate grammar.

With a limited grammar, all is fine.  With the grammar I actually need, it is a Bill Gates special, or as useless as an ejector seat in a helicopter.

I would welcome any input on the C# SpeechRecognitionEngine approach, where the issue is that the engine tries to guess the next command from the first voice command as soon as the new grammar is loaded (i.e. the recognized event, SpeechRecognized, keeps firing even though you haven't said anything new). Alternatively, because I am a fan of the product and would love my users to have to pay Gary, I would welcome some help achieving the same outcome with the VoiceAttack plugin SDK.

Happy to share the C# source for a simple Windows Forms solution (voice in, label text out). The simple one works fine; the real one is useless!

Best to Gary and kind regards to you all

Durham

Antaniserse

  • Global Moderator
  • Jr. Member
« Reply #1 on: October 20, 2016, 04:47:58 AM »
I can't infer any specifics about what you are actually trying to accomplish with your module, but what I understand from all of the above is that:
  • using VA directly, you didn't have much trouble with the voice recognition, but you did with the plugin development
  • using your custom C# app, you have trouble with the voice recognition itself
Is that correct?

If so, what issues specifically did you have when trying to set up your plugin code?
"I am not perfect and neither was my career. In the end tennis is like life, messy."
Marat Safin

Durham

  • Guest
« Reply #2 on: October 21, 2016, 09:44:10 PM »
Yes, you are correct.

With a couple of training sessions VA works great (although I use it on a button, so there is no misunderstanding from my TeamSpeak chat).

I need VA because I use the Oculus Rift in DCS and have found that, with its help, I can fly all the aircraft I want to: KA50, F-15, F-5, M2000, SA342, UH-1 (I don't think I could do the A-10C, but I haven't really tried).

With the C# SpeechRecognitionEngine class (Gary told me that VA is written in C#, so I am pretty sure this is the class he is using), I get great recognition with a limited grammar (which is what a VAP file is), and really bad recognition with a large grammar file, which is my current construct.

The issue is that I am trying to load units in the US Army sense, so it goes "1 1 Alpha" through "1 4 Alpha", where the first digit is the platoon, the second is the unit inside the platoon and the letter is the Company.  This is complicated by the fact that Alpha Company can have both a "CO" and an "XO", so you need to handle those before getting to the integers.

Inside each unit, there are three positions: "TC", "Driver", "Gunner".  Amazingly, with only three phrases/words in the grammar, it is pretty good.

I tried it all out with 1/1/A through 1/4/A and all worked fine.

Then I wrote an XML writer that looped through all the possibilities and created one big grammar file, and recognition with it is hopeless.
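
To give a feel for the scale involved, here is a minimal Python sketch (not the actual XML writer, which is not shown in this thread) that enumerates a flat list of unit phrases and compares it with the number of entries needed when each slot is kept as its own choice list. Only the "1 1 Alpha" through "1 4 Alpha" pattern and the "CO"/"XO" designators come from the post; the platoon count and the full Alpha to Zulu company list are assumptions made for illustration.

```python
from itertools import product

# Assumed sizes: the post only confirms units 1 to 4 and the CO/XO designators.
PLATOONS = ["1", "2", "3", "4"]   # hypothetical platoon range
UNITS = ["1", "2", "3", "4"]
COMMAND = ["CO", "XO"]
COMPANIES = [
    "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf", "Hotel",
    "India", "Juliett", "Kilo", "Lima", "Mike", "November", "Oscar", "Papa",
    "Quebec", "Romeo", "Sierra", "Tango", "Uniform", "Victor", "Whiskey",
    "Xray", "Yankee", "Zulu",
]


def flat_phrases():
    """Enumerate every complete phrase, the way a looping grammar writer would."""
    # Command designators first ("CO", "XO"), then platoon/unit pairs ("1 4").
    designators = list(COMMAND)
    designators += [f"{p} {u}" for p, u in product(PLATOONS, UNITS)]
    return [f"{d} {company}" for company in COMPANIES for d in designators]


def composed_entry_count():
    """Count the entries needed when every slot is a separate choice list."""
    return len(PLATOONS) + len(UNITS) + len(COMMAND) + len(COMPANIES)


if __name__ == "__main__":
    print(len(flat_phrases()), "flat phrases vs",
          composed_entry_count(), "choice entries")
```

Under these assumed sizes the flat list holds 468 phrases, while the per-slot lists hold 36 entries in total. This only shows how quickly the flat file grows; it says nothing by itself about recognition accuracy.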

I have gone through the speech recognition dictionary and added everything I use, bulked up by recordings of my voice saying Alpha through Zulu etc.

My current path is to disaggregate so that I have two grammars, numbers (which will include "CO" and "XO") and letters, each recognized with RecognizeMode.Single, and see if that works better.
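
The bookkeeping for that staged approach could look roughly like the Python sketch below. It is only the sequencing logic, independent of any speech API: it tracks which of the two grammars ("numbers" or "letters") should be active next and which words are valid at that stage. The class name `UnitEntry`, the grammar names and the 1 to 4 number range are all hypothetical.

```python
NUMBERS = {"1", "2", "3", "4"}    # assumed range
COMMAND = {"CO", "XO"}            # handled by the "numbers" grammar, first slot only
LETTERS = {
    "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf", "Hotel",
    "India", "Juliett", "Kilo", "Lima", "Mike", "November", "Oscar", "Papa",
    "Quebec", "Romeo", "Sierra", "Tango", "Uniform", "Victor", "Whiskey",
    "Xray", "Yankee", "Zulu",
}


class UnitEntry:
    """Collects one unit designation from successive single recognitions."""

    def __init__(self):
        self.words = []

    def _expected(self):
        """Return (grammar name, allowed words) for the next recognition."""
        if not self.words:
            return "numbers", NUMBERS | COMMAND    # platoon digit, or CO/XO
        if self.words[0] in COMMAND:
            # "CO"/"XO" is followed directly by the company letter.
            return ("letters", LETTERS) if len(self.words) == 1 else (None, set())
        if len(self.words) == 1:
            return "numbers", NUMBERS              # unit digit
        if len(self.words) == 2:
            return "letters", LETTERS              # company letter
        return None, set()

    def next_grammar(self):
        """Name of the grammar to load next, or None when the entry is done."""
        return self._expected()[0]

    def accept(self, word):
        """Store the word if it fits the current stage; return whether it did."""
        _, allowed = self._expected()
        if word not in allowed:
            return False
        self.words.append(word)
        return True

    @property
    def complete(self):
        return bool(self.words) and self.next_grammar() is None
```

In use, the application would load whichever grammar `next_grammar()` names, run one single-mode recognition, pass the recognized word to `accept()`, and repeat until `complete` is true.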

I would really rather have total control, so I don't have to deal with VA updates and so on, but if the new approach doesn't work, then I will be back to trying to get a VA plugin to set variables in my C# application in real time.

Good luck with that, you say.

Thanks for your help Antaniserse.

Durham

Antaniserse

  • Global Moderator
  • Jr. Member
« Reply #3 on: October 22, 2016, 01:20:57 AM »
I have limited experience with the SpeechRecognitionEngine class, but I toyed with it a bit... one thing I've noticed is that grammar entries are not necessarily "flat", but can be composed of layers of separate choices and fragments which are then compiled by the engine; I'm not sure if this is just a convenience in code, or if it has an actual effect on reliability.

Now, maybe you are already doing yours in this exact way, but this is a small example that seems fairly accurate on my system (in VB because I'm faster that way; it is trivial to convert to C#):

Code:
    'At the top of the file (requires a project reference to System.Speech):
    Imports System.Speech.Recognition
    Imports System.Speech.Synthesis

    'Class-level fields
    Private WithEvents speechEngine As New SpeechRecognitionEngine
    Private WithEvents speechSynt As New SpeechSynthesizer

    '------------------------------------------------
    'somewhere in your initialization section
    '------------------------------------------------
    speechEngine.SetInputToDefaultAudioDevice()
    'Sets a medium-high confidence factor
    speechEngine.UpdateRecognizerSetting("CFGConfidenceRejectionThreshold", 70)

    speechSynt.SetOutputToDefaultAudioDevice()

    Dim gramBuilder As GrammarBuilder
    Dim ch_Numbers As New Choices()
    ch_Numbers.Add("1", "2", "3", "4", "5", "6", "7", "8", "9")
    Dim ch_Letters As New Choices()
    ch_Letters.Add("Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf", "Hotel", "India" _
                    , "Juliett", "Kilo", "Lima", "Mike", "November", "Oscar", "Papa", "Quebec", "Romeo" _
                    , "Sierra", "Tango", "Uniform", "Victor", "Whiskey", "Xray", "Yankee", "Zulu")

    gramBuilder = New GrammarBuilder
    gramBuilder.Append("what is")
    gramBuilder.Append(ch_Numbers)
    gramBuilder.Append("plus")
    gramBuilder.Append(ch_Numbers)
    Dim g_WhatIsXplusY As New Grammar(gramBuilder)
    Debug.WriteLine(gramBuilder.DebugShowPhrases)

    gramBuilder = New GrammarBuilder
    gramBuilder.Append(ch_Numbers)
    gramBuilder.Append(ch_Numbers)
    gramBuilder.Append(ch_Letters)
    Dim g_MilitaryUnits As New Grammar(gramBuilder)
    Debug.WriteLine(gramBuilder.DebugShowPhrases)

    speechEngine.LoadGrammarAsync(g_WhatIsXplusY)
    speechEngine.LoadGrammarAsync(g_MilitaryUnits)

    speechEngine.RecognizeAsync(RecognizeMode.Multiple)
   
    '------------------------------------------------
    'handlers for the SpeechEngine events
    '------------------------------------------------
    Private Sub speechEngine_SpeechRecognized(sender As Object, e As SpeechRecognizedEventArgs) Handles speechEngine.SpeechRecognized

        Dim txt As String = e.Result.Text

        If txt.IndexOf("what") >= 0 AndAlso txt.IndexOf("plus") >= 0 Then
            Dim words As String() = txt.Split(" "c)
            Dim num1 As Integer = Integer.Parse(words(2))
            Dim num2 As Integer = Integer.Parse(words(4))
            Dim sum As Integer = num1 + num2

            Dim output As String = String.Format("{0} plus {1} equals {2}", words(2), words(4), sum)
            Debug.WriteLine(String.Format("(Speaking: {0})", output))
            speechSynt.SpeakAsync(output)
            Exit Sub
        End If

        Dim regex As New System.Text.RegularExpressions.Regex("^(\d{1}) (\d{1}) (\w+)$")
        Dim match As System.Text.RegularExpressions.Match = regex.Match(txt)

        If match.Success Then
            Dim output As String = String.Format("Platoon {0}, unit {1}, company {2}: reporting in", match.Groups(1), match.Groups(2), match.Groups(3))
            Debug.WriteLine(String.Format("(Speaking: {0})", output))
            speechSynt.SpeakAsync(output)
            Exit Sub
        End If

    End Sub

    Private Sub speechEngine_SpeechRecognitionRejected(sender As Object, e As SpeechRecognitionRejectedEventArgs) Handles speechEngine.SpeechRecognitionRejected

        Debug.WriteLine("Speech input was rejected")
        For Each phrase As RecognizedPhrase In e.Result.Alternates
            If phrase.Text.Length > 0 Then
                Debug.WriteLine(String.Format("  Rejected phrase: '{0}' confidence {1:N2}", phrase.Text, phrase.Confidence))
            End If
        Next
    End Sub


It is true that the engine is a bit eager to please, so phrases like "1 sierra lima" tend to be recognized with fairly high confidence values, even if that is not really a format you would accept... I'm afraid I'm not really sure how this should be handled, whether it is a matter of building the grammar with a different structure or something else.
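
One possible mitigation, offered as an untested idea and not as something confirmed in this thread, is to check per-word confidence as well as the overall phrase confidence, since System.Speech exposes both (`RecognizedPhrase.Confidence`, and `RecognizedWordUnit.Confidence` for each entry in `RecognizedPhrase.Words`). The Python sketch below shows only the acceptance logic; the function name `check_result` and both thresholds are hypothetical and would need tuning against real recordings.

```python
def check_result(phrase_confidence, words, min_phrase=0.70, min_word=0.50):
    """Decide whether a recognition result should be acted on.

    phrase_confidence: overall confidence of the phrase, 0.0 to 1.0.
    words: list of (text, confidence) pairs, one per recognized word.
    Returns (accepted, reason).
    """
    # Reject the whole phrase first if the engine itself was unsure.
    if phrase_confidence < min_phrase:
        return False, (f"phrase confidence {phrase_confidence:.2f} "
                       f"below {min_phrase:.2f}")
    # A phrase can score well overall while one word was forced to fit.
    for text, confidence in words:
        if confidence < min_word:
            return False, (f"word '{text}' confidence {confidence:.2f} "
                           f"below {min_word:.2f}")
    return True, "ok"


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the function.
    print(check_result(0.85, [("1", 0.90), ("4", 0.30), ("Lima", 0.88)]))
```

The idea is that a word the engine had to force into a slot may score noticeably lower than its neighbours even when the phrase as a whole passes the rejection threshold; whether that holds on a given system is something to verify by logging the values.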
"I am not perfect and neither was my career. In the end tennis is like life, messy."
Marat Safin