Tuesday, March 24, 2009

Speech Recognition: Why is everybody heading north?







This has been a hot topic for a while, but I don't know why I have yet to see really good speech recognition software become widespread.

My Mac has built-in speech recognition, and it is good. But most of the time it still doesn't recognize my own voice.

Let's think outside the box. Forget that we have current speech systems, forget the current algorithms, forget everything, and start fresh. Let's reinvent this together.

When I talk to you in English, you understand me. Sometimes you don't, if my accent isn't clear enough. However, eventually you will get it. I want to talk to my machine as if I am talking to a human. I don't want the machine to feel; I just want it to understand, whether I tell it to "Quit everything", "Close everything", or "Quit all applications", and whether I or my mother say it. If the machine understands English, it should get it.
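To make the "same meaning, different wording" idea concrete, here is a minimal sketch in Python. It assumes the speech has already been turned into text, so it says nothing about the hard acoustic part. The word tables and the function name `to_intent` are made up for this illustration and do not come from any real speech system.

```python
import re

# Hypothetical vocabulary table, invented for this illustration only:
# several spoken words collapse into one canonical token.
CANONICAL = {
    "quit": "close", "close": "close", "exit": "close",
    "everything": "all", "all": "all",
}

# Words that add nothing to these particular commands (also an assumption).
FILLER = {"the", "applications", "apps", "please"}


def to_intent(utterance):
    """Reduce an already-transcribed sentence to a canonical tuple of tokens."""
    words = re.findall(r"[a-z]+", utterance.lower())
    return tuple(CANONICAL.get(w, w) for w in words if w not in FILLER)


for phrase in ("Quit everything", "Close everything", "quit all applications"):
    print(phrase, "->", to_intent(phrase))
```

All three phrasings reduce to the same tuple, so the machine would only need one handler for it. A hand-written table like this obviously does not scale to a whole language; it only shows the behavior I am asking for.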

If I learn a new word in English, say "petrified", from Roy, I will recognize the word "petrified" from anyone. I won't say, "Sorry, I only know Roy's version of 'petrified'."

Perhaps we should start from the capturing process (the computer's ear). How do I know if the computer is capturing my voice correctly?


If I say "Open the control panel", the machine should open it. That's it. No room for mistakes. I think the current systems were driven by the technology and got lost in the bit-by-bit and neural network analysis. I know the feeling when you get lost in the code and your left brain takes the lead. You can't be creative in that mode.


So why is everybody heading north? Is it because there is only one way to do speech recognition?

It's all the same:
Huffman started compression; WinZip, WinRAR, and PowerArchiver followed the same approach.

Someone started voice recognition and wrote an algorithm, and now everybody is treating it as a Bible and heading north.



EDIT: Siri, Google Voice, and others are all new technologies that hit this field pretty hard. But not hard enough.

5 comments:

  1. Speaking of that, I also find the current OCR technology lacking. Each library I've tried can recognize the full images included with it, but when I provide a clear sample of black text on a white background, the results are garbage.

    Someone should find a new way to do OCR instead of the old techniques.

  2. @Yaseen,

    That's the same story, you are right.

    It's good to bring these ideas out into the fresh air; maybe someone will read this blog one day and decide to take the lead.

  3. Instead of North, they should be traveling East in search of light in programming. :)

    Seriously, the problem with speech recognition, and with OCR, as Yaseen points out, is the computer's accuracy with details. Yup, its strength, accuracy, is also its weakness in this case.

    It's so accurate that it can sense a slight variation in the audio signal between Roy's voice sample of the word "petrified" and yours.

    That would not have been a problem except that it remembers signals and not meanings of words. So, when there is no variation in meaning (same word) but there is a variation in signal (as the same word is spoken by different people), it considers it a different word.

    Part of the solution to this might be to teach the computer some signal (audio or visual) abstraction.

    It should learn which details of a piece of data can change, and to what extent, while it still remains the same data. For example, a dog can change color and height and still be a dog.

  4. @Cody!!

    "That would not have been a problem except that
    it remembers signals and not meanings of words."

    Great analysis!
    So we need to invent another signal that carries the meaning of the word.

    I guess our brain reads the signal of the meaning to understand the content, and then reads the signal of the actual sound for emotional purposes (strong voice, girly voice, sexy voice, etc.),

    or vice versa (reading the voice signal, then the meaning signal).

    However, how this is done needs some research.


    I loved your post, and it could definitely lead readers to a promising stage.


    Let's just put down anything you are thinking of; you don't have to be a computer guy, an IT guy, or even a speech recognition expert.


    Thanks again, Cody!

  5. You are always welcome and I'm always happy to make a contribution to your already brilliant ideas. ;)
