How Does Speech Recognition Work?
Dictionary and encyclopedia definitions describe speech recognition as technology that turns spoken words into text and commands that a personal computer can recognize. That’s the simple definition, but how does it actually happen? Is it magic?
No, it isn’t magic. People have tried to develop reliable speech-recognition programs for years and have had some success. But many users and developers feel that the available speech-recognition methods are not as good as expected (or as good as they could be).
In the early years, the software was expensive yet didn’t provide the accuracy users were looking for. In recent years, faster computer processors and new software techniques have produced better results. Speech recognition is now becoming as reliable and practical as its early creators intended.
What happens when a speech-recognition program is used? First, think of it as a way for the user to input information, much as a keyboard or mouse does. For this to work, the computer needs a fast central processing unit (CPU) and plenty of random-access memory (RAM). It should also have a good sound card to convert the audio signal into usable digital data. A quality microphone for voice input is essential as well.
Computers don’t speak our language. They are manufactured products made from plastic, metal, silicon and other materials, and they work with data in digital form. Even when we type on a keyboard or use a mouse to select something on the screen, our actions have to be translated into the binary form that the computer can use. The same goes for speech input.
Basically, the microphone converts the voice into an analog electrical signal. The computer’s sound card then converts this signal into digital form by measuring it thousands of times per second and storing each measurement as a number. This is the binary form of “1s” and “0s” that computers use to store and process all data; computers don’t “hear” sounds in any other way.
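The digitizing step can be sketched in a few lines of Python. This is a simplified model, not how any particular sound card works: a pure 440 Hz tone stands in for the microphone signal, and the rate of 16,000 samples per second and the 16 bits per sample are assumed values chosen only for illustration. Each measurement is rounded to a whole number, and the helper `to_binary` (a name made up for this example) shows the pattern of 1s and 0s that would be stored.

```python
import math

# Assumed parameters for illustration (not taken from any particular sound card).
SAMPLE_RATE_HZ = 16_000   # measurements (samples) taken per second
BIT_DEPTH = 16            # bits used to store each sample


def analog_signal(t: float) -> float:
    """Stand-in for the microphone's continuous signal: a 440 Hz tone in [-1.0, 1.0]."""
    return math.sin(2 * math.pi * 440 * t)


def digitize(duration_s: float) -> list[int]:
    """Sample the analog signal at fixed intervals and round each sample to an integer."""
    max_level = 2 ** (BIT_DEPTH - 1) - 1          # 32767 for 16-bit audio
    n_samples = round(duration_s * SAMPLE_RATE_HZ)
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE_HZ                    # time of the n-th measurement
        samples.append(round(analog_signal(t) * max_level))
    return samples


def to_binary(sample: int) -> str:
    """Show a sample as the 16-bit two's-complement bit pattern a computer would store."""
    return format(sample & (2 ** BIT_DEPTH - 1), f"0{BIT_DEPTH}b")


samples = digitize(0.001)    # one millisecond of audio
print(len(samples))
print(samples[:4])
print(to_binary(samples[1]))
```

One millisecond at this assumed rate yields 16 numbers; a few seconds of speech would already produce tens of thousands, which is part of why the article stresses a fast CPU and plenty of RAM.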
Speech-recognition software uses acoustic models to convert the voice sounds into a sequence of about four dozen basic speech elements, called phonemes. The latest versions of speech technology have been refined to filter out background noise and other information the computer doesn’t need, so that the words we speak end up as a digital series of phonemes.
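Real acoustic models are statistical models trained on large amounts of recorded speech, so the following is only a toy sketch of the idea. It assumes the audio has already been cut into short frames, each summarized by an energy value and a two-number feature vector; the threshold, the feature values and the three phoneme "templates" are all invented for illustration. Quiet frames are dropped as a crude stand-in for noise filtering, and each remaining frame is labeled with the nearest template.

```python
import math

# Hypothetical values chosen only for illustration.
NOISE_THRESHOLD = 0.05   # frames quieter than this are treated as background noise

# Made-up phoneme "templates": each phoneme is summarized by a 2-number feature vector.
PHONEME_TEMPLATES = {
    "k":  (0.9, 0.1),
    "ae": (0.2, 0.8),
    "t":  (0.8, 0.4),
}


def nearest_phoneme(feature: tuple[float, float]) -> str:
    """Label a frame with the phoneme whose template is closest (Euclidean distance)."""
    return min(PHONEME_TEMPLATES,
               key=lambda p: math.dist(feature, PHONEME_TEMPLATES[p]))


def frames_to_phonemes(frames: list[tuple[float, tuple[float, float]]]) -> list[str]:
    """Turn (energy, feature) frames into a phoneme sequence.

    1. Drop quiet frames (a crude stand-in for noise filtering).
    2. Label each remaining frame with its nearest phoneme template.
    3. Merge consecutive repeats, since one phoneme usually spans several frames.
    """
    phonemes: list[str] = []
    for frame_energy, feature in frames:
        if frame_energy < NOISE_THRESHOLD:
            continue
        label = nearest_phoneme(feature)
        if not phonemes or phonemes[-1] != label:
            phonemes.append(label)
    return phonemes


# Example: five frames, one of which is quiet background noise.
example_frames = [
    (0.40, (0.88, 0.12)),   # close to "k"
    (0.01, (0.50, 0.50)),   # too quiet: dropped as noise
    (0.60, (0.25, 0.75)),   # close to "ae"
    (0.55, (0.18, 0.82)),   # still "ae": merged with the previous frame
    (0.35, (0.79, 0.42)),   # close to "t"
]
print(frames_to_phonemes(example_frames))   # ['k', 'ae', 't']
```

Merging repeats keeps the output readable but would also collapse a phoneme that is genuinely spoken twice in a row; real systems handle timing with far more sophisticated models.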
Once this is complete, a second part of the software takes over. The sequence of phonemes is compared with a digital “dictionary” stored in the computer’s memory, a large collection of words that usually numbers more than 100,000. When the software finds a match, it displays the word on the screen. This is the basic process behind all speech-recognition software.
Of course, English and a few other languages have words that sound alike when spoken (such as “to,” “too” and “two”), and these look alike to the computer as well. The best speech-recognition software can adapt to an individual user’s voice, but results are still far from perfect, and most good speech-recognition programs push a computer to its limits. Even so, some software has allowed users to give up keyboard input in certain situations.
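The dictionary step can be sketched as a lookup from phoneme sequences to words. In the hypothetical example below, the dictionary holds only four entries written in a simplified phoneme notation made up for this example, and `transcriptions` is an illustrative function name, not part of any real product. It splits a phoneme stream into dictionary words with a simple dynamic-programming search and returns every possible reading, which shows why words that sound alike are a problem: the sounds alone cannot decide between them.

```python
# A tiny, hypothetical pronunciation dictionary: phoneme sequence -> possible words.
# The phoneme symbols are simplified and invented for illustration.
PRONUNCIATIONS = {
    ("ay",): ["I"],
    ("w", "ao", "n", "t"): ["want"],
    ("t", "uw"): ["to", "too", "two"],      # words that sound identical
    ("r", "iy", "d"): ["read", "reed"],
}
MAX_WORD_LEN = max(len(p) for p in PRONUNCIATIONS)


def transcriptions(phonemes: list[str]) -> list[list[str]]:
    """Return every way to split the phoneme stream into dictionary words.

    ways[i] holds all word sequences that exactly cover phonemes[:i].
    """
    ways: list[list[list[str]]] = [[] for _ in range(len(phonemes) + 1)]
    ways[0] = [[]]                                   # empty prefix: one empty reading
    for end in range(1, len(phonemes) + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            chunk = tuple(phonemes[start:end])
            for word in PRONUNCIATIONS.get(chunk, []):
                for prefix in ways[start]:
                    ways[end].append(prefix + [word])
    return ways[len(phonemes)]


stream = ["ay", "w", "ao", "n", "t", "t", "uw", "r", "iy", "d"]
for reading in transcriptions(stream):
    print(" ".join(reading))
```

All six readings ("I want to read", "I want two reed" and so on) match the sounds equally well. Picking the sensible one takes more information than the dictionary lookup alone provides, which is one reason results are still far from perfect.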