Speech input has always had a major drawback – it’s not silent. It might be before too long, though.
Not being able to speak silently doesn’t matter so much when you’re reading a document or perusing the Internet and voicing occasional navigation commands.
It matters a bit more when you’re dictating text. This is akin to having a phone conversation, but usually less distracting. People tend to be a bit quieter when dictating to a computer than when they’re on the phone. Good speech input microphones are sensitive, allowing speech input users to speak relatively softly.
But dictating a sensitive email is a completely different matter, even if you’re speaking softly. You’re exposing a potentially delicate matter to everyone within earshot, and also exposing your thought process as you put initial thoughts down and then revise.
So if you do most of your work by speaking to the computer and sit close enough to colleagues that you can be overheard, there are a couple of potential issues – security and, at times, an understandable shyness about being overheard. There’s also the reverse problem – although today’s noise-canceling microphones work well even in mildly noisy environments, sounds that rise above the background noise slow speech recognition systems and can reduce accuracy rates. An office with a door solves these problems. But a means of silent speech input would be a more flexible solution.
At the same time, while sometimes it’s an advantage to hear something out loud as you’re writing, there’s something magical about having your thoughts appear more silently.
And then there are situations where you need to communicate with a computer but it would be much better if it were silent. Taking notes in a meeting, demonstrating software, and checking something on your computer while you’re talking on the phone or in a crowd come to mind…
For all these reasons, speech users have long dreamed about silent speech.
World War II pilots communicated through throat mics, which pick up vibrations – even those of a whisper – despite the noisy environment. There are modern-day versions for use with walkie-talkies in noisy environments. But throat mics generally aren’t precise enough for speech recognition engines.
Sub-vocal recognition – using electrodes to pick up the signals sent to muscles when you speak – shows more promise. This technology allows you to communicate by getting ready to vocalize a word, but stopping just short of actually speaking it.
MIT’s Media Lab has built an AI assistant that has a wearable sub-vocal interface. The accuracy isn’t quite good enough to use in a speech recognition system whose vocabulary isn’t constrained, but it’s a technology to watch for folks who use speech input. An accurate sub-vocal interface for full-vocabulary speech input would change everything.