Mozilla’s DeepSpeech Project

Mozilla’s open source speech-to-text project has tremendous potential to improve speech input and make it much more widely available. It uses Google’s TensorFlow open source machine learning framework to implement Baidu Research’s DeepSpeech speech recognition technology.

Here’s Baidu’s research paper on DeepSpeech, published in December 2014: arxiv.org/abs/1412.5567. DeepSpeech uses a simpler, end-to-end architecture than traditional speech engines, which rely on hand-engineered processing pipelines, and it is less resource intensive. According to the paper, it also handles noisy environments better than widely used commercial speech systems.

Here’s the GitHub repository for the Mozilla project:
github.com/mozilla/DeepSpeech

As part of the effort, Mozilla is collecting voice samples, or utterances. For a system to learn to parse the continuous stream of sound that we call speech into discrete words and to identify those words accurately, it needs many samples. Mozilla has set up a page that lets anybody contribute to that effort by reading sentences aloud:
blog.mozilla.org/internetcitizen/2017/06/19/commonvoice/

An accurate open source speech recognition engine built on cutting-edge technology would make it possible for many more software and app developers to experiment and innovate with speech recognition. This is a key effort to watch, and to contribute to if you can.