The Data and Decision Sciences Lab is constantly working with local community partners on several data-related projects that have both a local and a global impact.
Speech Recognition in Background Noise with a Convolutional Neural Network Model
Community Partner: Mutual of Omaha
Team Members Involved
The use of voice recognition has increased substantially in everyday life. Many leading companies use speech recognition technologies to build personal assistants such as Amazon's Alexa, Google Assistant, and Apple's Siri. Speech recognition can also extract valuable business information from calls to create a more personalized customer experience or to enhance business processes. A mixture of neural network techniques is currently used to transcribe speech into text; however, unhandled background noise can severely degrade recognition accuracy, causing these models to produce unreliable transcriptions. To address this issue, this project developed a speech recognition model that transcribes noisy speech into text. The "Voices Obscured in Complex Environmental Settings" (VOiCES) dataset, which is available on Amazon Web Services, was used, and convolutional neural networks (CNNs) were utilized to train the speech recognition model. The performance of the model was measured by the word error rate (WER) metric, which counts the number of substitutions, deletions, and insertions needed to turn the predicted text into the original text, divided by the number of words in the original. Computational results showed that the proposed CNN model transcribed the noisy speech recordings from VOiCES with an average WER of 13%.
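The WER metric described above is a word-level Levenshtein (edit) distance normalized by the length of the reference transcript. The following minimal sketch illustrates the computation; it is not the project's actual evaluation code, and the function name `wer` and the sample sentences are illustrative only.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# "sat" -> "sit" is one substitution, and one "the" is deleted: 2 edits over 6 words.
print(wer("the cat sat on the mat", "the cat sit on mat"))  # -> 0.333...
```

An average WER of 13% therefore means that, for a typical transcript, roughly 13 of every 100 reference words required a substitution, deletion, or insertion to match the model's output.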