ASR on a low-resource dataset

Question

I am doing ASR (automatic speech recognition) on a small, low-resource dataset for my master's thesis. The voice and text data are labelled: there are around 4,000 phrases and around 5 hours of speech.

I don't have a background in speech or signal processing. How big would the preprocessing task be? Could someone give me pointers on how to get started (MOOCs, etc.)? Is it possible to make something out of this project in 5 months?

score 3 · Answer 1 · answered Jan 15 '20 at 21:59

There are various ways to approach this problem.

For a custom solution, recent research uses deep learning, such as recurrent neural networks (RNNs), to perform speech recognition. Here is one of the best-known papers on this approach.
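RNN-based recognizers do not read raw audio directly: they consume a sequence of short, overlapping frames, which are usually turned into spectral features such as MFCCs or log-mel filterbanks. The sketch below shows only the framing step plus a per-frame log energy, in plain Python. The 16 kHz sample rate, 25 ms window and 10 ms hop are common ASR choices assumed here (they are not given in the question), and `frame_signal` and `log_energy` are hypothetical helper names.

```python
import math
from typing import List


def frame_signal(samples: List[float], sample_rate: int = 16000,
                 win_ms: float = 25.0, hop_ms: float = 10.0) -> List[List[float]]:
    """Split a waveform into overlapping frames (hypothetical helper).

    Assumes a 25 ms window and a 10 ms hop, which are common ASR defaults,
    not values taken from the original question.
    """
    win = int(sample_rate * win_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)  # 160 samples at 16 kHz
    if len(samples) < win:
        return []
    n_frames = 1 + (len(samples) - win) // hop
    return [samples[i * hop: i * hop + win] for i in range(n_frames)]


def log_energy(frame: List[float], eps: float = 1e-10) -> float:
    """Log of the frame's energy; eps avoids log(0) on silent frames."""
    return math.log(sum(x * x for x in frame) + eps)


if __name__ == "__main__":
    sr = 16000
    # One second of a 440 Hz sine tone as stand-in audio.
    audio = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
    frames = frame_signal(audio, sr)
    print(len(frames), len(frames[0]))  # 98 400
    print(round(log_energy(frames[0]), 2))
```

At a 10 ms hop, the roughly 5 hours of speech mentioned in the question correspond to about 5 × 3600 × 100 = 1.8 million frames, so computing features for the whole dataset is a modest job. In practice you would compute MFCC or log-mel features with a library such as librosa or torchaudio rather than by hand.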

I would suggest looking at DeepSpeech. You can start from either PyTorch or TensorFlow. Here is a PyTorch example and here is a TensorFlow example.
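DeepSpeech-style models are trained with Connectionist Temporal Classification (CTC): for every audio frame the network outputs a score for each character plus a special "blank" symbol, and these frame-level outputs are then collapsed into text. The sketch below shows the simplest (greedy, or best-path) CTC decoding in plain Python. The alphabet, the blank at index 0, the `one_hot` helper and the toy frame scores are all assumptions for illustration; real systems usually add beam search and a language model on top.

```python
from typing import List, Sequence

# Hypothetical alphabet: index 0 is the CTC blank, then space and a-z.
ALPHABET = ["<blank>", " "] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
BLANK = 0


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: List[str] = ALPHABET,
                      blank: int = BLANK) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    chars = []
    prev = None
    for idx in best_path:
        # A repeated label counts once unless a blank separates the repeats.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


def one_hot(idx: int, size: int = len(ALPHABET)) -> List[float]:
    """Toy frame score that puts all probability mass on one label."""
    return [1.0 if i == idx else 0.0 for i in range(size)]


if __name__ == "__main__":
    # "_" marks a blank frame in this toy per-frame path.
    path = "hhe_ll_loo"
    frames = [one_hot(BLANK if c == "_" else ALPHABET.index(c)) for c in path]
    print(greedy_ctc_decode(frames))  # hello
```

The blank is what lets CTC spell double letters: without the blank between the two "l" runs, they would be merged into a single "l".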

If you don't know either framework, go with PyTorch. It is used more in research, so you will find more examples and guidance in the academic field. I also think TensorFlow has a steeper learning curve, partly because its API changes a lot (v1.0 vs v2.0).

If you can use an existing solution, I recommend something like Amazon Transcribe or Google Speech-to-Text. From experience, though, these "off the shelf" services are built to generalize broadly and aren't very accurate for specific problem domains. Let me know if I can help further.
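Whether an off-the-shelf service is accurate enough for a specific domain can be checked empirically: transcribe a held-out part of the labelled data and compute the word error rate (WER), i.e. the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a plain-Python implementation; the function name, the whitespace tokenization and the example sentences are assumptions, and in practice you would also normalize case and punctuation before comparing.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein DP over words, keeping only two rows.
    # prev[j] = edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "turn the valve to position three"
    hyp = "turn the value to position tree"
    print(word_error_rate(ref, hyp))  # 0.3333333333333333 (2 substitutions / 6 words)
```

Running the same held-out set through a cloud service and through your own model gives a like-for-like comparison on your domain.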
