
general-asr-nemo_jasper


nvidia
asr


Notes

Jasper: An End-to-End Convolutional Neural Acoustic Model

The Jasper Model

Jasper (“Just Another Speech Recognizer”) [ASR-MODELS6] is a deep time delay neural network (TDNN) composed of blocks of 1D-convolutional layers. Models in the Jasper family are denoted Jasper_[BxR], where B is the number of blocks and R is the number of convolutional sub-blocks within each block. Each sub-block contains a 1D convolution, batch normalization, ReLU, and dropout.


Figure 1: Jasper BxR model (B: number of blocks, R: number of sub-blocks)
Figure 2: Jasper Dense Residual
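
The sub-block ordering described above (1D convolution, batch normalization, ReLU, dropout) can be illustrated with a toy sketch in plain Python. This is not the actual Jasper implementation. It assumes a single channel, one fixed kernel shared by every layer, and no learned batch-norm parameters, and it omits the residual connections shown in the figures. All function names here are hypothetical.

```python
import math
import random


def conv1d(x, kernel, bias=0.0):
    """Zero-padded ("same") 1D convolution of one channel with an odd-length kernel."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[t + i] for i, k in enumerate(kernel)) + bias
        for t in range(len(x))
    ]


def batch_norm(x, eps=1e-5):
    """Normalize to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each value with probability p, rescale the survivors."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def sub_block(x, kernel, p, rng, training):
    # Order taken from the description: conv -> batch norm -> ReLU -> dropout.
    return dropout(relu(batch_norm(conv1d(x, kernel))), p, rng, training)


def jasper_like(x, B, R, kernel, p=0.0, rng=None, training=False):
    """Apply B blocks of R sub-blocks each (toy model, residuals omitted)."""
    rng = rng or random.Random(0)
    for _ in range(B):
        for _ in range(R):
            x = sub_block(x, kernel, p, rng, training)
    return x


features = [0.0, 1.0, 2.0, 3.0, 4.0]
output = jasper_like(features, B=2, R=3, kernel=[0.25, 0.5, 0.25])
print(len(output))  # same length as the input, because of the "same" padding
```

With B=2 and R=3 the input passes through six sub-blocks in total, which is how the BxR naming maps to depth in this simplified picture.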

Performance

The following table reports the word error rate (WER) of the acoustic model with greedy decoding on all LibriSpeech dev and test datasets for mixed precision training.

| Number of GPUs | Batch size per GPU | Precision | dev-clean WER | dev-other WER | test-clean WER | test-other WER |
|-----|-----|-------|-------|-------|------|-------|
| 8 | 64 | mixed | 3.20 | 9.78 | 3.41 | 9.71 |
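
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference and the hypothesis, divided by the number of reference words. The sketch below shows that standard definition. It assumes whitespace tokenization and that the table values are percentages. The function names are hypothetical, and this is not the evaluation script used to produce the numbers above.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def corpus_wer_percent(pairs):
    """WER over (reference, hypothesis) transcript pairs, as a percentage."""
    errors = 0
    ref_word_count = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        errors += word_edit_distance(ref_words, hypothesis.split())
        ref_word_count += len(ref_words)
    if ref_word_count == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / ref_word_count


pairs = [
    ("the cat sat", "the cat sat"),
    ("on the mat", "on a mat"),
]
print(round(corpus_wer_percent(pairs), 2))  # one substitution over six words
```

Errors are summed over the whole corpus before dividing, so long utterances weigh more than short ones. Averaging per-utterance WER would give a different number.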

Note:

This Jasper model was trained on a combination of seven datasets of English speech, with a total of 7,133 hours of audio samples. Samples were limited to a minimum duration of 0.1 s and a maximum duration of 16.7 s. The model was trained for 600 epochs with Apex/Amp optimization level O1.

The model works on relatively short audio files (under 25 seconds).
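
Given the length limit noted above, it may be useful to check a file's duration before submitting it. The following is a minimal sketch using Python's standard `wave` module. It assumes uncompressed WAV input, which the page does not specify. The helper names and the `MAX_SECONDS` constant are hypothetical and not part of any model API.

```python
import wave

MAX_SECONDS = 25.0  # limit taken from the note above


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_short_enough(path, max_seconds=MAX_SECONDS):
    """True if the file is shorter than the stated limit."""
    return wav_duration_seconds(path) < max_seconds
```

A caller could skip, trim, or split any file for which `is_short_enough` returns `False`. How best to split longer recordings is outside the scope of this page.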

ID
Name
general-asr-nemo_jasper
Model Type ID
Audio To Text
Description
--
Last Updated
Nov 23, 2022
Privacy
PUBLIC
License