huggingface model id: vumichien/wav2vec2-large-xlsr-japanese
Wav2Vec2-Large-XLSR-53-Japanese
Fine-tuned facebook/wav2vec2-large-xlsr-53 on Japanese using Common Voice and JSUT, the Japanese speech corpus of the Saruwatari Lab at the University of Tokyo.
When using this model, make sure that your speech input is sampled at 16kHz.
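A minimal inference sketch, assuming the checkpoint loads with the standard transformers classes Wav2Vec2Processor and Wav2Vec2ForCTC, and that the input is a mono 1-D float waveform. The helper check_sampling_rate is hypothetical and illustrative. It rejects non-16 kHz audio instead of passing it to the model silently. Decoding here is greedy (argmax per frame).

```python
TARGET_SAMPLING_RATE = 16_000  # the model card requires 16 kHz input
MODEL_ID = "vumichien/wav2vec2-large-xlsr-japanese"


def check_sampling_rate(sampling_rate: int) -> None:
    """Hypothetical guard: refuse audio that is not already at 16 kHz."""
    if sampling_rate != TARGET_SAMPLING_RATE:
        raise ValueError(
            f"expected {TARGET_SAMPLING_RATE} Hz audio, got {sampling_rate} Hz; resample first"
        )


def transcribe(speech, sampling_rate: int, model_id: str = MODEL_ID) -> str:
    """Greedy transcription of one mono waveform (assumes Wav2Vec2Processor/Wav2Vec2ForCTC)."""
    check_sampling_rate(sampling_rate)
    # Heavy dependencies are imported lazily so check_sampling_rate works without them.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()
    inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values).logits  # shape: (batch, frames, vocab)
    predicted_ids = torch.argmax(logits, dim=-1)  # most likely token per frame
    return processor.batch_decode(predicted_ids)[0]
```

To get 16 kHz audio from a file at another rate, one option is librosa, which resamples on load: `speech, sr = librosa.load("sample.wav", sr=16_000)` followed by `transcribe(speech, sr)`. Here "sample.wav" is a placeholder path.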
Evaluation
The model can be evaluated on the Japanese test data of Common Voice.
Test Result
WER: 30.84%
CER: 17.85%
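The sketch below shows how WER and CER are conventionally defined: the Levenshtein distance (substitutions, deletions, insertions) between reference and hypothesis, summed over the corpus and divided by the number of reference tokens. The card does not say how Japanese text, which has no spaces, was split into words for WER. This sketch assumes whitespace-separated words for WER and individual characters, with spaces removed, for CER. It is not the evaluation script behind the numbers above.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences, keeping one DP row at a time."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix to hyp[:j]
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)]


def error_rate(refs: List[str], hyps: List[str], tokenize: Callable[[str], List[str]]) -> float:
    """Corpus-level rate: total edits divided by total reference tokens."""
    edits = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return edits / total


def wer(refs: List[str], hyps: List[str]) -> float:
    # Assumption: words are already whitespace-separated (Japanese needs prior segmentation).
    return error_rate(refs, hyps, str.split)


def cer(refs: List[str], hyps: List[str]) -> float:
    # Assumption: every character is a token; spaces are removed first.
    return error_rate(refs, hyps, lambda s: list(s.replace(" ", "")))


print(cer(["今日は晴れ"], ["今日は雨"]))  # 0.4 (one substitution + one deletion over 5 chars)
```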
Training
The Common Voice Japanese train and validation splits and the basic5000 subset of the JSUT corpus were used for training.
Model Type ID: Audio To Text
Input Type: audio
Output Type: text
Description: Audio transcription model for converting Japanese audio to Japanese text.
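A fine-tuned wav2vec2 model of this kind typically outputs one token distribution per audio frame through a CTC head. Text is then recovered by merging repeated tokens and dropping the CTC "blank". The sketch below shows that greedy collapse rule on its own. The real vocabulary and blank id come from the checkpoint's tokenizer and are not given in this card. In Hugging Face's Wav2Vec2CTCTokenizer the padding token usually plays the blank role. The toy vocabulary in the example is hypothetical.

```python
from itertools import groupby
from typing import Dict, List


def ctc_greedy_decode(frame_ids: List[int], id_to_token: Dict[int, str], blank_id: int) -> str:
    """Turn per-frame argmax token ids into text using the CTC collapse rule."""
    # 1. Merge consecutive repeats: [a, a, blank, a] -> [a, blank, a].
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blanks; a blank between two identical tokens is what preserves a real repeat.
    kept = [token_id for token_id in collapsed if token_id != blank_id]
    return "".join(id_to_token[token_id] for token_id in kept)


# Hypothetical toy vocabulary; id 0 stands in for the blank/pad token.
vocab = {0: "<pad>", 1: "こ", 2: "ん", 3: "に", 4: "ち", 5: "は"}
print(ctc_greedy_decode([1, 1, 0, 2, 2, 3, 0, 4, 5, 5], vocab, blank_id=0))  # こんにちは
```

Greedy decoding picks the single best token per frame. A beam search with a language model can do better but is outside what this card describes.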