asr-wav2vec2-large-xlsr-cantonese

Audio transcription model for converting Cantonese audio to Chinese text

Notes

Hugging Face model ID: scottykwok/wav2vec2-large-xlsr-cantonese
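A minimal sketch of how this checkpoint could be loaded and used for transcription, assuming the torch, torchaudio and transformers packages are installed. The model ID comes from the line above; the function name transcribe, the constant names, the 16 kHz resampling step and the mono downmix are illustrative assumptions, not the author's published inference code.

```python
MODEL_ID = "scottykwok/wav2vec2-large-xlsr-cantonese"
TARGET_SAMPLE_RATE = 16_000  # assumption: wav2vec2 XLSR checkpoints expect 16 kHz input


def transcribe(audio_path: str) -> str:
    """Return the predicted transcript for one audio file (greedy CTC decoding)."""
    # Heavy dependencies are imported here so that defining the function stays cheap.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # Load the audio as a (channels, samples) tensor and downmix to mono.
    waveform, sample_rate = torchaudio.load(audio_path)
    waveform = waveform.mean(dim=0)
    if sample_rate != TARGET_SAMPLE_RATE:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, TARGET_SAMPLE_RATE
        )

    inputs = processor(
        waveform.numpy(), sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)

    # Greedy decoding: pick the most likely token per frame, then collapse with CTC rules.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling transcribe("sample.wav") (a hypothetical file name) would download the checkpoint on first use and return the predicted Chinese text as a string.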

Wav2vec2-large-xlsr-cantonese

This model is based on wav2vec2-large-xlsr-53, fine-tuned on the zh-HK subset of Common Voice 6.1.0. The training code is similar to that of user ctl, except that the number of training epochs is 80 (doubled) and fp16_backend is set to apex. The model was trained on a single RTX 3090 using the Docker image nvidia/cuda:11.1-cudnn8-devel.

The character error rate (CER) is 15.11% when evaluated against the Common Voice zh-HK test set.
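For readers unfamiliar with the metric, the sketch below shows one common way to compute CER: the character-level Levenshtein distance between each reference and its prediction, summed over the test set and divided by the total number of reference characters. The function names are illustrative, and the corpus-level aggregation is an assumption; the source does not say how the 15.11% figure was aggregated or how whitespace and punctuation were treated.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters yields the empty hypothesis
        for j, hyp_char in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(
        edit_distance(ref, hyp) for ref, hyp in zip(references, hypotheses)
    )
    return total_edits / total_chars
```

With this definition, a single wrong character in a five-character reference gives a CER of 0.2, so 15.11% corresponds to roughly 15 character edits per 100 reference characters.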

Result (CER)

15.11%

Source Code

See the GitHub repo cantonese-selfish-project and the demo video.

  • ID
  • Model Type ID
    Audio To Text
  • Last Updated
    Jun 28, 2022
  • Privacy
    PUBLIC
  • Toolkit
  • License