A language-aware optical character recognition workflow



A shortcoming of current open-source OCR libraries such as EasyOCR and PaddleOCR is that the language must be known beforehand. EasyOCR supports certain fixed combinations of languages (e.g. Japanese and English), but it does not allow arbitrary combinations (e.g. English, Japanese, and Arabic).

Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results. However, many existing methods focus primarily on Latin-alphabet languages, often only on case-insensitive English characters. The paper behind this workflow proposes an E2E approach, Multiplexed Multilingual Mask TextSpotter, that performs script identification at the word level and routes each word to a script-specific recognition head, all while maintaining a unified loss that simultaneously optimizes script identification and the multiple recognition heads. This method outperforms a single-head model with a similar number of parameters in end-to-end recognition tasks and achieves state-of-the-art results on the MLT17 and MLT19 joint text detection and script identification benchmarks.
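The core idea of the multiplexer can be sketched in plain Python: a script-identification step picks which recognition head processes each detected word. The heads below are illustrative stubs (the real heads are neural networks), and the function names are assumptions, not the paper's API:

```python
# Hypothetical per-script recognition heads. In the actual model these are
# trained sequence-recognition networks; here they are stand-in stubs.
def latin_head(word_image):
    return "latin text"

def arabic_head(word_image):
    return "arabic text"

# The multiplexer maps a predicted script class to its recognition head.
HEADS = {"Latin": latin_head, "Arabic": arabic_head}

def multiplexed_recognize(word_image, script_id):
    """Route a detected word crop to the recognition head for its script."""
    head = HEADS.get(script_id)
    if head is None:
        raise ValueError(f"no recognition head for script {script_id!r}")
    return head(word_image)
```

Because script identification happens per word, a single image can mix scripts freely: each word is simply dispatched to a different head.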

We follow the approach of this recent Facebook AI research paper. Note that the code and models had not been released as of this workflow's creation date.


Distinguishing languages with very similar scripts, such as German and English, requires knowledge of the language itself and cannot be based on the script alone. Therefore, it is strongly recommended to build the following workflow, chaining a text-aggregation operator and a language-id operator to recognize fine-grained language classes:

PaddleOCR Multiplexed -> text_aggregation_operator -> language_id_operator

Language scripts recognized: Arabic, Bangla, Chinese, Hindi, Japanese, Korean, Latin, Symbols
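The three-stage workflow above can be sketched as follows. All three functions are illustrative stand-ins for the real PaddleOCR-Multiplexed, text-aggregation, and language-id operators; their names and return values are assumptions for demonstration only:

```python
def ocr_multiplexed(image):
    # Stage 1 (stand-in): detect words and recognize each with its
    # script-specific head, returning (text, script) pairs.
    return [("Guten", "Latin"), ("Morgen", "Latin")]

def text_aggregation(words):
    # Stage 2 (stand-in): merge word-level results into one string
    # so language ID can see full context.
    return " ".join(text for text, _ in words)

def language_id(text):
    # Stage 3 (stand-in): classify the fine-grained language. Script
    # identification alone cannot separate German from English, since
    # both use the Latin script.
    return "de" if "Guten" in text else "en"

def workflow(image):
    words = ocr_multiplexed(image)
    text = text_aggregation(words)
    return text, language_id(text)
```

The key design point is that language identification runs on aggregated text rather than single words, giving the classifier enough context to separate languages that share a script.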

  • Description
    A language-aware optical character recognition workflow
  • Last Updated
    Jul 22, 2022