Notes
Multilingual Text Moderation
Introduction
The multilingual text moderation model analyzes text and detects harmful content. It is especially useful for screening third-party or user-generated content before it reaches your platform.
This model returns a list of concepts along with their corresponding probability scores indicating the likelihood that these concepts are present in the text. The list of concepts includes:
- toxic
- insult
- obscene
- identity_hate
- severe_toxic
- threat
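Since the model returns a probability per concept, a common pattern is to compare each score against a threshold and flag only the concepts that exceed it. The sketch below assumes a generic `dict` of concept scores; the exact response format depends on the API or client library you use, and the threshold of 0.5 is an illustrative default you should tune for your use case.

```python
def flag_concepts(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return the moderation concepts whose probability meets the threshold."""
    return sorted(name for name, p in scores.items() if p >= threshold)

# Hypothetical model output for a hostile comment.
example_scores = {
    "toxic": 0.92, "insult": 0.81, "obscene": 0.10,
    "identity_hate": 0.02, "severe_toxic": 0.07, "threat": 0.01,
}
print(flag_concepts(example_scores))  # ['insult', 'toxic']
```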
Text moderation can be performed in the top 100 languages with the largest Wikipedias:
Afrikaans, Albanian, Arabic, Aragonese, Armenian, Asturian, Azerbaijani, Bashkir, Basque, Bavarian, Belarusian, Bengali, Bishnupriya Manipuri, Bosnian, Breton, Bulgarian, Burmese, Catalan, Cebuano, Chechen, Chinese (Simplified), Chinese (Traditional), Chuvash, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian, Hebrew, Hindi, Hungarian, Icelandic, Ido, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Kirghiz, Korean, Latin, Latvian, Lithuanian, Lombard, Low Saxon, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Marathi, Minangkabau, Nepali, Newar, Norwegian (Bokmal), Norwegian (Nynorsk), Occitan, Persian (Farsi), Piedmontese, Polish, Portuguese, Punjabi, Romanian, Russian, Scots, Serbian, Serbo-Croatian, Sicilian, Slovak, Slovenian, South Azerbaijani, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Volapük, Waray-Waray, Welsh, West Frisian, Western Punjabi, Yoruba
Our multilingual text moderation model is based on BERT, a transformer-based machine learning technique for natural language processing (NLP) pre-training developed by Google. It was fine-tuned on Kaggle's Toxic Comment Classification Challenge dataset.
Pro Tip
You can perform audio moderation by first converting the audio to text (for example, with Facebook's ASR model) and then running this multilingual text moderation model on the generated transcript.
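The two-step pipeline above can be sketched as follows. Both inner functions are hypothetical stand-ins: in practice `transcribe` would call an ASR model and `moderate_text` would call the multilingual text moderation model; only the composition is the point here.

```python
def transcribe(audio_path: str) -> str:
    # Placeholder: a real implementation would run speech-to-text on the file.
    return "you are an idiot"

def moderate_text(text: str) -> dict[str, float]:
    # Placeholder: a real implementation would call the moderation model.
    return {"toxic": 0.95, "insult": 0.90, "obscene": 0.10,
            "identity_hate": 0.0, "severe_toxic": 0.10, "threat": 0.0}

def moderate_audio(audio_path: str, threshold: float = 0.5) -> list[str]:
    """Transcribe the audio, then flag moderation concepts above the threshold."""
    scores = moderate_text(transcribe(audio_path))
    return sorted(name for name, p in scores.items() if p >= threshold)

print(moderate_audio("clip.wav"))  # ['insult', 'toxic']
```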
BERT
Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based machine learning technique for NLP pre-training developed by Google. The BERT model took less than a year to become an established baseline in NLP experiments.
There are two original English-language BERT models:
- BERT-BASE: 12 encoder layers, a hidden size of 768, and 12 bidirectional self-attention heads (~110M parameters).
- BERT-LARGE: 24 encoder layers, a hidden size of 1,024, and 16 bidirectional self-attention heads (~340M parameters).
Both models were pre-trained on unlabeled data: the BooksCorpus (800M words) and English Wikipedia (2,500M words).
BERT is a transformer language model with a variable number of encoder layers and self-attention heads. It was pre-trained with two tasks in mind:
- Masked language modeling: 15% of input tokens were masked, and BERT was trained to predict them from the surrounding context.
- Next sentence prediction: BERT was trained to predict whether a candidate next sentence actually follows the first sentence.
After pretraining, BERT learns contextual embeddings for words. It is then fine-tuned on smaller datasets to address specific tasks.
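The masked-language-modeling setup can be sketched as below. This is a deliberate simplification (it masks whole tokens with `[MASK]` only, whereas BERT also replaces some selected tokens with random tokens or leaves them unchanged) and uses toy whitespace tokens rather than a real tokenizer.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens: list[str], mask_prob: float = 0.15, seed: int = 0):
    """Mask ~15% of tokens; return the masked sequence and the
    position -> original-token targets the model must predict."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # training target: recover this token
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_tokens("bert learns bidirectional context from text".split())
```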
BERT Paper
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Abstract
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
Transformer Paper
Attention Is All You Need
Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Figures from the paper: the Transformer model architecture, Scaled Dot-Product Attention, and Multi-Head Attention.
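For reference, the two attention operations named above are defined in the Transformer paper as:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

```latex
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O},
\qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\, KW_i^{K},\, VW_i^{V})
```

Here $d_k$ is the key dimension; the $\sqrt{d_k}$ scaling keeps the dot products from growing large and saturating the softmax.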
Dataset
Our multilingual text moderation model was trained on Kaggle's Toxic Comment Classification Challenge dataset. The dataset was released for a 2018 Kaggle competition that challenged participants to build a multi-headed model capable of detecting different types of toxicity (threats, obscenity, insults, and identity-based hate) better than Conversation AI's Perspective API.
From Kaggle's challenge description:
The Conversation AI team, a research initiative founded by Jigsaw and Google (both a part of Alphabet) are working on tools to help improve online conversation. One area of focus is the study of negative online behaviors, like toxic comments (i.e. comments that are rude, disrespectful or otherwise likely to make someone leave a discussion). So far they’ve built a range of publicly available models served through the Perspective API, including toxicity. But the current models still make errors, and they don’t allow users to select which types of toxicity they’re interested in finding (e.g. some platforms may be fine with profanity, but not with other types of toxic content).
Model Details
- ID: moderation-multilingual-text-classification
- Name: Multilingual Text Moderation
- Model Type ID: Text Classifier
- Description: Multilingual text classification model of concepts: toxic, insult, obscene, identity_hate, severe_toxic, and threat; in the selected language.
- Last Updated: Oct 16, 2024
- Privacy: PUBLIC