This workflow is wrapped around the Multilingual text moderation model that classifies text as toxic, insult, obscene, identity_hate and severe_toxic.

Text Moderation Classifier

Multilingual Text Moderation Model

The multilingual text moderation model analyzes text and detects harmful content. It is especially useful for moderating user-generated or third-party content before it is published.

This model returns a list of concepts along with their corresponding probability scores indicating the likelihood that these concepts are present in the text. The list of concepts includes:

  • toxic
  • insult
  • obscene
  • identity_hate
  • severe_toxic
  • threat

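Because each concept comes back with a probability score, a common pattern is to flag text when any score crosses a threshold you choose. The sketch below uses hypothetical scores in a plain dictionary to stand in for the model's response; both the values and the 0.5 cutoff are assumptions for illustration, not recommendations:

```python
# Hypothetical scores as the model might return them (illustrative values only)
scores = {
    "toxic": 0.92,
    "insult": 0.40,
    "obscene": 0.15,
    "identity_hate": 0.03,
    "severe_toxic": 0.08,
    "threat": 0.01,
}

THRESHOLD = 0.5  # assumption: pick a cutoff suited to your own moderation policy

# Collect every concept whose probability meets or exceeds the threshold
flagged = [name for name, p in scores.items() if p >= THRESHOLD]
print(flagged)  # ['toxic']
```

A lower threshold catches more borderline content at the cost of more false positives, so the right cutoff depends on how strict your moderation needs to be.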
Text moderation can be performed in the top 100 languages with the largest Wikipedias:

Afrikaans, Albanian, Arabic, Aragonese, Armenian, Asturian, Azerbaijani, Bashkir, Basque, Bavarian, Belarusian, Bengali, Bishnupriya Manipuri, Bosnian, Breton, Bulgarian, Burmese, Catalan, Cebuano, Chechen, Chinese (Simplified), Chinese (Traditional), Chuvash, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian, Hebrew, Hindi, Hungarian, Icelandic, Ido, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Kirghiz, Korean, Latin, Latvian, Lithuanian, Lombard, Low Saxon, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Marathi, Minangkabau, Nepali, Newar, Norwegian (Bokmal), Norwegian (Nynorsk), Occitan, Persian (Farsi), Piedmontese, Polish, Portuguese, Punjabi, Romanian, Russian, Scots, Serbian, Serbo-Croatian, Sicilian, Slovak, Slovenian, South Azerbaijani, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Volapük, Waray-Waray, Welsh, West Frisian, Western Punjabi, Yoruba

The multilingual text moderation model is based on the BERT NLP model architecture, a transformer-based machine learning technique for natural language processing (NLP) pre-training developed by Google. It was fine-tuned on the dataset from Kaggle's Toxic Comment Classification Challenge.

How to Use the Multilingual Text Moderation Workflow

Using Clarifai SDK

Export your PAT as an environment variable. Then, import and initialize the API Client.

Find your PAT in your security settings.

export CLARIFAI_PAT={your personal access token}

Prediction with the workflow

from clarifai.client.workflow import Workflow

# URL of the multilingual text moderation workflow
workflow_url = 'https://clarifai.com/{{user_id}}/text-moderation/workflows/multilingual-text-moderation-classifier'

text = 'I love this movie and i would watch it again and again!'

# Send the text to the workflow for prediction
prediction = Workflow(workflow_url).predict_by_bytes(text.encode(), input_type="text")

# Get the results from the workflow's final node
print(prediction.results[0].outputs[-1].data)
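The final node's `data` field contains the predicted concepts, each with a name and a probability. A minimal sketch of ranking them by score, shown here on plain (name, value) pairs standing in for the response's concept objects (the values are illustrative assumptions, not real model output):

```python
# Stand-in for the concepts returned in prediction.results[0].outputs[-1].data
# (illustrative values only)
concepts = [("insult", 0.02), ("toxic", 0.05), ("obscene", 0.01)]

# Rank concepts from most to least likely
ranked = sorted(concepts, key=lambda c: c[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value:.2f}")
```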

Using the Workflow

To use the Multilingual Text Moderation workflow, you can input text through the blue plus Try your own Input button, and it will classify the text as toxic, insult, obscene, identity_hate, and severe_toxic.

  • Workflow ID
    multilingual-text-moderation-classifier
  • Description
    This workflow is wrapped around the Multilingual text moderation model that classifies text as toxic, insult, obscene, identity_hate and severe_toxic.
  • Last Updated
    Apr 09, 2024
  • Privacy
    PUBLIC