Twitter-roBERTa-base for Sentiment Analysis

This is a RoBERTa-base model trained on ~58M tweets and fine-tuned for sentiment analysis with the TweetEval benchmark. The model is intended for English text.

The model outputs one of three labels denoting the sentiment of the input text:

0 -> Negative
1 -> Neutral
2 -> Positive
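The classifier produces one score per class, and the index of the highest score selects the label above. A minimal Python sketch, assuming the model returns raw logits in the order 0, 1, 2 (the logit values are made up for illustration):

```python
import math

# Index-to-label mapping, taken from the list above.
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for a single input text.
label, prob = decode([-1.2, 0.3, 2.1])
print(label, round(prob, 3))
```

Softmax does not change which class wins; it only turns the scores into probabilities that are easier to threshold or report.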

Pro Tip

Note that the sentiment analysis model is intended for English text. If you need to run this workflow on text in other languages, create a derived custom workflow and insert a text translation model before the sentiment analysis model. This ensures that the sentiment analysis model receives its input text in English.
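Conceptually, the derived workflow is function composition: translation runs first, and its English output becomes the sentiment model's input. A minimal Python sketch of that ordering; `make_workflow`, `toy_translate`, and `toy_classify` are hypothetical stand-ins, not part of any platform API, and the keyword-based classifier only imitates the real model's output labels.

```python
from typing import Callable

# Hypothetical stage signatures: each stage takes text and returns a result.
Translator = Callable[[str], str]
Classifier = Callable[[str], str]


def make_workflow(translate: Translator, classify: Classifier) -> Callable[[str], str]:
    """Chain a translation stage in front of the sentiment stage."""
    def run(text: str) -> str:
        english = translate(text)  # make sure the classifier sees English
        return classify(english)
    return run


# Toy stand-ins so the sketch runs without any model or service.
def toy_translate(text: str) -> str:
    phrasebook = {"me encanta": "I love it", "es terrible": "it is terrible"}
    return phrasebook.get(text.lower(), text)  # pass through unknown text


def toy_classify(text: str) -> str:
    lowered = text.lower()
    if "love" in lowered:
        return "Positive"
    if "terrible" in lowered:
        return "Negative"
    return "Neutral"


workflow = make_workflow(toy_translate, toy_classify)
for sample in ["Me encanta", "Es terrible", "Hola"]:
    print(sample, "->", workflow(sample))
```

Keeping each stage behind a plain callable makes it easy to swap the toy functions for real translation and sentiment models without changing the ordering logic.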

More Info

Repositories:
- Original: GitHub
- Latest: GitHub
Hugging Face docs:
- Original: cardiffnlp/twitter-roberta-base-sentiment
- Latest: cardiffnlp/twitter-roberta-base-sentiment-latest
- Multilingual: cardiffnlp/twitter-xlm-roberta-base-sentiment
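The Hugging Face model cards for these Cardiff NLP checkpoints suggest normalizing tweets before inference by replacing user mentions with the placeholder `@user` and links with `http`. The sketch below follows that convention; treat it as an assumption to verify against the model card of the checkpoint you actually use.

```python
def preprocess(text: str) -> str:
    """Replace @mentions with '@user' and links with 'http', token by token."""
    tokens = []
    for token in text.split(" "):
        if token.startswith("@") and len(token) > 1:
            token = "@user"  # anonymize user mentions
        elif token.startswith("http"):
            token = "http"  # collapse URLs to a single placeholder
        tokens.append(token)
    return " ".join(tokens)


print(preprocess("@cardiffnlp loved the demo https://example.com"))
```

Splitting on single spaces keeps the original spacing intact, so only mention and link tokens change.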

Papers

Original

TWEETEVAL: Unified Benchmark and Comparative Evaluation for Tweet Classification

Authors: Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves, Luis Espinosa-Anke

Abstract

The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, neither a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TWEETEVAL) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pretrained generic language models, and continue training them on Twitter corpora.

Latest

TimeLMs: Diachronic Language Models from Twitter

Authors: Daniel Loureiro, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, Jose Camacho-Collados

Abstract

Despite its importance, the time variable has been largely neglected in the NLP and language model literature. In this paper, we present TimeLMs, a set of language models specialized on diachronic Twitter data. We show that a continual learning strategy contributes to enhancing Twitter-based language models’ capacity to deal with future and out-of-distribution tweets, while making them competitive with standardized and more monolithic benchmarks. We also perform a number of qualitative analyses showing how they cope with trends and peaks in activity involving specific named entities or concept drift.

Model Type ID: Text Classifier
Input Type: text
Output Type: concepts
Description: Text sentiment analysis with 3 classes: positive, negative, neutral.
Last Updated: Aug 03, 2022
Privacy: PUBLIC