AfriSenti-SemEval Shared Task 12

AfriSenti-SemEval: Sentiment Analysis for African Languages

Part of the 17th International Workshop on Semantic Evaluation

Please use the following BibTeX entries to cite us if you use our dataset:

SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval).

@inproceedings{muhammadSemEval2023,
title = {{SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)}},
author = {Shamsuddeen Hassan Muhammad and Idris Abdulmumin and Seid Muhie Yimam and David Ifeoluwa Adelani and Ibrahim Sa'id Ahmad and Nedjma Ousidhoum and Abinew Ali Ayele and Saif M. Mohammad and Meriem Beloucif and Sebastian Ruder},
booktitle = {Proceedings of the 17th {{International Workshop}} on {{Semantic Evaluation}} ({{SemEval-2023}})},
publisher = {{Association for Computational Linguistics}},
year = {2023},
url = {https://arxiv.org/pdf/2304.06845.pdf}
}

AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages.

@misc{muhammad2023afrisenti,
title={{AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages}},
author={Shamsuddeen Hassan Muhammad and Idris Abdulmumin and Abinew Ali Ayele and Nedjma Ousidhoum and David Ifeoluwa Adelani and Seid Muhie Yimam and Ibrahim Sa'id Ahmad and Meriem Beloucif and Saif M. Mohammad and Sebastian Ruder and Oumaima Hourrane and Pavel Brazdil and Felermino Dário Mário António Ali and Davis David and Salomey Osei and Bello Shehu Bello and Falalu Ibrahim and Tajuddeen Gwadabe and Samuel Rutunda and Tadesse Belay and Wendimu Baye Messelle and Hailu Beshada Balcha and Sisay Adugna Chala and Hagos Tesfahun Gebremichael and Bernard Opoku and Steven Arthur},
year={2023},
doi={10.48550/arXiv.2302.08956},
url={https://arxiv.org/pdf/2302.08956.pdf}
}

Contact organizers at: afrisenti-semeval-organizers@googlegroups.com

Visit CodaLab competition website

The AfriSenti dataset is available at the task's GitHub repo.


Motivation

Due to the widespread use of the Internet and social media platforms, most languages are becoming digitally available. This allows for various artificial intelligence (AI) applications that enable tasks such as sentiment analysis, machine translation and hateful content detection. According to UNESCO (2003), 30% of all living languages, around 2,058, are African languages. However, most of these languages do not have curated datasets for developing such AI applications. Recently, various individual and funded initiatives, such as the Lacuna Fund, have set out to reverse this trend and create such datasets for African languages. However, research is required to determine both the suitability of current natural language processing (NLP) techniques and the development of novel techniques to maximize the applications of such datasets.

There has been growing interest in sentiment analysis, which applies to many domains, including public health, commerce/business, art and literature, social sciences, neuroscience, and psychology (Mohammad, 2022). Previous shared tasks on sentiment analysis include Mohammad et al. (2018), Nakov et al. (2016), Ghosh et al. (2015), and Pontiki et al. (2014). However, none of these tasks included African languages. Although Mohammad et al. (2018) included standard Arabic, we focus on Arabic dialects from African countries: Algerian Arabic and Tunisian Arabizi. We believe SemEval, given its popularity and widespread acceptance, is the right venue to carry out shared tasks for African languages and strengthen their further development.

In this shared task, we cover 17 African languages, including Hausa, Yoruba, Igbo, and Nigerian Pidgin from Nigeria; Amharic, Tigrinya, and Oromo from Ethiopia; Swahili from Kenya and Tanzania; Algerian Arabic from Algeria; Kinyarwanda from Rwanda; Twi from Ghana; Mozambican Portuguese from Mozambique; and Moroccan Arabic/Darija from Morocco.

Task Overview

The AfriSenti-SemEval Shared Task 12 is based on a collection of Twitter datasets in 14 African languages for sentiment classification. It consists of three sub-tasks. Participants can select one or more sub-tasks depending on their preference. Within each sub-task, participants may also take part in any number of languages.

Task A: Monolingual Sentiment Classification

Given training data in a target language, determine the polarity of a tweet in the target language (positive, negative, or neutral). If a tweet conveys both a positive and negative sentiment, whichever is the stronger sentiment should be chosen. This sub-task has 15 tracks:

Note: You are free to select one or more tracks in this sub-task.

  • Track 1: Hausa 
  • Track 2: Yoruba
  • Track 3: Igbo
  • Track 4: Nigerian_Pidgin
  • Track 5: Amharic
  • Track 6: Algerian Arabic
  • Track 7: Moroccan Arabic/Darija
  • Track 8: Swahili
  • Track 9: Kinyarwanda
  • Track 10: Twi
  • Track 11: Mozambican Portuguese
  • Track 12: Xitsonga (Mozambique Dialect)
  • Track 13: Setswana (data to be released soon)
  • Track 14: isiZulu (data to be released soon)
  • Track 15: Xitsonga (South African Dialect; data to be released soon)

Note: Tweets in each language are code-mixed. Read our NaijaSenti paper for more information.

Task B: Multilingual Sentiment Classification

Given combined training data from Task A (Tracks 1 to 12), determine the polarity of a tweet in the target language (positive, negative, or neutral). This sub-task has only one track with 12 languages (Hausa, Yoruba, Igbo, Nigerian_Pidgin, Amharic, Algerian Arabic, Moroccan Arabic/Darija, Swahili, Kinyarwanda, Twi, Mozambican Portuguese, and Xitsonga (Mozambique Dialect)):

  • Track 16: 12 languages in Task A
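As a rough illustration of what "combined training data" can mean in practice, the sketch below pools per-language training sets into one multilingual set and keeps a language tag on every row. The toy tweets, the dictionary layout, and the function name `pool_training_data` are assumptions made for this example; they are not the task's actual file format or starter-kit code.

```python
from collections import Counter

# Hypothetical toy data: one list of (tweet, label) pairs per language.
# The real AfriSenti files and their column names are not reproduced here.
train_by_language = {
    "hausa": [("tweet ha 1", "positive"), ("tweet ha 2", "negative")],
    "yoruba": [("tweet yo 1", "neutral")],
    "igbo": [("tweet ig 1", "positive")],
}


def pool_training_data(per_language):
    """Merge per-language training sets into one list, tagging each row with its language."""
    pooled = []
    for language in sorted(per_language):  # sorted for a reproducible order
        for text, label in per_language[language]:
            pooled.append({"language": language, "text": text, "label": label})
    return pooled


pooled = pool_training_data(train_by_language)
label_counts = Counter(row["label"] for row in pooled)
print(len(pooled), dict(label_counts))
```

Keeping the language tag lets a multilingual system report per-language results later, even though it is trained on the pooled set.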

Task C: Zero-Shot Sentiment Classification

Given unlabelled tweets in two African languages (Tigrinya and Oromo), leverage any or all of the available training datasets (in Task A) to determine the sentiment of a tweet in the two target languages. This task has two tracks.

Note: You are free to select one or more tracks in this sub-task.

  • Track 17: Zero-Shot on Tigrinya
  • Track 18: Zero-Shot on Oromo
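The zero-shot constraint can be sketched as follows: labels come only from the source languages, and the target-language tweets are seen without labels. The predictor here is a deliberately trivial majority-class baseline chosen only to keep the example short; the row layout, the function names, and the toy tweets are all assumptions, and a real system would substitute a proper cross-lingual model.

```python
from collections import Counter

ZERO_SHOT_TARGETS = {"tigrinya", "oromo"}  # the two Task C target languages


def most_frequent_label(labelled_rows):
    """Return the most common label among (language, text, label) rows."""
    counts = Counter(label for _, _, label in labelled_rows)
    return counts.most_common(1)[0][0]


def zero_shot_predict(source_rows, target_tweets):
    """Label target-language tweets using source-language labels only."""
    # Guard: no labelled example from a target language may be used.
    if any(language in ZERO_SHOT_TARGETS for language, _, _ in source_rows):
        raise ValueError("training data must not include target-language labels")
    fallback = most_frequent_label(source_rows)
    # Trivial baseline: every target tweet gets the majority source label.
    return [fallback for _ in target_tweets]


# Hypothetical toy rows standing in for Task A training data.
source_rows = [
    ("hausa", "tweet 1", "positive"),
    ("amharic", "tweet 2", "negative"),
    ("swahili", "tweet 3", "positive"),
]
predictions = zero_shot_predict(source_rows, ["unlabelled tweet 1", "unlabelled tweet 2"])
print(predictions)
```

The guard makes the zero-shot assumption explicit: if labelled Tigrinya or Oromo rows leak into the training pool, the function refuses to run.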

Dataset Examples

The dataset consists of tweets labeled with three sentiment classes (positive, negative, neutral) in 14 African languages. Each tweet is annotated by three annotators following the annotation guidelines in Mohammad (2016). We use a form of majority vote to determine the sentiment of the tweet. See more in our papers (Muhammad et al., 2022; Yimam et al., 2020). Below is a sample dataset for the four Nigerian languages (Muhammad et al., 2022):

Dataset Example

The datasets are available via the CodaLab competition website.
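The majority-vote step mentioned above can be illustrated with a minimal sketch. The text only says "a form of majority vote", so the exact rule is not specified here; this example assumes a strict majority among the three annotators and returns `None` when all three disagree, which is an assumption of the sketch rather than the organizers' documented procedure.

```python
from collections import Counter


def majority_label(annotations):
    """Return the label chosen by more than half of the annotators, else None."""
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes > len(annotations) / 2 else None


# Two of three annotators agree, so their label wins.
print(majority_label(["positive", "positive", "neutral"]))
# All three disagree, so no label is assigned in this sketch.
print(majority_label(["positive", "negative", "neutral"]))
```

How three-way disagreements are actually resolved is described in the cited papers, not in this sketch.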

Starter kit

We provide a Starter Kit on our GitHub repo that can be used to create a baseline system.

Why Participate?

  • Promote NLP research involving African languages.
  • Opportunity to write a system-description paper that describes your system, resources used, results, and analysis.
  • Stand a chance to win an award.
  • Opportunity to network with renowned experts in the AI and NLP community.

Resources on Paper Submission

Important Dates

Description                     Deadline
Sample Data Ready               15 July 2022
Training Data Ready             11 September 2022
Evaluation Start                10 January 2023
Evaluation End                  31 January 2023
System Description Paper Due    28 February 2023
Notification to Authors         31 March 2023
Camera Ready Due                21 April 2023
SemEval Workshop 2023           13-14 July 2023 (co-located with ACL-2023 in Toronto, Canada)

All deadlines are 23:59 UTC-12 ("anywhere on Earth").

Communication

Previous Shared Tasks

  1. Shared tasks in English: SemEval-2017, SemEval-2016, SemEval-2015, SemEval-2014, SemEval-2013.

  2. Shared tasks in Spanish: TASS-2017, TASS-2016, TASS-2015, TASS-2014, TASS-2013, TASS-2012.

References

  1. UNESCO. 2003. Sharing the world of difference. UNESCO.
  2. Saif M. Mohammad. 2022. Ethics sheet for automatic emotion recognition and sentiment analysis. Computational Linguistics, 48(2):239–278.
  3. Preslav Nakov, Sara Rosenthal, Svetlana Kiritchenko, Saif M Mohammad, Zornitsa Kozareva, Alan Ritter, Veselin Stoyanov, and Xiaodan Zhu. 2016. Developing a successful SemEval task in sentiment analysis of Twitter and other social media texts. Language Resources and Evaluation, 50(1):35–65.
  4. Saif Mohammad et al. 2018. SemEval-2018 Task 1: Affect in Tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation.
  5. Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. Dublin, Ireland.
  6. Saif Mohammad. 2016. A Practical Guide to Sentiment Annotation: Challenges and Solutions. In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 174–179, San Diego, California. Association for Computational Linguistics.
  7. Aniruddha Ghosh, Guofu Li, Tony Veale, Paolo Rosso, Ekaterina Shutova, John Barnden, and Antonio Reyes. 2015. SemEval-2015 Task 11: Sentiment Analysis of Figurative Language in Twitter. Denver, Colorado.
  8. Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Sebastian Ruder, Ibrahim Said Ahmad, Idris Abdulmumin, Bello Shehu Bello, Monojit Choudhury, Chris Chinenye Emezue, Saheed Salahudeen Abdullahi, Anuoluwapo Aremu, Alipio Jeorge, and Pavel Brazdil. 2022. NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis. Marseille, France.
  9. Seid Muhie Yimam, Hizkiel Mitiku Alemayehu, Abinew Ayele, and Chris Biemann. 2020. Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models. Barcelona, Spain (Online).

Funding Acknowledgements

The prize award for this shared task was generously supported by a grant from Lacuna Fund.