AfriSenti-SemEval Shared Task 12
AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset
Contact organizers at: email@example.com
Join Task Slack Channel to communicate with the organizers.
Dataset is released: visit the CodaLab competition website.
Due to the widespread use of the Internet and social media platforms, most languages are becoming digitally available. This allows for various artificial intelligence (AI) applications that enable tasks such as sentiment analysis, machine translation and hateful content detection. According to UNESCO (2003), 30% of all living languages, around 2,058, are African languages. However, most of these languages do not have curated datasets for developing such AI applications. Recently, various individual and funded initiatives, such as the Lacuna Fund, have set out to reverse this trend and create such datasets for African languages. However, research is required to determine both the suitability of current natural language processing (NLP) techniques and the development of novel techniques to maximize the applications of such datasets.
There has been growing interest in sentiment analysis, which applies to many domains, including public health, commerce/business, art and literature, social sciences, neuroscience, and psychology (Mohammad, 2022). Previous shared tasks on sentiment analysis include Mohammad et al. (2018), Nakov et al. (2016), Ghosh et al. (2015), and Pontiki et al. (2014). However, none of these tasks included African languages. Although Mohammad et al. (2018) included Standard Arabic, we focus on Arabic dialects from African countries: Algerian Arabic and Tunisian Arabizi. We believe that SemEval, given its popularity and widespread acceptance, is the right venue for shared tasks on African languages and for strengthening their further development.
In this shared task, we cover 13 African languages: 4 languages from Nigeria (Hausa, Yoruba, Igbo, Nigerian Pidgin), 3 from Ethiopia (Amharic, Tigrinya, and Oromo), Swahili from Kenya and Tanzania, Algerian Arabic from Algeria, Kinyarwanda from Rwanda, Twi from Ghana, Mozambican Portuguese from Mozambique, 3 languages from South Africa (isiZulu, Setswana, and Tsonga (https://en.wikipedia.org/wiki/Tsonga_language)), and Moroccan Arabic/Darija (https://en.wikipedia.org/wiki/Moroccan_Arabic).
The AfriSenti-SemEval Shared Task 12 is based on a collection of Twitter datasets in 14 African languages for sentiment classification. It consists of three sub-tasks. Participants can select one or more tasks depending on their preference.
Task A: Monolingual Sentiment Classification
Given training data in a target language, determine the polarity of a tweet in the target language (positive, negative, or neutral). If a tweet conveys both a positive and a negative sentiment, whichever is the stronger sentiment should be chosen. This sub-task has 12 tracks:
- Track 1: Hausa
- Track 2: Yoruba
- Track 3: Igbo
- Track 4: Nigerian_Pidgin
- Track 5: Amharic
- Track 6: Algerian Arabic
- Track 7: Kinyarwanda
- Track 8: Twi
- Track 9: Mozambican Portuguese
- Track 10: Swahili
- Track 11: Setswana
- Track 12: isiZulu
- Track 13: Moroccan Arabic/Darija
Note: Tweets in each language are code-mixed. Read our NaijaSenti paper for more information.
Task B: Multilingual Sentiment Classification
Given combined training data from 10 African languages, determine the polarity of a tweet in the target language (positive, negative, or neutral). This sub-task has only one track:
- Track 13: All languages in Task A
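As a rough illustration of what "combined training data" could mean in practice, the sketch below merges per-language training sets into one list while keeping a language tag on every example. The data layout, the `combine_languages` helper, and the placeholder tweets are all assumptions made for this example; they do not describe the official data format.

```python
from typing import Dict, List, Tuple

# Assumed layout: each example is (tweet text, sentiment label).
Example = Tuple[str, str]
TaggedExample = Tuple[str, str, str]  # (language, tweet text, sentiment label)


def combine_languages(per_language: Dict[str, List[Example]]) -> List[TaggedExample]:
    """Merge per-language training sets into one multilingual list.

    The language name is kept on every example so that a model (or an
    error analysis) can still tell the sources apart.
    """
    combined: List[TaggedExample] = []
    for language in sorted(per_language):  # sorted for a reproducible order
        for text, label in per_language[language]:
            combined.append((language, text, label))
    return combined


# Placeholder data only; these are not real tweets from the dataset.
toy_data = {
    "yoruba": [("<placeholder tweet 2>", "neutral")],
    "hausa": [("<placeholder tweet 1>", "positive")],
}
combined = combine_languages(toy_data)
```

Keeping the language tag is a design choice, not a requirement of the task: it makes it easy to report per-language results from a single multilingual model.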
Task C: Zero-Shot Sentiment Classification
Given unlabeled tweets in two African languages (Tigrinya and Xitsonga), leverage any or all of the available training datasets in Tasks A and B to determine whether the sentiment of a tweet in the two target languages is positive, negative, or neutral. This task has two tracks:
- Track 14: Zero-Shot on Tigrinya
- Track 15: Zero-Shot on Tsonga
- Track 16: Zero-Shot on Oromo
The dataset consists of tweets labeled with three sentiment classes (positive, negative, neutral) in 14 African languages. Each tweet is annotated by three annotators following the annotation guidelines in Mohammad (2016). We use a form of majority vote to determine the sentiment of the tweet. See more in our papers (Muhammad et al., 2022; Yimam et al., 2020). Below is a sample dataset for the 4 Nigerian languages (Muhammad et al., 2022):
The datasets are available on GitHub.
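The annotation procedure above is described only as "a form of majority vote", so the exact rule is not given here. The sketch below assumes the simplest variant: a label is accepted when more than half of the annotators agree, and three-way disagreement is left unresolved (`None`). The `majority_label` function is hypothetical and is not the organizers' implementation.

```python
from collections import Counter
from typing import Optional, Sequence

LABELS = ("positive", "negative", "neutral")


def majority_label(annotations: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, else None.

    Assumption: plain strict majority. The actual procedure used for the
    dataset may resolve disagreements differently.
    """
    for label in annotations:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    if not annotations:
        return None
    label, count = Counter(annotations).most_common(1)[0]
    # With three annotators, a strict majority means at least 2 votes.
    return label if count > len(annotations) / 2 else None


agreed = majority_label(["positive", "positive", "neutral"])
unresolved = majority_label(["positive", "negative", "neutral"])
```

Here `agreed` is `"positive"`, while `unresolved` is `None` because no label has two of the three votes.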
Why Participate?
- Promote NLP research involving African languages.
- Opportunity to write and submit a system-description paper to the SemEval-2023 workshop, to be co-located with ACL 2023.
- Stand a chance to win an award.
- Opportunity to network with renowned experts in NLP and AI.
We will soon release a starter kit for all three sub-tasks. The starter kit will be a Colab notebook that can be used to create a baseline system. Stay tuned.
|Sample Data Ready||~~15 July 2022~~|
|Training Data Ready||11 September 2022|
|Evaluation Start||10 January 2023|
|Evaluation End||31 January 2023|
|System Description Paper Due||February 2023|
|Notification to authors||March 2023|
|Camera ready due||April 2023|
|SemEval workshop||Summer 2023 (co-located with a major NLP conference)|
All deadlines are 23:59 UTC-12 ("anywhere on Earth").
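To see what an "anywhere on Earth" deadline means in UTC, the sketch below converts 23:59 at UTC-12 on a given date. The helper name `aoe_deadline_in_utc` is hypothetical; the example date is the Evaluation End date from the table above.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is the fixed offset UTC-12.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return 23:59 AoE on the given date, expressed in UTC."""
    local_deadline = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local_deadline.astimezone(timezone.utc)


evaluation_end = aoe_deadline_in_utc(2023, 1, 31)
print(evaluation_end.isoformat())  # 2023-02-01T11:59:00+00:00
```

In other words, a deadline of 31 January 2023 AoE falls at 11:59 UTC on 1 February 2023.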
- Join Task Mailing List
- Join Task Slack Channel to communicate with the organizers.
- Contact Organizers: firstname.lastname@example.org
Previous Shared Tasks
- UNESCO. 2003. Sharing the world of difference. UNESCO.
- Saif M. Mohammad. 2022. Ethics sheet for automatic emotion recognition and sentiment analysis. Computational Linguistics, 48(2):239–278.
- Preslav Nakov, Sara Rosenthal, Svetlana Kiritchenko, Saif M Mohammad, Zornitsa Kozareva, Alan Ritter, Veselin Stoyanov, and Xiaodan Zhu. 2016. Developing a successful SemEval task in sentiment analysis of twitter and other social media texts. Language Resources and Evaluation, 50(1):35–65.
- Saif Mohammad et al. 2018. SemEval-2018 Task 1: Affect in tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation.
- Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. Dublin, Ireland.
- Saif Mohammad. 2016. A Practical Guide to Sentiment Annotation: Challenges and Solutions. In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 174–179, San Diego, California. Association for Computational Linguistics.
- Aniruddha Ghosh, Guofu Li, Tony Veale, Paolo Rosso, Ekaterina Shutova, John Barnden, Antonio Reyes. 2015. SemEval-2015 Task 11: Sentiment Analysis of Figurative Language in Twitter. Denver, Colorado.
- Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Sebastian Ruder, Ibrahim Said Ahmad, Idris Abdulmumin, Bello Shehu Bello, Monojit Choudhury, Chris Chinenye Emezue, Saheed Salahudeen Abdullahi, Anuoluwapo Aremu, Alipio Jeorge, Pavel Brazdil. 2022. NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis. Marseille, France.
- Seid Muhie Yimam, Hizkiel Mitiku Alemayehu, Abinew Ayele, Chris Biemann. 2020. Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models. Barcelona, Spain (Online).