MEMpHIS: Towards a New Benchmark for Arabic Figurative Speech Classification

Tracking #: 918-1898

Authors:

	Name	ORCID
	Zouheir Banou	https://orcid.org/0000-0002-6427-2342
	Sanaa El Filali	https://orcid.org/0000-0002-8933-1564
	El Habib Benlahmar	https://orcid.org/0000-0001-7098-4621
	Fatima-Zahra Alaoui	https://orcid.org/0000-0002-6905-845X
	Laila Eljiani	https://orcid.org/0000-0002-1797-7446

Submission Type:

Research Paper

Abstract:

The automatic detection and interpretation of figurative language remain critical challenges in Natural Language Processing (NLP), particularly for languages with limited annotated resources. In this work, we introduce 6-Figure, a novel dataset designed to facilitate the computational analysis of figurative speech in Arabic. The dataset covers six figures of speech (metaphors, idioms, similes, metonymy, hyperboles, and euphemisms), annotated at the sentence level. It is compiled from previously published research works, which helps ensure a good-quality benchmark for the Arabic language. We provide baseline results using transformer-based and recurrent models, highlighting key challenges and areas for future research. This dataset serves as a valuable resource for advancing figurative language understanding and improving NLP models for Arabic. The dataset and annotation guidelines will be publicly released to encourage further research.

Manuscript:

ds-paper-918.pdf

Data repository URLs:

ZOUHEIRBN/MEMPHIS-data

Date of Submission:

Monday, June 2, 2025

Date of Decision:

Monday, June 16, 2025

Nanopublication URLs:

Decision:

Reject (Pre-Screening)

Data Science