Abstract:
The automatic detection and interpretation of figurative language remain critical challenges in Natural Language Processing (NLP), particularly for languages with limited annotated resources. In this work, we introduce 6-Figure, a novel dataset designed to facilitate the computational analysis of figurative speech in Arabic. The dataset covers six figures of speech: metaphors, idioms, similes, metonymy, hyperboles, and euphemisms, each annotated at the sentence level. It is compiled from several prior research works, providing a high-quality benchmark for the Arabic language.
We provide baseline results using transformer-based and recurrent models, highlighting key challenges and areas for future research. The dataset is intended as a resource for advancing figurative language understanding and improving NLP models for Arabic. The dataset and annotation guidelines will be publicly released to encourage further research.