Identifying broken plurals in unvowelized Arabic text

Goweder, Abduelbaset; Poesio, Massimo; De Roeck, Anne and Reynolds, Jeff (2004). Identifying broken plurals in unvowelized Arabic text. In: 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP-2004), 24-25 Jul 2004, Barcelona, Spain.


Identifying irregular (so-called broken) plurals in Modern Standard Arabic is a problematic issue for information retrieval (IR) and language engineering applications, but the effect of broken plurals on IR performance has never been examined. Broken plurals (BPs) are formed by altering the singular (as in English: tooth, teeth) through the application of interdigitating patterns on stems, so the singular forms cannot be recovered by standard affix-stripping stemming techniques. We developed several methods for BP detection and evaluated them on an unseen test set. We incorporated the BP detection component into a new light-stemming algorithm that conflates both regular and broken plurals with their singular forms. We also evaluated the new light-stemming algorithm in the context of information retrieval, comparing its performance with that of other stemming algorithms.
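To illustrate why affix stripping alone does not recover the singular of a broken plural, here is a minimal Python sketch. It is not the authors' method: the suffix list, the function names, and the single template check (the common pattern أفعال) are assumptions chosen for illustration only.

```python
# Illustrative sketch only; not the detection methods evaluated in the paper.
# The suffix list below is an assumed, simplified set of sound-plural endings.
SOUND_PLURAL_SUFFIXES = ("ون", "ين", "ات")


def strip_sound_plural(word: str) -> str:
    """Remove one sound-plural suffix, keeping at least three letters."""
    for suffix in SOUND_PLURAL_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches_afaal(word: str) -> bool:
    """True if the unvowelized word fits the template أفعال:
    five letters, initial hamza-on-alef, alef in fourth position."""
    return len(word) == 5 and word[0] == "أ" and word[3] == "ا"


def singular_from_afaal(word: str) -> str:
    """Drop the two template letters, leaving the three root consonants."""
    return word[1:3] + word[4]


# Regular (sound) plural: suffix stripping recovers the singular.
teachers = "معلمون"   # "teachers"; the singular is "معلم"
# Broken plural: there is no suffix to strip, so the word is left unchanged.
boys = "أولاد"        # "boys"; the singular is "ولد"

print(strip_sound_plural(teachers))
print(strip_sound_plural(boys))
if matches_afaal(boys):
    print(singular_from_afaal(boys))
```

A single template check like this may over-match or recover the wrong singular for some words, which is one reason detection in unvowelized text needs more than a list of surface patterns.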

Item Type: Conference or Workshop Item
Extra Information: pp 246-253
Keywords: Arabic information retrieval; broken plural
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Research Group: Centre for Research in Computing (CRC)
Item ID: 5004
Depositing User: Anne De Roeck
Date Deposited: 26 Sep 2006
Last Modified: 07 Dec 2018 08:57