Identifying broken plurals in unvowelized Arabic text

Goweder, Abduelbaset; Poesio, Massimo; De Roeck, Anne and Reynolds, Jeff (2004). Identifying broken plurals in unvowelized Arabic text. In: 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP-2004), 24-25 July 2004, Barcelona, Spain.

URL: http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Goweder...

Abstract

Identifying irregular (so-called broken) plurals in Modern Standard Arabic is a known problem for information retrieval (IR) and language engineering applications, but the effect of broken plurals on IR performance has never been examined. Broken plurals (BPs) are formed by altering the singular (compare English tooth/teeth) through the application of interdigitating patterns to stems, so the singular cannot be recovered by standard affix-stripping stemming techniques. We developed several methods for BP detection and evaluated them on an unseen test set. We incorporated the BP detection component into a new light-stemming algorithm that conflates both regular and broken plurals with their singular forms. We also evaluated the new light-stemming algorithm in an information retrieval setting, comparing its performance with that of other stemming algorithms.
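
To illustrate why affix stripping alone misses broken plurals, the following is a minimal sketch and not the method described in the paper. It assumes a simplified Latin transliteration of unvowelized text in which "A" stands for alef, and it uses a single illustrative pattern (alef + C1 + C2 + alef + C3, as in AwlAd "boys" from wld "boy"). The function names, the suffix list, and the pattern table are all hypothetical choices made for this example.

```python
import re

# Assumption: simplified transliteration of unvowelized Arabic, where "A" is
# alef and every other letter is treated as a consonant.
REGULAR_PLURAL_SUFFIXES = ("wn", "yn", "At")  # regular (sound) plural endings

# One illustrative broken-plural pattern: A + C1 + C2 + A + C3.
# A real detector would need many patterns and a way to filter false matches.
AFEAL_PATTERN = re.compile(r"^A([^A])([^A])A([^A])$")


def strip_suffix(word):
    """Plain affix stripping: remove a regular plural suffix if one is present."""
    for suffix in REGULAR_PLURAL_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def match_broken_plural(word):
    """Return a candidate singular (C1C2C3) if the word fits the pattern, else None."""
    match = AFEAL_PATTERN.match(word)
    if match is None:
        return None
    return "".join(match.groups())


def light_stem(word):
    """Try the broken-plural pattern first, then fall back to suffix stripping."""
    candidate = match_broken_plural(word)
    if candidate is not None:
        return candidate
    return strip_suffix(word)


if __name__ == "__main__":
    for w in ("AwlAd", "AqlAm", "mElmwn"):
        print(w, "suffix-stripped:", strip_suffix(w), "light-stemmed:", light_stem(w))
```

In this sketch, suffix stripping leaves AwlAd unchanged because the plural is marked inside the word, while the pattern step proposes wld. A shape-based pattern like this can also match words that are not plurals, so its output is only a candidate singular.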

Item Type: Conference Item
Extra Information: pp 246-253
Keywords: Arabic information retrieval; broken plural
Academic Unit/Department: Mathematics, Computing and Technology
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 5004
Depositing User: Anne De Roeck
Date Deposited: 26 Sep 2006
Last Modified: 02 Dec 2010 19:53
URI: http://oro.open.ac.uk/id/eprint/5004