Goweder, Abduelbaset; Poesio, Massimo; De Roeck, Anne and Reynolds, Jeff (2004).
URL: http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Goweder...
Abstract
Identifying irregular (so-called broken) plurals in Modern Standard Arabic is a problematic issue for information retrieval (IR) and language engineering applications, but the effect of broken plurals on IR performance has never been examined. Broken plurals (BPs) are formed by altering the singular (as in English: tooth → teeth) through the application of interdigitating patterns on stems, so the singular forms cannot be recovered by standard affix-stripping stemming techniques. We developed several methods for BP detection and evaluated them on an unseen test set. We incorporated the BP detection component into a new light-stemming algorithm that conflates both regular and broken plurals with their singular forms. We also evaluated the new light-stemming algorithm in the context of information retrieval, comparing its performance with that of other stemming algorithms.
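The point about affix stripping can be illustrated with a toy sketch. This is not the paper's detection method or stemmer: the suffix list, the `C`-slot pattern notation, the function names, and the simplified Latin transliterations (`kitaab` "book", `kutub` "books", `mudarris` "teacher") are all assumptions made for illustration only.

```python
# Toy illustration only: simplified transliterations, hypothetical helper names.
# It shows why suffix stripping recovers regular plurals but not broken ones.

# Regular ("sound") plural endings in a simplified transliteration (assumed list).
SOUND_PLURAL_SUFFIXES = ("uun", "iin", "aat")


def strip_plural_suffix(word: str) -> str:
    """Remove a regular plural suffix if present; otherwise return the word unchanged."""
    for suffix in SOUND_PLURAL_SUFFIXES:
        # Require a few characters to remain so very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def interdigitate(root: str, pattern: str) -> str:
    """Fill each consonant slot 'C' in the pattern with the next root consonant."""
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


# The same root with two different vowel patterns gives singular and broken plural.
singular = interdigitate("ktb", "CiCaaC")       # kitaab
broken_plural = interdigitate("ktb", "CuCuC")   # kutub

# A regular plural is recovered by stripping its suffix.
regular_stem = strip_plural_suffix("mudarrisuun")  # mudarris

# A broken plural has no suffix to strip, so it comes back unchanged
# and never matches its singular.
unchanged = strip_plural_suffix(broken_plural)
```

Because the plural is marked inside the word rather than at its edge, `strip_plural_suffix` leaves `kutub` untouched, which is why a separate broken-plural detection step is needed before conflation.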