The Open University

Two-stage statistical language models for text database selection

Yang, Hui and Zhang, Minjie (2006). Two-stage statistical language models for text database selection. Information Retrieval, 9(1) pp. 5–31.



As the number and diversity of distributed Web databases on the Internet increase exponentially, it is difficult for users to know which databases are appropriate to search. Given database language models that describe the content of each database, database selection services can provide assistance in locating databases relevant to the information needs of users. In this paper, we propose a database selection approach based on statistical language modeling. The basic idea behind the approach is that, for databases that are categorized into a topic hierarchy, individual language models are estimated at different search stages, and then the databases are ranked by their similarity to the query according to the estimated language model. Two-stage smoothed language models are presented to circumvent inaccuracy due to word sparseness. Experimental results demonstrate that such a language modeling approach is competitive with current state-of-the-art database selection approaches.
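
The general idea of ranking databases by a smoothed language model can be illustrated with a minimal sketch. It is not the formulation from the paper, whose details the abstract does not give. It assumes unigram models, linear interpolation of each database model first with a model of its topic category and then with a collection-wide background model, and ranking by query log-likelihood. The function names, the interpolation weights `lam_category` and `lam_background`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities for a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def smoothed_prob(word, db_model, category_model, background_model,
                  lam_category=0.3, lam_background=0.2):
    """Interpolate database, category and background estimates.

    The two weights are illustrative assumptions, not values from the paper.
    """
    # Stage 1: back off from the sparse database model to its topic category.
    stage1 = ((1 - lam_category) * db_model.get(word, 0.0)
              + lam_category * category_model.get(word, 0.0))
    # Stage 2: back off to the collection-wide background model.
    return ((1 - lam_background) * stage1
            + lam_background * background_model.get(word, 0.0))


def rank_databases(query_tokens, databases, category_of):
    """Return (name, query log-likelihood) pairs, best first.

    databases:   dict mapping database name -> list of tokens
    category_of: dict mapping database name -> category label
    """
    all_tokens = [t for toks in databases.values() for t in toks]
    background = unigram_model(all_tokens)

    # Pool the tokens of all databases that share a category.
    category_tokens = {}
    for name, toks in databases.items():
        category_tokens.setdefault(category_of[name], []).extend(toks)
    category_models = {c: unigram_model(t) for c, t in category_tokens.items()}

    scores = []
    for name, toks in databases.items():
        db_model = unigram_model(toks)
        cat_model = category_models[category_of[name]]
        log_like = 0.0
        for word in query_tokens:
            p = smoothed_prob(word, db_model, cat_model, background)
            # Words unseen everywhere get a tiny floor so log() stays defined.
            log_like += math.log(max(p, 1e-12))
        scores.append((name, log_like))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    databases = {
        "med": "heart disease treatment heart".split(),
        "bio": "cell gene heart".split(),
        "law": "court law contract".split(),
    }
    category_of = {"med": "health", "bio": "health", "law": "legal"}
    for name, score in rank_databases(["heart", "disease"], databases, category_of):
        print(name, round(score, 3))
```

In this sketch a database that never mentions a query word can still receive a non-zero score through its category or the background model, which is the kind of sparseness problem that smoothing is meant to address.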

Item Type: Journal Item
ISSN: 1386-4564
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Research Group: Centre for Research in Computing (CRC)
Item ID: 12990
Depositing User: Hui Yang
Date Deposited: 30 Jan 2009 01:36
Last Modified: 02 May 2018 12:56