Nanas, Nikolaos and De Roeck, Anne (2008).
Due to copyright restrictions, this file is not available for public download.
| URL: | http://www.bcs.org/upload/pdf/ewic_ir08_nikosnanas... |
|---|---|
Abstract
The characteristics of different corpora influence the success of Information Retrieval and NLP methods. How best to characterise a corpus is still an unexplored research area. In this paper, we use a model that has so far been applied to user profiling in Information Filtering to profile the corpora of the TIPSTER collection. Each corpus profile is a network of terms from which a series of statistical features can be extracted. These features can be used to calculate the similarity between the corpora in TIPSTER. This is part of ongoing work that aims to provide a corpus profiling service that will map corpora to their features and to the corresponding experimental results of various models and techniques.
| Item Type: | Conference Item |
|---|---|
| Copyright Holders: | 2008 Not known |
| Keywords: | corpus profiling, Nootropia |
| Academic Unit/Department: | Mathematics, Computing and Technology > Computing; Mathematics, Computing and Technology |
| Interdisciplinary Research Centre: | Centre for Research in Computing (CRC) |
| Item ID: | 27984 |
| Depositing User: | Catherine McNulty |
| Date Deposited: | 08 Feb 2011 10:10 |
| Last Modified: | 09 Feb 2011 17:13 |
| URI: | http://oro.open.ac.uk/id/eprint/27984 |