Nanas, Nikolaos and De Roeck, Anne
Corpus profiling with Nootropia.
In: BCS-IRSG Workshop on Corpus Profiling, 18 Oct 2008, London.
The characteristics of different corpora influence the success of Information Retrieval and NLP methods. How best to characterise a corpus remains an unexplored research area. In this paper, we use a model that has so far been applied to user profiling in Information Filtering to profile the corpora of the TIPSTER collection. Each corpus profile is a network of terms from which a series of statistical features can be extracted. These features can be used to calculate the similarity between the corpora in TIPSTER. This is part of ongoing work that aims to provide a corpus profiling service that will map corpora to their features and to the corresponding experimental results of various models and techniques.
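The abstract does not specify how the term network is built or which statistical features are extracted. The Python sketch below is an illustration only. It assumes a simple weighted co-occurrence network, with terms weighted by frequency and links between terms that appear within a small window of each other. It also assumes a handful of size-independent features, compared with cosine similarity. The function names `build_term_network`, `network_features` and `corpus_similarity`, the window size and the feature set are all hypothetical. None of this is Nootropia's actual profiling model.

```python
import math
from collections import Counter


def build_term_network(docs, window=5):
    """Hypothetical profile: a weighted term co-occurrence network.

    Nodes are terms weighted by frequency. An undirected edge links two
    distinct terms that occur within `window` tokens of each other, and
    its weight counts how often that happens.
    """
    nodes = Counter()
    edges = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        nodes.update(tokens)
        for i, a in enumerate(tokens):
            # Look at up to `window` tokens following position i.
            for b in tokens[i + 1 : i + 1 + window]:
                if a != b:  # skip self-links
                    edges[frozenset((a, b))] += 1
    return nodes, edges


def network_features(nodes, edges):
    """Hypothetical size-independent features extracted from the network."""
    n = len(nodes)                  # distinct terms
    m = len(edges)                  # distinct links
    tokens = sum(nodes.values())    # total term occurrences
    type_token_ratio = n / tokens if tokens else 0.0
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = 2 * m / n if n else 0.0
    # Shannon entropy of the term distribution, normalised to [0, 1].
    entropy = -sum((c / tokens) * math.log2(c / tokens) for c in nodes.values()) if tokens else 0.0
    norm_entropy = entropy / math.log2(n) if n > 1 else 0.0
    return [type_token_ratio, density, avg_degree, norm_entropy]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def corpus_similarity(docs_a, docs_b):
    """Compare two corpora through the features of their term networks."""
    fa = network_features(*build_term_network(docs_a))
    fb = network_features(*build_term_network(docs_b))
    return cosine(fa, fb)


if __name__ == "__main__":
    corpus_a = ["the cat sat on the mat", "the dog sat on the log"]
    corpus_b = ["stocks fell sharply", "bond yields rose as stocks fell"]
    print(corpus_similarity(corpus_a, corpus_b))
```

Because every feature here is non-negative, the similarity falls between 0 and 1. Any real profiling service would substitute the features actually defined by the model.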