Corpus profiling with Nootropia

Nanas, Nikolaos and De Roeck, Anne (2008). Corpus profiling with Nootropia. In: BCS-IRSG Workshop on Corpus Profiling, 18 Oct 2008, London.



The characteristics of different corpora influence the success of Information Retrieval and NLP methods, yet how best to characterise a corpus is still an unexplored research area. In this paper, we take a model that has so far been applied to user profiling in Information Filtering and use it to profile the corpora of the TIPSTER collection. Each corpus profile is a network of terms from which a series of statistical features can be extracted. These features can then be used to calculate the similarity between the corpora in TIPSTER. This is part of ongoing work towards a corpus profiling service that will map corpora to their features and to the corresponding experimental results of various models and techniques.
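The idea of profiling a corpus as a term network and comparing corpora through network features can be illustrated with a minimal sketch. This is not the Nootropia model itself: the abstract does not specify how the network is built or which features are extracted, so the co-occurrence window, the four features (node count, link count, mean degree, density), the cosine comparison, and all function names below are assumptions made purely for illustration.

```python
from collections import Counter
import math


def build_term_network(documents, window=3):
    """Build a toy term network from tokenised documents.

    ASSUMPTION: terms are nodes, and two distinct terms are linked when they
    occur within `window` consecutive tokens. Weights are raw counts. The
    actual Nootropia weighting scheme is not described in the abstract.
    """
    term_weights = Counter()
    link_weights = Counter()
    for tokens in documents:
        term_weights.update(tokens)
        for i, a in enumerate(tokens):
            for b in tokens[i + 1:i + window]:
                if a != b:
                    link_weights[tuple(sorted((a, b)))] += 1
    return term_weights, link_weights


def network_features(term_weights, link_weights):
    """Return [nodes, links, mean degree, density] (hypothetical feature set)."""
    n = len(term_weights)
    m = len(link_weights)
    possible = n * (n - 1) / 2
    return [
        float(n),
        float(m),
        2 * m / n if n else 0.0,
        m / possible if possible else 0.0,
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Two tiny stand-in "corpora"; real TIPSTER corpora would be far larger.
corpus_a = [["stock", "market", "prices", "fell"], ["market", "prices", "rose"]]
corpus_b = [["patent", "claims", "device"], ["device", "claims", "method"]]

features_a = network_features(*build_term_network(corpus_a))
features_b = network_features(*build_term_network(corpus_b))
similarity = cosine_similarity(features_a, features_b)
```

Raw node and link counts dominate the cosine in this sketch, so a practical comparison would likely normalise features across corpora first; that step is omitted here for brevity.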

