Nanas, Nikolaos and De Roeck, Anne (2008).
URL: http://www.bcs.org/upload/pdf/ewic_ir08_nikosnanas...
Abstract
The characteristics of different corpora influence the success of Information Retrieval and NLP methods. How best to characterise a corpus is still an unexplored research area. In this paper, we use a model that has so far been applied to user profiling in Information Filtering to profile the corpora of the TIPSTER collection. Each corpus profile is a network of terms that allows the extraction of a series of statistical features. These features can be used to calculate the similarity between the corpora in TIPSTER. This is part of ongoing work that aims to provide a corpus profiling service that will map corpora to their features and to the corresponding experimental results of various models and techniques.
About
- Item ORO ID: 27984
- Item Type: Conference or Workshop Item
- Keywords: corpus profiling, Nootropia
- Academic Unit or School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications; Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Research Group: Centre for Research in Computing (CRC)
- Copyright Holders: © 2008 Not known
- Depositing User: Catherine McNulty