Yin, Ling and Power, Richard
(2006).
DOI: https://doi.org/10.1007/11735106_17
URL: http://www.springerlink.com/content/x0h57n741gj97r...
Abstract
This paper presents a machine-learning approach for ranking web documents according to the proportion of procedural text they contain. By 'procedural text' we refer to ordered lists of steps, which are very common in some instructional genres such as online manuals. Our initial training corpus is built by applying some simple heuristics to select documents from a large collection, and it contains only a few documents with a large proportion of procedural text. We adapt the Naive Bayes classifier to better fit this less-than-ideal training corpus. This adapted model is compared with several other classifiers in ranking procedural texts using different sets of features, and is shown to perform well when only highly distinctive features are used.
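The abstract does not specify the selection heuristics or how Naive Bayes is adapted, so the following is only a rough sketch of the general idea under stated assumptions. It uses a hypothetical heuristic (the proportion of non-empty lines that look like numbered steps) to label a tiny toy training set, then ranks documents with a plain multinomial Naive Bayes scorer by log posterior odds. The names `step_line_proportion` and `NaiveBayesRanker`, the 0.5 threshold and the example documents are all hypothetical, and this is not the authors' adapted model.

```python
import math
import re
from collections import Counter

# Hypothetical heuristic: a line counts as a "step" if it starts with a number
# followed by "." or ")", or with the word "Step" followed by a number.
STEP_RE = re.compile(r"^\s*(?:\d+[.)]|step\s+\d+)", re.IGNORECASE)


def step_line_proportion(text: str) -> float:
    """Fraction of non-empty lines that look like numbered steps."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(1 for ln in lines if STEP_RE.match(ln)) / len(lines)


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens only (digits and punctuation are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRanker:
    """Plain multinomial Naive Bayes with Laplace smoothing (not the paper's adapted model)."""

    def fit(self, docs: list[str], labels: list[int]) -> "NaiveBayesRanker":
        self.doc_counts = Counter(labels)
        if self.doc_counts[0] == 0 or self.doc_counts[1] == 0:
            raise ValueError("training data must contain both classes")
        self.counts = {0: Counter(), 1: Counter()}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        return self

    def _log_likelihood(self, tokens: list[str], c: int) -> float:
        # Laplace-smoothed log P(tokens | class c); unseen words are ignored.
        v = len(self.vocab)
        return sum(
            math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            for t in tokens
            if t in self.vocab
        )

    def score(self, doc: str) -> float:
        """Log posterior odds that doc is procedural (class 1) vs not (class 0)."""
        tokens = tokenize(doc)
        prior = math.log(self.doc_counts[1] / self.doc_counts[0])
        return prior + self._log_likelihood(tokens, 1) - self._log_likelihood(tokens, 0)

    def rank(self, docs: list[str]) -> list[str]:
        """Documents sorted from most to least likely to be procedural."""
        return sorted(docs, key=self.score, reverse=True)


# Toy usage: label training documents with the heuristic, then rank new ones.
train = [
    "How to install:\n1. Download the file\n2. Unzip it\n3. Run setup",
    "Step 1 open the lid\nStep 2 pour the water\nStep 3 close the lid",
    "The conference was held in London in April.",
    "This essay discusses the history of search engines.",
]
labels = [1 if step_line_proportion(d) >= 0.5 else 0 for d in train]
ranker = NaiveBayesRanker().fit(train, labels)
ranked = ranker.rank(["The history of London", "Run the installer and then restart"])
print(ranked)
```

Scoring by log posterior odds rather than a hard class label is what turns the classifier into a ranker: documents can be ordered by how strongly the model favours the procedural class, which matches the ranking task described above.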
Unknown Version (PDF): this file is not available for public download.
About
- Item ORO ID
- 8471
- Item Type
- Book Section
- ISBN
- 3-540-33347-9, 978-3-540-33347-0
- ISSN
- 1611-3349
- Extra Information
- Proceedings of the 28th European Conference on IR Research, ECIR 2006, London, UK, April 10-12, 2006.
- Keywords
- machine learning; information retrieval
- Academic Unit or School
- Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
- Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Research Group
- Centre for Research in Computing (CRC)
- Depositing User
- Richard Power