Zhou, Deyu; He, Yulan and Kwoh, Chee Keong
(2007).
DOI: https://doi.org/10.1109/CIDM.2007.368941
Abstract
A major challenge in text mining for biology and biomedicine is automatically extracting protein-protein interactions from the vast amount of biological literature, since most knowledge about them is still hidden in biological publications. Existing approaches can be broadly categorized as rule-based or statistical. Rule-based approaches require heavy manual effort. Statistical approaches, on the other hand, require large-scale, richly annotated corpora in order to reliably estimate model parameters, and such corpora are normally difficult to obtain in practical applications. The hidden vector state (HVS) model, an extension of the basic discrete Markov model, has been successfully applied to extract protein-protein interactions. In this paper, we propose a novel approach to train the HVS model on both annotated and un-annotated corpora. A sentence selection algorithm is designed to utilize the semantic parsing results of the un-annotated corpus generated by the HVS model. Experimental results show that the performance of the initial HVS model trained on a small amount of annotated data can be improved by employing this approach.
About
- Item ORO ID
- 23796
- Item Type
- Conference or Workshop Item
- Extra Information
- IEEE Symposium on Computational Intelligence and Data Mining, 2007. CIDM 2007. ISBN: 1-4244-0705-2
- Keywords
- biological literature; biological publications; biology text mining; biomedicine text mining; discrete Markov model; hidden vector state model; protein-protein interactions extraction; semantic parsing; semisupervised learning
- Academic Unit or School
- Faculty of Science, Technology, Engineering and Mathematics (STEM) > Knowledge Media Institute (KMi)
- Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Research Group
- Centre for Research in Computing (CRC)
- Copyright Holders
- © 2007 IEEE
- Depositing User
- Kay Dave