Semi-supervised learning of the hidden vector state model for protein-protein interactions extraction

Zhou, Deyu; He, Yulan and Kwoh, Chee Keong (2007). Semi-supervised learning of the hidden vector state model for protein-protein interactions extraction. In: IEEE Symposium on Computational Intelligence and Data Mining, 2007 (CIDM 2007), 01-05 Apr 2007, Honolulu, HI, pp. 674–680.

Full text: not publicly available
Due to copyright restrictions, this file is not available for public download.
DOI (Digital Object Identifier) Link: http://dx.doi.org/10.1109/CIDM.2007.368941

Abstract

A major challenge in text mining for biology and biomedicine is the automatic extraction of protein-protein interactions from the vast biological literature, since most knowledge about these interactions remains buried in publications. Existing approaches can be broadly categorized as rule-based or statistical. Rule-based approaches require heavy manual effort. Statistical approaches, on the other hand, require large-scale, richly annotated corpora to estimate model parameters reliably, and such corpora are normally difficult to obtain in practical applications. The hidden vector state (HVS) model, an extension of the basic discrete Markov model, has been successfully applied to extract protein-protein interactions. In this paper, we propose a novel approach to train the HVS model on both annotated and unannotated corpora. A sentence selection algorithm is designed to utilize the semantic parsing results that the HVS model generates for the unannotated corpus. Experimental results show that the performance of the initial HVS model, trained on a small amount of annotated data, can be improved by employing this approach.
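
The training scheme outlined in the abstract (train on a small annotated set, parse the unannotated sentences, select some of the parses, retrain) has the general shape of a self-training loop. The sketch below illustrates only that shape and is not the authors' method: the abstract does not state the selection criterion, so the confidence score (fraction of known words) and the `threshold` parameter are assumptions, and a toy per-word tag counter stands in for the HVS model. The tag names (`PROTEIN`, `INTERACT`, `DUMMY`) and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = List[str]
Tags = List[str]                      # one tag per token; stand-in for a semantic parse
Labeled = Tuple[Sentence, Tags]
Model = Dict[str, Counter]            # toy model: word -> tag counts (NOT an HVS model)


def train(data: List[Labeled]) -> Model:
    """Count how often each word carries each tag."""
    counts: Model = defaultdict(Counter)
    for words, tags in data:
        for word, tag in zip(words, tags):
            counts[word][tag] += 1
    return counts


def parse(model: Model, words: Sentence) -> Tuple[Tags, float]:
    """Tag each word with its most frequent tag; unknown words get DUMMY.

    The returned confidence is the fraction of words seen in training.
    This scoring rule is an assumption made for illustration.
    """
    tags: Tags = []
    known = 0
    for word in words:
        if word in model:
            tags.append(model[word].most_common(1)[0][0])
            known += 1
        else:
            tags.append("DUMMY")
    confidence = known / len(words) if words else 0.0
    return tags, confidence


def self_train(annotated: List[Labeled], unannotated: List[Sentence],
               threshold: float = 0.6, rounds: int = 3) -> Tuple[Model, List[Labeled]]:
    """Repeatedly parse the unannotated pool, keep confident parses, retrain."""
    labeled = list(annotated)
    pool = list(unannotated)
    model = train(labeled)
    for _ in range(rounds):
        selected: List[Labeled] = []
        remaining: List[Sentence] = []
        for words in pool:
            tags, confidence = parse(model, words)
            if confidence >= threshold:
                selected.append((words, tags))   # treat the parse as a new annotation
            else:
                remaining.append(words)
        if not selected:                         # nothing new was learned; stop early
            break
        labeled.extend(selected)
        pool = remaining
        model = train(labeled)                   # retrain on the enlarged training set
    return model, labeled


# Toy usage with made-up sentences.
annotated = [(["A", "interacts", "with", "B"],
              ["PROTEIN", "INTERACT", "DUMMY", "PROTEIN"])]
unannotated = [["A", "binds", "B"], ["C", "binds", "D"]]
model, labeled = self_train(annotated, unannotated)
```

In this toy run the unknown word "binds" is absorbed with the fallback tag, which shows a known risk of self-training: errors in the automatically selected parses are fed back into the model. That risk is why the choice of selection criterion matters.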

Item Type: Conference Item
Copyright Holders: 2007 IEEE
Extra Information: IEEE Symposium on Computational Intelligence and Data Mining, 2007. CIDM 2007.
ISBN: 1-4244-0705-2
Keywords: biological literature; biological publications; biology text mining; biomedicine text mining; discrete Markov model; hidden vector state model; protein-protein interactions extraction; semantic parsing; semisupervised learning
Academic Unit/Department: Knowledge Media Institute
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 23796
Depositing User: Kay Dave
Date Deposited: 03 Mar 2011 10:43
Last Modified: 27 Oct 2012 19:01
URI: http://oro.open.ac.uk/id/eprint/23796