
Training the hidden vector state model from un-annotated corpus

Zhou, Deyu; He, Yulan and Kwoh, Chee Keong (2007). Training the hidden vector state model from un-annotated corpus. In: The International Conference on Computational Science (ICCS 2007), 27-30 May 2007, Beijing, China.

Full text: not publicly available. Due to copyright restrictions, this file is not available for public download.
URL: http://www.springerlink.com/content/c458254741383w...
DOI (Digital Object Identifier) Link: http://dx.doi.org/10.1007/978-3-540-72586-2_54

Abstract

Since most knowledge about protein-protein interactions remains buried in biological publications, there is an increasing focus on automatically extracting information from the vast amount of biological literature. Existing approaches can be broadly categorized as rule-based or statistical. Rule-based approaches require heavy manual effort. Statistical approaches, on the other hand, require large-scale, richly annotated corpora in order to reliably estimate model parameters, and such corpora are normally difficult to obtain in practical applications. We have proposed a hidden vector state (HVS) model for protein-protein interaction extraction. The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. State transitions are factored into a stack shift operation, similar to that of a push-down automaton, followed by the push of a new preterminal category label. In this paper, we propose a novel approach based on the k-nearest-neighbors classifier to automatically train the HVS model from un-annotated data. Experimental results show improved performance over the baseline system with the HVS model trained from a small amount of annotated data.
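
The state transition described in the abstract (a stack shift followed by the push of a new preterminal label) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hvs_transition`, the `max_depth` limit, and the labels `SS`, `PROTEIN`, and `ACTIVATE` are assumptions chosen purely for illustration, and the sketch covers only the deterministic stack mechanics, not the probabilities attached to each operation.

```python
from typing import Tuple

# A vector state is a snapshot of the stack; index 0 is the bottom (root).
VectorState = Tuple[str, ...]


def hvs_transition(
    state: VectorState,
    pop_count: int,
    new_label: str,
    max_depth: int = 4,  # hypothetical depth limit, not taken from the paper
) -> VectorState:
    """Apply one transition: pop `pop_count` labels, then push `new_label`."""
    if not 0 <= pop_count <= len(state):
        raise ValueError("pop_count must be between 0 and the stack depth")
    # Stack shift: discard the top `pop_count` labels.
    shifted = state[: len(state) - pop_count]
    # Push exactly one new preterminal category label.
    new_state = shifted + (new_label,)
    if len(new_state) > max_depth:
        raise ValueError("transition would exceed the maximum stack depth")
    return new_state


# Hypothetical labels: pop nothing and push ACTIVATE on top of PROTEIN.
deeper = hvs_transition(("SS", "PROTEIN"), 0, "ACTIVATE")
# Pop two labels and push PROTEIN, returning to a shallower context.
shallower = hvs_transition(deeper, 2, "PROTEIN")
```

Because each transition pushes exactly one label, the new state is fully determined by the previous state, the number of labels popped, and the label pushed, which is the factorisation the abstract describes.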

Item Type: Conference Item
Copyright Holders: 2007 Springer-Verlag
Extra Information: Computational Science – ICCS 2007

7th International Conference, Beijing, China, May 27 - 30, 2007, Proceedings, Part II

Yong Shi, Geert Dick van Albada, Jack Dongarra and Peter M.A. Sloot

ISBN: 978-3-540-72585-5

Lecture Notes in Computer Science 4488
Keywords: information extraction; Hidden Vector State Model; protein-protein interactions
Academic Unit/Department: Knowledge Media Institute
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 23795
Depositing User: Kay Dave
Date Deposited: 09 Mar 2011 14:18
Last Modified: 10 Mar 2011 17:33
URI: http://oro.open.ac.uk/id/eprint/23795