Zhou, Deyu and He, Yulan
PDF (Version of Record)
- Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
|Google Scholar:||Look up in Google Scholar|
The knowledge about gene clusters and protein interactions is important for biological researchers to unveil the mechanism of life. However, large quantity of the knowledge often hides in the literature, such as journal articles, reports, books and so on. Many approaches focusing on extracting information from unstructured text, such as pattern matching, shallow and deep parsing, have been proposed especially for extracting protein-protein interactions (Zhou and He, 2008). A semantic parser based on the Hidden Vector State (HVS) model for extracting protein-protein interactions is presented in (Zhou et al., 2008). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. Maximum Likelihood estimation (MLE) is used to derive the parameters of the HVS model. In this paper, we propose a discriminative approach based on parse error measure to train the HVS model. To adjust the HVS model to achieve minimum parse error rate, the generalized probabilistic descent (GPD) algorithm (Kuo et al., 2002) is used. Experiments have been conducted on the GENIA corpus. The results demonstrate modest improvements when the discriminatively trained HVS model outperforms its MLE trained counterpart by 2.5% in F-measure on the GENIA corpus.
|Item Type:||Conference Item|
|Copyright Holders:||2008 Association for Computational Linguistics|
|Academic Unit/Department:||Knowledge Media Institute|
|Interdisciplinary Research Centre:||Centre for Research in Computing (CRC)|
|Depositing User:||Yulan He|
|Date Deposited:||08 Dec 2010 11:48|
|Last Modified:||24 Feb 2016 08:35|
|Share this page:|
► Automated document suggestions from open access sources
Download history for this item
These details should be considered as only a guide to the number of downloads performed manually. Algorithmic methods have been applied in an attempt to remove automated downloads from the displayed statistics but no guarantee can be made as to the accuracy of the figures.