Semi-supervised learning of the hidden vector state model for protein-protein interactions extraction

Zhou, Deyu; He, Yulan and Kwoh, Chee Keong (2007). Semi-supervised learning of the hidden vector state model for protein-protein interactions extraction. In: 2007 IEEE Symposium on Computational Intelligence and Data Mining, pp. 674–680.

DOI: https://doi.org/10.1109/CIDM.2007.368941

Abstract

A major challenge in text mining for biology and biomedicine is the automatic extraction of protein-protein interactions from the vast biological literature, since most knowledge about these interactions remains buried in publications. Existing approaches can be broadly categorized as rule-based or statistical. Rule-based approaches require heavy manual effort, while statistical approaches require large-scale, richly annotated corpora to estimate model parameters reliably, and such corpora are normally difficult to obtain in practice. The hidden vector state (HVS) model, an extension of the basic discrete Markov model, has been successfully applied to extract protein-protein interactions. In this paper, we propose a novel approach to train the HVS model on both annotated and unannotated corpora. A sentence selection algorithm is designed to make use of the semantic parsing results that the HVS model generates for the unannotated corpus. Experimental results show that the performance of an initial HVS model trained on a small amount of annotated data can be improved by employing this approach.
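
The abstract outlines a semi-supervised loop: train on a small annotated set, parse the unannotated corpus, select some of the automatically parsed sentences, and retrain. The sketch below illustrates that general pattern (often called self-training) under loud assumptions. `ToyModel` is a hypothetical keyword-vote classifier standing in for the HVS parser, a single label stands in for a full semantic parse, and the confidence threshold is an assumed selection criterion, not the sentence selection algorithm of the paper, which the abstract does not specify.

```python
from collections import Counter


class ToyModel:
    """Hypothetical stand-in for an HVS parser: words vote for labels."""

    def fit(self, data):
        # Count how often each word co-occurs with each label.
        self.counts = {}
        for sentence, label in data:
            for word in sentence.lower().split():
                self.counts.setdefault(word, Counter())[label] += 1
        return self

    def predict(self, sentence):
        # Return (label, confidence); (None, 0.0) if no word is known.
        votes = Counter()
        for word in sentence.lower().split():
            votes.update(self.counts.get(word, {}))
        if not votes:
            return None, 0.0
        label, n = votes.most_common(1)[0]
        return label, n / sum(votes.values())


def self_train(labeled, unlabeled, threshold=0.8, rounds=3):
    """Generic self-training loop (assumed criterion: confidence >= threshold)."""
    train = list(labeled)
    pool = list(unlabeled)
    model = ToyModel().fit(train)
    for _ in range(rounds):
        selected, remaining = [], []
        for sentence in pool:
            label, conf = model.predict(sentence)
            if label is not None and conf >= threshold:
                selected.append((sentence, label))  # keep the automatic label
            else:
                remaining.append(sentence)
        if not selected:
            break  # nothing new to learn from
        train.extend(selected)
        pool = remaining
        model = ToyModel().fit(train)  # retrain on annotated + selected data
    return model, train


# Made-up toy data, for illustration only.
labeled = [
    ("A interacts with B", "interaction"),
    ("A was purified yesterday", "none"),
]
unlabeled = ["C interacts with D", "C binds D", "E was purified"]

initial = ToyModel().fit(labeled)
model, train = self_train(labeled, unlabeled)
print(initial.predict("F binds G"))
print(model.predict("F binds G"))
```

In this toy run the initial model knows nothing about the word "binds", while the self-trained model has picked it up from an automatically labeled sentence. A real system would replace the label with the parser's output and the threshold with a selection rule suited to parse quality.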
