The Open University

A Text Mining Approach to the Prediction of a Disease Status from Clinical Discharge Summaries

Yang, Hui; Spasic, Irena; Keane, John A. and Nenadic, Goran (2009). A Text Mining Approach to the Prediction of a Disease Status from Clinical Discharge Summaries. Journal of the American Medical Informatics Association, 16(4) pp. 596–600.

Objective: We present a system developed for the Challenge in Natural Language Processing for Clinical Data (the i2b2 obesity challenge), whose aim was to automatically identify the status of obesity and 15 related co-morbidities in patients using their clinical discharge summaries. The challenge consisted of two tasks, textual and intuitive. The textual task was to identify explicit references to the diseases, whereas the intuitive task focused on predicting the disease status when the evidence was not explicitly asserted.

Design: We assembled a set of resources to lexically and semantically profile the diseases and their associated symptoms, treatments, etc. These features were explored in a hybrid text mining approach, which combined dictionary look-up, rule-based and machine-learning methods.
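To illustrate how dictionary look-up and rule-based processing can be combined for the textual task, the following is a minimal sketch and not the system described in the paper. The lexicon, the cue lists, the function names and the label letters (Y = present, N = absent, Q = questionable, U = unmentioned) are all assumptions made for illustration; the machine-learning component is omitted entirely.

```python
import re

# Hypothetical mini-lexicon; the paper's actual resources are not reproduced here.
DISEASE_TERMS = {
    "obesity": ["obesity", "obese"],
    "diabetes": ["diabetes mellitus", "diabetes", "niddm"],
}
# Hypothetical cue lists for the rule-based step.
NEGATION_CUES = ["no", "denies", "without", "negative for"]
UNCERTAINTY_CUES = ["possible", "probable", "question of", "rule out"]


def _has_cue(text, cues):
    """True if any cue occurs in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues)


def textual_status(summary, disease):
    """Label one disease for one summary as 'Y', 'N', 'Q' or 'U'."""
    statuses = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", summary.lower()):
        for term in DISEASE_TERMS[disease]:
            # Dictionary look-up: find the first lexicon term in the sentence.
            match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
            if not match:
                continue
            # Rule-based step: inspect the text preceding the mention.
            prefix = sentence[:match.start()]
            if _has_cue(prefix, NEGATION_CUES):
                statuses.append("N")
            elif _has_cue(prefix, UNCERTAINTY_CUES):
                statuses.append("Q")
            else:
                statuses.append("Y")
            break  # one label per sentence is enough for this sketch
    if not statuses:
        return "U"
    # Assumed priority: a positive mention outweighs uncertain or negated ones.
    for label in ("Y", "Q", "N"):
        if label in statuses:
            return label


note = "The patient is obese. No history of diabetes."
print(textual_status(note, "obesity"), textual_status(note, "diabetes"))
```

In a hybrid design of the kind the abstract describes, cases that such rules cannot resolve would be passed to a trained classifier; that stage is left out here because the abstract does not specify it.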

Measurements: The methods were applied to a set of 507 previously unseen discharge summaries, and the predictions were evaluated against a manually prepared gold standard. The overall ranking of the participating teams was primarily based on the macro-averaged F-measure.

Results: The implemented method achieved a macro-averaged F-measure of 81% for the textual task (the highest achieved in the challenge) and 63% for the intuitive task (ranked 7th out of 28 teams; the highest was 66%). The micro-averaged F-measure showed an average accuracy of 97% for textual and 96% for intuitive annotations.
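The gap between the macro- and micro-averaged scores reflects how each average treats rare labels. The sketch below uses the generic textbook definitions for a single-label classification task; the function name and the toy data are assumptions, and the challenge's exact averaging procedure across diseases is not given in the abstract and may differ.

```python
from collections import Counter


def f_measures(gold, predicted):
    """Return (macro_f1, micro_f1) for two equal-length label sequences."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed

    def f1(t, false_pos, false_neg):
        denom = 2 * t + false_pos + false_neg
        return 2 * t / denom if denom else 0.0

    # Macro: average the per-label F1 scores, so every label counts equally.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts first, so every document counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy data: a system that always predicts the majority label "Y".
gold = ["Y"] * 8 + ["N", "Q"]
pred = ["Y"] * 10
macro, micro = f_measures(gold, pred)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

With one label per document, the pooled (micro) score equals plain accuracy, whereas the macro score is pulled down sharply by the two rare labels that are never predicted. This is consistent with micro-averaged figures being much higher than macro-averaged ones when some statuses are rare.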

Conclusion: The performance achieved was in line with the agreement between human annotators, indicating the potential of text mining for accurate and efficient prediction of disease statuses from clinical discharge summaries.

Item Type: Journal Item
Copyright Holders: 2009 American Medical Informatics Association
ISSN: 1527-974X
Keywords: i2b2 obesity challenge ; clinical discharge summary ; textual task ; prediction of disease
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 15815
Depositing User: Hui Yang
Date Deposited: 06 Oct 2009 09:50
Last Modified: 04 Oct 2016 10:21