
Automatic ontology-based knowledge extraction from web documents

Alani, Harith; Kim, Sanghee; Millard, David E.; Weal, Mark J.; Hall, Wendy; Lewis, Paul H. and Shadbolt, Nigel R. (2003). Automatic ontology-based knowledge extraction from web documents. IEEE Intelligent Systems, 18(1) pp. 14–21.

DOI (Digital Object Identifier) Link: http://dx.doi.org/10.1109/MIS.2003.1179189

Abstract

To bring the Semantic Web to life and provide advanced knowledge services, we need efficient ways to access and extract knowledge from Web documents. Although Web page annotations could facilitate such knowledge gathering, annotations are rare and will probably never be rich or detailed enough to cover all the knowledge these documents contain. Manual annotation is impractical and unscalable, and automatic annotation tools remain largely undeveloped.
Specialized knowledge services therefore require tools that can search and extract specific knowledge directly from unstructured text on the Web, guided by an ontology that details what type of knowledge to harvest. An ontology uses concepts and relations to classify domain knowledge. Other researchers have used ontologies to support knowledge extraction [1, 2], but few have explored their full potential in this domain.
The Artequakt project links a knowledge-extraction tool with an ontology to achieve continuous knowledge support and guide information extraction. The extraction tool searches online documents and extracts knowledge that matches the given classification structure. It provides this knowledge in a machine-readable format that will be automatically maintained in a knowledge base (KB). Users could further enhance knowledge extraction using a lexicon-based term expansion mechanism that provides extended ontology terminology.
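The general idea in the abstract, an ontology that tells the extractor which relations to look for plus a lexicon that widens the set of terms it recognises, can be illustrated with a small sketch. This is not the Artequakt implementation: the `ONTOLOGY` and `LEXICON` tables, the function names, and the regular-expression matching are all hypothetical stand-ins chosen for brevity, and a real system would rely on proper linguistic analysis rather than surface patterns.

```python
import re

# Hypothetical ontology fragment (an assumption for illustration):
# relation name -> (domain concept, range concept, seed term in text)
ONTOLOGY = {
    "place_of_birth": ("Person", "Place", "born in"),
    "place_of_death": ("Person", "Place", "died in"),
}

# Hypothetical hand-written lexicon standing in for a real lexical resource:
# seed term -> alternative phrasings with the same meaning
LEXICON = {
    "born in": ["a native of"],
    "died in": ["passed away in"],
}


def expand_terms(seed, use_lexicon=True):
    """Return the seed term, plus its lexicon alternatives if enabled."""
    return [seed] + (LEXICON.get(seed, []) if use_lexicon else [])


def extract(text, use_lexicon=True):
    """Return (subject, relation, object) triples found in plain text."""
    triples = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for relation, (_domain, _range, seed) in ONTOLOGY.items():
            for term in expand_terms(seed, use_lexicon):
                # Capitalised word, optional "was", the term, capitalised word.
                pattern = (
                    rf"([A-Z]\w+)\s+(?:was\s+)?{re.escape(term)}\s+([A-Z]\w+)"
                )
                match = re.search(pattern, sentence)
                if match:
                    triples.append((match.group(1), relation, match.group(2)))
                    break  # one triple per relation per sentence
    return triples


if __name__ == "__main__":
    sample = (
        "Rembrandt was born in Leiden. "
        "Rembrandt passed away in Amsterdam. "
        "He painted many portraits."
    )
    for triple in extract(sample):
        print(triple)
```

With the lexicon disabled (`use_lexicon=False`), the second sentence of the sample yields nothing because "passed away in" is not a seed term. That is the kind of gap a term expansion mechanism is meant to close. The resulting triples are the sort of machine-readable output that could then be stored in a knowledge base.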

Item Type: Journal Article
Copyright Holders: 2003 IEEE
ISSN: 1541-1672
Academic Unit/Department: Knowledge Media Institute
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 20051
Depositing User: Harith Alani
Date Deposited: 15 Apr 2010 12:56
Last Modified: 05 Dec 2010 10:33
URI: http://oro.open.ac.uk/id/eprint/20051