The Open University
 

Unsupervised learning of link discovery configuration

Nikolov, Andriy; d'Aquin, Mathieu and Motta, Enrico (2012). Unsupervised learning of link discovery configuration. In: 9th Extended Semantic Web Conference (ESWC 2012), 27-31 May 2012, Heraklion, Greece.


Abstract

Discovering links between overlapping datasets on the Web is generally realised through the use of fuzzy similarity measures. Configuring such measures is often a non-trivial task that depends on the domain, ontological schemas, and formatting conventions in the data. Existing solutions rely either on the user's knowledge of the data and the domain or on machine learning to discover these parameters from training data. In this paper, we present a novel approach to data linking which relies on the unsupervised discovery of the required similarity parameters. Instead of using labeled data, the method takes into account several desired properties which the distribution of output similarity values should satisfy. The method incorporates these properties into a fitness criterion used in a genetic algorithm to establish similarity parameters that maximise the quality of the resulting linkset according to the considered properties. We show in experiments using benchmarks as well as real-world datasets that such an unsupervised method can reach the same levels of performance as manually engineered methods, and we examine how the different parameters of the genetic algorithm and the fitness criterion affect the results for different datasets.
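The general idea in the abstract (a genetic algorithm searching over similarity parameters, scored by a label-free fitness criterion) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy records, the `difflib`-based similarity, the three-gene configuration (two attribute weights and a threshold), and in particular the two properties used in `fitness` (links should be close to one-to-one, and most source records should be linked). The paper's actual fitness criterion is not given in this record.

```python
import random
from difflib import SequenceMatcher

# Hypothetical toy records (name, year); not data from the paper.
SOURCE = [("Andriy Nikolov", "2012"), ("Mathieu d'Aquin", "2011"), ("Enrico Motta", "2010")]
TARGET = [("Nikolov, Andriy", "2012"), ("M. d'Aquin", "2011"),
          ("E. Motta", "2010"), ("John Smith", "1999")]


def string_sim(a, b):
    """Fuzzy string similarity in [0, 1] using the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_sim(src, tgt, weights):
    """Weighted average of per-attribute similarities."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * string_sim(s, t) for w, s, t in zip(weights, src, tgt)) / total


def link(config):
    """Return (source index, target index, similarity) for pairs at or above the threshold."""
    w_name, w_year, threshold = config
    links = []
    for i, src in enumerate(SOURCE):
        for j, tgt in enumerate(TARGET):
            sim = record_sim(src, tgt, (w_name, w_year))
            if sim >= threshold:
                links.append((i, j, sim))
    return links


def fitness(config):
    """Label-free score computed only from the shape of the output linkset.

    Assumed properties (illustrative, not the paper's criterion):
    links should be near one-to-one, and source coverage should be high.
    """
    links = link(config)
    if not links:
        return 0.0
    sources = {i for i, _, _ in links}
    targets = {j for _, j, _ in links}
    one_to_one = min(len(sources), len(targets)) / len(links)  # 1.0 if no record is linked twice
    coverage = len(sources) / len(SOURCE)
    return one_to_one * coverage


def mutate(config, rng, rate=0.3):
    """Perturb each gene with probability `rate`, clipping to [0, 1]."""
    genes = []
    for g in config:
        if rng.random() < rate:
            g = min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
        genes.append(g)
    return tuple(genes)


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def evolve(generations=20, pop_size=20, seed=0):
    """Simple elitist genetic algorithm over (w_name, w_year, threshold)."""
    rng = random.Random(seed)
    population = [(rng.random(), rng.random(), rng.random()) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # keep the better half unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("best config:", best, "fitness:", fitness(best))
```

A criterion this simple is easy to satisfy with degenerate configurations (for example, matching on the year alone), which hints at why a practical criterion would need to combine several properties of the similarity distribution.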

Item Type: Conference Item
Copyright Holders: 2012 The Authors
Academic Unit/Department: Knowledge Media Institute
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 33434
Depositing User: Kay Dave
Date Deposited: 30 May 2012 13:16
Last Modified: 25 Oct 2012 14:54
URI: http://oro.open.ac.uk/id/eprint/33434