The Open University

Rebuilding visual vocabulary via spatial-temporal context similarity for video retrieval

Wang, Lei; Elyan, Eyad and Song, Dawei (2014). Rebuilding visual vocabulary via spatial-temporal context similarity for video retrieval. In: Multimedia Modelling: 20th Anniversary International Conference, MMM 2014, Dublin, Ireland, January 6-10, 2014, Proceedings, Part I, Lecture Notes in Computer Science, Springer International Publishing, pp. 74–85.

Full text available as:
Full text not publicly available (Version of Record)
Due to publisher licensing restrictions, this file is not available for public download


The Bag-of-visual-Words (BovW) model is one of the most popular visual content representation methods for large-scale content-based video retrieval. Visual features are quantized into visual words according to a visual vocabulary, which is generated by clustering the features (e.g. with K-means or GMM). In principle, two types of error can occur in the quantization process, referred to as the UnderQuantize and OverQuantize problems. The former causes ambiguities and often leads to false visual content matches, while the latter generates synonyms and may cause true matches to be missed. Unlike most state-of-the-art research, which has concentrated on enhancing the BovW model by disambiguating the visual words, this paper addresses the OverQuantize problem by incorporating the similarity of the spatial-temporal contexts associated with pairs of visual words. Visual words with similar context and appearance are assumed to be synonyms. These synonyms in the initial visual vocabulary are then merged to rebuild a more compact and descriptive vocabulary. Our approach was evaluated on the TRECVID2002 and CC_WEB_VIDEO datasets for two typical Query-By-Example (QBE) video retrieval applications. Experimental results demonstrated substantial improvements in retrieval performance over the initial visual vocabulary generated by the BovW model. We also show that our approach can be combined with a state-of-the-art disambiguation method to further improve the performance of QBE video retrieval.
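The merging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measures, thresholds, or merge procedure, so the cosine similarity, the two threshold parameters, the union-find grouping, and the function names `merge_synonyms` and `remap_histogram` are all assumptions made for illustration. Each visual word is assumed to have an appearance vector (e.g. its cluster centroid) and a context vector (e.g. a co-occurrence profile with neighbouring words in space and time).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def merge_synonyms(appearance, context, app_thresh=0.9, ctx_thresh=0.9):
    """Return a list mapping each old word id to a new, compacted word id.

    Two words are treated as synonyms only if BOTH their appearance vectors
    and their context vectors are similar (thresholds are hypothetical).
    Synonym pairs are grouped transitively with union-find, which is an
    assumption of this sketch rather than something stated in the source.
    """
    n = len(appearance)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if (cosine(appearance[i], appearance[j]) >= app_thresh
                    and cosine(context[i], context[j]) >= ctx_thresh):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    roots = sorted({find(i) for i in range(n)})
    new_id = {root: k for k, root in enumerate(roots)}
    return [new_id[find(i)] for i in range(n)]


def remap_histogram(hist, mapping):
    """Re-express a BovW histogram over the rebuilt (merged) vocabulary."""
    out = [0] * (max(mapping) + 1)
    for old_word, count in enumerate(hist):
        out[mapping[old_word]] += count
    return out


# Toy data: words 0 and 1 look alike and share context; word 2 differs.
appearance = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
context = [[1.0, 2.0, 0.0], [1.0, 2.0, 0.1], [0.0, 0.0, 1.0]]
mapping = merge_synonyms(appearance, context)
merged_hist = remap_histogram([3, 2, 5], mapping)
```

In this toy example words 0 and 1 collapse into a single word, so counts that were split between two synonyms are pooled and a query containing one can match a video containing the other. Requiring both similarities to exceed their thresholds reflects the abstract's condition that synonyms share "similar context and appearance"; the pairwise loop is quadratic in vocabulary size and would need pruning for a realistic vocabulary.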

Item Type: Conference or Workshop Item
Copyright Holders: 2014 Springer International Publishing Switzerland
ISBN: 3-319-04113-4, 978-3-319-04113-1
Project Funding Details:
Funded Project Name | Project ID | Funding Body
973 Program | Grant No. 2013CB329304, 2014CB744604 | Chinese National Program on Key Basic Research Project
Not Set | Grant No. 61272265 | Natural Science Foundation of China
FP7 QONTEXT project | Grant No. 247590 | EU
Keywords: visual vocabulary; synonyms; spatial-temporal context; content-based video retrieval; Bag-of-visual-Words
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Item ID: 40780
Depositing User: Dawei Song
Date Deposited: 04 Sep 2014 08:47
Last Modified: 07 Dec 2018 23:01