Modelling and Reasoning with Quantitative Representations of Vague Spatial Language used in Photographic Image Captions

Hall, Mark (2011). Modelling and Reasoning with Quantitative Representations of Vague Spatial Language used in Photographic Image Captions. PhD thesis, Cardiff University, School of Computer Science.


Photography is an inherently spatial activity: every photograph is taken somewhere, and this influences photograph captions, which frequently contain natural-language descriptions of the image’s location. These location descriptions consist of toponyms that are proximal to the image location and spatial prepositions that relate the image location to the toponyms (“near the London Eye”). To express all possible spatial configurations between image location and toponym with the small number of spatial prepositions that exist, the spatial prepositions must have vague interpretations. The area where something is “near the London Eye” does not cut off sharply; instead, the applicability of the phrase diminishes as the location moves away from the toponym. When automatically interpreting or generating such spatial expressions, it is necessary to decide where to draw the boundary between “near” and “not near”, and for this, quantitative models of the spatial prepositions’ applicability are required. In existing quantitative approaches, these applicability models have been defined by fiat, which means they are not necessarily good representations of how people actually use the spatial prepositions. This thesis takes a data-driven approach based on actual human use of the spatial prepositions. Through a set of data-mining and human-subject experiments, quantitative models for the spatial prepositions “near”, “at”, “next to”, “between”, “at the corner”, and the cardinal directions have been developed, and a field-based representation has been created to enable computational processing of the quantitative models. Based on these models, two spatio-linguistic reasoners have been developed that enable automatic translation, in both directions, between the linguistic representation of the image’s location in the caption and a computational, spatial representation.
This enables the integration of existing images that lack spatial metadata into a geographic information retrieval workflow, and improves geo-referenced image collections by automatically providing a human-style description of each image’s location. Both spatio-linguistic reasoners have been evaluated, and the results show that the reasoners and quantitative models are good enough to be deployed in practice.
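The thesis’s fitted models are not reproduced here, but the core idea of a quantitative applicability model can be sketched as a distance-decay field over which a crisp “near”/“not near” boundary is drawn by thresholding. The Gaussian form, the 200 m scale, and the 0.5 threshold below are illustrative assumptions, not the thesis’s empirically derived models:

```python
import math

def near_applicability(distance_m: float, scale_m: float = 200.0) -> float:
    """Applicability of "near" at a given distance from a toponym.

    A Gaussian distance decay: 1.0 at the toponym, diminishing smoothly
    with distance rather than cutting off sharply. The functional form
    and the 200 m scale are illustrative assumptions only.
    """
    return math.exp(-(distance_m ** 2) / (2 * scale_m ** 2))

def is_near(distance_m: float, threshold: float = 0.5) -> bool:
    """Draw a crisp boundary by thresholding the vague applicability field."""
    return near_applicability(distance_m) >= threshold

# A location 100 m from the toponym is still clearly "near";
# one 1 km away is not.
print(is_near(100.0), is_near(1000.0))
```

Evaluating such a function over a grid of locations yields the kind of field-based representation that makes the vague preposition computationally tractable in both interpretation and generation.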
