Chiatti, Agnese (2022).
DOI: https://doi.org/10.21954/ou.ro.000151b5
Abstract
Robots can take over many tasks that are unsafe or inconvenient for us to carry out. To assist us effectively, service robots are expected to make sense of fast-changing, real-world settings. In this dissertation, we tackle the problem of robot sensemaking from the standpoint of vision, focusing on the objective of building Visually Intelligent Agents: agents able to use their vision system, reasoning components, and background knowledge to make sense of the environment. We start by identifying a framework of requirements that contribute to the Visual Intelligence of a robot. In particular, we emphasise that the Visual Intelligence of state-of-the-art AI methods based on Deep Learning is severely limited, whereas humans excel at vision. Therefore, we derive an initial set of requirements from cognitive theories of the human vision system and complement these requirements with insights from concrete robotic scenarios. We hypothesise that a promising direction for equipping Deep Learning methods with the missing requirements is to introduce reasoning components that rely on symbolic knowledge representations. To this end, we audit the level of support that state-of-the-art Knowledge Bases provide for the required knowledge. Our requirement analysis and knowledge coverage study inform the development of two reasoners, which consider the typical size and spatial relations of objects when categorising them. These two components are integrated in a general Robot Architecture for Visual Intelligence (RAVI), which augments a Deep Learning component with different knowledge-based reasoners. We evaluate RAVI in the test case of a robot that monitors an office environment in search of potential Health and Safety risks, demonstrating a significant improvement over the state of the art. Findings from this work also guide the discussion of the limitations of current solutions and provide a roadmap for further developing Visually Intelligent Agents.