Logie, Robert; Hall, Jon G. and Waugh, Kevin G. (2003). Using safety and liveness properties to drive learning in a multi-agent system. Technical Report 2003/20, Department of Computing, The Open University.
DOI: https://doi.org/10.21954/ou.ro.00016000
Abstract
One of the strongest results in temporal logic is Chang, Manna and Pnueli's partitioning of reactive system properties into the classes of safety and liveness [1]. Safety and liveness properties state, intuitively and respectively, that something bad will not happen and that something good will eventually happen. In this paper we show how, in a multi-agent world, this safety/liveness partitioning can be used to drive learning. If an agent is introduced to a world and given a set of descriptions of system safety and liveness properties, how is it to discover how to behave so as to satisfy them? Both kinds of property influence agent behaviour: safety properties are cast as system norms exerting a restraining influence, whilst liveness properties are cast as desires which exert a driving influence. Agents randomly gather a set of atomic behaviours: simple actions which may be used individually, in combination, or in conjunction with other agents. In order to discover behaviours which satisfy these system properties, agents must have a "mischievous" element in their behaviour. Future worlds are given a preference ordering; when this ordering fails to provide clear guidance, an agent may "mischievously" select any available action not proscribed by safety norms. Undesirable world states are described by these safety norms, and agents are obliged to prevent such states either by refraining from actions known to bring them about or by acting to clear them when they are detected. A small number of dedicated coaching agents assist "normal" agents in refining any behaviours they have developed. Coaches also try to ensure that successful behaviours are propagated as quickly as possible. The mechanism for achieving these combined behaviours is a novel combination of belief update and belief revision.
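The action-selection mechanism described above, in which safety norms restrain and the preference ordering guides, with random "mischievous" choice as a fallback, can be sketched as follows. This is a minimal illustration, not the paper's implementation; all names (`select_action`, the representation of actions as state-transition functions, norms as predicates, and the numeric `preference` score) are hypothetical.

```python
import random

def select_action(state, actions, safety_norms, preference):
    """Pick an action: safe, most preferred, with random tie-breaking.

    actions      -- callables mapping a world state to a successor state
    safety_norms -- predicates over (state, action); False means proscribed
    preference   -- scores a successor world; higher is more preferred
    """
    # Safety norms exert their restraining influence first: discard any
    # action that a norm proscribes in the current state.
    permitted = [a for a in actions
                 if all(norm(state, a) for norm in safety_norms)]
    if not permitted:
        return None  # no safe action is available

    # Rank the permitted actions by the preference on their future worlds.
    scored = sorted(permitted, key=lambda a: preference(a(state)),
                    reverse=True)
    best = preference(scored[0](state))
    front = [a for a in scored if preference(a(state)) == best]

    if len(front) == 1:
        return front[0]          # the ordering gives clear guidance
    # Otherwise the ordering fails to guide: choose "mischievously"
    # among the equally preferred safe actions.
    return random.choice(front)
```

When the preference ordering is flat (all successors score equally), every call may return a different permitted action, which is exactly the exploratory element the abstract requires.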
This arrangement provides a belief management framework capable of identifying the factors governing the behaviour of the agent's world with no requirement for prior knowledge. The resulting set of beliefs is filtered by an agent's desires and intentions to produce a partially ordered set of plausible worlds and, hence, a partial order on the sets of available actions that control the agent's behaviour.
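One simple way to obtain a partial (rather than total) order on plausible worlds is to compare the sets of desires each world satisfies under set inclusion: a world strictly dominates another only if it satisfies a strict superset of the desires, so worlds satisfying incomparable desire sets remain unordered. The sketch below assumes this representation; the paper does not specify the ordering's construction, and the names here (`satisfied`, `prefer`, desires as predicates over worlds) are hypothetical.

```python
def satisfied(world, desires):
    """Return the indices of the desires this world satisfies."""
    return frozenset(i for i, d in enumerate(desires) if d(world))

def prefer(w1, w2, desires):
    """True iff w1 strictly dominates w2 in the partial order:
    w1 satisfies a strict superset of the desires that w2 satisfies."""
    return satisfied(w2, desires) < satisfied(w1, desires)
```

Because `<` on frozensets is proper-subset comparison, `prefer` is irreflexive and transitive, and two worlds whose satisfied-desire sets are incomparable are ordered in neither direction, giving the partially ordered set of plausible worlds the abstract describes.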