Anaya-Izquierdo, Karim; Critchley, Frank and Vines, Karen
Orthogonal simple component analysis: a new, exploratory approach.
Annals of Applied Statistics, 5(1) pp. 486–522.
Combining principles with pragmatism, a new approach and accompanying algorithm are presented for a longstanding problem in applied statistics: the interpretation of principal components. Following Rousson and Gasser [53 (2004) 539–555], ‘the ultimate goal is not to propose a method that leads automatically to a unique solution, but rather to develop tools for assisting the user in his or her choice of an interpretable solution’.
Accordingly, our approach is essentially exploratory. A vector is called ‘simple’ if its elements are small integers, and the approach poses the open question: ‘What sets of simply interpretable orthogonal axes, if any, are angle-close to the principal components of interest?’ Its answer is presented in summary form as an automated visual display of the solutions found, ordered by overall measures of simplicity, accuracy and star quality, from which the user may choose. Here, ‘star quality’ refers to striking overall patterns in the sets of axes found, which deserve to be drawn especially to the user’s attention precisely because they have emerged from the data, rather than being imposed on it by (implicitly) adopting a model. Indeed, other things being equal, explicit models can be checked by seeing whether their fits occur in our exploratory analysis, as we illustrate. Because orthogonality is required, the attractive visualization and dimension-reduction features of principal component analysis are retained.
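As a rough illustration of what ‘simple’ and ‘angle-close’ mean, the following Python sketch checks that a set of small-integer axes is mutually orthogonal and measures the angle between each axis and a principal component. A component is treated as a line, because its sign is arbitrary. The principal-component values are hypothetical numbers chosen for illustration, not results from the paper. The helper names `angle_deg` and `is_orthogonal_set` are our own, and this is not the authors' implementation.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between the lines spanned by u and v (sign-invariant)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    # abs(): a principal component is only defined up to sign;
    # min() guards against rounding pushing the cosine slightly above 1.
    c = min(1.0, abs(dot) / (nu * nv))
    return math.degrees(math.acos(c))

def is_orthogonal_set(vectors):
    """True if every pair of integer vectors has zero inner product."""
    return all(
        sum(a * b for a, b in zip(vectors[i], vectors[j])) == 0
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )

# Hypothetical principal components (illustrative numbers, not from the paper)
pcs = [
    [0.59, 0.56, 0.58],
    [0.69, -0.72, 0.05],
    [0.42, 0.40, -0.81],
]
# Candidate 'simple' axes: small integer elements, mutually orthogonal
simple = [[1, 1, 1], [1, -1, 0], [1, 1, -2]]

print(is_orthogonal_set(simple))
for s, p in zip(simple, pcs):
    print(s, round(angle_deg(s, p), 1))
```

Using the absolute inner product reflects the fact that an axis and its negation describe the same direction. A real analysis must also trade such angles off against simplicity, which the paper handles through its own overall measures of simplicity and accuracy.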
Exact implementation of this principled approach is shown to provide an exhaustive set of solutions, but is combinatorially hard. Pragmatically, we provide an efficient, approximate algorithm. Throughout, worked examples show how this new tool adds to the applied statistician’s armoury, effectively combining simplicity, retention of optimality and computational efficiency, while complementing existing methods. Examples are also given where simple structure in the population principal components is recovered using only information from the sample. Further developments are briefly indicated.
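To see why exact search is combinatorially hard, consider a single axis. If its elements are restricted to the integers -k..k in p dimensions, there are (2k+1)^p candidate vectors. The following sketch is a naive brute-force search of our own devising. It is not the authors' exact or approximate algorithm. It finds the simple vector closest in angle to one hypothetical principal component, keeping one integer representative per line (coprime entries, first nonzero entry positive). Searching over whole sets of mutually orthogonal axes, as the paper does, is harder still. The function names are hypothetical.

```python
import itertools
import math
from functools import reduce

def angle_deg(u, v):
    """Angle in degrees between the lines spanned by u and v (sign-invariant)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(min(1.0, abs(dot) / (nu * nv))))

def is_canonical(vec):
    """One representative per line: coprime entries, first nonzero entry positive."""
    nonzero = [x for x in vec if x != 0]
    if not nonzero or nonzero[0] < 0:
        return False  # zero vector, or the negated copy of another candidate
    return reduce(math.gcd, (abs(x) for x in nonzero)) == 1

def closest_simple_vector(pc, k):
    """Brute-force search over all integer vectors with entries in -k..k.

    Returns (best_vector, angle_in_degrees, number_of_vectors_enumerated).
    """
    best, best_angle, enumerated = None, 90.0, 0
    for cand in itertools.product(range(-k, k + 1), repeat=len(pc)):
        enumerated += 1  # (2k+1)**p tuples in total, including the zero vector
        if not is_canonical(cand):
            continue
        a = angle_deg(cand, pc)
        if a < best_angle:
            best, best_angle = cand, a
    return best, best_angle, enumerated

# Hypothetical principal component (illustrative numbers only)
pc = [0.59, 0.56, 0.58]
vec, ang, n = closest_simple_vector(pc, k=2)
print(vec, round(ang, 1), n)

# Growth of the candidate count (2k+1)**p for k = 2 as the dimension p increases
for p in (3, 6, 10, 20):
    print(p, 5 ** p)
```

Even with k = 2, the count passes nine million at p = 10, which is why a pragmatic approximate algorithm is needed in practice.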