Jolliffe, I.T.; Trendafilov, N.T. and Uddin, M. (2003).
DOI: https://doi.org/10.1198/1061860032148
Abstract
In many multivariate statistical techniques, a set of linear functions of the original p variables is produced. One of the more difficult aspects of these techniques is the interpretation of the linear functions, as these functions usually have nonzero coefficients on all p variables. A common approach is to effectively ignore (treat as zero) any coefficients less than some threshold value, so that the function becomes simple and the interpretation becomes easier for the users. Such a procedure can be misleading. There are alternatives to principal component analysis which restrict the coefficients to a smaller number of possible values in the derivation of the linear functions, or replace the principal components by “principal variables.” This article introduces a new technique, borrowing an idea proposed by Tibshirani in the context of multiple regression, where similar problems arise in interpreting regression equations. This approach is the so-called LASSO, the “least absolute shrinkage and selection operator,” in which a bound is introduced on the sum of the absolute values of the coefficients, and in which some coefficients consequently become zero. We explore some of the properties of the new technique, both theoretically and using simulation studies, and apply it to an example.
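One natural reading of the LASSO idea applied to principal component analysis is: find the first component by maximising the variance a'Sa of a linear function with coefficient vector a, subject to a'a = 1 and a bound sum_j |a_j| <= t. The NumPy sketch below illustrates that reading under stated assumptions and is not the algorithm used in this article. It swaps in an alternating soft-thresholding (minorise-maximise) heuristic of the kind later described by Witten, Tibshirani and Hastie (2009). It finds only a local optimum and handles the first component only, ignoring the orthogonality constraints that later components would need. The function names (`soft_threshold`, `l1_bounded_direction`, `sparse_first_component`, `threshold_loadings`) and the synthetic two-factor data are hypothetical. Because every unit vector satisfies 1 <= sum_j |a_j| <= sqrt(p), a bound t >= sqrt(p) leaves ordinary PCA unchanged, while t = 1 forces a single nonzero coefficient.

```python
import numpy as np


def soft_threshold(x, delta):
    """Elementwise soft-thresholding: sign(x) * max(|x| - delta, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)


def l1_bounded_direction(z, t, n_bisect=50):
    """Maximise a @ z subject to ||a||_2 <= 1 and ||a||_1 <= t, for t >= 1.

    The maximiser is the normalised soft-thresholded vector
    soft_threshold(z, delta) / ||.||_2, with delta = 0 when the L1 bound is
    already met and otherwise (approximately) the smallest delta that meets it,
    located here by bisection.
    """
    a = z / np.linalg.norm(z)
    if np.abs(a).sum() <= t:
        return a  # bound inactive: plain normalisation
    # Feasible fallback: a one-hot vector on the largest |z_j| (its L1 norm is 1).
    j = np.argmax(np.abs(z))
    best = np.zeros_like(z)
    best[j] = np.sign(z[j])
    lo, hi = 0.0, np.abs(z).max()
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(z, mid)  # nonzero, since mid < max|z|
        cand = s / np.linalg.norm(s)
        if np.abs(cand).sum() <= t:
            hi, best = mid, cand  # feasible: try a smaller threshold
        else:
            lo = mid  # still too dense: threshold harder
    return best


def sparse_first_component(S, t, max_iter=1000, tol=1e-10):
    """Heuristic local solution of: maximise a' S a s.t. ||a||_2 = 1, ||a||_1 <= t.

    Each step maximises the linearisation a @ (S @ a_old) over the constraint
    set. Because a' S a is convex for a covariance matrix S, this minorise-
    maximise step cannot decrease the objective once the iterate is feasible.
    """
    if t < 1.0:
        raise ValueError("t must be >= 1: every unit vector has L1 norm >= 1")
    a = np.linalg.eigh(S)[1][:, -1]  # start from the ordinary first PC
    for _ in range(max_iter):
        a_new = l1_bounded_direction(S @ a, t)
        if np.linalg.norm(a_new - a) < tol:
            return a_new
        a = a_new
    return a


def threshold_loadings(a, cutoff):
    """The ad hoc approach, for contrast: zero out loadings with |a_j| < cutoff."""
    return np.where(np.abs(a) < cutoff, 0.0, a)


if __name__ == "__main__":
    # Hypothetical data: two latent factors, each driving three observed variables.
    rng = np.random.default_rng(0)
    factors = rng.standard_normal((200, 2))
    X = np.repeat(factors, 3, axis=1) + 0.3 * rng.standard_normal((200, 6))
    S = np.cov(X, rowvar=False)

    pc1 = np.linalg.eigh(S)[1][:, -1]
    print("PCA loadings:      ", np.round(pc1, 3))
    print("thresholded at 0.2:", np.round(threshold_loadings(pc1, 0.2), 3))
    for t in (1.0, 1.5, np.sqrt(6)):
        a = sparse_first_component(S, t)
        print(f"t = {t:.2f}: loadings", np.round(a, 3),
              "variance", round(float(a @ S @ a), 3))
```

The design choice here is that each update has a closed form: maximising a linear function over the intersection of the unit L2 ball and an L1 ball gives a normalised, soft-thresholded vector. Zeros therefore arise from the constrained optimisation itself rather than from discarding small loadings after the fact, which is the contrast `threshold_loadings` is meant to highlight.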