Measuring selection when you have too many traits that are too correlated

Author blog: Professor John Stinchcombe explains why multicollinearity is not the end of the road for measuring selection on biological traits.

Natural selection is the engine of adaptive evolutionary change, and it’s safe to say that evolutionary biologists since Darwin have devoted enormous effort to understanding it. How do we measure it? How strong is it? Does it act the same way on males and females, on life history and morphology? How spatially or temporally variable is it? The strength, consistency, and mode of natural selection have profound implications for many fundamental evolutionary questions. Empirical progress requires a reliable way to measure selection acting on the individual traits we care about.

The Lande-Arnold Revolution

The Lande-Arnold approach gave us a straightforward way to measure natural selection, and revolutionized our understanding of it. By performing multiple regressions of relative fitness on multiple traits, empiricists had a means to estimate the strength and direction of selection (Fig 1). Its appeal is multi-faceted. Implementation is via multiple regression, easily within the wheelhouse of almost all biologists. The estimates obtained (partial regression coefficients, or selection gradients) have direct connections to quantitative genetic theory for predicting evolutionary responses. Selection gradients also distinguish direct selection on the trait of interest from selection on correlated traits included in the statistical model. With these tools in hand, evolutionary biologists set about measuring natural selection in numerous contexts: the wild, the greenhouse and growth chamber, and experimental mesocosms. A vast literature has developed around the Lande-Arnold approach, including statistical methods and issues of experimental design, typical strengths of selection, ways to compare selection amongst different species and traits, and many other issues.
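For readers who like to see the mechanics, here is a minimal sketch of the regression step in plain Python. It is an illustration under stated assumptions, not code from the paper: the function names, the choice to standardize traits to unit variance, and the six-individual dataset are all hypothetical. The fitness values were generated from a known linear rule so that the recovered gradients can be checked.

```python
from statistics import mean, pstdev

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def selection_gradients(traits, fitness):
    """Partial regression slopes of relative fitness on standardized traits.

    traits: one row per individual, one column per trait.
    fitness: absolute fitness per individual.
    """
    n, p = len(traits), len(traits[0])
    w_bar = mean(fitness)
    rel_w = [w / w_bar for w in fitness]          # relative fitness, mean 1
    cols = list(zip(*traits))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    # Assumption: traits are centered and scaled to unit variance.
    Z = [[(row[j] - mus[j]) / sds[j] for j in range(p)] for row in traits]
    # Normal equations Z'Z beta = Z'w; centering removes the intercept.
    ZtZ = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    Ztw = [sum(Z[i][a] * rel_w[i] for i in range(n)) for a in range(p)]
    return solve(ZtZ, Ztw)

# Hypothetical data: six individuals, two correlated traits.
traits = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
# Fitness built as 10 + 2*trait1 - 1*trait2, so the true slopes are known.
fitness = [10 + 2 * t1 - t2 for t1, t2 in traits]

beta = selection_gradients(traits, fitness)
print(beta)
```

Because the toy fitness values are an exact linear function of the traits, the two gradients come back as the raw slopes (2 and -1) rescaled by each trait's standard deviation and by mean fitness. Real data would add residual scatter and call for standard errors, which this sketch omits.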


Fig 1. The traditional Lande-Arnold selection gradient approach, with the panels representing partial regression plots, illustrating the relationship between fitness and N traits, accounting for the correlations between the traits. For trait 2, we show wide confidence intervals, as might be seen if traits 1 and 2 are tightly correlated and lead to multicollinearity.

A stubborn problem

Since the advent of the Lande-Arnold approach, however, one issue has remained especially stubborn: how to make sense of selection on principal component (PC) scores rather than on traditional traits. PC scores have two main advantages. First, they reduce the dimensionality of the analysis. One might be able to summarize many, even dozens, of traditional traits in a few PC axes that describe most of the variation (e.g., Fig. 2 has two panels rather than the N panels in Fig. 1). In fact, Lande and Arnold performed PCA on the Bumpus dataset for this reason. Second, PC scores are uncorrelated with each other. This feature becomes advantageous when the original traits are so highly correlated that a regression of relative fitness on all traits encounters multicollinearity. Under multicollinearity, traits are so highly correlated that it becomes impossible to distinguish their separate influences on relative fitness. Using PC scores is a bit like having your cake and eating it too: information from all the traits is present in PC scores, but knotty issues of multicollinearity and dimensionality appear to go away.


Fig 2. In the PC regression approach, we show the regression of relative fitness on PC scores for PC axes 1 and 2. PC1 and PC2 contain information about all the traits, but we use fewer PC axes than there were traits in Fig 1.

While measuring selection on PC scores seems useful, it introduces new drawbacks. Many biologists have an intuition about traits like date of first flowering, branch number, and growth rate; however, relating a linear combination of all three of these traits to an organism’s natural history or ecology is much harder. In addition, one of the major advances spawned by the Lande-Arnold revolution was to compare selection across studies: is selection on date of first reproduction similar between annuals and perennials? When traits are on a common scale, these comparisons are possible. Comparing selection on PC1 scores across studies is much harder, especially given differences in what traits enter the analysis and how variable they are. For these and other reasons, measuring selection on PC scores has been roundly criticized.

An alternative approach

In our new Comment & Opinion piece, published in Evolution Letters, we suggest an alternative that allows investigators to measure selection on PC scores and quantitatively interpret them in light of the original traits. The approach seems well suited to dealing with multicollinearity or dimensionality problems. Our approach is to quantitatively combine information about how PC scores relate to relative fitness with information about how PC scores relate to the original traits, yielding an estimate of how traits are associated with fitness. Practically, this involves some linear algebra: multiplying a matrix of eigenvectors from the original PCA by a vector of selection estimates obtained using PC scores as traits. The product is a set of selection estimates expressed on the original traits (Fig 3). We applied this approach to our own data and to literature examples to illustrate its effectiveness.


Fig 3. The linear algebra required to reconstitute selection gradients for the original traits. On the right-hand side, we use PC1 and PC2 as columns in a matrix, with the individual elements showing how traits 1, 2… N relate to the PC axes. The matrix of eigenvectors is multiplied by a vector, whose elements are the slopes from Fig 2.
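The multiplication in Fig 3 can be sketched in a few lines of Python. This is only an illustration: the function name, the loadings for three traits on two retained PC axes, and the two PC slopes are hypothetical numbers chosen for the example, not values from the paper.

```python
from math import sqrt

def project_to_traits(eigenvectors, pc_gradients):
    """Multiply an N-by-k eigenvector matrix by a length-k vector of PC slopes.

    Each row holds one trait's loadings on the retained PC axes, so each
    output element is the reconstituted selection estimate for one trait.
    """
    return [sum(e * g for e, g in zip(row, pc_gradients))
            for row in eigenvectors]

# Hypothetical loadings: rows are traits 1-3, columns are PC1 and PC2.
eigenvectors = [
    [1 / sqrt(3),  1 / sqrt(2)],
    [1 / sqrt(3), -1 / sqrt(2)],
    [1 / sqrt(3),  0.0],
]
# Hypothetical slopes from regressing relative fitness on PC1 and PC2 scores.
pc_gradients = [0.30, -0.10]

trait_gradients = project_to_traits(eigenvectors, pc_gradients)
print(trait_gradients)
```

In this toy example trait 3 does not load on PC2, so its estimate comes entirely from selection on PC1, while traits 1 and 2 receive contributions from both axes, with opposite signs for the PC2 part.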

Our approach requires one to use a subset of PC axes and PC scores as traits: using them all returns the same answers as traditional regression. Using a subset entails decisions about how many PC axes are required to capture variation in the traits without reintroducing multicollinearity. We suggest omitting the trailing PC axes describing relatively little variation (or mainly sampling variation), which statistical theory shows are responsible for multicollinearity, if it exists. It is important to note that decision-making like this is endemic to selection analysis: investigators must decide how many correlated traits to measure and include, or how many PC axes to include, especially when one recognizes how many features of an organism we could conceivably measure.
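The claim that using every PC axis simply reproduces the traditional regression can be checked on a toy case. The sketch below assumes two standardized, positively correlated traits, for which the PCA eigenvectors are known in closed form (the sum and the difference of the two traits), so no linear algebra library is needed. The data and variable names are hypothetical.

```python
from statistics import mean, pstdev
from math import sqrt

def standardize(x):
    """Center to mean 0 and scale to unit (population) variance."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]

def slope(x, y):
    """Simple regression slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: two tightly correlated traits and absolute fitness.
trait1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
trait2 = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]
fitness = [3.0, 5.0, 4.0, 7.0, 6.0, 9.0]

z1, z2 = standardize(trait1), standardize(trait2)
w = [f / mean(fitness) for f in fitness]          # relative fitness

# Closed-form eigenvectors for two standardized, positively correlated traits.
e1 = [1 / sqrt(2), 1 / sqrt(2)]                   # PC1: shared variation
e2 = [1 / sqrt(2), -1 / sqrt(2)]                  # PC2: the trailing axis
pc1 = [e1[0] * a + e1[1] * b for a, b in zip(z1, z2)]
pc2 = [e2[0] * a + e2[1] * b for a, b in zip(z1, z2)]

# PC scores are uncorrelated, so simple slopes equal partial slopes.
g1, g2 = slope(pc1, w), slope(pc2, w)

# Project back to the traits using both axes, then using PC1 only.
beta_all_pcs = [e1[0] * g1 + e2[0] * g2, e1[1] * g1 + e2[1] * g2]
beta_pc1_only = [e1[0] * g1, e1[1] * g1]

# Traditional multiple regression on the two standardized traits.
r = mean(a * b for a, b in zip(z1, z2))           # trait correlation
s1 = mean(a * c for a, c in zip(z1, w))           # cov(trait 1, fitness)
s2 = mean(b * c for b, c in zip(z2, w))           # cov(trait 2, fitness)
beta_ols = [(s1 - r * s2) / (1 - r ** 2), (s2 - r * s1) / (1 - r ** 2)]

print(beta_ols, beta_all_pcs, beta_pc1_only)
```

With both axes retained, the reconstituted estimates match the ordinary regression slopes up to rounding, multicollinearity and all. Dropping the trailing axis gives both traits the same estimate in this two-trait case, because the only information kept is their shared variation.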

General lessons

We see three general lessons. First, measuring selection on PC scores does not have to be abandoned due to interpretation challenges. For cases when PC axes have many component traits, with different signs and magnitudes to their loadings, working in terms of the original biological traits is much simpler and more intuitive. Second, it is tempting to interpret selection on PC axes as selection on the trait that loads most heavily on those axes. The literature examples we reviewed reveal that this isn’t always true: those same traits also load on the remaining PC axes, which themselves can be under selection. Projecting selection estimates for all PC scores back into terms of the original traits gives an overall picture of selection. Finally, and perhaps most important, our approach shows that multicollinearity and high-dimensional data do not need to be a stopping point for selection analysis. In these situations, investigators are forced to do something: either drop traits, change hypotheses, or forgo estimating selection gradients. We suggest that in the case of multicollinearity, or high-dimensional datasets like expression, volatiles, or metabolites, measuring selection on PC scores and then projecting the estimates back into the terms of the original traits is a promising way forward.



John Stinchcombe is Professor of Ecology & Evolutionary Biology at the University of Toronto. The original paper is freely available to read and download from Evolution Letters.