As we all know, backtesting is not a research tool, but the very end of your research pipeline. If you want to evaluate whether a given set of signals is predictive of returns, you can do this more clearly and directly by regressing returns on the signals or measuring their correlations. But “how strong” do those correlations need to be for the signals to be “good enough”? And what about their interaction?
Linear Model
Say we have $n$ signals collected in a vector $s \in \mathbb{R}^n$ and a scalar return $r$ we try to predict. We model the return as a linear function of the signals:

$$
r = \alpha + \beta^\top s + \varepsilon
$$

$\alpha$ is the regression’s intercept (the unconditional expected return), $\beta \in \mathbb{R}^n$ is the vector of slope coefficients, and the residual $\varepsilon$ is a mean-zero random variable with variance $\sigma_\varepsilon^2$. We can decompose $\varepsilon$ into a scaled unit-variance residual term $\varepsilon = \sigma_\varepsilon u$, where $u$ has unit variance, yielding $r = \alpha + \beta^\top s + \sigma_\varepsilon u$. We assume $\operatorname{Cov}(s, u) = 0$, which we use throughout.
From the Model to Correlations
The marginal correlation between $r$ and a signal $s_i$ is $\rho_i$, defined as $\rho_i = \operatorname{Cov}(s_i, r) / (\sigma_i \sigma_r)$, where $\sigma_i$ is the standard deviation of the $i$-th signal and $\sigma_r$ is the standard deviation of the returns $r$. We collect the marginal standard deviations in the diagonal matrix $D = \operatorname{diag}(\sigma_1, \dots, \sigma_n)$ and the correlations in the vector $\rho$. Dividing each component of $\operatorname{Cov}(s, r)$ by the corresponding $\sigma_i$ term amounts to pre-multiplying by $D^{-1}$. Substituting $r = \alpha + \beta^\top s + \sigma_\varepsilon u$ into the definition and carrying through:

$$
\begin{aligned}
\rho &= \frac{1}{\sigma_r} D^{-1} \operatorname{Cov}(s, r) \\
&= \frac{1}{\sigma_r} D^{-1} \operatorname{Cov}\!\left(s, \alpha + \beta^\top s + \sigma_\varepsilon u\right) \\
&= \frac{1}{\sigma_r} D^{-1} \left( \operatorname{Cov}(s, \alpha) + \operatorname{Cov}\!\left(s, \beta^\top s\right) + \sigma_\varepsilon \operatorname{Cov}(s, u) \right) \\
&= \frac{1}{\sigma_r} D^{-1} \Sigma \beta \\
&= \frac{1}{\sigma_r} C D \beta
\end{aligned}
$$

In the first line we write the vector form of the correlation definition, and in the second we substitute the model for $r$. In the third we use the linearity of covariance. In the fourth we use three facts: $\operatorname{Cov}(s, \alpha) = 0$ because $\alpha$ is a constant; $\operatorname{Cov}(s, \beta^\top s) = \Sigma \beta$ where $\Sigma$ is the covariance matrix of the signals; and $\operatorname{Cov}(s, u) = 0$ by our assumption, and we collect terms. In the last line we simplify by recalling how the covariance matrix decomposes into correlations and standard deviations: the signal correlation matrix $C$ is defined by $\Sigma = D C D$, so pre-multiplying by $D^{-1}$ gives $D^{-1} \Sigma \beta = D^{-1} D C D \beta = C D \beta$.
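The identity can be checked numerically. The following sketch simulates the linear model with assumed illustrative parameter values and compares the empirical marginal correlations against the closed-form expression $\rho = C D \beta / \sigma_r$:

```python
import numpy as np

# Monte Carlo check of the identity rho = C D beta / sigma_r.
# All numeric values below are assumed for illustration.
rng = np.random.default_rng(0)

alpha = 0.01                               # intercept
beta = np.array([0.5, -0.2, 0.3])          # slope coefficients
sigma_eps = 0.1                            # residual scale

mu = np.array([0.1, 0.0, -0.1])            # signal means
sig = np.array([1.0, 2.0, 0.5])            # signal standard deviations
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, -0.2],
              [0.1, -0.2, 1.0]])           # signal correlation matrix
D = np.diag(sig)
Sigma = D @ C @ D                          # signal covariance matrix

# Simulate signals and returns from r = alpha + beta's + sigma_eps * u.
s = rng.multivariate_normal(mu, Sigma, size=500_000)
r = alpha + s @ beta + sigma_eps * rng.standard_normal(len(s))

sigma_r = r.std()
rho_empirical = np.array([np.corrcoef(s[:, i], r)[0, 1] for i in range(3)])
rho_formula = C @ D @ beta / sigma_r

max_err = np.abs(rho_empirical - rho_formula).max()
print(max_err)  # only small sampling noise remains
```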
From Correlations to Betas
For stating a signal evaluation criterion in the next step, we need $\beta$ to be expressed in terms of $\rho$. We obtain this by inverting $\rho = \frac{1}{\sigma_r} C D \beta$, multiplying both sides on the left by $\sigma_r D^{-1} C^{-1}$:

$$
\sigma_r D^{-1} C^{-1} \rho = \sigma_r D^{-1} C^{-1} \frac{1}{\sigma_r} C D \beta = D^{-1} C^{-1} C D \beta = D^{-1} D \beta = \beta
$$

In the first equality we multiply both sides of $\rho = \frac{1}{\sigma_r} C D \beta$ by $\sigma_r D^{-1} C^{-1}$, where the $\sigma_r$ on the right cancels with $\frac{1}{\sigma_r}$. In the second the product $C^{-1} C$ reduces to an identity, and in the third $D^{-1} D$ reduces to an identity as well. Reading from right to left:

$$
\beta = \sigma_r D^{-1} C^{-1} \rho
$$

This is the multivariate generalisation of the well-known univariate identity $\beta = \rho \, \frac{\sigma_r}{\sigma_s}$ linking the regression slope to the correlation coefficient, where the inverse correlation matrix $C^{-1}$ adjusts for cross-correlations among the signals by isolating each signal’s unique contribution conditional on the others.
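A quick exact round trip confirms the inversion: starting from a slope vector, computing the implied correlations, and applying the formula above recovers the slopes. All parameter values are assumed for illustration:

```python
import numpy as np

# Exact round trip beta -> rho -> beta, using population formulas only.
# All numeric values below are assumed for illustration.
beta = np.array([0.5, -0.2, 0.3])          # slope coefficients
sigma_eps = 0.1                            # residual scale
sig = np.array([1.0, 2.0, 0.5])            # signal standard deviations
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, -0.2],
              [0.1, -0.2, 1.0]])           # signal correlation matrix
D = np.diag(sig)
Sigma = D @ C @ D

# Population return volatility implied by the model: Var(r) = beta' Sigma beta + sigma_eps^2.
sigma_r = np.sqrt(beta @ Sigma @ beta + sigma_eps**2)

rho = C @ D @ beta / sigma_r                                           # model -> correlations
beta_recovered = sigma_r * np.linalg.inv(D) @ np.linalg.solve(C, rho)  # correlations -> slopes
print(np.allclose(beta_recovered, beta))  # True
```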
Signal Evaluation Criterion
Finally, we state what it means for the correlations of a set of signals to be “good enough”. We require that, at a signal level $z$ standard deviations from the mean, i.e. at $s = \mu + Dz$ where $\mu$ is the signal mean vector, the corresponding absolute expected return exceeds a trading cost threshold $c$:

$$
\begin{aligned}
c &< \left| \mathbb{E}[r \mid s = \mu + Dz] \right| \\
&= \left| \alpha + \beta^\top (\mu + Dz) \right| \\
&= \left| \alpha + \left( \sigma_r D^{-1} C^{-1} \rho \right)^\top (\mu + Dz) \right| \\
&= \left| \alpha + \sigma_r \rho^\top C^{-1} D^{-1} (\mu + Dz) \right| \\
&= \left| \alpha + \sigma_r \rho^\top C^{-1} \left( D^{-1}\mu + z \right) \right| \\
&= \left| \alpha + \sigma_r \rho^\top C^{-1} (\tilde{\mu} + z) \right|
\end{aligned}
$$

In the first line we state the criterion in general terms: the conditional expected return, evaluated at a signal realisation $z$ standard deviations from the mean, must exceed the threshold in absolute value. In the second line we substitute $\mathbb{E}[r \mid s] = \alpha + \beta^\top s$, and in the third we replace $\beta$ using $\beta = \sigma_r D^{-1} C^{-1} \rho$. In the fourth we transpose the product, using the fact that $D^{-1}$ and $C^{-1}$ are both symmetric, so $\left( D^{-1} C^{-1} \rho \right)^\top = \rho^\top C^{-1} D^{-1}$. In the fifth we distribute $D^{-1}$ over the sum, noting that $D^{-1} D z = z$. In the last line we define $\tilde{\mu} := D^{-1} \mu$, the vector of standardised signal means whose $i$-th component is $\mu_i / \sigma_i$. The absolute value reflects that the signals can be profitable in either direction (long or short).
Since the term $\rho^\top C^{-1} (\tilde{\mu} + z)$ appears repeatedly throughout the rest of the article, we name it $\kappa$:

$$
\kappa := \rho^\top C^{-1} (\tilde{\mu} + z)
$$

This quantity collapses the entire vector of correlations $\rho$, the inter-signal dependence structure $C$, and the evaluation point $\tilde{\mu} + z$ into a single number, so that the criterion simplifies and reads:

$$
\left| \alpha + \sigma_r \kappa \right| > c
$$
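As a numeric sketch with assumed illustrative inputs, the combined signal strength is one scalar obtained from the correlation vector, the inverse signal correlation matrix, and the evaluation point, after which the criterion involves only scalars:

```python
import numpy as np

# Collapse correlations, dependence structure, and evaluation point into kappa.
# All numeric values below are assumed for illustration.
rho = np.array([0.05, 0.02, -0.03])        # marginal signal-return correlations
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, -0.2],
              [0.1, -0.2, 1.0]])           # signal correlation matrix
mu_tilde = np.array([0.1, 0.0, -0.05])     # standardised signal means mu_i / sigma_i
z = np.array([1.5, 0.5, -1.0])             # evaluation point, in standard deviations

kappa = rho @ np.linalg.solve(C, mu_tilde + z)

# The profitability criterion |alpha + sigma_r * kappa| > c now needs only scalars.
alpha, sigma_r, c = 0.0005, 0.01, 0.001    # intercept, return vol, cost threshold
passes = abs(alpha + sigma_r * kappa) > c
print(round(kappa, 4), passes)
```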
The Parameter $z$
The vector $z$ controls our evaluation point and has a probabilistic interpretation. Intuitively, one might want to choose an arbitrary vector, such as setting all signals to the same number of standard deviations, and test if the resulting expected return is profitable. However, an arbitrary choice breaks the probabilistic guarantees this parameter encodes. Instead, we must measure how “extreme” a given $z$ is relative to the multivariate signal distribution, which is captured by the squared Mahalanobis distance $k^2$:

$$
k^2 = z^\top C^{-1} z
$$

since the standardised signals $D^{-1}(s - \mu)$ have covariance matrix $C$, making $C^{-1}$ the appropriate metric.
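For instance (with an assumed correlation matrix), setting every signal to one standard deviation gives exactly $k^2 = n$ when the signals are uncorrelated, but a different, typically smaller or larger, value once the correlation matrix deviates from the identity:

```python
import numpy as np

# Squared Mahalanobis distance of an evaluation vector z.
# The correlation matrix below is assumed for illustration.
z = np.ones(3)                             # one standard deviation on every signal
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, -0.2],
              [0.1, -0.2, 1.0]])           # signal correlation matrix

k2_correlated = z @ np.linalg.solve(C, z)
k2_independent = z @ np.linalg.solve(np.eye(3), z)  # reduces to z'z = 3
print(k2_correlated, k2_independent)
```

With this particular positively tilted correlation structure, the all-ones point is less extreme than under independence.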
However, we are not done yet. If we were to test an arbitrary point $z$ that lies on the boundary of an ellipsoid with distance $k$, and this point fails the profitability criterion, we cannot mathematically guarantee that the entire ellipsoid is unprofitable. Because the expected return is a linear hyperplane in the signals, it might intersect the ellipsoid such that other, equally probable signal combinations on the same $k$-ellipsoid yield more profitable returns.

Therefore, to establish a strict lower bound on the fraction of unprofitable realizations, we must fix our probability budget upfront by choosing $k$ and subsequently find the most profitable point on that specific $k$-ellipsoid. If the correlations fail to cover the cost threshold even at this optimum, i.e., $\alpha + \sigma_r \kappa \le c$ (for a long position) or $\alpha + \sigma_r \kappa \ge -c$ (for a short position), then by linearity, the entire ellipsoid and its tangent half-space extending in the adverse direction must be unprofitable.
Because the conditional expected return is a linear combination of the signals, this optimization is equivalent to projecting the entire $n$-dimensional signal space onto a single 1-dimensional scalar axis $y = \rho^\top C^{-1} z$, where $z = D^{-1}(s - \mu)$ denotes the standardised signal deviations. On this 1D projection axis, the mean is $\mathbb{E}[y] = 0$ and the variance is $\sigma_y^2 = \rho^\top C^{-1} C C^{-1} \rho = \rho^\top C^{-1} \rho$.
To find the exact evaluation points that extremize this projection, we can drop the constant terms $\alpha$ and $\sigma_r \rho^\top C^{-1} \tilde{\mu}$ from the expected return, isolating the relevant variable term $\rho^\top C^{-1} z$. For both long (maximum) and short (minimum), the optimization problems are:

$$
\max_z \; \rho^\top C^{-1} z
\quad \text{and} \quad
\min_z \; \rho^\top C^{-1} z
\quad \text{subject to} \quad
z^\top C^{-1} z = k^2
$$

We can solve this analytically using Lagrange multipliers, which naturally yields both the global maximum and minimum simultaneously. We define the Lagrangian $\mathcal{L}(z, \lambda) = \rho^\top C^{-1} z - \lambda \left( z^\top C^{-1} z - k^2 \right)$ and take the first-order condition with respect to $z$:

$$
\nabla_z \mathcal{L} = C^{-1} \rho - 2 \lambda C^{-1} z = 0
\quad \Longrightarrow \quad
z = \frac{1}{2\lambda} \rho
$$

We set the gradient to zero and solve for $z$. To find the multiplier $\lambda$, we substitute $z = \rho / (2\lambda)$ back into the constraint $z^\top C^{-1} z = k^2$:

$$
\left( \frac{\rho}{2\lambda} \right)^{\!\top} C^{-1} \left( \frac{\rho}{2\lambda} \right)
= \frac{\rho^\top C^{-1} \rho}{4 \lambda^2} = k^2
\quad \Longrightarrow \quad
\lambda = \pm \frac{\sqrt{\rho^\top C^{-1} \rho}}{2k}
$$

We expand the transpose, group the $\lambda$ terms, and isolate the scaling factor $\lambda$. Because taking the square root yields a $\pm$ solution, the Lagrange method perfectly captures both extremes: the positive root corresponds to the maximum (long), and the negative root corresponds to the minimum (short). Substituting $\lambda$ back into $z = \rho / (2\lambda)$ yields the closed-form analytical solution for the optimal evaluation points:

$$
z^\ast = \pm \frac{k}{\sqrt{\rho^\top C^{-1} \rho}} \, \rho
= \eta \, \frac{k}{\sqrt{\rho^\top C^{-1} \rho}} \, \rho
$$

where we define $\eta \in \{+1, -1\}$ to absorb the sign depending on the trade direction. At this exact point $z^\ast$, the deviation from the mean in the projected space reaches its maximum/minimum $\rho^\top C^{-1} z^\ast = \eta \, k \sqrt{\rho^\top C^{-1} \rho} = \eta \, k \sigma_y$. By operating on this 1-dimensional projection, we can bound the probability of the unprofitable tangent half-space using the one-sided Cantelli inequality:

$$
P\!\left( y - \mathbb{E}[y] \ge t \right) \le \frac{\sigma_y^2}{\sigma_y^2 + t^2}
\quad \text{for } t > 0
$$
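A brute-force check with assumed values confirms the closed form: the analytical point satisfies the ellipsoid constraint, attains the projected value $k\sigma_y$, and dominates random points drawn on the same $k$-ellipsoid:

```python
import numpy as np

# Verify the closed-form optimum against random points on the k-ellipsoid.
# All numeric values below are assumed for illustration.
rng = np.random.default_rng(1)
rho = np.array([0.05, 0.02, -0.03])        # marginal signal-return correlations
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, -0.2],
              [0.1, -0.2, 1.0]])           # signal correlation matrix
k = 2.0                                    # chosen Mahalanobis distance

Cinv_rho = np.linalg.solve(C, rho)
sigma_y = np.sqrt(rho @ Cinv_rho)
z_star = k * rho / sigma_y                 # eta = +1: the long-side optimum

# The optimum lies on the ellipsoid and attains the projected value k * sigma_y.
assert np.isclose(z_star @ np.linalg.solve(C, z_star), k**2)
assert np.isclose(Cinv_rho @ z_star, k * sigma_y)

# No random point on the same ellipsoid projects above +k*sigma_y or below -k*sigma_y.
v = rng.standard_normal((10_000, 3))
scale = k / np.sqrt(np.einsum("ij,ij->i", v, np.linalg.solve(C, v.T).T))
proj = (v * scale[:, None]) @ Cinv_rho
print(proj.max() <= k * sigma_y, proj.min() >= -k * sigma_y)
```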
If $\eta = +1$ (long), the optimal evaluation point lies above the mean, with projected value $y^\ast = \rho^\top C^{-1} z^\ast = k \sigma_y$. All realizations satisfying $y < y^\ast$ are closer to (or on the opposite side of) the mean and therefore fail. By the Cantelli inequality with $t = k \sigma_y$:

$$
\begin{aligned}
P\!\left( y \ge k \sigma_y \right) &\le \frac{\sigma_y^2}{\sigma_y^2 + k^2 \sigma_y^2} \\
&= \frac{1}{1 + k^2} \\
\Longrightarrow \quad P\!\left( y < y^\ast \right) &\ge 1 - \frac{1}{1 + k^2} = \frac{k^2}{1 + k^2}
\end{aligned}
$$

In the first line we apply the Cantelli inequality, $P(y - \mathbb{E}[y] \ge t) \le \sigma_y^2 / (\sigma_y^2 + t^2)$ for any distance $t > 0$, setting $t = k \sigma_y$, which is strictly positive. In the second line we factor $\sigma_y^2$ from the denominator and cancel. In the last line we take the complement: since $P(y \ge y^\ast) \le 1/(1 + k^2)$, it follows that $P(y < y^\ast) \ge k^2/(1 + k^2)$.
If $\eta = -1$ (short), the optimal evaluation point lies below the mean, with projected value $y^\ast = -k \sigma_y$. All realizations satisfying $y > y^\ast$ are closer to (or on the opposite side of) the mean and therefore fail. Applying the Cantelli inequality in its form $P(y - \mathbb{E}[y] \ge t) \ge t^2 / (\sigma_y^2 + t^2)$, valid for any distance $t < 0$, with $t = -k \sigma_y$:

$$
\begin{aligned}
P\!\left( y \ge -k \sigma_y \right) &\ge \frac{(-k \sigma_y)^2}{\sigma_y^2 + (-k \sigma_y)^2} \\
&= \frac{k^2 \sigma_y^2}{\sigma_y^2 \left( 1 + k^2 \right)} \\
&= \frac{k^2}{1 + k^2}
\end{aligned}
$$

In the first line we apply the Cantelli inequality, setting $t = -k \sigma_y$, which is strictly negative. In the second line we expand the square and factor $\sigma_y^2$ from the denominator, and in the third we cancel it with the numerator. The event $y \ge -k \sigma_y$ is exactly the set of realizations at or above the short-side optimum $y^\ast$, so this form directly bounds the correct side and we bypass the need to calculate the complement.
In both cases, if the combined signal strength $\kappa$ fails to clear $c$ at the boundary controlled by $k$, the signals are economically non-viable for at least a fraction $k^2/(1 + k^2)$ of realizations, which might be too large. A smaller $k$ raises the bar on the correlations $\rho$ because a lower fraction of unprofitable realizations is accepted, whereas a larger $k$ lowers the bar because a higher fraction of unprofitable realizations is accepted.
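Numerically, the guaranteed fraction grows quickly with $k$, and the bound is conservative: for an assumed Gaussian projection (Cantelli itself is distribution-free; the Gaussian here is purely illustrative), the actual failing fraction is well above the guarantee:

```python
import numpy as np

# Cantelli lower bound k^2/(1+k^2) vs. the empirical fraction for a Gaussian projection.
# The Gaussian assumption is illustrative only; Cantelli is distribution-free.
rng = np.random.default_rng(2)
y = rng.standard_normal(1_000_000)         # projected axis with mean 0, variance 1

for k in (0.5, 1.0, 2.0):
    bound = k**2 / (1 + k**2)              # guaranteed unprofitable fraction
    empirical = (y < k).mean()             # realizations below the long-side optimum
    print(k, round(bound, 3), round(empirical, 3))
    assert empirical >= bound              # the distribution-free bound holds
```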
Case Distinction
The absolute value in $\left| \alpha + \sigma_r \kappa \right| > c$ splits into two cases, depending on whether the expression inside is strictly positive or strictly negative:

$$
\text{Case 1:} \quad \alpha + \sigma_r \kappa > c
\qquad \qquad
\text{Case 2:} \quad \alpha + \sigma_r \kappa < -c
$$

Case 1 corresponds to the signals pushing expected returns above the positive threshold $c$ (profitable for a long position), while Case 2 corresponds to pushing expected returns below $-c$ (profitable for a short position). Both can be checked independently, and a set of signals may satisfy one, both, or neither.
Case 1: Long Profitability
We rearrange $\alpha + \sigma_r \kappa > c$ by moving $\alpha$ to the right-hand side, dividing by $\sigma_r$, and expanding $\kappa$ from its definition:

$$
\begin{aligned}
\sigma_r \kappa &> c - \alpha \\
\kappa &> \frac{c - \alpha}{\sigma_r} \\
\rho^\top C^{-1} (\tilde{\mu} + z) &> \frac{c - \alpha}{\sigma_r}
\end{aligned}
$$

In the first line we move $\alpha$ to the right-hand side. In the second we divide by $\sigma_r > 0$, which preserves the inequality direction, and in the third we expand $\kappa$ using $\kappa = \rho^\top C^{-1} (\tilde{\mu} + z)$. This is a linear constraint on the correlation vector $\rho$: the set of admissible $\rho$ is a half-space in $\mathbb{R}^n$ with normal direction $C^{-1} (\tilde{\mu} + z)$ and offset $(c - \alpha)/\sigma_r$. Notably, if $\alpha > c$, the right-hand side becomes negative, and profitability does not require $\kappa > 0$ since the unconditional return already exceeds the cost threshold $c$.
Case 2: Short Profitability
We rearrange $\alpha + \sigma_r \kappa < -c$ analogously, moving $\alpha$ to the right-hand side, dividing by $\sigma_r$, and expanding $\kappa$ from its definition:

$$
\begin{aligned}
\sigma_r \kappa &< -c - \alpha \\
\kappa &< -\frac{c + \alpha}{\sigma_r} \\
\rho^\top C^{-1} (\tilde{\mu} + z) &< -\frac{c + \alpha}{\sigma_r}
\end{aligned}
$$

In the first line we move $\alpha$ to the right-hand side. In the second we divide by $\sigma_r > 0$, which preserves the inequality direction, and in the third we expand $\kappa$ using $\kappa = \rho^\top C^{-1} (\tilde{\mu} + z)$. This is again a linear constraint on $\rho$, now with the reversed inequality. Analogously, if $\alpha < -c$, the right-hand side becomes positive, and profitability does not require $\kappa < 0$ since the unconditional return already lies below the cost threshold $-c$.
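The two rearranged half-space conditions are jointly equivalent to the original absolute-value criterion, which the following sketch double-checks with assumed values:

```python
import numpy as np

# The two half-space conditions jointly reproduce |alpha + sigma_r * kappa| > c.
# All numeric values below are assumed for illustration.
alpha, sigma_r, c = 0.0005, 0.01, 0.001    # intercept, return vol, cost threshold
rho = np.array([0.05, 0.02, -0.03])        # marginal signal-return correlations
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, -0.2],
              [0.1, -0.2, 1.0]])           # signal correlation matrix
mu_tilde = np.array([0.1, 0.0, -0.05])     # standardised signal means
z = np.array([1.5, 0.5, -1.0])             # evaluation point

kappa = rho @ np.linalg.solve(C, mu_tilde + z)
case1 = kappa > (c - alpha) / sigma_r      # long half-space
case2 = kappa < -(c + alpha) / sigma_r     # short half-space
original = abs(alpha + sigma_r * kappa) > c
print(case1, case2, original == (case1 or case2))
```

The equivalence in the last line holds for any inputs, not just these.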
Application
Given a concrete set of signals with intercept $\alpha$, marginal correlation vector $\rho$, signal correlation matrix $C$, return volatility $\sigma_r$, standardised signal mean vector $\tilde{\mu}$, and a cost threshold $c$, the procedure is as follows.
First, choose the Mahalanobis distance $k$ according to how selective you wish to be, noting that it determines the minimum fraction of signal realizations that is unprofitable when the criterion fails, $k^2/(1 + k^2)$ by the Cantelli inequality. Second, compute the combined signal strength $\kappa = \rho^\top C^{-1} (\tilde{\mu} + z^\ast)$ at the optimal evaluation point $z^\ast = \eta \, k \rho / \sqrt{\rho^\top C^{-1} \rho}$. Third, determine whether the signals clear the profitability threshold for long positions (Case 1), short positions (Case 2), or both, keeping in mind that both cases can be checked independently.
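The three steps can be put together in a minimal end-to-end sketch; all inputs below are assumed for illustration:

```python
import numpy as np

# End-to-end application of the procedure. All inputs are assumed for illustration.
alpha = 0.0005                             # intercept (unconditional expected return)
sigma_r = 0.01                             # return volatility
c = 0.001                                  # trading cost threshold
rho = np.array([0.05, 0.02, -0.03])        # marginal signal-return correlations
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, -0.2],
              [0.1, -0.2, 1.0]])           # signal correlation matrix
mu_tilde = np.array([0.1, 0.0, -0.05])     # standardised signal means

# Step 1: choose k; on failure, at least k^2/(1+k^2) of realizations are unprofitable.
k = 2.0
guaranteed_fraction = k**2 / (1 + k**2)

# Step 2: combined signal strength at the optimal evaluation point for each direction.
Cinv_rho = np.linalg.solve(C, rho)
sigma_y = np.sqrt(rho @ Cinv_rho)
base = Cinv_rho @ mu_tilde                 # rho' C^{-1} mu_tilde
kappa_long = base + k * sigma_y            # eta = +1
kappa_short = base - k * sigma_y           # eta = -1

# Step 3: check both cases independently.
long_viable = alpha + sigma_r * kappa_long > c
short_viable = alpha + sigma_r * kappa_short < -c
print(guaranteed_fraction, long_viable, short_viable)
```

With these particular inputs, only the long side clears the cost threshold.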
The univariate derivation of this note, which is less complex and therefore more intuitive, can be found here.