Break-Even Correlation Thresholds

As we all know, backtesting is not a research tool, but the very end of your research pipeline. If you want to evaluate whether a given signal $s$ is predictive of returns $r$, you can do this more clearly and directly by regressing $r$ on $s$ or measuring their correlation $\rho$. But “how strong” does that correlation need to be for the signal to be “good enough”? A popular heuristic by Macrocephalopod provides a practical way of thinking about this question.

This note builds on that idea, replacing all approximations with a more general and more detailed derivation.

Linear Model

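We model the return $r$ as a linear function of the signal value $s$:

$$r = \alpha + \beta s + \sigma_\varepsilon \varepsilon \tag{1}$$

where $\alpha$ is the intercept (the expected return at $s = 0$, which coincides with the unconditional expected return when the signal has zero mean), $\beta$ is the slope coefficient, and the residual is a mean-zero random variable with variance $\sigma_\varepsilon^2$, which we write as a scaled unit-variance residual $\sigma_\varepsilon \varepsilon$ with $\sigma_\varepsilon$ denoting the residual standard deviation. We assume $\operatorname{Cov}(s, \varepsilon) = 0$, which we use throughout.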

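The correlation $\rho$ between $r$ and $s$ follows directly from the linear model. To establish the exact notation used in the subsequent evaluation criteria, we briefly restate the derivation from the standard definition of the correlation coefficient, writing $\sigma_s^2 = \operatorname{Var}(s)$ and $\sigma_r^2 = \operatorname{Var}(r)$:

$$
\begin{aligned}
\rho &= \frac{\operatorname{Cov}(r, s)}{\sqrt{\operatorname{Var}(r)}\,\sqrt{\operatorname{Var}(s)}} && \text{(2a)} \\
&= \frac{\operatorname{Cov}(\alpha + \beta s + \sigma_\varepsilon \varepsilon,\; s)}{\sqrt{\operatorname{Var}(\alpha + \beta s + \sigma_\varepsilon \varepsilon)}\,\sqrt{\operatorname{Var}(s)}} && \text{(2b)} \\
&= \frac{\operatorname{Cov}(\alpha, s) + \operatorname{Cov}(\beta s, s) + \operatorname{Cov}(\sigma_\varepsilon \varepsilon, s)}{\sqrt{\operatorname{Var}(\alpha) + \operatorname{Var}(\beta s) + \operatorname{Var}(\sigma_\varepsilon \varepsilon) + 2\operatorname{Cov}(\alpha, \beta s) + 2\operatorname{Cov}(\alpha, \sigma_\varepsilon \varepsilon) + 2\operatorname{Cov}(\beta s, \sigma_\varepsilon \varepsilon)}\,\sqrt{\operatorname{Var}(s)}} && \text{(2c)} \\
&= \frac{\beta \operatorname{Var}(s) + \sigma_\varepsilon \operatorname{Cov}(\varepsilon, s)}{\sqrt{\beta^2 \operatorname{Var}(s) + \sigma_\varepsilon^2 \operatorname{Var}(\varepsilon) + 2\beta\sigma_\varepsilon \operatorname{Cov}(s, \varepsilon)}\,\sqrt{\operatorname{Var}(s)}} && \text{(2d)} \\
&= \frac{\beta \sigma_s^2}{\sqrt{\beta^2 \sigma_s^2 + \sigma_\varepsilon^2}\;\sigma_s} && \text{(2e)} \\
&= \beta\,\frac{\sigma_s}{\sigma_r} && \text{(2f)}
\end{aligned}
$$

In (2a) we write the standard definition of the correlation coefficient. In (2b) we substitute the linear model for $r$, after which (2c) applies the bilinearity of covariance in the numerator and the full variance expansion in the denominator. In (2d) we evaluate each term: $\operatorname{Cov}(\alpha, s) = 0$ and $\operatorname{Var}(\alpha) = 0$ because $\alpha$ is a constant; $\operatorname{Cov}(\beta s, s) = \beta \operatorname{Var}(s)$ and $\operatorname{Var}(\beta s) = \beta^2 \operatorname{Var}(s)$ because scalars factor out (and as their square for variance); $\operatorname{Var}(\sigma_\varepsilon \varepsilon) = \sigma_\varepsilon^2 \operatorname{Var}(\varepsilon)$ by the same rule; all covariance terms involving the constant $\alpha$ vanish; in the remaining terms the scalars $\beta$ and $\sigma_\varepsilon$ factor out. In (2e) we use $\operatorname{Cov}(s, \varepsilon) = 0$ by assumption and $\operatorname{Var}(\varepsilon) = 1$ by construction, which collapses both numerator and denominator. In (2f) we cancel one factor of $\sigma_s$ between numerator and denominator and recognise the remaining $\sqrt{\beta^2 \sigma_s^2 + \sigma_\varepsilon^2}$ as $\sigma_r$, since $\sigma_r^2 = \operatorname{Var}(r) = \beta^2 \sigma_s^2 + \sigma_\varepsilon^2$ follows directly from the denominators of (2d) and (2e).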

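For stating a signal evaluation criterion in the next step, we need $\beta$ expressed in terms of $\rho$, which we read off directly from the final line of the correlation derivation, $\rho = \beta \sigma_s / \sigma_r$, by multiplying both sides by $\sigma_r / \sigma_s$:

$$\beta = \rho\,\frac{\sigma_r}{\sigma_s} \tag{3}$$

This is the standard identity linking the regression slope to the correlation coefficient.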
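The slope and correlation identity can be checked numerically. The following is a minimal sketch, assuming normally distributed signal and residual and purely illustrative parameter values (none of them come from the note itself): it simulates the linear model, estimates the correlation, and compares the slope implied by the correlation with an ordinary least-squares slope.

```python
import numpy as np

# Illustrative parameters (assumptions, not values from the note).
alpha, beta, sigma_eps = 0.001, 0.5, 2.0
mu_s, sigma_s = 1.0, 3.0
n = 200_000

rng = np.random.default_rng(0)
s = rng.normal(loc=mu_s, scale=sigma_s, size=n)   # signal
eps = rng.standard_normal(n)                      # unit-variance residual
r = alpha + beta * s + sigma_eps * eps            # linear model

# Sample correlation between return and signal.
rho_hat = np.corrcoef(r, s)[0, 1]

# Slope implied by the correlation: beta = rho * sigma_r / sigma_s.
beta_from_rho = rho_hat * r.std() / s.std()

# Slope from an ordinary least-squares fit of r on s.
beta_ols = np.polyfit(s, r, 1)[0]

# Population correlation implied by the model parameters.
rho_theory = beta * sigma_s / np.sqrt(beta**2 * sigma_s**2 + sigma_eps**2)

print(f"rho_hat={rho_hat:.4f}  rho_theory={rho_theory:.4f}")
print(f"beta_from_rho={beta_from_rho:.4f}  beta_ols={beta_ols:.4f}")
```

In sample, the two slope estimates agree up to floating-point error, because the identity also holds for the sample moments; the estimated correlation only approaches the theoretical value as the sample grows.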

Signal Evaluation Criterion

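Finally, we state what it means for a signal to be “good enough”. We require that, at a signal level $k$ standard deviations from its mean, i.e. at $s = \mu_s + k\sigma_s$, the corresponding absolute expected return exceeds a trading cost threshold $c > 0$:

$$
\begin{aligned}
\bigl|\mathbb{E}[\,r \mid s = \mu_s + k\sigma_s\,]\bigr| &> c && \text{(4a)} \\
\bigl|\alpha + \beta\,(\mu_s + k\sigma_s)\bigr| &> c && \text{(4b)} \\
\Bigl|\alpha + \rho\,\frac{\sigma_r}{\sigma_s}\,(\mu_s + k\sigma_s)\Bigr| &> c && \text{(4c)}
\end{aligned}
$$

In (4a) we state the criterion in general terms: the conditional expected return, evaluated at a signal realization $k$ standard deviations from its mean $\mu_s$, must exceed the threshold $c$ in absolute value. In (4b) we substitute $r$ from the linear model $r = \alpha + \beta s + \sigma_\varepsilon \varepsilon$ (treating the residual as having zero mean conditional on $s$), and in (4c) we replace $\beta$ using the slope identity $\beta = \rho\,\sigma_r / \sigma_s$, which expresses the criterion entirely in terms of $\rho$. The absolute value reflects that the signal can be profitable in either direction (long or short).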
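The criterion can be evaluated directly for a given parameter set. The sketch below is one possible way to do so; the function names `expected_return_at_k` and `clears_cost` and all numeric inputs are hypothetical and chosen only for illustration.

```python
def expected_return_at_k(alpha: float, rho: float, sigma_r: float,
                         mu_s: float, sigma_s: float, k: float) -> float:
    """Conditional expected return at s = mu_s + k * sigma_s."""
    beta = rho * sigma_r / sigma_s          # slope implied by the correlation
    return alpha + beta * (mu_s + k * sigma_s)


def clears_cost(alpha: float, rho: float, sigma_r: float,
                mu_s: float, sigma_s: float, k: float, c: float) -> bool:
    """True if the absolute expected return at the evaluation point exceeds c."""
    return abs(expected_return_at_k(alpha, rho, sigma_r, mu_s, sigma_s, k)) > c


# Hypothetical inputs: zero intercept, demeaned unit-variance signal,
# 2% return volatility, correlation of 0.05, evaluated one sigma out.
params = dict(alpha=0.0, rho=0.05, sigma_r=0.02, mu_s=0.0, sigma_s=1.0, k=1.0)

print(expected_return_at_k(**params))       # about 0.001, i.e. roughly 10 bps
print(clears_cost(**params, c=0.0005))      # cost of 5 bps
print(clears_cost(**params, c=0.0020))      # cost of 20 bps
```

With these inputs the expected return at the evaluation point is about 10 basis points, so the signal clears a 5 basis point cost but not a 20 basis point cost.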

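The parameter $k$ controls how strict the criterion is and has a direct probabilistic interpretation. Since $\mathbb{E}[r \mid s]$ is linear in $s$, all realizations of $s$ closer to the mean $\mu_s$, i.e. where $|s - \mu_s| < k\sigma_s$, fail as well if $s = \mu_s + k\sigma_s$ fails (this holds exactly when the unconditional expected return $\alpha + \beta\mu_s$ is zero; otherwise it is a heuristic). By Chebyshev’s inequality,

$$P\bigl(|s - \mu_s| < k\sigma_s\bigr) \ge 1 - \frac{1}{k^2} \tag{5}$$

So at least a fraction $1 - 1/k^2$ of all signal realizations fall within this range. If $|\mathbb{E}[r \mid s]|$ fails to clear $c$ at the boundary, the signal may be economically non-viable for the majority of realizations and should be discarded. A smaller $k$ raises the bar on $\rho$ because a lower fraction of unprofitable realizations is accepted, whereas a larger $k$ lowers the bar because a higher fraction of unprofitable realizations is accepted.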
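To get a feel for the Chebyshev bound, the short sketch below tabulates the guaranteed minimum fraction of realizations within $k$ standard deviations of the mean. The helper name `chebyshev_lower_bound` is hypothetical; the bound is distribution-free and therefore loose for well-behaved signals, and it is only informative for $k > 1$.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Lower bound on P(|s - mu_s| < k * sigma_s) from Chebyshev's inequality."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the inequality is vacuous, so the bound is floored at zero.
    return max(0.0, 1.0 - 1.0 / k**2)


for k in (1.0, 1.5, 2.0, 3.0):
    print(f"k={k:.1f}: at least {chebyshev_lower_bound(k):.1%} within range")
```

For example, $k = 2$ guarantees that at least 75% of realizations lie within the range, whatever the distribution of the signal.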

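The absolute value in the criterion $\bigl|\alpha + \rho\,\frac{\sigma_r}{\sigma_s}(\mu_s + k\sigma_s)\bigr| > c$ splits into two cases, depending on whether the expression inside is strictly positive or strictly negative:

$$
\begin{aligned}
\text{Case 1:}\quad & \alpha + \rho\,\frac{\sigma_r}{\sigma_s}\,(\mu_s + k\sigma_s) > c \\
\text{Case 2:}\quad & \alpha + \rho\,\frac{\sigma_r}{\sigma_s}\,(\mu_s + k\sigma_s) < -c
\end{aligned}
$$

Case 1 corresponds to the signal pushing expected returns above the positive threshold $c$ (profitable for a long position), while Case 2 corresponds to pushing expected returns below $-c$ (profitable for a short position).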

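Case 1: Long Profitability

We rearrange the long condition $\alpha + \rho\,\frac{\sigma_r}{\sigma_s}(\mu_s + k\sigma_s) > c$ by moving $\alpha$ to the right-hand side and dividing by $\sigma_r$:

$$\rho\,\frac{\mu_s + k\sigma_s}{\sigma_s} > \frac{c - \alpha}{\sigma_r} \tag{6}$$

Dividing both sides of (6) by $(\mu_s + k\sigma_s)/\sigma_s$, which we assume to be nonzero, yields two subcases depending on its sign:

$$
\begin{aligned}
\rho &> \frac{(c - \alpha)\,\sigma_s}{\sigma_r\,(\mu_s + k\sigma_s)} \quad \text{if } \mu_s + k\sigma_s > 0 && \text{(7a)} \\
\rho &< \frac{(c - \alpha)\,\sigma_s}{\sigma_r\,(\mu_s + k\sigma_s)} \quad \text{if } \mu_s + k\sigma_s < 0 && \text{(7b)}
\end{aligned}
$$

In (7a) the evaluation point $\mu_s + k\sigma_s$ is positive, so dividing preserves the inequality direction, and $\rho$ must exceed the threshold on the right. In (7b) the evaluation point is negative, so dividing reverses the direction, and $\rho$ must fall below the threshold. Notably, if $\alpha > c$, profitability does not require a strictly positive correlation in (7a) or a strictly negative correlation in (7b), since the intercept $\alpha$ alone already exceeds the cost threshold $c$.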

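Case 2: Short Profitability

We rearrange the short condition $\alpha + \rho\,\frac{\sigma_r}{\sigma_s}(\mu_s + k\sigma_s) < -c$ analogously:

$$\rho\,\frac{\mu_s + k\sigma_s}{\sigma_s} < -\frac{c + \alpha}{\sigma_r} \tag{8}$$

Dividing both sides of (8) by $(\mu_s + k\sigma_s)/\sigma_s$:

$$
\begin{aligned}
\rho &< -\frac{(c + \alpha)\,\sigma_s}{\sigma_r\,(\mu_s + k\sigma_s)} \quad \text{if } \mu_s + k\sigma_s > 0 && \text{(9a)} \\
\rho &> -\frac{(c + \alpha)\,\sigma_s}{\sigma_r\,(\mu_s + k\sigma_s)} \quad \text{if } \mu_s + k\sigma_s < 0 && \text{(9b)}
\end{aligned}
$$

In (9a) the evaluation point is positive, so dividing preserves the direction and $\rho$ must fall below the threshold. In (9b) the evaluation point is negative, so dividing reverses it and $\rho$ must exceed the threshold. Analogously, if $\alpha < -c$, profitability does not require a strictly positive correlation in (9b) or a strictly negative correlation in (9a), since the intercept $\alpha$ alone already lies below the negative cost threshold $-c$.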

Application

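Which of the two bounds applies in each case is fully determined by the input parameters. Given a concrete signal with intercept $\alpha$, correlation $\rho$, return volatility $\sigma_r$, signal mean $\mu_s$, signal volatility $\sigma_s$, a cost threshold $c$, and an evaluation level $k$, the procedure is as follows:

First, determine whether you are checking for long profitability (Case 1) or short profitability (Case 2), keeping in mind that both can be checked independently. At a single evaluation point with $c > 0$ a signal can satisfy at most one of the two, while across different evaluation points (for example $+k$ and $-k$) it may satisfy one, both, or neither. Second, check the sign of the evaluation point $\mu_s + k\sigma_s$. If it is positive, $\rho$ must exceed $\frac{(c - \alpha)\sigma_s}{\sigma_r(\mu_s + k\sigma_s)}$ for long profitability or fall below $-\frac{(c + \alpha)\sigma_s}{\sigma_r(\mu_s + k\sigma_s)}$ for short profitability; if it is negative, both inequalities reverse. Third, choose $k$ according to how selective you wish to be.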
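The procedure can be sketched in a few lines of code. The functions `break_even_bounds` and `classify` and every numeric input below are hypothetical illustrations, assuming $c > 0$, $\sigma_r > 0$, $\sigma_s > 0$ and a nonzero evaluation point.

```python
def break_even_bounds(alpha: float, sigma_r: float, mu_s: float,
                      sigma_s: float, k: float, c: float):
    """Return (long_threshold, short_threshold, evaluation_point_is_positive)."""
    x = mu_s + k * sigma_s                      # evaluation point
    if x == 0:
        raise ValueError("evaluation point must be nonzero")
    long_thr = (c - alpha) * sigma_s / (sigma_r * x)
    short_thr = -(c + alpha) * sigma_s / (sigma_r * x)
    return long_thr, short_thr, x > 0


def classify(rho: float, alpha: float, sigma_r: float, mu_s: float,
             sigma_s: float, k: float, c: float):
    """Return (long_profitable, short_profitable) at the evaluation point."""
    long_thr, short_thr, positive = break_even_bounds(
        alpha, sigma_r, mu_s, sigma_s, k, c)
    if positive:
        # Dividing by a positive evaluation point keeps the inequality direction.
        return rho > long_thr, rho < short_thr
    # Dividing by a negative evaluation point flips both inequalities.
    return rho < long_thr, rho > short_thr


# Hypothetical inputs: zero intercept, demeaned unit-variance signal,
# 2% return volatility, 10 bps cost, evaluated two sigmas out.
inputs = dict(alpha=0.0, sigma_r=0.02, mu_s=0.0, sigma_s=1.0, k=2.0, c=0.001)

print(break_even_bounds(**inputs))   # thresholds near +0.025 and -0.025
for rho in (0.03, 0.01, -0.03):
    print(rho, classify(rho, **inputs))
```

With these inputs the break-even correlation is roughly 0.025 in absolute value: a correlation of 0.03 clears the long bound, -0.03 clears the short bound, and 0.01 clears neither.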
