Robust learning of high-dimensional Gaussian distributions

Bogdan Grechuk
30 Jun 2026

The Gödel Prize is an annual prize for outstanding papers in theoretical computer science. The 2026 Gödel Prize has been awarded to Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, and Alistair Stewart for their paper (Diakonikolas et al. 2019), which resolves a longstanding problem in robust statistics.

In many modern applications, the data are not just noisy in the usual random sense: a small part of the data may be completely unreliable. A sensor may fail, a database may contain malicious entries, or a very large experiment may include a few measurements produced under the wrong conditions. The classical estimators for the mean and the covariance of a Gaussian distribution are the empirical mean and the empirical covariance. They are excellent when all samples are honest, but they are fragile: a tiny fraction of carefully chosen outliers can move the empirical mean by a large amount in high dimensions.

We now describe a theorem of Diakonikolas, Kamath, Kane, Li, Moitra and Stewart (Diakonikolas et al. 2019), which shows that this obstacle can be overcome algorithmically for Gaussian distributions. We first introduce the precise language needed to state the result.

For a vector , let denote its transpose. A real symmetric matrix is positive semidefinite if for every , and it is positive definite if for every non-zero . If is a random vector in , its mean vector is and its covariance matrix is the matrix with entries Thus is the variance of the -th coordinate, while for measures how the -th and -th coordinates vary together.

For and a positive definite matrix , the Gaussian distribution is the probability distribution on with density where is the determinant of . The vector is the mean vector of the distribution, and the matrix is its covariance matrix.

To measure the quality of an estimated distribution, we use total variation distance. If and are two probability distributions on , define where the supremum is over all Borel sets , that is, over all sets whose probabilities are defined from open sets by countable unions, intersections, and complements. Equivalently, if and have densities and , then This number lies between and . It is exactly when the two distributions are the same, and it controls the advantage of any statistical test trying to distinguish a sample from from a sample from .

Finally, an -corrupted set of samples from a distribution is a list obtained as follows. First, independent samples are drawn from . Then an adversary, after seeing all the samples, is allowed to replace at most of them by arbitrary points of . The algorithm receives only the final corrupted list, not the identity of the corrupted points.

We use the asymptotic notation to denote for some universal constants .

The most striking feature of Theorem is that the final error bound has is independent of the dimension . Of course, the number of required samples must grow with : even without corruptions, estimating an arbitrary covariance matrix requires learning about numbers. But once enough samples are available, the theorem says that the effect of the adversary is controlled essentially by the corruption rate , not by the ambient dimension.

In the same work (Diakonikolas et al. 2019), the authors also proved versions of Theorem for many other important families of distributions, including a product distribution on the hypercube, mixtures of two product distributions, and mixtures of spherical Gaussians. These results became a turning point in robust statistics. Classical robust estimators in high dimensions often had good mathematical guarantees but were computationally intractable. This theorem shows, for some of the central families of probability distributions, that robustness and efficient computation can coexist: even when an adversary corrupts a constant fraction of the data, a polynomial-time algorithm can still recover a distribution that is almost indistinguishable from the truth.

References

Diakonikolas, Ilias, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. 2019. “Robust Estimators in High-Dimensions Without the Computational Intractability.” SIAM J. Comput. 48 (2): 742–864. https://doi.org/10.1137/17M1126680.

Robust learning of high-dimensional Gaussian distributions

References

Add a comment