Chebyshev's Inequalities

Chebyshev's inequality and its descendants allow you to place an upper bound on the probability that some random variable is >= a set value, given only the mean and variance of that variable. No other information about that variable's distribution is required. Some of the descendants exist to make use of information about higher moments, though.

This web page was sparked off by a web page from Henry Bottomley, at He gives proofs for the standard two-tailed inequality and for a one-tailed variant, and shows examples where the inequality is exactly satisified. Herman Rubin responded in usenet, linking this to the 'problem of moments'. This result is also Question 9 in Problems 7.11 of 'Probability and Random Processes', by Grimmett and Stirzaker, published by Oxford Science Publications, ISBN 0 19 853665 8 - page 327 of my copy, which is the 1993 reprinting. The main contribution of this page is to give a slightly different proof for the one-tailed case, which I find easier to follow than Henry's.

The main use for Chebyshev's inequality is in proving theorems, such as laws of large numbers. The existence of cases where it is exact shows that it is as strong as it can be, given its total lack of assumptions on the distribution of the variable involved. Despite this, you can usually get very much stronger bounds by assuming, for instance, that the variable in question is normally distributed - so most people do. It would be nice to use Chebyshev's inequalities as a defence against variables in real life where outliers are expected, but in such cases the variance of the variable in question may not be known - and judging the error of estimates of variance will probably involve you in distributional assumptions anyway.

The standard two-tailed inequality

We are given a variable X with known mean and variance. For convenience, add a constant to ensure that the mean of X is zero. P(|X| >= t) = E([X^2 >= t^2]), where [condition] has the value 1 if condition holds and 0 otherwise. E([X^2 >= t^2]) <= E(X^2/t^2) = Var(X)/t^2 (since we made sure E(X) = 0). So P(|X| >= t) <= Var(X)/t^2. If you remove the constant added you find that P(|X - E(X)| >= t) <= Var(X)/t^2. This equality can be exact for rather degenerate distributions where X can takes two possible values.

A proof of a one-tailed inequality

This proof will use the well-known equality Var(X) = E(Var(X|A)) + Var(E(X|A)). One proof of this is:
Var(X)=E(X2) - E(X)2
=E(E(X2|A)) - E(E(X|A))2
=E(Var(X|A) + E(X|A)2) - E(E(X|A))2
=E(Var(X|A)) + E(E(X|A)2) - E(E(X|A))2
=E(Var(X|A)) + Var(E(X|A))

We first of all subtract a constant as necessary to ensure that E(X) = 0. Now, let p, q, and r be the probabilities that X is > t, = t, and < t.
E(X) = 0 = pE(X|X > t) + qt + rE(X|X < t)
Let S = 1 if X > t, 0 if X = t, and -1 if X < t.
Var(X) = E(X2) = E(Var(X|S)) + Var(E(X|S))
And we ignore the term E(Var(X|S)) - which will be zero when our inequality is exact. So we are concerned only with Var(E(X|S)). It's easy to see that E(X|S >= 0) >= t. We can get an idea of E(X|S = -1) from the equality E(X) = 0. Note also that E(E(X|S)) = 0.
Var(X)>=pt2 + qt2 + r[(-pt - qt)/r]2
Var(X)>=(1-r)t2 + t2((1-r)/r)2r
>=t2(1-r) + t2(1-r)2/r

Where s is the prob of X >= t. So Var(X) - sVar(X) >= t2s. or Var(X) >= s(t2 + Var(X)), from which we work out that (adding in our constant again)..

Prob(X - E(X) >= t) <= Var(X)/(t2 + Var(X))

Proof in answer book

The proof in the answer book is neater, and doesn't need as much inspiration as you might think. Here we assume E(X)=0. Now give yourself an extra degree of freedom by noticing that P(X >= t) = P(X + c > = t + c). Since t + c >= 0, P(X >= t) <= E((X + c)2/(t + c)2) = (Var(X) + c2)/(t + c)2.

If you minimise the rhs over c, you will find that it occurs at c = Var(x)/t. I need only pluck this figure from the air and work out that P(X >= t) <= (Var(X) + Var(X)2/t2)/(t+Var(x)/t)2 = Var(X)(t + Var(X)/t)/(t(t+Var(X)/t)2) = Var(X)/(t2 + Var(x)).

Proof in Feller

Another proof of this appears in Volume II of "An introduction to Probability Theory and its Applications", by Feller, section V.7, example (a). This is similar to the proof in the answer book (but no doubt precedes it, at least if you judge by the date of the respective copyrights.

Connection with the Median

The first proof above was based on a proof in Bartoszynski and Niewiadomska-Bugaj that the median is no more than one standard deviation away from the mean, so it is fitting that Henry Bottomley pointed out that you can derive that fact from this inequality. Without loss of generality suppose that the median is greater than the mean. The probablity of X being at least the median value is at least one half. If the median was more than one standard deviation away from the mean, that would contradict the one-tailed Chebyshev inequality.