The jumping-off point for this is the view of the Second Coming that has passed into popular culture, but this isn't intended to be a serious comment on anybody's religion. For one thing, the maths here is probably dodgy enough to prohibit it from being a serious comment on anything.
Q. What are the odds on the second coming?
A. Probably not in the next 20 years.
Q. Will the current project ever be finished?
A. How should I know?
(see later for the case of the dragon and the clergyman's daughter)
I first started wondering about the second coming in late '95, as the year 2000 started to loom ahead. There have, after all, been warnings of millennial religious panic for some time. Unfortunately I can't possibly estimate the significance or otherwise of an exact number of thousands of years (though I would point out that we might not expect to notice much real change from an exact 2000-year gap until about the year 2020 at least).
However, Matthew chapters 24 and 25 seem quite clear that we can't tell when the second coming might be, so we might be justified in treating it as a nice illustration of an event which could happen with equal probability at any given moment in time, but so far hasn't (as far as I know).
There has been interest in slightly barmy prediction since at least Laplace, who spent some time trying to work out the probability that the sun would rise tomorrow. If you think about it, much of our knowledge derives from the implicit assumption that what happened (or didn't happen) yesterday will happen (or not happen) tomorrow, and it's hard to justify this assumption without ending up with rather circular arguments. Probably all of this is written up properly somewhere, but here I'm trying to patch it all together from such pieces as I can find at the moment.
I did hear a talk once from a Church of England Vicar about a situation in which a man had left his fortune to fund the second coming. The church managed to get a court to give it to them, on condition that they buy an insurance policy which would pay an equivalent amount to Jesus Christ, should he turn up and identify himself to their satisfaction. I asked the Vicar what the premium was, but he didn't know the details. I wonder if they went to a friendly insurance company, or one which was as atheistic as possible? Of course the two things aren't completely inconsistent. The question of the `nice guy' who just happens not to believe in God has been raised before. I think I have heard every approach between claims that they're almost a Christian but just won't come out of the closet and admit it, to a flat statement that they will burn in damnation eternally, but I'm willing to be corrected.
How sure are we that the second coming won't happen within the next 5 minutes, or the next 5 years?
Clearly this calculation applies not only to the second coming, but to the probability of a machine breaking down when it has worked perfectly for the last five days, and so on. You might also consider it as the ultimate in small sample statistics, whether you consider it as prediction from a single right-censored waiting time, or prediction from the lack of any observations at all. Now and again it comes to mind as I watch some character in a movie try and sneak past the guards. Will they get caught by a surprise patrol?
Like a lot of statistics, most of the probability involved isn't too bad, but we can't answer the question as stated. There are a variety of different approaches which answer related, but different, questions.
The first gives us a `confidence region'. One way to think of this is to imagine that you aren't answering any particular problem, but you are writing a rule book which will be used to get answers in a whole variety of different situations. You want to write rules which, when followed, will yield a range of possible values which will include the correct value most of the time, no matter what the underlying situation is. You might allow them to fail one time in 100, for example.
The way to do this is to take a situation, work out a small range of very misleading possibilities which occur with total probability about one in a hundred, and then write the rules so that they give a range of values so broad that they cover all the possibilities, except for the strange one in a hundred shots.
Here is a simple example. We are going to observe for 1 unit of time, and then wait for tk units of time. We will bet that nothing occurs in those tk units of time as long as we have observed no more than k events in the first unit. We expect all patterns of events with a total of n events to occur with equal probability. We will be fooled if n>k and there are k or less events in the first unit, or if 1 < n <= k, and there are n-1 or less events in the first unit. If you think of firing n items at our stretch of length 1+t, you can see that you are most likely to be fooled when n=k+1. In this case, you will be fooled with probability 1-1/(1+t)k+1: that is, unless all k+1 events occur in the first unit of time. So if you are going to test for 1 unit of time, bet for t units of time, and want to be bet wrong with probability at most p, you should be careful not to bet if you see more than -1-ln(1-p)/ln(1+t) events. This is the same as saying that if you are going to bet every time you see k events or less, you should set tk=(1-p)-1/(k+1)-1. If we plug in k=0, p=0.01, we find that tk=0.0101.., so we might again reckon on not seeing the second coming within the next 20 years.
Note that this does not say anything about your chances of betting right: in fact there are values of p and tk which mean that you will never bet at all - it just guarantees to limit the number of times you bet and bet wrong.
Unfortunately, what I'd really like to do is bet all the time, and choose t using the above formula. This is not the same as choosing a fixed k and working out a t from it, and isn't one of the things in my collection of textbooks - I'll cover it later.
If we recognise that what we have here is a Poisson process of unknown rate, it's fairly easy to find out that the probability of success with given rate parameter r and given failure probability p is
SUMk=0k=infinity rk e-r( tk+1)/ k!= SUMk=0k=infinity rk e-r (1-p)-1/(k+1)/k!
(Here SUMk=0k=infinity is the best I can do to denote a sum from k=0 to k=infinity in html).
The main expression simply the sum, over all possible values of k, of the probability of seeing k events in the first unit of time, and then seeing no events in the next tk. Using trial values of p and r we seem to converge to the requested success probability from above as r increases, but I don't know how to show that explicitly for all p, r. From this, it might seem that this rule is too conservative when k is small. It might be, but the perfect rule will also be at least slightly conservative: in the case when the underlying r is 0, every rule (except the insanely optimistic `if it hasn't happened yet, it's not going to happen at all') will be found, in hindsight, to be too conservative.
We can simplify things by changing our question slightly. Suppose that we decide that we will only bet when we observe no events in the first unit of our time, and we want to work out how often we will be fooled in the worst case. Using our Poisson approach, we can see that the chance that we will be fooled is the probability that no events happen in the first unit of time, but at least one does happen in the next t units of time. This is e-r(1-e-rt). The worst case r for any given t is (differentiate and set equal to zero) ln(1+t)/ t. Now our confidence-based approach, with probability of being wrong set equal to 0.01, gives us t=0.010101... The worst case r for this is 0.9949833. This gives us a failure probability of 0.0037, which suggests that we can afford to be a lot more daring.
However, we have actually changed the question a lot more than you might imagine. The previous case started off by assuming that there were going to be n events in all, where n was a worst possible case. What's more, it is quite promising as a rule of the form `look at the number of events and then choose t'. Here we set only the rate r, which means that we get a lot of use of the fact that we sometimes don't bet at all. Our failure probability of 0.0037 comes from multiplying together a probability of 0.36973, which is the probability that no events will occur in the first unit of time, by a probability of exactly 0.01, which is the probability that an event will occur in the following t units of time, with our worst case r.
We have drifted very close to decision theory. We are now just taking our formula, and looking at its behaviour as r and p change. What we have done so far amounts to a `worst case' scenario: we want a strategy which will work as well as possible in the worst possible situation. There are other alternatives. One is the `least regret' strategy. In terms of money, the worst case strategy minimises the worst loss we could possibly make. The least regret strategy minimises the difference between how well we could do if we knew the underlying value of r, and how well we will do using our strategy.
I will now attempt to produce a rule which allows us to predict an interval of time during which no events will take place, and which fails with probability no more than p, no matter what the underlying rate of the Poisson process involved is.
To start with, consider a more general situation. Suppose that an adversary marks a number of events on a circle, without being bound to follow any particular distribution or randomness constraints at all. I choose a point on the circle at random, and am told the distance back to the immediately previous event. What rule can I use to predict a safe distance forwards before I am likely to encounter the next event? Well, no matter what point on the circle I pick, there will be an event before my point and an event after it. Given that I am in the gap between two events, my point is equally likely to lie in any of the points between those two events. If we scale the problem so that the distance back to the previous attempt is 1 unit, then the probability that the distance forward to the next event is less than t is the probability that we have picked a point which is more than one unit to the right of an interval of size 1+t. So if you want to be wrong with probability at most p, you need t/(1+t) <= p, or t <= p/(1-p). I don't like this approach, because it assumes that the time at which you look at the evidence and make your decision how long to wait is random with respect to the stream of events (and I have to put the problem on a unit circle because I don't want to worry about exactly what a point uniformly chosen on the real line is), but it demands very few assumptions about the stream of events.
Now we go back to an underlying Poisson process. Suppose for the moment that we are not told how many events have occurred in the preceding unit of time, but we are told the waiting times to the past n events. We wish to put a confidence interval on the waiting time to the next event. Since statistics is born out of probability, we write down the probability of observing n+1 waiting times ti. This is rn+1PRODi=0n e-r ti. (it is easy to see that the waiting time is exponentially distributed with parameter r: the probability that the waiting time is at least t has to be the probability that 0 events occur in time t).
This probability depends only on the number of events and the total waiting time: we can write it as rn+1e-r SUM ti. So if we know T=SUM ti, then the waiting times are uniformly distributed, given that they sum to T. If n+1 events occur in time 1, we are interested in the probability that the waiting time associated with a previously chosen event is no more than tn. This is the same as choosing n+1 points in the unit circle, and looking at the gaps between the points. That in its turn is the same as choosing n points in the unit line, and looking at the gaps made by those points (just choose one of our n+1 points, cut the circle there, and unroll it). This means that the distribution of all the gaps in the unit line is identical. (See also "An Introduction to Probability Theory and its Applications", by Feller, vol II, I.7, Example (b)). But it is easy to figure out the distribution of the first gap in the unit line: the probability that the first gap is at least t is the probability that each of the n points on the unit line has x co-ordinate at least t, and this is (1-t)n. So if we have n+1 waiting times which sum to T=1, the probability that a chosen one is at least tn is (1-tn)n. Scaling from a total waiting time of 1 to a total waiting time of 1+tn, the probability of observing n waiting times which sum to 1, followed by a waiting time of at least tn, given that the total waiting time is 1+tn is (1/(1+tn))n. Therefore we want (1/(1+tn))n >= 1-p, or tn <=(1-p)-1/n-1. This does not match our previous formula - but it is not answering our previous question.
We started off with an observation of n events in our preceding unit time. The Poisson distribution is entirely symmetric in time, so we `run the film backwards' and observe n waiting times. The first waiting time we observe starts with what would otherwise be the end of our unit time period of observation, and ends with what is (in forwards time) the final event in this period. We don't observe the wait which extends beyond the end of our interval, but since we want a rule which promises us a probability of failure at most p, we will be pessimistic and assume that an event happened just before we started observing. This gives us n+1 events whose waiting times sum to 1, so in our previous situation we should chose tn=(1-p)-1/(n+1)-1 - which is a match.
Getting back to proper statistics, a more usual confidence argument is to say that we appear to have a Poisson process going on. For a given r, the probability of receiving 0 events in time 1 unit is e-r. So if we use a rule which tells us to choose r such that the probability of seeing no events in 1 unit of time is <= p, then we will find that the true r is greater our chosen r no more than p of the time. Given a range of values of r, we might work out probabilities. So if we need to find a range of r with probability of being wrong of p or less, we would say that r lies somewhere between 0 and -ln(p). This tells us that the probability of seeing no events in the next t units of time is somewhere between 1 and p-t with probability of being wrong no more than p, regardless of the underlying value of r. This sort of argument is used by, for instance, insurance companies, which are happy to take the underlying probability as an end in itself: after all, you take out an insurance policy for a fixed period of time, and pay a variable amount of money, which reflects the chances of you making a claim. If you drive fast cars rather badly, you pay a lot for insurance, rather than being told that you can only drive for six weeks in any one year. (Of course, if you drive them really badly, you will be told that you can't drive them at all for the next three years, but that's not down to the insurance company).
Another way to work is to notice that if had we some sort of idea of what probability to expect, we could solve the problem quite easily. Suppose that we have a black box which chooses a rate r at random between 0 and infinity, then generates the events in the first unit of time. We can therefore start off with a prior probability distribution on r (albeit an `improper' one). Given a prior probability on r, p(r), and a model for the probability of our observation (number of events so far), p(O|r), Bayes' theorem tells us that the posterior probability of r, p(r|O), is
p(O|r)p(r)/p(O)
which is
p(O|r)p(r)/INT0infinity p(O|r)p(r) dr
If we set p(r)=1, then the integral turns out to be one, so we can forget about it, and find out that the posterior probability is e-rrn/n!. We now want to take our probability distribution on r, and turn it into a probability of no events happening, by multiplying the probability of each possible r by the probability of no events given that r, which gives us
INT0infinity e-rrn e-r t/n! dr
So a few integrals will give us the exact probability of no events occurring in the next t units of time: it is (t+1)-(n+1), which, by the way, is what we would get if used the formula we first thought of. The catch with this approach is that it relies on you making an assumption about the black box. You can think of this as writing down what you would believe if you had no information whatsoever. It's hard to justify such guesses, and the choice of guess does affect the final answer (although for most sensible guesses, this effect tends to die away as you get more and more data in).
Here are the integrals.
Ik = INT0infinity e-axxk dx
Ik+1 = (k+1)/a Ik
I0 = 1/a
Ik = k!/ak+1
And what we wanted is
INT0infinity e-r rn e-r t/ n! drBy the way, Thomas Bayes, whose theorem we have just used, was an English clergyman. I have a reference to a reference which ties it in quite nicely with my own semi-philosophical motivation. E.T.Jaynes, in Chapter 16 of the excellent (if tragically unfinished) book available online via http://omega.albany.edu:8008/JaynesBook.html and bayes.wustl.edu, says that
Fisher (1956, p.10), Stigler (1983), and Zabell (1989) present quite good evidence - which seems to us, in its totality, just short of proof - that Thomas Bayes had found his result as early as 1748, and the original motivation for this work was his annoyance at the claim of the 18'th Century philosopher David Hume of the impossibility of induction. We may conjecture that Bayes sought to give an explicit counter-example, but found it a bit more difficult than he had at first expected, and so delayed publishing it. This would give a neat and natural explanation of many otherwise puzzling facts.
Returning to our problem of calculating safe waiting times, we can expand (1-p)-1/(n+1) -1 out using the binomial theorem. It seems that we have decided that if nothing has happened in the first stage, then we will relax for almost the same fraction of the first time as we are prepared to spend being wrong. This means that we need a lot of testing for very high assurance - so much that the only practical way to ensure e.g. that the chance of a plane failing during a 3-hour flight is less than one in a million is going to be to know enough about the structure of the problem that you can use some other method than simply waiting for failures in test flights.
Applying this idea to the problem of the second coming, you might decide to look for more everyday miracles. Sir Francis Galton, one of the early statisticians, decided to look at the power of intercessionary prayer. He looked for a situation where you could be sure that a lot of people had been praying honestly for one outcome, and there would be very few prayers said for the opposite outcome. What he decided to look at was the infant mortality rate of clergyman's children: prayers for their survival would be very frequent, and who would pray for the death of an infant, whether or not they could stand the father? I don't think he found anything very conclusive, but I don't remember that what I read was very clear on this. Looking to find the information again, I've so far only managed to find various conflicting second hand accounts. The general impression is that most people think any miraculous effect is hidden behind the benefits of the temparate lifestyle generally pursued by most believers.
Alternatively, you could argue that our calculations show that the world is a more surprising place than most people think. Just because nobody's seen a dragon for the past few hundred years doesn't mean that you won't see one in your lifetime.
Of course, you may be waiting with bated breath for the second coming, or for some other equally uncertain event, so, rather than wanting to find a confidence region which limits the likelihood of it occurring within the next five minutes, you might want to find some sort of encouragement that it will occur within the next five hundred years.
You should be careful trying to apply the above figures. It is clear that if an event has not yet happened even once, we cannot dismiss the theory that it occurs at rate 0, that is, that it will never happen. Also, the confidence regions we used above attempt to make sure that we aren't fooled into waiting too long, no matter what the underlying rate is. This means that for very small rates, the real likelihood of an event within our waiting time is much smaller than the figures imply.
We can produce a rule for cases when we observe at least one event in our initial period, much as before. We will decide that there is going to be at least one event in the following period of time t as long as we see at least n events in the initial period. We are most likely to be fooled if there are only going to be n events in total, all in the initial period. The probability of this is at most (1+t)-n, so we should set t=p-1/n-1, where, as usual, p is probability of our being fooled.
Here is a link back up to my home page.