McKinney's Generalization of the Birthday Problem: Part I
In 1966, E. H. McKinney of Ball State University published a paper entitled "Generalized Birthday Problem" in the American Mathematical Monthly, in which he seek to answer two questions related to the famous birthday problem.
The problems considered by McKinney are (a) Suppose n people are selected at random. What is the probability that at least r people will have the same birthday? (b) What is the smallest value of n such that the probability is greater or equal to 1/2 that a least r people to have the same birthday? What follows is the solution by McKinney with some notational adjustment.
Let Xj (j = 1, 2, ... , n) be the n random birthdays, let event E' be defined as "r or more birthdays are equal" and the event E be defined as "none of the r random birthdays are equal", then we obviously have P(E)+P(E') = 1, in which P(E) can be computed by summing the probabilities of all ways in which the n random birthdays can take on less than r equal values. Suppose we use the following notation:
Then it is obvious that nj are governed by the following Frobenius equation (also called the Diophantine equation of Frobenius):
In general, there are m sets of solutions to the Frobenius equation and we denote the i-th solution as
The general term of the summation representing P(E) is the probability that there are exactly n1 unique birthdays, n2 pairs of equal birthdays, n3 triples of equal birthdays, ... , nr-1 (r-1)-tuples of equal birthdays, which is given by
Therefore, the probability that r or more birthdays are equal is given by
McKinney was using an IBM 7090 computer to compute the probabilities and he stopped at r = 4. For r = 5, the computation for one case of n is estimated by McKinney to take more than 2 hours. Thus it would take him 26 days to arrive at the solution if the computation was attempted.
I tried to repeat McKinney's calculation for second problem in Mathematica 7 and detected that the third/fourth decimal place of the probabilities given in original paper was not correct, possibly due to roundoff errors in his IBM 7090.
It took only 22 seconds to compute the probabilities in Mathematica for the case of r = 5. Not surprisingly, I ran into memory problem when the case of r = 6 is attempted as the number of terms in the summation is expected to exceed 15 millions. In Part II of the article, a numerical example will be given to illustrate the use of the formula for a simple case.
The problems considered by McKinney are (a) Suppose n people are selected at random. What is the probability that at least r people will have the same birthday? (b) What is the smallest value of n such that the probability is greater or equal to 1/2 that a least r people to have the same birthday? What follows is the solution by McKinney with some notational adjustment.
Let Xj (j = 1, 2, ... , n) be the n random birthdays, let event E' be defined as "r or more birthdays are equal" and the event E be defined as "none of the r random birthdays are equal", then we obviously have P(E)+P(E') = 1, in which P(E) can be computed by summing the probabilities of all ways in which the n random birthdays can take on less than r equal values. Suppose we use the following notation:
Then it is obvious that nj are governed by the following Frobenius equation (also called the Diophantine equation of Frobenius):
In general, there are m sets of solutions to the Frobenius equation and we denote the i-th solution as
The general term of the summation representing P(E) is the probability that there are exactly n1 unique birthdays, n2 pairs of equal birthdays, n3 triples of equal birthdays, ... , nr-1 (r-1)-tuples of equal birthdays, which is given by
Therefore, the probability that r or more birthdays are equal is given by
McKinney was using an IBM 7090 computer to compute the probabilities and he stopped at r = 4. For r = 5, the computation for one case of n is estimated by McKinney to take more than 2 hours. Thus it would take him 26 days to arrive at the solution if the computation was attempted.
I tried to repeat McKinney's calculation for second problem in Mathematica 7 and detected that the third/fourth decimal place of the probabilities given in original paper was not correct, possibly due to roundoff errors in his IBM 7090.
It took only 22 seconds to compute the probabilities in Mathematica for the case of r = 5. Not surprisingly, I ran into memory problem when the case of r = 6 is attempted as the number of terms in the summation is expected to exceed 15 millions. In Part II of the article, a numerical example will be given to illustrate the use of the formula for a simple case.
Comments