
How would I find whether presidents' birthdates follow a binomial, uniform, normal, or lognormal distribution?

I understand that something like 9% were born in each of Jan, Feb, Mar, Apr and Jul, 5% in each of May and June, and 11% in each of Aug and Nov. The remaining 2%, 14% and 7% were born in Sep, Oct and Dec, respectively.

But how would I do a regression to find how the data fits (binomial, uniform, normal, or lognormal)? I am using Excel.

Also, how would I find the probability that the next president may be born on a certain date?

Thank you in advance to whoever helps.

    The question doesn't really make sense; the domain of a binomial, normal, or log-normal distribution does not clearly map onto the set of months (because months are cyclically arranged, not linearly arranged). This is really a case of "make a histogram to visualize the distribution directly" and not a case where you describe the distribution in relation to others. – Milo Brandt Nov 25 '17 at 23:13

2 Answers


Take a second to think about what each of the distributions that you have suggested models, and try to understand whether or not it makes any sense in the context of your problem.

Both the normal and lognormal distributions model continuous data. Birthdates are not continuous; they are discrete. On those grounds alone, a normal or lognormal model seems pretty far-fetched. It is possible that one could approximate a discrete distribution with a continuous distribution, but without a good reason to do so (either empirical or mathematical), this seems like a bad idea.

A binomial model at least has the advantage of being discrete, but that isn't enough to justify its use in this problem. Again, think about what a binomial random variable models: it counts the number of successes out of some number of trials, where each trial has some probability of either succeeding or failing. If presidential birthdates are binomial, what does a single trial look like? What is a success, and what is a failure?

Via the process of elimination, the only thing left is a uniform distribution. From a theoretical point of view, this actually makes some amount of sense. If presidential birthdays are uniformly distributed, this says that a president is equally likely to be born on any day of the year. Lacking better theoretical models, it seems reasonable to assume that every day is as likely as any other. You would have to show me relatively strong evidence to the contrary in order to convince me that birthdays are not uniformly distributed (presidential or otherwise).

Moreover, if you take Milo Brandt's advice and make a bar chart of your data, I think that you will find that the distribution looks roughly uniform. (I wouldn't really call it a histogram, since the horizontal axis is categorical rather than numerical, but that is, I suppose, a matter of opinion.) September and October look a bit strange, but random data (even uniform random data) are typically lumpy.
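A quick way to see the lumpiness point is to draw the bar chart as text and put one purely uniform random sample of the same size next to it. This is a minimal Python sketch; the sample size of 44, the seed, and the helper names `text_bar_chart` and `simulate_uniform_counts` are my own illustrative choices, not anything from the question.

```python
import random

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Percentages from the question, January through December.
PERCENT = [9, 9, 9, 9, 5, 5, 9, 11, 2, 14, 11, 7]


def text_bar_chart(values, labels):
    """Return one text line per category: label, a bar of '#', the value."""
    return [f"{label} {'#' * value} {value}" for label, value in zip(labels, values)]


def simulate_uniform_counts(n, k, rng):
    """Assign n births to k months, each month equally likely; return the counts."""
    counts = [0] * k
    for _birth in range(n):
        counts[rng.randrange(k)] += 1
    return counts


print("Observed (% of presidents):")
print("\n".join(text_bar_chart(PERCENT, MONTHS)))

# Hypothetical comparison: one truly uniform sample of 44 births, shown in %.
rng = random.Random(2017)  # arbitrary seed so the chart is reproducible
sim = simulate_uniform_counts(44, 12, rng)
sim_percent = [round(100 * c / 44) for c in sim]
print("\nOne simulated uniform sample of 44 births (%):")
print("\n".join(text_bar_chart(sim_percent, MONTHS)))
```

Re-running with other seeds tends to produce months that look just as "strange" as September and October, even though every month is equally likely by construction.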

Alternatively, you could note that the list of birthdays that you have already defines a probability distribution. You could simply assume that 9% of all presidents are born in January, 11% are born in August, and so on.

As to your last question, that depends on your answer to the first part. If you assume that birthdays are uniformly distributed, then the probability of the next president being born in any particular month is about 1/12, or 8.3%. If you assume that your empirical data are the true distribution, then you can use that to make your predictions.
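To make the two options concrete, here is a small Python sketch returning month probabilities under three models: every month equally likely (1/12), every day equally likely, and the question's empirical percentages taken as the truth. The per-day model is my own addition to show where the "about" in "about 1/12" comes from; it assumes a 365-day year and ignores February 29. Under that same assumption, the chance of one specific calendar date would be 1/365. The function name `month_probabilities` and the model labels are hypothetical.

```python
from fractions import Fraction

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Empirical percentages from the question, January through December.
PERCENT = [9, 9, 9, 9, 5, 5, 9, 11, 2, 14, 11, 7]
# Assumption: a non-leap year, so February has 28 days and the year has 365.
DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def month_probabilities(model):
    """Return {month: exact probability} for the named (hypothetical) model."""
    if model == "uniform_month":   # every month equally likely
        return {m: Fraction(1, 12) for m in MONTHS}
    if model == "uniform_day":     # every day equally likely
        total = sum(DAYS)
        return {m: Fraction(d, total) for m, d in zip(MONTHS, DAYS)}
    if model == "empirical":       # treat the observed shares as the true distribution
        return {m: Fraction(p, 100) for m, p in zip(MONTHS, PERCENT)}
    raise ValueError(f"unknown model: {model}")


for model in ("uniform_month", "uniform_day", "empirical"):
    p_oct = month_probabilities(model)["Oct"]
    print(f"{model:>13}: P(next president born in Oct) = {float(p_oct):.4f}")
```

Exact fractions are used so each model's probabilities sum to exactly 1, which makes the distributions easy to sanity-check.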


A discrete uniform distribution would have $1/12 = 8.33\%$ born in each of the $k = 12$ months. You say you have relative frequencies for Jan through Dec as follows $$r = (.09, .09, .09, .09, .05, .05, .09, .11, .02, .14, .11, .07).$$

Depending on how the counting was done and what year the problem was written, there have been about $n = 45$ presidents. In order to do a chi-squared goodness-of-fit (GOF) test, you need counts, not relative frequencies. Multiplying to get the vector $45r$ and rounding to integers, I get counts $$ X = (4, 4, 4, 4, 2, 2, 4, 5, 1, 6, 5, 3),$$ which sum to $n = 44.$
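The conversion from relative frequencies to counts is easy to reproduce. A minimal Python sketch, assuming $n = 45$ as above; the helper name `counts_from_relative` is illustrative.

```python
# Relative frequencies r, January through December, from the question.
R = [0.09, 0.09, 0.09, 0.09, 0.05, 0.05, 0.09, 0.11, 0.02, 0.14, 0.11, 0.07]


def counts_from_relative(r, n):
    """Multiply each relative frequency by n and round to the nearest integer."""
    return [round(n * p) for p in r]


X = counts_from_relative(R, 45)
print(X, "total:", sum(X))
```

Because the percentages were themselves rounded, the recovered counts need not add back up to 45; here they total 44.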

Then, assuming a discrete uniform distribution over the $k = 12$ months, we have $E = 44/12 \approx 3.667$ expected presidential births in each month.

The chi-squared GOF test uses the statistic $$Q = \sum_{i=1}^{12} \frac{(X_i - E)^2}{E},$$ which is approximately (very roughly, for $E$ as small as 3.7) distributed as $\mathsf{Chisq}(df = k - 1 = 11).$ Large values of $Q$ indicate poor fit to the discrete uniform distribution.

Thus we would reject the null hypothesis of uniform births across the 12 months if $Q > 19.675,$ the value that cuts 5% of the probability from the upper tail of the approximate distribution.
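For checking these numbers without tables, here is a standard-library Python sketch (my own, not necessarily how the numbers above were obtained). It computes $Q$ from the counts $X$, evaluates the chi-squared CDF through the power series for the regularized lower incomplete gamma function $P(df/2, q/2)$, and inverts it by bisection to get the 5% critical value. The function names are illustrative. Since you are using Excel, the worksheet functions `CHISQ.INV.RT(0.05, 11)` and `CHISQ.TEST(observed_range, expected_range)` should give the critical value and the approximate p-value directly.

```python
import math

# Counts X from the text, January through December (n = 44).
X = [4, 4, 4, 4, 2, 2, 4, 5, 1, 6, 5, 3]


def chi_square_statistic(observed):
    """Pearson Q against equal expected counts E = n / k in every cell."""
    e = sum(observed) / len(observed)
    return sum((x - e) ** 2 / e for x in observed)


def chi2_cdf(q, df, max_terms=1000):
    """P(Chisq(df) <= q) = P(a, x) with a = df/2, x = q/2, summed as
    x^a e^-x / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n))."""
    if q <= 0:
        return 0.0
    a, x = df / 2, q / 2
    term = total = 1.0 / a
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def chi2_quantile(p, df):
    """Smallest q with chi2_cdf(q, df) >= p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:   # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return hi


k = len(X)
q = chi_square_statistic(X)
critical = chi2_quantile(0.95, k - 1)
p_value = 1 - chi2_cdf(q, k - 1)
print(f"Q = {q:.2f}, 5% critical value = {critical:.3f}, approx. p-value = {p_value:.3f}")
```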

I get $Q = 6.18,$ which is well below the critical value, so presidential birthdays are "consistent" with a discrete uniform distribution. That is far from a definitive statement that presidential birthdays are uniformly distributed across the year. Because we have only $n = 44$ presidential birthdays, it is not really possible to say what distribution they might follow.
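Because $E$ is only about 3.7, the chi-squared approximation is rough, as noted above. One way around that (a swapped-in technique, not the one used in this answer) is a Monte Carlo p-value: simulate many sets of 44 births that really are uniform over the 12 months and count how often they give a $Q$ at least as large as the observed one. A minimal Python sketch; the number of replications, the seed, and the function names are arbitrary illustrative choices.

```python
import random

# Counts X from the text, January through December (n = 44).
X = [4, 4, 4, 4, 2, 2, 4, 5, 1, 6, 5, 3]


def chi_square_statistic(observed):
    """Pearson Q against equal expected counts E = n / k in every cell."""
    e = sum(observed) / len(observed)
    return sum((x - e) ** 2 / e for x in observed)


def monte_carlo_p_value(observed, reps=10_000, seed=1):
    """Fraction of simulated uniform samples whose Q is >= the observed Q."""
    rng = random.Random(seed)
    n, k = sum(observed), len(observed)
    q_obs = chi_square_statistic(observed)
    hits = 0
    for _rep in range(reps):
        counts = [0] * k
        for _birth in range(n):            # n births, each month equally likely
            counts[rng.randrange(k)] += 1
        # small tolerance so float round-off does not miss exact ties
        if chi_square_statistic(counts) >= q_obs - 1e-9:
            hits += 1
    return hits / reps


print(f"Monte Carlo p-value: {monte_carlo_p_value(X):.3f}")
```

A large p-value here means the same thing as in the text: no evidence against uniformity, which, with only 44 births, is not the same as evidence for it.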

However, of the various distributions you mentioned, @XanderHenderson has given good reasons for eliminating all but the discrete uniform. Also, there is no reason for you to suppose birth month has anything to do with becoming president (unless, perhaps, you are a believer in astrology), and so a discrete uniform model of presidential birthdays seems as good as any.

As for predicting future presidential birthdays, that does not seem to be a probability problem. (Maybe an astrologer could give some 'help' with that.)

BruceET
  • There are plenty of reasons to believe the underlying distribution would not be exactly uniform discrete: for example: months are of different lengths, conceptions are seasonal, relative age at entry to schooling affects performance, and weather around birth could affect perinatal survival. But these effects could be too small to distinguish with a sample of $44$ and they may have changed over time – Henry Nov 26 '17 at 09:27
  • @Henry. True, but these departures from uniform do seem far below the threshold discernible from the limited data available here. Feb with 28 days has more than its share. In the US birthrates tend to be lower in late autumn and winter (except Dec) as per this chart, but the president data show no hint of this. – BruceET Nov 27 '17 at 08:40