Lesson 10: The Binomial Distribution

In this lesson, and some of the lessons that follow in this section, we'll be looking at specially named discrete probability mass functions, such as the geometric distribution, the hypergeometric distribution, and the Poisson distribution. As you can probably gather from the name of this lesson, we'll be exploring the well-known binomial distribution here.

The basic idea behind this lesson, and the ones that follow, is that when certain conditions are met, we can derive a general formula for the probability mass function of a discrete random variable \(X\). We can then use that formula to calculate probabilities concerning \(X\) rather than resorting to first principles. Sometimes the probability calculations can be tedious. In those cases, we might want to take advantage of cumulative probability tables that others have created. We'll do exactly that for the binomial distribution. We'll also derive formulas for the mean, variance, and standard deviation of a binomial random variable.

Objectives

Upon completion of this lesson, you should be able to:

10.1 - The Probability Mass Function


Example 10-1


We previously looked at an example in which three fans were randomly selected at a football game in which Penn State is playing Notre Dame. Each fan was identified as either a Penn State fan (\(P\)) or a Notre Dame fan (\(N\)), yielding the following sample space:

\(S = \{PPP, PPN, PNP, NPP, NNP, NPN, PNN, NNN\}\)

We let \(X\) = the number of Penn State fans selected. The possible values of \(X\) are, therefore, 0, 1, 2, or 3. Now, we could find probabilities of individual outcomes, such as \(P(PPP)\) or \(P(PPN)\). Alternatively, we could find \(P(X = x)\), the probability that \(X\) takes on a particular value \(x\). Let's do that (again)! This time, though, we will be less interested in obtaining the actual probabilities than in looking for a pattern in our calculations, so that we can derive a formula for calculating similar probabilities.

Solution

Since the game is a home game, let's again suppose that 80% of the fans attending the game are Penn State fans, while 20% are Notre Dame fans. That is, \(P(P) = 0.8\) and \(P(N) = 0.2\). Then, by independence:

\(P(X = 0) = P(NNN) = 0.2 \times 0.2 \times 0.2 = 1 \times (0.8)^0\times (0.2)^3\)

And, by independence and mutual exclusivity of \(NNP\), \(NPN\), and \(PNN\):

\(P(X = 1) = P(NNP) + P(NPN) + P(PNN) = 3 \times 0.8\times 0.2\times 0.2 = 3\times (0.8)^1\times (0.2)^2\)

Likewise, by independence and mutual exclusivity of \(PPN\), \(PNP\), and \(NPP\):

\(P(X = 2) = P(PPN) + P(PNP) + P(NPP) = 3\times 0.8 \times 0.8 \times 0.2 = 3\times (0.8)^2\times (0.2)^1\)

Finally, by independence:

\(P(X = 3) = P(PPP) = 0.8\times 0.8\times 0.8 = 1\times (0.8)^3\times (0.2)^0\)

Do you see a pattern in our calculations? It seems that, in each case, we multiply the number of ways of obtaining \(x\) Penn State fans first by the probability of \(x\) Penn State fans, \((0.8)^x\), and then by the probability of \(3-x\) Notre Dame fans, \((0.2)^{3-x}\).
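As a check on the pattern, the Python sketch below (our own illustration, not part of the lesson) enumerates all eight outcomes in the sample space, multiplies the per-fan probabilities under the independence assumption with \(P(P)=0.8\) and \(P(N)=0.2\), and compares each total with "number of ways × \((0.8)^x\) × \((0.2)^{3-x}\)". The helper names brute_force and pattern are hypothetical.

```python
from itertools import product
from math import comb, isclose

P_PSU, P_ND = 0.8, 0.2  # assumed probabilities of a Penn State / Notre Dame fan

def brute_force(x: int) -> float:
    """Sum P(outcome) over all outcomes with exactly x Penn State fans."""
    total = 0.0
    for outcome in product("PN", repeat=3):  # PPP, PPN, ..., NNN
        if outcome.count("P") == x:
            prob = 1.0
            for fan in outcome:  # independence: multiply the per-fan probabilities
                prob *= P_PSU if fan == "P" else P_ND
            total += prob
    return total

def pattern(x: int) -> float:
    """Number of ways to get x P's, times (0.8)^x, times (0.2)^(3-x)."""
    return comb(3, x) * P_PSU**x * P_ND ** (3 - x)

for x in range(4):
    print(x, round(brute_force(x), 3), round(pattern(x), 3))
```

Both columns agree: 0.008, 0.096, 0.384, and 0.512 for \(x = 0, 1, 2, 3\).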

This example lends itself to the creation of a general formula for the probability mass function of a binomial random variable \(X\).

Binomial Random Variable \(X\)

The probability mass function of a binomial random variable \(X\) is:

\(f(x) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}\), for \(x = 0, 1, 2, \ldots, n\)

We denote the binomial distribution as \(b(n,p)\). That is, we say:

\(X \sim b(n, p)\)

where the tilde \((\sim)\) is read "as distributed as," and \(n\) and \(p\) are called parameters of the distribution.
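The formula translates directly into code. Here is a minimal Python sketch using the standard library's math.comb for \(\binom{n}{x}\); the function name binom_pmf is our own choice, and returning 0 outside \(\{0, 1, \ldots, n\}\) is an assumption about how to handle invalid \(x\).

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ b(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    if not 0 <= x <= n:
        return 0.0  # outside the support {0, 1, ..., n}
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# The three-fan example corresponds to n = 3, p = 0.8.
print([round(binom_pmf(x, 3, 0.8), 3) for x in range(4)])
```

For \(n=3\) and \(p=0.8\) this reproduces the Example 10-1 values 0.008, 0.096, 0.384, and 0.512.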

Let's verify that the given p.m.f. is a valid one!
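Here is one way to carry out that verification, written as a LaTeX derivation: each value \(f(x)\) is nonnegative because \(0 \le p \le 1\), and the values sum to 1 by the binomial theorem \((a+b)^n = \sum_{x=0}^{n}\binom{n}{x}a^x b^{n-x}\) applied with \(a = p\) and \(b = 1-p\).

```latex
\begin{align*}
f(x) &= \binom{n}{x} p^x (1-p)^{n-x} \ge 0 \quad \text{for } x = 0, 1, \ldots, n, \\
\sum_{x=0}^{n} f(x) &= \sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x}
  = \bigl(p + (1-p)\bigr)^n = 1^n = 1.
\end{align*}
```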

Now that we know the formula for the probability mass function of a binomial random variable, we had better spend some time making sure we can recognize when we actually have one!

10.2 - Is X Binomial?

Binomial Random Variable

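A discrete random variable \(X\) is a binomial random variable if:

  1. An experiment, or trial, is performed in exactly the same way \(n\) times, where \(n\) is fixed in advance.
  2. Each of the \(n\) trials has only two possible outcomes, one called a "success" and the other a "failure."
  3. The \(n\) trials are independent.
  4. The probability of success, denoted \(p\), is the same for each trial.
  5. The random variable \(X\) equals the number of successes in the \(n\) trials.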

Example 10-2


A coin is weighted so that there is a 70% chance of getting a head on any particular toss. Toss the coin, in exactly the same way, 100 times. Let \(X\) equal the number of heads tossed. Is \(X\) a binomial random variable?

Answer

Yes, \(X\) is a binomial random variable, because:

  1. The coin is tossed in exactly the same way 100 times.
  2. Each toss results in either a head (success) or a tail (failure).
  3. One toss doesn't affect the outcome of another toss. The trials are independent.
  4. The probability of getting a head is 0.70 for each toss of the coin.
  5. \(X\) equals the number of heads (successes).
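One way to see the five conditions at work is to simulate them. The Python sketch below is an illustration under our own assumptions: a seeded pseudo-random generator stands in for the weighted coin, and 10,000 repetitions is an arbitrary choice. It tosses the coin 100 times independently with \(P(\text{head})=0.7\), counts the heads, and compares the simulated frequency of \(X \le 70\) with the exact binomial value. The names simulate_heads and binom_cdf are hypothetical.

```python
import random
from math import comb

def simulate_heads(n: int, p: float, rng: random.Random) -> int:
    """Toss the weighted coin n times independently and count the heads."""
    return sum(1 for _ in range(n) if rng.random() < p)  # each toss: head with probability p

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ b(n, p), by summing the binomial p.m.f."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

rng = random.Random(414)  # fixed seed so repeated runs give the same draws
reps = 10_000             # number of simulated 100-toss experiments (arbitrary)
draws = [simulate_heads(100, 0.7, rng) for _ in range(reps)]
empirical = sum(d <= 70 for d in draws) / reps
exact = binom_cdf(70, 100, 0.7)
print(f"simulated P(X <= 70) = {empirical:.3f}, exact = {exact:.3f}")
```

With this many repetitions the simulated and exact values typically agree to about two decimal places.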

Example 10-3

A college administrator randomly samples students until he finds four that have volunteered to work for a local organization. Let \(X\) equal the number of students sampled. Is \(X\) a binomial random variable?

Answer

No, \(X\) is not a binomial random variable, because the number of trials \(n\) was not fixed in advance, and \(X\) does not equal the number of volunteers in the sample.

Example 10-4


A Quality Control Inspector (QCI) investigates a lot containing 15 skeins of yarn. The QCI randomly samples (without replacement) 5 skeins of yarn from the lot. Let \(X\) equal the number of skeins with acceptable color. Is \(X\) a binomial random variable?

Answer

No, \(X\) is not a binomial random variable, because \(p\), the probability that a randomly selected skein has acceptable color, changes from trial to trial. For example, suppose, unknown to the QCI, that 9 of the 15 skeins of yarn in the lot are acceptable. For the first trial, \(p\) equals \(\frac{9}{15}\). However, for the second trial, \(p\) equals either \(\frac{8}{14}\) or \(\frac{9}{14}\), depending on whether an acceptable or unacceptable skein was selected in the first trial. Rather than being a binomial random variable, \(X\) is a hypergeometric random variable. If we continue to assume that 9 of the 15 skeins of yarn in the lot are acceptable, then \(X\) has the following probability mass function:

\(f(x) = P(X = x) = \dfrac{\binom{9}{x}\binom{6}{5-x}}{\binom{15}{5}}\), for \(x = 0, 1, \ldots, 5\)
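To make the changing \(p\) concrete, this Python sketch (assuming, as the example does, that 9 of the 15 skeins are acceptable) uses exact fractions to show the success probability on the first draw and on the second draw, and tabulates the hypergeometric probabilities \(P(X=x)=\binom{9}{x}\binom{6}{5-x}/\binom{15}{5}\). The constant and function names are hypothetical.

```python
from fractions import Fraction
from math import comb

N_LOT, N_GOOD, N_SAMPLE = 15, 9, 5  # assumed: 9 of the 15 skeins are acceptable
N_BAD = N_LOT - N_GOOD              # 6 unacceptable skeins

def hypergeom_pmf(x: int) -> Fraction:
    """P(X = x): x acceptable and 5 - x unacceptable skeins, over all samples of 5."""
    return Fraction(comb(N_GOOD, x) * comb(N_BAD, N_SAMPLE - x), comb(N_LOT, N_SAMPLE))

# Without replacement, the chance of "acceptable" depends on the earlier draw:
p_first = Fraction(N_GOOD, N_LOT)                   # 9/15
p_second_if_good = Fraction(N_GOOD - 1, N_LOT - 1)  # 8/14 after an acceptable skein
p_second_if_bad = Fraction(N_GOOD, N_LOT - 1)       # 9/14 after an unacceptable skein

pmf = {x: hypergeom_pmf(x) for x in range(N_SAMPLE + 1)}
print(p_first, p_second_if_good, p_second_if_bad)
print({x: round(float(q), 4) for x, q in pmf.items()})
```

Note that Fraction reduces automatically, so 9/15 prints as 3/5 and 8/14 as 4/7.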

Example 10-5


A Gallup Poll of \(n = 1000\) random adult Americans is conducted. Let \(X\) equal the number in the sample who own a sport utility vehicle (SUV). Is \(X\) a binomial random variable?

Answer

No, \(X\) is technically a hypergeometric random variable, not a binomial random variable, because, just as in the previous example, sampling takes place without replacement. Therefore, \(p\), the probability of selecting an SUV owner, has the potential to change from trial to trial. To make this point concrete, suppose that Americans own a total of \(N=270,000,000\) cars. Suppose too that half (135,000,000) of the cars are SUVs, while the other half (135,000,000) are not. Then, on the first trial, \(p\) equals \(\frac{1}{2}\) (from 135,000,000 divided by 270,000,000). Suppose an SUV owner was selected on the first trial. Then, on the second trial, \(p\) equals 134,999,999 divided by 269,999,999, which equals... punching into a calculator... 0.499999... Hmmmmm! Isn't that 0.499999... close enough to \(\frac{1}{2}\) to just call it \(\frac{1}{2}\)? Yes, that's what we do!
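The calculator step can be done exactly. This small Python sketch uses the hypothetical car counts from the example and exact fractions to show how tiny the change in \(p\) really is; the variable names are our own.

```python
from fractions import Fraction

# Hypothetical numbers from the example: 270,000,000 cars, half of them SUVs.
N_CARS = 270_000_000
N_SUV = 135_000_000

p_first = Fraction(N_SUV, N_CARS)           # exactly 1/2
p_second = Fraction(N_SUV - 1, N_CARS - 1)  # after one SUV owner has been selected
print(p_first, float(p_second), float(p_first - p_second))
```

The difference works out to exactly \(\frac{1}{2 \times 269,999,999}\), roughly \(1.9 \times 10^{-9}\), which is why treating \(p\) as constant costs essentially nothing here.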

In general, when the sample size \(n\) is small in relation to the population size \(N\), we assume that a random variable \(X\), whose value is determined by sampling without replacement, follows (approximately) a binomial distribution. On the other hand, if the sample size \(n\) is close to the population size \(N\), then we assume the random variable \(X\) follows a hypergeometric distribution.
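To see the rule of thumb numerically, this Python sketch compares the hypergeometric and binomial probabilities for a sample of \(n = 5\) drawn from populations of increasing size \(N\). The 60% success proportion and the particular values of \(N\) are assumptions chosen for illustration, and max_gap is a hypothetical helper name.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(X = x) when n items are drawn without replacement from N items, K of them successes."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ b(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def max_gap(N: int, n: int) -> float:
    """Largest |hypergeometric - binomial| over x, for a population that is 60% successes."""
    K = 3 * N // 5  # 60% successes (integer arithmetic avoids float rounding)
    p = K / N
    return max(abs(hypergeom_pmf(x, N, K, n) - binom_pmf(x, n, p)) for x in range(n + 1))

for N in (15, 150, 1_500, 15_000):
    print(N, round(max_gap(N, 5), 5))
```

As \(N\) grows with \(n\) held at 5, the largest discrepancy shrinks roughly in proportion to \(1/N\), which is the sense in which the binomial is a good approximation when \(n\) is small relative to \(N\).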

10.3 - Cumulative Binomial Probabilities


Example 10-6

By some estimates, twenty percent (20%) of Americans have no health insurance. Randomly sample \(n=15\) Americans. Let \(X\) denote the number in the sample with no health insurance. What is the probability that exactly 3 of the 15 sampled have no health insurance?

Solution

Since \(n=15\) is small relative to the population of \(N\) = 300,000,000 Americans, and all of the other criteria pass muster (two possible outcomes, independent trials, ...), the random variable \(X\) can be assumed to follow a binomial distribution with \(n=15\) and \(p=0.2\). Using the probability mass function for a binomial random variable, the calculation is then relatively straightforward:

\(P(X=3) = \binom{15}{3}(0.2)^3(0.8)^{12} \approx 0.2501\)

That is, there is a 25% chance, in sampling 15 random Americans, that we would find exactly 3 that had no health insurance.
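The arithmetic can be checked with a few lines of Python, assuming \(X \sim b(15, 0.2)\) as above; math.comb supplies \(\binom{15}{3} = 455\).

```python
from math import comb

n, p = 15, 0.2  # sample size and assumed proportion with no health insurance
prob_3 = comb(n, 3) * p**3 * (1 - p) ** (n - 3)  # C(15, 3) (0.2)^3 (0.8)^12
print(round(prob_3, 4))
```

This prints 0.2501.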

What is the probability that at most one of those sampled has no health insurance?

Solution

"At most one" means either 0 or 1 of those sampled have no health insurance. That is, we need to find:

\(P(X \le 1) = P(X=0) + P(X=1)\)

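Using the probability mass function for a binomial random variable with \(n=15\) and \(p=0.2\), we have:

\(P(X \le 1) = \binom{15}{0}(0.2)^0(0.8)^{15} + \binom{15}{1}(0.2)^1(0.8)^{14} \approx 0.0352 + 0.1319 = 0.1671\)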

That is, we have a 16.7% chance, in sampling 15 random Americans, that we would find at most one that had no health insurance.
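A minimal Python sketch of the same sum, again assuming \(X \sim b(15, 0.2)\); binom_pmf is a hypothetical helper name.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ b(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# "At most one" = P(X = 0) + P(X = 1)
p_at_most_1 = binom_pmf(0, 15, 0.2) + binom_pmf(1, 15, 0.2)
print(round(p_at_most_1, 4))
```

This prints 0.1671, matching the \(x = 1\), \(p = 0.20\) entry of the cumulative table for \(n = 15\).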

What is the probability that more than seven have no health insurance?

Solution

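Yikes! "More than seven" in the sample means 8, 9, 10, 11, 12, 13, 14, or 15. There are two ways that we can calculate \(P(X>7)\): we could add up the eight individual probabilities, \(P(X>7) = P(X=8) + P(X=9) + \cdots + P(X=15)\), or we could use the complement rule, \(P(X>7) = 1 - P(X \le 7)\).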
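Both routes give the same number. This Python sketch, assuming \(X \sim b(15, 0.2)\) and a hypothetical helper binom_pmf, computes the direct sum and the complement side by side.

```python
from math import comb, isclose

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ b(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 15, 0.2
direct = sum(binom_pmf(x, n, p) for x in range(8, n + 1))  # P(X=8) + ... + P(X=15)
complement = 1 - sum(binom_pmf(x, n, p) for x in range(8))  # 1 - P(X <= 7)
print(round(direct, 4), round(complement, 4))
```

Both print as 0.0042; the complement needs only the single cumulative value \(P(X \le 7)\), which is why a cumulative table is convenient.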


The good news is that the cumulative binomial probability table makes it easy to determine \(P(X\le 7)\). To find \(P(X\le 7)\) using the binomial table, we:

  1. Find \(n=15\) in the first column on the left.
  2. Find the column containing \(p=0.20\).
  3. Find the 7 in the second column on the left, since we want to find \(F(7)=P(X\le 7)\).

Now, all we need to do is read the probability value where the \(p=0.20\) column and the (\(n = 15, x = 7\)) row intersect. What do you get?

Table II: continued
                           p
 n   x    0.05    0.10    0.15    0.20    0.25    0.30
14  11  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
    12  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
    13  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
    14  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
15   0  0.4633  0.2059  0.0874  0.0352  0.0134  0.0047
     1  0.8290  0.5490  0.3186  0.1671  0.0802  0.0353
     2  0.9638  0.8159  0.6042  0.3980  0.2361  0.1268
     3  0.9945  0.9444  0.8227  0.6482  0.4613  0.2969
     4  0.9994  0.9873  0.9383  0.8358  0.6865  0.5155
     5  0.9999  0.9978  0.9832  0.9389  0.8516  0.7216
     6  1.0000  0.9997  0.9964  0.9819  0.9434  0.8689
     7  1.0000  1.0000  0.9994  0.9958  0.9827  0.9500
     8  1.0000  1.0000  0.9999  0.9992  0.9958  0.9848
     9  1.0000  1.0000  1.0000  0.9999  0.9992  0.9963
    10  1.0000  1.0000  1.0000  1.0000  0.9999  0.9993
    11  1.0000  1.0000  1.0000  1.0000  1.0000  0.9999
    12  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
    13  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
    14  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
    15  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
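Entries like these can be regenerated by summing the probability mass function. The Python sketch below (a hypothetical helper binom_cdf, not the method used to produce the printed table) prints \(F(x) = P(X \le x)\) for \(n = 15\) and the six \(p\) columns shown, rounded to four decimals.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """F(k) = P(X <= k) for X ~ b(n, p), summing the p.m.f. from 0 to k."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

P_COLUMNS = (0.05, 0.10, 0.15, 0.20, 0.25, 0.30)
n = 15
print(" x  " + "  ".join(f"{p:6.2f}" for p in P_COLUMNS))
for k in range(n + 1):
    print(f"{k:2d}  " + "  ".join(f"{binom_cdf(k, n, p):.4f}" for p in P_COLUMNS))
```

For example, the \(x = 7\) row in the \(p = 0.20\) column comes out as 0.9958, the value read from the table.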

The cumulative binomial probability table tells us that \(P(X\le 7)=0.9958\). Therefore:

\(P(X>7) = 1 − 0.9958 = 0.0042\)

That is, the probability that more than 7 in a random sample of 15 would have no health insurance is 0.0042.

What is the probability that exactly 3 have no health insurance?

Solution

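We can calculate \(P(X=3)\) by finding \(P(X\le 2)\) and subtracting it from \(P(X\le 3)\). Reading both values from the table with \(n=15\) and \(p=0.20\):

\(P(X=3) = P(X\le 3) - P(X\le 2) = 0.6482 - 0.3980 = 0.2502\)

This agrees, up to rounding in the table, with the 25% we found earlier using the probability mass function directly.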
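The small discrepancy in the fourth decimal place comes from subtracting two rounded table entries. This Python sketch, assuming \(X \sim b(15, 0.2)\) and a hypothetical helper binom_cdf, compares the exact difference of cumulative probabilities with the difference of the four-decimal table values.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """F(k) = P(X <= k) for X ~ b(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

exact = binom_cdf(3, 15, 0.2) - binom_cdf(2, 15, 0.2)  # P(X = 3) from unrounded c.d.f. values
from_table = 0.6482 - 0.3980                           # same difference using the table entries
print(round(exact, 4), round(from_table, 4))
```

This prints 0.2501 and 0.2502: the same probability, differing only through table rounding.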


For large \(p\) and small \(n\), the binomial distribution is what we call skewed left. That is, the bulk of the probability falls in the larger numbers \(n, n-1, n-2, \ldots\) and the distribution tails off to the left. For example, here's a picture of the binomial distribution when \(n=15\) and \(p=0.8\):
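As a rough text stand-in for such a picture, here is a Python sketch that prints the \(b(15, 0.8)\) probabilities with a bar of asterisks for each (one asterisk per 0.01, rounded; our own display choice). It also checks a fact that explains the shape: \(f(x; n, p) = f(n-x; n, 1-p)\), so the \(p=0.8\) distribution is the mirror image of the \(p=0.2\) distribution from the health-insurance example.

```python
from math import comb, isclose

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ b(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n = 15
left = [binom_pmf(x, n, 0.8) for x in range(n + 1)]   # p = 0.8: skewed left
right = [binom_pmf(x, n, 0.2) for x in range(n + 1)]  # p = 0.2: its mirror image
mode = max(range(n + 1), key=lambda x: left[x])       # most likely value of X

print("most likely value:", mode)
for x in range(n + 1):
    print(f"{x:2d}  {left[x]:.4f}  " + "*" * round(100 * left[x]))
```

The most likely value is 12, and nearly all of the probability sits at the upper end, with a long thin tail toward 0.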