The Chi-Square Test


This is the easiest test I know of that will let a breeder tell whether the observed ratio of affected to normal puppies is consistent with Mendelian expectations for a single gene trait (or a two-gene trait).

To use this test, all you need is a hypothesis that will let you figure out what you "should" have gotten, plus data about what you actually did get.


Let's do an example: 

Confirming mode of inheritance

Suppose you think, but you are not sure, that "butterfly" noses in your breed are inherited as a single-gene trait, that the gene has two alleles, and that inheritance works this way:  AA puppies have totally black noses; Aa puppies have a little pink on their noses, and aa puppies have very extensive pink on their noses.  You have bred several litters using a bitch you think is Aa, taking her to three different sires, all of which you believe were also Aa.

If you are right in your guess, what you should have gotten with these crosses (Aa x Aa) is a ratio of 25% AA puppies, 50% Aa puppies, and 25% aa puppies. (You can do a Punnett square to see how this ratio was obtained, if you wish.)  You can lump all the puppies together because all the crosses are the same with regard to the trait in question.  [If your cross was some other cross, say Aa x aa, then your expectations would change, but the method outlined below would not.] 
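For readers who would rather let a computer fill in the Punnett square, here is a minimal sketch in Python. The function name `punnett` is made up for this illustration; it simply pairs every allele from one parent with every allele from the other and counts the resulting genotypes.

```python
from collections import Counter
from itertools import product

def punnett(parent1, parent2):
    """Return genotype frequencies for a single-gene cross such as 'Aa' x 'Aa'."""
    # Each parent passes on one of its two alleles with equal probability,
    # so every pairing of alleles is equally likely.
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort the letters so that 'aA' and 'Aa' count as the same genotype.
        genotype = "".join(sorted(allele1 + allele2))
        offspring[genotype] += 1
    total = sum(offspring.values())
    return {genotype: count / total for genotype, count in offspring.items()}

print(punnett("Aa", "Aa"))   # {'AA': 0.25, 'Aa': 0.5, 'aa': 0.25}
print(punnett("Aa", "aa"))   # {'Aa': 0.5, 'aa': 0.5}
```

The second call shows how the expectations change for an Aa x aa cross: half Aa and half aa, with no AA puppies at all.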

Let us say that what you actually got in these three combined litters were 8 puppies with completely black noses, 13 puppies with some pink, and 4 puppies with extensive pink.  This is not a 1 : 2 : 1 ratio, but the numbers are in the "right" direction -- are they close enough to confirm your guess?  You use the chi-square test to check.

Here's how you would set this up:

Phenotypes      Observed (O)    Expected (E)    Difference (d)    d²      d² / E
Black           8               6.25            2.25              5.06    0.81
Some pink       13              12.50           0.50              0.25    0.02
Lots of pink    4               6.25            -2.25             5.06    0.81
Total           25                                                        1.64

How do you calculate the expected numbers?  The total number of puppies you got in these three litters was 8 + 13 + 4 = 25.  25% of 25 is 6.25 (for each of the black and lots-of-pink classes); 50% of 25 is 12.50 (for the some-pink class).

The difference (d) is the observed number minus the expected number.

d² is the difference, squared.  The last column divides each squared difference by the expected number for that row.

You add up every number in the final column to get a final chi-square number of 1.64.

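In order to interpret this number, you must have access to a chi-square (χ²) table.  Any introductory genetics textbook should have one, but of course you may not have a genetics book handy.  Here are the entries you need and how to use them.  At the usual 5% significance level, the cutoff is 3.84 when there are two phenotype classes (one degree of freedom) and 5.99 when there are three phenotype classes (two degrees of freedom).  A chi-square number below the cutoff means the observed numbers are consistent with your hypothesis; a number above it means they are not.  In our example, 1.64 is well below 5.99, so the data are consistent with a 1 : 2 : 1 ratio.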

When doing this test, it helps to have a relatively large sample size to minimize the effects of chance; and it helps to take ambiguous results with a grain of salt.  A chi-square number of 0.00034 is tiny and probably very reliable when you draw your conclusion of no difference between observed and expected.  A chi-square number of 112.84 is very large and probably quite reliable when you draw a conclusion that there is a real difference between observed and expected.  A chi-square number of 6.02 or 4.87, compared to the 5.99 number off the table, suggests that it would be good to collect more data and repeat your calculations with a larger sample size.

In general, getting a little bitty chi-square number means that probably your guess about the system of inheritance was right and a big huge chi-square number means that probably your guess was wrong.
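If no table is handy, the probability behind the table can be computed directly for the two cases used on this page. The sketch below relies on two standard closed forms for the chi-square distribution: with one degree of freedom (two phenotype classes) the tail probability is erfc(sqrt(x / 2)), and with two degrees of freedom (three phenotype classes) it is exp(-x / 2). The function name `p_value` is made up for this example, and the sketch deliberately handles only those two cases.

```python
import math

def p_value(chi2, df):
    """Chance of a chi-square number at least this large if the hypothesis is true."""
    if df == 1:
        # Two phenotype classes
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        # Three phenotype classes
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")

print(round(p_value(1.64, 2), 2))   # 0.44
print(round(p_value(5.99, 2), 3))   # 0.05
```

A probability above 0.05 corresponds to a chi-square number below the table cutoff. The butterfly-nose result of 1.64 gives about 0.44, meaning deviations this large or larger would turn up by chance in roughly 44% of such samples even if the 1 : 2 : 1 guess is exactly right.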


Correcting Flawed Data

There is a substantial risk, when trying to figure out mode of inheritance, of overestimating the number of affected puppies.  This occurs because usually breeders collect information only on litters in which at least one affected puppy appeared.  Only in such litters is it obvious that the cross was between carriers of a recessive trait.  Obviously this risk is greatest for breeds where the average litter size is small, since you are a lot more likely to fail (by pure chance) to get a recessive-phenotype puppy if you have only three pups in the litter than if you had, say, twelve.

When you cross two heterozygous animals, there is a 25% chance that any given puppy will show the recessive trait.  Expressed another way, you expect 1 out of 4 puppies to be affected.  But of course you don't have to get an exact 1-in-4 ratio, and usually you don't -- just as you don't expect to get exactly five heads and five tails every time you flip ten coins.  Sometimes, when you cross two heterozygous animals, you don't get any puppies at all showing the recessive trait.  Generally speaking, what I see in students just learning genetics is a tendency to think that getting at least one recessive puppy "ought" to happen in a cross of two carriers.  In fact, in a cross of two carriers, and given a litter of five puppies, about a quarter of the time you will get no recessive puppies (the chance is 0.75 x 0.75 x 0.75 x 0.75 x 0.75, or about 24%).  It's just like having a litter of four girls and no boys -- there is no suggestion whatever that somehow the cross could not have produced boys just because none were actually seen in the litter.
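To see how quickly this chance shrinks as litters get bigger, here is a small Python sketch. It assumes each puppy independently has a 25% chance of being affected; the function name `prob_no_affected` is made up for this example.

```python
def prob_no_affected(litter_size, p_affected=0.25):
    """Chance that a carrier x carrier litter contains no affected pups at all."""
    # Every pup must independently be unaffected (75% chance each).
    return (1 - p_affected) ** litter_size

for n in (3, 5, 12):
    print(n, round(prob_no_affected(n), 3))
# 3 0.422
# 5 0.237
# 12 0.032
```

So a three-pup litter from two carriers shows no affected puppies about 42% of the time, while a twelve-pup litter does so only about 3% of the time.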

So if your breed has small litter sizes, and you suspect or know that data has been collected only for litters in which recessive puppies appeared, then you should use the standard mathematical correction for this problem before you do your analysis.  Hutt shows the correction factors for litter sizes through twelve in his book (p. 77).  Here they are:

Size of litter    Uncorrected Expected    Corrected Expected
1                 0.25                    -------
2                 0.50                    1.143
3                 0.75                    1.297
4                 1.00                    1.463
5                 1.25                    1.640
6                 1.50                    1.825
7                 1.75                    2.020
8                 2.00                    2.222
9                 2.25                    2.433
10                2.50                    2.649
11                2.75                    2.871
12                3.00                    3.098

You see how much bigger the difference is between corrected and uncorrected for small litter sizes.  That's because you really don't expect to get zero recessives in a litter of twelve, if it was a carrier-carrier cross.  But it could happen, which is why there is a correction.
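The corrected numbers behave like the expected count of affected puppies in a litter that is already known to contain at least one: the uncorrected expectation divided by the chance of getting at least one affected puppy. The sketch below uses that formula as an assumption about how such a table can be built, not as a quotation from Hutt; the function name `corrected_expected` is made up for this example.

```python
def corrected_expected(litter_size, p_affected=0.25):
    """Expected affected pups in a litter known to contain at least one."""
    uncorrected = litter_size * p_affected
    # Chance that the litter has one or more affected pups.
    prob_at_least_one = 1 - (1 - p_affected) ** litter_size
    return uncorrected / prob_at_least_one

for n in (3, 5, 12):
    print(n, round(corrected_expected(n), 2))
# 3 1.3
# 5 1.64
# 12 3.1
```

For a litter of five this gives about 1.64 and for a litter of three about 1.30, in line with the table above. A one-pup litter has no corrected value in the table because a litter of one that contains an affected puppy must have exactly one; it tells you nothing about the ratio.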

Let's see how this works.  Suppose you have data on four litters of puppies in which a severe type of muscular dystrophy has occurred.  The litters show these phenotypic distributions:

    Litter 1 -- three normal, two muscular dystrophy

    Litter 2 -- one normal, two dystrophy

    Litter 3 -- two normal, two dystrophy

    Litter 4 -- three normal, two dystrophy. 

You suspect this condition is a simple one-gene autosomal recessive.  You have a total of 9 normal : 8 dystrophy.  This is clearly not the three-to-one ratio expected for an autosomal recessive -- in fact, it looks a lot more like a 1 : 1 ratio.  If you do a normal chi-square test on these data, here's what you will get:

Phenotypes      Observed (O)    Expected (E)    Difference (d)    d²      d² / E
Normal          9               12.75           -3.75             14.06   1.10
Dystrophy       8               4.25            3.75              14.06   3.31
Total           17                                                        4.41

This chi-square number, if you check the table, is too high to confirm your hypothesis of autosomal recessive inheritance (4.41 is above the 3.84 cutoff for two phenotype classes).  It looks like your first guess was wrong.  However, if you remember that you should correct for the possibility that you missed litters in which, by chance, no affected puppies appeared, you will apply the above correction factors to each litter:

Litter                      Affected, Observed    Affected, Expected (uncorrected)    Affected, Expected (corrected)
Litter one (five pups)      2                     1.25                                1.640
Litter two (three pups)     2                     0.75                                1.297
Litter three (four pups)    2                     1.00                                1.463
Litter four (five pups)     2                     1.25                                1.640
Total                       8                     4.25                                6.040

Notice how the corrected expected number is now closer to the observed number of affected puppies:  that's what you would anticipate, if you were under-counting normal puppies.  Let's take this corrected number and do the same chi-square test.  We'll get the corrected expected number of normal puppies by subtracting the corrected dystrophy number from the total number of 17 puppies (17 - 6.04 = 10.96) and then go on as before:

Phenotypes      Observed (O)    Expected (E)    Difference (d)    d²      d² / E
Normal          9               10.96           -1.96             3.84    0.35
Dystrophy       8               6.04            1.96              3.84    0.64
Total           17                                                        0.98

There, wasn't that simple?  And now our chi-square number, 0.98, is much smaller.  This result now shows no significant difference between expected and observed:  our hypothesis, once the data are corrected, is supported by our data.  This doesn't mean that it is true, but it means that it could be true.  There are other things you as a breeder could now do to establish that this really is a single-gene recessive trait.
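The whole muscular dystrophy example, uncorrected and corrected, can be sketched in Python as follows. The helper names are made up for this illustration, and the correction formula is the assumed one (uncorrected expectation divided by the chance of at least one affected puppy), computed without rounding. Because of that, the corrected chi-square number comes out near 0.99 rather than exactly matching the hand calculation, which used rounded table values.

```python
def corrected_expected(litter_size, p_affected=0.25):
    """Expected affected pups in a litter known to contain at least one."""
    return litter_size * p_affected / (1 - (1 - p_affected) ** litter_size)

def chi_square(observed, expected):
    """Sum of (O - E) squared / E over all phenotype classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# (normal, dystrophy) counts for the four litters in the example
litters = [(3, 2), (1, 2), (2, 2), (3, 2)]
normal = sum(n for n, _ in litters)      # 9
affected = sum(a for _, a in litters)    # 8
total = normal + affected                # 17

# Uncorrected: a plain 3 : 1 expectation over all 17 pups
uncorrected = chi_square([normal, affected], [0.75 * total, 0.25 * total])

# Corrected: add up the per-litter corrected expectations for affected pups,
# then give the normal class whatever is left of the 17.
exp_affected = sum(corrected_expected(n + a) for n, a in litters)
corrected = chi_square([normal, affected], [total - exp_affected, exp_affected])

print(round(uncorrected, 2))    # 4.41
print(round(exp_affected, 2))   # 6.04
print(round(corrected, 2))      # 0.99
```

With two phenotype classes, the uncorrected 4.41 is above the 3.84 cutoff and the corrected value is well below it, which matches the conclusions reached by hand.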