Boschloo's test


Boschloo's test is a statistical hypothesis test for analysing 2x2 contingency tables. It examines the association of two Bernoulli-distributed random variables and is a uniformly more powerful alternative to Fisher's exact test. It was proposed in 1970 by R. D. Boschloo.

Setting

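A 2x2 contingency table visualizes $n$ independent observations of two binary variables $A$ and $B$:
$$
\begin{array}{c|cc|c}
 & B = 1 & B = 0 & \text{total} \\
\hline
A = 1 & x_{11} & x_{10} & n_1 \\
A = 0 & x_{01} & x_{00} & n_0 \\
\hline
\text{total} & s_1 & s_0 & n
\end{array}
$$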
The probability distribution of such tables can be classified into three distinct cases.
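  1. The row sums $n_1, n_0$ and column sums $s_1, s_0$ are fixed in advance and not random.
    Then all $x_{ij}$ are determined by $x_{11}$. If $A$ and $B$ are independent, $x_{11}$ follows a hypergeometric distribution with parameters $n, n_1, s_1$:
    $$x_{11} \sim \mathrm{Hypergeometric}(n, n_1, s_1).$$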
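  2. The row sums $n_1, n_0$ are fixed in advance but the column sums $s_1, s_0$ are not.
    Then all random entries are determined by $x_{11}$ and $x_{01}$, which follow independent binomial distributions with success probabilities $p_1$ and $p_0$:
    $$x_{11} \sim B(n_1, p_1), \qquad x_{01} \sim B(n_0, p_0).$$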
  3. Only the total number $n$ is fixed; neither the row sums nor the column sums are fixed.
    Then the random vector $(x_{11}, x_{10}, x_{01}, x_{00})$ follows a multinomial distribution with $n$ trials and probability vector $(p_{11}, p_{10}, p_{01}, p_{00})$.
Fisher's exact test is designed for the first case and is therefore an exact conditional test. The typical example of such a case is the lady tasting tea: a lady tastes 8 cups of tea with milk. In 4 of those cups the milk is poured in before the tea. In the other 4 cups the tea is poured in first. The lady tries to assign the cups to the two categories. Following our notation, the random variable $A$ represents the method used ($A = 1$: milk first, $A = 0$: tea first) and $B$ represents the lady's guess ($B = 1$: milk first, $B = 0$: tea first). Then the row sums are the fixed numbers of cups prepared with each method: $n_1 = n_0 = 4$. The lady knows that there are 4 cups in each category, so she will assign 4 cups to each method. Thus, the column sums are also fixed in advance: $s_1 = s_0 = 4$. If she is not able to tell the difference, $A$ and $B$ are independent and the number $x_{11}$ of correctly classified milk-first cups follows the hypergeometric distribution $\mathrm{Hypergeometric}(8, 4, 4)$.
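For the first case, the following minimal Python sketch computes the $\mathrm{Hypergeometric}(8, 4, 4)$ distribution of $x_{11}$ for the lady tasting tea, assuming she guesses at random. It uses exact fractions; the helper name hypergeom_pmf is our own and not taken from any library.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k: int, n: int, n1: int, s1: int) -> Fraction:
    """Exact P(x11 = k) for x11 ~ Hypergeometric(n, n1, s1); requires 0 <= s1 - k."""
    return Fraction(comb(n1, k) * comb(n - n1, s1 - k), comb(n, s1))


# Lady tasting tea: n = 8 cups, n1 = 4 cups with milk first, s1 = 4 cups guessed as milk first.
pmf = {k: hypergeom_pmf(k, 8, 4, 4) for k in range(5)}
for k, prob in pmf.items():
    print(f"P(x11 = {k}) = {prob}")

# Probability of identifying at least 3 of the 4 milk-first cups by pure guessing.
print(pmf[3] + pmf[4])  # 17/70
```

Under this null distribution, only a perfect classification has a one-sided p-value below 0.05 (1/70 ≈ 0.014); classifying three of the four milk-first cups correctly gives 17/70 ≈ 0.243.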
Boschloo's test is designed for the second case and is therefore an exact unconditional test. Examples of such a case are often found in medical research, where a binary endpoint is compared between two patient groups. Following our notation, $A = 1$ represents the first group, which receives some medication of interest, and $A = 0$ represents the second group, which receives a placebo. $B$ indicates the cure of a patient ($B = 1$: cured, $B = 0$: not cured). Then the row sums $n_1, n_0$ equal the group sizes and are usually fixed in advance. The column sums $s_1, s_0$ are the total numbers of cures and of continued disease, respectively, and are not fixed in advance.
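In the second case the column sums are random. The sketch below, assuming hypothetical group sizes of 10 and illustrative cure probabilities 0.6 (medication) and 0.3 (placebo), computes the probability of a table as a product of two binomial probabilities and derives the distribution of the total number of cures $s_1 = x_{11} + x_{01}$. The helper names are our own.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def table_prob(x11: int, x01: int, n1: int, n0: int, p1: float, p0: float) -> float:
    """Probability of a table with fixed row sums n1, n0 (second case):
    x11 ~ B(n1, p1) and x01 ~ B(n0, p0) independently; x10 and x00 follow from the row sums."""
    return binom_pmf(x11, n1, p1) * binom_pmf(x01, n0, p0)


# Hypothetical trial: 10 patients per group, cure probabilities 0.6 (drug) and 0.3 (placebo).
n1, n0, p1, p0 = 10, 10, 0.6, 0.3

# The column sum s1 = x11 + x01 (total number of cures) is random:
# add the probability of every table to the value of s1 it produces.
s1_dist = [0.0] * (n1 + n0 + 1)
for x11 in range(n1 + 1):
    for x01 in range(n0 + 1):
        s1_dist[x11 + x01] += table_prob(x11, x01, n1, n0, p1, p0)

print([round(q, 3) for q in s1_dist])
```

Every value of $s_1$ from 0 to 20 has positive probability here, which is exactly why the column sums cannot be treated as fixed in this design.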
An example of the third case can be constructed as follows: simultaneously flip two distinguishable coins $A$ and $B$, and repeat this $n$ times. If we record the $n$ outcomes in our 2x2 table, we know in advance neither how often coin $A$ shows heads or tails, nor how often coin $B$ shows heads or tails.
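For the third case, the probability of a table is multinomial. The following sketch, assuming two fair and independent coins (so every cell has probability 1/4) and using a helper name of our own, computes the probability of one particular table after $n = 4$ double flips.

```python
from fractions import Fraction
from math import factorial


def multinomial_pmf(counts, probs):
    """Exact multinomial probability of the cell counts (x11, x10, x01, x00)."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # every intermediate quotient is an integer
    prob = Fraction(coef)
    for c, q in zip(counts, probs):
        prob *= Fraction(q) ** c
    return prob


# Two fair coins flipped independently: every cell has probability 1/4.
fair = [Fraction(1, 4)] * 4

# n = 4 double flips in which each combination of faces appears exactly once.
print(multinomial_pmf([1, 1, 1, 1], fair))  # 3/32
```

Only $n$ enters as a fixed quantity; both the row sums and the column sums are outcomes of the experiment.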

Test hypothesis

The null hypothesis of Boschloo's one-tailed test, against the alternative $p_1 > p_0$, is:
$$H_0: p_1 \le p_0.$$
The null hypothesis of the one-tailed test can also be formulated in the other direction, against the alternative $p_1 < p_0$:
$$H_0: p_1 \ge p_0.$$
The null hypothesis of the two-tailed test is:
$$H_0: p_1 = p_0.$$
There is no universal definition of the two-tailed version of Fisher's exact test. Since Boschloo's test is based on Fisher's exact test, a universal two-tailed version of Boschloo's test does not exist either. In the following we deal with the one-tailed test and $H_0: p_1 \le p_0$.

Boschloo's idea

We denote the desired significance level by $\alpha$. Fisher's exact test is a conditional test and appropriate for the first of the above-mentioned cases. But if we treat the observed column sum $s_1$ as fixed in advance, Fisher's exact test can also be applied to the second case. The true size of the test then depends on the nuisance parameters $p_1$ and $p_0$. It can be shown that the maximal size $\max_{p_1 \le p_0} \mathrm{size}(p_1, p_0)$ is attained for equal proportions $p = p_1 = p_0$ and is still controlled by $\alpha$. However, Boschloo stated that for small sample sizes, the maximal size is often considerably smaller than $\alpha$. This leads to an undesirable loss of power.
Boschloo proposed to use Fisher's exact test with a greater nominal level $\alpha^* > \alpha$. Here, $\alpha^*$ should be chosen as large as possible such that the maximal size is still controlled by $\alpha$: $\max_{p \in [0, 1]} \mathrm{size}(p) \le \alpha$. This method was especially advantageous at the time of Boschloo's publication because $\alpha^*$ could be looked up for common values of $\alpha$, $n_1$ and $n_0$, which made performing Boschloo's test computationally easy.
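Boschloo's idea can be checked numerically. The minimal sketch below is our own illustration with hypothetical function names, not Boschloo's tabulation method. It enumerates all tables for given $n_1, n_0$, computes each table's one-sided Fisher p-value, and grows the rejection region $\{p_F \le \alpha^*\}$ until its maximal size exceeds $\alpha$. Following the statement above that the maximal size is attained for $p_1 = p_0$, it searches only over a common $p$, and it replaces the exact maximum by a maximum over an evenly spaced grid, which is an approximation.

```python
from fractions import Fraction
from math import comb


def fisher_pvalue(x11: int, x01: int, n1: int, n0: int) -> Fraction:
    """Exact one-sided Fisher p-value P(X >= x11), X ~ Hypergeometric(n1 + n0, n1, x11 + x01)."""
    s1 = x11 + x01
    tail = sum(comb(n1, k) * comb(n0, s1 - k) for k in range(x11, min(n1, s1) + 1))
    return Fraction(tail, comb(n1 + n0, s1))


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def boschloo_nominal_level(n1: int, n0: int, alpha: float, grid_size: int = 201):
    """Return (alpha_star, fisher_size).

    alpha_star: largest attained Fisher p-value whose rejection region {p_F <= alpha_star}
    has maximal size <= alpha (any level up to the next attained p-value gives the same test).
    fisher_size: maximal size of Fisher's test at nominal level alpha.
    Sizes are maximized over a grid of common values p = p1 = p0 only (an approximation).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    tables = [(a, b) for a in range(n1 + 1) for b in range(n0 + 1)]
    pf = {t: fisher_pvalue(*t, n1, n0) for t in tables}
    size = [0.0] * grid_size  # rejection probability at each grid point
    alpha_star, fisher_size = 0.0, 0.0
    # Grow the rejection region by admitting tables in order of increasing p_F.
    for level in sorted(set(pf.values())):
        for a, b in (t for t in tables if pf[t] == level):
            for i, p in enumerate(grid):
                size[i] += binom_pmf(a, n1, p) * binom_pmf(b, n0, p)
        max_size = max(size)
        if level <= alpha:
            fisher_size = max_size
        if max_size > alpha:
            break
        alpha_star = float(level)
    return alpha_star, fisher_size


alpha_star, fisher_size = boschloo_nominal_level(10, 10, alpha=0.05)
print(f"max size of Fisher's test at 0.05: {fisher_size:.4f}")
print(f"Boschloo's nominal level alpha*:   {alpha_star:.4f}")
```

Comparing the two printed values shows how much of the level Fisher's test leaves unused for this design and how far the nominal level can be raised; a larger grid_size gives a more accurate maximum at a higher cost.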

Test statistic

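The decision rule of Boschloo's approach is based on Fisher's exact test. An equivalent way of formulating the test is to use the p-value of Fisher's exact test as the test statistic. Fisher's one-sided p-value is calculated from the hypergeometric distribution, where $F$ denotes its cumulative distribution function:
$$p_F = 1 - F_{\mathrm{Hypergeometric}(n,\, n_1,\, x_{11} + x_{01})}(x_{11} - 1).$$
The distribution of $p_F$ is determined by the binomial distributions of $x_{11}$ and $x_{01}$ and depends on the unknown nuisance parameter $p = p_1 = p_0$. For a specified significance level $\alpha$, the critical value of $p_F$ is the maximal value $\alpha^*$ that satisfies $\max_{p \in [0, 1]} P_p(p_F \le \alpha^*) \le \alpha$. This critical value $\alpha^*$ equals the nominal level of Boschloo's original approach.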
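Equivalently, Boschloo's test can be reported as a p-value: the maximum over $p$ of the probability of obtaining a Fisher p-value no larger than the observed one. The sketch below computes this for the one-sided test, again maximizing over a grid of common values $p = p_1 = p_0$ (an approximation; the function names are hypothetical). For real analyses, SciPy provides scipy.stats.boschloo_exact, whose interface and numerical method differ from this sketch.

```python
from fractions import Fraction
from math import comb


def fisher_pvalue(x11, x01, n1, n0):
    """Exact one-sided Fisher p-value P(X >= x11), X ~ Hypergeometric(n1 + n0, n1, x11 + x01)."""
    s1 = x11 + x01
    tail = sum(comb(n1, k) * comb(n0, s1 - k) for k in range(x11, min(n1, s1) + 1))
    return Fraction(tail, comb(n1 + n0, s1))


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def boschloo_pvalue(x11, x01, n1, n0, grid_size=1001):
    """Return (Fisher p-value, Boschloo p-value). The Boschloo p-value is the largest
    probability, over a grid of common values p = p1 = p0, of a table whose Fisher
    p-value is <= the observed one."""
    observed = fisher_pvalue(x11, x01, n1, n0)
    region = [
        (a, b)
        for a in range(n1 + 1)
        for b in range(n0 + 1)
        if fisher_pvalue(a, b, n1, n0) <= observed
    ]
    best = 0.0
    for i in range(grid_size):
        p = i / (grid_size - 1)
        best = max(best, sum(binom_pmf(a, n1, p) * binom_pmf(b, n0, p) for a, b in region))
    return float(observed), min(best, 1.0)


p_fisher, p_boschloo = boschloo_pvalue(7, 2, 10, 10)
print(f"Fisher: {p_fisher:.4f}  Boschloo: {p_boschloo:.4f}")
```

Because Fisher's test keeps its level at every nominal level, the Boschloo p-value computed this way never exceeds the Fisher p-value of the same table.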

Modification

Boschloo's test deals with the unknown nuisance parameter $p$ by taking the maximum over the whole parameter space $[0, 1]$. The Berger & Boos procedure takes a different approach by maximizing $P_p(p_F \le \alpha^*)$ over a $(1 - \gamma)$ confidence interval of $p$ and adding $\gamma$.
Here $\gamma$ is usually a small value such as 0.001 or 0.0001. This results in a modified Boschloo's test which is also exact.
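A sketch of the Berger & Boos modification in p-value form follows. The choice of confidence interval is our assumption: it uses a two-sided $(1 - \gamma)$ Clopper-Pearson interval for the common proportion $p$, computed from the pooled number of events $x_{11} + x_{01}$ out of $n$ by bisection. Other confidence sets are possible and published implementations may differ. As before, the maximum is approximated on a grid and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_pvalue(x11, x01, n1, n0):
    s1 = x11 + x01
    tail = sum(comb(n1, k) * comb(n0, s1 - k) for k in range(x11, min(n1, s1) + 1))
    return Fraction(tail, comb(n1 + n0, s1))


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    return sum(binom_pmf(j, n, p) for j in range(k + 1))


def bisect_increasing(f, target, iters=60):
    """Solve f(p) = target on [0, 1] for a function f that increases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(s, n, gamma):
    """Two-sided (1 - gamma) Clopper-Pearson interval for a binomial proportion."""
    lower = 0.0 if s == 0 else bisect_increasing(lambda p: 1 - binom_cdf(s - 1, n, p), gamma / 2)
    upper = 1.0 if s == n else bisect_increasing(lambda p: -binom_cdf(s, n, p), -gamma / 2)
    return lower, upper


def modified_boschloo_pvalue(x11, x01, n1, n0, gamma=0.001, grid_size=1001):
    """Maximize P_p(p_F <= observed p_F) over a (1 - gamma) interval for p, then add gamma."""
    observed = fisher_pvalue(x11, x01, n1, n0)
    region = [(a, b) for a in range(n1 + 1) for b in range(n0 + 1)
              if fisher_pvalue(a, b, n1, n0) <= observed]
    lower, upper = clopper_pearson(x11 + x01, n1 + n0, gamma)
    best = 0.0
    for i in range(grid_size):
        p = lower + (upper - lower) * i / (grid_size - 1)
        best = max(best, sum(binom_pmf(a, n1, p) * binom_pmf(b, n0, p) for a, b in region))
    return min(best + gamma, 1.0)


print(modified_boschloo_pvalue(7, 2, 10, 10))
```

Restricting the search to a confidence interval can only lower the maximum, while adding $\gamma$ accounts for the chance that the interval misses the true $p$.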

Comparison to other exact tests

All exact tests hold the specified significance level but can have varying power in different situations. Mehrotra et al. compared the power of some exact tests in different situations. The results regarding Boschloo's test are summarized in the following.

Modified Boschloo's test

Boschloo's test and the modified Boschloo's test have similar power in all considered scenarios. Boschloo's test has slightly more power in some cases, and vice versa in some other cases.

Fisher's exact test

Boschloo's test is by construction uniformly more powerful than Fisher's exact test. For small sample sizes the power difference is large, ranging from 16 to 20 percentage points in the scenarios considered. The power difference is smaller for greater sample sizes.

Exact Z-Pooled test

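This test is based on the test statistic
$$Z_{\text{pooled}} = \frac{\hat p_1 - \hat p_0}{\sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_0}\right)}},$$
where $\hat p_1 = x_{11}/n_1$ and $\hat p_0 = x_{01}/n_0$ are the group event rates and $\hat p = (x_{11} + x_{01})/n$ is the pooled event rate.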
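The following sketch computes the pooled statistic and an exact unconditional one-sided p-value for it, defined analogously to Boschloo's test as the maximum over $p$ of the probability of a statistic at least as large as the observed one. Maximizing only over a common $p = p_1 = p_0$ on a grid, the tie tolerance, and returning 0 when $\hat p \in \{0, 1\}$ are our own assumptions; the function names are hypothetical.

```python
from math import comb, sqrt


def z_pooled(x11, x01, n1, n0):
    """Pooled Z statistic; returns 0 when the pooled rate is 0 or 1 (a convention chosen here)."""
    p1_hat, p0_hat = x11 / n1, x01 / n0
    p_hat = (x11 + x01) / (n1 + n0)
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n0))
    return 0.0 if se == 0 else (p1_hat - p0_hat) / se


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_unconditional_pvalue(stat, x11, x01, n1, n0, grid_size=1001):
    """One-sided exact p-value: the largest probability, over a grid of common values
    p = p1 = p0, of a table whose statistic is at least the observed one."""
    observed = stat(x11, x01, n1, n0)
    # Small tolerance so that floating-point ties count as "at least as extreme".
    region = [(a, b) for a in range(n1 + 1) for b in range(n0 + 1)
              if stat(a, b, n1, n0) >= observed - 1e-12]
    return max(
        sum(binom_pmf(a, n1, p) * binom_pmf(b, n0, p) for a, b in region)
        for p in (i / (grid_size - 1) for i in range(grid_size))
    )


print(z_pooled(7, 3, 10, 10))
print(exact_unconditional_pvalue(z_pooled, 7, 3, 10, 10))
```

Passing a different statistic to exact_unconditional_pvalue yields a grid approximation of another exact unconditional test; the tests compared in this section differ mainly in how they order the possible tables.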
The power of this test is similar to that of Boschloo's test in most scenarios. In some cases, the $Z$-Pooled test has greater power, with differences mostly ranging from 1 to 5 percentage points. In very few cases, the difference goes up to 9 percentage points.
This test can also be modified by the Berger & Boos procedure. However, the resulting test has very similar power to the unmodified test in all scenarios.

Exact Z-Unpooled test

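This test is based on the test statistic
$$Z_{\text{unpooled}} = \frac{\hat p_1 - \hat p_0}{\sqrt{\frac{\hat p_1 (1 - \hat p_1)}{n_1} + \frac{\hat p_0 (1 - \hat p_0)}{n_0}}},$$
where $\hat p_1 = x_{11}/n_1$ and $\hat p_0 = x_{01}/n_0$ are the group event rates.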
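A sketch of the unpooled statistic follows. Unlike the pooled version, its standard error is 0 whenever both observed rates are 0 or 1, even if the rates differ. How such tables are handled (here: 0 for equal rates, plus or minus infinity otherwise) is a convention we chose for this illustration, and the function name is hypothetical.

```python
from math import copysign, inf, sqrt


def z_unpooled(x11, x01, n1, n0):
    """Unpooled Z statistic. If both group rates are 0 or 1 the standard error is 0;
    this sketch then returns 0 for equal rates and +/-inf otherwise (a chosen convention)."""
    p1_hat, p0_hat = x11 / n1, x01 / n0
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p0_hat * (1 - p0_hat) / n0)
    diff = p1_hat - p0_hat
    if se == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


print(z_unpooled(7, 3, 10, 10))
print(z_unpooled(10, 0, 10, 10))  # inf
```

For the table $(x_{11}, x_{01}) = (7, 3)$ with $n_1 = n_0 = 10$, the unpooled denominator is smaller than the pooled one, so the unpooled statistic is larger; this ordering of tables is what an exact unconditional test built on it inherits.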
The power of this test is similar to that of Boschloo's test in many scenarios. In some cases, the $Z$-Unpooled test has greater power, with differences ranging from 1 to 5 percentage points. However, in some other cases, Boschloo's test has noticeably greater power, with differences up to 68 percentage points.
This test can also be modified by the Berger & Boos procedure. The resulting test has similar power to the unmodified test in most scenarios. In some cases, the power is considerably improved by the modification but the overall power comparison to Boschloo's test remains unchanged.

Software

Boschloo's test is implemented in statistical software; for example, the Python library SciPy provides the function boschloo_exact in its scipy.stats module.