Bartlett’s test for equal variances#
In [1], the influence of vitamin C on the tooth growth of guinea pigs was investigated. In a control study, 60 subjects were divided into small dose, medium dose, and large dose groups that received daily doses of 0.5, 1.0 and 2.0 mg of vitamin C, respectively. After 42 days, the tooth growth was measured.
The small_dose, medium_dose, and large_dose arrays below record
tooth growth measurements of the three groups in microns.
import numpy as np
small_dose = np.array([
4.2, 11.5, 7.3, 5.8, 6.4, 10, 11.2, 11.2, 5.2, 7,
15.2, 21.5, 17.6, 9.7, 14.5, 10, 8.2, 9.4, 16.5, 9.7
])
medium_dose = np.array([
16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5,
19.7, 23.3, 23.6, 26.4, 20, 25.2, 25.8, 21.2, 14.5, 27.3
])
large_dose = np.array([
23.6, 18.5, 33.9, 25.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5,
25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23
])
The scipy.stats.bartlett statistic is sensitive to differences in
variances between the samples.
from scipy import stats
res = stats.bartlett(small_dose, medium_dose, large_dose)
res.statistic
np.float64(0.6654670663030519)
The value of the statistic tends to be high when there is a large difference in variances.
We can test for inequality of variance among the groups by comparing the observed value of the statistic against the null distribution: the distribution of statistic values derived under the null hypothesis that the population variances of the three groups are equal.
For this test, the null distribution follows the chi-square distribution as shown below.
import matplotlib.pyplot as plt
k = 3 # number of samples
dist = dist = stats.chi2(df=k-1)
val = np.linspace(0, 5, 100)
pdf = dist.pdf(val)
fig, ax = plt.subplots(figsize=(8, 5))
def plot(ax): # we'll reuse this
ax.plot(val, pdf, color='C0')
ax.set_title("Bartlett Test Null Distribution")
ax.set_xlabel("statistic")
ax.set_ylabel("probability density")
ax.set_xlim(0, 5)
ax.set_ylim(0, 1)
plot(ax)
plt.show()
Matplotlib is building the font cache; this may take a moment.
The comparison is quantified by the p-value: the proportion of values in the null distribution greater than or equal to the observed value of the statistic.
fig, ax = plt.subplots(figsize=(8, 5))
plot(ax)
pvalue = dist.sf(res.statistic)
annotation = (f'p-value={pvalue:.3f}\n(shaded area)')
props = dict(facecolor='black', width=1, headwidth=5, headlength=8)
_ = ax.annotate(annotation, (1.5, 0.22), (2.25, 0.3), arrowprops=props)
i = val >= res.statistic
ax.fill_between(val[i], y1=0, y2=pdf[i], color='C0')
plt.show()
res.pvalue
np.float64(0.71696121509966)
If the p-value is “small” - that is, if there is a low probability of sampling data from distributions with identical variances that produces such an extreme value of the statistic - this may be taken as evidence against the null hypothesis in favor of the alternative: the variances of the groups are not equal. Note that:
The inverse is not true; that is, the test is not used to provide evidence for the null hypothesis.
The threshold for values that will be considered “small” is a choice that should be made before the data is analyzed [2] with consideration of the risks of both false positives (incorrectly rejecting the null hypothesis) and false negatives (failure to reject a false null hypothesis).
Small p-values are not evidence for a large effect; rather, they can only provide evidence for a “significant” effect, meaning that they are unlikely to have occurred under the null hypothesis.
Note that the chi-square distribution provides the null distribution when the observations are normally distributed. For small samples drawn from non-normal populations, it may be more appropriate to perform a permutation test: Under the null hypothesis that all three samples were drawn from the same population, each of the measurements is equally likely to have been observed in any of the three samples. Therefore, we can form a randomized null distribution by calculating the statistic under many randomly-generated partitionings of the observations into the three samples.
def statistic(*samples):
return stats.bartlett(*samples).statistic
ref = stats.permutation_test(
(small_dose, medium_dose, large_dose), statistic,
permutation_type='independent', alternative='greater'
)
fig, ax = plt.subplots(figsize=(8, 5))
plot(ax)
bins = np.linspace(0, 5, 25)
ax.hist(
ref.null_distribution, bins=bins, density=True, facecolor="C1"
)
ax.legend(['asymptotic approximation\n(many observations)',
'randomized null distribution'])
plot(ax)
plt.show()
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
Cell In[6], line 3
1 def statistic(*samples):
2 return stats.bartlett(*samples).statistic
----> 3 ref = stats.permutation_test(
4 (small_dose, medium_dose, large_dose), statistic,
5 permutation_type='independent', alternative='greater'
6 )
7 fig, ax = plt.subplots(figsize=(8, 5))
8 plot(ax)
File /scipy-1.17.1/.pybuild/cpython3_3.13_scipy/build/scipy/_lib/_util.py:365, in _transition_to_rng.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
358 message = (
359 "The NumPy global RNG was seeded by calling "
360 f"`np.random.seed`. Beginning in {end_version}, this "
361 "function will no longer use the global RNG."
362 ) + cmn_msg
363 warnings.warn(message, FutureWarning, stacklevel=2)
--> 365 return fun(*args, **kwargs)
File /scipy-1.17.1/.pybuild/cpython3_3.13_scipy/build/scipy/stats/_resampling.py:2108, in permutation_test(data, statistic, permutation_type, vectorized, n_resamples, batch, alternative, axis, rng)
2104 null_calculator_args = (data, statistic, n_resamples,
2105 batch, rng)
2106 calculate_null = null_calculators[permutation_type]
2107 null_distribution, n_resamples, exact_test = (
-> 2108 calculate_null(*null_calculator_args, xp=xp))
2110 # See References [2] and [3]
2111 adjustment = 0 if exact_test else 1
File /scipy-1.17.1/.pybuild/cpython3_3.13_scipy/build/scipy/stats/_resampling.py:1499, in _calculate_null_both(data, statistic, n_permutations, batch, rng, xp)
1497 # data_batch = np.split(data_batch, n_obs_ic[:-1], axis=-1)
1498 data_batch = [data_batch[..., i:j] for i, j in zip(n_obs_ic[:-1], n_obs_ic[1:])]
-> 1499 null_distribution.append(statistic(*data_batch, axis=-1))
1500 null_distribution = xp.concat(null_distribution, axis=0)
1502 return null_distribution, n_permutations, exact_test
File /scipy-1.17.1/.pybuild/cpython3_3.13_scipy/build/scipy/stats/_resampling.py:50, in _vectorize_statistic.<locals>.stat_nd(axis, *data)
47 data = np.split(z, split_indices)
48 return statistic(*data)
---> 50 return np.apply_along_axis(stat_1d, 0, z)[()]
File /usr/lib/python3/dist-packages/numpy/lib/_shape_base_impl.py:416, in apply_along_axis(func1d, axis, arr, *args, **kwargs)
414 buff[ind0] = res
415 for ind in inds:
--> 416 buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
418 res = transpose(buff, buff_permute)
419 return conv.wrap(res)
File /scipy-1.17.1/.pybuild/cpython3_3.13_scipy/build/scipy/stats/_resampling.py:48, in _vectorize_statistic.<locals>.stat_nd.<locals>.stat_1d(z)
46 def stat_1d(z):
47 data = np.split(z, split_indices)
---> 48 return statistic(*data)
Cell In[6], line 2, in statistic(*samples)
1 def statistic(*samples):
----> 2 return stats.bartlett(*samples).statistic
File /scipy-1.17.1/.pybuild/cpython3_3.13_scipy/build/scipy/stats/_axis_nan_policy.py:592, in _axis_nan_policy_factory.<locals>.axis_nan_policy_decorator.<locals>.axis_nan_policy_wrapper(***failed resolving arguments***)
589 res = _add_reduced_axes(res, reduced_axes, keepdims)
590 return tuple_to_result(*res)
--> 592 res = hypotest_fun_out(*samples, **kwds)
593 res = result_to_tuple(res, n_out)
594 res = _add_reduced_axes(res, reduced_axes, keepdims)
File /scipy-1.17.1/.pybuild/cpython3_3.13_scipy/build/scipy/stats/_morestats.py:3181, in bartlett(axis, *samples)
3178 chi2 = _SimpleChi2(xp.asarray(k-1, dtype=dtype, device=xp_device(T)))
3179 pvalue = _get_pvalue(T, chi2, alternative='greater', symmetric=False, xp=xp)
-> 3181 T = xp.clip(T, min=0., max=xp.inf)
3182 T = T[()] if T.ndim == 0 else T
3183 pvalue = pvalue[()] if pvalue.ndim == 0 else pvalue
File /scipy-1.17.1/.pybuild/cpython3_3.13_scipy/build/scipy/_lib/array_api_compat/_internal.py:35, in get_xp.<locals>.inner.<locals>.wrapped_f(*args, **kwargs)
33 @wraps(f)
34 def wrapped_f(*args: object, **kwargs: object) -> object:
---> 35 return f(*args, xp=xp, **kwargs)
File /scipy-1.17.1/.pybuild/cpython3_3.13_scipy/build/scipy/_lib/array_api_compat/common/_aliases.py:434, in clip(x, min, max, xp, out)
432 if max is not None:
433 b = wrapped_xp.asarray(max, dtype=x.dtype, device=dev)
--> 434 b = xp.broadcast_to(b, result_shape)
435 ib = (out > b) | xp.isnan(b)
436 out[ib] = b[ib]
File /usr/lib/python3/dist-packages/numpy/lib/_stride_tricks_impl.py:410, in broadcast_to(array, shape, subok)
367 @array_function_dispatch(_broadcast_to_dispatcher, module='numpy')
368 def broadcast_to(array, shape, subok=False):
369 """Broadcast an array to a new shape.
370
371 Parameters
(...) 408 [1, 2, 3]])
409 """
--> 410 return _broadcast_to(array, shape, subok=subok, readonly=True)
File /usr/lib/python3/dist-packages/numpy/lib/_stride_tricks_impl.py:349, in _broadcast_to(array, shape, subok, readonly)
346 raise ValueError('all elements of broadcast shape must be non-'
347 'negative')
348 extras = []
--> 349 it = np.nditer(
350 (array,), flags=['multi_index', 'refs_ok', 'zerosize_ok'] + extras,
351 op_flags=['readonly'], itershape=shape, order='C')
352 with it:
353 # never really has writebackifcopy semantics
354 broadcast = it.itviews[0]
KeyboardInterrupt:
ref.pvalue # randomized test p-value
Note that there is significant disagreement between the p-value calculated here
and the asymptotic approximation returned by scipy.stats.bartlett above.
The statistical inferences that can be drawn rigorously from a permutation test
are limited; nonetheless, they may be the preferred approach in many
circumstances [3].