A typical situation in research is the comparison of two groups with respect to a metric variable, in which case the two-sample t-test is usually applied. While common frequentist two-sample t-tests assess the difference of the group means via a p-value, the quantity of interest in applied research is most often the effect size. Existing Bayesian alternatives to the two-sample t-test replace frequentist significance measures like the p-value with the Bayes factor and thus take the same testing stance. The R package

Studies in a wide range of fields randomly assign individuals to two groups with different treatments and measure a metric response variable. In clinical trials, patients are assigned either to the treatment or the control group: people in the treatment group receive a new drug or treatment, while the control group receives the status quo drug or treatment. Interest most often lies in differences between both groups, which is commonly translated into comparing the group means $\mu_2$ and $\mu_1$. Frequentist methods proceed via the two-sample t-test, which can be conducted in R easily via:
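A minimal sketch of such a call is given below. The data are simulated purely for illustration, and the variable names `control` and `treatment` are assumptions, not taken from the original example.

```r
# Simulated data for illustration only (not from any real study).
set.seed(123)
control   <- rnorm(100, mean = 0, sd = 1)  # status quo group
treatment <- rnorm(100, mean = 1, sd = 1)  # new treatment group

# Welch two-sample t-test of H0: mu_2 - mu_1 = 0 (two-sided by default).
result <- t.test(treatment, control)
print(result)

# The p-value and the estimated group means can be extracted directly.
result$p.value
result$estimate
```

By default `t.test` does not assume equal variances; setting `var.equal = TRUE` would give the classical pooled-variance version instead.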

Here, the null hypothesis $H_0: \mu_2 - \mu_1 = 0$, that is, the hypothesis of no difference, is rejected. Bayesian alternatives to the frequentist two-sample t-test proceed similarly and use the Bayes factor instead of the p-value to decide for or against the null hypothesis of no effect. In most cases, however, researchers are not interested in rejecting a null hypothesis but in the size of the effect between both groups, that is, $(\mu_2 - \mu_1)/$

The

For effect sizes, it is recommended to select a region of practical equivalence (ROPE) of $[-0.1, 0.1]$ around $\delta_0 = 0$ [
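How such a ROPE can be used once posterior draws of the effect size are available may be sketched as follows. The draws here are simulated from a normal distribution as a stand-in for real MCMC output, and the names `delta_draws`, `rope` and `in_rope` are hypothetical.

```r
# Hypothetical posterior draws of the effect size; in a real analysis
# these would come from the MCMC output, not from rnorm().
set.seed(1)
delta_draws <- rnorm(5000, mean = 0.5, sd = 0.2)

# ROPE of [-0.1, 0.1] around an effect size of zero.
rope <- c(-0.1, 0.1)

# Share of posterior mass inside the ROPE: a small value speaks against
# the effect being practically equivalent to zero.
in_rope <- mean(delta_draws >= rope[1] & delta_draws <= rope[2])
in_rope

# Posterior mean and a 95% equal-tailed credible interval for comparison.
mean(delta_draws)
quantile(delta_draws, probs = c(0.025, 0.975))
```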

In contrast to the frequentist two-sample t-test, the Bayesian approach yields the posterior distribution of the means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$ of both groups. It also produces the posterior distribution of the difference of means $\mu_2 - \mu_1$, which is usually used in the classical two-sample t-test, as well as the posterior distribution of $\sigma_2 - \sigma_1$ and the effect size

The following example uses the original sleep dataset of Student, who invented the t-test. The underlying experiment was designed to observe the effect of soporific drugs on the sleep duration of 10 patients. Ten additional patients represent the control group. To ensure reproducibility, we first set a seed and then load the package

The data can be accessed and prepared as follows:
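One way to do this, sketched here with the `sleep` data frame that ships with base R, is shown below. The vector names `group1` and `group2` are assumptions introduced for illustration.

```r
# Load Student's sleep data from the 'datasets' package (part of base R).
data(sleep)

# 'extra' holds the change in hours of sleep, 'group' the drug given.
str(sleep)

# Following the text, the two groups are treated as independent samples.
group1 <- sleep$extra[sleep$group == 1]
group2 <- sleep$extra[sleep$group == 2]

length(group1)
length(group2)
```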

The sleep duration in the second group seems to be increased, as shown by exploratory data analysis:
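A minimal sketch of such an exploratory analysis, using only base R functions, could look like this. The object names and axis labels are illustrative choices.

```r
data(sleep)

# Group-wise means and standard deviations of the change in sleep duration.
group_means <- tapply(sleep$extra, sleep$group, mean)
group_sds   <- tapply(sleep$extra, sleep$group, sd)
print(round(rbind(mean = group_means, sd = group_sds), 2))

# Side-by-side boxplots of both groups.
boxplot(extra ~ group, data = sleep,
        xlab = "Group", ylab = "Increase in hours of sleep")
```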

A traditional one-sided two-sample t-test shows that there is a significant difference between both groups.
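Such a test can be sketched with `t.test` from base R as follows. The direction of the alternative (second group larger than the first) and the use of the default Welch correction are assumptions about how the original call was set up.

```r
data(sleep)
group1 <- sleep$extra[sleep$group == 1]
group2 <- sleep$extra[sleep$group == 2]

# One-sided Welch two-sample t-test of H0: mu_2 - mu_1 <= 0
# against H1: mu_2 - mu_1 > 0.
test_result <- t.test(group2, group1, alternative = "greater")
print(test_result)

# Significant at the 5% level if the p-value is below 0.05.
test_result$p.value < 0.05
```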

What is missing is the size of the effect, which is of much more interest. To investigate this, we first install and load the

The function produces plots of the posterior distributions of the difference of means $\mu_2 - \mu_1$ and the difference of standard deviations $\sigma_2 - \sigma_1$ given the data, as well as convergence diagnostics. In the figure for $\mu_2 - \mu_1$, the upper left plot shows the posterior distribution and the upper right plot shows the trace plot of the 5000 draws used for the posterior distribution (5000 further draws were discarded as burn-in). The trace plot shows that the MCMC algorithm has stabilized, which is also confirmed by the lower left Gelman-Rubin diagnostic plot, which has converged quickly to its target value of 1, see also [. The function call above also produces similar analysis plots for the posteriors of $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$ and the effect size
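To illustrate why a Gelman-Rubin value near 1 indicates convergence, the basic potential scale reduction factor can be computed by hand. The sketch below uses two simulated, well-mixed chains as a stand-in for real MCMC output. It implements the textbook form of the statistic and is not necessarily the exact variant plotted by the package. The function name `gelman_rubin` is hypothetical.

```r
# Basic Gelman-Rubin statistic for a matrix of draws
# (rows = iterations, columns = chains).
gelman_rubin <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))      # mean within-chain variance
  B <- n * var(colMeans(chains))        # between-chain variance
  var_hat <- (n - 1) / n * W + B / n    # pooled variance estimate
  sqrt(var_hat / W)
}

# Two simulated chains from the same distribution, i.e. a converged sampler.
set.seed(2)
chains <- cbind(rnorm(5000, mean = 1.5, sd = 0.9),
                rnorm(5000, mean = 1.5, sd = 0.9))

rhat <- gelman_rubin(chains)
rhat
```

If the chains had settled in different regions, the between-chain variance would dominate and the statistic would be clearly larger than 1.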

Posterior distribution of the effect size


In summary, with the tools provided in the

The

The basis for the diagnostics of the package is provided by the

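Internally, a Gibbs sampling algorithm iteratively samples from the full posterior distribution of the model parameters, including the group means $\mu_1$ and $\mu_2$. Each iteration $s$ yields draws $(\mu_1)^{s}$ and $(\mu_2)^{s}$, from which derived quantities such as the difference $(\mu_1 - \mu_2)^{s}$ are computed. The posterior distribution of $\mu_1 - \mu_2$ is then based on the samples $(\mu_1 - \mu_2)^{s}$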
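The general idea of deriving the posterior of a difference from per-iteration draws can be sketched as follows. This is not the package's sampler: it assumes a simple independent normal model per group with semi-conjugate priors (normal on the mean, inverse-gamma on the variance), and the function name `gibbs_normal` and all hyperparameter values are illustrative assumptions.

```r
# Gibbs sampler for one normal sample with assumed semi-conjugate priors:
#   mu ~ N(m0, s0sq),  sigma^2 ~ Inverse-Gamma(a0, b0)
gibbs_normal <- function(y, n_iter, m0 = 0, s0sq = 100, a0 = 0.01, b0 = 0.01) {
  n <- length(y)
  mu <- numeric(n_iter)
  sigma2 <- numeric(n_iter)
  sigma2_cur <- var(y)
  for (s in seq_len(n_iter)) {
    # Full conditional of mu given sigma^2 is normal.
    prec <- 1 / s0sq + n / sigma2_cur
    mu_cur <- rnorm(1, (m0 / s0sq + sum(y) / sigma2_cur) / prec, sqrt(1 / prec))
    # Full conditional of sigma^2 given mu is inverse-gamma.
    sigma2_cur <- 1 / rgamma(1, shape = a0 + n / 2,
                             rate = b0 + sum((y - mu_cur)^2) / 2)
    mu[s] <- mu_cur
    sigma2[s] <- sigma2_cur
  }
  list(mu = mu, sigma = sqrt(sigma2))
}

set.seed(42)
data(sleep)
y1 <- sleep$extra[sleep$group == 1]
y2 <- sleep$extra[sleep$group == 2]

n_iter <- 10000
keep <- 5001:n_iter                 # discard the first 5000 draws as burn-in
post1 <- gibbs_normal(y1, n_iter)
post2 <- gibbs_normal(y2, n_iter)

# Per-iteration difference (mu_1 - mu_2)^s; its draws form the posterior sample.
diff_mu <- post1$mu[keep] - post2$mu[keep]
mean(diff_mu)
quantile(diff_mu, probs = c(0.025, 0.975))
```

Because the difference is computed draw by draw, any other function of the parameters can be summarized in the same way without further derivations.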

All packages on CRAN undergo standardized checks to ensure compatibility with the R package system. The provided R package contains tests as well as examples, which were run on Windows and Linux x86_64. Trust in the quantitative output should rest on verification of the open-source code.

Works on all operating systems supporting R.

R (version 3.5.1 or higher)

None.

Riko Kelter

English

The software is written to make its use as easy as possible. Prominent use cases include clinical trials in which the goal is to estimate the effect size of a treatment or new drug, as well as psychological and sociological studies in which two groups are compared and interest lies in the effect size between them. There are also various applications in the experimental natural sciences such as biology, chemistry or physics. The target audience is therefore scientists who aim to compare two groups, whether in medical research, social science or elsewhere. Currently the package assumes that the data in each group are approximately normally distributed, which could be relaxed by implementing more robust versions using, for example, t-distributions. We encourage users to contact the author via email under

The author has no competing interests to declare.