BayesTwin is an open-source R package that serves as a pipeline to the MCMC program JAGS to perform Bayesian inference on genetically informative hierarchical twin data. An item response theory (IRT) measurement model is estimated simultaneously with the biometric model, allowing analysis of the raw phenotypic (item-level) data. Integrating such a measurement model is important because earlier research has shown that an analysis based on an aggregated measure (e.g., a sum-score based analysis) can lead to an underestimation of heritability and the spurious finding of genotype-environment interactions. The package includes all common biometric and IRT models as well as functions that help plot relevant information or determine whether the analysis was performed well.

Statistical analysis of data on twins focuses on determining the relative contributions of nature and nurture to individual differences in a behavioral trait (i.e., phenotype). Twin pairs are either identical (monozygotic, MZ) and share the same genomic sequence or non-identical (dizygotic, DZ) and share on average only half of the segregating genes. When MZ twin pairs are more similar in a phenotype (e.g., depression or educational achievement) than DZ twin pairs, this implies that genetic influences are important.
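The reasoning that greater MZ than DZ similarity points to genetic influences can be illustrated with a small base R simulation. This is only a sketch under stated assumptions: the function `simulate_pairs`, the variance values and the use of Falconer's formula (twice the difference between the MZ and DZ correlations) are illustrative choices and are not part of BayesTwin, which fits a Bayesian model instead.

```r
# Illustrative simulation (not BayesTwin code): phenotypes for twin pairs with
# additive genetic, shared environmental and unique environmental variance.
set.seed(1)

simulate_pairs <- function(n_pairs, r_genetic, var_a, var_c, var_e) {
  # Genetic values of the two twins correlate r_genetic (1 for MZ, 0.5 for DZ)
  a_shared <- rnorm(n_pairs, sd = sqrt(r_genetic * var_a))
  a1 <- a_shared + rnorm(n_pairs, sd = sqrt((1 - r_genetic) * var_a))
  a2 <- a_shared + rnorm(n_pairs, sd = sqrt((1 - r_genetic) * var_a))
  # Shared environment is identical for both twins of a pair
  c_shared <- rnorm(n_pairs, sd = sqrt(var_c))
  # Unique environment is drawn independently for each twin
  cbind(twin1 = a1 + c_shared + rnorm(n_pairs, sd = sqrt(var_e)),
        twin2 = a2 + c_shared + rnorm(n_pairs, sd = sqrt(var_e)))
}

mz <- simulate_pairs(5000, r_genetic = 1.0, var_a = 0.5, var_c = 0.3, var_e = 0.2)
dz <- simulate_pairs(5000, r_genetic = 0.5, var_a = 0.5, var_c = 0.3, var_e = 0.2)

# Within-pair correlations; MZ pairs should be more alike than DZ pairs
r_mz <- cor(mz[, "twin1"], mz[, "twin2"])
r_dz <- cor(dz[, "twin1"], dz[, "twin2"])

# Falconer's rough heritability index: twice the MZ-DZ correlation difference
h2_falconer <- 2 * (r_mz - r_dz)
print(c(r_mz = r_mz, r_dz = r_dz, h2_falconer = h2_falconer))
```

With the variances chosen here, the heritability index should land near the simulated genetic share of 0.5, up to sampling error.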

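Traditionally, the classical twin study uses the ACE model, which decomposes total phenotypic variance (e.g., the total variance in the scores on a depression scale or mathematics test) into variance due to additive genetic influences (A), common (shared) environmental influences (C), and unique environmental influences (E).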
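In the usual textbook notation, the ACE decomposition and the twin covariances it implies can be written as below. The symbols are assumptions chosen for illustration and may differ from the notation used elsewhere in this paper.

```latex
\begin{align}
  \sigma^2_P &= \sigma^2_A + \sigma^2_C + \sigma^2_E
    && \text{(total phenotypic variance)} \\
  h^2 &= \frac{\sigma^2_A}{\sigma^2_P}
    && \text{(narrow-sense heritability)} \\
  \operatorname{cov}_{MZ} &= \sigma^2_A + \sigma^2_C
    && \text{(MZ twins share all additive genetic effects)} \\
  \operatorname{cov}_{DZ} &= \tfrac{1}{2}\sigma^2_A + \sigma^2_C
    && \text{(DZ twins share half on average)}
\end{align}
```

Because the unique environmental component does not contribute to either covariance, the difference between the MZ and DZ covariances isolates half of the additive genetic variance.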

Commonly, when twin data are analysed, the phenotypic data of an individual twin are aggregated into a single score, for example by summing a twin’s responses over all answered items (i.e., the sum score). However, earlier research has shown that using an aggregated score such as the sum score can lead to an underestimation of heritability [

Earlier research has also shown that the incorporation of an item response theory (IRT) measurement model into the genetic analysis can overcome this potential bias [_{ijk}

where the difficulty parameter of item _{k}

The 1PL and 2PL models are suitable for dichotomous data (e.g., scored as correct = 1 and incorrect = 0), such as data collected with ability tests (e.g., mathematical ability). For non-dichotomous data such as ordered categories (e.g., Likert scale data), which are often collected in personality tests, the partial credit model (PCM, with equal discrimination across items) or the generalized partial credit model (GPCM, with item-specific discrimination parameters) can be used [
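For orientation, the 1PL and 2PL models for dichotomous items are commonly written as follows. This is the standard textbook form, not necessarily the exact parameterization used by the package, and the symbols are assumptions: $Y_{ijk}$ is the response of twin $j$ from family $i$ to item $k$, $\theta_{ij}$ the latent trait of that twin, $\beta_k$ the item difficulty, and $\alpha_k$ the item discrimination.

```latex
\begin{align}
  \text{1PL:} \quad
  P(Y_{ijk} = 1) &= \frac{\exp(\theta_{ij} - \beta_k)}
                         {1 + \exp(\theta_{ij} - \beta_k)} \\
  \text{2PL:} \quad
  P(Y_{ijk} = 1) &= \frac{\exp\{\alpha_k(\theta_{ij} - \beta_k)\}}
                         {1 + \exp\{\alpha_k(\theta_{ij} - \beta_k)\}}
\end{align}
```

Setting $\alpha_k = 1$ for every item reduces the 2PL to the 1PL, which mirrors the relation between the GPCM and the PCM for ordered categories.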

Beyond those mentioned above, further advantages of the IRT approach include the flexible handling of missing data and the harmonization of traits measured on different measurement scales. For example, when different twin registers have used different IQ tests that are not comparable in difficulty, IRT can be used to place the item scores on the same measurement scale [

While free and open-source software for analysis of twin data is available [

To take full advantage of the IRT approach, both genetic and IRT model have to be estimated simultaneously [

Although MCMC modelling of twin data is easily carried out using off-the-shelf software packages like JAGS [

To facilitate the use of Bayesian statistics and prevent bias in heritability and/or the spurious finding of G × E interactions by analysing both IRT and genetic model

The main function

A requirement to use the main function

The returned output includes posterior samples, posterior point estimates with standard deviations, and 95% highest posterior density (HPD) intervals for the variance components and, if applicable, the regression parameters. The HPD interval can be seen as the Bayesian counterpart of a confidence interval. All objects returned by the main function are automatically assigned the class “bayestwin”. The S3 method
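To make the notion of a 95% HPD interval concrete, the following base R sketch computes one from a vector of posterior draws as the shortest interval containing 95% of them. The function name `hpd_interval` and the simulated draws are hypothetical and not part of BayesTwin; the package's own computation may differ.

```r
# Illustrative sketch (not BayesTwin code): empirical HPD interval, i.e. the
# shortest interval that contains a given share of the posterior draws.
hpd_interval <- function(samples, prob = 0.95) {
  sorted <- sort(samples)
  n <- length(sorted)
  n_in <- ceiling(prob * n)            # draws each candidate interval must hold
  n_candidates <- n - n_in + 1
  lower <- sorted[seq_len(n_candidates)]
  upper <- sorted[n_in:n]
  best <- which.min(upper - lower)     # pick the narrowest candidate
  c(lower = lower[best], upper = upper[best])
}

# Stand-in for posterior draws of a variance component (right-skewed)
set.seed(42)
draws <- rgamma(10000, shape = 2, rate = 4)

hpd <- hpd_interval(draws)
equal_tailed <- quantile(draws, c(0.025, 0.975))

print(hpd)
print(equal_tailed)
```

For a skewed posterior such as this one, the HPD interval is shifted towards the mode and is narrower than the equal-tailed interval. For MCMC output stored as `mcmc` objects, the coda package provides `HPDinterval()` for the same purpose.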

The function

Example of a trace plot

Example of a G × E interaction plot produced by the function

The code is based on earlier publications [

Since

R version 3.4.0 or higher and JAGS version 4.2.0 or higher.

An Internet connection is required to install the

R packages:

The package was created by Inga Schwabe.

R and JAGS

The BayesTwin package is accompanied by extensive documentation of its functionality. Each function has an R help file, which can be accessed by typing help(…) with the function name inside the brackets. Each help file contains worked examples of real R code that users can paste into the R console and run. Furthermore, code examples are provided on the personal website of the package author. These also include code examples showing how to run the R package on a computer cluster.

Full open access is provided to all source code and full reuse rights via the generous GPL-Clause license. This makes it easy for others to use the code base, also in order to collaborate or ask questions.

A drawback of the method is that it is computationally intensive: an analysis can take several hours to complete. Future research will therefore focus on optimizing the MCMC procedure. Furthermore, individual twins with partly missing covariate data cannot be included in the analysis, which reduces statistical power. Another problem is that covariates cannot always be defined as either a twin-pair covariate or an individual covariate: some covariates are shared within some twin pairs but non-shared within others. In future releases of the package, a new Bayesian method will be applied that incorporates covariates that can be both shared and non-shared, and that are given a prior distribution so that even individuals with partly missing covariate data can be used in the analysis [

The author has no competing interests to declare.