(1) Overview

Introduction

Design of experiments (DOE) comprises various statistical approaches for planning, executing and analysing experiments in a systematic and efficient way. Definitive Screening Design (DSD) [] is a particularly useful DOE technique in fields such as material engineering [, , ], chemistry [, ] and biology [, ], where experiments involve numerous factors. DSD has significant advantages over conventional DOEs. Most conventional DOE techniques are designed for experiments with two levels (e.g., –1/+1) for all factors. If a factor A has a second-order effect (i.e., AA), a two-level design cannot find an optimal condition that may lie between the two levels. Some conventional DOEs, such as the L18 orthogonal array, can handle three levels (e.g., –1/0/+1), but their second-order effects are often correlated with other factor effects, so ambiguity may remain when building models. In contrast, DSD handles three levels (low, middle and high values) of the factors (parameters) while minimizing the correlations between factor effects. These features are useful for building second-order models and finding the optimal factor levels within the parameter windows investigated in the experiments.

However, designing and analyzing DSD experiments typically requires commercial software (e.g., JMP, Design-Expert and Minitab) or programming knowledge (e.g., R or Python), which in many cases is a barrier for researchers who wish to employ DSD in their experiments. Considering this situation, the authors developed DSDApp, a free web application that provides an effortless way to use DSD for researchers who lack access to commercial software packages or statistical programming knowledge.

The user-friendly interface of DSDApp allows users to design DSD experiments, to create second-order models (including main effects, two-factor interactions and quadratic effects) and to perform parameter optimization to obtain desirable objective values. The app is operated entirely by button clicks, which makes DSDApp an easier and more attractive option for employing DSD than existing R packages (e.g., “daewr”) or Python libraries (e.g., “definitive-screening-design”) that require statistical programming experience.

DSDApp is particularly beneficial for researchers in small research or academic organizations, or in companies without access to commercial software. Those without any previous experience with DOE techniques, or with limited experience in the use of DSD, may also benefit from the app because its operation is simpler and more intuitive than that of commercial software offering complicated functionalities for more specialized needs. Overall, DSDApp is expected to extend the use of DSD and broaden experimentation in various research and development fields.

Implementation and architecture

As shown in Figure 1, DSDApp offers three main functionalities: planning definitive screening designs, making regression models, and optimizing parameters. The detailed usage of DSDApp is explained in the following sections.

Figure 1 

Overview of DSDApp.

Planning DSD

In the “Plan” tab, one can create a DSD table with 4 to 12 factors. The process starts by setting the levels (low: –1, middle: 0, high: 1) of the factors, which correspond to the minimum, middle and maximum values of each factor. Figure 2 shows an example of a six-factor (A, B, C, D, E, F) DSD table generated by DSDApp. Each row is an experimental run that users should execute with the given factor levels, collecting the result Y. According to Ref. 9, two or more “fake factors” should be included when constructing a DSD table for better effect-detection performance. Fake factors do not correspond to actual variables (and therefore need not be considered when running the experiments); they act only when building a model, as described in the Appendix.
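For users who prefer scripting, a table of the same type can be generated with the daewr package, on which DSDApp depends. The sketch below is illustrative: the column renaming and the use of E and F as fake factors are conventions of this example, not something daewr enforces.

  # Generate a six-factor DSD table with daewr (a DSDApp dependency).
  library(daewr)
  dsd <- DefScreen(m = 6)                            # 6 three-level factors, 13 runs
  colnames(dsd) <- c("A", "B", "C", "D", "E", "F")   # treat E and F as fake factors
  write.csv(dsd, "DSD6.csv", row.names = FALSE)      # record results in a new Y column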

Figure 2 

Planning of DSD. By clicking “Download” in the bottom left corner, users can download the table shown on the right.

The generated table can be downloaded as a csv file by clicking on “Download.” During the experiment, the actual data should be recorded and added as a new column in the downloaded csv file.

In Figure 2, instead of actual data, the checkbox “Generate the sample data” was ticked to generate sample data given by:

(1)
y = 3 + 2A + 4B – C + 3D – 2AA – 2AB + CC + ε,  ε ~ N(0, σ = 0.3)

This model will be used for testing the app in the section “Quality control”.

Making Models

Before making the models, it is necessary to upload the file containing the DSD table and the experimental result(s). Multiple objective variables can be included as added columns. By clicking on “Browse” (Figure 3), a csv/txt file can be uploaded to DSDApp from your computer.

Figure 3 

Uploading of DSD and experiment result.

The models can be built by following the steps below.

  1. Set the result column as an objective variable “Y”, input variables (A, B, C, D) as “X,” and fake factors (E, F) as “Fake Factor.”
  2. Click on “Find active terms” to obtain the active main factors and second-order terms (Step 1). The detection of main and second-order terms is described in the tab “Step 1”, as shown in Figure 4.
  3. After pressing “Find active terms,” the possible first- and second-order terms automatically appear in the “X1” and “X2” tabs, following the model-building strategy detailed in the Appendix. The user can consider which terms should be included in the model and manually add/remove terms of interest (Step 2).
  4. (Optional) If the terms in “X1” were manually altered, press “Regenerate X2” to regenerate the terms in “X2” by combining the first-order terms selected in “X1.” This step can be skipped if the user is satisfied with the terms that automatically appeared in “X1” and “X2” after pressing “Find active terms”.
  5. Click on “Build model” to build a model with the terms in “X1” and “X2.” The summary of the model is described in the tab “Step 2”, as shown in Figure 5.
Figure 4 

Step 1: Detecting main factors and second-order terms.

Figure 5 

Step 2: Model making based on terms X1 and X2. The model information and the bar graph represent the coefficients of the terms. The red scatter plot displays how well the model can explain the obtained values.

DSDApp employs a model-selection strategy tailored for DSD, in which active main (first-order) factors are detected first and second-order terms related to those main factors are then included []. For example, when main factors A and B are active (or set as X1), the possible second-order terms (X2) are AA, AB and BB. To avoid overfitting, the second-order terms are chosen to minimize the Akaike Information Criterion with finite-sample correction (AICc). See the Appendix for a more detailed explanation.
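The selection idea can be sketched in a few lines of R. This is not DSDApp’s exact algorithm (the Appendix documents that); it merely illustrates AICc-based selection of second-order terms, assuming a data frame dsd holding the design of Figure 2 together with a result column Y, and assuming A–D were detected as active main factors.

  # Illustrative AICc-based choice of second-order terms.
  aicc <- function(fit) {                       # AICc of an lm fit
    k <- attr(logLik(fit), "df")                # coefficients + sigma
    n <- nobs(fit)
    AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
  }
  mains <- c("A", "B", "C", "D")                # assumed active main effects
  cands <- c("I(A^2)", "I(C^2)", "A:B", "A:C")  # candidate second-order terms
  subsets <- c(list(character(0)),
               unlist(lapply(seq_along(cands), function(k)
                 combn(cands, k, simplify = FALSE)), recursive = FALSE))
  fits <- lapply(subsets, function(s)
    lm(reformulate(c(mains, s), response = "Y"), data = dsd))
  best <- fits[[which.min(sapply(fits, aicc))]]  # AICc-minimizing model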

After creating the model, prediction of the output value is possible in the tab “Predict”. The input vector x (i.e., the factor levels) can be set to specified values, as can be seen in Figure 6. The predicted value ŷ(x0) at x0 = [1, A, B, C, D, …] and its prediction interval are calculated as:

(2)
\hat{y}(x_0) \pm t_{\alpha/2,\,n-p}\sqrt{\hat{\sigma}^2\left(1 + x_0^{T}(X^{T}X)^{-1}x_0\right)}

where X is the design matrix of the DSD, σ̂² is the estimated error variance, α is the significance level (0.05), n is the number of runs, p is the number of terms in the model (including the intercept term) and tα/2,n−p is the two-sided t-value at significance level α with n − p degrees of freedom.
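Equation (2) is the standard prediction interval of a linear model, which R’s predict() computes directly. A minimal sketch, reusing the fit best from the sketch above (any lm fit works) at an illustrative input point:

  # 95% prediction interval at a specified input point, cf. Eq. (2).
  x0 <- data.frame(A = 1, B = 0.5, C = -1, D = 0)
  predict(best, newdata = x0, interval = "prediction", level = 0.95)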

Figure 6 

Prediction of output variable at specified input variables.

Optimizing Factor Levels

DSDApp offers the possibility to optimize the parameters, i.e., to find the set of main-factor levels that yields the desired objective variable(s). DSDApp transforms the objective value y into a “desirability function” that evaluates how satisfactory y is on a scale from 0 (“not satisfactory”) to 1 (“completely satisfactory”). Figure 7 illustrates the desirability functions for three different cases: minimization, maximization and tuning y to a specific value. For minimizing and maximizing, the individual desirability function D1 is expressed as:

(3)
D_1 = \left[1 + 99\exp\left\{\frac{2p\left(y - \frac{y_\mathrm{allowable} + y_\mathrm{target}}{2}\right)}{\left|y_\mathrm{allowable} - y_\mathrm{target}\right|}\right\}\right]^{-1}, \qquad p = \begin{cases} 1 & \text{(for minimizing)} \\ -1 & \text{(for maximizing)} \end{cases}

where yallowable refers to the limit value that the user wishes to secure (the maximum allowed y for minimization, or the minimum required y for maximization), and ytarget refers to the ideal value. In the case of minimization, for example, if y > yallowable, D1 becomes smaller than 0.01, as shown in Figure 7(a). Such a value of D1 is small enough for y to be treated as unsatisfactory, and therefore the optimization drives y below yallowable. In the case of maximization, D1 becomes small when y < yallowable, as shown in Figure 7(b), and thus y is driven above yallowable. Note that y does not get much smaller (or larger) than ytarget, because D1 remains nearly flat for y < ytarget in minimization and for y > ytarget in maximization.
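As a concrete illustration, the following is a direct R transcription of Eq. (3) as reconstructed above; the argument names are ours.

  # Individual desirability D1 of Eq. (3); p = 1 to minimize y,
  # p = -1 to maximize it.
  d1 <- function(y, y_allowable, y_target, p = 1) {
    mid <- (y_allowable + y_target) / 2
    1 / (1 + 99 * exp(2 * p * (y - mid) / abs(y_allowable - y_target)))
  }
  d1(6, y_allowable = 5, y_target = 1)  # y above the allowable limit -> below 0.01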

Figure 7 

Desirability functions; (a) minimization, (b) maximization, and (c) tuning (at ytarget = 2) with ylower = 1 and yupper = 3.

Figure 7(c) shows the individual desirability function D2 for tuning y to a certain level ytarget. The function D2 is expressed as:

(4)
D_2 = \begin{cases} \exp\left\{-\dfrac{(y - y_\mathrm{target})^2}{2\left((y_\mathrm{target} - y_\mathrm{lower})/3\right)^2}\right\} & (y \le y_\mathrm{target}) \\ \exp\left\{-\dfrac{(y - y_\mathrm{target})^2}{2\left((y_\mathrm{upper} - y_\mathrm{target})/3\right)^2}\right\} & (y > y_\mathrm{target}) \end{cases}
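An equally direct transcription of Eq. (4) in R (argument names are ours):

  # Individual desirability D2 of Eq. (4) for tuning y to y_target.
  d2 <- function(y, y_target, y_lower, y_upper) {
    s <- ifelse(y <= y_target,
                (y_target - y_lower) / 3,
                (y_upper - y_target) / 3)
    exp(-(y - y_target)^2 / (2 * s^2))
  }
  d2(2, y_target = 2, y_lower = 1, y_upper = 3)  # exactly on target -> 1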

The desirability function is evaluated for various sets of factor levels x (e.g., x = [–0.5, +0.2, 0, …, 1]), and the set x that maximizes the desirability is the optimal parameter set. For the optimization calculation, the limited-memory quasi-Newton algorithm for bound-constrained optimization (L-BFGS-B) is employed (available through the function “optim” in R).
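A minimal sketch of this search, reusing best and d1() from the sketches above; the desirability settings and the single random restart are illustrative, and DSDApp’s exact set-up may differ.

  # Maximize desirability over the coded factor space [-1, 1]^4 with
  # L-BFGS-B; optim() minimizes, so the desirability is negated.
  neg_D <- function(x) {
    y_hat <- predict(best, newdata = data.frame(A = x[1], B = x[2],
                                                C = x[3], D = x[4]))
    -d1(y_hat, y_allowable = 5, y_target = 10, p = -1)  # maximize y
  }
  res <- optim(runif(4, -1, 1), neg_D, method = "L-BFGS-B",
               lower = -1, upper = 1)
  res$par  # candidate optimal levels; rerun from new starting points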

In the case of multiple objective variables, multi-objective optimization is also possible. Instead of maximizing an individual desirability, the total desirability

(5)
D_\mathrm{total} = \left(\prod_{i=1}^{n} D_i\right)^{1/n}

is maximized, where Di is the individual desirability of the i-th of the n objective variables. If no parameter set can meet all the limits, i.e., if some of the desirability functions are zero, then Dtotal is evaluated as 0. In such a situation, slightly relaxing the limits (yallowable) of some objective variables can help obtain an optimization result.
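In R, Eq. (5) is a one-liner; the zero-handling mirrors the behaviour just described.

  # Total desirability of Eq. (5): the geometric mean of the individual
  # desirabilities, evaluated as 0 if any objective is unsatisfied.
  d_total <- function(d) if (any(d == 0)) 0 else prod(d)^(1 / length(d))
  d_total(c(0.8, 0.6))  # two objectives -> about 0.69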

The optimization procedure is as follows.

  1. Click on “Register model” as shown in Figure 5. The registered model shows up in the selector in Figure 8.
  2. Click on the “Set” button to define the purpose (minimize/maximize/target) and the lower or upper limits.
  3. Click on “Maximize desirability” to optimize the total desirability Dtotal. “Maximize desirability” should be clicked several times, because the optimization starts from different initial points and can lead to different optimal conditions.
  4. For multiple output values, it is necessary to register all the models and set their optimization purposes individually.
Figure 8 

Optimization.

Quality control

DSDApp has been validated using a six-factor DSD table and simulated observations. The DSD table includes columns A–D as real factors and E and F as fake factors, as shown in Table 1 (the same table as in Figure 5). We then generated simulated data y using the predefined model (the same equation as in Figure 2).

(1)
y = 3 + 2A + 4B – C + 3D – 2AA – 2AB + CC + ε,  ε ~ N(0, σ = 0.3)

Table 1

Data and simulated result.


NO.   A    B    C    D    E    F        Y
 1    0    1   –1   –1   –1   –1    5.812
 2    0   –1    1    1    1    1    2.055
 3    1    0   –1    1    1   –1    7.749
 4   –1    0    1   –1   –1    1   –3.521
 5   –1   –1    0    1   –1   –1   –3.901
 6    1    1    0   –1    1    1    1.754
 7   –1    1    1    0    1   –1    5.146
 8    1   –1   –1    0   –1    1    3.221
 9    1   –1    1   –1    0   –1   –1.827
10   –1    1   –1    1    0    1    9.908
11    1    1    1    1   –1    0    8.454
12   –1   –1   –1   –1    1    0   –7.883
13    0    0    0    0    0    0    2.814

Users can verify the app’s functionality by checking whether or not the model built from the observations y is similar to the predefined model (1).

The model below was then built from the generated data using DSDApp, as shown in “Model Information” in Figure 5.

(6)
y = 2.886 + 2.018A + 4.014B – 0.917C + 2.904D – 1.968AB – 1.9355AA + 1.106CC

As shown in Figure 9, the coefficients of the built model (6) and the original model (1) are similar, with only slight differences. To confirm that these differences are negligible compared to the noise ε, another set of input points was introduced and the predictive validity of the model was checked.

Figure 9 

Comparison between the true model (1) and the built model (6).

Table 2 shows the confirmation points at the edge of the experimental space, where the maximum prediction error is expected to be observed. Figure 10 shows the correlation between predicted values ŷ based on the model (6) (vertical axis) and the true values y generated by the original function (horizontal axis) for the same input parameters as in Table 2. The blue dots correspond to y and ŷ at the DSD points, and the red dots to the confirmation points. By using the values of y and ŷ, the residual sum of squares for the confirmation points is calculated as 0.156, which is smaller than the original standard error σ = 0.30. Thus, the constructed model (6) fits well with the true model (1). Nevertheless, one should be cautious when several quadratic effects and two-factor interactions are active because these effects in DSD sometimes have relatively strong correlations []. In such a case, several possible models should be evaluated in later experiments to see which model is the most useful and reliable.

Table 2

Confirmation points (A, B, C, D). y is generated by the original function (1), and ŷ is generated by the built model (6).


 A    B    C    D      y          ŷ
 1   –1   –1   –1      0    –0.02245
–1    1   –1   –1      4    3.938109
 1    1   –1   –1      4    3.977689
–1   –1    1   –1    –10    –9.52573
–1    1    1   –1      2    2.237702
 1    1    1   –1      2    2.277282
–1   –1   –1    1     –2    –1.83913
 1   –1   –1    1      6    5.963744
 1    1   –1    1     10    9.963885

Table 3

The validation of DSDApp based on literature data.


LITERATURE | MODELS OR ACTIVE TERMS IN LITERATURE | MODELS OR ACTIVE TERMS IN DSDAPP | REMARKS

[] | y = 56.1 + 8.9B – 9.3C + 21.0G + 16.8H – 9.0 | y = 56.1 + 8.9B – 9.3C + 21.0G + 16.8H – 9.1 | F and I were set as fake factors in DSDApp.

[] | y = 1170 + 10A + 216E* – 316G* – 181H* – 83I – 169AE* – 313HH* + 81AH + 17D + 85DD | y = 1000 + 10.2A + 216E* – 316G* – 180H* – 83I – 153AE* – 214HH* + 97AH – 98EG | B and C were set as fake factors, and A was manually included as a main factor in DSDApp. “*” marks terms reported as significant in the literature.

[] | A, B, C, AB, AA | A, B, C, AB, AA | No fake factors

Figure 10 

Predicted and true values for DSD points (blue) and confirmation points (red) in Table 2. All points align along the black line, showing that the predictions agree well with the original values.

Similar tests to those in this section can be performed with user-defined equations. For example, “test.R” generates CSV files (DSD8-with-Y1.csv, DSD8-with-Y2.csv) containing a DSD table and sample data produced by a hand-made equation. Users can upload the generated CSV files to DSDApp and check whether the built models are similar to the predefined ones.
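The idea behind such a test can be sketched as follows, reusing the dsd table from the first sketch and the predefined model (1); the output file name is illustrative.

  # Simulate observations from Eq. (1) and save them for upload to DSDApp.
  set.seed(1)                                   # reproducible noise
  dsd$Y <- with(dsd, 3 + 2*A + 4*B - C + 3*D - 2*A^2 - 2*A*B + C^2 +
                     rnorm(nrow(dsd), sd = 0.3))
  write.csv(dsd, "DSD6-with-Y.csv", row.names = FALSE)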

(2) Availability

Operating system

Windows (tested on Windows 10)

Programming language

R 3.6.3 (R 4.0)

Additional system requirements

Users can access DSDApp at https://my-first-dsd.shinyapps.io/DSDApp_ver2/ in a web browser.

For local usage, RStudio is needed. Open server.R or ui.R in RStudio, then click on “Run App.”

Dependencies

R packages daewr, shiny, and shinythemes.

List of contributors

P.C. conceptualized the idea and, together with J.H., provided supervision. R.H. developed the app and carried out tests to check its correct operation. All authors were involved in the writing of the original and the revised manuscript.

Software location

Code repository

Name: GitHub

Identifier: https://github.com/long-rh/DSDApp

Licence: MIT License

Date published: 03/12/2022

Language

English

(3) Reuse potential

DSDApp, accessible at https://my-first-dsd.shinyapps.io/DSDApp_ver2/, is designed for researchers and experimenters working in various fields who wish to include DSD in the experimental routine. Its intuitive interface allows users to navigate the application easily through button clicks.

The app is suitable for numerous applications where efficient and systematic experimentation is crucial. For example, in material engineering [, ], factors such as solution concentration, heating temperature, and humidity can be correlated with each other and need to be optimized for superior material properties.

Although specific examples of DSDApp usage have not yet been reported in the literature, its potential applications are evident from the wide range of fields that rely on DSD as part of the experimental process. Table 3 provides three examples that show the potential of DSDApp on data already available in the literature. For each reference, the DSD condition table and results were uploaded to DSDApp, a model was generated in the app and compared with the one in the literature. The first example [] shows that the models in the literature and in the app are almost the same. The models in the second example [] were slightly different, but the terms marked with * (significant in the literature) were successfully identified as active in DSDApp, too. In the final example [], although the coefficients of the model were not provided in ref. [], the same active terms were identified in DSDApp and in the publication. Note that models based on experimental results can be built with different model-selection strategies [], which causes differences in the resulting models. Also, the terms that should be included in a model depend on the knowledge and experience of the researcher in the field. Therefore, the models in the literature and in DSDApp are not always the same. Users should keep in mind that DSDApp employs the specific model-selection strategy explained in detail in the Appendix.

Future development of DSDApp will include the ability to perform mixed-level DSDs with both two- and three-level factors, allowing experimenters to incorporate blocking factors or categorical factors. This enhancement will further expand the app’s applicability for researchers who require more complex experimental designs.

For questions about using the app or inquiries regarding its application to specific research areas, R.H. and P.C. can be contacted via email. They can provide guidance and support to any user who wants to use DSDApp for their experiments.

Additional File

The additional file for this article can be found as follows:

Appendix

Model-selection strategy in DSDApp. DOI: https://doi.org/10.5334/jors.462.s1