This repository contains R code (for reproducibility) along with simulation results for "Varying Coefficient Regression for Group Testing Data". Our model estimates an individual-level regression from group testing data, capturing the age-varying effects of covariates on Chlamydia risk while performing variable selection. To relate available information, we consider
where
- $\delta_{1d}=0\longrightarrow$ insignificant effects.
- $\delta_{1d}=1$
  - $\delta_{2d}=0\longrightarrow$ age-independent effects.
  - $\delta_{2d}=1\longrightarrow$ age-varying effects.

To reproduce the results in the paper, we provide implementation details as follows.
username@login001 ~$ git clone git@github.com:yizenglistat/rvcm4gt.git
username@login001 ~$ cd rvcm4gt

In addition, to protect the privacy of the Iowa SHL group testing data, we provide a simulated (fake) Iowa group testing data set (under /data/simulated_fake_data.csv) for illustration. The code runs successfully on this fake data set, as shown below.
# A demo example that runs 500 repetitions on one machine.
task_id <- 1
nreps <- 500
Ns <- c(3000, 5000)
pool_sizes <- c(5, 10)
model_names <- c("m1", "m2")
testings <- c("AT", "DT", "IT")
N_test <- 600
sigma <- 0.5

- task_id: The machine ID, for example 1,...,100 when running on a cluster. In that setting, each of the 100 nodes independently runs 5 simulations, for a total of 500 repetitions.
- nreps: The number of repetitions.
- Ns: A vector of sample sizes.
- pool_sizes: A vector of pool sizes.
- model_names: A vector of model names. Different model names correspond to different sets of varying functions.
- testings: A vector of testing protocols: AT (array testing), DT (Dorfman testing), or IT (individual testing).
- N_test: The number of knot values used in inference for the estimated varying functions.
- sigma: The true standard deviation of the random effect.
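
The arguments above can be read as defining a grid of simulation settings plus a split of repetitions across machines. The following is a minimal sketch of that idea, not the logic in run.r: the names `grid`, `n_tasks`, `reps_per_task`, and `rep_ids` are hypothetical, and crossing every argument with every other is an assumption (for instance, individual testing does not pool specimens, so the actual script may treat IT differently).

```r
# Hypothetical sketch: `grid`, `n_tasks`, `reps_per_task`, and `rep_ids`
# are illustrative names and are not taken from run.r.
Ns          <- c(3000, 5000)
pool_sizes  <- c(5, 10)
model_names <- c("m1", "m2")
testings    <- c("AT", "DT", "IT")

# Assumption: every combination of the arguments is one configuration.
grid <- expand.grid(
  N = Ns,
  pool_size = pool_sizes,
  model_name = model_names,
  testing = testings,
  stringsAsFactors = FALSE
)
nrow(grid)  # 2 * 2 * 2 * 3 combinations

# Cluster layout from the task_id description: 100 tasks, 5 repetitions each.
n_tasks       <- 100
reps_per_task <- 5
task_id       <- 1

# Repetition indices handled by this task under a contiguous split.
rep_ids <- (task_id - 1) * reps_per_task + seq_len(reps_per_task)
rep_ids
```

With a contiguous split like this, no two tasks share a repetition index, so results written by different nodes can be combined without overlap.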
After setting up the environment (requirement.txt) and arguments, one should be able to run the following code in R to reproduce simulation results in the paper.
# R version 4.4.0 or above
> source('run.r')

After collecting the .RData files under output/, one should be able to reproduce the results. The demo figure and demo table use the following abbreviations:
- IP: the inclusion probability of any significant effect, i.e., $\alpha_d$ or $\beta_d(u)$.
- IPF: the inclusion probability of the age-independent effect, i.e., $\alpha_d$ only.
- IPV: the inclusion probability of the age-varying effect, i.e., $\beta_d(u)$ only.
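
One way to read these three quantities is as proportions of sampled selection indicators. The sketch below is an illustration under stated assumptions, not the computation used in this repository: it assumes that draws of $\delta_{1d}$ and $\delta_{2d}$ are available for a single covariate $d$, that IPF corresponds to $\delta_{1d}=1,\ \delta_{2d}=0$, and that IPV corresponds to $\delta_{1d}=1,\ \delta_{2d}=1$. The vectors `delta1` and `delta2` are simulated placeholders, not output from run.r.

```r
# Hypothetical sketch: `delta1` and `delta2` stand in for sampled indicator
# values of one covariate; they are simulated here, not read from output/.
set.seed(1)
n_draws <- 1000
delta1 <- rbinom(n_draws, size = 1, prob = 0.8)  # 1 = significant effect
delta2 <- rbinom(n_draws, size = 1, prob = 0.3)  # 1 = age-varying effect

# Assumed mapping from indicators to the reported quantities.
IP  <- mean(delta1 == 1)                # any significant effect
IPF <- mean(delta1 == 1 & delta2 == 0)  # age-independent effect only
IPV <- mean(delta1 == 1 & delta2 == 1)  # age-varying effect only

round(c(IP = IP, IPF = IPF, IPV = IPV), 3)
```

Under this assumed mapping the two cases are mutually exclusive, so IPF and IPV sum to IP.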
We include a simulated (fake) group testing data set that closely emulates the structure of the Iowa group testing data under the folder data/fake/. To reproduce the simulated data analysis results in the Supplementary Materials, one can run the following script.
# R version 4.4.0 or above
> source('run_fake.r')
- Yizeng Li
- Dewei Wang - the corresponding author
- Joshua M. Tebbs
This project is licensed under the MIT License - see the License file for details.




