# Simulating Multilevel Data

Hey, I have a question on how to simulate data from
“A novel approach for biomarker selection and the integration of repeated measures experiments from two assays”.

I want to compare using Xw to X for PLSDA filter methods. I want to simulate data similar to the block correlation used for the subject random effect and error term in the paper. However I am using only two treatment groups.

This is my code:

library(MASS)
library(rlist)
library(Matrix)

mujk=list(c(rep(2,40),rep(0,160)),
c(rep(1,40),rep(0,160))) #means separating certain genes for both treatments

rho1=0.8 #correlation for subject random effect
rho2=0.7 #correlation for error
varR=3 #variance of subject random effect
varE=0.5 #variance of error

sigmarandom=list((diag(20) + (rho1varR)(1-diag(20)))-diag(20)+(diag(20)varR),(diag(20) + (rho1varR)*(1-diag(20)))-diag(20)+(diag(20)*varR)) #homogenous covariance matrix rho =0.8 for subject random effect block diagonal

sigmaerror=list((diag(20) + (rho2varE)(1-diag(20)))-diag(20)+(diag(20)varE),(diag(20) + (rho2varE)*(1-diag(20)))-diag(20)+(diag(20)*varE)) #homogenous covariance matrix rho =0.7 for error random effect block diagonal

Rsig=as.matrix(bdiag(rep(sigmarandom,5)))
Esig=as.matrix(bdiag(rep(sigmaerror,5)))

perswitch=c(1,-1) #gives labels to treatments

lmmper=list(2) #treatments
lmmS=list(12) #total subjects to simulate

for(s in 1:12){ #12 is number of subjects
###random effect is changing
Randeff=mvrnorm(n=1,mu=rep(0,200),Sigma=Rsig)# Random effect for subject same for each #observation on subject
for(p in 1:2){ #for each treatment
lmmper[[p]]=c(s,perswitch[p],mujk[[p]]+Randeff+mvrnorm(n=1,mu=rep(0,200),Sigma=Esig))#(sample ID, Y, X)
}

lmmS[[s]]=lmmper
}

lmmX=lapply(1:length(lmmS),function(i){list.rbind(lmmS[[i]])})
lmmX=data.frame(list.rbind(lmmX))

colnames(lmmX)[1:2]<-c(“sample”,“response”)

lmmY=data.frame(lmmX[,1:2])

X=lmmX[,-c(1,2)]
Y=lmmY\$response

I am curious if I am following the logic correctly from the paper. I can simulate the data using a block diagonal right? If so does my approach look correct or am I misunderstanding how to conduct the simulation?I am comparing filter methods with PLSDA on this design and am not seeing any difference in feature selection. Makes me think I am doing something wrong.

I appreciate the help!

Dear @Neystale

Unfortunately I wont be able to (efficiently) help you here. The person who developed the simulated data is Prof Benoit Liquet, you could try to contact him, I dont have access to his code.

Potentially you could reuse the data from vac18.simulated if you are after a two factor design and remove some treatment, but this would be suboptimal. I confirm that the effects should be strong through and you should be able to extract the relevant variables!

Kim-Anh