withinVariation() on part of a dataset

Merry Christmas and Happy New Year!

I am trying to apply the withinVariation() function before fitting my data with sPLS-DA and DIABLO. However, only one-third of the subjects in my dataset have repeat measurements (pre- vs. post-drug treatment).

I compared PCA with vs. without “multilevel = subjects” on these repeat measurements, and the difference was quite obvious. Can I apply withinVariation() only to the subjects with repeat measurements, and then combine the output with the rest of the dataset to fit sPLS-DA?

Thank you!

Hi @Kang,

Yes, this is possible. The code below uses the SRBCT dataset as an example and pretends that the first 20 samples are repeated measurements (10 subjects measured twice).

Note that with the data used below, withinVariation() introduced negative values into the data frame, which had none before the decomposition. This is expected, as the within-subject part of the decomposition is each sample's deviation from its subject's mean. Hence, I would advise caution: this may drastically change the distributions of the repeated samples, while the non-repeated samples keep their original distributions.
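To see why negative values appear, here is a minimal base-R sketch of within-subject centring on toy data. It assumes that, for a single repeated-measures factor, withinVariation() subtracts each subject's mean from that subject's samples; the helper within_centre() and the toy matrix are hypothetical and are not the mixOmics implementation.

```r
# Hypothetical helper (not part of mixOmics): subtract each subject's mean
# from that subject's samples, variable by variable
within_centre <- function(X, subject) {
  X <- as.matrix(X)
  # ave() replaces each value by the mean of its subject; the difference is the deviation
  apply(X, 2, function(col) col - ave(col, subject))
}

set.seed(1)
# toy data: 6 samples (3 subjects x 2 time points), 4 strictly positive variables
X.toy <- matrix(rexp(24, rate = 0.5) + 1, nrow = 6, ncol = 4)
subject <- rep(1:3, each = 2)

X.toy.w <- within_centre(X.toy, subject)

min(X.toy)    # no negative values before centring
min(X.toy.w)  # negative values after centring
# within each subject, the deviations sum to (numerically) zero for every variable
rowsum(X.toy.w, subject)
```

With two samples per subject, the two centred values of each variable are ±(post - pre)/2, so one of them is negative whenever pre and post differ.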

This may have detrimental effects on your model. I would suggest running PCA on both the original data (X below) and the partially decomposed data (X.final below) and comparing the results.
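The comparison suggested above can also be sketched without mixOmics. The example below is a minimal base-R illustration on simulated, strictly positive data (all names and sizes are hypothetical). It mimics the partial decomposition with plain within-subject centring (an assumption about what withinVariation() does, not its actual code) and uses stats::prcomp() in place of mixOmics::pca(). With mixOmics installed, pca() followed by plotIndiv() on X and X.final gives the visual version of the same check.

```r
set.seed(42)
n <- 30; p <- 50
# simulated, strictly positive data: 30 samples x 50 variables (hypothetical)
X <- matrix(rlnorm(n * p, meanlog = 2, sdlog = 0.3), nrow = n, ncol = p)

# pretend samples 1-10 are 5 subjects measured twice (pre/post)
repeated.samples <- 1:10
subject <- rep(1:5, each = 2)

# within-subject centring of the repeated rows only (stand-in for withinVariation())
X.final <- X
X.final[repeated.samples, ] <- apply(X[repeated.samples, ], 2,
                                     function(col) col - ave(col, subject))

# average level of each block after the partial decomposition
block <- ifelse(seq_len(n) %in% repeated.samples, "repeated", "non-repeated")
block.means <- tapply(rowMeans(X.final), block, mean)
print(block.means)  # repeated block sits at ~0, non-repeated block keeps its original level

# PCA on both versions; prcomp() used here in place of mixOmics::pca()
pca.orig  <- prcomp(X, center = TRUE, scale. = TRUE)
pca.final <- prcomp(X.final, center = TRUE, scale. = TRUE)

# share of total variance captured by PC1 in each version
pc1.share <- function(pca) pca$sdev[1]^2 / sum(pca$sdev^2)
print(c(original = pc1.share(pca.orig), partial = pc1.share(pca.final)))
```

If PC1 of the partially decomposed data mainly separates repeated from non-repeated samples, the decomposition itself (rather than the biology) is driving the structure.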

Hope this helps.

Cheers,
Max.

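```r
library(mixOmics)
data(srbct)
X <- srbct$gene
Y <- srbct$class

# pretend that the first 20 samples of this dataset come from a repeated design:
# 10 subjects, each measured twice
repeated.samples <- 1:20              # row indices of the "repeated" samples
subject <- rep(1:10, each = 2)        # subject ID of each repeated sample: 1,1,2,2,...,10,10
# multilevel design: the first column holds the subject IDs
design <- data.frame(sample = subject)

# subset X to the repeated samples only
X.r <- X[repeated.samples, ]

# within-subject decomposition of the repeated samples
X.r.w <- withinVariation(X = X.r, design = design)

# write the decomposed rows back into their original positions so X.final stays aligned with Y
# (rbind(X.r.w, X[-repeated.samples, ]) is only correct when the repeated samples are the first rows)
X.final <- X
X.final[repeated.samples, ] <- X.r.w

# fit the sPLS-DA model (2 components, 15 variables selected on each)
model <- splsda(X.final, Y, ncomp = 2, keepX = c(15, 15))
```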
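In a real dataset the repeated samples are unlikely to be the first rows, so hard-coding 1:20 is fragile. A minimal base-R sketch for deriving the indices and the design from a per-sample subject ID vector, assuming such a vector exists (subject.id and its values below are hypothetical). The IDs are converted to integers to match the numeric design used in the example above.

```r
# hypothetical subject IDs, one per sample; only some subjects were measured twice
subject.id <- c("S01", "S02", "S02", "S03", "S04", "S04", "S05", "S06", "S06")

# subjects that appear more than once are the ones with repeat measurements
counts <- table(subject.id)
repeated.subjects <- names(counts)[counts > 1]

# row indices of their samples
repeated.samples <- which(subject.id %in% repeated.subjects)

# matching design data frame for withinVariation(): integer subject IDs
design <- data.frame(sample = as.integer(factor(subject.id[repeated.samples])))

repeated.samples  # 2 3 5 6 8 9
design$sample     # 1 1 2 2 3 3
```

With non-contiguous indices like these, the decomposed rows must be written back to their original positions so that the final data stay aligned with Y.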