withinVariation() on part of dataset

Hi @Kang,

Yes, this is possible. The code below uses the SRBCT dataset as an example and pretends that the first 20 samples are repeated measurements (10 individuals, two samples each).

Note that with the data used below, withinVariation() introduced negative values into the data, which had none before the decomposition. This is to be expected, as the within-subject part is what remains once each individual's mean has been removed. So I would advise you to be careful: doing this may drastically change the distribution of the repeated samples, while the non-repeated samples keep their original distribution.

This may have detrimental effects on your model. I would suggest running PCA on the original data (X below) and on the partially decomposed data (X.final below) and assessing the differences.
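To see where the negative values come from, here is a minimal base R sketch on made-up toy data. It assumes a single-factor repeated design, where the within part of a sample is its value minus the mean of its individual. It is an illustration of the idea, not the code of withinVariation() itself, and the names X.toy, indiv, X.between and X.within are only for this example.

```r
# toy data: 2 individuals, each measured twice, 2 variables, all values positive
X.toy <- matrix(c(1, 3, 10, 14,
                  5, 7, 2, 6),
                nrow = 4,
                dimnames = list(paste0("s", 1:4), c("v1", "v2")))
indiv <- c(1, 1, 2, 2)

# mean of each variable per individual, expanded back to one row per sample
X.between <- apply(X.toy, 2, function(v) ave(v, indiv))

# within-individual part: what is left after removing the individual's mean
X.within <- X.toy - X.between
X.within
```

Every entry of X.toy is positive, yet half of X.within is negative, because the two samples of an individual sit on either side of that individual's mean.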

Hope this helped.

Cheers,
Max.

library(mixOmics)
data(srbct) 
X <- srbct$gene
Y <- srbct$class

# pretend that the first 20 samples from this dataset are of repeated design
# indices of "repeated samples"
repeated.samples <- 1:20 
# individual ID of each repeated sample (10 individuals, 2 samples each)
sample <- c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10)
# set multilevel design
design <- data.frame(sample = sample)

# subset X dataframe for only repeated samples
X.r <- X[repeated.samples, ] 

# decompose
X.r.w <- withinVariation(X = X.r, design = design) 

# combine decomposed repeated samples with the remaining, non-repeated samples
# (the repeated samples are rows 1:20, so the row order of X.final still matches Y)
X.final <- rbind(X.r.w, X[-repeated.samples, ])

# form sPLS-DA model
model <- splsda(X.final, Y, keepX = c(15,15))
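
The PCA check suggested above could look roughly like the sketch below. It rebuilds the same pretend design so that it runs on its own. Two things differ from the code above and are my own choices for illustration: the decomposed rows are written back in place rather than combined with rbind(), which keeps the row order aligned with Y even when the repeated samples are not the first rows, and the helper names status, pca.orig and pca.final are made up for this example.

```r
library(mixOmics)
data(srbct)
X <- srbct$gene
Y <- srbct$class

# same pretend design as above: rows 1:20 are 10 individuals measured twice
repeated.samples <- 1:20
design <- data.frame(sample = rep(1:10, each = 2))

# replace only the repeated rows by their within-subject part;
# all other rows, and the row order, are left unchanged
X.final <- X
X.final[repeated.samples, ] <- withinVariation(X = X[repeated.samples, ],
                                               design = design)

# value ranges before and after, to see how much the repeated rows moved
range(X[repeated.samples, ])
range(X.final[repeated.samples, ])
range(X.final[-repeated.samples, ])

# PCA on the original and on the partially decomposed data
pca.orig  <- pca(X, ncomp = 2)
pca.final <- pca(X.final, ncomp = 2)

# colour samples by whether they were decomposed or not
status <- factor(ifelse(seq_len(nrow(X)) %in% repeated.samples,
                        "decomposed", "untouched"))
plotIndiv(pca.orig,  group = status, legend = TRUE, title = "PCA: original X")
plotIndiv(pca.final, group = status, legend = TRUE, title = "PCA: X.final")
```

If the decomposed samples form their own cluster in the second plot, well away from the untouched samples, the decomposition itself is driving the main source of variation and the sPLS-DA model is likely to pick that up rather than the class differences.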