Tuning sPLS-DA and sPCA with low sample size

Dear all,
I’m trying to apply sPCA to my proteomics and metabolomics data to reduce the initial data frames to key features only. I want to select the number of components that retains at least 80% of the explained variance, and then use the spca function to extract the key features on the chosen components.

I’ve used the following lines to choose the optimal ncomp:

explainedVariance <- tune.pca(data_scaled, ncomp = 5, center = FALSE, scale = FALSE)
plot(explainedVariance)
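
As a side note, the 80% cutoff can also be read off numerically rather than from the plot. The following is only a minimal sketch in base R: it uses prcomp as a stand-in for tune.pca, and toy is a simulated, hypothetical placeholder for data_scaled.

```r
# Simulated stand-in for data_scaled: 6 samples x 50 features (hypothetical data)
set.seed(1)
toy <- scale(matrix(rnorm(6 * 50), nrow = 6))

# Base-R PCA; the data are already centred and scaled above
pca <- prcomp(toy, center = FALSE, scale. = FALSE)

# Cumulative proportion of explained variance per component
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components reaching at least 80%
ncomp_80 <- which(cum_var >= 0.80)[1]
ncomp_80
```

With only 6 samples, centred data have at most 5 informative components, so the answer is always between 1 and 5.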

The issue occurs when I’m performing the following steps:
set.seed(1306)
test.keepX <- c(20, 20) # (given ncomp = 2)

tune.spca.res <- tune.spca(data_scaled, ncomp = 2,
                           nrepeat = 5,
                           folds = 3,
                           test.keepX = test.keepX)

Given the very low number of samples (6 in total), the number of folds for tuning is always too high, so the function does not run and I’m not able to obtain choice.keepX.

How can I overcome this problem?
My aim is still to reduce the initial dataframe to the key features…

Thanks a lot for your always kind help.
Chiara

Hi @Chiara.Anser,

The steps you shared make sense for tuning the sPCA model to identify the key features on your chosen components. As you correctly pointed out, you cannot perform proper cross-validation with 3 folds when you have only 6 samples. For low sample numbers (<10), we recommend leave-one-out cross-validation rather than M-fold cross-validation. You can do this by setting validation = "loo" when running tune.spca. Please see our webpage on cross-validation in mixOmics for more details.
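
To illustrate why leave-one-out is the better fit for six samples, here is a small base-R sketch of the two partitioning schemes. It does not call mixOmics, and all variable names are made up for illustration.

```r
n <- 6                      # number of samples, as in the question
samples <- seq_len(n)

# M-fold CV with 3 folds: random assignment of samples to folds of size 2
set.seed(1306)
mfold_id <- sample(rep(1:3, length.out = n))
mfold_test <- split(samples, mfold_id)

# Leave-one-out CV: each sample is its own test set
loo_test <- as.list(samples)

# Training-set size in each scheme
mfold_train_size <- n - lengths(mfold_test)   # 4 samples per training set, 3 folds
loo_train_size   <- n - lengths(loo_test)     # 5 samples per training set, 6 folds
```

With 3 folds, each model is fitted on only 4 samples. With leave-one-out, it is fitted on 5, and every sample is used as a test set exactly once.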

Hope that helps!
Eva