Tuning sPLS-DA and sPCA with low sample size

Dear all,
I’m trying to apply sPCA to my proteomics and metabolomics data to reduce the initial data frames to key features only. I want to select the number of components needed to retain at least 80% of the explained variance, and then use the spca() function to extract the key features on the chosen components.

I’ve used the following lines to choose the optimal ncomp:

explainedVariance <- tune.pca(data_scaled, ncomp = 5, center = FALSE, scale = FALSE)
plot(explainedVariance)
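The "keep at least 80% of explained variance" rule can be sketched in base R with prcomp(), so it runs without mixOmics. This is not what tune.pca() does internally. The helper choose_ncomp() and the simulated 6 x 50 matrix are hypothetical and only illustrate the idea. Note that once the data are centered, 6 samples give at most 5 components with non-zero variance.

```r
# Hypothetical helper (base R, not mixOmics): smallest number of components
# whose cumulative explained variance reaches `threshold`.
choose_ncomp <- function(X, threshold = 0.8, center = TRUE, scale. = FALSE) {
  pc <- prcomp(X, center = center, scale. = scale.)
  prop_var <- pc$sdev^2 / sum(pc$sdev^2)  # proportion of variance per component
  cum_var <- cumsum(prop_var)             # cumulative explained variance
  list(ncomp = which(cum_var >= threshold)[1], cum_var = cum_var)
}

# Simulated data for illustration only: 6 samples x 50 features
set.seed(1306)
X <- matrix(rnorm(6 * 50), nrow = 6)
res <- choose_ncomp(X, threshold = 0.8)
res$ncomp
round(res$cum_var, 3)
```

If the data have already been centered and scaled, as data_scaled is above, pass center = FALSE to mirror the original call.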

The issue occurs when I’m performing the following steps:
set.seed(1306)
test.keepX <- c(20, 20)  # (given ncomp = 2)

tune.spca.res <- tune.spca(data_scaled, ncomp = 2,
                           nrepeat = 5,
                           folds = 3,
                           test.keepX = test.keepX)

Given the very low number of samples (6 in total), the number of folds used for tuning is always too high, so the function does not run and I cannot obtain choice.keepX.

How can I overcome this problem?
My aim is still to reduce the initial data frame to the key features.

Thanks a lot, as always, for your kind help.
Chiara

Hi @Chiara.Anser,

The steps you have shared make sense for using sPCA model tuning to identify the key features on your chosen components. As you correctly pointed out, you cannot perform proper cross-validation with 3 folds when you have only 6 samples. For low sample numbers (<10) we recommend leave-one-out cross-validation rather than M-fold cross-validation. You can do this by setting validation = "loo" when running tune.spca; please see our webpage on cross-validation in mixOmics for more details.
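To make the difference between the two schemes concrete, here is a base R sketch of how 6 samples are split under 3-fold and leave-one-out cross-validation. The helper make_folds() is hypothetical and does not reproduce how any mixOmics function assigns folds internally. With 3 folds, each model is trained on only 4 samples and tested on 2. With LOO, each model is trained on 5 samples and tested on the remaining one.

```r
# Hypothetical helper: assign each of n samples to one of `folds` CV folds.
# Setting folds = n gives leave-one-out (one held-out sample per fold).
make_folds <- function(n, folds) {
  stopifnot(folds >= 2, folds <= n)
  sample(rep_len(seq_len(folds), n))  # balanced labels, randomly permuted
}

set.seed(1306)
n <- 6
mfold <- make_folds(n, folds = 3)  # 3-fold: 2 test / 4 training samples per fold
loo   <- make_folds(n, folds = n)  # LOO: 1 test / 5 training samples per fold
table(mfold)
table(loo)
```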

Hope that helps!
Eva

Hi Eva, thanks for your answer. I’ve tried setting the "loo" option in tune.spca, but it does not work: R reports that the argument validation = loo (or validation = "loo") is unused.
How can I solve this issue?

Thanks
Chiara

Hi @Chiara.Anser,

Sorry for the confusion, you’re right: we currently don’t have LOO functionality for tune.spca() (we do for tuning other models, hence my mistake). Given that you only have 6 samples, I would extract the loadings of your variables rather than running sparse tuning to identify the key features. You can do this using $loadings on your PCA or sPCA object, and you can also visualise them with the plotLoadings() function.
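As a rough base R analogue of inspecting $loadings, the sketch below ranks features by the absolute value of their loading on one component from prcomp(). Here pc$rotation stands in for the loadings of a mixOmics PCA object. The helper top_loadings() and the simulated data are hypothetical and for illustration only. In mixOmics itself, plotLoadings() shows the same information graphically.

```r
# Hypothetical helper (base R analogue of inspecting PCA $loadings):
# the top_n features with the largest absolute loading on component `comp`.
top_loadings <- function(X, comp = 1, top_n = 10, center = TRUE, scale. = FALSE) {
  pc <- prcomp(X, center = center, scale. = scale.)
  load <- pc$rotation[, comp]                 # loading of each feature on comp
  ord <- order(abs(load), decreasing = TRUE)  # rank by absolute contribution
  head(load[ord], top_n)                      # keep sign to show direction
}

# Simulated data for illustration only: 6 samples x 50 named features
set.seed(1306)
X <- matrix(rnorm(6 * 50), nrow = 6,
            dimnames = list(paste0("sample", 1:6), paste0("feat", 1:50)))
top1 <- top_loadings(X, comp = 1, top_n = 5)
top1
```

Keeping the sign, rather than only the absolute value, shows which features pull samples towards the positive or negative side of the component.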

Cheers,
Eva