Tuning sPLS-DA and sPCA with low sample size

Chiara.Anser · April 17, 2025, 3:45pm

Dear all,
I’m trying to apply sPCA to my proteomics and metabolomics data to reduce the initial data frames to key features only. I want to select the number of components that retains at least 80% of the explained variance, and then use the spca function to extract the key features on the chosen components.

I’ve used the following lines to choose the optimal ncomp:

explainedVariance <- tune.pca(data_scaled, ncomp = 5, center = FALSE, scale = FALSE)
plot(explainedVariance)
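
A minimal sketch of the 80% rule described above, for readers who want to see the arithmetic. It uses base R's prcomp on simulated data as a stand-in for tune.pca and data_scaled (both the simulated matrix and the variable names are assumptions for illustration, not the original data). Note that with 6 centred samples, at most 5 components can carry non-zero variance.

```r
set.seed(1306)

# Hypothetical stand-in for data_scaled: 6 samples x 50 features, centred and scaled
X <- scale(matrix(rnorm(6 * 50), nrow = 6))

# Data are already scaled, so no further centring or scaling here
pca_fit <- prcomp(X, center = FALSE, scale. = FALSE)

# Proportion of variance per component and its running total
prop_var <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)
cum_var <- cumsum(prop_var)

# Smallest number of components whose cumulative variance reaches 80%
ncomp <- which(cum_var >= 0.80)[1]
print(round(cum_var, 3))
print(ncomp)
```

The same cut-off can be read off the tune.pca plot by eye; the sketch only makes the threshold explicit.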

The issue occurs when I’m performing the following steps:
set.seed(1306)
test.keepX <- c(20, 20) # (given ncomp = 2)

tune.spca.res <- tune.spca(data_scaled, ncomp = 2,
                           nrepeat = 5,
                           folds = 3,
                           test.keepX = test.keepX)

Given the very low number of samples (6 in total), the number of folds for tuning is always too high, so the function does not run and I cannot obtain choice.keepX.

How can I overcome this problem?
My aim is still to reduce the initial dataframe to the key features…

Thanks a lot for your always kind help.
Chiara

evahamrud · April 28, 2025, 3:59am

Hi @Chiara.Anser,

The steps you have shared make sense for using sPCA model tuning to identify the key features on your chosen components. As you correctly pointed out, you cannot perform proper cross-validation with 3 folds when you have only 6 samples. For low sample numbers (<10), we recommend leave-one-out cross-validation rather than M-fold cross-validation. You can do this by setting validation = "loo" when running tune.spca; please see our webpage on cross-validation in mixOmics for more details.
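
To illustrate why the fold count matters with so few samples, here is a rough base R sketch of the two partitioning schemes. It is generic and does not call any mixOmics function; the fold assignment shown is a simple deterministic one chosen for illustration, not how any particular package draws its folds.

```r
n <- 6  # number of samples

# 3-fold CV: each fold holds out 2 samples, leaving only 4 for training
mfold <- split(seq_len(n), rep(1:3, length.out = n))
train_sizes_mfold <- n - lengths(mfold)

# Leave-one-out: each of the n folds holds out 1 sample, leaving 5 for training
loo <- as.list(seq_len(n))
train_sizes_loo <- n - lengths(loo)

print(train_sizes_mfold)
print(train_sizes_loo)
```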

Hope that helps!
Eva

Chiara.Anser · April 30, 2025, 2:22pm

Hi Eva, thanks for your answer. I’ve tried setting the “loo” option in tune.spca, but it is not working: it reports that the argument validation = loo (or “loo”) is not used.
How can I solve this issue?

Thanks
Chiara

evahamrud · May 9, 2025, 3:29am

Hi @Chiara.Anser,

Sorry for the confusion. You’re right: we currently don’t have LOO functionality for tune.spca() (we do for tuning other models, hence my mistake). In this case, given that you only have 6 samples, I would extract the loadings of your variables rather than running sparse tuning to identify the key features. You can do this using $loadings on your PCA or sPCA object, and you can visualise them with the plotLoadings() function.
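
A minimal sketch of ranking features by their loadings, assuming simulated data and base R's prcomp as a stand-in so that it runs without mixOmics. The matrix, the feature names, and the choice of the top 10 are all hypothetical. With a mixOmics pca or spca object, the corresponding values would come from $loadings instead of $rotation.

```r
set.seed(1306)

# Hypothetical stand-in data: 6 samples x 50 named features
X <- matrix(rnorm(6 * 50), nrow = 6,
            dimnames = list(paste0("sample", 1:6), paste0("feature", 1:50)))
X <- scale(X)

pca_fit <- prcomp(X, center = FALSE, scale. = FALSE)

# Loadings of every feature on the first component
# (prcomp stores them in $rotation; a mixOmics object uses $loadings)
loadings_pc1 <- pca_fit$rotation[, 1]

# Rank by absolute loading and keep the 10 largest contributors
top_features <- head(sort(abs(loadings_pc1), decreasing = TRUE), 10)
print(top_features)
```

Ranking on the absolute value treats strongly negative and strongly positive loadings as equally important. How many features to keep is a judgement call here, since no cross-validated keepX is available.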

Cheers,
Eva
