error while trying to choose the optimum number of components

Dear All,

I am using mixOmics and facing trouble with sPLSDA tuning.

The error message is:
Error: Unexpected error while trying to choose the optimum number of components. Please check the inputs and if problem persists submit an issue to Issues · mixOmicsTeam/mixOmics · GitHub

Here is the code I use:
#set up the data as X expression matrix and Y as factor
Milk Cheese Yogurt
12 12 12

#PLSDA analysis
data.plsda <- mixOmics::splsda(X, Y, ncomp = 10)
mixOmics::plotIndiv(data.plsda,
                    comp = c(1, 2),
                    group = Y,
                    ind.names = FALSE,
                    ellipse = TRUE,
                    legend = TRUE,
                    title = 'PLSDA results')

#undergo performance evaluation in order to tune the number of components to use
perf.plsda <- mixOmics::perf(data.plsda, validation = "Mfold", folds = 3, nrepeat = 10,
                             progressBar = TRUE, auc = TRUE)

#plot the outcome of performance evaluation across all ten components
plot(perf.plsda, col = color.mixo(5:7), sd = TRUE,
     legend.position = "horizontal")

        max.dist centroids.dist mahalanobis.dist
overall        7              3                5
BER            7              3                5

#grid of possible keepX values that will be tested for each component
list.keepX <- c(1:10)

#undergo the tuning process to determine the optimal number of variables
tune.splsda <- mixOmics::tune.splsda(X, Y, ncomp = 5,
                                     validation = 'Mfold',
                                     folds = 3, nrepeat = 10, # use repeated cross-validation
                                     dist = 'max.dist', # use max.dist measure
                                     measure = "BER", # use balanced error rate of dist measure
                                     test.keepX = list.keepX)

Many thanks for your valuable help.


R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

[1] LC_COLLATE=French_France.utf8 LC_CTYPE=French_France.utf8 LC_MONETARY=French_France.utf8 LC_NUMERIC=C
[5] LC_TIME=French_France.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] mixOmics_6.23.4 ggplot2_3.4.2 lattice_0.20-45 MASS_7.3-60 ropls_1.28.2

Hi @emeugnier,

There might be a few reasons (assuming the problem comes from tune.splsda()):
1 - ncomp = 5 is too large, even if this is the 'recommended' value. You have 3 groups, so we would expect at most 4 components. The error may be arising at component 5.

2 - When we define the optimal number of variables to select, we run t-tests to assess whether the decrease in error rate is significant. Something could be going wrong there (e.g. the error rate is, for some reason, exactly the same between keepX = 1 and keepX = 2, so the variance is 0). Could you spread out your grid a bit? Selecting a single marker seems a bit restrictive to me (or not very insightful, depending on whether you are after biomarker selection or a larger selection for GO analysis down the track). Try list.keepX <- c(5:10, seq(15, 50, 5))
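To illustrate the zero-variance situation described in point 2, here is a minimal base R sketch. The error rates are made up for illustration (they are not from your data), and the names err_keepX1, err_keepX2 and err_varying are hypothetical. It only shows how base R's t.test() behaves on constant inputs; it does not establish what mixOmics does internally with such a result.

```r
# Hypothetical error rates over 10 repeats, identical for two keepX values
err_keepX1 <- rep(0.25, 10)
err_keepX2 <- rep(0.25, 10)

# Both groups have zero variance, so t.test() stops with an error;
# tryCatch() captures the error message instead of aborting the script
constant_case <- tryCatch(
  t.test(err_keepX1, err_keepX2, alternative = "greater"),
  error = function(e) conditionMessage(e)
)
print(constant_case)

# With some variation between repeats the same test runs normally
err_varying <- c(0.25, 0.30, 0.20, 0.25, 0.35, 0.30, 0.25, 0.20, 0.30, 0.25)
varying_case <- t.test(err_varying, err_keepX2, alternative = "greater")
print(varying_case$p.value)
```

If the error rates you get across repeats look like the first case, a wider keepX grid is more likely to produce error rates that actually differ between grid values.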

(I don't think this comes from a bug in the code; more likely it is due to a specific characteristic of your data.)
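Putting the two suggestions together, the tuning call could be sketched as below. This is only a sketch under assumptions: the wrapper name retune_splsda is hypothetical, ncomp = 4 is simply the upper bound mentioned in point 1, and the grid is capped at ncol(X) in case your matrix has fewer than 50 variables. The other arguments are copied from your original call.

```r
# Wider grid suggested above: 5 to 10 in steps of 1, then 15 to 50 in steps of 5
list.keepX <- c(5:10, seq(15, 50, 5))

# Hypothetical helper (not part of mixOmics): reruns the tuning with fewer
# components and the wider grid; X and Y are your own matrix and factor
retune_splsda <- function(X, Y, ncomp = 4, grid = list.keepX) {
  # drop grid values larger than the number of variables available in X
  grid <- grid[grid <= ncol(X)]
  mixOmics::tune.splsda(X, Y, ncomp = ncomp,
                        validation = "Mfold",
                        folds = 3, nrepeat = 10,
                        dist = "max.dist",
                        measure = "BER",
                        test.keepX = grid)
}

print(list.keepX)
```

You would then run tune.res <- retune_splsda(X, Y) on your own data and, if the error persists, lower ncomp further to see at which component it appears.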