Hello
I am following the sPLSDA:SRBCT tutorial with my own dataset. I ran a preliminary PLS-DA with 10 components to look at the error rate and then use the optimal ncomp for the sPLS-DA. Because my sample size is small (47 samples), I prefer to use leave-one-out cross-validation (LOO-CV). However, the output of choice.ncomp contains only NAs:
coda_plsda <- plsda(codapls.dat, codapls.fac, ncomp = 10, mode = "regression")
coda_plsda_tune <- perf(coda_plsda, validation = "loo")
coda_plsda_tune$choice.ncomp
        max.dist centroids.dist mahalanobis.dist
overall       NA             NA               NA
BER           NA             NA               NA
Is this normal? If so, should I use plot(coda_plsda_tune) instead, or is there another way to select the optimal ncomp?
For the method to assess whether adding a component gives a significant improvement in performance (i.e., a reduction in error rate), the cross-validation has to be repeated at least 3 times. I believe that if you set nrepeat to 3 or more, you should get the optimal number of components.
Please do let us know if you keep having issues.
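Repeating cross-validation only adds information when the fold assignment is random. The base R sketch below illustrates why: with leave-one-out every sample is always its own fold, so a "repeat" reproduces exactly the same splits, whereas M-fold shuffles samples before splitting. `make_folds()` is a hypothetical helper written for this illustration; it is not mixOmics's internal fold-assignment code.

```r
# Hypothetical helper: assign n samples to cross-validation folds.
# folds = NULL -> leave-one-out: sample i is always fold i (deterministic).
# folds = k    -> M-fold: fold labels 1..k are shuffled (random each call).
make_folds <- function(n, folds = NULL) {
  if (is.null(folds)) return(seq_len(n))
  sample(rep(seq_len(folds), length.out = n))
}

set.seed(1)
n <- 47
# Two "repeats" of LOO always produce identical splits
identical(make_folds(n), make_folds(n))  # TRUE

# Two repeats of 5-fold CV generally produce different splits,
# so their error rates vary and can be compared statistically
mf_a <- make_folds(n, folds = 5)
mf_b <- make_folds(n, folds = 5)
table(mf_a)  # fold sizes: 10, 10, 9, 9, 9
```

This is why a significance test across repeats is meaningful for repeated M-fold CV but has nothing to work with under LOO.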
Thanks for the answer.
The problem is that the method seems to ignore the nrepeat parameter when validation is set to 'loo', as this warning shows:
coda_plsda_tune <- perf(coda_plsda, validation = "loo", nrepeat = 3)
Warning message:
In perf.mixo_plsda(coda_plsda, validation = "loo", nrepeat = 3) :
Leave-One-Out validation does not need to be repeated: 'nrepeat' is set to '1'.
coda_plsda_tune$choice.ncomp
        max.dist centroids.dist mahalanobis.dist
overall       NA             NA               NA
BER           NA             NA               NA
You're right. I hadn't noticed you were using leave-one-out cross-validation. Because LOO can, by definition, only be run once, it is not possible to test whether there is a significant improvement. As you mentioned, you can look at the perf plot and decide on the number of components.
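As a complement to reading the plot, the per-component error rates can also be checked numerically. This sketch assumes that `coda_plsda_tune$error.rate$BER` is a matrix with one row per component and one column per distance (as in recent mixOmics versions). `pick_ncomp()` and its `tol` argument are hypothetical, not part of mixOmics, and with a single LOO run this is only a heuristic (smallest ncomp within `tol` of the lowest error), not a statistical test.

```r
# Hypothetical helper: pick ncomp from one error curve.
# err: numeric error rates, one per component (e.g. one column of error.rate$BER)
# tol: assumed parsimony tolerance; the smallest ncomp whose error is within
#      tol of the minimum is returned. NA entries are ignored.
pick_ncomp <- function(err, tol = 0) {
  stopifnot(is.numeric(err), length(err) >= 1, tol >= 0)
  best <- min(err, na.rm = TRUE)
  # which() returns positions in component order, so [1] is the smallest ncomp
  which(err <= best + tol)[1]
}

# Toy error curve for illustration only (not real perf() output)
toy_ber <- c(comp1 = 0.40, comp2 = 0.25, comp3 = 0.22, comp4 = 0.23)
pick_ncomp(toy_ber)              # returns 3 (named "comp3")
pick_ncomp(toy_ber, tol = 0.05)  # returns 2 (named "comp2")
```

On the real object, something like `apply(coda_plsda_tune$error.rate$BER, 2, pick_ncomp, tol = 0.02)` would give one suggestion per distance; the tolerance is a judgment call and should be weighed against what the perf plot shows.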