I'm using the plsda function to do binary classification on a metabolomics dataset with 718 features and a perfectly balanced design of n = 25 samples per group.
    plsda.res <- plsda(data.sel, response, ncomp = 10)
    perf.plsda <- perf(plsda.res, validation = "Mfold", folds = 5,
                       progressBar = FALSE, auc = TRUE, nrepeat = 500)
When I look at the samples with plotIndiv, there is pretty good separation between the groups along components 1 and 2:
However, the error rates stored in the perf.plsda object are much higher than I would expect from the component plot: somewhere in the range of 40-46% error!
Is this behavior reasonable? I assumed that if you can draw a line separating the groups on a projection plot, the actual classification performance should be roughly as good. Or am I doing something wrong?
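For what it's worth, here is a minimal pure-Python sketch of the situation I'm worried about, on pure noise rather than my real data. It does not use mixOmics: it hand-computes a first PLS-style discriminant direction as X^T y (with X column-centered and y centered) and classifies by nearest class-mean score, which is an illustrative stand-in for what perf() does, not its actual internals. The feature count is reduced from 718 to 300 purely for speed. The point it illustrates: with far more features than samples, the training projection can separate the groups cleanly even when cross-validated error is at chance.

```python
import random

random.seed(0)

n_per, p = 25, 300          # 25 samples per group as in my data; p shrunk from 718 for speed
n = 2 * n_per
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [-1.0] * n_per + [1.0] * n_per   # pure noise: labels are unrelated to X

def pls1_direction(X, y):
    """First PLS weight vector: proportional to X^T y after column-centering X
    and centering y (the direction component 1 projects onto)."""
    n, p = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    w = [sum((X[i][j] - mu[j]) * (y[i] - ybar) for i in range(n)) for j in range(p)]
    return w, mu

def project(x, w, mu):
    return sum((xj - mj) * wj for xj, mj, wj in zip(x, mu, w))

# Resubstitution: the training projection separates the (noise) groups cleanly.
w, mu = pls1_direction(X, y)
scores = [project(x, w, mu) for x in X]
mid = (sum(scores[:n_per]) / n_per + sum(scores[n_per:]) / n_per) / 2
train_err = sum((s > mid) != (yi > 0) for s, yi in zip(scores, y)) / n

# Leave-one-out CV: refit the direction without sample i, then classify it
# against the midpoint of the two class-mean scores from the training fold.
wrong = 0
for i in range(n):
    Xtr = X[:i] + X[i + 1:]
    ytr = y[:i] + y[i + 1:]
    w, mu = pls1_direction(Xtr, ytr)
    tr_scores = [project(x, w, mu) for x in Xtr]
    neg = [s for s, yy in zip(tr_scores, ytr) if yy < 0]
    pos = [s for s, yy in zip(tr_scores, ytr) if yy > 0]
    mid = (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2
    wrong += (project(X[i], w, mu) > mid) != (y[i] > 0)
cv_err = wrong / n

print(f"training error: {train_err:.2f}")   # near 0: looks perfectly separable
print(f"LOO CV error:   {cv_err:.2f}")      # near 0.5: chance, as it should be on noise
```

So a clean-looking separation on the training projection seems entirely compatible with near-chance cross-validated error, which is exactly the gap I'm seeing.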