I have a lipidomics dataset with more than 1800 features and 30 samples, and I want to apply PLS-DA. I used the following mixOmics code:

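library(mixOmics)

# PLS-DA with up to 10 components, without unit-variance scaling
HLGA_plsda <- plsda(Normalized_data, Groups, ncomp = 10, scale = FALSE)

# 3-fold cross-validation, repeated 100 times
perf_plsda <- perf(HLGA_plsda, validation = "Mfold",
                   folds = 3, nrepeat = 100,
                   progressBar = TRUE, auc = TRUE)

# error rate (with SD) per component for each prediction distance
plot(perf_plsda, col = color.mixo(5:7), sd = TRUE,
     legend.position = "horizontal")

# number of components suggested by perf()
perf_plsda$choice.ncomp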

This gives the following plot:

perf_plsda$choice.ncomp says that one component is optimal, but I feel that is not reasonable and that I should choose 2. Also, why is my error rate so low? Does that mean the model is overfitted, or am I doing something wrong?

I chose not to scale because I get a higher classification error rate with scaling.

Also, with the RVAideMemoire package, cross-validation with 2 components gives a lower classification error rate than with 1 component:

library(RVAideMemoire)
MVA.cv(Normalized_data, Groups, repet = 100, k = 3, ncomp = 1, scale = FALSE, model = "PLS-DA")

Mean (standard error) classification error rate (%): 0.2 (0.08)

With ncomp = 2, the same call gives 0% (0).

So I am really confused about how to decide on the number of components. Can anyone help? Both the 1- and 2-component models are significant in a permutation test.
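For reference, my understanding (which may be wrong) is that choice.ncomp in perf() is based on one-sided t-tests across the CV repeats: a component is only added if it significantly lowers the mean error rate, I believe at a significance level of 0.01. Below is a minimal base-R sketch of that rule, assuming this description is right. choose_ncomp_ttest is a hypothetical helper I wrote for illustration, not a mixOmics function, and the error rates are made up.

```r
# Hypothetical helper (NOT a mixOmics function): a sketch of choosing ncomp
# with one-sided t-tests across CV repeats, as I understand perf() does it.
# err: matrix of error rates, one row per CV repeat, one column per component.
choose_ncomp_ttest <- function(err, alpha = 0.01) {
  max_comp <- ncol(err)
  for (opt in seq_len(max_comp)) {
    improved <- FALSE
    if (opt < max_comp) {
      for (j in (opt + 1):max_comp) {
        # H1: the mean error with `opt` components is greater than with `j`
        p <- tryCatch(
          t.test(err[, opt], err[, j], alternative = "greater")$p.value,
          error = function(e) NA  # t.test() stops on "essentially constant" data
        )
        if (is.na(p)) p <- 1  # no usable test: treat as no significant gain
        if (p < alpha) {
          improved <- TRUE
          break
        }
      }
    }
    # keep the first component count that no larger model beats significantly
    if (!improved) return(opt)
  }
}

# Made-up error rates for 100 repeats x 3 components: with 1 component,
# 1 of 30 samples is misclassified in 3 of the 100 repeats; 2 and 3 never err.
err <- cbind(c(rep(0, 97), rep(1/30, 3)), rep(0, 100), rep(0, 100))
colMeans(err)            # 2 components has the lower mean error...
choose_ncomp_ttest(err)  # ...but the gain is not significant, so 1 is kept
```

If perf() really works like this, error rates close to zero for every component leave the t-tests very little to detect, which could be why choice.ncomp returns 1 even though 2 components have a slightly lower mean error: the rule favours the smallest model that is not significantly worse, not the lowest mean.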

I am really new to this area, so any advice will help!

Thank you