Choice of components for DIABLO

I'm working with a dataset that has 2 classes, with 5 and 7 samples respectively. I know the sample size is small, but I was wondering how I should go about performing DIABLO. I ran the perf() function to determine the number of components to keep, and the optimal number was 5, which is a lot. I am unsure how to approach this. I also noticed that others have asked similar questions in other discussions, so what is the best way to handle it?

Hi @bzavala,

I am not sure whether you used M-fold cross-validation or leave-one-out (loo). With this few samples you should use loo, or folds = 3 combined with nrepeat.
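To make the two resampling schemes mentioned above concrete, here is a minimal pure-Python sketch of how 5 + 7 samples get split under leave-one-out and under repeated 3-fold cross-validation. This is not mixOmics code: the class labels "A"/"B", the helper names, and the stratified round-robin fold assignment are all assumptions made for illustration, and perf() may assign folds differently.

```python
import random

# Class sizes from the question: 5 and 7 samples (labels "A"/"B" are placeholders).
labels = ["A"] * 5 + ["B"] * 7


def leave_one_out(n):
    """Yield (train, test) index lists; each sample is the test set exactly once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def stratified_folds(labels, n_folds, rng):
    """Deal the shuffled samples of each class round-robin into n_folds folds."""
    folds = [[] for _ in range(n_folds)]
    offset = 0  # continue dealing where the previous class stopped
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            folds[(offset + k) % n_folds].append(i)
        offset += len(idx)
    return folds


loo_splits = list(leave_one_out(len(labels)))
print("loo:", len(loo_splits), "splits, each testing 1 sample")

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n_repeat = 3  # illustrative value only
for rep in range(n_repeat):
    folds = stratified_folds(labels, 3, rng)
    composition = [sorted(labels[i] for i in fold) for fold in folds]
    print("repeat", rep + 1, composition)
```

Under these assumptions each 3-fold test set holds only 4 samples, of which one or two come from the 5-sample class, so a single split gives a noisy error estimate. Repeating the split with different shuffles averages over that noise, which is presumably why repeats are paired with the 3-fold option.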

In this case, I would say there is probably little benefit in considering that many components. The increase in the error rate for the Mahalanobis distance is also odd (we expect the error rate to either stabilise or decrease as components are added).

I would rather base the choice on visual inspection of the sample plots. It is highly likely that 1 or 2 components would be enough to discriminate your sample groups.

Kim-Anh

I used loo. Could the result also be interpreted as meaning that the classes are nearly indistinguishable when classified with DIABLO?