I have 300 samples with 100,000 features and a binary outcome that I am trying to predict. I followed the splsda.srbct example, but I don’t quite understand the results.
A sample plot after running PLSDA shows that the samples separate nicely.
After running perf, I seem to get a large difference between the BER and the overall error rate.
I selected 2 components based on max.dist:
         max.dist  centroids.dist  mahalanobis.dist
overall         2               1                 1
BER             2               1                 1
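
For reference, my understanding of the two measures is that the overall error rate is the fraction of all samples that are misclassified, while the BER averages the error rates computed within each class, so the two can diverge when the classes are unbalanced. A minimal Python sketch of that arithmetic follows; the class counts and predictions are made up for illustration (they are not my data), and the helper `error_rates` is hypothetical, not a mixOmics function:

```python
def error_rates(y_true, y_pred):
    """Return (overall error rate, balanced error rate) for class labels."""
    # Overall error rate: misclassified samples / all samples.
    n_wrong = sum(t != p for t, p in zip(y_true, y_pred))
    overall = n_wrong / len(y_true)

    # Balanced error rate: mean of the per-class error rates.
    per_class = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        wrong = sum(y_pred[i] != c for i in idx)
        per_class.append(wrong / len(idx))
    ber = sum(per_class) / len(per_class)
    return overall, ber


# Hypothetical unbalanced outcome: 270 samples in class "A", 30 in class "B".
y_true = ["A"] * 270 + ["B"] * 30
# Hypothetical predictions: every "A" correct, only 10 of the 30 "B" correct.
y_pred = ["A"] * 270 + ["A"] * 20 + ["B"] * 10

overall, ber = error_rates(y_true, y_pred)
print(f"overall = {overall:.3f}, BER = {ber:.3f}")  # overall = 0.067, BER = 0.333
```

With these made-up numbers the overall error rate looks small because the majority class dominates it, while the BER exposes the poor performance on the minority class.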
After running tune.splsda, choice.keepX recommends keeping 3 variables on comp1.
As expected, the ROC curve looks bad:
Is this what is expected based on the initial PLSDA results?