Help understanding high error rate using PLS-DA

Hi Fan,
thank you for using mixOmics!
What the performance plot shows is probably a case of overfitting. On the training data the separation looks fine (plotIndiv), but as soon as you use cross-validation, the PLS-DA model does not generalise well. A few tips to improve performance:

  • consider using sparse PLS-DA (sPLS-DA) to select only the most discriminant metabolites for your outcome. This means you need to tune the number of metabolites to select (our bookdown vignette has some examples of tuning sPLS-DA)
  • also increase the number of repeats (the nrepeat argument) to at least 1000 for more accurate error rate estimates when using perf on the splsda object.
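
The two tips above could be sketched roughly as follows. This is only an illustration, not your exact analysis: the srbct example data shipped with mixOmics stands in for your metabolite matrix and class labels, and the keepX grid, ncomp = 2, the number of folds and the small nrepeat values are assumptions chosen so the code runs quickly.

```r
library(mixOmics)

# Stand-in data (assumption): replace X and Y with your own
# metabolite matrix (samples x metabolites) and class factor.
data(srbct)
X <- srbct$gene
Y <- srbct$class

set.seed(42)  # cross-validation folds are drawn at random

# Illustrative grid of how many variables to keep on each component.
keepx_grid <- c(5, 10, 25, 50)

# Tune keepX by repeated M-fold cross-validation.
# nrepeat is kept small here for speed; raise it for real estimates.
tuned <- tune.splsda(X, Y, ncomp = 2,
                     validation = "Mfold", folds = 5,
                     dist = "max.dist", measure = "BER",
                     test.keepX = keepx_grid, nrepeat = 10,
                     progressBar = FALSE)

# Fit the sparse model with the selected number of variables per component.
final <- splsda(X, Y, ncomp = 2, keepX = tuned$choice.keepX)

# Re-assess the tuned model with repeated cross-validation.
cv <- perf(final, validation = "Mfold", folds = 5,
           nrepeat = 10, progressBar = FALSE)

# Balanced error rate per component and prediction distance.
print(cv$error.rate$BER)
```

If the cross-validated error rate stays high even after tuning, that suggests the classes are genuinely hard to separate with these variables rather than a problem with the settings.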

Let us know if that helps!

Kim-Anh