sPLS-DA model for repeated longitudinal measurements

Dear @iwilliams,
Thanks for the positive feedback!

Of note, initial PCA analysis shows that variation due to time is greater than the variation due to patient group. However, we are more interested in the variation due to patient group.

We often recommend that users inspect these PCAs, as you have done. If you do not see a strong difference between a standard PCA and a multilevel PCA, it means that the multilevel approach is not doing much to handle the time effect (so that could be reason #1).
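One rough way to put a number on "no strong difference" is to compare the first component of a PCA on the raw data with the first component of a PCA on the subject-centred data. The sketch below uses simulated data and base R only (prcomp rather than the mixOmics pca function), so every object name in it is hypothetical, and subtracting each subject's mean is only a simple stand-in for what a multilevel PCA does.

```r
# Simulated design (hypothetical): 6 subjects, 3 time points, 10 variables
set.seed(42)
n_subj <- 6
n_time <- 3
subject <- factor(rep(seq_len(n_subj), each = n_time))
time <- rep(seq_len(n_time), times = n_subj)

# Noise + a weak subject offset + a strong time effect shared by all variables
X <- matrix(rnorm(n_subj * n_time * 10, sd = 0.3), ncol = 10)
X <- X + rnorm(n_subj, sd = 0.2)[as.integer(subject)]
X <- X + 2 * time

# Stand-in for the multilevel step: remove each subject's mean per variable
X_within <- X - apply(X, 2, function(col) ave(col, subject))

# First principal component before and after the subject centring
pc1_raw <- prcomp(X, center = TRUE)$x[, 1]
pc1_within <- prcomp(X_within, center = TRUE)$x[, 1]

# The sign of a component is arbitrary, so compare absolute correlation
agreement <- abs(cor(pc1_raw, pc1_within))
print(agreement)
```

An agreement close to 1 suggests the subject centring has changed very little, which is the situation described above: the time effect is still driving the first component.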

For the sPLS-DA model, the tune.splsda function kept throwing an error message "n if (max(sapply(1:J, function(x) { : missing value where TRUE/FALSE needed ". Interestingly, when I ditched the multilevel approach, I was able to both 1) get a more reasonable classification error vs. component plot and 2) tune the sPLS-DA model.

The multilevel approach decomposes the data further. Depending on your design and data characteristics, the within matrix that we extract (you can do this as a separate step with the withinVariation() function) may end up a bit empty. This, compounded with variable selection, may explain the error (reason #2).
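To illustrate what "a bit empty" can look like, here is a minimal base R sketch on toy data. It mimics the one-factor within-subject decomposition by subtracting each patient's mean from their samples; this is an approximation written for illustration, not the code behind withinVariation(), and all object names are hypothetical.

```r
# Toy data (hypothetical): 4 patients, 3 time points each, 5 variables
set.seed(1)
subject <- factor(rep(paste0("P", 1:4), each = 3))
X <- matrix(rnorm(12 * 5), nrow = 12,
            dimnames = list(NULL, paste0("var", 1:5)))

# Make var5 differ between patients but never change within a patient
X[, "var5"] <- rep(c(1, 2, 3, 4), each = 3)

# Within-subject part: each value minus that patient's mean for the variable
subject_means <- apply(X, 2, function(col) ave(col, subject))
X_within <- X - subject_means

# Variables with (near) zero variance left after the decomposition
within_var <- apply(X_within, 2, var)
near_zero <- names(within_var)[within_var < 1e-12]

print(round(within_var, 3))
print(near_zero)
```

Running the same check on your own within matrix would show how many variables carry almost no within-patient variation, which is the kind of situation that could trip up the tuning once variable selection is added.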

Try running the tuning on a larger number of variables, and with not too many components (which you would have tuned anyway in a previous step with PLS-DA), and see what happens. If that still breaks, let us know and we can try debugging it.
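As a rough sketch of that suggestion, the call could look something like the one below. It uses the vac18 example data shipped with mixOmics as a stand-in for your own data, and the keepX grid, number of folds and number of repeats are arbitrary assumptions that you would adapt to your design.

```r
library(mixOmics)

# Example repeated-measures data from the package (stand-in for your data)
data(vac18)
X <- vac18$genes
Y <- vac18$stimulation
design <- data.frame(sample = vac18$sample)  # subject ID of each sample

set.seed(123)
tune_res <- tune.splsda(
  X, Y,
  ncomp = 2,                      # few components, as chosen in an earlier PLS-DA step
  test.keepX = c(20, 50, 100),    # grid that starts at a larger number of variables
  validation = "Mfold",
  folds = 5,
  nrepeat = 1,                    # increase for a more stable estimate
  dist = "max.dist",
  multilevel = design
)

# Number of variables selected on each component
print(tune_res$choice.keepX)
```

If this runs on your data with a larger grid but fails once small keepX values are included, that would point towards reason #2.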

Kim-Anh + pinging @aljabadi