Hi there!
I would first like to commend the mixOmics team on the great package. The detailed vignettes have helped me to understand these multivariate statistical approaches at both the conceptual and practical level.
The dataset I am working with contains unbalanced repeated serum lipidomic measurements at 3 time points for 2 different groups of patients. Of note, initial PCA analysis shows that variation due to time is greater than the variation due to patient group. However, we are more interested in the variation due to patient group.
I have tried building both PLS-DA and sPLS-DA models using a multilevel approach to account for the repeated measures. When assessing the performance of the PLS-DA model, the curve of classification error against number of components was almost perfectly flat. For the sPLS-DA model, the tune.splsda function kept throwing an error message "n if (max(sapply(1:J, function(x) { : missing value where TRUE/FALSE needed ". Interestingly, when I ditched the multilevel approach, I was able to both 1) get a more reasonable classification error vs. component plot and 2) tune the sPLS-DA model.
Any idea why the multilevel approach might hinder the modeling? How would you recommend treating the repeated longitudinal measures in this case?
Happy to share any data or code that would help address this question.
Thanks,
Ian
Dear @iwilliams,
Thanks for the positive feedback!
> Of note, initial PCA analysis shows that variation due to time is greater than the variation due to patient group. However, we are more interested in the variation due to patient group.
We often recommend that users inspect those PCAs, as you have done. If you do not see a strong difference between a PCA and a multilevel PCA, it means that the multilevel approach is not handling the time effect well (so that could be reason 1).
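For anyone following along, the comparison could be sketched as below. This is only an illustration: it uses the vac18 example data that ships with mixOmics as a stand-in for the lipidomics data, and the object names (subject, pca.std, pca.ml) are made up for the example.

```r
library(mixOmics)

# Stand-in data bundled with mixOmics: repeated measures on the same subjects.
data(vac18)
X <- vac18$genes
subject <- data.frame(sample = vac18$sample)  # one column holding the subject IDs

# Same PCA twice: once on the raw data, once after the multilevel decomposition.
pca.std <- pca(X, ncomp = 2, scale = TRUE)
pca.ml  <- pca(X, ncomp = 2, scale = TRUE, multilevel = subject)

# Colour the samples by the factor of interest and compare the two plots.
plotIndiv(pca.std, group = vac18$stimulation, legend = TRUE, title = "PCA")
plotIndiv(pca.ml,  group = vac18$stimulation, legend = TRUE, title = "Multilevel PCA")
```

If the two sample plots look much the same, the decomposition has not changed the picture much for that dataset.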
> For the sPLS-DA model, the tune.splsda function kept throwing an error message "n if (max(sapply(1:J, function(x) { : missing value where TRUE/FALSE needed ". Interestingly, when I ditched the multilevel approach, I was able to both 1) get a more reasonable classification error vs. component plot and 2) tune the sPLS-DA model.
The multilevel approach decomposes the data further. Depending on your design and data characteristics, the within matrix that we extract (you can do this as a separate step with the withinVariation() function) may turn out to be close to empty. This, compounded with variable selection, may explain this error (reason 2).
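To make the idea of a within matrix concrete, here is a small base R sketch of within-subject centering on made-up numbers. It is an assumption-laden illustration of the arithmetic for a single repeated-measures factor, not the package's own code, and the names (lipidA, lipidB, s1 to s3) are invented.

```r
# Toy data: 3 subjects, unbalanced design (s3 has a single time point).
# All values are invented for illustration.
X <- matrix(c(1, 3,
              2, 4,
              5, 5,
              7, 9,
              6, 2),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("lipidA", "lipidB")))
subject <- factor(c("s1", "s1", "s2", "s2", "s3"))

# Mean profile of each subject, expanded back to one row per sample.
subject.means <- apply(X, 2, function(v) ave(v, subject))

# Within-subject part: what remains after removing each subject's own mean.
Xw <- X - subject.means

print(Xw)
```

In this toy case the row for s3 is all zeros, because a subject measured once has nothing left after its own mean is removed. With real data, inspecting the output of withinVariation(X, design = data.frame(sample = subject)) in the same way should show whether the within matrix still carries much variation.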
Try tuning on a larger number of variables, and potentially not too many components (which you would have tuned anyway in a previous step with PLS-DA), and see what happens. If that breaks down, let us know and we can try debugging it.
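One possible way to set this up is sketched below, again on the bundled vac18 data rather than the lipidomics data. The keepX grid, number of folds and number of repeats are arbitrary values chosen for the example, not recommendations, and extracting the within matrix first is just one option (passing multilevel = subject to tune.splsda is the other).

```r
library(mixOmics)

# Stand-in data bundled with mixOmics.
data(vac18)
X <- vac18$genes
Y <- vac18$stimulation
subject <- data.frame(sample = vac18$sample)

# Extract the within-subject matrix as a separate step, then tune on it directly.
Xw <- withinVariation(X, design = subject)

set.seed(42)  # cross-validation folds are random
tune.res <- tune.splsda(
  Xw, Y,
  ncomp       = 2,                     # few components
  test.keepX  = c(50, 100, 200, 500),  # grid of larger keepX values (arbitrary)
  validation  = "Mfold", folds = 5, nrepeat = 5,
  dist        = "max.dist",
  progressBar = FALSE
)

tune.res$choice.keepX  # number of variables selected on each component
plot(tune.res)         # error rate across the keepX grid
```

Having Xw as its own object also makes it easy to check the matrix before tuning, which may help narrow down where the error comes from.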
Kim-Anh + pinging @aljabadi