Hi there!

I would first like to commend the mixOmics team on the great package. The detailed vignettes have helped me understand these multivariate statistical approaches at both the conceptual and practical level.

The dataset I am working with contains unbalanced repeated serum lipidomic measurements at 3 time points for 2 different groups of patients. Of note, an initial PCA shows that the variation due to time is greater than the variation due to patient group. However, we are more interested in the variation due to patient group.

I have tried building both PLS-DA and sPLS-DA models using a multilevel approach to account for the repeated measures. When assessing the performance of the PLS-DA model, the classification error rate was almost perfectly flat across the number of components. For the sPLS-DA model, the tune.splsda function kept throwing the error message "n if (max(sapply(1:J, function(x) { : missing value where TRUE/FALSE needed". Interestingly, when I ditched the multilevel approach, I was able to both 1) get a more reasonable plot of classification error rate vs. number of components and 2) tune the sPLS-DA model.

Any idea why the multilevel approach might hinder the modeling? How would you recommend handling the repeated longitudinal measures in this case?

Happy to share any data or code that would help address this question.

Thanks,

Ian