sPLS-DA model for repeated longitudinal measurements

iwilliams · September 11, 2020, 6:46pm

Hi there!
I would first like to commend the mixOmics team on the great package. The detailed vignettes have helped me to understand these multivariate statistical approaches at both the conceptual and practical level.

The dataset I am working with contains unbalanced repeated serum lipidomic measurements at 3 time points for 2 different groups of patients. Of note, initial PCA analysis shows that variation due to time is greater than the variation due to patient group. However, we are more interested in the variation due to patient group.

I have tried building both PLS-DA and sPLS-DA models using a multilevel approach to account for the repeated measures. When assessing the performance of the PLS-DA model, the classification error vs. component relationship was almost perfectly flat. For the sPLS-DA model, the tune.splsda function kept throwing the error message "n if (max(sapply(1:J, function(x) { : missing value where TRUE/FALSE needed ". Interestingly, when I ditched the multilevel approach, I was able to both 1) get a more reasonable classification error vs. component plot and 2) tune the sPLS-DA model.

Any idea why the multilevel approach might hinder the modeling? How would you recommend treating the repeated longitudinal measures in this case?

Happy to share any data or code that would help address this question.

Thanks,
Ian

kimanh.lecao · September 13, 2020, 11:55pm

Dear @iwilliams,
Thanks for the positive feedback!

> Of note, initial PCA analysis shows that variation due to time is greater than the variation due to patient group. However, we are more interested in the variation due to patient group.

We often recommend that users inspect those PCAs, as you have done. If you do not see a strong difference between a PCA and a multilevel PCA, it means that the multilevel approach is not handling the time effect well (so that could be reason 1).

> For the sPLS-DA model, the tune.splsda function kept throwing an error message "n if (max(sapply(1:J, function(x) { : missing value where TRUE/FALSE needed ". Interestingly, when I ditched the multilevel approach, I was able to both 1) get a more reasonable classification error vs. component plot and 2) tune the sPLS-DA model.

The multilevel approach decomposes the data further. Depending on your design and data characteristics, the within matrix that we extract (you can do this as a separate step with the withinVariation() function) may end up a bit empty, that is, retain little variation. This, compounded with variable selection, may explain the error (reason 2).
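To make the decomposition step concrete, here is a minimal sketch of how one might check how much variation survives in the within matrix. It runs on simulated stand-in data, not the lipidomics data from this thread: the object names (subject, Xw_manual, within_share) and the simulated dimensions are assumptions for illustration only. The manual step subtracts each patient's mean profile, and the optional comparison with withinVariation() runs only if mixOmics is installed.

```r
# Simulated stand-in data (hypothetical): 10 patients x 3 time points, 20 lipids
set.seed(1)
n_subj <- 10
n_time <- 3
p <- 20
subject <- factor(rep(seq_len(n_subj), each = n_time))
X <- matrix(rnorm(n_subj * n_time * p), ncol = p,
            dimnames = list(NULL, paste0("lipid", seq_len(p))))

# Add a patient-specific offset so that between-patient variation exists
X <- X + rnorm(n_subj, sd = 2)[as.integer(subject)]

# Within-subject part: subtract each patient's own mean from every variable
Xw_manual <- apply(X, 2, function(x) x - ave(x, subject))

# Share of each variable's variance that survives the decomposition;
# values close to 0 would suggest a nearly "empty" within matrix
within_share <- apply(Xw_manual, 2, var) / apply(X, 2, var)
print(summary(within_share))

# Optional: compare with the mixOmics helper if the package is available
if (requireNamespace("mixOmics", quietly = TRUE)) {
  Xw <- mixOmics::withinVariation(X, design = data.frame(sample = subject))
  print(all.equal(unname(Xw), unname(Xw_manual)))
}
```

If most variables show a very small within share on the real data, there may be little left for a multilevel sPLS-DA to select from.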

Try doing the tuning on a larger number of variables, and potentially with not too many components (a number you would have tuned anyway in a previous step with PLS-DA), and see what happens. If that breaks down, let us know and we can try debugging it.
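As a rough sketch of that suggestion, the call below tunes a multilevel sPLS-DA over a wider grid of variables with only two components. It uses the vac18 example data shipped with mixOmics rather than the data from this thread, and the grid values, fold count and repeat count are assumptions chosen for illustration, not recommendations. The tuning runs only if mixOmics is installed.

```r
# Wider grid of candidate numbers of variables to keep per component
# (hypothetical values; adapt to the number of variables in your data)
test_keepX <- c(5, 10, 25, 50, 100, 200)

# Keep the number of components small, e.g. as suggested by a prior PLS-DA
ncomp_max <- 2

if (requireNamespace("mixOmics", quietly = TRUE)) {
  library(mixOmics)

  # Example repeated-measures data shipped with the package
  data(vac18)
  X <- vac18$genes
  Y <- vac18$stimulation
  design <- data.frame(sample = vac18$sample)  # individual IDs

  set.seed(42)  # cross-validation folds are random
  tuned <- tune.splsda(X, Y, ncomp = ncomp_max, test.keepX = test_keepX,
                       multilevel = design, validation = "Mfold",
                       folds = 5, nrepeat = 10, dist = "max.dist")

  # Number of variables selected per component
  print(tuned$choice.keepX)
}
```

If the same error appears with a grid like this, that would be a useful detail to report back for debugging.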

Kim-Anh + pinging @aljabadi
