How to split the training and test set for repeated measure data

Tonima · August 1, 2025, 4:10pm

Hello,

How should I split the data into training and test sets for sPLS-DA? I have 23 subjects, with samples collected from each at 5 time points. Should I apply withinVariation(X, design) first and then split the data? What would be the correct approach?

Thanks!

kimanh.lecao · October 16, 2025, 10:43pm

Hi @Tonima,

Apologies for the late answer; we are going through a team change.

It really depends on how you want to account for time in your analysis. If you expect the time variation to be quite large, but you are not too concerned about the correlation between time points (or the time variation is large enough to reflect this), then a multilevel decomposition is appropriate (follow the steps on the Multilevel – mixOmics page to assess this). If you choose multilevel, then the perf function associated with multilevel sPLS-DA internally does the cross-validation by taking the repeated measurements into account, so I would just use that.
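For intuition only: in a one-factor design, the within-subject part of a multilevel decomposition amounts to subtracting each subject's own mean from that subject's samples. Below is a minimal sketch of that idea in plain Python. The function name and toy data are hypothetical, and this is not the mixOmics implementation; in R you would call withinVariation(X, design) rather than write this yourself.

```python
from collections import defaultdict

def within_subject_center(X, subject_ids):
    """Subtract each subject's per-feature mean from that subject's rows.

    X is a list of rows (samples x features); subject_ids gives the
    subject label of each row. Illustrative helper, not a mixOmics API.
    """
    sums = {}
    counts = defaultdict(int)
    for row, s in zip(X, subject_ids):
        if s not in sums:
            sums[s] = [0.0] * len(row)
        for j, v in enumerate(row):
            sums[s][j] += v
        counts[s] += 1
    # Per-subject, per-feature means
    means = {s: [t / counts[s] for t in tot] for s, tot in sums.items()}
    # Within-subject deviations
    return [[v - m for v, m in zip(row, means[s])]
            for row, s in zip(X, subject_ids)]

# Toy data (made up): 2 subjects, 2 samples each, 2 features
X = [[1.0, 10.0], [3.0, 14.0], [10.0, 0.0], [20.0, 4.0]]
subjects = ["A", "A", "B", "B"]
Xw = within_subject_center(X, subjects)
# Xw == [[-1.0, -2.0], [1.0, 2.0], [-5.0, -2.0], [5.0, 2.0]]
```

In this sketch, each subject's rows are centred using only that subject's own samples.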

Given the number of samples (23 subjects with 5 time points each), it's probably better to stick to cross-validation rather than splitting into training and test sets. However, if you really want to do this, then yes, you should randomly split at the subject level, so that all time points from a given subject end up in the same set.
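A subject-level split can be sketched as follows. This is only an illustration in plain Python: the function name, the 20% test fraction, the seed, and the subject labels are all assumptions, and the same logic carries over to whatever language you run the analysis in.

```python
import random

def split_by_subject(subject_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) row indices, keeping each subject's rows together.

    Illustrative helper (hypothetical name): subjects, not samples, are
    shuffled and assigned to the test set.
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(test_fraction * len(subjects)))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx

# Assumed layout: 23 subjects x 5 time points, one row per sample
subject_ids = [f"S{k:02d}" for k in range(1, 24) for _ in range(5)]
train_idx, test_idx = split_by_subject(subject_ids, test_fraction=0.2, seed=1)

# No subject contributes rows to both sets
train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}
assert train_subjects.isdisjoint(test_subjects)
```

With 23 subjects and a 20% test fraction, this holds out 5 subjects (25 samples) and trains on the remaining 18 (90 samples), which illustrates how little data is left for testing.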

PS: we have other sister packages, such as timeOmics, which might be a useful follow-up to your analysis (it answers a different question).

Kim-Anh
