How to split training and test sets for repeated-measures data

Hello,

How should I split the data into training and test sets to perform sPLS-DA? I have 23 subjects, with samples collected from each at 5 time points. Should I use withinVariation(X, design) and then split the data? What would be the correct approach?

Thanks!

Hi @Tonima,

Apologies for the late answer; we are going through a team change.

It really depends on how you want to account for time in your analysis. If you expect the variation over time to be quite large, but you are not too concerned about the correlation between time points (or the time variation is large enough to reflect it), then a multilevel decomposition is appropriate (follow the steps on the Multilevel – mixOmics page to assess this). If you choose the multilevel approach, the perf function associated with multilevel sPLS-DA performs the cross-validation internally while taking the repeated measurements into account, so I would just use that.
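To illustrate what the multilevel step does: for a one-factor design (subject only), the within-subject part of the data is obtained by subtracting each subject's mean profile from that subject's samples, so only the variation across time points within each subject remains. The sketch below shows that idea in NumPy, assuming X is a samples × variables array and subject holds one ID per sample. The function name is hypothetical and the numbers are made up; for the actual analysis, use mixOmics' own withinVariation / multilevel sPLS-DA.

```python
import numpy as np

def within_subject_variation(X, subject):
    """Hypothetical helper: remove each subject's mean profile.

    X: (n_samples, n_variables) array; subject: one ID per sample (row).
    """
    X = np.asarray(X, dtype=float)
    subject = np.asarray(subject)
    Xw = np.empty_like(X)
    for s in np.unique(subject):
        rows = subject == s
        # Subtract this subject's column means from each of its own samples
        Xw[rows] = X[rows] - X[rows].mean(axis=0)
    return Xw

# Toy data: 2 subjects x 2 time points, 2 variables (made-up numbers)
X = np.array([[1.0, 10.0],
              [3.0, 14.0],
              [5.0, 20.0],
              [9.0, 22.0]])
subject = np.array(["A", "A", "B", "B"])
Xw = within_subject_variation(X, subject)
# rows: [-1, -2], [1, 2], [-2, -1], [2, 1]
```

After this step, each subject's samples sum to zero per variable, which is why any train/test split or cross-validation still has to keep a subject's samples together.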

Given the number of samples, it's probably better to stick to cross-validation rather than splitting into training and test sets. However, if you really want to do this, then yes, you should split randomly at the subject level, so that all time points from a given subject end up in the same set.
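Splitting at the subject level means shuffling subject IDs, not samples, and then sending every sample of a chosen subject to the same side. A minimal Python sketch under that assumption is below, together with a grouped k-fold variant that holds out whole subjects in each fold. The helper names, the 20% test fraction, k = 5 and the seed are all hypothetical choices for illustration; the subject labels mimic the 23 × 5 design from the question.

```python
import random

def split_by_subject(subject_ids, test_fraction=0.2, seed=1):
    """Hypothetical helper: randomly assign whole subjects to train or test.

    Returns two lists of sample indices; all samples (time points) of a
    given subject land on the same side of the split.
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(test_fraction * len(subjects)))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx

def subject_folds(subject_ids, k=5, seed=1):
    """Hypothetical helper: grouped k-fold, each fold holds out distinct subjects."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    folds = []
    for f in range(k):
        held_out = set(subjects[f::k])  # every k-th shuffled subject
        folds.append([i for i, s in enumerate(subject_ids) if s in held_out])
    return folds

# 23 subjects x 5 time points = 115 samples
subject_ids = [f"S{s:02d}" for s in range(1, 24) for _ in range(5)]
train_idx, test_idx = split_by_subject(subject_ids)
print(len(train_idx), len(test_idx))  # 90 25 (5 of 23 subjects held out)
```

With only 23 subjects, a single held-out set contains just a handful of subjects, which is one reason repeated grouped cross-validation tends to give a more stable performance estimate than one split.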

PS: we have other sister packages, such as timeOmics, that might be a useful follow-up to your analysis (it answers a different question).

Kim-Anh