Test and training datasets

Hi there,
I’m using mixOmics - DIABLO to integrate proteomics (1625 features), metabolomics (90 features) and lipidomics (289 features) data from 62 patients (32 in group REL and 30 in group NOT.REL).
I’ve run the tune.block.splsda function with the following test.keepX on my entire dataset:

test.keepX <- list(metabolites = seq(1, 90, 6),
                   protein = seq(100, 1600, 107),
                   lipids = seq(10, 280, 19))

From this I obtained the list.keepX and built my final model.
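
To be concrete, this is roughly what my tuning and fitting step looks like (just a sketch: my three blocks are in a named list X, the group labels in a factor Y, and I use a standard DIABLO design matrix; the ncomp, folds and nrepeat values here are only illustrative):

library(mixOmics)

# design matrix linking the three blocks (0.1 off-diagonal is a common default)
design <- matrix(0.1, nrow = length(X), ncol = length(X),
                 dimnames = list(names(X), names(X)))
diag(design) <- 0

# tune the number of variables to keep per block and per component
tune.res <- tune.block.splsda(X, Y, ncomp = 2, test.keepX = test.keepX,
                              design = design, validation = "Mfold",
                              folds = 5, nrepeat = 10)
list.keepX <- tune.res$choice.keepX

# final DIABLO model fitted on the full dataset
final.model <- block.splsda(X, Y, ncomp = 2, keepX = list.keepX, design = design)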

Now I would like to validate this model. I was looking at the “Case Study of sPLS-DA with SRBCT dataset”, in the PREDICTION section, where it says:

“In real scenarios, the training model should be tuned itself. It is crucial that when tuning the training model, it is done in the absence of the testing data. This also reduces likelihood of overfitting”.

If I understand correctly, I need to split the dataset into training and test sets at the very beginning of my analysis, then tune the model only on the training set (with the test.keepX parameters described above), and finally use the predict function on the test set…
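
Something like the following is what I have in mind (just a rough sketch; X, Y, design and test.keepX are as in my tuning code above, and the hold-out of 12 patients is arbitrary):

set.seed(123)
# hold out ~20% of the 62 patients as an external test set
test.idx <- sample(seq_len(length(Y)), size = 12)
X.train  <- lapply(X, function(m) m[-test.idx, , drop = FALSE])
X.test   <- lapply(X, function(m) m[test.idx, , drop = FALSE])
Y.train  <- Y[-test.idx]
Y.test   <- Y[test.idx]

# tune and fit using the training samples only
tune.train  <- tune.block.splsda(X.train, Y.train, ncomp = 2,
                                 test.keepX = test.keepX, design = design,
                                 validation = "Mfold", folds = 5, nrepeat = 10)
train.model <- block.splsda(X.train, Y.train, ncomp = 2,
                            keepX = tune.train$choice.keepX, design = design)

# predict the class of the held-out patients and compare to the truth
pred <- predict(train.model, newdata = X.test)
table(pred$WeightedVote$max.dist[, 2], Y.test)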

Is that correct?
The main point is that I have only one dataset of 62 patients on which I would like to both build and validate the model. Is that possible? In your experience, is the sample size sufficient?

Thanks

hi @Chiara.Anser,

In a utopian case you would have a second validation dataset available for a proper external validation, but most researchers cannot achieve this. Given that the number of samples you have from the start is quite small, your validation would instead come from cross-validation of your final model using the perf() function.
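
For example, something along these lines (just a sketch, assuming your fitted DIABLO object is called final.model; the folds and nrepeat values are illustrative and should be adapted to your sample size):

# repeated cross-validation of the final DIABLO model
perf.res <- perf(final.model, validation = "Mfold", folds = 5, nrepeat = 50)
perf.res$WeightedVote.error.rate   # classification error rates per distance
plot(perf.res)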

You could divide your original dataset into training and test sets, but your results would be highly dependent on the particular test set, so you would have to repeat this with several different splits. That would be equivalent to doing cross-validation.

Kim-Anh