Hi there,
I’m using mixOmics DIABLO to integrate proteomics (1625 IDs), metabolomics (90 IDs) and lipidomics (289 IDs) data from 62 patients (32 in group REL and 30 in group NOT.REL).
I ran the tune.block.splsda function on my entire dataset with this test.keepX:
test.keepX <- list(metabolites = seq(1, 90, 6),
                   protein = seq(100, 1600, 107),
                   lipids = seq(10, 280, 19))
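As a quick check on the size of this grid, here is a small sketch in base R. It assumes, as far as I understand, that the tuning evaluates every combination of keepX values across the three blocks; the names n_per_block and n_combinations are only for illustration.

```r
# Same grid as above, written with the standard R assignment operator.
test.keepX <- list(metabolites = seq(1, 90, 6),
                   protein = seq(100, 1600, 107),
                   lipids = seq(10, 280, 19))

# Number of candidate keepX values in each block.
n_per_block <- lengths(test.keepX)

# Number of keepX combinations per component, ASSUMING a full grid search
# over all blocks (my understanding of the tuning, not confirmed here).
n_combinations <- prod(n_per_block)

print(n_per_block)
print(n_combinations)
```

Each block has 15 candidate values, so under that assumption there are 15 x 15 x 15 = 3375 combinations per component.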
From this I obtained the list.keepX and built my final model.
Now I would like to validate this model. I was looking at the "Case Study of sPLS-DA with SRBCT dataset", in the PREDICTION part, where it says:
“In real scenarios, the training model should be tuned itself. It is crucial that when tuning the training model, it is done in the absence of the testing data. This also reduces likelihood of overfitting”.
If I understand correctly, I need to split the dataset into training and test sets at the very beginning of my analysis, then tune the model only on the training set (with the test.keepX grid described above), and finally use the predict function on the test set.
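In code, I imagine something like the sketch below. It is only an outline under my own assumptions: the 70/30 stratified split, the seed, ncomp = 2, the 0.1 design matrix, 5 folds and 10 repeats are arbitrary choices, stratified_split and fit_and_predict are hypothetical helper names, Y is a placeholder label vector, and X is assumed to be a named list of matrices (samples in rows) with the same names as test.keepX.

```r
set.seed(42)  # arbitrary seed, only for reproducibility of the split

# Placeholder class labels (32 REL, 30 NOT.REL); the real labels go here.
Y <- factor(c(rep("REL", 32), rep("NOT.REL", 30)))

# Hypothetical helper: draw the same fraction of test samples from each class.
stratified_split <- function(y, test_frac = 0.3) {
  test_idx <- unlist(lapply(split(seq_along(y), y), function(idx) {
    idx[sample.int(length(idx), round(length(idx) * test_frac))]
  }), use.names = FALSE)
  list(train = setdiff(seq_along(y), test_idx), test = sort(test_idx))
}

split_idx <- stratified_split(Y)

# Hypothetical helper (defined but not run here): tune and fit on the
# training samples only, then predict the held-out test samples.
fit_and_predict <- function(X, Y, split_idx, test.keepX, ncomp = 2) {
  X.train <- lapply(X, function(block) block[split_idx$train, , drop = FALSE])
  X.test  <- lapply(X, function(block) block[split_idx$test, , drop = FALSE])
  Y.train <- Y[split_idx$train]

  # Assumed design: weak (0.1) links between blocks, zero on the diagonal.
  design <- matrix(0.1, nrow = length(X), ncol = length(X),
                   dimnames = list(names(X), names(X)))
  diag(design) <- 0

  # Tuning sees only the training data.
  tuned <- mixOmics::tune.block.splsda(X.train, Y.train, ncomp = ncomp,
                                       test.keepX = test.keepX, design = design,
                                       validation = "Mfold", folds = 5,
                                       nrepeat = 10)

  model <- mixOmics::block.splsda(X.train, Y.train, ncomp = ncomp,
                                  keepX = tuned$choice.keepX, design = design)

  # The test data is used only at this final step.
  predict(model, newdata = X.test)
}
```

With these settings the split leaves 43 patients for training and 19 for testing (10 REL and 9 NOT.REL).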
Is that correct?
The main point is that I have only one dataset of 62 patients, on which I would like to both build and validate the model. Is that possible? In your experience, is this sample size sufficient?
Thanks