I have been enjoying using the mixOmics package, and as I dive deeper into my analysis, more questions have come up.
According to some papers on PLSDA (e.g. https://link.springer.com/article/10.1007/s11306-011-0330-3), a double cross-validation scheme may be the least biased way to validate a PLSDA model. I noticed that I can perform cross-validation through both the perf() and the tune() functions in the mixOmics package. Is the cross-validation implemented in these two functions a double cross-validation scheme? Does the number of folds affect the error rate, and how should I determine the number of folds?
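For reference, this is the kind of single-layer CV I have been running (using the srbct data bundled with mixOmics as a stand-in for my own; the ncomp, fold, and repeat settings are just what I tried, not recommendations):

```r
library(mixOmics)

data(srbct)
X <- srbct$gene   # samples x variables
Y <- srbct$class  # class labels (factor)

## One layer of cross-validation: fit PLSDA, then estimate the
## classification error rate by 5-fold CV repeated 10 times
model <- plsda(X, Y, ncomp = 3)
cv <- perf(model, validation = "Mfold", folds = 5, nrepeat = 10)
cv$error.rate
```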
Double cross-validation is indeed the way to go, but only if you have a sufficient number of samples, which is often not the case in omics studies. One approach is to create an outer loop around either perf() or tune(); see the sketch below.
Yes, the number of folds affects how the error rate is estimated. We recommend that your test set (inner fold) include at least 5 samples. Don’t forget that we also have an nrepeat argument to repeat the CV process.
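A minimal sketch of one iteration of such an outer loop, assuming sPLS-DA on the srbct example data with an illustrative keepX grid (the test-set size, fold counts, and grid are placeholders, not recommendations):

```r
library(mixOmics)

data(srbct)
X <- srbct$gene   # samples x variables
Y <- srbct$class  # class factor

## One outer split: hold out a test set that tune() never sees
test  <- sample(nrow(X), 12)
train <- setdiff(seq_len(nrow(X)), test)

## Inner CV (with repeats) runs on the training samples only
tuned <- tune.splsda(X[train, ], Y[train], ncomp = 2,
                     test.keepX = c(5, 10, 20),
                     validation = "Mfold", folds = 3, nrepeat = 10,
                     dist = "max.dist")

## Refit on the training samples with the tuned keepX, then
## evaluate once on the untouched outer test set
fit  <- splsda(X[train, ], Y[train], ncomp = 2, keepX = tuned$choice.keepX)
pred <- predict(fit, X[test, ], dist = "max.dist")
mean(pred$class$max.dist[, 2] != Y[test])   # outer test error

## Wrapping this in a loop over outer folds gives the double CV scheme
```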
Thanks for the kind reply. If I decide to use the double cross-validation scheme, can I reduce the number of CV repeats in the inner loop without affecting the validity of the outcome?
I attempted to code the nested cross-validation. For the outer and inner loops, I used 5 and 2 folds, respectively. If I understand correctly, I tune the parameters in the inner loop and then test the best-performing model on the held-out data of the outer fold. I then average the metrics across all outer folds to estimate model performance. But then, which hyperparameters should I use for the final refit of my model?
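To make my question concrete, here is a simplified version of what I coded, again on the srbct example data (5 outer folds, 2 inner folds, an illustrative keepX grid). Note how the inner tuning can pick a different keepX in each outer fold, which is exactly what makes me unsure about the final refit:

```r
library(mixOmics)

data(srbct)
X <- srbct$gene
Y <- srbct$class

outer.folds <- 5
## Randomly assign samples to outer folds (stratifying by class
## would be preferable in practice)
fold.id <- sample(rep(1:outer.folds, length.out = nrow(X)))

outer.error  <- numeric(outer.folds)
chosen.keepX <- vector("list", outer.folds)

for (k in 1:outer.folds) {
  train <- fold.id != k

  ## Inner loop: 2-fold repeated CV to tune keepX on the training portion
  tuned <- tune.splsda(X[train, ], Y[train], ncomp = 2,
                       test.keepX = c(5, 10, 20),
                       validation = "Mfold", folds = 2, nrepeat = 10,
                       dist = "max.dist")
  chosen.keepX[[k]] <- tuned$choice.keepX

  ## Test the best-performing model on the outer fold
  fit  <- splsda(X[train, ], Y[train], ncomp = 2,
                 keepX = tuned$choice.keepX)
  pred <- predict(fit, X[!train, ], dist = "max.dist")
  outer.error[k] <- mean(pred$class$max.dist[, 2] != Y[!train])
}

mean(outer.error)   # performance averaged over the outer folds
chosen.keepX        # often differs from fold to fold
```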