AUROC, Model Accuracy and Biomarker Detection

John_1 · April 19, 2024, 5:26pm

Hey everyone,

I’m having trouble understanding the output from an sPLSDA analysis. I’ve read other posts related to the topic, but wasn’t sure if I am interpreting the results correctly. I’m working with metabolomics data from 100 samples, 50 from a disease group and 50 from a healthy group. When running the AUROC function on a tuned model I get a value of ~0.7. However, when assessing the model with the perf function using 10 folds and 30 repeats, I get an AUC of roughly 0.5-0.6 for each component and an error rate > 50%.

Would this mean that the model is heavily overfit, and would variable importance from the tuned model be pretty much useless for biomarker detection since the model is not generalized?

kimanh.lecao · May 2, 2024, 11:36pm

hi @John_1,

The AUROC is largely optimistic, because it evaluates the discrimination along a range of predicted score values, whereas the (s)PLS-DA has already made that cutoff call. So I would always trust the perf() results more than the AUROC (but users have asked for the AUROC, so here you go).

I think your results do indicate a lack of generalisability for biomarker discovery. Try a smaller number of selected variables? And try other types of models (e.g. machine learning).
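To make the gap concrete, here is a minimal sketch of why an AUC computed on the fitting samples can look good while cross-validation stays near chance. It is plain Python, not mixOmics: the toy classifier (keep the features with the largest class-mean difference, then sum them with their signs) only stands in for a sparse model, and every name in it (`auc`, `fit`, `score`, `simulate`) is made up for illustration. The data are pure noise, so there is nothing real to find, and the sketch assumes the ~0.7 above was computed on the same samples used to fit the model.

```python
import random


def auc(scores, labels):
    """Probability that a random positive sample outscores a random negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit(X, y, keep):
    """Toy 'sparse' model: keep the features with the largest class-mean gap."""
    n_feat = len(X[0])
    n1 = sum(y)
    n0 = len(y) - n1
    diffs = []
    for j in range(n_feat):
        m1 = sum(row[j] for row, lab in zip(X, y) if lab == 1) / n1
        m0 = sum(row[j] for row, lab in zip(X, y) if lab == 0) / n0
        diffs.append(m1 - m0)
    top = sorted(range(n_feat), key=lambda j: abs(diffs[j]), reverse=True)[:keep]
    return [(j, 1.0 if diffs[j] > 0 else -1.0) for j in top]


def score(model, X):
    """Signed sum of the selected features for each sample."""
    return [sum(sign * row[j] for j, sign in model) for row in X]


def simulate(n=100, n_feat=1000, keep=20, folds=5, seed=1):
    rng = random.Random(seed)
    # Pure-noise features: no feature is truly related to the class label.
    X = [[rng.gauss(0, 1) for _ in range(n_feat)] for _ in range(n)]
    y = [i % 2 for i in range(n)]  # 50 "disease", 50 "healthy"

    # AUC on the same samples used for selection and fitting.
    in_sample = auc(score(fit(X, y, keep), X), y)

    # Cross-validated AUC: selection is redone inside every training fold.
    idx = list(range(n))
    rng.shuffle(idx)
    cv_scores = [0.0] * n
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train], keep)
        for i, s in zip(test, score(model, [X[i] for i in test])):
            cv_scores[i] = s
    return in_sample, auc(cv_scores, y)


if __name__ == "__main__":
    in_sample, cv = simulate()
    print(f"in-sample AUC: {in_sample:.2f}   cross-validated AUC: {cv:.2f}")
```

The in-sample number comes out high even though the labels carry no signal, because the features were chosen on the very samples being scored. The cross-validated number is the one that reflects what would happen on new samples, which is the same reason to lean on perf() here.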

Kim-Anh
