AUROC, Model Accuracy and Biomarker Detection

Hey everyone,

I’m having trouble understanding the output from an sPLSDA analysis. I’ve read other posts related to the topic, but wasn’t sure if I am interpreting the results correctly. I’m working with metabolomics data from 100 samples, 50 from a disease group and 50 from a healthy group. When running the AUROC function on a tuned model I get a value of ~0.7. However, when assessing the model with the perf function using 10 folds and 30 repeats, I get an AUC of roughly 0.5-0.6 for each component and an error rate > 50%.

Would this mean that the model is heavily overfit, and that variable importance from the tuned model is pretty much useless for biomarker detection, since the model does not generalize?

hi @John_1,

The AUROC tends to be optimistic, because it evaluates discrimination across the whole range of possible cutoffs on the predicted scores, whereas (s)PLS-DA has already made that cutoff call when it assigns each sample to a class. So I would always trust the perf() results more than the AUROC (but users have asked for the AUROC, so here you go :) )
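To make the cutoff point concrete, here is a minimal Python sketch. It is not mixOmics code: the scores, labels and the 0.5 cutoff are made up for illustration, and mixOmics assigns classes with a prediction distance (e.g. max.dist) rather than a single score threshold. It only shows how a set of scores can rank the two classes reasonably well, giving a decent AUC, while a fixed decision rule still misclassifies half of the samples.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, cutoff):
    """Misclassification rate when predicting class 1 for scores above `cutoff`."""
    preds = [1 if s > cutoff else 0 for s in scores]
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical predicted scores: 1 = disease, 0 = healthy (assumed data).
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.45, 0.40, 0.35, 0.30, 0.10, 0.38, 0.25, 0.20, 0.15, 0.05]

print(f"AUC over all cutoffs:       {auc(scores, labels):.2f}")               # 0.76
print(f"Error rate at cutoff 0.50:  {error_rate(scores, labels, 0.50):.2f}")  # 0.50
print(f"Error rate at cutoff 0.27:  {error_rate(scores, labels, 0.27):.2f}")  # 0.20
```

The AUC implicitly credits the model for the best cutoffs along the curve, while the error rate reflects the one decision rule that was actually applied.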

I think your results do indicate a lack of generalisability, which limits their use for biomarker discovery. You could try a smaller variable selection size, and also other types of models (e.g. machine learning).
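For intuition on why a fitted model can look fine on its own training data and still fail under cross-validation, here is a rough Python sketch on pure noise. Everything in it is an assumption for illustration: the data are random (no real signal), the sample and feature counts are arbitrary, and a simple nearest-centroid classifier with mean-difference feature selection stands in for sPLS-DA. The only point is that selecting variables and evaluating on the same samples gives an optimistic error, whereas repeating the selection inside each fold does not.

```python
import random


def make_noise(n_samples, n_features, rng):
    """Random Gaussian features with alternating class labels (no real signal)."""
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def fit(X, y, keep):
    """Keep the `keep` features with the largest class mean difference."""
    n_feat = len(X[0])
    means = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
    ranked = sorted(range(n_feat),
                    key=lambda j: abs(means[1][j] - means[0][j]), reverse=True)
    return ranked[:keep], means


def predict(model, x):
    """Assign the class whose centroid is closest on the selected features."""
    selected, means = model
    dist = {c: sum((x[j] - means[c][j]) ** 2 for j in selected) for c in (0, 1)}
    return 0 if dist[0] <= dist[1] else 1


def error_of(model, X, y):
    return sum(predict(model, x) != lab for x, lab in zip(X, y)) / len(y)


def cv_error(X, y, keep, folds, rng):
    """K-fold error with feature selection redone inside every training fold."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    wrong = 0
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train], keep)
        wrong += sum(predict(model, X[i]) != y[i] for i in test)
    return wrong / len(y)


rng = random.Random(0)
X, y = make_noise(n_samples=100, n_features=500, rng=rng)

apparent = error_of(fit(X, y, keep=10), X, y)           # same data for fit and test
crossval = cv_error(X, y, keep=10, folds=10, rng=rng)   # held-out samples only

print(f"Apparent (training) error: {apparent:.2f}")
print(f"10-fold CV error:          {crossval:.2f}")
```

On noise like this the cross-validated error should sit near chance while the training error looks much better, which is the same pattern as a good-looking fit on the full data combined with poor perf() results.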

Kim-Anh