Understanding AUC results for DIABLO integration

8Rocco8 · September 26, 2024, 11:51am

Hi,
I am using DIABLO to integrate data which includes four blocks (different datasets).

When I run auroc(myDiablo), I get the following AUC values for the four blocks:

0.95
0.69
0.67
0.63

However, running the perf function with auc = TRUE gives an AUC of 0.77 for my entire DIABLO model.

I do not understand how one block on its own can have such a high AUC, while combining all the blocks in my DIABLO model lowers the AUC. This makes it seem like combining the datasets actually lowers the performance of the model. Shouldn’t combining my blocks increase performance? Is there something I am completely misinterpreting here?

Thanks a lot to the whole mixOmics team for the amazing package!

kimanh.lecao · October 4, 2024, 9:09am

hi @8Rocco8,

If you go through the (extensive) details in the help files for both functions, you will see that:

AUROC is calculated on the training data set itself (assuming you have not provided new data):

If newdata is not provided, AUROC is calculated from the training data set, and may result in overfitting (too optimistic results).
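A toy illustration of this overfitting point, written in Python purely as a sketch: it is not mixOmics code, and the simulated data, the nearest-centroid scorer and all helper names (`auc`, `centroid_score`) are hypothetical. The features are pure noise, so an honest estimate should give an AUC around 0.5. Scoring each sample with a model fitted on that same sample (the analogue of calling auroc() without newdata) still gives an AUC close to 1, whereas leave-one-out scoring does not.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random class-1 sample outscores a random class-0 sample (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def centroid_score(x, rows, labels):
    """Hypothetical scorer: positive when x is closer to the class-1 centroid than to the class-0 centroid."""
    c0 = centroid([r for r, y in zip(rows, labels) if y == 0])
    c1 = centroid([r for r, y in zip(rows, labels) if y == 1])
    return sq_dist(x, c0) - sq_dist(x, c1)


random.seed(1)
n_per_class, n_features = 10, 200
labels = [0] * n_per_class + [1] * n_per_class
# Pure noise: the features carry no information about the class labels.
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in labels]

# Training-set (resubstitution) scores: each sample helped build the centroids it is scored against.
train_scores = [centroid_score(x, X, labels) for x in X]

# Leave-one-out scores: each sample is scored by centroids fitted without it.
loo_scores = []
for i, x in enumerate(X):
    rows = X[:i] + X[i + 1:]
    ys = labels[:i] + labels[i + 1:]
    loo_scores.append(centroid_score(x, rows, ys))

train_auc = auc(train_scores, labels)
loo_auc = auc(loo_scores, labels)
print(f"training-set AUC:  {train_auc:.2f}")
print(f"leave-one-out AUC: {loo_auc:.2f}")
```

With many features and few samples, each sample pulls its own class centroid towards itself, which is enough to make the training-set ranking look nearly perfect even though there is no real signal.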

perf() averages the results over several repeats of cross-validation.

Note also that auroc() still calculates the AUC from the integrated DIABLO output, but separately for each omics data set (block), whereas perf() ends up averaging all AUC values.
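A minimal sketch of the averaging idea, again in Python with simulated data and hypothetical helpers rather than mixOmics internals. Within each repeat the samples are split into stratified folds, every sample is scored by a model that did not see it, one AUC is computed from those out-of-fold scores, and the reported figure is the mean over repeats. Exactly how perf() pools predictions, and how it combines blocks, is described in its help file; this sketch assumes per-repeat pooling only.

```python
import random
import statistics


def auc(scores, labels):
    """Probability that a random class-1 sample outscores a random class-0 sample (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_and_score(train_X, train_y, test_X):
    """Hypothetical nearest-centroid model: score > 0 means closer to the class-1 centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    c0 = centroid([r for r, y in zip(train_X, train_y) if y == 0])
    c1 = centroid([r for r, y in zip(train_X, train_y) if y == 1])
    return [sq(x, c0) - sq(x, c1) for x in test_X]


def stratified_folds(labels, k, rng):
    """Assign sample indices to k folds so that every fold contains both classes."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv_auc(X, labels, k=3, n_repeat=10, seed=0):
    """Per repeat: pool the out-of-fold scores and compute one AUC; then average over repeats."""
    rng = random.Random(seed)
    repeat_aucs = []
    for _ in range(n_repeat):
        oof = [0.0] * len(labels)  # every index is filled because the folds partition the samples
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            scores = fit_and_score([X[i] for i in train_idx],
                                   [labels[i] for i in train_idx],
                                   [X[i] for i in test_idx])
            for i, s in zip(test_idx, scores):
                oof[i] = s
        repeat_aucs.append(auc(oof, labels))
    return statistics.mean(repeat_aucs), repeat_aucs


data_rng = random.Random(42)
labels = [0] * 15 + [1] * 15
# Class-1 samples are shifted by +1 in every feature, so there is real signal to detect.
X = [[data_rng.gauss(float(y), 1.0) for _ in range(20)] for y in labels]

mean_auc, repeat_aucs = repeated_cv_auc(X, labels)
print("AUC per repeat:", [round(a, 2) for a in repeat_aucs])
print(f"mean AUC over repeats: {mean_auc:.2f}")
```

Because each repeat uses a different random split, the per-repeat AUCs vary slightly, and their mean is a different quantity from a single AUC computed on the training data.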

Also (as we discussed in the DIABLO paper), integration does not necessarily increase performance.

And finally (see the auroc help file):

The ROC and AUC are calculated based on the predicted scores obtained from the predict function applied to the multivariate methods (predict(object)$predict). Our multivariate supervised methods already use a prediction threshold based on distances (see predict) that optimally determine class membership of the samples tested. As such AUC and ROC are not needed to estimate the performance of the model (see perf, tune that report classification error rates). We provide those outputs as complementary performance measures.
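To make the distinction in the help text concrete: an error rate depends on a specific decision rule, while the AUC summarises how well the scores rank the two classes over all possible thresholds. The Python sketch below uses made-up scores, and a hypothetical fixed score threshold stands in for mixOmics's distance-based prediction rule.

```python
def auc(scores, labels):
    """Fraction of (class-1, class-0) pairs in which the class-1 sample has the higher score (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold):
    """Fraction of samples misclassified when class 1 is predicted for score > threshold."""
    wrong = sum((s > threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90, 0.45, 0.60]

print(auc(scores, labels))               # 0.8125: 13 of 16 class-1/class-0 pairs ranked correctly
print(error_rate(scores, labels, 0.5))   # 0.25: one class-0 and one class-1 sample misclassified
print(error_rate(scores, labels, 0.85))  # 0.375: a different rule, same scores, same AUC
```

Moving the threshold changes the error rate while the AUC stays at 0.8125, which is why the two measures can tell different stories about the same model.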

I hope that helps!

Kim-Anh

8Rocco8 · October 14, 2024, 7:56am

Dear Kim-Anh,

Thank you for the answer and clarification.

Thank you again for the mixOmics package.

Best regards
Rocco

Related topics

AUC for the entire model in DIABLO (Support) · 1 reply · 1068 views · January 23, 2020
AUC for DIABLO object (Bugs) · 15 replies · 1511 views · May 3, 2024
Cross-validated AUC in DIABLO · 2 replies · 627 views · April 15, 2020
Identify dataset that best segregates samples and loading values interpretation across data sets, and error rate per data set (Analysis) · 3 replies · 465 views · October 27, 2020
AUC ROC Diablo for combined data sets (Support) · 0 replies · 20 views · July 25, 2024