Understanding AUC results for DIABLO integration

Hi,
I am using DIABLO to integrate data which includes four blocks (different datasets).

When I run auroc(myDiablo) for each of the four blocks, I get the following AUC values:

  • 0.95
  • 0.69
  • 0.67
  • 0.63

However, running the perf function with auc = TRUE on my entire DIABLO model gives an AUC of 0.77.

I do not understand how one of my blocks can have such a high AUC on its own while the AUC drops when I combine all the blocks in my DIABLO model. This makes it seem like combining the datasets actually lowers the performance of the model. Shouldn’t the combination of my blocks increase performance? Is there something I am completely misinterpreting here?

Thanks a lot to the whole mixOmics team for the amazing package!

hi @8Rocco8,

If you go through the (extensive) details in the help files for both functions, you will see that:

  • auroc() is calculated on the training data set itself (assuming you have not provided new data):

If newdata is not provided, AUROC is calculated from the training data set, and may result in overfitting (too optimistic results).

  • perf() averages its results over the cross-validation folds and repeats
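To make the difference between the two bullets concrete, here is a small self-contained sketch in plain Python. It is not mixOmics code and does not reproduce what auroc() or perf() do internally: the toy mean-difference classifier, the simulated noise data, and all names in it (`fit_direction`, `score`, `auc`) are assumptions chosen only to show how an AUC computed on the training samples can look far better than a cross-validated one.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def fit_direction(X, y):
    """Toy classifier (hypothetical): difference between the two class means."""
    p = len(X[0])
    n1 = sum(y)
    n0 = len(y) - n1
    m1 = [sum(x[j] for x, lab in zip(X, y) if lab == 1) / n1 for j in range(p)]
    m0 = [sum(x[j] for x, lab in zip(X, y) if lab == 0) / n0 for j in range(p)]
    return [a - b for a, b in zip(m1, m0)]

def score(w, x):
    """Project one sample onto the fitted direction."""
    return sum(wj * xj for wj, xj in zip(w, x))

random.seed(1)
n, p = 40, 200                      # few samples, many features, as in omics data
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [i % 2 for i in range(n)]       # labels carry no real signal

# AUC on the same samples used for fitting (analogous to no new data supplied)
w_all = fit_direction(X, y)
train_auc = auc([score(w_all, x) for x in X], y)

# 5-fold cross-validation: each sample is scored by a model that never saw it
folds = [list(range(k, n, 5)) for k in range(5)]
cv_scores = [0.0] * n
for held_out in folds:
    train_idx = [i for i in range(n) if i not in held_out]
    w = fit_direction([X[i] for i in train_idx], [y[i] for i in train_idx])
    for i in held_out:
        cv_scores[i] = score(w, X[i])
cv_auc = auc(cv_scores, y)

print(f"training AUC:        {train_auc:.2f}")
print(f"cross-validated AUC: {cv_auc:.2f}")
```

Because the labels here are pure noise, any gap between the two numbers is overfitting alone. A real block with genuine signal would show a smaller gap, but the direction of the bias is the same.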

Note also that the AUC from auroc() is still calculated on a DIABLO integrated output, but on each omics component, whereas perf() ends up averaging all AUC values.
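As a rough illustration of why an averaged figure sits below the best single block, the four values quoted in the question can be averaged directly. This is only arithmetic on the reported numbers: the plain unweighted mean is an assumption, the exact aggregation inside perf() is not spelled out in this thread, and the four values are training-set AUCs, so the result is not expected to match the cross-validated 0.77.

```python
# Per-block AUC values quoted in the question (from auroc() on the training data).
block_auc = [0.95, 0.69, 0.67, 0.63]

# Assumption: a plain unweighted mean, used here only to illustrate averaging.
mean_auc = sum(block_auc) / len(block_auc)

print(f"best block: {max(block_auc):.2f}")
print(f"mean over blocks: {mean_auc:.3f}")
```

The mean works out to 2.94 / 4 = 0.735, so a single strong block is pulled down by three weaker ones as soon as the values are averaged.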

Also, as we discussed in the DIABLO paper, integration does not necessarily increase performance.

And finally (see help file from AUROC):

The ROC and AUC are calculated based on the predicted scores obtained from the predict function applied to the multivariate methods (predict(object)$predict). Our multivariate supervised methods already use a prediction threshold based on distances (see predict) that optimally determine class membership of the samples tested. As such AUC and ROC are not needed to estimate the performance of the model (see perf, tune that report classification error rates). We provide those outputs as complementary performance measures.

I hope that helps!

Kim-Anh

Dear Kim-Anh,

thank you for the answer and clarification.

Thank you again for the mixOmics package.

best regards
Rocco