I am considering using DIABLO for multi-omics analysis of a dataset with 3 data types, for which I have a total of 41 individuals distributed across three groups. The DIABLO framework seems ideal for what I want to do. However, one of the questions I would like to answer is which of the three data types better segregates the samples across the three treatment groups, i.e. better predicts group membership. Looking at the vignette (https://mixomicsteam.github.io/Bookdown/diablo.html), it seemed to me that it might be possible to do this using the AUROC curves. For example, would it make sense to compare the AUROC curves for component 1 between the mRNA and proteomic data?

Related to this, can the loadings across different data types be directly compared? E.g. if for component 1 my top loading for mRNA is 0.8 and for proteomics it is 0.3, can we say the top loading is higher for mRNA?

We don't recommend using the AUROC to decide which data set gives the best segregation. The perf() function returns classification error rates per class as well as per data set (see ?perf and the output $error.rate). These performance measures are better than the AUROC, which is often inflated (even when we use cross-validation) and not entirely appropriate for reflecting the performance of DIABLO.

Regarding your question about the loadings, the answer is no: loading values are not comparable across data sets, because they depend on the number of features in each data set and on the number of features selected per data set. We recommend you focus on the top features, whatever their loading values.
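
To illustrate why the raw values depend on how many features are selected, here is a minimal Python sketch. It assumes that each loading vector is scaled to unit length (a common convention in sparse PLS-type methods, stated here as an assumption rather than something confirmed in this thread) and that the selected features share the weight equally. The function name is hypothetical.

```python
import math


def equal_weight_loading(n_selected: int) -> float:
    """Absolute loading of each feature when n_selected features share a
    unit-length loading vector equally (illustrative assumption)."""
    if n_selected < 1:
        raise ValueError("n_selected must be at least 1")
    return 1.0 / math.sqrt(n_selected)


# The more features are kept on a component, the smaller each loading can be.
for n in (2, 10, 50):
    print(n, round(equal_weight_loading(n), 3))
# 2 0.707
# 10 0.316
# 50 0.141
```

Under these assumptions, a top loading of 0.8 in one block versus 0.3 in another may say more about how many features were kept in each block than about how discriminative any single feature is.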

How should I interpret these values? E.g. can I say that, based on dataset A on comp1, I get an erroneous classification in 37% of the cases? Moreover, would you recommend any approach to statistically compare the error rates for the same component across datasets? (I guess I could compare $error.rate +/- $error.rate.sd, but I am wondering if there is a better approach.)

How should I interpret these values? E.g. can I say that, based on dataset A on comp1, I get an erroneous classification in 37% of the cases?
Yes, your interpretation is correct: 37% of samples are misclassified based on the components associated with data set A on comp 1, using this prediction distance.
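
As a concrete reading of such a value, the sketch below computes an overall and a per-class misclassification rate from true and predicted labels. The labels are hypothetical and the function is illustrative Python, not part of mixOmics; perf() additionally averages over cross-validation folds and repeats, which is not reproduced here.

```python
from collections import Counter


def error_rates(true_labels, predicted_labels):
    """Return (overall error rate, {class: error rate within that class})."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Number of samples in each true class.
    n_per_class = Counter(true_labels)
    # Number of misclassified samples, counted by their true class.
    wrong_per_class = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t != p
    )
    overall = sum(wrong_per_class.values()) / len(true_labels)
    per_class = {c: wrong_per_class[c] / n for c, n in n_per_class.items()}
    return overall, per_class


# Hypothetical labels for 8 samples in three groups (not real data).
truth = ["G1", "G1", "G1", "G2", "G2", "G2", "G3", "G3"]
preds = ["G1", "G2", "G1", "G2", "G2", "G3", "G3", "G1"]

overall, per_class = error_rates(truth, preds)
print(overall)  # 0.375, i.e. 3 of the 8 samples are misclassified
print(per_class)
```

An overall value of 0.375 here means exactly what the reply above describes: the fraction of samples assigned to the wrong group.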

Moreover, would you recommend any approach to statistically compare the error rates for the same component across datasets? (I guess I could compare $error.rate +/- $error.rate.sd, but I am wondering if there is a better approach.)
I would say they are comparable. Remember that the model fits a set of components associated with each data set so that the covariance is maximised (see our website, which has many resources including a webinar and articles), but the components are ‘comparable’ to each other across data sets. This output just gives you more insight into which data set is more discriminative than the others.
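
The mean +/- sd comparison mentioned in the question can be sketched as follows. This is a descriptive check, not a formal statistical test. The per-repeat error rates are hypothetical, and whether the sample standard deviation used here matches how $error.rate.sd is computed is an assumption.

```python
from statistics import mean, stdev


def summarise(rates):
    """Return (mean, sample standard deviation) of per-repeat error rates."""
    return mean(rates), stdev(rates)


def intervals_overlap(summary_a, summary_b):
    """True if the intervals mean +/- sd of A and of B overlap."""
    (mean_a, sd_a), (mean_b, sd_b) = summary_a, summary_b
    return (mean_a - sd_a) <= (mean_b + sd_b) and (mean_b - sd_b) <= (mean_a + sd_a)


# Hypothetical component-1 error rates from 5 cross-validation repeats.
rates_a = [0.37, 0.39, 0.34, 0.41, 0.36]  # data set A
rates_b = [0.22, 0.24, 0.20, 0.27, 0.21]  # data set B

summary_a = summarise(rates_a)
summary_b = summarise(rates_b)
print("A: mean=%.3f sd=%.3f" % summary_a)
print("B: mean=%.3f sd=%.3f" % summary_b)
print("intervals overlap:", intervals_overlap(summary_a, summary_b))
```

With only a handful of repeats and 41 samples, non-overlapping intervals are suggestive rather than conclusive, so such a comparison is best read alongside the per-class error rates.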