Integration of 2 data sets with DIABLO

[from Hao]

I have a question on using DIABLO on transcriptome and metabolome data.

We obtained transcriptome data for three conditions some years ago, and we have now run a metabolome analysis for the same conditions. Can I use DIABLO to integrate these two omics data sets? If not, can the two data sets be integrated in some other way to reveal correlations between genes and metabolites?

Thanks, we would greatly appreciate your response.

Dear Hao,

We recommend reading these two posts: "How to link data in DIABLO" and "Choosing Diablo Design Matrix".

You could first run PLS or sparse PLS on your two data sets before moving to a DIABLO analysis. Also consider applying PLS-DA and sPLS-DA to each data set individually for a discriminant analysis. Together these will help you understand the correlation structure (PLS) and the discriminative power (PLS-DA) of your data before you move to DIABLO. See the bookdown vignette for some examples.


Thanks for your response! I have read the two posts and now understand how to use the DIABLO method. However, I am still confused about one point.

In my data sets, the transcriptome data cover 5 conditions with 3 replicates each, and the metabolome data cover the same 5 conditions with 6 replicates each. Although the treatment is the same, the cells used are not the same for the two omics. Could this affect the analysis, and could we randomly match transcriptome and metabolome samples within the same condition? (For example, transcriptome-condition1-rep1 matched with metabolome-condition1-rep2.)

Or should I do some random sampling from the distribution of the omics data?

@kimanh.lecao thanks for your response

Hi @wh960823,
Most of our methods, except MINT, assume that the samples are matched across data sets. Since you are working with cells, you could depart from this assumption by randomly sampling 3 of the 6 metabolome replicates in each condition and then assessing whether the results are similar across different random subsamples of 3. That comparison could be based, for example, on the correlation between the variates of your PLS models from one subsample to another.
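
The subsampling check can be sketched roughly as follows. This is only an illustration in plain Python, not mixOmics code: the toy data, the helper names (`center_columns`, `first_pls_variate`, `pearson`, `subsample_rows`) and the choice of 5 repeats are all assumptions made for the example, and the first PLS component is approximated here by power iteration on the cross-product matrix rather than by the mixOmics `pls()` function. In practice you would fit your usual PLS model in R on each subsample and compare the resulting variates in the same way.

```python
import random
from statistics import mean


def center_columns(rows):
    """Subtract each column's mean from its values."""
    col_means = [mean(col) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, col_means)] for row in rows]


def first_pls_variate(X, Y, n_iter=200):
    """X variate of the first PLS component (power iteration on X^T Y)."""
    X, Y = center_columns(X), center_columns(Y)
    n, p, q = len(X), len(X[0]), len(Y[0])
    # Cross-product matrix M = X^T Y (p x q)
    M = [[sum(X[i][j] * Y[i][k] for i in range(n)) for k in range(q)]
         for j in range(p)]
    u = [1.0] * p  # starting weight vector for X
    for _ in range(n_iter):
        v = [sum(M[j][k] * u[j] for j in range(p)) for k in range(q)]  # M^T u
        u = [sum(M[j][k] * v[k] for k in range(q)) for j in range(p)]  # M v
        norm = sum(a * a for a in u) ** 0.5
        u = [a / norm for a in u]
    # Variate = projection of each (centred) X sample onto the weight vector
    return [sum(X[i][j] * u[j] for j in range(p)) for i in range(n)]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def subsample_rows(conditions, k, rng):
    """Pick k row indices at random within each condition, in condition order."""
    chosen = []
    for cond in sorted(set(conditions)):
        idx = [i for i, c in enumerate(conditions) if c == cond]
        chosen.extend(rng.sample(idx, k))
    return chosen


# Toy data (hypothetical): 5 conditions, 3 transcriptome and 6 metabolome replicates
rng = random.Random(0)
conditions_x = [c for c in range(5) for _ in range(3)]
conditions_y = [c for c in range(5) for _ in range(6)]
X = [[c * w + rng.gauss(0, 0.1) for w in (1.0, 0.5, -0.8, 0.2)]
     for c in conditions_x]
Y = [[c * w + rng.gauss(0, 0.1) for w in (0.7, -1.0, 0.3)]
     for c in conditions_y]

# Fit one model per random subsample of 3 metabolome replicates per condition
n_repeats = 5
variates = []
for _ in range(n_repeats):
    rows = subsample_rows(conditions_y, 3, rng)
    Y_sub = [Y[i] for i in rows]
    variates.append(first_pls_variate(X, Y_sub))

# Compare the transcriptome variates between every pair of subsamples.
# The sign of a component is arbitrary, so the absolute correlation is used.
correlations = [abs(pearson(variates[a], variates[b]))
                for a in range(n_repeats) for b in range(a + 1, n_repeats)]
print(min(correlations), max(correlations))
```

Because the transcriptome samples are the same in every fit, their variates can be compared directly between subsamples. Correlations close to 1 across repeats would suggest that the arbitrary within-condition pairing has little influence on the result; clearly lower values would suggest the opposite.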