Design matrix between omics datasets?

I don’t know how to obtain the design matrix using the PLS method mentioned in this sentence: “The latter approach uses PLS method implemented in mixOmics that models pair-wise associations between omics datasets.”

The plot is below.

It comes from this article: “DIABLO-an integrative, multi-omics, multivariate method for multi-group classification” (https://www.biorxiv.org/node/18589)

Hi @XiaoFie,

Thanks for using mixOmics.

You can use the plotDiablo function from mixOmics to produce a similar visualisation for each component. See example below.

library(mixOmics)
data('breast.TCGA')
Y = breast.TCGA$data.train$subtype

data = list(mrna = breast.TCGA$data.train$mrna,
            mirna = breast.TCGA$data.train$mirna,
            prot = breast.TCGA$data.train$protein)

# set number of component per data set
ncomp = 3
# set number of variables to select, per component and per data set (arbitrarily set)
list.keepX = list(mrna = rep(20, 3), mirna = rep(10,3), prot = rep(10,3))

# set up a full design where every block is connected
design = matrix(1, ncol = length(data), nrow = length(data),
                dimnames = list(names(data), names(data)))
diag(design) =  0
design

BC.diablo = block.splsda(X = data, Y = Y, ncomp = ncomp, keepX = list.keepX, design = design)
## Look at pairwise correlations of component 1
plotDiablo(BC.diablo, ncomp = 1)

Please let us know if you have any other questions.

Best wishes,

Al

Does it mean that you use the pairwise correlations of component 1 (in a full design) as the design matrix to construct the new DIABLO model?

Thank you very much!

But don’t I need the design matrix before calling “block.splsda”?

I don’t know. I think the design matrix has to be built before the DIABLO model is fitted, but I don’t know how to do that.

Hi @XiaoFie
Please see the two detailed tutorials:
http://mixomics.org/mixdiablo/case-study-tcga/

And https://mixomicsteam.github.io/Bookdown/

We have developed these tutorials to help users as best as we could.

Kim-Anh

I have read them carefully and still find the description confusing.

Parameters tuning.

The first parameter to tune is the design matrix C, which can be determined using either prior biological knowledge, or a data-driven approach. The latter approach uses PLS method implemented in mixOmics that models pair-wise associations between omics datasets. If the correlation between the first component of each omics dataset is above a given threshold (e.g. 0.8) then a connection between those datasets is included in the DIABLO design.

So, does this mean that I first construct a DIABLO model without setting a design matrix, plot the graph showing the correlation between the first components of each omics dataset, and then use those correlations as the design matrix in the final DIABLO model?

That is my understanding of the procedure. Please let me know if my understanding is correct. Thanks a lot!

Best,
Martina

Hi @martina,

A PLS (or sparse PLS) is a different method from DIABLO. You can have a look at this post and my answer on how you could use PLS to then create your design matrix to input into DIABLO.
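
For readers following along, here is a minimal sketch of the data-driven approach quoted from the paper: fit a one-component PLS on each pair of blocks, correlate the two first components, and connect the blocks whose correlation exceeds a threshold. It is not necessarily what the linked post describes. The helper names `pairwise_pls_cor` and `design_from_cor`, the `weight` argument, and the use of the absolute correlation are assumptions made for illustration; only `pls()`, its `variates` output and the `breast.TCGA` data come from mixOmics.

```r
# Hypothetical helper (not part of mixOmics): correlation between the first
# PLS components for every pair of blocks.
pairwise_pls_cor <- function(blocks) {
  n <- length(blocks)
  cor_mat <- diag(1, n)
  dimnames(cor_mat) <- list(names(blocks), names(blocks))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # One-component PLS between block i and block j
      fit <- mixOmics::pls(blocks[[i]], blocks[[j]], ncomp = 1)
      r <- cor(fit$variates$X[, 1], fit$variates$Y[, 1])
      cor_mat[i, j] <- cor_mat[j, i] <- r
    }
  }
  cor_mat
}

# Hypothetical helper (not part of mixOmics): connect two blocks when the
# absolute correlation is above the threshold (0.8 is the example value
# quoted from the paper). Taking abs() is an assumption of this sketch.
design_from_cor <- function(cor_mat, threshold = 0.8, weight = 1) {
  stopifnot(is.matrix(cor_mat), nrow(cor_mat) == ncol(cor_mat))
  design <- ifelse(abs(cor_mat) > threshold, weight, 0)
  diag(design) <- 0  # a block is never connected to itself
  design
}

# Example on the breast.TCGA data; runs only if mixOmics is installed.
if (requireNamespace("mixOmics", quietly = TRUE)) {
  data("breast.TCGA", package = "mixOmics")
  blocks <- list(mrna = breast.TCGA$data.train$mrna,
                 mirna = breast.TCGA$data.train$mirna,
                 prot = breast.TCGA$data.train$protein)
  cor_mat <- pairwise_pls_cor(blocks)
  design <- design_from_cor(cor_mat, threshold = 0.8)
  print(cor_mat)
  print(design)
}
```

The resulting `design` would then be passed to `block.splsda(..., design = design)` in place of the full design used in the earlier example, so the design matrix exists before the DIABLO model is fitted.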

Kim-Anh