Dear mixOmics members,
First of all, I would like to thank the mixOmics team for their amazing work!
I’m a new user of this tool and I really appreciate working with it: it is easy to use, and it contains many dimension reduction methods to integrate multi-omics data, along with plots to visualise the outputs.
I’m especially interested in unsupervised N-integration, which is done with the method block.pls. My goal is to find correlations (or links) between my variables coming from different omics datasets. I started with 2-integration (rcc and canonical pls) and used the CIM plot to show correlations between my variables. I have seen that another CIM version is available for supervised N-integration (block.plsda) but not for unsupervised N-integration (block.pls). Here are my questions:

The 2-integration CIM presents variables from the first omics dataset against variables from the second, so it cannot be generalised to the integration of 3 or more omics datasets. However, I would like to know if it is possible to reuse the computational idea behind CIM with more than 2 omics datasets. Indeed, to create the CIM, mixOmics builds a similarity matrix and then applies hierarchical clustering to it; the resulting values are used to draw the CIM. Can I make a similarity matrix with 3 or more omics datasets then use a hierarchical clustering on it?
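To sketch what I have in mind in base R (with random data standing in for real omics blocks and for block.pls variates, so all names and shapes here are purely illustrative, not mixOmics internals): correlate every variable with its own block’s latent components, stack the correlation rows across blocks, and take the cross-product to obtain one variable-by-variable similarity matrix, which hierarchical clustering can then operate on directly.

```r
set.seed(1)
n <- 20                                   # samples
# three toy omics blocks (5, 4 and 3 variables)
blocks <- list(mRNA  = matrix(rnorm(n * 5), n),
               miRNA = matrix(rnorm(n * 4), n),
               prot  = matrix(rnorm(n * 3), n))
ncomp <- 2
# stand-in for block.pls variates: random latent components per block
variates <- lapply(blocks, function(X) matrix(rnorm(n * ncomp), n))

# correlation of each variable with its block's components,
# stacked over all blocks: (sum of p_i) x ncomp
cors <- do.call(rbind, lapply(names(blocks), function(b)
  cor(blocks[[b]], variates[[b]])))

# generalised similarity matrix over all variables of all blocks
sim <- cors %*% t(cors)

# hierarchical clustering on a distance derived from the similarities
hc <- hclust(as.dist(1 - sim / max(abs(sim))))
```

With real data, `variates` would come from a fitted `block.pls` object; the point is only that once the per-variable correlations are stacked, the downstream similarity-plus-clustering step is block-agnostic.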

The CIM for block.plsda is a more “classic” heatmap, as its rows and columns come directly from the X input data. This plot can also be used to find correlations between omics data, but so far it can only be used with supervised learning. Has anyone thought about creating a CIM version for block.pls?
Best regards,
Emile
Hello there,
I have started creating a cimDiablo version for block.pls, and I have new questions:

I have read that block.pls does unsupervised learning and block.splsda supervised learning. However, both of them take as inputs a matrix ‘X’ and a response matrix ‘Y’. Concerning Y, it seems that the only difference is that block.spls takes continuous variables while block.splsda takes a class vector. Hence, I don’t understand how block.pls can do unsupervised learning with a response matrix as input. This matters for me because I want to do a ‘symmetric’ integration of my 3 omics datasets: I want each dataset to play the same role as the others, without choosing one of them to be the ‘response’ dataset.

I don’t really understand how cimDiablo works. As I read it, cimDiablo draws a heatmap of the inputs instead of the outputs of block.splsda. Indeed, the plot is created by the line ‘cim(XDat, …)’, but XDat corresponds to the input data X, restricted to the selected (sparse) variables and then bounded. I don’t see where the outputs of block.splsda (which, as I understand them, are in ‘block.splsda(…)$variates’ and ‘block.splsda(…)$loadings’) are used, except during the sparse step of cimDiablo…
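To make my reading concrete, here is a minimal base-R sketch of what cimDiablo appears to do as I understand it (toy data and illustrative names only, not the package’s actual implementation): cluster the rows and columns of the concatenated input matrix and reorder it for a heatmap.

```r
set.seed(7)
# toy stand-in for XDat: two concatenated blocks (4 + 3 selected variables)
XDat <- cbind(matrix(rnorm(15 * 4), 15), matrix(rnorm(15 * 3), 15))
colnames(XDat) <- paste0("var", 1:7)

row.hc <- hclust(dist(XDat))       # cluster samples
col.hc <- hclust(dist(t(XDat)))    # cluster variables
ord <- XDat[row.hc$order, col.hc$order]  # reordered matrix for the heatmap
# heatmap(ord, Rowv = NA, Colv = NA)     # drawing step, left commented out
```

Note that nothing in this sketch uses the variates or loadings, which is exactly what puzzles me about the current cimDiablo.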

Because of the previous point, I have created a new cimDiablo version which uses the outputs (’$variates’ and ‘$loadings’) instead of the inputs (’$X’). To do this, I modified the definition of XDatList. Before, it was just the sparse version of X. In my own version, it is:
XDatList <- lapply(seq_along(object$variates), function(i)
  object$variates[[i]][, comp, drop = FALSE] %*% t(object$loadings[[i]][, comp, drop = FALSE]))
Indeed, the dimension reduction assumes that, for each omics dataset, X = variates * t(loadings) + noise, so we can approximate the denoised omics data with variates * t(loadings). I think this version makes better use of block.splsda’s outputs to create heatmaps, but I would like to have your opinion about it. Note that with my dataset, cimDiablo with block.splsda gives unexpected results, while my new cimDiablo version with block.pls gives results closer to what I expect.
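As a toy check of this rank-h reconstruction idea, here is the same principle using base R’s svd() as a stand-in for block.splsda (X ≈ variates %*% t(loadings); all names here are illustrative, not mixOmics code):

```r
set.seed(42)
X <- matrix(rnorm(30 * 8), 30, 8)   # toy data: 30 samples, 8 variables
s <- svd(X)
h <- 2                              # number of components kept
variates <- s$u[, 1:h] %*% diag(s$d[1:h])   # n x h component scores
loadings <- s$v[, 1:h]                       # p x h loading vectors
Xhat <- variates %*% t(loadings)             # rank-h approximation of X

# reconstruction error for a rank-k approximation:
# it can only shrink as k grows, which is what makes Xhat a
# reasonable "denoised" version of X to feed into a heatmap
err <- function(k) {
  V <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], k)
  sum((X - V %*% t(s$v[, 1:k, drop = FALSE]))^2)
}
```

The same reconstruction with a block.splsda object would simply substitute its $variates and $loadings for the SVD factors, as in my XDatList definition above.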
Best regards,
Emile
Dear @emile.mardoc
Thank you for your feedback and suggestions / elements of discussion (and apologies for the delay in answering your post).
We have not worked too much on the outputs of block.spls() so far, as our efforts were mostly focused on block.splsda(), so it is great if you can contribute to this module in some way!
> Can I make a similarity matrix with 3 or more omics datasets then use a hierarchical clustering on it?
Yes, you could use a generalisation of the similarity matrix to get all pairwise similarities between variables, but I wonder if that means you would need a 3D type of heat map. Also, the similarity matrix for N-integration can be extracted from circosPlot, but only for a block.splsda() object.
> Has anyone thought about creating a CIM version for block.pls?
We have thought about it, and it is simple to do; our time is just the limiting factor here!
> I have read that block.pls does unsupervised learning and block.splsda supervised learning. However, both of them take as inputs a matrix ‘X’ and a response matrix ‘Y’. Concerning Y, it seems that the only difference is that block.spls takes continuous variables while block.splsda takes a class vector.
We are still in a PLS type of framework, where we consider Y as a continuous response matrix. If you would like to be completely unsupervised, with no response variable, have a look at the methods wrapper.sgcca() / wrapper.rgcca(). Those are wrappers from the RGCCA package, but there is not a single visualisation available for them, sorry!
> For me, cimDiablo does a heatmap of the inputs instead of the outputs of block.splsda.
Your interpretation is correct. cimDiablo is just a heat map of the input data (plus some distance / linkage methods for the hierarchical clustering), showing only the selected variables, with a coloured bar to indicate the grouping of the samples and the types of variables, nothing more. What you propose is a sort of extension: a reconstruction of the matrix (rank-h reconstruction) based on the components and loading vectors for the selected variables only. It could be nice for us to investigate this further with our developer @aljabadi. If you are happy to share reproducible code plus a qualitative comparison of the classic heat map vs your approach on the breast cancer data set, we are happy to look into this more closely; contact us at mixomics-devel@math.univ-toulouse.fr!
Kim-Anh