CIM for blockpls?

emile.mardoc · August 27, 2020, 2:29pm

Dear mixOmics members,

First of all, I would like to thank the mixOmics team for their amazing work!

I’m a new user of this tool and I really appreciate working with it: it is easy to use, and it offers many dimension reduction methods for integrating multi-omics data, along with plots to display the outputs.

I’m especially interested in unsupervised N-integration, which is done with the method block.pls. My goal is to find correlations (or links) between my variables coming from different omics data. I started with 2-integrations (rcc and canonical pls) and used the CIM plot to show correlations between my variables. I have seen that another CIM version was available for supervised N-integration (block.plsda) but not for unsupervised N-integration (block.pls). Here are my questions:

The 2-integration’s CIM presents variables from the first omics dataset against variables from the second. Hence it cannot be generalized directly to the integration of 3 or more omics datasets. However, I would like to know if it is possible to use the computational idea behind CIM with more than 2 omics datasets. Indeed, to create the CIM, mixOmics builds a similarity matrix and then applies hierarchical clustering to it. This gives the values used to create the CIM. Can I make a similarity matrix with 3 or more omics datasets and then use hierarchical clustering on it?
CIM for blockplsda is a more “classic” heatmap, as its rows and columns are those of the X input data. This plot can also be used to find correlations between omics data, but so far it can only be used with supervised learning. Has anyone thought about creating a CIM version for blockpls?

Best regards,
Emile

emile.mardoc · August 31, 2020, 9:53am

Hello there,

I have started creating a cimDiablo version for block.pls, and I have new questions:

I have read that block.pls does unsupervised learning and block.splsda supervised learning. However, both of them take as inputs a matrix ‘X’ and a response matrix ‘Y’. Concerning Y, it seems that the only difference is that block.spls takes continuous variables whereas block.splsda takes a class vector. Hence, I don’t understand how block.pls can do unsupervised learning with a response matrix as input. This matters to me because I want to do a ‘symmetric’ integration of my 3 omics datasets: each dataset should play the same role as the others, and I do not want to choose one of them to be the ‘response’ dataset.
I don’t really understand how cimDiablo works. For me, cimDiablo does a heatmap of the inputs instead of the outputs of block.splsda. Indeed, the plot is created by the line ‘cim(XDat, …)’, but XDat corresponds to the input data X, reduced to the selected variables and then column-bound. I don’t see where the outputs of block.splsda (which for me are in ‘block.splsda(…)$variates’ and ‘block.splsda(…)$loadings’) are used, except during the variable-selection part of cimDiablo…
Because of the previous point, I have created a new cimDiablo version which now uses the outputs (‘$variates’ and ‘$loadings’) instead of the inputs (‘$X’). To do so, I modified the definition of XDatList. Before, it was just the sparse version of X (selected variables only). In my own version, it is:
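# lapply keeps one matrix per block; drop = FALSE keeps a single component as a matrix
XDatList <- lapply(seq_along(object$variates), function(i) object$variates[[i]][, comp, drop = FALSE] %*% t(object$loadings[[i]][, comp, drop = FALSE]))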
Indeed, the dimension reduction assumes that, for each omics dataset, X = variates * loadings + noise, so we can approximate the noise-free omics data by computing variates * loadings. I think this version makes better use of block.splsda’s outputs to create heatmaps, but I would like to have your opinion on it. Note that with my dataset, cimDiablo with block.splsda gives unexpected results, whereas my new cimDiablo version with block.pls gives results closer to what I expected.
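
To make the reconstruction idea concrete, here is a minimal base-R sketch on simulated data. It does not call block.pls or block.splsda: the `variates` and `loadings` matrices are stand-ins obtained from an SVD (an assumption made so the example is self-contained), and `X`, `h` and `X_hat` are illustrative names only.

```r
set.seed(42)

# Toy centred data block: 30 samples x 8 variables (simulated, not real omics data)
X <- scale(matrix(rnorm(30 * 8), nrow = 30, ncol = 8), scale = FALSE)

h <- 2  # number of components kept

# Stand-ins for object$loadings[[i]] and object$variates[[i]] (assumption: SVD-based)
sv       <- svd(X)
loadings <- sv$v[, seq_len(h), drop = FALSE]  # variables x h
variates <- X %*% loadings                    # samples x h

# Rank-h reconstruction: same dimensions as X, but built from h components only
X_hat <- variates %*% t(loadings)
resid <- X - X_hat

# Share of the total sum of squares captured by the reconstruction
explained <- 1 - sum(resid^2) / sum(X^2)
print(round(explained, 3))
```

With SVD stand-ins the product is the best rank-h approximation of X in the least-squares sense; whether the same reading holds for the weight vectors returned by block.pls is not something this sketch checks. A matrix such as `X_hat` could then be passed to a heat map function in place of the input data.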

Best regards,
Emile

kimanh.lecao · September 14, 2020, 12:06am

Dear @emile.mardoc
Thank you for your feedback, suggestions and discussion points (and apologies for the delay in answering your post).

We have not worked too much on the outputs of block.spls() so far as our efforts were mostly focused on block.splsda(), so it is great if you can contribute to this module in some way!

Can I make a similarity matrix with 3 or more omics datasets and then use hierarchical clustering on it?

Yes, you could use a generalisation of the similarity matrix to get all pairwise similarities between variables, but I wonder if that means you would need a 3D type of heat map. Also, the similarity matrix for N-integration can be extracted from circosPlot, but only for a block.splsda() object.
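
As a rough illustration of "all pairwise similarities", the base-R sketch below stacks the variables of three simulated blocks and clusters one square similarity matrix. Note the substitution: it uses plain Pearson correlation on the raw variables, not the component-based similarity that mixOmics computes, and the block names (`rna`, `protein`, `lipid`) and sizes are made up for the example.

```r
set.seed(1)
n <- 20  # samples shared by all blocks

# Three hypothetical omics blocks measured on the same samples (simulated)
blocks <- list(
  rna     = matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("rna_", 1:5))),
  protein = matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("prot_", 1:4))),
  lipid   = matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, paste0("lip_", 1:3)))
)

# Stack all variables side by side, then compute every pairwise similarity
all_vars <- do.call(cbind, unname(blocks))  # n x 12
sim      <- cor(all_vars)                   # 12 x 12, within- and between-block pairs

# Remember which block each variable comes from (used for the colour bar)
block_of   <- rep(names(blocks), times = vapply(blocks, ncol, integer(1)))
block_cols <- unname(c(rna = "steelblue", protein = "orange", lipid = "grey40")[block_of])

# Hierarchical clustering on a correlation-based distance
hc <- hclust(as.dist(1 - abs(sim)), method = "average")

# One symmetric 2D heat map covering all blocks
heatmap(sim, Rowv = as.dendrogram(hc), symm = TRUE, scale = "none",
        RowSideColors = block_cols, ColSideColors = block_cols)
```

Under these assumptions a single symmetric matrix with a block colour bar stays two-dimensional however many blocks are stacked, at the cost of mixing within-block and between-block similarities in the same plot.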

Has anyone thought about creating a CIM version for blockpls?

We have thought about it, and it is simple to do. Our time is just the limiting factor here!

I have read that block.pls does unsupervised learning and block.splsda supervised learning. However, both of them take as inputs a matrix ‘X’ and a response matrix ‘Y’. Concerning Y, it seems that the only difference is that block.spls takes continuous variables whereas block.splsda takes a class vector.

We are still in a PLS type of framework, where we consider a continuous Y matrix as the response matrix. If you would like to be completely unsupervised, with no response variable, have a look at the methods wrapper.sgcca() / wrapper.rgcca(). Those are wrappers from the RGCCA package, and there is not a single visualisation available for them, sorry!
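
A minimal sketch of what a symmetric set-up could look like, assuming three simulated blocks named `omics1` to `omics3` (hypothetical names). The design matrix is plain base R; the call to wrapper.rgcca() only runs if mixOmics is installed, and the `tau` and `ncomp` values are arbitrary choices for illustration, not recommendations.

```r
set.seed(3)
n       <- 15
samples <- paste0("s", seq_len(n))

# Three hypothetical blocks on the same samples (simulated data)
blocks <- list(
  omics1 = matrix(rnorm(n * 6), n, 6, dimnames = list(samples, paste0("a", 1:6))),
  omics2 = matrix(rnorm(n * 5), n, 5, dimnames = list(samples, paste0("b", 1:5))),
  omics3 = matrix(rnorm(n * 4), n, 4, dimnames = list(samples, paste0("c", 1:4)))
)

# Full design: every block is connected to every other block, none to itself,
# so no block is singled out as the response
design <- 1 - diag(length(blocks))
print(design)

# Guarded call: skipped when mixOmics is not available
if (requireNamespace("mixOmics", quietly = TRUE)) {
  fit <- mixOmics::wrapper.rgcca(X = blocks, design = design,
                                 tau = rep(1, length(blocks)), ncomp = 2)
  print(names(fit$variates))
}
```

The point of the sketch is only the design matrix: all off-diagonal entries are equal, which is one way to express that each dataset has the same role as the others.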

For me, cimDiablo does a heatmap of the inputs instead of the outputs of block.splsda.

Your interpretation is correct. cimDiablo is just a heat map of the input data (plus some distance / linkage methods for the hierarchical clustering) that shows only the selected variables, with coloured bars to indicate the grouping of the samples and the types of variables, nothing more. What you propose is a sort of extension: a reconstruction of the matrix (rank-h reconstruction) based on the components and loading vectors, for the selected variables only. It could be nice for us to investigate this further with our developer @aljabadi. If you are happy to share reproducible code plus a qualitative comparison of the classic heat map vs your approach on the breast cancer data set, we are happy to look into this more closely; contact us at mixomics-devel@math.univ-toulouse.fr!
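
For readers following along, the "classic" behaviour described above can be sketched in base R as follows. This is not the cimDiablo source: the data, the sparse `loadings` (zeros meaning "not selected"), the sample groups and all object names are invented for the example, and base `heatmap()` stands in for `cim()`.

```r
set.seed(7)
n       <- 12
samples <- paste0("s", seq_len(n))

# Two hypothetical input blocks (simulated)
X <- list(
  mrna = matrix(rnorm(n * 6), n, 6, dimnames = list(samples, paste0("mrna_", 1:6))),
  prot = matrix(rnorm(n * 5), n, 5, dimnames = list(samples, paste0("prot_", 1:5)))
)

# Hypothetical sparse loadings (variables x components); zero = variable not selected
loadings <- list(
  mrna = cbind(comp1 = c(0.8, 0, -0.6, 0, 0, 0), comp2 = c(0, 0, 0, 0.7, 0, 0.7)),
  prot = cbind(comp1 = c(0, 0.9, 0, 0, -0.4),    comp2 = c(0, 0, 0, 0, 0))
)
comp <- 1

# Keep variables with a non-zero loading on the chosen component(s)
keep <- lapply(loadings, function(l) rowSums(abs(l[, comp, drop = FALSE])) > 0)

# Subset each input block, then bind the blocks column-wise
X_sel <- do.call(cbind, unname(Map(function(x, k) x[, k, drop = FALSE], X, keep)))

# Colour bars: variable type (columns) and a made-up sample grouping (rows)
block_of <- rep(names(X), times = vapply(keep, sum, integer(1)))
var_cols <- unname(c(mrna = "steelblue", prot = "orange")[block_of])
grp_cols <- rep(c("grey30", "firebrick"), each = n / 2)

heatmap(X_sel, scale = "none", ColSideColors = var_cols, RowSideColors = grp_cols)
```

In this sketch the loadings are used only to decide which columns to keep; the values plotted are the input data, which is the distinction discussed in this thread.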

Kim-Anh
