What exactly are the values used by CIM?

Hey,
Thank you very much for the tremendous work on the mixOmics suite.
I have been digging through the documentation for a while now and even tried to read the source code of the cim and cimDiablo functions, but I still don't quite understand what the plotted values correspond to.
Are these correlations or distances? Are they between samples, or also between features? And finally, when using multiple components, how is the information from the different dimensions combined into one value?

Sorry if this has already been asked and answered somewhere, but I failed to find it myself.
Thanks again for your work and help.
Cheers!

Hi @George

Apologies for the late answer.

cimDiablo() is a simple heat map of the data showing the variables selected for a given component (see 6 N-Integration | mixOmics vignette). For this function you need to specify exactly which component is used to identify the selected variables.

cim() is a bit more advanced, as we calculate the association matrix M based on the correlation of the variables with the components. I am sure you have come across the explanation here: cim() | mixOmics

The case of two omics data sets in particular is explained in more detail here: González I., Lê Cao K.-A., Davis, M.D. and Déjean S. (2013) Insightful graphical outputs to explore relationships between two ‘omics’ data sets. BioData Mining 5:19
Across several components, we ‘add’ these correlations. This is why I prefer to refer to this matrix as ‘association’ rather than ‘correlation’, as the summed values can go beyond 1 (or below -1).
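To make the ‘adding’ concrete, here is a minimal numeric sketch. It assumes that the association between an X variable and a Y variable is the sum, over the chosen components, of the product of each variable's correlation with that component. That is one reading of the description above, not a copy of the mixOmics code. The function names and the toy vectors are hypothetical, and the two toy components are deliberately correlated with each other so that the sum visibly exceeds 1.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def association_matrix(x_vars, y_vars, components):
    """Assumed form: M[i][j] = sum over components t of cor(x_i, t) * cor(y_j, t)."""
    return [
        [sum(pearson(x, t) * pearson(y, t) for t in components) for y in y_vars]
        for x in x_vars
    ]

# Toy data for five samples (illustrative only, not from a fitted model).
t1 = [1, 2, 3, 4, 5]   # hypothetical component 1
t2 = [1, 2, 3, 5, 4]   # hypothetical component 2, correlated with t1
x = [1, 2, 3, 4, 5]    # one X variable
y = [2, 4, 6, 8, 10]   # one Y variable

one_comp = association_matrix([x], [y], [t1])
two_comp = association_matrix([x], [y], [t1, t2])

print(round(one_comp[0][0], 2))  # 1.0
print(round(two_comp[0][0], 2))  # 1.81
```

With a single component the value stays within [-1, 1] like an ordinary product of correlations; with two components the summed value reaches 1.81 here, which is why ‘association’ is a safer label than ‘correlation’.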

Kim-Anh


Hi,

Thanks for this answer.
So if I get it right, cimDiablo(diablo.obj, comp=c(1,2)) returns a heat map of the raw (centered) data, leaving out the variables that were not selected on either comp 1 or comp 2?

If that’s right, why can’t we do the same as with cim() and plot the association matrix from a fitted DIABLO object?

Thanks again for taking the time to explain this to me!
Cheers!

Hi @George

Regarding cimDiablo(diablo.obj, comp=c(1,2)): reading the help file, I think we expect only one value, so the code is likely to show only the variables selected on comp 1. You can check by trying comp = 1 and comp = c(1,2).

Going from cimDiablo to cim would be difficult because people are integrating several omics data sets, and you would need to evaluate every pairwise association. The heat map would be big.
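As a rough illustration of how quickly this grows, the sketch below counts the pairs of data sets and the number of heat map cells needed to show every pairwise association. The block names and the numbers of selected variables are hypothetical, chosen only for the arithmetic.

```python
from itertools import combinations

# Hypothetical blocks and numbers of selected variables (illustrative only).
selected = {"mRNA": 30, "miRNA": 20, "protein": 25}

# Every unordered pair of blocks needs its own association matrix.
pairs = list(combinations(selected, 2))

# Each pairwise matrix has (variables in block a) x (variables in block b) cells.
cells = {(a, b): selected[a] * selected[b] for a, b in pairs}

print(len(pairs))           # 3
print(sum(cells.values()))  # 1850
```

With k blocks there are k(k-1)/2 pairs, so the number of panels grows quadratically with the number of data sets.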

This is why we propose the circos plot instead, or the variable plot plotVar().

Kim-Anh

Ok, very clear now.
Thanks a lot for your answers!
FYI, I tried comp=1 and comp=c(1,2) and it does indeed work: I get different heat maps, showing the variables selected on comp 1 only, or on comp 1 and comp 2.

Cheers,
George
