Correlation matrix not equal between cim and network analysis

Hello,

I am evaluating a partial least squares canonical analysis that I produced with pls(X, Y, mode = "canonical"), with the other arguments left at their defaults.

After creating the PLSCA, I tried to create a CIM heatmap and a network plot using the cim() and network() functions. I noticed that the two functions plot different sets of values, despite using the same pls object. I confirmed this by comparing the correlation matrices returned by each function, cim()$mat.cor and network()$M, and seeing that these matrices have different values (i.e. different approximate Pearson correlation coefficients). Based on my reading of the vignette and the Gonzalez et al. 2012 publication, I was under the impression that both functions use the same correlation matrix.

The R documentation for the network() function states that it is used for PLS regressions. Are the two matrices different because network() only works correctly with pls(mode = "regression") and not pls(mode = "canonical")? Or is there something else I am missing that could explain why these matrices differ?

To go along with this, is there a similarity matrix that is an inherent value of the pls object, or is it produced when using functions like cim() and network()?

Thank you for your help!
-James

hi @JamesR,

This is a valid question.
In cim, the similarity matrix is calculated by default from components 1:object$ncomp, whereas in network the default is component 1 only. That could be the main difference, depending on what you are plotting (and extracting).
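This default-component difference can be illustrated with a toy base-R sketch (simulated data; a single SVD of t(X) %*% Y stands in for the PLS fit, so this is not mixOmics' actual deflation, just the general construction):

```r
# Toy illustration: the similarity matrix depends on which components are
# summed, so a comp = 1 default (network) and a comp = 1:ncomp default (cim)
# can return different matrices from the same fitted object.
set.seed(7)
n <- 60
X <- scale(matrix(rnorm(n * 4), n, 4))
Y <- scale(matrix(rnorm(n * 3), n, 3))

sv <- svd(crossprod(X, Y))     # stand-in for the PLS weight vectors
tx <- X %*% sv$u[, 1:2]        # X variates, "components" 1 and 2
ty <- Y %*% sv$v[, 1:2]        # Y variates, "components" 1 and 2

# similarity built as cor(X, variates) %*% t(cor(Y, variates))
sim1  <- cor(X, tx[, 1, drop = FALSE]) %*% t(cor(Y, ty[, 1, drop = FALSE]))
sim12 <- cor(X, tx) %*% t(cor(Y, ty))

max(abs(sim12 - sim1))         # non-zero: the two defaults disagree
```

Setting the comp argument explicitly to the same range in both cim() and network() should remove this particular source of disagreement.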

The matrix calculations are done inside those functions, but from the results of the PLS fit (which depend on the mode you used).

In network, you can see how it is calculated at line 390 (there is also a canonical-mode version): mixOmics/R/network.R at master · mixOmicsTeam/mixOmics · GitHub

In cim, the code is pretty much the same (line 1087), but a bit messier: mixOmics/R/cim.R at master · mixOmicsTeam/mixOmics · GitHub

So, as a short answer, it should give the same results!
Kim-Anh

Hi Kim-Anh,

Thank you for your response and for pointing me to where the similarity matrices are computed in the code.

I think you pointed out the source of my confusion in the network function. It appears that this function calculates the similarity matrix differently for a PLS regression and a PLS canonical analysis (line 411), whereas the cim function calculates it the same way in both modes.

I was able to reproduce this situation with the nutrimouse dataset:
library(mixOmics)
data(nutrimouse)
X <- nutrimouse$gene
Y <- nutrimouse$lipid
plsca <- pls(X, Y, mode = "canonical")
cim(plsca)$mat.cor
network(plsca)$M

After running this code, the two matrices produced are different. If we change to pls(X, Y, mode = "regression"), the matrices are the same. If this difference is intended, can you explain why the two functions calculate the matrix differently?

Also, my understanding is that the values produced in these matrices are approximations of the Pearson correlation coefficients between each pair of variables. If this is the case, is one of these matrices a better approximation?

Thank you again!
-James

hi @JamesR

Thanks for looking into this in detail (to summarise, cim seems to use regression-style deflation only).

There is no reason why cim should be restricted to regression mode, other than historical reasons or an oversight on our part :grin:. I am flagging your message so that when I hire a new person on the team next year we can attend to this.

Also, my understanding is that the values produced in these matrices are approximations of the Pearson correlation coefficients between each pair of variables. If this is the case, is one of these matrices a better approximation?

In the paper that you have read, we show that this approximation holds for canonical-mode deflation on simulated data, but we did not actually look into PLS regression. I really can't remember why, as I was not the lead author, but my feeling is that it made more sense to focus on canonical mode when you are primarily interested in associations between variables (rather than with the outcome Y).
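For intuition, the approximation can be sketched in base R on simulated data with one shared latent component (the variates below come from a single SVD step, not from mixOmics itself):

```r
# Sketch: the similarity entry cor(X_j, t) * cor(Y_k, u) approximates the
# Pearson correlation cor(X_j, Y_k) when one latent component drives both blocks.
set.seed(42)
n <- 100
latent <- rnorm(n)                               # shared latent signal
X <- scale(sapply(1:5, function(j) latent + rnorm(n)))
Y <- scale(sapply(1:3, function(j) latent + rnorm(n)))

sv <- svd(crossprod(X, Y))                       # first singular vectors ~ PLS weights
tx <- X %*% sv$u[, 1]                            # X variate, component 1
ty <- Y %*% sv$v[, 1]                            # Y variate, component 1

sim <- cor(X, tx) %*% t(cor(Y, ty))              # similarity matrix, component 1

# compare with the true pairwise Pearson correlations
max(abs(sim - cor(X, Y)))                        # shrinks as the latent signal dominates
```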

I think we will look into all of this in detail next year. In the meantime, feel free to tweak the code for your own purposes!

Kim-Anh