I have carried out a DIABLO analysis integrating three blocks of variables against my outcome (0 = healthy, 1 = diseased).
However, the CircosPlot shows incongruent results when comparing expression lines and correlations between some pairs of variables.
For example, based on the expression lines (and confirmed by a univariate comparison between groups), variable X is higher in group 0 than in group 1, while variable Y is higher in group 1 than in group 0. However, the correlation between these variables is depicted as positive. When I extract the correlation matrix from the CircosPlot, the correlation between the two variables is indeed positive. Can you please explain how this is possible?
Please note that the degree to which the multivariate correlations agree with the univariate ones depends on the covariance structure between the blocks. They generally agree, but not in every case (see the heatmaps of univariate vs multivariate correlations below). Additionally, looking more closely at your case, the ICNs block has a number of samples with completely missing values (see below), which would also lead to sub-optimal integration. Some metabolites are also consistently missing across specific samples. Since the missing values are not random in this case, you can look into imputation to see whether it brings any improvement (see mixOmics’ ?impute.nipals function or impute::impute.knn).
The following code goes through some of the points discussed.
Hope it helps
Al
library(mixOmics)
## change this to your own data path
load('/Users/alabadi/Projects/dev/R/_work/mixOmics/mixOmics_ajabadi/mixOmics_ajabadi/buildignore/devel/circosplot/MyDiablo_data.RData')
## get correlation matrix
corMat <- circosPlot(MyResult.diablo, cutoff = 0.4)
## get univariate correlations for pairwise blocks
X.merge <- Reduce(cbind, MyResult.diablo$X)
univariate.cors <- cor(X.merge, use = 'pairwise.complete.obs')
## order features based on corMat
univariate.cors <- univariate.cors[rownames(corMat), colnames(corMat)]
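## the two heatmaps generally agree, but there are exceptions, especially for low correlations
## univariate (pairwise Pearson) correlations
pheatmap::pheatmap(univariate.cors, cluster_rows = FALSE, cluster_cols = FALSE, show_rownames = FALSE, show_colnames = FALSE)
## multivariate similarities returned by circosPlot, plotted the same way for comparison
pheatmap::pheatmap(corMat, cluster_rows = FALSE, cluster_cols = FALSE, show_rownames = FALSE, show_colnames = FALSE)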
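The missing-value point above can be checked before deciding on imputation. The following is only a minimal base R sketch on made-up data: the `blocks` list, its sample names and its feature names are hypothetical stand-ins for the real blocks (which, as in the code above, would presumably come from `MyResult.diablo$X`).

```r
set.seed(2)

## Hypothetical toy blocks: 5 samples, a few features each (not the real data)
samples <- paste0("s", 1:5)
blocks <- list(
  ICNs        = matrix(rnorm(20), nrow = 5, dimnames = list(samples, paste0("icn", 1:4))),
  metabolites = matrix(rnorm(15), nrow = 5, dimnames = list(samples, paste0("met", 1:3)))
)

## Introduce the two patterns discussed: a sample missing a whole block,
## and a feature that is missing for specific samples
blocks$ICNs["s2", ] <- NA
blocks$metabolites[c("s4", "s5"), "met1"] <- NA

## Samples whose values are completely missing within each block
all_missing <- lapply(blocks, function(b) rownames(b)[rowSums(!is.na(b)) == 0])

## Fraction of missing values per feature within each block
na_fraction <- lapply(blocks, function(b) colMeans(is.na(b)))

print(all_missing)
print(na_fraction)
```

Samples flagged in `all_missing` carry no information for that block, so it may be worth deciding whether to drop or impute them before re-running the integration.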
Thank you so much for such a detailed explanation!
Can I just clarify how the similarity matrix (i.e. the multivariate correlations) is calculated please? I’ve read that “the values in the similarity matrix are computed as the correlation between the two types of projected variables onto the space spanned by the first components retained in the analysis” (González et al., 2012). Does it mean that the correlation is computed on the weight (i.e. loading) of each variable on principal component 1 rather than between the variables themselves (as for univariate correlations)? If this was the case, there would be only 1 loading value for each variable, which would make it impossible to calculate a correlation between variable pairs. Can you please explain?
As for the multi-block correlation generated with the function plotDiablo() shown below, how is the correlation among blocks calculated? Is it a Pearson correlation between each pair of variates? And in the resulting plot, I suppose that the scores for each X block are plotted on a common variate?
Basically, for every block and for a given component:
1. The correlations between the original variables and the derived variate are calculated as a vector cord.
2. For a given pair of blocks i, j (i = j allowed), the outer product (cord_i ⊗ cord_j) of the correlations calculated at step 1 is computed, which gives a feature-by-feature similarity matrix. This ensures that a given pair of features is only considered highly correlated if both are highly correlated with the corresponding derived components of their respective blocks.
3. All these values are merged across blocks to create a matrix including all pairwise similarity measures.
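The steps above can be sketched in base R for a single component. This is an illustration under stated assumptions, not the mixOmics implementation: the two toy blocks are simulated, and the first principal component of each block is used as a stand-in for the DIABLO variate. Because principal components are computed per block and their signs are arbitrary, the signs in this toy example carry no meaning.

```r
set.seed(1)
n <- 30
latent <- rnorm(n)  # shared signal driving some features in both blocks

## Hypothetical toy blocks (features in columns)
block_a <- cbind(a1 = latent + rnorm(n, sd = 0.3),
                 a2 = -latent + rnorm(n, sd = 0.3),
                 a3 = rnorm(n))
block_b <- cbind(b1 = latent + rnorm(n, sd = 0.3),
                 b2 = rnorm(n))

## Stand-in for the block variates: first principal component scores (assumption)
variate_a <- prcomp(block_a, scale. = TRUE)$x[, 1]
variate_b <- prcomp(block_b, scale. = TRUE)$x[, 1]

## Step 1: correlation of each original variable with its block's variate
cord_a <- drop(cor(block_a, variate_a))
cord_b <- drop(cor(block_b, variate_b))

## Step 2: outer product gives a feature-by-feature similarity matrix
sim_ab <- outer(cord_a, cord_b)

## Step 3: merge all block pairs (i = j allowed) into one matrix
cord_all <- c(cord_a, cord_b)
sim_all <- outer(cord_all, cord_all)

print(round(sim_ab, 2))
```

Note that each entry of `sim_ab` is a product of two variable-to-variate correlations, so there is one value per pair of features even though each feature has a single correlation with its variate. It is not the direct Pearson correlation between the two features, which is why it can differ from the univariate value.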
As you would see in ?plotDiablo: The lower triangular panel indicated the Pearson's correlation coefficient, the upper triangular panel the scatter plot.
I’m not sure if I got your question right but here’s what should address your query. For a given component, plotDiablo creates a variate by variate scatter plot for all pairs of blocks. This means for any given block we have only one vector of values shown (for the given component) against other variates.
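As a rough illustration of the numbers shown in the lower triangular panel, the sketch below computes Pearson correlations between the variates of each pair of blocks for one component. The `variates` list here is simulated and hypothetical. With a fitted model, the per-block variates would presumably be taken from the model object (for example `MyResult.diablo$variates`), which is an assumption about how the object is laid out rather than a statement of what plotDiablo does internally.

```r
set.seed(3)
n <- 25
shared <- rnorm(n)  # common signal across blocks

## Hypothetical per-block variates: one matrix per block, one column per component
variates <- list(
  blockA = cbind(comp1 = shared + rnorm(n, sd = 0.4), comp2 = rnorm(n)),
  blockB = cbind(comp1 = shared + rnorm(n, sd = 0.4), comp2 = rnorm(n)),
  blockC = cbind(comp1 = shared + rnorm(n, sd = 0.4), comp2 = rnorm(n))
)

ncomp <- 1  # component to inspect

## One vector per block for the chosen component, arranged as columns
variate_mat <- sapply(variates, function(v) v[, ncomp])

## Pearson correlation between every pair of block variates
variate_cors <- cor(variate_mat, method = "pearson")

print(round(variate_cors, 2))

## Scatter plots of every pair of variates, one panel per block pair
pairs(variate_mat)
```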