Help with DIABLO heatmap


Sorry if this is a silly question, but I’m new to DIABLO and just looking for a little help and reassurance I’m on the right track with my analysis :slight_smile:

I have shotgun metagenomic data, and I’d like to link the taxonomic and functional data to look for associations between the two for my disease diagnosis group.

So far I have

sgccda.res = block.splsda(X = data, Y = Y, ncomp = 5,
design = design)

Where Y= my disease/health metadata value
X= taxa and functional data

I successfully built the spiderweb of associations but found it a bit hard to read because it listed so many taxa + pathways

circosPlot(sgccda.res, cutoff = 0.7)

So I thought I would instead draw a heatmap to make it easier to read.

#from the tutorials this doesn’t work for me because of the margins I think
cimDiablo(sgccda.res, margin=c(8,20), legend.position = "right", trim = "FALSE")

#I found this in another topic and managed to generate the heatmap, but again I think it has listed every taxon and pathway so it’s difficult to read

# get univariate correlations for pairwise blocks

X.merge <- Reduce(cbind, sgccda.res$X)
univariate.cors <- cor(X.merge, use = 'pairwise.complete.obs')

# order features based on corMat

univariate.cors <- univariate.cors[rownames(corMat), colnames(corMat)]

# heatmaps generally agree but there are exceptions especially for low correlations

pheatmap::pheatmap(univariate.cors, cluster_rows = FALSE, cluster_cols = FALSE, show_rownames = FALSE, show_colnames = FALSE)

pheatmap::pheatmap(corMat, cluster_rows = FALSE, cluster_cols = FALSE, show_rownames = FALSE, show_colnames = FALSE)

Any help would be greatly appreciated :slight_smile:

I’m a bit confused as to what you’re actually asking sorry.

Are you just wanting to reduce the number of features shown in a heatmap? Have you applied any feature selection methods?

Sorry for that, yes I’d like to limit the number of features in the heatmap and I’m not quite sure how to do this.

Is there an option to limit to the top 50-100 most significant matches?

Have you applied any feature selection? This will result in only the selected features showing.

CIMs are not applying statistical tests, and so don’t have a degree of “significance”. Do you mean those with the largest loadings, the most discriminatory features, greatest correlation with the response or something else?

Via cimDiablo(), there isn’t currently a way to cut off certain features (as there is in cim()). My suggestion would be to take the $M component from the output of cimDiablo(), apply your filtering and then pass it into pheatmap(). That, or explore the use of keepX to select features when building your model.
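The filtering step could look something like the sketch below. It is only an illustration under assumptions: the matrix `M` is simulated with base R so the snippet runs on its own (in practice it would be the matrix taken from the cimDiablo() output), and the 0.6 cutoff and the names `X.sim`, `keep` and `M.filt` are placeholders.

```r
# Simulated stand-in for a features-by-features similarity matrix
# (assumption: replace with the matrix extracted from your own output).
set.seed(1)
X.sim <- matrix(rnorm(30 * 8), nrow = 30,
                dimnames = list(NULL, paste0("feat", 1:8)))
X.sim[, 2] <- X.sim[, 1] + rnorm(30, sd = 0.1)  # make feat1 and feat2 strongly correlated
M <- cor(X.sim)

cutoff <- 0.6                                   # arbitrary threshold, tune for your data
A <- abs(M)
diag(A) <- 0                                    # ignore each feature's correlation with itself
keep <- which(apply(A, 1, max) >= cutoff)       # features with at least one strong partner
M.filt <- M[keep, keep, drop = FALSE]

dim(M.filt)
# M.filt could then be drawn with pheatmap::pheatmap(M.filt)
```

Filtering on the largest absolute value per feature is just one possible rule; keeping the top N features by that value would work the same way.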

Thanks Max :slightly_smiling_face:

I haven’t tried any feature selection because I’m not 100% sure what to focus on yet.

I guess what I’d ideally like to create is a heatmap with correlations (i.e., Spearman) showing which functional pathways match which bacteria in the disease vs healthy groups.

Based on your answer above would you suggest

  1. Get correlations for the dataset. Does DIABLO support Spearman’s correlation? This is the most common measure I’ve seen in publications, but I’ve not seen a reference to it in the tutorial. If you have other suggestions I’m very happy to follow your guidance.
X.merge <- Reduce(cbind, sgccda.res$X)
univariate.cors <- cor(X.merge, use = 'pairwise.complete.obs')
  2. Get the M component from cimDiablo (the code below gives me an error saying: “‘trim’ must be either logical or numeric”)

M <- cimDiablo(univariate.cors, margin=c(8,20), legend.position = "right", trim = "FALSE")

  3. Filter by significance of the Spearman correlation to a set cut-off (e.g., P=0.01). How do I do this exactly?

  4. pheatmap results with filtered M object

pheatmap::pheatmap(M, cluster_rows = FALSE, cluster_cols = FALSE, show_rownames = FALSE, show_colnames = FALSE)

Sorry for the trouble and thank you for the help, R is still new for me so I apologise for being slow if the answer is very obvious

For your methodology, it would be much easier to calculate correlations manually and then pass it to cim(), not cimDiablo(). Have you read the documentation for these functions?

Point 1:
If you are just wanting the correlations between each feature, why are you using the DIABLO method? This method is used to integrate multiple datasets in order to best discriminate your categorical response and focuses on maximisation of covariance between the latent components within the data. It generates a predictive model. That is quite different to simply calculating the Spearman correlations between each feature.

Note, I’m not saying to not use DIABLO down the line, but for the purposes of correlation analysis, it isn’t the right way to go.

Point 2:
You have "FALSE" (note the quotation marks). When working with logical variables, remove the quotation marks (e.g. TRUE and FALSE).
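A quick way to see the difference, using only base R:

```r
is.logical(FALSE)     # TRUE: a logical value
is.logical("FALSE")   # FALSE: this is a character string
class("FALSE")        # "character"
```

This is why the function complains that the argument must be logical or numeric: it received a character string.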

Point 3:
You can just use the which() function. Read more via ?which
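As a rough sketch of how which() could be used for this, assuming the p-values come from base R's cor.test() computed pair by pair. The data are simulated, and `taxa`, `func`, `p.mat` and `sig` are placeholder names rather than anything from your analysis:

```r
set.seed(42)
n <- 40
taxa <- matrix(rnorm(n * 3), nrow = n,
               dimnames = list(NULL, c("taxA", "taxB", "taxC")))
func <- matrix(rnorm(n * 2), nrow = n,
               dimnames = list(NULL, c("path1", "path2")))
func[, "path1"] <- taxa[, "taxA"] + rnorm(n, sd = 0.2)  # build in one real association

# Spearman p-value for every taxon/pathway pair (rows = taxa, columns = pathways)
p.mat <- sapply(colnames(func), function(f)
  sapply(colnames(taxa), function(tx)
    cor.test(taxa[, tx], func[, f], method = "spearman")$p.value))

# which() returns the positions passing the cutoff; arr.ind = TRUE gives row/column indices
sig <- which(p.mat < 0.01, arr.ind = TRUE)
data.frame(taxon   = rownames(p.mat)[sig[, "row"]],
           pathway = colnames(p.mat)[sig[, "col"]],
           p       = p.mat[sig])
```

With many taxa and pathways this involves a large number of tests, so it may be worth adjusting the p-values first (for example with p.adjust()) before applying the cutoff.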

Here is a brief example of what you could do to achieve your desired results. You will need to adjust things to work for your specific use case:

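``` r
library(mixOmics)
data(breast.TCGA)

# three omics blocks measured on the same training samples
X1 <- breast.TCGA$data.train$mrna
X2 <- breast.TCGA$data.train$mirna
X3 <- breast.TCGA$data.train$protein

# merge the blocks column-wise (samples in rows, features in columns)
X <- cbind(X1, X2, X3)
dim(X)

# Spearman correlation between every pair of features
cor.mat <- cor(X, method = "spearman")
# cutoff: threshold on the correlations shown (see ?cim)
cim(cor.mat, cutoff = 0.5)
```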


Thank you so much for your amazing help and patience

I ran into an error about margin sizes being too small but saw in the help page that if you expand the size of the RStudio plot window it will work :slight_smile:

One last question: is there a way to limit the taxa to one axis and the functions to the other?

Thank you again!!

I answered my question :slight_smile:

I dropped the cbind and instead listed the 2 datasets to compare in cor

cor.mat <- cor(Genus_1, Funct, method = "spearman")
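In case it helps others, here is a small self-contained check of why this works, using simulated stand-ins for my two tables (`Genus.sim` and `Funct.sim` are made-up names and data; both must have the same samples in the same row order):

```r
set.seed(7)
Genus.sim <- matrix(rnorm(20 * 4), nrow = 20,
                    dimnames = list(NULL, paste0("genus", 1:4)))
Funct.sim <- matrix(rnorm(20 * 6), nrow = 20,
                    dimnames = list(NULL, paste0("pathway", 1:6)))

# With two inputs, cor() correlates the columns of the first with the columns of the second
cor.sim <- cor(Genus.sim, Funct.sim, method = "spearman")
dim(cor.sim)       # rows = taxa, columns = pathways
rownames(cor.sim)
colnames(cor.sim)
```

So the taxa end up on one axis and the functions on the other, rather than every feature appearing on both.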

I do have one final final question… how do I look for these correlations with respect to my disease diagnosis? (I’ll reply to myself if I work it out over the weekend in case it helps others in future)

Thank you

how do I look for these correlations in respect to my disease diagnosis?

I’m not sure I get what you mean. You can color the sides of the cim() plot with a custom vector if that would be useful (see row.sideColors and col.sideColors via ?cim). Your disease diagnosis would be on a sample by sample basis, whereas this CIM is looking at the correlations between pairs of features.
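If the aim is to see whether the taxa/pathway correlations differ between diagnosis groups (this is a guess at what is wanted), one possible approach is to compute the correlation matrix separately within each group of samples and compare them. A minimal sketch with simulated data, where `diagnosis`, `taxa`, `func` and the group labels are all hypothetical:

```r
set.seed(3)
n <- 40
diagnosis <- factor(rep(c("healthy", "disease"), each = n / 2))  # hypothetical labels
taxa <- matrix(rnorm(n * 3), nrow = n,
               dimnames = list(NULL, paste0("genus", 1:3)))
func <- matrix(rnorm(n * 2), nrow = n,
               dimnames = list(NULL, paste0("pathway", 1:2)))

# One taxa-by-pathway Spearman matrix per diagnosis group
cor.by.group <- lapply(split(seq_len(n), diagnosis), function(idx)
  cor(taxa[idx, ], func[idx, ], method = "spearman"))

# Difference in correlation between the two groups
cor.diff <- cor.by.group$disease - cor.by.group$healthy
cor.diff
```

Each per-group matrix (or the difference) could then be plotted in the same way as the overall correlation matrix. Keep in mind that splitting the samples reduces the number of observations behind each correlation.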