Questions about diablo correlation plot

I am trying to see the correlation between bacterial abundance, fungal abundance, and host gene expression, and the results don’t make sense to me. When I originally ran sPLS, I also obtained contrasting results, but in general it looked like many more genes were strongly correlated with fungi than with bacteria.

When I ran diablo on all three datasets, the correlations were worse for gene and bacteria, but the PCA plots look much better for gene and bacteria. The figure attached below is for component 2, which was the best one.

However, when I ran diablo on just pairwise datasets (so gene and bacteria or gene and fungi), the results suggest a stronger correlation between gene and bacteria than gene and fungi. How do I interpret these results and what would be more reliable - running all 3 datasets together or focusing on pairwise?

Diablo_Bacteria_comp2
Diablo_Fungi_comp1

PCA plots look much better for gene and bacteria

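Don’t forget that PCA builds components that maximise the variance captured within a single dataset, while DIABLO looks to maximise the covariance between components from different datasets. These methods are not equivalent, so a cleaner PCA plot does not imply a stronger correlation in DIABLO.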
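
To make the variance vs covariance distinction concrete, here is a minimal sketch on simulated data. It is not DIABLO or mixOmics code: it uses plain NumPy, the data and variable names are made up for illustration, and the covariance-maximising direction is the simple single-response PLS weight vector rather than anything DIABLO-specific.

```python
import numpy as np

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)

X = np.column_stack([
    rng.normal(scale=3.0, size=n),            # high variance, unrelated to y
    signal + rng.normal(scale=0.3, size=n),   # low variance, related to y
])
y = signal + rng.normal(scale=0.3, size=n)

# Centre both before computing directions.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# PCA: the first right singular vector is the direction of maximal variance in X.
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pca_dir = vt[0]

# PLS-style: the unit vector w maximising cov(Xc @ w, yc) is proportional to Xc.T @ yc.
w = Xc.T @ yc
pls_dir = w / np.linalg.norm(w)

print("PCA direction:", np.round(pca_dir, 2))
print("PLS direction:", np.round(pls_dir, 2))
```

The PCA direction loads on the noisy, high-variance feature, while the covariance-based direction loads on the feature that actually tracks `y`. The same logic is why a block can look well separated in PCA and still correlate less strongly with another block.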

when I ran diablo on just pairwise datasets (so gene and bacteria or gene and fungi), the results suggest a stronger correlation between gene and bacteria than gene and fungi

Comparing the values seen in the figures:

  • With 3-block DIABLO:
    • gene-bacteria = 0.89
    • gene-fungi = 0.83
  • With 2-block DIABLO:
    • gene-bacteria = 0.84
    • gene-fungi = 0.71

These seem fairly consistent with each other: in both cases the gene-bacteria correlation is higher than the gene-fungi one. We can expect a decrease when using the 2-block models, as they don’t have access to the information provided by the components generated between the bacteria and the fungi data frames.
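
As a quick check of that comparison, the values read off the figures can be tabulated like this. The numbers are the ones listed above; the dictionary and variable names are just for illustration.

```python
# Correlations read from the figures (values quoted in this thread).
three_block = {"gene-bacteria": 0.89, "gene-fungi": 0.83}
two_block = {"gene-bacteria": 0.84, "gene-fungi": 0.71}

# Drop in correlation when moving from the 3-block to the 2-block model.
drops = {pair: round(three_block[pair] - two_block[pair], 2) for pair in three_block}

# Order the pairs from strongest to weakest in each setting.
rank_three = sorted(three_block, key=three_block.get, reverse=True)
rank_two = sorted(two_block, key=two_block.get, reverse=True)
same_ranking = rank_three == rank_two

print(drops)
print("Same ranking in both settings:", same_ranking)
```

Both settings rank gene-bacteria above gene-fungi; only the size of the correlations changes.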

what would be more reliable - running all 3 datasets together or focusing on pairwise?

Using both pairwise analysis and multiblock analysis will provide the most holistic insight into the structure and relationships in your data. Hence, the answer is both. I would say that the final model should be using all three - but I can’t say that for certain as I don’t have context or the data itself.

Thank you so much for your input. This helps me understand the data a bit better.