Questions about diablo correlation plot

kdb.chau · August 30, 2022, 8:05pm

I am trying to assess the correlations between bacterial abundance, fungal abundance, and host gene expression, and the results don’t make sense to me. When I originally ran sPLS, I also obtained contrasting results, but in general it looked like many more genes were strongly correlated with fungi than with bacteria.

When I ran diablo on all three datasets, the correlations were worse for gene and bacteria, but the PCA plots look much better for gene and bacteria. The figure for component 2, which was the best one, is attached below.

However, when I ran diablo on just pairwise datasets (so gene and bacteria or gene and fungi), the results suggest a stronger correlation between gene and bacteria than gene and fungi. How do I interpret these results, and what would be more reliable - running all 3 datasets together or focusing on pairwise?

[Attached figures: Diablo_Bacteria_comp2, Diablo_Fungi_comp1]

MaxBladen · August 30, 2022, 11:07pm

PCA plots look much better for gene and bacteria

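Don’t forget that PCA builds components that maximise the variance captured within a single dataset, while DIABLO builds components that maximise the covariance between datasets. These methods are not equivalent, so a dataset that looks well separated on a PCA plot will not necessarily show a strong correlation with another dataset.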
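
To make the distinction concrete, here is a minimal sketch in plain Python. It is not mixOmics code and not DIABLO itself: it uses a made-up toy block with two variables (`x1`, `x2`) and a made-up response `y`, and every name and value in it is an assumption for illustration only. The variance-maximising direction is what PCA would pick; the covariance-maximising direction is the single-response PLS analogue of the DIABLO criterion, leaving out sparsity and multiple blocks.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


def first_pca_direction(x1, x2):
    """Unit vector maximising the variance of w1*x1 + w2*x2 (PCA criterion).

    Uses the closed-form principal-axis angle of a 2x2 covariance matrix.
    """
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    return (math.cos(theta), math.sin(theta))


def first_pls_direction(x1, x2, y):
    """Unit vector maximising cov(w1*x1 + w2*x2, y) (PLS-style criterion)."""
    c1, c2 = cov(x1, y), cov(x2, y)
    norm = math.hypot(c1, c2)
    return (c1 / norm, c2 / norm)


# Toy data (hypothetical): x1 has a large variance but is unrelated to y,
# while x2 has a small variance but tracks y exactly.
x1 = [-10.0, 10.0, -10.0, 10.0, -10.0, 10.0, -10.0, 10.0]
x2 = [-1.0, -1.0, -0.5, -0.5, 0.5, 0.5, 1.0, 1.0]
y = [2.0 * v for v in x2]

pca_w = first_pca_direction(x1, x2)
pls_w = first_pls_direction(x1, x2, y)

print("variance-maximising direction:", pca_w)
print("covariance-maximising direction:", pls_w)
```

In this toy block, `x1` carries nearly all of the variance but has zero covariance with `y`, so the variance-maximising direction loads on `x1` while the covariance-maximising direction loads on `x2`. The two criteria can therefore point at entirely different variables in the same data.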

when I ran diablo on just pairwise datasets (so gene and bacteria or gene and fungi), the results suggest a stronger correlation between gene and bacteria than gene and fungi

Comparing the values seen in the figures:

With 3-block DIABLO:
- gene-bacteria = 0.89
- gene-fungi = 0.83
With 2-block DIABLO:
- gene-bacteria = 0.84
- gene-fungi = 0.71

These seem fairly consistent with each other: in both analyses the gene-bacteria correlation is higher than the gene-fungi correlation. A decrease is to be expected with the 2-block models, as they don’t have access to the information provided by the components generated between the bacteria and fungi data frames.
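
For reference, here is a rough sketch of how pairwise values like these can be computed, assuming the numbers in the figures are Pearson correlations between the component scores of each pair of blocks. The scores below are invented for five hypothetical samples and are not taken from the figures in this thread; the block names and the `pairwise_correlations` helper are also just for illustration.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two equal-length score vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def pairwise_correlations(scores):
    """Correlation between the component scores of every pair of blocks."""
    return {
        (block_a, block_b): pearson(scores[block_a], scores[block_b])
        for block_a, block_b in combinations(scores, 2)
    }


# Hypothetical scores on one component, one value per sample and per block.
scores = {
    "gene": [-1.2, -0.4, 0.1, 0.6, 1.1],
    "bacteria": [-1.0, -0.6, 0.3, 0.4, 0.9],
    "fungi": [-0.7, 0.2, -0.3, 0.9, 0.5],
}

for pair, value in pairwise_correlations(scores).items():
    print(pair, round(value, 2))
```

Running the same comparison on the scores from the 3-block model and from each 2-block model makes it easy to check whether the ranking of the pairs stays the same, which is the consistency being described above.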

what would be more reliable - running all 3 datasets together or focusing on pairwise?

Using both pairwise analysis and multiblock analysis will provide the most holistic insight into the structure and relationships in your data. Hence, the answer is both. I would say that the final model should be using all three - but I can’t say that for certain as I don’t have context or the data itself.

kdb.chau · August 31, 2022, 12:24pm

Thank you so much for your input. This helps me understand the data a bit better.
