Hi,
Thanks again for a great tool! I have a question about the correlation values in the CircosPlot. I understand you do not provide p-values as this is based on multivariate analysis, but I wonder at which cutoff you would judge a correlation/association as significant? E.g. I have fecal microbes and metabolites in serum that I integrated to analyse which bacteria influence/have an impact on which serum metabolites. The strongest correlation values range from 0.67 to 0.4. When I perform Spearman correlations on these bacteria/metabolite pairs, most of them have VERY low r (0.1-0.2) and are not significant. I am struggling to interpret this, i.e. are there any relevant associations? Or would you use a cutoff higher than 0.4? If so, which?
Thanks a lot for your input!
/Stef
Hello @stepra
I wonder at which cutoff you would judge a correlation/association as significant?
I would recommend not thinking about “significance” when it comes to the results of the plotVar() figure. There is no statistical test being performed here, as you noted by the fact that there are no p-values.
The strongest correlation values range from 0.67 to 0.4
In the grand scheme of things, these are not terrible results! While these may not be the strongest correlations, the actual value is sometimes not important. I’d encourage you to use the plotVar() results to assess how your features cluster, and to look at the sign of each feature’s correlation along the axes to determine its influence.
When I perform Spearman correlations on these bacteria/metabolite pairs, most of them have VERY low r (0.1-0.2) and are not significant. I am struggling to interpret this, i.e. are there any relevant associations?
Remember, the values shown in plotVar() are the correlations between each feature and each component, not between each feature pair. That’s why you’re not getting the same values as what you see in plotVar().
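To make the distinction concrete, here is a minimal sketch in base R on simulated data. It is not the mixOmics computation: the first principal component from prcomp() stands in for a mixOmics component, and the data, the number of features and the 0.45 weight are all made up for illustration.

```r
set.seed(1)
n <- 300   # samples (hypothetical)
p <- 20    # features (hypothetical)

# Assumption: every feature is weakly driven by one shared latent signal
latent <- rnorm(n)
X <- sapply(seq_len(p), function(j) 0.45 * latent + sqrt(1 - 0.45^2) * rnorm(n))
colnames(X) <- paste0("feature", seq_len(p))

# Stand-in for a component: the first principal component of the scaled data
pc1 <- prcomp(X, scale. = TRUE)$x[, 1]

# Feature-to-component correlations (the kind of value a correlation circle shows);
# abs() is used because the sign of a principal component is arbitrary
feature_vs_component <- abs(cor(X, pc1))[, 1]

# Pairwise Spearman correlations between features
pairwise <- cor(X, method = "spearman")
pairwise_values <- pairwise[upper.tri(pairwise)]

summary(feature_vs_component)
summary(pairwise_values)
```

In this simulation the feature-to-component correlations come out clearly higher than the pairwise ones, even though both are computed from the same data, so the two sets of numbers should not be expected to match.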
As you likely know, when it comes to biological systems, there are usually very complicated and intricate mechanisms by which aspects of that system impact each other. This is certainly true for microbial and metabolic data in my experience. Examining your features in a pairwise fashion is totally missing out on all that complexity! Therefore, it’s unsurprising that you have very low pairwise correlations. That’s why the methods in mixOmics are handy: they can more accurately capture these complex systems by using a single feature to measure the impact of multiple features.
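A toy simulation can illustrate why pairwise correlations may stay low even when a joint relationship is strong. This is only a sketch in base R: the ten microbes, the additive model and the equal weights are assumptions made for the example, and the combined score uses the true weights, whereas a multivariate method would have to estimate them from the data.

```r
set.seed(7)
n <- 500           # samples (hypothetical)
n_microbes <- 10   # contributing microbes (hypothetical)

# Simulated, independent microbe abundances
microbes <- matrix(rnorm(n * n_microbes), nrow = n)

# Assumption: the metabolite is the sum of small contributions from every microbe, plus noise
metabolite <- rowSums(microbes) + rnorm(n)

# Pairwise view: Spearman correlation of each single microbe with the metabolite
pairwise_r <- as.vector(cor(microbes, metabolite, method = "spearman"))

# Multivariate view: one combined score summarising all microbes
combined_score <- rowSums(microbes)
combined_r <- cor(combined_score, metabolite, method = "spearman")

round(range(pairwise_r), 2)
round(combined_r, 2)
```

Each microbe on its own explains only a small share of the metabolite, so every pairwise correlation is modest, while the combined score tracks the metabolite closely.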
would you use a cutoff higher than 0.4? If so, which?
It really depends on how many features remain after the cutoff. As mentioned above, the plotVar() figure is not a statistical test of significant relationships - it’s an exploration tool we can use to gain a deeper understanding of how our data relates to itself. If you can see clear patterns of clustering (which may correspond to how your groups cluster in the output of plotIndiv()), then leave your cutoff as is. If there is just too much going on in the correlation circle plot, increase the cutoff to reduce the noise. Conversely, if there are barely any features showing, decrease the cutoff.
Unfortunately, you will just have to play around with the cutoff value a bit until you’re satisfied. One last reminder that the correlation circle plot is an explorative tool, not a test of significance.
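The trial and error described above can be sketched by counting how many features would stay visible at each candidate cutoff. The coordinates below are randomly generated placeholders, not output from a real model, and the filtering rule (keep a feature if its absolute correlation with at least one of the two components reaches the cutoff) is an assumption made for this example, so check the plotVar() documentation for the exact behaviour.

```r
set.seed(42)
n_features <- 50   # hypothetical number of features

# Placeholder correlation-circle coordinates: random points inside the unit circle
radius <- runif(n_features)
angle <- runif(n_features, min = 0, max = 2 * pi)
coords <- cbind(comp1 = radius * cos(angle), comp2 = radius * sin(angle))
rownames(coords) <- paste0("feature", seq_len(n_features))

# Strongest absolute correlation of each feature with either component
strongest <- apply(abs(coords), 1, max)

# Count the features that would remain at each candidate cutoff
cutoffs <- seq(0.2, 0.7, by = 0.1)
n_kept <- sapply(cutoffs, function(ct) sum(strongest >= ct))

data.frame(cutoff = cutoffs, features_shown = n_kept)
```

With a fitted model, the same idea amounts to re-running plotVar() with different values of its cutoff argument and picking the one that leaves a readable number of features.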
Hope this was clear and helps a bit. Feel free to reach out if you have any other questions!
Dear Max,
Thank you for this detailed response! It was very helpful. I will take a closer look at the plotVar and plotIndiv once again. Ok, good to know that a value of e.g. 0.4 is not raising any red flags for you.
/Stef