Scatter plot using sgcca() output

Hello everyone,

I will describe what I want to do in the hope that @MaxBladen, @kimanh.lecao, or anyone else can help me, please.

I have two features, and I am plotting their expression values across all my samples as a scatter plot. Visually (and also when running Pearson or Spearman correlation) they are not correlated at all.
But when I use sgcca(), these two features are correlated: this is visible in the correlation plots and also in the relevance network with a cutoff higher than 0.7.

My question is: are there values output by the sgcca() function that I can use to draw a scatter plot that shows the correlation found by sgcca() more clearly? I know the correlation plot and the network show this, but my audience is not familiar with these kinds of plots. A scatter plot would be ideal, especially because I could place it side by side with the scatter plots of the expression values, so the difference between the methods would be clear.

EDIT: Is the variates list inside the sgcca object what I am looking for?

Thanks for the great package and active support,

– Ana

hi @annaol,
the reason they appear correlated in sgcca is that the correlation is calculated via the variates/components (i.e. we calculate the correlation between each variable and the variate).
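To make this mechanism concrete, here is a minimal numpy sketch (not mixOmics code, and not sgcca() itself): each variable's value on a component is simply its Pearson correlation with that component's variate. The simulated data and the names `x_block`, `variate` and `variable_component_correlations` are hypothetical; with a fitted model, the variate would play the role of one column of a block's entry in the variates list.

```python
import numpy as np

def variable_component_correlations(block, variates):
    """Pearson correlation of every column of `block` (n samples x p variables)
    with every column of `variates` (n samples x H components).
    Returns a p x H matrix of variable-component correlations."""
    # Standardise with ddof=0 so that (b.T @ v) / n is exactly Pearson's r.
    b = (block - block.mean(axis=0)) / block.std(axis=0)
    v = (variates - variates.mean(axis=0)) / variates.std(axis=0)
    return b.T @ v / block.shape[0]

rng = np.random.default_rng(0)
n = 30
# Hypothetical data: one latent signal drives the first two variables
# (positively and negatively); the third variable is pure noise.
latent = rng.normal(size=n)
x_block = np.column_stack([
    latent + 0.5 * rng.normal(size=n),
    -latent + 0.5 * rng.normal(size=n),
    rng.normal(size=n),
])
# Hypothetical variate standing in for one component of a fitted model.
variate = (latent + 0.3 * rng.normal(size=n)).reshape(-1, 1)

coords = variable_component_correlations(x_block, variate)
print(np.round(coords, 2))
```

The first two variables get large correlations of opposite sign with the variate, while the noise variable stays near zero; these per-variable values are what a correlation circle plot places on its axes.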

I am copy-pasting some of my slides showing why this is better than Pearson/Spearman. Unfortunately, you will have to explain to your audience that Pearson/Spearman correlations lead to spurious correlations when the number of variables is large, and also that they do not take unmeasured variables into account. Your audience should be familiar with a network visualisation, though (which you can export to Cytoscape).
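The point about spurious correlations can be checked with a small simulation. This is a sketch under assumed sizes (10 samples, 200 variables), with every variable generated independently so that all true pairwise correlations are zero; with that many pairs, some Pearson correlations still come out large purely by chance.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_vars = 10, 200
# Independent variables: every true pairwise correlation is zero.
data = rng.normal(size=(n_samples, n_vars))

r = np.corrcoef(data, rowvar=False)           # n_vars x n_vars Pearson matrix
upper = r[np.triu_indices(n_vars, k=1)]       # each pair counted once

n_pairs = upper.size
n_high = int(np.sum(np.abs(upper) > 0.7))
print(f"{n_pairs} pairs, {n_high} with |r| > 0.7, max |r| = {np.abs(upper).max():.2f}")
```

Increasing `n_samples` shrinks these chance correlations, while increasing `n_vars` adds more pairs and tends to push the largest one higher.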

[Image: slides showing why correlations computed via the components are preferred over Pearson/Spearman]

I hope that helps,

Hi @kimanh.lecao

Thank you very much for your answer. I agree, the slides are very helpful: a very simple and creative way to explain it. But is there really no way to generate something similar to a scatter plot?

Hi @annaol,
No, not directly, because the correlation is calculated via the components, not between the variables themselves, as shown on the slides. This is why we use correlation circle plots, networks, and CIMs, which are based on the similarity matrix.
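As a rough illustration of how a similarity matrix can be built from variable-component correlations and then thresholded for a network, here is a numpy sketch. It assumes, for illustration only, that the similarity score between an X variable and a Y variable is the sum over components of the product of their correlations with each component; the coordinate values and the 0.7 cutoff are hypothetical, and this is not the mixOmics network() or cim() implementation.

```python
import numpy as np

# Hypothetical variable-component correlations: rows are variables,
# columns are components.
coords_x = np.array([[0.90, 0.10],
                     [0.20, 0.90],
                     [0.05, 0.15]])
coords_y = np.array([[0.85, 0.05],
                     [0.10, 0.85]])

# Assumed similarity score: inner product of each X/Y pair of coordinate
# vectors, i.e. sum over components of cor(x, comp) * cor(y, comp).
similarity = coords_x @ coords_y.T   # shape: (n_x_vars, n_y_vars)

cutoff = 0.7  # hypothetical threshold, as in a relevance network
edges = [(i, j, round(float(similarity[i, j]), 3))
         for i in range(similarity.shape[0])
         for j in range(similarity.shape[1])
         if abs(similarity[i, j]) > cutoff]
print(edges)
```

Because each score is built only from how strongly the two variables relate to the same components, it describes their association through the components rather than their direct pairwise Pearson correlation, which is why it does not translate into a variable-versus-variable scatter plot.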