Question regarding the similarity matrix

Dear mixOmics team,

Thank you for such a great R package. I am currently trying to combine metabolomics data with gene expression data. Given the small sample size per condition (N=3), I am taking a more exploratory approach. I have 5 groups (control, and 4, 6, 8 and 12 days of treatment) and see a clear separation between groups using plotDiablo() and plotIndiv(): the longer a group was treated, the further it is separated from the control group.

With regard to my approach, I want to use the similarity matrix to see which genes correlate most strongly with specific metabolites. However, my keepX is based on 5000 genes and 500 metabolites. As a result, when I call circosPlot(), R crashes and I cannot save the circos plot as an object to extract the similarity matrix. Is there another way to obtain the similarity matrix?
Also, does it make sense to use such a large number of genes/metabolites for keepX?
The separation of groups in plotDiablo() and plotIndiv() looks very good, and I do not want to exclude potentially relevant genes/metabolites.

Thank you for your time!
Kind regards,

Hi @NilsM,

I don't think it makes much sense to select so many genes and metabolites, as you are finding when you try to interpret your results. Usually between 50 and 150 genes would be enough, for example for gene enrichment analysis.

You can save the circosPlot similarity matrix as an object, i.e.
my_object <- circosPlot(...)

As mentioned in ?circosPlot:

If saved in an object, the circos plot will output the similarity matrix and the names of the variables displayed on the plot (see attributes(object)).
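
To make this concrete, here is a minimal sketch of saving the circosPlot output and inspecting it. It uses the breast.TCGA example data that ships with mixOmics as a stand-in for your own blocks; the keepX values, the design matrix and the cutoff are illustrative assumptions, not recommendations for your data.

```r
library(mixOmics)

# Example data shipped with mixOmics (a stand-in for your own gene and
# metabolite blocks).
data(breast.TCGA)
X <- list(mRNA    = breast.TCGA$data.train$mrna,
          protein = breast.TCGA$data.train$protein)
Y <- breast.TCGA$data.train$subtype

# Design matrix linking the two blocks; the 0.1 weight is illustrative only.
design <- matrix(0.1, nrow = length(X), ncol = length(X),
                 dimnames = list(names(X), names(X)))
diag(design) <- 0

# A modest number of variables per block and component (illustrative values).
keepX <- list(mRNA = c(50, 50), protein = c(20, 20))

fit <- block.splsda(X, Y, ncomp = 2, keepX = keepX, design = design)

# Assigning the result to an object captures the similarity matrix
# while the plot is drawn.
sim <- circosPlot(fit, cutoff = 0.7)

dim(sim)               # size of the similarity matrix
names(attributes(sim)) # see attributes(), as suggested in ?circosPlot
```

With a smaller keepX like this, the plot should be much lighter to draw, and the returned matrix can be inspected or subset with ordinary matrix indexing.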