keepX and feature selection for circos plot


We are trying to run DIABLO on our own dataset, which contains mRNA data (65932 features), proteomics data (1594 features) and metabolomics data (2912 features) for 78 samples. Could you help us with the following questions?

  1. We ran DIABLO with test.keepX = list(mRNA = c(5:9, seq(10, 18, 2), seq(20, 30, 5)), metabolomics = c(5:9, seq(10, 18, 2), seq(20, 30, 5)), proteomics = c(5:9, seq(10, 18, 2), seq(20, 30, 5))), the default grid mentioned in the manual, with ncomp set to 2. The circos plot shows mRNA = 73, metabolomics = 54 and proteomics = 61 features. These numbers do not match the values we defined in keepX. On what basis are the features in the circos plot selected?

  2. How should we set keepX for our own dataset? On what basis should we choose the grid for keepX?

Priyanka Ramesh

Hi @r.priyanka1802,

Are you saying that the optimal keepX is not mRNA = 73, metabolomics = 54 and proteomics = 61 across the 2 components? The circos plot should show the same numbers, unless you have set a cutoff in circosPlot.

You can choose the keepX values yourself, based on a previous analysis, or simply because the number of features would otherwise be too large or too small for your analysis. The grid set in the example is arbitrary, mostly based on what we ‘expect’ from the data. (PS: also consider pre-filtering your mRNA data to keep the most variable features, as you have too many to start with. We often include 5,000 to 10,000 mRNA at most.)
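As a rough illustration of the pre-filtering step, here is a minimal base R sketch that keeps the most variable features of a samples-by-features matrix and then defines a keepX grid. The simulated matrix, the helper name `filter_top_variance` and the choice of 500 features are assumptions made for the example only; with real data you would substitute your own mRNA matrix and a threshold such as 5,000 to 10,000.

```r
# Simulated stand-in for an mRNA matrix: samples in rows, features in columns.
# (Kept small here; the dataset discussed above has 78 samples x 65932 features.)
set.seed(1)
n_samples  <- 78
n_features <- 2000
mrna <- matrix(rnorm(n_samples * n_features), nrow = n_samples)
# Give each feature a different spread so the variance filter has something to rank.
mrna <- sweep(mrna, 2, runif(n_features, 0.1, 3), FUN = "*")
colnames(mrna) <- paste0("gene_", seq_len(n_features))

# Hypothetical helper: keep the n_keep features with the largest variance.
filter_top_variance <- function(x, n_keep) {
  feature_var <- apply(x, 2, var)
  n_keep <- min(n_keep, ncol(x))
  keep <- order(feature_var, decreasing = TRUE)[seq_len(n_keep)]
  # sort() preserves the original column order among the kept features
  x[, sort(keep), drop = FALSE]
}

mrna_filtered <- filter_top_variance(mrna, n_keep = 500)

# The grid from the question, written once and reused for every block.
grid <- c(5:9, seq(10, 18, 2), seq(20, 30, 5))
test_keepX <- list(mRNA = grid, metabolomics = grid, proteomics = grid)
```

The filtered matrix would then replace the full mRNA block when tuning, and `grid` can be widened or narrowed depending on how many features per component you are prepared to interpret.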


Thank you so much for your support. We will try different cutoffs for the circos plot, and also different pre-filtering methods to reduce the mRNA data to the most variable features.

Priyanka Ramesh