cimDiablo number of genes on the x axis

Hi mixOmics team!

First, thanks for producing such a neat tool. I have a couple of questions: one specific, one general.

First, after producing the cimDiablo plot I noticed that there are fewer gene IDs than there are branches. Is there a way to get all the gene IDs printed or exported so that I can look at them manually? I was able to export a matrix from the cimDiablo object, but I don't know whether it is in the same order as the plot (below).

Second, I remain a bit unclear about how one chooses the input subsets (test.keepX), and why and how that relates to nrepeat in tune.block.splsda. Can you please point me to guidance on choosing the best, or at least sensible, test.keepX values, and on when to use different values of nrepeat?

Again, thanks!

Hi @Seth,

Thanks for reporting this.

This is an issue with the visualiser that we're trying to fix (see "cimDiablo plot size affects variables shown", Issue #142 in mixOmicsTeam/mixOmics on GitHub).

In the meantime, you can simply expand the plot width in RStudio and it will show all the features.

Hope it helps

Al

Hi @Seth,

You can now do:

BiocManager::install('aljabadi/mixOmics@cimdiablo-plotdims')

And then save your plot output with a larger width, following the examples in "cimDiablo plot size affects variables shown" (Issue #142 in mixOmicsTeam/mixOmics on GitHub).
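One way to set the plot width independently of the RStudio pane is to draw into a file device. The following is only a minimal base R sketch, not the code from the linked issue: `save_wide()` is a hypothetical helper (not part of mixOmics), `diablo_model` in the last comment stands for your own fitted model, and the sizes are arbitrary starting points.

```r
# Hypothetical helper (not part of mixOmics): run any plotting code inside a
# PDF device of a chosen size, so the result does not depend on the size of
# the RStudio plot pane.
save_wide <- function(plot_fn, file, width = 20, height = 10) {
  pdf(file, width = width, height = height)  # dimensions are in inches
  on.exit(dev.off(), add = TRUE)             # close the device even on error
  plot_fn()
  invisible(file)
}

# Stand-in plot so the sketch runs without mixOmics installed.
wide_file <- file.path(tempdir(), "wide_plot.pdf")
save_wide(function() plot(1:10), wide_file)

# With a fitted model the call would look like this (not run here):
# save_wide(function() cimDiablo(diablo_model), "cimDiablo_wide.pdf", width = 30)
```

If some labels are still missing, increasing `width` further is the first thing to try.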

Hi @Seth,

In addition to what @aljabadi wrote, you can also change the size of the labels (col.cex and row.cex arguments). This often solves the problem for me.

If you are using an up-to-date mixOmics version, the cimDiablo output is returned as a list object. Its col.names vector contains the column names from left to right, and its row.names vector contains the row names from bottom to top.
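To get the labels out in plot order, something along these lines may help. It is a sketch under the assumption that the returned list has the `col.names` and `row.names` fields with the ordering described above. The helper name and the stand-in object with made-up labels are hypothetical, and are there only so the example runs without a fitted model.

```r
# Hypothetical helper: put the labels stored in a cimDiablo result into plot
# order. Assumes `col.names` runs left to right and `row.names` runs bottom
# to top, so the rows are reversed to read top to bottom.
cim_labels_in_plot_order <- function(cim_res) {
  list(
    columns_left_to_right = cim_res$col.names,
    rows_top_to_bottom    = rev(cim_res$row.names)
  )
}

# Stand-in object with made-up labels.
# In practice: cim_res <- cimDiablo(diablo_model)
cim_res <- list(
  col.names = c("geneB", "geneA", "geneC"),
  row.names = c("sample3", "sample1", "sample2")
)

ordered <- cim_labels_in_plot_order(cim_res)

# Write one column label per line, in the order they appear along the x axis.
col_file <- file.path(tempdir(), "cimDiablo_columns.txt")
writeLines(ordered$columns_left_to_right, col_file)
```

It is worth checking the first and last few exported labels against the plot to confirm the ordering for your installed version.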

You choose the test.keepX based on your research question. If you are looking for a minimal signature of variables to predict an outcome, then you should not set test.keepX too high. If you are interested in retaining a lot of information for some reason, you can go higher, as long as you are able to interpret the results.

Another thing to consider is how precise you want your results to be. If you want very precise results, you can choose a fine grid (e.g. c(5:50)), but if this is not of vital importance, then you can choose a coarse grid (e.g. c(5:9, seq(10,49,5))).

Increasing the number of values to test increases the computational time/demand of the tuning step, and so does nrepeat (i.e. how many times the cross-validation is repeated). If you are looking for very precise and reproducible results, you can increase nrepeat to above 50, provided that you have the computational resources and/or patience.
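To make the trade-off concrete, here is a small base R sketch comparing the two example grids. The block names are hypothetical, and the cost formula is a rough assumption (every combination of keepX values across blocks is evaluated once per fold and per repeat, for each component), not a statement about the exact internals of tune.block.splsda.

```r
# The two example grids from the answer above.
fine_grid   <- 5:50
coarse_grid <- c(5:9, seq(10, 49, 5))  # 5, 6, 7, 8, 9, 10, 15, ..., 45

# Hypothetical block names; use the names of your own X list.
blocks <- c("mRNA", "miRNA")

# test.keepX is a named list with one grid per block.
make_test_keepX <- function(grid) {
  setNames(rep(list(grid), length(blocks)), blocks)
}
test_keepX_fine   <- make_test_keepX(fine_grid)
test_keepX_coarse <- make_test_keepX(coarse_grid)

# Rough number of model fits per component (assumption, see the lead-in).
n_fits <- function(test_keepX, folds, nrepeat) {
  prod(lengths(test_keepX)) * folds * nrepeat
}

fits_fine   <- n_fits(test_keepX_fine,   folds = 5, nrepeat = 10)  # 105800
fits_coarse <- n_fits(test_keepX_coarse, folds = 5, nrepeat = 10)  # 8450

# The tuning call itself would look roughly like this (not run here;
# X, Y and design are your own objects):
# tune.block.splsda(X, Y, ncomp = 2, test.keepX = test_keepX_coarse,
#                   design = design, validation = "Mfold",
#                   folds = 5, nrepeat = 10)
```

Under this estimate the coarse grid is about 12 times cheaper with two blocks, which is why a coarse first pass followed by a finer grid around the best values can be a sensible compromise.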

Christopher
Thanks for the answers both!