Correlation values in circosPlot


I ran a correlation analysis with block.splsda and drew a circos plot. I saved the values behind the plot to an Excel file to check the actual correlations, and found that the correlation of a compound/parameter with itself is not 1. Why is that? Does it mean the row names and column names are not correct?

Hi @Shyam,

Thanks for using mixOmics.

The correlations are not exact. They are estimated from the canonical variates (the components). You can refer to the Methods section "Pair-wise variable associations for CCA" for more details. Because the estimate is built from the components, the more components you add, the closer the self-correlation should typically get to 1. See the example below.

Let us know if you have further questions.

```r
library(mixOmics)
data(nutrimouse)

## Function to create example circosPlots given 'ncomp'
## (circosPlot returns the similarity matrix it draws)
circosPlot_example <- function(ncomp = 2) {
    Y <- nutrimouse$diet
    data <- list(gene = nutrimouse$gene, lipid = nutrimouse$lipid)
    # 3 x 3 design over the gene, lipid and Y blocks, all connected
    design <- matrix(c(0,1,1, 1,0,1, 1,1,0), ncol = 3, nrow = 3, byrow = TRUE)
    nutrimouse.sgccda <- wrapper.sgccda(X = data,
                                        Y = Y,
                                        design = design,
                                        keepX = list(gene = c(10,10), lipid = c(15,15)),
                                        ncomp = ncomp,
                                        scheme = "horst")
    circosPlot(nutrimouse.sgccda, cutoff = 0.7, ncol.legend = 2, size.legend = 1.1)
}

## ------- 2 components
simMat1 <- circosPlot_example(ncomp = 2)
## estimated correlation of ACC2 with itself (diagonal of the similarity matrix)
diag(simMat1)["ACC2"]
#>      ACC2 
#> 0.7805765

## ------- 4 components
simMat2 <- circosPlot_example(ncomp = 4)
## estimated correlation of ACC2 with itself
diag(simMat2)["ACC2"]
#>      ACC2 
#> 0.8067362

## ------- 8 components
simMat3 <- circosPlot_example(ncomp = 8)
## estimated correlation of ACC2 with itself
diag(simMat3)["ACC2"]
#>      ACC2 
#> 0.8955538
```
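To see why the diagonal stays below 1, here is a simplified base-R sketch. It is an analogy, not the actual circosPlot computation in mixOmics: it assumes the estimated correlation between two variables is the sum, over the first ncomp components, of the products of each variable's correlation with the component scores, and it uses PCA scores of one simulated data set instead of the block.splsda variates. The helper name `approx_cor` is hypothetical. With fewer components than the rank of the data, each variable's estimated self-correlation is below 1; once all components are included, the estimate reproduces the exact correlation matrix.

```r
# Simplified illustration (base R only; NOT mixOmics's circosPlot code).
# Assumption: cor(X_j, X_k) is approximated by summing, over the first ncomp
# components, cor(X_j, t_h) * cor(X_k, t_h), where t_h are PCA scores of X.
set.seed(1)
X <- scale(matrix(rnorm(40 * 6), nrow = 40))   # 40 samples, 6 standardised variables

# PCA scores (X is already centred and scaled), one column per component
scores <- prcomp(X, center = FALSE, scale. = FALSE)$x

# Hypothetical helper: estimated p x p correlation matrix from the first ncomp components
approx_cor <- function(X, scores, ncomp) {
  C <- cor(X, scores[, seq_len(ncomp), drop = FALSE])  # p x ncomp variable-component correlations
  C %*% t(C)
}

# Rows = variables, columns = number of components used.
# Each row is non-decreasing and reaches 1 when all 6 components are used.
self_cor <- sapply(1:6, function(k) diag(approx_cor(X, scores, k)))
print(round(self_cor, 3))

# With all components, the estimate equals the exact correlation matrix
all.equal(approx_cor(X, scores, 6), cor(X), check.attributes = FALSE)
```

The same mechanism explains the circos output above: with 2, 4 or 8 components the self-correlation of ACC2 is only partially recovered, and it grows as more components are added.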



Thank you so much, Aljabadi.

Hi Al,

I have one more question. How important is the 'keepX' argument, and how do I choose the right numbers for my keepX list? For example, you chose c(10,10) for gene and c(15,15) for lipid. Can I pick any number of rows or columns from my data, or is there guidance on how to choose these numbers? I am a beginner, so this might be a naive question.

Thank you, and I look forward to hearing from you.


Hi @Shyam,

You can use the tune.block.splsda function for that purpose. You provide ncomp (the number of classes is a good starting value for tuning) and a list of candidate test.keepX values for each block. The function then uses cross-validation to recommend the number of components and the keepX values that give the best predictive performance. It is important to use M-fold cross-validation and to repeat it (nrepeat) at least 2 times, so that the function can evaluate whether adding a component significantly improves the model. If computation time allows, more repeats (up to 50) give a more robust model evaluation. See the tune.block.splsda documentation for more details. The example below uses parallel processing (cpus > 1).

```r
library(mixOmics)
data(breast.TCGA)

# X is a list of three blocks (mRNA, miRNA and protein); Y is the tumour subtype
data = list(mrna = breast.TCGA$data.train$mrna, mirna = breast.TCGA$data.train$mirna,
            protein = breast.TCGA$data.train$protein)
# set up a full design where every block is connected
design = matrix(1, ncol = length(data), nrow = length(data),
                dimnames = list(names(data), names(data)))
diag(design) = 0
# maximum number of components to tune
ncomp = 5

# candidate keepX values to test for each block (mRNA, miRNA and protein)
# names of test.keepX must match the names of 'data'
test.keepX = list(mrna = seq(10, 40, 20), mirna = seq(10, 30, 10), protein = seq(1, 10, 5))

# the following may take some time to run; note that for thorough tuning
# nrepeat should be > 1
tune = tune.block.splsda(X = data, Y = breast.TCGA$data.train$subtype,
                         ncomp = ncomp, test.keepX = test.keepX, design = design,
                         validation = "Mfold", folds = 10, nrepeat = 3,
                         cpus = max(1, parallel::detectCores() - 1))
```
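Once tuning has finished, the recommended values can be passed to the final model. The sketch below assumes the tuning result exposes `choice.ncomp$ncomp` and `choice.keepX`, as described in the tune.block.splsda documentation; please check these element names against your installed mixOmics version. `fit_tuned_diablo` is a hypothetical helper, not part of mixOmics.

```r
# Hypothetical helper (not part of mixOmics): fit the final block.splsda model
# using the values recommended by tune.block.splsda.
fit_tuned_diablo <- function(tune.res, X, Y, design) {
  # recommended number of components (assumed to be stored in choice.ncomp$ncomp)
  ncomp.opt <- tune.res$choice.ncomp$ncomp
  # fall back to the number of tuned components if no recommendation is stored
  if (is.null(ncomp.opt)) ncomp.opt <- length(tune.res$choice.keepX[[1]])
  # keep only the keepX values for the retained components, block by block
  keepX.opt <- lapply(tune.res$choice.keepX, function(k) k[seq_len(ncomp.opt)])
  mixOmics::block.splsda(X = X, Y = Y, ncomp = ncomp.opt,
                         keepX = keepX.opt, design = design)
}
```

With the objects from the tuning example, usage would be `final.model <- fit_tuned_diablo(tune, data, breast.TCGA$data.train$subtype, design)`. Truncating keepX to the recommended number of components keeps the keepX list consistent with ncomp.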


Please let us know if you have further questions.