After getting the correlation values from block.splsda, how can I compute the p-value of each correlation?
Thank you, Cherry, for your question. Unfortunately we don't provide a p-value, because our methods do not fit into a hypothesis testing framework. If you really want a p-value, you could run a correlation test after identifying the specific pairs of variables you are interested in. That would be done outside mixOmics.
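A minimal sketch of such a test in base R, outside mixOmics. It reuses the values Cherry posts further down in this thread (column names from the second run); the two pairs in `pairs_of_interest` are a hypothetical choice standing in for whatever pairs you pick from the circosPlot output. `cor.test()` returns the Pearson estimate and a two-sided p-value, and `p.adjust()` corrects for testing several pairs. With only six samples, such tests have very little power, so treat the p-values with caution.

```r
# Values posted later in this thread (second-run column names)
dat <- data.frame(
  A = c(2.107631, 2.869464, 4.225236, 5.078838, 1.869390, 0.983715),
  B = c(6.789404, 14.438547, 3.877956, 5.062241, 11.026929, 6.290816),
  c = c(2.471672e-05, 2.559754e-05, 2.370850e-05, 1.386839e-05, 1.935603e-05, 1.801009e-05),
  d = c(3.517924e-06, 4.058982e-06, 3.840532e-06, 2.508619e-06, 3.344055e-06, 3.132687e-06)
)

# Hypothetical pairs of interest, e.g. picked after looking at circosPlot
pairs_of_interest <- list(c("A", "d"), c("B", "c"))

# Pearson correlation test for each pair (use method = "spearman" for ranks)
tests <- lapply(pairs_of_interest,
                function(p) cor.test(dat[[p[1]]], dat[[p[2]]], method = "pearson"))
names(tests) <- sapply(pairs_of_interest, paste, collapse = "-")

estimates <- sapply(tests, function(ct) unname(ct$estimate))
pvals <- sapply(tests, function(ct) ct$p.value)

# Adjust for testing several pairs (Benjamini-Hochberg false discovery rate)
pvals_adj <- p.adjust(pvals, method = "BH")
cbind(estimate = estimates, p = pvals, p_adj = pvals_adj)
```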
Hi, Kimanh.lecao,
Thank you for your reply, and I am very sorry for not responding sooner.
I have a question about the correlations shown by circosPlot:
why does the group information affect the value and the sign of the correlations?
For example, I set y like this:
`y <- c("CASEA", "CASEA", "CASEA", "CTRO", "CTRO", "CTRO")`
and then like this:
`y <- c("CASEA", "CASEA", "CASEB", "CASEB", "CTRO", "CTRO")`
The correlation values differ between the two settings.
Hi @Cherry,
As Kim-Anh is away, I'd like to see if I can help you.
Is `y` the group membership of the samples? If so, please note that it gives the group membership of the samples in your X matrices, so by changing it (or its order) the model and the estimated variates will also change. The correlations are estimated from the model variates, so when the model changes, the estimates will most likely change too. You can refer to https://biodatamining.biomedcentral.com/articles/10.1186/1756-0381-5-19 for more on the rationale and methodology behind the correlation estimation.
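To make that mechanism concrete, here is a toy one-component sketch in base R. It is a simplified stand-in for illustration, NOT the mixOmics code: each block gets one PLS-DA-like variate whose weights are the first left singular vector of `t(X) %*% Y` (Y = centred, scaled dummy coding of the groups), and the similarity between variables of the two blocks is the product of their correlations with the respective variates, in the spirit of the paper above. The helpers `first_variate` and `variate_similarity` are hypothetical names. Run on the values Cherry posted, with identical X but the two groupings, the cross-block values change because Y enters the weights.

```r
# Toy illustration only; not the mixOmics implementation.
X1 <- cbind(A = c(2.107631, 2.869464, 4.225236, 5.078838, 1.869390, 0.983715),
            B = c(6.789404, 14.438547, 3.877956, 5.062241, 11.026929, 6.290816))
X2 <- cbind(c = c(2.471672e-05, 2.559754e-05, 2.370850e-05,
                  1.386839e-05, 1.935603e-05, 1.801009e-05),
            d = c(3.517924e-06, 4.058982e-06, 3.840532e-06,
                  2.508619e-06, 3.344055e-06, 3.132687e-06))

# Hypothetical helper: one PLS-DA-like variate for block X given labels y
first_variate <- function(X, y) {
  Xs <- scale(X)                                  # centre and scale variables
  Y <- scale(model.matrix(~ factor(y) - 1))       # dummy-coded groups
  w <- svd(crossprod(Xs, Y))$u[, 1]               # weights depend on y
  drop(Xs %*% w)                                  # the variate (sample scores)
}

# Hypothetical helper: cross-block similarity cor(x_j, t1) * cor(z_k, t2)
variate_similarity <- function(X1, X2, y) {
  t1 <- first_variate(X1, y)
  t2 <- first_variate(X2, y)
  if (cor(t1, t2) < 0) t2 <- -t2                  # SVD signs are arbitrary: align t2 with t1
  outer(drop(cor(X1, t1)), drop(cor(X2, t2)))
}

s_two_groups <- variate_similarity(
  X1, X2, c("CASEA", "CASEA", "CASEA", "CTRO", "CTRO", "CTRO"))
s_three_groups <- variate_similarity(
  X1, X2, c("CASEA", "CASEA", "CASEB", "CASEB", "CASEC", "CASEC"))
round(s_two_groups, 3)
round(s_three_groups, 3)   # same X, different y, different similarities
```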
Hope it helps.
Please let us know if you have further questions.
Al
Hi, Al
Thank you very much for your response.
Here are the details:
```
> x<-list(a,b)
> x
[[1]]
              A         B
CASEA1 2.107631  6.789404
CASEA2 2.869464 14.438547
CASEA3 4.225236  3.877956
CTRO1  5.078838  5.062241
CTRO2  1.869390 11.026929
CTRO3  0.983715  6.290816

[[2]]
                  a            c
CASEA1 2.471672e-05 3.517924e-06
CASEA2 2.559754e-05 4.058982e-06
CASEA3 2.370850e-05 3.840532e-06
CTRO1  1.386839e-05 2.508619e-06
CTRO2  1.935603e-05 3.344055e-06
CTRO3  1.801009e-05 3.132687e-06

> y<-c("CASEA","CASEA","CASEA","CTRO","CTRO","CTRO")
> cc<-length(unique(y))
> cc
[1] 2
> design<-matrix(0.1,ncol = length(x),nrow = length(x),dimnames = list(names(x)))
> sgccda.res<-block.splsda(X=x,Y=y,ncomp=2,design = design)
Design matrix has changed to include Y; each block will be linked to Y.
> P<-circosPlot(sgccda.res,cutoff = 0,color.blocks = c('darkorchid','brown1'),color.Y=1:cc, color.cor = c('blue','red'),line=TRUE,size.labels = 1.2,size.variables = 1.2)
> P
           A          B         a         c
A  1.0000000 -0.3548633 0.5471020 0.2485341
B -0.3548633  1.0000000 0.5884418 0.8173878
a  0.5471020  0.5884418 1.0000000 0.9467750
c  0.2485341  0.8173878 0.9467750 1.0000000
```
Then I changed the sample names and y:
```
> x<-list(a,b)
> x
[[1]]
              A         B
CASEA1 2.107631  6.789404
CASEA2 2.869464 14.438547
CASEB1 4.225236  3.877956
CASEB2 5.078838  5.062241
CASEC1 1.869390 11.026929
CASEC2 0.983715  6.290816

[[2]]
                  c            d
CASEA1 2.471672e-05 3.517924e-06
CASEA2 2.559754e-05 4.058982e-06
CASEB1 2.370850e-05 3.840532e-06
CASEB2 1.386839e-05 2.508619e-06
CASEC1 1.935603e-05 3.344055e-06
CASEC2 1.801009e-05 3.132687e-06

> y<-c("CASEA","CASEA","CASEB","CASEB","CASEC","CASEC")
> cc<-length(unique(y))
> cc
[1] 3
> design<-matrix(0.1,ncol = length(x),nrow = length(x),dimnames = list(names(x)))
> sgccda.res<-block.splsda(X=x,Y=y,ncomp=2,design = design)
Design matrix has changed to include Y; each block will be linked to Y.
> P<-circosPlot(sgccda.res,cutoff = 0,color.blocks = c('darkorchid','brown1'),color.Y=1:cc, color.cor = c('blue','red'),line=TRUE,size.labels = 1.2,size.variables = 1.2)
> P
           A          B          c          d
A  1.0000000 -0.3548633 -0.7225435 -0.9066211
B -0.3548633  1.0000000  0.9027369  0.7162113
c -0.7225435  0.9027369  1.0000000  0.9467750
d -0.9066211  0.7162113  0.9467750  1.0000000
```
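I only changed the sample names and the groups, not the order of the samples, but the correlation between A and d (the same column as c in the first run) is quite different: 0.2485341 in the first run versus -0.9066211 in the second.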
Hi @Cherry,
Thanks for the details. My understanding is that:
- You are performing two analyses on datasets that contain different but overlapping samples
- You are wondering why the estimated correlations between the same variables are different (possibly the variables `c` and `A`?)

Is that right?
If so, it's important to remember that you are looking at different datasets with non-identical samples, so the correlation values will most likely differ.
Yes, there are two analyses, but the samples are the same; I only changed the sample names. The first analysis has six samples in two groups; the second has the same six samples in three groups. And yes, I don't understand why the estimated correlations between the same variables (`c` and `A`) are different.
The correlations are estimated using the model's canonical variates, and thus will differ when you change the model by changing Y.
You can refer to the Methods section of the paper mentioned above for more details. Given that your datasets have a limited number of variables, you can always calculate the exact correlations with the `cor` function in R, but I'm not sure whether that is useful for your analysis.
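A minimal sketch of that suggestion in base R, re-entering the values Cherry posted (column names from the second run). `cor()` never sees `y`, so it returns the same matrix for both groupings. As a sanity check, the within-block A/B entry should come out at about -0.3549, which appears to match the -0.3548633 that circosPlot reported in both runs; only the cross-block values moved when Y changed.

```r
# Values posted above; second-block columns named as in the second run
dat <- data.frame(
  A = c(2.107631, 2.869464, 4.225236, 5.078838, 1.869390, 0.983715),
  B = c(6.789404, 14.438547, 3.877956, 5.062241, 11.026929, 6.290816),
  c = c(2.471672e-05, 2.559754e-05, 2.370850e-05, 1.386839e-05, 1.935603e-05, 1.801009e-05),
  d = c(3.517924e-06, 4.058982e-06, 3.840532e-06, 2.508619e-06, 3.344055e-06, 3.132687e-06)
)

# Exact Pearson correlations between all variables of both blocks.
# The group labels are not used here, so both runs give the same matrix.
exact_cor <- cor(dat)
round(exact_cor, 4)
```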
Hope it helps
Al