Thank you for a great tool! I got a pretty good understanding of (s)PLS-DA rather quickly, but DIABLO is a bit more complicated.
I have identified variables from 3 datasets that distinguish healthy from disease within each dataset: 9 variables on comp 1 in dataset1, 200 variables on comp 1 in dataset2, and 8 variables on comp 1 in dataset3.
I have extracted these variables and their values to create the list of datasets for DIABLO, containing only these 217 variables. Does this make sense?
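To make the setup concrete, here is a minimal base R sketch of how I built the list. The data are simulated and the object names (colon_full, plasma_full, olink_full, sel_colon, sel_plasma, sel_olink) are hypothetical stand-ins for my real objects; only the block names and the 9/200/8 variable counts match my actual analysis.

```r
# Simulated stand-ins for the three full datasets (samples in rows).
set.seed(1)
n <- 20
samples <- paste0("S", seq_len(n))
make_block <- function(p, prefix) {
  matrix(rnorm(n * p), nrow = n,
         dimnames = list(samples, paste0(prefix, seq_len(p))))
}
colon_full  <- make_block(50,  "c")
plasma_full <- make_block(500, "p")
olink_full  <- make_block(30,  "o")

# Hypothetical selections: in the real analysis these are the variable
# names selected on comp 1 of each single-dataset sPLS-DA.
sel_colon  <- colnames(colon_full)[1:9]
sel_plasma <- colnames(plasma_full)[1:200]
sel_olink  <- colnames(olink_full)[1:8]

# Named list of reduced blocks, as passed to DIABLO.
X <- list(colon  = colon_full[, sel_colon, drop = FALSE],
          plasma = plasma_full[, sel_plasma, drop = FALSE],
          olink  = olink_full[, sel_olink, drop = FALSE])

# All blocks must contain the same samples in the same order.
stopifnot(all(sapply(X, function(b) identical(rownames(b), samples))))

sapply(X, ncol)       # variables kept per block
sum(sapply(X, ncol))  # total number of variables
```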
Next, I want to tune the number of variables to keep from each dataset:
test.keepX = list(colon = c(3:9), plasma = c(5:10, seq(20, 100, 20)), olink = c(3:8))
design = matrix(0, ncol = length(X), nrow = length(X), dimnames = list(names(X), names(X)))
tune.BBM = tune.block.splsda(X = X, Y = Y, ncomp = 5, test.keepX = test.keepX, design = design, validation = 'Mfold', folds = 10, nrepeat = 5, dist = "centroids.dist", cpus = 7)
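For reference, this is the size of the grid I am asking it to search. It is a base R sketch only, and it assumes (this is my understanding, not something I have confirmed) that the tuning evaluates every combination of the per-dataset values on each component.

```r
# Same candidate values as in test.keepX above.
test.keepX <- list(colon  = c(3:9),
                   plasma = c(5:10, seq(20, 100, 20)),
                   olink  = c(3:8))

# Number of candidate values per dataset.
lengths(test.keepX)

# Every combination of one value per dataset (assumed search grid).
grid <- expand.grid(test.keepX)
nrow(grid)
head(grid)
```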
But I do not understand the output. For example, I get an error rate of 0 for all components, and this:
 3 3 3 3 3
 5 10 5 5 5
 3 3 3 3 3
Please see the attached screenshots.
How do I interpret the results? Error = 0 can’t be right. What went wrong?
In addition, I ran circosPlot including all variables. I wanted to get the corMat as well, so I ran:
corMat <- circosPlot(MyResult.diablo1, cutoff = 0.7, ncol.legend = 2, size.legend = 1.1)
as suggested in another post here in the forum. However, I am puzzled by the output.
How does this code calculate the correlation?
Looking at the matrix, I see one metabolite from dataset1 correlating with itself at only 0.65, while it correlates with a protein from a different dataset (and biological compartment) at 0.72. How can this be?
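To show why this puzzles me: with a plain Pearson correlation on the raw values, every variable correlates with itself at exactly 1 by definition, so I assume corMat is computed some other way. A minimal base R sketch with simulated data (the variable names metabolite and protein are hypothetical):

```r
set.seed(1)
n <- 20

# Simulated values: one variable from dataset1, one from another dataset.
metabolite <- rnorm(n)
protein    <- 0.7 * metabolite + rnorm(n, sd = 0.7)

raw <- cbind(metabolite = metabolite, protein = protein)

# Plain Pearson correlation matrix on the raw values.
pearson <- cor(raw)

# Self-correlations sit on the diagonal and are always 1 for Pearson.
diag(pearson)

# The cross-dataset correlation is the off-diagonal entry.
pearson["metabolite", "protein"]
```

Since the diagonal of corMat is not 1, it does not look like a Pearson correlation of the raw data, which is why I am asking how it is calculated.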