Hi,

Thank you for a great tool! I think I got a pretty good understanding of (s)PLS-DA rather quickly; DIABLO is a bit more complicated, though.

I have identified variables in 3 datasets that distinguish healthy from disease within each dataset: 9 variables on component 1 in dataset1, 200 variables on component 1 in dataset2, and 8 variables on component 1 in dataset3.

I have extracted these variables and their values to create the list of data blocks for DIABLO, containing only these 217 variables. Does this make sense?
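The subsetting step described above can be sketched roughly as follows. This is only a minimal illustration on simulated data: the block names (colon, plasma, olink) are borrowed from the tuning call further down, and I am assuming they correspond to dataset1, dataset2 and dataset3 in that order. The feature names, block sizes, and the `selected` list are hypothetical placeholders for whatever the single-dataset (s)PLS-DA models returned.

```r
# Simulated stand-ins for the three full datasets (6 samples each).
# All feature names and block sizes here are hypothetical.
set.seed(42)
samples <- paste0("sample", 1:6)
make_block <- function(p, prefix) {
  matrix(rnorm(length(samples) * p), nrow = length(samples),
         dimnames = list(samples, paste0(prefix, seq_len(p))))
}
full <- list(colon  = make_block(20,  "met"),
             plasma = make_block(300, "plm"),
             olink  = make_block(15,  "prot"))

# Hypothetical selections: in practice these would be the variable names
# selected on component 1 by each single-dataset model.
selected <- list(colon  = paste0("met",  1:9),
                 plasma = paste0("plm",  1:200),
                 olink  = paste0("prot", 1:8))

# Keep only the selected columns in each block; rows (samples) stay aligned.
X <- Map(function(block, vars) block[, vars, drop = FALSE], full, selected)

# Number of variables retained per block
n_kept <- sapply(X, ncol)
print(n_kept)
```

Using `drop = FALSE` keeps each block a matrix even if a single variable is selected, and `Map` preserves the block names so that `names(X)` can be reused for the design matrix.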

Next, I want to tune the number of variables to keep from each dataset:

test.keepX = list(colon = c(3:9), plasma = c(5:10, seq(20, 100, 20)), olink = c(3:8))

design = matrix(0, ncol = length(X), nrow = length(X), dimnames = list(names(X), names(X)))

design

tune.BBM = tune.block.splsda(X = X, Y = Y, ncomp = 5, test.keepX = test.keepX, design = design, validation = "Mfold", folds = 10, nrepeat = 5, dist = "centroids.dist", cpus = 7)

But I do not understand the output. For example, I get an error rate of 0 for all components, and the following choice of keepX:

tune.BBM$choice.keepX

$colon

[1] 3 3 3 3 3

$plasma

[1] 5 10 5 5 5

$olink

[1] 3 3 3 3 3

Please see the attached screenshots.

How do I interpret the results? An error rate of 0 can't be right. What went wrong?

In addition, I ran circosPlot including all variables. I wanted to get the corMat as well, so I ran:

`corMat <- circosPlot(MyResult.diablo1, cutoff = 0.7, ncol.legend = 2, size.legend = 1.1)`

as suggested in another post here in the forum. However, I am puzzled by the output.

How does this code calculate the correlation?

Looking at the matrix, I see that one metabolite from dataset1 has a correlation with itself of only 0.65, while its correlation with a protein from a different dataset (and biological compartment) is 0.72. How can this be?

/Stef