Hi, I have three datasets I wish to integrate. I have run a separate sPLS-DA on each dataset, with two classes: healthy (n=9) and diseased (n=6). The optimal ncomp and keepX according to tune.splsda were as follows:

dataset   ncomp   keepX
A         1       9
B         3       600, 460, 110
C         1       10

Analysing B further, only the first 30 variables on component 1 are significantly changed.

I would therefore like to integrate these variables in DIABLO.

I ran:

list.keepX <- list(colon = c(9, 1), plasma = c(30, 1), olink = c(10, 1))

MyResult.diablo <- block.splsda(X, Y, keepX = list.keepX, ncomp = 2)

But when I visualise the results, e.g. with circosPlot, many of the variables in the plot are not the ones I wished to select with colon = c(9, 1), plasma = c(30, 1), olink = c(10, 1), and many interesting ones are missing. So it seems my code is not extracting the correct variables. Did I misunderstand what keepX does? How can I integrate only the variables of interest? Or would you argue against doing this at all, given the small number of samples? Alternatively, how can I identify the best number of variables to keep in each dataset for DIABLO? Is there something similar to tune.splsda that works when X contains three datasets?

The variables we identified using the separate analyses are highly significant and make sense, so I wish to identify relationships amongst them.
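To make concrete what I mean by "integrating only the variables of interest", here is a rough sketch of the alternative I am considering: subsetting each block to the variables from the separate analyses before calling block.splsda. The simulated matrices, the block sizes, and the vars.of.interest list are placeholders I made up for illustration (in practice the names would come from something like selectVar(<my sPLS-DA model>, comp = 1)$name), and I am not sure this is a sound approach.

```r
# Hypothetical stand-ins for my three blocks (15 samples); the real data differ.
set.seed(1)
n <- 15
samples <- paste0("s", seq_len(n))
X <- list(
  colon  = matrix(rnorm(n * 50),  nrow = n,
                  dimnames = list(samples, paste0("colon_", 1:50))),
  plasma = matrix(rnorm(n * 700), nrow = n,
                  dimnames = list(samples, paste0("plasma_", 1:700))),
  olink  = matrix(rnorm(n * 92),  nrow = n,
                  dimnames = list(samples, paste0("olink_", 1:92)))
)
Y <- factor(rep(c("healthy", "diseased"), times = c(9, 6)))

# Placeholder names standing in for the variables selected on component 1
# by each separate sPLS-DA (9, 30 and 10 variables respectively).
vars.of.interest <- list(
  colon  = colnames(X$colon)[1:9],
  plasma = colnames(X$plasma)[1:30],
  olink  = colnames(X$olink)[1:10]
)

# Restrict each block to its variables of interest, keeping it a matrix.
X.sub <- Map(function(block, keep) block[, keep, drop = FALSE],
             X, vars.of.interest)

# Only attempt the DIABLO fit if mixOmics is installed.
if (requireNamespace("mixOmics", quietly = TRUE)) {
  # keepX equal to the number of columns, so no further selection is made
  # within the pre-filtered blocks on either component.
  keep.all <- lapply(X.sub, function(block) rep(ncol(block), 2))
  MyResult.sub <- mixOmics::block.splsda(X.sub, Y, keepX = keep.all, ncomp = 2)
}
```

With this, the plots could only ever show the pre-selected variables, but I do not know whether pre-filtering on the same 15 samples is acceptable or whether it biases the result.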

Thank you very much for your help.

I very much enjoy mixOmics; it makes PLS-DA very easy and produces beautiful figures.

Cheers,

Stef