DIABLO of Fold Change data

Hello mixOmics team,

I have been trying to use the DIABLO model to integrate RNA-seq and metabolomics data from different plants responding to a virus. Since I am only interested in the differences in response to the virus, I want to use fold change data as input to the model (i.e. the ratio between inoculated and non-inoculated plants). I managed to make DIABLO work using the raw fold change ratio (inoc/mock), but when I try to use log2(Fold Change) to make the data symmetrical, I get stuck at the tuning step. I get this error:

tune.diablo = tune.block.splsda(X = data, Y = Y, ncomp = ncomp, 
                                 test.keepX = test.keepX, design = design,
                                 validation = 'loo', folds = 10, nrepeat = 1,
                                 dist = "max.dist")
Error in 1:n : Argument is NA / NaN

Is it because the log2 transformation introduces negative values?

Also, I read that one has to use a high number of repeats (at least 100) in order to tune the model correctly. Is that also the case with LOO cross-validation?

Thank you in advance for your help,

Julien

Hi @Julien,

The problem is not the negative values. It may be a high correlation between the variables after you apply the fold change, a number of folds that is too high, or a number of components that is too high.

From the error you mention, it seems that it only happens during the tuning?

I think that in your case, since you already ask for LOO, the folds = x argument has no effect. (Otherwise I was about to suggest that you try folds = N / 6, where N is the number of samples; roughly, that means I'd like you to retain at least 6 samples in your test fold every time.)
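The N / 6 rule of thumb above can be sketched in a few lines of base R. This is only an illustration: the sample count N = 24 is a made-up value, and the commented call assumes you switch to validation = "Mfold" so that folds is actually used; the nrepeat value there is a placeholder, not a recommendation.

```r
# Hypothetical number of samples; with real data use e.g. nrow(data[[1]])
N <- 24

# Keep roughly 6 samples in each test fold, with at least 2 folds
min_test_size <- 6
folds <- max(2, floor(N / min_test_size))

# Illustrate the resulting test fold sizes with a simple round-robin assignment
fold_id <- rep(seq_len(folds), length.out = N)
fold_sizes <- table(fold_id)
print(folds)
print(fold_sizes)

# Assumed usage with M-fold cross-validation (not run here):
# tune.diablo <- tune.block.splsda(X = data, Y = Y, ncomp = ncomp,
#                                  test.keepX = test.keepX, design = design,
#                                  validation = "Mfold", folds = folds,
#                                  nrepeat = 10,  # placeholder value
#                                  dist = "max.dist")
```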

For ncomp, it is hard to tell. If you followed our vignette, I assume you have tuned ncomp first, but try ncomp - 1 to see what happens.

Is block.splsda running fine for a small number of components? I think you want to check this point first. If it throws a similar error, then it is due to the values of the fold change, and perhaps a large number of variables with zero variance.
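A quick way to look at the fold change values themselves is to count non-finite entries and zero-variance variables in each block before fitting. The sketch below uses toy matrices (rna and metab are hypothetical block names, not your data). It also checks one possibility that has not been confirmed in this thread: a ratio of exactly 0 becomes -Inf after log2.

```r
set.seed(1)

# Toy fold change ratios (inoc/mock) for 6 samples; hypothetical data
ratio_rna <- matrix(rexp(6 * 5), nrow = 6)
ratio_rna[2, 3] <- 0   # a zero ratio becomes -Inf after log2
ratio_rna[, 5] <- 1    # a constant ratio has zero variance after log2
ratio_metab <- matrix(rexp(6 * 4), nrow = 6)

data <- list(rna = log2(ratio_rna), metab = log2(ratio_metab))

# Count non-finite entries and zero-variance columns in one block
check_block <- function(x) {
  zero_var <- apply(x, 2, function(col) {
    col <- col[is.finite(col)]          # ignore NA / NaN / Inf for the variance
    length(col) < 2 || var(col) == 0
  })
  c(non_finite = sum(!is.finite(x)), zero_var = sum(zero_var))
}

diagnostics <- sapply(data, check_block)
print(diagnostics)

# Assumed follow-up (not run here): fit with a single component, optionally
# letting mixOmics filter near-zero-variance variables.
# fit <- block.splsda(X = data, Y = Y, ncomp = 1, design = design,
#                     near.zero.var = TRUE)
```

If either count is non-zero for a block, it is worth cleaning those variables before returning to the tuning step.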

Your other option, which is what we usually do, is to include the virus samples without computing the fold change, and then compare your groups based on 'virus', 'treatment A', etc.

Kim-Anh