Hi,
I have a numeric outcome as a response to an intervention (%). I have used sPLS regression to associate it to metabolite changes (%) in patient serum from time point 1 to time point 2 with very interesting results!
I now want to see which gut microbial changes are associated with the serum metabolite changes that correlate with my outcome, i.e. are there specific microbes influencing the metabolite levels and thus impacting my response to the intervention? For the metabolites I just calculated the relative change in serum levels, but due to compositionality this is not possible for my taxa, as we do not have any absolute measures like total bacterial load. I was thus wondering whether it is possible to perform a multilevel DIABLO with a numeric outcome. Is this possible in mixOmics? If yes, do you have a tutorial for this? Or is it just a matter of using Y as numeric and including multilevel in block.pls or block.spls? Or is there another, better way?
Thank you so much for your help!
Best wishes,
Stef
While waiting for your response, I thought I’d try to run an mDIABLO with a binary outcome, i.e. responders vs non-responders, but I am unsure how to account for the repeated measures, as there is no multilevel parameter in block.pls. I thus tried the following:
```{r}
# multilevel is not a parameter in block.plsda, only in pca, but it can be calculated manually:
design <- data.frame(sample = meta$PatientGroup) # set multilevel design using sample IDs for each instance
taxa_w <- withinVariation(taxa, design) # decompose the data frame
metab_w <- withinVariation(metab, design)

# merge data and name each data frame
X <- list(taxa = taxa_w,
          metab = metab_w)
Y <- as.factor(meta$Epilepsy)
rownames(X$metab)
rownames(X$taxa)
summary(Y)
```
I am getting
```
Splitting the variation for 1 level factor.
Splitting the variation for 1 level factor.
```
Not sure what this means, but when I continue with:
```{r}
pls.res <- mixOmics::pls(X$metab, X$taxa, ncomp = 1)
cor(pls.res$variates$X, pls.res$variates$Y)
#       comp1
# comp1  0.92

MyDesignPLSfull <- matrix(c(0, 0.92, 1,
                            0.92, 0, 1,
                            1, 1, 0),
                          byrow = TRUE,
                          ncol = 3, nrow = 3)
colnames(MyDesignPLSfull) <- c("taxa", "metab", "Y")
rownames(MyDesignPLSfull) <- c("taxa", "metab", "Y")
MyDesignPLSfull
MyResult.diabloPLSfull <- block.plsda(X, Y, ncomp = 7, design = MyDesignPLSfull)
```
I get an error:
```
Error in if (diff.value < tol | iter > max.iter) break :
missing value where TRUE/FALSE needed
```
How can I fix this?
Thanks so much for your help!
/Stef
Hi, I am still stuck here. Could someone please advise?
I wonder whether it is at all possible to run DIABLO with a numeric Y. I have not seen any tutorial on that, so just a quick yes or no would be great. If yes, does it work the same way as DIABLO with a discriminant outcome, or is there something I need to do differently? If so, is there a tutorial? I very much appreciate your help.
/Stef
hi @stepra,
Apologies, urgency is not really something we can address given how small the team is!
You are doing things correctly, as far as I can tell. DIABLO is a supervised method, so it expects Y to be a factor. If you give it a numeric Y, it will treat it as a factor nonetheless. Otherwise you need to use block.pls() with Y as numeric, but there are not many functions associated with this method yet (e.g. tune, perf etc.; we are still working on it). Here is a somewhat distant example (where we set Y = transcriptomics): the "Multiblock sPLS Gastrulation (Single Cell) Case Study" on the mixOmics website.
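As a rough sketch of the block.pls() route with a numeric Y: the data below are simulated, so the objects `taxa`, `metab` and `response`, the 0.5 design value and `ncomp = 2` are placeholder assumptions rather than recommendations for your data.

```r
library(mixOmics)

set.seed(1)
n <- 20
ids <- paste0("patient", seq_len(n))

# Simulated stand-ins (assumption): two X blocks and one numeric outcome
taxa  <- matrix(rnorm(n * 15), nrow = n,
                dimnames = list(ids, paste0("taxon", 1:15)))
metab <- matrix(rnorm(n * 10), nrow = n,
                dimnames = list(ids, paste0("metab", 1:10)))
response <- matrix(rnorm(n), ncol = 1,
                   dimnames = list(ids, "response"))

X <- list(taxa = taxa, metab = metab)

# Design between the X blocks only; 0.5 is an arbitrary placeholder link
design <- matrix(0.5, nrow = 2, ncol = 2,
                 dimnames = list(names(X), names(X)))
diag(design) <- 0

# Y is a numeric matrix here, so this is a regression, not a classification
fit <- block.pls(X, Y = response, ncomp = 2, design = design)

# One set of component scores per block
names(fit$variates)
```

The component scores in `fit$variates` can then be inspected per block, for example with `plotIndiv(fit)`.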
Check first that you have CLR transformed your taxa proportions (using the logratio.transfo() function with a small offset if necessary).
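A minimal sketch of that CLR step, on a simulated count table (the object names and the offset value of 1e-6 are assumptions; pick an offset that suits the scale of your own proportions):

```r
library(mixOmics)

set.seed(1)
# Simulated taxa counts (assumption): 6 samples x 5 taxa, with one zero
counts <- matrix(rpois(6 * 5, lambda = 20), nrow = 6,
                 dimnames = list(paste0("s", 1:6), paste0("taxon", 1:5)))
counts[1, 2] <- 0

# Convert to proportions so that each sample sums to 1
props <- counts / rowSums(counts)

# CLR transform; the small offset avoids log(0) for the zero entry
taxa_clr <- logratio.transfo(props, logratio = "CLR", offset = 1e-6)

# Each CLR-transformed sample is centred, so row sums are numerically zero
round(rowSums(unclass(taxa_clr)), 10)
```

The CLR-transformed matrix, rather than the raw proportions, would then go into withinVariation().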
I think it is crashing because you are asking for too many components. With ncomp = 7, your matrices are running on empty after a while, so try a smaller number of components.
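To illustrate, here is a sketch of the same workflow refitted with only 2 components. Everything in it is simulated (10 hypothetical patients with two time points each, made-up responder labels), and `ncomp = 2` is just an example value. Note that with two time points per patient, the two within-subject rows of a patient mirror each other, so the within matrices hold at most one independent row per patient.

```r
library(mixOmics)

set.seed(42)
n_patients <- 10
patient <- rep(paste0("p", seq_len(n_patients)), times = 2)  # two time points each
ids <- paste0(patient, "_t", rep(1:2, each = n_patients))

# Simulated stand-ins (assumption) for the two blocks
taxa  <- matrix(rnorm(20 * 12), nrow = 20,
                dimnames = list(ids, paste0("taxon", 1:12)))
metab <- matrix(rnorm(20 * 8), nrow = 20,
                dimnames = list(ids, paste0("metab", 1:8)))

# Within-subject variation, computed manually as in the original post
design_ml <- data.frame(sample = patient)
X <- list(taxa  = withinVariation(taxa, design_ml),
          metab = withinVariation(metab, design_ml))

# Hypothetical responder status, constant within each patient
Y <- factor(rep(rep(c("responder", "non_responder"), each = 5), times = 2))

# Design between the X blocks only
design <- matrix(c(0, 1, 1, 0), nrow = 2,
                 dimnames = list(names(X), names(X)))

# Refit with a small number of components instead of 7
fit <- block.plsda(X, Y, ncomp = 2, design = design)
dim(fit$variates$taxa)
```

The "Splitting the variation for 1 level factor." lines printed by withinVariation() are informational messages for a one-factor design, not errors.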
Kim-Anh