I am new to mixOmics and first of all, thank you so much for this package, the information, the example and the community. It is so helpful and quite impressive.
I am trying to integrate two dataset : proteomics and RNAseq with a total of 18 samples (9 individuals, in 2 conditions)
- Protemics : 18 x 572
- RNAseq : 18 x 16 793
I have the following error with the perf function to tune the spls model :
Error in Ypred[omit, , h] <- Y.hat[, , 1] :
number of items to replace is not a multiple of replacement length
I understand that Ypred is trying to used Y.hat values, but is there a way to compute this perf ?
I tried to look at Ypred and Y.hat but I am not able to locate these results.
Thank you so much
I am a user of the mixOmics package but I am not part of the team, so be aware that my answer may not be the right one.
In my experience, that error may be related to the dimensions. Although it says you have 18 samples in both datasets, maybe I would explore if they are exactly the same or something like that.
Another tip is to try to filter the RNAseq data a bit more because with 16,000 genes the model can go crazy (that’s what I would do based on my experience).
Hope it works! Good luck!
Welcome! and I agree with @Margonmon, let’s start simple.
Only keep about 5,000 most variable genes to start with, and inspect your proteomics data, for example with the nearZeroVar() function. It could be during the cross validation process that for a particular fold the proteins have a 0 (or missing? values) across all samples. Decrease the fold value to 3 or even ‘loo’. Check if you dont have missing values.
Than you so much for the responses. It did help me to have a better understanding of the function. Now that I have kept less genes, I don’t get the error regarding the perf function. However, when I try to check the dimension of Ypred and Yhat, I only get NULL for both dimensions.
I looked online to see what it means, but did not find anything related to this issue.
Is there an explanation to the NULL dimension ?
Again, thank you so much for your time
You can only get the Y.pred from the predict() function, when you ask to predict on the test set, e.g.
X <- linnerud$exercise
Y <- linnerud$physiological
linn.pls <- pls(X, Y, ncomp = 2, mode = "classic")
indiv1 <- c(200, 40, 60)
indiv2 <- c(190, 45, 45)
newdata <- rbind(indiv1, indiv2)
colnames(newdata) <- colnames(X)
pred <- predict(linn.pls, newdata)
The perf() function does cross-validation * nrepeats, and we chose not to output every prediction because it can be quite big. Do you really need this output in your case?