Hello,
Firstly, thanks for developing the useful mixOmics package!
I am writing about two problems I ran into while running an sPLS-DA analysis. I have two data sets, and in both I use the first 1440 columns as X and the last column as Y. For the first one, some predicted values come back empty even though Y has only two classes; for the second, an error message keeps appearing when I try to tune the model with tune.splsda.
I would really appreciate your help and suggestions in fixing this! Thanks in advance and take care.
Best,
Kathleen
Hi Kathleen,
Thanks for sharing your data offline.
The main issue you are facing here is the number of zeros. Your first data set is very sparse (76% of its values are zero), so during cross-validation you can, by chance, end up with a training set that contains only zero values, and PLS-DA then struggles to predict anything.
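To see how sparse a data set is before cross-validating, something along these lines may help. It is only a sketch on simulated data: the matrix `X`, the 5-fold split and all variable names are assumptions for illustration, not taken from your files.

```r
# Simulated stand-in for a sparse predictor matrix (NOT the real data):
# 40 samples, 20 features, roughly three quarters of the entries set to zero.
set.seed(1)
n <- 40
p <- 20
X <- matrix(rbinom(n * p, size = 1, prob = 0.25) * rexp(n * p),
            nrow = n, ncol = p)

# Overall proportion of zero entries, and the proportion per feature.
zero_fraction <- mean(X == 0)
zero_per_feature <- colMeans(X == 0)

# Mimic one cross-validation split (5 folds assumed) and list the features
# that are entirely zero in the training part of that split.
folds <- sample(rep(1:5, length.out = n))
train <- X[folds != 1, , drop = FALSE]
all_zero_in_train <- which(colSums(train != 0) == 0)

print(zero_fraction)
print(all_zero_in_train)
```

Features that turn up in `all_zero_in_train` carry no information for that fold, which is one way predictions can come back empty.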
In both cases, note that you did not remove the Y variable from your X data set, so you are also overfitting (PLS-DA uses that information twice to explain Y). This was clear in data set 2, where the error rate dropped to 0 on the second component; that is no longer the case once I remove the Y variable:
# drop column 1441 (Y) from the predictors
X2.test <- data2[-index2, -1441]
X2.train <- data2[index2, -1441]
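A fuller version of that split might look like the sketch below. The simulated `data2`, the choice of 20 training rows, and the `ncomp` and `keepX` values are placeholders for illustration, not your actual settings; the model is only fitted if mixOmics is installed.

```r
# Simulated stand-in for data2 (NOT the real data): 1440 feature columns
# plus the class label stored in the last column (column 1441).
set.seed(42)
n <- 30
data2 <- as.data.frame(matrix(rnorm(n * 1440), nrow = n))
data2$Y <- factor(rep(c("A", "B"), length.out = n))

y_col <- ncol(data2)                 # position of the class label (1441)
index2 <- sample(seq_len(n), 20)     # hypothetical training rows

# Keep the label out of the predictors and store it separately.
X2.train <- as.matrix(data2[index2, -y_col])
Y2.train <- data2[index2, y_col]
X2.test  <- as.matrix(data2[-index2, -y_col])
Y2.test  <- data2[-index2, y_col]

# The label column must no longer be among the predictors.
stopifnot(ncol(X2.train) == 1440, !("Y" %in% colnames(X2.train)))

# Fit and predict only when mixOmics is available; ncomp and keepX are
# arbitrary illustrative values.
if (requireNamespace("mixOmics", quietly = TRUE)) {
  fit <- mixOmics::splsda(X2.train, Y2.train, ncomp = 2, keepX = c(50, 50))
  pred <- predict(fit, newdata = X2.test, dist = "max.dist")
}
```

Computing the label position once as `y_col` avoids hard-coding 1441 in several places, so the same lines work if the number of features changes.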
I cannot reproduce the tuning errors on my end. Consider updating to the latest version of the package, either from GitHub or from Bioconductor (a new release is due in a few days).
Hope that helps,
Kim-Anh