PLS-DA with missing '' values predicted in Y

mixOmics_user · April 26, 2020, 10:33pm

Hello,

Firstly, thanks for developing the useful mixOmics package!

I am writing to ask about two problems I ran into while running an sPLS-DA analysis. I have two datasets; in both, I use the first 1440 columns as X and the last column as Y. For the first one, some predicted values come back empty (''), even though Y should only have two classes; for the second one, an error message keeps popping up when I try to tune the model with tune.splsda.

I would really appreciate your help and suggestions in fixing this! Thanks in advance and take care.

Best,

Kathleen

kimanh.lecao · April 26, 2020, 10:47pm

Hi Kathleen,

thanks for sharing your data offline.

The main issue you are facing here is the number of zeroes. Your first data set is running on empty (76% of its values are zero). By chance, during cross-validation you may end up with a training set consisting only of zero values, so the PLS-DA struggles to predict anything.
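To see how sparse a data set is before running cross-validation, a quick check along these lines may help. This is only a sketch on simulated data: the matrix `X`, the 90% cut-off and the 5-fold split are assumptions for illustration, not values taken from Kathleen's data.

```r
set.seed(1)

# Hypothetical sparse matrix: 30 samples x 50 features, mostly zeros
X <- matrix(rpois(30 * 50, lambda = 0.3), nrow = 30)

# Overall proportion of zero entries
zero.prop <- mean(X == 0)

# Proportion of zeros per feature; drop features that are almost always zero
# (the 0.9 threshold is arbitrary and only for illustration)
col.zero <- colMeans(X == 0)
X.filtered <- X[, col.zero < 0.9, drop = FALSE]

# For a random 5-fold split, count the features that are constant
# within each training set (these carry no information for the model)
folds <- sample(rep(1:5, length.out = nrow(X)))
const.in.train <- sapply(1:5, function(k) {
  train <- X[folds != k, , drop = FALSE]
  sum(apply(train, 2, function(v) length(unique(v)) == 1))
})

print(zero.prop)
print(const.in.train)
```

If many features turn out to be constant in the training folds, filtering them out before fitting is one option worth trying.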

For both data sets, note that you have not removed the Y variable from your X data set, so you are also overfitting (the PLS-DA uses that information twice to explain Y). This was clear in data set 2, where the error rate went down to 0 on the second component (not the case once I removed the Y variable):
# column 1441 holds Y, so drop it from the predictors
X2.test <- data2[-index2, -1441]
X2.train <- data2[index2, -1441]
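For a fuller picture, the same idea can be sketched end to end on simulated data. The data frame `data`, its size, the split and the `keepX` values below are all made up for illustration; only the principle (Y in the last column, dropped from X before fitting) comes from the thread. The mixOmics step runs only if the package is installed.

```r
set.seed(42)

# Hypothetical stand-in for the real data: 60 samples, 20 predictors,
# with the class label stored in the last column
n <- 60
p <- 20
data <- as.data.frame(matrix(rnorm(n * p), nrow = n))
data$Y <- factor(rep(c("A", "B"), length.out = n))

y.col <- ncol(data)                      # position of Y (1441 in the thread)
index <- sample(seq_len(n), size = 40)   # rows used for training

# Predictors without the Y column, and the matching labels
X.train <- as.matrix(data[index, -y.col])
X.test  <- as.matrix(data[-index, -y.col])
Y.train <- data$Y[index]
Y.test  <- data$Y[-index]

# Sanity check: the label must not be among the predictors
stopifnot(!("Y" %in% colnames(X.train)))

# Optional: fit and predict with mixOmics if it is available
if (requireNamespace("mixOmics", quietly = TRUE)) {
  fit  <- mixOmics::splsda(X.train, Y.train, ncomp = 2, keepX = c(5, 5))
  pred <- predict(fit, newdata = X.test, dist = "max.dist")
  print(table(predicted = pred$class$max.dist[, 2], observed = Y.test))
}
```

Using `ncol(data)` rather than a hard-coded column number keeps the split correct if the number of predictors changes.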

The tune errors do not occur on my end; consider updating to the latest version of the package, either from GitHub or from Bioconductor (a new release is due in a few days).

Hope that helps,
Kim-Anh
