PLS-DA on DNA methylation data

Hi

I recently discovered mixOmics and have tried to use PLS-DA on my data set, which consists of methylation percentages for cytosine sites across different groups. What I have is a matrix of 27 samples and 440,000 sites.

The initial call to plsda() works fine, but when I try to assess performance with perf(), I get this error message:
Error in solve.default(Sr) :
  system is computationally singular: reciprocal condition number = 8.57685e-18

From reading some comments here, I initially thought that I had used too many components in the initial plsda() call, but I also read that this error can occur when the values of the different variables are too similar. I know from PCA that there is very little variation between the groups in my data, meaning that the percentage values might be very similar for each variable across all samples.

Could this be the reason I get this error message or is there something else I need to consider?

Thank you,

Best,
Line

Dear @lieblein,

We usually filter down the number of sites because of their sheer number and because most of them have near-zero variance. The perf() function also includes a nearZeroVar() step that tries to remove sites which happen to have zero variance within the CV folds. The issue you face can be explained by:

  • Too many components: the residual matrices run on empty after a few components

  • Too many predictors from the start: as highlighted above, even with the internal prefiltering step, the computation breaks down

  • Not enough samples in the test set or in the training set during cross-validation (i.e. the M value, the number of folds, in perf() is too large or too small)

I suggest you address these points; it may help.
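As a rough illustration of the last two points, here is a minimal sketch in plain Python. It is not mixOmics code (mixOmics is an R package, and nearZeroVar() applies its own criteria); the function names low_variance_columns, drop_columns and fold_sizes and the variance threshold are hypothetical, chosen only for this example. It drops sites whose values are constant across samples and shows how many samples land in each test fold for a given number of folds.

```python
from statistics import pvariance


def low_variance_columns(rows, threshold=1e-8):
    """Return indices of columns whose population variance is <= threshold.

    Assumption: a plain variance cutoff, simpler than what mixOmics uses.
    """
    columns = zip(*rows)  # transpose: one tuple per site
    return [j for j, col in enumerate(columns) if pvariance(col) <= threshold]


def drop_columns(rows, drop):
    """Return a copy of rows without the columns listed in drop."""
    drop = set(drop)
    return [[v for j, v in enumerate(row) if j not in drop] for row in rows]


def fold_sizes(n_samples, n_folds):
    """Test-set size of each fold when samples are split as evenly as possible."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


# Toy data: 4 samples x 3 sites (methylation percentages).
# Sites 0 and 2 are constant across samples and carry no information.
X = [
    [50.0, 10.0, 99.0],
    [50.0, 40.0, 99.0],
    [50.0, 70.0, 99.0],
    [50.0, 90.0, 99.0],
]

constant = low_variance_columns(X)
X_filtered = drop_columns(X, constant)

print(constant)           # indices of the constant sites
print(X_filtered)         # only the informative site remains
print(fold_sizes(27, 10)) # 27 samples, 10 folds: test sets of 2 or 3 samples
print(fold_sizes(27, 3))  # 27 samples, 3 folds: test sets of 9 samples
```

With 27 samples, 10 folds leave only 2 or 3 samples in each test set, whereas 3 folds give 9 per test set at the cost of smaller training sets. Filtering before fitting also matters because a site can vary over all 27 samples and still be constant within one training fold.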

Kim-Anh

Thank you so much for your help. I'll try it out.

Best, Line