Error for perf.plsda

Hello mixOmics team,

I am using perf to test the performance of plsda on my data, but it gives the errors shown below.
The basic information is:

  1. raw peak areas from GC-TOF-MS data, without normalization, for plsda analysis
  2. 15 samples, each with 1825 features after MZmine processing

    perf.TR.plsda <- perf(plsda.TR,
                          validation = "Mfold",
                          folds = 5)

Error in solve.default(Sr) :
system is computationally singular: reciprocal condition number = 3.94774e-17
In addition: Warning messages:
1: In MCVfold.spls(X, Y, multilevel = multilevel, validation = validation, :
At least one class is not represented in one fold, which may unbalance the error rate.
Consider a number of folds lower than the minimum in table(Y): 4
2: In MCVfold.spls(X, Y, multilevel = multilevel, validation = validation, :
At least one class is not represented in one fold, which may unbalance the error rate.
Consider a number of folds lower than the minimum in table(Y): 4
3: In MCVfold.spls(X, Y, multilevel = multilevel, validation = validation, :
At least one class is not represented in one fold, which may unbalance the error rate.
Consider a number of folds lower than the minimum in table(Y): 4
4: In MCVfold.spls(X, Y, multilevel = multilevel, validation = validation, :
At least one class is not represented in one fold, which may unbalance the error rate.
Consider a number of folds lower than the minimum in table(Y): 4
5: In MCVfold.spls(X, Y, multilevel = multilevel, validation = validation, :
At least one class is not represented in one fold, which may unbalance the error rate.
Consider a number of folds lower than the minimum in table(Y): 4
6: In MCVfold.spls(X, Y, multilevel = multilevel, validation = validation, :
At least one class is not represented in one fold, which may unbalance the error rate.
Consider a number of folds lower than the minimum in table(Y): 4
7: In MCVfold.spls(X, Y, multilevel = multilevel, validation = validation, :
At least one class is not represented in one fold, which may unbalance the error rate.
Consider a number of folds lower than the minimum in table(Y): 4
8: In MCVfold.spls(X, Y, multilevel = multilevel, validation = validation, :
At least one class is not represented in one fold, which may unbalance the error rate.
Consider a number of folds lower than the minimum in table(Y): 4
9: In MCVfold.spls(X, Y, multilevel = multilevel, validation = validation, :
At least one class is not represented in one fold, which may unbalance the error rate.
Consider a number of folds lower than the minimum in table(Y): 4

I am looking forward to your reply. Thanks in advance!
Best wishes,
Weichao Wu

Hi Weichao,

Thanks for using mixOmics.

It might be the case that there are highly correlated variables in the data. Have you considered the sparse model (splsda) which performs variable selection and remedies this issue?
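
To illustrate the point about correlated variables, here is a minimal base R sketch on simulated data (not the poster's data, and not necessarily the exact computation perf performs internally). It shows that one exactly collinear feature is enough to make a covariance matrix rank-deficient, which gives a reciprocal condition number near zero like the one in the error message.

```r
# Simulated data (an assumption for illustration): 15 samples, 4 features,
# where the 4th feature is an exact linear combination of the first two.
set.seed(1)
n <- 15
X <- matrix(rnorm(n * 3), nrow = n)
X <- cbind(X, X[, 1] + X[, 2])

S <- cov(X)            # 4 x 4 sample covariance matrix
rank_S <- qr(S)$rank   # numerical rank, below 4 because of the collinearity
rc <- rcond(S)         # reciprocal condition number, close to zero here

# Try to invert S and keep the error message if solve() fails
msg <- tryCatch({ solve(S); "inverted without error" },
                error = function(e) conditionMessage(e))

print(rank_S)
print(rc)
print(msg)
```

With 1825 features and only 15 samples, many features are bound to be linear combinations of others, so a near-singular matrix is expected unless the number of variables is reduced.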

Also, as you can see in the warning messages, one class has only 4 samples (minimum in table(Y): 4), so I recommend a lower folds argument (4 or 3), or validation = 'loo'.
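
A rough sketch of that suggestion follows. The class labels and the data matrix are simulated stand-ins (assumptions, not the poster's data), and the mixOmics calls only run if the package is installed. The idea is to read the smallest class size from table(Y) and cap folds at that value, or to switch to leave-one-out.

```r
# Hypothetical class labels: 15 samples in 3 classes, the smallest with 4 samples
Y <- factor(rep(c("A", "B", "C"), times = c(6, 5, 4)))

min_class <- min(table(Y))    # size of the smallest class
folds <- min(5, min_class)    # never request more folds than the smallest class has samples
print(folds)

# The mixOmics part is skipped when the package is not available
if (requireNamespace("mixOmics", quietly = TRUE)) {
  set.seed(42)
  X <- matrix(rnorm(15 * 20), nrow = 15)   # simulated features, for illustration only
  fit <- mixOmics::plsda(X, Y, ncomp = 2)

  # M-fold cross-validation with the reduced number of folds
  perf_mfold <- mixOmics::perf(fit, validation = "Mfold",
                               folds = folds, nrepeat = 10)

  # Leave-one-out cross-validation as the alternative
  perf_loo <- mixOmics::perf(fit, validation = "loo")
}
```

Using 3 folds instead of 4 leaves a little more room for every class to appear in each fold, at the cost of smaller training sets.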

Let us know how you go.

Best wishes,

Al

Hi mixOmics Team,

I ran into a problem similar to the one in this thread, with a data set of 4 samples:
lapply(diablo.tt,dim)
$rna
[1] 4 5898

$proteomics
[1] 4 4472

$metabolomics
[1] 4 128

I tried lowering the folds argument and using validation = 'loo', and ran into the errors below. Could you help out?

lower fold error:

perf.diablo = perf(sgccda.res.tt, validation = 'Mfold', fold = 3, nrepeats = 50)
Error in if (diff.value < tol | iter > max.iter) break :
missing value where TRUE/FALSE needed
In addition: Warning message:
In repeat_cv_perf.diablo(nrep) :
At least one class is not represented in one fold, which may unbalance the error rate.
Consider a number of folds lower than the minimum in table(Y): 1

validation 'loo' error:

perf.diablo = perf(sgccda.res.tt, validation = 'loo')
Error in solve.default(t(Pmat[, 1:x]) %*% Wmat[, 1:x]) :
system is computationally singular: reciprocal condition number = 4.50056e-32

Thank you & looking forward to hearing from you!

Warm wishes,
Thi

Hi @thi.tran01,

Unfortunately, 4 is a very low number of samples, so we cannot guarantee that our methods will work on your data.

Al