Error in tune.block.splsda(): "system is exactly singular" (collinearity issue?)

Hi everyone,

I’m encountering an error while running tune.block.splsda() on my dataset:
tune.TCGA <- tune.block.splsda(X = data_list, Y = Y, ncomp = ncomp,
                               test.keepX = test.keepX, design = matdis,
                               validation = "Mfold", folds = 10, nrepeat = 1,
                               dist = "centroids.dist", progressBar = TRUE,
                               near.zero.var = TRUE)

Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:
Error in solve.default(t(Pmat) %*% Wmat): Routine Lapack dgesv : the system is exactly singular: U[2,2] = 0

I suspect this issue is due to collinearity among covariates in my dataset, as I don’t observe much variation in my features at this time point. However, when running the same model at a later time point, the error disappears.
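
For reference, this LAPACK message is what base R's solve() reports whenever the matrix it is asked to invert has linearly dependent rows or columns. A minimal sketch, assuming a made-up 2 x 2 matrix (not the actual Pmat and Wmat from the tuning run):

```r
# Toy matrix whose second column is exactly twice the first, so it is singular.
m <- matrix(c(1, 2,
              2, 4), nrow = 2)

# solve() cannot invert it; capture the error message instead of stopping.
msg <- tryCatch({
  solve(m)
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

The captured message should mention a singular system, the same kind of failure as in the remote error above.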

Has anyone encountered this problem before? Any suggestions on how to handle it?

Thanks in advance for your help!

Hi @gd_remesha,

I think you’re right that you are encountering a collinearity issue, but I’ve noticed that you have a very high fold number (10), so this may be splitting your data into too many small pieces for cross-validation. We advise choosing folds that contain at least 5-6 samples in each (e.g. if you have 20 samples do not choose more than 4 folds). Perhaps try reducing your fold number and see if that solves your issue?
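
As a rough sketch of that rule of thumb, the largest fold count that still leaves about 5 samples per fold is the sample count divided by 5, rounded down. The helper name max_folds and its min_per_fold argument are made up for illustration and are not part of mixOmics:

```r
# Hypothetical helper: largest number of folds that keeps at least
# `min_per_fold` samples in each fold (ignores class balance).
max_folds <- function(n_samples, min_per_fold = 5) {
  floor(n_samples / min_per_fold)
}

folds_20 <- max_folds(20)  # the 20-sample example above
print(folds_20)
```

With 20 samples this gives 4, matching the example above. Because it ignores class sizes, treat the result as an upper bound only.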

You can use this webpage to read more details about performance assessment parameters and how to choose them.

Hope that helps!
Cheers,
Eva

Thank you @evahamrud
But even if I change the number of folds (e.g. folds = 2 or 3) and increase nrepeat to 10, I still encounter the same error.

Y
 [1] 9_MS     9_MS     9_BF     9_BF     9_BFcaps 9_BFcaps 9_MS    
 [8] 9_MS     9_BF     9_BF     9_BFcaps 9_BFcaps 9_MS     9_BF    
[15] 9_BF     9_BFcaps 9_MS     9_MS     9_BF     9_BFcaps 9_BFcaps
[22] 9_MS     9_MS     9_BF     9_BF     9_BFcaps 9_MS     9_MS    
[29] 9_BFcaps 9_BFcaps 9_MS     9_MS     9_BF     9_BF     9_BFcaps
[36] 9_BFcaps 9_MS     9_BF     9_BF    
Levels: 9_BF 9_BFcaps 9_MS
> table(Y)
Y
    9_BF 9_BFcaps     9_MS 
      13       12       14 
> lapply(data_list, dim)
$Microbiote
[1] 39 14

$Metabolome
[1]  39 202

$Transcriptome
[1]   39 9607

> # set grid of values for each component to test
> test.keepX = list (Metabolome = c(5:9, seq(10, 18, 2), seq(20,30,5)),  # → Test 10 and 70 variables for the metabolome (test several intermediate sizes).
+                    Microbiote = c(8, 10, 12),  # → Test 10 and 14 variables for the microbiota (almost all variables)
+                    Transcriptome = c(5:9, seq(10, 18, 2), seq(20,30,5)))  # → Test 10 and 70 variables for the transcriptome
> 
> BPPARAM <- BiocParallel::SnowParam(workers = parallel::detectCores()-1)
> 
> 
> 
> # run the feature selection tuning
> tune.TCGA <- tune.block.splsda(X = data_list, Y = Y, ncomp = ncomp, 
+                               test.keepX = test.keepX, design = matdis,
+                               validation = 'Mfold', folds = 2, nrepeat = 1,
+                               dist = "max.dist", progressBar = TRUE,
+                               near.zero.var = TRUE, BPPARAM = BPPARAM)
Design matrix has changed to include Y; each block will be
            linked to Y.

You have provided a sequence of keepX of length: 13 for block Metabolome and  3 for block Microbiote and 13 for block Transcriptome.
This results in 507 models being fitted for each component and each nrepeat, this may take some time to run, be patient!

As code is running in parallel, the progressBar is not available.
Warning: Zero- or near-zero variance predictors.
 Reset predictors matrix to not near-zero variance predictors.
 See $nzv for problematic predictors.
tuning component 1
  |                                                           |   0%
Warning: 'package:stats' may not be available when loading
Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in solve.default(t(Pmat) %*% Wmat): Routine Lapack dgesv : the system is exactly singular: U[2,2] = 0

Hm, I see it is complaining about “zero- or near-zero variance predictors” even though you have set near.zero.var = TRUE in your tuning run, so somehow they still seem to be getting through.

Could you try removing the zero variance variables manually before tuning? You can use code like this:

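library(mixOmics)
# Identify which variables have zero or near-zero variance
nzv <- nearZeroVar(data)
print(length(nzv$Position))
# Remove those variables. The guard matters: indexing with -integer(0)
# would drop every column when nothing is flagged.
if (length(nzv$Position) > 0) {
  data <- data[, -nzv$Position, drop = FALSE]
}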

And you should run it on all of your datasets: metabolome, microbiome and transcriptome. Could you also run:

nzv <- nearZeroVar(data)
print(length(nzv$Position))

again after the code above just to double check the zero variance variables have been filtered out.
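
To apply the same filter to every block at once, something like the following may work. It is only a sketch: filter_nzv is a hypothetical helper, and toy_list is simulated data standing in for your data_list.

```r
library(mixOmics)  # provides nearZeroVar()

# Hypothetical helper: drop zero / near-zero variance columns from one block.
filter_nzv <- function(x) {
  pos <- nearZeroVar(x)$Position
  # Only subset when something was flagged; -integer(0) would drop all columns.
  if (length(pos) > 0) x <- x[, -pos, drop = FALSE]
  x
}

# Simulated stand-in for data_list: block A carries one constant column.
set.seed(1)
toy_list <- list(
  A = cbind(matrix(rnorm(39 * 4), nrow = 39), const = rep(1, 39)),
  B = matrix(rnorm(39 * 3), nrow = 39)
)

filtered <- lapply(toy_list, filter_nzv)
print(lapply(filtered, dim))

# Second pass: count what is still flagged in each block after filtering.
remaining <- sapply(filtered, function(x) length(nearZeroVar(x)$Position))
print(remaining)
```

On your own data you would replace toy_list with data_list and pass the filtered list to the tuning call.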

Then try re-running your tuning (maybe with fewer test.keepX values to speed things up) and see if that fixes the issue.
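
The 507 models reported in your output is just the product of the grid lengths (13 x 3 x 13), so trimming each grid shrinks the run quickly. A sketch, where the coarse grid values are arbitrary picks for a quick test and not recommended settings:

```r
# Grid from the post above.
test.keepX <- list(Metabolome    = c(5:9, seq(10, 18, 2), seq(20, 30, 5)),
                   Microbiote    = c(8, 10, 12),
                   Transcriptome = c(5:9, seq(10, 18, 2), seq(20, 30, 5)))
n_full <- prod(lengths(test.keepX))

# Hypothetical coarser grid, only meant for checking whether the error persists.
coarse.keepX <- list(Metabolome    = c(5, 10, 20),
                     Microbiote    = c(8, 12),
                     Transcriptome = c(5, 10, 20))
n_coarse <- prod(lengths(coarse.keepX))

print(c(full = n_full, coarse = n_coarse))
```

Here the coarse grid needs 18 models per component instead of 507.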

Cheers,
Eva