Hello everyone, I hope you are doing well.
I am having trouble choosing the parameters for my DIABLO analysis.
My data, all from the same 21 rats, are:
Microbiota (16S sequencing) - 29 bacterial families.
Feces metabolomics (Metabolon) - 575 metabolites.
Plasma metabolomics (Metabolon) - 635 metabolites.
I used these three blocks as X in my model, and as Y I used each rat's group (Tg or WT).
I first ran the perf function, which suggested keeping 2 components.
I did not use 10-fold cross-validation because, when I ran the component selection with 10 folds, I got this warning:
10: In repeat_cv_perf.diablo(nrep) :
At least one class is not represented in one fold, which may unbalance the error rate.
Consider a number of folds lower than the minimum in table(Y): 9
So I ran it with 9 folds initially, then with 5 folds.
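As I understand it, the fold limit in that warning comes from the smallest class size in Y. Here is a minimal base R sketch of the check. `Y_example`, `class_sizes`, `max_folds` and `folds_requested` are names I made up for illustration, and the 12/9 split is an assumption chosen only so that the minimum matches the 9 reported in the warning:

```r
# Hypothetical labels: 21 rats in two groups. Which group has 9 rats is an
# assumption made for this sketch, not taken from my real data.
Y_example <- factor(c(rep("Tg", 12), rep("WT", 9)))

class_sizes <- table(Y_example)   # number of samples per class
max_folds <- min(class_sizes)     # with more folds, some fold may miss a class

folds_requested <- 10
if (folds_requested > max_folds) {
  message("folds = ", folds_requested,
          " exceeds the smallest class size (", max_folds, ")")
}
```

With these example labels, `max_folds` is 9, which is why 10 folds triggered the warning.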
Now, when I run tune.block.splsda, I get errors that I do not know how to interpret, and I could not find an answer on the forum. Here are two different runs:
set.seed(123) # For reproducibility with this handbook, remove otherwise
test.keepX <- list(microbiota_clr = c(seq(2, 29, 4)),
                   metabolon_feces = c(5:10, seq(11, 575, 20)),
                   metabolon_plasma = c(5:10, seq(5, 635, 20)))
> tune.diablo.tcga <- tune.block.splsda(X, groups$GENOTYPE, ncomp = 2,
+ test.keepX = test.keepX, design = design,
+ validation = 'Mfold', folds = 5, nrepeat = 5,
+ dist = "centroids.dist")
Design matrix has changed to include Y; each block will be
linked to Y.
You have provided a sequence of keepX of length: 7 for block microbiota_clr and 35 for block metabolon_feces and 38 for block metabolon_plasma.
This results in 9310 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
You can look into the 'BPPARAM' argument to speed up computation time.
Error: BiocParallel errors
1 remote errors, element index: 1
4 unevaluated and other errors
first remote error: Lapack routine dgesv: system is exactly singular: U[2,2] = 0
> tune.diablo.tcga <- tune.block.splsda(X, Y, ncomp = 2,
+ test.keepX = test.keepX, design = design,
+ validation = 'Mfold', folds = 5, nrepeat = 1,
+ dist = "centroids.dist")
Design matrix has changed to include Y; each block will be
linked to Y.
You have provided a sequence of keepX of length: 7 for block microbiota_clr and 35 for block metabolon_feces and 38 for block metabolon_plasma.
This results in 9310 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
You can look into the 'BPPARAM' argument to speed up computation time.
Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error: Lapack routine dgesv: system is exactly singular: U[2,2] = 0
In addition: There were 13 warnings (use warnings() to see them)
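For what it is worth, the model count in that message can be reproduced with base R. This is a minimal sketch that rebuilds the same test.keepX grids so it runs on its own; `grid_lengths`, `n_models` and `n_duplicated` are my own names, not mixOmics objects:

```r
# Same grids as in my test.keepX, rebuilt so this snippet is self-contained
test.keepX <- list(microbiota_clr = c(seq(2, 29, 4)),
                   metabolon_feces = c(5:10, seq(11, 575, 20)),
                   metabolon_plasma = c(5:10, seq(5, 635, 20)))

# Number of candidate keepX values per block
grid_lengths <- lengths(test.keepX)

# Number of keepX combinations, i.e. models per component and per repeat
n_models <- prod(grid_lengths)

# Number of repeated values inside each block's grid
n_duplicated <- sapply(test.keepX, function(v) sum(duplicated(v)))
```

This gives 7 x 35 x 38 = 9310 combinations, matching the message. It also shows that my plasma grid contains the value 5 twice (once from 5:10 and once from seq(5, 635, 20)). I do not know whether that is related to the error.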
If you have any suggestions, it would be very helpful. Thank you very much!