PLS and DIABLO tuning

Hi,

I will ask a couple of questions together, as they all concern tuning of the ncomp and keepX parameters in DIABLO and PLS.

  1. I am running PLS, and when I try to tune for the optimal number of components (ncomp) I get this error:
    Error in Ypred[omit, , h] <- Y.hat[, , 1] :
    number of items to replace is not a multiple of replacement length

dim(X); dim(Y)
[1] 24 1943
[1] 24 158
# First a PLS with a sufficient number of components, then we validate ncomp
MyResult.pls1 <- pls(Y, X, ncomp = 4)
View(MyResult.pls1)
set.seed(30) # for reproducibility in this vignette, otherwise increase nrepeat
perf.pls <- perf(MyResult.pls1, validation = "Mfold", folds = 4,
                 progressBar = FALSE, nrepeat = 10)

Error in Ypred[omit, , h] <- Y.hat[, , 1] :
  number of items to replace is not a multiple of replacement length

  2. I get an error when performing DIABLO, which I was not getting two days ago when I first ran the analysis:

X <- list(volatiles = data_gcms_P[c(1:19), c(7:164)],
          nonvolatilesNEG = data_neg_P[c(1:19), c(7:1949)],
          nonvolatilesPOS = data_pos_P[c(1:19), c(7:1256)])

type <- data_neg_P[c(1:24), ]
Subtype <- as.vector(type$Experiment)
Subtype <- as.factor(Subtype)
Y <- Subtype[c(1:19)]
summary(Y)
cooked    raw
    10      9
# set up arbitrarily the number of variables keepX that we wish to select in each data set and each component
list.keepX <- list(volatiles = c(15, 5), nonvolatilesNEG = c(20, 15), nonvolatilesPOS = c(10, 5))
MyResult.diablo.less <- block.splsda(X, Y, keepX = list.keepX, ncomp = 2) # default: ncomp = 2, scale = TRUE, mode = "regression"
Warning messages:
1: In cor(A[[k]], variates.A[[k]]) : the standard deviation is zero
2: In cor(A[[k]], variates.A[[k]]) : the standard deviation is zero
#various plots
plotIndiv(MyResult.diablo.less) ## sample plot
plotVar(MyResult.diablo.less) ## variable plot
Warning messages:
1: In cor(object$blocks[], object$variates[][, c(comp1, comp2)], :
the standard deviation is zero
2: In cor(object$blocks[], object$variates[][, c(comp1, comp2)], :
the standard deviation is zero
# CV
MyPerf.diablo <- perf(MyResult.diablo.less, validation = "Mfold", folds = 3,
                      nrepeat = 50,
                      dist = "centroids.dist")

Error: Unexpected error while trying to choose the optimum number of components. Please check the inputs and if the problem persists submit an issue at https://github.com/mixOmicsTeam/mixOmics/issues

  3. When tuning keepX, your case study script has:
    test.keepX = list(datasetA = c(5:9, seq(10, 18, 2), seq(20, 30, 5)),
                      datasetB = c(5:9, seq(10, 18, 2), seq(20, 30, 5)),
                      datasetC = c(5:9, seq(10, 18, 2), seq(20, 30, 5)))
    How do I choose these lists, and what is their impact? Does it mean that not all variables are checked for the model?

Thank you very much in advance for your help, and my compliments on your work!

Hi EiriniP,

I'm getting the same error ("Ypred[omit, , h] <- Y.hat[, , 1] : number of items to replace is not a multiple of replacement length") … have you already resolved this problem?

Regarding the first error, when using the perf() function on a pls object: even with data of the same dimensions and using your exact code, I am unable to replicate your error.

If you are familiar with the use of breakpoints in RStudio, I would advise placing one at line 542 of the perf() function and examining the dimensions of the Ypred and Y.hat variables there. If this does not provide a clear answer, feel free to let me know your email address; I can then reach out to you about your data and code and we can work through the issue together.
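If it helps, here is a minimal sketch of how the same inspection could be done without editing the package source, using base R debugging tools. The method name returned by getS3method() varies between mixOmics versions, so check class(MyResult.pls1) and methods("perf") first; everything below is illustrative rather than the package's documented workflow:

class(MyResult.pls1)                 # e.g. "mixo_pls"
methods("perf")                      # list the perf() methods available
f <- getS3method("perf", class(MyResult.pls1)[1])
debugonce(f)                         # opens a browser on the next call
perf(MyResult.pls1, validation = "Mfold", folds = 4, nrepeat = 10)
# step with 'n' until the Ypred[omit, , h] assignment, then compare
# dim(Ypred) and dim(Y.hat) to see where the lengths disagree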

Hi @Leandro and @EiriniP,

Regarding the issue of perf() not functioning on your pls objects, I've raised an issue on GitHub and implemented a fix for it. If you want to use this build to work around the bug you reported, install the devtools package and run the following commands:

library(devtools)
install_github("mixOmicsTeam/mixOmics", ref = github_pull("197"))

If you want to revert to the standard release, navigate to your library folder within the R installation directory, delete the mixOmics folder, and then run the following line within RStudio:

BiocManager::install("mixOmics")
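A small convenience sketch (assuming BiocManager is already installed): the same revert can also be done entirely from within R, without deleting the folder by hand:

remove.packages("mixOmics")          # removes the development build
BiocManager::install("mixOmics")     # reinstalls the Bioconductor release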

Let me know if this fixes your issue!

Cheers,
Max.

Hi Max,

First, sorry I did not reply to your previous message. It has been a long time since I posted my question, and since I hadn't managed to work around the problem at the time, I eventually used another package…

Second, thank you very much for fixing the error. I will do as you say and will definitely use mixOmics for my current analyses!

Have a great day

Eirini


So the problem was the zero-variance features… got it! Anyway, thank you for your work!

Leandro

As a suggestion for you both (and for anyone else reading this post): pre-processing your data is crucial, and arguably deserves more time than the analysis itself. You should have a strong understanding of how your features are related (and correlated), how many missing values you have and where they are, and which features have little to no variance.

There are no hard and fast rules on whether such features should be retained or removed prior to analysis, but exploring them is of paramount importance.
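For anyone wanting a concrete starting point, here is a minimal sketch of such checks in base R, run on one block of the DIABLO list X from the posts above (the block name "volatiles" comes from that code; the choices below are illustrative only):

blk <- X$volatiles

# count missing values per feature
colSums(is.na(blk))

# find features with zero variance -- these are what trigger the
# "standard deviation is zero" warnings above
zero_var <- which(apply(blk, 2, var, na.rm = TRUE) == 0)
if (length(zero_var) > 0) blk <- blk[, -zero_var, drop = FALSE]

# mixOmics also provides a nearZeroVar() helper (modelled on caret's) for
# near-zero-variance filtering; see ?mixOmics::nearZeroVar for details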

Hello, I saw that this question (which is exactly my question as well) was never answered.

The website guide also does not explain how these lists are chosen. I would like to know how I should choose them. Thanks in advance.

Hi @estefaniatn @EiriniP,

Does it mean that not all variables are checked for the model?

No, it means that only the top ones (e.g. 5) are selected during the evaluation.

How do I choose these lists?

You could be comprehensive and try something like:
datasetA = c(1:ncol(datasetA))
but of course this would take ages to run. So instead you have to be strategic and think: what is the minimum and maximum number of variables per dataset that you need for interpretation, per component? 1, 5 or 20?

You could try a few options first and then refine, in order to reduce the computational burden.

Kim-Anh
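For readers looking for a concrete example of how such a grid is passed to the tuning function, below is a hedged sketch using tune.block.splsda() with the block names from the posts above; the grid values, folds and nrepeat are illustrative only and should be adapted to your data and compute budget:

library(mixOmics)

# candidate numbers of variables to test per block and per component
test.keepX <- list(volatiles       = c(5:9, seq(10, 18, 2)),
                   nonvolatilesNEG = c(5:9, seq(10, 18, 2)),
                   nonvolatilesPOS = c(5:9, seq(10, 18, 2)))

# full design: all blocks connected to each other
design <- matrix(1, nrow = length(X), ncol = length(X),
                 dimnames = list(names(X), names(X)))
diag(design) <- 0

tune.diablo <- tune.block.splsda(X, Y, ncomp = 2, test.keepX = test.keepX,
                                 design = design, validation = "Mfold",
                                 folds = 3, nrepeat = 10,
                                 dist = "centroids.dist")
tune.diablo$choice.keepX  # chosen number of variables per block and component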