PLS and DIABLO tuning

Hi,

I have a few questions, all about tuning the ncomp and keepX parameters in DIABLO and PLS.

  1. I am running PLS, and when I try to tune ncomp to select the optimal number of components I get this error:
    Error in Ypred[omit, , h] <- Y.hat[, , 1] :
    number of items to replace is not a multiple of replacement length

dim(X); dim(Y)
[1] 24 1943
[1] 24 158
#First a PLS with sufficient components and then we will validate the ncomp
MyResult.pls1 <- pls(Y, X, ncomp = 4)
View(MyResult.pls1)
set.seed(30) # for reproducibility in this vignette, otherwise increase nrepeat
perf.pls <- perf(MyResult.pls1, validation = "Mfold", folds = 4,
                 progressBar = FALSE, nrepeat = 10)

Error in Ypred[omit, , h] <- Y.hat[, , 1] :
  number of items to replace is not a multiple of replacement length

  2. I now get an error when rerunning my DIABLO analysis, which I was not getting two days ago:

X <- list(volatiles = data_gcms_P[c(1:19), c(7:164)],
          nonvolatilesNEG = data_neg_P[c(1:19), c(7:1949)],
          nonvolatilesPOS = data_pos_P[c(1:19), c(7:1256)])

type <- data_neg_P[c(1:24), ]
Subtype <- as.vector(type$Experiment)
Subtype <- as.factor(Subtype)
Y <- Subtype[c(1:19)]
summary(Y)
cooked    raw
    10      9
#set up arbitrarily the number of variables keepX that we wish to select in each data set and each component.
list.keepX <- list(volatiles = c(15, 5), nonvolatilesNEG = c(20, 15), nonvolatilesPOS = c(10, 5))
MyResult.diablo.less <- block.splsda(X, Y, keepX = list.keepX, ncomp = 2) #default: ncomp=2, scale=T, mode=regression
Warning messages:
1: In cor(A[[k]], variates.A[[k]]) : the standard deviation is zero
2: In cor(A[[k]], variates.A[[k]]) : the standard deviation is zero
#various plots
plotIndiv(MyResult.diablo.less) ## sample plot
plotVar(MyResult.diablo.less) ## variable plot
Warning messages:
1: In cor(object$blocks[[x]], object$variates[[x]][, c(comp1, comp2)], :
the standard deviation is zero
2: In cor(object$blocks[[x]], object$variates[[x]][, c(comp1, comp2)], :
the standard deviation is zero
#CV
MyPerf.diablo <- perf(MyResult.diablo.less, validation = 'Mfold', folds = 3,
                      nrepeat = 50,
                      dist = 'centroids.dist')

Error: Unexpected error while trying to choose the optimum number of components. Please check the inputs and if problem persists submit an issue to https://github.com/mixOmicsTeam/mixOmics/issues
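
In case it helps with diagnosing the warnings above, here is a minimal sketch of how each block could be checked for constant columns. It assumes the "standard deviation is zero" warnings come from variables that have no variance in the 19 selected samples; that is only my guess and I have not confirmed it. The toy blocks and the names toy_blocks and constant_cols are made up for illustration and are not my real data.

```r
# Toy blocks standing in for the real data (hypothetical values, 19 samples each)
set.seed(1)
toy_blocks <- list(
  blockA = cbind(a1 = rnorm(19), a2 = rep(0, 19), a3 = rnorm(19)),
  blockB = cbind(b1 = rnorm(19), b2 = rnorm(19))
)

# For each block, collect the names of columns whose standard deviation is zero
constant_cols <- lapply(toy_blocks, function(block) {
  sds <- apply(block, 2, sd)
  names(sds)[!is.na(sds) & sds == 0]
})

print(constant_cols)
```

With the real data, the same lapply() call could be run on the X list; any column it reports is constant across the selected samples.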

  3. When tuning keepX, your case study script has:
    test.keepX = list(datasetA = c(5:9, seq(10, 18, 2), seq(20, 30, 5)),
                      datasetB = c(5:9, seq(10, 18, 2), seq(20, 30, 5)),
                      datasetC = c(5:9, seq(10, 18, 2), seq(20, 30, 5)))
    How do I choose these lists, and what is their impact? Does it mean that not all variables are checked for the model?
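
To make the question concrete, this is how I read that grid. The sketch below only expands the values from the case study. The combination count assumes the tuning evaluates every combination of keepX values across the three blocks, which is my understanding and may be wrong. The names candidates and n_combinations are mine.

```r
# Candidate keepX values copied from the case study grid
candidates <- c(5:9, seq(10, 18, 2), seq(20, 30, 5))
print(candidates)
print(length(candidates))

# The same candidate vector is used for each of the three blocks
test.keepX <- list(datasetA = candidates,
                   datasetB = candidates,
                   datasetC = candidates)

# Number of keepX combinations across blocks, ASSUMING every combination is tested
n_combinations <- prod(lengths(test.keepX))
print(n_combinations)
```

So each block has 13 candidate values between 5 and 30, and no model selecting more than 30 variables per block and component would be tested.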

Thank you very much in advance for your help, and my compliments on your work!