Tuning the number of components for block.spls using test datasets

Hello everyone,

I am currently working on a project in which I want to regress two continuous variables on several omics blocks. A multi-block PLS in a sparse framework seems very appropriate. However, since there is no tuning function for the number of components in block.spls, I have been looking for a way to justify my choice, and that is what my post is about.

I set aside 20 individuals before building my model, and the idea is to predict the values of these 20 individuals using the first k components. The final goal is to keep only a reasonable number of components minimizing the RMSE (Root Mean Square Error) for these 20 individuals, i.e. to reject all the components that do not reduce the RMSE. I then obtain this kind of graph for one of my two Y variables:

[Figure: test-set RMSE as a function of the number of components, for one of the two Y variables]

Looking at this graph, I would tend to select 13 components, as the following ones do not reduce the RMSE enough. I wanted to know if this type of methodology could be applied to select components for a block.spls. It would also be possible to add cross-validation to make this selection more robust. What do you think? Does this methodology seem appropriate to justify the number of components? Is 13 components too high?

Here is part of my R code:

k <- 30 # maximum number of components
Resp1 <- rep(NA, k) # RMSE for the first Y variable
Resp2 <- rep(NA, k) # RMSE for the second Y variable
MyResult.diablo <- block.spls(X, Y, keepX = list.keepX, ncomp = k, design = MyDesign,
                              mode = "regression") # build the model
Mypredict.diablo <- predict(MyResult.diablo, newdata = X.test, dist = "centroid") # predict the 20 test individuals
mypred <- Mypredict.diablo$WeightedPredict # get the weighted predictions

# RMSE for each number of components
for(i in 1:k){
  mypreddim <- mypred[, , i] # predictions using the first i components
  mypred1 <- mypreddim[, 1] * sd1 + m1 # back-transform (Y1 was scaled) for comparison
  mypred2 <- mypreddim[, 2] * sd2 + m2 # back-transform (Y2 was scaled) for comparison
  Resp1[i] <- sqrt(mean((Y.test[, 1] - mypred1)^2)) # RMSE for the 1st Y variable
  Resp2[i] <- sqrt(mean((Y.test[, 2] - mypred2)^2)) # RMSE for the 2nd Y variable
}
# plots
plot(1:k, Resp1, xlab = "Number of components", ylab = "RMSE",
     main = "Prediction: 20 individuals", pch = 16)
plot(1:k, Resp2, xlab = "Number of components", ylab = "RMSE",
     main = "Prediction: 20 individuals", pch = 16)
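The cross-validation idea mentioned above could be sketched as follows (hypothetical fold loop; `X` is assumed to be the list of omics blocks, and `Y`, `list.keepX` and `MyDesign` to exist as in the code above — this is only a sketch, not a tested implementation):

```r
library(mixOmics)

k <- 10      # maximum number of components to evaluate
nfolds <- 5  # number of cross-validation folds
n <- nrow(Y) # Y assumed to be an n x 2 matrix of the two responses

set.seed(42)
folds <- sample(rep(1:nfolds, length.out = n)) # random fold assignment
rmse1 <- matrix(NA, nrow = nfolds, ncol = k)   # RMSE for the first Y variable

for (f in 1:nfolds) {
  test  <- which(folds == f)
  model <- block.spls(lapply(X, function(x) x[-test, , drop = FALSE]),
                      Y[-test, , drop = FALSE],
                      keepX = list.keepX, ncomp = k,
                      design = MyDesign, mode = "regression")
  pred  <- predict(model, newdata = lapply(X, function(x) x[test, , drop = FALSE]))
  for (i in 1:k) {
    rmse1[f, i] <- sqrt(mean((Y[test, 1] - pred$WeightedPredict[, 1, i])^2))
  }
}

plot(1:k, colMeans(rmse1), type = "b",
     xlab = "Number of components", ylab = "Mean CV RMSE (Y1)")
```

Averaging the RMSE curve over folds should give a more stable basis for choosing the number of components than a single 20-individual hold-out set.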

Thank you in advance for your advice. I would like to take this opportunity to congratulate you on this package, which is very ergonomic.
Sincerely

hi again @gdrd,

I think 13 components is far too large a number for what you are trying to achieve with block.spls! Presumably you would not need more than 1 or 2 components to explain those two variables.

What we have proposed for a classic PLS2 model (2 blocks) is to use the Q2 criterion, which is a bit more global than looking at the RMSE. Potentially it could be extended to block.spls, but it would require a bit of implementation (spoiler: at this stage that part of the code is not easy to understand in the package!).

Here are some details about the Q2.
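For a plain two-block (s)PLS model, the Q2 mentioned above can be estimated by cross-validation with mixOmics' `perf()` function (a sketch, assuming `X1` is one omics block and `Y` the two responses; the component h is classically kept while Q2_h = 1 - PRESS_h/RSS_{h-1} stays above the usual 0.0975 threshold):

```r
library(mixOmics)

# Two-block sparse PLS in regression mode (keepX omitted here for simplicity)
res.spls <- spls(X1, Y, ncomp = 5, mode = "regression")

# Cross-validated performance measures, including the Q2 criterion
perf.spls <- perf(res.spls, validation = "Mfold", folds = 5, nrepeat = 10)

# Q2 per component; the horizontal line marks the 0.0975 rule of thumb
plot(perf.spls, criterion = "Q2.total")
```

The exact slot holding the Q2 values varies between package versions, so inspect the `perf()` output with `str()` if you need the numbers rather than the plot.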

Can you send me an email and we can have this discussion offline until we find a workable solution?

Kim-Anh

Hi @kimanh.lecao and @gdrd,

Indeed, I found this discussion very interesting and valuable. Have you since included a tuning procedure in either block.spls or wrapper.sgcca?

In a 3-block model, would it be correct/acceptable to tune the number of components in separate spls models? Or, based on your latest advances on this package, what would you suggest?
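(For what it's worth, the "separate spls models" idea could be sketched as below — whether a per-block choice transfers to the joint block model is exactly the open question here, so this is only an exploratory device, assuming `X` is a named list of blocks and `Y` the response matrix:)

```r
library(mixOmics)

# One spls per block against Y, each assessed by cross-validated Q2
for (b in names(X)) {
  res <- spls(X[[b]], Y, ncomp = 5, mode = "regression")
  p   <- perf(res, validation = "Mfold", folds = 5)
  plot(p, criterion = "Q2.total") # inspect Q2 per component, block by block
}
```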

Many thanks for your time.

Best,

Serena

Hello,

I was wondering if there is any update on the tuning for block.spls.

Thanks!
Mariana

hi @MarianaPLR

Not yet! We need funding :slight_smile:
We used block.spls recently in this paper: https://www.biorxiv.org/content/10.1101/2024.01.30.577864v1.full. We chose the number of variables to select arbitrarily, and we inspected the sample plots to choose the number of components that made sense (here, 1).

(I think this is a hard methodological question in general, and I don't think we will solve it any time soon.)