PLS regression coefficients

Hi all,

I’m switching over from {pls} to {mixOmics} and am having a hard time completing the transition - specifically, I can’t work out how to obtain the regression coefficients.

In {pls} I would do the following:


mod1 <- pls::plsr(mpg ~ .,
                  data = mtcars,
                  ncomp = 3,
                  scale = TRUE,
                  center = TRUE)

coef(mod1)
#> , , 3 comps
#> 
#>             mpg
#> cyl  -0.6315124
#> disp -0.6633487
#> hp   -0.9388964
#> drat  0.5148309
#> wt   -1.5341138
#> qsec  0.1051610
#> vs    0.1929931
#> am    1.0338038
#> gear  0.2979549
#> carb -1.3102833
#> 

How do I get the vector of coefficients in {mixOmics}?

mod2 <- mixOmics::spls(X = mtcars[, -1], Y = mtcars[, 1],
                       ncomp = 3,
                       scale = TRUE)

Just to demonstrate that these models are equivalent: they give identical predictions (which is why I expect them to give the same coefficients):

predict(mod1, newdata = mtcars[1:2,])[,,3]
#>     Mazda RX4 Mazda RX4 Wag 
#>      22.41019      22.04333
predict(mod2, newdata = mtcars[1:2,-1])$predict[,,3]
#>     Mazda RX4 Mazda RX4 Wag 
#>      22.41019      22.04333

Okay, I got this working by using the B.hat values returned by predict(), but it feels like there should be an easier way to get them:

mod1 <- pls::plsr(mpg ~ .,
                  data = mtcars,
                  ncomp = 3,
                  scale = TRUE,
                  center = TRUE)

mod2 <- mixOmics::spls(X = mtcars[, -1], Y = mtcars[, 1],
                       ncomp = 3,
                       scale = TRUE)

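# S3 method so that coef() works on mixOmics (s)PLS fits
coef.mixo_spls <- function(object, ...) {
  # SD used to scale Y during fitting (assumes scale = TRUE)
  sd_y <- attr(object$Y, "scaled:scale")

  # B.hat is only returned by predict(), so predict on a single training row
  pr <- predict(object, newdata = object$X[1, , drop = FALSE])

  # Keep the slice for the last component; rescale to the original units of Y
  B.hat <- pr$B.hat
  B.hat[, , dim(B.hat)[3]] * sd_y
}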

cbind(
  "pls::plsr" = coef(mod1),
  "mixOmics::spls" = coef(mod2)
)
#>       pls::plsr mixOmics::spls
#> cyl  -0.6315124     -0.6315124
#> disp -0.6633487     -0.6633487
#> hp   -0.9388964     -0.9388964
#> drat  0.5148309      0.5148309
#> wt   -1.5341138     -1.5341138
#> qsec  0.1051610      0.1051610
#> vs    0.1929931      0.1929931
#> am    1.0338038      1.0338038
#> gear  0.2979549      0.2979549
#> carb -1.3102833     -1.3102833
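
The multiplication by sd_y in the method above is there because, as far as I can tell, B.hat is expressed for a standardised response, while pls::plsr reports coefficients for the response in its original units. That is my reading of the output rather than something documented here. The rescaling itself is not specific to PLS, so here is a minimal base-R sketch of the same idea using lm() as a stand-in (the names b_raw and b_std are just for illustration):

```r
# Predictors standardised, as with scale = TRUE in both packages
X <- scale(as.matrix(mtcars[, -1]))
y <- mtcars$mpg

# Standardise the response by hand
y_std <- (y - mean(y)) / sd(y)

# Slopes with the response in original units vs. standardised units
b_raw <- coef(lm(y ~ X))[-1]
b_std <- coef(lm(y_std ~ X))[-1]

# Multiplying by sd(y) maps the standardised-response slopes back
all.equal(unname(b_raw), unname(b_std * sd(y)))
```

Dividing the response by sd(y) divides every slope by the same factor, so multiplying by sd(y) undoes it; centring the response only shifts the intercept.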

Hi @mattansb,

Glad you sorted it out. The reason is that we separate training from prediction in our pls functions: we don’t ‘refit’ on the data, because the coefficients we are mostly interested in are the loading coefficients used for variable selection. The coefficients you are interested in are, in our case, only relevant for prediction on a new data set (the B.hat).

We’ll keep it in mind for later.

Kim-Anh

Thanks @kimanh.lecao!
Just to make sure, though: B.hat will not change between predictions? It’s generated at the prediction stage, but it is not a function of newdata, right?
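
One way to check this empirically is to call predict() on two different subsets of rows and compare the B.hat arrays. This is only a sketch, and it assumes, as in the code earlier in this thread, that predict() on an spls fit returns a B.hat element. If B.hat depends only on the fitted model, the comparison should come back TRUE:

```r
library(mixOmics)

# Same fit as earlier in the thread
mod <- spls(X = mtcars[, -1], Y = mtcars[, 1],
            ncomp = 3,
            scale = TRUE)

# B.hat obtained from two disjoint sets of "new" observations
b1 <- predict(mod, newdata = mtcars[1:5, -1])$B.hat
b2 <- predict(mod, newdata = mtcars[20:32, -1])$B.hat

# Compare the two arrays (all components at once)
all.equal(b1, b2)
```

A single dataset can’t prove the general case, but a mismatch here would be enough to show that B.hat does depend on newdata.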