How to get regression coefficients by using function sPLS

Hi everyone,

First, I apologize if any of my questions sound basic; I’m new to this field. I’m currently trying to use sPLS to build models from spectroscopic data acquired with LIBS. My goal is a multi-element concentration regression. I have 300 spectra from each of 5 different steel alloys, and each spectrum has 6000 predictors (wavelengths). I want to predict the concentrations of 5 elements, so my X matrix is 1500x6000 and my Y matrix is 1500x5.

My plan is to use sPLS for feature selection (and possibly for the regression itself?). As a warm-up, I first approached this as a classification problem to get insight into which predictors might be important, and I got really good results. Now, as said above, I’m going further and regressing the elemental composition. Signals from some elements are harder to detect than others, and the spectral data are full of noise.

My question is about the output of pls() and the predict method.

When I use predict on the pls() output, I get predictions for the 5 elements I’m modeling, but I get them for each component. For example, with 10 components I have 10 different predictions for each Yi column. Should I fit the final linear model between Y and the latent components myself, or is this result already in the pls() output? Sorry, I’m reading the documentation and papers about the method, but I’m confused by the terminology.

Could I have a quick explanation about the elements “predict”, “variate” and “B.hat”?

Thank you!

I have for example 10 components so I have 10 different predictions for the Yi column

This is what you want to see: each column holds the predictions made by a model with up to that many components. In other words, the first column contains the predictions from a 1-component model, the second column those from a 2-component model, and so on.
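To decide how many components to keep, one option is to compare the prediction error for each number of components on held-out samples and then take that slice of the predictions. The sketch below is in Python/NumPy rather than R, and it assumes the predictions are stacked as samples × responses × components (check `dim()` of the `$predict` element of your own output before relying on this). `rmse_per_ncomp` and `predictions_for` are hypothetical helper names, and the toy arrays are made up purely for illustration.

```python
import numpy as np

def rmse_per_ncomp(pred, y_true):
    """RMSE for every response and every number of components.

    pred:   (n_samples, n_responses, n_components); pred[:, :, h] is assumed
            to hold the predictions of the model using the first h + 1 components.
    y_true: (n_samples, n_responses) observed values for the same samples.
    Returns an array of shape (n_responses, n_components).
    """
    err = pred - y_true[:, :, np.newaxis]      # broadcast over the component axis
    return np.sqrt(np.mean(err ** 2, axis=0))  # average over samples

def predictions_for(pred, ncomp):
    """Predictions of the ncomp-component model (ncomp counts from 1)."""
    return pred[:, :, ncomp - 1]

# Toy example: 2 samples, 2 responses, 3 hypothetical models (1, 2, 3 components).
y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = np.stack([y_true + 1.0, y_true + 0.5, y_true], axis=2)
rmse = rmse_per_ncomp(pred, y_true)             # shape (2, 3)
best = int(rmse.mean(axis=0).argmin()) + 1      # components with lowest mean RMSE
final = predictions_for(pred, best)             # one value per sample and response
print(best)  # 3
```

In practice the error should be computed on data the model was not trained on (a test set or cross-validation), otherwise adding components will almost always look better.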

Should I make the final linear model between Y and Latent components or is this result in the pls output?

Sorry, I’m not quite sure what you’re asking here. Your model essentially consists of two sets of latent components: one built from the Y dataframe and one built from the X dataframe. The spls() function returns all the relevant information for that model, so you don’t need to fit a separate linear model between Y and the latent components yourself.

“predict”, “variate” and “B.hat”

For clarification, look at our website. This page, this page and this page might be of assistance.

  • $predict: The values generated by your model (model.spls) when given your test data (X.test).
  • $variates: The projection of your test data (X.test) onto the components found in your model (model.spls), i.e. the latent-component scores of the new samples from which $predict is computed. Refer to the third link I sent if you’re unclear about these terms.
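  • $B.hat: The regression coefficients of your model. For each number of components, B.hat holds one coefficient per input feature (wavelength) and response (element), so the predictions in $predict are a linear combination of the (centred and scaled) X.test weighted by these coefficients. B.hat can be thought of as the final linear model you were asking about.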
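To make the relationship between these three elements concrete, here is a minimal sketch of plain PLS2 with regression-mode deflation, written in Python/NumPy. It is a textbook formulation, not mixOmics' own code: it omits the sparsity step that spls() applies through keepX/keepY, and mixOmics may differ in normalisation details. `pls2_fit` and `pls2_predict` are hypothetical names, and the returned `predict`, `variates` and `B_hat` only mirror, under these assumptions, the `$predict`, `$variates` and `$B.hat` elements. It also stores both sets of latent components (`T` from X, `U` from Y). The data are synthetic random numbers.

```python
import numpy as np

def pls2_fit(X, Y, ncomp):
    """Minimal PLS2 (regression-mode deflation) on centred, scaled data; no sparsity."""
    x_mean, x_sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean, y_sd = Y.mean(axis=0), Y.std(axis=0, ddof=1)
    Xh = (X - x_mean) / x_sd
    Yh = (Y - y_mean) / y_sd
    parts = {k: [] for k in "WPCTU"}
    for _ in range(ncomp):
        # Dominant singular vectors of X'Y give the X and Y loading weights.
        left, _, right_t = np.linalg.svd(Xh.T @ Yh, full_matrices=False)
        w, q = left[:, 0], right_t[0, :]
        t, u = Xh @ w, Yh @ q              # X-variate and Y-variate
        p = Xh.T @ t / (t @ t)             # X loadings
        c = Yh.T @ t / (t @ t)             # regression of Y on the X-variate
        Xh = Xh - np.outer(t, p)           # deflate X
        Yh = Yh - np.outer(t, c)           # regression-mode deflation of Y
        for k, v in zip("WPCTU", (w, p, c, t, u)):
            parts[k].append(v)
    model = {k: np.column_stack(v) for k, v in parts.items()}
    model.update(x_mean=x_mean, x_sd=x_sd, y_mean=y_mean, y_sd=y_sd)
    return model

def pls2_predict(model, X_new):
    """Return (predict, variates, B_hat) for new samples.

    predict:  (n_new, q, ncomp) predictions using the first h components
    variates: (n_new, ncomp) new samples projected onto the X components
    B_hat:    (p, q, ncomp) coefficients on the scaled X, one matrix per h
    """
    Xs = (X_new - model["x_mean"]) / model["x_sd"]
    W, P, C = model["W"], model["P"], model["C"]
    ncomp = W.shape[1]
    variates = Xs @ W @ np.linalg.inv(P.T @ W)
    B_hat = np.stack(
        [W[:, :h] @ np.linalg.inv(P[:, :h].T @ W[:, :h]) @ C[:, :h].T
         for h in range(1, ncomp + 1)],
        axis=2,
    )
    # Scaled predictions are Xs @ B_h; then undo the Y centring and scaling.
    predict = np.einsum("np,pqh->nqh", Xs, B_hat)
    predict = predict * model["y_sd"][None, :, None] + model["y_mean"][None, :, None]
    return predict, variates, B_hat

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))                       # synthetic data: 30 samples, 4 features
Y = X @ rng.normal(size=(4, 2)) + 0.1 * rng.normal(size=(30, 2))
model = pls2_fit(X, Y, ncomp=3)
predict, variates, B_hat = pls2_predict(model, X[:5])
print(predict.shape, variates.shape, B_hat.shape)  # (5, 2, 3) (5, 3) (4, 2, 3)
```

Here `B_hat[:, :, h - 1]` plays the role of the final linear model for an h-component fit: multiplying the scaled test data by it and undoing the Y scaling gives exactly `predict[:, :, h - 1]`, which is the same as regressing on the first h columns of `variates`. That is why no extra regression of Y on the components is needed.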

Many thanks @MaxBladen!! Your answer was exactly what I needed