I had a general question regarding the theory behind using VIP vs. loadings when explaining/reporting models done by sPLS, sPLS-DA, and DIABLO.
It is my understanding from past posts and documentation that VIPs are mainly used only in PLS and sPLS models and loadings can be used in models using discriminant analyses. If someone could please help explain which would be the best in extracting the most “important” variables from the model I would truly appreciate it!
You can refer to this post here where you can access the formulae: Variance of the VIP statistics from PLS
The VIP measures the importance of each predictor variable (typically from the X data set) in explaining the variance of the response Y, whereas the loading weights are the coefficients used directly to define the latent components (chosen to maximise the covariance between the X and Y data sets).
We do not consider PLS-DA a ‘regression’ method, given that it performs classification, but the VIP is still implemented for this method. We typically use the loading vectors, especially when using a sparse method where variable selection is performed. (I ran some tests previously, and some of the selected variables, namely the least important ones, can end up with a VIP < 1.)
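For reference, here is the VIP as it is commonly defined in the PLS literature. Treat it as a sketch: the notation is my own, and the formula in the linked post remains the one to rely on. Here p is the number of X variables, H the number of components, w_{jh} the loading weight of variable j on component h, and SS_h the amount of Y variance explained by component h.

$$
\mathrm{VIP}_j = \sqrt{\, p \, \frac{\sum_{h=1}^{H} SS_h \, \left( w_{jh} / \lVert w_h \rVert \right)^2}{\sum_{h=1}^{H} SS_h} \,}
$$

Under this definition the squared VIPs sum to p, so their average is 1, which is where the usual "VIP > 1" threshold comes from.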
“We typically use the loading vectors, especially when using a sparse method where variable selection is performed. (I did some test previously and some selected variables that are the least important may lead to a VIP <1).”
I was wondering if you could elaborate on what you mean here. Why are loadings more informative, especially for the sparse method?
When you write that “you tested previously and said that some of the selected variables that are least important lead to a VIP < 1”…isn’t that ideal? if they are not as important, they should get a lower VIP score? Or would you optimally expect all selected variables to have VIP scores higher than 1?
Thanks for following up on this. I posted the formula of the VIP in this post:
Why are loadings more informative, especially for the sparse method?
Because the way we select the variables is based on the loading vectors, not the VIP.
When you write that “you tested previously and said that some of the selected variables that are least important lead to a VIP < 1”…isn’t that ideal? if they are not as important, they should get a lower VIP score? Or would you optimally expect all selected variables to have VIP scores higher than 1?
You want a VIP > 1 to define an important variable (according to the VIP), so if the selected variables have a VIP < 1, then either the VIP is not a suitable measure here, or the selection is not optimal!
The way we select the variables in sPLS/DA and the way the VIP is defined are not completely aligned in their purpose. The VIP is based on the amount of total variance explained in Y, whereas sPLS/DA is based on maximising the covariance between linear combinations of the data sets (the components).
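To make the mismatch concrete, here is a minimal Python sketch (not mixOmics code). It assumes the commonly used VIP definition, where each squared, normalised loading weight is weighted by the Y variance explained by its component. The loading vector `sparse_w` and the function `vip_scores` are hypothetical, made up purely for illustration.

```python
import math

def vip_scores(weights, explained):
    """Commonly used VIP definition (an assumption, not the mixOmics source).

    weights:   list of components, each a list of p loading weights
    explained: Y variance explained by each component, in the same order
    """
    p = len(weights[0])
    total = sum(explained)
    scores = []
    for j in range(p):
        acc = 0.0
        for w_h, ss_h in zip(weights, explained):
            norm_sq = sum(w * w for w in w_h)
            acc += ss_h * (w_h[j] ** 2) / norm_sq
        scores.append(math.sqrt(p * acc / total))
    return scores

# Hypothetical sparse loading vector on one component:
# 5 of the 10 variables are selected (non-zero weight).
sparse_w = [0.8, 0.5, 0.3, 0.1, 0.1, 0, 0, 0, 0, 0]
vip = vip_scores([sparse_w], [1.0])

selected = [j for j, w in enumerate(sparse_w) if w != 0]
below_one = [j for j in selected if vip[j] < 1]
print(below_one)  # prints [2, 3, 4]
```

In this toy case three of the five selected variables fall below the VIP threshold of 1, even though the sparse method kept them. Because the squared VIPs average to 1 over all variables, only the variables with the largest weights can sit above the threshold.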
Thank you for the very informative reply! I really appreciate it.
Your responses also make sense in the context of my data. I have very messy signatures with lots of overlap amongst 3 of my 4 groups (3 have been exposed to something, 1 is a control). Of the selected variables on each component, only about half have VIP scores > 1.
Thanks again for taking the time to explain.
-Erin