Hi @emccallum,
Thanks for following up on this. I posted the formula of the VIP in this post:
Why are loadings more informative, especially for the sparse method?
Because the variables are selected based on the loading vectors, not the VIP.
When you write that “you tested previously and said that some of the selected variables that are least important lead to a VIP < 1”…isn’t that ideal? if they are not as important, they should get a lower VIP score? Or would you optimally expect all selected variables to have VIP scores higher than 1?
According to the VIP criterion, a variable is considered important when its VIP > 1. So if some of the selected variables have a VIP < 1, then either the VIP is not well suited here, or the selection is not optimal!
The way we select the variables in sPLS-DA and the way the VIP is defined are not completely aligned in their purpose. The VIP is based on the amount of variance in Y explained by each component. sPLS-DA is based on maximising the covariance between linear combinations of the variables from each dataset (the components).
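To make the mismatch concrete, here is a minimal sketch in plain Python of the usual textbook VIP definition: for each variable, the squared normalised loading weights are averaged across components, weighted by the Y variance each component explains, then scaled by the number of variables p. This is an illustration under that assumption, not the mixOmics code. The function name `vip_scores` and the toy numbers are made up for the example.

```python
import math

def vip_scores(weights, y_explained):
    """Textbook-style VIP sketch (hypothetical helper, not the mixOmics code).

    weights[h][j]  : loading weight of variable j on component h
    y_explained[h] : amount of Y variance explained by component h
    """
    p = len(weights[0])
    total = sum(y_explained)
    scores = []
    for j in range(p):
        acc = 0.0
        for w_h, ss_h in zip(weights, y_explained):
            norm_sq = sum(w * w for w in w_h)  # squared norm of the loading vector
            acc += ss_h * (w_h[j] ** 2) / norm_sq
        scores.append(math.sqrt(p * acc / total))
    return scores

# Made-up sparse loading vector: 3 of 10 variables selected on one component.
weights = [[0.8, 0.5, 0.1] + [0.0] * 7]
y_explained = [1.0]

vip = vip_scores(weights, y_explained)
for j, score in enumerate(vip[:3]):
    print(f"selected variable {j}: loading={weights[0][j]}, VIP={score:.2f}")
```

With these made-up numbers, the third variable is selected (non-zero loading) but gets a VIP of about 0.33, because the squared VIP scores always average to 1 across all p variables. A selected variable with a small loading can therefore fall below the VIP > 1 cut-off.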
Hope that helps,
Kim-Anh