VIP score and PLS regression coefficient

uc_55 · October 5, 2019, 12:46am

Dear mixOmics users,

I want to obtain the VIP score from a PLS model. The vip function from the package returns a VIP coefficient for each predictor variable and for each PLS component. However, it is my understanding that this coefficient summarizes the importance of the predictor variables across the PLS components. In other words, the VIP matrix returned by the function should be of dimension p x 1 even for ncomp>1. I provide the reference describing how to obtain the VIP score.

https://www.sciencedirect.com/science/article/abs/pii/S0169743912001542

In addition, I would like to know how to obtain the matrix of PLS regression coefficients as I understand this is not a direct output from any function.

Best wishes

uc_55 · October 18, 2019, 7:13am

Hello all,
Can somebody help me with this query please?
I thought that the advantage of the VIP score over the loading coefficients was that the former describes the importance of the predictor variables across all PLS components, while the latter is the weight associated with the predictor variables for each component.
Can somebody explain to me why the vip function of the package returns a VIP score for each component?

Many thanks in advance.

kimanh.lecao · October 21, 2019, 1:26am

Hi @uc_55,
Thank you for your question (and your patience).

Regarding the VIP question:
We actually do not use the VIP much, as it is very specific to a PLS model in regression mode.

Short answer: we calculate the VIP per component, and this calculation includes the loading vectors from X. We use a formula similar to the one proposed by SIMCA-P / the Tenenhaus book (see details below). The idea is that, because components are orthogonal, one variable might be influential in one component but not in another. The paper you mention seems to sum over all components, so they are using a different formula. If you want a global overview (for example, the importance of the variables after 3 components), then go for the definition of Mehmood. However, if your problem is complex (i.e. several X variables, with a problem that can be solved orthogonally), then the definition we propose might be better.
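For reference, the summed definition is usually written as below. This is a sketch in my own notation, not copied from the linked paper: w_{jh} is the loading weight of X variable j on component h, t_h is the h-th X component, q_h is the corresponding Y loading, SS_h is the response variance explained by component h, p is the number of X variables and H is the number of components.

```latex
\mathrm{VIP}_j
  = \sqrt{\, p \,
      \frac{\sum_{h=1}^{H} SS_h \left( w_{jh} / \lVert w_h \rVert \right)^2}
           {\sum_{h=1}^{H} SS_h} },
\qquad
SS_h = q_h^2 \, t_h^{\top} t_h
```

Written this way, there is a single score per variable because the sum runs over all H components.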

Details on how to calculate the VIP:
1 - We calculate cor2, the squared correlation between the Y variables and the PLS components associated to the X data set: cor(object$Y, object$variates$X, use = "pairwise")^2. It returns a matrix of size Q x H, where Q is the number of variables in Y and H is the number of components.
W = loading vectors from X, for each component h (a matrix of size P x H, where P is the number of variables in X).

2 - We calculate the redundancy value Rd. For h = 1, Rd = W^2 (loadings squared). For h > 1, Rd = the sum of cor2 over all Y variables, for each component (this is when you have more than one Y variable).

3 - For a given variable from X and a given component, the VIP is defined as sqrt(P * Rd %*% t(W^2)/sum(Rd))

What this means is that we assess the importance of a variable from X to explain all variables Y (through the Rd calculation), while also taking into account its loading weight W^2. However, sum(Rd) takes into account all components, whereas the rest of the elements in the equation are component based.
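The three steps above can be sketched in base R as follows. This is only an illustration, not the package source: the function name vip_sketch and its argument names are made up, and I assume that for component h both the weighted sum and sum(Rd) run over components 1 to h, which is one possible reading of step 3.

```r
# Sketch of the per-component VIP described above (base R only).
# W  : P x H matrix of X loading vectors   (hypothetical argument name)
# Y  : n x Q matrix (or vector) of responses
# Tx : n x H matrix of X components (variates)
vip_sketch <- function(W, Y, Tx) {
  W  <- as.matrix(W)
  Y  <- as.matrix(Y)
  Tx <- as.matrix(Tx)
  p <- nrow(W)
  H <- ncol(W)

  # Step 1: squared correlation between each Y variable and each component (Q x H)
  cor2 <- cor(Y, Tx, use = "pairwise")^2

  # Step 2: redundancy per component, summed over the Y variables (length H)
  Rd <- colSums(cor2)

  # Step 3: for component h, weight the squared loadings of components 1..h by Rd.
  # ASSUMPTION: the normalising sum(Rd) also runs over components 1..h.
  vip <- matrix(0, nrow = p, ncol = H,
                dimnames = list(rownames(W), paste0("comp", seq_len(H))))
  for (h in seq_len(H)) {
    idx <- seq_len(h)
    W2 <- W[, idx, drop = FALSE]^2
    vip[, h] <- sqrt(p * (W2 %*% Rd[idx]) / sum(Rd[idx]))
  }
  vip
}
```

With a fitted model stored in an object called fit (a hypothetical name), the call would presumably be vip_sketch(fit$loadings$X, fit$Y, fit$variates$X). Under the assumption above, column h summarises components 1 to h, so the last column plays the role of a global score, and with unit-norm loading vectors the squared VIPs in each column sum to P.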

Regarding the coefficients, I’ll get back to you as I need to dig further in the code.

Kim-Anh

NickBliziotis · October 15, 2020, 2:19pm

Hello!

I am having a problem regarding this issue. Trying to calculate VIP values using the classical definition by Mehmood et al. (2012), I found the plsVarSel R package, which, to my understanding, calculates VIP values with this definition. However, given that I've used mixOmics to fit my PLS-DA models, I'm having some trouble with the implementation. Specifically, I am unable to find the so-called Yloadings in a plsda object: neither loadings$Y nor loadings.star seems to be it. Could you please point me in the right direction?

Thanks in advance,
Nick

NickBliziotis · October 15, 2020, 4:51pm

Hi,

From my calculations, I found that calculating VIPs with the command in plsVarSel gives the same VIP values as the command implemented in mixOmics, for the same component. Thought I'd let you (all) know.

Cheers,
Nick

kimanh.lecao · October 18, 2020, 10:50pm

Thanks @NickBliziotis!
I assume you managed to extract the Y loadings you needed then?

Kim-Anh
