VIP score mismatch with number of components

emccallum · August 13, 2020, 12:04pm

Hello,

I am running an sPLS-DA analysis on a metabolomics data set that I have tuned so that I have 3 components, with 7, 20, and 110 variables on each component, respectively.

I was using the “vip” function on my final sPLS-DA model to get variable importance in projection (VIP) scores, but I find that some components have non-zero VIP values for more variables than were selected in the sPLS-DA (i.e., I get VIP scores for 7, 24 and 115 variables). However, when I extract the selected variables from the same sPLS-DA model, I get the correct number of variables on each component (7, 20 and 110), matching the keepX values, so I am confident this isn’t an error in that input. Is this normal? I would have thought it impossible, since the final sPLS-DA model limits the number of variables on each component. Perhaps I am just misunderstanding something.

Thank you so much for any help!
-Erin McCallum

kimanh.lecao · August 13, 2020, 10:49pm

@emccallum I am posting this as an issue on our GitHub. We’ll let you know when this is solved (it looks like a small computational approximation error). In the meantime, just extract the VIP scores for the selected variables.

Kim-Anh

aljabadi · August 14, 2020, 1:03am

Hi @emccallum,

Thanks for sharing this issue.

To help narrow down the problem, could you please email us the sPLS-DA model object and the code that reproduces the observed behaviour?


Best,

Al

aljabadi · August 14, 2020, 1:39am

@emccallum I just ran a toy example and realised what the artefact was. A variable selected on a given component can also have a non-zero VIP on later components, even though it was selected on the component where it is most important. That said, you should be able to verify that all variables with a positive VIP are among the selected variables.
See example below:

suppressMessages(library(mixOmics))
data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- as.factor(breast.tumors$sample$treatment)
res <- splsda(X, Y, ncomp = 2, keepX = c(25, 5))
all_selected_vars <- unlist(sapply(1:2, function(j) selectVar(res, j)$name))
all_selected_vars
#>  [1] "115"  "692"  "689"  "685"  "682"  "48"   "4"    "762"  "653"  "298" 
#> [11] "3"    "729"  "700"  "708"  "845"  "111"  "170"  "187"  "769"  "852" 
#> [21] "848"  "838"  "190"  "541"  "1424" "1280" "1471" "1146" "1498" "1582"
vip.res <- vip(res)
nonzero.vip <- vip.res[rowSums(vip.res)>0,]
nonzero.vip
#>            comp1       comp2
#> 3     3.75697614  3.37660155
#> 1498  0.00000000  1.75206064
#> 115  14.41978365 12.95985440
#> 1582  0.00000000  0.24621667
#> 4     7.99616437  7.18659368
#> 1146  0.00000000  6.07352868
#> 848   0.45493447  0.40887469
#> 1424  0.07622126  0.06850425
#> 298   5.19855693  4.67222966
#> 769   0.60179771  0.54086878
#> 190   0.25097665  0.22556655
#> 845   1.88059528  1.69019464
#> 700   2.18190256  1.96099610
#> 729   2.50314151  2.24971125
#> 708   2.00334637  1.80051781
#> 170   0.88949501  0.79943820
#> 692  14.08812794 12.66177714
#> 682   8.61285272  7.74084549
#> 1280  0.00000000 10.15089238
#> 111   1.53674302  1.38115566
#> 48    8.20841578  7.37735572
#> 1471  0.00000000  7.01288288
#> 762   7.22607183  6.49446906
#> 541   0.08295004  0.07455178
#> 187   0.81222454  0.72999096
#> 685  10.30174763  9.25874844
#> 653   5.58743854  5.02173900
#> 838   0.29792050  0.26775757
#> 689  11.57414746 10.40232431
#> 852   0.55828359  0.50176024
all(rownames(nonzero.vip) %in% all_selected_vars)
#> [1] TRUE
<sup>Created on 2020-08-14 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
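Why a selected variable keeps a non-zero VIP on later components can be sketched in base R. This assumes `vip()` uses the usual cumulative VIP definition, VIP_j(h) = sqrt(p × Σ_{k=1..h} Rd_k w_jk² / Σ_{k=1..h} Rd_k), where p is the number of X variables, w_jk is the loading weight of variable j on component k, and Rd_k is the share of Y explained by component k. The assumption is consistent with the output above: every variable that is non-zero on comp1 has a comp2 VIP about 0.899 times its comp1 VIP (e.g. 12.960 / 14.420 for variable 115), which is the constant factor sqrt(Rd_1 / (Rd_1 + Rd_2)) this definition predicts. The function name `vip_sketch`, the toy weights `W` and the `Rd` values are hypothetical.

```r
# Hypothetical helper (not the mixOmics source): cumulative VIP scores.
# W  : p x H matrix of X loading weights, one unit-norm column per component
# Rd : length-H vector, share of Y explained by each component (assumed given)
vip_sketch <- function(W, Rd) {
  p <- nrow(W)
  H <- ncol(W)
  VIP <- matrix(0, nrow = p, ncol = H,
                dimnames = list(rownames(W), paste0("comp", seq_len(H))))
  for (h in seq_len(H)) {
    # Rd-weighted sum of squared weights over components 1..h
    num <- W[, 1:h, drop = FALSE]^2 %*% Rd[1:h]
    VIP[, h] <- sqrt(p * num / sum(Rd[1:h]))
  }
  VIP
}

# Toy sparse weights: v1, v2 selected on comp1; v3, v4 selected on comp2
W <- cbind(comp1 = c(0.8, 0.6, 0, 0), comp2 = c(0, 0, 0.6, 0.8))
rownames(W) <- paste0("v", 1:4)
Rd <- c(0.6, 0.2)  # hypothetical explained-variance shares
V <- vip_sketch(W, Rd)
print(round(V, 3))
```

Here v1 and v2 have zero weight on component 2 yet receive non-zero comp2 VIP scores, while v3 and v4 stay at zero on comp1, mirroring the pattern in the reprex output. Under this definition, with unit-norm weights the squared VIP scores on each component sum to p, so their mean is 1.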

emccallum · August 14, 2020, 9:19am

Hi everyone,

Thanks for looking into this so quickly!! Yes, I think your toy example matches my data. I had 7 variables selected on component 1, 4 of which had VIP scores > 1. Those 4 variables then also appear on the component 2 VIP list, making it 24 long instead of the 20 originally selected. This carries over into component 3: the variables that had VIP scores > 1 on both previous components also appear on the component 3 list, even if they weren’t originally selected on component 3.

I’ve learned through this post and another that my third component probably isn’t especially helpful, even though it was selected by the tuning process: very few of the 110 selected variables have VIP scores over 1.
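The check described here (how many of the variables selected on a component reach VIP > 1) can be scripted. A minimal base-R sketch; the helper name and toy values are hypothetical, and with a mixOmics model the inputs would presumably be `vip(res)` and the `selectVar(res, comp = h)$name` vectors, as in the example above. The VIP > 1 cut-off is a common rule of thumb rather than a formal test.

```r
# Hypothetical helper (not part of mixOmics): for each component h, count how many
# of the variables selected on h have a VIP on h above `threshold`.
# vip_mat  : variables x components matrix of VIP scores, variable names as rownames
# selected : list holding one character vector of selected variable names per component
count_vip_above <- function(vip_mat, selected, threshold = 1) {
  vapply(seq_along(selected), function(h) {
    sum(vip_mat[selected[[h]], h] > threshold)
  }, integer(1))
}

# Toy inputs (hypothetical values): v1, v2 selected on comp1; v3, v4 on comp2
vip_toy <- matrix(c(1.6, 1.2, 0.0, 0.0,
                    1.4, 1.0, 0.6, 0.8),
                  ncol = 2,
                  dimnames = list(paste0("v", 1:4), c("comp1", "comp2")))
sel_toy <- list(comp1 = c("v1", "v2"), comp2 = c("v3", "v4"))
count_vip_above(vip_toy, sel_toy)  # 2 0
```

Only the VIP on the component where each variable was selected is counted, so values carried over to later components do not inflate the counts.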

This all makes sense to me. Let me know if you’d still like me to send my data and code along, I’m happy to provide it.

And thank you for designing such a well-documented and informative R package! It’s been great to work with as a beginner in this area.

Thank you,
Erin

aljabadi · August 25, 2020, 10:44am

Hi @emccallum,

It’s great to know you find mixOmics useful in your analysis.

That’s correct. As I mentioned, for variables selected on a given component, you can safely ignore their importance values for the following components.
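Following that advice, one way to tabulate the results is to keep, for each selected variable, only its VIP on the component where it was selected. A minimal base-R sketch; the helper name and toy values are hypothetical, and with a mixOmics model the inputs would presumably be `vip(res)` and the per-component `selectVar(res, comp = h)$name` vectors.

```r
# Hypothetical helper (not part of mixOmics): one row per (variable, component)
# selection, holding the VIP on the component where the variable was selected.
# VIP values carried over to later components are ignored.
vip_on_selection_comp <- function(vip_mat, selected) {
  rows <- lapply(seq_along(selected), function(h) {
    vars <- selected[[h]]
    data.frame(variable = vars, comp = h, vip = unname(vip_mat[vars, h]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy inputs (hypothetical values): v1, v2 selected on comp1; v3, v4 on comp2
vip_toy <- matrix(c(1.6, 1.2, 0.0, 0.0,
                    1.4, 1.0, 0.6, 0.8),
                  ncol = 2,
                  dimnames = list(paste0("v", 1:4), c("comp1", "comp2")))
sel_toy <- list(comp1 = c("v1", "v2"), comp2 = c("v3", "v4"))
vip_on_selection_comp(vip_toy, sel_toy)
```

A variable selected on more than one component appears once per component in this table.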

NickBliziotis · January 12, 2021, 3:56pm

Hello and happy new year!

I have a question relevant to this topic: what happens if a given variable has a non-zero VIP value on component n but a higher value on component n+1? Can we still safely ignore the value on component n+1?

Thank you,
Nick Bliziotis

aljabadi · January 12, 2021, 11:47pm

Hi @NickBliziotis,

Happy New Year to you too!

By definition, this should never happen, unless perhaps the data are so pathological that the components are not fully orthogonal. Have you come across any instances where it does happen?
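One way to check whether the components are orthogonal is to look at the cosine between each pair of score vectors. A minimal base-R sketch; the helper name and toy matrices are hypothetical, and with a mixOmics model the score matrix would presumably be `res$variates$X`.

```r
# Hypothetical helper (not part of mixOmics): largest absolute cosine between
# two distinct columns of a score matrix (one column per component).
# Values near 0 mean the components are numerically orthogonal.
max_offdiag_cosine <- function(scores) {
  norms <- sqrt(colSums(scores^2))
  cosines <- crossprod(scores) / outer(norms, norms)
  diag(cosines) <- 0  # ignore each component's cosine with itself
  max(abs(cosines))
}

orthogonal_scores <- cbind(c(1, 1, -1, -1), c(1, -1, 1, -1))
skewed_scores     <- cbind(c(1, 1, 0, 0), c(1, 0, 0, 0))
max_offdiag_cosine(orthogonal_scores)  # 0
max_offdiag_cosine(skewed_scores)      # 0.7071068 (= 1/sqrt(2))
```

In PLS, deflating X after each component should make the X scores mutually orthogonal, so a value clearly above 0 would point to the kind of pathological case mentioned above.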

Best,

Al

NickBliziotis · March 18, 2021, 2:13pm

Hello,

Apologies for the late reply. I have; please find my VIP list below:

This phenomenon can be observed for several entries, e.g. feature no. 1 (0.967). Does this mean the components are not fully orthogonal? Can I still safely ignore one of the two values?
Thanks in advance,
Nick
