VIP score mismatch with number of components

Hello,

I am running an sPLS-DA analysis on a metabolomics data set that I have tuned so that I have 3 components, with 7, 20, and 110 variables on each component, respectively.

I was using the “vip” function on my final sPLS-DA model to get variable importance in projection (VIP) scores, but on some components I am getting non-zero values for more variables than were selected in the sPLS-DA (i.e., I get VIP scores for 7, 24, and 115 variables). However, when I extract the selected variables from the same sPLS-DA model, I get the correct number on each component (i.e., 7, 20, 110), matching the keepX values, so I am confident this isn’t an error with that input. Is this normal? I would have thought it wouldn’t be possible, since the final sPLS-DA model limits the number of variables on each component. Perhaps I am just misunderstanding something.

Thank you so much for any help!
-Erin McCallum

@emccallum I am posting this as an issue on our GitHub. We’ll let you know when this is solved (it looks like a small numerical approximation error). In the meantime, just extract the VIP for the selected variables.

Kim-Anh

Hi @emccallum,

Thanks for sharing this issue.

To help us narrow down the problem, could you please send an email with the plsda object and the code that reproduces the observed behaviour?

You can click on this text to send us an email.
Alternatively, you can right-click on the above text and choose ‘Copy Email Address’

Best,

Al

@emccallum I just ran a toy example and realised what the artefact was. The thing to consider is that a variable selected on a given component can have a non-zero VIP on other components too, even though it was selected only on the component where it has the most importance. Having said that, you should be able to verify that all the variables with a positive VIP are among the selected variables.
See example below:

suppressMessages(library(mixOmics))
data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- as.factor(breast.tumors$sample$treatment)
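# Fit an sPLS-DA keeping 25 variables on component 1 and 5 on component 2
res <- splsda(X, Y, ncomp = 2, keepX = c(25, 5))
# Collect the names of the variables selected on either component
all_selected_vars <- unlist(lapply(1:2, function(j) selectVar(res, comp = j)$name))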
all_selected_vars
#>  [1] "115"  "692"  "689"  "685"  "682"  "48"   "4"    "762"  "653"  "298" 
#> [11] "3"    "729"  "700"  "708"  "845"  "111"  "170"  "187"  "769"  "852" 
#> [21] "848"  "838"  "190"  "541"  "1424" "1280" "1471" "1146" "1498" "1582"
# Keep only the variables whose VIP is non-zero on at least one component
vip.res <- vip(res)
nonzero.vip <- vip.res[rowSums(vip.res) > 0, ]
nonzero.vip
#>            comp1       comp2
#> 3     3.75697614  3.37660155
#> 1498  0.00000000  1.75206064
#> 115  14.41978365 12.95985440
#> 1582  0.00000000  0.24621667
#> 4     7.99616437  7.18659368
#> 1146  0.00000000  6.07352868
#> 848   0.45493447  0.40887469
#> 1424  0.07622126  0.06850425
#> 298   5.19855693  4.67222966
#> 769   0.60179771  0.54086878
#> 190   0.25097665  0.22556655
#> 845   1.88059528  1.69019464
#> 700   2.18190256  1.96099610
#> 729   2.50314151  2.24971125
#> 708   2.00334637  1.80051781
#> 170   0.88949501  0.79943820
#> 692  14.08812794 12.66177714
#> 682   8.61285272  7.74084549
#> 1280  0.00000000 10.15089238
#> 111   1.53674302  1.38115566
#> 48    8.20841578  7.37735572
#> 1471  0.00000000  7.01288288
#> 762   7.22607183  6.49446906
#> 541   0.08295004  0.07455178
#> 187   0.81222454  0.72999096
#> 685  10.30174763  9.25874844
#> 653   5.58743854  5.02173900
#> 838   0.29792050  0.26775757
#> 689  11.57414746 10.40232431
#> 852   0.55828359  0.50176024
all(rownames(nonzero.vip) %in% all_selected_vars)
#> [1] TRUE
Created on 2020-08-14 by the reprex package (v0.3.0), https://reprex.tidyverse.org
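
One way to see why this pattern arises, assuming vip() follows the usual cumulative VIP definition, in which the score at component h pools the squared loading weights of components 1 to h, weighted by how much of Y each component explains: a variable selected on component 1 then keeps a non-zero score at every later component. The base-R sketch below is only an illustration of that assumption. The loading weights `W` and the explained-variance shares `Rd` are made-up numbers, not values taken from the model above.

```r
# Toy illustration only: W and Rd are hypothetical, not output from mixOmics.
# W holds sparse loading weights (rows = variables, columns = components).
W <- matrix(0, nrow = 6, ncol = 2,
            dimnames = list(paste0("v", 1:6), c("comp1", "comp2")))
W[c("v1", "v2", "v3"), "comp1"] <- c(0.8, 0.5, 0.33)  # "selected" on comp 1
W[c("v4", "v5"), "comp2"] <- c(0.9, 0.44)             # "selected" on comp 2
W <- sweep(W, 2, sqrt(colSums(W^2)), "/")             # unit-length columns

# Assumed share of Y explained by each component (made up)
Rd <- c(comp1 = 0.6, comp2 = 0.3)

# Cumulative VIP: the score at component h uses components 1..h
p <- nrow(W)
cumulative_vip <- sapply(seq_len(ncol(W)), function(h) {
  idx <- seq_len(h)
  drop(sqrt(p * (W[, idx, drop = FALSE]^2 %*% Rd[idx]) / sum(Rd[idx])))
})
dimnames(cumulative_vip) <- dimnames(W)

cumulative_vip
# Number of variables with a non-zero score on each component
colSums(cumulative_vip > 0)
```

In this toy setting the three variables chosen on component 1 still carry a non-zero score at component 2, so the count of non-zero scores at component 2 is larger than the number of variables selected on it, while a variable selected nowhere (`v6`) stays at zero throughout.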

Hi everyone,

Thanks for looking into this so quickly! Yes, your toy example matches my data, I think. I had 7 variables selected on component 1, and 4 of these variables had VIP scores > 1. Those 4 variables then also appear on the component 2 VIP list, making it 24 long instead of the originally selected 20. This then carries over into component 3, where the variables that had VIP scores > 1 on both of the previous two components are also on the list for component 3, even if they weren’t originally selected on component 3.

I’ve learned through this post and another that my third component likely isn’t especially helpful, even if it was selected by the tuning process. Very few of the 110 selected variables have VIP scores over 1.

This all makes sense to me. Let me know if you’d still like me to send my data and code along, I’m happy to provide it.

And thank you for designing such a well-documented and informative R package! It’s been great to work with as a beginner in this area.

Thank you,
Erin

Hi @emccallum,

It’s great to know you find mixOmics useful in your analysis.

That’s correct. As I mentioned, for variables selected on a given component, you can safely ignore their importance values for the following components.
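
If it helps, here is a minimal sketch of one way to keep only the VIP values that belong to the variables selected on each component. `vip_for_selected` is a hypothetical helper, not a mixOmics function, and the small matrix is made-up data so that the snippet runs on its own. With a fitted model, the inputs would presumably come from `vip(res)` and `selectVar(res, comp = j)$name`, as in the commented lines.

```r
# Hypothetical helper (not part of mixOmics): set VIP entries to zero for
# variables that were not selected on the corresponding component.
vip_for_selected <- function(vip_mat, selected) {
  stopifnot(length(selected) == ncol(vip_mat))
  out <- vip_mat
  for (j in seq_len(ncol(vip_mat))) {
    out[!(rownames(vip_mat) %in% selected[[j]]), j] <- 0
  }
  out
}

# Toy input with made-up numbers. With a fitted sPLS-DA model `res`,
# one would instead use something like:
#   vip_mat  <- vip(res)
#   selected <- lapply(seq_len(res$ncomp),
#                      function(j) selectVar(res, comp = j)$name)
vip_mat <- matrix(c(2.0, 1.2, 0.0, 0.0,
                    1.8, 1.1, 0.9, 0.0),
                  ncol = 2,
                  dimnames = list(c("a", "b", "c", "d"), c("comp1", "comp2")))
selected <- list(c("a", "b"), "c")

masked <- vip_for_selected(vip_mat, selected)
masked
# Non-zero counts now match the number of variables selected per component
colSums(masked > 0)
```

Zeroing rather than dropping rows keeps the matrix the same shape as the original, which makes it easy to compare the two side by side.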