Feasibility of combined VIP score

Hi everyone,

I am attempting to integrate multiple omics datasets using the block sPLS-DA method. Although I use cross-validation to choose the number of variables to keep, the loading matrix alone gives no clear cut-off for feature selection (loading > 0 is too broad). This has led me to consider commonly used criteria such as VIP > 1.

However, VIP is not applicable to block methods. VIP is designed to work with a single Y matrix, whereas block methods effectively involve multiple Y matrices (each of the other omics datasets acts as a Y, as specified by the design matrix). Nevertheless, the design matrix can be understood as indicating how much the associations between the different omics datasets X, and their correlation with the Y matrix, contribute to the final model.
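For reference, the usual single-response definition of VIP can be written as below. The notation is assumed here rather than taken from any particular implementation: $p$ is the number of variables in X, $H$ the number of components, $w_{jh}$ the loading weight of variable $j$ on component $h$, and $R^2(Y, t_h)$ the share of Y explained by component $t_h$.

$$
\mathrm{VIP}_j = \sqrt{\, p \, \frac{\sum_{h=1}^{H} R^2(Y, t_h)\, \left( w_{jh} / \lVert w_h \rVert \right)^2}{\sum_{h=1}^{H} R^2(Y, t_h)} }
$$

Since the squared scores sum to $p$ over the variables of X, their mean square is 1, which is where the VIP > 1 rule of thumb comes from. The $R^2(Y, t_h)$ term is what ties the score to one response at a time.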

If this idea holds, could it be possible to calculate VIP values separately for X1 paired with each of X2, …, Xn and Y, and then weight them based on the design matrix? Ultimately, this could result in a weighted VIP score. Would using weighted VIP > 1 as a filtering criterion be a feasible approach?
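To make the proposal concrete, here is a minimal sketch in plain Python. It is not mixOmics code: the function names `vip_scores` and `weighted_vip`, the toy loading weights, and the design weights are all hypothetical. It also assumes one particular way of combining the scores, a design-weighted mean of the squared VIPs, chosen because it keeps the mean squared score at 1 so the threshold of 1 stays interpretable.

```python
import math


def vip_scores(weights, explained):
    """Single-response VIP for one block.

    weights:   p rows, each holding H loading weights (one per component).
    explained: H non-negative values, e.g. R^2 of the response on each component.
    Assumes every component has at least one non-zero loading weight.
    """
    p = len(weights)
    n_comp = len(explained)
    # Squared norm of each component's weight vector.
    norms = [sum(row[h] ** 2 for row in weights) for h in range(n_comp)]
    total = sum(explained)
    scores = []
    for row in weights:
        acc = sum(explained[h] * row[h] ** 2 / norms[h] for h in range(n_comp))
        scores.append(math.sqrt(p * acc / total))
    return scores


def weighted_vip(vips_by_target, design_weights):
    """Combine per-target VIP vectors with a weighted mean of squared scores.

    This combination rule is an assumption of this sketch, not an
    established method.
    """
    total_w = sum(design_weights[name] for name in vips_by_target)
    if total_w <= 0:
        raise ValueError("design weights must sum to a positive value")
    p = len(next(iter(vips_by_target.values())))
    combined = []
    for j in range(p):
        acc = sum(
            design_weights[name] * vips[j] ** 2
            for name, vips in vips_by_target.items()
        )
        combined.append(math.sqrt(acc / total_w))
    return combined


# Toy example: three variables in X1, one component, two targets (Y and X2).
vip_vs_y = vip_scores([[0.8], [0.6], [0.0]], [1.0])
vip_vs_x2 = vip_scores([[0.6], [0.0], [0.8]], [1.0])
combined = weighted_vip(
    {"Y": vip_vs_y, "X2": vip_vs_x2},
    {"Y": 1.0, "X2": 0.1},  # hypothetical design weights
)
selected = [j for j, v in enumerate(combined) if v > 1]
print(combined, selected)
```

With a small design weight on X2, the combined score is dominated by the VIP against Y, so the filter behaves almost like an ordinary VIP > 1 rule.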

I appreciate any advice you may have on this matter.

hi @huanren,

Thanks for sharing your ideas.

We use lasso, so we are not just looking at ‘loading > 0’. If you inspect the results from selectVar() you will have access to those coefficients and, combined with a stability selection analysis, you can get quite some insight into the features that are important in your analysis (example in the ‘Variable plots’ section of the sPLSDA SRBCT Case Study | mixOmics).

In the mixOmics book, section 10.5, I looked a bit at VIP vs lasso selection and state: ‘This output shows that all X variables that were selected, are important for explaining Y, since their VIP is greater than 1.’

I would say the design matrix in the multi block methods is ‘aspirational’, we use it to weight the linear combinations of features, but if you look at the ‘plotDiablo’ outputs, you can see that sometimes it does not work out. Maybe you can use those outputs instead to weight your VIP.
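As a rough illustration of that last suggestion, the sketch below derives weights from the observed correlations between block components rather than from the design matrix. It is plain Python, not mixOmics code; the function names `pearson` and `empirical_weights`, the block names, and the component scores are made up for the example. Taking the absolute value of the correlation is an assumption of this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def empirical_weights(scores_by_block, reference):
    """Absolute correlation of each block's component with the reference block."""
    ref = scores_by_block[reference]
    return {
        name: abs(pearson(ref, scores))
        for name, scores in scores_by_block.items()
        if name != reference
    }


# Hypothetical first-component scores for five samples in three blocks.
scores = {
    "mRNA": [-1.0, -0.5, 0.0, 0.5, 1.0],
    "miRNA": [-0.9, -0.6, 0.1, 0.4, 1.0],
    "protein": [1.0, -0.5, 0.0, 0.5, -1.0],
}
weights = empirical_weights(scores, reference="mRNA")
print(weights)
```

The resulting dictionary could then stand in for the design-matrix row of the reference block when weighting per-block VIP scores.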

Anyway, food for thought. As you said the VIP is not appropriate for multi block models, but I am not sure you are gaining much with this measure (I am not a super big fan myself!).


hi @kimanh.lecao

Thank you for your detailed response!

As you rightly pointed out, lasso is indeed suitable for feature selection, and the loading matrix obtained from the block.splsda method is essentially the result of lasso filtering. Moving forward, I will utilize various evaluation methods provided by mixOmics to make a comprehensive assessment rather than being confined to a single metric.

You mentioned the inconsistency between the ‘plotDiablo’ output and the weighting set by the design matrix. It seems that this discrepancy shows up in the correlation coefficients between the same components across omics: even when the design matrix assigns a weight of only 0.1, the correlations between components can often reach 0.7 or even higher. These correlation coefficients reflect the actual connections between matching components in the different omics datasets. While I acknowledge the limitations of VIP, given how widely it is used I might still include it as part of the evaluation (perhaps as an additional assessment applied to the features retained by lasso).

Thank you once again for your detailed responses to my questions. The atmosphere in the community is truly wonderful!
