Hello,
I am exploring the impact of different light treatments on hundreds of phytochemical features.
I would like to use sPLS-DA to select the features that are most influential in differentiating among the light treatments. Following the tutorials on the mixOmics page (and previous discussions on this forum), I tuned my models and reduced the set of "relevant" features to: comp1 = 25 features, comp2 = 45 features, comp3 = 10 features, comp4 = 65 features.
For some follow-up analysis I would like to focus on the top 20 most "influential" features. It was suggested to use the loading values on each component as an indicator of relevance. My question is: do higher loading values always indicate higher feature relevance (i.e. higher discriminative power)?
That is, is a feature on comp. 4 with a loading value of 0.35 as relevant as a feature on comp. 1 with a loading value of 0.35? If not, is there a way to rank all features from all components by their relevance/discriminative power?
Thanks a lot!
Hi @eimichae,
Thanks a lot for your (tricky) question!
The loading vector weights are specific to each component (the components are orthogonal to each other). This means that you cannot directly compare those weights across components. Also, each component discriminates a particular aspect of your data, so it is difficult to combine all this information together.
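To illustrate the within-component view, here is a minimal Python/NumPy sketch (not mixOmics code; mixOmics is written in R, where `selectVar()` plays a similar per-component role). The helpers `top_features_per_component` and `squared_norms` are hypothetical. The sketch assumes the loadings are a features × components array and, as in standard PLS, that each loading vector has unit length. Because every column is scaled to length 1, a component that keeps 65 features spreads that unit length over more features than one that keeps 10, so the same raw value (e.g. 0.35) does not carry the same weight on different components.

```python
import numpy as np

def top_features_per_component(loadings, names, k):
    """Rank features separately within each component by |loading|.

    loadings: (n_features, n_components) array (assumed layout),
    one column per component of a fitted sparse model.
    Returns, for each component, up to k feature names with the largest
    absolute non-zero loadings on that component. No pooling across components.
    """
    loadings = np.asarray(loadings, dtype=float)
    ranked = []
    for h in range(loadings.shape[1]):
        w = loadings[:, h]
        selected = np.flatnonzero(w)  # features kept by the sparse model on this component
        # stable sort so ties keep their original feature order
        order = selected[np.argsort(-np.abs(w[selected]), kind="stable")]
        ranked.append([names[j] for j in order[:k]])
    return ranked

def squared_norms(loadings):
    """Sum of squared loadings per component (1.0 when each column has unit length)."""
    return (np.asarray(loadings, dtype=float) ** 2).sum(axis=0)
```

With a toy 5-feature, 2-component loading matrix where component 1 keeps 2 features and component 2 keeps 4, each column still has squared norm 1, and the rankings are produced independently per component.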
You might be able to use the VIP (variable importance in the projection) by rerunning your model for 1 component (extract the VIP), 2 components (extract the VIP), etc. It is a measure that assesses the contribution of each variable in explaining Y through the components. We consider a VIP > 1 as important. I have a feeling that the way the VIP is calculated would 'normalise' across components.
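For reference, here is a minimal NumPy sketch of the usual PLS VIP formula, written to mirror what we understand the R function `vip()` in mixOmics to compute (an assumption: check its source before relying on the exact weighting). `vip` here is a hypothetical helper; `W`, `T` and `Y` stand for the X loading vectors (assumed unit length), the X variates and the dummy-coded class matrix of a fitted model. Each component is weighted by how much of Y it explains (sum of squared correlations with the columns of Y), and column h of the result is cumulative over components 1 to h.

```python
import numpy as np

def vip(W, T, Y):
    """VIP scores, one column per number of components (cumulative).

    W: (p, H) X loading vectors, assumed to have unit norm (as in PLS)
    T: (n, H) X variates (component scores)
    Y: (n, q) response; for PLS-DA, the dummy (one-hot) class matrix
    Column h uses components 1..h+1 only.
    """
    W, T, Y = (np.asarray(a, dtype=float) for a in (W, T, Y))
    p, H = W.shape
    n = Y.shape[0]
    # standardise so that (Yc.T @ Tc) / n gives Pearson correlations
    Yc = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    Tc = (T - T.mean(axis=0)) / T.std(axis=0)
    r2 = ((Yc.T @ Tc) / n) ** 2   # (q, H) squared correlations
    rd = r2.sum(axis=0)           # share of Y explained by each component
    out = np.empty((p, H))
    for h in range(H):
        weights = rd[: h + 1]
        # weighted average of squared loadings over the first h+1 components
        out[:, h] = (W[:, : h + 1] ** 2) @ weights / weights.sum()
    return np.sqrt(p * out)
```

Because each loading vector has unit length, the squared VIPs in every column sum to the number of features p, i.e. their mean is 1; that is the usual rationale for reading VIP > 1 as an above-average contribution. The cumulative construction also means that column 1 of a 2-component model equals the VIP of a 1-component model sharing the same first component.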
Kim-Anh
Dear Kim-Anh,
Thank you very much for your reply.
I have a few "newbie" follow-up questions:
First, I am a bit confused, as I have read several times that VIPs are not the best measure for judging a variable's "importance" because they tend to overestimate it. Isn't that why you developed the sparse PLS-DA approach and variable selection via loading values (or maybe I am confusing things)? Otherwise, why would I not always work with VIPs instead of the loading values shown in the worked example?
Also, you suggest calculating VIPs stepwise (i.e. rerunning the model for 1 component (extract the VIP), 2 components (extract the VIP), etc.). I do not really understand why. When I run the model directly for 2 components, mixOmics automatically also gives me the VIP for 1 component. Is the stepwise approach even necessary?
Many thanks!
Michael
Hi @eimichae,
I rechecked the code and you are correct: there is no need to rerun the model, as the VIP values are automatically extracted for each of the components.
I agree with you that VIP is not the best, hence sPLS-DA. But you were asking for importance values that are comparable across components, and my answer is that you cannot get them from the loading vectors. They are component-specific, and the value of a gene on component 1 cannot be compared directly to the value of another gene on component 2. You can only examine / compare the importance of genes within a component, not across components. That is why I said you could use the VIP to answer your specific question.
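For the original goal (a top-20 list across all components), a hypothetical helper could take the VIP matrix of the full model and rank features on its last column, which combines all components. This is again a Python/NumPy sketch; `rank_by_vip` and the features × components layout are assumptions, not mixOmics functions.

```python
import numpy as np

def rank_by_vip(vip_matrix, names, top=20):
    """Rank features across all components using the last VIP column.

    vip_matrix: (p, H) VIP scores where column h is the model with h+1
    components (assumed layout), so the last column summarises all H.
    Returns (name, vip, above_one) tuples for the `top` highest VIPs.
    """
    v = np.asarray(vip_matrix, dtype=float)[:, -1]
    order = np.argsort(-v, kind="stable")[:top]  # highest VIP first, ties keep input order
    return [(names[j], float(v[j]), bool(v[j] > 1.0)) for j in order]
```

The `above_one` flag simply marks the VIP > 1 rule of thumb alongside the rank, so the list can be cut either at a fixed size or at that threshold.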
I hope that helps!
Kim-Anh
Dear @kimanh.lecao,
Thank you very much. That helps a lot!
Michael