Model performance vs. colinearity between features

gsalle · September 22, 2020, 6:52am

Hi there,

I am running a few analyses with mixOmics, and it's a really great tool that is also well documented.
It looks like some of the features I am considering (e.g. blood cell populations/parameters) show strong correlations. This is certainly the case in many situations.
I was thinking of discarding the most redundant ones, but maybe sPLS already accounts for this. Is it correct to assume so? Does it matter at all?

Thanks,
Guillaume

kimanh.lecao · September 22, 2020, 11:36pm

Dear @gsalle,
Thanks for using mixOmics! sPLS uses a lasso penalty, so if your keepX is small enough, it should focus only on the most relevant (largely uncorrelated) variables. However, as keepX increases, the selection will necessarily include correlated variables. Selecting correlated variables is often useful in biology if you want to explain the biological system in a holistic way, but less so if you are only interested in the top biomarkers for diagnostics. In that case, it would make sense to remove highly correlated variables, provided you think you are not losing any critical information.
sPLS will handle correlated variables fine.
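
If you do decide to remove highly correlated variables before fitting, one simple option is a greedy correlation filter. The sketch below is plain Python rather than mixOmics code, and it is only an illustration: the function names `pearson` and `drop_correlated`, the 0.9 threshold, the feature names, and the values are all made up for the example.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a feature only if its absolute correlation
    with every feature kept so far is below `threshold`.
    The threshold is an arbitrary assumption, not a recommended value."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

# Hypothetical blood-cell measurements for six samples (invented values).
features = {
    "neutrophils": [4.1, 3.8, 5.2, 6.0, 4.7, 3.9],
    "leukocytes": [6.9, 6.5, 8.1, 9.2, 7.6, 6.6],   # closely tracks neutrophils
    "platelets": [250, 310, 180, 260, 330, 205],
}

filtered = drop_correlated(features)
print(list(filtered))
```

Because the filter is greedy, the result depends on the order of the features: the first member of a correlated group is the one that is kept, so it may be worth ordering the features by biological relevance first.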

Kim-Anh
