Model performance vs. collinearity between features

Hi there,

I am running a few analyses with mixOmics; it is a really great tool and well documented.
Some of the features I am considering (e.g. blood cell populations/parameters) appear to be strongly correlated, which is probably the case in many situations.
I was thinking of discarding the most redundant ones, but maybe sPLS already accounts for this. Is it correct to assume so? Does it matter at all?


Dear @gsalle,
thanks for using mixOmics! sPLS uses a lasso penalty, so if your keepX is small enough, it should select only the most relevant, largely uncorrelated variables. However, as keepX increases, the selection will necessarily include correlated variables. A selection of correlated variables is often useful in biology if you want to explain the biological system in a holistic way, but less so if you are only interested in the top biomarkers for diagnostic purposes. In that case it would make sense to remove highly correlated variables beforehand, provided you think you are not losing any critical information.
sPLS will be fine handling correlated variables.
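
If you do decide to remove highly correlated variables before fitting, one simple option is a greedy correlation filter. The sketch below is only an illustration and is not part of mixOmics: the function names (`pearson`, `drop_correlated`), the 0.9 threshold, the feature names and the toy values are all assumptions made for the example. It is written in plain Python to stay self-contained; the same logic carries over to R.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedy filter: walk the features in order and keep one only if its
    absolute correlation with every already-kept feature is below threshold.

    `features` maps a feature name to its list of values across samples.
    Returns the names of the kept features, in their original order.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (made up for illustration): five samples, three features.
features = {
    "wbc": [5.0, 6.1, 7.2, 8.0, 9.1],
    "neutrophils": [3.0, 3.7, 4.3, 4.9, 5.5],  # nearly collinear with wbc
    "platelets": [250, 180, 300, 210, 260],
}

kept = drop_correlated(features, threshold=0.9)
print(kept)
```

Note that this filter keeps whichever member of a correlated group comes first, so the feature order matters. If some variables are more interpretable or more reliably measured, put them first, and apply the filter before passing the reduced matrix to sPLS.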