Highly correlated omic features

efratmuller · September 20, 2022, 6:58am

Hi!

First of all, thanks for this great package, which I’ve only recently discovered.
I was wondering what you recommend when one of the omics contains some highly correlated features (e.g. correlation > 0.99), which may be an artifact of the raw data processing rather than an actual biological signal (though we can’t know for sure).

Should they be removed before running your multi-block methods? Or should they be left as is, in which case should I expect these highly correlated features to appear in the same component with similar coefficients?

Many thanks,
Efrat

MaxBladen · September 20, 2022, 10:15pm

Great question. Whether they arise from processing artifacts or biological mechanisms, these highly correlated features aren’t going to help you. Both approaches you suggested are valid. Unfortunately the answer is very case dependent, so I don’t want to say one way is definitively better. I would lean towards removing highly correlated features, as keeping them slows computation and can negatively affect the mathematical processes of the methods. I would also explore leaving them in and seeing how this affects your model’s performance.

I’d encourage you to look at the cim() function (see ?mixOmics::cim). This will help you identify highly correlated features within a block of data.
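
For anyone who wants to try the removal option, here is a minimal sketch of a greedy correlation filter. It is written in plain Python rather than R and is not part of mixOmics; the function names `pearson` and `drop_correlated`, the toy block, and the 0.99 cutoff (taken from the question above) are all illustrative assumptions. In R, `cor()` on the block's feature matrix would give the same pairwise correlations.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant feature: treated as uncorrelated (an assumption)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(block, threshold=0.99):
    """Greedy filter (hypothetical helper, not a mixOmics function).

    `block` maps feature name -> list of values, one per sample.
    A feature is kept only if |r| <= threshold with every feature kept so far.
    Returns (kept_names, dropped), where `dropped` maps each removed feature
    to the kept feature it was nearly identical to.
    """
    kept, dropped = [], {}
    for name, values in block.items():
        match = next(
            (k for k in kept if abs(pearson(values, block[k])) > threshold),
            None,
        )
        if match is None:
            kept.append(name)
        else:
            dropped[name] = match
    return kept, dropped


# Toy block: 'f2' is 'f1' rescaled plus tiny noise, 'f3' is independent.
random.seed(1)
f1 = [random.gauss(0, 1) for _ in range(50)]
block = {
    "f1": f1,
    "f2": [2 * v + random.gauss(0, 0.01) for v in f1],
    "f3": [random.gauss(0, 1) for _ in range(50)],
}
kept, dropped = drop_correlated(block)
print("kept:", kept)
print("dropped:", dropped)
```

Note that the filter keeps whichever feature of a correlated pair comes first, so the result depends on feature order. Recording what was dropped (and which feature it matched) makes it easy to map a selected feature back to its near-duplicates when interpreting the model, and to compare performance with and without the filter.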

efratmuller · September 21, 2022, 5:12am

Thanks Max for the quick response!
I’ll definitely check out the cim function.
Best,
Efrat
