First of all, thanks for this great package, which I’ve only recently discovered.
I was wondering what you would recommend when some features within one of the omics are highly correlated (e.g. correlation > 0.99), which may be an artifact of the raw data processing rather than an actual biological signal (though we can’t know for sure).
Should they perhaps be removed before running your multi-block methods? Or should they be left as is, in which case should I expect these highly correlated features to appear on the same component with similar coefficients?
Great question. Whether they come from processing artifacts or biological mechanisms, these highly correlated features aren’t going to help you. Both approaches you suggested are valid. Unfortunately the answer is very case dependent, so I don’t want to say one way is definitively better. I would lean towards removing highly correlated features, since keeping them slows computation and can negatively impact the mathematical processes of the methods. I would also try leaving them in and seeing how this affects your model’s performance.
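As a rough illustration of the removal option, here is a minimal base R sketch. The helper `drop_correlated()` is hypothetical (it is not a mixOmics function), the simulated data and the 0.99 cutoff are assumptions taken from the example in the question, and for a multi-block analysis you would apply it to each block separately before fitting.

```r
# Hypothetical helper (not part of mixOmics): drop every feature whose
# absolute Pearson correlation with an EARLIER column exceeds `cutoff`,
# so the first feature of each highly correlated group is kept.
drop_correlated <- function(X, cutoff = 0.99) {
  cor_mat <- abs(cor(X, use = "pairwise.complete.obs"))
  # Zero the lower triangle and diagonal so each pair is considered once
  cor_mat[lower.tri(cor_mat, diag = TRUE)] <- 0
  # Column j is flagged if any earlier column i < j is above the cutoff
  drop <- apply(cor_mat, 2, function(col) any(col > cutoff, na.rm = TRUE))
  X[, !drop, drop = FALSE]
}

# Simulated block (assumption): f2 is a near-duplicate of f1
set.seed(1)
a <- rnorm(50)
X <- cbind(f1 = a, f2 = a + rnorm(50, sd = 0.01), f3 = rnorm(50))

X_filtered <- drop_correlated(X, cutoff = 0.99)
colnames(X_filtered)
```

Fitting the model once on the original block and once on the filtered block, then comparing performance, is one way to see whether the removal matters in your case.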
I’d encourage you to look at the cim() function (via ?mixOmics::cim). This will help you identify highly correlated features within a block of data.
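One possible way to use it, as a sketch rather than a recommended workflow: compute the within-block correlation matrix, list the pairs above your threshold, and pass the matrix to `cim()` to view it as a clustered heatmap. The simulated `block`, the 0.99 threshold, and the choice to plot the correlation matrix (rather than the raw data) are assumptions for illustration.

```r
# Simulated block (assumption): f2 is a near-duplicate of f1
set.seed(1)
a <- rnorm(50)
block <- cbind(f1 = a, f2 = a + rnorm(50, sd = 0.01),
               f3 = rnorm(50), f4 = rnorm(50))

# Feature-by-feature Pearson correlations within the block
cor_mat <- cor(block)

# Tabulate each pair above the threshold once (upper triangle only)
high <- which(abs(cor_mat) > 0.99 & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  feature_1 = rownames(cor_mat)[high[, "row"]],
  feature_2 = colnames(cor_mat)[high[, "col"]],
  r = cor_mat[high]
)
print(high_pairs)

# Clustered image map of the correlation matrix, if mixOmics is installed
if (requireNamespace("mixOmics", quietly = TRUE)) {
  mixOmics::cim(cor_mat)
}
```

Groups of near-duplicate features should show up as tight clusters in the heatmap, and the table gives the exact pairs to inspect.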
Thanks Max for the quick response!
I’ll definitely check out the cim function.