DIABLO interpretation in light of low stability of feature selection

efratmuller · October 30, 2022, 1:44pm

Hi mixOmics team!

Experimenting with a few different datasets/omics, I see that in some cases the features selected on my DIABLO components are very unstable (as reported by the perf function, <output>$features$stable). I was wondering how you would recommend dealing with these cases, and whether or not you believe it still makes sense to further analyze these components?

Some additional notes to put my question in context:

In the specific dataset I’m referring to I have ~180 samples, and 4 blocks. Each block has ~100-~5000 features. I’m trying to find small groups of features so keepX values are relatively small (2-10).
The overall model performance is decent, with results comparable to other machine learning models. I’m mostly looking at AUC (values are ~0.75).
By low stability scores I mean a few features with mean stability (averaged over repeats) of 10%-40% and all others <10%.
I’m getting similar results when using leave-one-out CV or other folds/nrepeat choices. So it’s not just a specific cross-validation strategy that yields this instability.

Any tip or thoughts on this matter would be greatly appreciated.
Thanks!
Efrat

MaxBladen · November 1, 2022, 9:27pm

> I’m trying to find small groups of features so keepX values are relatively small (2-10).

This seems to me to be the culprit behind the low stabilities. With so many features, many different subsets can generate a given component and be similarly effective. Correlations between your predictors cause this: correlated features are close to interchangeable, so which ones get selected changes from fold to fold. I would recommend increasing your keepX values slowly while examining how this affects feature stability.
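To make the mechanism concrete, here is a minimal sketch in plain Python. It is not mixOmics code: the simulated data, the function names (`simulate`, `selection_frequency`) and the selection rule (rank features by absolute correlation with the class label, keep the top keepX) are all assumptions made for illustration, standing in for the sparse selection DIABLO performs. It only shows how selection frequency over cross-validation fits can be tallied for several keepX values on the same folds.

```python
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def simulate(n_samples=180, n_features=100, seed=1):
    """Hypothetical data: every feature is a noisy copy of one latent signal,
    so all features are correlated with each other and with the class label."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    latent = [yi + rng.gauss(0, 1) for yi in y]
    X = [[l + rng.gauss(0, 1) for _ in range(n_features)] for l in latent]
    return X, y


def selection_frequency(X, y, keep_x_values, n_folds=5, n_repeats=5, seed=2):
    """For each keepX, return {feature index: fraction of CV fits selecting it}."""
    rng = random.Random(seed)
    n = len(X)
    columns = list(zip(*X))  # one tuple per feature
    counts = {k: Counter() for k in keep_x_values}
    fits = 0
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(n_folds):
            held_out = set(idx[f::n_folds])
            train = [i for i in idx if i not in held_out]
            y_tr = [y[i] for i in train]
            scores = [abs(pearson([col[i] for i in train], y_tr)) for col in columns]
            ranking = sorted(range(len(columns)), key=lambda j: -scores[j])
            for k in keep_x_values:
                counts[k].update(ranking[:k])  # stand-in for sparse selection
            fits += 1
    return {k: {j: c / fits for j, c in counts[k].items()} for k in keep_x_values}


X, y = simulate()
freq = selection_frequency(X, y, keep_x_values=[5, 20, 50])
for k, f in freq.items():
    # keepX, highest selection frequency, number of distinct features ever selected
    print(k, round(max(f.values()), 2), len(f))
```

Because the top-k sets are nested within each fit, a feature's selection frequency in this toy setup can only stay the same or go up as keepX grows. Whether the real DIABLO stabilities behave the same way on your data is something to check with perf rather than assume.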

> I’m mostly looking at AUC (values are ~0.75).

AUC should only be used as a complementary metric, not as the sole metric to evaluate model performance. Use the model’s error rate (ER) or balanced error rate (BER) instead. AUC can be used to check the relative performance of two similar models, but I wouldn’t use it to compare DIABLO models to other ML models.
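As a rough illustration of why these metrics can disagree, here is a small plain-Python sketch. The toy labels, scores and the 0.5 cutoff are made up for the example, and the helper names are hypothetical; mixOmics assigns classes through its prediction distances rather than a score cutoff, so this only illustrates the metrics themselves, not how perf computes them.

```python
def error_rate(y_true, y_pred):
    """Fraction of samples that are misclassified."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_error_rate(y_true, y_pred):
    """Average of the per-class error rates, so each class counts equally."""
    per_class = []
    for c in sorted(set(y_true)):
        preds = [p for t, p in zip(y_true, y_pred) if t == c]
        per_class.append(sum(p != c for p in preds) / len(preds))
    return sum(per_class) / len(per_class)


def auc(y_true, scores):
    """Probability that a positive sample scores above a negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced data: 90 negatives, 10 positives. The scores rank every
# positive above every negative, but none of them reaches the assumed cutoff.
y_true = [0] * 90 + [1] * 10
scores = [0.1] * 90 + [0.4] * 10
y_pred = [int(s >= 0.5) for s in scores]  # assumed cutoff, for illustration only

print(auc(y_true, scores))                  # 1.0
print(error_rate(y_true, y_pred))           # 0.1
print(balanced_error_rate(y_true, y_pred))  # 0.5
```

Here the ranking is perfect (AUC of 1.0) and the overall ER looks low, yet every sample of the minority class is misclassified, which only the BER exposes.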

> whether or not you believe it still makes sense to further analyze these components?

With stabilities around or below 10%, the selected features depend too heavily on which samples happen to fall in each fold. I think you should continue your analysis, but adjust your keepX values. If that doesn’t help (or you can’t because of study design limitations), I’m not sure how else you could deal with these features, save for ignoring them.

efratmuller · November 2, 2022, 9:23am

Thanks Max. Increasing keepX sounds like the way to go.
Best,
Efrat
