Using DIABLO Output for ML Training

Mehrdadameri · June 12, 2025, 3:28am

Hi everyone,

I’m working on a multi-omics data integration project using transcriptomics, proteomics, and methylation data from cancer samples. My main goal is to extract informative features for downstream model training. I plan to build and compare several predictive models using different algorithms.

I’m trying to decide on the best approach for selecting features after running DIABLO (sparse multiblock PLS-DA):

1) Should I use the components obtained during DIABLO training as input features for my downstream models?
2) Or would it be better to extract the omics-specific features (e.g., genes, proteins, CpGs) selected for each component, combine them into a single feature set, and use those as the input for model training?
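
To make the two options concrete, here is a rough sketch of what I mean. It is plain Python with made-up numbers, not real DIABLO output: the block names, feature names, loadings, and the helper functions `component_scores` and `selected_features` are all hypothetical. It assumes each block is already centred and scaled and uses a single component, where a score is just the data multiplied by the sparse loading vector. In practice I would take the variates and selected variables directly from the fitted model rather than recompute them.

```python
# Toy illustration only: all values below are invented, not DIABLO output.
# Each block holds a samples x features matrix (assumed centred and scaled).
blocks = {
    "rna": {
        "features": ["geneA", "geneB", "geneC"],
        "X": [[1.0, 0.5, -0.2], [-1.0, 0.1, 0.4], [0.0, -0.6, -0.2]],
    },
    "protein": {
        "features": ["protX", "protY"],
        "X": [[0.3, -1.0], [0.2, 1.0], [-0.5, 0.0]],
    },
}

# Hypothetical sparse loadings for ONE component per block.
# A zero loading means the feature was not selected on that component.
loadings = {
    "rna": [0.8, 0.0, -0.6],
    "protein": [0.0, 1.0],
}


def component_scores(blocks, loadings):
    """Option 1: one score per sample and block, computed as X . w."""
    names = [f"{block}_comp1" for block in blocks]
    n_samples = len(next(iter(blocks.values()))["X"])
    rows = []
    for i in range(n_samples):
        row = []
        for block in blocks:
            x_i = blocks[block]["X"][i]
            row.append(sum(x * w for x, w in zip(x_i, loadings[block])))
        rows.append(row)
    return names, rows


def selected_features(blocks, loadings):
    """Option 2: original columns with non-zero loadings, joined across blocks."""
    names, columns = [], []
    for block in blocks:
        for j, w in enumerate(loadings[block]):
            if w != 0.0:
                names.append(f"{block}:{blocks[block]['features'][j]}")
                columns.append((block, j))
    n_samples = len(next(iter(blocks.values()))["X"])
    rows = [[blocks[b]["X"][i][j] for b, j in columns] for i in range(n_samples)]
    return names, rows


score_names, scores = component_scores(blocks, loadings)
feature_names, features = selected_features(blocks, loadings)

print(score_names)    # low-dimensional input: one column per block and component
print(feature_names)  # wider input: the selected genes/proteins themselves
```

Option 1 gives a small matrix (one column per block and component), while option 2 keeps the original measurements of the selected features, so the downstream models see more columns but ones that map directly to genes, proteins, or CpGs.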

I’d really appreciate any advice, best practices, or experiences you can share regarding this decision. Thank you!

evahamrud · June 13, 2025, 4:28am

Hi @Mehrdadameri,

When tuning DIABLO models, we recommend tuning both 1) the number of components and 2) the number of features kept on each component in each omics block. You can tune both with the tune() function in mixOmics; see more information here. The most efficient way is to tune the number of components first and then the number of variables; see our DIABLO case study for more details.

Cheers,
Eva
