Hi everyone,
I’m working on a multi-omics data integration project using transcriptomics, proteomics, and methylation data from cancer samples. My main goal is to extract informative features for downstream model training. I plan to build and compare several predictive models using different algorithms.
I’m trying to decide on the best approach for selecting features after running DIABLO (sparse multiblock PLS-DA):
- Should I use the component scores (the per-sample latent variates) obtained from the trained DIABLO model as input features for my downstream models?
- Or would it be better to extract the omics-specific features (e.g., genes, proteins, CpGs) selected for each component, combine them into a single feature set, and use those as the input for model training?
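To make the two options concrete, here is a toy Python sketch of what I mean. It is not mixOmics code (DIABLO itself would be fit in R). The loadings are made-up numbers standing in for the sparse loading vectors a fitted model would give me, and all function and feature names are hypothetical. It also skips the centering, scaling, and deflation that a real fit applies, so treat it only as an illustration of the shape of each feature set.

```python
# Toy illustration only: every number and name here is hypothetical and stands
# in for what a fitted DIABLO model would provide (sparse loadings per block).

def component_scores(X, loadings):
    """Option 1: project each sample onto each component (one score each)."""
    return [[sum(x * w for x, w in zip(row, comp)) for comp in loadings]
            for row in X]


def selected_features(names, loadings):
    """Option 2: keep features with a non-zero loading on any component."""
    return [name for j, name in enumerate(names)
            if any(comp[j] != 0.0 for comp in loadings)]


def build_feature_sets(blocks):
    """Return (concatenated component scores per sample, pooled feature names)."""
    scores = None
    selected = []
    for block_name, block in blocks.items():
        block_scores = component_scores(block["X"], block["loadings"])
        if scores is None:
            scores = [list(row) for row in block_scores]
        else:
            for row, extra in zip(scores, block_scores):
                row.extend(extra)
        selected += [f"{block_name}:{name}"
                     for name in selected_features(block["names"],
                                                   block["loadings"])]
    return scores, selected


# Two samples, two omics blocks, two components per block (all made up).
blocks = {
    "rna": {
        "names": ["GENE_A", "GENE_B", "GENE_C"],
        "X": [[1.0, 2.0, 3.0], [0.0, 1.0, 4.0]],
        "loadings": [[1.0, 0.0, 0.0], [0.0, 0.0, 0.5]],
    },
    "protein": {
        "names": ["PROT_A", "PROT_B"],
        "X": [[2.0, 1.0], [3.0, 0.0]],
        "loadings": [[0.0, 1.0], [0.0, -1.0]],
    },
}

scores, selected = build_feature_sets(blocks)
print(scores)    # Option 1: few columns (components x blocks) per sample
print(selected)  # Option 2: original features pooled across blocks
```

In the first option the downstream model sees a handful of latent columns per sample; in the second it sees the original genes, proteins, and CpGs that had a non-zero loading on at least one component.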
I’d really appreciate any advice, best practices, or experiences you can share regarding this decision. Thank you!