DIABLO: different variable sizes and choosing the design matrix

I am integrating proteomics, peptidomics and physicochemical data using the DIABLO framework. I have been referencing your excellent 2019 publication on DIABLO, and I find the methodology very powerful for my research.

I had a question regarding block weighting when dealing with datasets of significantly different sizes. In my case, I am working with a proteomics dataset (~3600 variables) and a peptidomics dataset (~324 variables), along with a smaller block of physicochemical properties (only 3 variables). I am concerned about the disparity in variable numbers across these blocks and their influence on the integration and feature selection process.

In your paper, I didn’t find specific guidance on adjusting the weights of blocks in the design matrix to account for this imbalance. Could you kindly clarify if block weighting to adjust for the size of blocks (or other factors) is recommended in DIABLO? If so, could you point me to any literature or specific methodology regarding how to effectively implement such an approach in DIABLO?

Hello,

Regarding the different numbers of variables per block: this is not an issue. Each block is treated ‘separately’ (so to speak) and with equal importance, so block size will not affect the integration process. For variable selection, you can either tune the number of variables to select in each block, or set those numbers yourself (for example, if you think it would be more appropriate to select more variables in one block than in another).

Here is some guidance on how to choose the design in DIABLO: see section 6, “N-Integration”, of the mixOmics vignette.
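To make the shape of the design concrete, here is a minimal sketch for the three blocks in the question. mixOmics is an R package, so this Python snippet only illustrates the structure of the matrix, not the package's API. The helper `make_design` and the off-diagonal value 0.1 are assumptions chosen for illustration; the vignette discusses how to pick the actual values. The matrix has one row and one column per block, so its entries describe links between blocks and do not depend on how many variables each block contains.

```python
import numpy as np


def make_design(block_names, weight=0.1):
    """Return a square block-by-block design matrix.

    Hypothetical helper: `weight` is placed between every pair of
    distinct blocks and 0 on the diagonal (a block is not linked to itself).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight is assumed to lie in [0, 1]")
    k = len(block_names)
    design = np.full((k, k), float(weight))
    np.fill_diagonal(design, 0.0)
    return design


# Block names taken from the question; 0.1 is an illustrative value only.
blocks = ["proteomics", "peptidomics", "physicochemical"]
design = make_design(blocks, weight=0.1)
print(design)
```

In R, a matrix built the same way would be supplied through the `design` argument of `block.splsda()`, and the per-block numbers of variables to select through `keepX`.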

I hope that helps,

Kim-Anh