Choice of method: MINT/DIABLO for independent sets of cell lines

Dear sir/madam,

I have a question regarding which mixOmics method I could use for my data. I have metabolomics, proteomics and phosphoproteomics data. These data were acquired from cells stemming from the same cell line that have been treated similarly (infected/not infected in a timecourse; every timepoint is a separate group of samples). However, these experiments are independent as they are performed at different times and by different people.

My question is whether I should use MINT because the experiments are independent and thus the data is not acquired on exactly the same samples (although the samples are treated similarly), or can I use DIABLO (when I use all three datasets) or PLS (when I use 2 out of 3 datasets) since I have three different types of data and the samples are treated similarly?

Additionally, I would like to ask how far the development is of the timecourse extension. Can I already use this extension or is it still in progress? And if it is still in progress, how would you recommend dealing with different time points?

Lastly, the metabolomics dataset is relatively small (around 150 metabolites). Can I still perform pls-da on this dataset, or is this dataset too small for pls-da?

Yours sincerely,
Lonneke Nouwen

Dear @lonnekenouwen,
Thank you for considering mixOmics in your exploratory and integrative analyses!

> However, these experiments are independent as they are performed at different times and by different people.

Given that those are cell lines (assuming you have the same number per omic type), I would be tempted to advise you to treat them as the same samples. One way to check that the sources of variation are similar is to run a PCA on each data set individually and examine the sample plots.

> or can I use DIABLO (when I use all three datasets) or PLS (when I use 2 out of 3 datasets) since I have three different types of data and the samples are treated similarly? Lastly, the metabolomics dataset is relatively small (around 150 metabolites). Can I still perform pls-da on this dataset, or is this dataset too small for pls-da?

We usually advise our users to start gradually: with a single-omic analysis first (PCA, PLS-DA), then two omics, then three. Each step answers a specific type of question and gives a better understanding of the agreement between data sets. DIABLO is the final and most demanding step because you need to choose the design matrix, and the previous analyses will help you make this choice. For the variable selection size, do a first pass with 20 to 50 variables selected on each dimension. For the metabolomics data, you can absolutely use PLS-DA and sPLS-DA (basically, as soon as the number of variables is larger than the number of samples, you should consider a sparse method to identify key variables).
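To make the design matrix a little more concrete: in DIABLO it is a square block-by-block matrix with 0 on the diagonal and values between 0 and 1 elsewhere, where larger values ask for stronger agreement between the two blocks. The sketch below is plain Python for illustration only, not mixOmics code; the helper name `make_design` and the link value 0.1 are assumptions chosen for the example, and the right value for your data is what the earlier pairwise analyses should inform.

```python
def make_design(blocks, link):
    """Square block-by-block matrix: 0 on the diagonal, `link` elsewhere."""
    if not 0.0 <= link <= 1.0:
        raise ValueError("link must be between 0 and 1")
    return {a: {b: (0.0 if a == b else link) for b in blocks} for a in blocks}


# The three omics blocks from this thread; 0.1 is an arbitrary example value.
blocks = ["metabolomics", "proteomics", "phosphoproteomics"]
design = make_design(blocks, link=0.1)

for name in blocks:
    print(name, [design[name][other] for other in blocks])
```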

> how far the development is of the timecourse extension

We have it now! https://www.frontiersin.org/articles/10.3389/fgene.2019.00963/full
Antoine is currently working on pushing his Bioconductor package; I think you can pull it from his GitHub repository (abodein/timeOmics: Time-Course Multi-Omics data integration). Make sure you read the paper first to understand what type of analyses you should consider. You need about 5 time points. If you have fewer than that, you can consider the multilevel decomposition. It won't take into account the correlation between time points, but it allows you to remove the individual variation if that variation is much larger than the time variation.
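As a rough illustration of what removing the individual variation means, the sketch below subtracts each individual's own mean profile from its samples, so that only the within-individual changes over time remain. It is a minimal plain-Python sketch for a single repeated-measures factor, not the mixOmics implementation; the function name `within_variation` and the toy values are assumptions made up for the example.

```python
from collections import defaultdict


def within_variation(rows, subjects):
    """Subtract each subject's mean profile from that subject's samples."""
    sums = {}
    counts = defaultdict(int)
    for row, subj in zip(rows, subjects):
        if subj not in sums:
            sums[subj] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[subj][j] += value
        counts[subj] += 1
    means = {s: [t / counts[s] for t in total] for s, total in sums.items()}
    return [
        [value - mean for value, mean in zip(row, means[subj])]
        for row, subj in zip(rows, subjects)
    ]


# Hypothetical toy data: 2 individuals x 3 time points, 2 variables.
# The individuals differ a lot in baseline but share the same time trend.
subjects = ["ind1", "ind1", "ind1", "ind2", "ind2", "ind2"]
rows = [
    [10.0, 5.0], [11.0, 6.0], [12.0, 7.0],
    [20.0, 1.0], [21.0, 2.0], [22.0, 3.0],
]
within = within_variation(rows, subjects)
for subj, row in zip(subjects, within):
    print(subj, row)
```

In this toy case the large baseline difference between the two individuals disappears and both show the same time trend, which is the situation where the decomposition helps.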

Hope that helps,

Kim-Anh

Dear Kim-Anh,

Thank you for your fast reply, it really helps me to understand how to proceed and I am really excited to read the paper regarding the timecourse extension! I have two more questions. First, would it in theory be possible to use MINT? I understood from the website/paper on this topic that it is used for the same omics type acquired from different platforms, but is it possible to use it with different types of omics as well (just in case the experiment effect, i.e. the time or person that performed it, is greater than the treatment effect)? The last question regards the pre-processing of the data. The proteomics data have already been log2 transformed; the metabolomics data have not been log2 transformed yet. Would you recommend also log2 transforming the metabolomics data so that both datasets have been treated the same, or is that irrelevant?

Yours sincerely,

Lonneke Nouwen

Hi @lonnekenouwen,
Apologies, I missed your message.
You can only use MINT if it is on the same type of omic (imagine that you are stacking the data sets on the same column variables, so P should be the same across the two data sets, even if N is different), so that won't work. Definitely go for DIABLO. @aljabadi will respond to your bug shortly.
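The stacking constraint can be sketched as a simple check: data sets can only be stacked row-wise if they measure exactly the same variables. The snippet below is plain Python for illustration, not mixOmics code; the function name `can_stack` and all feature names are hypothetical.

```python
def can_stack(studies):
    """Return True if every study measures exactly the same variables (same P)."""
    feature_sets = [set(features) for features in studies.values()]
    return all(fs == feature_sets[0] for fs in feature_sets)


# Same omic type measured in two independent experiments: same columns,
# so the samples (rows) could be stacked even if N differs.
same_omic = {
    "proteomics_run1": ["protA", "protB", "protC"],
    "proteomics_run2": ["protA", "protB", "protC"],
}

# Different omic types: the columns do not match, so stacking is not possible.
different_omics = {
    "proteomics": ["protA", "protB", "protC"],
    "metabolomics": ["metX", "metY"],
}

print(can_stack(same_omic))        # True
print(can_stack(different_omics))  # False
```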

Kim-Anh

Thank you for the clarification!

Kind regards,
Lonneke Nouwen