Choosing the right analysis for my scientific question

Julien · December 12, 2024, 9:12am

Hello,
I am exploring mixOmics to integrate my metabolomics data (~1000 features) and RNAseq data (~2000 genes). I have 19 samples from 3 different genotypes and 2 inoculation conditions (virus inoculated and mock inoculated).
My question is: which method should I use to identify genes and metabolites that could explain the differences in susceptibility to the virus between those 3 genotypes?

Thank you in advance for your help,

Julien

evahamrud · December 19, 2024, 10:36pm

Hi @Julien,

Thanks for your interest in using mixOmics! Have a look at our webpage, which has a great decision tree that can help you choose which method to use. We can break down your analysis into its different components:

You have two ‘omics’, also called ‘data blocks’ (metabolomics and RNAseq). This means you will need to perform N-integration. You can see on the page I linked that this type of analysis is useful for answering the question: “Which variables across the different omics data sets discriminate the different outcomes?”
You have an outcome variable, which is susceptibility to the virus. If this outcome is continuous, you will be looking at a regression method, whereas if it is categorical (e.g. yes/no or low/mid/high susceptibility), you will need to use a classification method. Have a look at the decision tree on the webpage I linked for more info on that.
The final aspect to consider is the grouping of your samples. If you would like to identify which genes and metabolites explain differences in virus susceptibility regardless of genotype, you may want to use P-integration, which integrates samples across groups (also called ‘studies’) and can be used to answer questions like: “Which variables are discriminative across all studies?”.
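
The three considerations above can be summarised as a small lookup. This is only an illustrative sketch in Python (mixOmics itself is an R package): the helper `suggest_mixomics_function` and its arguments are hypothetical, and the returned strings follow the mixOmics function naming as I understand it (`block.` for N-integration, `mint.` for P-integration, `plsda` for classification, `pls` for regression, leading `s` for the sparse variants). Please confirm the names against the package documentation before relying on them. Unsupervised analyses (no outcome) are not covered.

```python
def suggest_mixomics_function(n_blocks, outcome, multiple_studies=False, sparse=True):
    """Return the name of a candidate mixOmics (R) function.

    Hypothetical helper for illustration only; it encodes the decision
    logic from the post above, not an official mixOmics API.
    """
    if n_blocks < 1:
        raise ValueError("need at least one data block")
    if outcome not in ("categorical", "continuous"):
        raise ValueError("outcome must be 'categorical' or 'continuous'")

    # Classification for categorical outcomes, regression for continuous ones.
    core = "plsda" if outcome == "categorical" else "pls"
    if sparse:
        core = "s" + core  # sparse variants perform variable selection

    prefix = ""
    if multiple_studies:
        prefix += "mint."   # P-integration across studies/groups
    if n_blocks > 1:
        prefix += "block."  # N-integration across omics blocks
    return prefix + core


# Two omics blocks, categorical susceptibility, genotypes not treated as studies.
print(suggest_mixomics_function(n_blocks=2, outcome="categorical"))
```

With two blocks and a categorical outcome this suggests the sparse N-integration classifier, which is the method usually referred to as DIABLO.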

Your other sample grouping is inoculated versus mock inoculated. Personally, I would not include the mock-inoculated samples in this analysis, as you would expect none of those samples to be infected. I would think of them more as a control to ensure your experiment worked correctly.

As you can see, there are a few things to consider when choosing which mixOmics method to use, namely: integrating omics across the same samples, integrating different sample groups, and what your outcome variable looks like. These methods can be combined, e.g. we have functionality to run N- and P-integration simultaneously. However, as described in our case studies, we recommend breaking down the data into simpler problems in the first instance. For example, you could use mixOmics to run a simple PCA on your RNAseq and metabolomics data separately. If, for example, you find that your genotypes do not strongly influence gene expression or metabolites, you might not have to run P-integration.
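
To make the "PCA on each block first" idea concrete, here is a minimal sketch in plain Python. It is not mixOmics code (in R you would use the package's own PCA function). The function names, the toy matrix and the genotype labels are all made up for illustration. It computes only the first principal component, using power iteration on the sample-by-sample Gram matrix, which stays small when there are few samples and many features. Columns are centred but not scaled, and a block with no variation at all would need extra handling.

```python
import math
import random


def center_columns(X):
    """Subtract each column's mean so every feature has mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def first_pc_scores(X, n_iter=200, seed=0):
    """Return (scores, explained_fraction) for the first principal component.

    X is a list of samples, each a list of feature values.
    The sign of the scores is arbitrary, as with any PCA.
    """
    Xc = center_columns(X)
    n = len(Xc)
    # n x n Gram matrix: cheap when samples are few and features are many.
    gram = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) for k in range(n)]
            for i in range(n)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n))
                     for i in range(n))
    total_variance = sum(gram[i][i] for i in range(n))
    scores = [math.sqrt(eigenvalue) * x for x in v]
    return scores, eigenvalue / total_variance


# Toy block: 6 samples x 4 features, invented values (not real data).
toy_genotypes = ["A", "A", "A", "B", "B", "B"]
toy_block = [
    [5.0, 1.0, 0.2, 0.1],
    [5.2, 0.9, 0.1, 0.2],
    [4.9, 1.1, 0.3, 0.1],
    [1.0, 1.0, 0.2, 0.2],
    [1.1, 0.9, 0.1, 0.1],
    [0.9, 1.1, 0.3, 0.2],
]
scores, explained = first_pc_scores(toy_block)
for genotype, score in zip(toy_genotypes, scores):
    print(genotype, round(score, 2))
print("fraction of variance on PC1:", round(explained, 3))
```

You would run this once per block (RNAseq, then metabolomics) and look at whether samples separate by genotype along the leading components. If they do not, that is the situation in which P-integration may be unnecessary.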

I hope this is useful and feel free to post in this forum if you have further questions!
Cheers,
Eva

Julien · December 27, 2024, 2:52pm

Hi Eva,

Thank you very much for your very informative response.
I decided to normalize the inoculated group by the mock inoculated group in order to obtain the metabolic and transcriptomic response to the infection.
I am now trying to use DIABLO to perform the integration.
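
For readers wondering what such a normalization could look like, below is one possible reading, sketched in plain Python. The post does not say how the normalization was actually done, so everything here is an assumption: the function name, the "mock"/"inoculated" labels, the pseudocount, and the choice to subtract the mean log2 value of the mock samples of the same genotype from each inoculated sample. The toy numbers are invented.

```python
import math
from collections import defaultdict


def log2_response_to_infection(values, genotypes, treatments, pseudocount=1.0):
    """Return a list of (sample_index, genotype, response_row) tuples.

    For every inoculated sample, each feature is expressed as
    log2(value + pseudocount) minus the mean of the same quantity over
    the mock samples of the same genotype (an assumed design, see above).
    """
    logged = [[math.log2(v + pseudocount) for v in row] for row in values]

    # Collect the mock samples of each genotype.
    mock_rows = defaultdict(list)
    for row, genotype, treatment in zip(logged, genotypes, treatments):
        if treatment == "mock":
            mock_rows[genotype].append(row)

    # Per-genotype baseline: feature-wise mean over the mock samples.
    baseline = {g: [sum(col) / len(rows) for col in zip(*rows)]
                for g, rows in mock_rows.items()}

    responses = []
    for i, (row, genotype, treatment) in enumerate(zip(logged, genotypes, treatments)):
        if treatment != "inoculated":
            continue
        if genotype not in baseline:
            raise ValueError(f"no mock samples for genotype {genotype!r}")
        responses.append((i, genotype, [x - b for x, b in zip(row, baseline[genotype])]))
    return responses


# Toy data: 5 samples x 2 features (invented values).
toy_values = [[3, 7], [3, 7], [15, 7], [1, 1], [0, 3]]
toy_genotypes = ["G1", "G1", "G1", "G2", "G2"]
toy_treatments = ["mock", "mock", "inoculated", "mock", "inoculated"]
for index, genotype, response in log2_response_to_infection(
        toy_values, toy_genotypes, toy_treatments):
    print(index, genotype, response)
```

The resulting response matrices (one per omics block, same inoculated samples in the same order) are what would then be passed on to the integration step.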

Thanks again,

Julien
