Choosing the right analysis for my scientific question

Hello,
I am exploring mixOmics in order to integrate my metabolomics data (~1000 features) and RNAseq data (~2000 genes). I have 19 samples from 3 different genotypes and 2 different virus inoculation parameters (inoculated and mock inoculated).
My question is: which method should I use to identify genes and metabolites that could explain the differences in susceptibility to the virus between those 3 genotypes?

Thank you in advance for your help,

Julien

Hi @Julien,

Thanks for your interest in using mixOmics! Have a look at our webpage, which has a decision tree that can help you choose a method. We can break your analysis down into its different components:

  1. You have two ‘omics’, also called ‘data blocks’ (metabolomics and RNAseq). This means you will need to perform N-integration. You can see on the page I linked that this type of analysis is useful for answering the question: “Which variables across the different omics data sets discriminate the different outcomes?”

  2. You have an outcome variable, which is susceptibility to the virus. If this output is continuous, you will be looking at a regression method, whereas if your output is categorical (e.g. yes/no or low/mid/highly susceptible), you will need to use a classification method. Have a look at the decision tree on the webpage I linked for more info on that.

  3. The final aspect to consider is the grouping of your samples. If you would like to identify which genes and metabolites explain differences in virus susceptibility regardless of genotype, you may want to use P-integration, which integrates samples across groups (also called ‘studies’; here, your three genotypes) and can be used to answer questions like: “Which variables are discriminative across all studies?”
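
The three decisions above can be summarised in a few lines of code. This is only a sketch in plain Python, not part of mixOmics: `suggest_method` is a hypothetical helper, and the mapping it encodes is one reading of how the sparse mixOmics functions are named (`block.` for N-integration, `mint.` for P-integration, `splsda` for classification, `spls` for regression). The decision tree on the webpage remains the reference.

```python
def suggest_method(n_blocks, outcome, n_studies=1):
    """Hypothetical helper: map the three questions above to the name of a
    sparse mixOmics function (an assumption based on mixOmics naming)."""
    if outcome not in ("categorical", "continuous"):
        raise ValueError("outcome must be 'categorical' or 'continuous'")
    if n_blocks < 1 or n_studies < 1:
        raise ValueError("n_blocks and n_studies must be at least 1")

    parts = []
    if n_studies > 1:
        parts.append("mint")   # P-integration across sample groups ('studies')
    if n_blocks > 1:
        parts.append("block")  # N-integration across omics blocks
    # Classification for a categorical outcome, regression for a continuous one
    parts.append("splsda" if outcome == "categorical" else "spls")
    return ".".join(parts)


# Two omics blocks, a categorical susceptibility outcome, genotypes ignored
two_blocks = suggest_method(n_blocks=2, outcome="categorical")
# Same, but treating the three genotypes as 'studies'
with_genotypes = suggest_method(n_blocks=2, outcome="categorical", n_studies=3)
print(two_blocks, with_genotypes)
```

The helper only covers supervised analyses with an outcome variable. Unsupervised exploration, such as PCA, is outside its scope.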

Your other sample grouping is inoculated versus mock inoculated. Personally, I would not include the mock-inoculated samples in this analysis, as you would expect none of them to be infected. I would think of them more as a control to confirm that your experiment worked correctly.

As you can see, there are a few things to consider when choosing which mixOmics method to use, namely: integrating omics across the same samples, integrating different sample groups, and what your outcome variable looks like. These methods can be combined, e.g. we have functionality to run N- and P-integration simultaneously. However, as described in our case studies, we recommend breaking the data down into simpler problems in the first instance. For example, you could use mixOmics to run a simple PCA on your RNAseq and metabolite data separately. If you find that your genotypes do not strongly influence gene expression or metabolites, you might not have to run P-integration.
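
To make the "PCA on each block first" idea concrete, here is a minimal sketch in plain Python. It is not the mixOmics implementation, and all names and numbers in it are made up for illustration. It standardises one data block, computes the sample scores on the first principal component by power iteration, and leaves you to compare those scores between genotypes. In practice you would run the PCA function in mixOmics on each real block and inspect the sample plot.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit variance (constant columns become 0)."""
    n, p = len(rows), len(rows[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return out


def first_pc_scores(rows, n_iter=200):
    """Sample scores on the first principal component.

    Uses power iteration on the sample-by-sample Gram matrix X X^T, which
    stays small when there are few samples and many features.
    """
    x = standardize(rows)
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    u = [float(i + 1) for i in range(n)]  # fixed, arbitrary starting vector
    for _ in range(n_iter):
        v = [sum(gram[i][k] * u[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0:  # no variance at all in this block
            return [0.0] * n
        u = [c / norm for c in v]
    eigenvalue = sum(u[i] * sum(gram[i][k] * u[k] for k in range(n))
                     for i in range(n))
    # Scores are X v1 = sigma1 * u1, where sigma1^2 is the top eigenvalue
    return [math.sqrt(eigenvalue) * c for c in u]


# Toy block (made-up numbers): 6 samples x 3 features, two genotypes
genotype = ["A", "A", "A", "B", "B", "B"]
toy_block = [
    [1.0, 2.0, 0.5], [1.2, 1.9, 0.7], [0.9, 2.1, 0.4],
    [3.0, 0.5, 0.6], [3.1, 0.4, 0.5], [2.9, 0.6, 0.8],
]
scores = first_pc_scores(toy_block)
for g, s in zip(genotype, scores):
    print(g, round(s, 2))
```

If the scores of one genotype sit clearly apart from the others on the first component, genotype is a major source of variation in that block. If they overlap, genotype may matter less. The sign of a principal component is arbitrary, so only the separation between groups is meaningful, not which side each group falls on.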

I hope this is useful and feel free to post in this forum if you have further questions!
Cheers,
Eva