Question on integrating Microbiome and Metabolomics Data

Hi everyone! :grinning:

I just followed the tutorial provided by the mixOmics team and I am still confused about two issues.

  1. In the "Pre-processing OTU table" section (I use an ASV table generated by DADA2, but I think it is the same type of data), the tutorial uses TSS (relative abundance) as the normalisation method.
    Generally, "rarefy" (subsampling an OTU table to a fixed number of reads per sample using random subsampling without replacement) has been recommended by both USEARCH v11 and QIIME 2.
    The USEARCH author, Robert Edgar, even said it is the best strategy for microbiome datasets, since it preserves the shape of the abundance distribution in each sample more accurately than the systematic rounding used in the obsolete otutab_norm (CSS method) command.

    So I want to know whether rarefaction can be used as a pre-processing method for mixMC.
    If yes, how should I handle the rarefied OTU/ASV table next? Should I follow it with a TSS normalisation, or with some other log-ratio transformation? :thinking:

  2. I want to integrate my microbiome data with a metabolomics dataset that contains the volatile organic compound (VOC) and organic acid (OA) concentrations in each sample, measured by GC-MS. I think I should use PLS/sPLS, am I right? :joy:
    Also, the VOC and OA concentrations have already been calculated and converted to the same unit (such as mg/g). Do I still need to normalise them? :face_with_raised_eyebrow:

Hope somebody can give me some advice! Thank you!:joy:
Sincerely,
Sixvable

Hi @sixvable,
Thank you for your interest in mixOmics, and apologies for the post-holiday answer.

We generally do not recommend rarefaction because we share the view of this paper: https://www.jstatsoft.org/article/view/v023i12, although depending on the type of data you deal with (e.g. swabs with little material) we acknowledge you may have to! Rarefaction is not implemented in the package, but you can run your rarefaction step outside the package first and obtain an OTU count table.
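For readers who do need to rarefy outside the package, here is a rough sketch of what the step does for a single sample: random subsampling without replacement down to a fixed depth. It is plain Python for illustration only, not mixOmics code; the function name `rarefy`, the example counts, the depth and the seed are all made up for this example. In practice you would use the rarefaction tool from your own pipeline.

```python
import random

def rarefy(counts, depth, seed=None):
    """Subsample one sample's counts to `depth` reads, without replacement."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the sample's total read count")
    rng = random.Random(seed)
    # Expand the counts into one entry per read, labelled by taxon index.
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    # Draw `depth` reads without replacement.
    kept = rng.sample(reads, depth)
    # Tally the kept reads back into a count vector.
    rarefied = [0] * len(counts)
    for i in kept:
        rarefied[i] += 1
    return rarefied

sample = [120, 30, 0, 850, 5]  # hypothetical ASV counts for one sample
rarefied = rarefy(sample, depth=500, seed=1)
print(rarefied, sum(rarefied))
```

The output is still a table of counts, so it can be passed to the same downstream transformation as an unrarefied table.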

Also note (we will correct this on the website) that there is no need for a TSS transformation if a CLR transformation follows (and we definitely recommend a CLR transformation for the microbiome data). In the PCA function the CLR transformation is applied via the argument logratio = 'CLR', but we have not implemented it yet for PLS, so for PLS you would need to apply our external function logratio.transfo() first.
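To see why TSS is redundant before CLR: CLR subtracts the mean of the logs within each sample, so dividing all counts by a constant (the sample total) cancels out. The sketch below checks this numerically in plain Python. It is only an illustration of the arithmetic, not the package's implementation; the helper names `tss` and `clr`, the example counts and the pseudocount of 1 used to avoid log(0) are assumptions made for this example.

```python
import math

def tss(counts):
    """Total sum scaling: convert counts to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(values):
    """Centred log-ratio: log of each value minus the mean of the logs."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

raw = [120, 30, 0, 850, 5]      # hypothetical ASV counts for one sample
shifted = [c + 1 for c in raw]  # pseudocount of 1 (an assumption) to avoid log(0)

clr_from_counts = clr(shifted)
clr_from_tss = clr(tss(shifted))

# The two results agree up to floating-point error.
max_diff = max(abs(a - b) for a, b in zip(clr_from_counts, clr_from_tss))
print(max_diff)
```

Note that the equivalence holds when the offset for zeros is added before the scaling, as here; adding the same offset to counts and to proportions would not give identical results.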

To integrate two data sets you have several choices. Assume X = microbiome, Y = metabolome and y = disease outcome.
spls(X, Y) selects the X and Y variables that covary most strongly.
rcc(X, Y) does not perform variable selection but tells you how much agreement there is between the two data sets.
block.splsda(X = list(X, Y), Y = y) works similarly to spls while also focusing on discriminating the sample groups, if applicable.

We always recommend that you normalise each data type with a method of your choice, so check which normalisation method is considered best practice in the metabolomics field.