Pre-processing steps for DIABLO analysis

I have a question about pre-processing steps for different omics datasets. I have metabolome, microbiome, and clinical marker (derived from blood) datasets. They were collected in a clinical trial with two time points and a treatment and a control group.

I have calculated the clr-transformed change for the metabolome and microbiome datasets, to be used as input to DIABLO to predict the group (treatment or control).

I am confused about the pre-processing steps for the clinical markers. Logically, I would prefer a log fold change, to be consistent with the other datasets; however, the general norm for clinical markers is to use the absolute change. Is it OK to use the absolute change for the clinical markers and the log fold change for the metabolome and microbiome as input to DIABLO?

NOTE: The clinical markers are already approximately normally distributed without a log transformation. We also compared both options for the clinical markers (log fold change and absolute change), and the results differ: DIABLO selected similar clinical markers, but the selected markers in the metabolome dataset are very different.

Thank you

hi @Aakash,

Yes, different data transformations are likely to lead to different results.
I would:

  • log-transform (or otherwise normalise) the metabolomics data; a CLR is not really appropriate there
  • CLR transform the microbiome data
  • put the clinical markers as is (or with a log transformation).
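
The three recommendations above can be sketched as follows. This is only a minimal illustration in plain Python, not mixOmics code: the function names, the toy values, the `offset` added before the log, and the `pseudocount` used to avoid log(0) in the CLR are all assumptions made for the example and should be adapted to your own data.

```python
import math

def log_transform(row, offset=1.0):
    """Natural log of each value; `offset` (an assumption) guards against zeros."""
    return [math.log(x + offset) for x in row]

def clr_transform(row, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean log of the sample.

    `pseudocount` (an assumption) is added to every count so zeros can be logged.
    """
    logs = [math.log(x + pseudocount) for x in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Toy data (hypothetical values): rows are samples, columns are features.
metabolites = [[120.0, 15.0, 980.0], [95.0, 22.0, 1100.0]]
counts = [[10, 0, 90], [25, 5, 70]]
clinical = [[5.4, 130.0], [6.1, 142.0]]

metabolites_log = [log_transform(r) for r in metabolites]  # log, not CLR
microbiome_clr = [clr_transform(r) for r in counts]        # CLR per sample
clinical_block = clinical                                  # left as is
```

Each CLR-transformed sample sums to zero by construction, which is a quick sanity check that the transformation was applied per sample rather than per feature.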

I would not use a fold change. Instead, have a look at the multilevel analysis (basically an extra transformation step) to take the two time points into account: Multilevel | mixOmics. You can also use the function withinVariation() if you want to apply this transformation yourself before running DIABLO, as we don't currently have it implemented in DIABLO for integrating more than 2 data sets.
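
To make the idea behind the multilevel step concrete, here is a minimal sketch of a within-subject decomposition for a single repeated-measures factor: each sample has its subject's mean (over that subject's time points) subtracted, feature by feature. It is written in plain Python for illustration and is not the implementation of withinVariation(); the function name `within_variation`, the subject labels, and the toy values are assumptions.

```python
def within_variation(X, subject_ids):
    """Subtract each subject's per-feature mean from that subject's samples.

    X is a list of samples (rows); subject_ids gives the subject of each row.
    """
    n_features = len(X[0])
    sums, n_rows = {}, {}
    for row, sid in zip(X, subject_ids):
        acc = sums.setdefault(sid, [0.0] * n_features)
        for j, value in enumerate(row):
            acc[j] += value
        n_rows[sid] = n_rows.get(sid, 0) + 1
    return [
        [value - sums[sid][j] / n_rows[sid] for j, value in enumerate(row)]
        for row, sid in zip(X, subject_ids)
    ]

# Toy data (hypothetical): two subjects, each measured at two time points.
X = [[1.0, 10.0], [3.0, 14.0], [2.0, 20.0], [6.0, 20.0]]
subjects = ["s1", "s1", "s2", "s2"]
Xw = within_variation(X, subjects)
```

With exactly two time points per subject, each within-subject value is plus or minus half of that subject's change between the time points, so between-subject baseline differences are removed while both samples per subject are kept. The transformation would be applied to each block separately, after the log or CLR step.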