Differential abundance and alpha-diversity after PLSDA-Batch correction

Hello,

I have used the PLSDA-batch package to perform batch correction on two datasets that were sequenced using different Illumina chemistries. After running the pipeline, the final output is a matrix with CLR-transformed and batch-corrected data.
My understanding is that differential abundance analysis with packages like ALDEx2 and ANCOM (which seem to be the most recommended) typically requires a count table. Given that my data are now CLR-transformed, I’m unsure how to proceed.
Is there a way to transform the CLR data back to positive integer counts? If so, is such a transformation even recommended for this type of analysis?
What would be the best approach for performing differential abundance analysis on this dataset? Should I consider using the mixOmics package to identify discriminative variables between treatments, as a possible alternative to ALDEx2/ANCOM in this context?

How could I visualize compositional data and alpha diversity using this batch-corrected CLR-transformed data? I noticed that in the paper that introduced me to PLSDA-batch, the authors did not use the batch-corrected data for compositional plots or alpha diversity calculations but rather the data prior to correction. Would you recommend a similar approach?

I am sorry if my question is outside the scope of the forum, but I have been struggling with these issues for some time now, so any guidance or suggestions would be greatly appreciated!

Best regards,

Adriana

Hello Adriana,

Thank you for your interest in our package. CLR values cannot be converted back to counts directly, because the centering step removes each sample's overall scale. If you strongly wish to transform back to counts, the only feasible approach is to use a standard log transformation instead of CLR (Centered Log Ratio) before batch correction, so that the exponential function can reverse the transformation afterwards. However, with a plain log transformation, differences between samples due to library size are only partly reduced by PLSDA-batch, not entirely removed. For that reason, I would not particularly recommend this route.
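To illustrate why CLR cannot be reversed to counts while a plain log can, here is a minimal Python sketch using only the standard library. The function names and toy samples are hypothetical, and the demo uses no pseudocount because the toy counts contain no zeros; real data would need one.

```python
import math

def clr(counts, pseudocount=0.0):
    """Centered log-ratio: log of each value minus the mean of the logs."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def clr_to_proportions(clr_values):
    """Invert CLR up to scale: exponentiate, then renormalise to sum to 1."""
    exps = [math.exp(v) for v in clr_values]
    total = sum(exps)
    return [e / total for e in exps]

def log_transform(counts, pseudocount=0.0):
    """Plain log transform (use a pseudocount if zeros are present)."""
    return [math.log(c + pseudocount) for c in counts]

def log_to_counts(log_values, pseudocount=0.0):
    """Exact inverse of log_transform."""
    return [math.exp(v) - pseudocount for v in log_values]

# Hypothetical toy samples: same composition, 10x difference in depth
sample_a = [10, 40, 50]
sample_b = [100, 400, 500]

# CLR gives the same values for both samples, so the depth is lost
clr_a, clr_b = clr(sample_a), clr(sample_b)

# From CLR we can only recover proportions, not the original counts
props_a = clr_to_proportions(clr_a)

# A plain log transform keeps the depth and can be reversed exactly
recovered_b = log_to_counts(log_transform(sample_b))
```

Note that after batch correction the back-transformed values would generally not be integers, so tools that expect raw counts may still not accept them.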

If your goal is to perform differential abundance analysis after batch effect correction, a simpler approach would be to fit a linear regression or a linear mixed model to each taxon in the CLR-transformed, batch-effect-corrected data. Additionally, we recommend using a multivariate approach, such as PLS-DA (from the mixOmics package), to identify microbial signatures, i.e. groups of microbes that together discriminate between treatments. Since microbes often work cooperatively, multivariate methods are generally more appropriate.
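As a rough illustration of the per-taxon regression idea, here is a minimal Python sketch using only the standard library. It fits an ordinary least-squares line of each taxon's corrected CLR value against a binary treatment indicator and ranks taxa by the absolute t-statistic of the slope. The data, taxon names and function name are hypothetical. P-values, multiple-testing correction and mixed models are not shown and would need a statistics library.

```python
import math
from statistics import mean

def ols_slope_t(x, y):
    """Fit y = a + b*x by least squares; return (b, t-statistic of b)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, with n - 2 degrees of freedom
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se

# Hypothetical toy data: 0 = control, 1 = treated, one list per taxon
treatment = [0, 0, 0, 1, 1, 1]
corrected_clr = {
    "taxon_A": [0.1, 0.3, 0.2, 1.1, 1.3, 1.2],
    "taxon_B": [-0.5, -0.4, -0.6, -0.5, -0.6, -0.4],
}

# One regression per taxon on the batch-corrected CLR values
results = {taxon: ols_slope_t(treatment, values)
           for taxon, values in corrected_clr.items()}

# Rank taxa by the strength of the treatment effect
ranked = sorted(results, key=lambda t: abs(results[t][1]), reverse=True)
```

With a binary treatment, the slope is simply the difference in mean CLR value between the two groups.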

I have not conducted any alpha diversity analysis following batch effect correction myself. If you must do so, it may be better to work on counts, which you could try to obtain with the log-transformation approach described above.
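For reference, common alpha diversity indices are defined on non-negative counts or proportions, which is why CLR values (which can be negative) cannot be used directly. A minimal Python sketch of two such indices, with hypothetical function names and toy inputs:

```python
import math

def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)

def observed_richness(counts):
    """Number of taxa with a non-zero count in the sample."""
    return sum(1 for c in counts if c > 0)

# Hypothetical toy samples
even_sample = [25, 25, 25, 25]   # four equally abundant taxa
single_taxon = [100, 0, 0]       # one dominant taxon

h_even = shannon(even_sample)
h_single = shannon(single_taxon)
```

Whether these indices are computed on the counts before correction or on back-transformed values is a judgment call that this sketch does not settle.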

I hope this helps.

Best regards,
Eva

Hello Eva,
Thank you for the clarification and suggestion.
Best,
Adriana