Microbiome log transformation

Hi team,

I am trying to run PLS on my microbiome and cognition data. I passed the microbiome data to PLS after a log transformation.

I used the following code:
microbiome.pls <- pls(Microbiomelog, Microbiome_IT, ncomp = 10, mode = "regression")

However, I got the following error message.

Error in Check.entry.pls(X, Y, ncomp, keepX, keepY, mode = mode, scale = scale, : ‘X’ must be a numeric matrix.

I tried converting the matrix to numeric, but that did not work. Could you please help with this?

Many thanks!

Hi Mrudhula,
For microbiome data, we advise performing a centered log ratio (CLR) transformation, following the steps described here: http://mixomics.org/mixmc/pre-processing/ (i.e. OTU filtering, offset, then CLR transformation). The error may occur because the log transform of zeroes produces infinite values.
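
To illustrate the order of those steps, here is a minimal base R sketch on a made-up count table. The helper `keep_otus`, its 0.5% cutoff and the toy values are assumptions for illustration only; they are not the functions or thresholds used on the mixOmics website, and the CLR is written out by hand rather than calling logratio.transfo.

```r
# Toy count table: 4 samples (rows) x 5 OTUs (columns); values are made up.
counts <- matrix(c(10, 0, 3, 0, 200,
                    8, 1, 0, 0, 150,
                    0, 0, 5, 1, 300,
                   12, 2, 4, 0, 250),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("S", 1:4), paste0("OTU", 1:5)))

# A plain log of data containing zeroes produces -Inf values.
n_inf_plain <- sum(is.infinite(log(counts)))   # 7, one per zero count

# Step 1 (filtering): keep OTUs whose share of the total counts exceeds
# a cutoff. Hypothetical helper; the 0.5% cutoff only suits this toy table.
keep_otus <- function(x, percent = 0.5) {
  share <- 100 * colSums(x) / sum(x)
  x[, share > percent, drop = FALSE]
}
filtered <- keep_otus(counts)

# Step 2 (offset): add 1 so that no zero remains.
offset <- filtered + 1

# Step 3 (CLR): log of each value minus the mean log of its sample.
clr <- log(offset) - rowMeans(log(offset))

n_inf_clr <- sum(is.infinite(clr))             # 0, all values are finite
```

The offset is added after filtering so that rare OTUs are judged on their observed counts rather than on the added 1.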

Hi Kimanh,

Thank you for the response.

Yes, I performed OTU filtering, offset addition, TSS and then CLR before running PLS. However, after the CLR, I ran rowSums to check whether the variables within each sample sum to 0. They did not sum to 0 after CLR. Do you think this could be the problem? Also, my data are in percentage format, so I divided the whole dataset by 100 (to get relative abundances as proportions).

Thanks heaps!

Hi Mrudhula,

If you apply a TSS transformation and then run rowSums, each sample sums to 1.
But the CLR moves the data out of the simplex (simplex = when the sum of your data = 1), so the samples no longer sum to 1. By construction the CLR values of a sample sum to 0 only up to floating-point rounding, so tiny non-zero row sums are to be expected.
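
As a quick check of these row sums, here is a minimal base R sketch on a made-up count table (the values and object names are illustrative, not taken from diverse.16S). It assumes the usual definition of the CLR, the log of each value minus the mean log of its sample, rather than calling logratio.transfo. With that definition the CLR values of a sample sum to 0 only up to rounding error, so a row sum on the order of 1e-15 is effectively zero.

```r
# Toy counts with an offset of 1 already added (made-up values).
counts <- matrix(c(1, 2, 1, 5,
                   4, 1, 1, 1,
                   1, 1, 3, 9),
                 nrow = 3, byrow = TRUE)

# TSS: divide each sample (row) by its total.
tss <- counts / rowSums(counts)

# CLR: log of each value minus the mean log of its sample.
clr <- log(tss) - rowMeans(log(tss))

rowSums(tss)   # every sample sums to 1
rowSums(clr)   # every sample sums to 0, up to rounding error
```

A row sum that is clearly different from 0 would suggest that the sums were taken over a subset of the columns, or that a different transformation was applied.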

You should start from the raw counts, then filter and transform your data; you should not start from percentages. Have a look at the examples on our website to see what the data look like after transformation:

> library(mixOmics)
> data("diverse.16S")
> diverse.16S$data.raw[1:5,1:5]
      OTU_97.10 OTU_97.10029 OTU_97.101 OTU_97.1010 OTU_97.10101
700015293         1            2          1           1            5
700015227         1            1          1           1            2
700105879         1            1          1           1            1
700037087         4            1          1           1            1
700105616         1            1          1           1            1

(note that the raw data here all have an offset of 1 to deal with the zeroes)
TSS data:

> diverse.16S$data.TSS[1:5,1:5]
             OTU_97.10 OTU_97.10029   OTU_97.101  OTU_97.1010 OTU_97.10101
700015293 0.0001613163 0.0003226327 0.0001613163 0.0001613163 0.0008065817
700015227 0.0005359057 0.0005359057 0.0005359057 0.0005359057 0.0010718114
700105879 0.0003362475 0.0003362475 0.0003362475 0.0003362475 0.0003362475
700037087 0.0010416667 0.0002604167 0.0002604167 0.0002604167 0.0002604167
700105616 0.0005341880 0.0005341880 0.0005341880 0.0005341880 0.0005341880

CLR data after TSS:

> data.CLR = logratio.transfo(diverse.16S$data.TSS, logratio = 'CLR')
> data.CLR[1:5,1:5]
            OTU_97.10 OTU_97.10029  OTU_97.101 OTU_97.1010 OTU_97.10101
700015293 -0.35096366   0.34218352 -0.35096366 -0.35096366   1.25847425
700015227 -0.05537786  -0.05537786 -0.05537786 -0.05537786   0.63776932
700105879 -0.11700281  -0.11700281 -0.11700281 -0.11700281  -0.11700281
700037087  1.25158619  -0.13470817 -0.13470817 -0.13470817  -0.13470817
700105616 -0.02803651  -0.02803651 -0.02803651 -0.02803651  -0.02803651

After that we carry on with the analysis. Note that most of our functions have a logratio argument, which means you can skip the explicit CLR step, but this is really what happens internally: the data analysed by our functions are CLR data. There is no other transformation for interpretation or plots.

Hi @Mrudhula, can you please let us know what you get when you run the following code:

class(Microbiomelog)
dim(Microbiomelog)
is.numeric(Microbiomelog)    
sum(is.infinite(Microbiomelog))
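
If is.numeric() returns FALSE even though class() reports a matrix, one possible cause is that the values were read in as text. The sketch below uses a made-up table (the object names `raw`, `X_bad` and `X_ok` are hypothetical) to show how that situation looks and one way to convert the entries; it is not a diagnosis of your data.

```r
# Hypothetical example: a table whose numbers were read in as text.
raw <- data.frame(OTU1 = c("1", "4", "2"),
                  OTU2 = c("3", "1", "7"),
                  row.names = c("S1", "S2", "S3"),
                  stringsAsFactors = FALSE)

# as.matrix() on all-character columns gives a character matrix.
X_bad <- as.matrix(raw)
is.numeric(X_bad)        # FALSE

# Convert the entries while keeping the dimensions and dimnames.
X_ok <- X_bad
storage.mode(X_ok) <- "double"

is.numeric(X_ok)         # TRUE
sum(!is.finite(X_ok))    # counts NA, NaN, Inf and -Inf together
```

Entries that cannot be parsed as numbers become NA with a warning, which is why the last line counts non-finite values rather than only infinite ones.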

Hi,

Class is matrix

dim- 200 768

is.numeric- TRUE

sum(is.infinite)- 0

Thank you

Hi @Mrudhula,
Based on your answers, the data you have entered seem ‘ok’, but this depends on the number of zero values in the data set. This is why we recommend you go through the processing steps I described earlier, and then apply either a PCA or a PLS-DA, before you embark on the data integration with PLS.
If a data entry error pops up again, let us know and we will be able to help.
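
To get a feel for how many zeroes a count table contains before filtering, a minimal base R sketch such as the following can help. The table is made up and the object names are hypothetical; on real data the same two lines would be run on the raw count matrix.

```r
# Toy count table: 4 samples (rows) x 5 OTUs (columns); values are made up.
counts <- matrix(c(10, 0, 3, 0, 200,
                    8, 1, 0, 0, 150,
                    0, 0, 5, 1, 300,
                   12, 2, 4, 0, 250),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("S", 1:4), paste0("OTU", 1:5)))

# Overall fraction of zero entries in the table.
zero_fraction <- mean(counts == 0)

# Fraction of samples in which each OTU has a zero count.
zero_per_otu <- colMeans(counts == 0)
```

OTUs that are zero in most samples are the ones the filtering step is meant to remove before the offset and CLR are applied.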