I have been reading the mixOmics book (I am up to chapter 8) to prepare for analysing a dataset of mine. There are some questions about sample processing that I could not resolve, which is why I am asking for your help.
I understand that, in order to run the mixOmics functions, my data must be normalized, centered, and scaled (from what I have read, these are three different processes).
I have two sets of data, obtained from an experiment on rats that we have done in the lab:
- A table of Microbiota abundance at the Family level (coming from 16S sequencing).
Here, we were given a very clear pre-processing protocol:
First question: is the CLR transformation of the 16S data just a step prior to normalization, centering, and scaling? Or does this protocol leave the data ready to use?
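To make sure I am describing the same thing, here is a minimal sketch of what I understand the CLR (centered log-ratio) transformation to be, written in plain Python only as an illustration. The counts and the pseudocount of 1 are made up by me and are not part of the protocol we were given.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample (a list of abundances)."""
    # The pseudocount is an assumption: it only keeps zeros from breaking log().
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log equals dividing by the geometric mean.
    return [v - mean_log for v in logs]

# Hypothetical family-level counts for a single rat.
sample = [120, 30, 0, 850]
clr_values = clr(sample)
```

If I read this correctly, CLR centers the values within each sample (each row), which seems different from centering each variable (each column), hence my question.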
- Metabolomics tables from the Metabolon, Inc. service.
Metabolon provided me with different versions of the metabolomics data: Peak Area Data, Batch-normalized Data, Batch-norm Imputed Data, Log Transformed Data.
Originally, I was planning to use one of the last two tables and then center and scale it. However, I ran into an issue that I don’t know how to resolve.
Not all of the rats that were profiled for microbiota were also profiled for metabolomics, and vice versa. That is, there are a few samples that I need to eliminate. Therefore, if I remove a few subjects from the Metabolon data, should I perform the normalization again (without the rats I eliminate)?
In that case, I have searched several sites and did not find a clear normalization protocol for beginners that I can apply. Does anyone have a protocol in R?
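For the sample matching mentioned above, this is roughly the step I have in mind, sketched in plain Python with hypothetical rat IDs and values: keep only the rats present in both tables, in the same order in each.

```python
# Hypothetical data: sample ID -> list of measured values.
microbiota = {"rat01": [0.2, 1.1], "rat02": [0.5, 0.9], "rat03": [0.1, 1.4]}
metabolomics = {"rat02": [3.0, 7.5], "rat03": [2.8, 6.9], "rat04": [3.3, 7.1]}

# Keep only the rats measured in both datasets, in a fixed order.
shared_ids = sorted(set(microbiota) & set(metabolomics))

# Row i of X and row i of Y now refer to the same rat.
X = [microbiota[rat] for rat in shared_ids]
Y = [metabolomics[rat] for rat in shared_ids]
```

My doubt is whether any normalization that uses information across samples has to be recomputed after this subsetting.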
Also, I want to know whether the normalization alone is enough. Can the centering and scaling be done automatically when I run PLS, or should I do it before I run the script?
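By centering and scaling I mean the following per-variable operation, sketched here in plain Python on made-up numbers: subtract each column's mean and divide by its sample standard deviation.

```python
from statistics import mean, stdev

def center_and_scale(rows):
    """Center each column to mean 0 and scale it to sample SD 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # stdev() uses the n - 1 denominator; a constant column would give SD 0
    # and a division by zero, so such columns should be removed beforehand.
    sds = [stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Hypothetical table: 3 samples (rows) x 2 variables (columns).
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
scaled = center_and_scale(data)
```

What I cannot tell is whether I need to run something like this myself or whether the PLS function already takes care of it.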
Thank you very much!