Hello, and thank you for your great work! I have enjoyed using mixOmics and, more recently, reading the book as well.
Previously I have used mixOmics DIABLO to analyse data similar to those in the tutorial (RNA-seq, methylation array, microbiome), but I now have a project involving mice with different types of measurements, such as immunohistochemistry, short-chain fatty acid (SCFA) profiles, qPCR, and behavioural assessments, each with fewer than a dozen variables. In addition, we also have more “traditional” high-dimensional omics measurements from the microbiome (16S) and the metabolome (HPLC-MS). The mice are separated into two groups, case and control.
My question is: how should I go about incorporating the immunohistochemistry, SCFA, qPCR, and behavioural measurements into the analysis? Would DIABLO be suitable, i.e. would they be considered different omics?
If so, would they all go in separate blocks, or could they (or some of them) be grouped into one block, since they don’t have that many variables?
Also, how should I go about normalizing these unconventional omics? Is it enough to simply center and scale them, as DIABLO does? When the lab analyzes them separately (t-tests, ANOVA), that is all they do.
Thank you for the help!
Hi again, and sorry to spam you. I hope you get a chance to look into this soon. If it helps, I can refine the question a bit:
Can I combine data measured with different technologies (e.g. RT-qPCR and flow cytometry with gas chromatography) into one omics block for DIABLO?
I recall that the values within an individual omics block should be on a similar scale, but I can’t find the reference for this. If the measurements can be combined, should I normalize them individually first?
I also looked into the preprocessing a bit more and found that our RT-qPCR data are already normalized: they are centered using the control group, i.e. x - mean(x_ctrl).
Should I use the raw RT-qPCR values instead and let mixOmics do the centering and scaling?
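To make the two options concrete, here is a minimal Python/NumPy sketch contrasting the control-based centering described above (x - mean(x_ctrl)) with overall per-variable centering and unit-variance scaling of the kind mixOmics applies. All values here are hypothetical, purely for illustration:

```python
import numpy as np

# Hypothetical RT-qPCR values for 6 mice:
# the first 3 are controls, the last 3 are cases.
x = np.array([1.0, 1.2, 0.8, 2.0, 2.4, 2.2])
is_ctrl = np.array([True, True, True, False, False, False])

# Option 1: pre-centering against the control group: x - mean(x_ctrl).
# The control values end up centered around 0; the cases keep their offset.
x_ctrl_centered = x - x[is_ctrl].mean()

# Option 2: per-variable centering and scaling over all samples,
# as PLS-based methods do by default (overall mean 0, unit variance).
x_scaled = (x - x.mean()) / x.std(ddof=1)

print(x_ctrl_centered)
print(x_scaled)
```

The two transforms differ only in which mean is subtracted (control-group mean vs. overall mean) and whether a variance rescaling is applied afterwards.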
As for the flow cytometry and gas chromatography data, the values are given as proportions, so I assume the correct normalization approach would be the centered log-ratio (CLR)?
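For reference, the CLR transform on proportions can be sketched as follows (Python/NumPy, with hypothetical values; the small pseudocount is an assumption added to cope with zero proportions):

```python
import numpy as np

# Hypothetical proportions for one sample (rows would be samples).
p = np.array([0.5, 0.3, 0.15, 0.05])

# A small pseudocount guards against zeros before taking logs
# (a proportion of exactly 0 would otherwise give -inf).
eps = 1e-6
logp = np.log(p + eps)

# CLR: log of each proportion minus the log of the geometric mean,
# i.e. centre the log-values so they sum to zero per sample.
clr = logp - logp.mean()

print(clr)  # sums to ~0 by construction
```

Subtracting the mean of the logs is the same as dividing by the geometric mean before taking logs, which is the usual textbook form of the CLR.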
Many thanks!
Hi @karoliinas,
(we / I only answer on Fridays when I have some time)
You should consider each data type as a separate dataset (e.g. immunohistochemistry, SCFA, qPCR, and behavioural measurements = 4 datasets) and normalise them individually first.
The normalisation itself is outside the scope of the package, so I don’t have specific recommendations. Based on your previous single-dataset analyses, you should be able to tell whether a normalisation is appropriate. (And yes, CLR is good for proportions; see also mixMC Preprocessing | mixOmics, although there we start from count data.)
In most methods we center and scale per variable (optional in PCA, but on by default in the PLS-based methods).
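As a small illustration of this per-variable centering and scaling (a Python/NumPy sketch with toy values, not package code):

```python
import numpy as np

# Toy data matrix: 5 samples (rows) x 3 variables (columns),
# deliberately on very different scales.
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0,  80.0],
              [3.0, 15.0, 120.0],
              [4.0, 25.0,  90.0],
              [5.0, 30.0, 110.0]])

# Column-wise (per-variable) centering and unit-variance scaling,
# mirroring the default behaviour of PLS-based methods.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```

After this step every variable has mean 0 and standard deviation 1, so no single variable dominates the components simply because of its measurement scale.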
Kim-Anh
Hi @kimanh.lecao, and thank you for looking into this. I appreciate your insights! I have tested different normalizations and found that, for the qPCR data, a log2 transformation brings the values closer to a normal distribution. For the compositional/proportional data I used CLR, as you suggested. The behavioural variables were normalized using data from a control mouse cohort. DIABLO with default settings (scale = TRUE) nicely picks out the variables that separate the groups in both the component plot and the heatmap, even though the datasets only have a few variables to begin with.
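The effect of the log2 transform mentioned above can be illustrated with a minimal Python/NumPy sketch (hypothetical values, not from this dataset):

```python
import numpy as np

# Hypothetical right-skewed qPCR values (e.g. fold changes),
# where each value doubles the previous one.
x = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])

# log2 compresses the long right tail; these multiplicative steps
# become evenly spaced additive steps on the log2 scale.
x_log = np.log2(x)
print(x_log)  # [-1.  0.  1.  2.  3.  4.]
```

This is why log2 often makes fold-change-like data look closer to normal: multiplicative variation becomes additive.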