Hi all,
I have recently discovered mixOmics and I have experimented with it, but I have some questions regarding my results.
My study design includes 2 groups of different individuals, and for each individual I have 2 time points, so I am using the multilevel design.
I have a metabolomics dataset (700x250) and a proteomics dataset (700x1400). I am applying sPLS to find correlations between metabolites and proteins, and I am using the canonical mode so that both datasets are treated symmetrically.
Based on the Q2 score, I proceeded with 2 components. During the tuning procedure, only 5 variables (the lowest I tested) were chosen from each dataset for each component (so, 10 metabolites and 10 proteins overall).
The explained variance in the metabolomics dataset is 0.22 and 0.26 for the 2 components, whereas for the proteomics dataset it’s 0.01 and 0.02.
- I find this a bit weird. I have hundreds of variables, and it chooses only 10, the lowest I tested, while it captures so little variance for the proteins.
- Since the explained variance is so small but also so different between the datasets, does it make sense to proceed?
In the cim plot, for each protein in each component, the correlation values with all 5 selected metabolites are (almost) identical. This results in a heatmap where each whole column (protein) has the same value.
These 5 metabolites are also almost the same regarding their biological function. So, overall, this doesn’t leave me much room for biological interpretation, but on the other hand I guess it makes sense (since they’re so similar, you expect them to be similarly correlated with the same proteins).
- However, I’m wondering if I can somehow work around this and make it select only 1 of these highly correlated metabolites, with the hope of uncovering more metabolite-protein correlations.
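To make the idea concrete, this is roughly the kind of pre-filtering I had in mind. It is only a minimal sketch in Python/NumPy, not mixOmics code: the function name `drop_correlated_columns`, the 0.9 threshold and the toy data are my own assumptions, it assumes there are no constant columns, and I am not sure whether such a filter should be applied to the raw data or to the within-subject (multilevel) data.

```python
import numpy as np

def drop_correlated_columns(X, threshold=0.9):
    """Greedily keep a column only if its absolute Pearson correlation
    with every previously kept column is at most `threshold`.
    Returns the indices of the kept columns (hypothetical helper)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # columns are variables
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

# Toy data: column 1 is a near-duplicate of column 0,
# columns 2 and 3 are independent of everything else.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.01 * rng.normal(size=100),
    base[:, 1],
    base[:, 2],
])

kept = drop_correlated_columns(X, threshold=0.9)
X_filtered = X[:, kept]  # one representative per correlated group
print(kept)
```

The filtered matrix would then go into sPLS in place of the full metabolomics dataset. Which metabolite represents each group depends on the column order, so this is a fairly arbitrary choice.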
As a different analysis, I also tested the same metabolites with a microbiome dataset (700x700).
The explained variance in the microbiome dataset is similarly small to that in the proteomics dataset. Overall it chose 1 component (but I changed it to 2), 10 metabolites and 30 microbes.
In the heatmap we observe 4 major correlations and the rest are very low. The pattern is the same (the same value in the whole column). But what troubled me the most is that the selected metabolites in the first component are exactly the same as in the previous analysis, and those in the second component are also very similar.
- So this makes me wonder: Does the variable selection take place in each dataset separately? And then we test the correlations of these variables between the datasets? From what I have understood, this is not what’s happening. So, did I get so “lucky” that in both analyses the bigger correlations involved the same metabolites?
I hope all this makes sense, and sorry for the long post!
Thank you,
Christina