Thanks for the great work with mixOmics!!
I have data from many different omics-modalities, carried out on various tissues from the same samples across all omics…
I want to start simple: I have RNA-seq data and phosphoproteomics data (LC-MS/MS) from the same placenta control and treatment samples (10 treatment, 10 ctl).
I want to compare the treatment response in the RNA-seq data with the treatment response in the phosphoproteomic data.
I have TPM values for RNA-seq data and normalised sitewise peaks from the phosphoproteomics data.
From some correlations I ran, I know there is some, but not tremendous, overlap of DE genes/phosphoproteins across the treatment samples.
How do I start? Should I use the sPLS model, or something else?
Starting simple is the way to go.
I'd do a PCA per omics, then a PLS-DA per omics, then start integrating with PLS, and then move to DIABLO.
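As a rough sketch of that sequence in mixOmics (not a tuned analysis): the matrices below are random placeholders standing in for the RNA-seq and phosphoproteomics data, with samples in rows and in the same order in both blocks. The object names, the `keepX`/`keepY` values and the 0.1 design weight are assumptions chosen for illustration; in a real analysis they would be tuned, for example with the package's `tune.spls()` and `tune.block.splsda()` functions.

```r
library(mixOmics)

set.seed(1)
n <- 20                                              # 10 ctl + 10 treatment
Y <- factor(rep(c("ctl", "treatment"), each = n / 2))

# Placeholder data (assumption): samples in rows, features in columns,
# identical sample order in both blocks
rna  <- matrix(rnorm(n * 200), nrow = n,
               dimnames = list(paste0("s", 1:n), paste0("gene", 1:200)))
phos <- matrix(rnorm(n * 150), nrow = n,
               dimnames = list(paste0("s", 1:n), paste0("site", 1:150)))

# 1. PCA per omics (unsupervised, one data set at a time)
pca.rna  <- pca(rna,  ncomp = 2, center = TRUE, scale = TRUE)
pca.phos <- pca(phos, ncomp = 2, center = TRUE, scale = TRUE)

# 2. PLS-DA per omics (supervised, one data set at a time)
plsda.rna  <- plsda(rna,  Y, ncomp = 2)
plsda.phos <- plsda(phos, Y, ncomp = 2)

# 3. Pairwise integration with sparse PLS (unsupervised, two data sets)
#    keepX / keepY are arbitrary illustration values
spls.res <- spls(rna, phos, ncomp = 2, mode = "canonical",
                 keepX = c(25, 25), keepY = c(25, 25))

# 4. DIABLO (supervised, multi-block)
X <- list(rna = rna, phos = phos)
design <- matrix(0.1, nrow = 2, ncol = 2,
                 dimnames = list(names(X), names(X)))  # assumed weight
diag(design) <- 0
diablo.res <- block.splsda(X, Y, ncomp = 2, design = design,
                           keepX = list(rna = c(25, 25), phos = c(25, 25)))
```

Each fitted object can be passed to `plotIndiv()` to look at the sample plots before moving on to the next step.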
Each step matters (even the unsupervised ones such as PCA and PLS) because it helps you get to know your data better. Make sure your data are well normalised. I can't really comment on TPM, but your PCA sample plots should look roughly round rather than form odd shapes, as I show here on the left side of the plot (a really, really obvious example). You can also try different transformations; a log transformation, for example, often helps.
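A minimal sketch of that check, assuming a samples-by-genes TPM matrix (simulated here with skewed random values, since the real data are not shown). The pseudocount of 1 and the object names are illustrative assumptions, not a recommendation specific to TPM.

```r
library(mixOmics)

set.seed(2)
n <- 20
group <- factor(rep(c("ctl", "treatment"), each = n / 2))

# Placeholder TPM matrix (assumption): non-negative, right-skewed values
tpm <- matrix(rexp(n * 300, rate = 0.05), nrow = n,
              dimnames = list(paste0("s", 1:n), paste0("gene", 1:300)))

# log2 transform with a pseudocount of 1 so that zeros stay defined
log.tpm <- log2(tpm + 1)

# Drop features with zero variance, which carry no information for PCA
log.tpm <- log.tpm[, apply(log.tpm, 2, var) > 0]

# PCA before and after the transformation
pca.raw <- pca(tpm,     ncomp = 2, center = TRUE, scale = FALSE)
pca.log <- pca(log.tpm, ncomp = 2, center = TRUE, scale = FALSE)

# Compare the shape of the sample clouds in the two plots
plotIndiv(pca.raw, group = group, legend = TRUE, title = "PCA, raw TPM")
plotIndiv(pca.log, group = group, legend = TRUE, title = "PCA, log2(TPM + 1)")
```

If the sample plot on the transformed data still shows arcs or a few samples driving the whole component, that is a hint to revisit normalisation or filtering before integrating.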
Be aware that it's actually rare to find those 'known' relationships between genes and phosphoproteins in the data, because of noise, the technology, and the gap between what we think should happen and what really happens in the data…