Hello, I’m currently using timeOmics to analyze time-course data, and I have questions about transforming my data for the analysis. I know that transforming the data is an essential step for clustering. I noticed that scaling the normalized read counts from DESeq2 does not yield meaningful feature selection, so I decided against it and ran the analysis purely on the normalized RNA-seq counts and the quantile-normalized proteomics data. They are on different scales, but they give meaningful results in the developmental system we are studying. As expected, the results are dominated by genes with very large abundance, but these are relevant to the biology of the organism. Is this a valid reason to do a PCA/PLS without scaling, or would you recommend something else instead?
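For reference, this is roughly the comparison I have in mind; `dds` is my DESeqDataSet and `prot_quant` my quantile-normalized proteomics matrix (the object names are just placeholders):

```r
library(DESeq2)
library(mixOmics)

rna_norm <- t(counts(dds, normalized = TRUE))   # samples in rows, genes in columns

# PCA on the unscaled normalized counts (high-abundance genes dominate)
pca_unscaled <- pca(rna_norm, ncomp = 2, center = TRUE, scale = FALSE)

# PCA with unit-variance scaling (each gene contributes equally)
pca_scaled <- pca(rna_norm, ncomp = 2, center = TRUE, scale = TRUE)

plotIndiv(pca_unscaled, title = "PCA, unscaled")
plotIndiv(pca_scaled, title = "PCA, scaled")
```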
hi @bzavala,
(I am assuming you use PCA or PLS in timeOmics to do your analyses)
When you say

“As expected, the results are dominated by genes with very large abundance”

I assume you mean the genes appear in the clusters more often than the proteins do.
In timeOmics we propose to center and scale each feature across time, after the spline modelling step. This helps capture similar temporal behaviours regardless of abundance, and in PLS it also adjusts for the difference in scale between the two datasets.
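As a rough sketch of that step, assuming `modelled` holds the spline-modelled profiles with time points in rows and features (genes + proteins) in columns (the object name is only illustrative):

```r
# Center and scale each feature across time: after this, every column
# (gene or protein) has mean 0 and unit variance over the time points,
# so clustering picks up shared trajectories rather than raw abundance.
modelled_scaled <- scale(modelled, center = TRUE, scale = TRUE)
```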
You could run the analysis with center + scale first, extract the clusters, and then explore the original (unscaled) data in depth for the proteins of interest.
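Something along these lines, assuming the PCA was fitted on the scaled profiles with mixOmics and using getCluster() from timeOmics (object and column names are illustrative and may differ slightly between versions):

```r
library(timeOmics)

# Feature-to-cluster assignment from the PCA on the scaled profiles
clusters <- getCluster(pca_scaled)
head(clusters)

# Go back to the original (unscaled) proteomics data for one cluster to
# inspect the real abundances; the 'molecule' and 'cluster' columns of
# `clusters` may vary slightly between timeOmics versions.
in_cluster1 <- clusters$molecule[clusters$cluster == 1]
prot_cluster1 <- prot_quant[, colnames(prot_quant) %in% in_cluster1, drop = FALSE]
```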
Your approach is valid if you are also interested in high abundances. You could also consider filtering the proteins beforehand, keeping only those with at least one high-abundance data point, before they enter the analysis / are scaled. Also note that protein data are inherently noisy, so capturing trends can be difficult.
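One way to sketch that filter in base R (the threshold and object name are purely illustrative; choose a cut-off that makes sense for your data):

```r
# Keep proteins that reach a high abundance at least once across the
# time course; `prot_quant` has samples/time in rows, proteins in columns.
abundance_threshold <- 20   # illustrative cut-off only
keep <- apply(prot_quant, 2, function(p) any(p >= abundance_threshold, na.rm = TRUE))
prot_filtered <- prot_quant[, keep, drop = FALSE]
```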
Kim-Anh