Issue with "timeOmics"

Hello,

I am starting to work with “timeOmics”. I have only limited experience with this package so far, so my question is probably a naive one.

I have a total number of 10 individuals, 5 of them belonging to group A and 5 of them belonging to group B. The two groups correspond to biological differences. For each individual I have RNAseq data produced at 4 different time points. Thus, the total number of samples I am working on is 40.

I have performed a first round of analysis using all 40 samples, i.e. without separating the two biological groups, and the results look strange. The “silhouette plot” corresponding to the PCA shows only negative values, suggesting poor clustering.

I am wondering whether the problem is that I have not separated the two biological groups. If so, what is the best workflow to use? Should I analyze the two groups separately, and how should I proceed from there?

Thanks in advance,

Marco

Hi @Marco

It all depends on your question. In our paper we separated the groups before fitting the splines, as we assumed that coordinated patterns would be very specific to each group. Pooling the two groups could be the reason for what you see in the silhouette plot.

TimeOmics works by first fitting splines using the lmms package. Unfortunately, lmms is not regularly maintained on CRAN, so you would need to use an RStudio instance with an earlier version of R in order to install it. It would be useful to first extract the splines from either timeOmics or lmms, and then do a PCA plot to understand what is going on at the sample level, both with the two groups together and with each group separately.
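As a rough illustration of that sample-level check, here is a minimal sketch in base R. It does not call timeOmics or lmms: the data are simulated placeholders, and the names `design`, `expr` and `run_pca` are assumptions made up for this example. In practice `expr` would be your own samples-by-genes matrix (raw or spline-modelled values).

```r
# Simulated placeholder design: 10 individuals x 4 time points = 40 samples.
set.seed(1)
n_genes <- 200
design <- expand.grid(time = 1:4, individual = 1:10)
design$group <- ifelse(design$individual <= 5, "A", "B")

# Simulated expression: noise + a shared time trend + a shift in group B
# for the first 50 genes. Replace `expr` with your own matrix.
expr <- matrix(rnorm(nrow(design) * n_genes), nrow = nrow(design))
expr <- expr + outer(design$time, rep(0.5, n_genes))
is_b <- design$group == "B"
expr[is_b, 1:50] <- expr[is_b, 1:50] + 2

# Hypothetical helper: centred and scaled PCA on a samples-by-genes matrix.
run_pca <- function(x) prcomp(x, center = TRUE, scale. = TRUE)

pca_all <- run_pca(expr)           # both groups together
pca_a   <- run_pca(expr[!is_b, ])  # group A only
pca_b   <- run_pca(expr[is_b, ])   # group B only

# Proportion of variance carried by the first two components in each analysis.
var_all <- summary(pca_all)$importance["Proportion of Variance", 1:2]
var_a   <- summary(pca_a)$importance["Proportion of Variance", 1:2]
var_b   <- summary(pca_b)$importance["Proportion of Variance", 1:2]

# Sample plot for the pooled analysis: colour = group, symbol = time point.
plot(pca_all$x[, 1:2],
     col = ifelse(is_b, "red", "blue"),
     pch = design$time,
     main = "Sample-level PCA, both groups")
```

If the pooled plot separates mainly by group or by individual rather than by time, that would be one hint that the groups are better modelled separately.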

Kim-Anh

Hello Kim-Anh,

Thanks for your answer. In fact, I am not sure what to expect in my case, but I would say only limited differences between the two groups. In light of this, I could analyze the two groups together.

I fit the splines with “lmms”, which I installed using this workaround:

devtools::install_github("cran/lmms")

Anyway, to date I have not been able to obtain a valid model with my data. I have 53,000 sunflower genes, which were obtained by filtering out lowly expressed ones (the initial number is much higher).

I tested several levels of filtering based on the coefficient of variation (CV). I filtered out genes with low CVs, as suggested, but I also filtered out genes with extremely high CVs, because those CV values seemed to correspond to unusually low mean expression values. Overall, I have the feeling that there is a lot of variation among the individuals used for the analysis.
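For reference, a two-sided CV filter of this kind can be sketched in base R as follows. `cv_filter()` is a hypothetical helper written for this post, not a timeOmics or lmms function, and the thresholds `lower` and `upper` are arbitrary placeholders rather than recommended values. It assumes a matrix of non-negative, normalized expression values with samples in rows and genes in columns.

```r
# Hypothetical helper: keep genes whose coefficient of variation (sd / mean)
# lies between `lower` and `upper`. Thresholds are placeholders.
cv_filter <- function(X, lower = 0.15, upper = 1.5) {
  gene_mean <- colMeans(X)
  gene_sd   <- apply(X, 2, sd)
  cv        <- gene_sd / abs(gene_mean)
  keep      <- cv >= lower & cv <= upper
  keep[is.na(keep)] <- FALSE  # genes with zero mean and zero sd give NaN
  X[, keep, drop = FALSE]
}

# Toy matrix: 4 samples x 3 genes.
X <- cbind(
  flat      = c(10, 10.1, 9.9, 10),  # very low CV: almost no variation
  varying   = c(5, 10, 20, 40),      # moderate CV
  near_zero = c(0, 0, 0, 0.4)        # high CV driven by a tiny mean
)

kept <- cv_filter(X)
colnames(kept)
```

In this toy example only the `varying` gene passes both thresholds: `flat` falls below the lower bound, and `near_zero` exceeds the upper bound because its mean is close to zero.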

I always get a PCA silhouette plot with just one acceptable component. I have attached the silhouette plot and the longitudinal plot corresponding to the most extreme level of filtering (i.e. 1414 genes).


I have the feeling that there are too many different “patterns” of longitudinal gene variation, and that these patterns are not properly clustered. Of course, the 2 components were forced when performing the PCA, because only 1 was considered acceptable.

Thanks again and see you soon,

Marco

Hi @Moroldo,
To finish the discussion: I think that in your case the silhouette might not be highly relevant, and that you could be more lenient and look at other components and patterns to explore and describe your data.
Evaluation parameters are there to guide the analysis, but they do not apply in every case.

Kim-Anh

Hello Kim-Anh,

Thanks again for your help and for your time.
Have a nice day!

Marco