Hi, thanks for these great tools and for this open community. We are running a project in which the first step is to integrate multi-omics data. While working with DIABLO, some questions have come up. I hope you can help us better understand and apply this method.
-
From what I understand, DIABLO can handle large, high-dimensional omics data, and its computation includes a penalization step. In some comments on this forum I read that starting with too many predictors, even with an internal pre-filtering step, could cause the method to break down. In general, do you recommend performing feature selection before analyzing the data with DIABLO, for example by pre-filtering with a lasso regression or by keeping the most variable genes as ranked by the mean absolute deviation? Or would it be better to analyze all the data (i.e. an expression matrix from which only low-count genes have been removed) without any additional filter, even though problems such as long runtimes might appear?
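To make the variability-based option concrete, here is a minimal sketch of the kind of pre-filter we have in mind. It is written in plain Python purely for illustration (our actual analysis is in R with mixOmics). The function names, the toy matrix and the cutoff `k` are all hypothetical, not something we have settled on.

```python
from statistics import fmean


def mean_abs_deviation(values):
    """Mean absolute deviation of a sequence of numbers around their mean."""
    centre = fmean(values)
    return fmean([abs(v - centre) for v in values])


def top_variable_features(matrix, feature_names, k):
    """Return the names of the k features with the largest mean absolute deviation.

    matrix is a list of rows (samples x features); k is an arbitrary cutoff.
    """
    columns = list(zip(*matrix))  # transpose to features x samples
    scores = {
        name: mean_abs_deviation(col)
        for name, col in zip(feature_names, columns)
    }
    ranked = sorted(feature_names, key=lambda name: scores[name], reverse=True)
    return ranked[:k]


# Toy expression matrix: 4 samples x 3 genes (made-up values).
expr = [
    [5.0, 1.0, 10.0],
    [5.1, 3.0, 2.0],
    [4.9, 2.0, 6.0],
    [5.0, 2.0, 14.0],
]
genes = ["geneA", "geneB", "geneC"]

kept = top_variable_features(expr, genes, k=2)
print(kept)
```

The filter is unsupervised (it never looks at the outcome), which is the main difference from the lasso option mentioned above.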
-
We are interested in integrating mutation and somatic copy number alteration (SCNA) data along with RNA expression. But since these data types are binomial, they are difficult to handle with traditional statistical approaches that assume normality, like PLS. As a contingency strategy, we pre-filtered the mutation and SCNA data with the nearZeroVar function. Do you have any other suggestions for managing this type of data?
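For readers less familiar with this kind of filter, here is a rough Python sketch of a near-zero-variance rule loosely modelled on the frequency-ratio and percent-unique criteria that nearZeroVar is documented to use. It is an illustration only, not the mixOmics implementation. The function name, the toy mutation calls and the threshold values (`freq_cut`, `unique_cut`) are assumptions for the example.

```python
from collections import Counter


def near_zero_variance(column, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a feature as constant or nearly constant across samples.

    A feature is flagged when it has a single distinct value, or when the
    most common value is more than freq_cut times as frequent as the second
    most common value AND the percentage of distinct values is at most
    unique_cut. Both thresholds are assumed values for this sketch.
    """
    counts = Counter(column).most_common()
    if len(counts) == 1:
        return True  # constant feature
    freq_ratio = counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(counts) / len(column)
    return freq_ratio > freq_cut and percent_unique <= unique_cut


# Toy binary mutation calls for 40 samples (made-up data).
features = {
    "mutA": [1] + [0] * 39,        # mutated in 1 of 40 samples
    "mutB": [1] * 10 + [0] * 30,   # mutated in 10 of 40 samples
    "mutC": [0] * 40,              # never mutated
}

retained = [name for name, col in features.items()
            if not near_zero_variance(col)]
print(retained)
```

One consequence of this rule as sketched: for a binary feature with fewer than 20 samples, the percentage of distinct values exceeds 10, so only fully constant features are flagged. If the real function behaves similarly, very rare mutations could pass the filter in small cohorts.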
We would appreciate your comments.
Best regards