The study design:
- I have three datasets (RNA-Seq, ATAC-Seq and methylation) from the same samples that I would like to integrate using DIABLO.
- Samples were stimulated with various stimuli and treated with various treatments, giving 9 repeated-measures samples per cell line and 18 groups in total (the Y outcome variable).
Q1: Do I first decompose the input data for repeated measures using withinVariation() on each dataset and then run the standard DIABLO analysis pipeline? Or do you recommend a different method? I understand that other mixOmics methods have multilevel functionality built in.
The closest answer I could find in the forum is this: withinVariation() on part of dataset - #2 by MaxBladen
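For concreteness, here is a minimal sketch in Python/NumPy (not mixOmics code) of what a one-factor within-subject decomposition amounts to: each sample has the mean profile of its cell line subtracted, and the same step would be applied separately to each of the three datasets. The function name `within_variation` and the toy data are hypothetical, and the real withinVariation() may differ in details such as centering and multi-factor designs.

```python
import numpy as np

def within_variation(X, subject_ids):
    """Return X with each subject's mean profile removed (hypothetical helper).

    X           : samples x features array
    subject_ids : one label per sample identifying the repeated unit
                  (here assumed to be the cell line)
    """
    X = np.asarray(X, dtype=float)
    subject_ids = np.asarray(subject_ids)
    Xw = np.empty_like(X)
    for s in np.unique(subject_ids):
        rows = subject_ids == s
        # Remove the between-subject part: the subject's mean per feature
        Xw[rows] = X[rows] - X[rows].mean(axis=0)
    return Xw

# Toy data (made up): 2 cell lines x 3 repeated samples, 2 features
X_toy = np.array([[10.0, 1.0], [12.0, 2.0], [14.0, 3.0],
                  [100.0, 5.0], [101.0, 5.0], [102.0, 8.0]])
cell_line = np.array(["A", "A", "A", "B", "B", "B"])

Xw_toy = within_variation(X_toy, cell_line)
```

After this step the large offset between the two toy cell lines is gone and only the variation among repeated samples within a cell line remains, which is the part the treatment groups would need to explain.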
Q2: The RNA-Seq and ATAC-Seq data are highly correlated (> 90%), whilst RNA-Seq and methylation, and ATAC-Seq and methylation, correlate at only 50%.
I have trialled various weights in the design matrix, ranging from 0.1 to 0.9. The classification error rate is very high (> 0.8) regardless of the weight and regardless of whether I use withinVariation().
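To make the design weights concrete, here is a small Python/NumPy sketch (again not mixOmics code) of the structure of a DIABLO-style design matrix: a symmetric block-by-block matrix with a zero diagonal and the chosen weight between every pair of datasets. It assumes one uniform weight per run; the helper `uniform_design` and the block names are hypothetical, and pair-specific weights would be set entry by entry instead.

```python
import numpy as np

def uniform_design(block_names, weight):
    """Build a symmetric design matrix with `weight` off the diagonal (hypothetical helper)."""
    n = len(block_names)
    design = np.full((n, n), float(weight))
    # A block is not linked to itself
    np.fill_diagonal(design, 0.0)
    return design

blocks = ["rna", "atac", "methylation"]

# One design per trialled weight; 0.1 and 0.9 are the extremes mentioned above,
# 0.5 is just an illustrative midpoint
designs = {w: uniform_design(blocks, w) for w in (0.1, 0.5, 0.9)}
```

A low weight puts the emphasis on discriminating the groups, while a high weight puts it on correlation between the datasets.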
Can the classification error rate be improved? Is it so high because of the repeated measures and/or because the Y variable has 18 groups? Is 18 groups too many?
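As a reference point for the 18-group question: if the groups were balanced (an assumption, the group sizes are not stated above), assigning a group uniformly at random would give an expected error rate of 1 - 1/18, so an error above 0.8 is not far from that baseline. A short Python sketch of the arithmetic, with a hypothetical helper name:

```python
def chance_error_rate(n_classes):
    """Expected error of uniform random guessing, assuming balanced classes."""
    return 1.0 - 1.0 / n_classes

# 18 groups, as in the Y outcome variable described above
baseline = chance_error_rate(18)
print(f"chance-level error for 18 balanced groups: {baseline:.3f}")
```

Any reported error rate would need to be read against this baseline rather than against the 0.5 that applies to two groups.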
Thanks for your help!