DIABLO adjusting for repeat measures using withinVariation()

The study design:

  • I have three datasets (RNA-Seq, ATAC-Seq and methylation) from the same samples that I would like to integrate using DIABLO.

  • Samples were exposed to various stimuli and treatments, giving 9 repeated samples per cell line and 18 groups in total (the Y outcome variable).

Q1: Do I first decompose the input data for repeated measures using withinVariation() on each dataset and then run the standard DIABLO analysis pipeline? Or do you recommend a different method? I understand that other mixOmics methods have multilevel functionality built in.

The closest answer I could find in the forum is this: withinVariation() on part of dataset - #2 by MaxBladen

Q2: The RNA-Seq and ATAC-Seq datasets are highly correlated (> 90%), whilst RNA-Seq and methylation, and ATAC-Seq and methylation, correlate at only 50%.

I have trialled various weights in the design, ranging from 0.1 to 0.9. The classification error rate is very high (> 0.8) regardless of the weight and regardless of whether I use withinVariation().

Can the classification error rate be improved? Is it so high because of the repeated measures and/or because the Y variable has 18 groups? Is 18 groups too many?

Thanks for your help!

Hi @Anya,

  1. For DIABLO we have not implemented the multilevel function, so yes, you need to do this prior to the analysis, on each dataset.
  2. The high error rate can be due to different aspects of your data: some datasets just don’t discriminate (a PLS-DA would tell you this, as would extracting the error rate per dataset from a perf DIABLO object). And yes, I think the number of groups is very high. I’d recommend you backtrack and inspect each dataset using PLS-DA, looking particularly at the error rate per group (perf from PLS-DA) and at how the samples / groups overlap on the sample plot. The other option is to include more components (up to K-1 = 17), but from an interpretation perspective this will make it very hard for yourself (and for the tuning of DIABLO)!
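
To make the two points above concrete, here is a rough sketch on simulated data. Everything in it is an assumption for illustration: the toy dimensions (6 cell lines, 9 conditions, 20 features), the object names and the helper within_part() are hypothetical and not from the original data. The base R part shows what the within-subject step amounts to for a single repeated-measures factor, which as far as I understand is centring each feature on its cell-line mean. The mixOmics part only runs if the package is installed, and then checks one dataset with PLS-DA and perf.

```r
# Toy data only: 6 hypothetical cell lines x 9 conditions, 20 features.
set.seed(42)
n_lines <- 6; n_cond <- 9; p <- 20
cell_line <- factor(rep(paste0("line", seq_len(n_lines)), each = n_cond))
condition <- factor(rep(paste0("cond", seq_len(n_cond)), times = n_lines))

# Noise plus a large cell-line offset that can mask the condition effect.
X <- matrix(rnorm(n_lines * n_cond * p), ncol = p)
line_offset <- matrix(rnorm(n_lines * p, sd = 3), nrow = n_lines)
X <- X + line_offset[as.integer(cell_line), ]

# Hypothetical helper: subtract each cell line's own mean from every feature.
within_part <- function(X, subject) {
  X - apply(X, 2, function(feature) ave(feature, subject))
}
Xw <- within_part(X, cell_line)

# Sanity check: after centring, every cell line has mean ~0 for every feature.
max_line_mean <- max(abs(rowsum(Xw, cell_line) / n_cond))

# Optional: the same step with mixOmics, followed by a per-dataset PLS-DA check.
if (requireNamespace("mixOmics", quietly = TRUE)) {
  Xw_mix <- mixOmics::withinVariation(X, design = data.frame(sample = cell_line))
  fit <- mixOmics::plsda(Xw_mix, condition, ncomp = 2)
  cv <- mixOmics::perf(fit, validation = "Mfold", folds = 3, nrepeat = 5,
                       progressBar = FALSE)
  print(cv$error.rate$BER)            # balanced error rate per component
  print(cv$error.rate.class$max.dist) # error rate per group
}
```

In a real analysis the same decomposition would be repeated for each of the three datasets, and the resulting list of within matrices passed to block.splsda() in place of the raw data. The per-group error rates from perf should show which of the groups a given dataset cannot separate.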

Kim-Anh