DIABLO adjusting for repeat measures using withinVariation()

Anya · January 22, 2025, 3:13am

The study design:

I have three datasets (RNA-Seq, ATAC-Seq and methylation) on the same samples that I would like to integrate using DIABLO.
The samples were stimulated and treated with various stimuli and treatments, respectively, resulting in 9 repeated samples per cell line and a total of 18 groups (the Y outcome variable).

Q1: Should I first decompose the input data for repeated measures using withinVariation() on each dataset, and then run the standard DIABLO pipeline? Or would you recommend a different method? I understand that other mixOmics methods have multilevel functionality built in.

The closest answer I could find in the forum is this: withinVariation() on part of dataset - #2 by MaxBladen
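For reference, in the one-factor case withinVariation() (as we understand it) keeps only the within-subject variation by subtracting each subject's mean profile from that subject's samples; in this design the repeated unit would be the cell line. The Python sketch below illustrates that idea on a toy example and applies it block by block, as Q1 proposes. It is not the mixOmics code: the function name `within_variation`, the list-of-rows data layout and the numbers are assumptions made up for illustration. In R you would call withinVariation() itself on each block.

```python
from collections import defaultdict


def within_variation(X, subject_ids):
    """Hypothetical helper (not the mixOmics API): within-subject part of X.

    X is a list of rows (samples x features); subject_ids[i] is the subject
    (here: the cell line) of row i. Each row has its subject's mean profile
    subtracted, leaving only the variation within each subject.
    """
    if len(X) != len(subject_ids):
        raise ValueError("X and subject_ids must have the same number of rows")
    n_features = len(X[0])
    sums = defaultdict(lambda: [0.0] * n_features)
    counts = defaultdict(int)
    for row, sid in zip(X, subject_ids):
        counts[sid] += 1
        for j, value in enumerate(row):
            sums[sid][j] += value
    # A subject measured only once has no within-subject variation to keep.
    if any(n == 1 for n in counts.values()):
        raise ValueError("every subject needs at least two repeated samples")
    means = {sid: [s / counts[sid] for s in sums[sid]] for sid in sums}
    return [[v - m for v, m in zip(row, means[sid])]
            for row, sid in zip(X, subject_ids)]


# Toy example (made-up numbers): two cell lines, two samples each, two features.
cell_line = ["A", "A", "B", "B"]
blocks = {
    "rna": [[1.0, 10.0], [3.0, 14.0], [5.0, 0.0], [7.0, 2.0]],
    "atac": [[2.0, 4.0], [4.0, 8.0], [1.0, 1.0], [3.0, 1.0]],
}
# Apply the same decomposition to every block before the integration step.
blocks_within = {name: within_variation(X, cell_line) for name, X in blocks.items()}
print(blocks_within["rna"])  # [[-1.0, -2.0], [1.0, 2.0], [-1.0, -1.0], [1.0, 1.0]]
```

After the decomposition, each feature averages to zero within every cell line, so what remains for DIABLO is the variation between the stimulation/treatment conditions inside each cell line.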

Q2: The RNA-Seq and ATAC-Seq data are highly correlated (correlation > 0.9), whilst RNA-Seq and methylation, and ATAC-Seq and methylation, correlate at only about 0.5.

I have tried design weights ranging from 0.1 to 0.9, but the classification error rate remains very high (> 0.8) regardless of the weight and of whether I apply withinVariation() first.
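For reference, the weight sweep above amounts to building one symmetric block-by-block design matrix per weight, with the weight off the diagonal and 0 on the diagonal; as far as we know, the mixOmics DIABLO examples build this in R with matrix() followed by diag(design) = 0 and pass it to block.splsda(). The Python sketch below only illustrates that structure. The block names, the `design_matrix` function and its `pair_weights` override (which would let the RNA-ATAC pair carry a different weight from the two methylation pairs) are hypothetical, not part of mixOmics.

```python
BLOCKS = ["rna", "atac", "methylation"]  # hypothetical block names


def design_matrix(blocks, weight, pair_weights=None):
    """Hypothetical helper: symmetric design with `weight` off the diagonal, 0 on it.

    `pair_weights` optionally overrides individual pairs, e.g.
    {("rna", "atac"): 0.9}.
    """
    design = {a: {b: (0.0 if a == b else weight) for b in blocks} for a in blocks}
    for (a, b), w in (pair_weights or {}).items():
        design[a][b] = design[b][a] = w  # keep the matrix symmetric
    for row in design.values():
        if any(not 0.0 <= w <= 1.0 for w in row.values()):
            raise ValueError("design weights are expected to lie between 0 and 1")
    return design


# The sweep described in the post: one full design per weight 0.1, 0.2, ..., 0.9.
weights = [k / 10 for k in range(1, 10)]
designs = {w: design_matrix(BLOCKS, w) for w in weights}

# A non-uniform alternative: RNA-ATAC linked more strongly than the methylation pairs.
mixed = design_matrix(BLOCKS, 0.1, pair_weights={("rna", "atac"): 0.9})
```

In DIABLO, weights close to 1 put more emphasis on maximising the correlation between blocks, while weights close to 0 put more emphasis on discriminating the groups.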

Can the classification error rate be improved? Is it so high because of the repeated measures, because Y has 18 groups, or both? Are 18 groups too many?
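One way to read the > 0.8 figure, assuming random guessing as the baseline: with K groups, a uniformly random guess is wrong with probability 1 - 1/K, which is 0.5 for two groups but about 0.944 for 18. An error rate just above 0.8 with 18 groups would therefore be better than chance, but only modestly. The short sketch below does that arithmetic; the function name is hypothetical.

```python
def chance_error_rate(n_groups):
    """Expected error rate when guessing one of n_groups labels uniformly at random."""
    if n_groups < 2:
        raise ValueError("need at least two groups")
    # A uniform random guess is correct with probability 1 / n_groups,
    # whatever the group sizes are.
    return 1.0 - 1.0 / n_groups


for k in (2, 18):
    print(k, round(chance_error_rate(k), 3))
# 2 0.5
# 18 0.944
```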

Thanks for your help!

kimanh.lecao · January 23, 2025, 9:57pm

hi @Anya,

We have not implemented the multilevel option for DIABLO, so yes, you need to do this prior to the analysis, on each data set.
The high error rate can be due to different aspects of your data: some data sets just don't discriminate the groups (a PLS-DA would tell you this, as would extracting the error rate per data set from perf() on the DIABLO object). And yes, I think the number of groups is very high. I'd recommend you backtrack and inspect each data set using PLS-DA, looking particularly at the error rate per group (from perf() on the PLS-DA) and at how the samples / groups overlap on the sample plot. The other option is to include more components (up to K - 1 = 17), but from an interpretation perspective this will make things very hard for yourself (and for the tuning of DIABLO)!

Kim-Anh
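The per-group error rate suggested above is plain arithmetic: for each true group, the fraction of its samples predicted as some other group. The balanced error rate (BER) is the unweighted mean of those per-group rates, so small groups count as much as large ones. In R, perf() already reports these for PLS-DA models; the Python sketch below, with made-up labels and hypothetical function names, only shows what the numbers mean.

```python
from collections import Counter


def per_group_error(y_true, y_pred):
    """Fraction of misclassified samples within each true group."""
    totals = Counter(y_true)
    wrong = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {group: wrong[group] / n for group, n in totals.items()}


def overall_error(y_true, y_pred):
    """Fraction of all samples that are misclassified."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_error_rate(y_true, y_pred):
    """Unweighted mean of the per-group error rates."""
    rates = per_group_error(y_true, y_pred)
    return sum(rates.values()) / len(rates)


# Made-up labels: group "a" is half misclassified, "b" and "c" are perfect.
y_true = ["a", "a", "b", "b", "c", "c", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "c", "c", "c"]
print(per_group_error(y_true, y_pred))                # {'a': 0.5, 'b': 0.0, 'c': 0.0}
print(overall_error(y_true, y_pred))                  # 0.125
print(round(balanced_error_rate(y_true, y_pred), 3))  # 0.167
```

With 18 groups, listing the groups whose per-group error is close to 1 points to the ones that overlap most, which the sample plot shows visually.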

Related topics

Topic	Category	Last activity
Biological replicates and Diablo	Analysis	August 11, 2022
Integration with DIABLO for N-ingretaion with low sample size	Analysis	June 27, 2024
withinVariation() on part of dataset	Support	February 3, 2022
Analytical issues using DIABLO	Analysis	April 13, 2022
DIABLO for small N	Analysis	April 15, 2020