Filtering before applying sPLS-DA on DNA-methylation data

rita · June 12, 2026, 9:32am

I am working with methylation data from long reads, covering approximately 12 million CpG sites. The mixOmics guidelines recommend using at most around 10,000 features.
I initially filtered by removing the 5% of CpG sites with the lowest variance and excluding CpG sites with mean beta values close to zero or one. However, after these steps, I still retained a very large number of CpG sites (around 7 million).
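
For concreteness, here is a minimal sketch of this kind of two-step filter. It is written in plain Python purely as an illustration (mixOmics itself is an R package), and the function name `filter_cpgs`, the input layout (a dict of CpG id to per-sample beta values) and the mean cut-offs of 0.05 and 0.95 are all assumptions made for the example, not the exact values used above.

```python
from statistics import fmean, variance


def filter_cpgs(betas, drop_frac=0.05, mean_lo=0.05, mean_hi=0.95):
    """Hypothetical helper: return the CpG ids that survive both filters.

    betas     -- dict mapping CpG id -> list of beta values, one per sample
    drop_frac -- fraction of sites with the lowest variance to discard
    mean_lo, mean_hi -- assumed cut-offs for "mean beta close to 0 or 1"
    """
    # Step 1: rank all sites by sample variance (ascending) and mark the
    # lowest drop_frac fraction for removal.
    variances = {cpg: variance(values) for cpg, values in betas.items()}
    ranked = sorted(variances, key=variances.get)
    n_drop = int(len(ranked) * drop_frac)
    low_variance = set(ranked[:n_drop])

    # Step 2: among the remaining sites, keep those whose mean beta lies
    # strictly between mean_lo and mean_hi. Input order is preserved.
    return [
        cpg
        for cpg, values in betas.items()
        if cpg not in low_variance and mean_lo < fmean(values) < mean_hi
    ]


# Toy example with made-up beta values for four sites and three samples.
toy = {
    "cg_a": [0.50, 0.50, 0.51],  # nearly constant across samples
    "cg_b": [0.20, 0.80, 0.50],
    "cg_c": [0.01, 0.02, 0.09],  # mean beta close to zero
    "cg_d": [0.30, 0.60, 0.90],
}
kept = filter_cpgs(toy, drop_frac=0.25)
```

With a relative cut-off like `drop_frac`, the number of retained sites scales with the input size, which is consistent with millions of sites remaining after a 5% variance filter.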
Running mixOmics with such a high number of features is extremely time-consuming and leads to instability in the analysis.
Do you have any recommendations for more effective filtering strategies to reduce the number of features to a manageable size?
Thank you very much for your help.
