Sample size for PLS and DIABLO


I have a question about your mixOmics R package. Do you have any recommendations for what the “minimum sample size” should be for using PLS to integrate 2 data sources, or DIABLO to integrate 3 data sources? Or are you aware of any publications that give sample size recommendations for PCA or PLS?

My dataset has 3 data sources (each with about 10-20 variables), and I have 3 groups of subjects, each with N=10 (total N=30). I plan to only use unsupervised PLS and DIABLO and not use the supervised versions in order to help prevent over-fitting due to my smaller sample size. I’m hoping N=10 per group is enough to use mixOmics but I am not sure if this is a good idea.



Hi Jaron,
Have a look at this post: Integration with DIABLO for N-integration with low sample size

Basically, you can either run a purely exploratory analysis (no cross-validation, just data mining), or use leave-one-out cross-validation and be cautious in your interpretation. What really matters is the claims you make about the results.

There is no set formula to define N: it depends on the number of datasets, the variation and noise, the separation between sample classes, and so on.
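
To make the leave-one-out option concrete, here is a minimal sketch of the procedure on toy data shaped like the question (3 groups of 10 samples, 15 variables). It is not mixOmics code: the nearest-centroid classifier stands in for a PLS/DIABLO model, and the data, the function names and the `shift` parameter are all made up for illustration.

```python
import random


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean distance)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_error(X, y):
    """Hold out each sample once, train on the rest, and return the misclassification rate."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


def make_toy_data(n_per_group=10, n_vars=15, shift=1.0, seed=1):
    """Hypothetical data: 3 groups, group g is centred at g * shift on every variable."""
    rng = random.Random(seed)
    X, y = [], []
    for g in range(3):
        for _ in range(n_per_group):
            X.append([rng.gauss(g * shift, 1.0) for _ in range(n_vars)])
            y.append(g)
    return X, y


X, y = make_toy_data()
loo_error = leave_one_out_error(X, y)
step = 1 / len(X)  # one misclassified sample moves the error rate by this much
print(f"LOO error rate: {loo_error:.3f} (resolution {step:.3f} with N={len(X)})")
```

With N=30, a single misclassified sample shifts the estimated error rate by about 3.3 percentage points, which is one reason to read leave-one-out results from a small study cautiously.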