I understand that the perf function evaluates overfitting by assessing model classification performance. If I want to do a permutation test, what command should I run? I did not see an option for a permutation test in the mixOmics package.
Hi @Lorengol,
We don’t have a function in mixOmics to do a permutation test, but you can do this simply by randomly shuffling the Y labels of your dataset and comparing the performance of the randomly shuffled data with your real data. I’ve written a quick bit of code below that does this for a PLS-DA model built on the breast.tumors test dataset, which you can modify for your own use.
library(mixOmics)
# Fit model
data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- breast.tumors$sample$treatment
model_real <- plsda(X, Y, ncomp = 2)
perf_real <- perf(model_real, validation = "Mfold", folds = 5, nrepeat = 10)
real_error <- perf_real$error.rate$overall[2, "max.dist"] # overall error rate of the real model on component 2 (max.dist)
# Run permutation test to get random error rate from model
n_permutations <- 100
perf_permutations <- numeric(n_permutations)
set.seed(123) # for reproducibility
for (i in 1:n_permutations) {
  Y_perm <- sample(Y) # randomly shuffle the Y labels
  model_perm <- plsda(X, Y_perm, ncomp = 2)
  perf_perm <- perf(model_perm, validation = "Mfold", folds = 5, nrepeat = 10)
  perf_permutations[i] <- perf_perm$error.rate$overall[2, "max.dist"] # overall error rate of the permuted model on component 2
}
# Calculate p-value: proportion of permuted models performing at least as well as the real model
p_value <- mean(perf_permutations <= real_error)
p_value
Cheers,
Eva
Eva! Thank you very much!!! I will try it
Eva! In my case, I run a block sPLS-DA (DIABLO), so I have multiple data sets. In this case, should I extract real_error for each data set?
I think in terms of the permutation test you can just run the code above as is, but swap the plsda model for your DIABLO model: you still have a single categorical Y outcome, so you can randomly shuffle its labels and assess the classification error rate in the same way.
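To make that concrete, here is a rough sketch using the breast.TCGA multi-omics example data that ship with mixOmics as a stand-in for your own blocks. The block names, design value (0.1), ncomp, folds, and the slot I pull the overall error rate from (WeightedVote.error.rate with the centroids distance) are just assumptions you would adapt to your own model; check str(perf_real) if the slot names differ in your mixOmics version.

library(mixOmics)
# Example multi-omics data (stand-in for your own blocks)
data(breast.TCGA)
X_blocks <- list(mirna = breast.TCGA$data.train$mirna,
                 mrna  = breast.TCGA$data.train$mrna)
Y <- breast.TCGA$data.train$subtype
# Simple design matrix linking the blocks (0.1 is just an example value)
design <- matrix(0.1, nrow = length(X_blocks), ncol = length(X_blocks),
                 dimnames = list(names(X_blocks), names(X_blocks)))
diag(design) <- 0
# Real DIABLO model and its cross-validated error rate
model_real <- block.splsda(X_blocks, Y, ncomp = 2, design = design)
perf_real <- perf(model_real, validation = "Mfold", folds = 3, nrepeat = 10)
# Overall weighted-vote error rate on component 2 (inspect str(perf_real) if this slot is named differently in your version)
real_error <- perf_real$WeightedVote.error.rate$centroids.dist["Overall.ER", 2]
# Permutation loop: shuffle Y, refit DIABLO, store the same error rate
set.seed(123)
n_permutations <- 100 # reduce this if the loop is too slow for your data
perf_permutations <- numeric(n_permutations)
for (i in 1:n_permutations) {
  Y_perm <- sample(Y)
  model_perm <- block.splsda(X_blocks, Y_perm, ncomp = 2, design = design)
  perf_perm <- perf(model_perm, validation = "Mfold", folds = 3, nrepeat = 10)
  perf_permutations[i] <- perf_perm$WeightedVote.error.rate$centroids.dist["Overall.ER", 2]
}
p_value <- mean(perf_permutations <= real_error)
p_value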
Eva, thank you very much!!! I ran it. Now I have a doubt: the p-value I obtain is 0! It seems too good to be true.
I can think of two possibilities:
- There is a strong signal in our multi-omics data that robustly differentiates the four groups. Even when the group labels are shuffled, the underlying structure in the data does not yield models with classification error rates as low as the actual model's.
- The small n of one group (n = 3) might limit the ability to detect overfitting with the cross-validation (perf function) prediction methods. Because of this group size, we had to reduce the number of folds to 3 (the other three groups have n = 6). I believe that forming permuted groups of n = 3 from classes that actually have 6 samples makes it harder to achieve a model with a low error rate.
Do you know what the problem may be?
I think having a group with n = 3 is definitely too small to make any reliable conclusions about how well your multi-omics data can distinguish that group. Instead of permutation tests and p-values, I would recommend using the perf() function we have in mixOmics to assess model classification performance with error rates, and using visualisations like sample plots (plotIndiv()) to explore how your groups separate across your data.
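For example, with a DIABLO model object like model_real from the sketch above (names are illustrative), the same caveat applying to the error-rate slot:

# Cross-validated classification performance of the real model
perf_real <- perf(model_real, validation = "Mfold", folds = 3, nrepeat = 10)
perf_real$WeightedVote.error.rate # per-class and overall error rates per component
# Sample plots per block, coloured by group, to see how the groups separate
plotIndiv(model_real, ind.names = FALSE, legend = TRUE,
          title = "DIABLO sample plots")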
If you would really like a reliable p-value, perhaps you could exclude your small n = 3 group and re-run the permutation test with your remaining larger groups.
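As a rough sketch, assuming object names as in the code above and a hypothetical label "GroupD" for your small group:

# Drop the small group before re-running the permutation test
small_group <- "GroupD" # hypothetical label - replace with your own group name
keep <- Y != small_group
Y_sub <- droplevels(factor(Y[keep]))
X_blocks_sub <- lapply(X_blocks, function(x) x[keep, , drop = FALSE])
# Re-run the model fitting and permutation loop above with X_blocks_sub / Y_sub,
# then compute the p-value. Counting the observed model in the null distribution
# avoids reporting a p-value of exactly 0:
# p_value <- (1 + sum(perf_permutations <= real_error)) / (1 + n_permutations)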
Cheers,
Eva