I understand that the perf function assesses whether the features are overfitted by evaluating the model's classification performance. If I want to run a permutation test, what command should I use? I did not see an option for a permutation test in the mixOmics package.
Hi @Lorengol,
We don’t have a function in mixOmics to do a permutation test, but you can do this simply by randomly shuffling the Y labels in your dataset and comparing the performance of models fitted to the shuffled labels with the performance of the model fitted to your real labels. I’ve made a quick bit of code to do this for a PLS-DA model built with the breast tumors test dataset below that you can modify for your own use.
library(mixOmics)

# Load the example data
data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- breast.tumors$sample$treatment

set.seed(123) # for reproducibility (the cross-validation folds and the label shuffles are both random)

# Fit the model on the real labels and estimate its cross-validated error rate
model_real <- plsda(X, Y, ncomp = 2)
perf_real <- perf(model_real, validation = "Mfold", folds = 5, nrepeat = 10)
real_error <- perf_real$error.rate$overall[2, "max.dist"] # overall error rate on component 2 (max.dist)

# Run permutation test to get the error rates of models fitted to shuffled labels
n_permutations <- 100
perf_permutations <- numeric(n_permutations)
for (i in seq_len(n_permutations)) {
  Y_perm <- sample(Y) # randomly shuffle Y labels
  model_perm <- plsda(X, Y_perm, ncomp = 2)
  perf_perm <- perf(model_perm, validation = "Mfold", folds = 5, nrepeat = 10)
  perf_permutations[i] <- perf_perm$error.rate$overall[2, "max.dist"] # same error rate as for the real model
}

# Calculate p-value: proportion of permuted models that do at least as well as the real model
p_value <- mean(perf_permutations <= real_error)
p_value
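One thing to be aware of: with a finite number of permutations, the proportion above can come out as exactly 0. A common convention is to add one to both the numerator and the denominator so that the estimate is never zero. Below is a minimal sketch of that variant in base R. The helper name `perm_p_value` and the example error rates are hypothetical and only there for illustration.

```r
# Hypothetical helper (not part of mixOmics): permutation p-value with the
# "+1" correction, which counts the real model as one of the permutations
perm_p_value <- function(real_error, perm_errors) {
  (1 + sum(perm_errors <= real_error)) / (1 + length(perm_errors))
}

# Made-up error rates, for illustration only
example_real  <- 0.30
example_perms <- c(0.50, 0.60, 0.20, 0.70)

# One of the four permuted models is at least as good, so (1 + 1) / (1 + 4)
example_p <- perm_p_value(example_real, example_perms)
example_p
```

With the code above you would call it as `perm_p_value(real_error, perf_permutations)`.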
Cheers,
Eva
Eva! Thank you very much!!! I will try it
Eva! In my case, I perform an sPLS-DA (DIABLO), so I have multiple data sets. In this case, should I extract real_error for each variable set?
I think that, for the permutation test, you can run the code above as is but swap the plsda model for your DIABLO model. You still have a single categorical Y outcome, so you can randomly shuffle its labels and assess the classification error rate.
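As a rough sketch of what that swap could look like, here is the same permutation loop using `block.splsda` on the `breast.TCGA` example data that ships with mixOmics. Several things here are assumptions rather than recommendations: the helper name `diablo_cv_error` is hypothetical, the `keepX` values are arbitrary, the number of permutations and repeats is kept small only to limit run time, and I am assuming the weighted-vote error rates returned by `perf` are stored per distance with an "Overall.ER" row, so please check `str()` of your own `perf` output before relying on the indexing.

```r
library(mixOmics)

# Example multi-block data (stands in for your own data sets)
data(breast.TCGA)
X <- list(mRNA    = breast.TCGA$data.train$mrna,
          miRNA   = breast.TCGA$data.train$mirna,
          protein = breast.TCGA$data.train$protein)
Y <- breast.TCGA$data.train$subtype

# Arbitrary illustrative sparsity: 10 variables per block on each component
keepX <- list(mRNA = c(10, 10), miRNA = c(10, 10), protein = c(10, 10))

# Hypothetical helper: fit DIABLO and return one cross-validated error rate
diablo_cv_error <- function(X, Y, keepX, ncomp = 2) {
  model <- block.splsda(X, Y, ncomp = ncomp, keepX = keepX)
  cv <- perf(model, validation = "Mfold", folds = 5, nrepeat = 1)
  # Assumed layout: one matrix per distance, rows include "Overall.ER",
  # columns are components; take the overall error rate on the last component
  cv$WeightedVote.error.rate$max.dist["Overall.ER", ncomp]
}

set.seed(123) # for reproducibility
real_error <- diablo_cv_error(X, Y, keepX)

n_permutations <- 20 # small to keep the run short; use more in practice
perf_permutations <- numeric(n_permutations)
for (i in seq_len(n_permutations)) {
  Y_perm <- sample(Y) # shuffle the single Y outcome; the blocks stay as they are
  perf_permutations[i] <- diablo_cv_error(X, Y_perm, keepX)
}

# Proportion of permuted models that do at least as well as the real model
p_value <- mean(perf_permutations <= real_error)
p_value
```

The point of the sketch is that a single error rate from the combined prediction is compared, so there is one `real_error` for the whole model and not one per data set.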