Testing/comparing two models

Malin · November 28, 2022, 2:49pm

Hello! I use mixOmics for research with 40 blood cell variables. Thank you for this very useful package! I would like to prove that analyzing data for men and women separately yields better model performance. I have a binary outcome variable and use sPLS-DA. Any suggestions on how I can best substantiate this statistically, rather than by eye-balling the results? The error rate for the women-only model is lower than for the pooled data; can I also provide statistical support for this? Thank you very much in advance! Best regards, Malin

MaxBladen · November 28, 2022, 9:03pm

A brief note: you state you “would like to prove that…”. Be very careful about actively trying to “prove” something, rather than assessing whether your hypothesis is correct or incorrect. Especially with tools like those in mixOmics, if you approach your analysis with this mindset you are likely to find the results you want to find, rather than those that are really there.

The only way I could think to statistically assess whether the model is significantly better would be the following process:

1. Randomly split the data into training and testing sets, stratified by the gender of the samples.
2. Using the training samples, generate a model with all samples, a model with just male samples and a model with just female samples.
3. Assess the predictive performance of each of these models on the testing samples and extract the error rate.
4. Repeat this process many times. I would suggest a minimum of 100 times.
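
The steps above can be sketched roughly as follows. This is a Python illustration of the resampling loop only, not mixOmics code: the nearest-centroid classifier is a stand-in for sPLS-DA, the data are synthetic, and every function name here (`stratified_split`, `fit_centroids`, `repeated_errors`, `toy_data`) is hypothetical. In R, the fitting and prediction steps would be replaced by mixOmics' `splsda()` and `predict()`.

```python
import random
from statistics import mean


def toy_data(n=120, p=5, seed=1):
    """Synthetic data for illustration only: p variables, binary outcome, sex label."""
    rng = random.Random(seed)
    X, y, sex = [], [], []
    for i in range(n):
        label = (i // 2) % 2
        X.append([rng.gauss(float(label), 1.0) for _ in range(p)])
        y.append(label)
        sex.append("F" if i % 2 == 0 else "M")
    return X, y, sex


def stratified_split(groups, test_frac, rng):
    """Step 1: hold out test_frac of the samples within each group (here: sex)."""
    train, test = [], []
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rng.shuffle(idx)
        n_test = max(1, round(test_frac * len(idx)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def fit_centroids(X, y, idx):
    """Stand-in classifier (NOT sPLS-DA): one mean vector per class."""
    centroids = {}
    for label in sorted(set(y[i] for i in idx)):
        rows = [X[i] for i in idx if y[i] == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def error_rate(centroids, X, y, idx):
    """Step 3: fraction of the given test samples that are misclassified."""
    return sum(predict(centroids, X[i]) != y[i] for i in idx) / len(idx)


def repeated_errors(X, y, sex, n_repeats=100, test_frac=0.3, seed=0):
    """Steps 1-4: one error rate per repeat for the pooled, female and male models."""
    rng = random.Random(seed)
    errors = {"pooled": [], "female": [], "male": []}
    for _ in range(n_repeats):
        train, test = stratified_split(sex, test_frac, rng)
        subsets = {
            "pooled": (train, test),
            "female": ([i for i in train if sex[i] == "F"],
                       [i for i in test if sex[i] == "F"]),
            "male": ([i for i in train if sex[i] == "M"],
                     [i for i in test if sex[i] == "M"]),
        }
        for name, (tr, te) in subsets.items():
            model = fit_centroids(X, y, tr)  # step 2: fit on training samples only
            errors[name].append(error_rate(model, X, y, te))
    return errors


X, y, sex = toy_data()
errors = repeated_errors(X, y, sex, n_repeats=100)
for name, errs in errors.items():
    print(name, round(mean(errs), 3))
```

All three models share the same split within a repeat, so their error rates are matched pair-wise across repeats. One design choice to consider: the pooled model is scored here on all test samples, whereas scoring it on the female-only (or male-only) test samples would compare the models on exactly the same individuals.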

From here, you will have three distributions of error rates, one for all samples, one for male samples and one for female samples. Now, you can apply a t-test or some sort of ANOVA to determine if the difference in mean error rate between these model types is significant.
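
As a minimal sketch of that comparison, assuming the error rates of two models were recorded on the same splits, a paired t statistic can be computed on the per-repeat differences. The function name `paired_t` and the numbers in the usage lines are hypothetical placeholders, not results. The p-value uses a normal approximation to the t distribution, which is only reasonable with many repeats; in R, `t.test(a, b, paired = TRUE)` gives the exact version.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_t(a, b):
    """Paired t statistic and approximate two-sided p-value for matched error rates."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 repeats")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    # Normal approximation to the t distribution (assumption: many repeats)
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Placeholder values purely to show the call; real input would be
# ~100 error rates per model from the repeated splits.
pooled_errors = [0.30, 0.40, 0.50, 0.40]
female_errors = [0.20, 0.20, 0.30, 0.30]
t_stat, p_value = paired_t(pooled_errors, female_errors)
print(t_stat, p_value)
```

Because every repeat draws from the same samples, the repeats are not independent, so a p-value obtained this way is likely to be optimistic and is better read as a rough guide than as a formal test.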

This introduces a host of assumptions and isn’t the most rigorous procedure, but I can’t think of anything else. I would also explore different classification models, unless you are specifically interested in the sPLS-DA algorithm.

Malin · November 29, 2022, 9:12am

Hi Max, thank you very much for your quick response! You are completely right, the phrasing was incorrect. Based on different classification models, a better model performance with sex-stratified data emerged, therefore I was already a little further along in my thought process/phrasing :-). For now I prefer sPLS-DA because of the lasso integration, variable selection and visualization capabilities. And I will proceed to do the error rate analysis, thank you for the suggestion! Malin
