Interpretation of MSEP/RMSEP/R2/Q2

Hi everyone,

I am reading in the literature about how to interpret MSEP/RMSEP/R2/Q2 for my sPLS models and I am at a loss with the maths! I understand that lower MSEP/RMSEP indicate a better predictive accuracy, a higher R2 suggests a better fit to the training data and a higher Q2 suggests a better predictive ability. However, what does getting an MSEP of 1.04 ± 0.23 for example actually mean?

Also, as I only have 10 samples, my Q2 starts close to 0 or even negative, and adding a 2nd component always makes it smaller/more negative. Does this mean I am (already) overfitting my data?

Thank you in advance!

Best wishes,
Evelyn

Hi @windsnowflake,

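There are no fixed rules as to which MSEP values mean your model has ‘good’ or ‘bad’ predictive ability; this depends on your dataset and the scale of the Y variables you are predicting. MSEP is the average squared difference between the predicted and actual values, so an MSEP of 1.04 is expressed in squared units of Y. Taking the square root gives the RMSEP, about 1.02 here, which means your model’s predictions are typically off by roughly 1 unit of Y. Whether this is a big or small difference depends on the range of the Y variables you are predicting. The ± 0.23 is presumably the variability of the MSEP across your cross-validation folds or repeats.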
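To make the arithmetic concrete, here is a minimal Python sketch of how MSEP and RMSEP relate. The numbers are made up for illustration and are not from your data, and the helper names `msep` and `rmsep` are hypothetical, not functions from any package.

```python
import math

def msep(y_true, y_pred):
    """Mean squared error of prediction: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmsep(y_true, y_pred):
    """Root of the MSEP, expressed in the same units as Y."""
    return math.sqrt(msep(y_true, y_pred))

# Made-up observed values and cross-validated predictions (illustration only).
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 3.0, 7.0, 7.0]

print(msep(y_true, y_pred))   # every prediction is off by 1 unit, so MSEP = 1.0
print(rmsep(y_true, y_pred))  # RMSEP = 1.0, in the units of Y

# An MSEP of 1.04 corresponds to an RMSEP of about 1.02 units of Y.
print(round(math.sqrt(1.04), 2))
```

If Y ranges over, say, 0 to 100, an error of about 1 unit is small; if Y ranges over 0 to 2, the same error is large.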

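Your interpretation is correct: if adding a 2nd component consistently makes your Q2 smaller or more negative, then you are probably overfitting, which is a common problem with a small sample size. A Q2 at or below 0 on the first component already suggests the model predicts left-out samples no better than the mean of Y would.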
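As a rough illustration of why Q2 can go negative, here is a minimal Python sketch using one common simplified definition, Q2 = 1 - PRESS / TSS. This is an assumption for illustration: your software may compute Q2 per component against the residuals of the previous component instead, so the exact values will differ. The function name `q2` and all numbers are hypothetical.

```python
def q2(y_true, y_cv_pred):
    """Simplified Q2: 1 - PRESS / TSS.

    y_cv_pred holds the prediction for each sample made while that sample
    was left out of training (e.g. leave-one-out cross-validation).
    """
    mean_y = sum(y_true) / len(y_true)
    # PRESS: squared error of the held-out predictions.
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_cv_pred))
    # TSS: squared error you would get by always predicting the mean of Y.
    tss = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - press / tss

# Made-up values (illustration only).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

good_pred = [1.5, 2.5, 2.5, 3.5, 4.5]  # close to the truth
mean_pred = [3.0, 3.0, 3.0, 3.0, 3.0]  # always predicts the mean
poor_pred = [4.0, 1.0, 5.0, 2.0, 4.0]  # worse than predicting the mean

print(q2(y_true, good_pred))  # clearly positive
print(q2(y_true, mean_pred))  # exactly 0
print(q2(y_true, poor_pred))  # negative
```

The point of the sketch is only the sign: once the held-out predictions are worse than the mean of Y, PRESS exceeds TSS and Q2 drops below 0.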

Based on the other information you’ve shared about your analysis and the fact that you have 10 samples, I am wondering whether it is worth reducing the complexity of the analysis: running an sPLS-DA with just one covariate (like before/after treatment) might be more suitable than a PLS trying to account for more information. You can always try a few things and see which makes more sense when you look at the plots and performance assessment outputs :slight_smile:

Hope that helps!
Eva