Hi everyone,
I am reading the literature on how to interpret MSEP/RMSEP/R2/Q2 for my sPLS models and I am at a loss with the maths! I understand that a lower MSEP/RMSEP indicates better predictive accuracy, a higher R2 suggests a better fit to the training data and a higher Q2 suggests better predictive ability. However, what does an MSEP of 1.04 ± 0.23, for example, actually mean?
Also, as I only have 10 samples, my Q2 starts close to 0 or even negative, and adding a 2nd component always makes it smaller/more negative. Does this mean I am (already) overfitting my data?
Thank you in advance!
Best wishes,
Evelyn
Hi @windsnowflake,
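There are no fixed rules as to which MSEP values mean your model has ‘good’ or ‘bad’ predictive ability. Instead, this depends on your dataset and on the scale of the Y variables you are predicting. MSEP is the mean of the squared differences between predicted and observed values, so an MSEP of 1.04 means that the average squared prediction error is about 1.04, in squared units of Y. Its square root, the RMSEP (about 1.02 here), is roughly the typical size of a prediction error in the original units of Y. Whether that is a big or small error depends on the range and variability of the Y variables you are predicting. If the ± 0.23 comes from repeated cross-validation, it describes how much the MSEP varies from one repeat to the next.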
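To show why the same MSEP can be "good" or "bad" depending on Y, here is a minimal Python sketch (the values are made up, and relative_msep is just one rough reference point I chose, not a standard output of any package). It computes MSEP, RMSEP, and MSEP divided by the variance of the observed Y, and shows that rescaling Y changes MSEP even though the predictions are equally good.

```python
import math

def msep(y_true, y_pred):
    """Mean squared error of prediction: average squared residual, in squared units of Y."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmsep(y_true, y_pred):
    """Root MSEP: a typical prediction error expressed in the original units of Y."""
    return math.sqrt(msep(y_true, y_pred))

def relative_msep(y_true, y_pred):
    """MSEP divided by the variance of observed Y. A value near or above 1 means the
    predictions do no better than always guessing the observed mean of Y."""
    mean_y = sum(y_true) / len(y_true)
    var_y = sum((t - mean_y) ** 2 for t in y_true) / len(y_true)
    return msep(y_true, y_pred) / var_y

# Hypothetical observed and predicted values, for illustration only
y_obs = [2.0, 4.0, 1.0, 5.0]
y_hat = [3.0, 3.0, 3.0, 5.0]
print(msep(y_obs, y_hat))           # 1.5 (squared units of Y)
print(rmsep(y_obs, y_hat))          # about 1.22 (original units of Y)
print(relative_msep(y_obs, y_hat))  # 0.6

# Same predictions, but Y measured in units 10 times smaller: MSEP grows 100-fold
y_obs10 = [10 * v for v in y_obs]
y_hat10 = [10 * v for v in y_hat]
print(msep(y_obs10, y_hat10))       # 150.0
print(relative_msep(y_obs10, y_hat10))  # still 0.6

print(math.sqrt(1.04))              # about 1.02: the RMSEP for an MSEP of 1.04
```

The relative version is unit-free, which is why it can be easier to judge than the raw MSEP.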
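Your interpretation is correct: if adding a 2nd component consistently makes your Q2 smaller or more negative, you are probably overfitting, which is a common problem with small sample sizes. A Q2 that is already close to 0 or negative for the first component also suggests that even that component adds little predictive ability.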
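To make Q2 a bit more concrete, here is a minimal Python sketch of its general form, Q2 = 1 − PRESS/TSS, where PRESS is the sum of squared errors of leave-one-out predictions and TSS is the sum of squares of Y around its mean. This is an assumption-laden simplification: many PLS implementations compute Q2 component by component, relative to the residuals of the previous component, so it will not reproduce their numbers exactly. fit_mean and fit_line are hypothetical stand-in models (not sPLS), and the data are made up.

```python
def press(y_true, y_pred_cv):
    """Prediction error sum of squares from cross-validated (held-out) predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred_cv))

def q2(y_true, y_pred_cv):
    """Q2 = 1 - PRESS / TSS, with TSS the sum of squares of Y around its mean."""
    mean_y = sum(y_true) / len(y_true)
    tss = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - press(y_true, y_pred_cv) / tss

def loo_predictions(x, y, fit):
    """Leave-one-out: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(y)):
        predict = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(x[i]))
    return preds

def fit_mean(x_train, y_train):
    """Hypothetical null model: always predicts the training mean (ignores x)."""
    m = sum(y_train) / len(y_train)
    return lambda _x: m

def fit_line(x_train, y_train):
    """Hypothetical one-predictor least-squares model: y = a + b * x."""
    n = len(x_train)
    mx = sum(x_train) / n
    my = sum(y_train) / n
    sxx = sum((xi - mx) ** 2 for xi in x_train)
    if sxx == 0:
        raise ValueError("x has no variation in the training set")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x_train, y_train))
    b = sxy / sxx
    a = my - b * mx
    return lambda x_new: a + b * x_new

# Hypothetical data with 10 samples
x = [float(i) for i in range(1, 11)]
y = [3.0, 7.0, 2.0, 9.0, 4.0, 6.0, 1.0, 8.0, 5.0, 10.0]
print(q2(y, loo_predictions(x, y, fit_mean)))  # about -0.235, i.e. 1 - (10/9)**2
print(q2(y, loo_predictions(x, y, fit_line)))
```

One thing this illustrates: with leave-one-out and n = 10, a model that simply predicts the mean gets Q2 = 1 − (10/9)² ≈ −0.23, because each held-out sample is excluded from the mean used to predict it. So with very few samples, a Q2 near zero or slightly negative is close to what a model with no real predictive signal would give.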
Based on the other information you’ve shared about your analysis, and given that you have only 10 samples, I wonder whether it is worth reducing the complexity of the analysis: an sPLS-DA with a single grouping variable (such as before/after treatment) might be more suitable than a PLS that tries to account for more information. You can always try a few approaches and see which makes more sense when you look at the plots and the performance assessment outputs.
Hope that helps!
Eva