I am using sPLS regression for the first time and I wonder what the criteria are for a good MAE/MSE and when to use which.
Thank you for your support!
Unfortunately, I have to answer a very general question with a very general answer: it depends!
There is no specific error rate that is deemed “appropriate” in all contexts. For example, as sample size decreases, the error rate that many researchers are willing to accept is likely to increase (i.e. fewer samples → greater acceptable error rate). If you misclassify 1 sample out of 50, that’s a 2% error rate. Misclassifying 1 sample out of 10 is an error rate of 10%. I think you can see what I’m getting at.
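To make the arithmetic concrete, here is a minimal Python sketch (the helper name `error_rate` is made up for illustration and is not part of mixOmics). It only shows how the same single mistake weighs more heavily as the sample size shrinks:

```python
def error_rate(misclassified: int, total: int) -> float:
    """Fraction of samples that were misclassified (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return misclassified / total


# One misclassified sample, two different sample sizes.
for n in (50, 10):
    print(f"1 error out of {n} samples -> {error_rate(1, n):.0%}")
# 1 error out of 50 samples -> 2%
# 1 error out of 10 samples -> 10%
```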
I am very hesitant to provide a specific value as a good starting point, as I have no information about your study design or data to inform a recommendation.
I cannot stress this enough: you need to determine your threshold by doing a literature review and exploring your data yourself. There are no criteria which work in all contexts!
Thank you for getting back to me! Your explanation was very helpful, as I did not know that the unit was percent. I have been working with two small datasets (the first with n = 40, the second with n = 20). Based on the first plot attached, I thought the unit might be a fraction, i.e. %/100 with a maximum of 1, which would put my model below 20%. But then, seeing the other model, I got confused, as the MAE was reaching above 1. So, as you are telling me now, the MAE in my plots is below 0.2% in the first and 0.83% in the second, is that correct? Just double-checking, as I find this very low. Also, looking at the first plot, it seems to me that selecting just one feature gives me a model just as good as selecting almost 200? And is it unusual for the MAE to increase as it does in my second plot?
Again, thank you for your help!
You’ll have to forgive me. I got myself confused as I was writing a few other responses at the same time. I did not mean to write the “20%”. I’ve amended it to prevent further confusion.
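The MAE and MSE measurements are NOT percentages. They are averaged measures of the deviation between the predicted regression value and the true value: MAE averages the absolute deviations and MSE averages the squared deviations. MAE is therefore in the same units as your response variable, and MSE is in those units squared.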
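As a rough illustration of why these values are not bounded by 1, here is a small Python sketch using the standard definitions of MAE and MSE. The numbers are made up and the functions are plain Python, not mixOmics code:

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (true - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up response values and predictions, purely for illustration.
y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [11.0, 11.0, 17.0, 16.0]

print(mae(y_true, y_pred))  # 2.0
print(mse(y_true, y_pred))  # 5.5

# Expressing the same response in units 1000 times smaller scales the MAE
# by 1000 and the MSE by 1000^2, although the model is no better or worse.
y_true_scaled = [1000 * t for t in y_true]
y_pred_scaled = [1000 * p for p in y_pred]
print(mae(y_true_scaled, y_pred_scaled))  # 2000.0
print(mse(y_true_scaled, y_pred_scaled))  # 5500000.0
```

Because both metrics follow the scale of the response, an MAE above 1 is not unusual in itself.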
The scale of your MAE/MSE is very case dependent, which is why I was trying to stress that there is no specific threshold which can be regarded as “good” in all contexts.
Thank you, Max. So, based on my two plots, how can I evaluate my model performance then? Or is there any other way? I have run a lot of classification models with mixOmics, so I am used to evaluating the error rate, but now I would like to run some regression models as well.
These are just tuning plots, there to help you decide how many features to use to construct each component. You can think of MAE/MSE as akin to error rate, but for regression contexts. We use it to select the optimal number of features (the number which minimises MAE/MSE).
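Conceptually, reading a tuning plot this way boils down to picking the candidate number of features with the lowest cross-validated error. A minimal Python sketch of that idea follows. The MAE values are invented for illustration, the variable names are hypothetical, and this is not how mixOmics itself is implemented:

```python
# Hypothetical cross-validated MAE for each candidate number of features
# on one component (invented values, not taken from the plots above).
cv_mae = {1: 0.21, 5: 0.19, 10: 0.18, 50: 0.20, 200: 0.22}

# Pick the number of features with the smallest cross-validated MAE.
best_n_features = min(cv_mae, key=cv_mae.get)
print(best_n_features)  # 10

# When several candidates are nearly tied, the curve is flat and a much
# smaller feature set may perform about as well as a large one.
tolerance = 0.03  # assumed threshold, chosen only for this example
near_best = [k for k, v in cv_mae.items() if v - cv_mae[best_n_features] <= tolerance]
print(near_best)  # [1, 5, 10, 50]
```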
So, based on my two plots, how can I evaluate my model performance then?
The error metrics of a regression model are really only useful when comparing models to one another. A single MAE/MSE value (or any related metric) on its own tells you very little, because there is nothing to compare it against.
My best suggestion is to build a baseline model, as well as your optimised model, and use MAE/MSE to evaluate the degree of improvement. Additionally, you could look at the R-squared metric.
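One way to sketch this comparison in Python is shown below. It assumes the baseline is a model that always predicts the mean of the training response, which is a common choice but only one option. All numbers are made up and the function names are illustrative:

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up data: training responses, held-out responses, and the
# predictions of an optimised model on the held-out samples.
y_train = [9.0, 13.0, 14.0, 18.0]
y_test = [10.0, 12.0, 15.0, 20.0]
model_pred = [11.0, 11.0, 17.0, 16.0]

# Assumed baseline: always predict the training mean.
train_mean = sum(y_train) / len(y_train)
baseline_pred = [train_mean] * len(y_test)

baseline_mae = mae(y_test, baseline_pred)
model_mae = mae(y_test, model_pred)
improvement = 1 - model_mae / baseline_mae

print(f"baseline MAE: {baseline_mae:.2f}")  # baseline MAE: 3.25
print(f"model MAE:    {model_mae:.2f}")     # model MAE:    2.00
print(f"relative improvement: {improvement:.1%}")
print(f"model R-squared: {r_squared(y_test, model_pred):.3f}")
```

The relative improvement over the baseline is easier to interpret than either MAE by itself, since it does not depend on the units of the response.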