Prediction results

Dear all,

I did a tuned sPLS-DA to discriminate between two groups. The group separation is clear and the per-class error rate is 0.01 for both classes. I am really happy with this, and it is consistent with the biological results for both groups and with other analyses.
However, when I try to predict the class of new data (3 samples per class), the classification correctly classifies one class but completely fails for the other (see below).
[image: prediction results]
I was so surprised by this that, as a trial, instead of passing new data as the test matrix I passed the same data I had used for training. The results were exactly the same.

Is this normal, or am I doing something wrong?

Thank you

Best regards

Fabien Filaire

Hi Fabien!

First of all, I am not part of the mixOmics team, so my answer may not be entirely accurate.

From what you describe, it is likely that you have built a classification model that is too tightly tuned to the training data (overfitting), so when you pass it the unlabelled test data, the model is unable to classify it well.

To begin with, are the groups balanced when you train the model? If they are not, I have used SMOTE to balance the groups, while keeping in mind the risks of balancing groups synthetically. I'll leave some material here in case you are interested in reading more about this.
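In case it helps to see the core idea: SMOTE creates synthetic minority samples by interpolating between a real minority sample and one of its nearest minority neighbours. mixOmics itself is an R package, so this is just a language-agnostic sketch in Python with k = 1 neighbour for brevity; the function name `smote_sketch` is made up for this example, and in practice you would use an existing implementation (e.g. `imbalanced-learn` in Python or `smotefamily` in R).

```python
import random
import math

def smote_sketch(X_min, n_new, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    between a chosen sample and its nearest minority neighbour
    (the core SMOTE idea, simplified to k = 1 for brevity)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(X_min)
        # nearest neighbour of x within the minority class (excluding x itself)
        nn = min((p for p in X_min if p is not x),
                 key=lambda p: math.dist(x, p))
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic

# toy example: a minority class of 3 samples, upsampled with 3 synthetic points
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_sketch(minority, n_new=3)
print(len(minority) + len(new_points))  # minority class is now size 6
```

The synthetic points always lie on segments between existing minority samples, which is exactly why Marta's caveat matters: they are plausible but not real observations.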

Best regards,
Marta


Hi Marta,

Thank you for your answer.

Yes, I assumed that too, but I am really surprised that it didn't work better the second time.
My groups are perfectly balanced… but thanks for the tip, I'll need something like this soon :)

Fabien

hi @Fabien-Filaire

@Margonmon is correct about the overfitting issue. We don't know the sample size of your training set, but basically the model is not generalising well to new data (it seems to be biased towards that first Ctrl class).
Of note: when we perform cross-validation (in tune and predict) we subsample so as to respect the class imbalance, and this is why we recommend reporting the Balanced Error Rate (BER) rather than the overall error rate.
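To illustrate why BER is the safer metric when classes are unbalanced: the overall error rate (ER) weights every sample equally, so a large, well-classified class can mask a complete failure on a small class, whereas BER averages the per-class error rates. This is a generic sketch, not mixOmics code; the function name `error_rates` is made up for the example.

```python
def error_rates(y_true, y_pred):
    """Return (overall error rate, balanced error rate)."""
    classes = sorted(set(y_true))
    # ER: fraction of all samples misclassified
    er = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
    # BER: mean of the per-class error rates
    per_class = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        per_class.append(sum(y_true[i] != y_pred[i] for i in idx) / len(idx))
    ber = sum(per_class) / len(per_class)
    return er, ber

# Unbalanced case: 9 "Ctrl" samples all correct, the single "Case" sample wrong
y_true = ["Ctrl"] * 9 + ["Case"]
y_pred = ["Ctrl"] * 10
print(error_rates(y_true, y_pred))  # (0.1, 0.5): BER exposes the failed class
```

When the classes are balanced, the two metrics coincide, which is why reporting BER is a safe default either way.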

Kim-Anh

Hi @kimanh.lecao ,

Thank you for your answer.
Actually, I changed my pre-processing and it now works better. I had variables that just added confusion without contributing anything.

I do not understand your point about BER. My understanding was to focus on BER when my initial groups were not balanced. Mine are (3 groups of 15 samples each).

Thank you again

Best regards

Fabien


Great @Fabien-Filaire,
I had no information about whether your groups were balanced or not, so it was a generic comment (BER and ER are the same if your classes are balanced anyway).

Kim-Anh