PLS-DA Amount of variance explained by components


I’m using PLS-DA to discriminate the modalities of a qualitative variable Y (here, kinetic time points) based on gene expression data. I am having some difficulty interpreting the % of variance explained by each component.
In my case the first component explains 45% of the variance and the second one 15%. In other cases I found that the first axis explained less variance than the others but led to a better discrimination of the groups.
From what I understood, the best axis is not necessarily the one that explains the highest amount of variance but the one that is the most discriminant. Is this correct?
So when the first axis both explains the highest amount of variance and is the most discriminant, can we say that the combinations of genes explaining the highest amount of variance between samples are also the most discriminant ones?

Thanks in advance for your help.

Florian Rocher

Hi @flrocher
Apologies for the late answer. After 200+ days in lockdown, we are not at our best in terms of productivity! :sob:

Yes, your interpretation is correct. The aim of PLS-DA is to discriminate the sample groups rather than to maximise the variance explained. When the component that explains the most variance also discriminates the classes, this is great: the main source of variation in the data coincides with the differences between sample groups. Otherwise, it means that the major source of variation is probably not due to the sample group differences.
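To make the distinction concrete, here is a minimal sketch in plain Python. It is not how mixOmics computes things internally: the data are synthetic, the names `gene_a` and `gene_b` are made up for illustration, and no scaling is applied. It relies only on the fact that, for a single centred two-class response, the first PLS weight vector is proportional to X^T y, and it compares that direction with the highest-variance axis.

```python
import math

# Synthetic, hypothetical data: 2 classes x 20 samples, 2 "genes".
# gene_a: large spread, identical in both classes (no class signal).
# gene_b: small spread, shifted by class (carries the class signal).
n_per_class = 20
X, y = [], []
for label in (0, 1):
    for i in range(n_per_class):
        gene_a = i - 9.5
        gene_b = (1.0 if label == 1 else -1.0) + 0.1 * ((i % 5) - 2)
        X.append([gene_a, gene_b])
        y.append(label)

n = len(X)


def centre(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]


cols = [centre([row[j] for row in X]) for j in range(2)]  # centred columns of X
yc = centre(y)                                            # centred class indicator

# First PLS weight vector for a single centred response: w is proportional to X^T y.
w = [sum(c * t for c, t in zip(col, yc)) for col in cols]
norm = math.sqrt(sum(v * v for v in w))
w = [v / norm for v in w]


def scores(direction):
    """Project every (centred) sample onto a unit direction."""
    return [sum(d * col[k] for d, col in zip(direction, cols)) for k in range(n)]


def variance_share(direction):
    """Share of the total variance of X captured along a unit direction."""
    total = sum(sum(v * v for v in col) for col in cols)
    return sum(s * s for s in scores(direction)) / total


def class_gap(direction):
    """Difference between the two class means of the scores."""
    s = scores(direction)
    mean1 = sum(v for v, lab in zip(s, y) if lab == 1) / n_per_class
    mean0 = sum(v for v, lab in zip(s, y) if lab == 0) / n_per_class
    return mean1 - mean0


top_variance_axis = [1.0, 0.0]  # gene_a carries almost all the variance here

share_discriminant = variance_share(w)
share_top_variance = variance_share(top_variance_axis)
gap_discriminant = class_gap(w)
gap_top_variance = class_gap(top_variance_axis)

print("PLS direction:", w)
print("variance share, PLS direction:", round(share_discriminant, 3))
print("variance share, top-variance axis:", round(share_top_variance, 3))
print("class gap, PLS direction:", round(gap_discriminant, 3))
print("class gap, top-variance axis:", round(gap_top_variance, 3))
```

In this toy case the discriminant direction captures only about 3% of the total variance yet separates the two classes cleanly, while the axis carrying about 97% of the variance shows no class difference at all. In real data the two usually overlap to some degree, which is when the first component is both the highest in explained variance and the most discriminant.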


Thanks a lot for your answer!