Hello,

I’m using PLS-DA to discriminate between the levels of a qualitative variable Y (here, the time points of a kinetic experiment) based on gene expression data. I am having difficulty interpreting the percentage of variance explained by each component.

In my case, the first component explains 45% of the variance and the second one 15%. In other cases, I have found that the first axis explained less variance than the others but discriminated the groups better.

From what I understand, the best axis is not necessarily the one that explains the most variance but the one that is the most discriminant. Is this correct?
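
To make the distinction concrete, here is a minimal toy sketch in plain Python. It is not PLS-DA itself: the two hand-made axes, the group labels `t0`/`t1`, and the helper names `variance_share` and `fisher_ratio` are all assumptions for illustration. It only shows that an axis can carry most of the variance while another, lower-variance axis separates the groups better.

```python
import random
import statistics


def variance_share(scores_by_axis):
    """Fraction of the total variance carried by each axis."""
    variances = [statistics.pvariance(axis) for axis in scores_by_axis]
    total = sum(variances)
    return [v / total for v in variances]


def fisher_ratio(scores, labels):
    """Between-group variance divided by within-group variance on one axis."""
    groups = {}
    for score, label in zip(scores, labels):
        groups.setdefault(label, []).append(score)
    grand_mean = statistics.fmean(scores)
    n = len(scores)
    between = sum(
        len(v) * (statistics.fmean(v) - grand_mean) ** 2 for v in groups.values()
    ) / n
    within = sum(len(v) * statistics.pvariance(v) for v in groups.values()) / n
    return between / within


rng = random.Random(0)  # fixed seed so the toy data is reproducible
labels = ["t0"] * 50 + ["t1"] * 50  # hypothetical time points

# Axis A: large spread that is unrelated to the groups.
axis_a = [rng.gauss(0.0, 5.0) for _ in labels]
# Axis B: small spread, but shifted according to the group.
axis_b = [rng.gauss(0.0 if g == "t0" else 2.0, 0.5) for g in labels]

shares = variance_share([axis_a, axis_b])
ratios = [fisher_ratio(axis_a, labels), fisher_ratio(axis_b, labels)]

print("share of variance per axis:", shares)
print("between/within ratio per axis:", ratios)
```

In this toy setting, axis A dominates the variance while axis B has the higher between/within ratio, which is the situation I am describing for some of my other datasets.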

So, when the first axis both explains the most variance and is the most discriminant, can we say that the combinations of genes explaining the most variance between samples are also the most discriminant ones?

Thanks in advance for your help.

Florian Rocher