Explained variance of Y in splsda

Dear mixOmics team,

I know some questions were very close to the one I am asking, but I haven’t found the exact answer I am looking for.

When we use splsda, for instance, we can print the explained variance of X by component (which is not very interesting, since we care about Cov(X,Y) and not Var(X)), and the same for Y.
However, I can’t understand how it is possible to get the explained variance of a binary variable (Y), and why in the attached graph we see that the explained variance of Y is equal to 1 in the first component when we clearly see that comp1 does not perfectly discriminate the two groups.

Could you please help me understand this?


Hi @emile.mardoc

After doing some thinking and some reading, I think I’ve determined the cause of the explained variance equaling 1 for the first Y component.

In your scenario, the Y data frame is represented by a single variable (0 or 1 for each class). As far as the method is concerned, this is considered its own “block” - in the same way your various X blocks are treated.

Components generated for the X blocks use a combination of all the input features. For example, if you have three features, the loadings for the first component might be 0.3, 0.8 and 0.5. Using these weights in a linear combination of the input features allows us to represent all three features “simultaneously” with the one component.
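To make the linear combination concrete, here is a minimal sketch in plain Python. The loadings are the ones from the example above; the small data matrix `X` and the helper `component_scores` are hypothetical and only serve as illustration, they are not part of mixOmics.

```python
# Loadings from the example above: one weight per input feature.
loadings = [0.3, 0.8, 0.5]

# Hypothetical, already-centred data: 4 samples x 3 features.
X = [
    [ 1.0,  0.5,  0.2],
    [-1.0, -0.5, -0.2],
    [ 0.5,  1.0, -1.0],
    [-0.5, -1.0,  1.0],
]

def component_scores(rows, weights):
    """Return one score per sample: the weighted sum of its features."""
    return [sum(v * w for v, w in zip(row, weights)) for row in rows]

# Each sample is now summarised by a single number on the component.
scores = component_scores(X, loadings)
print(scores)
```

Each sample ends up with one score that mixes all three features, which is what lets a single component stand in for the whole block.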

Now when we try to do the same for the Y block, there is only a single variable to generate a component from. Therefore, the method just uses this variable as is (sometimes flipping the sign), so the resulting loading will just be 1 (or -1). Hence, when calculating the explained variance, the original Y data and the first Y component are essentially identical, meaning the proportion of explained variance is equal to 1. Note that this value only describes how well the component reproduces Y itself; it says nothing about how well the X components discriminate the two groups.
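The argument can be checked numerically with a small sketch. It assumes a simplified definition of explained variance (the share of the centred block's sum of squares reproduced by projecting the block onto the component), which may differ in detail from what mixOmics computes internally. The function names and the toy `y` vector are hypothetical.

```python
def centre(values):
    """Subtract the mean so the variable has mean zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def explained_variance_single_block(y, loading):
    """Proportion of variance of a one-variable block explained by its
    own component (assumed definition, see the text above)."""
    y_c = centre(y)
    # With one variable, the component is that variable times the loading.
    component = [v * loading for v in y_c]
    # Project the centred variable back onto the component.
    tt = sum(t * t for t in component)
    ty = sum(t * v for t, v in zip(component, y_c))
    fitted = [t * ty / tt for t in component]
    # Share of the total sum of squares captured by the projection.
    return sum(f * f for f in fitted) / sum(v * v for v in y_c)

# Toy binary outcome: three samples per class.
y = [0, 0, 0, 1, 1, 1]

ev_pos = explained_variance_single_block(y, 1.0)   # loading  1
ev_neg = explained_variance_single_block(y, -1.0)  # loading -1 (sign flip)
print(ev_pos, ev_neg)
```

Both calls give a proportion of 1, whatever the sign of the loading, because the component is just a rescaled copy of the only variable in the block.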

When calculating the explained variance for subsequent Y components, the process is a little more complicated and subject to a few different requirements. As a result, the second (and further) components are not identical to the Y vector, which gives an explained variance value lower than 1.

Hope this clarifies things a bit

Thank you @MaxBladen, it is clearer to me now, and it explains the problem I had with this output :slight_smile:

