Explained variance of Y in splsda

emile.mardoc · July 1, 2022, 8:34am

Dear mixOmics team,

I know some questions were very close to the one I am asking, but I haven’t found the exact answer I am looking for.

When we use for instance splsda, we can print the explained variance of X (which is not very interesting as we are interested in Cov(X,Y) and not Var(X) ) by component, and the same for Y.
However, I can’t understand how it is possible to get the explained variance of a binary variable (Y), and why in the attached graph we see that the explained variance of Y is equal to 1 in the first component when we clearly see that comp1 does not perfectly discriminate the two groups.

Could you please help me explain this?

Best,
Emile

MaxBladen · October 16, 2022, 11:53pm

Hi @emile.mardoc

After doing some thinking and some reading, I think I’ve determined the cause of the explained variance equaling 1 for the first Y component.

In your scenario, the Y dataframe is represented by a single variable (0 or 1 for each class). As far as the method is concerned, this is considered its own “block”, in the same way your various X blocks are treated.

Components generated for the X blocks use a combination of all the input features. For example, if you have three features, the loadings for the first component might be 0.3, 0.8 and 0.5. Using these weights in a linear combination of the input features allows us to represent all three features “simultaneously” with the one component.
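To make the linear combination concrete, here is a minimal sketch in plain Python. The data matrix is made up for illustration and `component_scores` is a hypothetical helper, not a mixOmics function; only the loadings 0.3, 0.8 and 0.5 come from the example above.

```python
# Hypothetical data: 4 samples x 3 features (values invented for illustration).
X = [
    [1.0, 2.0, 0.5],
    [0.0, 1.0, 1.5],
    [2.0, 0.5, 1.0],
    [1.5, 1.5, 0.0],
]

# Example loadings quoted in the post, one weight per feature.
loadings = [0.3, 0.8, 0.5]


def component_scores(rows, weights):
    """Return one score per sample: the weighted sum of its features."""
    return [sum(x * w for x, w in zip(row, weights)) for row in rows]


scores = component_scores(X, loadings)
print(scores)  # one value per sample, summarising all three features
```

Each sample ends up with a single number that mixes all three features, which is what "representing them simultaneously" means here.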

Now when we try to do the same for the Y block, there is only a single variable to generate a component from. Therefore, it just uses this variable as is (sometimes flipping the sign), so the resulting loading will just be 1 (or -1). Hence, when calculating the explained variance, the original Y data and the first Y component are essentially identical, meaning the proportion of explained variance is equal to 1. Note that this value only says the component reproduces Y itself; it says nothing about how well the X components separate the two groups.
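A small sketch of why the ratio comes out as 1, in plain Python. The class vector is hypothetical, and "explained variance" is measured here as the squared correlation between Y and the component. That is an assumption made for illustration and not necessarily the exact formula mixOmics uses.

```python
def center(v):
    """Subtract the mean from every entry."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def explained_variance(y, comp):
    """Share of the variance of y captured by comp (squared correlation)."""
    yc, cc = center(y), center(comp)
    return dot(yc, cc) ** 2 / (dot(yc, yc) * dot(cc, cc))


# Hypothetical binary outcome: three samples per class.
y = [0, 0, 0, 1, 1, 1]

# With a single Y variable the loading can only be 1 or -1,
# so the component is the centered Y, possibly sign-flipped.
ratios = {}
for loading in (1.0, -1.0):
    comp = [loading * v for v in center(y)]
    ratios[loading] = explained_variance(y, comp)

print(ratios)  # both ratios equal 1 up to floating-point rounding
```

The sign flip does not matter because the correlation is squared.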

When calculating the explained variance for subsequent Y components, the process is a little more complicated and subject to a few different requirements. Hence, the second (and further) components are not identical to the Y vector, resulting in an explained variance value lower than 1.
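As a rough illustration of why later components fall below 1, the sketch below assumes that Y is deflated by regressing it on the first X component before the second Y component is taken. This is my assumption about the general mechanism, not a copy of the mixOmics internals. The vector `t1` is a made-up X component that separates the classes only imperfectly, and the squared-correlation measure is likewise an assumption.

```python
def center(v):
    """Subtract the mean from every entry."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def explained_variance(y, comp):
    """Share of the variance of y captured by comp (squared correlation)."""
    yc, cc = center(y), center(comp)
    return dot(yc, cc) ** 2 / (dot(yc, yc) * dot(cc, cc))


# Hypothetical binary outcome and a hypothetical first X component.
y = [0, 0, 0, 1, 1, 1]
t1 = [-1.2, -0.4, 0.3, -0.1, 0.6, 0.8]  # imperfect class separation

yc, tc = center(y), center(t1)

# Assumed deflation step: remove from Y the part predicted by t1.
slope = dot(yc, tc) / dot(tc, tc)
y_residual = [a - slope * b for a, b in zip(yc, tc)]

# With one Y variable, the second Y component is the residual itself.
share_comp2 = explained_variance(y, y_residual)
r2_y_t1 = explained_variance(y, t1)

print(share_comp2)  # strictly between 0 and 1
```

Under these assumptions the second share equals one minus the squared correlation between Y and the first X component, so it is below 1 whenever that component carries any information about the classes.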

Hope this clarifies things a bit

emile.mardoc · October 24, 2022, 9:10am

Thank you @MaxBladen, it is clearer to me now, and it explains the problem I had with this output.

Best,
Emile
