Making sense of sPCA output v.2

DjamilaE · October 20, 2022, 11:28pm

Hi mixOmics team!

I’m interested in using the sPCA function on a microbiota dataset consisting of 153 genera and 158 samples. I’ve managed to run this successfully, with 2 components of interest, but I wanted to double-check that I’m correctly interpreting the various parts of the final.spca output:

the variable ‘X’ represents each sample’s original (mean-centred?) abundance of each genus
the variable ‘loadings’ represents the weights (or coefficients) assigned to each of the genera to determine their contribution to each of the given components

I’m following a paper’s methods for generating scores on PCs which states that:
“a participant’s score for PC1 = (their loading of genus 1 on PC1 x their relative abundance of genus 1) + (their loading of genus 2 on PC1 x their relative abundance of genus 2) + etc.” - and then the whole process is repeated for the other PCs. I am guessing that the wording is perhaps a little bit misleading, because “their loading of genus 1 on PC1” makes it seem like they mean every sample’s individual loading, which I don’t think is the case. Rather, I think it’s “the” loading of genus 1 on PC1 which is then multiplied by each individual sample’s rel. abundance of that genus. Does that sound right?
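To make the formula concrete, here is a minimal Python sketch. It is illustrative only: the thread itself uses R/mixOmics, and the function name `component_scores` and the toy numbers are made up. It assumes `abundance` has been preprocessed exactly like the data the loadings were estimated from. It shows that each component has one loading vector shared by every sample, and that the per-sample weighted sum is simply a matrix product.

```python
import numpy as np

def component_scores(abundance, loadings):
    """Score each sample on each component as a loading-weighted sum (hypothetical helper).

    abundance: (n_samples, n_genera) array, one row per sample
    loadings:  (n_genera, n_components) array, one column per component;
               the same loading vector is used for every sample
    """
    n_samples, n_genera = abundance.shape
    n_components = loadings.shape[1]
    scores = np.zeros((n_samples, n_components))
    for i in range(n_samples):
        for k in range(n_components):
            # (loading of genus 1 on PC k x abundance of genus 1) + (genus 2 ...) + ...
            scores[i, k] = sum(loadings[j, k] * abundance[i, j] for j in range(n_genera))
    return scores

# toy data: 3 samples x 4 genera, 2 components (made-up numbers; zeros mimic sparse loadings)
abundance = np.array([[0.10, 0.20, 0.30, 0.40],
                      [0.40, 0.30, 0.20, 0.10],
                      [0.25, 0.25, 0.25, 0.25]])
loadings = np.array([[0.5, 0.0],
                     [0.5, 0.0],
                     [0.0, 0.7],
                     [-0.7, 0.7]])

scores = component_scores(abundance, loadings)
# the explicit double loop gives the same result as a matrix product
print(np.allclose(scores, abundance @ loadings))  # True
```

In practice the one-line `abundance @ loadings` is the idiomatic form; the loop is there only to mirror the wording of the paper's formula.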

I’m also unsure what the variable ‘variates’ is for. In the glossary it states that “variates are essentially synonymous with components” in the context of CCAs. Does this mean I should ignore this variable since I’m running sPCA? I also noticed that the values for ‘variates’ are not the same as for ‘loadings’ so I just wanted to check what each variable represents in this case.

I hope all of my questions make sense!

Thanks so much in advance.

DJ

MaxBladen · October 25, 2022, 10:53pm

the variable ‘X’ represents each sample’s original (mean-centred?) abundance of each genus

This variable holds your input X data frame after centering and/or scaling, if those were applied (see the center and scale parameters).
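As a rough illustration of what centering and scaling do, the hypothetical helper below is modelled on R's base `scale()` function. That this is what the center and scale options map onto is an assumption, not something confirmed in this thread. Centering subtracts each column's (genus's) mean across samples. Scaling divides each column by its root-mean-square with an n - 1 denominator, which equals the sample standard deviation once the column has been centered.

```python
import numpy as np

def center_scale(X, center=True, scale=False):
    """Column-wise (per-feature) preprocessing, modelled on R's scale() (assumption)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if center:
        # subtract each feature's mean across samples
        X = X - X.mean(axis=0)
    if scale:
        # divide by sqrt(sum(x^2) / (n - 1)); after centering this is the sample SD
        # (a constant column would give a zero divisor here)
        X = X / np.sqrt((X ** 2).sum(axis=0) / (n - 1))
    return X

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])
Xc = center_scale(X)                 # defaults mirror center = TRUE, scale = FALSE
Xcs = center_scale(X, scale=True)
print(Xc.mean(axis=0))               # column means are ~0
print(Xcs.std(axis=0, ddof=1))       # column SDs are ~1
```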

Rather, I think it’s “the” loading of genus 1 on PC1 which is then multiplied by each individual sample’s rel. abundance of that genus. Does that sound right?

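Your interpretation sounds correct to me: there is a single loading per genus and component, and it is multiplied by each sample's abundance of that genus.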

Does this mean I should ignore this variable since I’m running sPCA?

No, these are your components, or rather, the projection of your samples onto the components.
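The relationship between loadings and variates can be illustrated with ordinary (non-sparse) PCA in Python on random toy data. The loadings have one row per feature (genus) and the variates have one row per sample. The variates are the centered data projected onto the loadings, which is why the two sets of values differ. With sPCA the loadings are sparse, and sparse PCA implementations typically compute later components on deflated data. How mixOmics arrives at its variates may therefore not reduce to this single product for every component, so treat this as an illustration of the idea rather than a reimplementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))          # 10 samples x 5 features (random toy data)
Xc = X - X.mean(axis=0)               # column-centre, as center = TRUE would

# ordinary PCA via the SVD: Xc = U S Vt
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
n_comp = 2
loadings = Vt[:n_comp].T              # (features x components): one weight per feature per component
variates = Xc @ loadings              # (samples x components): each sample projected onto the components

# the projection equals the left singular vectors scaled by the singular values
print(np.allclose(variates, U[:, :n_comp] * S[:n_comp]))  # True
print(loadings.shape, variates.shape)                      # (5, 2) (10, 2)
```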

DjamilaE · November 7, 2022, 3:33am

Thanks Max, that’s really helpful!

One more question: is it necessary to do centering and scaling if the abundance data is already CLR transformed? I can see that the default settings are TRUE for ‘center’ and FALSE for ‘scale’ but I was wondering what the advantage would be to centering and, potentially, scaling the data on top of CLR?

Thanks so much!

MaxBladen · November 8, 2022, 10:18pm

You don’t need to center it (the “C” in “CLR” stands for centered). Scaling is case-dependent: if your features have wildly different scales, then yes. If anything, you should potentially scale it prior to the CLR transformation.
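For reference, the CLR transform itself can be sketched in a few lines of Python. The pseudocount of 1 used to avoid log(0) is an arbitrary illustrative choice, and the function name `clr` and the counts are made up. The first print confirms that CLR centres each sample across its genera. The second shows that this is not the same as centring each genus across samples, which is what PCA's column-wise centering does.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform, applied row-wise (one row per sample).

    A pseudocount is added first because log(0) is undefined; 1.0 is an
    arbitrary illustrative choice.
    """
    log_x = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # subtract each sample's mean log-abundance (the log of its geometric mean)
    return log_x - log_x.mean(axis=1, keepdims=True)

counts = np.array([[10, 0, 5, 85],
                   [30, 20, 25, 25],
                   [1, 50, 40, 9]])
Z = clr(counts)
print(Z.mean(axis=1))  # ~0 for every sample: CLR centres within each sample
print(Z.mean(axis=0))  # per-genus means across samples are not 0 here
```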

DjamilaE · November 11, 2022, 5:38am

Thanks very much, Max!
