Making sense of sPCA output v.2

Hi mixOmics team!

I’m interested in using the sPCA function on a microbiota dataset consisting of 153 genera and 158 samples. I’ve managed to run this successfully, with 2 components of interest, but I wanted to double check that I’m understanding all of the various final.spca output generated:

  • the variable ‘X’ represents each sample’s original (mean-centred?) abundance of each genus
  • the variable ‘loadings’ represents the weights (or coefficients) assigned to each of the genera to determine their contribution to each of the given components

I’m following a paper’s methods for generating scores on PCs which states that:
“a participant’s score for PC1 = (their loading of genus 1 on PC1 x their relative abundance of genus 1) + (their loading of genus 2 on PC1 x their relative abundance of genus 2) + etc.”, and then the whole process is repeated for the other PCs. I am guessing that the wording is a little misleading, because “their loading of genus 1 on PC1” makes it sound like every sample has its own loading, which I don’t think is the case. Rather, I think it’s “the” loading of genus 1 on PC1, which is then multiplied by each individual sample’s relative abundance of that genus. Does that sound right?

I’m also unsure what the variable ‘variates’ is for. In the glossary it states that “variates are essentially synonymous with components” in the context of CCAs. Does this mean I should ignore this variable since I’m running sPCA? I also noticed that the values for ‘variates’ are not the same as for ‘loadings’ so I just wanted to check what each variable represents in this case.

I hope all of my questions make sense!

Thanks so much in advance.

DJ

the variable ‘X’ represents each sample’s original (mean-centred?) abundance of each genus

This variable holds your input X data frame after centering and scaling, if applied (see the center and scale parameters).

Rather, I think it’s “the” loading of genus 1 on PC1 which is then multiplied by each individual sample’s rel. abundance of that genus. Does that sound right?

Your interpretation sounds correct to me: each genus has a single loading per component, and that same value is applied to every sample.
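To make the paper's calculation concrete, here is a minimal sketch in plain Python rather than mixOmics code. The abundance values and the loadings are made up for illustration, and the helper names (center_columns, scores) are hypothetical. It assumes the data are mean-centred per genus before the weighted sum is taken, which matches center = TRUE.

```python
# Toy data: 3 samples (rows) x 2 genera (columns). Values are invented.
abundance = [
    [1.0, 4.0],
    [2.0, 6.0],
    [3.0, 8.0],
]

# Hypothetical loadings on PC1: one value per genus, shared by all samples.
loadings_pc1 = [0.6, 0.8]


def center_columns(rows):
    """Subtract each genus's mean (taken across samples) from its column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def scores(rows, loadings):
    """Score per sample = sum over genera of (loading x centred abundance)."""
    return [sum(v * w for v, w in zip(row, loadings)) for row in rows]


centered = center_columns(abundance)
pc1_scores = scores(centered, loadings_pc1)
print(pc1_scores)
```

In an ordinary PCA this weighted sum is how the sample scores are formed. Whether it reproduces the sPCA 'variates' exactly for every component is worth checking on your own object rather than assuming.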

Does this mean I should ignore this variable since I’m running sPCA?

No, these are your components, or rather, the projection of your samples onto the components.

Thanks Max, that’s really helpful!

One more question: is it necessary to do centering and scaling if the abundance data is already CLR transformed? I can see that the default settings are TRUE for ‘center’ and FALSE for ‘scale’ but I was wondering what the advantage would be to centering and, potentially, scaling the data on top of CLR?

Thanks so much!

You don’t need to center it (the “C” in “CLR” stands for centered). Scaling is case dependent: if your features have wildly different scales, then yes. If anything, you should potentially scale it prior to CLR transformation.
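One caveat on centering that is worth verifying on your own data: CLR centres each sample across its genera, whereas, as I understand it, the center parameter acts on each genus across samples. The two are different operations, so CLR-transformed columns generally do not have zero mean. The sketch below shows this in plain Python with made-up, strictly positive counts; the clr helper is hypothetical and is not the mixOmics implementation.

```python
import math

# Toy counts: 2 samples (rows) x 3 genera (columns). Values are invented
# and strictly positive, since the log of zero is undefined.
counts = [
    [1.0, 10.0, 100.0],
    [2.0, 2.0, 16.0],
]


def clr(row):
    """Centred log-ratio for one sample: log values minus their mean."""
    logs = [math.log(v) for v in row]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


clr_rows = [clr(r) for r in counts]

# Each sample (row) sums to roughly zero after CLR.
row_sums = [sum(r) for r in clr_rows]

# Each genus (column) mean is generally NOT zero after CLR.
col_means = [sum(c) / len(c) for c in zip(*clr_rows)]

print(row_sums)
print(col_means)
```

If the column means come out non-zero like this, leaving center = TRUE on is harmless and keeps the components describing variation around each genus's mean.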

Thanks very much, Max!