# DIABLO's methods: SGCCA vs PLS

Hi!

I am trying to understand the methods behind DIABLO and I am confused about the difference between SGCCA and PLS. DIABLO’s paper explains that DIABLO extends SGCCA, while YouTube tutorials suggest that DIABLO is based on PLS. From the manual page of `block.splsda()`, I understand that DIABLO is actually based on both SGCCA and PLS.

I know, from Tenenhaus & Tenenhaus, 2011 and from Tenenhaus et al., 2014, that RGCCA, from which SGCCA derives, employs an approach from PLS which allows it to indicate the degree of connection between the different blocks of data.

I would like to ask the following questions:

1. What is the relationship between SGCCA and PLS? From what I read, I understand that both of them are methods to reduce the dimensions of a matrix. So, in what way does DIABLO employ SGCCA, and in what way does it employ PLS?

2. In what way does PLS allow one to indicate the degree of connection between blocks of data? I know that the design matrix is employed to achieve this, but I don’t understand why SGCCA needs to employ PLS to do it.

3. Could you explain what this connection between blocks of data actually is? If DIABLO is going to maximise the covariance between the latent variables of the different blocks of data, what is the difference between indicating a null design matrix and a full design matrix? I know the first one means that the program will focus more on the variables of each block that discriminate the groups of samples rather than on the connection between blocks of data, but I don’t understand the methods behind this. From equation (1) in DIABLO’s paper, I understand that the design matrix is multiplied by the covariances between blocks of data for each component; so, if the design matrix is 0, then the result of the equation should also be 0. Could you explain this? I am new to this type of method and I am lost.

4. Coming back to my first question: I understand that this equation comes from SGCCA, so how can I explain the impact of PLS on the calculation of the components (both on the component scores and on the loading vectors)?

Here I paste a picture of the equation I am referring to:

Thank you very much!

I’ve been doing some digging through the `mixOmics` source code to answer your questions, @Jeni. Here’s what I’ve found:

1. sGCCA and sPLS are complementary feature reduction methods. However, sGCCA (note the “G”) is a “generalised” form of sCCA that works in multiblock contexts, and rGCCA (note the “r”) is the “regularised” form of GCCA. Within `mixOmics` specifically, sGCCA and DIABLO (`block.splsda()`) essentially call the exact same code, and this code derives from the PLS algorithm. However, rGCCA uses methodology derived from Tenenhaus and Guillemot.

It’s easier to think of DIABLO as being able to behave as either a pseudo-rGCCA algorithm or a PLS-derived one, not both simultaneously.

2. The design matrix is employed at each iteration of the PLS algorithm and is involved in the calculation of loadings and variates (see here). PLS is not required for sGCCA, but a portion of the PLS algorithm was repurposed to format the data into a form sGCCA can use effectively (see here).
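
To make the role of the design matrix concrete, here is a toy sketch in plain NumPy (not the actual `mixOmics` code; all names are illustrative). It mimics the spirit of the iterative loading update: each block’s loading vector is pulled towards the design-weighted sum of the other blocks’ scores, so a design value of 0 simply removes that block pair from the update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three toy data blocks measured on the same 20 samples (samples in rows)
n = 20
blocks = [rng.standard_normal((n, p)) for p in (5, 8, 6)]

# Full design: every pair of blocks is connected (diagonal is ignored)
design = np.array([[0.0, 1.0, 1.0],
                   [1.0, 0.0, 1.0],
                   [1.0, 1.0, 0.0]])

# Initialise unit-norm loading vectors, one per block
loadings = [np.ones(X.shape[1]) / np.sqrt(X.shape[1]) for X in blocks]

for _ in range(50):  # iterate the updates until roughly stable
    for j, Xj in enumerate(blocks):
        # Design-weighted sum of the other blocks' scores drives block j's update
        target = sum(design[j, k] * blocks[k] @ loadings[k]
                     for k in range(len(blocks)) if k != j)
        w = Xj.T @ target
        loadings[j] = w / np.linalg.norm(w)  # re-normalise to unit length

# Pairwise covariance between block scores, the quantity the design weights
scores = [X @ a for X, a in zip(blocks, loadings)]
cov_12 = float(scores[0] @ scores[1]) / (n - 1)
```

Note the sparsity (the “s” in sGCCA) and the classification step of `block.splsda()` are left out here; the sketch only shows where the design values enter the loading calculation.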

3. I feel you have somewhat answered your own question here. If the covariance between a specific pair of blocks is not important to consider, set the corresponding value in the design matrix to 0. That pair then contributes nothing to the above equation, but note that we are summing over all design values, so the remaining terms still contribute. The objective only collapses to 0 if all design values are 0, and then there’s no point in using this method at all.
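
For reference, the objective in question (writing equation (1) from the DIABLO paper from memory, so the notation may differ slightly from the published version) sums the design-weighted covariances between the component scores of each pair of blocks:

```latex
\max_{a_1, \dots, a_Q} \;
\sum_{\substack{j,\,k = 1 \\ j \neq k}}^{Q}
  c_{j,k} \, \operatorname{cov}\!\left(X_j a_j,\; X_k a_k\right)
\qquad \text{s.t.} \quad
\lVert a_j \rVert_2 = 1 \ \text{and} \ \lVert a_j \rVert_1 \leq \lambda_j,
\quad j = 1, \dots, Q
```

Because each pairwise covariance has its own weight $c_{j,k}$, setting one design value to 0 only deletes that single term from the sum; it does not zero out the whole expression.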

4. I’m a bit confused as to what you’re actually asking here, sorry. Do you mean at a mathematical, a programmatic or a conceptual level?

Following on from that, you may ask why the CCA acronym is used rather than PLS in the available resources. Within `mixOmics`, “sGCCA” uses the PLS algorithm to decompose the input data, but algorithms from the `RGCCA` package are utilised to deflate the data frames. Refer to ** at bottom for an explanation of deflation.
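
The deflation step itself is simple to sketch. In this minimal NumPy illustration (illustrative names, not the `mixOmics` implementation), the part of a block explained by the current component’s score vector is regressed out, so the next component is computed on the residual:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 6))  # toy block: 20 samples, 6 features

def deflate(X, t):
    """Remove from X the part explained by score vector t (regression deflation)."""
    p = X.T @ t / (t @ t)      # loading: regress each column of X onto t
    return X - np.outer(t, p)  # residual block used for the next component

t = X @ np.ones(X.shape[1])    # stand-in score vector, just for illustration
X_deflated = deflate(X, t)

# Every column of the residual is now orthogonal to t
print(np.abs(X_deflated.T @ t).max())
```

This is why components beyond the first capture new structure: each one is fitted to what the previous scores could not explain.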