hi @NickBliziotis

Keep in mind that these methods are exploratory, so we cannot really talk about significance (let alone statistical significance, since we are not testing anything).

In PLS2, Q2 is defined from the Predicted Error Sum of Squares (PRESS, calculated on the test sets defined during the CV process) compared with the Residual Sum of Squares (RSS, calculated directly from the fitted data).

Each is summed over all the Y variables for a given component. You would like to see:

sqrt(PRESS) < sqrt(RSS), or, if you want to allow some slack, sqrt(PRESS) < 0.95 * sqrt(RSS).

After squaring (PRESS < 0.95^2 * RSS, i.e. PRESS/RSS < 0.95^2) and rearranging the terms, you come up with

Q^2 = 1 - PRESS/RSS >= 1 - 0.95^2 = 0.0975
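As a quick sketch of this arithmetic (toy PRESS/RSS values for illustration only, not mixOmics output):

```python
def q2(press, rss):
    """Q2 = 1 - PRESS/RSS; values above 1 - 0.95**2 = 0.0975 mean the
    cross-validated error beats the 5%-slack baseline on the fitted error."""
    return 1 - press / rss

threshold = 1 - 0.95**2  # 0.0975

# CV error well below the fitted error: good predictive component.
print(q2(press=40.0, rss=100.0))   # 0.6, well above the threshold

# CV error larger than the fitted error: negative Q2, poor generalisation.
print(q2(press=120.0, rss=100.0))  # -0.2
```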

So if your Q2 is negative, it means that the model is not good at predicting / generalising. It could be because your number of samples is too small during the CV process (even if you use leave-one-out CV, it may give you an insufficient estimate); or, as you say, because X does not explain Y.

If the Q2 is low, but positive, it means you are still in the right ‘bandwidth’ because PRESS < RSS.

I like to look at `plotIndiv()` to work out whether the sample scores from X and Y are similar (or you could extract `$variates$X` and `$variates$Y` and plot one against the other for each component). Similar information can be extracted from `plotArrow()`.

Then, only if I see some common information being extracted, I look at `plotVar()` to figure out the correlation between specific subsets of variables.

Considering a sparse model with sPLS could also help to filter out some variables. We are currently looking at a new criterion to tune sPLS, hopefully available in the next mixOmics update.

Kim-Anh