# Regression coefficients, PRESS, RSS and Q2 in PLSDA

Hi all, I would like to know how to calculate PRESS and TSS for my PLSDA. The main reason is I would like to follow the methodology in this paper (PDF) Discriminant Q 2 (DQ 2 ) for improved discrimination in PLSDA models

My first issue is i don’t know the meaning of the predicted values (from predict function) - I have tried multiplying the reg.coefficients (from predict(plsdamodel, X)\$B.hat) by the centred and scaled X matrix (plsdamodel\$X), but my predicted values don’t match the predicted values I obtain from predict(plsdamodel, X), and I think I remember reading somewhere that the predict(plsdamodel,X)\$predict would give me the likelihood of belonging to some class - so if this is true, it’s expectable that my values don’t match, but also I can’t really use likelihood to calculate the TSS and PRESS. I need some sort of predicted value that leads to a predicted class, depending on some threshold.

Note: I know PRESS is calculated based on cross-validation. I am just starting by the simplest example to understand where to get the data before going into the perf with CV loo, but the ultimate goal is to 1- get predicted values to be able to 2- calculate PRESS 3- maybe even calculate PRESSD if I manage to code it in R and then 4-get the DQ2 from there, otherwise 2- calculate PRESS and 3- get Q2.

Let me know if I need to clarify this question a bit, I can also create a set of fake data and do the codes if it’s easier.

Many thanks!
Joana

Hi @Joana,

I may not be the best person to ask about this but I’ll see how I go answering it. Looking at the image below, we can see how the `predict()` function calculates the expected score of each sample for each response class. This is due to `mixOmics` encoding all classification responses as dummy matrices. This may make converting to your desired output a bit tricky. Note that you are after Y.hat.new not T.pred.

Looking at the `predict()` function code, line 593 shows how `Ypred` is calculated. If my understanding is correct, the values from this line is what you want. You do not want the transformed version (on line 594) as this converts it to a likelihood.

The first code snippet directly below sets things up. Note that the X and Y datasets were sliced such that this was a two-class problem (as the methodology described in the linked paper cannot apply to multiclass problems):

``````data(breast.TCGA)
X.train <- breast.TCGA\$data.train\$mirna[1:75,]
Y = as.factor(as.vector(breast.TCGA\$data.train\$subtype[1:75]))

ncomp <- 2

res.splsda <- plsda(X.train, Y, ncomp = 2)

X.test <- breast.TCGA\$data.test\$mirna[1:35, ]
``````

This is where you would employ the `predict()` function. Instead, run the following lines:

``````D <- crossprod(scale(X.train), res.splsda\$variates\$X)
B <- crossprod(res.splsda\$ind.mat, res.splsda\$variates\$X)

ncomp.list <- 1:ncomp

Ypred <- list()
for (i in ncomp) {
Ypred = lapply(1 : ncomp.list[i], function(x){scale(X.test) %*% W[, 1:x] %*% solve(t(D[, 1:x]) %*% W[, 1:x]) %*% t(B)[1:x, ]})
}
``````

If we inspect the first few samples of `Ypred` on the first component:

``````head(Ypred[[i]])
#>           Basal       Her2
#> A54N  1.0254366 -1.0254366
#> A2NL  1.9738867 -1.9738867
#> A6VY  0.1388726 -0.1388726
#> A3XT  0.6176910 -0.6176910
#> A1F0  0.1721207 -0.1721207
#> A26I -0.1705571  0.1705571
``````

You’ll see that each sample has a predicted value for each class, but that the magnitudes are equal (for a given sample). You can essentially take the first column and think of it as your y.hat.i from the PRESS formula. If the value is >0, then its predicted as the +1 class (“Basal” in this case) and if its <0 then its the -1 class. You can then apply the PRESSD formula, excluding those which have a magnitude greater than 1.

I hope this helps and feel free to follow up with any more questions you have

Cheers,
Max.

1 Like

Hi @MaxBladen , could I have the link to the source of the image you showed?

Thanks!