Plot cross-validated scores

Hi,

I would like to plot cross-validated scores from a PLS-DA model, is there an easy way to do this?

Thank you,
Kiana

Hi @kw124,
You should be able to from the output $predict of the perf() function that performs cross-validation for all samples (or one of the many many outputs from perf() :slight_smile:), then plot them. We use style = 'graphics' in plotIndiv() to be able to then overlay the predicted samples.
Another option you could consider is background.predict() (have a look at the help file):

data(liver.toxicity)
X = liver.toxicity$gene
Y = as.factor(liver.toxicity$treatment[, 4])

plsda.liver <- plsda(X, Y, ncomp = 2)

# calculating background for the two first components, and the mahalanobis distance
background = background.predict(plsda.liver, comp.predicted = 2, dist = "mahalanobis.dist")

plotIndiv(plsda.liver, background = background, legend = TRUE)

Kim-Anh

Dear Kim-Anh,

Like kw124, I try to plot scores from a cross-validation from a PLS-DA model. In your example,

plotIndiv(plsda.liver , comp = 1:2,
group = Y, ind.names = FALSE,
ellipse = TRUE, legend = TRUE, pch=1)

creates a plot for the observations in the model, based on the scores saved in plsda.liver$variates$X. I am trying to make a similar plot but using the scores from a cross-validation from perf() – I believe this would give a somewhat more conservative or less optimistic intuitive plot of the predictive performance of the model.

But I can’t figure out how to do it – the $predict output from the perf() function appears predict classification, not scores? Is it possible to read out the scores in some way?

It should be possible to do this manually by building several models, each time saving the scores of the left-out predicted samples, and plot these together. But I guess precisely this is done somewhere inside the perf()?

Thanks a lot for a great mixOmics package!

Best,
Axel

hi @ax20,

We have a couple of examples given into ?predict.plsda

This below should extract the predicted scores.

data(liver.toxicity)
X <- liver.toxicity$gene
Y <- as.factor(liver.toxicity$treatment[, 4])

## if training is performed on 4/5th of the original data
samp <- sample(1:5, nrow(X), replace = TRUE)
test <- which(samp == 1)   # testing on the first fold
train <- setdiff(1:nrow(X), test)

plsda.train <- plsda(X[train, ], Y[train], ncomp = 2)
plsda.predict <- predict(plsda.train, X[test, ], dist = "max.dist")
plsda.predict$variates

Also, there is another example to overlay the training samples with the test samples. You would need to specify type = 'graphics' in plotIndiv() to do so (the ggplot style does not allow this).

Hope that helps,

Kim-Anh

Hi Kim-Anh,

indeed - this was very helpful. Thanks so much!

Axel

Hi again,

I post my code, based on Kim-Anh’s suggestion, below in case someone else is looking for a similar plot. Is a bit of a hack, I manually supply what appears to be essential for plotIndiv() to an object which I then assign the same class as plsda models have, without really understanding what is going on in plotIndiv(). I verified this approach by manually re-creating the original scores plot in the same way - but use at your own risk.

According to my taste, this plot from cross-validation scores is useful because it provides some visual safeguard against overfitting models – in such cases, some ellipses would get larger compared to the original scores plot. In the example, mostly the green ellipse gets slightly wider, apparently due to animal 412.

Best,
Axel

data(liver.toxicity)
X <- liver.toxicity$gene
Y <- as.factor(liver.toxicity$treatment[, 4])

#create PLS-DA model and scores plot
plsda.liver <- plsda(X, Y, ncomp = 2)
plotIndiv(plsda.liver , comp = 1:2,
          group = Y, ind.names = TRUE,
          ellipse = TRUE, title = "Original scores plot")
          
#Collect scores from leave-one-out cross validation (CV), modified for a 
#leave-one-out cross-validation from Kim-Anh's code above 

samp<-1:64 #64 samples in data set

for (i in 1:64) {
  test <- which(samp == i)   # testing on the i-th sample
  train <- setdiff(1:nrow(X), test)
  
  plsda.train <- plsda(X[train, ], Y[train], ncomp = 2)
  plsda.predict <- predict(plsda.train, X[test, ], dist = "max.dist")
  
  if(i==1) {CVscores<-plsda.predict$variates}
  if(i>1)  {CVscores<-rbind(CVscores, plsda.predict$variates)}
}
#make sure CVscores is ordered if different CV groups are chosen
#CVscores<-CVscores[order(row.names(CVscores)),]

#manually create object of class "plsda" from CV scores
plsda.CV<-list("X"=matrix(nrow=64, ncol=3116), "variates"=list("X"=CVscores), 
               "names"= list("sample"= rownames(CVscores)))
attr(plsda.CV, "class")<-c("mixo_plsda", "mixo_pls", "DA" )

#create scores plot from CV scores
plotIndiv(plsda.CV , comp = 1:2,
          group = Y, ind.names = TRUE,
          ellipse = TRUE, title="plot of scores from leave-one-out crossvalidation")

#a simpler version of the plot (not shown)
plot(CVscores, col=as.numeric(as.factor(Y)))

thanks for sharing @ax20,
We will see whether we can showcase this either on our vignette, mixOmics.org tutorials and help files (possibilities are endless). We have used similar in some of our publications (e.g. DIABLO).

Kim-Anh