PLS-DA: prediction of new upcoming samples

Hi there,

I have a practical question regarding PLS-DA/sPLS-DA.

Let’s say I have created a PLS-DA model based on some samples and want to use it to classify new, unlabelled samples. I could not find a way of doing this. Is it possible?


Explore the predict() function. If you’d like some usage examples, look here

@MaxBladen perfect! That’s what I am looking for, thanks.

One additional question from the tutorial you have linked.
It is about splitting the dataset for training and testing.

I have 184 samples in total. I did the split as instructed in the tutorial, allocating 140 samples for training and the rest (44 samples) as the test set.

train <- sample(1:nrow(X), 140) # randomly select 140 samples for training
test <- setdiff(1:nrow(X), train)

Store the matrices into training and test sets:

X.train <- X[train, ]
X.test <- X[test, ]
Y.train <- Y[train]
Y.test <- Y[test]
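
For reference, the split above can be sketched in a self-contained way with base R only. The toy `X` and `Y` below are made-up stand-ins (184 rows to match the numbers in this post, 10 features and 2 classes chosen arbitrarily), and the `set.seed()` call is an addition that is not in the tutorial code; it only makes the random split reproducible.

```r
# Toy stand-ins for the real data (assumption: 184 samples, 10 features, 2 classes)
set.seed(42)                                   # added so the split is reproducible
X <- matrix(rnorm(184 * 10), nrow = 184)
Y <- factor(rep(c("A", "B"), length.out = 184))

train <- sample(seq_len(nrow(X)), 140)         # 140 random row indices for training
test  <- setdiff(seq_len(nrow(X)), train)      # the remaining 44 rows for testing

X.train <- X[train, , drop = FALSE]
X.test  <- X[test, , drop = FALSE]
Y.train <- Y[train]
Y.test  <- Y[test]

# Sanity checks: sizes add up and the two sets do not overlap
stopifnot(nrow(X.train) == 140, nrow(X.test) == 44)
stopifnot(length(Y.train) == 140, length(Y.test) == 44)
stopifnot(length(intersect(train, test)) == 0)
```

If these checks pass on the real data as well, the split itself is unlikely to be the reason a confusion matrix covers fewer than 44 samples.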

After training

train.splsda.h2s <- splsda(X.train, Y.train, ncomp = optimal.ncomp, keepX = optimal.keepX)

and testing

predict.splsda.h2s <- predict(train.splsda.h2s, X.test, dist = "mahalanobis.dist")

evaluation with the confusion matrix shows only 6 samples.

predict.comp2 <- predict.splsda.h2s$class$mahalanobis.dist[,2]
table(factor(predict.comp2, levels = levels(Y)), Y.test)

I would expect all 44 test samples to be evaluated here. Why is that not the case?

So the confusion matrix only sums to 6? Are you sure nrow(X) returns 184?

My best guess is that some of the values in predict.comp2 are NAs. If not, then I don’t know what would be causing that.

Let me know if you can’t resolve it and I’ll look into it.
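
The NA guess above can be checked with a few lines of base R. This is only a sketch on made-up data: `Y.test` and `predict.comp2` below are hypothetical stand-ins built so that 38 of the 44 predictions are NA, not output from a real model. It shows that `table()` drops NA values by default, which is one way a confusion matrix can sum to 6 instead of 44.

```r
# Hypothetical stand-ins: 44 true labels and 44 predictions, 38 of them NA
Y.test <- factor(rep(c("A", "B"), each = 22))
predict.comp2 <- c(rep("A", 3), rep("B", 3), rep(NA, 38))

length(predict.comp2)        # number of test samples that received a prediction slot
sum(is.na(predict.comp2))    # how many of those predictions are NA

# table() silently drops NA by default, so these counts sum to 6
tab.default <- table(factor(predict.comp2, levels = levels(Y.test)), Y.test)
sum(tab.default)

# useNA = "ifany" keeps an NA row, so all 44 samples are accounted for
tab.with.na <- table(factor(predict.comp2, levels = levels(Y.test)), Y.test,
                     useNA = "ifany")
sum(tab.with.na)
```

Note that `factor(x, levels = ...)` also turns any value not listed in `levels` into NA, so it may be worth comparing `unique(predict.comp2)` against `levels(Y)` on the real data as well.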