PLS-DA: prediction of new upcoming samples

Hi there,

I have a practical question regarding PLS-DA/sPLS-DA.

Let's say I have created a PLS-DA model based on some samples and want to use it to classify new unlabelled samples. I could not find a way to do this; is that possible?

Thanks

Explore the predict() function. If you’d like some usage examples, look here
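
A minimal sketch of what that looks like, assuming a fitted (s)PLS-DA model `model` and a matrix `X.new` of unlabelled samples with the same variables as the training data (`model` and `X.new` are placeholder names, not objects from this thread):

```r
# predict class membership of new samples from a fitted (s)PLS-DA model (mixOmics)
pred <- predict(model, newdata = X.new, dist = "max.dist")

# predicted classes per component; the column for the final component
# is usually the prediction you want
pred$class$max.dist
```

`dist` can also be `"centroids.dist"`, `"mahalanobis.dist"`, or `"all"`.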

@MaxBladen perfect! That’s what I am looking for, thanks.

One additional question from the tutorial you have linked.
It is about splitting the dataset for training and testing.

I have 184 samples in total. I did the split as instructed in the tutorial, allocating 140 samples for training and the remaining 44 for testing:

train <- sample(1:nrow(X), 140) # randomly select 140 samples for training
test <- setdiff(1:nrow(X), train)

Store the matrices into training and test sets:

X.train <- X[train, ]
X.test <- X[test, ]
Y.train <- Y[train]
Y.test <- Y[test]
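
Two small checks worth doing around this split (not in the original code, just a sketch): fixing the seed makes the random split reproducible, and tabulating the class labels shows whether any class ended up under-represented in either set:

```r
set.seed(42)  # hypothetical seed, only so the split is reproducible
train <- sample(1:nrow(X), 140)
test <- setdiff(1:nrow(X), train)

# check class balance in both subsets
table(Y[train])
table(Y[test])
```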

After training

train.splsda.h2s <- splsda(X.train, Y.train, ncomp = optimal.ncomp, keepX = optimal.keepX)

and testing

predict.splsda.h2s <- predict(train.splsda.h2s, X.test, dist = "mahalanobis.dist")

the evaluation with the confusion matrix shows only 6 samples:

predict.comp2 <- predict.splsda.h2s$class$mahalanobis.dist[,2]
table(factor(predict.comp2, levels = levels(Y)), Y.test)

I would expect here evaluation of all 44 test samples. Why is that?

So the confusion matrix only sums to 6? Are you sure nrow(X) returns 184?

My best guess is that some of the values in predict.comp2 are NAs. If not, then I don't know what would be causing that.
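
A quick way to test that guess, using the object names from the thread (`$predict` is the array of raw predicted scores returned by mixOmics, samples × classes × components):

```r
# how many of the 44 test predictions are missing, and which ones?
sum(is.na(predict.comp2))
which(is.na(predict.comp2))

# check the raw predicted scores on component 2 for the same pattern
summary(predict.splsda.h2s$predict[, , 2])
```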

Let me know if you can't resolve it and I'll look into it.

Hi @MaxBladen,
Sorry for the very late answer. I have actually not found my way around this.

Yes, the confusion matrix sums to 6 samples. nrow(X) (before splitting the data) returns 184, and nrow() of the train and test datasets returns the expected values, i.e. 140 and 44, respectively.

Yes, that is true: most values in predict.comp2 are NaN, except for the ones shown in the confusion matrix. The same pattern appears in predict.splsda.h2s.

Hi @DeniR

Yes, have a look at the link that Max gave earlier, in the section ‘Prediction’. In that case we artificially created a test set from the original data. In your case that would be a new data set.
The issue is how you would normalise your new unlabelled samples first, without overfitting.
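
One common way to handle this (a sketch of the general idea, not a mixOmics recipe; `X.new` is a hypothetical matrix of new samples): estimate the centring/scaling parameters on the training set only, then apply those same parameters to the new samples, so no information from the unlabelled data leaks into the normalisation:

```r
# estimate scaling on the training data only
ctr <- colMeans(X.train)
scl <- apply(X.train, 2, sd)

# apply the *training* parameters to the new, unlabelled samples
X.new.scaled <- scale(X.new, center = ctr, scale = scl)
```

Note that mixOmics' predict() already applies the model's own internal scaling to newdata, so this mainly matters for any normalisation done outside the package (e.g. log transforms or batch correction).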

Kim-Anh

Hi @kimanh.lecao,

Thanks for following up on this.

The problem I am facing is in the evaluation output. As mentioned earlier, I have my training and test datasets. I trained the model with the training set (140 samples) and used it to predict the test dataset (44 samples). However, the confusion matrix recognises only 6 samples in total, meaning 38 have no prediction (empty rows).

OK, that is weird. If you want, I can have a look at your data and code if you send me your .RData and everything I need to rerun that part (i.e. the final PLS-DA model and the prediction).

Kim-Anh

Hello @kimanh.lecao ,

I have the same issue. Strangely, I found my way around it by uploading a single data file and manually dividing it into training and test data. I don't understand why it works now, but at least I have results…

Best

Fabien

@Fabien-Filaire @DeniR,

Thanks @Fabien-Filaire. Then I think there is something weird happening when you divide the training/test data sets directly in R, e.g. droplevels(Y) has not been applied, or something of that kind (i.e. some information still remains in memory and has not been 'cut' properly).
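
To illustrate the stale-levels issue (a toy sketch, not the thread's data): subsetting a factor in R keeps all of the original levels, and the empty levels then show up as zero rows/columns in table() until droplevels() removes them:

```r
Y <- factor(c("a", "a", "b", "b", "c"))
Y.sub <- Y[1:4]       # "c" is no longer present...
levels(Y.sub)         # ...but is still a level: "a" "b" "c"
table(Y.sub)          # "c" appears with count 0

Y.clean <- droplevels(Y.sub)
levels(Y.clean)       # "a" "b"
```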

Kim-Anh