Projecting new samples onto pca.plsda space

montoyam · January 22, 2021, 8:06pm

Hi,

Thank you for creating the mixOmics package; it has been really helpful for multi-omics analysis. I was wondering whether there is a way to project new samples into an existing sPLS-DA space. Let me elaborate:

We have a sample processing methodology that is supposed to enrich for cancer-derived CNV. Each patient that we analyze ends up being divided into four different samples, and only one of them should be more “cancer-like”. In addition, I have CNV data (read counts) from both cancer patients and healthy patients as my positive and negative controls. I performed both a PCA and an sPLS-DA on the controls to select for cancer-specific differences. They cluster perfectly without the need for many features. Now I would like to project the patients that we processed into the sPLS-DA space that I created with my controls. What I’m expecting is that the more “cancer-like” samples will cluster closer to the cancer controls than the other three do. Is there a way to do something along these lines?

Thanks, sorry if the question is a bit confusing.

Jose

christoa · January 23, 2021, 12:40pm

Hi @montoyam,

Assuming that you observe perfect clustering using PCA, why don’t you try building an sPCA or PCA model with all your data together in the first place? If the enriched samples cluster with the positive controls and the non-enriched samples cluster with the negative controls, I would not go any further, as this is by far the most convincing way to demonstrate that your sample processing methodology works.
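As a rough sketch of the joint PCA idea above: the matrices below are simulated stand-ins (the names X.controls, X.patient, status and the "bin" feature labels are hypothetical, not from the original data), and the real input would be the control and processed read-count matrices with matching columns.

```r
library(mixOmics)

set.seed(42)  # simulated data only, so the sketch is reproducible

# Hypothetical stand-ins for the real read-count matrices (samples x features)
n.feat <- 40
feat <- paste0("bin", seq_len(n.feat))
X.controls <- matrix(rnorm(20 * n.feat), nrow = 20,
                     dimnames = list(paste0("ctrl", 1:20), feat))
status <- rep(c("healthy", "cancer"), each = 10)
# Give the simulated cancer controls a shift in the first five features
X.controls[status == "cancer", 1:5] <- X.controls[status == "cancer", 1:5] + 3

# Four processed fractions from one patient; fraction 1 is made "cancer-like"
X.patient <- matrix(rnorm(4 * n.feat), nrow = 4,
                    dimnames = list(paste0("fraction", 1:4), feat))
X.patient[1, 1:5] <- X.patient[1, 1:5] + 3

# Unsupervised PCA on controls and processed fractions together
X.all <- rbind(X.controls, X.patient)
group <- factor(c(status, rep("processed", 4)))
pca.all <- pca(X.all, ncomp = 2, center = TRUE, scale = TRUE)

# Colour by group to see where the processed fractions fall
plotIndiv(pca.all, group = group, ind.names = TRUE, legend = TRUE)
```

Because no class labels go into the model, any grouping of the enriched fraction with the cancer controls in this plot is not driven by the supervised fit.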

Otherwise see this example using the predict function:

library(mixOmics)

data(liver.toxicity)
X <- liver.toxicity$gene
Y <- as.factor(liver.toxicity$treatment[, 4])

# Randomly hold out roughly a quarter of the samples as a test set
set.seed(1)
samp <- sample(1:4, nrow(X), replace = TRUE)
test <- which(samp == 1)
train <- setdiff(1:nrow(X), test)

# Fit on the training samples, then project the test samples
plsda.train <- plsda(X[train, ], Y[train], ncomp = 2)
plsda.predict <- predict(plsda.train, X[test, ], dist = "max.dist")
plsda.predict$variates

# Predicted class regions, used as the plot background
background <- background.predict(plsda.train, comp.predicted = 2)

# Plot the training samples, then overlay the projected test samples
plotIndiv(plsda.train, comp = 1:2, rep.space = "X-variate", style = "graphics",
          ind.names = FALSE, background = background, pch = 1)
points(plsda.predict$variates[, 1], plsda.predict$variates[, 2], pch = 4, cex = 1)
# Label each projected point with its sample name
text(plsda.predict$variates[, 1], plsda.predict$variates[, 2],
     labels = rownames(X)[test], pos = 1, cex = 0.7)

In your case you would just skip the data splitting, train on the controls, and use the data from the four processed samples as the test dataset instead. I am curious to see whether this works, since the response variables in the training and test datasets are comparable but not the same thing.
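A minimal sketch of that setup, assuming an sPLS-DA trained on controls only: all data here are simulated, and X.controls, Y.controls, X.patient and the keepX values are hypothetical placeholders. The distance to the cancer centroid at the end is just one possible way to quantify "closer to the cancer controls", not something the package prescribes.

```r
library(mixOmics)

set.seed(42)  # simulated data only

# Hypothetical controls: 10 healthy and 10 cancer samples, 40 features
n.feat <- 40
feat <- paste0("bin", seq_len(n.feat))
X.controls <- matrix(rnorm(20 * n.feat), nrow = 20,
                     dimnames = list(paste0("ctrl", 1:20), feat))
Y.controls <- factor(rep(c("healthy", "cancer"), each = 10))
X.controls[Y.controls == "cancer", 1:5] <- X.controls[Y.controls == "cancer", 1:5] + 3

# Hypothetical processed fractions from one patient (same columns, same order)
X.patient <- matrix(rnorm(4 * n.feat), nrow = 4,
                    dimnames = list(paste0("fraction", 1:4), feat))
X.patient[1, 1:5] <- X.patient[1, 1:5] + 3

# Train on the controls only
splsda.ctrl <- splsda(X.controls, Y.controls, ncomp = 2, keepX = c(5, 5))

# Project the processed fractions into the control-derived space
pred <- predict(splsda.ctrl, newdata = X.patient, dist = "max.dist")
pred$variates        # coordinates of the new samples on each component
pred$class$max.dist  # predicted class for each number of components

# Euclidean distance from each fraction to the cancer-control centroid
cancer.centroid <- colMeans(
  splsda.ctrl$variates$X[Y.controls == "cancer", , drop = FALSE])
d.cancer <- sqrt(rowSums(sweep(pred$variates, 2, cancer.centroid)^2))
sort(d.cancer)
```

The projected coordinates in pred$variates can be overlaid on plotIndiv() with points() in the same way as in the liver.toxicity example.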

Christopher

montoyam · January 26, 2021, 8:21pm

Hi Christopher,

It worked nicely! Thank you so much for the help. I had to play around with the data, since I was running into the “system is computationally singular” error. I assumed this was due to multicollinearity, as I had a large matrix with many similar values. After reducing my dataset to only the differential variables, the whole thing worked.
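For readers hitting the same error, one generic pre-filter is sketched below in base R: it drops constant columns and exact duplicate columns before fitting. This is not the differential-variable selection described above, and whether such columns were the actual cause of the error is an assumption; the matrix X is a made-up example.

```r
set.seed(42)

# Hypothetical matrix (samples x features) with two problem columns
X <- matrix(rnorm(10 * 6), nrow = 10,
            dimnames = list(paste0("s", 1:10), paste0("bin", 1:6)))
X[, 2] <- 7        # constant column (zero variance)
X[, 5] <- X[, 4]   # exact duplicate of another column

# Drop zero-variance columns
keep.var <- apply(X, 2, var) > 0
X.filt <- X[, keep.var, drop = FALSE]

# Drop exact duplicate columns (duplicated() compares rows, hence t())
X.filt <- X.filt[, !duplicated(t(X.filt)), drop = FALSE]

colnames(X.filt)
# "bin1" "bin3" "bin4" "bin6"
```

The same column selection would need to be applied to any new samples before calling predict(), so that training and test matrices keep identical columns.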

Thanks for the help,

Jose
