Hello,
Thanks for all your replies to my many questions in other topics. I have another one:
I was able to create a model with MINT PLS-DA, and I would like to use the predict function on new samples that contain some missing values. However, this does not seem to be possible. Is that true?
If so, I think I have only two choices: (1) impute the missing values with NIPALS, or (2) rebuild the model without the variables that have missing values in the new samples I want to predict. Is that right?
Jérémy Tournayre
Hi @JeremyTournayre
Thanks for getting back to us.
I tried and I didn’t run into any issues with our datasets:
rand_na <- function(mat,
                    prop = 0.1) { ## proportion of NAs
  set.seed(42)
  ## calculate the total number of NAs
  total_na <- floor(prop * prod(dim(mat)))
  vec <- as.vector(mat)
  vec[sample(seq_along(vec), size = total_na, replace = FALSE)] <- NA
  ## rebuild a matrix with the original dimensions and dimnames
  matrix(vec, ncol = ncol(mat), dimnames = dimnames(mat))
}
suppressMessages(library(mixOmics))
data(stemcells)
## -- training set
ind.test = which(stemcells$study == "3")
gene.train = stemcells$gene[-ind.test,]
Y.train = stemcells$celltype[-ind.test]
study.train = factor(stemcells$study[-ind.test])
## -- test set
gene.test = stemcells$gene[ind.test,]
## add NA
gene.test <- rand_na(gene.test)
sum(is.na(gene.test))
#> [1] 840
Y.test = stemcells$celltype[ind.test]
study.test = factor(stemcells$study[ind.test])
res = mint.plsda(X = gene.train, Y = Y.train, ncomp = 3,
                 study = study.train)
pred = predict(res, newdata = gene.test, study.test = study.test)
head(pred$class$max.dist)
#>          comp1        comp2        comp3
#> sample39 "Fibroblast" "Fibroblast" "hESC"
#> sample40 "Fibroblast" "Fibroblast" "Fibroblast"
#> sample41 "Fibroblast" "hESC"       "hESC"
#> sample42 "Fibroblast" "Fibroblast" "Fibroblast"
#> sample43 "Fibroblast" "Fibroblast" "Fibroblast"
#> sample44 "Fibroblast" "Fibroblast" "Fibroblast"
## NAs in predictions?
lapply(pred$class, function(x) sum(is.na(x)))
#> $max.dist
#> [1] 0
#>
#> $centroids.dist
#> [1] 0
#>
#> $mahalanobis.dist
#> [1] 0
Created on 2019-11-28 by the reprex package (v0.3.0)
Do you run into any issues when running the example above, or get different results?
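If the example runs but predict() still fails on your own data, it may be worth checking how the new samples are formatted before looking at the NAs themselves. Below is a minimal sketch of such checks in base R. Note that check_newdata() is a hypothetical helper written for this post, not a mixOmics function, and the toy X.train and X.new objects are made up for illustration.

```r
## Hypothetical helper (NOT part of mixOmics): basic sanity checks on new
## samples before passing them to predict() as newdata.
check_newdata <- function(X.train, X.new) {
  X.new <- as.matrix(X.new)  # a data.frame with a text column becomes character
  list(
    is_numeric      = is.numeric(X.new),
    missing_columns = setdiff(colnames(X.train), colnames(X.new)),
    extra_columns   = setdiff(colnames(X.new), colnames(X.train)),
    same_order      = identical(colnames(X.train), colnames(X.new)),
    ## indices of samples / names of variables that contain only NAs
    all_na_rows     = unname(which(rowSums(!is.na(X.new)) == 0)),
    all_na_columns  = colnames(X.new)[colSums(!is.na(X.new)) == 0]
  )
}

## Toy data (made up): the new samples have a text column, a column that is
## entirely NA, and one variable that does not exist in the training set.
X.train <- matrix(0, nrow = 3, ncol = 4,
                  dimnames = list(NULL, c("g1", "g2", "g3", "g4")))
X.new <- data.frame(g1 = c(0.1, NA),
                    g2 = c("0.5", "n/a"),
                    g3 = c(NA, NA),
                    g5 = c(1.2, 0.7))

chk <- check_newdata(X.train, X.new)
str(chk)
```

In this toy case the checks flag a non-numeric matrix, a missing variable (g4), an unexpected variable (g5) and an all-NA variable (g3). Scattered NAs like those in the stemcells example above are not flagged, since only fully missing rows or columns are reported.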
Best,
Al
Hello,
Your example works! I had made a mistake in my data.frame. Thanks for the quick answer!