MINT-PLS - Prediction with missing values (NA)

Hello,

Thanks for all your replies to my many questions in other topics. I have another one:

I was able to create a model with MINT-PLS-DA, and I would like to use the prediction function on new samples that contain some missing values. However, it seems that this is not possible. Is that true?
If so, I think I have only two choices: (1) impute the missing values with NIPALS, or (2) rebuild the model without the variables that have missing values in the new samples I want to predict. Is that right?

Jérémy Tournayre

Hi @JeremyTournayre

Thanks for getting back to us.

I tried and I didn’t run into any issues with our datasets:

rand_na <- function(mat, 
                    prop=0.1){ ## proportion of NAs
    set.seed(42)
    ## calculate the total number of NAs
    total_na <- floor(prop*prod(dim(mat)))
    vec <- as.vector(mat)
    vec[sample(seq_along(vec), size = total_na, replace = FALSE)] <- NA
    ## rebuild a matrix with the original shape and dimnames
    matrix(vec, ncol = ncol(mat), dimnames = dimnames(mat))
}
suppressMessages(library(mixOmics))
data(stemcells)

## -- training set
ind.test = which(stemcells$study == "3")
gene.train = stemcells$gene[-ind.test,]
Y.train = stemcells$celltype[-ind.test]
study.train = factor(stemcells$study[-ind.test])

## -- test set
gene.test = stemcells$gene[ind.test,]
## add NA
gene.test <- rand_na(gene.test)
sum(is.na(gene.test))
#> [1] 840

Y.test = stemcells$celltype[ind.test]
study.test = factor(stemcells$study[ind.test])

res = mint.plsda(X = gene.train, Y = Y.train, ncomp = 3, 
                  study = study.train)
pred = predict(res, newdata = gene.test, study.test = study.test)
head(pred$class$max.dist)
#>          comp1        comp2        comp3       
#> sample39 "Fibroblast" "Fibroblast" "hESC"      
#> sample40 "Fibroblast" "Fibroblast" "Fibroblast"
#> sample41 "Fibroblast" "hESC"       "hESC"      
#> sample42 "Fibroblast" "Fibroblast" "Fibroblast"
#> sample43 "Fibroblast" "Fibroblast" "Fibroblast"
#> sample44 "Fibroblast" "Fibroblast" "Fibroblast"
## NAs in predictions?
lapply(pred$class, function(x) sum(is.na(x)))
#> $max.dist
#> [1] 0
#> 
#> $centroids.dist
#> [1] 0
#> 
#> $mahalanobis.dist
#> [1] 0

Created on 2019-11-28 by the reprex package (v0.3.0)
Do you have any issues running the above example and getting these results?

Best,

Al

Hello,

Your example works! I had made a mistake in my data.frame. Thanks for the quick answer!
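
For anyone landing here with a similar problem: since the cause turned out to be the input data.frame rather than the NAs themselves, a quick sanity check of the new samples against the training matrix may help before calling predict(). The thread does not say what the actual mistake was, so the sketch below only covers a few common suspects (a non-numeric column, mismatched variable names, columns that are entirely NA). `check_newdata` is a hypothetical helper written for illustration, not a mixOmics function, and the toy data is made up.

```r
## Hypothetical helper (not part of mixOmics): compare new samples
## with the training matrix and report a few common problems.
check_newdata <- function(train, newdata) {
    ## a data.frame holding any text column becomes a character matrix here
    newdata <- as.matrix(newdata)
    list(
        is_numeric      = is.numeric(newdata),
        missing_columns = setdiff(colnames(train), colnames(newdata)),
        extra_columns   = setdiff(colnames(newdata), colnames(train)),
        n_missing       = sum(is.na(newdata)),
        all_na_columns  = colnames(newdata)[colSums(!is.na(newdata)) == 0]
    )
}

## Toy data, for illustration only
set.seed(1)
train <- matrix(rnorm(20), nrow = 4,
                dimnames = list(NULL, paste0("gene", 1:5)))

## new samples with one missing value: same variables, still numeric
new_ok <- data.frame(train[1:2, ])
new_ok[1, 2] <- NA

## same samples with a stray text column left in the data.frame
new_bad <- data.frame(new_ok, label = c("a", "b"))

str(check_newdata(train, new_ok))
str(check_newdata(train, new_bad))
```

In the second call, `is_numeric` is FALSE and `extra_columns` lists the stray `label` column, which points to the data.frame rather than to the missing values.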