`plsda`: NA values in Y data

Hey guys, first of all, thank you very much for creating mixOmics. I gratefully use it in my own BioC package, autonomics, and love it.

Recently I noticed that the PLS X variates plot changes depending on whether samples with missing Y values are dropped. I initially thought this was a bug, but then thought it might be by design. Given that PLS is a method that looks at the covariance of X and Y, it seems that when Y is missing it somehow falls back to the variance.

Let me give a reproducible example.
First go to github/bhagwataditya/autonomics.
Then download the devel version and install.
Then run:

file <- download_data('atkin18.metabolon.xlsx')
object <- read_metabolon(file)
object$subgroup[object$subgroup == 't2'] <- NA
biplot(pls(object))                                # uses mixOmics::plsda internally
biplot(pls(filter_samples(object, !is.na(subgroup))))   # result differs

Hi @Aditya,

Apologies for the late answer; your post ended up in spam on the forum for some reason.

I am not sure what your Y includes (a single variable? several?), and I can only see sample plots, no biplot, so I won’t be able to answer your question very specifically. I would not know where to start from your GitHub repo.

I can only explain what happens inside PLS in general when you have missing values. The method performs local regressions of each data set onto the components, so missing values are set to 0 in the algorithm when we fit these regressions. We explain this in Chapters 9 and 10 of our book, if you can get hold of an electronic copy.
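
To make the idea of a local regression with missing values concrete, here is a minimal base R sketch. It is only an illustration under stated assumptions, not the mixOmics source code: the helper `local_regression()` and the toy data are hypothetical, and the actual implementation may differ in its details. It regresses each column of a matrix onto a score vector, zeroing missing entries so that they contribute nothing to the fit.

```r
# Hypothetical helper (not part of mixOmics): regress every column of X
# onto a score vector, NIPALS-style, while skipping missing entries.
local_regression <- function(X, scores) {
  observed <- !is.na(X)              # TRUE where a value is present
  X0 <- X
  X0[!observed] <- 0                 # missing values are set to 0
  # Numerator: sum over samples of x_ij * score_i (zeros add nothing).
  numerator <- colSums(X0 * scores)
  # Denominator: sum of score_i^2 over the samples observed in column j.
  denominator <- colSums(observed * scores^2)
  numerator / denominator
}

# Toy data, made up for illustration only.
set.seed(1)
X <- matrix(rnorm(20), nrow = 5)
scores <- rnorm(5)

X_na <- X
X_na[2, 1] <- NA                     # one missing value in column 1

full    <- local_regression(X, scores)
masked  <- local_regression(X_na, scores)
dropped <- local_regression(X[-2, , drop = FALSE], scores[-2])

# Column 1 behaves as if sample 2 had been dropped for that column only ...
all.equal(masked[1], dropped[1])
# ... while the other columns still use all five samples.
all.equal(masked[-1], full[-1])
```

In this sketch a sample with a missing value is ignored only for the variable where the value is missing, and it still contributes everywhere else. That is different from removing the sample from the analysis altogether.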


I don't know what is happening in your case or what you are trying to show.


Dear Kim, thank you very much for your response! Your book looks very interesting, thank you for that pointer! And for mixOmics, of course :).